refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
@@ -0,0 +1,139 @@
// File: specs/100-Infrastructures/141-server-consolidation/research.md
// Change Log:
// - 2026-06-20: Initial research for Single-Host Server Consolidation

# Research: Single-Host Server Consolidation

**Branch**: `141-server-consolidation` | **Date**: 2026-06-20

## R1: Docker Network Isolation Strategy

**Decision**: Use two Docker bridge networks: `dms-internal` (all services) and `dms-frontend` (Frontend + Backend only, for LAN publishing).

**Rationale**: Each Docker bridge network is isolated from the others at L2. Services attached only to `dms-internal` with no `ports` mapping are unreachable from the LAN. Only Frontend (3000) and Backend (3000) need LAN access. This replaces reliance on VLAN/firewall ACLs with Docker-native isolation.

**Alternatives Considered**:

- Single bridge network + iptables rules: more complex, error-prone
- Docker Swarm overlay network: overkill for a single host
- Host network mode: no isolation, security risk

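The two networks from the decision above would normally be declared under `networks:` in the Compose file. As a rough CLI equivalent, the sketch below creates them only if they are missing. The function name `create_dms_networks` is hypothetical; only the network names come from this document.

```bash
# Sketch: create the two bridge networks if they do not exist yet.
create_dms_networks() {
  local net
  for net in dms-internal dms-frontend; do
    # `docker network inspect` exits non-zero when the network is missing.
    if ! docker network inspect "$net" >/dev/null 2>&1; then
      docker network create --driver bridge "$net"
    fi
  done
}
```

Calling `create_dms_networks` once before `docker compose up` would be enough if the Compose file marks both networks as `external`.
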
## R2: CIFS Mount Strategy for ASUSTOR

**Decision**: Use a Docker named volume, `asustor_uploads`, that mounts the ASUSTOR share `//192.168.10.9/np-dms-as/data/uploads` over CIFS. The volume is mounted at `/mnt/uploads` in the sidecar and at `/app/uploads` in the backend.

**Rationale**: Docker's CIFS volume support ties the mount lifecycle to container start/stop. Credentials live in `.env` (gitignored). Backend and sidecar see the same files because both mount the same CIFS-backed volume.

**Alternatives Considered**:

- Host-level `mount -t cifs` then bind mount: requires host OS config, not portable
- SSHFS: slower than CIFS for file operations
- Sync files to local SSD: adds complexity, storage duplication

**Key Consideration**: The previous Desk-5439 setup had issues with Docker Desktop WSL2 + CIFS (see memory). On a Linux host, the CIFS volume driver works natively, without the WSL2 layer.

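A minimal sketch of how the `asustor_uploads` volume might be created from the CLI, using the `local` driver's CIFS options. The variable names `CIFS_USER` and `CIFS_PASS` and the function name are assumptions; the actual keys in `.env` are not given in this document.

```bash
# Sketch: create the CIFS-backed named volume described in R2.
create_asustor_uploads_volume() {
  # CIFS_USER / CIFS_PASS are hypothetical names, expected to come from the gitignored .env.
  : "${CIFS_USER:?set CIFS_USER}" "${CIFS_PASS:?set CIFS_PASS}"
  docker volume create \
    --driver local \
    --opt type=cifs \
    --opt device=//192.168.10.9/np-dms-as/data/uploads \
    --opt "o=addr=192.168.10.9,username=${CIFS_USER},password=${CIFS_PASS}" \
    asustor_uploads
}
```

The same options can be expressed under `volumes:` with `driver_opts` in the Compose file, which keeps the mount definition next to the services that use it.
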
## R3: MariaDB Migration Strategy

**Decision**: Take a logical dump with `mariadb-dump` from QNAP MariaDB 11.8 and load it into the MariaDB 11.8 container on the new host.

**Rationale**: Both hosts run the same MariaDB version (11.8), so a logical dump is the safest option. The database is small enough (<10GB estimated) that dump and restore complete within the maintenance window.

**Alternatives Considered**:

- `mariabackup` (physical backup): faster but requires same filesystem layout
- Replication (binlog): overkill for one-time migration
- Copy raw data files: risky, requires same version + config

**Migration Command**:

```bash
# On QNAP (source): dump all databases in one consistent snapshot.
# DB_ROOT_PASSWORD must be set in the shell on each host.
mariadb-dump --single-transaction --routines --triggers \
  -h 127.0.0.1 -u root -p"$DB_ROOT_PASSWORD" \
  --all-databases > qnap-full-dump.sql

# On new host: restore the dump into the running MariaDB container.
docker exec -i lcbp3-mariadb mariadb -u root -p"$DB_ROOT_PASSWORD" < qnap-full-dump.sql
```

## R4: Elasticsearch Migration Strategy

**Decision**: Use the ES snapshot/restore API: create a snapshot on QNAP ES, transfer it to the new host, and restore it there.

**Rationale**: The ES snapshot API is the official migration path. It handles index mappings, settings, and data, and it works when source and target run the same ES version (8.11.x).

**Alternatives Considered**:

- Copy raw data directory: risky, requires identical ES config
- Re-index from MariaDB: slow, loses search index tuning
- Logstash pipeline: overkill for one-time migration

**Migration Steps**:

1. Register shared filesystem repo on QNAP ES
2. Create snapshot of all indices
3. Copy snapshot files to new host ES data volume
4. Register repo on new host ES
5. Restore snapshot

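The migration steps above could be driven with `curl` against the snapshot API, roughly as sketched here. The endpoint, repository name, snapshot name, and repository path are all assumptions, and the path must also be listed in `path.repo` on each node. Authentication and TLS options are omitted and would be needed on a secured 8.x cluster.

```bash
ES_URL="${ES_URL:-http://localhost:9200}"        # assumed endpoint
REPO="migration_repo"                            # hypothetical repository name
SNAPSHOT="pre-consolidation"                     # hypothetical snapshot name
REPO_PATH="/usr/share/elasticsearch/snapshots"   # assumed; must appear in path.repo

# Steps 1 and 4: register a shared-filesystem repository (run on both hosts).
es_register_repo() {
  curl -fsS -X PUT "$ES_URL/_snapshot/$REPO" \
    -H 'Content-Type: application/json' \
    -d "{\"type\":\"fs\",\"settings\":{\"location\":\"$REPO_PATH\"}}"
}

# Step 2: snapshot all indices and block until the snapshot finishes.
es_create_snapshot() {
  curl -fsS -X PUT "$ES_URL/_snapshot/$REPO/$SNAPSHOT?wait_for_completion=true"
}

# Step 5: restore the snapshot on the new host.
es_restore_snapshot() {
  curl -fsS -X POST "$ES_URL/_snapshot/$REPO/$SNAPSHOT/_restore"
}
```

Step 3 (copying the repository directory between hosts) happens outside Elasticsearch, for example with `rsync`, between `es_create_snapshot` on QNAP and `es_register_repo` on the new host.
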
## R5: GPU VRAM Management on Single Host

**Decision**: Rely on ADR-040 D3 (Adaptive OCR Residency via `calculate_ocr_residency()`) and ADR-040 D4 (CPU Fallback Retrieval), together with the LLM-First GPU Ownership policy from CONTEXT.md.

**Rationale**: RTX 5060 Ti 16GB must serve:

- np-dms-ai (Typhoon-2.5 ~7-8B): ~6-8GB VRAM
- np-dms-ocr (Typhoon OCR): ~5GB VRAM
- nomic-embed-text: ~0.5GB VRAM
- CUDA overhead: ~1.5GB
- Total: ~13-15GB → tight but feasible with adaptive residency

**Key Policy**: When the LLM (np-dms-ai) needs to load, the OCR model is unloaded first (`keep_alive=0` for OCR). BGE-M3 + Reranker fall back to CPU when the GPU is occupied.

**Alternatives Considered**:

- Force GPU-resident for all models: OOM risk (up to ~15GB of the 16GB card, leaving almost no headroom)
- CPU-only for all AI: too slow for production
- Second GPU: not available on new host

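As an illustration of the `keep_alive=0` policy above, and assuming the OCR model is served by Ollama on its default port, a request with `keep_alive` set to `0` asks Ollama to unload the model right away. The function name and `OLLAMA_URL` variable are hypothetical; `np-dms-ocr` is the model name used in this document.

```bash
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"   # assumed: Ollama default port

# Sketch: ask Ollama to unload the OCR model so the LLM can take the VRAM.
unload_ocr_model() {
  local model="${1:-np-dms-ocr}"
  curl -fsS "$OLLAMA_URL/api/generate" \
    -d "{\"model\":\"$model\",\"keep_alive\":0}"
}
```

How this call is sequenced with `calculate_ocr_residency()` is defined by ADR-040 and is not shown here.
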
## R6: RAM Budget Allocation

**Decision**: Per-container memory limits in Docker Compose:

| Service | Memory Limit | Notes |
|---------|-------------|-------|
| MariaDB | 8G | Largest consumer, tune innodb_buffer_pool |
| Elasticsearch | 4G | ES_JAVA_OPTS=-Xms2g -Xmx2g |
| Backend (NestJS) | 2G | Node.js + BullMQ workers |
| Frontend (Next.js) | 1G | Standalone mode |
| Redis | 1G | In-memory + AOF |
| Qdrant | 1G | Vector DB |
| OCR Sidecar | 1G | Python + PyMuPDF |
| Ollama | 2G | Model loading + inference |
| ClamAV | 2G | Virus definitions |
| ollama-metrics | 256M | Lightweight proxy |
| **Total** | **~22.3G** | Leaves ~9.7G for OS + swap |

**Rationale**: 32GB total - 22.3GB containers = ~9.7GB for OS kernel + page cache + swap. Comfortable margin.

**Alternatives Considered**:

- No limits: risk of OOM killer affecting critical services
- Tighter limits: may cause ES/MariaDB instability

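The totals in the table can be re-derived with a few lines of shell arithmetic. This sketch treats each "G" in the table as 1024 MiB, which is an assumption about the units intended.

```bash
# Memory limits from the table, in MiB, in table order (MariaDB ... ollama-metrics).
limits_mib=(8192 4096 2048 1024 1024 1024 1024 2048 2048 256)
host_mib=$((32 * 1024))   # 32GB host, as stated in the rationale

total=0
for mib in "${limits_mib[@]}"; do
  total=$((total + mib))
done
remainder=$((host_mib - total))

echo "container limits: ${total} MiB"
echo "left for host:    ${remainder} MiB"
```

This gives 22784 MiB (22.25G) for the containers and 9984 MiB (9.75G) for the host, in line with the rounded ~22.3G and ~9.7G above.
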
## R7: CI/CD Pipeline Update

**Decision**: Update Gitea Actions `ci-deploy.yml` to SSH-deploy to the new host IP instead of the QNAP IP. The ASUSTOR Gitea runner stays unchanged.

**Rationale**: The Gitea runner on ASUSTOR (192.168.10.9) can reach the new host via VLAN 10. Only the deploy target IP changes. The compose file path in `deploy.sh` changes to `New-Host/docker-compose.new-host.yml`.

**Alternatives Considered**:

- Move Gitea runner to new host: unnecessary, runner works remotely
- Manual deployment: not sustainable for ongoing releases

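The contents of `deploy.sh` are not shown in this document, so the following is only a sketch of what an SSH deploy step against the new compose file might look like. The function name and the `user@host` argument are hypothetical, and the compose path is assumed to be relative to the SSH login directory on the new host.

```bash
# Sketch: pull images and restart the stack on the deploy target over SSH.
deploy_to_new_host() {
  local host="${1:?usage: deploy_to_new_host <user@host>}"
  local compose_file="New-Host/docker-compose.new-host.yml"
  ssh "$host" "docker compose -f '$compose_file' pull && docker compose -f '$compose_file' up -d"
}
```

Because the target is a single argument, switching between the QNAP and the new host is a one-value change in the workflow.
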
## R8: Rollback Strategy

**Decision**: Multi-step rollback plan documented in `rollback.sh`:
1. Stop services on new host (`docker compose down`)
2. Restore services on QNAP (start existing containers with old data)
3. Restore services on Desk-5439 (start Ollama + sidecar)
4. Revert DNS/NPM to point to QNAP
5. Revert Gitea CI/CD deploy target to QNAP
6. Re-enable X-API-Key in sidecar + backend

**Rationale**: QNAP retains all data (MariaDB, ES, Redis, files) until the new host is verified stable. Rollback is fast (<2 hours) because the old infrastructure is intact.

**Alternatives Considered**:

- No rollback (accept SPOF): too risky for production DMS
- Hot failover with replication: overkill for current scale

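The actual `rollback.sh` is not reproduced here. As a hedged skeleton of the six steps, the sketch below runs in dry-run mode by default and only fills in step 1, the one step whose command appears in this document. The helper names `run` and `manual` are hypothetical, and the compose file path is borrowed from R7.

```bash
# DRY_RUN=1 (default) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "[dry-run] $*"; else "$@"; fi
}

# Steps whose commands are not given in this document are printed as reminders.
manual() { echo "[manual] $*"; }

rollback() {
  run docker compose -f New-Host/docker-compose.new-host.yml down   # step 1
  manual "step 2: start the existing containers on QNAP"
  manual "step 3: start Ollama + sidecar on Desk-5439"
  manual "step 4: point DNS/NPM back to QNAP"
  manual "step 5: revert the Gitea CI/CD deploy target to QNAP"
  manual "step 6: re-enable X-API-Key in sidecar + backend"
}
```

Running `rollback` with `DRY_RUN=1` prints the checklist in order, which makes it usable as a rehearsal before the maintenance window.
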