<!-- File: specs/06-Decision-Records/ADR-041-server-consolidation.md -->
<!-- Change Log
- 2026-06-20: Created initial ADR-041 documenting server consolidation decision.
- Co-locate all services on single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB).
- QNAP remains NAS for uploads/permanent storage via CIFS.
- Enables ADR-040 network-only auth for sidecar via Docker-internal isolation.
-->
# ADR-041: Single-Host Server Consolidation
**Status:** Proposed
**Date:** 2026-06-20
**Related Documents:**
- [ADR-040: OCR Sidecar Refactor](./ADR-040-ocr-sidecar-refactor.md)
- [ADR-016: Security & Authentication](./ADR-016-security-authentication.md)
- [ADR-023A: Unified AI Architecture](./ADR-023A-unified-ai-architecture.md)
- [ADR-034: AI Model Change](./ADR-034-AI-model-change.md)
- [CONTEXT.md](../../00-overview/CONTEXT.md)
---
## 🎯 Context and Problem Statement
### Current Architecture
LCBP3-DMS currently spreads its services across multiple machines:
| Service | Host | Hardware | Network |
|---------|------|----------|---------|
| Ollama (np-dms-ai, np-dms-ocr, nomic-embed) | Desk-5439 | RTX 4060 Ti 16GB | VLAN 10 (192.168.10.100) |
| OCR Sidecar (FastAPI) | Desk-5439 | Same as above | VLAN 10 (192.168.10.100) |
| Backend (NestJS) | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Frontend (Next.js) | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Redis | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| MariaDB | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Elasticsearch | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| File Storage | QNAP NAS | - | CIFS share `np-dms-as` |
### Problems Identified
1. **Cross-Host Trust Boundary:** Backend ↔ sidecar/Ollama traffic crosses the LAN (VLAN 10), so isolation depends on VLAN/firewall ACLs (ADR-040 §4).
2. **Management Complexity:** Services are spread across 2 hosts, which complicates deployment, monitoring, and troubleshooting.
3. **GPU Resource Fragmentation:** Desk-5439 has a GPU but limited CPU/RAM, so it cannot run the backend.
4. **Network Latency:** Backend ↔ Ollama traffic over the LAN adds latency to AI inference.
5. **Hardware Underutilization:** The QNAP NAS has CPU/RAM but no GPU, so it cannot run AI models.
### New Hardware
A new server is available:
- **CPU:** Ryzen 5 5600 (6 cores / 12 threads)
- **RAM:** 32GB DDR4
- **GPU:** RTX 5060 Ti 16GB
- **Storage:** SSD (OS) + HDD (data)
---
## ⚙️ Decision Drivers
* **Simplify Architecture:** Reduce the number of hosts from 2 to 1.
* **Enable Docker-Internal Isolation:** Sidecar and backend share one Docker bridge, which gives genuine network-level auth (ADR-040 D5).
* **Better Resource Utilization:** A single host provides CPU, RAM, and GPU in one machine.
* **Reduce Network Latency:** Backend ↔ Ollama traffic stays on the local host instead of crossing the LAN.
* **Maintain Data Separation:** QNAP remains the NAS for file storage.
---
## 🏛️ Decisions
### D1: Co-locate All Services on Single Docker Host
Move all services to the new server:
- Ollama (np-dms-ai, np-dms-ocr, nomic-embed)
- OCR Sidecar (FastAPI)
- Backend (NestJS)
- Frontend (Next.js)
- Redis
- MariaDB
- Elasticsearch
**Retire Desk-5439** after a successful cutover.
### D2: ASUSTOR as Primary NAS, QNAP as Backup
QNAP (192.168.10.8) is reduced to a backup-server role only.
ASUSTOR (192.168.10.9) becomes the primary NAS for:
- Upload temp storage (`/data/uploads/temp`)
- Permanent file storage (`/data/uploads/permanent`)
- The CIFS share `np-dms-as`, mounted on the new host as:
  - `/mnt/uploads/temp` → `//192.168.10.9/np-dms-as/data/uploads/temp`
  - `/mnt/uploads/permanent` → `//192.168.10.9/np-dms-as/data/uploads/permanent`
### D3: Docker-Internal Network Only for Sidecar/Ollama
- The sidecar and Ollama **do not publish ports to the LAN** (they use `expose` instead of `ports`).
- Services sit on an internal Docker bridge network (`dms-internal`).
- The backend reaches the sidecar and Ollama at `http://ocr-sidecar:8765` and `http://ollama:11434` (service names).
- The frontend reaches the backend at `http://backend:3000`.
- Only the frontend and backend publish ports to the LAN (80, 443, 3000).
**Enables ADR-040 D5:** Network isolation via the Docker-internal bridge makes it practical to remove `X-API-Key`.
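
The publish rule above can be checked mechanically before deployment. The following is a minimal sketch, assuming the compose file has already been parsed into a plain dictionary (for example by a YAML loader); the function name `lan_published_violations` and the `ALLOWED_PUBLISHERS` set are hypothetical and not part of this ADR or ADR-040.

```python
from typing import Any

# Hypothetical policy for illustration: only these services may publish host ports.
ALLOWED_PUBLISHERS = {"frontend", "backend"}


def lan_published_violations(compose: dict[str, Any]) -> list[str]:
    """Return names of services that declare `ports` but are not allowed to publish."""
    services = compose.get("services") or {}
    return sorted(
        name
        for name, spec in services.items()
        if (spec or {}).get("ports") and name not in ALLOWED_PUBLISHERS
    )


if __name__ == "__main__":
    # A deliberately wrong sample: Ollama publishes 11434 to the host.
    sample = {
        "services": {
            "ollama": {"ports": ["11434:11434"]},
            "ocr-sidecar": {"expose": ["8765"]},
            "frontend": {"ports": ["3000:3000"]},
        }
    }
    print(lan_published_violations(sample))
```

A check like this could run in CI against the rendered compose file so that a stray `ports` entry on an internal service is caught before it reaches the LAN.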
### D4: GPU VRAM Management Reinforced
The RTX 5060 Ti 16GB must accommodate:
- `np-dms-ai` (Typhoon-2.5 ~7-8B) ~6-8GB
- `np-dms-ocr` (Typhoon OCR) ~5GB
- `nomic-embed-text` ~0.5GB
- BGE-M3 + Reranker (if GPU-resident) ~4.5GB
- CUDA overhead ~1.5GB
**Total ≈ 15.5GB → OOM risk if everything is loaded at the same time**
**Mandatory:**
- ADR-040 D3 (Adaptive OCR Residency via `calculate_ocr_residency()`)
- ADR-040 D4 (CPU Fallback Retrieval for embed/rerank)
- LLM-First GPU Ownership (CONTEXT.md)
- Do not force BGE + Reranker to stay GPU-resident permanently
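
To make the VRAM pressure concrete, here is a minimal sketch of a greedy fit in priority order. It is not ADR-040's `calculate_ocr_residency()`; the function name `plan_gpu_residency`, the priority order (LLM first, then OCR, embedding, reranker), and the 8 GB figure for `np-dms-ai` (an assumed upper bound) are illustrative assumptions.

```python
def plan_gpu_residency(models, vram_gb, overhead_gb):
    """Greedy fit: walk models in priority order and keep each on the GPU if it still fits.

    models: list of (name, size_gb) tuples, highest priority first.
    Returns (gpu_resident, cpu_fallback, remaining_gb).
    """
    remaining = vram_gb - overhead_gb
    gpu, cpu = [], []
    for name, size_gb in models:
        if size_gb <= remaining:
            gpu.append(name)
            remaining -= size_gb
        else:
            # Does not fit: this model falls back to CPU.
            cpu.append(name)
    return gpu, cpu, remaining


# Sizes in GB; the LLM figure is an assumed upper bound, the rest follow the list above.
MODELS = [
    ("np-dms-ai", 8.0),
    ("np-dms-ocr", 5.0),
    ("nomic-embed-text", 0.5),
    ("bge-m3+reranker", 4.5),
]

if __name__ == "__main__":
    gpu, cpu, free_gb = plan_gpu_residency(MODELS, vram_gb=16.0, overhead_gb=1.5)
    print("GPU:", gpu)
    print("CPU fallback:", cpu)
    print("Free VRAM (GB):", free_gb)
```

Under these assumed sizes the reranker pair does not fit alongside the LLM and OCR model, which is consistent with the rule that BGE + Reranker must not be forced to stay GPU-resident.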
### D5: RAM Budget Considerations
The 32GB of RAM must accommodate:
- Node.js (Frontend) ~500MB
- NestJS (Backend) ~1-2GB
- MariaDB ~4-8GB (depends on dataset size)
- Redis ~500MB
- Elasticsearch ~2-4GB (depends on index size)
- Python (Sidecar) ~500MB
- Ollama ~1-2GB
- BGE/Reranker CPU-fallback tensors ~2-4GB
**Action Items:**
- Size DB/ES/Redis memory limits before cutover
- Monitor RAM usage after cutover
- Consider swap space if needed
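
The RAM budget can be totalled with a few lines of arithmetic. This is a minimal sketch, assuming each estimate above is read as a low-to-high range in GB (for example 4 to 8 GB for MariaDB); the dictionary keys and the function name `ram_budget` are hypothetical, and the totals exclude the OS and page cache.

```python
RAM_TOTAL_GB = 32.0

# (low, high) estimates in GB, read as ranges from the list above (an assumption).
RAM_ESTIMATES_GB = {
    "frontend": (0.5, 0.5),
    "backend": (1.0, 2.0),
    "mariadb": (4.0, 8.0),
    "redis": (0.5, 0.5),
    "elasticsearch": (2.0, 4.0),
    "sidecar": (0.5, 0.5),
    "ollama": (1.0, 2.0),
    "bge_reranker_cpu": (2.0, 4.0),
}


def ram_budget(estimates, total_gb):
    """Return (low_total, high_total, headroom_at_high) in GB."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high, total_gb - high


if __name__ == "__main__":
    low, high, headroom = ram_budget(RAM_ESTIMATES_GB, RAM_TOTAL_GB)
    print(f"services: {low}-{high} GB, headroom at high end: {headroom} GB")
```

The headroom figure is only as good as the estimates, so the per-service memory limits chosen before cutover should replace these placeholder ranges.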
### D6: Single Point of Failure (SPOF) Mitigation
Single host = SPOF risk
**Mitigation:**
- Regular backups of the database and file storage (QNAP)
- A disaster recovery plan for hardware failure
- Consider a cold standby or failover strategy in the future
---
## 📋 Implementation Tasks
| Task ID | Phase | Summary | Status |
| :--- | :--- | :--- | :--- |
| T001 | Provision | Install Docker + Docker Compose on new host | Pending |
| T002 | Provision | Mount CIFS share from ASUSTOR to `/mnt/uploads` | Pending |
| T003 | Deploy | Create `docker-compose.yml` for new host topology | Pending |
| T004 | Deploy | Configure internal bridge network (`dms-internal`) | Pending |
| T005 | Deploy | Deploy services (Ollama, sidecar, backend, frontend, Redis, DB, ES) | Pending |
| T006 | Migrate | Migrate MariaDB data from QNAP to new host | Pending |
| T007 | Migrate | Migrate Elasticsearch indices from QNAP to new host | Pending |
| T008 | Cutover | Update DNS/load balancer to point to new host | Pending |
| T009 | Cutover | Run smoke tests on new host | Pending |
| T010 | ADR-040 | Remove `X-API-Key` from sidecar + backend (ADR-040 D5) | Pending |
| T011 | Cleanup | Stop services on QNAP (QNAP becomes backup server) | Pending |
| T012 | Cleanup | Retire Desk-5439 | Pending |
---
## 📋 Target docker-compose Layout (Draft)
```yaml
version: '3.8'
networks:
  dms-internal:
    driver: bridge
  dms-frontend:
    driver: bridge
services:
  # GPU Services (internal-only, no LAN publish)
  ollama:
    image: ollama/ollama:latest
    container_name: lcbp3-ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ollama_models:/root/.ollama
    networks:
      - dms-internal
    expose:
      - "11434"
    environment:
      - OLLAMA_KEEP_ALIVE=-1
  ocr-sidecar:
    build:
      context: ./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar
    container_name: lcbp3-ocr-sidecar
    restart: unless-stopped
    volumes:
      - asustor_uploads:/mnt/uploads:ro # Read-only CIFS mount from ASUSTOR
    networks:
      - dms-internal
    expose:
      - "8765"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_URL=http://ollama:11434
      - OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
  # Backend Services (internal-only)
  backend:
    build:
      context: ./backend
    container_name: lcbp3-backend
    restart: unless-stopped
    volumes:
      - asustor_uploads:/app/uploads:ro
    networks:
      - dms-internal
      - dms-frontend
    expose:
      - "3000"
    depends_on:
      - ollama
      - ocr-sidecar
      - redis
      - mariadb
      - elasticsearch
    environment:
      - OCR_API_URL=http://ocr-sidecar:8765
      - OLLAMA_API_URL=http://ollama:11434
  # Frontend (LAN publish)
  frontend:
    build:
      context: ./frontend
    container_name: lcbp3-frontend
    restart: unless-stopped
    networks:
      - dms-frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend
  # Data Services
  redis:
    image: redis:7-alpine
    container_name: lcbp3-redis
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - redis_data:/data
  mariadb:
    image: mariadb:10.11
    container_name: lcbp3-mariadb
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - mariadb_data:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_ROOT_PASSWORD}
      - MYSQL_DATABASE=lcbp3
  elasticsearch:
    image: elasticsearch:8.11.0
    container_name: lcbp3-elasticsearch
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - es_data:/usr/share/elasticsearch/data
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
volumes:
  ollama_models:
  asustor_uploads:
    driver: local
    driver_opts:
      type: cifs
      o: "username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0"
      device: "//192.168.10.9/np-dms-as/data/uploads"
  redis_data:
  mariadb_data:
  es_data:
```
---
## 📋 Consequences
### Positive
* **Simplified Architecture:** Single host → easier deployment, monitoring, troubleshooting
* **True Network Isolation:** Docker-internal bridge enables ADR-040 D5 (network-only auth)
* **Reduced Latency:** Backend ↔ Ollama traffic stays on the local host
* **Better Resource Utilization:** A single host provides CPU, RAM, and GPU
* **Data Separation Maintained:** ASUSTOR is the primary NAS, so data stays separate from compute; QNAP is the backup server
### Negative
* **SPOF Risk:** Single host = single point of failure
* **RAM Pressure:** 32GB must cover all services plus the CPU-fallback tensors
* **Migration Complexity:** The DB, ES, and file paths must all be migrated
* **GPU VRAM Pressure:** 16GB is workable only with adaptive residency and CPU fallback
---
## 🔄 Rollback Plan
1. Stop services on the new host
2. Restore services on QNAP (backend, frontend, Redis, DB, ES)
3. Restore services on Desk-5439 (Ollama, sidecar)
4. Revert DNS/load balancer to QNAP
5. Point the CIFS mount on QNAP back to ASUSTOR (192.168.10.9)
6. Restore `X-API-Key` in sidecar + backend (ADR-040 rollback)
---
## 📝 Verification Plan
1. Smoke tests on the new host:
   - Backend health check
   - Frontend accessible via LAN
   - OCR endpoint functional
   - AI inference functional
   - File upload/download via CIFS
2. Monitor RAM/VRAM usage for 24-48 hours after cutover
3. Verify that ADR-040 D5 (network-only auth) works in practice
4. Verify that ADR-040 D3/D4 (adaptive residency + CPU fallback) work in practice
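
The HTTP part of the smoke tests in step 1 could be scripted roughly as follows. This is a minimal sketch using only the standard library; the function names and any URLs passed in are hypothetical, since this ADR does not define health endpoints.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails before a response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return exc.code
    except OSError:
        # DNS failure, refused connection, timeout (URLError is an OSError subclass).
        return 0


def run_smoke_tests(checks, fetch=http_status):
    """Map each check name to True if its URL answered with HTTP 200.

    checks: dict of name -> URL. fetch is injectable so the logic can be
    exercised without network access.
    """
    return {name: fetch(url) == 200 for name, url in checks.items()}
```

For example, `run_smoke_tests({"frontend": "http://<new-host>:3000/"})` would report whether the frontend answers on the LAN, with `<new-host>` replaced by the real address. The OCR, inference, and CIFS checks need request bodies or file operations and are left out of this sketch.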