15 KiB
// File: specs/100-Infrastructures/141-server-consolidation/tasks.md // Change Log: // - 2026-06-20: Initial task list for Single-Host Server Consolidation // - 2026-06-20: Fix C1-C5 from analysis: backend env var update, port conflict, GPU residency, ollama-metrics port, n8n endpoints
Tasks: Single-Host Server Consolidation
Input: Design documents from /specs/100-Infrastructures/141-server-consolidation/
Prerequisites: plan.md, spec.md, research.md, data-model.md, contracts/
Related ADRs: ADR-041, ADR-040, ADR-016, ADR-023A, ADR-034
Organization: Tasks are grouped by user story to enable independent implementation and testing of each story.
Format: [ID] [P?] [Story] Description
- [P]: Can run in parallel (different files, no dependencies)
- [Story]: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
Phase 1: Setup (Shared Infrastructure)
Purpose: Create directory structure and initial files for the new host deployment
- T001 Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/directory structure with subdirectories:ocr-sidecar/,scripts/ - T002 [P] Create
.env.templateatspecs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.templatewith all required env vars from contracts - T003 [P] Create
README.mdatspecs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/README.mdwith deployment overview
Phase 2: Foundational (Blocking Prerequisites)
Purpose: Provision the new host OS and create the unified Docker Compose stack — MUST be complete before any user story can proceed
⚠️ CRITICAL: No user story work can begin until this phase is complete
- T004 Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh— installs Docker Engine, Docker Compose v2, NVIDIA drivers, nvidia-container-toolkit, CIFS utils, creates directory structure - T005 Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml— unified compose with all 10 services, 2 networks (dms-internal, dms-frontend), CIFS volume, named volumes, memory limits per data-model.md. Backend publishes3001:3000to LAN (NPM routesbackend.np-dms.work→ :3001); Frontend publishes3000:3000; ollama-metrics publishes9924:9924to LAN for Prometheus scraping from ASUSTOR - T006 [P] Copy OCR sidecar code from
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tospecs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/— adaptOLLAMA_API_URLtohttp://ollama:11434(Docker DNS), removeportsmapping, useexposeonly - T007 [P] Update
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/Dockerfile— verify GPU access via nvidia-container-toolkit, ensure poppler-utils installed - T008 [P] Update
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/requirements.txt— verify typhoon-ocr, PyMuPDF, httpx, fastapi versions match Desk-5439 - T008b Update backend environment variables for renamed service names:
REDIS_HOST=redis(wascache),ELASTICSEARCH_HOST=elasticsearch(wassearch) inspecs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.templateandspecs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.ymlbackend environment section — these service names changed from QNAP compose where Redis wascacheand ES wassearch
Checkpoint: New host directory structure and unified compose file ready — user story implementation can now begin
Phase 3: User Story 1 - Provision and Deploy on New Host (Priority: P1) 🎯 MVP
Goal: Administrator provisions the new host, mounts ASUSTOR CIFS, and deploys all services with Docker internal network isolation
Independent Test: Run docker compose up -d on the new host and verify all containers are healthy via docker ps and health check endpoints
Implementation for User Story 1
- T009 [US1] Run
provision-host.shon new host — verify Docker, NVIDIA, CIFS mount at/mnt/uploads - T010 [US1] Pull Ollama models on new host:
ollama pull np-dms-ai:latest,ollama pull np-dms-ocr:latest,ollama pull nomic-embed-text:latest— verify withollama list - T011 [US1] Copy
.env.templateto.env, fill in all secrets from QNAP.env(DB passwords, JWT secrets, Redis password, ASUSTOR CIFS credentials) - T012 [US1] Run
docker compose --env-file .env -f docker-compose.new-host.yml up -dand verify all 10 containers start - T013 [US1] Verify network isolation:
nmap -p 11434 <new-host-ip>from another VLAN 10 machine should show closed/refused;nmap -p 8765should show closed/refused;nmap -p 3000(frontend) andnmap -p 3001(backend) should show open;nmap -p 9924(ollama-metrics) should show open for Prometheus - T014 [US1] Verify health checks:
curl http://localhost:3001/health(backend on published port 3001),curl http://localhost:3000/(frontend),curl http://ocr-sidecar:8765/health(from inside backend container via Docker DNS)
Checkpoint: All services running on new host with correct network isolation — MVP achieved
Phase 4: User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)
Goal: Migrate MariaDB and Elasticsearch data from QNAP to the new host with zero data loss
Independent Test: Compare row counts and index document counts between QNAP (source) and new host (destination) after migration
Implementation for User Story 2
- T015 [P] [US2] Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-mariadb.sh— dump from QNAP MariaDB 11.8 viamariadb-dump --single-transaction --routines --triggers, pipe to new host container - T016 [P] [US2] Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-elasticsearch.sh— create snapshot on QNAP ES, transfer files, register repo on new host, restore - T017 [US2] Run
migrate-mariadb.sh— verify all table row counts match between QNAP and new host - T018 [US2] Run
migrate-elasticsearch.sh— verify all index document counts match between QNAP and new host - T019 [US2] Create and run
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/verify-data-parity.sh— automated row count + document count comparison script - T020 [US2] Verify CIFS file access: list files in
/app/uploads/tempand/app/uploads/permanentfrom backend container, compare with ASUSTOR share
Checkpoint: All data migrated and verified — new host has complete production data
Phase 5: User Story 3 - Cutover and Smoke Test (Priority: P3)
Goal: Perform production cutover from old 2-host architecture to new single host, verify all DMS functions work end-to-end
Independent Test: Access application via new host IP, perform core DMS operations (login, document upload, search, AI inference)
Implementation for User Story 3
- T021 [P] [US3] Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/smoke-test.sh— automated tests for: backend health, frontend accessible, login flow, document list, OCR endpoint, AI inference, full-text search - T022 [US3] Update Gitea secrets:
HOST→ new host IP,COMPOSE_FILE→specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml - T023 [US3] Update
scripts/deploy.sh— changeCOMPOSE_FILEpath to New-Host directory - T024 [US3] Update NPM (Nginx Proxy Manager) on QNAP:
lcbp3.np-dms.work→ new host IP:3000 (frontend),backend.np-dms.work→ new host IP:3001 (backend) - T024b [US3] Update n8n workflow endpoints on QNAP: change all backend API URLs from
http://192.168.10.8:3000/api(QNAP) tohttp://<new-host-ip>:3001/api(new host) — n8n stays on QNAP but must reach backend on new host via LAN port 3001 - T025 [US3] Run
smoke-test.shon new host — verify all 7 smoke tests pass - T026 [US3] Verify from external machine on VLAN 10: access
https://lcbp3.np-dms.work, login, create a test Correspondence, upload a PDF, trigger OCR, perform search
Checkpoint: New host is production-active — all DMS functions verified end-to-end
Phase 6: User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
Goal: Remove X-API-Key authentication from sidecar and backend, relying solely on Docker-internal network isolation per ADR-040 D5
Independent Test: Attempt to access sidecar from outside Docker network (should fail); verify backend calls sidecar without API key (should succeed)
Implementation for User Story 4
- T027 [P] [US4] Remove
OCR_SIDECAR_API_KEYfromspecs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.ymlocr-sidecar environment - T028 [P] [US4] Remove API key validation from
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/app.py— removeX-API-Keyheader check middleware - T029 [US4] Remove
X-API-Keyheader frombackend/src/modules/ai/services/ocr.service.ts— remove API key from HTTP client headers - T030 [US4] Remove
OCR_SIDECAR_API_KEYfrombackend/.env.exampleand any backend config that sets it - T031 [US4] Rebuild and redeploy sidecar + backend containers — verify backend can call sidecar without API key
- T032 [US4] Verify external access blocked:
curl http://<new-host-ip>:8765/healthfrom VLAN 10 machine should fail (connection refused)
Checkpoint: Network-only auth verified — no API key needed, Docker isolation sufficient
Phase 7: User Story 5 - Decommission Old Hosts (Priority: P5)
Goal: Stop services on QNAP (becomes backup) and retire Desk-5439, completing the consolidation
Independent Test: Verify QNAP services stopped (except backup), Desk-5439 powered off, new host unaffected
Implementation for User Story 5
- T033 [P] [US5] Create
specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/rollback.sh— emergency rollback: stop new host, restore QNAP + Desk-5439 services, revert DNS, revert CI/CD - T034 [US5] Monitor new host for 24-48 hours: RAM usage (
docker stats), VRAM usage (nvidia-smi), container health, application logs - T034b [US5] Verify Adaptive OCR Residency (ADR-040 D3) on new RTX 5060 Ti: load
np-dms-aiandnp-dms-ocrconcurrently, confirmcalculate_ocr_residency()unloads OCR model when LLM needs VRAM; verify CPU Fallback Retrieval (ADR-040 D4) activates for BGE-M3/Reranker when GPU is occupied by LLM - T035 [US5] Stop QNAP app services:
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down' - T036 [US5] Stop QNAP service stack:
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down' - T037 [US5] Retire Desk-5439:
ssh user@192.168.10.100 'sudo shutdown -h now'(or repurpose) - T038 [US5] Verify new host still fully operational after old hosts decommissioned — re-run
smoke-test.sh - T039 [US5] Take QNAP backup snapshot:
mariadb-dumpon QNAP MariaDB (if still running) or verify existing backup is current
Checkpoint: Consolidation complete — single host is sole production, old hosts decommissioned
Phase 8: Polish & Cross-Cutting Concerns
Purpose: Documentation, monitoring, and final verification
- T040 [P] Update
specs/04-Infrastructure-OPS/04-00-docker-compose/README.md— add New-Host section, mark QNAP as backup, mark Desk-5439 as retired - T041 [P] Update
CONTEXT.md— update infrastructure topology to reflect single-host architecture - T042 [P] Update
AGENTS.md— update infrastructure references (Desk-5439 → New Host, QNAP → backup) - T043 Update
specs/04-Infrastructure-OPS/04-00-docker-compose/.env.template— add ASUSTOR_USER, ASUSTOR_PASS, NEW_HOST_IP variables - T044 [P] Update Prometheus/Grafana scrape config on ASUSTOR — update ollama-metrics target from
192.168.10.100:9924to new host internal or host-published port - T045 Run
quickstart.mdvalidation — follow all steps end-to-end on a fresh provision - T046 [P] Document disaster recovery procedure — backup schedule, restore from QNAP backup, estimated RTO/RPO
Dependencies & Execution Order
Phase Dependencies
- Setup (Phase 1): No dependencies — can start immediately
- Foundational (Phase 2): Depends on Setup — BLOCKS all user stories
- US1 (Phase 3): Depends on Foundational — requires physical access to new host
- US2 (Phase 4): Depends on US1 (services must be running to receive migrated data)
- US3 (Phase 5): Depends on US1 + US2 (services running + data migrated for cutover)
- US4 (Phase 6): Depends on US3 (cutover complete, network isolation verified)
- US5 (Phase 7): Depends on US3 + US4 (stable production before decommissioning)
- Polish (Phase 8): Can start after US3; some tasks depend on US5
User Story Dependencies
- US1 (P1): Foundational → US1 — no dependencies on other stories
- US2 (P2): US1 → US2 — needs running services to receive data
- US3 (P3): US1 + US2 → US3 — needs running services + migrated data
- US4 (P4): US3 → US4 — needs cutover complete to verify network isolation in production
- US5 (P5): US3 + US4 → US5 — needs stable production before decommissioning
Parallel Opportunities
- T002, T003 can run in parallel (different files)
- T006, T007, T008 can run in parallel (sidecar files, no dependencies)
- T015, T016 can run in parallel (different migration scripts)
- T027, T028 can run in parallel (different files: compose vs app.py)
- T040, T041, T042, T044 can run in parallel (different doc files)
- T027, T028, T030 can run in parallel (different files: compose, app.py, .env.example)
Implementation Strategy
MVP First (User Story 1 Only)
- Complete Phase 1: Setup (create directory structure)
- Complete Phase 2: Foundational (provision host + create compose)
- Complete Phase 3: User Story 1 (deploy services)
- STOP and VALIDATE: All containers healthy, network isolation verified
- Demo to stakeholders if ready
Incremental Delivery
- Setup + Foundational → Infrastructure ready
- Add US1 → Services deployed → Validate (MVP!)
- Add US2 → Data migrated → Validate parity
- Add US3 → Cutover complete → Validate end-to-end
- Add US4 → Security hardened → Validate network-only auth
- Add US5 → Old hosts retired → Validate stability
- Polish → Documentation updated → Final validation
Notes
- This is an infrastructure task — most work is shell scripts, Docker Compose YAML, and manual operations
- Physical access to the new host is required for US1
- Data migration (US2) requires SSH access to QNAP
- Cutover (US3) requires DNS/NPM access and coordination with users
- Decommission (US5) should only proceed after 24-48 hours of stable monitoring
- Rollback plan must be tested before cutover
- All env secrets must come from
.env(gitignored) — never commit real secrets