refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
@@ -0,0 +1,36 @@
|
||||
# Specification Quality Checklist: Single-Host Server Consolidation
|
||||
|
||||
**Purpose**: Validate specification completeness and quality before proceeding to planning
|
||||
**Created**: 2026-06-20
|
||||
**Feature**: [spec.md](../spec.md)
|
||||
|
||||
## Content Quality
|
||||
|
||||
- [x] No implementation details (languages, frameworks, APIs) — spec focuses on operational outcomes
|
||||
- [x] Focused on user value and business needs — admin/ops workflows clearly defined
|
||||
- [x] Written for non-technical stakeholders — user stories describe journeys, not code
|
||||
- [x] All mandatory sections completed — User Scenarios, Requirements, Success Criteria all filled
|
||||
|
||||
## Requirement Completeness
|
||||
|
||||
- [x] No [NEEDS CLARIFICATION] markers remain — all requirements have clear definitions
|
||||
- [x] Requirements are testable and unambiguous — each FR has measurable acceptance criteria
|
||||
- [x] Success criteria are measurable — SC-001 through SC-010 have specific metrics
|
||||
- [x] Success criteria are technology-agnostic — focus on outcomes (parity, latency, uptime) not tools
|
||||
- [x] All acceptance scenarios are defined — 5 user stories with Given/When/Then scenarios
|
||||
- [x] Edge cases are identified — 7 edge cases covering GPU OOM, RAM, CIFS, SPOF, network, migration failures
|
||||
- [x] Scope is clearly bounded — includes provisioning, migration, cutover, security, decommission
|
||||
- [x] Dependencies and assumptions identified — 7 assumptions documented
|
||||
|
||||
## Feature Readiness
|
||||
|
||||
- [x] All functional requirements have clear acceptance criteria — FR-001 through FR-015 mapped to user stories
|
||||
- [x] User scenarios cover primary flows — P1 (provision) → P2 (migrate) → P3 (cutover) → P4 (security) → P5 (decommission)
|
||||
- [x] Feature meets measurable outcomes defined in Success Criteria — 10 measurable outcomes
|
||||
- [x] No implementation details leak into specification — Docker/tech names are inherent to infra spec but kept at architecture level
|
||||
|
||||
## Notes
|
||||
|
||||
- This is an infrastructure specification based on ADR-041; some technical terms (Docker, CIFS, VRAM) are inherent to the domain
|
||||
- ADR-040 (OCR Sidecar Refactor) is a hard dependency for FR-008 (remove X-API-Key) and FR-009 (GPU VRAM management)
|
||||
- Spec is ready for `/speckit-clarify` or `/speckit-plan`
|
||||
@@ -0,0 +1,69 @@
|
||||
# Docker Compose Contract: New Host
|
||||
|
||||
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
|
||||
|
||||
This contract defines the service topology for the consolidated single-host deployment.
|
||||
The actual `docker-compose.new-host.yml` will be created at:
|
||||
`specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
|
||||
|
||||
## Service Topology
|
||||
|
||||
| Service | Image | Networks | LAN Ports | Internal Port | Memory Limit | Depends On |
|
||||
|---------|-------|----------|-----------|---------------|--------------|------------|
|
||||
| ollama | ollama/ollama:latest | dms-internal | none | 11434 | 2G (host) | — |
|
||||
| ocr-sidecar | build (local) | dms-internal | none | 8765 | 1G | ollama |
|
||||
| backend | lcbp3-backend:latest | dms-internal, dms-frontend | 3001→3000 | 3000 | 2G | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
|
||||
| frontend | lcbp3-frontend:latest | dms-frontend | 3000 | 3000 | 1G | backend |
|
||||
| redis | redis:7-alpine | dms-internal | none | 6379 | 1G | — |
|
||||
| mariadb | mariadb:11.8 | dms-internal | none | 3306 | 8G | — |
|
||||
| elasticsearch | elasticsearch:8.11.1 | dms-internal | none | 9200 | 4G | — |
|
||||
| qdrant | qdrant/qdrant:v1.16.1 | dms-internal | none | 6333 | 1G | — |
|
||||
| clamav | clamav/clamav:1.4.4 | dms-internal | none | 3310 | 2G | — |
|
||||
| ollama-metrics | ghcr.io/norskhelsenett/ollama-metrics:latest | dms-internal | 9924 | 9924 | 256M | ollama |
|
||||
|
||||
## Network Topology
|
||||
|
||||
```
dms-internal (bridge, no LAN access)
├── ollama:11434
├── ocr-sidecar:8765
├── backend:3000 (also on dms-frontend)
├── redis:6379
├── mariadb:3306
├── elasticsearch:9200
├── qdrant:6333
├── clamav:3310
└── ollama-metrics:9924

dms-frontend (bridge, LAN published)
├── frontend:3000 → LAN:3000
├── backend:3000 → LAN:3001 (NPM routes backend.np-dms.work → :3001)
└── ollama-metrics:9924 → LAN:9924 (Prometheus scrape target)
```
|
||||
|
||||
## Environment Variables (New)
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| ASUSTOR_USER | (required) | CIFS share username |
|
||||
| ASUSTOR_PASS | (required) | CIFS share password |
|
||||
| NEW_HOST_IP | (required) | New host LAN IP for CI/CD deploy target |
|
||||
|
||||
## Environment Variables (Changed from QNAP)
|
||||
|
||||
| Variable | Old Value (QNAP) | New Value (New Host) |
|
||||
|----------|------------------|---------------------|
|
||||
| DB_HOST | mariadb | mariadb (unchanged — Docker DNS) |
|
||||
| REDIS_HOST | cache | redis (service name change) |
|
||||
| ELASTICSEARCH_HOST | search | elasticsearch (service name change) |
|
||||
| QDRANT_HOST | qdrant | qdrant (unchanged) |
|
||||
| OCR_API_URL | http://192.168.10.100:8765 | http://ocr-sidecar:8765 |
|
||||
| OLLAMA_API_URL | http://192.168.10.100:11434 | http://ollama:11434 |
|
||||
| CLAMAV_HOST | clamav | clamav (unchanged) |
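
The value changes in the table above could be applied to a copy of the QNAP `.env` with a small filter. This is only a sketch: the function name `migrate_env` is hypothetical, and it assumes each variable sits on its own `NAME=value` line.

```bash
# Rewrite the four values that change on the new host; all other lines pass through.
# Prints the migrated content to stdout and leaves the input file untouched.
migrate_env() {
  sed -E \
    -e 's|^REDIS_HOST=.*|REDIS_HOST=redis|' \
    -e 's|^ELASTICSEARCH_HOST=.*|ELASTICSEARCH_HOST=elasticsearch|' \
    -e 's|^OCR_API_URL=.*|OCR_API_URL=http://ocr-sidecar:8765|' \
    -e 's|^OLLAMA_API_URL=.*|OLLAMA_API_URL=http://ollama:11434|' \
    "$1"
}
```

A call such as `migrate_env qnap.env > .env` would produce the new file; secrets and unchanged hosts (`DB_HOST`, `QDRANT_HOST`, `CLAMAV_HOST`) are copied as-is.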
|
||||
|
||||
## Removed Environment Variables
|
||||
|
||||
| Variable | Reason |
|----------|--------|
| OCR_SIDECAR_API_KEY | ADR-040 D5: network-only auth, no API key needed |

**Note**: `OCR_SIDECAR_UPLOAD_BASE` is not removed. It is still required and keeps the value `/mnt/uploads`.
|
||||
@@ -0,0 +1,230 @@
|
||||
// File: specs/100-Infrastructures/141-server-consolidation/data-model.md
|
||||
// Change Log:
|
||||
// - 2026-06-20: Initial data model for Single-Host Server Consolidation
|
||||
|
||||
# Data Model: Single-Host Server Consolidation
|
||||
|
||||
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
|
||||
|
||||
## Infrastructure Entities
|
||||
|
||||
### 1. Docker Network: dms-internal
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| name | string | `dms-internal` |
|
||||
| driver | string | `bridge` |
|
||||
| scope | string | local (single host) |
|
||||
| published_ports | none | No ports published to LAN |
|
||||
|
||||
**Members**: ollama, ocr-sidecar, backend, redis, mariadb, elasticsearch, qdrant, clamav, ollama-metrics
|
||||
|
||||
### 2. Docker Network: dms-frontend
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| name | string | `dms-frontend` |
|
||||
| driver | string | `bridge` |
|
||||
| scope | string | local (single host) |
|
||||
| published_ports | 3000 (frontend), 3001→3000 (backend), 9924 (ollama-metrics) | Only ports published to LAN |
|
||||
|
||||
**Members**: frontend, backend
|
||||
|
||||
### 3. Docker Volume: asustor_uploads
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| driver | string | `local` |
|
||||
| type | string | `cifs` |
|
||||
| device | string | `//192.168.10.9/np-dms-as/data/uploads` |
|
||||
| mount_options | string | `username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0` |
|
||||
| mount_point (sidecar) | string | `/mnt/uploads` (read-only) |
|
||||
| mount_point (backend) | string | `/app/uploads` (read-write) |
|
||||
|
||||
### 4. Docker Volume: ollama_models
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| driver | string | `local` (named volume) |
|
||||
| mount_point | string | `/root/.ollama` |
|
||||
| content | string | Ollama model files (np-dms-ai, np-dms-ocr, nomic-embed-text) |
|
||||
|
||||
### 5. Docker Volume: mariadb_data
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| driver | string | `local` (named volume) |
|
||||
| mount_point | string | `/var/lib/mysql` |
|
||||
| content | string | MariaDB data files (migrated from QNAP) |
|
||||
|
||||
### 6. Docker Volume: es_data
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| driver | string | `local` (named volume) |
|
||||
| mount_point | string | `/usr/share/elasticsearch/data` |
|
||||
| content | string | Elasticsearch indices (migrated from QNAP) |
|
||||
|
||||
### 7. Docker Volume: redis_data
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| driver | string | `local` (named volume) |
|
||||
| mount_point | string | `/data` |
|
||||
| content | string | Redis AOF persistence + BullMQ queue data |
|
||||
|
||||
### 8. Docker Volume: qdrant_data
|
||||
|
||||
| Attribute | Type | Description |
|
||||
|-----------|------|-------------|
|
||||
| driver | string | `local` (named volume) |
|
||||
| mount_point | string | `/qdrant/storage` |
|
||||
| content | string | Qdrant vector collections |
|
||||
|
||||
## Service Definitions
|
||||
|
||||
### ollama
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `ollama/ollama:latest` |
|
||||
| GPU | NVIDIA RTX 5060 Ti 16GB (passthrough) |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 11434 internal only) |
|
||||
| volumes | ollama_models → /root/.ollama |
|
||||
| depends_on | none |
|
||||
| healthcheck | `ollama list` (verify API responsive) |
|
||||
|
||||
### ocr-sidecar
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| build | `./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar` |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 8765 internal only) |
|
||||
| volumes | asustor_uploads → /mnt/uploads (read-only) |
|
||||
| depends_on | ollama |
|
||||
| env | OLLAMA_API_URL=http://ollama:11434, OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads |
|
||||
| healthcheck | `curl -f http://localhost:8765/health` |
|
||||
|
||||
### backend
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}` |
|
||||
| networks | dms-internal + dms-frontend |
|
||||
| ports | 3001:3000 (published to LAN — NPM routes `backend.np-dms.work` → :3001) |
|
||||
| volumes | asustor_uploads → /app/uploads (read-write) |
|
||||
| depends_on | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
|
||||
| env | OCR_API_URL=http://ocr-sidecar:8765, OLLAMA_API_URL=http://ollama:11434, DB_HOST=mariadb, REDIS_HOST=redis, ELASTICSEARCH_HOST=elasticsearch, QDRANT_HOST=qdrant |
|
||||
| healthcheck | `curl -f http://localhost:3000/health` |
|
||||
| memory_limit | 2G |
|
||||
|
||||
### frontend
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `lcbp3-frontend:${FRONTEND_IMAGE_TAG:-latest}` |
|
||||
| networks | dms-frontend only |
|
||||
| ports | 3000:3000 (published to LAN) |
|
||||
| depends_on | backend |
|
||||
| env | INTERNAL_API_URL=http://backend:3000/api |
|
||||
| healthcheck | `curl -f http://localhost:3000/` |
|
||||
| memory_limit | 1G |
|
||||
|
||||
### redis
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `redis:7-alpine` |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 6379 internal only) |
|
||||
| volumes | redis_data → /data |
|
||||
| command | `redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes --maxmemory-policy noeviction` |
|
||||
| healthcheck | `redis-cli -a ${REDIS_PASSWORD} --no-auth-warning ping` |
|
||||
| memory_limit | 1G |
|
||||
|
||||
### mariadb
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `mariadb:11.8` |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 3306 internal only) |
|
||||
| volumes | mariadb_data → /var/lib/mysql |
|
||||
| env | MARIADB_ROOT_PASSWORD, MARIADB_DATABASE=lcbp3, MARIADB_USER=center |
|
||||
| command | `--character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci` |
|
||||
| healthcheck | `healthcheck.sh --connect --innodb_initialized` |
|
||||
| memory_limit | 8G |
|
||||
|
||||
### elasticsearch
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `elasticsearch:8.11.1` |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 9200 internal only) |
|
||||
| volumes | es_data → /usr/share/elasticsearch/data |
|
||||
| env | discovery.type=single-node, xpack.security.enabled=false, ES_JAVA_OPTS=-Xms2g -Xmx2g |
|
||||
| healthcheck | `curl -s http://localhost:9200/_cluster/health` |
|
||||
| memory_limit | 4G |
|
||||
|
||||
### qdrant
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `qdrant/qdrant:v1.16.1` |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 6333 internal only) |
|
||||
| volumes | qdrant_data → /qdrant/storage |
|
||||
| healthcheck | TCP check on port 6333 |
|
||||
| memory_limit | 1G |
|
||||
|
||||
### clamav
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `clamav/clamav:1.4.4` |
|
||||
| network | dms-internal only |
|
||||
| ports | none (expose 3310 internal only) |
|
||||
| healthcheck | `clamdcheck.sh` |
|
||||
| memory_limit | 2G |
|
||||
|
||||
### ollama-metrics
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| image | `ghcr.io/norskhelsenett/ollama-metrics:latest` |
|
||||
| network | dms-internal only |
|
||||
| ports | 9924:9924 (published to LAN — Prometheus on ASUSTOR scrapes `http://<new-host-ip>:9924/metrics`) |
|
||||
| env | OLLAMA_HOST=http://ollama:11434 |
|
||||
| memory_limit | 256M |
|
||||
|
||||
## Service Communication Map
|
||||
|
||||
```
LAN (VLAN 10)
│
├── :3000 (Frontend) ──→ http://backend:3000/api (dms-frontend)
├── :3001 (Backend) ──→ http://backend:3000/api (dms-frontend)
└── :9924 (ollama-metrics) ──→ Prometheus scrape target
│
├──→ mariadb:3306 (dms-internal)
├──→ redis:6379 (dms-internal)
├──→ elasticsearch:9200 (dms-internal)
├──→ qdrant:6333 (dms-internal)
├──→ clamav:3310 (dms-internal)
├──→ ocr-sidecar:8765 (dms-internal)
└──→ ollama:11434 (dms-internal)
```
|
||||
|
||||
## Path Mapping
|
||||
|
||||
| Service | Container Path | Source |
|
||||
|---------|---------------|--------|
|
||||
| Backend | `/app/uploads/temp` | ASUSTOR CIFS `/data/uploads/temp` |
|
||||
| Backend | `/app/uploads/permanent` | ASUSTOR CIFS `/data/uploads/permanent` |
|
||||
| Sidecar | `/mnt/uploads/temp` (read-only) | ASUSTOR CIFS `/data/uploads/temp` |
|
||||
| Sidecar | `/mnt/uploads/permanent` (read-only) | ASUSTOR CIFS `/data/uploads/permanent` |
|
||||
|
||||
**Note**: Backend uses `/app/uploads` (read-write), Sidecar uses `/mnt/uploads` (read-only). Both map to the same ASUSTOR CIFS share. Path remapping in `ocr.service.ts` (`remapPath()`) continues to work — strip `/app/uploads` and replace with `/mnt/uploads`.
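
The prefix swap described in the note can be illustrated in shell. This is not the `remapPath()` code from `ocr.service.ts`, only a sketch of the behaviour the note describes; the function name `remap_path` is hypothetical.

```bash
# Translate a backend container path into the path the sidecar sees.
# Fails for anything that is not under /app/uploads.
remap_path() {
  backend_base="/app/uploads"
  sidecar_base="/mnt/uploads"
  case "$1" in
    "$backend_base"|"$backend_base"/*)
      # Strip the backend prefix and prepend the sidecar prefix.
      printf '%s\n' "${sidecar_base}${1#"$backend_base"}"
      ;;
    *)
      echo "path outside $backend_base: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `remap_path /app/uploads/temp/a.pdf` prints `/mnt/uploads/temp/a.pdf`. Rejecting paths outside the base is a design choice of this sketch, so a bad input fails instead of being passed through.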
|
||||
@@ -0,0 +1,124 @@
|
||||
// File: specs/100-Infrastructures/141-server-consolidation/plan.md
|
||||
// Change Log:
|
||||
// - 2026-06-20: Initial implementation plan for Single-Host Server Consolidation
|
||||
|
||||
# Implementation Plan: Single-Host Server Consolidation
|
||||
|
||||
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
|
||||
**Input**: Feature specification from `/specs/100-Infrastructures/141-server-consolidation/spec.md`
|
||||
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md)
|
||||
|
||||
## Summary
|
||||
|
||||
Consolidate all LCBP3-DMS services from a 2-host architecture (QNAP NAS + Desk-5439) onto a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB). ASUSTOR becomes primary NAS for file storage via CIFS. Docker internal bridge network isolates Ollama and OCR Sidecar from LAN, enabling removal of X-API-Key auth (ADR-040 D5). QNAP becomes backup server; Desk-5439 is retired.
|
||||
|
||||
## Technical Context
|
||||
|
||||
**Language/Version**: Docker Compose v2 (YAML), Bash scripts, PowerShell provisioning
|
||||
**Primary Dependencies**: Docker Engine 24+, Docker Compose v2, NVIDIA Container Toolkit, CIFS Utils
|
||||
**Storage**: MariaDB 11.8 (Docker volume), Elasticsearch 8.11 (Docker volume), Redis 7 (Docker volume), Qdrant v1.16 (Docker volume), ASUSTOR CIFS for file uploads
|
||||
**Testing**: Smoke tests (manual + scripted), health check endpoints, data parity verification scripts
|
||||
**Target Platform**: Linux (Ubuntu 22.04 LTS or Debian 12) on Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB
|
||||
**Project Type**: Infrastructure (Docker Compose stack + provisioning scripts)
|
||||
**Performance Goals**: Backend-to-Ollama latency <50ms (localhost vs ~2ms LAN), all containers healthy within 5 min
|
||||
**Constraints**: 32GB RAM total (target <28GB usage), 16GB VRAM (target <15GB usage), CIFS mount reliability
|
||||
**Scale/Scope**: 8 containers (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, ES, Qdrant) + ClamAV + ollama-metrics
|
||||
|
||||
## Constitution Check
|
||||
|
||||
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
|
||||
|
||||
| Principle | Status | Notes |
|
||||
|-----------|--------|-------|
|
||||
| ADR-016 Security | ✅ Pass | Network isolation replaces API key; no ports published for internal services |
|
||||
| ADR-019 UUID | ✅ Pass | No UUID changes — infrastructure only |
|
||||
| ADR-009 Schema | ✅ Pass | No schema changes — data migration via dump/restore |
|
||||
| ADR-023/023A AI Boundary | ✅ Pass | Ollama isolated on Docker internal network; no direct DB/storage access |
|
||||
| ADR-040 D5 Network Auth | ✅ Pass | Docker bridge isolation enables X-API-Key removal |
|
||||
| ADR-008 BullMQ | ✅ Pass | Redis co-located on same host; queue behavior unchanged |
|
||||
| ADR-002 Document Numbering | ✅ Pass | Redis Redlock unchanged; co-located reduces lock latency |
|
||||
| SPOF Risk | ⚠️ Acknowledged | Single host = SPOF; mitigated by QNAP backup + DR plan |
|
||||
|
||||
**Gate Result**: PASS — no violations. SPOF risk is acknowledged in ADR-041 with mitigation plan.
|
||||
|
||||
## Project Structure
|
||||
|
||||
### Documentation (this feature)
|
||||
|
||||
```text
specs/100-Infrastructures/141-server-consolidation/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output — research findings
├── data-model.md # Phase 1 output — infrastructure data model
├── quickstart.md # Phase 1 output — deployment guide
├── contracts/ # Phase 1 output — docker-compose contracts
│   └── docker-compose.new-host.yml
├── checklists/
│   └── requirements.md # Spec quality checklist
└── tasks.md # Phase 2 output (/speckit.tasks command)
```
|
||||
|
||||
### Source Code (repository root)
|
||||
|
||||
```text
specs/04-Infrastructure-OPS/04-00-docker-compose/
├── New-Host/ # NEW — consolidated host
│   ├── docker-compose.new-host.yml # Unified compose for all 8+ services
│   ├── .env.template # Environment template for new host
│   ├── ocr-sidecar/ # Sidecar (copied from Desk-5439, adapted)
│   │   ├── Dockerfile
│   │   ├── app.py
│   │   └── requirements.txt
│   ├── scripts/
│   │   ├── provision-host.sh # OS prep + Docker + NVIDIA toolkit
│   │   ├── migrate-mariadb.sh # Dump from QNAP → restore to new host
│   │   ├── migrate-elasticsearch.sh # Snapshot from QNAP → restore to new host
│   │   ├── smoke-test.sh # Post-cutover verification
│   │   └── rollback.sh # Emergency rollback to QNAP + Desk-5439
│   └── README.md # Deployment guide for new host
├── QNAP/ # EXISTING — becomes backup
├── Desk-5439/ # EXISTING — retired after cutover
└── ASUSTOR/ # EXISTING — Gitea runner stays
```
|
||||
|
||||
**Structure Decision**: New `New-Host/` directory under existing `04-00-docker-compose/` follows the established per-host directory pattern (QNAP/, Desk-5439/, ASUSTOR/). The unified compose file replaces the split QNAP/app + QNAP/service + QNAP/mariadb + Desk-5439/ocr-sidecar pattern with a single stack.
|
||||
|
||||
## Complexity Tracking
|
||||
|
||||
> No constitution check violations — table not needed.
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: Provision New Host (T001-T002)
|
||||
- Install Ubuntu 22.04 LTS / Debian 12
|
||||
- Install Docker Engine + Docker Compose v2
|
||||
- Install NVIDIA drivers + nvidia-container-toolkit
|
||||
- Mount ASUSTOR CIFS share to `/mnt/uploads`
|
||||
- Create directory structure for Docker volumes
|
||||
|
||||
### Phase 2: Create Unified Docker Compose (T003-T005)
|
||||
- Write `docker-compose.new-host.yml` with all services
|
||||
- Configure `dms-internal` bridge network (no LAN publish for Ollama/sidecar)
|
||||
- Configure `dms-frontend` bridge network (Frontend + Backend published)
|
||||
- Copy OCR sidecar code from Desk-5439, adapt for Docker-internal Ollama URL
|
||||
- Configure per-container memory limits per ADR-041 D5
|
||||
|
||||
### Phase 3: Migrate Data (T006-T007)
|
||||
- Dump MariaDB from QNAP → restore to new host container
|
||||
- Snapshot Elasticsearch from QNAP → restore to new host container
|
||||
- Verify row count + document count parity
|
||||
- Verify CIFS file access from backend container
|
||||
|
||||
### Phase 4: Cutover (T008-T010)
|
||||
- Update Gitea CI/CD deploy target to new host
|
||||
- Deploy services on new host
|
||||
- Run smoke tests (login, document CRUD, OCR, AI, search)
|
||||
- Remove X-API-Key from sidecar + backend (ADR-040 D5)
|
||||
- Update DNS/NPM to point to new host
|
||||
|
||||
### Phase 5: Decommission (T011-T012)
|
||||
- Stop services on QNAP (retain data for backup)
|
||||
- Retire Desk-5439 (power off or repurpose)
|
||||
- Monitor RAM/VRAM for 24-48 hours
|
||||
- Document rollback procedure
|
||||
@@ -0,0 +1,154 @@
|
||||
// File: specs/100-Infrastructures/141-server-consolidation/quickstart.md
|
||||
// Change Log:
|
||||
// - 2026-06-20: Initial quickstart guide for Single-Host Server Consolidation
|
||||
|
||||
# Quickstart: Single-Host Server Consolidation
|
||||
|
||||
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- New host with Ubuntu 22.04 LTS or Debian 12 installed
|
||||
- Ryzen 5 5600 / 32GB RAM / RTX 5060 Ti 16GB
|
||||
- Network access to VLAN 10 (192.168.10.x)
|
||||
- ASUSTOR NAS accessible at 192.168.10.9 with CIFS share `np-dms-as`
|
||||
- SSH access to QNAP (192.168.10.8) for data migration
|
||||
- Gitea CI/CD access for deploy target update
|
||||
|
||||
## Step 1: Provision Host
|
||||
|
||||
```bash
# Run on new host (as root or sudo user)
cd /opt/lcbp3
bash specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh
```
|
||||
|
||||
This script:
|
||||
1. Installs Docker Engine + Docker Compose v2
|
||||
2. Installs NVIDIA drivers + nvidia-container-toolkit
|
||||
3. Creates CIFS mount for ASUSTOR at `/mnt/uploads`
|
||||
4. Creates Docker volume directories
|
||||
5. Verifies GPU access with `nvidia-smi`
|
||||
|
||||
## Step 2: Prepare .env
|
||||
|
||||
```bash
cd /opt/lcbp3/specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host
cp .env.template .env
# Edit .env with real values:
# - ASUSTOR_USER, ASUSTOR_PASS (CIFS credentials)
# - DB_PASSWORD, DB_ROOT_PASSWORD (from QNAP .env)
# - REDIS_PASSWORD (from QNAP .env)
# - JWT_SECRET, JWT_REFRESH_SECRET (from QNAP .env)
# - AUTH_SECRET (from QNAP .env)
# - ELASTICSEARCH_PASSWORD (from QNAP .env)
```
|
||||
|
||||
## Step 3: Migrate Data
|
||||
|
||||
```bash
# Migrate MariaDB (from QNAP to new host)
bash scripts/migrate-mariadb.sh

# Migrate Elasticsearch (from QNAP to new host)
bash scripts/migrate-elasticsearch.sh

# Verify parity
bash scripts/verify-data-parity.sh
```
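
The parity check could boil down to comparing per-table (or per-index) counts captured on both sides. The sketch below is an assumption about what such a check might do, not the content of `verify-data-parity.sh`; `parity_diff` and the `name count` file layout are hypothetical.

```bash
# Compare two "name count" listings (source first, target second).
# Prints one line per mismatch; prints nothing when both sides agree.
parity_diff() {
  awk 'NR == FNR { src[$1] = $2; next }
       { seen[$1] = 1
         if (src[$1] != $2) printf "%s: source=%s target=%s\n", $1, src[$1], $2 }
       END { for (t in src) if (!(t in seen)) printf "%s: missing on target\n", t }' "$1" "$2"
}
```

For Elasticsearch, a listing in this shape can be produced with the `_cat/indices?h=index,docs.count` endpoint; for MariaDB, exact counts need a `SELECT COUNT(*)` per table, since `information_schema` row counts are estimates for InnoDB.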
|
||||
|
||||
## Step 4: Deploy Services
|
||||
|
||||
```bash
# Pull latest images from Gitea registry
docker compose --env-file .env -f docker-compose.new-host.yml pull

# Start all services
docker compose --env-file .env -f docker-compose.new-host.yml up -d

# Check health
docker compose -f docker-compose.new-host.yml ps
docker compose -f docker-compose.new-host.yml logs --tail=50
```
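
To turn the manual health check into a pass/fail gate, the container health states can be filtered for anything that is not `healthy`. A minimal sketch follows; both function names are hypothetical, and `health_report` assumes the Docker CLI is available on the host.

```bash
# Reads "name status" lines on stdin and prints every name whose status is not "healthy".
not_healthy() {
  awk '$2 != "healthy" { print $1 }'
}

# Emit "name status" for each running container; containers without a
# healthcheck are reported with the status "none".
health_report() {
  docker ps --format '{{.Names}}' | while read -r name; do
    status=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$name")
    printf '%s %s\n' "$name" "$status"
  done
}
```

Running `health_report | not_healthy` lists the containers that still need attention. Note that ollama-metrics has no healthcheck in the data model, so it would show up as `none` unless it is filtered out.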
|
||||
|
||||
## Step 5: Smoke Test
|
||||
|
||||
```bash
# Run smoke tests
bash scripts/smoke-test.sh
```
|
||||
|
||||
Smoke tests verify:
|
||||
- Backend health check (`GET http://localhost:3001/health`)
|
||||
- Frontend accessible (`GET http://localhost:3000/`)
|
||||
- Login flow (POST /api/auth/login)
|
||||
- Document list (GET /api/correspondences)
|
||||
- OCR endpoint (POST /api/ai/sandbox/ocr)
|
||||
- AI inference (POST /api/ai/sandbox/extract)
|
||||
- Full-text search (GET /api/search)
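
Each check in the list above reduces to "request an endpoint, expect a 2xx status". The sketch below shows that pattern for the two unauthenticated checks; it is an illustration, not the content of `smoke-test.sh`, and the function names are hypothetical.

```bash
# Succeeds only for three-digit status codes that start with 2.
is_success() {
  case "$1" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

# usage: probe <url>
# Prints "<status> <url>" and returns non-zero unless the status is 2xx.
# curl reports 000 when the connection itself fails.
probe() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
  echo "$code $1"
  is_success "$code"
}
```

Typical calls would be `probe http://localhost:3001/health` and `probe http://localhost:3000/`. The authenticated endpoints (login, OCR, search) need a token and request bodies that this sketch does not cover.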
|
||||
|
||||
## Step 6: Update CI/CD
|
||||
|
||||
Update Gitea secrets:
|
||||
- `HOST` → new host IP (e.g., `192.168.10.50`)
|
||||
- `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
|
||||
|
||||
## Step 7: Cutover DNS
|
||||
|
||||
Update NPM (Nginx Proxy Manager) on QNAP:
|
||||
- `lcbp3.np-dms.work` → new host IP
|
||||
- `backend.np-dms.work` → new host IP
|
||||
|
||||
## Step 8: Remove X-API-Key (ADR-040 D5)
|
||||
|
||||
After verifying Docker-internal network isolation:
|
||||
1. Remove `OCR_SIDECAR_API_KEY` from sidecar environment
|
||||
2. Remove API key validation from `app.py`
|
||||
3. Remove `X-API-Key` header from backend `ocr.service.ts`
|
||||
4. Rebuild and redeploy sidecar + backend
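
Before rebuilding, it may help to confirm that no reference to the key is left behind. The sketch below searches a file or directory tree for both identifiers named in the steps above; `find_api_key_refs` is a hypothetical helper.

```bash
# usage: find_api_key_refs <file-or-dir>...
# Prints matching lines with file name and line number.
# Exit status is 0 when at least one reference remains, 1 when the tree is clean.
find_api_key_refs() {
  grep -rnI -e 'X-API-Key' -e 'OCR_SIDECAR_API_KEY' "$@"
}
```

For instance, `find_api_key_refs .env ocr-sidecar/` should print nothing once steps 1 and 2 are done. The backend source directory would be checked the same way; its path is not given here.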
|
||||
|
||||
## Step 9: Monitor (24-48 hours)
|
||||
|
||||
```bash
# Monitor RAM usage
docker stats --no-stream

# Monitor VRAM usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 60

# Monitor container health
watch -n 30 'docker compose -f docker-compose.new-host.yml ps'
```
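
The VRAM query can also feed a simple threshold alert. The sketch below assumes the `csv,noheader,nounits` output format, which yields lines such as `1234, 16384` in MiB; `vram_over` is a hypothetical helper, and 15360 MiB corresponds to the 15GB target from the plan.

```bash
# usage: <nvidia-smi output> | vram_over <limit_mib>
# Reads "used, total" lines in MiB and prints a warning for each line above the limit.
vram_over() {
  awk -F', *' -v limit="$1" \
    '$1 + 0 > limit + 0 { printf "VRAM %d MiB exceeds %d MiB\n", $1, limit }'
}
```

It would be used as `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_over 15360`, for example from a cron job during the monitoring window.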
|
||||
|
||||
## Step 10: Decommission Old Hosts
|
||||
|
||||
After 24-48 hours of stable operation:
|
||||
|
||||
```bash
# Stop QNAP services (retain data for backup)
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'

# Power off Desk-5439
ssh user@192.168.10.100 'sudo shutdown -h now'
```
|
||||
|
||||
## Rollback (Emergency)
|
||||
|
||||
```bash
# Stop new host services
docker compose -f docker-compose.new-host.yml down

# Restore QNAP services
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose up -d'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose up -d'

# Restore Desk-5439 services
ssh user@192.168.10.100 'cd /opt/ocr-sidecar && docker compose up -d'

# Revert DNS
# Update NPM to point back to QNAP (192.168.10.8)

# Revert CI/CD
# Update Gitea secrets HOST back to 192.168.10.8
```
|
||||
@@ -0,0 +1,139 @@
|
||||
// File: specs/100-Infrastructures/141-server-consolidation/research.md
|
||||
// Change Log:
|
||||
// - 2026-06-20: Initial research for Single-Host Server Consolidation
|
||||
|
||||
# Research: Single-Host Server Consolidation
|
||||
|
||||
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
|
||||
|
||||
## R1: Docker Network Isolation Strategy
|
||||
|
||||
**Decision**: Use two Docker bridge networks: `dms-internal` (every service except Frontend) and `dms-frontend` (Frontend + Backend, for LAN publish).

**Rationale**: Docker bridge networks provide L2 isolation. Services on `dms-internal` without a `ports` mapping are unreachable from the LAN. Only Frontend (3000), Backend (3001→3000), and ollama-metrics (9924, the Prometheus scrape target) are published. This replaces reliance on VLAN/firewall ACLs with Docker-native isolation.
|
||||
|
||||
**Alternatives Considered**:
|
||||
- Single bridge network + iptables rules — more complex, error-prone
|
||||
- Docker Swarm overlay network — overkill for single host
|
||||
- Host network mode — no isolation, security risk
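
One way to check the isolation claim after deployment is to audit which host ports are actually published. The sketch below parses lines in the shape printed by `docker ps --format '{{.Names}} {{.Ports}}'` and flags anything outside the three ports the contract allows; `unexpected_ports` and `ALLOWED_PORTS` are hypothetical names.

```bash
# Host ports the contract allows on the LAN: frontend, backend, ollama-metrics.
ALLOWED_PORTS="3000 3001 9924"

# Reads "name ports" lines on stdin and reports every published host port
# that is not in ALLOWED_PORTS. Unpublished ports (e.g. "6379/tcp") are ignored.
unexpected_ports() {
  while read -r name ports; do
    for p in $(printf '%s\n' "$ports" | grep -oE ':[0-9]+->' | tr -cd '0-9\n' | sort -u); do
      case " $ALLOWED_PORTS " in
        *" $p "*) ;;
        *) echo "$name publishes unexpected port $p" ;;
      esac
    done
  done
}
```

Piping `docker ps --format '{{.Names}} {{.Ports}}'` into `unexpected_ports` should print nothing on a correctly configured host.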
|
||||
|
||||
## R2: CIFS Mount Strategy for ASUSTOR
|
||||
|
||||
**Decision**: Use a Docker named volume (`local` driver, `type=cifs`) to mount the ASUSTOR share `//192.168.10.9/np-dms-as/data/uploads` as the `asustor_uploads` volume, mounted at `/mnt/uploads` in the sidecar and `/app/uploads` in the backend.

**Rationale**: The Docker volume handles the CIFS mount lifecycle on container start/stop. Credentials live in `.env` (gitignored). Backend and sidecar see the same files through the same CIFS volume, at different container paths.
|
||||
|
||||
**Alternatives Considered**:
|
||||
- Host-level `mount -t cifs` then bind mount — requires host OS config, not portable
|
||||
- SSHFS — slower than CIFS for file operations
|
||||
- Sync files to local SSD — adds complexity, storage duplication
|
||||
|
||||
**Key Consideration**: The previous Desk-5439 setup had issues with Docker Desktop WSL2 + CIFS. On a Linux host, the CIFS volume mount works natively without the WSL2 layer.
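
For reference, the same volume can be created by hand with the `local` driver's CIFS options, which is useful for testing the mount before the compose file exists. This is a sketch using the device path and mount options from the data model; the two function names are hypothetical.

```bash
# usage: cifs_mount_opts <user> <pass>
# Builds the mount option string from the data model.
# A password containing a comma would break this format.
cifs_mount_opts() {
  printf 'username=%s,password=%s,vers=3.0,uid=0,gid=0\n' "$1" "$2"
}

# Create the named volume; expects ASUSTOR_USER and ASUSTOR_PASS in the environment.
create_uploads_volume() {
  docker volume create \
    --driver local \
    --opt type=cifs \
    --opt device=//192.168.10.9/np-dms-as/data/uploads \
    --opt o="$(cifs_mount_opts "$ASUSTOR_USER" "$ASUSTOR_PASS")" \
    asustor_uploads
}
```

The share is only mounted when a container first uses the volume, so a credential or network problem surfaces at container start rather than at `docker volume create`.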
|
||||
|
||||
## R3: MariaDB Migration Strategy
|
||||
|
||||
**Decision**: Use `mariadb-dump` (logical dump) from QNAP MariaDB 11.8, pipe directly to new host MariaDB 11.8 container.
|
||||
|
||||
**Rationale**: Same MariaDB version (11.8) on both hosts → logical dump is safest. Database is small enough (<10GB estimated) that dump/restore completes within maintenance window.
|
||||
|
||||
**Alternatives Considered**:
|
||||
- `mariabackup` (physical backup) — faster but requires same filesystem layout
|
||||
- Replication (binlog) — overkill for one-time migration
|
||||
- Copy raw data files — risky, requires same version + config
|
||||
|
||||
**Migration Command**:
|
||||
```bash
# From QNAP (source) — dump all databases
mariadb-dump --single-transaction --routines --triggers \
  -h 127.0.0.1 -u root -p"$DB_ROOT_PASSWORD" \
  --all-databases > qnap-full-dump.sql

# On new host — restore
docker exec -i lcbp3-mariadb mariadb -u root -p"$DB_ROOT_PASSWORD" < qnap-full-dump.sql
```
|
||||
|
||||
## R4: Elasticsearch Migration Strategy
|
||||
|
||||
**Decision**: Use ES snapshot/restore API — create snapshot on QNAP ES, transfer to new host, restore.
|
||||
|
||||
**Rationale**: ES snapshot API is the official migration path. Handles index mappings, settings, and data. Works across same ES version (8.11.x).
|
||||
|
||||
**Alternatives Considered**:
|
||||
- Copy raw data directory — risky, requires identical ES config
|
||||
- Re-index from MariaDB — slow, loses search index tuning
|
||||
- Logstash pipeline — overkill for one-time migration
|
||||
|
||||
**Migration Steps**:
|
||||
1. Register shared filesystem repo on QNAP ES
|
||||
2. Create snapshot of all indices
|
||||
3. Copy snapshot files to new host ES data volume
|
||||
4. Register repo on new host ES
|
||||
5. Restore snapshot
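
The steps above map onto three calls of the Elasticsearch snapshot API. The sketch below is illustrative only: `ES_URL`, the repository name `migration`, and the snapshot name `pre-cutover` are assumptions. The repository location must also be listed under `path.repo` on each node.

```bash
# JSON body for registering a shared-filesystem ("fs") snapshot repository.
fs_repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}\n' "$1"
}

# Steps 1 and 4: register the repository (run against each cluster in turn).
register_repo() {
  curl -fsS -X PUT "$ES_URL/_snapshot/migration" \
    -H 'Content-Type: application/json' -d "$(fs_repo_body "$1")"
}

# Step 2: snapshot all indices and wait until the snapshot completes.
create_snapshot() {
  curl -fsS -X PUT "$ES_URL/_snapshot/migration/pre-cutover?wait_for_completion=true"
}

# Step 5: restore the snapshot on the new host.
restore_snapshot() {
  curl -fsS -X POST "$ES_URL/_snapshot/migration/pre-cutover/_restore"
}
```

Step 3 (copying the snapshot files between hosts) happens outside the API, for example with `rsync` or `scp`.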
|
||||
|
||||
## R5: GPU VRAM Management on Single Host
|
||||
|
||||
**Decision**: Rely on ADR-040 D3 (Adaptive OCR Residency via `calculate_ocr_residency()`) and ADR-040 D4 (CPU Fallback Retrieval), together with the LLM-First GPU Ownership policy from CONTEXT.md.
|
||||
|
||||
**Rationale**: RTX 5060 Ti 16GB must serve:
|
||||
- np-dms-ai (Typhoon-2.5 ~7-8B): ~6-8GB VRAM
|
||||
- np-dms-ocr (Typhoon OCR): ~5GB VRAM
|
||||
- nomic-embed-text: ~0.5GB VRAM
|
||||
- CUDA overhead: ~1.5GB
|
||||
- Total: ~13-15GB → tight but feasible with adaptive residency
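
The total in the list above can be re-derived with a few lines of shell. This is only an arithmetic check of the estimates quoted in the list; the function names are made up for the illustration.

```bash
#!/usr/bin/env bash
# Sum per-component VRAM estimates (GB) and print the total with one decimal.
sum_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# headroom_gb <card-size-gb> <estimate>...: VRAM left after all estimates.
headroom_gb() {
  card="$1"
  shift
  printf '%s\n' "$@" | awk -v card="$card" '{ total += $1 } END { printf "%.1f\n", card - total }'
}

# Figures from the list: LLM (low / high), OCR, embeddings, CUDA overhead.
low=$(sum_gb 6 5 0.5 1.5)
high=$(sum_gb 8 5 0.5 1.5)
echo "estimated VRAM: ${low}-${high} GB; worst-case headroom: $(headroom_gb 16 8 5 0.5 1.5) GB"
```

With the high-end LLM estimate only about 1GB of the 16GB card remains, which is why the decision leans on adaptive residency rather than keeping everything loaded.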

**Key Policy**: When the LLM (np-dms-ai) needs to load, the OCR model is unloaded first (`keep_alive=0` for OCR). BGE-M3 and the Reranker fall back to CPU while the GPU is occupied.
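
On the Ollama side, unloading a model amounts to a request to `/api/generate` that names the model and sets `keep_alive` to `0`. The sketch below shows only that mechanism; how `calculate_ocr_residency()` decides when to send it is not described in this document, and the default URL is an assumption based on the Docker service name used elsewhere in the spec.

```bash
#!/usr/bin/env bash
# Assumed default: Ollama reachable by its Docker service name.
OLLAMA_URL="${OLLAMA_URL:-http://ollama:11434}"

# JSON body asking Ollama to evict a model immediately (keep_alive = 0).
unload_body() {
  printf '{"model":"%s","keep_alive":0}' "$1"
}

# Send the unload request; with no prompt, only the residency changes.
unload_model() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(unload_body "$1")"
}

# Example, run from a container on the internal network:
#   unload_model np-dms-ocr:latest
```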

**Alternatives Considered**:
- Force GPU-resident for all models: OOM risk (the ~13-15GB estimate above already leaves as little as ~1GB of the 16GB card free)
- CPU-only for all AI: too slow for production
- Second GPU: not available on new host

## R6: RAM Budget Allocation

**Decision**: Per-container memory limits in Docker Compose:

| Service | Memory Limit | Notes |
|---------|-------------|-------|
| MariaDB | 8G | Largest consumer, tune innodb_buffer_pool |
| Elasticsearch | 4G | ES_JAVA_OPTS=-Xms2g -Xmx2g |
| Backend (NestJS) | 2G | Node.js + BullMQ workers |
| Frontend (Next.js) | 1G | Standalone mode |
| Redis | 1G | In-memory + AOF |
| Qdrant | 1G | Vector DB |
| OCR Sidecar | 1G | Python + PyMuPDF |
| Ollama | 2G | Model loading + inference |
| ClamAV | 2G | Virus definitions |
| ollama-metrics | 256M | Lightweight proxy |
| **Total** | **~22.3G** | Leaves ~9.7G for OS + swap |

**Rationale**: 32GB total - 22.3GB containers = ~9.7GB for OS kernel + page cache + swap. Comfortable margin.
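
The table total can be checked mechanically. The sketch below converts Compose-style sizes to MiB and sums them; the helper names are illustrative, and the limits are copied from the table in table order.

```bash
#!/usr/bin/env bash
# Convert a Compose-style size ("8G", "256M") to MiB.
to_mib() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo "${1%M}" ;;
    *)  echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Sum every limit passed as an argument and print the total in MiB.
total_mib() {
  sum=0
  for size in "$@"; do
    mib=$(to_mib "$size") || return 1
    sum=$(( sum + mib ))
  done
  echo "$sum"
}

used=$(total_mib 8G 4G 2G 1G 1G 1G 1G 2G 2G 256M)
echo "containers: ${used} MiB; left of 32G: $(( 32 * 1024 - used )) MiB"
```

The sum is 22784 MiB (22.25G), which leaves 9984 MiB (9.75G) of a 32G host, in line with the rounded figures in the table.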

**Alternatives Considered**:
- No limits: risk of OOM killer affecting critical services
- Tighter limits: may cause ES/MariaDB instability

## R7: CI/CD Pipeline Update

**Decision**: Update Gitea Actions `ci-deploy.yml` to SSH-deploy to new host IP instead of QNAP IP. ASUSTOR Gitea runner stays unchanged.

**Rationale**: Gitea runner on ASUSTOR (192.168.10.9) can reach new host via VLAN 10. Only the deploy target IP changes. `deploy.sh` path to compose file updates to `New-Host/docker-compose.new-host.yml`.

**Alternatives Considered**:
- Move Gitea runner to new host: unnecessary, runner works remotely
- Manual deployment: not sustainable for ongoing releases

## R8: Rollback Strategy

**Decision**: Multi-step rollback plan documented in `rollback.sh`:
1. Stop services on new host (`docker compose down`)
2. Restore services on QNAP (start existing containers with old data)
3. Restore services on Desk-5439 (start Ollama + sidecar)
4. Revert DNS/NPM to point to QNAP
5. Revert Gitea CI/CD deploy target to QNAP
6. Re-enable X-API-Key in sidecar + backend
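
One way to structure such a script is a step wrapper that defaults to a dry run, so the six steps can be rehearsed before an incident. The skeleton below is hypothetical: only the `docker compose down` command and the compose file name come from this document; every `manual` line is a placeholder for actions the plan does not spell out.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton for rollback.sh; most commands are placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print steps only (default), 0 = execute them

# step <description> <command...>: announce a step, then print or run it.
step() {
  desc="$1"
  shift
  echo "==> $desc"
  if [ "$DRY_RUN" = "1" ]; then
    echo "    (dry run) $*"
  else
    "$@" || { echo "step failed: $desc" >&2; return 1; }
  fi
}

# Placeholder for actions that still have to be done by hand.
manual() { echo "MANUAL: $*"; }

# Run the six rollback steps in order, stopping at the first failure.
rollback() {
  step "1. Stop services on new host" docker compose -f docker-compose.new-host.yml down &&
  step "2. Restore services on QNAP" manual "start existing containers with old data" &&
  step "3. Restore services on Desk-5439" manual "start Ollama + sidecar" &&
  step "4. Revert DNS/NPM to QNAP" manual "edit NPM proxy hosts" &&
  step "5. Revert Gitea CI/CD deploy target" manual "restore previous deploy secrets" &&
  step "6. Re-enable X-API-Key" manual "redeploy sidecar + backend with key check"
}
```

Running `rollback` prints the plan; `DRY_RUN=0 rollback` would execute it and stop at the first failing step.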

**Rationale**: QNAP retains all data (MariaDB, ES, Redis, files) until verified stable. Rollback is fast (<2 hours) because old infrastructure is intact.

**Alternatives Considered**:
- No rollback (accept SPOF): too risky for production DMS
- Hot failover with replication: overkill for current scale
@@ -0,0 +1,160 @@
// File: specs/100-Infrastructures/141-server-consolidation/spec.md
// Change Log:
// - 2026-06-20: Initial specification for Single-Host Server Consolidation (ADR-041)

# Feature Specification: Single-Host Server Consolidation

**Feature Branch**: `141-server-consolidation`
**Created**: 2026-06-20
**Status**: Draft
**Category**: 100-Infrastructures
**Input**: ADR-041: Consolidate all LCBP3-DMS services onto a single Docker host with ASUSTOR as primary NAS.
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md), [ADR-016](../../06-Decision-Records/ADR-016-security-authentication.md), [ADR-023A](../../06-Decision-Records/ADR-023A-unified-ai-architecture.md), [ADR-034](../../06-Decision-Records/ADR-034-AI-model-change.md)

## User Scenarios & Testing _(mandatory)_

### User Story 1 - Provision and Deploy on New Host (Priority: P1)

System administrator provisions the new single host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB), installs Docker, mounts CIFS share from ASUSTOR, and deploys all services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) using a single Docker Compose stack with internal bridge network isolation.

**Why this priority**: Without a running host, no other work can proceed. This is the foundation for all subsequent stories.

**Independent Test**: Can be fully tested by running `docker compose up` on the new host and verifying all containers are healthy via `docker ps` and health check endpoints.

**Acceptance Scenarios**:

1. **Given** a fresh OS installation on the new host, **When** the administrator runs the provisioning script, **Then** Docker Engine and Docker Compose are installed and verified with `docker --version`
2. **Given** Docker is installed, **When** the administrator mounts the ASUSTOR CIFS share, **Then** `/mnt/uploads/temp` and `/mnt/uploads/permanent` are accessible and writable by containers
3. **Given** CIFS mounts are ready, **When** the administrator runs `docker compose up -d`, **Then** all 7 service containers start and report healthy within 5 minutes
4. **Given** all containers are running, **When** the administrator checks network isolation, **Then** Ollama and OCR Sidecar ports are NOT accessible from LAN (only the Frontend on port 3000 and the Backend are published; the Backend listens on container port 3000 and is published on host port 3001 per tasks.md T005)
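
Scenario 4 can be checked from another LAN machine with a small TCP probe. The sketch below is an assumption-laden alternative to a port scanner: it relies on bash's `/dev/tcp` redirection and coreutils `timeout`, the function names are invented for the example, and a port that silently drops packets is reported as closed just like one that refuses the connection.

```bash
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
tcp_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# verify_port <host> <port> <open|closed> [probe]
# The optional probe argument lets a different check be substituted.
verify_port() {
  host="$1"; port="$2"; want="$3"; probe="${4:-tcp_open}"
  if "$probe" "$host" "$port"; then got=open; else got=closed; fi
  if [ "$got" = "$want" ]; then
    echo "ok   $host:$port is $got"
  else
    echo "FAIL $host:$port is $got, expected $want"
    return 1
  fi
}

# Example, with <new-host-ip> filled in:
#   verify_port <new-host-ip> 11434 closed   # Ollama
#   verify_port <new-host-ip> 8765  closed   # OCR Sidecar
#   verify_port <new-host-ip> 3000  open     # Frontend
```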

---

### User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)

Database administrator migrates MariaDB data and Elasticsearch indices from QNAP to the new host, ensuring zero data loss and minimal downtime.

**Why this priority**: Data migration is the critical path for cutover. Without migrated data, the new host cannot serve production traffic.

**Independent Test**: Can be tested by comparing row counts and index document counts between source (QNAP) and destination (new host) after migration.

**Acceptance Scenarios**:

1. **Given** the new host is running with empty MariaDB, **When** the administrator performs a database dump-and-restore from QNAP, **Then** all tables and row counts match the source exactly
2. **Given** the new host is running with empty Elasticsearch, **When** the administrator migrates indices from QNAP, **Then** all index document counts match the source exactly
3. **Given** data migration is complete, **When** the administrator runs a data integrity check script, **Then** all critical tables pass checksum verification with zero discrepancies
4. **Given** file storage is on ASUSTOR CIFS mount, **When** the administrator verifies file access from the backend container, **Then** all existing uploaded files are accessible at the expected paths
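
The count comparisons in scenarios 1 and 2 reduce to diffing two `name<TAB>count` listings, one from each host. The sketch below shows only that comparison step and is not the project's parity script. How the listings are produced is left open: for MariaDB an exact figure needs a `SELECT COUNT(*)` per table (the `table_rows` column in `information_schema` is only an estimate for InnoDB), and for Elasticsearch the `_cat/indices` endpoint can report `docs.count` per index.

```bash
#!/usr/bin/env bash
# compare_counts <source.tsv> <destination.tsv>
# Each file holds "name<TAB>count" lines (and must not be empty).
# Prints one line per discrepancy and returns non-zero if any are found.
compare_counts() {
  awk -F'\t' '
    NR == FNR { src[$1] = $2; next }   # first file: remember source counts
    {
      seen[$1] = 1
      if (!($1 in src)) {
        print "extra on destination: " $1; bad = 1
      } else if (src[$1] != $2) {
        print "mismatch " $1 ": " src[$1] " != " $2; bad = 1
      }
    }
    END {
      for (name in src)
        if (!(name in seen)) { print "missing on destination: " name; bad = 1 }
      exit bad
    }' "$1" "$2"
}
```

An empty output with exit status 0 means every name appears on both sides with the same count.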

---

### User Story 3 - Cutover and Smoke Test (Priority: P3)

Operations team performs the cutover from the old 2-host architecture (QNAP + Desk-5439) to the new single host, updates DNS/network routing, and runs smoke tests to verify all system functions work end-to-end.

**Why this priority**: Cutover is the final step that makes the new host production-active. It depends on P1 and P2 being complete.

**Independent Test**: Can be tested by accessing the application via the new host's IP/hostname and performing core DMS operations (login, document upload, search, AI inference).

**Acceptance Scenarios**:

1. **Given** data migration is verified, **When** the administrator updates DNS to point to the new host, **Then** users accessing the application URL reach the new host within the DNS TTL period
2. **Given** DNS is updated, **When** a user logs in and creates a new Correspondence, **Then** the document is saved successfully and visible in the list
3. **Given** the system is live on the new host, **When** a user uploads a PDF and triggers OCR, **Then** OCR text extraction completes successfully via the internal Docker network (sidecar → Ollama)
4. **Given** the system is live, **When** a user performs a full-text search, **Then** Elasticsearch returns results with the same accuracy as before migration
5. **Given** the system is live, **When** a user triggers AI metadata extraction, **Then** the AI inference completes successfully via the internal Docker network (backend → Ollama)

---

### User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)

Security administrator removes the `X-API-Key` header authentication from the OCR Sidecar and Backend, relying solely on Docker-internal network isolation as per ADR-040 D5.

**Why this priority**: This is a key security improvement enabled by the consolidation. It simplifies the architecture but must be validated carefully.

**Independent Test**: Can be tested by attempting to access sidecar endpoints from outside the Docker network (should fail) and from within the Docker network (should succeed without API key).

**Acceptance Scenarios**:

1. **Given** all services are on the Docker internal bridge, **When** the backend calls the sidecar without `X-API-Key`, **Then** the sidecar processes the request successfully
2. **Given** the sidecar is not publishing ports to LAN, **When** an external client attempts to reach the sidecar directly, **Then** the connection is refused
3. **Given** the `X-API-Key` code is removed, **When** the administrator reviews the sidecar and backend configuration, **Then** no hardcoded API keys remain in the codebase
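
Scenario 3 can be backed by a repository scan rather than a manual read-through. The sketch below greps for the two identifiers that appear in this change (`X-API-Key` and `OCR_SIDECAR_API_KEY`); the function name and the choice of directories are assumptions, and a key stored under a different name would not be caught.

```bash
#!/usr/bin/env bash
# scan_api_key <dir>...: list remaining references to the sidecar API key.
# Exit status: 0 = clean, 1 = references remain, 2 = grep itself failed.
scan_api_key() {
  grep -rniE 'X-API-Key|OCR_SIDECAR_API_KEY' "$@"
  case $? in
    0) return 1 ;;   # matches were printed above
    1) return 0 ;;   # nothing found
    *) return 2 ;;   # unreadable path or other grep error
  esac
}

# Example, from the repository root:
#   scan_api_key backend specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host
```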

---

### User Story 5 - Decommission Old Hosts (Priority: P5)

Operations team stops services on QNAP (which becomes backup server) and retires Desk-5439, completing the consolidation.

**Why this priority**: Cleanup is the final step after the new host is verified stable. It frees up old hardware and reduces management complexity.

**Independent Test**: Can be tested by verifying that QNAP services are stopped (except backup-related) and Desk-5439 is powered off or repurposed.

**Acceptance Scenarios**:

1. **Given** the new host has been stable for 24-48 hours, **When** the administrator stops backend/frontend/Redis/DB/ES services on QNAP, **Then** QNAP remains available as a backup server with data intact
2. **Given** QNAP services are stopped, **When** the administrator powers off Desk-5439, **Then** no LCBP3-DMS services are affected on the new host
3. **Given** old hosts are decommissioned, **When** the administrator verifies monitoring dashboards, **Then** only the new host is tracked as the active production host

---

### Edge Cases

- **GPU OOM during concurrent AI + OCR load**: What happens when np-dms-ai and np-dms-ocr are loaded simultaneously and VRAM exceeds 16GB? ADR-040 D3 (Adaptive OCR Residency) must unload OCR model to make room for LLM.
- **RAM exhaustion under heavy load**: What happens when MariaDB + Elasticsearch + CPU-fallback tensors consume more than 32GB? System must have swap space configured and memory limits per container.
- **CIFS mount failure**: What happens when ASUSTOR NAS is unreachable? File upload/download will fail; system must degrade gracefully with clear error messages.
- **Single host hardware failure**: What happens when the new host crashes? SPOF mitigation requires backup data on QNAP and a disaster recovery plan.
- **Network misconfiguration**: What happens if Docker bridge network is accidentally exposed? Sidecar and Ollama would be accessible from LAN, breaking the security model.
- **Database migration partial failure**: What happens if MariaDB migration fails midway? Rollback plan must restore QNAP as the active database host.
- **Elasticsearch index corruption during migration**: What happens if ES indices are corrupted during transfer? Re-indexing from MariaDB data must be available as a fallback.

## Requirements _(mandatory)_

### Functional Requirements

- **FR-001**: System MUST co-locate all 7 services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) on a single Docker host with a unified `docker-compose.yml`
- **FR-002**: System MUST use ASUSTOR (192.168.10.9) as the primary NAS for file storage via CIFS mount at `/mnt/uploads`
- **FR-003**: System MUST isolate Ollama and OCR Sidecar on a Docker internal bridge network (`dms-internal`) with no ports published to LAN
- **FR-004**: System MUST publish only Frontend (port 3000) and Backend (container port 3000, published on host port 3001 per tasks.md T005) to the LAN
- **FR-005**: System MUST enable backend-to-sidecar and backend-to-Ollama communication via Docker service names (`http://ocr-sidecar:8765`, `http://ollama:11434`)
- **FR-006**: System MUST migrate MariaDB data from QNAP to the new host with zero data loss
- **FR-007**: System MUST migrate Elasticsearch indices from QNAP to the new host with zero data loss
- **FR-008**: System MUST remove `X-API-Key` authentication from sidecar and backend after confirming Docker-internal network isolation (ADR-040 D5)
- **FR-009**: System MUST enforce GPU VRAM management via Adaptive OCR Residency (ADR-040 D3) and CPU Fallback Retrieval (ADR-040 D4)
- **FR-010**: System MUST configure per-container memory limits to prevent any single service from exhausting 32GB RAM
- **FR-011**: System MUST retain QNAP as a backup server with database and file storage data intact after cutover
- **FR-012**: System MUST retire Desk-5439 after cutover is verified stable for 24-48 hours
- **FR-013**: System MUST provide a rollback plan to restore services on QNAP and Desk-5439 if the new host fails
- **FR-014**: System MUST verify all core DMS functions (login, document CRUD, OCR, AI inference, search) work end-to-end on the new host before decommissioning old hosts
- **FR-015**: System MUST monitor RAM and VRAM usage for 24-48 hours post-cutover to detect resource pressure

### Key Entities _(include if feature involves data)_

- **Docker Compose Stack**: Single `docker-compose.yml` defining all 7 services, 2 networks (`dms-internal`, `dms-frontend`), and volumes (CIFS, named volumes for data)
- **CIFS Volume Mount**: ASUSTOR network share mounted as Docker volume for file storage (`/mnt/uploads/temp`, `/mnt/uploads/permanent`)
- **Docker Internal Network**: Bridge network (`dms-internal`) isolating Ollama, Sidecar, Backend, Redis, MariaDB, and Elasticsearch from LAN access
- **GPU Resource Allocation**: NVIDIA GPU passthrough to Ollama container with VRAM management via adaptive residency policies

## Success Criteria _(mandatory)_

### Measurable Outcomes

- **SC-001**: All 7 service containers start and report healthy within 5 minutes of `docker compose up -d` on the new host
- **SC-002**: Database migration completes with 100% row count parity between QNAP and new host for all critical tables
- **SC-003**: Elasticsearch migration completes with 100% document count parity between QNAP and new host for all indices
- **SC-004**: Core DMS operations (login, document upload, search, OCR, AI inference) complete successfully on the new host with zero functional regressions
- **SC-005**: Ollama and OCR Sidecar are unreachable from LAN (port scan returns closed/refused for ports 11434 and 8765)
- **SC-006**: Backend-to-Ollama latency is reduced by at least 50% compared to cross-host LAN communication (measured via AI inference response time)
- **SC-007**: RAM usage remains below 28GB (87.5% of 32GB) under normal operational load for 24 hours post-cutover
- **SC-008**: VRAM usage remains below 15GB (93.75% of 16GB) during concurrent AI inference and OCR workloads
- **SC-009**: Rollback plan can be executed within 2 hours to restore services on QNAP and Desk-5439 if needed
- **SC-010**: QNAP backup server retains a valid database snapshot within 24 hours of cutover
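
SC-007 and SC-008 are simple threshold checks once the readings are in hand. The sketch below is one possible shape for such a check and is not part of the spec: the helper names are invented, readings and limits are expressed in MiB (28GB = 28672, 15GB = 15360), and the two reader functions assume `free` and `nvidia-smi` are available on the host.

```bash
#!/usr/bin/env bash
# pct <used> <total>: percentage with one decimal place.
pct() {
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used * 100 / total }'
}

# check_below <label> <used> <limit>: non-zero exit when used >= limit.
check_below() {
  if awk -v u="$2" -v l="$3" 'BEGIN { exit !(u < l) }'; then
    echo "ok   $1: $2 < $3"
  else
    echo "FAIL $1: $2 >= $3"
    return 1
  fi
}

# Current readings in MiB (first GPU only).
ram_used_mib()  { free -m | awk '/^Mem:/ { print $3 }'; }
vram_used_mib() { nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1; }

# Example:
#   check_below ram  "$(ram_used_mib)"  28672   # SC-007
#   check_below vram "$(vram_used_mib)" 15360   # SC-008
```

Run periodically during the 24-48 hour observation window, a failing check is the signal of resource pressure that FR-015 asks for.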

### Assumptions

- The new host hardware (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) is physically available and OS-installed before provisioning begins
- ASUSTOR NAS (192.168.10.9) has sufficient storage capacity for all file uploads (temp + permanent)
- Network connectivity between the new host and ASUSTOR is via VLAN 10 with CIFS/SMB 3.0 support
- NVIDIA drivers and Docker GPU runtime (nvidia-container-toolkit) are compatible with the RTX 5060 Ti
- QNAP data (MariaDB, Elasticsearch) is in a consistent state suitable for dump-and-restore migration
- ADR-040 (OCR Sidecar Refactor) is implemented concurrently or prior to cutover for network-only auth and adaptive residency
- Gitea CI/CD pipeline can be updated to target the new host for deployment
@@ -0,0 +1,221 @@
// File: specs/100-Infrastructures/141-server-consolidation/tasks.md
// Change Log:
// - 2026-06-20: Initial task list for Single-Host Server Consolidation
// - 2026-06-20: Fix C1-C5 from analysis: backend env var update, port conflict, GPU residency, ollama-metrics port, n8n endpoints

# Tasks: Single-Host Server Consolidation

**Input**: Design documents from `/specs/100-Infrastructures/141-server-consolidation/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/
**Related ADRs**: ADR-041, ADR-040, ADR-016, ADR-023A, ADR-034

**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: Create directory structure and initial files for the new host deployment

- [ ] T001 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/` directory structure with subdirectories: `ocr-sidecar/`, `scripts/`
- [ ] T002 [P] Create `.env.template` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` with all required env vars from contracts
- [ ] T003 [P] Create `README.md` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/README.md` with deployment overview

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Provision the new host OS and create the unified Docker Compose stack. MUST be complete before any user story can proceed

**⚠️ CRITICAL**: No user story work can begin until this phase is complete

- [ ] T004 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh`: installs Docker Engine, Docker Compose v2, NVIDIA drivers, nvidia-container-toolkit, CIFS utils, creates directory structure
- [ ] T005 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`: unified compose with all 10 services, 2 networks (dms-internal, dms-frontend), CIFS volume, named volumes, memory limits per data-model.md. Backend publishes `3001:3000` to LAN (NPM routes `backend.np-dms.work` → :3001); Frontend publishes `3000:3000`; ollama-metrics publishes `9924:9924` to LAN for Prometheus scraping from ASUSTOR
- [ ] T006 [P] Copy OCR sidecar code from `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/` to `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/`: adapt `OLLAMA_API_URL` to `http://ollama:11434` (Docker DNS), remove `ports` mapping, use `expose` only
- [ ] T007 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/Dockerfile`: verify GPU access via nvidia-container-toolkit, ensure poppler-utils installed
- [ ] T008 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/requirements.txt`: verify typhoon-ocr, PyMuPDF, httpx, fastapi versions match Desk-5439
- [ ] T008b Update backend environment variables for renamed service names: `REDIS_HOST=redis` (was `cache`), `ELASTICSEARCH_HOST=elasticsearch` (was `search`) in `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` and the backend environment section of `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`. These service names changed from the QNAP compose, where Redis was `cache` and ES was `search`

**Checkpoint**: New host directory structure and unified compose file ready; user story implementation can now begin

---

## Phase 3: User Story 1 - Provision and Deploy on New Host (Priority: P1) 🎯 MVP

**Goal**: Administrator provisions the new host, mounts ASUSTOR CIFS, and deploys all services with Docker internal network isolation

**Independent Test**: Run `docker compose up -d` on the new host and verify all containers are healthy via `docker ps` and health check endpoints

### Implementation for User Story 1

- [ ] T009 [US1] Run `provision-host.sh` on new host: verify Docker, NVIDIA, CIFS mount at `/mnt/uploads`
- [ ] T010 [US1] Pull Ollama models on new host: `ollama pull np-dms-ai:latest`, `ollama pull np-dms-ocr:latest`, `ollama pull nomic-embed-text:latest`; verify with `ollama list`
- [ ] T011 [US1] Copy `.env.template` to `.env`, fill in all secrets from QNAP `.env` (DB passwords, JWT secrets, Redis password, ASUSTOR CIFS credentials)
- [ ] T012 [US1] Run `docker compose --env-file .env -f docker-compose.new-host.yml up -d` and verify all 10 containers start
- [ ] T013 [US1] Verify network isolation: `nmap -p 11434 <new-host-ip>` from another VLAN 10 machine should show closed/refused; `nmap -p 8765` should show closed/refused; `nmap -p 3000` (frontend) and `nmap -p 3001` (backend) should show open; `nmap -p 9924` (ollama-metrics) should show open for Prometheus
- [ ] T014 [US1] Verify health checks: `curl http://localhost:3001/health` (backend on published port 3001), `curl http://localhost:3000/` (frontend), `curl http://ocr-sidecar:8765/health` (from inside backend container via Docker DNS)
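
For T012 and T014, polling Docker's own health status is a convenient complement to the `curl` checks. The sketch below is hypothetical: it assumes every container defines a `healthcheck` (a container without one makes `docker inspect` fail on this template and is therefore reported as not healthy), and apart from `lcbp3-mariadb` the container names are not taken from this document.

```bash
#!/usr/bin/env bash
# Print the health status Docker reports for one container.
docker_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null
}

# wait_healthy <timeout_s> <interval_s> <status_cmd> <container>...
# Polls until every container reports "healthy" or the timeout is reached.
wait_healthy() {
  timeout_s="$1"; interval_s="$2"; status_cmd="$3"
  shift 3
  waited=0
  while :; do
    pending=""
    for name in "$@"; do
      [ "$("$status_cmd" "$name")" = "healthy" ] || pending="$pending $name"
    done
    if [ -z "$pending" ]; then
      echo "all healthy after ${waited}s"
      return 0
    fi
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "not healthy:$pending"
      return 1
    fi
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
}

# Example: allow the 5 minutes from SC-001, polling every 10 seconds.
#   wait_healthy 300 10 docker_health lcbp3-mariadb <other-container-names>
```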

**Checkpoint**: All services running on new host with correct network isolation (MVP achieved)

---

## Phase 4: User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)

**Goal**: Migrate MariaDB and Elasticsearch data from QNAP to the new host with zero data loss

**Independent Test**: Compare row counts and index document counts between QNAP (source) and new host (destination) after migration

### Implementation for User Story 2

- [ ] T015 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-mariadb.sh`: dump from QNAP MariaDB 11.8 via `mariadb-dump --single-transaction --routines --triggers`, pipe to new host container
- [ ] T016 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-elasticsearch.sh`: create snapshot on QNAP ES, transfer files, register repo on new host, restore
- [ ] T017 [US2] Run `migrate-mariadb.sh`: verify all table row counts match between QNAP and new host
- [ ] T018 [US2] Run `migrate-elasticsearch.sh`: verify all index document counts match between QNAP and new host
- [ ] T019 [US2] Create and run `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/verify-data-parity.sh`: automated row count + document count comparison script
- [ ] T020 [US2] Verify CIFS file access: list files in `/app/uploads/temp` and `/app/uploads/permanent` from backend container, compare with ASUSTOR share

**Checkpoint**: All data migrated and verified; new host has complete production data

---

## Phase 5: User Story 3 - Cutover and Smoke Test (Priority: P3)

**Goal**: Perform production cutover from old 2-host architecture to new single host, verify all DMS functions work end-to-end

**Independent Test**: Access application via new host IP, perform core DMS operations (login, document upload, search, AI inference)

### Implementation for User Story 3

- [ ] T021 [P] [US3] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/smoke-test.sh`: automated tests for backend health, frontend accessible, login flow, document list, OCR endpoint, AI inference, full-text search
- [ ] T022 [US3] Update Gitea secrets: `HOST` → new host IP, `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
- [ ] T023 [US3] Update `scripts/deploy.sh`: change `COMPOSE_FILE` path to New-Host directory
- [ ] T024 [US3] Update NPM (Nginx Proxy Manager) on QNAP: `lcbp3.np-dms.work` → new host IP:3000 (frontend), `backend.np-dms.work` → new host IP:3001 (backend)
- [ ] T024b [US3] Update n8n workflow endpoints on QNAP: change all backend API URLs from `http://192.168.10.8:3000/api` (QNAP) to `http://<new-host-ip>:3001/api` (new host). n8n stays on QNAP but must reach the backend on the new host via LAN port 3001
- [ ] T025 [US3] Run `smoke-test.sh` on new host: verify all 7 smoke tests pass
- [ ] T026 [US3] Verify from external machine on VLAN 10: access `https://lcbp3.np-dms.work`, login, create a test Correspondence, upload a PDF, trigger OCR, perform search

**Checkpoint**: New host is production-active; all DMS functions verified end-to-end

---

## Phase 6: User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)

**Goal**: Remove `X-API-Key` authentication from sidecar and backend, relying solely on Docker-internal network isolation per ADR-040 D5

**Independent Test**: Attempt to access sidecar from outside Docker network (should fail); verify backend calls sidecar without API key (should succeed)

### Implementation for User Story 4

- [ ] T027 [P] [US4] Remove `OCR_SIDECAR_API_KEY` from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` ocr-sidecar environment
- [ ] T028 [P] [US4] Remove API key validation from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/app.py`: remove `X-API-Key` header check middleware
- [ ] T029 [US4] Remove `X-API-Key` header from `backend/src/modules/ai/services/ocr.service.ts`: remove API key from HTTP client headers
- [ ] T030 [US4] Remove `OCR_SIDECAR_API_KEY` from `backend/.env.example` and any backend config that sets it
- [ ] T031 [US4] Rebuild and redeploy sidecar + backend containers; verify backend can call sidecar without API key
- [ ] T032 [US4] Verify external access blocked: `curl http://<new-host-ip>:8765/health` from VLAN 10 machine should fail (connection refused)

**Checkpoint**: Network-only auth verified; no API key needed, Docker isolation sufficient

---

## Phase 7: User Story 5 - Decommission Old Hosts (Priority: P5)

**Goal**: Stop services on QNAP (becomes backup) and retire Desk-5439, completing the consolidation

**Independent Test**: Verify QNAP services stopped (except backup), Desk-5439 powered off, new host unaffected

### Implementation for User Story 5

- [ ] T033 [P] [US5] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/rollback.sh`: emergency rollback (stop new host, restore QNAP + Desk-5439 services, revert DNS, revert CI/CD)
- [ ] T034 [US5] Monitor new host for 24-48 hours: RAM usage (`docker stats`), VRAM usage (`nvidia-smi`), container health, application logs
- [ ] T034b [US5] Verify Adaptive OCR Residency (ADR-040 D3) on new RTX 5060 Ti: load `np-dms-ai` and `np-dms-ocr` concurrently, confirm `calculate_ocr_residency()` unloads OCR model when LLM needs VRAM; verify CPU Fallback Retrieval (ADR-040 D4) activates for BGE-M3/Reranker when GPU is occupied by LLM
- [ ] T035 [US5] Stop QNAP app services: `ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'`
- [ ] T036 [US5] Stop QNAP service stack: `ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'`
- [ ] T037 [US5] Retire Desk-5439: `ssh user@192.168.10.100 'sudo shutdown -h now'` (or repurpose)
- [ ] T038 [US5] Verify new host still fully operational after old hosts decommissioned: re-run `smoke-test.sh`
- [ ] T039 [US5] Take QNAP backup snapshot: `mariadb-dump` on QNAP MariaDB (if still running) or verify existing backup is current

**Checkpoint**: Consolidation complete; single host is sole production, old hosts decommissioned

---

## Phase 8: Polish & Cross-Cutting Concerns

**Purpose**: Documentation, monitoring, and final verification

- [ ] T040 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/README.md`: add New-Host section, mark QNAP as backup, mark Desk-5439 as retired
- [ ] T041 [P] Update `CONTEXT.md`: update infrastructure topology to reflect single-host architecture
- [ ] T042 [P] Update `AGENTS.md`: update infrastructure references (Desk-5439 → New Host, QNAP → backup)
- [ ] T043 Update `specs/04-Infrastructure-OPS/04-00-docker-compose/.env.template`: add ASUSTOR_USER, ASUSTOR_PASS, NEW_HOST_IP variables
- [ ] T044 [P] Update Prometheus/Grafana scrape config on ASUSTOR: change the ollama-metrics target from `192.168.10.100:9924` to `<new-host-ip>:9924` (the host-published port from T005)
- [ ] T045 Run `quickstart.md` validation: follow all steps end-to-end on a fresh provision
- [ ] T046 [P] Document disaster recovery procedure: backup schedule, restore from QNAP backup, estimated RTO/RPO

---

## Dependencies & Execution Order

### Phase Dependencies

- **Setup (Phase 1)**: No dependencies; can start immediately
- **Foundational (Phase 2)**: Depends on Setup; BLOCKS all user stories
- **US1 (Phase 3)**: Depends on Foundational; requires physical access to new host
- **US2 (Phase 4)**: Depends on US1 (services must be running to receive migrated data)
- **US3 (Phase 5)**: Depends on US1 + US2 (services running + data migrated for cutover)
- **US4 (Phase 6)**: Depends on US3 (cutover complete, network isolation verified)
- **US5 (Phase 7)**: Depends on US3 + US4 (stable production before decommissioning)
- **Polish (Phase 8)**: Can start after US3; some tasks depend on US5
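
The dependency list above is a small directed graph, and `tsort` (coreutils) can turn it into a valid execution order. This is only an illustration of the ordering: each input line is a `prerequisite dependent` pair taken from the list, the function names are invented, and the note that some Polish tasks also depend on US5 is not modelled.

```bash
#!/usr/bin/env bash
# Print one valid phase order derived from the dependency pairs.
phase_order() {
  tsort <<'EOF'
Setup Foundational
Foundational US1
US1 US2
US1 US3
US2 US3
US3 US4
US3 US5
US4 US5
US3 Polish
EOF
}

# before <a> <b>: succeed when phase a is scheduled ahead of phase b.
before() {
  phase_order | awk -v a="$1" -v b="$2" '
    $1 == a { ia = NR }
    $1 == b { ib = NR }
    END { exit !(ia && ib && ia < ib) }'
}

phase_order
```

`tsort` would also report a cycle if a later edit to the list introduced a circular dependency.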
|
||||
|
||||
### User Story Dependencies
|
||||
|
||||
- **US1 (P1)**: Foundational → US1 — no dependencies on other stories
|
||||
- **US2 (P2)**: US1 → US2 — needs running services to receive data
|
||||
- **US3 (P3)**: US1 + US2 → US3 — needs running services + migrated data
|
||||
- **US4 (P4)**: US3 → US4 — needs cutover complete to verify network isolation in production
|
||||
- **US5 (P5)**: US3 + US4 → US5 — needs stable production before decommissioning
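
The dependency lists above form a small directed graph, so a valid execution order can be derived mechanically. The sketch below is illustrative only: it uses Python's standard `graphlib` module, the phase names are taken from this document, and modeling Polish as depending only on US3 is a simplifying assumption (the Polish tasks that also need US5 are not represented).

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it depends on, as listed above.
# Assumption: "Polish" is modeled as depending only on US3; the Polish
# tasks that additionally need US5 are not represented here.
PHASE_DEPS = {
    "Setup": set(),
    "Foundational": {"Setup"},
    "US1": {"Foundational"},
    "US2": {"US1"},
    "US3": {"US1", "US2"},
    "US4": {"US3"},
    "US5": {"US3", "US4"},
    "Polish": {"US3"},
}


def execution_waves(deps):
    """Group phases into waves; every phase in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PHASE_DEPS), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions the only wave with more than one entry is the one after US3, where US4 and the US3-only Polish tasks become available together.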
|
||||
|
||||
### Parallel Opportunities
|
||||
|
||||
- T002, T003 can run in parallel (different files)
|
||||
- T006, T007, T008 can run in parallel (sidecar files, no dependencies)
|
||||
- T015, T016 can run in parallel (different migration scripts)
|
||||
- T040, T041, T042, T044 can run in parallel (different doc files)
|
||||
- T027, T028, T030 can run in parallel (different files: compose, app.py, .env.example)
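
The "different files" rule behind these parallel groups can be checked before work is split up. The following is a minimal sketch, not part of the project tooling: `TASK_FILES` and `file_conflicts` are hypothetical names, and the mapping only covers the tasks whose target file is stated explicitly in this document.

```python
from collections import defaultdict

# File touched by each task, taken from the task descriptions in this document.
# Hypothetical mapping for illustration; extend it with the remaining tasks.
TASK_FILES = {
    "T040": "specs/04-Infrastructure-OPS/04-00-docker-compose/README.md",
    "T041": "CONTEXT.md",
    "T042": "AGENTS.md",
}


def file_conflicts(group, task_files):
    """Return {file: [tasks]} for every file touched by more than one task in group."""
    by_file = defaultdict(list)
    for task in group:
        by_file[task_files[task]].append(task)
    return {path: tasks for path, tasks in by_file.items() if len(tasks) > 1}
```

An empty result for a group such as `["T040", "T041", "T042"]` means no two tasks in it edit the same file, which is the condition the list above relies on.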
|
||||
|
||||
---
|
||||
|
||||
## Implementation Strategy
|
||||
|
||||
### MVP First (User Story 1 Only)
|
||||
|
||||
1. Complete Phase 1: Setup (create directory structure)
|
||||
2. Complete Phase 2: Foundational (provision host + create compose)
|
||||
3. Complete Phase 3: User Story 1 (deploy services)
|
||||
4. **STOP and VALIDATE**: All containers healthy, network isolation verified
|
||||
5. Demo to stakeholders if ready
|
||||
|
||||
### Incremental Delivery
|
||||
|
||||
1. Setup + Foundational → Infrastructure ready
|
||||
2. Add US1 → Services deployed → Validate (MVP!)
|
||||
3. Add US2 → Data migrated → Validate parity
|
||||
4. Add US3 → Cutover complete → Validate end-to-end
|
||||
5. Add US4 → Security hardened → Validate network-only auth
|
||||
6. Add US5 → Old hosts retired → Validate stability
|
||||
7. Polish → Documentation updated → Final validation
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- This is an infrastructure task — most work is shell scripts, Docker Compose YAML, and manual operations
|
||||
- Physical access to the new host is required for US1
|
||||
- Data migration (US2) requires SSH access to QNAP
|
||||
- Cutover (US3) requires DNS/NPM access and coordination with users
|
||||
- Decommission (US5) should only proceed after 24-48 hours of stable monitoring
|
||||
- Rollback plan must be tested before cutover
|
||||
- All env secrets must come from `.env` (gitignored) — never commit real secrets
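
A pre-flight check can catch a missing secret before any service starts. The sketch below assumes a plain `KEY=VALUE` `.env` file and treats the three variables named in T043 as the required set. Both are assumptions, and `parse_env` and `missing_keys` are hypothetical helper names.

```python
# Assumption: these are the variables T043 adds to .env.template.
REQUIRED_KEYS = ("ASUSTOR_USER", "ASUSTOR_PASS", "NEW_HOST_IP")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding double or single quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in required if not values.get(key)]
```

Calling `missing_keys(parse_env(Path(".env").read_text()))` (with `Path` from `pathlib`) returns an empty list when every required variable is set. The parser deliberately ignores shell features such as `export` prefixes and variable interpolation.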
|
||||