refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
CI / CD Pipeline / build (push) Successful in 7m37s
CI / CD Pipeline / deploy (push) Failing after 20m15s

2026-06-20 16:37:04 +07:00
parent d418d791a4
commit a80ebef285
70 changed files with 5762 additions and 452 deletions
@@ -0,0 +1,34 @@
# Specification Quality Checklist: OCR Sidecar Refactor
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-20
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
All checklist items pass. Specification is ready for `/speckit-clarify` or `/speckit-plan`.
@@ -0,0 +1,246 @@
# Sidecar API Contract
**Version**: 1.0
**Date**: 2026-06-20
**Service**: OCR Sidecar (Desk-5439)
**Base URL**: `http://192.168.10.100:8765` (Phase 1) / `http://sidecar:8765` (Phase 2, Docker-internal)
## Overview
The OCR sidecar provides OCR processing capabilities as a pure compute worker. This document defines the API contract between backend services and the sidecar.
## Authentication
### Phase 1 (Before ADR-041 Consolidation)
All endpoints require `X-API-Key` header:
```http
X-API-Key: {OCR_SIDECAR_API_KEY}
```
If the header is missing or invalid, the sidecar returns `401 Unauthorized`.
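The Phase 1 check can be sketched with the standard library alone. This is a minimal illustration, not the sidecar's actual code: `is_authorized` and `status_for` are hypothetical names, and the constant-time comparison is an assumption about how the key is compared.

```python
import hmac
from typing import Optional


def is_authorized(provided_key: Optional[str], expected_key: str) -> bool:
    """Return True only when the X-API-Key value matches the configured key."""
    if not provided_key or not expected_key:
        return False
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters matched.
    return hmac.compare_digest(
        provided_key.encode("utf-8"), expected_key.encode("utf-8")
    )


def status_for(provided_key: Optional[str], expected_key: str) -> int:
    """Map the check to the status codes in this contract (hypothetical helper)."""
    return 200 if is_authorized(provided_key, expected_key) else 401
```

In a FastAPI app this check would typically live in a dependency or middleware that reads the `X-API-Key` header.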
### Phase 2 (After ADR-041 Consolidation)
No authentication required. Relies on Docker-internal network isolation.
## Endpoints
### POST /ocr
Extracts text from a PDF file using Typhoon OCR.
**Request Headers**:
```http
Content-Type: application/json
X-API-Key: {key} # Phase 1 only
```
**Request Body**:
```json
{
"pdf_path": "/mnt/uploads/temp/abc123.pdf",
"system_prompt": "Extract document metadata from: {{ocr_text}}...",
"dms_tags": {
"document_number": "RFA-2025-001",
"document_date": "2025-01-15",
"received_date": "2025-01-16"
},
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
},
"page_range": {
"start": 1,
"end": 3
}
}
```
**Request Fields**:
- `pdf_path` (string, required): Absolute path to PDF file. Must be within whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`).
- `system_prompt` (string, optional): System prompt from Active Prompt. Contains `{{ocr_text}}` placeholder.
- `dms_tags` (object, optional): DMS extraction tags to inject into prompt.
- `document_number` (string, optional): Document number
- `document_date` (string, optional): Document date
- `received_date` (string, optional): Received date
- `runtime_params` (object, required): Runtime parameters from `ai_execution_profiles`.
- `temperature` (number, required): Temperature (0.0 - 2.0)
- `top_p` (number, required): Top P (0.0 - 1.0)
- `repeat_penalty` (number, required): Repeat penalty (typically 1.0 - 2.0)
- `max_tokens` (number, required): Max tokens
- `page_range` (object, optional): Page range for processing.
- `start` (number, required): Start page (1-indexed)
- `end` (number, required): End page (inclusive)
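Because `page_range` is 1-indexed and inclusive, it has to be converted before indexing into a zero-based page list. A minimal sketch of that conversion follows; `page_indices` is a hypothetical helper, and clamping `end` to the document length is an assumption, not behavior stated in this contract.

```python
def page_indices(start: int, end: int, page_count: int) -> range:
    """Convert a 1-indexed inclusive page range to zero-based indices."""
    if start < 1 or end < start:
        raise ValueError("page_range must satisfy 1 <= start <= end")
    if start > page_count:
        raise ValueError("page_range.start is beyond the last page")
    # Assumption: an `end` past the last page is clamped rather than rejected.
    last = min(end, page_count)
    return range(start - 1, last)
```

For the request body above (`start: 1`, `end: 3`), this yields indices 0, 1, and 2.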
**Response (200 OK)**:
```json
{
"text": "Extracted text in Markdown format...",
"ocr_used": true,
"model_used": "typhoon-np-dms-ocr:latest",
"processing_time_ms": 1250,
"error": null
}
```
**Response Fields**:
- `text` (string): Extracted text in Markdown format
- `ocr_used` (boolean): Whether OCR was used (vs fast-path text layer)
- `model_used` (string): Model identifier
- `processing_time_ms` (number): Processing time in milliseconds
- `error` (string, nullable): Error message if failed
**Error Responses**:
- `400 Bad Request`: Invalid request body or parameters
- `401 Unauthorized`: Missing or invalid X-API-Key (Phase 1 only)
- `403 Forbidden`: Path outside whitelisted base directory
- `500 Internal Server Error`: Internal processing error
**Path Traversal Protection**:
- PDF path is canonicalized using `os.path.abspath()` + `os.path.realpath()`
- Path must start with whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`)
- Symlinks are resolved to their targets before whitelist check
- Returns `403 Forbidden` for any path outside base directory
### GET /health
Health check endpoint for monitoring.
**Response (200 OK)**:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
**Response Fields**:
- `status` (string): Service status ("healthy" or "unhealthy")
- `timestamp` (string): ISO 8601 timestamp
- `version` (string): Service version
## Removed Endpoints
### POST /normalize (REMOVED)
This endpoint has been removed per ADR-040 D2. ThaiPreprocessProcessor has no consumers in the backend (verified by grep search).
## Rate Limiting
The sidecar implements no rate limiting; backend services handle it.
## Error Handling
All errors return JSON responses with consistent format:
```json
{
"error": "Error message",
"code": "ERROR_CODE",
"timestamp": "2026-06-20T10:30:00Z"
}
```
**Common Error Codes**:
- `INVALID_REQUEST`: Invalid request body or parameters
- `UNAUTHORIZED`: Missing or invalid authentication
- `FORBIDDEN`: Path outside whitelisted directory
- `INTERNAL_ERROR`: Internal processing error
- `OCR_FAILED`: OCR processing failed
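A minimal sketch of building this error body is shown below. The status-to-code mapping is inferred from the matching descriptions in the two lists above and is an assumption; `error_body` and `ERROR_CODES` are hypothetical names. `OCR_FAILED` has no status listed in this contract, so the caller passes it explicitly.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed mapping, inferred from the matching descriptions in this contract.
ERROR_CODES = {
    400: "INVALID_REQUEST",
    401: "UNAUTHORIZED",
    403: "FORBIDDEN",
    500: "INTERNAL_ERROR",
}


def error_body(
    status_code: int,
    message: str,
    code: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build the JSON error payload with an ISO 8601 UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "error": message,
        "code": code or ERROR_CODES.get(status_code, "INTERNAL_ERROR"),
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```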
## Examples
### Example 1: Basic OCR Request (Phase 1)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 2: OCR with System Prompt and DMS Tags (Phase 1)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"system_prompt": "Extract document metadata from: {{ocr_text}}",
"dms_tags": {
"document_number": "RFA-2025-001",
"document_date": "2025-01-15"
},
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 3: OCR Request (Phase 2, Docker-internal)
```bash
curl -X POST http://sidecar:8765/ocr \
-H "Content-Type: application/json" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 4: Path Traversal Attempt (Rejected)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Response: `403 Forbidden`
```json
{
"error": "Path outside whitelisted base directory",
"code": "FORBIDDEN",
"timestamp": "2026-06-20T10:30:00Z"
}
```
## Version History
- **1.0** (2026-06-20): Initial version for OCR sidecar refactor
- Added POST /ocr with parameter governance
- Added path traversal protection
- Removed POST /normalize endpoint
- Documented Phase 1/Phase 2 auth migration
@@ -0,0 +1,319 @@
# Data Model: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Define data contracts and entity relationships for OCR sidecar refactor
## Overview
The OCR sidecar is a pure compute worker with no database access (ADR-023/023A boundary). All data persistence and business logic remain in backend services. This document defines the data contracts between backend and sidecar.
## Entities
### OCR Request (Backend → Sidecar)
```typescript
interface OcrRequest {
pdfPath: string; // Absolute path to PDF file (whitelisted)
systemPrompt?: string; // System prompt from Active Prompt
dmsTags?: { // DMS extraction tags from Active Prompt
documentNumber?: string;
documentDate?: string;
receivedDate?: string;
};
runtimeParams: { // Runtime parameters from ai_execution_profiles
temperature: number;
top_p: number;
repeat_penalty: number;
max_tokens: number;
};
pageRange?: { // Page range for processing
start: number;
end: number;
};
}
```
### OCR Response (Sidecar → Backend)
```typescript
interface OcrResponse {
text: string; // Extracted text (Markdown format)
ocrUsed: boolean; // Whether OCR was used (vs fast-path text layer)
modelUsed: string; // Model identifier (e.g., "typhoon-np-dms-ocr")
processingTimeMs: number; // Processing time in milliseconds
error?: string; // Error message if failed
}
```
### AI Execution Profile (Database)
```sql
-- Existing table (no schema changes)
CREATE TABLE ai_execution_profiles (
id INT AUTO_INCREMENT PRIMARY KEY,
profile_name VARCHAR(100) UNIQUE NOT NULL,
model_name VARCHAR(100) NOT NULL,
parameters JSON NOT NULL, -- { temperature, top_p, repeat_penalty, max_tokens, keep_alive }
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
-- Row for OCR extraction:
-- profile_name = 'ocr-extract'
-- parameters = { temperature: 0.7, top_p: 0.9, repeat_penalty: 1.1, max_tokens: 4096 }
```
### Active Prompt (Database)
```sql
-- Existing table (no schema changes per ADR-029/037)
CREATE TABLE ai_prompts (
id INT AUTO_INCREMENT PRIMARY KEY,
public_id UUID,
prompt_type VARCHAR(50) NOT NULL, -- 'ocr_extraction'
template TEXT NOT NULL, -- System prompt template with {{ocr_text}} placeholder
context_config JSON, -- DMS tags configuration
version INT NOT NULL,
is_active TINYINT(1) DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY (prompt_type, version)
);
-- Active prompt for OCR extraction:
-- prompt_type = 'ocr_extraction'
-- template = "Extract document metadata from: {{ocr_text}}..."
-- context_config = { dmsTags: { documentNumber: true, documentDate: true, receivedDate: true } }
```
## Data Flow
### Phase 1: OCR Request Flow (Before ADR-041)
```
Backend OcrService
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
3. Extract systemPrompt and DMS tags from Active Prompt
4. Build OcrRequest with parameters, systemPrompt, DMS tags
5. Send POST /ocr with X-API-Key header to sidecar
Sidecar (app.py)
1. Validate X-API-Key
2. Canonicalize pdfPath and check whitelist
3. Extract systemPrompt and DMS tags from request
4. Call calculate_ocr_residency(active_profile) for keep_alive
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
6. Return OcrResponse
Backend OcrService
1. Parse OcrResponse
2. Return extracted text to caller
```
### Phase 2: OCR Request Flow (After ADR-041)
```
Backend OcrService
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
3. Extract systemPrompt and DMS tags from Active Prompt
4. Build OcrRequest with parameters, systemPrompt, DMS tags
5. Send POST /ocr (NO X-API-Key header) to sidecar
Sidecar (app.py)
1. NO X-API-Key validation (network isolation only)
2. Canonicalize pdfPath and check whitelist
3. Extract systemPrompt and DMS tags from request
4. Call calculate_ocr_residency(active_profile) for keep_alive
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
6. Return OcrResponse
Backend OcrService
1. Parse OcrResponse
2. Return extracted text to caller
```
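Step 5 of the sidecar flow hands the backend-supplied parameters to Ollama. The sketch below shows one way the request body could be assembled, assuming the sidecar calls Ollama's `/api/generate` endpoint. `build_ollama_payload` is a hypothetical helper, and `keep_alive` is taken as an argument because the signature and return type of `calculate_ocr_residency()` are not shown here.

```python
def build_ollama_payload(
    model: str, prompt: str, runtime_params: dict, keep_alive
) -> dict:
    """Assemble a non-streaming /api/generate body from backend-supplied params."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # keep_alive is a top-level field in Ollama, not part of "options".
        "keep_alive": keep_alive,
        "options": {
            "temperature": runtime_params["temperature"],
            "top_p": runtime_params["top_p"],
            "repeat_penalty": runtime_params["repeat_penalty"],
            # Ollama names the generation token limit "num_predict".
            "num_predict": runtime_params["max_tokens"],
        },
    }
```

Nothing in the payload is hardcoded in the sidecar: every tunable value arrives from the caller, which is the point of the parameter governance described above.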
## Backend Service Changes
### OcrService Parameter Resolution
```typescript
// backend/src/modules/ai/services/ocr.service.ts
async extractMetadata(documentId: string): Promise<AIMetadata> {
// 1. Resolve runtime parameters from ai_execution_profiles
const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
const runtimeParams = profile.parameters; // { temperature, top_p, repeat_penalty, max_tokens }
// 2. Resolve Active Prompt
const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
const systemPrompt = activePrompt.template;
const dmsTags = activePrompt.context_config?.dmsTags || {};
// 3. Build request
const ocrRequest: OcrRequest = {
pdfPath: document.filePath, // `document` is assumed to be loaded from documentId (lookup omitted here)
systemPrompt,
dmsTags,
runtimeParams,
};
// 4. Send to sidecar (with X-API-Key in Phase 1)
const response = await this.httpClient.post(
`${this.ocrApiUrl}/ocr`,
ocrRequest,
{ headers: { 'X-API-Key': this.ocrApiKey } } // Phase 1 only
);
return response.data;
}
```
### SandboxOcrEngineService Parameter Resolution
```typescript
// backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
async processSandboxOcr(request: SandboxOcrRequest): Promise<SandboxOcrResult> {
// Same parameter resolution pattern as OcrService
const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
const ocrRequest: OcrRequest = {
pdfPath: request.pdfPath,
systemPrompt: activePrompt.template,
dmsTags: activePrompt.context_config?.dmsTags || {},
runtimeParams: profile.parameters,
};
const response = await this.httpClient.post(
`${this.ocrApiUrl}/ocr`,
ocrRequest,
{ headers: { 'X-API-Key': this.ocrApiKey } } // Phase 1 only
);
return response.data;
}
```
## Sidecar API Changes
### POST /ocr Request Body
```python
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
from typing import Dict, Optional

from pydantic import BaseModel

# Nested models are defined first so OcrRequest can reference them.
class RuntimeParams(BaseModel):
    temperature: float
    top_p: float
    repeat_penalty: float
    max_tokens: int

class PageRange(BaseModel):
    start: int  # 1-indexed
    end: int    # inclusive

class OcrRequest(BaseModel):
    pdf_path: str
    system_prompt: Optional[str] = None
    dms_tags: Optional[Dict[str, str]] = None
    runtime_params: RuntimeParams
    page_range: Optional[PageRange] = None
### POST /ocr Response Body
```python
class OcrResponse(BaseModel):
text: str
ocr_used: bool
model_used: str
processing_time_ms: float
error: Optional[str] = None
```
## Environment Variables
### Sidecar Environment Variables
```bash
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=required_value # Fail-fast if missing
# Phase 2 (after ADR-041) - remove OCR_SIDECAR_API_KEY
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads # CIFS mount base path
OLLAMA_API_URL=http://localhost:11434
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
```
### Backend Environment Variables
```bash
# backend/.env
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=required_value # Send-side X-API-Key
# Phase 2 (after ADR-041) - remove OCR_API_KEY
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads # Backend view of uploads
```
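The backend and the sidecar see the same upload share under different base paths (`/app/uploads` and `/mnt/uploads`), so a path built on the backend presumably needs translating before it is sent as `pdf_path`. The source does not show that step; the sketch below is one assumed way to do it, and `to_sidecar_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def to_sidecar_path(
    backend_path: str,
    backend_base: str = "/app/uploads",
    sidecar_base: str = "/mnt/uploads",
) -> str:
    """Re-root a backend upload path onto the sidecar's mount point."""
    # relative_to compares whole path components and raises ValueError
    # when backend_path is not under backend_base.
    relative = PurePosixPath(backend_path).relative_to(backend_base)
    return str(PurePosixPath(sidecar_base) / relative)
```

This only re-roots the path. It does not collapse `..` segments, so the sidecar's own canonicalization and whitelist check remain the actual safeguard.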
## Validation Rules
### Path Canonicalization (Sidecar)
```python
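import os

from fastapi import HTTPException

def validate_pdf_path(pdf_path: str, base_path: str) -> str:
    """Canonicalize pdf_path and ensure it resolves inside base_path."""
    # 1. Canonicalize both paths (resolves symlinks and ".." segments)
    canonical = os.path.abspath(os.path.realpath(pdf_path))
    base = os.path.abspath(os.path.realpath(base_path))
    # 2. Check whitelist on path-component boundaries; a plain startswith()
    #    would also accept siblings such as "/mnt/uploads_evil"
    if os.path.commonpath([canonical, base]) != base:
        raise HTTPException(
            status_code=403,
            detail="Path outside whitelisted base directory"
        )
    return canonical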
### Parameter Validation (Backend)
```typescript
// Validate runtime parameters from ai_execution_profiles
function validateRuntimeParams(params: any): RuntimeParams {
  // Number.isFinite is used instead of `!value` so that a valid 0 is accepted
  if (!Number.isFinite(params.temperature) || params.temperature < 0 || params.temperature > 2) {
    throw new BusinessException('Invalid temperature value');
  }
  if (!Number.isFinite(params.top_p) || params.top_p < 0 || params.top_p > 1) {
    throw new BusinessException('Invalid top_p value');
  }
  // ... similar validation for other params
  return params;
}
```
## No Schema Changes
This refactor does not require database schema changes:
- `ai_execution_profiles` table already exists (ADR-036)
- `ai_prompts` table already exists (ADR-029/037)
- No new tables or columns needed
- Per ADR-009: No TypeORM migrations (edit SQL directly if needed, but not needed here)
@@ -0,0 +1,147 @@
# Implementation Plan: OCR Sidecar Refactor
**Branch**: `140-ocr-sidecar-refactor` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md`
## Summary
Refactor the OCR sidecar on Desk-5439 to address security vulnerabilities (hardcoded API keys, path traversal), implement async I/O for performance, preserve GPU resource management policies (Adaptive OCR Residency, CPU Fallback Retrieval), and align with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System. The sidecar becomes a pure compute worker with all orchestration and parameter governance moved to backend services.
## Technical Context
**Language/Version**: Python 3.11+ (FastAPI)
**Primary Dependencies**: FastAPI 0.111.0, httpx 0.27.0, PyMuPDF 1.24.0, typhoon-ocr>=0.4.1, FlagEmbedding>=1.2.0, pythainlp 5.0.4
**Storage**: No database access (ADR-023/023A boundary - sidecar is pure compute worker)
**Testing**: pytest for path-traversal and residency wiring tests
**Target Platform**: Desk-5439 (192.168.10.100, Windows 10/11, RTX 5060 Ti 16GB GPU) via Docker
**Project Type**: Infrastructure (sidecar service)
**Performance Goals**: 20%+ throughput improvement with async I/O; VRAM exhaustion prevention under load
**Constraints**: Must preserve LLM-First GPU Ownership; must not bypass existing residency_policy.py; must align with ADR-036 Gap-2 (keep_alive as lazy resource param)
**Scale/Scope**: Single sidecar service; affects backend AI services (OcrService, SandboxOcrEngineService)
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
| Gate | Status | Justification |
|------|--------|---------------|
| ADR-019 UUID | ✅ PASS | Sidecar N/A (pure compute worker), Backend applies ADR-019 (parameter resolution in OcrService/SandboxOcrEngineService) |
| ADR-009 Schema | N/A | No database schema changes in sidecar |
| ADR-016 Security | ✅ PASS | Path traversal hardening; no hardcoded secrets; network isolation auth |
| ADR-002 Numbering | N/A | No document numbering in sidecar |
| ADR-008 BullMQ | N/A | Sidecar does not use BullMQ (backend does) |
| ADR-023/023A AI Boundary | ✅ PASS | Sidecar is pure compute worker; no DB/storage access; AI → DMS API → DB pattern preserved |
| ADR-007 Errors | ✅ PASS | FastAPI exception handling with user-friendly messages |
| TypeScript Strict | N/A | Python codebase |
## Project Structure
### Documentation (this feature)
```text
specs/100-Infrastructures/140-ocr-sidecar-refactor/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output (technical decisions from ADR-040)
├── data-model.md # Phase 1 output (data contracts)
├── quickstart.md # Phase 1 output (deployment guide)
├── contracts/ # Phase 1 output (API contracts)
│ └── sidecar-api.md # Sidecar API specification
└── tasks.md # Phase 2 output (implementation tasks)
```
### Source Code
```text
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py # FastAPI application (main refactor target)
├── residency_policy.py # Retain (Adaptive OCR Residency)
├── vram_monitor.py # Retain (VRAM monitoring)
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
├── docker-compose.yml # Orchestration
└── .env # Environment variables
backend/src/modules/ai/
├── services/
│ ├── ocr.service.ts # Parameter resolution + sidecar calls
│ └── sandbox-ocr-engine.service.ts # Sandbox parameter resolution
└── processors/
└── ai-batch.processor.ts # BullMQ processor (unchanged)
tests/
├── unit/
│ └── ocr-sidecar/ # Sidecar unit tests
│ ├── test_path_traversal.py # Path traversal tests
│ └── test_residency_wiring.py # Residency calculation tests
└── integration/
└── ocr-sidecar/ # Sidecar integration tests
```
**Structure Decision**: Infrastructure refactor targeting existing OCR sidecar on Desk-5439. Backend changes limited to parameter resolution in AI services. No new frontend changes.
## Complexity Tracking
> No constitution violations; all gates pass. This section is not applicable.
## Phase 0: Research & Technical Decisions
All technical decisions are already documented in ADR-040. Key decisions:
### Security Decisions
- **Decision**: Remove hardcoded default API key; fail-fast if env missing
- **Rationale**: Security vulnerability - leaked key cannot be rotated without rebuild
- **Decision**: Implement path canonicalization + base-path whitelist
- **Rationale**: Prevent path traversal attacks (ADR-016)
### I/O Pattern Decisions
- **Decision**: Refactor to async I/O with shared AsyncClient via lifespan
- **Rationale**: Synchronous blocking I/O reduces throughput under load
- **Decision**: Replace `@app.on_event("startup")` with lifespan context manager
- **Rationale**: Deprecated pattern; lifespan provides better resource management
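The lifespan pattern can be sketched with the standard library alone. This is an illustration under stated assumptions, not the sidecar's code: `load_model_blocking` and `resources` are hypothetical stand-ins, and the shared `httpx.AsyncClient` is only noted in comments so the example stays dependency-free.

```python
import asyncio
import time
from contextlib import asynccontextmanager

resources = {}  # hypothetical holder for objects shared across requests


def load_model_blocking() -> str:
    """Stand-in for a slow, synchronous model load."""
    time.sleep(0.05)
    return "placeholder-model"


@asynccontextmanager
async def lifespan(app):
    # Startup: run the blocking load in a worker thread so the event loop
    # stays responsive. A shared AsyncClient would be created here too.
    resources["model"] = await asyncio.to_thread(load_model_blocking)
    try:
        yield
    finally:
        # Shutdown: release resources (a shared client would be closed here).
        resources.clear()


async def main() -> dict:
    async with lifespan(None):
        return dict(resources)
```

With FastAPI, the same function would be passed as `FastAPI(lifespan=lifespan)`, replacing the `@app.on_event("startup")` handler.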
### GPU Resource Management Decisions
- **Decision**: Wire `calculate_ocr_residency()` into `process_ocr` for dynamic keep_alive
- **Rationale**: Preserve Adaptive OCR Residency policy (CONTEXT.md); avoid fixed values
- **Decision**: Retain vram_monitor.py and residency_policy.py
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- **Decision**: Reject forced GPU-resident BGE-M3/Reranker
- **Rationale**: CPU fallback is required for VRAM pressure scenarios
### Parameter Governance Decisions
- **Decision**: Remove hardcoded runtime params; accept from backend job snapshot
- **Rationale**: ADR-036 Profile-Only Parameter Governance; dynamic tuning without rebuild
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in DB not code
- **Decision**: Reject creating PromptBuilderService
- **Rationale**: Use existing Active Prompt system; avoid invented orchestration
### Auth Decisions
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- **Rationale**: Sequenced migration; network isolation only possible post-consolidation
- **Decision**: Interim period requires X-API-Key validation
- **Rationale**: Cross-host topology (before ADR-041) requires defense-in-depth
### Endpoint Decisions
- **Decision**: Remove /normalize endpoint
- **Rationale**: No consumers (verified by grep); ThaiPreprocessProcessor unused
- **Decision**: Fix mutable default argument `options_override={}`
- **Rationale**: Python anti-pattern; causes unexpected behavior
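The mutable default pitfall is easy to reproduce in isolation. In the sketch below only the parameter name `options_override` comes from the source; the function names and bodies are hypothetical.

```python
from typing import Optional


def build_options_buggy(extra: dict, options_override: dict = {}) -> dict:
    # The default dict is created once, at function definition time, so
    # every call that omits the argument mutates the same object.
    options_override.update(extra)
    return options_override


def build_options_fixed(extra: dict, options_override: Optional[dict] = None) -> dict:
    # A None sentinel plus a fresh copy gives each call its own dict.
    options = dict(options_override or {})
    options.update(extra)
    return options


build_options_buggy({"temperature": 0.1})
leaked = build_options_buggy({"top_p": 0.9})   # still carries "temperature"

build_options_fixed({"temperature": 0.1})
clean = build_options_fixed({"top_p": 0.9})    # only "top_p"
```

In a long-running service, the buggy form lets one request's overrides leak into later requests.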
## Phase 1: Design & Contracts
### Data Model
See [data-model.md](./data-model.md) for detailed data contracts and entity relationships.
### API Contracts
See [contracts/sidecar-api.md](./contracts/sidecar-api.md) for sidecar API specification.
### Quickstart Guide
See [quickstart.md](./quickstart.md) for deployment and testing instructions.
## Phase 2: Implementation (Tasks)
See [tasks.md](./tasks.md) for detailed implementation tasks generated by `/speckit-tasks`.
@@ -0,0 +1,374 @@
# Quickstart: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Deployment and testing guide for OCR sidecar refactor
## Prerequisites
- Access to Desk-5439 (192.168.10.100) with Docker
- Access to backend services (QNAP 192.168.10.8)
- Python 3.11+ for local testing (optional)
- pytest for testing (optional)
## Phase 1: Deployment (Before ADR-041 Consolidation)
### Step 1: Update Sidecar Code
1. Navigate to sidecar directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
```
2. Update `app.py` with the following changes:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Implement async I/O with `httpx.AsyncClient` via lifespan
- Replace `@app.on_event("startup")` with lifespan context manager
- Wire `calculate_ocr_residency()` into `process_ocr`
- Implement path canonicalization + base-path whitelist on `/ocr`
- Remove hardcoded runtime parameters
- Receive systemPrompt and DMS tags from backend
- Remove `/normalize` endpoint
- Fix mutable default argument `options_override={}`
- Load models via `asyncio.to_thread` during lifespan
3. Update `requirements.txt`:
```text
PyMuPDF==1.24.0
fastapi==0.111.0
uvicorn[standard]==0.30.1
python-multipart==0.0.9
httpx==0.27.0
FlagEmbedding>=1.2.0
typhoon-ocr>=0.4.1
```
4. Update `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
OCR_MODEL=np-dms-ocr:latest
```
### Step 2: Update Backend Services
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Add parameter resolution from `ai_execution_profiles` (row `ocr-extract`)
- Add Active Prompt resolution from `ai_prompts` (type `ocr_extraction`)
- Extract systemPrompt and DMS tags from Active Prompt
- Send resolved parameters to sidecar in OCR requests
- Keep X-API-Key send-side (Phase 1)
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Same parameter resolution pattern as OcrService
- Keep X-API-Key send-side (Phase 1)
3. Update backend `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
### Step 3: Rebuild and Deploy Sidecar
1. Build Docker image on Desk-5439:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
docker-compose build
```
2. Stop existing container:
```bash
docker-compose down
```
3. Start new container:
```bash
docker-compose up -d
```
4. Verify health:
```bash
curl http://192.168.10.100:8765/health
```
Expected response:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
### Step 4: Deploy Backend Changes
1. Build backend:
```bash
cd backend
pnpm run build
```
2. Deploy backend containers (via existing deploy script or manual):
```bash
# From repo root
./scripts/deploy.sh
```
3. Verify backend health:
```bash
curl http://localhost:3001/api/ai/health
```
## Phase 2: Deployment (After ADR-041 Consolidation)
**Note**: This phase can only be executed after ADR-041 server consolidation completes (single Docker host).
### Step 1: Remove X-API-Key from Sidecar
1. Update `app.py` on sidecar:
- Remove X-API-Key validation from all endpoints
- Remove `OCR_SIDECAR_API_KEY` environment variable check
2. Update `.env` on sidecar:
```bash
# Remove OCR_SIDECAR_API_KEY line
# Keep common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
```
3. Rebuild and redeploy sidecar:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Step 2: Remove X-API-Key from Backend
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
3. Update backend `.env`:
```bash
# Remove OCR_API_KEY line
# Keep common variables
OCR_API_URL=http://sidecar:8765 # Docker-internal URL
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
4. Rebuild and redeploy backend:
```bash
cd backend
pnpm run build
cd .. && ./scripts/deploy.sh # deploy script is run from the repo root
```
## Testing
### Unit Tests (Sidecar)
1. Navigate to sidecar tests directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests
```
2. Run path traversal tests:
```bash
pytest test_path_traversal.py -v
```
Expected output: All tests pass, path traversal attempts return 403
3. Run residency wiring tests:
```bash
pytest test_residency_wiring.py -v
```
Expected output: All tests pass, `calculate_ocr_residency()` is called correctly
### Integration Tests (Backend)
1. Run backend AI service tests:
```bash
cd backend
pnpm test ai/ocr.service.spec.ts
pnpm test ai/sandbox-ocr-engine.service.spec.ts
```
2. Verify parameter resolution from database:
- Check that `ai_execution_profiles` row `ocr-extract` exists
- Check that `ai_prompts` has active row for `ocr_extraction` type
- Verify parameters are correctly resolved and sent to sidecar
### Manual Testing
1. Test path traversal protection:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `403 Forbidden`
2. Test valid OCR request:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/test.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `200 OK` with extracted text
3. Test parameter governance:
- Modify `ai_execution_profiles` row `ocr-extract` parameters
- Run OCR request
- Verify new parameters are used (check sidecar logs)
4. Test Active Prompt integration:
- Modify active prompt in `ai_prompts` for `ocr_extraction`
- Run OCR request
- Verify new system prompt is used
## Performance Testing
1. Benchmark async vs sync I/O:
```bash
# Use Apache Bench or similar tool
ab -n 1000 -c 10 -p ocr_request.json -T application/json \
-H "X-API-Key: your-api-key" \
http://192.168.10.100:8765/ocr
```
Expected: 20%+ throughput improvement with async I/O
2. Monitor VRAM usage:
```bash
# On Desk-5439, monitor GPU usage during OCR operations
nvidia-smi -l 1
```
Expected: VRAM usage stays within limits, no exhaustion
## Monitoring
### Health Checks
- Sidecar health: `GET http://192.168.10.100:8765/health`
- Backend AI health: `GET http://localhost:3001/api/ai/health`
### Logs
- Sidecar logs: `docker-compose logs -f ocr-sidecar`
- Backend logs: Check backend application logs
### Metrics
- Monitor OCR request latency
- Monitor VRAM usage on Desk-5439
- Monitor error rates (403 for path traversal, 500 for internal errors)
## Rollback
If issues arise during deployment:
### Rollback Sidecar
1. Revert `app.py` to previous version
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Rollback Backend
1. Revert service changes in `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
cd backend
pnpm run build
cd .. && ./scripts/deploy.sh # deploy script is run from the repo root
```
### Emergency Rollback
If immediate rollback is needed:
1. Revert `keep_alive` to fixed value `0` in `process_ocr`
2. Restore hardcoded runtime parameters
3. Restore X-API-Key validation
4. Rebuild and redeploy
## Troubleshooting
### Sidecar fails to start
1. Check environment variables are set correctly
2. Check `OCR_SIDECAR_API_KEY` is provided (Phase 1)
3. Check Docker logs: `docker-compose logs ocr-sidecar`
4. Verify Ollama is running on Desk-5439
### Path traversal returns 200 instead of 403
1. Verify `OCR_SIDECAR_UPLOAD_BASE` is set correctly
2. Check path canonicalization logic in `app.py`
3. Test with absolute paths to verify whitelist check
### Parameters not being used
1. Check `ai_execution_profiles` row `ocr-extract` exists
2. Check backend service parameter resolution logic
3. Check sidecar receives parameters in request body
4. Check sidecar passes parameters to Ollama
### VRAM exhaustion
1. Check `calculate_ocr_residency()` is being called
2. Check `vram_monitor.py` and `residency_policy.py` are present
3. Verify CPU fallback is working for `/embed` and `/rerank`
4. Monitor GPU usage with `nvidia-smi`
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- [Sidecar API Contract](./contracts/sidecar-api.md)
@@ -0,0 +1,179 @@
# Research: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Document technical decisions and research findings from ADR-040
## Overview
All technical decisions for this refactor are already documented in ADR-040. This file consolidates those decisions for implementation reference.
## Security Decisions
### Hardcoded API Key Removal
- **Decision**: Remove hardcoded default API key (`lcbp3-dms-ocr-sidecar-secure-token-2026`) from `app.py`
- **Rationale**: Security vulnerability: the default key is baked into the image, so if it leaks it cannot be rotated without rebuilding the container
- **Implementation**: Fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
- **Phase**: Phase 1 (before ADR-041 consolidation)
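
The fail-fast behaviour can be sketched as below. This is a minimal illustration, not the sidecar's actual code: `load_api_key` and `MissingApiKeyError` are hypothetical names, and the injectable `env` argument exists only to make the function testable.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised at startup when the sidecar API key is not configured."""


def load_api_key(env=None) -> str:
    # `env` is injectable for tests; it defaults to the real process environment.
    env = os.environ if env is None else env
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        # Deliberately no default value: a missing key must stop the process.
        raise MissingApiKeyError(
            "OCR_SIDECAR_API_KEY is not set; refusing to start the OCR sidecar"
        )
    return key
```

Calling a function like this once at import or startup time means a misconfigured container exits immediately instead of serving requests with a known key.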
### Path Traversal Hardening
- **Decision**: Implement path canonicalization + base-path whitelist on `/ocr` endpoint
- **Rationale**: Prevent arbitrary file read attacks (ADR-016)
- **Implementation**:
- Use `os.path.abspath()` + `os.path.realpath()` for canonicalization
- Whitelist base path = `OCR_SIDECAR_UPLOAD_BASE` (CIFS mount base)
- Reject paths outside base path → 403 Forbidden
- **Alternatives Considered**:
- Using path validation regex only → rejected (insufficient for symlink attacks)
- Chroot jail → rejected (overkill for this use case)
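
A minimal sketch of the canonicalize-then-whitelist check described above. The function and exception names are hypothetical, and `base_dir` stands for the value of `OCR_SIDECAR_UPLOAD_BASE`. The use of `os.path.commonpath` is an assumption of this sketch rather than something the source specifies.

```python
import os


class PathNotAllowedError(PermissionError):
    """Hypothetical error that the /ocr handler would translate to 403 Forbidden."""


def resolve_within_base(requested_path: str, base_dir: str) -> str:
    # realpath() follows symlinks and collapses "..", so the comparison below
    # is made against the real on-disk location of both paths.
    base = os.path.realpath(os.path.abspath(base_dir))
    # join() returns `requested_path` unchanged when it is absolute, so an
    # absolute path outside the base is caught by the same check.
    candidate = os.path.realpath(os.path.abspath(os.path.join(base, requested_path)))
    # commonpath() avoids the prefix bug where "/data/uploads-evil" would
    # pass a naive startswith("/data/uploads") test.
    if os.path.commonpath([base, candidate]) != base:
        raise PathNotAllowedError(f"path escapes upload base: {requested_path!r}")
    return candidate
```

Because the check runs on the resolved path, a symlink inside the base that points outside it is rejected the same way as a literal `../` sequence.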
## I/O Pattern Decisions
### Async I/O Refactor
- **Decision**: Refactor `process_ocr` to `async def` and use `httpx.AsyncClient` shared via lifespan
- **Rationale**: Synchronous I/O blocks the FastAPI event loop, which reduces throughput under load
- **Implementation**:
- Replace `httpx.Client` with `httpx.AsyncClient`
- Create AsyncClient in lifespan context manager
- Load models via `asyncio.to_thread` to avoid blocking the event loop during startup
- **Performance Target**: 20%+ throughput improvement under concurrent load
- **Alternatives Considered**:
- Keep sync I/O but add more workers → rejected (still blocks event loop)
- Use thread pool → rejected (adds complexity without solving root cause)
### Lifespan Pattern
- **Decision**: Replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan
- **Rationale**: `@app.on_event` is deprecated in FastAPI; lifespan provides better resource management and cleanup
- **Implementation**: Use FastAPI lifespan context manager for AsyncClient lifecycle
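
The shape of the lifespan pattern can be sketched with the standard library alone. This is an illustration under stated assumptions: `load_models_blocking` is a hypothetical stand-in for the real model loading, and `SimpleNamespace` stands in for the FastAPI app object so the sketch runs without third-party packages.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from types import SimpleNamespace


def load_models_blocking() -> dict:
    """Hypothetical stand-in for slow, synchronous model loading."""
    time.sleep(0.05)
    return {"embedder": "loaded", "reranker": "loaded"}


@asynccontextmanager
async def lifespan(app):
    # Run the blocking load in a worker thread so the event loop stays free.
    app.state.models = await asyncio.to_thread(load_models_blocking)
    try:
        yield  # the application serves requests while suspended here
    finally:
        # Shutdown path: release whatever was acquired at startup.
        app.state.models = None


async def demo():
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        during = dict(app.state.models)
    return during, app.state.models
```

In FastAPI the same function is registered with `FastAPI(lifespan=lifespan)`. The shared `httpx.AsyncClient` would be created before the `yield` and closed with `await client.aclose()` in the `finally` branch.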
## GPU Resource Management Decisions
### Adaptive OCR Residency
- **Decision**: Wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive`
- **Rationale**: Preserve Adaptive OCR Residency policy from CONTEXT.md; avoid fixed values
- **Implementation**:
- Import `calculate_ocr_residency` from `residency_policy.py`
- Call function during OCR request to calculate appropriate keep_alive
- Do NOT accept explicit `options_override["keep_alive"]` from backend
- keep_alive is a lazy resource parameter calculated at process time (ADR-036 Gap-2)
- **Alternatives Rejected**:
- Fixed `keep_alive=0` (Claude plan) → rejected (violates ADR-036 Gap-2)
- Fixed `keep_alive=10m` (Qwen plan) → rejected (violates adaptive policy)
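
A sketch of the wiring, assuming `calculate_ocr_residency` takes the active profile and returns a value usable as `keep_alive`. Its real signature and return type are not shown in this document, so it is passed in as a parameter here. Whether the sidecar silently drops a backend-supplied `keep_alive` or rejects the request is also not specified. This sketch drops it.

```python
from typing import Any, Callable, Mapping, Optional, Tuple


def resolve_keep_alive(
    active_profile: Mapping[str, Any],
    options_override: Optional[Mapping[str, Any]],
    calculate_ocr_residency: Callable[[Mapping[str, Any]], Any],
) -> Tuple[dict, Any]:
    # Copy so the caller's mapping is never mutated.
    options = dict(options_override or {})
    # A backend-supplied keep_alive is discarded: the policy owns this value.
    options.pop("keep_alive", None)
    # Calculated per request, at processing time.
    keep_alive = calculate_ocr_residency(active_profile)
    return options, keep_alive
```

Returning the two values separately keeps the policy-owned `keep_alive` distinct from the sampling options that the backend is allowed to set.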
### Retain VRAM Monitor and Residency Policy
- **Decision**: Retain `vram_monitor.py` and `residency_policy.py` modules
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- **Alternatives Rejected**:
- Delete these modules (Claude + Qwen plans) → rejected (violates CONTEXT.md resolved GPU policies)
### CPU Fallback for Retrieval
- **Decision**: Retain dynamic CPU/GPU selection for `/embed` and `/rerank` via `.to(device)` logic
- **Rationale**: CPU fallback required when GPU is under pressure; prevents VRAM exhaustion
- **Alternatives Rejected**:
- Force BGE-M3 and Reranker GPU-resident → rejected (violates LLM-First policy)
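
The device decision behind the CPU fallback can be illustrated as below. The function name, the `reserve_mb` concept, and every number are hypothetical. The actual thresholds and the VRAM query live in `vram_monitor.py` and are not reproduced here.

```python
from typing import Optional


def select_device(
    free_vram_mb: Optional[float], required_mb: float, reserve_mb: float = 0.0
) -> str:
    """Return "cuda" only when the model fits without eating into the LLM reserve."""
    if free_vram_mb is None:
        # No GPU visible, or the VRAM query failed: CPU is the safe choice.
        return "cpu"
    return "cuda" if free_vram_mb - reserve_mb >= required_mb else "cpu"
```

The returned string is what a `.to(device)` call would receive, so the retrieval models yield the GPU whenever the LLM's headroom would otherwise be consumed.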
## Parameter Governance Decisions
### Remove Hardcoded Runtime Parameters
- **Decision**: Remove hardcoded `temperature`, `top_p`, `repeat_penalty`, `max_tokens` from sidecar
- **Rationale**: ADR-036 Profile-Only Parameter Governance; enable dynamic tuning without rebuild
- **Implementation**:
- Backend resolves parameters from `ai_execution_profiles` row `ocr-extract`
- Backend sends parameters to sidecar in every request
- Sidecar passes parameters to Ollama in every load/generate call
- The Ollama Modelfile serves as last-resort fallback only
- **Alternatives Rejected**:
- Keep hardcoded values in sidecar → rejected (violates ADR-036)
- Create new `PromptBuilderService` → rejected (use existing Active Prompt system)
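
One way the sidecar might translate backend-supplied parameters into Ollama options is sketched below. The helper name and the shape of `runtime_params` are assumptions. The one mapping that is not a straight copy is `max_tokens`, which Ollama calls `num_predict`.

```python
from typing import Any, Mapping

# Request-field name -> Ollama option name. Three names are identical on both
# sides; Ollama's name for the output token limit is `num_predict`.
_PARAM_TO_OLLAMA = {
    "temperature": "temperature",
    "top_p": "top_p",
    "repeat_penalty": "repeat_penalty",
    "max_tokens": "num_predict",
}


def to_ollama_options(runtime_params: Mapping[str, Any]) -> dict:
    options = {}
    for name, ollama_name in _PARAM_TO_OLLAMA.items():
        value = runtime_params.get(name)
        if value is not None:
            # Forward only what the backend resolved. Anything omitted is
            # left to the model's Modelfile defaults.
            options[ollama_name] = value
    return options
```

Omitting absent keys, rather than filling in sidecar-side defaults, is what keeps the Modelfile as the only fallback.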
### Active Prompt Integration
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt in `ai_prompts`
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in database not code
- **Implementation**:
- Backend resolves Active Prompt for `ocr_extraction` type
- Backend extracts systemPrompt and DMS tags (`<document_number>`, `<document_date>`, `<received_date>`)
- Backend sends systemPrompt and DMS tags to sidecar
- Sidecar receives and injects into Ollama request in every load/generate call
- **Alternatives Rejected**:
- Create new `PromptBuilderService` → rejected (use existing ADR-029/037 system)
- Hardcode DMS tags in sidecar → rejected (violates ADR-036 parameter governance)
## Authentication Decisions
### Two-Phase Auth Migration
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- **Rationale**: Sequenced migration; network isolation only possible after server consolidation
- **Phase 1 Implementation**:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Continue validating X-API-Key on both sidecar and backend
- **Phase 2 Implementation** (after ADR-041 consolidation):
- Remove X-API-Key validation from sidecar endpoints
- Remove X-API-Key send-side from `OcrService`
- Remove X-API-Key send-side from `SandboxOcrEngineService`
- Rely on Docker-internal network isolation
- **Interim Period**: X-API-Key validation must remain active until ADR-041 cutover
- **Alternatives Considered**:
- Remove X-API-Key immediately → rejected (cross-host topology requires defense-in-depth)
- Keep X-API-Key permanently → rejected (adds complexity without value post-consolidation)
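
For the Phase 1 interim period, the header check itself can be as small as the sketch below. The source does not say how the sidecar compares keys. A constant-time comparison via `hmac.compare_digest` is an assumption of this sketch, and `is_valid_api_key` is a hypothetical helper name.

```python
import hmac
from typing import Optional


def is_valid_api_key(presented: Optional[str], expected: str) -> bool:
    # A missing or empty header never authenticates.
    if not presented:
        return False
    # compare_digest takes time independent of where the values differ,
    # which avoids leaking key prefixes through response timing.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

In Phase 2 this check is removed entirely, so keeping it isolated in one helper makes the later deletion a small change.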
## Endpoint Decisions
### Remove /normalize Endpoint
- **Decision**: Remove `/normalize` endpoint from sidecar
- **Rationale**: No consumers exist (verified by grep across backend codebase); ThaiPreprocessProcessor unused
- **Verification**: Grep search found no calls to `/normalize` or `THAI_PREPROCESS_URL`
- **Impact**: None - endpoint has no consumers
### Fix Mutable Default Argument
- **Decision**: Fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
- **Rationale**: Python anti-pattern; causes unexpected behavior when defaults are mutated
- **Implementation**: Change the signature to `options_override: Optional[dict] = None` and initialize to `{}` in the function body
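
The pitfall and its fix, shown on two hypothetical functions (`buggy` and `fixed` are illustrative names, not the sidecar's code):

```python
from typing import Optional


def buggy(page: int, options_override: dict = {}) -> dict:
    # The default dict is created once, at definition time, and is shared by
    # every call that omits the argument.
    options_override["last_page"] = page
    return options_override


def fixed(page: int, options_override: Optional[dict] = None) -> dict:
    # A fresh dict per call; a caller-supplied dict is copied, not mutated.
    options = dict(options_override or {})
    options["last_page"] = page
    return options
```

With `buggy`, state written during one request is visible to the next. With `fixed`, each call starts from its own dictionary.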
## Dependencies
### External Dependencies
- **FastAPI 0.111.0**: Web framework (already in use)
- **httpx 0.27.0**: HTTP client (switch from the synchronous `httpx.Client` to `httpx.AsyncClient`)
- **PyMuPDF 1.24.0**: PDF processing (already in use)
- **typhoon-ocr>=0.4.1**: OCR library (already in use)
- **FlagEmbedding>=1.2.0**: Embedding model (already in use)
- **pythainlp 5.0.4**: Thai NLP (already in use)
### Internal Dependencies
- **residency_policy.py**: Must retain for Adaptive OCR Residency
- **vram_monitor.py**: Must retain for VRAM monitoring
- **backend AI services**: OcrService, SandboxOcrEngineService must be updated for parameter resolution
## Testing Strategy
### Path Traversal Tests
- Test cases for various path traversal patterns (`../../etc/passwd`, symlinks, etc.)
- Expect 403 Forbidden for all malicious paths
- Use pytest for automated testing
### Residency Wiring Tests
- Unit test to verify `calculate_ocr_residency()` is called in `process_ocr`
- Verify keep_alive value is calculated dynamically, not fixed
- Test with different VRAM pressure scenarios
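
A wiring test of this kind can be sketched with `unittest.mock`. Everything here is a simplified stand-in: the `residency_policy` namespace replaces the real module, `process_ocr` models only the `keep_alive` wiring, and the `"5m"` and `"90s"` values are arbitrary.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the residency_policy module (hypothetical return value).
residency_policy = SimpleNamespace(calculate_ocr_residency=lambda profile: "5m")


def process_ocr(active_profile: dict) -> dict:
    # Simplified stand-in for the real handler: only the wiring is modelled.
    return {"keep_alive": residency_policy.calculate_ocr_residency(active_profile)}


def test_keep_alive_comes_from_policy() -> None:
    profile = {"name": "ocr-extract"}
    with mock.patch.object(
        residency_policy, "calculate_ocr_residency", return_value="90s"
    ) as spy:
        payload = process_ocr(profile)
    # The policy was consulted exactly once, with the active profile...
    spy.assert_called_once_with(profile)
    # ...and its answer, not a fixed constant, ended up in the payload.
    assert payload["keep_alive"] == "90s"
```

Asserting on both the call and the resulting value catches the case where the function is called but its result is then overwritten by a hardcoded `keep_alive`.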
### Performance Tests
- Benchmark async vs sync I/O under concurrent load
- Target: 20%+ throughput improvement
- Measure response times and resource utilization
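
The 20% target is a ratio of throughputs, which for a fixed number of requests reduces to a ratio of wall-clock times. The sketch below shows that calculation and a toy harness. `fake_ocr_request` is a hypothetical stand-in that only sleeps, so it demonstrates the measurement method, not the sidecar's real behaviour. A real benchmark would drive the old and new builds with actual OCR requests.

```python
import asyncio
import time


async def fake_ocr_request(delay_s: float) -> None:
    # Stand-in for an awaited HTTP call to the sidecar (hypothetical).
    await asyncio.sleep(delay_s)


async def measure(n: int, delay_s: float, concurrent: bool) -> float:
    """Wall-clock seconds to complete n requests."""
    start = time.perf_counter()
    if concurrent:
        await asyncio.gather(*(fake_ocr_request(delay_s) for _ in range(n)))
    else:
        for _ in range(n):
            await fake_ocr_request(delay_s)
    return time.perf_counter() - start


def throughput_gain(baseline_s: float, candidate_s: float) -> float:
    """Relative throughput improvement for the same request count.

    0.25 means the candidate completes 25% more requests per second.
    """
    return baseline_s / candidate_s - 1.0
```

A run that takes 1.2 s on the baseline and 1.0 s on the candidate gives a gain of 0.2, which is exactly the threshold named in the target.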
## Rollback Plan
If issues arise during deployment:
1. Revert `app.py` to previous version
2. Restore X-API-Key send-side in backend services
3. Re-pin `keep_alive` default to `0` in `process_ocr`
4. Restore hardcoded runtime params if needed for emergency fallback
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- CONTEXT.md: GPU Policy (LLM-First Ownership, CPU Fallback)
@@ -0,0 +1,168 @@
# Feature Specification: OCR Sidecar Refactor
**Feature Branch**: `140-ocr-sidecar-refactor`
**Created**: 2026-06-20
**Status**: Draft
**Input**: ADR-040: OCR Sidecar Refactor — Pure Compute Worker, Preserved GPU Policy, Network-Trust Boundary
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Sidecar Security Hardening (Priority: P1)
System administrators need to ensure the OCR sidecar on Desk-5439 is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
**Why this priority**: Security vulnerabilities (hardcoded API keys, path traversal) are critical risks that could lead to unauthorized access and data breaches.
**Independent Test**: Can be fully tested by sending path traversal requests and confirming they are rejected, and by verifying that the sidecar refuses to start when `OCR_SIDECAR_API_KEY` is missing instead of falling back to a hardcoded default key.
**Acceptance Scenarios**:
1. **Given** the sidecar's API key has leaked, **When** an administrator sets a new `OCR_SIDECAR_API_KEY` and restarts the container, **Then** the old key is rejected and no container rebuild is required
2. **Given** a malicious request with path traversal (e.g., `../../etc/passwd`), **When** the `/ocr` endpoint receives the request, **Then** the system returns 403 Forbidden
3. **Given** the sidecar starts without `OCR_SIDECAR_API_KEY` environment variable, **When** the container initializes, **Then** it fails fast with clear error message
---
### User Story 2 - GPU Resource Management (Priority: P1)
The system must prevent VRAM exhaustion on Desk-5439 (RTX 5060 Ti 16GB) by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring the LLM (Typhoon-2.5) has priority GPU access.
**Why this priority**: VRAM exhaustion causes complete system failure. The LLM-First GPU Ownership policy is critical for system stability.
**Independent Test**: Can be fully tested by monitoring VRAM usage during concurrent OCR and embedding operations, verifying that BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
**Acceptance Scenarios**:
1. **Given** the GPU is under heavy load from LLM operations, **When** an OCR request comes in, **Then** the system uses `calculate_ocr_residency()` to determine appropriate `keep_alive` value
2. **Given** VRAM is nearly full, **When** embedding or reranking requests are made, **Then** BGE-M3 and FlagReranker automatically fall back to CPU
3. **Given** the sidecar loads OCR model, **When** the operation completes, **Then** the model is unloaded based on residency policy (not fixed `keep_alive=0` or `300`)
---
### User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
Backend services need to control AI model parameters (temperature, top_p, repeat_penalty, max_tokens, keep_alive) from the database via `ai_execution_profiles` and `ai_prompts` tables, ensuring no hardcoded values in the sidecar.
**Why this priority**: This enables dynamic parameter tuning without container rebuilds, aligning with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System.
**Independent Test**: Can be fully tested by modifying `ai_execution_profiles` row `ocr-extract` and verifying that the sidecar uses the new parameters on the next request.
**Acceptance Scenarios**:
1. **Given** the `ai_execution_profiles` row `ocr-extract` has `temperature=0.7`, **When** the backend sends OCR request, **Then** the sidecar passes `temperature=0.7` to Ollama
2. **Given** the Active Prompt in `ai_prompts` contains system prompt and DMS tags, **When** the backend resolves the prompt, **Then** the sidecar receives and injects these into the Ollama request
3. **Given** a parameter is missing from the job snapshot, **When** the sidecar processes the request, **Then** it uses the Ollama Modelfile as last-resort fallback only
---
### User Story 4 - Async I/O Performance (Priority: P2)
The sidecar must use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
**Why this priority**: Synchronous blocking I/O reduces system throughput and can cause request timeouts under load.
**Independent Test**: Can be fully tested by running concurrent OCR requests and measuring response times, verifying that async implementation handles load without blocking.
**Acceptance Scenarios**:
1. **Given** the sidecar receives multiple concurrent OCR requests, **When** processing with `httpx.AsyncClient`, **Then** requests do not block each other
2. **Given** the sidecar starts up, **When** models are loaded, **Then** loading happens via `asyncio.to_thread` to avoid blocking startup
3. **Given** the sidecar is under load, **When** measuring request latency, **Then** async implementation shows improved throughput compared to sync version
---
### User Story 5 - Network Isolation Auth (Phase 2, Post-Consolidation) (Priority: P3)
After ADR-041 server consolidation completes (single Docker host), the system should remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
**Why this priority**: This is a future-phase improvement that simplifies the system after infrastructure consolidation. It's lower priority as it depends on ADR-041 completion.
**Independent Test**: Can be fully tested after consolidation by removing X-API-Key headers and verifying that requests from within Docker network succeed while external requests fail.
**Acceptance Scenarios**:
1. **Given** ADR-041 consolidation is complete (single Docker host), **When** backend calls sidecar without X-API-Key, **Then** the request succeeds via Docker-internal network
2. **Given** consolidation is complete, **When** external network attempts to call sidecar, **Then** the request is blocked by network isolation
3. **Given** the interim period (before consolidation), **When** backend calls sidecar, **Then** X-API-Key validation is still active
---
### Edge Cases
- What happens when the OCR sidecar receives a request for a PDF file that does not exist within the whitelisted base path? (Tested via path traversal test T007)
- How does the system handle VRAM exhaustion when both LLM and OCR models attempt to load simultaneously?
- What happens when the `ai_execution_profiles` row `ocr-extract` is missing or has invalid parameter values?
- How does the sidecar handle Ollama service unavailability or timeout during OCR processing? (Handled by FastAPI exception handling with user-friendly error messages per ADR-007)
- What happens when the Active Prompt system is unavailable during OCR request processing?
- How does the system handle concurrent requests when GPU is under extreme pressure (e.g., 95% VRAM usage)?
- What happens when path canonicalization resolves to a symlink outside the base path? (Tested via path traversal test T007 with symlink scenarios)
- How does the system behave during the transition period between Phase 1 (X-API-Key) and Phase 2 (Network Isolation)?
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: Sidecar MUST remove hardcoded default API key and fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
- **FR-002**: Sidecar MUST implement path canonicalization via `os.path.abspath()` + `os.path.realpath()` on all PDF path inputs
- **FR-003**: Sidecar MUST enforce base-path whitelist check on `/ocr` endpoint, rejecting paths outside `OCR_SIDECAR_UPLOAD_BASE` with 403 Forbidden
- **FR-004**: Sidecar MUST refactor `process_ocr` to use `async def` and `httpx.AsyncClient` via lifespan context manager
- **FR-005**: Sidecar MUST replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan pattern
- **FR-006**: Sidecar MUST wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive` calculation
- **FR-007**: Sidecar MUST NOT accept explicit `options_override["keep_alive"]` from backend (keep_alive must be calculated lazily per ADR-036 Gap-2)
- **FR-008**: Sidecar MUST retain `vram_monitor.py` and `residency_policy.py` modules (reject deletion)
- **FR-009**: Sidecar MUST retain dynamic CPU/GPU selection for `/embed` and `/rerank` endpoints via `.to(device)` logic
- **FR-010**: Sidecar MUST remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) and accept from backend job snapshot
- **FR-011**: Sidecar MUST receive systemPrompt and DMS extraction tags from backend and pass to Ollama in every load/generate call
- **FR-012**: Sidecar MUST remove `/normalize` endpoint (ThaiPreprocessProcessor has no consumers)
- **FR-013**: Sidecar MUST fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
- **FR-014**: Sidecar MUST load models via `asyncio.to_thread` during lifespan to avoid blocking startup
- **FR-015**: Backend MUST resolve runtime parameters from `ai_execution_profiles` row `ocr-extract` and send to sidecar
- **FR-016**: Backend MUST resolve systemPrompt and DMS tags from Active Prompt in `ai_prompts` (ADR-029/037)
- **FR-017**: Backend MUST send resolved parameters to sidecar in every OCR request
- **FR-018**: Phase 2 (post-ADR-041): Sidecar MUST remove X-API-Key validation from all endpoints
- **FR-019**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `OcrService`
- **FR-020**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `SandboxOcrEngineService`
### Key Entities
- **OCR Sidecar (FastAPI Service)**: Pure compute worker on Desk-5439 that provides `/ocr`, `/embed`, `/rerank` endpoints. No business logic or parameter governance. Receives parameters from backend.
- **ai_execution_profiles**: Database table containing runtime parameter profiles for different AI operations (row `ocr-extract` for OCR parameters)
- **ai_prompts**: Database table containing prompt templates with versioning and activation status (ADR-029/037)
- **Backend OcrService**: Service that orchestrates OCR requests, resolves parameters from database, and sends to sidecar
- **Backend SandboxOcrEngineService**: Service for OCR sandbox testing, similar parameter resolution as OcrService
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: Path traversal attacks return 403 Forbidden in 100% of test cases (verified by pytest suite)
- **SC-002**: VRAM exhaustion is prevented under load; system remains stable with LLM-First GPU Ownership policy (verified by VRAM monitoring during stress test)
- **SC-003**: OCR request throughput improves by at least 20% with async I/O implementation (measured by concurrent request benchmark)
- **SC-004**: Parameter changes in `ai_execution_profiles` take effect immediately without container rebuild (verified by runtime parameter update test)
- **SC-005**: System startup time does not increase despite async model loading (measured by container startup benchmark)
- **SC-006**: No hardcoded secrets remain in sidecar codebase (verified by code audit)
- **SC-007**: All sidecar endpoints respect network isolation after ADR-041 consolidation (verified by network access test)
- **SC-008**: CPU fallback for BGE-M3 and FlagReranker activates correctly when GPU is under pressure (verified by VRAM monitoring test)
## Assumptions
- ADR-041 server consolidation will complete before Phase 2 (X-API-Key removal) can be implemented
- Desk-5439 (192.168.10.100) will continue to host the OCR sidecar with RTX 5060 Ti 16GB GPU
- Ollama service on Desk-5439 will continue to provide Typhoon OCR model
- ThaiPreprocessProcessor has no active consumers (verified by grep search across backend codebase)
- `calculate_ocr_residency()` function exists in `residency_policy.py` and is not currently wired into `process_ocr`
- VLAN/firewall ACL provides interim network security before ADR-041 consolidation
## Dependencies
- ADR-041 Server Consolidation must complete before Phase 2 (X-API-Key removal)
- ADR-036 Profile-Only Parameter Governance must be implemented for parameter resolution
- ADR-029 Dynamic Prompt Management must be implemented for Active Prompt system
- ADR-037 Active Prompt System must be operational for system prompt injection
- Desk-5439 infrastructure must remain stable (GPU, network, Ollama service)
## Out of Scope
- 1-page-1-request horizontal scaling rework (separate future ADR)
- OpenTelemetry/Prometheus/Grafana observability (separate ticket)
- `/normalize` endpoint functionality (removed per D2; ThaiPreprocessProcessor has no consumers)
@@ -0,0 +1,296 @@
# Tasks: OCR Sidecar Refactor
**Input**: Design documents from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/sidecar-api.md, quickstart.md
**Tests**: Tests are included for path-traversal protection and residency wiring (per spec acceptance criteria)
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Path Conventions
- **Sidecar**: `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/`
- **Backend**: `backend/src/modules/ai/`
- **Tests**: `tests/unit/ocr-sidecar/`, `tests/integration/ocr-sidecar/`
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [x] T001 Create test directory structure in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests/
- [x] T002 Create test directory structure in tests/unit/ocr-sidecar/
- [x] T003 Create test directory structure in tests/integration/ocr-sidecar/
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [x] T004 Update requirements.txt in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/requirements.txt (add httpx 0.27.0, remove numpy if present)
- [x] T005 Update .env template in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env (add OCR_SIDECAR_API_KEY placeholder)
- [x] T006 Update backend .env.example in backend/.env.example (add OCR_API_URL, OCR_API_KEY placeholders)
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
---
## Phase 3: User Story 1 - Sidecar Security Hardening (Priority: P1) 🎯 MVP
**Goal**: Ensure the OCR sidecar is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
**Independent Test**: Attempt path traversal requests and verify they return 403 Forbidden; verify sidecar fails fast when OCR_SIDECAR_API_KEY env is missing.
### Tests for User Story 1
- [x] T007 [P] [US1] Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py (test various path patterns: ../../etc/passwd, symlinks outside base path, etc.)
- [x] T008 [P] [US1] Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py (test missing key, invalid key scenarios)
### Implementation for User Story 1
- [x] T009 [US1] Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T010 [US1] Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (raise error on startup if missing)
- [x] T011 [US1] Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (using os.path.abspath + os.path.realpath)
- [x] T012 [US1] Implement base-path whitelist check in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check against OCR_SIDECAR_UPLOAD_BASE)
- [x] T013 [US1] Add path validation to POST /ocr endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (return 403 for invalid paths)
- [x] T014 [US1] Fix mutable default argument options_override={} in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (change to None and initialize in function body)
- [x] T015 [US1] Remove duplicate import tempfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
---
## Phase 4: User Story 2 - GPU Resource Management (Priority: P1)
**Goal**: Prevent VRAM exhaustion on Desk-5439 by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring LLM has priority GPU access.
**Independent Test**: Monitor VRAM usage during concurrent OCR and embedding operations; verify BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
### Tests for User Story 2
- [x] T016 [P] [US2] Create residency wiring unit test in tests/unit/ocr-sidecar/test_residency_wiring.py (verify calculate_ocr_residency is called in process_ocr)
- [x] T017 [P] [US2] Create CPU fallback integration test in tests/integration/ocr-sidecar/test_cpu_fallback.py (verify BGE-M3 and FlagReranker use CPU when GPU under pressure)
### Implementation for User Story 2
- [x] T018 [US2] Import calculate_ocr_residency from residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T019 [US2] Wire calculate_ocr_residency(active_profile) into process_ocr function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T020 [US2] Remove hardcoded keep_alive=0 in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T021 [US2] Reject explicit options_override["keep_alive"] from backend in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (keep_alive must be calculated lazily per ADR-036 Gap-2)
- [x] T022 [US2] Retain vram_monitor.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
- [x] T023 [US2] Retain residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
- [x] T024 [US2] Verify dynamic CPU/GPU selection exists for /embed endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
- [x] T025 [US2] Verify dynamic CPU/GPU selection exists for /rerank endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
---
## Phase 5: User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
**Goal**: Enable backend services to control AI model parameters from the database via ai_execution_profiles and ai_prompts tables, ensuring no hardcoded values in the sidecar.
**Independent Test**: Modify ai_execution_profiles row ocr-extract and verify that the sidecar uses the new parameters on the next request.
### Tests for User Story 3
- [x] T026 [P] [US3] Create parameter resolution integration test in tests/integration/ocr-sidecar/test_parameter_governance.py (verify parameters from ai_execution_profiles are used)
- [x] T027 [P] [US3] Create Active Prompt integration test in tests/integration/ocr-sidecar/test_active_prompt.py (verify systemPrompt and DMS tags from ai_prompts are used)
### Implementation for User Story 3
- [x] T028 [US3] Remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T029 [US3] Add runtime_params field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T030 [US3] Add system_prompt field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T031 [US3] Add dms_tags field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T032 [US3] Pass runtime_params to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T033 [US3] Pass system_prompt to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
- [x] T034 [US3] Pass dms_tags to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
- [x] T035 [US3] Implement parameter resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_execution_profiles row ocr-extract)
- [x] T036 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_prompts type ocr_extraction)
- [x] T037 [US3] Extract systemPrompt and DMS tags in backend/src/modules/ai/services/ocr.service.ts
- [x] T038 [US3] Send resolved parameters to sidecar in backend/src/modules/ai/services/ocr.service.ts
- [x] T039 [US3] Implement parameter resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
- [x] T040 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
**Checkpoint**: At this point, User Stories 1, 2, AND 3 should all work independently
---
## Phase 6: User Story 4 - Async I/O Performance (Priority: P2)
**Goal**: Use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
**Independent Test**: Run concurrent OCR requests and measure response times; verify async implementation handles load without blocking.
### Tests for User Story 4
- [x] T041 [P] [US4] Create async I/O performance test in tests/integration/ocr-sidecar/test_async_performance.py (benchmark concurrent requests)
### Implementation for User Story 4
- [x] T042 [US4] Refactor process_ocr to async def in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T043 [US4] Create AsyncClient via lifespan context manager in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T044 [US4] Replace httpx.Client with httpx.AsyncClient in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T045 [US4] Replace @app.on_event("startup") with @asynccontextmanager lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T046 [US4] Load models via asyncio.to_thread during lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (avoid blocking startup)
---
## Phase 7: User Story 5 - Network Isolation Auth Phase 2 (Priority: P3)
**Goal**: After ADR-041 server consolidation completes, remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
**Independent Test**: After consolidation, remove X-API-Key headers and verify that requests from within Docker network succeed while external requests fail.
### Tests for User Story 5
- [ ] T047 [P] [US5] Create network isolation test in tests/integration/ocr-sidecar/test_network_isolation.py (verify Docker-internal requests work, external requests fail)
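One possible building block for T047 is a plain TCP reachability probe. The sketch below assumes nothing about the real test file; the function name `is_reachable` is hypothetical, and the host names used when calling it would depend on the final Compose service names.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Run from a container on the internal network, a probe of the sidecar's service name on port 8765 would be expected to return `True`; run from a LAN machine against the host's IP on port 8765, it would be expected to return `False` once no port is published.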
### Implementation for User Story 5 (BLOCKED until ADR-041 consolidation complete)
- [ ] T048 [US5] Remove X-API-Key validation from all endpoints in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [ ] T049 [US5] Remove OCR_SIDECAR_API_KEY from .env in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
- [ ] T050 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/ocr.service.ts
- [ ] T051 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
- [ ] T052 [US5] Remove OCR_API_KEY from backend .env in backend/.env
- [ ] T053 [US5] Update OCR_API_URL to the Docker-internal URL in backend/.env (e.g., http://sidecar:8765, using the sidecar's Compose service name as the host)
**Note**: Phase 7 tasks are BLOCKED until ADR-041 server consolidation completes. Do not implement until ADR-041 cutover is successful.
---
## Phase 8: Remove /normalize Endpoint (Cross-Cutting)
**Purpose**: Remove unused /normalize endpoint per ADR-040 D2
- [x] T054 Remove /normalize endpoint from specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T055 Verify no consumers exist via grep search in backend codebase
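The grep check in T055 could also be scripted so it can run in CI. The following is a rough sketch under assumptions: the function name, the file suffixes, and the `node_modules` exclusion are illustrative choices, not part of the original task.

```python
from pathlib import Path


def find_references(root: str, needle: str = "/normalize",
                    suffixes: tuple = (".ts", ".js", ".py")) -> list:
    """Return (path, line_number, line) for each source line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if "node_modules" in path.parts:
            continue  # skip vendored dependencies
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

An empty result for `find_references("backend/src")` would support removing the endpoint. Note that a plain substring match also flags longer paths that merely start with `/normalize`, so any hits still need a manual look.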
---
## Phase 9: Polish & Cross-Cutting Concerns
**Purpose**: Improvements that affect multiple user stories
- [x] T056 [P] Update Dockerfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/Dockerfile (if any changes needed)
- [x] T057 [P] Update docker-compose.yml in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml (if any changes needed)
- [x] T058 Run path traversal test suite and verify all tests pass
- [x] T059 Run residency wiring test suite and verify all tests pass
- [x] T060 Run parameter governance test suite and verify all tests pass
- [x] T061 Run async performance test and verify 20%+ throughput improvement
- [x] T062 Update documentation in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/README.md
- [x] T063 Validate quickstart.md deployment steps on Desk-5439
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - can start immediately
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
- **User Stories (Phase 3-6)**: All depend on Foundational phase completion
  - User Stories 1-4 (P1, P1, P2, P2) can proceed in parallel after Phase 2
  - User Story 5 (P3) is BLOCKED until ADR-041 consolidation completes
- **Remove /normalize (Phase 8)**: Can run in parallel with user stories (no dependencies)
- **Polish (Phase 9)**: Depends on all desired user stories being complete
### User Story Dependencies
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 2 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 3 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 4 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 5 (P3)**: BLOCKED until ADR-041 consolidation completes
### Within Each User Story
- Tests MUST be written and FAIL before implementation (TDD approach)
- Sidecar implementation before backend implementation (for parameter governance story)
- Core implementation before integration
- Story complete before moving to next priority
### Parallel Opportunities
- All Setup tasks (T001-T003) can run in parallel
- All Foundational tasks (T004-T006) can run in parallel
- Once Foundational phase completes, User Stories 1-4 can start in parallel (if team capacity allows)
- All tests for a user story marked [P] can run in parallel
- User Story 5 tasks can run in parallel once ADR-041 consolidation completes
- Remove /normalize tasks (T054-T055) can run in parallel with user stories
- Polish tasks (T056-T057) can run in parallel
---
## Parallel Example: User Story 1
```bash
# Launch all tests for User Story 1 together:
Task: "Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py"
Task: "Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py"
# Launch implementation tasks sequentially (each depends on previous):
Task: "Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
Task: "Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
Task: "Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
```
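The path canonicalization task named above can be illustrated as follows. This is a sketch of one common approach (resolve, then compare against the base), not the function actually implemented in `app.py`; the name `canonicalize_upload_path` is hypothetical, and the default base `/mnt/uploads` is taken from the sidecar mount point used elsewhere in these specs.

```python
import os


def canonicalize_upload_path(requested: str, base: str = "/mnt/uploads") -> str:
    """Resolve requested against base and reject anything that escapes base."""
    base_real = os.path.realpath(base)
    # realpath collapses ".." segments and follows symlinks; an absolute
    # `requested` replaces the base entirely and is caught by the check below.
    candidate = os.path.realpath(os.path.join(base_real, requested))
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError(f"path escapes upload base: {requested!r}")
    return candidate
```

Comparing with `os.path.commonpath` rather than a string prefix avoids accepting sibling directories such as `/mnt/uploads-other`.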
---
## Implementation Strategy
### MVP First (User Stories 1-2 Only - Critical Security & GPU Management)
1. Complete Phase 1: Setup
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
3. Complete Phase 3: User Story 1 (Security Hardening)
4. Complete Phase 4: User Story 2 (GPU Resource Management)
5. **STOP and VALIDATE**: Test User Stories 1-2 independently
6. Deploy/demo if ready
### Incremental Delivery
1. Complete Setup + Foundational → Foundation ready
2. Add User Story 1 → Test independently → Deploy/Demo (Security MVP!)
3. Add User Story 2 → Test independently → Deploy/Demo (GPU Management MVP!)
4. Add User Story 3 → Test independently → Deploy/Demo (Parameter Governance)
5. Add User Story 4 → Test independently → Deploy/Demo (Async Performance)
6. Wait for ADR-041 consolidation → Add User Story 5 → Test independently → Deploy/Demo
7. Each story adds value without breaking previous stories
### Parallel Team Strategy
With multiple developers:
1. Team completes Setup + Foundational together
2. Once Foundational is done:
   - Developer A: User Story 1 (Security)
   - Developer B: User Story 2 (GPU Management)
   - Developer C: User Story 3 (Parameter Governance)
   - Developer D: User Story 4 (Async I/O)
3. Stories complete and integrate independently
4. After ADR-041 consolidation, Developer A or E takes User Story 5 (Network Isolation)
---
## Notes
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- Each user story should be independently completable and testable
- Verify tests fail before implementing
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- User Story 5 is BLOCKED until ADR-041 consolidation completes
- Phase 7 tasks should NOT be started until ADR-041 cutover is successful
- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence
@@ -0,0 +1,36 @@
# Specification Quality Checklist: Single-Host Server Consolidation
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-20
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs) — spec focuses on operational outcomes
- [x] Focused on user value and business needs — admin/ops workflows clearly defined
- [x] Written for non-technical stakeholders — user stories describe journeys, not code
- [x] All mandatory sections completed — User Scenarios, Requirements, Success Criteria all filled
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain — all requirements have clear definitions
- [x] Requirements are testable and unambiguous — each FR has measurable acceptance criteria
- [x] Success criteria are measurable — SC-001 through SC-010 have specific metrics
- [x] Success criteria are technology-agnostic — focus on outcomes (parity, latency, uptime) not tools
- [x] All acceptance scenarios are defined — 5 user stories with Given/When/Then scenarios
- [x] Edge cases are identified — 7 edge cases covering GPU OOM, RAM, CIFS, SPOF, network, migration failures
- [x] Scope is clearly bounded — includes provisioning, migration, cutover, security, decommission
- [x] Dependencies and assumptions identified — 7 assumptions documented
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria — FR-001 through FR-015 mapped to user stories
- [x] User scenarios cover primary flows — P1 (provision) → P2 (migrate) → P3 (cutover) → P4 (security) → P5 (decommission)
- [x] Feature meets measurable outcomes defined in Success Criteria — 10 measurable outcomes
- [x] No implementation details leak into specification — Docker/tech names are inherent to infra spec but kept at architecture level
## Notes
- This is an infrastructure specification based on ADR-041; some technical terms (Docker, CIFS, VRAM) are inherent to the domain
- ADR-040 (OCR Sidecar Refactor) is a hard dependency for FR-008 (remove X-API-Key) and FR-009 (GPU VRAM management)
- Spec is ready for `/speckit-clarify` or `/speckit-plan`
@@ -0,0 +1,69 @@
# Docker Compose Contract: New Host
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
This contract defines the service topology for the consolidated single-host deployment.
The actual `docker-compose.new-host.yml` will be created at:
`specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
## Service Topology
| Service | Image | Networks | LAN Ports | Internal Port | Memory Limit | Depends On |
|---------|-------|----------|-----------|---------------|--------------|------------|
| ollama | ollama/ollama:latest | dms-internal | none | 11434 | 2G (host) | — |
| ocr-sidecar | build (local) | dms-internal | none | 8765 | 1G | ollama |
| backend | lcbp3-backend:latest | dms-internal, dms-frontend | 3001→3000 | 3000 | 2G | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
| frontend | lcbp3-frontend:latest | dms-frontend | 3000 | 3000 | 1G | backend |
| redis | redis:7-alpine | dms-internal | none | 6379 | 1G | — |
| mariadb | mariadb:11.8 | dms-internal | none | 3306 | 8G | — |
| elasticsearch | elasticsearch:8.11.1 | dms-internal | none | 9200 | 4G | — |
| qdrant | qdrant/qdrant:v1.16.1 | dms-internal | none | 6333 | 1G | — |
| clamav | clamav/clamav:1.4.4 | dms-internal | none | 3310 | 2G | — |
| ollama-metrics | ghcr.io/norskhelsenett/ollama-metrics:latest | dms-internal, dms-frontend | 9924 | 9924 | 256M | ollama |
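As a quick sanity check on the Memory Limit column, the limits can be summed and compared with the host's RAM. The sketch below copies the values from the table; it treats the `2G (host)` entry for ollama as a plain 2G limit, which is an assumption, and the 28 GB usage target and 32 GB total come from the plan's constraints.

```python
# Memory limits copied from the Service Topology table.
MEMORY_LIMITS = {
    "ollama": "2G",  # table says "2G (host)"; treated as 2G here (assumption)
    "ocr-sidecar": "1G",
    "backend": "2G",
    "frontend": "1G",
    "redis": "1G",
    "mariadb": "8G",
    "elasticsearch": "4G",
    "qdrant": "1G",
    "clamav": "2G",
    "ollama-metrics": "256M",
}

UNIT_TO_GIB = {"M": 1 / 1024, "G": 1.0}


def to_gib(limit: str) -> float:
    """Convert a Compose-style limit such as '256M' or '2G' to GiB."""
    return float(limit[:-1]) * UNIT_TO_GIB[limit[-1]]


def total_gib(limits: dict) -> float:
    """Sum all per-container limits in GiB."""
    return sum(to_gib(value) for value in limits.values())
```

With these values `total_gib(MEMORY_LIMITS)` is 22.25, which leaves room below the 28 GB target. Container limits cap usage rather than reserve it, so this is an upper bound for the containers, not a measurement.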
## Network Topology
```
dms-internal (bridge, no LAN access)
├── ollama:11434
├── ocr-sidecar:8765
├── backend:3000 (also on dms-frontend)
├── redis:6379
├── mariadb:3306
├── elasticsearch:9200
├── qdrant:6333
├── clamav:3310
└── ollama-metrics:9924
dms-frontend (bridge, LAN published)
├── frontend:3000 → LAN:3000
├── backend:3000 → LAN:3001 (NPM routes backend.np-dms.work → :3001)
└── ollama-metrics:9924 → LAN:9924 (Prometheus scrape target)
```
## Environment Variables (New)
| Variable | Default | Description |
|----------|---------|-------------|
| ASUSTOR_USER | (required) | CIFS share username |
| ASUSTOR_PASS | (required) | CIFS share password |
| NEW_HOST_IP | (required) | New host LAN IP for CI/CD deploy target |
## Environment Variables (Changed from QNAP)
| Variable | Old Value (QNAP) | New Value (New Host) |
|----------|------------------|---------------------|
| DB_HOST | mariadb | mariadb (unchanged — Docker DNS) |
| REDIS_HOST | cache | redis (service name change) |
| ELASTICSEARCH_HOST | search | elasticsearch (service name change) |
| QDRANT_HOST | qdrant | qdrant (unchanged) |
| OCR_API_URL | http://192.168.10.100:8765 | http://ocr-sidecar:8765 |
| OLLAMA_API_URL | http://192.168.10.100:11434 | http://ollama:11434 |
| CLAMAV_HOST | clamav | clamav (unchanged) |
## Removed Environment Variables
| Variable | Reason |
|----------|--------|
| OCR_SIDECAR_API_KEY | ADR-040 D5 — network-only auth, no API key needed |
| OCR_SIDECAR_UPLOAD_BASE | Not removed: still required, and its value stays `/mnt/uploads` |
@@ -0,0 +1,230 @@
// File: specs/100-Infrastructures/141-server-consolidation/data-model.md
// Change Log:
// - 2026-06-20: Initial data model for Single-Host Server Consolidation
# Data Model: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
## Infrastructure Entities
### 1. Docker Network: dms-internal
| Attribute | Type | Description |
|-----------|------|-------------|
| name | string | `dms-internal` |
| driver | string | `bridge` |
| scope | string | local (single host) |
| published_ports | none | No ports published to LAN |
**Members**: ollama, ocr-sidecar, backend, redis, mariadb, elasticsearch, qdrant, clamav, ollama-metrics
### 2. Docker Network: dms-frontend
| Attribute | Type | Description |
|-----------|------|-------------|
| name | string | `dms-frontend` |
| driver | string | `bridge` |
| scope | string | local (single host) |
| published_ports | 3000 (frontend), 3001→3000 (backend), 9924 (ollama-metrics) | Only ports published to LAN |
**Members**: frontend, backend, ollama-metrics
### 3. Docker Volume: asustor_uploads
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` |
| type | string | `cifs` |
| device | string | `//192.168.10.9/np-dms-as/data/uploads` |
| mount_options | string | `username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0` |
| mount_point (sidecar) | string | `/mnt/uploads` (read-only) |
| mount_point (backend) | string | `/app/uploads` (read-write) |
### 4. Docker Volume: ollama_models
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/root/.ollama` |
| content | string | Ollama model files (np-dms-ai, np-dms-ocr, nomic-embed-text) |
### 5. Docker Volume: mariadb_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/var/lib/mysql` |
| content | string | MariaDB data files (migrated from QNAP) |
### 6. Docker Volume: es_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/usr/share/elasticsearch/data` |
| content | string | Elasticsearch indices (migrated from QNAP) |
### 7. Docker Volume: redis_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/data` |
| content | string | Redis AOF persistence + BullMQ queue data |
### 8. Docker Volume: qdrant_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/qdrant/storage` |
| content | string | Qdrant vector collections |
## Service Definitions
### ollama
| Attribute | Value |
|-----------|-------|
| image | `ollama/ollama:latest` |
| GPU | NVIDIA RTX 5060 Ti 16GB (passthrough) |
| network | dms-internal only |
| ports | none (expose 11434 internal only) |
| volumes | ollama_models → /root/.ollama |
| depends_on | none |
| healthcheck | `ollama list` (verify API responsive) |
### ocr-sidecar
| Attribute | Value |
|-----------|-------|
| build | `./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar` |
| network | dms-internal only |
| ports | none (expose 8765 internal only) |
| volumes | asustor_uploads → /mnt/uploads (read-only) |
| depends_on | ollama |
| env | OLLAMA_API_URL=http://ollama:11434, OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads |
| healthcheck | `curl -f http://localhost:8765/health` |
### backend
| Attribute | Value |
|-----------|-------|
| image | `lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}` |
| networks | dms-internal + dms-frontend |
| ports | 3001:3000 (published to LAN — NPM routes `backend.np-dms.work` → :3001) |
| volumes | asustor_uploads → /app/uploads (read-write) |
| depends_on | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
| env | OCR_API_URL=http://ocr-sidecar:8765, OLLAMA_API_URL=http://ollama:11434, DB_HOST=mariadb, REDIS_HOST=redis, ELASTICSEARCH_HOST=elasticsearch, QDRANT_HOST=qdrant |
| healthcheck | `curl -f http://localhost:3000/health` |
| memory_limit | 2G |
### frontend
| Attribute | Value |
|-----------|-------|
| image | `lcbp3-frontend:${FRONTEND_IMAGE_TAG:-latest}` |
| networks | dms-frontend only |
| ports | 3000:3000 (published to LAN) |
| depends_on | backend |
| env | INTERNAL_API_URL=http://backend:3000/api |
| healthcheck | `curl -f http://localhost:3000/` |
| memory_limit | 1G |
### redis
| Attribute | Value |
|-----------|-------|
| image | `redis:7-alpine` |
| network | dms-internal only |
| ports | none (expose 6379 internal only) |
| volumes | redis_data → /data |
| command | `redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes --maxmemory-policy noeviction` |
| healthcheck | `redis-cli -a ${REDIS_PASSWORD} --no-auth-warning ping` |
| memory_limit | 1G |
### mariadb
| Attribute | Value |
|-----------|-------|
| image | `mariadb:11.8` |
| network | dms-internal only |
| ports | none (expose 3306 internal only) |
| volumes | mariadb_data → /var/lib/mysql |
| env | MARIADB_ROOT_PASSWORD, MARIADB_DATABASE=lcbp3, MARIADB_USER=center |
| command | `--character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci` |
| healthcheck | `healthcheck.sh --connect --innodb_initialized` |
| memory_limit | 8G |
### elasticsearch
| Attribute | Value |
|-----------|-------|
| image | `elasticsearch:8.11.1` |
| network | dms-internal only |
| ports | none (expose 9200 internal only) |
| volumes | es_data → /usr/share/elasticsearch/data |
| env | discovery.type=single-node, xpack.security.enabled=false, ES_JAVA_OPTS=-Xms2g -Xmx2g |
| healthcheck | `curl -s http://localhost:9200/_cluster/health` |
| memory_limit | 4G |
### qdrant
| Attribute | Value |
|-----------|-------|
| image | `qdrant/qdrant:v1.16.1` |
| network | dms-internal only |
| ports | none (expose 6333 internal only) |
| volumes | qdrant_data → /qdrant/storage |
| healthcheck | TCP check on port 6333 |
| memory_limit | 1G |
### clamav
| Attribute | Value |
|-----------|-------|
| image | `clamav/clamav:1.4.4` |
| network | dms-internal only |
| ports | none (expose 3310 internal only) |
| healthcheck | `clamdcheck.sh` |
| memory_limit | 2G |
### ollama-metrics
| Attribute | Value |
|-----------|-------|
| image | `ghcr.io/norskhelsenett/ollama-metrics:latest` |
| networks | dms-internal + dms-frontend |
| ports | 9924:9924 (published to LAN — Prometheus on ASUSTOR scrapes `http://<new-host-ip>:9924/metrics`) |
| env | OLLAMA_HOST=http://ollama:11434 |
| memory_limit | 256M |
## Service Communication Map
```
LAN (VLAN 10)
├── :3000 (Frontend) ──→ http://backend:3000/api (dms-frontend)
├── :3001 (Backend) ──→ http://backend:3000/api (dms-frontend)
└── :9924 (ollama-metrics) ──→ Prometheus scrape target

backend
├──→ mariadb:3306 (dms-internal)
├──→ redis:6379 (dms-internal)
├──→ elasticsearch:9200 (dms-internal)
├──→ qdrant:6333 (dms-internal)
├──→ clamav:3310 (dms-internal)
├──→ ocr-sidecar:8765 (dms-internal)
└──→ ollama:11434 (dms-internal)
```
## Path Mapping
| Service | Container Path | Source |
|---------|---------------|--------|
| Backend | `/app/uploads/temp` | ASUSTOR CIFS `/data/uploads/temp` |
| Backend | `/app/uploads/permanent` | ASUSTOR CIFS `/data/uploads/permanent` |
| Sidecar | `/mnt/uploads/temp` (read-only) | ASUSTOR CIFS `/data/uploads/temp` |
| Sidecar | `/mnt/uploads/permanent` (read-only) | ASUSTOR CIFS `/data/uploads/permanent` |
**Note**: Backend uses `/app/uploads` (read-write), Sidecar uses `/mnt/uploads` (read-only). Both map to the same ASUSTOR CIFS share. Path remapping in `ocr.service.ts` (`remapPath()`) continues to work — strip `/app/uploads` and replace with `/mnt/uploads`.
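The remapping rule in the note can be written down in a few lines. The real implementation is `remapPath()` in the TypeScript backend and is not shown here; the Python sketch below only illustrates the rule, and the function name, defaults, and error handling are assumptions.

```python
import posixpath


def remap_path(backend_path: str,
               backend_base: str = "/app/uploads",
               sidecar_base: str = "/mnt/uploads") -> str:
    """Translate a backend container path into the sidecar's view of the share."""
    normalized = posixpath.normpath(backend_path)
    inside = (normalized == backend_base
              or normalized.startswith(backend_base + "/"))
    if not inside:
        raise ValueError(f"not under {backend_base}: {backend_path!r}")
    # Strip the backend prefix and prepend the sidecar prefix.
    return sidecar_base + normalized[len(backend_base):]
```

For example, `remap_path("/app/uploads/temp/a.pdf")` yields `/mnt/uploads/temp/a.pdf`. Normalizing first means a path such as `/app/uploads/../secret` is rejected instead of being remapped.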
@@ -0,0 +1,124 @@
// File: specs/100-Infrastructures/141-server-consolidation/plan.md
// Change Log:
// - 2026-06-20: Initial implementation plan for Single-Host Server Consolidation
# Implementation Plan: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/100-Infrastructures/141-server-consolidation/spec.md`
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md)
## Summary
Consolidate all LCBP3-DMS services from a 2-host architecture (QNAP NAS + Desk-5439) onto a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB). ASUSTOR becomes primary NAS for file storage via CIFS. Docker internal bridge network isolates Ollama and OCR Sidecar from LAN, enabling removal of X-API-Key auth (ADR-040 D5). QNAP becomes backup server; Desk-5439 is retired.
## Technical Context
**Language/Version**: Docker Compose v2 (YAML), Bash scripts, PowerShell provisioning
**Primary Dependencies**: Docker Engine 24+, Docker Compose v2, NVIDIA Container Toolkit, CIFS Utils
**Storage**: MariaDB 11.8 (Docker volume), Elasticsearch 8.11 (Docker volume), Redis 7 (Docker volume), Qdrant v1.16 (Docker volume), ASUSTOR CIFS for file uploads
**Testing**: Smoke tests (manual + scripted), health check endpoints, data parity verification scripts
**Target Platform**: Linux (Ubuntu 22.04 LTS or Debian 12) on Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB
**Project Type**: Infrastructure (Docker Compose stack + provisioning scripts)
**Performance Goals**: Backend-to-Ollama latency <50ms (localhost vs ~2ms LAN), all containers healthy within 5 min
**Constraints**: 32GB RAM total (target <28GB usage), 16GB VRAM (target <15GB usage), CIFS mount reliability
**Scale/Scope**: 8 containers (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, ES, Qdrant) + ClamAV + ollama-metrics
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
| Principle | Status | Notes |
|-----------|--------|-------|
| ADR-016 Security | ✅ Pass | Network isolation replaces API key; no ports published for internal services |
| ADR-019 UUID | ✅ Pass | No UUID changes — infrastructure only |
| ADR-009 Schema | ✅ Pass | No schema changes — data migration via dump/restore |
| ADR-023/023A AI Boundary | ✅ Pass | Ollama isolated on Docker internal network; no direct DB/storage access |
| ADR-040 D5 Network Auth | ✅ Pass | Docker bridge isolation enables X-API-Key removal |
| ADR-008 BullMQ | ✅ Pass | Redis co-located on same host; queue behavior unchanged |
| ADR-002 Document Numbering | ✅ Pass | Redis Redlock unchanged; co-located reduces lock latency |
| SPOF Risk | ⚠️ Acknowledged | Single host = SPOF; mitigated by QNAP backup + DR plan |
**Gate Result**: PASS — no violations. SPOF risk is acknowledged in ADR-041 with mitigation plan.
## Project Structure
### Documentation (this feature)
```text
specs/100-Infrastructures/141-server-consolidation/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output — research findings
├── data-model.md # Phase 1 output — infrastructure data model
├── quickstart.md # Phase 1 output — deployment guide
├── contracts/ # Phase 1 output — docker-compose contracts
│ └── docker-compose.new-host.yml
├── checklists/
│ └── requirements.md # Spec quality checklist
└── tasks.md # Phase 2 output (/speckit.tasks command)
```
### Source Code (repository root)
```text
specs/04-Infrastructure-OPS/04-00-docker-compose/
├── New-Host/ # NEW — consolidated host
│ ├── docker-compose.new-host.yml # Unified compose for all 8+ services
│ ├── .env.template # Environment template for new host
│ ├── ocr-sidecar/ # Sidecar (copied from Desk-5439, adapted)
│ │ ├── Dockerfile
│ │ ├── app.py
│ │ └── requirements.txt
│ ├── scripts/
│ │ ├── provision-host.sh # OS prep + Docker + NVIDIA toolkit
│ │ ├── migrate-mariadb.sh # Dump from QNAP → restore to new host
│ │ ├── migrate-elasticsearch.sh # Snapshot from QNAP → restore to new host
│ │ ├── smoke-test.sh # Post-cutover verification
│ │ └── rollback.sh # Emergency rollback to QNAP + Desk-5439
│ └── README.md # Deployment guide for new host
├── QNAP/ # EXISTING — becomes backup
├── Desk-5439/ # EXISTING — retired after cutover
└── ASUSTOR/ # EXISTING — Gitea runner stays
```
**Structure Decision**: New `New-Host/` directory under existing `04-00-docker-compose/` follows the established per-host directory pattern (QNAP/, Desk-5439/, ASUSTOR/). The unified compose file replaces the split QNAP/app + QNAP/service + QNAP/mariadb + Desk-5439/ocr-sidecar pattern with a single stack.
## Complexity Tracking
> No constitution check violations — table not needed.
## Implementation Phases
### Phase 1: Provision New Host (T001-T002)
- Install Ubuntu 22.04 LTS / Debian 12
- Install Docker Engine + Docker Compose v2
- Install NVIDIA drivers + nvidia-container-toolkit
- Mount ASUSTOR CIFS share to `/mnt/uploads`
- Create directory structure for Docker volumes
### Phase 2: Create Unified Docker Compose (T003-T005)
- Write `docker-compose.new-host.yml` with all services
- Configure `dms-internal` bridge network (no LAN publish for Ollama/sidecar)
- Configure `dms-frontend` bridge network (Frontend + Backend published)
- Copy OCR sidecar code from Desk-5439, adapt for Docker-internal Ollama URL
- Configure per-container memory limits per ADR-041 D5
### Phase 3: Migrate Data (T006-T007)
- Dump MariaDB from QNAP → restore to new host container
- Snapshot Elasticsearch from QNAP → restore to new host container
- Verify row count + document count parity
- Verify CIFS file access from backend container
### Phase 4: Cutover (T008-T010)
- Update Gitea CI/CD deploy target to new host
- Deploy services on new host
- Run smoke tests (login, document CRUD, OCR, AI, search)
- Remove X-API-Key from sidecar + backend (ADR-040 D5)
- Update DNS/NPM to point to new host
### Phase 5: Decommission (T011-T012)
- Stop services on QNAP (retain data for backup)
- Retire Desk-5439 (power off or repurpose)
- Monitor RAM/VRAM for 24-48 hours
- Document rollback procedure
@@ -0,0 +1,154 @@
// File: specs/100-Infrastructures/141-server-consolidation/quickstart.md
// Change Log:
// - 2026-06-20: Initial quickstart guide for Single-Host Server Consolidation
# Quickstart: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
## Prerequisites
- New host with Ubuntu 22.04 LTS or Debian 12 installed
- Ryzen 5 5600 / 32GB RAM / RTX 5060 Ti 16GB
- Network access to VLAN 10 (192.168.10.x)
- ASUSTOR NAS accessible at 192.168.10.9 with CIFS share `np-dms-as`
- SSH access to QNAP (192.168.10.8) for data migration
- Gitea CI/CD access for deploy target update
## Step 1: Provision Host
```bash
# Run on new host (as root or sudo user)
cd /opt/lcbp3
bash specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh
```
This script:
1. Installs Docker Engine + Docker Compose v2
2. Installs NVIDIA drivers + nvidia-container-toolkit
3. Creates CIFS mount for ASUSTOR at `/mnt/uploads`
4. Creates Docker volume directories
5. Verifies GPU access with `nvidia-smi`
## Step 2: Prepare .env
```bash
cd /opt/lcbp3/specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host
cp .env.template .env
# Edit .env with real values:
# - ASUSTOR_USER, ASUSTOR_PASS (CIFS credentials)
# - DB_PASSWORD, DB_ROOT_PASSWORD (from QNAP .env)
# - REDIS_PASSWORD (from QNAP .env)
# - JWT_SECRET, JWT_REFRESH_SECRET (from QNAP .env)
# - AUTH_SECRET (from QNAP .env)
# - ELASTICSEARCH_PASSWORD (from QNAP .env)
```
## Step 3: Migrate Data
```bash
# Migrate MariaDB (from QNAP to new host)
bash scripts/migrate-mariadb.sh
# Migrate Elasticsearch (from QNAP to new host)
bash scripts/migrate-elasticsearch.sh
# Verify parity
bash scripts/verify-data-parity.sh
```
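The contents of `verify-data-parity.sh` are not shown in this guide. As a rough idea of what a parity check involves, the sketch below compares per-table row counts (or per-index document counts) collected from both hosts. The function name and the table names in the usage note are hypothetical.

```python
def compare_counts(source: dict, target: dict) -> list:
    """Return human-readable mismatches; an empty list means parity."""
    problems = []
    for name in sorted(set(source) | set(target)):
        src, dst = source.get(name), target.get(name)
        if src is None:
            problems.append(f"{name}: only on target ({dst})")
        elif dst is None:
            problems.append(f"{name}: missing on target (source has {src})")
        elif src != dst:
            problems.append(f"{name}: source={src} target={dst}")
    return problems
```

Given counts such as `{"documents": 120, "users": 8}` from QNAP and the same mapping from the new host, the function returns an empty list; any difference is reported by name so the migration can be rerun for that table or index.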
## Step 4: Deploy Services
```bash
# Pull latest images from Gitea registry
docker compose --env-file .env -f docker-compose.new-host.yml pull
# Start all services
docker compose --env-file .env -f docker-compose.new-host.yml up -d
# Check health
docker compose -f docker-compose.new-host.yml ps
docker compose -f docker-compose.new-host.yml logs --tail=50
```
## Step 5: Smoke Test
```bash
# Run smoke tests
bash scripts/smoke-test.sh
```
Smoke tests verify:
- Backend health check (`GET http://localhost:3001/health`)
- Frontend accessible (`GET http://localhost:3000/`)
- Login flow (POST /api/auth/login)
- Document list (GET /api/correspondences)
- OCR endpoint (POST /api/ai/sandbox/ocr)
- AI inference (POST /api/ai/sandbox/extract)
- Full-text search (GET /api/search)
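The two unauthenticated checks in the list above can be sketched with `urllib` as shown below. This is not the content of `smoke-test.sh`; the function names are hypothetical, and the POST and authenticated checks are left out because their payloads and credentials are not given here.

```python
import urllib.error
import urllib.request

# Only the checks whose full URL appears in the list above.
CHECKS = [
    ("backend health", "GET", "http://localhost:3001/health"),
    ("frontend", "GET", "http://localhost:3000/"),
]


def http_status(method: str, url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request could not be made."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with a 4xx/5xx status
    except OSError:
        return 0  # connection refused, timeout, or DNS failure


def run_checks(checks, fetch=http_status) -> list:
    """Run every check and return a description of each failure."""
    failures = []
    for name, method, url in checks:
        status = fetch(method, url)
        if not 200 <= status < 300:
            failures.append(f"{name}: {method} {url} -> {status}")
    return failures
```

`run_checks(CHECKS)` returns an empty list when both endpoints answer with a 2xx status. The `fetch` parameter exists so the runner can be exercised without a live stack.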
## Step 6: Update CI/CD
Update Gitea secrets:
- `HOST` → new host IP (e.g., `192.168.10.50`)
- `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
## Step 7: Cutover DNS
Update NPM (Nginx Proxy Manager) on QNAP:
- `lcbp3.np-dms.work` → new host IP
- `backend.np-dms.work` → new host IP
## Step 8: Remove X-API-Key (ADR-040 D5)
After verifying Docker-internal network isolation:
1. Remove `OCR_SIDECAR_API_KEY` from sidecar environment
2. Remove API key validation from `app.py`
3. Remove `X-API-Key` header from backend `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
4. Rebuild and redeploy sidecar + backend
## Step 9: Monitor (24-48 hours)
```bash
# Monitor RAM usage
docker stats --no-stream
# Monitor VRAM usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 60
# Monitor container health
watch -n 30 'docker compose -f docker-compose.new-host.yml ps'
```
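The CSV output of the `nvidia-smi` command above can be checked against the VRAM target automatically. The sketch below assumes the usual CSV layout, a header row followed by rows such as `1234 MiB, 16311 MiB`, and uses the plan's 15 GB target interpreted as 15 × 1024 MiB; both are assumptions, and the function names are hypothetical.

```python
def parse_vram_csv(output: str) -> list:
    """Parse nvidia-smi CSV rows into (used_mib, total_mib) tuples."""
    samples = []
    for line in output.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not parts[0].endswith("MiB"):
            continue  # skip the header row and blank lines
        used, total = (int(part.split()[0]) for part in parts)
        samples.append((used, total))
    return samples


def over_budget(samples: list, budget_mib: int = 15 * 1024) -> list:
    """Return the samples whose used memory exceeds the budget."""
    return [sample for sample in samples if sample[0] > budget_mib]
```

Feeding the captured output through `over_budget(parse_vram_csv(text))` gives the samples that exceeded the target during the monitoring window.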
## Step 10: Decommission Old Hosts
After 24-48 hours of stable operation:
```bash
# Stop QNAP services (retain data for backup)
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'
# Power off Desk-5439
ssh user@192.168.10.100 'sudo shutdown -h now'
```
## Rollback (Emergency)
```bash
# Stop new host services
docker compose -f docker-compose.new-host.yml down
# Restore QNAP services
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose up -d'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose up -d'
# Restore Desk-5439 services
ssh user@192.168.10.100 'cd /opt/ocr-sidecar && docker compose up -d'
# Revert DNS
# Update NPM to point back to QNAP (192.168.10.8)
# Revert CI/CD
# Update Gitea secrets HOST back to 192.168.10.8
```
@@ -0,0 +1,139 @@
// File: specs/100-Infrastructures/141-server-consolidation/research.md
// Change Log:
// - 2026-06-20: Initial research for Single-Host Server Consolidation
# Research: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
## R1: Docker Network Isolation Strategy
**Decision**: Use two Docker bridge networks — `dms-internal` (all services) and `dms-frontend` (Frontend + Backend only, for LAN publish).
**Rationale**: Docker bridge networks provide L2 isolation. Services on `dms-internal` without a `ports` mapping are unreachable from the LAN. Only Frontend (3000), Backend (3001→3000), and the ollama-metrics scrape endpoint (9924) are published to the LAN. This replaces reliance on VLAN/firewall ACLs with Docker-native isolation.
**Alternatives Considered**:
- Single bridge network + iptables rules — more complex, error-prone
- Docker Swarm overlay network — overkill for single host
- Host network mode — no isolation, security risk
## R2: CIFS Mount Strategy for ASUSTOR
**Decision**: Use Docker named volume with CIFS driver to mount ASUSTOR share `//192.168.10.9/np-dms-as/data/uploads` as `asustor_uploads` volume, mounted at `/mnt/uploads` in sidecar and `/app/uploads` in backend.
**Rationale**: The Docker CIFS volume driver ties the mount lifecycle to container start/stop. Credentials live in `.env` (gitignored). Backend and sidecar see the same files through the same CIFS volume, mounted at different container paths.
**Alternatives Considered**:
- Host-level `mount -t cifs` then bind mount — requires host OS config, not portable
- SSHFS — slower than CIFS for file operations
- Sync files to local SSD — adds complexity, storage duplication
**Key Consideration**: The previous Desk-5439 setup had issues with CIFS under Docker Desktop on WSL2. On a Linux host, the CIFS volume driver works natively without the WSL2 layer.
## R3: MariaDB Migration Strategy
**Decision**: Use `mariadb-dump` (logical dump) from QNAP MariaDB 11.8, pipe directly to new host MariaDB 11.8 container.
**Rationale**: Same MariaDB version (11.8) on both hosts → logical dump is safest. Database is small enough (<10GB estimated) that dump/restore completes within maintenance window.
**Alternatives Considered**:
- `mariabackup` (physical backup) — faster but requires same filesystem layout
- Replication (binlog) — overkill for one-time migration
- Copy raw data files — risky, requires same version + config
**Migration Command**:
```bash
# From QNAP (source) — dump all databases
mariadb-dump --single-transaction --routines --triggers \
-h 127.0.0.1 -u root -p"$DB_ROOT_PASSWORD" \
--all-databases > qnap-full-dump.sql
# On new host — restore
docker exec -i lcbp3-mariadb mariadb -u root -p"$DB_ROOT_PASSWORD" < qnap-full-dump.sql
```
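After the restore, per-table row counts from both hosts can be compared. The following is a minimal sketch, assuming each host's counts have already been written to a text file of `<table> <count>` lines (for example from one `SELECT COUNT(*)` per table). `diff_counts` is a hypothetical helper, not the project's `verify-data-parity.sh`.

```bash
#!/usr/bin/env bash
# Compare two files of "<table> <count>" lines (source first, destination
# second). Prints every table whose count differs or is missing on the
# destination and returns 1 if any mismatch was found.
diff_counts() {
  awk '
    NR == FNR { src[$1] = $2; next }   # first file: source counts
    { dst[$1] = $2 }                   # second file: destination counts
    END {
      bad = 0
      for (t in src) if (!(t in dst) || src[t] != dst[t]) {
        printf "MISMATCH %s src=%s dst=%s\n", t, src[t], ((t in dst) ? dst[t] : "missing")
        bad = 1
      }
      exit bad
    }
  ' "$1" "$2"
}

# Usage:
#   diff_counts qnap-counts.txt newhost-counts.txt && echo "row counts match"
```

Exact counts matter here: the `table_rows` estimate in `information_schema.tables` is approximate for InnoDB and is not suitable for a parity check.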
## R4: Elasticsearch Migration Strategy
**Decision**: Use ES snapshot/restore API — create snapshot on QNAP ES, transfer to new host, restore.
**Rationale**: ES snapshot API is the official migration path. Handles index mappings, settings, and data. Works across same ES version (8.11.x).
**Alternatives Considered**:
- Copy raw data directory — risky, requires identical ES config
- Re-index from MariaDB — slow, loses search index tuning
- Logstash pipeline — overkill for one-time migration
**Migration Steps**:
1. Register shared filesystem repo on QNAP ES
2. Create snapshot of all indices
3. Copy snapshot files to new host ES data volume
4. Register repo on new host ES
5. Restore snapshot
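The five steps above map onto the Elasticsearch snapshot REST endpoints roughly as follows. This is a sketch under assumptions: the repository name, snapshot name, and repository path are placeholders chosen by the caller, `ES_URL` defaults to an unauthenticated local node (a secured 8.x cluster would also need credentials and TLS options), and the repository location must be listed under `path.repo` in `elasticsearch.yml` on each host.

```bash
#!/usr/bin/env bash
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed endpoint

# JSON body for a shared-filesystem ("fs") snapshot repository.
repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}' "$1"
}

# Steps 1 and 4: register the repository. Args: <repo> <location>
register_repo() {
  curl -fsS -X PUT "$ES_URL/_snapshot/$1" \
    -H 'Content-Type: application/json' -d "$(repo_body "$2")"
}

# Step 2: snapshot all indices and block until done. Args: <repo> <snapshot>
create_snapshot() {
  curl -fsS -X PUT "$ES_URL/_snapshot/$1/$2?wait_for_completion=true"
}

# Step 5: restore the snapshot on the new host. Args: <repo> <snapshot>
restore_snapshot() {
  curl -fsS -X POST "$ES_URL/_snapshot/$1/$2/_restore"
}
```

Step 3 (copying the snapshot files) stays outside this sketch. The files need to land in the directory that the new host registers as its repository location.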
## R5: GPU VRAM Management on Single Host
**Decision**: Rely on ADR-040 D3 (Adaptive OCR Residency via `calculate_ocr_residency()`) and ADR-040 D4 (CPU Fallback Retrieval), together with the LLM-First GPU Ownership policy from CONTEXT.md.
**Rationale**: RTX 5060 Ti 16GB must serve:
- np-dms-ai (Typhoon-2.5 ~7-8B): ~6-8GB VRAM
- np-dms-ocr (Typhoon OCR): ~5GB VRAM
- nomic-embed-text: ~0.5GB VRAM
- CUDA overhead: ~1.5GB
- Total: ~13-15GB → tight but feasible with adaptive residency
**Key Policy**: When LLM (np-dms-ai) needs to load, OCR model is unloaded first (`keep_alive=0` for OCR). BGE-M3 + Reranker use CPU fallback when GPU is occupied.
**Alternatives Considered**:
- Force GPU-resident for all models — OOM risk (the ~15GB worst case leaves only ~1GB of the 16GB free)
- CPU-only for all AI — too slow for production
- Second GPU — not available on new host
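The VRAM arithmetic in the rationale can be checked with a few lines of shell. This is only a sketch: it treats the listed GB figures as GiB, uses the worst-case 8GB for the LLM, and both function names are hypothetical. `unload_model` illustrates the `keep_alive=0` policy using Ollama's generate endpoint, which unloads a model when a request carries `keep_alive` set to 0. It is not the `calculate_ocr_residency()` logic from ADR-040.

```bash
#!/usr/bin/env bash
# Remaining VRAM in MiB. Args: <total> <model1> <model2> ...
vram_headroom_mib() {
  local total="$1"; shift
  local used=0 m
  for m in "$@"; do used=$((used + m)); done
  echo $((total - used))
}

# Worst case from the list above, in MiB:
# LLM 8192, OCR 5120, embeddings 512, CUDA overhead 1536
#   vram_headroom_mib 16384 8192 5120 512 1536

# Ask Ollama to unload a model immediately. Arg: <model name>
# OLLAMA_URL is an assumed variable; the default is the Docker DNS name.
unload_model() {
  curl -fsS "${OLLAMA_URL:-http://ollama:11434}/api/generate" \
    -d "{\"model\":\"$1\",\"keep_alive\":0}"
}
```

With these inputs the headroom comes out at 1024 MiB, which is why all-resident operation is treated as tight and adaptive residency is preferred.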
## R6: RAM Budget Allocation
**Decision**: Per-container memory limits in Docker Compose:
| Service | Memory Limit | Notes |
|---------|-------------|-------|
| MariaDB | 8G | Largest consumer, tune innodb_buffer_pool |
| Elasticsearch | 4G | ES_JAVA_OPTS=-Xms2g -Xmx2g |
| Backend (NestJS) | 2G | Node.js + BullMQ workers |
| Frontend (Next.js) | 1G | Standalone mode |
| Redis | 1G | In-memory + AOF |
| Qdrant | 1G | Vector DB |
| OCR Sidecar | 1G | Python + PyMuPDF |
| Ollama | 2G | Model loading + inference |
| ClamAV | 2G | Virus definitions |
| ollama-metrics | 256M | Lightweight proxy |
| **Total** | **~22.3G** | Leaves ~9.7G for OS + swap |
**Rationale**: 32GB total - 22.3GB containers = ~9.7GB for OS kernel + page cache + swap. Comfortable margin.
**Alternatives Considered**:
- No limits — risk of OOM killer affecting critical services
- Tighter limits — may cause ES/MariaDB instability
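The total in the table can be reproduced with a small helper, which is handy when a limit changes. A minimal sketch, with hypothetical function names and support for only the `G` and `M` suffixes used above:

```bash
#!/usr/bin/env bash
# Convert a Docker-style memory limit ("8G", "256M") to MiB.
to_mib() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo $(( ${1%M} )) ;;
    *)  echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

# Sum any number of limits and print the total in MiB.
sum_limits_mib() {
  local total=0 v n
  for v in "$@"; do
    n="$(to_mib "$v")" || return 1
    total=$((total + n))
  done
  echo "$total"
}

# Limits from the table, in row order:
#   sum_limits_mib 8G 4G 2G 1G 1G 1G 1G 2G 2G 256M
```

The table's limits sum to 22784 MiB (about 22.25G), leaving 9984 MiB of a 32G host, consistent with the ~22.3G and ~9.7G figures above.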
## R7: CI/CD Pipeline Update
**Decision**: Update Gitea Actions `ci-deploy.yml` to SSH-deploy to new host IP instead of QNAP IP. ASUSTOR Gitea runner stays unchanged.
**Rationale**: Gitea runner on ASUSTOR (192.168.10.9) can reach new host via VLAN 10. Only the deploy target IP changes. `deploy.sh` path to compose file updates to `New-Host/docker-compose.new-host.yml`.
**Alternatives Considered**:
- Move Gitea runner to new host — unnecessary, runner works remotely
- Manual deployment — not sustainable for ongoing releases
## R8: Rollback Strategy
**Decision**: Multi-step rollback plan documented in `rollback.sh`:
1. Stop services on new host (`docker compose down`)
2. Restore services on QNAP (start existing containers with old data)
3. Restore services on Desk-5439 (start Ollama + sidecar)
4. Revert DNS/NPM to point to QNAP
5. Revert Gitea CI/CD deploy target to QNAP
6. Re-enable X-API-Key in sidecar + backend
**Rationale**: QNAP retains all data (MariaDB, ES, Redis, files) until verified stable. Rollback is fast (<2 hours) because old infrastructure is intact.
**Alternatives Considered**:
- No rollback (accept SPOF) — too risky for production DMS
- Hot failover with replication — overkill for current scale
@@ -0,0 +1,160 @@
// File: specs/100-Infrastructures/141-server-consolidation/spec.md
// Change Log:
// - 2026-06-20: Initial specification for Single-Host Server Consolidation (ADR-041)
# Feature Specification: Single-Host Server Consolidation
**Feature Branch**: `141-server-consolidation`
**Created**: 2026-06-20
**Status**: Draft
**Category**: 100-Infrastructures
**Input**: ADR-041 — Consolidate all LCBP3-DMS services onto a single Docker host with ASUSTOR as primary NAS.
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md), [ADR-016](../../06-Decision-Records/ADR-016-security-authentication.md), [ADR-023A](../../06-Decision-Records/ADR-023A-unified-ai-architecture.md), [ADR-034](../../06-Decision-Records/ADR-034-AI-model-change.md)
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Provision and Deploy on New Host (Priority: P1)
System administrator provisions the new single host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB), installs Docker, mounts CIFS share from ASUSTOR, and deploys all services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) using a single Docker Compose stack with internal bridge network isolation.
**Why this priority**: Without a running host, no other work can proceed. This is the foundation for all subsequent stories.
**Independent Test**: Can be fully tested by running `docker compose up` on the new host and verifying all containers are healthy via `docker ps` and health check endpoints.
**Acceptance Scenarios**:
1. **Given** a fresh OS installation on the new host, **When** the administrator runs the provisioning script, **Then** Docker Engine and Docker Compose are installed and verified with `docker --version`
2. **Given** Docker is installed, **When** the administrator mounts the ASUSTOR CIFS share, **Then** `/mnt/uploads/temp` and `/mnt/uploads/permanent` are accessible and writable by containers
3. **Given** CIFS mounts are ready, **When** the administrator runs `docker compose up -d`, **Then** all 7 service containers start and report healthy within 5 minutes
4. **Given** all containers are running, **When** the administrator checks network isolation, **Then** Ollama and OCR Sidecar ports are NOT accessible from LAN (only Frontend on host port 3000 and Backend on host port 3001 are published)
---
### User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)
Database administrator migrates MariaDB data and Elasticsearch indices from QNAP to the new host, ensuring zero data loss and minimal downtime.
**Why this priority**: Data migration is the critical path for cutover. Without migrated data, the new host cannot serve production traffic.
**Independent Test**: Can be tested by comparing row counts and index document counts between source (QNAP) and destination (new host) after migration.
**Acceptance Scenarios**:
1. **Given** the new host is running with empty MariaDB, **When** the administrator performs a database dump-and-restore from QNAP, **Then** all tables and row counts match the source exactly
2. **Given** the new host is running with empty Elasticsearch, **When** the administrator migrates indices from QNAP, **Then** all index document counts match the source exactly
3. **Given** data migration is complete, **When** the administrator runs a data integrity check script, **Then** all critical tables pass checksum verification with zero discrepancies
4. **Given** file storage is on ASUSTOR CIFS mount, **When** the administrator verifies file access from the backend container, **Then** all existing uploaded files are accessible at the expected paths
---
### User Story 3 - Cutover and Smoke Test (Priority: P3)
Operations team performs the cutover from the old 2-host architecture (QNAP + Desk-5439) to the new single host, updates DNS/network routing, and runs smoke tests to verify all system functions work end-to-end.
**Why this priority**: Cutover is the final step that makes the new host production-active. It depends on P1 and P2 being complete.
**Independent Test**: Can be tested by accessing the application via the new host's IP/hostname and performing core DMS operations (login, document upload, search, AI inference).
**Acceptance Scenarios**:
1. **Given** data migration is verified, **When** the administrator updates DNS to point to the new host, **Then** users accessing the application URL reach the new host within the DNS TTL period
2. **Given** DNS is updated, **When** a user logs in and creates a new Correspondence, **Then** the document is saved successfully and visible in the list
3. **Given** the system is live on the new host, **When** a user uploads a PDF and triggers OCR, **Then** OCR text extraction completes successfully via the internal Docker network (sidecar → Ollama)
4. **Given** the system is live, **When** a user performs a full-text search, **Then** Elasticsearch returns results with the same accuracy as before migration
5. **Given** the system is live, **When** a user triggers AI metadata extraction, **Then** the AI inference completes successfully via the internal Docker network (backend → Ollama)
---
### User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
Security administrator removes the `X-API-Key` header authentication from the OCR Sidecar and Backend, relying solely on Docker-internal network isolation as per ADR-040 D5.
**Why this priority**: This is a key security improvement enabled by the consolidation. It simplifies the architecture but must be validated carefully.
**Independent Test**: Can be tested by attempting to access sidecar endpoints from outside the Docker network (should fail) and from within the Docker network (should succeed without API key).
**Acceptance Scenarios**:
1. **Given** all services are on the Docker internal bridge, **When** the backend calls the sidecar without `X-API-Key`, **Then** the sidecar processes the request successfully
2. **Given** the sidecar is not publishing ports to LAN, **When** an external client attempts to reach the sidecar directly, **Then** the connection is refused
3. **Given** the `X-API-Key` code is removed, **When** the administrator reviews the sidecar and backend configuration, **Then** no hardcoded API keys remain in the codebase
---
### User Story 5 - Decommission Old Hosts (Priority: P5)
Operations team stops services on QNAP (which becomes backup server) and retires Desk-5439, completing the consolidation.
**Why this priority**: Cleanup is the final step after the new host is verified stable. It frees up old hardware and reduces management complexity.
**Independent Test**: Can be tested by verifying that QNAP services are stopped (except backup-related) and Desk-5439 is powered off or repurposed.
**Acceptance Scenarios**:
1. **Given** the new host has been stable for 24-48 hours, **When** the administrator stops backend/frontend/Redis/DB/ES services on QNAP, **Then** QNAP remains available as a backup server with data intact
2. **Given** QNAP services are stopped, **When** the administrator powers off Desk-5439, **Then** no LCBP3-DMS services are affected on the new host
3. **Given** old hosts are decommissioned, **When** the administrator verifies monitoring dashboards, **Then** only the new host is tracked as the active production host
---
### Edge Cases
- **GPU OOM during concurrent AI + OCR load**: What happens when np-dms-ai and np-dms-ocr are loaded simultaneously and VRAM exceeds 16GB? ADR-040 D3 (Adaptive OCR Residency) must unload OCR model to make room for LLM.
- **RAM exhaustion under heavy load**: What happens when MariaDB + Elasticsearch + CPU-fallback tensors consume more than 32GB? System must have swap space configured and memory limits per container.
- **CIFS mount failure**: What happens when ASUSTOR NAS is unreachable? File upload/download will fail; system must degrade gracefully with clear error messages.
- **Single host hardware failure**: What happens when the new host crashes? SPOF mitigation requires backup data on QNAP and a disaster recovery plan.
- **Network misconfiguration**: What happens if Docker bridge network is accidentally exposed? Sidecar and Ollama would be accessible from LAN, breaking the security model.
- **Database migration partial failure**: What happens if MariaDB migration fails midway? Rollback plan must restore QNAP as the active database host.
- **Elasticsearch index corruption during migration**: What happens if ES indices are corrupted during transfer? Re-indexing from MariaDB data must be available as a fallback.
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: System MUST co-locate all 7 services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) on a single Docker host with a unified `docker-compose.yml`
- **FR-002**: System MUST use ASUSTOR (192.168.10.9) as the primary NAS for file storage via CIFS mount at `/mnt/uploads`
- **FR-003**: System MUST isolate Ollama and OCR Sidecar on a Docker internal bridge network (`dms-internal`) with no ports published to LAN
- **FR-004**: System MUST publish only Frontend (host port 3000) and Backend (host port 3001, mapped to container port 3000) to the LAN
- **FR-005**: System MUST enable backend-to-sidecar and backend-to-Ollama communication via Docker service names (`http://ocr-sidecar:8765`, `http://ollama:11434`)
- **FR-006**: System MUST migrate MariaDB data from QNAP to the new host with zero data loss
- **FR-007**: System MUST migrate Elasticsearch indices from QNAP to the new host with zero data loss
- **FR-008**: System MUST remove `X-API-Key` authentication from sidecar and backend after confirming Docker-internal network isolation (ADR-040 D5)
- **FR-009**: System MUST enforce GPU VRAM management via Adaptive OCR Residency (ADR-040 D3) and CPU Fallback Retrieval (ADR-040 D4)
- **FR-010**: System MUST configure per-container memory limits to prevent any single service from exhausting 32GB RAM
- **FR-011**: System MUST retain QNAP as a backup server with database and file storage data intact after cutover
- **FR-012**: System MUST retire Desk-5439 after cutover is verified stable for 24-48 hours
- **FR-013**: System MUST provide a rollback plan to restore services on QNAP and Desk-5439 if the new host fails
- **FR-014**: System MUST verify all core DMS functions (login, document CRUD, OCR, AI inference, search) work end-to-end on the new host before decommissioning old hosts
- **FR-015**: System MUST monitor RAM and VRAM usage for 24-48 hours post-cutover to detect resource pressure
### Key Entities _(include if feature involves data)_
- **Docker Compose Stack**: Single `docker-compose.yml` defining all 7 services, 2 networks (`dms-internal`, `dms-frontend`), and volumes (CIFS, named volumes for data)
- **CIFS Volume Mount**: ASUSTOR network share mounted as Docker volume for file storage (`/mnt/uploads/temp`, `/mnt/uploads/permanent`)
- **Docker Internal Network**: Bridge network (`dms-internal`) isolating Ollama, Sidecar, Backend, Redis, MariaDB, and Elasticsearch from LAN access
- **GPU Resource Allocation**: NVIDIA GPU passthrough to Ollama container with VRAM management via adaptive residency policies
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: All 7 service containers start and report healthy within 5 minutes of `docker compose up -d` on the new host
- **SC-002**: Database migration completes with 100% row count parity between QNAP and new host for all critical tables
- **SC-003**: Elasticsearch migration completes with 100% document count parity between QNAP and new host for all indices
- **SC-004**: Core DMS operations (login, document upload, search, OCR, AI inference) complete successfully on the new host with zero functional regressions
- **SC-005**: Ollama and OCR Sidecar are unreachable from LAN (port scan returns closed/refused for ports 11434 and 8765)
- **SC-006**: Backend-to-Ollama latency is reduced by at least 50% compared to cross-host LAN communication (measured via AI inference response time)
- **SC-007**: RAM usage remains below 28GB (87.5% of 32GB) under normal operational load for 24 hours post-cutover
- **SC-008**: VRAM usage remains below 15GB (93.75% of 16GB) during concurrent AI inference and OCR workloads
- **SC-009**: Rollback plan can be executed within 2 hours to restore services on QNAP and Desk-5439 if needed
- **SC-010**: QNAP backup server retains a valid database snapshot within 24 hours of cutover
### Assumptions
- The new host hardware (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) is physically available and OS-installed before provisioning begins
- ASUSTOR NAS (192.168.10.9) has sufficient storage capacity for all file uploads (temp + permanent)
- Network connectivity between the new host and ASUSTOR is via VLAN 10 with CIFS/SMB 3.0 support
- NVIDIA drivers and Docker GPU runtime (nvidia-container-toolkit) are compatible with the RTX 5060 Ti
- QNAP data (MariaDB, Elasticsearch) is in a consistent state suitable for dump-and-restore migration
- ADR-040 (OCR Sidecar Refactor) is implemented concurrently or prior to cutover for network-only auth and adaptive residency
- Gitea CI/CD pipeline can be updated to target the new host for deployment
@@ -0,0 +1,221 @@
// File: specs/100-Infrastructures/141-server-consolidation/tasks.md
// Change Log:
// - 2026-06-20: Initial task list for Single-Host Server Consolidation
// - 2026-06-20: Fix C1-C5 from analysis: backend env var update, port conflict, GPU residency, ollama-metrics port, n8n endpoints
# Tasks: Single-Host Server Consolidation
**Input**: Design documents from `/specs/100-Infrastructures/141-server-consolidation/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/
**Related ADRs**: ADR-041, ADR-040, ADR-016, ADR-023A, ADR-034
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Create directory structure and initial files for the new host deployment
- [ ] T001 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/` directory structure with subdirectories: `ocr-sidecar/`, `scripts/`
- [ ] T002 [P] Create `.env.template` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` with all required env vars from contracts
- [ ] T003 [P] Create `README.md` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/README.md` with deployment overview
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Provision the new host OS and create the unified Docker Compose stack — MUST be complete before any user story can proceed
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [ ] T004 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh` — installs Docker Engine, Docker Compose v2, NVIDIA drivers, nvidia-container-toolkit, CIFS utils, creates directory structure
- [ ] T005 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` — unified compose with all 10 services, 2 networks (dms-internal, dms-frontend), CIFS volume, named volumes, memory limits per data-model.md. Backend publishes `3001:3000` to LAN (NPM routes `backend.np-dms.work` → :3001); Frontend publishes `3000:3000`; ollama-metrics publishes `9924:9924` to LAN for Prometheus scraping from ASUSTOR
- [ ] T006 [P] Copy OCR sidecar code from `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/` to `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/` — adapt `OLLAMA_API_URL` to `http://ollama:11434` (Docker DNS), remove `ports` mapping, use `expose` only
- [ ] T007 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/Dockerfile` — verify GPU access via nvidia-container-toolkit, ensure poppler-utils installed
- [ ] T008 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/requirements.txt` — verify typhoon-ocr, PyMuPDF, httpx, fastapi versions match Desk-5439
- [ ] T008b Update backend environment variables for renamed service names: `REDIS_HOST=redis` (was `cache`), `ELASTICSEARCH_HOST=elasticsearch` (was `search`) in `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` and `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` backend environment section — these service names changed from QNAP compose where Redis was `cache` and ES was `search`
**Checkpoint**: New host directory structure and unified compose file ready — user story implementation can now begin
---
## Phase 3: User Story 1 - Provision and Deploy on New Host (Priority: P1) 🎯 MVP
**Goal**: Administrator provisions the new host, mounts ASUSTOR CIFS, and deploys all services with Docker internal network isolation
**Independent Test**: Run `docker compose up -d` on the new host and verify all containers are healthy via `docker ps` and health check endpoints
### Implementation for User Story 1
- [ ] T009 [US1] Run `provision-host.sh` on new host — verify Docker, NVIDIA, CIFS mount at `/mnt/uploads`
- [ ] T010 [US1] Pull Ollama models on new host: `ollama pull np-dms-ai:latest`, `ollama pull np-dms-ocr:latest`, `ollama pull nomic-embed-text:latest` — verify with `ollama list`
- [ ] T011 [US1] Copy `.env.template` to `.env`, fill in all secrets from QNAP `.env` (DB passwords, JWT secrets, Redis password, ASUSTOR CIFS credentials)
- [ ] T012 [US1] Run `docker compose --env-file .env -f docker-compose.new-host.yml up -d` and verify all 10 containers start
- [ ] T013 [US1] Verify network isolation: `nmap -p 11434 <new-host-ip>` from another VLAN 10 machine should show closed/refused; `nmap -p 8765` should show closed/refused; `nmap -p 3000` (frontend) and `nmap -p 3001` (backend) should show open; `nmap -p 9924` (ollama-metrics) should show open for Prometheus
- [ ] T014 [US1] Verify health checks: `curl http://localhost:3001/health` (backend on published port 3001), `curl http://localhost:3000/` (frontend), `curl http://ocr-sidecar:8765/health` (from inside backend container via Docker DNS)
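The port expectations in T013 can be captured in a small script so the check is repeatable. This is a sketch with hypothetical function names. It uses `nc -z` as a stand-in for `nmap`, which means it cannot distinguish a closed port from a filtered one, and it must be run from another machine on VLAN 10.

```bash
#!/usr/bin/env bash
# Expected LAN visibility per T013.
expected_state() {
  case "$1" in
    3000|3001|9924) echo open ;;
    11434|8765)     echo closed ;;
    *)              echo unknown ;;
  esac
}

# Print "open" if a TCP connect succeeds within 3 seconds, else "closed".
probe_port() {
  if nc -z -w 3 "$1" "$2" >/dev/null 2>&1; then echo open; else echo closed; fi
}

# Probe every port of interest and return 1 if any differs from T013.
check_isolation() {
  local host="$1" port rc=0
  for port in 3000 3001 9924 11434 8765; do
    if [ "$(probe_port "$host" "$port")" = "$(expected_state "$port")" ]; then
      echo "OK   $port"
    else
      echo "FAIL $port"; rc=1
    fi
  done
  return "$rc"
}

# Usage: check_isolation <new-host-ip>
```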
**Checkpoint**: All services running on new host with correct network isolation — MVP achieved
---
## Phase 4: User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)
**Goal**: Migrate MariaDB and Elasticsearch data from QNAP to the new host with zero data loss
**Independent Test**: Compare row counts and index document counts between QNAP (source) and new host (destination) after migration
### Implementation for User Story 2
- [ ] T015 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-mariadb.sh` — dump from QNAP MariaDB 11.8 via `mariadb-dump --single-transaction --routines --triggers`, pipe to new host container
- [ ] T016 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-elasticsearch.sh` — create snapshot on QNAP ES, transfer files, register repo on new host, restore
- [ ] T017 [US2] Run `migrate-mariadb.sh` — verify all table row counts match between QNAP and new host
- [ ] T018 [US2] Run `migrate-elasticsearch.sh` — verify all index document counts match between QNAP and new host
- [ ] T019 [US2] Create and run `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/verify-data-parity.sh` — automated row count + document count comparison script
- [ ] T020 [US2] Verify CIFS file access: list files in `/app/uploads/temp` and `/app/uploads/permanent` from backend container, compare with ASUSTOR share
**Checkpoint**: All data migrated and verified — new host has complete production data
---
## Phase 5: User Story 3 - Cutover and Smoke Test (Priority: P3)
**Goal**: Perform production cutover from old 2-host architecture to new single host, verify all DMS functions work end-to-end
**Independent Test**: Access application via new host IP, perform core DMS operations (login, document upload, search, AI inference)
### Implementation for User Story 3
- [ ] T021 [P] [US3] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/smoke-test.sh` — automated tests for: backend health, frontend accessible, login flow, document list, OCR endpoint, AI inference, full-text search
- [ ] T022 [US3] Update Gitea secrets: `HOST` → new host IP, `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
- [ ] T023 [US3] Update `scripts/deploy.sh` — change `COMPOSE_FILE` path to New-Host directory
- [ ] T024 [US3] Update NPM (Nginx Proxy Manager) on QNAP: `lcbp3.np-dms.work` → new host IP:3000 (frontend), `backend.np-dms.work` → new host IP:3001 (backend)
- [ ] T024b [US3] Update n8n workflow endpoints on QNAP: change all backend API URLs from `http://192.168.10.8:3000/api` (QNAP) to `http://<new-host-ip>:3001/api` (new host) — n8n stays on QNAP but must reach backend on new host via LAN port 3001
- [ ] T025 [US3] Run `smoke-test.sh` on new host — verify all 7 smoke tests pass
- [ ] T026 [US3] Verify from external machine on VLAN 10: access `https://lcbp3.np-dms.work`, login, create a test Correspondence, upload a PDF, trigger OCR, perform search
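A smoke-test script along the lines of T021 could be organised around a small pass/fail runner. The sketch below is an assumption about structure, not the actual `smoke-test.sh`: `run_check` is a hypothetical helper, and only the two health URLs from T014 are taken from this document.

```bash
#!/usr/bin/env bash
PASS=0
FAIL=0

# Run a command silently and record the result. Args: <name> <command...>
run_check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    PASS=$((PASS + 1)); echo "PASS $name"
  else
    FAIL=$((FAIL + 1)); echo "FAIL $name"
  fi
}

# Example checks (endpoints from T014); login, OCR, AI and search checks
# would follow the same pattern with project-specific requests:
#   run_check "backend health" curl -fsS http://localhost:3001/health
#   run_check "frontend"       curl -fsS http://localhost:3000/
#   echo "passed=$PASS failed=$FAIL"; [ "$FAIL" -eq 0 ]
```

Counting results instead of stopping at the first failure gives a full picture of what is broken after cutover.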
**Checkpoint**: New host is production-active — all DMS functions verified end-to-end
---
## Phase 6: User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
**Goal**: Remove `X-API-Key` authentication from sidecar and backend, relying solely on Docker-internal network isolation per ADR-040 D5
**Independent Test**: Attempt to access sidecar from outside Docker network (should fail); verify backend calls sidecar without API key (should succeed)
### Implementation for User Story 4
- [ ] T027 [P] [US4] Remove `OCR_SIDECAR_API_KEY` from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` ocr-sidecar environment
- [ ] T028 [P] [US4] Remove API key validation from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/app.py` — remove `X-API-Key` header check middleware
- [ ] T029 [US4] Remove `X-API-Key` header from `backend/src/modules/ai/services/ocr.service.ts` — remove API key from HTTP client headers
- [ ] T030 [US4] Remove `OCR_SIDECAR_API_KEY` from `backend/.env.example` and any backend config that sets it
- [ ] T031 [US4] Rebuild and redeploy sidecar + backend containers — verify backend can call sidecar without API key
- [ ] T032 [US4] Verify external access blocked: `curl http://<new-host-ip>:8765/health` from VLAN 10 machine should fail (connection refused)
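Before rebuilding, it helps to confirm that no API-key references survive T027 to T030. A minimal sketch using `grep`, where `find_api_key_refs` is a hypothetical helper and the two patterns are the identifiers named in the tasks above:

```bash
#!/usr/bin/env bash
# List files that still mention the sidecar API key.
# -r recurse, -I skip binary files, -l names only, -i ignore case, -E regex.
# Exits non-zero when nothing is found, which is the desired end state.
find_api_key_refs() {
  grep -rIliE 'X-API-Key|OCR_SIDECAR_API_KEY' "$@"
}

# Usage from the repository root:
#   if find_api_key_refs backend specs/04-Infrastructure-OPS; then
#     echo "API key references remain"; exit 1
#   fi
```

Documentation that legitimately describes the removed header will also match, so the result needs a quick manual review.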
**Checkpoint**: Network-only auth verified — no API key needed, Docker isolation sufficient
---
## Phase 7: User Story 5 - Decommission Old Hosts (Priority: P5)
**Goal**: Stop services on QNAP (becomes backup) and retire Desk-5439, completing the consolidation
**Independent Test**: Verify QNAP services stopped (except backup), Desk-5439 powered off, new host unaffected
### Implementation for User Story 5
- [ ] T033 [P] [US5] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/rollback.sh` — emergency rollback: stop new host, restore QNAP + Desk-5439 services, revert DNS, revert CI/CD
- [ ] T034 [US5] Monitor new host for 24-48 hours: RAM usage (`docker stats`), VRAM usage (`nvidia-smi`), container health, application logs
- [ ] T034b [US5] Verify Adaptive OCR Residency (ADR-040 D3) on new RTX 5060 Ti: load `np-dms-ai` and `np-dms-ocr` concurrently, confirm `calculate_ocr_residency()` unloads OCR model when LLM needs VRAM; verify CPU Fallback Retrieval (ADR-040 D4) activates for BGE-M3/Reranker when GPU is occupied by LLM
- [ ] T035 [US5] Stop QNAP app services: `ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'`
- [ ] T036 [US5] Stop QNAP service stack: `ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'`
- [ ] T037 [US5] Retire Desk-5439: `ssh user@192.168.10.100 'sudo shutdown -h now'` (or repurpose)
- [ ] T038 [US5] Verify new host still fully operational after old hosts decommissioned — re-run `smoke-test.sh`
- [ ] T039 [US5] Take QNAP backup snapshot: `mariadb-dump` on QNAP MariaDB (if still running) or verify existing backup is current
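For the 24-48 hour monitoring window in T034, a threshold check on GPU memory can be scripted around `nvidia-smi`. This is a sketch with hypothetical function names. The 15360 MiB limit corresponds to the 15GB ceiling in the spec's SC-008, and a single GPU is assumed.

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) when used >= limit. Args: <used MiB> <limit MiB>
over_limit() {
  [ "$1" -ge "$2" ]
}

# Current GPU memory use in MiB for the first GPU.
gpu_used_mib() {
  nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | head -n 1
}

# Print one status line; intended to be called from cron or a loop.
watch_vram() {
  local limit_mib=15360 used
  used="$(gpu_used_mib)"
  if over_limit "$used" "$limit_mib"; then
    echo "WARN vram ${used}MiB >= ${limit_mib}MiB"
  else
    echo "OK   vram ${used}MiB"
  fi
}
```

The RAM side can use the same pattern with figures taken from `docker stats --no-stream`.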
**Checkpoint**: Consolidation complete — single host is sole production, old hosts decommissioned
---
## Phase 8: Polish & Cross-Cutting Concerns
**Purpose**: Documentation, monitoring, and final verification
- [ ] T040 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/README.md` — add New-Host section, mark QNAP as backup, mark Desk-5439 as retired
- [ ] T041 [P] Update `CONTEXT.md` — update infrastructure topology to reflect single-host architecture
- [ ] T042 [P] Update `AGENTS.md` — update infrastructure references (Desk-5439 → New Host, QNAP → backup)
- [ ] T043 Update `specs/04-Infrastructure-OPS/04-00-docker-compose/.env.template` — add ASUSTOR_USER, ASUSTOR_PASS, NEW_HOST_IP variables
- [ ] T044 [P] Update Prometheus/Grafana scrape config on ASUSTOR — change the ollama-metrics target from `192.168.10.100:9924` to `<new-host-ip>:9924` (the host-published port from T005)
- [ ] T045 Run `quickstart.md` validation — follow all steps end-to-end on a fresh provision
- [ ] T046 [P] Document disaster recovery procedure — backup schedule, restore from QNAP backup, estimated RTO/RPO
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — can start immediately
- **Foundational (Phase 2)**: Depends on Setup — BLOCKS all user stories
- **US1 (Phase 3)**: Depends on Foundational — requires physical access to new host
- **US2 (Phase 4)**: Depends on US1 (services must be running to receive migrated data)
- **US3 (Phase 5)**: Depends on US1 + US2 (services running + data migrated for cutover)
- **US4 (Phase 6)**: Depends on US3 (cutover complete, network isolation verified)
- **US5 (Phase 7)**: Depends on US3 + US4 (stable production before decommissioning)
- **Polish (Phase 8)**: Can start after US3; some tasks depend on US5
### User Story Dependencies
- **US1 (P1)**: Foundational → US1 — no dependencies on other stories
- **US2 (P2)**: US1 → US2 — needs running services to receive data
- **US3 (P3)**: US1 + US2 → US3 — needs running services + migrated data
- **US4 (P4)**: US3 → US4 — needs cutover complete to verify network isolation in production
- **US5 (P5)**: US3 + US4 → US5 — needs stable production before decommissioning
### Parallel Opportunities
- T002, T003 can run in parallel (different files)
- T006, T007, T008 can run in parallel (sidecar files, no dependencies)
- T015, T016 can run in parallel (different migration scripts)
- T027, T028, T030 can run in parallel (different files: compose, app.py, .env.example)
- T040, T041, T042, T044 can run in parallel (different doc files)
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup (create directory structure)
2. Complete Phase 2: Foundational (provision host + create compose)
3. Complete Phase 3: User Story 1 (deploy services)
4. **STOP and VALIDATE**: All containers healthy, network isolation verified
5. Demo to stakeholders if ready
### Incremental Delivery
1. Setup + Foundational → Infrastructure ready
2. Add US1 → Services deployed → Validate (MVP!)
3. Add US2 → Data migrated → Validate parity
4. Add US3 → Cutover complete → Validate end-to-end
5. Add US4 → Security hardened → Validate network-only auth
6. Add US5 → Old hosts retired → Validate stability
7. Polish → Documentation updated → Final validation
---
## Notes
- This is an infrastructure task — most work is shell scripts, Docker Compose YAML, and manual operations
- Physical access to the new host is required for US1
- Data migration (US2) requires SSH access to QNAP
- Cutover (US3) requires DNS/NPM access and coordination with users
- Decommission (US5) should only proceed after 24-48 hours of stable monitoring
- Rollback plan must be tested before cutover
- All env secrets must come from `.env` (gitignored) — never commit real secrets