refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
CI / CD Pipeline / build (push) Successful in 7m37s
CI / CD Pipeline / deploy (push) Failing after 20m15s

2026-06-20 16:37:04 +07:00
parent d418d791a4
commit a80ebef285
70 changed files with 5762 additions and 452 deletions
@@ -0,0 +1,34 @@
# Specification Quality Checklist: OCR Sidecar Refactor
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-20
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
All checklist items pass. Specification is ready for `/speckit-clarify` or `/speckit-plan`.
@@ -0,0 +1,246 @@
# Sidecar API Contract
**Version**: 1.0
**Date**: 2026-06-20
**Service**: OCR Sidecar (Desk-5439)
**Base URL**: `http://192.168.10.100:8765` (Phase 1) / `http://sidecar:8765` (Phase 2, Docker-internal)
## Overview
The OCR sidecar provides OCR processing capabilities as a pure compute worker. This document defines the API contract between backend services and the sidecar.
## Authentication
### Phase 1 (Before ADR-041 Consolidation)
All endpoints require `X-API-Key` header:
```http
X-API-Key: {OCR_SIDECAR_API_KEY}
```
If the header is missing or invalid, the sidecar returns `401 Unauthorized`.
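The Phase 1 check can be sketched as below. This is a minimal illustration using only the standard library, not the sidecar's actual code: the function name `check_api_key` and the plain-mapping representation of headers are assumptions, and in a FastAPI app the same comparison would more likely live in a dependency.

```python
import hmac
from typing import Mapping, Optional


def check_api_key(headers: Mapping[str, str], expected_key: str) -> int:
    """Return 200 if the X-API-Key header matches expected_key, else 401."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    provided: Optional[str] = normalised.get("x-api-key")
    if provided is None:
        return 401
    # compare_digest avoids leaking key contents through timing differences.
    if not hmac.compare_digest(provided.encode(), expected_key.encode()):
        return 401
    return 200
```

A constant-time comparison is used here because the key is the only credential in Phase 1; a plain `==` would also work functionally.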
### Phase 2 (After ADR-041 Consolidation)
No authentication required. Relies on Docker-internal network isolation.
## Endpoints
### POST /ocr
Extract text from a PDF file using Typhoon OCR.
**Request Headers**:
```http
Content-Type: application/json
X-API-Key: {key} # Phase 1 only
```
**Request Body**:
```json
{
"pdf_path": "/mnt/uploads/temp/abc123.pdf",
"system_prompt": "Extract document metadata from: {{ocr_text}}...",
"dms_tags": {
"document_number": "RFA-2025-001",
"document_date": "2025-01-15",
"received_date": "2025-01-16"
},
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
},
"page_range": {
"start": 1,
"end": 3
}
}
```
**Request Fields**:
- `pdf_path` (string, required): Absolute path to PDF file. Must be within whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`).
- `system_prompt` (string, optional): System prompt from Active Prompt. Contains `{{ocr_text}}` placeholder.
- `dms_tags` (object, optional): DMS extraction tags to inject into prompt.
  - `document_number` (string, optional): Document number
  - `document_date` (string, optional): Document date
  - `received_date` (string, optional): Received date
- `runtime_params` (object, required): Runtime parameters from `ai_execution_profiles`.
  - `temperature` (number, required): Temperature (0.0 - 2.0)
  - `top_p` (number, required): Top P (0.0 - 1.0)
  - `repeat_penalty` (number, required): Repeat penalty (typically 1.0 - 2.0)
  - `max_tokens` (number, required): Max tokens
- `page_range` (object, optional): Page range for processing.
  - `start` (number, required): Start page (1-indexed)
  - `end` (number, required): End page (inclusive)
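As a rough illustration of the field rules above, the following standard-library sketch collects validation problems for a request body. The function name `validate_ocr_fields` and the error strings are hypothetical, and the rule `1 <= start <= end` is an assumption inferred from "1-indexed" and "inclusive"; the real sidecar presumably relies on its Pydantic models instead.

```python
from typing import Any, Dict, List


def validate_ocr_fields(body: Dict[str, Any]) -> List[str]:
    """Collect human-readable problems with an /ocr request body."""
    problems: List[str] = []
    pdf_path = body.get("pdf_path")
    if not isinstance(pdf_path, str) or not pdf_path:
        problems.append("pdf_path is required")
    params = body.get("runtime_params")
    if not isinstance(params, dict):
        problems.append("runtime_params is required")
    else:
        # Hard ranges documented in the field list; other params are only type-checked.
        ranges = {"temperature": (0.0, 2.0), "top_p": (0.0, 1.0)}
        for name in ("temperature", "top_p", "repeat_penalty", "max_tokens"):
            value = params.get(name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"runtime_params.{name} must be a number")
                continue
            low, high = ranges.get(name, (None, None))
            if low is not None and not (low <= value <= high):
                problems.append(f"runtime_params.{name} must be in [{low}, {high}]")
    page_range = body.get("page_range")
    if page_range is not None:
        if not isinstance(page_range, dict):
            problems.append("page_range must be an object")
        else:
            start, end = page_range.get("start"), page_range.get("end")
            if not isinstance(start, int) or not isinstance(end, int):
                problems.append("page_range.start and page_range.end are required")
            elif start < 1 or end < start:
                problems.append("page_range must satisfy 1 <= start <= end")
    return problems
```

Note that a temperature of `0.0` passes, since the documented range includes both endpoints.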
**Response (200 OK)**:
```json
{
"text": "Extracted text in Markdown format...",
"ocr_used": true,
"model_used": "typhoon-np-dms-ocr:latest",
"processing_time_ms": 1250,
"error": null
}
```
**Response Fields**:
- `text` (string): Extracted text in Markdown format
- `ocr_used` (boolean): Whether OCR was used (vs fast-path text layer)
- `model_used` (string): Model identifier
- `processing_time_ms` (number): Processing time in milliseconds
- `error` (string, nullable): Error message if failed
**Error Responses**:
- `400 Bad Request`: Invalid request body or parameters
- `401 Unauthorized`: Missing or invalid X-API-Key (Phase 1 only)
- `403 Forbidden`: Path outside whitelisted base directory
- `500 Internal Server Error`: Internal processing error
**Path Traversal Protection**:
- PDF path is canonicalized using `os.path.abspath()` + `os.path.realpath()`
- Path must start with whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`)
- Symlinks are resolved to their targets before whitelist check
- Returns `403 Forbidden` for any path outside base directory
### GET /health
Health check endpoint for monitoring.
**Response (200 OK)**:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
**Response Fields**:
- `status` (string): Service status ("healthy" or "unhealthy")
- `timestamp` (string): ISO 8601 timestamp
- `version` (string): Service version
## Removed Endpoints
### POST /normalize (REMOVED)
This endpoint has been removed per ADR-040 D2. ThaiPreprocessProcessor has no consumers in the backend (verified by grep search).
## Rate Limiting
No rate limiting is implemented on the sidecar. Rate limiting is handled by backend services.
## Error Handling
All errors return JSON responses with a consistent format:
```json
{
"error": "Error message",
"code": "ERROR_CODE",
"timestamp": "2026-06-20T10:30:00Z"
}
```
**Common Error Codes**:
- `INVALID_REQUEST`: Invalid request body or parameters
- `UNAUTHORIZED`: Missing or invalid authentication
- `FORBIDDEN`: Path outside whitelisted directory
- `INTERNAL_ERROR`: Internal processing error
- `OCR_FAILED`: OCR processing failed
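One way to produce this envelope is sketched below. The helper `build_error` and the `STATUS_BY_CODE` table are illustrative names, not part of the sidecar; in particular, mapping `OCR_FAILED` to 500 is an assumption, since the contract does not state which status that code accompanies.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Assumed mapping from the documented error codes to HTTP statuses.
STATUS_BY_CODE: Dict[str, int] = {
    "INVALID_REQUEST": 400,
    "UNAUTHORIZED": 401,
    "FORBIDDEN": 403,
    "INTERNAL_ERROR": 500,
    "OCR_FAILED": 500,  # assumption: not specified in the contract
}


def build_error(code: str, message: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build the JSON error body; `now` must be timezone-aware if supplied."""
    if code not in STATUS_BY_CODE:
        raise ValueError(f"unknown error code: {code}")
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "error": message,
        "code": code,
        # ISO 8601 in UTC with a trailing "Z", matching the examples in this document.
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```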
## Examples
### Example 1: Basic OCR Request (Phase 1)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 2: OCR with System Prompt and DMS Tags (Phase 1)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"system_prompt": "Extract document metadata from: {{ocr_text}}",
"dms_tags": {
"document_number": "RFA-2025-001",
"document_date": "2025-01-15"
},
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 3: OCR Request (Phase 2, Docker-internal)
```bash
curl -X POST http://sidecar:8765/ocr \
-H "Content-Type: application/json" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 4: Path Traversal Attempt (Rejected)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Response: `403 Forbidden`
```json
{
"error": "Path outside whitelisted base directory",
"code": "FORBIDDEN",
"timestamp": "2026-06-20T10:30:00Z"
}
```
## Version History
- **1.0** (2026-06-20): Initial version for OCR sidecar refactor
  - Added POST /ocr with parameter governance
  - Added path traversal protection
  - Removed POST /normalize endpoint
  - Documented Phase 1/Phase 2 auth migration
@@ -0,0 +1,319 @@
# Data Model: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Define data contracts and entity relationships for OCR sidecar refactor
## Overview
The OCR sidecar is a pure compute worker with no database access (ADR-023/023A boundary). All data persistence and business logic remain in backend services. This document defines the data contracts between backend and sidecar.
## Entities
### OCR Request (Backend → Sidecar)
```typescript
interface OcrRequest {
pdfPath: string; // Absolute path to PDF file (whitelisted)
systemPrompt?: string; // System prompt from Active Prompt
dmsTags?: { // DMS extraction tags from Active Prompt
documentNumber?: string;
documentDate?: string;
receivedDate?: string;
};
runtimeParams: { // Runtime parameters from ai_execution_profiles
temperature: number;
top_p: number;
repeat_penalty: number;
max_tokens: number;
};
pageRange?: { // Page range for processing
start: number;
end: number;
};
}
```
### OCR Response (Sidecar → Backend)
```typescript
interface OcrResponse {
text: string; // Extracted text (Markdown format)
ocrUsed: boolean; // Whether OCR was used (vs fast-path text layer)
modelUsed: string; // Model identifier (e.g., "typhoon-np-dms-ocr")
processingTimeMs: number; // Processing time in milliseconds
error?: string; // Error message if failed
}
```
### AI Execution Profile (Database)
```sql
-- Existing table (no schema changes)
CREATE TABLE ai_execution_profiles (
id INT AUTO_INCREMENT PRIMARY KEY,
profile_name VARCHAR(100) UNIQUE NOT NULL,
model_name VARCHAR(100) NOT NULL,
parameters JSON NOT NULL, -- { temperature, top_p, repeat_penalty, max_tokens, keep_alive }
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
-- Row for OCR extraction:
-- profile_name = 'ocr-extract'
-- parameters = { temperature: 0.7, top_p: 0.9, repeat_penalty: 1.1, max_tokens: 4096 }
```
### Active Prompt (Database)
```sql
-- Existing table (no schema changes per ADR-029/037)
CREATE TABLE ai_prompts (
id INT AUTO_INCREMENT PRIMARY KEY,
public_id UUID,
prompt_type VARCHAR(50) NOT NULL, -- 'ocr_extraction'
template TEXT NOT NULL, -- System prompt template with {{ocr_text}} placeholder
context_config JSON, -- DMS tags configuration
version INT NOT NULL,
is_active TINYINT(1) DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY (prompt_type, version)
);
-- Active prompt for OCR extraction:
-- prompt_type = 'ocr_extraction'
-- template = "Extract document metadata from: {{ocr_text}}..."
-- context_config = { dmsTags: { documentNumber: true, documentDate: true, receivedDate: true } }
```
## Data Flow
### Phase 1: OCR Request Flow (Before ADR-041)
```
Backend OcrService
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
3. Extract systemPrompt and DMS tags from Active Prompt
4. Build OcrRequest with parameters, systemPrompt, DMS tags
5. Send POST /ocr with X-API-Key header to sidecar
Sidecar (app.py)
1. Validate X-API-Key
2. Canonicalize pdfPath and check whitelist
3. Extract systemPrompt and DMS tags from request
4. Call calculate_ocr_residency(active_profile) for keep_alive
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
6. Return OcrResponse
Backend OcrService
1. Parse OcrResponse
2. Return extracted text to caller
```
### Phase 2: OCR Request Flow (After ADR-041)
```
Backend OcrService
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
3. Extract systemPrompt and DMS tags from Active Prompt
4. Build OcrRequest with parameters, systemPrompt, DMS tags
5. Send POST /ocr (NO X-API-Key header) to sidecar
Sidecar (app.py)
1. NO X-API-Key validation (network isolation only)
2. Canonicalize pdfPath and check whitelist
3. Extract systemPrompt and DMS tags from request
4. Call calculate_ocr_residency(active_profile) for keep_alive
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
6. Return OcrResponse
Backend OcrService
1. Parse OcrResponse
2. Return extracted text to caller
```
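Step 5 of the sidecar flow ("inject systemPrompt + DMS tags") could look roughly like the sketch below. Only the `{{ocr_text}}` placeholder comes from this document; the function name `render_prompt`, the "Known tags" layout, and the choice to reject a template without the placeholder are all assumptions made for illustration.

```python
from typing import Dict, Optional

PLACEHOLDER = "{{ocr_text}}"


def render_prompt(system_prompt: str, ocr_text: str,
                  dms_tags: Optional[Dict[str, str]] = None) -> str:
    """Substitute OCR text into the template and append any DMS tags."""
    if PLACEHOLDER not in system_prompt:
        raise ValueError("system_prompt is missing the {{ocr_text}} placeholder")
    rendered = system_prompt.replace(PLACEHOLDER, ocr_text)
    if dms_tags:
        # Hypothetical tag layout: one "key: value" line per tag, sorted for stable output.
        tag_lines = "\n".join(f"{key}: {value}" for key, value in sorted(dms_tags.items()))
        rendered = f"{rendered}\n\nKnown tags:\n{tag_lines}"
    return rendered
```

Because the template is resolved by the backend from `ai_prompts`, the sidecar only performs substitution and holds no prompt text of its own.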
## Backend Service Changes
### OcrService Parameter Resolution
```typescript
// backend/src/modules/ai/services/ocr.service.ts
async extractMetadata(documentId: string): Promise<AIMetadata> {
// 1. Resolve runtime parameters from ai_execution_profiles
const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
const runtimeParams = profile.parameters; // { temperature, top_p, repeat_penalty, max_tokens }
// 2. Resolve Active Prompt
const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
const systemPrompt = activePrompt.template;
const dmsTags = activePrompt.context_config?.dmsTags || {};
// 3. Build request
const ocrRequest: OcrRequest = {
pdfPath: document.filePath, // assumes `document` was loaded from documentId earlier (lookup not shown)
systemPrompt,
dmsTags,
runtimeParams,
};
// 4. Send to sidecar (with X-API-Key in Phase 1)
const response = await this.httpClient.post(
`${this.ocrApiUrl}/ocr`,
ocrRequest,
{ headers: { 'X-API-Key': this.ocrApiKey } } // Phase 1 only
);
return response.data;
}
```
### SandboxOcrEngineService Parameter Resolution
```typescript
// backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
async processSandboxOcr(request: SandboxOcrRequest): Promise<SandboxOcrResult> {
// Same parameter resolution pattern as OcrService
const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
const ocrRequest: OcrRequest = {
pdfPath: request.pdfPath,
systemPrompt: activePrompt.template,
dmsTags: activePrompt.context_config?.dmsTags || {},
runtimeParams: profile.parameters,
};
const response = await this.httpClient.post(
`${this.ocrApiUrl}/ocr`,
ocrRequest,
{ headers: { 'X-API-Key': this.ocrApiKey } } // Phase 1 only
);
return response.data;
}
```
## Sidecar API Changes
### POST /ocr Request Body
```python
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
from typing import Dict, Optional
from pydantic import BaseModel
# Nested models are declared first so OcrRequest can reference them by name.
class RuntimeParams(BaseModel):
    temperature: float
    top_p: float
    repeat_penalty: float
    max_tokens: int
class PageRange(BaseModel):
    start: int  # 1-indexed
    end: int    # inclusive
class OcrRequest(BaseModel):
    pdf_path: str
    system_prompt: Optional[str] = None
    dms_tags: Optional[Dict[str, str]] = None
    runtime_params: RuntimeParams
    page_range: Optional[PageRange] = None
```
### POST /ocr Response Body
```python
from typing import Optional
from pydantic import BaseModel
class OcrResponse(BaseModel):
    text: str
    ocr_used: bool
    model_used: str
    processing_time_ms: float
    error: Optional[str] = None
```
## Environment Variables
### Sidecar Environment Variables
```bash
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=required_value # Fail-fast if missing
# Phase 2 (after ADR-041) - remove OCR_SIDECAR_API_KEY
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads # CIFS mount base path
OLLAMA_API_URL=http://localhost:11434
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
```
### Backend Environment Variables
```bash
# backend/.env
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=required_value # Send-side X-API-Key
# Phase 2 (after ADR-041) - remove OCR_API_KEY
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads # Backend view of uploads
```
## Validation Rules
### Path Canonicalization (Sidecar)
```python
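import os
from fastapi import HTTPException
def validate_pdf_path(pdf_path: str, base_path: str) -> str:
    """Canonicalize pdf_path and reject anything outside base_path."""
    # 1. Canonicalize both paths (realpath resolves symlinks and ".." segments)
    canonical = os.path.abspath(os.path.realpath(pdf_path))
    canonical_base = os.path.abspath(os.path.realpath(base_path))
    # 2. Check whitelist by path component, not by string prefix:
    #    startswith() would accept "/mnt/uploads-evil" for base "/mnt/uploads"
    try:
        inside = os.path.commonpath([canonical, canonical_base]) == canonical_base
    except ValueError:  # e.g. paths on different drives
        inside = False
    if not inside:
        raise HTTPException(
            status_code=403,
            detail="Path outside whitelisted base directory"
        )
    return canonical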
### Parameter Validation (Backend)
```typescript
// Validate runtime parameters from ai_execution_profiles
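function validateRuntimeParams(params: any): RuntimeParams {
  // Number.isFinite() rejects undefined, NaN, and non-numbers while still accepting 0,
  // which a truthiness check (!params.temperature) would wrongly reject.
  if (!Number.isFinite(params.temperature) || params.temperature < 0 || params.temperature > 2) {
    throw new BusinessException('Invalid temperature value');
  }
  if (!Number.isFinite(params.top_p) || params.top_p < 0 || params.top_p > 1) {
    throw new BusinessException('Invalid top_p value');
  }
  // ... similar validation for other params
  return params;
}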
```
## No Schema Changes
This refactor does not require database schema changes:
- `ai_execution_profiles` table already exists (ADR-036)
- `ai_prompts` table already exists (ADR-029/037)
- No new tables or columns needed
- Per ADR-009: No TypeORM migrations (edit SQL directly if needed, but not needed here)
@@ -0,0 +1,147 @@
# Implementation Plan: OCR Sidecar Refactor
**Branch**: `140-ocr-sidecar-refactor` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md`
## Summary
Refactor the OCR sidecar on Desk-5439 to address security vulnerabilities (hardcoded API keys, path traversal), implement async I/O for performance, preserve GPU resource management policies (Adaptive OCR Residency, CPU Fallback Retrieval), and align with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System. The sidecar becomes a pure compute worker with all orchestration and parameter governance moved to backend services.
## Technical Context
**Language/Version**: Python 3.11+ (FastAPI)
**Primary Dependencies**: FastAPI 0.111.0, httpx 0.27.0, PyMuPDF 1.24.0, typhoon-ocr>=0.4.1, FlagEmbedding>=1.2.0, pythainlp 5.0.4
**Storage**: No database access (ADR-023/023A boundary - sidecar is pure compute worker)
**Testing**: pytest for path-traversal and residency wiring tests
**Target Platform**: Desk-5439 (192.168.10.100, Windows 10/11, RTX 5060 Ti 16GB GPU) via Docker
**Project Type**: Infrastructure (sidecar service)
**Performance Goals**: 20%+ throughput improvement with async I/O; VRAM exhaustion prevention under load
**Constraints**: Must preserve LLM-First GPU Ownership; must not bypass existing residency_policy.py; must align with ADR-036 Gap-2 (keep_alive as lazy resource param)
**Scale/Scope**: Single sidecar service; affects backend AI services (OcrService, SandboxOcrEngineService)
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
| Gate | Status | Justification |
|------|--------|---------------|
| ADR-019 UUID | ✅ PASS | Sidecar N/A (pure compute worker), Backend applies ADR-019 (parameter resolution in OcrService/SandboxOcrEngineService) |
| ADR-009 Schema | N/A | No database schema changes in sidecar |
| ADR-016 Security | ✅ PASS | Path traversal hardening; no hardcoded secrets; network isolation auth |
| ADR-002 Numbering | N/A | No document numbering in sidecar |
| ADR-008 BullMQ | N/A | Sidecar does not use BullMQ (backend does) |
| ADR-023/023A AI Boundary | ✅ PASS | Sidecar is pure compute worker; no DB/storage access; AI → DMS API → DB pattern preserved |
| ADR-007 Errors | ✅ PASS | FastAPI exception handling with user-friendly messages |
| TypeScript Strict | N/A | Python codebase |
## Project Structure
### Documentation (this feature)
```text
specs/100-Infrastructures/140-ocr-sidecar-refactor/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output (technical decisions from ADR-040)
├── data-model.md # Phase 1 output (data contracts)
├── quickstart.md # Phase 1 output (deployment guide)
├── contracts/ # Phase 1 output (API contracts)
│ └── sidecar-api.md # Sidecar API specification
└── tasks.md # Phase 2 output (implementation tasks)
```
### Source Code
```text
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py # FastAPI application (main refactor target)
├── residency_policy.py # Retain (Adaptive OCR Residency)
├── vram_monitor.py # Retain (VRAM monitoring)
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
├── docker-compose.yml # Orchestration
└── .env # Environment variables
backend/src/modules/ai/
├── services/
│ ├── ocr.service.ts # Parameter resolution + sidecar calls
│ └── sandbox-ocr-engine.service.ts # Sandbox parameter resolution
└── processors/
└── ai-batch.processor.ts # BullMQ processor (unchanged)
tests/
├── unit/
│ └── ocr-sidecar/ # Sidecar unit tests
│ ├── test_path_traversal.py # Path traversal tests
│ └── test_residency_wiring.py # Residency calculation tests
└── integration/
└── ocr-sidecar/ # Sidecar integration tests
```
**Structure Decision**: Infrastructure refactor targeting existing OCR sidecar on Desk-5439. Backend changes limited to parameter resolution in AI services. No new frontend changes.
## Complexity Tracking
> No constitution violations; all gates pass. This section is not applicable.
## Phase 0: Research & Technical Decisions
All technical decisions are already documented in ADR-040. Key decisions:
### Security Decisions
- **Decision**: Remove hardcoded default API key; fail-fast if env missing
- **Rationale**: Security vulnerability - leaked key cannot be rotated without rebuild
- **Decision**: Implement path canonicalization + base-path whitelist
- **Rationale**: Prevent path traversal attacks (ADR-016)
### I/O Pattern Decisions
- **Decision**: Refactor to async I/O with shared AsyncClient via lifespan
- **Rationale**: Synchronous blocking I/O reduces throughput under load
- **Decision**: Replace `@app.on_event("startup")` with lifespan context manager
- **Rationale**: Deprecated pattern; lifespan provides better resource management
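The lifespan pattern, combined with off-loading blocking work via `asyncio.to_thread`, can be sketched with the standard library alone. In this illustration a plain `dict` stands in for the FastAPI application state, and `load_model_blocking` is a hypothetical placeholder for a slow synchronous model load; the shared `httpx.AsyncClient` mentioned above would be created and closed in the same two places.

```python
import asyncio
import threading
import time
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict


def load_model_blocking(name: str) -> Dict[str, str]:
    """Stand-in for a slow, synchronous model load (hypothetical)."""
    time.sleep(0.05)
    return {"name": name, "loaded_on": threading.current_thread().name}


@asynccontextmanager
async def lifespan(state: Dict[str, object]) -> AsyncIterator[None]:
    # Startup: run the blocking load in a worker thread so the event loop stays free.
    state["model"] = await asyncio.to_thread(load_model_blocking, "ocr-model")
    try:
        yield
    finally:
        # Shutdown: release shared resources (a shared HTTP client would be closed here).
        state.clear()


async def demo() -> Dict[str, str]:
    state: Dict[str, object] = {}
    async with lifespan(state):
        model = dict(state["model"])  # copy while the resource is still alive
    assert state == {}  # cleanup ran on exit
    return model
```

Running `asyncio.run(demo())` returns the loaded model record; its `loaded_on` field names a worker thread rather than the main thread, which is the point of `asyncio.to_thread`.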
### GPU Resource Management Decisions
- **Decision**: Wire `calculate_ocr_residency()` into `process_ocr` for dynamic keep_alive
- **Rationale**: Preserve Adaptive OCR Residency policy (CONTEXT.md); avoid fixed values
- **Decision**: Retain vram_monitor.py and residency_policy.py
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- **Decision**: Reject forced GPU-resident BGE-M3/Reranker
- **Rationale**: CPU fallback is required for VRAM pressure scenarios
### Parameter Governance Decisions
- **Decision**: Remove hardcoded runtime params; accept from backend job snapshot
- **Rationale**: ADR-036 Profile-Only Parameter Governance; dynamic tuning without rebuild
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in DB not code
- **Decision**: Reject creating PromptBuilderService
- **Rationale**: Use existing Active Prompt system; avoid invented orchestration
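To make the parameter-governance decisions concrete, the sketch below builds an Ollama generate payload purely from values supplied by the caller. The function name `build_ollama_payload` is hypothetical, and the mapping of `max_tokens` to Ollama's `num_predict` option is an assumption about how the sidecar translates the profile; the `keep_alive` value is expected to come from `calculate_ocr_residency()`, whose return type this document does not specify.

```python
from typing import Any, Dict, Optional


def build_ollama_payload(model: str, prompt: str, runtime_params: Dict[str, float],
                         keep_alive: Any, system_prompt: Optional[str] = None) -> Dict[str, Any]:
    """Assemble a request body with no hardcoded sampling parameters."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,  # supplied by the residency policy, never a fixed value
        "options": {
            "temperature": runtime_params["temperature"],
            "top_p": runtime_params["top_p"],
            "repeat_penalty": runtime_params["repeat_penalty"],
            # Assumed mapping: profile max_tokens -> Ollama num_predict.
            "num_predict": int(runtime_params["max_tokens"]),
        },
    }
    if system_prompt is not None:
        payload["system"] = system_prompt
    return payload
```

A missing runtime parameter raises `KeyError` here rather than falling back to a default, which matches the intent that tuning lives in `ai_execution_profiles` and not in sidecar code.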
### Auth Decisions
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- **Rationale**: Sequenced migration; network isolation only possible post-consolidation
- **Decision**: Interim period requires X-API-Key validation
- **Rationale**: Cross-host topology (before ADR-041) requires defense-in-depth
### Endpoint Decisions
- **Decision**: Remove /normalize endpoint
- **Rationale**: No consumers (verified by grep); ThaiPreprocessProcessor unused
- **Decision**: Fix mutable default argument `options_override={}`
- **Rationale**: Python anti-pattern; causes unexpected behavior
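The mutable-default problem can be reproduced in a few lines. Only the parameter name `options_override` comes from the source; the two functions and the `calls` counter are hypothetical, chosen to make the shared state visible.

```python
from typing import Dict, Optional


def merge_options_buggy(options_override: Dict[str, float] = {}) -> Dict[str, float]:
    # Anti-pattern: the default dict is created once and reused across calls.
    options_override.setdefault("calls", 0)
    options_override["calls"] += 1
    return options_override


def merge_options_fixed(options_override: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    # A fresh dict is built on every call, so state cannot leak between requests.
    options = dict(options_override or {})
    options.setdefault("calls", 0)
    options["calls"] += 1
    return options
```

Calling the buggy version twice without arguments yields a counter of 1 and then 2, while the fixed version yields 1 both times. In a long-running service this kind of leak means one request's overrides can silently affect the next.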
## Phase 1: Design & Contracts
### Data Model
See [data-model.md](./data-model.md) for detailed data contracts and entity relationships.
### API Contracts
See [contracts/sidecar-api.md](./contracts/sidecar-api.md) for sidecar API specification.
### Quickstart Guide
See [quickstart.md](./quickstart.md) for deployment and testing instructions.
## Phase 2: Implementation (Tasks)
See [tasks.md](./tasks.md) for detailed implementation tasks generated by `/speckit-tasks`.
@@ -0,0 +1,374 @@
# Quickstart: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Deployment and testing guide for OCR sidecar refactor
## Prerequisites
- Access to Desk-5439 (192.168.10.100) with Docker
- Access to backend services (QNAP 192.168.10.8)
- Python 3.11+ for local testing (optional)
- pytest for testing (optional)
## Phase 1: Deployment (Before ADR-041 Consolidation)
### Step 1: Update Sidecar Code
1. Navigate to sidecar directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
```
2. Update `app.py` with the following changes:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Implement async I/O with `httpx.AsyncClient` via lifespan
- Replace `@app.on_event("startup")` with lifespan context manager
- Wire `calculate_ocr_residency()` into `process_ocr`
- Implement path canonicalization + base-path whitelist on `/ocr`
- Remove hardcoded runtime parameters
- Receive systemPrompt and DMS tags from backend
- Remove `/normalize` endpoint
- Fix mutable default argument `options_override={}`
- Load models via `asyncio.to_thread` during lifespan
3. Update `requirements.txt`:
```text
PyMuPDF==1.24.0
fastapi==0.111.0
uvicorn[standard]==0.30.1
python-multipart==0.0.9
httpx==0.27.0
FlagEmbedding>=1.2.0
typhoon-ocr>=0.4.1
```
4. Update `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
OCR_MODEL=np-dms-ocr:latest
```
### Step 2: Update Backend Services
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Add parameter resolution from `ai_execution_profiles` (row `ocr-extract`)
- Add Active Prompt resolution from `ai_prompts` (type `ocr_extraction`)
- Extract systemPrompt and DMS tags from Active Prompt
- Send resolved parameters to sidecar in OCR requests
- Keep X-API-Key send-side (Phase 1)
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Same parameter resolution pattern as OcrService
- Keep X-API-Key send-side (Phase 1)
3. Update backend `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
### Step 3: Rebuild and Deploy Sidecar
1. Build Docker image on Desk-5439:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
docker-compose build
```
2. Stop existing container:
```bash
docker-compose down
```
3. Start new container:
```bash
docker-compose up -d
```
4. Verify health:
```bash
curl http://192.168.10.100:8765/health
```
Expected response:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
### Step 4: Deploy Backend Changes
1. Build backend:
```bash
cd backend
pnpm run build
```
2. Deploy backend containers (via existing deploy script or manual):
```bash
# From repo root
./scripts/deploy.sh
```
3. Verify backend health:
```bash
curl http://localhost:3001/api/ai/health
```
## Phase 2: Deployment (After ADR-041 Consolidation)
**Note**: This phase can only be executed after ADR-041 server consolidation completes (single Docker host).
### Step 1: Remove X-API-Key from Sidecar
1. Update `app.py` on sidecar:
- Remove X-API-Key validation from all endpoints
- Remove `OCR_SIDECAR_API_KEY` environment variable check
2. Update `.env` on sidecar:
```bash
# Remove OCR_SIDECAR_API_KEY line
# Keep common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
```
3. Rebuild and redeploy sidecar:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Step 2: Remove X-API-Key from Backend
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
3. Update backend `.env`:
```bash
# Remove OCR_API_KEY line
# Keep common variables
OCR_API_URL=http://sidecar:8765 # Docker-internal URL
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
4. Rebuild and redeploy backend:
```bash
cd backend
pnpm run build
./scripts/deploy.sh
```
## Testing
### Unit Tests (Sidecar)
1. Navigate to sidecar tests directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests
```
2. Run path traversal tests:
```bash
pytest test_path_traversal.py -v
```
Expected output: All tests pass, path traversal attempts return 403
3. Run residency wiring tests:
```bash
pytest test_residency_wiring.py -v
```
Expected output: All tests pass, `calculate_ocr_residency()` is called correctly
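For orientation, the kind of cases a path traversal test would cover can be sketched as follows. This is not the content of `test_path_traversal.py`; the helper `is_within_base` and the directory names are assumptions, and the check runs against a temporary directory instead of the real upload mount.

```python
import os
import tempfile
from typing import Dict


def is_within_base(candidate: str, base: str) -> bool:
    """True if candidate resolves to a location inside base."""
    real_base = os.path.realpath(base)
    real_candidate = os.path.realpath(candidate)
    # Compare by path component so "/x/uploads-evil" is not treated as inside "/x/uploads".
    return os.path.commonpath([real_base, real_candidate]) == real_base


def run_checks() -> Dict[str, bool]:
    with tempfile.TemporaryDirectory() as root:
        base = os.path.join(root, "uploads")
        sibling = os.path.join(root, "uploads-evil")
        os.makedirs(os.path.join(base, "temp"))
        os.makedirs(sibling)
        return {
            "inside": is_within_base(os.path.join(base, "temp", "a.pdf"), base),
            "dotdot": is_within_base(
                os.path.join(base, "temp", "..", "..", "etc", "passwd"), base),
            "sibling_prefix": is_within_base(os.path.join(sibling, "a.pdf"), base),
        }
```

The `sibling_prefix` case is worth keeping in any real suite: a naive string-prefix comparison accepts it, whereas a component-wise comparison rejects it.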
### Integration Tests (Backend)
1. Run backend AI service tests:
```bash
cd backend
pnpm test ai/ocr.service.spec.ts
pnpm test ai/sandbox-ocr-engine.service.spec.ts
```
2. Verify parameter resolution from database:
- Check that `ai_execution_profiles` row `ocr-extract` exists
- Check that `ai_prompts` has active row for `ocr_extraction` type
- Verify parameters are correctly resolved and sent to sidecar
### Manual Testing
1. Test path traversal protection:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `403 Forbidden`
2. Test valid OCR request:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/test.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `200 OK` with extracted text
3. Test parameter governance:
- Modify `ai_execution_profiles` row `ocr-extract` parameters
- Run OCR request
- Verify new parameters are used (check sidecar logs)
4. Test Active Prompt integration:
- Modify active prompt in `ai_prompts` for `ocr_extraction`
- Run OCR request
- Verify new system prompt is used
## Performance Testing
1. Benchmark async vs sync I/O:
```bash
# Use Apache Bench or similar tool
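# Phase 1 requires the X-API-Key header; without it every request returns 401
ab -n 1000 -c 10 -p ocr_request.json -T application/json \
-H "X-API-Key: your-api-key" \
http://192.168.10.100:8765/ocr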
```
Expected: 20%+ throughput improvement with async I/O
2. Monitor VRAM usage:
```bash
# On Desk-5439, monitor GPU usage during OCR operations
nvidia-smi -l 1
```
Expected: VRAM usage stays within limits, no exhaustion
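To compare the benchmark in step 1 against the 20% target, the relative gain can be computed from the requests-per-second figures of the two runs. The helper below is a small sketch; the function name is hypothetical and no measured numbers are implied.

```python
def throughput_gain_percent(baseline_rps: float, candidate_rps: float) -> float:
    """Relative throughput change of candidate over baseline, in percent."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return (candidate_rps - baseline_rps) / baseline_rps * 100.0


def meets_target(baseline_rps: float, candidate_rps: float, target_percent: float = 20.0) -> bool:
    # The 20% default mirrors the performance goal stated in this guide.
    return throughput_gain_percent(baseline_rps, candidate_rps) >= target_percent
```

Both runs should use the same request file and concurrency so that the ratio reflects the I/O change and not a different workload.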
## Monitoring
### Health Checks
- Sidecar health: `GET http://192.168.10.100:8765/health`
- Backend AI health: `GET http://localhost:3001/api/ai/health`
### Logs
- Sidecar logs: `docker-compose logs -f ocr-sidecar`
- Backend logs: Check backend application logs
### Metrics
- Monitor OCR request latency
- Monitor VRAM usage on Desk-5439
- Monitor error rates (403 for path traversal, 500 for internal errors)
## Rollback
If issues arise during deployment:
### Rollback Sidecar
1. Revert `app.py` to previous version
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Rollback Backend
1. Revert service changes in `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
cd backend
pnpm run build
./scripts/deploy.sh
```
### Emergency Rollback
If immediate rollback is needed:
1. Revert `keep_alive` to fixed value `0` in `process_ocr`
2. Restore hardcoded runtime parameters
3. Restore X-API-Key validation
4. Rebuild and redeploy
## Troubleshooting
### Sidecar fails to start
1. Check environment variables are set correctly
2. Check `OCR_SIDECAR_API_KEY` is provided (Phase 1)
3. Check Docker logs: `docker-compose logs ocr-sidecar`
4. Verify Ollama is running on Desk-5439
### Path traversal returns 200 instead of 403
1. Verify `OCR_SIDECAR_UPLOAD_BASE` is set correctly
2. Check path canonicalization logic in `app.py`
3. Test with absolute paths to verify whitelist check
### Parameters not being used
1. Check `ai_execution_profiles` row `ocr-extract` exists
2. Check backend service parameter resolution logic
3. Check sidecar receives parameters in request body
4. Check sidecar passes parameters to Ollama
### VRAM exhaustion
1. Check `calculate_ocr_residency()` is being called
2. Check `vram_monitor.py` and `residency_policy.py` are present
3. Verify CPU fallback is working for `/embed` and `/rerank`
4. Monitor GPU usage with `nvidia-smi`
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- [Sidecar API Contract](./contracts/sidecar-api.md)
@@ -0,0 +1,179 @@
# Research: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Document technical decisions and research findings from ADR-040
## Overview
All technical decisions for this refactor are already documented in ADR-040. This file consolidates those decisions for implementation reference.
## Security Decisions
### Hardcoded API Key Removal
- **Decision**: Remove hardcoded default API key (`lcbp3-dms-ocr-sidecar-secure-token-2026`) from `app.py`
- **Rationale**: Security vulnerability - a default key baked into the image cannot be rotated after a leak without rebuilding the container
- **Implementation**: Fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
- **Phase**: Phase 1 (before ADR-041 consolidation)
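The fail-fast behavior can be sketched as below. This is a minimal illustration, not the code in `app.py`: the helper name `load_api_key` and the error message are assumptions, and only the `OCR_SIDECAR_API_KEY` variable name comes from this document.

```python
import os


def load_api_key(env=None):
    """Return the sidecar API key, or raise if it is missing or blank.

    `load_api_key` is a hypothetical helper name used for illustration.
    """
    env = os.environ if env is None else env
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        # Deliberately no default value: a baked-in fallback is exactly
        # what this decision removes.
        raise RuntimeError("OCR_SIDECAR_API_KEY is not set; refusing to start")
    return key
```

Calling such a helper once during startup, before any route is served, would make a missing key surface as a container start failure rather than as a silently insecure service.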
### Path Traversal Hardening
- **Decision**: Implement path canonicalization + base-path whitelist on `/ocr` endpoint
- **Rationale**: Prevent arbitrary file read attacks (ADR-016)
- **Implementation**:
- Use `os.path.abspath()` + `os.path.realpath()` for canonicalization
- Whitelist base path = `OCR_SIDECAR_UPLOAD_BASE` (CIFS mount base)
- Reject paths outside base path → 403 Forbidden
- **Alternatives Considered**:
- Using path validation regex only → rejected (insufficient for symlink attacks)
- Chroot jail → rejected (overkill for this use case)
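A minimal sketch of the canonicalization and whitelist check described above, using only the standard library. The names `resolve_within_base` and `PathNotAllowed` are hypothetical; the real endpoint would presumably translate the exception into a 403 response.

```python
import os


class PathNotAllowed(Exception):
    """Raised when a requested path resolves outside the upload base."""


def resolve_within_base(requested_path, base_dir):
    """Canonicalize requested_path and ensure it stays under base_dir.

    Returns the canonical path. Callers should open the returned path,
    not the raw input, so the check and the read refer to the same file.
    """
    base = os.path.realpath(os.path.abspath(base_dir))
    # Relative inputs are joined onto the base; absolute inputs replace it.
    joined = os.path.join(base, requested_path)
    # realpath resolves symlinks, so a link pointing outside the base is caught.
    candidate = os.path.realpath(os.path.abspath(joined))
    # commonpath compares whole path components, so a sibling directory
    # such as "<base>-evil" does not pass as a prefix match would.
    if os.path.commonpath([base, candidate]) != base:
        raise PathNotAllowed(requested_path)
    return candidate
```

Comparing with `os.path.commonpath` rather than `str.startswith` is a design choice in this sketch: a plain prefix test would accept a directory whose name merely begins with the base path.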
## I/O Pattern Decisions
### Async I/O Refactor
- **Decision**: Refactor `process_ocr` to `async def` and use `httpx.AsyncClient` shared via lifespan
- **Rationale**: Synchronous blocking I/O stalls the FastAPI event loop and reduces throughput under load
- **Implementation**:
- Replace `httpx.Client` with `httpx.AsyncClient`
- Create AsyncClient in lifespan context manager
- Load models via `asyncio.to_thread` to avoid blocking startup
- **Performance Target**: 20%+ throughput improvement under concurrent load
- **Alternatives Considered**:
- Keep sync I/O but add more workers → rejected (still blocks event loop)
- Use thread pool → rejected (adds complexity without solving root cause)
### Lifespan Pattern
- **Decision**: Replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan
- **Rationale**: `@app.on_event` is deprecated in FastAPI; the lifespan handler provides better resource management and cleanup
- **Implementation**: Use FastAPI lifespan context manager for AsyncClient lifecycle
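The lifespan and `asyncio.to_thread` pattern can be sketched without any third-party dependency. Everything here is a stand-in: `FakeAsyncClient` takes the place of `httpx.AsyncClient`, `load_models` sleeps instead of loading real models, and a plain dict takes the place of the FastAPI application state. Only the shape of the pattern is intended to carry over.

```python
import asyncio
import time
from contextlib import asynccontextmanager


def load_models():
    """Stand-in for blocking model loading (hypothetical; it only sleeps)."""
    time.sleep(0.05)
    return {"embedder": "loaded"}


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient so the sketch runs on the stdlib alone."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: create one shared client and load models in a worker thread
    # so the event loop stays responsive while loading runs.
    app["http"] = FakeAsyncClient()
    app["models"] = await asyncio.to_thread(load_models)
    try:
        yield
    finally:
        # Shutdown: close the shared client exactly once.
        await app["http"].aclose()
```

In a real FastAPI application the same function would be passed as `FastAPI(lifespan=lifespan)`, with the client and models stored on `app.state` rather than in a dict.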
## GPU Resource Management Decisions
### Adaptive OCR Residency
- **Decision**: Wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive`
- **Rationale**: Preserve Adaptive OCR Residency policy from CONTEXT.md; avoid fixed values
- **Implementation**:
- Import `calculate_ocr_residency` from `residency_policy.py`
- Call function during OCR request to calculate appropriate keep_alive
- Do NOT accept explicit `options_override["keep_alive"]` from backend
- keep_alive is a lazy resource parameter calculated at process time (ADR-036 Gap-2)
- **Alternatives Rejected**:
- Fixed `keep_alive=0` (Claude plan) → rejected (violates ADR-036 Gap-2)
- Fixed `keep_alive=10m` (Qwen plan) → rejected (violates adaptive policy)
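One way to read the wiring above, as a hedged sketch. The body of `calculate_ocr_residency` below is a placeholder (the real function lives in `residency_policy.py` and its return values are not specified here), and `build_ollama_payload` and the `gpu_pressure` key are hypothetical names.

```python
def calculate_ocr_residency(active_profile):
    """Placeholder policy for illustration only; not the real implementation."""
    return "0" if active_profile.get("gpu_pressure") == "high" else "5m"


def build_ollama_payload(model, prompt, active_profile, options_override=None):
    """Assemble a generate payload whose keep_alive is computed by the sidecar."""
    options = dict(options_override or {})
    # A backend-supplied keep_alive is discarded rather than honored.
    options.pop("keep_alive", None)
    return {
        "model": model,
        "prompt": prompt,
        "options": options,
        # keep_alive is a top-level request field in Ollama, not an option.
        "keep_alive": calculate_ocr_residency(active_profile),
    }
```

This sketch silently drops a supplied `keep_alive`. Returning a validation error instead would be an equally valid reading of "do NOT accept", and the document does not say which behavior was chosen.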
### Retain VRAM Monitor and Residency Policy
- **Decision**: Retain `vram_monitor.py` and `residency_policy.py` modules
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- **Alternatives Rejected**:
- Delete these modules (Claude + Qwen plans) → rejected (violates CONTEXT.md resolved GPU policies)
### CPU Fallback for Retrieval
- **Decision**: Retain dynamic CPU/GPU selection for `/embed` and `/rerank` via `.to(device)` logic
- **Rationale**: CPU fallback required when GPU is under pressure; prevents VRAM exhaustion
- **Alternatives Rejected**:
- Force BGE-M3 and Reranker GPU-resident → rejected (violates LLM-First policy)
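The device choice behind the `.to(device)` logic might look like the following. This is an assumption-heavy sketch: `select_device`, the `reserve_mb` headroom, and the megabyte thresholds are all hypothetical, and the actual policy in `vram_monitor.py` may differ.

```python
def select_device(free_vram_mb, required_mb, reserve_mb=2048):
    """Return "cuda" only when the model fits and leaves headroom for the LLM.

    free_vram_mb: current free VRAM, or None when no GPU reading is available.
    required_mb:  estimated footprint of the retrieval model.
    reserve_mb:   headroom kept free for the LLM (hypothetical default).
    """
    if free_vram_mb is None:
        # No reading means no evidence the GPU is safe to use.
        return "cpu"
    return "cuda" if free_vram_mb - required_mb >= reserve_mb else "cpu"
```

The returned string would then feed the existing `.to(device)` call for `/embed` and `/rerank`, so retrieval models yield the GPU whenever the LLM's headroom is at risk.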
## Parameter Governance Decisions
### Remove Hardcoded Runtime Parameters
- **Decision**: Remove hardcoded `temperature`, `top_p`, `repeat_penalty`, `max_tokens` from sidecar
- **Rationale**: ADR-036 Profile-Only Parameter Governance; enable dynamic tuning without rebuild
- **Implementation**:
- Backend resolves parameters from `ai_execution_profiles` row `ocr-extract`
- Backend sends parameters to sidecar in every request
- Sidecar passes parameters to Ollama in every load/generate call
  - Modelfile serves as last-resort fallback only
- **Alternatives Rejected**:
- Keep hardcoded values in sidecar → rejected (violates ADR-036)
- Create new `PromptBuilderService` → rejected (use existing Active Prompt system)
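A sketch of how backend-supplied parameters could be translated into Ollama options. The function name and the mapping table are illustrative; in particular, mapping `max_tokens` to Ollama's `num_predict` option is an assumption about how the sidecar names things.

```python
# Parameter name used in this document -> Ollama option name.
PARAM_TO_OLLAMA_OPTION = {
    "temperature": "temperature",
    "top_p": "top_p",
    "repeat_penalty": "repeat_penalty",
    "max_tokens": "num_predict",  # assumed mapping to Ollama's generation cap
}


def to_ollama_options(runtime_params):
    """Translate backend runtime parameters into an Ollama options dict.

    Missing or None values are omitted so the model's own defaults apply,
    which is the last-resort fallback described above. Unknown keys are
    ignored so the sidecar adds no parameters of its own.
    """
    options = {}
    for name, option in PARAM_TO_OLLAMA_OPTION.items():
        value = runtime_params.get(name)
        if value is not None:
            options[option] = value
    return options
```

Omitting a key, rather than sending a sidecar-side default, keeps the sidecar free of hardcoded values: anything the backend does not send is decided by the model definition alone.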
### Active Prompt Integration
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt in `ai_prompts`
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in database not code
- **Implementation**:
- Backend resolves Active Prompt for `ocr_extraction` type
- Backend extracts systemPrompt and DMS tags (`<document_number>`, `<document_date>`, `<received_date>`)
- Backend sends systemPrompt and DMS tags to sidecar
- Sidecar receives and injects into Ollama request in every load/generate call
- **Alternatives Rejected**:
- Create new `PromptBuilderService` → rejected (use existing ADR-029/037 system)
- Hardcode DMS tags in sidecar → rejected (violates ADR-036 parameter governance)
## Authentication Decisions
### Two-Phase Auth Migration
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- **Rationale**: Sequenced migration; network isolation only possible after server consolidation
- **Phase 1 Implementation**:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Continue validating X-API-Key on both sidecar and backend
- **Phase 2 Implementation** (after ADR-041 consolidation):
- Remove X-API-Key validation from sidecar endpoints
- Remove X-API-Key send-side from `OcrService`
- Remove X-API-Key send-side from `SandboxOcrEngineService`
- Rely on Docker-internal network isolation
- **Interim Period**: X-API-Key validation must remain active until ADR-041 cutover
- **Alternatives Considered**:
- Remove X-API-Key immediately → rejected (cross-host topology requires defense-in-depth)
- Keep X-API-Key permanently → rejected (adds complexity without value post-consolidation)
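For the Phase 1 interim period, the header check could be as small as the sketch below. `is_authorized` is a hypothetical helper, and the constant-time comparison is a suggestion of this sketch, not something the ADR prescribes.

```python
import hmac


def is_authorized(presented_key, expected_key):
    """Compare the X-API-Key header value against the configured key.

    Returns False when either side is missing, so an unset key can never
    authorize a request.
    """
    if not presented_key or not expected_key:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```

Keeping the check in one function would also make the Phase 2 removal a single-point change once ADR-041 cutover is complete.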
## Endpoint Decisions
### Remove /normalize Endpoint
- **Decision**: Remove `/normalize` endpoint from sidecar
- **Rationale**: No consumers exist (verified by grep across backend codebase); ThaiPreprocessProcessor unused
- **Verification**: Grep search found no calls to `/normalize` or `THAI_PREPROCESS_URL`
- **Impact**: None - endpoint has no consumers
### Fix Mutable Default Argument
- **Decision**: Fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
- **Rationale**: Python anti-pattern; causes unexpected behavior when defaults are mutated
- **Implementation**: Change to `options_override: Optional[dict] = None` and initialize to `{}` in the function body
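The pitfall and the fix, shown on two toy functions. `buggy` and `fixed` are illustrative names, not the real `process_with_typhoon_ocr` signature.

```python
def buggy(options_override={}):
    """The default dict is created once and shared by every call."""
    options_override.setdefault("calls", 0)
    options_override["calls"] += 1
    return options_override


def fixed(options_override=None):
    """A fresh dict is created per call; the caller's dict is copied, not mutated."""
    options_override = {} if options_override is None else dict(options_override)
    options_override.setdefault("calls", 0)
    options_override["calls"] += 1
    return options_override
```

With `buggy`, state written during one request leaks into the next because both share the same default object. `fixed` starts clean on every call and leaves any dict passed by the caller untouched.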
## Dependencies
### External Dependencies
- **FastAPI 0.111.0**: Web framework (already in use)
- **httpx 0.27.0**: Async HTTP client (replace the sync `httpx.Client` with `httpx.AsyncClient`)
- **PyMuPDF 1.24.0**: PDF processing (already in use)
- **typhoon-ocr>=0.4.1**: OCR library (already in use)
- **FlagEmbedding>=1.2.0**: Embedding model (already in use)
- **pythainlp 5.0.4**: Thai NLP (already in use)
### Internal Dependencies
- **residency_policy.py**: Must retain for Adaptive OCR Residency
- **vram_monitor.py**: Must retain for VRAM monitoring
- **backend AI services**: OcrService, SandboxOcrEngineService must be updated for parameter resolution
## Testing Strategy
### Path Traversal Tests
- Test cases for various path traversal patterns (`../../etc/passwd`, symlinks, etc.)
- Expect 403 Forbidden for all malicious paths
- Use pytest for automated testing
### Residency Wiring Tests
- Unit test to verify `calculate_ocr_residency()` is called in `process_ocr`
- Verify keep_alive value is calculated dynamically, not fixed
- Test with different VRAM pressure scenarios
### Performance Tests
- Benchmark async vs sync I/O under concurrent load
- Target: 20%+ throughput improvement
- Measure response times and resource utilization
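The effect the benchmark is meant to capture can be reproduced in miniature. In this sketch `time.sleep` stands in for a synchronous HTTP call and `asyncio.sleep` for an awaited one; the handler names, request count, and delay are arbitrary illustration values, not the real benchmark parameters.

```python
import asyncio
import time


async def blocking_handler(delay):
    # Blocks the event loop, as a sync HTTP call inside a handler would.
    time.sleep(delay)


async def async_handler(delay):
    # Yields to the event loop, as an awaited HTTP call would.
    await asyncio.sleep(delay)


async def measure(handler, requests=5, delay=0.05):
    """Run `requests` handlers concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(requests)))
    return time.perf_counter() - start
```

Running `asyncio.run(measure(blocking_handler))` and `asyncio.run(measure(async_handler))` side by side shows the blocking variant taking roughly the sum of all delays while the async variant takes roughly one delay. A real benchmark would issue concurrent HTTP requests against the sidecar instead.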
## Rollback Plan
If issues arise during deployment:
1. Revert `app.py` to previous version
2. Restore X-API-Key send-side in backend services
3. Re-pin `keep_alive` default to `0` in `process_ocr`
4. Restore hardcoded runtime params if needed for emergency fallback
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- CONTEXT.md: GPU Policy (LLM-First Ownership, CPU Fallback)
@@ -0,0 +1,168 @@
# Feature Specification: OCR Sidecar Refactor
**Feature Branch**: `140-ocr-sidecar-refactor`
**Created**: 2026-06-20
**Status**: Draft
**Input**: ADR-040: OCR Sidecar Refactor — Pure Compute Worker, Preserved GPU Policy, Network-Trust Boundary
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Sidecar Security Hardening (Priority: P1)
System administrators need to ensure the OCR sidecar on Desk-5439 is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
**Why this priority**: Security vulnerabilities (hardcoded API keys, path traversal) are critical risks that could lead to unauthorized access and data breaches.
**Independent Test**: Can be fully tested by attempting path traversal requests and by starting the sidecar without `OCR_SIDECAR_API_KEY` to confirm it fails fast instead of falling back to a default key, delivering immediate security validation.
**Acceptance Scenarios**:
1. **Given** the sidecar's API key has leaked, **When** an administrator sets a new `OCR_SIDECAR_API_KEY` in the environment and restarts the container, **Then** the old key is rejected without rebuilding the image
2. **Given** a malicious request with path traversal (e.g., `../../etc/passwd`), **When** the `/ocr` endpoint receives the request, **Then** the system returns 403 Forbidden
3. **Given** the sidecar starts without the `OCR_SIDECAR_API_KEY` environment variable, **When** the container initializes, **Then** it fails fast with a clear error message
---
### User Story 2 - GPU Resource Management (Priority: P1)
The system must prevent VRAM exhaustion on Desk-5439 (RTX 5060 Ti 16GB) by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring the LLM (Typhoon-2.5) has priority GPU access.
**Why this priority**: VRAM exhaustion causes complete system failure. The LLM-First GPU Ownership policy is critical for system stability.
**Independent Test**: Can be fully tested by monitoring VRAM usage during concurrent OCR and embedding operations, verifying that BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
**Acceptance Scenarios**:
1. **Given** the GPU is under heavy load from LLM operations, **When** an OCR request comes in, **Then** the system uses `calculate_ocr_residency()` to determine appropriate `keep_alive` value
2. **Given** VRAM is nearly full, **When** embedding or reranking requests are made, **Then** BGE-M3 and FlagReranker automatically fall back to CPU
3. **Given** the sidecar loads the OCR model, **When** the operation completes, **Then** the model is unloaded based on residency policy (not fixed `keep_alive=0` or `300`)
---
### User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
Backend services need to control AI model parameters (temperature, top_p, repeat_penalty, max_tokens) from the database via `ai_execution_profiles` and `ai_prompts` tables, ensuring no hardcoded values in the sidecar. `keep_alive` is the exception: the sidecar calculates it per request through the residency policy (see FR-007).
**Why this priority**: This enables dynamic parameter tuning without container rebuilds, aligning with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System.
**Independent Test**: Can be fully tested by modifying `ai_execution_profiles` row `ocr-extract` and verifying that the sidecar uses the new parameters on the next request.
**Acceptance Scenarios**:
1. **Given** the `ai_execution_profiles` row `ocr-extract` has `temperature=0.7`, **When** the backend sends OCR request, **Then** the sidecar passes `temperature=0.7` to Ollama
2. **Given** the Active Prompt in `ai_prompts` contains system prompt and DMS tags, **When** the backend resolves the prompt, **Then** the sidecar receives and injects these into the Ollama request
3. **Given** a parameter is missing from the job snapshot, **When** the sidecar processes the request, **Then** it uses the Modelfile as last-resort fallback only
---
### User Story 4 - Async I/O Performance (Priority: P2)
The sidecar must use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
**Why this priority**: Synchronous blocking I/O reduces system throughput and can cause request timeouts under load.
**Independent Test**: Can be fully tested by running concurrent OCR requests and measuring response times, verifying that async implementation handles load without blocking.
**Acceptance Scenarios**:
1. **Given** the sidecar receives multiple concurrent OCR requests, **When** processing with `httpx.AsyncClient`, **Then** requests do not block each other
2. **Given** the sidecar starts up, **When** models are loaded, **Then** loading happens via `asyncio.to_thread` to avoid blocking startup
3. **Given** the sidecar is under load, **When** measuring request latency, **Then** async implementation shows improved throughput compared to sync version
---
### User Story 5 - Network Isolation Auth (Phase 2, Post-Consolidation) (Priority: P3)
After ADR-041 server consolidation completes (single Docker host), the system should remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
**Why this priority**: This is a future-phase improvement that simplifies the system after infrastructure consolidation. It's lower priority as it depends on ADR-041 completion.
**Independent Test**: Can be fully tested after consolidation by removing X-API-Key headers and verifying that requests from within Docker network succeed while external requests fail.
**Acceptance Scenarios**:
1. **Given** ADR-041 consolidation is complete (single Docker host), **When** backend calls sidecar without X-API-Key, **Then** the request succeeds via Docker-internal network
2. **Given** consolidation is complete, **When** external network attempts to call sidecar, **Then** the request is blocked by network isolation
3. **Given** the interim period (before consolidation), **When** backend calls sidecar, **Then** X-API-Key validation is still active
---
### Edge Cases
- What happens when the OCR sidecar receives a request for a PDF file that does not exist within the whitelisted base path? (Tested via path traversal test T007)
- How does the system handle VRAM exhaustion when both LLM and OCR models attempt to load simultaneously?
- What happens when the `ai_execution_profiles` row `ocr-extract` is missing or has invalid parameter values?
- How does the sidecar handle Ollama service unavailability or timeout during OCR processing? (Handled by FastAPI exception handling with user-friendly error messages per ADR-007)
- What happens when the Active Prompt system is unavailable during OCR request processing?
- How does the system handle concurrent requests when GPU is under extreme pressure (e.g., 95% VRAM usage)?
- What happens when path canonicalization resolves to a symlink outside the base path? (Tested via path traversal test T007 with symlink scenarios)
- How does the system behave during the transition period between Phase 1 (X-API-Key) and Phase 2 (Network Isolation)?
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: Sidecar MUST remove hardcoded default API key and fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
- **FR-002**: Sidecar MUST implement path canonicalization via `os.path.abspath()` + `os.path.realpath()` on all PDF path inputs
- **FR-003**: Sidecar MUST enforce base-path whitelist check on `/ocr` endpoint, rejecting paths outside `OCR_SIDECAR_UPLOAD_BASE` with 403 Forbidden
- **FR-004**: Sidecar MUST refactor `process_ocr` to use `async def` and `httpx.AsyncClient` via lifespan context manager
- **FR-005**: Sidecar MUST replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan pattern
- **FR-006**: Sidecar MUST wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive` calculation
- **FR-007**: Sidecar MUST NOT accept explicit `options_override["keep_alive"]` from backend (keep_alive must be calculated lazily per ADR-036 Gap-2)
- **FR-008**: Sidecar MUST retain `vram_monitor.py` and `residency_policy.py` modules (reject deletion)
- **FR-009**: Sidecar MUST retain dynamic CPU/GPU selection for `/embed` and `/rerank` endpoints via `.to(device)` logic
- **FR-010**: Sidecar MUST remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) and accept from backend job snapshot
- **FR-011**: Sidecar MUST receive systemPrompt and DMS extraction tags from backend and pass to Ollama in every load/generate call
- **FR-012**: Sidecar MUST remove `/normalize` endpoint (ThaiPreprocessProcessor has no consumers)
- **FR-013**: Sidecar MUST fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
- **FR-014**: Sidecar MUST load models via `asyncio.to_thread` during lifespan to avoid blocking startup
- **FR-015**: Backend MUST resolve runtime parameters from `ai_execution_profiles` row `ocr-extract` and send to sidecar
- **FR-016**: Backend MUST resolve systemPrompt and DMS tags from Active Prompt in `ai_prompts` (ADR-029/037)
- **FR-017**: Backend MUST send resolved parameters to sidecar in every OCR request
- **FR-018**: Phase 2 (post-ADR-041): Sidecar MUST remove X-API-Key validation from all endpoints
- **FR-019**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `OcrService`
- **FR-020**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `SandboxOcrEngineService`
### Key Entities
- **OCR Sidecar (FastAPI Service)**: Pure compute worker on Desk-5439 that provides `/ocr`, `/embed`, `/rerank` endpoints. No business logic or parameter governance. Receives parameters from backend.
- **ai_execution_profiles**: Database table containing runtime parameter profiles for different AI operations (row `ocr-extract` for OCR parameters)
- **ai_prompts**: Database table containing prompt templates with versioning and activation status (ADR-029/037)
- **Backend OcrService**: Service that orchestrates OCR requests, resolves parameters from database, and sends to sidecar
- **Backend SandboxOcrEngineService**: Service for OCR sandbox testing, similar parameter resolution as OcrService
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: Path traversal attacks return 403 Forbidden in 100% of test cases (verified by pytest suite)
- **SC-002**: VRAM exhaustion is prevented under load; system remains stable with LLM-First GPU Ownership policy (verified by VRAM monitoring during stress test)
- **SC-003**: OCR request throughput improves by at least 20% with async I/O implementation (measured by concurrent request benchmark)
- **SC-004**: Parameter changes in `ai_execution_profiles` take effect immediately without container rebuild (verified by runtime parameter update test)
- **SC-005**: System startup time does not increase despite async model loading (measured by container startup benchmark)
- **SC-006**: No hardcoded secrets remain in sidecar codebase (verified by code audit)
- **SC-007**: All sidecar endpoints respect network isolation after ADR-041 consolidation (verified by network access test)
- **SC-008**: CPU fallback for BGE-M3 and FlagReranker activates correctly when GPU is under pressure (verified by VRAM monitoring test)
## Assumptions
- ADR-041 server consolidation will complete before Phase 2 (X-API-Key removal) can be implemented
- Desk-5439 (192.168.10.100) will continue to host the OCR sidecar with RTX 5060 Ti 16GB GPU
- Ollama service on Desk-5439 will continue to provide Typhoon OCR model
- ThaiPreprocessProcessor has no active consumers (verified by grep search across backend codebase)
- `calculate_ocr_residency()` function exists in `residency_policy.py` and is not currently wired into `process_ocr`
- VLAN/firewall ACL provides interim network security before ADR-041 consolidation
## Dependencies
- ADR-041 Server Consolidation must complete before Phase 2 (X-API-Key removal)
- ADR-036 Profile-Only Parameter Governance must be implemented for parameter resolution
- ADR-029 Dynamic Prompt Management must be implemented for Active Prompt system
- ADR-037 Active Prompt System must be operational for system prompt injection
- Desk-5439 infrastructure must remain stable (GPU, network, Ollama service)
## Out of Scope
- 1-page-1-request horizontal scaling rework (separate future ADR)
- OpenTelemetry/Prometheus/Grafana observability (separate ticket)
- `/normalize` endpoint functionality (removed per D2; ThaiPreprocessProcessor has no consumers)
@@ -0,0 +1,296 @@
# Tasks: OCR Sidecar Refactor
**Input**: Design documents from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/sidecar-api.md, quickstart.md
**Tests**: Tests are included for path-traversal protection and residency wiring (per spec acceptance criteria)
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Path Conventions
- **Sidecar**: `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/`
- **Backend**: `backend/src/modules/ai/`
- **Tests**: `tests/unit/ocr-sidecar/`, `tests/integration/ocr-sidecar/`
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [x] T001 Create test directory structure in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests/
- [x] T002 Create test directory structure in tests/unit/ocr-sidecar/
- [x] T003 Create test directory structure in tests/integration/ocr-sidecar/
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [x] T004 Update requirements.txt in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/requirements.txt (add httpx 0.27.0, remove numpy if present)
- [x] T005 Update .env template in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env (add OCR_SIDECAR_API_KEY placeholder)
- [x] T006 Update backend .env.example in backend/.env.example (add OCR_API_URL, OCR_API_KEY placeholders)
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
---
## Phase 3: User Story 1 - Sidecar Security Hardening (Priority: P1) 🎯 MVP
**Goal**: Ensure the OCR sidecar is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
**Independent Test**: Attempt path traversal requests and verify they return 403 Forbidden; verify sidecar fails fast when OCR_SIDECAR_API_KEY env is missing.
### Tests for User Story 1
- [x] T007 [P] [US1] Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py (test various path patterns: ../../etc/passwd, symlinks outside base path, etc.)
- [x] T008 [P] [US1] Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py (test missing key, invalid key scenarios)
### Implementation for User Story 1
- [x] T009 [US1] Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T010 [US1] Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (raise error on startup if missing)
- [x] T011 [US1] Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (using os.path.abspath + os.path.realpath)
- [x] T012 [US1] Implement base-path whitelist check in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check against OCR_SIDECAR_UPLOAD_BASE)
- [x] T013 [US1] Add path validation to POST /ocr endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (return 403 for invalid paths)
- [x] T014 [US1] Fix mutable default argument options_override={} in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (change to None and initialize in function body)
- [x] T015 [US1] Remove duplicate import tempfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
---
## Phase 4: User Story 2 - GPU Resource Management (Priority: P1)
**Goal**: Prevent VRAM exhaustion on Desk-5439 by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring LLM has priority GPU access.
**Independent Test**: Monitor VRAM usage during concurrent OCR and embedding operations; verify BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
### Tests for User Story 2
- [x] T016 [P] [US2] Create residency wiring unit test in tests/unit/ocr-sidecar/test_residency_wiring.py (verify calculate_ocr_residency is called in process_ocr)
- [x] T017 [P] [US2] Create CPU fallback integration test in tests/integration/ocr-sidecar/test_cpu_fallback.py (verify BGE-M3 and FlagReranker use CPU when GPU under pressure)
### Implementation for User Story 2
- [x] T018 [US2] Import calculate_ocr_residency from residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T019 [US2] Wire calculate_ocr_residency(active_profile) into process_ocr function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T020 [US2] Remove hardcoded keep_alive=0 in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T021 [US2] Reject explicit options_override["keep_alive"] from backend in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (keep_alive must be calculated lazily per ADR-036 Gap-2)
- [x] T022 [US2] Retain vram_monitor.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
- [x] T023 [US2] Retain residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
- [x] T024 [US2] Verify dynamic CPU/GPU selection exists for /embed endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
- [x] T025 [US2] Verify dynamic CPU/GPU selection exists for /rerank endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
---
## Phase 5: User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
**Goal**: Enable backend services to control AI model parameters from the database via ai_execution_profiles and ai_prompts tables, ensuring no hardcoded values in the sidecar.
**Independent Test**: Modify ai_execution_profiles row ocr-extract and verify that the sidecar uses the new parameters on the next request.
### Tests for User Story 3
- [x] T026 [P] [US3] Create parameter resolution integration test in tests/integration/ocr-sidecar/test_parameter_governance.py (verify parameters from ai_execution_profiles are used)
- [x] T027 [P] [US3] Create Active Prompt integration test in tests/integration/ocr-sidecar/test_active_prompt.py (verify systemPrompt and DMS tags from ai_prompts are used)
### Implementation for User Story 3
- [x] T028 [US3] Remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T029 [US3] Add runtime_params field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T030 [US3] Add system_prompt field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T031 [US3] Add dms_tags field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T032 [US3] Pass runtime_params to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T033 [US3] Pass system_prompt to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
- [x] T034 [US3] Pass dms_tags to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
- [x] T035 [US3] Implement parameter resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_execution_profiles row ocr-extract)
- [x] T036 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_prompts type ocr_extraction)
- [x] T037 [US3] Extract systemPrompt and DMS tags in backend/src/modules/ai/services/ocr.service.ts
- [x] T038 [US3] Send resolved parameters to sidecar in backend/src/modules/ai/services/ocr.service.ts
- [x] T039 [US3] Implement parameter resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
- [x] T040 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
**Checkpoint**: At this point, User Stories 1 through 3 should be independently functional
---
## Phase 6: User Story 4 - Async I/O Performance (Priority: P2)
**Goal**: Use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
**Independent Test**: Run concurrent OCR requests and measure response times; verify async implementation handles load without blocking.
### Tests for User Story 4
- [x] T041 [P] [US4] Create async I/O performance test in tests/integration/ocr-sidecar/test_async_performance.py (benchmark concurrent requests)
### Implementation for User Story 4
- [x] T042 [US4] Refactor process_ocr to async def in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T043 [US4] Create AsyncClient via lifespan context manager in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T044 [US4] Replace httpx.Client with httpx.AsyncClient in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T045 [US4] Replace @app.on_event("startup") with @asynccontextmanager lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T046 [US4] Load models via asyncio.to_thread during lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (avoid blocking startup)
---
## Phase 7: User Story 5 - Network Isolation Auth Phase 2 (Priority: P3)
**Goal**: After ADR-041 server consolidation completes, remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
**Independent Test**: After consolidation, remove X-API-Key headers and verify that requests from within the Docker network succeed while external requests fail.
### Tests for User Story 5
- [ ] T047 [P] [US5] Create network isolation test in tests/integration/ocr-sidecar/test_network_isolation.py (verify Docker-internal requests work, external requests fail)
### Implementation for User Story 5 (BLOCKED until ADR-041 consolidation complete)
- [ ] T048 [US5] Remove X-API-Key validation from all endpoints in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [ ] T049 [US5] Remove OCR_SIDECAR_API_KEY from specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
- [ ] T050 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/ocr.service.ts
- [ ] T051 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
- [ ] T052 [US5] Remove OCR_API_KEY from backend/.env
- [ ] T053 [US5] Update OCR_API_URL to Docker-internal URL in backend/.env (e.g., http://sidecar:8765)
**Note**: Phase 7 tasks are BLOCKED until ADR-041 server consolidation completes. Do not implement until ADR-041 cutover is successful.
---
## Phase 8: Remove /normalize Endpoint (Cross-Cutting)
**Purpose**: Remove unused /normalize endpoint per ADR-040 D2
- [x] T054 Remove /normalize endpoint from specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T055 Verify no consumers exist via grep search in backend codebase
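
The consumer check in T055 could also be scripted instead of run as an ad hoc grep. The following is a hedged sketch: `find_references`, its default suffix list, and the plain substring match are all assumptions made for illustration, and a real run would likely exclude directories such as `node_modules`.

```python
from pathlib import Path


def find_references(root, needle="/normalize", suffixes=(".ts", ".js", ".py")):
    """Return (path, line_number, line) for each line under root containing needle.

    Plain substring match: longer routes that merely start with the needle
    are reported too, so hits still need a manual look.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # An empty result means no consumers were found under backend/.
    for path, number, line in find_references("backend"):
        print(f"{path}:{number}: {line}")
```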
---
## Phase 9: Polish & Cross-Cutting Concerns
**Purpose**: Improvements that affect multiple user stories
- [x] T056 [P] Update Dockerfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/Dockerfile (if any changes needed)
- [x] T057 [P] Update docker-compose.yml in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml (if any changes needed)
- [x] T058 Run path traversal test suite and verify all tests pass
- [x] T059 Run residency wiring test suite and verify all tests pass
- [x] T060 Run parameter governance test suite and verify all tests pass
- [x] T061 Run async performance test and verify 20%+ throughput improvement
- [x] T062 Update documentation in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/README.md
- [x] T063 Validate quickstart.md deployment steps on Desk-5439
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - can start immediately
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
- **User Stories (Phases 3-7)**: All depend on Foundational phase completion
- User Stories 1-4 (P1, P1, P2, P2) can proceed in parallel after Phase 2
- User Story 5 (P3) is BLOCKED until ADR-041 consolidation completes
- **Remove /normalize (Phase 8)**: Can run in parallel with user stories (no dependencies)
- **Polish (Phase 9)**: Depends on all desired user stories being complete
### User Story Dependencies
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 2 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 3 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 4 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 5 (P3)**: BLOCKED until ADR-041 consolidation completes
### Within Each User Story
- Tests MUST be written and FAIL before implementation (TDD approach)
- Sidecar implementation before backend implementation (for parameter governance story)
- Core implementation before integration
- Story complete before moving to next priority
### Parallel Opportunities
- All Setup tasks (T001-T003) can run in parallel
- All Foundational tasks (T004-T006) can run in parallel
- Once Foundational phase completes, User Stories 1-4 can start in parallel (if team capacity allows)
- All tests for a user story marked [P] can run in parallel
- User Story 5 tasks can run in parallel once ADR-041 consolidation completes
- Remove /normalize tasks (T054-T055) can run in parallel with user stories
- Polish tasks (T056-T057) can run in parallel
---
## Parallel Example: User Story 1
```bash
# Launch all tests for User Story 1 together:
Task: "Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py"
Task: "Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py"
# Launch implementation tasks sequentially (each depends on previous):
Task: "Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
Task: "Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
Task: "Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
```
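The fail-fast key check and path canonicalization named in the implementation tasks above might look roughly like this. It is a minimal sketch, not the code in `app.py`: the function names, the `allowed_root` parameter, and the error types are hypothetical, and the only detail taken from the tasks is the `OCR_SIDECAR_API_KEY` variable name.

```python
import os
from pathlib import Path


def require_api_key(env=None) -> str:
    """Return OCR_SIDECAR_API_KEY, or raise if it is missing or empty.

    There is deliberately no default value: a missing key stops startup.
    """
    env = os.environ if env is None else env
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OCR_SIDECAR_API_KEY must be set; refusing to start")
    return key


def canonicalize_path(user_path: str, allowed_root: str) -> Path:
    """Resolve user_path under allowed_root and reject anything that escapes it.

    resolve() collapses '..' segments and follows symlinks, so the check
    runs against the real location rather than the raw request string.
    """
    root = Path(allowed_root).resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes allowed root: {user_path}")
    return candidate
```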
---
## Implementation Strategy
### MVP First (User Stories 1-2 Only - Critical Security & GPU Management)
1. Complete Phase 1: Setup
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
3. Complete Phase 3: User Story 1 (Security Hardening)
4. Complete Phase 4: User Story 2 (GPU Resource Management)
5. **STOP and VALIDATE**: Test User Stories 1-2 independently
6. Deploy/demo if ready
### Incremental Delivery
1. Complete Setup + Foundational → Foundation ready
2. Add User Story 1 → Test independently → Deploy/Demo (Security MVP!)
3. Add User Story 2 → Test independently → Deploy/Demo (GPU Management MVP!)
4. Add User Story 3 → Test independently → Deploy/Demo (Parameter Governance)
5. Add User Story 4 → Test independently → Deploy/Demo (Async Performance)
6. Wait for ADR-041 consolidation → Add User Story 5 → Test independently → Deploy/Demo
7. Each story adds value without breaking previous stories
### Parallel Team Strategy
With multiple developers:
1. Team completes Setup + Foundational together
2. Once Foundational is done:
- Developer A: User Story 1 (Security)
- Developer B: User Story 2 (GPU Management)
- Developer C: User Story 3 (Parameter Governance)
- Developer D: User Story 4 (Async I/O)
3. Stories complete and integrate independently
4. After ADR-041 consolidation, Developer A or an additional Developer E takes User Story 5 (Network Isolation)
---
## Notes
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- Each user story should be independently completable and testable
- Verify tests fail before implementing
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- User Story 5 is BLOCKED until ADR-041 consolidation completes
- Phase 7 tasks should NOT be started until ADR-041 cutover is successful
- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence