7.3 KiB
Implementation Plan: OCR Sidecar Refactor
Branch: 140-ocr-sidecar-refactor | Date: 2026-06-20 | Spec: spec.md
Input: Feature specification from /specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md
Summary
Refactor the OCR sidecar on Desk-5439 to address security vulnerabilities (hardcoded API keys, path traversal), implement async I/O for performance, preserve GPU resource management policies (Adaptive OCR Residency, CPU Fallback Retrieval), and align with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System. The sidecar becomes a pure compute worker with all orchestration and parameter governance moved to backend services.
Technical Context
Language/Version: Python 3.11+ (FastAPI) Primary Dependencies: FastAPI 0.111.0, httpx 0.27.0, PyMuPDF 1.24.0, typhoon-ocr>=0.4.1, FlagEmbedding>=1.2.0, pythainlp 5.0.4 Storage: No database access (ADR-023/023A boundary - sidecar is pure compute worker) Testing: pytest for path-traversal and residency wiring tests Target Platform: Desk-5439 (192.168.10.100, Windows 10/11, RTX 5060 Ti 16GB GPU) via Docker Project Type: Infrastructure (sidecar service) Performance Goals: 20%+ throughput improvement with async I/O; VRAM exhaustion prevention under load Constraints: Must preserve LLM-First GPU Ownership; must not bypass existing residency_policy.py; must align with ADR-036 Gap-2 (keep_alive as lazy resource param) Scale/Scope: Single sidecar service; affects backend AI services (OcrService, SandboxOcrEngineService)
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| Gate | Status | Justification |
|---|---|---|
| ADR-019 UUID | ✅ PASS | Sidecar N/A (pure compute worker), Backend applies ADR-019 (parameter resolution in OcrService/SandboxOcrEngineService) |
| ADR-009 Schema | N/A | No database schema changes in sidecar |
| ADR-016 Security | ✅ PASS | Path traversal hardening; no hardcoded secrets; network isolation auth |
| ADR-002 Numbering | N/A | No document numbering in sidecar |
| ADR-008 BullMQ | N/A | Sidecar does not use BullMQ (backend does) |
| ADR-023/023A AI Boundary | ✅ PASS | Sidecar is pure compute worker; no DB/storage access; AI → DMS API → DB pattern preserved |
| ADR-007 Errors | ✅ PASS | FastAPI exception handling with user-friendly messages |
| TypeScript Strict | N/A | Python codebase |
Project Structure
Documentation (this feature)
specs/100-Infrastructures/140-ocr-sidecar-refactor/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output (technical decisions from ADR-040)
├── data-model.md # Phase 1 output (data contracts)
├── quickstart.md # Phase 1 output (deployment guide)
├── contracts/ # Phase 1 output (API contracts)
│ └── sidecar-api.md # Sidecar API specification
└── tasks.md # Phase 2 output (implementation tasks)
Source Code
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py # FastAPI application (main refactor target)
├── residency_policy.py # Retain (Adaptive OCR Residency)
├── vram_monitor.py # Retain (VRAM monitoring)
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
├── docker-compose.yml # Orchestration
└── .env # Environment variables
backend/src/modules/ai/
├── services/
│ ├── ocr.service.ts # Parameter resolution + sidecar calls
│ └── sandbox-ocr-engine.service.ts # Sandbox parameter resolution
└── processors/
└── ai-batch.processor.ts # BullMQ processor (unchanged)
tests/
├── unit/
│ └── ocr-sidecar/ # Sidecar unit tests
│ ├── test_path_traversal.py # Path traversal tests
│ └── test_residency_wiring.py # Residency calculation tests
└── integration/
└── ocr-sidecar/ # Sidecar integration tests
Structure Decision: Infrastructure refactor targeting existing OCR sidecar on Desk-5439. Backend changes limited to parameter resolution in AI services. No new frontend changes.
Complexity Tracking
No constitution violations - all gates pass. This section not applicable.
Phase 0: Research & Technical Decisions
All technical decisions are already documented in ADR-040. Key decisions:
Security Decisions
- Decision: Remove hardcoded default API key; fail-fast if env missing
- Rationale: Security vulnerability - leaked key cannot be rotated without rebuild
- Decision: Implement path canonicalization + base-path whitelist
- Rationale: Prevent path traversal attacks (ADR-016)
I/O Pattern Decisions
- Decision: Refactor to async I/O with shared AsyncClient via lifespan
- Rationale: Synchronous blocking I/O reduces throughput under load
- Decision: Replace
@app.on_event("startup")with lifespan context manager - Rationale: Deprecated pattern; lifespan provides better resource management
GPU Resource Management Decisions
- Decision: Wire
calculate_ocr_residency()intoprocess_ocrfor dynamic keep_alive - Rationale: Preserve Adaptive OCR Residency policy (CONTEXT.md); avoid fixed values
- Decision: Retain vram_monitor.py and residency_policy.py
- Rationale: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- Decision: Reject forced GPU-resident BGE-M3/Reranker
- Rationale: CPU fallback is required for VRAM pressure scenarios
Parameter Governance Decisions
- Decision: Remove hardcoded runtime params; accept from backend job snapshot
- Rationale: ADR-036 Profile-Only Parameter Governance; dynamic tuning without rebuild
- Decision: Backend resolves systemPrompt and DMS tags from Active Prompt
- Rationale: ADR-029/037 Active Prompt System; prompt authority in DB not code
- Decision: Reject creating PromptBuilderService
- Rationale: Use existing Active Prompt system; avoid invented orchestration
Auth Decisions
- Decision: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- Rationale: Sequenced migration; network isolation only possible post-consolidation
- Decision: Interim period requires X-API-Key validation
- Rationale: Cross-host topology (before ADR-041) requires defense-in-depth
Endpoint Decisions
- Decision: Remove /normalize endpoint
- Rationale: No consumers (verified by grep); ThaiPreprocessProcessor unused
- Decision: Fix mutable default argument
options_override={} - Rationale: Python anti-pattern; causes unexpected behavior
Phase 1: Design & Contracts
Data Model
See data-model.md for detailed data contracts and entity relationships.
API Contracts
See contracts/sidecar-api.md for sidecar API specification.
Quickstart Guide
See quickstart.md for deployment and testing instructions.
Phase 2: Implementation (Tasks)
See tasks.md for detailed implementation tasks generated by /speckit-tasks.