Files
lcbp3/specs/100-Infrastructures/140-ocr-sidecar-refactor/plan.md
T
admin a80ebef285
CI / CD Pipeline / build (push) Successful in 7m37s
CI / CD Pipeline / deploy (push) Failing after 20m15s
refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
2026-06-20 16:37:04 +07:00

7.3 KiB

Implementation Plan: OCR Sidecar Refactor

Branch: 140-ocr-sidecar-refactor | Date: 2026-06-20 | Spec: spec.md Input: Feature specification from /specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md

Summary

Refactor the OCR sidecar on Desk-5439 to address security vulnerabilities (hardcoded API keys, path traversal), implement async I/O for performance, preserve GPU resource management policies (Adaptive OCR Residency, CPU Fallback Retrieval), and align with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System. The sidecar becomes a pure compute worker with all orchestration and parameter governance moved to backend services.

Technical Context

Language/Version: Python 3.11+ (FastAPI) Primary Dependencies: FastAPI 0.111.0, httpx 0.27.0, PyMuPDF 1.24.0, typhoon-ocr>=0.4.1, FlagEmbedding>=1.2.0, pythainlp 5.0.4 Storage: No database access (ADR-023/023A boundary - sidecar is pure compute worker) Testing: pytest for path-traversal and residency wiring tests Target Platform: Desk-5439 (192.168.10.100, Windows 10/11, RTX 5060 Ti 16GB GPU) via Docker Project Type: Infrastructure (sidecar service) Performance Goals: 20%+ throughput improvement with async I/O; VRAM exhaustion prevention under load Constraints: Must preserve LLM-First GPU Ownership; must not bypass existing residency_policy.py; must align with ADR-036 Gap-2 (keep_alive as lazy resource param) Scale/Scope: Single sidecar service; affects backend AI services (OcrService, SandboxOcrEngineService)

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Gate Status Justification
ADR-019 UUID PASS Sidecar N/A (pure compute worker), Backend applies ADR-019 (parameter resolution in OcrService/SandboxOcrEngineService)
ADR-009 Schema N/A No database schema changes in sidecar
ADR-016 Security PASS Path traversal hardening; no hardcoded secrets; network isolation auth
ADR-002 Numbering N/A No document numbering in sidecar
ADR-008 BullMQ N/A Sidecar does not use BullMQ (backend does)
ADR-023/023A AI Boundary PASS Sidecar is pure compute worker; no DB/storage access; AI → DMS API → DB pattern preserved
ADR-007 Errors PASS FastAPI exception handling with user-friendly messages
TypeScript Strict N/A Python codebase

Project Structure

Documentation (this feature)

specs/100-Infrastructures/140-ocr-sidecar-refactor/
├── spec.md              # Feature specification
├── plan.md              # This file
├── research.md          # Phase 0 output (technical decisions from ADR-040)
├── data-model.md        # Phase 1 output (data contracts)
├── quickstart.md        # Phase 1 output (deployment guide)
├── contracts/           # Phase 1 output (API contracts)
│   └── sidecar-api.md   # Sidecar API specification
└── tasks.md             # Phase 2 output (implementation tasks)

Source Code

specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py               # FastAPI application (main refactor target)
├── residency_policy.py  # Retain (Adaptive OCR Residency)
├── vram_monitor.py      # Retain (VRAM monitoring)
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container definition
├── docker-compose.yml   # Orchestration
└── .env                 # Environment variables

backend/src/modules/ai/
├── services/
│   ├── ocr.service.ts                    # Parameter resolution + sidecar calls
│   └── sandbox-ocr-engine.service.ts     # Sandbox parameter resolution
└── processors/
    └── ai-batch.processor.ts             # BullMQ processor (unchanged)

tests/
├── unit/
│   └── ocr-sidecar/                      # Sidecar unit tests
│       ├── test_path_traversal.py        # Path traversal tests
│       └── test_residency_wiring.py      # Residency calculation tests
└── integration/
    └── ocr-sidecar/                      # Sidecar integration tests

Structure Decision: Infrastructure refactor targeting existing OCR sidecar on Desk-5439. Backend changes limited to parameter resolution in AI services. No new frontend changes.

Complexity Tracking

No constitution violations - all gates pass. This section not applicable.

Phase 0: Research & Technical Decisions

All technical decisions are already documented in ADR-040. Key decisions:

Security Decisions

  • Decision: Remove hardcoded default API key; fail-fast if env missing
  • Rationale: Security vulnerability - leaked key cannot be rotated without rebuild
  • Decision: Implement path canonicalization + base-path whitelist
  • Rationale: Prevent path traversal attacks (ADR-016)

I/O Pattern Decisions

  • Decision: Refactor to async I/O with shared AsyncClient via lifespan
  • Rationale: Synchronous blocking I/O reduces throughput under load
  • Decision: Replace @app.on_event("startup") with lifespan context manager
  • Rationale: Deprecated pattern; lifespan provides better resource management

GPU Resource Management Decisions

  • Decision: Wire calculate_ocr_residency() into process_ocr for dynamic keep_alive
  • Rationale: Preserve Adaptive OCR Residency policy (CONTEXT.md); avoid fixed values
  • Decision: Retain vram_monitor.py and residency_policy.py
  • Rationale: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
  • Decision: Reject forced GPU-resident BGE-M3/Reranker
  • Rationale: CPU fallback is required for VRAM pressure scenarios

Parameter Governance Decisions

  • Decision: Remove hardcoded runtime params; accept from backend job snapshot
  • Rationale: ADR-036 Profile-Only Parameter Governance; dynamic tuning without rebuild
  • Decision: Backend resolves systemPrompt and DMS tags from Active Prompt
  • Rationale: ADR-029/037 Active Prompt System; prompt authority in DB not code
  • Decision: Reject creating PromptBuilderService
  • Rationale: Use existing Active Prompt system; avoid invented orchestration

Auth Decisions

  • Decision: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
  • Rationale: Sequenced migration; network isolation only possible post-consolidation
  • Decision: Interim period requires X-API-Key validation
  • Rationale: Cross-host topology (before ADR-041) requires defense-in-depth

Endpoint Decisions

  • Decision: Remove /normalize endpoint
  • Rationale: No consumers (verified by grep); ThaiPreprocessProcessor unused
  • Decision: Fix mutable default argument options_override={}
  • Rationale: Python anti-pattern; causes unexpected behavior

Phase 1: Design & Contracts

Data Model

See data-model.md for detailed data contracts and entity relationships.

API Contracts

See contracts/sidecar-api.md for sidecar API specification.

Quickstart Guide

See quickstart.md for deployment and testing instructions.

Phase 2: Implementation (Tasks)

See tasks.md for detailed implementation tasks generated by /speckit-tasks.