Research: OCR Sidecar Refactor
Date: 2026-06-20
Purpose: Document technical decisions and research findings from ADR-040
Overview
All technical decisions for this refactor are already documented in ADR-040. This file consolidates those decisions for implementation reference.
Security Decisions
Hardcoded API Key Removal
- Decision: Remove hardcoded default API key (lcbp3-dms-ocr-sidecar-secure-token-2026) from app.py
- Rationale: Security vulnerability - if leaked, key cannot be rotated without rebuilding container
- Implementation: Fail-fast if OCR_SIDECAR_API_KEY environment variable is missing
- Phase: Phase 1 (before ADR-041 consolidation)
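A minimal sketch of the fail-fast behaviour, assuming the key is read once at startup. The helper name load_api_key and the error message are illustrative, not taken from app.py; only the OCR_SIDECAR_API_KEY variable name comes from the decision above.

```python
import os


def load_api_key(environ=os.environ) -> str:
    """Return the sidecar API key, or raise if it is not configured.

    Hypothetical helper: the name and message are assumptions.
    """
    key = environ.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        # No default is supplied on purpose: a missing key stops startup
        # instead of silently falling back to a baked-in secret.
        raise RuntimeError("OCR_SIDECAR_API_KEY is not set; refusing to start")
    return key
```

Calling such a helper at import or lifespan startup makes a misconfigured container exit immediately rather than serve requests with a known key.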
Path Traversal Hardening
- Decision: Implement path canonicalization + base-path whitelist on /ocr endpoint
- Rationale: Prevent arbitrary file read attacks (ADR-016)
- Implementation:
  - Use os.path.abspath() + os.path.realpath() for canonicalization
  - Whitelist base path = OCR_SIDECAR_UPLOAD_BASE (CIFS mount base)
  - Reject paths outside base path → 403 Forbidden
- Alternatives Considered:
  - Using path validation regex only → rejected (insufficient for symlink attacks)
  - Chroot jail → rejected (overkill for this use case)
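The canonicalize-then-whitelist check could look roughly like the sketch below. The function name resolve_under_base and the PathNotAllowed exception are hypothetical; the base argument is assumed to carry the value of OCR_SIDECAR_UPLOAD_BASE, and the endpoint is assumed to translate the exception into a 403 response.

```python
import os


class PathNotAllowed(Exception):
    """Raised when a requested path resolves outside the upload base."""


def resolve_under_base(requested: str, base: str) -> str:
    """Canonicalize `requested` and ensure it stays under `base`."""
    base_real = os.path.realpath(os.path.abspath(base))
    # Relative inputs are resolved against the base, not the process cwd.
    # realpath follows symlinks, so a link that points outside is caught.
    candidate = os.path.realpath(
        os.path.abspath(os.path.join(base_real, requested))
    )
    try:
        inside = os.path.commonpath([base_real, candidate]) == base_real
    except ValueError:
        # Raised for paths that cannot be compared (e.g. different drives).
        inside = False
    if not inside:
        raise PathNotAllowed(requested)
    return candidate
```

Comparing with os.path.commonpath rather than a string prefix avoids accepting a sibling directory such as /mnt/uploads-old when the base is /mnt/uploads.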
I/O Pattern Decisions
Async I/O Refactor
- Decision: Refactor process_ocr to async def and use httpx.AsyncClient shared via lifespan
- Rationale: Synchronous blocking I/O blocks the FastAPI event loop and reduces throughput under load
- Implementation:
  - Replace httpx.Client with httpx.AsyncClient
  - Create AsyncClient in lifespan context manager
  - Load models via asyncio.to_thread to avoid blocking startup
- Performance Target: 20%+ throughput improvement under concurrent load
- Alternatives Considered:
  - Keep sync I/O but add more workers → rejected (still blocks event loop)
  - Use thread pool → rejected (adds complexity without solving root cause)
Lifespan Pattern
- Decision: Replace @app.on_event("startup") with @asynccontextmanager lifespan
- Rationale: on_event is a deprecated pattern; lifespan provides better resource management and cleanup
- Implementation: Use FastAPI lifespan context manager for AsyncClient lifecycle
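The lifespan shape described above can be sketched with the standard library alone. SharedClient is a stand-in for httpx.AsyncClient so the example runs without third-party packages, and load_models is a placeholder for the blocking model load; both names are assumptions made for illustration.

```python
import asyncio
from contextlib import asynccontextmanager


class SharedClient:
    """Stand-in for httpx.AsyncClient (assumption, keeps the sketch stdlib-only)."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


def load_models() -> dict:
    """Placeholder for the blocking model load (hypothetical)."""
    return {"ocr": "loaded"}


@asynccontextmanager
async def lifespan(app):
    # Startup: create one shared client and load models in a worker
    # thread so the event loop is not blocked while they load.
    app.state.http = SharedClient()
    app.state.models = await asyncio.to_thread(load_models)
    try:
        yield
    finally:
        # Shutdown: release the client even if the app exits with an error.
        await app.state.http.aclose()
```

With FastAPI the function would be passed as FastAPI(lifespan=lifespan), and SharedClient() would be replaced by httpx.AsyncClient(), which exposes the same aclose() coroutine.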
GPU Resource Management Decisions
Adaptive OCR Residency
- Decision: Wire calculate_ocr_residency(active_profile) into process_ocr for dynamic keep_alive
- Rationale: Preserve Adaptive OCR Residency policy from CONTEXT.md; avoid fixed values
- Implementation:
  - Import calculate_ocr_residency from residency_policy.py
  - Call function during OCR request to calculate appropriate keep_alive
  - Do NOT accept explicit options_override["keep_alive"] from backend
  - keep_alive is a lazy resource parameter calculated at process time (ADR-036 Gap-2)
- Alternatives Rejected:
  - Fixed keep_alive=0 (Claude plan) → rejected (violates ADR-036 Gap-2)
  - Fixed keep_alive=10m (Qwen plan) → rejected (violates adaptive policy)
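A rough sketch of the wiring, under stated assumptions: the real calculate_ocr_residency lives in residency_policy.py and its return type is not documented here, so the version below is a stand-in with invented profile names and values. The helper resolve_keep_alive is also hypothetical; it only illustrates that a caller-supplied keep_alive is discarded.

```python
from typing import Optional, Tuple


def calculate_ocr_residency(active_profile: str) -> str:
    """Stand-in for residency_policy.calculate_ocr_residency (signature assumed)."""
    # Hypothetical mapping, purely for illustration.
    return {"low-vram": "0", "balanced": "2m"}.get(active_profile, "0")


def resolve_keep_alive(
    active_profile: str, options_override: Optional[dict] = None
) -> Tuple[str, dict]:
    """Return (keep_alive, remaining options) for one OCR request."""
    options = dict(options_override or {})
    # A keep_alive sent by the backend is dropped, never honoured.
    options.pop("keep_alive", None)
    # The value is computed at request time from the active profile.
    return calculate_ocr_residency(active_profile), options
```

Because the override is copied before the key is removed, the caller's dictionary is left unchanged.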
Retain VRAM Monitor and Residency Policy
- Decision: Retain vram_monitor.py and residency_policy.py modules
- Rationale: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- Alternatives Rejected:
  - Delete these modules (Claude + Qwen plans) → rejected (violates CONTEXT.md resolved GPU policies)
CPU Fallback for Retrieval
- Decision: Retain dynamic CPU/GPU selection for /embed and /rerank via .to(device) logic
- Rationale: CPU fallback required when GPU is under pressure; prevents VRAM exhaustion
- Alternatives Rejected:
  - Force BGE-M3 and Reranker GPU-resident → rejected (violates LLM-First policy)
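The device choice itself reduces to a small decision function. The sketch below is an assumption about its shape: choose_device, its parameters, and the idea of comparing free VRAM against a required amount are illustrative, since the actual interface of vram_monitor.py is not described in this file.

```python
def choose_device(
    free_vram_mb: float, required_mb: float, cuda_available: bool = True
) -> str:
    """Pick "cuda" only when a GPU exists and has enough free memory."""
    if cuda_available and free_vram_mb >= required_mb:
        return "cuda"
    # Under VRAM pressure (or with no GPU) retrieval falls back to CPU,
    # leaving the GPU to the LLM.
    return "cpu"
```

The returned string would then feed the existing .to(device) call for the embedding and reranker models.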
Parameter Governance Decisions
Remove Hardcoded Runtime Parameters
- Decision: Remove hardcoded temperature, top_p, repeat_penalty, max_tokens from sidecar
- Rationale: ADR-036 Profile-Only Parameter Governance; enable dynamic tuning without rebuild
- Implementation:
  - Backend resolves parameters from ai_execution_profiles row ocr-extract
  - Backend sends parameters to sidecar in every request
  - Sidecar passes parameters to Ollama in every load/generate call
  - Modelfile serves as last-resort fallback only
- Alternatives Rejected:
  - Keep hardcoded values in sidecar → rejected (violates ADR-036)
  - Create new PromptBuilderService → rejected (use existing Active Prompt system)
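One way the sidecar might forward backend-supplied parameters without adding defaults of its own is sketched below. The function name build_ollama_options and the request shape are assumptions; the option names on the right-hand side of PARAM_MAP are Ollama's, where the generation length limit is called num_predict.

```python
from typing import Any, Dict

# Backend parameter name -> Ollama option name.
PARAM_MAP = {
    "temperature": "temperature",
    "top_p": "top_p",
    "repeat_penalty": "repeat_penalty",
    "max_tokens": "num_predict",
}


def build_ollama_options(params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate backend parameters into an Ollama options dict."""
    options: Dict[str, Any] = {}
    for name, ollama_name in PARAM_MAP.items():
        value = params.get(name)
        if value is not None:
            options[ollama_name] = value
    # Nothing is defaulted here: an omitted key is left out of the
    # request so the Modelfile value applies as the last resort.
    return options
```

Keeping the mapping in one table means a new profile parameter is a one-line change rather than another hardcoded value.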
Active Prompt Integration
- Decision: Backend resolves systemPrompt and DMS tags from Active Prompt in ai_prompts
- Rationale: ADR-029/037 Active Prompt System; prompt authority in database not code
- Implementation:
  - Backend resolves Active Prompt for ocr_extraction type
  - Backend extracts systemPrompt and DMS tags (<document_number>, <document_date>, <received_date>)
  - Backend sends systemPrompt and DMS tags to sidecar
  - Sidecar receives and injects into Ollama request in every load/generate call
- Alternatives Rejected:
  - Create new PromptBuilderService → rejected (use existing ADR-029/037 system)
  - Hardcode DMS tags in sidecar → rejected (violates ADR-036 parameter governance)
Authentication Decisions
Two-Phase Auth Migration
- Decision: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- Rationale: Sequenced migration; network isolation only possible after server consolidation
- Phase 1 Implementation:
  - Remove hardcoded default API key
  - Fail-fast if OCR_SIDECAR_API_KEY env missing
  - Continue validating X-API-Key on both sidecar and backend
- Phase 2 Implementation (after ADR-041 consolidation):
  - Remove X-API-Key validation from sidecar endpoints
  - Remove X-API-Key send-side from OcrService
  - Remove X-API-Key send-side from SandboxOcrEngineService
  - Rely on Docker-internal network isolation
- Interim Period: X-API-Key validation must remain active until ADR-041 cutover
- Alternatives Considered:
  - Remove X-API-Key immediately → rejected (cross-host topology requires defense-in-depth)
  - Keep X-API-Key permanently → rejected (adds complexity without value post-consolidation)
Endpoint Decisions
Remove /normalize Endpoint
- Decision: Remove /normalize endpoint from sidecar
- Rationale: No consumers exist (verified by grep across backend codebase); ThaiPreprocessProcessor unused
- Verification: Grep search found no calls to /normalize or THAI_PREPROCESS_URL
- Impact: None - endpoint has no consumers
Fix Mutable Default Argument
- Decision: Fix mutable default argument options_override={} in process_with_typhoon_ocr
- Rationale: Python anti-pattern; the default dict is created once and shared across calls, so mutating it causes unexpected behavior
- Implementation: Change to options_override: Optional[dict] = None and initialize to {} in function body
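The pitfall and the fix can be shown with two toy functions. These are illustrations only, not the body of process_with_typhoon_ocr; the "calls" key exists just to make the shared state visible.

```python
from typing import Optional


def buggy(options_override: dict = {}) -> dict:
    # The default dict is created once, at definition time, and is
    # shared by every call that omits the argument.
    options_override.setdefault("calls", 0)
    options_override["calls"] += 1
    return options_override


def fixed(options_override: Optional[dict] = None) -> dict:
    # A fresh dict is built on every call; the caller's dict is copied
    # rather than mutated.
    options = dict(options_override or {})
    options.setdefault("calls", 0)
    options["calls"] += 1
    return options
```

Calling buggy() twice with no argument yields a count of 1 and then 2, because state leaks between calls, whereas fixed() yields 1 each time.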
Dependencies
External Dependencies
- FastAPI 0.111.0: Web framework (already in use)
- httpx 0.27.0: HTTP client (switch from sync httpx.Client to httpx.AsyncClient)
- PyMuPDF 1.24.0: PDF processing (already in use)
- typhoon-ocr>=0.4.1: OCR library (already in use)
- FlagEmbedding>=1.2.0: Embedding model (already in use)
- pythainlp 5.0.4: Thai NLP (already in use)
Internal Dependencies
- residency_policy.py: Must retain for Adaptive OCR Residency
- vram_monitor.py: Must retain for VRAM monitoring
- backend AI services: OcrService, SandboxOcrEngineService must be updated for parameter resolution
Testing Strategy
Path Traversal Tests
- Test cases for various path traversal patterns (../../etc/passwd, symlinks, etc.)
- Expect 403 Forbidden for all malicious paths
- Use pytest for automated testing
Residency Wiring Tests
- Unit test to verify calculate_ocr_residency() is called in process_ocr
- Verify keep_alive value is calculated dynamically, not fixed
- Test with different VRAM pressure scenarios
Performance Tests
- Benchmark async vs sync I/O under concurrent load
- Target: 20%+ throughput improvement
- Measure response times and resource utilization
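A toy harness for the async-versus-sync comparison might look like the following. It is only a sketch: the sleeps simulate I/O latency and stand in for real OCR or Ollama calls, and all function names are hypothetical. A real benchmark would drive the sidecar itself under concurrent load.

```python
import asyncio
import time


def sync_call(delay: float) -> None:
    time.sleep(delay)  # stands in for a blocking HTTP call


async def async_call(delay: float) -> None:
    await asyncio.sleep(delay)  # stands in for an awaited HTTP call


def run_sync(n: int, delay: float) -> float:
    """Run n blocking calls one after another; return calls per second."""
    start = time.perf_counter()
    for _ in range(n):
        sync_call(delay)
    return n / (time.perf_counter() - start)


async def _run_concurrently(n: int, delay: float) -> None:
    await asyncio.gather(*(async_call(delay) for _ in range(n)))


def run_async(n: int, delay: float) -> float:
    """Run n awaited calls concurrently; return calls per second."""
    start = time.perf_counter()
    asyncio.run(_run_concurrently(n, delay))
    return n / (time.perf_counter() - start)
```

Comparing run_async(n, delay) against run_sync(n, delay) for the same n and delay gives a throughput ratio that can be checked against the 20% target.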
Rollback Plan
If issues arise during deployment:
- Revert app.py to previous version
- Restore X-API-Key send-side in backend services
- Re-pin keep_alive default to 0 in process_ocr
- Restore hardcoded runtime params if needed for emergency fallback
References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- CONTEXT.md: GPU Policy (LLM-First Ownership, CPU Fallback)