4.6 KiB
Research: Typhoon OCR Integration
Feature: 232-typhoon-ocr-integration Date: 2026-05-30 Phase: Phase 0 - Outline & Research
Research Findings
Typhoon OCR Ollama Integration
Decision: Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop (Desk-5439)
Rationale:
- Typhoon OCR models are available in Ollama registry (scb10x/typhoon-ocr-3b, scb10x/typhoon-ocr-7b)
- Ollama provides consistent HTTP API for model inference
- Aligns with ADR-023/023A on-premises AI requirement
- Existing Ollama infrastructure on Admin Desktop can be reused
Alternatives Considered:
- OpenTyphoon Cloud API: Rejected due to ADR-023 on-premises requirement
- Direct model loading in Python: Rejected due to complexity and lack of integration with existing AI infrastructure
Implementation Details:
- Model: scb10x/typhoon-ocr-3b (~3-4GB VRAM)
- API endpoint:
POST /api/generatewith model parameter - Input: Image data (base64 or file upload)
- Output: Extracted text with confidence scores
- Fallback: Tesseract OCR when Ollama unavailable
Typhoon LLM Model Integration
Decision: Add typhoon2.1-gemma3-4b to AI Model Management as alternative to gemma4
Rationale:
- Typhoon models are optimized for Thai language
- Q3_K_M quantization reduces VRAM requirements (~8-10GB vs 16GB+)
- Provides model selection flexibility for administrators
- Compatible with existing Ollama infrastructure
Alternatives Considered:
- Full precision typhoon2.1-gemma3-12b: Rejected due to VRAM constraints
- Other Typhoon variants: Rejected due to limited availability in Ollama
Implementation Details:
- Model: typhoon2.1-gemma3-4b (~4-5GB VRAM)
- Integration via existing AI service with BullMQ queues
- Requires system.manage_all permission for model selection
- VRAM monitoring to prevent concurrent model loading
Redis Caching for OCR Results
Decision: Use Redis with 24-hour TTL for OCR result caching
Rationale:
- Avoid reprocessing same document within short timeframe
- Redis already in use for other caching needs
- 24-hour TTL balances performance with storage efficiency
- Aligns with ADR-023A RAG embedding gap coverage pattern
Alternatives Considered:
- Permanent database storage: Rejected due to storage growth concerns
- No caching: Rejected due to performance impact
- Longer TTL (e.g., 7 days): Rejected due to storage efficiency
Implementation Details:
- Cache key:
ocr:cache:{documentPublicId}:{engine}:{hash} - TTL: 86400 seconds (24 hours)
- Cache invalidation: Manual or on document update
- Fallback to Tesseract bypasses cache
VRAM Monitoring
Decision: Implement VRAM monitoring via Ollama API and Redis state tracking
Rationale:
- Prevent VRAM exhaustion when loading multiple models
- Sequential processing constraint (1 concurrent request)
- 90% VRAM usage limit per success criteria
- Ollama provides model status API
Alternatives Considered:
- GPU monitoring tools (nvidia-smi): Rejected due to complexity and OS dependency
- No monitoring: Rejected due to risk of VRAM exhaustion
Implementation Details:
- Monitor via Ollama
/api/tagsendpoint for loaded models - Track VRAM usage in Redis:
ai:vram:usage - Block model loading if usage > 90%
- Sequential processing enforced via BullMQ queue
ADR Updates
Decision: Create ADR-032 for Typhoon OCR integration and update ADR-023/023A
Rationale:
- Document Typhoon models as supported on-premises AI options
- Resolve conflicts between existing ADRs and new integration
- Provide clear guidance for future development
- Maintain ADR consistency per FR-009
Alternatives Considered:
- Only update existing ADRs: Rejected due to scope and clarity benefits of dedicated ADR
- No ADR updates: Rejected due to documentation requirements
Implementation Details:
- ADR-032: Typhoon OCR integration architecture
- ADR-023: Add Typhoon models to supported AI options
- ADR-023A: Add Typhoon models as alternatives to gemma4/nomic-embed-text
- Review for conflicts with existing ADRs
Unknowns Resolved
No NEEDS CLARIFICATION markers remained in Technical Context. All technical decisions documented above.
Dependencies Verified
- ✅ Ollama service operational on Admin Desktop (per ADR-023/023A)
- ✅ Typhoon OCR-3B available in Ollama registry
- ✅ Typhoon2.1-gemma3-4b available in Ollama registry
- ✅ Redis infrastructure available for caching
- ✅ BullMQ infrastructure available for job queues
- ✅ CASL infrastructure available for permission checks
Next Steps
Proceed to Phase 1: Design & Contracts
- Generate data-model.md
- Generate API contracts in contracts/
- Generate quickstart.md
- Update agent context