131 lines
4.6 KiB
Markdown
131 lines
4.6 KiB
Markdown
# Research: Typhoon OCR Integration
|
|
|
|
**Feature**: 232-typhoon-ocr-integration
|
|
**Date**: 2026-05-30
|
|
**Phase**: Phase 0 - Outline & Research
|
|
|
|
## Research Findings
|
|
|
|
### Typhoon OCR Ollama Integration
|
|
|
|
**Decision**: Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop (Desk-5439)
|
|
|
|
**Rationale**:
|
|
- Typhoon OCR models are available in Ollama registry (scb10x/typhoon-ocr-3b, scb10x/typhoon-ocr-7b)
|
|
- Ollama provides consistent HTTP API for model inference
|
|
- Aligns with ADR-023/023A on-premises AI requirement
|
|
- Existing Ollama infrastructure on Admin Desktop can be reused
|
|
|
|
**Alternatives Considered**:
|
|
- OpenTyphoon Cloud API: Rejected due to ADR-023 on-premises requirement
|
|
- Direct model loading in Python: Rejected due to complexity and lack of integration with existing AI infrastructure
|
|
|
|
**Implementation Details**:
|
|
- Model: scb10x/typhoon-ocr-3b (~3-4GB VRAM)
|
|
- API endpoint: `POST /api/generate` with model parameter
|
|
- Input: Image data (base64 or file upload)
|
|
- Output: Extracted text with confidence scores
|
|
- Fallback: Tesseract OCR when Ollama unavailable
|
|
|
|
### Typhoon LLM Model Integration
|
|
|
|
**Decision**: Add typhoon2.1-gemma3-4b to AI Model Management as alternative to gemma4
|
|
|
|
**Rationale**:
|
|
- Typhoon models are optimized for Thai language
|
|
- Q3_K_M quantization reduces VRAM requirements (~8-10GB vs 16GB+)
|
|
- Provides model selection flexibility for administrators
|
|
- Compatible with existing Ollama infrastructure
|
|
|
|
**Alternatives Considered**:
|
|
- Full precision typhoon2.1-gemma3-12b: Rejected due to VRAM constraints
|
|
- Other Typhoon variants: Rejected due to limited availability in Ollama
|
|
|
|
**Implementation Details**:
|
|
- Model: typhoon2.1-gemma3-4b (~4-5GB VRAM)
|
|
- Integration via existing AI service with BullMQ queues
|
|
- Requires system.manage_all permission for model selection
|
|
- VRAM monitoring to prevent concurrent model loading
|
|
|
|
### Redis Caching for OCR Results
|
|
|
|
**Decision**: Use Redis with 24-hour TTL for OCR result caching
|
|
|
|
**Rationale**:
|
|
- Avoid reprocessing same document within short timeframe
|
|
- Redis already in use for other caching needs
|
|
- 24-hour TTL balances performance with storage efficiency
|
|
- Aligns with ADR-023A RAG embedding gap coverage pattern
|
|
|
|
**Alternatives Considered**:
|
|
- Permanent database storage: Rejected due to storage growth concerns
|
|
- No caching: Rejected due to performance impact
|
|
- Longer TTL (e.g., 7 days): Rejected due to storage efficiency
|
|
|
|
**Implementation Details**:
|
|
- Cache key: `ocr:cache:{documentPublicId}:{engine}:{hash}`
|
|
- TTL: 86400 seconds (24 hours)
|
|
- Cache invalidation: Manual or on document update
|
|
- Fallback to Tesseract bypasses cache
|
|
|
|
### VRAM Monitoring
|
|
|
|
**Decision**: Implement VRAM monitoring via Ollama API and Redis state tracking
|
|
|
|
**Rationale**:
|
|
- Prevent VRAM exhaustion when loading multiple models
|
|
- Sequential processing constraint (1 concurrent request)
|
|
- 90% VRAM usage limit per success criteria
|
|
- Ollama provides model status API
|
|
|
|
**Alternatives Considered**:
|
|
- GPU monitoring tools (nvidia-smi): Rejected due to complexity and OS dependency
|
|
- No monitoring: Rejected due to risk of VRAM exhaustion
|
|
|
|
**Implementation Details**:
|
|
- Monitor via Ollama `/api/tags` endpoint for loaded models
|
|
- Track VRAM usage in Redis: `ai:vram:usage`
|
|
- Block model loading if usage > 90%
|
|
- Sequential processing enforced via BullMQ queue
|
|
|
|
### ADR Updates
|
|
|
|
**Decision**: Create ADR-032 for Typhoon OCR integration and update ADR-023/023A
|
|
|
|
**Rationale**:
|
|
- Document Typhoon models as supported on-premises AI options
|
|
- Resolve conflicts between existing ADRs and new integration
|
|
- Provide clear guidance for future development
|
|
- Maintain ADR consistency per FR-009
|
|
|
|
**Alternatives Considered**:
|
|
- Only update existing ADRs: Rejected due to scope and clarity benefits of dedicated ADR
|
|
- No ADR updates: Rejected due to documentation requirements
|
|
|
|
**Implementation Details**:
|
|
- ADR-032: Typhoon OCR integration architecture
|
|
- ADR-023: Add Typhoon models to supported AI options
|
|
- ADR-023A: Add Typhoon models as alternatives to gemma4/nomic-embed-text
|
|
- Review for conflicts with existing ADRs
|
|
|
|
## Unknowns Resolved
|
|
|
|
No NEEDS CLARIFICATION markers remained in Technical Context. All technical decisions documented above.
|
|
|
|
## Dependencies Verified
|
|
|
|
- ✅ Ollama service operational on Admin Desktop (per ADR-023/023A)
|
|
- ✅ Typhoon OCR-3B available in Ollama registry
|
|
- ✅ Typhoon2.1-gemma3-4b available in Ollama registry
|
|
- ✅ Redis infrastructure available for caching
|
|
- ✅ BullMQ infrastructure available for job queues
|
|
- ✅ CASL infrastructure available for permission checks
|
|
|
|
## Next Steps
|
|
|
|
Proceed to Phase 1: Design & Contracts
|
|
- Generate data-model.md
|
|
- Generate API contracts in contracts/
|
|
- Generate quickstart.md
|
|
- Update agent context
|