# Research: Typhoon OCR Integration
**Feature**: 232-typhoon-ocr-integration
**Date**: 2026-05-30
**Phase**: Phase 0 - Outline & Research
## Research Findings
### Typhoon OCR Ollama Integration
**Decision**: Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop (Desk-5439)
**Rationale**:
- Typhoon OCR models are available in Ollama registry (scb10x/typhoon-ocr-3b, scb10x/typhoon-ocr-7b)
- Ollama provides consistent HTTP API for model inference
- Aligns with ADR-023/023A on-premises AI requirement
- Existing Ollama infrastructure on Admin Desktop can be reused
**Alternatives Considered**:
- OpenTyphoon Cloud API: Rejected due to ADR-023 on-premises requirement
- Direct model loading in Python: Rejected due to complexity and lack of integration with existing AI infrastructure
**Implementation Details**:
- Model: scb10x/typhoon-ocr-3b (~3-4GB VRAM)
- API endpoint: `POST /api/generate` with model parameter
- Input: Image data (base64 or file upload)
- Output: Extracted text with confidence scores
- Fallback: Tesseract OCR when Ollama unavailable
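The request shape and fallback rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: `buildOcrRequest`, `runOcrWithFallback`, `OcrEngine`, and the prompt text are hypothetical names introduced here. The `model`, `prompt`, `images`, and `stream` fields are those accepted by Ollama's `POST /api/generate`, where `images` carries base64-encoded image data.

```typescript
// Sketch only: every name except the Ollama request fields is hypothetical.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image bytes, without a data-URL prefix
  stream: boolean;
}

interface OcrResult {
  engine: "typhoon" | "tesseract";
  text: string;
}

// An engine takes a base64 image and resolves to the extracted text.
type OcrEngine = (imageBase64: string) => Promise<string>;

const TYPHOON_OCR_MODEL = "scb10x/typhoon-ocr-3b";

// Build the JSON body for POST /api/generate.
function buildOcrRequest(imageBase64: string, prompt: string): OllamaGenerateRequest {
  // Strip a data-URL prefix if the caller passed one.
  const raw = imageBase64.replace(/^data:[^;]+;base64,/, "");
  return { model: TYPHOON_OCR_MODEL, prompt, images: [raw], stream: false };
}

// Try the Typhoon engine first; on any error, fall back to Tesseract.
async function runOcrWithFallback(
  imageBase64: string,
  typhoon: OcrEngine,
  tesseract: OcrEngine,
): Promise<OcrResult> {
  try {
    return { engine: "typhoon", text: await typhoon(imageBase64) };
  } catch {
    return { engine: "tesseract", text: await tesseract(imageBase64) };
  }
}
```

Passing the engines in as functions keeps the fallback logic testable without a running Ollama instance. Recording which engine produced the text lets callers decide whether the result should be cached.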
### Typhoon LLM Model Integration
**Decision**: Add typhoon2.1-gemma3-4b to AI Model Management as alternative to gemma4
**Rationale**:
- Typhoon models are optimized for Thai language
- Q3_K_M quantization reduces VRAM requirements (~8-10GB vs 16GB+)
- Provides model selection flexibility for administrators
- Compatible with existing Ollama infrastructure
**Alternatives Considered**:
- Full precision typhoon2.1-gemma3-12b: Rejected due to VRAM constraints
- Other Typhoon variants: Rejected due to limited availability in Ollama
**Implementation Details**:
- Model: typhoon2.1-gemma3-4b (~4-5GB VRAM)
- Integration via existing AI service with BullMQ queues
- Requires system.manage_all permission for model selection
- VRAM monitoring to prevent concurrent model loading
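As a rough illustration of the permission-gated model selection described above, the sketch below checks for `system.manage_all` before accepting a model name. The `Actor` shape, the `selectModel` function, and the use of the bare names `gemma4` and `typhoon2.1-gemma3-4b` as identifiers are assumptions made for this example. The real check would presumably go through the existing CASL setup rather than a plain string list.

```typescript
// Sketch only: Actor, selectModel and the identifier strings are hypothetical.
type ModelName = "gemma4" | "typhoon2.1-gemma3-4b";

const SELECTABLE_MODELS: readonly ModelName[] = ["gemma4", "typhoon2.1-gemma3-4b"];
const REQUIRED_PERMISSION = "system.manage_all";

interface Actor {
  permissions: readonly string[];
}

// Type guard: narrows an arbitrary string to a known model name.
function isSelectable(name: string): name is ModelName {
  return (SELECTABLE_MODELS as readonly string[]).indexOf(name) !== -1;
}

// Return the validated model name, or throw if the actor lacks the
// permission or the requested model is not in the allowed list.
function selectModel(actor: Actor, requested: string): ModelName {
  if (actor.permissions.indexOf(REQUIRED_PERMISSION) === -1) {
    throw new Error("Forbidden: " + REQUIRED_PERMISSION + " is required");
  }
  if (!isSelectable(requested)) {
    throw new Error("Unknown model: " + requested);
  }
  return requested;
}
```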
### Redis Caching for OCR Results
**Decision**: Use Redis with 24-hour TTL for OCR result caching
**Rationale**:
- Avoids reprocessing the same document within a short timeframe
- Redis already in use for other caching needs
- 24-hour TTL balances performance with storage efficiency
- Aligns with ADR-023A RAG embedding gap coverage pattern
**Alternatives Considered**:
- Permanent database storage: Rejected due to storage growth concerns
- No caching: Rejected due to performance impact
- Longer TTL (e.g., 7 days): Rejected for storage-efficiency reasons
**Implementation Details**:
- Cache key: `ocr:cache:{documentPublicId}:{engine}:{hash}`
- TTL: 86400 seconds (24 hours)
- Cache invalidation: Manual or on document update
- Fallback to Tesseract bypasses cache
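The key format, TTL, and bypass rule above could look roughly like this. The sketch uses an in-memory stand-in for Redis so it runs on its own. `TtlCache`, `isCacheable`, and `invalidateDocument` are hypothetical names, and the meaning of `{hash}` (for example, a hash of the file content) is an assumption, so the hash is taken as an input here. Against Redis itself, the write would correspond to `SET <key> <value> EX 86400`.

```typescript
// Sketch only: an in-memory stand-in for Redis; all names are hypothetical.
const OCR_CACHE_TTL_SECONDS = 86400; // 24 hours

// Key format from the research notes: ocr:cache:{documentPublicId}:{engine}:{hash}
function ocrCacheKey(documentPublicId: string, engine: string, hash: string): string {
  return `ocr:cache:${documentPublicId}:${engine}:${hash}`;
}

// Results produced by the Tesseract fallback are not cached.
function isCacheable(engine: string): boolean {
  return engine !== "tesseract";
}

interface CacheEntry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class TtlCache {
  private entries: Record<string, CacheEntry> = {};

  // The clock is injectable so expiry can be tested deterministically.
  constructor(private now: () => number = () => Date.now()) {}

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries[key] = { value, expiresAt: this.now() + ttlSeconds * 1000 };
  }

  get(key: string): string | undefined {
    const entry = this.entries[key];
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      delete this.entries[key];
      return undefined;
    }
    return entry.value;
  }

  // Invalidate every cached result for one document (e.g. on document update).
  // Returns the number of keys removed.
  invalidateDocument(documentPublicId: string): number {
    const prefix = `ocr:cache:${documentPublicId}:`;
    let removed = 0;
    for (const key of Object.keys(this.entries)) {
      if (key.indexOf(prefix) === 0) {
        delete this.entries[key];
        removed++;
      }
    }
    return removed;
  }
}
```

Because the key embeds both engine and hash, invalidating on document update has to match by prefix rather than by a single key. With Redis that would likely mean an incremental `SCAN` with a `MATCH` pattern, which is a design choice this sketch does not settle.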
### VRAM Monitoring
**Decision**: Implement VRAM monitoring via Ollama API and Redis state tracking
**Rationale**:
- Prevent VRAM exhaustion when loading multiple models
- Sequential processing constraint (1 concurrent request)
- 90% VRAM usage limit per success criteria
- Ollama provides model status API
**Alternatives Considered**:
- GPU monitoring tools (nvidia-smi): Rejected due to complexity and OS dependency
- No monitoring: Rejected due to risk of VRAM exhaustion
**Implementation Details**:
- Monitor via Ollama `GET /api/ps` endpoint for loaded models (`/api/tags` lists installed models, not the ones currently loaded)
- Track VRAM usage in Redis: `ai:vram:usage`
- Block model loading if usage > 90%
- Sequential processing enforced via BullMQ queue
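A minimal sketch of the 90% gate follows, assuming the loaded-model listing reports a `size_vram` byte count per model, as the response from Ollama's `GET /api/ps` does. Ollama does not report total GPU memory, so `totalVramBytes` is treated here as a configured value. `vramUsageRatio` and `canLoadModel` are hypothetical names. The resulting ratio is what would be written to the `ai:vram:usage` key.

```typescript
// Sketch only: function names and the configured total are assumptions.
interface LoadedModel {
  name: string;
  size_vram: number; // bytes of VRAM held by this loaded model
}

const VRAM_LIMIT_RATIO = 0.9; // block loading above 90% usage

// Fraction of total VRAM currently held by loaded models.
function vramUsageRatio(loaded: LoadedModel[], totalVramBytes: number): number {
  if (totalVramBytes <= 0) {
    throw new Error("totalVramBytes must be positive");
  }
  const used = loaded.reduce((sum, m) => sum + m.size_vram, 0);
  return used / totalVramBytes;
}

// Loading is allowed only while usage is at or below the limit.
function canLoadModel(loaded: LoadedModel[], totalVramBytes: number): boolean {
  return vramUsageRatio(loaded, totalVramBytes) <= VRAM_LIMIT_RATIO;
}
```

This checks current usage only, as the rule above states. A stricter variant would add the size of the model about to be loaded before comparing against the limit.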
### ADR Updates
**Decision**: Create ADR-032 for Typhoon OCR integration and update ADR-023/023A
**Rationale**:
- Document Typhoon models as supported on-premises AI options
- Resolve conflicts between existing ADRs and new integration
- Provide clear guidance for future development
- Maintain ADR consistency per FR-009
**Alternatives Considered**:
- Only update existing ADRs: Rejected because a dedicated ADR gives clearer scope and documentation
- No ADR updates: Rejected due to documentation requirements
**Implementation Details**:
- ADR-032: Typhoon OCR integration architecture
- ADR-023: Add Typhoon models to supported AI options
- ADR-023A: Add Typhoon models as alternatives to gemma4/nomic-embed-text
- Review for conflicts with existing ADRs
## Unknowns Resolved
No NEEDS CLARIFICATION markers remain in the Technical Context. All technical decisions are documented above.
## Dependencies Verified
- ✅ Ollama service operational on Admin Desktop (per ADR-023/023A)
- ✅ Typhoon OCR-3B available in Ollama registry
- ✅ Typhoon2.1-gemma3-4b available in Ollama registry
- ✅ Redis infrastructure available for caching
- ✅ BullMQ infrastructure available for job queues
- ✅ CASL infrastructure available for permission checks
## Next Steps
Proceed to Phase 1: Design & Contracts
- Generate data-model.md
- Generate API contracts in contracts/
- Generate quickstart.md
- Update agent context