np-dms/lcbp3

admin ae1b1f35e1

feat(ai): ADR-032 Typhoon OCR integration - models, processors, cache, VRAM monitor, sandbox UI

2026-05-30 22:18:51 +07:00

Research: Typhoon OCR Integration

Feature: 232-typhoon-ocr-integration
Date: 2026-05-30
Phase: Phase 0 - Outline & Research

Research Findings

Typhoon OCR Ollama Integration

Decision: Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop (Desk-5439)

Rationale:

Typhoon OCR models are available in Ollama registry (scb10x/typhoon-ocr-3b, scb10x/typhoon-ocr-7b)
Ollama provides consistent HTTP API for model inference
Aligns with ADR-023/023A on-premises AI requirement
Existing Ollama infrastructure on Admin Desktop can be reused

Alternatives Considered:

OpenTyphoon Cloud API: Rejected due to ADR-023 on-premises requirement
Direct model loading in Python: Rejected due to complexity and lack of integration with existing AI infrastructure

Implementation Details:

Model: scb10x/typhoon-ocr-3b (~3-4GB VRAM)
API endpoint: POST /api/generate with model parameter
Input: Image data, passed as base64-encoded strings in the request's images array (an uploaded file is encoded before the call)
Output: Extracted text in the response field; Ollama does not return confidence scores, so any score has to be derived separately
Fallback: Tesseract OCR when Ollama unavailable
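
The request shape and fallback rule above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names (build_generate_payload, ocr_with_fallback), the prompt text, and the "typhoon"/"tesseract" engine labels are assumptions made for the example. The model, prompt, images, and stream fields are the ones Ollama's /api/generate accepts. The two engines are passed in as plain callables so the sketch runs without a live Ollama or Tesseract install.

```python
import base64
from typing import Callable, Tuple

OCR_MODEL = "scb10x/typhoon-ocr-3b"  # model name taken from the notes above


def build_generate_payload(image_bytes: bytes, prompt: str, model: str = OCR_MODEL) -> dict:
    """Build the JSON body for POST /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,  # hypothetical prompt; the notes do not specify one
        # Ollama expects images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a token stream.
        "stream": False,
    }


def ocr_with_fallback(
    image_bytes: bytes,
    primary: Callable[[bytes], str],
    fallback: Callable[[bytes], str],
) -> Tuple[str, str]:
    """Return (engine, text); use the fallback when the primary engine raises."""
    try:
        return "typhoon", primary(image_bytes)
    except Exception:
        # Assumption: any failure of the primary call counts as "Ollama unavailable".
        return "tesseract", fallback(image_bytes)
```

In a real service the primary callable would POST the payload to the Ollama host and read the response field, and the fallback would wrap the Tesseract call. Returning the engine name alongside the text lets the caller decide whether the result is eligible for caching.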

Typhoon LLM Model Integration

Decision: Add typhoon2.1-gemma3-4b to AI Model Management as alternative to gemma4

Rationale:

Typhoon models are optimized for Thai language
Q3_K_M quantization reduces VRAM requirements (~8-10GB vs 16GB+)
Provides model selection flexibility for administrators
Compatible with existing Ollama infrastructure

Alternatives Considered:

Full precision typhoon2.1-gemma3-12b: Rejected due to VRAM constraints
Other Typhoon variants: Rejected due to limited availability in Ollama

Implementation Details:

Model: typhoon2.1-gemma3-4b (~4-5GB VRAM)
Integration via existing AI service with BullMQ queues
Requires system.manage_all permission for model selection
VRAM monitoring to prevent concurrent model loading

Redis Caching for OCR Results

Decision: Use Redis with 24-hour TTL for OCR result caching

Rationale:

Avoid reprocessing same document within short timeframe
Redis already in use for other caching needs
24-hour TTL balances performance with storage efficiency
Aligns with ADR-023A RAG embedding gap coverage pattern

Alternatives Considered:

Permanent database storage: Rejected due to storage growth concerns
No caching: Rejected due to performance impact
Longer TTL (e.g., 7 days): Rejected due to storage efficiency

Implementation Details:

Cache key: ocr:cache:{documentPublicId}:{engine}:{hash}
TTL: 86400 seconds (24 hours)
Cache invalidation: Manual or on document update
Fallback to Tesseract bypasses cache
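
A minimal sketch of the key format and TTL above, under stated assumptions: the {hash} segment is taken to be a SHA-256 digest of the file content (the notes do not say what is hashed), and the client is any object with redis-py style get(name) and set(name, value, ex=seconds) methods. The function names are hypothetical.

```python
import hashlib
from typing import Callable

OCR_CACHE_TTL_SECONDS = 24 * 60 * 60  # 86400 seconds, as stated above


def ocr_cache_key(document_public_id: str, engine: str, content: bytes) -> str:
    """Build ocr:cache:{documentPublicId}:{engine}:{hash}."""
    # Assumption: {hash} is the SHA-256 hex digest of the document bytes.
    digest = hashlib.sha256(content).hexdigest()
    return f"ocr:cache:{document_public_id}:{engine}:{digest}"


def get_or_compute(client, key: str, compute: Callable[[], str]) -> str:
    """Return the cached text if present; otherwise compute it and store it with the TTL."""
    cached = client.get(key)
    if cached is not None:
        # redis-py returns bytes unless decode_responses=True is set.
        return cached.decode("utf-8") if isinstance(cached, bytes) else cached
    text = compute()
    client.set(key, text, ex=OCR_CACHE_TTL_SECONDS)
    return text
```

Hashing the content means a changed file yields a new key even if explicit invalidation is missed. Because the Tesseract fallback bypasses the cache, a caller following these notes would invoke get_or_compute only on the Typhoon path.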

VRAM Monitoring

Decision: Implement VRAM monitoring via Ollama API and Redis state tracking

Rationale:

Prevent VRAM exhaustion when loading multiple models
Sequential processing constraint (1 concurrent request)
90% VRAM usage limit per success criteria
Ollama provides model status API

Alternatives Considered:

GPU monitoring tools (nvidia-smi): Rejected due to complexity and OS dependency
No monitoring: Rejected due to risk of VRAM exhaustion

Implementation Details:

Monitor via Ollama /api/ps endpoint for loaded models and their size_vram (/api/tags lists only the models available locally, not the ones loaded)
Track VRAM usage in Redis: ai:vram:usage
Block model loading if usage > 90%
Sequential processing enforced via BullMQ queue
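
The 90% rule can be sketched as a small calculation. This assumes a response shaped like Ollama's /api/ps output, which reports a size_vram value in bytes for each loaded model. Ollama does not report the GPU's total VRAM, so total_vram_bytes is a hypothetical configured value, and the function names are illustrative only.

```python
from typing import Any, Dict

VRAM_LIMIT = 0.90  # 90% ceiling from the success criteria


def loaded_vram_bytes(ps_response: Dict[str, Any]) -> int:
    """Sum size_vram over the models listed in an /api/ps-style response."""
    return sum(m.get("size_vram", 0) for m in ps_response.get("models", []))


def vram_usage(ps_response: Dict[str, Any], total_vram_bytes: int) -> float:
    """Fraction of total VRAM held by models that Ollama has loaded."""
    if total_vram_bytes <= 0:
        raise ValueError("total_vram_bytes must be positive")
    return loaded_vram_bytes(ps_response) / total_vram_bytes


def may_load_model(ps_response: Dict[str, Any], total_vram_bytes: int) -> bool:
    """Block loading when current usage is above the 90% limit."""
    return vram_usage(ps_response, total_vram_bytes) <= VRAM_LIMIT
```

The computed fraction is what would be written to ai:vram:usage. It covers only memory held by Ollama-loaded models, so other GPU processes on the same machine are not counted.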

ADR Updates

Decision: Create ADR-032 for Typhoon OCR integration and update ADR-023/023A

Rationale:

Document Typhoon models as supported on-premises AI options
Resolve conflicts between existing ADRs and new integration
Provide clear guidance for future development
Maintain ADR consistency per FR-009

Alternatives Considered:

Only update existing ADRs: Rejected due to scope and clarity benefits of dedicated ADR
No ADR updates: Rejected due to documentation requirements

Implementation Details:

ADR-032: Typhoon OCR integration architecture
ADR-023: Add Typhoon models to supported AI options
ADR-023A: Add Typhoon models as alternatives to gemma4/nomic-embed-text
Review for conflicts with existing ADRs

Unknowns Resolved

No NEEDS CLARIFICATION markers remain in Technical Context. All technical decisions are documented above.

Dependencies Verified

✅ Ollama service operational on Admin Desktop (per ADR-023/023A)
✅ Typhoon OCR-3B available in Ollama registry
✅ Typhoon2.1-gemma3-4b available in Ollama registry
✅ Redis infrastructure available for caching
✅ BullMQ infrastructure available for job queues
✅ CASL infrastructure available for permission checks

Next Steps

Proceed to Phase 1: Design & Contracts

Generate data-model.md
Generate API contracts in contracts/
Generate quickstart.md
Update agent context
