Research: Typhoon OCR Integration

Feature: 232-typhoon-ocr-integration | Date: 2026-05-30 | Phase: Phase 0 - Outline & Research

Research Findings

Typhoon OCR Ollama Integration

Decision: Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop (Desk-5439)

Rationale:

  • Typhoon OCR models are available in Ollama registry (scb10x/typhoon-ocr-3b, scb10x/typhoon-ocr-7b)
  • Ollama provides consistent HTTP API for model inference
  • Aligns with ADR-023/023A on-premises AI requirement
  • Existing Ollama infrastructure on Admin Desktop can be reused

Alternatives Considered:

  • OpenTyphoon Cloud API: Rejected due to ADR-023 on-premises requirement
  • Direct model loading in Python: Rejected due to complexity and lack of integration with existing AI infrastructure

Implementation Details:

  • Model: scb10x/typhoon-ocr-3b (~3-4GB VRAM)
  • API endpoint: POST /api/generate with model parameter
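  • Input: Image data, sent as base64-encoded strings in the request's images field (uploaded files are encoded first)
  • Output: Extracted text (the response field of the JSON reply). The generate API does not return a confidence score directly, so any score must be derived separately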
  • Fallback: Tesseract OCR when Ollama unavailable
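The request shape and the fallback rule above can be sketched as follows. This is a minimal sketch, not the project's implementation: the field names (model, prompt, images, stream) follow Ollama's /api/generate request body, while the prompt text, the `OcrEngine` type, and the function names are hypothetical. The two engines are injected as functions so the fallback logic stays independent of any HTTP client.

```typescript
// Request body for Ollama's POST /api/generate. The field names are Ollama's;
// the prompt text below is a placeholder, not a documented Typhoon OCR prompt.
interface GenerateRequest {
  model: string;
  prompt: string;
  images: string[]; // base64-encoded image data, without a data-URL prefix
  stream: boolean;
}

function buildOcrRequest(
  imageBase64: string,
  model: string = "scb10x/typhoon-ocr-3b",
): GenerateRequest {
  return {
    model,
    prompt: "Extract all text from this image.", // placeholder (assumption)
    images: [imageBase64],
    stream: false, // ask for one JSON reply instead of a token stream
  };
}

// Hypothetical engine signature: base64 image in, extracted text out.
type OcrEngine = (imageBase64: string) => Promise<string>;

interface OcrResult {
  engine: "typhoon" | "tesseract";
  text: string;
}

// Try the Ollama-backed engine first; if it throws for any reason
// (connection refused, timeout, error status), use Tesseract instead.
async function ocrWithFallback(
  imageBase64: string,
  typhoon: OcrEngine,
  tesseract: OcrEngine,
): Promise<OcrResult> {
  try {
    return { engine: "typhoon", text: await typhoon(imageBase64) };
  } catch {
    return { engine: "tesseract", text: await tesseract(imageBase64) };
  }
}
```

Returning the engine name alongside the text lets callers record which path produced a result, which matters if the two engines are treated differently downstream.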

Typhoon LLM Model Integration

Decision: Add typhoon2.1-gemma3-4b to AI Model Management as alternative to gemma4

Rationale:

  • Typhoon models are optimized for Thai language
  • Q3_K_M quantization reduces VRAM requirements (~8-10GB vs 16GB+)
  • Provides model selection flexibility for administrators
  • Compatible with existing Ollama infrastructure

Alternatives Considered:

  • Full precision typhoon2.1-gemma3-12b: Rejected due to VRAM constraints
  • Other Typhoon variants: Rejected due to limited availability in Ollama

Implementation Details:

  • Model: typhoon2.1-gemma3-4b (~4-5GB VRAM)
  • Integration via existing AI service with BullMQ queues
  • Requires system.manage_all permission for model selection
  • VRAM monitoring to prevent concurrent model loading
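The permission rule for model selection can be illustrated with a small guard. This is a sketch under assumptions: the `Actor` shape and the `selectModel` function are hypothetical stand-ins for the project's CASL-based check, and the model identifiers are written as they appear in this document, which may differ from the exact Ollama tags.

```typescript
// Models an administrator may choose between, named as in this document.
const SELECTABLE_MODELS = ["gemma4", "typhoon2.1-gemma3-4b"] as const;
type SelectableModel = (typeof SELECTABLE_MODELS)[number];

const MODEL_SELECT_PERMISSION = "system.manage_all";

// Hypothetical actor shape; the real check would go through CASL abilities.
interface Actor {
  permissions: string[];
}

function selectModel(actor: Actor, requested: string): SelectableModel {
  // Reject callers that lack the required permission.
  if (actor.permissions.indexOf(MODEL_SELECT_PERMISSION) === -1) {
    throw new Error("Forbidden: system.manage_all is required to select a model");
  }
  // Only accept models from the allow-list.
  const match = SELECTABLE_MODELS.find((m) => m === requested);
  if (match === undefined) {
    throw new Error(`Unsupported model: ${requested}`);
  }
  return match;
}
```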

Redis Caching for OCR Results

Decision: Use Redis with 24-hour TTL for OCR result caching

Rationale:

  • Avoid reprocessing same document within short timeframe
  • Redis already in use for other caching needs
  • 24-hour TTL balances performance with storage efficiency
  • Aligns with ADR-023A RAG embedding gap coverage pattern

Alternatives Considered:

  • Permanent database storage: Rejected due to storage growth concerns
  • No caching: Rejected due to performance impact
  • Longer TTL (e.g., 7 days): Rejected due to higher storage usage

Implementation Details:

  • Cache key: ocr:cache:{documentPublicId}:{engine}:{hash}
  • TTL: 86400 seconds (24 hours)
  • Cache invalidation: Manual or on document update
  • Fallback to Tesseract bypasses cache
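The key format and TTL above can be sketched as a small read-through cache. This is a sketch under assumptions: the `TextCache` interface and the `cachedOcr` function are hypothetical, and what the {hash} segment covers (file content, options, or both) is not specified here, so it is passed in as an opaque string.

```typescript
const OCR_CACHE_TTL_SECONDS = 86400; // 24 hours

// Builds a key of the form ocr:cache:{documentPublicId}:{engine}:{hash}.
function buildOcrCacheKey(
  documentPublicId: string,
  engine: string,
  hash: string,
): string {
  return `ocr:cache:${documentPublicId}:${engine}:${hash}`;
}

// Hypothetical minimal cache surface. A Redis-backed version would map
// get to GET and set to SET with an EX (seconds) expiry.
interface TextCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Return the cached text if present; otherwise run OCR once and store it.
async function cachedOcr(
  cache: TextCache,
  key: string,
  runOcr: () => Promise<string>,
): Promise<string> {
  const hit = await cache.get(key);
  if (hit !== null) {
    return hit;
  }
  const text = await runOcr();
  await cache.set(key, text, OCR_CACHE_TTL_SECONDS);
  return text;
}
```

Because the engine name is part of the key, results from different engines for the same document never overwrite each other.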

VRAM Monitoring

Decision: Implement VRAM monitoring via Ollama API and Redis state tracking

Rationale:

  • Prevent VRAM exhaustion when loading multiple models
  • Sequential processing constraint (1 concurrent request)
  • 90% VRAM usage limit per success criteria
  • Ollama provides model status API

Alternatives Considered:

  • GPU monitoring tools (nvidia-smi): Rejected due to complexity and OS dependency
  • No monitoring: Rejected due to risk of VRAM exhaustion

Implementation Details:

  • Monitor loaded models via the Ollama /api/ps endpoint, which reports size_vram per running model (/api/tags only lists installed models)
  • Track VRAM usage in Redis: ai:vram:usage
  • Block model loading if usage > 90%
  • Sequential processing enforced via BullMQ queue
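The 90% gate can be sketched as a pure function over Ollama's running-model list. This is a sketch under assumptions: it reads the size_vram field (bytes) that GET /api/ps reports for each loaded model, it takes the GPU's total VRAM as a configured value because Ollama does not report it, and it checks projected usage (current plus the model about to load), which is one possible reading of the rule. The function and type names are hypothetical. Note that size_vram only covers models loaded by Ollama, not other processes on the GPU.

```typescript
// Subset of the GET /api/ps reply that this check needs.
interface RunningModel {
  name: string;
  size_vram: number; // bytes of VRAM held by this loaded model
}

interface PsResponse {
  models: RunningModel[];
}

const VRAM_LIMIT_RATIO = 0.9; // 90% limit from the success criteria

// Total VRAM currently held by models that Ollama has loaded.
function usedVramBytes(ps: PsResponse): number {
  return ps.models.reduce((sum, m) => sum + m.size_vram, 0);
}

// Allow a load only if usage after loading stays at or below the limit.
// totalVramBytes must come from configuration (assumption).
function canLoadModel(
  ps: PsResponse,
  requiredBytes: number,
  totalVramBytes: number,
  limit: number = VRAM_LIMIT_RATIO,
): boolean {
  if (totalVramBytes <= 0) {
    return false; // refuse to load when capacity is unknown or invalid
  }
  const projected = (usedVramBytes(ps) + requiredBytes) / totalVramBytes;
  return projected <= limit;
}
```

The computed usage ratio could then be written to the ai:vram:usage key so that other workers see the same state.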

ADR Updates

Decision: Create ADR-032 for Typhoon OCR integration and update ADR-023/023A

Rationale:

  • Document Typhoon models as supported on-premises AI options
  • Resolve conflicts between existing ADRs and new integration
  • Provide clear guidance for future development
  • Maintain ADR consistency per FR-009

Alternatives Considered:

  • Only update existing ADRs: Rejected because a dedicated ADR offers clearer scope and documentation
  • No ADR updates: Rejected due to documentation requirements

Implementation Details:

  • ADR-032: Typhoon OCR integration architecture
  • ADR-023: Add Typhoon models to supported AI options
  • ADR-023A: Add Typhoon models as alternatives to gemma4/nomic-embed-text
  • Review for conflicts with existing ADRs

Unknowns Resolved

No NEEDS CLARIFICATION markers remain in the Technical Context. All technical decisions are documented above.

Dependencies Verified

  • Ollama service operational on Admin Desktop (per ADR-023/023A)
  • Typhoon OCR-3B available in Ollama registry
  • Typhoon2.1-gemma3-4b available in Ollama registry
  • Redis infrastructure available for caching
  • BullMQ infrastructure available for job queues
  • CASL infrastructure available for permission checks

Next Steps

Proceed to Phase 1: Design & Contracts

  • Generate data-model.md
  • Generate API contracts in contracts/
  • Generate quickstart.md
  • Update agent context