// File: specs/200-fullstacks/232-typhoon-ocr-integration/spec.md
// Change Log:
// - 2026-05-30: Initial specification for Typhoon OCR integration
// - 2026-05-30: Updated VRAM strategy (keep_alive=0), System Prompt (Option 2), and hyperparameters.

Feature Specification: Typhoon OCR Integration

Feature Branch: 232-typhoon-ocr-integration
Created: 2026-05-30
Status: Draft
Category: 200-fullstacks
Input: User description: "Refactor the related parts; add typhoon2.1-gemma3-12b Q3_K_M as an option in AI Model Management; add typhoon-ocr-7b (~5-6 GB VRAM, Ollama) as an option in OCR Sandbox Runner; also update the conflicting ADRs."

Clarifications

Session 2026-05-30

  • Q: What permission level should be required for users to select Typhoon OCR in OCR Sandbox Runner? → A: Only system administrators (system.manage_all)
  • Q: What is the maximum acceptable processing time for Typhoon OCR to extract text from a single document page? → A: Under 60 seconds per page
  • Q: What permission level should be required for AI administrators to add typhoon2.1-gemma3-4b to AI Model Management? → A: Only system administrators (system.manage_all)
  • Q: What is the maximum number of concurrent Typhoon OCR requests the system should support? → A: 1 concurrent request (sequential processing only)
  • Q: Should Typhoon OCR results be cached or stored for future reference? → A: Cache results temporarily (24 hours) in Redis but not persist permanently
  • Q: What are the Typhoon OCR model hyperparameters? → A: temperature = 0.0, top_p = 0.9, repeat_penalty = 1.0, and keep_alive = 0 to unload VRAM immediately.
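  • Q: What is the System Prompt for Typhoon OCR? → A: "สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด ห้ามเพิ่มคำอธิบายใดๆ" (English: "Accurately extract all Thai and English text from this image. Keep the line structure and spacing as close to the original as possible. Do not add any explanation.")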
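
The hyperparameters and System Prompt above map naturally onto an Ollama generate request. The following is a minimal sketch, not the project's implementation: `build_ocr_request` is a hypothetical helper, the model tag is left to the caller because this spec does not fix the exact Ollama tag, and the user-turn text is a required argument because the spec defines only the system prompt.

```python
import base64

# System prompt text quoted verbatim from the clarification above.
SYSTEM_PROMPT = (
    "สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง "
    "รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด "
    "ห้ามเพิ่มคำอธิบายใดๆ"
)


def build_ocr_request(image_bytes: bytes, model: str, user_prompt: str) -> dict:
    """Build a JSON body for Ollama's POST /api/generate endpoint.

    `model` and `user_prompt` are supplied by the caller; neither value
    is defined in this spec (assumption).
    """
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": user_prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        # keep_alive = 0 asks Ollama to unload the model right after the call.
        "keep_alive": 0,
        "options": {
            "temperature": 0.0,
            "top_p": 0.9,
            "repeat_penalty": 1.0,
        },
    }
```

Keeping the payload builder free of network calls makes the sampling settings easy to unit-test independently of the Ollama service.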

User Scenarios & Testing (mandatory)

User Story 1 - Typhoon OCR Option in OCR Sandbox (Priority: P1)

As a document processor, I want to use Typhoon OCR as an alternative to Tesseract, so that I can reach 95%+ text extraction accuracy on Thai documents.

Why this priority: This is the primary user-facing value - improved OCR accuracy directly impacts document processing quality and reduces manual correction effort.

Independent Test: Can be fully tested by selecting Typhoon OCR in OCR Sandbox Runner and processing a Thai document; the extracted text should be more accurate than the Tesseract output for the same document.

Acceptance Scenarios:

  1. Given a user has access to OCR Sandbox Runner, When they select "Typhoon OCR-3B" as the OCR engine option, Then the system should process the document using Typhoon OCR via Ollama and return extracted text.
  2. Given a document is processed with Typhoon OCR, When the OCR completes, Then the extracted text should have accuracy comparable to or better than Tesseract (target: 95%+ for Thai text).
  3. Given Typhoon OCR is selected, When the Ollama service is unavailable, Then the system should fall back to Tesseract OCR and display a warning message.
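
Scenario 3 can be sketched as a thin wrapper around two engine callables. This is an illustrative sketch only: `OcrResult`, `run_ocr_with_fallback`, and the engine labels are hypothetical names, and catching every `Exception` is a simplification that a real implementation would likely narrow to connection and timeout errors.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OcrResult:
    text: str
    engine: str                    # which engine produced the text
    warning: Optional[str] = None  # set only when the fallback was used


def run_ocr_with_fallback(
    image: bytes,
    primary: Callable[[bytes], str],
    fallback: Callable[[bytes], str],
) -> OcrResult:
    """Try the primary engine; on any failure, use the fallback and warn."""
    try:
        return OcrResult(text=primary(image), engine="typhoon")
    except Exception as exc:  # simplification: narrow this in real code
        return OcrResult(
            text=fallback(image),
            engine="tesseract",
            warning=f"Typhoon OCR unavailable ({exc}); fell back to Tesseract.",
        )
```

Returning the warning as data, rather than logging it, lets the sandbox UI decide how to display it to the user.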

User Story 2 - Typhoon LLM in AI Model Management (Priority: P2)

As an AI administrator, I want to add typhoon2.1-gemma3-4b as an option in AI Model Management, so that I can use this model for AI-powered document analysis tasks.

Why this priority: This enables model selection flexibility and allows administrators to choose between different LLM models based on performance and resource requirements.

Independent Test: Can be fully tested by adding typhoon2.1-gemma3-4b to the AI Model Management configuration and selecting it for a document analysis task.

Acceptance Scenarios:

  1. Given an AI administrator has system.manage_all permission, When they add typhoon2.1-gemma3-4b to the AI model options, Then the model should be available for selection in AI-powered features.
  2. Given typhoon2.1-gemma3-4b is selected, When a document analysis task is initiated, Then the system should use this model via Ollama for inference.
  3. Given the GPU has limited VRAM, When typhoon2.1-gemma3-4b is loaded, Then the system should monitor VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
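
The VRAM check in scenario 3 reduces to a single comparison once usage figures are available. The sketch below assumes the 90% ceiling stated in the success criteria; `can_load_model` is a hypothetical helper, and how the readings are obtained is outside this spec.

```python
def can_load_model(
    total_mib: float,
    used_mib: float,
    required_mib: float,
    max_utilization: float = 0.9,
) -> bool:
    """Return True if loading a model keeps VRAM usage within the ceiling.

    All sizes are in MiB. `max_utilization` is the allowed fraction of
    total VRAM (0.9 mirrors the 90% limit in the success criteria).
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    return used_mib + required_mib <= total_mib * max_utilization
```

For example, on an 8 GiB card with nothing loaded, a 4 GiB model passes the check, while a second 4 GiB model requested at the same time does not.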

User Story 3 - ADR Conflict Resolution (Priority: P3)

As a system architect, I want to update ADR-023 and ADR-023A to include Typhoon OCR and Typhoon LLM models, so that the architecture documentation reflects the current AI infrastructure capabilities.

Why this priority: This ensures architectural decisions remain accurate and provide clear guidance for future development and compliance checks.

Independent Test: Can be fully tested by reviewing the updated ADRs and verifying they correctly document Typhoon model integration without conflicts.

Acceptance Scenarios:

  1. Given ADR-023 and ADR-023A exist, When they are updated to include Typhoon models, Then the ADRs should clearly specify Typhoon OCR and Typhoon LLM as supported on-premises AI options.
  2. Given ADR-023A is updated, When it describes the 2-model stack, Then it should include Typhoon models as alternatives to gemma4 and nomic-embed-text where applicable.
  3. Given ADR conflicts are identified, When they are resolved, Then all ADRs should be consistent with each other and with the actual implementation.

Edge Cases

  • What happens when the Ollama service is down or unresponsive?
  • How does the system handle VRAM exhaustion when multiple AI models are loaded? (Solved by sequential loading and the Ollama keep_alive = 0 configuration.)
  • What happens when the Typhoon OCR model fails to load or crashes during processing?
  • How does the system handle concurrent OCR requests when Typhoon OCR is selected?
  • What happens when a user selects Typhoon OCR but the model is not installed in Ollama?
  • How does the system handle fallback to Tesseract when Typhoon OCR fails?
  • What happens when GPU VRAM is insufficient for Typhoon OCR-3B (~3-4 GB)?
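
For the concurrent-request case, one way to enforce the single-request limit inside a single sidecar process is a mutex around the OCR call. This is a sketch under that assumption: `SequentialOcrRunner` is a hypothetical name, and a multi-process or multi-host deployment would need a shared queue or distributed lock instead.

```python
import threading
from typing import Callable


class SequentialOcrRunner:
    """Serialize OCR calls so at most one runs at a time in this process."""

    def __init__(self, ocr: Callable[[bytes], str]) -> None:
        self._ocr = ocr
        self._lock = threading.Lock()

    def run(self, image: bytes) -> str:
        # Callers block here until the in-flight request finishes.
        with self._lock:
            return self._ocr(image)
```

Blocking callers is the simplest policy; rejecting a request while another is in flight is an equally valid reading of the requirement, and the spec does not choose between them.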

Requirements (mandatory)

Functional Requirements

  • FR-001: System MUST provide Typhoon OCR-3B as an option in OCR Sandbox Runner alongside Tesseract OCR.
  • FR-002: System MUST allow users with system.manage_all permission to select between Tesseract OCR and Typhoon OCR for document text extraction.
  • FR-003: System MUST integrate Typhoon OCR via Ollama service on Admin Desktop (on-premises only, per ADR-023/023A) with CASL Guard for all AI-related endpoints per ADR-016.
  • FR-004: System MUST fall back to Tesseract OCR when Typhoon OCR is unavailable or fails, with appropriate user notification.
  • FR-005: System MUST allow users with system.manage_all permission to add typhoon2.1-gemma3-4b as an option in AI Model Management configuration with CASL Guard per ADR-016.
  • FR-006: System MUST allow AI administrators with system.manage_all permission to select typhoon2.1-gemma3-4b for AI-powered document analysis tasks with CASL Guard per ADR-016.
  • FR-007: System MUST monitor GPU VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
  • FR-008: System MUST update ADR-023 and ADR-023A to document Typhoon OCR and Typhoon LLM as supported on-premises AI options.
  • FR-009: System MUST ensure ADR consistency - no conflicts between ADR-023, ADR-023A, and ADR-032 regarding Typhoon model integration.
  • FR-010: System MUST log all Typhoon OCR and Typhoon LLM interactions in ai_audit_logs per ADR-023/023A requirements.
  • FR-011: System MUST process Typhoon OCR requests sequentially (1 concurrent request) to manage VRAM and model loading constraints.
  • FR-012: System MUST cache Typhoon OCR results in Redis for 24 hours under the key pattern ocr:cache:{documentPublicId}:{engine}:{hash}, without persisting them permanently, to avoid reprocessing the same document. Cache entries are invalidated automatically on document update or manually via the admin API.
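
The cache key pattern in FR-012 can be sketched as follows. The spec does not say what {hash} is computed over, so this example assumes a SHA-256 digest of the document bytes; `ocr_cache_key` and `OCR_CACHE_TTL_SECONDS` are hypothetical names.

```python
import hashlib

# 24-hour retention from FR-012, expressed in seconds.
OCR_CACHE_TTL_SECONDS = 24 * 60 * 60


def ocr_cache_key(document_public_id: str, engine: str, content: bytes) -> str:
    """Build a key of the form ocr:cache:{documentPublicId}:{engine}:{hash}.

    Assumption: {hash} is the SHA-256 hex digest of the document bytes,
    so a changed document yields a different key.
    """
    digest = hashlib.sha256(content).hexdigest()
    return f"ocr:cache:{document_public_id}:{engine}:{digest}"
```

With the redis-py client, a result could then be stored with `client.set(key, text, ex=OCR_CACHE_TTL_SECONDS)` so that Redis expires the entry on its own.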

Key Entities

  • OCR Engine Configuration: Represents the available OCR engines (Tesseract, Typhoon OCR) with their parameters and resource requirements.
  • AI Model Configuration: Represents the available AI models (gemma4, typhoon2.1-gemma3-4b, nomic-embed-text) with their VRAM requirements and use cases.
  • VRAM Monitor: Tracks GPU VRAM usage across all loaded AI models to prevent resource exhaustion.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Typhoon OCR achieves at least 95% character-level accuracy on Thai text extraction, against Tesseract's 90% baseline.
  • SC-002: Typhoon OCR processes a single document page within 60 seconds (per-page timing).
  • SC-003: System successfully falls back to Tesseract OCR within 5 seconds when Typhoon OCR is unavailable.
  • SC-004: GPU VRAM usage never exceeds 90% of available VRAM when multiple AI models are loaded.
  • SC-005: AI administrators can successfully add and select typhoon2.1-gemma3-4b in AI Model Management within 2 minutes.
  • SC-006: ADR-023 and ADR-023A are updated and reviewed with no conflicts identified within 1 business day.
  • SC-007: All Typhoon OCR and Typhoon LLM interactions are logged in ai_audit_logs with 100% coverage.
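
SC-001 refers to character-level accuracy without defining it. One common definition, assumed here, is 1 minus the character error rate (CER), where CER is the Levenshtein edit distance divided by the reference length. The helper names are hypothetical, and the sketch counts Unicode code points, so Thai vowel and tone marks count as separate characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, clamped to the range [0, 1]."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)
```

Under this definition, one wrong character in a 20-character reference gives an accuracy of 0.95, which is exactly the SC-001 threshold.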

Assumptions

  • Admin Desktop (Desk-5439) has sufficient GPU VRAM (8GB+) to support Typhoon OCR-3B (~3-4GB) and other AI models sequentially.
  • Ollama service is already installed and running on Admin Desktop per ADR-023/023A.
  • Typhoon OCR-3B and typhoon2.1-gemma3-4b models are available in Ollama registry and can be pulled.
  • Current Tesseract OCR implementation (90% accuracy) is acceptable as a fallback option.
  • OCR Sandbox Runner and AI Model Management components exist and can be refactored to support additional options.
  • OCR sidecar uses Python 3.11 for Typhoon OCR integration.

Dependencies

  • ADR-023/023A must be updated to include Typhoon models before implementation begins.
  • Ollama service on Admin Desktop must be operational and accessible.
  • Typhoon OCR-3B and typhoon2.1-gemma3-4b models must be available in Ollama.
  • Existing OCR Sandbox Runner component must be refactored to support multiple OCR engines.
  • Existing AI Model Management component must be refactored to support additional LLM models.
  • VRAM monitoring capability must be implemented or enhanced.
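
The model-availability dependency can be checked before a request is dispatched by inspecting the JSON returned by Ollama's GET /api/tags endpoint, which lists installed models under a "models" array with a "name" field. The sketch below works on the already-parsed response; `is_model_installed` is a hypothetical helper, and the exact Typhoon model tags are not specified in this spec.

```python
def is_model_installed(tags_response: dict, model: str) -> bool:
    """Check a parsed /api/tags response for the given model name.

    Ollama stores untagged pulls as "<name>:latest", so a bare name is
    also matched against that form.
    """
    names = {entry.get("name", "") for entry in tags_response.get("models", [])}
    if model in names:
        return True
    if ":" not in model:
        return f"{model}:latest" in names
    return False
```

A negative result here could trigger the same Tesseract fallback path as an unreachable Ollama service.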