// File: specs/200-fullstacks/232-typhoon-ocr-integration/spec.md
// Change Log:
// - 2026-05-30: Initial specification for Typhoon OCR integration
// - 2026-05-30: Updated VRAM strategy (keep_alive=0), System Prompt (Option 2), and hyperparameters.
Feature Specification: Typhoon OCR Integration
Feature Branch: 232-typhoon-ocr-integration
Created: 2026-05-30
Status: Draft
Category: 200-fullstacks
Input: User description: "Refactor the related parts; add typhoon2.1-gemma3-12b Q3_K_M as an option in AI Model Management; add typhoon-ocr-7b (~5-6GB VRAM, ollama) as an option in OCR Sandbox Runner; and update the conflicting ADRs as well."
Clarifications
Session 2026-05-30
- Q: What permission level should be required for users to select Typhoon OCR in OCR Sandbox Runner? → A: Only system administrators (system.manage_all)
- Q: What is the maximum acceptable processing time for Typhoon OCR to extract text from a single document page? → A: Under 60 seconds per page
- Q: What permission level should be required for AI administrators to add typhoon2.1-gemma3-4b to AI Model Management? → A: Only system administrators (system.manage_all)
- Q: What is the maximum number of concurrent Typhoon OCR requests the system should support? → A: 1 concurrent request (sequential processing only)
- Q: Should Typhoon OCR results be cached or stored for future reference? → A: Cache results temporarily (24 hours) in Redis but not persist permanently
- Q: What are the Typhoon OCR model hyperparameters? → A: temperature = 0.0, top_p = 0.9, repeat_penalty = 1.0, and keep_alive = 0 to unload VRAM immediately.
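- Q: What is the System Prompt for Typhoon OCR? → A: "สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด ห้ามเพิ่มคำอธิบายใดๆ" (English: "Accurately extract all Thai and English text from this image. Preserve the line structure and spacing as closely to the original as possible. Do not add any explanations.")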
User Scenarios & Testing (mandatory)
User Story 1 - Typhoon OCR Option in OCR Sandbox (Priority: P1)
As a document processor, I want to use Typhoon OCR as an alternative to Tesseract, so that I can reach 95%+ text extraction accuracy on Thai documents.
Why this priority: This is the primary user-facing value - improved OCR accuracy directly impacts document processing quality and reduces manual correction effort.
Independent Test: Can be fully tested by selecting Typhoon OCR in OCR Sandbox Runner and processing a Thai document, delivering improved text extraction accuracy compared to Tesseract.
Acceptance Scenarios:
- Given a user has access to OCR Sandbox Runner, When they select "Typhoon OCR-3B" as the OCR engine option, Then the system should process the document using Typhoon OCR via Ollama and return extracted text.
- Given a document is processed with Typhoon OCR, When the OCR completes, Then the extracted text should have accuracy comparable to or better than Tesseract (target: 95%+ for Thai text).
- Given Typhoon OCR is selected, When the Ollama service is unavailable, Then the system should fall back to Tesseract OCR and display a warning message.
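The request that the first scenario sends to Ollama could be assembled roughly as follows. This is a minimal sketch, not the project's implementation: it uses the field names of Ollama's /api/generate endpoint together with the hyperparameters and System Prompt recorded under Clarifications. The model tag `typhoon-ocr` and the short user prompt are assumptions, since the spec fixes neither.

```python
import base64
import json

# Hypothetical Ollama model tag; the spec does not state the exact tag.
TYPHOON_OCR_MODEL = "typhoon-ocr"

# System Prompt copied from the Clarifications section.
SYSTEM_PROMPT = (
    "สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง "
    "รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด "
    "ห้ามเพิ่มคำอธิบายใดๆ"
)

# Assumed user prompt; Ollama treats an empty prompt as "load the model only".
USER_PROMPT = "Extract the text from this image."


def build_ocr_request(image_bytes: bytes, model: str = TYPHOON_OCR_MODEL) -> dict:
    """Build a JSON-serializable body for POST /api/generate."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": USER_PROMPT,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        # keep_alive = 0 asks Ollama to unload the model right after the call.
        "keep_alive": 0,
        "options": {
            "temperature": 0.0,
            "top_p": 0.9,
            "repeat_penalty": 1.0,
        },
    }


if __name__ == "__main__":
    body = build_ocr_request(b"fake image bytes")
    print(json.dumps(body, ensure_ascii=False)[:120])
```

Keeping the payload builder free of network calls makes it easy to unit test; the HTTP call itself would sit in the OCR sidecar.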
User Story 2 - Typhoon LLM in AI Model Management (Priority: P2)
As an AI administrator, I want to add typhoon2.1-gemma3-4b as an option in AI Model Management, so that I can use this model for AI-powered document analysis tasks.
Why this priority: This enables model selection flexibility and allows administrators to choose between different LLM models based on performance and resource requirements.
Independent Test: Can be fully tested by adding typhoon2.1-gemma3-4b to the AI Model Management configuration and selecting it for a document analysis task.
Acceptance Scenarios:
- Given an AI administrator has system.manage_all permission, When they add typhoon2.1-gemma3-4b to the AI model options, Then the model should be available for selection in AI-powered features.
- Given typhoon2.1-gemma3-4b is selected, When a document analysis task is initiated, Then the system should use this model via Ollama for inference.
- Given the GPU has limited VRAM, When typhoon2.1-gemma3-4b is loaded, Then the system should monitor VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
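The VRAM check in the last scenario can be sketched as a pure function. The 90% ceiling is taken from SC-004; the function name, the MiB units, and the way usage figures are obtained are assumptions made for illustration only.

```python
def can_load_model(
    required_mib: int,
    used_mib: int,
    total_mib: int,
    max_utilization: float = 0.90,  # ceiling from SC-004
) -> bool:
    """Return True if loading a model keeps VRAM usage at or below the ceiling.

    All sizes are in MiB. How `used_mib` and `total_mib` are measured
    (for example via the GPU driver) is outside this sketch.
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    if required_mib < 0 or used_mib < 0:
        raise ValueError("VRAM sizes must be non-negative")
    return used_mib + required_mib <= total_mib * max_utilization


if __name__ == "__main__":
    # An 8 GiB card with nothing loaded can take a ~3.5 GiB model.
    print(can_load_model(required_mib=3584, used_mib=0, total_mib=8192))
    # The same model is refused while another 4 GiB model is resident.
    print(can_load_model(required_mib=3584, used_mib=4096, total_mib=8192))
```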
User Story 3 - ADR Conflict Resolution (Priority: P3)
As a system architect, I want to update ADR-023 and ADR-023A to include Typhoon OCR and Typhoon LLM models, so that the architecture documentation reflects the current AI infrastructure capabilities.
Why this priority: This ensures architectural decisions remain accurate and provide clear guidance for future development and compliance checks.
Independent Test: Can be fully tested by reviewing the updated ADRs and verifying they correctly document Typhoon model integration without conflicts.
Acceptance Scenarios:
- Given ADR-023 and ADR-023A exist, When they are updated to include Typhoon models, Then the ADRs should clearly specify Typhoon OCR and Typhoon LLM as supported on-premises AI options.
- Given ADR-023A is updated, When it describes the 2-model stack, Then it should include Typhoon models as alternatives to gemma4 and nomic-embed-text where applicable.
- Given ADR conflicts are identified, When they are resolved, Then all ADRs should be consistent with each other and with the actual implementation.
Edge Cases
- What happens when the Ollama service is down or unresponsive?
- How does the system handle VRAM exhaustion when multiple AI models are loaded? (Solved by sequential loading and the Ollama keep_alive = 0 configuration.)
- What happens when the Typhoon OCR model fails to load or crashes during processing?
- How does the system handle concurrent OCR requests when Typhoon OCR is selected?
- What happens when the user selects Typhoon OCR but the model is not installed in Ollama?
- How does the system handle fallback to Tesseract when Typhoon OCR fails?
- What happens when GPU VRAM is insufficient for Typhoon OCR-3B (3-4GB)?
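Several of these edge cases resolve to the same behaviour: try Typhoon OCR and, on any failure, return Tesseract output with a warning. The sketch below shows one way to express that. The `OcrResult` type, the engine labels, and the injected callables are hypothetical names, not existing components.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OcrResult:
    text: str
    engine: str                    # engine that actually produced the text
    warning: Optional[str] = None  # set only when a fallback happened


def run_ocr_with_fallback(
    image: bytes,
    primary: Callable[[bytes], str],   # e.g. a Typhoon OCR call via Ollama
    fallback: Callable[[bytes], str],  # e.g. the existing Tesseract path
) -> OcrResult:
    """Run the primary engine; on any error, fall back and attach a warning."""
    try:
        return OcrResult(text=primary(image), engine="typhoon")
    except Exception as exc:  # service down, model missing, crash, timeout
        message = f"Typhoon OCR unavailable ({exc}); fell back to Tesseract."
        return OcrResult(text=fallback(image), engine="tesseract", warning=message)


if __name__ == "__main__":
    def broken_typhoon(_: bytes) -> str:
        raise ConnectionError("Ollama not reachable")

    result = run_ocr_with_fallback(b"...", broken_typhoon, lambda _: "tesseract text")
    print(result.engine, "|", result.warning)
```

Catching a broad exception is deliberate here, because a missing model, a crashed runner, and an unreachable service should all lead to the same user-visible outcome.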
Requirements (mandatory)
Functional Requirements
- FR-001: System MUST provide Typhoon OCR-3B as an option in OCR Sandbox Runner alongside Tesseract OCR.
- FR-002: System MUST allow users with system.manage_all permission to select between Tesseract OCR and Typhoon OCR for document text extraction.
- FR-003: System MUST integrate Typhoon OCR via Ollama service on Admin Desktop (on-premises only, per ADR-023/023A) with CASL Guard for all AI-related endpoints per ADR-016.
- FR-004: System MUST fall back to Tesseract OCR when Typhoon OCR is unavailable or fails, with appropriate user notification.
- FR-005: System MUST allow users with system.manage_all permission to add typhoon2.1-gemma3-4b as an option in AI Model Management configuration with CASL Guard per ADR-016.
- FR-006: System MUST allow AI administrators with system.manage_all permission to select typhoon2.1-gemma3-4b for AI-powered document analysis tasks with CASL Guard per ADR-016.
- FR-007: System MUST monitor GPU VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
- FR-011: System MUST process Typhoon OCR requests sequentially (1 concurrent request) to manage VRAM and model loading constraints.
- FR-012: System MUST cache Typhoon OCR results temporarily (24 hours in Redis: ocr:cache:{documentPublicId}:{engine}:{hash}) to avoid reprocessing the same document. Cache invalidation occurs automatically on document update or manually via admin API.
- FR-008: System MUST update ADR-023 and ADR-023A to document Typhoon OCR and Typhoon LLM as supported on-premises AI options.
- FR-009: System MUST ensure ADR consistency - no conflicts between ADR-023, ADR-023A, and ADR-032 regarding Typhoon model integration.
- FR-010: System MUST log all Typhoon OCR and Typhoon LLM interactions in ai_audit_logs per ADR-023/023A requirements.
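FR-011 and FR-012 interact, so a combined sketch may help: a lock admits one OCR job at a time, and a cache keyed in the FR-012 format short-circuits repeated work. The assumptions are that `{hash}` is a SHA-256 digest of the page content, that a plain dict stands in for Redis, and that the class name is hypothetical. None of these are fixed by the spec.

```python
import hashlib
import threading
from typing import Callable, Dict

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per FR-012


def ocr_cache_key(document_public_id: str, engine: str, content: bytes) -> str:
    """Build a key in the form ocr:cache:{documentPublicId}:{engine}:{hash}."""
    digest = hashlib.sha256(content).hexdigest()  # assumed hash choice
    return f"ocr:cache:{document_public_id}:{engine}:{digest}"


class SequentialOcrRunner:
    """Runs at most one OCR job at a time and reuses cached results."""

    def __init__(self, engine: str, extract: Callable[[bytes], str], cache: Dict[str, str]):
        self._engine = engine
        self._extract = extract
        self._cache = cache            # stand-in for Redis; no TTL enforcement here
        self._lock = threading.Lock()  # enforces "1 concurrent request"

    def run(self, document_public_id: str, content: bytes) -> str:
        key = ocr_cache_key(document_public_id, self._engine, content)
        with self._lock:
            cached = self._cache.get(key)
            if cached is not None:
                return cached
            text = self._extract(content)
            self._cache[key] = text
            return text


if __name__ == "__main__":
    runner = SequentialOcrRunner("typhoon", lambda page: "extracted text", {})
    print(runner.run("doc-1", b"page bytes"))
    print(ocr_cache_key("doc-1", "typhoon", b"page bytes"))
```

With a real Redis client the write would carry the expiry, for example `client.set(key, text, ex=CACHE_TTL_SECONDS)` in redis-py. Because the content digest is part of the key, an updated document produces a new key; explicit invalidation would still be needed to drop the old entry before its TTL expires.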
Key Entities
- OCR Engine Configuration: Represents the available OCR engines (Tesseract, Typhoon OCR) with their parameters and resource requirements.
- AI Model Configuration: Represents the available AI models (gemma4, typhoon2.1-gemma3-4b, nomic-embed-text) with their VRAM requirements and use cases.
- VRAM Monitor: Tracks GPU VRAM usage across all loaded AI models to prevent resource exhaustion.
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: Typhoon OCR achieves 95%+ character-level accuracy for Thai text extraction, against Tesseract's 90% baseline.
- SC-002: Typhoon OCR processes a single document page within 60 seconds (per-page timing).
- SC-003: System successfully falls back to Tesseract OCR within 5 seconds when Typhoon OCR is unavailable.
- SC-004: GPU VRAM usage never exceeds 90% of available VRAM when multiple AI models are loaded.
- SC-005: AI administrators can successfully add and select typhoon2.1-gemma3-4b in AI Model Management within 2 minutes.
- SC-006: ADR-023 and ADR-023A are updated and reviewed with no conflicts identified within 1 business day.
- SC-007: All Typhoon OCR and Typhoon LLM interactions are logged in ai_audit_logs with 100% coverage.
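SC-001 names character-level accuracy but gives no formula. One common reading, offered here as an assumption rather than the agreed metric, is 1 minus the Levenshtein edit distance divided by the length of the ground-truth text:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return accuracy in [0, 1] of OCR output against ground-truth text."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    distance = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - distance / len(reference))


if __name__ == "__main__":
    truth = "เอกสารทดสอบ OCR"
    ocr_output = "เอกสารทดสอบ 0CR"  # one substituted character
    print(f"{character_accuracy(truth, ocr_output):.3f}")
```

Python strings count Unicode code points, so Thai vowel and tone marks are scored as separate characters. Whether the benchmark should count code points or grapheme clusters is a decision the spec leaves open.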
Assumptions
- Admin Desktop (Desk-5439) has sufficient GPU VRAM (8GB+) to support Typhoon OCR-3B (~3-4GB) and other AI models sequentially.
- Ollama service is already installed and running on Admin Desktop per ADR-023/023A.
- Typhoon OCR-3B and typhoon2.1-gemma3-4b models are available in Ollama registry and can be pulled.
- Current Tesseract OCR implementation (90% accuracy) is acceptable as a fallback option.
- OCR Sandbox Runner and AI Model Management components exist and can be refactored to support additional options.
- OCR sidecar uses Python 3.11 for Typhoon OCR integration.
Dependencies
- ADR-023/023A must be updated to include Typhoon models before implementation begins.
- Ollama service on Admin Desktop must be operational and accessible.
- Typhoon OCR-3B and typhoon2.1-gemma3-4b models must be available in Ollama.
- Existing OCR Sandbox Runner component must be refactored to support multiple OCR engines.
- Existing AI Model Management component must be refactored to support additional LLM models.
- VRAM monitoring capability must be implemented or enhanced.