lcbp3/specs/200-fullstacks/232-typhoon-ocr-integration/spec.md

// File: specs/200-fullstacks/232-typhoon-ocr-integration/spec.md
// Change Log:
// - 2026-05-30: Initial specification for Typhoon OCR integration
// - 2026-05-30: Updated VRAM strategy (keep_alive=0), System Prompt (Option 2), and hyperparameters.

# Feature Specification: Typhoon OCR Integration

**Feature Branch**: `232-typhoon-ocr-integration`
**Created**: 2026-05-30
**Status**: Draft
**Category**: 200-fullstacks
**Input**: User description: "refactor ส่วนที่เกี่ยวข้อง, เพิ่ม typhoon2.1-gemma3-12b Q3_K_M ใน option AI Model Management, เพิ่ม typhoon-ocr-7b ~5-6GB VRAM (ollama) เป็น option ใน OCR Sandbox Runner, ให้ปรับปรุง ADR ที่ขัดแย้งด้วย"

## Clarifications

### Session 2026-05-30

- Q: What permission level should be required for users to select Typhoon OCR in OCR Sandbox Runner? → A: Only system administrators (system.manage_all)
- Q: What is the maximum acceptable processing time for Typhoon OCR to extract text from a single document page? → A: Under 60 seconds per page
- Q: What permission level should be required for AI administrators to add typhoon2.1-gemma3-4b to AI Model Management? → A: Only system administrators (system.manage_all)
- Q: What is the maximum number of concurrent Typhoon OCR requests the system should support? → A: 1 concurrent request (sequential processing only)
- Q: Should Typhoon OCR results be cached or stored for future reference? → A: Cache results temporarily (24 hours) in Redis but not persist permanently
- Q: What are the Typhoon OCR model hyperparameters? → A: temperature = 0.0, top_p = 0.9, repeat_penalty = 1.0, and keep_alive = 0 to unload VRAM immediately.
- Q: What is the System Prompt for Typhoon OCR? → A: `"สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด ห้ามเพิ่มคำอธิบายใดๆ"`

## User Scenarios & Testing _(mandatory)_

### User Story 1 - Typhoon OCR Option in OCR Sandbox (Priority: P1)

As a document processor, I want to use Typhoon OCR as an alternative to Tesseract for better Thai text extraction accuracy, so that I can achieve higher OCR accuracy (95%+) for Thai documents.

**Why this priority**: This is the primary user-facing value - improved OCR accuracy directly impacts document processing quality and reduces manual correction effort.

**Independent Test**: Can be fully tested by selecting Typhoon OCR in OCR Sandbox Runner and processing a Thai document, delivering improved text extraction accuracy compared to Tesseract.

**Acceptance Scenarios**:

1. **Given** a user has access to OCR Sandbox Runner, **When** they select "Typhoon OCR-3B" as the OCR engine option, **Then** the system should process the document using Typhoon OCR via Ollama and return extracted text.
2. **Given** a document is processed with Typhoon OCR, **When** the OCR completes, **Then** the extracted text should have accuracy comparable to or better than Tesseract (target: 95%+ for Thai text).
3. **Given** Typhoon OCR is selected, **When** the Ollama service is unavailable, **Then** the system should fall back to Tesseract OCR and display a warning message.

---

### User Story 2 - Typhoon LLM in AI Model Management (Priority: P2)

As an AI administrator, I want to add typhoon2.1-gemma3-4b as an option in AI Model Management, so that I can use this model for AI-powered document analysis tasks.

**Why this priority**: This enables model selection flexibility and allows administrators to choose between different LLM models based on performance and resource requirements.

**Independent Test**: Can be fully tested by adding typhoon2.1-gemma3-4b to the AI Model Management configuration and selecting it for a document analysis task.

**Acceptance Scenarios**:

1. **Given** an AI administrator has system.manage_all permission, **When** they add typhoon2.1-gemma3-4b to the AI model options, **Then** the model should be available for selection in AI-powered features.
2. **Given** typhoon2.1-gemma3-4b is selected, **When** a document analysis task is initiated, **Then** the system should use this model via Ollama for inference.
3. **Given** the GPU has limited VRAM, **When** typhoon2.1-gemma3-4b is loaded, **Then** the system should monitor VRAM usage and prevent concurrent model loading if VRAM would be exceeded.

---

### User Story 3 - ADR Conflict Resolution (Priority: P3)

As a system architect, I want to update ADR-023 and ADR-023A to include Typhoon OCR and Typhoon LLM models, so that the architecture documentation reflects the current AI infrastructure capabilities.

**Why this priority**: This ensures architectural decisions remain accurate and provide clear guidance for future development and compliance checks.

**Independent Test**: Can be fully tested by reviewing the updated ADRs and verifying they correctly document Typhoon model integration without conflicts.

**Acceptance Scenarios**:

1. **Given** ADR-023 and ADR-023A exist, **When** they are updated to include Typhoon models, **Then** the ADRs should clearly specify Typhoon OCR and Typhoon LLM as supported on-premises AI options.
2. **Given** ADR-023A is updated, **When** it describes the 2-model stack, **Then** it should include Typhoon models as alternatives to gemma4 and nomic-embed-text where applicable.
3. **Given** ADR conflicts are identified, **When** they are resolved, **Then** all ADRs should be consistent with each other and with the actual implementation.

---

### Edge Cases

- What happens when Ollama service is down or unresponsive?
- How does system handle VRAM exhaustion when multiple AI models are loaded? (Solved by sequential loading and Ollama `keep_alive = 0` configuration).
- What happens when Typhoon OCR model fails to load or crashes during processing?
- How does system handle concurrent OCR requests when Typhoon OCR is selected?
- What happens when user selects Typhoon OCR but the model is not installed in Ollama?
- How does system handle fallback to Tesseract when Typhoon OCR fails?
- What happens when GPU VRAM is insufficient for Typhoon OCR-3B (3-4GB)?

## Requirements _(mandatory)_

### Functional Requirements

- **FR-001**: System MUST provide Typhoon OCR-3B as an option in OCR Sandbox Runner alongside Tesseract OCR.
- **FR-002**: System MUST allow users with system.manage_all permission to select between Tesseract OCR and Typhoon OCR for document text extraction.
- **FR-003**: System MUST integrate Typhoon OCR via Ollama service on Admin Desktop (on-premises only, per ADR-023/023A) with CASL Guard for all AI-related endpoints per ADR-016.
- **FR-004**: System MUST fall back to Tesseract OCR when Typhoon OCR is unavailable or fails, with appropriate user notification.
- **FR-005**: System MUST allow users with system.manage_all permission to add typhoon2.1-gemma3-4b as an option in AI Model Management configuration with CASL Guard per ADR-016.
- **FR-006**: System MUST allow AI administrators with system.manage_all permission to select typhoon2.1-gemma3-4b for AI-powered document analysis tasks with CASL Guard per ADR-016.
- **FR-007**: System MUST monitor GPU VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
- **FR-011**: System MUST process Typhoon OCR requests sequentially (1 concurrent request) to manage VRAM and model loading constraints.
- **FR-012**: System MUST cache Typhoon OCR results temporarily (24 hours in Redis: `ocr:cache:{documentPublicId}:{engine}:{hash}`) to avoid reprocessing the same document. Cache invalidation occurs automatically on document update or manually via admin API.
- **FR-008**: System MUST update ADR-023 and ADR-023A to document Typhoon OCR and Typhoon LLM as supported on-premises AI options.
- **FR-009**: System MUST ensure ADR consistency - no conflicts between ADR-023, ADR-023A, and ADR-032 regarding Typhoon model integration.
- **FR-010**: System MUST log all Typhoon OCR and Typhoon LLM interactions in ai_audit_logs per ADR-023/023A requirements.

### Key Entities

- **OCR Engine Configuration**: Represents the available OCR engines (Tesseract, Typhoon OCR) with their parameters and resource requirements.
- **AI Model Configuration**: Represents the available AI models (gemma4, typhoon2.1-gemma3-4b, nomic-embed-text) with their VRAM requirements and use cases.
- **VRAM Monitor**: Tracks GPU VRAM usage across all loaded AI models to prevent resource exhaustion.

## Success Criteria _(mandatory)_

### Measurable Outcomes

- **SC-001**: Typhoon OCR achieves 95%+ accuracy for Thai text extraction compared to Tesseract's 90% baseline (measured at character-level accuracy).
- **SC-002**: Typhoon OCR processes a single document page within 60 seconds (per-page timing).
- **SC-003**: System successfully falls back to Tesseract OCR within 5 seconds when Typhoon OCR is unavailable.
- **SC-004**: GPU VRAM usage never exceeds 90% of available VRAM when multiple AI models are loaded.
- **SC-005**: AI administrators can successfully add and select typhoon2.1-gemma3-4b in AI Model Management within 2 minutes.
- **SC-006**: ADR-023 and ADR-023A are updated and reviewed with no conflicts identified within 1 business day.
- **SC-007**: All Typhoon OCR and Typhoon LLM interactions are logged in ai_audit_logs with 100% coverage.

## Assumptions

- Admin Desktop (Desk-5439) has sufficient GPU VRAM (8GB+) to support Typhoon OCR-3B (~3-4GB) and other AI models sequentially.
- Ollama service is already installed and running on Admin Desktop per ADR-023/023A.
- Typhoon OCR-3B and typhoon2.1-gemma3-4b models are available in Ollama registry and can be pulled.
- Current Tesseract OCR implementation (90% accuracy) is acceptable as a fallback option.
- OCR Sandbox Runner and AI Model Management components exist and can be refactored to support additional options.
- OCR sidecar uses Python 3.11 for Typhoon OCR integration.

## Dependencies

- ADR-023/023A must be updated to include Typhoon models before implementation begins.
- Ollama service on Admin Desktop must be operational and accessible.
- Typhoon OCR-3B and typhoon2.1-gemma3-4b models must be available in Ollama.
- Existing OCR Sandbox Runner component must be refactored to support multiple OCR engines.
- Existing AI Model Management component must be refactored to support additional LLM models.
- VRAM monitoring capability must be implemented or enhanced.