np-dms/lcbp3

admin ae1b1f35e1


feat(ai): ADR-032 Typhoon OCR integration - models, processors, cache, VRAM monitor, sandbox UI

2026-05-30 22:18:51 +07:00


// File: specs/200-fullstacks/232-typhoon-ocr-integration/spec.md
// Change Log:
// - 2026-05-30: Initial specification for Typhoon OCR integration
// - 2026-05-30: Updated VRAM strategy (keep_alive=0), System Prompt (Option 2), and hyperparameters.

Feature Specification: Typhoon OCR Integration

Feature Branch: 232-typhoon-ocr-integration
Created: 2026-05-30
Status: Draft
Category: 200-fullstacks
Input: User description: "Refactor the related parts; add typhoon2.1-gemma3-12b Q3_K_M as an option in AI Model Management; add typhoon-ocr-7b (~5-6GB VRAM, via Ollama) as an option in OCR Sandbox Runner; and update any conflicting ADRs."

Clarifications

Session 2026-05-30

Q: What permission level should be required for users to select Typhoon OCR in OCR Sandbox Runner? → A: Only system administrators (system.manage_all)
Q: What is the maximum acceptable processing time for Typhoon OCR to extract text from a single document page? → A: Under 60 seconds per page
Q: What permission level should be required for AI administrators to add typhoon2.1-gemma3-4b to AI Model Management? → A: Only system administrators (system.manage_all)
Q: What is the maximum number of concurrent Typhoon OCR requests the system should support? → A: 1 concurrent request (sequential processing only)
Q: Should Typhoon OCR results be cached or stored for future reference? → A: Cache results temporarily (24 hours) in Redis but not persist permanently
Q: What are the Typhoon OCR model hyperparameters? → A: temperature = 0.0, top_p = 0.9, repeat_penalty = 1.0, and keep_alive = 0 to unload VRAM immediately.
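Q: What is the System Prompt for Typhoon OCR? → A: "สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด ห้ามเพิ่มคำอธิบายใดๆ" (English: "Extract all Thai and English text from this image accurately. Keep the line structure and spacing as close to the original as possible. Do not add any explanation.")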
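
The hyperparameters and system prompt above map naturally onto a request body for Ollama's `/api/generate` endpoint. The following is a minimal sketch, not the project's implementation: the model tag `typhoon-ocr`, the short user prompt, and the function name `build_ocr_request` are assumptions made for illustration. Only the option values, `keep_alive = 0`, and the system prompt text come from this specification.

```python
import base64

# System prompt quoted verbatim from the clarification session.
SYSTEM_PROMPT = (
    "สกัดข้อความภาษาไทยและอังกฤษทั้งหมดจากภาพนี้อย่างถูกต้อง "
    "รักษาโครงสร้างบรรทัดและการเว้นวรรคให้ใกล้เคียงต้นฉบับมากที่สุด "
    "ห้ามเพิ่มคำอธิบายใดๆ"
)

# Hypothetical values: the spec does not state the exact Ollama tag
# or the wording of the user-turn prompt.
MODEL_TAG = "typhoon-ocr"
USER_PROMPT = "OCR this page."


def build_ocr_request(image_bytes: bytes, model: str = MODEL_TAG) -> dict:
    """Build a JSON-serializable body for POST /api/generate."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": USER_PROMPT,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        # keep_alive = 0 asks Ollama to unload the model right after the call.
        "keep_alive": 0,
        "options": {
            "temperature": 0.0,
            "top_p": 0.9,
            "repeat_penalty": 1.0,
        },
    }
```

The resulting dictionary can be sent with any HTTP client to the Ollama host. Keeping the payload builder separate from the network call makes the hyperparameters easy to check in a unit test.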

User Scenarios & Testing (mandatory)

User Story 1 - Typhoon OCR Option in OCR Sandbox (Priority: P1)

As a document processor, I want to use Typhoon OCR as an alternative to Tesseract, so that I can reach 95%+ OCR accuracy on Thai documents.

Why this priority: This is the primary user-facing value - improved OCR accuracy directly impacts document processing quality and reduces manual correction effort.

Independent Test: Can be fully tested by selecting Typhoon OCR in OCR Sandbox Runner and processing a Thai document, delivering improved text extraction accuracy compared to Tesseract.

Acceptance Scenarios:

Given a user has access to OCR Sandbox Runner, When they select "Typhoon OCR-3B" as the OCR engine option, Then the system should process the document using Typhoon OCR via Ollama and return extracted text.
Given a document is processed with Typhoon OCR, When the OCR completes, Then the extracted text should have accuracy comparable to or better than Tesseract (target: 95%+ for Thai text).
Given Typhoon OCR is selected, When the Ollama service is unavailable, Then the system should fall back to Tesseract OCR and display a warning message.
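
The fallback scenario above can be sketched as a small wrapper that tries the primary engine and, on any failure, runs the fallback and attaches a warning. This is an illustrative sketch only: `OcrResult`, `run_ocr_with_fallback`, and the engine callables are hypothetical names, and the real service may distinguish error types more finely.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OcrResult:
    text: str
    engine: str
    warning: Optional[str] = None


def run_ocr_with_fallback(
    image: bytes,
    primary: Callable[[bytes], str],
    fallback: Callable[[bytes], str],
) -> OcrResult:
    """Try Typhoon OCR first; fall back to Tesseract if it raises."""
    try:
        return OcrResult(text=primary(image), engine="typhoon")
    except Exception as exc:  # broad on purpose: any primary failure triggers fallback
        return OcrResult(
            text=fallback(image),
            engine="tesseract",
            warning=f"Typhoon OCR unavailable ({exc}); fell back to Tesseract.",
        )
```

Passing the engines in as callables keeps the wrapper independent of Ollama and Tesseract, so the fallback path can be exercised in tests with stub functions.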

User Story 2 - Typhoon LLM in AI Model Management (Priority: P2)

As an AI administrator, I want to add typhoon2.1-gemma3-4b as an option in AI Model Management, so that I can use this model for AI-powered document analysis tasks.

Why this priority: This enables model selection flexibility and allows administrators to choose between different LLM models based on performance and resource requirements.

Independent Test: Can be fully tested by adding typhoon2.1-gemma3-4b to the AI Model Management configuration and selecting it for a document analysis task.

Acceptance Scenarios:

Given an AI administrator has system.manage_all permission, When they add typhoon2.1-gemma3-4b to the AI model options, Then the model should be available for selection in AI-powered features.
Given typhoon2.1-gemma3-4b is selected, When a document analysis task is initiated, Then the system should use this model via Ollama for inference.
Given the GPU has limited VRAM, When typhoon2.1-gemma3-4b is loaded, Then the system should monitor VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
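
One way to express the VRAM check in the last scenario is a pure function that compares projected usage against a ceiling. The sketch below assumes the 90% ceiling from SC-004 and assumes usage figures are read from `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`; the function names are hypothetical, and the spec does not say how VRAM is actually measured.

```python
def parse_nvidia_smi_line(line: str) -> tuple[int, int]:
    """Parse one 'used, total' line (MiB) from the nvidia-smi CSV query."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def can_load_model(
    required_mib: int,
    used_mib: int,
    total_mib: int,
    max_utilization: float = 0.90,
) -> bool:
    """Return True if loading the model keeps usage at or under the ceiling."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    return used_mib + required_mib <= total_mib * max_utilization
```

For example, on an 8192 MiB card a 4096 MiB model fits when the GPU is idle, but not when another 4096 MiB model is still resident.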

User Story 3 - ADR Conflict Resolution (Priority: P3)

As a system architect, I want to update ADR-023 and ADR-023A to include Typhoon OCR and Typhoon LLM models, so that the architecture documentation reflects the current AI infrastructure capabilities.

Why this priority: This ensures architectural decisions remain accurate and provide clear guidance for future development and compliance checks.

Independent Test: Can be fully tested by reviewing the updated ADRs and verifying they correctly document Typhoon model integration without conflicts.

Acceptance Scenarios:

Given ADR-023 and ADR-023A exist, When they are updated to include Typhoon models, Then the ADRs should clearly specify Typhoon OCR and Typhoon LLM as supported on-premises AI options.
Given ADR-023A is updated, When it describes the 2-model stack, Then it should include Typhoon models as alternatives to gemma4 and nomic-embed-text where applicable.
Given ADR conflicts are identified, When they are resolved, Then all ADRs should be consistent with each other and with the actual implementation.

Edge Cases

What happens when the Ollama service is down or unresponsive?
How does the system handle VRAM exhaustion when multiple AI models are loaded? (Addressed by sequential loading and the Ollama keep_alive = 0 setting.)
What happens when the Typhoon OCR model fails to load or crashes during processing?
How does the system handle concurrent OCR requests when Typhoon OCR is selected?
What happens when a user selects Typhoon OCR but the model is not installed in Ollama?
How does the system handle fallback to Tesseract when Typhoon OCR fails?
What happens when GPU VRAM is insufficient for Typhoon OCR-3B (3-4GB)?
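
For the concurrent-request case, the clarification session settles on one request at a time. A minimal way to enforce that inside a single process is a lock around the engine call, as sketched below. `SequentialOcrRunner` is a hypothetical name, and a multi-process deployment would need a shared queue or distributed lock instead, which this sketch does not cover.

```python
import threading
from typing import Callable


class SequentialOcrRunner:
    """Serialize OCR calls so only one request uses the GPU at a time."""

    def __init__(self, engine: Callable[[bytes], str]) -> None:
        self._engine = engine
        self._lock = threading.Lock()

    def run(self, image: bytes) -> str:
        # Callers block here until the in-flight request finishes.
        with self._lock:
            return self._engine(image)
```

Requests that arrive while one is in flight simply wait their turn. Whether waiting requests should instead be rejected or time out is left open by the specification.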

Requirements (mandatory)

Functional Requirements

FR-001: System MUST provide Typhoon OCR-3B as an option in OCR Sandbox Runner alongside Tesseract OCR.
FR-002: System MUST allow users with system.manage_all permission to select between Tesseract OCR and Typhoon OCR for document text extraction.
FR-003: System MUST integrate Typhoon OCR via Ollama service on Admin Desktop (on-premises only, per ADR-023/023A) with CASL Guard for all AI-related endpoints per ADR-016.
FR-004: System MUST fall back to Tesseract OCR when Typhoon OCR is unavailable or fails, with appropriate user notification.
FR-005: System MUST allow users with system.manage_all permission to add typhoon2.1-gemma3-4b as an option in AI Model Management configuration with CASL Guard per ADR-016.
FR-006: System MUST allow AI administrators with system.manage_all permission to select typhoon2.1-gemma3-4b for AI-powered document analysis tasks with CASL Guard per ADR-016.
FR-007: System MUST monitor GPU VRAM usage and prevent concurrent model loading if VRAM would be exceeded.
FR-008: System MUST update ADR-023 and ADR-023A to document Typhoon OCR and Typhoon LLM as supported on-premises AI options.
FR-009: System MUST ensure ADR consistency - no conflicts between ADR-023, ADR-023A, and ADR-032 regarding Typhoon model integration.
FR-010: System MUST log all Typhoon OCR and Typhoon LLM interactions in ai_audit_logs per ADR-023/023A requirements.
FR-011: System MUST process Typhoon OCR requests sequentially (1 concurrent request) to manage VRAM and model loading constraints.
FR-012: System MUST cache Typhoon OCR results temporarily (24 hours in Redis: ocr:cache:{documentPublicId}:{engine}:{hash}) to avoid reprocessing the same document. Cache invalidation occurs automatically on document update or manually via admin API.
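
The cache key pattern in FR-012 can be illustrated with a small helper. This sketch assumes `{hash}` is a SHA-256 digest of the document bytes, which the specification does not state; `build_cache_key` and `CACHE_TTL_SECONDS` are hypothetical names.

```python
import hashlib

# 24 hours, per FR-012.
CACHE_TTL_SECONDS = 24 * 60 * 60


def build_cache_key(document_public_id: str, engine: str, content: bytes) -> str:
    """Build a key of the form ocr:cache:{documentPublicId}:{engine}:{hash}."""
    # Assumption: the hash covers the raw document bytes, so an updated
    # document produces a different key.
    digest = hashlib.sha256(content).hexdigest()
    return f"ocr:cache:{document_public_id}:{engine}:{digest}"


# With redis-py, the result could then be stored with an expiry, e.g.:
#   client.set(key, extracted_text, ex=CACHE_TTL_SECONDS)
```

Hashing the content means stale entries are never read after a document changes, although they still occupy memory until they expire or are invalidated explicitly.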

Key Entities

OCR Engine Configuration: Represents the available OCR engines (Tesseract, Typhoon OCR) with their parameters and resource requirements.
AI Model Configuration: Represents the available AI models (gemma4, typhoon2.1-gemma3-4b, nomic-embed-text) with their VRAM requirements and use cases.
VRAM Monitor: Tracks GPU VRAM usage across all loaded AI models to prevent resource exhaustion.

Success Criteria (mandatory)

Measurable Outcomes

SC-001: Typhoon OCR achieves at least 95% character-level accuracy on Thai text extraction, against Tesseract's 90% baseline.
SC-002: Typhoon OCR processes a single document page within 60 seconds (per-page timing).
SC-003: System successfully falls back to Tesseract OCR within 5 seconds when Typhoon OCR is unavailable.
SC-004: GPU VRAM usage never exceeds 90% of available VRAM when multiple AI models are loaded.
SC-005: AI administrators can successfully add and select typhoon2.1-gemma3-4b in AI Model Management within 2 minutes.
SC-006: ADR-023 and ADR-023A are updated and reviewed with no conflicts identified within 1 business day.
SC-007: All Typhoon OCR and Typhoon LLM interactions are logged in ai_audit_logs with 100% coverage.
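
SC-001 refers to character-level accuracy without defining it. One common definition is 1 minus the character error rate, where the error rate is the Levenshtein edit distance between the reference text and the OCR output, divided by the reference length. The sketch below uses that definition as an assumption; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, clamped at 0, using the reference length as denominator."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))
```

Note that Python compares strings by Unicode code point, so Thai vowel and tone marks each count as separate characters here. If the target is meant per visible character cluster, the texts would need to be segmented first.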

Assumptions

Admin Desktop (Desk-5439) has sufficient GPU VRAM (8GB+) to support Typhoon OCR-3B (~3-4GB) and other AI models sequentially.
The Ollama service is already installed and running on Admin Desktop per ADR-023/023A.
Typhoon OCR-3B and typhoon2.1-gemma3-4b models are available in the Ollama registry and can be pulled.
Current Tesseract OCR implementation (90% accuracy) is acceptable as a fallback option.
OCR Sandbox Runner and AI Model Management components exist and can be refactored to support additional options.
OCR sidecar uses Python 3.11 for Typhoon OCR integration.

Dependencies

ADR-023/023A must be updated to include Typhoon models before implementation begins.
Ollama service on Admin Desktop must be operational and accessible.
Typhoon OCR-3B and typhoon2.1-gemma3-4b models must be available in Ollama.
Existing OCR Sandbox Runner component must be refactored to support multiple OCR engines.
Existing AI Model Management component must be refactored to support additional LLM models.
VRAM monitoring capability must be implemented or enhanced.
