# Quickstart: Typhoon OCR Integration **Feature**: 232-typhoon-ocr-integration **Date**: 2026-05-30 **Phase**: Implementation ## Current Scope This feature is being implemented against the live LCBP3 repo structure, not the older generated paths in `plan.md` / `tasks.md`. Current verified baseline: - AI Model Management already exists via `ai_available_models` and `system_settings` - OCR Sandbox already exists as a 2-step flow in `frontend/components/admin/ai/OcrSandboxPromptManager.tsx` - OCR sidecar currently runs **Tesseract** as the production baseline - Typhoon LLM option can be seeded into `ai_available_models` by SQL delta - Typhoon OCR runtime path is still pending full backend/sidecar integration ## Prerequisites - Admin Desktop (Desk-5439) with Ollama service reachable from DMS backend - Redis service running - MariaDB database with `ai_available_models`, `ai_prompts`, and `ai_audit_logs` - BullMQ queues configured (`ai-realtime`, `ai-batch`) - `system.manage_all` permission for AI admin features ## Installation Steps ### 1. Pull Typhoon models on Admin Desktop ```powershell ollama pull scb10x/typhoon2.1-gemma3-4b ollama pull scb10x/typhoon-ocr-3b ollama list ``` Expected list should include: - `scb10x/typhoon2.1-gemma3-4b` - `scb10x/typhoon-ocr-3b` ### 2. Apply the Typhoon model seed delta Apply: - `specs/03-Data-and-Storage/deltas/2026-05-30-seed-typhoon-ai-models.sql` This delta adds `typhoon2.1-gemma3-4b` into `ai_available_models` if it does not already exist. ### 3. Verify AI admin model data Verified code path: - Backend: `backend/src/modules/ai/ai-settings.service.ts` - API: `GET /api/ai/admin/models` - Frontend: `frontend/app/(admin)/admin/ai/page.tsx` Expected behavior: - `gemma4:e4b` remains the default fallback active model when `AI_ACTIVE_MODEL` is unset - `typhoon2.1-gemma3-4b` appears as an additional selectable model after the delta is applied ## Usage ### AI Model Management 1. Open the AI admin page. 2. Confirm `typhoon2.1-gemma3-4b` appears in the model list. 3. Activate it from the existing AI Model Management card. ### OCR Sandbox Current verified baseline: - OCR Sandbox uses the existing 2-step flow: - Step 1: OCR only - Step 2: AI extraction from cached OCR text - OCR sidecar health card now reflects the current engine baseline as `OCR Sidecar (Tesseract)` Typhoon OCR engine selection is still pending implementation and should not be treated as complete until backend, queue, and sidecar integration are added. ## Verification ### Verify the model seed 1. Apply the SQL delta. 2. Open `/admin/ai`. 3. Confirm `typhoon2.1-gemma3-4b` appears in the model list. ### Verify the fallback active model 1. Ensure `AI_ACTIVE_MODEL` is missing from `system_settings` in a test environment. 2. Call `GET /api/ai/admin/models/active`. 3. Confirm the fallback response resolves to `gemma4:e4b`. ### Verify OCR baseline label 1. Open `/admin/ai`. 2. Go to `Overview & Health`. 3. Confirm the OCR card label reads `OCR Sidecar (Tesseract)`. ## Troubleshooting ### Ollama unavailable Symptoms: - AI health endpoint reports Ollama as down - model activation cannot proceed Checks: ```powershell ollama list ``` ### Typhoon model missing from UI Checks: - verify `2026-05-30-seed-typhoon-ai-models.sql` was applied - verify `GET /api/ai/admin/models` returns the seeded row ### OCR Sandbox still uses Tesseract only This is expected until Typhoon OCR runtime integration is implemented in: - `backend/src/modules/ai/services/ocr.service.ts` - `backend/src/modules/ai/processors/ai-batch.processor.ts` - `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py` ## Security Notes - All AI admin endpoints require `system.manage_all` - AI models remain on-premises only per ADR-023 / ADR-023A - OCR results must stay behind the DMS backend boundary - Do not treat Typhoon OCR as production-ready until fallback, queueing, and audit coverage are implemented end-to-end