// File: specs/200-fullstacks/232-typhoon-ocr-integration/plan.md
// Change Log:
// - 2026-05-30: Initial implementation plan for Typhoon OCR integration
Implementation Plan: Typhoon OCR Integration
Branch: 232-typhoon-ocr-integration | Date: 2026-05-30 | Spec: spec.md
Input: Feature specification from /specs/200-fullstacks/232-typhoon-ocr-integration/spec.md
Note: This template is filled in by the /speckit.plan command. See .agents/skills/plan.md for the execution workflow.
Summary
Integrate Typhoon OCR-3B as an alternative OCR engine in the OCR Sandbox Runner, add typhoon2.1-gemma3-12b to AI Model Management, and update ADR-023/023A to document Typhoon models as supported on-premises AI options. The implementation runs on Ollama on the Admin Desktop (Desk-5439) with sequential processing (1 concurrent request), 24-hour result caching, and fallback to Tesseract OCR when Typhoon is unavailable. All changes require the system.manage_all permission and must comply with the ADR-023/023A AI boundary policies.
Technical Context
- Language/Version: TypeScript 5.x (NestJS 11 backend, Next.js 16 frontend), Python 3.11 (OCR sidecar)
- Primary Dependencies: Ollama (AI runtime), BullMQ (job queues), TypeORM (ORM), Redis (caching/locks), MariaDB 11.8 (database)
- Storage: MariaDB (ai_prompts, ai_audit_logs), Redis (24-hour OCR result cache, VRAM monitoring)
- Testing: Jest (backend unit tests), Playwright (E2E tests)
- Target Platform: Linux server (Admin Desktop Desk-5439 for AI processing)
- Project Type: web (backend + frontend + infrastructure)
- Performance Goals: 60 seconds/page OCR processing, 5-second fallback to Tesseract, 90% VRAM usage limit
- Constraints: On-premises AI only (ADR-023/023A), system.manage_all permission required, sequential OCR processing (1 concurrent request)
- Scale/Scope: Single Admin Desktop GPU, 24-hour cache TTL, ai_audit_logs for all AI interactions
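As an illustration of the 24-hour OCR result cache, the sketch below uses an in-memory stand-in that mirrors only the `get` and `set(..., ex=seconds)` calls of a Redis client. The key scheme (`ocr:<engine>:<sha256>`) and all function names are assumptions made for the example; the plan does not specify them.

```python
import hashlib
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the plan


def cache_key(document: bytes, engine: str) -> str:
    """Hypothetical key scheme: engine name plus SHA-256 of the input bytes."""
    return f"ocr:{engine}:{hashlib.sha256(document).hexdigest()}"


class InMemoryCache:
    """Stand-in for Redis exposing only get/set with an `ex` expiry in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items: dict[str, tuple[float, str]] = {}

    def set(self, key: str, value: str, ex: int) -> None:
        self._items[key] = (self._clock() + ex, value)

    def get(self, key: str):
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            self._items.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]


def cached_ocr(cache, document: bytes, engine: str, run) -> str:
    """Return the cached text for (document, engine), running OCR on a miss."""
    key = cache_key(document, engine)
    hit = cache.get(key)
    if hit is not None:
        return hit
    text = run(document)
    cache.set(key, text, ex=CACHE_TTL_SECONDS)
    return text
```

Including the engine name in the key keeps Typhoon and Tesseract results for the same page from overwriting each other. A real Redis client may return bytes rather than str unless configured to decode responses.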
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Based on AGENTS.md Tier 1 non-negotiables:
- ADR-019 UUID: ✅ PASS - Using publicId for all API responses, no parseInt on UUID
- ADR-009 Schema: ✅ PASS - No TypeORM migrations, will edit SQL directly if schema changes needed
- ADR-016 Security: ✅ PASS - CASL Guard with system.manage_all permission for all AI-related mutations
- ADR-002 Numbering: N/A - No document numbering in this feature
- ADR-008 BullMQ: ✅ PASS - AI interactions via BullMQ queues (ai-realtime/ai-batch)
- ADR-023/023A AI Boundary: ✅ PASS - Typhoon models run on Admin Desktop Ollama only, no direct DB/storage access
- ADR-007 Errors: ✅ PASS - Will use layered error classification with user-friendly messages
- TypeScript Strict: ✅ PASS - No `any` types, no `console.log`, explicit typing
- i18n: ✅ PASS - No hardcoded Thai/English strings, use i18n keys
- File Upload: N/A - No file upload changes in this feature
Gate Status: ✅ PASS - No violations
Project Structure
Documentation (this feature)
specs/200-fullstacks/232-typhoon-ocr-integration/
├── spec.md # Feature specification
├── plan.md # This file (/speckit.plan command output)
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
└── tasks.md # Phase 2 output (/speckit.tasks command)
Source Code (repository root)
backend/
├── src/
│   ├── modules/
│   │   ├── ai/
│   │   │   ├── ai.service.ts          # Add Typhoon model support
│   │   │   ├── ai.controller.ts       # Add Typhoon OCR endpoint
│   │   │   └── dto/                   # Add Typhoon-specific DTOs
│   │   └── ocr/
│   │       ├── ocr.service.ts         # Add Typhoon OCR integration
│   │       └── dto/                   # Add OCR engine selection DTOs
│   └── common/
│       └── guards/
│           └── casl-ability.guard.ts  # Verify system.manage_all permission
└── tests/
    └── unit/
        └── modules/
            └── ai/                    # Add Typhoon model tests
frontend/
├── src/
│   ├── features/
│   │   ├── ai-admin/
│   │   │   └── components/
│   │   │       └── ModelManagement.tsx    # Add typhoon2.1-gemma3-12b option
│   │   └── ocr-sandbox/
│   │       └── components/
│   │           └── OcrEngineSelector.tsx  # Add Typhoon OCR option
│   └── lib/
│       └── i18n/
│           └── locales/
│               └── th.ts                  # Add Typhoon-related i18n keys
└── tests/
    └── e2e/
        └── ai-admin.spec.ts               # Add Typhoon model E2E tests
specs/
├── 06-Decision-Records/
│   ├── ADR-023-unified-ai-architecture.md
│   ├── ADR-023A-unified-ai-architecture.md
│   └── ADR-032-typhoon-ocr-integration.md  # New ADR for Typhoon integration
└── 04-Infrastructure-OPS/
    └── 04-00-docker-compose/
        └── Desk-5439/
            └── ocr-sidecar/
                └── app.py                  # Add Typhoon OCR Ollama integration
Structure Decision: Web application structure (backend + frontend + infrastructure). Backend uses NestJS modular structure with ai and ocr modules. Frontend uses Next.js feature-based structure. Infrastructure includes OCR sidecar on Admin Desktop.
Phase 0: Research - COMPLETE
Output: research.md
Decisions Made:
- Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop
- Add typhoon2.1-gemma3-12b Q3_K_M to AI Model Management
- Use Redis with 24-hour TTL for OCR result caching
- Implement VRAM monitoring via Ollama API and Redis state tracking
- Create ADR-032 for Typhoon OCR integration and update ADR-023/023A
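The Ollama-related decisions above can be sketched from the sidecar's point of view. The example builds a request body for Ollama's `/api/generate` endpoint and derives a VRAM usage ratio from the `size_vram` field that `/api/ps` reports for loaded models. The model tag, the host, and the helper names are assumptions; Ollama does not report total GPU memory through these endpoints, so the total is passed in as configuration here.

```python
import base64

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port; the actual Desk-5439 address is an assumption
TYPHOON_OCR_MODEL = "typhoon-ocr-3b"   # hypothetical tag; use the name shown by `ollama list`


def build_generate_payload(image_bytes: bytes, prompt: str) -> dict:
    """Body for POST {OLLAMA_URL}/api/generate with one page image attached."""
    return {
        "model": TYPHOON_OCR_MODEL,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one JSON response instead of a token stream
    }


def vram_usage_ratio(ps_response: dict, total_vram_bytes: int) -> float:
    """Fraction of GPU memory held by loaded models, from a GET /api/ps response."""
    used = sum(m.get("size_vram", 0) for m in ps_response.get("models", []))
    return used / total_vram_bytes


def within_vram_limit(ps_response: dict, total_vram_bytes: int, limit: float = 0.90) -> bool:
    """True while usage stays at or below the plan's 90% VRAM limit."""
    return vram_usage_ratio(ps_response, total_vram_bytes) <= limit
```

The ratio covers only memory held by models Ollama has loaded. Other processes on the same GPU are not visible through this API, so a stricter check would need a separate source such as the GPU driver's own tooling.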
Unknowns Resolved: All NEEDS CLARIFICATION markers resolved
Phase 1: Design & Contracts - COMPLETE
Outputs:
- data-model.md - Entity definitions, relationships, validation rules
- contracts/api-contracts.md - API endpoints, request/response schemas
- quickstart.md - Installation, usage, verification, troubleshooting
- Agent context updated with Typhoon-specific technologies
Constitution Check Re-evaluation: ✅ PASS - No violations introduced in design phase
Complexity Tracking
Fill ONLY if Constitution Check has violations that must be justified
| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |