admin ae1b1f35e1
feat(ai): ADR-032 Typhoon OCR integration - models, processors, cache, VRAM monitor, sandbox UI
2026-05-30 22:18:51 +07:00


// File: specs/200-fullstacks/232-typhoon-ocr-integration/plan.md
// Change Log:
// - 2026-05-30: Initial implementation plan for Typhoon OCR integration

Implementation Plan: Typhoon OCR Integration

Branch: 232-typhoon-ocr-integration | Date: 2026-05-30 | Spec: spec.md
Input: Feature specification from /specs/200-fullstacks/232-typhoon-ocr-integration/spec.md

Note: This template is filled in by the /speckit.plan command. See .agents/skills/plan.md for the execution workflow.

Summary

Integrate Typhoon OCR-3B as an alternative OCR engine in the OCR Sandbox Runner, add typhoon2.1-gemma3-12b to AI Model Management, and update ADR-023/023A to document Typhoon models as supported on-premises AI options. The implementation runs the models through Ollama on the Admin Desktop (Desk-5439) with sequential processing (1 concurrent request), 24-hour result caching, and fallback to Tesseract OCR when Typhoon is unavailable. All changes require the system.manage_all permission and must comply with the ADR-023/023A AI boundary policies.
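The fallback behaviour described in the summary can be sketched as follows. This is a minimal illustration, not the project's implementation: the `OcrEngine` type, `withTimeout`, and `runOcrWithFallback` are hypothetical names, and the timeout is left as a parameter because the plan does not state how Typhoon unavailability is detected.

```typescript
// Hypothetical engine signature: takes an image reference, resolves to extracted text.
type OcrEngine = (imageRef: string) => Promise<string>;

interface OcrResult {
  text: string;
  engine: "typhoon" | "tesseract";
}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Tries the Typhoon engine first; any error or timeout switches to Tesseract.
async function runOcrWithFallback(
  imageRef: string,
  typhoon: OcrEngine,
  tesseract: OcrEngine,
  timeoutMs: number, // assumption: chosen by the caller, not specified in the plan
): Promise<OcrResult> {
  try {
    const text = await withTimeout(typhoon(imageRef), timeoutMs);
    return { text, engine: "typhoon" };
  } catch {
    const text = await tesseract(imageRef);
    return { text, engine: "tesseract" };
  }
}
```

Returning the engine name alongside the text lets the caller record which engine actually produced the result, for example in an audit log entry.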

Technical Context

Language/Version: TypeScript 5.x (NestJS 11 backend, Next.js 16 frontend), Python 3.11 (OCR sidecar)
Primary Dependencies: Ollama (AI runtime), BullMQ (job queues), TypeORM (ORM), Redis (caching/locks), MariaDB 11.8 (database)
Storage: MariaDB (ai_prompts, ai_audit_logs), Redis (24-hour OCR result cache, VRAM monitoring)
Testing: Jest (backend unit tests), Playwright (E2E tests)
Target Platform: Linux server (Admin Desktop Desk-5439 for AI processing)
Project Type: web (backend + frontend + infrastructure)
Performance Goals: 60 seconds/page OCR processing, 5-second fallback to Tesseract, 90% VRAM usage limit
Constraints: On-premises AI only (ADR-023/023A), system.manage_all permission required, sequential OCR processing (1 concurrent request)
Scale/Scope: Single Admin Desktop GPU, 24-hour cache TTL, ai_audit_logs for all AI interactions
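To make the 24-hour OCR result cache concrete, here is a small sketch of the expiry semantics. The key layout (`ocr:result:<engine>:<hash>`) and the `TtlCache` class are assumptions for illustration only. In the real system Redis would hold the entries (for example via `SET key value EX 86400`); the in-memory map below only stands in for it so the behaviour can be exercised without a server.

```typescript
const OCR_CACHE_TTL_SECONDS = 24 * 60 * 60; // 24-hour TTL from the plan

// Hypothetical key layout: engine name plus a content hash of the page.
function buildOcrCacheKey(engine: string, contentSha256: string): string {
  return `ocr:result:${engine}:${contentSha256.toLowerCase()}`;
}

// In-memory stand-in for Redis SET ... EX / GET, with an injectable clock for tests.
class TtlCache {
  private readonly entries = new Map<string, { value: string; expiresAtMs: number }>();
  private readonly nowMs: () => number;

  constructor(nowMs: () => number = () => Date.now()) {
    this.nowMs = nowMs;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAtMs: this.nowMs() + ttlSeconds * 1000 });
  }

  // Returns the cached value, or undefined when the key is missing or expired.
  get(key: string): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.nowMs() >= entry.expiresAtMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}
```

Including the engine name in the key keeps a Tesseract fallback result from being served later as if it were a Typhoon result for the same page.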

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Based on AGENTS.md Tier 1 non-negotiables:

  • ADR-019 UUID: PASS - Using publicId for all API responses, no parseInt on UUID
  • ADR-009 Schema: PASS - No TypeORM migrations; SQL will be edited directly if schema changes are needed
  • ADR-016 Security: PASS - CASL Guard with system.manage_all permission for all AI-related mutations
  • ADR-002 Numbering: N/A - No document numbering in this feature
  • ADR-008 BullMQ: PASS - AI interactions via BullMQ queues (ai-realtime/ai-batch)
  • ADR-023/023A AI Boundary: PASS - Typhoon models run on Admin Desktop Ollama only, no direct DB/storage access
  • ADR-007 Errors: PASS - Will use layered error classification with user-friendly messages
  • TypeScript Strict: PASS - No any types, no console.log, explicit typing
  • i18n: PASS - No hardcoded Thai/English strings, use i18n keys
  • File Upload: N/A - No file upload changes in this feature

Gate Status: PASS - No violations
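As an illustration of the ADR-019 item above, the sketch below shows one reason a "no parseInt on UUID" rule is useful: `parseInt` stops at the first non-digit character, so a UUID silently collapses to its leading digits. The `parsePublicId` helper and the sample value are hypothetical and not taken from the codebase.

```typescript
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: accept a publicId only when it is a well-formed UUID string.
function parsePublicId(raw: string): string {
  if (!UUID_PATTERN.test(raw)) {
    throw new Error("publicId must be a UUID string");
  }
  return raw.toLowerCase();
}

// parseInt reads "550" and stops at "e", so the identifier is truncated without any error.
const sample = "550e8400-e29b-41d4-a716-446655440000";
const truncated = parseInt(sample, 10); // 550
```

Treating the identifier as an opaque, validated string avoids this class of silent truncation.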

Project Structure

Documentation (this feature)

specs/200-fullstacks/232-typhoon-ocr-integration/
├── spec.md              # Feature specification
├── plan.md              # This file (/speckit.plan command output)
├── research.md          # Phase 0 output (/speckit.plan command)
├── data-model.md        # Phase 1 output (/speckit.plan command)
├── quickstart.md        # Phase 1 output (/speckit.plan command)
├── contracts/           # Phase 1 output (/speckit.plan command)
└── tasks.md             # Phase 2 output (/speckit.tasks command)

Source Code (repository root)

backend/
├── src/
│   ├── modules/
│   │   ├── ai/
│   │   │   ├── ai.service.ts              # Add Typhoon model support
│   │   │   ├── ai.controller.ts           # Add Typhoon OCR endpoint
│   │   │   └── dto/                       # Add Typhoon-specific DTOs
│   │   └── ocr/
│   │       ├── ocr.service.ts             # Add Typhoon OCR integration
│   │       └── dto/                       # Add OCR engine selection DTOs
│   └── common/
│       └── guards/
│           └── casl-ability.guard.ts      # Verify system.manage_all permission
└── tests/
    └── unit/
        └── modules/
            └── ai/                        # Add Typhoon model tests

frontend/
├── src/
│   ├── features/
│   │   ├── ai-admin/
│   │   │   └── components/
│   │   │       └── ModelManagement.tsx    # Add typhoon2.1-gemma3-12b option
│   │   └── ocr-sandbox/
│   │       └── components/
│   │           └── OcrEngineSelector.tsx # Add Typhoon OCR option
│   └── lib/
│       └── i18n/
│           └── locales/
│               └── th.ts                 # Add Typhoon-related i18n keys
└── tests/
    └── e2e/
        └── ai-admin.spec.ts              # Add Typhoon model E2E tests

specs/
├── 06-Decision-Records/
│   ├── ADR-023-unified-ai-architecture.md
│   ├── ADR-023A-unified-ai-architecture.md
│   └── ADR-032-typhoon-ocr-integration.md  # New ADR for Typhoon integration
└── 04-Infrastructure-OPS/
    └── 04-00-docker-compose/
        └── Desk-5439/
            └── ocr-sidecar/
                └── app.py                 # Add Typhoon OCR Ollama integration

Structure Decision: Web application structure (backend + frontend + infrastructure). Backend uses NestJS modular structure with ai and ocr modules. Frontend uses Next.js feature-based structure. Infrastructure includes OCR sidecar on Admin Desktop.

Phase 0: Research - COMPLETE

Output: research.md

Decisions Made:

  • Use Ollama HTTP API for Typhoon OCR integration via Admin Desktop
  • Add typhoon2.1-gemma3-12b Q3_K_M to AI Model Management
  • Use Redis with 24-hour TTL for OCR result caching
  • Implement VRAM monitoring via Ollama API and Redis state tracking
  • Create ADR-032 for Typhoon OCR integration and update ADR-023/023A
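The VRAM monitoring decision above could look roughly like the following. It is a sketch under stated assumptions: the interfaces model only a subset of the fields returned by Ollama's `GET /api/ps` endpoint (running models with their `size_vram` in bytes), the total VRAM figure is assumed to come from configuration because it is not part of that response, and `evaluateVram` is a hypothetical helper name. The result covers only models loaded by Ollama, not other GPU consumers.

```typescript
// Subset of the fields returned by Ollama's GET /api/ps (list running models).
interface OllamaRunningModel {
  name: string;
  size_vram: number; // bytes of the model resident in GPU memory
}

interface OllamaPsResponse {
  models: OllamaRunningModel[];
}

interface VramStatus {
  usedBytes: number;
  usageRatio: number;
  overLimit: boolean;
}

const VRAM_USAGE_LIMIT = 0.9; // 90% VRAM usage limit from the plan

// Sums VRAM used by loaded models and compares it with the configured limit.
function evaluateVram(
  ps: OllamaPsResponse,
  totalVramBytes: number, // assumption: supplied by configuration for the Admin Desktop GPU
  limit: number = VRAM_USAGE_LIMIT,
): VramStatus {
  if (totalVramBytes <= 0) {
    throw new Error("totalVramBytes must be positive");
  }
  const usedBytes = ps.models.reduce((sum, model) => sum + model.size_vram, 0);
  const usageRatio = usedBytes / totalVramBytes;
  return { usedBytes, usageRatio, overLimit: usageRatio > limit };
}
```

A caller would fetch the `/api/ps` response from the Ollama host (port 11434 by default), pass the parsed JSON to `evaluateVram`, and could then store the resulting status in Redis for the state tracking the plan mentions.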

Unknowns Resolved: All NEEDS CLARIFICATION markers resolved

Phase 1: Design & Contracts - COMPLETE

Outputs:

  • data-model.md - Entity definitions, relationships, validation rules
  • contracts/api-contracts.md - API endpoints, request/response schemas
  • quickstart.md - Installation, usage, verification, troubleshooting
  • Agent context updated with Typhoon-specific technologies

Constitution Check Re-evaluation: PASS - No violations introduced in design phase

Complexity Tracking

Fill ONLY if Constitution Check has violations that must be justified

Violation | Why Needed | Simpler Alternative Rejected Because
[e.g., 4th project] | [current need] | [why 3 projects insufficient]
[e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient]