- Add ADR-036 unified OCR architecture (typhoon-ocr via Ollama)
- Extend AI execution profiles for OCR sandbox configuration
- Add comprehensive frontend test coverage (components, hooks, services)
- Add backend test coverage for document-numbering services
- Update OCR sidecar with typhoon-ocr integration
- Add AI policy service and execution profile management
- Update AGENTS.md and architecture documentation
// File: specs/200-fullstacks/236-unified-ocr-architecture/research.md
// Change Log:
// - 2026-06-13: Research decisions from ADR-036
Research: Unified AI Model Architecture — Sandbox-Production Parity
Overview
This document consolidates technical decisions from ADR-036 for the Unified AI Model Architecture feature. All decisions are already ratified in ADR-036; this document serves as a quick reference for implementation.
Decisions
D1: Calibration on Existing Profile/Prompt Stores
Decision: Reuse the existing ai_execution_profiles table as the production parameter store and create a new ai_sandbox_profiles table as the draft store. Do not create a new parameter store in system_settings.
Rationale:
- Existing ai_execution_profiles already has the right structure (profile_name, temperature, top_p, etc.)
- Adding a canonical_model column distinguishes np-dms-ai vs np-dms-ocr
- Avoids schema bloat and migration complexity
- Leverages existing Redis cache in AiPolicyService
Alternatives Considered:
- Create new ai_model_parameters table → Rejected: unnecessary duplication
- Use system_settings JSON → Rejected: loses type safety and queryability
D2: Dual-Model Parameter Management
Decision: Store OCR parameters in a dedicated row, ocr-extract, with canonical_model='np-dms-ocr'. Make numCtx and maxTokens nullable, because OCR does not use them.
Rationale:
- OCR has different parameter requirements than LLM (no context window, no max tokens)
- Single table with canonical_model column simplifies queries
- Nullable columns allow row-level variation without schema fragmentation
Alternatives Considered:
- Separate ai_ocr_profiles table → Rejected: adds join complexity
- JSON blob for model-specific params → Rejected: loses queryability
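The single-table layout from D1 and D2 can be sketched as a TypeScript type. Only canonical_model, the two model names, the ocr-extract row, and the nullable numCtx/maxTokens come from the decisions above; the ExecutionProfile name, the camelCase field spellings, the llm-example row, all sample values, and the profilesFor helper are assumptions made for illustration.

```typescript
// Hypothetical row shape for ai_execution_profiles; field spellings are assumed.
type CanonicalModel = 'np-dms-ai' | 'np-dms-ocr';

interface ExecutionProfile {
  profileName: string;
  canonicalModel: CanonicalModel;
  temperature: number;
  topP: number;
  numCtx: number | null;    // nullable: not used by OCR rows
  maxTokens: number | null; // nullable: not used by OCR rows
}

// Sample rows with made-up values; 'llm-example' is a placeholder name.
const sampleProfiles: ExecutionProfile[] = [
  {
    profileName: 'llm-example',
    canonicalModel: 'np-dms-ai',
    temperature: 0.2,
    topP: 0.9,
    numCtx: 8192,
    maxTokens: 1024,
  },
  {
    profileName: 'ocr-extract',
    canonicalModel: 'np-dms-ocr',
    temperature: 0.1,
    topP: 0.6,
    numCtx: null,
    maxTokens: null,
  },
];

// One table, filtered by canonical_model; no join against a second table.
function profilesFor(model: CanonicalModel): ExecutionProfile[] {
  return sampleProfiles.filter((p) => p.canonicalModel === model);
}
```

Keeping both models in one shape means a consumer only has to handle null for the two LLM-only fields.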
D3: Snapshot Semantics
Decision: Parameters are frozen at job dispatch time (snapshot), not lazy-loaded during processing. keep_alive is excluded from snapshot (lazy-loaded per ADR-033).
Rationale:
- Ensures job consistency regardless of subsequent parameter changes
- Allows safe parameter tuning without affecting running jobs
- keep_alive is a resource parameter, not a model parameter (ADR-033)
Alternatives Considered:
- Lazy-load parameters during processing → Rejected: race condition risk
- Include keep_alive in snapshot → Rejected: violates ADR-033 residency logic
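A minimal sketch of the snapshot semantics in D3, assuming a simplified parameter shape. The ProfileParams and SnapshotParams types, the snapshotAtDispatch function, and the sample values are hypothetical; only the dispatch-time freeze and the exclusion of keep_alive come from the decision.

```typescript
// Hypothetical live profile shape; keepAlive stands for the keep_alive setting.
interface ProfileParams {
  temperature: number;
  topP: number;
  keepAlive: string; // resource parameter, resolved lazily per ADR-033
}

// The snapshot carries model parameters only.
type SnapshotParams = Readonly<Omit<ProfileParams, 'keepAlive'>>;

// Copy the model parameters at dispatch time and freeze the copy so that
// later edits to the live profile cannot reach a job that is already queued.
function snapshotAtDispatch(profile: ProfileParams): SnapshotParams {
  return Object.freeze({
    temperature: profile.temperature,
    topP: profile.topP,
  });
}

const liveProfile: ProfileParams = { temperature: 0.2, topP: 0.9, keepAlive: '5m' };
const jobSnapshot = snapshotAtDispatch(liveProfile);

// Tuning the live profile after dispatch leaves the snapshot untouched.
liveProfile.temperature = 0.8;
```

Because the snapshot is a separate frozen object, the running job keeps the parameters it was dispatched with, while keep_alive is still read at processing time.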
D4: Dual-Model Snapshot for OCR+LLM Jobs
Decision: Support ocrSnapshotParams (OCR) and snapshotParams (LLM) in AiJobPayload for jobs that use both models.
Rationale:
- Migration jobs use both OCR and LLM
- Each model needs its own parameter set
- Separation allows independent tuning
Alternatives Considered:
- Single snapshot with union of params → Rejected: unclear which params apply to which model
- Job-level model selection → Rejected: adds complexity to processor logic
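The dual-snapshot payload from D4 might look like the following. The field names ocrSnapshotParams and snapshotParams come from the decision; the jobType field, the two parameter interfaces, the optionality of each snapshot, and the modelsUsed helper are assumptions for illustration.

```typescript
// Hypothetical parameter sets; the real field lists may differ.
interface LlmSnapshotParams {
  temperature: number;
  topP: number;
  numCtx: number;
  maxTokens: number;
}

interface OcrSnapshotParams {
  temperature: number;
  topP: number;
  repeatPenalty: number;
}

interface AiJobPayload {
  jobType: string;                       // hypothetical discriminator
  snapshotParams?: LlmSnapshotParams;    // LLM (np-dms-ai)
  ocrSnapshotParams?: OcrSnapshotParams; // OCR (np-dms-ocr)
}

// Report which models a job will call, based on the snapshots it carries.
function modelsUsed(payload: AiJobPayload): string[] {
  const models: string[] = [];
  if (payload.ocrSnapshotParams) models.push('np-dms-ocr');
  if (payload.snapshotParams) models.push('np-dms-ai');
  return models;
}

// A migration job uses both models, so it carries both snapshots.
const migrationJob: AiJobPayload = {
  jobType: 'migration',
  snapshotParams: { temperature: 0.2, topP: 0.9, numCtx: 8192, maxTokens: 1024 },
  ocrSnapshotParams: { temperature: 0.1, topP: 0.6, repeatPenalty: 1.1 },
};
```

Keeping the two snapshots in separate fields makes it unambiguous which parameters apply to which model.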
D5: Master Data Context Parity in Sandbox
Decision: Require project selection in sandbox tests (no 'default' project). Use selected project/contract context for master data lookup.
Rationale:
- Eliminates parity gap where sandbox used 'default' while production used real project
- Ensures sandbox tests accurately reflect production behavior
- {{master_data_context}} in prompts will match production
Alternatives Considered:
- Keep 'default' project for sandbox → Rejected: inaccurate test results
- Auto-select first project → Rejected: hides context selection UI
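One way to enforce the project requirement from D5 is a guard that runs before the master data lookup. This is a sketch only: the SandboxTestRequest shape, the resolveSandboxContext function, and the error message are hypothetical, and the real validation may live in a DTO or form schema instead.

```typescript
// Hypothetical sandbox request; only the project/contract context is modelled.
interface SandboxTestRequest {
  projectId?: string;
  contractId?: string;
}

interface SandboxContext {
  projectId: string;
  contractId: string | null;
}

// Reject a missing or 'default' project so the sandbox always resolves
// master data from a real, explicitly selected project.
function resolveSandboxContext(req: SandboxTestRequest): SandboxContext {
  const projectId = req.projectId?.trim();
  if (!projectId || projectId === 'default') {
    throw new Error('Sandbox test requires an explicit project selection');
  }
  return { projectId, contractId: req.contractId ?? null };
}
```

Failing fast here keeps the 'default' fallback from silently reappearing in sandbox runs.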
D6: System Prompt Integration
Decision: System prompts are managed via ADR-029 (ai_prompts table), not duplicated in the parameter store. The parameter interface links to the Prompt Version UI.
Rationale:
- ADR-029 already has versioning, approval workflow, and audit trail
- Avoids duplication and maintenance burden
- Clear separation of concerns (prompts vs runtime parameters)
Alternatives Considered:
- Store system prompt in ai_execution_profiles → Rejected: duplicates ADR-029
- Inline system prompt in sandbox draft → Rejected: loses versioning
D7: Model Name Alignment
Decision: Update model names from typhoon2.5-np-dms/typhoon-np-dms-ocr to np-dms-ai/np-dms-ocr across codebase.
Rationale:
- Canonical names are shorter and more semantic
- Aligns with ADR-034 decision
- Simplifies documentation and communication
Alternatives Considered:
- Keep typhoon names → Rejected: inconsistent with ADR-034
- Use generic names (main/ocr) → Rejected: loses semantic meaning
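During the rename in D7, a small lookup could normalize any remaining legacy names. The pairing below follows the order given in the decision (typhoon2.5-np-dms to np-dms-ai, typhoon-np-dms-ocr to np-dms-ocr); the constant and function names are hypothetical.

```typescript
// Legacy name -> canonical name, paired in the order listed in D7.
const CANONICAL_MODEL_NAMES: Record<string, string> = {
  'typhoon2.5-np-dms': 'np-dms-ai',
  'typhoon-np-dms-ocr': 'np-dms-ocr',
};

// Map a legacy name to its canonical form; names that are already
// canonical (or unknown) pass through unchanged.
function toCanonicalModelName(name: string): string {
  return CANONICAL_MODEL_NAMES[name] ?? name;
}
```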
D8: Security Guardrails
Decision: The apply endpoint requires Idempotency-Key validation, a CASL permission check (system.manage_ai), and parameter range validation (temperature and topP between 0 and 1).
Rationale:
- Idempotency-Key prevents duplicate applies
- CASL enforces RBAC
- Range validation prevents invalid parameters
- Audit logging tracks all changes
Alternatives Considered:
- Skip Idempotency-Key → Rejected: risk of duplicate applies
- Use weaker permission → Rejected: security risk
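Two of the guardrails in D8 can be sketched without framework code: the 0-1 range check and a duplicate check on the Idempotency-Key. The ApplyRequest shape, rangeErrors, and IdempotencyGuard are hypothetical names. The in-memory Map is a stand-in for whatever store the real endpoint uses, the 5-minute window is taken from the Security Considerations section, and the CASL permission check is omitted here.

```typescript
const IDEMPOTENCY_WINDOW_MS = 5 * 60 * 1000; // 5-minute window

// Hypothetical apply request; only the validated fields are modelled.
interface ApplyRequest {
  idempotencyKey: string;
  temperature: number;
  topP: number;
}

function inUnitRange(value: number): boolean {
  return Number.isFinite(value) && value >= 0 && value <= 1;
}

// Collect one message per out-of-range parameter.
function rangeErrors(req: ApplyRequest): string[] {
  const errors: string[] = [];
  if (!inUnitRange(req.temperature)) errors.push('temperature must be between 0 and 1');
  if (!inUnitRange(req.topP)) errors.push('topP must be between 0 and 1');
  return errors;
}

// In-memory stand-in for the idempotency store.
class IdempotencyGuard {
  private readonly seenAt = new Map<string, number>();

  // Returns true the first time a key is seen inside the window,
  // false for a repeat that arrives before the window has elapsed.
  accept(key: string, nowMs: number): boolean {
    const last = this.seenAt.get(key);
    if (last !== undefined && nowMs - last < IDEMPOTENCY_WINDOW_MS) {
      return false;
    }
    this.seenAt.set(key, nowMs);
    return true;
  }
}
```

Passing the clock in as nowMs keeps the guard deterministic in tests.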
D9: Cache Invalidation
Decision: Invalidate Redis cache after applying parameters to production.
Rationale:
- Ensures new jobs use updated parameters
- Prevents stale cache issues
- Simple DEL operation on cache key
Alternatives Considered:
- Wait for cache TTL → Rejected: delayed effect
- No cache invalidation → Rejected: stale parameters
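The apply-then-invalidate flow in D9 can be illustrated with in-memory stand-ins. In this sketch one Map plays the role of ai_execution_profiles and another plays the role of the Redis cache; the PolicyStoreSketch class, its methods, and the cache key format are hypothetical and are not the AiPolicyService API.

```typescript
class PolicyStoreSketch {
  private readonly db = new Map<string, number>();    // stand-in for ai_execution_profiles
  private readonly cache = new Map<string, number>(); // stand-in for the Redis cache

  constructor(profileName: string, temperature: number) {
    this.db.set(profileName, temperature);
  }

  // Hypothetical cache key format.
  private key(profileName: string): string {
    return `ai:profile:${profileName}`;
  }

  // Read-through: serve from cache, otherwise load from the store and cache it.
  getTemperature(profileName: string): number | undefined {
    const cached = this.cache.get(this.key(profileName));
    if (cached !== undefined) return cached;
    const stored = this.db.get(profileName);
    if (stored !== undefined) this.cache.set(this.key(profileName), stored);
    return stored;
  }

  // Write to production, then drop the cache entry (the DEL step in D9)
  // so the next read reloads the new value.
  apply(profileName: string, temperature: number): void {
    this.db.set(profileName, temperature);
    this.cache.delete(this.key(profileName));
  }
}
```

Without the delete in apply, the first read after an update would keep returning the cached value until its TTL expired, which is the delayed effect the decision rejects.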
D10: OCR Parameter Wiring to Sidecar
Decision: Add typhoonOptions to OcrDetectionInput and append temperature/topP/repeatPenalty to form data sent to sidecar.
Rationale:
- Sidecar already accepts overrides via form data
- Allows OCR model tuning without sidecar changes
- Maintains existing contract
Alternatives Considered:
- Modify sidecar API → Rejected: unnecessary infrastructure change
- Hardcode params in sidecar → Rejected: loses tunability
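A sketch of the wiring in D10, assuming the overrides travel as string form fields. typhoonOptions, OcrDetectionInput, and the three parameter names come from the decision; the filePath field, the typhoonFormFields helper, and the exact form field names expected by the sidecar are assumptions.

```typescript
interface TyphoonOptions {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

// Hypothetical input shape; only typhoonOptions is taken from D10.
interface OcrDetectionInput {
  filePath: string;
  typhoonOptions?: TyphoonOptions;
}

// Build the extra form fields for the sidecar request. Options that are
// not set are skipped, so the sidecar falls back to its own defaults.
function typhoonFormFields(input: OcrDetectionInput): Array<[string, string]> {
  const fields: Array<[string, string]> = [];
  const opts = input.typhoonOptions;
  if (!opts) return fields;
  if (opts.temperature !== undefined) fields.push(['temperature', String(opts.temperature)]);
  if (opts.topP !== undefined) fields.push(['topP', String(opts.topP)]);
  if (opts.repeatPenalty !== undefined) fields.push(['repeatPenalty', String(opts.repeatPenalty)]);
  return fields;
}
```

Each returned pair would then be appended to the existing form data next to the file, which leaves the sidecar contract unchanged.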
Technology Stack
- Backend: NestJS 11, TypeORM, Redis, BullMQ
- Frontend: Next.js 16, TanStack Query, React Hook Form, Zod
- Database: MariaDB 11.8
- Testing: Jest (backend), Vitest + Playwright (frontend)
Performance Targets
- Apply operation: <2s (including cache invalidation)
- Sandbox test cycle: <5s (test → apply → verify)
- Cache invalidation: <100ms
Security Considerations
- CASL guard on apply endpoint
- Idempotency-Key validation (5-minute window)
- Parameter range validation (temperature/topP 0-1)
- Audit logging for all apply operations
- No direct DB/storage access from AI (ADR-023/023A)
Dependencies
- ADR-029: Dynamic Prompt Management (system prompt integration)
- ADR-033: Adaptive OCR Residency (keep_alive lazy-loading)
- ADR-034: AI Model Change (canonical model names)
- Existing AiPolicyService with Redis cache
- Existing ai_audit_logs table