// File: specs/200-fullstacks/236-unified-ocr-architecture/research.md
// Change Log:
// - 2026-06-13: Research decisions from ADR-036
# Research: Unified AI Model Architecture — Sandbox-Production Parity
## Overview
This document consolidates the technical decisions from ADR-036 for the Unified AI Model Architecture feature. All decisions are already ratified in ADR-036; this document is a quick reference for implementation.
## Decisions
### D1: Calibration on Existing Profile/Prompt Stores
**Decision**: Reuse the existing `ai_execution_profiles` table as the production parameter store and create a new `ai_sandbox_profiles` draft store. Do not create a new parameter store in `system_settings`.
**Rationale**:
- Existing `ai_execution_profiles` already has the right structure (profile_name, temperature, top_p, etc.)
- Adding a `canonical_model` column distinguishes `np-dms-ai` rows from `np-dms-ocr` rows
- Avoids schema bloat and migration complexity
- Leverages existing Redis cache in AiPolicyService
**Alternatives Considered**:
- Create new `ai_model_parameters` table → Rejected: unnecessary duplication
- Use `system_settings` JSON → Rejected: loses type safety and queryability
---
### D2: Dual-Model Parameter Management
**Decision**: Store OCR parameters in a dedicated `ocr-extract` row with `canonical_model='np-dms-ocr'`. Make `numCtx` and `maxTokens` nullable, because OCR does not use them.
**Rationale**:
- OCR has different parameter requirements than LLM (no context window, no max tokens)
- Single table with `canonical_model` column simplifies queries
- Nullable columns allow row-level variation without schema fragmentation
**Alternatives Considered**:
- Separate `ai_ocr_profiles` table → Rejected: adds join complexity
- JSON blob for model-specific params → Rejected: loses queryability
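
As a rough illustration of the nullable-column approach, the sketch below models one profile row in plain TypeScript. The interface name, the camelCase field names, and the numeric values are assumptions made for this example and are not taken from the actual entity; only `ocr-extract`, `np-dms-ocr`, `numCtx`, and `maxTokens` come from the decision above. The helper shows one way a caller might skip unset columns when building model options.

```typescript
// Hypothetical row shape; the real entity may use different names and columns.
type CanonicalModel = "np-dms-ai" | "np-dms-ocr";

interface ExecutionProfileRow {
  profileName: string;
  canonicalModel: CanonicalModel;
  temperature: number;
  topP: number;
  numCtx: number | null;    // null on OCR rows (not used by OCR)
  maxTokens: number | null; // null on OCR rows (not used by OCR)
}

// Placeholder values for illustration only.
const ocrExtract: ExecutionProfileRow = {
  profileName: "ocr-extract",
  canonicalModel: "np-dms-ocr",
  temperature: 0.1,
  topP: 0.9,
  numCtx: null,
  maxTokens: null,
};

// Build the options object for a model call, omitting columns that are null.
function toModelOptions(row: ExecutionProfileRow): Record<string, number> {
  const options: Record<string, number> = {
    temperature: row.temperature,
    topP: row.topP,
  };
  if (row.numCtx !== null) options.numCtx = row.numCtx;
  if (row.maxTokens !== null) options.maxTokens = row.maxTokens;
  return options;
}

const ocrOptions = toModelOptions(ocrExtract);
```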
---
### D3: Snapshot Semantics
**Decision**: Parameters are frozen at job dispatch time (snapshot), not lazy-loaded during processing. `keep_alive` is excluded from snapshot (lazy-loaded per ADR-033).
**Rationale**:
- Ensures job consistency regardless of subsequent parameter changes
- Allows safe parameter tuning without affecting running jobs
- `keep_alive` is a resource parameter, not a model parameter (ADR-033)
**Alternatives Considered**:
- Lazy-load parameters during processing → Rejected: race condition risk
- Include `keep_alive` in snapshot → Rejected: violates ADR-033 residency logic
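
The snapshot rule can be sketched as a copy-and-freeze step at dispatch time. This is a minimal sketch under assumptions: `LiveProfile`, `snapshotAtDispatch`, and the field set are hypothetical names, and the real dispatcher presumably reads the profile through AiPolicyService. The point it illustrates is that later edits to the live profile do not reach a job that was already dispatched, and that `keep_alive` is deliberately left out of the frozen copy.

```typescript
// Hypothetical shape of the live (mutable) profile.
interface LiveProfile {
  temperature: number;
  topP: number;
  keepAlive: string; // resource parameter, resolved lazily per ADR-033
}

// Snapshot type: model parameters only, without keepAlive.
type SnapshotParams = Readonly<{ temperature: number; topP: number }>;

// Copy the model parameters and freeze the copy so the job cannot observe later tuning.
function snapshotAtDispatch(profile: LiveProfile): SnapshotParams {
  return Object.freeze({
    temperature: profile.temperature,
    topP: profile.topP,
  });
}

const live: LiveProfile = { temperature: 0.2, topP: 0.9, keepAlive: "5m" };
const jobParams = snapshotAtDispatch(live);

// Tuning after dispatch changes the live profile but not the job's snapshot.
live.temperature = 0.7;
```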
---
### D4: Dual-Model Snapshot for OCR+LLM Jobs
**Decision**: Support `ocrSnapshotParams` (OCR) and `snapshotParams` (LLM) in `AiJobPayload` for jobs that use both models.
**Rationale**:
- Migration jobs use both OCR and LLM
- Each model needs its own parameter set
- Separation allows independent tuning
**Alternatives Considered**:
- Single snapshot with union of params → Rejected: unclear which params apply to which model
- Job-level model selection → Rejected: adds complexity to processor logic
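
One possible shape for the dual snapshot is shown below. Only `AiJobPayload`, `snapshotParams`, and `ocrSnapshotParams` come from the decision; the `jobId` field, the parameter field sets, the numeric values, and the two accessor functions are assumptions for illustration. Keeping the two snapshots under separate keys lets a processor ask for exactly the set it needs and fail loudly if that set is missing.

```typescript
// Field sets are illustrative assumptions.
interface LlmSnapshotParams {
  temperature: number;
  topP: number;
  numCtx: number;
  maxTokens: number;
}

interface OcrSnapshotParams {
  temperature: number;
  topP: number;
  repeatPenalty: number;
}

interface AiJobPayload {
  jobId: string;                         // hypothetical identifier
  snapshotParams?: LlmSnapshotParams;    // LLM (np-dms-ai)
  ocrSnapshotParams?: OcrSnapshotParams; // OCR (np-dms-ocr)
}

// Return the LLM snapshot or fail if the job was dispatched without one.
function requireLlmParams(payload: AiJobPayload): LlmSnapshotParams {
  if (!payload.snapshotParams) {
    throw new Error(`job ${payload.jobId}: missing LLM snapshot`);
  }
  return payload.snapshotParams;
}

// Return the OCR snapshot or fail if the job was dispatched without one.
function requireOcrParams(payload: AiJobPayload): OcrSnapshotParams {
  if (!payload.ocrSnapshotParams) {
    throw new Error(`job ${payload.jobId}: missing OCR snapshot`);
  }
  return payload.ocrSnapshotParams;
}

// A migration-style job that carries both snapshots (placeholder values).
const migrationJob: AiJobPayload = {
  jobId: "job-1",
  snapshotParams: { temperature: 0.2, topP: 0.9, numCtx: 8192, maxTokens: 1024 },
  ocrSnapshotParams: { temperature: 0.1, topP: 0.9, repeatPenalty: 1.1 },
};
```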
---
### D5: Master Data Context Parity in Sandbox
**Decision**: Require project selection in sandbox tests (no 'default' project). Use selected project/contract context for master data lookup.
**Rationale**:
- Eliminates parity gap where sandbox used 'default' while production used real project
- Ensures sandbox tests accurately reflect production behavior
- `{{master_data_context}}` in prompts will match production
**Alternatives Considered**:
- Keep 'default' project for sandbox → Rejected: inaccurate test results
- Auto-select first project → Rejected: hides context selection UI
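
A sandbox request guard along these lines would enforce the rule. The request shape, the function name, and the error message are hypothetical; the sketch only assumes that a missing project or the literal value `default` must be rejected before any master data lookup happens.

```typescript
// Hypothetical sandbox request shape.
interface SandboxTestRequest {
  projectId?: string;
  contractId?: string;
  prompt: string;
}

interface SandboxContext {
  projectId: string;
  contractId: string | null;
}

// Reject requests without an explicit project so sandbox lookups use real project context.
function resolveSandboxContext(req: SandboxTestRequest): SandboxContext {
  const projectId = req.projectId?.trim();
  if (!projectId || projectId === "default") {
    throw new Error("Project selection is required for sandbox tests");
  }
  return { projectId, contractId: req.contractId ?? null };
}

const sandboxContext = resolveSandboxContext({
  projectId: "project-a", // placeholder identifier
  prompt: "Extract fields",
});
```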
---
### D6: System Prompt Integration
**Decision**: System prompts are managed via ADR-029 (`ai_prompts` table) and are not duplicated in the parameter store. The parameter interface links to the Prompt Version UI.
**Rationale**:
- ADR-029 already has versioning, approval workflow, and audit trail
- Avoids duplication and maintenance burden
- Clear separation of concerns (prompts vs runtime parameters)
**Alternatives Considered**:
- Store system prompt in `ai_execution_profiles` → Rejected: duplicates ADR-029
- Inline system prompt in sandbox draft → Rejected: loses versioning
---
### D7: Model Name Alignment
**Decision**: Update model names from `typhoon2.5-np-dms`/`typhoon-np-dms-ocr` to `np-dms-ai`/`np-dms-ocr` across codebase.
**Rationale**:
- Canonical names are shorter and more semantic
- Aligns with ADR-034 decision
- Simplifies documentation and communication
**Alternatives Considered**:
- Keep typhoon names → Rejected: inconsistent with ADR-034
- Use generic names (main/ocr) → Rejected: loses semantic meaning
---
### D8: Security Guardrails
**Decision**: The apply endpoint requires Idempotency-Key validation, the CASL permission `system.manage_ai`, and parameter range validation (temperature and topP each within 0 to 1).
**Rationale**:
- Idempotency-Key prevents duplicate applies
- CASL enforces RBAC
- Range validation prevents invalid parameters
- Audit logging tracks all changes
**Alternatives Considered**:
- Skip Idempotency-Key → Rejected: risk of duplicate applies
- Use weaker permission → Rejected: security risk
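
The range check and the duplicate-apply check can be sketched in plain TypeScript as below. This is not the actual guard: the CASL permission check and audit logging are omitted, an in-memory `Map` stands in for whatever store holds seen keys, and all names (`ApplyRequest`, `IdempotencyWindow`, `handleApply`) are hypothetical. The 5-minute window follows the Security Considerations section. Validation runs before the key is registered so that an invalid request does not consume its key.

```typescript
// Hypothetical request shape for the apply endpoint.
interface ApplyRequest {
  idempotencyKey: string | undefined; // value of the Idempotency-Key header
  temperature: number;
  topP: number;
}

// Collect range violations; temperature and topP must each lie in [0, 1].
function rangeErrors(req: ApplyRequest): string[] {
  const errors: string[] = [];
  const inUnitRange = (v: number): boolean => Number.isFinite(v) && v >= 0 && v <= 1;
  if (!inUnitRange(req.temperature)) errors.push("temperature must be between 0 and 1");
  if (!inUnitRange(req.topP)) errors.push("topP must be between 0 and 1");
  return errors;
}

// In-memory stand-in for the idempotency store.
class IdempotencyWindow {
  private readonly seen = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = 5 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // Returns true when the key is new within the window, false for a replay.
  register(key: string, nowMs: number): boolean {
    const previous = this.seen.get(key);
    if (previous !== undefined && nowMs - previous < this.windowMs) return false;
    this.seen.set(key, nowMs);
    return true;
  }
}

type ApplyOutcome = "applied" | "duplicate" | "invalid";

function handleApply(req: ApplyRequest, window: IdempotencyWindow, nowMs: number): ApplyOutcome {
  if (!req.idempotencyKey) return "invalid";
  if (rangeErrors(req).length > 0) return "invalid";
  return window.register(req.idempotencyKey, nowMs) ? "applied" : "duplicate";
}
```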
---
### D9: Cache Invalidation
**Decision**: Invalidate Redis cache after applying parameters to production.
**Rationale**:
- Ensures new jobs use updated parameters
- Prevents stale cache issues
- Simple DEL operation on cache key
**Alternatives Considered**:
- Wait for cache TTL → Rejected: delayed effect
- No cache invalidation → Rejected: stale parameters
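
The effect of invalidating on apply can be illustrated with a small cache-aside sketch. Two in-memory maps stand in for the `ai_execution_profiles` table and for Redis, the calls are synchronous for brevity (a real Redis client would be asynchronous), and the class name, method names, and cache key format are assumptions. Only one parameter is stored to keep the example short.

```typescript
// In-memory stand-ins: `db` for the production profile table, `cache` for Redis.
class ProfileStore {
  private readonly db = new Map<string, number>();    // profile name -> temperature
  private readonly cache = new Map<string, string>(); // cache key -> serialized value

  // Hypothetical cache key format.
  private cacheKey(profileName: string): string {
    return `ai:profile:${profileName}`;
  }

  // Seed the stand-in database.
  seed(profileName: string, temperature: number): void {
    this.db.set(profileName, temperature);
  }

  // Cache-aside read: serve from cache when present, otherwise load and populate.
  getTemperature(profileName: string): number | undefined {
    const key = this.cacheKey(profileName);
    const cached = this.cache.get(key);
    if (cached !== undefined) return Number(cached);
    const stored = this.db.get(profileName);
    if (stored !== undefined) this.cache.set(key, String(stored));
    return stored;
  }

  // Apply to production, then delete the cache key so the next read reloads.
  applyTemperature(profileName: string, temperature: number): void {
    this.db.set(profileName, temperature);
    this.cache.delete(this.cacheKey(profileName)); // equivalent of DEL on the key
  }
}

const profileStore = new ProfileStore();
profileStore.seed("ocr-extract", 0.2);
const beforeApply = profileStore.getTemperature("ocr-extract"); // populates the cache
profileStore.applyTemperature("ocr-extract", 0.5);
const afterApply = profileStore.getTemperature("ocr-extract");  // reloads after invalidation
```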
---
### D10: OCR Parameter Wiring to Sidecar
**Decision**: Add `typhoonOptions` to `OcrDetectionInput` and append temperature/topP/repeatPenalty to form data sent to sidecar.
**Rationale**:
- Sidecar already accepts overrides via form data
- Allows OCR model tuning without sidecar changes
- Maintains existing contract
**Alternatives Considered**:
- Modify sidecar API → Rejected: unnecessary infrastructure change
- Hardcode params in sidecar → Rejected: loses tunability
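
The wiring might look roughly like the sketch below. `typhoonOptions` and `OcrDetectionInput` come from the decision; the `fileName` field, the `FormLike` interface, and especially the form field names (`temperature`, `top_p`, `repeat_penalty`) are assumptions, since the sidecar's actual field names are not stated here. `FormLike` is a narrow stand-in that exposes only the `append` method a standard `FormData` object provides, which keeps the example self-contained. Options that are not set are simply not appended, so the sidecar keeps its own defaults for them.

```typescript
interface TyphoonOptions {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

interface OcrDetectionInput {
  fileName: string; // hypothetical field
  typhoonOptions?: TyphoonOptions;
}

// Minimal structural stand-in for the form object sent to the sidecar.
interface FormLike {
  append(name: string, value: string): void;
}

// Append only the overrides that are set; field names are assumed, not confirmed.
function appendTyphoonOptions(form: FormLike, input: OcrDetectionInput): void {
  const options = input.typhoonOptions;
  if (!options) return;
  if (options.temperature !== undefined) form.append("temperature", String(options.temperature));
  if (options.topP !== undefined) form.append("top_p", String(options.topP));
  if (options.repeatPenalty !== undefined) form.append("repeat_penalty", String(options.repeatPenalty));
}

// Record appended fields so the result can be inspected without a real HTTP call.
const fields: Array<[string, string]> = [];
const recorder: FormLike = {
  append: (fieldName, value) => {
    fields.push([fieldName, value]);
  },
};

appendTyphoonOptions(recorder, {
  fileName: "scan.pdf",
  typhoonOptions: { temperature: 0.1, repeatPenalty: 1.2 },
});
```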
---
## Technology Stack
- **Backend**: NestJS 11, TypeORM, Redis, BullMQ
- **Frontend**: Next.js 16, TanStack Query, React Hook Form, Zod
- **Database**: MariaDB 11.8
- **Testing**: Jest (backend), Vitest + Playwright (frontend)
## Performance Targets
- Apply operation: <2s (including cache invalidation)
- Sandbox test cycle: <5s (test → apply → verify)
- Cache invalidation: <100ms
## Security Considerations
- CASL guard on apply endpoint
- Idempotency-Key validation (5-minute window)
- Parameter range validation (temperature/topP 0-1)
- Audit logging for all apply operations
- No direct DB/storage access from AI (ADR-023/023A)
## Dependencies
- ADR-029: Dynamic Prompt Management (system prompt integration)
- ADR-033: Adaptive OCR Residency (keep_alive lazy-loading)
- ADR-034: AI Model Change (canonical model names)
- Existing AiPolicyService with Redis cache
- Existing ai_audit_logs table