// File: specs/200-fullstacks/236-unified-ocr-architecture/research.md
// Change Log:
// - 2026-06-13: Research decisions from ADR-036

# Research: Unified AI Model Architecture — Sandbox-Production Parity

## Overview

This document consolidates the technical decisions ratified in ADR-036 for the Unified AI Model Architecture feature. It introduces no new decisions and serves as a quick reference during implementation.

## Decisions

### D1: Calibration on Existing Profile/Prompt Stores

**Decision**: Reuse the existing `ai_execution_profiles` table as the production parameter store and create a new `ai_sandbox_profiles` table as the draft store. Do not create a new parameter store in `system_settings`.

**Rationale**:
- The existing `ai_execution_profiles` table already has the right structure (profile_name, temperature, top_p, etc.)
- Adding a `canonical_model` column distinguishes `np-dms-ai` rows from `np-dms-ocr` rows
- Avoids schema bloat and migration complexity
- Leverages the existing Redis cache in `AiPolicyService`

**Alternatives Considered**:
- Create new `ai_model_parameters` table → Rejected: unnecessary duplication
- Use `system_settings` JSON → Rejected: loses type safety and queryability

---

### D2: Dual-Model Parameter Management

**Decision**: Store OCR parameters in a dedicated `ocr-extract` row with `canonical_model='np-dms-ocr'`. Make `numCtx` and `maxTokens` nullable, since OCR does not use them.

**Rationale**:
- OCR has different parameter requirements than LLM (no context window, no max tokens)
- Single table with `canonical_model` column simplifies queries
- Nullable columns allow row-level variation without schema fragmentation

**Alternatives Considered**:
- Separate `ai_ocr_profiles` table → Rejected: adds join complexity
- JSON blob for model-specific params → Rejected: loses queryability
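
The single-table layout can be pictured as one row type with nullable LLM-only fields. The sketch below is an assumption about how such a row might look in application code: the interface name, the camelCase field names, the helper `toModelOptions`, and the numeric values are illustrative and not taken from ADR-036.

```typescript
// Hypothetical in-memory shape of one ai_execution_profiles row.
type CanonicalModel = 'np-dms-ai' | 'np-dms-ocr';

interface ExecutionProfile {
  profileName: string;
  canonicalModel: CanonicalModel;
  temperature: number;
  topP: number;
  numCtx: number | null; // null on OCR rows (not used)
  maxTokens: number | null; // null on OCR rows (not used)
}

// Illustrative values only; real defaults are not specified in this document.
const ocrExtract: ExecutionProfile = {
  profileName: 'ocr-extract',
  canonicalModel: 'np-dms-ocr',
  temperature: 0.1,
  topP: 0.9,
  numCtx: null,
  maxTokens: null,
};

// Build a model options object, leaving out nullable fields that are unset.
function toModelOptions(p: ExecutionProfile): Record<string, number> {
  const opts: Record<string, number> = {
    temperature: p.temperature,
    top_p: p.topP,
  };
  if (p.numCtx !== null) opts['num_ctx'] = p.numCtx;
  if (p.maxTokens !== null) opts['max_tokens'] = p.maxTokens;
  return opts;
}
```

Keeping the nullable fields on the shared type means a query filtered on `canonical_model` returns rows of one shape, and the caller decides per row which options to forward.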

---

### D3: Snapshot Semantics

**Decision**: Parameters are frozen into a snapshot at job dispatch time, not lazy-loaded during processing. `keep_alive` is excluded from the snapshot and is lazy-loaded per ADR-033.

**Rationale**:
- Ensures job consistency regardless of subsequent parameter changes
- Allows safe parameter tuning without affecting running jobs
- `keep_alive` is a resource parameter, not a model parameter (ADR-033)

**Alternatives Considered**:
- Lazy-load parameters during processing → Rejected: race condition risk
- Include `keep_alive` in snapshot → Rejected: violates ADR-033 residency logic
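
A minimal sketch of the snapshot step, assuming a profile object with a `keepAlive` field alongside the model parameters. The type names and the function `snapshotAtDispatch` are hypothetical; only the behavior (copy and freeze at dispatch, leave `keep_alive` out) comes from the decision above.

```typescript
// Hypothetical live profile as read from the parameter store.
interface ProfileParams {
  temperature: number;
  topP: number;
  keepAlive?: string; // resource parameter, resolved at processing time instead
}

// The snapshot carries model parameters only.
type SnapshotParams = Readonly<{ temperature: number; topP: number }>;

// Copy the model parameters into a frozen object at dispatch time.
// keepAlive is deliberately not copied.
function snapshotAtDispatch(profile: ProfileParams): SnapshotParams {
  return Object.freeze({
    temperature: profile.temperature,
    topP: profile.topP,
  });
}
```

Because the snapshot is a copy, later edits to the live profile do not reach a job that has already been dispatched.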

---

### D4: Dual-Model Snapshot for OCR+LLM Jobs

**Decision**: Support `ocrSnapshotParams` (OCR) and `snapshotParams` (LLM) in `AiJobPayload` for jobs that use both models.

**Rationale**:
- Migration jobs use both OCR and LLM
- Each model needs its own parameter set
- Separation allows independent tuning

**Alternatives Considered**:
- Single snapshot with union of params → Rejected: unclear which params apply to which model
- Job-level model selection → Rejected: adds complexity to processor logic
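
The payload shape might look like the sketch below. Only `AiJobPayload`, `snapshotParams`, and `ocrSnapshotParams` are named in ADR-036; the parameter interfaces, the `jobType` field, the optionality of `ocrSnapshotParams`, and the helper `paramsFor` are assumptions for illustration.

```typescript
// Hypothetical parameter sets for each model.
interface LlmParams {
  temperature: number;
  topP: number;
  numCtx: number;
  maxTokens: number;
}

interface OcrParams {
  temperature: number;
  topP: number;
  repeatPenalty: number;
}

interface AiJobPayload {
  jobType: string; // hypothetical field
  snapshotParams: LlmParams; // LLM (np-dms-ai)
  ocrSnapshotParams?: OcrParams; // OCR (np-dms-ocr); assumed absent on LLM-only jobs
}

// Pick the snapshot that belongs to the model about to be called.
function paramsFor(
  payload: AiJobPayload,
  model: 'np-dms-ai' | 'np-dms-ocr',
): LlmParams | OcrParams {
  if (model === 'np-dms-ocr') {
    if (!payload.ocrSnapshotParams) {
      throw new Error('Job payload has no OCR snapshot');
    }
    return payload.ocrSnapshotParams;
  }
  return payload.snapshotParams;
}
```

Two named fields make it explicit which values reach which model, which is the ambiguity the rejected union-of-params alternative would have introduced.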

---

### D5: Master Data Context Parity in Sandbox

**Decision**: Require project selection in sandbox tests (no 'default' project). Use selected project/contract context for master data lookup.

**Rationale**:
- Eliminates parity gap where sandbox used 'default' while production used real project
- Ensures sandbox tests accurately reflect production behavior
- `{{master_data_context}}` in prompts will match production

**Alternatives Considered**:
- Keep 'default' project for sandbox → Rejected: inaccurate test results
- Auto-select first project → Rejected: hides context selection UI
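
One way to enforce the rule on the server side is to reject a sandbox request that lacks an explicit project. This is a sketch under assumptions: the request shape, the field names `projectId` and `contractId`, and the function `resolveSandboxContext` are hypothetical.

```typescript
// Hypothetical sandbox test request.
interface SandboxTestRequest {
  projectId?: string;
  contractId?: string;
  input: string;
}

interface SandboxContext {
  projectId: string;
  contractId: string | null;
}

// Resolve the master data context, refusing a missing or 'default' project.
function resolveSandboxContext(req: SandboxTestRequest): SandboxContext {
  const projectId = req.projectId?.trim();
  if (!projectId || projectId === 'default') {
    throw new Error('Sandbox tests require an explicit project selection');
  }
  return { projectId, contractId: req.contractId ?? null };
}
```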

---

### D6: System Prompt Integration

**Decision**: System prompts managed via ADR-029 (`ai_prompts` table), not duplicated in parameter store. Parameter interface links to Prompt Version UI.

**Rationale**:
- ADR-029 already has versioning, approval workflow, and audit trail
- Avoids duplication and maintenance burden
- Clear separation of concerns (prompts vs runtime parameters)

**Alternatives Considered**:
- Store system prompt in `ai_execution_profiles` → Rejected: duplicates ADR-029
- Inline system prompt in sandbox draft → Rejected: loses versioning

---

### D7: Model Name Alignment

**Decision**: Update model names from `typhoon2.5-np-dms`/`typhoon-np-dms-ocr` to `np-dms-ai`/`np-dms-ocr` across codebase.

**Rationale**:
- Canonical names are shorter and more semantic
- Aligns with ADR-034 decision
- Simplifies documentation and communication

**Alternatives Considered**:
- Keep typhoon names → Rejected: inconsistent with ADR-034
- Use generic names (main/ocr) → Rejected: loses semantic meaning
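
During the rename, stored values or configuration that still carry the old names may need translating. The helper below is a hypothetical transitional aid, not part of ADR-034 or ADR-036; it only encodes the old-to-new pairing stated in the decision.

```typescript
// Old-to-canonical pairing as stated in D7.
const LEGACY_TO_CANONICAL: Record<string, string> = {
  'typhoon2.5-np-dms': 'np-dms-ai',
  'typhoon-np-dms-ocr': 'np-dms-ocr',
};

// Return the canonical name for a legacy name; other names pass through unchanged.
function toCanonicalModel(name: string): string {
  return LEGACY_TO_CANONICAL[name] ?? name;
}
```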

---

### D8: Security Guardrails

**Decision**: The apply endpoint requires `Idempotency-Key` validation, the CASL permission `system.manage_ai`, and parameter range validation (`temperature` and `topP` must each lie between 0 and 1). Every apply operation is audit-logged.

**Rationale**:
- Idempotency-Key prevents duplicate applies
- CASL enforces RBAC
- Range validation prevents invalid parameters
- Audit logging tracks all changes

**Alternatives Considered**:
- Skip Idempotency-Key → Rejected: risk of duplicate applies
- Use weaker permission → Rejected: security risk
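
The range check and the idempotency window can be sketched without any framework. The names `ApplyParams`, `validateRanges`, and `IdempotencyWindow` are hypothetical, and the in-memory `Map` is a stand-in: a multi-instance deployment would presumably need a shared store. The 5-minute window matches the value listed under Security Considerations.

```typescript
interface ApplyParams {
  temperature: number;
  topP: number;
}

function inUnitRange(value: number): boolean {
  return Number.isFinite(value) && value >= 0 && value <= 1;
}

// Return a list of validation errors; an empty list means the params are acceptable.
function validateRanges(p: ApplyParams): string[] {
  const errors: string[] = [];
  if (!inUnitRange(p.temperature)) errors.push('temperature must be between 0 and 1');
  if (!inUnitRange(p.topP)) errors.push('topP must be between 0 and 1');
  return errors;
}

// In-memory stand-in for an idempotency store (assumption, single process only).
class IdempotencyWindow {
  private readonly seen = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = 5 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  // True the first time a key is seen inside the window, false for a replay.
  accept(key: string, now: number = Date.now()): boolean {
    const firstSeen = this.seen.get(key);
    if (firstSeen !== undefined && now - firstSeen < this.windowMs) {
      return false;
    }
    this.seen.set(key, now);
    return true;
  }
}
```

A handler would run the permission check first, then `accept`, then `validateRanges`, and only write to the production store when all three pass.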

---

### D9: Cache Invalidation

**Decision**: Invalidate Redis cache after applying parameters to production.

**Rationale**:
- Ensures new jobs use updated parameters
- Prevents stale cache issues
- Requires only a single `DEL` on the cache key

**Alternatives Considered**:
- Wait for cache TTL → Rejected: delayed effect
- No cache invalidation → Rejected: stale parameters
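
The ordering matters: persist first, then delete the cached entry so the next read reloads from the database. The sketch below uses two minimal interfaces in place of the real repository and Redis client; `CacheClient`, `ProfileStore`, `applyToProduction`, and the key format are assumptions, since the key used by `AiPolicyService` is not specified here.

```typescript
// Minimal stand-ins for the real Redis client and repository (assumptions).
interface CacheClient {
  del(key: string): Promise<number>;
}

interface ProfileStore {
  save(profileName: string, params: Record<string, number>): Promise<void>;
}

// Hypothetical key format.
function profileCacheKey(profileName: string): string {
  return `ai:profile:${profileName}`;
}

// Write the new parameters, then drop the cached copy.
async function applyToProduction(
  store: ProfileStore,
  cache: CacheClient,
  profileName: string,
  params: Record<string, number>,
): Promise<void> {
  await store.save(profileName, params);
  await cache.del(profileCacheKey(profileName));
}
```

If the save fails, the function rejects before reaching `del`, so the cache keeps serving the last successfully applied values.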

---

### D10: OCR Parameter Wiring to Sidecar

**Decision**: Add `typhoonOptions` to `OcrDetectionInput` and append `temperature`, `topP`, and `repeatPenalty` to the form data sent to the sidecar.

**Rationale**:
- Sidecar already accepts overrides via form data
- Allows OCR model tuning without sidecar changes
- Maintains existing contract

**Alternatives Considered**:
- Modify sidecar API → Rejected: unnecessary infrastructure change
- Hardcode params in sidecar → Rejected: loses tunability
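
A sketch of the wiring, assuming the options are optional and only set values are forwarded. The form field names (`temperature`, `top_p`, `repeat_penalty`), the `fileName` field, and the `FormLike` interface are assumptions; the sidecar's actual field names should be taken from its contract. `FormLike` is declared so that the standard `FormData` object satisfies it structurally.

```typescript
interface TyphoonOptions {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

interface OcrDetectionInput {
  fileName: string; // hypothetical field
  typhoonOptions?: TyphoonOptions;
}

// Anything with an append(name, value) method, e.g. FormData.
interface FormLike {
  append(name: string, value: string): void;
}

// Append only the overrides that are set, so the sidecar keeps its own defaults otherwise.
function appendTyphoonOptions(form: FormLike, input: OcrDetectionInput): void {
  const options = input.typhoonOptions;
  if (!options) return;
  if (options.temperature !== undefined) form.append('temperature', String(options.temperature));
  if (options.topP !== undefined) form.append('top_p', String(options.topP));
  if (options.repeatPenalty !== undefined) form.append('repeat_penalty', String(options.repeatPenalty));
}
```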

---

## Technology Stack

- **Backend**: NestJS 11, TypeORM, Redis, BullMQ
- **Frontend**: Next.js 16, TanStack Query, React Hook Form, Zod
- **Database**: MariaDB 11.8
- **Testing**: Jest (backend), Vitest + Playwright (frontend)

## Performance Targets

- Apply operation: <2s (including cache invalidation)
- Sandbox test cycle: <5s (test → apply → verify)
- Cache invalidation: <100ms

## Security Considerations

- CASL guard on apply endpoint
- Idempotency-Key validation (5-minute window)
- Parameter range validation (temperature/topP 0-1)
- Audit logging for all apply operations
- No direct DB/storage access from AI (ADR-023/023A)

## Dependencies

- ADR-029: Dynamic Prompt Management (system prompt integration)
- ADR-033: Adaptive OCR Residency (keep_alive lazy-loading)
- ADR-034: AI Model Change (canonical model names)
- Existing `AiPolicyService` with Redis cache
- Existing `ai_audit_logs` table