lcbp3/specs/200-fullstacks/235-ai-runtime-policy-refactor/plan.md
admin 71c5e88181
690611:1705 ADR-035-235 #00 [skip CI]
2026-06-11 17:05:17 +07:00


// File: specs/200-fullstacks/235-ai-runtime-policy-refactor/plan.md
// Change Log:
// - 2026-06-11: Initial implementation plan for AI Runtime Policy Refactor

Implementation Plan: AI Runtime Policy Refactor

Branch: 235-ai-runtime-policy-refactor | Date: 2026-06-11 | Spec: spec.md
Input: Feature specification from specs/200-fullstacks/235-ai-runtime-policy-refactor/spec.md

Summary

Refactor the LCBP3-DMS AI runtime to support the new GPU (RTX 5060 Ti 16GB) by: (A) replacing caller-driven model selection in the API contract with executionProfile, (B) building a backend policy mapping layer, (C) adding adaptive OCR residency, (D) adding a CPU fallback for retrieval acceleration, and (E) tuning BullMQ queue concurrency, together with a verification suite that covers all four axes of the big bang cutover gate.
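The core of the refactor is that callers name an executionProfile and only the backend decides the runtime parameters. The following is a minimal TypeScript sketch of that mapping layer, assuming a simple in-memory table. The `standard` profile, the canonical model name `llm-general`, the `numCtx` values and the `RuntimePolicy` fields are hypothetical placeholders, not the contents of contracts/execution-policy.interface.ts.md.

```typescript
// Illustrative sketch of AiPolicyService. Only "large-context" comes from the plan;
// the "standard" profile and every value in POLICY_TABLE are hypothetical.
export type ExecutionProfile = "standard" | "large-context";

export interface RuntimePolicy {
  readonly canonicalModel: string; // logged and returned to the caller as modelUsed
  readonly numCtx: number;         // context window requested from Ollama
  readonly requiresAdmin: boolean; // enforced by ExecutionProfileGuard via CASL
}

// The backend owns this table; callers can no longer pick a model or parameters.
const POLICY_TABLE: Readonly<Record<ExecutionProfile, RuntimePolicy>> = {
  standard: { canonicalModel: "llm-general", numCtx: 8192, requiresAdmin: false },
  "large-context": { canonicalModel: "llm-general", numCtx: 32768, requiresAdmin: true },
};

// Own-property check so inherited keys such as "toString" are never treated as profiles.
export function isExecutionProfile(value: string): value is ExecutionProfile {
  return Object.prototype.hasOwnProperty.call(POLICY_TABLE, value);
}

export class AiPolicyService {
  // Unknown profiles are rejected here; the controller would surface this as a 400.
  resolve(profile: string): RuntimePolicy {
    if (!isExecutionProfile(profile)) {
      throw new Error(`Unknown executionProfile: ${profile}`);
    }
    return POLICY_TABLE[profile];
  }
}
```

Because POLICY_TABLE is an exhaustive Record over ExecutionProfile, adding a profile to the union without defining its policy fails to compile.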


Technical Context

Language/Version: TypeScript 5.x (NestJS 10, Next.js 14), Python 3.11 (OCR sidecar, FastAPI)
Primary Dependencies:

  • Backend: NestJS, BullMQ, TypeORM, CASL, class-validator, class-transformer
  • Frontend: Next.js, TanStack Query, Zod, shadcn/ui
  • Sidecar: FastAPI, PyMuPDF (fitz), typhoon-ocr, httpx, FlagEmbedding
  • Infrastructure: Ollama (Desk-5439), Redis, MariaDB

Storage: MariaDB (ai_audit_logs, ai_prompts, ai_intent_patterns), Redis (BullMQ, cache)
Testing: Jest (backend unit/integration), Vitest (frontend), Pytest (sidecar)
Target Platform: QNAP NAS (backend/frontend containers), Desk-5439 (Ollama + OCR sidecar)
Performance Goals: OCR cold start < 5s (with residency), retrieval CPU fallback within a 30s timeout
Constraints: Big bang rollout with no legacy parallel path; LLM-first GPU ownership must be enforced
Scale/Scope: Single-server AI stack on Desk-5439; BullMQ concurrency max ai-realtime: 2, ai-batch: 1

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

| Rule | Status | Notes |
|------|--------|-------|
| ADR-019 UUID: no parseInt on UUID | Pass | No new UUID handling in this feature |
| ADR-009: No TypeORM migrations | Pass | No schema changes required |
| ADR-016 Security: CASL guard on all APIs | Required | The large-context profile must have a CASL admin check |
| ADR-007 Error Handling: layered classification | Required | 400 (validation), 403 (profile authorization), 504 (CPU timeout) |
| ADR-008 BullMQ: no inline jobs | Pass | Queue policy adjustment, not new inline processing |
| ADR-023/023A AI Boundary: no direct DB/storage | Pass | Policy layer stays in the NestJS service |
| ADR-023A BullMQ 2-queue: ai-realtime + ai-batch | Required | Concurrency adjustment stays within the existing queues |
| ADR-002 Doc Numbering: Redis Redlock | N/A | Not applicable to this feature |
| TypeScript: no any, no console.log | Required | All new TypeScript code must comply |
| File headers: // File: path/filename | Required | All new files must have a header |

No constitution violations. Proceeding to Phase 0.
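Taken together, the ADR-016 and ADR-007 rows imply three failure classes: a validation failure is a 400, a non-admin request for large-context is a 403, and a CPU fallback timeout is a 504. Below is a minimal sketch of keeping those classes distinct, assuming a plain error class with a `kind` field. A NestJS implementation would more likely throw BadRequestException, ForbiddenException and GatewayTimeoutException from @nestjs/common, and would consult a CASL ability rather than the hypothetical `isAdmin` flag used here.

```typescript
// Failure kinds mirroring the ADR-007 row: 400, 403, 504.
type AiFailureKind = "validation" | "forbidden" | "cpu-timeout";

const STATUS_BY_KIND: Readonly<Record<AiFailureKind, number>> = {
  validation: 400,    // bad or unknown executionProfile, forbidden fields
  forbidden: 403,     // profile not permitted for this caller
  "cpu-timeout": 504, // retrieval CPU fallback exceeded its timeout
};

class AiFailure extends Error {
  constructor(readonly kind: AiFailureKind, message: string) {
    super(message);
    this.name = "AiFailure";
  }
}

// Hypothetical caller shape; a real guard would ask CASL instead of reading a flag.
interface Caller {
  readonly isAdmin: boolean;
}

// Guard logic: only admins may request the large-context profile.
function assertProfileAllowed(caller: Caller, profile: string): void {
  if (profile === "large-context" && !caller.isAdmin) {
    throw new AiFailure("forbidden", "large-context profile requires admin");
  }
}

// Classify by the kind field (a structural check), so this keeps working even if
// Error subclasses lose their prototype chain when compiled to ES5.
function toHttpStatus(err: unknown): number {
  if (typeof err === "object" && err !== null && "kind" in err) {
    const kind = (err as { kind: unknown }).kind;
    if (kind === "validation" || kind === "forbidden" || kind === "cpu-timeout") {
      return STATUS_BY_KIND[kind];
    }
  }
  return 500; // anything unclassified is outside this feature's contract
}
```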


Project Structure

Documentation (this feature)

specs/200-fullstacks/235-ai-runtime-policy-refactor/
├── spec.md               # Feature specification
├── plan.md               # This file
├── research.md           # Phase 0 output
├── data-model.md         # Phase 1 output
├── quickstart.md         # Phase 1 output
├── tasks.md              # Phase 2 output
├── checklists/
│   └── requirements.md
└── contracts/
    ├── create-ai-job.dto.ts.md
    ├── execution-policy.interface.ts.md
    └── ocr-residency-policy.interface.ts.md

Source Code (repository root)

backend/src/modules/ai/
├── dto/
│   ├── create-ai-job.dto.ts          # [MODIFY] remove model.key, add executionProfile
│   └── ai-job-response.dto.ts        # [MODIFY] add modelUsed as canonical model name
├── services/
│   ├── ai.service.ts                  # [MODIFY] add profile validation + canonical names
│   ├── ai-policy.service.ts           # [NEW] ExecutionProfile → RuntimePolicy mapping
│   ├── ocr.service.ts                 # [MODIFY] add adaptive residency calculation
│   └── vram-monitor.service.ts        # [NEW] VRAM headroom query service
├── processors/
│   ├── ai-batch.processor.ts          # [MODIFY] use policy from AiPolicyService
│   └── ai-realtime.processor.ts       # [MODIFY] lightweight job classification + concurrency
├── interfaces/
│   ├── execution-policy.interface.ts  # [NEW] RuntimePolicy type definition
│   └── ocr-residency.interface.ts     # [NEW] OcrResidencyDecision type
├── guards/
│   └── execution-profile.guard.ts     # [NEW] large-context profile admin check
└── ai.module.ts                       # [MODIFY] register new services + guard

backend/src/config/
└── bullmq.config.ts                   # [MODIFY] ai-realtime concurrency uplift config

specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py                             # [MODIFY] adaptive keep_alive, CPU fallback embed/rerank
├── services/
│   ├── vram_monitor.py                # [NEW] VRAM headroom query via Ollama API
│   └── residency_policy.py            # [NEW] keep_alive calculation policy
└── requirements.txt                   # [MODIFY] add nvidia-ml-py or pynvml if needed

frontend/
├── types/
│   └── ai.ts                          # [MODIFY] remove model fields, add executionProfile
├── lib/services/
│   └── admin-ai.service.ts            # [MODIFY] update types + canonical name display
└── components/admin/ai/
    └── OcrSandboxPromptManager.tsx    # [MODIFY] show canonical names in the UI

backend/src/modules/ai/
└── tests/
    ├── ai-policy.service.spec.ts       # [NEW] unit tests profile mapping
    ├── ocr-residency.spec.ts           # [NEW] unit tests adaptive residency
    └── execution-profile.guard.spec.ts # [NEW] unit tests CASL guard

Phases

Phase 1: Foundational — Policy Infrastructure

Must be completed before all other workstreams:

  1. Create VramMonitorService: query VRAM headroom from the Ollama /api/ps endpoint
  2. Create AiPolicyService: map ExecutionProfile → RuntimePolicy
  3. Create ExecutionProfileGuard: CASL check for the large-context profile
  4. Modify CreateAiJobDto: remove model.key and parameter overrides
  5. Implement vram_monitor.py on the sidecar: query GPU headroom
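Ollama's GET /api/ps lists the currently loaded models, and each entry includes `size_vram`, which is the number of bytes of that model held in GPU memory. The endpoint does not report the card's total memory, so this sketch assumes the 16 GB of the RTX 5060 Ti is configured, and subtracts a hypothetical 1 GiB reserve for driver overhead and non-Ollama processes. VramMonitorService would fetch the JSON from Ollama on Desk-5439 and pass it to `computeVramSnapshot`, which is a hypothetical name.

```typescript
// Subset of the Ollama GET /api/ps response used here.
interface OllamaPsModel {
  readonly name: string;
  readonly size: number;      // total bytes of the loaded model
  readonly size_vram: number; // bytes of that model resident in GPU memory
}

interface OllamaPsResponse {
  readonly models: readonly OllamaPsModel[];
}

const GIB = 1024 ** 3;
// Assumption: /api/ps has no total-VRAM field, so the card size is configured.
const DEFAULT_TOTAL_VRAM_BYTES = 16 * GIB; // RTX 5060 Ti 16GB
// Hypothetical reserve for CUDA context, driver overhead and non-Ollama processes.
const DEFAULT_RESERVE_BYTES = 1 * GIB;

interface VramSnapshot {
  readonly usedBytes: number;
  readonly headroomBytes: number;
  readonly residentModels: readonly string[];
}

function computeVramSnapshot(
  ps: OllamaPsResponse,
  totalBytes: number = DEFAULT_TOTAL_VRAM_BYTES,
  reserveBytes: number = DEFAULT_RESERVE_BYTES,
): VramSnapshot {
  const usedBytes = ps.models.reduce((sum, m) => sum + m.size_vram, 0);
  // Clamp at zero so an over-committed card reports "no headroom", not a negative value.
  const headroomBytes = Math.max(0, totalBytes - reserveBytes - usedBytes);
  return {
    usedBytes,
    headroomBytes,
    residentModels: ps.models.filter((m) => m.size_vram > 0).map((m) => m.name),
  };
}
```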

Phase 2: Contract & Canonical Naming (Workstream A)

  1. Modify AiService: validate the profile, override data-affecting jobs, and log canonical names
  2. Modify ai-job-response.dto.ts: return the canonical name in modelUsed
  3. Update frontend types and the Admin Console UI to display canonical names
  4. Add rejection tests for model.key and parameter overrides
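The rejection tests need a request shape that fails closed. In NestJS this is typically done by removing the fields from CreateAiJobDto and enabling `whitelist: true` with `forbidNonWhitelisted: true` on ValidationPipe, which rejects any property that the DTO does not declare. The dependency-free sketch below mimics that behaviour, so the intent can be tested in isolation. The list of forbidden override fields and the `payload` property are hypothetical.

```typescript
// Hypothetical names for caller-side overrides that the new contract rejects.
const FORBIDDEN_FIELDS = ["model", "temperature", "topP", "numCtx", "keepAlive"] as const;

interface CreateAiJobInput {
  readonly executionProfile: string; // checked against known profiles by AiPolicyService
  readonly payload: unknown;         // hypothetical job input, opaque to this check
}

type ValidationResult =
  | { readonly ok: true; readonly value: CreateAiJobInput }
  | { readonly ok: false; readonly errors: readonly string[] };

// Fails closed: any forbidden field rejects the whole request (HTTP 400 upstream).
function validateCreateAiJob(body: Readonly<Record<string, unknown>>): ValidationResult {
  const errors: string[] = [];
  for (const field of FORBIDDEN_FIELDS) {
    if (Object.prototype.hasOwnProperty.call(body, field)) {
      errors.push(`property ${field} should not exist`);
    }
  }
  const profile = body["executionProfile"];
  if (typeof profile !== "string" || profile.length === 0) {
    errors.push("executionProfile must be a non-empty string");
  }
  if (errors.length > 0 || typeof profile !== "string") {
    return { ok: false, errors };
  }
  return { ok: true, value: { executionProfile: profile, payload: body["payload"] } };
}
```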

Phase 3: Adaptive OCR Residency (Workstream B)

  1. Modify OcrService: inject VramMonitorService and compute keep_alive dynamically
  2. Implement residency_policy.py on the sidecar: accept keep_alive from the backend policy
  3. Add unit tests for residency scenarios
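One way to compute keep_alive dynamically is to compare the current headroom against the OCR model's footprint plus a reserve that the LLM must retain. This preserves LLM-first GPU ownership. Ollama's `keep_alive` accepts a duration string such as `"30m"` or a number of seconds, and `0` unloads the model right after the request. The thresholds, durations and reason labels in this sketch are hypothetical. `headroomBytes` is assumed to be measured while the OCR model is not loaded.

```typescript
const GIB = 1024 ** 3;

// Hypothetical shape; contracts/ocr-residency-policy.interface.ts.md is authoritative.
interface OcrResidencyDecision {
  readonly keepAlive: string | number; // sent to Ollama as keep_alive
  readonly reason: "ample-headroom" | "tight-headroom" | "llm-priority";
}

// Hypothetical inputs; real values would come from backend config.
interface ResidencyThresholds {
  readonly ocrModelBytes: number;   // approximate VRAM footprint of the OCR model
  readonly llmReserveBytes: number; // VRAM the LLM must still have after OCR loads
}

// headroomBytes: free VRAM measured while the OCR model is not loaded.
function decideOcrResidency(headroomBytes: number, t: ResidencyThresholds): OcrResidencyDecision {
  const remainingAfterOcr = headroomBytes - t.ocrModelBytes;
  if (remainingAfterOcr >= t.llmReserveBytes) {
    // Plenty of room: stay warm so repeat OCR requests avoid a cold start.
    return { keepAlive: "30m", reason: "ample-headroom" };
  }
  if (remainingAfterOcr >= 0) {
    // Fits, but cuts into the LLM reserve: stay warm only briefly.
    return { keepAlive: "2m", reason: "tight-headroom" };
  }
  // LLM-first ownership: unload immediately after this request.
  return { keepAlive: 0, reason: "llm-priority" };
}
```

The long keep_alive branch makes the < 5s OCR cold-start goal reachable for back-to-back requests. The `0` branch deliberately gives up that latency in favour of LLM priority.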

Phase 4: Retrieval Acceleration (Workstream C)

  1. Modify app.py: add a GPU headroom check to /embed and /rerank
  2. Add a CPU fallback path with logging
  3. Modify ai-batch.processor.ts to handle RAG query fallback
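The decision in /embed and /rerank comes down to this: run on the GPU when headroom allows, and otherwise run on the CPU within the 30s budget from the Performance Goals and log the fallback. The sidecar itself is written in Python. This TypeScript sketch shows the same control flow in the form the backend might use to wrap it in ai-batch.processor.ts. The 2 GiB threshold, `runWithFallback` and `CpuFallbackTimeout` are hypothetical. The timeout error carries `kind: "cpu-timeout"` so it can be mapped to 504, as the ADR-007 row requires.

```typescript
type Device = "gpu" | "cpu";

const GIB = 1024 ** 3;
// Hypothetical: free VRAM the embedding/reranker models need to run on the GPU.
const MIN_GPU_HEADROOM_BYTES = 2 * GIB;
// Matches the plan's 30s budget for the retrieval CPU fallback.
const CPU_FALLBACK_TIMEOUT_MS = 30_000;

// Carries kind "cpu-timeout" so the caller can map it to HTTP 504 (ADR-007).
class CpuFallbackTimeout extends Error {
  readonly kind = "cpu-timeout";
  constructor(ms: number) {
    super(`CPU fallback exceeded ${ms} ms`);
    this.name = "CpuFallbackTimeout";
  }
}

function chooseDevice(headroomBytes: number, minBytes: number = MIN_GPU_HEADROOM_BYTES): Device {
  return headroomBytes >= minBytes ? "gpu" : "cpu";
}

// Rejects if `work` has not settled within `ms`. This stops waiting but does not
// cancel the underlying request; real cancellation would need an AbortSignal.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new CpuFallbackTimeout(ms)), ms);
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err: unknown) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

async function runWithFallback<T>(
  headroomBytes: number,
  onGpu: () => Promise<T>,
  onCpu: () => Promise<T>,
  log: (message: string) => void, // e.g. a NestJS Logger method; console.log is disallowed
): Promise<{ device: Device; result: T }> {
  const device = chooseDevice(headroomBytes);
  if (device === "gpu") {
    return { device, result: await onGpu() };
  }
  log(`GPU headroom ${headroomBytes} B below threshold; using CPU fallback`);
  return { device, result: await withTimeout(onCpu(), CPU_FALLBACK_TIMEOUT_MS) };
}
```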

Phase 5: Queue Policy (Workstream D)

  1. Modify bullmq.config.ts: set ai-realtime concurrency to 2 (configurable)
  2. Modify ai-realtime.processor.ts: classify lightweight vs. generation-heavy jobs
  3. Verify that rag-query is routed to ai-batch only
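This sketch expresses the queue policy as data, assuming an exhaustive job-type table. Every job type gets exactly one queue and a weight class, and rag-query is pinned to ai-batch. Concurrency is read from the environment but clamped to the ceilings in Scale/Scope (ai-realtime: 2, ai-batch: 1). Only rag-query is named in this plan; the other job types and the environment variable names are hypothetical. The resolved number would be passed as the `concurrency` option of the BullMQ `Worker`.

```typescript
type QueueName = "ai-realtime" | "ai-batch";

// Hypothetical job types; rag-query is the only one named in the plan.
type AiJobType = "rag-query" | "intent-classify" | "prompt-preview" | "ocr-extract";

interface JobRoute {
  readonly queue: QueueName;
  readonly weight: "lightweight" | "generation-heavy";
}

// Exhaustive: adding a job type without a route is a compile error.
const JOB_ROUTES: Readonly<Record<AiJobType, JobRoute>> = {
  "intent-classify": { queue: "ai-realtime", weight: "lightweight" },
  "prompt-preview": { queue: "ai-realtime", weight: "generation-heavy" },
  "ocr-extract": { queue: "ai-batch", weight: "generation-heavy" },
  "rag-query": { queue: "ai-batch", weight: "generation-heavy" }, // never ai-realtime
};

// Ceilings from the plan's Scale/Scope line.
const MAX_CONCURRENCY: Readonly<Record<QueueName, number>> = { "ai-realtime": 2, "ai-batch": 1 };
// Hypothetical environment variable names.
const CONCURRENCY_ENV: Readonly<Record<QueueName, string>> = {
  "ai-realtime": "AI_REALTIME_CONCURRENCY",
  "ai-batch": "AI_BATCH_CONCURRENCY",
};

function resolveConcurrency(queue: QueueName, env: Readonly<Record<string, string | undefined>>): number {
  const max = MAX_CONCURRENCY[queue];
  const raw = env[CONCURRENCY_ENV[queue]];
  const parsed = raw === undefined ? NaN : parseInt(raw, 10);
  // Missing, non-numeric or < 1 falls back to the ceiling; larger values are clamped to it.
  if (!isFinite(parsed) || parsed < 1) {
    return max;
  }
  return Math.min(parsed, max);
}

// Processor-side check that a job arrived on the queue its route allows.
function assertRoutedCorrectly(type: AiJobType, queue: QueueName): void {
  const expected = JOB_ROUTES[type].queue;
  if (expected !== queue) {
    throw new Error(`${type} must run on ${expected}, not ${queue}`);
  }
}
```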

Phase 6: Verification & Cutover (Workstream E)

  1. Consolidate the test suite across all four axes
  2. Manual validation checklist (Admin Console, OCR Sandbox)
  3. Cutover gate verification
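Because the rollout is big bang with no legacy path, the gate can be modelled as all-or-nothing. Every axis must have passed both its automated suite and its manual check, and an axis with no recorded result blocks cutover. This sketch assumes the four axes are Workstreams A to D. The axis names and the `AxisResult` shape are hypothetical, and giving every axis a manual check is a simplification.

```typescript
// Assumption: the four gate axes correspond to Workstreams A to D.
type GateAxis = "contract" | "ocr-residency" | "retrieval-fallback" | "queue-policy";

interface AxisResult {
  readonly axis: GateAxis;
  readonly automatedPassed: boolean; // test suite result for this axis
  readonly manualChecked: boolean;   // manual validation checklist signed off
}

interface GateVerdict {
  readonly open: boolean;
  readonly blocking: readonly GateAxis[];
}

const ALL_AXES: readonly GateAxis[] = ["contract", "ocr-residency", "retrieval-fallback", "queue-policy"];

function evaluateCutoverGate(results: readonly AxisResult[]): GateVerdict {
  const blocking = ALL_AXES.filter((axis) => {
    const matching = results.filter((r) => r.axis === axis);
    // A missing axis, or any failing entry for it, blocks cutover.
    return matching.length === 0 || matching.some((r) => !r.automatedPassed || !r.manualChecked);
  });
  return { open: blocking.length === 0, blocking };
}
```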

Complexity Tracking

No constitution violations require justification.