// File: specs/200-fullstacks/235-ai-runtime-policy-refactor/plan.md
// Change Log:
// - 2026-06-11: Initial implementation plan for AI Runtime Policy Refactor
Implementation Plan: AI Runtime Policy Refactor
Branch: 235-ai-runtime-policy-refactor | Date: 2026-06-11 | Spec: spec.md
Input: Feature specification from specs/200-fullstacks/235-ai-runtime-policy-refactor/spec.md
Summary
Refactor the LCBP3-DMS AI runtime to support the new GPU (RTX 5060 Ti 16GB). The work first builds a backend policy mapping layer (foundational), then: (A) changes the API contract to use executionProfile instead of caller-driven model selection, (B) adds adaptive OCR residency, (C) adds a CPU fallback for retrieval acceleration, (D) adjusts BullMQ queue concurrency, and (E) adds a verification suite covering all four axes of the big bang cutover gate.
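A minimal sketch of the backend policy mapping layer, assuming AiPolicyService resolves each profile from a static lookup table. Only the large-context profile and the two queue names come from this plan; the standard profile, the model names, the context sizes, and the queue assignments are hypothetical placeholders.

```typescript
// Hypothetical sketch of AiPolicyService's core: ExecutionProfile → RuntimePolicy.
// "standard", model names, numCtx values, and queue choices are placeholders.

export const EXECUTION_PROFILES = ["standard", "large-context"] as const;
export type ExecutionProfile = (typeof EXECUTION_PROFILES)[number];

export interface RuntimePolicy {
  canonicalModel: string; // reported back to callers as modelUsed
  numCtx: number; // would map to Ollama's num_ctx option
  queue: "ai-realtime" | "ai-batch";
  requiresAdmin: boolean; // enforced by ExecutionProfileGuard
}

const POLICY_TABLE: Readonly<Record<ExecutionProfile, RuntimePolicy>> = {
  standard: {
    canonicalModel: "placeholder-llm-standard",
    numCtx: 8192,
    queue: "ai-realtime",
    requiresAdmin: false,
  },
  "large-context": {
    canonicalModel: "placeholder-llm-large-context",
    numCtx: 32768,
    queue: "ai-batch",
    requiresAdmin: true,
  },
};

// Narrow untrusted input (e.g. a request body field) to a known profile.
export function isExecutionProfile(value: unknown): value is ExecutionProfile {
  return typeof value === "string" && (EXECUTION_PROFILES as readonly string[]).includes(value);
}

export function resolvePolicy(profile: ExecutionProfile): RuntimePolicy {
  // Return a copy so callers cannot mutate the shared policy table.
  return { ...POLICY_TABLE[profile] };
}
```

Keeping the table in one backend service means callers only ever send a profile name, and the canonical name returned as modelUsed is whatever the table holds at cutover.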
Technical Context
Language/Version: TypeScript 5.x (NestJS 10, Next.js 14), Python 3.11 (FastAPI OCR sidecar)
Primary Dependencies:
- Backend: NestJS, BullMQ, TypeORM, CASL, class-validator, class-transformer
- Frontend: Next.js, TanStack Query, Zod, shadcn/ui
- Sidecar: FastAPI, PyMuPDF (fitz), typhoon-ocr, httpx, FlagEmbedding
- Infrastructure: Ollama (Desk-5439), Redis, MariaDB
Storage: MariaDB (ai_audit_logs, ai_prompts, ai_intent_patterns), Redis (BullMQ, cache)
Testing: Jest (backend unit/integration), Vitest (frontend), Pytest (sidecar)
Target Platform: QNAP NAS (backend/frontend containers), Desk-5439 (Ollama + OCR sidecar)
Performance Goals: OCR cold start < 5s (with residency); retrieval CPU fallback bounded by a 30s timeout
Constraints: Big bang rollout — no legacy parallel path; LLM-First GPU ownership must be enforced
Scale/Scope: Single-server AI stack on Desk-5439; BullMQ max concurrency ai-realtime: 2, ai-batch: 1
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
| Rule | Status | Notes |
|---|---|---|
| ADR-019 UUID: no parseInt on UUID | ✅ Pass | No new UUID handling in this feature |
| ADR-009: No TypeORM migrations | ✅ Pass | No schema changes required |
| ADR-016 Security: CASL Guard on all API | ✅ Required | large-context profile must have CASL admin check |
| ADR-007 Error Handling: layered classification | ✅ Required | 400 (validation), 403 (profile auth), 504 (CPU timeout) |
| ADR-008 BullMQ: no inline jobs | ✅ Pass | Queue policy adjustment, not new inline processing |
| ADR-023/023A AI Boundary: no direct DB/storage | ✅ Pass | Policy layer stays in NestJS service |
| ADR-023A BullMQ 2-queue: ai-realtime + ai-batch | ✅ Required | concurrency adjustment within existing queues |
| ADR-002 Doc Numbering: Redis Redlock | ✅ N/A | Not applicable to this feature |
| TypeScript: no any, no console.log | ✅ Required | All new TypeScript code must comply |
| File headers: // File: path/filename | ✅ Required | All new files must include the header |
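One way to make the ADR-007 and ADR-016 rows concrete, sketched under the assumption that the policy layer throws typed errors and a NestJS exception filter (not shown) maps them to HTTP status. The class names and the placeholder standard profile are hypothetical; the isAdmin flag stands in for the real CASL ability check.

```typescript
// Hypothetical sketch: typed errors for the ADR-007 classes plus the ADR-016
// rule that large-context requires an admin.

export class ProfileValidationError extends Error {
  readonly kind = "validation" as const;
}
export class ProfileForbiddenError extends Error {
  readonly kind = "forbidden" as const;
}
export class CpuFallbackTimeoutError extends Error {
  readonly kind = "cpu-timeout" as const;
}

export type AiPolicyError = ProfileValidationError | ProfileForbiddenError | CpuFallbackTimeoutError;

// "standard" is a placeholder profile name.
const KNOWN_PROFILES: readonly string[] = ["standard", "large-context"];

// 400 = validation, 403 = profile authorization, 504 = CPU fallback timeout.
const STATUS_BY_KIND = { validation: 400, forbidden: 403, "cpu-timeout": 504 } as const;

export function assertProfileAllowed(profile: string, isAdmin: boolean): void {
  if (!KNOWN_PROFILES.includes(profile)) {
    throw new ProfileValidationError(`Unknown executionProfile: ${profile}`);
  }
  if (profile === "large-context" && !isAdmin) {
    throw new ProfileForbiddenError("large-context requires an admin");
  }
}

export function toHttpStatus(error: AiPolicyError): 400 | 403 | 504 {
  return STATUS_BY_KIND[error.kind];
}
```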
No constitution violations. Proceeding to Phase 0.
Project Structure
Documentation (this feature)
specs/200-fullstacks/235-ai-runtime-policy-refactor/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output
├── data-model.md # Phase 1 output
├── quickstart.md # Phase 1 output
├── tasks.md # Phase 2 output
├── checklists/
│ └── requirements.md
└── contracts/
├── create-ai-job.dto.ts.md
├── execution-policy.interface.ts.md
└── ocr-residency-policy.interface.ts.md
Source Code (repository root)
backend/src/modules/ai/
├── dto/
│ ├── create-ai-job.dto.ts # [MODIFY] remove model.key, add executionProfile
│ └── ai-job-response.dto.ts # [MODIFY] add modelUsed canonical name
├── services/
│ ├── ai.service.ts # [MODIFY] add profile validation + canonical name
│ ├── ai-policy.service.ts # [NEW] ExecutionProfile → RuntimePolicy mapping
│ ├── ocr.service.ts # [MODIFY] add adaptive residency calculation
│ └── vram-monitor.service.ts # [NEW] VRAM headroom query service
├── processors/
│ ├── ai-batch.processor.ts # [MODIFY] use the policy from AiPolicyService
│ └── ai-realtime.processor.ts # [MODIFY] lightweight job classification + concurrency
├── interfaces/
│ ├── execution-policy.interface.ts # [NEW] RuntimePolicy type definition
│ └── ocr-residency.interface.ts # [NEW] OcrResidencyDecision type
├── guards/
│ └── execution-profile.guard.ts # [NEW] large-context profile admin check
└── ai.module.ts # [MODIFY] register new services + guard
backend/src/config/
└── bullmq.config.ts # [MODIFY] ai-realtime concurrency uplift config
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py # [MODIFY] adaptive keep_alive, CPU fallback embed/rerank
├── services/
│ ├── vram_monitor.py # [NEW] VRAM headroom query via Ollama API
│ └── residency_policy.py # [NEW] keep_alive calculation policy
└── requirements.txt # [MODIFY] add nvidia-ml-py or pynvml if needed
frontend/
├── types/
│ └── ai.ts # [MODIFY] remove model fields, add executionProfile
├── lib/services/
│ └── admin-ai.service.ts # [MODIFY] update types + canonical name display
└── components/admin/ai/
└── OcrSandboxPromptManager.tsx # [MODIFY] show canonical names in the UI
backend/src/modules/ai/
└── tests/
├── ai-policy.service.spec.ts # [NEW] unit tests profile mapping
├── ocr-residency.spec.ts # [NEW] unit tests adaptive residency
└── execution-profile.guard.spec.ts # [NEW] unit tests CASL guard
Phases
Phase 1: Foundational — Policy Infrastructure
Must be completed before any other workstream:
- Create VramMonitorService: query VRAM headroom from the Ollama /api/ps endpoint
- Create AiPolicyService: map ExecutionProfile → RuntimePolicy
- Create ExecutionProfileGuard: CASL check for the large-context profile
- Modify CreateAiJobDto: remove model.key and parameter overrides
- Create vram_monitor.py on the sidecar: query GPU headroom
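A minimal sketch of the headroom calculation behind VramMonitorService. Ollama's GET /api/ps lists the loaded models, each with a size_vram field in bytes, but it does not report total GPU memory, so the sketch assumes the 16 GB total comes from configuration. The fetchJson parameter is a hypothetical injection point for the HTTP client.

```typescript
// Hypothetical sketch of VramMonitorService's headroom calculation.
// totalVramBytes is assumed to come from config (e.g. 16 GiB for the RTX 5060 Ti).

export const GIB = 1024 ** 3;

export interface OllamaPsModel {
  name: string;
  size_vram: number; // bytes of this model resident in VRAM
}

export interface OllamaPsResponse {
  models: OllamaPsModel[];
}

// Injected so the service is testable without a live Ollama instance.
export type FetchJson = (url: string) => Promise<unknown>;

function isPsResponse(value: unknown): value is OllamaPsResponse {
  if (typeof value !== "object" || value === null) return false;
  const models = (value as { models?: unknown }).models;
  return (
    Array.isArray(models) &&
    models.every((m: unknown) => {
      if (typeof m !== "object" || m === null) return false;
      const entry = m as { name?: unknown; size_vram?: unknown };
      return typeof entry.name === "string" && typeof entry.size_vram === "number";
    })
  );
}

export function computeHeadroomBytes(ps: OllamaPsResponse, totalVramBytes: number): number {
  const used = ps.models.reduce((sum, m) => sum + m.size_vram, 0);
  return Math.max(0, totalVramBytes - used); // never report negative headroom
}

export async function queryHeadroomBytes(
  baseUrl: string,
  totalVramBytes: number,
  fetchJson: FetchJson,
): Promise<number> {
  const body = await fetchJson(`${baseUrl}/api/ps`);
  if (!isPsResponse(body)) {
    throw new Error("Unexpected /api/ps response shape");
  }
  return computeHeadroomBytes(body, totalVramBytes);
}
```

In the service, fetchJson could wrap Node 18's global fetch and parse the JSON body. VRAM held by processes outside Ollama is invisible to /api/ps, which may be why requirements.txt lists nvidia-ml-py or pynvml as an option for the sidecar.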
Phase 2: Contract & Canonical Naming (Workstream A)
- Modify AiService: validate the profile, override data-affecting jobs, log canonical names
- Modify ai-job-response.dto.ts: modelUsed returns the canonical name
- Modify the frontend types and Admin Console UI: display canonical names
- Add rejection tests for model.key and parameter overrides
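The rejection rule in Workstream A can be sketched as a plain pre-validation function. The forbidden key list is an assumption: model covers the old model.key field, while options, temperature, and numCtx are hypothetical stand-ins for the parameter overrides being removed.

```typescript
// Hypothetical pre-validation for the new CreateAiJobDto contract.
// FORBIDDEN_KEYS is illustrative, not the real list from the DTO contract.

const FORBIDDEN_KEYS = ["model", "options", "temperature", "numCtx"] as const;

export function validateCreateAiJobBody(body: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of FORBIDDEN_KEYS) {
    // Reject legacy caller-driven fields even if their value is empty.
    if (Object.prototype.hasOwnProperty.call(body, key)) {
      errors.push(`${key} is no longer accepted; use executionProfile`);
    }
  }
  if (typeof body["executionProfile"] !== "string") {
    errors.push("executionProfile is required");
  }
  return errors; // a non-empty list maps to HTTP 400 (ADR-007)
}
```

In NestJS, ValidationPipe with whitelist: true and forbidNonWhitelisted: true already rejects properties that are not declared on the DTO; an explicit check like this mainly makes the 400 message point callers to executionProfile.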
Phase 3: Adaptive OCR Residency (Workstream B)
- Modify OcrService: inject VramMonitorService and compute keep_alive dynamically
- Create residency_policy.py on the sidecar: accept keep_alive from the backend policy
- Add unit tests for residency scenarios
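A sketch of the adaptive residency decision OcrService could compute before calling the sidecar. It follows Ollama's keep_alive convention, where a duration string such as "15m" keeps the model loaded for that long and 0 unloads it after the request. The two-tier thresholds, the LLM reserve parameter, and the durations are hypothetical, not measured values.

```typescript
// Hypothetical sketch of the adaptive OCR residency decision (OcrService side).

export interface OcrResidencyDecision {
  keepAlive: string | 0; // Ollama keep_alive: duration string, or 0 to unload
  reason: string;
}

export function decideOcrResidency(
  headroomBytes: number, // free VRAM reported by VramMonitorService
  ocrModelBytes: number, // VRAM the OCR model needs while loaded
  llmReserveBytes: number, // VRAM that must stay free for the LLM (LLM-First)
): OcrResidencyDecision {
  const remainingAfterOcr = headroomBytes - ocrModelBytes;
  if (remainingAfterOcr >= 2 * llmReserveBytes) {
    return { keepAlive: "15m", reason: "ample headroom: keep OCR model warm" };
  }
  if (remainingAfterOcr >= llmReserveBytes) {
    return { keepAlive: "2m", reason: "limited headroom: short residency" };
  }
  // Give the GPU back to the LLM as soon as the OCR request finishes.
  return { keepAlive: 0, reason: "LLM reserve at risk: unload after request" };
}
```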
Phase 4: Retrieval Acceleration (Workstream C)
- Modify app.py: add a GPU headroom check to /embed and /rerank
- Add a CPU fallback path with logging
- Modify ai-batch.processor.ts to handle RAG query fallback
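The backend half of the retrieval fallback, sketched for ai-batch.processor.ts under the assumption that the same headroom figure decides GPU versus CPU and that the 30s budget from the Performance Goals applies to the CPU path. All names are hypothetical.

```typescript
// Hypothetical sketch: GPU/CPU choice for embed/rerank and the 30 s CPU budget.

export type RetrievalDevice = "gpu" | "cpu";

export const CPU_FALLBACK_TIMEOUT_MS = 30_000;

export class RetrievalTimeoutError extends Error {}

export function selectRetrievalDevice(headroomBytes: number, requiredBytes: number): RetrievalDevice {
  // LLM-First: use the GPU only when the retrieval model fits in spare VRAM.
  return headroomBytes >= requiredBytes ? "gpu" : "cpu";
}

export function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new RetrievalTimeoutError(`retrieval exceeded ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}
```

Promise.race only stops waiting; it does not cancel the underlying request, so the sidecar call would also need its own timeout (for example an httpx timeout on the Python side). A RetrievalTimeoutError would surface as the 504 listed under ADR-007.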
Phase 5: Queue Policy (Workstream D)
- Modify bullmq.config.ts: set ai-realtime concurrency to 2 (configurable)
- Modify ai-realtime.processor.ts: classify lightweight vs. generation-heavy jobs
- Verify that rag-query is routed to ai-batch only
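A sketch of the configurable concurrency and the routing rule. The environment variable names and the lightweight job types are hypothetical; only the defaults (ai-realtime: 2, ai-batch: 1) and the rag-query rule come from this plan. Sending generation-heavy jobs to ai-batch is an assumption about how the classification is used.

```typescript
// Hypothetical sketch for bullmq.config.ts and ai-realtime.processor.ts.

export type QueueName = "ai-realtime" | "ai-batch";

const DEFAULT_CONCURRENCY: Readonly<Record<QueueName, number>> = {
  "ai-realtime": 2,
  "ai-batch": 1,
};

// Placeholder env var names.
const CONCURRENCY_ENV: Readonly<Record<QueueName, string>> = {
  "ai-realtime": "AI_REALTIME_CONCURRENCY",
  "ai-batch": "AI_BATCH_CONCURRENCY",
};

// Placeholder job types treated as lightweight.
const LIGHTWEIGHT_JOB_TYPES: ReadonlySet<string> = new Set(["classify-intent", "extract-metadata"]);

export function resolveConcurrency(queue: QueueName, env: Record<string, string | undefined>): number {
  const raw = env[CONCURRENCY_ENV[queue]];
  const parsed = raw === undefined ? Number.NaN : Number(raw);
  // Keep the planned default when the variable is unset, non-numeric, or < 1.
  return Number.isInteger(parsed) && parsed >= 1 ? parsed : DEFAULT_CONCURRENCY[queue];
}

export function routeJob(jobType: string): QueueName {
  // rag-query must only ever run on ai-batch.
  if (jobType === "rag-query") return "ai-batch";
  // Assumption: only lightweight jobs stay on ai-realtime.
  return LIGHTWEIGHT_JOB_TYPES.has(jobType) ? "ai-realtime" : "ai-batch";
}
```

The resolved value would be passed through BullMQ's Worker concurrency option, for example new Worker("ai-realtime", processor, { connection, concurrency: resolveConcurrency("ai-realtime", process.env) }).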
Phase 6: Verification & Cutover (Workstream E)
- Consolidate the test suite across all four axes
- Manual validation checklist (Admin Console, OCR Sandbox)
- Cutover gate verification
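Because the rollout is big bang with no legacy path, the gate can be read as all-or-nothing. This sketch assumes the four axes are Workstreams A to D and treats a missing result the same as a failure; the axis keys and result shape are hypothetical.

```typescript
// Hypothetical sketch of the cutover gate evaluation.

export type GateAxis = "contract" | "ocr-residency" | "retrieval" | "queue-policy";

export interface AxisResult {
  axis: GateAxis;
  passed: boolean;
}

export interface GateDecision {
  go: boolean;
  blocking: GateAxis[];
}

const REQUIRED_AXES: readonly GateAxis[] = ["contract", "ocr-residency", "retrieval", "queue-policy"];

export function evaluateCutoverGate(results: readonly AxisResult[]): GateDecision {
  // An axis blocks cutover if it has no result at all or any failing result.
  const blocking = REQUIRED_AXES.filter((axis) => {
    const forAxis = results.filter((r) => r.axis === axis);
    return forAxis.length === 0 || forAxis.some((r) => !r.passed);
  });
  return { go: blocking.length === 0, blocking };
}
```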
Complexity Tracking
No constitution violations require justification.