
// File: specs/200-fullstacks/235-ai-runtime-policy-refactor/plan.md
// Change Log:
// - 2026-06-11: Initial implementation plan for AI Runtime Policy Refactor

Implementation Plan: AI Runtime Policy Refactor

Branch: 235-ai-runtime-policy-refactor | Date: 2026-06-11 | Spec: spec.md
Input: Feature specification from specs/200-fullstacks/235-ai-runtime-policy-refactor/spec.md

Summary

Refactor the LCBP3-DMS AI runtime to support the new GPU (RTX 5060 Ti 16GB) by: (A) changing the API contract to use executionProfile instead of caller-driven model selection, (B) building a backend policy mapping layer, (C) adding adaptive OCR residency, (D) adding a CPU fallback for retrieval acceleration, and (E) adjusting BullMQ queue concurrency, together with a verification suite that covers all four axes of the big bang cutover gate.

Technical Context

Language/Version: TypeScript 5.x (NestJS 10, Next.js 14), Python 3.11 (OCR sidecar FastAPI)

Primary Dependencies:

Backend: NestJS, BullMQ, TypeORM, CASL, class-validator, class-transformer
Frontend: Next.js, TanStack Query, Zod, shadcn/ui
Sidecar: FastAPI, PyMuPDF (fitz), typhoon-ocr, httpx, FlagEmbedding
Infrastructure: Ollama (Desk-5439), Redis, MariaDB

Storage: MariaDB (ai_audit_logs, ai_prompts, ai_intent_patterns), Redis (BullMQ, cache)
Testing: Jest (backend unit/integration), Vitest (frontend), Pytest (sidecar)
Target Platform: QNAP NAS (backend/frontend containers), Desk-5439 (Ollama + OCR sidecar)
Performance Goals: OCR cold start < 5s (with residency), retrieval CPU fallback < 30s timeout
Constraints: Big bang rollout with no legacy parallel path; LLM-First GPU ownership must be enforced
Scale/Scope: Single-server AI stack on Desk-5439; BullMQ concurrency max ai-realtime: 2, ai-batch: 1

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Rule	Status	Notes
ADR-019 UUID: no parseInt on UUID	✅ Pass	No new UUID handling in this feature
ADR-009: No TypeORM migrations	✅ Pass	No schema changes required
ADR-016 Security: CASL Guard on all API	✅ Required	`large-context` profile must have CASL admin check
ADR-007 Error Handling: layered classification	✅ Required	400 (validation), 403 (profile auth), 504 (CPU timeout)
ADR-008 BullMQ: no inline jobs	✅ Pass	Queue policy adjustment, not new inline processing
ADR-023/023A AI Boundary: no direct DB/storage	✅ Pass	Policy layer stays in NestJS service
ADR-023A BullMQ 2-queue: ai-realtime + ai-batch	✅ Required	concurrency adjustment within existing queues
ADR-002 Doc Numbering: Redis Redlock	✅ N/A	Not applicable to this feature
TypeScript: no `any`, no `console.log`	✅ Required	All new TypeScript code must comply
File headers: `// File: path/filename`	✅ Required	All new files must have header

No constitution violations. Proceeding to Phase 0.
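The layered classification named in the ADR-007 row can be pictured as a small mapping from failure kind to HTTP status. The following is a minimal sketch only: the `AiFailure` kinds and the `toHttpStatus` function are hypothetical names introduced for illustration, and only the three status codes (400, 403, 504) come from the table above.

```typescript
// Hypothetical failure kinds; only the status codes come from the plan.
type AiFailure =
  | { kind: "validation"; field: string }
  | { kind: "profile-forbidden"; profile: string }
  | { kind: "cpu-timeout"; elapsedMs: number };

// Map each failure layer to the HTTP status listed in the constitution check.
function toHttpStatus(failure: AiFailure): 400 | 403 | 504 {
  switch (failure.kind) {
    case "validation":
      return 400; // malformed request, e.g. unknown executionProfile
    case "profile-forbidden":
      return 403; // caller lacks permission for the requested profile
    case "cpu-timeout":
      return 504; // CPU fallback exceeded its time budget
    default: {
      // Compile-time exhaustiveness check: new kinds must be handled above.
      const unreachable: never = failure;
      throw new Error(`unhandled failure: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the mapping in one function makes it straightforward to assert in unit tests that each layer produces exactly one status code.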

Project Structure

Documentation (this feature)

specs/200-fullstacks/235-ai-runtime-policy-refactor/
├── spec.md               # Feature specification
├── plan.md               # This file
├── research.md           # Phase 0 output
├── data-model.md         # Phase 1 output
├── quickstart.md         # Phase 1 output
├── tasks.md              # Phase 2 output
├── checklists/
│   └── requirements.md
└── contracts/
    ├── create-ai-job.dto.ts.md
    ├── execution-policy.interface.ts.md
    └── ocr-residency-policy.interface.ts.md

Source Code (repository root)

backend/src/modules/ai/
├── dto/
│   ├── create-ai-job.dto.ts          # [MODIFY] remove model.key, add executionProfile
│   └── ai-job-response.dto.ts        # [MODIFY] add modelUsed canonical name
├── services/
│   ├── ai.service.ts                  # [MODIFY] add profile validation + canonical name
│   ├── ai-policy.service.ts           # [NEW] ExecutionProfile → RuntimePolicy mapping
│   ├── ocr.service.ts                 # [MODIFY] add adaptive residency calculation
│   └── vram-monitor.service.ts        # [NEW] VRAM headroom query service
├── processors/
│   ├── ai-batch.processor.ts          # [MODIFY] use policy from AiPolicyService
│   └── ai-realtime.processor.ts       # [MODIFY] lightweight job classification + concurrency
├── interfaces/
│   ├── execution-policy.interface.ts  # [NEW] RuntimePolicy type definition
│   └── ocr-residency.interface.ts     # [NEW] OcrResidencyDecision type
├── guards/
│   └── execution-profile.guard.ts     # [NEW] large-context profile admin check
└── ai.module.ts                       # [MODIFY] register new services + guard

backend/src/config/
└── bullmq.config.ts                   # [MODIFY] ai-realtime concurrency uplift config

specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py                             # [MODIFY] adaptive keep_alive, CPU fallback embed/rerank
├── services/
│   ├── vram_monitor.py                # [NEW] VRAM headroom query via Ollama API
│   └── residency_policy.py            # [NEW] keep_alive calculation policy
└── requirements.txt                   # [MODIFY] add nvidia-ml-py or pynvml if needed

frontend/
├── types/
│   └── ai.ts                          # [MODIFY] remove model fields, add executionProfile
├── lib/services/
│   └── admin-ai.service.ts            # [MODIFY] update types + canonical name display
└── components/admin/ai/
    └── OcrSandboxPromptManager.tsx    # [MODIFY] show canonical names in the UI

backend/src/modules/ai/
└── tests/
    ├── ai-policy.service.spec.ts       # [NEW] unit tests profile mapping
    ├── ocr-residency.spec.ts           # [NEW] unit tests adaptive residency
    └── execution-profile.guard.spec.ts # [NEW] unit tests CASL guard

Phases

Phase 1: Foundational — Policy Infrastructure

Must be completed before all other workstreams:

Create VramMonitorService: query VRAM headroom from the Ollama /api/ps endpoint
Create AiPolicyService: map ExecutionProfile → RuntimePolicy
Create ExecutionProfileGuard: CASL check for large-context
Modify CreateAiJobDto: remove model.key and parameter overrides
Implement vram_monitor.py on the sidecar: query GPU headroom
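The ExecutionProfile → RuntimePolicy mapping and the admin check for large-context could look roughly like the sketch below. It is illustrative only: apart from `large-context`, the profile names are hypothetical, the `RuntimePolicy` fields are assumed (the real shape belongs in execution-policy.interface.ts), the canonical model names are placeholders, and a plain `isAdmin` flag stands in for the CASL ability check.

```typescript
// Profile names other than "large-context" are hypothetical placeholders.
type ExecutionProfile = "interactive" | "standard" | "large-context";
type QueueName = "ai-realtime" | "ai-batch";

// Assumed shape; the real RuntimePolicy is defined by the feature's contracts.
interface RuntimePolicy {
  canonicalModel: string; // placeholder canonical name, surfaced as modelUsed
  contextTokens: number;  // assumed tuning parameter
  queue: QueueName;
  requiresAdmin: boolean;
}

const POLICY_TABLE: Readonly<Record<ExecutionProfile, RuntimePolicy>> = {
  interactive: { canonicalModel: "llm-small", contextTokens: 4096, queue: "ai-realtime", requiresAdmin: false },
  standard: { canonicalModel: "llm-default", contextTokens: 8192, queue: "ai-batch", requiresAdmin: false },
  "large-context": { canonicalModel: "llm-default", contextTokens: 32768, queue: "ai-batch", requiresAdmin: true },
};

class ProfileForbiddenError extends Error {
  constructor(profile: ExecutionProfile) {
    super(`profile "${profile}" requires admin permission`);
    this.name = "ProfileForbiddenError";
  }
}

// The caller picks a profile; the backend alone decides model and parameters.
function resolvePolicy(profile: ExecutionProfile, isAdmin: boolean): RuntimePolicy {
  const policy = POLICY_TABLE[profile];
  if (policy.requiresAdmin && !isAdmin) {
    throw new ProfileForbiddenError(profile);
  }
  return policy;
}
```

A table-driven mapping keeps model selection in one place, so a change of GPU or model only touches `POLICY_TABLE` and not the callers.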

Phase 2: Contract & Canonical Naming (Workstream A)

Modify AiService: validate the profile, override data-affecting jobs, log canonical names
Modify ai-job-response.dto.ts: modelUsed becomes the canonical name
Modify frontend types and Admin Console UI: display canonical names
Add rejection tests for model.key and parameter overrides
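The contract change in this phase amounts to accepting executionProfile and rejecting any caller-supplied model or parameter fields. In the NestJS backend this would more likely be expressed with class-validator decorators and a ValidationPipe configured with `whitelist` and `forbidNonWhitelisted`; the dependency-free function below is only a sketch of the intended behaviour. The field names `jobType` and `payload`, the profile names other than `large-context`, and the error wording are all assumptions.

```typescript
// Hypothetical profile list; only "large-context" is named in the plan.
const EXECUTION_PROFILES: readonly string[] = ["interactive", "standard", "large-context"];

// Assumed request fields; anything else (model, temperature, ...) is rejected.
const ALLOWED_KEYS: readonly string[] = ["jobType", "executionProfile", "payload"];

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

function validateCreateAiJob(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { valid: false, errors: ["body must be a JSON object"] };
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];

  // Reject legacy caller-driven fields instead of silently dropping them.
  for (const key of Object.keys(record)) {
    if (ALLOWED_KEYS.indexOf(key) === -1) {
      errors.push(`unexpected property: ${key}`);
    }
  }

  const jobType = record["jobType"];
  if (typeof jobType !== "string" || jobType.length === 0) {
    errors.push("jobType must be a non-empty string");
  }

  const profile = record["executionProfile"];
  if (typeof profile !== "string" || EXECUTION_PROFILES.indexOf(profile) === -1) {
    errors.push("executionProfile must be one of: " + EXECUTION_PROFILES.join(", "));
  }

  return { valid: errors.length === 0, errors };
}
```

A request that still carries `model: { key: ... }` or a `temperature` override yields one error per offending property, which is the behaviour the rejection tests would assert (HTTP 400 at the API layer).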

Phase 3: Adaptive OCR Residency (Workstream B)

Modify OcrService: inject VramMonitorService and compute keep_alive dynamically
Implement residency_policy.py on the sidecar: accept keep_alive from the backend policy
Add unit tests for residency scenarios
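One way to picture the dynamic keep_alive calculation is sketched below. It assumes headroom is estimated from the `size_vram` values reported by Ollama's /api/ps response, which only covers models loaded through Ollama and ignores other GPU processes. The thresholds, the keep-alive durations, and the `OcrResidencyDecision` fields are hypothetical, not values from the spec.

```typescript
// Subset of the Ollama GET /api/ps response that this sketch relies on.
interface OllamaPsModel {
  name: string;
  size_vram: number; // bytes of VRAM held by the loaded model
}
interface OllamaPsResponse {
  models: OllamaPsModel[];
}

const BYTES_PER_MIB = 1024 * 1024;

// Estimate free VRAM as total minus what Ollama reports as loaded.
function vramHeadroomMiB(ps: OllamaPsResponse, totalVramMiB: number): number {
  const usedBytes = ps.models.reduce((sum, m) => sum + m.size_vram, 0);
  return Math.max(0, totalVramMiB - Math.round(usedBytes / BYTES_PER_MIB));
}

// Assumed shape of the residency decision.
interface OcrResidencyDecision {
  keepAliveSeconds: number;
  reason: "ample-headroom" | "tight-headroom" | "llm-first-eviction";
}

// Hypothetical thresholds: keep OCR resident longer the more headroom exists,
// and unload immediately when the LLM needs the memory (LLM-First ownership).
function decideOcrResidency(headroomMiB: number, ocrModelMiB: number): OcrResidencyDecision {
  if (headroomMiB >= 2 * ocrModelMiB) {
    return { keepAliveSeconds: 600, reason: "ample-headroom" };
  }
  if (headroomMiB >= ocrModelMiB) {
    return { keepAliveSeconds: 60, reason: "tight-headroom" };
  }
  return { keepAliveSeconds: 0, reason: "llm-first-eviction" };
}
```

For example, with one 8 GiB model loaded on a 16 GiB card, `vramHeadroomMiB` reports 8192 MiB, so a 4096 MiB OCR model would fall in the ample-headroom tier under these assumed thresholds.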

Phase 4: Retrieval Acceleration (Workstream C)

Modify app.py: add a GPU headroom check in /embed and /rerank
Add a CPU fallback path with logging
Modify ai-batch.processor.ts for RAG query fallback handling
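The headroom check and CPU fallback described above reduce to a small decision plus a timeout rule. The sidecar itself is Python; the sketch below expresses the same logic in TypeScript purely for illustration. The 30 s CPU budget and the 504 status come from this plan, while the GPU timeout, the log format, and all function names are assumptions.

```typescript
type ComputeDevice = "gpu" | "cpu";

interface RetrievalPlan {
  device: ComputeDevice;
  timeoutMs: number;
  isFallback: boolean;
}

const CPU_FALLBACK_TIMEOUT_MS = 30000; // from the plan's performance goal
const GPU_TIMEOUT_MS = 10000;          // hypothetical value

// Use the GPU only when the embed/rerank model fits in the current headroom.
function planRetrieval(headroomMiB: number, requiredMiB: number): RetrievalPlan {
  if (headroomMiB >= requiredMiB) {
    return { device: "gpu", timeoutMs: GPU_TIMEOUT_MS, isFallback: false };
  }
  return { device: "cpu", timeoutMs: CPU_FALLBACK_TIMEOUT_MS, isFallback: true };
}

// Build the log line emitted whenever the CPU path is taken; null otherwise.
function fallbackLogLine(
  endpoint: "/embed" | "/rerank",
  plan: RetrievalPlan,
  headroomMiB: number,
): string | null {
  if (!plan.isFallback) {
    return null;
  }
  return `retrieval_cpu_fallback endpoint=${endpoint} headroom_mib=${headroomMiB} timeout_ms=${plan.timeoutMs}`;
}

// A run that exceeds its budget is reported as a gateway timeout.
function statusForElapsed(plan: RetrievalPlan, elapsedMs: number): 200 | 504 {
  return elapsedMs > plan.timeoutMs ? 504 : 200;
}
```

Separating the decision from the actual model call keeps the fallback rule unit-testable without a GPU present.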

Phase 5: Queue Policy (Workstream D)

Modify bullmq.config.ts: ai-realtime concurrency = 2 (configurable)
Modify ai-realtime.processor.ts: classify lightweight vs generation-heavy jobs
Verify that rag-query is routed to ai-batch only
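The queue policy can be sketched as two pure functions: one that resolves a configurable concurrency within the stated maximum, and one that routes jobs to a queue. Only the queue names, the maxima (ai-realtime: 2, ai-batch: 1), and the `rag-query` job type come from this plan; the other job types and the classification of what counts as lightweight are hypothetical.

```typescript
type QueueName = "ai-realtime" | "ai-batch";

// Only "rag-query" is named in the plan; the other job types are hypothetical.
type JobType = "rag-query" | "intent-classify" | "ocr-extract" | "summarize";

const MAX_CONCURRENCY: Readonly<Record<QueueName, number>> = {
  "ai-realtime": 2,
  "ai-batch": 1,
};

// Resolve an optional override (for example read from an environment variable),
// clamped to the range 1..max; missing or non-numeric input falls back to max.
function resolveConcurrency(queue: QueueName, raw?: string): number {
  const max = MAX_CONCURRENCY[queue];
  if (raw === undefined) {
    return max;
  }
  const parsed = parseInt(raw, 10);
  if (isNaN(parsed)) {
    return max;
  }
  return Math.min(Math.max(parsed, 1), max);
}

// Assumed classification: only these jobs are cheap enough for ai-realtime.
const LIGHTWEIGHT_JOBS: readonly JobType[] = ["intent-classify"];

function routeJob(jobType: JobType): QueueName {
  // Explicit rule from the plan: rag-query never runs on ai-realtime.
  if (jobType === "rag-query") {
    return "ai-batch";
  }
  return LIGHTWEIGHT_JOBS.indexOf(jobType) !== -1 ? "ai-realtime" : "ai-batch";
}
```

Defaulting unknown or generation-heavy jobs to ai-batch is the conservative choice here, since that queue runs one job at a time and cannot oversubscribe the GPU.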

Phase 6: Verification & Cutover (Workstream E)

Consolidate the test suite across all four axes
Manual validation checklist (Admin Console, OCR Sandbox)
Cutover gate verification

Complexity Tracking

No constitution violations that require justification.
