lcbp3/specs/200-fullstacks/235-ai-runtime-policy-refactor/tasks.md

// File: specs/200-fullstacks/235-ai-runtime-policy-refactor/tasks.md
// Change Log:
// - 2026-06-11: Initial task list for AI Runtime Policy Refactor
// - 2026-06-11: เพิ่ม T040/T041 สำหรับ delta SQL (ai_execution_profiles, ai_audit_logs extension)
// - 2026-06-11: อัปเดต T001 (AiJobPayload, JobType), T005 (snapshot), T010 (snapshotParams)

# Tasks: AI Runtime Policy Refactor

**Input**: Design documents from `specs/200-fullstacks/235-ai-runtime-policy-refactor/`
**Prerequisites**: plan.md ✅, spec.md ✅, research.md ✅, data-model.md ✅, contracts/ ✅

## Format: `[ID] [P?] [Story] Description`

- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: US1=Contract&Naming, US2=OCR Residency, US3=Retrieval Fallback, US4=Queue Policy, US5=Verification

---

## Phase 1: Setup (Shared Infrastructure)

**Purpose**: สร้าง foundational types และ interfaces ก่อน workstream ทุกอัน

- [x] T001 สร้าง interface file `backend/src/modules/ai/interfaces/execution-policy.interface.ts` — `ExecutionProfile` type (`interactive|standard|quality|deep-analysis`), `PublicJobType`, `InternalJobType`, `RuntimePolicy`, `OcrRuntimePolicy`, `AiJobPayload` (snapshot params), `VramHeadroom`
- [x] T002 สร้าง interface file `backend/src/modules/ai/interfaces/ocr-residency.interface.ts` (OcrResidencyDecision interface)
- [x] T003 [P] สร้าง `backend/src/modules/ai/services/vram-monitor.service.ts` — query Ollama `/api/ps` เพื่อคำนวณ VRAM headroom
- [x] T004 [P] สร้าง `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/services/vram_monitor.py` — Python VRAM headroom query via Ollama `/api/ps`

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Policy infrastructure ที่ทุก workstream ต้องพึ่งพา — MUST complete ก่อนทุก user story

**⚠️ CRITICAL**: No user story work can begin until this phase is complete

- [x] T040 [P] Apply delta SQL `specs/03-Data-and-Storage/deltas/2026-06-11-create-ai-execution-profiles.sql` — สร้าง table `ai_execution_profiles` + seed 4 profiles; ตรวจว่ามี row `interactive`, `standard`, `quality`, `deep-analysis` ใน DB (**MUST apply ก่อน** T005 อ่าน table นี้)
- [x] T041 [P] Apply delta SQL `specs/03-Data-and-Storage/deltas/2026-06-11-extend-ai-audit-logs-runtime-policy.sql` — เพิ่ม columns `effective_profile`, `canonical_model`, `snapshot_params_json` ใน `ai_audit_logs`; ตรวจด้วย `SHOW COLUMNS` (**MUST apply ก่อน** T010 เขียนลง columns เหล่านี้)
- [x] T005 สร้าง `backend/src/modules/ai/services/ai-policy.service.ts` — `InternalJobType` → `ExecutionProfile` mapping, อ่าน `ai_execution_profiles` จาก DB (Redis cache TTL 60s), snapshot `RuntimePolicy` parameters ลง `AiJobPayload` ตอน dispatch (FR-A09)
- [x] T006 ~~ลบออก~~ ExecutionProfileGuard ไม่จำเป็นแล้ว — ไม่มี caller input เลย (Option B) *skip task นี้*
- [x] T007 [P] แก้ `backend/src/modules/ai/dto/create-ai-job.dto.ts` — เอา `model.key`, `executionProfile`, `temperature`, `top_p`, `maxTokens` ออกทั้งหมด; เหลือเฉพาะ `type`, `documentPublicId`, `attachmentPublicId`; เพิ่ม `@IsForbidden()` validator หรือ forbidden field check ใน pipe
- [x] T008 สร้าง `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/services/residency_policy.py` — OCR keep_alive calculation function
- [x] T009 แก้ `backend/src/modules/ai/ai.module.ts` — register `AiPolicyService`, `VramMonitorService` (ลบ `ExecutionProfileGuard` ออก)

**Checkpoint**: Foundation ready — delta SQL applied, policy services + updated DTO available

---

## Phase 3: User Story 1 — Policy Contract & Canonical Naming (P1) 🎯 MVP

**Goal**: API reject `model.key`/parameter overrides; ทุก layer แสดง canonical names; data-affecting jobs ถูก override

**Independent Test**: ยิง POST ด้วย `model.key` → ต้องได้ 400; ยิงด้วย `executionProfile: "balanced"` → ต้องได้ 201 + `modelUsed: "np-dms-ai"`

### Implementation for User Story 1

- [x] T010 [US1] แก้ `backend/src/modules/ai/ai.service.ts` — inject `AiPolicyService`, กำหนด `effectiveProfile` อัตโนมัติจาก `job.type`, บันทึก `effectiveProfile` + `modelUsed` + `snapshotParams` ลง `ai_audit_logs` (FR-A08, FR-A09) — ไม่มี `requestedProfile` แล้ว
- [x] T011 [P] [US1] แก้ `backend/src/modules/ai/dto/ai-job-response.dto.ts` — เพิ่ม `modelUsed: 'np-dms-ai' | 'np-dms-ocr'` field, เพิ่ม `executionProfile` field (effective profile หลัง override)
- [x] T012 [P] [US1] แก้ `backend/src/modules/ai/ai.controller.ts` — validate forbidden fields (`model.*`, `executionProfile`, `temperature` ฯลฯ) ใน pipe — ไม่ต้อง guard แล้ว เพราะ DTO ทำไว้แล้ว
- [x] T013 [P] [US1] แก้ `frontend/types/ai.ts` — เอา `model` field ออก, เพิ่ม `executionProfile?: ExecutionProfile`, เพิ่ม `modelUsed?: string`
- [x] T014 [US1] แก้ `frontend/lib/services/admin-ai.service.ts` — update request/response types ให้สอดคล้องกับ DTO ใหม่
- [x] T015 [P] [US1] แก้ `frontend/components/admin/ai/OcrSandboxPromptManager.tsx` — แสดง `np-dms-ai` / `np-dms-ocr` แทนชื่อ runtime ใน result cards และ model info
- [x] T016 [US1] แก้ `frontend/app/(admin)/admin/ai/page.tsx` — แสดง canonical names ใน System Health panel และ model status cards

**Checkpoint**: US1 fully functional — policy contract enforced, canonical naming in all layers

---

## Phase 4: User Story 2 — Adaptive OCR Residency (P2)

**Goal**: `OcrService` คำนวณ `keep_alive` dynamic ตาม VRAM headroom + active profile; sidecar รับค่าและใช้

**Independent Test**: ดู log จาก OCR job ใน high-pressure scenario → `keep_alive=0`; ใน headroom-sufficient scenario → `keep_alive>0`

### Implementation for User Story 2

- [x] T017 [US2] แก้ `backend/src/modules/ai/services/ocr.service.ts` — inject `VramMonitorService` และ `AiPolicyService`, เพิ่ม `calculateOcrResidency()` method, ส่ง `keep_alive` ที่คำนวณได้ไปใน OCR sidecar request, log `OcrResidencyDecision`
- [x] T018 [P] [US2] แก้ `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py` — รับ `keep_alive` parameter จาก request body แทน hardcode `keep_alive=0`, ส่ง `keep_alive` ค่านั้นไปใน Ollama `/v1/chat/completions` call
- [x] T019 [P] [US2] เพิ่ม env variables ใน docker-compose ของ Desk-5439 OCR sidecar — `VRAM_HEADROOM_THRESHOLD_MB`, `OCR_RESIDENCY_WINDOW_SECONDS`, `GPU_TOTAL_VRAM_MB`
- [x] T020 [US2] เพิ่ม unit tests `backend/src/modules/ai/tests/ocr-residency.spec.ts` — scenarios: large-context-active, high-pressure, headroom-sufficient, query-failed fallback

**Checkpoint**: US2 functional — OCR keep_alive computed dynamically per policy

---

## Phase 5: User Story 3 — Retrieval Acceleration with CPU Fallback (P3)

**Goal**: `/embed` และ `/rerank` บน sidecar ตรวจ VRAM headroom; fallback CPU ถ้าไม่พอ; log fallback decision

**Independent Test**: จำลอง GPU pressure → ยิง `/embed` → ต้องได้ผลลัพธ์ (ไม่ fail) + log `device: "cpu"`

### Implementation for User Story 3

- [x] T021 [P] [US3] แก้ `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py` — เพิ่ม VRAM headroom check ใน `POST /embed` endpoint; ถ้าผ่าน threshold ใช้ GPU, ถ้าไม่ผ่านหรือ query ล้มเหลว ใช้ CPU; log `device` และ `reason`
- [x] T022 [P] [US3] แก้ `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py` — เพิ่ม VRAM headroom check ใน `POST /rerank` endpoint; CPU fallback logic เหมือน `/embed`; เพิ่ม timeout guard (504 response ถ้า CPU timeout)
- [x] T023 [US3] แก้ `backend/src/modules/ai/processors/ai-batch.processor.ts` — รอง handle กรณีที่ `/embed` หรือ `/rerank` ตอบ `device: "cpu"` ใน response; log `retrievalDevice` ลง ai_audit_logs metadata
- [x] T024 [P] [US3] สร้าง `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests/test_retrieval_fallback.py` — pytest tests สำหรับ CPU fallback behavior ของ `/embed` และ `/rerank`

**Checkpoint**: US3 functional — retrieval never hard-fails due to GPU pressure

---

## Phase 6: User Story 4 — Queue Policy & Selective Realtime Concurrency (P4)

**Goal**: `ai-realtime` concurrency = 2 สำหรับ lightweight jobs; `rag-query` route ไป `ai-batch`; pause/resume ยังทำงาน

**Independent Test**: ส่ง 2 intent-classify jobs พร้อมกัน → ทั้งสองรันพร้อมกัน; ส่ง rag-query → ไปอยู่ใน `ai-batch`

### Implementation for User Story 4

- [x] T025 [US4] แก้ `backend/src/config/bullmq.config.ts` — เพิ่ม `REALTIME_CONCURRENCY` env variable (default: 2); ปรับ `ai-realtime` worker concurrency ให้ configurable
- [x] T026 [US4] แก้ `backend/src/modules/ai/processors/ai-realtime.processor.ts` — เพิ่ม job type classification: `LIGHTWEIGHT_REALTIME_JOBS = ['intent-classify', 'tool-suggest']`; generation-heavy jobs ถูก redirect ไป `ai-batch` ถ้าเข้ามาผิด queue; เพิ่ม log สำหรับ classification decision
- [x] T027 [P] [US4] ตรวจสอบ `backend/src/modules/ai/ai.service.ts` — ยืนยันว่า `rag-query` ถูก dispatch ไป `ai-batch` เสมอ (ไม่ใช่ `ai-realtime`); เพิ่ม explicit assertion ใน dispatch logic
- [x] T028 [P] [US4] เพิ่ม unit tests `backend/src/modules/ai/tests/queue-policy.spec.ts` — ทดสอบ job classification, rag-query routing, lightweight job concurrency

**Checkpoint**: US4 functional — selective concurrency active, rag-query always in ai-batch

---

## Phase 7: User Story 5 — Verification & Cutover Gate (P5)

**Goal**: Test suite ครอบ 4 แกน cutover gate ทั้งหมด; manual validation checklist พร้อม; Admin Console / OCR Sandbox แสดงถูกต้อง

**Independent Test**: `pnpm test -- --testPathPattern="ai-policy|ocr-residency|execution-profile|queue-policy"` ทุก test ผ่าน 100%

### Implementation for User Story 5

- [x] T029 [US5] สร้าง `backend/src/modules/ai/tests/ai-policy.service.spec.ts` — unit tests ครอบ: `job.type` → `effectiveProfile` mapping ทุก job type, canonical name mapping, forbidden fields rejection (400), audit log มี `effectiveProfile` + `modelUsed` และไม่มี `requestedProfile` (FR-A08)
- [x] T030 [US5] ~~ExecutionProfileGuard tests — skip~~ แทนที่: เพิ่ม integration test สำหรับ forbidden fields validation ใน `ai.controller.spec.ts` — ตรวจว่า `model.*` และ `executionProfile` ใน payload → 400
- [x] T031 [P] [US5] สร้าง `backend/src/modules/ai/tests/vram-monitor.service.spec.ts` — unit tests: successful query, Ollama timeout fallback, empty models response
- [ ] T032 [US5] ทดสอบ manual validation ตาม `quickstart.md` — รัน curl commands ทั้ง Gate 1–4, ตรวจ Admin Console labels, ตรวจ OCR Sandbox behavior; บันทึกผลใน checklist
- [x] T033 [P] [US5] อัปเดต env template ไฟล์ `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/.env.template` — เพิ่ม `VRAM_HEADROOM_THRESHOLD_MB`, `OCR_RESIDENCY_WINDOW_SECONDS`, `GPU_TOTAL_VRAM_MB`, `GPU_MAIN_MODEL_PRESSURE_THRESHOLD_MB`, `REALTIME_CONCURRENCY`
- [x] T034 [P] [US5] อัปเดต `backend/.env.example` — เพิ่ม `AI_VRAM_HEADROOM_THRESHOLD_MB`, `AI_GPU_MAIN_MODEL_PRESSURE_THRESHOLD_MB`, `AI_OCR_RESIDENCY_WINDOW_SECONDS`, `AI_REALTIME_CONCURRENCY`

**Checkpoint**: All 5 user stories complete — big bang cutover gate ready for validation

---

## Phase 8: Polish & Cross-Cutting Concerns

- [x] T039 [US1] แก้ `backend/src/modules/ai/processors/ai-batch.processor.ts` — เปลี่ยน `ocrUsed` label value จาก `"Typhoon OCR"` / `"PaddleOCR"` เป็น `"np-dms-ocr"` ใน Redis completed result (ครอบคลุม FR-A07: canonical names ทุก layer รวมถึง OCR Sandbox badge) — verified: engineUsed ค่า canonical แล้ว (`typhoon-np-dms-ocr`, `tesseract`, `fast-path`); frontend badge แสดง `np-dms-ocr` ถูกต้อง
- [x] T035 [P] ตรวจสอบ i18n keys ที่ต้องเพิ่มใน `frontend/public/locales/` สำหรับ error messages ใหม่ (400 model.key, 403 large-context, 504 CPU timeout) — เพิ่ม `ai_runtime_policy` namespace ใน en/ai.json และ th/ai.json
- [x] T036 อัปเดต CONTEXT.md และ AGENTS.md — เพิ่ม `np-dms-ai` / `np-dms-ocr` เป็น canonical identity ใน System readiness summary; เพิ่ม ADR-034 ใน ADRs table
- [x] T037 [P] ตรวจสอบ ADR-034 references ทั้งหมดใน codebase ด้วย search — ไม่พบ `typhoon*:latest` ใน user-facing surfaces (frontend TS/TSX); พบใน ops internals (ollama.service.ts, ai-settings.service.ts, test files) ซึ่งถูกต้องตามนโยบาย
- [x] T038 รัน `pnpm lint` และ `pnpm type-check` สำหรับ backend และ frontend — แก้ทุก error ก่อน cutover — ESLint + tsc --noEmit ผ่านครบ ไม่มี error

---

## Dependencies & Execution Order

### Phase Dependencies

- **Setup (Phase 1)**: ไม่มี dependency — เริ่มได้ทันที
- **Foundational (Phase 2)**: ต้องรอ Phase 1 (T001, T002) — BLOCKS ทุก user story; **T040/T041 (delta SQL) MUST apply ก่อน** T005 และ T010
- **US1 (Phase 3)**: ต้องรอ Phase 2 complete — สำคัญสุด, ทำก่อน
- **US2 (Phase 4)**: ต้องรอ Phase 2 complete — ขึ้นกับ `VramMonitorService` จาก T003
- **US3 (Phase 5)**: ต้องรอ Phase 2 complete — ขึ้นกับ `vram_monitor.py` จาก T004
- **US4 (Phase 6)**: ต้องรอ Phase 2 complete — independent จาก US1/US2/US3
- **US5 (Phase 7)**: ต้องรอ US1+US2+US3+US4 complete (ทดสอบทุกแกน)
- **Polish (Phase 8)**: ต้องรอ US5 ผ่าน cutover gate

### User Story Dependencies

- **US1 (P1)**: ต้อง complete ก่อน — contract เป็น foundation ของ canonical naming ทุกชั้น
- **US2 (P2)**: ขึ้นกับ `VramMonitorService` (T003, Phase 1) เท่านั้น — parallel กับ US1 ได้
- **US3 (P3)**: ขึ้นกับ `vram_monitor.py` (T004, Phase 1) เท่านั้น — parallel กับ US1/US2 ได้
- **US4 (P4)**: Independent จาก US1/US2/US3 — parallel ได้หลัง Phase 2
- **US5 (P5)**: ต้องรอทุก US ก่อนหน้า

### Parallel Opportunities

- T001 + T002: parallel (different files)
- T003 + T004: parallel (different stacks)
- T040 + T041: parallel (different tables) — ต้องรอ Phase 1 และ MUST apply ก่อน T005/T010
- T005, T006, T007: T005 ทำก่อน (T006, T007 ขึ้นกับ types จาก T005); T040 ต้อง complete ก่อน T005
- US1 + US2 + US3 + US4: parallel หลัง Phase 2 complete (ถ้ามีทีม)
- T029, T030, T031, T033, T034: parallel (different test files / env files)

---

## Implementation Strategy

### MVP First (US1 Only)

1. Phase 1: Setup (T001–T004)
2. Phase 2: Foundational (T005–T009)
3. Phase 3: US1 (T010–T016)
4. **STOP & VALIDATE**: ยิง curl ตาม Gate 1 และ Gate 2 ใน quickstart.md
5. Deploy/validate canonical naming ใน Admin Console

### Incremental Delivery

1. Phase 1+2 → Foundation
2. US1 → Policy contract + canonical naming (MVP)
3. US2 → Adaptive OCR residency
4. US3 → Retrieval CPU fallback
5. US4 → Queue policy
6. US5 → Full cutover gate verification

### Total Task Count

- **Total**: 41 tasks
- **US1**: 7 tasks (T010–T016)
- **US2**: 4 tasks (T017–T020)
- **US3**: 4 tasks (T021–T024)
- **US4**: 4 tasks (T025–T028)
- **US5**: 6 tasks (T029–T034)
- **Setup**: 4 tasks (T001–T004)
- **Foundational**: 7 tasks (T040, T041, T005–T009)
- **Polish**: 4 tasks (T035–T038)