lcbp3/specs/300-others/302-ai-model-revision/plan.md

# Implementation Plan: AI Model Revision (ADR-023A)

**Branch**: `main` | **Date**: 2026-05-15 | **Spec**: [spec.md](./spec.md)
**Feature Dir**: `specs/300-others/302-ai-model-revision/`

---

## Summary

Implement ADR-023A AI Architecture Revision: เปลี่ยน model stack จาก 3-model (gemma4:9b + Typhoon + nomic-embed-text) เป็น 2-model (gemma4:e2b + nomic-embed-text), แยก BullMQ เป็น 2 queues (`ai-realtime`/`ai-batch`), เพิ่ม OCR auto-detection, enforce multi-tenant QdrantService, implement Legacy Migration pipeline และ migration_review_queue, และลบ Typhoon Cloud API ออกจาก codebase ทั้งหมด

---

## Technical Context

**Language/Version**: TypeScript 5.x (strict mode)
**Primary Dependencies**:
- Backend: NestJS 10, BullMQ 5, TypeORM 0.3, ioredis (Redis 7), @qdrant/js-client-rest
- AI Infrastructure: Ollama (Desk-5439), PaddleOCR, PyMuPDF (Python sidecar)
- Queue: Redis 7 (same instance as existing BullMQ)
**Storage**: MariaDB (existing) + Qdrant (external vector DB) + Local Storage (existing)
**Testing**: Jest (NestJS unit/integration)
**Target Platform**: QNAP NAS (NestJS container) + Admin Desktop Desk-5439 (Ollama)
**Performance Goals**: ai-suggest < 30s; rag-query < 10s (p95 dequeue-to-response)
**Constraints**: VRAM ≤ 3GB peak, concurrency=1 per queue (prevent GPU overflow)
**Scale/Scope**: ~20,000 legacy docs (migration), ~50 new docs/day (production)

---

## Constitution Check

_GATE: Must pass before Phase 0 research._

| Rule | Status | Notes |
|------|--------|-------|
| ADR-019 UUID: no parseInt on UUID | ✅ PASS | BullMQ payloads ใช้ `publicId: string` เสมอ |
| ADR-009: no TypeORM migrations | ✅ PASS | `migration_review_queue` ผ่าน SQL delta (#14) |
| ADR-016: RBAC on all endpoints | ✅ PASS | AI endpoints จะมี CASL guard: `ai.manage` |
| ADR-007: error handling layered | ✅ PASS | BullMQ failed jobs → dead-letter + log |
| ADR-008: BullMQ for async | ✅ PASS | Inference ทั้งหมดผ่าน BullMQ (ไม่มี inline) |
| ADR-023/023A: no direct Ollama | ✅ PASS | n8n → DMS API → BullMQ → Ollama เท่านั้น |
| ADR-023A: QdrantService required projectPublicId | ✅ PASS | Enforce ที่ TypeScript compile-time |
| TypeScript strict: no `any`, no `console.log` | ✅ PASS | Enforced ผ่าน eslint |
| **Typhoon Cloud API removal** | ⚠️ PENDING | `rag/typhoon.service.ts` ต้อง delete (T002) |

---

## Project Structure

### Documentation (this feature)

```text
specs/300-others/302-ai-model-revision/
├── spec.md              ✅ done
├── plan.md              ✅ this file
├── research.md          ✅ done
├── data-model.md        ✅ done
├── quickstart.md        (Phase 1)
├── contracts/           (Phase 1)
│   ├── ai-jobs.yaml
│   └── migration-queue.yaml
├── checklists/
│   └── requirements.md  ✅ done
└── tasks.md             (Phase 2 — speckit-tasks)
```

### Schema Delta (ADR-009)

```text
specs/03-Data-and-Storage/deltas/
└── 14-add-migration-review-queue.sql    # new
```

### Source Code

```text
backend/src/modules/ai/
├── ai.module.ts                         # update: register 2 queues, remove Typhoon
├── ai.controller.ts                     # update: add /migration/queue endpoint
├── ai.service.ts                        # update: routing logic, queue selection
├── processors/
│   ├── ai-realtime.processor.ts         # new: ai-realtime consumer
│   └── ai-batch.processor.ts            # new: ai-batch consumer (replaces existing)
├── services/
│   ├── ollama.service.ts                # update: model → gemma4:e2b
│   ├── qdrant.service.ts                # update: enforce projectPublicId param
│   ├── ocr.service.ts                   # new: OCR auto-detect + PaddleOCR routing
│   ├── migration.service.ts             # new: Legacy Migration pipeline
│   └── embedding.service.ts            # new: full-doc chunking + embed
├── dto/
│   ├── create-ai-job.dto.ts             # update: queue discriminator field
│   ├── migration-queue-item.dto.ts      # new
│   └── rag-query.dto.ts                 # new
├── entities/
│   └── migration-review-queue.entity.ts # new
└── rag/
    ├── rag.service.ts                   # update: remove typhoon ref, use QdrantService
    └── typhoon.service.ts               # DELETE ← Tier 1 critical

backend/src/config/
└── bullmq.config.ts                     # update: add ai-batch queue config

frontend/app/(dashboard)/ai-staging/
├── page.tsx                             # update: add migration queue tab
└── migration-review/
    └── page.tsx                         # new: Admin Migration Review UI

frontend/components/ai/
├── ai-suggestion-field.tsx              # update: confidence threshold display
├── migration-queue-table.tsx            # new: queue list + approve/reject
└── AiStatusBanner.tsx                   # update: show queue status (ai-batch paused)
```

---

## Phases

### Phase 0: Cleanup & Foundation (Tier 1 Critical First)

**Goal**: ลบ Typhoon ออก, ตั้ง BullMQ 2-queue, สร้าง Schema Delta

Tasks: T001–T008

### Phase 1: Core AI Pipeline

**Goal**: OCR auto-detect, gemma4:e2b integration, ai-suggest + embed-document flows

Tasks: T009–T022

### Phase 2: RAG Pipeline

**Goal**: QdrantService multi-tenancy, chunking, rag-query endpoint

Tasks: T023–T030

### Phase 3: Legacy Migration Pipeline

**Goal**: migration_review_queue, n8n API endpoint, Admin Review UI

Tasks: T031–T042

### Phase 4: Monitoring & Threshold Management

**Goal**: Admin Dashboard AI metrics, threshold config, audit log delete permission

Tasks: T043–T050

---

## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|-----------|-------------------------------------|
| 2-queue BullMQ (vs single) | RAG SLA requires isolation from batch jobs | Single queue + priority ไม่ป้องกัน long-running job block |
| External Qdrant (vs SQL FTS) | Semantic search capability ไม่มีใน MariaDB FULLTEXT | MariaDB FTS ไม่รองรับ multilingual semantic similarity |
| Python sidecar OCR | PaddleOCR เป็น Python library ไม่มี Node.js binding | ไม่มีทางเลือก OCR ภาษาไทยที่เทียบเท่าใน Node.js ecosystem |

---

## 🔗 Cross-Spec Dependencies

### Dependencies กับ 204-rfa-approval-refactor

| Component | Impact | Coordination |
|-----------|--------|--------------|
| **BullMQ Queues** | `ai-realtime` และ `ai-batch` จะถูกใช้โดย RFA Reminder/Escalation | ตรวจสอบว่า RFA jobs ไม่ชนกับ AI jobs — ใช้ queue name prefix หรือ priority |
| **QdrantService** | อาจถูกใช้สำหรับ RFA document context | ตรวจสอบว่า RFA ใช้ projectPublicId filter ถูกต้อง |
| **Ollama GPU** | Shared resource กับ RFA (ถ้ามี AI features) | ตรวจสอบว่าไม่มี GPU contention |

### Shared Infrastructure

- **BullMQ**: 2 queues (`ai-realtime`, `ai-batch`) — RFA อาจเพิ่ม `rfa-reminders` queue
- **Redis**: ใช้ instance เดียวกัน — ตรวจสอบ memory usage
- **Audit Logs**: ใช้ table เดียวกัน — ตรวจสอบ action types ไม่ซ้ำกัน

### Deployment Sequence

1. Phase 0-1 ของ AI Model Revision (BullMQ 2-queue foundation)
2. Phase 1-3 ของ RFA Approval Refactor (ใช้ BullMQ ที่ setup แล้ว)
3. ทดสอบ integration ก่อน deploy ทั้งสอง features พร้อมกัน