690609:2223 Prepare to MOD AI flow [skip CI]

2026-06-09 22:23:59 +07:00
parent 75d07b5ac9
commit cd7d20ccd4
13 changed files with 671 additions and 2 deletions
@@ -1,5 +1,36 @@
 # Version History

+## 1.9.10 (2026-06-08)
+
+### bugfix(ai): Fix LLM JSON Response Truncation in OCR Sandbox & Migration
+
+#### Summary
+
+Fixes LLM JSON response truncation in OCR Sandbox Step 2 and the Migration Pipeline by expanding Ollama's `num_ctx` context window to `16384` for extraction jobs (fix carried out by AGY Gemini 3.5 Flash (Medium)).
+
+#### Changes
+
+- **Ollama Context Window Expansion**: Added the `num_ctx: 16384` parameter in `processSandboxExtract` and `processSandboxAiExtract` for Sandbox extraction jobs, to accommodate large inputs (up to 15,000 characters).
+- **Migration Pipeline Hardening**: Updated `processMigrateDocument` to always send `format: 'json'` and `options: { num_ctx: 16384, num_predict: 4096 }`, matching the Sandbox behavior.
+- **Regression Tests**: Updated the unit tests in `ai-batch.processor.spec.ts` to match the new Ollama call parameters.
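
As a rough illustration of the change list above, the request sent to Ollama might look like the sketch below. The field names follow the body of Ollama's `/api/generate` endpoint; the `OllamaGenerateRequest` type and the `buildExtractRequest` helper are hypothetical names introduced here and are not taken from `ai-batch.processor.ts`.

```typescript
// Hypothetical request shape; field names mirror Ollama's /api/generate body.
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  format: "json";
  stream: boolean;
  options: { num_ctx: number; num_predict: number };
}

// Builds the payload for an extraction call with the expanded context window.
// buildExtractRequest is an illustrative helper, not the project's real function.
function buildExtractRequest(prompt: string): OllamaGenerateRequest {
  return {
    model: "typhoon2.5-np-dms:latest",
    prompt,
    format: "json", // ask Ollama to constrain the output to valid JSON
    stream: false,
    options: { num_ctx: 16384, num_predict: 4096 },
  };
}

const request = buildExtractRequest("Extract metadata from: ...");
console.log(JSON.stringify(request.options));
```

Keeping the options in one builder is one way to make the Sandbox and Migration paths send identical parameters.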
+
+---
+
+## 1.9.9 (2026-06-06)
+
+### feat(ai): LLM JSON Parse Failure & VRAM Fix (ADR-035-135)
+
+#### Summary
+
+Fixes JSON parse errors and VRAM memory issues by adding retry logic and improving VRAM switching.
+
+#### Changes
+
+- **JSON Parse Retry**: Added retry logic (2 attempts) for JSON parse failures, with detailed logging.
+- **VRAM limit**: Set `keep_alive=0` for the OCR model and fixed the memory leak in the Node.js/ESLint heap.
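
The retry behavior described above could be sketched roughly as follows. This assumes "2 attempts" means two calls in total, and `parseJsonWithRetry` and `fakeGenerate` are hypothetical names; the real processor may structure this differently.

```typescript
// Calls `generate` up to `maxAttempts` times until its output parses as JSON.
// Assumption: "2 attempts" means two calls in total, not two extra retries.
async function parseJsonWithRetry<T>(
  generate: () => Promise<string>,
  maxAttempts = 2,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate();
    try {
      return JSON.parse(raw) as T;
    } catch (error) {
      lastError = error;
      // Log the attempt number and the start of the raw reply for debugging.
      console.warn(`JSON parse failed (attempt ${attempt}/${maxAttempts}): ${raw.slice(0, 80)}`);
    }
  }
  throw new Error(`Failed to parse LLM response as JSON: ${String(lastError)}`);
}

// Usage with a stand-in generator: the first reply is truncated, the second is valid.
const replies = ['{"summary": "trunc', '{"summary": "ok"}'];
let calls = 0;
const fakeGenerate = async (): Promise<string> => replies[calls++] ?? "";

parseJsonWithRetry<{ summary: string }>(fakeGenerate).then((result) => {
  console.log(result.summary, `after ${calls} calls`);
});
```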
+
+---
+
 ## 1.9.8 (2026-06-02)

 ### feat(ai): AI Model Swapping, GPU Unloading & OCR Security (ADR-033)
@@ -1,6 +1,6 @@
 // File: src/modules/ai/processors/ai-batch.processor.ts
 // Change Log
-// - 2026-06-08: Fixed truncated LLM JSON responses by raising num_ctx to 16384 in sandbox-extract, sandbox-ai-extract, and migrate-document
+// - 2026-06-08: Fixed truncated LLM JSON responses by raising num_ctx to 16384 in sandbox-extract, sandbox-ai-extract, and migrate-document (fixed by AGY Gemini 3.5 Flash (Medium))
 // - 2026-05-15: Added the processor for the ai-batch queue per ADR-023A.
 // - 2026-05-15: Added EmbeddingService for the embed-document logic (T022).
 // - 2026-05-21: Added support for sandbox-rag and sandbox-extract for the Superadmin sandbox.
@@ -0,0 +1,537 @@
+# AI Refactor
+Prompted by the GPU upgrade from an RTX2060 SUPER 8GB to an ASUS DUAL **RTX5060 Ti 16GB**.
+
+## Goals
+Improve AI processing performance by making appropriate use of the new hardware, and adjust the workflow to suit it.
+
+```text
+Typhoon OCR 1.5
+Typhoon2.5-Qwen3-4B
+BGE-M3
+Queue (BullMQ) configuration alongside AI
+```
+## Model
+
+|Model Name|Size|Base FROM|PARAMETER|File|
+|-|-|-|-|-|
+|np-dms-ocr|2.9GB|scb10x/typhoon-ocr1.5-3b:latest|num_ctx 8192|np-dms-ocr-model.md|
+|np-dms-typhoon2.5|3.6GB|scb10x/typhoon2.5-qwen3-4b:latest|num_ctx 8192|np-dms-typhoon2.5.model.md|
+|np-dms-llama3.1-typhoon2-8b|5.5GB|scb10x/llama3.1-typhoon2-8b-instruct|num_ctx 8192|np-dms-llama3.1-typhoon2-8b.model.md|
+|np-dms-gemma4-4eb|3.2GB|gemma4:e4b|num_ctx 8192|np-dms-gemma4-4eb.model.md|
+|np-dms-openthaigpt-7b|8GB|promptnow/openthaigpt1.5-7b-instruct-q4_k_m|num_ctx 8192|np-dms-openthaigpt-7b.model.md|
+|np-dms-openthaigpt-14b|9.7GB|promptnow/openthaigpt1.5-14b-instruct-q4_k_m|num_ctx 8192|np-dms-openthaigpt-14b.model.md|
+
+
+
+ollama create np-dms-typhoon2.5 -f np-dms-typhoon2.5.model.md
+
+ollama create np-dms-llama3.1-typhoon2-8b -f np-dms-llama3.1-typhoon2-8b.model.md
+
+ollama create np-dms-gemma4-4eb -f np-dms-gemma4-4eb.model.md
+
+ollama create np-dms-openthaigpt-7b -f np-dms-openthaigpt-7b.model.md
+
+ollama create np-dms-openthaigpt-14b -f np-dms-openthaigpt-14b.model.md
+
+---
+
+## Architecture Decisions (RTX5060 Ti 16GB Optimized)
+
+> Summary of the decisions from the grilling session: upgrade from the RTX2060 SUPER 8GB
+
+### VRAM Budget
+
+| Component | VRAM | Notes |
+|-----------|------|----------|
+| `typhoon2.5-np-dms` | 3.6GB | Always loaded (resident) |
+| `typhoon-np-dms-ocr` | 2.9GB | Transient (loaded on demand) |
+| BGE-M3 | 2.3GB | Moved onto the GPU (Sidecar device='cuda') |
+| BGE-Reranker-Large | 1.5GB | Moved onto the GPU (Sidecar device='cuda') |
+| **Peak total** | **~10.3GB** | ~5.7GB headroom remaining |
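
The totals in the VRAM budget table can be re-derived with a few lines of arithmetic. The sketch below only uses the per-component figures from the table and the 16GB card size; it does not model the extra memory that larger context windows need.

```typescript
// Per-component VRAM figures copied from the budget table (in GB).
const totalVramGb = 16; // RTX5060 Ti 16GB
const budgetGb: Record<string, number> = {
  "typhoon2.5-np-dms": 3.6,
  "typhoon-np-dms-ocr": 2.9,
  "BGE-M3": 2.3,
  "BGE-Reranker-Large": 1.5,
};

// Worst case: every component is loaded at the same time.
const peakGb = Object.values(budgetGb).reduce((sum, gb) => sum + gb, 0);
const headroomGb = totalVramGb - peakGb;

console.log(`peak ${peakGb.toFixed(1)}GB, headroom ${headroomGb.toFixed(1)}GB`);
// prints "peak 10.3GB, headroom 5.7GB"
```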
+
+### BullMQ Concurrency
+
+| Queue | Concurrency | Reason |
+|-------|-------------|--------|
+| `ai-realtime` | **2** | Plenty of spare VRAM; faster responses |
+| `ai-batch` | **1** | Background jobs; prevents VRAM overflow |
+
+### Model Loading Strategy
+
+| Model | Strategy | keep_alive |
+|-------|---------|------------|
+| `typhoon2.5-np-dms` | Always loaded (never unloaded) | n/a |
+| `typhoon-np-dms-ocr` | Loaded on demand, unloaded automatically after 5 minutes | 300 |
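
One way to express the loading strategy in code is a small mapping from strategy to the `keep_alive` value sent with each Ollama request. The table gives no value for the resident model, so using `-1` (which Ollama treats as "keep loaded indefinitely") is an assumption here, and `LoadStrategy` and `keepAliveFor` are hypothetical names.

```typescript
type LoadStrategy = "resident" | "on-demand";

// keep_alive is in seconds. -1 asks Ollama to keep the model loaded indefinitely
// (an assumption for the resident model); 300 unloads it after 5 idle minutes.
function keepAliveFor(strategy: LoadStrategy): number {
  return strategy === "resident" ? -1 : 300;
}

const modelStrategies: Record<string, LoadStrategy> = {
  "typhoon2.5-np-dms": "resident",
  "typhoon-np-dms-ocr": "on-demand",
};

for (const [model, strategy] of Object.entries(modelStrategies)) {
  console.log(`${model}: keep_alive=${keepAliveFor(strategy)}`);
}
```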
+
+### Sidecar Changes (port 8765)
+
+```diff
+# Current (CPU RAM)
+POST /embed   → BGE-M3 (CPU)
+POST /rerank  → BGE-Reranker (CPU)
+
+# After upgrade (GPU)
+POST /embed   → BGE-M3 (GPU via device='cuda')
+POST /rerank  → BGE-Reranker (GPU via device='cuda')
+POST /ocr-upload  → Typhoon OCR (Ollama) ← unchanged
+POST /normalize   → PyThaiNLP (CPU)      ← unchanged
+```
+
+### Implementation Tasks
+
+- [ ] Update the Sidecar Dockerfile: add the CUDA runtime
+- [ ] Update the Sidecar app.py: set `device='cuda'` for the BGE models
+- [ ] Update docker-compose.yml: add the NVIDIA Container Toolkit
+- [ ] Update the BullMQ concurrency config (ai-realtime=2)
+- [ ] Change the OCR keep_alive from 0 to 300
+- [ ] Verify that OllamaService supports a resident model
+- [ ] Test real VRAM usage with large documents
+
+### Rollout Strategy
+
+**Big Bang**: the system is not yet in production use, so all changes are made in one go.
+
+---
+
+# Phase 1 : Foundation
+
+## 1. Infrastructure
+
+### AI Services
+
+```text
+Ollama
+├── Typhoon OCR 1.5
+├── Typhoon2.5-Qwen3-4B
+└── BGE-M3
+```
+
+### Database
+
+```text
+Qdrant
+```
+
+### Storage AI
+
+```text
+File Serv
+├── OCR Output
+└── Processed Data
+```
+
+---
+
+# Phase 2 : Ingestion Pipeline
+
+## Step 1 Upload
+
+```text
+PDF Upload
+↓
+Store Original File
+↓
+Create Job
+```
+
+---
+
+## Step 2 OCR
+
+### Input
+
+```text
+PDF
+```
+
+### Process
+
+```text
+Typhoon OCR
+```
+
+### Output
+
+```json
+{
+  "page": 1,
+  "content": "..."
+}
+```
+
+Store
+
+```text
+raw_ocr
+```
+
+Table
+
+```sql
+document_pages
+```
+
+```sql
+document_id
+page_no
+raw_text
+```
+
+---
+
+## Step 3 Structure
+
+### Input
+
+```text
+Raw OCR Text
+```
+
+### Process
+
+```text
+Typhoon2.5
+```
+
+Prompt
+
+```text
+Structure the document
+Separate headings
+Sections
+Metadata
+Do not summarize
+```
+
+Output
+
+```json
+{
+  "document_type": "ITP",
+  "project": "...",
+  "heading": "...",
+  "content": "..."
+}
+```
+
+Store
+
+```text
+structured_document
+```
+
+---
+
+## Step 4 Chunking
+
+### No LLM
+
+Use
+
+```text
+Markdown Header Splitter
+
+Recursive Splitter
+```
+
+Config
+
+```yaml
+chunk_size: 800
+chunk_overlap: 120
+```
+
+Output
+
+```json
+{
+  "chunk_id": "...",
+  "heading": "...",
+  "content": "...",
+  "page": 12
+}
+```
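
To show what `chunk_size: 800` and `chunk_overlap: 120` mean in practice, here is a deliberately simplified fixed-window splitter. It is not the Markdown Header Splitter or Recursive Splitter named above, it assumes both values are measured in characters, and `splitWithOverlap` is a hypothetical helper.

```typescript
// Simplified sliding-window splitter (assumption: sizes are in characters).
// Each chunk starts (chunkSize - chunkOverlap) characters after the previous one.
function splitWithOverlap(text: string, chunkSize = 800, chunkOverlap = 120): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sample = "x".repeat(2000);
console.log(splitWithOverlap(sample).map((chunk) => chunk.length));
// prints [ 800, 800, 640 ]
```

A real splitter would prefer to break on headings and paragraph boundaries first and fall back to fixed windows only for oversized sections.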
+
+---
+
+## Step 5 Embedding
+
+### Input
+
+```text
+Chunk
+```
+
+### Process
+
+```text
+BGE-M3
+```
+
+### Output
+
+```text
+Vector
+```
+
+---
+
+## Step 6 Index
+
+Store in
+
+```text
+Qdrant
+```
+
+Payload
+
+```json
+{
+  "document_id": "...",
+  "page": 12,
+  "document_type": "ITP",
+  "heading": "Inspection",
+  "content": "..."
+}
+```
+
+---
+
+# Phase 3 : Retrieval
+
+## Step 1 User Query
+
+```text
+What is the Slump Test for the 2nd-floor slab work?
+```
+
+---
+
+## Step 2 Query Embedding
+
+```text
+BGE-M3
+```
+
+---
+
+## Step 3 Search
+
+```text
+Qdrant
+```
+
+Top K
+
+```text
+10-20
+```
+
+---
+
+## Step 4 Re-rank (recommended)
+
+Use
+
+```text
+Typhoon2.5
+```
+
+or, added later,
+
+```text
+bge-reranker-v2
+```
+
+Flow
+
+```text
+Top20
+↓
+Top5
+```
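
The Top20 to Top5 step amounts to sorting the retrieved candidates by a relevance score and keeping the best five. The sketch below assumes the re-ranker has already attached a numeric `score` to each candidate; `ScoredChunk` and `selectTopK` are hypothetical names.

```typescript
// A retrieved chunk with a relevance score assigned by the re-ranker (assumed shape).
interface ScoredChunk {
  chunkId: string;
  score: number;
}

// Returns the k highest-scoring candidates without mutating the input array.
function selectTopK(candidates: ScoredChunk[], k = 5): ScoredChunk[] {
  return [...candidates].sort((a, b) => b.score - a.score).slice(0, k);
}

// 20 stand-in candidates with scores 0.00, 0.05, ..., 0.95.
const top20: ScoredChunk[] = Array.from({ length: 20 }, (_, i) => ({
  chunkId: `chunk-${i}`,
  score: i / 20,
}));

console.log(selectTopK(top20).map((chunk) => chunk.chunkId));
```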
+
+---
+
+## Step 5 Answer
+
+Use
+
+```text
+Typhoon2.5
+```
+
+Prompt
+
+```text
+Answer from the Context only
+Cite the document
+Cite the page
+Do not guess
+```
+
+Output
+
+```text
+Answer
+
+References:
+ITP-001 page 12
+MS-005 page 8
+```
+
+---
+
+# Phase 4 : Metadata Extraction
+
+To be added later
+
+Typhoon2.5 Extract
+
+```text
+Project
+Contractor
+Subcontractor
+Discipline
+Document Type
+Revision
+Date
+```
+
+Stored in PostgreSQL
+
+This enables filtered search, for example
+
+```text
+Project = ABC
+Type = MIR
+Revision = C
+```
+
+applied before querying Qdrant
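
A filter like the one above could be turned into a Qdrant payload filter along these lines. The `must` / `key` / `match.value` shape follows Qdrant's filter format; `document_type` appears in the index payload shown earlier, while the `project` and `revision` payload keys and the `buildMetadataFilter` helper are assumptions made for illustration.

```typescript
// Minimal subset of Qdrant's filter format: every condition in `must` has to match.
interface FieldCondition {
  key: string;
  match: { value: string };
}
interface QdrantFilter {
  must: FieldCondition[];
}

// Converts simple key/value criteria into exact-match conditions.
function buildMetadataFilter(criteria: Record<string, string>): QdrantFilter {
  return {
    must: Object.entries(criteria).map(([key, value]) => ({ key, match: { value } })),
  };
}

// Payload keys `project` and `revision` are hypothetical.
const metadataFilter = buildMetadataFilter({
  project: "ABC",
  document_type: "MIR",
  revision: "C",
});
console.log(JSON.stringify(metadataFilter));
```

The resulting object would be sent alongside the query vector so that only chunks from matching documents are scored.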
+
+---
+
+# Ollama Models
+
+## Typhoon OCR
+
+```dockerfile
+FROM scb10x/typhoon-ocr1.5-3b:latest
+```
+
+No customization needed
+
+---
+
+## Typhoon2.5
+
+```dockerfile
+FROM scb10x/typhoon2.5-qwen3-4b:latest
+
+PARAMETER temperature 0.1
+PARAMETER top_p 0.9
+PARAMETER repeat_penalty 1.05
+PARAMETER num_ctx 8192
+```
+
+**No SYSTEM directive**
+
+---
+
+## Runtime Config
+
+### Structure
+
+```json
+{
+  "num_ctx": 8192,
+  "temperature": 0
+}
+```
+
+### Answer
+
+```json
+{
+  "num_ctx": 16384,
+  "temperature": 0.1
+}
+```
+
+---
+
+# MVP Roadmap
+
+## Sprint 1
+
+✅ Upload PDF
+✅ OCR
+✅ Store OCR
+✅ Chunking
+✅ Embedding
+✅ Qdrant Search
+
+---
+
+## Sprint 2
+
+✅ Typhoon2.5 Structuring
+✅ Metadata Extraction
+✅ Better Chunking
+
+---
+
+## Sprint 3
+
+✅ RAG QA
+✅ Citation
+✅ Source Reference
+
+---
+
+## Sprint 4
+
+✅ Hybrid Search (Vector + Metadata)
+✅ Re-ranking
+✅ Multi-document QA
+
+---
+
+### Final Architecture
+
+```text
+PDF
+ ↓
+Typhoon OCR
+ ↓
+Raw OCR
+ ↓
+Typhoon2.5
+(Structure + Metadata)
+ ↓
+Markdown/Header Splitter
+ ↓
+Recursive Splitter
+ ↓
+BGE-M3
+ ↓
+Qdrant
+
+--------------------------------
+
+Question
+ ↓
+BGE-M3
+ ↓
+Qdrant
+ ↓
+Top-K Chunks
+ ↓
+Typhoon2.5
+ ↓
+Answer + Citation
+```
+
+For the MVP, I would **cut advanced Metadata Extraction and the Re-ranker for now** and get OCR → Search → Answer working end to end within the first 2-3 weeks, then improve accuracy one piece at a time.
@@ -682,6 +682,7 @@
    "editor.mouseWheelZoom": true,
    "terminal.integrated.mouseWheelZoom": true,
    "terminal.integrated.tabs.title": "${process}-${cwd}",
+    "workbench.editor.sharedViewState": true,
  },
  // ========================================
  // LAUNCH CONFIGURATIONS
@@ -1,7 +1,7 @@
 # Project Memory Override

 > **Project:** NAP-DMS (LCBP3) — Laem Chabang Port Phase 3 Document Management System
-> **Version:** 1.9.9 (Last Synced: 2026-06-03)
+> **Version:** 1.9.10 (Last Synced: 2026-06-08)
 > **Stack:** NestJS 11 + Next.js 16 + TypeScript + MariaDB 11.8 + Redis + BullMQ + Elasticsearch + Ollama (on-prem AI)

 > [!IMPORTANT]
@@ -0,0 +1,22 @@
+# Model
+
+| Model Name | Size | Base | PARAMETER | File |
+|-|-|-|-|-|
+| np-dms-ocr | 2.9GB | FROM scb10x/typhoon-ocr1.5-3b:latest | num_ctx 8192 | np-dms-ocr-model.md |
+| np-dms-typhoon2.5 | 3.6GB | FROM scb10x/typhoon2.5-qwen3-4b:latest | num_ctx 8192 | np-dms-typhoon2.5.model.md |
+| np-dms-gemma4-4eb | GB | FROM gemma4:e4b | num_ctx 8192 | np-dms-gemma4-4eb.model.md |
+| np-dms-openthaigpt1.5-7b | GB | FROM promptnow/openthaigpt1.5-7b-instruct-q4_k_m | num_ctx 8192 | |
+| np-dms-openthaigpt1.5-14b | GB | FROM promptnow/openthaigpt1.5-14b-instruct-q4_k_m | num_ctx 8192 | |
@@ -0,0 +1,11 @@
+FROM scb10x/llama3.1-typhoon2-8b-instruct:latest
+
+PARAMETER num_ctx 8192
+PARAMETER num_predict 4096
+PARAMETER temperature 0.4
+PARAMETER top_k 40
+PARAMETER top_p 0.9
+PARAMETER repeat_penalty 1.15
+
+
+
@@ -0,0 +1,7 @@
+FROM scb10x/typhoon-ocr1.5-3b:latest
+
+PARAMETER num_ctx 8192
+PARAMETER num_predict 4096
+PARAMETER temperature 0.1
+PARAMETER top_p 0.1
+PARAMETER repeat_penalty 1.1
@@ -0,0 +1,9 @@
+FROM promptnow/openthaigpt1.5-14b-instruct-q4_k_m:latest
+
+PARAMETER num_ctx 8192
+PARAMETER num_predict 4096
+PARAMETER temperature 0.4
+PARAMETER top_k 40
+PARAMETER top_p 0.9
+PARAMETER repeat_penalty 1.15
+
@@ -0,0 +1,9 @@
+FROM promptnow/openthaigpt1.5-7b-instruct-q4_k_m:latest
+
+PARAMETER num_ctx 8192
+PARAMETER num_predict 4096
+PARAMETER temperature 0.4
+PARAMETER top_k 40
+PARAMETER top_p 0.9
+PARAMETER repeat_penalty 1.15
+
@@ -0,0 +1,12 @@
+FROM scb10x/typhoon2.5-qwen3-4b:latest
+
+
+
+PARAMETER num_ctx 8192
+PARAMETER num_predict 4096
+PARAMETER temperature 0.4
+
+PARAMETER top_k 40
+PARAMETER top_p 0.9
+PARAMETER repeat_penalty 1.15
+
@@ -15,3 +15,5 @@
 | 2026-06-03 | v1.9.8  | Thai-Optimized AI Model Stack (ADR-034) — typhoon2.5-np-dms:latest + typhoon-np-dms-ocr:latest      | ✅ Complete                   |
 | 2026-06-05 | v1.9.8  | RAG Pipeline Enhancements (Spec 234 / ADR-035) — BGE-M3 + BGE-Reranker + Hybrid Qdrant (Session 14/15) | ✅ Complete                   |
 | 2026-06-06 | v1.9.9  | LLM JSON Parse Failure & VRAM Fix (ADR-035-135) — retry logic + keep_alive=0 + ESLint heap fix     | ✅ Complete                   |
+| 2026-06-08 | v1.9.10 | LLM JSON Response Truncation Fix: expanded num_ctx to 16384 (Session 16, by AGY Gemini 3.5 Flash (Medium)) | ✅ Complete                   |
+
@@ -0,0 +1,28 @@
+# Session 16 — 2026-06-08 (Fix LLM JSON Response Truncation in OCR Sandbox & Migration)
+
+## Summary
+
+Fixes LLM JSON response truncation (the reply text gets cut off) in OCR Sandbox Step 2 and the Migration Pipeline by raising `num_ctx` to `16384` in the Ollama calls (fix carried out by AGY Gemini 3.5 Flash (Medium)).
+
+
+## Problem Found (Root Cause)
+
+1. **Context Window Overflow:** A large prompt that contains OCR text (up to 15,000 characters) mixed with the Master Data Context (about 6,000 characters), for a combined ~15.7K characters (roughly 6,000-8,000 tokens for a Thai-language model), comes close to filling the `num_ctx 8192` window of the `typhoon2.5-np-dms:latest` model.
+2. **JSON Truncation:** When the model tries to generate the JSON answer (e.g. the summary), the total token count (input + output) hits the 8,192-token ceiling. Ollama then cuts generation off midway and returns incomplete JSON, so JSON parsing fails with the error `Failed to parse LLM response as JSON`.
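
The root cause can be checked with simple token arithmetic. The sketch below uses figures recorded elsewhere in these session notes (a 6,401-token prompt from the verification run and `num_predict: 4096` from the fix) and treats the context window as a hard budget for input plus output, which is a simplification of how Ollama actually handles overflow. `fitsContext` is a hypothetical helper.

```typescript
// True when the prompt plus the maximum generated output fits in the context window.
function fitsContext(inputTokens: number, numPredict: number, numCtx: number): boolean {
  return inputTokens + numPredict <= numCtx;
}

const inputTokens = 6401; // prompt size from the verification run
const numPredict = 4096; // maximum output tokens requested

console.log(fitsContext(inputTokens, numPredict, 8192)); // false: 10,497 tokens needed
console.log(fitsContext(inputTokens, numPredict, 16384)); // true
```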
+
+## Fix
+
+| File | Change |
+| --- | --- |
+| `backend/src/modules/ai/processors/ai-batch.processor.ts` | 1. Added `num_ctx: 16384` to the `ollamaOptions` of `processSandboxExtract` and `processSandboxAiExtract`<br>2. Updated `processMigrateDocument` to always send `format: 'json'` and `options: { num_ctx: 16384, num_predict: 4096 }`, matching the Sandbox |
+| `backend/src/modules/ai/processors/ai-batch.processor.spec.ts` | Updated the `migrate-document` unit test to verify that `format: 'json'` and `num_ctx: 16384` are passed to OllamaService correctly |
+
+## Locked Rules
+
+- **Context Window Headroom:** For prompts that combine large Master Data with OCR text, always expand `num_ctx` to `16384` so there is enough headroom for the generated JSON. This does not impact VRAM, because the main model is small (~2.5GB) on the RTX 2060 Super 8GB card.
+
+## Verification
+
+- [x] Unit tests `pnpm --filter backend test src/modules/ai/processors/ai-batch.processor.spec.ts` pass 100% (701 tests pass)
+- [x] Backend build `pnpm --filter backend run build` succeeds
+- [x] Ollama log results for a 3-page document (13,173 characters / 6,401 tokens): the document was processed and the JSON response was generated successfully, with no compression or truncation (truncated = 0)