
admin cd7d20ccd4


690609:2223 Prepare to MOD AI flow [skip CI]

2026-06-09 22:23:59 +07:00


AI Refactor

This refactor is driven by the GPU upgrade from an RTX2060 SUPER 8GB to an ASUS DUAL RTX5060 Ti 16GB.

Goals

Improve AI processing performance by making proper use of the new hardware, and adapt the processing workflow to fit it.

Typhoon OCR 1.5
Typhoon2.5-Qwen3-4B
BGE-M3
Queue system (BullMQ) configuration for AI workloads

Model

Model Name	Size	Base FROM	PARAMETER	File
np-dms-ocr	2.9GB	scb10x/typhoon-ocr1.5-3b:latest	num_ctx 8192	np-dms-ocr-model.md
np-dms-typhoon2.5	3.6GB	scb10x/typhoon2.5-qwen3-4b:latest	num_ctx 8192	np-dms-typhoon2.5.model.md
np-dms-llama3.1-typhoon2-8b	5.5GB	scb10x/llama3.1-typhoon2-8b-instruct	num_ctx 8192	np-dms-llama3.1-typhoon2-8b.model.md
np-dms-gemma4-4eb	3.2GB	gemma4:e4b	num_ctx 8192	np-dms-gemma4-4eb.model.md
np-dms-openthaigpt-7b	8GB	promptnow/openthaigpt1.5-7b-instruct-q4_k_m	num_ctx 8192	np-dms-openthaigpt-7b.model.md
np-dms-openthaigpt-14b	9.7GB	promptnow/openthaigpt1.5-14b-instruct-q4_k_m	num_ctx 8192	np-dms-openthaigpt-14b.model.md

ollama create np-dms-typhoon2.5 -f np-dms-typhoon2.5.model.md

ollama create np-dms-llama3.1-typhoon2-8b -f np-dms-llama3.1-typhoon2-8b.model.md

ollama create np-dms-gemma4-4eb -f np-dms-gemma4-4eb.model.md

ollama create np-dms-openthaigpt-7b -f np-dms-openthaigpt-7b.model.md

ollama create np-dms-openthaigpt-14b -f np-dms-openthaigpt-14b.model.md

Architecture Decisions (RTX5060 Ti 16GB Optimized)

Summary of the decisions from the grilling session, following the upgrade from the RTX2060 SUPER 8GB.

VRAM Budget

Component	VRAM	Notes
`typhoon2.5-np-dms`	3.6GB	Always loaded (resident)
`typhoon-np-dms-ocr`	2.9GB	Transient (loaded on demand)
BGE-M3	2.3GB	Moved to GPU (Sidecar device='cuda')
BGE-Reranker-Large	1.5GB	Moved to GPU (Sidecar device='cuda')
Peak total	~10.3GB	~5.7GB headroom remaining
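The peak total and the headroom follow directly from the table. A minimal Python check using only the table's own estimates; it ignores KV cache and CUDA context overhead, so measured usage will likely be somewhat higher:

```python
GPU_VRAM_GB = 16.0  # RTX5060 Ti 16GB

# Peak scenario from the table: every component loaded at the same time.
COMPONENTS_GB = {
    "typhoon2.5-np-dms": 3.6,   # resident
    "typhoon-np-dms-ocr": 2.9,  # transient, counted at its peak
    "BGE-M3": 2.3,              # sidecar, device='cuda'
    "BGE-Reranker-Large": 1.5,  # sidecar, device='cuda'
}


def budget(components: dict[str, float], total: float) -> tuple[float, float]:
    """Return (peak usage, remaining headroom) in GB, rounded to 0.1 GB."""
    used = round(sum(components.values()), 1)
    return used, round(total - used, 1)


used, headroom = budget(COMPONENTS_GB, GPU_VRAM_GB)
print(f"peak {used} GB, headroom {headroom} GB")  # prints: peak 10.3 GB, headroom 5.7 GB
```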

BullMQ Concurrency

Queue	Concurrency	Reason
`ai-realtime`	2	Plenty of spare VRAM; faster responses
`ai-batch`	1	Background jobs; prevents VRAM overflow
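A BullMQ worker's `concurrency` option caps how many jobs that worker runs at once. The sketch below does not use BullMQ (a Node.js library); it models the same cap in Python with `asyncio.Semaphore` to show what the two settings mean, with a short sleep standing in for the model call:

```python
import asyncio

# Concurrency per queue, taken from the table above.
QUEUE_CONCURRENCY = {"ai-realtime": 2, "ai-batch": 1}


async def run_queue(name: str, jobs: int) -> int:
    """Run `jobs` dummy jobs with at most QUEUE_CONCURRENCY[name] in flight.

    Returns the highest number of jobs observed running at the same time.
    """
    limit = asyncio.Semaphore(QUEUE_CONCURRENCY[name])
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        async with limit:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for an Ollama or sidecar call
            running -= 1

    await asyncio.gather(*(job() for _ in range(jobs)))
    return peak


for queue in QUEUE_CONCURRENCY:
    # prints "ai-realtime 2" then "ai-batch 1"
    print(queue, asyncio.run(run_queue(queue, jobs=6)))
```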

Model Loading Strategy

Model	Strategy	keep_alive
`typhoon2.5-np-dms`	Kept loaded permanently (never unloaded)	n/a
`typhoon-np-dms-ocr`	Loaded on demand, unloaded automatically after 5 minutes	300
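Ollama sets this policy per request through the `keep_alive` field of `POST /api/generate`: a number is read as seconds, a negative value keeps the model loaded indefinitely, and a request with an empty prompt only loads the model (300 seconds also matches Ollama's default of 5 minutes). A sketch of the request bodies, assuming the service calls `/api/generate` directly; the real OllamaService code is not shown in this plan:

```python
import json

# keep_alive per model, following the Model Loading Strategy table.
# Ollama reads a number as seconds and a negative value as "keep loaded indefinitely".
KEEP_ALIVE = {
    "typhoon2.5-np-dms": -1,    # resident: never unloaded
    "typhoon-np-dms-ocr": 300,  # unloaded 5 minutes after its last request
}


def generate_body(model: str, prompt: str = "") -> dict:
    """Body for POST /api/generate carrying the model's keep_alive policy.

    With an empty prompt Ollama only loads the model, which can be used to
    warm up the resident model when the service starts.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": KEEP_ALIVE[model],
    }


print(json.dumps(generate_body("typhoon2.5-np-dms")))
```

Sending `keep_alive` on every request matters because each request resets the timer; a request that omits it falls back to the server default.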

Sidecar Changes (port 8765)

# Current (CPU RAM)
POST /embed   → BGE-M3 (CPU)
POST /rerank  → BGE-Reranker (CPU)

# After the upgrade (GPU)
POST /embed   → BGE-M3 (GPU via device='cuda')
POST /rerank  → BGE-Reranker (GPU via device='cuda')
POST /ocr-upload  → Typhoon OCR (Ollama) ← unchanged
POST /normalize   → PyThaiNLP (CPU)      ← unchanged
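In app.py the change amounts to choosing the device once at start-up and passing it to the model constructors. This sketch assumes the sidecar loads the models with the sentence-transformers library under the Hugging Face IDs `BAAI/bge-m3` and `BAAI/bge-reranker-large`; the actual app.py may use FlagEmbedding or other IDs. The CPU fallback is an added convenience, not part of the plan:

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return 'cuda' when a CUDA-enabled PyTorch can see a GPU, otherwise 'cpu'."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_models(device: str):
    """Load the embedder and reranker on `device` (assumes sentence-transformers)."""
    from sentence_transformers import CrossEncoder, SentenceTransformer

    embedder = SentenceTransformer("BAAI/bge-m3", device=device)
    reranker = CrossEncoder("BAAI/bge-reranker-large", device=device)
    return embedder, reranker


DEVICE = pick_device()
print("sidecar device:", DEVICE)
```

If `pick_device()` returns `"cpu"` inside the container, the usual causes are a CPU-only PyTorch wheel in the image or a container started without GPU access, which the Dockerfile and docker-compose tasks below address.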

Implementation Tasks

Update the Sidecar Dockerfile: add the CUDA runtime
Update the Sidecar app.py: set device='cuda' for the BGE models
Update docker-compose.yml: give the sidecar GPU access via the NVIDIA Container Toolkit
Update the BullMQ concurrency config (ai-realtime=2)
Change the OCR keep_alive from 0 to 300
Verify that OllamaService supports a resident model
Measure actual VRAM usage with large documents
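For the last task, VRAM can be sampled with `nvidia-smi` in query mode before, during and after processing a large document. A small Python helper; the sampling interval and how results are logged are left open:

```python
import subprocess

# nvidia-smi query mode; with "nounits" the memory values are plain MiB integers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(csv_text: str) -> list[tuple[int, int]]:
    """Turn nvidia-smi CSV output (one GPU per line) into (used_mib, total_mib) pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(v) for v in line.split(","))
        rows.append((used, total))
    return rows


def read_vram() -> list[tuple[int, int]]:
    """Run nvidia-smi once and return the current (used_mib, total_mib) per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_vram(result.stdout)
```

`nvidia-smi` reports usage for the whole device, so the figure includes every process on the GPU, not only Ollama and the sidecar.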

Rollout Strategy

Big Bang: the system is not yet live in production, so all changes are made in a single release.

Phase 1 : Foundation

1. Infrastructure

AI Services

Ollama
├── Typhoon OCR 1.5
├── Typhoon2.5-Qwen3-4B
└── BGE-M3

Database

Qdrant

Storage AI

File Server
├── OCR Output
└── Processed Data

Phase 2 : Ingestion Pipeline

Step 1 Upload

PDF Upload
↓
Store Original File
↓
Create Job

Step 2 OCR

Input

PDF

Process

Typhoon OCR

Output

{
  "page": 1,
  "content": "..."
}

Store

raw_ocr

Table

document_pages

document_id
page_no
raw_text
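The OCR output maps one-to-one onto `document_pages` rows. A sketch using an in-memory `sqlite3` database as a stand-in for the project's real database; the `doc-001` ID and the composite primary key are illustrative assumptions:

```python
import sqlite3


def to_rows(document_id: str, ocr_pages: list[dict]) -> list[tuple[str, int, str]]:
    """Map Typhoon OCR page results ({"page", "content"}) to document_pages rows."""
    return [(document_id, p["page"], p["content"]) for p in ocr_pages]


# sqlite3 in memory stands in for the real database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE document_pages (
           document_id TEXT NOT NULL,
           page_no     INTEGER NOT NULL,
           raw_text    TEXT NOT NULL,
           PRIMARY KEY (document_id, page_no)
       )"""
)

pages = [{"page": 1, "content": "..."}, {"page": 2, "content": "..."}]
conn.executemany(
    "INSERT INTO document_pages (document_id, page_no, raw_text) VALUES (?, ?, ?)",
    to_rows("doc-001", pages),  # hypothetical document ID
)
conn.commit()
```

The primary key on (document_id, page_no) makes a retried OCR job fail loudly instead of duplicating pages; an upsert is the alternative if a retry should overwrite earlier text.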

Step 3 Structure

Input

Raw OCR Text

Process

Typhoon2.5

Prompt

Structure the document
Separate headings
Sections
Metadata
Do not summarize

Output

{
  "document_type": "ITP",
  "project": "...",
  "heading": "...",
  "content": "..."
}

Store

structured_document
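The prompt rules and the expected JSON can be enforced on the application side. A sketch with an English rendering of the prompt (the production wording is an assumption) and a parser that rejects replies missing any of the output keys:

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models sometimes wrap around JSON

# English rendering of the Step 3 prompt rules; the exact production wording is an assumption.
STRUCTURE_PROMPT = (
    "Structure the following OCR text.\n"
    "Separate the headings, sections and metadata.\n"
    "Do not summarize; keep the original wording.\n"
    "Reply with JSON only, using the keys document_type, project, heading, content.\n\n"
    "OCR text:\n{raw_text}"
)
REQUIRED_KEYS = {"document_type", "project", "heading", "content"}


def build_structure_prompt(raw_text: str) -> str:
    return STRUCTURE_PROMPT.format(raw_text=raw_text)


def parse_structured(reply: str) -> dict:
    """Parse the model's JSON reply, tolerating a surrounding code fence, and check its keys."""
    text = reply.strip()
    if text.startswith(FENCE):
        text = text.strip("`").removeprefix("json").strip()
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("structured output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"structured output is missing keys: {sorted(missing)}")
    return data
```

Ollama's request field `"format": "json"` can additionally constrain the model to emit valid JSON, which makes the fence handling a fallback rather than the main path.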

Step 4 Chunking

No LLM

Uses

Markdown Header Splitter
+
Recursive Splitter

Config

chunk_size: 800
chunk_overlap: 120

Output

{
  "chunk_id": "...",
  "heading": "...",
  "content": "...",
  "page": 12
}
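The two splitters can be approximated without an LLM or any external library. This is a standard-library sketch, not LangChain's `MarkdownHeaderTextSplitter`/`RecursiveCharacterTextSplitter` (which offer the same behaviour if the project uses LangChain). It assumes `chunk_size` and `chunk_overlap` count characters, and it does not track pages, so the `page` field is left out:

```python
import re

CHUNK_SIZE = 800     # assumed to count characters
CHUNK_OVERLAP = 120
SEPARATORS = ["\n\n", "\n", " "]  # coarsest first; a hard cut is the last resort
HEADER_RE = re.compile(r"^(#{1,3})[ \t]+(.+)$", re.MULTILINE)


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) sections at #, ## and ### lines."""
    sections = []
    heading, start = "", 0
    for m in HEADER_RE.finditer(markdown):
        body = markdown[start:m.start()].strip()
        if body:
            sections.append((heading, body))
        heading, start = m.group(2).strip(), m.end()
    body = markdown[start:].strip()
    if body:
        sections.append((heading, body))
    return sections


def split_recursive(text: str, size: int, seps: list[str] = SEPARATORS) -> list[str]:
    """Return pieces of at most `size` chars, splitting on the coarsest separator that works."""
    if len(text) <= size:
        return [text]
    if not seps:
        return [text[i:i + size] for i in range(0, len(text), size)]
    sep, finer = seps[0], seps[1:]
    pieces, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= size:
            current = candidate
            continue
        if current:
            pieces.append(current)
        if len(part) <= size:
            current = part
        else:
            pieces.extend(split_recursive(part, size, finer))  # retry with a finer separator
            current = ""
    if current:
        pieces.append(current)
    return pieces


def chunk_section(body: str) -> list[str]:
    """Split a section so each chunk, including its overlap prefix, fits in CHUNK_SIZE."""
    pieces = split_recursive(body, CHUNK_SIZE - CHUNK_OVERLAP - 1)
    chunks = []
    for i, piece in enumerate(pieces):
        if i == 0:
            chunks.append(piece)
        else:
            # Prefix the tail of the previous piece so context carries across chunks.
            chunks.append(pieces[i - 1][-CHUNK_OVERLAP:] + "\n" + piece)
    return chunks


def chunk_document(document_id: str, markdown: str) -> list[dict]:
    """Produce chunk records shaped like the Output above (without 'page')."""
    records = []
    for heading, body in split_by_headers(markdown):
        for content in chunk_section(body):
            records.append({
                "chunk_id": f"{document_id}-{len(records)}",
                "heading": heading,
                "content": content,
            })
    return records
```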

Step 5 Embedding

Input

Chunk

Process

BGE-M3

Output

Vector
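Embedding goes through the sidecar's `POST /embed` on port 8765. A sketch of the client side using only the standard library; the `localhost` host, the `{"texts": [...]}` request body, the `"vectors"` response key and the batch size of 32 are all hypothetical, since the sidecar's contract is not documented here:

```python
import json
import urllib.request

SIDECAR_URL = "http://localhost:8765"  # port from the Sidecar section; host is an assumption


def embed_request(texts: list[str]) -> urllib.request.Request:
    """Build a POST /embed request; the {"texts": [...]} body shape is hypothetical."""
    body = json.dumps({"texts": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{SIDECAR_URL}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: list[str]) -> list[list[float]]:
    """Send one request and return one vector per text (response key is hypothetical)."""
    with urllib.request.urlopen(embed_request(texts), timeout=60) as resp:
        return json.loads(resp.read())["vectors"]


def embed_all(texts: list[str], batch_size: int = 32) -> list[list[float]]:
    """Embed texts in fixed-size batches to bound GPU memory per request."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed(texts[i:i + batch_size]))
    return vectors
```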

Step 6 Index

Store in

Qdrant

Payload

{
  "document_id": "...",
  "page": 12,
  "document_type": "ITP",
  "heading": "Inspection",
  "content": "..."
}
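In Qdrant's REST API, a collection is created with `PUT /collections/{name}` and points are written with `PUT /collections/{name}/points`. A sketch of both request bodies; the collection name is hypothetical, 1024 is BGE-M3's dense vector size, and deriving the point ID from `chunk_id` with `uuid5` is a design choice (Qdrant IDs must be unsigned integers or UUIDs) that makes re-indexing the same chunk overwrite rather than duplicate it:

```python
import uuid

COLLECTION = "np_dms_chunks"  # hypothetical collection name
BGE_M3_DIM = 1024             # BGE-M3 dense vector size


def create_collection_body() -> dict:
    """Body for PUT /collections/{COLLECTION}."""
    return {"vectors": {"size": BGE_M3_DIM, "distance": "Cosine"}}


def to_point(chunk: dict, vector: list[float], document_id: str, document_type: str) -> dict:
    """Build one Qdrant point carrying the payload fields listed above."""
    if len(vector) != BGE_M3_DIM:
        raise ValueError(f"expected {BGE_M3_DIM} dims, got {len(vector)}")
    return {
        # Deterministic UUID: the same chunk_id always maps to the same point.
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, chunk["chunk_id"])),
        "vector": vector,
        "payload": {
            "document_id": document_id,
            "page": chunk["page"],
            "document_type": document_type,
            "heading": chunk["heading"],
            "content": chunk["content"],
        },
    }


def upsert_body(points: list[dict]) -> dict:
    """Body for PUT /collections/{COLLECTION}/points."""
    return {"points": points}
```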

Phase 3 : Retrieval

Step 1 User Query

What is the Slump Test for the 2nd-floor slab work?

Step 2 Query Embedding

BGE-M3

Step 3 Search

Qdrant

Top K

10-20
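The search itself is a single Qdrant request, `POST /collections/{name}/points/search`. A sketch of the body that clamps Top K to the 10-20 range above and optionally attaches a payload filter; the clamping is an illustrative choice:

```python
from typing import Optional

TOP_K_MIN, TOP_K_MAX = 10, 20  # range from the plan above


def search_body(query_vector: list[float], top_k: int = TOP_K_MAX,
                query_filter: Optional[dict] = None) -> dict:
    """Body for POST /collections/{collection}/points/search."""
    limit = max(TOP_K_MIN, min(TOP_K_MAX, top_k))
    body = {"vector": query_vector, "limit": limit, "with_payload": True}
    if query_filter is not None:
        body["filter"] = query_filter  # e.g. restrict to documents matched by metadata
    return body
```

`with_payload` returns the stored `content`, `heading` and `page` with each hit, so the answer step needs no second lookup.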

Step 4 Re-rank (recommended)

Use

Typhoon2.5

or add later

bge-reranker-v2

Flow

Top20
↓
Top5
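Whichever model produces the scores, the Top20 to Top5 step is the same: score each (query, passage) pair, sort, keep five. A sketch with a pluggable scorer; with a cross-encoder such as bge-reranker the scores would come from something like sentence-transformers' `CrossEncoder.predict` on the pairs, and with Typhoon2.5 from a relevance prompt. The `word_overlap` scorer is a toy stand-in for illustration only:

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def rerank(query: str, hits: list[dict], score: Scorer, keep: int = 5) -> list[dict]:
    """Re-score search hits against the query and keep the best `keep` (Top20 to Top5)."""
    scored = [(score(query, h["payload"]["content"]), i, h) for i, h in enumerate(hits)]
    # Highest score first; ties keep the original vector-search order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [h for _, _, h in scored[:keep]]


def word_overlap(query: str, passage: str) -> float:
    """Toy stand-in scorer: number of shared lowercase words. Not a real reranker."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))
```

Whitespace splitting does not work for Thai, which is written without spaces between words, so the toy scorer is only meaningful for English text.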

Step 5 Answer

Use

Typhoon2.5

Prompt

Answer from the context only
Cite the document
Cite the page
Do not guess

Output

Answer

References:
ITP-001 page 12
MS-005 page 8
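Citations are easiest to get right when every context chunk carries its own document and page label. A sketch of the prompt and of the reference list in the Output format above; it assumes the payload's `document_id` holds the human-readable document number (such as ITP-001), which the plan does not state:

```python
ANSWER_RULES = (
    "Answer only from the context below.\n"
    "Cite the document and page for each statement.\n"
    "If the context does not contain the answer, say so; do not guess."
)


def build_answer_prompt(question: str, hits: list[dict]) -> str:
    """Label each retrieved chunk with its document and page so the model can cite it."""
    blocks = [
        f"[{h['payload']['document_id']} page {h['payload']['page']}]\n{h['payload']['content']}"
        for h in hits
    ]
    context = "\n\n".join(blocks)
    return f"{ANSWER_RULES}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


def references(hits: list[dict]) -> str:
    """Deduplicated 'References:' list in retrieval order."""
    seen: list[tuple[str, int]] = []
    for h in hits:
        ref = (h["payload"]["document_id"], h["payload"]["page"])
        if ref not in seen:
            seen.append(ref)
    return "References:\n" + "\n".join(f"{doc} page {page}" for doc, page in seen)
```

Building the reference list from the retrieved hits lists every source the model was given, not only the ones it actually used; matching the labels cited in the answer text is the stricter alternative.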

Phase 4 : Metadata Extraction

To be added later

Typhoon2.5 Extract

Project
Contractor
Subcontractor
Discipline
Document Type
Revision
Date

Stored in PostgreSQL

Enables filtered search, for example:

Project = ABC
Type = MIR
Revision = C

applied before querying Qdrant
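One way to apply the PostgreSQL filter before the vector search is to select the matching document IDs first and pass them to Qdrant as a `match: any` condition on `document_id`. A sketch; the `document_metadata` table and its column names are hypothetical, and column names are whitelisted because they are interpolated into the SQL while values go through `%s` placeholders (psycopg style):

```python
# Hypothetical PostgreSQL columns; only whitelisted names may be filtered on.
FILTER_COLUMNS = {"project", "contractor", "subcontractor", "discipline", "document_type", "revision"}


def metadata_query(filters: dict[str, str]) -> tuple[str, list[str]]:
    """Build a parameterised SELECT for document IDs matching all filters."""
    unknown = set(filters) - FILTER_COLUMNS
    if unknown:
        raise ValueError(f"unsupported filter columns: {sorted(unknown)}")
    sql = "SELECT document_id FROM document_metadata"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = %s" for col in filters)
    return sql, list(filters.values())


def qdrant_filter(document_ids: list[str]) -> dict:
    """Restrict the vector search to the documents that passed the metadata filter."""
    return {"must": [{"key": "document_id", "match": {"any": document_ids}}]}
```

If the SQL query returns no documents, the vector search can be skipped entirely rather than sent with an empty list.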

Ollama Models

Typhoon OCR

FROM scb10x/typhoon-ocr1.5-3b:latest

No customization needed

Typhoon2.5

FROM scb10x/typhoon2.5-qwen3-4b:latest

PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192

No SYSTEM prompt

Runtime Config

Structure

{
  "num_ctx": 8192,
  "temperature": 0
}

Answer

{
  "num_ctx": 16384,
  "temperature": 0.1
}
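These per-task settings go in the `options` field of each Ollama request, where they take precedence over the Modelfile's PARAMETER values. A sketch; the model name is passed in by the caller:

```python
# Per-task options from the Runtime Config above.
RUNTIME_OPTIONS = {
    "structure": {"num_ctx": 8192, "temperature": 0},
    "answer": {"num_ctx": 16384, "temperature": 0.1},
}


def task_body(task: str, model: str, prompt: str) -> dict:
    """Ollama /api/generate body; request options override the Modelfile PARAMETER values."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": dict(RUNTIME_OPTIONS[task]),  # copy so callers cannot mutate the defaults
    }
```

Two points are worth checking on the real hardware: `num_ctx` 16384 enlarges the KV cache beyond what the Modelfile's 8192 implies, and Ollama may reload a model when a request's `num_ctx` differs from the loaded instance, so alternating Structure and Answer calls on the resident model could trigger reloads.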

MVP Roadmap

Sprint 1

✅ Upload PDF ✅ OCR ✅ Store OCR ✅ Chunking ✅ Embedding ✅ Qdrant Search

Sprint 2

✅ Typhoon2.5 Structuring ✅ Metadata Extraction ✅ Better Chunking

Sprint 3

✅ RAG QA ✅ Citation ✅ Source Reference

Sprint 4

✅ Hybrid Search (Vector + Metadata) ✅ Re-ranking ✅ Multi-document QA

Final Architecture

PDF
 ↓
Typhoon OCR
 ↓
Raw OCR
 ↓
Typhoon2.5
(Structure + Metadata)
 ↓
Markdown/Header Splitter
 ↓
Recursive Splitter
 ↓
BGE-M3
 ↓
Qdrant

--------------------------------

Question
 ↓
BGE-M3
 ↓
Qdrant
 ↓
Top-K Chunks
 ↓
Typhoon2.5
 ↓
Answer + Citation

For the MVP, I will leave out advanced Metadata Extraction and the Re-ranker at first, and get OCR → Search → Answer working end to end within the first 2–3 weeks, then improve accuracy one piece at a time.
