np-dms/lcbp3

Fork 0

Files

T

admin c95e0f537e

CI / CD Pipeline / build (push) Successful in 4m34s

Details

CI / CD Pipeline / deploy (push) Successful in 7m33s

Details

690404:1139 Modify ADR

2026-04-04 11:39:56 +07:00

25 KiB

Raw Blame History

ADR-017B: AI Document Classification

Status: Accepted Date: 2026-03-27 Version: 1.8.2 (Aligned with ADR-020) Review Cycle: Core ADR (Review every 6 months or Major Version upgrade) Decision Makers: Development Team, AI Integration Lead Gap Resolution: Addresses manual document classification inefficiency and inconsistency problems (Product Vision v1.8.5, Section 3.3) and AI-assisted workflow requirements (UAT Criteria, Section 4.8) Version Dependency:

Effective From: v1.8.2
Applies To: v1.8.2+ (AI classification features)
Backward Compatible: v1.8.0+ (Manual classification still available)
Required For: v1.9.0+ (Enhanced document management)

Related Documents:

ADR-020: AI Intelligence Integration Architecture — Overall AI Architecture & RFA-First Strategy
ADR-017: Ollama Data Migration Architecture
ADR-018: AI Boundary Policy — AI Physical Isolation (No Direct DB/Storage Access)
n8n Migration Setup Guide
Legacy Data Migration Plan
Migration Business Scope
Glossary

Note: ADR-017B extends ADR-017's scope from batch migration to AI-powered document classification, enabling automatic categorization and metadata extraction. Complies with ADR-018 (AI Isolation Policy) and is part of ADR-020 (Unified AI Architecture).

Context and Problem Statement

ปัญหาที่ต้องการแก้ไข

โครงการ LCBP3 มีเอกสารจำนวนมาก (ทั้งเก่าและใหม่) ที่ต้องจัดหมวดหมู่และสกัด Metadata อัตโนมัติ การจัดหมวดหมู่ด้วยมือมีปัญหาหลัก 3 ประการ:

Manual Labor สูง: ต้องใช้เวลานานในการอ่านและจัดหมวดหมู่เอกสารด้วยมือ
Human Error: ความผิดพลาดจากการจัดหมวดหมู่ (ผิดประเภท, ผิด Tag) โดยเฉพาะเอกสารที่ซับซ้อน
Inconsistency: การจัดหมวดหมู่ไม่สม่ำเสมอกันระหว่างผู้จัดการคนละคน

ข้อจำกัดที่สำคัญ

Data Privacy: เอกสารก่อสร้างท่าเรือเป็นความลับ ห้ามส่งข้อมูลขึ้น Cloud AI Provider
Cost Constraint: การใช้ Cloud AI (~$0.01–0.03 ต่อ Record) อาจสูงถึง $600 สำหรับ 20,000 records
Hardware Limitation: ต้องรันบนเครื่องที่มีอยู่ (i7-9700K / 32GB RAM / RTX 2060 Super 8GB)

Decision Drivers

Security First (ADR-018): AI ต้องรันบน Admin Desktop (Desk-5439) เท่านั้น — ห้ามรันบน QNAP/Production Server
Privacy Guaranteed: ประมวลผลภายในเครือข่ายองค์กร (On-Premise) เท่านั้น
Cost Effectiveness: Zero Cost สำหรับ AI Inference (ไม่มีค่า Pay-per-use)
Data Integrity: ข้อมูลที่ AI สกัดต้องผ่าน Human Verification ก่อน Commit ลงระบบจริง
AI Isolation (ADR-018): AI ห้ามเข้าถึง Database/Storage โดยตรง — ต้องสื่อสารผ่าน DMS API เท่านั้น
Recoverability: รองรับ Checkpoint/Resume และ Rollback ได้สมบูรณ์

Considered Options

Option 1: Manual Data Entry (No AI)

Pros:

ไม่ต้องลงทุน Hardware เพิ่ม
ความแม่นยำ 100% (ถ้าพิมพ์ถูก)

Cons:

❌ ใช้เวลานานมาก (อาจเป็นสัปดาห์หรือเดือน)
❌ Human Error สูง (Typo, Inconsistency)
❌ ไม่สามารถทำให้ Scanned PDF ค้นหาได้

Option 2: Cloud AI Service (OpenAI, Google Vision, Azure AI)

Pros:

AI ฉลาดสูง แม่นยำมาก
ไม่ต้องดูแล Infrastructure

Cons:

❌ ผิดนโยบาย Data Privacy — เอกสารก่อสร้างท่าเรือเป็นความลับ
❌ ค่าใช้จ่ายสูง (~$600 สำหรับ 20,000 records)
❌ Dependency กับ External Service

Option 3: Local AI (Ollama + Gemma 4 9B) + n8n + Human Verification ⭐ (Selected)

Pros:

✅ Privacy Guaranteed — รันภายในเครือข่ายองค์กร
✅ Zero Cost — ไม่มีค่า Pay-per-use
✅ AI Isolation (ADR-018) — AI รันบน Desktop แยกต่างหาก ไม่เข้าถึง DB โดยตรง
✅ Human-in-the-Loop — Admin ตรวจสอบก่อน Commit
✅ Recoverability — รองรับ Checkpoint/Resume
✅ Clean Architecture — แยก Migration Logic ออกจาก Core Application

Cons:

❌ ต้องเปิด Desktop ทิ้งไว้ดูแล GPU Temperature
❌ Model ขนาดเล็กอาจแม่นยำน้อยกว่า Cloud AI → ต้องมี Human Review Queue
❌ ใช้เวลา Migration ~16.6 ชั่วโมง (~3–4 คืน)

Decision Outcome

Chosen Option: Option 3 — Local AI (Ollama + Gemma 4 9B) + n8n + Human Verification

Rationale:

การใช้ AI จัดหมวดหมู่เอกสารอัตโนมัติด้วย Ollama ช่วยลดภาระงานจากการจัดหมวดหมู่ด้วยมือ พร้อมรักษาความปลอดภัยตามนโยบาย ADR-018 ที่กำหนดให้ AI ต้องรันบน Admin Desktop แยกต่างหาก และไม่สามารถเข้าถึง Database/Storage โดยตรง ทำให้ได้ประสิทธิภาพที่ดีขึ้นในการจัดการเอกสารขนาดใหญ่

Impact Analysis

Affected Components

Component	Impact Level	Description
Frontend Components	High	AI classification UI, suggestion displays, user confirmation flows
Backend Services	High	AI integration endpoints, validation services, classification logic
AI Infrastructure	Medium	Ollama model usage, prompt engineering, confidence scoring
Database Schema	Medium	AI suggestion storage, classification history, confidence tracking
User Experience	Medium	Workflow changes, AI-assisted classification interfaces
API Design	Medium	AI classification endpoints, response formats, error handling
Testing Framework	Low	AI accuracy tests, classification validation, user acceptance tests
Documentation	Low	User guides, AI classification procedures, troubleshooting

Required Changes

Change Category	Specific Changes	Priority
Frontend	Create AI classification suggestion components Build confidence score indicators Implement user confirmation dialogs Add classification history displays Create AI-assisted tagging interface	Critical
Backend	Implement AI classification service endpoints Create validation layer for AI suggestions Add confidence threshold processing Implement classification audit logging Create AI model communication interfaces	Critical
Database	Create AI classification suggestions table Add confidence score tracking fields Implement classification history logging Create user feedback storage for AI improvement Update data dictionary with AI fields	High
AI Integration	Setup Ollama model communication protocols Implement prompt engineering for document classification Create confidence scoring algorithms Setup fallback model switching logic Implement AI response validation	High
API	Create /api/ai/classify endpoint Implement AI suggestion confirmation endpoints Add AI service health check endpoints Create classification feedback endpoints Implement rate limiting for AI services	High
Testing	Create AI accuracy validation tests Implement classification consistency tests Add user acceptance testing for AI features Create performance tests for AI endpoints Implement security tests for AI boundaries	Medium
Documentation	Create user guide for AI classification features Document AI model limitations and best practices Create troubleshooting guide for AI issues Update API documentation with AI endpoints	Medium

Cross-Component Dependencies

Dependency	Source	Target	Impact
Frontend → AI API	Classification UI components	/api/ai/classify endpoint	User experience
Backend → AI Services	AI classification service	Ollama model API	Classification accuracy
Database → Classification	Classification results	AI suggestions table	Data persistence
Validation → AI Output	Validation layer	AI response processing	Quality control
Audit → AI Interactions	Logging service	Classification audit trail	Compliance
Testing → AI Accuracy	Test suites	Classification validation	Quality assurance

Implementation Architecture

System Components

Component	รายละเอียด	ที่ตั้ง
Migration Orchestrator	n8n (Docker บน QNAP NAS)	QNAP NAS
AI Processing Engine	Ollama + Gemma 4 9B (9.6 GB)	Admin Desktop (Desk-5439)
OCR Engine	Tesseract หรือ Google Vision (On-prem)	Same as AI
Verification UI	Next.js Frontend — Review Mode	Web Browser
Database	MariaDB — Staging + Production	QNAP NAS
File Storage	Two-Phase Storage (Temp → Permanent)	NAS Mounts

Workflow (Main Success Scenario)

[1. Ingestion]        → Batch Upload (PDF/Scan)
      ↓
[2. Pre-processing]   → File Validation (NestJS)
      ↓
[3. OCR Layer]        → Raw Text Extraction (Tesseract)
      ↓
[4. AI Analysis]      → Gemma 4 9B via Ollama (Desk-5439)
      ↓
[5. Staging]          → migration_review_queue (MariaDB)
      ↓
[6. Verification]     → Admin Review UI (Next.js)
      ↓
[7. Commit]           → POST /api/migration/commit_batch
      ↓
[8. Finalization]     → Permanent Storage (QNAP)

AI Processing Flow (ADR-018 Compliant)

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Scanned PDF   │────▶│  OCR Engine     │────▶│   Raw Text      │
│   (Staging)     │     │  (Tesseract)    │     │   (Text)        │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                              ┌──────────────────────────┘
                              ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  DMS Backend    │◀────│  Ollama API     │◀────│  AI Prompt      │
│  (Validation)   │     │  (Desk-5439)    │     │  (Gemma 4 9B)   │
└────────┬────────┘     └─────────────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│  JSON Metadata  │────▶ migration_review_queue
│  (Validated)    │
└─────────────────┘

⚠️ ADR-018 Enforcement: AI (Ollama) อยู่บน Desktop Desk-5439 แยกต่างหาก — ไม่มี Direct DB Access หรือ File System Access โดยตรง ทุกการสื่อสารผ่าน DMS Backend API เท่านั้น

AI Prompt Specification (Prompt Engineering)

Document Analysis Prompt

บทบาท: คุณคือผู้ช่วยจัดการเอกสารวิศวกรรมโยธาประจำโครงการท่าเรือแหลมฉบัง เฟส 3
ภารกิจ: อ่านข้อความดิบจากการสแกน และสรุปข้อมูลออกมาในรูปแบบ JSON เท่านั้น

ข้อความดิบ: [RAW_TEXT_FROM_OCR]

รูปแบบ JSON ที่ต้องการ:
{
  "document_type": "รายงาน/สัญญา/จดหมายโต้ตอบ",
  "contract_number": "เลขที่สัญญา (ถ้ามี)",
  "contractor_name": "ชื่อบริษัทผู้รับเหมา",
  "work_phase": "Phase ของงาน (เช่น งานถมทะเล, งานไฟฟ้า)",
  "summary": "สรุปเนื้อหาสำคัญ 1-2 ประโยค",
  "key_dates": ["วันที่ในเอกสาร", "วันกำหนดส่งงาน"]
}

Extracted Metadata Schema

Field	Type	คำอธิบาย
`document_type`	enum	Correspondence / RFA / Drawing / Minutes / Contract
`contract_number`	string	เลขที่สัญญา (เช่น LCBP3-C1)
`contractor_name`	string	ชื่อบริษัทผู้รับเหมา
`work_phase`	string	Phase งาน (จาก Master Data)
`work_zone`	string	Zone F, Basin 3, etc.
`priority`	enum	ด่วนที่สุด / ด่วน / ปกติ
`summary`	string	สรุปเนื้อหา 4-5 ประโยค
`confidence`	float	0.0–1.0 (ความมั่นใจของ AI)
`suggested_tags`	array	Tags ที่แนะนำ (is_new: true/false)

⚠️ Patch Note: document_type ต้องตรงกับ System Enum จาก GET /api/meta/categories เท่านั้น — ห้าม hardcode Category List ใน Prompt (ดู ADR-017)

Security & Compliance (ADR-018)

AI Isolation Requirements

Rule	Implementation
Physical Isolation	Ollama รันบน Admin Desktop (Desk-5439) เท่านั้น
No Direct DB Access	AI ต้องสื่อสารผ่าน DMS API → Backend → Database
No Direct Storage Access	File operations ผ่าน StorageService เท่านั้น
Validation Layer	Backend ตรวจสอบ AI Output ก่อน Write ทุกครั้ง
Audit Logging	ทุก AI Request/Response บันทึกใน Audit Log

Two-Phase Storage (Storage Governance)

ข้อห้าม:

❌ mv /data/dms/staging_ai/TCC-COR-0001.pdf /final/path/...

ข้อบังคับ (ADR-017):

Phase 1: Temp Upload (โดย n8n)
✅ POST /api/storage/upload
   → ได้ผลลัพธ์เป็น attachment_id (เช่น 1024)
   → ไฟล์จะถูกระบุเป็น is_temporary = TRUE

Phase 2: Final Commit (โดย Admin ผ่าน Frontend)
✅ POST /api/migration/commit_batch
   body: { queue_ids: [1, 2, 3] }

Confidence Threshold Policy

ข้อมูลทุกชุดจาก AI จะถูกส่งเข้า migration_review_queue โดยจัดสถานะตาม Confidence:

Confidence Level	สถานะ	การดำเนินการ
`≥ 0.85` และ `is_valid = true`	PENDING	พร้อมให้ Admin Batch Import
`0.60–0.84`	PENDING	ไฮไลต์ให้ Admin ตรวจสอบก่อน
`< 0.60` หรือ `is_valid = false`	REJECTED	รอ Admin แก้ไข Manual
AI Parse Error	ERROR_LOG	Trigger Fallback Logic

Performance Estimation

Parameter	ค่า
Delay ระหว่าง Request	2 วินาที
Inference Time (avg)	~1 วินาที
เวลาต่อ Record	~3 วินาที
จำนวน Record	20,000
เวลารวม	~60,000 วินาที (~16.6 ชั่วโมง)
จำนวนคืนที่ต้องใช้	~3–4 คืน (รัน 22:00–06:00)

Recommendation: สำหรับเครื่อง i7-9700K / 32GB RAM / RTX 2060 Super 8GB แนะนำให้ใช้ Queue (BullMQ) ประมวลผลไฟล์แบบ Sequential (ทีละไฟล์) เพื่อให้ Gemma 4 9B (9.6 GB) ทำงานได้เสถียรที่สุดใน VRAM 8GB

Consequences

Positive Consequences

✅ Automated Classification: ลดเวลาการจัดหมวดหมู่เอกสารด้วยมืออย่างมีนัยสำคัญ
✅ Consistent Categorization: AI จัดหมวดหมู่อย่างสม่ำเสมอตามเกณฑ์เดียวกัน
✅ Security Compliant (ADR-018): AI Isolation ชัดเจน ปลอดภัย
✅ Human-in-the-Loop: ความถูกต้อง 100% ก่อนยืนยันการจัดหมวดหมู่
✅ Scalable: รองรับการจัดหมวดหมู่เอกสารจำนวนมาก
✅ Cost Effective: ไม่มีค่าใช้จ่ายต่อเอกสาร (On-premise AI)

Negative Consequences

❌ Operational Overhead: ต้องดูแล GPU Temperature และ Desktop Uptime
❌ Accuracy Trade-off: Model ขนาดเล็กอาจแม่นยำน้อยกว่า Cloud AI
❌ Hardware Dependency: ต้องพึ่งพา Desktop Desk-5439
❌ Processing Time: ต้องรอ AI processing สำหรับเอกสารแต่ละฉบับ

Mitigation Strategies

Human Review Queue — บังคับตรวจสอบทุก record ก่อน Commit
Confidence Threshold — แบ่งระดับตามความมั่นใจของ AI
Fallback Model — Auto-switch ไป mistral:7b-instruct เมื่อ Error ≥ Threshold
Monitoring — ตรวจสอบ GPU Temperature และ Progress ตลอดเวลา

ADR Review Cycle

Review Classification

Core ADR Status: This ADR is classified as a Core AI Feature due to its fundamental impact on document management workflows and AI integration patterns.

Review Schedule

Review Type	Frequency	Trigger	Scope
Regular Review	Every 6 months	Calendar-based	AI accuracy, user adoption, performance
Major Version Review	Every major version (v2.0.0, v3.0.0)	Version planning	Architecture relevance, new AI capabilities
Performance Review	Quarterly	Performance monitoring	AI processing times, accuracy metrics
User Feedback Review	Quarterly	User feedback analysis	User satisfaction, workflow improvements

Review Process

Phase 1: Preparation (1 week before review)

Metrics Collection
- AI classification accuracy rates and trends
- User adoption and satisfaction metrics
- Processing performance benchmarks
- Error rates and failure patterns
- User feedback and improvement suggestions
Stakeholder Notification
- Development Team
- AI Integration Lead
- Product Management
- User Experience Team
- Quality Assurance Team

Phase 2: Review Meeting (2-hour session)

Performance Assessment
- Review AI accuracy metrics against targets
- Analyze processing time performance
- Evaluate error rates and failure patterns
- Assess system resource utilization
User Experience Evaluation
- Review user adoption and satisfaction rates
- Analyze user feedback and improvement requests
- Evaluate workflow integration effectiveness
- Assess user interface and experience quality
Technology Assessment
- Review AI model performance and accuracy
- Evaluate new AI technologies and improvements
- Assess integration with other system components
- Review security and compliance status

Phase 3: Decision & Documentation (1 week after review)

Review Outcomes
- No Change: AI classification remains effective and accurate
- Update Required: Adjust AI parameters or user interface
- Enhancement: Add new classification capabilities or features
- Retire: AI classification no longer needed (unlikely)
Documentation Updates
- Update AI classification procedures and guidelines
- Revise user documentation and training materials
- Update performance metrics and targets
- Modify integration documentation

Review Criteria

Criterion	Question	Pass/Fail Threshold
Classification Accuracy	Is AI meeting target accuracy rates?	Pass: ≥85%, Fail: <85%
User Adoption	Are users actively using AI classification?	Pass: >70% adoption, Fail: ≤70%
Processing Performance	Are processing times within acceptable limits?	Pass: <10s per document, Fail: >10s
User Satisfaction	Are users satisfied with AI suggestions?	Pass: ≥4.0/5.0, Fail: <4.0/5.0
Error Rates	Are AI errors within acceptable thresholds?	Pass: <5% error rate, Fail: ≥5%
Integration Stability	Is AI integration stable and reliable?	Pass: >99% uptime, Fail: <99%

Review History Template

## Review Cycle [YYYY-MM-DD]

**Review Type:** [Regular/Major Version/Performance/User Feedback]
**Reviewers:** [Names and roles]
**Duration:** [Meeting date]

### Findings
- [Key findings from accuracy, performance, and user experience assessment]

### Issues Identified
- [Accuracy problems, performance bottlenecks, or user experience issues]

### Recommendations
- [AI model improvements, user interface enhancements, or workflow changes]

### Outcome
- [No Change/Update Required/Enhancement/Retire]

### Next Review Date
- [YYYY-MM-DD]

ADR-017: Ollama Data Migration Architecture — Architecture หลักสำหรับ Migration
ADR-018: AI Boundary Policy — Security Isolation สำหรับ AI
03-05-n8n-migration-setup-guide.md — คู่มือติดตั้ง n8n Workflow
03-04-legacy-data-migration.md — แผน Migration แบบละเอียด
03-06-migration-business-scope.md — Go/No-Go Gates และ Business Scope
00-02-glossary.md — คำศัพท์และตัวย่อในระบบ DMS

Document History

Version	Date	Author	Changes
1.0.0	2026-02-26	Nattanin	Initial Use Case Specification
1.8.1	2026-03-27	Tech Lead	Refactored to ADR format — Aligned with ADR-017, ADR-018, and Project Specs
1.8.3	2026-04-04	System Architect	Renamed — Changed from "Smart Legacy Document Digitization" to "AI Document Classification" for clarity and simplicity
1.8.4	2026-04-04	System Architect	Enhanced — Added Impact Analysis template, ADR Review Cycle process, Gap Linking to requirements, and Version Dependency tracking

Last Updated: 2026-04-04 Status: Accepted Next Review: 2026-06-01 (Quarterly review) Next 6-Month Review: 2026-10-04 (regular review cycle)

25 KiB Raw Blame History Unescape Escape

ADR-017B: AI Document Classification

Context and Problem Statement

ปัญหาที่ต้องการแก้ไข

ข้อจำกัดที่สำคัญ

Decision Drivers

Considered Options

Option 1: Manual Data Entry (No AI)

Option 2: Cloud AI Service (OpenAI, Google Vision, Azure AI)

Option 3: Local AI (Ollama + Gemma 4 9B) + n8n + Human Verification ⭐ (Selected)

Decision Outcome

Impact Analysis

Affected Components

Required Changes

Cross-Component Dependencies

Implementation Architecture

System Components

Workflow (Main Success Scenario)

AI Processing Flow (ADR-018 Compliant)

AI Prompt Specification (Prompt Engineering)

Document Analysis Prompt

Extracted Metadata Schema

Security & Compliance (ADR-018)

AI Isolation Requirements

Two-Phase Storage (Storage Governance)

Confidence Threshold Policy

Performance Estimation

Consequences

Positive Consequences

Negative Consequences

Mitigation Strategies

ADR Review Cycle

Review Classification

Review Schedule

Review Process

Phase 1: Preparation (1 week before review)

Phase 2: Review Meeting (2-hour session)

Phase 3: Decision & Documentation (1 week after review)

Review Criteria

Review History Template

Related Documents

Document History

25 KiB

Raw Blame History