
ADR-030: Context-Aware Prompt Templates for OCR Metadata Extraction

Status: Accepted
Date: 2026-05-27
Decision Makers: Development Team, System Architect
Related Documents: ADR-029, ADR-023A, ADR-019, ADR-007, ADR-009 (listed in full at the end of this document)

Context and Problem Statement

ADR-029 solved the hardcoded-prompt problem by storing prompt templates in the ai_prompts table, but several limitations remain:

No context awareness: a prompt template cannot specify the master data context (projects, organizations, disciplines, etc.) that the AI needs in order to match data.
Uncontrolled context size: there is no mechanism to filter master data by project/contract scope, so the context grows too large.
Hardcoded language: the existing prompt template is written in English, but LCBP3-DMS is a Thai-language system.
Incomplete output schema: only 8 fields are extracted, and fields essential for correspondence creation are still missing (e.g. recipient organizations, UUID matching).

Decision Drivers

Context Filtering: master data must be filtered by project/contract scope to reduce context size and prevent cross-project data leaks (ADR-023A).
UUID Matching: the AI must return UUIDs (not INT IDs), per ADR-019.
Thai Language Support: the prompt and the output must be in Thai.
Flexible Configuration: admins must be able to set filter criteria and context config through the AI Admin Console.
Backward Compatibility: existing prompt templates that have no context_config must keep working.

Considered Options

Option 1: Add dedicated columns (project_id, contract_id)

Pros: simple queries, type-safe
Cons: inflexible; any new filter criterion in the future requires another ALTER TABLE

Option 2: Store filter criteria in field_schema (JSON)

Pros: no ALTER TABLE needed
Cons: mixes the output schema with filter config, which is confusing and complicates queries

Option 3: Add a context_config JSON column (chosen option)

Pros: flexible, clean separation of concerns (field_schema vs context_config), accommodates future config
Cons: requires a one-time ALTER TABLE

Decision Outcome

Chosen option: Option 3, add a context_config JSON column

Core Decisions (Grilling Session 2026-05-27)

| # | Topic | Decision |
|---|-------|----------|
| 1 | Page Limit | 3 pages (per ADR-023A, classification/tagging rule) |
| 2 | Database Query Strategy | Option A: the backend fetches master data and passes it as context in the prompt (the AI does not query the DB directly, per ADR-023) |
| 3 | JSON Output Schema | 11 fields: projectPublicId, correspondenceTypeCode, disciplineCode, originatorOrganizationPublicId, recipientOrganizationPublicIds, recipientTypes, subject, documentDate, tags, summary, confidence |
| 4 | Context Format | Option A: list format (array of objects), which is easy for the AI to scan |
| 5 | Filter Strategy | Defined in the prompt template, filtering by projects and contracts |
| 6 | Filter Storage | Option C: add a `context_config` JSON column |
| 7 | Language | Thai, for both the prompt instructions and the summary output |
| 8 | UUID Handling | ADR-019: the AI returns UUID strings (not INT IDs) |
| 9 | Fallback Strategy | If the AI finds no match, it returns `null` and human validation is required |

Implementation Details

1. Schema Changes (ADR-009)

-- Delta: add context_config to hold the filter criteria
ALTER TABLE ai_prompts
ADD COLUMN context_config JSON NULL
COMMENT 'Configuration for the context that the backend must send to the AI (filter, pageSize, language, etc.)';

2. context_config Structure

{
  "filter": {
    "projectId": 123,
    "contractId": 456
  },
  "pageSize": 3,
  "language": "th",
  "outputLanguage": "th"
}

Field Descriptions:

filter.projectId: INT or null. Filters master data by project scope.
filter.contractId: INT or null. Filters master data by contract scope.
pageSize: INT (default: 3). Number of PDF pages that OCR extracts (per ADR-023A).
language: string (default: "th"). Language of the prompt instructions.
outputLanguage: string (default: "th"). Language of the output (summary).
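The field descriptions above, together with the backward-compatibility requirement for templates that have no context_config, can be sketched as a type plus a normalizing function. This is only an illustration: `normalizeContextConfig`, `isRecord`, and `asPositiveInt` are hypothetical names, and treating non-positive or non-integer values as "not set" is an assumption rather than something the ADR specifies.

```typescript
// Shape of ai_prompts.context_config as described in the field list above.
interface ContextFilter {
  projectId: number | null;
  contractId: number | null;
}

interface ContextConfig {
  filter: ContextFilter;
  pageSize: number;
  language: string;
  outputLanguage: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Assumption: IDs and page counts are positive integers; anything else counts as unset.
function asPositiveInt(v: unknown): number | null {
  return typeof v === "number" && Number.isInteger(v) && v > 0 ? v : null;
}

// Hypothetical helper: turns the raw JSON column value (possibly NULL for
// legacy templates) into a fully populated config using the documented defaults.
function normalizeContextConfig(raw: unknown): ContextConfig {
  const obj: Record<string, unknown> = isRecord(raw) ? raw : {};
  const filter: Record<string, unknown> = isRecord(obj.filter) ? obj.filter : {};
  return {
    filter: {
      projectId: asPositiveInt(filter.projectId),
      contractId: asPositiveInt(filter.contractId),
    },
    pageSize: asPositiveInt(obj.pageSize) ?? 3,
    language: typeof obj.language === "string" ? obj.language : "th",
    outputLanguage: typeof obj.outputLanguage === "string" ? obj.outputLanguage : "th",
  };
}
```

Normalizing once at the boundary means the rest of the backend never has to distinguish a legacy template (NULL column) from one with an explicit "no filter" config.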

3. JSON Output Schema (field_schema)

{
  "projectPublicId": "string|null",
  "correspondenceTypeCode": "string|null",
  "disciplineCode": "string|null",
  "originatorOrganizationPublicId": "string|null",
  "recipientOrganizationPublicIds": "string[]|null",
  "recipientTypes": "string[]|null",
  "subject": "string|null",
  "documentDate": "string:YYYY-MM-DD|null",
  "tags": "string[]|null",
  "summary": "string|null",
  "confidence": "float:0-1"
}

Mapping to Database:

projectPublicId → projects.uuid (ADR-019)
correspondenceTypeCode → correspondence_types.type_code
disciplineCode → disciplines.discipline_code
originatorOrganizationPublicId → organizations.uuid (originator_id)
recipientOrganizationPublicIds → organizations.uuid[] (correspondence_recipients)
recipientTypes → correspondence_recipients.recipient_type ("TO", "CC")
subject → correspondence_revisions.title
documentDate → correspondence_revisions.document_date
tags → tags.tag_name[]
summary: AI-generated summary (4-5 sentences in Thai)
confidence: AI confidence score (0.0-1.0)
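Because recipientOrganizationPublicIds and recipientTypes are parallel arrays, the backend has to pair them by index before writing correspondence_recipients rows. The sketch below shows one way to do that, producing the object-array shape recorded under Q3 of the Grilling Session Log. `zipRecipients` is a hypothetical helper, and returning null (so the record goes to human validation) on mismatched lengths or unknown types is an assumption.

```typescript
type RecipientType = "TO" | "CC";

interface Recipient {
  organizationPublicId: string;
  recipientType: RecipientType;
}

// Hypothetical helper: pairs the two parallel arrays by index.
// Returns null when the pairing is ambiguous or a type is not recognized.
function zipRecipients(
  organizationPublicIds: string[] | null,
  recipientTypes: string[] | null,
): Recipient[] | null {
  if (organizationPublicIds === null || recipientTypes === null) return null;
  if (organizationPublicIds.length !== recipientTypes.length) return null;

  const recipients: Recipient[] = [];
  for (let i = 0; i < organizationPublicIds.length; i++) {
    const id = organizationPublicIds[i];
    // trim() also tolerates the legacy 'CC ' value with trailing whitespace.
    const type = recipientTypes[i]?.trim().toUpperCase();
    if (id === undefined || (type !== "TO" && type !== "CC")) return null;
    recipients.push({ organizationPublicId: id, recipientType: type });
  }
  return recipients;
}
```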

4. Context Format (List Format)

The backend sends the master data context in the following format:

{
  "availableProjects": [
    {"code": "LCBP3", "uuid": "0195...", "name": "โครงการ LCBP3"}
  ],
  "availableOrganizations": [
    {"code": "กทท.", "uuid": "0195...", "name": "การท่าเรือแห่งประเทศไทย"},
    {"code": "TEAM", "uuid": "0195...", "name": "TEAM Consulting"}
  ],
  "availableDisciplines": [
    {"code": "GEN", "name": "General"},
    {"code": "STR", "name": "Structural"}
  ],
  "availableCorrespondenceTypes": [
    {"code": "RFA", "name": "Request for Approval"},
    {"code": "RFI", "name": "Request for Information"}
  ],
  "availableTags": [
    {"name": "Urgent", "color": "red"},
    {"name": "Review", "color": "blue"}
  ]
}
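Since the model is only supposed to pick values from this context, the backend can check each returned UUID or code against the lists it sent and fall back to null (which triggers human validation per the fallback strategy) when there is no match. The following is a minimal sketch; `matchOfferedUuid` and `matchOfferedCode` are hypothetical helpers, and the case-insensitive comparison is an assumption.

```typescript
// One entry of the master data context. Disciplines and correspondence
// types carry no uuid in the example above, hence the optional field.
interface ContextEntry {
  code: string;
  name: string;
  uuid?: string;
}

// Hypothetical helper: accept a UUID only if it was offered in the context.
function matchOfferedUuid(candidate: string | null, offered: ContextEntry[]): string | null {
  if (candidate === null) return null;
  const wanted = candidate.trim().toLowerCase();
  const hit = offered.find(
    (entry) => entry.uuid !== undefined && entry.uuid.toLowerCase() === wanted,
  );
  return hit?.uuid ?? null;
}

// Hypothetical helper: the same check for code-based fields such as disciplineCode.
function matchOfferedCode(candidate: string | null, offered: ContextEntry[]): string | null {
  if (candidate === null) return null;
  const wanted = candidate.trim().toUpperCase();
  const hit = offered.find((entry) => entry.code.toUpperCase() === wanted);
  return hit?.code ?? null;
}
```

Returning the value stored in the context, rather than the model's own string, also normalizes casing and whitespace before the value reaches the database.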

5. Prompt Template Example (Thai)

คุณเป็นเอนจิ้นสกัดข้อมูลเอกสารมืออาชีพ
วิเคราะห์ข้อความ OCR จากเอกสารโครงการ (3 หน้าแรกเท่านั้น) และสกัดข้อมูลเมตาดาต้า

ข้อความ OCR:
{{ocr_text}}

ข้อมูลอ้างอิงที่ใช้ได้:
{{master_data_context}}

สกัด fields ต่อไปนี้:
1. projectPublicId: UUID ของโครงการ (จาก availableProjects)
2. correspondenceTypeCode: รหัสประเภทเอกสาร (เช่น RFA, RFI)
3. disciplineCode: รหัสสาขางาน (เช่น GEN, STR)
4. originatorOrganizationPublicId: UUID ขององค์กรผู้ส่ง
5. recipientOrganizationPublicIds: UUID[] ขององค์กรผู้รับ (หลายองค์กรได้)
6. recipientTypes: string[] ("TO", "CC")
7. subject: หัวข้อเอกสาร
8. documentDate: วันที่เอกสาร (YYYY-MM-DD)
9. tags: string[] รายชื่อ tags
10. summary: สรุปเอกสาร 4-5 ประโยคภาษาไทย
11. confidence: ความมั่นใจ (0.0-1.0)

ส่งคืนเฉพาะ JSON object ที่ถูกต้อง ไม่รวม markdown code blocks
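The last line of the template asks the model to return only a valid JSON object with no markdown code blocks. Models do not always comply, so a defensive parser on the backend is a common safeguard. The sketch below is an illustration only: `parseModelJson` is a hypothetical helper, and returning null on any failure (so the document is routed to human validation) is an assumption.

```typescript
// Hypothetical helper: parses the model reply into a plain object,
// tolerating a single surrounding markdown code fence.
function parseModelJson(raw: string): Record<string, unknown> | null {
  let text = raw.trim();

  // \x60 is the backtick character; the pattern matches a reply wrapped in
  // three backticks, an optional language tag, the body, and three backticks.
  const fenced = /^\x60{3}[a-zA-Z]*\s*\n([\s\S]*?)\n?\x60{3}$/.exec(text);
  if (fenced && fenced[1] !== undefined) text = fenced[1].trim();

  try {
    const parsed: unknown = JSON.parse(text);
    // The schema is a JSON object, so arrays and primitives are rejected.
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}
```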

6. Backend Implementation

AiPromptsService.resolveContext()

async resolveContext(activePrompt: AiPrompt): Promise<Record<string, unknown>> {
  // Templates created before this ADR have no context_config: fall back to "no filter".
  const config = activePrompt.contextConfig ?? {};
  const filter = config.filter ?? {};

  const projectId = filter.projectId;
  const contractId = filter.contractId;

  // Query master data within the configured scope. The lookups are
  // independent of each other, so they run concurrently.
  const [projects, organizations, disciplines, correspondenceTypes, tags] = await Promise.all([
    this.projectService.findAll({ projectId }),
    this.organizationService.findAll({ projectId, contractId }),
    this.disciplineService.findAll({ contractId }),
    this.correspondenceTypeService.findAll(),
    this.tagsService.findAll({ projectId }),
  ]);

  // Expose only what the model needs; identifiers are UUIDs, never INT IDs (ADR-019).
  return {
    availableProjects: projects.map(p => ({ code: p.projectCode, uuid: p.uuid, name: p.projectName })),
    availableOrganizations: organizations.map(o => ({ code: o.organizationCode, uuid: o.uuid, name: o.organizationName })),
    availableDisciplines: disciplines.map(d => ({ code: d.disciplineCode, name: d.codeNameTh })),
    availableCorrespondenceTypes: correspondenceTypes.map(t => ({ code: t.typeCode, name: t.typeName })),
    availableTags: tags.map(t => ({ name: t.tagName, color: t.colorCode })),
  };
}

AiBatchProcessor.processSandboxExtract()

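const activePrompt = await this.aiPromptsService.getActive('ocr_extraction');
const context = await this.aiPromptsService.resolveContext(activePrompt);

// Replacer functions insert the values literally; with a plain string,
// String.prototype.replace would interpret sequences such as "$&" in the OCR text.
// The untrusted OCR text goes in last so that a stray "{{master_data_context}}"
// inside it is never expanded.
const prompt = activePrompt.template
  .replace('{{master_data_context}}', () => JSON.stringify(context, null, 2))
  .replace('{{ocr_text}}', () => ocrText);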
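A string-pattern `replace` substitutes only the first occurrence of each placeholder and, when the replacement is a plain string, interprets `$` sequences in it. A single-pass renderer avoids both issues and also guarantees that substituted text is never rescanned for placeholders. The sketch below is one possible approach; `renderTemplate` is a hypothetical helper, not part of the ADR.

```typescript
// Hypothetical helper: single-pass {{placeholder}} substitution.
// - every occurrence of a known placeholder is replaced
// - values are inserted literally ("$&" and similar are not interpreted)
// - placeholders that appear inside a value are not expanded
// - unknown placeholders are left untouched
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g,
    (match: string, key: string): string => {
      // hasOwnProperty guards against inherited keys such as "constructor".
      const value = Object.prototype.hasOwnProperty.call(values, key) ? values[key] : undefined;
      return value ?? match;
    },
  );
}
```

With this helper the prompt would be built as `renderTemplate(activePrompt.template, { ocr_text: ocrText, master_data_context: JSON.stringify(context, null, 2) })`, independent of the order in which the placeholders appear.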

7. Frontend Changes (AI Admin Console)

Prompt Editor UI:

Add a "Context Configuration" section
Dropdown to select a Project (optional)
Dropdown to select a Contract (optional)
Input field: Page Size (default: 3)
Select: Language (th/en)

n8n Workflow:

Add a "Select Project/Contract" node before "OCR Extraction"
Send projectId and contractId to the DMS API /ai/ocr-extract
The backend resolves the context from context_config, or overrides it with the input parameters
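How request parameters interact with context_config is constrained by Q8 in the Grilling Session Log: if the template is bound to a project and the request names a different one, the request is rejected with a ForbiddenException. A minimal sketch of that rule follows. `resolveScope` is a hypothetical helper, the local `ForbiddenException` class only stands in for the NestJS exception so the example is self-contained, and the contract precedence (template value first, then request value) is an assumption because Q8 covers projects only.

```typescript
// Local stand-in for the framework's ForbiddenException.
class ForbiddenException extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ForbiddenException";
  }
}

interface ScopeFilter {
  projectId: number | null;
  contractId: number | null;
}

// Hypothetical helper: merges the template's filter with request parameters.
function resolveScope(templateFilter: ScopeFilter, requested: Partial<ScopeFilter>): ScopeFilter {
  const bound = templateFilter.projectId;
  const asked = requested.projectId ?? null;

  // Q8: a template bound to one project must not be redirected to another.
  if (bound !== null && asked !== null && asked !== bound) {
    throw new ForbiddenException(
      `Prompt template is bound to project ${bound}; request asked for project ${asked}`,
    );
  }

  return {
    // An unbound template may be scoped by the request.
    projectId: bound ?? asked,
    // Assumption: the template's contract wins when set, otherwise the request's.
    contractId: templateFilter.contractId ?? requested.contractId ?? null,
  };
}
```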

Consequences

Positive

Admins can set filter criteria at runtime through the AI Admin Console.
Context size is controlled by project/contract scope.
The AI returns UUIDs per ADR-019, ready for import.
The prompt and the output are in Thai.
Flexible enough for future config (language, temperature, etc.).

Negative / Caveats

The ai_prompts table must be altered to add the context_config column.
The backend runs additional master data queries per job (mitigated with a Redis cache).
Existing prompt templates (seed data) must be migrated to include context_config.
The n8n workflow must be updated to support the filter criteria.

Migration Plan

Phase 1: Database Schema

Run delta SQL: ALTER TABLE ai_prompts ADD COLUMN context_config JSON NULL
Update the AiPrompt entity to add the contextConfig property
Update DTOs (CreateAiPromptDto, UpdateAiPromptDto)

Phase 2: Backend Logic

Implement AiPromptsService.resolveContext()
Update AiBatchProcessor to send the context to the AI
Add a Redis cache for the master data context (TTL: 300s)
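The caching step can be illustrated with a small get-or-load cache keyed by the filter scope. The sketch below uses an in-memory Map as a stand-in for Redis so that it runs on its own; `TtlCache`, `contextCacheKey`, and the key format are hypothetical, and a real implementation would go through whichever Redis client the project already uses.

```typescript
// In-memory stand-in for the Redis cache, with an injectable clock for testing.
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Returns the cached value while it is fresh; otherwise loads and stores it.
  async getOrLoad(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;

    const value = await load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Hypothetical key format: the scope is part of the key, so contexts built
// for different projects or contracts never share a cache entry.
function contextCacheKey(projectId: number | null, contractId: number | null): string {
  return `ai:context:${projectId ?? "all"}:${contractId ?? "all"}`;
}
```

With a 300-second TTL, `new TtlCache(300 * 1000)` would wrap the master data lookups so that jobs within the same scope reuse one resolved context for up to five minutes.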

Phase 3: Seed Data

Update the seed data in 2026-05-25-create-ai-prompts.sql
Add context_config for version 1 (null = no filter)
Create a new version 2 with the Thai prompt template and an example context_config

Phase 4: Frontend

Update the AI Admin Console UI to include a Context Configuration section
Update the n8n workflow to support the filter criteria

Phase 5: Testing

Unit tests for the context resolution logic
Integration tests for the sandbox and migration pipeline
Manual testing with the new prompt template

Grilling Session Log

2026-05-27: grilling session conducted via Antigravity AI
Q1: Page limit → 3 pages (per ADR-023A)
Q2: Database query strategy → Option A (the backend fetches master data and sends it as context)
Q3: JSON output schema → 11 fields (UUID-based per ADR-019), with recipients revised to an object array: Array<{ organizationPublicId: string, recipientType: "TO" | "CC" }>
Q4: Context format → Option A (list format)
Q5: Filter strategy → defined in the prompt template, filtering by projects/contracts
Q6: Filter storage → Option C (add a context_config JSON column)
Q7: Tag suggestion → Option A (the backend diffs the returned tags against availableTags and sets isNew: true itself)
Q8: Project scope priority → Option C (if the template is bound to a project in context_config and a request tries to override it with a different project, the system rejects the request immediately with a ForbiddenException)
Q9: Database typo cleanup → Option C (approved removing the trailing-whitespace typo in the value 'CC ', correcting it to 'CC' system-wide at the database schema level, and updating the core structure files)
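The tag diff decided in Q7 can be sketched as follows: the backend compares the tags returned by the AI with availableTags and flags the ones that do not exist yet. `diffTags` is a hypothetical helper, and the case-insensitive comparison and removal of duplicates are assumptions.

```typescript
interface AvailableTag {
  name: string;
  color: string;
}

interface TagSuggestion {
  name: string;
  isNew: boolean;
}

// Hypothetical helper: marks each extracted tag as new or existing.
function diffTags(extracted: string[] | null, availableTags: AvailableTag[]): TagSuggestion[] {
  const known = new Set(availableTags.map((t) => t.name.trim().toLowerCase()));
  const seen = new Set<string>();
  const result: TagSuggestion[] = [];

  for (const raw of extracted ?? []) {
    const name = raw.trim();
    const key = name.toLowerCase();
    // Skip blanks and repeated tags so each suggestion appears once.
    if (name === "" || seen.has(key)) continue;
    seen.add(key);
    result.push({ name, isNew: !known.has(key) });
  }
  return result;
}
```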

Related Documents

ADR-029: Dynamic Prompt Management (base architecture)
ADR-023A: Unified AI Architecture, Model Revision (3-page rule, AI boundary)
ADR-019: Hybrid Identifier Strategy (UUID handling)
ADR-007: Error Handling Strategy (layered error classification)
ADR-009: Database Migration Strategy (direct SQL edits)
