refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
@@ -518,6 +518,7 @@ graph TB
|
|||||||
- **np-dms-ai** - Main LLM for classification, tagging, extraction, RAG answers
|
- **np-dms-ai** - Main LLM for classification, tagging, extraction, RAG answers
|
||||||
- **np-dms-ocr** - OCR model through the sidecar, with adaptive residency from ADR-033
|
- **np-dms-ocr** - OCR model through the sidecar, with adaptive residency from ADR-033
|
||||||
- **BGE-M3 + BGE Reranker** - Retrieval stack served by the OCR sidecar
|
- **BGE-M3 + BGE Reranker** - Retrieval stack served by the OCR sidecar
|
||||||
|
- **OCR Sidecar Phase 1 hardening** - ADR-040 keeps X-API-Key before ADR-041 cutover, enforces upload-base path canonicalization, and verifies adaptive residency/CPU fallback with `tests/unit/ocr-sidecar/` plus `tests/integration/ocr-sidecar/`.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
@@ -304,6 +304,8 @@ _Avoid_: Throw exception from tool, Untyped error
|
|||||||
- **"Master Data context parity (Gap 5)"** - resolved: the sandbox (`processSandboxExtract`/`processSandboxAiExtract`) currently skips master data context when `projectPublicId='default'`, which makes the prompt content differ from production. The sandbox UI must have the admin specify a real `projectPublicId` (and `contractPublicId`); `aiPromptsService.resolveContext` must always be called with real IDs (never `'default'` as a way to skip); `aiPromptsService` returns an empty context if the project/contract has no master data
|
- **"Master Data context parity (Gap 5)"** - resolved: the sandbox (`processSandboxExtract`/`processSandboxAiExtract`) currently skips master data context when `projectPublicId='default'`, which makes the prompt content differ from production. The sandbox UI must have the admin specify a real `projectPublicId` (and `contractPublicId`); `aiPromptsService.resolveContext` must always be called with real IDs (never `'default'` as a way to skip); `aiPromptsService` returns an empty context if the project/contract has no master data
|
||||||
- **"Apply Guardrails (Gap 6)"** - resolved: Apply to Production is a critical config change, so it needs guardrails per AGENTS.md: (1) **Idempotency-Key** header is mandatory for `POST /api/ai/profiles/:profileName/apply` (Redis dedupe for 5 minutes); (2) **CASL Guard** `@UseGuards(CaslGuard)` + permission `system.manage_ai`; (3) **Param Validation** with class-validator (`@Min(0) @Max(1)` for temperature/topP); (4) **Audit Trail**: `ai_audit_logs` records `action='APPLY_PROFILE'`, user, and old→new values; (5) **Range Guard**: the service layer throws `BusinessException` when a value is out of range
|
- **"Apply Guardrails (Gap 6)"** - resolved: Apply to Production is a critical config change, so it needs guardrails per AGENTS.md: (1) **Idempotency-Key** header is mandatory for `POST /api/ai/profiles/:profileName/apply` (Redis dedupe for 5 minutes); (2) **CASL Guard** `@UseGuards(CaslGuard)` + permission `system.manage_ai`; (3) **Param Validation** with class-validator (`@Min(0) @Max(1)` for temperature/topP); (4) **Audit Trail**: `ai_audit_logs` records `action='APPLY_PROFILE'`, user, and old→new values; (5) **Range Guard**: the service layer throws `BusinessException` when a value is out of range
|
||||||
- **"Entity/Service canonicalModel mapping (Gap 7)"** - resolved: `AiExecutionProfileEntity` has no mapping for the `canonical_model` column, and `getProfileParameters` (`:125`) hardcodes `canonicalModel: 'np-dms-ai'`. Add `@Column({ name: 'canonical_model' })` to the entity; change `getProfileParameters` to read from the column instead of the hardcoded value; create an accessor `getModelDefaults(canonicalModel)` for querying by canonical_model directly
|
- **"Entity/Service canonicalModel mapping (Gap 7)"** - resolved: `AiExecutionProfileEntity` has no mapping for the `canonical_model` column, and `getProfileParameters` (`:125`) hardcodes `canonicalModel: 'np-dms-ai'`. Add `@Column({ name: 'canonical_model' })` to the entity; change `getProfileParameters` to read from the column instead of the hardcoded value; create an accessor `getModelDefaults(canonicalModel)` for querying by canonical_model directly
|
||||||
|
- **"OCR Sidecar X-API-Key"** - resolved: use **Network Isolation Only** (ADR-040 D5), which supersedes ADR-033 §7; remove `X-API-Key` validation from the sidecar endpoints; enforce access through the Docker-internal network (post-consolidation) or a VLAN/firewall ACL (interim cross-host); sequencing: remove `X-API-Key` only once the ADR-041 cutover is complete (single Docker host)
|
||||||
|
- **"Cross-host trust gap of the OCR sidecar"** - resolved: use **Server Consolidation** (ADR-041): co-locate all services on a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB); the sidecar and backend sit on the same Docker bridge, giving true Docker-internal isolation; QNAP remains the NAS (CIFS) for file storage
|
||||||
|
|
||||||
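The range guard in item (5) of the Gap 6 entry can be illustrated with a minimal sketch. It assumes the guard only covers `temperature` and `topP` in the closed interval [0, 1], mirroring the `@Min(0) @Max(1)` validation in item (3). The `BusinessException` class below is a hypothetical stand-in for the project's own exception, and `assertSamplingRange` is an illustrative name that does not appear in this commit.

```typescript
// Hypothetical stand-in for the project's BusinessException (real class not shown in this diff).
class BusinessException extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'BusinessException';
  }
}

// Only the two parameters named in the Gap 6 entry are modelled here.
type SamplingParams = { temperature?: number; topP?: number };

// Throws when a provided value is not a finite number inside [0, 1].
function assertUnitInterval(name: string, value: number | undefined): void {
  if (value === undefined) return; // absent values are left to the DTO layer
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new BusinessException(`${name} must be within [0, 1], got ${value}`);
  }
}

// Service-layer check, intended to run even if DTO validation was bypassed.
function assertSamplingRange(params: SamplingParams): void {
  assertUnitInterval('temperature', params.temperature);
  assertUnitInterval('topP', params.topP);
}
```

Repeating the check in the service layer is deliberate in this sketch: class-validator only runs on the HTTP path, while the service may also be called from queue workers or tests.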
## ADRs Related to the AI Runtime Layer
|
## ADRs Related to the AI Runtime Layer
|
||||||
|
|
||||||
|
|||||||
@@ -51,7 +51,7 @@ QDRANT_URL=http://localhost:6333
|
|||||||
|
|
||||||
# Ollama (Admin Desktop Desk-5439 — ADR-034 Thai-Optimized Model Stack)
|
# Ollama (Admin Desktop Desk-5439 — ADR-034 Thai-Optimized Model Stack)
|
||||||
OLLAMA_MODEL_MAIN=typhoon2.5-np-dms:latest
|
OLLAMA_MODEL_MAIN=typhoon2.5-np-dms:latest
|
||||||
OLLAMA_MODEL_OCR=typhoon-np-dms-ocr:latest
|
OLLAMA_MODEL_OCR=np-dms-ocr:latest
|
||||||
OLLAMA_MODEL_EMBED=nomic-embed-text
|
OLLAMA_MODEL_EMBED=nomic-embed-text
|
||||||
OLLAMA_EMBED_MODEL=nomic-embed-text
|
OLLAMA_EMBED_MODEL=nomic-embed-text
|
||||||
OLLAMA_RAG_MODEL=typhoon2.5-np-dms:latest
|
OLLAMA_RAG_MODEL=typhoon2.5-np-dms:latest
|
||||||
@@ -67,12 +67,10 @@ AI_REALTIME_CONCURRENCY=2
|
|||||||
QDRANT_HOST=http://192.168.10.8:6333
|
QDRANT_HOST=http://192.168.10.8:6333
|
||||||
QDRANT_COLLECTION=lcbp3_documents
|
QDRANT_COLLECTION=lcbp3_documents
|
||||||
|
|
||||||
# OCR sidecar (PaddleOCR on Desk-5439)
|
# OCR sidecar (np-dms-ocr on Desk-5439)
|
||||||
OCR_CHAR_THRESHOLD=100
|
OCR_CHAR_THRESHOLD=100
|
||||||
OCR_API_URL=http://192.168.10.8:8765
|
OCR_API_URL=http://192.168.10.100:8765
|
||||||
|
OCR_SIDECAR_API_KEY=change-me-sidecar-api-key
|
||||||
# Thai preprocessing microservice (PyThaiNLP — Admin Desktop)
|
|
||||||
THAI_PREPROCESS_URL=http://192.168.10.8:8765
|
|
||||||
|
|
||||||
# ADR-023 forbids cloud AI fallback for project documents.
|
# ADR-023 forbids cloud AI fallback for project documents.
|
||||||
|
|
||||||
|
|||||||
@@ -134,7 +134,7 @@ export class AiQueueService {
|
|||||||
filePublicId?: string;
|
filePublicId?: string;
|
||||||
pdfPath?: string;
|
pdfPath?: string;
|
||||||
engineType?: string;
|
engineType?: string;
|
||||||
typhoonOptions?: {
|
ocrOptions?: {
|
||||||
temperature?: number;
|
temperature?: number;
|
||||||
topP?: number;
|
topP?: number;
|
||||||
repeatPenalty?: number;
|
repeatPenalty?: number;
|
||||||
@@ -154,7 +154,7 @@ export class AiQueueService {
|
|||||||
filePublicId: payload.filePublicId,
|
filePublicId: payload.filePublicId,
|
||||||
pdfPath: payload.pdfPath,
|
pdfPath: payload.pdfPath,
|
||||||
engineType: payload.engineType,
|
engineType: payload.engineType,
|
||||||
typhoonOptions: payload.typhoonOptions,
|
ocrOptions: payload.ocrOptions,
|
||||||
contractPublicId: payload.contractPublicId,
|
contractPublicId: payload.contractPublicId,
|
||||||
...payload.extraPayload,
|
...payload.extraPayload,
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -567,7 +567,7 @@ export class AiController {
|
|||||||
},
|
},
|
||||||
engineType: {
|
engineType: {
|
||||||
type: 'string',
|
type: 'string',
|
||||||
enum: ['auto', 'tesseract', 'np-dms-ocr', 'typhoon-np-dms-ocr'],
|
enum: ['auto', 'np-dms-ocr'],
|
||||||
description: 'OCR engine ที่ต้องการใช้ (default: auto)',
|
description: 'OCR engine ที่ต้องการใช้ (default: auto)',
|
||||||
},
|
},
|
||||||
temperature: {
|
temperature: {
|
||||||
@@ -607,19 +607,14 @@ export class AiController {
|
|||||||
const attachment = await this.fileStorageService.upload(file, user.user_id);
|
const attachment = await this.fileStorageService.upload(file, user.user_id);
|
||||||
const requestPublicId = uuidv7();
|
const requestPublicId = uuidv7();
|
||||||
// Validate and normalize engineType to a valid value
|
// Validate and normalize engineType to a valid value
|
||||||
const validEngineTypes = [
|
const validEngineTypes = ['auto', 'np-dms-ocr'] as const;
|
||||||
'auto',
|
|
||||||
'tesseract',
|
|
||||||
'np-dms-ocr',
|
|
||||||
'typhoon-np-dms-ocr',
|
|
||||||
] as const;
|
|
||||||
const resolvedEngineType: SandboxOcrEngineType = validEngineTypes.includes(
|
const resolvedEngineType: SandboxOcrEngineType = validEngineTypes.includes(
|
||||||
engineType as SandboxOcrEngineType
|
engineType as SandboxOcrEngineType
|
||||||
)
|
)
|
||||||
? (engineType as SandboxOcrEngineType)
|
? (engineType as SandboxOcrEngineType)
|
||||||
: 'auto';
|
: 'auto';
|
||||||
// Convert strings from the multipart form to numbers (optional override)
|
// Convert strings from the multipart form to numbers (optional override)
|
||||||
const typhoonOptions = {
|
const ocrOptions = {
|
||||||
...(temperature !== undefined && {
|
...(temperature !== undefined && {
|
||||||
temperature: parseFloat(temperature),
|
temperature: parseFloat(temperature),
|
||||||
}),
|
}),
|
||||||
@@ -634,7 +629,7 @@ export class AiController {
|
|||||||
idempotencyKey: requestPublicId,
|
idempotencyKey: requestPublicId,
|
||||||
pdfPath: attachment.filePath,
|
pdfPath: attachment.filePath,
|
||||||
engineType: resolvedEngineType,
|
engineType: resolvedEngineType,
|
||||||
...(Object.keys(typhoonOptions).length > 0 && { typhoonOptions }),
|
...(Object.keys(ocrOptions).length > 0 && { ocrOptions }),
|
||||||
}
|
}
|
||||||
);
|
);
|
||||||
return { requestPublicId, jobId, status: 'queued' };
|
return { requestPublicId, jobId, status: 'queued' };
|
||||||
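The controller hunks above narrow `engineType` to `'auto' | 'np-dms-ocr'` and build `ocrOptions` from multipart form strings. A minimal standalone sketch of the same two steps follows. The function names `normalizeEngineType` and `parseOcrOptions` are illustrative, not part of this commit, and dropping non-numeric input instead of passing `NaN` through is an assumption that differs from the `parseFloat` calls in the diff.

```typescript
// The two engine types left after this commit, as shown in the diff.
const VALID_ENGINE_TYPES = ['auto', 'np-dms-ocr'] as const;
type EngineType = (typeof VALID_ENGINE_TYPES)[number];

// Type guard: true only for the exact strings listed above.
function isEngineType(value: unknown): value is EngineType {
  return (
    typeof value === 'string' &&
    (VALID_ENGINE_TYPES as readonly string[]).includes(value)
  );
}

// Unknown or removed values (e.g. 'tesseract') fall back to 'auto'.
function normalizeEngineType(value: unknown): EngineType {
  return isEngineType(value) ? value : 'auto';
}

type OcrOptions = { temperature?: number; topP?: number; repeatPenalty?: number };
type RawOcrOptions = { temperature?: string; topP?: string; repeatPenalty?: string };

// Converts form strings to numbers; absent or non-numeric fields are omitted
// (assumption: the original code would keep NaN for non-numeric input).
function parseOcrOptions(raw: RawOcrOptions): OcrOptions {
  const parsed: OcrOptions = {};
  for (const key of ['temperature', 'topP', 'repeatPenalty'] as const) {
    const text = raw[key];
    if (text === undefined) continue;
    const value = Number.parseFloat(text);
    if (Number.isFinite(value)) parsed[key] = value;
  }
  return parsed;
}
```

With this shape, a caller can keep the existing `Object.keys(ocrOptions).length > 0` check to decide whether to attach `ocrOptions` to the job payload.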
|
|||||||
@@ -8,7 +8,7 @@
|
|||||||
// - 2026-05-22: Import and register CleanupTempFilesWorker (T016) to delete expired temporary attachments
|
// - 2026-05-22: Import and register CleanupTempFilesWorker (T016) to delete expired temporary attachments
|
||||||
// - 2026-05-23: Register MigrationProgress + AiMigrationCheckpointService (ADR-023A)
|
// - 2026-05-23: Register MigrationProgress + AiMigrationCheckpointService (ADR-023A)
|
||||||
// - 2026-05-25: Register AiAvailableModel for AI Model Management (ADR-027).
|
// - 2026-05-25: Register AiAvailableModel for AI Model Management (ADR-027).
|
||||||
// - 2026-05-30: Register VramMonitorService, OcrCacheService, TyphoonOcrProcessor, TyphoonLlmProcessor (ADR-032).
|
// - 2026-05-30: Register VramMonitorService, OcrCacheService, NpDmsOcrProcessor, NpDmsAiProcessor (ADR-032).
|
||||||
// - 2026-06-13: Register AiSandboxProfile for ADR-036 sandbox-production parity
|
// - 2026-06-13: Register AiSandboxProfile for ADR-036 sandbox-production parity
|
||||||
// Module for the AI Gateway: registers Services and Controllers (ADR-023)
|
// Module for the AI Gateway: registers Services and Controllers (ADR-023)
|
||||||
|
|
||||||
@@ -75,13 +75,13 @@ import {
|
|||||||
QUEUE_AI_VECTOR_DELETION,
|
QUEUE_AI_VECTOR_DELETION,
|
||||||
} from '../common/constants/queue.constants';
|
} from '../common/constants/queue.constants';
|
||||||
import {
|
import {
|
||||||
TyphoonOcrProcessor,
|
NpDmsOcrProcessor,
|
||||||
QUEUE_TYPHOON_OCR,
|
QUEUE_NP_DMS_OCR,
|
||||||
} from './processors/typhoon-ocr.processor';
|
} from './processors/np-dms-ocr-processor';
|
||||||
import {
|
import {
|
||||||
TyphoonLlmProcessor,
|
NpDmsAiProcessor,
|
||||||
QUEUE_TYPHOON_LLM,
|
QUEUE_NP_DMS_AI,
|
||||||
} from './processors/typhoon-llm.processor';
|
} from './processors/np-dms-ai.processor';
|
||||||
|
|
||||||
@Module({
|
@Module({
|
||||||
imports: [
|
imports: [
|
||||||
@@ -129,7 +129,7 @@ import {
|
|||||||
{ name: QUEUE_AI_VECTOR_DELETION },
|
{ name: QUEUE_AI_VECTOR_DELETION },
|
||||||
// Typhoon OCR + LLM queues: concurrency=1 to prevent VRAM overflow (ADR-032)
|
// Typhoon OCR + LLM queues: concurrency=1 to prevent VRAM overflow (ADR-032)
|
||||||
{
|
{
|
||||||
name: QUEUE_TYPHOON_OCR,
|
name: QUEUE_NP_DMS_OCR,
|
||||||
defaultJobOptions: {
|
defaultJobOptions: {
|
||||||
attempts: 2,
|
attempts: 2,
|
||||||
backoff: { type: 'exponential', delay: 5000 },
|
backoff: { type: 'exponential', delay: 5000 },
|
||||||
@@ -138,7 +138,7 @@ import {
|
|||||||
},
|
},
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
name: QUEUE_TYPHOON_LLM,
|
name: QUEUE_NP_DMS_AI,
|
||||||
defaultJobOptions: {
|
defaultJobOptions: {
|
||||||
attempts: 2,
|
attempts: 2,
|
||||||
backoff: { type: 'exponential', delay: 5000 },
|
backoff: { type: 'exponential', delay: 5000 },
|
||||||
@@ -198,9 +198,9 @@ import {
|
|||||||
AiRagProcessor,
|
AiRagProcessor,
|
||||||
// Phase 5: Vector Deletion async processor (ADR-023 FR-008)
|
// Phase 5: Vector Deletion async processor (ADR-023 FR-008)
|
||||||
AiVectorDeletionProcessor,
|
AiVectorDeletionProcessor,
|
||||||
// ADR-032: Typhoon OCR + LLM sequential processors (concurrency=1)
|
// ADR-032: np-dms-ocr + np-dms-ai sequential processors (concurrency=1)
|
||||||
TyphoonOcrProcessor,
|
NpDmsOcrProcessor,
|
||||||
TyphoonLlmProcessor,
|
NpDmsAiProcessor,
|
||||||
// US4: Execution Profiles Service (T044)
|
// US4: Execution Profiles Service (T044)
|
||||||
AiExecutionProfilesService,
|
AiExecutionProfilesService,
|
||||||
// RbacGuard requires UserService from UserModule
|
// RbacGuard requires UserService from UserModule
|
||||||
|
|||||||
@@ -80,7 +80,7 @@ describe('AiService', () => {
|
|||||||
|
|
||||||
const mockOllamaService = {
|
const mockOllamaService = {
|
||||||
getMainModelName: jest.fn().mockReturnValue('typhoon2.5-np-dms:latest'),
|
getMainModelName: jest.fn().mockReturnValue('typhoon2.5-np-dms:latest'),
|
||||||
getOcrModelName: jest.fn().mockReturnValue('typhoon-np-dms-ocr:latest'),
|
getOcrModelName: jest.fn().mockReturnValue('np-dms-ocr:latest'),
|
||||||
checkHealth: jest.fn().mockResolvedValue({
|
checkHealth: jest.fn().mockResolvedValue({
|
||||||
status: 'HEALTHY',
|
status: 'HEALTHY',
|
||||||
latencyMs: 120,
|
latencyMs: 120,
|
||||||
|
|||||||
@@ -41,7 +41,7 @@ export class AiAuditLog extends UuidBaseEntity {
|
|||||||
@Column({ name: 'model_name', type: 'varchar', length: 100, nullable: true })
|
@Column({ name: 'model_name', type: 'varchar', length: 100, nullable: true })
|
||||||
modelName?: string;
|
modelName?: string;
|
||||||
|
|
||||||
// Type of OCR/LLM model used, e.g. tesseract, typhoon-ocr-3b, typhoon2.1-gemma3-4b (ADR-032)
|
// Type of OCR/LLM model used, e.g. fast-path, np-dms-ocr, np-dms-ai (ADR-032)
|
||||||
@Index('idx_ai_audit_model_type')
|
@Index('idx_ai_audit_model_type')
|
||||||
@Column({ name: 'model_type', type: 'varchar', length: 50, nullable: true })
|
@Column({ name: 'model_type', type: 'varchar', length: 50, nullable: true })
|
||||||
modelType?: string;
|
modelType?: string;
|
||||||
|
|||||||
@@ -1,12 +1,13 @@
|
|||||||
// File: src/modules/ai/entities/ocr-engine-configuration.entity.ts
|
// File: src/modules/ai/entities/ocr-engine-configuration.entity.ts
|
||||||
// Change Log
|
// Change Log
|
||||||
// - 2026-05-30: Create the OcrEngineConfiguration class to hold OCR Engine settings (T010, US1)
|
// - 2026-05-30: Create the OcrEngineConfiguration class to hold OCR Engine settings (T010, US1)
|
||||||
|
// - 2026-06-20: Rename TESSERACT → FAST_PATH and TYPHOON_OCR → NP_DMS_OCR as part of the legacy-reference cleanup
|
||||||
|
|
||||||
import { ApiProperty } from '@nestjs/swagger';
|
import { ApiProperty } from '@nestjs/swagger';
|
||||||
|
|
||||||
export enum OcrEngineType {
|
export enum OcrEngineType {
|
||||||
TESSERACT = 'tesseract',
|
FAST_PATH = 'fast_path',
|
||||||
TYPHOON_OCR = 'typhoon_ocr',
|
NP_DMS_OCR = 'np_dms_ocr',
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Class holding OCR Engine configuration (not bound to a SQL table, per data-model.md) */
|
/** Class holding OCR Engine configuration (not bound to a SQL table, per data-model.md) */
|
||||||
|
|||||||
@@ -738,7 +738,7 @@ describe('AiBatchProcessor', () => {
|
|||||||
expect(ocrService.detectAndExtract).toHaveBeenCalledWith({
|
expect(ocrService.detectAndExtract).toHaveBeenCalledWith({
|
||||||
pdfPath: '/files/test.pdf',
|
pdfPath: '/files/test.pdf',
|
||||||
activeProfile: 'quality',
|
activeProfile: 'quality',
|
||||||
typhoonOptions: {
|
ocrOptions: {
|
||||||
temperature: 0.15,
|
temperature: 0.15,
|
||||||
topP: 0.65,
|
topP: 0.65,
|
||||||
repeatPenalty: 1.15,
|
repeatPenalty: 1.15,
|
||||||
|
|||||||
@@ -34,7 +34,7 @@ import { OcrService } from '../services/ocr.service';
|
|||||||
import {
|
import {
|
||||||
SandboxOcrEngineService,
|
SandboxOcrEngineService,
|
||||||
SandboxOcrEngineType,
|
SandboxOcrEngineType,
|
||||||
OcrTyphoonOptions,
|
OcrNpDmsOptions,
|
||||||
} from '../services/sandbox-ocr-engine.service';
|
} from '../services/sandbox-ocr-engine.service';
|
||||||
import {
|
import {
|
||||||
OllamaService,
|
OllamaService,
|
||||||
@@ -562,7 +562,7 @@ export class AiBatchProcessor extends WorkerHost {
|
|||||||
})
|
})
|
||||||
);
|
);
|
||||||
try {
|
try {
|
||||||
let ocrParams: OcrTyphoonOptions | undefined = undefined;
|
let ocrParams: OcrNpDmsOptions | undefined = undefined;
|
||||||
if (engineType === 'np-dms-ocr') {
|
if (engineType === 'np-dms-ocr') {
|
||||||
try {
|
try {
|
||||||
const ocrDraft =
|
const ocrDraft =
|
||||||
@@ -705,7 +705,7 @@ export class AiBatchProcessor extends WorkerHost {
|
|||||||
const { idempotencyKey, payload } = data;
|
const { idempotencyKey, payload } = data;
|
||||||
const pdfPath = payload.pdfPath as string;
|
const pdfPath = payload.pdfPath as string;
|
||||||
const engineType = (payload.engineType as SandboxOcrEngineType) || 'auto';
|
const engineType = (payload.engineType as SandboxOcrEngineType) || 'auto';
|
||||||
const typhoonOptions = payload.typhoonOptions as
|
const ocrOptions = payload.ocrOptions as
|
||||||
| { temperature?: number; topP?: number; repeatPenalty?: number }
|
| { temperature?: number; topP?: number; repeatPenalty?: number }
|
||||||
| undefined;
|
| undefined;
|
||||||
|
|
||||||
@@ -722,7 +722,7 @@ export class AiBatchProcessor extends WorkerHost {
|
|||||||
})
|
})
|
||||||
);
|
);
|
||||||
|
|
||||||
let ocrParams = typhoonOptions;
|
let ocrParams = ocrOptions;
|
||||||
if (!ocrParams && engineType === 'np-dms-ocr') {
|
if (!ocrParams && engineType === 'np-dms-ocr') {
|
||||||
try {
|
try {
|
||||||
const ocrDraft =
|
const ocrDraft =
|
||||||
@@ -1078,7 +1078,7 @@ export class AiBatchProcessor extends WorkerHost {
|
|||||||
ocrResult = await this.ocrService.detectAndExtract({
|
ocrResult = await this.ocrService.detectAndExtract({
|
||||||
pdfPath: attachment.filePath,
|
pdfPath: attachment.filePath,
|
||||||
activeProfile: job.data.effectiveProfile,
|
activeProfile: job.data.effectiveProfile,
|
||||||
typhoonOptions: job.data.ocrSnapshotParams,
|
ocrOptions: job.data.ocrSnapshotParams,
|
||||||
});
|
});
|
||||||
} catch (err: unknown) {
|
} catch (err: unknown) {
|
||||||
const errMsg = err instanceof Error ? err.message : String(err);
|
const errMsg = err instanceof Error ? err.message : String(err);
|
||||||
|
|||||||
+29
-28
@@ -1,8 +1,9 @@
|
|||||||
// File: src/modules/ai/processors/typhoon-llm.processor.ts
|
// File: backend/src/modules/ai/processors/np-dms-ai.processor.ts
|
||||||
// Change Log
|
// Change Log
|
||||||
// - 2026-05-30: Initial processor for Typhoon LLM sequential jobs (T009d, ADR-032)
|
// - 2026-05-30: Initial processor for np-dms-ai sequential jobs (T009d, ADR-032)
|
||||||
// Runs with concurrency=1 to prevent VRAM overflow on the RTX 2060 Super (8GB)
|
// Runs with concurrency=1 to prevent VRAM overflow on the RTX 2060 Super (8GB)
|
||||||
// Uses keep_alive=0 via the Ollama API to unload the model after processing
|
// Uses keep_alive=0 via the Ollama API to unload the model after processing
|
||||||
|
// - 2026-06-20: Renamed from typhoon-llm.processor.ts to np-dms-ai.processor.ts
|
||||||
|
|
||||||
import { Processor, WorkerHost } from '@nestjs/bullmq';
|
import { Processor, WorkerHost } from '@nestjs/bullmq';
|
||||||
import { Logger } from '@nestjs/common';
|
import { Logger } from '@nestjs/common';
|
||||||
@@ -16,14 +17,14 @@ import axios from 'axios';
|
|||||||
import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity';
|
import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity';
|
||||||
import { VramMonitorService } from '../services/vram-monitor.service';
|
import { VramMonitorService } from '../services/vram-monitor.service';
|
||||||
|
|
||||||
/** Queue name for Typhoon LLM jobs */
|
/** Queue name for np-dms-ai LLM jobs */
|
||||||
export const QUEUE_TYPHOON_LLM = 'typhoon-llm';
|
export const QUEUE_NP_DMS_AI = 'np-dms-ai';
|
||||||
|
|
||||||
/** Shape of the job data in the Typhoon LLM queue */
|
/** Shape of the job data in the np-dms-ai LLM queue */
|
||||||
export interface TyphoonLlmJobData {
|
export interface NpDmsAiJobData {
|
||||||
/** Prompt to send to the Typhoon LLM */
|
/** Prompt to send to the np-dms-ai LLM */
|
||||||
prompt: string;
|
prompt: string;
|
||||||
/** Model name, e.g. scb10x/typhoon2.1-gemma3-4b */
|
/** Model name, e.g. typhoon2.5-np-dms:latest */
|
||||||
model?: string;
|
model?: string;
|
||||||
/** idempotencyKey for the Redis result key */
|
/** idempotencyKey for the Redis result key */
|
||||||
idempotencyKey: string;
|
idempotencyKey: string;
|
||||||
@@ -39,19 +40,19 @@ interface OllamaGenerateResponse {
|
|||||||
done: boolean;
|
done: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
// VRAM required by Typhoon 2.1 Gemma3 4B (MB), per ADR-032
|
// VRAM required by np-dms-ai (MB), per ADR-032
|
||||||
const TYPHOON_LLM_REQUIRED_VRAM_MB = 4500;
|
const NP_DMS_AI_REQUIRED_VRAM_MB = 4500;
|
||||||
// 120-second timeout for LLM generation
|
// 120-second timeout for LLM generation
|
||||||
const TYPHOON_LLM_TIMEOUT_MS = 120000;
|
const NP_DMS_AI_TIMEOUT_MS = 120000;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Processor for Typhoon LLM jobs, run sequentially (concurrency=1)
|
* Processor for np-dms-ai LLM jobs, run sequentially (concurrency=1)
|
||||||
* to prevent VRAM overflow when multiple LLM jobs run at once on the RTX 2060 Super
|
* to prevent VRAM overflow when multiple LLM jobs run at once on the RTX 2060 Super
|
||||||
* Per ADR-032: lockDuration=180000ms covers the 120s timeout plus a buffer
|
* Per ADR-032: lockDuration=180000ms covers the 120s timeout plus a buffer
|
||||||
*/
|
*/
|
||||||
@Processor(QUEUE_TYPHOON_LLM, { concurrency: 1, lockDuration: 180000 })
|
@Processor(QUEUE_NP_DMS_AI, { concurrency: 1, lockDuration: 180000 })
|
||||||
export class TyphoonLlmProcessor extends WorkerHost {
|
export class NpDmsAiProcessor extends WorkerHost {
|
||||||
private readonly logger = new Logger(TyphoonLlmProcessor.name);
|
private readonly logger = new Logger(NpDmsAiProcessor.name);
|
||||||
private readonly ollamaUrl: string;
|
private readonly ollamaUrl: string;
|
||||||
private readonly defaultModel: string;
|
private readonly defaultModel: string;
|
||||||
|
|
||||||
@@ -68,25 +69,25 @@ export class TyphoonLlmProcessor extends WorkerHost {
|
|||||||
this.configService.get<string>('AI_HOST_URL', 'http://localhost:11434')
|
this.configService.get<string>('AI_HOST_URL', 'http://localhost:11434')
|
||||||
);
|
);
|
||||||
this.defaultModel = this.configService.get<string>(
|
this.defaultModel = this.configService.get<string>(
|
||||||
'OLLAMA_MODEL_TYPHOON',
|
'OLLAMA_MODEL_MAIN',
|
||||||
'scb10x/typhoon2.1-gemma3-4b'
|
'typhoon2.5-np-dms:latest'
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Process Typhoon LLM jobs one at a time */
|
/** Process np-dms-ai LLM jobs one at a time */
|
||||||
async process(job: Job<TyphoonLlmJobData>): Promise<void> {
|
async process(job: Job<NpDmsAiJobData>): Promise<void> {
|
||||||
const { prompt, model, idempotencyKey, documentPublicId } = job.data;
|
const { prompt, model, idempotencyKey, documentPublicId } = job.data;
|
||||||
const startTime = Date.now();
|
const startTime = Date.now();
|
||||||
const targetModel = model ?? this.defaultModel;
|
const targetModel = model ?? this.defaultModel;
|
||||||
this.logger.log(
|
this.logger.log(
|
||||||
`Typhoon LLM job started — idempotencyKey=${idempotencyKey}, model=${targetModel}`
|
`np-dms-ai LLM job started — idempotencyKey=${idempotencyKey}, model=${targetModel}`
|
||||||
);
|
);
|
||||||
// Check VRAM before loading the model
|
// Check VRAM before loading the model
|
||||||
const hasCapacity = await this.vramMonitorService.hasVramCapacity(
|
const hasCapacity = await this.vramMonitorService.hasVramCapacity(
|
||||||
TYPHOON_LLM_REQUIRED_VRAM_MB
|
NP_DMS_AI_REQUIRED_VRAM_MB
|
||||||
);
|
);
|
||||||
if (!hasCapacity) {
|
if (!hasCapacity) {
|
||||||
const errMsg = `VRAM ไม่เพียงพอสำหรับ ${targetModel} (ต้องการ ${TYPHOON_LLM_REQUIRED_VRAM_MB}MB) — retry ภายหลัง`;
|
const errMsg = `VRAM ไม่เพียงพอสำหรับ ${targetModel} (ต้องการ ${NP_DMS_AI_REQUIRED_VRAM_MB}MB) — retry ภายหลัง`;
|
||||||
this.logger.warn(errMsg);
|
this.logger.warn(errMsg);
|
||||||
await this.saveResult(idempotencyKey, {
|
await this.saveResult(idempotencyKey, {
|
||||||
status: 'failed',
|
status: 'failed',
|
||||||
@@ -117,7 +118,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
|
|||||||
},
|
},
|
||||||
keep_alive: 0,
|
keep_alive: 0,
|
||||||
},
|
},
|
||||||
{ timeout: TYPHOON_LLM_TIMEOUT_MS }
|
{ timeout: NP_DMS_AI_TIMEOUT_MS }
|
||||||
);
|
);
|
||||||
const processingTimeMs = Date.now() - startTime;
|
const processingTimeMs = Date.now() - startTime;
|
||||||
const generatedText = response.data.response ?? '';
|
const generatedText = response.data.response ?? '';
|
||||||
@@ -136,11 +137,11 @@ export class TyphoonLlmProcessor extends WorkerHost {
|
|||||||
processingTimeMs,
|
processingTimeMs,
|
||||||
});
|
});
|
||||||
this.logger.log(
|
this.logger.log(
|
||||||
`Typhoon LLM completed — ${generatedText.length} chars, ${processingTimeMs}ms`
|
`np-dms-ai LLM completed — ${generatedText.length} chars, ${processingTimeMs}ms`
|
||||||
);
|
);
|
||||||
} catch (err: unknown) {
|
} catch (err: unknown) {
|
||||||
const errMsg = err instanceof Error ? err.message : String(err);
|
const errMsg = err instanceof Error ? err.message : String(err);
|
||||||
this.logger.error(`Typhoon LLM job failed: ${errMsg}`);
|
this.logger.error(`np-dms-ai LLM job failed: ${errMsg}`);
|
||||||
await this.saveResult(idempotencyKey, {
|
await this.saveResult(idempotencyKey, {
|
||||||
status: 'failed',
|
status: 'failed',
|
||||||
errorMessage: errMsg,
|
errorMessage: errMsg,
|
||||||
@@ -169,7 +170,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
|
|||||||
}
|
}
|
||||||
): Promise<void> {
|
): Promise<void> {
|
||||||
await this.redis.setex(
|
await this.redis.setex(
|
||||||
`ai:typhoon:llm:${idempotencyKey}`,
|
`ai:np-dms-ai:llm:${idempotencyKey}`,
|
||||||
3600,
|
3600,
|
||||||
JSON.stringify({
|
JSON.stringify({
|
||||||
idempotencyKey,
|
idempotencyKey,
|
||||||
@@ -179,7 +180,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Write an audit log entry for the Typhoon LLM interaction */
|
/** Write an audit log entry for the np-dms-ai LLM interaction */
|
||||||
private async writeAuditLog(params: {
|
private async writeAuditLog(params: {
|
||||||
documentPublicId?: string;
|
documentPublicId?: string;
|
||||||
model: string;
|
model: string;
|
||||||
@@ -189,7 +190,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
|
|||||||
}): Promise<void> {
|
}): Promise<void> {
|
||||||
const log = this.auditLogRepo.create({
|
const log = this.auditLogRepo.create({
|
||||||
documentPublicId: params.documentPublicId,
|
documentPublicId: params.documentPublicId,
|
||||||
aiModel: 'typhoon-llm',
|
aiModel: 'np-dms-ai',
|
||||||
modelName: params.model,
|
modelName: params.model,
|
||||||
modelType: 'llm',
|
modelType: 'llm',
|
||||||
status: params.status,
|
status: params.status,
|
||||||
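The constructor hunk above switches the default-model lookup from `OLLAMA_MODEL_TYPHOON` to `OLLAMA_MODEL_MAIN`, so a deployment that still sets only the old key would fall back to the new default. The sketch below shows that behavior as a pure function over a plain environment record. `resolveMainModel` and the `legacyKeyIgnored` flag are hypothetical helpers for illustration; the commit itself reads the value through `ConfigService.get`.

```typescript
// Default value copied from the diff; the rest of this block is illustrative.
const DEFAULT_MAIN_MODEL = 'typhoon2.5-np-dms:latest';

type Env = Record<string, string | undefined>;

interface ModelResolution {
  /** Model name the processor would use. */
  model: string;
  /** True when only the pre-rename key is set, i.e. its value is silently unused. */
  legacyKeyIgnored: boolean;
}

// Mirrors a get-with-default lookup: only an undefined value triggers the default.
function resolveMainModel(env: Env): ModelResolution {
  const configured = env['OLLAMA_MODEL_MAIN'];
  const legacy = env['OLLAMA_MODEL_TYPHOON'];
  return {
    model: configured ?? DEFAULT_MAIN_MODEL,
    legacyKeyIgnored: configured === undefined && legacy !== undefined,
  };
}
```

A startup check could log a warning when `legacyKeyIgnored` is true, which would make a stale `.env` file visible instead of silently changing the model.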
+19
-18
@@ -1,8 +1,9 @@
|
|||||||
// File: src/modules/ai/processors/typhoon-ocr.processor.ts
|
// File: src/modules/ai/processors/np-dms-ocr-processor.ts
|
||||||
// Change Log
|
// Change Log
|
||||||
// - 2026-05-30: Initial processor for Typhoon OCR sequential jobs (T009c, ADR-032)
|
// - 2026-05-30: Initial processor for Typhoon OCR sequential jobs (T009c, ADR-032)
|
||||||
// Runs with concurrency=1 to prevent VRAM overflow on the RTX 2060 Super (8GB)
|
// Runs with concurrency=1 to prevent VRAM overflow on the RTX 2060 Super (8GB)
|
||||||
// Uses keep_alive=0 via the sidecar Ollama API to unload the model after processing
|
// Uses keep_alive=0 via the sidecar Ollama API to unload the model after processing
|
||||||
|
// - 2026-06-20: Renamed the file from typhoon-ocr.processor.ts → np-dms-ocr-processor.ts
|
||||||
|
|
||||||
import { Processor, WorkerHost } from '@nestjs/bullmq';
|
import { Processor, WorkerHost } from '@nestjs/bullmq';
|
||||||
import { Logger } from '@nestjs/common';
|
import { Logger } from '@nestjs/common';
|
||||||
@@ -17,24 +18,24 @@ import { VramMonitorService } from '../services/vram-monitor.service';
|
|||||||
import {
|
import {
|
||||||
SandboxOcrEngineService,
|
SandboxOcrEngineService,
|
||||||
SandboxOcrEngineType,
|
SandboxOcrEngineType,
|
||||||
OcrTyphoonOptions,
|
OcrNpDmsOptions,
|
||||||
} from '../services/sandbox-ocr-engine.service';
|
} from '../services/sandbox-ocr-engine.service';
|
||||||
|
|
||||||
/** Queue name for Typhoon OCR jobs */
|
/** Queue name for np-dms-ocr jobs */
|
||||||
export const QUEUE_TYPHOON_OCR = 'typhoon-ocr';
|
export const QUEUE_NP_DMS_OCR = 'np-dms-ocr';
|
||||||
|
|
||||||
/** Shape of the job data in the Typhoon OCR queue */
|
/** Shape of the job data in the np-dms-ocr queue */
|
||||||
export interface TyphoonOcrJobData {
|
export interface NpDmsOcrJobData {
|
||||||
/** Public path of the PDF file to OCR */
|
/** Public path of the PDF file to OCR */
|
||||||
pdfPath: string;
|
pdfPath: string;
|
||||||
/** engineType: 'typhoon-np-dms-ocr' for this queue */
|
/** engineType: 'np-dms-ocr' for this queue */
|
||||||
engineType: SandboxOcrEngineType;
|
engineType: SandboxOcrEngineType;
|
||||||
/** idempotencyKey for the Redis result key */
|
/** idempotencyKey for the Redis result key */
|
||||||
idempotencyKey: string;
|
idempotencyKey: string;
|
||||||
/** documentPublicId for the audit log (optional) */
|
/** documentPublicId for the audit log (optional) */
|
||||||
documentPublicId?: string;
|
documentPublicId?: string;
|
||||||
/** Typhoon OCR options from the sandbox UI that override the Modelfile defaults (optional) */
|
/** np-dms-ocr options from the sandbox UI that override the Modelfile defaults (optional) */
|
||||||
typhoonOptions?: OcrTyphoonOptions;
|
ocrOptions?: OcrNpDmsOptions;
|
||||||
}
|
}
|
||||||
|
|
||||||
// VRAM required by Typhoon OCR-3B (MB), per ADR-032
|
// VRAM required by Typhoon OCR-3B (MB), per ADR-032
|
||||||
@@ -45,9 +46,9 @@ const TYPHOON_OCR_REQUIRED_VRAM_MB = 4000;
|
|||||||
* to prevent VRAM overflow when multiple OCR jobs run at once on the RTX 2060 Super
|
* to prevent VRAM overflow when multiple OCR jobs run at once on the RTX 2060 Super
|
||||||
* Per ADR-032: lockDuration=180000ms covers the 120s timeout plus a buffer
|
* Per ADR-032: lockDuration=180000ms covers the 120s timeout plus a buffer
|
||||||
*/
|
*/
|
||||||
@Processor(QUEUE_TYPHOON_OCR, { concurrency: 1, lockDuration: 180000 })
|
@Processor(QUEUE_NP_DMS_OCR, { concurrency: 1, lockDuration: 180000 })
|
||||||
export class TyphoonOcrProcessor extends WorkerHost {
|
export class NpDmsOcrProcessor extends WorkerHost {
|
||||||
private readonly logger = new Logger(TyphoonOcrProcessor.name);
|
private readonly logger = new Logger(NpDmsOcrProcessor.name);
|
||||||
|
|
||||||
constructor(
|
constructor(
|
||||||
@InjectRedis() private readonly redis: Redis,
|
@InjectRedis() private readonly redis: Redis,
|
||||||
@@ -61,13 +62,13 @@ export class TyphoonOcrProcessor extends WorkerHost {
|
|||||||
}
|
}
|
||||||
|
|
||||||
/** Process Typhoon OCR jobs one at a time */
|
/** Process Typhoon OCR jobs one at a time */
|
||||||
async process(job: Job<TyphoonOcrJobData>): Promise<void> {
|
async process(job: Job<NpDmsOcrJobData>): Promise<void> {
|
||||||
const {
|
const {
|
||||||
pdfPath,
|
pdfPath,
|
||||||
engineType,
|
engineType,
|
||||||
idempotencyKey,
|
idempotencyKey,
|
||||||
documentPublicId,
|
documentPublicId,
|
||||||
typhoonOptions,
|
ocrOptions,
|
||||||
} = job.data;
|
} = job.data;
|
||||||
const startTime = Date.now();
|
const startTime = Date.now();
|
||||||
this.logger.log(
|
this.logger.log(
|
||||||
@@ -116,7 +117,7 @@ export class TyphoonOcrProcessor extends WorkerHost {
|
|||||||
const result = await this.sandboxOcrEngineService.detectAndExtract(
|
const result = await this.sandboxOcrEngineService.detectAndExtract(
|
||||||
pdfPath,
|
pdfPath,
|
||||||
engineType,
|
engineType,
|
||||||
typhoonOptions
|
ocrOptions
|
||||||
);
|
);
|
||||||
const processingTimeMs = Date.now() - startTime;
|
const processingTimeMs = Date.now() - startTime;
|
||||||
// Store the result in the Redis cache (24h TTL)
|
// Store the result in the Redis cache (24h TTL)
|
||||||
@@ -171,7 +172,7 @@ export class TyphoonOcrProcessor extends WorkerHost {
|
|||||||
}
|
}
|
||||||
): Promise<void> {
|
): Promise<void> {
|
||||||
await this.redis.setex(
|
await this.redis.setex(
|
||||||
`ai:typhoon:ocr:${idempotencyKey}`,
|
`ai:np-dms-ocr:${idempotencyKey}`,
|
||||||
3600,
|
3600,
|
||||||
JSON.stringify({
|
JSON.stringify({
|
||||||
idempotencyKey,
|
idempotencyKey,
|
||||||
@@ -193,8 +194,8 @@ export class TyphoonOcrProcessor extends WorkerHost {
|
|||||||
}): Promise<void> {
|
}): Promise<void> {
|
||||||
const log = this.auditLogRepo.create({
|
const log = this.auditLogRepo.create({
|
||||||
documentPublicId: params.documentPublicId,
|
documentPublicId: params.documentPublicId,
|
||||||
aiModel: 'typhoon-ocr',
|
aiModel: 'np-dms-ocr',
|
||||||
modelName: 'typhoon-np-dms-ocr:latest',
|
modelName: 'np-dms-ocr:latest',
|
||||||
modelType: params.engineType,
|
modelType: params.engineType,
|
||||||
status: params.status,
|
status: params.status,
|
||||||
processingTimeMs: params.processingTimeMs,
|
processingTimeMs: params.processingTimeMs,
|
||||||
@@ -97,7 +97,7 @@ export class AiPolicyService {
|
|||||||
*/
|
*/
|
||||||
getCanonicalModelName(modelName: string): 'np-dms-ai' | 'np-dms-ocr' {
|
getCanonicalModelName(modelName: string): 'np-dms-ai' | 'np-dms-ocr' {
|
||||||
const name = modelName.toLowerCase();
|
const name = modelName.toLowerCase();
|
||||||
if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) {
|
if (name.includes('ocr')) {
|
||||||
return 'np-dms-ocr';
|
return 'np-dms-ocr';
|
||||||
}
|
}
|
||||||
return 'np-dms-ai';
|
return 'np-dms-ai';
|
||||||
|
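The simplified condition in `getCanonicalModelName` relies on the fact that the legacy name `typhoon-np-dms-ocr` already contains the substring `ocr`, so the second `includes` check was redundant. A minimal standalone sketch of that mapping follows; the sample inputs are hypothetical and the function is reproduced outside `AiPolicyService` only for illustration.

```typescript
type CanonicalModelName = 'np-dms-ai' | 'np-dms-ocr';

// Mirrors the simplified check from the diff: any name containing "ocr"
// maps to the OCR model; everything else maps to the main LLM.
function getCanonicalModelName(modelName: string): CanonicalModelName {
  const name = modelName.toLowerCase();
  return name.includes('ocr') ? 'np-dms-ocr' : 'np-dms-ai';
}

// Hypothetical sample inputs, chosen only for illustration.
const samples = ['typhoon-np-dms-ocr:latest', 'NP-DMS-OCR', 'np-dms-ai:latest'];
for (const sample of samples) {
  console.log(`${sample} -> ${getCanonicalModelName(sample)}`);
}
```

Because the match is a plain substring test, any future non-OCR model whose name happens to contain `ocr` would also be classified as `np-dms-ocr`.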
|||||||
@@ -4,13 +4,13 @@
|
|||||||
// - 2026-05-25: Fixed AggregateError (empty message) from axios by wrapping it in an Error with clear context.
|
// - 2026-05-25: Fixed AggregateError (empty message) from axios by wrapping it in an Error with clear context.
|
||||||
// - 2026-05-25: Added path remapping (OCR_UPLOAD_BASE_PATH) to convert the local upload path into the path the sidecar sees over CIFS.
|
// - 2026-05-25: Added path remapping (OCR_UPLOAD_BASE_PATH) to convert the local upload path into the path the sidecar sees over CIFS.
|
||||||
// - 2026-05-29: Added checkHealth() to check OCR sidecar health for getSystemHealth() (ADR-027)
|
// - 2026-05-29: Added checkHealth() to check OCR sidecar health for getSystemHealth() (ADR-027)
|
||||||
// - 2026-05-30: Switched from PaddleOCR to Tesseract OCR for compatibility with older CPUs
|
// - 2026-05-30: Switched from PaddleOCR to fast-path (PyMuPDF text layer) for compatibility with older CPUs
|
||||||
// - 2026-05-30: Added a VRAM insufficiency guard for the Typhoon OCR engine (T016a, ADR-032)
|
// - 2026-05-30: Added a VRAM insufficiency guard for the Typhoon OCR engine (T016a, ADR-032)
|
||||||
// - 2026-05-30: Updated for Dynamic OCR Engine selection, Caching, and Graceful Fallback (T013, T014, T016, T022, T023, US1)
|
// - 2026-05-30: Updated for Dynamic OCR Engine selection, Caching, and Graceful Fallback (T013, T014, T016, T022, T023, US1)
|
||||||
// - 2026-06-01: Improved remapPath to handle Windows absolute and relative paths with 100% accuracy
|
// - 2026-06-01: Improved remapPath to handle Windows absolute and relative paths with 100% accuracy
|
||||||
// - 2026-06-01: Changed processWithTesseract/processWithTyphoon to send file content via multipart to /ocr-upload instead of sending a path
|
// - 2026-06-01: Changed processWithFastPath/processWithNpDmsOcr to send file content via multipart to /ocr-upload instead of sending a path
|
||||||
// - 2026-06-02: Send X-API-Key in request headers to ocr-sidecar for maximum security (ADR-033, Suggestion 2)
|
// - 2026-06-02: Send X-API-Key in request headers to ocr-sidecar for maximum security (ADR-033, Suggestion 2)
|
||||||
// - 2026-06-04: ADR-034 - changed TYPHOON_ENGINE.engineName to typhoon-np-dms-ocr:latest to match the model name in Ollama
|
// - 2026-06-04: ADR-034 - changed TYPHOON_ENGINE.engineName to np-dms-ocr:latest to match the model name in Ollama
|
||||||
// - 2026-06-11: US2 - Compute OCR residency keep_alive dynamically from VRAM headroom and the active profile
|
// - 2026-06-11: US2 - Compute OCR residency keep_alive dynamically from VRAM headroom and the active profile
|
||||||
// - 2026-06-13: US5 - Added sending temperature, topP, and repeatPenalty to the OCR sidecar via multipart form (T070)
|
// - 2026-06-13: US5 - Added sending temperature, topP, and repeatPenalty to the OCR sidecar via multipart form (T070)
|
||||||
|
|
||||||
@@ -28,6 +28,9 @@ import {
|
|||||||
} from '../entities/ocr-engine-configuration.entity';
|
} from '../entities/ocr-engine-configuration.entity';
|
||||||
import { OcrEngineResponseDto } from '../dto/ocr-engine-response.dto';
|
import { OcrEngineResponseDto } from '../dto/ocr-engine-response.dto';
|
||||||
import { SystemSetting } from '../entities/system-setting.entity';
|
import { SystemSetting } from '../entities/system-setting.entity';
|
||||||
|
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
|
||||||
|
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
||||||
|
import { BusinessException } from '../../../common/exceptions';
|
||||||
import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity';
|
import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity';
|
||||||
import { OcrCacheService } from './ocr-cache.service';
|
import { OcrCacheService } from './ocr-cache.service';
|
||||||
import { VramMonitorService } from './vram-monitor.service';
|
import { VramMonitorService } from './vram-monitor.service';
|
||||||
@@ -41,7 +44,7 @@ export interface OcrDetectionInput {
|
|||||||
pdfPath?: string;
|
pdfPath?: string;
|
||||||
documentPublicId?: string; // added for audit logging
|
documentPublicId?: string; // added for audit logging
|
||||||
activeProfile?: ExecutionProfile;
|
activeProfile?: ExecutionProfile;
|
||||||
typhoonOptions?: {
|
ocrOptions?: {
|
||||||
temperature?: number;
|
temperature?: number;
|
||||||
topP?: number;
|
topP?: number;
|
||||||
repeatPenalty?: number;
|
repeatPenalty?: number;
|
||||||
@@ -68,16 +71,16 @@ const OCR_ACTIVE_ENGINE_KEY = 'OCR_ACTIVE_ENGINE';
|
|||||||
const OCR_ACTIVE_ENGINE_CACHE_KEY = 'system_settings:OCR_ACTIVE_ENGINE';
|
const OCR_ACTIVE_ENGINE_CACHE_KEY = 'system_settings:OCR_ACTIVE_ENGINE';
|
||||||
const OCR_ACTIVE_ENGINE_TTL_SECONDS = 30;
|
const OCR_ACTIVE_ENGINE_TTL_SECONDS = 30;
|
||||||
|
|
||||||
const TESSERACT_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def001';
|
const FAST_PATH_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def001';
|
||||||
const TYPHOON_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def002';
|
const OCR_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def002';
|
||||||
|
|
||||||
// VRAM required by Typhoon OCR-3B (MB)
|
// VRAM required by np-dms-ocr (MB)
|
||||||
const TYPHOON_OCR_REQUIRED_VRAM_MB = 4000;
|
const OCR_REQUIRED_VRAM_MB = 4000;
|
||||||
|
|
||||||
const TESSERACT_ENGINE: OcrEngineConfiguration = {
|
const FAST_PATH_ENGINE: OcrEngineConfiguration = {
|
||||||
engineId: TESSERACT_ENGINE_ID,
|
engineId: FAST_PATH_ENGINE_ID,
|
||||||
engineName: 'Tesseract OCR',
|
engineName: 'Fast Path (PyMuPDF)',
|
||||||
engineType: OcrEngineType.TESSERACT,
|
engineType: OcrEngineType.FAST_PATH,
|
||||||
isActive: true,
|
isActive: true,
|
||||||
vramRequirementMB: 0,
|
vramRequirementMB: 0,
|
||||||
processingTimeLimitSeconds: 30,
|
processingTimeLimitSeconds: 30,
|
||||||
@@ -87,25 +90,25 @@ const TESSERACT_ENGINE: OcrEngineConfiguration = {
|
|||||||
updatedAt: new Date('2026-05-30T00:00:00Z'),
|
updatedAt: new Date('2026-05-30T00:00:00Z'),
|
||||||
};
|
};
|
||||||
|
|
||||||
const TYPHOON_ENGINE: OcrEngineConfiguration = {
|
const OCR_ENGINE: OcrEngineConfiguration = {
|
||||||
engineId: TYPHOON_ENGINE_ID,
|
engineId: OCR_ENGINE_ID,
|
||||||
engineName: 'typhoon-np-dms-ocr:latest',
|
engineName: 'np-dms-ocr:latest',
|
||||||
engineType: OcrEngineType.TYPHOON_OCR,
|
engineType: OcrEngineType.NP_DMS_OCR,
|
||||||
isActive: true,
|
isActive: true,
|
||||||
vramRequirementMB: TYPHOON_OCR_REQUIRED_VRAM_MB,
|
vramRequirementMB: OCR_REQUIRED_VRAM_MB,
|
||||||
processingTimeLimitSeconds: 60,
|
processingTimeLimitSeconds: 60,
|
||||||
concurrentLimit: 1,
|
concurrentLimit: 1,
|
||||||
fallbackEngineId: TESSERACT_ENGINE_ID,
|
fallbackEngineId: FAST_PATH_ENGINE_ID,
|
||||||
createdAt: new Date('2026-05-30T00:00:00Z'),
|
createdAt: new Date('2026-05-30T00:00:00Z'),
|
||||||
updatedAt: new Date('2026-05-30T00:00:00Z'),
|
updatedAt: new Date('2026-05-30T00:00:00Z'),
|
||||||
};
|
};
|
||||||
|
|
||||||
const ENGINES_MAP = new Map<string, OcrEngineConfiguration>([
|
const ENGINES_MAP = new Map<string, OcrEngineConfiguration>([
|
||||||
[TESSERACT_ENGINE_ID, TESSERACT_ENGINE],
|
[FAST_PATH_ENGINE_ID, FAST_PATH_ENGINE],
|
||||||
[TYPHOON_ENGINE_ID, TYPHOON_ENGINE],
|
[OCR_ENGINE_ID, OCR_ENGINE],
|
||||||
]);
|
]);
|
||||||
|
|
||||||
/** Service that chooses between the fast path and the OCR sidecar (Tesseract/Typhoon), with engine switching and caching */
|
/** Service that chooses between the fast path and the OCR sidecar (np-dms-ocr), with engine switching and caching */
|
||||||
@Injectable()
|
@Injectable()
|
||||||
export class OcrService {
|
export class OcrService {
|
||||||
private readonly logger = new Logger(OcrService.name);
|
private readonly logger = new Logger(OcrService.name);
|
||||||
@@ -121,6 +124,9 @@ export class OcrService {
|
|||||||
private readonly settingRepo: Repository<SystemSetting>,
|
private readonly settingRepo: Repository<SystemSetting>,
|
||||||
@InjectRepository(AiAuditLog)
|
@InjectRepository(AiAuditLog)
|
||||||
private readonly auditLogRepo: Repository<AiAuditLog>,
|
private readonly auditLogRepo: Repository<AiAuditLog>,
|
||||||
|
@InjectRepository(AiExecutionProfile)
|
||||||
|
private readonly profileRepo: Repository<AiExecutionProfile>,
|
||||||
|
private readonly aiPromptsService: AiPromptsService,
|
||||||
private readonly ocrCacheService: OcrCacheService,
|
private readonly ocrCacheService: OcrCacheService,
|
||||||
private readonly vramMonitorService: VramMonitorService,
|
private readonly vramMonitorService: VramMonitorService,
|
||||||
private readonly aiPolicyService: AiPolicyService,
|
private readonly aiPolicyService: AiPolicyService,
|
||||||
@@ -131,10 +137,15 @@ export class OcrService {
|
|||||||
'OCR_API_URL',
|
'OCR_API_URL',
|
||||||
'http://localhost:8765'
|
'http://localhost:8765'
|
||||||
);
|
);
|
||||||
this.ocrSidecarApiKey = this.configService.get<string>(
|
const ocrSidecarApiKey = this.configService.get<string>(
|
||||||
'OCR_SIDECAR_API_KEY',
|
'OCR_SIDECAR_API_KEY'
|
||||||
'lcbp3-dms-ocr-sidecar-secure-token-2026'
|
|
||||||
);
|
);
|
||||||
|
if (!ocrSidecarApiKey) {
|
||||||
|
throw new Error(
|
||||||
|
'OCR_SIDECAR_API_KEY is required — กรุณาตั้งค่า environment variable'
|
||||||
|
);
|
||||||
|
}
|
||||||
|
this.ocrSidecarApiKey = ocrSidecarApiKey;
|
||||||
this.vramHeadroomThresholdMb = this.configService.get<number>(
|
this.vramHeadroomThresholdMb = this.configService.get<number>(
|
||||||
'VRAM_HEADROOM_THRESHOLD_MB',
|
'VRAM_HEADROOM_THRESHOLD_MB',
|
||||||
this.configService.get<number>('AI_VRAM_HEADROOM_THRESHOLD_MB', 3000)
|
this.configService.get<number>('AI_VRAM_HEADROOM_THRESHOLD_MB', 3000)
|
||||||
@@ -272,7 +283,7 @@ export class OcrService {
|
|||||||
where: { settingKey: OCR_ACTIVE_ENGINE_KEY },
|
where: { settingKey: OCR_ACTIVE_ENGINE_KEY },
|
||||||
});
|
});
|
||||||
|
|
||||||
const activeEngine = setting?.settingValue ?? TESSERACT_ENGINE_ID;
|
const activeEngine = setting?.settingValue ?? FAST_PATH_ENGINE_ID;
|
||||||
await this.redis.set(
|
await this.redis.set(
|
||||||
OCR_ACTIVE_ENGINE_CACHE_KEY,
|
OCR_ACTIVE_ENGINE_CACHE_KEY,
|
||||||
activeEngine,
|
activeEngine,
|
||||||
@@ -284,7 +295,7 @@ export class OcrService {
|
|||||||
this.logger.error(
|
this.logger.error(
|
||||||
`Failed to get active OCR engine: ${error instanceof Error ? error.message : String(error)}`
|
`Failed to get active OCR engine: ${error instanceof Error ? error.message : String(error)}`
|
||||||
);
|
);
|
||||||
return TESSERACT_ENGINE_ID;
|
return FAST_PATH_ENGINE_ID;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -330,20 +341,20 @@ export class OcrService {
|
|||||||
|
|
||||||
const activeEngineId = await this.getActiveEngineId();
|
const activeEngineId = await this.getActiveEngineId();
|
||||||
|
|
||||||
if (activeEngineId === TYPHOON_ENGINE_ID) {
|
if (activeEngineId === OCR_ENGINE_ID) {
|
||||||
return this.processWithTyphoon(input);
|
return this.processWithNpDmsOcr(input);
|
||||||
} else {
|
} else {
|
||||||
return this.processWithTesseract(input);
|
return this.processWithFastPath(input);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Process via Tesseract OCR, sending file content as multipart */
|
/** Process via Fast Path (PyMuPDF text layer), sending file content as multipart */
|
||||||
private async processWithTesseract(
|
private async processWithFastPath(
|
||||||
input: OcrDetectionInput
|
input: OcrDetectionInput
|
||||||
): Promise<OcrDetectionResult> {
|
): Promise<OcrDetectionResult> {
|
||||||
const startTime = Date.now();
|
const startTime = Date.now();
|
||||||
try {
|
try {
|
||||||
this.logger.debug(`Tesseract OCR processing: ${input.pdfPath}`);
|
this.logger.debug(`Fast Path processing: ${input.pdfPath}`);
|
||||||
const fileBuffer = fs.readFileSync(input.pdfPath!);
|
const fileBuffer = fs.readFileSync(input.pdfPath!);
|
||||||
const form = new FormData();
|
const form = new FormData();
|
||||||
form.append(
|
form.append(
|
||||||
@@ -364,9 +375,9 @@ export class OcrService {
|
|||||||
const durationMs = Date.now() - startTime;
|
const durationMs = Date.now() - startTime;
|
||||||
await this.writeAuditLog({
|
await this.writeAuditLog({
|
||||||
documentPublicId: input.documentPublicId,
|
documentPublicId: input.documentPublicId,
|
||||||
aiModel: 'tesseract',
|
aiModel: 'fast-path',
|
||||||
modelName: 'tesseract-ocr',
|
modelName: 'pymupdf',
|
||||||
modelType: 'tesseract',
|
modelType: 'fast-path',
|
||||||
status: AiAuditStatus.SUCCESS,
|
status: AiAuditStatus.SUCCESS,
|
||||||
processingTimeMs: durationMs,
|
processingTimeMs: durationMs,
|
||||||
cacheHit: false,
|
cacheHit: false,
|
||||||
@@ -384,36 +395,70 @@ export class OcrService {
|
|||||||
: String(err);
|
: String(err);
|
||||||
await this.writeAuditLog({
|
await this.writeAuditLog({
|
||||||
documentPublicId: input.documentPublicId,
|
documentPublicId: input.documentPublicId,
|
||||||
aiModel: 'tesseract',
|
aiModel: 'fast-path',
|
||||||
modelName: 'tesseract-ocr',
|
modelName: 'pymupdf',
|
||||||
modelType: 'tesseract',
|
modelType: 'fast-path',
|
||||||
status: AiAuditStatus.FAILED,
|
status: AiAuditStatus.FAILED,
|
||||||
processingTimeMs: durationMs,
|
processingTimeMs: durationMs,
|
||||||
errorMessage: cause,
|
errorMessage: cause,
|
||||||
cacheHit: false,
|
cacheHit: false,
|
||||||
});
|
});
|
||||||
throw new Error(`Tesseract OCR Sidecar failed: ${cause}`);
|
throw new Error(`Fast Path OCR Sidecar failed: ${cause}`);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Process via Typhoon OCR */
|
/** Process via np-dms-ocr (Ollama) */
|
||||||
private async processWithTyphoon(
|
private async processWithNpDmsOcr(
|
||||||
input: OcrDetectionInput
|
input: OcrDetectionInput
|
||||||
): Promise<OcrDetectionResult> {
|
): Promise<OcrDetectionResult> {
|
||||||
const startTime = Date.now();
|
const startTime = Date.now();
|
||||||
try {
|
try {
|
||||||
const hasCapacity = await this.vramMonitorService.hasVramCapacity(
|
const hasCapacity =
|
||||||
TYPHOON_OCR_REQUIRED_VRAM_MB
|
await this.vramMonitorService.hasVramCapacity(OCR_REQUIRED_VRAM_MB);
|
||||||
);
|
|
||||||
if (!hasCapacity) {
|
if (!hasCapacity) {
|
||||||
this.logger.warn(
|
this.logger.warn(
|
||||||
`VRAM insufficient for Typhoon OCR. Falling back to Tesseract baseline.`
|
`VRAM insufficient for np-dms-ocr. Falling back to fast-path.`
|
||||||
);
|
);
|
||||||
return this.processWithTesseract(input);
|
return this.processWithFastPath(input);
|
||||||
}
|
}
|
||||||
const residency = await this.calculateOcrResidency(input.activeProfile);
|
await this.calculateOcrResidency(input.activeProfile);
|
||||||
const keepAlive = residency.keepAliveSeconds;
|
|
||||||
this.logger.debug(`Typhoon OCR processing: ${input.pdfPath}`);
|
// Resolve runtime parameters from DB (ocr-extract profile)
|
||||||
|
const profile = await this.profileRepo.findOne({
|
||||||
|
where: { profileName: 'ocr-extract' },
|
||||||
|
});
|
||||||
|
const runtimeParams = {
|
||||||
|
temperature: profile ? Number(profile.temperature) : 0.1,
|
||||||
|
top_p: profile ? Number(profile.topP) : 0.5,
|
||||||
|
repeat_penalty: profile ? Number(profile.repeatPenalty) : 1.0,
|
||||||
|
max_tokens: profile?.maxTokens ?? 16000,
|
||||||
|
};
|
||||||
|
|
||||||
|
// Override with input ocrOptions if provided
|
||||||
|
if (input.ocrOptions?.temperature !== undefined) {
|
||||||
|
runtimeParams.temperature = input.ocrOptions.temperature;
|
||||||
|
}
|
||||||
|
if (input.ocrOptions?.topP !== undefined) {
|
||||||
|
runtimeParams.top_p = input.ocrOptions.topP;
|
||||||
|
}
|
||||||
|
if (input.ocrOptions?.repeatPenalty !== undefined) {
|
||||||
|
runtimeParams.repeat_penalty = input.ocrOptions.repeatPenalty;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Resolve Active Prompt from DB (ocr_extraction)
|
||||||
|
const activePrompt =
|
||||||
|
await this.aiPromptsService.getActive('ocr_extraction');
|
||||||
|
if (!activePrompt) {
|
||||||
|
throw new BusinessException(
|
||||||
|
'NO_ACTIVE_PROMPT',
|
||||||
|
'No active ocr_extraction prompt found',
|
||||||
|
'ไม่พบ Prompt OCR สำหรับดึงข้อมูลที่เปิดใช้งาน'
|
||||||
|
);
|
||||||
|
}
|
||||||
|
const systemPrompt = activePrompt.template;
|
||||||
|
const dmsTags = activePrompt.contextConfig?.dmsTags;
|
||||||
|
|
||||||
|
this.logger.debug(`np-dms-ocr processing: ${input.pdfPath}`);
|
||||||
const fileBuffer = fs.readFileSync(input.pdfPath!);
|
const fileBuffer = fs.readFileSync(input.pdfPath!);
|
||||||
const form = new FormData();
|
const form = new FormData();
|
||||||
form.append(
|
form.append(
|
||||||
@@ -421,20 +466,18 @@ export class OcrService {
|
|||||||
new Blob([fileBuffer], { type: 'application/pdf' }),
|
new Blob([fileBuffer], { type: 'application/pdf' }),
|
||||||
'upload.pdf'
|
'upload.pdf'
|
||||||
);
|
);
|
||||||
form.append('engine', 'typhoon-np-dms-ocr');
|
form.append('engine', 'np-dms-ocr');
|
||||||
form.append('keep_alive', String(keepAlive));
|
form.append('systemPrompt', systemPrompt);
|
||||||
if (input.typhoonOptions?.temperature !== undefined) {
|
if (dmsTags) {
|
||||||
form.append('temperature', String(input.typhoonOptions.temperature));
|
form.append('dmsTags', JSON.stringify(dmsTags));
|
||||||
}
|
|
||||||
if (input.typhoonOptions?.topP !== undefined) {
|
|
||||||
form.append('topP', String(input.typhoonOptions.topP));
|
|
||||||
}
|
|
||||||
if (input.typhoonOptions?.repeatPenalty !== undefined) {
|
|
||||||
form.append(
|
|
||||||
'repeatPenalty',
|
|
||||||
String(input.typhoonOptions.repeatPenalty)
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
form.append('runtimeParams', JSON.stringify(runtimeParams));
|
||||||
|
|
||||||
|
// Append individual overrides for backward compatibility
|
||||||
|
form.append('temperature', String(runtimeParams.temperature));
|
||||||
|
form.append('topP', String(runtimeParams.top_p));
|
||||||
|
form.append('repeatPenalty', String(runtimeParams.repeat_penalty));
|
||||||
|
|
||||||
const response = await axios.post<OcrSidecarResponse>(
|
const response = await axios.post<OcrSidecarResponse>(
|
||||||
`${this.ocrApiUrl}/ocr-upload`,
|
`${this.ocrApiUrl}/ocr-upload`,
|
||||||
form,
|
form,
|
||||||
@@ -447,9 +490,9 @@ export class OcrService {
|
|||||||
const durationMs = Date.now() - startTime;
|
const durationMs = Date.now() - startTime;
|
||||||
await this.writeAuditLog({
|
await this.writeAuditLog({
|
||||||
documentPublicId: input.documentPublicId,
|
documentPublicId: input.documentPublicId,
|
||||||
aiModel: 'typhoon-ocr',
|
aiModel: 'np-dms-ocr',
|
||||||
modelName: 'typhoon-np-dms-ocr:latest',
|
modelName: 'np-dms-ocr:latest',
|
||||||
modelType: 'typhoon-ocr',
|
modelType: 'np-dms-ocr',
|
||||||
status: AiAuditStatus.SUCCESS,
|
status: AiAuditStatus.SUCCESS,
|
||||||
processingTimeMs: durationMs,
|
processingTimeMs: durationMs,
|
||||||
cacheHit: false,
|
cacheHit: false,
|
||||||
@@ -460,9 +503,9 @@ export class OcrService {
|
|||||||
};
|
};
|
||||||
} catch (err: unknown) {
|
} catch (err: unknown) {
|
||||||
this.logger.warn(
|
this.logger.warn(
|
||||||
`Typhoon OCR failed, trying fallback baseline (Tesseract): ${err instanceof Error ? err.message : String(err)}`
|
`np-dms-ocr failed, trying fallback to fast-path: ${err instanceof Error ? err.message : String(err)}`
|
||||||
);
|
);
|
||||||
return this.processWithTesseract(input);
|
return this.processWithFastPath(input);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
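The parameter handling added to `processWithNpDmsOcr` follows a two-step precedence: defaults come from the `ocr-extract` profile row (or hardcoded fallbacks when no row exists), and any value supplied in `ocrOptions` then overrides the profile. A minimal sketch of that precedence as a pure function is shown below. `resolveRuntimeParams`, `ProfileLike`, and `OcrOptionsLike` are hypothetical names introduced only for this illustration, and typing the profile fields as `number | string` is an assumption based on the `Number(...)` conversions in the diff.

```typescript
// Hypothetical shapes for illustration; the real entity and DTO live in the repo.
interface ProfileLike {
  temperature: number | string;
  topP: number | string;
  repeatPenalty: number | string;
  maxTokens?: number | null;
}

interface OcrOptionsLike {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

interface RuntimeParams {
  temperature: number;
  top_p: number;
  repeat_penalty: number;
  max_tokens: number;
}

function resolveRuntimeParams(
  profile: ProfileLike | null,
  ocrOptions?: OcrOptionsLike
): RuntimeParams {
  // Step 1: profile values, or the same fallbacks the diff uses.
  const params: RuntimeParams = {
    temperature: profile ? Number(profile.temperature) : 0.1,
    top_p: profile ? Number(profile.topP) : 0.5,
    repeat_penalty: profile ? Number(profile.repeatPenalty) : 1.0,
    max_tokens: profile?.maxTokens ?? 16000,
  };
  // Step 2: explicit overrides. Comparing against undefined (not truthiness)
  // lets a caller pass 0 as a valid temperature.
  if (ocrOptions?.temperature !== undefined) params.temperature = ocrOptions.temperature;
  if (ocrOptions?.topP !== undefined) params.top_p = ocrOptions.topP;
  if (ocrOptions?.repeatPenalty !== undefined) params.repeat_penalty = ocrOptions.repeatPenalty;
  return params;
}

console.log(resolveRuntimeParams(null));
console.log(
  resolveRuntimeParams(
    { temperature: '0.2', topP: '0.9', repeatPenalty: '1.1', maxTokens: 8000 },
    { temperature: 0 }
  )
);
```

Keeping the resolution in a pure function like this would make the precedence testable without mocking the repository, though the commit itself keeps the logic inline.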
|||||||
@@ -1,14 +1,17 @@
|
|||||||
// File: src/modules/ai/services/sandbox-ocr-engine.service.spec.ts
|
// File: src/modules/ai/services/sandbox-ocr-engine.service.spec.ts
|
||||||
// Change Log:
|
// Change Log:
|
||||||
// - 2026-06-14: Created unit tests for SandboxOcrEngineService covering detectAndExtract for every engine
|
// - 2026-06-14: Created unit tests for SandboxOcrEngineService covering detectAndExtract for every engine
|
||||||
|
// - 2026-06-20: Added a getRepositoryToken(AiExecutionProfile) mock for testing parameter governance
|
||||||
|
|
||||||
import { Test, TestingModule } from '@nestjs/testing';
|
import { Test, TestingModule } from '@nestjs/testing';
|
||||||
import { ConfigService } from '@nestjs/config';
|
import { ConfigService } from '@nestjs/config';
|
||||||
|
import { getRepositoryToken } from '@nestjs/typeorm';
|
||||||
import axios from 'axios';
|
import axios from 'axios';
|
||||||
import * as fs from 'fs';
|
import * as fs from 'fs';
|
||||||
import { SandboxOcrEngineService } from './sandbox-ocr-engine.service';
|
import { SandboxOcrEngineService } from './sandbox-ocr-engine.service';
|
||||||
import { OcrService } from './ocr.service';
|
import { OcrService } from './ocr.service';
|
||||||
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
||||||
|
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
|
||||||
|
|
||||||
jest.mock('axios');
|
jest.mock('axios');
|
||||||
jest.mock('fs');
|
jest.mock('fs');
|
||||||
@@ -16,14 +19,31 @@ jest.mock('fs');
|
|||||||
const mockedAxios = axios as jest.Mocked<typeof axios>;
|
const mockedAxios = axios as jest.Mocked<typeof axios>;
|
||||||
const mockedFs = fs as jest.Mocked<typeof fs>;
|
const mockedFs = fs as jest.Mocked<typeof fs>;
|
||||||
|
|
||||||
/** OcrService mock for tesseract/fast-path */
|
/** OcrService mock for fast-path */
|
||||||
const mockOcrService = {
|
const mockOcrService = {
|
||||||
detectAndExtract: jest.fn(),
|
detectAndExtract: jest.fn(),
|
||||||
};
|
};
|
||||||
|
|
||||||
/** AiPromptsService mock for the ocr_system prompt */
|
/** AiPromptsService mock for the ocr_system prompt */
|
||||||
const mockAiPromptsService = {
|
const mockAiPromptsService = {
|
||||||
getActive: jest.fn(),
|
getActive: jest.fn().mockResolvedValue({
|
||||||
|
template: 'mock active system prompt',
|
||||||
|
contextConfig: {
|
||||||
|
dmsTags: ['tag1', 'tag2'],
|
||||||
|
},
|
||||||
|
}),
|
||||||
|
};
|
||||||
|
|
||||||
|
/** AiExecutionProfile mock repository */
|
||||||
|
const mockProfile = {
|
||||||
|
profileName: 'ocr-extract',
|
||||||
|
temperature: 0.1,
|
||||||
|
topP: 0.5,
|
||||||
|
repeatPenalty: 1.0,
|
||||||
|
maxTokens: 16000,
|
||||||
|
};
|
||||||
|
const mockProfileRepository = {
|
||||||
|
findOne: jest.fn().mockResolvedValue(mockProfile),
|
||||||
};
|
};
|
||||||
|
|
||||||
/** ConfigService mock */
|
/** ConfigService mock */
|
||||||
@@ -48,6 +68,10 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
{ provide: ConfigService, useValue: mockConfigService },
|
{ provide: ConfigService, useValue: mockConfigService },
|
||||||
{ provide: OcrService, useValue: mockOcrService },
|
{ provide: OcrService, useValue: mockOcrService },
|
||||||
{ provide: AiPromptsService, useValue: mockAiPromptsService },
|
{ provide: AiPromptsService, useValue: mockAiPromptsService },
|
||||||
|
{
|
||||||
|
provide: getRepositoryToken(AiExecutionProfile),
|
||||||
|
useValue: mockProfileRepository,
|
||||||
|
},
|
||||||
],
|
],
|
||||||
}).compile();
|
}).compile();
|
||||||
service = module.get<SandboxOcrEngineService>(SandboxOcrEngineService);
|
service = module.get<SandboxOcrEngineService>(SandboxOcrEngineService);
|
||||||
@@ -65,7 +89,7 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
});
|
});
|
||||||
const result = await service.detectAndExtract('/tmp/file.pdf', 'auto');
|
const result = await service.detectAndExtract('/tmp/file.pdf', 'auto');
|
||||||
expect(result.text).toBe('auto extracted text');
|
expect(result.text).toBe('auto extracted text');
|
||||||
expect(result.engineUsed).toBe('tesseract');
|
expect(result.engineUsed).toBe('fast-path');
|
||||||
expect(result.fallbackUsed).toBe(false);
|
expect(result.fallbackUsed).toBe(false);
|
||||||
expect(mockOcrService.detectAndExtract).toHaveBeenCalledWith({
|
expect(mockOcrService.detectAndExtract).toHaveBeenCalledWith({
|
||||||
pdfPath: '/tmp/file.pdf',
|
pdfPath: '/tmp/file.pdf',
|
||||||
@@ -83,42 +107,6 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
describe('detectAndExtract() — engine=tesseract', () => {
|
|
||||||
it('ควร route ไปยัง OcrService เมื่อ engine=tesseract', async () => {
|
|
||||||
mockOcrService.detectAndExtract.mockResolvedValueOnce({
|
|
||||||
text: 'tesseract text',
|
|
||||||
ocrUsed: true,
|
|
||||||
});
|
|
||||||
const result = await service.detectAndExtract(
|
|
||||||
'/tmp/file.pdf',
|
|
||||||
'tesseract'
|
|
||||||
);
|
|
||||||
expect(result.engineUsed).toBe('tesseract');
|
|
||||||
expect(result.fallbackUsed).toBe(false);
|
|
||||||
});
|
|
||||||
});
|
|
||||||
|
|
||||||
describe('detectAndExtract() — engine=typhoon-np-dms-ocr (legacy alias)', () => {
|
|
||||||
it('ควรแปลง typhoon-np-dms-ocr เป็น np-dms-ocr และส่งไปยัง sidecar', async () => {
|
|
||||||
const mockBuffer = Buffer.from('pdf content');
|
|
||||||
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
|
|
||||||
mockedAxios.post = jest.fn().mockResolvedValueOnce({
|
|
||||||
data: {
|
|
||||||
text: 'ocr text via alias',
|
|
||||||
ocrUsed: true,
|
|
||||||
engineUsed: 'np-dms-ocr',
|
|
||||||
},
|
|
||||||
});
|
|
||||||
const result = await service.detectAndExtract(
|
|
||||||
'/tmp/file.pdf',
|
|
||||||
'typhoon-np-dms-ocr'
|
|
||||||
);
|
|
||||||
expect(result.text).toBe('ocr text via alias');
|
|
||||||
expect(result.engineUsed).toBe('np-dms-ocr');
|
|
||||||
expect(result.fallbackUsed).toBe(false);
|
|
||||||
});
|
|
||||||
});
|
|
||||||
|
|
||||||
describe('detectAndExtract() — engine=np-dms-ocr (sidecar path)', () => {
|
describe('detectAndExtract() — engine=np-dms-ocr (sidecar path)', () => {
|
||||||
it('ควรส่ง file ไปยัง sidecar /ocr-upload สำเร็จ', async () => {
|
it('ควรส่ง file ไปยัง sidecar /ocr-upload สำเร็จ', async () => {
|
||||||
const mockBuffer = Buffer.from('pdf binary data');
|
const mockBuffer = Buffer.from('pdf binary data');
|
||||||
@@ -149,7 +137,7 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
);
|
);
|
||||||
});
|
});
|
||||||
|
|
||||||
it('ควรส่ง typhoonOptions (temperature, topP, repeatPenalty) ไปใน form data', async () => {
|
it('ควรส่ง ocrOptions (temperature, topP, repeatPenalty) ไปใน form data', async () => {
|
||||||
const mockBuffer = Buffer.from('pdf data');
|
const mockBuffer = Buffer.from('pdf data');
|
||||||
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
|
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
|
||||||
mockedAxios.post = jest.fn().mockResolvedValueOnce({
|
mockedAxios.post = jest.fn().mockResolvedValueOnce({
|
||||||
@@ -178,13 +166,13 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
expect(result.engineUsed).toBe('np-dms-ocr'); // resolvedEngineType fallback
|
expect(result.engineUsed).toBe('np-dms-ocr'); // resolvedEngineType fallback
|
||||||
});
|
});
|
||||||
|
|
||||||
it('ควร fallback ไปยัง Tesseract เมื่อ fs.readFileSync ล้มเหลว (outer catch fallback)', async () => {
|
it('ควร fallback ไปยัง fast-path เมื่อ fs.readFileSync ล้มเหลว (outer catch fallback)', async () => {
|
||||||
(mockedFs.readFileSync as jest.Mock).mockImplementationOnce(() => {
|
(mockedFs.readFileSync as jest.Mock).mockImplementationOnce(() => {
|
||||||
throw new Error('ENOENT: file not found');
|
throw new Error('ENOENT: file not found');
|
||||||
});
|
});
|
||||||
// the service catches the error and falls back to Tesseract
|
// the service catches the error and falls back to fast-path
|
||||||
mockOcrService.detectAndExtract.mockResolvedValueOnce({
|
mockOcrService.detectAndExtract.mockResolvedValueOnce({
|
||||||
text: 'tesseract fallback text',
|
text: 'fast-path fallback text',
|
||||||
ocrUsed: true,
|
ocrUsed: true,
|
||||||
});
|
});
|
||||||
const result = await service.detectAndExtract(
|
const result = await service.detectAndExtract(
|
||||||
@@ -192,10 +180,10 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
'np-dms-ocr'
|
'np-dms-ocr'
|
||||||
);
|
);
|
||||||
expect(result.fallbackUsed).toBe(true);
|
expect(result.fallbackUsed).toBe(true);
|
||||||
expect(result.engineUsed).toBe('tesseract');
|
expect(result.engineUsed).toBe('fast-path');
|
||||||
});
|
});
|
||||||
|
|
||||||
it('ควร fallback ไปยัง Tesseract เมื่อ sidecar HTTP error เกิดขึ้น', async () => {
|
it('ควร fallback ไปยัง fast-path เมื่อ sidecar HTTP error เกิดขึ้น', async () => {
|
||||||
const mockBuffer = Buffer.from('pdf data');
|
const mockBuffer = Buffer.from('pdf data');
|
||||||
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
|
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
|
||||||
mockedAxios.post = jest.fn().mockRejectedValueOnce(
|
mockedAxios.post = jest.fn().mockRejectedValueOnce(
|
||||||
@@ -204,16 +192,16 @@ describe('SandboxOcrEngineService', () => {
|
|||||||
})
|
})
|
||||||
);
|
);
|
||||||
mockOcrService.detectAndExtract.mockResolvedValueOnce({
|
mockOcrService.detectAndExtract.mockResolvedValueOnce({
|
||||||
text: 'tesseract fallback result',
|
text: 'fast-path fallback result',
|
||||||
ocrUsed: true,
|
ocrUsed: true,
|
||||||
});
|
});
|
||||||
const result = await service.detectAndExtract(
|
const result = await service.detectAndExtract(
|
||||||
'/tmp/doc.pdf',
|
'/tmp/doc.pdf',
|
||||||
'np-dms-ocr'
|
'np-dms-ocr'
|
||||||
);
|
);
|
||||||
expect(result.text).toBe('tesseract fallback result');
|
expect(result.text).toBe('fast-path fallback result');
|
||||||
expect(result.fallbackUsed).toBe(true);
|
expect(result.fallbackUsed).toBe(true);
|
||||||
expect(result.engineUsed).toBe('tesseract');
|
expect(result.engineUsed).toBe('fast-path');
|
||||||
});
|
});
|
||||||
|
|
||||||
it('should fall back to fast-path when the sidecar errors and OcrService returns ocrUsed=false', async () => {
|
it('should fall back to fast-path when the sidecar errors and OcrService returns ocrUsed=false', async () => {
|
||||||
|
|||||||
@@ -3,26 +3,26 @@
|
|||||||
// - 2026-05-30: Split SandboxOcrEngineService out of OcrService so Typhoon OCR can be selected for the sandbox only, without affecting the core OCR flow
|
// - 2026-05-30: Split SandboxOcrEngineService out of OcrService so Typhoon OCR can be selected for the sandbox only, without affecting the core OCR flow
|
||||||
// - 2026-06-01: Switched from remapPath + pdfPath to multipart file upload to /ocr-upload (fixes the Docker WSL2 mount issue)
|
// - 2026-06-01: Switched from remapPath + pdfPath to multipart file upload to /ocr-upload (fixes the Docker WSL2 mount issue)
|
||||||
// - 2026-06-02: Send X-API-Key in the request headers to ocr-sidecar for maximum security (ADR-033, Suggestion 2)
|
// - 2026-06-02: Send X-API-Key in the request headers to ocr-sidecar for maximum security (ADR-033, Suggestion 2)
|
||||||
// - 2026-06-04: ADR-034: added 'typhoon-np-dms-ocr' as the canonical SandboxOcrEngineType; legacy aliases are still supported
|
// - 2026-06-04: ADR-034: added 'np-dms-ocr' as the canonical SandboxOcrEngineType
|
||||||
// - 2026-06-04: Added the OcrTyphoonOptions interface; accepts temperature/topP/repeatPenalty from the frontend sandbox to override Modelfile defaults
|
// - 2026-06-04: Added the OcrNpDmsOptions interface; accepts temperature/topP/repeatPenalty from the frontend sandbox to override Modelfile defaults
|
||||||
// - 2026-06-13: ADR-036: changed the canonical SandboxOcrEngineType to np-dms-ocr and kept the legacy alias
|
// - 2026-06-13: ADR-036: changed the canonical SandboxOcrEngineType to np-dms-ocr
|
||||||
// - 2026-06-17: Added AiPromptsService injection and send the systemPrompt form field from the active ocr_system prompt (T028)
|
// - 2026-06-17: Added AiPromptsService injection and send the systemPrompt form field from the active ocr_system prompt (T028)
|
||||||
|
|
||||||
import { Injectable, Logger } from '@nestjs/common';
|
import { Injectable, Logger } from '@nestjs/common';
|
||||||
import { ConfigService } from '@nestjs/config';
|
import { ConfigService } from '@nestjs/config';
|
||||||
|
import { InjectRepository } from '@nestjs/typeorm';
|
||||||
|
import { Repository } from 'typeorm';
|
||||||
import axios from 'axios';
|
import axios from 'axios';
|
||||||
import * as fs from 'fs';
|
import * as fs from 'fs';
|
||||||
import { OcrService } from './ocr.service';
|
import { OcrService } from './ocr.service';
|
||||||
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
||||||
|
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
|
||||||
|
import { BusinessException } from '../../../common/exceptions';
|
||||||
|
|
||||||
export type SandboxOcrEngineType =
|
export type SandboxOcrEngineType = 'auto' | 'np-dms-ocr';
|
||||||
| 'auto'
|
|
||||||
| 'tesseract'
|
|
||||||
| 'np-dms-ocr'
|
|
||||||
| 'typhoon-np-dms-ocr';
|
|
||||||
|
|
||||||
/** Typhoon OCR parameters that can override Modelfile defaults from the sandbox UI */
|
/** np-dms-ocr parameters that can override Modelfile defaults from the sandbox UI */
|
||||||
export interface OcrTyphoonOptions {
|
export interface OcrNpDmsOptions {
|
||||||
temperature?: number;
|
temperature?: number;
|
||||||
topP?: number;
|
topP?: number;
|
||||||
repeatPenalty?: number;
|
repeatPenalty?: number;
|
||||||
@@ -50,7 +50,9 @@ export class SandboxOcrEngineService {
|
|||||||
constructor(
|
constructor(
|
||||||
private readonly configService: ConfigService,
|
private readonly configService: ConfigService,
|
||||||
private readonly ocrService: OcrService,
|
private readonly ocrService: OcrService,
|
||||||
private readonly aiPromptsService: AiPromptsService
|
private readonly aiPromptsService: AiPromptsService,
|
||||||
|
@InjectRepository(AiExecutionProfile)
|
||||||
|
private readonly profileRepo: Repository<AiExecutionProfile>
|
||||||
) {
|
) {
|
||||||
this.ocrApiUrl = this.configService.get<string>(
|
this.ocrApiUrl = this.configService.get<string>(
|
||||||
'OCR_API_URL',
|
'OCR_API_URL',
|
||||||
@@ -62,26 +64,23 @@ export class SandboxOcrEngineService {
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Runs OCR with the selected engine, falling back to the Tesseract baseline when Typhoon fails */
|
/** Runs OCR with the selected engine, falling back to fast-path when np-dms-ocr fails */
|
||||||
async detectAndExtract(
|
async detectAndExtract(
|
||||||
pdfPath: string,
|
pdfPath: string,
|
||||||
engineType: SandboxOcrEngineType = 'auto',
|
engineType: SandboxOcrEngineType = 'auto',
|
||||||
typhoonOptions?: OcrTyphoonOptions
|
ocrOptions?: OcrNpDmsOptions
|
||||||
): Promise<SandboxOcrResult> {
|
): Promise<SandboxOcrResult> {
|
||||||
const resolvedEngineType =
|
const resolvedEngineType = engineType;
|
||||||
engineType === 'typhoon-np-dms-ocr' ? 'np-dms-ocr' : engineType;
|
|
||||||
this.logger.log(
|
this.logger.log(
|
||||||
`detectAndExtract called — engine="${resolvedEngineType}" pdfPath="${pdfPath}" typhoonOptions=${JSON.stringify(typhoonOptions ?? null)}`
|
`detectAndExtract called — engine="${resolvedEngineType}" pdfPath="${pdfPath}" ocrOptions=${JSON.stringify(ocrOptions ?? null)}`
|
||||||
);
|
);
|
||||||
if (resolvedEngineType === 'auto' || resolvedEngineType === 'tesseract') {
|
if (resolvedEngineType === 'auto') {
|
||||||
this.logger.log(
|
this.logger.log(`engine="${resolvedEngineType}" → routing to fast-path`);
|
||||||
`engine="${resolvedEngineType}" → routing to Tesseract/fast-path`
|
|
||||||
);
|
|
||||||
const result = await this.ocrService.detectAndExtract({ pdfPath });
|
const result = await this.ocrService.detectAndExtract({ pdfPath });
|
||||||
return {
|
return {
|
||||||
text: result.text,
|
text: result.text,
|
||||||
ocrUsed: result.ocrUsed,
|
ocrUsed: result.ocrUsed,
|
||||||
engineUsed: result.ocrUsed ? 'tesseract' : 'fast-path',
|
engineUsed: 'fast-path',
|
||||||
fallbackUsed: false,
|
fallbackUsed: false,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
@@ -103,6 +102,42 @@ export class SandboxOcrEngineService {
|
|||||||
);
|
);
|
||||||
throw fsErr;
|
throw fsErr;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Resolve runtime parameters from DB (ocr-extract profile)
|
||||||
|
const profile = await this.profileRepo.findOne({
|
||||||
|
where: { profileName: 'ocr-extract' },
|
||||||
|
});
|
||||||
|
const runtimeParams = {
|
||||||
|
temperature: profile ? Number(profile.temperature) : 0.1,
|
||||||
|
top_p: profile ? Number(profile.topP) : 0.5,
|
||||||
|
repeat_penalty: profile ? Number(profile.repeatPenalty) : 1.0,
|
||||||
|
max_tokens: profile?.maxTokens ?? 16000,
|
||||||
|
};
|
||||||
|
|
||||||
|
// Override with sandbox options if provided
|
||||||
|
if (ocrOptions?.temperature !== undefined) {
|
||||||
|
runtimeParams.temperature = ocrOptions.temperature;
|
||||||
|
}
|
||||||
|
if (ocrOptions?.topP !== undefined) {
|
||||||
|
runtimeParams.top_p = ocrOptions.topP;
|
||||||
|
}
|
||||||
|
if (ocrOptions?.repeatPenalty !== undefined) {
|
||||||
|
runtimeParams.repeat_penalty = ocrOptions.repeatPenalty;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Resolve Active Prompt from DB (ocr_extraction)
|
||||||
|
const activePrompt =
|
||||||
|
await this.aiPromptsService.getActive('ocr_extraction');
|
||||||
|
if (!activePrompt) {
|
||||||
|
throw new BusinessException(
|
||||||
|
'NO_ACTIVE_PROMPT',
|
||||||
|
'No active ocr_extraction prompt found',
|
||||||
|
'ไม่พบ Prompt OCR สำหรับดึงข้อมูลที่เปิดใช้งาน'
|
||||||
|
);
|
||||||
|
}
|
||||||
|
const systemPrompt = activePrompt.template;
|
||||||
|
const dmsTags = activePrompt.contextConfig?.dmsTags;
|
||||||
|
|
||||||
const form = new FormData();
|
const form = new FormData();
|
||||||
form.append(
|
form.append(
|
||||||
'file',
|
'file',
|
||||||
@@ -110,32 +145,19 @@ export class SandboxOcrEngineService {
|
|||||||
'upload.pdf'
|
'upload.pdf'
|
||||||
);
|
);
|
||||||
form.append('engine', resolvedEngineType);
|
form.append('engine', resolvedEngineType);
|
||||||
if (typhoonOptions?.temperature !== undefined) {
|
form.append('systemPrompt', systemPrompt);
|
||||||
form.append('temperature', String(typhoonOptions.temperature));
|
if (dmsTags) {
|
||||||
}
|
form.append('dmsTags', JSON.stringify(dmsTags));
|
||||||
if (typhoonOptions?.topP !== undefined) {
|
|
||||||
form.append('topP', String(typhoonOptions.topP));
|
|
||||||
}
|
|
||||||
if (typhoonOptions?.repeatPenalty !== undefined) {
|
|
||||||
form.append('repeatPenalty', String(typhoonOptions.repeatPenalty));
|
|
||||||
}
|
|
||||||
// Fetch the active ocr_system prompt and send it to the sidecar
|
|
||||||
try {
|
|
||||||
const activeOcrSystemPrompt =
|
|
||||||
await this.aiPromptsService.getActive('ocr_system');
|
|
||||||
if (activeOcrSystemPrompt && activeOcrSystemPrompt.template) {
|
|
||||||
form.append('systemPrompt', activeOcrSystemPrompt.template);
|
|
||||||
this.logger.log(
|
|
||||||
`Injected active ocr_system prompt (version ${activeOcrSystemPrompt.versionNumber})`
|
|
||||||
);
|
|
||||||
}
|
|
||||||
} catch (promptErr: unknown) {
|
|
||||||
this.logger.warn(
|
|
||||||
`Failed to retrieve active ocr_system prompt, proceeding without: ${promptErr instanceof Error ? promptErr.message : String(promptErr)}`
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
form.append('runtimeParams', JSON.stringify(runtimeParams));
|
||||||
|
|
||||||
|
// Append individual overrides for backward compatibility
|
||||||
|
form.append('temperature', String(runtimeParams.temperature));
|
||||||
|
form.append('topP', String(runtimeParams.top_p));
|
||||||
|
form.append('repeatPenalty', String(runtimeParams.repeat_penalty));
|
||||||
|
|
||||||
this.logger.log(
|
this.logger.log(
|
||||||
`Sending to sidecar — engine=${engineType} options=${JSON.stringify(typhoonOptions ?? {})}`
|
`Sending to sidecar — engine=${engineType} options=${JSON.stringify(ocrOptions ?? {})}`
|
||||||
);
|
);
|
||||||
const response = await axios.post<SandboxOcrSidecarResponse>(
|
const response = await axios.post<SandboxOcrSidecarResponse>(
|
||||||
`${this.ocrApiUrl}/ocr-upload`,
|
`${this.ocrApiUrl}/ocr-upload`,
|
||||||
@@ -183,9 +205,9 @@ export class SandboxOcrEngineService {
|
|||||||
? `HTTP ${axiosStatus} — ${cause} — sidecar detail: ${axiosDetail}`
|
? `HTTP ${axiosStatus} — ${cause} — sidecar detail: ${axiosDetail}`
|
||||||
: `HTTP ${axiosStatus} — ${cause}`;
|
: `HTTP ${axiosStatus} — ${cause}`;
|
||||||
this.logger.error(
|
this.logger.error(
|
||||||
`[DIAG] Typhoon OCR FAILED — engine="${engineType}" url="${this.ocrApiUrl}/ocr-upload" error: ${fullCause}`
|
`[DIAG] np-dms-ocr FAILED — engine="${engineType}" url="${this.ocrApiUrl}/ocr-upload" error: ${fullCause}`
|
||||||
);
|
);
|
||||||
this.logger.warn(`Falling back to Tesseract due to: ${fullCause}`);
|
this.logger.warn(`Falling back to fast-path due to: ${fullCause}`);
|
||||||
|
|
||||||
const fallbackResult = await this.ocrService.detectAndExtract({
|
const fallbackResult = await this.ocrService.detectAndExtract({
|
||||||
pdfPath,
|
pdfPath,
|
||||||
@@ -193,7 +215,7 @@ export class SandboxOcrEngineService {
|
|||||||
return {
|
return {
|
||||||
text: fallbackResult.text,
|
text: fallbackResult.text,
|
||||||
ocrUsed: fallbackResult.ocrUsed,
|
ocrUsed: fallbackResult.ocrUsed,
|
||||||
engineUsed: fallbackResult.ocrUsed ? 'tesseract' : 'fast-path',
|
engineUsed: 'fast-path',
|
||||||
fallbackUsed: true,
|
fallbackUsed: true,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -54,7 +54,7 @@ describe('AiPolicyService', () => {
|
|||||||
|
|
||||||
describe('getCanonicalModelName', () => {
|
describe('getCanonicalModelName', () => {
|
||||||
it('should return np-dms-ocr for model names containing "ocr"', () => {
|
it('should return np-dms-ocr for model names containing "ocr"', () => {
|
||||||
expect(service.getCanonicalModelName('typhoon-np-dms-ocr:latest')).toBe(
|
expect(service.getCanonicalModelName('np-dms-ocr:latest')).toBe(
|
||||||
'np-dms-ocr'
|
'np-dms-ocr'
|
||||||
);
|
);
|
||||||
expect(service.getCanonicalModelName('my-ocr-model')).toBe('np-dms-ocr');
|
expect(service.getCanonicalModelName('my-ocr-model')).toBe('np-dms-ocr');
|
||||||
|
|||||||
@@ -1,6 +1,7 @@
|
|||||||
// File: backend/src/modules/ai/tests/ocr-residency.spec.ts
|
// File: backend/src/modules/ai/tests/ocr-residency.spec.ts
|
||||||
// Change Log:
|
// Change Log:
|
||||||
// - 2026-06-11: Initial unit tests for adaptive OCR residency
|
// - 2026-06-11: Initial unit tests for adaptive OCR residency
|
||||||
|
// - 2026-06-20: Added mocks for the AiExecutionProfile repository and AiPromptsService to support parameter governance
|
||||||
|
|
||||||
import { Test, TestingModule } from '@nestjs/testing';
|
import { Test, TestingModule } from '@nestjs/testing';
|
||||||
import { ConfigService } from '@nestjs/config';
|
import { ConfigService } from '@nestjs/config';
|
||||||
@@ -11,6 +12,8 @@ import { AiPolicyService } from '../services/ai-policy.service';
|
|||||||
import { OcrCacheService } from '../services/ocr-cache.service';
|
import { OcrCacheService } from '../services/ocr-cache.service';
|
||||||
import { SystemSetting } from '../entities/system-setting.entity';
|
import { SystemSetting } from '../entities/system-setting.entity';
|
||||||
import { AiAuditLog } from '../entities/ai-audit-log.entity';
|
import { AiAuditLog } from '../entities/ai-audit-log.entity';
|
||||||
|
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
|
||||||
|
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
||||||
|
|
||||||
describe('OcrService Adaptive Residency (US2)', () => {
|
describe('OcrService Adaptive Residency (US2)', () => {
|
||||||
let service: OcrService;
|
let service: OcrService;
|
||||||
@@ -36,6 +39,23 @@ describe('OcrService Adaptive Residency (US2)', () => {
|
|||||||
create: jest.fn().mockReturnValue({}),
|
create: jest.fn().mockReturnValue({}),
|
||||||
save: jest.fn().mockResolvedValue({}),
|
save: jest.fn().mockResolvedValue({}),
|
||||||
};
|
};
|
||||||
|
const mockProfileRepo = {
|
||||||
|
findOne: jest.fn().mockResolvedValue({
|
||||||
|
profileName: 'ocr-extract',
|
||||||
|
temperature: 0.1,
|
||||||
|
topP: 0.5,
|
||||||
|
repeatPenalty: 1.0,
|
||||||
|
maxTokens: 16000,
|
||||||
|
}),
|
||||||
|
};
|
||||||
|
const mockAiPromptsService = {
|
||||||
|
getActive: jest.fn().mockResolvedValue({
|
||||||
|
template: 'mock active system prompt',
|
||||||
|
contextConfig: {
|
||||||
|
dmsTags: ['tag1', 'tag2'],
|
||||||
|
},
|
||||||
|
}),
|
||||||
|
};
|
||||||
const mockOcrCacheService = {};
|
const mockOcrCacheService = {};
|
||||||
const mockVramMonitorService = {
|
const mockVramMonitorService = {
|
||||||
getVramHeadroom: jest.fn(),
|
getVramHeadroom: jest.fn(),
|
||||||
@@ -61,6 +81,11 @@ describe('OcrService Adaptive Residency (US2)', () => {
|
|||||||
provide: getRepositoryToken(AiAuditLog),
|
provide: getRepositoryToken(AiAuditLog),
|
||||||
useValue: mockAiAuditLogRepo,
|
useValue: mockAiAuditLogRepo,
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
provide: getRepositoryToken(AiExecutionProfile),
|
||||||
|
useValue: mockProfileRepo,
|
||||||
|
},
|
||||||
|
{ provide: AiPromptsService, useValue: mockAiPromptsService },
|
||||||
{ provide: OcrCacheService, useValue: mockOcrCacheService },
|
{ provide: OcrCacheService, useValue: mockOcrCacheService },
|
||||||
{ provide: VramMonitorService, useValue: mockVramMonitorService },
|
{ provide: VramMonitorService, useValue: mockVramMonitorService },
|
||||||
{ provide: AiPolicyService, useValue: mockAiPolicyService },
|
{ provide: AiPolicyService, useValue: mockAiPolicyService },
|
||||||
|
|||||||
@@ -1,6 +1,8 @@
|
|||||||
// File: backend/src/modules/ai/tests/ocr.service.spec.ts
|
// File: backend/src/modules/ai/tests/ocr.service.spec.ts
|
||||||
// Change Log:
|
// Change Log:
|
||||||
// - 2026-06-13: Initial unit tests for OCR parameter wiring (T066)
|
// - 2026-06-13: Initial unit tests for OCR parameter wiring (T066)
|
||||||
|
// - 2026-06-20: Added mocks for the AiExecutionProfile repository and AiPromptsService to support parameter governance
|
||||||
|
|
||||||
import { Test, TestingModule } from '@nestjs/testing';
|
import { Test, TestingModule } from '@nestjs/testing';
|
||||||
import { ConfigService } from '@nestjs/config';
|
import { ConfigService } from '@nestjs/config';
|
||||||
import { getRepositoryToken } from '@nestjs/typeorm';
|
import { getRepositoryToken } from '@nestjs/typeorm';
|
||||||
@@ -10,12 +12,17 @@ import { AiPolicyService } from '../services/ai-policy.service';
|
|||||||
import { OcrCacheService } from '../services/ocr-cache.service';
|
import { OcrCacheService } from '../services/ocr-cache.service';
|
||||||
import { SystemSetting } from '../entities/system-setting.entity';
|
import { SystemSetting } from '../entities/system-setting.entity';
|
||||||
import { AiAuditLog } from '../entities/ai-audit-log.entity';
|
import { AiAuditLog } from '../entities/ai-audit-log.entity';
|
||||||
|
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
|
||||||
|
import { AiPromptsService } from '../prompts/ai-prompts.service';
|
||||||
import axios from 'axios';
|
import axios from 'axios';
|
||||||
import * as fs from 'fs';
|
import * as fs from 'fs';
|
||||||
|
|
||||||
jest.mock('axios');
|
jest.mock('axios');
|
||||||
jest.mock('fs');
|
jest.mock('fs');
|
||||||
|
|
||||||
describe('OcrService Parameter Wiring (T066)', () => {
|
describe('OcrService Parameter Wiring (T066)', () => {
|
||||||
let service: OcrService;
|
let service: OcrService;
|
||||||
|
|
||||||
const mockConfigService = {
|
const mockConfigService = {
|
||||||
get: jest.fn((key: string, defaultValue?: unknown): unknown => {
|
get: jest.fn((key: string, defaultValue?: unknown): unknown => {
|
||||||
const config: Record<string, unknown> = {
|
const config: Record<string, unknown> = {
|
||||||
@@ -29,16 +36,39 @@ describe('OcrService Parameter Wiring (T066)', () => {
|
|||||||
return config[key] ?? defaultValue;
|
return config[key] ?? defaultValue;
|
||||||
}),
|
}),
|
||||||
};
|
};
|
||||||
|
|
||||||
const mockSystemSettingRepo = {
|
const mockSystemSettingRepo = {
|
||||||
findOne: jest.fn().mockResolvedValue({
|
findOne: jest.fn().mockResolvedValue({
|
||||||
settingValue: '019505a1-7c3e-7000-8000-abc123def002',
|
settingValue: '019505a1-7c3e-7000-8000-abc123def002',
|
||||||
}),
|
}),
|
||||||
};
|
};
|
||||||
|
|
||||||
const mockAiAuditLogRepo = {
|
const mockAiAuditLogRepo = {
|
||||||
create: jest.fn().mockReturnValue({}),
|
create: jest.fn().mockReturnValue({}),
|
||||||
save: jest.fn().mockResolvedValue({}),
|
save: jest.fn().mockResolvedValue({}),
|
||||||
};
|
};
|
||||||
|
|
||||||
|
const mockProfileRepo = {
|
||||||
|
findOne: jest.fn().mockResolvedValue({
|
||||||
|
profileName: 'ocr-extract',
|
||||||
|
temperature: 0.1,
|
||||||
|
topP: 0.5,
|
||||||
|
repeatPenalty: 1.0,
|
||||||
|
maxTokens: 16000,
|
||||||
|
}),
|
||||||
|
};
|
||||||
|
|
||||||
|
const mockAiPromptsService = {
|
||||||
|
getActive: jest.fn().mockResolvedValue({
|
||||||
|
template: 'mock active system prompt',
|
||||||
|
contextConfig: {
|
||||||
|
dmsTags: ['tag1', 'tag2'],
|
||||||
|
},
|
||||||
|
}),
|
||||||
|
};
|
||||||
|
|
||||||
const mockOcrCacheService = {};
|
const mockOcrCacheService = {};
|
||||||
|
|
||||||
const mockVramMonitorService = {
|
const mockVramMonitorService = {
|
||||||
getVramHeadroom: jest.fn().mockResolvedValue({
|
getVramHeadroom: jest.fn().mockResolvedValue({
|
||||||
totalMb: 16384,
|
totalMb: 16384,
|
||||||
@@ -49,12 +79,15 @@ describe('OcrService Parameter Wiring (T066)', () => {
|
|||||||
}),
|
}),
|
||||||
hasVramCapacity: jest.fn().mockResolvedValue(true),
|
hasVramCapacity: jest.fn().mockResolvedValue(true),
|
||||||
};
|
};
|
||||||
|
|
||||||
const mockAiPolicyService = {};
|
const mockAiPolicyService = {};
|
||||||
|
|
||||||
const mockRedis = {
|
const mockRedis = {
|
||||||
get: jest.fn().mockResolvedValue(null),
|
get: jest.fn().mockResolvedValue(null),
|
||||||
set: jest.fn().mockResolvedValue('OK'),
|
set: jest.fn().mockResolvedValue('OK'),
|
||||||
del: jest.fn().mockResolvedValue(1),
|
del: jest.fn().mockResolvedValue(1),
|
||||||
};
|
};
|
||||||
|
|
||||||
beforeEach(async () => {
|
beforeEach(async () => {
|
||||||
const module: TestingModule = await Test.createTestingModule({
|
const module: TestingModule = await Test.createTestingModule({
|
||||||
providers: [
|
providers: [
|
||||||
@@ -68,6 +101,11 @@ describe('OcrService Parameter Wiring (T066)', () => {
|
|||||||
provide: getRepositoryToken(AiAuditLog),
|
provide: getRepositoryToken(AiAuditLog),
|
||||||
useValue: mockAiAuditLogRepo,
|
useValue: mockAiAuditLogRepo,
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
provide: getRepositoryToken(AiExecutionProfile),
|
||||||
|
useValue: mockProfileRepo,
|
||||||
|
},
|
||||||
|
{ provide: AiPromptsService, useValue: mockAiPromptsService },
|
||||||
{ provide: OcrCacheService, useValue: mockOcrCacheService },
|
{ provide: OcrCacheService, useValue: mockOcrCacheService },
|
||||||
{ provide: VramMonitorService, useValue: mockVramMonitorService },
|
{ provide: VramMonitorService, useValue: mockVramMonitorService },
|
||||||
{ provide: AiPolicyService, useValue: mockAiPolicyService },
|
{ provide: AiPolicyService, useValue: mockAiPolicyService },
|
||||||
@@ -88,7 +126,7 @@ describe('OcrService Parameter Wiring (T066)', () => {
|
|||||||
await service.detectAndExtract({
|
await service.detectAndExtract({
|
||||||
pdfPath: '/path/to/test.pdf',
|
pdfPath: '/path/to/test.pdf',
|
||||||
documentPublicId: 'doc-123',
|
documentPublicId: 'doc-123',
|
||||||
typhoonOptions: {
|
ocrOptions: {
|
||||||
temperature: 0.15,
|
temperature: 0.15,
|
||||||
topP: 0.65,
|
topP: 0.65,
|
||||||
repeatPenalty: 1.15,
|
repeatPenalty: 1.15,
|
||||||
@@ -104,7 +142,7 @@ describe('OcrService Parameter Wiring (T066)', () => {
|
|||||||
const formData = postCallArgs[1];
|
const formData = postCallArgs[1];
|
||||||
expect(url).toBe('http://localhost:8765/ocr-upload');
|
expect(url).toBe('http://localhost:8765/ocr-upload');
|
||||||
expect(formData).toBeInstanceOf(FormData);
|
expect(formData).toBeInstanceOf(FormData);
|
||||||
expect(formData.get('engine')).toBe('typhoon-np-dms-ocr');
|
expect(formData.get('engine')).toBe('np-dms-ocr');
|
||||||
expect(formData.get('temperature')).toBe('0.15');
|
expect(formData.get('temperature')).toBe('0.15');
|
||||||
expect(formData.get('topP')).toBe('0.65');
|
expect(formData.get('topP')).toBe('0.65');
|
||||||
expect(formData.get('repeatPenalty')).toBe('1.15');
|
expect(formData.get('repeatPenalty')).toBe('1.15');
|
||||||
|
|||||||
@@ -0,0 +1,255 @@
|
|||||||
|
# OCR Sidecar: Refactor Plan by CLAUDE
|
||||||
|
**File:** `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py`
|
||||||
|
**Analysis date:** 2026-06-20
|
||||||
|
**Current GPU:** RTX 5060 Ti 16GB
|
||||||
|
**Plan file:** `ocr-sidecar-refactor-plan-cluade.md`
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary of Issues Found
|
||||||
|
|
||||||
|
| # | Issue | Severity | Category |
|
||||||
|
|---|-------|-----------|------|
|
||||||
|
| P1 | Hardcoded default API key in source code | 🔴 Critical | Security |
|
||||||
|
| P2 | `process_ocr` is a sync function that blocks the event loop | 🔴 Critical | Performance |
|
||||||
|
| P3 | God Service: bundles OCR + Embed + Rerank + Normalize together | 🔴 Critical | Architecture |
|
||||||
|
| P4 | Business logic lives in the sidecar instead of the backend | 🟡 Medium | Architecture |
|
||||||
|
| P5 | VRAM contention logic is outdated (designed for 8GB) | 🟡 Medium | Performance |
|
||||||
|
| P6 | `on_event("startup")` deprecated + blocking | 🟡 Medium | Code Quality |
|
||||||
|
| P7 | Duplicate `import tempfile` | 🟢 Low | Code Quality |
|
||||||
|
| P8 | JSON parse fallback has no warning log | 🟢 Low | Observability |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## VRAM Budget (RTX 5060 Ti 16GB)
|
||||||
|
|
||||||
|
```
|
||||||
|
np-dms-ocr (typhoon-ocr 3B) ~3–4 GB
|
||||||
|
np-dms-ai (llama3.2 3B) ~2–3 GB
|
||||||
|
BGE-M3 (BAAI/bge-m3) ~2 GB
|
||||||
|
Reranker (bge-reranker-large) ~1 GB
|
||||||
|
─────────────────────────────────────────
|
||||||
|
Estimated total ~8–10 GB ✅ fits in 16GB
|
||||||
|
```
|
||||||
|
|
||||||
|
**Impact:** All models can be loaded at the same time, so the VRAM Arbiter and `keep_alive: 0` are no longer needed.
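
The budget above can be checked with a few lines of arithmetic. The sketch below sums the per-model ranges from the table and compares the worst case against the 16 GB card. The dictionary layout and the `vram_budget` helper are illustrative assumptions, and the figures are this plan's estimates, not measurements.

```python
# Per-model VRAM estimates in GB as (low, high), copied from the budget above.
ESTIMATES_GB = {
    "np-dms-ocr": (3, 4),
    "np-dms-ai": (2, 3),
    "bge-m3": (2, 2),
    "bge-reranker-large": (1, 1),
}
GPU_TOTAL_GB = 16


def vram_budget(estimates):
    """Return the (low, high) total across all models (hypothetical helper)."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


low, high = vram_budget(ESTIMATES_GB)
headroom = GPU_TOTAL_GB - high
print(f"total {low}-{high} GB, worst-case headroom {headroom} GB")
```

The remaining headroom still has to cover context windows and CUDA runtime overhead, which the estimates do not itemize, so it is worth re-measuring on the actual host before removing the fallback paths.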
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What Should Move to the Backend (NestJS)
|
||||||
|
|
||||||
|
| What moves | Reason |
|
||||||
|
|------------|--------|
|
||||||
|
| API Key Authentication | The sidecar sits on the internal Docker network, so a nested auth layer is not needed |
|
||||||
|
| `systemPrompt` validation + length check | Business rule: the backend should define and validate it before sending |
|
||||||
|
| The entire `/normalize` endpoint | A pipeline step that the backend orchestrates itself |
|
||||||
|
| Engine selection + alias normalization | The backend should resolve the engine and send the correct name directly |
|
||||||
|
| Fast-path text extraction (auto engine) | Deciding whether OCR is needed at all is a backend business rule |
|
||||||
|
| Page range calculation | The backend already knows the document metadata |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Refactor Plan in 3 Phases
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 1 — Security & Critical Bugs
|
||||||
|
**Goal:** Fix the critical issues that immediately affect production
|
||||||
|
**Effort:** ~1 day
|
||||||
|
|
||||||
|
#### 1.1 Remove the Hardcoded Default API Key
|
||||||
|
```python
|
||||||
|
# ❌ Before
|
||||||
|
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY", "lcbp3-dms-ocr-sidecar-secure-token-2026")
|
||||||
|
|
||||||
|
# ✅ After
|
||||||
|
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY")
|
||||||
|
if not OCR_SIDECAR_API_KEY:
|
||||||
|
    raise RuntimeError("OCR_SIDECAR_API_KEY environment variable must be set")
|
||||||
|
```
|
||||||
|
> The key exposed in git history must also be rotated.
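
The fail-fast check above can be factored into a small helper so that every required secret is handled the same way. This is a minimal sketch, not the sidecar's actual code: `require_env` is a hypothetical name, and treating whitespace-only values as missing is an added assumption.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return a required environment variable or fail fast at startup."""
    # Assumption: a blank or whitespace-only value is as bad as a missing one.
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} environment variable must be set")
    return value
```

Calling `require_env("OCR_SIDECAR_API_KEY")` at import time keeps the behavior of the snippet above while leaving no default value in source control.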
|
||||||
|
|
||||||
|
#### 1.2 Make `process_ocr` Async
|
||||||
|
```python
|
||||||
|
# ❌ Before
|
||||||
|
def process_ocr(...) -> str:
|
||||||
|
    with httpx.Client(timeout=OCR_TIMEOUT) as client:
|
||||||
|
        response = client.post(...)
|
||||||
|
|
||||||
|
# ✅ After
|
||||||
|
async def process_ocr(...) -> str:
|
||||||
|
    async with httpx.AsyncClient(timeout=OCR_TIMEOUT) as client:
|
||||||
|
        response = await client.post(...)
|
||||||
|
```
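
Where a step cannot be rewritten as a coroutine (for example CPU-bound PDF rendering), `asyncio.to_thread`, which this commit's title also mentions, is the usual complement to `httpx.AsyncClient`. The stdlib-only sketch below uses `blocking_ocr_step` as a hypothetical stand-in for such a step and shows that the event loop keeps running while the blocking call executes.

```python
import asyncio
import time


def blocking_ocr_step(seconds: float) -> str:
    """Hypothetical stand-in for a blocking call (sync HTTP, PDF rendering)."""
    time.sleep(seconds)
    return "ocr text"


async def heartbeat(stop: asyncio.Event) -> int:
    """Count how often the event loop gets to run while work is pending."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple[str, int]:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    # Run the blocking function in a worker thread; the loop stays responsive.
    text = await asyncio.to_thread(blocking_ocr_step, 0.2)
    stop.set()
    return text, await beat


text, ticks = asyncio.run(main())
print(text, ticks)
```

If `blocking_ocr_step` were called directly inside `main`, the heartbeat would not tick at all until the call returned, which is the event-loop blocking described in P2.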
|
||||||
|
|
||||||
|
#### 1.3 Change `keep_alive` from 0 to a Suitable Value
|
||||||
|
```python
|
||||||
|
# ❌ Before: unloads immediately because VRAM was insufficient (8GB era)
|
||||||
|
"keep_alive": options_override.get("keep_alive", 0)
|
||||||
|
|
||||||
|
# ✅ After: keep the model loaded because 16GB is enough
|
||||||
|
"keep_alive": options_override.get("keep_alive", 300)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 2 — Performance & Code Quality
|
||||||
|
**Goal:** Remove legacy code designed for the 8GB GPU and improve startup
|
||||||
|
**Effort:** ~1 day
|
||||||
|
|
||||||
|
#### 2.1 Remove All VRAM Contention Logic
|
||||||
|
```python
|
||||||
|
# ❌ Remove all of this
|
||||||
|
from services.vram_monitor import get_vram_headroom
|
||||||
|
headroom = get_vram_headroom()
|
||||||
|
if not headroom.query_success:
|
||||||
|
    device = "cpu"
|
||||||
|
elif headroom.available_mb < threshold_mb:
|
||||||
|
    device = "cpu"
|
||||||
|
```
|
||||||
|
|
||||||
|
```python
|
||||||
|
# ✅ Replace with a fixed device
|
||||||
|
bge_model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)  # fp16 is now viable on 16GB
|
||||||
|
# device = "cuda" always; no dynamic selection needed
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 2.2 Switch Startup to `lifespan`
|
||||||
|
```python
|
||||||
|
# ❌ Before: deprecated
|
||||||
|
@app.on_event("startup")
|
||||||
|
def load_bge_models():
|
||||||
|
    bge_model = BGEM3FlagModel(...)
|
||||||
|
|
||||||
|
# ✅ After
|
||||||
|
import asyncio
from contextlib import asynccontextmanager
|
||||||
|
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app: FastAPI):
|
||||||
|
    await asyncio.to_thread(load_models)  # does not block the event loop
|
||||||
|
    yield
|
||||||
|
|
||||||
|
app = FastAPI(title="OCR Sidecar", version="3.0.0", lifespan=lifespan)
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 2.3 Fix the Duplicate Import and Add a JSON Parse Warning
|
||||||
|
```python
|
||||||
|
# Remove the duplicate import tempfile in /ocr-upload
|
||||||
|
|
||||||
|
# Add a warning log to the JSON parse fallback
|
||||||
|
try:
|
||||||
|
    result_text = json.loads(raw_text).get("natural_text", raw_text)
|
||||||
|
except (json.JSONDecodeError, AttributeError):
|
||||||
|
    logger.warning(f"[DIAG] Failed to parse JSON response, using raw text. Preview: {raw_text[:100]}")
|
||||||
|
    result_text = raw_text
|
||||||
|
```
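
The fallback above is easier to verify once it is wrapped in a function. A minimal self-contained sketch follows; `extract_natural_text` and the logger name are hypothetical, and the `natural_text` key is taken from the snippet above.

```python
import json
import logging

logger = logging.getLogger("ocr-sidecar")  # assumed logger name


def extract_natural_text(raw_text: str) -> str:
    """Return `natural_text` if raw_text is a JSON object, else raw_text itself."""
    try:
        return json.loads(raw_text).get("natural_text", raw_text)
    except (json.JSONDecodeError, AttributeError):
        # AttributeError covers valid JSON that is not an object (list, string, number).
        logger.warning(
            "[DIAG] Failed to parse JSON response, using raw text. Preview: %s",
            raw_text[:100],
        )
        return raw_text
```

Catching `AttributeError` alongside `json.JSONDecodeError` matters because a model can return well-formed JSON that is not an object, in which case `.get` does not exist on the parsed value.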
|
||||||
|
|
||||||
|
#### 2.4 Validate `pdf_path` Before Passing It to `process_ocr`
|
||||||
|
```python
|
||||||
|
# Add to _process_pdf_doc
|
||||||
|
resolved_path = pdf_path or (str(doc.name) if hasattr(doc, 'name') and doc.name else None)
|
||||||
|
if not resolved_path or resolved_path in ("", "<memory>"):
|
||||||
|
    raise ValueError("Invalid PDF path: a valid pdf_path must be provided")
|
||||||
|
```
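
The same check can be written as a standalone function so it can be unit-tested without opening a PDF. In this sketch `resolve_pdf_path` is a hypothetical helper, and `doc_name` stands in for the `doc.name` attribute used in the snippet above.

```python
from typing import Optional


def resolve_pdf_path(pdf_path: Optional[str], doc_name: Optional[str]) -> str:
    """Prefer the explicit pdf_path, else the document's own name; reject placeholders."""
    resolved = pdf_path or doc_name
    # "<memory>" is the placeholder value the original snippet rejects.
    if not resolved or resolved == "<memory>":
        raise ValueError("Invalid PDF path: a valid pdf_path must be provided")
    return resolved
```

Raising before `process_ocr` is called keeps an unusable path from reaching the model call, where the failure would be harder to diagnose.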
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Phase 3 — Architecture Separation
|
||||||
|
**Goal:** Separate concerns so the sidecar becomes a pure compute worker
|
||||||
|
**Effort:** ~2–3 days
|
||||||
|
|
||||||
|
#### 3.1 Move `/normalize` to the Backend
|
||||||
|
|
||||||
|
The backend calls PyThaiNLP directly, or a separate microservice is created:
|
||||||
|
```
|
||||||
|
n8n → POST /api/rag/normalize (NestJS) → PyThaiNLP → return normalized text
|
||||||
|
```
|
||||||
|
Remove the `/normalize` endpoint from the sidecar entirely.
|
||||||
|
|
||||||
|
#### 3.2 Move Authentication Out of the Sidecar
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# docker-compose: restrict the network instead of using an API key
|
||||||
|
services:
|
||||||
|
  ocr-sidecar:
|
||||||
|
    networks:
|
||||||
|
      - internal  # not exposed to any external network
|
||||||
|
    # no ports mapped to the host
|
||||||
|
```
|
||||||
|
|
||||||
|
The backend (NestJS) calls the sidecar over the internal network without sending an API key.
|
||||||
|
|
||||||
|
#### 3.3 The Sidecar Accepts Resolved Input Only
|
||||||
|
|
||||||
|
The backend does the pre-processing first, then sends the request:
|
||||||
|
|
||||||
|
```
|
||||||
|
Backend (NestJS)
|
||||||
|
├─ check whether the PDF has a text layer (fast-path decision)
|
||||||
|
├─ choose the engine to use (no "auto" in the sidecar)
|
||||||
|
├─ validate systemPrompt
|
||||||
|
├─ compute the page range
|
||||||
|
└─► POST /ocr { engine: "np-dms-ocr", pages: [1,2,3], systemPrompt: "..." }
|
||||||
|
```
|
||||||
|
|
||||||
|
The sidecar is left with a single job: **receive input → call the model → return the result**
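
On the sidecar side, "resolved input only" could be enforced by rejecting anything the backend was supposed to decide. The sketch below is one possible shape under stated assumptions: the field names follow the `POST /ocr` payload in the flow above, while `OcrRequest`, `parse_ocr_request`, `ALLOWED_ENGINES`, and the 1-based page convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumption: only concrete engine names reach the sidecar; "auto" is resolved upstream.
ALLOWED_ENGINES = {"np-dms-ocr"}


@dataclass(frozen=True)
class OcrRequest:
    engine: str
    pages: List[int]
    system_prompt: str


def parse_ocr_request(payload: dict) -> OcrRequest:
    """Validate a pre-resolved request and refuse to make backend decisions."""
    engine = payload.get("engine")
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unresolved or unknown engine: {engine!r}")
    pages = payload.get("pages")
    valid_pages = (
        isinstance(pages, list)
        and len(pages) > 0
        and all(isinstance(p, int) and p >= 1 for p in pages)
    )
    if not valid_pages:
        raise ValueError("pages must be a non-empty list of 1-based page numbers")
    return OcrRequest(engine, pages, payload.get("systemPrompt", ""))
```

With this shape, a request carrying `engine: "auto"` fails loudly at the boundary instead of silently triggering engine selection inside the sidecar.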
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Target Architecture After the Refactor
|
||||||
|
|
||||||
|
```
┌─────────────────────────────────────────────┐
│              Backend (NestJS)               │
│                                             │
│  - Fast-path text extraction decision       │
│  - Engine selection & validation            │
│  - systemPrompt validation                  │
│  - Page range calculation                   │
│  - Thai text normalization (PyThaiNLP)      │
│  - Auth & rate limiting                     │
└────────────────────┬────────────────────────┘
                     │ internal Docker network
                     ▼
┌─────────────────────────────────────────────┐
│         OCR Sidecar (compute only)          │
│                                             │
│  POST /ocr         ← PDF path + page list   │
│  POST /ocr-upload  ← multipart file         │
│  POST /embed       ← normalized text        │
│  POST /rerank      ← query + chunks         │
│  GET  /health                               │
│                                             │
│  Models (always loaded, CUDA):              │
│  - np-dms-ocr via Ollama (keep_alive=300)   │
│  - BGE-M3 fp16                              │
│  - BGE-Reranker-Large fp16                  │
└─────────────────────────────────────────────┘
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Checklist Summary
|
||||||
|
|
||||||
|
### Phase 1 (Critical, do first)
- [ ] Remove the hardcoded default API key and rotate the key in secrets
- [ ] Convert `process_ocr` to async with `httpx.AsyncClient`
- [ ] Change the `keep_alive` default from 0 to 300
|
||||||
|
|
||||||
|
### Phase 2 (Performance)
- [ ] Remove all VRAM contention logic (`get_vram_headroom`, dynamic device)
- [ ] Change `use_fp16=False` to `use_fp16=True` for the BGE models
- [ ] Replace `on_event("startup")` with `lifespan` + `asyncio.to_thread`
- [ ] Remove the duplicate `import tempfile`
- [ ] Add a log warning in the JSON parse fallback
- [ ] Validate `pdf_path` before passing it to `process_ocr`
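
The `lifespan` + `asyncio.to_thread` item can be illustrated with a stdlib-only sketch. It assumes a blocking loader, here a hypothetical `load_models_blocking()` that is not a name from the sidecar, and shows the loader running in a worker thread so the event loop is not blocked during startup. With FastAPI, the same context manager would be passed as `FastAPI(lifespan=lifespan)`.

```python
import asyncio
import time
from contextlib import asynccontextmanager

# Module-level registry filled at startup (illustrative, not the sidecar's real state).
MODELS: dict = {}


def load_models_blocking() -> dict:
    """Hypothetical stand-in for a slow, blocking model loader."""
    time.sleep(0.05)  # simulates reading weights from disk
    return {"embedder": "bge-m3", "reranker": "bge-reranker"}


@asynccontextmanager
async def lifespan(app):
    # Run the blocking loader in a worker thread so the event loop stays responsive.
    MODELS.update(await asyncio.to_thread(load_models_blocking))
    try:
        yield  # the application serves requests while suspended here
    finally:
        MODELS.clear()  # drop references on shutdown


async def loaded_model_names() -> list:
    # Drives the context manager the way an ASGI server would at startup/shutdown.
    async with lifespan(app=None):
        return sorted(MODELS)
```

Compared with `on_event("startup")`, the single context manager keeps the startup and shutdown steps next to each other, which makes it harder to forget the cleanup half.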
|
||||||
|
|
||||||
|
### Phase 3 (Architecture)
- [ ] Move `/normalize` to the backend
- [ ] Move engine selection + alias normalization to the backend
- [ ] Move the fast-path decision to the backend
- [ ] Restrict the sidecar network to internal-only instead of API key auth
- [ ] Remove `/normalize` and the auth middleware from the sidecar
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*This document was produced from a code review on 2026-06-20; update it whenever the architecture changes.*
|
||||||
@@ -0,0 +1,259 @@
|
|||||||
|
# 📋 OCR Sidecar: Refactor Plan by QWEN
|
||||||
|
|
||||||
|
**Project:** NAP-DMS (OCR Sidecar Modernization)
|
||||||
|
**Target Hardware:** NVIDIA RTX 5060 Ti 16GB
|
||||||
|
**Date:** 2026-06-20
|
||||||
|
**Owner:** Document Intelligence Engine / Senior Full Stack Developer
|
||||||
|
**Status:** 🟡 Planning Phase
|
||||||
|
**File:** `ocr-sidecar-refactor-plan-qwen.md`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 1. Executive Summary
|
||||||
|
|
||||||
|
This refactor aims to turn the **OCR Sidecar** from a "Fat Worker that carries business logic and hardware decisions" into a **"Pure Dumb Worker"** focused solely on AI inference. Orchestration, security gatekeeping, and VRAM management move back to the **NestJS Backend**, which controls them through a **Global Mutex + Task Queue**.
|
||||||
|
|
||||||
|
### 🎯 Key Objectives
|
||||||
|
1. ✅ **Security:** Close the path traversal vulnerability in the `/ocr` endpoint
2. ✅ **Architecture:** Separate business logic (DMS tags, noise filtering, pagination) from the inference layer
3. ✅ **Performance:** Cut latency by 2-5 seconds per request by no longer moving models between RAM and VRAM
4. ✅ **Stability:** Prevent OOM crashes on the RTX 5060 Ti 16GB with a backend-controlled mutex
5. ✅ **Scalability:** The sidecar accepts requests as "1 page = 1 request" to support horizontal scaling
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🏗️ 2. Architecture Comparison
|
||||||
|
|
||||||
|
### 🔴 Current Architecture (Anti-Pattern)
|
||||||
|
```
┌─────────────────┐         ┌─────────────────────────────────────┐
│   NestJS API    │ ──────► │     Python Sidecar (Fat Worker)     │
│                 │         │  ┌───────────────────────────────┐  │
│ (Minimal Logic) │         │  │ ❌ Path Validation            │  │
│                 │         │  │ ❌ DMS Tag Injection          │  │
│                 │         │  │ ❌ Noise Filtering            │  │
│                 │         │  │ ❌ Page Loop Orchestration    │  │
│                 │         │  │ ❌ VRAM Decision (per req)    │  │
│                 │         │  │ ❌ Model .to('cuda'/'cpu')    │  │
│                 │         │  └───────────────────────────────┘  │
└─────────────────┘         └─────────────────────────────────────┘
```
|
||||||
|
|
||||||
|
### 🟢 Target Architecture (Best Practice)
|
||||||
|
```
┌─────────────────────────────────────────────────────────────────┐
│                  NestJS Backend (Orchestrator)                  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ Path Guard  │  │Prompt Builder│  │ VRAM Mutex (Global)    │  │
│  │ (Canonical) │  │ (DMS Tags)   │  │ ──► Sequential GPU Ops │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ PDF Splitter│  │ Noise Filter │  │ BullMQ Task Queue      │  │
│  │ (Per Page)  │  │ (Regex)      │  │ ──► Concurrency Ctrl   │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ HTTP (1 page = 1 request)
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                Python Sidecar (Pure Dumb Worker)                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ ✅ PDF → Image (PyMuPDF)                                  │  │
│  │ ✅ Ollama /v1/chat/completions call                       │  │
│  │ ✅ BGE-M3 Embedding (Fixed on GPU at startup)             │  │
│  │ ✅ BGE-Reranker (Fixed on GPU at startup)                 │  │
│  │ ✅ Thai NLP Normalize (PyThaiNLP)                         │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📊 3. VRAM Budget Analysis (RTX 5060 Ti 16GB)
|
||||||
|
|
||||||
|
| Component | Model | VRAM Usage | Status |
|-----------|-------|------------|--------|
| **BGE-M3 + Reranker** | `BAAI/bge-m3` + `bge-reranker-large` | **~4.5 GB** | 🔒 **Resident** (Load once at startup, stay on GPU) |
| **np-dms-ocr** (VLM 3B) | Q4_K_M quantized | **~5.0 GB** | 🔄 **Ephemeral** (Loaded on-demand, `keep_alive=0`) |
| **np-dms-ai** (LLM 7B-8B) | Q4_K_M quantized | **~6.0 GB** | 🔄 **Ephemeral** (Loaded on-demand, `keep_alive=10m`) |
| **CUDA Context + OS** | System overhead | **~1.5 GB** | 🔒 **Fixed** |
| **Total Peak** | — | **~10.5 GB** | ✅ **Safe** (Headroom ~5.5 GB) |
|
||||||
|
|
||||||
|
### ⚠️ Critical Rule

**Never** load `np-dms-ocr` and `np-dms-ai` at the same time (5 + 6 = 11 GB + BGE 4.5 GB = 15.5 GB → OOM risk).

**Fix:** The NestJS Backend must use a **Mutex** to force strictly sequential execution.
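
The arithmetic behind this rule can be checked with a short sketch. The figures are the approximate values from the VRAM table above; the 16 GB capacity is the card's nominal size. The table's totals appear to leave out the ~1.5 GB overhead row, so the helper makes that choice explicit through an `include_overhead` flag, which is an illustrative parameter and not part of the plan.

```python
GPU_CAPACITY_GB = 16.0   # nominal capacity of the RTX 5060 Ti 16GB
BGE_STACK_GB = 4.5       # BGE-M3 + reranker, resident
OCR_GB = 5.0             # np-dms-ocr, ephemeral
LLM_GB = 6.0             # np-dms-ai, ephemeral
OVERHEAD_GB = 1.5        # CUDA context + OS


def peak_gb(*ephemeral_gb: float, include_overhead: bool = False) -> float:
    """Resident BGE stack plus whichever ephemeral models are loaded together."""
    total = BGE_STACK_GB + sum(ephemeral_gb)
    return total + OVERHEAD_GB if include_overhead else total


# Sequential: only the larger ephemeral model is ever resident with BGE.
sequential = peak_gb(max(OCR_GB, LLM_GB))                       # 10.5
# Concurrent: both ephemeral models loaded at once.
concurrent = peak_gb(OCR_GB, LLM_GB)                            # 15.5
concurrent_with_overhead = peak_gb(OCR_GB, LLM_GB, include_overhead=True)  # 17.0

fits_when_concurrent = concurrent_with_overhead <= GPU_CAPACITY_GB  # False
```

Under these assumptions the concurrent case already sits within half a gigabyte of the card's capacity before overhead is counted, and exceeds it once the overhead row is added.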
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📝 4. Task Breakdown
|
||||||
|
|
||||||
|
### 🔴 Phase 1: Security & Critical Fixes (Priority: CRITICAL)
|
||||||
|
**Scope:** `app.py` only; must be completed before deploy
|
||||||
|
|
||||||
|
| # | Task | File | Status |
|---|------|------|--------|
| 1.1 | Fix **Path Traversal** in `/ocr` with path canonicalization | `app.py` | ⬜ |
| 1.2 | Fix the **Mutable Default Argument** (`options_override: dict = {}`) | `app.py` | ⬜ |
| 1.3 | Remove the redundant `import tempfile` in `ocr_upload` | `app.py` | ⬜ |
| 1.4 | Replace `@app.on_event("startup")` with a `lifespan` context manager | `app.py` | ⬜ |
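
Task 1.2 refers to a common Python pitfall: a default such as `options_override: dict = {}` is created once, when the function is defined, and then shared by every call. The sketch below reproduces the problem and one conventional fix. The function names and the `temperature` default are hypothetical; only the `options_override` parameter comes from the plan.

```python
from typing import Optional


def build_options_buggy(options_override: dict = {}) -> dict:
    # BUG: the default dict is a single object shared across all calls.
    options_override.setdefault("temperature", 0.1)
    return options_override


def build_options(options_override: Optional[dict] = None) -> dict:
    # Fix: use None as the sentinel and copy, so no state leaks between calls
    # and the caller's dict is never mutated.
    options = dict(options_override or {})
    options.setdefault("temperature", 0.1)
    return options
```

With the buggy version, a key written into the result of one call shows up in the next call that relies on the default. The fixed version returns a fresh dict each time.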
|
||||||
|
|
||||||
|
**Acceptance Criteria:**
- [ ] Sending `pdfPath: "../../../../etc/passwd"` must return HTTP 403
- [ ] Pytest passes 100% for the Security Test Suite
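
A minimal sketch of the path canonicalization check, assuming uploads live under a single base directory. `UPLOAD_BASE`, `PathNotAllowed`, and `resolve_pdf_path` are hypothetical names chosen for illustration; the real endpoint would translate the exception into the HTTP 403 required above.

```python
from pathlib import Path

UPLOAD_BASE = Path("/data/uploads")  # assumed base directory, not taken from the plan


class PathNotAllowed(Exception):
    """Raised when a requested path escapes the base; map this to HTTP 403."""


def resolve_pdf_path(pdf_path: str, base: Path = UPLOAD_BASE) -> Path:
    base_resolved = base.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base_resolved / pdf_path).resolve()
    # An absolute pdf_path replaces the base when joined, so it is rejected here too.
    if not candidate.is_relative_to(base_resolved):
        raise PathNotAllowed(pdf_path)
    return candidate
```

Comparing resolved paths rather than raw strings is the point of the technique: a prefix check on the unresolved string would accept `uploads/../../etc/passwd`. `Path.is_relative_to` requires Python 3.9 or later.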
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🟠 Phase 2: Move Business Logic to Backend (Priority: HIGH)
|
||||||
|
**Scope:** NestJS Backend + `app.py` simplification
|
||||||
|
|
||||||
|
| # | Task | File | Status |
|---|------|------|--------|
| 2.1 | Create `PromptBuilderService` in NestJS (injects DMS tags) | `backend/src/ocr/prompt-builder.service.ts` | ⬜ |
| 2.2 | Create `PdfSplitterService` (splits a PDF into N pages) | `backend/src/ocr/pdf-splitter.service.ts` | ⬜ |
| 2.3 | Create `OcrNoiseFilterService` (regex-based cleanup) | `backend/src/ocr/noise-filter.service.ts` | ⬜ |
| 2.4 | Create `OcrOrchestratorService` (loop + concurrent calls) | `backend/src/ocr/orchestrator.service.ts` | ⬜ |
| 2.5 | **Remove** DMS tag injection from `process_ocr()` | `app.py` | ⬜ |
| 2.6 | **Remove** `filter_ocr_noise()` from the sidecar | `app.py` | ⬜ |
| 2.7 | **Remove** the page loop from `_process_pdf_doc()` (accept a single page_num) | `app.py` | ⬜ |
|
||||||
|
|
||||||
|
**Acceptance Criteria:**
- [ ] The sidecar accepts requests as "1 page = 1 request" only
- [ ] The backend can build prompts dynamically (supports new metadata fields)
- [ ] 5 pages can be OCR'd concurrently through BullMQ
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🟡 Phase 3: VRAM & GPU Management (Priority: HIGH)
|
||||||
|
**Scope:** `app.py` + NestJS Mutex
|
||||||
|
|
||||||
|
| # | Task | File | Status |
|---|------|------|--------|
| 3.1 | **Remove** `bge_model.model.to("cuda"/"cpu")` from `/embed`, `/rerank` | `app.py` | ⬜ |
| 3.2 | **Change** `load_bge_models()` to call `.to("cuda")` once at startup | `app.py` | ⬜ |
| 3.3 | **Remove** the `get_vram_headroom()` decision logic (keep logging only) | `app.py` | ⬜ |
| 3.4 | Create `VramMutexService` in NestJS (global async lock) | `backend/src/gpu/vram-mutex.service.ts` | ⬜ |
| 3.5 | Create `GpuTaskQueue` (BullMQ) for OCR/Chat/Rerank | `backend/src/gpu/gpu-queue.service.ts` | ⬜ |
| 3.6 | Set `keep_alive=0` for `np-dms-ocr` in the Ollama config | `docker-compose.yml` | ⬜ |
| 3.7 | Set `keep_alive=10m` for `np-dms-ai` in the Ollama config | `docker-compose.yml` | ⬜ |
|
||||||
|
|
||||||
|
**Acceptance Criteria:**
- [ ] BGE-M3 is loaded onto the GPU once, at container start
- [ ] No OOM crash even when OCR and Chat alternate for 100 rounds
- [ ] `/embed` latency drops from ~3s to ~0.3s per request
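
The plan places the global lock in NestJS (`VramMutexService`). The Python sketch below only illustrates the locking pattern itself, using `asyncio.Lock`, and is not the planned implementation. `run_gpu_task` and `alternate_ocr_and_chat` are hypothetical names, and the sleep stands in for a real OCR or chat call.

```python
import asyncio


async def run_gpu_task(lock: asyncio.Lock, stats: dict, name: str, seconds: float) -> str:
    # Only one task may hold the lock, so at most one model call runs at a time.
    async with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(seconds)  # stands in for the OCR or chat inference call
        stats["active"] -= 1
    return name


async def alternate_ocr_and_chat(rounds: int) -> dict:
    lock = asyncio.Lock()  # created inside the running loop
    stats = {"active": 0, "peak": 0, "done": 0}
    names = ["np-dms-ocr", "np-dms-ai"] * rounds
    # All tasks are submitted at once; the lock serializes them.
    results = await asyncio.gather(
        *(run_gpu_task(lock, stats, name, 0.001) for name in names)
    )
    stats["done"] = len(results)
    return stats
```

Recording the peak number of concurrent holders gives a cheap assertion for a stress test: it should stay at 1 no matter how many tasks are queued.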
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 🟢 Phase 4: Sidecar Simplification (Priority: MEDIUM)
|
||||||
|
**Scope:** `app.py` cleanup
|
||||||
|
|
||||||
|
| # | Task | File | Status |
|---|------|------|--------|
| 4.1 | Replace `httpx.Client` with a global `httpx.AsyncClient` (connection pool) | `app.py` | ⬜ |
| 4.2 | Convert all endpoints to `async def` | `app.py` | ⬜ |
| 4.3 | Change `process_ocr()` to `async def process_ocr()` | `app.py` | ⬜ |
| 4.4 | Add OpenTelemetry tracing (span per request) | `app.py` | ⬜ |
| 4.5 | Add Prometheus metrics (`ocr_requests_total`, `inference_duration_seconds`) | `app.py` | ⬜ |
|
||||||
|
|
||||||
|
**Acceptance Criteria:**
- [ ] The sidecar handles 50 concurrent requests without timing out
- [ ] A Grafana dashboard shows p95/p99 latency
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📦 5. File Changes Summary
|
||||||
|
|
||||||
|
### 🗑️ Files to DELETE / Simplify
|
||||||
|
| File | Action | Reason |
|------|--------|--------|
| `app.py::filter_ocr_noise()` | **Delete** | Moved to NestJS |
| `app.py::DMS tag injection` | **Delete** | Moved to the NestJS PromptBuilder |
| `app.py::Page loop` | **Delete** | Moved to the NestJS Orchestrator |
| `app.py::VRAM decision` | **Delete** | Moved to the NestJS Mutex |
| `services/vram_monitor.py` | **Delete** | No longer needed |
|
||||||
|
|
||||||
|
### 🆕 Files to CREATE (NestJS)
|
||||||
|
| File | Purpose |
|------|---------|
| `backend/src/ocr/prompt-builder.service.ts` | Builds prompts and injects DMS tags |
| `backend/src/ocr/pdf-splitter.service.ts` | Splits a PDF into one buffer per page |
| `backend/src/ocr/noise-filter.service.ts` | Regex-based text cleanup |
| `backend/src/ocr/orchestrator.service.ts` | Manages the page loop and concurrency |
| `backend/src/gpu/vram-mutex.service.ts` | Global async lock for GPU ops |
| `backend/src/gpu/gpu-queue.service.ts` | BullMQ queue for GPU tasks |
|
||||||
|
|
||||||
|
### ✏️ Files to MODIFY
|
||||||
|
| File | Changes |
|------|---------|
| `app.py` | Simplify to Pure Worker (~50% reduction) |
| `docker-compose.yml` | Add the Ollama `keep_alive` config |
| `Modelfile` | Sync options with the sidecar payload |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ✅ 6. Definition of Done (DoD)
|
||||||
|
|
||||||
|
### 🔒 Security
|
||||||
|
- [ ] Path traversal tests pass 100%
- [ ] API key validation covers every endpoint
- [ ] `systemPrompt` length validation works correctly
|
||||||
|
|
||||||
|
### ⚡ Performance
|
||||||
|
- [ ] `/embed` latency < 500ms (p95)
- [ ] `/rerank` latency < 800ms (p95)
- [ ] OCR per page < 30s (including cold start)
- [ ] OCR of 5 concurrent pages completes within 60s
|
||||||
|
|
||||||
|
### 🛡️ Stability
|
||||||
|
- [ ] No OOM crash during a 24-hour stress test
- [ ] The sidecar auto-recovers from Ollama timeouts
- [ ] VRAM usage stays constant (no memory leak)
|
||||||
|
|
||||||
|
### 📊 Observability
|
||||||
|
- [ ] Structured logging (JSON) on every endpoint
- [ ] Prometheus metrics exposed at `/metrics`
- [ ] Grafana dashboard ready for use
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ⚠️ 7. Risks & Mitigations
|
||||||
|
|
||||||
|
| Risk | Impact | Mitigation |
|------|--------|------------|
| **OOM crash** when two LLMs are loaded at once | 🔴 Critical | NestJS Mutex forces sequential execution + Ollama `keep_alive=0` |
| **Path Traversal** in `/ocr` | 🔴 Critical | Canonicalization + Base Path Whitelist |
| **Slow BGE-M3 load** at startup | 🟡 Medium | Pre-download the model in the Dockerfile (no runtime download) |
| **Ollama cold start** (~65s) | 🟡 Medium | Call a warm-up endpoint at container start |
| **VRAM fragmentation** from `.to()` calls | 🟡 Medium | **Remove** all `.to()` calls (Phase 3) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚀 8. Rollout Plan
|
||||||
|
|
||||||
|
```
Week 1: Phase 1 (Security) ────────────────► Deploy to Staging
Week 2: Phase 3 (VRAM) ────────────────────► Load Test 24h
Week 3: Phase 2 (Move Logic to Backend) ───► Integration Test
Week 4: Phase 4 (Simplification) ──────────► Production Release
```
|
||||||
|
|
||||||
|
### 🔄 Rollback Strategy
|
||||||
|
- Every phase must ship behind a **feature flag** that can be switched on or off
- The old sidecar stays deployed in parallel for 2 weeks after release
- The NestJS Backend can fall back to the old sidecar through the env var `OCR_SIDECAR_VERSION=v1|v2`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📚 9. References
|
||||||
|
|
||||||
|
- [ADR-023A] OCR Engine Selection (revised 2026-06-11)
- [ADR-033] Engine Switching Strategy
- [ADR-034] np-dms-ocr as Canonical Engine
- [ADR-036] Model Naming Convention
- [T015], [T025], [T026-T028]: technical specs from the Change Log
- [NAP-DMS Spec 04-00] Infrastructure & OPS
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Prepared by:** Document Intelligence Engine
|
||||||
|
**Reviewed by:** _Pending_
|
||||||
|
**Approved by:** _Pending_
|
||||||
@@ -75,9 +75,9 @@ function normalizeLoadedModels(value: unknown): VramLoadedModelView[] {
|
|||||||
if (typeof item === 'string') {
|
if (typeof item === 'string') {
|
||||||
const name = item.toLowerCase();
|
const name = item.toLowerCase();
|
||||||
let normName = item;
|
let normName = item;
|
||||||
if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) {
|
if (name.includes(OCR_MODEL_NAME)) {
|
||||||
normName = OCR_MODEL_NAME;
|
normName = OCR_MODEL_NAME;
|
||||||
} else if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) {
|
} else if (name.includes(MAIN_MODEL_NAME)) {
|
||||||
normName = MAIN_MODEL_NAME;
|
normName = MAIN_MODEL_NAME;
|
||||||
}
|
}
|
||||||
return {
|
return {
|
||||||
@@ -95,9 +95,9 @@ function normalizeLoadedModels(value: unknown): VramLoadedModelView[] {
|
|||||||
const rawName = model.modelName ?? model.name ?? `model-${index + 1}`;
|
const rawName = model.modelName ?? model.name ?? `model-${index + 1}`;
|
||||||
const name = rawName.toLowerCase();
|
const name = rawName.toLowerCase();
|
||||||
let normName = rawName;
|
let normName = rawName;
|
||||||
if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) {
|
if (name.includes(OCR_MODEL_NAME)) {
|
||||||
normName = OCR_MODEL_NAME;
|
normName = OCR_MODEL_NAME;
|
||||||
} else if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) {
|
} else if (name.includes(MAIN_MODEL_NAME)) {
|
||||||
normName = MAIN_MODEL_NAME;
|
normName = MAIN_MODEL_NAME;
|
||||||
}
|
}
|
||||||
return {
|
return {
|
||||||
@@ -115,8 +115,8 @@ function normalizeLoadedModels(value: unknown): VramLoadedModelView[] {
|
|||||||
|
|
||||||
function toCanonicalModel(rawName: string): string {
|
function toCanonicalModel(rawName: string): string {
|
||||||
const name = rawName.toLowerCase();
|
const name = rawName.toLowerCase();
|
||||||
if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) return OCR_MODEL_NAME;
|
if (name.includes(OCR_MODEL_NAME)) return OCR_MODEL_NAME;
|
||||||
if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
|
if (name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
|
||||||
return rawName;
|
return rawName;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -193,8 +193,8 @@ export default function AiAdminConsolePage() {
|
|||||||
new Set(
|
new Set(
|
||||||
rawHealthOllamaModels.map((m) => {
|
rawHealthOllamaModels.map((m) => {
|
||||||
const name = m.toLowerCase();
|
const name = m.toLowerCase();
|
||||||
if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) return OCR_MODEL_NAME;
|
if (name.includes(OCR_MODEL_NAME)) return OCR_MODEL_NAME;
|
||||||
if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
|
if (name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
|
||||||
return m;
|
return m;
|
||||||
})
|
})
|
||||||
)
|
)
|
||||||
|
|||||||
@@ -76,7 +76,7 @@ export default function OcrEngineSelector() {
|
|||||||
</CardHeader>
|
</CardHeader>
|
||||||
<CardContent className="space-y-4">
|
<CardContent className="space-y-4">
|
||||||
{engines.map((engine) => {
|
{engines.map((engine) => {
|
||||||
const isTyphoon = engine.engineType === 'typhoon_ocr';
|
const isAiPowered = engine.engineType === 'np_dms_ocr';
|
||||||
return (
|
return (
|
||||||
<div
|
<div
|
||||||
key={engine.engineId}
|
key={engine.engineId}
|
||||||
@@ -95,14 +95,14 @@ export default function OcrEngineSelector() {
|
|||||||
กำลังใช้งาน
|
กำลังใช้งาน
|
||||||
</Badge>
|
</Badge>
|
||||||
)}
|
)}
|
||||||
{isTyphoon && (
|
{isAiPowered && (
|
||||||
<Badge variant="secondary" className="text-[10px] h-4 bg-purple-500/10 text-purple-600 dark:text-purple-400 border-purple-500/20">
|
<Badge variant="secondary" className="text-[10px] h-4 bg-purple-500/10 text-purple-600 dark:text-purple-400 border-purple-500/20">
|
||||||
AI Powered
|
AI Powered
|
||||||
</Badge>
|
</Badge>
|
||||||
)}
|
)}
|
||||||
</div>
|
</div>
|
||||||
<p className="text-xs text-muted-foreground leading-relaxed">
|
<p className="text-xs text-muted-foreground leading-relaxed">
|
||||||
{isTyphoon
|
{isAiPowered
|
||||||
? 'สกัดภาษาไทยความแม่นยำสูง (95%+) เหมาะสำหรับภาษาไทยผสมอังกฤษ'
|
? 'สกัดภาษาไทยความแม่นยำสูง (95%+) เหมาะสำหรับภาษาไทยผสมอังกฤษ'
|
||||||
: 'เอนจินมาตรฐานเบสไลน์ ประมวลผลรวดเร็วและใช้ทรัพยากรต่ำ'}
|
: 'เอนจินมาตรฐานเบสไลน์ ประมวลผลรวดเร็วและใช้ทรัพยากรต่ำ'}
|
||||||
</p>
|
</p>
|
||||||
@@ -111,7 +111,7 @@ export default function OcrEngineSelector() {
|
|||||||
<Server className="h-3 w-3" />
|
<Server className="h-3 w-3" />
|
||||||
จำกัดพร้อมกัน: {engine.concurrentLimit} งาน
|
จำกัดพร้อมกัน: {engine.concurrentLimit} งาน
|
||||||
</span>
|
</span>
|
||||||
{isTyphoon && (
|
{isAiPowered && (
|
||||||
<>
|
<>
|
||||||
<span className="flex items-center gap-1 text-purple-600 dark:text-purple-400">
|
<span className="flex items-center gap-1 text-purple-600 dark:text-purple-400">
|
||||||
<Cpu className="h-3 w-3" />
|
<Cpu className="h-3 w-3" />
|
||||||
|
|||||||
@@ -133,9 +133,9 @@ export default function OcrSandboxPromptManager() {
|
|||||||
// 2-step flow states
|
// 2-step flow states
|
||||||
const [sandboxStep, setSandboxStep] = useState<'ocr' | 'ai'>('ocr');
|
const [sandboxStep, setSandboxStep] = useState<'ocr' | 'ai'>('ocr');
|
||||||
const [selectedOcrEngine, setSelectedOcrEngine] = useState<string>('auto');
|
const [selectedOcrEngine, setSelectedOcrEngine] = useState<string>('auto');
|
||||||
const [typhoonTemperature, setTyphoonTemperature] = useState<number>(0.1);
|
const [ocrTemperature, setOcrTemperature] = useState<number>(0.1);
|
||||||
const [typhoonTopP, setTyphoonTopP] = useState<number>(0.1);
|
const [ocrTopP, setOcrTopP] = useState<number>(0.1);
|
||||||
const [typhoonRepeatPenalty, setTyphoonRepeatPenalty] = useState<number>(1.1);
|
const [ocrRepeatPenalty, setOcrRepeatPenalty] = useState<number>(1.1);
|
||||||
const { data: ocrEnginesData } = useQuery<OcrEngineResponse[]>({
|
const { data: ocrEnginesData } = useQuery<OcrEngineResponse[]>({
|
||||||
queryKey: ['ocr-engines'],
|
queryKey: ['ocr-engines'],
|
||||||
queryFn: () => adminAiService.getOcrEngines(),
|
queryFn: () => adminAiService.getOcrEngines(),
|
||||||
@@ -250,9 +250,9 @@ export default function OcrSandboxPromptManager() {
|
|||||||
if (!ocrEnginesData) return base;
|
if (!ocrEnginesData) return base;
|
||||||
const mapped = ocrEnginesData.map((e: OcrEngineResponse) => {
|
const mapped = ocrEnginesData.map((e: OcrEngineResponse) => {
|
||||||
const value =
|
const value =
|
||||||
e.engineType === 'tesseract'
|
e.engineType === 'fast_path'
|
||||||
? 'tesseract'
|
? 'auto'
|
||||||
: e.engineType === 'typhoon_ocr'
|
: e.engineType === 'np_dms_ocr'
|
||||||
? 'np-dms-ocr'
|
? 'np-dms-ocr'
|
||||||
: e.engineType;
|
: e.engineType;
|
||||||
const vramLabel =
|
const vramLabel =
|
||||||
@@ -354,13 +354,13 @@ export default function OcrSandboxPromptManager() {
|
|||||||
try {
|
try {
|
||||||
resetSandbox();
|
resetSandbox();
|
||||||
setSandboxStep('ocr');
|
setSandboxStep('ocr');
|
||||||
const typhoonOptions = selectedOcrEngine === 'np-dms-ocr'
|
const ocrOptions = selectedOcrEngine === 'np-dms-ocr'
|
||||||
? { temperature: typhoonTemperature, topP: typhoonTopP, repeatPenalty: typhoonRepeatPenalty }
|
? { temperature: ocrTemperature, topP: ocrTopP, repeatPenalty: ocrRepeatPenalty }
|
||||||
: undefined;
|
: undefined;
|
||||||
const { requestPublicId } = await adminAiService.submitSandboxOcr(
|
const { requestPublicId } = await adminAiService.submitSandboxOcr(
|
||||||
ocrFile,
|
ocrFile,
|
||||||
selectedOcrEngine,
|
selectedOcrEngine,
|
||||||
typhoonOptions
|
ocrOptions
|
||||||
);
|
);
|
||||||
toast.success(t('ai.prompt.uploadSuccess'));
|
toast.success(t('ai.prompt.uploadSuccess'));
|
||||||
// Poll สำหรับผลลัพธ์ OCR
|
// Poll สำหรับผลลัพธ์ OCR
|
||||||
@@ -429,9 +429,9 @@ export default function OcrSandboxPromptManager() {
|
|||||||
setOcrResult(null);
|
setOcrResult(null);
|
||||||
setSelectedPromptVersion(undefined);
|
setSelectedPromptVersion(undefined);
|
||||||
setSelectedOcrEngine('auto');
|
setSelectedOcrEngine('auto');
|
||||||
setTyphoonTemperature(0.1);
|
setOcrTemperature(0.1);
|
||||||
setTyphoonTopP(0.1);
|
setOcrTopP(0.1);
|
||||||
setTyphoonRepeatPenalty(1.1);
|
setOcrRepeatPenalty(1.1);
|
||||||
setOcrFile(null);
|
setOcrFile(null);
|
||||||
setSelectedProjectPublicId('');
|
setSelectedProjectPublicId('');
|
||||||
setSelectedContractPublicId('');
|
setSelectedContractPublicId('');
|
||||||
@@ -677,37 +677,37 @@ export default function OcrSandboxPromptManager() {
|
|||||||
</div>
|
</div>
|
||||||
{selectedOcrEngine === 'np-dms-ocr' && (
|
{selectedOcrEngine === 'np-dms-ocr' && (
|
||||||
<div className="space-y-3 rounded-md border border-dashed border-amber-500/30 bg-amber-500/5 p-3">
|
<div className="space-y-3 rounded-md border border-dashed border-amber-500/30 bg-amber-500/5 p-3">
|
||||||
<p className="text-xs font-medium text-amber-600 dark:text-amber-400">Typhoon OCR Options <span className="font-normal text-muted-foreground">(override Modelfile defaults)</span></p>
|
<p className="text-xs font-medium text-amber-600 dark:text-amber-400">OCR Options <span className="font-normal text-muted-foreground">(override Modelfile defaults)</span></p>
|
||||||
<div className="space-y-1">
|
<div className="space-y-1">
|
||||||
<div className="flex justify-between text-xs">
|
<div className="flex justify-between text-xs">
|
||||||
<label>Temperature</label>
|
<label>Temperature</label>
|
||||||
<span className="font-mono text-muted-foreground">{typhoonTemperature.toFixed(2)}</span>
|
<span className="font-mono text-muted-foreground">{ocrTemperature.toFixed(2)}</span>
|
||||||
</div>
|
</div>
|
||||||
<input type="range" min={0} max={1} step={0.01}
|
<input type="range" min={0} max={1} step={0.01}
|
||||||
value={typhoonTemperature}
|
value={ocrTemperature}
|
||||||
onChange={(e) => setTyphoonTemperature(parseFloat(e.target.value))}
|
onChange={(e) => setOcrTemperature(parseFloat(e.target.value))}
|
||||||
className="w-full h-1.5 accent-amber-500"
|
className="w-full h-1.5 accent-amber-500"
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
<div className="space-y-1">
|
<div className="space-y-1">
|
||||||
<div className="flex justify-between text-xs">
|
<div className="flex justify-between text-xs">
|
||||||
<label>Top-P</label>
|
<label>Top-P</label>
|
||||||
<span className="font-mono text-muted-foreground">{typhoonTopP.toFixed(2)}</span>
|
<span className="font-mono text-muted-foreground">{ocrTopP.toFixed(2)}</span>
|
||||||
</div>
|
</div>
|
||||||
<input type="range" min={0} max={1} step={0.01}
|
<input type="range" min={0} max={1} step={0.01}
|
||||||
value={typhoonTopP}
|
value={ocrTopP}
|
||||||
onChange={(e) => setTyphoonTopP(parseFloat(e.target.value))}
|
onChange={(e) => setOcrTopP(parseFloat(e.target.value))}
|
||||||
className="w-full h-1.5 accent-amber-500"
|
className="w-full h-1.5 accent-amber-500"
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
<div className="space-y-1">
|
<div className="space-y-1">
|
||||||
<div className="flex justify-between text-xs">
|
<div className="flex justify-between text-xs">
|
||||||
<label>Repeat Penalty</label>
|
<label>Repeat Penalty</label>
|
||||||
<span className="font-mono text-muted-foreground">{typhoonRepeatPenalty.toFixed(2)}</span>
|
<span className="font-mono text-muted-foreground">{ocrRepeatPenalty.toFixed(2)}</span>
|
||||||
</div>
|
</div>
|
||||||
<input type="range" min={1} max={2} step={0.01}
|
<input type="range" min={1} max={2} step={0.01}
|
||||||
value={typhoonRepeatPenalty}
|
value={ocrRepeatPenalty}
|
||||||
onChange={(e) => setTyphoonRepeatPenalty(parseFloat(e.target.value))}
|
onChange={(e) => setOcrRepeatPenalty(parseFloat(e.target.value))}
|
||||||
className="w-full h-1.5 accent-amber-500"
|
className="w-full h-1.5 accent-amber-500"
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
@@ -864,14 +864,14 @@ export default function OcrSandboxPromptManager() {
|
|||||||
{ocrResult.engineUsed === 'np-dms-ocr'
|
{ocrResult.engineUsed === 'np-dms-ocr'
|
||||||
? 'np-dms-ocr'
|
? 'np-dms-ocr'
|
||||||
: ocrResult.ocrUsed
|
: ocrResult.ocrUsed
|
||||||
? 'Tesseract'
|
? 'Fast Path (OCR)'
|
||||||
: 'Fast Path (Text Layer)'}
|
: 'Fast Path (Text Layer)'}
|
||||||
</Badge>
|
</Badge>
|
||||||
</CardHeader>
|
</CardHeader>
|
||||||
<CardContent className="pt-4">
|
<CardContent className="pt-4">
|
||||||
{ocrResult.fallbackUsed && (
|
{ocrResult.fallbackUsed && (
|
||||||
<div className="mb-3 rounded-md border border-amber-500/20 bg-amber-500/5 px-3 py-2 text-xs text-amber-600 dark:text-amber-400">
|
<div className="mb-3 rounded-md border border-amber-500/20 bg-amber-500/5 px-3 py-2 text-xs text-amber-600 dark:text-amber-400">
|
||||||
np-dms-ocr unavailable. Fallback to Tesseract was used for this run.
|
np-dms-ocr unavailable. Fallback to Fast Path was used for this run.
|
||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
<div className="relative rounded-md bg-muted p-4 font-mono text-xs overflow-auto max-h-[200px] border border-border/10">
|
<div className="relative rounded-md bg-muted p-4 font-mono text-xs overflow-auto max-h-[200px] border border-border/10">
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
// File: frontend/components/admin/ai/SandboxTestArea.tsx
|
// File: frontend/components/admin/ai/SandboxTestArea.tsx
|
||||||
// Change Log:
|
// Change Log:
|
||||||
// - 2026-06-15: Created SandboxTestArea component with UI elements for 3-step sandbox testing (T038)
|
// - 2026-06-15: Created SandboxTestArea component with UI elements for 3-step sandbox testing (T038)
|
||||||
// - 2026-06-17: ลบ Tesseract ออกจาก OCR Engine dropdown ตาม ADR-035 (ใช้ Typhoon OCR ผ่าน Ollama)
|
// - 2026-06-17: ลบ Tesseract ออกจาก OCR Engine dropdown ตาม ADR-035 (ใช้ np-dms-ocr ผ่าน Ollama)
|
||||||
|
|
||||||
import React, { useState } from 'react';
|
import React, { useState } from 'react';
|
||||||
import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card';
|
import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card';
|
||||||
@@ -254,8 +254,8 @@ export default function SandboxTestArea({
|
|||||||
<SelectValue placeholder="เลือกเอนจิน..." />
|
<SelectValue placeholder="เลือกเอนจิน..." />
|
||||||
</SelectTrigger>
|
</SelectTrigger>
|
||||||
<SelectContent>
|
<SelectContent>
|
||||||
<SelectItem value="auto" className="text-xs">Auto (Fast Path / Typhoon OCR)</SelectItem>
|
<SelectItem value="auto" className="text-xs">Auto (Fast Path / np-dms-ocr)</SelectItem>
|
||||||
<SelectItem value="np-dms-ocr" className="text-xs">Typhoon OCR (AI Vision)</SelectItem>
|
<SelectItem value="np-dms-ocr" className="text-xs">np-dms-ocr (AI Vision)</SelectItem>
|
||||||
</SelectContent>
|
</SelectContent>
|
||||||
</Select>
|
</Select>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@@ -28,16 +28,16 @@ vi.mock('sonner', () => ({
|
|||||||
const mockEngines = [
|
const mockEngines = [
|
||||||
{
|
{
|
||||||
engineId: 'engine-1',
|
engineId: 'engine-1',
|
||||||
engineName: 'Tesseract OCR',
|
engineName: 'Fast Path (PyMuPDF)',
|
||||||
engineType: 'tesseract',
|
engineType: 'fast_path',
|
||||||
isCurrentActive: true,
|
isCurrentActive: true,
|
||||||
concurrentLimit: 4,
|
concurrentLimit: 10,
|
||||||
vramRequirementMB: 0,
|
vramRequirementMB: 0,
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
engineId: 'engine-2',
|
engineId: 'engine-2',
|
||||||
engineName: 'Typhoon OCR',
|
engineName: 'np-dms-ocr',
|
||||||
engineType: 'typhoon_ocr',
|
engineType: 'np_dms_ocr',
|
||||||
isCurrentActive: false,
|
isCurrentActive: false,
|
||||||
concurrentLimit: 1,
|
concurrentLimit: 1,
|
||||||
vramRequirementMB: 4096,
|
vramRequirementMB: 4096,
|
||||||
@@ -52,7 +52,7 @@ describe('OcrEngineSelector', () => {
|
|||||||
it('renders loading state initially', () => {
|
it('renders loading state initially', () => {
|
||||||
// Return a promise that doesn't resolve immediately to keep it in loading state
|
// Return a promise that doesn't resolve immediately to keep it in loading state
|
||||||
(adminAiService.getOcrEngines as any).mockReturnValue(new Promise(() => {}));
|
(adminAiService.getOcrEngines as any).mockReturnValue(new Promise(() => {}));
|
||||||
|
|
||||||
const { container } = render(<OcrEngineSelector />);
|
const { container } = render(<OcrEngineSelector />);
|
||||||
// Card with animate-pulse
|
// Card with animate-pulse
|
||||||
expect(container.querySelector('.animate-pulse')).toBeInTheDocument();
|
expect(container.querySelector('.animate-pulse')).toBeInTheDocument();
|
||||||
@@ -60,24 +60,24 @@ describe('OcrEngineSelector', () => {
|
|||||||
|
|
||||||
it('renders engines list successfully after loading', async () => {
|
it('renders engines list successfully after loading', async () => {
|
||||||
(adminAiService.getOcrEngines as any).mockResolvedValue(mockEngines);
|
(adminAiService.getOcrEngines as any).mockResolvedValue(mockEngines);
|
||||||
|
|
||||||
render(<OcrEngineSelector />);
|
render(<OcrEngineSelector />);
|
||||||
|
|
||||||
await waitFor(() => {
|
await waitFor(() => {
|
||||||
expect(screen.getByText('ระบบจัดการ OCR Engine')).toBeInTheDocument();
|
expect(screen.getByText('ระบบจัดการ OCR Engine')).toBeInTheDocument();
|
||||||
});
|
});
|
||||||
|
|
||||||
expect(screen.getByText('Tesseract OCR')).toBeInTheDocument();
|
expect(screen.getByText('Fast Path (PyMuPDF)')).toBeInTheDocument();
|
||||||
expect(screen.getByText('Typhoon OCR')).toBeInTheDocument();
|
expect(screen.getByText('np-dms-ocr')).toBeInTheDocument();
|
||||||
expect(screen.getByText('กำลังใช้งาน')).toBeInTheDocument(); // Badge for active engine
|
expect(screen.getByText('กำลังใช้งาน')).toBeInTheDocument(); // Badge for active engine
|
||||||
expect(screen.getByText('AI Powered')).toBeInTheDocument(); // Badge for typhoon
|
expect(screen.getByText('AI Powered')).toBeInTheDocument(); // Badge for np-dms-ocr
|
||||||
});
|
});
|
||||||
|
|
||||||
it('calls selectOcrEngine and shows success toast when changing engine', async () => {
|
it('calls selectOcrEngine and shows success toast when changing engine', async () => {
|
||||||
const user = userEvent.setup();
|
const user = userEvent.setup();
|
||||||
(adminAiService.getOcrEngines as any).mockResolvedValue(mockEngines);
|
(adminAiService.getOcrEngines as any).mockResolvedValue(mockEngines);
|
||||||
(adminAiService.selectOcrEngine as any).mockResolvedValue({});
|
(adminAiService.selectOcrEngine as any).mockResolvedValue({});
|
||||||
|
|
||||||
render(<OcrEngineSelector />);
|
render(<OcrEngineSelector />);
|
||||||
|
|
||||||
await waitFor(() => {
|
await waitFor(() => {
|
||||||
@@ -86,21 +86,21 @@ describe('OcrEngineSelector', () => {
|
|||||||
|
|
||||||
// The active engine will have "เลือกอยู่แล้ว", the inactive will have "สลับใช้งาน"
|
// The active engine will have "เลือกอยู่แล้ว", the inactive will have "สลับใช้งาน"
|
||||||
const switchButton = screen.getByRole('button', { name: /สลับใช้งาน/i });
|
const switchButton = screen.getByRole('button', { name: /สลับใช้งาน/i });
|
||||||
|
|
||||||
await act(async () => {
|
await act(async () => {
|
||||||
await user.click(switchButton);
|
await user.click(switchButton);
|
||||||
});
|
});
|
||||||
|
|
||||||
expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('engine-2');
|
expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('engine-2');
|
||||||
expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น Typhoon OCR สำเร็จ');
|
expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น np-dms-ocr สำเร็จ');
|
||||||
|
|
||||||
// It should fetch engines again
|
// It should fetch engines again
|
||||||
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2);
|
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2);
|
||||||
});
|
});
|
||||||
|
|
||||||
it('shows error toast if fetching fails', async () => {
|
it('shows error toast if fetching fails', async () => {
|
||||||
(adminAiService.getOcrEngines as any).mockRejectedValue(new Error('Network error'));
|
(adminAiService.getOcrEngines as any).mockRejectedValue(new Error('Network error'));
|
||||||
|
|
||||||
render(<OcrEngineSelector />);
|
render(<OcrEngineSelector />);
|
||||||
|
|
||||||
await waitFor(() => {
|
await waitFor(() => {
|
||||||
@@ -112,7 +112,7 @@ describe('OcrEngineSelector', () => {
|
|||||||
const user = userEvent.setup();
|
const user = userEvent.setup();
|
||||||
(adminAiService.getOcrEngines as any).mockResolvedValue(mockEngines);
|
(adminAiService.getOcrEngines as any).mockResolvedValue(mockEngines);
|
||||||
(adminAiService.selectOcrEngine as any).mockRejectedValue(new Error('Select error'));
|
(adminAiService.selectOcrEngine as any).mockRejectedValue(new Error('Select error'));
|
||||||
|
|
||||||
render(<OcrEngineSelector />);
|
render(<OcrEngineSelector />);
|
||||||
|
|
||||||
await waitFor(() => {
|
await waitFor(() => {
|
||||||
@@ -120,7 +120,7 @@ describe('OcrEngineSelector', () => {
|
|||||||
});
|
});
|
||||||
|
|
||||||
const switchButton = screen.getByRole('button', { name: /สลับใช้งาน/i });
|
const switchButton = screen.getByRole('button', { name: /สลับใช้งาน/i });
|
||||||
|
|
||||||
await act(async () => {
|
await act(async () => {
|
||||||
await user.click(switchButton);
|
await user.click(switchButton);
|
||||||
});
|
});
|
||||||
|
|||||||
@@ -18,17 +18,17 @@ vi.mock('@/lib/services/admin-ai.service', () => ({
|
|||||||
|
|
||||||
const engines: OcrEngineResponse[] = [
|
const engines: OcrEngineResponse[] = [
|
||||||
{
|
{
|
||||||
engineId: 'tesseract',
|
engineId: 'fast-path',
|
||||||
engineName: 'Tesseract OCR',
|
engineName: 'Fast Path (PyMuPDF)',
|
||||||
engineType: 'tesseract',
|
engineType: 'fast_path',
|
||||||
isCurrentActive: true,
|
isCurrentActive: true,
|
||||||
concurrentLimit: 4,
|
concurrentLimit: 10,
|
||||||
vramRequirementMB: 0,
|
vramRequirementMB: 0,
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
engineId: 'typhoon',
|
engineId: 'np-dms-ocr',
|
||||||
engineName: 'Typhoon OCR',
|
engineName: 'np-dms-ocr',
|
||||||
engineType: 'typhoon_ocr',
|
engineType: 'np_dms_ocr',
|
||||||
isCurrentActive: false,
|
isCurrentActive: false,
|
||||||
concurrentLimit: 1,
|
concurrentLimit: 1,
|
||||||
vramRequirementMB: 6144,
|
vramRequirementMB: 6144,
|
||||||
@@ -44,8 +44,8 @@ describe('OcrEngineSelector', () => {
|
|||||||
|
|
||||||
it('renders OCR engine data from admin service', async () => {
|
it('renders OCR engine data from admin service', async () => {
|
||||||
render(<OcrEngineSelector />);
|
render(<OcrEngineSelector />);
|
||||||
expect(await screen.findByText('Tesseract OCR')).toBeInTheDocument();
|
expect(await screen.findByText('Fast Path (PyMuPDF)')).toBeInTheDocument();
|
||||||
expect(screen.getByText('Typhoon OCR')).toBeInTheDocument();
|
expect(screen.getByText('np-dms-ocr')).toBeInTheDocument();
|
||||||
expect(screen.getByText('AI Powered')).toBeInTheDocument();
|
expect(screen.getByText('AI Powered')).toBeInTheDocument();
|
||||||
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(1);
|
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(1);
|
||||||
});
|
});
|
||||||
@@ -55,9 +55,9 @@ describe('OcrEngineSelector', () => {
|
|||||||
render(<OcrEngineSelector />);
|
render(<OcrEngineSelector />);
|
||||||
await user.click(await screen.findByRole('button', { name: 'สลับใช้งาน' }));
|
await user.click(await screen.findByRole('button', { name: 'สลับใช้งาน' }));
|
||||||
await waitFor(() => {
|
await waitFor(() => {
|
||||||
expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('typhoon');
|
expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('np-dms-ocr');
|
||||||
});
|
});
|
||||||
expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น Typhoon OCR สำเร็จ');
|
expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น np-dms-ocr สำเร็จ');
|
||||||
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2);
|
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2);
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
@@ -100,7 +100,7 @@ vi.mock('@/lib/services/admin-ai.service', () => ({
|
|||||||
adminAiService: {
|
adminAiService: {
|
||||||
getOcrEngines: vi.fn().mockResolvedValue([
|
getOcrEngines: vi.fn().mockResolvedValue([
|
||||||
{
|
{
|
||||||
engineType: 'typhoon_ocr',
|
engineType: 'np_dms_ocr',
|
||||||
engineName: 'np-dms-ocr',
|
engineName: 'np-dms-ocr',
|
||||||
vramRequirementMB: 4096,
|
vramRequirementMB: 4096,
|
||||||
isCurrentActive: true,
|
isCurrentActive: true,
|
||||||
|
|||||||
@@ -281,19 +281,19 @@ export const adminAiService = {
|
|||||||
submitSandboxOcr: async (
|
submitSandboxOcr: async (
|
||||||
file: File,
|
file: File,
|
||||||
engineType: string = 'auto',
|
engineType: string = 'auto',
|
||||||
typhoonOptions?: { temperature?: number; topP?: number; repeatPenalty?: number }
|
ocrOptions?: { temperature?: number; topP?: number; repeatPenalty?: number }
|
||||||
): Promise<{ requestPublicId: string; jobId: string; status: string }> => {
|
): Promise<{ requestPublicId: string; jobId: string; status: string }> => {
|
||||||
const formData = new FormData();
|
const formData = new FormData();
|
||||||
formData.append('file', file);
|
formData.append('file', file);
|
||||||
formData.append('engineType', engineType);
|
formData.append('engineType', engineType);
|
||||||
if (typhoonOptions?.temperature !== undefined) {
|
if (ocrOptions?.temperature !== undefined) {
|
||||||
formData.append('temperature', String(typhoonOptions.temperature));
|
formData.append('temperature', String(ocrOptions.temperature));
|
||||||
}
|
}
|
||||||
if (typhoonOptions?.topP !== undefined) {
|
if (ocrOptions?.topP !== undefined) {
|
||||||
formData.append('topP', String(typhoonOptions.topP));
|
formData.append('topP', String(ocrOptions.topP));
|
||||||
}
|
}
|
||||||
if (typhoonOptions?.repeatPenalty !== undefined) {
|
if (ocrOptions?.repeatPenalty !== undefined) {
|
||||||
formData.append('repeatPenalty', String(typhoonOptions.repeatPenalty));
|
formData.append('repeatPenalty', String(ocrOptions.repeatPenalty));
|
||||||
}
|
}
|
||||||
const { data } = await api.post('/ai/admin/sandbox/ocr', formData, {
|
const { data } = await api.post('/ai/admin/sandbox/ocr', formData, {
|
||||||
headers: {
|
headers: {
|
||||||
|
|||||||
@@ -26,42 +26,50 @@
|
|||||||
|
|
||||||
> การตัดสินใจเหล่านี้ **ไม่สามารถเปลี่ยนแปลงได้** โดยไม่ได้รับ Explicit Approval
|
> การตัดสินใจเหล่านี้ **ไม่สามารถเปลี่ยนแปลงได้** โดยไม่ได้รับ Explicit Approval
|
||||||
|
|
||||||
| ID | Decision | ADR |
|
| ID | Decision | ADR |
|
||||||
| --- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
|
| --- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------ |
|
||||||
| D1 | n8n = Migration Phase orchestrator เท่านั้น — ห้ามทำ New Correspondence pipeline ผ่าน n8n | ADR-023A |
|
| D1 | n8n = Migration Phase orchestrator เท่านั้น — ห้ามทำ New Correspondence pipeline ผ่าน n8n | ADR-023A |
|
||||||
| D2 | New Correspondence → BullMQ `ai-realtime` queue โดยตรง (ไม่ผ่าน n8n) | ADR-023A |
|
| D2 | New Correspondence → BullMQ `ai-realtime` queue โดยตรง (ไม่ผ่าน n8n) | ADR-023A |
|
||||||
| D3 | n8n ต้อง call `POST /api/ai/jobs` (DMS Backend) เท่านั้น — ห้าม call Ollama/Qdrant โดยตรง | ADR-023A |
|
| D3 | n8n ต้อง call `POST /api/ai/jobs` (DMS Backend) เท่านั้น — ห้าม call Ollama/Qdrant โดยตรง | ADR-023A |
|
||||||
| D4 | Excel metadata ส่งไปพร้อม AI job เป็น context (docNumber, title, sender ฯลฯ) | Session 2 |
|
| D4 | Excel metadata ส่งไปพร้อม AI job เป็น context (docNumber, title, sender ฯลฯ) | Session 2 |
|
||||||
| D5 | Tag suggestion ใช้ทาง C: แนะนำ existing tags + สร้างใหม่ได้ถ้าไม่มี (`isNew: true` flag) | Session 2 |
|
| D5 | Tag suggestion ใช้ทาง C: แนะนำ existing tags + สร้างใหม่ได้ถ้าไม่มี (`isNew: true` flag) | Session 2 |
|
||||||
| D6 | Editable Review Form: AI pre-fill → user approve/edit → submit (human-in-the-loop ทุกครั้ง) | ADR-023 |
|
| D6 | Editable Review Form: AI pre-fill → user approve/edit → submit (human-in-the-loop ทุกครั้ง) | ADR-023 |
|
||||||
| D7 | UUID Strategy: `publicId` (UUIDv7) เท่านั้นสำหรับ Public API — INT PK ต้อง `@Exclude()` | ADR-019 |
|
| D7 | UUID Strategy: `publicId` (UUIDv7) เท่านั้นสำหรับ Public API — INT PK ต้อง `@Exclude()` | ADR-019 |
|
||||||
| D8 | Schema changes: แก้ SQL โดยตรง + เพิ่ม `deltas/*.sql` — ห้ามใช้ TypeORM migration files | ADR-009 |
|
| D8 | Schema changes: แก้ SQL โดยตรง + เพิ่ม `deltas/*.sql` — ห้ามใช้ TypeORM migration files | ADR-009 |
|
||||||
| D9 | Qdrant search ต้องส่ง `projectPublicId` เป็น mandatory parameter ทุกครั้ง (compile-time) | ADR-023A |
|
| D9 | Qdrant search ต้องส่ง `projectPublicId` เป็น mandatory parameter ทุกครั้ง (compile-time) | ADR-023A |
|
||||||
| D10 | AI model stack: `np-dms-ai:latest` (Main LLM) + `np-dms-ocr:latest` (OCR, keep_alive:0) + `BGE-M3` (Dense 1024 + Sparse Embedding) + `BGE-Reranker-Large` (Reranker) on Admin Desktop — `nomic-embed-text` ถูกแทนที่แล้ว (ADR-034/035) | ADR-034/035 |
|
| D10 | AI model stack: `np-dms-ai:latest` (Main LLM) + `np-dms-ocr:latest` (OCR, keep_alive:0) + `BGE-M3` (Dense 1024 + Sparse Embedding) + `BGE-Reranker-Large` (Reranker) on Admin Desktop — `nomic-embed-text` ถูกแทนที่แล้ว (ADR-034/035) | ADR-034/035 |
|
||||||
| D11 | RAG Embedding trigger: `syncStatus()` → `enqueueRagPrepare()` เมื่อ status ≠ DRAFT; jobId = `rag-prepare:{documentPublicId}:{revisionNumber}` (BullMQ dedup); delete-before-upsert ทุกครั้ง | ADR-035 |
|
| D11 | RAG Embedding trigger: `syncStatus()` → `enqueueRagPrepare()` เมื่อ status ≠ DRAFT; jobId = `rag-prepare:{documentPublicId}:{revisionNumber}` (BullMQ dedup); delete-before-upsert ทุกครั้ง | ADR-035 |
|
||||||
| D12 | Qdrant collection `lcbp3_vectors` = Hybrid schema: `bge_dense` (1024 dims, Cosine) + `bge_sparse` (SPLADE); payload indexes: `project_public_id` (tenant), `doc_public_id`, `status_code`, `doc_type` | ADR-035 |
|
| D12 | Qdrant collection `lcbp3_vectors` = Hybrid schema: `bge_dense` (1024 dims, Cosine) + `bge_sparse` (SPLADE); payload indexes: `project_public_id` (tenant), `doc_public_id`, `status_code`, `doc_type` | ADR-035 |
|
||||||
| D13 | **Analysis Phase required** — ต้องอ่าน `docker-compose*.yml`, `deploy.sh`, `main.ts` ก่อนแนะนำ URL/Port/Path — ห้ามเดา | AGENTS.md |
|
| D13 | **Analysis Phase required** — ต้องอ่าน `docker-compose*.yml`, `deploy.sh`, `main.ts` ก่อนแนะนำ URL/Port/Path — ห้ามเดา | AGENTS.md |
|
||||||
| D14 | Sandbox-Production Parity: บันทึก draft ใน `ai_sandbox_profiles` และปรับใช้ไป production `ai_execution_profiles` ผ่าน apply API (Idempotency-Key + CASL guard); sandbox pipeline ดึง project/contract ID จริงเพื่อ parity prompt context | ADR-036 |
|
| D14 | Sandbox-Production Parity: บันทึก draft ใน `ai_sandbox_profiles` และปรับใช้ไป production `ai_execution_profiles` ผ่าน apply API (Idempotency-Key + CASL guard); sandbox pipeline ดึง project/contract ID จริงเพื่อ parity prompt context | ADR-036 |
|
||||||
| D15 | SandboxTabs ต้องโหลด active prompts ทั้ง ocr_system และ ocr_extraction จาก service เพื่อแสดง prompt info ทั้ง 2 steps ตาม FR-009, FR-010 (Feature-238) | Feature-238 |
|
| D15 | SandboxTabs ต้องโหลด active prompts ทั้ง ocr_system และ ocr_extraction จาก service เพื่อแสดง prompt info ทั้ง 2 steps ตาม FR-009, FR-010 (Feature-238) | Feature-238 |
|
||||||
| D16 | Backend VRAM service ต้องส่ง loadedModels พร้อม vramUsageMB (bytes → MB) เพื่อให้ frontend แสดงผล VRAM usage ของแต่ละ model ได้ถูกต้อง | Session 2026-06-18 |
|
| D16 | Backend VRAM service ต้องส่ง loadedModels พร้อม vramUsageMB (bytes → MB) เพื่อให้ frontend แสดงผล VRAM usage ของแต่ละ model ได้ถูกต้อง | Session 2026-06-18 |
|
||||||
| D17 | สถานะพับ/คลี่ของการ์ดและเซกชันในหน้า AI Admin Console จะเก็บลงใน localStorage เพื่อรักษาสถานะ และการพับไม่มีผลต่อ background query polling | Feature-240 |
|
| D17 | สถานะพับ/คลี่ของการ์ดและเซกชันในหน้า AI Admin Console จะเก็บลงใน localStorage เพื่อรักษาสถานะ และการพับไม่มีผลต่อ background query polling | Feature-240 |
|
||||||
| D18 | Deploy script ต้องตรวจสอบ ClamAV health status ก่อน recreation — ถ้า healthy ให้ recreate เฉพาะ backend/frontend (skip 5-minute healthcheck delay) | Session 2026-06-19 |
|
| D18 | Deploy script ต้องตรวจสอบ ClamAV health status ก่อน recreation — ถ้า healthy ให้ recreate เฉพาะ backend/frontend (skip 5-minute healthcheck delay) | Session 2026-06-19 |
|
||||||
| D19 | CI timeout ต้องอย่างน้อย 30 minutes เพื่อรองรับ ClamAV startup กรณีต้อง recreate full stack | Session 2026-06-19 |
|
| D19 | CI timeout ต้องอย่างน้อย 30 minutes เพื่อรองรับ ClamAV startup กรณีต้อง recreate full stack | Session 2026-06-19 |
|
||||||
| D20 | AI Admin frontend services ต้อง normalize API response envelope ที่อาจซ้อน `data` ก่อน render; VRAM `totalVRAMMB = 0` คือ unknown capacity ไม่ใช่ OOM Guard | Session 2026-06-19 |
|
| D20 | AI Admin frontend services ต้อง normalize API response envelope ที่อาจซ้อน `data` ก่อน render; VRAM `totalVRAMMB = 0` คือ unknown capacity ไม่ใช่ OOM Guard | Session 2026-06-19 |
|
||||||
|
| D21 | OCR Sidecar = Pure Compute Worker: orchestration and parameters stay in the existing backend services (PromptBuilderService, OcrNoiseFilterService, and OcrOrchestratorService are rejected) | ADR-040 D1 |
|
||||||
|
| D22 | Wire `calculate_ocr_residency()` into `process_ocr`: `keep_alive` is a lazy resource parameter (ADR-036 Gap-2) and must not be a fixed value | ADR-040 D3 |
|
||||||
|
| D23 | Retain vram_monitor + CPU fallback for `/embed` and `/rerank`: do not force BGE + Reranker to stay GPU-resident; respect LLM-First GPU Ownership + CPU Fallback Retrieval | ADR-040 D4 |
|
||||||
|
| D24 | Remove X-API-Key from sidecar: auth = network isolation (supersedes ADR-033 §7); sequencing: remove only after the ADR-041 cutover (single Docker host) | ADR-040 D5 |
|
||||||
|
| D25 | Server Consolidation: co-locate all services on a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB), retire Desk-5439 | ADR-041 D1 |
|
||||||
|
| D26 | ASUSTOR (192.168.10.9) = Primary NAS (CIFS share np-dms-as), QNAP = backup server only | ADR-041 D2 |
|
||||||
|
| D27 | Docker-internal network only for sidecar/Ollama — enables ADR-040 D5 network-only auth, QNAP backend → new host consolidation | ADR-041 D3 |
|
||||||
|
| D28 | Canonical naming enforced: `np-dms-ai` (LLM), `np-dms-ocr` (OCR), `fast-path` (PyMuPDF); `typhoon-llm`, `tesseract`, and `Typhoon OCR` are removed from the code; `OCR_SIDECAR_API_KEY` is mandatory (no default); the backend does not send `keep_alive` (the sidecar computes it) | ADR-040/034 |
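
The CPU-fallback rule in D23 can be illustrated with a small sketch. This is not the sidecar's actual code: the function name `pick_retrieval_device` is hypothetical, the default threshold is borrowed from the README's `VRAM_HEADROOM_THRESHOLD_MB` default, and the direction of the comparison (fall back to CPU when headroom is below the threshold) is an assumption.

```python
def pick_retrieval_device(headroom_mb: float, threshold_mb: float = 3000.0) -> str:
    """Choose where /embed and /rerank should run (illustrative only).

    Assumption: when free VRAM headroom is below the threshold, retrieval
    yields the GPU to the LLM and runs on CPU instead.
    """
    return "cuda" if headroom_mb >= threshold_mb else "cpu"


if __name__ == "__main__":
    # Hypothetical headroom readings in MB.
    for reading in (8192.0, 3000.0, 512.0):
        print(reading, "->", pick_retrieval_device(reading))
```

Keeping the decision in one pure function makes the fallback easy to unit test without a GPU present.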
|
||||||
|
|
||||||
## Environment & Services
|
## Environment & Services
|
||||||
|
|
||||||
| Service | Local URL / Port | Production | Notes |
|
| Service | Local URL / Port | Production | Notes |
|
||||||
| ---------------- | ----------------------------- | --------------------------------- | ------------------------------------------------------------------------------------------ |
|
| ---------------- | ----------------------------- | --------------------------------- | -------------------------------------------------------------------------------------------------- |
|
||||||
| **Backend API** | `http://localhost:3001` | `https://backend.np-dms.work/api` | NestJS — port 3000 in container, exposed via Nginx Proxy Manager |
|
| **Backend API** | `http://localhost:3001` | `https://backend.np-dms.work/api` | NestJS — port 3000 in container, exposed via Nginx Proxy Manager |
|
||||||
| **Frontend** | `http://localhost:3000` | QNAP `192.168.10.8` | Next.js |
|
| **Frontend** | `http://localhost:3000` | QNAP `192.168.10.8` | Next.js |
|
||||||
| **MariaDB** | `localhost:3307` | QNAP internal | DB: `lcbp3`, root via docker |
|
| **MariaDB** | `localhost:3307` | QNAP internal | DB: `lcbp3`, root via docker |
|
||||||
| **Redis** | `localhost:6379` | QNAP internal | BullMQ + session store |
|
| **Redis** | `localhost:6379` | QNAP internal | BullMQ + session store |
|
||||||
| **Ollama** | `http://192.168.10.100:11434` | Admin Desktop (Desk-5439) | typhoon2.5-np-dms:latest (main) + typhoon-np-dms-ocr:latest (OCR, keep_alive:0) |
|
| **Ollama** | `http://192.168.10.100:11434` | Admin Desktop (Desk-5439) | typhoon2.5-np-dms:latest (main) + typhoon-np-dms-ocr:latest (OCR, keep_alive:0) |
|
||||||
| **Qdrant** | `http://localhost:6333` | Admin Desktop (Desk-5439) | Vector DB — requires projectPublicId |
|
| **Qdrant** | `http://localhost:6333` | Admin Desktop (Desk-5439) | Vector DB — requires projectPublicId |
|
||||||
| **OCR Sidecar** | `http://192.168.10.100:8765` | Admin Desktop (Desk-5439) | Tesseract (fallback) / Typhoon OCR-3B (primary) + BGE-M3 `/embed` + BGE-Reranker `/rerank` |
|
| **OCR Sidecar** | `http://192.168.10.100:8765` | Admin Desktop (Desk-5439) | np-dms-ocr (Ollama) + BGE-M3 `/embed` + BGE-Reranker `/rerank`; async I/O, lifespan, no /normalize |
|
||||||
| **Gitea** | `https://git.np-dms.work` | QNAP `192.168.10.8` | Source + CI/CD |
|
| **Gitea** | `https://git.np-dms.work` | QNAP `192.168.10.8` | Source + CI/CD |
|
||||||
| **Gitea Runner** | ASUSTOR `192.168.10.9` | — | CI runner |
|
| **Gitea Runner** | ASUSTOR `192.168.10.9` | — | CI runner |
|
||||||
|
|
||||||
### Key Environment Variables
|
### Key Environment Variables
|
||||||
|
|
||||||
@@ -75,6 +83,36 @@ QDRANT_URL
|
|||||||
|
|
||||||
## Next Session Focus
|
## Next Session Focus
|
||||||
|
|
||||||
|
### OCR Backend Cleanup (Session 2026-06-20) ✅ COMPLETE
|
||||||
|
|
||||||
|
- [x] **P1-1:** Remove `keep_alive` from backend form data
|
||||||
|
- [x] **P1-2:** Remove hardcoded API key defaults (ocr.service.ts + sandbox-ocr-engine.service.ts)
|
||||||
|
- [x] **P2-1:** Align env var `OCR_SIDECAR_API_KEY` in `.env.example`
|
||||||
|
- [x] **P2-2:** Fix OCR URL + remove `THAI_PREPROCESS_URL` in `.env.example`
|
||||||
|
- [x] **P2-5:** Bump Dockerfile to `python:3.11-slim`
|
||||||
|
- [x] **P3-1/P3-2:** Wrap sync VRAM calls in `asyncio.to_thread()`
|
||||||
|
- [x] **Rename typhoon-llm → np-dms-ai:** create `np-dms-ai.processor.ts`, delete `typhoon-llm.processor.ts`, update `ai.module.ts`
|
||||||
|
- [x] **Tesseract cleanup:** enum, entity, controller, service, audit log, tests
|
||||||
|
- [x] **User renamed:** `typhoon-ocr.processor.ts` → `np-dms-ocr-processor.ts`
|
||||||
|
- [x] **Rename TyphoonOcr → NpDmsOcr:** `TyphoonOcrProcessor` → `NpDmsOcrProcessor`, `QUEUE_TYPHOON_OCR` → `QUEUE_NP_DMS_OCR`, `OcrTyphoonOptions` → `OcrNpDmsOptions`, `typhoonOptions` → `ocrOptions` (backend 7 files + 3 tests)
|
||||||
|
- [x] **Frontend cleanup:** `isTyphoon` → `isAiPowered`, state vars `typhoon*` → `ocr*`, Tesseract mocks → Fast Path, dead `typhoon_ocr` checks removed, `page.tsx` model name constants
|
||||||
|
- [ ] **Verify:** run `tsc --noEmit` after all renames are done (backend + frontend)
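
Item P3-1/P3-2 above moves blocking VRAM probes off the event loop. A minimal sketch of that pattern follows; `read_vram_headroom_mb` is a hypothetical stand-in for a synchronous probe (the real one lives in `services/vram_monitor.py`, whose body is not shown in this diff), and the returned value is made up.

```python
import asyncio
import time


def read_vram_headroom_mb() -> float:
    """Hypothetical blocking probe; sleeps to imitate a slow driver call."""
    time.sleep(0.05)
    return 4096.0  # made-up reading, in MB


async def get_headroom_async() -> float:
    # asyncio.to_thread runs the sync function in a worker thread,
    # so the event loop stays free to serve other requests meanwhile.
    return await asyncio.to_thread(read_vram_headroom_mb)


async def main() -> list[float]:
    # Two probes run concurrently instead of blocking one after the other.
    return await asyncio.gather(get_headroom_async(), get_headroom_async())


if __name__ == "__main__":
    print(asyncio.run(main()))
```

`asyncio.to_thread` requires Python 3.9 or newer, which the `python:3.11-slim` base image satisfies.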
|
||||||
|
|
||||||
|
### ADR-040/041 Implementation
|
||||||
|
|
||||||
|
- [x] **OCR Sidecar Refactor (Speckit-140):** Phases 1-6, 8, 9 complete (T001-T046, T054-T063)
|
||||||
|
- [x] Phase 1-2: Setup + Foundational (T001-T006)
|
||||||
|
- [x] Phase 3: US1 Security Hardening (T007-T015) — path traversal, API key fail-fast
|
||||||
|
- [x] Phase 4: US2 GPU Resource Management (T016-T025) — residency wiring, CPU fallback
|
||||||
|
- [x] Phase 5: US3 Parameter Governance (T026-T040) — backend param resolution
|
||||||
|
- [x] Phase 6: US4 Async I/O (T041-T046) — async def, lifespan context manager, AsyncClient
|
||||||
|
- [x] Phase 8: Remove /normalize endpoint (T054-T055)
|
||||||
|
- [x] Phase 9: Polish & validation (T056-T063) — Dockerfile, docker-compose, README, quickstart
|
||||||
|
- [ ] Phase 7: US5 Network Isolation Auth (T047-T053) — BLOCKED until ADR-041 cutover
|
||||||
|
- [ ] **ADR-041 Infrastructure:** Provision new host, mount ASUSTOR CIFS, deploy docker-compose
|
||||||
|
- [ ] **ADR-040 Auth Removal:** Remove X-API-Key from sidecar + backend (T048-T053) — **ONLY AFTER ADR-041 cutover**
|
||||||
|
- [ ] **ADR-041 Cutover:** Migrate DB/ES, update DNS, smoke tests, retire Desk-5439
|
||||||
|
|
||||||
### N8N Migration & E2E Testing
|
### N8N Migration & E2E Testing
|
||||||
|
|
||||||
- [ ] **Import `n8n.workflow.v2.json`** เข้า n8n UI และทดสอบ End-to-End
|
- [ ] **Import `n8n.workflow.v2.json`** เข้า n8n UI และทดสอบ End-to-End
|
||||||
|
|||||||
@@ -6,3 +6,7 @@
|
|||||||
QNAP_SMB_USER=your_qnap_username
|
QNAP_SMB_USER=your_qnap_username
|
||||||
QNAP_SMB_PASS=your_qnap_password
|
QNAP_SMB_PASS=your_qnap_password
|
||||||
|
|
||||||
|
# OCR Sidecar security and storage boundary
|
||||||
|
OCR_SIDECAR_API_KEY=change-me-sidecar-api-key
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
|
||||||
|
|
||||||
|
|||||||
@@ -9,15 +9,17 @@
|
|||||||
# Container รันบน CPU เท่านั้น ไม่ต้องการ CUDA/GPU ใน container
|
# Container รันบน CPU เท่านั้น ไม่ต้องการ CUDA/GPU ใน container
|
||||||
# - 2026-06-11: เพิ่ม typhoon-ocr ใน requirements.txt — poppler-utils มีอยู่แล้ว (ใช้โดย prepare_ocr_messages)
|
# - 2026-06-11: เพิ่ม typhoon-ocr ใน requirements.txt — poppler-utils มีอยู่แล้ว (ใช้โดย prepare_ocr_messages)
|
||||||
# - 2026-06-11: ตัด tesseract-ocr, tesseract-ocr-tha, tesseract-ocr-eng, libsm6, libxext6, libxrender1, libfontconfig1, libx11-6 — ไม่ใช้ Tesseract อีกต่อไป
|
# - 2026-06-11: ตัด tesseract-ocr, tesseract-ocr-tha, tesseract-ocr-eng, libsm6, libxext6, libxrender1, libfontconfig1, libx11-6 — ไม่ใช้ Tesseract อีกต่อไป
|
||||||
|
# - 2026-06-20: ADR-040 Phase 6+8: add curl for HEALTHCHECK; reduce start_period to 10s (async startup does not block)
|
||||||
|
|
||||||
FROM python:3.10-slim
|
FROM python:3.11-slim
|
||||||
|
|
||||||
# ติดตั้ง system dependencies สำหรับ PDF processing และ PyMuPDF
|
# Install system dependencies for PDF processing and PyMuPDF, plus curl for the healthcheck
|
||||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||||
libglib2.0-0 \
|
libglib2.0-0 \
|
||||||
libgl1 \
|
libgl1 \
|
||||||
libgomp1 \
|
libgomp1 \
|
||||||
poppler-utils \
|
poppler-utils \
|
||||||
|
curl \
|
||||||
&& rm -rf /var/lib/apt/lists/*
|
&& rm -rf /var/lib/apt/lists/*
|
||||||
|
|
||||||
WORKDIR /app
|
WORKDIR /app
|
||||||
|
|||||||
@@ -0,0 +1,115 @@
|
|||||||
|
# OCR Sidecar — Desk-5439
|
||||||
|
|
||||||
|
HTTP API server that extracts text from PDFs via np-dms-ocr (Ollama). It runs on Desk-5439 per ADR-023A/ADR-040.
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
Backend (QNAP) → POST /ocr-upload → OCR Sidecar (Desk-5439:8765)
|
||||||
|
↓
|
||||||
|
PyMuPDF (fast-path: chars > 100)
|
||||||
|
↓ (if chars ≤ 100)
|
||||||
|
prepare_ocr_messages (typhoon_ocr)
|
||||||
|
+ poppler/pdftoppm (PDF → image)
|
||||||
|
↓
|
||||||
|
np-dms-ocr via Ollama /v1/chat/completions
|
||||||
|
↓
|
||||||
|
JSON → natural_text (Markdown)
|
||||||
|
```
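
The routing rule in the diagram above can be sketched as a single function. This is an illustration, not the code in `app.py`: `choose_ocr_path` is a hypothetical name, and stripping whitespace before counting characters is an assumption about how the text layer is measured.

```python
CHAR_THRESHOLD = 100  # mirrors the documented OCR_CHAR_THRESHOLD default


def choose_ocr_path(text_layer: str, threshold: int = CHAR_THRESHOLD) -> str:
    """Return 'fast-path' when the embedded text layer is usable, else 'np-dms-ocr'.

    Assumption: only non-whitespace-padded length is counted.
    """
    usable_chars = len(text_layer.strip())
    # Strictly greater than the threshold takes the fast path (chars > 100);
    # anything at or below it is sent to the OCR model.
    return "fast-path" if usable_chars > threshold else "np-dms-ocr"


if __name__ == "__main__":
    print(choose_ocr_path("x" * 250))
    print(choose_ocr_path("   "))
```

A scanned PDF usually has an empty or near-empty text layer, so it falls through to np-dms-ocr, while a born-digital PDF skips the model entirely.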
|
||||||
|
|
||||||
|
## Endpoints
|
||||||
|
|
||||||
|
| Endpoint | Method | Auth | Purpose |
|
||||||
|
|----------|--------|------|---------|
|
||||||
|
| `/health` | GET | none | Check sidecar status |
|
||||||
|
| `/ocr` | POST | X-API-Key | OCR from a file path (used when a shared volume is mounted) |
|
||||||
|
| `/ocr-upload` | POST | X-API-Key | OCR from a multipart file upload |
|
||||||
|
| `/embed` | POST | X-API-Key | BGE-M3 embedding (Dense + Sparse) with CPU fallback |
|
||||||
|
| `/rerank` | POST | X-API-Key | BGE-Reranker-Large chunk re-ranker with CPU fallback |
|
||||||
|
|
||||||
|
**Removed endpoints:**
|
||||||
|
- `POST /normalize`: removed per ADR-040 Phase 8 (no consumers)
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Purpose |
|
||||||
|
|----------|---------|---------|
|
||||||
|
| `OCR_SIDECAR_API_KEY` | (required) | API key for authentication (Phase 1) |
|
||||||
|
| `OCR_SIDECAR_UPLOAD_BASE` | `/mnt/uploads` | Base path whitelist for path traversal protection |
|
||||||
|
| `OLLAMA_API_URL` | `http://host.docker.internal:11434` | Ollama API URL |
|
||||||
|
| `OCR_MODEL` | `np-dms-ocr:latest` | Name of the OCR model in Ollama |
|
||||||
|
| `OCR_TIMEOUT` | `360` | Timeout in seconds per request |
|
||||||
|
| `OCR_CHAR_THRESHOLD` | `100` | Fast-path threshold (chars > 100 = use the text layer directly) |
|
||||||
|
| `OCR_MAX_PAGES` | `0` | Maximum number of pages (0 = all pages) |
|
||||||
|
| `OCR_ACTIVE_PROFILE` | (optional) | Profile name in `ai_execution_profiles` |
|
||||||
|
| `VRAM_HEADROOM_THRESHOLD_MB` | `3000.0` | VRAM headroom threshold (MB) for CPU fallback |
|
||||||
|
| `RETRIEVAL_TIMEOUT_SECONDS` | `30.0` | Timeout in seconds for `/embed` and `/rerank` |
|
||||||
|
| `MAX_SYSTEM_PROMPT_LENGTH` | `10000` | Maximum length of systemPrompt |
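
Two rows in the table above carry the security boundary: `OCR_SIDECAR_API_KEY` has no default, and `OCR_SIDECAR_UPLOAD_BASE` acts as a path whitelist. The sketch below shows one way such checks could look. The helper names `load_api_key` and `resolve_upload_path` are hypothetical, and the actual checks in `app.py` may differ.

```python
import os
from pathlib import Path
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Fail fast when the key is missing instead of falling back to a default."""
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OCR_SIDECAR_API_KEY is required and has no default")
    return key


def resolve_upload_path(requested: str, upload_base: str = "/mnt/uploads") -> Path:
    """Canonicalize a requested path and reject anything outside the base."""
    base = Path(upload_base).resolve()
    # resolve() collapses '..' segments and symlinks before the comparison;
    # joining with an absolute path discards the base, which is rejected below.
    candidate = (base / requested).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload base: {requested!r}")
    return candidate


if __name__ == "__main__":
    print(resolve_upload_path("project-a/scan.pdf"))
```

Comparing resolved paths rather than raw strings avoids prefix tricks such as `/mnt/uploads-evil`, which would pass a naive `startswith` check.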
|
||||||
|
|
||||||
|
## Deployment
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Copy .env.example to .env and fill in the values
|
||||||
|
cp .env.example .env
|
||||||
|
# Set OCR_SIDECAR_API_KEY to a real value
|
||||||
|
|
||||||
|
# 2. Build and run
|
||||||
|
docker compose up -d --build
|
||||||
|
|
||||||
|
# 3. Verify
|
||||||
|
curl http://192.168.10.100:8765/health
|
||||||
|
```
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run all tests (from the project root)
|
||||||
|
python -m pytest tests/ -v
|
||||||
|
|
||||||
|
# Run unit tests only
|
||||||
|
python -m pytest tests/unit/ocr-sidecar/ -v
|
||||||
|
|
||||||
|
# Run integration tests only
|
||||||
|
python -m pytest tests/integration/ocr-sidecar/ -v
|
||||||
|
```
|
||||||
|
|
||||||
|
### Test Coverage
|
||||||
|
|
||||||
|
| Test File | Purpose |
|
||||||
|
|-----------|---------|
|
||||||
|
| `test_path_traversal.py` | Path traversal protection (US1) |
|
||||||
|
| `test_api_key_validation.py` | API key validation (US1) |
|
||||||
|
| `test_residency_wiring.py` | Adaptive OCR residency wiring (US2) |
|
||||||
|
| `test_cpu_fallback.py` | CPU fallback for /embed and /rerank (US2) |
|
||||||
|
| `test_parameter_governance.py` | Runtime parameter governance (US3) |
|
||||||
|
| `test_active_prompt.py` | System prompt + DMS tags injection (US3) |
|
||||||
|
| `test_async_performance.py` | Async I/O + lifespan + concurrent requests (US4) |
|
||||||
|
|
||||||
|
## ADR-040 Phases
|
||||||
|
|
||||||
|
| Phase | Status | Purpose |
|
||||||
|
|-------|--------|---------|
|
||||||
|
| Phase 1-2 | ✅ Complete | Setup + Foundational |
|
||||||
|
| Phase 3 | ✅ Complete | US1: Security Hardening |
|
||||||
|
| Phase 4 | ✅ Complete | US2: GPU Resource Management |
|
||||||
|
| Phase 5 | ✅ Complete | US3: Parameter Governance |
|
||||||
|
| Phase 6 | ✅ Complete | US4: Async I/O Performance |
|
||||||
|
| Phase 7 | ⏳ Blocked | US5: Network Isolation Auth (waiting on ADR-041) |
|
||||||
|
| Phase 8 | ✅ Complete | Remove /normalize endpoint |
|
||||||
|
| Phase 9 | ✅ Complete | Polish & documentation |
|
||||||
|
|
||||||
|
## Project files
|
||||||
|
|
||||||
|
```
|
||||||
|
ocr-sidecar/
|
||||||
|
├── app.py — FastAPI server (async I/O, lifespan)
|
||||||
|
├── Dockerfile — Docker image (python:3.11-slim + poppler + curl)
|
||||||
|
├── docker-compose.yml — Compose config (ocr-sidecar + ollama-metrics)
|
||||||
|
├── requirements.txt — Python dependencies
|
||||||
|
├── .env.example — Environment template
|
||||||
|
├── services/
|
||||||
|
│ ├── vram_monitor.py — VRAM headroom monitoring
|
||||||
|
│ └── residency_policy.py — Adaptive OCR residency calculation
|
||||||
|
└── tests/
|
||||||
|
└── test_retrieval_fallback.py — Retrieval fallback tests
|
||||||
|
```
|
||||||
@@ -27,56 +27,77 @@
|
|||||||
# - 2026-06-17: ลบชื่อ Typhoon ออกจากทุกส่วน: process_with_typhoon_ocr → process_ocr, FastAPI title, comments, ตัวแปรต่างๆ
|
# - 2026-06-17: ลบชื่อ Typhoon ออกจากทุกส่วน: process_with_typhoon_ocr → process_ocr, FastAPI title, comments, ตัวแปรต่างๆ
|
||||||
# - 2026-06-17: เพิ่ม systemPrompt parameter ใน /ocr-upload, _process_pdf_doc, process_ocr เพื่อรองรับ dynamic OCR system prompt injection (T026-T028)
|
# - 2026-06-17: เพิ่ม systemPrompt parameter ใน /ocr-upload, _process_pdf_doc, process_ocr เพื่อรองรับ dynamic OCR system prompt injection (T026-T028)
|
||||||
# - 2026-06-18: เพิ่ม MAX_SYSTEM_PROMPT_LENGTH environment variable สำหรับ configurable validation (fix-3)
|
# - 2026-06-18: เพิ่ม MAX_SYSTEM_PROMPT_LENGTH environment variable สำหรับ configurable validation (fix-3)
|
||||||
|
# - 2026-06-20: ADR-040 Phase 1-4: remove the default API key, add a path whitelist, and wire adaptive OCR residency
|
||||||
|
# - 2026-06-20: ADR-040 Phase 6 — async I/O refactor: async process_ocr, AsyncClient via lifespan, asyncio.to_thread model loading
|
||||||
|
# - 2026-06-20: ADR-040 Phase 8: remove the /normalize endpoint (no consumers) and the pythainlp imports
|
||||||
|
|
||||||
import os
|
import os
|
||||||
import logging
|
import logging
|
||||||
import re
|
import re
|
||||||
import base64
|
|
||||||
import json
|
import json
|
||||||
import tempfile
|
import tempfile
|
||||||
import fitz # PyMuPDF (used for page count + fast-path text extraction)
|
import fitz # PyMuPDF (used for page count + fast-path text extraction)
|
||||||
import httpx
|
import httpx
|
||||||
import asyncio
|
import asyncio
|
||||||
|
from contextlib import asynccontextmanager
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Optional
|
from typing import Optional
|
||||||
from PIL import Image
|
|
||||||
import io
|
|
||||||
from typhoon_ocr import prepare_ocr_messages # External library from SCB10X (PyPI) — provides OCR message preparation for np-dms-ocr
|
from typhoon_ocr import prepare_ocr_messages # External library from SCB10X (PyPI) — provides OCR message preparation for np-dms-ocr
|
||||||
from services.vram_monitor import get_vram_headroom
|
from services.vram_monitor import get_vram_headroom
|
||||||
|
from services.residency_policy import calculate_ocr_residency
|
||||||
|
|
||||||
from fastapi import FastAPI, HTTPException, UploadFile, File, Form, Depends, Security, status
|
from fastapi import FastAPI, HTTPException, UploadFile, File, Form, Depends, Security, status
|
||||||
from fastapi.security.api_key import APIKeyHeader
|
from fastapi.security.api_key import APIKeyHeader
|
||||||
from pydantic import BaseModel
|
from pydantic import BaseModel
|
||||||
from pythainlp.tokenize import word_tokenize
|
|
||||||
from pythainlp.util import normalize as thai_normalize
|
|
||||||
from FlagEmbedding import BGEM3FlagModel, FlagReranker
|
from FlagEmbedding import BGEM3FlagModel, FlagReranker
|
||||||
|
|
||||||
|
|
||||||
logging.basicConfig(level=logging.INFO)
|
logging.basicConfig(level=logging.INFO)
|
||||||
logger = logging.getLogger("ocr-sidecar")
|
logger = logging.getLogger("ocr-sidecar")
|
||||||
|
|
||||||
app = FastAPI(title="OCR Sidecar", version="2.0.0")
|
|
||||||
|
|
||||||
# Initialize BGE-M3 and Reranker singletons
|
# Initialize BGE-M3 and Reranker singletons
|
||||||
bge_model = None
|
bge_model = None
|
||||||
reranker = None
|
reranker = None
|
||||||
|
# Shared AsyncClient for the Ollama API (T043: created in the lifespan context manager)
|
||||||
|
ollama_client: httpx.AsyncClient | None = None
|
||||||
|
|
||||||
@app.on_event("startup")
|
|
||||||
def load_bge_models():
|
def _load_bge_models() -> tuple:
|
||||||
global bge_model, reranker
|
"""Load the BGE-M3 and Reranker models into CPU RAM (T046: called via asyncio.to_thread)."""
|
||||||
logger.info("Loading BGE-M3 and Reranker models on CPU RAM...")
|
logger.info("Loading BGE-M3 and Reranker models on CPU RAM...")
|
||||||
try:
|
try:
|
||||||
# BGE-M3: BAAI/bge-m3, use_fp16=False for CPU
|
bge = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False)
|
||||||
bge_model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False)
|
rerank = FlagReranker('BAAI/bge-reranker-large', use_fp16=False)
|
||||||
# Reranker: BAAI/bge-reranker-large, use_fp16=False for CPU
|
|
||||||
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=False)
|
|
||||||
logger.info("BGE-M3 and Reranker models loaded successfully.")
|
logger.info("BGE-M3 and Reranker models loaded successfully.")
|
||||||
|
return bge, rerank
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Failed to load BGE models: {e}")
|
logger.error(f"Failed to load BGE models: {e}")
|
||||||
|
return None, None
|
||||||
|
|
||||||
|
|
||||||
|
@asynccontextmanager
|
||||||
|
async def lifespan(app_instance: FastAPI):
|
||||||
|
"""T043/T045: lifespan context manager that replaces @app.on_event('startup') and manages the AsyncClient and model loading."""
|
||||||
|
global bge_model, reranker, ollama_client
|
||||||
|
# T043: create the shared AsyncClient for the Ollama API
|
||||||
|
ollama_client = httpx.AsyncClient(timeout=OCR_TIMEOUT)
|
||||||
|
logger.info(f"Shared AsyncClient created (timeout={OCR_TIMEOUT}s)")
|
||||||
|
# T046: load the models via asyncio.to_thread so the event loop is not blocked during startup
|
||||||
|
bge_model, reranker = await asyncio.to_thread(_load_bge_models)
|
||||||
|
yield
|
||||||
|
# Cleanup: close the AsyncClient
|
||||||
|
if ollama_client:
|
||||||
|
await ollama_client.aclose()
|
||||||
|
logger.info("Shared AsyncClient closed.")
|
||||||
|
|
||||||
|
|
||||||
|
app = FastAPI(title="OCR Sidecar", version="2.0.0", lifespan=lifespan)
|
||||||
|
|
||||||
|
|
||||||
# Configure the sidecar security token, per the security recommendations
|
# Configure the sidecar security token, per the security recommendations
|
||||||
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY", "lcbp3-dms-ocr-sidecar-secure-token-2026")
|
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY")
|
||||||
|
if not OCR_SIDECAR_API_KEY:
|
||||||
|
raise RuntimeError("OCR_SIDECAR_API_KEY is required for OCR sidecar startup")
|
||||||
|
|
||||||
# Configure the maximum systemPrompt length (fix-3: configurable validation)
|
# Configure the maximum systemPrompt length (fix-3: configurable validation)
|
||||||
MAX_SYSTEM_PROMPT_LENGTH = int(os.getenv("MAX_SYSTEM_PROMPT_LENGTH", "10000"))
|
MAX_SYSTEM_PROMPT_LENGTH = int(os.getenv("MAX_SYSTEM_PROMPT_LENGTH", "10000"))
|
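The startup changes in this hunk replace `@app.on_event("startup")` with a lifespan context manager and move model loading into `asyncio.to_thread`. A minimal stdlib-only sketch of that pattern follows. `load_model_blocking`, `lifespan_sketch`, and the `state` dict are hypothetical stand-ins, not names from the commit, and no FastAPI or httpx objects are involved.

```python
import asyncio
import time
from contextlib import asynccontextmanager

# Hypothetical shared state standing in for the module-level globals.
state = {"model": None, "client_open": False}


def load_model_blocking() -> str:
    """Stand-in for a slow, blocking model load (assumption, not the real loader)."""
    time.sleep(0.05)
    return "fake-model"


@asynccontextmanager
async def lifespan_sketch():
    # Stand-in for creating the shared HTTP client.
    state["client_open"] = True
    # Run the blocking loader in a worker thread so the event loop stays free.
    state["model"] = await asyncio.to_thread(load_model_blocking)
    try:
        yield state
    finally:
        # Stand-in for `await client.aclose()`.
        state["client_open"] = False


async def main() -> dict:
    async with lifespan_sketch() as current:
        return dict(current)  # copy taken while the "app" is running


snapshot = asyncio.run(main())
```

Unlike the committed `lifespan`, the sketch wraps `yield` in `try`/`finally` so cleanup also runs when the body raises. Whether the sidecar needs that guarantee is a judgment call.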
||||||
@@ -94,6 +115,8 @@ MAX_PAGES = int(os.getenv("OCR_MAX_PAGES", "0")) # 0 = all pages
|
|||||||
OLLAMA_API_URL = os.getenv("OLLAMA_API_URL", "http://host.docker.internal:11434")
|
OLLAMA_API_URL = os.getenv("OLLAMA_API_URL", "http://host.docker.internal:11434")
|
||||||
OCR_MODEL = os.getenv("OCR_MODEL", "np-dms-ocr:latest")
|
OCR_MODEL = os.getenv("OCR_MODEL", "np-dms-ocr:latest")
|
||||||
OCR_TIMEOUT = int(os.getenv("OCR_TIMEOUT", "360")) # allows for cold start ~65s + inference ~30s/page
|
OCR_TIMEOUT = int(os.getenv("OCR_TIMEOUT", "360")) # allows for cold start ~65s + inference ~30s/page
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE = os.getenv("OCR_SIDECAR_UPLOAD_BASE", "/mnt/uploads")
|
||||||
|
OCR_ACTIVE_PROFILE = os.getenv("OCR_ACTIVE_PROFILE")
|
||||||
|
|
||||||
logger.info(f"OCR Sidecar initialized (model={OCR_MODEL}, ollama={OLLAMA_API_URL})")
|
logger.info(f"OCR Sidecar initialized (model={OCR_MODEL}, ollama={OLLAMA_API_URL})")
|
||||||
|
|
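Earlier in this diff the hardcoded fallback for `OCR_SIDECAR_API_KEY` is removed and startup raises `RuntimeError` when the variable is unset. The fail-fast idea can be sketched as a small helper. `require_env` is a hypothetical name, and treating whitespace-only values as missing is an assumption that goes slightly beyond the committed check.

```python
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str]) -> str:
    """Return a non-empty setting or fail fast (hypothetical helper)."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is required for OCR sidecar startup")
    return value


# A configured key is returned unchanged.
api_key = require_env("OCR_SIDECAR_API_KEY", {"OCR_SIDECAR_API_KEY": "example-key"})

# A missing key aborts instead of falling back to a baked-in default.
try:
    require_env("OCR_SIDECAR_API_KEY", {})
    missing_detected = False
except RuntimeError:
    missing_detected = True
```

Passing the environment as a mapping keeps the helper testable without touching `os.environ`.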
||||||
@@ -111,11 +134,29 @@ def filter_ocr_noise(text: str) -> str:
|
|||||||
filtered.append(line)
|
filtered.append(line)
|
||||||
return "\n".join(filtered)
|
return "\n".join(filtered)
|
||||||
|
|
||||||
|
def validate_pdf_path(pdf_path: str) -> Path:
|
||||||
|
"""Canonicalize the path and verify that it lies under OCR_SIDECAR_UPLOAD_BASE."""
|
||||||
|
canonical_path = os.path.abspath(os.path.realpath(pdf_path))
|
||||||
|
canonical_base = os.path.abspath(os.path.realpath(OCR_SIDECAR_UPLOAD_BASE))
|
||||||
|
try:
|
||||||
|
common_path = os.path.commonpath([canonical_path, canonical_base])
|
||||||
|
except ValueError:
|
||||||
|
common_path = ""
|
||||||
|
if common_path != canonical_base:
|
||||||
|
raise HTTPException(
|
||||||
|
status_code=status.HTTP_403_FORBIDDEN,
|
||||||
|
detail="Path outside whitelisted base directory",
|
||||||
|
)
|
||||||
|
return Path(canonical_path)
|
||||||
|
|
||||||
class OcrRequest(BaseModel):
|
class OcrRequest(BaseModel):
|
||||||
pdfPath: str
|
pdfPath: str
|
||||||
maxPages: Optional[int] = None
|
maxPages: Optional[int] = None
|
||||||
engine: Optional[str] = None
|
engine: Optional[str] = None
|
||||||
keep_alive: Optional[int] = None
|
keep_alive: Optional[int] = None
|
||||||
|
runtime_params: Optional[dict] = None
|
||||||
|
system_prompt: Optional[str] = None
|
||||||
|
dms_tags: Optional[dict] = None
|
||||||
|
|
||||||
class OcrResponse(BaseModel):
|
class OcrResponse(BaseModel):
|
||||||
text: str
|
text: str
|
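The `validate_pdf_path` helper added in this hunk canonicalizes the requested path and compares it with the upload base through `os.path.commonpath`. The sketch below reproduces that check with the standard library only. `resolve_under_base` and `is_allowed` are hypothetical names, and `PermissionError` stands in for the `HTTPException` used by the sidecar.

```python
import os
import tempfile
from pathlib import Path


def resolve_under_base(candidate: str, base: str) -> Path:
    """Return the canonical path, or raise if it escapes `base` (sketch)."""
    canonical = os.path.realpath(candidate)
    canonical_base = os.path.realpath(base)
    try:
        common = os.path.commonpath([canonical, canonical_base])
    except ValueError:  # e.g. different drives on Windows
        common = ""
    if common != canonical_base:
        raise PermissionError("Path outside whitelisted base directory")
    return Path(canonical)


def is_allowed(candidate: str, base: str) -> bool:
    try:
        resolve_under_base(candidate, base)
        return True
    except PermissionError:
        return False


with tempfile.TemporaryDirectory() as base:
    inside_ok = is_allowed(os.path.join(base, "docs", "a.pdf"), base)
    traversal_ok = is_allowed(os.path.join(base, "..", "outside.pdf"), base)
    # A sibling directory sharing the base name as a string prefix.
    sibling_ok = is_allowed(base + "_evil" + os.sep + "a.pdf", base)
```

`commonpath` compares whole path components, which is why the sibling directory is rejected even though its name starts with the base path as a plain string.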
||||||
@@ -133,8 +174,18 @@ def health():
|
|||||||
"ollamaUrl": OLLAMA_API_URL,
|
"ollamaUrl": OLLAMA_API_URL,
|
||||||
}
|
}
|
||||||
|
|
||||||
def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, ocr_options: dict = {}, pdf_path: str | None = None, system_prompt: Optional[str] = None) -> OcrResponse:
|
async def _process_pdf_doc(
|
||||||
|
doc: fitz.Document,
|
||||||
|
selected_engine: str,
|
||||||
|
max_pages: int,
|
||||||
|
ocr_options: Optional[dict] = None,
|
||||||
|
pdf_path: str | None = None,
|
||||||
|
system_prompt: Optional[str] = None,
|
||||||
|
runtime_params: Optional[dict] = None,
|
||||||
|
dms_tags: Optional[dict] = None,
|
||||||
|
) -> OcrResponse:
|
||||||
"""Process a fitz.Document with the selected engine; shared logic for /ocr and /ocr-upload."""
|
"""Process a fitz.Document with the selected engine; shared logic for /ocr and /ocr-upload."""
|
||||||
|
ocr_options = ocr_options or {}
|
||||||
pages_to_process = list(range(min(len(doc), max_pages) if max_pages > 0 else len(doc)))
|
pages_to_process = list(range(min(len(doc), max_pages) if max_pages > 0 else len(doc)))
|
||||||
page_count = len(pages_to_process)
|
page_count = len(pages_to_process)
|
||||||
|
|
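This hunk also changes `ocr_options: dict = {}` to `Optional[dict] = None`. A short sketch of why a mutable default is risky follows. `collect_shared` and `collect_fresh` are hypothetical functions written only for this illustration.

```python
from typing import Optional


def collect_shared(page: int, seen: list = []) -> list:
    """Buggy variant: the default list is created once and shared across calls."""
    seen.append(page)
    return seen


def collect_fresh(page: int, seen: Optional[list] = None) -> list:
    """Safer variant: a new list is created for each call that omits `seen`."""
    seen = seen if seen is not None else []
    seen.append(page)
    return seen


shared_first = list(collect_shared(1))
shared_second = list(collect_shared(2))  # still carries the 1 from the first call
fresh_first = collect_fresh(1)
fresh_second = collect_fresh(2)
```

The commit normalizes with `ocr_options or {}`, which also replaces an empty dict passed by the caller. The `is not None` test in the sketch keeps a caller-supplied empty container instead.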
||||||
@@ -163,7 +214,16 @@ def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, o
|
|||||||
raise ValueError("ไม่สามารถหา PDF path — ต้องส่ง pdf_path เข้ามาด้วย")
|
raise ValueError("ไม่สามารถหา PDF path — ต้องส่ง pdf_path เข้ามาด้วย")
|
||||||
ocr_text_parts = []
|
ocr_text_parts = []
|
||||||
for i in pages_to_process:
|
for i in pages_to_process:
|
||||||
ocr_text_parts.append(process_ocr(resolved_path, page_num=i + 1, options_override=ocr_options, system_prompt=system_prompt))
|
ocr_text_parts.append(
|
||||||
|
await process_ocr(
|
||||||
|
resolved_path,
|
||||||
|
page_num=i + 1,
|
||||||
|
options_override=ocr_options,
|
||||||
|
system_prompt=system_prompt,
|
||||||
|
runtime_params=runtime_params,
|
||||||
|
dms_tags=dms_tags,
|
||||||
|
)
|
||||||
|
)
|
||||||
ocr_text = filter_ocr_noise("\n".join(ocr_text_parts).strip())
|
ocr_text = filter_ocr_noise("\n".join(ocr_text_parts).strip())
|
||||||
return OcrResponse(
|
return OcrResponse(
|
||||||
text=ocr_text,
|
text=ocr_text,
|
||||||
@@ -180,7 +240,16 @@ def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, o
|
|||||||
raise ValueError("ไม่สามารถหา PDF path — ต้องส่ง pdf_path เข้ามาด้วย")
|
raise ValueError("ไม่สามารถหา PDF path — ต้องส่ง pdf_path เข้ามาด้วย")
|
||||||
fallback_parts = []
|
fallback_parts = []
|
||||||
for i in pages_to_process:
|
for i in pages_to_process:
|
||||||
fallback_parts.append(process_ocr(resolved_path, page_num=i + 1, options_override=ocr_options, system_prompt=system_prompt))
|
fallback_parts.append(
|
||||||
|
await process_ocr(
|
||||||
|
resolved_path,
|
||||||
|
page_num=i + 1,
|
||||||
|
options_override=ocr_options,
|
||||||
|
system_prompt=system_prompt,
|
||||||
|
runtime_params=runtime_params,
|
||||||
|
dms_tags=dms_tags,
|
||||||
|
)
|
||||||
|
)
|
||||||
fallback_text = filter_ocr_noise("\n".join(fallback_parts).strip())
|
fallback_text = filter_ocr_noise("\n".join(fallback_parts).strip())
|
||||||
return OcrResponse(
|
return OcrResponse(
|
||||||
text=fallback_text,
|
text=fallback_text,
|
||||||
@@ -190,91 +259,162 @@ def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, o
|
|||||||
engineUsed="np-dms-ocr",
|
engineUsed="np-dms-ocr",
|
||||||
)
|
)
|
||||||
|
|
||||||
def process_ocr(pdf_path: str, page_num: int = 1, options_override: dict = {}, system_prompt: Optional[str] = None) -> str:
|
async def process_ocr(
|
||||||
|
pdf_path: str,
|
||||||
|
page_num: int = 1,
|
||||||
|
options_override: Optional[dict] = None,
|
||||||
|
system_prompt: Optional[str] = None,
|
||||||
|
runtime_params: Optional[dict] = None,
|
||||||
|
dms_tags: Optional[dict] = None,
|
||||||
|
) -> str:
|
||||||
"""Call np-dms-ocr through Ollama /v1/chat/completions; takes a PDF path directly, with no PIL Image conversion."""
|
"""Call np-dms-ocr through Ollama /v1/chat/completions; takes a PDF path directly, with no PIL Image conversion."""
|
||||||
|
options_override = options_override or {}
|
||||||
|
if "keep_alive" in options_override:
|
||||||
|
raise ValueError("keep_alive must be calculated by OCR residency policy")
|
||||||
|
residency = await asyncio.to_thread(calculate_ocr_residency, OCR_ACTIVE_PROFILE)
|
||||||
model_name = OCR_MODEL
|
model_name = OCR_MODEL
|
||||||
# prepare_ocr_messages handles PDF → image conversion internally via poppler/pdftoppm
|
# prepare_ocr_messages handles PDF → image conversion internally via poppler/pdftoppm
|
||||||
messages = prepare_ocr_messages(pdf_path, task_type="structure", page_num=page_num)
|
messages = prepare_ocr_messages(pdf_path, task_type="structure", page_num=page_num)
|
||||||
# inject the system prompt, if any (before the DMS tags)
|
# inject the system prompt, if any (before the DMS tags)
|
||||||
if system_prompt:
|
if system_prompt:
|
||||||
messages[0]["content"].append({"type": "text", "text": system_prompt})
|
messages[0]["content"].append({"type": "text", "text": system_prompt})
|
||||||
# append DMS-specific extraction tags to the end of content
|
|
||||||
messages[0]["content"].append({
|
# Dynamic dms_tags mapping to prompts
|
||||||
"type": "text",
|
if dms_tags:
|
||||||
"text": (
|
dms_text = "Additionally:\n"
|
||||||
|
for key in dms_tags.keys():
|
||||||
|
readable_name = re.sub(r'(?<!^)(?=[A-Z])|_', ' ', key).lower()
|
||||||
|
dms_text += f"- Wrap {readable_name} with <{key}>...</{key}>\n"
|
||||||
|
dms_text += "If a field is not found, omit the tag."
|
||||||
|
else:
|
||||||
|
# Fallback to default DMS extraction tags
|
||||||
|
dms_text = (
|
||||||
"Additionally:\n"
|
"Additionally:\n"
|
||||||
"- Wrap document number with <document_number>...</document_number>\n"
|
"- Wrap document number with <document_number>...</document_number>\n"
|
||||||
"- Wrap document date with <document_date>...</document_date>\n"
|
"- Wrap document date with <document_date>...</document_date>\n"
|
||||||
"- Wrap received date with <received_date>...</received_date>\n"
|
"- Wrap received date with <received_date>...</received_date>\n"
|
||||||
"If a field is not found, omit the tag."
|
"If a field is not found, omit the tag."
|
||||||
),
|
)
|
||||||
|
|
||||||
|
# append DMS-specific extraction tags to the end of content
|
||||||
|
messages[0]["content"].append({
|
||||||
|
"type": "text",
|
||||||
|
"text": dms_text,
|
||||||
})
|
})
|
||||||
|
|
||||||
|
# Resolve runtime parameters: remove hardcoded fallback values from sidecar
|
||||||
|
# Use empty dict if runtime_params not provided to allow Ollama Modelfile default
|
||||||
|
params = {}
|
||||||
|
if runtime_params:
|
||||||
|
if hasattr(runtime_params, "dict"):
|
||||||
|
params = runtime_params.dict()
|
||||||
|
elif isinstance(runtime_params, dict):
|
||||||
|
params = runtime_params
|
||||||
|
|
||||||
|
# Options override (e.g., from Sandbox form parameter overrides) takes precedence
|
||||||
|
merged_params = {}
|
||||||
|
if params:
|
||||||
|
merged_params.update(params)
|
||||||
|
if options_override:
|
||||||
|
merged_params.update(options_override)
|
||||||
|
|
||||||
# defaults follow the official values; options_override can still override some of them
|
# defaults follow the official values; options_override can still override some of them
|
||||||
|
logger.info(
|
||||||
|
f"OCR residency decision: keep_alive={residency.keep_alive_seconds}s "
|
||||||
|
f"reason={residency.reason} headroom={residency.vram_headroom_mb}MB"
|
||||||
|
)
|
||||||
payload = {
|
payload = {
|
||||||
"model": model_name,
|
"model": model_name,
|
||||||
"messages": messages,
|
"messages": messages,
|
||||||
"max_tokens": 16000,
|
|
||||||
"stream": False,
|
"stream": False,
|
||||||
"repetition_penalty": options_override.get("repeat_penalty", 1.2),
|
"keep_alive": residency.keep_alive_seconds,
|
||||||
"temperature": options_override.get("temperature", 0.1),
|
|
||||||
"top_p": options_override.get("top_p", 0.6),
|
|
||||||
"keep_alive": options_override.get("keep_alive", 0), # Unload the model right after the job to return VRAM to np-dms-ai
|
|
||||||
}
|
}
|
||||||
# Use the Ollama OpenAI-compatible endpoint (/v1/chat/completions)
|
|
||||||
with httpx.Client(timeout=OCR_TIMEOUT) as client:
|
# Only send keys to Ollama if they are defined in merged_params (to support Modelfile fallback)
|
||||||
response = client.post(
|
if "temperature" in merged_params and merged_params["temperature"] is not None:
|
||||||
f"{OLLAMA_API_URL}/v1/chat/completions",
|
payload["temperature"] = float(merged_params["temperature"])
|
||||||
json=payload,
|
if "top_p" in merged_params and merged_params["top_p"] is not None:
|
||||||
headers={"Authorization": "Bearer ollama"},
|
payload["top_p"] = float(merged_params["top_p"])
|
||||||
|
if "repeat_penalty" in merged_params and merged_params["repeat_penalty"] is not None:
|
||||||
|
payload["repetition_penalty"] = float(merged_params["repeat_penalty"])
|
||||||
|
elif "repetition_penalty" in merged_params and merged_params["repetition_penalty"] is not None:
|
||||||
|
payload["repetition_penalty"] = float(merged_params["repetition_penalty"])
|
||||||
|
if "max_tokens" in merged_params and merged_params["max_tokens"] is not None:
|
||||||
|
payload["max_tokens"] = int(merged_params["max_tokens"])
|
||||||
|
|
||||||
|
# T044: use the shared AsyncClient (ollama_client) instead of the synchronous httpx.Client
|
||||||
|
# If ollama_client has not been created yet (e.g. a unit test calling this directly), create a temporary one
|
||||||
|
client = ollama_client
|
||||||
|
if client is None:
|
||||||
|
client = httpx.AsyncClient(timeout=OCR_TIMEOUT)
|
||||||
|
response = await client.post(
|
||||||
|
f"{OLLAMA_API_URL}/v1/chat/completions",
|
||||||
|
json=payload,
|
||||||
|
headers={"Authorization": "Bearer ollama"},
|
||||||
|
)
|
||||||
|
response.raise_for_status()
|
||||||
|
data = response.json()
|
||||||
|
raw_text = str(data.get("choices", [{}])[0].get("message", {}).get("content", "")).strip()
|
||||||
|
# parse the JSON output from the model (format: {"natural_text": "..."})
|
||||||
|
try:
|
||||||
|
result_text = json.loads(raw_text).get("natural_text", raw_text)
|
||||||
|
except (json.JSONDecodeError, AttributeError):
|
||||||
|
result_text = raw_text
|
||||||
|
logger.info(
|
||||||
|
f"[DIAG] Ollama response — model={model_name} "
|
||||||
|
f"textLen={len(result_text)} "
|
||||||
|
f"done={data.get('done')} "
|
||||||
|
f"done_reason={data.get('done_reason')} "
|
||||||
|
f"eval_count={data.get('eval_count', 0)}"
|
||||||
|
)
|
||||||
|
if not result_text:
|
||||||
|
logger.warning(
|
||||||
|
f"[DIAG] Ollama returned empty response — full response keys: {list(data.keys())}"
|
||||||
)
|
)
|
||||||
response.raise_for_status()
|
# Close the temporary client if one was created
|
||||||
data = response.json()
|
if ollama_client is None:
|
||||||
raw_text = str(data.get("choices", [{}])[0].get("message", {}).get("content", "")).strip()
|
await client.aclose()
|
||||||
# parse the JSON output from the model (format: {"natural_text": "..."})
|
return result_text
|
||||||
try:
|
|
||||||
result_text = json.loads(raw_text).get("natural_text", raw_text)
|
|
||||||
except (json.JSONDecodeError, AttributeError):
|
|
||||||
result_text = raw_text
|
|
||||||
logger.info(
|
|
||||||
f"[DIAG] Ollama response — model={model_name} "
|
|
||||||
f"textLen={len(result_text)} "
|
|
||||||
f"done={data.get('done')} "
|
|
||||||
f"done_reason={data.get('done_reason')} "
|
|
||||||
f"eval_count={data.get('eval_count', 0)}"
|
|
||||||
)
|
|
||||||
if not result_text:
|
|
||||||
logger.warning(
|
|
||||||
f"[DIAG] Ollama returned empty response — full response keys: {list(data.keys())}"
|
|
||||||
)
|
|
||||||
return result_text
|
|
||||||
|
|
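In the new `process_ocr`, a temporary client is created when `ollama_client` is `None` and closed only after the response has been parsed, so an exception from the request would skip the close. One way to guarantee cleanup is `try`/`finally`, sketched below. `FakeAsyncClient`, `call_upstream`, and `created` are hypothetical and replace httpx so the example runs with the standard library alone.

```python
import asyncio
from typing import List, Optional


class FakeAsyncClient:
    """Hypothetical stand-in for an async HTTP client with `aclose()`."""

    def __init__(self) -> None:
        self.closed = False

    async def post(self, fail: bool) -> str:
        if fail:
            raise RuntimeError("upstream error")
        return "ok"

    async def aclose(self) -> None:
        self.closed = True


shared_client: Optional[FakeAsyncClient] = None  # unset, as in a direct unit test
created: List[FakeAsyncClient] = []              # records temporary clients


async def call_upstream(fail: bool = False) -> str:
    client = shared_client
    owns_client = client is None
    if owns_client:
        client = FakeAsyncClient()
        created.append(client)
    try:
        return await client.post(fail)
    finally:
        # Close only a client created here, and do so on errors as well.
        if owns_client:
            await client.aclose()


async def demo() -> str:
    ok = await call_upstream()
    try:
        await call_upstream(fail=True)
    except RuntimeError:
        pass
    return ok


result = asyncio.run(demo())
```

Tracking ownership in a local flag avoids re-reading the module-level client, which could change between the request and the cleanup.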
||||||
@app.post("/ocr", response_model=OcrResponse, dependencies=[Depends(get_api_key)])
|
@app.post("/ocr", response_model=OcrResponse, dependencies=[Depends(get_api_key)])
|
||||||
def ocr_extract(req: OcrRequest):
|
async def ocr_extract(req: OcrRequest):
|
||||||
"""OCR from a path (legacy; used when the sidecar and backend can reach the same storage)."""
|
"""OCR from a path (legacy; used when the sidecar and backend can reach the same storage)."""
|
||||||
pdf_path = Path(req.pdfPath)
|
if req.keep_alive is not None:
|
||||||
|
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="keep_alive is managed by OCR residency policy")
|
||||||
|
pdf_path = validate_pdf_path(req.pdfPath)
|
||||||
if not pdf_path.exists():
|
if not pdf_path.exists():
|
||||||
raise HTTPException(status_code=404, detail=f"ไม่พบไฟล์: {req.pdfPath}")
|
raise HTTPException(status_code=404, detail=f"ไม่พบไฟล์: {req.pdfPath}")
|
||||||
selected_engine = (req.engine or "auto").strip().lower()
|
selected_engine = (req.engine or "auto").strip().lower()
|
||||||
max_pages = req.maxPages or MAX_PAGES
|
max_pages = req.maxPages or MAX_PAGES
|
||||||
ocr_options = {}
|
ocr_options = {}
|
||||||
if req.keep_alive is not None:
|
|
||||||
ocr_options["keep_alive"] = req.keep_alive
|
|
||||||
try:
|
try:
|
||||||
doc = fitz.open(str(pdf_path))
|
doc = fitz.open(str(pdf_path))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
raise HTTPException(status_code=422, detail=f"เปิดไฟล์ PDF ล้มเหลว: {e}")
|
raise HTTPException(status_code=422, detail=f"เปิดไฟล์ PDF ล้มเหลว: {e}")
|
||||||
return _process_pdf_doc(doc, selected_engine, max_pages, ocr_options)
|
return await _process_pdf_doc(
|
||||||
|
doc,
|
||||||
|
selected_engine,
|
||||||
|
max_pages,
|
||||||
|
ocr_options,
|
||||||
|
pdf_path=str(pdf_path),
|
||||||
|
system_prompt=req.system_prompt,
|
||||||
|
runtime_params=req.runtime_params,
|
||||||
|
dms_tags=req.dms_tags,
|
||||||
|
)
|
||||||
|
|
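The `dms_tags` value forwarded by this endpoint is turned into prompt instructions inside `process_ocr`, using a regular expression to derive a readable label from each key. That step can be sketched in isolation as follows. `build_tag_instructions` is a hypothetical name, and the two sample keys are illustrative only.

```python
import re
from typing import Iterable


def build_tag_instructions(tag_names: Iterable[str]) -> str:
    """Build the 'Additionally:' prompt block from tag keys (sketch)."""
    lines = ["Additionally:"]
    for key in tag_names:
        # Split camelCase before capitals and turn underscores into spaces.
        readable = re.sub(r"(?<!^)(?=[A-Z])|_", " ", key).lower()
        lines.append(f"- Wrap {readable} with <{key}>...</{key}>")
    lines.append("If a field is not found, omit the tag.")
    return "\n".join(lines)


prompt = build_tag_instructions(["documentNumber", "received_date"])
```

Only the keys of `dms_tags` influence the prompt in the committed code, so the sketch takes an iterable of names rather than a dict.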
||||||
@app.post("/ocr-upload", response_model=OcrResponse, dependencies=[Depends(get_api_key)])
|
@app.post("/ocr-upload", response_model=OcrResponse, dependencies=[Depends(get_api_key)])
|
||||||
def ocr_upload(
|
async def ocr_upload(
|
||||||
file: UploadFile = File(...),
|
file: UploadFile = File(...),
|
||||||
engine: str = Form(default="auto"),
|
engine: str = Form(default="auto"),
|
||||||
maxPages: int = Form(default=0),
|
maxPages: int = Form(default=0),
|
||||||
temperature: Optional[float] = Form(default=None),
|
temperature: Optional[float] = Form(default=None),
|
||||||
topP: Optional[float] = Form(default=None),
|
topP: Optional[float] = Form(default=None),
|
||||||
repeatPenalty: Optional[float] = Form(default=None),
|
repeatPenalty: Optional[float] = Form(default=None),
|
||||||
|
maxTokens: Optional[int] = Form(default=None),
|
||||||
keep_alive: Optional[int] = Form(default=None),
|
keep_alive: Optional[int] = Form(default=None),
|
||||||
systemPrompt: Optional[str] = Form(default=None),
|
systemPrompt: Optional[str] = Form(default=None),
|
||||||
|
dmsTags: Optional[str] = Form(default=None),
|
||||||
|
runtimeParams: Optional[str] = Form(default=None),
|
||||||
):
|
):
|
||||||
"""OCR from a multipart file upload; no shared volume mount required."""
|
"""OCR from a multipart file upload; no shared volume mount required."""
|
||||||
# Validate systemPrompt if provided (gap-1: sidecar validation)
|
# Validate systemPrompt if provided (gap-1: sidecar validation)
|
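The reworked `process_ocr` merges `runtime_params` with `options_override` (overrides win) and forwards only the keys that are actually set, so unset values fall back to the Ollama Modelfile. A minimal sketch of that merge, written as a pure function, is shown here. `build_sampling_payload` is a hypothetical name and not part of the commit.

```python
from typing import Optional


def build_sampling_payload(
    runtime_params: Optional[dict],
    options_override: Optional[dict],
) -> dict:
    """Merge profile params with overrides and keep only defined keys (sketch)."""
    merged: dict = {}
    merged.update(runtime_params or {})
    merged.update(options_override or {})  # overrides take precedence

    payload: dict = {}
    if merged.get("temperature") is not None:
        payload["temperature"] = float(merged["temperature"])
    if merged.get("top_p") is not None:
        payload["top_p"] = float(merged["top_p"])
    # Accept either spelling; `repeat_penalty` is checked first.
    penalty = merged.get("repeat_penalty")
    if penalty is None:
        penalty = merged.get("repetition_penalty")
    if penalty is not None:
        payload["repetition_penalty"] = float(penalty)
    if merged.get("max_tokens") is not None:
        payload["max_tokens"] = int(merged["max_tokens"])
    return payload


overridden = build_sampling_payload(
    {"temperature": 0.1, "top_p": 0.6},
    {"temperature": 0.3, "repeat_penalty": 1.2},
)
defaults_only = build_sampling_payload(None, None)
```

An empty result means no sampling keys are sent, which leaves the model's own defaults in effect.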
||||||
@@ -292,6 +432,22 @@ def ocr_upload(
|
|||||||
)
|
)
|
||||||
selected_engine = engine.strip().lower()
|
selected_engine = engine.strip().lower()
|
||||||
max_pages = maxPages or MAX_PAGES
|
max_pages = maxPages or MAX_PAGES
|
||||||
|
|
||||||
|
# Parse runtimeParams and dmsTags from form-data JSON strings if provided
|
||||||
|
runtime_params_dict = {}
|
||||||
|
if runtimeParams:
|
||||||
|
try:
|
||||||
|
runtime_params_dict = json.loads(runtimeParams)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Failed to parse runtimeParams JSON: {e}")
|
||||||
|
|
||||||
|
dms_tags_dict = None
|
||||||
|
if dmsTags:
|
||||||
|
try:
|
||||||
|
dms_tags_dict = json.loads(dmsTags)
|
||||||
|
except Exception as e:
|
||||||
|
logger.warning(f"Failed to parse dmsTags JSON: {e}")
|
||||||
|
|
||||||
# Collect option overrides for np-dms-ocr (if the frontend sent any)
|
# Collect option overrides for np-dms-ocr (if the frontend sent any)
|
||||||
ocr_options: dict = {}
|
ocr_options: dict = {}
|
||||||
if temperature is not None:
|
if temperature is not None:
|
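This hunk parses `runtimeParams` and `dmsTags` from JSON strings in the form data and logs a warning when parsing fails. A hedged sketch of a shared parser follows. `parse_json_object` is a hypothetical helper, and rejecting non-object JSON (such as a list) is an added assumption that the committed code does not enforce.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("ocr-sidecar-sketch")


def parse_json_object(raw: Optional[str], field: str) -> Optional[dict]:
    """Parse a form field expected to hold a JSON object (sketch)."""
    if not raw:
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.warning("Failed to parse %s JSON: %s", field, exc)
        return None
    if not isinstance(value, dict):
        # Assumption: only JSON objects are meaningful for these fields.
        logger.warning("%s must be a JSON object, got %s", field, type(value).__name__)
        return None
    return value


parsed = parse_json_object('{"temperature": 0.2}', "runtimeParams")
invalid = parse_json_object("{not json", "runtimeParams")
wrong_type = parse_json_object("[1, 2]", "dmsTags")
```

Catching `json.JSONDecodeError` rather than a bare `Exception` keeps unrelated errors visible.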
||||||
@@ -300,10 +456,11 @@ def ocr_upload(
|
|||||||
ocr_options["top_p"] = topP
|
ocr_options["top_p"] = topP
|
||||||
if repeatPenalty is not None:
|
if repeatPenalty is not None:
|
||||||
ocr_options["repeat_penalty"] = repeatPenalty
|
ocr_options["repeat_penalty"] = repeatPenalty
|
||||||
|
if maxTokens is not None:
|
||||||
|
ocr_options["max_tokens"] = maxTokens
|
||||||
if keep_alive is not None:
|
if keep_alive is not None:
|
||||||
ocr_options["keep_alive"] = keep_alive
|
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="keep_alive is managed by OCR residency policy")
|
||||||
pdf_bytes = file.file.read()
|
pdf_bytes = file.file.read()
|
||||||
import tempfile
|
|
||||||
tmp_pdf_path: str | None = None
|
tmp_pdf_path: str | None = None
|
||||||
try:
|
try:
|
||||||
# Save the PDF to a temp file so prepare_ocr_messages can read it by path
|
# Save the PDF to a temp file so prepare_ocr_messages can read it by path
|
||||||
@@ -315,29 +472,20 @@ def ocr_upload(
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
raise HTTPException(status_code=422, detail=f"เปิดไฟล์ PDF ล้มเหลว: {e}")
|
raise HTTPException(status_code=422, detail=f"เปิดไฟล์ PDF ล้มเหลว: {e}")
|
||||||
logger.info(f"OCR upload: {file.filename} engine={selected_engine} options={ocr_options or 'modelfile-defaults'}")
|
logger.info(f"OCR upload: {file.filename} engine={selected_engine} options={ocr_options or 'modelfile-defaults'}")
|
||||||
return _process_pdf_doc(doc, selected_engine, max_pages, ocr_options, pdf_path=tmp_pdf_path, system_prompt=systemPrompt)
|
return await _process_pdf_doc(
|
||||||
|
doc,
|
||||||
|
selected_engine,
|
||||||
|
max_pages,
|
||||||
|
ocr_options,
|
||||||
|
pdf_path=tmp_pdf_path,
|
||||||
|
system_prompt=systemPrompt,
|
||||||
|
runtime_params=runtime_params_dict,
|
||||||
|
dms_tags=dms_tags_dict,
|
||||||
|
)
|
||||||
finally:
|
finally:
|
||||||
if tmp_pdf_path:
|
if tmp_pdf_path:
|
||||||
Path(tmp_pdf_path).unlink(missing_ok=True)
|
Path(tmp_pdf_path).unlink(missing_ok=True)
|
||||||
|
|
||||||
class NormalizeRequest(BaseModel):
|
|
||||||
text: str
|
|
||||||
|
|
||||||
class NormalizeResponse(BaseModel):
|
|
||||||
normalized: str
|
|
||||||
|
|
||||||
@app.post("/normalize", response_model=NormalizeResponse, dependencies=[Depends(get_api_key)])
|
|
||||||
def normalize_text(req: NormalizeRequest):
|
|
||||||
"""Normalize Thai text with PyThaiNLP for the rag-thai-preprocess queue."""
|
|
||||||
try:
|
|
||||||
# normalize unicode, tokenize, then rejoin with spaces for embedding
|
|
||||||
normalized = thai_normalize(req.text)
|
|
||||||
tokens = word_tokenize(normalized, engine="newmm", keep_whitespace=False)
|
|
||||||
result = " ".join(tokens)
|
|
||||||
return NormalizeResponse(normalized=result)
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Thai normalize failed, returning raw text: {e}")
|
|
||||||
return NormalizeResponse(normalized=req.text)
|
|
||||||
class EmbedRequest(BaseModel):
|
class EmbedRequest(BaseModel):
|
||||||
text: str
|
text: str
|
||||||
|
|
||||||
@@ -362,7 +510,7 @@ async def embed_text(req: EmbedRequest):
|
|||||||
raise HTTPException(status_code=503, detail="BGE-M3 model not loaded")
|
raise HTTPException(status_code=503, detail="BGE-M3 model not loaded")
|
||||||
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
|
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
|
||||||
timeout_sec = float(os.getenv("RETRIEVAL_TIMEOUT_SECONDS", "30.0"))
|
timeout_sec = float(os.getenv("RETRIEVAL_TIMEOUT_SECONDS", "30.0"))
|
||||||
headroom = get_vram_headroom()
|
headroom = await asyncio.to_thread(get_vram_headroom)
|
||||||
device = "cuda"
|
device = "cuda"
|
||||||
reason = "headroom-sufficient"
|
reason = "headroom-sufficient"
|
||||||
if not headroom.query_success:
|
if not headroom.query_success:
|
||||||
@@ -427,7 +575,7 @@ async def rerank_chunks(req: RerankRequest):
|
|||||||
return RerankResponse(scores=[], ranked_indices=[], device="cpu")
|
return RerankResponse(scores=[], ranked_indices=[], device="cpu")
|
||||||
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
|
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
|
||||||
timeout_sec = float(os.getenv("RETRIEVAL_TIMEOUT_SECONDS", "30.0"))
|
timeout_sec = float(os.getenv("RETRIEVAL_TIMEOUT_SECONDS", "30.0"))
|
||||||
headroom = get_vram_headroom()
|
headroom = await asyncio.to_thread(get_vram_headroom)
|
||||||
device = "cuda"
|
device = "cuda"
|
||||||
reason = "headroom-sufficient"
|
reason = "headroom-sufficient"
|
||||||
if not headroom.query_success:
|
if not headroom.query_success:
|
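Both retrieval endpoints now call `get_vram_headroom` through `asyncio.to_thread` instead of invoking it directly inside the coroutine. The sketch below shows only the mechanism: the callable runs on a worker thread while the event loop thread stays available. `probe_blocking` is a hypothetical stand-in for the real VRAM query.

```python
import asyncio
import threading


def probe_blocking() -> int:
    """Stand-in for a blocking probe such as a GPU memory query (assumption)."""
    return threading.get_ident()


async def main() -> tuple:
    loop_thread = threading.get_ident()
    # The probe runs in the default thread pool, not on the event loop thread.
    worker_thread = await asyncio.to_thread(probe_blocking)
    return loop_thread, worker_thread


loop_thread, worker_thread = asyncio.run(main())
```

The pattern helps only when the wrapped call blocks on I/O or a subprocess. CPU-bound Python work in a thread still contends for the interpreter lock.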
||||||
|
|||||||
+8
-3
@@ -1,5 +1,5 @@
|
|||||||
# File: specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml
|
# File: specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml
|
||||||
# Tesseract OCR Sidecar: runs on Desk-5439 (AI Isolation Host) per ADR-023A
|
# OCR Sidecar: runs on Desk-5439 (AI Isolation Host) per ADR-023A/ADR-040
|
||||||
# Change Log:
|
# Change Log:
|
||||||
# - 2026-05-25: Initial compose file for the Tesseract OCR HTTP sidecar
|
# - 2026-05-25: Initial compose file for the Tesseract OCR HTTP sidecar
|
||||||
# - 2026-05-25: Fixed the volumes for Windows + Docker Desktop
|
# - 2026-05-25: Fixed the volumes for Windows + Docker Desktop
|
||||||
@@ -16,6 +16,7 @@
|
|||||||
# - 2026-06-11: US2 & US3 - added VRAM headroom, residency window, pressure threshold, and retrieval timeout env variables
|
# - 2026-06-11: US2 & US3 - added VRAM headroom, residency window, pressure threshold, and retrieval timeout env variables
|
||||||
# - 2026-06-13: ADR-036: changed TYPHOON_OCR_MODEL to OCR_MODEL=np-dms-ocr:latest
|
# - 2026-06-13: ADR-036: changed TYPHOON_OCR_MODEL to OCR_MODEL=np-dms-ocr:latest
|
||||||
# - 2026-06-17: Removed the Typhoon name from every environment variable and comment (renamed to OCR_* per ADR-036)
|
# - 2026-06-17: Removed the Typhoon name from every environment variable and comment (renamed to OCR_* per ADR-036)
|
||||||
|
# - 2026-06-20: ADR-040 Phase 6+8: removed OCR_LANG and USE_GPU (stale Tesseract config); added OCR_SIDECAR_API_KEY and OCR_ACTIVE_PROFILE
|
||||||
#
|
#
|
||||||
# How to run:
|
# How to run:
|
||||||
# docker compose up -d --build
|
# docker compose up -d --build
|
||||||
@@ -39,8 +40,12 @@ services:
|
|||||||
OCR_CHAR_THRESHOLD: "100"
|
OCR_CHAR_THRESHOLD: "100"
|
||||||
OCR_PORT: "8765"
|
OCR_PORT: "8765"
|
||||||
OCR_MAX_PAGES: "0"
|
OCR_MAX_PAGES: "0"
|
||||||
OCR_LANG: "tha+eng" # Tesseract language code (Thai + English)
|
# ─── Security (ADR-040 Phase 1) ─────────────────────────────────
|
||||||
USE_GPU: "false" # the OCR sidecar runs on CPU; np-dms-ocr uses a separate Ollama instance
|
# OCR_SIDECAR_API_KEY: read from the .env file (never hardcode it in compose)
|
||||||
|
OCR_SIDECAR_API_KEY: ${OCR_SIDECAR_API_KEY}
|
||||||
|
# ─── Adaptive OCR Residency (ADR-040 Phase 4) ───────────────────
|
||||||
|
# OCR_ACTIVE_PROFILE: profile name in ai_execution_profiles (the default is used when unset)
|
||||||
|
OCR_ACTIVE_PROFILE: ${OCR_ACTIVE_PROFILE:-}
|
||||||
# ─── OCR via Ollama (ADR-034) ───────────────────────────────────
|
# ─── OCR via Ollama (ADR-034) ───────────────────────────────────
|
||||||
# Point directly at Ollama (port 11434), bypassing the metrics proxy
|
# Point directly at Ollama (port 11434), bypassing the metrics proxy
|
||||||
# (the proxy does not forward /api/generate correctly, which yields empty responses)
|
# (the proxy does not forward /api/generate correctly, which yields empty responses)
|
||||||
|
|||||||
+1
-2
@@ -5,14 +5,13 @@
|
|||||||
# - 2026-05-30: Added opencv-python for image preprocessing (threshold, denoise) to improve OCR accuracy
|
# - 2026-05-30: Added opencv-python for image preprocessing (threshold, denoise) to improve OCR accuracy
|
||||||
# - 2026-06-11: Added typhoon-ocr for prepare_ocr_messages (the official prompt builder for typhoon-ocr1.5-3b)
|
# - 2026-06-11: Added typhoon-ocr for prepare_ocr_messages (the official prompt builder for typhoon-ocr1.5-3b)
|
||||||
# - 2026-06-11: ตัด pytesseract, opencv-python, numpy ออก — ไม่ใช้ Tesseract อีกต่อไป
|
# - 2026-06-11: ตัด pytesseract, opencv-python, numpy ออก — ไม่ใช้ Tesseract อีกต่อไป
|
||||||
|
# - 2026-06-20: ADR-040 Phase 8 — ตัด pythainlp และ Pillow ออก (ไม่มี /normalize endpoint แล้ว, process_ocr ใช้ prepare_ocr_messages)
|
||||||
|
|
||||||
PyMuPDF==1.24.0
|
PyMuPDF==1.24.0
|
||||||
fastapi==0.111.0
|
fastapi==0.111.0
|
||||||
uvicorn[standard]==0.30.1
|
uvicorn[standard]==0.30.1
|
||||||
python-multipart==0.0.9
|
python-multipart==0.0.9
|
||||||
pythainlp==5.0.4
|
|
||||||
httpx==0.27.0
|
httpx==0.27.0
|
||||||
Pillow==10.0.0
|
|
||||||
FlagEmbedding>=1.2.0
|
FlagEmbedding>=1.2.0
|
||||||
typhoon-ocr>=0.4.1
|
typhoon-ocr>=0.4.1
|
||||||
|
|
||||||
|
|||||||
+1
-1
@@ -17,7 +17,7 @@ class OcrResidencyDecision:
|
|||||||
|
|
||||||
def calculate_ocr_residency(active_profile: str = None) -> OcrResidencyDecision:
|
def calculate_ocr_residency(active_profile: str = None) -> OcrResidencyDecision:
|
||||||
"""
|
"""
|
||||||
คำนวณ keep_alive สำหรับ Typhoon OCR จาก VRAM headroom และ active profile ของโมเดลหลัก
|
คำนวณ keep_alive สำหรับ np-dms-ocr จาก VRAM headroom และ active profile ของโมเดลหลัก
|
||||||
"""
|
"""
|
||||||
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
|
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
|
||||||
residency_window = int(os.getenv("OCR_RESIDENCY_WINDOW_SECONDS", "120"))
|
residency_window = int(os.getenv("OCR_RESIDENCY_WINDOW_SECONDS", "120"))
|
||||||
|
|||||||
@@ -0,0 +1,210 @@
|
|||||||
|
<!-- File: specs/06-Decision-Records/ADR-040-ocr-sidecar-refactor.md -->
|
||||||
|
<!-- Change Log
|
||||||
|
- 2026-06-20: Created initial ADR-040 documenting OCR sidecar refactor decisions.
|
||||||
|
- Supersedes ADR-033 §7 (X-API-Key sidecar auth) in favor of network isolation.
|
||||||
|
- Preserves resolved GPU policies (Adaptive Residency, CPU Fallback, LLM-First Ownership).
|
||||||
|
- Aligns with ADR-036 Profile-Only Parameter Governance.
|
||||||
|
- References ADR-041 for server consolidation enabling network-only auth.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# ADR-040: OCR Sidecar Refactor — Pure Compute Worker, Preserved GPU Policy, Network-Trust Boundary
|
||||||
|
|
||||||
|
**Status:** Proposed
|
||||||
|
**Date:** 2026-06-20
|
||||||
|
**Supersedes:** ADR-033 §7 (X-API-Key sidecar auth)
|
||||||
|
**Amends:** ADR-036 §5 (sidecar contract), ADR-034 (model identity unchanged)
|
||||||
|
**Related Documents:**
|
||||||
|
- [ADR-016: Security & Authentication](./ADR-016-security-authentication.md)
|
||||||
|
- [ADR-008: Email Notification Strategy](./ADR-008-email-notification-strategy.md)
|
||||||
|
- [ADR-029: Dynamic Prompt Management](./ADR-029-dynamic-prompt-management.md)
|
||||||
|
- [ADR-037: Active Prompt System](./ADR-037-active-prompt-system.md)
|
||||||
|
- [ADR-035: AI Pipeline & OCR Integration](./ADR-035-ai-pipeline-ocr-integration.md)
|
||||||
|
- [ADR-041: Server Consolidation](./ADR-041-server-consolidation.md)
|
||||||
|
- [CONTEXT.md](../../00-overview/CONTEXT.md)
|
||||||
|
- [OCR Sidecar Refactor Plan - Claude](../../../docs/ocr-sidecar-refactor-plan-cluade.md)
|
||||||
|
- [OCR Sidecar Refactor Plan - Qwen](../../../docs/ocr-sidecar-refactor-plan-qwen.md)
|
||||||
|
|
||||||
|
> **Note:** ADR numbers 038–039 are intentionally reserved/skipped.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 Context and Problem Statement
|
||||||
|
|
||||||
|
### Current Architecture
|
||||||
|
|
||||||
|
The OCR sidecar on Desk-5439 (RTX 5060 Ti 16GB) runs as a FastAPI HTTP service that provides:
- `/ocr` - extracts text from PDFs through Typhoon OCR (np-dms-ocr via Ollama)
- `/embed` - creates vector embeddings through BGE-M3
- `/rerank` - reranks retrieval results through FlagReranker
- `/normalize` - normalizes Thai text (used by ThaiPreprocessProcessor)
|
||||||
|
|
||||||
|
### Problems Identified
|
||||||
|
|
||||||
|
A review of the two refactor plans (Claude + Qwen) surfaced the following problems:

1. **Security Bug:** a hardcoded default API key (`lcbp3-dms-ocr-sidecar-secure-token-2026`) in `app.py`; if it leaks, it cannot be rotated without rebuilding the container
2. **Synchronous Blocking I/O:** `process_ocr` uses a synchronous `httpx.Client`, which blocks FastAPI's event loop
3. **Deprecated Startup Pattern:** uses `@app.on_event("startup")` instead of a `lifespan` context manager
4. **Hardcoded keep_alive:** `process_ocr` forces `keep_alive: 0` and never calls `calculate_ocr_residency()` from `residency_policy.py`, so the Adaptive OCR Residency policy has no effect
5. **Hardcoded Runtime Parameters:** `temperature`, `top_p`, `repeat_penalty`, and `max_tokens` are hardcoded in the sidecar instead of being read from `ai_execution_profiles` (ADR-036 Profile-Only Parameter Governance)
6. **Path Traversal Vulnerability:** the `/ocr` endpoint opens the file at `req.pdfPath` with no canonicalization or whitelist, risking arbitrary file read (ADR-016)
7. **Cross-Host Trust Gap:** the sidecar currently runs on Desk-5439 (192.168.10.100) and the backend on QNAP (192.168.10.8), so the "Docker internal network" assumption is false and isolation must rely on VLAN/firewall ACLs
8. **Mutable Default Argument:** `process_with_typhoon_ocr(pdf_path, ..., options_override={})` is a Python anti-pattern
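
To make the mutable default argument problem concrete, here is a minimal sketch. The function names `build_options_buggy` and `build_options_fixed` are hypothetical stand-ins, not the sidecar's actual code; they only illustrate why a `{}` default leaks state between calls and how a `None` default avoids it.

```python
from typing import Optional


def build_options_buggy(page: int, options_override: dict = {}) -> dict:
    # Bug: the default dict is created once and shared by every call.
    options_override.setdefault("pages", []).append(page)
    return options_override


def build_options_fixed(page: int, options_override: Optional[dict] = None) -> dict:
    # Copy so that neither a shared default nor the caller's dict is mutated.
    options = dict(options_override or {})
    options["pages"] = list(options.get("pages", [])) + [page]
    return options


first_buggy = list(build_options_buggy(1)["pages"])
second_buggy = list(build_options_buggy(2)["pages"])  # still carries page 1
first_fixed = build_options_fixed(1)["pages"]
second_fixed = build_options_fixed(2)["pages"]
```

The fixed variant trades a small copy per call for isolation between requests, which matters in a long-running service.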
|
||||||
|
|
||||||
|
### Conflict with Canonical Specs
|
||||||
|
|
||||||
|
The review of both plans found that:
- **Claude** assumes `np-dms-ai = llama3.2 3B (~2–3GB)`, but ADR-034/CONTEXT specify that the `np-dms-ai` runtime is Typhoon-2.5 (~7–8B), so its VRAM budget is wrong
- **Both plans** propose deleting `vram_monitor.py` / `residency_policy.py` and forcing BGE+Reranker to be GPU-resident, which violates the LLM-First GPU Ownership and CPU Fallback Retrieval policies already resolved in CONTEXT.md
- **Both plans** treat `keep_alive` as a fixed config value, which violates ADR-036 Gap-2 (keep_alive = lazy resource param via residency policy)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ⚙️ Decision Drivers
|
||||||
|
|
||||||
|
* **Preserve Resolved GPU Policy:** Adaptive OCR Residency + CPU Fallback Retrieval + LLM-First GPU Ownership (CONTEXT.md)
* **Profile-Only Parameter Governance:** AI model parameters (temperature, top_p, keep_alive) must come from the `ai_execution_profiles` row `ocr-extract` (ADR-036)
* **Security (ADR-016):** Path traversal hardening, no hardcoded secrets
* **Network Trust Boundary:** Server consolidation (ADR-041) makes Docker-internal isolation actually achievable
* **No Invented Orchestration:** do not create new `VramMutexService`, `GpuTaskQueue`, or `PromptBuilderService` classes; use the existing services and Active Prompt per ADR-008, ADR-029/037
* **ADR-023A Boundary:** the AI sidecar must not access the DB or storage directly
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🏛️ Decisions
|
||||||
|
|
||||||
|
### D1: Sidecar as Pure Compute Worker
|
||||||
|
The sidecar acts purely as a compute worker; orchestration, parameter governance, and business logic stay in the backend (existing services).
- **Reject:** creating new `PromptBuilderService`, `OcrNoiseFilterService`, or `OcrOrchestratorService` classes (Qwen plan)
- **Fast-path decision** (PyMuPDF chars > 100 → fast path): stays in the sidecar
- **Page range calculation:** moves to the backend
- **Engine selection:** no longer needed; np-dms-ocr (Typhoon OCR) is the only engine
- **systemPrompt validation** (checking placeholders such as `{{ocr_text}}`): backend
|
||||||
|
|
||||||
|
### D2: Remove /normalize Endpoint
|
||||||
|
- **Remove the `/normalize` endpoint** from the sidecar
- **Use np-dms-ocr (OCR) only:** the sidecar does not support Thai normalization
- ThaiPreprocessProcessor is unused, so no backend change is needed
|
||||||
|
|
||||||
|
### D3: Async I/O + Lifespan + Shared AsyncClient
|
||||||
|
- `process_ocr` → `async def`
|
||||||
|
- Use a shared `httpx.AsyncClient` created in the lifespan context manager
- Replace `@app.on_event("startup")` with an `@asynccontextmanager` lifespan
- Load models through `asyncio.to_thread` so startup is not blocked
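
The lifespan plus `asyncio.to_thread` pattern can be sketched with the standard library alone. This is an illustration under stated assumptions: `load_model_blocking` is a hypothetical stand-in for the real model loaders, and the plain `state` dict stands in for the FastAPI application object that a real lifespan function would receive.

```python
import asyncio
import time
from contextlib import asynccontextmanager


def load_model_blocking(name: str) -> dict:
    """Hypothetical stand-in for a slow, blocking model load."""
    time.sleep(0.05)
    return {"name": name, "ready": True}


@asynccontextmanager
async def lifespan(state: dict):
    # Run the blocking loads in worker threads so the event loop stays responsive.
    embedder, reranker = await asyncio.gather(
        asyncio.to_thread(load_model_blocking, "bge-m3"),
        asyncio.to_thread(load_model_blocking, "bge-reranker"),
    )
    state["models"] = {"embedder": embedder, "reranker": reranker}
    try:
        yield state
    finally:
        # Release shared resources on shutdown.
        state.clear()


async def main() -> dict:
    state: dict = {}
    async with lifespan(state) as running:
        return dict(running["models"])


models = asyncio.run(main())
```

In the real service the shared HTTP client would be created and closed in the same place, so that every request reuses one connection pool.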
|
||||||
|
|
||||||
|
### D4: keep_alive via calculate_ocr_residency() (Lazy, ADR-036 Gap-2)
|
||||||
|
- Wire `calculate_ocr_residency(active_profile)` into `process_ocr`
- Do not use a fixed value (Claude: 300, Qwen: 0/10m)
- **Do not accept** an explicit `options_override["keep_alive"]` from the backend; keep_alive is a lazy resource parameter computed only at process time (ADR-036 Gap-2)
- **Reject:** deleting `vram_monitor.py` / `residency_policy.py`
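
A rough sketch of how a lazily computed keep_alive could be wired into the request payload follows. The decision rule here (keep the model resident for the window when free VRAM meets the threshold, otherwise unload) is an assumption for illustration and is not the actual logic of `calculate_ocr_residency()`; `decide_keep_alive` and `build_generate_payload` are hypothetical names. Only the two environment variable names and their defaults come from `residency_policy.py`.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResidencySketch:
    keep_alive_seconds: int
    reason: str


def decide_keep_alive(free_vram_mb: float) -> ResidencySketch:
    # Thresholds are read at call time, so the value is never pinned at startup.
    threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
    window = int(os.getenv("OCR_RESIDENCY_WINDOW_SECONDS", "120"))
    if free_vram_mb >= threshold_mb:
        return ResidencySketch(window, "enough headroom, keep model resident")
    return ResidencySketch(0, "low headroom, unload after the request")


def build_generate_payload(options_override: Optional[dict], free_vram_mb: float) -> dict:
    options = dict(options_override or {})
    # A backend-supplied keep_alive is dropped; the sidecar computes its own.
    options.pop("keep_alive", None)
    decision = decide_keep_alive(free_vram_mb)
    return {"options": options, "keep_alive": decision.keep_alive_seconds}
```

Computing the value per request keeps the residency decision tied to the VRAM state at process time, which is the point of treating it as a lazy parameter.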
|
||||||
|
|
||||||
|
### D5: Retain vram_monitor + CPU-Fallback for /embed, /rerank
|
||||||
|
- **Reject:** forcing BGE-M3 + Reranker to stay GPU-resident permanently
- **Keep:** dynamic CPU/GPU selection through the `.to(device)` logic
- This implements LLM-First GPU Ownership + CPU Fallback Retrieval
|
||||||
|
|
||||||
|
### D6: Remove Hardcoded Default Key; Auth = Network Isolation (2-Phase)
|
||||||
|
- **Phase 1** (before consolidation): remove the hardcoded default `OCR_SIDECAR_API_KEY`; fail fast if the env variable is missing
- **Phase 2** (after consolidation): **Supersedes ADR-033 §7**; remove `X-API-Key` validation from the sidecar endpoints and from the backend send-side
- **Network Isolation:** enforced through the Docker-internal network (post-consolidation) or a VLAN/firewall ACL (interim cross-host)
- **Sequencing:** remove `X-API-Key` only once the ADR-041 cutover is complete (single Docker host)
- **Interim Period:** between Phase 1 and Phase 2, the sidecar and backend must **continue** to validate and send `X-API-Key`
- Rotate the leaked key before cutover
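
The Phase 1 behaviour (no default key, fail fast, keep validating the header) might look like the sketch below. `load_api_key` and `is_authorized` are hypothetical helper names; the environment mapping is passed in explicitly so the example runs without touching the real process environment.

```python
import hmac
from typing import Mapping, Optional


def load_api_key(env: Mapping[str, str]) -> str:
    # No fallback value: a missing key must stop the service at startup.
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OCR_SIDECAR_API_KEY is not set; refusing to start")
    return key


def is_authorized(header_value: Optional[str], expected_key: str) -> bool:
    if not header_value:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(header_value.encode(), expected_key.encode())
```

At startup the service would call `load_api_key(os.environ)` once, and each Phase 1 request handler would reject with 401 when `is_authorized` returns `False`.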
|
||||||
|
|
||||||
|
### D7: Path Canonicalization + Base-Path Whitelist on /ocr
|
||||||
|
- Canonicalize `pdfPath` with `os.path.abspath()` + `os.path.realpath()`
- Whitelist base path = `OCR_SIDECAR_UPLOAD_BASE` (CIFS mount base)
- Reject paths that are not under the base path → 403 Forbidden
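
The canonicalization and whitelist check can be sketched as follows. `resolve_pdf_path` and `PathNotAllowed` are hypothetical names; the caller is assumed to map the exception to a 403 response. The check uses `os.path.commonpath` rather than a string prefix test so that a sibling directory such as `/mnt/uploads-evil` is not mistaken for a child of `/mnt/uploads`.

```python
import os


class PathNotAllowed(Exception):
    """Raised when a path escapes the upload base; map this to HTTP 403."""


def resolve_pdf_path(pdf_path: str, upload_base: str) -> str:
    # realpath resolves symlinks and ".." segments for both sides of the check.
    base = os.path.realpath(upload_base)
    candidate = os.path.realpath(os.path.join(base, pdf_path))
    if os.path.commonpath([base, candidate]) != base:
        raise PathNotAllowed(pdf_path)
    return candidate
```

In the sidecar, `upload_base` would come from `OCR_SIDECAR_UPLOAD_BASE`, and the returned canonical path is the one that should actually be opened.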
|
||||||
|
|
||||||
|
### D8: Runtime Params from Job Snapshot (ocr-extract row)
|
||||||
|
- **Backend** resolves params from `ai_execution_profiles` (row `ocr-extract` for OCR, profile for the LLM)
- **Backend** sends the params (`temperature`, `top_p`, `repeat_penalty`, `max_tokens`) to the sidecar
- **Sidecar** receives the params from the backend and forwards them to Ollama (on every load/generate call)
- No hardcoded defaults in the sidecar
- The Modelfile serves only as a last-resort fallback
- Aligns with ADR-036 Profile-Only Parameter Governance
|
||||||
|
|
||||||
|
### D9: DMS Tags + SystemPrompt from Active Prompt
|
||||||
|
- **Backend** resolves the systemPrompt from the Active Prompt in `ai_prompts` (ADR-029/037)
- **Backend** resolves the DMS extraction tags (`<document_number>`, `<document_date>`, `<received_date>`) from the Active Prompt
- **Backend** sends both the systemPrompt and the DMS tags to the sidecar
- **Sidecar** receives the systemPrompt and DMS tags from the backend and forwards them to Ollama (on every load/generate call)
- **Reject:** creating a new `PromptBuilderService` as the prompt authority
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Implementation Tasks
|
||||||
|
|
||||||
|
### Phase 1: Before ADR-041 Consolidation (X-API-Key retained)

| Task ID | Component | Summary | Status |
| :--- | :--- | :--- | :--- |
| T001 | Sidecar | Remove hardcoded default API key (fail-fast if env missing) | Pending |
| T002 | Sidecar | Fix mutable default arg `options_override={}` | Pending |
| T003 | Sidecar | Remove duplicate `import tempfile` | Pending |
| T004 | Sidecar | Refactor to async I/O + shared AsyncClient | Pending |
| T005 | Sidecar | Replace `@app.on_event("startup")` with lifespan | Pending |
| T006 | Sidecar | Wire `calculate_ocr_residency()` into `process_ocr` | Pending |
| T007 | Sidecar | Path canonicalization + base-path whitelist on `/ocr` | Pending |
| T008 | Sidecar | Remove hardcoded runtime params (use from job snapshot) | Pending |
| T009 | Sidecar | Receive systemPrompt + DMS tags from backend, pass to Ollama | Pending |
| T010 | Sidecar | Remove `/normalize` endpoint (D2) | Pending |
| T011 | Backend | Send runtime params from `ai_execution_profiles` snapshot to sidecar | Pending |
| T012 | Backend | Wire Active Prompt injection for DMS tags + systemPrompt | Pending |
| T013 | Tests | Pytest for path-traversal (403) | Pending |
| T014 | Tests | Unit check for residency wiring | Pending |
|
||||||
|
|
||||||
|
### Phase 2: After ADR-041 Consolidation (X-API-Key removed)

| Task ID | Component | Summary | Status |
| :--- | :--- | :--- | :--- |
| T016 | Sidecar | Remove `X-API-Key` validation from endpoints | Pending (ADR-041 cutover) |
| T017 | Backend | Remove `X-API-Key` send-side in `OcrService` | Pending (ADR-041 cutover) |
| T018 | Backend | Remove `X-API-Key` send-side in `SandboxOcrEngineService` | Pending (ADR-041 cutover) |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Consequences
|
||||||
|
|
||||||
|
### Positive
|
||||||
|
|
||||||
|
* **OOM Safety Retained:** keeps Adaptive OCR Residency + CPU Fallback Retrieval, which guards against VRAM exhaustion
* **Spec-Consistent:** consistent with ADR-036, ADR-029/037, CONTEXT.md
* **Smaller Sidecar Surface:** pure compute worker with no business logic or parameter governance
* **Security Hardened:** Path traversal fix, no hardcoded secrets
* **Performance:** async I/O reduces blocking; a shared AsyncClient reduces connection overhead
|
||||||
|
|
||||||
|
### Negative
|
||||||
|
|
||||||
|
* **Lose Defense-in-Depth Auth:** removing `X-API-Key` leaves network isolation as the only control; mitigated by ACL/bridge network
* **Cross-Host Firewall Rule Mandatory:** in the current topology (before consolidation) a VLAN/firewall ACL is required as an interim constraint
* **Migration Complexity:** the sequencing of auth removal must stay in sync with the ADR-041 cutover
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚫 Out of Scope (Future ADR)
|
||||||
|
|
||||||
|
* 1-page-1-request horizontal scaling rework (Qwen 2.7): needs a separate spec + load evidence
* OpenTelemetry/Prometheus/Grafana observability (Qwen 4.4–4.5): separate ticket
* **/normalize endpoint:** already removed from the sidecar (D2); ThaiPreprocessProcessor is unused
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔄 Rollback Plan
|
||||||
|
|
||||||
|
* Revert `app.py` to the pre-refactor version
* Restore the `X-API-Key` send-side in `OcrService` and `SandboxOcrEngineService`
* Re-pin the `keep_alive` default to `0` in `process_ocr`
* Restore the hardcoded runtime params (if an emergency fallback is needed)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📝 Verification Plan
|
||||||
|
|
||||||
|
1. Confirm backend send-side `X-API-Key` locations:
   - `backend/src/modules/ai/services/ocr.service.ts`
   - `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`
2. Confirm via grep that `calculate_ocr_residency` is not called in `app.py` before claiming the gap
3. ✅ Confirmed: no consumer uses the `/normalize` endpoint (grep finds no usage in the backend)
4. Pytest for path-traversal (expect 403)
5. Unit test for residency wiring
|
||||||
@@ -0,0 +1,336 @@
|
|||||||
|
<!-- File: specs/06-Decision-Records/ADR-041-server-consolidation.md -->
|
||||||
|
<!-- Change Log
|
||||||
|
- 2026-06-20: Created initial ADR-041 documenting server consolidation decision.
|
||||||
|
- Co-locate all services on single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB).
|
||||||
|
- QNAP remains NAS for uploads/permanent storage via CIFS.
|
||||||
|
- Enables ADR-040 network-only auth for sidecar via Docker-internal isolation.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# ADR-041: Single-Host Server Consolidation
|
||||||
|
|
||||||
|
**Status:** Proposed
|
||||||
|
**Date:** 2026-06-20
|
||||||
|
**Related Documents:**
|
||||||
|
- [ADR-040: OCR Sidecar Refactor](./ADR-040-ocr-sidecar-refactor.md)
|
||||||
|
- [ADR-016: Security & Authentication](./ADR-016-security-authentication.md)
|
||||||
|
- [ADR-023A: Unified AI Architecture](./ADR-023A-unified-ai-architecture.md)
|
||||||
|
- [ADR-034: AI Model Change](./ADR-034-AI-model-change.md)
|
||||||
|
- [CONTEXT.md](../../00-overview/CONTEXT.md)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 Context and Problem Statement
|
||||||
|
|
||||||
|
### Current Architecture
|
||||||
|
|
||||||
|
LCBP3-DMS currently spreads its services across several machines:

| Service | Host | Hardware | Network |
|---------|------|----------|---------|
| Ollama (np-dms-ai, np-dms-ocr, nomic-embed) | Desk-5439 | RTX 4060 Ti 16GB | VLAN 10 (192.168.10.100) |
| OCR Sidecar (FastAPI) | Desk-5439 | Same as above | VLAN 10 (192.168.10.100) |
| Backend (NestJS) | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Frontend (Next.js) | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Redis | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| MariaDB | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Elasticsearch | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| File Storage | QNAP NAS | - | CIFS share `np-dms-as` |
|
||||||
|
|
||||||
|
### Problems Identified
|
||||||
|
|
||||||
|
1. **Cross-Host Trust Boundary:** Backend ↔ sidecar/Ollama traffic crosses the LAN (VLAN 10), so isolation depends on VLAN/firewall ACLs (ADR-040 §4)
2. **Management Complexity:** services are spread across 2 hosts, which complicates deployment, monitoring, and troubleshooting
3. **GPU Resource Fragmentation:** Desk-5439 has a GPU but little CPU/RAM, so it cannot run the backend
4. **Network Latency:** Backend ↔ Ollama over the LAN adds latency to AI inference
5. **Hardware Underutilization:** the QNAP NAS has CPU/RAM but no GPU, so it cannot run AI models
|
||||||
|
|
||||||
|
### New Hardware
|
||||||
|
|
||||||
|
A new server is available:
|
||||||
|
- **CPU:** Ryzen 5 5600 (6 cores / 12 threads)
|
||||||
|
- **RAM:** 32GB DDR4
|
||||||
|
- **GPU:** RTX 5060 Ti 16GB
|
||||||
|
- **Storage:** SSD (OS) + HDD (data)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ⚙️ Decision Drivers
|
||||||
|
|
||||||
|
* **Simplify Architecture:** reduce the number of hosts from 2 → 1
* **Enable Docker-Internal Isolation:** sidecar + backend share one Docker bridge → real network-level auth (ADR-040 D6)
* **Better Resource Utilization:** a single host has CPU, RAM, and GPU in one machine
* **Reduce Network Latency:** Backend ↔ Ollama traffic stays on one host instead of crossing the LAN
* **Maintain Data Separation:** QNAP remains the NAS for file storage
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🏛️ Decisions
|
||||||
|
|
||||||
|
### D1: Co-locate All Services on Single Docker Host
|
||||||
|
Move all services to the new server:
- Ollama (np-dms-ai, np-dms-ocr, nomic-embed)
- OCR Sidecar (FastAPI)
- Backend (NestJS)
- Frontend (Next.js)
- Redis
- MariaDB
- Elasticsearch

**Retire Desk-5439** after a successful cutover
|
||||||
|
|
||||||
|
### D2: ASUSTOR as Primary NAS, QNAP as Backup
|
||||||
|
QNAP (192.168.10.8) is reduced to a backup server only.

ASUSTOR (192.168.10.9) becomes the primary NAS for:
- Upload temp storage (`/data/uploads/temp`)
- Permanent file storage (`/data/uploads/permanent`)
- The CIFS share `np-dms-as`, mounted on the new host as:
  - `/mnt/uploads/temp` → `//192.168.10.9/np-dms-as/data/uploads/temp`
  - `/mnt/uploads/permanent` → `//192.168.10.9/np-dms-as/data/uploads/permanent`
|
||||||
|
|
||||||
|
### D3: Docker-Internal Network Only for Sidecar/Ollama
|
||||||
|
- The sidecar and Ollama **do not publish ports to the LAN** (use `expose` instead of `ports`)
- Services sit on an internal Docker bridge network (`dms-internal`)
- The backend reaches the sidecar and Ollama by service name: `http://ocr-sidecar:8765` and `http://ollama:11434`
- The frontend reaches the backend at `http://backend:3000`
- Only the frontend and backend publish ports to the LAN (80, 443, 3000)

**Enables ADR-040 D6:** network isolation through the Docker-internal bridge → `X-API-Key` can actually be removed
|
||||||
|
|
||||||
|
### D4: GPU VRAM Management Reinforced
|
||||||
|
The RTX 5060 Ti 16GB must accommodate:
- `np-dms-ai` (Typhoon-2.5 ~7–8B) ~6–8GB
- `np-dms-ocr` (Typhoon OCR) ~5GB
- `nomic-embed-text` ~0.5GB
- BGE-M3 + Reranker (if GPU-resident) ~4.5GB
- CUDA overhead ~1.5GB

**Total ≈ 17.5–19.5GB (sum of the estimates above), more than the 16GB available → OOM risk if everything is loaded at once**

**Mandatory:**
- ADR-040 D4 (Adaptive OCR Residency via `calculate_ocr_residency()`)
- ADR-040 D5 (CPU Fallback Retrieval for embed/rerank)
- LLM-First GPU Ownership (CONTEXT.md)
- Do not force BGE+Reranker to stay GPU-resident permanently
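
As a quick check of the budget, the sketch below sums the per-component estimates listed above as (low, high) ranges and compares them with the 16GB card. The dictionary keys and the `total_range` helper are illustrative names only; the figures are the ones from the list.

```python
# Per-component VRAM estimates in GB as (low, high), taken from the list above.
VRAM_ESTIMATES_GB = {
    "np-dms-ai": (6.0, 8.0),
    "np-dms-ocr": (5.0, 5.0),
    "nomic-embed-text": (0.5, 0.5),
    "bge-m3+reranker": (4.5, 4.5),
    "cuda-overhead": (1.5, 1.5),
}
GPU_CAPACITY_GB = 16.0


def total_range(estimates: dict, skip: tuple = ()) -> tuple:
    """Sum the low and high estimates, leaving out any component named in skip."""
    low = sum(lo for name, (lo, hi) in estimates.items() if name not in skip)
    high = sum(hi for name, (lo, hi) in estimates.items() if name not in skip)
    return low, high


all_resident = total_range(VRAM_ESTIMATES_GB)
retrieval_on_cpu = total_range(VRAM_ESTIMATES_GB, skip=("bge-m3+reranker",))
fits_all = all_resident[1] <= GPU_CAPACITY_GB
fits_with_cpu_fallback = retrieval_on_cpu[1] <= GPU_CAPACITY_GB
```

With every component resident the range is 17.5 to 19.5GB, which does not fit. Moving BGE-M3 and the reranker to the CPU brings it to 13 to 15GB, which is why the CPU fallback and adaptive residency policies are listed as mandatory.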
|
||||||
|
|
||||||
|
### D5: RAM Budget Considerations
|
||||||
|
The 32GB of RAM must accommodate:
- Node.js (Frontend) ~500MB
- NestJS (Backend) ~1–2GB
- MariaDB ~4–8GB (depends on dataset size)
- Redis ~500MB
- Elasticsearch ~2–4GB (depends on index size)
- Python (Sidecar) ~500MB
- Ollama ~1–2GB
- BGE/Reranker CPU-fallback tensors ~2–4GB

**Action Items:**
- Size DB/ES/Redis memory limits before cutover
- Monitor RAM usage after cutover
- Consider swap space if needed
|
||||||
|
|
||||||
|
### D6: Single Point of Failure (SPOF) Mitigation
|
||||||
|
Single host = SPOF risk

**Mitigation:**
- Regular backups of the database and file storage (QNAP)
- A disaster recovery plan for hardware failure
- Consider a cold standby or failover strategy in the future
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Implementation Tasks
|
||||||
|
|
||||||
|
| Task ID | Phase | Summary | Status |
| :--- | :--- | :--- | :--- |
| T001 | Provision | Install Docker + Docker Compose on new host | Pending |
| T002 | Provision | Mount CIFS share from ASUSTOR to `/mnt/uploads` | Pending |
| T003 | Deploy | Create `docker-compose.yml` for new host topology | Pending |
| T004 | Deploy | Configure internal bridge network (`dms-internal`) | Pending |
| T005 | Deploy | Deploy services (Ollama, sidecar, backend, frontend, Redis, DB, ES) | Pending |
| T006 | Migrate | Migrate MariaDB data from QNAP to new host | Pending |
| T007 | Migrate | Migrate Elasticsearch indices from QNAP to new host | Pending |
| T008 | Cutover | Update DNS/load balancer to point to new host | Pending |
| T009 | Cutover | Run smoke tests on new host | Pending |
| T010 | ADR-040 | Remove `X-API-Key` from sidecar + backend (ADR-040 D6) | Pending |
| T011 | Cleanup | Stop services on QNAP (QNAP becomes backup server) | Pending |
| T012 | Cleanup | Retire Desk-5439 | Pending |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Target docker-compose Layout (Draft)
|
||||||
|
|
||||||
|
```yaml
version: '3.8'

networks:
  dms-internal:
    driver: bridge
  dms-frontend:
    driver: bridge

services:
  # GPU Services (internal-only, no LAN publish)
  ollama:
    image: ollama/ollama:latest
    container_name: lcbp3-ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ollama_models:/root/.ollama
    networks:
      - dms-internal
    expose:
      - "11434"
    environment:
      - OLLAMA_KEEP_ALIVE=-1

  ocr-sidecar:
    build:
      context: ./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar
    container_name: lcbp3-ocr-sidecar
    restart: unless-stopped
    volumes:
      - asustor_uploads:/mnt/uploads:ro  # Read-only CIFS mount from ASUSTOR
    networks:
      - dms-internal
    expose:
      - "8765"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_URL=http://ollama:11434
      - OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads

  # Backend Services (internal-only)
  backend:
    build:
      context: ./backend
    container_name: lcbp3-backend
    restart: unless-stopped
    volumes:
      - asustor_uploads:/app/uploads:ro
    networks:
      - dms-internal
      - dms-frontend
    expose:
      - "3000"
    depends_on:
      - ollama
      - ocr-sidecar
      - redis
      - mariadb
      - elasticsearch
    environment:
      - OCR_API_URL=http://ocr-sidecar:8765
      - OLLAMA_API_URL=http://ollama:11434

  # Frontend (LAN publish)
  frontend:
    build:
      context: ./frontend
    container_name: lcbp3-frontend
    restart: unless-stopped
    networks:
      - dms-frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend

  # Data Services
  redis:
    image: redis:7-alpine
    container_name: lcbp3-redis
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - redis_data:/data

  mariadb:
    image: mariadb:10.11
    container_name: lcbp3-mariadb
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - mariadb_data:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_ROOT_PASSWORD}
      - MYSQL_DATABASE=lcbp3

  elasticsearch:
    image: elasticsearch:8.11.0
    container_name: lcbp3-elasticsearch
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - es_data:/usr/share/elasticsearch/data
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false

volumes:
  ollama_models:
  asustor_uploads:
    driver: local
    driver_opts:
      type: cifs
      o: "username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0"
      device: "//192.168.10.9/np-dms-as/data/uploads"
  redis_data:
  mariadb_data:
  es_data:
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Consequences
|
||||||
|
|
||||||
|
### Positive
|
||||||
|
|
||||||
|
* **Simplified Architecture:** Single host → easier deployment, monitoring, troubleshooting
* **True Network Isolation:** Docker-internal bridge enables ADR-040 D6 (network-only auth)
* **Reduced Latency:** Backend ↔ Ollama traffic stays on one host
* **Better Resource Utilization:** a single host has CPU, RAM, and GPU
* **Data Separation Maintained:** ASUSTOR is the primary NAS, so data stays separate from compute; QNAP is the backup server
|
||||||
|
|
||||||
|
### Negative
|
||||||
|
|
||||||
|
* **SPOF Risk:** Single host = single point of failure
* **RAM Pressure:** 32GB must cover all services + CPU-fallback tensors
* **Migration Complexity:** DB + ES + file paths must be migrated
* **GPU VRAM Pressure:** 16GB relies on adaptive residency + CPU fallback
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔄 Rollback Plan
|
||||||
|
|
||||||
|
1. Stop services on the new host
2. Restore services on QNAP (backend, frontend, Redis, DB, ES)
3. Restore services on Desk-5439 (Ollama, sidecar)
4. Revert DNS/load balancer to QNAP
5. Point the CIFS mount on QNAP back to ASUSTOR (192.168.10.9)
6. Restore `X-API-Key` in the sidecar + backend (ADR-040 rollback)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📝 Verification Plan
|
||||||
|
|
||||||
|
1. Smoke tests on the new host:
   - Backend health check
   - Frontend accessible via LAN
   - OCR endpoint functional
   - AI inference functional
   - File upload/download via CIFS
2. Monitor RAM/VRAM usage for 24–48 hours after cutover
3. Verify that ADR-040 D6 (network-only auth) actually works
4. Verify that ADR-040 D4/D5 (adaptive residency + CPU fallback) actually work
|
||||||
@@ -0,0 +1,34 @@
|
|||||||
|
# Specification Quality Checklist: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Purpose**: Validate specification completeness and quality before proceeding to planning
|
||||||
|
**Created**: 2026-06-20
|
||||||
|
**Feature**: [spec.md](../spec.md)
|
||||||
|
|
||||||
|
## Content Quality
|
||||||
|
|
||||||
|
- [x] No implementation details (languages, frameworks, APIs)
|
||||||
|
- [x] Focused on user value and business needs
|
||||||
|
- [x] Written for non-technical stakeholders
|
||||||
|
- [x] All mandatory sections completed
|
||||||
|
|
||||||
|
## Requirement Completeness
|
||||||
|
|
||||||
|
- [x] No [NEEDS CLARIFICATION] markers remain
|
||||||
|
- [x] Requirements are testable and unambiguous
|
||||||
|
- [x] Success criteria are measurable
|
||||||
|
- [x] Success criteria are technology-agnostic (no implementation details)
|
||||||
|
- [x] All acceptance scenarios are defined
|
||||||
|
- [x] Edge cases are identified
|
||||||
|
- [x] Scope is clearly bounded
|
||||||
|
- [x] Dependencies and assumptions identified
|
||||||
|
|
||||||
|
## Feature Readiness
|
||||||
|
|
||||||
|
- [x] All functional requirements have clear acceptance criteria
|
||||||
|
- [x] User scenarios cover primary flows
|
||||||
|
- [x] Feature meets measurable outcomes defined in Success Criteria
|
||||||
|
- [x] No implementation details leak into specification
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
All checklist items pass. Specification is ready for `/speckit-clarify` or `/speckit-plan`.
|
||||||
@@ -0,0 +1,246 @@
|
|||||||
|
# Sidecar API Contract
|
||||||
|
|
||||||
|
**Version**: 1.0
|
||||||
|
**Date**: 2026-06-20
|
||||||
|
**Service**: OCR Sidecar (Desk-5439)
|
||||||
|
**Base URL**: `http://192.168.10.100:8765` (Phase 1) / `http://ocr-sidecar:8765` (Phase 2, Docker-internal)
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The OCR sidecar provides OCR processing capabilities as a pure compute worker. This document defines the API contract between backend services and the sidecar.
|
||||||
|
|
||||||
|
## Authentication
|
||||||
|
|
||||||
|
### Phase 1 (Before ADR-041 Consolidation)
|
||||||
|
|
||||||
|
All endpoints require `X-API-Key` header:
|
||||||
|
|
||||||
|
```http
|
||||||
|
X-API-Key: {OCR_SIDECAR_API_KEY}
|
||||||
|
```
|
||||||
|
|
||||||
|
If the header is missing or invalid, returns `401 Unauthorized`.
|
||||||
|
|
||||||
|
### Phase 2 (After ADR-041 Consolidation)
|
||||||
|
|
||||||
|
No authentication required. Relies on Docker-internal network isolation.
|
||||||
|
|
||||||
|
## Endpoints
|
||||||
|
|
||||||
|
### POST /ocr
|
||||||
|
|
||||||
|
Extract text from PDF file using Typhoon OCR.
|
||||||
|
|
||||||
|
**Request Headers**:
|
||||||
|
```http
|
||||||
|
Content-Type: application/json
|
||||||
|
X-API-Key: {key} # Phase 1 only
|
||||||
|
```
|
||||||
|
|
||||||
|
**Request Body**:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/abc123.pdf",
|
||||||
|
"system_prompt": "Extract document metadata from: {{ocr_text}}...",
|
||||||
|
"dms_tags": {
|
||||||
|
"document_number": "RFA-2025-001",
|
||||||
|
"document_date": "2025-01-15",
|
||||||
|
"received_date": "2025-01-16"
|
||||||
|
},
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
},
|
||||||
|
"page_range": {
|
||||||
|
"start": 1,
|
||||||
|
"end": 3
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Request Fields**:
|
||||||
|
- `pdf_path` (string, required): Absolute path to PDF file. Must be within whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`).
|
||||||
|
- `system_prompt` (string, optional): System prompt from Active Prompt. Contains `{{ocr_text}}` placeholder.
|
||||||
|
- `dms_tags` (object, optional): DMS extraction tags to inject into prompt.
|
||||||
|
- `document_number` (string, optional): Document number
|
||||||
|
- `document_date` (string, optional): Document date
|
||||||
|
- `received_date` (string, optional): Received date
|
||||||
|
- `runtime_params` (object, required): Runtime parameters from `ai_execution_profiles`.
|
||||||
|
- `temperature` (number, required): Temperature (0.0 - 2.0)
|
||||||
|
- `top_p` (number, required): Top P (0.0 - 1.0)
|
||||||
|
- `repeat_penalty` (number, required): Repeat penalty (typically 1.0 - 2.0)
|
||||||
|
- `max_tokens` (number, required): Max tokens
|
||||||
|
- `page_range` (object, optional): Page range for processing.
|
||||||
|
- `start` (number, required): Start page (1-indexed)
|
||||||
|
- `end` (number, required): End page (inclusive)
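
The documented ranges can be checked before any OCR work starts. The sketch below uses plain dataclasses so it runs without extra packages; `validation_errors` is a hypothetical helper, and the rules for `repeat_penalty` (greater than zero) and `max_tokens` (at least 1) are assumptions, since the field list above only gives a typical range for the former and no bound for the latter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeParams:
    temperature: float
    top_p: float
    repeat_penalty: float
    max_tokens: int


@dataclass(frozen=True)
class PageRange:
    start: int
    end: int


def validation_errors(params: RuntimeParams, pages: Optional[PageRange] = None) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not 0.0 <= params.temperature <= 2.0:
        errors.append("temperature must be within 0.0 - 2.0")
    if not 0.0 <= params.top_p <= 1.0:
        errors.append("top_p must be within 0.0 - 1.0")
    if params.repeat_penalty <= 0:  # assumption: only positivity is enforced
        errors.append("repeat_penalty must be greater than 0")
    if params.max_tokens < 1:  # assumption: at least one token
        errors.append("max_tokens must be at least 1")
    if pages is not None:
        if pages.start < 1:
            errors.append("page_range.start is 1-indexed")
        if pages.end < pages.start:
            errors.append("page_range.end must not precede start")
    return errors
```

A non-empty result would map to `400 Bad Request`.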
|
||||||
|
|
||||||
|
**Response (200 OK)**:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"text": "Extracted text in Markdown format...",
|
||||||
|
"ocr_used": true,
|
||||||
|
"model_used": "typhoon-np-dms-ocr:latest",
|
||||||
|
"processing_time_ms": 1250,
|
||||||
|
"error": null
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Response Fields**:
|
||||||
|
- `text` (string): Extracted text in Markdown format
|
||||||
|
- `ocr_used` (boolean): Whether OCR was used (vs fast-path text layer)
|
||||||
|
- `model_used` (string): Model identifier
|
||||||
|
- `processing_time_ms` (number): Processing time in milliseconds
|
||||||
|
- `error` (string, nullable): Error message if failed
|
||||||
|
|
||||||
|
**Error Responses**:
|
||||||
|
- `400 Bad Request`: Invalid request body or parameters
|
||||||
|
- `401 Unauthorized`: Missing or invalid X-API-Key (Phase 1 only)
|
||||||
|
- `403 Forbidden`: Path outside whitelisted base directory
|
||||||
|
- `500 Internal Server Error`: Internal processing error
|
||||||
|
|
||||||
|
**Path Traversal Protection**:
|
||||||
|
- PDF path is canonicalized using `os.path.abspath()` + `os.path.realpath()`
|
||||||
|
- Canonical path must lie inside the whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`), compared on path-component boundaries rather than as a raw string prefix
|
||||||
|
- Symlinks are resolved to their targets before whitelist check
|
||||||
|
- Returns `403 Forbidden` for any path outside base directory
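
One detail worth illustrating: a plain string-prefix comparison would treat a sibling directory such as `/mnt/uploads-evil` as being inside `/mnt/uploads`. The sketch below compares whole path components instead. `is_within_base` is a hypothetical helper name, and the use of `os.path.commonpath` is one possible approach, not necessarily what `app.py` does.

```python
import os


def is_within_base(pdf_path: str, base_path: str) -> bool:
    """True when pdf_path resolves to a location inside base_path."""
    # realpath resolves "..", relative segments and symlinks.
    base = os.path.realpath(base_path)
    target = os.path.realpath(pdf_path)
    try:
        # commonpath works on whole components, so "uploads-evil"
        # is never mistaken for a child of "uploads".
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised for paths that cannot be compared, e.g. different drives.
        return False
```

A handler would return `403 Forbidden` whenever this check fails.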
|
||||||
|
|
||||||
|
### GET /health
|
||||||
|
|
||||||
|
Health check endpoint for monitoring.
|
||||||
|
|
||||||
|
**Response (200 OK)**:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"status": "healthy",
|
||||||
|
"timestamp": "2026-06-20T10:30:00Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Response Fields**:
|
||||||
|
- `status` (string): Service status ("healthy" or "unhealthy")
|
||||||
|
- `timestamp` (string): ISO 8601 timestamp
|
||||||
|
- `version` (string): Service version
|
||||||
|
|
||||||
|
## Removed Endpoints
|
||||||
|
|
||||||
|
### POST /normalize (REMOVED)
|
||||||
|
|
||||||
|
This endpoint has been removed per ADR-040 D2. ThaiPreprocessProcessor has no consumers in the backend (verified by grep search).
|
||||||
|
|
||||||
|
## Rate Limiting
|
||||||
|
|
||||||
|
The sidecar implements no rate limiting of its own; rate limiting is handled by the backend services.
|
||||||
|
|
||||||
|
## Error Handling
|
||||||
|
|
||||||
|
All errors return a JSON response in a consistent format:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"error": "Error message",
|
||||||
|
"code": "ERROR_CODE",
|
||||||
|
"timestamp": "2026-06-20T10:30:00Z"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common Error Codes**:
|
||||||
|
- `INVALID_REQUEST`: Invalid request body or parameters
|
||||||
|
- `UNAUTHORIZED`: Missing or invalid authentication
|
||||||
|
- `FORBIDDEN`: Path outside whitelisted directory
|
||||||
|
- `INTERNAL_ERROR`: Internal processing error
|
||||||
|
- `OCR_FAILED`: OCR processing failed
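
The envelope and codes above could be produced by one small helper so every handler emits the same shape. This is a sketch under assumptions: `error_response` is a hypothetical name, and mapping `OCR_FAILED` to status 500 is a guess, since the contract does not assign it an HTTP status.

```python
from datetime import datetime, timezone

# HTTP status per documented error code (OCR_FAILED -> 500 is an assumption).
ERROR_STATUS = {
    "INVALID_REQUEST": 400,
    "UNAUTHORIZED": 401,
    "FORBIDDEN": 403,
    "INTERNAL_ERROR": 500,
    "OCR_FAILED": 500,
}


def error_response(code: str, message: str, now=None):
    """Return (http_status, body) in the documented error format."""
    if code not in ERROR_STATUS:
        code = "INTERNAL_ERROR"  # unknown codes fall back to a generic error
    moment = now or datetime.now(timezone.utc)
    body = {
        "error": message,
        "code": code,
        # UTC ISO 8601 with a trailing "Z", as in the examples.
        "timestamp": moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return ERROR_STATUS[code], body
```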
|
||||||
|
|
||||||
|
## Examples
|
||||||
|
|
||||||
|
### Example 1: Basic OCR Request (Phase 1)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://192.168.10.100:8765/ocr \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "X-API-Key: your-api-key" \
|
||||||
|
-d '{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/document.pdf",
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example 2: OCR with System Prompt and DMS Tags (Phase 1)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://192.168.10.100:8765/ocr \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "X-API-Key: your-api-key" \
|
||||||
|
-d '{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/document.pdf",
|
||||||
|
"system_prompt": "Extract document metadata from: {{ocr_text}}",
|
||||||
|
"dms_tags": {
|
||||||
|
"document_number": "RFA-2025-001",
|
||||||
|
"document_date": "2025-01-15"
|
||||||
|
},
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example 3: OCR Request (Phase 2, Docker-internal)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://sidecar:8765/ocr \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/document.pdf",
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example 4: Path Traversal Attempt (Rejected)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://192.168.10.100:8765/ocr \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "X-API-Key: your-api-key" \
|
||||||
|
-d '{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Response: `403 Forbidden`
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"error": "Path outside whitelisted base directory",
|
||||||
|
"code": "FORBIDDEN",
|
||||||
|
"timestamp": "2026-06-20T10:30:00Z"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Version History
|
||||||
|
|
||||||
|
- **1.0** (2026-06-20): Initial version for OCR sidecar refactor
|
||||||
|
- Added POST /ocr with parameter governance
|
||||||
|
- Added path traversal protection
|
||||||
|
- Removed POST /normalize endpoint
|
||||||
|
- Documented Phase 1/Phase 2 auth migration
|
||||||
@@ -0,0 +1,319 @@
|
|||||||
|
# Data Model: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Date**: 2026-06-20
|
||||||
|
**Purpose**: Define data contracts and entity relationships for OCR sidecar refactor
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
The OCR sidecar is a pure compute worker with no database access (ADR-023/023A boundary). All data persistence and business logic remain in backend services. This document defines the data contracts between backend and sidecar.
|
||||||
|
|
||||||
|
## Entities
|
||||||
|
|
||||||
|
### OCR Request (Backend → Sidecar)
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface OcrRequest {
|
||||||
|
pdfPath: string; // Absolute path to PDF file (whitelisted)
|
||||||
|
systemPrompt?: string; // System prompt from Active Prompt
|
||||||
|
dmsTags?: { // DMS extraction tags from Active Prompt
|
||||||
|
documentNumber?: string;
|
||||||
|
documentDate?: string;
|
||||||
|
receivedDate?: string;
|
||||||
|
};
|
||||||
|
runtimeParams: { // Runtime parameters from ai_execution_profiles
|
||||||
|
temperature: number;
|
||||||
|
top_p: number;
|
||||||
|
repeat_penalty: number;
|
||||||
|
max_tokens: number;
|
||||||
|
};
|
||||||
|
pageRange?: { // Page range for processing
|
||||||
|
start: number;
|
||||||
|
end: number;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### OCR Response (Sidecar → Backend)
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface OcrResponse {
|
||||||
|
text: string; // Extracted text (Markdown format)
|
||||||
|
ocrUsed: boolean; // Whether OCR was used (vs fast-path text layer)
|
||||||
|
modelUsed: string; // Model identifier (e.g., "typhoon-np-dms-ocr")
|
||||||
|
processingTimeMs: number; // Processing time in milliseconds
|
||||||
|
error?: string; // Error message if failed
|
||||||
|
}
|
||||||
|
```
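
Note that these interfaces use camelCase names (`pdfPath`, `runtimeParams`, `ocrUsed`), while the request and response bodies in the Sidecar API Contract use snake_case (`pdf_path`, `runtime_params`, `ocr_used`). Some conversion step is therefore needed at the HTTP boundary. The sketch below shows the key mapping in Python, the sidecar's language; the backend would do the equivalent in TypeScript. `to_snake` and `to_wire` are hypothetical helpers, not existing project functions.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake(name: str) -> str:
    """Convert one camelCase key to snake_case; snake_case keys pass through."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def to_wire(value):
    """Recursively rename dictionary keys; values are left untouched."""
    if isinstance(value, dict):
        return {to_snake(key): to_wire(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_wire(item) for item in value]
    return value
```

Keys that are already snake_case, such as `top_p`, are unchanged, so the conversion is safe to apply to the whole request.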
|
||||||
|
|
||||||
|
### AI Execution Profile (Database)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Existing table (no schema changes)
|
||||||
|
CREATE TABLE ai_execution_profiles (
|
||||||
|
id INT AUTO_INCREMENT PRIMARY KEY,
|
||||||
|
profile_name VARCHAR(100) UNIQUE NOT NULL,
|
||||||
|
model_name VARCHAR(100) NOT NULL,
|
||||||
|
parameters JSON NOT NULL, -- { temperature, top_p, repeat_penalty, max_tokens, keep_alive }
|
||||||
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||||
|
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
|
||||||
|
);
|
||||||
|
|
||||||
|
-- Row for OCR extraction:
|
||||||
|
-- profile_name = 'ocr-extract'
|
||||||
|
-- parameters = { temperature: 0.7, top_p: 0.9, repeat_penalty: 1.1, max_tokens: 4096 }
|
||||||
|
```
|
||||||
|
|
||||||
|
### Active Prompt (Database)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Existing table (no schema changes per ADR-029/037)
|
||||||
|
CREATE TABLE ai_prompts (
|
||||||
|
id INT AUTO_INCREMENT PRIMARY KEY,
|
||||||
|
public_id UUID,
|
||||||
|
prompt_type VARCHAR(50) NOT NULL, -- 'ocr_extraction'
|
||||||
|
template TEXT NOT NULL, -- System prompt template with {{ocr_text}} placeholder
|
||||||
|
context_config JSON, -- DMS tags configuration
|
||||||
|
version INT NOT NULL,
|
||||||
|
is_active TINYINT(1) DEFAULT 0,
|
||||||
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
||||||
|
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
|
||||||
|
UNIQUE KEY (prompt_type, version)
|
||||||
|
);
|
||||||
|
|
||||||
|
-- Active prompt for OCR extraction:
|
||||||
|
-- prompt_type = 'ocr_extraction'
|
||||||
|
-- template = "Extract document metadata from: {{ocr_text}}..."
|
||||||
|
-- context_config = { dmsTags: { documentNumber: true, documentDate: true, receivedDate: true } }
|
||||||
|
```
|
||||||
|
|
||||||
|
## Data Flow
|
||||||
|
|
||||||
|
### Phase 1: OCR Request Flow (Before ADR-041)
|
||||||
|
|
||||||
|
```
|
||||||
|
Backend OcrService
|
||||||
|
↓
|
||||||
|
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
|
||||||
|
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
|
||||||
|
3. Extract systemPrompt and DMS tags from Active Prompt
|
||||||
|
4. Build OcrRequest with parameters, systemPrompt, DMS tags
|
||||||
|
5. Send POST /ocr with X-API-Key header to sidecar
|
||||||
|
↓
|
||||||
|
Sidecar (app.py)
|
||||||
|
↓
|
||||||
|
1. Validate X-API-Key
|
||||||
|
2. Canonicalize pdfPath and check whitelist
|
||||||
|
3. Extract systemPrompt and DMS tags from request
|
||||||
|
4. Call calculate_ocr_residency(active_profile) for keep_alive
|
||||||
|
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
|
||||||
|
6. Return OcrResponse
|
||||||
|
↓
|
||||||
|
Backend OcrService
|
||||||
|
↓
|
||||||
|
1. Parse OcrResponse
|
||||||
|
2. Return extracted text to caller
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 2: OCR Request Flow (After ADR-041)
|
||||||
|
|
||||||
|
```
|
||||||
|
Backend OcrService
|
||||||
|
↓
|
||||||
|
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
|
||||||
|
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
|
||||||
|
3. Extract systemPrompt and DMS tags from Active Prompt
|
||||||
|
4. Build OcrRequest with parameters, systemPrompt, DMS tags
|
||||||
|
5. Send POST /ocr (NO X-API-Key header) to sidecar
|
||||||
|
↓
|
||||||
|
Sidecar (app.py)
|
||||||
|
↓
|
||||||
|
1. NO X-API-Key validation (network isolation only)
|
||||||
|
2. Canonicalize pdfPath and check whitelist
|
||||||
|
3. Extract systemPrompt and DMS tags from request
|
||||||
|
4. Call calculate_ocr_residency(active_profile) for keep_alive
|
||||||
|
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
|
||||||
|
6. Return OcrResponse
|
||||||
|
↓
|
||||||
|
Backend OcrService
|
||||||
|
↓
|
||||||
|
1. Parse OcrResponse
|
||||||
|
2. Return extracted text to caller
|
||||||
|
```
|
||||||
|
|
||||||
|
## Backend Service Changes
|
||||||
|
|
||||||
|
### OcrService Parameter Resolution
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// backend/src/modules/ai/services/ocr.service.ts
|
||||||
|
|
||||||
|
async extractMetadata(documentId: string): Promise<AIMetadata> {
|
||||||
|
// 1. Resolve runtime parameters from ai_execution_profiles
|
||||||
|
const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
|
||||||
|
const runtimeParams = profile.parameters; // { temperature, top_p, repeat_penalty, max_tokens }
|
||||||
|
|
||||||
|
// 2. Resolve Active Prompt
|
||||||
|
const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
|
||||||
|
const systemPrompt = activePrompt.template;
|
||||||
|
const dmsTags = activePrompt.context_config?.dmsTags || {};
|
||||||
|
|
||||||
|
// 3. Build request (assumes `document` was already loaded from `documentId`; the lookup is omitted in this sketch)
|
||||||
|
const ocrRequest: OcrRequest = {
|
||||||
|
pdfPath: document.filePath,
|
||||||
|
systemPrompt,
|
||||||
|
dmsTags,
|
||||||
|
runtimeParams,
|
||||||
|
};
|
||||||
|
|
||||||
|
// 4. Send to sidecar (with X-API-Key in Phase 1)
|
||||||
|
const response = await this.httpClient.post(
|
||||||
|
`${this.ocrApiUrl}/ocr`,
|
||||||
|
ocrRequest,
|
||||||
|
{ headers: { 'X-API-Key': this.ocrApiKey } } // Phase 1 only
|
||||||
|
);
|
||||||
|
|
||||||
|
return response.data;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### SandboxOcrEngineService Parameter Resolution
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
|
||||||
|
|
||||||
|
async processSandboxOcr(request: SandboxOcrRequest): Promise<SandboxOcrResult> {
|
||||||
|
// Same parameter resolution pattern as OcrService
|
||||||
|
const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
|
||||||
|
const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
|
||||||
|
|
||||||
|
const ocrRequest: OcrRequest = {
|
||||||
|
pdfPath: request.pdfPath,
|
||||||
|
systemPrompt: activePrompt.template,
|
||||||
|
dmsTags: activePrompt.context_config?.dmsTags || {},
|
||||||
|
runtimeParams: profile.parameters,
|
||||||
|
};
|
||||||
|
|
||||||
|
const response = await this.httpClient.post(
|
||||||
|
`${this.ocrApiUrl}/ocr`,
|
||||||
|
ocrRequest,
|
||||||
|
{ headers: { 'X-API-Key': this.ocrApiKey } } // Phase 1 only
|
||||||
|
);
|
||||||
|
|
||||||
|
return response.data;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## Sidecar API Changes
|
||||||
|
|
||||||
|
### POST /ocr Request Body
|
||||||
|
|
||||||
|
```python
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py

from typing import Dict, Optional

from pydantic import BaseModel


# Nested models are defined first so OcrRequest can reference them.
class RuntimeParams(BaseModel):
    temperature: float
    top_p: float
    repeat_penalty: float
    max_tokens: int


class PageRange(BaseModel):
    start: int  # 1-indexed
    end: int    # inclusive


class OcrRequest(BaseModel):
    pdf_path: str
    system_prompt: Optional[str] = None
    dms_tags: Optional[Dict[str, str]] = None
    runtime_params: RuntimeParams
    page_range: Optional[PageRange] = None
```
|
||||||
|
|
||||||
|
### POST /ocr Response Body
|
||||||
|
|
||||||
|
```python
class OcrResponse(BaseModel):
    text: str
    ocr_used: bool
    model_used: str
    processing_time_ms: float
    error: Optional[str] = None
```
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
### Sidecar Environment Variables
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
|
||||||
|
|
||||||
|
# Phase 1 (before ADR-041)
|
||||||
|
OCR_SIDECAR_API_KEY=required_value # Fail-fast if missing
|
||||||
|
|
||||||
|
# Phase 2 (after ADR-041) - remove OCR_SIDECAR_API_KEY
|
||||||
|
|
||||||
|
# Common variables
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads # CIFS mount base path
|
||||||
|
OLLAMA_API_URL=http://localhost:11434
|
||||||
|
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
### Backend Environment Variables
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# backend/.env
|
||||||
|
|
||||||
|
# Phase 1 (before ADR-041)
|
||||||
|
OCR_API_URL=http://192.168.10.100:8765
|
||||||
|
OCR_API_KEY=required_value # Send-side X-API-Key
|
||||||
|
|
||||||
|
# Phase 2 (after ADR-041) - remove OCR_API_KEY
|
||||||
|
|
||||||
|
# Common variables
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/app/uploads # Backend view of uploads
|
||||||
|
```
|
||||||
|
|
||||||
|
## Validation Rules
|
||||||
|
|
||||||
|
### Path Canonicalization (Sidecar)
|
||||||
|
|
||||||
|
```python
import os

from fastapi import HTTPException


def validate_pdf_path(pdf_path: str, base_path: str) -> str:
    """Canonicalize pdf_path and reject anything outside base_path."""
    # 1. Canonicalize both paths (resolves "..", symlinks, relative segments)
    base = os.path.abspath(os.path.realpath(base_path))
    canonical = os.path.abspath(os.path.realpath(pdf_path))

    # 2. Check whitelist on path-component boundaries; a plain startswith()
    #    would also accept sibling directories such as "/mnt/uploads-evil"
    try:
        inside = os.path.commonpath([base, canonical]) == base
    except ValueError:  # e.g. paths on different drives
        inside = False
    if not inside:
        raise HTTPException(
            status_code=403,
            detail="Path outside whitelisted base directory",
        )

    return canonical
```
|
||||||
|
|
||||||
|
### Parameter Validation (Backend)
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// Validate runtime parameters from ai_execution_profiles
|
||||||
|
function validateRuntimeParams(params: any): RuntimeParams {
|
||||||
|
if (!Number.isFinite(params.temperature) || params.temperature < 0 || params.temperature > 2) {
|
||||||
|
throw new BusinessException('Invalid temperature value');
|
||||||
|
}
|
||||||
|
if (!Number.isFinite(params.top_p) || params.top_p < 0 || params.top_p > 1) {
|
||||||
|
throw new BusinessException('Invalid top_p value');
|
||||||
|
}
|
||||||
|
// ... similar validation for other params
|
||||||
|
return params;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
## No Schema Changes
|
||||||
|
|
||||||
|
This refactor does not require database schema changes:
|
||||||
|
- `ai_execution_profiles` table already exists (ADR-036)
|
||||||
|
- `ai_prompts` table already exists (ADR-029/037)
|
||||||
|
- No new tables or columns needed
|
||||||
|
- Per ADR-009: No TypeORM migrations (edit SQL directly if needed, but not needed here)
|
||||||
@@ -0,0 +1,147 @@
|
|||||||
|
# Implementation Plan: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Branch**: `140-ocr-sidecar-refactor` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
|
||||||
|
**Input**: Feature specification from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md`
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Refactor the OCR sidecar on Desk-5439 to address security vulnerabilities (hardcoded API keys, path traversal), implement async I/O for performance, preserve GPU resource management policies (Adaptive OCR Residency, CPU Fallback Retrieval), and align with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System. The sidecar becomes a pure compute worker with all orchestration and parameter governance moved to backend services.
|
||||||
|
|
||||||
|
## Technical Context
|
||||||
|
|
||||||
|
**Language/Version**: Python 3.11+ (FastAPI)
|
||||||
|
**Primary Dependencies**: FastAPI 0.111.0, httpx 0.27.0, PyMuPDF 1.24.0, typhoon-ocr>=0.4.1, FlagEmbedding>=1.2.0, pythainlp 5.0.4
|
||||||
|
**Storage**: No database access (ADR-023/023A boundary - sidecar is pure compute worker)
|
||||||
|
**Testing**: pytest for path-traversal and residency wiring tests
|
||||||
|
**Target Platform**: Desk-5439 (192.168.10.100, Windows 10/11, RTX 5060 Ti 16GB GPU) via Docker
|
||||||
|
**Project Type**: Infrastructure (sidecar service)
|
||||||
|
**Performance Goals**: 20%+ throughput improvement with async I/O; VRAM exhaustion prevention under load
|
||||||
|
**Constraints**: Must preserve LLM-First GPU Ownership; must not bypass existing residency_policy.py; must align with ADR-036 Gap-2 (keep_alive as lazy resource param)
|
||||||
|
**Scale/Scope**: Single sidecar service; affects backend AI services (OcrService, SandboxOcrEngineService)
|
||||||
|
|
||||||
|
## Constitution Check
|
||||||
|
|
||||||
|
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
|
||||||
|
|
||||||
|
| Gate | Status | Justification |
|
||||||
|
|------|--------|---------------|
|
||||||
|
| ADR-019 UUID | ✅ PASS | Sidecar N/A (pure compute worker), Backend applies ADR-019 (parameter resolution in OcrService/SandboxOcrEngineService) |
|
||||||
|
| ADR-009 Schema | N/A | No database schema changes in sidecar |
|
||||||
|
| ADR-016 Security | ✅ PASS | Path traversal hardening; no hardcoded secrets; network isolation auth |
|
||||||
|
| ADR-002 Numbering | N/A | No document numbering in sidecar |
|
||||||
|
| ADR-008 BullMQ | N/A | Sidecar does not use BullMQ (backend does) |
|
||||||
|
| ADR-023/023A AI Boundary | ✅ PASS | Sidecar is pure compute worker; no DB/storage access; AI → DMS API → DB pattern preserved |
|
||||||
|
| ADR-007 Errors | ✅ PASS | FastAPI exception handling with user-friendly messages |
|
||||||
|
| TypeScript Strict | N/A | Python codebase |
|
||||||
|
|
||||||
|
## Project Structure
|
||||||
|
|
||||||
|
### Documentation (this feature)
|
||||||
|
|
||||||
|
```text
|
||||||
|
specs/100-Infrastructures/140-ocr-sidecar-refactor/
|
||||||
|
├── spec.md # Feature specification
|
||||||
|
├── plan.md # This file
|
||||||
|
├── research.md # Phase 0 output (technical decisions from ADR-040)
|
||||||
|
├── data-model.md # Phase 1 output (data contracts)
|
||||||
|
├── quickstart.md # Phase 1 output (deployment guide)
|
||||||
|
├── contracts/ # Phase 1 output (API contracts)
|
||||||
|
│ └── sidecar-api.md # Sidecar API specification
|
||||||
|
└── tasks.md # Phase 2 output (implementation tasks)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Source Code
|
||||||
|
|
||||||
|
```text
|
||||||
|
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
|
||||||
|
├── app.py # FastAPI application (main refactor target)
|
||||||
|
├── residency_policy.py # Retain (Adaptive OCR Residency)
|
||||||
|
├── vram_monitor.py # Retain (VRAM monitoring)
|
||||||
|
├── requirements.txt # Python dependencies
|
||||||
|
├── Dockerfile # Container definition
|
||||||
|
├── docker-compose.yml # Orchestration
|
||||||
|
└── .env # Environment variables
|
||||||
|
|
||||||
|
backend/src/modules/ai/
|
||||||
|
├── services/
|
||||||
|
│ ├── ocr.service.ts # Parameter resolution + sidecar calls
|
||||||
|
│ └── sandbox-ocr-engine.service.ts # Sandbox parameter resolution
|
||||||
|
└── processors/
|
||||||
|
└── ai-batch.processor.ts # BullMQ processor (unchanged)
|
||||||
|
|
||||||
|
tests/
|
||||||
|
├── unit/
|
||||||
|
│ └── ocr-sidecar/ # Sidecar unit tests
|
||||||
|
│ ├── test_path_traversal.py # Path traversal tests
|
||||||
|
│ └── test_residency_wiring.py # Residency calculation tests
|
||||||
|
└── integration/
|
||||||
|
└── ocr-sidecar/ # Sidecar integration tests
|
||||||
|
```
|
||||||
|
|
||||||
|
**Structure Decision**: Infrastructure refactor targeting existing OCR sidecar on Desk-5439. Backend changes limited to parameter resolution in AI services. No new frontend changes.
|
||||||
|
|
||||||
|
## Complexity Tracking
|
||||||
|
|
||||||
|
> No constitution violations; all gates pass. This section is not applicable.
|
||||||
|
|
||||||
|
## Phase 0: Research & Technical Decisions
|
||||||
|
|
||||||
|
All technical decisions are already documented in ADR-040. Key decisions:
|
||||||
|
|
||||||
|
### Security Decisions
|
||||||
|
- **Decision**: Remove hardcoded default API key; fail-fast if env missing
|
||||||
|
- **Rationale**: Security vulnerability - leaked key cannot be rotated without rebuild
|
||||||
|
- **Decision**: Implement path canonicalization + base-path whitelist
|
||||||
|
- **Rationale**: Prevent path traversal attacks (ADR-016)
|
||||||
|
|
||||||
|
### I/O Pattern Decisions
|
||||||
|
- **Decision**: Refactor to async I/O with shared AsyncClient via lifespan
|
||||||
|
- **Rationale**: Synchronous blocking I/O reduces throughput under load
|
||||||
|
- **Decision**: Replace `@app.on_event("startup")` with lifespan context manager
|
||||||
|
- **Rationale**: Deprecated pattern; lifespan provides better resource management
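
The lifespan pattern named in these decisions can be sketched with the standard library alone. In this sketch `app` is a plain dict standing in for the FastAPI application object and `load_model_blocking` is a hypothetical placeholder for a slow synchronous model load. In the real service the same function would be passed as `FastAPI(lifespan=lifespan)`, and the shared `httpx.AsyncClient` would presumably be created and closed in the same place.

```python
import asyncio
import time
from contextlib import asynccontextmanager


def load_model_blocking(name: str) -> dict:
    """Hypothetical stand-in for a slow, synchronous model load."""
    time.sleep(0.05)
    return {"name": name, "ready": True}


@asynccontextmanager
async def lifespan(app):
    # Startup: run the blocking load in a worker thread so the event
    # loop stays free while the model is being loaded.
    app["model"] = await asyncio.to_thread(load_model_blocking, "reranker")
    try:
        yield
    finally:
        # Shutdown: release whatever was acquired at startup.
        app.pop("model", None)


async def demo() -> bool:
    app = {}
    async with lifespan(app):
        return app["model"]["ready"]
```

Everything acquired before `yield` is released after it, which is the resource-management benefit over separate startup and shutdown hooks.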
|
||||||
|
|
||||||
|
### GPU Resource Management Decisions
|
||||||
|
- **Decision**: Wire `calculate_ocr_residency()` into `process_ocr` for dynamic keep_alive
|
||||||
|
- **Rationale**: Preserve Adaptive OCR Residency policy (CONTEXT.md); avoid fixed values
|
||||||
|
- **Decision**: Retain vram_monitor.py and residency_policy.py
|
||||||
|
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
|
||||||
|
- **Decision**: Reject forced GPU-resident BGE-M3/Reranker
|
||||||
|
- **Rationale**: CPU fallback is required for VRAM pressure scenarios
|
||||||
|
|
||||||
|
### Parameter Governance Decisions
|
||||||
|
- **Decision**: Remove hardcoded runtime params; accept from backend job snapshot
|
||||||
|
- **Rationale**: ADR-036 Profile-Only Parameter Governance; dynamic tuning without rebuild
|
||||||
|
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt
|
||||||
|
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in DB not code
|
||||||
|
- **Decision**: Reject creating PromptBuilderService
|
||||||
|
- **Rationale**: Use existing Active Prompt system; avoid invented orchestration
|
||||||
|
|
||||||
|
### Auth Decisions
|
||||||
|
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
|
||||||
|
- **Rationale**: Sequenced migration; network isolation only possible post-consolidation
|
||||||
|
- **Decision**: Interim period requires X-API-Key validation
|
||||||
|
- **Rationale**: Cross-host topology (before ADR-041) requires defense-in-depth
|
||||||
|
|
||||||
|
### Endpoint Decisions
|
||||||
|
- **Decision**: Remove /normalize endpoint
|
||||||
|
- **Rationale**: No consumers (verified by grep); ThaiPreprocessProcessor unused
|
||||||
|
- **Decision**: Fix mutable default argument `options_override={}`
|
||||||
|
- **Rationale**: Python anti-pattern; causes unexpected behavior
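
The mutable default pitfall can be reproduced in a few lines. Only the parameter name `options_override` comes from the decision above; both function names and the `temperature` key are illustrative assumptions.

```python
def build_options_shared(temperature, options_override={}):
    # The default dict is created once, when the function is defined,
    # so every call without an argument mutates the same object.
    options_override["temperature"] = temperature
    return options_override


def build_options(temperature, options_override=None):
    # A None default plus a copy gives each call its own dict and
    # leaves the caller's dict untouched.
    options = dict(options_override or {})
    options["temperature"] = temperature
    return options
```

With the shared default, the result of an earlier call silently changes when a later call runs, which is the unexpected behavior the rationale refers to.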
|
||||||
|
|
||||||
|
## Phase 1: Design & Contracts
|
||||||
|
|
||||||
|
### Data Model
|
||||||
|
|
||||||
|
See [data-model.md](./data-model.md) for detailed data contracts and entity relationships.
|
||||||
|
|
||||||
|
### API Contracts
|
||||||
|
|
||||||
|
See [contracts/sidecar-api.md](./contracts/sidecar-api.md) for sidecar API specification.
|
||||||
|
|
||||||
|
### Quickstart Guide
|
||||||
|
|
||||||
|
See [quickstart.md](./quickstart.md) for deployment and testing instructions.
|
||||||
|
|
||||||
|
## Phase 2: Implementation (Tasks)
|
||||||
|
|
||||||
|
See [tasks.md](./tasks.md) for detailed implementation tasks generated by `/speckit-tasks`.
|
||||||
@@ -0,0 +1,374 @@
|
|||||||
|
# Quickstart: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Date**: 2026-06-20
|
||||||
|
**Purpose**: Deployment and testing guide for OCR sidecar refactor
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- Access to Desk-5439 (192.168.10.100) with Docker
|
||||||
|
- Access to backend services (QNAP 192.168.10.8)
|
||||||
|
- Python 3.11+ for local testing (optional)
|
||||||
|
- pytest for testing (optional)
|
||||||
|
|
||||||
|
## Phase 1: Deployment (Before ADR-041 Consolidation)
|
||||||
|
|
||||||
|
### Step 1: Update Sidecar Code
|
||||||
|
|
||||||
|
1. Navigate to sidecar directory:
|
||||||
|
```bash
|
||||||
|
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Update `app.py` with the following changes:
|
||||||
|
- Remove hardcoded default API key
|
||||||
|
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
|
||||||
|
- Implement async I/O with `httpx.AsyncClient` via lifespan
|
||||||
|
- Replace `@app.on_event("startup")` with lifespan context manager
|
||||||
|
- Wire `calculate_ocr_residency()` into `process_ocr`
|
||||||
|
- Implement path canonicalization + base-path whitelist on `/ocr`
|
||||||
|
- Remove hardcoded runtime parameters
|
||||||
|
- Receive systemPrompt and DMS tags from backend
|
||||||
|
- Remove `/normalize` endpoint
|
||||||
|
- Fix mutable default argument `options_override={}`
|
||||||
|
- Load models via `asyncio.to_thread` during lifespan
|
||||||
|
|
||||||
|
3. Update `requirements.txt`:
|
||||||
|
```text
|
||||||
|
PyMuPDF==1.24.0
|
||||||
|
fastapi==0.111.0
|
||||||
|
uvicorn[standard]==0.30.1
|
||||||
|
python-multipart==0.0.9
|
||||||
|
httpx==0.27.0
|
||||||
|
FlagEmbedding>=1.2.0
|
||||||
|
typhoon-ocr>=0.4.1
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Update `.env`:
|
||||||
|
```bash
|
||||||
|
# Phase 1 (before ADR-041)
|
||||||
|
OCR_SIDECAR_API_KEY=your-secure-api-key-here
|
||||||
|
|
||||||
|
# Common variables
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
|
||||||
|
OLLAMA_API_URL=http://localhost:11434
|
||||||
|
OCR_MODEL=np-dms-ocr:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Update Backend Services
|
||||||
|
|
||||||
|
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
|
||||||
|
- Add parameter resolution from `ai_execution_profiles` (row `ocr-extract`)
|
||||||
|
- Add Active Prompt resolution from `ai_prompts` (type `ocr_extraction`)
|
||||||
|
- Extract systemPrompt and DMS tags from Active Prompt
|
||||||
|
- Send resolved parameters to sidecar in OCR requests
|
||||||
|
- Keep X-API-Key send-side (Phase 1)
|
||||||
|
|
||||||
|
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
|
||||||
|
- Same parameter resolution pattern as OcrService
|
||||||
|
- Keep X-API-Key send-side (Phase 1)
|
||||||
|
|
||||||
|
3. Update backend `.env`:
|
||||||
|
```bash
|
||||||
|
# Phase 1 (before ADR-041)
|
||||||
|
OCR_API_URL=http://192.168.10.100:8765
|
||||||
|
OCR_API_KEY=your-secure-api-key-here
|
||||||
|
|
||||||
|
# Common variables
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Rebuild and Deploy Sidecar
|
||||||
|
|
||||||
|
1. Build Docker image on Desk-5439:
|
||||||
|
```bash
|
||||||
|
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
|
||||||
|
docker-compose build
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Stop existing container:
|
||||||
|
```bash
|
||||||
|
docker-compose down
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Start new container:
|
||||||
|
```bash
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Verify health:
|
||||||
|
```bash
|
||||||
|
curl http://192.168.10.100:8765/health
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"status": "healthy",
|
||||||
|
"timestamp": "2026-06-20T10:30:00Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Deploy Backend Changes
|
||||||
|
|
||||||
|
1. Build backend:
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
pnpm run build
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Deploy backend containers (via existing deploy script or manual):
|
||||||
|
```bash
|
||||||
|
# From repo root
|
||||||
|
./scripts/deploy.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Verify backend health:
|
||||||
|
```bash
|
||||||
|
curl http://localhost:3001/api/ai/health
|
||||||
|
```
|
||||||
|
|
||||||
|
## Phase 2: Deployment (After ADR-041 Consolidation)
|
||||||
|
|
||||||
|
**Note**: This phase can only be executed after ADR-041 server consolidation completes (single Docker host).
|
||||||
|
|
||||||
|
### Step 1: Remove X-API-Key from Sidecar
|
||||||
|
|
||||||
|
1. Update `app.py` on sidecar:
|
||||||
|
- Remove X-API-Key validation from all endpoints
|
||||||
|
- Remove `OCR_SIDECAR_API_KEY` environment variable check
|
||||||
|
|
||||||
|
2. Update `.env` on sidecar:
|
||||||
|
```bash
|
||||||
|
# Remove OCR_SIDECAR_API_KEY line
|
||||||
|
# Keep common variables
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
|
||||||
|
OLLAMA_API_URL=http://localhost:11434
|
||||||
|
OCR_MODEL=np-dms-ocr:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Rebuild and redeploy sidecar:
|
||||||
|
```bash
|
||||||
|
docker-compose down
|
||||||
|
docker-compose build
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Remove X-API-Key from Backend
|
||||||
|
|
||||||
|
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
|
||||||
|
- Remove X-API-Key header from sidecar requests
|
||||||
|
- Remove `OCR_API_KEY` environment variable usage
|
||||||
|
|
||||||
|
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
|
||||||
|
- Remove X-API-Key header from sidecar requests
|
||||||
|
- Remove `OCR_API_KEY` environment variable usage
|
||||||
|
|
||||||
|
3. Update backend `.env`:
|
||||||
|
```bash
|
||||||
|
# Remove OCR_API_KEY line
|
||||||
|
# Keep common variables
|
||||||
|
OCR_API_URL=http://sidecar:8765 # Docker-internal URL
|
||||||
|
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Rebuild and redeploy backend:
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
pnpm run build
|
||||||
|
cd .. && ./scripts/deploy.sh  # the deploy script runs from the repo root
|
||||||
|
```
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
### Unit Tests (Sidecar)
|
||||||
|
|
||||||
|
1. Navigate to sidecar tests directory:
|
||||||
|
```bash
|
||||||
|
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Run path traversal tests:
|
||||||
|
```bash
|
||||||
|
pytest test_path_traversal.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected output: All tests pass, path traversal attempts return 403
|
||||||
|
|
||||||
|
3. Run residency wiring tests:
|
||||||
|
```bash
|
||||||
|
pytest test_residency_wiring.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected output: All tests pass, `calculate_ocr_residency()` is called correctly
|
||||||
|
|
||||||
|
### Integration Tests (Backend)
|
||||||
|
|
||||||
|
1. Run backend AI service tests:
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
pnpm test ai/ocr.service.spec.ts
|
||||||
|
pnpm test ai/sandbox-ocr-engine.service.spec.ts
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Verify parameter resolution from database:
|
||||||
|
- Check that `ai_execution_profiles` row `ocr-extract` exists
|
||||||
|
- Check that `ai_prompts` has active row for `ocr_extraction` type
|
||||||
|
- Verify parameters are correctly resolved and sent to sidecar
|
||||||
|
|
||||||
|
### Manual Testing
|
||||||
|
|
||||||
|
1. Test path traversal protection:
|
||||||
|
```bash
|
||||||
|
curl -X POST http://192.168.10.100:8765/ocr \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "X-API-Key: your-api-key" \
|
||||||
|
-d '{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `403 Forbidden`
|
||||||
|
|
||||||
|
2. Test valid OCR request:
|
||||||
|
```bash
|
||||||
|
curl -X POST http://192.168.10.100:8765/ocr \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-H "X-API-Key: your-api-key" \
|
||||||
|
-d '{
|
||||||
|
"pdf_path": "/mnt/uploads/temp/test.pdf",
|
||||||
|
"runtime_params": {
|
||||||
|
"temperature": 0.7,
|
||||||
|
"top_p": 0.9,
|
||||||
|
"repeat_penalty": 1.1,
|
||||||
|
"max_tokens": 4096
|
||||||
|
}
|
||||||
|
}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `200 OK` with extracted text
|
||||||
|
|
||||||
|
3. Test parameter governance:
|
||||||
|
- Modify `ai_execution_profiles` row `ocr-extract` parameters
|
||||||
|
- Run OCR request
|
||||||
|
- Verify new parameters are used (check sidecar logs)
|
||||||
|
|
||||||
|
4. Test Active Prompt integration:
|
||||||
|
- Modify active prompt in `ai_prompts` for `ocr_extraction`
|
||||||
|
- Run OCR request
|
||||||
|
- Verify new system prompt is used
|
||||||
|
|
||||||
|
## Performance Testing
|
||||||
|
|
||||||
|
1. Benchmark async vs sync I/O:
|
||||||
|
```bash
|
||||||
|
# Use Apache Bench or similar tool
|
||||||
|
ab -n 1000 -c 10 -p ocr_request.json -T application/json -H "X-API-Key: your-api-key" \
|
||||||
|
http://192.168.10.100:8765/ocr
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: 20%+ throughput improvement with async I/O
|
||||||
|
|
||||||
|
2. Monitor VRAM usage:
|
||||||
|
```bash
|
||||||
|
# On Desk-5439, monitor GPU usage during OCR operations
|
||||||
|
nvidia-smi -l 1
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: VRAM usage stays within limits, no exhaustion
|
||||||
|
|
||||||
|
## Monitoring
|
||||||
|
|
||||||
|
### Health Checks
|
||||||
|
|
||||||
|
- Sidecar health: `GET http://192.168.10.100:8765/health`
|
||||||
|
- Backend AI health: `GET http://localhost:3001/api/ai/health`
|
||||||
|
|
||||||
|
### Logs
|
||||||
|
|
||||||
|
- Sidecar logs: `docker-compose logs -f ocr-sidecar`
|
||||||
|
- Backend logs: Check backend application logs
|
||||||
|
|
||||||
|
### Metrics
|
||||||
|
|
||||||
|
- Monitor OCR request latency
|
||||||
|
- Monitor VRAM usage on Desk-5439
|
||||||
|
- Monitor error rates (403 for path traversal, 500 for internal errors)
|
||||||
|
|
||||||
|
## Rollback
|
||||||
|
|
||||||
|
If issues arise during deployment:
|
||||||
|
|
||||||
|
### Rollback Sidecar
|
||||||
|
|
||||||
|
1. Revert `app.py` to previous version
|
||||||
|
2. Restore previous `.env` file
|
||||||
|
3. Rebuild and redeploy:
|
||||||
|
```bash
|
||||||
|
docker-compose down
|
||||||
|
docker-compose build
|
||||||
|
docker-compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
### Rollback Backend
|
||||||
|
|
||||||
|
1. Revert service changes in `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
|
||||||
|
2. Restore previous `.env` file
|
||||||
|
3. Rebuild and redeploy:
|
||||||
|
```bash
|
||||||
|
cd backend
|
||||||
|
pnpm run build
|
||||||
|
cd .. && ./scripts/deploy.sh  # the deploy script runs from the repo root
|
||||||
|
```
|
||||||
|
|
||||||
|
### Emergency Rollback
|
||||||
|
|
||||||
|
If immediate rollback is needed:
|
||||||
|
1. Revert `keep_alive` to fixed value `0` in `process_ocr`
|
||||||
|
2. Restore hardcoded runtime parameters
|
||||||
|
3. Restore X-API-Key validation
|
||||||
|
4. Rebuild and redeploy
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Sidecar fails to start
|
||||||
|
|
||||||
|
1. Check environment variables are set correctly
|
||||||
|
2. Check `OCR_SIDECAR_API_KEY` is provided (Phase 1)
|
||||||
|
3. Check Docker logs: `docker-compose logs ocr-sidecar`
|
||||||
|
4. Verify Ollama is running on Desk-5439
|
||||||
|
|
||||||
|
### Path traversal returns 200 instead of 403
|
||||||
|
|
||||||
|
1. Verify `OCR_SIDECAR_UPLOAD_BASE` is set correctly
|
||||||
|
2. Check path canonicalization logic in `app.py`
|
||||||
|
3. Test with absolute paths to verify whitelist check
|
||||||
|
|
||||||
|
### Parameters not being used
|
||||||
|
|
||||||
|
1. Check `ai_execution_profiles` row `ocr-extract` exists
|
||||||
|
2. Check backend service parameter resolution logic
|
||||||
|
3. Check sidecar receives parameters in request body
|
||||||
|
4. Check sidecar passes parameters to Ollama
|
||||||
|
|
||||||
|
### VRAM exhaustion
|
||||||
|
|
||||||
|
1. Check `calculate_ocr_residency()` is being called
|
||||||
|
2. Check `vram_monitor.py` and `residency_policy.py` are present
|
||||||
|
3. Verify CPU fallback is working for `/embed` and `/rerank`
|
||||||
|
4. Monitor GPU usage with `nvidia-smi`
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- ADR-040: OCR Sidecar Refactor
|
||||||
|
- ADR-036: Profile-Only Parameter Governance
|
||||||
|
- ADR-029: Dynamic Prompt Management
|
||||||
|
- ADR-037: Active Prompt System
|
||||||
|
- ADR-041: Server Consolidation (dependency for Phase 2)
|
||||||
|
- [Sidecar API Contract](./contracts/sidecar-api.md)
|
||||||
@@ -0,0 +1,179 @@
|
|||||||
|
# Research: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Date**: 2026-06-20
|
||||||
|
**Purpose**: Document technical decisions and research findings from ADR-040
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
All technical decisions for this refactor are already documented in ADR-040. This file consolidates those decisions for implementation reference.
|
||||||
|
|
||||||
|
## Security Decisions
|
||||||
|
|
||||||
|
### Hardcoded API Key Removal
|
||||||
|
- **Decision**: Remove hardcoded default API key (`lcbp3-dms-ocr-sidecar-secure-token-2026`) from `app.py`
|
||||||
|
- **Rationale**: A hardcoded default key is a security vulnerability: if it leaks, it cannot be rotated without rebuilding the container
|
||||||
|
- **Implementation**: Fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
|
||||||
|
- **Phase**: Phase 1 (before ADR-041 consolidation)
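The fail-fast behaviour described above can be sketched as follows. This is a minimal illustration, not the sidecar's actual code: the helper name `load_api_key` and the exception class are hypothetical, and only the environment variable name `OCR_SIDECAR_API_KEY` comes from this document.

```python
import os


class MissingApiKeyError(RuntimeError):
    """Raised at startup when the sidecar API key is not configured."""


def load_api_key(env=None):
    """Return the configured API key, or raise instead of using a default.

    `env` is injectable for testing; by default the process environment is read.
    """
    env = os.environ if env is None else env
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        # No fallback value: a missing key must stop startup.
        raise MissingApiKeyError(
            "OCR_SIDECAR_API_KEY is not set; refusing to start without it"
        )
    return key
```

Calling such a helper once during startup would make a missing variable surface as a startup failure rather than as a silently accepted default key.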
|
||||||
|
|
||||||
|
### Path Traversal Hardening
|
||||||
|
- **Decision**: Implement path canonicalization + base-path whitelist on `/ocr` endpoint
|
||||||
|
- **Rationale**: Prevent arbitrary file read attacks (ADR-016)
|
||||||
|
- **Implementation**:
|
||||||
|
- Use `os.path.abspath()` + `os.path.realpath()` for canonicalization
|
||||||
|
- Whitelist base path = `OCR_SIDECAR_UPLOAD_BASE` (CIFS mount base)
|
||||||
|
- Reject paths outside base path → 403 Forbidden
|
||||||
|
- **Alternatives Considered**:
|
||||||
|
- Using path validation regex only → rejected (insufficient for symlink attacks)
|
||||||
|
- Chroot jail → rejected (overkill for this use case)
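A minimal sketch of the canonicalize-then-whitelist check, assuming a hypothetical helper named `is_within_base`; the real logic in `app.py` may differ. It resolves both paths with `os.path.abspath()` and `os.path.realpath()`, then compares them with `os.path.commonpath()` so that a sibling directory such as `/mnt/uploads-evil` does not pass a naive prefix test.

```python
import os


def is_within_base(requested_path, base_dir):
    """Return True if requested_path resolves to a location inside base_dir."""
    # realpath follows symlinks, so a link pointing outside the base is caught.
    base = os.path.realpath(os.path.abspath(base_dir))
    target = os.path.realpath(os.path.abspath(requested_path))
    try:
        # commonpath compares whole path components, not string prefixes.
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised for paths that cannot be compared (e.g. different drives).
        return False
```

In the sidecar, a `False` result would map to the `403 Forbidden` response described above, with `base_dir` taken from `OCR_SIDECAR_UPLOAD_BASE`.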
|
||||||
|
|
||||||
|
## I/O Pattern Decisions
|
||||||
|
|
||||||
|
### Async I/O Refactor
|
||||||
|
- **Decision**: Refactor `process_ocr` to `async def` and use `httpx.AsyncClient` shared via lifespan
|
||||||
|
- **Rationale**: Synchronous blocking I/O stalls the FastAPI event loop, which reduces throughput under concurrent load
|
||||||
|
- **Implementation**:
|
||||||
|
- Replace `httpx.Client` with `httpx.AsyncClient`
|
||||||
|
- Create AsyncClient in lifespan context manager
|
||||||
|
- Load models via `asyncio.to_thread` to avoid blocking startup
|
||||||
|
- **Performance Target**: 20%+ throughput improvement under concurrent load
|
||||||
|
- **Alternatives Considered**:
|
||||||
|
- Keep sync I/O but add more workers → rejected (still blocks event loop)
|
||||||
|
- Use thread pool → rejected (adds complexity without solving root cause)
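To illustrate why `asyncio.to_thread` is used for model loading, the sketch below runs a blocking function in a worker thread while another coroutine keeps making progress. The function names and the sleep-based "model load" are stand-ins invented for this example; it uses only the standard library and requires Python 3.9 or later.

```python
import asyncio
import time


def load_model_blocking(name, seconds=0.2):
    """Stand-in for a blocking model load (hypothetical, for illustration)."""
    time.sleep(seconds)
    return f"{name}-loaded"


async def heartbeat(ticks, interval=0.02):
    """Count ticks at a fixed interval to show the event loop keeps running."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def load_without_blocking():
    """Run the blocking load in a worker thread alongside the heartbeat."""
    model, beats = await asyncio.gather(
        asyncio.to_thread(load_model_blocking, "ocr"),
        heartbeat(5),
    )
    return model, beats
```

If `load_model_blocking` were called directly inside a coroutine, the heartbeat could not tick until the load finished; offloading it keeps the loop free for other work.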
|
||||||
|
|
||||||
|
### Lifespan Pattern
|
||||||
|
- **Decision**: Replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan
|
||||||
|
- **Rationale**: `@app.on_event` is deprecated in FastAPI; lifespan provides better resource management and cleanup
|
||||||
|
- **Implementation**: Use FastAPI lifespan context manager for AsyncClient lifecycle
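The lifespan pattern can be sketched with the standard library alone. Here `FakeAsyncClient` is a stand-in for `httpx.AsyncClient` (which also exposes `aclose()`), and the `state` dict plays the role of the application object; both are assumptions made so the example runs without dependencies. In the sidecar, the equivalent function would be passed as `FastAPI(lifespan=...)`.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient so the sketch has no dependencies."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True


@asynccontextmanager
async def lifespan(state):
    """Create shared resources on startup and release them on shutdown."""
    state["client"] = FakeAsyncClient()
    try:
        yield state  # the application serves requests while suspended here
    finally:
        # Runs on shutdown, including when startup-time work raised later on.
        await state["client"].aclose()


async def run_once():
    """Enter and leave the lifespan once, reporting the client state."""
    state = {}
    async with lifespan(state):
        open_during_run = not state["client"].closed
    return open_during_run, state["client"].closed
```

The `try`/`finally` around `yield` is what guarantees the shared client is closed on shutdown, which the older startup-only hook did not express in one place.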
|
||||||
|
|
||||||
|
## GPU Resource Management Decisions
|
||||||
|
|
||||||
|
### Adaptive OCR Residency
|
||||||
|
- **Decision**: Wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive`
|
||||||
|
- **Rationale**: Preserve Adaptive OCR Residency policy from CONTEXT.md; avoid fixed values
|
||||||
|
- **Implementation**:
|
||||||
|
- Import `calculate_ocr_residency` from `residency_policy.py`
|
||||||
|
- Call function during OCR request to calculate appropriate keep_alive
|
||||||
|
- Do NOT accept explicit `options_override["keep_alive"]` from backend
|
||||||
|
- keep_alive is a lazy resource parameter calculated at process time (ADR-036 Gap-2)
|
||||||
|
- **Alternatives Rejected**:
|
||||||
|
- Fixed `keep_alive=0` (Claude plan) → rejected (violates ADR-036 Gap-2)
|
||||||
|
- Fixed `keep_alive=10m` (Qwen plan) → rejected (violates adaptive policy)
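A sketch of how the residency wiring might look, under stated assumptions: the body of `calculate_ocr_residency` below is a placeholder (the real policy lives in `residency_policy.py`), and the `gpu_pressure` profile field, the returned values, and `build_ollama_payload` are hypothetical. The point it illustrates is that any caller-supplied `keep_alive` is discarded and the value is always computed at process time.

```python
def calculate_ocr_residency(active_profile):
    """Placeholder policy; the real one is defined in residency_policy.py."""
    # Hypothetical rule: unload immediately under pressure, else stay briefly.
    return 0 if active_profile.get("gpu_pressure") == "high" else "5m"


def build_ollama_payload(model, prompt, options_override, active_profile):
    """Build a generate payload whose keep_alive always comes from the policy."""
    options = dict(options_override or {})
    options.pop("keep_alive", None)  # ignore any value sent by the backend
    return {
        "model": model,
        "prompt": prompt,
        "options": options,
        "keep_alive": calculate_ocr_residency(active_profile),
    }
```

Copying `options_override` before modifying it also avoids mutating the caller's dictionary.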
|
||||||
|
|
||||||
|
### Retain VRAM Monitor and Residency Policy
|
||||||
|
- **Decision**: Retain `vram_monitor.py` and `residency_policy.py` modules
|
||||||
|
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
|
||||||
|
- **Alternatives Rejected**:
|
||||||
|
- Delete these modules (Claude + Qwen plans) → rejected (violates CONTEXT.md resolved GPU policies)
|
||||||
|
|
||||||
|
### CPU Fallback for Retrieval
|
||||||
|
- **Decision**: Retain dynamic CPU/GPU selection for `/embed` and `/rerank` via `.to(device)` logic
|
||||||
|
- **Rationale**: CPU fallback required when GPU is under pressure; prevents VRAM exhaustion
|
||||||
|
- **Alternatives Rejected**:
|
||||||
|
- Force BGE-M3 and Reranker GPU-resident → rejected (violates LLM-First policy)
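The device decision behind the `.to(device)` logic can be pictured roughly as below. This is only a sketch: `select_device`, its parameters, and the idea of comparing free VRAM against a required amount are assumptions for illustration, not the actual contents of `vram_monitor.py`.

```python
def select_device(free_vram_mb, required_mb, cuda_available=True):
    """Pick "cuda" only when a GPU is present and has enough free VRAM."""
    if not cuda_available or free_vram_mb is None:
        # No GPU, or the VRAM reading failed: fall back to CPU.
        return "cpu"
    return "cuda" if free_vram_mb >= required_mb else "cpu"
```

The returned string would then be handed to the model's `.to(device)` call for `/embed` and `/rerank`, so retrieval yields the GPU to the LLM when memory is tight.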
|
||||||
|
|
||||||
|
## Parameter Governance Decisions
|
||||||
|
|
||||||
|
### Remove Hardcoded Runtime Parameters
|
||||||
|
- **Decision**: Remove hardcoded `temperature`, `top_p`, `repeat_penalty`, `max_tokens` from sidecar
|
||||||
|
- **Rationale**: ADR-036 Profile-Only Parameter Governance; enable dynamic tuning without rebuild
|
||||||
|
- **Implementation**:
|
||||||
|
- Backend resolves parameters from `ai_execution_profiles` row `ocr-extract`
|
||||||
|
- Backend sends parameters to sidecar in every request
|
||||||
|
- Sidecar passes parameters to Ollama in every load/generate call
|
||||||
|
- Modelfile serves as last-resort fallback only
|
||||||
|
- **Alternatives Rejected**:
|
||||||
|
- Keep hardcoded values in sidecar → rejected (violates ADR-036)
|
||||||
|
- Create new `PromptBuilderService` → rejected (use existing Active Prompt system)
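One way the sidecar could forward backend-supplied parameters without hardcoding any values is sketched below. The helper `build_options` is hypothetical, and mapping `max_tokens` to Ollama's `num_predict` option is an assumption about how the sidecar translates names. Parameters the backend did not send are simply omitted, so the model's Modelfile defaults apply as the last-resort fallback.

```python
# Request field name -> Ollama option name (the max_tokens mapping is assumed).
PARAM_TO_OLLAMA_OPTION = {
    "temperature": "temperature",
    "top_p": "top_p",
    "repeat_penalty": "repeat_penalty",
    "max_tokens": "num_predict",
}


def build_options(runtime_params):
    """Translate supplied runtime parameters into Ollama options.

    Unknown keys are dropped and missing keys are left out entirely,
    so no default value is ever injected by the sidecar itself.
    """
    params = runtime_params or {}
    return {
        option: params[name]
        for name, option in PARAM_TO_OLLAMA_OPTION.items()
        if params.get(name) is not None
    }
```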
|
||||||
|
|
||||||
|
### Active Prompt Integration
|
||||||
|
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt in `ai_prompts`
|
||||||
|
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in database not code
|
||||||
|
- **Implementation**:
|
||||||
|
- Backend resolves Active Prompt for `ocr_extraction` type
|
||||||
|
- Backend extracts systemPrompt and DMS tags (`<document_number>`, `<document_date>`, `<received_date>`)
|
||||||
|
- Backend sends systemPrompt and DMS tags to sidecar
|
||||||
|
- Sidecar receives and injects into Ollama request in every load/generate call
|
||||||
|
- **Alternatives Rejected**:
|
||||||
|
- Create new `PromptBuilderService` → rejected (use existing ADR-029/037 system)
|
||||||
|
- Hardcode DMS tags in sidecar → rejected (violates ADR-036 parameter governance)
|
||||||
|
|
||||||
|
## Authentication Decisions
|
||||||
|
|
||||||
|
### Two-Phase Auth Migration
|
||||||
|
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
|
||||||
|
- **Rationale**: Sequenced migration; network isolation only possible after server consolidation
|
||||||
|
- **Phase 1 Implementation**:
|
||||||
|
- Remove hardcoded default API key
|
||||||
|
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
|
||||||
|
- Continue validating X-API-Key on both sidecar and backend
|
||||||
|
- **Phase 2 Implementation** (after ADR-041 consolidation):
|
||||||
|
- Remove X-API-Key validation from sidecar endpoints
|
||||||
|
- Remove X-API-Key send-side from `OcrService`
|
||||||
|
- Remove X-API-Key send-side from `SandboxOcrEngineService`
|
||||||
|
- Rely on Docker-internal network isolation
|
||||||
|
- **Interim Period**: X-API-Key validation must remain active until ADR-041 cutover
|
||||||
|
- **Alternatives Considered**:
|
||||||
|
- Remove X-API-Key immediately → rejected (cross-host topology requires defense-in-depth)
|
||||||
|
- Keep X-API-Key permanently → rejected (adds complexity without value post-consolidation)
|
||||||
|
|
||||||
|
## Endpoint Decisions
|
||||||
|
|
||||||
|
### Remove /normalize Endpoint
|
||||||
|
- **Decision**: Remove `/normalize` endpoint from sidecar
|
||||||
|
- **Rationale**: No consumers exist (verified by grep across backend codebase); ThaiPreprocessProcessor unused
|
||||||
|
- **Verification**: Grep search found no calls to `/normalize` or `THAI_PREPROCESS_URL`
|
||||||
|
- **Impact**: None - endpoint has no consumers
|
||||||
|
|
||||||
|
### Fix Mutable Default Argument
|
||||||
|
- **Decision**: Fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
|
||||||
|
- **Rationale**: Python anti-pattern; causes unexpected behavior when defaults are mutated
|
||||||
|
- **Implementation**: Change to `options_override: Optional[dict] = None` and initialize to `{}` in the function body when the argument is `None`
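The pitfall and the fix can be shown with two small functions. These are illustrative stand-ins, not the real `process_with_typhoon_ocr`: the call counter exists only to make the shared state visible.

```python
from typing import Optional


def process_buggy(options_override={}):
    """Buggy: the default dict is created once and shared across calls."""
    options_override["calls"] = options_override.get("calls", 0) + 1
    return options_override["calls"]


def process_fixed(options_override: Optional[dict] = None):
    """Fixed: a fresh dict is created whenever the argument is omitted."""
    options = {} if options_override is None else dict(options_override)
    options["calls"] = options.get("calls", 0) + 1
    return options["calls"]
```

Calling `process_buggy()` twice returns `1` and then `2`, because the second call sees the state left behind by the first; `process_fixed()` returns `1` both times.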
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
### External Dependencies
|
||||||
|
- **FastAPI 0.111.0**: Web framework (already in use)
|
||||||
|
- **httpx 0.27.0**: HTTP client (switch from the sync `httpx.Client` to `httpx.AsyncClient`)
|
||||||
|
- **PyMuPDF 1.24.0**: PDF processing (already in use)
|
||||||
|
- **typhoon-ocr>=0.4.1**: OCR library (already in use)
|
||||||
|
- **FlagEmbedding>=1.2.0**: Embedding model (already in use)
|
||||||
|
- **pythainlp 5.0.4**: Thai NLP (already in use)
|
||||||
|
|
||||||
|
### Internal Dependencies
|
||||||
|
- **residency_policy.py**: Must retain for Adaptive OCR Residency
|
||||||
|
- **vram_monitor.py**: Must retain for VRAM monitoring
|
||||||
|
- **backend AI services**: OcrService, SandboxOcrEngineService must be updated for parameter resolution
|
||||||
|
|
||||||
|
## Testing Strategy
|
||||||
|
|
||||||
|
### Path Traversal Tests
|
||||||
|
- Test cases for various path traversal patterns (`../../etc/passwd`, symlinks, etc.)
|
||||||
|
- Expect 403 Forbidden for all malicious paths
|
||||||
|
- Use pytest for automated testing
|
||||||
|
|
||||||
|
### Residency Wiring Tests
|
||||||
|
- Unit test to verify `calculate_ocr_residency()` is called in `process_ocr`
|
||||||
|
- Verify keep_alive value is calculated dynamically, not fixed
|
||||||
|
- Test with different VRAM pressure scenarios
|
||||||
|
|
||||||
|
### Performance Tests
|
||||||
|
- Benchmark async vs sync I/O under concurrent load
|
||||||
|
- Target: 20%+ throughput improvement
|
||||||
|
- Measure response times and resource utilization
|
||||||
|
|
||||||
|
## Rollback Plan
|
||||||
|
|
||||||
|
If issues arise during deployment:
|
||||||
|
1. Revert `app.py` to previous version
|
||||||
|
2. Restore X-API-Key send-side in backend services
|
||||||
|
3. Re-pin `keep_alive` default to `0` in `process_ocr`
|
||||||
|
4. Restore hardcoded runtime params if needed for emergency fallback
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- ADR-040: OCR Sidecar Refactor
|
||||||
|
- ADR-036: Profile-Only Parameter Governance
|
||||||
|
- ADR-029: Dynamic Prompt Management
|
||||||
|
- ADR-037: Active Prompt System
|
||||||
|
- ADR-041: Server Consolidation (dependency for Phase 2)
|
||||||
|
- CONTEXT.md: GPU Policy (LLM-First Ownership, CPU Fallback)
|
||||||
@@ -0,0 +1,168 @@
|
|||||||
|
# Feature Specification: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Feature Branch**: `140-ocr-sidecar-refactor`
|
||||||
|
**Created**: 2026-06-20
|
||||||
|
**Status**: Draft
|
||||||
|
**Input**: ADR-040: OCR Sidecar Refactor — Pure Compute Worker, Preserved GPU Policy, Network-Trust Boundary
|
||||||
|
|
||||||
|
## User Scenarios & Testing _(mandatory)_
|
||||||
|
|
||||||
|
### User Story 1 - Sidecar Security Hardening (Priority: P1)
|
||||||
|
|
||||||
|
System administrators need to ensure the OCR sidecar on Desk-5439 is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
|
||||||
|
|
||||||
|
**Why this priority**: Security vulnerabilities (hardcoded API keys, path traversal) are critical risks that could lead to unauthorized access and data breaches.
|
||||||
|
|
||||||
|
**Independent Test**: Can be fully tested by attempting path traversal requests and verifying that hardcoded default keys are rejected when environment variables are missing, delivering immediate security validation.
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** the sidecar's API key has leaked, **When** an administrator rotates `OCR_SIDECAR_API_KEY`, **Then** the new key takes effect without a container rebuild
|
||||||
|
2. **Given** a malicious request with path traversal (e.g., `../../etc/passwd`), **When** the `/ocr` endpoint receives the request, **Then** the system returns 403 Forbidden
|
||||||
|
3. **Given** the sidecar starts without `OCR_SIDECAR_API_KEY` environment variable, **When** the container initializes, **Then** it fails fast with clear error message
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 2 - GPU Resource Management (Priority: P1)
|
||||||
|
|
||||||
|
The system must prevent VRAM exhaustion on Desk-5439 (RTX 5060 Ti 16GB) by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring the LLM (Typhoon-2.5) has priority GPU access.
|
||||||
|
|
||||||
|
**Why this priority**: VRAM exhaustion causes complete system failure. The LLM-First GPU Ownership policy is critical for system stability.
|
||||||
|
|
||||||
|
**Independent Test**: Can be fully tested by monitoring VRAM usage during concurrent OCR and embedding operations, verifying that BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** the GPU is under heavy load from LLM operations, **When** an OCR request comes in, **Then** the system uses `calculate_ocr_residency()` to determine appropriate `keep_alive` value
|
||||||
|
2. **Given** VRAM is nearly full, **When** embedding or reranking requests are made, **Then** BGE-M3 and FlagReranker automatically fall back to CPU
|
||||||
|
3. **Given** the sidecar loads OCR model, **When** the operation completes, **Then** the model is unloaded based on residency policy (not fixed `keep_alive=0` or `300`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
|
||||||
|
|
||||||
|
Backend services need to control AI model parameters (temperature, top_p, repeat_penalty, max_tokens) from the database via `ai_execution_profiles` and `ai_prompts` tables, ensuring no hardcoded values in the sidecar. `keep_alive` is the exception: the sidecar calculates it at process time and does not accept it from the backend (FR-007).
|
||||||
|
|
||||||
|
**Why this priority**: This enables dynamic parameter tuning without container rebuilds, aligning with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System.
|
||||||
|
|
||||||
|
**Independent Test**: Can be fully tested by modifying `ai_execution_profiles` row `ocr-extract` and verifying that the sidecar uses the new parameters on the next request.
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** the `ai_execution_profiles` row `ocr-extract` has `temperature=0.7`, **When** the backend sends OCR request, **Then** the sidecar passes `temperature=0.7` to Ollama
|
||||||
|
2. **Given** the Active Prompt in `ai_prompts` contains system prompt and DMS tags, **When** the backend resolves the prompt, **Then** the sidecar receives and injects these into the Ollama request
|
||||||
|
3. **Given** a parameter is missing from the job snapshot, **When** the sidecar processes the request, **Then** it uses the Modelfile as last-resort fallback only
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 4 - Async I/O Performance (Priority: P2)
|
||||||
|
|
||||||
|
The sidecar must use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
|
||||||
|
|
||||||
|
**Why this priority**: Synchronous blocking I/O reduces system throughput and can cause request timeouts under load.
|
||||||
|
|
||||||
|
**Independent Test**: Can be fully tested by running concurrent OCR requests and measuring response times, verifying that async implementation handles load without blocking.
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** the sidecar receives multiple concurrent OCR requests, **When** processing with `httpx.AsyncClient`, **Then** requests do not block each other
|
||||||
|
2. **Given** the sidecar starts up, **When** models are loaded, **Then** loading happens via `asyncio.to_thread` to avoid blocking startup
|
||||||
|
3. **Given** the sidecar is under load, **When** measuring request latency, **Then** async implementation shows improved throughput compared to sync version
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 5 - Network Isolation Auth (Phase 2, Post-Consolidation) (Priority: P3)
|
||||||
|
|
||||||
|
After ADR-041 server consolidation completes (single Docker host), the system should remove X-API-Key validation and rely solely on Docker-internal network isolation for access control.
|
||||||
|
|
||||||
|
**Why this priority**: This is a future-phase improvement that simplifies the system after infrastructure consolidation. It's lower priority as it depends on ADR-041 completion.
|
||||||
|
|
||||||
|
**Independent Test**: Can be fully tested after consolidation by removing X-API-Key headers and verifying that requests from within Docker network succeed while external requests fail.
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** ADR-041 consolidation is complete (single Docker host), **When** backend calls sidecar without X-API-Key, **Then** the request succeeds via Docker-internal network
|
||||||
|
2. **Given** consolidation is complete, **When** external network attempts to call sidecar, **Then** the request is blocked by network isolation
|
||||||
|
3. **Given** the interim period (before consolidation), **When** backend calls sidecar, **Then** X-API-Key validation is still active
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Edge Cases
|
||||||
|
|
||||||
|
- What happens when the OCR sidecar receives a request for a PDF file that does not exist within the whitelisted base path? (Tested via path traversal test T007)
|
||||||
|
- How does the system handle VRAM exhaustion when both LLM and OCR models attempt to load simultaneously?
|
||||||
|
- What happens when the `ai_execution_profiles` row `ocr-extract` is missing or has invalid parameter values?
|
||||||
|
- How does the sidecar handle Ollama service unavailability or timeout during OCR processing? (Handled by FastAPI exception handling with user-friendly error messages per ADR-007)
|
||||||
|
- What happens when the Active Prompt system is unavailable during OCR request processing?
|
||||||
|
- How does the system handle concurrent requests when GPU is under extreme pressure (e.g., 95% VRAM usage)?
|
||||||
|
- What happens when path canonicalization resolves to a symlink outside the base path? (Tested via path traversal test T007 with symlink scenarios)
|
||||||
|
- How does the system behave during the transition period between Phase 1 (X-API-Key) and Phase 2 (Network Isolation)?
|
||||||
|
|
||||||
|
## Requirements _(mandatory)_
|
||||||
|
|
||||||
|
### Functional Requirements
|
||||||
|
|
||||||
|
- **FR-001**: Sidecar MUST remove hardcoded default API key and fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
|
||||||
|
- **FR-002**: Sidecar MUST implement path canonicalization via `os.path.abspath()` + `os.path.realpath()` on all PDF path inputs
|
||||||
|
- **FR-003**: Sidecar MUST enforce base-path whitelist check on `/ocr` endpoint, rejecting paths outside `OCR_SIDECAR_UPLOAD_BASE` with 403 Forbidden
|
||||||
|
- **FR-004**: Sidecar MUST refactor `process_ocr` to use `async def` and `httpx.AsyncClient` via lifespan context manager
|
||||||
|
- **FR-005**: Sidecar MUST replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan pattern
|
||||||
|
- **FR-006**: Sidecar MUST wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive` calculation
|
||||||
|
- **FR-007**: Sidecar MUST NOT accept explicit `options_override["keep_alive"]` from backend (keep_alive must be calculated lazily per ADR-036 Gap-2)
|
||||||
|
- **FR-008**: Sidecar MUST retain `vram_monitor.py` and `residency_policy.py` modules (reject deletion)
|
||||||
|
- **FR-009**: Sidecar MUST retain dynamic CPU/GPU selection for `/embed` and `/rerank` endpoints via `.to(device)` logic
|
||||||
|
- **FR-010**: Sidecar MUST remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) and accept from backend job snapshot
|
||||||
|
- **FR-011**: Sidecar MUST receive systemPrompt and DMS extraction tags from backend and pass to Ollama in every load/generate call
|
||||||
|
- **FR-012**: Sidecar MUST remove `/normalize` endpoint (ThaiPreprocessProcessor has no consumers)
|
||||||
|
- **FR-013**: Sidecar MUST fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
|
||||||
|
- **FR-014**: Sidecar MUST load models via `asyncio.to_thread` during lifespan to avoid blocking startup
|
||||||
|
- **FR-015**: Backend MUST resolve runtime parameters from `ai_execution_profiles` row `ocr-extract` and send to sidecar
|
||||||
|
- **FR-016**: Backend MUST resolve systemPrompt and DMS tags from Active Prompt in `ai_prompts` (ADR-029/037)
|
||||||
|
- **FR-017**: Backend MUST send resolved parameters to sidecar in every OCR request
|
||||||
|
- **FR-018**: Phase 2 (post-ADR-041): Sidecar MUST remove X-API-Key validation from all endpoints
|
||||||
|
- **FR-019**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `OcrService`
|
||||||
|
- **FR-020**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `SandboxOcrEngineService`
|
||||||
|
|
||||||
|
### Key Entities
|
||||||
|
|
||||||
|
- **OCR Sidecar (FastAPI Service)**: Pure compute worker on Desk-5439 that provides `/ocr`, `/embed`, `/rerank` endpoints. No business logic or parameter governance. Receives parameters from backend.
|
||||||
|
- **ai_execution_profiles**: Database table containing runtime parameter profiles for different AI operations (row `ocr-extract` for OCR parameters)
|
||||||
|
- **ai_prompts**: Database table containing prompt templates with versioning and activation status (ADR-029/037)
|
||||||
|
- **Backend OcrService**: Service that orchestrates OCR requests, resolves parameters from database, and sends to sidecar
|
||||||
|
- **Backend SandboxOcrEngineService**: Service for OCR sandbox testing, similar parameter resolution as OcrService
|
||||||
|
|
||||||
|
## Success Criteria _(mandatory)_
|
||||||
|
|
||||||
|
### Measurable Outcomes
|
||||||
|
|
||||||
|
- **SC-001**: Path traversal attacks return 403 Forbidden in 100% of test cases (verified by pytest suite)
|
||||||
|
- **SC-002**: VRAM exhaustion is prevented under load; system remains stable with LLM-First GPU Ownership policy (verified by VRAM monitoring during stress test)
|
||||||
|
- **SC-003**: OCR request throughput improves by at least 20% with async I/O implementation (measured by concurrent request benchmark)
|
||||||
|
- **SC-004**: Parameter changes in `ai_execution_profiles` take effect on the next OCR request without a container rebuild (verified by runtime parameter update test)
|
||||||
|
- **SC-005**: System startup time does not increase despite async model loading (measured by container startup benchmark)
|
||||||
|
- **SC-006**: No hardcoded secrets remain in sidecar codebase (verified by code audit)
|
||||||
|
- **SC-007**: All sidecar endpoints respect network isolation after ADR-041 consolidation (verified by network access test)
|
||||||
|
- **SC-008**: CPU fallback for BGE-M3 and FlagReranker activates correctly when GPU is under pressure (verified by VRAM monitoring test)
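SC-003 is a relative threshold, so the benchmark needs an unambiguous formula. The sketch below is one way to express it, assuming throughput is measured in requests per second before and after the async change. The names `throughput_gain` and `meets_sc003` are illustrative and do not come from the test suite.

```python
def throughput_gain(baseline_rps: float, candidate_rps: float) -> float:
    """Return the relative throughput change, e.g. 0.25 for a 25% gain."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return (candidate_rps - baseline_rps) / baseline_rps


def meets_sc003(baseline_rps: float, candidate_rps: float,
                threshold: float = 0.20) -> bool:
    """True when the candidate is at least `threshold` faster than baseline."""
    return throughput_gain(baseline_rps, candidate_rps) >= threshold
```

Both measurements should come from the same concurrency level and document set, otherwise the ratio is not comparable.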
|
||||||
|
|
||||||
|
## Assumptions
|
||||||
|
|
||||||
|
- ADR-041 server consolidation will complete before Phase 2 (X-API-Key removal) can be implemented
|
||||||
|
- Desk-5439 (192.168.10.100) will continue to host the OCR sidecar with RTX 5060 Ti 16GB GPU
|
||||||
|
- The Ollama service on Desk-5439 will continue to provide the Typhoon OCR model
|
||||||
|
- ThaiPreprocessProcessor has no active consumers (verified by grep search across backend codebase)
|
||||||
|
- `calculate_ocr_residency()` function exists in `residency_policy.py` and is not currently wired into `process_ocr`
|
||||||
|
- VLAN/firewall ACL provides interim network security before ADR-041 consolidation
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
- ADR-041 Server Consolidation must complete before Phase 2 (X-API-Key removal)
|
||||||
|
- ADR-036 Profile-Only Parameter Governance must be implemented for parameter resolution
|
||||||
|
- ADR-029 Dynamic Prompt Management must be implemented for Active Prompt system
|
||||||
|
- ADR-037 Active Prompt System must be operational for system prompt injection
|
||||||
|
- Desk-5439 infrastructure must remain stable (GPU, network, Ollama service)
|
||||||
|
|
||||||
|
## Out of Scope
|
||||||
|
|
||||||
|
- 1-page-1-request horizontal scaling rework (separate future ADR)
|
||||||
|
- OpenTelemetry/Prometheus/Grafana observability (separate ticket)
|
||||||
|
- `/normalize` endpoint functionality (removed per D2; ThaiPreprocessProcessor has no consumers)
|
||||||
@@ -0,0 +1,296 @@
|
|||||||
|
# Tasks: OCR Sidecar Refactor
|
||||||
|
|
||||||
|
**Input**: Design documents from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/`
|
||||||
|
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/sidecar-api.md, quickstart.md
|
||||||
|
|
||||||
|
**Tests**: Tests are included for path-traversal protection and residency wiring (per spec acceptance criteria)
|
||||||
|
|
||||||
|
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
|
||||||
|
|
||||||
|
## Format: `[ID] [P?] [Story] Description`
|
||||||
|
|
||||||
|
- **[P]**: Can run in parallel (different files, no dependencies)
|
||||||
|
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
|
||||||
|
- Include exact file paths in descriptions
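Because every task line follows the same `[ID] [P?] [Story] Description` shape, it can be parsed mechanically, for example to list open tasks per story. The following is a minimal sketch, assuming the checkbox prefix (`- [x]` / `- [ ]`) and three-digit `T###` IDs used throughout this file. `Task` and `parse_task` are hypothetical helpers, not part of the repository.

```python
import re
from typing import NamedTuple, Optional

# "- [x] T007 [P] [US1] Description"; the [P] and [USn] markers are optional.
TASK_LINE = re.compile(
    r"^- \[(?P<done>[ xX])\] (?P<id>T\d{3})"
    r"(?: \[(?P<parallel>P)\])?"
    r"(?: \[(?P<story>US\d+)\])?"
    r" (?P<description>.+)$"
)


class Task(NamedTuple):
    task_id: str
    done: bool
    parallel: bool
    story: Optional[str]
    description: str


def parse_task(line: str) -> Optional[Task]:
    """Parse one task line; return None for anything that is not a task."""
    match = TASK_LINE.match(line.strip())
    if match is None:
        return None
    return Task(
        task_id=match["id"],
        done=match["done"].lower() == "x",
        parallel=match["parallel"] is not None,
        story=match["story"],
        description=match["description"],
    )
```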
|
||||||
|
|
||||||
|
## Path Conventions
|
||||||
|
|
||||||
|
- **Sidecar**: `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/`
|
||||||
|
- **Backend**: `backend/src/modules/ai/`
|
||||||
|
- **Tests**: `tests/unit/ocr-sidecar/`, `tests/integration/ocr-sidecar/`
|
||||||
|
|
||||||
|
## Phase 1: Setup (Shared Infrastructure)
|
||||||
|
|
||||||
|
**Purpose**: Project initialization and basic structure
|
||||||
|
|
||||||
|
- [x] T001 Create test directory structure in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests/
|
||||||
|
- [x] T002 Create test directory structure in tests/unit/ocr-sidecar/
|
||||||
|
- [x] T003 Create test directory structure in tests/integration/ocr-sidecar/
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2: Foundational (Blocking Prerequisites)
|
||||||
|
|
||||||
|
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
|
||||||
|
|
||||||
|
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
|
||||||
|
|
||||||
|
- [x] T004 Update requirements.txt in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/requirements.txt (add httpx 0.27.0, remove numpy if present)
|
||||||
|
- [x] T005 Update .env template in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env (add OCR_SIDECAR_API_KEY placeholder)
|
||||||
|
- [x] T006 Update backend .env.example in backend/.env.example (add OCR_API_URL, OCR_API_KEY placeholders)
|
||||||
|
|
||||||
|
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3: User Story 1 - Sidecar Security Hardening (Priority: P1) 🎯 MVP
|
||||||
|
|
||||||
|
**Goal**: Ensure the OCR sidecar is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
|
||||||
|
|
||||||
|
**Independent Test**: Attempt path traversal requests and verify they return 403 Forbidden; verify sidecar fails fast when OCR_SIDECAR_API_KEY env is missing.
|
||||||
|
|
||||||
|
### Tests for User Story 1
|
||||||
|
|
||||||
|
- [x] T007 [P] [US1] Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py (test various path patterns: ../../etc/passwd, symlinks outside base path, etc.)
|
||||||
|
- [x] T008 [P] [US1] Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py (test missing key, invalid key scenarios)
|
||||||
|
|
||||||
|
### Implementation for User Story 1
|
||||||
|
|
||||||
|
- [x] T009 [US1] Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T010 [US1] Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (raise error on startup if missing)
|
||||||
|
- [x] T011 [US1] Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (using os.path.abspath + os.path.realpath)
|
||||||
|
- [x] T012 [US1] Implement base-path whitelist check in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check against OCR_SIDECAR_UPLOAD_BASE)
|
||||||
|
- [x] T013 [US1] Add path validation to POST /ocr endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (return 403 for invalid paths)
|
||||||
|
- [x] T014 [US1] Fix mutable default argument options_override={} in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (change to None and initialize in function body)
|
||||||
|
- [x] T015 [US1] Remove duplicate import tempfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
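Tasks T011 to T013 amount to "canonicalize, then check containment". The sketch below shows one way to do that with `os.path.abspath` and `os.path.realpath` as named in T011. It is not the code in `app.py`: `resolve_upload_path` and `PathNotAllowed` are hypothetical names, and the caller is assumed to translate the exception into a 403 response.

```python
import os


class PathNotAllowed(Exception):
    """Raised when a requested path escapes the upload base (maps to HTTP 403)."""


def resolve_upload_path(requested: str, upload_base: str) -> str:
    """Return the canonical path for `requested`, or raise PathNotAllowed."""
    base = os.path.realpath(os.path.abspath(upload_base))
    # Relative inputs are interpreted against the base, not the process cwd.
    # An absolute input replaces the base in os.path.join and is caught below.
    candidate = os.path.realpath(os.path.abspath(os.path.join(base, requested)))
    # commonpath compares whole path components, so "/mnt/uploads-evil"
    # does not pass as a child of "/mnt/uploads". realpath has already
    # followed symlinks, so a link pointing outside the base is rejected too.
    if os.path.commonpath([base, candidate]) != base:
        raise PathNotAllowed(requested)
    return candidate
```

A plain `startswith` prefix check is deliberately avoided here, because it accepts sibling directories that merely share a name prefix with the base.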
|
||||||
|
|
||||||
|
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
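Two of the smaller Phase 3 fixes, the fail-fast key check (T010) and the mutable default argument (T014), are easy to get subtly wrong. The sketch below illustrates the intended pattern under stated assumptions: `require_api_key` and `merge_options` are hypothetical names, and the real `app.py` may structure this differently.

```python
import os
from typing import Any, Dict, Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured key, or raise at startup if it is missing."""
    source = os.environ if env is None else env
    key = source.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        # No fallback value: a missing key must stop the process.
        raise RuntimeError("OCR_SIDECAR_API_KEY is not set; refusing to start")
    return key


def merge_options(base: Dict[str, Any],
                  options_override: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge overrides into base without sharing state between calls."""
    # A default of {} would be created once and reused across calls;
    # None plus a fresh dict in the body avoids that.
    overrides = {} if options_override is None else options_override
    return {**base, **overrides}
```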
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4: User Story 2 - GPU Resource Management (Priority: P1)
|
||||||
|
|
||||||
|
**Goal**: Prevent VRAM exhaustion on Desk-5439 by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring LLM has priority GPU access.
|
||||||
|
|
||||||
|
**Independent Test**: Monitor VRAM usage during concurrent OCR and embedding operations; verify BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
|
||||||
|
|
||||||
|
### Tests for User Story 2
|
||||||
|
|
||||||
|
- [x] T016 [P] [US2] Create residency wiring unit test in tests/unit/ocr-sidecar/test_residency_wiring.py (verify calculate_ocr_residency is called in process_ocr)
|
||||||
|
- [x] T017 [P] [US2] Create CPU fallback integration test in tests/integration/ocr-sidecar/test_cpu_fallback.py (verify BGE-M3 and FlagReranker use CPU when GPU under pressure)
|
||||||
|
|
||||||
|
### Implementation for User Story 2
|
||||||
|
|
||||||
|
- [x] T018 [US2] Import calculate_ocr_residency from residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T019 [US2] Wire calculate_ocr_residency(active_profile) into process_ocr function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T020 [US2] Remove hardcoded keep_alive=0 in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T021 [US2] Reject explicit options_override["keep_alive"] from backend in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (keep_alive must be calculated lazily per ADR-036 Gap-2)
|
||||||
|
- [x] T022 [US2] Retain vram_monitor.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
|
||||||
|
- [x] T023 [US2] Retain residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
|
||||||
|
- [x] T024 [US2] Verify dynamic CPU/GPU selection exists for /embed endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
|
||||||
|
- [x] T025 [US2] Verify dynamic CPU/GPU selection exists for /rerank endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
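The Phase 4 tasks combine three rules: residency comes from `calculate_ocr_residency(active_profile)`, a caller-supplied `keep_alive` is rejected, and retrieval models fall back to CPU under GPU pressure. The sketch below illustrates those rules only. The signature and return type of the real `calculate_ocr_residency` are not shown in this document, so it is passed in as a callable. `build_ollama_options`, `pick_device`, and the VRAM figures are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def build_ollama_options(
    active_profile: Dict[str, Any],
    options_override: Optional[Dict[str, Any]],
    calculate_residency: Callable[[Dict[str, Any]], Any],
) -> Dict[str, Any]:
    """Combine caller options with a sidecar-computed keep_alive."""
    overrides = dict(options_override or {})
    if "keep_alive" in overrides:
        # T021: residency is decided by the sidecar policy, never by the caller.
        raise ValueError("keep_alive may not be supplied by the caller")
    return {
        "options": overrides,
        "keep_alive": calculate_residency(active_profile),
    }


def pick_device(free_vram_mb: int, required_mb: int,
                gpu_available: bool = True) -> str:
    """Choose 'cuda' only when the GPU exists and has enough free memory."""
    if gpu_available and free_vram_mb >= required_mb:
        return "cuda"
    return "cpu"
```

Keeping `keep_alive` outside the `options` dictionary matches the Ollama request shape, where it is a top-level field next to `options`.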
|
||||||
|
|
||||||
|
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5: User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
|
||||||
|
|
||||||
|
**Goal**: Enable backend services to control AI model parameters from the database via ai_execution_profiles and ai_prompts tables, ensuring no hardcoded values in the sidecar.
|
||||||
|
|
||||||
|
**Independent Test**: Modify ai_execution_profiles row ocr-extract and verify that the sidecar uses the new parameters on the next request.
|
||||||
|
|
||||||
|
### Tests for User Story 3
|
||||||
|
|
||||||
|
- [x] T026 [P] [US3] Create parameter resolution integration test in tests/integration/ocr-sidecar/test_parameter_governance.py (verify parameters from ai_execution_profiles are used)
|
||||||
|
- [x] T027 [P] [US3] Create Active Prompt integration test in tests/integration/ocr-sidecar/test_active_prompt.py (verify systemPrompt and DMS tags from ai_prompts are used)
|
||||||
|
|
||||||
|
### Implementation for User Story 3
|
||||||
|
|
||||||
|
- [x] T028 [US3] Remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T029 [US3] Add runtime_params field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T030 [US3] Add system_prompt field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T031 [US3] Add dms_tags field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T032 [US3] Pass runtime_params to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T033 [US3] Pass system_prompt to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
|
||||||
|
- [x] T034 [US3] Pass dms_tags to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
|
||||||
|
- [x] T035 [US3] Implement parameter resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_execution_profiles row ocr-extract)
|
||||||
|
- [x] T036 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_prompts type ocr_extraction)
|
||||||
|
- [x] T037 [US3] Extract systemPrompt and DMS tags in backend/src/modules/ai/services/ocr.service.ts
|
||||||
|
- [x] T038 [US3] Send resolved parameters to sidecar in backend/src/modules/ai/services/ocr.service.ts
|
||||||
|
- [x] T039 [US3] Implement parameter resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
|
||||||
|
- [x] T040 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
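The request shape implied by T028 to T031 can be sketched as follows. This is a stdlib `dataclass` stand-in for the pydantic `OcrRequest` model so that it runs without dependencies. The `file_path` field, the `List[str]` type for `dms_tags`, and the decision to reject missing parameters instead of filling defaults are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# The four parameters T028 removes from the sidecar's hardcoded values.
REQUIRED_PARAMS = ("temperature", "top_p", "repeat_penalty", "max_tokens")


@dataclass(frozen=True)
class OcrRequest:
    file_path: str                      # hypothetical field name
    runtime_params: Dict[str, Any]      # resolved by the backend (T035)
    system_prompt: str                  # from the Active Prompt (T036/T037)
    dms_tags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # No sidecar-side defaults: a missing parameter is an error,
        # not something to paper over with a hardcoded value.
        missing = [name for name in REQUIRED_PARAMS
                   if name not in self.runtime_params]
        if missing:
            raise ValueError(f"runtime_params missing: {', '.join(missing)}")
        if not self.system_prompt.strip():
            raise ValueError("system_prompt must not be empty")
```

Rejecting incomplete requests keeps the governance rule testable: if the backend stops resolving a parameter, the failure is visible immediately instead of being masked by a sidecar default.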
|
||||||
|
|
||||||
|
**Checkpoint**: All user stories should now be independently functional
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 6: User Story 4 - Async I/O Performance (Priority: P2)
|
||||||
|
|
||||||
|
**Goal**: Use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
|
||||||
|
|
||||||
|
**Independent Test**: Run concurrent OCR requests and measure response times; verify async implementation handles load without blocking.
|
||||||
|
|
||||||
|
### Tests for User Story 4
|
||||||
|
|
||||||
|
- [x] T041 [P] [US4] Create async I/O performance test in tests/integration/ocr-sidecar/test_async_performance.py (benchmark concurrent requests)
|
||||||
|
|
||||||
|
### Implementation for User Story 4
|
||||||
|
|
||||||
|
- [x] T042 [US4] Refactor process_ocr to async def in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T043 [US4] Create AsyncClient via lifespan context manager in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T044 [US4] Replace httpx.Client with httpx.AsyncClient in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T045 [US4] Replace @app.on_event("startup") with @asynccontextmanager lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T046 [US4] Load models via asyncio.to_thread during lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (avoid blocking startup)
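T045 and T046 together describe a lifespan context manager that pushes blocking model loads onto worker threads. The sketch below shows that pattern with the standard library only. `load_model` is a hypothetical stand-in for the real loaders, and the `app` argument mirrors the single-argument lifespan signature that FastAPI accepts via `FastAPI(lifespan=...)`.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Dict


def load_model(name: str) -> str:
    """Stand-in for a blocking model load (hypothetical)."""
    time.sleep(0.05)
    return f"{name}-loaded"


@asynccontextmanager
async def lifespan(app: Any) -> AsyncIterator[Dict[str, str]]:
    # Each blocking load runs in a worker thread; gather lets them overlap
    # while the event loop itself stays free for other coroutines.
    embedder, reranker = await asyncio.gather(
        asyncio.to_thread(load_model, "embedder"),
        asyncio.to_thread(load_model, "reranker"),
    )
    models = {"embedder": embedder, "reranker": reranker}
    try:
        yield models
    finally:
        models.clear()  # release references on shutdown
```

The application still waits for both loads before it starts serving. What `asyncio.to_thread` changes is that the event loop is not blocked during the wait.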
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 7: User Story 5 - Network Isolation Auth Phase 2 (Priority: P3)
|
||||||
|
|
||||||
|
**Goal**: After ADR-041 server consolidation completes, remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
|
||||||
|
|
||||||
|
**Independent Test**: After consolidation, remove X-API-Key headers and verify that requests from within Docker network succeed while external requests fail.
|
||||||
|
|
||||||
|
### Tests for User Story 5
|
||||||
|
|
||||||
|
- [ ] T047 [P] [US5] Create network isolation test in tests/integration/ocr-sidecar/test_network_isolation.py (verify Docker-internal requests work, external requests fail)
|
||||||
|
|
||||||
|
### Implementation for User Story 5 (BLOCKED until ADR-041 consolidation complete)
|
||||||
|
|
||||||
|
- [ ] T048 [US5] Remove X-API-Key validation from all endpoints in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [ ] T049 [US5] Remove OCR_SIDECAR_API_KEY from .env in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
|
||||||
|
- [ ] T050 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/ocr.service.ts
|
||||||
|
- [ ] T051 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
|
||||||
|
- [ ] T052 [US5] Remove OCR_API_KEY from backend .env in backend/.env
|
||||||
|
- [ ] T053 [US5] Update OCR_API_URL to Docker-internal URL in backend/.env (e.g., http://sidecar:8765)
|
||||||
|
|
||||||
|
**Note**: Phase 7 tasks are BLOCKED until ADR-041 server consolidation completes. Do not implement until ADR-041 cutover is successful.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 8: Remove /normalize Endpoint (Cross-Cutting)
|
||||||
|
|
||||||
|
**Purpose**: Remove unused /normalize endpoint per ADR-040 D2
|
||||||
|
|
||||||
|
- [x] T054 Remove /normalize endpoint from specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
|
||||||
|
- [x] T055 Verify no consumers exist via grep search in backend codebase
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 9: Polish & Cross-Cutting Concerns
|
||||||
|
|
||||||
|
**Purpose**: Improvements that affect multiple user stories
|
||||||
|
|
||||||
|
- [x] T056 [P] Update Dockerfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/Dockerfile (if any changes needed)
|
||||||
|
- [x] T057 [P] Update docker-compose.yml in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml (if any changes needed)
|
||||||
|
- [x] T058 Run path traversal test suite and verify all tests pass
|
||||||
|
- [x] T059 Run residency wiring test suite and verify all tests pass
|
||||||
|
- [x] T060 Run parameter governance test suite and verify all tests pass
|
||||||
|
- [x] T061 Run async performance test and verify 20%+ throughput improvement
|
||||||
|
- [x] T062 Update documentation in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/README.md
|
||||||
|
- [x] T063 Validate quickstart.md deployment steps on Desk-5439
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Dependencies & Execution Order
|
||||||
|
|
||||||
|
### Phase Dependencies
|
||||||
|
|
||||||
|
- **Setup (Phase 1)**: No dependencies - can start immediately
|
||||||
|
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
|
||||||
|
- **User Stories (Phase 3-6)**: All depend on Foundational phase completion
|
||||||
|
  - User Stories 1-4 (P1, P1, P2, P2) can proceed in parallel after Phase 2
|
||||||
|
  - User Story 5 (P3) is BLOCKED until ADR-041 consolidation completes
|
||||||
|
- **Remove /normalize (Phase 8)**: Can run in parallel with user stories (no dependencies)
|
||||||
|
- **Polish (Phase 9)**: Depends on all desired user stories being complete
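The phase rules above can be written down as a small dependency map, which makes it easy to ask "what can start now?". This is an illustrative sketch only. The phase keys are made up. Treating User Story 5 as depending on both Foundational and the ADR-041 cutover, and Polish as depending on User Stories 1-4, is an interpretation of the list, not something the spec states in this form.

```python
from typing import Dict, List, Set

PHASE_DEPS: Dict[str, Set[str]] = {
    "setup": set(),
    "foundational": {"setup"},
    "us1": {"foundational"},
    "us2": {"foundational"},
    "us3": {"foundational"},
    "us4": {"foundational"},
    "us5": {"foundational", "adr-041-cutover"},  # external blocker
    "remove-normalize": set(),
    "polish": {"us1", "us2", "us3", "us4"},
}


def ready_phases(done: Set[str]) -> List[str]:
    """Return phases that are not finished and whose dependencies all are."""
    return sorted(
        phase for phase, deps in PHASE_DEPS.items()
        if phase not in done and deps <= done
    )
```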
|
||||||
|
|
||||||
|
### User Story Dependencies
|
||||||
|
|
||||||
|
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
|
||||||
|
- **User Story 2 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
|
||||||
|
- **User Story 3 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
|
||||||
|
- **User Story 4 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
|
||||||
|
- **User Story 5 (P3)**: BLOCKED until ADR-041 consolidation completes
|
||||||
|
|
||||||
|
### Within Each User Story
|
||||||
|
|
||||||
|
- Tests MUST be written and FAIL before implementation (TDD approach)
|
||||||
|
- Sidecar implementation before backend implementation (for parameter governance story)
|
||||||
|
- Core implementation before integration
|
||||||
|
- Story complete before moving to next priority
|
||||||
|
|
||||||
|
### Parallel Opportunities
|
||||||
|
|
||||||
|
- All Setup tasks (T001-T003) can run in parallel
|
||||||
|
- All Foundational tasks (T004-T006) can run in parallel
|
||||||
|
- Once Foundational phase completes, User Stories 1-4 can start in parallel (if team capacity allows)
|
||||||
|
- All tests for a user story marked [P] can run in parallel
|
||||||
|
- User Story 5 tasks can run in parallel once ADR-041 consolidation completes
|
||||||
|
- Remove /normalize tasks (T054-T055) can run in parallel with user stories
|
||||||
|
- Polish tasks (T056-T057) can run in parallel
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Parallel Example: User Story 1
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Launch all tests for User Story 1 together:
|
||||||
|
Task: "Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py"
|
||||||
|
Task: "Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py"
|
||||||
|
|
||||||
|
# Launch implementation tasks sequentially (each depends on previous):
|
||||||
|
Task: "Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
|
||||||
|
Task: "Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
|
||||||
|
Task: "Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Strategy
|
||||||
|
|
||||||
|
### MVP First (User Stories 1-2 Only - Critical Security & GPU Management)
|
||||||
|
|
||||||
|
1. Complete Phase 1: Setup
|
||||||
|
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
|
||||||
|
3. Complete Phase 3: User Story 1 (Security Hardening)
|
||||||
|
4. Complete Phase 4: User Story 2 (GPU Resource Management)
|
||||||
|
5. **STOP and VALIDATE**: Test User Stories 1-2 independently
|
||||||
|
6. Deploy/demo if ready
|
||||||
|
|
||||||
|
### Incremental Delivery
|
||||||
|
|
||||||
|
1. Complete Setup + Foundational → Foundation ready
|
||||||
|
2. Add User Story 1 → Test independently → Deploy/Demo (Security MVP!)
|
||||||
|
3. Add User Story 2 → Test independently → Deploy/Demo (GPU Management MVP!)
|
||||||
|
4. Add User Story 3 → Test independently → Deploy/Demo (Parameter Governance)
|
||||||
|
5. Add User Story 4 → Test independently → Deploy/Demo (Async Performance)
|
||||||
|
6. Wait for ADR-041 consolidation → Add User Story 5 → Test independently → Deploy/Demo
|
||||||
|
7. Each story adds value without breaking previous stories
|
||||||
|
|
||||||
|
### Parallel Team Strategy
|
||||||
|
|
||||||
|
With multiple developers:
|
||||||
|
|
||||||
|
1. Team completes Setup + Foundational together
|
||||||
|
2. Once Foundational is done:
|
||||||
|
   - Developer A: User Story 1 (Security)
|
||||||
|
   - Developer B: User Story 2 (GPU Management)
|
||||||
|
   - Developer C: User Story 3 (Parameter Governance)
|
||||||
|
   - Developer D: User Story 4 (Async I/O)
|
||||||
|
3. Stories complete and integrate independently
|
||||||
|
4. After ADR-041 consolidation: Developer A/E: User Story 5 (Network Isolation)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- [P] tasks = different files, no dependencies
|
||||||
|
- [Story] label maps task to specific user story for traceability
|
||||||
|
- Each user story should be independently completable and testable
|
||||||
|
- Verify tests fail before implementing
|
||||||
|
- Commit after each task or logical group
|
||||||
|
- Stop at any checkpoint to validate story independently
|
||||||
|
- User Story 5 is BLOCKED until ADR-041 consolidation completes
|
||||||
|
- Phase 7 tasks should NOT be started until ADR-041 cutover is successful
|
||||||
|
- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence
|
||||||
|
|
||||||
@@ -0,0 +1,36 @@
|
|||||||
|
# Specification Quality Checklist: Single-Host Server Consolidation
|
||||||
|
|
||||||
|
**Purpose**: Validate specification completeness and quality before proceeding to planning
|
||||||
|
**Created**: 2026-06-20
|
||||||
|
**Feature**: [spec.md](../spec.md)
|
||||||
|
|
||||||
|
## Content Quality
|
||||||
|
|
||||||
|
- [x] No implementation details (languages, frameworks, APIs) — spec focuses on operational outcomes
|
||||||
|
- [x] Focused on user value and business needs — admin/ops workflows clearly defined
|
||||||
|
- [x] Written for non-technical stakeholders — user stories describe journeys, not code
|
||||||
|
- [x] All mandatory sections completed — User Scenarios, Requirements, Success Criteria all filled
|
||||||
|
|
||||||
|
## Requirement Completeness
|
||||||
|
|
||||||
|
- [x] No [NEEDS CLARIFICATION] markers remain — all requirements have clear definitions
|
||||||
|
- [x] Requirements are testable and unambiguous — each FR has measurable acceptance criteria
|
||||||
|
- [x] Success criteria are measurable — SC-001 through SC-010 have specific metrics
|
||||||
|
- [x] Success criteria are technology-agnostic — focus on outcomes (parity, latency, uptime) not tools
|
||||||
|
- [x] All acceptance scenarios are defined — 5 user stories with Given/When/Then scenarios
|
||||||
|
- [x] Edge cases are identified — 7 edge cases covering GPU OOM, RAM, CIFS, SPOF, network, migration failures
|
||||||
|
- [x] Scope is clearly bounded — includes provisioning, migration, cutover, security, decommission
|
||||||
|
- [x] Dependencies and assumptions identified — 7 assumptions documented
|
||||||
|
|
||||||
|
## Feature Readiness
|
||||||
|
|
||||||
|
- [x] All functional requirements have clear acceptance criteria — FR-001 through FR-015 mapped to user stories
|
||||||
|
- [x] User scenarios cover primary flows — P1 (provision) → P2 (migrate) → P3 (cutover) → P4 (security) → P5 (decommission)
|
||||||
|
- [x] Feature meets measurable outcomes defined in Success Criteria — 10 measurable outcomes
|
||||||
|
- [x] No implementation details leak into specification — Docker/tech names are inherent to infra spec but kept at architecture level
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- This is an infrastructure specification based on ADR-041; some technical terms (Docker, CIFS, VRAM) are inherent to the domain
|
||||||
|
- ADR-040 (OCR Sidecar Refactor) is a hard dependency for FR-008 (remove X-API-Key) and FR-009 (GPU VRAM management)
|
||||||
|
- Spec is ready for `/speckit-clarify` or `/speckit-plan`
|
||||||
@@ -0,0 +1,69 @@
|
|||||||
|
# Docker Compose Contract: New Host
|
||||||
|
|
||||||
|
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
|
||||||
|
|
||||||
|
This contract defines the service topology for the consolidated single-host deployment.
|
||||||
|
The actual `docker-compose.new-host.yml` will be created at:
|
||||||
|
`specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
|
||||||
|
|
||||||
|
## Service Topology
|
||||||
|
|
||||||
|
| Service | Image | Networks | LAN Ports | Internal Port | Memory Limit | Depends On |
|
||||||
|
|---------|-------|----------|-----------|---------------|--------------|------------|
|
||||||
|
| ollama | ollama/ollama:latest | dms-internal | none | 11434 | 2G (host) | — |
|
||||||
|
| ocr-sidecar | build (local) | dms-internal | none | 8765 | 1G | ollama |
|
||||||
|
| backend | lcbp3-backend:latest | dms-internal, dms-frontend | 3001→3000 | 3000 | 2G | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
|
||||||
|
| frontend | lcbp3-frontend:latest | dms-frontend | 3000 | 3000 | 1G | backend |
|
||||||
|
| redis | redis:7-alpine | dms-internal | none | 6379 | 1G | — |
|
||||||
|
| mariadb | mariadb:11.8 | dms-internal | none | 3306 | 8G | — |
|
||||||
|
| elasticsearch | elasticsearch:8.11.1 | dms-internal | none | 9200 | 4G | — |
|
||||||
|
| qdrant | qdrant/qdrant:v1.16.1 | dms-internal | none | 6333 | 1G | — |
|
||||||
|
| clamav | clamav/clamav:1.4.4 | dms-internal | none | 3310 | 2G | — |
|
||||||
|
| ollama-metrics | ghcr.io/norskhelsenett/ollama-metrics:latest | dms-internal | 9924 | 9924 | 256M | ollama |
|
||||||
|
|
||||||
|
## Network Topology
|
||||||
|
|
||||||
|
```
|
||||||
|
dms-internal (bridge, no LAN access)
|
||||||
|
├── ollama:11434
|
||||||
|
├── ocr-sidecar:8765
|
||||||
|
├── backend:3000 (also on dms-frontend)
|
||||||
|
├── redis:6379
|
||||||
|
├── mariadb:3306
|
||||||
|
├── elasticsearch:9200
|
||||||
|
├── qdrant:6333
|
||||||
|
├── clamav:3310
|
||||||
|
└── ollama-metrics:9924
|
||||||
|
|
||||||
|
dms-frontend (bridge, LAN published)
|
||||||
|
├── frontend:3000 → LAN:3000
|
||||||
|
├── backend:3000 → LAN:3001 (NPM routes backend.np-dms.work → :3001)
|
||||||
|
└── ollama-metrics:9924 → LAN:9924 (Prometheus scrape target)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Environment Variables (New)
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|----------|---------|-------------|
|
||||||
|
| ASUSTOR_USER | (required) | CIFS share username |
|
||||||
|
| ASUSTOR_PASS | (required) | CIFS share password |
|
||||||
|
| NEW_HOST_IP | (required) | New host LAN IP for CI/CD deploy target |
|
||||||
|
|
||||||
|
## Environment Variables (Changed from QNAP)
|
||||||
|
|
||||||
|
| Variable | Old Value (QNAP) | New Value (New Host) |
|
||||||
|
|----------|------------------|---------------------|
|
||||||
|
| DB_HOST | mariadb | mariadb (unchanged — Docker DNS) |
|
||||||
|
| REDIS_HOST | cache | redis (service name change) |
|
||||||
|
| ELASTICSEARCH_HOST | search | elasticsearch (service name change) |
|
||||||
|
| QDRANT_HOST | qdrant | qdrant (unchanged) |
|
||||||
|
| OCR_API_URL | http://192.168.10.100:8765 | http://ocr-sidecar:8765 |
|
||||||
|
| OLLAMA_API_URL | http://192.168.10.100:11434 | http://ollama:11434 |
|
||||||
|
| CLAMAV_HOST | clamav | clamav (unchanged) |
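
After cutover, the backend URLs in the table above should point at Docker service names, not LAN IP addresses. A pre-deploy check for that could look like the sketch below. `uses_service_name` is a hypothetical helper. It only distinguishes IP literals from names and does not verify that the name resolves inside the Docker network.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_service_name(url: str) -> bool:
    """True when the URL's host is a DNS/service name, not an IP literal."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IP literal, so it is a name
    return False
```

For example, `uses_service_name("http://ocr-sidecar:8765")` is true, while the old QNAP-era value `http://192.168.10.100:8765` is rejected.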
|
||||||
|
|
||||||
|
## Removed Environment Variables
|
||||||
|
|
||||||
|
| Variable | Reason |
|
||||||
|
|----------|--------|
|
||||||
|
| OCR_SIDECAR_API_KEY | ADR-040 D5 — network-only auth, no API key needed |
|
||||||
|
| OCR_SIDECAR_UPLOAD_BASE | Not removed: still required, and the value stays /mnt/uploads |
|
||||||
@@ -0,0 +1,230 @@
|
|||||||
|
// File: specs/100-Infrastructures/141-server-consolidation/data-model.md
|
||||||
|
// Change Log:
|
||||||
|
// - 2026-06-20: Initial data model for Single-Host Server Consolidation
|
||||||
|
|
||||||
|
# Data Model: Single-Host Server Consolidation
|
||||||
|
|
||||||
|
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
|
||||||
|
|
||||||
|
## Infrastructure Entities
|
||||||
|
|
||||||
|
### 1. Docker Network: dms-internal
|
||||||
|
|
||||||
|
| Attribute | Type | Description |
|
||||||
|
|-----------|------|-------------|
|
||||||
|
| name | string | `dms-internal` |
|
||||||
|
| driver | string | `bridge` |
|
||||||
|
| scope | string | local (single host) |
|
||||||
|
| published_ports | none | No ports published to LAN |
|
||||||
|
|
||||||
|
**Members**: ollama, ocr-sidecar, backend, redis, mariadb, elasticsearch, qdrant, clamav, ollama-metrics
|
||||||
|
|
||||||
|
### 2. Docker Network: dms-frontend
|
||||||
|
|
||||||
|
| Attribute | Type | Description |
|
||||||
|
|-----------|------|-------------|
|
||||||
|
| name | string | `dms-frontend` |
|
||||||
|
| driver | string | `bridge` |
|
||||||
|
| scope | string | local (single host) |
|
||||||
|
| published_ports | 3000 (frontend), 3001→3000 (backend), 9924 (ollama-metrics) | Only ports published to LAN |
|
||||||
|
|
||||||
|
**Members**: frontend, backend, ollama-metrics
|
||||||
|
|
||||||
|
### 3. Docker Volume: asustor_uploads
|
||||||
|
|
||||||
|
| Attribute | Type | Description |
|
||||||
|
|-----------|------|-------------|
|
||||||
|
| driver | string | `local` |
|
||||||
|
| type | string | `cifs` |
|
||||||
|
| device | string | `//192.168.10.9/np-dms-as/data/uploads` |
|
||||||
|
| mount_options | string | `username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0` |
|
||||||
|
| mount_point (sidecar) | string | `/mnt/uploads` (read-only) |
|
||||||
|
| mount_point (backend) | string | `/app/uploads` (read-write) |
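
The attributes above map onto the `type`, `device`, and `o` options of Docker's `local` volume driver. The sketch below assembles them from the two required credentials and is meant only to show how the pieces fit. `cifs_volume_opts` is a hypothetical helper, and the comma check is an added precaution, because `o` is a comma-separated option string.

```python
from typing import Dict


def cifs_volume_opts(
    user: str,
    password: str,
    device: str = "//192.168.10.9/np-dms-as/data/uploads",
) -> Dict[str, str]:
    """Build driver_opts for a CIFS-backed volume on the local driver."""
    if not user or not password:
        raise ValueError("ASUSTOR_USER and ASUSTOR_PASS are required")
    if "," in user or "," in password:
        # A comma would be read as the start of another mount option.
        raise ValueError("credentials must not contain commas")
    return {
        "type": "cifs",
        "device": device,
        "o": f"username={user},password={password},vers=3.0,uid=0,gid=0",
    }
```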

### 4. Docker Volume: ollama_models

| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/root/.ollama` |
| content | string | Ollama model files (np-dms-ai, np-dms-ocr, nomic-embed-text) |

### 5. Docker Volume: mariadb_data

| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/var/lib/mysql` |
| content | string | MariaDB data files (migrated from QNAP) |

### 6. Docker Volume: es_data

| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/usr/share/elasticsearch/data` |
| content | string | Elasticsearch indices (migrated from QNAP) |

### 7. Docker Volume: redis_data

| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/data` |
| content | string | Redis AOF persistence + BullMQ queue data |

### 8. Docker Volume: qdrant_data

| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/qdrant/storage` |
| content | string | Qdrant vector collections |

## Service Definitions

### ollama

| Attribute | Value |
|-----------|-------|
| image | `ollama/ollama:latest` |
| GPU | NVIDIA RTX 5060 Ti 16GB (passthrough) |
| network | dms-internal only |
| ports | none (expose 11434 internal only) |
| volumes | ollama_models → /root/.ollama |
| depends_on | none |
| healthcheck | `ollama list` (verify API responsive) |

### ocr-sidecar

| Attribute | Value |
|-----------|-------|
| build | `./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar` |
| network | dms-internal only |
| ports | none (expose 8765 internal only) |
| volumes | asustor_uploads → /mnt/uploads (read-only) |
| depends_on | ollama |
| env | OLLAMA_API_URL=http://ollama:11434, OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads |
| healthcheck | `curl -f http://localhost:8765/health` |

### backend

| Attribute | Value |
|-----------|-------|
| image | `lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}` |
| networks | dms-internal + dms-frontend |
| ports | 3001:3000 (published to LAN; NPM routes `backend.np-dms.work` → :3001) |
| volumes | asustor_uploads → /app/uploads (read-write) |
| depends_on | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
| env | OCR_API_URL=http://ocr-sidecar:8765, OLLAMA_API_URL=http://ollama:11434, DB_HOST=mariadb, REDIS_HOST=redis, ELASTICSEARCH_HOST=elasticsearch, QDRANT_HOST=qdrant |
| healthcheck | `curl -f http://localhost:3000/health` |
| memory_limit | 2G |

### frontend

| Attribute | Value |
|-----------|-------|
| image | `lcbp3-frontend:${FRONTEND_IMAGE_TAG:-latest}` |
| networks | dms-frontend only |
| ports | 3000:3000 (published to LAN) |
| depends_on | backend |
| env | INTERNAL_API_URL=http://backend:3000/api |
| healthcheck | `curl -f http://localhost:3000/` |
| memory_limit | 1G |

### redis

| Attribute | Value |
|-----------|-------|
| image | `redis:7-alpine` |
| network | dms-internal only |
| ports | none (expose 6379 internal only) |
| volumes | redis_data → /data |
| command | `redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes --maxmemory-policy noeviction` |
| healthcheck | `redis-cli -a ${REDIS_PASSWORD} --no-auth-warning ping` |
| memory_limit | 1G |

### mariadb

| Attribute | Value |
|-----------|-------|
| image | `mariadb:11.8` |
| network | dms-internal only |
| ports | none (expose 3306 internal only) |
| volumes | mariadb_data → /var/lib/mysql |
| env | MARIADB_ROOT_PASSWORD, MARIADB_DATABASE=lcbp3, MARIADB_USER=center |
| command | `--character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci` |
| healthcheck | `healthcheck.sh --connect --innodb_initialized` |
| memory_limit | 8G |

### elasticsearch

| Attribute | Value |
|-----------|-------|
| image | `elasticsearch:8.11.1` |
| network | dms-internal only |
| ports | none (expose 9200 internal only) |
| volumes | es_data → /usr/share/elasticsearch/data |
| env | discovery.type=single-node, xpack.security.enabled=false, ES_JAVA_OPTS=-Xms2g -Xmx2g |
| healthcheck | `curl -s http://localhost:9200/_cluster/health` |
| memory_limit | 4G |

### qdrant

| Attribute | Value |
|-----------|-------|
| image | `qdrant/qdrant:v1.16.1` |
| network | dms-internal only |
| ports | none (expose 6333 internal only) |
| volumes | qdrant_data → /qdrant/storage |
| healthcheck | TCP check on port 6333 |
| memory_limit | 1G |

### clamav

| Attribute | Value |
|-----------|-------|
| image | `clamav/clamav:1.4.4` |
| network | dms-internal only |
| ports | none (expose 3310 internal only) |
| healthcheck | `clamdcheck.sh` |
| memory_limit | 2G |

### ollama-metrics

| Attribute | Value |
|-----------|-------|
| image | `ghcr.io/norskhelsenett/ollama-metrics:latest` |
| network | dms-internal only |
| ports | 9924:9924 (published to LAN; Prometheus on ASUSTOR scrapes `http://<new-host-ip>:9924/metrics`) |
| env | OLLAMA_HOST=http://ollama:11434 |
| memory_limit | 256M |

## Service Communication Map

```
LAN (VLAN 10)
│
├── :3000 (Frontend) ──→ http://backend:3000/api (dms-frontend)
├── :3001 (Backend) ──→ http://backend:3000/api (dms-frontend)
└── :9924 (ollama-metrics) ──→ Prometheus scrape target

backend
│
├──→ mariadb:3306 (dms-internal)
├──→ redis:6379 (dms-internal)
├──→ elasticsearch:9200 (dms-internal)
├──→ qdrant:6333 (dms-internal)
├──→ clamav:3310 (dms-internal)
├──→ ocr-sidecar:8765 (dms-internal)
└──→ ollama:11434 (dms-internal)
```

## Path Mapping

| Service | Container Path | Source |
|---------|---------------|--------|
| Backend | `/app/uploads/temp` | ASUSTOR CIFS `/data/uploads/temp` |
| Backend | `/app/uploads/permanent` | ASUSTOR CIFS `/data/uploads/permanent` |
| Sidecar | `/mnt/uploads/temp` (read-only) | ASUSTOR CIFS `/data/uploads/temp` |
| Sidecar | `/mnt/uploads/permanent` (read-only) | ASUSTOR CIFS `/data/uploads/permanent` |

**Note**: Backend uses `/app/uploads` (read-write) and Sidecar uses `/mnt/uploads` (read-only); both map to the same ASUSTOR CIFS share. Path remapping in `ocr.service.ts` (`remapPath()`) continues to work: it strips the `/app/uploads` prefix and replaces it with `/mnt/uploads`.
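
The remapping described in the note can be illustrated with a short shell sketch. It is not the `remapPath()` implementation from `ocr.service.ts`; the function name `remap_upload_path` is hypothetical, and the rejection of `..` segments is an assumption added to show how an upload-base check might sit next to the prefix swap.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: the real logic lives in remapPath() in ocr.service.ts.
# BACKEND_BASE and SIDECAR_BASE mirror the mount points in the table above.
BACKEND_BASE="/app/uploads"
SIDECAR_BASE="/mnt/uploads"

# remap_upload_path PATH
# Prints the sidecar-side path for a backend-side path. Returns 1 if the path
# is outside the backend upload base or contains a ".." segment (assumption).
remap_upload_path() {
  local path="$1"
  case "$path" in
    */../*|*/..) return 1 ;;        # reject parent-directory traversal
    "$BACKEND_BASE"/*) ;;           # inside the upload base: accept
    *) return 1 ;;                  # anything else is rejected
  esac
  # Strip the backend prefix and prepend the sidecar prefix.
  printf '%s\n' "${SIDECAR_BASE}${path#"$BACKEND_BASE"}"
}
```

For example, `remap_upload_path /app/uploads/temp/scan.pdf` prints `/mnt/uploads/temp/scan.pdf`, while a path outside `/app/uploads` produces no output and a non-zero status.
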

@@ -0,0 +1,124 @@
// File: specs/100-Infrastructures/141-server-consolidation/plan.md
// Change Log:
// - 2026-06-20: Initial implementation plan for Single-Host Server Consolidation

# Implementation Plan: Single-Host Server Consolidation

**Branch**: `141-server-consolidation` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/100-Infrastructures/141-server-consolidation/spec.md`
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md)

## Summary

Consolidate all LCBP3-DMS services from a 2-host architecture (QNAP NAS + Desk-5439) onto a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB). ASUSTOR becomes the primary NAS for file storage via CIFS. A Docker internal bridge network isolates Ollama and the OCR Sidecar from the LAN, enabling removal of X-API-Key auth (ADR-040 D5). QNAP becomes the backup server; Desk-5439 is retired.

## Technical Context

**Language/Version**: Docker Compose v2 (YAML), Bash scripts, PowerShell provisioning
**Primary Dependencies**: Docker Engine 24+, Docker Compose v2, NVIDIA Container Toolkit, CIFS Utils
**Storage**: MariaDB 11.8 (Docker volume), Elasticsearch 8.11 (Docker volume), Redis 7 (Docker volume), Qdrant v1.16 (Docker volume), ASUSTOR CIFS for file uploads
**Testing**: Smoke tests (manual + scripted), health check endpoints, data parity verification scripts
**Target Platform**: Linux (Ubuntu 22.04 LTS or Debian 12) on Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB
**Project Type**: Infrastructure (Docker Compose stack + provisioning scripts)
**Performance Goals**: Backend-to-Ollama latency <50ms (localhost vs ~2ms LAN), all containers healthy within 5 min
**Constraints**: 32GB RAM total (target <28GB usage), 16GB VRAM (target <15GB usage), CIFS mount reliability
**Scale/Scope**: 8 core containers (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, ES, Qdrant) plus ClamAV and ollama-metrics (10 in total)

## Constitution Check

_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._

| Principle | Status | Notes |
|-----------|--------|-------|
| ADR-016 Security | ✅ Pass | Network isolation replaces API key; no ports published for internal services |
| ADR-019 UUID | ✅ Pass | No UUID changes (infrastructure only) |
| ADR-009 Schema | ✅ Pass | No schema changes; data migration via dump/restore |
| ADR-023/023A AI Boundary | ✅ Pass | Ollama isolated on Docker internal network; no direct DB/storage access |
| ADR-040 D5 Network Auth | ✅ Pass | Docker bridge isolation enables X-API-Key removal |
| ADR-008 BullMQ | ✅ Pass | Redis co-located on same host; queue behavior unchanged |
| ADR-002 Document Numbering | ✅ Pass | Redis Redlock unchanged; co-location reduces lock latency |
| SPOF Risk | ⚠️ Acknowledged | Single host = SPOF; mitigated by QNAP backup + DR plan |

**Gate Result**: PASS, no violations. SPOF risk is acknowledged in ADR-041 with a mitigation plan.

## Project Structure

### Documentation (this feature)

```text
specs/100-Infrastructures/141-server-consolidation/
├── spec.md              # Feature specification
├── plan.md              # This file
├── research.md          # Phase 0 output: research findings
├── data-model.md        # Phase 1 output: infrastructure data model
├── quickstart.md        # Phase 1 output: deployment guide
├── contracts/           # Phase 1 output: docker-compose contracts
│   └── docker-compose.new-host.yml
├── checklists/
│   └── requirements.md  # Spec quality checklist
└── tasks.md             # Phase 2 output (/speckit.tasks command)
```

### Source Code (repository root)

```text
specs/04-Infrastructure-OPS/04-00-docker-compose/
├── New-Host/                          # NEW: consolidated host
│   ├── docker-compose.new-host.yml    # Unified compose for all 8+ services
│   ├── .env.template                  # Environment template for new host
│   ├── ocr-sidecar/                   # Sidecar (copied from Desk-5439, adapted)
│   │   ├── Dockerfile
│   │   ├── app.py
│   │   └── requirements.txt
│   ├── scripts/
│   │   ├── provision-host.sh          # OS prep + Docker + NVIDIA toolkit
│   │   ├── migrate-mariadb.sh         # Dump from QNAP → restore to new host
│   │   ├── migrate-elasticsearch.sh   # Snapshot from QNAP → restore to new host
│   │   ├── smoke-test.sh              # Post-cutover verification
│   │   └── rollback.sh                # Emergency rollback to QNAP + Desk-5439
│   └── README.md                      # Deployment guide for new host
├── QNAP/                              # EXISTING: becomes backup
├── Desk-5439/                         # EXISTING: retired after cutover
└── ASUSTOR/                           # EXISTING: Gitea runner stays
```

**Structure Decision**: New `New-Host/` directory under existing `04-00-docker-compose/` follows the established per-host directory pattern (QNAP/, Desk-5439/, ASUSTOR/). The unified compose file replaces the split QNAP/app + QNAP/service + QNAP/mariadb + Desk-5439/ocr-sidecar pattern with a single stack.

## Complexity Tracking

> No constitution check violations, so no table is needed.

## Implementation Phases

### Phase 1: Provision New Host (T001-T002)
- Install Ubuntu 22.04 LTS / Debian 12
- Install Docker Engine + Docker Compose v2
- Install NVIDIA drivers + nvidia-container-toolkit
- Mount ASUSTOR CIFS share to `/mnt/uploads`
- Create directory structure for Docker volumes

### Phase 2: Create Unified Docker Compose (T003-T005)
- Write `docker-compose.new-host.yml` with all services
- Configure `dms-internal` bridge network (no LAN publish for Ollama/sidecar)
- Configure `dms-frontend` bridge network (Frontend + Backend published)
- Copy OCR sidecar code from Desk-5439, adapt for Docker-internal Ollama URL
- Configure per-container memory limits per ADR-041 D5

### Phase 3: Migrate Data (T006-T007)
- Dump MariaDB from QNAP → restore to new host container
- Snapshot Elasticsearch from QNAP → restore to new host container
- Verify row count + document count parity
- Verify CIFS file access from backend container

### Phase 4: Cutover (T008-T010)
- Update Gitea CI/CD deploy target to new host
- Deploy services on new host
- Run smoke tests (login, document CRUD, OCR, AI, search)
- Remove X-API-Key from sidecar + backend (ADR-040 D5)
- Update DNS/NPM to point to new host

### Phase 5: Decommission (T011-T012)
- Stop services on QNAP (retain data for backup)
- Retire Desk-5439 (power off or repurpose)
- Monitor RAM/VRAM for 24-48 hours
- Document rollback procedure
@@ -0,0 +1,154 @@
// File: specs/100-Infrastructures/141-server-consolidation/quickstart.md
// Change Log:
// - 2026-06-20: Initial quickstart guide for Single-Host Server Consolidation

# Quickstart: Single-Host Server Consolidation

**Branch**: `141-server-consolidation` | **Date**: 2026-06-20

## Prerequisites

- New host with Ubuntu 22.04 LTS or Debian 12 installed
- Ryzen 5 5600 / 32GB RAM / RTX 5060 Ti 16GB
- Network access to VLAN 10 (192.168.10.x)
- ASUSTOR NAS accessible at 192.168.10.9 with CIFS share `np-dms-as`
- SSH access to QNAP (192.168.10.8) for data migration
- Gitea CI/CD access for deploy target update

## Step 1: Provision Host

```bash
# Run on new host (as root or sudo user)
cd /opt/lcbp3
bash specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh
```

This script:
1. Installs Docker Engine + Docker Compose v2
2. Installs NVIDIA drivers + nvidia-container-toolkit
3. Creates CIFS mount for ASUSTOR at `/mnt/uploads`
4. Creates Docker volume directories
5. Verifies GPU access with `nvidia-smi`

## Step 2: Prepare .env

```bash
cd /opt/lcbp3/specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host
cp .env.template .env
# Edit .env with real values:
# - ASUSTOR_USER, ASUSTOR_PASS (CIFS credentials)
# - DB_PASSWORD, DB_ROOT_PASSWORD (from QNAP .env)
# - REDIS_PASSWORD (from QNAP .env)
# - JWT_SECRET, JWT_REFRESH_SECRET (from QNAP .env)
# - AUTH_SECRET (from QNAP .env)
# - ELASTICSEARCH_PASSWORD (from QNAP .env)
```
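
Before starting the stack it may help to confirm that every key listed in the comments above actually has a value. The following is a minimal sketch, assuming a plain `KEY=value` file; the function name `missing_env_keys` is hypothetical and the key list is copied from Step 2.

```bash
#!/usr/bin/env bash
# Sketch: report keys that are missing or empty in an env file.
# The key list is copied from the comments in Step 2; extend it as needed.
REQUIRED_KEYS=(ASUSTOR_USER ASUSTOR_PASS DB_PASSWORD DB_ROOT_PASSWORD
               REDIS_PASSWORD JWT_SECRET JWT_REFRESH_SECRET AUTH_SECRET
               ELASTICSEARCH_PASSWORD)

# missing_env_keys FILE
# Prints one missing key per line; returns 1 if any key is missing or empty.
missing_env_keys() {
  local file="$1" key status=0
  for key in "${REQUIRED_KEYS[@]}"; do
    # A key counts as set only if "KEY=" is followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$file"; then
      printf '%s\n' "$key"
      status=1
    fi
  done
  return "$status"
}
```

Running `missing_env_keys .env` in the `New-Host` directory prints nothing and returns 0 when the file is complete.
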

## Step 3: Migrate Data

```bash
# Migrate MariaDB (from QNAP to new host)
bash scripts/migrate-mariadb.sh

# Migrate Elasticsearch (from QNAP to new host)
bash scripts/migrate-elasticsearch.sh

# Verify parity
bash scripts/verify-data-parity.sh
```

## Step 4: Deploy Services

```bash
# Pull latest images from Gitea registry
docker compose --env-file .env -f docker-compose.new-host.yml pull

# Start all services
docker compose --env-file .env -f docker-compose.new-host.yml up -d

# Check health
docker compose -f docker-compose.new-host.yml ps
docker compose -f docker-compose.new-host.yml logs --tail=50
```

## Step 5: Smoke Test

```bash
# Run smoke tests
bash scripts/smoke-test.sh
```

Smoke tests verify:
- Backend health check (`GET http://localhost:3001/health`)
- Frontend accessible (`GET http://localhost:3000/`)
- Login flow (POST /api/auth/login)
- Document list (GET /api/correspondences)
- OCR endpoint (POST /api/ai/sandbox/ocr)
- AI inference (POST /api/ai/sandbox/extract)
- Full-text search (GET /api/search)
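
The two unauthenticated checks in the list above could look roughly like this. This is a sketch, not the contents of `smoke-test.sh`: the function names are hypothetical, the expected status of 200 is an assumption, and the login, OCR, AI, and search checks are left out because their credentials and payloads are not specified here.

```bash
#!/usr/bin/env bash
# Sketch of the read-only smoke checks (hypothetical helper names).

# http_status URL -> prints the HTTP status code (000 if the request fails)
http_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# report_check NAME EXPECTED ACTUAL
# Prints a PASS/FAIL line and returns 1 on mismatch.
report_check() {
  local name="$1" expected="$2" actual="$3"
  if [ "$actual" = "$expected" ]; then
    printf 'PASS %s (%s)\n' "$name" "$actual"
  else
    printf 'FAIL %s (expected %s, got %s)\n' "$name" "$expected" "$actual"
    return 1
  fi
}

# run_smoke_checks -> runs both checks, returns non-zero if any failed
run_smoke_checks() {
  local failed=0
  report_check "backend health" 200 "$(http_status http://localhost:3001/health)" || failed=1
  report_check "frontend root" 200 "$(http_status http://localhost:3000/)" || failed=1
  return "$failed"
}
```

Calling `run_smoke_checks` on the new host prints one line per check, so a failing endpoint is visible without reading container logs.
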

## Step 6: Update CI/CD

Update Gitea secrets:
- `HOST` → new host IP (e.g., `192.168.10.50`)
- `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`

## Step 7: Cutover DNS

Update NPM (Nginx Proxy Manager) on QNAP:
- `lcbp3.np-dms.work` → new host IP
- `backend.np-dms.work` → new host IP

## Step 8: Remove X-API-Key (ADR-040 D5)

After verifying Docker-internal network isolation:
1. Remove `OCR_SIDECAR_API_KEY` from sidecar environment
2. Remove API key validation from `app.py`
3. Remove `X-API-Key` header from backend `ocr.service.ts`
4. Rebuild and redeploy sidecar + backend

## Step 9: Monitor (24-48 hours)

```bash
# Monitor RAM usage
docker stats --no-stream

# Monitor VRAM usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 60

# Monitor container health
watch -n 30 'docker compose -f docker-compose.new-host.yml ps'
```

## Step 10: Decommission Old Hosts

After 24-48 hours of stable operation:

```bash
# Stop QNAP services (retain data for backup)
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'

# Power off Desk-5439
ssh user@192.168.10.100 'sudo shutdown -h now'
```

## Rollback (Emergency)

```bash
# Stop new host services
docker compose -f docker-compose.new-host.yml down

# Restore QNAP services
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose up -d'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose up -d'

# Restore Desk-5439 services
ssh user@192.168.10.100 'cd /opt/ocr-sidecar && docker compose up -d'

# Revert DNS
# Update NPM to point back to QNAP (192.168.10.8)

# Revert CI/CD
# Update Gitea secrets HOST back to 192.168.10.8
```
@@ -0,0 +1,139 @@
// File: specs/100-Infrastructures/141-server-consolidation/research.md
// Change Log:
// - 2026-06-20: Initial research for Single-Host Server Consolidation

# Research: Single-Host Server Consolidation

**Branch**: `141-server-consolidation` | **Date**: 2026-06-20

## R1: Docker Network Isolation Strategy

**Decision**: Use two Docker bridge networks: `dms-internal` (every service except Frontend) and `dms-frontend` (Frontend + Backend only, for LAN publish).

**Rationale**: Docker bridge networks provide L2 isolation. Services on `dms-internal` without a `ports` mapping are unreachable from the LAN. Only Frontend (host port 3000) and Backend (container port 3000, published on host port 3001) need LAN access for application traffic. This replaces reliance on VLAN/firewall ACLs with Docker-native isolation.
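
One way to check the isolation claim on a running host is to audit which host ports Docker has actually published. The sketch below assumes the `name<TAB>ports` shape produced by `docker ps --format '{{.Names}}\t{{.Ports}}'`; the function name `unexpected_published_ports` and the allowlist variable are hypothetical, and the allowlist (3000, 3001, 9924) is taken from the data model. It only inspects port publishing on the host and does not prove LAN reachability.

```bash
#!/usr/bin/env bash
# Sketch: flag host ports that are published but not on the allowlist.
# ALLOWED_PORTS reflects the ports this plan publishes to the LAN.
ALLOWED_PORTS="3000 3001 9924"

# unexpected_published_ports
# Reads "name<TAB>ports" lines on stdin and prints "name port" for every
# published host port outside the allowlist. Returns 1 if any were found.
unexpected_published_ports() {
  local name ports port status=0
  while IFS=$'\t' read -r name ports; do
    # Published ports look like "0.0.0.0:3001->3000/tcp"; exposed-only ports
    # such as "11434/tcp" have no "->" and are ignored.
    for port in $(grep -oE ':[0-9]+->' <<<"$ports" | tr -d ':>-' | sort -u); do
      case " $ALLOWED_PORTS " in
        *" $port "*) ;;
        *) printf '%s %s\n' "$name" "$port"; status=1 ;;
      esac
    done
  done
  return "$status"
}
```

A typical call would be `docker ps --format '{{.Names}}\t{{.Ports}}' | unexpected_published_ports`, which prints nothing when only the allowlisted ports are published.
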

**Alternatives Considered**:
- Single bridge network + iptables rules: more complex, error-prone
- Docker Swarm overlay network: overkill for single host
- Host network mode: no isolation, security risk

## R2: CIFS Mount Strategy for ASUSTOR

**Decision**: Use a Docker named volume (`local` driver with `type=cifs`) to mount the ASUSTOR share `//192.168.10.9/np-dms-as/data/uploads` as the `asustor_uploads` volume, mounted at `/mnt/uploads` in the sidecar and `/app/uploads` in the backend.

**Rationale**: The Docker volume driver handles the mount lifecycle with container start/stop. Credentials live in `.env` (gitignored). Backend and sidecar see the same files because both mount the same CIFS share, each at its own container path.

**Alternatives Considered**:
- Host-level `mount -t cifs` then bind mount: requires host OS config, not portable
- SSHFS: slower than CIFS for file operations
- Sync files to local SSD: adds complexity, storage duplication

**Key Consideration**: Previous Desk-5439 setup had issues with Docker Desktop WSL2 + CIFS (see memory). On a Linux host, the CIFS volume driver works natively without the WSL2 layer.

## R3: MariaDB Migration Strategy

**Decision**: Use `mariadb-dump` (logical dump) on QNAP MariaDB 11.8 and restore the dump into the new host's MariaDB 11.8 container.

**Rationale**: Same MariaDB version (11.8) on both hosts → logical dump is safest. Database is small enough (<10GB estimated) that dump/restore completes within the maintenance window.

**Alternatives Considered**:
- `mariabackup` (physical backup): faster but requires same filesystem layout
- Replication (binlog): overkill for one-time migration
- Copy raw data files: risky, requires same version + config

**Migration Command**:
```bash
# From QNAP (source): dump all databases
mariadb-dump --single-transaction --routines --triggers \
  -h 127.0.0.1 -u root -p"$DB_ROOT_PASSWORD" \
  --all-databases > qnap-full-dump.sql

# On new host: restore
docker exec -i lcbp3-mariadb mariadb -u root -p"$DB_ROOT_PASSWORD" < qnap-full-dump.sql
```

## R4: Elasticsearch Migration Strategy

**Decision**: Use the ES snapshot/restore API: create a snapshot on QNAP ES, transfer it to the new host, and restore.

**Rationale**: The ES snapshot API is the official migration path. It handles index mappings, settings, and data, and works when source and target run the same ES version (8.11.x).

**Alternatives Considered**:
- Copy raw data directory: risky, requires identical ES config
- Re-index from MariaDB: slow, loses search index tuning
- Logstash pipeline: overkill for one-time migration

**Migration Steps**:
1. Register shared filesystem repo on QNAP ES
2. Create snapshot of all indices
3. Copy snapshot files to new host ES data volume
4. Register repo on new host ES
5. Restore snapshot

## R5: GPU VRAM Management on Single Host

**Decision**: Rely on ADR-040 D3 (Adaptive OCR Residency via `calculate_ocr_residency()`) and ADR-040 D4 (CPU Fallback Retrieval), and follow the LLM-First GPU Ownership policy from CONTEXT.md.

**Rationale**: RTX 5060 Ti 16GB must serve:
- np-dms-ai (Typhoon-2.5 ~7-8B): ~6-8GB VRAM
- np-dms-ocr (Typhoon OCR): ~5GB VRAM
- nomic-embed-text: ~0.5GB VRAM
- CUDA overhead: ~1.5GB
- Total: ~13-15GB → tight but feasible with adaptive residency

**Key Policy**: When the LLM (np-dms-ai) needs to load, the OCR model is unloaded first (`keep_alive=0` for OCR). BGE-M3 + Reranker use CPU fallback when the GPU is occupied.
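
The ~13-15GB total can be reproduced from the per-model estimates. The sketch below is only a worked sum in units of 0.1 GB so that integer arithmetic suffices; the variable and function names are hypothetical, and the 15GB ceiling is the VRAM target stated in the plan's constraints.

```bash
#!/usr/bin/env bash
# Sketch: add up the VRAM estimates from the list above, in units of 0.1 GB.
# Columns: name  low-estimate  high-estimate
VRAM_ESTIMATES="np-dms-ai 60 80
np-dms-ocr 50 50
nomic-embed-text 5 5
cuda-overhead 15 15"

# vram_total COLUMN  (2 = low estimate, 3 = high estimate) -> total in 0.1 GB
vram_total() {
  awk -v col="$1" '{ sum += $col } END { print sum }' <<<"$VRAM_ESTIMATES"
}

# under_target TOTAL TARGET -> succeeds only when TOTAL is strictly below TARGET
under_target() {
  [ "$1" -lt "$2" ]
}
```

With these figures the low estimate is 13.0GB and the high estimate is 15.0GB, so `under_target "$(vram_total 3)" 150` fails: the high end sits exactly at the 15GB target, which is why the OCR model has to yield the GPU when the LLM loads.
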

**Alternatives Considered**:
- Force GPU-resident for all models: OOM risk (the estimate above peaks near 15GB on a 16GB card, before BGE-M3 and the reranker are counted)
- CPU-only for all AI: too slow for production
- Second GPU: not available on new host

## R6: RAM Budget Allocation

**Decision**: Per-container memory limits in Docker Compose:

| Service | Memory Limit | Notes |
|---------|-------------|-------|
| MariaDB | 8G | Largest consumer, tune innodb_buffer_pool |
| Elasticsearch | 4G | ES_JAVA_OPTS=-Xms2g -Xmx2g |
| Backend (NestJS) | 2G | Node.js + BullMQ workers |
| Frontend (Next.js) | 1G | Standalone mode |
| Redis | 1G | In-memory + AOF |
| Qdrant | 1G | Vector DB |
| OCR Sidecar | 1G | Python + PyMuPDF |
| Ollama | 2G | Model loading + inference |
| ClamAV | 2G | Virus definitions |
| ollama-metrics | 256M | Lightweight proxy |
| **Total** | **~22.3G** | Leaves ~9.7G for OS + swap |

**Rationale**: 32GB total - 22.3GB for containers = ~9.7GB for the OS kernel, page cache, and swap, which is a comfortable margin.
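
The totals in the table can be checked with a short sum. This is a sketch under the assumption that each limit is interpreted in binary units (1G = 1024 MiB); the variable and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: sum the per-container limits from the table above (values in MiB)
# and report what is left of the 32 GiB host.
LIMITS_MIB="mariadb 8192
elasticsearch 4096
backend 2048
frontend 1024
redis 1024
qdrant 1024
ocr-sidecar 1024
ollama 2048
clamav 2048
ollama-metrics 256"
HOST_MIB=$((32 * 1024))

# total_limit_mib -> sum of all container limits in MiB
total_limit_mib() {
  awk '{ sum += $2 } END { print sum }' <<<"$LIMITS_MIB"
}

# headroom_mib -> host memory minus the summed limits, in MiB
headroom_mib() {
  echo $(( HOST_MIB - $(total_limit_mib) ))
}
```

The sum is 22784 MiB (22.25 GiB) and the headroom is 9984 MiB (9.75 GiB), which matches the rounded ~22.3G and ~9.7G figures in the table.
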

**Alternatives Considered**:
- No limits: risk of OOM killer affecting critical services
- Tighter limits: may cause ES/MariaDB instability

## R7: CI/CD Pipeline Update

**Decision**: Update Gitea Actions `ci-deploy.yml` to SSH-deploy to new host IP instead of QNAP IP. ASUSTOR Gitea runner stays unchanged.

**Rationale**: Gitea runner on ASUSTOR (192.168.10.9) can reach new host via VLAN 10. Only the deploy target IP changes. `deploy.sh` path to compose file updates to `New-Host/docker-compose.new-host.yml`.

**Alternatives Considered**:
- Move Gitea runner to new host: unnecessary, runner works remotely
- Manual deployment: not sustainable for ongoing releases

## R8: Rollback Strategy

**Decision**: Multi-step rollback plan documented in `rollback.sh`:
1. Stop services on new host (`docker compose down`)
2. Restore services on QNAP (start existing containers with old data)
3. Restore services on Desk-5439 (start Ollama + sidecar)
4. Revert DNS/NPM to point to QNAP
5. Revert Gitea CI/CD deploy target to QNAP
6. Re-enable X-API-Key in sidecar + backend

**Rationale**: QNAP retains all data (MariaDB, ES, Redis, files) until verified stable. Rollback is fast (<2 hours) because old infrastructure is intact.

**Alternatives Considered**:
- No rollback (accept SPOF): too risky for production DMS
- Hot failover with replication: overkill for current scale
@@ -0,0 +1,160 @@
// File: specs/100-Infrastructures/141-server-consolidation/spec.md
// Change Log:
// - 2026-06-20: Initial specification for Single-Host Server Consolidation (ADR-041)

# Feature Specification: Single-Host Server Consolidation

**Feature Branch**: `141-server-consolidation`
**Created**: 2026-06-20
**Status**: Draft
**Category**: 100-Infrastructures
**Input**: ADR-041: Consolidate all LCBP3-DMS services onto a single Docker host with ASUSTOR as primary NAS.
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md), [ADR-016](../../06-Decision-Records/ADR-016-security-authentication.md), [ADR-023A](../../06-Decision-Records/ADR-023A-unified-ai-architecture.md), [ADR-034](../../06-Decision-Records/ADR-034-AI-model-change.md)

## User Scenarios & Testing _(mandatory)_

### User Story 1 - Provision and Deploy on New Host (Priority: P1)

System administrator provisions the new single host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB), installs Docker, mounts CIFS share from ASUSTOR, and deploys all services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) using a single Docker Compose stack with internal bridge network isolation.

**Why this priority**: Without a running host, no other work can proceed. This is the foundation for all subsequent stories.

**Independent Test**: Can be fully tested by running `docker compose up` on the new host and verifying all containers are healthy via `docker ps` and health check endpoints.

**Acceptance Scenarios**:

1. **Given** a fresh OS installation on the new host, **When** the administrator runs the provisioning script, **Then** Docker Engine and Docker Compose are installed and verified with `docker --version`
2. **Given** Docker is installed, **When** the administrator mounts the ASUSTOR CIFS share, **Then** `/mnt/uploads/temp` and `/mnt/uploads/permanent` are accessible and writable by containers
3. **Given** CIFS mounts are ready, **When** the administrator runs `docker compose up -d`, **Then** all 7 service containers start and report healthy within 5 minutes
4. **Given** all containers are running, **When** the administrator checks network isolation, **Then** Ollama and OCR Sidecar ports are NOT accessible from LAN (the only published application ports are Frontend on host port 3000 and Backend on host port 3001, mapped to container port 3000)

---

### User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)

Database administrator migrates MariaDB data and Elasticsearch indices from QNAP to the new host, ensuring zero data loss and minimal downtime.

**Why this priority**: Data migration is the critical path for cutover. Without migrated data, the new host cannot serve production traffic.

**Independent Test**: Can be tested by comparing row counts and index document counts between source (QNAP) and destination (new host) after migration.

**Acceptance Scenarios**:

1. **Given** the new host is running with empty MariaDB, **When** the administrator performs a database dump-and-restore from QNAP, **Then** all tables and row counts match the source exactly
2. **Given** the new host is running with empty Elasticsearch, **When** the administrator migrates indices from QNAP, **Then** all index document counts match the source exactly
3. **Given** data migration is complete, **When** the administrator runs a data integrity check script, **Then** all critical tables pass checksum verification with zero discrepancies
4. **Given** file storage is on ASUSTOR CIFS mount, **When** the administrator verifies file access from the backend container, **Then** all existing uploaded files are accessible at the expected paths
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 3 - Cutover and Smoke Test (Priority: P3)
|
||||||
|
|
||||||
|
Operations team performs the cutover from the old 2-host architecture (QNAP + Desk-5439) to the new single host, updates DNS/network routing, and runs smoke tests to verify all system functions work end-to-end.
|
||||||
|
|
||||||
|
**Why this priority**: Cutover is the final step that makes the new host production-active. It depends on P1 and P2 being complete.
|
||||||
|
|
||||||
|
**Independent Test**: Can be tested by accessing the application via the new host's IP/hostname and performing core DMS operations (login, document upload, search, AI inference).
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** data migration is verified, **When** the administrator updates DNS to point to the new host, **Then** users accessing the application URL reach the new host within the DNS TTL period
|
||||||
|
2. **Given** DNS is updated, **When** a user logs in and creates a new Correspondence, **Then** the document is saved successfully and visible in the list
|
||||||
|
3. **Given** the system is live on the new host, **When** a user uploads a PDF and triggers OCR, **Then** OCR text extraction completes successfully via the internal Docker network (sidecar → Ollama)
|
||||||
|
4. **Given** the system is live, **When** a user performs a full-text search, **Then** Elasticsearch returns results with the same accuracy as before migration
|
||||||
|
5. **Given** the system is live, **When** a user triggers AI metadata extraction, **Then** the AI inference completes successfully via the internal Docker network (backend → Ollama)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
|
||||||
|
|
||||||
|
Security administrator removes the `X-API-Key` header authentication from the OCR Sidecar and Backend, relying solely on Docker-internal network isolation as per ADR-040 D5.
|
||||||
|
|
||||||
|
**Why this priority**: This is a key security improvement enabled by the consolidation. It simplifies the architecture but must be validated carefully.
|
||||||
|
|
||||||
|
**Independent Test**: Can be tested by attempting to access sidecar endpoints from outside the Docker network (should fail) and from within the Docker network (should succeed without API key).
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** all services are on the Docker internal bridge, **When** the backend calls the sidecar without `X-API-Key`, **Then** the sidecar processes the request successfully
|
||||||
|
2. **Given** the sidecar is not publishing ports to LAN, **When** an external client attempts to reach the sidecar directly, **Then** the connection is refused
|
||||||
|
3. **Given** the `X-API-Key` code is removed, **When** the administrator reviews the sidecar and backend configuration, **Then** no hardcoded API keys remain in the codebase
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### User Story 5 - Decommission Old Hosts (Priority: P5)
|
||||||
|
|
||||||
|
Operations team stops services on QNAP (which becomes backup server) and retires Desk-5439, completing the consolidation.
|
||||||
|
|
||||||
|
**Why this priority**: Cleanup is the final step after the new host is verified stable. It frees up old hardware and reduces management complexity.
|
||||||
|
|
||||||
|
**Independent Test**: Can be tested by verifying that QNAP services are stopped (except backup-related) and Desk-5439 is powered off or repurposed.
|
||||||
|
|
||||||
|
**Acceptance Scenarios**:
|
||||||
|
|
||||||
|
1. **Given** the new host has been stable for 24-48 hours, **When** the administrator stops backend/frontend/Redis/DB/ES services on QNAP, **Then** QNAP remains available as a backup server with data intact
|
||||||
|
2. **Given** QNAP services are stopped, **When** the administrator powers off Desk-5439, **Then** no LCBP3-DMS services are affected on the new host
|
||||||
|
3. **Given** old hosts are decommissioned, **When** the administrator verifies monitoring dashboards, **Then** only the new host is tracked as the active production host
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Edge Cases
|
||||||
|
|
||||||
|
- **GPU OOM during concurrent AI + OCR load**: What happens when np-dms-ai and np-dms-ocr are loaded simultaneously and VRAM exceeds 16GB? ADR-040 D3 (Adaptive OCR Residency) must unload OCR model to make room for LLM.
|
||||||
|
- **RAM exhaustion under heavy load**: What happens when MariaDB + Elasticsearch + CPU-fallback tensors consume more than 32GB? System must have swap space configured and memory limits per container.
|
||||||
|
- **CIFS mount failure**: What happens when ASUSTOR NAS is unreachable? File upload/download will fail; system must degrade gracefully with clear error messages.
|
||||||
|
- **Single host hardware failure**: What happens when the new host crashes? SPOF mitigation requires backup data on QNAP and a disaster recovery plan.
|
||||||
|
- **Network misconfiguration**: What happens if Docker bridge network is accidentally exposed? Sidecar and Ollama would be accessible from LAN, breaking the security model.
|
||||||
|
- **Database migration partial failure**: What happens if MariaDB migration fails midway? Rollback plan must restore QNAP as the active database host.
|
||||||
|
- **Elasticsearch index corruption during migration**: What happens if ES indices are corrupted during transfer? Re-indexing from MariaDB data must be available as a fallback.
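
The GPU OOM edge case above can be read as a simple VRAM budget check. The following is a minimal sketch under stated assumptions: `ModelFootprint`, `should_unload_ocr`, the footprint numbers, and the headroom value are all hypothetical and are not taken from ADR-040 or from the sidecar's own `calculate_ocr_residency()`.

```python
from dataclasses import dataclass

VRAM_TOTAL_MB = 16 * 1024  # RTX 5060 Ti 16GB, as stated in the spec


@dataclass(frozen=True)
class ModelFootprint:
    name: str
    vram_mb: int  # hypothetical footprint, not a measured value


def should_unload_ocr(llm: ModelFootprint, ocr: ModelFootprint,
                      headroom_mb: int = 1024) -> bool:
    """Return True when both models plus headroom do not fit in VRAM."""
    return llm.vram_mb + ocr.vram_mb + headroom_mb > VRAM_TOTAL_MB


# Illustrative numbers only: 11000 + 6000 + 1024 exceeds 16384.
llm = ModelFootprint("np-dms-ai", 11_000)
ocr = ModelFootprint("np-dms-ocr", 6_000)
print(should_unload_ocr(llm, ocr))
```

A real residency policy would likely query live VRAM usage rather than rely on static footprints; the sketch only shows the shape of the decision.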
|
||||||
|
|
||||||
|
## Requirements _(mandatory)_
|
||||||
|
|
||||||
|
### Functional Requirements
|
||||||
|
|
||||||
|
- **FR-001**: System MUST co-locate all 7 services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) on a single Docker host with a unified `docker-compose.yml`
|
||||||
|
- **FR-002**: System MUST use ASUSTOR (192.168.10.9) as the primary NAS for file storage via CIFS mount at `/mnt/uploads`
|
||||||
|
- **FR-003**: System MUST isolate Ollama and OCR Sidecar on a Docker internal bridge network (`dms-internal`) with no ports published to LAN
|
||||||
|
- **FR-004**: System MUST publish only Frontend (host port 3000) and Backend (host port 3001, mapped to container port 3000) to the LAN
|
||||||
|
- **FR-005**: System MUST enable backend-to-sidecar and backend-to-Ollama communication via Docker service names (`http://ocr-sidecar:8765`, `http://ollama:11434`)
|
||||||
|
- **FR-006**: System MUST migrate MariaDB data from QNAP to the new host with zero data loss
|
||||||
|
- **FR-007**: System MUST migrate Elasticsearch indices from QNAP to the new host with zero data loss
|
||||||
|
- **FR-008**: System MUST remove `X-API-Key` authentication from sidecar and backend after confirming Docker-internal network isolation (ADR-040 D5)
|
||||||
|
- **FR-009**: System MUST enforce GPU VRAM management via Adaptive OCR Residency (ADR-040 D3) and CPU Fallback Retrieval (ADR-040 D4)
|
||||||
|
- **FR-010**: System MUST configure per-container memory limits to prevent any single service from exhausting 32GB RAM
|
||||||
|
- **FR-011**: System MUST retain QNAP as a backup server with database and file storage data intact after cutover
|
||||||
|
- **FR-012**: System MUST retire Desk-5439 after cutover is verified stable for 24-48 hours
|
||||||
|
- **FR-013**: System MUST provide a rollback plan to restore services on QNAP and Desk-5439 if the new host fails
|
||||||
|
- **FR-014**: System MUST verify all core DMS functions (login, document CRUD, OCR, AI inference, search) work end-to-end on the new host before decommissioning old hosts
|
||||||
|
- **FR-015**: System MUST monitor RAM and VRAM usage for 24-48 hours post-cutover to detect resource pressure
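
FR-010 implies that the per-container memory limits, taken together, must fit inside the 32GB host. A minimal sketch of that budget check follows; the limit values and the 4GB OS reserve are hypothetical placeholders, not figures from `data-model.md`.

```python
HOST_RAM_MB = 32 * 1024  # 32GB host, as stated in the spec


def check_memory_budget(limits_mb: dict[str, int], reserve_mb: int = 4096) -> int:
    """Return the MB left after all limits and the OS reserve.

    Raises ValueError when the limits overcommit the host.
    """
    remaining = HOST_RAM_MB - reserve_mb - sum(limits_mb.values())
    if remaining < 0:
        raise ValueError(f"memory limits overcommit host RAM by {-remaining} MB")
    return remaining


# Hypothetical limits for the 7 services listed in FR-001.
example_limits = {
    "ollama": 8192,
    "ocr-sidecar": 4096,
    "backend": 2048,
    "frontend": 1024,
    "redis": 1024,
    "mariadb": 4096,
    "elasticsearch": 4096,
}
print(check_memory_budget(example_limits))
```

Running a check like this whenever the compose file changes would catch an overcommitted configuration before deployment.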
|
||||||
|
|
||||||
|
### Key Entities _(include if feature involves data)_
|
||||||
|
|
||||||
|
- **Docker Compose Stack**: Single `docker-compose.yml` defining all 7 services, 2 networks (`dms-internal`, `dms-frontend`), and volumes (CIFS, named volumes for data)
|
||||||
|
- **CIFS Volume Mount**: ASUSTOR network share mounted as Docker volume for file storage (`/mnt/uploads/temp`, `/mnt/uploads/permanent`)
|
||||||
|
- **Docker Internal Network**: Bridge network (`dms-internal`) isolating Ollama, Sidecar, Backend, Redis, MariaDB, and Elasticsearch from LAN access
|
||||||
|
- **GPU Resource Allocation**: NVIDIA GPU passthrough to Ollama container with VRAM management via adaptive residency policies
|
||||||
|
|
||||||
|
## Success Criteria _(mandatory)_
|
||||||
|
|
||||||
|
### Measurable Outcomes
|
||||||
|
|
||||||
|
- **SC-001**: All 7 service containers start and report healthy within 5 minutes of `docker compose up -d` on the new host
|
||||||
|
- **SC-002**: Database migration completes with 100% row count parity between QNAP and new host for all critical tables
|
||||||
|
- **SC-003**: Elasticsearch migration completes with 100% document count parity between QNAP and new host for all indices
|
||||||
|
- **SC-004**: Core DMS operations (login, document upload, search, OCR, AI inference) complete successfully on the new host with zero functional regressions
|
||||||
|
- **SC-005**: Ollama and OCR Sidecar are unreachable from LAN (port scan returns closed/refused for ports 11434 and 8765)
|
||||||
|
- **SC-006**: Backend-to-Ollama latency is reduced by at least 50% compared to cross-host LAN communication (measured via AI inference response time)
|
||||||
|
- **SC-007**: RAM usage remains below 28GB (87.5% of 32GB) under normal operational load for 24 hours post-cutover
|
||||||
|
- **SC-008**: VRAM usage remains below 15GB (93.75% of 16GB) during concurrent AI inference and OCR workloads
|
||||||
|
- **SC-009**: Rollback plan can be executed within 2 hours to restore services on QNAP and Desk-5439 if needed
|
||||||
|
- **SC-010**: QNAP backup server retains a valid database snapshot within 24 hours of cutover
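
The percentage thresholds and the latency target in the outcomes above reduce to simple arithmetic. The sketch below shows one way to evaluate them; the function names and the sample latencies are illustrative assumptions.

```python
def percent_of(used_gb: float, total_gb: float) -> float:
    """Express a usage figure as a percentage of capacity."""
    return 100.0 * used_gb / total_gb


def latency_reduction(before_ms: float, after_ms: float) -> float:
    """Fractional reduction relative to the cross-host baseline."""
    return (before_ms - after_ms) / before_ms


def meets_sc006(before_ms: float, after_ms: float) -> bool:
    """SC-006 asks for a reduction of at least 50%."""
    return latency_reduction(before_ms, after_ms) >= 0.5


print(percent_of(28, 32))        # RAM threshold from SC-007
print(percent_of(15, 16))        # VRAM threshold from SC-008
print(meets_sc006(200.0, 90.0))  # hypothetical latencies in milliseconds
```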
|
||||||
|
|
||||||
|
### Assumptions
|
||||||
|
|
||||||
|
- The new host hardware (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) is physically available and OS-installed before provisioning begins
|
||||||
|
- ASUSTOR NAS (192.168.10.9) has sufficient storage capacity for all file uploads (temp + permanent)
|
||||||
|
- Network connectivity between the new host and ASUSTOR is via VLAN 10 with CIFS/SMB 3.0 support
|
||||||
|
- NVIDIA drivers and Docker GPU runtime (nvidia-container-toolkit) are compatible with the RTX 5060 Ti
|
||||||
|
- QNAP data (MariaDB, Elasticsearch) is in a consistent state suitable for dump-and-restore migration
|
||||||
|
- ADR-040 (OCR Sidecar Refactor) is implemented concurrently or prior to cutover for network-only auth and adaptive residency
|
||||||
|
- Gitea CI/CD pipeline can be updated to target the new host for deployment
|
||||||
@@ -0,0 +1,221 @@
|
|||||||
|
// File: specs/100-Infrastructures/141-server-consolidation/tasks.md
|
||||||
|
// Change Log:
|
||||||
|
// - 2026-06-20: Initial task list for Single-Host Server Consolidation
|
||||||
|
// - 2026-06-20: Fix C1-C5 from analysis: backend env var update, port conflict, GPU residency, ollama-metrics port, n8n endpoints
|
||||||
|
|
||||||
|
# Tasks: Single-Host Server Consolidation
|
||||||
|
|
||||||
|
**Input**: Design documents from `/specs/100-Infrastructures/141-server-consolidation/`
|
||||||
|
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/
|
||||||
|
**Related ADRs**: ADR-041, ADR-040, ADR-016, ADR-023A, ADR-034
|
||||||
|
|
||||||
|
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
|
||||||
|
|
||||||
|
## Format: `[ID] [P?] [Story] Description`
|
||||||
|
|
||||||
|
- **[P]**: Can run in parallel (different files, no dependencies)
|
||||||
|
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
|
||||||
|
- Include exact file paths in descriptions
|
||||||
|
|
||||||
|
## Phase 1: Setup (Shared Infrastructure)
|
||||||
|
|
||||||
|
**Purpose**: Create directory structure and initial files for the new host deployment
|
||||||
|
|
||||||
|
- [ ] T001 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/` directory structure with subdirectories: `ocr-sidecar/`, `scripts/`
|
||||||
|
- [ ] T002 [P] Create `.env.template` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` with all required env vars from contracts
|
||||||
|
- [ ] T003 [P] Create `README.md` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/README.md` with deployment overview
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2: Foundational (Blocking Prerequisites)
|
||||||
|
|
||||||
|
**Purpose**: Provision the new host OS and create the unified Docker Compose stack — MUST be complete before any user story can proceed
|
||||||
|
|
||||||
|
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
|
||||||
|
|
||||||
|
- [ ] T004 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh` — installs Docker Engine, Docker Compose v2, NVIDIA drivers, nvidia-container-toolkit, CIFS utils, creates directory structure
|
||||||
|
- [ ] T005 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` — unified compose with all 10 services, 2 networks (dms-internal, dms-frontend), CIFS volume, named volumes, memory limits per data-model.md. Backend publishes `3001:3000` to LAN (NPM routes `backend.np-dms.work` → :3001); Frontend publishes `3000:3000`; ollama-metrics publishes `9924:9924` to LAN for Prometheus scraping from ASUSTOR
|
||||||
|
- [ ] T006 [P] Copy OCR sidecar code from `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/` to `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/` — adapt `OLLAMA_API_URL` to `http://ollama:11434` (Docker DNS), remove `ports` mapping, use `expose` only
|
||||||
|
- [ ] T007 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/Dockerfile` — verify GPU access via nvidia-container-toolkit, ensure poppler-utils installed
|
||||||
|
- [ ] T008 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/requirements.txt` — verify typhoon-ocr, PyMuPDF, httpx, fastapi versions match Desk-5439
|
||||||
|
- [ ] T008b Update backend environment variables for renamed service names: `REDIS_HOST=redis` (was `cache`), `ELASTICSEARCH_HOST=elasticsearch` (was `search`) in `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` and `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` backend environment section — these service names changed from QNAP compose where Redis was `cache` and ES was `search`
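
The rename in T008b is easy to miss in one of the two files it touches. As a hedged illustration, the sketch below scans env-file text for the old QNAP service names; `find_stale_hosts` is a hypothetical helper and is not part of the repository.

```python
# Old QNAP service names that T008b replaces.
STALE_HOSTS = {"REDIS_HOST": "cache", "ELASTICSEARCH_HOST": "search"}


def find_stale_hosts(env_text: str) -> list[str]:
    """Return the keys whose value is still the old QNAP service name."""
    stale = []
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if STALE_HOSTS.get(key.strip()) == value.strip():
            stale.append(key.strip())
    return stale


sample = "REDIS_HOST=cache\nELASTICSEARCH_HOST=elasticsearch\n"
print(find_stale_hosts(sample))
```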
|
||||||
|
|
||||||
|
**Checkpoint**: New host directory structure and unified compose file ready — user story implementation can now begin
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3: User Story 1 - Provision and Deploy on New Host (Priority: P1) 🎯 MVP
|
||||||
|
|
||||||
|
**Goal**: Administrator provisions the new host, mounts ASUSTOR CIFS, and deploys all services with Docker internal network isolation
|
||||||
|
|
||||||
|
**Independent Test**: Run `docker compose up -d` on the new host and verify all containers are healthy via `docker ps` and health check endpoints
|
||||||
|
|
||||||
|
### Implementation for User Story 1
|
||||||
|
|
||||||
|
- [ ] T009 [US1] Run `provision-host.sh` on new host — verify Docker, NVIDIA, CIFS mount at `/mnt/uploads`
|
||||||
|
- [ ] T010 [US1] Pull Ollama models on new host: `ollama pull np-dms-ai:latest`, `ollama pull np-dms-ocr:latest`, `ollama pull nomic-embed-text:latest` — verify with `ollama list`
|
||||||
|
- [ ] T011 [US1] Copy `.env.template` to `.env`, fill in all secrets from QNAP `.env` (DB passwords, JWT secrets, Redis password, ASUSTOR CIFS credentials)
|
||||||
|
- [ ] T012 [US1] Run `docker compose --env-file .env -f docker-compose.new-host.yml up -d` and verify all 10 containers start
|
||||||
|
- [ ] T013 [US1] Verify network isolation: `nmap -p 11434 <new-host-ip>` from another VLAN 10 machine should show closed/refused; `nmap -p 8765` should show closed/refused; `nmap -p 3000` (frontend) and `nmap -p 3001` (backend) should show open; `nmap -p 9924` (ollama-metrics) should show open for Prometheus
|
||||||
|
- [ ] T014 [US1] Verify health checks: `curl http://localhost:3001/health` (backend on published port 3001), `curl http://localhost:3000/` (frontend), `curl http://ocr-sidecar:8765/health` (from inside backend container via Docker DNS)
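
The `nmap` checks in T013 can also be expressed as a small TCP probe that compares observed port state with the expected state. This is a sketch only, assuming the port list from T013; `is_open` and `isolation_violations` are hypothetical names, and the probe should be run from another machine on VLAN 10.

```python
import socket

# Expected LAN visibility per T013: True means the port should be open.
EXPECTED = {3000: True, 3001: True, 9924: True, 11434: False, 8765: False}


def is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def isolation_violations(host: str, probe=is_open) -> list[int]:
    """List the ports whose observed state differs from EXPECTED."""
    return [port for port, want_open in EXPECTED.items()
            if probe(host, port) != want_open]
```

Passing `probe` as a parameter keeps the comparison logic testable without touching the network; an empty result means the isolation model holds for the listed ports.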
|
||||||
|
|
||||||
|
**Checkpoint**: All services running on new host with correct network isolation — MVP achieved
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4: User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)
|
||||||
|
|
||||||
|
**Goal**: Migrate MariaDB and Elasticsearch data from QNAP to the new host with zero data loss
|
||||||
|
|
||||||
|
**Independent Test**: Compare row counts and index document counts between QNAP (source) and new host (destination) after migration
|
||||||
|
|
||||||
|
### Implementation for User Story 2
|
||||||
|
|
||||||
|
- [ ] T015 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-mariadb.sh` — dump from QNAP MariaDB 11.8 via `mariadb-dump --single-transaction --routines --triggers`, pipe to new host container
|
||||||
|
- [ ] T016 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-elasticsearch.sh` — create snapshot on QNAP ES, transfer files, register repo on new host, restore
|
||||||
|
- [ ] T017 [US2] Run `migrate-mariadb.sh` — verify all table row counts match between QNAP and new host
|
||||||
|
- [ ] T018 [US2] Run `migrate-elasticsearch.sh` — verify all index document counts match between QNAP and new host
|
||||||
|
- [ ] T019 [US2] Create and run `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/verify-data-parity.sh` — automated row count + document count comparison script
|
||||||
|
- [ ] T020 [US2] Verify CIFS file access: list files in `/app/uploads/temp` and `/app/uploads/permanent` from backend container, compare with ASUSTOR share
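
The parity check described in T017 to T019 amounts to comparing two name-to-count mappings. A minimal sketch, assuming the counts have already been collected from both hosts; `parity_report` and the sample counts are hypothetical and do not describe the actual `verify-data-parity.sh`.

```python
from typing import Optional

Counts = dict[str, int]


def parity_report(source: Counts, dest: Counts) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Map each mismatched table or index to (source_count, dest_count).

    A name missing on one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(source.keys() | dest.keys()):
        src, dst = source.get(name), dest.get(name)
        if src != dst:
            mismatches[name] = (src, dst)
    return mismatches


qnap = {"tags": 120, "ai_audit_logs": 5000}
new_host = {"tags": 120, "ai_audit_logs": 4990}
print(parity_report(qnap, new_host))
```

An empty report corresponds to the 100% parity required by SC-002 and SC-003.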
|
||||||
|
|
||||||
|
**Checkpoint**: All data migrated and verified — new host has complete production data
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5: User Story 3 - Cutover and Smoke Test (Priority: P3)
|
||||||
|
|
||||||
|
**Goal**: Perform production cutover from old 2-host architecture to new single host, verify all DMS functions work end-to-end
|
||||||
|
|
||||||
|
**Independent Test**: Access application via new host IP, perform core DMS operations (login, document upload, search, AI inference)
|
||||||
|
|
||||||
|
### Implementation for User Story 3
|
||||||
|
|
||||||
|
- [ ] T021 [P] [US3] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/smoke-test.sh` — automated tests for: backend health, frontend accessible, login flow, document list, OCR endpoint, AI inference, full-text search
|
||||||
|
- [ ] T022 [US3] Update Gitea secrets: `HOST` → new host IP, `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
|
||||||
|
- [ ] T023 [US3] Update `scripts/deploy.sh` — change `COMPOSE_FILE` path to New-Host directory
|
||||||
|
- [ ] T024 [US3] Update NPM (Nginx Proxy Manager) on QNAP: `lcbp3.np-dms.work` → new host IP:3000 (frontend), `backend.np-dms.work` → new host IP:3001 (backend)
|
||||||
|
- [ ] T024b [US3] Update n8n workflow endpoints on QNAP: change all backend API URLs from `http://192.168.10.8:3000/api` (QNAP) to `http://<new-host-ip>:3001/api` (new host) — n8n stays on QNAP but must reach backend on new host via LAN port 3001
|
||||||
|
- [ ] T025 [US3] Run `smoke-test.sh` on new host — verify all 7 smoke tests pass
|
||||||
|
- [ ] T026 [US3] Verify from external machine on VLAN 10: access `https://lcbp3.np-dms.work`, login, create a test Correspondence, upload a PDF, trigger OCR, perform search
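
The smoke test in T021 and T025 is described as a fixed set of named checks that must all pass. The sketch below shows one way to aggregate such checks so that a crashing check counts as a failure instead of aborting the run; the check bodies are placeholders, and nothing here reflects the real `smoke-test.sh`.

```python
from typing import Callable


def run_smoke_tests(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run every check; an exception is recorded as a failed check."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def all_passed(results: dict[str, bool]) -> bool:
    return all(results.values())


def _unreachable() -> bool:
    raise ConnectionError("service unreachable")


# Placeholder checks standing in for real HTTP calls.
demo = run_smoke_tests({
    "backend health": lambda: True,
    "OCR endpoint": _unreachable,
})
print(demo, all_passed(demo))
```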
|
||||||
|
|
||||||
|
**Checkpoint**: New host is production-active — all DMS functions verified end-to-end
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 6: User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
|
||||||
|
|
||||||
|
**Goal**: Remove `X-API-Key` authentication from sidecar and backend, relying solely on Docker-internal network isolation per ADR-040 D5
|
||||||
|
|
||||||
|
**Independent Test**: Attempt to access sidecar from outside Docker network (should fail); verify backend calls sidecar without API key (should succeed)
|
||||||
|
|
||||||
|
### Implementation for User Story 4
|
||||||
|
|
||||||
|
- [ ] T027 [P] [US4] Remove `OCR_SIDECAR_API_KEY` from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` ocr-sidecar environment
|
||||||
|
- [ ] T028 [P] [US4] Remove API key validation from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/app.py` — remove `X-API-Key` header check middleware
|
||||||
|
- [ ] T029 [US4] Remove `X-API-Key` header from `backend/src/modules/ai/services/ocr.service.ts` — remove API key from HTTP client headers
|
||||||
|
- [ ] T030 [US4] Remove `OCR_SIDECAR_API_KEY` from `backend/.env.example` and any backend config that sets it
|
||||||
|
- [ ] T031 [US4] Rebuild and redeploy sidecar + backend containers — verify backend can call sidecar without API key
|
||||||
|
- [ ] T032 [US4] Verify external access blocked: `curl http://<new-host-ip>:8765/health` from VLAN 10 machine should fail (connection refused)
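
After T027 to T030, it may help to confirm that no reference to the key survives anywhere. The following sketch searches in-memory file contents for the two identifiers named in this phase; `find_api_key_refs` is a hypothetical helper, and reading files from disk is left out on purpose.

```python
import re

# Identifiers named in T027 to T030; matching is case-insensitive.
API_KEY_PATTERN = re.compile(r"X-API-Key|OCR_SIDECAR_API_KEY", re.IGNORECASE)


def find_api_key_refs(files: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line_number) for every line that mentions the key."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if API_KEY_PATTERN.search(line):
                hits.append((path, lineno))
    return hits


sample_files = {
    "app.py": "import os\nkey = request.headers.get('x-api-key')\n",
    "ocr.service.ts": "const timeout = 30000;\n",
}
print(find_api_key_refs(sample_files))
```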
|
||||||
|
|
||||||
|
**Checkpoint**: Network-only auth verified — no API key needed, Docker isolation sufficient
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 7: User Story 5 - Decommission Old Hosts (Priority: P5)
|
||||||
|
|
||||||
|
**Goal**: Stop services on QNAP (becomes backup) and retire Desk-5439, completing the consolidation
|
||||||
|
|
||||||
|
**Independent Test**: Verify QNAP services stopped (except backup), Desk-5439 powered off, new host unaffected
|
||||||
|
|
||||||
|
### Implementation for User Story 5
|
||||||
|
|
||||||
|
- [ ] T033 [P] [US5] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/rollback.sh` — emergency rollback: stop new host, restore QNAP + Desk-5439 services, revert DNS, revert CI/CD
|
||||||
|
- [ ] T034 [US5] Monitor new host for 24-48 hours: RAM usage (`docker stats`), VRAM usage (`nvidia-smi`), container health, application logs
|
||||||
|
- [ ] T034b [US5] Verify Adaptive OCR Residency (ADR-040 D3) on new RTX 5060 Ti: load `np-dms-ai` and `np-dms-ocr` concurrently, confirm `calculate_ocr_residency()` unloads OCR model when LLM needs VRAM; verify CPU Fallback Retrieval (ADR-040 D4) activates for BGE-M3/Reranker when GPU is occupied by LLM
|
||||||
|
- [ ] T035 [US5] Stop QNAP app services: `ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'`
|
||||||
|
- [ ] T036 [US5] Stop QNAP service stack: `ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'`
|
||||||
|
- [ ] T037 [US5] Retire Desk-5439: `ssh user@192.168.10.100 'sudo shutdown -h now'` (or repurpose)
|
||||||
|
- [ ] T038 [US5] Verify new host still fully operational after old hosts decommissioned — re-run `smoke-test.sh`
|
||||||
|
- [ ] T039 [US5] Take QNAP backup snapshot: `mariadb-dump` on QNAP MariaDB (if still running) or verify existing backup is current
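
The VRAM monitoring in T034 can be automated by parsing `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which prints one `used, total` pair in MiB per GPU. This sketch parses that text and compares it with the SC-008 threshold; treating 15GB as 15 × 1024 MiB is an assumption, and the function names are hypothetical.

```python
def parse_vram_csv(output: str) -> list[tuple[int, int]]:
    """Parse 'used, total' MiB pairs, one GPU per line."""
    pairs = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def vram_over_limit(output: str, limit_mib: int = 15 * 1024) -> bool:
    """Return True when any GPU uses more than limit_mib."""
    return any(used > limit_mib for used, _ in parse_vram_csv(output))


print(parse_vram_csv("12034, 16311\n"))
print(vram_over_limit("15800, 16311\n"))
```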
|
||||||
|
|
||||||
|
**Checkpoint**: Consolidation complete — single host is sole production, old hosts decommissioned
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 8: Polish & Cross-Cutting Concerns
|
||||||
|
|
||||||
|
**Purpose**: Documentation, monitoring, and final verification
|
||||||
|
|
||||||
|
- [ ] T040 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/README.md` — add New-Host section, mark QNAP as backup, mark Desk-5439 as retired
|
||||||
|
- [ ] T041 [P] Update `CONTEXT.md` — update infrastructure topology to reflect single-host architecture
|
||||||
|
- [ ] T042 [P] Update `AGENTS.md` — update infrastructure references (Desk-5439 → New Host, QNAP → backup)
|
||||||
|
- [ ] T043 Update `specs/04-Infrastructure-OPS/04-00-docker-compose/.env.template` — add ASUSTOR_USER, ASUSTOR_PASS, NEW_HOST_IP variables
|
||||||
|
- [ ] T044 [P] Update Prometheus/Grafana scrape config on ASUSTOR: change the ollama-metrics target from `192.168.10.100:9924` to `<new-host-ip>:9924` (the host-published port defined in T005)
|
||||||
|
- [ ] T045 Run `quickstart.md` validation — follow all steps end-to-end on a fresh provision
|
||||||
|
- [ ] T046 [P] Document disaster recovery procedure — backup schedule, restore from QNAP backup, estimated RTO/RPO
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Dependencies & Execution Order
|
||||||
|
|
||||||
|
### Phase Dependencies
|
||||||
|
|
||||||
|
- **Setup (Phase 1)**: No dependencies — can start immediately
|
||||||
|
- **Foundational (Phase 2)**: Depends on Setup — BLOCKS all user stories
|
||||||
|
- **US1 (Phase 3)**: Depends on Foundational — requires physical access to new host
|
||||||
|
- **US2 (Phase 4)**: Depends on US1 (services must be running to receive migrated data)
|
||||||
|
- **US3 (Phase 5)**: Depends on US1 + US2 (services running + data migrated for cutover)
|
||||||
|
- **US4 (Phase 6)**: Depends on US3 (cutover complete, network isolation verified)
|
||||||
|
- **US5 (Phase 7)**: Depends on US3 + US4 (stable production before decommissioning)
|
||||||
|
- **Polish (Phase 8)**: Can start after US3; some tasks depend on US5
|
||||||
|
|
||||||
|
### User Story Dependencies
|
||||||
|
|
||||||
|
- **US1 (P1)**: Foundational → US1 — no dependencies on other stories
|
||||||
|
- **US2 (P2)**: US1 → US2 — needs running services to receive data
|
||||||
|
- **US3 (P3)**: US1 + US2 → US3 — needs running services + migrated data
|
||||||
|
- **US4 (P4)**: US3 → US4 — needs cutover complete to verify network isolation in production
|
||||||
|
- **US5 (P5)**: US3 + US4 → US5 — needs stable production before decommissioning
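
The dependencies listed above form a directed acyclic graph, so a valid execution order can be derived mechanically. Here is a small sketch using the standard library's `graphlib`; the dictionary mirrors the bullets above, while `execution_order` is an illustrative name.

```python
from graphlib import TopologicalSorter

# Each key depends on every phase in its value set.
DEPENDS_ON = {
    "Setup": set(),
    "Foundational": {"Setup"},
    "US1": {"Foundational"},
    "US2": {"US1"},
    "US3": {"US1", "US2"},
    "US4": {"US3"},
    "US5": {"US3", "US4"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return the phases in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(DEPENDS_ON))
```

Because each phase depends on the one directly before it, this graph has only one valid order. Adding Polish, which depends only on US3, would allow several valid orders.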
|
||||||
|
|
||||||
|
### Parallel Opportunities
|
||||||
|
|
||||||
|
- T002, T003 can run in parallel (different files)
|
||||||
|
- T006, T007, T008 can run in parallel (sidecar files, no dependencies)
|
||||||
|
- T015, T016 can run in parallel (different migration scripts)
|
||||||
|
- T027, T028 can run in parallel (different files: compose vs app.py)
|
||||||
|
- T040, T041, T042, T044 can run in parallel (different doc files)
|
||||||
|
- T030 can also run in parallel with T027 and T028 (different file: `backend/.env.example`)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Strategy
|
||||||
|
|
||||||
|
### MVP First (User Story 1 Only)
|
||||||
|
|
||||||
|
1. Complete Phase 1: Setup (create directory structure)
|
||||||
|
2. Complete Phase 2: Foundational (provision host + create compose)
|
||||||
|
3. Complete Phase 3: User Story 1 (deploy services)
|
||||||
|
4. **STOP and VALIDATE**: All containers healthy, network isolation verified
|
||||||
|
5. Demo to stakeholders if ready
|
||||||
|
|
||||||
|
### Incremental Delivery
|
||||||
|
|
||||||
|
1. Setup + Foundational → Infrastructure ready
|
||||||
|
2. Add US1 → Services deployed → Validate (MVP!)
|
||||||
|
3. Add US2 → Data migrated → Validate parity
|
||||||
|
4. Add US3 → Cutover complete → Validate end-to-end
|
||||||
|
5. Add US4 → Security hardened → Validate network-only auth
|
||||||
|
6. Add US5 → Old hosts retired → Validate stability
|
||||||
|
7. Polish → Documentation updated → Final validation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- This is an infrastructure task — most work is shell scripts, Docker Compose YAML, and manual operations
|
||||||
|
- Physical access to the new host is required for US1
|
||||||
|
- Data migration (US2) requires SSH access to QNAP
|
||||||
|
- Cutover (US3) requires DNS/NPM access and coordination with users
|
||||||
|
- Decommission (US5) should only proceed after 24-48 hours of stable monitoring
|
||||||
|
- Rollback plan must be tested before cutover
|
||||||
|
- All env secrets must come from `.env` (gitignored) — never commit real secrets
|
||||||
@@ -36,4 +36,11 @@
|
|||||||
| 2026-06-19 | v1.9.10 | Feature-240 AI Admin Console Collapsible Cards: added buttons and functions to collapse/expand cards and sections, persisting state to localStorage and preserving background query polling | ✅ Complete |
|
||||||
| 2026-06-19 | v1.9.10 | Deployment Timeout Fix — Added clamav health check before recreation (skip if healthy), increased CI timeout 20→30 min | ✅ Complete |
|
||||||
| 2026-06-19 | v1.9.10 | AI Admin Response Normalization — recursive data unwrap for VRAM/prompt payloads, fixed Sandbox `.map()` crash and false OOM Guard | ✅ Complete |
|
||||||
|
| 2026-06-19 | v1.9.2 | SQL Delta Consolidation — merged applied deltas into schema/seed files, updated data dictionary to v1.9.2, cleaned up deltas directory, moved INSERT statements from schema to seed file | ✅ Complete |
|
||||||
|
| 2026-06-20 | v1.9.10 | ADR-040 OCR Sidecar Refactor — Pure compute worker, async I/O, residency wiring, path hardening, network isolation (supersedes ADR-033 §7) | ✅ Proposed |
|
||||||
|
| 2026-06-20 | v1.9.10 | ADR-041 Server Consolidation — Single Docker host (Ryzen 5 5600/32GB/RTX 5060 Ti 16GB), ASUSTOR as Primary NAS, QNAP as backup | ✅ Proposed |
|
||||||
|
| 2026-06-20 | v1.9.10 | OCR Sidecar Refactor (Speckit-140) — spec.md, plan.md, tasks.md generated, 5 analysis issues fixed, ready for implementation | ✅ Ready for Implement |
|
||||||
|
| 2026-06-20 | v1.9.10 | OCR Sidecar Refactor Phase 6+8+9: async I/O (lifespan + AsyncClient + asyncio.to_thread), removed /normalize endpoint, Dockerfile curl, docker-compose stale config cleanup, README.md, quickstart.md fix, 19/19 Python tests pass | ✅ Complete (Phase 7 blocked by ADR-041) |
|
||||||
|
| 2026-06-20 | v1.9.10 | OCR Backend Cleanup — typhoon-llm → np-dms-ai (processor+queue+module), tesseract → fast-path (enum+entity+controller+service+tests), P1-P3 fixes (keep_alive removal, hardcoded API key removal, env var alignment, Dockerfile 3.11, asyncio.to_thread VRAM calls) | ✅ Complete (pending tsc verify) |
|
||||||
|
| 2026-06-20 | v1.9.10 | OCR Naming Refactor — TyphoonOcr → NpDmsOcr (processor/queue/Redis key/aiModel), OcrTyphoonOptions → OcrNpDmsOptions, typhoonOptions → ocrOptions (backend 7 files + 3 tests), frontend typhoon state vars → ocr, isTyphoon → isAiPowered, Tesseract mocks → Fast Path, dead typhoon_ocr checks removed, page.tsx model name constants | ✅ Complete (pending tsc verify) |
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,32 @@
# Session: 2026-06-19 (SQL Delta Consolidation)

## Summary

Merged the already-applied SQL delta files into the main schema and seed files, deleted the rollback files, updated the data dictionary, and moved INSERT statements from the schema file to the seed file.

## Problems Found (Root Cause)

None. This was routine maintenance work.

## Fix

| File | Change |
| --- | --- |
| `specs/03-Data-and-Storage/lcbp3-v1.9.0-schema-02-tables.sql` | Updated the `tags`, `correspondence_tags`, `system_settings`, `migration_review_queue`, `ai_audit_logs` tables; added the `ai_available_models`, `ai_prompts`, `ai_execution_profiles`, `ai_sandbox_profiles`, `migration_errors` tables; removed INSERT statements |
| `specs/03-Data-and-Storage/lcbp3-v1.9.0-seed-basic.sql` | Added AI seed data (ai_available_models, ai_execution_profiles, ai_sandbox_profiles); added system_settings INSERT statements |
| `specs/03-Data-and-Storage/03-01-data-dictionary.md` | Bumped version to 1.9.2; updated the `ai_audit_logs` definition; added entries for `ai_available_models`, `ai_prompts`, `ai_execution_profiles`, `ai_sandbox_profiles`, `migration_errors` |
| `specs/03-Data-and-Storage/deltas/` | Deleted all 15 rollback files and 26 .sql files |

## Locked Rules

- **Schema Management**: follow ADR-009 (no migrations): edit the SQL schema directly and use delta files for tracking
- **Seed Data Separation**: INSERT statements must live in seed files, not schema files
- **Data Dictionary Sync**: any schema change must update the data dictionary together with a version bump

## Verification

- [x] Schema file contains no INSERT statements
- [x] Seed file contains the system_settings INSERT statements
- [x] AI seed data added to seed-basic.sql
- [x] Data dictionary version bumped to 1.9.2
- [x] Delta directory cleaned up (only README.md remains)
@@ -0,0 +1,79 @@
# Session 2026-06-20: OCR Backend Cleanup (Legacy Alias Removal)

## Summary

Cleaned up the backend code to use canonical naming consistently: renamed `typhoon-llm` → `np-dms-ai`, removed all `tesseract` references, and applied the recommended fixes (P1–P3) from code review.

## Changes (Fix)

### P1: Critical Fixes

| File | Change |
| --- | --- |
| `backend/src/modules/ai/services/ocr.service.ts` | P1-1: removed `keep_alive` from the multipart form data (the sidecar computes it internally); P1-2: removed the hardcoded API key default, now throws if `OCR_SIDECAR_API_KEY` is not set; renamed `processWithTyphoon` → `processWithNpDmsOcr`, `processWithTesseract` → `processWithFastPath`; audit log model names changed to `fast-path`/`pymupdf` and `np-dms-ocr` |
| `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts` | P1-2: removed the hardcoded API key default, now throws if `OCR_SIDECAR_API_KEY` is not set; removed `'tesseract'` from `SandboxOcrEngineType`; updated routing condition and fallback comments |

### P2: Important Fixes

| File | Change |
| --- | --- |
| `backend/.env.example` | P2-1: `OCR_API_KEY` → `OCR_SIDECAR_API_KEY` (aligned with the code); P2-2: OCR URL `192.168.10.8` → `192.168.10.100` (Desk-5439); removed `THAI_PREPROCESS_URL` (endpoint deleted in ADR-040 Phase 8); comment "PaddleOCR" → "np-dms-ocr" |
| `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/Dockerfile` | P2-5: `python:3.10-slim` → `python:3.11-slim` |

### P3: Medium Priority

| File | Change |
| --- | --- |
| `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py` | P3-1/P3-2: wrapped sync VRAM calls in `asyncio.to_thread()`: `calculate_ocr_residency()` in `process_ocr`, `get_vram_headroom()` in `/embed` and `/rerank` |
| `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/services/residency_policy.py` | Updated comment "Typhoon OCR" → "np-dms-ocr" |

### typhoon-llm → np-dms-ai Rename

| File | Change |
| --- | --- |
| `backend/src/modules/ai/processors/np-dms-ai.processor.ts` | **Created**: `NpDmsAiProcessor`, `QUEUE_NP_DMS_AI = 'np-dms-ai'`, `NpDmsAiJobData`, Redis key `ai:np-dms-ai:llm:`, audit `aiModel: 'np-dms-ai'` |
| `backend/src/modules/ai/processors/typhoon-llm.processor.ts` | **Deleted**: replaced by `np-dms-ai.processor.ts` |
| `backend/src/modules/ai/ai.module.ts` | Changed import from `typhoon-llm.processor` → `np-dms-ai.processor`; queue name `QUEUE_TYPHOON_LLM` → `QUEUE_NP_DMS_AI`; provider `TyphoonLlmProcessor` → `NpDmsAiProcessor` |

### Tesseract Cleanup

| File | Change |
| --- | --- |
| `backend/src/modules/ai/entities/ocr-engine-configuration.entity.ts` | `TESSERACT` → `FAST_PATH`, `TYPHOON_OCR` → `NP_DMS_OCR` in the enum |
| `backend/src/modules/ai/ai.controller.ts` | Removed `'tesseract'` from the Swagger enum and `validEngineTypes` |
| `backend/src/modules/ai/entities/ai-audit-log.entity.ts` | Updated comment examples: `tesseract` → `fast-path`, `typhoon-ocr-3b` → `np-dms-ocr` |

### Test Files Updated

| File | Change |
| --- | --- |
| `backend/src/modules/ai/services/sandbox-ocr-engine.service.spec.ts` | Removed the tesseract test block; updated `engineUsed` expectations from `'tesseract'` → `'fast-path'`; updated fallback test descriptions and mock text |

### User Manual Changes (after the session)

| File | Change |
| --- | --- |
| `backend/src/modules/ai/processors/typhoon-ocr.processor.ts` | **Deleted by user**: renamed to `np-dms-ocr-processor.ts` |
| `backend/src/modules/ai/processors/np-dms-ocr-processor.ts` | **Created by user**: renamed file; the class name is still `TyphoonOcrProcessor` and the queue is still `QUEUE_TYPHOON_OCR` |
| `backend/src/modules/ai/ai.module.ts` | User updated the import path to `./processors/np-dms-ocr-processor` |

## Locked Rules

- **Canonical naming:** `np-dms-ai` (LLM processor/queue), `np-dms-ocr` (OCR engine), `fast-path` (PyMuPDF text layer); do not use `typhoon-llm`, `tesseract`, or `Typhoon OCR` in new code
- **API Key:** `OCR_SIDECAR_API_KEY` is a mandatory env var; hardcoded defaults are forbidden
- **keep_alive:** the backend does not send `keep_alive` in form data; the sidecar computes it only via `calculate_ocr_residency()`
- **VRAM calls:** sync VRAM/residency calls in async endpoints must be wrapped in `asyncio.to_thread()`
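
The `asyncio.to_thread()` rule can be illustrated with a minimal, stdlib-only sketch. `probe_vram_headroom_mb` is a hypothetical stand-in for a blocking call such as `get_vram_headroom()`; the sleep and the returned number are made up for illustration, and the real sidecar functions may have different signatures.

```python
import asyncio
import time


def probe_vram_headroom_mb() -> float:
    """Hypothetical blocking probe; stands in for a sync VRAM query."""
    time.sleep(0.05)  # simulate a slow driver or subprocess call
    return 9000.0     # made-up value for illustration


async def embed_handler() -> float:
    # Run the blocking probe in a worker thread so the event loop stays free.
    return await asyncio.to_thread(probe_vram_headroom_mb)


async def main() -> list[float]:
    # Three handlers overlap instead of running back to back.
    return await asyncio.gather(*(embed_handler() for _ in range(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling the probe directly inside `embed_handler` would still work, but every other request would wait while it sleeps; moving it to a thread is what keeps concurrent requests responsive.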

## Remaining Work (Next Session)

- [ ] **Rename `TyphoonOcrProcessor` → `NpDmsOcrProcessor`** in `np-dms-ocr-processor.ts` (class name + queue constant `QUEUE_TYPHOON_OCR` → `QUEUE_NP_DMS_OCR`)
- [ ] **Update the `ai.module.ts`** import to use `NpDmsOcrProcessor` and `QUEUE_NP_DMS_OCR`
- [ ] **Update `typhoon-ocr.processor.spec.ts`** if it exists: rename and update references
- [ ] **tsc --noEmit verification** after the rename is complete
- [ ] **Backend build** to confirm there are no broken imports

## Verification

- [ ] `pnpm --filter backend exec tsc --noEmit`: not yet run (pending rename of TyphoonOcrProcessor)
- [ ] Backend unit tests: not yet run
- [ ] Python tests: not yet run after the asyncio.to_thread changes
@@ -0,0 +1,51 @@
# Session: 2026-06-20 (OCR Naming Refactor: typhoon → np-dms-ocr)

## Summary

Refactored the naming conventions related to the OCR engine from "typhoon" to "np-dms-ocr" across both backend and frontend, and cleaned up the Tesseract references that remained in test files and source code.

## Changes

### Backend (7 files + 3 test files)

| File | Change |
| --- | --- |
| `sandbox-ocr-engine.service.ts` | `OcrTyphoonOptions` → `OcrNpDmsOptions`, `typhoonOptions` → `ocrOptions`, log messages |
| `np-dms-ocr-processor.ts` | Import `OcrNpDmsOptions`, field `typhoonOptions` → `ocrOptions` |
| `ai.controller.ts` | `typhoonOptions` → `ocrOptions` in `submitSandboxOcr` |
| `ai-batch.processor.ts` | Import + all `typhoonOptions` → `ocrOptions` (5 locations) |
| `ai-queue.service.ts` | Field `typhoonOptions` → `ocrOptions` in payload type + job data |
| `ocr.service.ts` | `OcrDetectionInput.typhoonOptions` → `ocrOptions`, override logic |
| `ai.module.ts` | Change log comment: `TyphoonOcrProcessor` → `NpDmsOcrProcessor` |
| `ai-batch.processor.spec.ts` | Test assertion: `typhoonOptions` → `ocrOptions` |
| `ocr.service.spec.ts` | Test input: `typhoonOptions` → `ocrOptions` |
| `sandbox-ocr-engine.service.spec.ts` | Test description: `typhoonOptions` → `ocrOptions` |

### Frontend (5 files + 3 test files)

| File | Change |
| --- | --- |
| `admin-ai.service.ts` | Param `typhoonOptions` → `ocrOptions` in `submitSandboxOcr` |
| `OcrSandboxPromptManager.tsx` | State vars `typhoon*` → `ocr*`, UI label "Typhoon OCR Options" → "OCR Options", engineType mapping `tesseract` → `fast_path` → `auto`, removed dead `typhoon_ocr` check, fallback label "Tesseract" → "Fast Path (OCR)" |
| `OcrEngineSelector.tsx` | `isTyphoon` → `isAiPowered`, removed dead `typhoon_ocr` check |
| `SandboxTestArea.tsx` | UI labels "Typhoon OCR" → "np-dms-ocr" |
| `page.tsx` | Removed `name.includes('typhoon')` in 4 places, changed `name.includes('ocr')` → `name.includes(OCR_MODEL_NAME)` in 4 places |
| `ocr-engine-selector.test.tsx` | Mock Tesseract → Fast Path (PyMuPDF), assertions updated |
| `OcrEngineSelector.test.tsx` | Mock Tesseract → Fast Path (PyMuPDF), assertions updated |
| `ocr-sandbox-prompt-manager.test.tsx` | Mock `engineType: 'typhoon_ocr'` → `'np_dms_ocr'` |

## Locked Rules

- **D28 (existing):** Canonical naming: `np-dms-ai` (LLM), `np-dms-ocr` (OCR), `fast-path` (PyMuPDF); now applied across every layer
- **Additional:** Frontend model name normalization uses only the constants `OCR_MODEL_NAME` and `MAIN_MODEL_NAME`; hardcoded strings are forbidden
- **Additional:** The backend `OcrEngineType` enum has only `FAST_PATH` and `NP_DMS_OCR`; `TESSERACT` and `TYPHOON_OCR` no longer exist

## Verification

- [ ] `pnpm --filter backend exec tsc --noEmit`: not yet run
- [ ] `pnpm --filter lcbp3-frontend exec tsc --noEmit`: not yet run
- [ ] Backend unit tests: not yet run
- [ ] Frontend unit tests: not yet run
- [x] `grep_search` for `typhoon|Typhoon|TYPHOON` in frontend: only change log comments remain
- [x] `grep_search` for `tesseract|Tesseract` in frontend: only change log comments remain
- [x] `grep_search` for `typhoonOptions|OcrTyphoonOptions|QUEUE_TYPHOON_OCR|TyphoonOcrProcessor` in backend: no matches
@@ -0,0 +1,41 @@
<!-- File: specs/88-logs/session-2026-06-20-ocr-sidecar-refactor-adr.md -->
# Session: 2026-06-20 (OCR Sidecar Refactor & Server Consolidation ADRs)

## Summary

Created ADR-040 (OCR Sidecar Refactor) and ADR-041 (Server Consolidation) by reconciling the two refactor plans (Claude + Qwen) against the canonical specs (AGENTS.md, CONTEXT.md, ADR-033/034/036), and updated the flagged ambiguities in CONTEXT.md.

## Problems Found (Root Cause)

- Both refactor plans (Claude + Qwen) conflicted with resolved policies:
  - Deleting `vram_monitor.py` / `residency_policy.py` → violates Adaptive OCR Residency + CPU Fallback Retrieval
  - Forcing BGE+Reranker to be GPU-resident → violates LLM-First GPU Ownership
  - Fixed `keep_alive` → violates ADR-036 Gap-2 (keep_alive is a lazy resource param)
- Cross-host trust gap: the sidecar runs on Desk-5439 and the backend on QNAP → the "Docker internal isolation" claim is false

## Fix

| File | Change |
| --- | --- |
| `specs/06-Decision-Records/ADR-040-ocr-sidecar-refactor.md` | Created a new ADR for the OCR sidecar refactor: preserve GPU policies, async I/O, path hardening, network isolation (supersedes ADR-033 §7) |
| `specs/06-Decision-Records/ADR-041-server-consolidation.md` | Created a new ADR for server consolidation: single Docker host, ASUSTOR as Primary NAS, QNAP as backup |
| `CONTEXT.md` | Added 2 resolved ambiguities: OCR Sidecar X-API-Key (network isolation only), Cross-host trust gap (server consolidation) |

## Locked Rules

| ID | Decision | ADR |
| -- | -------- | --- |
| D21 | OCR Sidecar = Pure Compute Worker; orchestration/params live in the existing backend services | ADR-040 D1 |
| D22 | Wire `calculate_ocr_residency()` into `process_ocr`; keep_alive is a lazy resource param (ADR-036 Gap-2) | ADR-040 D3 |
| D23 | Retain vram_monitor + CPU-fallback for `/embed`,`/rerank`; forcing GPU residency is forbidden | ADR-040 D4 |
| D24 | Remove X-API-Key; auth = network isolation (supersedes ADR-033 §7) | ADR-040 D5 |
| D25 | Server Consolidation: co-locate all services on a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) | ADR-041 D1 |
| D26 | ASUSTOR (192.168.10.9) = Primary NAS, QNAP = Backup server | ADR-041 D2 |
| D27 | Docker-internal network only for sidecar/Ollama; enables ADR-040 D5 network-only auth | ADR-041 D3 |
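
To make D22 and D23 concrete, here is a minimal sketch of a headroom-based residency decision. The field names `keep_alive_seconds`, `reason`, and `vram_headroom_mb` mirror the decision object mocked in the sidecar tests; the function names, the thresholds, the keep-alive duration, and the `headroom-low` reason string are assumptions for illustration, not the values used in `residency_policy.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidencyDecision:
    keep_alive_seconds: int
    reason: str
    vram_headroom_mb: float


def decide_residency(
    vram_headroom_mb: float,
    min_headroom_mb: float = 4096.0,  # assumed threshold
    resident_seconds: int = 120,      # assumed keep-alive when VRAM is plentiful
) -> ResidencyDecision:
    """Keep the OCR model loaded only when VRAM headroom allows it."""
    if vram_headroom_mb >= min_headroom_mb:
        return ResidencyDecision(resident_seconds, "headroom-sufficient", vram_headroom_mb)
    # Under pressure, ask for an immediate unload so the main LLM keeps the GPU.
    return ResidencyDecision(0, "headroom-low", vram_headroom_mb)


def pick_retrieval_device(vram_headroom_mb: float, min_headroom_mb: float = 2048.0) -> str:
    """CPU fallback for embed/rerank instead of forcing GPU residency (threshold assumed)."""
    return "cuda" if vram_headroom_mb >= min_headroom_mb else "cpu"
```

Because the decision is computed per request from the current headroom, the backend never needs to send `keep_alive` itself, which is the point of treating it as a lazy resource parameter.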

## Verification

- [ ] ADR-040 and ADR-041 reviewed and approved
- [ ] Implementation tasks in ADR-040/041 executed
- [ ] Server consolidation cutover succeeded
- [ ] X-API-Key removal succeeded after the consolidation cutover
@@ -0,0 +1,67 @@
# Session: 2026-06-20 (OCR Sidecar Refactor Phase 6-9 Implementation)

## Summary

Implemented Phases 6, 8, and 9 of the OCR Sidecar Refactor (Speckit-140) following ADR-040. Phase 6 refactored the sidecar to async I/O with a lifespan context manager. Phase 8 removed the unused `/normalize` endpoint. Phase 9 polished the Dockerfile and docker-compose.yml, created README.md, and validated quickstart.md. All 19 Python tests pass.

## Problems Found (Root Cause)

There were no bugs in this session; it implemented new work per ADR-040:

- **Sync I/O bottleneck**: `process_ocr` used a sync `httpx.Client`, which blocked the FastAPI event loop
- **Stale startup pattern**: `@app.on_event("startup")` is deprecated in FastAPI 0.111+
- **Unused /normalize endpoint**: no consumers in the backend codebase
- **Stale Docker config**: `OCR_LANG` and `USE_GPU` are Tesseract settings that are no longer used
- **Missing curl**: the Dockerfile lacked `curl`, causing HEALTHCHECK to fail

## Fix

### Phase 6: Async I/O Performance (T041-T046)

| File | Change |
|------|--------|
| `specs/04-Infrastructure-OPS/.../ocr-sidecar/app.py` | `process_ocr` → `async def`, `_process_pdf_doc` → `async def`, `/ocr` + `/ocr-upload` → `async def` |
| `app.py` | Added the `ollama_client` global (`httpx.AsyncClient`), created in the lifespan context manager |
| `app.py` | Replaced `@app.on_event("startup")` with `@asynccontextmanager lifespan` |
| `app.py` | Model loading via `asyncio.to_thread(_load_bge_models)` |
| `app.py` | Replaced `httpx.Client` with `await client.post()` (AsyncClient) |
| `tests/integration/ocr-sidecar/test_async_performance.py` | **New file**: 6 tests (coroutine check, lifespan, ollama_client global, /normalize removed, concurrent requests) |
| `tests/unit/ocr-sidecar/test_residency_wiring.py` | Updated: `FakeClient` → `FakeAsyncClient`, sync → `asyncio.run()` |
| `tests/integration/ocr-sidecar/test_parameter_governance.py` | Updated: async mock, patch `ollama_client` |
| `tests/integration/ocr-sidecar/test_active_prompt.py` | Updated: async mock, patch `ollama_client` |

### Phase 8: Remove /normalize (T054-T055)

| File | Change |
|------|--------|
| `app.py` | Removed `NormalizeRequest`, `NormalizeResponse`, the `/normalize` endpoint, and `pythainlp` imports |
| `requirements.txt` | Removed `pythainlp==5.0.4` and `Pillow==10.0.0` |
| (grep verified) | No `/normalize` or `THAI_PREPROCESS_URL` consumers in backend |

### Phase 9: Polish (T056-T063)

| File | Change |
|------|--------|
| `Dockerfile` | Added `curl` for HEALTHCHECK, change log entry |
| `docker-compose.yml` | Removed `OCR_LANG`, `USE_GPU`; added `OCR_SIDECAR_API_KEY`, `OCR_ACTIVE_PROFILE` |
| `README.md` | **New file**: architecture, endpoints, env vars, deploy guide, test coverage |
| `quickstart.md` | Fixed stale requirements (removed pythainlp/Pillow), fixed `TYPHOON_OCR_MODEL` → `OCR_MODEL` |
| `tasks.md` | Mark T041-T046, T054-T063 as `[x]` |

## Locked Rules

- **Async I/O pattern**: `process_ocr` must be `async def` and use `httpx.AsyncClient` via the `ollama_client` global (created in lifespan)
- **Lifespan over startup event**: use `@asynccontextmanager lifespan` instead of `@app.on_event("startup")`, which is deprecated in FastAPI 0.111+
- **Model loading non-blocking**: use `asyncio.to_thread()` for model loading in lifespan so that startup is not blocked
- **No /normalize**: the endpoint has been removed; it had no consumers in backend
- **Test mock pattern**: use `FakeAsyncClient` (async `post()` + `aclose()`) instead of `FakeClient` (sync) for every test that mocks the Ollama API
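
The lifespan and shared-client rules follow a common shape, sketched here with the standard library only so that it runs without FastAPI or httpx installed. `StubAsyncClient`, `load_models`, the `state` dict, and the `/example` URL are hypothetical stand-ins; in the sidecar the context manager would presumably be passed as `FastAPI(lifespan=lifespan)` and the client would be an `httpx.AsyncClient`.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class StubAsyncClient:
    """Stand-in for httpx.AsyncClient: async post() plus aclose()."""

    def __init__(self) -> None:
        self.closed = False

    async def post(self, url: str, json: dict) -> dict:
        return {"url": url, "echo": json}

    async def aclose(self) -> None:
        self.closed = True


def load_models() -> str:
    """Hypothetical blocking model load."""
    return "models-ready"


state: dict = {}


@asynccontextmanager
async def lifespan(app: object) -> AsyncIterator[None]:
    # Startup: create the shared client and load models off the event loop.
    state["client"] = StubAsyncClient()
    state["models"] = await asyncio.to_thread(load_models)
    try:
        yield
    finally:
        # Shutdown: close the shared client once.
        await state["client"].aclose()


async def demo() -> tuple[str, dict, bool]:
    async with lifespan(app=None):
        reply = await state["client"].post("/example", {"keep_alive": 60})
    return state["models"], reply, state["client"].closed
```

Sharing one client for the lifetime of the app reuses connections, and it also gives tests a single name to patch, which is why the test mock only needs `post()` and `aclose()`.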

## Verification

- [x] `python -m pytest tests/ -v`: **19/19 tests passed** in 4.81s
- [x] `test_process_ocr_is_coroutine_function`: process_ocr is async ✅
- [x] `test_process_pdf_doc_is_coroutine_function`: _process_pdf_doc is async ✅
- [x] `test_app_uses_lifespan_not_startup_event`: uses lifespan, not on_event ✅
- [x] `test_app_has_async_client_global`: the ollama_client global exists ✅
- [x] `test_normalize_endpoint_removed`: /normalize has been removed ✅
- [x] `test_concurrent_ocr_requests_dont_block`: 3 concurrent requests succeed ✅
- [x] Existing tests (path traversal, API key, residency, CPU fallback, parameter governance, active prompt): all pass ✅
@@ -0,0 +1,59 @@
# Session: 2026-06-20 (OCR Sidecar Refactor Speckit Workflow)

## Summary

Executed the complete `/speckit.prepare` workflow for the OCR Sidecar Refactor (ADR-040). Generated spec.md, plan.md, research.md, data-model.md, contracts/sidecar-api.md, quickstart.md, and tasks.md. Performed a consistency analysis and fixed all identified issues (1 CRITICAL, 2 MEDIUM, 2 LOW).

## Problems Found (Root Cause)

None. This was a planning/specification workflow, not a bug fix session.

## Fix

N/A - Specification generation workflow.

## Work Done in This Session

| File | Change |
| --- | --- |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md` | Created feature specification with 5 user stories, 20 functional requirements, 8 success criteria |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/plan.md` | Created implementation plan with technical context, constitution check, and phase structure |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/research.md` | Created technical decisions documentation from ADR-040 |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/data-model.md` | Created data contracts and entity relationships |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/contracts/sidecar-api.md` | Created sidecar API specification |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/quickstart.md` | Created deployment and testing guide |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/tasks.md` | Created 63 implementation tasks organized by user story |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/checklists/requirements.md` | Created specification quality validation checklist |

## Analysis & Fixes

| Issue | Severity | Fix |
| ----- | -------- | --- |
| C1 - Constitution Check ADR-019 | CRITICAL | Updated plan.md to acknowledge ADR-019 applies to backend services (parameter resolution in OcrService/SandboxOcrEngineService) |
| U1 - Symlink resolution edge case | MEDIUM | Updated spec.md edge case to reference test T007 |
| U2 - Ollama unavailability edge case | MEDIUM | Updated spec.md edge case to note handled by FastAPI exception handling per ADR-007 |
| I1 - IP address inconsistency | LOW | Standardized IP to 192.168.10.100 in spec.md and plan.md |
| I2 - Task description clarity | LOW | Changed tasks T022/T023 from "Verify" to "Retain" |
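
The symlink edge case in U1 is the reason a path check should run on the resolved path rather than on the raw string. A minimal sketch of upload-base canonicalization, assuming a hypothetical helper named `resolve_under_base` (the sidecar's own function name and error type may differ):

```python
from pathlib import Path


def resolve_under_base(candidate: str, upload_base: Path) -> Path:
    """Return the canonical path, or raise if it escapes upload_base."""
    base = upload_base.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes upload base: {candidate}")
    return resolved
```

An absolute `candidate` replaces `base` in the join, so it is still resolved and checked against the base like any other input.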

## Locked Rules

- OCR sidecar is a pure compute worker (no DB/storage access per ADR-023/023A)
- Backend services handle all parameter governance (ai_execution_profiles, ai_prompts)
- Adaptive OCR Residency must be preserved (vram_monitor.py, residency_policy.py retained)
- CPU fallback for BGE-M3/FlagReranker must be preserved
- Phase 2 (X-API-Key removal) is BLOCKED until ADR-041 consolidation completes

## Verification

- [x] All 5 user stories have acceptance criteria
- [x] All 20 functional requirements have task coverage (100%)
- [x] Constitution check passes with proper ADR-019 acknowledgment
- [x] No ambiguities or duplications found
- [x] All 5 analysis issues fixed
- [x] Ready for `/speckit-implement`

## Next Steps

- Execute `/speckit-implement` to begin implementation
- Start with MVP (User Stories 1-2: Security Hardening + GPU Resource Management)
- User Story 5 (Network Isolation Auth Phase 2) remains BLOCKED until ADR-041 consolidation
@@ -0,0 +1,96 @@
# File: tests/integration/ocr-sidecar/test_active_prompt.py
# Change Log:
# - 2026-06-20: Initial creation for US3 active prompt integration tests.

import sys
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch

from fastapi.testclient import TestClient

UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))

from test_path_traversal import FakeDocument, load_app


class FakeAsyncResponse:
    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": "{\"natural_text\": \"prompt result\"}"}}]}


class FakeAsyncClient:
    last_payload = None

    def __init__(self, *args, **kwargs) -> None:
        pass

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


def test_ocr_injects_system_prompt_and_dms_tags(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    pdf_path = upload_base / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n")

    app_module = load_app(upload_base)
    client = TestClient(app_module.app)

    decision = SimpleNamespace(keep_alive_seconds=120, reason="headroom-sufficient", vram_headroom_mb=9000.0)
    fake_client = FakeAsyncClient()
    FakeAsyncClient.last_payload = None

    # Prepare dummy message structure
    initial_messages = [{"role": "user", "content": [{"type": "text", "text": "OCR Page content"}]}]

    with patch.object(app_module, "calculate_ocr_residency", return_value=decision), \
         patch.object(app_module, "prepare_ocr_messages", return_value=initial_messages), \
         patch.object(app_module.fitz, "open", return_value=FakeDocument()), \
         patch.object(app_module, "ollama_client", fake_client):

        response = client.post(
            "/ocr",
            json={
                "pdfPath": str(pdf_path),
                "engine": "np-dms-ocr",
                "system_prompt": "Custom system instruction",
                "dms_tags": {
                    "document_number": "true",
                    "document_date": "true"
                },
                "runtime_params": {
                    "temperature": 0.1,
                    "top_p": 0.5,
                    "repeat_penalty": 1.0,
                    "max_tokens": 4096
                }
            },
            headers={"X-API-Key": "test-key"}
        )

    assert response.status_code == 200

    # Verify the message content in last payload sent to Ollama
    sent_messages = FakeAsyncClient.last_payload["messages"]

    # We expect system_prompt to be appended to messages[0]["content"]
    content_list = sent_messages[0]["content"]

    # Verify system prompt exists
    system_prompt_found = any(c.get("type") == "text" and c.get("text") == "Custom system instruction" for c in content_list)
    assert system_prompt_found, "System prompt was not injected into message content"

    # Verify DMS tags instruction exists (default to "" so a missing text field cannot raise)
    dms_tags_instruction = any(c.get("type") == "text" and "<document_number>" in c.get("text", "") and "<document_date>" in c.get("text", "") for c in content_list)
    assert dms_tags_instruction, "DMS tags instructions were not injected correctly"
@@ -0,0 +1,129 @@
|
|||||||
|
# File: tests/integration/ocr-sidecar/test_async_performance.py
|
||||||
|
# Change Log:
|
||||||
|
# - 2026-06-20: Added ADR-040 US4 async I/O performance tests for process_ocr and lifespan.
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import inspect
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
from types import SimpleNamespace
|
||||||
|
from unittest.mock import AsyncMock, MagicMock, patch
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
|
||||||
|
if str(UNIT_DIR) not in sys.path:
|
||||||
|
sys.path.insert(0, str(UNIT_DIR))
|
||||||
|
|
||||||
|
from test_path_traversal import load_app
|
||||||
|
|
||||||
|
|
||||||
|
class FakeAsyncResponse:
|
||||||
|
"""จำลอง httpx.AsyncClient response"""
|
||||||
|
|
||||||
|
def raise_for_status(self) -> None:
|
||||||
|
return None
|
||||||
|
|
||||||
|
def json(self) -> dict:
|
||||||
|
return {"choices": [{"message": {"content": '{"natural_text": "ok"}'}}]}
|
||||||
|
|
||||||
|
|
||||||
|
class FakeAsyncClient:
|
||||||
|
"""จำลอง httpx.AsyncClient สำหรับ async process_ocr"""
|
||||||
|
|
||||||
|
def __init__(self, *args, **kwargs) -> None:
|
||||||
|
self.payload = None
|
||||||
|
FakeAsyncClient.last_payload = None
|
||||||
|
|
||||||
|
async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
|
||||||
|
self.payload = json
|
||||||
|
FakeAsyncClient.last_payload = json
|
||||||
|
return FakeAsyncResponse()
|
||||||
|
|
||||||
|
    async def aclose(self) -> None:
        pass


FakeAsyncClient.last_payload = None


def test_process_ocr_is_coroutine_function(tmp_path: Path) -> None:
    """T042: process_ocr must be an async def (coroutine function)."""
    app_module = load_app(tmp_path)
    assert inspect.iscoroutinefunction(app_module.process_ocr), (
        "process_ocr must be async def per ADR-040 US4"
    )


def test_process_pdf_doc_is_coroutine_function(tmp_path: Path) -> None:
    """T042: _process_pdf_doc must be an async def because it calls process_ocr."""
    app_module = load_app(tmp_path)
    assert inspect.iscoroutinefunction(app_module._process_pdf_doc), (
        "_process_pdf_doc must be async def per ADR-040 US4"
    )


def test_app_uses_lifespan_not_startup_event(tmp_path: Path) -> None:
    """T045: the app must use a lifespan context manager, not @app.on_event('startup')."""
    app_module = load_app(tmp_path)
    app_obj = app_module.app
    # FastAPI stores the lifespan in app.router.lifespan_context
    assert hasattr(app_obj.router, "lifespan_context"), (
        "App must use lifespan parameter, not @app.on_event('startup')"
    )
    # Verify that no legacy startup event handlers are registered
    startup_handlers = app_obj.router.on_startup
    assert len(startup_handlers) == 0, (
        "App must not register @app.on_event('startup') handlers"
    )


def test_app_has_async_client_global(tmp_path: Path) -> None:
    """T043: the app module must expose an ollama_client global for the AsyncClient."""
    app_module = load_app(tmp_path)
    assert hasattr(app_module, "ollama_client"), (
        "app module must have ollama_client global for shared AsyncClient"
    )


def test_normalize_endpoint_removed(tmp_path: Path) -> None:
    """T054: the /normalize endpoint must have been removed."""
    app_module = load_app(tmp_path)
    routes = [r.path for r in app_module.app.routes]
    assert "/normalize" not in routes, (
        "/normalize endpoint must be removed per ADR-040 D2"
    )


def test_concurrent_ocr_requests_dont_block(tmp_path: Path) -> None:
    """T041: concurrent OCR requests must not block each other (async I/O)."""
    app_module = load_app(tmp_path)

    decision = SimpleNamespace(
        keep_alive_seconds=60,
        reason="headroom-sufficient",
        vram_headroom_mb=9000.0,
    )

    fake_client = FakeAsyncClient()

    async def run_concurrent() -> list[str]:
        """Run process_ocr three times concurrently and check that the calls do not block."""
        with (
            patch.object(app_module, "calculate_ocr_residency", return_value=decision),
            patch.object(app_module, "prepare_ocr_messages", return_value=[{"content": []}]),
            patch.object(app_module, "ollama_client", fake_client),
        ):
            tasks = [
                app_module.process_ocr("/tmp/test.pdf", page_num=i + 1)
                for i in range(3)
            ]
            results = await asyncio.gather(*tasks)
            return results

    results = asyncio.run(run_concurrent())
    assert len(results) == 3
    assert all(r == "ok" for r in results)
    # Every request must send its payload successfully
    assert FakeAsyncClient.last_payload is not None
    assert FakeAsyncClient.last_payload["keep_alive"] == 60
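The concurrency test above confirms that three gathered calls all return, but it does not measure overlap. A timing check can make the non-blocking property explicit. The following is a minimal sketch under stated assumptions, not the sidecar's code: `fake_ocr_call` is a hypothetical stand-in for an awaitable `process_ocr`, and the 0.2 s delay is an arbitrary value chosen for illustration.

```python
import asyncio
import time


async def fake_ocr_call(page_num: int, delay: float = 0.2) -> str:
    """Hypothetical stand-in for an async OCR call; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"page-{page_num}"


async def run_three() -> tuple[list[str], float]:
    """Run three calls concurrently and return the results with the wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_ocr_call(i + 1) for i in range(3)))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(run_three())
```

If the calls overlapped, `elapsed` should stay close to a single delay rather than three times the delay. A blocking call such as `time.sleep` inside the coroutine would push it toward the sequential total.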
@@ -0,0 +1,49 @@
# File: tests/integration/ocr-sidecar/test_cpu_fallback.py
# Change Log:
# - 2026-06-20: Added ADR-040 CPU fallback integration coverage for retrieval endpoints.

import sys
from pathlib import Path
from unittest.mock import MagicMock, patch

from fastapi.testclient import TestClient

UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))

from test_path_traversal import load_app


def test_embed_uses_cpu_when_vram_headroom_is_low(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    bge_model = MagicMock()
    bge_model.encode.return_value = {
        "dense_vecs": [[0.1, 0.2]],
        "lexical_weights": [{"101": 0.5}],
    }
    headroom = MagicMock(total_mb=16384.0, used_mb=15000.0, available_mb=1000.0, query_success=True)
    with patch.object(app_module, "bge_model", bge_model), patch.object(app_module, "get_vram_headroom", return_value=headroom):
        response = client.post("/embed", json={"text": "hello"}, headers={"X-API-Key": "test-key"})
    assert response.status_code == 200
    assert response.json()["device"] == "cpu"
    bge_model.model.to.assert_called_with("cpu")


def test_rerank_uses_cpu_when_vram_headroom_is_low(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    reranker = MagicMock()
    reranker.compute_score.return_value = [0.9]
    headroom = MagicMock(total_mb=16384.0, used_mb=15000.0, available_mb=1000.0, query_success=True)
    with patch.object(app_module, "reranker", reranker), patch.object(app_module, "get_vram_headroom", return_value=headroom):
        response = client.post(
            "/rerank",
            json={"query": "q", "chunks": ["chunk"]},
            headers={"X-API-Key": "test-key"},
        )
    assert response.status_code == 200
    assert response.json()["device"] == "cpu"
    reranker.model.to.assert_called_with("cpu")
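Both tests above mock a headroom object with roughly 1000 MB available and expect the endpoint to report `"cpu"`. The decision they exercise might look like the sketch below. This is an assumption-laden illustration only: the `Headroom` dataclass mirrors the mocked field names, while `choose_device` and its `required_mb` threshold of 2048 MB are hypothetical and do not come from the sidecar source.

```python
from dataclasses import dataclass


@dataclass
class Headroom:
    """Hypothetical mirror of the fields mocked in the tests above."""
    total_mb: float
    used_mb: float
    available_mb: float
    query_success: bool


def choose_device(headroom: Headroom, required_mb: float = 2048.0) -> str:
    """Return "cuda" only when the VRAM query succeeded and enough memory is free."""
    if not headroom.query_success:
        # Without a reliable reading, fall back to the CPU.
        return "cpu"
    return "cuda" if headroom.available_mb >= required_mb else "cpu"


low = Headroom(total_mb=16384.0, used_mb=15000.0, available_mb=1000.0, query_success=True)
device = choose_device(low)
```

Treating a failed VRAM query as "no headroom" is a conservative design choice for this sketch; the real sidecar may handle that case differently.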
@@ -0,0 +1,81 @@
# File: tests/integration/ocr-sidecar/test_parameter_governance.py
# Change Log:
# - 2026-06-20: Initial creation for US3 parameter governance integration tests.

import sys
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch

from fastapi.testclient import TestClient

UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))

from test_path_traversal import FakeDocument, load_app


class FakeAsyncResponse:
    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": "{\"natural_text\": \"governed result\"}"}}]}


class FakeAsyncClient:
    last_payload = None

    def __init__(self, *args, **kwargs) -> None:
        pass

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


def test_ocr_uses_governed_runtime_parameters(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    pdf_path = upload_base / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n")

    app_module = load_app(upload_base)
    client = TestClient(app_module.app)

    decision = SimpleNamespace(keep_alive_seconds=120, reason="headroom-sufficient", vram_headroom_mb=9000.0)
    fake_client = FakeAsyncClient()
    FakeAsyncClient.last_payload = None

    with patch.object(app_module, "calculate_ocr_residency", return_value=decision), \
         patch.object(app_module, "prepare_ocr_messages", return_value=[{"content": []}]), \
         patch.object(app_module.fitz, "open", return_value=FakeDocument()), \
         patch.object(app_module, "ollama_client", fake_client):

        response = client.post(
            "/ocr",
            json={
                "pdfPath": str(pdf_path),
                "engine": "np-dms-ocr",
                "runtime_params": {
                    "temperature": 0.7,
                    "top_p": 0.9,
                    "repeat_penalty": 1.1,
                    "max_tokens": 4096
                }
            },
            headers={"X-API-Key": "test-key"}
        )

    assert response.status_code == 200
    assert response.json()["text"] == "governed result"

    # Check that parameters were passed to Ollama payload
    # (the request field repeat_penalty is asserted under the payload key repetition_penalty)
    assert FakeAsyncClient.last_payload["temperature"] == 0.7
    assert FakeAsyncClient.last_payload["top_p"] == 0.9
    assert FakeAsyncClient.last_payload["repetition_penalty"] == 1.1
    assert FakeAsyncClient.last_payload["max_tokens"] == 4096
@@ -0,0 +1,42 @@
# File: tests/unit/ocr-sidecar/test_api_key_validation.py
# Change Log:
# - 2026-06-20: Added ADR-040 API key startup and request validation tests.

import importlib
import os
import sys
from pathlib import Path

import pytest
from fastapi.testclient import TestClient

from test_path_traversal import SIDECAR_DIR, install_import_stubs, load_app


def test_sidecar_fails_fast_when_api_key_missing(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
    install_import_stubs()
    monkeypatch.delenv("OCR_SIDECAR_API_KEY", raising=False)
    monkeypatch.setenv("OCR_SIDECAR_UPLOAD_BASE", str(tmp_path))
    if str(SIDECAR_DIR) not in sys.path:
        sys.path.insert(0, str(SIDECAR_DIR))
    sys.modules.pop("app", None)
    with pytest.raises(RuntimeError, match="OCR_SIDECAR_API_KEY is required"):
        importlib.import_module("app")


def test_sidecar_rejects_invalid_api_key(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    response = client.post(
        "/embed",
        json={"text": "hello"},
        headers={"X-API-Key": "wrong-key"},
    )
    assert response.status_code == 401


def test_sidecar_rejects_missing_api_key(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    response = client.post("/embed", json={"text": "hello"})
    assert response.status_code == 401
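The three tests above pin down two behaviours: the module refuses to import without `OCR_SIDECAR_API_KEY`, and requests with a wrong or missing `X-API-Key` header get a 401. A framework-free sketch of both checks might look like this. The helper names `load_api_key` and `is_authorized` are hypothetical, and the use of a constant-time comparison is an assumption about good practice, not a statement about the sidecar's implementation.

```python
import hmac
from typing import Mapping, Optional


def load_api_key(env: Mapping[str, str]) -> str:
    """Fail fast at startup when the key is absent or blank."""
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OCR_SIDECAR_API_KEY is required")
    return key


def is_authorized(header_value: Optional[str], expected: str) -> bool:
    """Compare the X-API-Key header against the configured key in constant time."""
    if header_value is None:
        return False
    return hmac.compare_digest(header_value.encode(), expected.encode())


expected_key = load_api_key({"OCR_SIDECAR_API_KEY": "test-key"})
```

A request handler would map a `False` result from `is_authorized` to an HTTP 401, which is the status the tests expect.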
@@ -0,0 +1,114 @@
# File: tests/unit/ocr-sidecar/test_path_traversal.py
# Change Log:
# - 2026-06-20: Added ADR-040 path traversal tests for OCR sidecar.

import importlib
import os
import sys
import types
from pathlib import Path
from unittest.mock import patch

from fastapi.testclient import TestClient

SIDECAR_DIR = Path(__file__).resolve().parents[3] / "specs" / "04-Infrastructure-OPS" / "04-00-docker-compose" / "Desk-5439" / "ocr-sidecar"


def install_import_stubs() -> None:
    """Install stubs for heavy dependencies so unit tests can import app quickly."""
    fitz_module = types.ModuleType("fitz")
    fitz_module.Document = object
    fitz_module.open = lambda *args, **kwargs: None
    sys.modules["fitz"] = fitz_module
    typhoon_module = types.ModuleType("typhoon_ocr")
    typhoon_module.prepare_ocr_messages = lambda *args, **kwargs: [{"content": []}]
    sys.modules["typhoon_ocr"] = typhoon_module
    flag_module = types.ModuleType("FlagEmbedding")
    flag_module.BGEM3FlagModel = lambda *args, **kwargs: None
    flag_module.FlagReranker = lambda *args, **kwargs: None
    sys.modules["FlagEmbedding"] = flag_module
    pil_module = types.ModuleType("PIL")
    pil_image_module = types.ModuleType("PIL.Image")
    pil_module.Image = pil_image_module
    sys.modules["PIL"] = pil_module
    sys.modules["PIL.Image"] = pil_image_module
    pythainlp_module = types.ModuleType("pythainlp")
    tokenize_module = types.ModuleType("pythainlp.tokenize")
    tokenize_module.word_tokenize = lambda text, **kwargs: text.split()
    util_module = types.ModuleType("pythainlp.util")
    util_module.normalize = lambda text: text
    sys.modules["pythainlp"] = pythainlp_module
    sys.modules["pythainlp.tokenize"] = tokenize_module
    sys.modules["pythainlp.util"] = util_module


def load_app(upload_base: Path):
    install_import_stubs()
    os.environ["OCR_SIDECAR_API_KEY"] = "test-key"
    os.environ["OCR_SIDECAR_UPLOAD_BASE"] = str(upload_base)
    if str(SIDECAR_DIR) not in sys.path:
        sys.path.insert(0, str(SIDECAR_DIR))
    sys.modules.pop("app", None)
    return importlib.import_module("app")


class FakePage:
    def get_text(self) -> str:
        return "A" * 120


class FakeDocument:
    name = "fake.pdf"

    def __len__(self) -> int:
        return 1

    def __getitem__(self, index: int) -> FakePage:
        return FakePage()


def test_ocr_rejects_parent_traversal_outside_upload_base(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    outside_path = upload_base / ".." / "outside.pdf"
    response = client.post(
        "/ocr",
        json={"pdfPath": str(outside_path)},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 403


def test_ocr_rejects_prefix_sibling_path(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    sibling = tmp_path / "uploads_evil"
    upload_base.mkdir()
    sibling.mkdir()
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    response = client.post(
        "/ocr",
        json={"pdfPath": str(sibling / "document.pdf")},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 403


def test_ocr_accepts_canonical_path_inside_upload_base(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    pdf_path = upload_base / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n")
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    with patch.object(app_module.fitz, "open", return_value=FakeDocument()):
        response = client.post(
            "/ocr",
            json={"pdfPath": str(pdf_path)},
            headers={"X-API-Key": "test-key"},
        )
    assert response.status_code == 200
    assert response.json()["engineUsed"] == "fast-path"
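The traversal tests above cover two classic failure modes: `..` segments that escape the base, and a sibling directory whose name merely starts with the base name (`uploads_evil`), which defeats a naive string-prefix check. One way to satisfy both is to resolve the path first and compare path components rather than strings. The sketch below is an illustration under assumptions: `is_inside_upload_base` is a hypothetical helper, and the `/srv/uploads` base is made up for the example.

```python
from pathlib import Path


def is_inside_upload_base(candidate: str, upload_base: Path) -> bool:
    """Return True only if the canonical candidate path lies strictly below the base."""
    base = upload_base.resolve()
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    # Comparing against parents works on whole path components, so
    # "/srv/uploads_evil/x.pdf" does not match the base "/srv/uploads".
    return base in resolved.parents


base = Path("/srv/uploads")  # hypothetical upload base
inside = is_inside_upload_base("/srv/uploads/document.pdf", base)
escaped = is_inside_upload_base("/srv/uploads/../outside.pdf", base)
sibling = is_inside_upload_base("/srv/uploads_evil/document.pdf", base)
```

A handler would return HTTP 403 when the helper yields `False`, matching the status asserted in the tests.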
@@ -0,0 +1,81 @@
# File: tests/unit/ocr-sidecar/test_residency_wiring.py
# Change Log:
# - 2026-06-20: Added ADR-040 residency wiring tests for process_ocr.
# - 2026-06-20: Updated for async process_ocr (Phase 6: async I/O refactor).

import asyncio
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import MagicMock, patch

import pytest

from test_path_traversal import load_app


class FakeAsyncResponse:
    """Fake httpx.AsyncClient response for async process_ocr."""

    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": "{\"natural_text\": \"ok\"}"}}]}


class FakeAsyncClient:
    """Fake httpx.AsyncClient for async process_ocr."""

    def __init__(self, *args, **kwargs) -> None:
        self.payload = None
        FakeAsyncClient.last_payload = None

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        self.payload = json
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


FakeAsyncClient.last_payload = None


def test_process_ocr_uses_calculated_residency_keep_alive(tmp_path: Path) -> None:
    """T019: process_ocr must call calculate_ocr_residency and use the computed keep_alive value."""
    app_module = load_app(tmp_path)
    decision = SimpleNamespace(keep_alive_seconds=120, reason="headroom-sufficient", vram_headroom_mb=9000.0)
    fake_client = FakeAsyncClient()
    with patch.object(app_module, "calculate_ocr_residency", return_value=decision) as calculate, \
         patch.object(app_module, "prepare_ocr_messages", return_value=[{"content": []}]), \
         patch.object(app_module, "ollama_client", fake_client):
        result = asyncio.run(app_module.process_ocr("/tmp/test.pdf", page_num=1))
    assert result == "ok"
    calculate.assert_called_once_with(app_module.OCR_ACTIVE_PROFILE)
    assert FakeAsyncClient.last_payload["keep_alive"] == 120


def test_process_ocr_rejects_backend_keep_alive_override(tmp_path: Path) -> None:
    """T021: process_ocr must reject a keep_alive supplied by the backend."""
    app_module = load_app(tmp_path)

    async def run_test():
        with pytest.raises(ValueError, match="keep_alive must be calculated"):
            await app_module.process_ocr("/tmp/test.pdf", options_override={"keep_alive": 0})

    asyncio.run(run_test())


def test_ocr_endpoint_rejects_keep_alive_override(tmp_path: Path) -> None:
    """T021: the /ocr endpoint must reject keep_alive in the request body."""
    app_module = load_app(tmp_path)
    from fastapi.testclient import TestClient

    client = TestClient(app_module.app)
    response = client.post(
        "/ocr",
        json={"pdfPath": str(tmp_path / "document.pdf"), "keep_alive": 0},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 400
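The residency tests above require that `keep_alive` always comes from the sidecar's own calculation and that any caller-supplied value is rejected. A minimal sketch of that rule follows. It is an assumption, not the sidecar's code: `build_ocr_payload` is a hypothetical helper, and the error text after "keep_alive must be calculated" is invented wording, since the tests only match that prefix.

```python
from typing import Any, Optional


def build_ocr_payload(
    keep_alive_seconds: int,
    options_override: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Merge caller overrides into the payload, but never accept keep_alive from them."""
    overrides = dict(options_override or {})
    if "keep_alive" in overrides:
        # Hypothetical wording; only the prefix is grounded in the tests above.
        raise ValueError("keep_alive must be calculated by the sidecar")
    payload: dict[str, Any] = {"keep_alive": keep_alive_seconds}
    payload.update(overrides)
    return payload


payload = build_ocr_payload(120, {"temperature": 0.7})
```

Checking for the key rather than its value means that even `keep_alive: 0`, which would otherwise unload the model immediately, is refused.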