refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
CI / CD Pipeline / build (push) Successful in 7m37s
CI / CD Pipeline / deploy (push) Failing after 20m15s

2026-06-20 16:37:04 +07:00
parent d418d791a4
commit a80ebef285
70 changed files with 5762 additions and 452 deletions
+1
@@ -518,6 +518,7 @@ graph TB
 - **np-dms-ai** - Main LLM for classification, tagging, extraction, RAG answers
 - **np-dms-ocr** - OCR model through the sidecar, with adaptive residency from ADR-033
 - **BGE-M3 + BGE Reranker** - Retrieval stack served by the OCR sidecar
+- **OCR Sidecar Phase 1 hardening** - ADR-040 keeps X-API-Key before ADR-041 cutover, enforces upload-base path canonicalization, and verifies adaptive residency/CPU fallback with `tests/unit/ocr-sidecar/` plus `tests/integration/ocr-sidecar/`.
 ---
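The README only names ADR-040's "upload-base path canonicalization". As a rough sketch, assuming the goal is to reject any path that resolves outside the configured upload base, the check can be done on normalized path segments. The helper names `toSegments` and `canonicalizeUnderBase` are hypothetical, and only POSIX separators are handled.

```typescript
// Hypothetical sketch of upload-base path canonicalization (POSIX paths only).
// It resolves "." and ".." lexically; it does NOT follow symlinks.
function toSegments(path: string): string[] {
  const stack: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue; // collapse "//" and "./"
    if (seg === "..") {
      if (stack.length === 0) {
        throw new Error(`path climbs above the filesystem root: ${path}`);
      }
      stack.pop();
    } else {
      stack.push(seg);
    }
  }
  return stack;
}

function canonicalizeUnderBase(base: string, candidate: string): string {
  const baseSegs = toSegments(base);
  // Relative candidates are interpreted relative to the upload base.
  const joined = candidate.startsWith("/") ? candidate : `${base}/${candidate}`;
  const segs = toSegments(joined);
  // Compare whole segments so "/mnt/uploads-evil" is not accepted for base "/mnt/uploads".
  const inside =
    segs.length >= baseSegs.length && baseSegs.every((seg, i) => segs[i] === seg);
  if (!inside) {
    throw new Error(`path escapes the upload base: ${candidate}`);
  }
  return "/" + segs.join("/");
}
```

The segment-wise comparison is the key design choice: a plain `startsWith(base)` string check would accept sibling directories such as `/mnt/uploads-evil`. A production version would usually also resolve symlinks (for example with Node's `fs.realpath`) before comparing.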
+2
@@ -304,6 +304,8 @@ _Avoid_: Throw exception from tool, Untyped error
 - **"Master Data context parity (Gap 5)"** — resolved: Sandbox (`processSandboxExtract`/`processSandboxAiExtract`) currently skips the master data context when `projectPublicId='default'` → the prompt content differs from production. The Sandbox UI must have the admin specify a real `projectPublicId` (and `contractPublicId`); `aiPromptsService.resolveContext` must always be called with a real ID (never `'default'` as a way to skip); `aiPromptsService` returns an empty context if the project/contract has no master data
 - **"Apply Guardrails (Gap 6)"** — resolved: Apply to Production is a critical config change → it needs guardrails per AGENTS.md: (1) **Idempotency-Key** header mandatory for `POST /api/ai/profiles/:profileName/apply` (Redis dedupe for 5 minutes); (2) **CASL Guard** `@UseGuards(CaslGuard)` + permission `system.manage_ai`; (3) **Param Validation** with class-validator (`@Min(0) @Max(1)` for temperature/topP); (4) **Audit Trail** `ai_audit_logs` records `action='APPLY_PROFILE'`, user, old→new values; (5) **Range Guard** service layer throws `BusinessException` if out of range
 - **"Entity/Service canonicalModel mapping (Gap 7)"** — resolved: `AiExecutionProfileEntity` does not map the `canonical_model` column; `getProfileParameters` (`:125`) hardcodes `canonicalModel: 'np-dms-ai'` → add `@Column({ name: 'canonical_model' })` to the Entity; change `getProfileParameters` to read from the column instead of the hardcoded value; create an accessor `getModelDefaults(canonicalModel)` that queries by canonical_model directly
+- **"OCR Sidecar X-API-Key"** — resolved: use **Network Isolation Only** (ADR-040 D5), superseding ADR-033 §7; remove `X-API-Key` validation from the sidecar endpoints; enforce isolation through the Docker-internal network (post-consolidation) or VLAN/firewall ACL (interim cross-host); sequencing: remove `X-API-Key` only once the ADR-041 cutover is complete (single Docker host)
+- **"Cross-host trust gap of the OCR sidecar"** — resolved: use **Server Consolidation** (ADR-041), co-locating all services on a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB); sidecar+backend share the same Docker bridge → genuine Docker-internal isolation; QNAP remains the NAS (CIFS) for file storage
 ## ADRs related to the AI Runtime Layer
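Two of the Gap 6 guardrails, the 5-minute Idempotency-Key dedupe and the service-layer range guard, can be sketched in a few lines. This is a minimal in-memory illustration, not the project's code: a `Map` of expiry timestamps stands in for Redis (where the same claim would typically be one atomic `SET key value NX EX 300`), a plain `Error` stands in for `BusinessException`, and `ApplyGuard`/`claim`/`assertInRange` are hypothetical names.

```typescript
// Hypothetical sketch of two Gap 6 guardrails; names and structure are illustrative.
const DEDUPE_WINDOW_MS = 5 * 60 * 1000; // "Redis dedupe 5 minutes"

interface ApplyParams {
  temperature?: number;
  topP?: number;
}

class ApplyGuard {
  // In-memory stand-in for Redis: idempotency key -> expiry time (epoch ms).
  private readonly seen = new Map<string, number>();

  /** True the first time a key is seen inside the window, like SET key v NX EX 300. */
  claim(idempotencyKey: string, now: number = Date.now()): boolean {
    const expiresAt = this.seen.get(idempotencyKey);
    if (expiresAt !== undefined && expiresAt > now) {
      return false; // duplicate apply request within 5 minutes
    }
    this.seen.set(idempotencyKey, now + DEDUPE_WINDOW_MS);
    return true;
  }

  /** Service-layer range guard: temperature and topP must lie in [0, 1]. */
  assertInRange(params: ApplyParams): void {
    const checks: Array<[string, number | undefined]> = [
      ["temperature", params.temperature],
      ["topP", params.topP],
    ];
    for (const [name, value] of checks) {
      if (value === undefined) continue;
      if (!Number.isFinite(value) || value < 0 || value > 1) {
        // The real service throws BusinessException; a plain Error stands in here.
        throw new Error(`${name} out of range [0, 1]: ${value}`);
      }
    }
  }
}
```

The `Map` only deduplicates within one process; Redis `SET ... NX` is what makes the claim atomic across backend replicas. The range guard duplicates the class-validator `@Min(0) @Max(1)` check on purpose, so that callers bypassing the DTO are still covered.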
+4 -6
@@ -51,7 +51,7 @@ QDRANT_URL=http://localhost:6333
 # Ollama (Admin Desktop Desk-5439 — ADR-034 Thai-Optimized Model Stack)
 OLLAMA_MODEL_MAIN=typhoon2.5-np-dms:latest
-OLLAMA_MODEL_OCR=typhoon-np-dms-ocr:latest
+OLLAMA_MODEL_OCR=np-dms-ocr:latest
 OLLAMA_MODEL_EMBED=nomic-embed-text
 OLLAMA_EMBED_MODEL=nomic-embed-text
 OLLAMA_RAG_MODEL=typhoon2.5-np-dms:latest
@@ -67,12 +67,10 @@ AI_REALTIME_CONCURRENCY=2
 QDRANT_HOST=http://192.168.10.8:6333
 QDRANT_COLLECTION=lcbp3_documents
-# OCR sidecar (PaddleOCR on Desk-5439)
+# OCR sidecar (np-dms-ocr on Desk-5439)
 OCR_CHAR_THRESHOLD=100
-OCR_API_URL=http://192.168.10.8:8765
-OCR_SIDECAR_API_KEY=change-me-sidecar-api-key
-# Thai preprocessing microservice (PyThaiNLP — Admin Desktop)
-THAI_PREPROCESS_URL=http://192.168.10.8:8765
+OCR_API_URL=http://192.168.10.100:8765
 # ADR-023 forbids cloud AI fallback for project documents.
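The env file keeps `OCR_CHAR_THRESHOLD=100` next to the sidecar URL but does not say how the threshold is used. Assuming it is the minimum number of text-layer characters at or above which a PDF takes the fast path instead of being sent to `np-dms-ocr` (the name suggests this; the diff does not confirm it), a typed loader and routing decision could look like this. `loadOcrEnv` and `pickEngine` are hypothetical; the defaults mirror the values in the env file.

```typescript
// Hypothetical config loader; defaults mirror the values in the env file above.
interface OcrEnv {
  apiUrl: string;
  charThreshold: number;
  ocrModel: string;
}

function loadOcrEnv(env: Record<string, string | undefined>): OcrEnv {
  const raw = env.OCR_CHAR_THRESHOLD ?? "100";
  const charThreshold = Number(raw);
  if (!Number.isInteger(charThreshold) || charThreshold < 0) {
    throw new Error(`OCR_CHAR_THRESHOLD must be a non-negative integer, got "${raw}"`);
  }
  return {
    apiUrl: env.OCR_API_URL ?? "http://192.168.10.100:8765",
    charThreshold,
    ocrModel: env.OLLAMA_MODEL_OCR ?? "np-dms-ocr:latest",
  };
}

// Assumed semantics: enough extractable text means the PDF already has a usable text layer.
function pickEngine(textLayerChars: number, cfg: OcrEnv): "fast_path" | "np_dms_ocr" {
  return textLayerChars >= cfg.charThreshold ? "fast_path" : "np_dms_ocr";
}
```

Validating the threshold at load time turns a typo such as `OCR_CHAR_THRESHOLD=1OO` into a startup error rather than a silent `NaN` comparison that would route every document to OCR.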
+2 -2
@@ -134,7 +134,7 @@ export class AiQueueService {
 filePublicId?: string;
 pdfPath?: string;
 engineType?: string;
-typhoonOptions?: {
+ocrOptions?: {
 temperature?: number;
 topP?: number;
 repeatPenalty?: number;
@@ -154,7 +154,7 @@ export class AiQueueService {
 filePublicId: payload.filePublicId,
 pdfPath: payload.pdfPath,
 engineType: payload.engineType,
-typhoonOptions: payload.typhoonOptions,
+ocrOptions: payload.ocrOptions,
 contractPublicId: payload.contractPublicId,
 ...payload.extraPayload,
 },
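In the queued job data, `...payload.extraPayload` is spread after the named fields, so any key it carries (including `ocrOptions`) replaces the value set just before it. The sketch below reproduces that merge order with simplified, hypothetical types (`QueuePayload`, `buildJobData`) so the precedence is easy to check.

```typescript
// Minimal stand-in for the job-data construction in AiQueueService (types simplified).
interface OcrOptions {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

interface QueuePayload {
  pdfPath?: string;
  engineType?: string;
  ocrOptions?: OcrOptions;
  extraPayload?: Record<string, unknown>;
}

function buildJobData(payload: QueuePayload): Record<string, unknown> {
  return {
    pdfPath: payload.pdfPath,
    engineType: payload.engineType,
    ocrOptions: payload.ocrOptions,
    // Spread last: keys in extraPayload override the named fields above.
    ...payload.extraPayload,
  };
}
```

If typed fields should never be overridden by `extraPayload`, spreading it first would invert the precedence.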
+4 -9
@@ -567,7 +567,7 @@ export class AiController {
 },
 engineType: {
 type: 'string',
-enum: ['auto', 'tesseract', 'np-dms-ocr', 'typhoon-np-dms-ocr'],
+enum: ['auto', 'np-dms-ocr'],
 description: 'OCR engine to use (default: auto)',
 },
 temperature: {
@@ -607,19 +607,14 @@ export class AiController {
 const attachment = await this.fileStorageService.upload(file, user.user_id);
 const requestPublicId = uuidv7();
 // Validate engineType and normalize it to a valid value
-const validEngineTypes = [
-'auto',
-'tesseract',
-'np-dms-ocr',
-'typhoon-np-dms-ocr',
-] as const;
+const validEngineTypes = ['auto', 'np-dms-ocr'] as const;
 const resolvedEngineType: SandboxOcrEngineType = validEngineTypes.includes(
 engineType as SandboxOcrEngineType
 )
 ? (engineType as SandboxOcrEngineType)
 : 'auto';
 // Convert strings from the multipart form to numbers (optional override)
-const typhoonOptions = {
+const ocrOptions = {
 ...(temperature !== undefined && {
 temperature: parseFloat(temperature),
 }),
@@ -634,7 +629,7 @@ export class AiController {
 idempotencyKey: requestPublicId,
 pdfPath: attachment.filePath,
 engineType: resolvedEngineType,
-...(Object.keys(typhoonOptions).length > 0 && { typhoonOptions }),
+...(Object.keys(ocrOptions).length > 0 && { ocrOptions }),
 }
 );
 return { requestPublicId, jobId, status: 'queued' };
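The controller now narrows `engineType` to `'auto' | 'np-dms-ocr'` and falls back to `'auto'` for anything else, including the removed `'tesseract'` and `'typhoon-np-dms-ocr'` values. It then converts the optional multipart strings to numbers. Here is a standalone sketch of both steps. `normalizeEngineType` and `parseOcrOptions` are hypothetical names, the local `SandboxOcrEngineType` alias is an assumption (the real type lives in `sandbox-ocr-engine.service`), `topP`/`repeatPenalty` are assumed to follow the same pattern as `temperature`, and the `Number.isFinite` filter is an addition: the diff calls `parseFloat` without guarding against `NaN`.

```typescript
const VALID_ENGINE_TYPES = ["auto", "np-dms-ocr"] as const;
type SandboxOcrEngineType = (typeof VALID_ENGINE_TYPES)[number];

function normalizeEngineType(raw: string | undefined): SandboxOcrEngineType {
  // Legacy values such as "tesseract" or "typhoon-np-dms-ocr" fall back to "auto".
  return (VALID_ENGINE_TYPES as readonly string[]).includes(raw ?? "")
    ? (raw as SandboxOcrEngineType)
    : "auto";
}

interface OcrOptions {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

// Multipart form fields arrive as strings; only finite numbers are kept.
function parseOcrOptions(form: {
  temperature?: string;
  topP?: string;
  repeatPenalty?: string;
}): OcrOptions | undefined {
  const options: OcrOptions = {};
  const keys: Array<keyof OcrOptions> = ["temperature", "topP", "repeatPenalty"];
  for (const key of keys) {
    const raw = form[key];
    if (raw === undefined) continue;
    const value = parseFloat(raw);
    if (Number.isFinite(value)) options[key] = value;
  }
  // Mirrors the controller: omit ocrOptions entirely when nothing was supplied.
  return Object.keys(options).length > 0 ? options : undefined;
}
```

Falling back to `'auto'` instead of rejecting the request keeps older sandbox clients that still send `'tesseract'` working. The cost is that they silently get automatic selection.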
+12 -12
@@ -8,7 +8,7 @@
 // - 2026-05-22: Import and register CleanupTempFilesWorker (T016) to delete expired temporary attachments
 // - 2026-05-23: Register MigrationProgress + AiMigrationCheckpointService (ADR-023A)
 // - 2026-05-25: Register AiAvailableModel for AI Model Management (ADR-027).
-// - 2026-05-30: Register VramMonitorService, OcrCacheService, TyphoonOcrProcessor, TyphoonLlmProcessor (ADR-032).
+// - 2026-05-30: Register VramMonitorService, OcrCacheService, NpDmsOcrProcessor, NpDmsAiProcessor (ADR-032).
 // - 2026-06-13: Register AiSandboxProfile for ADR-036 sandbox-production parity
 // Module for the AI Gateway: registers Services and Controllers (ADR-023)
@@ -75,13 +75,13 @@ import {
 QUEUE_AI_VECTOR_DELETION,
 } from '../common/constants/queue.constants';
 import {
-TyphoonOcrProcessor,
-QUEUE_TYPHOON_OCR,
-} from './processors/typhoon-ocr.processor';
+NpDmsOcrProcessor,
+QUEUE_NP_DMS_OCR,
+} from './processors/np-dms-ocr-processor';
 import {
-TyphoonLlmProcessor,
-QUEUE_TYPHOON_LLM,
-} from './processors/typhoon-llm.processor';
+NpDmsAiProcessor,
+QUEUE_NP_DMS_AI,
+} from './processors/np-dms-ai.processor';
 @Module({
 imports: [
@@ -129,7 +129,7 @@ import {
 { name: QUEUE_AI_VECTOR_DELETION },
 // Typhoon OCR + LLM queues: concurrency=1 to prevent VRAM overflow (ADR-032)
 {
-name: QUEUE_TYPHOON_OCR,
+name: QUEUE_NP_DMS_OCR,
 defaultJobOptions: {
 attempts: 2,
 backoff: { type: 'exponential', delay: 5000 },
@@ -138,7 +138,7 @@ import {
 },
 },
 {
-name: QUEUE_TYPHOON_LLM,
+name: QUEUE_NP_DMS_AI,
 defaultJobOptions: {
 attempts: 2,
 backoff: { type: 'exponential', delay: 5000 },
@@ -198,9 +198,9 @@ import {
 AiRagProcessor,
 // Phase 5: Vector Deletion async processor (ADR-023 FR-008)
 AiVectorDeletionProcessor,
-// ADR-032: Typhoon OCR + LLM sequential processors (concurrency=1)
-TyphoonOcrProcessor,
-TyphoonLlmProcessor,
+// ADR-032: np-dms-ocr + np-dms-ai sequential processors (concurrency=1)
+NpDmsOcrProcessor,
+NpDmsAiProcessor,
 // US4: Execution Profiles Service (T044)
 AiExecutionProfilesService,
 // RbacGuard requires UserService from UserModule
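Both renamed queues keep `attempts: 2` with `backoff: { type: 'exponential', delay: 5000 }`. BullMQ documents `attempts` as the total number of tries and exponential backoff as waiting `2^(attemptsMade - 1) * delay` ms. Together these mean exactly one retry, 5 s after the first failure. The small sketch below computes that schedule. It ignores BullMQ's optional jitter, and `retrySchedule` is a hypothetical helper.

```typescript
// Sketch of the retry delays implied by { attempts: 2, backoff: { type: 'exponential', delay: 5000 } },
// using BullMQ's documented formula delay * 2^(attemptsMade - 1) and ignoring jitter.
function retrySchedule(attempts: number, delayMs: number): number[] {
  const delays: number[] = [];
  // attempts counts the first try, so there are (attempts - 1) retries.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(delayMs * Math.pow(2, attemptsMade - 1));
  }
  return delays;
}
```

The queue names themselves also changed (`'typhoon-ocr'` to `'np-dms-ocr'`, `'typhoon-llm'` to `'np-dms-ai'`). Jobs still waiting under the old names will therefore not be picked up by the renamed processors, and draining the old queues before deploying avoids stranding them.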
+1 -1
@@ -80,7 +80,7 @@ describe('AiService', () => {
 const mockOllamaService = {
 getMainModelName: jest.fn().mockReturnValue('typhoon2.5-np-dms:latest'),
-getOcrModelName: jest.fn().mockReturnValue('typhoon-np-dms-ocr:latest'),
+getOcrModelName: jest.fn().mockReturnValue('np-dms-ocr:latest'),
 checkHealth: jest.fn().mockResolvedValue({
 status: 'HEALTHY',
 latencyMs: 120,
@@ -41,7 +41,7 @@ export class AiAuditLog extends UuidBaseEntity {
 @Column({ name: 'model_name', type: 'varchar', length: 100, nullable: true })
 modelName?: string;
-// OCR/LLM model type used, e.g. tesseract, typhoon-ocr-3b, typhoon2.1-gemma3-4b (ADR-032)
+// OCR/LLM model type used, e.g. fast-path, np-dms-ocr, np-dms-ai (ADR-032)
 @Index('idx_ai_audit_model_type')
 @Column({ name: 'model_type', type: 'varchar', length: 50, nullable: true })
 modelType?: string;
@@ -1,12 +1,13 @@
 // File: src/modules/ai/entities/ocr-engine-configuration.entity.ts
 // Change Log
 // - 2026-05-30: Create the OcrEngineConfiguration class to hold OCR Engine configuration (T010, US1)
+// - 2026-06-20: Rename TESSERACT → FAST_PATH, TYPHOON_OCR → NP_DMS_OCR as part of the legacy-reference cleanup
 import { ApiProperty } from '@nestjs/swagger';
 export enum OcrEngineType {
-TESSERACT = 'tesseract',
-TYPHOON_OCR = 'typhoon_ocr',
+FAST_PATH = 'fast_path',
+NP_DMS_OCR = 'np_dms_ocr',
 }
 /** Class holding OCR Engine configuration (not bound to a SQL table, per data-model.md) */
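Renaming the enum values (`'tesseract'` to `'fast_path'`, `'typhoon_ocr'` to `'np_dms_ocr'`) changes the serialized strings. Any configuration or cache entry written with the old values would no longer match `OcrEngineType`. If such values can still exist (the diff does not show whether they are persisted anywhere), a tolerant parser could bridge them during the transition. `parseOcrEngineType` and the legacy map are hypothetical.

```typescript
enum OcrEngineType {
  FAST_PATH = "fast_path",
  NP_DMS_OCR = "np_dms_ocr",
}

const CURRENT_ENGINE_VALUES: string[] = [OcrEngineType.FAST_PATH, OcrEngineType.NP_DMS_OCR];

// Hypothetical transition map from pre-2026-06-20 values to the renamed members.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const LEGACY_ENGINE_VALUES = new Map<string, OcrEngineType>([
  ["tesseract", OcrEngineType.FAST_PATH],
  ["typhoon_ocr", OcrEngineType.NP_DMS_OCR],
]);

function parseOcrEngineType(raw: string): OcrEngineType | undefined {
  if (CURRENT_ENGINE_VALUES.includes(raw)) return raw as OcrEngineType;
  return LEGACY_ENGINE_VALUES.get(raw); // undefined for unknown values
}
```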
@@ -738,7 +738,7 @@ describe('AiBatchProcessor', () => {
 expect(ocrService.detectAndExtract).toHaveBeenCalledWith({
 pdfPath: '/files/test.pdf',
 activeProfile: 'quality',
-typhoonOptions: {
+ocrOptions: {
 temperature: 0.15,
 topP: 0.65,
 repeatPenalty: 1.15,
@@ -34,7 +34,7 @@ import { OcrService } from '../services/ocr.service';
 import {
 SandboxOcrEngineService,
 SandboxOcrEngineType,
-OcrTyphoonOptions,
+OcrNpDmsOptions,
 } from '../services/sandbox-ocr-engine.service';
 import {
 OllamaService,
@@ -562,7 +562,7 @@ export class AiBatchProcessor extends WorkerHost {
 })
 );
 try {
-let ocrParams: OcrTyphoonOptions | undefined = undefined;
+let ocrParams: OcrNpDmsOptions | undefined = undefined;
 if (engineType === 'np-dms-ocr') {
 try {
 const ocrDraft =
@@ -705,7 +705,7 @@ export class AiBatchProcessor extends WorkerHost {
 const { idempotencyKey, payload } = data;
 const pdfPath = payload.pdfPath as string;
 const engineType = (payload.engineType as SandboxOcrEngineType) || 'auto';
-const typhoonOptions = payload.typhoonOptions as
+const ocrOptions = payload.ocrOptions as
 | { temperature?: number; topP?: number; repeatPenalty?: number }
 | undefined;
@@ -722,7 +722,7 @@ export class AiBatchProcessor extends WorkerHost {
 })
 );
-let ocrParams = typhoonOptions;
+let ocrParams = ocrOptions;
 if (!ocrParams && engineType === 'np-dms-ocr') {
 try {
 const ocrDraft =
@@ -1078,7 +1078,7 @@ export class AiBatchProcessor extends WorkerHost {
 ocrResult = await this.ocrService.detectAndExtract({
 pdfPath: attachment.filePath,
 activeProfile: job.data.effectiveProfile,
-typhoonOptions: job.data.ocrSnapshotParams,
+ocrOptions: job.data.ocrSnapshotParams,
 });
 } catch (err: unknown) {
 const errMsg = err instanceof Error ? err.message : String(err);
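In the sandbox path, `ocrParams` starts as the request's `ocrOptions`. Only when those are absent and the engine is `'np-dms-ocr'` does the processor look up an OCR draft profile. The lookup and its `catch` branch are cut off in the hunk, so the sketch below assumes that a failed lookup falls back to the Modelfile defaults (`undefined`). `resolveOcrParams` and the injected `DraftLoader` are hypothetical.

```typescript
interface OcrNpDmsOptions {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

type EngineType = "auto" | "np-dms-ocr";

// Hypothetical loader for the sandbox OCR draft profile (the real call is truncated in the diff).
type DraftLoader = () => Promise<OcrNpDmsOptions | undefined>;

async function resolveOcrParams(
  explicit: OcrNpDmsOptions | undefined,
  engineType: EngineType,
  loadDraft: DraftLoader
): Promise<OcrNpDmsOptions | undefined> {
  // 1. Options sent with the request always take precedence.
  if (explicit) return explicit;
  // 2. Only the np-dms-ocr engine consults the draft profile.
  if (engineType !== "np-dms-ocr") return undefined;
  try {
    return await loadDraft();
  } catch {
    // Assumed: a failed lookup falls back to the Modelfile defaults.
    return undefined;
  }
}
```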
@@ -1,8 +1,9 @@
-// File: src/modules/ai/processors/typhoon-llm.processor.ts
+// File: backend/src/modules/ai/processors/np-dms-ai.processor.ts
 // Change Log
-// - 2026-05-30: Initial processor for Typhoon LLM sequential jobs (T009d, ADR-032)
+// - 2026-05-30: Initial processor for np-dms-ai sequential jobs (T009d, ADR-032)
 // Runs with concurrency=1 to prevent VRAM overflow on the RTX 2060 Super (8GB)
 // Uses keep_alive=0 via the Ollama API to unload the model after processing
+// - 2026-06-20: Renamed from typhoon-llm.processor.ts to np-dms-ai.processor.ts
 import { Processor, WorkerHost } from '@nestjs/bullmq';
 import { Logger } from '@nestjs/common';
@@ -16,14 +17,14 @@ import axios from 'axios';
 import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity';
 import { VramMonitorService } from '../services/vram-monitor.service';
-/** Queue name for Typhoon LLM jobs */
-export const QUEUE_TYPHOON_LLM = 'typhoon-llm';
-/** Job data shape in the Typhoon LLM queue */
-export interface TyphoonLlmJobData {
-/** Prompt to send to the Typhoon LLM */
+/** Queue name for np-dms-ai LLM jobs */
+export const QUEUE_NP_DMS_AI = 'np-dms-ai';
+/** Job data shape in the np-dms-ai LLM queue */
+export interface NpDmsAiJobData {
+/** Prompt to send to the np-dms-ai LLM */
 prompt: string;
-/** Model name, e.g. scb10x/typhoon2.1-gemma3-4b */
+/** Model name, e.g. typhoon2.5-np-dms:latest */
 model?: string;
 /** idempotencyKey for the Redis result key */
 idempotencyKey: string;
@@ -39,19 +40,19 @@ interface OllamaGenerateResponse {
 done: boolean;
 }
-// VRAM required by Typhoon 2.1 Gemma3 4B (MB), per ADR-032
-const TYPHOON_LLM_REQUIRED_VRAM_MB = 4500;
+// VRAM required by np-dms-ai (MB), per ADR-032
+const NP_DMS_AI_REQUIRED_VRAM_MB = 4500;
 // 120-second timeout for LLM generation
-const TYPHOON_LLM_TIMEOUT_MS = 120000;
+const NP_DMS_AI_TIMEOUT_MS = 120000;
 /**
-* Processor that runs Typhoon LLM jobs sequentially (concurrency=1)
+* Processor that runs np-dms-ai LLM jobs sequentially (concurrency=1)
 * to prevent VRAM overflow when running the LLM on the RTX 2060 Super
 * ADR-032: lockDuration=180000ms covers the 120s timeout plus a buffer
 */
-@Processor(QUEUE_TYPHOON_LLM, { concurrency: 1, lockDuration: 180000 })
-export class TyphoonLlmProcessor extends WorkerHost {
-private readonly logger = new Logger(TyphoonLlmProcessor.name);
+@Processor(QUEUE_NP_DMS_AI, { concurrency: 1, lockDuration: 180000 })
+export class NpDmsAiProcessor extends WorkerHost {
+private readonly logger = new Logger(NpDmsAiProcessor.name);
 private readonly ollamaUrl: string;
 private readonly defaultModel: string;
@@ -68,25 +69,25 @@ export class TyphoonLlmProcessor extends WorkerHost {
 this.configService.get<string>('AI_HOST_URL', 'http://localhost:11434')
 );
 this.defaultModel = this.configService.get<string>(
-'OLLAMA_MODEL_TYPHOON',
-'scb10x/typhoon2.1-gemma3-4b'
+'OLLAMA_MODEL_MAIN',
+'typhoon2.5-np-dms:latest'
 );
 }
-/** Process one Typhoon LLM job at a time */
-async process(job: Job<TyphoonLlmJobData>): Promise<void> {
+/** Process one np-dms-ai LLM job at a time */
+async process(job: Job<NpDmsAiJobData>): Promise<void> {
 const { prompt, model, idempotencyKey, documentPublicId } = job.data;
 const startTime = Date.now();
 const targetModel = model ?? this.defaultModel;
 this.logger.log(
-`Typhoon LLM job started — idempotencyKey=${idempotencyKey}, model=${targetModel}`
+`np-dms-ai LLM job started — idempotencyKey=${idempotencyKey}, model=${targetModel}`
 );
 // Check VRAM before loading the model
 const hasCapacity = await this.vramMonitorService.hasVramCapacity(
-TYPHOON_LLM_REQUIRED_VRAM_MB
+NP_DMS_AI_REQUIRED_VRAM_MB
 );
 if (!hasCapacity) {
-const errMsg = `Insufficient VRAM for ${targetModel} (requires ${TYPHOON_LLM_REQUIRED_VRAM_MB}MB) — retry later`;
+const errMsg = `Insufficient VRAM for ${targetModel} (requires ${NP_DMS_AI_REQUIRED_VRAM_MB}MB) — retry later`;
 this.logger.warn(errMsg);
 await this.saveResult(idempotencyKey, {
 status: 'failed',
@@ -117,7 +118,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
 },
 keep_alive: 0,
 },
-{ timeout: TYPHOON_LLM_TIMEOUT_MS }
+{ timeout: NP_DMS_AI_TIMEOUT_MS }
 );
 const processingTimeMs = Date.now() - startTime;
 const generatedText = response.data.response ?? '';
@@ -136,11 +137,11 @@ export class TyphoonLlmProcessor extends WorkerHost {
 processingTimeMs,
 });
 this.logger.log(
-`Typhoon LLM completed — ${generatedText.length} chars, ${processingTimeMs}ms`
+`np-dms-ai LLM completed — ${generatedText.length} chars, ${processingTimeMs}ms`
 );
 } catch (err: unknown) {
 const errMsg = err instanceof Error ? err.message : String(err);
-this.logger.error(`Typhoon LLM job failed: ${errMsg}`);
+this.logger.error(`np-dms-ai LLM job failed: ${errMsg}`);
 await this.saveResult(idempotencyKey, {
 status: 'failed',
 errorMessage: errMsg,
@@ -169,7 +170,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
 }
 ): Promise<void> {
 await this.redis.setex(
-`ai:typhoon:llm:${idempotencyKey}`,
+`ai:np-dms-ai:llm:${idempotencyKey}`,
 3600,
 JSON.stringify({
 idempotencyKey,
@@ -179,7 +180,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
 );
 }
-/** Write an audit log for a Typhoon LLM interaction */
+/** Write an audit log for an np-dms-ai LLM interaction */
 private async writeAuditLog(params: {
 documentPublicId?: string;
 model: string;
@@ -189,7 +190,7 @@ export class TyphoonLlmProcessor extends WorkerHost {
 }): Promise<void> {
 const log = this.auditLogRepo.create({
 documentPublicId: params.documentPublicId,
-aiModel: 'typhoon-llm',
+aiModel: 'np-dms-ai',
 modelName: params.model,
 modelType: 'llm',
 status: params.status,
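The processor posts to Ollama's `/api/generate` with `keep_alive: 0`. This asks Ollama to unload the model as soon as the response is returned, which frees VRAM for the next sequential job. The exact `options` it sends are cut off in the hunk. The sketch below assumes a non-streaming request (consistent with the single `OllamaGenerateResponse` carrying `response` and `done`). It shows how camelCase overrides like those in `ocrOptions` would map onto Ollama's `temperature`, `top_p`, and `repeat_penalty` option names. `buildGenerateBody` is hypothetical.

```typescript
interface SamplingOverrides {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
  options: Record<string, number>;
  keep_alive: number | string;
}

function buildGenerateBody(
  model: string,
  prompt: string,
  overrides: SamplingOverrides = {}
): OllamaGenerateRequest {
  const options: Record<string, number> = {};
  // Send only explicitly overridden parameters; everything else comes from the Modelfile.
  if (overrides.temperature !== undefined) options.temperature = overrides.temperature;
  if (overrides.topP !== undefined) options.top_p = overrides.topP;
  if (overrides.repeatPenalty !== undefined) options.repeat_penalty = overrides.repeatPenalty;
  return {
    model,
    prompt,
    stream: false, // one JSON response instead of a token stream
    options,
    keep_alive: 0, // unload the model right after this request completes
  };
}
```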
@@ -1,8 +1,9 @@
-// File: src/modules/ai/processors/typhoon-ocr.processor.ts
+// File: src/modules/ai/processors/np-dms-ocr-processor.ts
 // Change Log
 // - 2026-05-30: Initial processor for Typhoon OCR sequential jobs (T009c, ADR-032)
 // Runs with concurrency=1 to prevent VRAM overflow on the RTX 2060 Super (8GB)
 // Uses keep_alive=0 via the sidecar Ollama API to unload the model after processing
+// - 2026-06-20: Renamed file from typhoon-ocr.processor.ts → np-dms-ocr-processor.ts
 import { Processor, WorkerHost } from '@nestjs/bullmq';
 import { Logger } from '@nestjs/common';
@@ -17,24 +18,24 @@ import { VramMonitorService } from '../services/vram-monitor.service';
 import {
 SandboxOcrEngineService,
 SandboxOcrEngineType,
-OcrTyphoonOptions,
+OcrNpDmsOptions,
 } from '../services/sandbox-ocr-engine.service';
-/** Queue name for Typhoon OCR jobs */
-export const QUEUE_TYPHOON_OCR = 'typhoon-ocr';
-/** Job data shape in the Typhoon OCR queue */
-export interface TyphoonOcrJobData {
+/** Queue name for np-dms-ocr jobs */
+export const QUEUE_NP_DMS_OCR = 'np-dms-ocr';
+/** Job data shape in the np-dms-ocr queue */
+export interface NpDmsOcrJobData {
 /** Public path of the PDF file to OCR */
 pdfPath: string;
-/** engineType: 'typhoon-np-dms-ocr' for this queue */
+/** engineType: 'np-dms-ocr' for this queue */
 engineType: SandboxOcrEngineType;
 /** idempotencyKey for the Redis result key */
 idempotencyKey: string;
 /** documentPublicId for the audit log (optional) */
 documentPublicId?: string;
-/** Typhoon OCR options from the sandbox UI to override Modelfile defaults (optional) */
-typhoonOptions?: OcrTyphoonOptions;
+/** np-dms-ocr options from the sandbox UI to override Modelfile defaults (optional) */
+ocrOptions?: OcrNpDmsOptions;
 }
 // VRAM required by Typhoon OCR-3B (MB), per ADR-032
@@ -45,9 +46,9 @@ const TYPHOON_OCR_REQUIRED_VRAM_MB = 4000;
 * to prevent VRAM overflow when running OCR on the RTX 2060 Super
 * ADR-032: lockDuration=180000ms covers the 120s timeout plus a buffer
 */
-@Processor(QUEUE_TYPHOON_OCR, { concurrency: 1, lockDuration: 180000 })
-export class TyphoonOcrProcessor extends WorkerHost {
-private readonly logger = new Logger(TyphoonOcrProcessor.name);
+@Processor(QUEUE_NP_DMS_OCR, { concurrency: 1, lockDuration: 180000 })
+export class NpDmsOcrProcessor extends WorkerHost {
+private readonly logger = new Logger(NpDmsOcrProcessor.name);
 constructor(
 @InjectRedis() private readonly redis: Redis,
@@ -61,13 +62,13 @@ export class TyphoonOcrProcessor extends WorkerHost {
 }
 /** Process one Typhoon OCR job at a time */
-async process(job: Job<TyphoonOcrJobData>): Promise<void> {
+async process(job: Job<NpDmsOcrJobData>): Promise<void> {
 const {
 pdfPath,
 engineType,
 idempotencyKey,
 documentPublicId,
-typhoonOptions,
+ocrOptions,
 } = job.data;
 const startTime = Date.now();
 this.logger.log(
@@ -116,7 +117,7 @@ export class TyphoonOcrProcessor extends WorkerHost {
 const result = await this.sandboxOcrEngineService.detectAndExtract(
 pdfPath,
 engineType,
-typhoonOptions
+ocrOptions
 );
 const processingTimeMs = Date.now() - startTime;
 // Save the result in the Redis cache (24h TTL)
@@ -171,7 +172,7 @@ export class TyphoonOcrProcessor extends WorkerHost {
 }
 ): Promise<void> {
 await this.redis.setex(
-`ai:typhoon:ocr:${idempotencyKey}`,
+`ai:np-dms-ocr:${idempotencyKey}`,
 3600,
 JSON.stringify({
 idempotencyKey,
@@ -193,8 +194,8 @@ export class TyphoonOcrProcessor extends WorkerHost {
 }): Promise<void> {
 const log = this.auditLogRepo.create({
 documentPublicId: params.documentPublicId,
-aiModel: 'typhoon-ocr',
-modelName: 'typhoon-np-dms-ocr:latest',
+aiModel: 'np-dms-ocr',
+modelName: 'np-dms-ocr:latest',
 modelType: params.engineType,
 status: params.status,
 processingTimeMs: params.processingTimeMs,
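The Redis result keys change from `ai:typhoon:llm:<key>` and `ai:typhoon:ocr:<key>` to `ai:np-dms-ai:llm:<key>` and `ai:np-dms-ocr:<key>`. The two new patterns are not symmetric: the LLM key keeps a trailing `:llm` segment, while the OCR key drops `:ocr`. Results written under the old names shortly before a deploy live for 3600 s, and a reader that only knows the new keys cannot see them. If any reader polls these keys (not shown in this diff), it could try the new key first and the legacy key second. The sketch assumes a minimal `get`-only store interface, which an ioredis client's `get` satisfies. `readOcrResult` is hypothetical.

```typescript
// Minimal async key-value interface; an ioredis client's get() has a compatible shape.
interface KeyValueReader {
  get(key: string): Promise<string | null>;
}

const ocrResultKeys = (idempotencyKey: string): string[] => [
  `ai:np-dms-ocr:${idempotencyKey}`, // key written after this commit
  `ai:typhoon:ocr:${idempotencyKey}`, // legacy key, readable until its 3600 s TTL lapses
];

/** Returns the parsed result, or null when neither key exists. */
async function readOcrResult(store: KeyValueReader, idempotencyKey: string): Promise<unknown> {
  for (const key of ocrResultKeys(idempotencyKey)) {
    const raw = await store.get(key);
    if (raw !== null) return JSON.parse(raw);
  }
  return null;
}
```

Once an hour has passed after the deploy, every legacy key has expired, and the fallback entry can be removed.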
@@ -97,7 +97,7 @@ export class AiPolicyService {
 */
 getCanonicalModelName(modelName: string): 'np-dms-ai' | 'np-dms-ocr' {
 const name = modelName.toLowerCase();
-if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) {
+if (name.includes('ocr')) {
 return 'np-dms-ocr';
 }
 return 'np-dms-ai';
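Dropping `name.includes('typhoon-np-dms-ocr')` does not change behavior. Any name containing that string also contains `'ocr'`, so the first check already matched it. A standalone copy of the simplified mapping (only the surrounding class is omitted) makes this easy to verify. One consequence of the substring test is that any model name containing `ocr` is classified as OCR, so a future main-model tag with those letters would be misrouted.

```typescript
type CanonicalModel = "np-dms-ai" | "np-dms-ocr";

// Same logic as AiPolicyService.getCanonicalModelName after this commit.
function getCanonicalModelName(modelName: string): CanonicalModel {
  const name = modelName.toLowerCase();
  if (name.includes("ocr")) {
    return "np-dms-ocr";
  }
  return "np-dms-ai";
}
```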
+110 -67
View File
@@ -4,13 +4,13 @@
// - 2026-05-25: Wrapped the empty-message AggregateError from axios in an Error with explicit context. // - 2026-05-25: Wrapped the empty-message AggregateError from axios in an Error with explicit context.
// - 2026-05-25: Added path remapping (OCR_UPLOAD_BASE_PATH) to convert a local upload path into the path the sidecar sees over CIFS. // - 2026-05-25: Added path remapping (OCR_UPLOAD_BASE_PATH) to convert a local upload path into the path the sidecar sees over CIFS.
// - 2026-05-29: Added checkHealth() to report OCR sidecar health to getSystemHealth() (ADR-027) // - 2026-05-29: Added checkHealth() to report OCR sidecar health to getSystemHealth() (ADR-027)
// - 2026-05-30: Switched from PaddleOCR to Tesseract OCR for compatibility with older CPUs // - 2026-05-30: Switched from PaddleOCR to fast-path (PyMuPDF text layer) for compatibility with older CPUs
// - 2026-05-30: Added a VRAM insufficiency guard for the Typhoon OCR engine (T016a, ADR-032) // - 2026-05-30: Added a VRAM insufficiency guard for the Typhoon OCR engine (T016a, ADR-032)
// - 2026-05-30: Added dynamic OCR engine selection, caching, and graceful fallback (T013, T014, T016, T022, T023, US1) // - 2026-05-30: Added dynamic OCR engine selection, caching, and graceful fallback (T013, T014, T016, T022, T023, US1)
// - 2026-06-01: Improved remapPath to handle both Windows absolute paths and relative paths correctly // - 2026-06-01: Improved remapPath to handle both Windows absolute paths and relative paths correctly
// - 2026-06-01: Changed processWithTesseract/processWithTyphoon to upload the file content as multipart to /ocr-upload instead of sending a path // - 2026-06-01: Changed processWithFastPath/processWithNpDmsOcr to upload the file content as multipart to /ocr-upload instead of sending a path
// - 2026-06-02: Send X-API-Key in the request headers to ocr-sidecar to secure sidecar requests (ADR-033, Suggestion 2) // - 2026-06-02: Send X-API-Key in the request headers to ocr-sidecar to secure sidecar requests (ADR-033, Suggestion 2)
// - 2026-06-04: ADR-034 - changed TYPHOON_ENGINE.engineName to typhoon-np-dms-ocr:latest to match the model name in Ollama // - 2026-06-04: ADR-034 - changed TYPHOON_ENGINE.engineName to np-dms-ocr:latest to match the model name in Ollama
// - 2026-06-11: US2 - compute the OCR residency keep_alive dynamically from VRAM headroom and the active profile // - 2026-06-11: US2 - compute the OCR residency keep_alive dynamically from VRAM headroom and the active profile
// - 2026-06-13: US5 - send temperature, topP, and repeatPenalty to the OCR sidecar in the multipart form (T070) // - 2026-06-13: US5 - send temperature, topP, and repeatPenalty to the OCR sidecar in the multipart form (T070)
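Remapping against OCR_UPLOAD_BASE_PATH is only safe if the resolved path cannot escape the upload base. The containment check can be sketched lexically with Node's `path` module. This is a minimal sketch, not the project's `remapPath`: the helper name `resolveWithinUploadBase` is hypothetical, and it does not follow symlinks (a stricter version would run `fs.realpathSync` on both paths before comparing).

```typescript
import * as path from 'path';

/**
 * Hypothetical helper: resolve `candidate` against `uploadBase` and reject
 * any result that lies outside the base directory (lexical check only).
 */
function resolveWithinUploadBase(candidate: string, uploadBase: string): string {
  const base = path.resolve(uploadBase);
  // Relative candidates are interpreted relative to the upload base;
  // absolute candidates are normalized as-is (e.g. '..' segments collapsed).
  const resolved = path.resolve(base, candidate);
  const rel = path.relative(base, resolved);
  // A path outside the base yields '..' or a '../'-prefixed relative path;
  // on Windows a different drive yields an absolute `rel`.
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes upload base: ${candidate}`);
  }
  return resolved;
}
```

Checking `'..' + path.sep` rather than a bare `'..'` prefix keeps legitimate names such as `..draft.pdf` inside the base from being rejected.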
@@ -28,6 +28,9 @@ import {
} from '../entities/ocr-engine-configuration.entity'; } from '../entities/ocr-engine-configuration.entity';
import { OcrEngineResponseDto } from '../dto/ocr-engine-response.dto'; import { OcrEngineResponseDto } from '../dto/ocr-engine-response.dto';
import { SystemSetting } from '../entities/system-setting.entity'; import { SystemSetting } from '../entities/system-setting.entity';
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
import { AiPromptsService } from '../prompts/ai-prompts.service';
import { BusinessException } from '../../../common/exceptions';
import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity'; import { AiAuditLog, AiAuditStatus } from '../entities/ai-audit-log.entity';
import { OcrCacheService } from './ocr-cache.service'; import { OcrCacheService } from './ocr-cache.service';
import { VramMonitorService } from './vram-monitor.service'; import { VramMonitorService } from './vram-monitor.service';
@@ -41,7 +44,7 @@ export interface OcrDetectionInput {
pdfPath?: string; pdfPath?: string;
documentPublicId?: string; // added for audit logging documentPublicId?: string; // added for audit logging
activeProfile?: ExecutionProfile; activeProfile?: ExecutionProfile;
typhoonOptions?: { ocrOptions?: {
temperature?: number; temperature?: number;
topP?: number; topP?: number;
repeatPenalty?: number; repeatPenalty?: number;
@@ -68,16 +71,16 @@ const OCR_ACTIVE_ENGINE_KEY = 'OCR_ACTIVE_ENGINE';
const OCR_ACTIVE_ENGINE_CACHE_KEY = 'system_settings:OCR_ACTIVE_ENGINE'; const OCR_ACTIVE_ENGINE_CACHE_KEY = 'system_settings:OCR_ACTIVE_ENGINE';
const OCR_ACTIVE_ENGINE_TTL_SECONDS = 30; const OCR_ACTIVE_ENGINE_TTL_SECONDS = 30;
const TESSERACT_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def001'; const FAST_PATH_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def001';
const TYPHOON_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def002'; const OCR_ENGINE_ID = '019505a1-7c3e-7000-8000-abc123def002';
// VRAM required by Typhoon OCR-3B (MB) // VRAM required by np-dms-ocr (MB)
const TYPHOON_OCR_REQUIRED_VRAM_MB = 4000; const OCR_REQUIRED_VRAM_MB = 4000;
const TESSERACT_ENGINE: OcrEngineConfiguration = { const FAST_PATH_ENGINE: OcrEngineConfiguration = {
engineId: TESSERACT_ENGINE_ID, engineId: FAST_PATH_ENGINE_ID,
engineName: 'Tesseract OCR', engineName: 'Fast Path (PyMuPDF)',
engineType: OcrEngineType.TESSERACT, engineType: OcrEngineType.FAST_PATH,
isActive: true, isActive: true,
vramRequirementMB: 0, vramRequirementMB: 0,
processingTimeLimitSeconds: 30, processingTimeLimitSeconds: 30,
@@ -87,25 +90,25 @@ const TESSERACT_ENGINE: OcrEngineConfiguration = {
updatedAt: new Date('2026-05-30T00:00:00Z'), updatedAt: new Date('2026-05-30T00:00:00Z'),
}; };
const TYPHOON_ENGINE: OcrEngineConfiguration = { const OCR_ENGINE: OcrEngineConfiguration = {
engineId: TYPHOON_ENGINE_ID, engineId: OCR_ENGINE_ID,
engineName: 'typhoon-np-dms-ocr:latest', engineName: 'np-dms-ocr:latest',
engineType: OcrEngineType.TYPHOON_OCR, engineType: OcrEngineType.NP_DMS_OCR,
isActive: true, isActive: true,
vramRequirementMB: TYPHOON_OCR_REQUIRED_VRAM_MB, vramRequirementMB: OCR_REQUIRED_VRAM_MB,
processingTimeLimitSeconds: 60, processingTimeLimitSeconds: 60,
concurrentLimit: 1, concurrentLimit: 1,
fallbackEngineId: TESSERACT_ENGINE_ID, fallbackEngineId: FAST_PATH_ENGINE_ID,
createdAt: new Date('2026-05-30T00:00:00Z'), createdAt: new Date('2026-05-30T00:00:00Z'),
updatedAt: new Date('2026-05-30T00:00:00Z'), updatedAt: new Date('2026-05-30T00:00:00Z'),
}; };
const ENGINES_MAP = new Map<string, OcrEngineConfiguration>([ const ENGINES_MAP = new Map<string, OcrEngineConfiguration>([
[TESSERACT_ENGINE_ID, TESSERACT_ENGINE], [FAST_PATH_ENGINE_ID, FAST_PATH_ENGINE],
[TYPHOON_ENGINE_ID, TYPHOON_ENGINE], [OCR_ENGINE_ID, OCR_ENGINE],
]); ]);
/** Selects the fast path or the OCR sidecar (Tesseract/Typhoon), with engine switching and caching */ /** Selects the fast path or the OCR sidecar (np-dms-ocr), with engine switching and caching */
@Injectable() @Injectable()
export class OcrService { export class OcrService {
private readonly logger = new Logger(OcrService.name); private readonly logger = new Logger(OcrService.name);
@@ -121,6 +124,9 @@ export class OcrService {
private readonly settingRepo: Repository<SystemSetting>, private readonly settingRepo: Repository<SystemSetting>,
@InjectRepository(AiAuditLog) @InjectRepository(AiAuditLog)
private readonly auditLogRepo: Repository<AiAuditLog>, private readonly auditLogRepo: Repository<AiAuditLog>,
@InjectRepository(AiExecutionProfile)
private readonly profileRepo: Repository<AiExecutionProfile>,
private readonly aiPromptsService: AiPromptsService,
private readonly ocrCacheService: OcrCacheService, private readonly ocrCacheService: OcrCacheService,
private readonly vramMonitorService: VramMonitorService, private readonly vramMonitorService: VramMonitorService,
private readonly aiPolicyService: AiPolicyService, private readonly aiPolicyService: AiPolicyService,
@@ -131,10 +137,15 @@ export class OcrService {
'OCR_API_URL', 'OCR_API_URL',
'http://localhost:8765' 'http://localhost:8765'
); );
this.ocrSidecarApiKey = this.configService.get<string>( const ocrSidecarApiKey = this.configService.get<string>(
'OCR_SIDECAR_API_KEY', 'OCR_SIDECAR_API_KEY'
'lcbp3-dms-ocr-sidecar-secure-token-2026'
); );
if (!ocrSidecarApiKey) {
throw new Error(
'OCR_SIDECAR_API_KEY is required: set this environment variable'
);
}
this.ocrSidecarApiKey = ocrSidecarApiKey;
this.vramHeadroomThresholdMb = this.configService.get<number>( this.vramHeadroomThresholdMb = this.configService.get<number>(
'VRAM_HEADROOM_THRESHOLD_MB', 'VRAM_HEADROOM_THRESHOLD_MB',
this.configService.get<number>('AI_VRAM_HEADROOM_THRESHOLD_MB', 3000) this.configService.get<number>('AI_VRAM_HEADROOM_THRESHOLD_MB', 3000)
@@ -272,7 +283,7 @@ export class OcrService {
where: { settingKey: OCR_ACTIVE_ENGINE_KEY }, where: { settingKey: OCR_ACTIVE_ENGINE_KEY },
}); });
const activeEngine = setting?.settingValue ?? TESSERACT_ENGINE_ID; const activeEngine = setting?.settingValue ?? FAST_PATH_ENGINE_ID;
await this.redis.set( await this.redis.set(
OCR_ACTIVE_ENGINE_CACHE_KEY, OCR_ACTIVE_ENGINE_CACHE_KEY,
activeEngine, activeEngine,
@@ -284,7 +295,7 @@ export class OcrService {
this.logger.error( this.logger.error(
`Failed to get active OCR engine: ${error instanceof Error ? error.message : String(error)}` `Failed to get active OCR engine: ${error instanceof Error ? error.message : String(error)}`
); );
return TESSERACT_ENGINE_ID; return FAST_PATH_ENGINE_ID;
} }
} }
@@ -330,20 +341,20 @@ export class OcrService {
const activeEngineId = await this.getActiveEngineId(); const activeEngineId = await this.getActiveEngineId();
if (activeEngineId === TYPHOON_ENGINE_ID) { if (activeEngineId === OCR_ENGINE_ID) {
return this.processWithTyphoon(input); return this.processWithNpDmsOcr(input);
} else { } else {
return this.processWithTesseract(input); return this.processWithFastPath(input);
} }
} }
/** Process via Tesseract OCR, uploading the file content as multipart */ /** Process via Fast Path (PyMuPDF text layer), uploading the file content as multipart */
private async processWithTesseract( private async processWithFastPath(
input: OcrDetectionInput input: OcrDetectionInput
): Promise<OcrDetectionResult> { ): Promise<OcrDetectionResult> {
const startTime = Date.now(); const startTime = Date.now();
try { try {
this.logger.debug(`Tesseract OCR processing: ${input.pdfPath}`); this.logger.debug(`Fast Path processing: ${input.pdfPath}`);
const fileBuffer = fs.readFileSync(input.pdfPath!); const fileBuffer = fs.readFileSync(input.pdfPath!);
const form = new FormData(); const form = new FormData();
form.append( form.append(
@@ -364,9 +375,9 @@ export class OcrService {
const durationMs = Date.now() - startTime; const durationMs = Date.now() - startTime;
await this.writeAuditLog({ await this.writeAuditLog({
documentPublicId: input.documentPublicId, documentPublicId: input.documentPublicId,
aiModel: 'tesseract', aiModel: 'fast-path',
modelName: 'tesseract-ocr', modelName: 'pymupdf',
modelType: 'tesseract', modelType: 'fast-path',
status: AiAuditStatus.SUCCESS, status: AiAuditStatus.SUCCESS,
processingTimeMs: durationMs, processingTimeMs: durationMs,
cacheHit: false, cacheHit: false,
@@ -384,36 +395,70 @@ export class OcrService {
: String(err); : String(err);
await this.writeAuditLog({ await this.writeAuditLog({
documentPublicId: input.documentPublicId, documentPublicId: input.documentPublicId,
aiModel: 'tesseract', aiModel: 'fast-path',
modelName: 'tesseract-ocr', modelName: 'pymupdf',
modelType: 'tesseract', modelType: 'fast-path',
status: AiAuditStatus.FAILED, status: AiAuditStatus.FAILED,
processingTimeMs: durationMs, processingTimeMs: durationMs,
errorMessage: cause, errorMessage: cause,
cacheHit: false, cacheHit: false,
}); });
throw new Error(`Tesseract OCR Sidecar failed: ${cause}`); throw new Error(`Fast Path OCR Sidecar failed: ${cause}`);
} }
} }
/** Process via Typhoon OCR */ /** Process via np-dms-ocr (Ollama) */
private async processWithTyphoon( private async processWithNpDmsOcr(
input: OcrDetectionInput input: OcrDetectionInput
): Promise<OcrDetectionResult> { ): Promise<OcrDetectionResult> {
const startTime = Date.now(); const startTime = Date.now();
try { try {
const hasCapacity = await this.vramMonitorService.hasVramCapacity( const hasCapacity =
TYPHOON_OCR_REQUIRED_VRAM_MB await this.vramMonitorService.hasVramCapacity(OCR_REQUIRED_VRAM_MB);
);
if (!hasCapacity) { if (!hasCapacity) {
this.logger.warn( this.logger.warn(
`VRAM insufficient for Typhoon OCR. Falling back to Tesseract baseline.` `VRAM insufficient for np-dms-ocr. Falling back to fast-path.`
); );
return this.processWithTesseract(input); return this.processWithFastPath(input);
} }
const residency = await this.calculateOcrResidency(input.activeProfile); await this.calculateOcrResidency(input.activeProfile);
const keepAlive = residency.keepAliveSeconds;
this.logger.debug(`Typhoon OCR processing: ${input.pdfPath}`); // Resolve runtime parameters from DB (ocr-extract profile)
const profile = await this.profileRepo.findOne({
where: { profileName: 'ocr-extract' },
});
const runtimeParams = {
temperature: profile ? Number(profile.temperature) : 0.1,
top_p: profile ? Number(profile.topP) : 0.5,
repeat_penalty: profile ? Number(profile.repeatPenalty) : 1.0,
max_tokens: profile?.maxTokens ?? 16000,
};
// Override with input ocrOptions if provided
if (input.ocrOptions?.temperature !== undefined) {
runtimeParams.temperature = input.ocrOptions.temperature;
}
if (input.ocrOptions?.topP !== undefined) {
runtimeParams.top_p = input.ocrOptions.topP;
}
if (input.ocrOptions?.repeatPenalty !== undefined) {
runtimeParams.repeat_penalty = input.ocrOptions.repeatPenalty;
}
// Resolve Active Prompt from DB (ocr_extraction)
const activePrompt =
await this.aiPromptsService.getActive('ocr_extraction');
if (!activePrompt) {
throw new BusinessException(
'NO_ACTIVE_PROMPT',
'No active ocr_extraction prompt found',
'ไม่พบ Prompt OCR สำหรับดึงข้อมูลที่เปิดใช้งาน'
);
}
const systemPrompt = activePrompt.template;
const dmsTags = activePrompt.contextConfig?.dmsTags;
this.logger.debug(`np-dms-ocr processing: ${input.pdfPath}`);
const fileBuffer = fs.readFileSync(input.pdfPath!); const fileBuffer = fs.readFileSync(input.pdfPath!);
const form = new FormData(); const form = new FormData();
form.append( form.append(
@@ -421,20 +466,18 @@ export class OcrService {
new Blob([fileBuffer], { type: 'application/pdf' }), new Blob([fileBuffer], { type: 'application/pdf' }),
'upload.pdf' 'upload.pdf'
); );
form.append('engine', 'typhoon-np-dms-ocr'); form.append('engine', 'np-dms-ocr');
form.append('keep_alive', String(keepAlive)); form.append('systemPrompt', systemPrompt);
if (input.typhoonOptions?.temperature !== undefined) { if (dmsTags) {
form.append('temperature', String(input.typhoonOptions.temperature)); form.append('dmsTags', JSON.stringify(dmsTags));
}
if (input.typhoonOptions?.topP !== undefined) {
form.append('topP', String(input.typhoonOptions.topP));
}
if (input.typhoonOptions?.repeatPenalty !== undefined) {
form.append(
'repeatPenalty',
String(input.typhoonOptions.repeatPenalty)
);
} }
form.append('runtimeParams', JSON.stringify(runtimeParams));
// Append individual overrides for backward compatibility
form.append('temperature', String(runtimeParams.temperature));
form.append('topP', String(runtimeParams.top_p));
form.append('repeatPenalty', String(runtimeParams.repeat_penalty));
const response = await axios.post<OcrSidecarResponse>( const response = await axios.post<OcrSidecarResponse>(
`${this.ocrApiUrl}/ocr-upload`, `${this.ocrApiUrl}/ocr-upload`,
form, form,
@@ -447,9 +490,9 @@ export class OcrService {
const durationMs = Date.now() - startTime; const durationMs = Date.now() - startTime;
await this.writeAuditLog({ await this.writeAuditLog({
documentPublicId: input.documentPublicId, documentPublicId: input.documentPublicId,
aiModel: 'typhoon-ocr', aiModel: 'np-dms-ocr',
modelName: 'typhoon-np-dms-ocr:latest', modelName: 'np-dms-ocr:latest',
modelType: 'typhoon-ocr', modelType: 'np-dms-ocr',
status: AiAuditStatus.SUCCESS, status: AiAuditStatus.SUCCESS,
processingTimeMs: durationMs, processingTimeMs: durationMs,
cacheHit: false, cacheHit: false,
@@ -460,9 +503,9 @@ export class OcrService {
}; };
} catch (err: unknown) { } catch (err: unknown) {
this.logger.warn( this.logger.warn(
`Typhoon OCR failed, trying fallback baseline (Tesseract): ${err instanceof Error ? err.message : String(err)}` `np-dms-ocr failed, trying fallback to fast-path: ${err instanceof Error ? err.message : String(err)}`
); );
return this.processWithTesseract(input); return this.processWithFastPath(input);
} }
} }
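Both `processWithNpDmsOcr` and the sandbox service now build `runtimeParams` with the same inline logic: values from the `ocr-extract` profile (or hard-coded defaults), then per-request overrides. A hedged sketch of how that merge could be factored into one pure function follows; `resolveOcrRuntimeParams`, `toNumberOr`, and the `*Like` interfaces are hypothetical names, and the profile shape is inferred from the columns read above. One deliberate difference: a null column falls back to the default here, whereas `Number(null)` in the inline code yields `0`.

```typescript
/** Assumed shape of the `ocr-extract` profile columns read above. */
interface OcrProfileLike {
  temperature?: number | string | null;
  topP?: number | string | null;
  repeatPenalty?: number | string | null;
  maxTokens?: number | null;
}

/** Per-request overrides, mirroring `ocrOptions` / `OcrNpDmsOptions`. */
interface OcrOverridesLike {
  temperature?: number;
  topP?: number;
  repeatPenalty?: number;
}

interface OcrRuntimeParams {
  temperature: number;
  top_p: number;
  repeat_penalty: number;
  max_tokens: number;
}

// Decimal columns may arrive from the driver as strings; null or NaN falls back.
function toNumberOr(value: number | string | null | undefined, fallback: number): number {
  if (value === null || value === undefined) return fallback;
  const n = Number(value);
  return Number.isFinite(n) ? n : fallback;
}

function resolveOcrRuntimeParams(
  profile: OcrProfileLike | null,
  overrides?: OcrOverridesLike
): OcrRuntimeParams {
  // Same defaults as the inline code in both services.
  const params: OcrRuntimeParams = {
    temperature: toNumberOr(profile?.temperature, 0.1),
    top_p: toNumberOr(profile?.topP, 0.5),
    repeat_penalty: toNumberOr(profile?.repeatPenalty, 1.0),
    max_tokens: profile?.maxTokens ?? 16000,
  };
  // Explicit per-request values take precedence over the DB profile.
  if (overrides?.temperature !== undefined) params.temperature = overrides.temperature;
  if (overrides?.topP !== undefined) params.top_p = overrides.topP;
  if (overrides?.repeatPenalty !== undefined) params.repeat_penalty = overrides.repeatPenalty;
  return params;
}
```

Each service could then call `resolveOcrRuntimeParams(profile, input.ocrOptions)` (or `ocrOptions` in the sandbox), so the defaults live in one place.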
@@ -1,14 +1,17 @@
// File: src/modules/ai/services/sandbox-ocr-engine.service.spec.ts // File: src/modules/ai/services/sandbox-ocr-engine.service.spec.ts
// Change Log: // Change Log:
// - 2026-06-14: Created unit tests for SandboxOcrEngineService covering detectAndExtract for every engine // - 2026-06-14: Created unit tests for SandboxOcrEngineService covering detectAndExtract for every engine
// - 2026-06-20: Added a getRepositoryToken(AiExecutionProfile) mock for testing parameter governance
import { Test, TestingModule } from '@nestjs/testing'; import { Test, TestingModule } from '@nestjs/testing';
import { ConfigService } from '@nestjs/config'; import { ConfigService } from '@nestjs/config';
import { getRepositoryToken } from '@nestjs/typeorm';
import axios from 'axios'; import axios from 'axios';
import * as fs from 'fs'; import * as fs from 'fs';
import { SandboxOcrEngineService } from './sandbox-ocr-engine.service'; import { SandboxOcrEngineService } from './sandbox-ocr-engine.service';
import { OcrService } from './ocr.service'; import { OcrService } from './ocr.service';
import { AiPromptsService } from '../prompts/ai-prompts.service'; import { AiPromptsService } from '../prompts/ai-prompts.service';
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
jest.mock('axios'); jest.mock('axios');
jest.mock('fs'); jest.mock('fs');
@@ -16,14 +19,31 @@ jest.mock('fs');
const mockedAxios = axios as jest.Mocked<typeof axios>; const mockedAxios = axios as jest.Mocked<typeof axios>;
const mockedFs = fs as jest.Mocked<typeof fs>; const mockedFs = fs as jest.Mocked<typeof fs>;
/** OcrService mock for tesseract/fast-path */ /** OcrService mock for fast-path */
const mockOcrService = { const mockOcrService = {
detectAndExtract: jest.fn(), detectAndExtract: jest.fn(),
}; };
/** AiPromptsService mock for the ocr_system prompt */ /** AiPromptsService mock for the active ocr_extraction prompt */
const mockAiPromptsService = { const mockAiPromptsService = {
getActive: jest.fn(), getActive: jest.fn().mockResolvedValue({
template: 'mock active system prompt',
contextConfig: {
dmsTags: ['tag1', 'tag2'],
},
}),
};
/** AiExecutionProfile mock repository */
const mockProfile = {
profileName: 'ocr-extract',
temperature: 0.1,
topP: 0.5,
repeatPenalty: 1.0,
maxTokens: 16000,
};
const mockProfileRepository = {
findOne: jest.fn().mockResolvedValue(mockProfile),
}; };
/** ConfigService mock */ /** ConfigService mock */
@@ -48,6 +68,10 @@ describe('SandboxOcrEngineService', () => {
{ provide: ConfigService, useValue: mockConfigService }, { provide: ConfigService, useValue: mockConfigService },
{ provide: OcrService, useValue: mockOcrService }, { provide: OcrService, useValue: mockOcrService },
{ provide: AiPromptsService, useValue: mockAiPromptsService }, { provide: AiPromptsService, useValue: mockAiPromptsService },
{
provide: getRepositoryToken(AiExecutionProfile),
useValue: mockProfileRepository,
},
], ],
}).compile(); }).compile();
service = module.get<SandboxOcrEngineService>(SandboxOcrEngineService); service = module.get<SandboxOcrEngineService>(SandboxOcrEngineService);
@@ -65,7 +89,7 @@ describe('SandboxOcrEngineService', () => {
}); });
const result = await service.detectAndExtract('/tmp/file.pdf', 'auto'); const result = await service.detectAndExtract('/tmp/file.pdf', 'auto');
expect(result.text).toBe('auto extracted text'); expect(result.text).toBe('auto extracted text');
expect(result.engineUsed).toBe('tesseract'); expect(result.engineUsed).toBe('fast-path');
expect(result.fallbackUsed).toBe(false); expect(result.fallbackUsed).toBe(false);
expect(mockOcrService.detectAndExtract).toHaveBeenCalledWith({ expect(mockOcrService.detectAndExtract).toHaveBeenCalledWith({
pdfPath: '/tmp/file.pdf', pdfPath: '/tmp/file.pdf',
@@ -83,42 +107,6 @@ describe('SandboxOcrEngineService', () => {
}); });
}); });
describe('detectAndExtract() — engine=tesseract', () => {
it('should route to OcrService when engine=tesseract', async () => {
mockOcrService.detectAndExtract.mockResolvedValueOnce({
text: 'tesseract text',
ocrUsed: true,
});
const result = await service.detectAndExtract(
'/tmp/file.pdf',
'tesseract'
);
expect(result.engineUsed).toBe('tesseract');
expect(result.fallbackUsed).toBe(false);
});
});
describe('detectAndExtract() — engine=typhoon-np-dms-ocr (legacy alias)', () => {
it('should map typhoon-np-dms-ocr to np-dms-ocr and send it to the sidecar', async () => {
const mockBuffer = Buffer.from('pdf content');
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
mockedAxios.post = jest.fn().mockResolvedValueOnce({
data: {
text: 'ocr text via alias',
ocrUsed: true,
engineUsed: 'np-dms-ocr',
},
});
const result = await service.detectAndExtract(
'/tmp/file.pdf',
'typhoon-np-dms-ocr'
);
expect(result.text).toBe('ocr text via alias');
expect(result.engineUsed).toBe('np-dms-ocr');
expect(result.fallbackUsed).toBe(false);
});
});
describe('detectAndExtract() — engine=np-dms-ocr (sidecar path)', () => { describe('detectAndExtract() — engine=np-dms-ocr (sidecar path)', () => {
it('should send the file to sidecar /ocr-upload successfully', async () => { it('should send the file to sidecar /ocr-upload successfully', async () => {
const mockBuffer = Buffer.from('pdf binary data'); const mockBuffer = Buffer.from('pdf binary data');
@@ -149,7 +137,7 @@ describe('SandboxOcrEngineService', () => {
); );
}); });
it('should send typhoonOptions (temperature, topP, repeatPenalty) in the form data', async () => { it('should send ocrOptions (temperature, topP, repeatPenalty) in the form data', async () => {
const mockBuffer = Buffer.from('pdf data'); const mockBuffer = Buffer.from('pdf data');
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer); (mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
mockedAxios.post = jest.fn().mockResolvedValueOnce({ mockedAxios.post = jest.fn().mockResolvedValueOnce({
@@ -178,13 +166,13 @@ describe('SandboxOcrEngineService', () => {
expect(result.engineUsed).toBe('np-dms-ocr'); // resolvedEngineType fallback expect(result.engineUsed).toBe('np-dms-ocr'); // resolvedEngineType fallback
}); });
it('should fall back to Tesseract when fs.readFileSync fails (outer catch fallback)', async () => { it('should fall back to fast-path when fs.readFileSync fails (outer catch fallback)', async () => {
(mockedFs.readFileSync as jest.Mock).mockImplementationOnce(() => { (mockedFs.readFileSync as jest.Mock).mockImplementationOnce(() => {
throw new Error('ENOENT: file not found'); throw new Error('ENOENT: file not found');
}); });
// the service catches the error and falls back to Tesseract // the service catches the error and falls back to fast-path
mockOcrService.detectAndExtract.mockResolvedValueOnce({ mockOcrService.detectAndExtract.mockResolvedValueOnce({
text: 'tesseract fallback text', text: 'fast-path fallback text',
ocrUsed: true, ocrUsed: true,
}); });
const result = await service.detectAndExtract( const result = await service.detectAndExtract(
@@ -192,10 +180,10 @@ describe('SandboxOcrEngineService', () => {
'np-dms-ocr' 'np-dms-ocr'
); );
expect(result.fallbackUsed).toBe(true); expect(result.fallbackUsed).toBe(true);
expect(result.engineUsed).toBe('tesseract'); expect(result.engineUsed).toBe('fast-path');
}); });
it('should fall back to Tesseract when the sidecar returns an HTTP error', async () => { it('should fall back to fast-path when the sidecar returns an HTTP error', async () => {
const mockBuffer = Buffer.from('pdf data'); const mockBuffer = Buffer.from('pdf data');
(mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer); (mockedFs.readFileSync as jest.Mock).mockReturnValueOnce(mockBuffer);
mockedAxios.post = jest.fn().mockRejectedValueOnce( mockedAxios.post = jest.fn().mockRejectedValueOnce(
@@ -204,16 +192,16 @@ describe('SandboxOcrEngineService', () => {
}) })
); );
mockOcrService.detectAndExtract.mockResolvedValueOnce({ mockOcrService.detectAndExtract.mockResolvedValueOnce({
text: 'tesseract fallback result', text: 'fast-path fallback result',
ocrUsed: true, ocrUsed: true,
}); });
const result = await service.detectAndExtract( const result = await service.detectAndExtract(
'/tmp/doc.pdf', '/tmp/doc.pdf',
'np-dms-ocr' 'np-dms-ocr'
); );
expect(result.text).toBe('tesseract fallback result'); expect(result.text).toBe('fast-path fallback result');
expect(result.fallbackUsed).toBe(true); expect(result.fallbackUsed).toBe(true);
expect(result.engineUsed).toBe('tesseract'); expect(result.engineUsed).toBe('fast-path');
}); });
it('should fall back to fast-path when the sidecar errors and OcrService returns ocrUsed=false', async () => { it('should fall back to fast-path when the sidecar errors and OcrService returns ocrUsed=false', async () => {
@@ -3,26 +3,26 @@
// - 2026-05-30: Split SandboxOcrEngineService out of OcrService so the sandbox can select Typhoon OCR without affecting the core OCR flow // - 2026-05-30: Split SandboxOcrEngineService out of OcrService so the sandbox can select Typhoon OCR without affecting the core OCR flow
// - 2026-06-01: Switched from remapPath + pdfPath to a multipart file upload to /ocr-upload (fixes the Docker WSL2 mount issue) // - 2026-06-01: Switched from remapPath + pdfPath to a multipart file upload to /ocr-upload (fixes the Docker WSL2 mount issue)
// - 2026-06-02: Send X-API-Key in the request headers to ocr-sidecar to secure sidecar requests (ADR-033, Suggestion 2) // - 2026-06-02: Send X-API-Key in the request headers to ocr-sidecar to secure sidecar requests (ADR-033, Suggestion 2)
// - 2026-06-04: ADR-034 - added 'typhoon-np-dms-ocr' as the canonical SandboxOcrEngineType; legacy aliases are still supported // - 2026-06-04: ADR-034 - added 'np-dms-ocr' as the canonical SandboxOcrEngineType
// - 2026-06-04: Added the OcrTyphoonOptions interface; accept temperature/topP/repeatPenalty from the frontend sandbox to override Modelfile defaults // - 2026-06-04: Added the OcrNpDmsOptions interface; accept temperature/topP/repeatPenalty from the frontend sandbox to override Modelfile defaults
// - 2026-06-13: ADR-036 - changed the canonical SandboxOcrEngineType to np-dms-ocr and kept the legacy alias // - 2026-06-13: ADR-036 - changed the canonical SandboxOcrEngineType to np-dms-ocr
// - 2026-06-17: Injected AiPromptsService and send the systemPrompt form field from the active ocr_system prompt (T028) // - 2026-06-17: Injected AiPromptsService and send the systemPrompt form field from the active ocr_system prompt (T028)
import { Injectable, Logger } from '@nestjs/common'; import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config'; import { ConfigService } from '@nestjs/config';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import axios from 'axios'; import axios from 'axios';
import * as fs from 'fs'; import * as fs from 'fs';
import { OcrService } from './ocr.service'; import { OcrService } from './ocr.service';
import { AiPromptsService } from '../prompts/ai-prompts.service'; import { AiPromptsService } from '../prompts/ai-prompts.service';
import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
import { BusinessException } from '../../../common/exceptions';
export type SandboxOcrEngineType = export type SandboxOcrEngineType = 'auto' | 'np-dms-ocr';
| 'auto'
| 'tesseract'
| 'np-dms-ocr'
| 'typhoon-np-dms-ocr';
/** Typhoon OCR parameters that the sandbox UI may set to override Modelfile defaults */ /** np-dms-ocr parameters that the sandbox UI may set to override Modelfile defaults */
export interface OcrTyphoonOptions { export interface OcrNpDmsOptions {
temperature?: number; temperature?: number;
topP?: number; topP?: number;
repeatPenalty?: number; repeatPenalty?: number;
@@ -50,7 +50,9 @@ export class SandboxOcrEngineService {
constructor( constructor(
private readonly configService: ConfigService, private readonly configService: ConfigService,
private readonly ocrService: OcrService, private readonly ocrService: OcrService,
private readonly aiPromptsService: AiPromptsService private readonly aiPromptsService: AiPromptsService,
@InjectRepository(AiExecutionProfile)
private readonly profileRepo: Repository<AiExecutionProfile>
) { ) {
this.ocrApiUrl = this.configService.get<string>( this.ocrApiUrl = this.configService.get<string>(
'OCR_API_URL', 'OCR_API_URL',
@@ -62,26 +64,23 @@ export class SandboxOcrEngineService {
); );
} }
/** Run OCR with the selected engine, falling back to the Tesseract baseline when Typhoon fails */ /** Run OCR with the selected engine, falling back to fast-path when np-dms-ocr fails */
async detectAndExtract( async detectAndExtract(
pdfPath: string, pdfPath: string,
engineType: SandboxOcrEngineType = 'auto', engineType: SandboxOcrEngineType = 'auto',
typhoonOptions?: OcrTyphoonOptions ocrOptions?: OcrNpDmsOptions
): Promise<SandboxOcrResult> { ): Promise<SandboxOcrResult> {
const resolvedEngineType = const resolvedEngineType = engineType;
engineType === 'typhoon-np-dms-ocr' ? 'np-dms-ocr' : engineType;
this.logger.log( this.logger.log(
`detectAndExtract called — engine="${resolvedEngineType}" pdfPath="${pdfPath}" typhoonOptions=${JSON.stringify(typhoonOptions ?? null)}` `detectAndExtract called — engine="${resolvedEngineType}" pdfPath="${pdfPath}" ocrOptions=${JSON.stringify(ocrOptions ?? null)}`
);
if (resolvedEngineType === 'auto' || resolvedEngineType === 'tesseract') {
this.logger.log(
`engine="${resolvedEngineType}" → routing to Tesseract/fast-path`
); );
if (resolvedEngineType === 'auto') {
this.logger.log(`engine="${resolvedEngineType}" → routing to fast-path`);
const result = await this.ocrService.detectAndExtract({ pdfPath }); const result = await this.ocrService.detectAndExtract({ pdfPath });
return { return {
text: result.text, text: result.text,
ocrUsed: result.ocrUsed, ocrUsed: result.ocrUsed,
engineUsed: result.ocrUsed ? 'tesseract' : 'fast-path', engineUsed: 'fast-path',
fallbackUsed: false, fallbackUsed: false,
}; };
} }
@@ -103,6 +102,42 @@ export class SandboxOcrEngineService {
); );
throw fsErr; throw fsErr;
} }
// Resolve runtime parameters from DB (ocr-extract profile)
const profile = await this.profileRepo.findOne({
where: { profileName: 'ocr-extract' },
});
const runtimeParams = {
temperature: profile ? Number(profile.temperature) : 0.1,
top_p: profile ? Number(profile.topP) : 0.5,
repeat_penalty: profile ? Number(profile.repeatPenalty) : 1.0,
max_tokens: profile?.maxTokens ?? 16000,
};
// Override with sandbox options if provided
if (ocrOptions?.temperature !== undefined) {
runtimeParams.temperature = ocrOptions.temperature;
}
if (ocrOptions?.topP !== undefined) {
runtimeParams.top_p = ocrOptions.topP;
}
if (ocrOptions?.repeatPenalty !== undefined) {
runtimeParams.repeat_penalty = ocrOptions.repeatPenalty;
}
// Resolve Active Prompt from DB (ocr_extraction)
const activePrompt =
await this.aiPromptsService.getActive('ocr_extraction');
if (!activePrompt) {
throw new BusinessException(
'NO_ACTIVE_PROMPT',
'No active ocr_extraction prompt found',
'ไม่พบ Prompt OCR สำหรับดึงข้อมูลที่เปิดใช้งาน'
);
}
const systemPrompt = activePrompt.template;
const dmsTags = activePrompt.contextConfig?.dmsTags;
const form = new FormData(); const form = new FormData();
form.append( form.append(
'file', 'file',
@@ -110,32 +145,19 @@ export class SandboxOcrEngineService {
'upload.pdf' 'upload.pdf'
); );
form.append('engine', resolvedEngineType); form.append('engine', resolvedEngineType);
if (typhoonOptions?.temperature !== undefined) { form.append('systemPrompt', systemPrompt);
form.append('temperature', String(typhoonOptions.temperature)); if (dmsTags) {
form.append('dmsTags', JSON.stringify(dmsTags));
} }
if (typhoonOptions?.topP !== undefined) { form.append('runtimeParams', JSON.stringify(runtimeParams));
form.append('topP', String(typhoonOptions.topP));
} // Append individual overrides for backward compatibility
if (typhoonOptions?.repeatPenalty !== undefined) { form.append('temperature', String(runtimeParams.temperature));
form.append('repeatPenalty', String(typhoonOptions.repeatPenalty)); form.append('topP', String(runtimeParams.top_p));
} form.append('repeatPenalty', String(runtimeParams.repeat_penalty));
// fetch the active ocr_system prompt and send it to the sidecar
try {
const activeOcrSystemPrompt =
await this.aiPromptsService.getActive('ocr_system');
if (activeOcrSystemPrompt && activeOcrSystemPrompt.template) {
form.append('systemPrompt', activeOcrSystemPrompt.template);
this.logger.log( this.logger.log(
`Injected active ocr_system prompt (version ${activeOcrSystemPrompt.versionNumber})` `Sending to sidecar — engine=${engineType} options=${JSON.stringify(ocrOptions ?? {})}`
);
}
} catch (promptErr: unknown) {
this.logger.warn(
`Failed to retrieve active ocr_system prompt, proceeding without: ${promptErr instanceof Error ? promptErr.message : String(promptErr)}`
);
}
this.logger.log(
`Sending to sidecar — engine=${engineType} options=${JSON.stringify(typhoonOptions ?? {})}`
); );
const response = await axios.post<SandboxOcrSidecarResponse>( const response = await axios.post<SandboxOcrSidecarResponse>(
`${this.ocrApiUrl}/ocr-upload`, `${this.ocrApiUrl}/ocr-upload`,
@@ -183,9 +205,9 @@ export class SandboxOcrEngineService {
         ? `HTTP ${axiosStatus}${cause} — sidecar detail: ${axiosDetail}`
         : `HTTP ${axiosStatus}${cause}`;
       this.logger.error(
-        `[DIAG] Typhoon OCR FAILED — engine="${engineType}" url="${this.ocrApiUrl}/ocr-upload" error: ${fullCause}`
+        `[DIAG] np-dms-ocr FAILED — engine="${engineType}" url="${this.ocrApiUrl}/ocr-upload" error: ${fullCause}`
       );
-      this.logger.warn(`Falling back to Tesseract due to: ${fullCause}`);
+      this.logger.warn(`Falling back to fast-path due to: ${fullCause}`);
       const fallbackResult = await this.ocrService.detectAndExtract({
         pdfPath,
@@ -193,7 +215,7 @@ export class SandboxOcrEngineService {
       return {
         text: fallbackResult.text,
         ocrUsed: fallbackResult.ocrUsed,
-        engineUsed: fallbackResult.ocrUsed ? 'tesseract' : 'fast-path',
+        engineUsed: fallbackResult.ocrUsed ? 'fast-path' : 'fast-path',
         fallbackUsed: true,
       };
     }
@@ -54,7 +54,7 @@ describe('AiPolicyService', () => {
   describe('getCanonicalModelName', () => {
     it('ควรคืนค่า np-dms-ocr สำหรับชื่อโมเดลที่มีคำว่า ocr', () => {
-      expect(service.getCanonicalModelName('typhoon-np-dms-ocr:latest')).toBe(
+      expect(service.getCanonicalModelName('np-dms-ocr:latest')).toBe(
         'np-dms-ocr'
       );
       expect(service.getCanonicalModelName('my-ocr-model')).toBe('np-dms-ocr');
@@ -1,6 +1,7 @@
 // File: backend/src/modules/ai/tests/ocr-residency.spec.ts
 // Change Log:
 // - 2026-06-11: Initial unit tests for adaptive OCR residency
+// - 2026-06-20: Add mocks for the AiExecutionProfile repository and AiPromptsService to support parameter governance
 import { Test, TestingModule } from '@nestjs/testing';
 import { ConfigService } from '@nestjs/config';
@@ -11,6 +12,8 @@ import { AiPolicyService } from '../services/ai-policy.service';
 import { OcrCacheService } from '../services/ocr-cache.service';
 import { SystemSetting } from '../entities/system-setting.entity';
 import { AiAuditLog } from '../entities/ai-audit-log.entity';
+import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
+import { AiPromptsService } from '../prompts/ai-prompts.service';
 describe('OcrService Adaptive Residency (US2)', () => {
   let service: OcrService;
@@ -36,6 +39,23 @@ describe('OcrService Adaptive Residency (US2)', () => {
     create: jest.fn().mockReturnValue({}),
     save: jest.fn().mockResolvedValue({}),
   };
+  const mockProfileRepo = {
+    findOne: jest.fn().mockResolvedValue({
+      profileName: 'ocr-extract',
+      temperature: 0.1,
+      topP: 0.5,
+      repeatPenalty: 1.0,
+      maxTokens: 16000,
+    }),
+  };
+  const mockAiPromptsService = {
+    getActive: jest.fn().mockResolvedValue({
+      template: 'mock active system prompt',
+      contextConfig: {
+        dmsTags: ['tag1', 'tag2'],
+      },
+    }),
+  };
   const mockOcrCacheService = {};
   const mockVramMonitorService = {
     getVramHeadroom: jest.fn(),
@@ -61,6 +81,11 @@ describe('OcrService Adaptive Residency (US2)', () => {
         provide: getRepositoryToken(AiAuditLog),
         useValue: mockAiAuditLogRepo,
       },
+      {
+        provide: getRepositoryToken(AiExecutionProfile),
+        useValue: mockProfileRepo,
+      },
+      { provide: AiPromptsService, useValue: mockAiPromptsService },
       { provide: OcrCacheService, useValue: mockOcrCacheService },
       { provide: VramMonitorService, useValue: mockVramMonitorService },
       { provide: AiPolicyService, useValue: mockAiPolicyService },
@@ -1,6 +1,8 @@
 // File: backend/src/modules/ai/tests/ocr.service.spec.ts
 // Change Log:
 // - 2026-06-13: Initial unit tests for OCR parameter wiring (T066)
+// - 2026-06-20: Add mocks for the AiExecutionProfile repository and AiPromptsService to support parameter governance
 import { Test, TestingModule } from '@nestjs/testing';
 import { ConfigService } from '@nestjs/config';
 import { getRepositoryToken } from '@nestjs/typeorm';
@@ -10,12 +12,17 @@ import { AiPolicyService } from '../services/ai-policy.service';
 import { OcrCacheService } from '../services/ocr-cache.service';
 import { SystemSetting } from '../entities/system-setting.entity';
 import { AiAuditLog } from '../entities/ai-audit-log.entity';
+import { AiExecutionProfile } from '../entities/ai-execution-profile.entity';
+import { AiPromptsService } from '../prompts/ai-prompts.service';
 import axios from 'axios';
 import * as fs from 'fs';
 jest.mock('axios');
 jest.mock('fs');
 describe('OcrService Parameter Wiring (T066)', () => {
   let service: OcrService;
   const mockConfigService = {
     get: jest.fn((key: string, defaultValue?: unknown): unknown => {
       const config: Record<string, unknown> = {
@@ -29,16 +36,39 @@ describe('OcrService Parameter Wiring (T066)', () => {
       return config[key] ?? defaultValue;
     }),
   };
   const mockSystemSettingRepo = {
     findOne: jest.fn().mockResolvedValue({
       settingValue: '019505a1-7c3e-7000-8000-abc123def002',
     }),
   };
   const mockAiAuditLogRepo = {
     create: jest.fn().mockReturnValue({}),
     save: jest.fn().mockResolvedValue({}),
   };
+  const mockProfileRepo = {
+    findOne: jest.fn().mockResolvedValue({
+      profileName: 'ocr-extract',
+      temperature: 0.1,
+      topP: 0.5,
+      repeatPenalty: 1.0,
+      maxTokens: 16000,
+    }),
+  };
+  const mockAiPromptsService = {
+    getActive: jest.fn().mockResolvedValue({
+      template: 'mock active system prompt',
+      contextConfig: {
+        dmsTags: ['tag1', 'tag2'],
+      },
+    }),
+  };
   const mockOcrCacheService = {};
   const mockVramMonitorService = {
     getVramHeadroom: jest.fn().mockResolvedValue({
       totalMb: 16384,
@@ -49,12 +79,15 @@ describe('OcrService Parameter Wiring (T066)', () => {
     }),
     hasVramCapacity: jest.fn().mockResolvedValue(true),
   };
   const mockAiPolicyService = {};
   const mockRedis = {
     get: jest.fn().mockResolvedValue(null),
     set: jest.fn().mockResolvedValue('OK'),
     del: jest.fn().mockResolvedValue(1),
   };
   beforeEach(async () => {
     const module: TestingModule = await Test.createTestingModule({
       providers: [
@@ -68,6 +101,11 @@ describe('OcrService Parameter Wiring (T066)', () => {
         provide: getRepositoryToken(AiAuditLog),
         useValue: mockAiAuditLogRepo,
       },
+      {
+        provide: getRepositoryToken(AiExecutionProfile),
+        useValue: mockProfileRepo,
+      },
+      { provide: AiPromptsService, useValue: mockAiPromptsService },
       { provide: OcrCacheService, useValue: mockOcrCacheService },
       { provide: VramMonitorService, useValue: mockVramMonitorService },
       { provide: AiPolicyService, useValue: mockAiPolicyService },
@@ -88,7 +126,7 @@ describe('OcrService Parameter Wiring (T066)', () => {
       await service.detectAndExtract({
         pdfPath: '/path/to/test.pdf',
         documentPublicId: 'doc-123',
-        typhoonOptions: {
+        ocrOptions: {
           temperature: 0.15,
           topP: 0.65,
           repeatPenalty: 1.15,
@@ -104,7 +142,7 @@ describe('OcrService Parameter Wiring (T066)', () => {
       const formData = postCallArgs[1];
       expect(url).toBe('http://localhost:8765/ocr-upload');
       expect(formData).toBeInstanceOf(FormData);
-      expect(formData.get('engine')).toBe('typhoon-np-dms-ocr');
+      expect(formData.get('engine')).toBe('np-dms-ocr');
       expect(formData.get('temperature')).toBe('0.15');
       expect(formData.get('topP')).toBe('0.65');
       expect(formData.get('repeatPenalty')).toBe('1.15');
+255
@@ -0,0 +1,255 @@
# OCR Sidecar — Refactor Plan by CLAUDE
**Target file:** `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py`
**Analysis date:** 2026-06-20
**Current GPU:** RTX 5060 Ti 16GB
**Plan file:** `ocr-sidecar-refactor-plan-cluade.md`
---
## Summary of Issues Found
| # | Issue | Severity | Category |
|---|-------|-----------|------|
| P1 | Hardcoded default API key in the source code | 🔴 Critical | Security |
| P2 | `process_ocr` is a synchronous function that blocks the event loop | 🔴 Critical | Performance |
| P3 | God Service: OCR, Embed, Rerank and Normalize bundled into one service | 🔴 Critical | Architecture |
| P4 | Business logic lives in the sidecar instead of the backend | 🟡 Medium | Architecture |
| P5 | VRAM contention logic is outdated (designed for 8GB) | 🟡 Medium | Performance |
| P6 | `on_event("startup")` is deprecated and blocking | 🟡 Medium | Code Quality |
| P7 | Duplicate `import tempfile` | 🟢 Low | Code Quality |
| P8 | JSON parse fallback has no warning log | 🟢 Low | Observability |
---
## VRAM Budget (RTX 5060 Ti 16GB)
```
np-dms-ocr (typhoon-ocr 3B)      ~3–4 GB
np-dms-ai (llama3.2 3B)          ~2–3 GB
BGE-M3 (BAAI/bge-m3)             ~2 GB
Reranker (bge-reranker-large)    ~1 GB
─────────────────────────────────────────
Approx. total                    ~8–10 GB  ✅ fits in 16GB
```
**Impact:** All models can be loaded at the same time, so the VRAM Arbiter and `keep_alive: 0` are no longer needed.
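
The budget above is a sum of per-model ranges. A small sketch that re-derives the ~8–10 GB total and checks it against the 16GB card; the numbers are the estimates from the budget (not measurements), CUDA context/OS overhead is left out just as it is above, and `total_range` / `fits_on_gpu` are hypothetical helper names.

```python
from typing import Dict, Tuple

# Per-model VRAM estimates (low, high) in GB, copied from the budget above.
VRAM_ESTIMATES_GB: Dict[str, Tuple[float, float]] = {
    "np-dms-ocr": (3.0, 4.0),
    "np-dms-ai": (2.0, 3.0),
    "bge-m3": (2.0, 2.0),
    "bge-reranker-large": (1.0, 1.0),
}


def total_range(estimates: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the low bounds and the high bounds separately."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


def fits_on_gpu(estimates: Dict[str, Tuple[float, float]], capacity_gb: float = 16.0) -> bool:
    """True if even the pessimistic (high) total fits within the card's capacity."""
    return total_range(estimates)[1] <= capacity_gb


if __name__ == "__main__":
    low, high = total_range(VRAM_ESTIMATES_GB)
    print(f"all models resident: ~{low:g}-{high:g} GB, fits in 16GB: {fits_on_gpu(VRAM_ESTIMATES_GB)}")
```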
---
## What Should Move to the Backend (NestJS)
| Item to move | Rationale |
|------------|--------|
| API Key Authentication | The sidecar lives on the internal Docker network, so a second auth layer is unnecessary |
| `systemPrompt` validation + length check | A business rule: the backend should define it and validate before sending |
| The entire `/normalize` endpoint | A pipeline step that the backend orchestrates itself |
| Engine selection + alias normalization | The backend should resolve the engine and send the correct name directly |
| Fast-path text extraction (auto engine) | Deciding "is OCR needed?" is a backend business rule |
| Page range calculation | The backend already knows the document metadata |
---
## Refactor Plan in 3 Phases
---
### Phase 1 — Security & Critical Bugs
**Goal:** Immediately fix the critical issues affecting production
**Effort:** ~1 day
#### 1.1 Remove the Hardcoded Default API Key
```python
import os
# ❌ Before
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY", "lcbp3-dms-ocr-sidecar-secure-token-2026")
# ✅ After
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY")
if not OCR_SIDECAR_API_KEY:
    raise RuntimeError("OCR_SIDECAR_API_KEY environment variable must be set")
```
> The key already exposed in the git history must be rotated as well.
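
A minimal sketch of the fail-fast pattern from 1.1, plus a constant-time comparison for the `X-API-Key` header that stays in place until the auth layer is removed. `require_env` and `api_key_matches` are hypothetical helper names; wiring them into a FastAPI dependency is not shown.

```python
import hmac
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a required setting or fail at startup; there is deliberately no default."""
    source = os.environ if env is None else env
    value = (source.get(name) or "").strip()  # whitespace-only counts as missing
    if not value:
        raise RuntimeError(f"{name} environment variable must be set")
    return value


def api_key_matches(provided: Optional[str], expected: str) -> bool:
    """Compare an incoming X-API-Key header value in constant time."""
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Calling `require_env("OCR_SIDECAR_API_KEY")` at import time makes a container without the secret crash on boot instead of silently accepting the token that was published in the repository.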
#### 1.2 Make `process_ocr` Async
```python
# ❌ Before
def process_ocr(...) -> str:
    with httpx.Client(timeout=OCR_TIMEOUT) as client:
        response = client.post(...)
# ✅ After
async def process_ocr(...) -> str:
    async with httpx.AsyncClient(timeout=OCR_TIMEOUT) as client:
        response = await client.post(...)
```
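
If `process_ocr` cannot be switched to `httpx.AsyncClient` right away, `asyncio.to_thread` (the helper named in this commit's title) has a similar effect: the blocking call runs in a worker thread and the event loop keeps serving other requests. A stdlib-only sketch; `blocking_ocr_call` is a hypothetical stand-in that sleeps instead of calling Ollama.

```python
import asyncio
import time
from typing import List


def blocking_ocr_call(page: int, delay_s: float = 0.2) -> str:
    """Hypothetical stand-in for a synchronous httpx.Client call to Ollama."""
    time.sleep(delay_s)  # blocks whichever thread runs it
    return f"text of page {page}"


async def process_ocr(page: int) -> str:
    # Offload the blocking call so the event loop stays free.
    return await asyncio.to_thread(blocking_ocr_call, page)


async def run_pages(pages: List[int]) -> List[str]:
    return list(await asyncio.gather(*(process_ocr(p) for p in pages)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_pages([1, 2, 3]))
    # The three 0.2 s calls overlap, so elapsed is close to 0.2 s rather than 0.6 s.
    print(results, f"{time.perf_counter() - start:.2f}s")
```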
#### 1.3 Change `keep_alive` from 0 to a Sensible Value
```python
# ❌ Before: unload immediately because VRAM was insufficient (8GB era)
"keep_alive": options_override.get("keep_alive", 0)
# ✅ After: keep the model loaded for 300 s because 16GB is enough
"keep_alive": options_override.get("keep_alive", 300)
```
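
A sketch of a payload builder that applies the new `keep_alive` default and also avoids an `options_override: dict = {}` mutable default. The keys assume a native Ollama request body such as `/api/generate` (`model`, `prompt`, `stream`, `keep_alive`, `options`), where a numeric `keep_alive` is read as seconds; `build_ollama_payload` itself is hypothetical.

```python
from typing import Any, Dict, Optional

DEFAULT_KEEP_ALIVE_S = 300  # 5 minutes; 0 would unload the model after every call


def build_ollama_payload(
    model: str,
    prompt: str,
    options_override: Optional[Dict[str, Any]] = None,  # None, not {}: no shared mutable default
) -> Dict[str, Any]:
    overrides = dict(options_override or {})  # copy so the caller's dict is never mutated
    keep_alive = overrides.pop("keep_alive", DEFAULT_KEEP_ALIVE_S)
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
        "options": overrides,  # remaining sampling options, e.g. temperature / top_p
    }
```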
---
### Phase 2 — Performance & Code Quality
**Goal:** Remove legacy code designed for the 8GB GPU and improve startup
**Effort:** ~1 day
#### 2.1 Remove All VRAM Contention Logic
```python
# ❌ Remove entirely
from services.vram_monitor import get_vram_headroom
headroom = get_vram_headroom()
if not headroom.query_success:
    device = "cpu"
elif headroom.available_mb < threshold_mb:
    device = "cpu"
```
```python
# ✅ Replace with a fixed device
from FlagEmbedding import BGEM3FlagModel
bge_model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)  # fp16 is now fine on 16GB
# device is always "cuda"; no dynamic selection needed
```
#### 2.2 Switch Startup to `lifespan`
```python
# ❌ Before (deprecated)
@app.on_event("startup")
def load_bge_models():
    bge_model = BGEM3FlagModel(...)
# ✅ After
import asyncio
from contextlib import asynccontextmanager
from fastapi import FastAPI
@asynccontextmanager
async def lifespan(app: FastAPI):
    await asyncio.to_thread(load_models)  # does not block the event loop
    yield
app = FastAPI(title="OCR Sidecar", version="3.0.0", lifespan=lifespan)
```
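
To see why `asyncio.to_thread` matters inside the lifespan hook, here is a stdlib-only sketch without FastAPI. A hypothetical `load_models` sleeps to mimic loading BGE-M3 and the reranker, a `state` dict stands in for `app.state`, and a heartbeat task counts how often the event loop got to run during startup. Names and timings are assumptions.

```python
import asyncio
import time
from contextlib import asynccontextmanager, suppress
from typing import AsyncIterator, Dict


def load_models() -> Dict[str, str]:
    """Hypothetical blocking loader standing in for BGE-M3 / reranker start-up."""
    time.sleep(0.3)
    return {"embed": "bge-m3", "rerank": "bge-reranker-large"}


@asynccontextmanager
async def lifespan(state: Dict[str, object]) -> AsyncIterator[None]:
    # Startup: the blocking load runs in a worker thread, not on the event loop.
    state["models"] = await asyncio.to_thread(load_models)
    try:
        yield
    finally:
        # Shutdown: drop the references so the models can be released.
        state.pop("models", None)


async def main() -> int:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.05)

    beat = asyncio.create_task(heartbeat())
    state: Dict[str, object] = {}
    async with lifespan(state):
        assert "models" in state
        loaded_ticks = ticks  # ticks counted while load_models() was running
    assert "models" not in state
    beat.cancel()
    with suppress(asyncio.CancelledError):
        await beat
    return loaded_ticks


if __name__ == "__main__":
    print("heartbeat ticks during model loading:", asyncio.run(main()))
```

Replacing the `to_thread` call with a direct `load_models()` call leaves the count at 0, because the loop never gets a chance to run the heartbeat until startup finishes.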
#### 2.3 Fix the Duplicate Import and Add a JSON Parse Warning
```python
# Remove the duplicate `import tempfile` inside /ocr-upload
# Add a warning log to the JSON parse fallback
try:
    result_text = json.loads(raw_text).get("natural_text", raw_text)
except (json.JSONDecodeError, AttributeError):
    logger.warning(f"[DIAG] Failed to parse JSON response, using raw text. Preview: {raw_text[:100]}")
    result_text = raw_text
```
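
The fallback can be wrapped in a small function that is easy to unit-test. This sketch keeps the snippet's assumption that the model returns either a JSON object with a `natural_text` field or plain text; `extract_natural_text` is a hypothetical name. It also covers a `"natural_text": null` response, which the one-liner above would pass through as `None`, and it uses lazy `%s` logging arguments.

```python
import json
import logging

logger = logging.getLogger("ocr-sidecar")


def extract_natural_text(raw_text: str) -> str:
    """Return natural_text from a JSON object response, otherwise the raw text."""
    try:
        parsed = json.loads(raw_text)
    except json.JSONDecodeError:
        logger.warning("[DIAG] Failed to parse JSON response, using raw text. Preview: %s", raw_text[:100])
        return raw_text
    if isinstance(parsed, dict) and isinstance(parsed.get("natural_text"), str):
        return parsed["natural_text"]
    logger.warning("[DIAG] JSON response has no natural_text string, using raw text. Preview: %s", raw_text[:100])
    return raw_text
```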
#### 2.4 Validate `pdf_path` Before Passing It to `process_ocr`
```python
# Add to _process_pdf_doc
resolved_path = pdf_path or (str(doc.name) if getattr(doc, "name", None) else None)
if not resolved_path or resolved_path == "<memory>":
    raise ValueError("Invalid PDF path: a valid pdf_path must be passed in")
```
---
### Phase 3 — Architecture Separation
**Goal:** Separate concerns so the sidecar becomes a pure compute worker
**Effort:** ~2–3 days
#### 3.1 Move `/normalize` to the Backend
The backend either calls PyThaiNLP directly or delegates to a separate microservice:
```
n8n → POST /api/rag/normalize (NestJS) → PyThaiNLP → return normalized text
```
Remove the `/normalize` endpoint from the sidecar entirely.
#### 3.2 Move Authentication Out of the Sidecar
```yaml
# docker-compose: restrict the network instead of using an API key
services:
  ocr-sidecar:
    networks:
      - internal  # not exposed to any external network
    # no ports mapping to the host
networks:
  internal:
    driver: bridge
```
The backend (NestJS) calls the sidecar over the internal network without sending an API key.
#### 3.3 The Sidecar Accepts Only Resolved Input
The backend does the pre-processing first and then sends the request:
```
Backend (NestJS)
├─ Check whether the PDF has a text layer (fast-path decision)
├─ Choose the engine to use (no "auto" in the sidecar)
├─ Validate systemPrompt
├─ Compute the page range
└─► POST /ocr { engine: "np-dms-ocr", pages: [1,2,3], systemPrompt: "..." }
```
The sidecar is left with a single job: **receive input → call the model → return the result**
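
One way to pin down the contract in 3.3 is a small request type that the backend builds (so validation happens before the call) and the sidecar can reuse as a guard. The field names follow the `POST /ocr` example above; `ResolvedOcrRequest`, `ALLOWED_ENGINES`, `page_range` and the 8000-character prompt limit are assumptions for illustration, not values from the codebase.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

ALLOWED_ENGINES = frozenset({"np-dms-ocr"})  # assumption: "auto" is resolved by the backend
MAX_SYSTEM_PROMPT_CHARS = 8000  # assumption: illustrative limit, not a project value


@dataclass(frozen=True)
class ResolvedOcrRequest:
    """Hypothetical POST /ocr body once the backend has resolved everything."""

    engine: str
    pages: List[int]
    system_prompt: str = ""

    def __post_init__(self) -> None:
        if self.engine not in ALLOWED_ENGINES:
            raise ValueError(f"unresolved or unknown engine {self.engine!r}")
        if not self.pages or any(p < 1 for p in self.pages):
            raise ValueError("pages must be a non-empty list of 1-based page numbers")
        if len(self.system_prompt) > MAX_SYSTEM_PROMPT_CHARS:
            raise ValueError("systemPrompt exceeds the allowed length")

    def to_payload(self) -> Dict[str, Any]:
        # Same keys as the POST /ocr example above.
        return {"engine": self.engine, "pages": list(self.pages), "systemPrompt": self.system_prompt}


def page_range(first: int, last: int) -> List[int]:
    """Backend-side helper: expand an inclusive 1-based page range."""
    if first < 1 or last < first:
        raise ValueError("invalid page range")
    return list(range(first, last + 1))
```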
---
## Target Architecture After the Refactor
```
┌─────────────────────────────────────────────┐
│  Backend (NestJS)                           │
│                                             │
│  - Fast-path text extraction decision       │
│  - Engine selection & validation            │
│  - systemPrompt validation                  │
│  - Page range calculation                   │
│  - Thai text normalization (PyThaiNLP)      │
│  - Auth & rate limiting                     │
└──────────────────────┬──────────────────────┘
                       │ internal Docker network
                       ▼
┌─────────────────────────────────────────────┐
│  OCR Sidecar (compute only)                 │
│                                             │
│  POST /ocr         ← PDF path + page list   │
│  POST /ocr-upload  ← multipart file         │
│  POST /embed       ← normalized text        │
│  POST /rerank      ← query + chunks         │
│  GET /health                                │
│                                             │
│  Models (always loaded, CUDA):              │
│  - np-dms-ocr via Ollama (keep_alive=300)   │
│  - BGE-M3 fp16                              │
│  - BGE-Reranker-Large fp16                  │
└─────────────────────────────────────────────┘
```
---
## Summary Checklist
### Phase 1 (Critical, do first)
- [ ] Remove the hardcoded default API key and rotate the key in secrets
- [ ] Make `process_ocr` async with `httpx.AsyncClient`
- [ ] Change the `keep_alive` default from 0 to 300
### Phase 2 (Performance)
- [ ] Remove all VRAM contention logic (`get_vram_headroom`, dynamic device)
- [ ] Change `use_fp16=False` to `use_fp16=True` for the BGE models
- [ ] Replace `on_event("startup")` with `lifespan` + `asyncio.to_thread`
- [ ] Remove the duplicate `import tempfile`
- [ ] Add a warning log to the JSON parse fallback
- [ ] Validate `pdf_path` before passing it to `process_ocr`
### Phase 3 (Architecture)
- [ ] Move `/normalize` to the backend
- [ ] Move engine selection and alias normalization to the backend
- [ ] Move the fast-path decision to the backend
- [ ] Restrict the sidecar network to internal-only instead of API key auth
- [ ] Remove `/normalize` and the auth middleware from the sidecar
---
*This document is based on the code review of 2026-06-20 and should be updated when the architecture changes.*
+259
@@ -0,0 +1,259 @@
# 📋 OCR Sidecar — Refactor Plan by QWEN
**Project:** NAP-DMS (OCR Sidecar Modernization)
**Target Hardware:** NVIDIA RTX 5060 Ti 16GB
**Date:** 2026-06-20
**Owner:** Document Intelligence Engine / Senior Full Stack Developer
**Status:** 🟡 Planning Phase
**Plan file:** `ocr-sidecar-refactor-plan-qwen.md`
---
## 🎯 1. Executive Summary
This refactor aims to turn the **OCR Sidecar** from a "Fat Worker that carries Business Logic and Hardware Decisions" into a **"Pure Dumb Worker"** that focuses solely on AI inference. Orchestration, security gatekeeping, and VRAM management move back to the **NestJS Backend**, which controls them through a **Global Mutex + Task Queue** mechanism.
### 🎯 Key Objectives
1. ✅ **Security:** Close the path traversal vulnerability in the `/ocr` endpoint
2. ✅ **Architecture:** Separate business logic (DMS Tags, Noise Filtering, Pagination) from the inference layer
3. ✅ **Performance:** Cut latency by 2–5 seconds per request by no longer moving models between RAM↔VRAM
4. ✅ **Stability:** Prevent OOM crashes on the RTX 5060 Ti 16GB with a backend-controlled mutex
5. ✅ **Scalability:** The sidecar accepts requests as "1 page = 1 request" to support horizontal scaling
---
## 🏗️ 2. Architecture Comparison
### 🔴 Current Architecture (Anti-Pattern)
```
NestJS API (Minimal Logic)
  │
  └──► Python Sidecar (Fat Worker)
         ├─ ❌ Path Validation
         ├─ ❌ DMS Tag Injection
         ├─ ❌ Noise Filtering
         ├─ ❌ Page Loop Orchestration
         ├─ ❌ VRAM Decision (per req)
         └─ ❌ Model .to('cuda'/'cpu')
```
### 🟢 Target Architecture (Best Practice)
```
NestJS Backend (Orchestrator)
  ├─ Path Guard (Canonical)
  ├─ Prompt Builder (DMS Tags)
  ├─ VRAM Mutex (Global) ──► Sequential GPU Ops
  ├─ PDF Splitter (Per Page)
  ├─ Noise Filter (Regex)
  └─ BullMQ Task Queue ──► Concurrency Ctrl
          │
          │ HTTP (1 page = 1 request)
          ▼
Python Sidecar (Pure Dumb Worker)
  ├─ ✅ PDF → Image (PyMuPDF)
  ├─ ✅ Ollama /v1/chat/completions call
  ├─ ✅ BGE-M3 Embedding (Fixed on GPU at startup)
  ├─ ✅ BGE-Reranker (Fixed on GPU at startup)
  └─ ✅ Thai NLP Normalize (PyThaiNLP)
```
---
## 📊 3. VRAM Budget Analysis (RTX 5060 Ti 16GB)
| Component | Model | VRAM Usage | Status |
|-----------|-------|------------|--------|
| **BGE-M3 + Reranker** | `BAAI/bge-m3` + `bge-reranker-large` | **~4.5 GB** | 🔒 **Resident** (Load once at startup, stay on GPU) |
| **np-dms-ocr** (VLM 3B) | Q4_K_M quantized | **~5.0 GB** | 🔄 **Ephemeral** (Loaded on-demand, `keep_alive=0`) |
| **np-dms-ai** (LLM 7B-8B) | Q4_K_M quantized | **~6.0 GB** | 🔄 **Ephemeral** (Loaded on-demand, `keep_alive=10m`) |
| **CUDA Context + OS** | System overhead | **~1.5 GB** | 🔒 **Fixed** |
| **Total Peak** | — | **~12.0 GB** (4.5 + 6.0 + 1.5; one ephemeral model at a time) | ✅ **Safe** (Headroom ~4.0 GB) |
### ⚠️ Critical Rule
**Never** load `np-dms-ocr` and `np-dms-ai` at the same time (5 + 6 = 11 GB; plus BGE 4.5 GB = 15.5 GB; plus ~1.5 GB overhead ≈ 17 GB → OOM risk).
**Solution:** The NestJS backend must use a **Mutex** to force strictly sequential execution.
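
The critical rule is easiest to check as arithmetic: the resident components are always present, and the mutex guarantees at most one ephemeral LLM on top of them. A sketch using the table's estimates (BGE 4.5 GB plus ~1.5 GB CUDA/OS overhead resident; 5.0 GB for `np-dms-ocr`, 6.0 GB for `np-dms-ai`); `peak_vram_gb` is a hypothetical helper, and the figures are estimates, not measurements.

```python
from typing import Dict, Iterable

RESIDENT_GB = 4.5 + 1.5  # BGE-M3 + reranker, plus CUDA context / OS overhead
EPHEMERAL_GB: Dict[str, float] = {"np-dms-ocr": 5.0, "np-dms-ai": 6.0}
CAPACITY_GB = 16.0


def peak_vram_gb(loaded_together: Iterable[str]) -> float:
    """Estimated VRAM when the given ephemeral models are loaded at the same time."""
    return RESIDENT_GB + sum(EPHEMERAL_GB[name] for name in loaded_together)


if __name__ == "__main__":
    for combo in (["np-dms-ocr"], ["np-dms-ai"], ["np-dms-ocr", "np-dms-ai"]):
        peak = peak_vram_gb(combo)
        print(f"{' + '.join(combo):<22} ~{peak:.1f} GB  fits={peak <= CAPACITY_GB}")
```

With the mutex the worst case is `np-dms-ai` alone at ~12 GB; without it, loading both reaches ~17 GB, which exceeds the 16GB card.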
---
## 📝 4. Task Breakdown
### 🔴 Phase 1: Security & Critical Fixes (Priority: CRITICAL)
**Scope:** `app.py` only; must be done before deployment
| # | Task | File | Status |
|---|------|------|--------|
| 1.1 | Fix **Path Traversal** in `/ocr` with path canonicalization | `app.py` | ⬜ |
| 1.2 | Fix the **Mutable Default Argument** (`options_override: dict = {}`) | `app.py` | ⬜ |
| 1.3 | Remove the duplicate `import tempfile` in `ocr_upload` | `app.py` | ⬜ |
| 1.4 | Replace `@app.on_event("startup")` with a `lifespan` context manager | `app.py` | ⬜ |
**Acceptance Criteria:**
- [ ] Sending `pdfPath: "../../../../etc/passwd"` must return HTTP 403
- [ ] Pytest passes 100% for the security test suite
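
Task 1.1 can be sketched with the standard library alone: resolve the requested path against a fixed upload base, follow symlinks with `os.path.realpath`, and reject anything that does not stay strictly inside the base. `resolve_under_base` and mapping its `PermissionError` to HTTP 403 are assumptions about how the sidecar would wire it; the actual base directory is deployment configuration.

```python
import os


def resolve_under_base(base_dir: str, requested: str) -> str:
    """Canonicalize `requested` relative to `base_dir`; raise PermissionError if it escapes."""
    base_real = os.path.realpath(base_dir)
    # os.path.join drops base_real when `requested` is absolute, so absolute inputs
    # are checked against the base just like relative ones.
    candidate = os.path.realpath(os.path.join(base_real, requested))
    try:
        inside = os.path.commonpath([base_real, candidate]) == base_real
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    if not inside or candidate == base_real:
        raise PermissionError(f"path outside upload base: {requested!r}")
    return candidate
```

In the endpoint, catching `PermissionError` and returning 403 satisfies the first acceptance criterion above. Because `realpath` resolves symlinks, a link inside the base that points elsewhere is rejected as well.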
---
### 🟠 Phase 2: Move Business Logic to Backend (Priority: HIGH)
**Scope:** NestJS Backend + `app.py` simplification
| # | Task | File | Status |
|---|------|------|--------|
| 2.1 | Create `PromptBuilderService` in NestJS (injects DMS tags) | `backend/src/ocr/prompt-builder.service.ts` | ⬜ |
| 2.2 | Create `PdfSplitterService` (splits a PDF into N pages) | `backend/src/ocr/pdf-splitter.service.ts` | ⬜ |
| 2.3 | Create `OcrNoiseFilterService` (regex-based cleanup) | `backend/src/ocr/noise-filter.service.ts` | ⬜ |
| 2.4 | Create `OcrOrchestratorService` (loop + concurrent calls) | `backend/src/ocr/orchestrator.service.ts` | ⬜ |
| 2.5 | **Remove** DMS tag injection from `process_ocr()` | `app.py` | ⬜ |
| 2.6 | **Remove** `filter_ocr_noise()` from the sidecar | `app.py` | ⬜ |
| 2.7 | **Remove** the page loop from `_process_pdf_doc()` (accepts a single page_num) | `app.py` | ⬜ |
**Acceptance Criteria:**
- [ ] The sidecar accepts requests only as "1 page = 1 request"
- [ ] The backend can assemble prompts dynamically (supports new metadata fields)
- [ ] OCR of 5 pages concurrently works through BullMQ
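
The "1 page = 1 request" model with a cap on parallel requests can be sketched with `asyncio.Semaphore`. The plan puts this in a NestJS `OcrOrchestratorService` backed by BullMQ, so this Python version only illustrates the control flow; `ocr_page` is a hypothetical stand-in for the HTTP call to the sidecar, and the default limit of 5 mirrors the acceptance criterion.

```python
import asyncio
from typing import Awaitable, Callable, List


async def ocr_document(
    page_count: int,
    ocr_page: Callable[[int], Awaitable[str]],
    max_in_flight: int = 5,
) -> List[str]:
    """Issue one request per page, at most `max_in_flight` at once, results in page order."""
    limit = asyncio.Semaphore(max_in_flight)

    async def one(page: int) -> str:
        async with limit:
            return await ocr_page(page)

    # gather() returns results in the order the awaitables were passed, not completion order.
    results = await asyncio.gather(*(one(p) for p in range(1, page_count + 1)))
    return list(results)
```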
---
### 🟡 Phase 3: VRAM & GPU Management (Priority: HIGH)
**Scope:** `app.py` + NestJS Mutex
| # | Task | File | Status |
|---|------|------|--------|
| 3.1 | **Remove** `bge_model.model.to("cuda"/"cpu")` from `/embed` and `/rerank` | `app.py` | ⬜ |
| 3.2 | **Change** `load_bge_models()` to call `.to("cuda")` once at startup | `app.py` | ⬜ |
| 3.3 | **Remove** the `get_vram_headroom()` decision logic (keep logging only) | `app.py` | ⬜ |
| 3.4 | Create `VramMutexService` in NestJS (global async lock) | `backend/src/gpu/vram-mutex.service.ts` | ⬜ |
| 3.5 | Create `GpuTaskQueue` (BullMQ) for OCR/Chat/Rerank | `backend/src/gpu/gpu-queue.service.ts` | ⬜ |
| 3.6 | Set `keep_alive=0` for `np-dms-ocr` in the Ollama config | `docker-compose.yml` | ⬜ |
| 3.7 | Set `keep_alive=10m` for `np-dms-ai` in the Ollama config | `docker-compose.yml` | ⬜ |
**Acceptance Criteria:**
- [ ] BGE-M3 is loaded onto the GPU once, at container start
- [ ] No OOM crash even after alternating OCR and chat for 100 rounds
- [ ] `/embed` latency drops from ~3s to ~0.3s per request
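
The `VramMutexService` idea (3.4) is a single lock that every GPU-bound job must hold, so OCR and chat never overlap even when both are queued. A Python sketch with `asyncio.Lock`; the plan places this in NestJS, and `GpuMutex`, `run_exclusive` and the job names are hypothetical. An in-process lock like this only serializes callers inside one process.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class GpuMutex:
    """One lock shared by every GPU-bound job (OCR, chat, rerank)."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.log: List[str] = []  # start/end events, in order, for inspection

    async def run_exclusive(self, name: str, job: Callable[[], Awaitable[T]]) -> T:
        async with self._lock:
            self.log.append(f"start {name}")
            try:
                return await job()
            finally:
                self.log.append(f"end {name}")
```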
---
### 🟢 Phase 4: Sidecar Simplification (Priority: MEDIUM)
**Scope:** `app.py` cleanup
| # | Task | File | Status |
|---|------|------|--------|
| 4.1 | Replace `httpx.Client` with a global `httpx.AsyncClient` (connection pool) | `app.py` | ⬜ |
| 4.2 | Convert all endpoints to `async def` | `app.py` | ⬜ |
| 4.3 | Change `process_ocr()` to `async def process_ocr()` | `app.py` | ⬜ |
| 4.4 | Add OpenTelemetry tracing (span per request) | `app.py` | ⬜ |
| 4.5 | Add Prometheus metrics (`ocr_requests_total`, `inference_duration_seconds`) | `app.py` | ⬜ |
**Acceptance Criteria:**
- [ ] The sidecar handles 50 concurrent requests without timing out
- [ ] A Grafana dashboard shows p95/p99 latency
---
## 📦 5. File Changes Summary
### 🗑️ Files to DELETE / Simplify
| File | Action | Reason |
|------|--------|--------|
| `app.py::filter_ocr_noise()` | **Delete** | Moved to NestJS |
| `app.py::DMS tag injection` | **Delete** | Moved to the NestJS PromptBuilder |
| `app.py::Page loop` | **Delete** | Moved to the NestJS Orchestrator |
| `app.py::VRAM decision` | **Delete** | Moved to the NestJS Mutex |
| `services/vram_monitor.py` | **Delete** | No longer needed |
### 🆕 Files to CREATE (NestJS)
| File | Purpose |
|------|---------|
| `backend/src/ocr/prompt-builder.service.ts` | Assembles the prompt and injects DMS tags |
| `backend/src/ocr/pdf-splitter.service.ts` | Splits the PDF into one buffer per page |
| `backend/src/ocr/noise-filter.service.ts` | Regex-based text cleanup |
| `backend/src/ocr/orchestrator.service.ts` | Manages the page loop and concurrency |
| `backend/src/gpu/vram-mutex.service.ts` | Global async lock for GPU ops |
| `backend/src/gpu/gpu-queue.service.ts` | BullMQ queue for GPU tasks |
### ✏️ Files to MODIFY
| File | Changes |
|------|---------|
| `app.py` | Simplify to Pure Worker (~50% reduction) |
| `docker-compose.yml` | Add the Ollama `keep_alive` config |
| `Modelfile` | Sync options with the sidecar payload |
---
## ✅ 6. Definition of Done (DoD)
### 🔒 Security
- [ ] Path traversal tests pass 100%
- [ ] API key validation covers every endpoint
- [ ] `systemPrompt` length validation works correctly
### ⚡ Performance
- [ ] `/embed` latency < 500ms (p95)
- [ ] `/rerank` latency < 800ms (p95)
- [ ] OCR per page < 30s (including cold start)
- [ ] OCR of 5 concurrent pages completes within 60s
### 🛡️ Stability
- [ ] No OOM crash in a 24-hour stress test
- [ ] The sidecar auto-recovers from Ollama timeouts
- [ ] VRAM usage stays stable (no memory leak)
### 📊 Observability
- [ ] Structured logging (JSON) in every endpoint
- [ ] Prometheus metrics exposed at `/metrics`
- [ ] Grafana dashboard ready to use
---
## ⚠️ 7. Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|------------|
| **OOM Crash** when two LLMs are loaded at once | 🔴 Critical | NestJS Mutex enforces sequential execution + Ollama `keep_alive=0` |
| **Path Traversal** ใน `/ocr` | 🔴 Critical | Canonicalization + Base Path Whitelist |
| **Slow BGE-M3 load** at startup | 🟡 Medium | Pre-download the model in the Dockerfile (no runtime download) |
| **Ollama Cold Start** (~65s) | 🟡 Medium | Call a warm-up endpoint at container start |
| **VRAM Fragmentation** from `.to()` calls | 🟡 Medium | **Remove** all `.to()` calls (Phase 3) |
---
## 🚀 8. Rollout Plan
```
Week 1: Phase 1 (Security) ────────────────► Deploy to Staging
Week 2: Phase 3 (VRAM) ────────────────────► Load Test 24h
Week 3: Phase 2 (Move Logic to Backend) ───► Integration Test
Week 4: Phase 4 (Simplification) ──────────► Production Release
```
### 🔄 Rollback Strategy
- Every phase must have a **Feature Flag** that can be switched on and off
- The old sidecar stays deployed in parallel for 2 weeks after release
- The NestJS backend can fall back to the old sidecar via the env var `OCR_SIDECAR_VERSION=v1|v2`
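
The `OCR_SIDECAR_VERSION=v1|v2` fallback can be as small as a lookup that rejects unknown values at startup. The backend is NestJS, so this Python sketch only shows the selection logic; the hostnames and ports in `SIDECAR_URLS` are placeholders rather than real deployment values, and defaulting to `v1` during the overlap period is an assumption.

```python
import os
from typing import Mapping, Optional

# Placeholder endpoints; the real hostnames and ports come from deployment config.
SIDECAR_URLS = {
    "v1": "http://ocr-sidecar-v1:8765",
    "v2": "http://ocr-sidecar-v2:8765",
}


def sidecar_base_url(env: Optional[Mapping[str, str]] = None, default: str = "v1") -> str:
    """Pick the sidecar from OCR_SIDECAR_VERSION, failing loudly on unknown values."""
    source = os.environ if env is None else env
    version = (source.get("OCR_SIDECAR_VERSION") or default).strip().lower()
    if version not in SIDECAR_URLS:
        raise ValueError(f"OCR_SIDECAR_VERSION must be one of {sorted(SIDECAR_URLS)}, got {version!r}")
    return SIDECAR_URLS[version]
```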
---
## 📚 9. References
- [ADR-023A] OCR Engine Selection (revised 2026-06-11)
- [ADR-033] Engine Switching Strategy
- [ADR-034] np-dms-ocr as Canonical Engine
- [ADR-036] Model Naming Convention
- [T015], [T025], [T026-T028]: technical specs from the change log
- [NAP-DMS Spec 04-00] Infrastructure & OPS
---
**Prepared by:** Document Intelligence Engine
**Reviewed by:** _Pending_
**Approved by:** _Pending_
+8 -8
@@ -75,9 +75,9 @@ function normalizeLoadedModels(value: unknown): VramLoadedModelView[] {
   if (typeof item === 'string') {
     const name = item.toLowerCase();
     let normName = item;
-    if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) {
+    if (name.includes(OCR_MODEL_NAME)) {
       normName = OCR_MODEL_NAME;
-    } else if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) {
+    } else if (name.includes(MAIN_MODEL_NAME)) {
       normName = MAIN_MODEL_NAME;
     }
     return {
@@ -95,9 +95,9 @@ function normalizeLoadedModels(value: unknown): VramLoadedModelView[] {
     const rawName = model.modelName ?? model.name ?? `model-${index + 1}`;
     const name = rawName.toLowerCase();
     let normName = rawName;
-    if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) {
+    if (name.includes(OCR_MODEL_NAME)) {
       normName = OCR_MODEL_NAME;
-    } else if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) {
+    } else if (name.includes(MAIN_MODEL_NAME)) {
       normName = MAIN_MODEL_NAME;
     }
     return {
@@ -115,8 +115,8 @@ function normalizeLoadedModels(value: unknown): VramLoadedModelView[] {
 function toCanonicalModel(rawName: string): string {
   const name = rawName.toLowerCase();
-  if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) return OCR_MODEL_NAME;
-  if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
+  if (name.includes(OCR_MODEL_NAME)) return OCR_MODEL_NAME;
+  if (name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
   return rawName;
 }
@@ -193,8 +193,8 @@ export default function AiAdminConsolePage() {
     new Set(
       rawHealthOllamaModels.map((m) => {
         const name = m.toLowerCase();
-        if (name.includes('ocr') || name.includes('typhoon-np-dms-ocr')) return OCR_MODEL_NAME;
-        if (name.includes('typhoon') || name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
+        if (name.includes(OCR_MODEL_NAME)) return OCR_MODEL_NAME;
+        if (name.includes(MAIN_MODEL_NAME)) return MAIN_MODEL_NAME;
         return m;
       })
     )
@@ -76,7 +76,7 @@ export default function OcrEngineSelector() {
       </CardHeader>
       <CardContent className="space-y-4">
         {engines.map((engine) => {
-          const isTyphoon = engine.engineType === 'typhoon_ocr';
+          const isAiPowered = engine.engineType === 'np_dms_ocr';
           return (
             <div
               key={engine.engineId}
@@ -95,14 +95,14 @@ export default function OcrEngineSelector() {
                 </Badge>
               )}
-              {isTyphoon && (
+              {isAiPowered && (
                 <Badge variant="secondary" className="text-[10px] h-4 bg-purple-500/10 text-purple-600 dark:text-purple-400 border-purple-500/20">
                   AI Powered
                 </Badge>
               )}
             </div>
             <p className="text-xs text-muted-foreground leading-relaxed">
-              {isTyphoon
+              {isAiPowered
                 ? 'สกัดภาษาไทยความแม่นยำสูง (95%+) เหมาะสำหรับภาษาไทยผสมอังกฤษ'
                 : 'เอนจินมาตรฐานเบสไลน์ ประมวลผลรวดเร็วและใช้ทรัพยากรต่ำ'}
             </p>
@@ -111,7 +111,7 @@ export default function OcrEngineSelector() {
<Server className="h-3 w-3" /> <Server className="h-3 w-3" />
: {engine.concurrentLimit} : {engine.concurrentLimit}
</span> </span>
{isTyphoon && ( {isAiPowered && (
<> <>
<span className="flex items-center gap-1 text-purple-600 dark:text-purple-400"> <span className="flex items-center gap-1 text-purple-600 dark:text-purple-400">
<Cpu className="h-3 w-3" /> <Cpu className="h-3 w-3" />
@@ -133,9 +133,9 @@ export default function OcrSandboxPromptManager() {
// 2-step flow states // 2-step flow states
const [sandboxStep, setSandboxStep] = useState<'ocr' | 'ai'>('ocr'); const [sandboxStep, setSandboxStep] = useState<'ocr' | 'ai'>('ocr');
const [selectedOcrEngine, setSelectedOcrEngine] = useState<string>('auto'); const [selectedOcrEngine, setSelectedOcrEngine] = useState<string>('auto');
const [typhoonTemperature, setTyphoonTemperature] = useState<number>(0.1); const [ocrTemperature, setOcrTemperature] = useState<number>(0.1);
const [typhoonTopP, setTyphoonTopP] = useState<number>(0.1); const [ocrTopP, setOcrTopP] = useState<number>(0.1);
const [typhoonRepeatPenalty, setTyphoonRepeatPenalty] = useState<number>(1.1); const [ocrRepeatPenalty, setOcrRepeatPenalty] = useState<number>(1.1);
const { data: ocrEnginesData } = useQuery<OcrEngineResponse[]>({ const { data: ocrEnginesData } = useQuery<OcrEngineResponse[]>({
queryKey: ['ocr-engines'], queryKey: ['ocr-engines'],
queryFn: () => adminAiService.getOcrEngines(), queryFn: () => adminAiService.getOcrEngines(),
@@ -250,9 +250,9 @@ export default function OcrSandboxPromptManager() {
if (!ocrEnginesData) return base; if (!ocrEnginesData) return base;
const mapped = ocrEnginesData.map((e: OcrEngineResponse) => { const mapped = ocrEnginesData.map((e: OcrEngineResponse) => {
const value = const value =
e.engineType === 'tesseract' e.engineType === 'fast_path'
? 'tesseract' ? 'auto'
: e.engineType === 'typhoon_ocr' : e.engineType === 'np_dms_ocr'
? 'np-dms-ocr' ? 'np-dms-ocr'
: e.engineType; : e.engineType;
const vramLabel = const vramLabel =
@@ -354,13 +354,13 @@ export default function OcrSandboxPromptManager() {
try { try {
resetSandbox(); resetSandbox();
setSandboxStep('ocr'); setSandboxStep('ocr');
const typhoonOptions = selectedOcrEngine === 'np-dms-ocr' const ocrOptions = selectedOcrEngine === 'np-dms-ocr'
? { temperature: typhoonTemperature, topP: typhoonTopP, repeatPenalty: typhoonRepeatPenalty } ? { temperature: ocrTemperature, topP: ocrTopP, repeatPenalty: ocrRepeatPenalty }
: undefined; : undefined;
const { requestPublicId } = await adminAiService.submitSandboxOcr( const { requestPublicId } = await adminAiService.submitSandboxOcr(
ocrFile, ocrFile,
selectedOcrEngine, selectedOcrEngine,
typhoonOptions ocrOptions
); );
toast.success(t('ai.prompt.uploadSuccess')); toast.success(t('ai.prompt.uploadSuccess'));
// Poll for the OCR result // Poll for the OCR result
@@ -429,9 +429,9 @@ export default function OcrSandboxPromptManager() {
setOcrResult(null); setOcrResult(null);
setSelectedPromptVersion(undefined); setSelectedPromptVersion(undefined);
setSelectedOcrEngine('auto'); setSelectedOcrEngine('auto');
setTyphoonTemperature(0.1); setOcrTemperature(0.1);
setTyphoonTopP(0.1); setOcrTopP(0.1);
setTyphoonRepeatPenalty(1.1); setOcrRepeatPenalty(1.1);
setOcrFile(null); setOcrFile(null);
setSelectedProjectPublicId(''); setSelectedProjectPublicId('');
setSelectedContractPublicId(''); setSelectedContractPublicId('');
@@ -677,37 +677,37 @@ export default function OcrSandboxPromptManager() {
</div> </div>
{selectedOcrEngine === 'np-dms-ocr' && ( {selectedOcrEngine === 'np-dms-ocr' && (
<div className="space-y-3 rounded-md border border-dashed border-amber-500/30 bg-amber-500/5 p-3"> <div className="space-y-3 rounded-md border border-dashed border-amber-500/30 bg-amber-500/5 p-3">
<p className="text-xs font-medium text-amber-600 dark:text-amber-400">Typhoon OCR Options <span className="font-normal text-muted-foreground">(override Modelfile defaults)</span></p> <p className="text-xs font-medium text-amber-600 dark:text-amber-400">OCR Options <span className="font-normal text-muted-foreground">(override Modelfile defaults)</span></p>
<div className="space-y-1"> <div className="space-y-1">
<div className="flex justify-between text-xs"> <div className="flex justify-between text-xs">
<label>Temperature</label> <label>Temperature</label>
<span className="font-mono text-muted-foreground">{typhoonTemperature.toFixed(2)}</span> <span className="font-mono text-muted-foreground">{ocrTemperature.toFixed(2)}</span>
</div> </div>
<input type="range" min={0} max={1} step={0.01} <input type="range" min={0} max={1} step={0.01}
value={typhoonTemperature} value={ocrTemperature}
onChange={(e) => setTyphoonTemperature(parseFloat(e.target.value))} onChange={(e) => setOcrTemperature(parseFloat(e.target.value))}
className="w-full h-1.5 accent-amber-500" className="w-full h-1.5 accent-amber-500"
/> />
</div> </div>
<div className="space-y-1"> <div className="space-y-1">
<div className="flex justify-between text-xs"> <div className="flex justify-between text-xs">
<label>Top-P</label> <label>Top-P</label>
<span className="font-mono text-muted-foreground">{typhoonTopP.toFixed(2)}</span> <span className="font-mono text-muted-foreground">{ocrTopP.toFixed(2)}</span>
</div> </div>
<input type="range" min={0} max={1} step={0.01} <input type="range" min={0} max={1} step={0.01}
value={typhoonTopP} value={ocrTopP}
onChange={(e) => setTyphoonTopP(parseFloat(e.target.value))} onChange={(e) => setOcrTopP(parseFloat(e.target.value))}
className="w-full h-1.5 accent-amber-500" className="w-full h-1.5 accent-amber-500"
/> />
</div> </div>
<div className="space-y-1"> <div className="space-y-1">
<div className="flex justify-between text-xs"> <div className="flex justify-between text-xs">
<label>Repeat Penalty</label> <label>Repeat Penalty</label>
<span className="font-mono text-muted-foreground">{typhoonRepeatPenalty.toFixed(2)}</span> <span className="font-mono text-muted-foreground">{ocrRepeatPenalty.toFixed(2)}</span>
</div> </div>
<input type="range" min={1} max={2} step={0.01} <input type="range" min={1} max={2} step={0.01}
value={typhoonRepeatPenalty} value={ocrRepeatPenalty}
onChange={(e) => setTyphoonRepeatPenalty(parseFloat(e.target.value))} onChange={(e) => setOcrRepeatPenalty(parseFloat(e.target.value))}
className="w-full h-1.5 accent-amber-500" className="w-full h-1.5 accent-amber-500"
/> />
</div> </div>
@@ -864,14 +864,14 @@ export default function OcrSandboxPromptManager() {
{ocrResult.engineUsed === 'np-dms-ocr' {ocrResult.engineUsed === 'np-dms-ocr'
? 'np-dms-ocr' ? 'np-dms-ocr'
: ocrResult.ocrUsed : ocrResult.ocrUsed
? 'Tesseract' ? 'Fast Path (OCR)'
: 'Fast Path (Text Layer)'} : 'Fast Path (Text Layer)'}
</Badge> </Badge>
</CardHeader> </CardHeader>
<CardContent className="pt-4"> <CardContent className="pt-4">
{ocrResult.fallbackUsed && ( {ocrResult.fallbackUsed && (
<div className="mb-3 rounded-md border border-amber-500/20 bg-amber-500/5 px-3 py-2 text-xs text-amber-600 dark:text-amber-400"> <div className="mb-3 rounded-md border border-amber-500/20 bg-amber-500/5 px-3 py-2 text-xs text-amber-600 dark:text-amber-400">
np-dms-ocr unavailable. Fallback to Tesseract was used for this run. np-dms-ocr unavailable. Fallback to Fast Path was used for this run.
</div> </div>
)} )}
<div className="relative rounded-md bg-muted p-4 font-mono text-xs overflow-auto max-h-[200px] border border-border/10"> <div className="relative rounded-md bg-muted p-4 font-mono text-xs overflow-auto max-h-[200px] border border-border/10">
@@ -1,7 +1,7 @@
// File: frontend/components/admin/ai/SandboxTestArea.tsx // File: frontend/components/admin/ai/SandboxTestArea.tsx
// Change Log: // Change Log:
// - 2026-06-15: Created SandboxTestArea component with UI elements for 3-step sandbox testing (T038) // - 2026-06-15: Created SandboxTestArea component with UI elements for 3-step sandbox testing (T038)
// - 2026-06-17: Removed Tesseract from the OCR Engine dropdown per ADR-035 (uses Typhoon OCR via Ollama) // - 2026-06-17: Removed Tesseract from the OCR Engine dropdown per ADR-035 (uses np-dms-ocr via Ollama)
import React, { useState } from 'react'; import React, { useState } from 'react';
import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card'; import { Card, CardContent, CardHeader, CardTitle, CardDescription } from '@/components/ui/card';
@@ -254,8 +254,8 @@ export default function SandboxTestArea({
<SelectValue placeholder="เลือกเอนจิน..." /> <SelectValue placeholder="เลือกเอนจิน..." />
</SelectTrigger> </SelectTrigger>
<SelectContent> <SelectContent>
<SelectItem value="auto" className="text-xs">Auto (Fast Path / Typhoon OCR)</SelectItem> <SelectItem value="auto" className="text-xs">Auto (Fast Path / np-dms-ocr)</SelectItem>
<SelectItem value="np-dms-ocr" className="text-xs">Typhoon OCR (AI Vision)</SelectItem> <SelectItem value="np-dms-ocr" className="text-xs">np-dms-ocr (AI Vision)</SelectItem>
</SelectContent> </SelectContent>
</Select> </Select>
</div> </div>
@@ -28,16 +28,16 @@ vi.mock('sonner', () => ({
const mockEngines = [ const mockEngines = [
{ {
engineId: 'engine-1', engineId: 'engine-1',
engineName: 'Tesseract OCR', engineName: 'Fast Path (PyMuPDF)',
engineType: 'tesseract', engineType: 'fast_path',
isCurrentActive: true, isCurrentActive: true,
concurrentLimit: 4, concurrentLimit: 10,
vramRequirementMB: 0, vramRequirementMB: 0,
}, },
{ {
engineId: 'engine-2', engineId: 'engine-2',
engineName: 'Typhoon OCR', engineName: 'np-dms-ocr',
engineType: 'typhoon_ocr', engineType: 'np_dms_ocr',
isCurrentActive: false, isCurrentActive: false,
concurrentLimit: 1, concurrentLimit: 1,
vramRequirementMB: 4096, vramRequirementMB: 4096,
@@ -67,10 +67,10 @@ describe('OcrEngineSelector', () => {
expect(screen.getByText('ระบบจัดการ OCR Engine')).toBeInTheDocument(); expect(screen.getByText('ระบบจัดการ OCR Engine')).toBeInTheDocument();
}); });
expect(screen.getByText('Tesseract OCR')).toBeInTheDocument(); expect(screen.getByText('Fast Path (PyMuPDF)')).toBeInTheDocument();
expect(screen.getByText('Typhoon OCR')).toBeInTheDocument(); expect(screen.getByText('np-dms-ocr')).toBeInTheDocument();
expect(screen.getByText('กำลังใช้งาน')).toBeInTheDocument(); // Badge for active engine expect(screen.getByText('กำลังใช้งาน')).toBeInTheDocument(); // Badge for active engine
expect(screen.getByText('AI Powered')).toBeInTheDocument(); // Badge for typhoon expect(screen.getByText('AI Powered')).toBeInTheDocument(); // Badge for np-dms-ocr
}); });
it('calls selectOcrEngine and shows success toast when changing engine', async () => { it('calls selectOcrEngine and shows success toast when changing engine', async () => {
@@ -92,7 +92,7 @@ describe('OcrEngineSelector', () => {
}); });
expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('engine-2'); expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('engine-2');
expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น Typhoon OCR สำเร็จ'); expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น np-dms-ocr สำเร็จ');
// It should fetch engines again // It should fetch engines again
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2); expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2);
@@ -18,17 +18,17 @@ vi.mock('@/lib/services/admin-ai.service', () => ({
const engines: OcrEngineResponse[] = [ const engines: OcrEngineResponse[] = [
{ {
engineId: 'tesseract', engineId: 'fast-path',
engineName: 'Tesseract OCR', engineName: 'Fast Path (PyMuPDF)',
engineType: 'tesseract', engineType: 'fast_path',
isCurrentActive: true, isCurrentActive: true,
concurrentLimit: 4, concurrentLimit: 10,
vramRequirementMB: 0, vramRequirementMB: 0,
}, },
{ {
engineId: 'typhoon', engineId: 'np-dms-ocr',
engineName: 'Typhoon OCR', engineName: 'np-dms-ocr',
engineType: 'typhoon_ocr', engineType: 'np_dms_ocr',
isCurrentActive: false, isCurrentActive: false,
concurrentLimit: 1, concurrentLimit: 1,
vramRequirementMB: 6144, vramRequirementMB: 6144,
@@ -44,8 +44,8 @@ describe('OcrEngineSelector', () => {
it('renders OCR engine data from admin service', async () => { it('renders OCR engine data from admin service', async () => {
render(<OcrEngineSelector />); render(<OcrEngineSelector />);
expect(await screen.findByText('Tesseract OCR')).toBeInTheDocument(); expect(await screen.findByText('Fast Path (PyMuPDF)')).toBeInTheDocument();
expect(screen.getByText('Typhoon OCR')).toBeInTheDocument(); expect(screen.getByText('np-dms-ocr')).toBeInTheDocument();
expect(screen.getByText('AI Powered')).toBeInTheDocument(); expect(screen.getByText('AI Powered')).toBeInTheDocument();
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(1); expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(1);
}); });
@@ -55,9 +55,9 @@ describe('OcrEngineSelector', () => {
render(<OcrEngineSelector />); render(<OcrEngineSelector />);
await user.click(await screen.findByRole('button', { name: 'สลับใช้งาน' })); await user.click(await screen.findByRole('button', { name: 'สลับใช้งาน' }));
await waitFor(() => { await waitFor(() => {
expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('typhoon'); expect(adminAiService.selectOcrEngine).toHaveBeenCalledWith('np-dms-ocr');
}); });
expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น Typhoon OCR สำเร็จ'); expect(toast.success).toHaveBeenCalledWith('เปลี่ยนเอนจิน OCR หลักเป็น np-dms-ocr สำเร็จ');
expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2); expect(adminAiService.getOcrEngines).toHaveBeenCalledTimes(2);
}); });
@@ -100,7 +100,7 @@ vi.mock('@/lib/services/admin-ai.service', () => ({
adminAiService: { adminAiService: {
getOcrEngines: vi.fn().mockResolvedValue([ getOcrEngines: vi.fn().mockResolvedValue([
{ {
engineType: 'typhoon_ocr', engineType: 'np_dms_ocr',
engineName: 'np-dms-ocr', engineName: 'np-dms-ocr',
vramRequirementMB: 4096, vramRequirementMB: 4096,
isCurrentActive: true, isCurrentActive: true,
+7 -7
@@ -281,19 +281,19 @@ export const adminAiService = {
submitSandboxOcr: async ( submitSandboxOcr: async (
file: File, file: File,
engineType: string = 'auto', engineType: string = 'auto',
typhoonOptions?: { temperature?: number; topP?: number; repeatPenalty?: number } ocrOptions?: { temperature?: number; topP?: number; repeatPenalty?: number }
): Promise<{ requestPublicId: string; jobId: string; status: string }> => { ): Promise<{ requestPublicId: string; jobId: string; status: string }> => {
const formData = new FormData(); const formData = new FormData();
formData.append('file', file); formData.append('file', file);
formData.append('engineType', engineType); formData.append('engineType', engineType);
if (typhoonOptions?.temperature !== undefined) { if (ocrOptions?.temperature !== undefined) {
formData.append('temperature', String(typhoonOptions.temperature)); formData.append('temperature', String(ocrOptions.temperature));
} }
if (typhoonOptions?.topP !== undefined) { if (ocrOptions?.topP !== undefined) {
formData.append('topP', String(typhoonOptions.topP)); formData.append('topP', String(ocrOptions.topP));
} }
if (typhoonOptions?.repeatPenalty !== undefined) { if (ocrOptions?.repeatPenalty !== undefined) {
formData.append('repeatPenalty', String(typhoonOptions.repeatPenalty)); formData.append('repeatPenalty', String(ocrOptions.repeatPenalty));
} }
const { data } = await api.post('/ai/admin/sandbox/ocr', formData, { const { data } = await api.post('/ai/admin/sandbox/ocr', formData, {
headers: { headers: {
+41 -3
@@ -27,7 +27,7 @@
> These decisions **cannot be changed** without Explicit Approval > These decisions **cannot be changed** without Explicit Approval
| ID | Decision | ADR | | ID | Decision | ADR |
| --- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | | --- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------ |
| D1 | n8n = Migration Phase orchestrator only; never run the New Correspondence pipeline through n8n | ADR-023A | | D1 | n8n = Migration Phase orchestrator only; never run the New Correspondence pipeline through n8n | ADR-023A |
| D2 | New Correspondence → BullMQ `ai-realtime` queue directly (not through n8n) | ADR-023A | | D2 | New Correspondence → BullMQ `ai-realtime` queue directly (not through n8n) | ADR-023A |
| D3 | n8n must call only `POST /api/ai/jobs` (DMS Backend); never call Ollama/Qdrant directly | ADR-023A | | D3 | n8n must call only `POST /api/ai/jobs` (DMS Backend); never call Ollama/Qdrant directly | ADR-023A |
@@ -48,18 +48,26 @@
| D18 | Deploy script must check ClamAV health status before recreation; if healthy, recreate only backend/frontend (skip the 5-minute healthcheck delay) | Session 2026-06-19 | | D18 | Deploy script must check ClamAV health status before recreation; if healthy, recreate only backend/frontend (skip the 5-minute healthcheck delay) | Session 2026-06-19 |
| D19 | CI timeout must be at least 30 minutes to accommodate ClamAV startup when the full stack has to be recreated | Session 2026-06-19 | | D19 | CI timeout must be at least 30 minutes to accommodate ClamAV startup when the full stack has to be recreated | Session 2026-06-19 |
| D20 | AI Admin frontend services must normalize API response envelopes that may nest `data` before rendering; VRAM `totalVRAMMB = 0` means unknown capacity, not OOM Guard | Session 2026-06-19 | | D20 | AI Admin frontend services must normalize API response envelopes that may nest `data` before rendering; VRAM `totalVRAMMB = 0` means unknown capacity, not OOM Guard | Session 2026-06-19 |
| D21 | OCR Sidecar = Pure Compute Worker; orchestration/params live in existing backend services (reject PromptBuilderService, OcrNoiseFilterService, OcrOrchestratorService) | ADR-040 D1 |
| D22 | Wire `calculate_ocr_residency()` into `process_ocr`; keep_alive is a lazy resource param (ADR-036 Gap-2), never a fixed value | ADR-040 D3 |
| D23 | Retain vram_monitor + CPU fallback for `/embed`, `/rerank`; do not force BGE + Reranker to stay GPU-resident, respecting LLM-First GPU Ownership + CPU Fallback Retrieval | ADR-040 D4 |
| D24 | Remove X-API-Key from the sidecar; auth = network isolation (supersedes ADR-033 §7). Sequencing: remove only after the ADR-041 cutover (single Docker host) | ADR-040 D5 |
| D25 | Server Consolidation: co-locate all services on a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) and retire Desk-5439 | ADR-041 D1 |
| D26 | ASUSTOR (192.168.10.9) = Primary NAS (CIFS share np-dms-as); QNAP = Backup server only | ADR-041 D2 |
| D27 | Docker-internal network only for sidecar/Ollama; enables ADR-040 D5 network-only auth and QNAP backend → new host consolidation | ADR-041 D3 |
| D28 | Canonical naming enforced: `np-dms-ai` (LLM), `np-dms-ocr` (OCR), `fast-path` (PyMuPDF); remove `typhoon-llm`, `tesseract`, `Typhoon OCR` from code; `OCR_SIDECAR_API_KEY` mandatory (no default); backend does not send `keep_alive` (the sidecar computes it) | ADR-040/034 |
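D23 keeps `/embed` and `/rerank` free to run on CPU whenever the GPU belongs to the LLM. A minimal sketch of that device choice, assuming the sidecar compares measured VRAM headroom (MB) against `VRAM_HEADROOM_THRESHOLD_MB` (default `3000.0` in the sidecar README) and treats an unknown reading as "use CPU"; `choose_retrieval_device` is a hypothetical name, not the sidecar's actual function.

```python
import os
from typing import Optional

DEFAULT_HEADROOM_THRESHOLD_MB = 3000.0


def choose_retrieval_device(
    headroom_mb: Optional[float],
    threshold_mb: Optional[float] = None,
) -> str:
    """Return "cuda" only when measured VRAM headroom clears the threshold.

    Hypothetical helper: BGE-M3 / Reranker must not claim GPU memory the
    LLM needs, so any uncertain reading falls back to CPU.
    """
    if threshold_mb is None:
        threshold_mb = float(
            os.environ.get("VRAM_HEADROOM_THRESHOLD_MB", DEFAULT_HEADROOM_THRESHOLD_MB)
        )
    # Unknown headroom (no GPU, probe error, etc.) is treated as "not enough".
    if headroom_mb is None:
        return "cpu"
    return "cuda" if headroom_mb >= threshold_mb else "cpu"


print(choose_retrieval_device(8000.0, 3000.0))  # cuda
print(choose_retrieval_device(1200.0, 3000.0))  # cpu
print(choose_retrieval_device(None, 3000.0))    # cpu
```

Whether the boundary is `>=` or `>` is an assumption here; the point is that the GPU path is opt-in and CPU is the safe default.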
## Environment & Services ## Environment & Services
| Service | Local URL / Port | Production | Notes | | Service | Local URL / Port | Production | Notes |
| ---------------- | ----------------------------- | --------------------------------- | ------------------------------------------------------------------------------------------ | | ---------------- | ----------------------------- | --------------------------------- | -------------------------------------------------------------------------------------------------- |
| **Backend API** | `http://localhost:3001` | `https://backend.np-dms.work/api` | NestJS — port 3000 in container, exposed via Nginx Proxy Manager | | **Backend API** | `http://localhost:3001` | `https://backend.np-dms.work/api` | NestJS — port 3000 in container, exposed via Nginx Proxy Manager |
| **Frontend** | `http://localhost:3000` | QNAP `192.168.10.8` | Next.js | | **Frontend** | `http://localhost:3000` | QNAP `192.168.10.8` | Next.js |
| **MariaDB** | `localhost:3307` | QNAP internal | DB: `lcbp3`, root via docker | | **MariaDB** | `localhost:3307` | QNAP internal | DB: `lcbp3`, root via docker |
| **Redis** | `localhost:6379` | QNAP internal | BullMQ + session store | | **Redis** | `localhost:6379` | QNAP internal | BullMQ + session store |
| **Ollama** | `http://192.168.10.100:11434` | Admin Desktop (Desk-5439) | typhoon2.5-np-dms:latest (main) + typhoon-np-dms-ocr:latest (OCR, keep_alive:0) | | **Ollama** | `http://192.168.10.100:11434` | Admin Desktop (Desk-5439) | typhoon2.5-np-dms:latest (main) + typhoon-np-dms-ocr:latest (OCR, keep_alive:0) |
| **Qdrant** | `http://localhost:6333` | Admin Desktop (Desk-5439) | Vector DB — requires projectPublicId | | **Qdrant** | `http://localhost:6333` | Admin Desktop (Desk-5439) | Vector DB — requires projectPublicId |
| **OCR Sidecar** | `http://192.168.10.100:8765` | Admin Desktop (Desk-5439) | Tesseract (fallback) / Typhoon OCR-3B (primary) + BGE-M3 `/embed` + BGE-Reranker `/rerank` | | **OCR Sidecar** | `http://192.168.10.100:8765` | Admin Desktop (Desk-5439) | np-dms-ocr (Ollama) + BGE-M3 `/embed` + BGE-Reranker `/rerank`; async I/O, lifespan, no /normalize |
| **Gitea** | `https://git.np-dms.work` | QNAP `192.168.10.8` | Source + CI/CD | | **Gitea** | `https://git.np-dms.work` | QNAP `192.168.10.8` | Source + CI/CD |
| **Gitea Runner** | ASUSTOR `192.168.10.9` | — | CI runner | | **Gitea Runner** | ASUSTOR `192.168.10.9` | — | CI runner |
@@ -75,6 +83,36 @@ QDRANT_URL
## Next Session Focus ## Next Session Focus
### OCR Backend Cleanup (Session 2026-06-20) ✅ COMPLETE
- [x] **P1-1:** Remove `keep_alive` from backend form data
- [x] **P1-2:** Remove hardcoded API key defaults (ocr.service.ts + sandbox-ocr-engine.service.ts)
- [x] **P2-1:** Align env var `OCR_SIDECAR_API_KEY` in `.env.example`
- [x] **P2-2:** Fix OCR URL + remove `THAI_PREPROCESS_URL` from `.env.example`
- [x] **P2-5:** Bump Dockerfile to `python:3.11-slim`
- [x] **P3-1/P3-2:** Wrap sync VRAM calls in `asyncio.to_thread()`
- [x] **Rename typhoon-llm → np-dms-ai:** created `np-dms-ai.processor.ts`, removed `typhoon-llm.processor.ts`, updated `ai.module.ts`
- [x] **Tesseract cleanup:** enum, entity, controller, service, audit log, tests
- [x] **User renamed:** `typhoon-ocr.processor.ts` → `np-dms-ocr-processor.ts`
- [x] **Rename TyphoonOcr → NpDmsOcr:** `TyphoonOcrProcessor` → `NpDmsOcrProcessor`, `QUEUE_TYPHOON_OCR` → `QUEUE_NP_DMS_OCR`, `OcrTyphoonOptions` → `OcrNpDmsOptions`, `typhoonOptions` → `ocrOptions` (backend: 7 files + 3 tests)
- [x] **Frontend cleanup:** `isTyphoon` → `isAiPowered`, state vars `typhoon*` → `ocr*`, Tesseract mocks → Fast Path, dead `typhoon_ocr` checks removed, `page.tsx` model name constants
- [ ] **Verify:** `tsc --noEmit` once all renames are complete (backend + frontend)
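Item P3-1/P3-2 moves blocking VRAM probes off the event loop. A minimal sketch of the pattern, assuming the probe is an ordinary synchronous function; `read_vram_headroom_mb` is a hypothetical stand-in for the sidecar's `get_vram_headroom`, with `time.sleep` simulating the slow driver call.

```python
import asyncio
import time


def read_vram_headroom_mb() -> float:
    """Hypothetical blocking VRAM probe (e.g. an NVML / nvidia-smi query)."""
    time.sleep(0.05)  # simulate a slow synchronous driver call
    return 4096.0


async def handle_request(request_id: int) -> tuple[int, float]:
    # asyncio.to_thread runs the blocking probe in the default thread pool,
    # so the event loop keeps serving other requests while it waits.
    headroom = await asyncio.to_thread(read_vram_headroom_mb)
    return request_id, headroom


async def main() -> list[tuple[int, float]]:
    # Ten concurrent requests: with to_thread the probes overlap in worker
    # threads instead of stalling the single event-loop thread one by one.
    return await asyncio.gather(*(handle_request(i) for i in range(10)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

Calling `read_vram_headroom_mb()` directly inside an `async def` would still work, but every request would freeze the loop for the duration of the probe.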
### ADR-040/041 Implementation
- [x] **OCR Sidecar Refactor (Speckit-140):** Phases 1-6, 8, 9 complete (T001-T046, T054-T063)
- [x] Phase 1-2: Setup + Foundational (T001-T006)
- [x] Phase 3: US1 Security Hardening (T007-T015) — path traversal, API key fail-fast
- [x] Phase 4: US2 GPU Resource Management (T016-T025) — residency wiring, CPU fallback
- [x] Phase 5: US3 Parameter Governance (T026-T040) — backend param resolution
- [x] Phase 6: US4 Async I/O (T041-T046) — async def, lifespan context manager, AsyncClient
- [x] Phase 8: Remove /normalize endpoint (T054-T055)
- [x] Phase 9: Polish & validation (T056-T063) — Dockerfile, docker-compose, README, quickstart
- [ ] Phase 7: US5 Network Isolation Auth (T047-T053) — BLOCKED until ADR-041 cutover
- [ ] **ADR-041 Infrastructure:** Provision new host, mount ASUSTOR CIFS, deploy docker-compose
- [ ] **ADR-040 Auth Removal:** Remove X-API-Key from sidecar + backend (T048-T053) — **ONLY AFTER ADR-041 cutover**
- [ ] **ADR-041 Cutover:** Migrate DB/ES, update DNS, smoke tests, retire Desk-5439
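Phase 6 replaces `@app.on_event("startup")` with a lifespan context manager, which FastAPI accepts via `FastAPI(lifespan=...)`. The sketch below shows only the context-manager shape using the standard library; `FakeModel`, `load_models`, and the `state` dict are hypothetical stand-ins for the BGE models and the shared `httpx.AsyncClient`, not the sidecar's code.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeModel:
    """Hypothetical placeholder for BGEM3FlagModel / FlagReranker."""

    def __init__(self, name: str) -> None:
        self.name = name


def load_models() -> tuple[FakeModel, FakeModel]:
    # In the real sidecar this is slow, blocking work (download + init).
    return FakeModel("bge-m3"), FakeModel("bge-reranker-large")


state: dict = {}


@asynccontextmanager
async def lifespan(app):
    # Startup: everything before `yield` runs once at boot. Blocking model
    # loading goes to a worker thread so the event loop stays responsive.
    state["bge"], state["reranker"] = await asyncio.to_thread(load_models)
    state["client_open"] = True  # stands in for creating httpx.AsyncClient(...)
    try:
        yield
    finally:
        # Shutdown: release shared resources (e.g. `await client.aclose()`).
        state["client_open"] = False


async def demo() -> list[bool]:
    seen = []
    async with lifespan(app=None):
        seen.append(state["client_open"])  # inside: resources are ready
    seen.append(state["client_open"])      # after exit: resources released
    return seen
```

The `try/finally` is a design choice: shutdown cleanup still runs if the app exits with an error.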
### N8N Migration & E2E Testing ### N8N Migration & E2E Testing
- [ ] **Import `n8n.workflow.v2.json`** เข้า n8n UI และทดสอบ End-to-End - [ ] **Import `n8n.workflow.v2.json`** เข้า n8n UI และทดสอบ End-to-End
@@ -6,3 +6,7 @@
QNAP_SMB_USER=your_qnap_username QNAP_SMB_USER=your_qnap_username
QNAP_SMB_PASS=your_qnap_password QNAP_SMB_PASS=your_qnap_password
# OCR Sidecar security and storage boundary
OCR_SIDECAR_API_KEY=change-me-sidecar-api-key
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
@@ -9,15 +9,17 @@
# Container runs on CPU only; no CUDA/GPU required inside the container # Container runs on CPU only; no CUDA/GPU required inside the container
# - 2026-06-11: Added typhoon-ocr to requirements.txt; poppler-utils was already present (used by prepare_ocr_messages) # - 2026-06-11: Added typhoon-ocr to requirements.txt; poppler-utils was already present (used by prepare_ocr_messages)
# - 2026-06-11: Removed tesseract-ocr, tesseract-ocr-tha, tesseract-ocr-eng, libsm6, libxext6, libxrender1, libfontconfig1, libx11-6; Tesseract is no longer used # - 2026-06-11: Removed tesseract-ocr, tesseract-ocr-tha, tesseract-ocr-eng, libsm6, libxext6, libxrender1, libfontconfig1, libx11-6; Tesseract is no longer used
# - 2026-06-20: ADR-040 Phase 6+8: added curl for HEALTHCHECK; reduced start_period to 10s (async startup no longer blocks)
FROM python:3.10-slim FROM python:3.11-slim
# Install system dependencies for PDF processing and PyMuPDF # Install system dependencies for PDF processing, PyMuPDF, and curl (for the healthcheck)
RUN apt-get update && apt-get install -y --no-install-recommends \ RUN apt-get update && apt-get install -y --no-install-recommends \
libglib2.0-0 \ libglib2.0-0 \
libgl1 \ libgl1 \
libgomp1 \ libgomp1 \
poppler-utils \ poppler-utils \
curl \
&& rm -rf /var/lib/apt/lists/* && rm -rf /var/lib/apt/lists/*
WORKDIR /app WORKDIR /app
@@ -0,0 +1,115 @@
# OCR Sidecar — Desk-5439
HTTP API server for extracting text from PDFs via np-dms-ocr (Ollama); runs on Desk-5439 per ADR-023A/ADR-040.
## Architecture
```
Backend (QNAP) → POST /ocr-upload → OCR Sidecar (Desk-5439:8765)
        ↓
PyMuPDF (fast-path: chars > 100)
        ↓ (if chars ≤ 100)
prepare_ocr_messages (typhoon_ocr)
  + poppler/pdftoppm (PDF → image)
        ↓
np-dms-ocr via Ollama /v1/chat/completions
        ↓
JSON → natural_text (Markdown)
```
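The fast-path decision above can be sketched as a small routing function. This assumes the check is made per page and counts non-whitespace characters only; both are assumptions, and `route_pages` is a hypothetical helper rather than the sidecar's code.

```python
import os
from typing import List, Optional

DEFAULT_CHAR_THRESHOLD = 100


def route_pages(page_texts: List[str], threshold: Optional[int] = None) -> List[str]:
    """Decide, per page, whether the PyMuPDF text layer is good enough."""
    if threshold is None:
        threshold = int(os.environ.get("OCR_CHAR_THRESHOLD", DEFAULT_CHAR_THRESHOLD))
    routes = []
    for text in page_texts:
        # Count visible characters only, so a whitespace-only text layer
        # (common in scanned PDFs) does not count as "has text".
        char_count = len("".join(text.split()))
        routes.append("fast-path" if char_count > threshold else "np-dms-ocr")
    return routes


# With PyMuPDF the inputs could come from something like:
#   doc = fitz.open(pdf_path); page_texts = [page.get_text() for page in doc]
print(route_pages(["x" * 150, "   \n ", "scanned page"], threshold=100))
# ['fast-path', 'np-dms-ocr', 'np-dms-ocr']
```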
## Endpoints
| Endpoint | Method | Auth | Purpose |
|----------|--------|------|---------|
| `/health` | GET | None | Check sidecar status |
| `/ocr` | POST | X-API-Key | OCR from a file path (used with a shared volume mount) |
| `/ocr-upload` | POST | X-API-Key | OCR from a multipart file upload |
| `/embed` | POST | X-API-Key | BGE-M3 embedding (Dense + Sparse) with CPU fallback |
| `/rerank` | POST | X-API-Key | BGE-Reranker-Large chunk re-ranker with CPU fallback |
**Removed endpoints:**
- `POST /normalize`: removed per ADR-040 Phase 8 (no consumers)
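The `/ocr` endpoint accepts a path, and `OCR_SIDECAR_UPLOAD_BASE` is the whitelist for it. A minimal sketch of that canonicalization check, assuming relative paths are resolved against the base; `resolve_upload_path` is a hypothetical name and the sidecar's actual check may differ.

```python
from pathlib import Path


def resolve_upload_path(requested: str, upload_base: str = "/mnt/uploads") -> Path:
    """Canonicalize `requested` and reject anything outside `upload_base`.

    Path.resolve() collapses `..` segments and follows symlinks, so the
    containment check runs on the real location, not on the raw string.
    """
    base = Path(upload_base).resolve()
    # An absolute `requested` replaces `base` entirely, then gets checked too.
    candidate = (base / requested).resolve()
    # is_relative_to (Python 3.9+) compares path components, so a sibling
    # such as "/mnt/uploads-evil" is not mistaken for "/mnt/uploads".
    if not candidate.is_relative_to(base):
        raise PermissionError(f"path escapes upload base: {requested!r}")
    return candidate
```

A plain `str.startswith` prefix check would accept the `-evil` sibling, which is why the sketch compares path components instead.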
## Environment Variables
| Variable | Default | Purpose |
|----------|---------|---------|
| `OCR_SIDECAR_API_KEY` | (required) | API key for authentication (Phase 1) |
| `OCR_SIDECAR_UPLOAD_BASE` | `/mnt/uploads` | Base path whitelist for path traversal protection |
| `OLLAMA_API_URL` | `http://host.docker.internal:11434` | Ollama API URL |
| `OCR_MODEL` | `np-dms-ocr:latest` | Name of the OCR model in Ollama |
| `OCR_TIMEOUT` | `360` | Timeout per request, in seconds |
| `OCR_CHAR_THRESHOLD` | `100` | Fast-path threshold (chars > 100 = use the text layer directly) |
| `OCR_MAX_PAGES` | `0` | Maximum number of pages (0 = all pages) |
| `OCR_ACTIVE_PROFILE` | (optional) | Profile name in `ai_execution_profiles` |
| `VRAM_HEADROOM_THRESHOLD_MB` | `3000.0` | VRAM headroom threshold (MB) for CPU fallback |
| `RETRIEVAL_TIMEOUT_SECONDS` | `30.0` | Timeout for /embed and /rerank |
| `MAX_SYSTEM_PROMPT_LENGTH` | `10000` | Maximum length of systemPrompt |
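A minimal sketch of reading this table into one typed object, assuming the defaults listed above and the fail-fast rule for a missing API key; `SidecarSettings` is a hypothetical name, and the real `app.py` may read these variables individually.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SidecarSettings:
    api_key: str
    upload_base: str
    ollama_api_url: str
    ocr_model: str
    ocr_timeout: float
    char_threshold: int
    max_pages: int  # 0 means "all pages"
    vram_headroom_threshold_mb: float

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SidecarSettings":
        env = os.environ if env is None else env
        api_key = env.get("OCR_SIDECAR_API_KEY", "").strip()
        if not api_key:
            # Fail fast at startup instead of running with a guessable default.
            raise RuntimeError("OCR_SIDECAR_API_KEY is required")
        return cls(
            api_key=api_key,
            upload_base=env.get("OCR_SIDECAR_UPLOAD_BASE", "/mnt/uploads"),
            ollama_api_url=env.get("OLLAMA_API_URL", "http://host.docker.internal:11434"),
            ocr_model=env.get("OCR_MODEL", "np-dms-ocr:latest"),
            ocr_timeout=float(env.get("OCR_TIMEOUT", "360")),
            char_threshold=int(env.get("OCR_CHAR_THRESHOLD", "100")),
            max_pages=int(env.get("OCR_MAX_PAGES", "0")),
            vram_headroom_threshold_mb=float(env.get("VRAM_HEADROOM_THRESHOLD_MB", "3000.0")),
        )
```

Passing a plain dict to `from_env` keeps the loader testable without touching the process environment.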
## Deployment
```bash
# 1. Copy .env.example to .env and fill in the values
cp .env.example .env
# Set OCR_SIDECAR_API_KEY to a real value
# 2. Build and run
docker compose up -d --build
# 3. Verify
curl http://192.168.10.100:8765/health
```
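Step 3 checks `/health` once with `curl`. Right after `docker compose up` the container may still be starting, so deploy scripts may prefer a short retry loop. A minimal sketch, assuming `/health` returns a JSON body; `wait_for_health` and `fetch_health` are hypothetical helpers, not part of the sidecar.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET the sidecar's /health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    url: str,
    fetch: Callable[[str], dict] = fetch_health,
    attempts: int = 10,
    delay_s: float = 2.0,
) -> dict:
    """Retry /health until it answers, e.g. while the container is booting."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # connection refused, timeouts, URLError/HTTPError
            last_error = exc
            time.sleep(delay_s)
    raise TimeoutError(f"{url} not healthy after {attempts} attempts") from last_error


# Usage: wait_for_health("http://192.168.10.100:8765/health")
```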
## Testing
```bash
# Run all tests (from the project root)
python -m pytest tests/ -v
# Run unit tests only
python -m pytest tests/unit/ocr-sidecar/ -v
# Run integration tests only
python -m pytest tests/integration/ocr-sidecar/ -v
```
### Test Coverage
| Test File | Purpose |
|-----------|---------|
| `test_path_traversal.py` | Path traversal protection (US1) |
| `test_api_key_validation.py` | API key validation (US1) |
| `test_residency_wiring.py` | Adaptive OCR residency wiring (US2) |
| `test_cpu_fallback.py` | CPU fallback for /embed and /rerank (US2) |
| `test_parameter_governance.py` | Runtime parameter governance (US3) |
| `test_active_prompt.py` | System prompt + DMS tags injection (US3) |
| `test_async_performance.py` | Async I/O + lifespan + concurrent requests (US4) |
## ADR-040 Phases
| Phase | Status | Purpose |
|-------|--------|---------|
| Phase 1-2 | ✅ Complete | Setup + Foundational |
| Phase 3 | ✅ Complete | US1: Security Hardening |
| Phase 4 | ✅ Complete | US2: GPU Resource Management |
| Phase 5 | ✅ Complete | US3: Parameter Governance |
| Phase 6 | ✅ Complete | US4: Async I/O Performance |
| Phase 7 | ⏳ Blocked | US5: Network Isolation Auth (waiting on ADR-041) |
| Phase 8 | ✅ Complete | Remove /normalize endpoint |
| Phase 9 | ✅ Complete | Polish & documentation |
## Project Files
```
ocr-sidecar/
├── app.py — FastAPI server (async I/O, lifespan)
├── Dockerfile — Docker image (python:3.11-slim + poppler + curl)
├── docker-compose.yml — Compose config (ocr-sidecar + ollama-metrics)
├── requirements.txt — Python dependencies
├── .env.example — Environment template
├── services/
│ ├── vram_monitor.py — VRAM headroom monitoring
│ └── residency_policy.py — Adaptive OCR residency calculation
└── tests/
└── test_retrieval_fallback.py — Retrieval fallback tests
```
@@ -27,56 +27,77 @@
# - 2026-06-17: Removed the Typhoon name everywhere: process_with_typhoon_ocr → process_ocr, FastAPI title, comments, and variables # - 2026-06-17: Removed the Typhoon name everywhere: process_with_typhoon_ocr → process_ocr, FastAPI title, comments, and variables
# - 2026-06-17: Added systemPrompt parameter to /ocr-upload, _process_pdf_doc, and process_ocr to support dynamic OCR system prompt injection (T026-T028) # - 2026-06-17: Added systemPrompt parameter to /ocr-upload, _process_pdf_doc, and process_ocr to support dynamic OCR system prompt injection (T026-T028)
# - 2026-06-18: Added MAX_SYSTEM_PROMPT_LENGTH environment variable for configurable validation (fix-3) # - 2026-06-18: Added MAX_SYSTEM_PROMPT_LENGTH environment variable for configurable validation (fix-3)
# - 2026-06-20: ADR-040 Phase 1-4: removed the default API key, added a path whitelist, and wired adaptive OCR residency
# - 2026-06-20: ADR-040 Phase 6 async I/O refactor: async process_ocr, AsyncClient via lifespan, asyncio.to_thread model loading
# - 2026-06-20: ADR-040 Phase 8: removed the /normalize endpoint (no consumers) and the pythainlp imports
import os import os
import logging import logging
import re import re
import base64
import json import json
import tempfile import tempfile
import fitz # PyMuPDF (used for page count + fast-path text extraction) import fitz # PyMuPDF (used for page count + fast-path text extraction)
import httpx import httpx
import asyncio import asyncio
from contextlib import asynccontextmanager
from pathlib import Path from pathlib import Path
from typing import Optional from typing import Optional
from PIL import Image
import io
from typhoon_ocr import prepare_ocr_messages # External library from SCB10X (PyPI) — provides OCR message preparation for np-dms-ocr from typhoon_ocr import prepare_ocr_messages # External library from SCB10X (PyPI) — provides OCR message preparation for np-dms-ocr
from services.vram_monitor import get_vram_headroom from services.vram_monitor import get_vram_headroom
from services.residency_policy import calculate_ocr_residency
from fastapi import FastAPI, HTTPException, UploadFile, File, Form, Depends, Security, status from fastapi import FastAPI, HTTPException, UploadFile, File, Form, Depends, Security, status
from fastapi.security.api_key import APIKeyHeader from fastapi.security.api_key import APIKeyHeader
from pydantic import BaseModel from pydantic import BaseModel
from pythainlp.tokenize import word_tokenize
from pythainlp.util import normalize as thai_normalize
from FlagEmbedding import BGEM3FlagModel, FlagReranker from FlagEmbedding import BGEM3FlagModel, FlagReranker
logging.basicConfig(level=logging.INFO) logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ocr-sidecar") logger = logging.getLogger("ocr-sidecar")
app = FastAPI(title="OCR Sidecar", version="2.0.0")
# Initialize BGE-M3 and Reranker singletons # Initialize BGE-M3 and Reranker singletons
bge_model = None bge_model = None
reranker = None reranker = None
# Shared AsyncClient สำหรับ Ollama API (T043: สร้างใน lifespan context manager)
ollama_client: httpx.AsyncClient | None = None
@app.on_event("startup")
def load_bge_models(): def _load_bge_models() -> tuple:
global bge_model, reranker """Load the BGE-M3 and Reranker models into CPU RAM (T046: called via asyncio.to_thread)"""
logger.info("Loading BGE-M3 and Reranker models on CPU RAM...")
try:
# BGE-M3: BAAI/bge-m3, use_fp16=False for CPU bge = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False)
bge_model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=False) rerank = FlagReranker('BAAI/bge-reranker-large', use_fp16=False)
# Reranker: BAAI/bge-reranker-large, use_fp16=False for CPU
reranker = FlagReranker('BAAI/bge-reranker-large', use_fp16=False)
logger.info("BGE-M3 and Reranker models loaded successfully.")
return bge, rerank
except Exception as e:
logger.error(f"Failed to load BGE models: {e}")
return None, None
@asynccontextmanager
async def lifespan(app_instance: FastAPI):
"""T043/T045: Lifespan context manager replacing @app.on_event('startup'); manages the AsyncClient and model loading"""
global bge_model, reranker, ollama_client
# T043: create the shared AsyncClient for the Ollama API
ollama_client = httpx.AsyncClient(timeout=OCR_TIMEOUT)
logger.info(f"Shared AsyncClient created (timeout={OCR_TIMEOUT}s)")
# T046: load models via asyncio.to_thread so the blocking load does not stall the event loop
bge_model, reranker = await asyncio.to_thread(_load_bge_models)
yield
# Cleanup: close the AsyncClient
if ollama_client:
await ollama_client.aclose()
logger.info("Shared AsyncClient closed.")
app = FastAPI(title="OCR Sidecar", version="2.0.0", lifespan=lifespan)
# Configure the sidecar security token per the security-hardening recommendations
OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY", "lcbp3-dms-ocr-sidecar-secure-token-2026") OCR_SIDECAR_API_KEY = os.getenv("OCR_SIDECAR_API_KEY")
if not OCR_SIDECAR_API_KEY:
raise RuntimeError("OCR_SIDECAR_API_KEY is required for OCR sidecar startup")
# Maximum systemPrompt length (fix-3: configurable validation)
MAX_SYSTEM_PROMPT_LENGTH = int(os.getenv("MAX_SYSTEM_PROMPT_LENGTH", "10000"))
@@ -94,6 +115,8 @@ MAX_PAGES = int(os.getenv("OCR_MAX_PAGES", "0")) # 0 = all pages
OLLAMA_API_URL = os.getenv("OLLAMA_API_URL", "http://host.docker.internal:11434")
OCR_MODEL = os.getenv("OCR_MODEL", "np-dms-ocr:latest")
OCR_TIMEOUT = int(os.getenv("OCR_TIMEOUT", "360")) # covers a ~65s cold start + ~30s/page inference
OCR_SIDECAR_UPLOAD_BASE = os.getenv("OCR_SIDECAR_UPLOAD_BASE", "/mnt/uploads")
OCR_ACTIVE_PROFILE = os.getenv("OCR_ACTIVE_PROFILE")
logger.info(f"OCR Sidecar initialized (model={OCR_MODEL}, ollama={OLLAMA_API_URL})")
@@ -111,11 +134,29 @@ def filter_ocr_noise(text: str) -> str:
filtered.append(line)
return "\n".join(filtered)
def validate_pdf_path(pdf_path: str) -> Path:
"""Canonicalize the path and verify that it lies under OCR_SIDECAR_UPLOAD_BASE"""
canonical_path = os.path.abspath(os.path.realpath(pdf_path))
canonical_base = os.path.abspath(os.path.realpath(OCR_SIDECAR_UPLOAD_BASE))
try:
common_path = os.path.commonpath([canonical_path, canonical_base])
except ValueError:
common_path = ""
if common_path != canonical_base:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Path outside whitelisted base directory",
)
return Path(canonical_path)
class OcrRequest(BaseModel):
pdfPath: str
maxPages: Optional[int] = None
engine: Optional[str] = None
keep_alive: Optional[int] = None
runtime_params: Optional[dict] = None
system_prompt: Optional[str] = None
dms_tags: Optional[dict] = None
class OcrResponse(BaseModel):
text: str
@@ -133,8 +174,18 @@ def health():
"ollamaUrl": OLLAMA_API_URL,
}
def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, ocr_options: dict = {}, pdf_path: str | None = None, system_prompt: Optional[str] = None) -> OcrResponse: async def _process_pdf_doc(
doc: fitz.Document,
selected_engine: str,
max_pages: int,
ocr_options: Optional[dict] = None,
pdf_path: str | None = None,
system_prompt: Optional[str] = None,
runtime_params: Optional[dict] = None,
dms_tags: Optional[dict] = None,
) -> OcrResponse:
"""Process a fitz.Document with the selected engine; shared logic for /ocr and /ocr-upload"""
ocr_options = ocr_options or {}
pages_to_process = list(range(min(len(doc), max_pages) if max_pages > 0 else len(doc)))
page_count = len(pages_to_process)
@@ -163,7 +214,16 @@ def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, o
raise ValueError("Cannot determine the PDF path: pdf_path must be passed in")
ocr_text_parts = []
for i in pages_to_process:
ocr_text_parts.append(process_ocr(resolved_path, page_num=i + 1, options_override=ocr_options, system_prompt=system_prompt)) ocr_text_parts.append(
await process_ocr(
resolved_path,
page_num=i + 1,
options_override=ocr_options,
system_prompt=system_prompt,
runtime_params=runtime_params,
dms_tags=dms_tags,
)
)
ocr_text = filter_ocr_noise("\n".join(ocr_text_parts).strip())
return OcrResponse(
text=ocr_text,
@@ -180,7 +240,16 @@ def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, o
raise ValueError("Cannot determine the PDF path: pdf_path must be passed in")
fallback_parts = []
for i in pages_to_process:
fallback_parts.append(process_ocr(resolved_path, page_num=i + 1, options_override=ocr_options, system_prompt=system_prompt)) fallback_parts.append(
await process_ocr(
resolved_path,
page_num=i + 1,
options_override=ocr_options,
system_prompt=system_prompt,
runtime_params=runtime_params,
dms_tags=dms_tags,
)
)
fallback_text = filter_ocr_noise("\n".join(fallback_parts).strip())
return OcrResponse(
text=fallback_text,
@@ -190,39 +259,95 @@ def _process_pdf_doc(doc: fitz.Document, selected_engine: str, max_pages: int, o
engineUsed="np-dms-ocr",
)
def process_ocr(pdf_path: str, page_num: int = 1, options_override: dict = {}, system_prompt: Optional[str] = None) -> str: async def process_ocr(
pdf_path: str,
page_num: int = 1,
options_override: Optional[dict] = None,
system_prompt: Optional[str] = None,
runtime_params: Optional[dict] = None,
dms_tags: Optional[dict] = None,
) -> str:
"""Call np-dms-ocr through Ollama /v1/chat/completions; takes the PDF path directly, no PIL Image conversion needed"""
options_override = options_override or {}
if "keep_alive" in options_override:
raise ValueError("keep_alive must be calculated by OCR residency policy")
residency = await asyncio.to_thread(calculate_ocr_residency, OCR_ACTIVE_PROFILE)
model_name = OCR_MODEL
# prepare_ocr_messages handles the PDF → image conversion internally via poppler/pdftoppm
messages = prepare_ocr_messages(pdf_path, task_type="structure", page_num=page_num)
# inject the system prompt if present (before the DMS tags)
if system_prompt:
messages[0]["content"].append({"type": "text", "text": system_prompt})
# append DMS-specific extraction tags to the end of content
messages[0]["content"].append({ # Dynamic dms_tags mapping to prompts
"type": "text", if dms_tags:
"text": ( dms_text = "Additionally:\n"
for key in dms_tags.keys():
readable_name = re.sub(r'(?<!^)(?=[A-Z])|_', ' ', key).lower()
dms_text += f"- Wrap {readable_name} with <{key}>...</{key}>\n"
dms_text += "If a field is not found, omit the tag."
else:
# Fallback to default DMS extraction tags
dms_text = (
"Additionally:\n"
"- Wrap document number with <document_number>...</document_number>\n"
"- Wrap document date with <document_date>...</document_date>\n"
"- Wrap received date with <received_date>...</received_date>\n"
"If a field is not found, omit the tag."
), )
# append DMS-specific extraction tags to the end of content
messages[0]["content"].append({
"type": "text",
"text": dms_text,
})
# Resolve runtime parameters: remove hardcoded fallback values from sidecar
# Use empty dict if runtime_params not provided to allow Ollama Modelfile default
params = {}
if runtime_params:
if hasattr(runtime_params, "dict"):
params = runtime_params.dict()
elif isinstance(runtime_params, dict):
params = runtime_params
# Options override (e.g., from Sandbox form parameter overrides) takes precedence
merged_params = {}
if params:
merged_params.update(params)
if options_override:
merged_params.update(options_override)
# Defaults follow the official values; options_override can still override some of them
logger.info(
f"OCR residency decision: keep_alive={residency.keep_alive_seconds}s "
f"reason={residency.reason} headroom={residency.vram_headroom_mb}MB"
)
payload = {
"model": model_name,
"messages": messages,
"max_tokens": 16000,
"stream": False,
"repetition_penalty": options_override.get("repeat_penalty", 1.2), "keep_alive": residency.keep_alive_seconds,
"temperature": options_override.get("temperature", 0.1),
"top_p": options_override.get("top_p", 0.6),
"keep_alive": options_override.get("keep_alive", 0), # Unload the model right after the job to return VRAM to np-dms-ai
}
# Use the Ollama OpenAI-compatible endpoint (/v1/chat/completions)
with httpx.Client(timeout=OCR_TIMEOUT) as client: # Only send keys to Ollama if they are defined in merged_params (to support Modelfile fallback)
response = client.post( if "temperature" in merged_params and merged_params["temperature"] is not None:
payload["temperature"] = float(merged_params["temperature"])
if "top_p" in merged_params and merged_params["top_p"] is not None:
payload["top_p"] = float(merged_params["top_p"])
if "repeat_penalty" in merged_params and merged_params["repeat_penalty"] is not None:
payload["repetition_penalty"] = float(merged_params["repeat_penalty"])
elif "repetition_penalty" in merged_params and merged_params["repetition_penalty"] is not None:
payload["repetition_penalty"] = float(merged_params["repetition_penalty"])
if "max_tokens" in merged_params and merged_params["max_tokens"] is not None:
payload["max_tokens"] = int(merged_params["max_tokens"])
# T044: use the shared AsyncClient (ollama_client) instead of a synchronous httpx.Client
# If ollama_client has not been created yet (e.g., a unit test calling directly), create a temporary one
client = ollama_client
if client is None:
client = httpx.AsyncClient(timeout=OCR_TIMEOUT)
response = await client.post(
f"{OLLAMA_API_URL}/v1/chat/completions",
json=payload,
headers={"Authorization": "Bearer ollama"},
@@ -246,35 +371,50 @@ def process_ocr(pdf_path: str, page_num: int = 1, options_override: dict = {}, s
logger.warning(
f"[DIAG] Ollama returned empty response — full response keys: {list(data.keys())}"
)
# Close the temporary client if one was created
if ollama_client is None:
await client.aclose()
return result_text
@app.post("/ocr", response_model=OcrResponse, dependencies=[Depends(get_api_key)])
def ocr_extract(req: OcrRequest): async def ocr_extract(req: OcrRequest):
"""OCR from a path (legacy; used when the sidecar and backend share the same storage)"""
pdf_path = Path(req.pdfPath) if req.keep_alive is not None:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="keep_alive is managed by OCR residency policy")
pdf_path = validate_pdf_path(req.pdfPath)
if not pdf_path.exists():
raise HTTPException(status_code=404, detail=f"File not found: {req.pdfPath}")
selected_engine = (req.engine or "auto").strip().lower()
max_pages = req.maxPages or MAX_PAGES
ocr_options = {}
if req.keep_alive is not None:
ocr_options["keep_alive"] = req.keep_alive
try:
doc = fitz.open(str(pdf_path))
except Exception as e:
raise HTTPException(status_code=422, detail=f"Failed to open PDF file: {e}")
return _process_pdf_doc(doc, selected_engine, max_pages, ocr_options) return await _process_pdf_doc(
doc,
selected_engine,
max_pages,
ocr_options,
pdf_path=str(pdf_path),
system_prompt=req.system_prompt,
runtime_params=req.runtime_params,
dms_tags=req.dms_tags,
)
@app.post("/ocr-upload", response_model=OcrResponse, dependencies=[Depends(get_api_key)])
def ocr_upload( async def ocr_upload(
file: UploadFile = File(...),
engine: str = Form(default="auto"),
maxPages: int = Form(default=0),
temperature: Optional[float] = Form(default=None),
topP: Optional[float] = Form(default=None),
repeatPenalty: Optional[float] = Form(default=None),
maxTokens: Optional[int] = Form(default=None),
keep_alive: Optional[int] = Form(default=None),
systemPrompt: Optional[str] = Form(default=None),
dmsTags: Optional[str] = Form(default=None),
runtimeParams: Optional[str] = Form(default=None),
):
"""OCR from a multipart file upload; no shared volume mount required"""
# Validate systemPrompt if provided (gap-1: sidecar validation)
@@ -292,6 +432,22 @@ def ocr_upload(
)
selected_engine = engine.strip().lower()
max_pages = maxPages or MAX_PAGES
# Parse runtimeParams and dmsTags from form-data JSON strings if provided
runtime_params_dict = {}
if runtimeParams:
try:
runtime_params_dict = json.loads(runtimeParams)
except Exception as e:
logger.warning(f"Failed to parse runtimeParams JSON: {e}")
dms_tags_dict = None
if dmsTags:
try:
dms_tags_dict = json.loads(dmsTags)
except Exception as e:
logger.warning(f"Failed to parse dmsTags JSON: {e}")
# Merge option overrides for np-dms-ocr (if the frontend sent them)
ocr_options: dict = {}
if temperature is not None:
@@ -300,10 +456,11 @@ def ocr_upload(
ocr_options["top_p"] = topP
if repeatPenalty is not None:
ocr_options["repeat_penalty"] = repeatPenalty
if maxTokens is not None:
ocr_options["max_tokens"] = maxTokens
if keep_alive is not None:
ocr_options["keep_alive"] = keep_alive raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="keep_alive is managed by OCR residency policy")
pdf_bytes = file.file.read()
import tempfile
tmp_pdf_path: str | None = None
try:
# Save the PDF as a temp file so prepare_ocr_messages can read it by path
@@ -315,29 +472,20 @@ def ocr_upload(
except Exception as e:
raise HTTPException(status_code=422, detail=f"Failed to open PDF file: {e}")
logger.info(f"OCR upload: {file.filename} engine={selected_engine} options={ocr_options or 'modelfile-defaults'}")
return _process_pdf_doc(doc, selected_engine, max_pages, ocr_options, pdf_path=tmp_pdf_path, system_prompt=systemPrompt) return await _process_pdf_doc(
doc,
selected_engine,
max_pages,
ocr_options,
pdf_path=tmp_pdf_path,
system_prompt=systemPrompt,
runtime_params=runtime_params_dict,
dms_tags=dms_tags_dict,
)
finally:
if tmp_pdf_path:
Path(tmp_pdf_path).unlink(missing_ok=True)
class NormalizeRequest(BaseModel):
text: str
class NormalizeResponse(BaseModel):
normalized: str
@app.post("/normalize", response_model=NormalizeResponse, dependencies=[Depends(get_api_key)])
def normalize_text(req: NormalizeRequest):
"""Normalize Thai text with PyThaiNLP for the rag-thai-preprocess queue"""
try:
# normalize Unicode, tokenize, then rejoin with spaces for embedding
normalized = thai_normalize(req.text)
tokens = word_tokenize(normalized, engine="newmm", keep_whitespace=False)
result = " ".join(tokens)
return NormalizeResponse(normalized=result)
except Exception as e:
logger.warning(f"Thai normalize failed, returning raw text: {e}")
return NormalizeResponse(normalized=req.text)
class EmbedRequest(BaseModel):
text: str
@@ -362,7 +510,7 @@ async def embed_text(req: EmbedRequest):
raise HTTPException(status_code=503, detail="BGE-M3 model not loaded")
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
timeout_sec = float(os.getenv("RETRIEVAL_TIMEOUT_SECONDS", "30.0"))
headroom = get_vram_headroom() headroom = await asyncio.to_thread(get_vram_headroom)
device = "cuda"
reason = "headroom-sufficient"
if not headroom.query_success:
@@ -427,7 +575,7 @@ async def rerank_chunks(req: RerankRequest):
return RerankResponse(scores=[], ranked_indices=[], device="cpu")
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
timeout_sec = float(os.getenv("RETRIEVAL_TIMEOUT_SECONDS", "30.0"))
headroom = get_vram_headroom() headroom = await asyncio.to_thread(get_vram_headroom)
device = "cuda"
reason = "headroom-sufficient"
if not headroom.query_success:
@@ -1,5 +1,5 @@
# File: specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml
# Tesseract OCR Sidecar: runs on Desk-5439 (AI Isolation Host) per ADR-023A # OCR Sidecar: runs on Desk-5439 (AI Isolation Host) per ADR-023A/ADR-040
# Change Log:
# - 2026-05-25: Initial compose file for the Tesseract OCR HTTP sidecar
# - 2026-05-25: Fixed volumes to work correctly on Windows + Docker Desktop
@@ -16,6 +16,7 @@
# - 2026-06-11: US2 & US3 - added VRAM headroom, residency window, pressure threshold, and retrieval timeout env variables
# - 2026-06-13: ADR-036: changed TYPHOON_OCR_MODEL to OCR_MODEL=np-dms-ocr:latest
# - 2026-06-17: Removed the Typhoon name from all environment variables and comments (renamed to OCR_* per ADR-036)
# - 2026-06-20: ADR-040 Phase 6+8: removed OCR_LANG, USE_GPU (stale Tesseract config); added OCR_SIDECAR_API_KEY, OCR_ACTIVE_PROFILE
#
# How to run:
# docker compose up -d --build
@@ -39,8 +40,12 @@ services:
OCR_CHAR_THRESHOLD: "100"
OCR_PORT: "8765"
OCR_MAX_PAGES: "0"
OCR_LANG: "tha+eng" # Tesseract language code (Thai + English) # ─── Security (ADR-040 Phase 1) ─────────────────────────────────
USE_GPU: "false" # OCR sidecar runs on CPU; np-dms-ocr uses a separate Ollama # OCR_SIDECAR_API_KEY: read from the .env file (never hardcode in compose)
OCR_SIDECAR_API_KEY: ${OCR_SIDECAR_API_KEY}
# ─── Adaptive OCR Residency (ADR-040 Phase 4) ───────────────────
# OCR_ACTIVE_PROFILE: profile name in ai_execution_profiles (the default is used if not set)
OCR_ACTIVE_PROFILE: ${OCR_ACTIVE_PROFILE:-}
# ─── OCR via Ollama (ADR-034) ───────────────────────────────────
# Points directly at Ollama (port 11434), bypassing the metrics proxy
# (the proxy does not forward /api/generate correctly, which produces empty responses)
@@ -5,14 +5,13 @@
# - 2026-05-30: Added opencv-python for image preprocessing (threshold, denoise) to improve OCR accuracy
# - 2026-06-11: Added typhoon-ocr for prepare_ocr_messages (the official prompt builder for typhoon-ocr1.5-3b)
# - 2026-06-11: Removed pytesseract, opencv-python, and numpy; Tesseract is no longer used
# - 2026-06-20: ADR-040 Phase 8: removed pythainlp and Pillow (the /normalize endpoint is gone, and process_ocr uses prepare_ocr_messages)
PyMuPDF==1.24.0
fastapi==0.111.0
uvicorn[standard]==0.30.1
python-multipart==0.0.9
pythainlp==5.0.4
httpx==0.27.0
Pillow==10.0.0
FlagEmbedding>=1.2.0
typhoon-ocr>=0.4.1
@@ -17,7 +17,7 @@ class OcrResidencyDecision:
def calculate_ocr_residency(active_profile: str = None) -> OcrResidencyDecision:
"""
Compute keep_alive for Typhoon OCR from VRAM headroom and the main model's active profile Compute keep_alive for np-dms-ocr from VRAM headroom and the main model's active profile
"""
threshold_mb = float(os.getenv("VRAM_HEADROOM_THRESHOLD_MB", "3000.0"))
residency_window = int(os.getenv("OCR_RESIDENCY_WINDOW_SECONDS", "120"))
@@ -0,0 +1,210 @@
<!-- File: specs/06-Decision-Records/ADR-040-ocr-sidecar-refactor.md -->
<!-- Change Log
- 2026-06-20: Created initial ADR-040 documenting OCR sidecar refactor decisions.
- Supersedes ADR-033 §7 (X-API-Key sidecar auth) in favor of network isolation.
- Preserves resolved GPU policies (Adaptive Residency, CPU Fallback, LLM-First Ownership).
- Aligns with ADR-036 Profile-Only Parameter Governance.
- References ADR-041 for server consolidation enabling network-only auth.
-->
# ADR-040: OCR Sidecar Refactor — Pure Compute Worker, Preserved GPU Policy, Network-Trust Boundary
**Status:** Proposed
**Date:** 2026-06-20
**Supersedes:** ADR-033 §7 (X-API-Key sidecar auth)
**Amends:** ADR-036 §5 (sidecar contract), ADR-034 (model identity unchanged)
**Related Documents:**
- [ADR-016: Security & Authentication](./ADR-016-security-authentication.md)
- [ADR-008: Email Notification Strategy](./ADR-008-email-notification-strategy.md)
- [ADR-029: Dynamic Prompt Management](./ADR-029-dynamic-prompt-management.md)
- [ADR-037: Active Prompt System](./ADR-037-active-prompt-system.md)
- [ADR-035: AI Pipeline & OCR Integration](./ADR-035-ai-pipeline-ocr-integration.md)
- [ADR-041: Server Consolidation](./ADR-041-server-consolidation.md)
- [CONTEXT.md](../../00-overview/CONTEXT.md)
- [OCR Sidecar Refactor Plan - Claude](../../../docs/ocr-sidecar-refactor-plan-cluade.md)
- [OCR Sidecar Refactor Plan - Qwen](../../../docs/ocr-sidecar-refactor-plan-qwen.md)
> **Note:** ADR numbers 038–039 are intentionally reserved/skipped.
---
## 🎯 Context and Problem Statement
### Current Architecture
The OCR Sidecar on Desk-5439 (RTX 5060 Ti 16GB) is a FastAPI HTTP service that provides:
- `/ocr` - extracts text from PDFs through Typhoon OCR (np-dms-ocr via Ollama)
- `/embed` - generates vector embeddings through BGE-M3
- `/rerank` - reorders retrieval results through FlagReranker
- `/normalize` - normalizes Thai text (used by ThaiPreprocessProcessor)
### Problems Identified
A review of the two refactor plans (Claude + Qwen) identified the following problems:
1. **Security Bug:** A hardcoded default API key (`lcbp3-dms-ocr-sidecar-secure-token-2026`) in `app.py`. If it leaks, it cannot be rotated without rebuilding the container.
2. **Synchronous Blocking I/O:** `process_ocr` uses a synchronous `httpx.Client`, which blocks the FastAPI event loop.
3. **Deprecated Startup Pattern:** The sidecar uses `@app.on_event("startup")` instead of a `lifespan` context manager.
4. **Hardcoded keep_alive:** `process_ocr` forces `keep_alive: 0` and never calls `calculate_ocr_residency()` from `residency_policy.py`, so the Adaptive OCR Residency policy has no effect.
5. **Hardcoded Runtime Parameters:** `temperature`, `top_p`, `repeat_penalty`, and `max_tokens` are hardcoded in the sidecar instead of being read from `ai_execution_profiles` (ADR-036 Profile-Only Parameter Governance).
6. **Path Traversal Vulnerability:** The `/ocr` endpoint opens whatever file `req.pdfPath` names, with no canonicalization or whitelist. This risks arbitrary file reads (ADR-016).
7. **Cross-Host Trust Gap:** The sidecar currently runs on Desk-5439 (192.168.10.100) and the backend on QNAP (192.168.10.8). The "Docker internal network" claim is therefore false, and isolation depends on VLAN/firewall ACLs.
8. **Mutable Default Argument:** `process_with_typhoon_ocr(pdf_path, ..., options_override={})` uses a mutable default, a Python anti-pattern.
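Problem 8 can be reproduced in a few lines. The functions below are hypothetical stand-ins, not sidecar code. They show a `{}` default being shared across calls, and the `Optional[dict] = None` idiom that the refactored `process_ocr` uses instead.

```python
from typing import Optional


def tag_options_bad(page_num: int, options: dict = {}) -> dict:
    # The default dict is created once, when the function is defined,
    # so every call that omits `options` mutates the same object.
    options.setdefault("pages", []).append(page_num)
    return options


def tag_options_good(page_num: int, options: Optional[dict] = None) -> dict:
    # A fresh dict is created per call when the caller passes nothing.
    options = options or {}
    options.setdefault("pages", []).append(page_num)
    return options


if __name__ == "__main__":
    print(tag_options_bad(1), tag_options_bad(2))    # state leaks between calls
    print(tag_options_good(1), tag_options_good(2))  # each call is independent
```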
### Conflict with Canonical Specs
The review of both plans found that:
- **Claude** assumes `np-dms-ai = llama3.2 3B (~2–3GB)`, but ADR-034/CONTEXT specify that the `np-dms-ai` runtime is Typhoon-2.5 (~7–8B). Its VRAM budget is therefore wrong.
- **Both plans** propose deleting `vram_monitor.py` / `residency_policy.py` and forcing BGE+Reranker to stay GPU-resident. This violates the LLM-First GPU Ownership and CPU Fallback Retrieval policies already resolved in CONTEXT.md.
- **Both plans** treat `keep_alive` as a fixed config value. This violates ADR-036 Gap-2, under which keep_alive is a lazy resource param computed by the residency policy.
---
## ⚙️ Decision Drivers
* **Preserve Resolved GPU Policy:** Adaptive OCR Residency + CPU Fallback Retrieval + LLM-First GPU Ownership (CONTEXT.md)
* **Profile-Only Parameter Governance:** AI model parameters (temperature, top_p, keep_alive) must come from the `ai_execution_profiles` row `ocr-extract` (ADR-036)
* **Security (ADR-016):** Path traversal hardening, no hardcoded secrets
* **Network Trust Boundary:** Server consolidation (ADR-041) makes Docker-internal isolation actually achievable
* **No Invented Orchestration:** Do not create new `VramMutexService`, `GpuTaskQueue`, or `PromptBuilderService` classes; use the existing services/Active Prompt per ADR-008 and ADR-029/037
* **ADR-023A Boundary:** The AI sidecar must not access the DB/storage directly
---
## 🏛️ Decisions
### D1: Sidecar as Pure Compute Worker
The sidecar acts purely as a compute worker. Orchestration, parameter governance, and business logic live in the backend (existing services).
- **Reject:** Creating new `PromptBuilderService`, `OcrNoiseFilterService`, or `OcrOrchestratorService` classes (Qwen plan)
- **Fast-path decision** (PyMuPDF chars > 100 → fast path): stays in the sidecar
- **Page range calculation:** moves to the backend
- **Engine selection:** no longer needed; only np-dms-ocr (Typhoon OCR) is used
- **systemPrompt validation** (checking placeholders such as `{{ocr_text}}`): backend
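The fast-path rule and the page-range arithmetic are small enough to sketch as pure functions. This sketch assumes the check counts non-whitespace characters across the selected pages and compares the total against `OCR_CHAR_THRESHOLD` (100 in the compose file). The exact counting rule in the sidecar is not shown here. `pages_to_process` only mirrors the range expression that D1 moves to the backend.

```python
import os
from typing import List, Sequence


def pages_to_process(page_count: int, max_pages: int) -> List[int]:
    # Same expression as the sidecar: max_pages <= 0 means "every page".
    limit = min(page_count, max_pages) if max_pages > 0 else page_count
    return list(range(limit))


def use_fast_path(page_texts: Sequence[str], threshold: int = 100) -> bool:
    # Assumption: count non-whitespace characters across the selected pages;
    # a real text layer lets the sidecar skip the OCR model entirely.
    chars = sum(len("".join(text.split())) for text in page_texts)
    return chars > threshold


if __name__ == "__main__":
    threshold = int(os.getenv("OCR_CHAR_THRESHOLD", "100"))
    scanned = ["", "   \n"]  # a scanned PDF with no usable text layer
    print(pages_to_process(5, 0), use_fast_path(scanned, threshold))
```

With PyMuPDF, `page_texts` would come from `[doc[i].get_text() for i in pages]`.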
### D2: Remove /normalize Endpoint
- **Remove the /normalize endpoint** from the sidecar
- **Use only np-dms-ocr (OCR)**; the sidecar no longer supports Thai normalization
- ThaiPreprocessProcessor is not in use, so no backend changes are required
### D3: Async I/O + Lifespan + Shared AsyncClient
- `process_ocr` → `async def`
- Use a shared `httpx.AsyncClient` created through the lifespan context manager
- Switch from `@app.on_event("startup")` to an `@asynccontextmanager` lifespan
- Load models through `asyncio.to_thread` so the blocking load does not stall the event loop during startup
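Below is a minimal, stdlib-only sketch of the D3 pattern. It assumes a hypothetical stub client in place of `httpx.AsyncClient` and a `time.sleep` in place of real model construction. A heartbeat task keeps ticking while the blocking load runs in a worker thread. That responsiveness is what `asyncio.to_thread` buys over calling the loader directly.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from typing import Optional


class StubAsyncClient:
    """Hypothetical stand-in for httpx.AsyncClient so the sketch needs no network."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


state: dict = {"models": None, "client": None}


def load_models_blocking() -> tuple:
    # Simulates a slow, blocking model load (e.g. constructing BGEM3FlagModel).
    time.sleep(0.2)
    return ("bge-m3", "bge-reranker-large")


@asynccontextmanager
async def lifespan(app: Optional[object] = None):
    # One shared client for the whole process lifetime.
    state["client"] = StubAsyncClient(timeout=360)
    # to_thread runs the blocking load in a worker thread; startup still waits
    # for it, but the event loop keeps servicing other tasks meanwhile.
    state["models"] = await asyncio.to_thread(load_models_blocking)
    try:
        yield
    finally:
        await state["client"].aclose()


async def main() -> int:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    beat = asyncio.create_task(heartbeat())
    async with lifespan():
        pass  # the app would serve requests here
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return ticks


if __name__ == "__main__":
    print("heartbeat ticks during startup:", asyncio.run(main()))
```

In FastAPI, the function is passed as `FastAPI(lifespan=lifespan)`. Wrapping `yield` in `try`/`finally` is a choice made in this sketch, so the client is closed even if the app exits with an error.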
### D4: keep_alive via calculate_ocr_residency() (Lazy, ADR-036 Gap-2)
- Wire `calculate_ocr_residency(active_profile)` into `process_ocr`
- Do not use a fixed value (Claude: 300, Qwen: 0/10m)
- **Do not accept** an explicit `options_override["keep_alive"]` from the backend. keep_alive is a lazy resource param computed only at process time (ADR-036 Gap-2)
- **Reject:** Deleting `vram_monitor.py` / `residency_policy.py`
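The D4 contract can be sketched as follows. The threshold rule inside `decide_residency` is a hypothetical stand-in for `calculate_ocr_residency()`, whose body is not part of this ADR. The `"headroom-low"` reason string is also invented. The defaults reuse `VRAM_HEADROOM_THRESHOLD_MB=3000` and `OCR_RESIDENCY_WINDOW_SECONDS=120` from the sidecar. The rejection of a caller-supplied `keep_alive` mirrors the check added to `process_ocr`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResidencyDecision:
    keep_alive_seconds: int
    reason: str
    vram_headroom_mb: float


def decide_residency(headroom_mb: float, threshold_mb: float = 3000.0,
                     window_seconds: int = 120) -> ResidencyDecision:
    # Hypothetical rule: keep the OCR model loaded for the residency window
    # only while free VRAM stays above the threshold; otherwise unload at once.
    if headroom_mb >= threshold_mb:
        return ResidencyDecision(window_seconds, "headroom-sufficient", headroom_mb)
    return ResidencyDecision(0, "headroom-low", headroom_mb)


def build_payload(model: str, decision: ResidencyDecision,
                  options_override: Optional[dict] = None) -> dict:
    options_override = options_override or {}
    # ADR-040 D4: callers may not pin keep_alive; only the policy sets it.
    if "keep_alive" in options_override:
        raise ValueError("keep_alive must be calculated by OCR residency policy")
    return {"model": model, "stream": False, "keep_alive": decision.keep_alive_seconds}
```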
### D5: Retain vram_monitor + CPU-Fallback for /embed, /rerank
- **Reject:** Forcing BGE-M3 + Reranker to be permanently GPU-resident
- **Keep:** Dynamic CPU/GPU selection via the `.to(device)` logic
- This implements LLM-First GPU Ownership + CPU Fallback Retrieval
### D6: Remove Hardcoded Default Key; Auth = Network Isolation (2-Phase)
- **Phase 1** (before consolidation): Remove the hardcoded default `OCR_SIDECAR_API_KEY`; fail fast if the env var is missing
- **Phase 2** (after consolidation): **Supersedes ADR-033 §7**: remove `X-API-Key` validation from the sidecar endpoints and the backend send-side
- **Network Isolation:** Enforced through the Docker-internal network (post-consolidation) or VLAN/firewall ACLs (interim cross-host)
- **Sequencing:** Remove `X-API-Key` only after the ADR-041 cutover is complete (single Docker host)
- **Interim Period:** Between Phase 1 and Phase 2, the sidecar and backend must **still** validate and send `X-API-Key`
- Rotate the leaked key before cutover
### D7: Path Canonicalization + Base-Path Whitelist on /ocr
- Canonicalize `pdfPath` via `os.path.abspath()` + `os.path.realpath()`
- Whitelisted base path = `OCR_SIDECAR_UPLOAD_BASE` (CIFS mount base)
- Reject paths that are not under the base path (checked per path component, not by string prefix) → 403 Forbidden
### D8: Runtime Params from Job Snapshot (ocr-extract row)
- **Backend** resolves params from `ai_execution_profiles` (row `ocr-extract` for OCR, the matching profile for LLM calls)
- **Backend** sends the params (`temperature`, `top_p`, `repeat_penalty`, `max_tokens`) to the sidecar
- **Sidecar** receives the params from the backend and forwards them to Ollama (on every load/generate call)
- Do not hardcode defaults in the sidecar
- The Modelfile serves only as a last-resort fallback
- Align with ADR-036 Profile-Only Parameter Governance
### D9: DMS Tags + SystemPrompt from Active Prompt
- **Backend** resolves the systemPrompt from the Active Prompt in `ai_prompts` (ADR-029/037)
- **Backend** resolves the DMS extraction tags (`<document_number>`, `<document_date>`, `<received_date>`) from the Active Prompt
- **Backend** sends both the systemPrompt and the DMS tags to the sidecar
- **Sidecar** receives the systemPrompt and DMS tags from the backend and forwards them to Ollama (on every load/generate call)
- **Reject:** Creating a new `PromptBuilderService` as the prompt authority
---
## 📋 Implementation Tasks
### Phase 1: Before ADR-041 Consolidation (X-API-Key retained)
| Task ID | Component | Summary | Status |
| :--- | :--- | :--- | :--- |
| T001 | Sidecar | Remove hardcoded default API key (fail-fast if env missing) | Pending |
| T002 | Sidecar | Fix mutable default arg `options_override={}` | Pending |
| T003 | Sidecar | Remove duplicate `import tempfile` | Pending |
| T004 | Sidecar | Refactor to async I/O + shared AsyncClient | Pending |
| T005 | Sidecar | Replace `@app.on_event("startup")` with lifespan | Pending |
| T006 | Sidecar | Wire `calculate_ocr_residency()` into `process_ocr` | Pending |
| T007 | Sidecar | Path canonicalization + base-path whitelist on `/ocr` | Pending |
| T008 | Sidecar | Remove hardcoded runtime params (use from job snapshot) | Pending |
| T009 | Sidecar | Receive systemPrompt + DMS tags from backend, pass to Ollama | Pending |
| T010 | Sidecar | Remove `/normalize` endpoint (D2) | Pending |
| T011 | Backend | Send runtime params from `ai_execution_profiles` snapshot to sidecar | Pending |
| T012 | Backend | Wire Active Prompt injection for DMS tags + systemPrompt | Pending |
| T013 | Tests | Pytest for path-traversal (403) | Pending |
| T014 | Tests | Unit check for residency wiring | Pending |
### Phase 2: After ADR-041 Consolidation (X-API-Key removed)
| Task ID | Component | Summary | Status |
| :--- | :--- | :--- | :--- |
| T016 | Sidecar | Remove `X-API-Key` validation from endpoints | Pending (ADR-041 cutover) |
| T017 | Backend | Remove `X-API-Key` send-side in `OcrService` | Pending (ADR-041 cutover) |
| T018 | Backend | Remove `X-API-Key` send-side in `SandboxOcrEngineService` | Pending (ADR-041 cutover) |
---
## 📋 Consequences
### Positive
* **OOM Safety Retained:** Keeps Adaptive OCR Residency + CPU Fallback Retrieval, preventing VRAM exhaustion
* **Spec-Consistent:** Consistent with ADR-036, ADR-029/037, and CONTEXT.md
* **Smaller Sidecar Surface:** A pure compute worker with no business logic or parameter governance
* **Security Hardened:** Path traversal fix, no hardcoded secrets
* **Performance:** Async I/O reduces blocking; a shared AsyncClient reduces connection overhead
### Negative
* **Lose Defense-in-Depth Auth:** Removing `X-API-Key` leaves network isolation as the only control; mitigated by ACLs and the bridge network
* **Cross-Host Firewall Rule Mandatory:** In the current (pre-consolidation) topology, a VLAN/firewall ACL is required as an interim constraint
* **Migration Complexity:** Auth removal must be sequenced with the ADR-041 cutover
---
## 🚫 Out of Scope (Future ADR)
* 1-page-1-request horizontal scaling rework (Qwen 2.7): needs a separate spec + load evidence
* OpenTelemetry/Prometheus/Grafana observability (Qwen 4.4–4.5): separate ticket
* **/normalize endpoint**: already removed from the sidecar (D2); ThaiPreprocessProcessor has no consumers
---
## 🔄 Rollback Plan
* Revert `app.py` to the pre-refactor version
* Restore the `X-API-Key` send-side in `OcrService` and `SandboxOcrEngineService`
* Re-pin the `keep_alive` default to `0` in `process_ocr`
* Restore hardcoded runtime params (if an emergency fallback is needed)
---
## 📝 Verification Plan
1. Confirm backend send-side `X-API-Key` locations:
- `backend/src/modules/ai/services/ocr.service.ts`
- `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`
2. Confirm (via grep) that `calculate_ocr_residency` is not called in `app.py` before claiming the gap
3. ✅ Confirmed: no consumer uses the `/normalize` endpoint (grep finds no matches in backend)
4. Pytest for path traversal (expect 403)
5. Unit test for residency wiring
@@ -0,0 +1,336 @@
<!-- File: specs/06-Decision-Records/ADR-041-server-consolidation.md -->
<!-- Change Log
- 2026-06-20: Created initial ADR-041 documenting server consolidation decision.
- Co-locate all services on single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB).
- QNAP remains NAS for uploads/permanent storage via CIFS.
- Enables ADR-040 network-only auth for sidecar via Docker-internal isolation.
-->
# ADR-041: Single-Host Server Consolidation
**Status:** Proposed
**Date:** 2026-06-20
**Related Documents:**
- [ADR-040: OCR Sidecar Refactor](./ADR-040-ocr-sidecar-refactor.md)
- [ADR-016: Security & Authentication](./ADR-016-security-authentication.md)
- [ADR-023A: Unified AI Architecture](./ADR-023A-unified-ai-architecture.md)
- [ADR-034: AI Model Change](./ADR-034-AI-model-change.md)
- [CONTEXT.md](../../00-overview/CONTEXT.md)
---
## 🎯 Context and Problem Statement
### Current Architecture
LCBP3-DMS currently spreads its services across multiple machines:
| Service | Host | Hardware | Network |
|---------|------|----------|---------|
| Ollama (np-dms-ai, np-dms-ocr, nomic-embed) | Desk-5439 | RTX 4060 Ti 16GB | VLAN 10 (192.168.10.100) |
| OCR Sidecar (FastAPI) | Desk-5439 | Same as above | VLAN 10 (192.168.10.100) |
| Backend (NestJS) | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Frontend (Next.js) | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Redis | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| MariaDB | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| Elasticsearch | QNAP NAS | - | VLAN 10 (192.168.10.8) |
| File Storage | QNAP NAS | - | CIFS share `np-dms-as` |
### Problems Identified
1. **Cross-Host Trust Boundary:** Backend ↔ sidecar/Ollama traffic crosses the LAN (VLAN 10), so isolation depends on VLAN/firewall ACLs (ADR-040 §4)
2. **Management Complexity:** Services are spread across 2 hosts, which complicates deployment, monitoring, and troubleshooting
3. **GPU Resource Fragmentation:** Desk-5439 has a GPU but limited CPU/RAM, so it cannot run the backend
4. **Network Latency:** Backend ↔ Ollama calls over the LAN add latency to AI inference
5. **Hardware Underutilization:** The QNAP NAS has CPU/RAM but no GPU, so it cannot run AI models
### New Hardware
A new server is available:
- **CPU:** Ryzen 5 5600 (6 cores / 12 threads)
- **RAM:** 32GB DDR4
- **GPU:** RTX 5060 Ti 16GB
- **Storage:** SSD (OS) + HDD (data)
---
## ⚙️ Decision Drivers
* **Simplify Architecture:** Reduce the number of hosts from 2 to 1
* **Enable Docker-Internal Isolation:** Sidecar + backend on the same Docker bridge make network-level auth genuinely enforceable (ADR-040 D6)
* **Better Resource Utilization:** One host provides CPU, RAM, and GPU together
* **Reduce Network Latency:** Backend ↔ Ollama over the host-local Docker bridge instead of the LAN
* **Maintain Data Separation:** File storage stays on a NAS (ASUSTOR primary, QNAP backup), separate from compute
---
## 🏛️ Decisions
### D1: Co-locate All Services on Single Docker Host
Move all services to the new server:
- Ollama (np-dms-ai, np-dms-ocr, nomic-embed)
- OCR Sidecar (FastAPI)
- Backend (NestJS)
- Frontend (Next.js)
- Redis
- MariaDB
- Elasticsearch
**Retire Desk-5439** after a successful cutover
### D2: ASUSTOR as Primary NAS, QNAP as Backup
QNAP (192.168.10.8) is reduced to a backup server only.
ASUSTOR (192.168.10.9) becomes the Primary NAS for:
- Upload temp storage (`/data/uploads/temp`)
- Permanent file storage (`/data/uploads/permanent`)
- CIFS share `np-dms-as`, mounted on the new host at:
  - `/mnt/uploads/temp` → `//192.168.10.9/np-dms-as/data/uploads/temp`
  - `/mnt/uploads/permanent` → `//192.168.10.9/np-dms-as/data/uploads/permanent`
### D3: Docker-Internal Network Only for Sidecar/Ollama
- Sidecar and Ollama **do not publish ports to the LAN** (`expose` instead of `ports`)
- Services share an internal Docker bridge network (`dms-internal`)
- Backend reaches the sidecar and Ollama at `http://ocr-sidecar:8765` and `http://ollama:11434` (Compose service names)
- Frontend reaches the backend at `http://backend:3000`
- Only the Frontend and Backend publish ports to the LAN (80, 443, 3000)
**Enables ADR-040 D6:** Network isolation via the Docker-internal bridge makes removing `X-API-Key` viable
### D4: GPU VRAM Management Reinforced
The RTX 5060 Ti 16GB must accommodate:
- `np-dms-ai` (Typhoon-2.5 ~7–8B) ~6–8GB
- `np-dms-ocr` (Typhoon OCR) ~5GB
- `nomic-embed-text` ~0.5GB
- BGE-M3 + Reranker (ถ้า GPU-resident) ~4.5GB
- CUDA overhead ~1.5GB
**Total ≈ 17.5–19.5GB (more than the 16GB available) → OOM risk if everything is loaded at once**
**Mandatory:**
- ADR-040 D4 (Adaptive OCR Residency via `calculate_ocr_residency()`)
- ADR-040 D5 (CPU Fallback Retrieval for embed/rerank)
- LLM-First GPU Ownership (CONTEXT.md)
- Do not force BGE + Reranker to be permanently GPU-resident
### D5: RAM Budget Considerations
32GB RAM must accommodate:
- Node.js (Frontend) ~500MB
- NestJS (Backend) ~1–2GB
- MariaDB ~4–8GB (depends on dataset size)
- Redis ~500MB
- Elasticsearch ~2–4GB (depends on index size)
- Python (Sidecar) ~500MB
- Ollama ~1–2GB
- BGE/Reranker CPU-fallback tensors ~2–4GB
**Action Items:**
- Size DB/ES/Redis memory limits before cutover
- Monitor RAM usage after cutover
- Consider swap space if needed
### D6: Single Point of Failure (SPOF) Mitigation
Single host = SPOF risk
**Mitigation:**
- Regular backups of the database and file storage (to QNAP)
- Disaster recovery plan for hardware failure
- Consider a cold standby or failover strategy in the future
---
## 📋 Implementation Tasks
| Task ID | Phase | Summary | Status |
| :--- | :--- | :--- | :--- |
| T001 | Provision | Install Docker + Docker Compose on new host | Pending |
| T002 | Provision | Mount CIFS share from ASUSTOR to `/mnt/uploads` | Pending |
| T003 | Deploy | Create `docker-compose.yml` for new host topology | Pending |
| T004 | Deploy | Configure internal bridge network (`dms-internal`) | Pending |
| T005 | Deploy | Deploy services (Ollama, sidecar, backend, frontend, Redis, DB, ES) | Pending |
| T006 | Migrate | Migrate MariaDB data from QNAP to new host | Pending |
| T007 | Migrate | Migrate Elasticsearch indices from QNAP to new host | Pending |
| T008 | Cutover | Update DNS/load balancer to point to new host | Pending |
| T009 | Cutover | Run smoke tests on new host | Pending |
| T010 | ADR-040 | Remove `X-API-Key` from sidecar + backend (ADR-040 D6) | Pending |
| T011 | Cleanup | Stop services on QNAP (QNAP becomes backup server) | Pending |
| T012 | Cleanup | Retire Desk-5439 | Pending |
---
## 📋 Target docker-compose Layout (Draft)
```yaml
version: '3.8'

networks:
  dms-internal:
    driver: bridge
  dms-frontend:
    driver: bridge

services:
  # GPU Services (internal-only, no LAN publish)
  ollama:
    image: ollama/ollama:latest
    container_name: lcbp3-ollama
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - ollama_models:/root/.ollama
    networks:
      - dms-internal
    expose:
      - "11434"
    environment:
      - OLLAMA_KEEP_ALIVE=-1

  ocr-sidecar:
    build:
      context: ./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar
    container_name: lcbp3-ocr-sidecar
    restart: unless-stopped
    volumes:
      - asustor_uploads:/mnt/uploads:ro # Read-only CIFS mount from ASUSTOR
    networks:
      - dms-internal
    expose:
      - "8765"
    depends_on:
      - ollama
    environment:
      - OLLAMA_API_URL=http://ollama:11434
      - OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads

  # Backend Services (internal-only)
  backend:
    build:
      context: ./backend
    container_name: lcbp3-backend
    restart: unless-stopped
    volumes:
      - asustor_uploads:/app/uploads # Read-write: the backend writes temp and permanent uploads
    networks:
      - dms-internal
      - dms-frontend
    expose:
      - "3000"
    depends_on:
      - ollama
      - ocr-sidecar
      - redis
      - mariadb
      - elasticsearch
    environment:
      - OCR_API_URL=http://ocr-sidecar:8765
      - OLLAMA_API_URL=http://ollama:11434

  # Frontend (LAN publish)
  frontend:
    build:
      context: ./frontend
    container_name: lcbp3-frontend
    restart: unless-stopped
    networks:
      - dms-frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend

  # Data Services
  redis:
    image: redis:7-alpine
    container_name: lcbp3-redis
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - redis_data:/data

  mariadb:
    image: mariadb:10.11
    container_name: lcbp3-mariadb
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - mariadb_data:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=${DB_ROOT_PASSWORD}
      - MYSQL_DATABASE=lcbp3

  elasticsearch:
    image: elasticsearch:8.11.0
    container_name: lcbp3-elasticsearch
    restart: unless-stopped
    networks:
      - dms-internal
    volumes:
      - es_data:/usr/share/elasticsearch/data
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false

volumes:
  ollama_models:
  asustor_uploads:
    driver: local
    driver_opts:
      type: cifs
      o: "username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0"
      device: "//192.168.10.9/np-dms-as/data/uploads"
  redis_data:
  mariadb_data:
  es_data:
```
---
## 📋 Consequences
### Positive
* **Simplified Architecture:** Single host → easier deployment, monitoring, troubleshooting
* **True Network Isolation:** Docker-internal bridge enables ADR-040 D6 (network-only auth)
* **Reduced Latency:** Backend ↔ Ollama over the host-local Docker bridge
* **Better Resource Utilization:** A single host provides CPU, RAM, and GPU
* **Data Separation Maintained:** ASUSTOR is the Primary NAS, keeping data separate from compute; QNAP serves as the backup server
### Negative
* **SPOF Risk:** Single host = single point of failure
* **RAM Pressure:** 32GB must accommodate all services plus CPU-fallback tensors
* **Migration Complexity:** DB, ES, and file paths must all be migrated
* **GPU VRAM Pressure:** 16GB depends on adaptive residency + CPU fallback
---
## 🔄 Rollback Plan
1. Stop services on the new host
2. Restore services on QNAP (backend, frontend, Redis, DB, ES)
3. Restore services on Desk-5439 (Ollama, sidecar)
4. Revert DNS/load balancer to QNAP
5. Point the CIFS mount on QNAP back to ASUSTOR (192.168.10.9)
6. Restore `X-API-Key` in sidecar + backend (ADR-040 rollback)
---
## 📝 Verification Plan
1. Smoke tests on the new host:
- Backend health check
- Frontend accessible via LAN
- OCR endpoint functional
- AI inference functional
- File upload/download via CIFS
2. Monitor RAM/VRAM usage for 24–48 hours after cutover
3. Verify that ADR-040 D6 (network-only auth) works in practice
4. Verify that ADR-040 D4/D5 (adaptive residency + CPU fallback) work in practice
@@ -0,0 +1,34 @@
# Specification Quality Checklist: OCR Sidecar Refactor
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-20
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
All checklist items pass. Specification is ready for `/speckit-clarify` or `/speckit-plan`.
@@ -0,0 +1,246 @@
# Sidecar API Contract
**Version**: 1.0
**Date**: 2026-06-20
**Service**: OCR Sidecar (Desk-5439 in Phase 1; consolidated Docker host in Phase 2)
**Base URL**: `http://192.168.10.100:8765` (Phase 1) / `http://ocr-sidecar:8765` (Phase 2, Docker-internal)
## Overview
The OCR sidecar provides OCR processing capabilities as a pure compute worker. This document defines the API contract between backend services and the sidecar.
## Authentication
### Phase 1 (Before ADR-041 Consolidation)
All endpoints require `X-API-Key` header:
```http
X-API-Key: {OCR_SIDECAR_API_KEY}
```
If the header is missing or invalid, the sidecar returns `401 Unauthorized`.
### Phase 2 (After ADR-041 Consolidation)
No authentication required. Relies on Docker-internal network isolation.
## Endpoints
### POST /ocr
Extract text from PDF file using Typhoon OCR.
**Request Headers**:
```http
Content-Type: application/json
X-API-Key: {key} # Phase 1 only
```
**Request Body**:
```json
{
"pdf_path": "/mnt/uploads/temp/abc123.pdf",
"system_prompt": "Extract document metadata from: {{ocr_text}}...",
"dms_tags": {
"document_number": "RFA-2025-001",
"document_date": "2025-01-15",
"received_date": "2025-01-16"
},
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
},
"page_range": {
"start": 1,
"end": 3
}
}
```
**Request Fields**:
- `pdf_path` (string, required): Absolute path to PDF file. Must be within whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`).
- `system_prompt` (string, optional): System prompt from Active Prompt. Contains `{{ocr_text}}` placeholder.
- `dms_tags` (object, optional): DMS extraction tags to inject into prompt.
- `document_number` (string, optional): Document number
- `document_date` (string, optional): Document date
- `received_date` (string, optional): Received date
- `runtime_params` (object, required): Runtime parameters from `ai_execution_profiles`.
- `temperature` (number, required): Temperature (0.0 - 2.0)
- `top_p` (number, required): Top P (0.0 - 1.0)
- `repeat_penalty` (number, required): Repeat penalty (typically 1.0 - 2.0)
- `max_tokens` (number, required): Max tokens
- `page_range` (object, optional): Page range for processing.
- `start` (number, required): Start page (1-indexed)
- `end` (number, required): End page (inclusive)
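Because `page_range` is 1-indexed and inclusive while PyMuPDF pages are 0-indexed, the conversion is an easy place for off-by-one errors. This is a sketch: `page_indices` is a hypothetical helper, and clamping `end` to the page count (instead of rejecting it) is an assumption that this contract does not specify.

```python
def page_indices(start: int, end: int, page_count: int) -> range:
    """Map a 1-indexed, inclusive page_range onto 0-based PyMuPDF page indices."""
    if start < 1 or end < start:
        raise ValueError(f"invalid page_range: start={start}, end={end}")
    if start > page_count:
        raise ValueError(f"start={start} is beyond the last page ({page_count})")
    # Assumption: an end past the last page is clamped rather than rejected.
    # range() excludes its stop value, so the inclusive end page maps to stop=end.
    return range(start - 1, min(end, page_count))
```

For example, `{"start": 1, "end": 3}` selects `doc[0]`, `doc[1]`, and `doc[2]`.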
**Response (200 OK)**:
```json
{
"text": "Extracted text in Markdown format...",
"ocr_used": true,
"model_used": "typhoon-np-dms-ocr:latest",
"processing_time_ms": 1250,
"error": null
}
```
**Response Fields**:
- `text` (string): Extracted text in Markdown format
- `ocr_used` (boolean): Whether OCR was used (vs fast-path text layer)
- `model_used` (string): Model identifier
- `processing_time_ms` (number): Processing time in milliseconds
- `error` (string, nullable): Error message if failed
**Error Responses**:
- `400 Bad Request`: Invalid request body or parameters
- `401 Unauthorized`: Missing or invalid X-API-Key (Phase 1 only)
- `403 Forbidden`: Path outside whitelisted base directory
- `500 Internal Server Error`: Internal processing error
**Path Traversal Protection**:
- PDF path is canonicalized using `os.path.abspath()` + `os.path.realpath()`
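- The canonical path must lie under the whitelisted base path (`OCR_SIDECAR_UPLOAD_BASE`), compared per path component (a plain string-prefix check would also accept siblings such as `/mnt/uploads-old`)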
- Symlinks are resolved to their targets before whitelist check
- Returns `403 Forbidden` for any path outside base directory
### GET /health
Health check endpoint for monitoring.
**Response (200 OK)**:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
**Response Fields**:
- `status` (string): Service status ("healthy" or "unhealthy")
- `timestamp` (string): ISO 8601 timestamp
- `version` (string): Service version
## Removed Endpoints
### POST /normalize (REMOVED)
This endpoint has been removed per ADR-040 D2. ThaiPreprocessProcessor has no consumers in the backend (verified by grep search).
## Rate Limiting
No rate limiting implemented on sidecar. Rate limiting is handled by backend services.
## Error Handling
All errors return JSON responses with consistent format:
```json
{
"error": "Error message",
"code": "ERROR_CODE",
"timestamp": "2026-06-20T10:30:00Z"
}
```
**Common Error Codes**:
- `INVALID_REQUEST`: Invalid request body or parameters
- `UNAUTHORIZED`: Missing or invalid authentication
- `FORBIDDEN`: Path outside whitelisted directory
- `INTERNAL_ERROR`: Internal processing error
- `OCR_FAILED`: OCR processing failed
## Examples
### Example 1: Basic OCR Request (Phase 1)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 2: OCR with System Prompt and DMS Tags (Phase 1)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"system_prompt": "Extract document metadata from: {{ocr_text}}",
"dms_tags": {
"document_number": "RFA-2025-001",
"document_date": "2025-01-15"
},
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 3: OCR Request (Phase 2, Docker-internal)
```bash
curl -X POST http://ocr-sidecar:8765/ocr \
-H "Content-Type: application/json" \
-d '{
"pdf_path": "/mnt/uploads/temp/document.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
### Example 4: Path Traversal Attempt (Rejected)
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Response: `403 Forbidden`
```json
{
"error": "Path outside whitelisted base directory",
"code": "FORBIDDEN",
"timestamp": "2026-06-20T10:30:00Z"
}
```
## Version History
- **1.0** (2026-06-20): Initial version for OCR sidecar refactor
- Added POST /ocr with parameter governance
- Added path traversal protection
- Removed POST /normalize endpoint
- Documented Phase 1/Phase 2 auth migration
@@ -0,0 +1,319 @@
# Data Model: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Define data contracts and entity relationships for OCR sidecar refactor
## Overview
The OCR sidecar is a pure compute worker with no database access (ADR-023/023A boundary). All data persistence and business logic remain in backend services. This document defines the data contracts between backend and sidecar.
## Entities
### OCR Request (Backend → Sidecar)
```typescript
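// Field names are shown in camelCase; on the wire (/ocr API contract) they are
// snake_case: pdf_path, system_prompt, dms_tags, runtime_params, page_range.
interface OcrRequest {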
pdfPath: string; // Absolute path to PDF file (whitelisted)
systemPrompt?: string; // System prompt from Active Prompt
dmsTags?: { // DMS extraction tags from Active Prompt
documentNumber?: string;
documentDate?: string;
receivedDate?: string;
};
runtimeParams: { // Runtime parameters from ai_execution_profiles
temperature: number;
top_p: number;
repeat_penalty: number;
max_tokens: number;
};
pageRange?: { // Page range for processing
start: number;
end: number;
};
}
```
### OCR Response (Sidecar → Backend)
```typescript
// Wire format is snake_case: text, ocr_used, model_used, processing_time_ms, error.
interface OcrResponse {
text: string; // Extracted text (Markdown format)
ocrUsed: boolean; // Whether OCR was used (vs fast-path text layer)
modelUsed: string; // Model identifier (e.g., "typhoon-np-dms-ocr")
processingTimeMs: number; // Processing time in milliseconds
error?: string; // Error message if failed
}
```
### AI Execution Profile (Database)
```sql
-- Existing table (no schema changes)
CREATE TABLE ai_execution_profiles (
id INT AUTO_INCREMENT PRIMARY KEY,
profile_name VARCHAR(100) UNIQUE NOT NULL,
model_name VARCHAR(100) NOT NULL,
parameters JSON NOT NULL, -- { temperature, top_p, repeat_penalty, max_tokens, keep_alive }
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
-- Row for OCR extraction:
-- profile_name = 'ocr-extract'
-- parameters = { temperature: 0.7, top_p: 0.9, repeat_penalty: 1.1, max_tokens: 4096 }
```
### Active Prompt (Database)
```sql
-- Existing table (no schema changes per ADR-029/037)
CREATE TABLE ai_prompts (
id INT AUTO_INCREMENT PRIMARY KEY,
public_id UUID,
prompt_type VARCHAR(50) NOT NULL, -- 'ocr_extraction'
template TEXT NOT NULL, -- System prompt template with {{ocr_text}} placeholder
context_config JSON, -- DMS tags configuration
version INT NOT NULL,
is_active TINYINT(1) DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY (prompt_type, version)
);
-- Active prompt for OCR extraction:
-- prompt_type = 'ocr_extraction'
-- template = "Extract document metadata from: {{ocr_text}}..."
-- context_config = { dmsTags: { documentNumber: true, documentDate: true, receivedDate: true } }
```
## Data Flow
### Phase 1: OCR Request Flow (Before ADR-041)
```
Backend OcrService
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
3. Extract systemPrompt and DMS tags from Active Prompt
4. Build OcrRequest with parameters, systemPrompt, DMS tags
5. Send POST /ocr with X-API-Key header to sidecar
Sidecar (app.py)
1. Validate X-API-Key
2. Canonicalize pdfPath and check whitelist
3. Extract systemPrompt and DMS tags from request
4. Call calculate_ocr_residency(active_profile) for keep_alive
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
6. Return OcrResponse
Backend OcrService
1. Parse OcrResponse
2. Return extracted text to caller
```
### Phase 2: OCR Request Flow (After ADR-041)
```
Backend OcrService
1. Resolve parameters from ai_execution_profiles (row 'ocr-extract')
2. Resolve Active Prompt from ai_prompts (type 'ocr_extraction')
3. Extract systemPrompt and DMS tags from Active Prompt
4. Build OcrRequest with parameters, systemPrompt, DMS tags
5. Send POST /ocr (NO X-API-Key header) to sidecar
Sidecar (app.py)
1. NO X-API-Key validation (network isolation only)
2. Canonicalize pdfPath and check whitelist
3. Extract systemPrompt and DMS tags from request
4. Call calculate_ocr_residency(active_profile) for keep_alive
5. Process OCR with Ollama (inject systemPrompt + DMS tags)
6. Return OcrResponse
Backend OcrService
1. Parse OcrResponse
2. Return extracted text to caller
```
## Backend Service Changes
### OcrService Parameter Resolution
```typescript
// backend/src/modules/ai/services/ocr.service.ts
async extractMetadata(documentId: string): Promise<AIMetadata> {
  // 0. Load the document; `documentsService.findByPublicId` is a hypothetical lookup,
  //    use the module's existing document service/repository here
  const document = await this.documentsService.findByPublicId(documentId);
  // 1. Resolve runtime parameters from ai_execution_profiles
  const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
  const runtimeParams = profile.parameters; // { temperature, top_p, repeat_penalty, max_tokens }
  // 2. Resolve Active Prompt
  const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
  // 3. Build the request body in the sidecar's snake_case wire format
  const body = {
    pdf_path: document.filePath,
    system_prompt: activePrompt.template,
    dms_tags: activePrompt.context_config?.dmsTags ?? {},
    runtime_params: runtimeParams,
  };
  // 4. Send to sidecar; X-API-Key only while Phase 1 auth is active (unset in Phase 2)
  const headers = this.ocrApiKey ? { 'X-API-Key': this.ocrApiKey } : {};
  const response = await this.httpClient.post(`${this.ocrApiUrl}/ocr`, body, { headers });
  return response.data;
}
```
### SandboxOcrEngineService Parameter Resolution
```typescript
// backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
async processSandboxOcr(request: SandboxOcrRequest): Promise<SandboxOcrResult> {
  // Same parameter resolution pattern as OcrService
  const profile = await this.aiProfilesService.getActiveProfile('ocr-extract');
  const activePrompt = await this.aiPromptsService.getActivePrompt('ocr_extraction');
  // snake_case body, matching the sidecar's /ocr contract
  const body = {
    pdf_path: request.pdfPath,
    system_prompt: activePrompt.template,
    dms_tags: activePrompt.context_config?.dmsTags ?? {},
    runtime_params: profile.parameters,
  };
  // X-API-Key only while Phase 1 auth is active (unset in Phase 2)
  const headers = this.ocrApiKey ? { 'X-API-Key': this.ocrApiKey } : {};
  const response = await this.httpClient.post(`${this.ocrApiUrl}/ocr`, body, { headers });
  return response.data;
}
```
## Sidecar API Changes
### POST /ocr Request Body
```python
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
from typing import Dict, Optional

from pydantic import BaseModel


class RuntimeParams(BaseModel):
    temperature: float
    top_p: float
    repeat_penalty: float
    max_tokens: int


class PageRange(BaseModel):
    start: int  # 1-indexed
    end: int  # inclusive


# Nested models are declared first so these annotations resolve at class creation.
class OcrRequest(BaseModel):
    pdf_path: str
    system_prompt: Optional[str] = None
    dms_tags: Optional[Dict[str, str]] = None
    runtime_params: RuntimeParams
    page_range: Optional[PageRange] = None
```
### POST /ocr Response Body
```python
class OcrResponse(BaseModel):
text: str
ocr_used: bool
model_used: str
processing_time_ms: float
error: Optional[str] = None
```
## Environment Variables
### Sidecar Environment Variables
```bash
# specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=required_value # Fail-fast if missing
# Phase 2 (after ADR-041) - remove OCR_SIDECAR_API_KEY
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads # CIFS mount base path
OLLAMA_API_URL=http://localhost:11434
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
```
### Backend Environment Variables
```bash
# backend/.env
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=required_value # Send-side X-API-Key
# Phase 2 (after ADR-041) - remove OCR_API_KEY
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads # Backend view of uploads
```
## Validation Rules
### Path Canonicalization (Sidecar)
```python
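import os

from fastapi import HTTPException


def validate_pdf_path(pdf_path: str, base_path: str) -> str:
    """Canonicalize pdf_path and reject anything outside base_path (HTTP 403)."""
    # 1. Canonicalize both sides: realpath resolves symlinks and ".." and returns an absolute path
    canonical = os.path.realpath(pdf_path)
    base = os.path.realpath(base_path)
    # 2. Component-wise containment check; a plain startswith() would also accept
    #    sibling directories such as /mnt/uploads-old when base is /mnt/uploads
    if os.path.commonpath([canonical, base]) != base:
        raise HTTPException(
            status_code=403,
            detail="Path outside whitelisted base directory",
        )
    return canonical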
```
### Parameter Validation (Backend)
```typescript
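// Validate runtime parameters from ai_execution_profiles
const isFiniteNumber = (value: unknown): value is number =>
  typeof value === 'number' && Number.isFinite(value);

function validateRuntimeParams(params: any): RuntimeParams {
  // Check type and range explicitly; a falsy check (!params.temperature) would reject a valid 0
  if (!isFiniteNumber(params?.temperature) || params.temperature < 0 || params.temperature > 2) {
    throw new BusinessException('Invalid temperature value');
  }
  if (!isFiniteNumber(params?.top_p) || params.top_p < 0 || params.top_p > 1) {
    throw new BusinessException('Invalid top_p value');
  }
  // ... similar validation for other params
  return params;
}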
```
## No Schema Changes
This refactor does not require database schema changes:
- `ai_execution_profiles` table already exists (ADR-036)
- `ai_prompts` table already exists (ADR-029/037)
- No new tables or columns needed
- Per ADR-009: No TypeORM migrations (edit SQL directly if needed, but not needed here)
@@ -0,0 +1,147 @@
# Implementation Plan: OCR Sidecar Refactor
**Branch**: `140-ocr-sidecar-refactor` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md`
## Summary
Refactor the OCR sidecar on Desk-5439 to address security vulnerabilities (hardcoded API keys, path traversal), implement async I/O for performance, preserve GPU resource management policies (Adaptive OCR Residency, CPU Fallback Retrieval), and align with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System. The sidecar becomes a pure compute worker with all orchestration and parameter governance moved to backend services.
## Technical Context
**Language/Version**: Python 3.11+ (FastAPI)
**Primary Dependencies**: FastAPI 0.111.0, httpx 0.27.0, PyMuPDF 1.24.0, typhoon-ocr>=0.4.1, FlagEmbedding>=1.2.0, pythainlp 5.0.4
**Storage**: No database access (ADR-023/023A boundary - sidecar is pure compute worker)
**Testing**: pytest for path-traversal and residency wiring tests
**Target Platform**: Desk-5439 (192.168.10.100, Windows 10/11, RTX 5060 Ti 16GB GPU) via Docker
**Project Type**: Infrastructure (sidecar service)
**Performance Goals**: 20%+ throughput improvement with async I/O; VRAM exhaustion prevention under load
**Constraints**: Must preserve LLM-First GPU Ownership; must not bypass existing residency_policy.py; must align with ADR-036 Gap-2 (keep_alive as lazy resource param)
**Scale/Scope**: Single sidecar service; affects backend AI services (OcrService, SandboxOcrEngineService)
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
| Gate | Status | Justification |
|------|--------|---------------|
| ADR-019 UUID | ✅ PASS | Sidecar N/A (pure compute worker), Backend applies ADR-019 (parameter resolution in OcrService/SandboxOcrEngineService) |
| ADR-009 Schema | N/A | No database schema changes in sidecar |
| ADR-016 Security | ✅ PASS | Path traversal hardening; no hardcoded secrets; X-API-Key auth until ADR-041, then Docker-network isolation |
| ADR-002 Numbering | N/A | No document numbering in sidecar |
| ADR-008 BullMQ | N/A | Sidecar does not use BullMQ (backend does) |
| ADR-023/023A AI Boundary | ✅ PASS | Sidecar is pure compute worker; no DB/storage access; AI → DMS API → DB pattern preserved |
| ADR-007 Errors | ✅ PASS | FastAPI exception handling with user-friendly messages |
| TypeScript Strict | N/A | Python codebase |
## Project Structure
### Documentation (this feature)
```text
specs/100-Infrastructures/140-ocr-sidecar-refactor/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output (technical decisions from ADR-040)
├── data-model.md # Phase 1 output (data contracts)
├── quickstart.md # Phase 1 output (deployment guide)
├── contracts/ # Phase 1 output (API contracts)
│ └── sidecar-api.md # Sidecar API specification
└── tasks.md # Phase 2 output (implementation tasks)
```
### Source Code
```text
specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/
├── app.py # FastAPI application (main refactor target)
├── residency_policy.py # Retain (Adaptive OCR Residency)
├── vram_monitor.py # Retain (VRAM monitoring)
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
├── docker-compose.yml # Orchestration
└── .env # Environment variables
backend/src/modules/ai/
├── services/
│ ├── ocr.service.ts # Parameter resolution + sidecar calls
│ └── sandbox-ocr-engine.service.ts # Sandbox parameter resolution
└── processors/
└── ai-batch.processor.ts # BullMQ processor (unchanged)
tests/
├── unit/
│ └── ocr-sidecar/ # Sidecar unit tests
│ ├── test_path_traversal.py # Path traversal tests
│ └── test_residency_wiring.py # Residency calculation tests
└── integration/
└── ocr-sidecar/ # Sidecar integration tests
```
**Structure Decision**: Infrastructure refactor targeting existing OCR sidecar on Desk-5439. Backend changes limited to parameter resolution in AI services. No new frontend changes.
## Complexity Tracking
> No constitution violations; all gates pass. This section is not applicable.
## Phase 0: Research & Technical Decisions
All technical decisions are already documented in ADR-040. Key decisions:
### Security Decisions
- **Decision**: Remove hardcoded default API key; fail-fast if env missing
- **Rationale**: Security vulnerability - leaked key cannot be rotated without rebuild
- **Decision**: Implement path canonicalization + base-path whitelist
- **Rationale**: Prevent path traversal attacks (ADR-016)
### I/O Pattern Decisions
- **Decision**: Refactor to async I/O with shared AsyncClient via lifespan
- **Rationale**: Synchronous blocking I/O reduces throughput under load
- **Decision**: Replace `@app.on_event("startup")` with lifespan context manager
- **Rationale**: Deprecated pattern; lifespan provides better resource management
### GPU Resource Management Decisions
- **Decision**: Wire `calculate_ocr_residency()` into `process_ocr` for dynamic keep_alive
- **Rationale**: Preserve Adaptive OCR Residency policy (CONTEXT.md); avoid fixed values
- **Decision**: Retain vram_monitor.py and residency_policy.py
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- **Decision**: Reject forced GPU-resident BGE-M3/Reranker
- **Rationale**: CPU fallback is required for VRAM pressure scenarios
### Parameter Governance Decisions
- **Decision**: Remove hardcoded runtime params; accept from backend job snapshot
- **Rationale**: ADR-036 Profile-Only Parameter Governance; dynamic tuning without rebuild
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in DB not code
- **Decision**: Reject creating PromptBuilderService
- **Rationale**: Use existing Active Prompt system; avoid invented orchestration
### Auth Decisions
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- **Rationale**: Sequenced migration; network isolation only possible post-consolidation
- **Decision**: Interim period requires X-API-Key validation
- **Rationale**: Cross-host topology (before ADR-041) requires defense-in-depth
### Endpoint Decisions
- **Decision**: Remove /normalize endpoint
- **Rationale**: No consumers (verified by grep); ThaiPreprocessProcessor unused
- **Decision**: Fix mutable default argument `options_override={}`
- **Rationale**: Python anti-pattern; causes unexpected behavior
## Phase 1: Design & Contracts
### Data Model
See [data-model.md](./data-model.md) for detailed data contracts and entity relationships.
### API Contracts
See [contracts/sidecar-api.md](./contracts/sidecar-api.md) for sidecar API specification.
### Quickstart Guide
See [quickstart.md](./quickstart.md) for deployment and testing instructions.
## Phase 2: Implementation (Tasks)
See [tasks.md](./tasks.md) for detailed implementation tasks generated by `/speckit-tasks`.
@@ -0,0 +1,374 @@
# Quickstart: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Deployment and testing guide for OCR sidecar refactor
## Prerequisites
- Access to Desk-5439 (192.168.10.100) with Docker
- Access to backend services (QNAP 192.168.10.8)
- Python 3.11+ for local testing (optional)
- pytest for testing (optional)
## Phase 1: Deployment (Before ADR-041 Consolidation)
### Step 1: Update Sidecar Code
1. Navigate to sidecar directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
```
2. Update `app.py` with the following changes:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Implement async I/O with `httpx.AsyncClient` via lifespan
- Replace `@app.on_event("startup")` with lifespan context manager
- Wire `calculate_ocr_residency()` into `process_ocr`
- Implement path canonicalization + base-path whitelist on `/ocr`
- Remove hardcoded runtime parameters
- Receive systemPrompt and DMS tags from backend
- Remove `/normalize` endpoint
- Fix mutable default argument `options_override={}`
- Load models via `asyncio.to_thread` during lifespan
3. Update `requirements.txt`:
```text
PyMuPDF==1.24.0
fastapi==0.111.0
uvicorn[standard]==0.30.1
python-multipart==0.0.9
httpx==0.27.0
FlagEmbedding>=1.2.0
typhoon-ocr>=0.4.1
```
4. Update `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
OCR_MODEL=np-dms-ocr:latest
```
### Step 2: Update Backend Services
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Add parameter resolution from `ai_execution_profiles` (row `ocr-extract`)
- Add Active Prompt resolution from `ai_prompts` (type `ocr_extraction`)
- Extract systemPrompt and DMS tags from Active Prompt
- Send resolved parameters to sidecar in OCR requests
- Keep X-API-Key send-side (Phase 1)
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Same parameter resolution pattern as OcrService
- Keep X-API-Key send-side (Phase 1)
3. Update backend `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
### Step 3: Rebuild and Deploy Sidecar
1. Build Docker image on Desk-5439:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
docker-compose build
```
2. Stop existing container:
```bash
docker-compose down
```
3. Start new container:
```bash
docker-compose up -d
```
4. Verify health:
```bash
curl http://192.168.10.100:8765/health
```
Expected response:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
### Step 4: Deploy Backend Changes
1. Build backend:
```bash
cd backend
pnpm run build
```
2. Deploy backend containers (via existing deploy script or manual):
```bash
# From repo root
./scripts/deploy.sh
```
3. Verify backend health:
```bash
curl http://localhost:3001/api/ai/health
```
## Phase 2: Deployment (After ADR-041 Consolidation)
**Note**: This phase can only be executed after ADR-041 server consolidation completes (single Docker host).
### Step 1: Remove X-API-Key from Sidecar
1. Update `app.py` on sidecar:
- Remove X-API-Key validation from all endpoints
- Remove `OCR_SIDECAR_API_KEY` environment variable check
2. Update `.env` on sidecar:
```bash
# Remove OCR_SIDECAR_API_KEY line
# Keep common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
OCR_MODEL=np-dms-ocr:latest
```
3. Rebuild and redeploy sidecar:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Step 2: Remove X-API-Key from Backend
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
3. Update backend `.env`:
```bash
# Remove OCR_API_KEY line
# Keep common variables
OCR_API_URL=http://sidecar:8765 # Docker-internal URL
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
4. Rebuild and redeploy backend:
```bash
cd backend
pnpm run build
cd ..
./scripts/deploy.sh
```
## Testing
### Unit Tests (Sidecar)
1. From the repository root, run the path traversal tests:
```bash
pytest tests/unit/ocr-sidecar/test_path_traversal.py -v
```
Expected output: All tests pass; path traversal attempts return 403
2. Run the residency wiring tests:
```bash
pytest tests/unit/ocr-sidecar/test_residency_wiring.py -v
```
Expected output: All tests pass; `calculate_ocr_residency()` is called correctly
### Integration Tests (Backend)
1. Run backend AI service tests:
```bash
cd backend
pnpm test ai/ocr.service.spec.ts
pnpm test ai/sandbox-ocr-engine.service.spec.ts
```
2. Verify parameter resolution from database:
- Check that `ai_execution_profiles` row `ocr-extract` exists
- Check that `ai_prompts` has active row for `ocr_extraction` type
- Verify parameters are correctly resolved and sent to sidecar
### Manual Testing
1. Test path traversal protection:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `403 Forbidden`
2. Test valid OCR request:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/test.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `200 OK` with extracted text
3. Test parameter governance:
- Modify `ai_execution_profiles` row `ocr-extract` parameters
- Run OCR request
- Verify new parameters are used (check sidecar logs)
4. Test Active Prompt integration:
- Modify active prompt in `ai_prompts` for `ocr_extraction`
- Run OCR request
- Verify new system prompt is used
## Performance Testing
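1. Benchmark async vs sync I/O by running the same load against the previous (sync) build and the new (async) build, then comparing requests per second:
```bash
# Use Apache Bench or a similar tool; Phase 1 requires the X-API-Key header
ab -n 1000 -c 10 -H "X-API-Key: your-api-key" -p ocr_request.json -T application/json \
http://192.168.10.100:8765/ocr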
```
Expected: 20%+ throughput improvement with async I/O
2. Monitor VRAM usage:
```bash
# On Desk-5439, monitor GPU usage during OCR operations
nvidia-smi -l 1
```
Expected: VRAM usage stays within limits, no exhaustion
## Monitoring
### Health Checks
- Sidecar health: `GET http://192.168.10.100:8765/health`
- Backend AI health: `GET http://localhost:3001/api/ai/health`
### Logs
- Sidecar logs: `docker-compose logs -f ocr-sidecar`
- Backend logs: Check backend application logs
### Metrics
- Monitor OCR request latency
- Monitor VRAM usage on Desk-5439
- Monitor error rates (403 for path traversal, 500 for internal errors)
## Rollback
If issues arise during deployment:
### Rollback Sidecar
1. Revert `app.py` to previous version
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Rollback Backend
1. Revert service changes in `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
cd backend
pnpm run build
cd ..
./scripts/deploy.sh
```
### Emergency Rollback
If immediate rollback is needed:
1. Revert `keep_alive` to fixed value `0` in `process_ocr`
2. Restore hardcoded runtime parameters
3. Restore X-API-Key validation
4. Rebuild and redeploy
## Troubleshooting
### Sidecar fails to start
1. Check environment variables are set correctly
2. Check `OCR_SIDECAR_API_KEY` is provided (Phase 1)
3. Check Docker logs: `docker-compose logs ocr-sidecar`
4. Verify Ollama is running on Desk-5439
### Path traversal returns 200 instead of 403
1. Verify `OCR_SIDECAR_UPLOAD_BASE` is set correctly
2. Check path canonicalization logic in `app.py`
3. Test with absolute paths to verify whitelist check
### Parameters not being used
1. Check `ai_execution_profiles` row `ocr-extract` exists
2. Check backend service parameter resolution logic
3. Check sidecar receives parameters in request body
4. Check sidecar passes parameters to Ollama
### VRAM exhaustion
1. Check `calculate_ocr_residency()` is being called
2. Check `vram_monitor.py` and `residency_policy.py` are present
3. Verify CPU fallback is working for `/embed` and `/rerank`
4. Monitor GPU usage with `nvidia-smi`
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- [Sidecar API Contract](./contracts/sidecar-api.md)
@@ -0,0 +1,179 @@
# Research: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Document technical decisions and research findings from ADR-040
## Overview
All technical decisions for this refactor are already documented in ADR-040. This file consolidates those decisions for implementation reference.
## Security Decisions
### Hardcoded API Key Removal
- **Decision**: Remove hardcoded default API key (`lcbp3-dms-ocr-sidecar-secure-token-2026`) from `app.py`
- **Rationale**: Security vulnerability - if leaked, key cannot be rotated without rebuilding container
- **Implementation**: Fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
- **Phase**: Phase 1 (before ADR-041 consolidation)
### Path Traversal Hardening
- **Decision**: Implement path canonicalization + base-path whitelist on `/ocr` endpoint
- **Rationale**: Prevent arbitrary file read attacks (ADR-016)
- **Implementation**:
- Use `os.path.abspath()` + `os.path.realpath()` for canonicalization
- Whitelist base path = `OCR_SIDECAR_UPLOAD_BASE` (CIFS mount base)
- Reject paths outside the base path (compared by path component, so `/mnt/uploads-evil` does not pass for base `/mnt/uploads`) → 403 Forbidden
- **Alternatives Considered**:
- Using path validation regex only → rejected (insufficient for symlink attacks)
- Chroot jail → rejected (overkill for this use case)
## I/O Pattern Decisions
### Async I/O Refactor
- **Decision**: Refactor `process_ocr` to `async def` and use `httpx.AsyncClient` shared via lifespan
- **Rationale**: Synchronous blocking I/O blocks the FastAPI event loop and reduces throughput under load
- **Implementation**:
- Replace `httpx.Client` with `httpx.AsyncClient`
- Create AsyncClient in lifespan context manager
- Load models via `asyncio.to_thread` to avoid blocking startup
- **Performance Target**: 20%+ throughput improvement under concurrent load
- **Alternatives Considered**:
- Keep sync I/O but add more workers → rejected (still blocks event loop)
- Use thread pool → rejected (adds complexity without solving root cause)
### Lifespan Pattern
- **Decision**: Replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan
- **Rationale**: Deprecated pattern; lifespan provides better resource management and cleanup
- **Implementation**: Use FastAPI lifespan context manager for AsyncClient lifecycle
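A minimal, dependency-free sketch of the lifespan pattern, showing a blocking model load moved off the event loop with `asyncio.to_thread` and a shared client closed exactly once on shutdown. `load_embedding_model` and `SharedClient` are hypothetical stand-ins; in the sidecar the client would be `httpx.AsyncClient`, which exposes the same `aclose()` coroutine, and the loader would load BGE-M3 or the reranker.

```python
import asyncio
import time
from contextlib import asynccontextmanager


def load_embedding_model() -> dict:
    """Hypothetical blocking loader standing in for real model weights."""
    time.sleep(0.1)  # simulates slow, blocking disk/GPU work
    return {"name": "bge-m3", "loaded": True}


class SharedClient:
    """Hypothetical stand-in for httpx.AsyncClient (opened once, closed once)."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # Startup: run the blocking load in a worker thread so the loop stays responsive
    app.state.embed_model = await asyncio.to_thread(load_embedding_model)
    app.state.http = SharedClient()
    try:
        yield
    finally:
        # Shutdown: release the shared client even if the app exits with an error
        await app.state.http.aclose()
```

With FastAPI this is wired as `app = FastAPI(lifespan=lifespan)`; handlers then reach the shared objects through `request.app.state`, so every request reuses one connection pool instead of opening a client per call.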
## GPU Resource Management Decisions
### Adaptive OCR Residency
- **Decision**: Wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive`
- **Rationale**: Preserve Adaptive OCR Residency policy from CONTEXT.md; avoid fixed values
- **Implementation**:
- Import `calculate_ocr_residency` from `residency_policy.py`
- Call function during OCR request to calculate appropriate keep_alive
- Do NOT accept explicit `options_override["keep_alive"]` from backend
- keep_alive is a lazy resource parameter calculated at process time (ADR-036 Gap-2)
- **Alternatives Rejected**:
- Fixed `keep_alive=0` (Claude plan) → rejected (violates ADR-036 Gap-2)
- Fixed `keep_alive=10m` (Qwen plan) → rejected (violates adaptive policy)
### Retain VRAM Monitor and Residency Policy
- **Decision**: Retain `vram_monitor.py` and `residency_policy.py` modules
- **Rationale**: LLM-First GPU Ownership + CPU Fallback Retrieval must be preserved
- **Alternatives Rejected**:
- Delete these modules (Claude + Qwen plans) → rejected (violates CONTEXT.md resolved GPU policies)
### CPU Fallback for Retrieval
- **Decision**: Retain dynamic CPU/GPU selection for `/embed` and `/rerank` via `.to(device)` logic
- **Rationale**: CPU fallback required when GPU is under pressure; prevents VRAM exhaustion
- **Alternatives Rejected**:
- Force BGE-M3 and Reranker GPU-resident → rejected (violates LLM-First policy)
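A minimal sketch of an LLM-first device choice for BGE-M3 and the reranker, assuming free VRAM (in MB) is reported by `vram_monitor.py` and that a fixed headroom is reserved for the LLM. `select_device`, the `headroom_mb` default, and the example sizes are hypothetical; the real thresholds belong to `residency_policy.py`.

```python
def select_device(
    cuda_available: bool,
    free_vram_mb: float,
    required_mb: float,
    headroom_mb: float = 1024.0,
) -> str:
    """Return "cuda" only if the model fits and still leaves headroom for the LLM."""
    # LLM-First GPU Ownership: retrieval models never eat into the LLM's reserve
    if cuda_available and free_vram_mb - required_mb >= headroom_mb:
        return "cuda"
    # CPU fallback: slower, but cannot cause VRAM exhaustion
    return "cpu"
```

The result would feed the existing `.to(device)` call; if it is evaluated per request, the choice tracks GPU pressure as it changes.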
## Parameter Governance Decisions
### Remove Hardcoded Runtime Parameters
- **Decision**: Remove hardcoded `temperature`, `top_p`, `repeat_penalty`, `max_tokens` from sidecar
- **Rationale**: ADR-036 Profile-Only Parameter Governance; enable dynamic tuning without rebuild
- **Implementation**:
- Backend resolves parameters from `ai_execution_profiles` row `ocr-extract`
- Backend sends parameters to sidecar in every request
- Sidecar passes parameters to Ollama in every load/generate call
- The Ollama Modelfile serves as the last-resort fallback only
- **Alternatives Rejected**:
- Keep hardcoded values in sidecar → rejected (violates ADR-036)
- Create new `PromptBuilderService` → rejected (use existing Active Prompt system)
### Active Prompt Integration
- **Decision**: Backend resolves systemPrompt and DMS tags from Active Prompt in `ai_prompts`
- **Rationale**: ADR-029/037 Active Prompt System; prompt authority in database not code
- **Implementation**:
- Backend resolves Active Prompt for `ocr_extraction` type
- Backend extracts systemPrompt and DMS tags (`<document_number>`, `<document_date>`, `<received_date>`)
- Backend sends systemPrompt and DMS tags to sidecar
- Sidecar receives and injects into Ollama request in every load/generate call
- **Alternatives Rejected**:
- Create new `PromptBuilderService` → rejected (use existing ADR-029/037 system)
- Hardcode DMS tags in sidecar → rejected (violates ADR-036 parameter governance)
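A minimal sketch of the sidecar side, assuming the native Ollama `/api/generate` body (`model`, `system`, `prompt`, `images`, `options`, `keep_alive`, `stream`). `build_generate_payload` and the way the DMS tags are rendered into `prompt` are hypothetical; the real wording comes from the Active Prompt and the request field names from `contracts/sidecar-api.md`. The point shown is that the sidecar holds no prompt of its own and refuses to call Ollama without one.

```python
from typing import Any


def build_generate_payload(
    model: str,
    page_image_b64: str,
    system_prompt: str,
    dms_tags: list[str],
    options: dict[str, Any],
    keep_alive: Any,
) -> dict[str, Any]:
    """Assemble one Ollama /api/generate body from backend-supplied inputs."""
    if not system_prompt:
        # No fallback prompt in code: prompt authority stays in ai_prompts
        raise ValueError("system_prompt is required (resolved by the backend)")
    # Hypothetical rendering of the DMS extraction tags into the user prompt
    tag_hint = ("Wrap extracted fields in: " + ", ".join(dms_tags)) if dms_tags else ""
    return {
        "model": model,
        "system": system_prompt,
        "prompt": tag_hint,
        "images": [page_image_b64],  # base64-encoded page image
        "options": options,
        "keep_alive": keep_alive,
        "stream": False,
    }
```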
## Authentication Decisions
### Two-Phase Auth Migration
- **Decision**: Phase 1 - Remove hardcoded default key; Phase 2 - Remove X-API-Key after ADR-041
- **Rationale**: Sequenced migration; network isolation only possible after server consolidation
- **Phase 1 Implementation**:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Continue validating X-API-Key on both sidecar and backend
- **Phase 2 Implementation** (after ADR-041 consolidation):
- Remove X-API-Key validation from sidecar endpoints
- Remove X-API-Key send-side from `OcrService`
- Remove X-API-Key send-side from `SandboxOcrEngineService`
- Rely on Docker-internal network isolation
- **Interim Period**: X-API-Key validation must remain active until ADR-041 cutover
- **Alternatives Considered**:
- Remove X-API-Key immediately → rejected (cross-host topology requires defense-in-depth)
- Keep X-API-Key permanently → rejected (adds complexity without value post-consolidation)
## Endpoint Decisions
### Remove /normalize Endpoint
- **Decision**: Remove `/normalize` endpoint from sidecar
- **Rationale**: No consumers exist (verified by grep across backend codebase); ThaiPreprocessProcessor unused
- **Verification**: Grep search found no calls to `/normalize` or `THAI_PREPROCESS_URL`
- **Impact**: None - endpoint has no consumers
### Fix Mutable Default Argument
- **Decision**: Fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
- **Rationale**: Python anti-pattern; the default dict is created once at function definition and shared across calls, so a mutation in one request leaks into the next
- **Implementation**: Change to `options_override: dict | None = None` and initialize to `{}` in the function body
## Dependencies
### External Dependencies
- **FastAPI 0.111.0**: Web framework (already in use)
- **httpx 0.27.0**: HTTP client (already in use; switch from `httpx.Client` to `httpx.AsyncClient`)
- **PyMuPDF 1.24.0**: PDF processing (already in use)
- **typhoon-ocr>=0.4.1**: OCR library (already in use)
- **FlagEmbedding>=1.2.0**: Embedding model (already in use)
- **pythainlp 5.0.4**: Thai NLP (already in use)
### Internal Dependencies
- **residency_policy.py**: Must retain for Adaptive OCR Residency
- **vram_monitor.py**: Must retain for VRAM monitoring
- **backend AI services**: OcrService, SandboxOcrEngineService must be updated for parameter resolution
## Testing Strategy
### Path Traversal Tests
- Test cases for various path traversal patterns (`../../etc/passwd`, symlinks, etc.)
- Expect 403 Forbidden for all malicious paths
- Use pytest for automated testing
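A minimal sketch of two such cases in pytest style, which also run under plain `python` because they use `tempfile` instead of pytest fixtures. `is_within_base` is a hypothetical helper that applies the canonicalize-then-whitelist rule without the HTTP layer, comparing path components with `os.path.commonpath` rather than string prefixes. The symlink case assumes a POSIX filesystem (the sidecar runs in a Linux container); on Windows, `os.symlink` may need extra privileges.

```python
import os
import tempfile


def is_within_base(pdf_path: str, base_path: str) -> bool:
    """Canonicalize both paths, then check containment by path component."""
    canonical = os.path.realpath(os.path.abspath(pdf_path))
    base = os.path.realpath(os.path.abspath(base_path))
    return os.path.commonpath([canonical, base]) == base


def test_traversal_patterns_rejected() -> None:
    with tempfile.TemporaryDirectory() as root:
        base = os.path.join(root, "uploads")
        os.makedirs(os.path.join(base, "temp"))
        assert is_within_base(os.path.join(base, "temp", "a.pdf"), base)
        # ".." segments climbing out of the base directory
        assert not is_within_base(os.path.join(base, "temp", "..", "..", "x.pdf"), base)
        # Prefix trick: "uploads-evil" shares a string prefix with "uploads"
        assert not is_within_base(os.path.join(root, "uploads-evil", "a.pdf"), base)


def test_symlink_escape_rejected() -> None:
    with tempfile.TemporaryDirectory() as root:
        base = os.path.join(root, "uploads")
        outside = os.path.join(root, "secret")
        os.makedirs(base)
        os.makedirs(outside)
        link = os.path.join(base, "link")
        os.symlink(outside, link)  # lives inside base, points outside it
        assert not is_within_base(os.path.join(link, "a.pdf"), base)
```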
### Residency Wiring Tests
- Unit test to verify `calculate_ocr_residency()` is called in `process_ocr`
- Verify keep_alive value is calculated dynamically, not fixed
- Test with different VRAM pressure scenarios
### Performance Tests
- Benchmark async vs sync I/O under concurrent load
- Target: 20%+ throughput improvement
- Measure response times and resource utilization
## Rollback Plan
If issues arise during deployment:
1. Revert `app.py` to previous version
2. Restore X-API-Key send-side in backend services
3. Re-pin `keep_alive` default to `0` in `process_ocr`
4. Restore hardcoded runtime params if needed for emergency fallback
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- CONTEXT.md: GPU Policy (LLM-First Ownership, CPU Fallback)
@@ -0,0 +1,168 @@
# Feature Specification: OCR Sidecar Refactor
**Feature Branch**: `140-ocr-sidecar-refactor`
**Created**: 2026-06-20
**Status**: Draft
**Input**: ADR-040: OCR Sidecar Refactor — Pure Compute Worker, Preserved GPU Policy, Network-Trust Boundary
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Sidecar Security Hardening (Priority: P1)
System administrators need to ensure the OCR sidecar on Desk-5439 is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
**Why this priority**: Security vulnerabilities (hardcoded API keys, path traversal) are critical risks that could lead to unauthorized access and data breaches.
**Independent Test**: Can be fully tested by sending path traversal requests and by starting the sidecar without `OCR_SIDECAR_API_KEY` to confirm that it refuses to start (no hardcoded default key), delivering immediate security validation.
**Acceptance Scenarios**:
1. **Given** the API key has leaked, **When** an administrator sets a new `OCR_SIDECAR_API_KEY` and restarts the container, **Then** the old key is rejected and no image rebuild is required
2. **Given** a malicious request with path traversal (e.g., `../../etc/passwd`), **When** the `/ocr` endpoint receives the request, **Then** the system returns 403 Forbidden
3. **Given** the sidecar starts without `OCR_SIDECAR_API_KEY` environment variable, **When** the container initializes, **Then** it fails fast with clear error message
---
### User Story 2 - GPU Resource Management (Priority: P1)
The system must prevent VRAM exhaustion on Desk-5439 (RTX 5060 Ti 16GB) by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring the LLM (Typhoon-2.5) has priority GPU access.
**Why this priority**: VRAM exhaustion causes complete system failure. The LLM-First GPU Ownership policy is critical for system stability.
**Independent Test**: Can be fully tested by monitoring VRAM usage during concurrent OCR and embedding operations, verifying that BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
**Acceptance Scenarios**:
1. **Given** the GPU is under heavy load from LLM operations, **When** an OCR request comes in, **Then** the system uses `calculate_ocr_residency()` to determine appropriate `keep_alive` value
2. **Given** VRAM is nearly full, **When** embedding or reranking requests are made, **Then** BGE-M3 and FlagReranker automatically fall back to CPU
3. **Given** the sidecar loads OCR model, **When** the operation completes, **Then** the model is unloaded based on residency policy (not fixed `keep_alive=0` or `300`)
---
### User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
Backend services need to control AI model parameters (temperature, top_p, repeat_penalty, max_tokens, keep_alive) from the database via `ai_execution_profiles` and `ai_prompts` tables, ensuring no hardcoded values in the sidecar.
**Why this priority**: This enables dynamic parameter tuning without container rebuilds, aligning with ADR-036 Profile-Only Parameter Governance and ADR-029/037 Active Prompt System.
**Independent Test**: Can be fully tested by modifying `ai_execution_profiles` row `ocr-extract` and verifying that the sidecar uses the new parameters on the next request.
**Acceptance Scenarios**:
1. **Given** the `ai_execution_profiles` row `ocr-extract` has `temperature=0.7`, **When** the backend sends OCR request, **Then** the sidecar passes `temperature=0.7` to Ollama
2. **Given** the Active Prompt in `ai_prompts` contains system prompt and DMS tags, **When** the backend resolves the prompt, **Then** the sidecar receives and injects these into the Ollama request
3. **Given** a parameter is missing from the job snapshot, **When** the sidecar processes the request, **Then** the Ollama Modelfile value applies as the last-resort fallback only
---
### User Story 4 - Async I/O Performance (Priority: P2)
The sidecar must use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
**Why this priority**: Synchronous blocking I/O reduces system throughput and can cause request timeouts under load.
**Independent Test**: Can be fully tested by running concurrent OCR requests and measuring response times, verifying that async implementation handles load without blocking.
**Acceptance Scenarios**:
1. **Given** the sidecar receives multiple concurrent OCR requests, **When** processing with `httpx.AsyncClient`, **Then** requests do not block each other
2. **Given** the sidecar starts up, **When** models are loaded, **Then** loading happens via `asyncio.to_thread` to avoid blocking startup
3. **Given** the sidecar is under load, **When** measuring request latency, **Then** async implementation shows improved throughput compared to sync version
---
### User Story 5 - Network Isolation Auth (Phase 2, Post-Consolidation) (Priority: P3)
After ADR-041 server consolidation completes (single Docker host), the system should remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
**Why this priority**: This is a future-phase improvement that simplifies the system after infrastructure consolidation. It's lower priority as it depends on ADR-041 completion.
**Independent Test**: Can be fully tested after consolidation by removing X-API-Key headers and verifying that requests from within Docker network succeed while external requests fail.
**Acceptance Scenarios**:
1. **Given** ADR-041 consolidation is complete (single Docker host), **When** backend calls sidecar without X-API-Key, **Then** the request succeeds via Docker-internal network
2. **Given** consolidation is complete, **When** external network attempts to call sidecar, **Then** the request is blocked by network isolation
3. **Given** the interim period (before consolidation), **When** backend calls sidecar, **Then** X-API-Key validation is still active
---
### Edge Cases
- What happens when the OCR sidecar receives a request for a PDF file that does not exist within the whitelisted base path? (Tested via path traversal test T007)
- How does the system handle VRAM exhaustion when both LLM and OCR models attempt to load simultaneously?
- What happens when the `ai_execution_profiles` row `ocr-extract` is missing or has invalid parameter values?
- How does the sidecar handle Ollama service unavailability or timeout during OCR processing? (Handled by FastAPI exception handling with user-friendly error messages per ADR-007)
- What happens when the Active Prompt system is unavailable during OCR request processing?
- How does the system handle concurrent requests when GPU is under extreme pressure (e.g., 95% VRAM usage)?
- What happens when path canonicalization resolves to a symlink outside the base path? (Tested via path traversal test T007 with symlink scenarios)
- How does the system behave during the transition period between Phase 1 (X-API-Key) and Phase 2 (Network Isolation)?
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: Sidecar MUST remove hardcoded default API key and fail-fast if `OCR_SIDECAR_API_KEY` environment variable is missing
- **FR-002**: Sidecar MUST implement path canonicalization via `os.path.abspath()` + `os.path.realpath()` on all PDF path inputs
- **FR-003**: Sidecar MUST enforce base-path whitelist check on `/ocr` endpoint, rejecting paths outside `OCR_SIDECAR_UPLOAD_BASE` with 403 Forbidden
- **FR-004**: Sidecar MUST refactor `process_ocr` to use `async def` and `httpx.AsyncClient` via lifespan context manager
- **FR-005**: Sidecar MUST replace `@app.on_event("startup")` with `@asynccontextmanager` lifespan pattern
- **FR-006**: Sidecar MUST wire `calculate_ocr_residency(active_profile)` into `process_ocr` for dynamic `keep_alive` calculation
- **FR-007**: Sidecar MUST NOT accept explicit `options_override["keep_alive"]` from backend (keep_alive must be calculated lazily per ADR-036 Gap-2)
- **FR-008**: Sidecar MUST retain `vram_monitor.py` and `residency_policy.py` modules (reject deletion)
- **FR-009**: Sidecar MUST retain dynamic CPU/GPU selection for `/embed` and `/rerank` endpoints via `.to(device)` logic
- **FR-010**: Sidecar MUST remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) and accept from backend job snapshot
- **FR-011**: Sidecar MUST receive systemPrompt and DMS extraction tags from backend and pass to Ollama in every load/generate call
- **FR-012**: Sidecar MUST remove `/normalize` endpoint (ThaiPreprocessProcessor has no consumers)
- **FR-013**: Sidecar MUST fix mutable default argument `options_override={}` in `process_with_typhoon_ocr`
- **FR-014**: Sidecar MUST load models via `asyncio.to_thread` during lifespan to avoid blocking startup
- **FR-015**: Backend MUST resolve runtime parameters from `ai_execution_profiles` row `ocr-extract` and send to sidecar
- **FR-016**: Backend MUST resolve systemPrompt and DMS tags from Active Prompt in `ai_prompts` (ADR-029/037)
- **FR-017**: Backend MUST send resolved parameters to sidecar in every OCR request
- **FR-018**: Phase 2 (post-ADR-041): Sidecar MUST remove X-API-Key validation from all endpoints
- **FR-019**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `OcrService`
- **FR-020**: Phase 2 (post-ADR-041): Backend MUST remove X-API-Key send-side in `SandboxOcrEngineService`
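FR-002 and FR-003 together reduce to "canonicalize, then test containment". The sketch below illustrates this under a few assumptions. `canonicalize`, `is_within_base`, `ocr_path_status`, and `demo` are hypothetical names that do not come from `app.py`, and the base would be read from `OCR_SIDECAR_UPLOAD_BASE`. It compares whole path components with `os.path.commonpath` instead of using a string prefix, because a prefix test would treat `/mnt/uploads-evil` as inside `/mnt/uploads`:

```python
import os
import tempfile


def canonicalize(path: str) -> str:
    # abspath anchors relative input to the CWD; realpath then collapses ".." and resolves symlinks
    return os.path.realpath(os.path.abspath(path))


def is_within_base(path: str, base: str) -> bool:
    """True only if the canonical path is the base itself or lies beneath it."""
    real_path = canonicalize(path)
    real_base = canonicalize(base)
    # commonpath compares whole components, so "/mnt/uploads-evil" is NOT inside
    # "/mnt/uploads" (a plain str.startswith() check would wrongly accept it)
    return os.path.commonpath([real_path, real_base]) == real_base


def ocr_path_status(path: str, base: str) -> int:
    # HYPOTHETICAL helper: the HTTP status /ocr would return for this path under FR-003
    return 200 if is_within_base(path, base) else 403


def demo() -> dict:
    # Throwaway upload base with one normal file and one symlink that escapes the base
    with tempfile.TemporaryDirectory() as root:
        base = os.path.join(root, "uploads")
        os.makedirs(base)
        secret = os.path.join(root, "secret.pdf")
        open(secret, "w").close()
        open(os.path.join(base, "doc.pdf"), "w").close()
        os.symlink(secret, os.path.join(base, "link.pdf"))
        return {
            "inside": ocr_path_status(os.path.join(base, "doc.pdf"), base),
            "dotdot": ocr_path_status(os.path.join(base, "..", "secret.pdf"), base),
            "symlink": ocr_path_status(os.path.join(base, "link.pdf"), base),
            "prefix": ocr_path_status(base + "-evil/x.pdf", base),
        }
```

`demo()` covers the symlink edge case from the list above, along with a `..` escape and a sibling directory that shares the base's prefix. All three should map to 403, and only the file inside the base should map to 200.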
### Key Entities
- **OCR Sidecar (FastAPI Service)**: Pure compute worker on Desk-5439 that provides `/ocr`, `/embed`, `/rerank` endpoints. No business logic or parameter governance. Receives parameters from backend.
- **ai_execution_profiles**: Database table containing runtime parameter profiles for different AI operations (row `ocr-extract` for OCR parameters)
- **ai_prompts**: Database table containing prompt templates with versioning and activation status (ADR-029/037)
- **Backend OcrService**: Service that orchestrates OCR requests, resolves parameters from database, and sends to sidecar
- **Backend SandboxOcrEngineService**: Service for OCR sandbox testing; resolves parameters using the same pattern as OcrService
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: Path traversal attacks return 403 Forbidden in 100% of test cases (verified by pytest suite)
- **SC-002**: VRAM exhaustion is prevented under load; system remains stable with LLM-First GPU Ownership policy (verified by VRAM monitoring during stress test)
- **SC-003**: OCR request throughput improves by at least 20% with async I/O implementation (measured by concurrent request benchmark)
- **SC-004**: Parameter changes in `ai_execution_profiles` take effect immediately without container rebuild (verified by runtime parameter update test)
- **SC-005**: System startup time does not increase despite async model loading (measured by container startup benchmark)
- **SC-006**: No hardcoded secrets remain in sidecar codebase (verified by code audit)
- **SC-007**: All sidecar endpoints respect network isolation after ADR-041 consolidation (verified by network access test)
- **SC-008**: CPU fallback for BGE-M3 and FlagReranker activates correctly when GPU is under pressure (verified by VRAM monitoring test)
## Assumptions
- ADR-041 server consolidation will complete before Phase 2 (X-API-Key removal) can be implemented
- Desk-5439 (192.168.10.100) will continue to host the OCR sidecar with RTX 5060 Ti 16GB GPU
- Ollama service on Desk-5439 will continue to provide Typhoon OCR model
- ThaiPreprocessProcessor has no active consumers (verified by grep search across backend codebase)
- `calculate_ocr_residency()` function exists in `residency_policy.py` and is not currently wired into `process_ocr`
- VLAN/firewall ACL provides interim network security before ADR-041 consolidation
## Dependencies
- ADR-041 Server Consolidation must complete before Phase 2 (X-API-Key removal)
- ADR-036 Profile-Only Parameter Governance must be implemented for parameter resolution
- ADR-029 Dynamic Prompt Management must be implemented for Active Prompt system
- ADR-037 Active Prompt System must be operational for system prompt injection
- Desk-5439 infrastructure must remain stable (GPU, network, Ollama service)
## Out of Scope
- 1-page-1-request horizontal scaling rework (separate future ADR)
- OpenTelemetry/Prometheus/Grafana observability (separate ticket)
- `/normalize` endpoint functionality (removed per D2; ThaiPreprocessProcessor has no consumers)
@@ -0,0 +1,296 @@
# Tasks: OCR Sidecar Refactor
**Input**: Design documents from `/specs/100-Infrastructures/140-ocr-sidecar-refactor/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/sidecar-api.md, quickstart.md
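**Tests**: Tests are included for every user story (path traversal, API key validation, residency wiring, CPU fallback, parameter governance, Active Prompt, async performance, and network isolation), per the spec acceptance criteria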
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Path Conventions
- **Sidecar**: `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/`
- **Backend**: `backend/src/modules/ai/`
- **Tests**: `tests/unit/ocr-sidecar/`, `tests/integration/ocr-sidecar/`
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [x] T001 Create test directory structure in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests/
- [x] T002 Create test directory structure in tests/unit/ocr-sidecar/
- [x] T003 Create test directory structure in tests/integration/ocr-sidecar/
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [x] T004 Update requirements.txt in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/requirements.txt (add httpx 0.27.0, remove numpy if present)
- [x] T005 Update .env template in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env (add OCR_SIDECAR_API_KEY placeholder)
- [x] T006 Update backend .env.example in backend/.env.example (add OCR_API_URL, OCR_API_KEY placeholders)
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
---
## Phase 3: User Story 1 - Sidecar Security Hardening (Priority: P1) 🎯 MVP
**Goal**: Ensure the OCR sidecar is secure from path traversal attacks and does not contain hardcoded secrets that cannot be rotated without rebuilding containers.
**Independent Test**: Attempt path traversal requests and verify they return 403 Forbidden; verify sidecar fails fast when OCR_SIDECAR_API_KEY env is missing.
### Tests for User Story 1
- [x] T007 [P] [US1] Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py (test various path patterns: ../../etc/passwd, symlinks outside base path, etc.)
- [x] T008 [P] [US1] Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py (test missing key, invalid key scenarios)
### Implementation for User Story 1
- [x] T009 [US1] Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T010 [US1] Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (raise error on startup if missing)
- [x] T011 [US1] Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (using os.path.abspath + os.path.realpath)
- [x] T012 [US1] Implement base-path whitelist check in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check against OCR_SIDECAR_UPLOAD_BASE)
- [x] T013 [US1] Add path validation to POST /ocr endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (return 403 for invalid paths)
- [x] T014 [US1] Fix mutable default argument options_override={} in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (change to None and initialize in function body)
- [x] T015 [US1] Remove duplicate import tempfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
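T009 and T010 could be implemented together with the existing X-API-Key check, which remains required until the ADR-041 cutover, roughly as follows. This is a sketch under assumptions. `ConfigError`, `load_api_key`, and `is_valid_api_key` are hypothetical names. Using `hmac.compare_digest` for the header comparison is a suggestion, not something the spec mandates:

```python
import hmac
import os
from typing import Mapping, Optional


class ConfigError(RuntimeError):
    """Raised at startup when required configuration is missing."""


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    # No hardcoded fallback (T009): a missing or blank key aborts startup (T010)
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise ConfigError("OCR_SIDECAR_API_KEY is not set; refusing to start")
    return key


def is_valid_api_key(presented: Optional[str], expected: str) -> bool:
    # A missing X-API-Key header is always rejected
    if presented is None:
        return False
    # compare_digest avoids the early-exit timing leak of a plain == comparison
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

In the sidecar, the key would typically be loaded once at startup, so a misconfigured container fails immediately rather than on its first request. The per-request check would run on each protected endpoint. That wiring is an assumption and is omitted here.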
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
---
## Phase 4: User Story 2 - GPU Resource Management (Priority: P1)
**Goal**: Prevent VRAM exhaustion on Desk-5439 by implementing adaptive OCR residency policy and CPU fallback for retrieval models, ensuring LLM has priority GPU access.
**Independent Test**: Monitor VRAM usage during concurrent OCR and embedding operations; verify BGE-M3 and FlagReranker fall back to CPU when GPU is under pressure.
### Tests for User Story 2
- [x] T016 [P] [US2] Create residency wiring unit test in tests/unit/ocr-sidecar/test_residency_wiring.py (verify calculate_ocr_residency is called in process_ocr)
- [x] T017 [P] [US2] Create CPU fallback integration test in tests/integration/ocr-sidecar/test_cpu_fallback.py (verify BGE-M3 and FlagReranker use CPU when GPU under pressure)
### Implementation for User Story 2
- [x] T018 [US2] Import calculate_ocr_residency from residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T019 [US2] Wire calculate_ocr_residency(active_profile) into process_ocr function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T020 [US2] Remove hardcoded keep_alive=0 in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T021 [US2] Reject explicit options_override["keep_alive"] from backend in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (keep_alive must be calculated lazily per ADR-036 Gap-2)
- [x] T022 [US2] Retain vram_monitor.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
- [x] T023 [US2] Retain residency_policy.py in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/ (ensure not deleted)
- [x] T024 [US2] Verify dynamic CPU/GPU selection exists for /embed endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
- [x] T025 [US2] Verify dynamic CPU/GPU selection exists for /rerank endpoint in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (check .to(device) logic)
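The sketch below shows how T014 and T019-T021 might fit together when building the Ollama call. `calculate_ocr_residency` is a HYPOTHETICAL placeholder with made-up profile names (`ocr-burst`, `balanced`). The real function and its signature live in `residency_policy.py`. `build_generate_kwargs` is also hypothetical. Raising `ValueError`, rather than silently dropping the key, is one possible reading of "reject":

```python
from typing import Any, Dict, Mapping, Optional, Union

KeepAlive = Union[int, str]


def calculate_ocr_residency(active_profile: str) -> KeepAlive:
    # HYPOTHETICAL placeholder for residency_policy.calculate_ocr_residency (ADR-033).
    # The real rules live in residency_policy.py; here one profile keeps the model warm, others unload.
    return "5m" if active_profile == "ocr-burst" else 0


def build_generate_kwargs(
    active_profile: str,
    runtime_params: Mapping[str, Any],
    options_override: Optional[Mapping[str, Any]] = None,  # None, not {} (T014)
) -> Dict[str, Any]:
    # Fresh dict per call; a `{}` default would be a single object shared by every call
    overrides = dict(options_override) if options_override is not None else {}
    if "keep_alive" in overrides:
        # FR-007 / T021: keep_alive is owned by the sidecar, never pinned by the backend
        raise ValueError("options_override may not set keep_alive")
    return {
        "options": {**runtime_params, **overrides},
        # Computed lazily on every request (T019), replacing the old hardcoded keep_alive=0 (T020).
        # Ollama takes keep_alive at the top level of the request, not inside "options".
        "keep_alive": calculate_ocr_residency(active_profile),
    }
```

`keep_alive` is derived from the active profile on each call. A profile switch therefore takes effect on the next request, with no restart.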
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently
---
## Phase 5: User Story 3 - Parameter Governance via Active Prompt (Priority: P2)
**Goal**: Enable backend services to control AI model parameters from the database via ai_execution_profiles and ai_prompts tables, ensuring no hardcoded values in the sidecar.
**Independent Test**: Modify ai_execution_profiles row ocr-extract and verify that the sidecar uses the new parameters on the next request.
### Tests for User Story 3
- [x] T026 [P] [US3] Create parameter resolution integration test in tests/integration/ocr-sidecar/test_parameter_governance.py (verify parameters from ai_execution_profiles are used)
- [x] T027 [P] [US3] Create Active Prompt integration test in tests/integration/ocr-sidecar/test_active_prompt.py (verify systemPrompt and DMS tags from ai_prompts are used)
### Implementation for User Story 3
- [x] T028 [US3] Remove hardcoded runtime parameters (temperature, top_p, repeat_penalty, max_tokens) in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T029 [US3] Add runtime_params field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T030 [US3] Add system_prompt field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T031 [US3] Add dms_tags field to OcrRequest pydantic model in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T032 [US3] Pass runtime_params to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T033 [US3] Pass system_prompt to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
- [x] T034 [US3] Pass dms_tags to Ollama in process_ocr in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (inject into every load/generate call)
- [x] T035 [US3] Implement parameter resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_execution_profiles row ocr-extract)
- [x] T036 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/ocr.service.ts (resolve from ai_prompts type ocr_extraction)
- [x] T037 [US3] Extract systemPrompt and DMS tags in backend/src/modules/ai/services/ocr.service.ts
- [x] T038 [US3] Send resolved parameters to sidecar in backend/src/modules/ai/services/ocr.service.ts
- [x] T039 [US3] Implement parameter resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
- [x] T040 [US3] Implement Active Prompt resolution in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts (same pattern as ocr.service.ts)
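The sketch below shows one way T029-T034 could translate the backend's job snapshot into an Ollama `/api/generate` body. It rests on several assumptions:

- A `dataclass` stands in for the pydantic `OcrRequest`.
- The sidecar sends one base64-encoded page image per call.
- `max_tokens` maps to Ollama's `num_predict` option.
- `dms_tags` are appended to the system prompt. This injection method is illustrative only.

`build_generate_payload` is a hypothetical name:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Backend parameter name -> Ollama option name. max_tokens -> num_predict is an
# assumed translation (Ollama calls its output-length cap num_predict).
OLLAMA_OPTION_NAMES = {
    "temperature": "temperature",
    "top_p": "top_p",
    "repeat_penalty": "repeat_penalty",
    "max_tokens": "num_predict",
}


@dataclass
class OcrRequest:
    # Stand-in for the pydantic OcrRequest model; field names follow T029-T031
    pdf_path: str
    runtime_params: Dict[str, Any] = field(default_factory=dict)
    system_prompt: Optional[str] = None
    dms_tags: List[str] = field(default_factory=list)


def build_generate_payload(req: OcrRequest, model: str, page_image_b64: str, prompt: str) -> Dict[str, Any]:
    options: Dict[str, Any] = {}
    for name, value in req.runtime_params.items():
        if name not in OLLAMA_OPTION_NAMES:
            # Unknown keys are refused rather than silently forwarded
            raise ValueError(f"unsupported runtime parameter: {name}")
        options[OLLAMA_OPTION_NAMES[name]] = value
    system = req.system_prompt or ""
    if req.dms_tags:
        # ASSUMPTION: tags are appended to the system prompt as an allowed-tag list
        system = (system + "\n\nAllowed DMS tags: " + ", ".join(req.dms_tags)).strip()
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "images": [page_image_b64],  # base64-encoded page image
        "options": options,          # no sidecar defaults: only what the backend sent
        "stream": False,
    }
    if system:
        payload["system"] = system
    return payload
```

The sidecar holds no parameter defaults of its own. As a result, an edit to the `ocr-extract` profile changes the next payload directly, which is the behaviour the independent test for this story checks.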
**Checkpoint**: At this point, User Stories 1, 2, AND 3 should all work independently
---
## Phase 6: User Story 4 - Async I/O Performance (Priority: P2)
**Goal**: Use asynchronous I/O patterns to prevent blocking the FastAPI event loop, improving throughput and reducing latency for OCR operations.
**Independent Test**: Run concurrent OCR requests and measure response times; verify async implementation handles load without blocking.
### Tests for User Story 4
- [x] T041 [P] [US4] Create async I/O performance test in tests/integration/ocr-sidecar/test_async_performance.py (benchmark concurrent requests)
### Implementation for User Story 4
- [x] T042 [US4] Refactor process_ocr to async def in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T043 [US4] Create AsyncClient via lifespan context manager in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T044 [US4] Replace httpx.Client with httpx.AsyncClient in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T045 [US4] Replace @app.on_event("startup") with @asynccontextmanager lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T046 [US4] Load models via asyncio.to_thread during lifespan in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py (avoid blocking startup)
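T043-T046 turn on one distinction. `asyncio.to_thread` moves the blocking model load off the event loop. Awaiting it still makes startup wait until the models are ready. The stdlib-only sketch below shows this with a stand-in loader (`load_models_blocking` is hypothetical) and a heartbeat task that keeps ticking during the load. In FastAPI, the same function would be passed as `FastAPI(lifespan=lifespan)`. The shared `httpx.AsyncClient` would be opened before `yield` and closed after it. Both steps are left as comments so the example has no dependencies:

```python
import asyncio
import time
from contextlib import asynccontextmanager, suppress
from types import SimpleNamespace


def load_models_blocking() -> dict:
    # HYPOTHETICAL stand-in for loading BGE-M3 and the reranker (blocking disk/GPU work)
    time.sleep(0.2)
    return {"embedder": "bge-m3", "reranker": "bge-reranker"}


@asynccontextmanager
async def lifespan(app):
    # Startup: the loader runs in a worker thread, so the event loop keeps serving tasks.
    # (The real sidecar would also create its httpx.AsyncClient here.)
    app.state.models = await asyncio.to_thread(load_models_blocking)
    try:
        yield
    finally:
        # Shutdown: drop references (and await client.aclose() in the real service)
        app.state.models = None


async def demo() -> tuple:
    app = SimpleNamespace(state=SimpleNamespace(models=None))
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    async with lifespan(app):
        loaded = app.state.models
        ticks_during_startup = ticks  # > 1 only if the loop was not blocked by the load
    hb.cancel()
    with suppress(asyncio.CancelledError):
        await hb
    return loaded, ticks_during_startup, app.state.models
```

If `load_models_blocking()` were called directly inside `lifespan`, the heartbeat would not run until loading finished. That is the blocking behaviour T046 removes.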
---
## Phase 7: User Story 5 - Network Isolation Auth Phase 2 (Priority: P3)
**Goal**: After ADR-041 server consolidation completes, remove X-API-Key validation and rely solely on Docker-internal network isolation for authentication.
**Independent Test**: After consolidation, remove X-API-Key headers and verify that requests from within Docker network succeed while external requests fail.
### Tests for User Story 5
- [ ] T047 [P] [US5] Create network isolation test in tests/integration/ocr-sidecar/test_network_isolation.py (verify Docker-internal requests work, external requests fail)
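T047 could build on a minimal TCP probe like the one below. The sketch assumes the test runs once from a container attached to `dms-internal` and once from a LAN host. `can_connect` is a hypothetical helper. The expected result is that `can_connect("ocr-sidecar", 8765)` succeeds inside the Docker network, while the same port on the host's LAN IP is unreachable:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and DNS failures are all OSError subclasses
        return False
```

A TCP-level probe deliberately ignores the HTTP layer. After T048, isolation must hold even when no X-API-Key is sent, so reachability alone is the property under test.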
### Implementation for User Story 5 (BLOCKED until ADR-041 consolidation complete)
- [ ] T048 [US5] Remove X-API-Key validation from all endpoints in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [ ] T049 [US5] Remove OCR_SIDECAR_API_KEY from .env in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/.env
- [ ] T050 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/ocr.service.ts
- [ ] T051 [US5] Remove X-API-Key send-side in backend/src/modules/ai/services/sandbox-ocr-engine.service.ts
- [ ] T052 [US5] Remove OCR_API_KEY from backend .env in backend/.env
- [ ] T053 [US5] Update OCR_API_URL to Docker-internal URL in backend/.env (e.g., http://ocr-sidecar:8765)
**Note**: Phase 7 tasks are BLOCKED until ADR-041 server consolidation completes. Do not implement until ADR-041 cutover is successful.
---
## Phase 8: Remove /normalize Endpoint (Cross-Cutting)
**Purpose**: Remove unused /normalize endpoint per ADR-040 D2
- [x] T054 Remove /normalize endpoint from specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py
- [x] T055 Verify no consumers exist via grep search in backend codebase
---
## Phase 9: Polish & Cross-Cutting Concerns
**Purpose**: Improvements that affect multiple user stories
- [x] T056 [P] Update Dockerfile in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/Dockerfile (if any changes needed)
- [x] T057 [P] Update docker-compose.yml in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/docker-compose.yml (if any changes needed)
- [x] T058 Run path traversal test suite and verify all tests pass
- [x] T059 Run residency wiring test suite and verify all tests pass
- [x] T060 Run parameter governance test suite and verify all tests pass
- [x] T061 Run async performance test and verify 20%+ throughput improvement
- [x] T062 Update documentation in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/README.md
- [x] T063 Validate quickstart.md deployment steps on Desk-5439
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - can start immediately
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
- **User Stories (Phases 3-7)**: All depend on Foundational phase completion
  - User Stories 1-4 (P1, P1, P2, P2) can proceed in parallel after Phase 2
  - User Story 5 (P3) is BLOCKED until ADR-041 consolidation completes
- **Remove /normalize (Phase 8)**: Can run in parallel with user stories (no dependencies)
- **Polish (Phase 9)**: Depends on all desired user stories being complete
### User Story Dependencies
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 2 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 3 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 4 (P2)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 5 (P3)**: BLOCKED until ADR-041 consolidation completes
### Within Each User Story
- Tests MUST be written and FAIL before implementation (TDD approach)
- Sidecar implementation before backend implementation (for parameter governance story)
- Core implementation before integration
- Story complete before moving to next priority
### Parallel Opportunities
- All Setup tasks (T001-T003) can run in parallel
- All Foundational tasks (T004-T006) can run in parallel
- Once Foundational phase completes, User Stories 1-4 can start in parallel (if team capacity allows)
- All tests for a user story marked [P] can run in parallel
- User Story 5 tasks can run in parallel once ADR-041 consolidation completes
- The /normalize removal tasks (T054-T055) can run in parallel with user stories
- Polish tasks (T056-T057) can run in parallel
---
## Parallel Example: User Story 1
```bash
# Launch all tests for User Story 1 together:
Task: "Create path traversal test in tests/unit/ocr-sidecar/test_path_traversal.py"
Task: "Create API key validation test in tests/unit/ocr-sidecar/test_api_key_validation.py"
# Launch implementation tasks sequentially (each depends on previous):
Task: "Remove hardcoded default API key in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
Task: "Add fail-fast check for OCR_SIDECAR_API_KEY environment variable in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
Task: "Implement path canonicalization function in specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py"
```
---
## Implementation Strategy
### MVP First (User Stories 1-2 Only - Critical Security & GPU Management)
1. Complete Phase 1: Setup
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
3. Complete Phase 3: User Story 1 (Security Hardening)
4. Complete Phase 4: User Story 2 (GPU Resource Management)
5. **STOP and VALIDATE**: Test User Stories 1-2 independently
6. Deploy/demo if ready
### Incremental Delivery
1. Complete Setup + Foundational → Foundation ready
2. Add User Story 1 → Test independently → Deploy/Demo (Security MVP!)
3. Add User Story 2 → Test independently → Deploy/Demo (GPU Management MVP!)
4. Add User Story 3 → Test independently → Deploy/Demo (Parameter Governance)
5. Add User Story 4 → Test independently → Deploy/Demo (Async Performance)
6. Wait for ADR-041 consolidation → Add User Story 5 → Test independently → Deploy/Demo
7. Each story adds value without breaking previous stories
### Parallel Team Strategy
With multiple developers:
1. Team completes Setup + Foundational together
2. Once Foundational is done:
   - Developer A: User Story 1 (Security)
   - Developer B: User Story 2 (GPU Management)
   - Developer C: User Story 3 (Parameter Governance)
   - Developer D: User Story 4 (Async I/O)
3. Stories complete and integrate independently
4. After ADR-041 consolidation: Developer A/E: User Story 5 (Network Isolation)
---
## Notes
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- Each user story should be independently completable and testable
- Verify tests fail before implementing
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- User Story 5 is BLOCKED until ADR-041 consolidation completes
- Phase 7 tasks should NOT be started until ADR-041 cutover is successful
- Avoid: vague tasks, same file conflicts, cross-story dependencies that break independence
@@ -0,0 +1,36 @@
# Specification Quality Checklist: Single-Host Server Consolidation
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-20
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs) — spec focuses on operational outcomes
- [x] Focused on user value and business needs — admin/ops workflows clearly defined
- [x] Written for non-technical stakeholders — user stories describe journeys, not code
- [x] All mandatory sections completed — User Scenarios, Requirements, Success Criteria all filled
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain — all requirements have clear definitions
- [x] Requirements are testable and unambiguous — each FR has measurable acceptance criteria
- [x] Success criteria are measurable — SC-001 through SC-010 have specific metrics
- [x] Success criteria are technology-agnostic — focus on outcomes (parity, latency, uptime) not tools
- [x] All acceptance scenarios are defined — 5 user stories with Given/When/Then scenarios
- [x] Edge cases are identified — 7 edge cases covering GPU OOM, RAM, CIFS, SPOF, network, migration failures
- [x] Scope is clearly bounded — includes provisioning, migration, cutover, security, decommission
- [x] Dependencies and assumptions identified — 7 assumptions documented
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria — FR-001 through FR-015 mapped to user stories
- [x] User scenarios cover primary flows — P1 (provision) → P2 (migrate) → P3 (cutover) → P4 (security) → P5 (decommission)
- [x] Feature meets measurable outcomes defined in Success Criteria — 10 measurable outcomes
- [x] No implementation details leak into specification — Docker/tech names are inherent to infra spec but kept at architecture level
## Notes
- This is an infrastructure specification based on ADR-041; some technical terms (Docker, CIFS, VRAM) are inherent to the domain
- ADR-040 (OCR Sidecar Refactor) is a hard dependency for FR-008 (remove X-API-Key) and FR-009 (GPU VRAM management)
- Spec is ready for `/speckit-clarify` or `/speckit-plan`
@@ -0,0 +1,69 @@
# Docker Compose Contract: New Host
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
This contract defines the service topology for the consolidated single-host deployment.
The actual `docker-compose.new-host.yml` will be created at:
`specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
## Service Topology
| Service | Image | Networks | LAN Ports | Internal Port | Memory Limit | Depends On |
|---------|-------|----------|-----------|---------------|--------------|------------|
| ollama | ollama/ollama:latest | dms-internal | none | 11434 | 2G (host) | — |
| ocr-sidecar | build (local) | dms-internal | none | 8765 | 1G | ollama |
| backend | lcbp3-backend:latest | dms-internal, dms-frontend | 3001→3000 | 3000 | 2G | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
| frontend | lcbp3-frontend:latest | dms-frontend | 3000 | 3000 | 1G | backend |
| redis | redis:7-alpine | dms-internal | none | 6379 | 1G | — |
| mariadb | mariadb:11.8 | dms-internal | none | 3306 | 8G | — |
| elasticsearch | elasticsearch:8.11.1 | dms-internal | none | 9200 | 4G | — |
| qdrant | qdrant/qdrant:v1.16.1 | dms-internal | none | 6333 | 1G | — |
| clamav | clamav/clamav:1.4.4 | dms-internal | none | 3310 | 2G | — |
| ollama-metrics | ghcr.io/norskhelsenett/ollama-metrics:latest | dms-internal, dms-frontend | 9924 | 9924 | 256M | ollama |
## Network Topology
```
dms-internal (bridge, no LAN access)
├── ollama:11434
├── ocr-sidecar:8765
├── backend:3000 (also on dms-frontend)
├── redis:6379
├── mariadb:3306
├── elasticsearch:9200
├── qdrant:6333
├── clamav:3310
└── ollama-metrics:9924
dms-frontend (bridge, LAN published)
├── frontend:3000 → LAN:3000
├── backend:3000 → LAN:3001 (NPM routes backend.np-dms.work → :3001)
└── ollama-metrics:9924 → LAN:9924 (Prometheus scrape target)
```
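The topology implies an invariant that can be checked against the parsed compose file: any service that publishes a LAN port must be attached to at least one network that is not internal-only. The sketch below assumes `dms-internal` is declared with `internal: true`, since Docker gives such networks no external connectivity. `NETWORKS`, `LAN_PORTS`, and `topology_violations` are hypothetical names, filled in by hand from the diagram above:

```python
from typing import Dict, List, Set

# Membership follows the Network Topology diagram; LAN ports follow the Service Topology table.
NETWORKS: Dict[str, Set[str]] = {
    "ollama": {"dms-internal"},
    "ocr-sidecar": {"dms-internal"},
    "backend": {"dms-internal", "dms-frontend"},
    "frontend": {"dms-frontend"},
    "redis": {"dms-internal"},
    "mariadb": {"dms-internal"},
    "elasticsearch": {"dms-internal"},
    "qdrant": {"dms-internal"},
    "clamav": {"dms-internal"},
    "ollama-metrics": {"dms-internal", "dms-frontend"},
}
LAN_PORTS: Dict[str, List[int]] = {"frontend": [3000], "backend": [3001], "ollama-metrics": [9924]}
# ASSUMPTION: dms-internal is declared `internal: true`, so it has no route to the LAN
INTERNAL_NETWORKS: Set[str] = {"dms-internal"}


def topology_violations(
    networks: Dict[str, Set[str]],
    lan_ports: Dict[str, List[int]],
    internal_networks: Set[str],
) -> List[str]:
    problems = []
    for service, ports in lan_ports.items():
        attached = networks.get(service, set())
        # A published port cannot reach the LAN if every attached network is internal-only
        if ports and attached <= internal_networks:
            problems.append(f"{service} publishes {ports} but is attached only to internal networks")
    return problems
```

Running the check in CI against the real `docker-compose.new-host.yml` would catch a service that is accidentally left on `dms-internal` alone while still declaring `ports:`.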
## Environment Variables (New)
| Variable | Default | Description |
|----------|---------|-------------|
| ASUSTOR_USER | (required) | CIFS share username |
| ASUSTOR_PASS | (required) | CIFS share password |
| NEW_HOST_IP | (required) | New host LAN IP for CI/CD deploy target |
## Environment Variables (Changed from QNAP)
| Variable | Old Value (QNAP) | New Value (New Host) |
|----------|------------------|---------------------|
| DB_HOST | mariadb | mariadb (unchanged — Docker DNS) |
| REDIS_HOST | cache | redis (service name change) |
| ELASTICSEARCH_HOST | search | elasticsearch (service name change) |
| QDRANT_HOST | qdrant | qdrant (unchanged) |
| OCR_API_URL | http://192.168.10.100:8765 | http://ocr-sidecar:8765 |
| OLLAMA_API_URL | http://192.168.10.100:11434 | http://ollama:11434 |
| CLAMAV_HOST | clamav | clamav (unchanged) |
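During cutover, the backend `.env` could be rewritten mechanically from the table above. This minimal sketch assumes plain `KEY=VALUE` lines with no `export` prefix or quoting. `migrate_env` and `QNAP_TO_NEW_HOST` are hypothetical names, and the mapping includes only the rows whose value actually changes:

```python
from typing import Dict, List

# New-host values for the variables that change; unchanged variables are not listed.
QNAP_TO_NEW_HOST: Dict[str, str] = {
    "REDIS_HOST": "redis",
    "ELASTICSEARCH_HOST": "elasticsearch",
    "OCR_API_URL": "http://ocr-sidecar:8765",
    "OLLAMA_API_URL": "http://ollama:11434",
}


def migrate_env(lines: List[str]) -> List[str]:
    migrated = []
    for line in lines:
        stripped = line.strip()
        # Keep blanks, comments, and anything that is not KEY=VALUE exactly as written
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            migrated.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        migrated.append(f"{key}={QNAP_TO_NEW_HOST[key]}" if key in QNAP_TO_NEW_HOST else line)
    return migrated
```

Lines that are not in the mapping pass through byte-for-byte. A diff of the old and new files should therefore show only the four rows in the table.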
## Removed Environment Variables
| Variable | Reason |
|----------|--------|
| OCR_SIDECAR_API_KEY | ADR-040 D5 — network-only auth, no API key needed |
| OCR_SIDECAR_UPLOAD_BASE | Not removed: still required by ocr-sidecar; value stays `/mnt/uploads` |
@@ -0,0 +1,230 @@
// File: specs/100-Infrastructures/141-server-consolidation/data-model.md
// Change Log:
// - 2026-06-20: Initial data model for Single-Host Server Consolidation
# Data Model: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
## Infrastructure Entities
### 1. Docker Network: dms-internal
| Attribute | Type | Description |
|-----------|------|-------------|
| name | string | `dms-internal` |
| driver | string | `bridge` |
| scope | string | local (single host) |
| published_ports | none | No ports published to LAN |
**Members**: ollama, ocr-sidecar, backend, redis, mariadb, elasticsearch, qdrant, clamav, ollama-metrics
### 2. Docker Network: dms-frontend
| Attribute | Type | Description |
|-----------|------|-------------|
| name | string | `dms-frontend` |
| driver | string | `bridge` |
| scope | string | local (single host) |
| published_ports | 3000 (frontend), 3001→3000 (backend), 9924 (ollama-metrics) | Only ports published to LAN |
**Members**: frontend, backend, ollama-metrics
### 3. Docker Volume: asustor_uploads
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` |
| type | string | `cifs` |
| device | string | `//192.168.10.9/np-dms-as/data/uploads` |
| mount_options | string | `username=${ASUSTOR_USER},password=${ASUSTOR_PASS},vers=3.0,uid=0,gid=0` |
| mount_point (sidecar) | string | `/mnt/uploads` (read-only) |
| mount_point (backend) | string | `/app/uploads` (read-write) |
### 4. Docker Volume: ollama_models
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/root/.ollama` |
| content | string | Ollama model files (np-dms-ai, np-dms-ocr, nomic-embed-text) |
### 5. Docker Volume: mariadb_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/var/lib/mysql` |
| content | string | MariaDB data files (migrated from QNAP) |
### 6. Docker Volume: es_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/usr/share/elasticsearch/data` |
| content | string | Elasticsearch indices (migrated from QNAP) |
### 7. Docker Volume: redis_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/data` |
| content | string | Redis AOF persistence + BullMQ queue data |
### 8. Docker Volume: qdrant_data
| Attribute | Type | Description |
|-----------|------|-------------|
| driver | string | `local` (named volume) |
| mount_point | string | `/qdrant/storage` |
| content | string | Qdrant vector collections |
## Service Definitions
### ollama
| Attribute | Value |
|-----------|-------|
| image | `ollama/ollama:latest` |
| GPU | NVIDIA RTX 5060 Ti 16GB (passthrough) |
| network | dms-internal only |
| ports | none (expose 11434 internal only) |
| volumes | ollama_models → /root/.ollama |
| depends_on | none |
| healthcheck | `ollama list` (verify API responsive) |
### ocr-sidecar
| Attribute | Value |
|-----------|-------|
| build | `./specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar` |
| network | dms-internal only |
| ports | none (expose 8765 internal only) |
| volumes | asustor_uploads → /mnt/uploads (read-only) |
| depends_on | ollama |
| env | OLLAMA_API_URL=http://ollama:11434, OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads |
| healthcheck | `curl -f http://localhost:8765/health` |
### backend
| Attribute | Value |
|-----------|-------|
| image | `lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}` |
| networks | dms-internal + dms-frontend |
| ports | 3001:3000 (published to LAN — NPM routes `backend.np-dms.work` → :3001) |
| volumes | asustor_uploads → /app/uploads (read-write) |
| depends_on | ollama, ocr-sidecar, redis, mariadb, elasticsearch, qdrant, clamav |
| env | OCR_API_URL=http://ocr-sidecar:8765, OLLAMA_API_URL=http://ollama:11434, DB_HOST=mariadb, REDIS_HOST=redis, ELASTICSEARCH_HOST=elasticsearch, QDRANT_HOST=qdrant |
| healthcheck | `curl -f http://localhost:3000/health` |
| memory_limit | 2G |
### frontend
| Attribute | Value |
|-----------|-------|
| image | `lcbp3-frontend:${FRONTEND_IMAGE_TAG:-latest}` |
| networks | dms-frontend only |
| ports | 3000:3000 (published to LAN) |
| depends_on | backend |
| env | INTERNAL_API_URL=http://backend:3000/api |
| healthcheck | `curl -f http://localhost:3000/` |
| memory_limit | 1G |
### redis
| Attribute | Value |
|-----------|-------|
| image | `redis:7-alpine` |
| network | dms-internal only |
| ports | none (expose 6379 internal only) |
| volumes | redis_data → /data |
| command | `redis-server --requirepass ${REDIS_PASSWORD} --appendonly yes --maxmemory-policy noeviction` |
| healthcheck | `redis-cli -a ${REDIS_PASSWORD} --no-auth-warning ping` |
| memory_limit | 1G |
### mariadb
| Attribute | Value |
|-----------|-------|
| image | `mariadb:11.8` |
| network | dms-internal only |
| ports | none (expose 3306 internal only) |
| volumes | mariadb_data → /var/lib/mysql |
| env | MARIADB_ROOT_PASSWORD, MARIADB_DATABASE=lcbp3, MARIADB_USER=center |
| command | `--character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci` |
| healthcheck | `healthcheck.sh --connect --innodb_initialized` |
| memory_limit | 8G |
### elasticsearch
| Attribute | Value |
|-----------|-------|
| image | `elasticsearch:8.11.1` |
| network | dms-internal only |
| ports | none (expose 9200 internal only) |
| volumes | es_data → /usr/share/elasticsearch/data |
| env | discovery.type=single-node, xpack.security.enabled=false, ES_JAVA_OPTS=-Xms2g -Xmx2g |
| healthcheck | `curl -s http://localhost:9200/_cluster/health` |
| memory_limit | 4G |
### qdrant
| Attribute | Value |
|-----------|-------|
| image | `qdrant/qdrant:v1.16.1` |
| network | dms-internal only |
| ports | none (expose 6333 internal only) |
| volumes | qdrant_data → /qdrant/storage |
| healthcheck | TCP check on port 6333 |
| memory_limit | 1G |
### clamav
| Attribute | Value |
|-----------|-------|
| image | `clamav/clamav:1.4.4` |
| network | dms-internal only |
| ports | none (expose 3310 internal only) |
| healthcheck | `clamdcheck.sh` |
| memory_limit | 2G |
### ollama-metrics
| Attribute | Value |
|-----------|-------|
| image | `ghcr.io/norskhelsenett/ollama-metrics:latest` |
| network | dms-internal only |
| ports | 9924:9924 (published to LAN — Prometheus on ASUSTOR scrapes `http://<new-host-ip>:9924/metrics`) |
| env | OLLAMA_HOST=http://ollama:11434 |
| memory_limit | 256M |
## Service Communication Map
```
LAN (VLAN 10)
├── :3000 (Frontend) ──→ http://backend:3000/api (dms-frontend)
├── :3001 (Backend, routed by NPM as backend.np-dms.work)
│     ├──→ mariadb:3306 (dms-internal)
│     ├──→ redis:6379 (dms-internal)
│     ├──→ elasticsearch:9200 (dms-internal)
│     ├──→ qdrant:6333 (dms-internal)
│     ├──→ clamav:3310 (dms-internal)
│     ├──→ ocr-sidecar:8765 (dms-internal) ──→ ollama:11434
│     └──→ ollama:11434 (dms-internal)
└── :9924 (ollama-metrics) ──→ Prometheus scrape target
      └──→ ollama:11434 (dms-internal)
```
## Path Mapping
| Service | Container Path | Source |
|---------|---------------|--------|
| Backend | `/app/uploads/temp` | ASUSTOR CIFS `/data/uploads/temp` |
| Backend | `/app/uploads/permanent` | ASUSTOR CIFS `/data/uploads/permanent` |
| Sidecar | `/mnt/uploads/temp` (read-only) | ASUSTOR CIFS `/data/uploads/temp` |
| Sidecar | `/mnt/uploads/permanent` (read-only) | ASUSTOR CIFS `/data/uploads/permanent` |
**Note**: Backend uses `/app/uploads` (read-write) and Sidecar uses `/mnt/uploads` (read-only); both map to the same ASUSTOR CIFS share. Path remapping in `ocr.service.ts` (`remapPath()`) continues to work unchanged: it strips the `/app/uploads` prefix and replaces it with `/mnt/uploads`.
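For checking paths by hand on the host, the same prefix swap can be sketched in Bash. This is an illustration, not the `remapPath()` code in `ocr.service.ts`; the function name `remap_path` and the choice to reject paths outside `/app/uploads` (rather than pass them through) are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the prefix swap described for remapPath():
# backend path /app/uploads/... -> sidecar path /mnt/uploads/...
BACKEND_UPLOAD_BASE="/app/uploads"
SIDECAR_UPLOAD_BASE="/mnt/uploads"

remap_path() {
  backend_path="$1"
  # Only remap the base directory itself or paths below it; "/app/uploadsX/..."
  # does not match "$BACKEND_UPLOAD_BASE"/* and is rejected.
  case "$backend_path" in
    "$BACKEND_UPLOAD_BASE" | "$BACKEND_UPLOAD_BASE"/*)
      printf '%s\n' "${SIDECAR_UPLOAD_BASE}${backend_path#"$BACKEND_UPLOAD_BASE"}"
      ;;
    *)
      printf 'remap_path: %s is outside %s\n' "$backend_path" "$BACKEND_UPLOAD_BASE" >&2
      return 1
      ;;
  esac
}
```

For example, `remap_path /app/uploads/temp/a.pdf` prints `/mnt/uploads/temp/a.pdf`. A plain prefix swap does not resolve `..` segments, so the sidecar's own canonicalization against its upload base (ADR-040) is still what enforces the boundary.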
@@ -0,0 +1,124 @@
// File: specs/100-Infrastructures/141-server-consolidation/plan.md
// Change Log:
// - 2026-06-20: Initial implementation plan for Single-Host Server Consolidation
# Implementation Plan: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/100-Infrastructures/141-server-consolidation/spec.md`
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md)
## Summary
Consolidate all LCBP3-DMS services from a 2-host architecture (QNAP NAS + Desk-5439) onto a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB). ASUSTOR becomes primary NAS for file storage via CIFS. Docker internal bridge network isolates Ollama and OCR Sidecar from LAN, enabling removal of X-API-Key auth (ADR-040 D5). QNAP becomes backup server; Desk-5439 is retired.
## Technical Context
**Language/Version**: Docker Compose v2 (YAML), Bash scripts, PowerShell provisioning
**Primary Dependencies**: Docker Engine 24+, Docker Compose v2, NVIDIA Container Toolkit, CIFS Utils
**Storage**: MariaDB 11.8 (Docker volume), Elasticsearch 8.11 (Docker volume), Redis 7 (Docker volume), Qdrant v1.16 (Docker volume), ASUSTOR CIFS for file uploads
**Testing**: Smoke tests (manual + scripted), health check endpoints, data parity verification scripts
**Target Platform**: Linux (Ubuntu 22.04 LTS or Debian 12) on Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB
**Project Type**: Infrastructure (Docker Compose stack + provisioning scripts)
**Performance Goals**: Backend-to-Ollama latency reduced by at least 50% versus the current cross-host LAN path (~2ms), per SC-006; all containers healthy within 5 min
**Constraints**: 32GB RAM total (target <28GB usage), 16GB VRAM (target <15GB usage), CIFS mount reliability
**Scale/Scope**: 8 containers (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, ES, Qdrant) + ClamAV + ollama-metrics
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
| Principle | Status | Notes |
|-----------|--------|-------|
| ADR-016 Security | ✅ Pass | Network isolation replaces API key; no ports published for internal services |
| ADR-019 UUID | ✅ Pass | No UUID changes — infrastructure only |
| ADR-009 Schema | ✅ Pass | No schema changes — data migration via dump/restore |
| ADR-023/023A AI Boundary | ✅ Pass | Ollama isolated on Docker internal network; no direct DB/storage access |
| ADR-040 D5 Network Auth | ✅ Pass | Docker bridge isolation enables X-API-Key removal |
| ADR-008 BullMQ | ✅ Pass | Redis co-located on same host; queue behavior unchanged |
| ADR-002 Document Numbering | ✅ Pass | Redis Redlock unchanged; co-located reduces lock latency |
| SPOF Risk | ⚠️ Acknowledged | Single host = SPOF; mitigated by QNAP backup + DR plan |
**Gate Result**: PASS — no violations. SPOF risk is acknowledged in ADR-041 with mitigation plan.
## Project Structure
### Documentation (this feature)
```text
specs/100-Infrastructures/141-server-consolidation/
├── spec.md # Feature specification
├── plan.md # This file
├── research.md # Phase 0 output — research findings
├── data-model.md # Phase 1 output — infrastructure data model
├── quickstart.md # Phase 1 output — deployment guide
├── contracts/ # Phase 1 output — docker-compose contracts
│ └── docker-compose.new-host.yml
├── checklists/
│ └── requirements.md # Spec quality checklist
└── tasks.md # Phase 2 output (/speckit.tasks command)
```
### Source Code (repository root)
```text
specs/04-Infrastructure-OPS/04-00-docker-compose/
├── New-Host/ # NEW — consolidated host
│ ├── docker-compose.new-host.yml # Unified compose for all 8+ services
│ ├── .env.template # Environment template for new host
│ ├── ocr-sidecar/ # Sidecar (copied from Desk-5439, adapted)
│ │ ├── Dockerfile
│ │ ├── app.py
│ │ └── requirements.txt
│ ├── scripts/
│ │ ├── provision-host.sh # OS prep + Docker + NVIDIA toolkit
│ │ ├── migrate-mariadb.sh # Dump from QNAP → restore to new host
│ │ ├── migrate-elasticsearch.sh # Snapshot from QNAP → restore to new host
│ │ ├── smoke-test.sh # Post-cutover verification
│ │ └── rollback.sh # Emergency rollback to QNAP + Desk-5439
│ └── README.md # Deployment guide for new host
├── QNAP/ # EXISTING — becomes backup
├── Desk-5439/ # EXISTING — retired after cutover
└── ASUSTOR/ # EXISTING — Gitea runner stays
```
**Structure Decision**: New `New-Host/` directory under existing `04-00-docker-compose/` follows the established per-host directory pattern (QNAP/, Desk-5439/, ASUSTOR/). The unified compose file replaces the split QNAP/app + QNAP/service + QNAP/mariadb + Desk-5439/ocr-sidecar pattern with a single stack.
## Complexity Tracking
> No constitution check violations — table not needed.
## Implementation Phases
### Phase 1: Provision New Host (T001-T002)
- Install Ubuntu 22.04 LTS / Debian 12
- Install Docker Engine + Docker Compose v2
- Install NVIDIA drivers + nvidia-container-toolkit
- Mount ASUSTOR CIFS share to `/mnt/uploads`
- Create directory structure for Docker volumes
### Phase 2: Create Unified Docker Compose (T003-T005)
- Write `docker-compose.new-host.yml` with all services
- Configure `dms-internal` bridge network (no LAN publish for Ollama/sidecar)
- Configure `dms-frontend` bridge network (Frontend + Backend published)
- Copy OCR sidecar code from Desk-5439, adapt for Docker-internal Ollama URL
- Configure per-container memory limits per ADR-041 D5
### Phase 3: Migrate Data (T006-T007)
- Dump MariaDB from QNAP → restore to new host container
- Snapshot Elasticsearch from QNAP → restore to new host container
- Verify row count + document count parity
- Verify CIFS file access from backend container
### Phase 4: Cutover (T008-T010)
- Update Gitea CI/CD deploy target to new host
- Deploy services on new host
- Run smoke tests (login, document CRUD, OCR, AI, search)
- Remove X-API-Key from sidecar + backend (ADR-040 D5)
- Update DNS/NPM to point to new host
### Phase 5: Decommission (T011-T012)
- Stop services on QNAP (retain data for backup)
- Retire Desk-5439 (power off or repurpose)
- Monitor RAM/VRAM for 24-48 hours
- Document rollback procedure
@@ -0,0 +1,154 @@
// File: specs/100-Infrastructures/141-server-consolidation/quickstart.md
// Change Log:
// - 2026-06-20: Initial quickstart guide for Single-Host Server Consolidation
# Quickstart: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
## Prerequisites
- New host with Ubuntu 22.04 LTS or Debian 12 installed
- Ryzen 5 5600 / 32GB RAM / RTX 5060 Ti 16GB
- Network access to VLAN 10 (192.168.10.x)
- ASUSTOR NAS accessible at 192.168.10.9 with CIFS share `np-dms-as`
- SSH access to QNAP (192.168.10.8) for data migration
- Gitea CI/CD access for deploy target update
## Step 1: Provision Host
```bash
# Run on the new host as root (or via sudo).
# Assumes the repository is already checked out at /opt/lcbp3.
cd /opt/lcbp3
bash specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh
```
This script:
1. Installs Docker Engine + Docker Compose v2
2. Installs NVIDIA drivers + nvidia-container-toolkit
3. Creates CIFS mount for ASUSTOR at `/mnt/uploads`
4. Creates Docker volume directories
5. Verifies GPU access with `nvidia-smi`
## Step 2: Prepare .env
```bash
cd /opt/lcbp3/specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host
cp .env.template .env
# Edit .env with real values:
# - ASUSTOR_USER, ASUSTOR_PASS (CIFS credentials)
# - DB_PASSWORD, DB_ROOT_PASSWORD (from QNAP .env)
# - REDIS_PASSWORD (from QNAP .env)
# - JWT_SECRET, JWT_REFRESH_SECRET (from QNAP .env)
# - AUTH_SECRET (from QNAP .env)
# - ELASTICSEARCH_PASSWORD (from QNAP .env)
```
## Step 3: Migrate Data
```bash
# Migrate MariaDB (from QNAP to new host)
bash scripts/migrate-mariadb.sh
# Migrate Elasticsearch (from QNAP to new host)
bash scripts/migrate-elasticsearch.sh
# Verify parity
bash scripts/verify-data-parity.sh
```
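The contents of `scripts/verify-data-parity.sh` are not shown here. A minimal sketch of the row-count half of such a check could look like the following; the function names, the tab-separated file format, and passing the client command as trailing arguments are all hypothetical. It runs an exact `COUNT(*)` per table because `information_schema.TABLE_ROWS` is only an estimate for InnoDB.

```bash
#!/usr/bin/env bash
# Hypothetical parity helpers (not the real verify-data-parity.sh).

# mariadb_row_counts <schema> <client command...>
# Prints "<table>\t<exact row count>" for every base table in <schema>.
mariadb_row_counts() {
  schema="$1"; shift
  "$@" -N -B -e "SELECT table_name FROM information_schema.tables
                 WHERE table_schema = '${schema}' AND table_type = 'BASE TABLE'" |
  while read -r table; do
    # </dev/null stops "docker exec -i" from swallowing the table list on stdin.
    count=$("$@" -N -B -e "SELECT COUNT(*) FROM \`${schema}\`.\`${table}\`" </dev/null)
    printf '%s\t%s\n' "$table" "$count"
  done
}

# compare_counts <source.tsv> <target.tsv>
# Returns 0 when both files list the same tables with the same counts.
compare_counts() {
  a=$(mktemp); b=$(mktemp)
  sort "$1" > "$a"
  sort "$2" > "$b"
  diffs=$(diff "$a" "$b")
  rm -f "$a" "$b"
  if [ -n "$diffs" ]; then
    printf 'Row count mismatch (< source, > target):\n%s\n' "$diffs"
    return 1
  fi
  echo "Parity OK: $(wc -l < "$1") tables match"
}
```

On the new host, `mariadb_row_counts lcbp3 docker exec -i lcbp3-mariadb mariadb -u root -p"$DB_ROOT_PASSWORD" > target.tsv` produces the target side; produce `source.tsv` the same way against QNAP's MariaDB, copy it over, then run `compare_counts source.tsv target.tsv`.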
## Step 4: Deploy Services
```bash
# Pull latest images from Gitea registry
docker compose --env-file .env -f docker-compose.new-host.yml pull
# Start all services
docker compose --env-file .env -f docker-compose.new-host.yml up -d
# Check health
docker compose -f docker-compose.new-host.yml ps
docker compose -f docker-compose.new-host.yml logs --tail=50
```
## Step 5: Smoke Test
```bash
# Run smoke tests
bash scripts/smoke-test.sh
```
Smoke tests verify:
- Backend health check (`GET http://localhost:3001/health`)
- Frontend accessible (`GET http://localhost:3000/`)
- Login flow (POST /api/auth/login)
- Document list (GET /api/correspondences)
- OCR endpoint (POST /api/ai/sandbox/ocr)
- AI inference (POST /api/ai/sandbox/extract)
- Full-text search (GET /api/search)
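The smoke test script itself is not shown. A hypothetical sketch of its unauthenticated part (backend health and frontend root) follows; the helper names and the 30-second timeout are assumptions, and the authenticated checks (login, correspondences, OCR, AI, search) would need a token and request bodies that this guide does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helpers; URLs default to the ports listed above.
BACKEND_URL="${BACKEND_URL:-http://localhost:3001}"
FRONTEND_URL="${FRONTEND_URL:-http://localhost:3000}"

# expect_status <expected-code> <description> <curl args...>
expect_status() {
  expected="$1"; desc="$2"; shift 2
  # curl prints "000" as the code when it cannot connect at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 "$@")
  if [ "$code" = "$expected" ]; then
    echo "PASS  $desc ($code)"
  else
    echo "FAIL  $desc (expected $expected, got $code)"
    return 1
  fi
}

# Runs every check and fails if any of them failed.
run_unauthenticated_checks() {
  failures=0
  expect_status 200 "backend health" "$BACKEND_URL/health" || failures=$((failures + 1))
  expect_status 200 "frontend root" "$FRONTEND_URL/" || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

Counting failures instead of stopping at the first one gives a complete picture of a broken cutover in a single run.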
## Step 6: Update CI/CD
Update Gitea secrets:
- `HOST` → new host IP (e.g., `192.168.10.50`)
- `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
## Step 7: Cutover DNS
Update NPM (Nginx Proxy Manager) on QNAP:
- `lcbp3.np-dms.work` → new host IP
- `backend.np-dms.work` → new host IP
## Step 8: Remove X-API-Key (ADR-040 D5)
After verifying Docker-internal network isolation:
1. Remove `OCR_SIDECAR_API_KEY` from sidecar environment
2. Remove API key validation from `app.py`
3. Remove `X-API-Key` header from backend `ocr.service.ts`
4. Rebuild and redeploy sidecar + backend
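Before rebuilding, a quick grep can confirm that no key handling is left behind (US4, acceptance scenario 3). This is a hypothetical helper: the search terms come from this step, and the directories to scan are left to the caller. Point it at code directories only, because the specs and ADRs legitimately mention `X-API-Key`. The match is case-insensitive because HTTP header names are.

```bash
#!/usr/bin/env bash
# find_api_key_leftovers <dir>...
# Exit 0 = clean, 1 = matches printed (key handling still present), 2 = grep error.
find_api_key_leftovers() {
  grep -rnIi -e 'X-API-Key' -e 'OCR_SIDECAR_API_KEY' \
    --exclude-dir=node_modules --exclude-dir=.git -- "$@"
  case $? in
    0) return 1 ;;   # grep found (and printed) matches
    1) return 0 ;;   # no matches anywhere
    *) return 2 ;;   # e.g. a path that does not exist
  esac
}
```

For example `find_api_key_leftovers specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar <backend-source-dir>`, where `<backend-source-dir>` is a placeholder for wherever `ocr.service.ts` lives. Mapping grep's "found" to failure makes the helper usable directly as a CI gate.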
## Step 9: Monitor (24-48 hours)
```bash
# Monitor RAM usage
docker stats --no-stream
# Monitor VRAM usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 60
# Monitor container health
watch -n 30 'docker compose -f docker-compose.new-host.yml ps'
```
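To turn the raw numbers into pass/fail against SC-007 and SC-008, a small Bash sketch can compare samples to the targets. The function names are hypothetical. Treating "GB" as GiB (1024 MiB) is an assumption made because `nvidia-smi` and `free -m` report MiB, and only the first GPU is checked.

```bash
#!/usr/bin/env bash
# Hypothetical threshold checks: RAM < 28GB (SC-007), VRAM < 15GB (SC-008).
VRAM_LIMIT_MIB=$((15 * 1024))
RAM_LIMIT_MIB=$((28 * 1024))

# check_vram "<used>, <total>"  (a line of nvidia-smi --format=csv,noheader,nounits)
check_vram() {
  used=$(printf '%s\n' "$1" | awk -F', *' '{ print $1 }')
  if [ "$used" -lt "$VRAM_LIMIT_MIB" ]; then
    echo "VRAM OK: ${used} MiB"
  else
    echo "VRAM ALERT: ${used} MiB >= ${VRAM_LIMIT_MIB} MiB"
    return 1
  fi
}

# check_ram <used MiB>
check_ram() {
  if [ "$1" -lt "$RAM_LIMIT_MIB" ]; then
    echo "RAM OK: $1 MiB"
  else
    echo "RAM ALERT: $1 MiB >= ${RAM_LIMIT_MIB} MiB"
    return 1
  fi
}

# Samples both every 60 seconds until interrupted.
monitor_loop() {
  while true; do
    check_vram "$(nvidia-smi --query-gpu=memory.used,memory.total \
                  --format=csv,noheader,nounits | head -n 1)"
    # Column 3 of the "Mem:" row in free -m is "used" (excludes buff/cache).
    check_ram "$(free -m | awk '/^Mem:/ { print $3 }')"
    sleep 60
  done
}
```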
## Step 10: Decommission Old Hosts
After 24-48 hours of stable operation:
```bash
# Stop QNAP services (retain data for backup)
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'
# Power off Desk-5439
ssh user@192.168.10.100 'sudo shutdown -h now'
```
## Rollback (Emergency)
```bash
# Stop new host services
docker compose -f docker-compose.new-host.yml down
# Restore QNAP services
ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose up -d'
ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose up -d'
# Restore Desk-5439 services
ssh user@192.168.10.100 'cd /opt/ocr-sidecar && docker compose up -d'
# Revert DNS
# Update NPM to point back to QNAP (192.168.10.8)
# Revert CI/CD
# Update Gitea secrets HOST back to 192.168.10.8
# If Step 8 was already applied, re-enable X-API-Key in sidecar + backend
```
@@ -0,0 +1,139 @@
// File: specs/100-Infrastructures/141-server-consolidation/research.md
// Change Log:
// - 2026-06-20: Initial research for Single-Host Server Consolidation
# Research: Single-Host Server Consolidation
**Branch**: `141-server-consolidation` | **Date**: 2026-06-20
## R1: Docker Network Isolation Strategy
**Decision**: Use two Docker bridge networks: `dms-internal` (every service except Frontend) and `dms-frontend` (Frontend + Backend only, for LAN publish).
**Rationale**: Docker bridge networks provide L2 isolation: a service on `dms-internal` without a `ports` mapping is unreachable from the LAN. Only Frontend (host port 3000), Backend (host port 3001 → container port 3000) and the ollama-metrics exporter (9924, for Prometheus) are published. This replaces reliance on VLAN/firewall ACLs with Docker-native isolation.
**Alternatives Considered**:
- Single bridge network + iptables rules — more complex, error-prone
- Docker Swarm overlay network — overkill for single host
- Host network mode — no isolation, security risk
## R2: CIFS Mount Strategy for ASUSTOR
**Decision**: Use Docker named volume with CIFS driver to mount ASUSTOR share `//192.168.10.9/np-dms-as/data/uploads` as `asustor_uploads` volume, mounted at `/mnt/uploads` in sidecar and `/app/uploads` in backend.
**Rationale**: Docker's CIFS volume support handles the mount lifecycle with container start/stop. Credentials live in `.env` (gitignored). Backend and sidecar see the same files through the same CIFS-backed volume, mounted at different container paths.
**Alternatives Considered**:
- Host-level `mount -t cifs` then bind mount — requires host OS config, not portable
- SSHFS — slower than CIFS for file operations
- Sync files to local SSD — adds complexity, storage duplication
**Key Consideration**: The previous Desk-5439 setup had issues with Docker Desktop WSL2 + CIFS. On a Linux host, the CIFS volume mounts natively without the WSL2 layer.
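A minimal sketch of creating that volume with Docker's built-in `local` driver, which can mount CIFS through its `type`, `device` and `o` options. The variable names and the `vers=3.0` option (taken from the SMB 3.0 assumption in spec.md) are assumptions; `ASUSTOR_USER`/`ASUSTOR_PASS` are read from the environment, as in the `.env` template. The read-only view for the sidecar is set per service where the volume is mounted, not on the volume itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create asustor_uploads as a CIFS-backed local volume.
ASUSTOR_HOST="192.168.10.9"
ASUSTOR_SHARE_PATH="np-dms-as/data/uploads"

# Builds the value passed as --opt o=... (add uid=/gid=/file_mode= as needed).
cifs_mount_opts() {
  printf 'addr=%s,username=%s,password=%s,vers=3.0' "$ASUSTOR_HOST" "$1" "$2"
}

create_asustor_volume() {
  docker volume create \
    --driver local \
    --opt type=cifs \
    --opt device="//${ASUSTOR_HOST}/${ASUSTOR_SHARE_PATH}" \
    --opt o="$(cifs_mount_opts "$ASUSTOR_USER" "$ASUSTOR_PASS")" \
    asustor_uploads
}
```

In Compose the same settings go under the volume's `driver_opts`. Passing the password inside `o=` makes it visible in `docker volume inspect`, and a password containing commas would break the option string, so a `credentials=` file on the host may be preferable.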
## R3: MariaDB Migration Strategy
**Decision**: Use `mariadb-dump` (logical dump) from QNAP MariaDB 11.8 and restore the dump file into the new host's MariaDB 11.8 container.
**Rationale**: Same MariaDB version (11.8) on both hosts → logical dump is safest. Database is small enough (<10GB estimated) that dump/restore completes within maintenance window.
**Alternatives Considered**:
- `mariabackup` (physical backup) — faster but requires same filesystem layout
- Replication (binlog) — overkill for one-time migration
- Copy raw data files — risky, requires same version + config
**Migration Command**:
```bash
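# On QNAP (source): consistent logical dump of all databases.
# --all-databases also includes the mysql system schema, so QNAP's users and grants
# (including root) replace the new container's; they apply after FLUSH PRIVILEGES or a restart.
mariadb-dump --single-transaction --routines --triggers \
  -h 127.0.0.1 -u root -p"$DB_ROOT_PASSWORD" \
  --all-databases > qnap-full-dump.sql
# Transfer qnap-full-dump.sql to the new host (e.g. with scp), then restore into the container
docker exec -i lcbp3-mariadb mariadb -u root -p"$DB_ROOT_PASSWORD" < qnap-full-dump.sql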
```
## R4: Elasticsearch Migration Strategy
**Decision**: Use ES snapshot/restore API — create snapshot on QNAP ES, transfer to new host, restore.
**Rationale**: The ES snapshot API is the official migration path. It carries index mappings, settings and data, and restores cleanly between clusters on the same ES version (8.11.x).
**Alternatives Considered**:
- Copy raw data directory — risky, requires identical ES config
- Re-index from MariaDB — slow, loses search index tuning
- Logstash pipeline — overkill for one-time migration
**Migration Steps**:
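1. Register a shared-filesystem (`fs`) repository on QNAP ES (its location must be listed under `path.repo` in `elasticsearch.yml`)
2. Create a snapshot of all indices
3. Copy the snapshot repository directory to the new host, into a path listed under `path.repo` there
4. Register the same repository on the new host ES
5. Restore the snapshot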
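The steps above map onto a few Elasticsearch snapshot API calls. This is a hypothetical sketch: the repository name, the snapshot directory and the helper names are assumptions, and `path.repo` must already list that directory on both clusters.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around the ES snapshot API (8.x).
ES_URL="${ES_URL:-http://localhost:9200}"
ES_REPO="${ES_REPO:-qnap_migration}"                                     # hypothetical
ES_SNAPSHOT_DIR="${ES_SNAPSHOT_DIR:-/usr/share/elasticsearch/snapshots}" # hypothetical

# Steps 1 and 4: register the shared-filesystem repository.
es_register_repo() {
  curl -sf -X PUT "$ES_URL/_snapshot/$ES_REPO" \
    -H 'Content-Type: application/json' \
    -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$ES_SNAPSHOT_DIR\"}}"
}

# Step 2: snapshot regular indices and data streams; blocks until done.
es_create_snapshot() {
  curl -sf -X PUT "$ES_URL/_snapshot/$ES_REPO/$1?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d '{"include_global_state": false}'
}

# Step 5: restore the snapshot's indices into the (empty) target cluster.
es_restore_snapshot() {
  curl -sf -X POST "$ES_URL/_snapshot/$ES_REPO/$1/_restore?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d '{"include_global_state": false}'
}

# Per-index document counts, sorted by index name, for the parity check.
es_doc_counts() {
  curl -sf "$ES_URL/_cat/indices?h=index,docs.count&s=index"
}
```

Between snapshot and restore, copy the repository directory from QNAP to the new host (for example with `rsync -a`). Afterwards, running `es_doc_counts` against both clusters and diffing the output gives a direct check for SC-003.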
## R5: GPU VRAM Management on Single Host
**Decision**: Rely on ADR-040 D3 (Adaptive OCR Residency via `calculate_ocr_residency()`), ADR-040 D4 (CPU Fallback Retrieval), and the LLM-First GPU Ownership policy from CONTEXT.md.
**Rationale**: RTX 5060 Ti 16GB must serve:
- np-dms-ai (Typhoon-2.5 ~7-8B): ~6-8GB VRAM
- np-dms-ocr (Typhoon OCR): ~5GB VRAM
- nomic-embed-text: ~0.5GB VRAM
- CUDA overhead: ~1.5GB
- Total: ~13-15GB → tight but feasible with adaptive residency
**Key Policy**: When LLM (np-dms-ai) needs to load, OCR model is unloaded first (`keep_alive=0` for OCR). BGE-M3 + Reranker use CPU fallback when GPU is occupied.
**Alternatives Considered**:
- Force GPU-resident for all models — OOM risk (the ~15-15.5GB worst case leaves almost no headroom on a 16GB card)
- CPU-only for all AI — too slow for production
- Second GPU — not available on new host
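The budget arithmetic and the unload step in the Key Policy can be made concrete. The sketch below is not `calculate_ocr_residency()`. It only re-adds the worst-case figures from the list above (in tenths of a GB so Bash integer math works) and shows the manual way to unload a model through Ollama's HTTP API, where a request with `keep_alive` set to 0 unloads that model. The Ollama model name `np-dms-ocr` and the helper names are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the R5 VRAM budget.
OLLAMA_URL="${OLLAMA_URL:-http://ollama:11434}"

# vram_budget_tenths [ai] [ocr] [embed] [cuda], each in tenths of a GB.
# Defaults are the upper bounds above: np-dms-ai 8.0, np-dms-ocr 5.0,
# nomic-embed-text 0.5, CUDA overhead 1.5.
vram_budget_tenths() {
  ai="${1:-80}"; ocr="${2:-50}"; embed="${3:-5}"; cuda="${4:-15}"
  echo $(( ai + ocr + embed + cuda ))
}

# Unload the OCR model now: a generate request with keep_alive 0 and no prompt
# tells Ollama to evict the model, freeing its VRAM for np-dms-ai.
unload_ocr_model() {
  curl -sf "$OLLAMA_URL/api/generate" \
    -H 'Content-Type: application/json' \
    -d '{"model": "np-dms-ocr", "keep_alive": 0}'
}
```

With the defaults, `vram_budget_tenths` prints `150` (15.0GB); `vram_budget_tenths 60` prints `130`, the lower bound. Running `ollama ps` inside the Ollama container shows which models are loaded at any moment.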
## R6: RAM Budget Allocation
**Decision**: Per-container memory limits in Docker Compose:
| Service | Memory Limit | Notes |
|---------|-------------|-------|
| MariaDB | 8G | Largest consumer, tune innodb_buffer_pool |
| Elasticsearch | 4G | ES_JAVA_OPTS=-Xms2g -Xmx2g |
| Backend (NestJS) | 2G | Node.js + BullMQ workers |
| Frontend (Next.js) | 1G | Standalone mode |
| Redis | 1G | In-memory + AOF |
| Qdrant | 1G | Vector DB |
| OCR Sidecar | 1G | Python + PyMuPDF |
| Ollama | 2G | Model loading + inference |
| ClamAV | 2G | Virus definitions |
| ollama-metrics | 256M | Lightweight proxy |
| **Total** | **~22.3G** | Leaves ~9.7G for OS + page cache |
**Rationale**: 32GB total - 22.3GB of container limits = ~9.7GB for the OS kernel and page cache. Comfortable margin, since the limits are caps rather than steady-state usage.
**Alternatives Considered**:
- No limits — risk of OOM killer affecting critical services
- Tighter limits — may cause ES/MariaDB instability
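The R6 arithmetic is easy to re-check when a limit changes. This sketch copies the limits from the table above into a simple "service MiB" list (1G = 1024 MiB); the variable and function names are illustrative only.

```bash
#!/usr/bin/env bash
# Limits from the R6 table, one "service MiB" pair per line.
MEM_LIMITS_MIB='mariadb 8192
elasticsearch 4096
backend 2048
frontend 1024
redis 1024
qdrant 1024
ocr-sidecar 1024
ollama 2048
clamav 2048
ollama-metrics 256'
HOST_RAM_MIB=$((32 * 1024))

# Sum of all container memory limits, in MiB.
total_limits_mib() {
  printf '%s\n' "$MEM_LIMITS_MIB" | awk '{ sum += $2 } END { print sum }'
}

# RAM left for the OS kernel and page cache if every container hit its cap.
headroom_mib() {
  echo $(( HOST_RAM_MIB - $(total_limits_mib) ))
}
```

With the table as written, `total_limits_mib` prints `22784` (about 22.25 GiB) and `headroom_mib` prints `9984` (about 9.75 GiB). These are caps, not actual usage; `docker stats --no-stream` shows what each container really consumes.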
## R7: CI/CD Pipeline Update
**Decision**: Update Gitea Actions `ci-deploy.yml` to SSH-deploy to new host IP instead of QNAP IP. ASUSTOR Gitea runner stays unchanged.
**Rationale**: Gitea runner on ASUSTOR (192.168.10.9) can reach new host via VLAN 10. Only the deploy target IP changes. `deploy.sh` path to compose file updates to `New-Host/docker-compose.new-host.yml`.
**Alternatives Considered**:
- Move Gitea runner to new host — unnecessary, runner works remotely
- Manual deployment — not sustainable for ongoing releases
## R8: Rollback Strategy
**Decision**: Multi-step rollback plan documented in `rollback.sh`:
1. Stop services on new host (`docker compose down`)
2. Restore services on QNAP (start existing containers with old data)
3. Restore services on Desk-5439 (start Ollama + sidecar)
4. Revert DNS/NPM to point to QNAP
5. Revert Gitea CI/CD deploy target to QNAP
6. Re-enable X-API-Key in sidecar + backend
**Rationale**: QNAP retains all data (MariaDB, ES, Redis, files) until verified stable. Rollback is fast (<2 hours) because old infrastructure is intact.
**Alternatives Considered**:
- No rollback (accept SPOF) — too risky for production DMS
- Hot failover with replication — overkill for current scale
@@ -0,0 +1,160 @@
// File: specs/100-Infrastructures/141-server-consolidation/spec.md
// Change Log:
// - 2026-06-20: Initial specification for Single-Host Server Consolidation (ADR-041)
# Feature Specification: Single-Host Server Consolidation
**Feature Branch**: `141-server-consolidation`
**Created**: 2026-06-20
**Status**: Draft
**Category**: 100-Infrastructures
**Input**: ADR-041 — Consolidate all LCBP3-DMS services onto a single Docker host with ASUSTOR as primary NAS.
**Related ADRs**: [ADR-041](../../06-Decision-Records/ADR-041-server-consolidation.md), [ADR-040](../../06-Decision-Records/ADR-040-ocr-sidecar-refactor.md), [ADR-016](../../06-Decision-Records/ADR-016-security-authentication.md), [ADR-023A](../../06-Decision-Records/ADR-023A-unified-ai-architecture.md), [ADR-034](../../06-Decision-Records/ADR-034-AI-model-change.md)
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Provision and Deploy on New Host (Priority: P1)
System administrator provisions the new single host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB), installs Docker, mounts CIFS share from ASUSTOR, and deploys all services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) using a single Docker Compose stack with internal bridge network isolation.
**Why this priority**: Without a running host, no other work can proceed. This is the foundation for all subsequent stories.
**Independent Test**: Can be fully tested by running `docker compose up` on the new host and verifying all containers are healthy via `docker ps` and health check endpoints.
**Acceptance Scenarios**:
1. **Given** a fresh OS installation on the new host, **When** the administrator runs the provisioning script, **Then** Docker Engine and Docker Compose are installed and verified with `docker --version`
2. **Given** Docker is installed, **When** the administrator mounts the ASUSTOR CIFS share, **Then** `/mnt/uploads/temp` and `/mnt/uploads/permanent` are accessible, writable by the backend container, and readable (read-only) by the OCR sidecar
3. **Given** CIFS mounts are ready, **When** the administrator runs `docker compose up -d`, **Then** all 7 service containers start and report healthy within 5 minutes
4. **Given** all containers are running, **When** the administrator checks network isolation, **Then** Ollama and OCR Sidecar ports are NOT accessible from LAN (only Frontend port 3000, Backend port 3001 and ollama-metrics port 9924 are published)
---
### User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)
Database administrator migrates MariaDB data and Elasticsearch indices from QNAP to the new host, ensuring zero data loss and minimal downtime.
**Why this priority**: Data migration is the critical path for cutover. Without migrated data, the new host cannot serve production traffic.
**Independent Test**: Can be tested by comparing row counts and index document counts between source (QNAP) and destination (new host) after migration.
**Acceptance Scenarios**:
1. **Given** the new host is running with empty MariaDB, **When** the administrator performs a database dump-and-restore from QNAP, **Then** all tables and row counts match the source exactly
2. **Given** the new host is running with empty Elasticsearch, **When** the administrator migrates indices from QNAP, **Then** all index document counts match the source exactly
3. **Given** data migration is complete, **When** the administrator runs a data integrity check script, **Then** all critical tables pass checksum verification with zero discrepancies
4. **Given** file storage is on ASUSTOR CIFS mount, **When** the administrator verifies file access from the backend container, **Then** all existing uploaded files are accessible at the expected paths
---
### User Story 3 - Cutover and Smoke Test (Priority: P3)
Operations team performs the cutover from the old 2-host architecture (QNAP + Desk-5439) to the new single host, updates DNS/network routing, and runs smoke tests to verify all system functions work end-to-end.
**Why this priority**: Cutover is the final step that makes the new host production-active. It depends on P1 and P2 being complete.
**Independent Test**: Can be tested by accessing the application via the new host's IP/hostname and performing core DMS operations (login, document upload, search, AI inference).
**Acceptance Scenarios**:
1. **Given** data migration is verified, **When** the administrator updates DNS to point to the new host, **Then** users accessing the application URL reach the new host within the DNS TTL period
2. **Given** DNS is updated, **When** a user logs in and creates a new Correspondence, **Then** the document is saved successfully and visible in the list
3. **Given** the system is live on the new host, **When** a user uploads a PDF and triggers OCR, **Then** OCR text extraction completes successfully via the internal Docker network (sidecar → Ollama)
4. **Given** the system is live, **When** a user performs a full-text search, **Then** Elasticsearch returns results with the same accuracy as before migration
5. **Given** the system is live, **When** a user triggers AI metadata extraction, **Then** the AI inference completes successfully via the internal Docker network (backend → Ollama)
---
### User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
Security administrator removes the `X-API-Key` header authentication from the OCR Sidecar and Backend, relying solely on Docker-internal network isolation as per ADR-040 D5.
**Why this priority**: This is a key security improvement enabled by the consolidation. It simplifies the architecture but must be validated carefully.
**Independent Test**: Can be tested by attempting to access sidecar endpoints from outside the Docker network (should fail) and from within the Docker network (should succeed without API key).
**Acceptance Scenarios**:
1. **Given** all services are on the Docker internal bridge, **When** the backend calls the sidecar without `X-API-Key`, **Then** the sidecar processes the request successfully
2. **Given** the sidecar is not publishing ports to LAN, **When** an external client attempts to reach the sidecar directly, **Then** the connection is refused
3. **Given** the `X-API-Key` code is removed, **When** the administrator reviews the sidecar and backend configuration, **Then** no hardcoded API keys remain in the codebase
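Scenario 2 (and SC-005) can be probed from a separate machine on VLAN 10. Below is a hypothetical sketch using Bash's `/dev/tcp` pseudo-device and coreutils `timeout`. The internal port list comes from the service tables, where none of those ports is published, and the function names are assumptions.

```bash
#!/usr/bin/env bash
# port_is_open <host> <port>: 0 if a TCP connect succeeds within 3 seconds.
port_is_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_isolation <host>: internal ports must be closed, published ones open.
check_isolation() {
  host="$1"; failures=0
  # Ollama, OCR sidecar, Redis, MariaDB, Elasticsearch, Qdrant, ClamAV
  for port in 11434 8765 6379 3306 9200 6333 3310; do
    if port_is_open "$host" "$port"; then
      echo "FAIL: $host:$port is reachable from the LAN"
      failures=$((failures + 1))
    else
      echo "OK:   $host:$port closed"
    fi
  done
  # Frontend and Backend are published on purpose.
  for port in 3000 3001; do
    if port_is_open "$host" "$port"; then
      echo "OK:   $host:$port open"
    else
      echo "FAIL: $host:$port should be published"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Run `check_isolation <new-host-ip>` from a LAN client. The positive direction runs inside the Docker network instead, e.g. `docker compose -f docker-compose.new-host.yml exec backend curl -sf http://ocr-sidecar:8765/health` (the backend image already has `curl` for its health check).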
---
### User Story 5 - Decommission Old Hosts (Priority: P5)
Operations team stops services on QNAP (which becomes backup server) and retires Desk-5439, completing the consolidation.
**Why this priority**: Cleanup is the final step after the new host is verified stable. It frees up old hardware and reduces management complexity.
**Independent Test**: Can be tested by verifying that QNAP services are stopped (except backup-related) and Desk-5439 is powered off or repurposed.
**Acceptance Scenarios**:
1. **Given** the new host has been stable for 24-48 hours, **When** the administrator stops backend/frontend/Redis/DB/ES services on QNAP, **Then** QNAP remains available as a backup server with data intact
2. **Given** QNAP services are stopped, **When** the administrator powers off Desk-5439, **Then** no LCBP3-DMS services are affected on the new host
3. **Given** old hosts are decommissioned, **When** the administrator verifies monitoring dashboards, **Then** only the new host is tracked as the active production host
---
### Edge Cases
- **GPU OOM during concurrent AI + OCR load**: What happens when np-dms-ai and np-dms-ocr are loaded simultaneously and VRAM exceeds 16GB? ADR-040 D3 (Adaptive OCR Residency) must unload OCR model to make room for LLM.
- **RAM exhaustion under heavy load**: What happens when MariaDB + Elasticsearch + CPU-fallback tensors consume more than 32GB? System must have swap space configured and memory limits per container.
- **CIFS mount failure**: What happens when ASUSTOR NAS is unreachable? File upload/download will fail; system must degrade gracefully with clear error messages.
- **Single host hardware failure**: What happens when the new host crashes? SPOF mitigation requires backup data on QNAP and a disaster recovery plan.
- **Network misconfiguration**: What happens if Docker bridge network is accidentally exposed? Sidecar and Ollama would be accessible from LAN, breaking the security model.
- **Database migration partial failure**: What happens if MariaDB migration fails midway? Rollback plan must restore QNAP as the active database host.
- **Elasticsearch index corruption during migration**: What happens if ES indices are corrupted during transfer? Re-indexing from MariaDB data must be available as a fallback.
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: System MUST co-locate all 7 services (Ollama, OCR Sidecar, Backend, Frontend, Redis, MariaDB, Elasticsearch) on a single Docker host with a unified `docker-compose.yml`
- **FR-002**: System MUST use ASUSTOR (192.168.10.9) as the primary NAS for file storage via CIFS mount at `/mnt/uploads`
- **FR-003**: System MUST isolate Ollama and OCR Sidecar on a Docker internal bridge network (`dms-internal`) with no ports published to LAN
- **FR-004**: System MUST publish only Frontend (host port 3000), Backend (host port 3001 → container port 3000) and the ollama-metrics exporter (port 9924, Prometheus scrape target) to the LAN
- **FR-005**: System MUST enable backend-to-sidecar and backend-to-Ollama communication via Docker service names (`http://ocr-sidecar:8765`, `http://ollama:11434`)
- **FR-006**: System MUST migrate MariaDB data from QNAP to the new host with zero data loss
- **FR-007**: System MUST migrate Elasticsearch indices from QNAP to the new host with zero data loss
- **FR-008**: System MUST remove `X-API-Key` authentication from sidecar and backend after confirming Docker-internal network isolation (ADR-040 D5)
- **FR-009**: System MUST enforce GPU VRAM management via Adaptive OCR Residency (ADR-040 D3) and CPU Fallback Retrieval (ADR-040 D4)
- **FR-010**: System MUST configure per-container memory limits to prevent any single service from exhausting 32GB RAM
- **FR-011**: System MUST retain QNAP as a backup server with database and file storage data intact after cutover
- **FR-012**: System MUST retire Desk-5439 after cutover is verified stable for 24-48 hours
- **FR-013**: System MUST provide a rollback plan to restore services on QNAP and Desk-5439 if the new host fails
- **FR-014**: System MUST verify all core DMS functions (login, document CRUD, OCR, AI inference, search) work end-to-end on the new host before decommissioning old hosts
- **FR-015**: System MUST monitor RAM and VRAM usage for 24-48 hours post-cutover to detect resource pressure
### Key Entities _(include if feature involves data)_
- **Docker Compose Stack**: Single `docker-compose.yml` defining all 7 services, 2 networks (`dms-internal`, `dms-frontend`), and volumes (CIFS, named volumes for data)
- **CIFS Volume Mount**: ASUSTOR network share mounted as Docker volume for file storage (`/mnt/uploads/temp`, `/mnt/uploads/permanent`)
- **Docker Internal Network**: Bridge network (`dms-internal`) isolating Ollama, Sidecar, Backend, Redis, MariaDB, and Elasticsearch from LAN access
- **GPU Resource Allocation**: NVIDIA GPU passthrough to Ollama container with VRAM management via adaptive residency policies
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: All 7 service containers start and report healthy within 5 minutes of `docker compose up -d` on the new host
- **SC-002**: Database migration completes with 100% row count parity between QNAP and new host for all critical tables
- **SC-003**: Elasticsearch migration completes with 100% document count parity between QNAP and new host for all indices
- **SC-004**: Core DMS operations (login, document upload, search, OCR, AI inference) complete successfully on the new host with zero functional regressions
- **SC-005**: Ollama and OCR Sidecar are unreachable from LAN (port scan returns closed/refused for ports 11434 and 8765)
- **SC-006**: Backend-to-Ollama latency is reduced by at least 50% compared to cross-host LAN communication (measured via AI inference response time)
- **SC-007**: RAM usage remains below 28GB (87.5% of 32GB) under normal operational load for 24 hours post-cutover
- **SC-008**: VRAM usage remains below 15GB (93.75% of 16GB) during concurrent AI inference and OCR workloads
- **SC-009**: Rollback plan can be executed within 2 hours to restore services on QNAP and Desk-5439 if needed
- **SC-010**: QNAP backup server retains a valid database snapshot within 24 hours of cutover
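SC-006 only fixes a ratio, so a sketch of the comparison may help. The helper names are hypothetical, the baseline has to be measured on the old 2-host setup with the same request, and `curl`'s `time_total` (seconds) is converted to whole milliseconds.

```bash
#!/usr/bin/env bash
# latency_reduction_pct <baseline_ms> <new_ms>: integer percent reduction.
latency_reduction_pct() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%d\n", (old - new) * 100 / old }'
}

# SC-006 passes when the reduction is at least 50%.
meets_sc006() {
  [ "$(latency_reduction_pct "$1" "$2")" -ge 50 ]
}

# time_request_ms <url>: total request time in whole milliseconds.
time_request_ms() {
  curl -s -o /dev/null -w '%{time_total}' "$1" | awk '{ printf "%d\n", $1 * 1000 }'
}
```

For example, a baseline of 100 ms and a new measurement of 40 ms gives a 60% reduction, which meets SC-006; 60 ms (40%) would not.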
### Assumptions
- The new host hardware (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) is physically available and OS-installed before provisioning begins
- ASUSTOR NAS (192.168.10.9) has sufficient storage capacity for all file uploads (temp + permanent)
- Network connectivity between the new host and ASUSTOR is via VLAN 10 with CIFS/SMB 3.0 support
- NVIDIA drivers and Docker GPU runtime (nvidia-container-toolkit) are compatible with the RTX 5060 Ti
- QNAP data (MariaDB, Elasticsearch) is in a consistent state suitable for dump-and-restore migration
- ADR-040 (OCR Sidecar Refactor) is implemented before or concurrently with the cutover, providing network-only auth and adaptive residency
- Gitea CI/CD pipeline can be updated to target the new host for deployment
@@ -0,0 +1,221 @@
// File: specs/100-Infrastructures/141-server-consolidation/tasks.md
// Change Log:
// - 2026-06-20: Initial task list for Single-Host Server Consolidation
// - 2026-06-20: Fix C1-C5 from analysis: backend env var update, port conflict, GPU residency, ollama-metrics port, n8n endpoints
# Tasks: Single-Host Server Consolidation
**Input**: Design documents from `/specs/100-Infrastructures/141-server-consolidation/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/
**Related ADRs**: ADR-041, ADR-040, ADR-016, ADR-023A, ADR-034
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3)
- Include exact file paths in descriptions
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Create directory structure and initial files for the new host deployment
- [ ] T001 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/` directory structure with subdirectories: `ocr-sidecar/`, `scripts/`
- [ ] T002 [P] Create `.env.template` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` with all required env vars from contracts
- [ ] T003 [P] Create `README.md` at `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/README.md` with deployment overview
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Provision the new host OS and create the unified Docker Compose stack — MUST be complete before any user story can proceed
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [ ] T004 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/provision-host.sh` — installs Docker Engine, Docker Compose v2, NVIDIA drivers, nvidia-container-toolkit, CIFS utils, creates directory structure
- [ ] T005 Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` — unified compose with all 10 services, 2 networks (dms-internal, dms-frontend), CIFS volume, named volumes, memory limits per data-model.md. Backend publishes `3001:3000` to LAN (NPM routes `backend.np-dms.work` → :3001); Frontend publishes `3000:3000`; ollama-metrics publishes `9924:9924` to LAN for Prometheus scraping from ASUSTOR
- [ ] T006 [P] Copy OCR sidecar code from `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/` to `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/` — adapt `OLLAMA_API_URL` to `http://ollama:11434` (Docker DNS), remove `ports` mapping, use `expose` only
- [ ] T007 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/Dockerfile` — verify GPU access via nvidia-container-toolkit, ensure poppler-utils installed
- [ ] T008 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/requirements.txt` — verify typhoon-ocr, PyMuPDF, httpx, fastapi versions match Desk-5439
- [ ] T008b Update backend environment variables for renamed service names: `REDIS_HOST=redis` (was `cache`), `ELASTICSEARCH_HOST=elasticsearch` (was `search`) in `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/.env.template` and `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` backend environment section — these service names changed from QNAP compose where Redis was `cache` and ES was `search`
**Checkpoint**: New host directory structure and unified compose file ready — user story implementation can now begin
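T008b can be spot-checked before deployment. The sketch below is a minimal illustration that assumes a plain `KEY=value` format with `#` comments and no quoting or multi-line values. The helper names `parse_env` and `check_service_hosts` are hypothetical and not part of the repository.

```python
from typing import Dict, List

# Expected values after the QNAP -> New-Host service rename (T008b).
EXPECTED_HOSTS: Dict[str, str] = {
    "REDIS_HOST": "redis",                  # was "cache" on QNAP
    "ELASTICSEARCH_HOST": "elasticsearch",  # was "search" on QNAP
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; skips blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_service_hosts(text: str) -> List[str]:
    """Return human-readable problems; an empty list means the file is OK."""
    env = parse_env(text)
    problems: List[str] = []
    for key, expected in EXPECTED_HOSTS.items():
        actual = env.get(key)
        if actual is None:
            problems.append(f"{key} is missing")
        elif actual != expected:
            problems.append(f"{key}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    sample = "REDIS_HOST=cache\nELASTICSEARCH_HOST=elasticsearch\n"
    print(check_service_hosts(sample))  # ['REDIS_HOST=cache (expected redis)']
```

Running it against the filled-in `.env` would flag leftover QNAP service names before `docker compose up -d`.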
---
## Phase 3: User Story 1 - Provision and Deploy on New Host (Priority: P1) 🎯 MVP
**Goal**: Administrator provisions the new host, mounts ASUSTOR CIFS, and deploys all services with Docker internal network isolation
**Independent Test**: Run `docker compose up -d` on the new host and verify all containers are healthy via `docker ps` and health check endpoints
### Implementation for User Story 1
- [ ] T009 [US1] Run `provision-host.sh` on new host — verify Docker, NVIDIA, CIFS mount at `/mnt/uploads`
- [ ] T010 [US1] Pull Ollama models on new host: `ollama pull np-dms-ai:latest`, `ollama pull np-dms-ocr:latest`, `ollama pull nomic-embed-text:latest` — verify with `ollama list`
- [ ] T011 [US1] Copy `.env.template` to `.env`, fill in all secrets from QNAP `.env` (DB passwords, JWT secrets, Redis password, ASUSTOR CIFS credentials)
- [ ] T012 [US1] Run `docker compose --env-file .env -f docker-compose.new-host.yml up -d` and verify all 10 containers start
- [ ] T013 [US1] Verify network isolation: `nmap -p 11434 <new-host-ip>` from another VLAN 10 machine should show closed/refused; `nmap -p 8765` should show closed/refused; `nmap -p 3000` (frontend) and `nmap -p 3001` (backend) should show open; `nmap -p 9924` (ollama-metrics) should show open for Prometheus
- [ ] T014 [US1] Verify health checks: `curl http://localhost:3001/health` (backend on published port 3001), `curl http://localhost:3000/` (frontend), `curl http://ocr-sidecar:8765/health` (from inside backend container via Docker DNS)
**Checkpoint**: All services running on new host with correct network isolation — MVP achieved
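T013 and SC-005 come down to TCP reachability checks run from another VLAN 10 machine. As a stdlib-only alternative to `nmap`, a sketch like the following could record which ports accept connections. The expected-exposure map is taken from T013; the function names are hypothetical. A refused, timed-out, or unreachable connection is treated as closed.

```python
import socket
from typing import Dict, Iterable

# Expected exposure from T013: True = must be reachable from the LAN.
EXPECTED_EXPOSURE: Dict[int, bool] = {
    11434: False,  # Ollama: Docker-internal only
    8765: False,   # OCR sidecar: Docker-internal only
    3000: True,    # frontend
    3001: True,    # backend (published 3001:3000)
    9924: True,    # ollama-metrics for Prometheus on ASUSTOR
}


def probe_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Return {port: True if a TCP connection succeeds, else False}."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # covers refused, timed out, and unreachable
            results[port] = False
    return results


def isolation_violations(
    observed: Dict[int, bool], expected: Dict[int, bool] = EXPECTED_EXPOSURE
) -> Dict[int, bool]:
    """Ports whose observed reachability differs from the expected one."""
    return {
        port: observed.get(port, False)
        for port, want in expected.items()
        if observed.get(port, False) != want
    }
```

After T013 passes, `isolation_violations(probe_ports("<new-host-ip>", EXPECTED_EXPOSURE))` should return `{}`.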
---
## Phase 4: User Story 2 - Migrate Data from QNAP to New Host (Priority: P2)
**Goal**: Migrate MariaDB and Elasticsearch data from QNAP to the new host with zero data loss
**Independent Test**: Compare row counts and index document counts between QNAP (source) and new host (destination) after migration
### Implementation for User Story 2
- [ ] T015 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-mariadb.sh` — dump from QNAP MariaDB 11.8 via `mariadb-dump --single-transaction --routines --triggers`, pipe to new host container
- [ ] T016 [P] [US2] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/migrate-elasticsearch.sh` — create snapshot on QNAP ES, transfer files, register repo on new host, restore
- [ ] T017 [US2] Run `migrate-mariadb.sh` — verify all table row counts match between QNAP and new host
- [ ] T018 [US2] Run `migrate-elasticsearch.sh` — verify all index document counts match between QNAP and new host
- [ ] T019 [US2] Create and run `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/verify-data-parity.sh` — automated row count + document count comparison script
- [ ] T020 [US2] Verify CIFS file access: list files in `/app/uploads/temp` and `/app/uploads/permanent` from backend container, compare with ASUSTOR share
**Checkpoint**: All data migrated and verified — new host has complete production data
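For T019, the comparison step of `verify-data-parity.sh` could look like the sketch below; it is written in Python for illustration, although the task itself names a shell script. It assumes counts were already collected on both sides, for example with exact `SELECT COUNT(*) FROM <table>` queries in MariaDB (the `TABLE_ROWS` column in `information_schema` is only an estimate for InnoDB) and the Elasticsearch `GET /<index>/_count` API. `compare_counts` and `ParityIssue` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ParityIssue:
    name: str              # table or index name
    source: Optional[int]  # count on QNAP (None = missing there)
    target: Optional[int]  # count on the new host (None = missing there)


def compare_counts(source: Dict[str, int], target: Dict[str, int]) -> List[ParityIssue]:
    """Return every table/index whose count differs or exists on one side only."""
    issues: List[ParityIssue] = []
    for name in sorted(set(source) | set(target)):
        s, t = source.get(name), target.get(name)
        if s != t:
            issues.append(ParityIssue(name, s, t))
    return issues
```

An empty result for both the MariaDB and the Elasticsearch maps corresponds to SC-002 and SC-003.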
---
## Phase 5: User Story 3 - Cutover and Smoke Test (Priority: P3)
**Goal**: Perform production cutover from old 2-host architecture to new single host, verify all DMS functions work end-to-end
**Independent Test**: Access application via new host IP, perform core DMS operations (login, document upload, search, AI inference)
### Implementation for User Story 3
- [ ] T021 [P] [US3] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/smoke-test.sh` — automated tests for: backend health, frontend accessible, login flow, document list, OCR endpoint, AI inference, full-text search
- [ ] T022 [US3] Update Gitea secrets: `HOST` → new host IP, `COMPOSE_FILE` → `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml`
- [ ] T023 [US3] Update `scripts/deploy.sh` — change `COMPOSE_FILE` path to New-Host directory
- [ ] T024 [US3] Update NPM (Nginx Proxy Manager) on QNAP: `lcbp3.np-dms.work` → new host IP:3000 (frontend), `backend.np-dms.work` → new host IP:3001 (backend)
- [ ] T024b [US3] Update n8n workflow endpoints on QNAP: change all backend API URLs from `http://192.168.10.8:3000/api` (QNAP) to `http://<new-host-ip>:3001/api` (new host) — n8n stays on QNAP but must reach backend on new host via LAN port 3001
- [ ] T025 [US3] Run `smoke-test.sh` on new host — verify all 7 smoke tests pass
- [ ] T026 [US3] Verify from external machine on VLAN 10: access `https://lcbp3.np-dms.work`, login, create a test Correspondence, upload a PDF, trigger OCR, perform search
**Checkpoint**: New host is production-active — all DMS functions verified end-to-end
---
## Phase 6: User Story 4 - Remove X-API-Key and Verify Network-Only Auth (Priority: P4)
**Goal**: Remove `X-API-Key` authentication from sidecar and backend, relying solely on Docker-internal network isolation per ADR-040 D5
**Independent Test**: Attempt to access sidecar from outside Docker network (should fail); verify backend calls sidecar without API key (should succeed)
### Implementation for User Story 4
- [ ] T027 [P] [US4] Remove `OCR_SIDECAR_API_KEY` from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/docker-compose.new-host.yml` ocr-sidecar environment
- [ ] T028 [P] [US4] Remove API key validation from `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/ocr-sidecar/app.py` — remove `X-API-Key` header check middleware
- [ ] T029 [US4] Remove `X-API-Key` header from `backend/src/modules/ai/services/ocr.service.ts` — remove API key from HTTP client headers
- [ ] T030 [US4] Remove `OCR_SIDECAR_API_KEY` from `backend/.env.example` and any backend config that sets it
- [ ] T031 [US4] Rebuild and redeploy sidecar + backend containers — verify backend can call sidecar without API key
- [ ] T032 [US4] Verify external access blocked: `curl http://<new-host-ip>:8765/health` from VLAN 10 machine should fail (connection refused)
**Checkpoint**: Network-only auth verified — no API key needed, Docker isolation sufficient
---
## Phase 7: User Story 5 - Decommission Old Hosts (Priority: P5)
**Goal**: Stop services on QNAP (becomes backup) and retire Desk-5439, completing the consolidation
**Independent Test**: Verify QNAP services stopped (except backup), Desk-5439 powered off, new host unaffected
### Implementation for User Story 5
- [ ] T033 [P] [US5] Create `specs/04-Infrastructure-OPS/04-00-docker-compose/New-Host/scripts/rollback.sh` — emergency rollback: stop new host, restore QNAP + Desk-5439 services, revert DNS, revert CI/CD
- [ ] T034 [US5] Monitor new host for 24-48 hours: RAM usage (`docker stats`), VRAM usage (`nvidia-smi`), container health, application logs
- [ ] T034b [US5] Verify Adaptive OCR Residency (ADR-040 D3) on new RTX 5060 Ti: load `np-dms-ai` and `np-dms-ocr` concurrently, confirm `calculate_ocr_residency()` unloads OCR model when LLM needs VRAM; verify CPU Fallback Retrieval (ADR-040 D4) activates for BGE-M3/Reranker when GPU is occupied by LLM
- [ ] T035 [US5] Stop QNAP app services: `ssh admin@192.168.10.8 'cd /share/np-dms/app && docker compose down'`
- [ ] T036 [US5] Stop QNAP service stack: `ssh admin@192.168.10.8 'cd /share/np-dms/services && docker compose down'`
- [ ] T037 [US5] Retire Desk-5439: `ssh user@192.168.10.100 'sudo shutdown -h now'` (or repurpose)
- [ ] T038 [US5] Verify new host still fully operational after old hosts decommissioned — re-run `smoke-test.sh`
- [ ] T039 [US5] Take QNAP backup snapshot: `mariadb-dump` on QNAP MariaDB (if still running) or verify existing backup is current
**Checkpoint**: Consolidation complete — single host is sole production, old hosts decommissioned
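The VRAM part of the T034 monitoring window can be checked against SC-008 by sampling `nvidia-smi` in its CSV query mode. This sketch rests on two assumptions: that "15GB" in SC-008 means 15 GiB (15360 MiB), and that a single sample per GPU row is representative. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple

# SC-008 says "below 15GB"; this sketch assumes GiB, i.e. 15 * 1024 MiB.
VRAM_LIMIT_MIB = 15 * 1024


def sample_vram() -> str:
    """Return one CSV line per GPU: '<used MiB>, <total MiB>'."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_vram(csv_text: str) -> List[Tuple[int, int]]:
    """Parse nvidia-smi CSV rows into (used_mib, total_mib) tuples."""
    rows: List[Tuple[int, int]] = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def vram_within_budget(csv_text: str, limit_mib: int = VRAM_LIMIT_MIB) -> bool:
    """True if every GPU's used memory is strictly below the SC-008 limit."""
    return all(used < limit_mib for used, _total in parse_vram(csv_text))
```

During the 24-48 hour window, a loop or cron job could call `vram_within_budget(sample_vram())` at intervals and log every `False` together with the `docker stats` output.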
---
## Phase 8: Polish & Cross-Cutting Concerns
**Purpose**: Documentation, monitoring, and final verification
- [ ] T040 [P] Update `specs/04-Infrastructure-OPS/04-00-docker-compose/README.md` — add New-Host section, mark QNAP as backup, mark Desk-5439 as retired
- [ ] T041 [P] Update `CONTEXT.md` — update infrastructure topology to reflect single-host architecture
- [ ] T042 [P] Update `AGENTS.md` — update infrastructure references (Desk-5439 → New Host, QNAP → backup)
- [ ] T043 Update `specs/04-Infrastructure-OPS/04-00-docker-compose/.env.template` — add ASUSTOR_USER, ASUSTOR_PASS, NEW_HOST_IP variables
- [ ] T044 [P] Update Prometheus/Grafana scrape config on ASUSTOR: change the ollama-metrics target from `192.168.10.100:9924` to the new host's published port (`<new-host-ip>:9924`, per T005), since ASUSTOR cannot reach the Docker-internal network
- [ ] T045 Run `quickstart.md` validation — follow all steps end-to-end on a fresh provision
- [ ] T046 [P] Document disaster recovery procedure — backup schedule, restore from QNAP backup, estimated RTO/RPO
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies — can start immediately
- **Foundational (Phase 2)**: Depends on Setup — BLOCKS all user stories
- **US1 (Phase 3)**: Depends on Foundational — requires physical access to new host
- **US2 (Phase 4)**: Depends on US1 (services must be running to receive migrated data)
- **US3 (Phase 5)**: Depends on US1 + US2 (services running + data migrated for cutover)
- **US4 (Phase 6)**: Depends on US3 (cutover complete, network isolation verified)
- **US5 (Phase 7)**: Depends on US3 + US4 (stable production before decommissioning)
- **Polish (Phase 8)**: Can start after US3; some tasks depend on US5
### User Story Dependencies
- **US1 (P1)**: Foundational → US1 — no dependencies on other stories
- **US2 (P2)**: US1 → US2 — needs running services to receive data
- **US3 (P3)**: US1 + US2 → US3 — needs running services + migrated data
- **US4 (P4)**: US3 → US4 — needs cutover complete to verify network isolation in production
- **US5 (P5)**: US3 + US4 → US5 — needs stable production before decommissioning
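The dependency lists above form a directed acyclic graph, so the execution order and the phases that may overlap can be derived mechanically. Below is a sketch using Python's standard `graphlib` module (3.9+). As a simplification, it models Polish as depending only on US3, although some Polish tasks also depend on US5.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# node -> set of nodes it depends on (from the dependency lists above).
# Simplification: Polish is modeled as depending on US3 only.
PHASE_DEPS: Dict[str, Set[str]] = {
    "Setup": set(),
    "Foundational": {"Setup"},
    "US1": {"Foundational"},
    "US2": {"US1"},
    "US3": {"US1", "US2"},
    "US4": {"US3"},
    "US5": {"US3", "US4"},
    "Polish": {"US3"},
}


def batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves; every node in a wave can start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(batches(PHASE_DEPS))
# → [['Setup'], ['Foundational'], ['US1'], ['US2'], ['US3'], ['Polish', 'US4'], ['US5']]
```

The same function applies at task level, for example to confirm that T027, T028, and T030 land in the same wave once their inputs are modeled.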
### Parallel Opportunities
- T002, T003 can run in parallel (different files)
- T006, T007, T008 can run in parallel (sidecar files, no dependencies)
- T015, T016 can run in parallel (different migration scripts)
- T027, T028, T030 can run in parallel (different files: compose, app.py, .env.example)
- T040, T041, T042, T044 can run in parallel (different doc files)
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup (create directory structure)
2. Complete Phase 2: Foundational (provision host + create compose)
3. Complete Phase 3: User Story 1 (deploy services)
4. **STOP and VALIDATE**: All containers healthy, network isolation verified
5. Demo to stakeholders if ready
### Incremental Delivery
1. Setup + Foundational → Infrastructure ready
2. Add US1 → Services deployed → Validate (MVP!)
3. Add US2 → Data migrated → Validate parity
4. Add US3 → Cutover complete → Validate end-to-end
5. Add US4 → Security hardened → Validate network-only auth
6. Add US5 → Old hosts retired → Validate stability
7. Polish → Documentation updated → Final validation
---
## Notes
- This is an infrastructure task — most work is shell scripts, Docker Compose YAML, and manual operations
- Physical access to the new host is required for US1
- Data migration (US2) requires SSH access to QNAP
- Cutover (US3) requires DNS/NPM access and coordination with users
- Decommission (US5) should only proceed after 24-48 hours of stable monitoring
- Rollback plan must be tested before cutover
- All env secrets must come from `.env` (gitignored) — never commit real secrets
@@ -36,4 +36,11 @@
| 2026-06-19 | v1.9.10 | Feature-240 AI Admin Console Collapsible Cards: added buttons and logic to collapse/expand cards and sections, persisting state to localStorage while keeping background query polling | ✅ Complete |
| 2026-06-19 | v1.9.10 | Deployment Timeout Fix: added clamav health check before recreation (skip if healthy), increased CI timeout 20→30 min | ✅ Complete |
| 2026-06-19 | v1.9.10 | AI Admin Response Normalization: recursive data unwrap for VRAM/prompt payloads, fixed Sandbox `.map()` crash and false OOM Guard | ✅ Complete |
| 2026-06-19 | v1.9.2 | SQL Delta Consolidation — merged applied deltas into schema/seed files, updated data dictionary to v1.9.2, cleaned up deltas directory, moved INSERT statements from schema to seed file | ✅ Complete |
| 2026-06-20 | v1.9.10 | ADR-040 OCR Sidecar Refactor — Pure compute worker, async I/O, residency wiring, path hardening, network isolation (supersedes ADR-033 §7) | ✅ Proposed |
| 2026-06-20 | v1.9.10 | ADR-041 Server Consolidation — Single Docker host (Ryzen 5 5600/32GB/RTX 5060 Ti 16GB), ASUSTOR as Primary NAS, QNAP as backup | ✅ Proposed |
| 2026-06-20 | v1.9.10 | OCR Sidecar Refactor (Speckit-140) — spec.md, plan.md, tasks.md generated, 5 analysis issues fixed, ready for implementation | ✅ Ready for Implement |
| 2026-06-20 | v1.9.10 | OCR Sidecar Refactor Phase 6+8+9: async I/O (lifespan + AsyncClient + asyncio.to_thread), removed `/normalize` endpoint, Dockerfile curl, docker-compose stale config cleanup, README.md, quickstart.md fix; 19/19 Python tests pass | ✅ Complete (Phase 7 blocked by ADR-041) |
| 2026-06-20 | v1.9.10 | OCR Backend Cleanup — typhoon-llm → np-dms-ai (processor+queue+module), tesseract → fast-path (enum+entity+controller+service+tests), P1-P3 fixes (keep_alive removal, hardcoded API key removal, env var alignment, Dockerfile 3.11, asyncio.to_thread VRAM calls) | ✅ Complete (pending tsc verify) |
| 2026-06-20 | v1.9.10 | OCR Naming Refactor — TyphoonOcr → NpDmsOcr (processor/queue/Redis key/aiModel), OcrTyphoonOptions → OcrNpDmsOptions, typhoonOptions → ocrOptions (backend 7 files + 3 tests), frontend typhoon state vars → ocr, isTyphoon → isAiPowered, Tesseract mocks → Fast Path, dead typhoon_ocr checks removed, page.tsx model name constants | ✅ Complete (pending tsc verify) |
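The "path hardening" item in the ADR-040 row refers to canonicalizing upload paths against a fixed base directory. The following is a minimal sketch of that idea. It assumes the base is the `/app/uploads` mount used in T020 and that callers pass paths relative to it. `safe_upload_path` is a hypothetical helper, not the sidecar's actual function, and `Path.is_relative_to` requires Python 3.9+.

```python
from pathlib import Path

# Assumption: uploads are mounted at /app/uploads inside the container (see T020).
UPLOAD_BASE = Path("/app/uploads")


def safe_upload_path(user_path: str, base: Path = UPLOAD_BASE) -> Path:
    """Resolve user_path under base and reject anything that escapes it.

    resolve() collapses '..' segments and follows symlinks, so the check runs
    on the canonical path rather than on the raw string.
    """
    root = base.resolve()
    # Joining an absolute user_path discards root; the check below rejects it.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes upload base: {user_path!r}")
    return candidate
```

Because the comparison runs on resolved paths, both `../` traversal and symlinks pointing outside the base are rejected.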
@@ -0,0 +1,32 @@
# Session — 2026-06-19 (SQL Delta Consolidation)
## Summary
Merged the already-applied SQL delta files into the main schema and seed files, deleted the rollback files, updated the data dictionary, and moved INSERT statements from the schema file to the seed file
## Issues Found (Root Cause)
None; this was routine maintenance.
## Fix
| File | Change |
| -------------- | ---------------------- |
| `specs/03-Data-and-Storage/lcbp3-v1.9.0-schema-02-tables.sql` | Updated the `tags`, `correspondence_tags`, `system_settings`, `migration_review_queue`, and `ai_audit_logs` tables; added the `ai_available_models`, `ai_prompts`, `ai_execution_profiles`, `ai_sandbox_profiles`, and `migration_errors` tables; removed INSERT statements |
| `specs/03-Data-and-Storage/lcbp3-v1.9.0-seed-basic.sql` | Added AI seed data (ai_available_models, ai_execution_profiles, ai_sandbox_profiles); added system_settings INSERT statements |
| `specs/03-Data-and-Storage/03-01-data-dictionary.md` | Bumped version to 1.9.2; updated the `ai_audit_logs` definition; added entries for `ai_available_models`, `ai_prompts`, `ai_execution_profiles`, `ai_sandbox_profiles`, `migration_errors` |
| `specs/03-Data-and-Storage/deltas/` | Deleted all 15 rollback files and all 26 .sql files |
## Locked Rules
- **Schema Management**: Follow ADR-009 (no migrations): edit the SQL schema directly and use delta files for tracking
- **Seed Data Separation**: INSERT statements belong in seed files, not schema files
- **Data Dictionary Sync**: Every schema change must update the data dictionary with a version bump
## Verification
- [x] Schema file contains no INSERT statements
- [x] Seed file contains system_settings INSERT statements
- [x] AI seed data added to seed-basic.sql
- [x] Data dictionary version bumped to 1.9.2
- [x] Delta directory cleaned up (only README.md remains)
@@ -0,0 +1,79 @@
# Session 2026-06-20 — OCR Backend Cleanup (Legacy Alias Removal)
## Summary
Cleaned up the backend code to use canonical naming consistently: renamed `typhoon-llm` → `np-dms-ai`, removed all `tesseract` references, and applied the recommended fixes (P1-P3) from code review
## Changes (Fix)
### P1: Critical Fixes
| File | Change |
| --- | --- |
| `backend/src/modules/ai/services/ocr.service.ts` | P1-1: removed `keep_alive` from the multipart form data (the sidecar computes it internally); P1-2: removed the hardcoded API key default, so the service now throws if `OCR_SIDECAR_API_KEY` is not set; renamed `processWithTyphoon` → `processWithNpDmsOcr` and `processWithTesseract` → `processWithFastPath`; audit-log model names changed to `fast-path`/`pymupdf` and `np-dms-ocr` |
| `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts` | P1-2: removed the hardcoded API key default, so the service now throws if `OCR_SIDECAR_API_KEY` is not set; removed `'tesseract'` from `SandboxOcrEngineType`; updated the routing condition and fallback comments |
### P2: Important Fixes
| File | Change |
| --- | --- |
| `backend/.env.example` | P2-1: `OCR_API_KEY` → `OCR_SIDECAR_API_KEY` (aligned with code); P2-2: OCR URL `192.168.10.8` → `192.168.10.100` (Desk-5439); removed `THAI_PREPROCESS_URL` (endpoint deleted in ADR-040 Phase 8); comment "PaddleOCR" → "np-dms-ocr" |
| `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/Dockerfile` | P2-5: `python:3.10-slim` → `python:3.11-slim` |
### P3: Medium Priority
| File | Change |
| --- | --- |
| `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/app.py` | P3-1/P3-2: wrapped sync VRAM calls in `asyncio.to_thread()`: `calculate_ocr_residency()` in `process_ocr`, and `get_vram_headroom()` in `/embed` and `/rerank` |
| `specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/services/residency_policy.py` | Updated comment "Typhoon OCR" → "np-dms-ocr" |
### typhoon-llm → np-dms-ai Rename
| File | Change |
| --- | --- |
| `backend/src/modules/ai/processors/np-dms-ai.processor.ts` | **New**: `NpDmsAiProcessor`, `QUEUE_NP_DMS_AI = 'np-dms-ai'`, `NpDmsAiJobData`, Redis key `ai:np-dms-ai:llm:`, audit `aiModel: 'np-dms-ai'` |
| `backend/src/modules/ai/processors/typhoon-llm.processor.ts` | **Deleted**: replaced by `np-dms-ai.processor.ts` |
| `backend/src/modules/ai/ai.module.ts` | Import changed from `typhoon-llm.processor` → `np-dms-ai.processor`; queue name `QUEUE_TYPHOON_LLM` → `QUEUE_NP_DMS_AI`; provider `TyphoonLlmProcessor` → `NpDmsAiProcessor` |
### Tesseract Cleanup
| File | Change |
| --- | --- |
| `backend/src/modules/ai/entities/ocr-engine-configuration.entity.ts` | In the enum: `TESSERACT` → `FAST_PATH`, `TYPHOON_OCR` → `NP_DMS_OCR` |
| `backend/src/modules/ai/ai.controller.ts` | Removed `'tesseract'` from the Swagger enum and `validEngineTypes` |
| `backend/src/modules/ai/entities/ai-audit-log.entity.ts` | Updated comment examples: `tesseract` → `fast-path`, `typhoon-ocr-3b` → `np-dms-ocr` |
### Test Files Updated
| File | Change |
| --- | --- |
| `backend/src/modules/ai/services/sandbox-ocr-engine.service.spec.ts` | Removed the tesseract test block; updated `engineUsed` expectations from `'tesseract'` → `'fast-path'`; updated fallback test descriptions and mock text |
### Manual Changes by User (After Session)
| File | Change |
| --- | --- |
| `backend/src/modules/ai/processors/typhoon-ocr.processor.ts` | **Deleted by user**: renamed to `np-dms-ocr-processor.ts` |
| `backend/src/modules/ai/processors/np-dms-ocr-processor.ts` | **Created by user**: renamed file; the class is still named `TyphoonOcrProcessor` and the queue is still `QUEUE_TYPHOON_OCR` |
| `backend/src/modules/ai/ai.module.ts` | User updated the import path to `./processors/np-dms-ocr-processor` |
## Locked Rules
- **Canonical naming:** `np-dms-ai` (LLM processor/queue), `np-dms-ocr` (OCR engine), `fast-path` (PyMuPDF text layer); do not use `typhoon-llm`, `tesseract`, or `Typhoon OCR` in new code
- **API Key:** `OCR_SIDECAR_API_KEY` is a mandatory env var; no hardcoded default is allowed
- **keep_alive:** The backend does not send `keep_alive` in the form data; the sidecar computes it solely via `calculate_ocr_residency()`
- **VRAM calls:** Sync VRAM/residency calls inside async endpoints must be wrapped in `asyncio.to_thread()`
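The VRAM rule above exists because a blocking call inside an `async def` endpoint stalls the whole event loop. The sketch below illustrates the pattern with a stand-in `get_vram_headroom()` that merely sleeps; it is hypothetical, since the real function lives in the sidecar's VRAM monitor. Because each probe runs in the default thread pool via `asyncio.to_thread()`, five concurrent requests finish in roughly the time of one call instead of five.

```python
import asyncio
import time
from typing import List, Tuple


def get_vram_headroom() -> int:
    """Hypothetical stand-in for the sidecar's blocking VRAM probe (MiB)."""
    time.sleep(0.2)  # simulates a slow, synchronous driver query
    return 4096


async def embed_endpoint() -> int:
    # Offload the blocking probe to the default thread pool; the event loop
    # stays free to serve other requests while it runs.
    return await asyncio.to_thread(get_vram_headroom)


async def serve_concurrently(n: int = 5) -> Tuple[List[int], float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(embed_endpoint() for _ in range(n)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(serve_concurrently())
    print(results, f"{elapsed:.2f}s")  # typically ~0.2s rather than ~1.0s
```

Calling `get_vram_headroom()` directly inside `embed_endpoint` would serialize the five requests, because `time.sleep` holds the event loop.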
## Remaining Work (Next Session)
- [ ] **Rename `TyphoonOcrProcessor` → `NpDmsOcrProcessor`** in `np-dms-ocr-processor.ts` (class name + queue constant `QUEUE_TYPHOON_OCR` → `QUEUE_NP_DMS_OCR`)
- [ ] **Update `ai.module.ts`** imports to use `NpDmsOcrProcessor` and `QUEUE_NP_DMS_OCR`
- [ ] **Update `typhoon-ocr.processor.spec.ts`** if it exists: rename it and update references
- [ ] **tsc --noEmit verification** once all renames are complete
- [ ] **Backend build** to confirm there are no broken imports
## Verification
- [ ] `pnpm --filter backend exec tsc --noEmit`: not yet run (pending the `TyphoonOcrProcessor` rename)
- [ ] Backend unit tests: not yet run
- [ ] Python tests: not yet run after the `asyncio.to_thread` changes
@@ -0,0 +1,51 @@
# Session — 2026-06-20 (OCR Naming Refactor: typhoon → np-dms-ocr)
## Summary
Fully refactored the OCR-engine naming conventions from "typhoon" to "np-dms-ocr" across both backend and frontend, including cleanup of the Tesseract references that remained in test files and source code
## Changes
### Backend (7 files + 3 test files)
| File | Change |
| --- | --- |
| `sandbox-ocr-engine.service.ts` | `OcrTyphoonOptions` → `OcrNpDmsOptions`, `typhoonOptions` → `ocrOptions`, log messages |
| `np-dms-ocr-processor.ts` | Import `OcrNpDmsOptions`, field `typhoonOptions` → `ocrOptions` |
| `ai.controller.ts` | `typhoonOptions` → `ocrOptions` in `submitSandboxOcr` |
| `ai-batch.processor.ts` | Import + all `typhoonOptions` → `ocrOptions` (5 locations) |
| `ai-queue.service.ts` | Field `typhoonOptions` → `ocrOptions` in payload type + job data |
| `ocr.service.ts` | `OcrDetectionInput.typhoonOptions` → `ocrOptions`, override logic |
| `ai.module.ts` | Change log comment: `TyphoonOcrProcessor` → `NpDmsOcrProcessor` |
| `ai-batch.processor.spec.ts` | Test assertion: `typhoonOptions` → `ocrOptions` |
| `ocr.service.spec.ts` | Test input: `typhoonOptions` → `ocrOptions` |
| `sandbox-ocr-engine.service.spec.ts` | Test description: `typhoonOptions` → `ocrOptions` |
### Frontend (5 files + 3 test files)
| File | Change |
| --- | --- |
| `admin-ai.service.ts` | Param `typhoonOptions` → `ocrOptions` in `submitSandboxOcr` |
| `OcrSandboxPromptManager.tsx` | State vars `typhoon*` → `ocr*`, UI label "Typhoon OCR Options" → "OCR Options", engineType mapping `tesseract` → `fast_path` → `auto`, removed dead `typhoon_ocr` check, fallback label "Tesseract" → "Fast Path (OCR)" |
| `OcrEngineSelector.tsx` | `isTyphoon` → `isAiPowered`, removed dead `typhoon_ocr` check |
| `SandboxTestArea.tsx` | UI labels "Typhoon OCR" → "np-dms-ocr" |
| `page.tsx` | Removed `name.includes('typhoon')` (4 places); changed `name.includes('ocr')` → `name.includes(OCR_MODEL_NAME)` (4 places) |
| `ocr-engine-selector.test.tsx` | Mock Tesseract → Fast Path (PyMuPDF), assertions updated |
| `OcrEngineSelector.test.tsx` | Mock Tesseract → Fast Path (PyMuPDF), assertions updated |
| `ocr-sandbox-prompt-manager.test.tsx` | Mock `engineType: 'typhoon_ocr'` → `'np_dms_ocr'` |
## Locked Rules
- **D28 (existing):** Canonical naming: `np-dms-ai` (LLM), `np-dms-ocr` (OCR), `fast-path` (PyMuPDF), now applied across every layer
- **Additional:** Frontend model-name normalization uses only the constants `OCR_MODEL_NAME` and `MAIN_MODEL_NAME`; hardcoded strings are not allowed
- **Additional:** The backend `OcrEngineType` enum contains only `FAST_PATH` and `NP_DMS_OCR`; `TESSERACT` and `TYPHOON_OCR` have been removed
## Verification
- [ ] `pnpm --filter backend exec tsc --noEmit`: not yet run
- [ ] `pnpm --filter lcbp3-frontend exec tsc --noEmit`: not yet run
- [ ] Backend unit tests: not yet run
- [ ] Frontend unit tests: not yet run
- [x] `grep_search` for `typhoon|Typhoon|TYPHOON` in frontend: only change-log comments remain
- [x] `grep_search` for `tesseract|Tesseract` in frontend: only change-log comments remain
- [x] `grep_search` for `typhoonOptions|OcrTyphoonOptions|QUEUE_TYPHOON_OCR|TyphoonOcrProcessor` in backend: no matches
@@ -0,0 +1,41 @@
<!-- File: specs/88-logs/session-2026-06-20-ocr-sidecar-refactor-adr.md -->
# Session — 2026-06-20 (OCR Sidecar Refactor & Server Consolidation ADRs)
## Summary
Created ADR-040 (OCR Sidecar Refactor) and ADR-041 (Server Consolidation) by reconciling two refactor plans (Claude + Qwen) against the canonical specs (AGENTS.md, CONTEXT.md, ADR-033/034/036), and updated the flagged ambiguities in CONTEXT.md
## Issues Found (Root Cause)
- Both refactor plans (Claude + Qwen) conflicted with resolved policies:
  - Deleting `vram_monitor.py` / `residency_policy.py` → violates Adaptive OCR Residency + CPU Fallback Retrieval
  - Forcing BGE + Reranker to stay GPU-resident → violates LLM-First GPU Ownership
  - A fixed `keep_alive` → violates ADR-036 Gap-2 (`keep_alive` is a lazy resource param)
- Cross-host trust gap: the sidecar runs on Desk-5439 while the backend runs on QNAP, so the "Docker internal isolation" claim is false
## Fix
| File | Change |
| ----- | ----------------- |
| `specs/06-Decision-Records/ADR-040-ocr-sidecar-refactor.md` | New ADR for the OCR sidecar refactor: preserves GPU policies; adds async I/O, path hardening, and network isolation (supersedes ADR-033 §7) |
| `specs/06-Decision-Records/ADR-041-server-consolidation.md` | New ADR for server consolidation: single Docker host, ASUSTOR as Primary NAS, QNAP as backup |
| `CONTEXT.md` | Added 2 resolved ambiguities: OCR Sidecar X-API-Key (network isolation only) and the cross-host trust gap (server consolidation) |
## Locked Rules
| ID | Decision | ADR |
| -- | -------- | --- |
| D21 | OCR Sidecar = Pure Compute Worker; orchestration/params live in existing backend services | ADR-040 D1 |
| D22 | Wire `calculate_ocr_residency()` into `process_ocr`; keep_alive is a lazy resource param (ADR-036 Gap-2) | ADR-040 D3 |
| D23 | Retain vram_monitor + CPU fallback for `/embed`, `/rerank`; never force GPU residency | ADR-040 D4 |
| D24 | Remove X-API-Key; auth = network isolation (supersedes ADR-033 §7) | ADR-040 D5 |
| D25 | Server Consolidation: co-locate all services on a single Docker host (Ryzen 5 5600 / 32GB / RTX 5060 Ti 16GB) | ADR-041 D1 |
| D26 | ASUSTOR (192.168.10.9) = Primary NAS, QNAP = Backup server | ADR-041 D2 |
| D27 | Docker-internal network only for sidecar/Ollama — enables ADR-040 D5 network-only auth | ADR-041 D3 |
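D22 and D23 describe decisions that are recomputed per request from live VRAM readings. The sketch below only illustrates the shape of such a policy. It is not the sidecar's `calculate_ocr_residency()`, and the thresholds, the 1 GiB reserve, and the helper names are hypothetical. The returned keep-alive strings follow Ollama's duration format, where `"0s"` asks Ollama to unload the model after the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VramSnapshot:
    free_mib: int     # current free VRAM reported by the monitor
    llm_loaded: bool  # np-dms-ai currently resident on the GPU


def choose_retrieval_device(snap: VramSnapshot, model_mib: int, reserve_mib: int = 1024) -> str:
    """D23-style CPU fallback: use the GPU only if the model fits and a reserve remains."""
    return "cuda" if snap.free_mib - model_mib >= reserve_mib else "cpu"


def ocr_keep_alive(snap: VramSnapshot, ocr_mib: int) -> str:
    """D22-style adaptive residency: yield the GPU to the LLM (LLM-first ownership)."""
    if snap.llm_loaded or snap.free_mib < 2 * ocr_mib:
        return "0s"  # unload right after this request
    return "5m"      # otherwise keep the OCR model warm for a while
```

Keeping both decisions as pure functions of a snapshot makes them easy to unit-test without a GPU.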
## Verification
- [ ] ADR-040 and ADR-041 reviewed and approved
- [ ] Implementation tasks in ADR-040/041 executed
- [ ] Server consolidation cutover completed
- [ ] X-API-Key removal completed after the consolidation cutover
@@ -0,0 +1,67 @@
# Session — 2026-06-20 (OCR Sidecar Refactor Phase 6-9 Implementation)
## Summary
Implemented Phases 6, 8, and 9 of the OCR Sidecar Refactor (Speckit-140) following ADR-040. Phase 6 refactored the sidecar to async I/O with lifespan context manager. Phase 8 removed the unused `/normalize` endpoint. Phase 9 polished Dockerfile, docker-compose.yml, created README.md, and validated quickstart.md. All 19 Python tests pass.
## Issues Found (Root Cause)
No bugs in this session; this was new feature work per ADR-040:
- **Sync I/O bottleneck**: `process_ocr` used the synchronous `httpx.Client`, which blocked the FastAPI event loop
- **Stale startup pattern**: `@app.on_event("startup")` is deprecated as of FastAPI 0.111+
- **Unused /normalize endpoint**: no consumers in the backend codebase
- **Stale Docker config**: `OCR_LANG` and `USE_GPU` are leftover Tesseract settings that are no longer used
- **Missing curl**: the Dockerfile lacked `curl`, so the HEALTHCHECK failed
## Fix
### Phase 6: Async I/O Performance (T041-T046)
| File | Change |
|------|----------------|
| `specs/04-Infrastructure-OPS/.../ocr-sidecar/app.py` | `process_ocr` → `async def`, `_process_pdf_doc` → `async def`, `/ocr` + `/ocr-upload` → `async def` |
| `app.py` | Added an `ollama_client` global (`httpx.AsyncClient`) created in the lifespan context manager |
| `app.py` | Replaced `@app.on_event("startup")` with `@asynccontextmanager lifespan` |
| `app.py` | Model loading via `asyncio.to_thread(_load_bge_models)` |
| `app.py` | Replaced `httpx.Client` with `await client.post()` (AsyncClient) |
| `tests/integration/ocr-sidecar/test_async_performance.py` | **New file**: 6 tests (coroutine check, lifespan, ollama_client global, /normalize removed, concurrent requests) |
| `tests/unit/ocr-sidecar/test_residency_wiring.py` | Updated: `FakeClient` → `FakeAsyncClient`, sync → `asyncio.run()` |
| `tests/integration/ocr-sidecar/test_parameter_governance.py` | Updated: async mock, patch `ollama_client` |
| `tests/integration/ocr-sidecar/test_active_prompt.py` | Updated: async mock, patch `ollama_client` |
### Phase 8: Remove /normalize (T054-T055)
| File | Change |
|------|----------------|
| `app.py` | Remove `NormalizeRequest`, `NormalizeResponse`, the `/normalize` endpoint, and `pythainlp` imports |
| `requirements.txt` | Remove `pythainlp==5.0.4` and `Pillow==10.0.0` |
| (grep verified) | No `/normalize` or `THAI_PREPROCESS_URL` consumers in the backend |
### Phase 9: Polish (T056-T063)
| File | Change |
|------|----------------|
| `Dockerfile` | Add `curl` for HEALTHCHECK; add a change log entry |
| `docker-compose.yml` | Remove `OCR_LANG`, `USE_GPU`; add `OCR_SIDECAR_API_KEY`, `OCR_ACTIVE_PROFILE` |
| `README.md` | **New file**: architecture, endpoints, env vars, deploy guide, test coverage |
| `quickstart.md` | Fix stale requirements (remove pythainlp/Pillow); rename `TYPHOON_OCR_MODEL` → `OCR_MODEL` |
| `tasks.md` | Mark T041-T046, T054-T063 as `[x]` |
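
Until the ADR-041 cutover allows D24 to remove X-API-Key, the `OCR_SIDECAR_API_KEY` variable added to docker-compose backs the Phase 1 check. The sketch below assumes two hypothetical helpers, `require_api_key` and `is_authorized`. The first fails fast with the same message `test_api_key_validation.py` expects. The second uses `hmac.compare_digest` for a constant-time comparison, a design choice assumed here rather than taken from the sidecar. Returning 401 would be the job of a FastAPI dependency or middleware, which is not shown.

```python
import hmac
import os
from typing import Mapping, Optional


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read OCR_SIDECAR_API_KEY once at startup; refuse to start without it."""
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OCR_SIDECAR_API_KEY is required")
    return key


def is_authorized(presented: Optional[str], expected: str) -> bool:
    """Check an X-API-Key header value; a missing header is always rejected."""
    if presented is None:
        return False
    # compare_digest avoids leaking, via timing, how many leading characters matched.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```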
## Locked Rules
- **Async I/O pattern**: `process_ocr` must be `async def` and use `httpx.AsyncClient` through the `ollama_client` global (created in lifespan)
- **Lifespan over startup event**: use `@asynccontextmanager lifespan` instead of `@app.on_event("startup")`, which is deprecated in FastAPI 0.111+
- **Model loading non-blocking**: load models with `asyncio.to_thread()` inside lifespan so the blocking load does not stall the event loop during startup
- **No /normalize**: the endpoint has been removed; it had no consumers in the backend
- **Test mock pattern**: use `FakeAsyncClient` (async `post()` + `aclose()`) instead of the sync `FakeClient` in every test that mocks the Ollama API
## Verification
- [x] `python -m pytest tests/ -v` → **19/19 tests passed** in 4.81s
- [x] `test_process_ocr_is_coroutine_function`: process_ocr is async ✅
- [x] `test_process_pdf_doc_is_coroutine_function`: _process_pdf_doc is async ✅
- [x] `test_app_uses_lifespan_not_startup_event`: uses lifespan, not on_event ✅
- [x] `test_app_has_async_client_global`: the ollama_client global exists ✅
- [x] `test_normalize_endpoint_removed`: /normalize has been removed ✅
- [x] `test_concurrent_ocr_requests_dont_block`: 3 concurrent requests succeed ✅
- [x] Existing tests (path traversal, API key, residency, CPU fallback, parameter governance, active prompt): all pass ✅
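
`test_concurrent_ocr_requests_dont_block` confirms that three concurrent calls all return `"ok"`, but it does not measure whether they actually overlap. A timing-based variant could look like the sketch below. `SlowFakeClient`, `fake_process_ocr`, and the `/v1/chat/completions` URL are hypothetical stand-ins; only the response shape is copied from the sidecar tests.

```python
import asyncio
import time
from json import loads


class SlowFakeClient:
    """Hypothetical fake: each post() takes `delay` seconds without blocking the event loop."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.calls = 0

    async def post(self, url: str, json: dict) -> dict:
        self.calls += 1
        await asyncio.sleep(self.delay)  # simulated Ollama latency
        return {"choices": [{"message": {"content": '{"natural_text": "ok"}'}}]}


async def fake_process_ocr(client: SlowFakeClient, page_num: int) -> str:
    """Stand-in for process_ocr: one awaited HTTP call per page."""
    reply = await client.post("/v1/chat/completions", json={"page": page_num})
    return loads(reply["choices"][0]["message"]["content"])["natural_text"]


async def run_pages(pages: int, delay: float) -> tuple[list[str], float, int]:
    client = SlowFakeClient(delay)
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_process_ocr(client, p) for p in range(1, pages + 1)))
    return list(results), time.perf_counter() - start, client.calls
```

If `post()` called the blocking `time.sleep` instead of `await asyncio.sleep`, three 0.2 s calls would take about 0.6 s, and an elapsed-time check would catch the regression.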
@@ -0,0 +1,59 @@
# Session — 2026-06-20 (OCR Sidecar Refactor Speckit Workflow)
## Summary
Executed complete `/speckit.prepare` workflow for OCR Sidecar Refactor (ADR-040). Generated spec.md, plan.md, research.md, data-model.md, contracts/sidecar-api.md, quickstart.md, and tasks.md. Performed consistency analysis and fixed all identified issues (1 CRITICAL, 2 MEDIUM, 2 LOW).
## Issues Found (Root Cause)
None. This was a planning/specification workflow, not a bug fix session.
## Fix
N/A - Specification generation workflow.
## Work Done in Session
| File | Change |
| ----- | ------------------ |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/spec.md` | Created feature specification with 5 user stories, 20 functional requirements, 8 success criteria |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/plan.md` | Created implementation plan with technical context, constitution check, and phase structure |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/research.md` | Created technical decisions documentation from ADR-040 |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/data-model.md` | Created data contracts and entity relationships |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/contracts/sidecar-api.md` | Created sidecar API specification |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/quickstart.md` | Created deployment and testing guide |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/tasks.md` | Created 63 implementation tasks organized by user story |
| `specs/100-Infrastructures/140-ocr-sidecar-refactor/checklists/requirements.md` | Created specification quality validation checklist |
## Analysis & Fixes
| Issue | Severity | Fix |
| ----- | -------- | --- |
| C1 - Constitution Check ADR-019 | CRITICAL | Updated plan.md to acknowledge ADR-019 applies to backend services (parameter resolution in OcrService/SandboxOcrEngineService) |
| U1 - Symlink resolution edge case | MEDIUM | Updated spec.md edge case to reference test T007 |
| U2 - Ollama unavailability edge case | MEDIUM | Updated spec.md edge case to note handled by FastAPI exception handling per ADR-007 |
| I1 - IP address inconsistency | LOW | Standardized IP to 192.168.10.100 in spec.md and plan.md |
| I2 - Task description clarity | LOW | Changed tasks T022/T023 from "Verify" to "Retain" |
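
The symlink edge case (U1) and the traversal tests in `test_path_traversal.py` both reduce to canonicalizing the path before comparing it with the upload base. The following minimal sketch assumes a hypothetical `resolve_inside_base` helper whose `PermissionError` the endpoint would map to HTTP 403. It requires Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside_base(candidate: str, upload_base: Path) -> Path:
    """Canonicalize `candidate` and require the result to stay under `upload_base`.

    Path.resolve() collapses '..' segments and follows symlinks, so a link inside
    the upload directory that points elsewhere is judged by its real target.
    """
    base = upload_base.resolve()
    target = Path(candidate).resolve()
    # is_relative_to compares path components, so ".../uploads_evil" is not
    # mistaken for a child of ".../uploads" the way str.startswith would be.
    if not target.is_relative_to(base):
        raise PermissionError(f"{candidate!r} is outside the upload base")
    return target
```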
## Locked Rules
- OCR sidecar is a pure compute worker (no DB/storage access per ADR-023/023A)
- Backend services handle all parameter governance (ai_execution_profiles, ai_prompts)
- Adaptive OCR Residency must be preserved (vram_monitor.py, residency_policy.py retained)
- CPU fallback for BGE-M3/FlagReranker must be preserved
- Phase 2 (X-API-Key removal) is BLOCKED until ADR-041 consolidation completes
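
The governance split in these rules (the backend resolves parameters, the sidecar owns residency) can be sketched as a payload builder. The `repeat_penalty` → `repetition_penalty` rename and the `keep_alive must be calculated` error text mirror what `test_parameter_governance.py` and `test_residency_wiring.py` assert. The function name, the mapping table, and silently dropping unknown keys are assumptions.

```python
from typing import Any, Mapping, Optional

# Backend-facing names -> names sent to Ollama (hypothetical table; the
# repeat_penalty rename mirrors test_parameter_governance.py).
_PARAM_MAP = {
    "temperature": "temperature",
    "top_p": "top_p",
    "repeat_penalty": "repetition_penalty",
    "max_tokens": "max_tokens",
}


def build_ocr_payload(
    model: str,
    messages: list,
    keep_alive_seconds: int,
    runtime_params: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Merge governed runtime params into the request; keep_alive comes only from residency."""
    params = dict(runtime_params or {})
    if "keep_alive" in params:
        # Residency is the sidecar's decision (ADR-040 D3); the backend may not override it.
        raise ValueError("keep_alive must be calculated by the sidecar residency policy")
    payload: dict = {"model": model, "messages": messages, "keep_alive": keep_alive_seconds}
    for backend_name, ollama_name in _PARAM_MAP.items():
        if backend_name in params:
            payload[ollama_name] = params[backend_name]
    # Keys not listed in _PARAM_MAP are dropped rather than forwarded.
    return payload
```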
## Verification
- [x] All 5 user stories have acceptance criteria
- [x] All 20 functional requirements have task coverage (100%)
- [x] Constitution check passes with proper ADR-019 acknowledgment
- [x] No ambiguities or duplications found
- [x] All 5 analysis issues fixed
- [x] Ready for `/speckit-implement`
## Next Steps
- Execute `/speckit-implement` to begin implementation
- Start with MVP (User Stories 1-2: Security Hardening + GPU Resource Management)
- User Story 5 (Network Isolation Auth Phase 2) remains BLOCKED until ADR-041 consolidation
@@ -0,0 +1,96 @@
# File: tests/integration/ocr-sidecar/test_active_prompt.py
# Change Log:
# - 2026-06-20: Initial creation for US3 active prompt integration tests.
import sys
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch
from fastapi.testclient import TestClient
UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))
from test_path_traversal import FakeDocument, load_app


class FakeAsyncResponse:
    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": "{\"natural_text\": \"prompt result\"}"}}]}


class FakeAsyncClient:
    last_payload = None

    def __init__(self, *args, **kwargs) -> None:
        pass

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


def test_ocr_injects_system_prompt_and_dms_tags(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    pdf_path = upload_base / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n")
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    decision = SimpleNamespace(keep_alive_seconds=120, reason="headroom-sufficient", vram_headroom_mb=9000.0)
    fake_client = FakeAsyncClient()
    FakeAsyncClient.last_payload = None
    # Prepare dummy message structure
    initial_messages = [{"role": "user", "content": [{"type": "text", "text": "OCR Page content"}]}]
    with patch.object(app_module, "calculate_ocr_residency", return_value=decision), \
            patch.object(app_module, "prepare_ocr_messages", return_value=initial_messages), \
            patch.object(app_module.fitz, "open", return_value=FakeDocument()), \
            patch.object(app_module, "ollama_client", fake_client):
        response = client.post(
            "/ocr",
            json={
                "pdfPath": str(pdf_path),
                "engine": "np-dms-ocr",
                "system_prompt": "Custom system instruction",
                "dms_tags": {
                    "document_number": "true",
                    "document_date": "true"
                },
                "runtime_params": {
                    "temperature": 0.1,
                    "top_p": 0.5,
                    "repeat_penalty": 1.0,
                    "max_tokens": 4096
                }
            },
            headers={"X-API-Key": "test-key"}
        )
    assert response.status_code == 200
    # Verify the message content in the last payload sent to Ollama
    sent_messages = FakeAsyncClient.last_payload["messages"]
    # We expect system_prompt to be appended to messages[0]["content"]
    content_list = sent_messages[0]["content"]
    # Verify the system prompt exists
    system_prompt_found = any(
        c.get("type") == "text" and c.get("text") == "Custom system instruction"
        for c in content_list
    )
    assert system_prompt_found, "System prompt was not injected into message content"
    # Verify the DMS tags instruction exists; default to "" so a text part without "text" cannot raise TypeError
    dms_tags_instruction = any(
        c.get("type") == "text"
        and "<document_number>" in c.get("text", "")
        and "<document_date>" in c.get("text", "")
        for c in content_list
    )
    assert dms_tags_instruction, "DMS tags instructions were not injected correctly"
@@ -0,0 +1,129 @@
# File: tests/integration/ocr-sidecar/test_async_performance.py
# Change Log:
# - 2026-06-20: Added ADR-040 US4 async I/O performance tests for process_ocr and lifespan.
import asyncio
import inspect
import sys
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch

UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))
from test_path_traversal import load_app


class FakeAsyncResponse:
    """Simulated httpx.AsyncClient response."""

    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": '{"natural_text": "ok"}'}}]}


class FakeAsyncClient:
    """Simulated httpx.AsyncClient for the async process_ocr."""

    def __init__(self, *args, **kwargs) -> None:
        self.payload = None
        FakeAsyncClient.last_payload = None

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        self.payload = json
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


FakeAsyncClient.last_payload = None


def test_process_ocr_is_coroutine_function(tmp_path: Path) -> None:
    """T042: process_ocr must be an async def (coroutine function)."""
    app_module = load_app(tmp_path)
    assert inspect.iscoroutinefunction(app_module.process_ocr), (
        "process_ocr must be async def per ADR-040 US4"
    )


def test_process_pdf_doc_is_coroutine_function(tmp_path: Path) -> None:
    """T042: _process_pdf_doc must be an async def because it awaits process_ocr."""
    app_module = load_app(tmp_path)
    assert inspect.iscoroutinefunction(app_module._process_pdf_doc), (
        "_process_pdf_doc must be async def per ADR-040 US4"
    )


def test_app_uses_lifespan_not_startup_event(tmp_path: Path) -> None:
    """T045: the app must use a lifespan context manager, not @app.on_event('startup')."""
    app_module = load_app(tmp_path)
    app_obj = app_module.app
    # FastAPI stores the lifespan in app.router.lifespan_context
    assert hasattr(app_obj.router, "lifespan_context"), (
        "App must use lifespan parameter, not @app.on_event('startup')"
    )
    # Verify that no legacy startup event handlers are registered
    startup_handlers = app_obj.router.on_startup
    assert len(startup_handlers) == 0, (
        "App must not register @app.on_event('startup') handlers"
    )


def test_app_has_async_client_global(tmp_path: Path) -> None:
    """T043: the app module must expose an ollama_client global for the shared AsyncClient."""
    app_module = load_app(tmp_path)
    assert hasattr(app_module, "ollama_client"), (
        "app module must have ollama_client global for shared AsyncClient"
    )


def test_normalize_endpoint_removed(tmp_path: Path) -> None:
    """T054: the /normalize endpoint must have been removed."""
    app_module = load_app(tmp_path)
    routes = [r.path for r in app_module.app.routes]
    assert "/normalize" not in routes, (
        "/normalize endpoint must be removed per ADR-040 D2"
    )


def test_concurrent_ocr_requests_dont_block(tmp_path: Path) -> None:
    """T041: concurrent OCR requests must not block each other (async I/O)."""
    app_module = load_app(tmp_path)
    decision = SimpleNamespace(
        keep_alive_seconds=60,
        reason="headroom-sufficient",
        vram_headroom_mb=9000.0,
    )
    fake_client = FakeAsyncClient()

    async def run_concurrent() -> list[str]:
        """Run process_ocr three times concurrently via asyncio.gather."""
        with (
            patch.object(app_module, "calculate_ocr_residency", return_value=decision),
            patch.object(app_module, "prepare_ocr_messages", return_value=[{"content": []}]),
            patch.object(app_module, "ollama_client", fake_client),
        ):
            tasks = [
                app_module.process_ocr("/tmp/test.pdf", page_num=i + 1)
                for i in range(3)
            ]
            results = await asyncio.gather(*tasks)
        return results

    results = asyncio.run(run_concurrent())
    assert len(results) == 3
    assert all(r == "ok" for r in results)
    # The fake client must have received a payload carrying the computed keep_alive
    assert FakeAsyncClient.last_payload is not None
    assert FakeAsyncClient.last_payload["keep_alive"] == 60
@@ -0,0 +1,49 @@
# File: tests/integration/ocr-sidecar/test_cpu_fallback.py
# Change Log:
# - 2026-06-20: Added ADR-040 CPU fallback integration coverage for retrieval endpoints.
import sys
from pathlib import Path
from unittest.mock import MagicMock, patch

from fastapi.testclient import TestClient

UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))
from test_path_traversal import load_app


def test_embed_uses_cpu_when_vram_headroom_is_low(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    bge_model = MagicMock()
    bge_model.encode.return_value = {
        "dense_vecs": [[0.1, 0.2]],
        "lexical_weights": [{"101": 0.5}],
    }
    headroom = MagicMock(total_mb=16384.0, used_mb=15000.0, available_mb=1000.0, query_success=True)
    with patch.object(app_module, "bge_model", bge_model), \
            patch.object(app_module, "get_vram_headroom", return_value=headroom):
        response = client.post("/embed", json={"text": "hello"}, headers={"X-API-Key": "test-key"})
    assert response.status_code == 200
    assert response.json()["device"] == "cpu"
    bge_model.model.to.assert_called_with("cpu")


def test_rerank_uses_cpu_when_vram_headroom_is_low(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    reranker = MagicMock()
    reranker.compute_score.return_value = [0.9]
    headroom = MagicMock(total_mb=16384.0, used_mb=15000.0, available_mb=1000.0, query_success=True)
    with patch.object(app_module, "reranker", reranker), \
            patch.object(app_module, "get_vram_headroom", return_value=headroom):
        response = client.post(
            "/rerank",
            json={"query": "q", "chunks": ["chunk"]},
            headers={"X-API-Key": "test-key"},
        )
    assert response.status_code == 200
    assert response.json()["device"] == "cpu"
    reranker.model.to.assert_called_with("cpu")
@@ -0,0 +1,81 @@
# File: tests/integration/ocr-sidecar/test_parameter_governance.py
# Change Log:
# - 2026-06-20: Initial creation for US3 parameter governance integration tests.
import sys
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch
from fastapi.testclient import TestClient
UNIT_DIR = Path(__file__).resolve().parents[2] / "unit" / "ocr-sidecar"
if str(UNIT_DIR) not in sys.path:
    sys.path.insert(0, str(UNIT_DIR))
from test_path_traversal import FakeDocument, load_app


class FakeAsyncResponse:
    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": "{\"natural_text\": \"governed result\"}"}}]}


class FakeAsyncClient:
    last_payload = None

    def __init__(self, *args, **kwargs) -> None:
        pass

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


def test_ocr_uses_governed_runtime_parameters(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    pdf_path = upload_base / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n")
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    decision = SimpleNamespace(keep_alive_seconds=120, reason="headroom-sufficient", vram_headroom_mb=9000.0)
    fake_client = FakeAsyncClient()
    FakeAsyncClient.last_payload = None
    with patch.object(app_module, "calculate_ocr_residency", return_value=decision), \
            patch.object(app_module, "prepare_ocr_messages", return_value=[{"content": []}]), \
            patch.object(app_module.fitz, "open", return_value=FakeDocument()), \
            patch.object(app_module, "ollama_client", fake_client):
        response = client.post(
            "/ocr",
            json={
                "pdfPath": str(pdf_path),
                "engine": "np-dms-ocr",
                "runtime_params": {
                    "temperature": 0.7,
                    "top_p": 0.9,
                    "repeat_penalty": 1.1,
                    "max_tokens": 4096
                }
            },
            headers={"X-API-Key": "test-key"}
        )
    assert response.status_code == 200
    assert response.json()["text"] == "governed result"
    # Check that parameters were passed to Ollama payload
    assert FakeAsyncClient.last_payload["temperature"] == 0.7
    assert FakeAsyncClient.last_payload["top_p"] == 0.9
    assert FakeAsyncClient.last_payload["repetition_penalty"] == 1.1
    assert FakeAsyncClient.last_payload["max_tokens"] == 4096
@@ -0,0 +1,42 @@
# File: tests/unit/ocr-sidecar/test_api_key_validation.py
# Change Log:
# - 2026-06-20: Added ADR-040 API key startup and request validation tests.
import importlib
import sys
from pathlib import Path

import pytest
from fastapi.testclient import TestClient

from test_path_traversal import SIDECAR_DIR, install_import_stubs, load_app


def test_sidecar_fails_fast_when_api_key_missing(monkeypatch: pytest.MonkeyPatch, tmp_path: Path) -> None:
    install_import_stubs()
    monkeypatch.delenv("OCR_SIDECAR_API_KEY", raising=False)
    monkeypatch.setenv("OCR_SIDECAR_UPLOAD_BASE", str(tmp_path))
    if str(SIDECAR_DIR) not in sys.path:
        sys.path.insert(0, str(SIDECAR_DIR))
    # Drop any cached import so the module-level API key check runs again
    sys.modules.pop("app", None)
    with pytest.raises(RuntimeError, match="OCR_SIDECAR_API_KEY is required"):
        importlib.import_module("app")


def test_sidecar_rejects_invalid_api_key(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    response = client.post(
        "/embed",
        json={"text": "hello"},
        headers={"X-API-Key": "wrong-key"},
    )
    assert response.status_code == 401


def test_sidecar_rejects_missing_api_key(tmp_path: Path) -> None:
    app_module = load_app(tmp_path)
    client = TestClient(app_module.app)
    response = client.post("/embed", json={"text": "hello"})
    assert response.status_code == 401
@@ -0,0 +1,114 @@
# File: tests/unit/ocr-sidecar/test_path_traversal.py
# Change Log:
# - 2026-06-20: Added ADR-040 path traversal tests for OCR sidecar.
import importlib
import os
import sys
import types
from pathlib import Path
from unittest.mock import patch
from fastapi.testclient import TestClient
SIDECAR_DIR = Path(__file__).resolve().parents[3] / "specs" / "04-Infrastructure-OPS" / "04-00-docker-compose" / "Desk-5439" / "ocr-sidecar"


def install_import_stubs() -> None:
    """Install stubs for heavy dependencies so unit tests can import app quickly."""
    fitz_module = types.ModuleType("fitz")
    fitz_module.Document = object
    fitz_module.open = lambda *args, **kwargs: None
    sys.modules["fitz"] = fitz_module

    typhoon_module = types.ModuleType("typhoon_ocr")
    typhoon_module.prepare_ocr_messages = lambda *args, **kwargs: [{"content": []}]
    sys.modules["typhoon_ocr"] = typhoon_module

    flag_module = types.ModuleType("FlagEmbedding")
    flag_module.BGEM3FlagModel = lambda *args, **kwargs: None
    flag_module.FlagReranker = lambda *args, **kwargs: None
    sys.modules["FlagEmbedding"] = flag_module

    pil_module = types.ModuleType("PIL")
    pil_image_module = types.ModuleType("PIL.Image")
    pil_module.Image = pil_image_module
    sys.modules["PIL"] = pil_module
    sys.modules["PIL.Image"] = pil_image_module

    pythainlp_module = types.ModuleType("pythainlp")
    tokenize_module = types.ModuleType("pythainlp.tokenize")
    tokenize_module.word_tokenize = lambda text, **kwargs: text.split()
    util_module = types.ModuleType("pythainlp.util")
    util_module.normalize = lambda text: text
    sys.modules["pythainlp"] = pythainlp_module
    sys.modules["pythainlp.tokenize"] = tokenize_module
    sys.modules["pythainlp.util"] = util_module


def load_app(upload_base: Path):
    install_import_stubs()
    os.environ["OCR_SIDECAR_API_KEY"] = "test-key"
    os.environ["OCR_SIDECAR_UPLOAD_BASE"] = str(upload_base)
    if str(SIDECAR_DIR) not in sys.path:
        sys.path.insert(0, str(SIDECAR_DIR))
    # Force a fresh import so module-level config picks up the env vars set above
    sys.modules.pop("app", None)
    return importlib.import_module("app")


class FakePage:
    def get_text(self) -> str:
        return "A" * 120


class FakeDocument:
    name = "fake.pdf"

    def __len__(self) -> int:
        return 1

    def __getitem__(self, index: int) -> FakePage:
        return FakePage()


def test_ocr_rejects_parent_traversal_outside_upload_base(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    outside_path = upload_base / ".." / "outside.pdf"
    response = client.post(
        "/ocr",
        json={"pdfPath": str(outside_path)},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 403


def test_ocr_rejects_prefix_sibling_path(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    sibling = tmp_path / "uploads_evil"
    upload_base.mkdir()
    sibling.mkdir()
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    response = client.post(
        "/ocr",
        json={"pdfPath": str(sibling / "document.pdf")},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 403


def test_ocr_accepts_canonical_path_inside_upload_base(tmp_path: Path) -> None:
    upload_base = tmp_path / "uploads"
    upload_base.mkdir()
    pdf_path = upload_base / "document.pdf"
    pdf_path.write_bytes(b"%PDF-1.4\n")
    app_module = load_app(upload_base)
    client = TestClient(app_module.app)
    with patch.object(app_module.fitz, "open", return_value=FakeDocument()):
        response = client.post(
            "/ocr",
            json={"pdfPath": str(pdf_path)},
            headers={"X-API-Key": "test-key"},
        )
    assert response.status_code == 200
    assert response.json()["engineUsed"] == "fast-path"
@@ -0,0 +1,81 @@
# File: tests/unit/ocr-sidecar/test_residency_wiring.py
# Change Log:
# - 2026-06-20: Added ADR-040 residency wiring tests for process_ocr.
# - 2026-06-20: Updated for async process_ocr (Phase 6 — async I/O refactor).
import asyncio
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import patch

import pytest

from test_path_traversal import load_app


class FakeAsyncResponse:
    """Simulated httpx.AsyncClient response for the async process_ocr."""

    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict:
        return {"choices": [{"message": {"content": "{\"natural_text\": \"ok\"}"}}]}


class FakeAsyncClient:
    """Simulated httpx.AsyncClient for the async process_ocr."""

    def __init__(self, *args, **kwargs) -> None:
        self.payload = None
        FakeAsyncClient.last_payload = None

    async def post(self, url: str, json: dict, headers: dict) -> FakeAsyncResponse:
        self.payload = json
        FakeAsyncClient.last_payload = json
        return FakeAsyncResponse()

    async def aclose(self) -> None:
        pass


FakeAsyncClient.last_payload = None


def test_process_ocr_uses_calculated_residency_keep_alive(tmp_path: Path) -> None:
    """T019: process_ocr must call calculate_ocr_residency and use the computed keep_alive."""
    app_module = load_app(tmp_path)
    decision = SimpleNamespace(keep_alive_seconds=120, reason="headroom-sufficient", vram_headroom_mb=9000.0)
    fake_client = FakeAsyncClient()
    with patch.object(app_module, "calculate_ocr_residency", return_value=decision) as calculate, \
            patch.object(app_module, "prepare_ocr_messages", return_value=[{"content": []}]), \
            patch.object(app_module, "ollama_client", fake_client):
        result = asyncio.run(app_module.process_ocr("/tmp/test.pdf", page_num=1))
    assert result == "ok"
    calculate.assert_called_once_with(app_module.OCR_ACTIVE_PROFILE)
    assert FakeAsyncClient.last_payload["keep_alive"] == 120


def test_process_ocr_rejects_backend_keep_alive_override(tmp_path: Path) -> None:
    """T021: process_ocr must reject a keep_alive supplied by the backend."""
    app_module = load_app(tmp_path)

    async def run_test() -> None:
        with pytest.raises(ValueError, match="keep_alive must be calculated"):
            await app_module.process_ocr("/tmp/test.pdf", options_override={"keep_alive": 0})

    asyncio.run(run_test())


def test_ocr_endpoint_rejects_keep_alive_override(tmp_path: Path) -> None:
    """T021: the /ocr endpoint must reject keep_alive in the request body."""
    app_module = load_app(tmp_path)
    from fastapi.testclient import TestClient
    client = TestClient(app_module.app)
    response = client.post(
        "/ocr",
        json={"pdfPath": str(tmp_path / "document.pdf"), "keep_alive": 0},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 400