690517:1449 204 and 302 refactor #03

2026-05-17 14:49:45 +07:00
parent 544bb30277
commit 50bffdf38a
53 changed files with 4026 additions and 617 deletions
@@ -1,30 +1,34 @@
-
-# ADR-020 AI Integration Architecture
+# ADR-023/023A AI Integration Architecture

 ## CRITICAL RULES

-- **ALWAYS** follow ADR-018 AI boundary policy (isolation on Admin Desktop)
-- **ALWAYS** use RFA-First approach for AI implementation
+- **ALWAYS** follow ADR-023 AI boundary policy (isolation on Admin Desktop)
+- **ALWAYS** use ADR-023A 2-model stack (gemma4:e4b Q8_0 + nomic-embed-text)
+- **ALWAYS** use BullMQ 2-queue (ai-realtime + ai-batch) for GPU overload prevention
 - **NEVER** allow AI direct database/storage access
 - **ALWAYS** implement human-in-the-loop validation
 - **NEVER** send sensitive data to cloud AI services
+- **ALWAYS** enforce Qdrant projectPublicId filter (compile-time enforcement)
+- **NEVER** allow n8n to call Ollama/Qdrant directly (must go through DMS API → BullMQ)

 ## AI Integration Patterns

 ### Architecture Overview

 ```
-Frontend → AI Gateway API → Admin Desktop (Ollama) → Backend Validation
+Frontend → AI Gateway API → BullMQ → Admin Desktop (Ollama) → Backend Validation
+n8n (Migration) → DMS API → BullMQ → Admin Desktop (Ollama) → Backend Validation
 ```

 ### Key Components

-| Component | Location | Purpose |
-|-----------|----------|---------|
-| **AI Gateway** | Backend (NestJS) | API endpoints, validation, audit logging |
-| **Ollama Engine** | Admin Desktop (Desk-5439) | LLM inference (Gemma 4) |
-| **OCR Engine** | Admin Desktop (Desk-5439) | Thai/English text extraction |
-| **Orchestrator** | QNAP NAS (n8n) | Workflow management |
+| Component         | Location                  | Purpose                                                                  |
+| ----------------- | ------------------------- | ------------------------------------------------------------------------ |
+| **AI Gateway**    | Backend (NestJS)          | API endpoints, validation, audit logging                                 |
+| **BullMQ Queues** | Backend (NestJS)          | ai-realtime (RAG/Suggest), ai-batch (OCR/Extract/Embed)                  |
+| **Ollama Engine** | Admin Desktop (Desk-5439) | gemma4:e4b Q8_0 (LLM) + nomic-embed-text (Embedding)                     |
+| **OCR Engine**    | Admin Desktop (Desk-5439) | PaddleOCR + PyThaiNLP (Thai/English text extraction)                     |
+| **Orchestrator**  | QNAP NAS (n8n)            | Migration Phase orchestrator only (calls DMS API, never Ollama directly) |

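The queue split in the Key Components table (ai-realtime for RAG/Suggest, ai-batch for OCR/Extract/Embed) can be pictured as a small routing table. This is a minimal sketch, not the commit's implementation: the job-type identifiers and the `resolveQueue` helper are hypothetical names chosen for illustration, and only the two queue names come from the document.

```typescript
// Queue names are taken from the document; the job-type identifiers below
// are assumptions (the source lists categories, not identifiers).
type AiQueueName = 'ai-realtime' | 'ai-batch';
type AiJobType = 'rag' | 'suggest' | 'ocr' | 'extract' | 'embed';

// Interactive jobs go to ai-realtime; long-running jobs go to ai-batch.
const QUEUE_BY_JOB_TYPE: Record<AiJobType, AiQueueName> = {
  rag: 'ai-realtime',
  suggest: 'ai-realtime',
  ocr: 'ai-batch',
  extract: 'ai-batch',
  embed: 'ai-batch',
};

// Hypothetical helper: returns the queue a job of the given type belongs to.
function resolveQueue(jobType: AiJobType): AiQueueName {
  return QUEUE_BY_JOB_TYPE[jobType];
}
```

Because `QUEUE_BY_JOB_TYPE` is typed as `Record<AiJobType, AiQueueName>`, adding a new job type without assigning it a queue fails type checking, so the routing table cannot silently fall out of date.
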
 ## Backend Implementation (NestJS)

@@ -32,24 +36,50 @@ Frontend → AI Gateway API → Admin Desktop (Ollama) → Backend Validation
 // AI Module with boundary enforcement
 @Module({
  controllers: [AiController],
-  providers: [AiService, AiGateway],
+  providers: [AiService, AiGateway, QdrantService],
  exports: [AiService],
 })
 export class AiModule {
  constructor() {
-    // Enforce ADR-018 boundaries
+    // Enforce ADR-023 boundaries
  }
 }

+// QdrantService with compile-time projectPublicId enforcement
+@Injectable()
+export class QdrantService {
+  async search(
+    projectPublicId: string,   // required — compile-time enforcement
+    vector: number[],
+    topK: number = 5,
+  ): Promise<QdrantSearchResult[]> {
+    return this.client.search('documents', {
+      vector,
+      limit: topK,
+      filter: {
+        must: [{ key: 'project_public_id', match: { value: projectPublicId } }],
+      },
+    });
+  }
+
+  async upsert(
+    projectPublicId: string,   // required
+    chunks: DocumentChunk[],
+  ): Promise<void> { ... }
+
+  // ❌ NEVER expose rawSearch() or any other method that omits the projectPublicId filter
+}
+
 // AI Service with validation
 @Injectable()
 export class AiService {
  async extractMetadata(documentId: string): Promise<AIMetadata> {
    // 1. Validate permissions
-    // 2. Send to Admin Desktop AI
-    // 3. Validate AI response
-    // 4. Log audit trail
-    // 5. Return validated results
+    // 2. Queue job to BullMQ (ai-batch or ai-realtime)
+    // 3. Worker sends to Admin Desktop AI (gemma4:e4b Q8_0)
+    // 4. Validate AI response
+    // 5. Log audit trail to ai_audit_logs
+    // 6. Return validated results
  }
 }
 ```
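
A required `projectPublicId: string` parameter stops callers from omitting the argument, but the type checker still accepts an empty string. The sketch below shows one way to add a runtime guard next to the compile-time check. The `buildProjectFilter` helper and the `ProjectFilter` type are hypothetical and not part of the commit; the filter shape is copied from `QdrantService.search` above.

```typescript
// Filter shape as used in QdrantService.search in the diff above.
interface ProjectFilter {
  must: Array<{ key: 'project_public_id'; match: { value: string } }>;
}

// Hypothetical helper (an assumption, not in the commit): builds the
// tenant filter and rejects blank ids that the type system cannot catch.
function buildProjectFilter(projectPublicId: string): ProjectFilter {
  const trimmed = projectPublicId.trim();
  if (trimmed.length === 0) {
    throw new Error('projectPublicId must be a non-empty string');
  }
  return {
    must: [{ key: 'project_public_id', match: { value: trimmed } }],
  };
}
```

If every search and upsert path builds its filter through a single helper like this, an unfiltered query has no code path to come from, which is the intent behind the "never expose rawSearch()" rule.
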
@@ -74,24 +104,37 @@ const DocumentReviewForm = ({ document, aiSuggestions }) => {

 ## Security Requirements

-- **AI Isolation:** All AI processing on Admin Desktop only
+- **AI Isolation:** All AI processing on Admin Desktop only (Desk-5439)
 - **Data Privacy:** No cloud AI services, on-premises only
-- **Audit Trail:** Log all AI interactions and human validations
+- **Audit Trail:** Log all AI interactions and human validations to ai_audit_logs
 - **Rate Limiting:** Prevent AI abuse and resource exhaustion
 - **Validation:** All AI outputs must be validated before use
+- **Multi-tenant Isolation:** Qdrant queries MUST include projectPublicId filter (compile-time enforcement)
+- **n8n Boundary:** n8n MUST call DMS API → BullMQ, NEVER Ollama/Qdrant directly
+- **GPU Overload Prevention:** BullMQ 2-queue (ai-realtime + ai-batch) with concurrency=1
+
+## ADR-023A Specific Rules
+
+- **2-Model Stack:** gemma4:e4b Q8_0 (~4.0GB) + nomic-embed-text (~0.3GB) = ~4.3GB VRAM peak
+- **PDF 3-Page Limit:** Classification/Tagging reads the first 3 pages only (the limit does NOT apply to RAG embedding)
+- **RAG Embedding:** Full document, chunked at 512 tokens with a 64-token overlap
+- **OCR Auto-Detect:** If PyMuPDF extracts more than 100 chars → fast path, else PaddleOCR
+- **Embed Auto-Trigger:** Runs automatically after commit (in parallel); DB search covers the gap until embedding finishes
+- **Threshold Recalibration:** After 100-500 docs, based on ai_audit_logs analysis

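Two of the ADR-023A rules above are simple enough to sketch directly: the 512/64 chunking and the OCR path choice. The following is a minimal illustration under stated assumptions, not the project's code. `chunkWindows` and `chooseOcrPath` are hypothetical names, the tokenizer is left out (the function only works on a token count), and whether the 100-character threshold applies per page or per document is not stated in the source.

```typescript
// Half-open token range [start, end) covered by one chunk.
interface ChunkWindow {
  start: number;
  end: number;
}

// Hypothetical helper: sliding windows of `chunkSize` tokens, where each
// window repeats the last `overlap` tokens of the previous one.
function chunkWindows(tokenCount: number, chunkSize = 512, overlap = 64): ChunkWindow[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError('require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const step = chunkSize - overlap; // 448 with the documented defaults
  const windows: ChunkWindow[] = [];
  for (let start = 0; start < tokenCount; start += step) {
    const end = Math.min(start + chunkSize, tokenCount);
    windows.push({ start, end });
    if (end === tokenCount) break; // final window already reaches the end
  }
  return windows;
}

type OcrPath = 'fast' | 'paddleocr';

// Hypothetical helper: more than `threshold` extracted characters means the
// PDF has a usable text layer (fast path); otherwise fall back to PaddleOCR.
function chooseOcrPath(extractedCharCount: number, threshold = 100): OcrPath {
  return extractedCharCount > threshold ? 'fast' : 'paddleocr';
}
```

With the defaults, a 1000-token document yields three windows: `[0, 512)`, `[448, 960)` and `[896, 1000)`. The comparison is strict, so exactly 100 extracted characters still selects PaddleOCR.
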
 ## Required Implementation

-- [ ] AiModule with ADR-018 boundary enforcement
+- [ ] AiModule with ADR-023 boundary enforcement
 - [ ] AI Gateway API endpoints with validation
+- [ ] BullMQ 2-queue setup (ai-realtime + ai-batch)
+- [ ] QdrantService with projectPublicId enforcement
 - [ ] DocumentReviewForm reusable component
-- [ ] Admin Desktop Ollama + PaddleOCR setup
-- [ ] n8n workflow orchestration
-- [ ] AI audit logging and monitoring
+- [ ] Admin Desktop Ollama (gemma4:e4b Q8_0 + nomic-embed-text) + PaddleOCR setup
+- [ ] n8n workflow orchestration (Migration Phase only)
+- [ ] AI audit logging and monitoring (ai_audit_logs)
 - [ ] Human-in-the-loop validation workflows

 ## Related Documents

-- `specs/06-Decision-Records/ADR-018-ai-boundary.md`
-- `specs/06-Decision-Records/ADR-020-ai-intelligence-integration.md`
-- `specs/06-Decision-Records/ADR-017-ollama-data-migration.md`
+- `specs/06-Decision-Records/ADR-023-unified-ai-architecture.md` (Base architecture)
+- `specs/06-Decision-Records/ADR-023A-unified-ai-architecture.md` (Model revision - current)