ADR-023/023A AI Integration Architecture
CRITICAL RULES
- ALWAYS follow ADR-023 AI boundary policy (isolation on Admin Desktop)
- ALWAYS use ADR-023A 2-model stack (gemma4:e2b + nomic-embed-text)
- ALWAYS use BullMQ 2-queue (ai-realtime + ai-batch) for GPU overload prevention
- NEVER allow AI direct database/storage access
- ALWAYS implement human-in-the-loop validation
- NEVER send sensitive data to cloud AI services
- ALWAYS enforce Qdrant projectPublicId filter (compile-time enforcement)
- NEVER allow n8n to call Ollama/Qdrant directly (must go through DMS API → BullMQ)
AI Integration Patterns
Architecture Overview
Key Components
| Component | Location | Purpose |
| --- | --- | --- |
| AI Gateway | Backend (NestJS) | API endpoints, validation, audit logging |
| BullMQ Queues | Backend (NestJS) | ai-realtime (RAG/Suggest), ai-batch (OCR/Extract/Embed) |
| Ollama Engine | Admin Desktop (Desk-5439) | gemma4:e2b (LLM) + nomic-embed-text (Embedding) |
| OCR Engine | Admin Desktop (Desk-5439) | PaddleOCR + PyThaiNLP (Thai/English text extraction) |
| Orchestrator | QNAP NAS (n8n) | Migration Phase orchestrator only (calls DMS API, never Ollama directly) |
Backend Implementation (NestJS)
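A minimal sketch of how the gateway might route job types onto the two queues. The job-type names (`rag`, `suggest`, `ocr`, `extract`, `embed`) and the helper `queueFor` are assumptions derived from the queue descriptions in the Key Components table, not names taken from the ADRs. The BullMQ wiring itself is left out; only the routing decision is shown.

```typescript
// Hypothetical job-type names, inferred from the Key Components table.
type RealtimeJob = 'rag' | 'suggest';
type BatchJob = 'ocr' | 'extract' | 'embed';
type AiJobType = RealtimeJob | BatchJob;

type AiQueueName = 'ai-realtime' | 'ai-batch';

// Record<AiJobType, ...> forces every job type to be mapped to a queue;
// adding a new job type without a mapping is a compile error.
const QUEUE_BY_JOB: Record<AiJobType, AiQueueName> = {
  rag: 'ai-realtime',
  suggest: 'ai-realtime',
  ocr: 'ai-batch',
  extract: 'ai-batch',
  embed: 'ai-batch',
};

// Per the rules in this document, each queue's worker runs with concurrency 1.
const WORKER_CONCURRENCY = 1;

function queueFor(jobType: AiJobType): AiQueueName {
  return QUEUE_BY_JOB[jobType];
}
```

In a NestJS service, the returned name would select which BullMQ queue receives the job, and `WORKER_CONCURRENCY` would be passed as the worker's `concurrency` option.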
Frontend Pattern (Next.js)
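A minimal sketch of the human-in-the-loop rule on the client side, written as a pure reducer so it could back a React `useReducer` hook. All names here (`AiSuggestion`, `ReviewAction`, `reviewSuggestion`, `valueToSubmit`) are hypothetical, and the choice that a decision is final once made is an assumption, not something the ADRs state.

```typescript
// Hypothetical shape of an AI suggestion awaiting human review.
type ReviewStatus = 'pending' | 'accepted' | 'rejected';

interface AiSuggestion {
  field: string;
  suggestedValue: string;
  status: ReviewStatus;
  finalValue?: string; // set only after a human accepts (possibly after editing)
}

type ReviewAction =
  | { type: 'accept' }
  | { type: 'acceptWithEdit'; value: string }
  | { type: 'reject' };

function reviewSuggestion(s: AiSuggestion, action: ReviewAction): AiSuggestion {
  // Assumption: once a human has decided, later actions are ignored.
  if (s.status !== 'pending') return s;
  switch (action.type) {
    case 'accept':
      return { ...s, status: 'accepted', finalValue: s.suggestedValue };
    case 'acceptWithEdit':
      return { ...s, status: 'accepted', finalValue: action.value };
    case 'reject':
      return { ...s, status: 'rejected' };
  }
}

// Only human-accepted values are eligible for submission;
// pending and rejected suggestions yield undefined.
function valueToSubmit(s: AiSuggestion): string | undefined {
  return s.status === 'accepted' ? s.finalValue : undefined;
}
```

Keeping the transition logic pure makes it easy to unit-test separately from the UI, and the accept/reject decision is the event that would be reported for audit logging.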
Security Requirements
- AI Isolation: All AI processing on Admin Desktop only (Desk-5439)
- Data Privacy: No cloud AI services, on-premises only
- Audit Trail: Log all AI interactions and human validations to ai_audit_logs
- Rate Limiting: Prevent AI abuse and resource exhaustion
- Validation: All AI outputs must be validated before use
- Multi-tenant Isolation: Qdrant queries MUST include projectPublicId filter (compile-time enforcement)
- n8n Boundary: n8n MUST call DMS API → BullMQ, NEVER Ollama/Qdrant directly
- GPU Overload Prevention: BullMQ 2-queue (ai-realtime + ai-batch) with concurrency=1
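One way to approximate the compile-time enforcement of the `projectPublicId` filter is a branded type plus a single request builder. This is a sketch under assumptions: the helper names (`asProjectPublicId`, `buildSearchRequest`), the payload key `projectPublicId`, and the default `limit` are illustrative, and the real ID format is not specified in this document. The filter object follows Qdrant's `must` / `key` / `match.value` shape.

```typescript
// Branded type: a plain string is not assignable to ProjectPublicId.
type ProjectPublicId = string & { readonly __brand: 'ProjectPublicId' };

// Hypothetical validator; only rejects empty input because the ID format is unknown.
function asProjectPublicId(raw: string): ProjectPublicId {
  if (raw.trim() === '') throw new Error('projectPublicId must not be empty');
  return raw as ProjectPublicId;
}

interface FieldCondition {
  key: string;
  match: { value: string | number | boolean };
}

interface ScopedSearchRequest {
  vector: number[];
  limit: number;
  filter: { must: FieldCondition[] };
}

// The project ID is a required, branded parameter, so a caller cannot
// build a search request without it or pass an unvalidated string.
function buildSearchRequest(
  projectPublicId: ProjectPublicId,
  vector: number[],
  limit = 5,
  extra: FieldCondition[] = [],
): ScopedSearchRequest {
  return {
    vector,
    limit,
    filter: {
      // The project condition always comes first; extra conditions only narrow further.
      must: [{ key: 'projectPublicId', match: { value: projectPublicId } }, ...extra],
    },
  };
}
```

If this builder is the only path to the Qdrant client, a call such as `buildSearchRequest('proj-123', vec)` fails type-checking because the raw string lacks the brand. TypeScript's structural typing means a determined caller can still cast around it, so a runtime check on the outgoing filter remains worthwhile.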
ADR-023A Specific Rules
- 2-Model Stack: gemma4:e2b + nomic-embed-text
- PDF 3-Page Limit: Classification/Tagging uses the first 3 pages only; the limit does NOT apply to RAG embedding
- RAG Embedding: Full document, chunked at 512 tokens with a 64-token overlap
- OCR Auto-Detect: If PyMuPDF extracts more than 100 characters, take the fast path; otherwise fall back to PaddleOCR
- Embed Auto-Trigger: Embedding starts automatically after commit (in parallel); DB search covers the gap until embedding completes
- Threshold Recalibration: After 100-500 docs, based on ai_audit_logs analysis
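The numeric rules above can be sketched as three small pure functions. The function names are hypothetical, tokenization is assumed to have happened already, and whether the 100-character threshold applies per page or per document is not stated here, so `chooseExtractionPath` simply takes a count.

```typescript
const CHUNK_SIZE = 512;          // tokens per chunk
const CHUNK_OVERLAP = 64;        // tokens shared between consecutive chunks
const FAST_PATH_MIN_CHARS = 100; // PyMuPDF character threshold
const CLASSIFY_PAGE_LIMIT = 3;   // pages used for classification/tagging

// Sliding-window chunking: each chunk starts (size - overlap) tokens after the previous one.
function chunkTokens<T>(tokens: readonly T[], size = CHUNK_SIZE, overlap = CHUNK_OVERLAP): T[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('require size > 0 and 0 <= overlap < size');
  }
  const step = size - overlap;
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}

type ExtractionPath = 'fast' | 'paddleocr';

// Strictly more than 100 extracted characters selects the fast path.
function chooseExtractionPath(extractedCharCount: number): ExtractionPath {
  return extractedCharCount > FAST_PATH_MIN_CHARS ? 'fast' : 'paddleocr';
}

// Classification/tagging sees only the first 3 pages; embedding uses all pages.
function pagesForClassification<T>(pages: readonly T[]): T[] {
  return pages.slice(0, CLASSIFY_PAGE_LIMIT);
}
```

With the defaults the stride is 448 tokens, so a 1,000-token document yields three chunks starting at offsets 0, 448, and 896, the last one holding 104 tokens.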
Required Implementation
Related Documents
specs/06-Decision-Records/ADR-023-unified-ai-architecture.md (Base architecture)
specs/06-Decision-Records/ADR-023A-unified-ai-architecture.md (Model revision - current)
specs/06-Decision-Records/ADR-024-intent-classification-strategy.md (Pattern→LLM Fallback)
specs/06-Decision-Records/ADR-025-ai-tool-layer-architecture.md (Tool Registry)
specs/06-Decision-Records/ADR-026-document-chat-ui-pattern.md (Chat UI)
specs/06-Decision-Records/ADR-027-ai-admin-console-and-dynamic-control.md (Admin Console)
specs/06-Decision-Records/ADR-028-migration-architecture-refactor.md (Migration Pipeline)