# Technical Research: Unified AI Architecture
**Feature**: Unified AI Architecture (ADR-023)
**Date**: 2026-05-14
## Unknown 1: Integration Auth for n8n AI Pipeline
**Decision**: Create a dedicated `ServiceAccount` API token mechanism for n8n to communicate with the DMS Backend API.
**Rationale**: n8n runs on the isolated AI Host (Desk-5439) and requires programmatic access to upload legacy documents and update staging queue statuses. Standard user JWTs expire too quickly, so a long-lived, restricted-scope service account token is required.
**Alternatives considered**:
- Using a standard Admin user account (rejected: tokens expire, and n8n activity would mingle with a human user's audit trail).
- Unauthenticated internal webhook (rejected due to ADR-016 Zero Trust policy).
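A minimal sketch of how a restricted-scope service account token could be issued and checked. It is not the DMS Backend implementation: the scope names, the `ServiceAccountRecord` shape, and the `issueServiceAccount` and `authorize` helpers are all assumptions made for illustration. The sketch stores only a SHA-256 hash of the token and compares hashes in constant time.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical scope names; the decision above only says "restricted-scope".
type Scope = "legacy-documents:upload" | "staging-queue:update";

// Hypothetical persisted record for one service account.
interface ServiceAccountRecord {
  name: string;
  tokenHash: Buffer; // SHA-256 of the token; the plaintext is never stored
  scopes: ReadonlySet<Scope>;
}

function hashToken(token: string): Buffer {
  return createHash("sha256").update(token, "utf8").digest();
}

// Issue a random token. The plaintext is returned once so it can be
// configured in n8n; only the record (with the hash) would be persisted.
function issueServiceAccount(
  name: string,
  scopes: Scope[],
): { token: string; record: ServiceAccountRecord } {
  const token = randomBytes(32).toString("hex");
  return {
    token,
    record: { name, tokenHash: hashToken(token), scopes: new Set(scopes) },
  };
}

// True only if the presented token matches the record AND the record
// grants the scope required by the endpoint being called.
function authorize(
  record: ServiceAccountRecord,
  presentedToken: string,
  required: Scope,
): boolean {
  const presented = hashToken(presentedToken);
  // Both buffers are 32-byte digests, so the lengths always match.
  const tokenOk = timingSafeEqual(presented, record.tokenHash);
  return tokenOk && record.scopes.has(required);
}
```

In a NestJS backend a check like `authorize` would typically sit inside a guard, so that each endpoint declares the scope it requires.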
## Unknown 2: Qdrant Multi-tenant Payload Filters
**Decision**: Enforce a `project_public_id` filter at the NestJS `QdrantService` layer on every search query.
**Rationale**: Prevents RAG queries from leaking data across projects, satisfying SC-004. The backend automatically injects the user's currently active project ID into the Qdrant filter condition before executing the search, so the filter does not depend on the caller supplying it.
**Alternatives considered**:
- Having the LLM filter the context (rejected due to high risk of hallucination and leakage).
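The filter injection can be sketched as a pure function that wraps whatever filter the caller supplies. The `withProjectScope` helper and the trimmed-down `QdrantFilter` type are assumptions for illustration. Only the `must` / `key` / `match.value` shape follows Qdrant's filter JSON, and nested conditions are left out.

```typescript
// Minimal subset of the Qdrant filter JSON shape used by this sketch.
interface FieldCondition {
  key: string;
  match: { value: string | number | boolean };
}

interface QdrantFilter {
  must?: FieldCondition[];
  should?: FieldCondition[];
  must_not?: FieldCondition[];
}

const PROJECT_KEY = "project_public_id";

// Hypothetical helper: returns a filter that always restricts results to
// the active project, regardless of what the caller passed in.
function withProjectScope(
  activeProjectId: string,
  callerFilter: QdrantFilter = {},
): QdrantFilter {
  if (!activeProjectId) {
    throw new Error("An active project ID is required for every search");
  }
  // Drop any caller-supplied project condition so the enforced one is the
  // only condition on this key inside `must`.
  const callerMust = (callerFilter.must ?? []).filter(
    (condition) => condition.key !== PROJECT_KEY,
  );
  return {
    ...callerFilter,
    must: [
      { key: PROJECT_KEY, match: { value: activeProjectId } },
      ...callerMust,
    ],
  };
}
```

Because the project condition lives in `must`, any `should` conditions from the caller can only narrow or rank results within the active project; they cannot widen the search to another project.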
## Unknown 3: BullMQ RAG Queue Configuration
**Decision**: Implement a dedicated `rag-query-queue` in BullMQ with a concurrency limit of `1`.
**Rationale**: The local LLM (gemma4:9b) on Desk-5439 has limited VRAM (8 GB). Processing more than one RAG query at a time would cause out-of-memory (OOM) crashes. Serializing queries through the queue keeps only one inference running at a time.
**Alternatives considered**:
- Load balancing across multiple GPUs (rejected: hardware constraints, only one RTX 2060 Super available).
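To illustrate what a concurrency limit of `1` guarantees, here is an in-memory stand-in that runs jobs strictly one after another. It is not BullMQ code: `SerialQueue` is a hypothetical class with no Redis persistence or retries. In BullMQ the equivalent behaviour would presumably come from the worker's `concurrency` option.

```typescript
// Hypothetical in-memory queue: at most one job runs at any moment.
class SerialQueue {
  // Tail of the chain; it never rejects, so one failed job
  // does not block the jobs queued behind it.
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(job: () => Promise<T>): Promise<T> {
    // Start this job only after everything queued before it has settled.
    const run = this.tail.then(() => job());
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}

// Stand-in for a RAG query against the local LLM (assumed shape).
async function fakeRagQuery(question: string, ms: number): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, ms));
  return `answer to: ${question}`;
}

const ragQueue = new SerialQueue();

// Callers can submit concurrently; execution is still serialized.
const pendingAnswers = Promise.all([
  ragQueue.enqueue(() => fakeRagQuery("first", 20)),
  ragQueue.enqueue(() => fakeRagQuery("second", 5)),
]);
```

The trade-off is latency: each query waits for all earlier ones, so queue depth translates directly into response time for the user.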
## Unknown 4: UI/UX for Graceful AI Fallback
**Decision**: Use a React Context (`AiStatusProvider`) in Next.js to distribute the AI Host health status globally. If the host is offline, AI-specific form fields (such as auto-suggest chips) and the RAG Chat widget render in a disabled state or are hidden entirely.
**Rationale**: Provides graceful degradation without requiring each component to implement its own health-check logic.
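The per-feature fallback decision can be sketched as a pure function that components would call with the status they read from `AiStatusProvider`. The React wiring is omitted. The status values, feature names, and the choice of which feature hides and which is disabled are assumptions for illustration, not decisions recorded above.

```typescript
// Assumed status values exposed by the provider.
type AiHostStatus = "online" | "offline" | "unknown";

// Assumed identifiers for the AI-dependent UI features.
type AiFeature = "auto-suggest-chips" | "rag-chat-widget";

type RenderMode = "enabled" | "disabled" | "hidden";

// Hypothetical policy: how each feature degrades while the host is not online.
const FALLBACK_MODE: Record<AiFeature, Exclude<RenderMode, "enabled">> = {
  "auto-suggest-chips": "hidden",
  "rag-chat-widget": "disabled",
};

// Anything other than a confirmed "online" status is treated as unavailable,
// so the UI fails closed while the first health check is still pending.
function resolveRenderMode(
  status: AiHostStatus,
  feature: AiFeature,
): RenderMode {
  return status === "online" ? "enabled" : FALLBACK_MODE[feature];
}
```

Keeping the policy in one table means a component only asks for its render mode and never inspects the health status directly.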