690514:2019 204-rfa-approval-refactor #01
CI / CD Pipeline / build (push) Successful in 6m1s
CI / CD Pipeline / deploy (push) Failing after 6m42s

2026-05-14 20:19:21 +07:00
parent 07cc6d47b1
commit 0240d80da5
183 changed files with 20050 additions and 1017 deletions
@@ -0,0 +1,34 @@
# Specification Quality Checklist: Unified AI Architecture
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-05-14
**Feature**: [spec.md](./spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
- All checklist items pass based on initial generation.
@@ -0,0 +1,71 @@
openapi: 3.0.3
info:
  title: LCBP3-DMS AI API
  version: 1.0.0
paths:
  /api/ai/legacy-migration/ingest:
    post:
      summary: Upload legacy documents to the AI Pipeline
      security:
        - BearerAuth: []
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                files:
                  type: array
                  items:
                    type: string
                    format: binary
      responses:
        '202':
          description: Accepted and queued for processing
  /api/ai/legacy-migration/queue:
    get:
      summary: List documents in the staging queue
      responses:
        '200':
          description: Returns a list of migration review records
  /api/ai/legacy-migration/queue/{publicId}/approve:
    post:
      summary: Approve a document and import to DB
      parameters:
        - in: path
          name: publicId
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: Document successfully imported
  /api/ai/rag/query:
    post:
      summary: Submit a conversational query to the local LLM
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                projectPublicId:
                  type: string
                  format: uuid
                query:
                  type: string
      responses:
        '202':
          description: Query queued via BullMQ, returns a Job ID
  /api/ai/audit-logs:
    delete:
      summary: Hard delete AI audit logs
      responses:
        '204':
          description: Logs deleted successfully (Requires SYSTEM_ADMIN)
@@ -0,0 +1,34 @@
# Data Model: Unified AI Architecture
## Entity: `migration_review_queue`
Stores legacy document processing results waiting for human-in-the-loop validation.
**Table Structure**:
- `id` (INT, PK, Auto Increment) - Internal ID
- `public_id` (BINARY(16), UUIDv7, Unique, Indexed) - Exposed to API (ADR-019)
- `batch_id` (VARCHAR) - Groups documents from a single run
- `original_file_name` (VARCHAR) - Name of the uploaded PDF
- `extracted_metadata` (JSON) - The AI-extracted fields (title, date, categories, etc.)
- `confidence_score` (DECIMAL) - Overall AI confidence score (0.0 to 1.0)
- `status` (ENUM) - `PENDING`, `IMPORTED`, `REJECTED`
- `error_reason` (TEXT, Nullable) - Populated if status is `REJECTED`
- `version` (INT) - Optimistic locking via TypeORM `@VersionColumn`
- `created_at` (TIMESTAMP)
- `updated_at` (TIMESTAMP)
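
The `status` and `version` columns together imply a guarded state change: a review should only apply to a `PENDING` row, and only if the reviewer saw the latest version. The following is a minimal in-memory sketch of that rule, not the TypeORM implementation. The `applyReview` function and the `MigrationReviewRecord` interface are hypothetical names introduced for illustration.

```typescript
// Hypothetical in-memory model of a staging-queue row; field names mirror the table above.
type ReviewStatus = "PENDING" | "IMPORTED" | "REJECTED";

interface MigrationReviewRecord {
  publicId: string;
  status: ReviewStatus;
  errorReason: string | null;
  version: number;
}

type ReviewDecision =
  | { status: "IMPORTED" }
  | { status: "REJECTED"; errorReason: string };

// Applies a decision only if the caller saw the latest version and the
// record is still PENDING. Returns a new record with version + 1.
function applyReview(
  current: MigrationReviewRecord,
  expectedVersion: number,
  decision: ReviewDecision,
): MigrationReviewRecord {
  if (current.version !== expectedVersion) {
    throw new Error(`stale version: expected ${expectedVersion}, found ${current.version}`);
  }
  if (current.status !== "PENDING") {
    throw new Error(`invalid transition: record is already ${current.status}`);
  }
  return {
    publicId: current.publicId,
    status: decision.status,
    errorReason: decision.status === "REJECTED" ? decision.errorReason : null,
    version: current.version + 1,
  };
}
```

With TypeORM, the version comparison would normally be delegated to `@VersionColumn` and the database. The sketch only makes the intended behaviour explicit: a second reviewer holding an outdated version gets an error instead of silently overwriting the first decision.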
## Entity: `ai_audit_logs`
Feedback logs for the development team that record each AI suggestion alongside the human decision.
**Table Structure**:
- `id` (INT, PK, Auto Increment) - Internal ID
- `public_id` (BINARY(16), UUIDv7, Unique, Indexed) - Exposed to API
- `document_public_id` (BINARY(16), Nullable) - References the document if successfully imported
- `model_name` (VARCHAR) - LLM Model used (e.g., "gemma4:9b")
- `ai_suggestion_json` (JSON) - What the AI suggested
- `human_override_json` (JSON) - What the human actually saved
- `confidence_score` (DECIMAL)
- `confirmed_by_user_id` (INT) - FK to users table
- `created_at` (TIMESTAMP)
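
To make the relationship between `ai_suggestion_json` and `human_override_json` concrete, the sketch below lists the fields a reviewer changed. It is an illustration only. `overriddenFields` is a hypothetical helper, and it compares values through `JSON.stringify`, which treats nested objects with differently ordered keys as different.

```typescript
type Metadata = Record<string, unknown>;

// Returns the sorted names of fields whose values differ between the AI
// suggestion and what the human saved (including added or removed fields).
function overriddenFields(ai: Metadata, human: Metadata): string[] {
  const keys = Object.keys(ai).concat(Object.keys(human));
  const unique = keys.filter((key, index) => keys.indexOf(key) === index);
  return unique
    .filter((key) => JSON.stringify(ai[key]) !== JSON.stringify(human[key]))
    .sort();
}
```

For example, if the AI suggested `{ title: "A", date: "2020-01-01" }` and the human saved `{ title: "A", date: "2020-02-01", discipline: "civil" }`, the result is `["date", "discipline"]`.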
## Entity: `Project` (Existing)
*Added Context*: The `publicId` of this table is injected natively into Qdrant payload filters as `project_public_id` for multi-tenant isolation.
@@ -0,0 +1,72 @@
# Implementation Plan: Unified AI Architecture
**Branch**: `301-unified-ai-arch` | **Date**: 2026-05-14 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `specs/300-others/301-unified-ai-arch/spec.md`
## Summary
Implement a Master AI Architecture enforcing strict physical isolation of AI workloads on a dedicated Admin Desktop (Desk-5439). The system supports secure legacy document migration via a staging queue, context-aware conversational RAG queries, and detailed AI audit logging, all orchestrated through robust backend queues (BullMQ) and multi-tenant security filters.
## Technical Context
**Language/Version**: TypeScript (Node.js v24.15.0) for Backend/Frontend
**Primary Dependencies**: NestJS 11, Next.js 16, BullMQ, Qdrant Node Client, n8n
**Storage**: MariaDB 11.8 (Relational), Qdrant (Vector), Redis (Queue/Cache)
**Testing**: Jest (Backend), Vitest (Frontend), E2E with Playwright
**Target Platform**: QNAP Container Station (Production), Desk-5439 (AI Host)
**Project Type**: Monorepo Web Application (Backend + Frontend)
**Performance Goals**: RAG Response < 10s (p95), Migration throughput 1000 pages/hour
**Constraints**: AI host has limited VRAM (8GB), necessitating concurrency limit of 1 for LLM generation.
**Scale/Scope**: 20,000+ legacy documents, project-wide deployment.
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
- **ADR-019 UUID**: `publicId` used exclusively. No INT primary keys exposed.
- **ADR-009 Database**: Schema changes via raw SQL deltas.
- **ADR-016 Security**: CASL RBAC strictly enforced (`@UseGuards(CaslAbilityGuard)`). Idempotency-Key headers required.
- **ADR-008 BullMQ**: Heavy AI orchestration and RAG queuing managed via BullMQ.
- **ADR-018/023 AI Boundary**: AI host connects via DMS API. No direct database access.
- **TypeScript Strict**: Explicit types, no `any`, proper error handling via `BusinessException`.
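
As an illustration of the ADR-019 rule above, the sketch below converts a `BINARY(16)` value to its UUID string form and maps a row to a response object that omits the internal `id`. The names `ReviewRow`, `ReviewDto`, and `toReviewDto` are hypothetical, and the code assumes the 16 bytes are stored in canonical UUID byte order with no byte swapping.

```typescript
// Formats 16 raw bytes (as stored in a BINARY(16) column) as a UUID string.
function bytesToUuid(bytes: Uint8Array): string {
  if (bytes.length !== 16) {
    throw new Error(`expected 16 bytes, got ${bytes.length}`);
  }
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    // Adding 0x100 guarantees three hex digits; slice(1) keeps the last two.
    hex += (bytes[i] + 0x100).toString(16).slice(1);
  }
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

// Parses a UUID string back into 16 raw bytes.
function uuidToBytes(uuid: string): Uint8Array {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error("not a UUID: " + uuid);
  }
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Hypothetical row and DTO shapes: the DTO carries publicId only, never the INT id.
interface ReviewRow { id: number; publicId: Uint8Array; originalFileName: string }
interface ReviewDto { publicId: string; originalFileName: string }

function toReviewDto(row: ReviewRow): ReviewDto {
  return { publicId: bytesToUuid(row.publicId), originalFileName: row.originalFileName };
}
```

Building the DTO field by field, rather than spreading the row, keeps a newly added internal column from leaking into API responses by accident.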
## Project Structure
### Documentation (this feature)
```text
specs/300-others/301-unified-ai-arch/
├── spec.md
├── plan.md # This file
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
└── tasks.md # (To be created)
```
### Source Code (repository root)
```text
backend/
├── src/
│   ├── ai/
│   │   ├── ai.module.ts
│   │   ├── ai.controller.ts
│   │   ├── ai.service.ts
│   │   ├── qdrant.service.ts
│   │   ├── rag.processor.ts
│   │   └── dto/
│   └── database/
│       └── sql/
frontend/
├── src/
│   ├── app/(dashboard)/ai-staging/
│   ├── components/ai/
│   │   ├── AiStatusBanner.tsx
│   │   └── RagChatWidget.tsx
│   └── lib/api/ai.ts
```
**Structure Decision**: Integrated into the existing Next.js / NestJS monorepo architecture, utilizing a dedicated `AiModule` in the backend to centralize all external AI API calls and queue management.
@@ -0,0 +1,21 @@
# Quickstart: Unified AI Architecture
## 1. Setup the AI Host (Desk-5439)
1. Install Ollama and pull `gemma4:9b` and `nomic-embed-text`.
2. Start the Qdrant container with persistent storage.
3. Start n8n and configure the API key to connect to the DMS backend.
## 2. Environment Variables (Backend)
Add the following to your `.env`:
```bash
AI_HOST_URL=http://<desk-5439-ip>
AI_QDRANT_URL=http://<desk-5439-ip>:6333
AI_N8N_WEBHOOK_URL=http://<desk-5439-ip>:5678
AI_N8N_SERVICE_TOKEN=your-secure-token
```
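
A startup check can catch a missing or malformed variable before the first AI call fails. The following is a minimal sketch of such a check, assuming the four variables above. `loadAiConfig` and `AiConfig` are hypothetical names, not part of the existing backend.

```typescript
interface AiConfig {
  hostUrl: string;
  qdrantUrl: string;
  n8nWebhookUrl: string;
  n8nServiceToken: string;
}

// Reads the AI settings from an env-like object and fails fast, listing
// every missing variable in a single error.
function loadAiConfig(env: Record<string, string | undefined>): AiConfig {
  const missing: string[] = [];
  const read = (name: string): string => {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
      return "";
    }
    return value.trim();
  };

  const config: AiConfig = {
    hostUrl: read("AI_HOST_URL"),
    qdrantUrl: read("AI_QDRANT_URL"),
    n8nWebhookUrl: read("AI_N8N_WEBHOOK_URL"),
    n8nServiceToken: read("AI_N8N_SERVICE_TOKEN"),
  };
  if (missing.length > 0) {
    throw new Error("Missing AI environment variables: " + missing.join(", "));
  }
  // The three URL settings must use http or https.
  for (const url of [config.hostUrl, config.qdrantUrl, config.n8nWebhookUrl]) {
    if (!/^https?:\/\//.test(url)) {
      throw new Error("Not an http(s) URL: " + url);
    }
  }
  return config;
}
```

In the backend this would be called once with `process.env` during bootstrap.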
## 3. Usage Flow (RAG)
1. User submits a query via the Next.js `RagChatWidget`.
2. Backend validates JWT and creates a BullMQ job on `rag-query-queue`.
3. Worker retrieves the job and adds the job's `projectPublicId` to the Qdrant search as a `project_public_id` payload filter.
4. Worker fetches context, queries Ollama, and streams/returns the response.
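
Step 2 of the flow above can be sketched as a pure function that validates the request body and produces the job payload. This is an assumption-laden illustration: `AuthenticatedUser`, `RagJobData`, and `buildRagJobData` are hypothetical names, and the real JWT claims and project-access check are not described in this quickstart.

```typescript
// Hypothetical shapes; project IDs in the user's list are assumed lowercase.
interface AuthenticatedUser {
  userId: number;
  projectPublicIds: string[];
}

interface RagJobData {
  userId: number;
  projectPublicId: string;
  query: string;
}

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Validates the POST /api/ai/rag/query body and builds the queue payload.
function buildRagJobData(user: AuthenticatedUser, body: unknown): RagJobData {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const { projectPublicId, query } = body as Record<string, unknown>;
  if (typeof projectPublicId !== "string" || !UUID_PATTERN.test(projectPublicId)) {
    throw new Error("projectPublicId must be a UUID string");
  }
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  const normalizedId = projectPublicId.toLowerCase();
  if (user.projectPublicIds.indexOf(normalizedId) === -1) {
    throw new Error("user has no access to this project");
  }
  return { userId: user.userId, projectPublicId: normalizedId, query: query.trim() };
}
```

The returned object is what would be passed to the `rag-query-queue` job, so the worker never has to trust a project ID that was not checked against the authenticated user.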
@@ -0,0 +1,27 @@
# Technical Research: Unified AI Architecture
**Feature**: Unified AI Architecture (ADR-023)
**Date**: 2026-05-14
## Unknown 1: Integration Auth for n8n AI Pipeline
**Decision**: Create a dedicated `ServiceAccount` API token mechanism for n8n to communicate with the DMS Backend API.
**Rationale**: n8n runs on the isolated AI Host (Desk-5439) and requires programmatic access to upload legacy documents and update staging queue statuses. Standard user JWTs expire too quickly, so a long-lived, restricted-scope service account token is required.
**Alternatives considered**:
- Using a standard Admin user account (rejected due to token expiry and audit trail mingling).
- Unauthenticated internal webhook (rejected due to ADR-016 Zero Trust policy).
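
A restricted-scope token check could look roughly like the sketch below. It is not the planned `ServiceAccountGuard`. The function names and the allowed route prefix are assumptions made for illustration, and the hand-written comparison stands in for Node's `crypto.timingSafeEqual`, which would be the usual choice in the backend.

```typescript
// Compares two strings without returning early on the first mismatch.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Hypothetical scope: the only route prefix the n8n service account may call.
const N8N_ALLOWED_PREFIXES = ["/api/ai/legacy-migration/"];

// Returns true only for a matching bearer token on an in-scope route.
function isServiceRequestAllowed(
  authorizationHeader: string | undefined,
  expectedToken: string,
  path: string,
): boolean {
  if (authorizationHeader === undefined) return false;
  const scheme = "Bearer ";
  if (authorizationHeader.slice(0, scheme.length) !== scheme) return false;
  const presented = authorizationHeader.slice(scheme.length);
  const tokenOk = expectedToken.length > 0 && constantTimeEquals(presented, expectedToken);
  const inScope = N8N_ALLOWED_PREFIXES.some((prefix) => path.slice(0, prefix.length) === prefix);
  return tokenOk && inScope;
}
```

Checking the route alongside the token is what keeps the long-lived credential "restricted-scope": a leaked n8n token would not, for example, authorize a call to the audit-log deletion endpoint.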
## Unknown 2: Qdrant Multi-tenant Payload Filters
**Decision**: Enforce `project_public_id` natively at the NestJS `QdrantService` layer for every search query.
**Rationale**: Ensures that RAG queries absolutely cannot leak data across projects, satisfying SC-004. The backend will automatically inject the user's currently active project ID into the Qdrant filter condition before executing the search.
**Alternatives considered**:
- Having the LLM filter the context (rejected due to high risk of hallucination and leakage).
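
The enforcement described above can be sketched as a small function that always places the tenant condition in the `must` clause of the Qdrant filter. The `must` / `key` / `match.value` shape follows Qdrant's filter JSON. `withProjectScope` and the local type definitions are hypothetical and deliberately simplified; they are not the types exported by the Qdrant client.

```typescript
// Simplified local types covering only exact-match conditions.
interface MatchCondition {
  key: string;
  match: { value: string };
}

interface QdrantFilter {
  must?: MatchCondition[];
  should?: MatchCondition[];
  must_not?: MatchCondition[];
}

const PROJECT_KEY = "project_public_id";

// Returns a filter that is always restricted to one project. Any
// caller-supplied `must` condition on the same key is dropped, so a
// request cannot swap in another project's ID.
function withProjectScope(projectPublicId: string, callerFilter: QdrantFilter = {}): QdrantFilter {
  const tenantCondition: MatchCondition = { key: PROJECT_KEY, match: { value: projectPublicId } };
  const callerMust = (callerFilter.must ?? []).filter((condition) => condition.key !== PROJECT_KEY);
  return { ...callerFilter, must: [tenantCondition, ...callerMust] };
}
```

Because every `must` condition has to hold, a caller-supplied `should` or `must_not` clause can narrow the results further but cannot widen them beyond the project.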
## Unknown 3: BullMQ RAG Queue Configuration
**Decision**: Implement a dedicated `rag-query-queue` in BullMQ with a concurrency limit of `1`.
**Rationale**: The local LLM (gemma4:9b) on Desk-5439 has limited VRAM (8GB). Processing more than one RAG query at a time will cause Out-Of-Memory (OOM) crashes. Queuing guarantees stability.
**Alternatives considered**:
- Load balancing across multiple GPUs (rejected: hardware constraints, only one RTX 2060 Super available).
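
A concurrency limit of 1 trades throughput for stability, and the cost shows up as queue wait time. The sketch below is a plain FIFO simulation, not BullMQ code, that computes per-job latency for a given worker count. `simulateQueue` and the job durations used in the example are illustrative assumptions, not measurements from Desk-5439.

```typescript
interface SimulatedJob { id: string; arrivalSec: number; durationSec: number }
interface JobTiming { id: string; startSec: number; finishSec: number; latencySec: number }

// Simulates a FIFO queue served by `concurrency` workers and reports when
// each job starts and finishes. Latency is measured from arrival.
function simulateQueue(jobs: SimulatedJob[], concurrency: number = 1): JobTiming[] {
  const workerFreeAt: number[] = [];
  for (let i = 0; i < concurrency; i++) workerFreeAt.push(0);

  const ordered = jobs.slice().sort((a, b) => a.arrivalSec - b.arrivalSec);
  return ordered.map((job) => {
    // Pick the worker that becomes free first.
    let worker = 0;
    for (let i = 1; i < workerFreeAt.length; i++) {
      if (workerFreeAt[i] < workerFreeAt[worker]) worker = i;
    }
    const startSec = Math.max(job.arrivalSec, workerFreeAt[worker]);
    const finishSec = startSec + job.durationSec;
    workerFreeAt[worker] = finishSec;
    return { id: job.id, startSec, finishSec, latencySec: finishSec - job.arrivalSec };
  });
}
```

With three queries arriving together and each taking a hypothetical 4 seconds to generate, concurrency 1 yields latencies of 4, 8, and 12 seconds. If the 10-second target is measured from submission rather than from job start, the third user already misses it, which is one reason to surface a queue-wait state in the UI.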
## Unknown 4: UI/UX for Graceful AI Fallback
**Decision**: Use React Context (`AiStatusProvider`) in Next.js to globally distribute the AI Host health status. If offline, AI-specific form fields (like auto-suggest chips) and the RAG Chat widget will conditionally render a disabled state or hide entirely.
**Rationale**: Provides a seamless graceful degradation experience without requiring individual components to implement repetitive health-check logic.
@@ -0,0 +1,106 @@
# Feature Specification: Unified AI Architecture
**Feature Branch**: `301-unified-ai-arch`
**Created**: 2026-05-14
**Status**: Draft
**Input**: User description: "ADR-023: Unified AI Architecture"
## Clarifications
### Session 2026-05-14
- Q: Legacy Migration Input Mechanism (How are legacy PDFs fed into the AI pipeline?) → A: Both a secure Admin API endpoint and a watched network folder.
- Q: Failure Handling for Corrupted PDFs (How does the system handle documents that fail AI processing?) → A: Mark as "Rejected" in the staging queue with an error reason.
- Q: AI Audit Log Retention Policy (How long are AI audit logs retained?) → A: Infinite retention until manually deleted by a System Admin.
- Q: RAG Concurrency Handling (How are concurrent RAG queries managed to prevent LLM timeouts?) → A: Request Queuing via BullMQ for sequential processing.
## User Scenarios & Testing _(mandatory)_
### User Story 1 - Legacy Document Migration and Review (Priority: P1)
As a System Administrator, I want to process legacy PDF documents through an AI pipeline and review the extracted metadata in a staging queue so that I can ensure data integrity before importing them into the main database.
**Why this priority**: Legacy document migration is the most critical immediate need for the LCBP3 project, as 20,000+ documents need to be imported securely and accurately.
**Independent Test**: Can be fully tested by uploading a batch of PDF documents, verifying they appear in the staging queue with AI-extracted metadata, and then approving or rejecting them; approved documents are then committed to permanent storage.
**Acceptance Scenarios**:
1. **Given** a batch of legacy PDF documents, **When** processed by the AI pipeline, **Then** they appear in the staging queue with extracted metadata and confidence scores.
2. **Given** a document in the staging queue with a high confidence score, **When** reviewed by an Admin, **Then** it is marked as ready for import without warnings.
3. **Given** a document in the staging queue, **When** the Admin approves it, **Then** the file is moved to permanent storage and committed to the main database.
---
### User Story 2 - RAG Conversational Q&A (Priority: P2)
As a Document Controller or Org Admin, I want to ask natural language questions about project documents (RFA/Correspondence) and get context-aware answers so that I can quickly find information without manually reading full texts.
**Why this priority**: Enhances the user experience significantly by allowing conversational search over complex engineering documents.
**Independent Test**: Can be tested by asking a specific project-related question and receiving an answer generated by the AI, accompanied by citations to the relevant documents.
**Acceptance Scenarios**:
1. **Given** a user has access to a specific project, **When** they ask a question via the RAG interface, **Then** the system returns an answer based ONLY on documents within that project's scope.
2. **Given** the RAG system is queried, **When** generating an answer, **Then** the response is returned in under 10 seconds.
---
### User Story 3 - AI Audit Log Management (Priority: P3)
As a System Admin, I want to view and manage AI audit logs so that I can provide feedback to the development team to improve the AI models and clean up test data.
**Why this priority**: Essential for continuous improvement of the AI models and keeping the database clean, but not a blocker for the core user workflows.
**Independent Test**: Can be tested by generating AI suggestions, verifying they are logged in the audit table, and then having a System Admin perform a hard delete on those logs.
**Acceptance Scenarios**:
1. **Given** an AI suggestion is presented to a user, **When** the user confirms or overrides it, **Then** the action is recorded in the AI audit log with confidence scores and user feedback.
2. **Given** a System Admin is viewing the AI audit logs, **When** they select records to delete, **Then** the records are hard-deleted and this deletion is recorded in the main compliance audit log.
### Edge Cases
- What happens when the isolated AI host is offline?
  - The system must fail gracefully, hiding AI suggestions and disabling RAG features, while allowing normal manual document operations to continue.
- How does the system handle concurrent processing of a massive number of documents?
  - The system must queue tasks sequentially to prevent memory/resource overload on the isolated machine.
- What happens if the AI suggests an invalid category or enum value?
  - The validation layer must reject invalid values before they are presented to the user or saved to the database.
- What happens if a legacy document fails AI processing (e.g., corruption or timeout)?
  - The document must be marked as "Rejected" in the staging queue with a specific error reason, keeping it visible for Admin review.
- How does the system handle concurrent RAG queries from multiple users?
  - RAG queries must be placed in a BullMQ queue and processed sequentially (or with strict concurrency limits) to prevent LLM crashes, potentially showing a "Waiting in queue..." state to the user.
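
The invalid-enum edge case above can be illustrated with a small check that compares each constrained field against its allowed values. This is a sketch under assumptions: `findInvalidFields` is a hypothetical helper, and the field names and allowed values in the usage note are made-up examples, not the project's master data.

```typescript
type AllowedValues = Record<string, string[]>;

// Returns one message per constrained field whose value is missing or not
// in the allowed list. Fields without an entry in `allowed` are not checked.
function findInvalidFields(values: Record<string, unknown>, allowed: AllowedValues): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(allowed)) {
    const value = values[field];
    if (typeof value !== "string" || allowed[field].indexOf(value) === -1) {
      errors.push(`${field}: ${JSON.stringify(value)} is not an allowed value`);
    }
  }
  return errors;
}
```

Given allowed values `{ category: ["RFA", "CORRESPONDENCE"] }`, an AI suggestion of `{ category: "MEMO" }` produces one error, so the suggestion can be discarded before it reaches the reviewer. The same function can run again on the human-edited payload before commit.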
## Requirements _(mandatory)_
### Functional Requirements
- **FR-001**: System MUST process AI workloads exclusively on a physically isolated machine without direct database or storage access.
- **FR-001b**: System MUST support legacy document ingestion via BOTH a secure Admin API upload endpoint and a watched network folder.
- **FR-002**: System MUST enforce a multi-tenant boundary by filtering vector search queries by project identifier.
- **FR-003**: System MUST provide a staging queue for legacy document migration with human-in-the-loop review.
- **FR-004**: System MUST record AI suggestions and human overrides in a dedicated AI audit log.
- **FR-005**: System MUST allow System Admins to hard-delete AI audit logs, logging the deletion action in the main compliance audit log.
- **FR-005b**: System MUST retain AI audit logs indefinitely until a System Admin explicitly performs a manual hard-delete.
- **FR-006**: System MUST gracefully disable AI features when the AI host is offline, allowing manual data entry to continue uninterrupted.
- **FR-007**: System MUST validate the final payload (whether AI-suggested or human-overridden) against defined master data before database commitment.
- **FR-008**: System MUST utilize BullMQ for asynchronous vector deletion in Qdrant when documents are deleted, ensuring eventual consistency via retries to prevent orphaned vectors.
- **FR-009**: System MUST limit users to 1 active concurrent RAG query, blocking new queries until the active one completes.
- **FR-010**: System MUST enforce a rate limit of maximum 5 RAG queries per minute per user.
- **FR-011**: System SHOULD attempt to abort the underlying LLM generation if a client disconnects during a streaming response, requiring the user to requeue upon reconnecting.
- **FR-012**: For all constrained master data fields, the UI MUST enforce selection via predefined lists (Dropdowns) and prohibit arbitrary free-text input during the human review process.
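
FR-009 and FR-010 combine into a single admission check per user. The sketch below keeps the state in memory to show the logic only. `RagAdmissionControl` is a hypothetical class, it assumes a single backend process, and it does not count requests rejected as `busy` toward the per-minute limit, which is an interpretation the requirements do not settle.

```typescript
type AdmissionResult = "ok" | "busy" | "rate_limited";

class RagAdmissionControl {
  private recent: Record<string, number[]> = {};
  private active: Record<string, boolean> = {};
  private maxPerWindow: number;
  private windowMs: number;

  constructor(maxPerWindow: number = 5, windowMs: number = 60000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Admits a query if the user has none in flight (FR-009) and has started
  // fewer than maxPerWindow queries within the sliding window (FR-010).
  tryStart(userId: string, nowMs: number): AdmissionResult {
    if (this.active[userId]) return "busy";
    const windowStart = nowMs - this.windowMs;
    const timestamps = (this.recent[userId] ?? []).filter((t) => t > windowStart);
    this.recent[userId] = timestamps;
    if (timestamps.length >= this.maxPerWindow) return "rate_limited";
    timestamps.push(nowMs);
    this.active[userId] = true;
    return "ok";
  }

  // Must be called when the user's query completes, fails, or is aborted.
  finish(userId: string): void {
    delete this.active[userId];
  }
}
```

With more than one backend instance, the same counters would need to live in shared storage such as the Redis instance already used for queues.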
### Key Entities
- **Migration Review Record**: Represents a document in the staging queue, containing extracted metadata, confidence score, and review status (Pending, Imported, Rejected).
- **AI Audit Log**: Represents a record of AI suggestions vs human decisions for model feedback and training purposes.
## Success Criteria _(mandatory)_
### Measurable Outcomes
- **SC-001**: Legacy document migration pipeline can process 1,000 pages per hour without system degradation.
- **SC-002**: The RAG Q&A system answers user queries with a response time (p95) of under 10 seconds.
- **SC-003**: 100% of AI-processed documents pass validation against master data schemas before being committed to the database.
- **SC-004**: Zero cross-project data leakage incidents during RAG queries.
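
SC-002 is stated as a p95 target, so checking it requires a percentile over measured response times. A minimal sketch using the nearest-rank method follows. `percentile` and `meetsRagLatencyTarget` are hypothetical helpers, and other percentile definitions (for example, interpolated ones) can give slightly different values on small samples.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// True when the p95 of the measured latencies is strictly under the target.
function meetsRagLatencyTarget(latenciesSec: number[], targetSec: number = 10): boolean {
  return percentile(latenciesSec, 95) < targetSec;
}
```

With 20 measurements, the p95 is the 19th smallest value, so one slow outlier out of 20 still passes while two do not. For SC-001, the equivalent arithmetic is that 1,000 pages per hour leaves an average budget of 3.6 seconds per page.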
@@ -0,0 +1,110 @@
# Tasks: Unified AI Architecture
**Input**: Design documents from `/specs/300-others/301-unified-ai-arch/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/openapi.yaml
**Tests**: Tests are OPTIONAL for this implementation phase unless specifically requested during PR review.
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Project initialization and basic structure
- [ ] T001 Initialize `AiModule` inside `backend/src/ai/ai.module.ts`
- [ ] T002 [P] Install the Qdrant JavaScript client (`@qdrant/js-client-rest`, from the `qdrant-js` repository) in the backend workspace
- [ ] T003 Add `AI_HOST_URL`, `AI_QDRANT_URL`, `AI_N8N_WEBHOOK_URL`, `AI_N8N_SERVICE_TOKEN` to backend `.env` configuration
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [ ] T004 Setup `QdrantService` in `backend/src/ai/qdrant.service.ts` to manage vector DB connections
- [ ] T005 [P] Setup BullMQ infrastructure in `AiModule` (configure `AiQueueService`)
- [ ] T006 [P] Implement `ServiceAccountGuard` to validate n8n service tokens for internal API routes
- [ ] T007 Implement SQL Schema Deltas for `migration_review_queue` and `ai_audit_logs` in MariaDB
- [ ] T008 Implement TypeORM base entities mapping to the created SQL tables
**Checkpoint**: Foundation ready - user story implementation can now begin
---
## Phase 3: User Story 1 - Legacy Document Migration and Review (Priority: P1) 🎯 MVP
**Goal**: Process legacy PDFs through an AI pipeline and review extracted metadata in a staging queue before DB commit.
**Independent Test**: Upload PDF batch via n8n/endpoint, verify they appear in the UI queue, and approve them successfully.
### Implementation for User Story 1
- [ ] T009 [P] [US1] Create `MigrationReviewRecord` TypeORM Entity in `backend/src/ai/entities/migration-review.entity.ts`
- [ ] T010 [US1] Implement `AiIngestService` to handle batch ingestion and queue creation
- [ ] T011 [US1] Implement `POST /api/ai/legacy-migration/ingest` in `AiController` using `ServiceAccountGuard`
- [ ] T011b [P] [US1] Export n8n workflow definition to `backend/src/ai/workflows/folder-watcher.json` to monitor the network directory and POST to the ingest API (FR-001b)
- [ ] T012 [US1] Implement `GET /api/ai/legacy-migration/queue` in `AiController`
- [ ] T013 [US1] Implement `POST /api/ai/legacy-migration/queue/{publicId}/approve` with Zod/class-validator payload checking (FR-007)
- [ ] T014 [P] [US1] Create Frontend API hooks for staging queue in `frontend/src/lib/api/ai.ts`
- [ ] T015 [US1] Build Frontend Staging Queue Table UI in `frontend/src/app/(dashboard)/ai-staging/page.tsx`
- [ ] T016 [US1] Implement UI Form dropdown constraints for master data fields in the approval modal (FR-012)
- [ ] T017 [US1] Build `AiStatusBanner.tsx` component in `frontend/src/components/ai/AiStatusBanner.tsx` to handle offline graceful degradation
**Checkpoint**: At this point, User Story 1 should be fully functional.
---
## Phase 4: User Story 2 - RAG Conversational Q&A (Priority: P2)
**Goal**: Ask natural language questions about project documents with context-aware RAG answers.
**Independent Test**: Submit a RAG query for a specific project; verify response < 10s and accurate isolation.
### Implementation for User Story 2
- [ ] T018 [P] [US2] Create BullMQ Processor `rag.processor.ts` with strict concurrency limit = 1 (FR-009)
- [ ] T019 [US2] Implement `AiRagService` containing Ollama LLM integration logic
- [ ] T020 [US2] Enforce `projectPublicId` filtering natively in Qdrant search payload inside `AiRagService`
- [ ] T021 [US2] Implement `POST /api/ai/rag/query` to push jobs to BullMQ and apply rate limiting (5 per min) (FR-010)
- [ ] T022 [US2] Add AbortController logic to backend processor to cancel LLM generation on client disconnect (FR-011)
- [ ] T023 [P] [US2] Build `RagChatWidget.tsx` component with streaming/polling UI for queue wait status
**Checkpoint**: RAG capability is fully implemented and throttled safely.
---
## Phase 5: User Story 3 - AI Audit Log Management (Priority: P3)
**Goal**: View and manage AI audit logs for model feedback, with safe deletion capabilities.
**Independent Test**: Generate AI suggestions, verify logs exist, and test hard delete as a System Admin.
### Implementation for User Story 3
- [ ] T024 [P] [US3] Create `AiAuditLog` TypeORM Entity in `backend/src/ai/entities/ai-audit-log.entity.ts`
- [ ] T025 [US3] Inject Audit Log creation logic into the `/approve` endpoint (capture Human vs AI differences)
- [ ] T026 [US3] Implement `DELETE /api/ai/audit-logs` endpoint with `@UseGuards(CaslAbilityGuard)` checking for `SYSTEM_ADMIN`
- [ ] T027 [US3] Create BullMQ Processor `vector-deletion.processor.ts` to handle asynchronous vector cleanup (FR-008)
- [ ] T028 [US3] Integrate `vector-deletion-queue` dispatch into the main Document Deletion service
**Checkpoint**: AI Audit and safe vector cleanup are complete.
---
## Phase 6: Polish & Cross-Cutting Concerns
**Purpose**: Improvements that affect multiple user stories
- [ ] T029 Code cleanup and CASL RBAC matrix review for all AI endpoints
- [ ] T030 E2E Validation of the BullMQ concurrency limit (stress test 10 concurrent requests)
- [ ] T031 Finalize `README.md` and `quickstart.md` documentation for Desk-5439 setup
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: Start immediately
- **Foundational (Phase 2)**: Depends on Phase 1 - BLOCKS US1, US2, US3
- **User Stories (Phases 3-5)**: Depend on Phase 2. Should be executed sequentially (US1 -> US2 -> US3) due to shared services, but frontend/backend tasks within each story can run in parallel.
- **Polish (Phase 6)**: Depends on all stories completing.
### Parallel Opportunities
- Database schema changes (T007) and backend auth setup (T006) can proceed in parallel.
- Frontend UI components (T015, T017, T023) can be stubbed concurrently with backend API creation.
- Entity creation (T009, T024) and BullMQ processor setup (T018) can proceed in parallel.