feat(ai-admin-console): complete implementation and resolve lint compilation errors

2026-05-21 21:42:25 +07:00
parent 1580ab2c18
commit 91e9c714df
39 changed files with 3724 additions and 72 deletions
@@ -0,0 +1,146 @@
+// File: specs/200-fullstacks/227-ai-admin-console/spec.md
+// Change Log:
+// - 2026-05-20: Feature Specification for the AI Admin Console system
+// - 2026-05-21: Restructure following spec-template.md with User Stories, FRs, Success Criteria
+
+# Feature Specification: AI Admin Console
+
+**Feature Branch**: `227-ai-admin-console`
+**Created**: 2026-05-20
+**Status**: Draft
+**Category**: 200-fullstacks
+**Input**: ADR-027 AI Admin Panel and Dynamic Control Architecture
+
+---
+
+## User Scenarios & Testing
+
+### User Story 1 - Superadmin Toggles AI System On/Off (Priority: P1)
+
+As a Superadmin, I need to dynamically enable or disable AI features for all regular users without redeploying the system, so that I can perform maintenance, manage system load, or handle AI infrastructure issues gracefully.
+
+**Why this priority**: This is the core control mechanism of the feature. Without it, the admin cannot perform emergency maintenance or manage system resources during high load periods.
+
+**Independent Test**: Can be fully tested by a Superadmin toggling the AI switch and observing that regular users see the disabled state within one polling interval (30 seconds) while the Superadmin retains full access.
+
+**Acceptance Scenarios**:
+
+1. **Given** the AI system is currently enabled, **When** a Superadmin toggles the switch to disabled, **Then** the setting is persisted to database and cache, and regular users see disabled AI buttons within 30 seconds
+2. **Given** the AI system is currently disabled, **When** a Superadmin toggles the switch to enabled, **Then** regular users can access AI features again after the polling interval
+3. **Given** a regular user has AI permissions, **When** they attempt to use AI features while the system is disabled, **Then** they receive HTTP 503 with a user-friendly message explaining temporary unavailability
+
+---
+
+### User Story 2 - Normal Users Experience Soft Fallback (Priority: P1)
+
+As a regular user with AI permissions, I need clear visual feedback when AI features are temporarily disabled, so that I understand why AI buttons are unavailable and can complete my work manually without confusion.
+
+**Why this priority**: Critical for user experience. Abrupt feature disappearance creates confusion and support tickets. Soft fallback maintains user trust.
+
+**Independent Test**: Can be tested by disabling the AI system and verifying that regular users see disabled buttons with tooltips and a global banner, rather than errors or missing UI elements.
+
+**Acceptance Scenarios**:
+
+1. **Given** the AI system is disabled by an admin, **When** a regular user views a document form with AI suggestion buttons, **Then** those buttons appear disabled with a tooltip explaining "ระบบ AI ไม่พร้อมใช้งานชั่วคราว" ("The AI system is temporarily unavailable")
+2. **Given** the AI system is disabled, **When** a regular user loads any page, **Then** a global banner appears at the top stating AI is temporarily unavailable
+3. **Given** the AI system is disabled, **When** a regular user calls an AI endpoint directly through the API, **Then** the system returns HTTP 503 with recovery guidance
+
+---
+
+### User Story 3 - Superadmin Monitors AI Health Status (Priority: P2)
+
+As a Superadmin, I need real-time visibility into AI infrastructure health (Ollama, Qdrant, BullMQ queues), so that I can diagnose issues, monitor latency, and make informed decisions about enabling/disabling AI services.
+
+**Why this priority**: Essential for operational awareness but secondary to the control mechanism itself.
+
+**Independent Test**: Can be tested by accessing the AI Admin Console health dashboard and verifying all metrics display correctly with appropriate status indicators.
+
+**Acceptance Scenarios**:
+
+1. **Given** the AI Admin Console is accessed, **When** a Superadmin views the health panel, **Then** they see Ollama latency, active model version, Qdrant collection stats, and BullMQ queue metrics (waiting/active/failed jobs)
+2. **Given** a service is experiencing issues, **When** the health check runs, **Then** the status displays as degraded/down with relevant metrics highlighted
+3. **Given** the Superadmin is monitoring the system, **When** they refresh or view the dashboard, **Then** metrics are served from a cache that is refreshed at most once every 30 seconds, to prevent excessive load
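The 30-second caching behaviour in scenario 3 could be sketched as a small time-to-live cache. This is a minimal sketch, not the project's implementation: `TtlCache` and its injectable clock are hypothetical names, and only the 30-second lifetime comes from the spec.

```typescript
// Hypothetical helper: caches one value for `ttlMs` milliseconds.
// Only the 30-second lifetime is taken from the spec; the rest is assumed.
class TtlCache<T> {
  private readonly ttlMs: number;
  private readonly now: () => number;
  private entry: { value: T; expiresAt: number } | null = null;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls `load`
  // once and stores the result for the next `ttlMs` milliseconds.
  get(load: () => T): T {
    const t = this.now();
    if (this.entry !== null && t < this.entry.expiresAt) {
      return this.entry.value;
    }
    const value = load();
    this.entry = { value, expiresAt: t + this.ttlMs };
    return value;
  }
}

// Assumed wiring: one cache instance for the aggregated health snapshot.
const HEALTH_CACHE_TTL_MS = 30_000;
const healthCache = new TtlCache<string>(HEALTH_CACHE_TTL_MS);
```

`T` may itself be a `Promise`, in which case concurrent dashboard requests share a single in-flight health check. In that variant a rejected promise would need to be evicted instead of being served for the full 30 seconds.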
+
+---
+
+### User Story 4 - Superadmin Uses RAG Playground Sandbox (Priority: P2)
+
+As a Superadmin, I need an isolated RAG testing environment where I can query documents and receive AI-generated responses with citations, so that I can test and refine AI behavior without affecting production queues or user experiences.
+
+**Why this priority**: Enables safe testing and troubleshooting of AI capabilities during maintenance windows.
+
+**Independent Test**: Can be tested by submitting a RAG query in the sandbox and receiving a complete response with document citations, while verifying the job runs through the isolated sandbox queue.
+
+**Acceptance Scenarios**:
+
+1. **Given** the AI system is disabled for regular users, **When** a Superadmin submits a RAG query in the sandbox, **Then** the query processes through the isolated queue and returns results with citations
+2. **Given** a RAG job is submitted, **When** it is processing, **Then** the Superadmin can poll for status updates every 5 seconds and see progress
+3. **Given** the sandbox queue has multiple jobs, **When** jobs are processed, **Then** Superadmin jobs have SUPERADMIN priority (higher than regular batch jobs)
+
+---
+
+### User Story 5 - Superadmin Uses OCR Sandbox for Metadata Extraction (Priority: P2)
+
+As a Superadmin, I need to upload PDF files to an isolated OCR sandbox to test metadata extraction capabilities, so that I can validate AI accuracy and tune extraction parameters without impacting production document processing.
+
+**Why this priority**: Supports AI tuning and validation workflows, enabling data-driven improvements to extraction accuracy.
+
+**Independent Test**: Can be tested by uploading a PDF to the OCR sandbox and receiving extracted metadata in JSON format with confidence scores.
+
+**Acceptance Scenarios**:
+
+1. **Given** a PDF file is uploaded to the OCR sandbox, **When** processing completes, **Then** the system returns extracted metadata as formatted JSON with syntax highlighting
+2. **Given** an OCR job is submitted, **When** processing fails, **Then** the error is displayed inline in a red box with actionable guidance
+3. **Given** the queue length is >= 3, **When** additional sandbox requests are made, **Then** dynamic rate limiting applies (10 requests/hour per user)
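The dynamic rate limiting in scenario 3 could look roughly like the sketch below. The threshold of 3 queued jobs and the limit of 10 requests per hour per user come from the spec. The sliding one-hour window, the in-memory storage, the `SandboxRateLimiter` name, and the choice to count requests made while the queue is short are all assumptions for illustration.

```typescript
const QUEUE_THRESHOLD = 3;          // from the spec: limiting starts at queue length >= 3
const MAX_REQUESTS_PER_HOUR = 10;   // from the spec: 10 requests/hour per user
const WINDOW_MS = 60 * 60 * 1000;   // assumption: sliding one-hour window

// Hypothetical in-memory limiter; a real deployment would likely keep
// the counters in a shared store instead of process memory.
class SandboxRateLimiter {
  private readonly history = new Map<string, number[]>();

  // Returns true when the request may proceed and records it.
  // Returns false when the queue is busy and the user's hourly quota is spent.
  allow(userId: string, queueLength: number, nowMs: number): boolean {
    const recent = (this.history.get(userId) ?? []).filter(
      (t) => nowMs - t < WINDOW_MS,
    );
    const limited = queueLength >= QUEUE_THRESHOLD;
    if (limited && recent.length >= MAX_REQUESTS_PER_HOUR) {
      this.history.set(userId, recent);
      return false;
    }
    recent.push(nowMs);
    this.history.set(userId, recent);
    return true;
  }
}
```

Requests sent while the queue is short are still recorded here, so a burst made during a quiet period counts against the quota once the queue fills up. The spec does not say which behaviour is intended.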
+
+---
+
+### Edge Cases
+
+- **EC-001**: What happens when the Redis cache is unavailable? System must fall back to a database query with <100ms latency penalty
+- **EC-002**: How does the system handle concurrent toggle requests? Last-write-wins with optimistic locking; invalidate the cache after a successful write
+- **EC-003**: What if Ollama/Qdrant times out during health check? Health service returns DEGRADED status, not DOWN; timeout is 5 seconds per service
+- **EC-004**: How are long-running sandbox jobs handled? Job status polling available; jobs can be cancelled by admin; results cached for 1 hour
+- **EC-005**: What happens if a Superadmin loses permissions mid-session? Next API request returns 403; UI redirects to unauthorized page
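EC-001 and EC-002 together describe a cache-aside pattern: read from the cache, fall back to the database, and invalidate the cache after a write. A minimal sketch follows. The `SettingCache` and `SettingStore` interfaces are hypothetical, they are synchronous only for brevity (a real Redis or database client would be asynchronous), and treating a missing setting as "disabled" is an assumption.

```typescript
// Hypothetical abstractions over Redis and the system_settings table.
interface SettingCache {
  get(key: string): string | null; // may throw when the cache is unreachable
  set(key: string, value: string): void;
  del(key: string): void;
}
interface SettingStore {
  read(key: string): string | null;
  write(key: string, value: string): void;
}

const AI_KEY = "AI_FEATURES_ENABLED";

// EC-001: try the cache first, fall back to the database on a miss or outage.
function readAiEnabled(cache: SettingCache, store: SettingStore): boolean {
  let cached: string | null = null;
  let cacheUp = true;
  try {
    cached = cache.get(AI_KEY);
  } catch {
    cacheUp = false; // cache outage: continue with the database
  }
  if (cached !== null) return cached === "true";

  const value = store.read(AI_KEY) ?? "false"; // assumption: missing means disabled
  if (cacheUp) {
    try {
      cache.set(AI_KEY, value); // best-effort repopulation
    } catch {
      // ignore: the next read falls back to the database again
    }
  }
  return value === "true";
}

// EC-002: persist first, then invalidate so the next read reloads the value.
function writeAiEnabled(cache: SettingCache, store: SettingStore, enabled: boolean): void {
  store.write(AI_KEY, String(enabled));
  try {
    cache.del(AI_KEY);
  } catch {
    // ignore: a stale entry would need a TTL or retry, not modelled here
  }
}
```

The optimistic locking mentioned in EC-002 is not modelled in this sketch.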
+
+---
+
+## Requirements
+
+### Functional Requirements
+
+- **FR-001**: System MUST provide a toggle switch accessible only to Superadmin (`system.manage_all`) to enable/disable AI features system-wide
+- **FR-002**: System MUST persist AI enabled/disabled state to `system_settings` table with Redis caching for <1ms latency on status checks
+- **FR-003**: System MUST display disabled AI buttons with explanatory tooltips to regular users when AI is turned off
+- **FR-004**: System MUST show a global banner at the top of all pages when AI is disabled, visible only to users with AI permissions
+- **FR-005**: System MUST return HTTP 503 Service Unavailable to regular users attempting AI API calls when AI is disabled
+- **FR-006**: System MUST allow Superadmins full AI access (including sandbox) even when AI is disabled for regular users
+- **FR-007**: System MUST provide health monitoring dashboard showing Ollama latency, model version, Qdrant stats, and BullMQ queue metrics
+- **FR-008**: System MUST cache health check results for 30 seconds to prevent excessive infrastructure load
+- **FR-009**: System MUST provide isolated RAG sandbox queue (`ai-admin-sandbox`) with SUPERADMIN job priority
+- **FR-010**: System MUST provide isolated OCR sandbox for PDF metadata extraction with JSON output and syntax highlighting
+- **FR-011**: System MUST implement dynamic rate limiting for sandbox based on queue length (queue < 3: no limit, queue >= 3: 10 req/hr)
+- **FR-012**: System MUST poll AI status every 30 seconds from frontend for users with AI permissions
+- **FR-013**: System MUST support job status polling every 5 seconds for sandbox operations
+- **FR-014**: System MUST implement `AiEnabledGuard` with a layered permission check: holders of `system.manage_all` bypass the AI-enabled check, while holders of `ai.suggest`/`ai.rag_query` are admitted only while AI is enabled
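One reading of FR-005, FR-006, and FR-014 is the decision function sketched below. The permission names and the 503 response come from the spec. The `decideAiAccess` name, the `GuardDecision` shape, the message texts, and the 403 for users without any AI permission are assumptions. A real guard would presumably wrap this logic in the framework's guard interface and check the permission required by the specific endpoint.

```typescript
// Hypothetical result type for the guard decision.
type GuardDecision =
  | { allowed: true }
  | { allowed: false; status: 403 | 503; message: string };

const BYPASS_PERMISSION = "system.manage_all";       // from the spec
const AI_PERMISSIONS = ["ai.suggest", "ai.rag_query"]; // from the spec

function decideAiAccess(
  permissions: readonly string[],
  aiEnabled: boolean,
): GuardDecision {
  // Layer 1: Superadmins keep full access even while AI is disabled (FR-006).
  if (permissions.includes(BYPASS_PERMISSION)) {
    return { allowed: true };
  }
  // Layer 2: the caller needs at least one AI permission (403 is an assumption).
  const hasAiPermission = AI_PERMISSIONS.some((p) => permissions.includes(p));
  if (!hasAiPermission) {
    return { allowed: false, status: 403, message: "Missing AI permission" };
  }
  // Layer 3: regular users are blocked with 503 while AI is disabled (FR-005).
  if (!aiEnabled) {
    return {
      allowed: false,
      status: 503,
      message: "AI features are temporarily unavailable. Please try again later.",
    };
  }
  return { allowed: true };
}
```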
+
+### Key Entities
+
+- **SystemSetting**: Stores dynamic configuration values (AI_FEATURES_ENABLED, etc.) with metadata (data_type, category, validation_rules)
+- **SandboxJob**: Represents a sandbox operation (RAG query or OCR extraction) with priority, status, and results
+- **HealthStatus**: Aggregated health metrics from Ollama, Qdrant, and BullMQ with status indicators (HEALTHY/DEGRADED/DOWN)
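The three entities could be typed roughly as follows. Apart from the names `SystemSetting`, `SandboxJob`, `HealthStatus`, the status values HEALTHY/DEGRADED/DOWN, and the metadata fields listed above, every field name and union member here is an assumption, as is the "worst status wins" aggregation rule.

```typescript
type ServiceStatus = "HEALTHY" | "DEGRADED" | "DOWN";

// Assumed camelCase mapping of data_type, category, validation_rules.
interface SystemSetting {
  key: string; // e.g. "AI_FEATURES_ENABLED"
  value: string;
  dataType: "boolean" | "number" | "string";
  category: string;
  validationRules?: string;
}

// Assumed shape: the spec only names priority, status, and results.
interface SandboxJob {
  id: string;
  kind: "RAG_QUERY" | "OCR_EXTRACTION";
  priority: number;
  status: "WAITING" | "ACTIVE" | "COMPLETED" | "FAILED";
  result?: unknown;
}

interface HealthStatus {
  ollama: ServiceStatus;
  qdrant: ServiceStatus;
  bullmq: ServiceStatus;
  overall: ServiceStatus;
}

const STATUS_RANK: Record<ServiceStatus, number> = {
  HEALTHY: 0,
  DEGRADED: 1,
  DOWN: 2,
};

// Assumed aggregation rule: the overall status is the worst individual status.
function worstStatus(statuses: readonly ServiceStatus[]): ServiceStatus {
  return statuses.reduce<ServiceStatus>(
    (worst, s) => (STATUS_RANK[s] > STATUS_RANK[worst] ? s : worst),
    "HEALTHY",
  );
}

function buildHealthStatus(
  ollama: ServiceStatus,
  qdrant: ServiceStatus,
  bullmq: ServiceStatus,
): HealthStatus {
  return { ollama, qdrant, bullmq, overall: worstStatus([ollama, qdrant, bullmq]) };
}
```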
+
+---
+
+## Success Criteria
+
+### Measurable Outcomes
+
+- **SC-001**: Superadmin can toggle AI system state with changes reflected to regular users within 30 seconds
+- **SC-002**: AI status check API responds in under 1ms when cached, under 50ms on cache miss
+- **SC-003**: 100% of regular users see disabled AI buttons with tooltips when AI is turned off (no hidden or broken UI)
+- **SC-004**: Health dashboard displays all 3 services (Ollama, Qdrant, BullMQ) with data staleness bounded by the 30-second health cache (FR-008)
+- **SC-005**: Sandbox RAG queries return complete responses with citations within 2x normal queue processing time
+- **SC-006**: Sandbox OCR extraction returns valid JSON for 95% of test PDFs with clear error messages for failures
+- **SC-007**: Zero unauthorized access to admin endpoints (verified by security tests)
+- **SC-008**: System degrades gracefully when AI is disabled, with zero error reports from confused users