feat(rfa-ai): Complete RFA Approval Refactor and AI Model Revision

2026-05-16 10:59:53 +07:00
parent 6cb3ae10ee
commit 1a162bf320
105 changed files with 5088 additions and 1083 deletions
@@ -0,0 +1,95 @@
+# Cross-Spec: BullMQ Queue Coordination
+
+**Date**: 2026-05-16  
+**Features**: 204-rfa-approval-refactor + 302-ai-model-revision  
+**Document**: Coordination strategy for shared BullMQ infrastructure
+
+---
+
+## Queue Overview
+
+| Queue | Feature | Job Types | Priority | Notes |
+|-------|---------|-----------|----------|-------|
+| `ai-realtime` | AI Model Revision | ai-suggest, rag-query | HIGH | Interactive, must not be blocked |
+| `ai-batch` | AI Model Revision | ocr, extract-metadata, embed-document | LOW | Batch processing, can be paused |
+| `rfa-reminders` | RFA Approval | reminder-send, escalation | MEDIUM | Scheduled notifications |
+| `rfa-distribution` | RFA Approval | distribute-document | MEDIUM | Post-approval distribution |
+
+---
+
+## Coordination Rules
+
+### 1. Queue Isolation
+
+```typescript
+// AI queues are isolated from RFA queues
+// Each feature has dedicated queue names
+export const QUEUE_AI_REALTIME = 'ai-realtime';
+export const QUEUE_AI_BATCH = 'ai-batch';
+export const QUEUE_RFA_REMINDERS = 'rfa-reminders';
+export const QUEUE_RFA_DISTRIBUTION = 'rfa-distribution';
+```
+
+### 2. Priority Strategy
+
+| Priority Level | Queue | Use Case |
+|---------------|-------|----------|
+| 1 (Highest) | ai-realtime | User-facing AI suggestions |
+| 2 | rfa-reminders | Due date notifications |
+| 3 | rfa-distribution | Document distribution |
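+| 4 (Lowest) | ai-batch | Background OCR, metadata extraction, and embedding |
+
+These levels describe precedence between queues. BullMQ's job `priority` option only orders jobs within a single queue, so cross-queue precedence relies on the auto-pause mechanism and the concurrency limits described in this document.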
+
+### 3. Auto-Pause Mechanism
+
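+```typescript
+// AI Realtime Processor pauses ai-batch while a realtime job is active.
+// pause() stops ai-batch from picking up new jobs; a batch job that is
+// already running continues until it finishes.
+@OnWorkerEvent('active')
+async onActive() {
+  await this.aiBatchQueue.pause();
+}
+
+// One handler per event: stacking two @OnWorkerEvent decorators on a single
+// method is unreliable, because both write the same metadata key.
+@OnWorkerEvent('completed')
+async onCompleted() {
+  await this.aiBatchQueue.resume();
+}
+
+@OnWorkerEvent('failed')
+async onFailed() {
+  await this.aiBatchQueue.resume();
+}
+```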
+
+### 4. Concurrency Limits
+
+| Queue | Concurrency | Reason |
+|-------|-------------|--------|
+| ai-realtime | 1 | GPU sharing with ai-batch |
+| ai-batch | 1 | GPU sharing with ai-realtime |
+| rfa-reminders | 5 | Email notifications are lightweight and can be sent in parallel |
+| rfa-distribution | 3 | Transmittal creation is moderately heavy |
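+
+A minimal sketch of how these limits could be kept in one place, so that each worker reads its concurrency from the same table. The names `QUEUE_CONCURRENCY`, `GPU_QUEUES`, `concurrencyFor`, and `gpuQueuesAreSerial` are hypothetical and not part of this commit; the values are copied from the table above.
+
+```typescript
+// Hypothetical central config; values mirror the concurrency table.
+const QUEUE_CONCURRENCY = {
+  'ai-realtime': 1,
+  'ai-batch': 1,
+  'rfa-reminders': 5,
+  'rfa-distribution': 3,
+} as const;
+
+type QueueName = keyof typeof QUEUE_CONCURRENCY;
+
+// Queues that share the single GPU and therefore must stay at concurrency 1.
+const GPU_QUEUES: readonly QueueName[] = ['ai-realtime', 'ai-batch'];
+
+/** Returns the configured worker concurrency for a queue. */
+function concurrencyFor(queue: QueueName): number {
+  return QUEUE_CONCURRENCY[queue];
+}
+
+/** True when every GPU-bound queue is limited to one job at a time. */
+function gpuQueuesAreSerial(): boolean {
+  return GPU_QUEUES.every((queue) => concurrencyFor(queue) === 1);
+}
+```
+
+A check like `gpuQueuesAreSerial()` could run at startup to catch a config change that would let two GPU jobs overlap.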
+
+### 5. Conflict Prevention
+
+- **No job name conflicts**: Each job type has unique naming
+- **No data cross-contamination**: Different payloads per queue
+- **Separate Redis keys**: BullMQ stores each queue under `bull:<queue-name>:*`, so distinct queue names keep the keys isolated
+
+---
+
+## Monitoring
+
+Check queue status:
+```bash
+# List BullMQ keys (SCAN-based; avoids blocking Redis the way KEYS can)
+redis-cli --scan --pattern "bull:*"
+
+# Check queue lengths
+redis-cli LLEN "bull:ai-realtime:wait"
+redis-cli LLEN "bull:rfa-reminders:wait"
+```
+
+---
+
+## Verification Checklist
+
+- [x] `ai-realtime` and `ai-batch` have auto-pause/resume
+- [x] `rfa-reminders` doesn't block AI queues
+- [x] All queues have unique names
+- [x] Concurrency configured per queue
+- [x] Priority levels documented
@@ -0,0 +1,105 @@
+# Cross-Spec: GPU Resource Coordination
+
+**Date**: 2026-05-16  
+**Hardware**: RTX 2060 Super 8GB (Desk-5439)  
+**Target Peak**: ~4.5GB VRAM  
+**Document**: GPU scheduling strategy for AI workloads
+
+---
+
+## GPU Workload Overview
+
+| Feature | Queue | GPU Usage | Duration | Frequency |
+|---------|-------|-----------|----------|-----------|
+| AI Model Revision | ai-realtime | High (gemma4:e4b) | 5-30s | On user action |
+| AI Model Revision | ai-batch | High (gemma4:e4b) | 30-120s | Background |
+| RFA Approval | rfa-reminders | None | - | - |
+| RFA Approval | rfa-distribution | None | - | - |
+
+---
+
+## Scheduling Strategy
+
+### 1. Time-Based Scheduling
+
+```
+Peak Hours (09:00-18:00):
+├── ai-realtime: ACTIVE (user requests)
+└── ai-batch: PAUSED (defer to off-peak)
+
+Off-Peak Hours (18:00-09:00):
+├── ai-realtime: ACTIVE (reduced load)
+└── ai-batch: ACTIVE (background processing)
+```
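+
+The schedule above can be expressed as a small pure function. This is a sketch only: `isPeakHour` and `desiredBatchState` are hypothetical names, the window is treated as 09:00 inclusive to 18:00 exclusive, and the server's local time zone is assumed to be the one the schedule refers to.
+
+```typescript
+// Peak window taken from the schedule: 09:00 (inclusive) to 18:00 (exclusive).
+const PEAK_START_HOUR = 9;
+const PEAK_END_HOUR = 18;
+
+type BatchState = 'ACTIVE' | 'PAUSED';
+
+/** True when the local hour of `now` falls inside the peak window. */
+function isPeakHour(now: Date): boolean {
+  const hour = now.getHours();
+  return hour >= PEAK_START_HOUR && hour < PEAK_END_HOUR;
+}
+
+/** Desired ai-batch state for a given time, following the schedule. */
+function desiredBatchState(now: Date): BatchState {
+  return isPeakHour(now) ? 'PAUSED' : 'ACTIVE';
+}
+```
+
+A scheduled task could call `desiredBatchState(new Date())` and pause or resume `ai-batch` accordingly. During off-peak hours the dynamic pause/resume logic still applies on top of this.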
+
+### 2. Dynamic Pause/Resume
+
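+```typescript
+import { InjectQueue, OnWorkerEvent, Processor, WorkerHost } from '@nestjs/bullmq';
+import { Logger } from '@nestjs/common';
+import { Queue } from 'bullmq';
+
+// AiRealtimeProcessor auto-manages ai-batch (process() omitted from this excerpt)
+@Processor(QUEUE_AI_REALTIME, { concurrency: 1 })
+export class AiRealtimeProcessor extends WorkerHost {
+  private readonly logger = new Logger(AiRealtimeProcessor.name);
+
+  constructor(
+    @InjectQueue(QUEUE_AI_REALTIME) private readonly aiRealtimeQueue: Queue,
+    @InjectQueue(QUEUE_AI_BATCH) private readonly aiBatchQueue: Queue,
+  ) { super(); }
+
+  @OnWorkerEvent('active')
+  async pauseBatch() {
+    await this.aiBatchQueue.pause();
+    this.logger.log('Paused ai-batch for realtime job');
+  }
+
+  // A failed job must also release the pause, or ai-batch stays paused.
+  @OnWorkerEvent('completed')
+  async onCompleted() { await this.resumeBatchIfIdle(); }
+
+  @OnWorkerEvent('failed')
+  async onFailed() { await this.resumeBatchIfIdle(); }
+
+  private async resumeBatchIfIdle() {
+    const activeCount = await this.aiRealtimeQueue.getActiveCount();
+    if (activeCount === 0) {
+      await this.aiBatchQueue.resume();
+      this.logger.log('Resumed ai-batch (no active realtime jobs)');
+    }
+  }
+}
+```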
+
+### 3. VRAM Budget Management
+
+| Model | VRAM Usage | Context |
+|-------|------------|---------|
+| gemma4:e4b Q8_0 | ~4.5GB peak | Main inference |
+| nomic-embed-text | ~0.5GB | Embedding only |
+| **Total Budget** | **~5GB** | Leaves a ~3GB safety margin on the 8GB card |
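+
+The budget arithmetic can be checked with a short sketch. The per-model figures are the approximate values from the table, the 8GB total is the RTX 2060 Super listed in the header, and the function names are hypothetical.
+
+```typescript
+// Approximate figures from the VRAM table, in GB.
+const TOTAL_VRAM_GB = 8; // RTX 2060 Super
+const MODEL_VRAM_GB: Record<string, number> = {
+  'gemma4:e4b': 4.5,
+  'nomic-embed-text': 0.5,
+};
+
+/** Sums the VRAM needed by the given models; throws on an unknown name. */
+function vramRequiredGb(models: string[]): number {
+  return models.reduce((total, model) => {
+    const usage = MODEL_VRAM_GB[model];
+    if (usage === undefined) {
+      throw new Error(`Unknown model: ${model}`);
+    }
+    return total + usage;
+  }, 0);
+}
+
+/** VRAM left on the card after loading the given models. */
+function vramHeadroomGb(models: string[]): number {
+  return TOTAL_VRAM_GB - vramRequiredGb(models);
+}
+```
+
+With both models counted, `vramRequiredGb` gives 5 and `vramHeadroomGb` gives 3, matching the total budget and safety margin in the table.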
+
+### 4. Contention Prevention
+
+- **Single Model Loading**: Only gemma4:e4b loaded at a time
+- **No Concurrent GPU Jobs**: concurrency=1 for both AI queues
+- **Memory Cleanup**: Explicit cleanup after each job
+- **Batch Pausing**: ai-batch is paused while ai-realtime has an active job
+
+---
+
+## Monitoring Commands
+
+```bash
+# Monitor GPU usage on Desk-5439
+watch -n 1 nvidia-smi
+
+# Check Ollama model status
+curl http://192.168.10.100:11434/api/ps
+
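+# Monitor queue states (SCAN-based; avoids blocking Redis the way KEYS can)
+redis-cli --scan --pattern "bull:*:meta"
+
+# A paused queue is expected to carry a "paused" field in its meta hash
+redis-cli HEXISTS "bull:ai-batch:meta" paused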
+```
+
+---
+
+## Fallback Strategy
+
+If the GPU is unavailable:
+1. ai-realtime: Return "AI service temporarily unavailable" to the caller
+2. ai-batch: Re-queue jobs with a delay and retry every 5 minutes
+3. RFA features: Unaffected (no GPU usage)
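+
+The 5-minute retry for ai-batch could be expressed as fixed-backoff job options. The sketch below uses a local `RetryOptions` interface that mirrors the `attempts` and `backoff` fields of BullMQ job options; `batchRetryOptions` and its `maxAttempts` parameter are hypothetical, since this document does not state how many retries are allowed.
+
+```typescript
+// Local mirror of the BullMQ job options used here (attempts, backoff).
+interface RetryOptions {
+  attempts: number;
+  backoff: { type: 'fixed'; delay: number };
+}
+
+// Message returned to ai-realtime callers, as stated in the fallback list.
+const AI_UNAVAILABLE_MESSAGE = 'AI service temporarily unavailable';
+
+// Retry interval for ai-batch jobs: 5 minutes, in milliseconds.
+const GPU_RETRY_DELAY_MS = 5 * 60 * 1000;
+
+/**
+ * Builds retry options for an ai-batch job. `maxAttempts` is an assumed cap;
+ * the spec does not say how many retries are allowed.
+ */
+function batchRetryOptions(maxAttempts: number): RetryOptions {
+  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
+    throw new RangeError('maxAttempts must be a positive integer');
+  }
+  return {
+    attempts: maxAttempts,
+    backoff: { type: 'fixed', delay: GPU_RETRY_DELAY_MS },
+  };
+}
+```
+
+An object of this shape could be passed as the options argument of `queue.add(name, data, options)` when an ai-batch job is enqueued.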
+
+---
+
+## Verification Checklist
+
+- [x] ai-realtime has auto-pause for ai-batch
+- [x] concurrency=1 for both AI queues
+- [x] VRAM monitoring in place
+- [x] Fallback handling for GPU unavailability
+- [x] RFA queues don't use GPU