feat(rfa-ai): Complete RFA Approval Refactor and AI Model Revision
@@ -0,0 +1,95 @@
# Cross-Spec: BullMQ Queue Coordination

**Date**: 2026-05-16
**Features**: 204-rfa-approval-refactor + 302-ai-model-revision
**Document**: Coordination strategy for shared BullMQ infrastructure

---

## Queue Overview

| Queue | Feature | Job Types | Priority | Notes |
|-------|---------|-----------|----------|-------|
| `ai-realtime` | AI Model Revision | ai-suggest, rag-query | HIGH | Interactive, must not be blocked |
| `ai-batch` | AI Model Revision | ocr, extract-metadata, embed-document | LOW | Batch processing, can be paused |
| `rfa-reminders` | RFA Approval | reminder-send, escalation | MEDIUM | Scheduled notifications |
| `rfa-distribution` | RFA Approval | distribute-document | MEDIUM | Post-approval distribution |

---

## Coordination Rules

### 1. Queue Isolation

```typescript
// AI queues are isolated from RFA queues
// Each feature has dedicated queue names
export const QUEUE_AI_REALTIME = 'ai-realtime';
export const QUEUE_AI_BATCH = 'ai-batch';
export const QUEUE_RFA_REMINDERS = 'rfa-reminders';
export const QUEUE_RFA_DISTRIBUTION = 'rfa-distribution';
```

### 2. Priority Strategy

| Priority Level | Queue | Use Case |
|---------------|-------|----------|
| 1 (Highest) | ai-realtime | User-facing AI suggestions |
| 2 | rfa-reminders | Due date notifications |
| 3 | rfa-distribution | Document distribution |
| 4 (Lowest) | ai-batch | Background embedding |

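The ranking in the table can be captured as plain data. The sketch below is illustrative only: `QueueName`, `QUEUE_PRIORITY` and `byPriority` are hypothetical names, not part of the committed code. Note that BullMQ's `priority` job option orders jobs inside a single queue, so a cross-queue ranking like this one has to be enforced by the application (for example through the pause/resume rule in the next section).

```typescript
// Hypothetical names for illustration; not taken from the codebase.
type QueueName = 'ai-realtime' | 'rfa-reminders' | 'rfa-distribution' | 'ai-batch';

// Lower number = more urgent, matching the priority table.
const QUEUE_PRIORITY: Record<QueueName, number> = {
  'ai-realtime': 1,
  'rfa-reminders': 2,
  'rfa-distribution': 3,
  'ai-batch': 4,
};

// Returns a copy of the queue names ordered from most to least urgent.
function byPriority(queues: QueueName[]): QueueName[] {
  return queues.slice().sort((a, b) => QUEUE_PRIORITY[a] - QUEUE_PRIORITY[b]);
}

const order = byPriority(['ai-batch', 'rfa-distribution', 'ai-realtime', 'rfa-reminders']);
console.log(order.join(' > '));
```

Keeping the ranking in one typed record means a new queue cannot be added to `QueueName` without also being assigned a priority.
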
### 3. Auto-Pause Mechanism

```typescript
// AI Realtime Processor pauses ai-batch when active
@OnWorkerEvent('active')
async onActive() {
  await this.aiBatchQueue.pause();
}

// One handler per event: stacking two @OnWorkerEvent decorators on a single
// method is assumed to register only one of the events.
@OnWorkerEvent('completed')
async onCompleted() {
  await this.aiBatchQueue.resume();
}

@OnWorkerEvent('failed')
async onFailed() {
  await this.aiBatchQueue.resume();
}
```

### 4. Concurrency Limits

| Queue | Concurrency | Reason |
|-------|-------------|--------|
| ai-realtime | 1 | Shares the GPU with ai-batch |
| ai-batch | 1 | Shares the GPU with ai-realtime |
| rfa-reminders | 5 | Email notifications can be sent in parallel |
| rfa-distribution | 3 | Transmittal creation is moderately expensive |

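A small check can guard the GPU rule behind these limits. This is a minimal sketch under assumptions: `QueueLimits`, `QUEUE_LIMITS` and `gpuLimitViolations` are hypothetical names, and the `usesGpu` flags simply restate which queues the GPU coordination spec lists as GPU workloads.

```typescript
// Hypothetical shape for illustration; values mirror the concurrency table.
interface QueueLimits {
  concurrency: number;
  usesGpu: boolean;
}

const QUEUE_LIMITS: Record<string, QueueLimits> = {
  'ai-realtime': { concurrency: 1, usesGpu: true },
  'ai-batch': { concurrency: 1, usesGpu: true },
  'rfa-reminders': { concurrency: 5, usesGpu: false },
  'rfa-distribution': { concurrency: 3, usesGpu: false },
};

// Returns the names of GPU queues whose concurrency is not exactly 1.
function gpuLimitViolations(limits: Record<string, QueueLimits>): string[] {
  return Object.keys(limits).filter(
    (name) => limits[name].usesGpu && limits[name].concurrency !== 1,
  );
}

console.log(gpuLimitViolations(QUEUE_LIMITS).length === 0 ? 'GPU limits OK' : 'GPU limits violated');
```

A check like this could run at startup or in a unit test so that raising an AI queue's concurrency by accident is caught early.
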
### 5. Conflict Prevention

- **No job name conflicts**: Each job type has unique naming
- **No data cross-contamination**: Different payloads per queue
- **Separate Redis keys**: Queue prefixes ensure isolation

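The first and third rules above can be expressed as simple checks. The sketch below is an assumption-laden illustration: `JOB_TYPES`, `duplicateJobNames` and `redisKey` are hypothetical helpers, and the key layout `bull:<queue>:<suffix>` is inferred from the `redis-cli` commands in the Monitoring section rather than from the codebase.

```typescript
// Job types per queue, copied from the Queue Overview table.
const JOB_TYPES: Record<string, string[]> = {
  'ai-realtime': ['ai-suggest', 'rag-query'],
  'ai-batch': ['ocr', 'extract-metadata', 'embed-document'],
  'rfa-reminders': ['reminder-send', 'escalation'],
  'rfa-distribution': ['distribute-document'],
};

// Returns every job name that appears more than once across all queues.
function duplicateJobNames(jobTypes: Record<string, string[]>): string[] {
  const seen: string[] = [];
  const dupes: string[] = [];
  Object.keys(jobTypes).forEach((queue) => {
    jobTypes[queue].forEach((name) => {
      if (seen.indexOf(name) !== -1 && dupes.indexOf(name) === -1) {
        dupes.push(name);
      }
      seen.push(name);
    });
  });
  return dupes;
}

// Builds a Redis key such as "bull:ai-batch:wait" (prefix "bull" is assumed).
function redisKey(queue: string, suffix: string, prefix: string = 'bull'): string {
  return `${prefix}:${queue}:${suffix}`;
}

console.log(duplicateJobNames(JOB_TYPES).length, redisKey('ai-batch', 'wait'));
```

Because the queue name is part of every key, two queues with distinct names cannot collide on a key even when they share the same prefix.
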
---

## Monitoring

Check queue status:
```bash
# List BullMQ keys (SCAN-based, so it does not block Redis the way KEYS can)
redis-cli --scan --pattern "bull:*"

# Check queue lengths
redis-cli LLEN "bull:ai-realtime:wait"
redis-cli LLEN "bull:rfa-reminders:wait"
```

---

## Verification Checklist

- [x] `ai-realtime` and `ai-batch` have auto-pause/resume
- [x] `rfa-reminders` doesn't block AI queues
- [x] All queues have unique names
- [x] Concurrency configured per queue
- [x] Priority levels documented
@@ -0,0 +1,105 @@
# Cross-Spec: GPU Resource Coordination

**Date**: 2026-05-16
**Hardware**: RTX 2060 Super 8GB (Desk-5439)
**Target Peak**: ~4.5GB VRAM
**Document**: GPU scheduling strategy for AI workloads

---

## GPU Workload Overview

| Feature | Queue | GPU Usage | Duration | Frequency |
|---------|-------|-----------|----------|-----------|
| AI Model Revision | ai-realtime | High (gemma4:e4b) | 5-30s | On user action |
| AI Model Revision | ai-batch | High (gemma4:e4b) | 30-120s | Background |
| RFA Approval | rfa-reminders | None | - | - |
| RFA Approval | rfa-distribution | None | - | - |

---

## Scheduling Strategy

### 1. Time-Based Scheduling

```
Peak Hours (09:00-18:00):
├── ai-realtime: ACTIVE (user requests)
└── ai-batch: PAUSED (defer to off-peak)

Off-Peak Hours (18:00-09:00):
├── ai-realtime: ACTIVE (reduced load)
└── ai-batch: ACTIVE (background processing)
```

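The schedule above reduces to a small predicate. This is a sketch under stated assumptions: hours are taken in the server's local time, 18:00 itself is treated as off-peak (the two ranges in the diagram touch at that boundary), and `isPeakHour` and `batchMayRun` are hypothetical names. The second function also folds in the rule that ai-batch yields to any active realtime job.

```typescript
const PEAK_START_HOUR = 9;  // 09:00, inclusive
const PEAK_END_HOUR = 18;   // 18:00, exclusive (assumption: 18:00 counts as off-peak)

// True when the given hour (0-23) falls inside the peak window.
function isPeakHour(hour: number): boolean {
  return hour >= PEAK_START_HOUR && hour < PEAK_END_HOUR;
}

// ai-batch may run only off-peak and only while no realtime job is active.
function batchMayRun(hour: number, activeRealtimeJobs: number): boolean {
  return !isPeakHour(hour) && activeRealtimeJobs === 0;
}

const currentHour = new Date().getHours();
console.log(`ai-batch may run now: ${batchMayRun(currentHour, 0)}`);
```

Routing every resume decision through one predicate like this would keep an event-driven resume from re-enabling ai-batch in the middle of peak hours.
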
### 2. Dynamic Pause/Resume

```typescript
// AiRealtimeProcessor auto-manages ai-batch.
// Excerpt: injection of aiRealtimeQueue, aiBatchQueue and logger is omitted.
@Processor(QUEUE_AI_REALTIME, { concurrency: 1 })
export class AiRealtimeProcessor {
  @OnWorkerEvent('active')
  async pauseBatch() {
    await this.aiBatchQueue.pause();
    this.logger.log('Paused ai-batch for realtime job');
  }

  @OnWorkerEvent('completed')
  async resumeBatchOnCompleted() {
    await this.resumeBatchIfIdle();
  }

  // A failed realtime job must also release ai-batch; otherwise it stays paused.
  @OnWorkerEvent('failed')
  async resumeBatchOnFailed() {
    await this.resumeBatchIfIdle();
  }

  private async resumeBatchIfIdle() {
    const activeCount = await this.aiRealtimeQueue.getActiveCount();
    if (activeCount === 0) {
      await this.aiBatchQueue.resume();
      this.logger.log('Resumed ai-batch (no active realtime jobs)');
    }
  }
}
```

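The pause/resume decision itself can be isolated from BullMQ and tested as pure logic. The sketch below is a hypothetical helper (`RealtimeGate` and `BatchAction` are not in the codebase): it counts active realtime jobs in process memory and reports which action the caller should apply to ai-batch. With several worker processes, the count would need to come from the queue itself, as the `getActiveCount()` call above does.

```typescript
type BatchAction = 'pause' | 'resume' | 'none';

// Tracks active realtime jobs and decides what to do with ai-batch.
class RealtimeGate {
  private active = 0;

  // Call when a realtime job becomes active.
  onActive(): BatchAction {
    this.active += 1;
    // Only the first active job needs to pause ai-batch.
    return this.active === 1 ? 'pause' : 'none';
  }

  // Call when a realtime job completes or fails.
  onFinished(): BatchAction {
    this.active = Math.max(0, this.active - 1);
    // Resume only once the last realtime job has finished.
    return this.active === 0 ? 'resume' : 'none';
  }
}

const gate = new RealtimeGate();
const first = gate.onActive();    // first realtime job starts
const second = gate.onActive();   // a second one overlaps
const third = gate.onFinished();  // one finishes, one still running
const fourth = gate.onFinished(); // last one finishes
console.log(first, second, third, fourth);
```

With `concurrency: 1` the counter never exceeds one, so the gate behaves exactly like the handlers above; it only starts to matter if the realtime concurrency is ever raised.
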
### 3. VRAM Budget Management

| Model | VRAM Usage | Context |
|-------|------------|---------|
| gemma4:e4b Q8_0 | ~4.5GB peak | Main inference |
| nomic-embed-text | ~0.5GB | Embedding only |
| **Total Budget** | **~5GB** | Leaves ~3GB headroom on the 8GB card |

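The budget arithmetic is simple enough to check in code. A minimal sketch, assuming the approximate figures from the table and the 8GB card named in the header; `MODEL_VRAM_GB` and `vramHeadroomGb` are hypothetical names.

```typescript
const TOTAL_VRAM_GB = 8; // RTX 2060 Super

// Approximate peak usage per model, taken from the VRAM table.
const MODEL_VRAM_GB: Record<string, number> = {
  'gemma4:e4b Q8_0': 4.5,
  'nomic-embed-text': 0.5,
};

// Returns the VRAM left over (in GB) when the given models are loaded together.
function vramHeadroomGb(loadedModels: string[]): number {
  let usedGb = 0;
  loadedModels.forEach((model) => {
    const cost = MODEL_VRAM_GB[model];
    if (cost === undefined) {
      throw new Error(`Unknown model: ${model}`);
    }
    usedGb += cost;
  });
  return TOTAL_VRAM_GB - usedGb;
}

console.log(vramHeadroomGb(['gemma4:e4b Q8_0', 'nomic-embed-text'])); // 8 - (4.5 + 0.5) = 3
```
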
### 4. Contention Prevention

- **Single Model Loading**: Only gemma4:e4b is loaded at a time
- **No Concurrent GPU Jobs**: concurrency=1 for both AI queues
- **Memory Cleanup**: Explicit cleanup after each job
- **Batch Pausing**: ai-batch is paused while ai-realtime has an active job

---

## Monitoring Commands

```bash
# Monitor GPU usage on Desk-5439
watch -n 1 nvidia-smi

# Check Ollama model status
curl http://192.168.10.100:11434/api/ps

# Monitor queue states (SCAN-based, so it does not block Redis the way KEYS can)
redis-cli --scan --pattern "bull:*:meta"
```

---

## Fallback Strategy

If the GPU is unavailable:
1. ai-realtime: Return "AI service temporarily unavailable"
2. ai-batch: Queue jobs with delay, retry every 5 minutes
3. RFA features: Unaffected (no GPU usage)

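The three fallback rules can be sketched as one decision function. Everything here is illustrative: `FallbackAction` and `fallbackFor` are hypothetical names, and how GPU availability is detected is left outside the sketch.

```typescript
type FallbackAction =
  | { kind: 'proceed' }
  | { kind: 'reject'; message: string }
  | { kind: 'retry'; delayMs: number };

const BATCH_RETRY_DELAY_MS = 5 * 60 * 1000; // retry every 5 minutes

// Decides how a queue should react to the current GPU state.
function fallbackFor(queue: string, gpuAvailable: boolean): FallbackAction {
  if (gpuAvailable) {
    return { kind: 'proceed' };
  }
  if (queue === 'ai-realtime') {
    // Interactive requests fail fast with a user-facing message.
    return { kind: 'reject', message: 'AI service temporarily unavailable' };
  }
  if (queue === 'ai-batch') {
    // Background work is delayed and retried later.
    return { kind: 'retry', delayMs: BATCH_RETRY_DELAY_MS };
  }
  // RFA queues never touch the GPU, so they are unaffected.
  return { kind: 'proceed' };
}

console.log(fallbackFor('ai-realtime', false).kind);
```

Returning a tagged result rather than acting directly keeps the rule testable without a GPU or a running queue.
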
---

## Verification Checklist

- [x] ai-realtime has auto-pause for ai-batch
- [x] concurrency=1 for both AI queues
- [x] VRAM monitoring in place
- [x] Fallback handling for GPU unavailability
- [x] RFA queues don't use GPU