690524:1919 ADR-028-228-migration #04
CI / CD Pipeline / build (push) Successful in 4m10s
CI / CD Pipeline / deploy (push) Successful in 3m52s

2026-05-24 19:19:46 +07:00
parent 93fd95a6b3
commit 1564f8648d
22 changed files with 1422 additions and 255 deletions
@@ -11,9 +11,9 @@
**Requirements:**
- **OS**: Windows 10/11 or Linux (Desk-5439)
- **GPU**: NVIDIA GPU with CUDA 11.8+ support (VRAM ≥ 6GB recommended)
- **GPU**: NVIDIA GPU with CUDA 11.8+ support (VRAM ≥ 4GB recommended)
- **Ollama Version**: ≥ 0.5.0
- **Models**: `gemma4:e2b` (Q4_K_M quantization) + `nomic-embed-text`
- **Models**: `gemma4:e2b` (Q4 quantization) + `nomic-embed-text`
**Verification Steps:**
@@ -30,7 +30,7 @@ nvidia-smi
ollama list
# Expected output:
# NAME ID SIZE MODIFIED
# gemma4:e2b <hash> 2.4 GB <timestamp>
# gemma4:e2b <hash> 2.0 GB <timestamp>
# nomic-embed-text <hash> 274 MB <timestamp>
# 4. Test model inference (quick test)
@@ -54,7 +54,7 @@ ollama pull nomic-embed-text
# Verify VRAM usage during inference
nvidia-smi --query-gpu=memory.used --format=csv,noheader
# Expected: < 5120 MB (5GB threshold per SC-003)
# Expected: < 3072 MB (3GB threshold per SC-003)
```
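The manual comparison against the SC-003 threshold can also be scripted. The following is a minimal sketch, not part of the original runbook: the function name `check_vram` and its default limit are illustrative assumptions, and it only expects `memory.used` lines on stdin in the form `nvidia-smi` prints them (for example `2048 MiB`).

```shell
# Sketch only: check_vram is a hypothetical helper, not an existing tool.
# Usage: <memory.used lines on stdin> | check_vram LIMIT_MIB
# Exits non-zero if any GPU reports usage at or above the limit.
check_vram() {
  local limit_mib="${1:-3072}"   # assumed default: 3GB threshold per SC-003
  awk -v limit="$limit_mib" '
    BEGIN { bad = 0 }
    NF == 0 { next }             # skip blank lines
    {
      used = $1 + 0              # "2048 MiB" -> 2048
      if (used >= limit) {
        bad = 1
        printf "FAIL: %d MiB >= %d MiB\n", used, limit
      } else {
        printf "OK: %d MiB < %d MiB\n", used, limit
      }
    }
    END { exit bad }
  '
}

# Example (requires an NVIDIA driver; run while an inference job is active):
#   nvidia-smi --query-gpu=memory.used --format=csv,noheader | check_vram 3072
```

Reading from stdin keeps the helper independent of the GPU, so the same check can be run against saved output from an earlier session.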
**Troubleshooting:**
@@ -285,7 +285,7 @@ curl http://192.168.10.XX:8765/health
### 7. GPU Resource Monitoring (Critical for SC-003)
**Requirements:**
- **VRAM Limit**: ≤ 5GB peak (per SC-003)
- **VRAM Limit**: ≤ 3GB peak (per SC-003)
- **Concurrency**: 1 job per queue (enforced by BullMQ)
**Verification Commands:**
@@ -303,12 +303,12 @@ nvidia-smi --query-gpu=timestamp,memory.used,utilization.gpu \
```
**Expected Behavior:**
- **ai-batch job**: VRAM peaks at ~2.5GB (gemma4:e2b Q4_K_M)
- **ai-realtime job**: VRAM peaks at ~2.5GB (same model)
- **ai-batch job**: VRAM peaks at ~2.0GB (gemma4:e2b Q4)
- **ai-realtime job**: VRAM peaks at ~2.0GB (same model)
- **No concurrent jobs**: ai-batch pauses when ai-realtime active (GPU protection)
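
The peak values above can be confirmed from a sampled log rather than read off by eye. This is a hedged sketch: `peak_vram` is a hypothetical helper, and it assumes the monitoring command writes CSV rows of `timestamp, memory.used, utilization.gpu` (for instance via `--format=csv,noheader,nounits`; the exact flags used in this runbook are not shown here).

```shell
# Sketch only: peak_vram is a hypothetical helper, not an existing tool.
# Reads CSV rows "timestamp, memory.used, utilization.gpu" on stdin and
# prints the largest memory.used value (in MiB) seen in the log.
peak_vram() {
  awk -F', *' '
    BEGIN { max = 0 }
    NF >= 2 {
      used = $2 + 0              # also tolerates a trailing " MiB" unit
      if (used > max) max = used
    }
    END { print max }
  '
}

# Example (vram.log is an assumed file name holding the sampled rows):
#   peak_vram < vram.log
```

Running it once per queue (ai-batch, then ai-realtime) gives one peak figure per job type to compare against the SC-003 limit.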
**Troubleshooting:**
- **VRAM overflow (>5GB)**: Switch to a lower-bit (more aggressive) quantization or use a GPU with more VRAM
- **VRAM overflow (>3GB)**: Switch to a lower-bit (more aggressive) quantization or use a GPU with more VRAM
- **GPU contention**: Verify BullMQ concurrency=1 enforcement
- **Slow inference**: Check GPU utilization; if the GPU is saturated, consider a smaller or lower-bit quantized model
@@ -399,7 +399,7 @@ grep -r "typhoon" backend/src --include="*.ts"
# 2. Measure VRAM peak during job run (verify SC-003):
nvidia-smi --query-gpu=memory.used --format=csv,noheader
# Expected: value < 5120 MB (5GB threshold per SC-003)
# Expected: value < 3072 MB (3GB threshold per SC-003)
# Repeat during both ai-batch and ai-realtime jobs to verify peak
```