3.7 KiB
3.7 KiB
// File: specs/200-fullstacks/235-ai-runtime-policy-refactor/quickstart.md // Change Log: // - 2026-06-11: Verification quickstart for AI Runtime Policy Refactor
Quickstart: AI Runtime Policy Refactor — Verification Guide
Prerequisites
- Backend running (
pnpm run start:devinbackend/) - OCR sidecar running on Desk-5439 (
docker compose upin ocr-sidecar/) - Ollama running with
np-dms-aiandnp-dms-ocrtags registered - Admin user token available
Gate 1: Policy Contract Verification
1A. Reject model.key (should return 400)
curl -X POST http://localhost:3001/api/ai/jobs \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"type": "rag-query", "model": {"key": "typhoon2.5-np-dms:latest"}}' \
| jq '.statusCode, .message'
# Expected: 400, message about model.key not allowed
1B. Reject parameter overrides (should return 400)
curl -X POST http://localhost:3001/api/ai/jobs \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"type": "rag-query", "temperature": 0.9}' \
| jq '.statusCode'
# Expected: 400
1C. Valid executionProfile (should return 201)
curl -X POST http://localhost:3001/api/ai/jobs \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"type": "rag-query", "executionProfile": "balanced", "documentPublicId": "<uuid>"}' \
| jq '.data.modelUsed'
# Expected: "np-dms-ai"
1D. large-context by non-admin (should return 403)
curl -X POST http://localhost:3001/api/ai/jobs \
-H "Authorization: Bearer $NON_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"type": "rag-query", "executionProfile": "large-context"}' \
| jq '.statusCode'
# Expected: 403
Gate 2: Canonical Naming Verification
2A. Check audit log after job
SELECT metadata->>'$.modelUsed' FROM ai_audit_logs ORDER BY created_at DESC LIMIT 1;
-- Expected: "np-dms-ai" (ไม่ใช่ "typhoon2.5-np-dms:latest")
2B. Check Admin Console (Manual)
- เปิด
/admin/aiใน browser - ตรวจว่า model labels ทั้งหมดแสดง
np-dms-aiและnp-dms-ocr - ตรวจว่าไม่มี
typhoon*ปรากฏใน UI
Gate 3: Adaptive OCR Residency Verification
3A. OCR under large-context profile
# ส่ง OCR job ขณะที่มี large-context job active
# ดู sidecar log
docker logs ocr-sidecar --tail 20
# Expected log line: keep_alive=0 reason=large-context-active
3B. OCR with headroom sufficient
# ส่ง OCR job เมื่อ GPU headroom สูง (ไม่มี model loaded หนัก)
docker logs ocr-sidecar --tail 20
# Expected log line: keep_alive=120 reason=headroom-sufficient
Gate 4: Retrieval CPU Fallback Verification
4A. Force GPU pressure then run RAG
# 1. Force load large model
curl http://localhost:11434/api/generate -d '{"model":"np-dms-ai","prompt":"warmup","keep_alive":-1}'
# 2. Run RAG query
curl -X POST http://localhost:3001/api/ai/jobs \
-H "Authorization: Bearer $TOKEN" \
-d '{"type":"rag-query","executionProfile":"balanced","documentPublicId":"<uuid>"}' \
| jq '.data.status'
# Expected: "completed" (ไม่ fail)
# 3. ตรวจ sidecar log
docker logs ocr-sidecar --tail 20
# Expected: device=cpu reason=gpu-headroom-below-threshold
Automated Test Suite
# Backend unit + integration tests
cd backend
pnpm test -- --testPathPattern="ai-policy|ocr-residency|execution-profile"
# Sidecar tests
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
pytest tests/ -v
All tests must pass before cutover gate is considered complete.