Files
lcbp3/specs/200-fullstacks/235-ai-runtime-policy-refactor/validation-report.md
T
admin 0227b7b982
CI / CD Pipeline / build (push) Successful in 4m16s
CI / CD Pipeline / deploy (push) Successful in 11m51s
feat(ai-runtime): complete ai runtime policy refactor (ADR-035)
2026-06-12 08:07:15 +07:00

9.2 KiB
Raw Blame History

// File: specs/200-fullstacks/235-ai-runtime-policy-refactor/validation-report.md // Change Log: // - 2026-06-11: Initial validation report for feature 235

Validation Report: AI Runtime Policy Refactor

Date: 2026-06-11 Feature: 235-ai-runtime-policy-refactor Status: PARTIAL

Coverage Summary

Metric Count Percentage
Requirements Covered 22/25 88%
Acceptance Criteria Met 14/19 74%
Edge Cases Handled 6/7 86%
Tests Present 18/25 72%

What Was Validated

Requirement Matrix

Requirement Status Evidence Notes
FR-A01 Covered DTO forbidden fields + controller integration tests HTTP 400 path implemented
FR-A02 Partial DTO still accepts payload and projectPublicId Spec text conflicts with rag-query/query + tenant isolation contract
FR-A03 Covered AiPolicyService.getProfileForJobType() + AiService.submitUnifiedJob() Backend assigns profile from job type
FR-A04 Covered Admin Console + OCR Sandbox UI Visibility exists in UI; enforcement is by contract removal, not separate guard
FR-A05 Covered AiPolicyService.createJobPayload() Mapping includes profile, canonical model, snapshot params
FR-A06 Covered deterministic switch in getProfileForJobType() No unmapped internal job type found
FR-A07 Covered backend DTOs, frontend normalization, sandbox badge mapping Canonical labels present across layers inspected
FR-A08 Covered worker audit writes effectiveProfile, canonicalModel, snapshotParamsJson enqueue-time false success log removed
FR-A09 Covered createJobPayload() snapshot + worker uses payload snapshot Predictable per-dispatch parameters
FR-B01 Covered AiPolicyService default policy map + DB/cache lookup Runtime policy layer exists
FR-B02 Covered OcrService.calculateOcrResidency() Dynamic keep_alive decision implemented
FR-B03 Covered deep-analysis/high-pressure branches + residency tests Safe OCR unload path exists
FR-B04 Covered residency window branch + tests Positive keep_alive path exists
FR-B05 Covered VRAM query failure fallback + tests Safe default keep_alive=0 exists
FR-B06 Covered OcrService logs decision context Log behavior implemented, not live-verified
FR-C01 Covered /embed headroom check + CPU fallback Sidecar code present
FR-C02 Covered /rerank headroom check + CPU fallback Sidecar code present
FR-C03 Covered /embed + /rerank timeout -> HTTP 504 No partial result path found
FR-C04 Covered device/reason logging in sidecar Log behavior implemented
FR-C05 Partial rag-query backend path exists No executed integration/manual proof that fallback path completes end-to-end
FR-C06 Covered env threshold usage + safe default in VRAM query failure Configurable threshold present
FR-D01 Partial config default=2 + processor logic + unit tests No live worker concurrency proof beyond unit tests
FR-D02 Covered lightweight job classification list Matches spec set
FR-D03 Covered AiService.submitUnifiedJob() + realtime redirect tests rag-query stays in ai-batch
FR-D04 Covered active-job counter + queue policy tests Resume now waits for all realtime jobs

Acceptance Criteria Gaps

Scenario Status Notes
US1-3 Admin Console shows canonical names only Partial Code supports it, but no manual browser validation recorded
US1-5 OCR Sandbox reveals effective profile/modelUsed Partial UI/service evidence exists, but no executed sandbox validation record
US2-4 OCR logs residency decision with headroom Partial Logging code exists; no captured runtime log artifact
US3-4 RAG still answers under CPU fallback Partial Code path exists; no completed end-to-end run
US5-1 executable cutover gate Partial backend targeted tests passed, but sidecar pytest was not executed in this validation pass
US5-2 Admin Console labels manual check Missing T032 still unchecked
US5-3 OCR Sandbox behavior across headroom scenarios Missing T032 still unchecked

Edge Case Review

Edge Case Status Notes
VRAM query failure -> keep_alive: 0 Handled explicit safe default in backend + sidecar
caller sends forbidden profile/model fields Handled DTO/controller tests cover this
admin-only large-context when VRAM insufficient Partial spec branch is stale after contract removal; no current caller path exists
OCR job races with main model generation Handled high-pressure/deep-analysis path forces unload
CPU fallback timeout must fail clearly Handled 504 implemented
Ollama /api/ps schema drift after cutover Handled safe default available=0 path exists
headroom snapshot/request race acceptable Handled implementation follows spec assumption; no stronger synchronization introduced

Success Criteria Notes

Success Criterion Status Notes
SC-001 Likely Met automated rejection tests exist
SC-002 Partial code normalization exists; no full manual surface sweep attached
SC-003 Not Validated no latency measurement artifact
SC-004 Partial fallback code exists; no executed end-to-end proof
SC-005 Partial backend tests executed, sidecar pytest/manual cutover not completed
SC-006 Partial concurrency config + unit tests exist, no throughput measurement

Key Findings

  1. Implementation is broadly aligned with the runtime-policy refactor design, especially on policy mapping, canonical naming, adaptive OCR residency, retrieval CPU fallback, and queue pause/resume correctness.
  2. Validation cannot be promoted to PASS yet because the feature still lacks the manual Gate 14 evidence from quickstart.md and this pass did not execute the Python sidecar pytest suite.
  3. The spec artifact set contains one material inconsistency: FR-A02 says CreateAiJobDto should only expose type, documentPublicId, and attachmentPublicId, but the same spec and implemented contract require payload.query and projectPublicId for rag-query. The code follows the richer contract, not the literal FR-A02 text.
  4. quickstart.md is stale against the implemented Option B contract in at least Gate 1C, 1D, and 4A because it still sends executionProfile / large-context style caller input that the new DTO now forbids.

Recommendations

  1. Complete T032 by running the manual Gate 14 flow on a real backend + OCR sidecar environment and append the captured results to this feature folder.
  2. Run pytest specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests -v once the sidecar environment is ready, then update this report with the result.
  3. Reconcile FR-A02 and quickstart.md with the actual Option B contract so the validation target and operator guide no longer contradict the implementation.
  4. Add one end-to-end proof for FR-C05/SC-004: force GPU pressure, submit rag-query, and capture both successful response and sidecar device=cpu log.
  5. Add one concurrency-focused execution proof for FR-D01/SC-006 if the team wants PASS to include runtime throughput evidence rather than unit-level proof only.