refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
CI / CD Pipeline / build (push) Successful in 7m37s
CI / CD Pipeline / deploy (push) Failing after 20m15s

2026-06-20 16:37:04 +07:00
parent d418d791a4
commit a80ebef285
70 changed files with 5762 additions and 452 deletions
@@ -0,0 +1,374 @@
# Quickstart: OCR Sidecar Refactor
**Date**: 2026-06-20
**Purpose**: Deployment and testing guide for OCR sidecar refactor
## Prerequisites
- Access to Desk-5439 (192.168.10.100) with Docker
- Access to backend services (QNAP 192.168.10.8)
- Python 3.11+ for local testing (optional)
- pytest for testing (optional)
## Phase 1: Deployment (Before ADR-041 Consolidation)
### Step 1: Update Sidecar Code
1. Navigate to sidecar directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
```
2. Update `app.py` with the following changes:
- Remove hardcoded default API key
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
- Implement async I/O with `httpx.AsyncClient` via lifespan
- Replace `@app.on_event("startup")` with lifespan context manager
- Wire `calculate_ocr_residency()` into `process_ocr`
- Implement path canonicalization + base-path whitelist on `/ocr`
- Remove hardcoded runtime parameters
- Receive systemPrompt and DMS tags from backend
- Remove `/normalize` endpoint
- Fix mutable default argument `options_override={}`
- Load models via `asyncio.to_thread` during lifespan
3. Update `requirements.txt`:
```text
PyMuPDF==1.24.0
fastapi==0.111.0
uvicorn[standard]==0.30.1
python-multipart==0.0.9
httpx==0.27.0
FlagEmbedding>=1.2.0
typhoon-ocr>=0.4.1
```
4. Update `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_SIDECAR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
OCR_MODEL=np-dms-ocr:latest
```
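The startup changes listed in item 2 (fail-fast on a missing key, lifespan instead of `@app.on_event("startup")`, model loading via `asyncio.to_thread`) can be sketched as below. This is a minimal, stdlib-only illustration rather than the actual `app.py`: `require_api_key` and `load_models` are hypothetical names, and a plain object stands in for the FastAPI app. FastAPI accepts a function of this shape through `FastAPI(lifespan=lifespan)`.

```python
import asyncio
import os
from contextlib import asynccontextmanager
from types import SimpleNamespace


def require_api_key(env=None):
    """Return OCR_SIDECAR_API_KEY or raise; there is deliberately no default."""
    # None as the default avoids sharing one mutable object across calls.
    env = os.environ if env is None else env
    key = env.get("OCR_SIDECAR_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OCR_SIDECAR_API_KEY is not set; refusing to start")
    return key


def load_models():
    """Hypothetical stand-in for the blocking model load."""
    return {"ocr": "loaded"}


@asynccontextmanager
async def lifespan(app):
    # Startup: validate configuration first, then load models in a worker
    # thread so the event loop is not blocked while they load.
    app.state.api_key = require_api_key()
    app.state.models = await asyncio.to_thread(load_models)
    # The shared httpx.AsyncClient would be created here as well.
    try:
        yield
    finally:
        # Shutdown: release what startup created (and close the HTTP client).
        app.state.models = None


if __name__ == "__main__":
    async def demo():
        app = SimpleNamespace(state=SimpleNamespace())
        async with lifespan(app):
            print("models ready:", app.state.models)

    os.environ.setdefault("OCR_SIDECAR_API_KEY", "your-secure-api-key-here")
    asyncio.run(demo())
```

Raising during startup means the container exits immediately when the key is missing, which is easier to spot in `docker-compose logs` than a sidecar that runs with a silent default.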
### Step 2: Update Backend Services
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Add parameter resolution from `ai_execution_profiles` (row `ocr-extract`)
- Add Active Prompt resolution from `ai_prompts` (type `ocr_extraction`)
- Extract systemPrompt and DMS tags from Active Prompt
- Send resolved parameters to sidecar in OCR requests
- Keep sending the `X-API-Key` header to the sidecar (Phase 1)
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Same parameter resolution pattern as OcrService
- Keep sending the `X-API-Key` header to the sidecar (Phase 1)
3. Update backend `.env`:
```bash
# Phase 1 (before ADR-041)
OCR_API_URL=http://192.168.10.100:8765
OCR_API_KEY=your-secure-api-key-here
# Common variables
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
### Step 3: Rebuild and Deploy Sidecar
1. Build Docker image on Desk-5439:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
docker-compose build
```
2. Stop existing container:
```bash
docker-compose down
```
3. Start new container:
```bash
docker-compose up -d
```
4. Verify health:
```bash
curl http://192.168.10.100:8765/health
```
Expected response:
```json
{
"status": "healthy",
"timestamp": "2026-06-20T10:30:00Z",
"version": "1.0.0"
}
```
### Step 4: Deploy Backend Changes
1. Build backend:
```bash
cd backend
pnpm run build
```
2. Deploy backend containers (via the existing deploy script or manually):
```bash
# From repo root
./scripts/deploy.sh
```
3. Verify backend health:
```bash
curl http://localhost:3001/api/ai/health
```
## Phase 2: Deployment (After ADR-041 Consolidation)
**Note**: This phase can only be executed after ADR-041 server consolidation completes (single Docker host).
### Step 1: Remove X-API-Key from Sidecar
1. Update `app.py` on sidecar:
- Remove X-API-Key validation from all endpoints
- Remove `OCR_SIDECAR_API_KEY` environment variable check
2. Update `.env` on sidecar:
```bash
# Remove OCR_SIDECAR_API_KEY line
# Keep common variables
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
OLLAMA_API_URL=http://localhost:11434
OCR_MODEL=np-dms-ocr:latest
```
3. Rebuild and redeploy sidecar:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Step 2: Remove X-API-Key from Backend
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
- Remove X-API-Key header from sidecar requests
- Remove `OCR_API_KEY` environment variable usage
3. Update backend `.env`:
```bash
# Remove OCR_API_KEY line
# Keep common variables
OCR_API_URL=http://sidecar:8765 # Docker-internal URL
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
```
4. Rebuild and redeploy backend:
```bash
cd backend
pnpm run build
# Return to the repo root, where the deploy script lives
cd ..
./scripts/deploy.sh
```
## Testing
### Unit Tests (Sidecar)
1. Navigate to sidecar tests directory:
```bash
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests
```
2. Run path traversal tests:
```bash
pytest test_path_traversal.py -v
```
Expected output: All tests pass, path traversal attempts return 403
3. Run residency wiring tests:
```bash
pytest test_residency_wiring.py -v
```
Expected output: All tests pass, `calculate_ocr_residency()` is called correctly
### Integration Tests (Backend)
1. Run backend AI service tests:
```bash
cd backend
pnpm test ai/ocr.service.spec.ts
pnpm test ai/sandbox-ocr-engine.service.spec.ts
```
2. Verify parameter resolution from database:
- Check that `ai_execution_profiles` row `ocr-extract` exists
- Check that `ai_prompts` has active row for `ocr_extraction` type
- Verify parameters are correctly resolved and sent to sidecar
### Manual Testing
1. Test path traversal protection:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `403 Forbidden`
2. Test valid OCR request:
```bash
curl -X POST http://192.168.10.100:8765/ocr \
-H "Content-Type: application/json" \
-H "X-API-Key: your-api-key" \
-d '{
"pdf_path": "/mnt/uploads/temp/test.pdf",
"runtime_params": {
"temperature": 0.7,
"top_p": 0.9,
"repeat_penalty": 1.1,
"max_tokens": 4096
}
}'
```
Expected: `200 OK` with extracted text
3. Test parameter governance:
- Modify `ai_execution_profiles` row `ocr-extract` parameters
- Run OCR request
- Verify new parameters are used (check sidecar logs)
4. Test Active Prompt integration:
- Modify active prompt in `ai_prompts` for `ocr_extraction`
- Run OCR request
- Verify new system prompt is used
## Performance Testing
1. Benchmark async vs sync I/O:
```bash
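# Use Apache Bench or a similar tool; ocr_request.json holds the POST body
# Phase 1 only: the X-API-Key header is required until it is removed in Phase 2
ab -n 1000 -c 10 -p ocr_request.json -T application/json \
-H "X-API-Key: your-api-key" \
http://192.168.10.100:8765/ocr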
```
Expected: at least 20% higher throughput (requests per second) than the previous synchronous implementation
2. Monitor VRAM usage:
```bash
# On Desk-5439, monitor GPU usage during OCR operations
nvidia-smi -l 1
```
Expected: VRAM usage stays within limits, no exhaustion
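The `ab` benchmark in item 1 posts a file named `ocr_request.json`, which this guide does not define. One way to generate it, assuming the same request body as the valid OCR request under Manual Testing, is:

```python
import json
from pathlib import Path

# Same body as the manual "valid OCR request" example; adjust pdf_path to a
# file that actually exists under the sidecar's upload base.
payload = {
    "pdf_path": "/mnt/uploads/temp/test.pdf",
    "runtime_params": {
        "temperature": 0.7,
        "top_p": 0.9,
        "repeat_penalty": 1.1,
        "max_tokens": 4096,
    },
}

# Written to the current directory, where ab is expected to be run.
Path("ocr_request.json").write_text(json.dumps(payload), encoding="utf-8")
```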
## Monitoring
### Health Checks
- Sidecar health: `GET http://192.168.10.100:8765/health`
- Backend AI health: `GET http://localhost:3001/api/ai/health`
### Logs
- Sidecar logs: `docker-compose logs -f ocr-sidecar`
- Backend logs: Check backend application logs
### Metrics
- Monitor OCR request latency
- Monitor VRAM usage on Desk-5439
- Monitor error rates (403 for path traversal, 500 for internal errors)
## Rollback
If issues arise during deployment:
### Rollback Sidecar
1. Revert `app.py` to previous version
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
docker-compose down
docker-compose build
docker-compose up -d
```
### Rollback Backend
1. Revert service changes in `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
2. Restore previous `.env` file
3. Rebuild and redeploy:
```bash
cd backend
pnpm run build
# Return to the repo root, where the deploy script lives
cd ..
./scripts/deploy.sh
```
### Emergency Rollback
If immediate rollback is needed:
1. Revert `keep_alive` to fixed value `0` in `process_ocr`
2. Restore hardcoded runtime parameters
3. Restore X-API-Key validation
4. Rebuild and redeploy
## Troubleshooting
### Sidecar fails to start
1. Check environment variables are set correctly
2. Check `OCR_SIDECAR_API_KEY` is provided (Phase 1)
3. Check Docker logs: `docker-compose logs ocr-sidecar`
4. Verify Ollama is running on Desk-5439
### Path traversal returns 200 instead of 403
1. Verify `OCR_SIDECAR_UPLOAD_BASE` is set correctly
2. Check path canonicalization logic in `app.py`
3. Test with absolute paths to verify whitelist check
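When checking the canonicalization logic, it can help to compare it against a small reference. The sketch below is one common way to implement a base-path whitelist and is not taken from `app.py`; `resolve_under_base` is a hypothetical name, and the endpoint would presumably map the raised error to `403 Forbidden`.

```python
import os


def resolve_under_base(requested, base):
    """Return the canonical path if it lies inside base, else raise PermissionError."""
    base_real = os.path.realpath(base)
    target = os.path.realpath(requested)  # collapses ".." and follows symlinks
    # commonpath compares whole path components, so /mnt/uploads-evil is not
    # accepted as a child of /mnt/uploads the way a startswith() check would.
    if os.path.commonpath([base_real, target]) != base_real:
        raise PermissionError(f"path escapes upload base: {requested}")
    return target


if __name__ == "__main__":
    base = os.environ.get("OCR_SIDECAR_UPLOAD_BASE", "/mnt/uploads")
    for candidate in (
        "/mnt/uploads/temp/test.pdf",
        "/mnt/uploads/temp/../../etc/passwd",
    ):
        try:
            print("allowed:", resolve_under_base(candidate, base))
        except PermissionError as exc:
            print("rejected:", exc)
```

A frequent cause of the 200 response is comparing the raw request string against the base before resolving `..`, or using a plain string-prefix comparison.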
### Parameters not being used
1. Check `ai_execution_profiles` row `ocr-extract` exists
2. Check backend service parameter resolution logic
3. Check sidecar receives parameters in request body
4. Check sidecar passes parameters to Ollama
### VRAM exhaustion
1. Check `calculate_ocr_residency()` is being called
2. Check `vram_monitor.py` and `residency_policy.py` are present
3. Verify CPU fallback is working for `/embed` and `/rerank`
4. Monitor GPU usage with `nvidia-smi`
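For a scriptable alternative to watching `nvidia-smi` by eye, its CSV query mode can be parsed directly. This is a rough sketch that assumes `nvidia-smi` is on the PATH of the GPU host; the helper names are illustrative, and values are in MiB because of the `nounits` option.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(csv_text):
    """Parse nvidia-smi CSV output into a list of (used_mib, total_mib), one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def vram_usage_ratio(csv_text):
    """Return the highest used/total ratio across all GPUs."""
    return max(used / total for used, total in parse_vram(csv_text))


def read_vram():
    """Run nvidia-smi and return its parsed output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_vram(out)


if __name__ == "__main__":
    for index, (used, total) in enumerate(read_vram()):
        print(f"GPU {index}: {used} / {total} MiB")
```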
## References
- ADR-040: OCR Sidecar Refactor
- ADR-036: Profile-Only Parameter Governance
- ADR-029: Dynamic Prompt Management
- ADR-037: Active Prompt System
- ADR-041: Server Consolidation (dependency for Phase 2)
- [Sidecar API Contract](./contracts/sidecar-api.md)