refactor(ai): OCR sidecar canonical naming cleanup — typhoon→np-dms, remove hardcoded keys, asyncio.to_thread, ADR-040/041
This commit is contained in:
@@ -0,0 +1,374 @@
|
||||
# Quickstart: OCR Sidecar Refactor
|
||||
|
||||
**Date**: 2026-06-20
|
||||
**Purpose**: Deployment and testing guide for OCR sidecar refactor
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Access to Desk-5439 (192.168.10.100) with Docker
|
||||
- Access to backend services (QNAP 192.168.10.8)
|
||||
- Python 3.11+ for local testing (optional)
|
||||
- pytest for testing (optional)
|
||||
|
||||
## Phase 1: Deployment (Before ADR-041 Consolidation)
|
||||
|
||||
### Step 1: Update Sidecar Code
|
||||
|
||||
1. Navigate to sidecar directory:
|
||||
```bash
|
||||
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
|
||||
```
|
||||
|
||||
2. Update `app.py` with the following changes:
|
||||
- Remove hardcoded default API key
|
||||
- Fail-fast if `OCR_SIDECAR_API_KEY` env missing
|
||||
- Implement async I/O with `httpx.AsyncClient` via lifespan
|
||||
- Replace `@app.on_event("startup")` with lifespan context manager
|
||||
- Wire `calculate_ocr_residency()` into `process_ocr`
|
||||
- Implement path canonicalization + base-path whitelist on `/ocr`
|
||||
- Remove hardcoded runtime parameters
|
||||
- Receive systemPrompt and DMS tags from backend
|
||||
- Remove `/normalize` endpoint
|
||||
- Fix mutable default argument `options_override={}`
|
||||
- Load models via `asyncio.to_thread` during lifespan
|
||||
|
||||
3. Update `requirements.txt`:
|
||||
```text
|
||||
PyMuPDF==1.24.0
|
||||
fastapi==0.111.0
|
||||
uvicorn[standard]==0.30.1
|
||||
python-multipart==0.0.9
|
||||
httpx==0.27.0
|
||||
FlagEmbedding>=1.2.0
|
||||
typhoon-ocr>=0.4.1
|
||||
```
|
||||
|
||||
4. Update `.env`:
|
||||
```bash
|
||||
# Phase 1 (before ADR-041)
|
||||
OCR_SIDECAR_API_KEY=your-secure-api-key-here
|
||||
|
||||
# Common variables
|
||||
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
|
||||
OLLAMA_API_URL=http://localhost:11434
|
||||
OCR_MODEL=np-dms-ocr:latest
|
||||
```
|
||||
|
||||
### Step 2: Update Backend Services
|
||||
|
||||
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
|
||||
- Add parameter resolution from `ai_execution_profiles` (row `ocr-extract`)
|
||||
- Add Active Prompt resolution from `ai_prompts` (type `ocr_extraction`)
|
||||
- Extract systemPrompt and DMS tags from Active Prompt
|
||||
- Send resolved parameters to sidecar in OCR requests
|
||||
- Keep X-API-Key send-side (Phase 1)
|
||||
|
||||
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
|
||||
- Same parameter resolution pattern as OcrService
|
||||
- Keep X-API-Key send-side (Phase 1)
|
||||
|
||||
3. Update backend `.env`:
|
||||
```bash
|
||||
# Phase 1 (before ADR-041)
|
||||
OCR_API_URL=http://192.168.10.100:8765
|
||||
OCR_API_KEY=your-secure-api-key-here
|
||||
|
||||
# Common variables
|
||||
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
|
||||
```
|
||||
|
||||
### Step 3: Rebuild and Deploy Sidecar
|
||||
|
||||
1. Build Docker image on Desk-5439:
|
||||
```bash
|
||||
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar
|
||||
docker-compose build
|
||||
```
|
||||
|
||||
2. Stop existing container:
|
||||
```bash
|
||||
docker-compose down
|
||||
```
|
||||
|
||||
3. Start new container:
|
||||
```bash
|
||||
docker-compose up -d
|
||||
```
|
||||
|
||||
4. Verify health:
|
||||
```bash
|
||||
curl http://192.168.10.100:8765/health
|
||||
```
|
||||
|
||||
Expected response:
|
||||
```json
|
||||
{
|
||||
"status": "healthy",
|
||||
"timestamp": "2026-06-20T10:30:00Z",
|
||||
"version": "1.0.0"
|
||||
}
|
||||
```
|
||||
|
||||
### Step 4: Deploy Backend Changes
|
||||
|
||||
1. Build backend:
|
||||
```bash
|
||||
cd backend
|
||||
pnpm run build
|
||||
```
|
||||
|
||||
2. Deploy backend containers (via existing deploy script or manual):
|
||||
```bash
|
||||
# From repo root
|
||||
./scripts/deploy.sh
|
||||
```
|
||||
|
||||
3. Verify backend health:
|
||||
```bash
|
||||
curl http://localhost:3001/api/ai/health
|
||||
```
|
||||
|
||||
## Phase 2: Deployment (After ADR-041 Consolidation)
|
||||
|
||||
**Note**: This phase can only be executed after ADR-041 server consolidation completes (single Docker host).
|
||||
|
||||
### Step 1: Remove X-API-Key from Sidecar
|
||||
|
||||
1. Update `app.py` on sidecar:
|
||||
- Remove X-API-Key validation from all endpoints
|
||||
- Remove `OCR_SIDECAR_API_KEY` environment variable check
|
||||
|
||||
2. Update `.env` on sidecar:
|
||||
```bash
|
||||
# Remove OCR_SIDECAR_API_KEY line
|
||||
# Keep common variables
|
||||
OCR_SIDECAR_UPLOAD_BASE=/mnt/uploads
|
||||
OLLAMA_API_URL=http://localhost:11434
|
||||
TYPHOON_OCR_MODEL=typhoon-np-dms-ocr:latest
|
||||
```
|
||||
|
||||
3. Rebuild and redeploy sidecar:
|
||||
```bash
|
||||
docker-compose down
|
||||
docker-compose build
|
||||
docker-compose up -d
|
||||
```
|
||||
|
||||
### Step 2: Remove X-API-Key from Backend
|
||||
|
||||
1. Update `backend/src/modules/ai/services/ocr.service.ts`:
|
||||
- Remove X-API-Key header from sidecar requests
|
||||
- Remove `OCR_API_KEY` environment variable usage
|
||||
|
||||
2. Update `backend/src/modules/ai/services/sandbox-ocr-engine.service.ts`:
|
||||
- Remove X-API-Key header from sidecar requests
|
||||
- Remove `OCR_API_KEY` environment variable usage
|
||||
|
||||
3. Update backend `.env`:
|
||||
```bash
|
||||
# Remove OCR_API_KEY line
|
||||
# Keep common variables
|
||||
OCR_API_URL=http://sidecar:8765 # Docker-internal URL
|
||||
OCR_SIDECAR_UPLOAD_BASE=/app/uploads
|
||||
```
|
||||
|
||||
4. Rebuild and redeploy backend:
|
||||
```bash
|
||||
cd backend
|
||||
pnpm run build
|
||||
./scripts/deploy.sh
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
### Unit Tests (Sidecar)
|
||||
|
||||
1. Navigate to sidecar tests directory:
|
||||
```bash
|
||||
cd specs/04-Infrastructure-OPS/04-00-docker-compose/Desk-5439/ocr-sidecar/tests
|
||||
```
|
||||
|
||||
2. Run path traversal tests:
|
||||
```bash
|
||||
pytest test_path_traversal.py -v
|
||||
```
|
||||
|
||||
Expected output: All tests pass, path traversal attempts return 403
|
||||
|
||||
3. Run residency wiring tests:
|
||||
```bash
|
||||
pytest test_residency_wiring.py -v
|
||||
```
|
||||
|
||||
Expected output: All tests pass, `calculate_ocr_residency()` is called correctly
|
||||
|
||||
### Integration Tests (Backend)
|
||||
|
||||
1. Run backend AI service tests:
|
||||
```bash
|
||||
cd backend
|
||||
pnpm test ai/ocr.service.spec.ts
|
||||
pnpm test ai/sandbox-ocr-engine.service.spec.ts
|
||||
```
|
||||
|
||||
2. Verify parameter resolution from database:
|
||||
- Check that `ai_execution_profiles` row `ocr-extract` exists
|
||||
- Check that `ai_prompts` has active row for `ocr_extraction` type
|
||||
- Verify parameters are correctly resolved and sent to sidecar
|
||||
|
||||
### Manual Testing
|
||||
|
||||
1. Test path traversal protection:
|
||||
```bash
|
||||
curl -X POST http://192.168.10.100:8765/ocr \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-API-Key: your-api-key" \
|
||||
-d '{
|
||||
"pdf_path": "/mnt/uploads/temp/../../etc/passwd",
|
||||
"runtime_params": {
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.9,
|
||||
"repeat_penalty": 1.1,
|
||||
"max_tokens": 4096
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
Expected: `403 Forbidden`
|
||||
|
||||
2. Test valid OCR request:
|
||||
```bash
|
||||
curl -X POST http://192.168.10.100:8765/ocr \
|
||||
-H "Content-Type: application/json" \
|
||||
-H "X-API-Key: your-api-key" \
|
||||
-d '{
|
||||
"pdf_path": "/mnt/uploads/temp/test.pdf",
|
||||
"runtime_params": {
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.9,
|
||||
"repeat_penalty": 1.1,
|
||||
"max_tokens": 4096
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
Expected: `200 OK` with extracted text
|
||||
|
||||
3. Test parameter governance:
|
||||
- Modify `ai_execution_profiles` row `ocr-extract` parameters
|
||||
- Run OCR request
|
||||
- Verify new parameters are used (check sidecar logs)
|
||||
|
||||
4. Test Active Prompt integration:
|
||||
- Modify active prompt in `ai_prompts` for `ocr_extraction`
|
||||
- Run OCR request
|
||||
- Verify new system prompt is used
|
||||
|
||||
## Performance Testing
|
||||
|
||||
1. Benchmark async vs sync I/O:
|
||||
```bash
|
||||
# Use Apache Bench or similar tool
|
||||
ab -n 1000 -c 10 -p ocr_request.json -T application/json \
|
||||
http://192.168.10.100:8765/ocr
|
||||
```
|
||||
|
||||
Expected: 20%+ throughput improvement with async I/O
|
||||
|
||||
2. Monitor VRAM usage:
|
||||
```bash
|
||||
# On Desk-5439, monitor GPU usage during OCR operations
|
||||
nvidia-smi -l 1
|
||||
```
|
||||
|
||||
Expected: VRAM usage stays within limits, no exhaustion
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Health Checks
|
||||
|
||||
- Sidecar health: `GET http://192.168.10.100:8765/health`
|
||||
- Backend AI health: `GET http://localhost:3001/api/ai/health`
|
||||
|
||||
### Logs
|
||||
|
||||
- Sidecar logs: `docker-compose logs -f ocr-sidecar`
|
||||
- Backend logs: Check backend application logs
|
||||
|
||||
### Metrics
|
||||
|
||||
- Monitor OCR request latency
|
||||
- Monitor VRAM usage on Desk-5439
|
||||
- Monitor error rates (403 for path traversal, 500 for internal errors)
|
||||
|
||||
## Rollback
|
||||
|
||||
If issues arise during deployment:
|
||||
|
||||
### Rollback Sidecar
|
||||
|
||||
1. Revert `app.py` to previous version
|
||||
2. Restore previous `.env` file
|
||||
3. Rebuild and redeploy:
|
||||
```bash
|
||||
docker-compose down
|
||||
docker-compose build
|
||||
docker-compose up -d
|
||||
```
|
||||
|
||||
### Rollback Backend
|
||||
|
||||
1. Revert service changes in `ocr.service.ts` and `sandbox-ocr-engine.service.ts`
|
||||
2. Restore previous `.env` file
|
||||
3. Rebuild and redeploy:
|
||||
```bash
|
||||
cd backend
|
||||
pnpm run build
|
||||
./scripts/deploy.sh
|
||||
```
|
||||
|
||||
### Emergency Rollback
|
||||
|
||||
If immediate rollback is needed:
|
||||
1. Revert `keep_alive` to fixed value `0` in `process_ocr`
|
||||
2. Restore hardcoded runtime parameters
|
||||
3. Restore X-API-Key validation
|
||||
4. Rebuild and redeploy
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Sidecar fails to start
|
||||
|
||||
1. Check environment variables are set correctly
|
||||
2. Check `OCR_SIDECAR_API_KEY` is provided (Phase 1)
|
||||
3. Check Docker logs: `docker-compose logs ocr-sidecar`
|
||||
4. Verify Ollama is running on Desk-5439
|
||||
|
||||
### Path traversal returns 200 instead of 403
|
||||
|
||||
1. Verify `OCR_SIDECAR_UPLOAD_BASE` is set correctly
|
||||
2. Check path canonicalization logic in `app.py`
|
||||
3. Test with absolute paths to verify whitelist check
|
||||
|
||||
### Parameters not being used
|
||||
|
||||
1. Check `ai_execution_profiles` row `ocr-extract` exists
|
||||
2. Check backend service parameter resolution logic
|
||||
3. Check sidecar receives parameters in request body
|
||||
4. Check sidecar passes parameters to Ollama
|
||||
|
||||
### VRAM exhaustion
|
||||
|
||||
1. Check `calculate_ocr_residency()` is being called
|
||||
2. Check `vram_monitor.py` and `residency_policy.py` are present
|
||||
3. Verify CPU fallback is working for `/embed` and `/rerank`
|
||||
4. Monitor GPU usage with `nvidia-smi`
|
||||
|
||||
## References
|
||||
|
||||
- ADR-040: OCR Sidecar Refactor
|
||||
- ADR-036: Profile-Only Parameter Governance
|
||||
- ADR-029: Dynamic Prompt Management
|
||||
- ADR-037: Active Prompt System
|
||||
- ADR-041: Server Consolidation (dependency for Phase 2)
|
||||
- [Sidecar API Contract](./contracts/sidecar-api.md)
|
||||
Reference in New Issue
Block a user