260223:1415 20260223 nextJS & nestJS Best practices

2026-02-23 14:15:06 +07:00
parent c90a664f53
commit ef16817f38
164 changed files with 24815 additions and 311 deletions
--- a/specs/04-Infrastructure-OPS/04-01-docker-compose.md
+++ b/specs/04-Infrastructure-OPS/04-01-docker-compose.md
--- a/specs/04-Infrastructure-OPS/04-02-backup-recovery.md
+++ b/specs/04-Infrastructure-OPS/04-02-backup-recovery.md
@@ -0,0 +1,856 @@
+# 04.2 Backup & Disaster Recovery
+**Project:** LCBP3-DMS
+**Version:** 1.8.0
+**Status:** Active
+**Owner:** Nattanin Peancharoen / DevOps Team
+**Last Updated:** 2026-02-23
+
+> 📍 **Backup Target Server:** ASUSTOR AS5403T (Infrastructure & Backup)
+> 🖥️ **Primary Source Server:** QNAP TS-473A (Application & Database)
+
+---
+
+## 📖 Overview
+
+This document outlines the backup strategies, scripts (ASUSTOR pulling from QNAP), recovery procedures, and comprehensive disaster recovery planning for LCBP3-DMS.
+
+---
+
+# Backup & Recovery Procedures
+
+**Project:** LCBP3-DMS
+**Version:** 1.8.0
+**Last Updated:** 2025-12-02
+
+---
+
+## 🎯 Backup Strategy
+
+### Backup Schedule
+
+| Data Type              | Frequency      | Retention | Method                  |
+| ---------------------- | -------------- | --------- | ----------------------- |
+| Database (Full)        | Daily at 02:00 | 30 days   | mysqldump + compression |
+| Database (Incremental) | Every 6 hours  | 7 days    | Binary logs             |
+| File Uploads           | Daily at 03:00 | 30 days   | rsync to backup server  |
+| Configuration Files    | Weekly         | 90 days   | Git repository          |
+| Elasticsearch Indexes  | Weekly         | 14 days   | Snapshot to S3/NFS      |
+| Application Logs       | Daily          | 90 days   | Rotation + archival     |
+
+### Backup Locations
+
+**Primary Backup:** QNAP NAS `/backup/lcbp3-dms`
+**Secondary Backup:** External backup server (rsync)
+**Offsite Backup:** Cloud storage (optional - for critical data)
+
+---
+
+## 💾 Database Backup
+
+### Automated Daily Backup Script
+
+```bash
+#!/bin/bash
+# File: /scripts/backup-database.sh
+
+# Make the pipeline fail when mysqldump fails, not only when gzip fails
+set -o pipefail
+
+# Configuration
+BACKUP_DIR="/backup/lcbp3-dms/database"
+DB_CONTAINER="lcbp3-mariadb"
+DB_NAME="lcbp3_dms"
+DB_USER="backup_user"
+DB_PASS="<BACKUP_USER_PASSWORD>"
+RETENTION_DAYS=30
+
+# Create backup directory
+BACKUP_FILE="$BACKUP_DIR/lcbp3_$(date +%Y%m%d_%H%M%S).sql.gz"
+mkdir -p "$BACKUP_DIR"
+
+# Perform backup
+# Note: official mariadb:11.x images name this tool mariadb-dump; adjust if mysqldump is not found
+echo "Starting database backup to $BACKUP_FILE"
+if docker exec "$DB_CONTAINER" mysqldump \
+  --user="$DB_USER" \
+  --password="$DB_PASS" \
+  --single-transaction \
+  --routines \
+  --triggers \
+  --databases "$DB_NAME" \
+  | gzip > "$BACKUP_FILE"; then
+  echo "Backup completed successfully"
+
+  # Delete old backups
+  find "$BACKUP_DIR" -name "*.sql.gz" -type f -mtime +$RETENTION_DAYS -delete
+  echo "Old backups cleaned up (retention: $RETENTION_DAYS days)"
+else
+  echo "ERROR: Backup failed!" >&2
+  # Do not leave a truncated dump behind
+  rm -f "$BACKUP_FILE"
+  exit 1
+fi
+```
+
+### Schedule with Cron
+
+```bash
+# Edit crontab
+crontab -e
+
+# Add backup job (runs daily at 2 AM)
+0 2 * * * /scripts/backup-database.sh >> /var/log/backup-database.log 2>&1
+```
+
+### Manual Database Backup
+
+```bash
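+# Read the root password once; `-p` cannot prompt because docker exec has no terminal here
+read -rsp "MariaDB root password: " DB_ROOT_PASS; echo
+BACKUP_FILE="backup_$(date +%Y%m%d).sql"
+
+# Backup specific database
+docker exec lcbp3-mariadb mysqldump \
+  -u root --password="$DB_ROOT_PASS" \
+  --single-transaction \
+  lcbp3_dms > "$BACKUP_FILE"
+
+# Compress backup
+gzip "$BACKUP_FILE"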
+```
+
+---
+
+## 📂 File Uploads Backup
+
+### Automated Rsync Backup
+
+```bash
+#!/bin/bash
+# File: /scripts/backup-uploads.sh
+
+SOURCE="/var/lib/docker/volumes/lcbp3_uploads/_data"
+DEST="/backup/lcbp3-dms/uploads"
+RETENTION_DAYS=30
+
+# Create incremental backup with rsync
+rsync -av --delete \
+  --backup --backup-dir="$DEST/backup-$(date +%Y%m%d)" \
+  "$SOURCE/" "$DEST/current/"
+
+# Cleanup old backups
+find "$DEST" -maxdepth 1 -type d -name "backup-*" -mtime +$RETENTION_DAYS -exec rm -rf {} \;
+
+echo "Upload backup completed: $(date)"
+```
+
+### Schedule Uploads Backup
+
+```bash
+# Run daily at 3 AM
+0 3 * * * /scripts/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1
+```
+
+---
+
+## 🔄 Database Recovery
+
+### Full Database Restore
+
+```bash
+# Read the root password once; `-p` cannot prompt while stdin carries the dump
+read -rsp "MariaDB root password: " DB_ROOT_PASS; echo
+
+# Step 1: Stop backend application
+docker stop lcbp3-backend
+
+# Step 2: Restore database from backup
+gunzip < backup_20241201.sql.gz | \
+  docker exec -i lcbp3-mariadb mysql -u root --password="$DB_ROOT_PASS" lcbp3_dms
+
+# Step 3: Verify restore
+docker exec lcbp3-mariadb mysql -u root --password="$DB_ROOT_PASS" -e "
+  USE lcbp3_dms;
+  SELECT COUNT(*) FROM users;
+  SELECT COUNT(*) FROM correspondences;
+"
+
+# Step 4: Restart backend
+docker start lcbp3-backend
+```
+
+### Point-in-Time Recovery (Using Binary Logs)
+
+```bash
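+# Read the root password once; `-p` cannot prompt while stdin carries SQL
+read -rsp "MariaDB root password: " DB_ROOT_PASS; echo
+
+# Step 1: Restore last full backup
+gunzip < backup_20241201_020000.sql.gz | \
+  docker exec -i lcbp3-mariadb mysql -u root --password="$DB_ROOT_PASS" lcbp3_dms
+
+# Step 2: Apply binary logs since backup
+# (SHOW BINARY LOGS lists the log files that exist; pass every file that covers the window)
+docker exec lcbp3-mariadb mysqlbinlog \
+  --start-datetime="2024-12-01 02:00:00" \
+  --stop-datetime="2024-12-01 14:30:00" \
+  /var/lib/mysql/mysql-bin.000001 | \
+  docker exec -i lcbp3-mariadb mysql -u root --password="$DB_ROOT_PASS" lcbp3_dms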
+```
+
+---
+
+## 📁 File Uploads Recovery
+
+### Restore from Backup
+
+```bash
+# Stop backend to prevent file operations
+docker stop lcbp3-backend
+
+# Restore files
+rsync -av \
+  /backup/lcbp3-dms/uploads/current/ \
+  /var/lib/docker/volumes/lcbp3_uploads/_data/
+
+# Restart backend
+docker start lcbp3-backend
+
+# Fix ownership (docker exec only works once the container is running again)
+docker exec -u root lcbp3-backend chown -R node:node /app/uploads
+```
+
+---
+
+## 🚨 Disaster Recovery Plan
+
+### RTO & RPO
+
+- **RTO (Recovery Time Objective):** 4 hours
+- **RPO (Recovery Point Objective):** 24 hours (for files), 6 hours (for database)
+
+### DR Scenarios
+
+#### Scenario 1: Database Corruption
+
+**Detection:** Database errors in logs, application errors
+**Recovery Time:** 30 minutes
+**Steps:**
+
+1. Stop backend
+2. Restore last full backup
+3. Apply binary logs (if needed)
+4. Verify data integrity
+5. Restart services
+
+#### Scenario 2: Complete Server Failure
+
+**Detection:** Server unresponsive
+**Recovery Time:** 4 hours
+**Steps:**
+
+1. Provision new QNAP server or VM
+2. Install Docker & Container Station
+3. Clone Git repository
+4. Restore database backup
+5. Restore file uploads
+6. Deploy containers
+7. Update DNS (if needed)
+8. Verify functionality
+
+#### Scenario 3: Ransomware Attack
+
+**Detection:** Encrypted files, ransom note
+**Recovery Time:** 6 hours
+**Steps:**
+
+1. **DO NOT pay ransom**
+2. Isolate infected server
+3. Provision clean environment
+4. Restore from offsite backup
+5. Scan restored backup for malware
+6. Deploy and verify
+7. Review security logs
+8. Implement additional security measures
+
+---
+
+## ✅ Backup Verification
+
+### Weekly Backup Testing
+
+```bash
+#!/bin/bash
+# File: /scripts/test-backup.sh
+
+# The root password must come from the environment; `-p` cannot prompt in a cron job
+: "${DB_ROOT_PASS:?Set DB_ROOT_PASS before running this script}"
+
+run_mysql() {
+  docker exec -i lcbp3-mariadb mysql -u root --password="$DB_ROOT_PASS" "$@"
+}
+
+# Create temporary test database
+run_mysql -e "CREATE DATABASE IF NOT EXISTS test_restore;"
+
+# Restore latest backup to test database
+# (only the USE statement at the start of a line is rewritten, never row data)
+LATEST_BACKUP=$(ls -t /backup/lcbp3-dms/database/*.sql.gz | head -1)
+gunzip < "$LATEST_BACKUP" | \
+  sed 's/^USE `lcbp3_dms`/USE `test_restore`/' | \
+  run_mysql
+
+# Verify table counts
+run_mysql -e "
+  SELECT COUNT(*) FROM test_restore.users;
+  SELECT COUNT(*) FROM test_restore.correspondences;
+"
+
+# Cleanup
+run_mysql -e "DROP DATABASE test_restore;"
+
+echo "Backup verification completed: $(date)"
+```
+
+### Monthly DR Drill
+
+- Test full system restore on standby server
+- Document time taken and issues encountered
+- Update DR procedures based on findings
+
+---
+
+## 📊 Backup Monitoring
+
+### Backup Status Dashboard
+
+Monitor:
+
+- ✅ Last successful backup timestamp
+- ✅ Backup file size (detect anomalies)
+- ✅ Backup success/failure rate
+- ✅ Available backup storage space
+
+### Alerts
+
+Send alert if:
+
+- ❌ Backup fails
+- ❌ Backup file size < 50% of average (possible corruption)
+- ❌ No backup in last 48 hours
+- ❌ Backup storage < 20% free
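+
+The alert conditions above can be checked from a small script. What follows is a minimal sketch, not the project's monitoring implementation: the function names are hypothetical, the thresholds (48 hours, 50%, 20%) and the `/backup/lcbp3-dms/database` path come from this document, and `echo` stands in for whatever notification channel is actually used.
+
+```bash
+#!/bin/bash
+# Sketch of the alert conditions; all function names are hypothetical.
+
+# Succeeds if directory $1 holds a *.sql.gz file modified within the last $2 hours.
+backup_is_fresh() {
+  local dir=$1 max_hours=$2 newest
+  newest=$(find "$dir" -name '*.sql.gz' -type f -mmin "-$((max_hours * 60))" 2>/dev/null | head -1)
+  [ -n "$newest" ]
+}
+
+# Succeeds if file $1 is at least 50% of the average size of all *.sql.gz
+# files in its directory (the file itself is part of the average).
+backup_size_ok() {
+  local file=$1 dir total=0 count=0 size f
+  dir=$(dirname "$file")
+  for f in "$dir"/*.sql.gz; do
+    [ -f "$f" ] || continue
+    size=$(wc -c < "$f")
+    total=$((total + size))
+    count=$((count + 1))
+  done
+  [ "$count" -gt 0 ] || return 1
+  size=$(wc -c < "$file")
+  # size >= 0.5 * (total / count)  is the same as  2 * size * count >= total
+  [ $((2 * size * count)) -ge "$total" ]
+}
+
+# Succeeds if the filesystem holding path $1 has at least $2 percent free.
+storage_has_free_percent() {
+  local used
+  used=$(df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
+  [ $((100 - used)) -ge "$2" ]
+}
+
+# Example wiring; replace echo with the real notification channel.
+run_backup_checks() {
+  local dir=${1:-/backup/lcbp3-dms/database} latest
+  backup_is_fresh "$dir" 48 || echo "ALERT: no backup in the last 48 hours"
+  latest=$(ls -t "$dir"/*.sql.gz 2>/dev/null | head -1)
+  if [ -n "$latest" ]; then
+    backup_size_ok "$latest" || echo "ALERT: $latest is below 50% of the average size"
+  fi
+  storage_has_free_percent "$dir" 20 || echo "ALERT: backup storage below 20% free"
+}
+```
+
+Calling `run_backup_checks` from cron shortly after the backup jobs covers the size, age, and free-space conditions. A failed backup is already reported by the exit code of the backup script itself.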
+
+---
+
+## 🔧 Maintenance
+
+### Optimize Backup Performance
+
+```sql
+-- Enable InnoDB compression for large tables
+ALTER TABLE correspondences ROW_FORMAT=COMPRESSED;
+ALTER TABLE workflow_history ROW_FORMAT=COMPRESSED;
+
+-- Archive old audit logs
+-- Move records older than 1 year to archive table.
+-- A fixed cutoff and a transaction make the INSERT and the DELETE act on the same rows.
+SET @cutoff = DATE_SUB(NOW(), INTERVAL 1 YEAR);
+
+START TRANSACTION;
+
+INSERT INTO audit_logs_archive
+SELECT * FROM audit_logs
+WHERE created_at < @cutoff;
+
+DELETE FROM audit_logs
+WHERE created_at < @cutoff;
+
+COMMIT;
+```
+
+---
+
+## 📚 Backup Checklist
+
+### Daily Tasks
+
+- [ ] Verify automated backups completed
+- [ ] Check backup log files for errors
+- [ ] Monitor backup storage space
+
+### Weekly Tasks
+
+- [ ] Test restore from random backup
+- [ ] Review backup size trends
+- [ ] Verify offsite backups synced
+
+### Monthly Tasks
+
+- [ ] Full DR drill
+- [ ] Review and update DR procedures
+- [ ] Test backup restoration on different server
+
+### Quarterly Tasks
+
+- [ ] Audit backup access controls
+- [ ] Review backup retention policies
+- [ ] Update backup documentation
+
+---
+
+## 🔗 Related Documents
+
+- [Deployment Guide](04-04-deployment-guide.md)
+- [Monitoring & Alerting](04-03-monitoring.md)
+- [Incident Response](04-07-incident-response.md)
+
+---
+
+**Version:** 1.8.0
+**Last Review:** 2025-12-01
+**Next Review:** 2026-03-01
+
+
+---
+
+# Backup Strategy for LCBP3-DMS
+
+> 📍 **Deploy on:** ASUSTOR AS5403T (Infrastructure Server)
+> 🎯 **Backup Source:** QNAP TS-473A (Application & Database)
+> 📄 **Version:** v1.8.0
+
+---
+
+## Overview
+
+The backup system is pull-based: for security, ASUSTOR pulls the data from QNAP.
+If QNAP is compromised, the attacker cannot delete the backups on ASUSTOR.
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                     BACKUP ARCHITECTURE                          │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                  │
+│   QNAP (Source)                    ASUSTOR (Backup Target)       │
+│   192.168.10.8                     192.168.10.9                  │
+│                                                                  │
+│   ┌──────────────┐   SSH/Rsync    ┌──────────────────────┐       │
+│   │  MariaDB     │ ─────────────▶ │  /volume1/backup/db/ │       │
+│   │  (mysqldump) │   Daily 2AM    │  (Restic Repository) │       │
+│   └──────────────┘                └──────────────────────┘       │
+│                                                                  │
+│   ┌──────────────┐                ┌──────────────────────┐       │
+│   │  Redis RDB   │ ─────────────▶ │  /volume1/backup/    │       │
+│   │  + AOF       │   Daily 3AM    │  redis/              │       │
+│   └──────────────┘                └──────────────────────┘       │
+│                                                                  │
+│   ┌──────────────┐                ┌──────────────────────┐       │
+│   │  App Config  │ ─────────────▶ │  /volume1/backup/    │       │
+│   │  + Volumes   │   Weekly Sun   │  config/             │       │
+│   └──────────────┘                └──────────────────────┘       │
+│                                                                  │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 1. MariaDB Backup
+
+### 1.1 Daily Database Backup Script
+
+```bash
+#!/bin/bash
+# File: /volume1/np-dms/scripts/backup-mariadb.sh
+# Run on: ASUSTOR (Pull from QNAP)
+
+set -euo pipefail
+
+DATE=$(date +%Y%m%d_%H%M%S)
+BACKUP_DIR="/volume1/backup/db"
+RESTIC_REPO="/volume1/backup/restic-repo"
+QNAP_IP="192.168.10.8"
+DB_NAME="lcbp3_db"
+DB_USER="root"
+# Must be set in the cron environment, together with RESTIC_PASSWORD
+DB_PASSWORD="${MARIADB_ROOT_PASSWORD:?MARIADB_ROOT_PASSWORD is not set}"
+DUMP_FILE="$BACKUP_DIR/lcbp3_$DATE.sql"
+
+echo "🔄 Starting MariaDB backup at $DATE"
+
+# Create backup directory
+mkdir -p "$BACKUP_DIR"
+
+# Remote mysqldump via SSH; remove the partial file if the dump fails
+if ! ssh "admin@$QNAP_IP" "docker exec mariadb mysqldump \
+  --single-transaction \
+  --routines \
+  --triggers \
+  -u $DB_USER -p'$DB_PASSWORD' $DB_NAME" > "$DUMP_FILE"; then
+  echo "❌ MariaDB backup failed" >&2
+  rm -f "$DUMP_FILE"
+  exit 1
+fi
+
+# Compress
+gzip "$DUMP_FILE"
+
+# Add to Restic repository (same path as in section 5 and the verification script)
+restic -r "$RESTIC_REPO" backup "$DUMP_FILE.gz"
+
+# Keep only last 30 days of raw files
+find "$BACKUP_DIR" -name "lcbp3_*.sql.gz" -mtime +30 -delete
+
+echo "✅ MariaDB backup complete: lcbp3_$DATE.sql.gz"
+```
+
+### 1.2 Cron Schedule (ASUSTOR)
+
+```cron
+# MariaDB daily backup at 2 AM
+0 2 * * * /volume1/np-dms/scripts/backup-mariadb.sh >> /var/log/backup-mariadb.log 2>&1
+```
+
+---
+
+## 2. Redis Backup
+
+### 2.1 Redis Backup Script
+
+```bash
+#!/bin/bash
+# File: /volume1/np-dms/scripts/backup-redis.sh
+# Run on: ASUSTOR (Pull from QNAP)
+
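+set -euo pipefail
+
+DATE=$(date +%Y%m%d_%H%M%S)
+BACKUP_DIR="/volume1/backup/redis"
+QNAP_IP="192.168.10.8"
+REDIS_DATA="/share/np-dms/services/cache/data"
+
+echo "🔄 Starting Redis backup at $DATE"
+
+mkdir -p "$BACKUP_DIR"
+
+# Trigger BGSAVE on QNAP Redis, then wait until LASTSAVE changes (at most about 60 s)
+LAST_SAVE=$(ssh "admin@$QNAP_IP" "docker exec cache redis-cli LASTSAVE")
+ssh "admin@$QNAP_IP" "docker exec cache redis-cli BGSAVE"
+for _ in $(seq 1 30); do
+  if [ "$(ssh "admin@$QNAP_IP" "docker exec cache redis-cli LASTSAVE")" != "$LAST_SAVE" ]; then
+    break
+  fi
+  sleep 2
+done
+
+# Copy RDB and AOF files
+# Note: Redis 7+ keeps the AOF as several files under appendonlydir/; adjust the path if needed
+scp "admin@$QNAP_IP:$REDIS_DATA/dump.rdb" "$BACKUP_DIR/redis_$DATE.rdb"
+scp "admin@$QNAP_IP:$REDIS_DATA/appendonly.aof" "$BACKUP_DIR/redis_$DATE.aof"
+
+# Compress with relative paths, so the archive extracts to redis_*.rdb and redis_*.aof
+tar -czf "$BACKUP_DIR/redis_$DATE.tar.gz" -C "$BACKUP_DIR" \
+  "redis_$DATE.rdb" \
+  "redis_$DATE.aof"
+
+# Cleanup raw files
+rm "$BACKUP_DIR/redis_$DATE.rdb" "$BACKUP_DIR/redis_$DATE.aof"
+
+# Apply the 7-day retention from the retention policy
+find "$BACKUP_DIR" -name "redis_*.tar.gz" -mtime +7 -delete
+
+echo "✅ Redis backup complete: redis_$DATE.tar.gz"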
+```
+
+### 2.2 Cron Schedule
+
+```cron
+# Redis daily backup at 3 AM
+0 3 * * * /volume1/np-dms/scripts/backup-redis.sh >> /var/log/backup-redis.log 2>&1
+```
+
+---
+
+## 3. Application Config Backup
+
+### 3.1 Weekly Config Backup Script
+
+```bash
+#!/bin/bash
+# File: /volume1/np-dms/scripts/backup-config.sh
+# Run on: ASUSTOR (Pull from QNAP)
+
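+set -euo pipefail
+
+DATE=$(date +%Y%m%d)
+BACKUP_DIR="/volume1/backup/config"
+QNAP_IP="192.168.10.8"
+STAGING="$BACKUP_DIR/np-dms"
+
+echo "🔄 Starting config backup at $DATE"
+
+mkdir -p "$BACKUP_DIR"
+
+# Sync Docker compose files and configs
+rsync -avz --delete \
+  --exclude='*/data/*' \
+  --exclude='*/logs/*' \
+  --exclude='node_modules' \
+  "admin@$QNAP_IP:/share/np-dms/" \
+  "$STAGING/"
+
+# Compress with a relative path, so the archive extracts to np-dms/
+# (the disaster recovery script reads /tmp/np-dms/ after extracting to /tmp/)
+tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" -C "$BACKUP_DIR" np-dms
+
+# Cleanup
+rm -rf "$STAGING"
+
+# Apply the 4-week retention from the retention policy
+find "$BACKUP_DIR" -name "config_*.tar.gz" -mtime +28 -delete
+
+echo "✅ Config backup complete: config_$DATE.tar.gz"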
+```
+
+### 3.2 Cron Schedule
+
+```cron
+# Config weekly backup on Sunday at 4 AM
+0 4 * * 0 /volume1/np-dms/scripts/backup-config.sh >> /var/log/backup-config.log 2>&1
+```
+
+---
+
+## 4. Retention Policy
+
+| Backup Type | Frequency | Retention | Storage Est. |
+| :---------- | :-------- | :-------- | :----------- |
+| MariaDB     | Daily     | 30 days   | ~5GB/month   |
+| Redis       | Daily     | 7 days    | ~500MB       |
+| Config      | Weekly    | 4 weeks   | ~200MB       |
+| Restic      | Daily     | 6 months  | Deduplicated |
+
+---
+
+## 5. Restic Repository Setup
+
+```bash
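+# Set password in environment first, so the commands below do not prompt
+# (for cron jobs, RESTIC_PASSWORD_FILE pointing to a root-only file avoids hard-coding it)
+export RESTIC_PASSWORD="your-secure-backup-password"
+
+# Initialize Restic repository (one-time)
+restic init -r /volume1/backup/restic-repo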
+
+# Check repository status
+restic -r /volume1/backup/restic-repo snapshots
+
+# Prune old snapshots (keep 30 daily, 4 weekly, 6 monthly)
+restic -r /volume1/backup/restic-repo forget \
+  --keep-daily 30 \
+  --keep-weekly 4 \
+  --keep-monthly 6 \
+  --prune
+```
+
+---
+
+## 6. Verification Script
+
+```bash
+#!/bin/bash
+# File: /volume1/np-dms/scripts/verify-backup.sh
+
+echo "📋 Backup Verification Report"
+echo "=============================="
+echo ""
+
+# Check latest MariaDB backup
+LATEST_DB=$(ls -t /volume1/backup/db/*.sql.gz 2>/dev/null | head -1)
+if [ -n "$LATEST_DB" ]; then
+  echo "✅ Latest DB backup: $LATEST_DB"
+  echo "   Size: $(du -h $LATEST_DB | cut -f1)"
+else
+  echo "❌ No DB backup found!"
+fi
+
+# Check latest Redis backup
+LATEST_REDIS=$(ls -t /volume1/backup/redis/*.tar.gz 2>/dev/null | head -1)
+if [ -n "$LATEST_REDIS" ]; then
+  echo "✅ Latest Redis backup: $LATEST_REDIS"
+else
+  echo "❌ No Redis backup found!"
+fi
+
+# Check Restic repository
+echo ""
+echo "📦 Restic Snapshots:"
+restic -r /volume1/backup/restic-repo snapshots --latest 5
+```
+
+---
+
+> 📝 **Note**: This document is based on Architecture Document **v1.8.0**
+
+
+---
+
+# Disaster Recovery Plan for LCBP3-DMS
+
+> 📍 **Version:** v1.8.0
+> 🖥️ **Primary Server:** QNAP TS-473A (Application & Database)
+> 💾 **Backup Server:** ASUSTOR AS5403T (Infrastructure & Backup)
+
+---
+
+## RTO/RPO Targets
+
+| Scenario                    | RTO     | RPO    | Priority |
+| :-------------------------- | :------ | :----- | :------- |
+| Single backend node failure | 0 min   | 0      | P0       |
+| Redis failure               | 5 min   | 0      | P0       |
+| MariaDB failure             | 10 min  | 0      | P0       |
+| QNAP total failure          | 2 hours | 15 min | P1       |
+| Data corruption             | 4 hours | 1 day  | P2       |
+
+---
+
+## 1. Quick Recovery Procedures
+
+### 1.1 Service Not Responding
+
+```bash
+# Check container status
+docker ps -a | grep <service-name>
+
+# Restart specific service
+docker restart <container-name>
+
+# Check logs for errors
+docker logs <container-name> --tail 100
+```
+
+### 1.2 Redis Failure
+
+```bash
+# Check status
+docker exec cache redis-cli ping
+
+# Restart
+docker restart cache
+
+# Verify
+docker exec cache redis-cli ping
+```
+
+### 1.3 MariaDB Failure
+
+```bash
+# Check status (-it gives the password prompt a terminal)
+docker exec -it mariadb mysql -u root -p -e "SELECT 1"
+
+# Restart
+docker restart mariadb
+
+# Wait for startup
+sleep 30
+
+# Verify
+docker exec -it mariadb mysql -u root -p -e "SHOW DATABASES"
+```
+
+---
+
+## 2. Full System Recovery
+
+### 2.1 Recovery Prerequisites (ASUSTOR)
+
+Verify that the backup files are available:
+
+```bash
+# SSH to ASUSTOR
+ssh admin@192.168.10.9
+
+# List available backups
+ls -la /volume1/backup/db/
+ls -la /volume1/backup/redis/
+ls -la /volume1/backup/config/
+
+# Check Restic snapshots
+restic -r /volume1/backup/restic-repo snapshots
+```
+
+### 2.2 QNAP Recovery Script
+
+```bash
+#!/bin/bash
+# File: /volume1/np-dms/scripts/disaster-recovery.sh
+# Run on: ASUSTOR (Push to QNAP)
+
+QNAP_IP="192.168.10.8"
+BACKUP_DIR="/volume1/backup"
+
+echo "🚨 Starting Disaster Recovery..."
+echo "================================"
+
+# 1. Restore Docker Network
+echo "1️⃣ Creating Docker network..."
+ssh admin@$QNAP_IP "docker network create lcbp3 || true"
+
+# 2. Restore config files
+echo "2️⃣ Restoring configuration files..."
+LATEST_CONFIG=$(ls -t $BACKUP_DIR/config/*.tar.gz | head -1)
+tar -xzf $LATEST_CONFIG -C /tmp/
+rsync -avz /tmp/np-dms/ admin@$QNAP_IP:/share/np-dms/
+
+# 3. Start infrastructure services
+echo "3️⃣ Starting MariaDB..."
+ssh admin@$QNAP_IP "cd /share/np-dms/mariadb && docker-compose up -d"
+sleep 30
+
+# 4. Restore database
+echo "4️⃣ Restoring database..."
+LATEST_DB=$(ls -t $BACKUP_DIR/db/*.sql.gz | head -1)
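+# The single-quoted command runs inside the container, where MYSQL_ROOT_PASSWORD is assumed to be set
+gunzip -c "$LATEST_DB" | ssh admin@$QNAP_IP "docker exec -i mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" lcbp3_db'"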
+
+# 5. Restore Redis data (before Redis starts, so a shutdown save cannot overwrite the restored dump)
+echo "5️⃣ Restoring Redis data..."
+LATEST_REDIS=$(ls -t $BACKUP_DIR/redis/*.tar.gz | head -1)
+REDIS_TMP=$(mktemp -d)
+tar -xzf "$LATEST_REDIS" -C "$REDIS_TMP"
+# find locates the RDB file wherever the archive stored it
+REDIS_RDB=$(find "$REDIS_TMP" -name 'redis_*.rdb' | head -1)
+scp "$REDIS_RDB" admin@$QNAP_IP:/share/np-dms/services/cache/data/dump.rdb
+rm -rf "$REDIS_TMP"
+
+# 6. Start Redis
+echo "6️⃣ Starting Redis..."
+ssh admin@$QNAP_IP "cd /share/np-dms/services && docker-compose up -d cache"
+
+# 7. Start remaining services
+echo "7️⃣ Starting application services..."
+ssh admin@$QNAP_IP "cd /share/np-dms/services && docker-compose up -d"
+ssh admin@$QNAP_IP "cd /share/np-dms/npm && docker-compose up -d"
+
+# 8. Health check
+echo "8️⃣ Running health checks..."
+sleep 60
+curl -f https://lcbp3.np-dms.work/health || echo "⚠️ Frontend not ready"
+curl -f https://backend.np-dms.work/health || echo "⚠️ Backend not ready"
+
+echo ""
+echo "✅ Disaster Recovery Complete"
+echo "⚠️ Please verify system functionality manually"
+```
+
+---
+
+## 3. Data Corruption Recovery
+
+### 3.1 Point-in-Time Recovery (Database)
+
+```bash
+# List available Restic snapshots
+restic -r /volume1/backup/restic-repo snapshots
+
+# Restore specific snapshot
+restic -r /volume1/backup/restic-repo restore <snapshot-id> --target /tmp/restore/
+
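+# Apply restored backup
+# Restic recreates the original absolute path below the target directory.
+# The single-quoted command runs inside the container, where MYSQL_ROOT_PASSWORD is assumed to be set.
+gunzip -c /tmp/restore/volume1/backup/db/lcbp3_*.sql.gz | \
+  ssh admin@192.168.10.8 "docker exec -i mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" lcbp3_db'"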
+```
+
+### 3.2 Selective Table Recovery
+
+```bash
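+# Extract one table (DROP/CREATE plus data) from the backup.
+# The sed range covers the whole table regardless of its size.
+gunzip -c /volume1/backup/db/lcbp3_YYYYMMDD.sql.gz | \
+  sed -n '/^-- Table structure for table `documents`/,/^UNLOCK TABLES;/p' \
+  > /tmp/documents_table.sql
+
+# Restore specific table (this replaces the existing table)
+# The single-quoted command runs inside the container, where MYSQL_ROOT_PASSWORD is assumed to be set.
+ssh admin@192.168.10.8 "docker exec -i mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" lcbp3_db'" < /tmp/documents_table.sql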
+```
+
+---
+
+## 4. Communication & Escalation
+
+### 4.1 Incident Response
+
+| Severity | Response Time | Notify                         |
+| :------- | :------------ | :----------------------------- |
+| P0       | Immediate     | Admin Team + Management        |
+| P1       | 30 minutes    | Admin Team                     |
+| P2       | 2 hours       | Admin Team (next business day) |
+
+### 4.2 Post-Incident Checklist
+
+- [ ] Identify root cause
+- [ ] Document timeline of events
+- [ ] Verify all services restored
+- [ ] Check data integrity
+- [ ] Update monitoring alerts if needed
+- [ ] Create incident report
+
+---
+
+## 5. Testing Schedule
+
+| Test Type               | Frequency | Last Tested | Next Due |
+| :---------------------- | :-------- | :---------- | :------- |
+| Backup Verification     | Weekly    | -           | -        |
+| Single Service Recovery | Monthly   | -           | -        |
+| Full DR Test            | Quarterly | -           | -        |
+
+---
+
+> 📝 **Note**: This document is based on Architecture Document **v1.8.0**
--- a/specs/04-Infrastructure-OPS/04-03-monitoring.md
+++ b/specs/04-Infrastructure-OPS/04-03-monitoring.md
--- a/specs/04-Infrastructure-OPS/04-04-deployment-guide.md
+++ b/specs/04-Infrastructure-OPS/04-04-deployment-guide.md
@@ -0,0 +1,937 @@
+# Deployment Guide: LCBP3-DMS
+
+---
+
+**Project:** LCBP3-DMS (Laem Chabang Port Phase 3 - Document Management System)
+**Version:** 1.8.0
+**Last Updated:** 2025-12-02
+**Owner:** Operations Team
+**Status:** Active
+
+---
+
+## 📋 Overview
+
+This guide provides step-by-step instructions for deploying the LCBP3-DMS system on QNAP Container Station using Docker Compose with a Blue-Green deployment strategy.
+
+### Deployment Strategy
+
+- **Platform:** QNAP TS-473A with Container Station
+- **Orchestration:** Docker Compose
+- **Deployment Method:** Blue-Green Deployment
+- **Zero Downtime:** Yes
+- **Rollback Capability:** Instant rollback via NGINX switch
+
+---
+
+## 🎯 Prerequisites
+
+### Hardware Requirements
+
+| Component  | Minimum Specification      |
+| ---------- | -------------------------- |
+| CPU        | 4 cores @ 2.0 GHz          |
+| RAM        | 16 GB                      |
+| Storage    | 500 GB SSD (System + Data) |
+| Network    | 1 Gbps Ethernet            |
+| QNAP Model | TS-473A or equivalent      |
+
+### Software Requirements
+
+| Software          | Version | Purpose                       |
+| ----------------- | ------- | ----------------------------- |
+| QNAP QTS          | 5.x+    | Operating System              |
+| Container Station | 3.x+    | Docker Management             |
+| Docker            | 20.10+  | Container Runtime             |
+| Docker Compose    | 2.x+    | Multi-container Orchestration |
+
+### Network Requirements
+
+- Static IP address for QNAP server
+- Domain name (e.g., `lcbp3-dms.example.com`)
+- SSL certificate (Let's Encrypt or commercial)
+- Firewall rules:
+  - Port 80 (HTTP → HTTPS redirect)
+  - Port 443 (HTTPS)
+  - Port 22 (SSH for management)
+
+---
+
+## 🏗️ Infrastructure Setup
+
+### 1. Directory Structure
+
+Create the following directory structure on QNAP:
+
+```bash
+# SSH into QNAP
+ssh admin@qnap-ip
+
+# Create base directory
+mkdir -p /volume1/lcbp3
+
+# Create blue-green environments
+mkdir -p /volume1/lcbp3/blue
+mkdir -p /volume1/lcbp3/green
+
+# Create shared directories
+mkdir -p /volume1/lcbp3/shared/uploads
+mkdir -p /volume1/lcbp3/shared/logs
+mkdir -p /volume1/lcbp3/shared/backups
+
+# Create persistent volumes
+mkdir -p /volume1/lcbp3/volumes/mariadb-data
+mkdir -p /volume1/lcbp3/volumes/redis-data
+mkdir -p /volume1/lcbp3/volumes/elastic-data
+
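+# Create NGINX proxy directory (including the ssl/ subdirectory)
+mkdir -p /volume1/lcbp3/nginx-proxy/ssl
+
+# Create the deployment scripts directory and the NGINX log directory
+mkdir -p /volume1/lcbp3/scripts
+mkdir -p /volume1/lcbp3/shared/logs/nginx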
+
+# Set permissions
+chmod -R 755 /volume1/lcbp3
+chown -R admin:administrators /volume1/lcbp3
+```
+
+**Final Structure:**
+
+```
+/volume1/lcbp3/
+├── blue/                    # Blue environment
+│   ├── docker-compose.yml
+│   ├── .env.production
+│   └── nginx.conf
+│
+├── green/                   # Green environment
+│   ├── docker-compose.yml
+│   ├── .env.production
+│   └── nginx.conf
+│
+├── nginx-proxy/             # Main reverse proxy
+│   ├── docker-compose.yml
+│   ├── nginx.conf
+│   └── ssl/
+│       ├── cert.pem
+│       └── key.pem
+│
+├── shared/                  # Shared across blue/green
+│   ├── uploads/
+│   ├── logs/
+│   └── backups/
+│
+├── volumes/                 # Persistent data
+│   ├── mariadb-data/
+│   ├── redis-data/
+│   └── elastic-data/
+│
+├── scripts/                 # Deployment scripts
+│   ├── deploy.sh
+│   ├── rollback.sh
+│   └── health-check.sh
+│
+└── current                  # File containing "blue" or "green"
+```
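+
+The `current` file in the tree above records which environment is live. As a minimal sketch of how a script might read it and work out the idle environment for the next deployment, the helpers below assume the file contains exactly `blue` or `green`. The function names `active_env` and `idle_env` and the `STATE_FILE` variable are hypothetical and are not taken from the project's `deploy.sh`.
+
+```bash
+#!/bin/bash
+# Hypothetical helpers; the state-file path follows the directory layout above.
+STATE_FILE="${STATE_FILE:-/volume1/lcbp3/current}"
+
+# Print the environment currently serving traffic ("blue" or "green").
+active_env() {
+  local env
+  env=$(tr -d '[:space:]' < "$STATE_FILE") || return 1
+  case "$env" in
+    blue|green) echo "$env" ;;
+    *) echo "unexpected value in $STATE_FILE: '$env'" >&2; return 1 ;;
+  esac
+}
+
+# Print the idle environment, i.e. the target of the next deployment.
+idle_env() {
+  case "$(active_env)" in
+    blue) echo green ;;
+    green) echo blue ;;
+    *) return 1 ;;
+  esac
+}
+```
+
+Rejecting any value other than `blue` or `green` stops a deployment before it touches a container when the state file is missing or damaged.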
+
+### 2. SSL Certificate Setup
+
+```bash
+# Option 1: Let's Encrypt (Recommended)
+# Install certbot on QNAP
+opkg install certbot
+
+# Generate certificate
+certbot certonly --standalone \
+  -d lcbp3-dms.example.com \
+  --email admin@example.com \
+  --agree-tos
+
+# Copy to nginx-proxy
+cp /etc/letsencrypt/live/lcbp3-dms.example.com/fullchain.pem \
+   /volume1/lcbp3/nginx-proxy/ssl/cert.pem
+cp /etc/letsencrypt/live/lcbp3-dms.example.com/privkey.pem \
+   /volume1/lcbp3/nginx-proxy/ssl/key.pem
+
+# Option 2: Commercial Certificate
+# Upload cert.pem and key.pem to /volume1/lcbp3/nginx-proxy/ssl/
+```
+
+---
+
+## 📝 Configuration Files
+
+### 1. Environment Variables (.env.production)
+
+Create `.env.production` in both `blue/` and `green/` directories:
+
+```bash
+# File: /volume1/lcbp3/blue/.env.production
+# DO NOT commit this file to Git!
+
+# Application
+NODE_ENV=production
+APP_NAME=LCBP3-DMS
+APP_URL=https://lcbp3-dms.example.com
+
+# Database
+DB_HOST=lcbp3-mariadb
+DB_PORT=3306
+DB_USERNAME=lcbp3_user
+DB_PASSWORD=<CHANGE_ME_STRONG_PASSWORD>
+DB_DATABASE=lcbp3_dms
+DB_POOL_SIZE=20
+
+# Redis
+REDIS_HOST=lcbp3-redis
+REDIS_PORT=6379
+REDIS_PASSWORD=<CHANGE_ME_STRONG_PASSWORD>
+REDIS_DB=0
+
+# JWT Authentication
+JWT_SECRET=<CHANGE_ME_RANDOM_64_CHAR_STRING>
+JWT_EXPIRES_IN=8h
+JWT_REFRESH_EXPIRES_IN=7d
+
+# File Storage
+UPLOAD_PATH=/app/uploads
+MAX_FILE_SIZE=52428800
+ALLOWED_FILE_TYPES=.pdf,.doc,.docx,.xls,.xlsx,.dwg,.zip
+
+# Email (SMTP)
+SMTP_HOST=smtp.gmail.com
+SMTP_PORT=587
+SMTP_SECURE=false
+SMTP_USERNAME=<YOUR_EMAIL>
+SMTP_PASSWORD=<YOUR_APP_PASSWORD>
+SMTP_FROM=noreply@example.com
+
+# Elasticsearch
+ELASTICSEARCH_NODE=http://lcbp3-elasticsearch:9200
+ELASTICSEARCH_USERNAME=elastic
+ELASTICSEARCH_PASSWORD=<CHANGE_ME>
+
+# Rate Limiting
+THROTTLE_TTL=60
+THROTTLE_LIMIT=100
+
+# Logging
+LOG_LEVEL=info
+LOG_FILE_PATH=/app/logs
+
+# ClamAV (Virus Scanning)
+CLAMAV_HOST=lcbp3-clamav
+CLAMAV_PORT=3310
+```
+
+### 2. Docker Compose - Blue Environment
+
+```yaml
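+# File: /volume1/lcbp3/blue/docker-compose.yml
+# ${...} values are substituted from the shell or from --env-file, not from env_file:
+#   docker compose --env-file .env.production up -d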
+version: '3.8'
+
+services:
+  backend:
+    image: lcbp3-backend:latest
+    container_name: lcbp3-blue-backend
+    restart: unless-stopped
+    env_file:
+      - .env.production
+    volumes:
+      - /volume1/lcbp3/shared/uploads:/app/uploads
+      - /volume1/lcbp3/shared/logs:/app/logs
+    depends_on:
+      - mariadb
+      - redis
+      - elasticsearch
+    networks:
+      - lcbp3-network
+    healthcheck:
+      test: ['CMD', 'curl', '-f', 'http://localhost:3000/health']
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+
+  frontend:
+    image: lcbp3-frontend:latest
+    container_name: lcbp3-blue-frontend
+    restart: unless-stopped
+    environment:
+      - NEXT_PUBLIC_API_URL=https://lcbp3-dms.example.com/api
+    depends_on:
+      - backend
+    networks:
+      - lcbp3-network
+    healthcheck:
+      test: ['CMD', 'curl', '-f', 'http://localhost:3000']
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+  mariadb:
+    image: mariadb:11.8
+    container_name: lcbp3-mariadb
+    restart: unless-stopped
+    environment:
+      MYSQL_ROOT_PASSWORD: ${DB_PASSWORD}
+      MYSQL_DATABASE: ${DB_DATABASE}
+      MYSQL_USER: ${DB_USERNAME}
+      MYSQL_PASSWORD: ${DB_PASSWORD}
+    volumes:
+      - /volume1/lcbp3/volumes/mariadb-data:/var/lib/mysql
+    networks:
+      - lcbp3-network
+    command: >
+      --character-set-server=utf8mb4
+      --collation-server=utf8mb4_unicode_ci
+      --max_connections=200
+      --innodb_buffer_pool_size=2G
+    healthcheck:
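+      # Official mariadb:11.x images no longer ship mysqladmin; healthcheck.sh is bundled instead
+      test: ['CMD', 'healthcheck.sh', '--connect', '--innodb_initialized']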
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  redis:
+    image: redis:7-alpine
+    container_name: lcbp3-redis
+    restart: unless-stopped
+    command: >
+      redis-server
+      --requirepass ${REDIS_PASSWORD}
+      --appendonly yes
+      --appendfsync everysec
+      --maxmemory 2gb
+      --maxmemory-policy allkeys-lru
+    volumes:
+      - /volume1/lcbp3/volumes/redis-data:/data
+    networks:
+      - lcbp3-network
+    healthcheck:
+      test: ['CMD', 'redis-cli', 'ping']
+      interval: 10s
+      timeout: 3s
+      retries: 3
+
+  elasticsearch:
+    image: elasticsearch:8.11.0
+    container_name: lcbp3-elasticsearch
+    restart: unless-stopped
+    environment:
+      - discovery.type=single-node
+      - xpack.security.enabled=true
+      - ELASTIC_PASSWORD=${ELASTICSEARCH_PASSWORD}
+      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
+    volumes:
+      - /volume1/lcbp3/volumes/elastic-data:/usr/share/elasticsearch/data
+    networks:
+      - lcbp3-network
+    healthcheck:
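+      # Security is enabled, so the request must authenticate; $$ defers expansion to the container
+      test: ['CMD-SHELL', 'curl -fs -u "elastic:$$ELASTIC_PASSWORD" http://localhost:9200/_cluster/health || exit 1']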
+      interval: 30s
+      timeout: 10s
+      retries: 5
+
+networks:
+  lcbp3-network:
+    name: lcbp3-blue-network
+    driver: bridge
+```
+
+### 3. Docker Compose - NGINX Proxy
+
+```yaml
+# File: /volume1/lcbp3/nginx-proxy/docker-compose.yml
+version: '3.8'
+
+services:
+  nginx:
+    image: nginx:alpine
+    container_name: lcbp3-nginx
+    restart: unless-stopped
+    ports:
+      - "80:80"
+      - "443:443"
+    volumes:
+      - ./nginx.conf:/etc/nginx/nginx.conf:ro
+      - ./ssl:/etc/nginx/ssl:ro
+      - /volume1/lcbp3/shared/logs/nginx:/var/log/nginx
+    networks:
+      - lcbp3-blue-network
+      - lcbp3-green-network
+    healthcheck:
+      test: ['CMD', 'nginx', '-t']
+      interval: 30s
+      timeout: 10s
+      retries: 3
+
+networks:
+  lcbp3-blue-network:
+    external: true
+  lcbp3-green-network:
+    external: true
+```
+
+### 4. NGINX Configuration
+
+```nginx
+# File: /volume1/lcbp3/nginx-proxy/nginx.conf
+
+user nginx;
+worker_processes auto;
+error_log /var/log/nginx/error.log warn;
+pid /var/run/nginx.pid;
+
+events {
+    worker_connections 1024;
+}
+
+http {
+    include /etc/nginx/mime.types;
+    default_type application/octet-stream;
+
+    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
+                    '$status $body_bytes_sent "$http_referer" '
+                    '"$http_user_agent" "$http_x_forwarded_for"';
+
+    access_log /var/log/nginx/access.log main;
+
+    sendfile on;
+    tcp_nopush on;
+    tcp_nodelay on;
+    keepalive_timeout 65;
+    types_hash_max_size 2048;
+    client_max_body_size 50M;
+
+    # Gzip compression
+    gzip on;
+    gzip_vary on;
+    gzip_proxied any;
+    gzip_comp_level 6;
+    gzip_types text/plain text/css text/xml text/javascript
+               application/json application/javascript application/xml+rss;
+
+    # Upstream backends (switch between blue/green)
+    upstream backend {
+        server lcbp3-blue-backend:3000 max_fails=3 fail_timeout=30s;
+        keepalive 32;
+    }
+
+    upstream frontend {
+        server lcbp3-blue-frontend:3000 max_fails=3 fail_timeout=30s;
+        keepalive 32;
+    }
+
+    # HTTP to HTTPS redirect
+    server {
+        listen 80;
+        server_name lcbp3-dms.example.com;
+        return 301 https://$server_name$request_uri;
+    }
+
+    # HTTPS server
+    server {
+        listen 443 ssl http2;
+        server_name lcbp3-dms.example.com;
+
+        # SSL configuration
+        ssl_certificate /etc/nginx/ssl/cert.pem;
+        ssl_certificate_key /etc/nginx/ssl/key.pem;
+        ssl_protocols TLSv1.2 TLSv1.3;
+        ssl_ciphers HIGH:!aNULL:!MD5;
+        ssl_prefer_server_ciphers on;
+        ssl_session_cache shared:SSL:10m;
+        ssl_session_timeout 10m;
+
+        # Security headers
+        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
+        add_header X-Frame-Options "SAMEORIGIN" always;
+        add_header X-Content-Type-Options "nosniff" always;
+        add_header X-XSS-Protection "1; mode=block" always;
+
+        # Frontend (Next.js)
+        location / {
+            proxy_pass http://frontend;
+            proxy_http_version 1.1;
+            proxy_set_header Upgrade $http_upgrade;
+            proxy_set_header Connection 'upgrade';
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_cache_bypass $http_upgrade;
+        }
+
+        # Backend API
+        location /api {
+            proxy_pass http://backend;
+            proxy_http_version 1.1;
+            proxy_set_header Connection "";
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+
+            # Timeouts for file uploads
+            proxy_connect_timeout 300s;
+            proxy_send_timeout 300s;
+            proxy_read_timeout 300s;
+        }
+
+        # Health check endpoint (no logging)
+        location /health {
+            proxy_pass http://backend/health;
+            access_log off;
+        }
+
+        # Static files caching
+        location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
+            proxy_pass http://frontend;
+            expires 1y;
+            add_header Cache-Control "public, immutable";
+        }
+    }
+}
+```
+
+---
+
+## 🚀 Initial Deployment
+
+### Step 1: Prepare Docker Images
+
+```bash
+# Build images (on development machine)
+cd /path/to/lcbp3/backend
+docker build -t lcbp3-backend:1.0.0 .
+docker tag lcbp3-backend:1.0.0 lcbp3-backend:latest
+
+cd /path/to/lcbp3/frontend
+docker build -t lcbp3-frontend:1.0.0 .
+docker tag lcbp3-frontend:1.0.0 lcbp3-frontend:latest
+
+# Save images to tar files
+docker save lcbp3-backend:latest | gzip > lcbp3-backend-latest.tar.gz
+docker save lcbp3-frontend:latest | gzip > lcbp3-frontend-latest.tar.gz
+
+# Transfer to QNAP
+scp lcbp3-backend-latest.tar.gz admin@qnap-ip:/volume1/lcbp3/
+scp lcbp3-frontend-latest.tar.gz admin@qnap-ip:/volume1/lcbp3/
+
+# Load images on QNAP
+ssh admin@qnap-ip
+cd /volume1/lcbp3
+docker load < lcbp3-backend-latest.tar.gz
+docker load < lcbp3-frontend-latest.tar.gz
+```
+
+### Step 2: Initialize Database
+
+```bash
+# Start MariaDB only
+cd /volume1/lcbp3/blue
+docker-compose up -d mariadb
+
+# Wait for MariaDB to be ready (retry until the server answers the ping)
+until docker exec lcbp3-mariadb mysqladmin ping -h localhost --silent; do
+  sleep 2
+done
+
+# Run migrations
+docker-compose up -d backend
+docker exec lcbp3-blue-backend npm run migration:run
+
+# Seed initial data (if needed)
+docker exec lcbp3-blue-backend npm run seed
+```
+
+### Step 3: Start Blue Environment
+
+```bash
+cd /volume1/lcbp3/blue
+
+# Start all services
+docker-compose up -d
+
+# Check status
+docker-compose ps
+
+# View recent logs (add -f to follow; following blocks the terminal until interrupted)
+docker-compose logs --tail=50
+
+# Wait for health checks
+sleep 30
+
+# Test health endpoint
+curl http://localhost:3000/health
+```
+
+### Step 4: Start NGINX Proxy
+
+```bash
+cd /volume1/lcbp3/nginx-proxy
+
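+# Create networks (only if they do not exist yet; "docker network create" fails on an existing name)
+docker network inspect lcbp3-blue-network > /dev/null 2>&1 || docker network create lcbp3-blue-network
+docker network inspect lcbp3-green-network > /dev/null 2>&1 || docker network create lcbp3-green-network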
+
+# Start NGINX
+docker-compose up -d
+
+# Test NGINX configuration
+docker exec lcbp3-nginx nginx -t
+
+# Check NGINX logs
+docker logs lcbp3-nginx
+```
+
+### Step 5: Set Current Environment
+
+```bash
+# Mark blue as current
+echo "blue" > /volume1/lcbp3/current
+```
+
+### Step 6: Verify Deployment
+
+```bash
+# Test HTTPS endpoint
+curl -k https://lcbp3-dms.example.com/health
+
+# Test API
+curl -k https://lcbp3-dms.example.com/api/health
+
+# Check all containers
+docker ps --filter "name=lcbp3"
+
+# Check logs for errors
+docker-compose -f /volume1/lcbp3/blue/docker-compose.yml logs --tail=100
+```
+
+---
+
+## 🔄 Blue-Green Deployment Process
+
+### Deployment Script
+
+```bash
+#!/bin/bash
+# File: /volume1/lcbp3/scripts/deploy.sh
+
+set -e  # Exit on error
+
+# Configuration
+LCBP3_DIR="/volume1/lcbp3"
+CURRENT=$(cat $LCBP3_DIR/current)
+TARGET=$([[ "$CURRENT" == "blue" ]] && echo "green" || echo "blue")
+
+echo "========================================="
+echo "LCBP3-DMS Blue-Green Deployment"
+echo "========================================="
+echo "Current environment: $CURRENT"
+echo "Target environment:  $TARGET"
+echo "========================================="
+
+# Step 1: Backup database
+echo "[1/9] Creating database backup..."
+# DB_PASSWORD (MariaDB root password) must be set in the environment
+: "${DB_PASSWORD:?DB_PASSWORD is not set}"
+BACKUP_FILE="$LCBP3_DIR/shared/backups/db-backup-$(date +%Y%m%d-%H%M%S).sql"
+docker exec lcbp3-mariadb mysqldump -u root "-p${DB_PASSWORD}" lcbp3_dms > "$BACKUP_FILE"
+gzip "$BACKUP_FILE"
+echo "✓ Backup created: $BACKUP_FILE.gz"
+
+# Step 2: Pull latest images
+echo "[2/9] Pulling latest Docker images..."
+cd $LCBP3_DIR/$TARGET
+docker-compose pull
+echo "✓ Images pulled"
+
+# Step 3: Update configuration
+echo "[3/9] Updating configuration..."
+# Copy .env if changed
+if [ -f "$LCBP3_DIR/.env.production.new" ]; then
+    cp $LCBP3_DIR/.env.production.new $LCBP3_DIR/$TARGET/.env.production
+    echo "✓ Configuration updated"
+fi
+
+# Step 4: Start target environment
+echo "[4/9] Starting $TARGET environment..."
+docker-compose up -d
+echo "✓ $TARGET environment started"
+
+# Step 5: Wait for services to be ready
+echo "[5/9] Waiting for services to be healthy..."
+sleep 10
+
+# Check backend health
+for i in {1..30}; do
+    if docker exec lcbp3-${TARGET}-backend curl -f http://localhost:3000/health > /dev/null 2>&1; then
+        echo "✓ Backend is healthy"
+        break
+    fi
+    if [ $i -eq 30 ]; then
+        echo "✗ Backend health check failed!"
+        docker-compose logs backend
+        exit 1
+    fi
+    sleep 2
+done
+
+# Step 6: Run database migrations
+echo "[6/9] Running database migrations..."
+docker exec lcbp3-${TARGET}-backend npm run migration:run
+echo "✓ Migrations completed"
+
+# Step 7: Switch NGINX to target environment
+echo "[7/9] Switching NGINX to $TARGET..."
+NGINX_CONF="$LCBP3_DIR/nginx-proxy/nginx.conf"
+# Rewrite the file contents instead of using sed -i: nginx.conf is bind-mounted as a
+# single file, and sed -i replaces the inode, so the container would keep the old file.
+sed -e "s/lcbp3-${CURRENT}-backend/lcbp3-${TARGET}-backend/g" \
+    -e "s/lcbp3-${CURRENT}-frontend/lcbp3-${TARGET}-frontend/g" \
+    "$NGINX_CONF" > "$NGINX_CONF.tmp"
+cat "$NGINX_CONF.tmp" > "$NGINX_CONF"
+rm "$NGINX_CONF.tmp"
+docker exec lcbp3-nginx nginx -t
+docker exec lcbp3-nginx nginx -s reload
+# Record the switch now so that rollback.sh knows which environment to leave
+echo "$TARGET" > "$LCBP3_DIR/current"
+echo "✓ NGINX switched to $TARGET"
+
+# Step 8: Verify new environment
+echo "[8/9] Verifying new environment..."
+sleep 5
+if curl -f -k https://lcbp3-dms.example.com/health > /dev/null 2>&1; then
+    echo "✓ New environment is responding"
+else
+    echo "✗ New environment verification failed!"
+    echo "Rolling back..."
+    # Absolute path: the working directory here is the target environment, not scripts/
+    "$LCBP3_DIR/scripts/rollback.sh"
+    exit 1
+fi
+
+# Step 9: Stop old environment
+echo "[9/9] Stopping $CURRENT environment..."
+cd "$LCBP3_DIR/$CURRENT"
+docker-compose down
+echo "✓ $CURRENT environment stopped"
+
+echo "========================================="
+echo "✓ Deployment completed successfully!"
+echo "Active environment: $TARGET"
+echo "========================================="
+
+# Send notification (optional)
+# /scripts/send-notification.sh "Deployment completed: $TARGET is now active"
+```
+
+### Rollback Script
+
+```bash
+#!/bin/bash
+# File: /volume1/lcbp3/scripts/rollback.sh
+
+set -e
+
+LCBP3_DIR="/volume1/lcbp3"
+CURRENT=$(cat $LCBP3_DIR/current)
+PREVIOUS=$([[ "$CURRENT" == "blue" ]] && echo "green" || echo "blue")
+
+echo "========================================="
+echo "LCBP3-DMS Rollback"
+echo "========================================="
+echo "Current: $CURRENT"
+echo "Rolling back to: $PREVIOUS"
+echo "========================================="
+
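+# Start previous environment if stopped (it must be up before NGINX points at it)
+echo "[1/3] Ensuring $PREVIOUS environment is running..."
+cd "$LCBP3_DIR/$PREVIOUS"
+docker-compose up -d
+sleep 10
+echo "✓ $PREVIOUS environment is running"
+
+# Switch NGINX back
+echo "[2/3] Switching NGINX to $PREVIOUS..."
+NGINX_CONF="$LCBP3_DIR/nginx-proxy/nginx.conf"
+# Rewrite the contents rather than using sed -i, which would replace the inode of the
+# bind-mounted file and leave the container reading the old configuration.
+sed -e "s/lcbp3-${CURRENT}-backend/lcbp3-${PREVIOUS}-backend/g" \
+    -e "s/lcbp3-${CURRENT}-frontend/lcbp3-${PREVIOUS}-frontend/g" \
+    "$NGINX_CONF" > "$NGINX_CONF.tmp"
+cat "$NGINX_CONF.tmp" > "$NGINX_CONF"
+rm "$NGINX_CONF.tmp"
+docker exec lcbp3-nginx nginx -t
+docker exec lcbp3-nginx nginx -s reload
+echo "✓ NGINX switched"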
+
+# Verify
+echo "[3/3] Verifying rollback..."
+if curl -f -k https://lcbp3-dms.example.com/health > /dev/null 2>&1; then
+    echo "✓ Rollback successful"
+    echo "$PREVIOUS" > $LCBP3_DIR/current
+else
+    echo "✗ Rollback verification failed!"
+    exit 1
+fi
+
+echo "========================================="
+echo "✓ Rollback completed"
+echo "Active environment: $PREVIOUS"
+echo "========================================="
+```
+
+### Make Scripts Executable
+
+```bash
+chmod +x /volume1/lcbp3/scripts/deploy.sh
+chmod +x /volume1/lcbp3/scripts/rollback.sh
+```
+
+---
+
+## 📋 Deployment Checklist
+
+### Pre-Deployment
+
+- [ ] Backup current database
+- [ ] Tag Docker images with version
+- [ ] Update `.env.production` if needed
+- [ ] Review migration scripts
+- [ ] Notify stakeholders of deployment window
+- [ ] Verify SSL certificate validity (> 30 days)
+- [ ] Check disk space (> 20% free)
+- [ ] Review recent error logs
+
+### During Deployment
+
+- [ ] Pull latest Docker images
+- [ ] Start target environment (blue/green)
+- [ ] Run database migrations
+- [ ] Verify health checks pass
+- [ ] Switch NGINX proxy
+- [ ] Verify application responds correctly
+- [ ] Check for errors in logs
+- [ ] Monitor performance metrics
+
+### Post-Deployment
+
+- [ ] Monitor logs for 30 minutes
+- [ ] Check performance metrics
+- [ ] Verify all features working
+- [ ] Test critical user flows
+- [ ] Stop old environment
+- [ ] Update deployment log
+- [ ] Notify stakeholders of completion
+- [ ] Archive old Docker images
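+
+The disk-space and certificate items of the pre-deployment checklist could be automated. The following is a minimal sketch, not part of the project's scripts: the function names and the paths in the usage comments are assumptions, and the expiry check relies on `openssl x509 -checkend`.
+
+```bash
+#!/bin/bash
+# Hypothetical pre-deployment checks (illustrative sketch only)
+
+# Succeeds when the filesystem holding $1 has more than $2 percent free space.
+disk_free_ok() {
+  local path="$1" min_free="$2" used
+  # df -P prints one line per filesystem; column 5 is the used capacity, e.g. "42%"
+  used=$(df -P "$path" | awk 'NR==2 {gsub("%", "", $5); print $5}')
+  [ $((100 - used)) -gt "$min_free" ]
+}
+
+# Succeeds when the certificate in $1 is still valid $2 days from now.
+cert_valid_for_days() {
+  local cert="$1" days="$2"
+  openssl x509 -checkend $((days * 86400)) -noout -in "$cert" > /dev/null
+}
+
+# Example with the checklist thresholds (paths are assumptions):
+# disk_free_ok /volume1/lcbp3 20 || echo "Less than 20% disk space free"
+# cert_valid_for_days /volume1/lcbp3/nginx-proxy/ssl/cert.pem 30 || echo "Certificate expires within 30 days"
+```
+
+Both functions only report through their exit status, so a wrapper can decide whether a failed check should abort the deployment or just print a warning.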
+
+---
+
+## 🔍 Troubleshooting
+
+### Common Issues
+
+#### 1. Container Won't Start
+
+```bash
+# Check logs
+docker logs lcbp3-blue-backend
+
+# Check resource usage
+docker stats
+
+# Restart container
+docker restart lcbp3-blue-backend
+```
+
+#### 2. Database Connection Failed
+
+```bash
+# Check MariaDB is running
+docker ps | grep mariadb
+
+# Test connection (-it is needed for the interactive password prompt)
+docker exec -it lcbp3-mariadb mysql -u lcbp3_user -p -e "SELECT 1"
+
+# Check environment variables
+docker exec lcbp3-blue-backend env | grep DB_
+```
+
+#### 3. NGINX 502 Bad Gateway
+
+```bash
+# Check backend is running
+curl http://localhost:3000/health
+
+# Check NGINX configuration
+docker exec lcbp3-nginx nginx -t
+
+# Check NGINX logs
+docker logs lcbp3-nginx
+
+# Reload NGINX
+docker exec lcbp3-nginx nginx -s reload
+```
+
+#### 4. Migration Failed
+
+```bash
+# Check migration status
+docker exec lcbp3-blue-backend npm run migration:show
+
+# Revert last migration
+docker exec lcbp3-blue-backend npm run migration:revert
+
+# Re-run migrations
+docker exec lcbp3-blue-backend npm run migration:run
+```
+
+---
+
+## 📊 Monitoring
+
+### Health Checks
+
+```bash
+# Backend health
+curl https://lcbp3-dms.example.com/health
+
+# Database health
+docker exec lcbp3-mariadb mysqladmin ping
+
+# Redis health
+docker exec lcbp3-redis redis-cli ping
+
+# All containers status
+docker ps --filter "name=lcbp3" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
+```
+
+### Performance Monitoring
+
+```bash
+# Container resource usage
+docker stats --no-stream
+
+# Disk usage
+df -h /volume1/lcbp3
+
+# Database size (-it is needed for the interactive password prompt)
+docker exec -it lcbp3-mariadb mysql -u root -p -e "
+  SELECT table_schema AS 'Database',
+         ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
+  FROM information_schema.tables
+  WHERE table_schema = 'lcbp3_dms'
+  GROUP BY table_schema;"
+```
+
+---
+
+## 🔐 Security Best Practices
+
+1. **Change Default Passwords:** Update all passwords in `.env.production`
+2. **SSL/TLS:** Always use HTTPS in production
+3. **Firewall:** Only expose ports 80, 443, and 22 (SSH)
+4. **Regular Updates:** Keep Docker images updated
+5. **Backup Encryption:** Encrypt database backups
+6. **Access Control:** Limit SSH access to specific IPs
+7. **Secrets Management:** Never commit `.env` files to Git
+8. **Log Monitoring:** Review logs daily for suspicious activity
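+
+For the backup encryption item, one possible approach is symmetric encryption with `openssl enc`. This is a sketch under stated assumptions: the helper names and the `BACKUP_PASSPHRASE` environment variable are hypothetical, and the `-pbkdf2` option requires OpenSSL 1.1.1 or newer.
+
+```bash
+#!/bin/bash
+# Hypothetical helpers; BACKUP_PASSPHRASE is an assumed environment variable.
+
+# Encrypts file $1 to $1.enc with AES-256-CBC and a PBKDF2-derived key.
+encrypt_backup() {
+  openssl enc -aes-256-cbc -pbkdf2 -salt -in "$1" -out "$1.enc" -pass env:BACKUP_PASSPHRASE
+}
+
+# Decrypts file $1 (an .enc file) to the path given in $2.
+decrypt_backup() {
+  openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -out "$2" -pass env:BACKUP_PASSPHRASE
+}
+
+# Example (file name is illustrative):
+# export BACKUP_PASSPHRASE='...'
+# encrypt_backup db-backup.sql.gz && rm db-backup.sql.gz
+```
+
+Reading the passphrase from the environment keeps it out of the process list; where it is stored is a separate decision covered by the secrets management item.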
+
+---
+
+## 📚 Related Documentation
+
+- [Environment Setup Guide](04-02-environment-setup.md)
+- [Backup & Recovery](04-04-backup-recovery.md)
+- [Monitoring & Alerting](04-03-monitoring-alerting.md)
+- [Maintenance Procedures](04-05-maintenance-procedures.md)
+- [ADR-015: Deployment Infrastructure](../05-decisions/ADR-015-deployment-infrastructure.md)
+
+---
+
+**Version:** 1.8.0
+**Last Updated:** 2025-12-02
+**Next Review:** 2026-06-01
--- a/specs/04-Infrastructure-OPS/04-05-maintenance-procedures.md
+++ b/specs/04-Infrastructure-OPS/04-05-maintenance-procedures.md
@@ -0,0 +1,501 @@
+# Maintenance Procedures
+
+**Project:** LCBP3-DMS
+**Version:** 1.8.0
+**Last Updated:** 2025-12-02
+
+---
+
+## 📋 Overview
+
+This document outlines routine maintenance tasks, update procedures, and optimization guidelines for LCBP3-DMS.
+
+---
+
+## 📅 Maintenance Schedule
+
+### Daily Tasks
+
+- Monitor system health and backups
+- Review error logs
+- Check disk space
+
+### Weekly Tasks
+
+- Database optimization
+- Log rotation and cleanup
+- Security patch review
+- Performance monitoring review
+
+### Monthly Tasks
+
+- SSL certificate check
+- Dependency updates (Security patches)
+- Database maintenance
+- Backup restoration test
+
+### Quarterly Tasks
+
+- Full system update
+- Capacity planning review
+- Security audit
+- Disaster recovery drill
+
+---
+
+## 🔄 Update Procedures
+
+### Application Updates
+
+#### Backend Update
+
+```bash
+#!/bin/bash
+# File: /scripts/update-backend.sh
+
+# Step 1: Backup database
+/scripts/backup-database.sh
+
+# Step 2: Pull latest code
+cd /app/lcbp3/backend
+git pull origin main
+
+# Step 3: Install dependencies
+docker exec lcbp3-backend npm install
+
+# Step 4: Run migrations
+docker exec lcbp3-backend npm run migration:run
+
+# Step 5: Build application
+docker exec lcbp3-backend npm run build
+
+# Step 6: Restart backend
+docker restart lcbp3-backend
+
+# Step 7: Verify health
+sleep 10
+curl -f http://localhost:3000/health || {
+  echo "Health check failed! Rolling back..."
+  docker exec lcbp3-backend npm run migration:revert
+  docker restart lcbp3-backend
+  exit 1
+}
+
+echo "Backend updated successfully"
+```
+
+#### Frontend Update
+
+```bash
+#!/bin/bash
+# File: /scripts/update-frontend.sh
+
+# Step 1: Pull latest code
+cd /app/lcbp3/frontend
+git pull origin main
+
+# Step 2: Install dependencies
+docker exec lcbp3-frontend npm install
+
+# Step 3: Build application
+docker exec lcbp3-frontend npm run build
+
+# Step 4: Restart frontend
+docker restart lcbp3-frontend
+
+# Step 5: Verify
+sleep 10
+curl -f http://localhost:3001 || {
+  echo "Frontend failed to start!"
+  exit 1
+}
+
+echo "Frontend updated successfully"
+```
+
+### Zero-Downtime Deployment
+
+```bash
+#!/bin/bash
+# File: /scripts/zero-downtime-deploy.sh
+
+# Using blue-green deployment strategy
+
+# Step 1: Start new "green" backend
+docker-compose -f docker-compose.green.yml up -d backend
+
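+# Step 2: Wait for health check (abort if green never becomes healthy)
+HEALTHY=0
+for i in {1..30}; do
+  if curl -f http://localhost:3002/health; then
+    HEALTHY=1
+    break
+  fi
+  sleep 2
+done
+if [ "$HEALTHY" -ne 1 ]; then
+  echo "Green backend failed its health check; keeping blue active"
+  exit 1
+fi
+
+# Step 3: Switch NGINX to green
+# (the upstream in nginx.conf must already point at the green backend; the reload only applies it)
+docker exec lcbp3-nginx nginx -s reload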
+
+# Step 4: Stop old "blue" backend
+docker stop lcbp3-backend-blue
+
+echo "Deployment completed with zero downtime"
+```
+
+---
+
+## 🗄️ Database Maintenance
+
+### Weekly Database Optimization
+
+```sql
+-- File: /scripts/optimize-database.sql
+
+-- Optimize tables
+OPTIMIZE TABLE correspondences;
+OPTIMIZE TABLE rfas;
+OPTIMIZE TABLE workflow_instances;
+OPTIMIZE TABLE attachments;
+
+-- Analyze tables for query optimization
+ANALYZE TABLE correspondences;
+ANALYZE TABLE rfas;
+
+-- Check for table corruption
+CHECK TABLE correspondences;
+CHECK TABLE rfas;
+
+-- Rebuild indexes if fragmented
+ALTER TABLE correspondences ENGINE=InnoDB;
+```
+
+```bash
+#!/bin/bash
+# File: /scripts/weekly-db-maintenance.sh
+
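+# -i forwards the SQL file on stdin; DB_PASSWORD (root password) must be set in the environment
+docker exec -i lcbp3-mariadb mysql -u root "-p${DB_PASSWORD}" lcbp3_dms < /scripts/optimize-database.sql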
+
+echo "Database optimization completed: $(date)"
+```
+
+### Monthly Database Cleanup
+
+```sql
+-- Archive old audit logs (older than 1 year)
+-- Use one fixed cutoff and a transaction so that only archived rows are deleted
+SET @cutoff = DATE_SUB(NOW(), INTERVAL 1 YEAR);
+
+START TRANSACTION;
+
+INSERT INTO audit_logs_archive
+SELECT * FROM audit_logs
+WHERE created_at < @cutoff;
+
+DELETE FROM audit_logs
+WHERE created_at < @cutoff;
+
+COMMIT;
+
+-- Clean up deleted notifications (older than 90 days)
+DELETE FROM notifications
+WHERE deleted_at IS NOT NULL
+AND deleted_at < DATE_SUB(NOW(), INTERVAL 90 DAY);
+
+-- Clean up expired temp uploads (older than 24h)
+DELETE FROM temp_uploads
+WHERE created_at < DATE_SUB(NOW(), INTERVAL 1 DAY);
+
+-- Optimize after cleanup
+OPTIMIZE TABLE audit_logs;
+OPTIMIZE TABLE notifications;
+OPTIMIZE TABLE temp_uploads;
+```
+
+---
+
+## 📦 Dependency Updates
+
+### Security Patch Updates (Monthly)
+
+```bash
+#!/bin/bash
+# File: /scripts/update-dependencies.sh
+
+cd /app/lcbp3/backend
+
+# Check for security vulnerabilities
+npm audit
+
+# Update security patches only (no major versions)
+npm audit fix
+
+# Run tests (stop here if they fail)
+npm test || { echo "Tests failed; not committing"; exit 1; }
+
+# If tests pass, commit and deploy
+git add package*.json
+git commit -m "chore: security patch updates"
+git push origin main
+```
+
+### Major Version Updates (Quarterly)
+
+```bash
+# Check for outdated packages
+npm outdated
+
+# Update one major dependency at a time
+npm install @nestjs/core@latest
+
+# Test thoroughly
+npm test
+npm run test:e2e
+
+# If successful, commit
+git commit -am "chore: update @nestjs/core to vX.X.X"
+```
+
+---
+
+## 🧹 Log Management
+
+### Log Rotation Configuration
+
+```bash
+# File: /etc/logrotate.d/lcbp3-dms
+
+/app/logs/*.log {
+  daily
+  rotate 30
+  compress
+  delaycompress
+  missingok
+  notifempty
+  # copytruncate avoids signalling the app: Node.js reserves SIGUSR1 for starting its inspector
+  copytruncate
+}
+```
+
+### Manual Log Cleanup
+
+```bash
+#!/bin/bash
+# File: /scripts/cleanup-logs.sh
+
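+# Delete logs (including compressed ones) older than 90 days
+find /app/logs -name "*.log*" -type f -mtime +90 -delete
+
+# Compress logs older than 7 days
+find /app/logs -name "*.log" -type f -mtime +7 -exec gzip {} \;
+
+# Prune unused Docker objects older than 30 days (stopped containers, unused networks, dangling images, build cache)
+# No --volumes here: it would delete unused data volumes and cannot be combined with the "until" filter
+docker system prune -f --filter "until=720h"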
+
+echo "Log cleanup completed: $(date)"
+```
+
+---
+
+## 🔐 SSL Certificate Renewal
+
+### Check Certificate Expiry
+
+```bash
+#!/bin/bash
+# File: /scripts/check-ssl-cert.sh
+
+CERT_FILE="/app/nginx/ssl/cert.pem"
+EXPIRY_DATE=$(openssl x509 -enddate -noout -in "$CERT_FILE" | cut -d= -f2)
+EXPIRY_EPOCH=$(date -d "$EXPIRY_DATE" +%s)
+NOW_EPOCH=$(date +%s)
+DAYS_LEFT=$(( ($EXPIRY_EPOCH - $NOW_EPOCH) / 86400 ))
+
+echo "SSL certificate expires in $DAYS_LEFT days"
+
+if [ $DAYS_LEFT -lt 30 ]; then
+  echo "WARNING: SSL certificate expires soon!"
+  # Send alert
+  /scripts/send-alert-email.sh "SSL Certificate Expiring" "Certificate expires in $DAYS_LEFT days"
+fi
+```
+
+### Renew SSL Certificate (Let's Encrypt)
+
+```bash
+#!/bin/bash
+# File: /scripts/renew-ssl.sh
+
+# Renew certificate
+certbot renew --webroot -w /app/nginx/html
+
+# Copy new certificate
+cp /etc/letsencrypt/live/lcbp3-dms.example.com/fullchain.pem /app/nginx/ssl/cert.pem
+cp /etc/letsencrypt/live/lcbp3-dms.example.com/privkey.pem /app/nginx/ssl/key.pem
+
+# Reload NGINX
+docker exec lcbp3-nginx nginx -s reload
+
+echo "SSL certificate renewed: $(date)"
+```
+
+---
+
+## 🧪 Performance Optimization
+
+### Database Query Optimization
+
+```sql
+-- Find slow queries
+SELECT * FROM mysql.slow_log
+ORDER BY query_time DESC
+LIMIT 10;
+
+-- Add indexes for frequently queried columns
+CREATE INDEX idx_correspondences_status ON correspondences(status);
+CREATE INDEX idx_rfas_workflow_status ON rfas(workflow_status);
+CREATE INDEX idx_attachments_entity ON attachments(entity_type, entity_id);
+
+-- Analyze query execution plan
+EXPLAIN SELECT * FROM correspondences
+WHERE status = 'PENDING'
+AND created_at > DATE_SUB(NOW(), INTERVAL 30 DAY);
+```
+
+### Redis Cache Optimization
+
+```bash
+#!/bin/bash
+# File: /scripts/optimize-redis.sh
+
+# Check Redis memory usage
+docker exec lcbp3-redis redis-cli INFO memory
+
+# Set max memory policy
+docker exec lcbp3-redis redis-cli CONFIG SET maxmemory 1gb
+docker exec lcbp3-redis redis-cli CONFIG SET maxmemory-policy allkeys-lru
+
+# Save configuration
+docker exec lcbp3-redis redis-cli CONFIG REWRITE
+
+# Clear stale cache (only if needed: this deletes every key in the current database)
+# docker exec lcbp3-redis redis-cli FLUSHDB
+```
+
+### Application Performance Tuning
+
+```typescript
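+// Enable production optimizations in NestJS
+// File: backend/src/main.ts
+import { NestFactory } from '@nestjs/core';
+import * as compression from 'compression';
+import * as timeout from 'connect-timeout';
+import { AppModule } from './app.module';
+
+async function bootstrap() {
+  const app = await NestFactory.create(AppModule, {
+    logger:
+      process.env.NODE_ENV === 'production'
+        ? ['error', 'warn']
+        : ['log', 'error', 'warn', 'debug'],
+  });
+
+  // Enable compression
+  app.use(compression());
+
+  // Enable caching: CacheInterceptor needs injected dependencies, so register it
+  // in AppModule with { provide: APP_INTERCEPTOR, useClass: CacheInterceptor }
+  // instead of calling `new CacheInterceptor()` here.
+
+  // Set global timeout (assumes the connect-timeout middleware)
+  app.use(timeout('30s'));
+
+  await app.listen(3000);
+}
+
+bootstrap();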
+```
+
+---
+
+## 🔒 Security Maintenance
+
+### Monthly Security Tasks
+
+```bash
+#!/bin/bash
+# File: /scripts/security-maintenance.sh
+
+# Update system packages
+apt-get update && apt-get upgrade -y
+
+# Update ClamAV virus definitions
+docker exec lcbp3-clamav freshclam
+
+# Scan for rootkits
+rkhunter --check --skip-keypress
+
+# Check for unauthorized users
+awk -F: '($3 >= 1000) {print $1}' /etc/passwd
+
+# Review sudo access
+cat /etc/sudoers
+
+# Check firewall rules
+iptables -L -n -v
+
+echo "Security maintenance completed: $(date)"
+```
+
+---
+
+## ✅ Maintenance Checklist
+
+### Pre-Maintenance
+
+- [ ] Announce maintenance window to users
+- [ ] Backup database and files
+- [ ] Document current system state
+- [ ] Prepare rollback plan
+
+### During Maintenance
+
+- [ ] Put system in maintenance mode (if needed)
+- [ ] Perform updates/changes
+- [ ] Run smoke tests
+- [ ] Monitor system health
+
+### Post-Maintenance
+
+- [ ] Verify all services running
+- [ ] Run full test suite
+- [ ] Monitor performance metrics
+- [ ] Communicate completion to users
+- [ ] Document changes made
+
+---
+
+## 🔧 Emergency Maintenance
+
+### Unplanned Maintenance Procedures
+
+1. **Assess Urgency**
+
+   - Can it wait for scheduled maintenance?
+   - Is it causing active issues?
+
+2. **Communicate Impact**
+
+   - Notify stakeholders immediately
+   - Estimate downtime
+   - Provide updates every 30 minutes
+
+3. **Execute Carefully**
+
+   - Always backup first
+   - Have rollback plan ready
+   - Test in staging if possible
+
+4. **Post-Maintenance Review**
+   - Document what happened
+   - Identify preventive measures
+   - Update runbooks
+
+---
+
+## 📚 Related Documents
+
+- [Deployment Guide](04-01-deployment-guide.md)
+- [Backup & Recovery](04-04-backup-recovery.md)
+- [Monitoring & Alerting](04-03-monitoring-alerting.md)
+
+---
+
+**Version:** 1.8.0
+**Last Review:** 2025-12-01
+**Next Review:** 2026-03-01
--- a/specs/04-Infrastructure-OPS/04-06-security-operations.md
+++ b/specs/04-Infrastructure-OPS/04-06-security-operations.md
@@ -0,0 +1,444 @@
+# Security Operations
+
+**Project:** LCBP3-DMS
+**Version:** 1.8.0
+**Last Updated:** 2025-12-02
+
+---
+
+## 📋 Overview
+
+This document outlines security monitoring, access control management, vulnerability management, and security incident response for LCBP3-DMS.
+
+---
+
+## 🔒 Access Control Management
+
+### User Access Review
+
+**Monthly Tasks:**
+
+```bash
+#!/bin/bash
+# File: /scripts/audit-user-access.sh
+
+# Export active users (mysql prints tab-separated output when redirected)
+# DB_PASSWORD (root password) must be set in the environment; -p alone cannot prompt inside a script
+docker exec lcbp3-mariadb mysql -u root "-p${DB_PASSWORD}" -e "
+  SELECT user_id, username, email, primary_organization_id, is_active, last_login_at
+  FROM lcbp3_dms.users
+  WHERE is_active = 1
+  ORDER BY last_login_at DESC;
+" > /reports/active-users-$(date +%Y%m%d).tsv
+
+# Find dormant accounts (no login > 90 days)
+docker exec lcbp3-mariadb mysql -u root "-p${DB_PASSWORD}" -e "
+  SELECT user_id, username, email, last_login_at,
+         DATEDIFF(NOW(), last_login_at) AS days_inactive
+  FROM lcbp3_dms.users
+  WHERE is_active = 1
+  AND (last_login_at IS NULL OR last_login_at < DATE_SUB(NOW(), INTERVAL 90 DAY));
+"
+
+echo "User access audit completed: $(date)"
+```
+
+### Role & Permission Audit
+
+```sql
+-- Review users with elevated permissions
+SELECT u.username, u.email, r.role_name, r.scope
+FROM users u
+JOIN user_assignments ua ON u.user_id = ua.user_id
+JOIN roles r ON ua.role_id = r.role_id
+WHERE r.role_name IN ('Superadmin', 'Document Controller', 'Project Manager')
+ORDER BY r.role_name, u.username;
+
+-- Review Global scope roles (highest privilege)
+SELECT u.username, r.role_name
+FROM users u
+JOIN user_assignments ua ON u.user_id = ua.user_id
+JOIN roles r ON ua.role_id = r.role_id
+WHERE r.scope = 'Global';
+```
+
+---
+
+## 🛡️ Security Monitoring
+
+### Log Monitoring for Security Events
+
+```bash
+#!/bin/bash
+# File: /scripts/monitor-security-events.sh
+
+# docker logs writes the container's stderr to stderr, so merge both streams before grepping
+
+# Check for failed login attempts
+docker logs lcbp3-backend 2>&1 | grep "Failed login" | tail -20
+
+# Check for unauthorized access attempts (403)
+docker logs lcbp3-backend 2>&1 | grep "403" | tail -20
+
+# Check for unusual activity patterns
+docker logs lcbp3-backend 2>&1 | grep -E "DELETE|DROP|TRUNCATE" | tail -20
+
+# Check for SQL injection attempts
+docker logs lcbp3-backend 2>&1 | grep -i "SELECT.*FROM.*WHERE" | grep -v "legitimate" | tail -20
+```
+
+### Failed Login Monitoring
+
+```sql
+-- Find accounts with multiple failed login attempts
+SELECT username, failed_attempts, locked_until
+FROM users
+WHERE failed_attempts >= 3
+ORDER BY failed_attempts DESC;
+
+-- Unlock user account after verification
+UPDATE users
+SET failed_attempts = 0, locked_until = NULL
+WHERE user_id = ?;
+```
+
+---
+
+## 🔐 Secrets & Credentials Management
+
+### Password Rotation Schedule
+
+| Credential             | Rotation Frequency       | Owner        |
+| ---------------------- | ------------------------ | ------------ |
+| Database Root Password | Every 90 days            | DBA          |
+| Database App Password  | Every 90 days            | DevOps       |
+| JWT Secret             | Every 180 days           | Backend Team |
+| Redis Password         | Every 90 days            | DevOps       |
+| SMTP Password          | When provider requires   | Operations   |
+| SSL Private Key        | With certificate renewal | Operations   |
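+
+A schedule like this is easier to follow when overdue rotations are flagged automatically. The sketch below is illustrative only: `rotation_due` is a hypothetical helper, the last-rotation dates would have to be recorded somewhere by the team, and the date arithmetic assumes GNU `date` (the `-d` option).
+
+```bash
+#!/bin/bash
+# Hypothetical helper: succeeds when a credential is due for rotation.
+# $1 = last rotation date (YYYY-MM-DD), $2 = rotation interval in days,
+# $3 = reference date (defaults to today). Requires GNU date for -d.
+rotation_due() {
+  local last="$1" interval="$2" today="${3:-$(date +%F)}"
+  local last_epoch today_epoch
+  last_epoch=$(date -u -d "$last" +%s)
+  today_epoch=$(date -u -d "$today" +%s)
+  [ $(( (today_epoch - last_epoch) / 86400 )) -ge "$interval" ]
+}
+
+# Example: a database password with a 90-day interval (the date is illustrative)
+# rotation_due 2025-12-02 90 && echo "Database password is due for rotation"
+```
+
+Passing the reference date as an optional third argument keeps the check deterministic, which makes it straightforward to test.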
+
+### Password Rotation Procedure
+
+```bash
+#!/bin/bash
+# File: /scripts/rotate-db-password.sh
+
+# Generate new password
+NEW_PASSWORD=$(openssl rand -base64 32)
+
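+# Update database user password
+# (DB_PASSWORD, the root password, must be set in the environment; -p alone cannot prompt inside a script)
+docker exec lcbp3-mariadb mysql -u root "-p${DB_PASSWORD}" -e "
+  ALTER USER 'lcbp3_user'@'%' IDENTIFIED BY '$NEW_PASSWORD';
+  FLUSH PRIVILEGES;
+"
+
+# Update application .env file
+# (base64 output can contain "/", so "|" is used as the sed delimiter)
+sed -i "s|^DB_PASS=.*|DB_PASS=$NEW_PASSWORD|" /app/backend/.env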
+
+# Restart backend to apply new password
+docker restart lcbp3-backend
+
+# Verify connection
+sleep 10
+curl -f http://localhost:3000/health || {
+  echo "FAILED: Backend cannot connect with new password"
+  # Rollback procedure...
+  exit 1
+}
+
+echo "Database password rotated successfully: $(date)"
+# Store password securely (e.g., password manager)
+```
+
+---
+
+## 🚨 Vulnerability Management
+
+### Dependency Vulnerability Scanning
+
+```bash
+#!/bin/bash
+# File: /scripts/scan-vulnerabilities.sh
+
+# Backend dependencies
+cd /app/backend
+npm audit --production
+
+# Critical/High vulnerabilities
+VULNERABILITIES=$(npm audit --production --json | jq '.metadata.vulnerabilities.high + .metadata.vulnerabilities.critical')
+
+if [ "$VULNERABILITIES" -gt 0 ]; then
+  echo "WARNING: $VULNERABILITIES critical/high vulnerabilities found!"
+  npm audit --production > /reports/security-audit-$(date +%Y%m%d).txt
+  # Send alert
+  /scripts/send-alert-email.sh "Security Vulnerabilities Detected" "Found $VULNERABILITIES critical/high vulnerabilities"
+fi
+
+# Frontend dependencies
+cd /app/frontend
+npm audit --production
+```
+
+### Container Image Scanning
+
+```bash
+#!/bin/bash
+# File: /scripts/scan-images.sh
+
+# Install Trivy (if not installed)
+# wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | apt-key add -
+# echo "deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | tee -a /etc/apt/sources.list.d/trivy.list
+# apt-get update && apt-get install trivy
+
+# Scan Docker images
+trivy image --severity HIGH,CRITICAL lcbp3-backend:latest
+trivy image --severity HIGH,CRITICAL lcbp3-frontend:latest
+trivy image --severity HIGH,CRITICAL mariadb:11.8
+trivy image --severity HIGH,CRITICAL redis:7.2-alpine
+```
+
+---
+
+## 🔍 Security Hardening
+
+### Server Hardening Checklist
+
+- [ ] Disable root SSH login
+- [ ] Use SSH key authentication only
+- [ ] Configure firewall (allow only necessary ports)
+- [ ] Enable automatic security updates
+- [ ] Remove unnecessary services
+- [ ] Configure fail2ban for brute-force protection
+- [ ] Enable SELinux/AppArmor
+- [ ] Regular security patch updates
+
+### Docker Security
+
+```yaml
+# docker-compose.yml - Security best practices
+
+services:
+  backend:
+    # Run as non-root user
+    user: 'node:node'
+
+    # Read-only root filesystem
+    read_only: true
+
+    # No new privileges
+    security_opt:
+      - no-new-privileges:true
+
+    # Limit capabilities
+    cap_drop:
+      - ALL
+    cap_add:
+      - NET_BIND_SERVICE
+
+    # Resource limits
+    deploy:
+      resources:
+        limits:
+          cpus: '2'
+          memory: 2G
+        reservations:
+          memory: 512M
+```
+
+### Database Security
+
+```sql
+-- Remove anonymous users
+DELETE FROM mysql.user WHERE User='';
+
+-- Remove test database
+DROP DATABASE IF EXISTS test;
+
+-- Remove remote root login
+DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1');
+
+-- Create dedicated backup user with minimal privileges
+CREATE USER 'backup_user'@'localhost' IDENTIFIED BY 'STRONG_PASSWORD';
+GRANT SELECT, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER ON lcbp3_dms.* TO 'backup_user'@'localhost';
+
+-- Enable SSL for database connections
+-- GRANT USAGE ON *.* TO 'lcbp3_user'@'%' REQUIRE SSL;
+
+FLUSH PRIVILEGES;
+```
+
+---
+
+## 🚨 Security Incident Response
+
+### Incident Classification
+
+| Type                    | Examples                     | Response Time    |
+| ----------------------- | ---------------------------- | ---------------- |
+| **Data Breach**         | Unauthorized data access     | Immediate (< 1h) |
+| **Account Compromise**  | Stolen credentials           | Immediate (< 1h) |
+| **DDoS Attack**         | Service unavailable          | Immediate (< 1h) |
+| **Malware/Ransomware**  | Infected systems             | Immediate (< 1h) |
+| **Unauthorized Access** | Failed authentication spikes | High (< 4h)      |
+| **Suspicious Activity** | Unusual patterns             | Medium (< 24h)   |
+
+### Data Breach Response
+
+**Immediate Actions:**
+
+1. **Contain the breach**
+
+   ```bash
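+   # Block suspicious IPs at firewall level
+   # (-I inserts at the top of the chain, so the rule is evaluated before existing ACCEPT rules)
+   iptables -I INPUT 1 -s <SUSPICIOUS_IP> -j DROP
+
+   # Disable compromised user accounts
+   # (-it allocates a TTY so that -p can prompt for the root password;
+   #  `mariadb` is the client binary name in current MariaDB images)
+   docker exec -it lcbp3-mariadb mariadb -u root -p -e "
+     UPDATE lcbp3_dms.users
+     SET is_active = 0
+     WHERE user_id = <COMPROMISED_USER_ID>;
+   "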
+   ```
+
+2. **Assess impact**
+
+   ```sql
+   -- Check audit logs for unauthorized access
+   SELECT * FROM audit_logs
+   WHERE user_id = <COMPROMISED_USER_ID>
+   AND created_at >= '<SUSPECTED_START_TIME>'
+   ORDER BY created_at DESC;
+
+   -- Check what documents were accessed
+   SELECT DISTINCT entity_id, entity_type, action
+   FROM audit_logs
+   WHERE user_id = <COMPROMISED_USER_ID>;
+   ```
+
+3. **Notify stakeholders**
+
+   - Security officer
+   - Management
+   - Affected users (if applicable)
+   - Legal team (if required by law)
+
+4. **Document everything**
+   - Timeline of events
+   - Data accessed/compromised
+   - Actions taken
+   - Lessons learned
+
+### Account Compromise Response
+
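+```bash
+#!/bin/bash
+# File: /scripts/respond-account-compromise.sh
+# Usage: respond-account-compromise.sh <USER_ID>
+
+USER_ID=${1:?"Usage: $0 <USER_ID>"}
+
+# `docker exec` without a TTY cannot answer an interactive `-p` prompt, so read
+# the root password once and hand it to the client through MYSQL_PWD.
+read -rsp "MariaDB root password: " MYSQL_PWD && echo
+export MYSQL_PWD
+
+run_sql() {
+  # `-e MYSQL_PWD` forwards the variable without putting its value on the command line
+  docker exec -e MYSQL_PWD lcbp3-mariadb mariadb -u root -e "$1"
+}
+
+# 1. Immediately disable account
+run_sql "
+  UPDATE lcbp3_dms.users
+  SET is_active = 0,
+      locked_until = DATE_ADD(NOW(), INTERVAL 24 HOUR)
+  WHERE user_id = $USER_ID;
+"
+
+# 2. Invalidate all sessions
+# DEL does not expand glob patterns, so list the matching keys with --scan first
+docker exec lcbp3-redis sh -c \
+  "redis-cli --scan --pattern 'session:user:$USER_ID:*' | xargs -r redis-cli DEL"
+
+# 3. Generate audit report
+run_sql "
+  SELECT * FROM lcbp3_dms.audit_logs
+  WHERE user_id = $USER_ID
+  AND created_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
+  ORDER BY created_at DESC;
+" > /reports/compromise-audit-user-$USER_ID-$(date +%Y%m%d).txt
+
+# 4. Notify security team
+/scripts/send-alert-email.sh "Account Compromise" "User ID $USER_ID has been compromised and disabled"
+
+echo "Account compromise response completed for User ID: $USER_ID"
+```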
+
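Because `$USER_ID` is interpolated directly into SQL statements and into a Redis key pattern, the response script is safer if the argument is validated before anything runs. The following is a minimal sketch, assuming user IDs consist only of digits; the helper name `is_valid_user_id` is hypothetical and not part of the project.

```bash
# Hypothetical helper: succeeds only for a non-empty, digits-only ID.
# Assumption: user_id values are plain integers.
is_valid_user_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit character
    *)           return 0 ;;
  esac
}
```

In the script this could guard the first step, for example `is_valid_user_id "$USER_ID" || { echo "Invalid user ID" >&2; exit 1; }`.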
+---
+
+## 📊 Security Metrics & KPIs
+
+### Monthly Security Report
+
+| Metric                      | Target    | Actual |
+| --------------------------- | --------- | ------ |
+| Failed Login Attempts       | < 100/day | Track  |
+| Locked Accounts             | < 5/month | Track  |
+| Critical Vulnerabilities    | 0         | Track  |
+| High Vulnerabilities        | < 5       | Track  |
+| Unpatched Systems           | 0         | Track  |
+| Security Incidents          | 0         | Track  |
+| Mean Time To Detect (MTTD)  | < 1 hour  | Track  |
+| Mean Time To Respond (MTTR) | < 4 hours | Track  |
+
+---
+
+## 🔐 Compliance & Audit
+
+### Audit Log Retention
+
+- **Access Logs:** 1 year
+- **Security Events:** 2 years
+- **Admin Actions:** 3 years
+- **Data Changes:** 7 years (as required)
+
+### Compliance Checklist
+
+- [ ] Regular security audits (quarterly)
+- [ ] Penetration testing (annually)
+- [ ] Access control reviews (monthly)
+- [ ] Encryption at rest and in transit
+- [ ] Secure password policies enforced
+- [ ] Multi-factor authentication (if required)
+- [ ] Data backup and recovery tested
+- [ ] Incident response plan documented and tested
+
+---
+
+## ✅ Security Operations Checklist
+
+### Daily
+
+- [ ] Review security alerts and logs
+- [ ] Monitor failed login attempts
+- [ ] Check for unusual access patterns
+- [ ] Verify backup completion
+
+### Weekly
+
+- [ ] Review user access logs
+- [ ] Scan for vulnerabilities
+- [ ] Update virus definitions
+- [ ] Review firewall logs
+
+### Monthly
+
+- [ ] User access audit
+- [ ] Role and permission review
+- [ ] Security patch application
+- [ ] Compliance review
+
+### Quarterly
+
+- [ ] Full security audit
+- [ ] Penetration testing
+- [ ] Disaster recovery drill
+- [ ] Update security policies
+
+---
+
+## 🔗 Related Documents
+
+- [Incident Response](04-07-incident-response.md)
+- [Monitoring & Alerting](04-03-monitoring.md)
+- [ADR-004: RBAC Implementation](../05-decisions/ADR-004-rbac-implementation.md)
+
+---
+
+**Version:** 1.8.0
+**Last Review:** 2025-12-01
+**Next Review:** 2026-03-01
--- a/specs/04-Infrastructure-OPS/04-07-incident-response.md
+++ b/specs/04-Infrastructure-OPS/04-07-incident-response.md
@@ -0,0 +1,483 @@
+# Incident Response Procedures
+
+**Project:** LCBP3-DMS
+**Version:** 1.8.0
+**Last Updated:** 2025-12-02
+
+---
+
+## 📋 Overview
+
+This document outlines incident classification, response procedures, and post-incident reviews for LCBP3-DMS.
+
+---
+
+## 🚨 Incident Classification
+
+### Severity Levels
+
+| Severity          | Description                  | Response Time     | Examples                                        |
+| ----------------- | ---------------------------- | ----------------- | ----------------------------------------------- |
+| **P0 - Critical** | Complete system outage       | 15 minutes        | Database down, All services unavailable         |
+| **P1 - High**     | Major functionality impaired | 1 hour            | Authentication failing, Cannot create documents |
+| **P2 - Medium**   | Degraded performance         | 4 hours           | Slow response time, Some features broken        |
+| **P3 - Low**      | Minor issues                 | Next business day | UI glitch, Non-critical bug                     |
+
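The response targets in the severity table can be encoded so that alerting or ticketing scripts use the same numbers as this document. This is a sketch only; the function name `response_target` and the `next-business-day` token are assumptions made for illustration.

```bash
# Hypothetical helper: print the response target in minutes for a severity level.
# P3 ("Next business day") has no fixed minute value, so a token is printed instead.
response_target() {
  case "$1" in
    P0) echo 15 ;;
    P1) echo 60 ;;
    P2) echo 240 ;;
    P3) echo next-business-day ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For example, `response_target P1` prints `60`, matching the 1 hour target for P1 incidents.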
+---
+
+## 📞 Incident Response Team
+
+### Roles & Responsibilities
+
+**Incident Commander (IC)**
+
+- Coordinates response efforts
+- Makes final decisions
+- Communicates with stakeholders
+
+**Technical Lead (TL)**
+
+- Diagnoses technical issues
+- Implements fixes
+- Coordinates with engineers
+
+**Communications Lead (CL)**
+
+- Updates stakeholders
+- Manages internal/external communications
+- Documents incident timeline
+
+**On-Call Engineer**
+
+- First responder
+- Initial triage and investigation
+- Escalates to appropriate team
+
+---
+
+## 🔄 Incident Response Workflow
+
+```mermaid
+flowchart TD
+    Start([Incident Detected]) --> Acknowledge[Acknowledge Incident]
+    Acknowledge --> Assess[Assess Severity]
+    Assess --> P0{Severity?}
+
+    P0 -->|P0/P1| Alert[Page Incident Commander]
+    P0 -->|P2/P3| Assign[Assign to On-Call]
+
+    Alert --> Investigate[Investigate Root Cause]
+    Assign --> Investigate
+
+    Investigate --> Mitigate[Implement Mitigation]
+    Mitigate --> Verify[Verify Resolution]
+
+    Verify --> Resolved{Resolved?}
+    Resolved -->|No| Escalate[Escalate/Re-assess]
+    Escalate --> Investigate
+
+    Resolved -->|Yes| Communicate[Communicate Resolution]
+    Communicate --> PostMortem[Schedule Post-Mortem]
+    PostMortem --> End([Close Incident])
+```
+
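The severity decision in the workflow above can be sketched as a small routing function. The function name and the returned action labels are hypothetical; a real setup would call whatever paging or ticketing tool the team uses.

```bash
# Hypothetical helper mirroring the "Severity?" decision node in the flowchart:
# P0/P1 page the Incident Commander, P2/P3 go to the on-call engineer.
route_incident() {
  case "$1" in
    P0|P1) echo "page-incident-commander" ;;
    P2|P3) echo "assign-on-call" ;;
    *)     echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```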
+---
+
+## 📋 Incident Response Playbooks
+
+### P0: Database Down
+
+**Symptoms:**
+
+- Backend returns 500 errors
+- Cannot connect to database
+- Health check fails
+
+**Immediate Actions:**
+
+1. **Verify Issue**
+
+   ```bash
+   docker ps | grep mariadb
+   docker logs lcbp3-mariadb --tail=50
+   ```
+
+2. **Attempt Restart**
+
+   ```bash
+   docker restart lcbp3-mariadb
+   ```
+
+3. **Check Database Process**
+
+   ```bash
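+   # The server process is named mariadbd in current MariaDB releases (mysqld in older ones)
+   docker exec lcbp3-mariadb ps aux | grep -E "mariadbd|mysqld"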
+   ```
+
+4. **If Restart Fails:**
+
+   ```bash
+   # Check disk space
+   df -h
+
+   # Check database logs for corruption
+   docker exec lcbp3-mariadb cat /var/log/mysql/error.log
+
+   # If corrupted, restore from backup
+   # See 04-02-backup-recovery.md
+   ```
+
+5. **Escalate to DBA** if not resolved in 30 minutes
+
+---
+
+### P0: Complete System Outage
+
+**Symptoms:**
+
+- All services return 502/503
+- Health checks fail
+- Users cannot access system
+
+**Immediate Actions:**
+
+1. **Check Container Status**
+
+   ```bash
+   docker-compose ps
+   # Identify which containers are down
+   ```
+
+2. **Restart All Services**
+
+   ```bash
+   docker-compose restart
+   ```
+
+3. **Check QNAP Server Resources**
+
+   ```bash
+   top
+   df -h
+   free -h
+   ```
+
+4. **Check Network**
+
+   ```bash
+   ping 8.8.8.8
+   netstat -tlnp
+   ```
+
+5. **If Server Issue:**
+   - Reboot QNAP server
+   - Contact QNAP support
+
+---
+
+### P1: Authentication System Failing
+
+**Symptoms:**
+
+- Users cannot log in
+- JWT validation fails
+- 401 errors
+
+**Immediate Actions:**
+
+1. **Check Redis (Session Store)**
+
+   ```bash
+   docker exec lcbp3-redis redis-cli ping
+   # Should return PONG
+   ```
+
+2. **Check JWT Secret Configuration**
+
+   ```bash
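+   # Verify the secret is set without printing its value to the terminal
+   docker exec lcbp3-backend sh -c 'test -n "$JWT_SECRET" && echo "JWT_SECRET is set" || echo "JWT_SECRET is EMPTY"'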
+   ```
+
+3. **Check Backend Logs**
+
+   ```bash
+   # 2>&1 includes messages the container wrote to stderr
+   docker logs lcbp3-backend --tail=100 2>&1 | grep -E "JWT|Auth"
+   ```
+
+4. **Temporary Mitigation:**
+   ```bash
+   # Restart backend to reload config
+   docker restart lcbp3-backend
+   ```
+
+---
+
+### P1: File Upload Failing
+
+**Symptoms:**
+
+- Users cannot upload files
+- 500 errors on file upload
+- "Disk full" errors
+
+**Immediate Actions:**
+
+1. **Check Disk Space**
+
+   ```bash
+   df -h /var/lib/docker/volumes/lcbp3_uploads
+   ```
+
+2. **If Disk Full:**
+
+   ```bash
+   # Clean up temp uploads
+   find /var/lib/docker/volumes/lcbp3_uploads/_data/temp \
+     -type f -mtime +1 -delete
+   ```
+
+3. **Check ClamAV (Virus Scanner)**
+
+   ```bash
+   docker logs lcbp3-clamav --tail=50
+   docker restart lcbp3-clamav
+   ```
+
+4. **Check File Permissions**
+   ```bash
+   docker exec lcbp3-backend ls -la /app/uploads
+   ```
+
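The disk-space check in the steps above can be turned into a scripted threshold test. This is a sketch under stated assumptions: the helper names and the default threshold of 90% are hypothetical, and the parsing relies on POSIX `df -P` output with a filesystem name that contains no spaces.

```bash
# Hypothetical helper: print the used-space percentage of the filesystem
# that contains the given path, read from the Capacity column of `df -P`.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when usage is at or above the threshold
# (second argument, default 90).
disk_nearly_full() {
  [ "$(disk_usage_percent "$1")" -ge "${2:-90}" ]
}
```

For example, `disk_nearly_full /var/lib/docker/volumes/lcbp3_uploads 85 && echo "uploads volume above 85%"` would flag the volume before uploads start failing.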
+---
+
+### P2: Slow Performance
+
+**Symptoms:**
+
+- Pages load slowly
+- API response time > 2s
+- Users complain about slowness
+
+**Actions:**
+
+1. **Check System Resources**
+
+   ```bash
+   docker stats
+   # Identify high CPU/memory containers
+   ```
+
+2. **Check Database Performance**
+
+   ```sql
+   -- Show currently running queries (long-running ones have a high Time value)
+   SHOW PROCESSLIST;
+
+   -- Check connections
+   SHOW STATUS LIKE 'Threads_connected';
+   ```
+
+3. **Check Redis**
+
+   ```bash
+   docker exec lcbp3-redis redis-cli --stat
+   ```
+
+4. **Check Application Logs**
+
+   ```bash
+   docker logs lcbp3-backend 2>&1 | grep "Slow request"
+   ```
+
+5. **Temporary Mitigation:**
+   - Restart slow containers
+   - Clear Redis cache if needed
+   - Kill long-running queries
+
+---
+
+### P2: Email Notifications Not Sending
+
+**Symptoms:**
+
+- Users not receiving emails
+- Email queue backing up
+
+**Actions:**
+
+1. **Check Email Queue**
+
+   ```bash
+   # Access BullMQ dashboard or check Redis
+   # BullMQ keeps waiting jobs in the list bull:<queue>:wait
+   docker exec lcbp3-redis redis-cli LLEN bull:email:wait
+   ```
+
+2. **Check Email Processor Logs**
+
+   ```bash
+   docker logs lcbp3-backend 2>&1 | grep -E "email|SMTP"
+   ```
+
+3. **Test SMTP Connection**
+
+   ```bash
+   docker exec lcbp3-backend node -e "
+   const nodemailer = require('nodemailer');
+   const transport = nodemailer.createTransport({
+     host: process.env.SMTP_HOST,
+     port: process.env.SMTP_PORT,
+     auth: {
+       user: process.env.SMTP_USER,
+       pass: process.env.SMTP_PASS
+     }
+   });
+   transport.verify().then(console.log).catch(console.error);
+   "
+   ```
+
+4. **Check SMTP Credentials**
+   - Verify not expired
+   - Check firewall/network access
+
+---
+
+## 📝 Incident Documentation
+
+### Incident Report Template
+
+```markdown
+# Incident Report: [Brief Description]
+
+**Incident ID:** INC-YYYYMMDD-001
+**Severity:** P1
+**Status:** Resolved
+**Incident Commander:** [Name]
+
+## Timeline
+
+| Time  | Event                                                     |
+| ----- | --------------------------------------------------------- |
+| 14:00 | Alert: High error rate detected                           |
+| 14:05 | On-call engineer acknowledged                             |
+| 14:10 | Identified root cause: Database connection pool exhausted |
+| 14:15 | Implemented mitigation: Increased pool size               |
+| 14:20 | Verified resolution                                       |
+| 14:30 | Incident resolved                                         |
+
+## Impact
+
+- **Duration:** 30 minutes
+- **Affected Users:** ~50 users
+- **Affected Services:** Document creation, Search
+- **Data Loss:** None
+
+## Root Cause
+
+Database connection pool was exhausted due to slow queries not releasing connections.
+
+## Resolution
+
+1. Increased connection pool size from 10 to 20
+2. Optimized slow queries
+3. Added connection pool monitoring
+
+## Action Items
+
+- [ ] Add connection pool size alert (Owner: DevOps, Due: Next Sprint)
+- [ ] Implement automatic query timeouts (Owner: Backend, Due: 2025-12-15)
+- [ ] Review all queries for optimization (Owner: DBA, Due: 2025-12-31)
+
+## Lessons Learned
+
+- Connection pool monitoring was insufficient
+- Need automated remediation for common issues
+```
+
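Incident IDs in the `INC-YYYYMMDD-001` format used by the template can be generated rather than typed by hand. A minimal sketch, assuming the last part is a per-day sequence number; the function name `incident_id` is hypothetical.

```bash
# Hypothetical helper: build an ID in the INC-YYYYMMDD-NNN format.
# $1 = sequence number for the day (decimal, no leading zeros)
# $2 = date as YYYYMMDD (optional, defaults to today)
incident_id() {
  printf 'INC-%s-%03d\n' "${2:-$(date +%Y%m%d)}" "$1"
}
```

Calling `incident_id 1` on the day of the incident yields the first ID for that date.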
+---
+
+## 🔍 Post-Incident Review (PIR)
+
+### PIR Meeting Agenda
+
+1. **Timeline Review** (10 min)
+
+   - What happened and when?
+   - What was the impact?
+
+2. **Root Cause Analysis** (15 min)
+
+   - Why did it happen?
+   - What were the contributing factors?
+
+3. **What Went Well** (10 min)
+
+   - What did we do right?
+   - What helped us resolve quickly?
+
+4. **What Went Wrong** (15 min)
+
+   - What could we have done better?
+   - What slowed us down?
+
+5. **Action Items** (10 min)
+   - What changes will prevent this?
+   - Who owns each action?
+   - When will they be completed?
+
+### PIR Best Practices
+
+- **Blameless Culture:** Focus on systems, not individuals
+- **Actionable Outcomes:** Every PIR should produce concrete actions
+- **Follow Through:** Track action items to completion
+- **Share Learnings:** Distribute PIR summary to entire team
+
+---
+
+## 📊 Incident Metrics
+
+### Track & Review Monthly
+
+- **MTTR (Mean Time To Resolution):** Average time to resolve incidents
+- **MTBF (Mean Time Between Failures):** Average time between incidents
+- **Incident Frequency:** Number of incidents per month
+- **Severity Distribution:** Breakdown by P0/P1/P2/P3
+- **Repeat Incidents:** Same root cause occurring multiple times
+
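MTTR is an arithmetic mean, so it can be computed directly from the "Duration" values recorded in incident reports. The sketch below assumes durations are given in minutes; the function name `mean_minutes` is hypothetical.

```bash
# Hypothetical helper: print the mean of the given durations (in minutes),
# rounded to one decimal place. Fails without output if no values are given.
mean_minutes() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}
```

For three incidents that took 30, 45, and 15 minutes to resolve, `mean_minutes 30 45 15` gives an MTTR of 30.0 minutes.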
+---
+
+## ✅ Incident Response Checklist
+
+### During Incident
+
+- [ ] Acknowledge incident in tracking system
+- [ ] Assess severity and assign IC
+- [ ] Create incident channel (Slack/Teams)
+- [ ] Begin documenting timeline
+- [ ] Investigate and implement mitigation
+- [ ] Communicate status updates every 30 min (P0/P1)
+- [ ] Verify resolution
+- [ ] Communicate resolution to stakeholders
+
+### After Incident
+
+- [ ] Create incident report
+- [ ] Schedule PIR within 48 hours
+- [ ] Identify action items
+- [ ] Assign owners and deadlines
+- [ ] Update runbooks/playbooks
+- [ ] Share learnings with team
+
+---
+
+## 🔗 Related Documents
+
+- [Monitoring & Alerting](04-03-monitoring.md)
+- [Backup & Recovery](04-02-backup-recovery.md)
+- [Security Operations](04-06-security-operations.md)
+
+---
+
+**Version:** 1.8.0
+**Last Review:** 2025-12-01
+**Next Review:** 2026-03-01
--- a/specs/04-Infrastructure-OPS/README.md
+++ b/specs/04-Infrastructure-OPS/README.md
@@ -0,0 +1,36 @@
+# Infrastructure & Operations (OPS) Guide
+
+**Project:** LCBP3-DMS
+**Version:** 1.8.0
+**Last Updated:** 2026-02-23
+
+---
+
+## 📋 Overview
+
+This directory (`04-Infrastructure-OPS/`) serves as the single source of truth for all infrastructure setups, networking rules, Docker Compose configurations, backups, and site reliability operations for the LCBP3-DMS project.
+
+It consolidates what was previously split across multiple operations and specification folders into a cohesive set of manuals for DevOps, System Administrators, and On-Call Engineers.
+
+---
+
+## 📂 Document Index
+
+| File                                                                     | Purpose                | Key Contents                                                                                |
+| ------------------------------------------------------------------------ | ---------------------- | ------------------------------------------------------------------------------------------- |
+| **[04-01-docker-compose.md](./04-01-docker-compose.md)**                 | Core Environment Setup | `.env` configs, Blue/Green Docker Compose, MariaDB & Redis optimization                     |
+| **[04-02-backup-recovery.md](./04-02-backup-recovery.md)**               | Disaster Recovery      | RTO/RPO strategies, QNAP to ASUSTOR backup scripts, Restic/Mysqldump config                 |
+| **[04-03-monitoring.md](./04-03-monitoring.md)**                         | Observability          | Prometheus metrics, AlertManager rules (including Document Numbering DB), Grafana alerts    |
+| **[04-04-deployment-guide.md](./04-04-deployment-guide.md)**             | Production Rollout     | Step-by-step Blue-Green deployment scripts, rollback playbooks, Nginx Reverse Proxy         |
+| **[04-05-maintenance-procedures.md](./04-05-maintenance-procedures.md)** | Routine Care           | Log rotation, dependency zero-downtime updates, scheduled DB optimizations                  |
+| **[04-06-security-operations.md](./04-06-security-operations.md)**       | Hardening & Audit      | User access review scripts, SSL renewals, vulnerability scanning procedures                 |
+| **[04-07-incident-response.md](./04-07-incident-response.md)**           | Escalation             | P0-P3 classifications, incident commander roles, Post-Incident Review (PIR)                 |
+
+---
+
+## 🎯 Guiding Principles
+
+1. **Zero Downtime Deployments**: Utilize the Blue/Green architecture outlined in `04-04` wherever possible.
+2. **Infrastructure as Code**: No manual unscripted changes. Modify the `docker-compose.yml` specs and `.env.production` templates directly.
+3. **Automated Backups**: Backups must be validated automatically using the ASUSTOR pulling mechanism in `04-02`.
+4. **Actionable Alerts**: No noisy monitoring. Prometheus alerts in `04-03` should route to Slack/PagerDuty only when action is required.