260223:1415 20260223 nextJS & nestJS Best practices
All checks were successful
Build and Deploy / deploy (push) Successful in 4m44s

This commit is contained in:
admin
2026-02-23 14:15:06 +07:00
parent c90a664f53
commit ef16817f38
164 changed files with 24815 additions and 311 deletions

# 04.2 Backup & Disaster Recovery
**Project:** LCBP3-DMS
**Version:** 1.8.0
**Status:** Active
**Owner:** Nattanin Peancharoen / DevOps Team
**Last Updated:** 2026-02-23
> 📍 **Backup Target Server:** ASUSTOR AS5403T (Infrastructure & Backup)
> 🖥️ **Primary Source Server:** QNAP TS-473A (Application & Database)
---
## 📖 Overview
This document outlines the backup strategies, scripts (ASUSTOR pulling from QNAP), recovery procedures, and comprehensive disaster recovery planning for LCBP3-DMS.
---
# Backup & Recovery Procedures
**Last Updated:** 2025-12-02
---
## 🎯 Backup Strategy
### Backup Schedule
| Data Type | Frequency | Retention | Method |
| ---------------------- | -------------- | --------- | ----------------------- |
| Database (Full) | Daily at 02:00 | 30 days | mysqldump + compression |
| Database (Incremental) | Every 6 hours | 7 days | Binary logs |
| File Uploads | Daily at 03:00 | 30 days | rsync to backup server |
| Configuration Files | Weekly | 90 days | Git repository |
| Elasticsearch Indexes | Weekly | 14 days | Snapshot to S3/NFS |
| Application Logs | Daily | 90 days | Rotation + archival |
### Backup Locations
**Primary Backup:** QNAP NAS `/backup/lcbp3-dms`
**Secondary Backup:** External backup server (rsync)
**Offsite Backup:** Cloud storage (optional - for critical data)
---
## 💾 Database Backup
### Automated Daily Backup Script
```bash
#!/bin/bash
# File: /scripts/backup-database.sh
# Fail the pipeline if mysqldump fails, not only if gzip fails
set -o pipefail
# Configuration
BACKUP_DIR="/backup/lcbp3-dms/database"
DB_CONTAINER="lcbp3-mariadb"
DB_NAME="lcbp3_dms"
DB_USER="backup_user"
DB_PASS="<BACKUP_USER_PASSWORD>"
RETENTION_DAYS=30
BACKUP_FILE="$BACKUP_DIR/lcbp3_$(date +%Y%m%d_%H%M%S).sql.gz"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Perform backup (--databases adds CREATE DATABASE/USE statements to the dump)
echo "Starting database backup to $BACKUP_FILE"
if docker exec "$DB_CONTAINER" mysqldump \
    --user="$DB_USER" \
    --password="$DB_PASS" \
    --single-transaction \
    --routines \
    --triggers \
    --databases "$DB_NAME" \
    | gzip > "$BACKUP_FILE"; then
  echo "Backup completed successfully"
  # Delete backups older than the retention period
  find "$BACKUP_DIR" -name "*.sql.gz" -type f -mtime +"$RETENTION_DAYS" -delete
  echo "Old backups cleaned up (retention: $RETENTION_DAYS days)"
else
  echo "ERROR: Backup failed!"
  rm -f "$BACKUP_FILE"   # remove the partial/empty archive
  exit 1
fi
```
### Schedule with Cron
```bash
# Edit crontab
crontab -e
# Add backup job (runs daily at 2 AM)
0 2 * * * /scripts/backup-database.sh >> /var/log/backup-database.log 2>&1
```
### Manual Database Backup
```bash
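# Backup specific database. The root password is passed as MYSQL_PWD:
# a bare -p would try to prompt inside a non-interactive docker exec.
read -rs -p 'MariaDB root password: ' DB_ROOT_PASSWORD; echo
BACKUP="backup_$(date +%Y%m%d).sql"
docker exec -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysqldump \
  -u root \
  --single-transaction \
  lcbp3_dms > "$BACKUP"
# Compress backup
gzip "$BACKUP"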
```
---
## 📂 File Uploads Backup
### Automated Rsync Backup
```bash
#!/bin/bash
# File: /scripts/backup-uploads.sh
set -e  # do not report success if rsync fails
SOURCE="/var/lib/docker/volumes/lcbp3_uploads/_data"
DEST="/backup/lcbp3-dms/uploads"
RETENTION_DAYS=30
mkdir -p "$DEST/current"
# Mirror uploads into current/; files replaced or deleted since the last run
# are moved into a dated backup-YYYYMMDD directory instead of being lost
rsync -av --delete \
  --backup --backup-dir="$DEST/backup-$(date +%Y%m%d)" \
  "$SOURCE/" "$DEST/current/"
# Cleanup old backups
find "$DEST" -maxdepth 1 -type d -name "backup-*" -mtime +$RETENTION_DAYS -exec rm -rf {} \;
echo "Upload backup completed: $(date)"
```
### Schedule Uploads Backup
```bash
# Run daily at 3 AM
0 3 * * * /scripts/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1
```
---
## 🔄 Database Recovery
### Full Database Restore
```bash
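# Step 0: Read the root password once (passed to the container as MYSQL_PWD;
# a bare -p cannot prompt while stdin is carrying the SQL dump)
read -rs -p 'MariaDB root password: ' DB_ROOT_PASSWORD; echo
# Step 1: Stop backend application
docker stop lcbp3-backend
# Step 2: Restore database from backup
gunzip < backup_20241201.sql.gz | \
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root lcbp3_dms
# Step 3: Verify restore
docker exec -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root -e "
  USE lcbp3_dms;
  SELECT COUNT(*) FROM users;
  SELECT COUNT(*) FROM correspondences;
"
# Step 4: Restart backend
docker start lcbp3-backend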
```
### Point-in-Time Recovery (Using Binary Logs)
```bash
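# Read the root password once (passed as MYSQL_PWD; a bare -p cannot prompt
# while stdin is carrying SQL)
read -rs -p 'MariaDB root password: ' DB_ROOT_PASSWORD; echo
# Step 1: Restore last full backup
gunzip < backup_20241201_020000.sql.gz | \
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root lcbp3_dms
# Step 2: Replay binary logs from the backup time up to just before the incident.
# Binary logging must be enabled on the server; the glob (expanded inside the
# container) passes every numbered binlog file in order, and mysqlbinlog keeps
# only the events inside the datetime window.
docker exec lcbp3-mariadb sh -c 'mysqlbinlog \
    --start-datetime="2024-12-01 02:00:00" \
    --stop-datetime="2024-12-01 14:30:00" \
    /var/lib/mysql/mysql-bin.[0-9]*' | \
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root lcbp3_dms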
```
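The `--start-datetime` value should match the moment the full backup was taken. A minimal sketch, assuming dump files follow the `lcbp3_YYYYMMDD_HHMMSS.sql.gz` naming produced by the daily backup script above, derives it from the file name; `backup_start_datetime` is a hypothetical helper, not an existing script. The timestamp in the name is taken just before `mysqldump` starts, so replaying from it may re-apply a few transactions already in the dump; recording the exact binlog position at dump time is more precise.

```bash
#!/bin/bash
# Hypothetical helper: turn a backup file name such as
#   /backup/lcbp3-dms/database/lcbp3_20241201_020000.sql.gz
# into the "YYYY-MM-DD HH:MM:SS" form expected by mysqlbinlog --start-datetime.
backup_start_datetime() {
  local name re
  name=$(basename "$1")
  re='^lcbp3_([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})\.sql\.gz$'
  # Refuse names that do not follow the automated backup naming
  if [[ ! "$name" =~ $re ]]; then
    echo "unrecognised backup name: $name" >&2
    return 1
  fi
  printf '%s-%s-%s %s:%s:%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
    "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}" "${BASH_REMATCH[6]}"
}
```

For example, `START=$(backup_start_datetime "$LATEST_BACKUP")` followed by `--start-datetime="$START"` keeps the replay window tied to the dump that was actually restored.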
---
## 📁 File Uploads Recovery
### Restore from Backup
```bash
# Stop backend to prevent file operations
docker stop lcbp3-backend
# Restore files
rsync -av \
  /backup/lcbp3-dms/uploads/current/ \
  /var/lib/docker/volumes/lcbp3_uploads/_data/
# Restart backend
docker start lcbp3-backend
# Fix ownership (docker exec needs a running container, and chown needs root)
docker exec -u root lcbp3-backend chown -R node:node /app/uploads
```
---
## 🚨 Disaster Recovery Plan
### RTO & RPO
- **RTO (Recovery Time Objective):** 4 hours
- **RPO (Recovery Point Objective):** 24 hours (for files), 6 hours (for database)
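One way to watch these targets is to compare the age of the newest backup against the RPO. The sketch below is an assumption rather than part of the existing scripts: `rpo_ok` is a hypothetical helper that relies on GNU `stat` and only inspects modification times, so it says nothing about whether the backup is restorable (the weekly restore test covers that).

```bash
#!/bin/bash
# Hypothetical RPO check: succeeds if the newest file matching <glob> in <dir>
# is younger than <max_age_hours>. Uses GNU stat; looks at mtime only.
# Usage: rpo_ok <dir> <glob> <max_age_hours>
rpo_ok() {
  local dir=$1 glob=$2 max_hours=$3 newest age_hours
  # Newest matching file by modification time ($glob is expanded on purpose)
  newest=$(ls -t "$dir"/$glob 2>/dev/null | head -1)
  if [ -z "$newest" ]; then
    echo "no backups matching $glob in $dir" >&2
    return 1
  fi
  age_hours=$(( ($(date +%s) - $(stat -c %Y "$newest")) / 3600 ))
  if [ "$age_hours" -ge "$max_hours" ]; then
    echo "RPO exceeded: $newest is ${age_hours}h old (limit ${max_hours}h)" >&2
    return 1
  fi
  echo "OK: $newest is ${age_hours}h old"
}
```

For example, `rpo_ok /backup/lcbp3-dms/database 'lcbp3_*.sql.gz' 24` checks the daily full dumps; the 6-hour database RPO additionally depends on the binary logs, which this check does not look at.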
### DR Scenarios
#### Scenario 1: Database Corruption
**Detection:** Database errors in logs, application errors
**Recovery Time:** 30 minutes
**Steps:**
1. Stop backend
2. Restore last full backup
3. Apply binary logs (if needed)
4. Verify data integrity
5. Restart services
#### Scenario 2: Complete Server Failure
**Detection:** Server unresponsive
**Recovery Time:** 4 hours
**Steps:**
1. Provision new QNAP server or VM
2. Install Docker & Container Station
3. Clone Git repository
4. Restore database backup
5. Restore file uploads
6. Deploy containers
7. Update DNS (if needed)
8. Verify functionality
#### Scenario 3: Ransomware Attack
**Detection:** Encrypted files, ransom note
**Recovery Time:** 6 hours
**Steps:**
1. **DO NOT pay ransom**
2. Isolate infected server
3. Provision clean environment
4. Restore from offsite backup
5. Scan restored backup for malware
6. Deploy and verify
7. Review security logs
8. Implement additional security measures
---
## ✅ Backup Verification
### Weekly Backup Testing
```bash
#!/bin/bash
# File: /scripts/test-backup.sh
set -euo pipefail
# mysql client inside the container; the root password is taken from
# DB_ROOT_PASSWORD via MYSQL_PWD (a bare -p cannot prompt under cron or a pipe)
db() {
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root "$@"
}
# Create temporary test database; drop it on exit, even if a step fails
db -e "CREATE DATABASE IF NOT EXISTS test_restore;"
trap 'db -e "DROP DATABASE IF EXISTS test_restore;"' EXIT
# Restore latest backup into the test database by rewriting the dump's USE statement
LATEST_BACKUP=$(ls -t /backup/lcbp3-dms/database/*.sql.gz | head -1)
gunzip < "$LATEST_BACKUP" | \
  sed 's/USE `lcbp3_dms`/USE `test_restore`/g' | \
  db
# Verify table counts
db -e "
  SELECT COUNT(*) FROM test_restore.users;
  SELECT COUNT(*) FROM test_restore.correspondences;
"
echo "Backup verification completed: $(date)"
```
### Monthly DR Drill
- Test full system restore on standby server
- Document time taken and issues encountered
- Update DR procedures based on findings
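To make "document time taken" repeatable, each drill step can be wrapped so its duration and exit status are logged automatically. A minimal sketch with a hypothetical `drill_step` helper; the CSV layout is an assumption, and step names must not contain commas.

```bash
#!/bin/bash
# Hypothetical DR-drill helper: run one step, append
# "step,exit_code,seconds" to a CSV log, and pass the exit code through.
# drill_step <log.csv> <step-name> <command> [args...]
drill_step() {
  local log=$1 name=$2 start rc
  shift 2
  start=$(date +%s)
  "$@"
  rc=$?
  printf '%s,%d,%d\n' "$name" "$rc" "$(( $(date +%s) - start ))" >> "$log"
  return "$rc"
}
# Example: drill_step "drill-$(date +%F).csv" restore-database /scripts/test-backup.sh
```

The resulting CSV can be attached to the drill report and compared against the RTO from one drill to the next.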
---
## 📊 Backup Monitoring
### Backup Status Dashboard
Monitor:
- ✅ Last successful backup timestamp
- ✅ Backup file size (detect anomalies)
- ✅ Backup success/failure rate
- ✅ Available backup storage space
### Alerts
Send alert if:
- ❌ Backup fails
- ❌ Backup file size < 50% of average (possible corruption)
- ❌ No backup in last 48 hours
- ❌ Backup storage < 20% free
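The age, size and free-space alerts above can be evaluated by a short script run from cron after the nightly jobs. A sketch under these assumptions: dumps live in one directory as `*.sql.gz`, GNU `stat` and `find` are available, and `check_backups` is a hypothetical name. Backup failures themselves are not detected here; they surface as a non-zero exit status in the backup scripts' logs.

```bash
#!/bin/bash
# Hypothetical alert check for the thresholds listed above.
# check_backups <backup_dir>: prints one ALERT line per problem, returns 1 if any.
check_backups() {
  local dir=$1 problems=0 latest size avg free_pct
  latest=$(ls -t "$dir"/*.sql.gz 2>/dev/null | head -1)
  if [ -z "$latest" ]; then
    echo "ALERT: no backup files in $dir"
    return 1
  fi
  # No backup in the last 48 hours
  if [ -z "$(find "$latest" -mmin -$((48 * 60)))" ]; then
    echo "ALERT: newest backup $latest is older than 48 hours"; problems=1
  fi
  # Newest file below 50% of the average size (possible corruption)
  size=$(stat -c %s "$latest")
  avg=$(stat -c %s "$dir"/*.sql.gz | awk '{ s += $1 } END { print int(s / NR) }')
  if [ "$size" -lt $(( avg / 2 )) ]; then
    echo "ALERT: $latest is $size bytes, below 50% of the ${avg}-byte average"; problems=1
  fi
  # Less than 20% free space on the backup filesystem (df -P: 5th field is Use%)
  free_pct=$(df -P "$dir" | awk 'NR == 2 { gsub("%", "", $5); print 100 - $5 }')
  if [ "$free_pct" -lt 20 ]; then
    echo "ALERT: only ${free_pct}% free on backup storage"; problems=1
  fi
  return "$problems"
}
```

Its output can be fed into whatever notification channel the monitoring stack already uses, for example only when `check_backups /backup/lcbp3-dms/database` returns non-zero.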
---
## 🔧 Maintenance
### Optimize Backup Performance
```sql
-- Enable InnoDB compression for large tables
ALTER TABLE correspondences ROW_FORMAT=COMPRESSED;
ALTER TABLE workflow_history ROW_FORMAT=COMPRESSED;
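-- Archive old audit logs: move records older than 1 year to the archive table.
-- A fixed cutoff inside one transaction makes the DELETE remove exactly the
-- rows that were copied (NOW() would otherwise differ between the statements).
SET @cutoff = DATE_SUB(NOW(), INTERVAL 1 YEAR);
START TRANSACTION;
INSERT INTO audit_logs_archive
SELECT * FROM audit_logs
WHERE created_at < @cutoff;
DELETE FROM audit_logs
WHERE created_at < @cutoff;
COMMIT;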
```
---
## 📚 Backup Checklist
### Daily Tasks
- [ ] Verify automated backups completed
- [ ] Check backup log files for errors
- [ ] Monitor backup storage space
### Weekly Tasks
- [ ] Test restore from random backup
- [ ] Review backup size trends
- [ ] Verify offsite backups synced
### Monthly Tasks
- [ ] Full DR drill
- [ ] Review and update DR procedures
- [ ] Test backup restoration on different server
### Quarterly Tasks
- [ ] Audit backup access controls
- [ ] Review backup retention policies
- [ ] Update backup documentation
---
## 🔗 Related Documents
- [Deployment Guide](04-01-deployment-guide.md)
- [Monitoring & Alerting](04-03-monitoring-alerting.md)
- [Incident Response](04-07-incident-response.md)
---
**Version:** 1.8.0
**Last Review:** 2025-12-01
**Next Review:** 2026-03-01
---
# Backup Strategy for LCBP3-DMS
> 📍 **Deploy on:** ASUSTOR AS5403T (Infrastructure Server)
> 🎯 **Backup Source:** QNAP TS-473A (Application & Database)
> 📄 **Version:** v1.8.0
---
## Overview
Backups are pull-based: ASUSTOR pulls the data from QNAP, rather than QNAP pushing it to ASUSTOR.
If QNAP is compromised, the attacker therefore cannot delete the backups stored on ASUSTOR.
```
┌─────────────────────────────────────────────────────────────────┐
│                       BACKUP ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  QNAP (Source)                  ASUSTOR (Backup Target)         │
│  192.168.10.8                   192.168.10.9                    │
│                                                                 │
│  ┌──────────────┐   SSH/Rsync   ┌──────────────────────┐        │
│  │ MariaDB      │ ────────────▶ │ /volume1/backup/db/  │        │
│  │ (mysqldump)  │   Daily 2AM   │ (Restic Repository)  │        │
│  └──────────────┘               └──────────────────────┘        │
│                                                                 │
│  ┌──────────────┐               ┌──────────────────────┐        │
│  │ Redis RDB    │ ────────────▶ │ /volume1/backup/     │        │
│  │ + AOF        │   Daily 3AM   │ redis/               │        │
│  └──────────────┘               └──────────────────────┘        │
│                                                                 │
│  ┌──────────────┐               ┌──────────────────────┐        │
│  │ App Config   │ ────────────▶ │ /volume1/backup/     │        │
│  │ + Volumes    │   Weekly Sun  │ config/              │        │
│  └──────────────┘               └──────────────────────┘        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
---
## 1. MariaDB Backup
### 1.1 Daily Database Backup Script
```bash
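#!/bin/bash
# File: /volume1/np-dms/scripts/backup-mariadb.sh
# Run on: ASUSTOR (Pull from QNAP)
# Cron runs with a minimal environment: MARIADB_ROOT_PASSWORD and RESTIC_PASSWORD
# (or RESTIC_PASSWORD_FILE) must be provided to this script explicitly.
set -o pipefail
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/volume1/backup/db"
RESTIC_REPO="/volume1/backup/restic-repo"   # same repository as in section 5
QNAP_IP="192.168.10.8"
DB_NAME="lcbp3_db"
DB_USER="root"
DB_PASSWORD="${MARIADB_ROOT_PASSWORD:?MARIADB_ROOT_PASSWORD is not set}"
echo "🔄 Starting MariaDB backup at $DATE"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Remote mysqldump via SSH, compressed on arrival. Abort on failure so an
# empty or truncated dump is never added to the Restic repository.
if ! ssh "admin@$QNAP_IP" "docker exec mariadb mysqldump \
    --single-transaction \
    --routines \
    --triggers \
    -u $DB_USER -p$DB_PASSWORD $DB_NAME" | gzip > "$BACKUP_DIR/lcbp3_$DATE.sql.gz"; then
  echo "❌ MariaDB backup failed"
  rm -f "$BACKUP_DIR/lcbp3_$DATE.sql.gz"
  exit 1
fi
# Add to Restic repository
restic -r "$RESTIC_REPO" backup "$BACKUP_DIR/lcbp3_$DATE.sql.gz"
# Keep only last 30 days of raw files
find "$BACKUP_DIR" -name "lcbp3_*.sql.gz" -mtime +30 -delete
echo "✅ MariaDB backup complete: lcbp3_$DATE.sql.gz"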
```
### 1.2 Cron Schedule (ASUSTOR)
```cron
# MariaDB daily backup at 2 AM
0 2 * * * /volume1/np-dms/scripts/backup-mariadb.sh >> /var/log/backup-mariadb.log 2>&1
```
---
## 2. Redis Backup
### 2.1 Redis Backup Script
```bash
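#!/bin/bash
# File: /volume1/np-dms/scripts/backup-redis.sh
# Run on: ASUSTOR (Pull from QNAP)
set -e
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/volume1/backup/redis"
QNAP_IP="192.168.10.8"
REDIS_DATA="/share/np-dms/services/cache/data"
# Add "-a <password>" here if the cache container uses requirepass
REDIS_CLI="docker exec cache redis-cli"
echo "🔄 Starting Redis backup at $DATE"
mkdir -p "$BACKUP_DIR"
# Trigger BGSAVE on QNAP Redis, then poll LASTSAVE until the new snapshot
# is written (a fixed sleep can copy a half-written dump.rdb)
BEFORE=$(ssh "admin@$QNAP_IP" "$REDIS_CLI LASTSAVE")
ssh "admin@$QNAP_IP" "$REDIS_CLI BGSAVE"
for i in {1..60}; do
  sleep 2
  NOW=$(ssh "admin@$QNAP_IP" "$REDIS_CLI LASTSAVE")
  [ "$NOW" != "$BEFORE" ] && break
done
if [ "$NOW" = "$BEFORE" ]; then
  echo "❌ BGSAVE did not complete within 120 s"
  exit 1
fi
# Copy RDB and AOF files (Redis 7+ keeps the AOF as several files in
# appendonlydir/ instead of a single appendonly.aof; adjust if so)
scp "admin@$QNAP_IP:$REDIS_DATA/dump.rdb" "$BACKUP_DIR/redis_$DATE.rdb"
scp "admin@$QNAP_IP:$REDIS_DATA/appendonly.aof" "$BACKUP_DIR/redis_$DATE.aof"
# Compress; -C stores bare file names, so the archive extracts without
# the /volume1/backup/redis/ prefix
tar -czf "$BACKUP_DIR/redis_$DATE.tar.gz" -C "$BACKUP_DIR" \
  "redis_$DATE.rdb" "redis_$DATE.aof"
# Cleanup raw files and archives past the 7-day retention
rm "$BACKUP_DIR/redis_$DATE.rdb" "$BACKUP_DIR/redis_$DATE.aof"
find "$BACKUP_DIR" -name "redis_*.tar.gz" -mtime +7 -delete
echo "✅ Redis backup complete: redis_$DATE.tar.gz"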
```
### 2.2 Cron Schedule
```cron
# Redis daily backup at 3 AM
0 3 * * * /volume1/np-dms/scripts/backup-redis.sh >> /var/log/backup-redis.log 2>&1
```
---
## 3. Application Config Backup
### 3.1 Weekly Config Backup Script
```bash
#!/bin/bash
# File: /volume1/np-dms/scripts/backup-config.sh
# Run on: ASUSTOR (Pull from QNAP)
set -e
DATE=$(date +%Y%m%d)
BACKUP_DIR="/volume1/backup/config"
QNAP_IP="192.168.10.8"
echo "🔄 Starting config backup at $DATE"
mkdir -p "$BACKUP_DIR"
# Sync Docker compose files and configs (runtime data and logs excluded)
rsync -avz --delete \
  --exclude='*/data/*' \
  --exclude='*/logs/*' \
  --exclude='node_modules' \
  "admin@$QNAP_IP:/share/np-dms/" \
  "$BACKUP_DIR/np-dms_$DATE/"
# Compress; -C stores paths relative to BACKUP_DIR (np-dms_YYYYMMDD/...)
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" -C "$BACKUP_DIR" "np-dms_$DATE"
# Cleanup the staging directory and archives past the 4-week retention
rm -rf "$BACKUP_DIR/np-dms_$DATE"
find "$BACKUP_DIR" -name "config_*.tar.gz" -mtime +28 -delete
echo "✅ Config backup complete: config_$DATE.tar.gz"
```
### 3.2 Cron Schedule
```cron
# Config weekly backup on Sunday at 4 AM
0 4 * * 0 /volume1/np-dms/scripts/backup-config.sh >> /var/log/backup-config.log 2>&1
```
---
## 4. Retention Policy
| Backup Type | Frequency | Retention | Storage Est. |
| :---------- | :-------- | :-------- | :----------- |
| MariaDB | Daily | 30 days | ~5GB/month |
| Redis | Daily | 7 days | ~500MB |
| Config | Weekly | 4 weeks | ~200MB |
| Restic | Daily | 6 months | Deduplicated |
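The retention periods in the table can be applied uniformly to the raw backup files with one small helper run after the nightly jobs. A minimal sketch; `prune_older_than` is a hypothetical name, and Restic snapshots are pruned separately with `restic forget` (section 5).

```bash
#!/bin/bash
# Hypothetical retention helper: delete files in <dir> (not subdirectories)
# whose names match <name-glob> and that are more than <days> whole days old.
# prune_older_than <dir> <name-glob> <days>
prune_older_than() {
  local dir=$1 pattern=$2 days=$3
  # -print lists each deleted file so the cron log shows what was removed
  find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$days" -print -delete
}
# Example, matching the table above:
#   prune_older_than /volume1/backup/db     'lcbp3_*.sql.gz'  30
#   prune_older_than /volume1/backup/redis  'redis_*.tar.gz'  7
#   prune_older_than /volume1/backup/config 'config_*.tar.gz' 28
```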
---
## 5. Restic Repository Setup
```bash
# Set the password first: init and every later command read it from the
# environment (RESTIC_PASSWORD_FILE keeps it out of shell history and suits cron)
export RESTIC_PASSWORD="your-secure-backup-password"
# Initialize Restic repository (one-time)
restic init -r /volume1/backup/restic-repo
# Check repository status
restic -r /volume1/backup/restic-repo snapshots
# Prune old snapshots (keep 30 daily, 4 weekly, 6 monthly)
restic -r /volume1/backup/restic-repo forget \
  --keep-daily 30 \
  --keep-weekly 4 \
  --keep-monthly 6 \
  --prune
```
---
## 6. Verification Script
```bash
#!/bin/bash
# File: /volume1/np-dms/scripts/verify-backup.sh
echo "📋 Backup Verification Report"
echo "=============================="
echo ""
# Check latest MariaDB backup
LATEST_DB=$(ls -t /volume1/backup/db/*.sql.gz 2>/dev/null | head -1)
if [ -n "$LATEST_DB" ]; then
  echo "✅ Latest DB backup: $LATEST_DB"
  echo "   Size: $(du -h "$LATEST_DB" | cut -f1)"
else
  echo "❌ No DB backup found!"
fi
# Check latest Redis backup
LATEST_REDIS=$(ls -t /volume1/backup/redis/*.tar.gz 2>/dev/null | head -1)
if [ -n "$LATEST_REDIS" ]; then
  echo "✅ Latest Redis backup: $LATEST_REDIS"
else
  echo "❌ No Redis backup found!"
fi
# Check Restic repository (needs RESTIC_PASSWORD or RESTIC_PASSWORD_FILE)
echo ""
echo "📦 Restic Snapshots:"
restic -r /volume1/backup/restic-repo snapshots --latest 5
```
---
> 📝 **Note**: This document is based on Architecture Document **v1.8.0**
---
# Disaster Recovery Plan for LCBP3-DMS
> 📍 **Version:** v1.8.0
> 🖥️ **Primary Server:** QNAP TS-473A (Application & Database)
> 💾 **Backup Server:** ASUSTOR AS5403T (Infrastructure & Backup)
---
## RTO/RPO Targets
| Scenario | RTO | RPO | Priority |
| :-------------------------- | :------ | :----- | :------- |
| Single backend node failure | 0 min | 0 | P0 |
| Redis failure | 5 min | 0 | P0 |
| MariaDB failure | 10 min | 0 | P0 |
| QNAP total failure | 2 hours | 15 min | P1 |
| Data corruption | 4 hours | 1 day | P2 |
---
## 1. Quick Recovery Procedures
### 1.1 Service Not Responding
```bash
# Check container status
docker ps -a | grep <service-name>
# Restart specific service
docker restart <container-name>
# Check logs for errors
docker logs <container-name> --tail 100
```
### 1.2 Redis Failure
```bash
# Check status
docker exec cache redis-cli ping
# Restart
docker restart cache
# Verify
docker exec cache redis-cli ping
```
### 1.3 MariaDB Failure
```bash
# Check status (-it lets mysql prompt for the root password)
docker exec -it mariadb mysql -u root -p -e "SELECT 1"
# Restart
docker restart mariadb
# Wait for startup
sleep 30
# Verify
docker exec -it mariadb mysql -u root -p -e "SHOW DATABASES"
```
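The fixed `sleep 30` above either wastes time or is too short on a slow start. Polling Docker's health status is an alternative; this sketch assumes the container defines a `HEALTHCHECK` (otherwise `docker inspect` has no `.State.Health` to report), and `wait_healthy` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical replacement for a fixed sleep: poll Docker's health status
# until the container reports "healthy" or the timeout expires.
# wait_healthy <container> [timeout_seconds]
wait_healthy() {
  local name=$1 timeout=${2:-120} waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy after ${waited}s"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$name not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
# Example: docker restart mariadb && wait_healthy mariadb 120
```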
---
## 2. Full System Recovery
### 2.1 Recovery Prerequisites (ASUSTOR)
Verify that the backup files are available:
```bash
# SSH to ASUSTOR
ssh admin@192.168.10.9
# List available backups
ls -la /volume1/backup/db/
ls -la /volume1/backup/redis/
ls -la /volume1/backup/config/
# Check Restic snapshots
restic -r /volume1/backup/restic-repo snapshots
```
### 2.2 QNAP Recovery Script
```bash
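#!/bin/bash
# File: /volume1/np-dms/scripts/disaster-recovery.sh
# Run on: ASUSTOR (Push to QNAP)
set -e  # stop at the first failing step instead of continuing blindly
QNAP_IP="192.168.10.8"
BACKUP_DIR="/volume1/backup"
RESTORE_TMP=$(mktemp -d)  # scratch space for extracted archives
echo "🚨 Starting Disaster Recovery..."
echo "================================"
# 1. Restore Docker Network
echo "1⃣ Creating Docker network..."
ssh admin@$QNAP_IP "docker network create lcbp3 || true"
# 2. Restore config files. The archive holds an np-dms_YYYYMMDD/ directory
#    (possibly under a volume1/backup/config/ prefix), so locate it first.
echo "2⃣ Restoring configuration files..."
LATEST_CONFIG=$(ls -t $BACKUP_DIR/config/*.tar.gz | head -1)
tar -xzf "$LATEST_CONFIG" -C "$RESTORE_TMP"
CONFIG_SRC=$(find "$RESTORE_TMP" -maxdepth 4 -type d -name 'np-dms_*' | head -1)
[ -d "$CONFIG_SRC" ] || { echo "❌ No np-dms_* directory in $LATEST_CONFIG"; exit 1; }
rsync -avz "$CONFIG_SRC/" admin@$QNAP_IP:/share/np-dms/
# 3. Start infrastructure services
echo "3⃣ Starting MariaDB..."
ssh admin@$QNAP_IP "cd /share/np-dms/mariadb && docker-compose up -d"
sleep 30
# 4. Restore database. The password is expanded inside the container
#    (assumes MYSQL_ROOT_PASSWORD is set in the mariadb container's environment);
#    the dump has no CREATE DATABASE statement, so create the schema first.
echo "4⃣ Restoring database..."
LATEST_DB=$(ls -t $BACKUP_DIR/db/*.sql.gz | head -1)
ssh admin@$QNAP_IP "docker exec mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" -e \"CREATE DATABASE IF NOT EXISTS lcbp3_db\"'"
gunzip -c "$LATEST_DB" | ssh admin@$QNAP_IP "docker exec -i mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" lcbp3_db'"
# 5. Start Redis
echo "5⃣ Starting Redis..."
ssh admin@$QNAP_IP "cd /share/np-dms/services && docker-compose up -d cache"
# 6. Restore Redis data (if needed). Stop Redis before replacing dump.rdb,
#    because it rewrites the file on shutdown. With appendonly enabled, Redis
#    loads the AOF rather than the RDB at startup, so restore the AOF as well then.
echo "6⃣ Restoring Redis data..."
LATEST_REDIS=$(ls -t $BACKUP_DIR/redis/*.tar.gz | head -1)
tar -xzf "$LATEST_REDIS" -C "$RESTORE_TMP"
RDB_FILE=$(find "$RESTORE_TMP" -name 'redis_*.rdb' | head -1)
ssh admin@$QNAP_IP "docker stop cache"
scp "$RDB_FILE" admin@$QNAP_IP:/share/np-dms/services/cache/data/dump.rdb
ssh admin@$QNAP_IP "docker start cache"
# 7. Start remaining services
echo "7⃣ Starting application services..."
ssh admin@$QNAP_IP "cd /share/np-dms/services && docker-compose up -d"
ssh admin@$QNAP_IP "cd /share/np-dms/npm && docker-compose up -d"
# 8. Health check
echo "8⃣ Running health checks..."
sleep 60
curl -f https://lcbp3.np-dms.work/health || echo "⚠️ Frontend not ready"
curl -f https://backend.np-dms.work/health || echo "⚠️ Backend not ready"
rm -rf "$RESTORE_TMP"
echo ""
echo "✅ Disaster Recovery Complete"
echo "⚠️ Please verify system functionality manually"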
```
---
## 3. Data Corruption Recovery
### 3.1 Point-in-Time Recovery (Database)
```bash
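# List available Restic snapshots
restic -r /volume1/backup/restic-repo snapshots
# Restore specific snapshot. Restic recreates the original absolute path under
# the target, i.e. /tmp/restore/volume1/backup/db/lcbp3_*.sql.gz
restic -r /volume1/backup/restic-repo restore <snapshot-id> --target /tmp/restore/
# Apply restored backup (password expanded inside the container; assumes
# MYSQL_ROOT_PASSWORD is set in the mariadb container's environment)
gunzip -c /tmp/restore/volume1/backup/db/lcbp3_*.sql.gz | \
  ssh admin@192.168.10.8 "docker exec -i mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" lcbp3_db'"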
```
### 3.2 Selective Table Recovery
```bash
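# Extract one table (DROP TABLE, CREATE TABLE and data) from the backup: the
# range runs from mysqldump's "Table structure" comment to the first UNLOCK TABLES
# after it. FOREIGN_KEY_CHECKS=0 lets the DROP succeed when other tables reference
# it; SET NAMES assumes the server uses the utf8mb4 character set.
{
  echo "SET NAMES utf8mb4; SET FOREIGN_KEY_CHECKS=0;"
  gunzip -c /volume1/backup/db/lcbp3_YYYYMMDD_HHMMSS.sql.gz | \
    sed -n '/^-- Table structure for table `documents`/,/^UNLOCK TABLES;/p'
} > /tmp/documents_table.sql
# Restore specific table
ssh admin@192.168.10.8 "docker exec -i mariadb sh -c 'exec mysql -u root -p\"\$MYSQL_ROOT_PASSWORD\" lcbp3_db'" < /tmp/documents_table.sql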
```
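Before extracting a table it helps to confirm its exact name in the dump. A minimal sketch, assuming the dump was written with mysqldump's default comments (no `--skip-comments`); `dump_tables` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: list the tables contained in a gzipped mysqldump file
# by reading the "-- Table structure for table `name`" comment lines.
# dump_tables <file.sql.gz>
dump_tables() {
  gunzip -c "$1" | sed -n 's/^-- Table structure for table `\(.*\)`$/\1/p'
}
# Example: dump_tables /volume1/backup/db/lcbp3_YYYYMMDD_HHMMSS.sql.gz | grep -x documents
```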
---
## 4. Communication & Escalation
### 4.1 Incident Response
| Severity | Response Time | Notify |
| :------- | :------------ | :----------------------------- |
| P0 | Immediate | Admin Team + Management |
| P1 | 30 minutes | Admin Team |
| P2 | 2 hours | Admin Team (next business day) |
### 4.2 Post-Incident Checklist
- [ ] Identify root cause
- [ ] Document timeline of events
- [ ] Verify all services restored
- [ ] Check data integrity
- [ ] Update monitoring alerts if needed
- [ ] Create incident report
---
## 5. Testing Schedule
| Test Type | Frequency | Last Tested | Next Due |
| :---------------------- | :-------- | :---------- | :------- |
| Backup Verification | Weekly | - | - |
| Single Service Recovery | Monthly | - | - |
| Full DR Test | Quarterly | - | - |
---

# Deployment Guide: LCBP3-DMS
---
**Project:** LCBP3-DMS (Laem Chabang Port Phase 3 - Document Management System)
**Version:** 1.8.0
**Last Updated:** 2025-12-02
**Owner:** Operations Team
**Status:** Active
---
## 📋 Overview
This guide provides step-by-step instructions for deploying the LCBP3-DMS system on QNAP Container Station using Docker Compose with a Blue-Green deployment strategy.
### Deployment Strategy
- **Platform:** QNAP TS-473A with Container Station
- **Orchestration:** Docker Compose
- **Deployment Method:** Blue-Green Deployment
- **Zero Downtime:** Yes
- **Rollback Capability:** Instant rollback via NGINX switch
---
## 🎯 Prerequisites
### Hardware Requirements
| Component | Minimum Specification |
| ---------- | -------------------------- |
| CPU | 4 cores @ 2.0 GHz |
| RAM | 16 GB |
| Storage | 500 GB SSD (System + Data) |
| Network | 1 Gbps Ethernet |
| QNAP Model | TS-473A or equivalent |
### Software Requirements
| Software          | Version | Purpose                       |
| ----------------- | ------- | ----------------------------- |
| QNAP QTS          | 5.x+    | Operating System              |
| Container Station | 3.x+    | Docker Management             |
| Docker            | 20.10+  | Container Runtime             |
| Docker Compose    | 2.x+    | Multi-container Orchestration |
### Network Requirements
- Static IP address for QNAP server
- Domain name (e.g., `lcbp3-dms.example.com`)
- SSL certificate (Let's Encrypt or commercial)
- Firewall rules:
- Port 80 (HTTP → HTTPS redirect)
- Port 443 (HTTPS)
- Port 22 (SSH for management)
---
## 🏗️ Infrastructure Setup
### 1. Directory Structure
Create the following directory structure on QNAP:
```bash
# SSH into QNAP
ssh admin@qnap-ip
# Create base directory
mkdir -p /volume1/lcbp3
# Create blue-green environments
mkdir -p /volume1/lcbp3/blue
mkdir -p /volume1/lcbp3/green
# Create shared directories
mkdir -p /volume1/lcbp3/shared/uploads
mkdir -p /volume1/lcbp3/shared/logs
mkdir -p /volume1/lcbp3/shared/backups
# Create persistent volumes
mkdir -p /volume1/lcbp3/volumes/mariadb-data
mkdir -p /volume1/lcbp3/volumes/redis-data
mkdir -p /volume1/lcbp3/volumes/elastic-data
# Create NGINX proxy directory (with ssl/ for the certificates) and scripts directory
mkdir -p /volume1/lcbp3/nginx-proxy/ssl
mkdir -p /volume1/lcbp3/scripts
# Set permissions
chmod -R 755 /volume1/lcbp3
chown -R admin:administrators /volume1/lcbp3
```
**Final Structure:**
```
/volume1/lcbp3/
├── blue/ # Blue environment
│ ├── docker-compose.yml
│ ├── .env.production
│ └── nginx.conf
├── green/ # Green environment
│ ├── docker-compose.yml
│ ├── .env.production
│ └── nginx.conf
├── nginx-proxy/ # Main reverse proxy
│ ├── docker-compose.yml
│ ├── nginx.conf
│ └── ssl/
│ ├── cert.pem
│ └── key.pem
├── shared/ # Shared across blue/green
│ ├── uploads/
│ ├── logs/
│ └── backups/
├── volumes/ # Persistent data
│ ├── mariadb-data/
│ ├── redis-data/
│ └── elastic-data/
├── scripts/ # Deployment scripts
│ ├── deploy.sh
│ ├── rollback.sh
│ └── health-check.sh
└── current # File containing "blue" or "green"
```
### 2. SSL Certificate Setup
```bash
# Option 1: Let's Encrypt (Recommended)
# Install certbot on QNAP
opkg install certbot
# Generate certificate
certbot certonly --standalone \
-d lcbp3-dms.example.com \
--email admin@example.com \
--agree-tos
# Copy to nginx-proxy
cp /etc/letsencrypt/live/lcbp3-dms.example.com/fullchain.pem \
/volume1/lcbp3/nginx-proxy/ssl/cert.pem
cp /etc/letsencrypt/live/lcbp3-dms.example.com/privkey.pem \
/volume1/lcbp3/nginx-proxy/ssl/key.pem
# Option 2: Commercial Certificate
# Upload cert.pem and key.pem to /volume1/lcbp3/nginx-proxy/ssl/
```
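Certificates copied into `ssl/` are not updated when certbot later renews them. One option, sketched here under assumptions, is a certbot deploy hook: certbot runs it after each successful renewal with `RENEWED_LINEAGE` set to the renewed live directory. `install_cert` and the hook path are hypothetical, and the sketch assumes GNU coreutils `install`.

```bash
#!/bin/bash
# Hypothetical deploy-hook helper: copy the renewed certificate chain and key
# into the nginx-proxy ssl/ directory under the names nginx.conf expects.
# install_cert <letsencrypt-live-dir> <ssl-dir>
install_cert() {
  local live_dir=$1 ssl_dir=$2
  install -m 644 "$live_dir/fullchain.pem" "$ssl_dir/cert.pem" &&
    install -m 600 "$live_dir/privkey.pem" "$ssl_dir/key.pem"
}
# Example hook script (hypothetical path /volume1/lcbp3/scripts/renew-hook.sh),
# registered with: certbot renew --deploy-hook /volume1/lcbp3/scripts/renew-hook.sh
#   install_cert "$RENEWED_LINEAGE" /volume1/lcbp3/nginx-proxy/ssl &&
#     docker exec lcbp3-nginx nginx -s reload
```

Because the certificate was obtained with `--standalone`, renewal also needs port 80, which `lcbp3-nginx` occupies once deployed; certbot's `--pre-hook` and `--post-hook` options can stop and start that container around the renewal.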
---
## 📝 Configuration Files
### 1. Environment Variables (.env.production)
Create `.env.production` in both `blue/` and `green/` directories:
```bash
# File: /volume1/lcbp3/blue/.env.production
# DO NOT commit this file to Git!
# Application
NODE_ENV=production
APP_NAME=LCBP3-DMS
APP_URL=https://lcbp3-dms.example.com
# Database
DB_HOST=lcbp3-mariadb
DB_PORT=3306
DB_USERNAME=lcbp3_user
DB_PASSWORD=<CHANGE_ME_STRONG_PASSWORD>
DB_DATABASE=lcbp3_dms
DB_POOL_SIZE=20
# Redis
REDIS_HOST=lcbp3-redis
REDIS_PORT=6379
REDIS_PASSWORD=<CHANGE_ME_STRONG_PASSWORD>
REDIS_DB=0
# JWT Authentication
JWT_SECRET=<CHANGE_ME_RANDOM_64_CHAR_STRING>
JWT_EXPIRES_IN=8h
JWT_REFRESH_EXPIRES_IN=7d
# File Storage
UPLOAD_PATH=/app/uploads
MAX_FILE_SIZE=52428800
ALLOWED_FILE_TYPES=.pdf,.doc,.docx,.xls,.xlsx,.dwg,.zip
# Email (SMTP)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_SECURE=false
SMTP_USERNAME=<YOUR_EMAIL>
SMTP_PASSWORD=<YOUR_APP_PASSWORD>
SMTP_FROM=noreply@example.com
# Elasticsearch
ELASTICSEARCH_NODE=http://lcbp3-elasticsearch:9200
ELASTICSEARCH_USERNAME=elastic
ELASTICSEARCH_PASSWORD=<CHANGE_ME>
# Rate Limiting
THROTTLE_TTL=60
THROTTLE_LIMIT=100
# Logging
LOG_LEVEL=info
LOG_FILE_PATH=/app/logs
# ClamAV (Virus Scanning)
CLAMAV_HOST=lcbp3-clamav
CLAMAV_PORT=3310
```
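The `<CHANGE_ME...>` placeholders need strong random values. A minimal sketch using only `/dev/urandom`, `od` and `tr`, so it does not depend on `openssl` being installed on QNAP; `gen_secret` is a hypothetical helper. Each byte becomes two hex characters, so the default 32 bytes yields the 64-character `JWT_SECRET` the template asks for.

```bash
#!/bin/bash
# Hypothetical secret generator: print N random bytes as lowercase hex.
# gen_secret [bytes]   (default 32 bytes -> 64 hex characters)
gen_secret() {
  head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}
# Example:
#   sed -i "s/^JWT_SECRET=.*/JWT_SECRET=$(gen_secret)/" .env.production
#   chmod 600 .env.production   # readable only by the owner
```

Hex output contains only `[0-9a-f]`, so it is safe to splice into a `sed` replacement or an unquoted `.env` value.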
### 2. Docker Compose - Blue Environment
```yaml
# File: /volume1/lcbp3/blue/docker-compose.yml
# ${VAR} references below are substituted by Compose from a .env file in this
# directory or from --env-file (env_file: only sets variables inside the
# container), e.g.: docker-compose --env-file .env.production up -d
version: '3.8'

services:
  backend:
    image: lcbp3-backend:latest
    container_name: lcbp3-blue-backend
    restart: unless-stopped
    env_file:
      - .env.production
    volumes:
      - /volume1/lcbp3/shared/uploads:/app/uploads
      - /volume1/lcbp3/shared/logs:/app/logs
    depends_on:
      - mariadb
      - redis
      - elasticsearch
    networks:
      - lcbp3-network
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:3000/health']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  frontend:
    image: lcbp3-frontend:latest
    container_name: lcbp3-blue-frontend
    restart: unless-stopped
    environment:
      - NEXT_PUBLIC_API_URL=https://lcbp3-dms.example.com/api
    depends_on:
      - backend
    networks:
      - lcbp3-network
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:3000']
      interval: 30s
      timeout: 10s
      retries: 3

  mariadb:
    image: mariadb:11.8
    container_name: lcbp3-mariadb
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: ${DB_PASSWORD}
      MYSQL_DATABASE: ${DB_DATABASE}
      MYSQL_USER: ${DB_USERNAME}
      MYSQL_PASSWORD: ${DB_PASSWORD}
    volumes:
      - /volume1/lcbp3/volumes/mariadb-data:/var/lib/mysql
    networks:
      - lcbp3-network
    command: >
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_unicode_ci
      --max_connections=200
      --innodb_buffer_pool_size=2G
    healthcheck:
      test: ['CMD', 'mysqladmin', 'ping', '-h', 'localhost']
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    container_name: lcbp3-redis
    restart: unless-stopped
    command: >
      redis-server
      --requirepass ${REDIS_PASSWORD}
      --appendonly yes
      --appendfsync everysec
      --maxmemory 2gb
      --maxmemory-policy allkeys-lru
    volumes:
      - /volume1/lcbp3/volumes/redis-data:/data
    networks:
      - lcbp3-network
    healthcheck:
      # requirepass is set, so authenticate; grep turns an error reply such as
      # NOAUTH into a failed check
      test: ['CMD-SHELL', 'redis-cli --no-auth-warning -a "${REDIS_PASSWORD}" ping | grep -q PONG']
      interval: 10s
      timeout: 3s
      retries: 3

  elasticsearch:
    image: elasticsearch:8.11.0
    container_name: lcbp3-elasticsearch
    restart: unless-stopped
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=${ELASTICSEARCH_PASSWORD}
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
    volumes:
      - /volume1/lcbp3/volumes/elastic-data:/usr/share/elasticsearch/data
    networks:
      - lcbp3-network
    healthcheck:
      # Security is enabled, so the endpoint needs credentials; $$ leaves
      # $ELASTIC_PASSWORD for the container's shell to expand
      test: ['CMD-SHELL', 'curl -fs -u "elastic:$$ELASTIC_PASSWORD" http://localhost:9200/_cluster/health || exit 1']
      interval: 30s
      timeout: 10s
      retries: 5

networks:
  lcbp3-network:
    name: lcbp3-blue-network
    driver: bridge
```
### 3. Docker Compose - NGINX Proxy
```yaml
# File: /volume1/lcbp3/nginx-proxy/docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    container_name: lcbp3-nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/nginx/ssl:ro
      - /volume1/lcbp3/shared/logs/nginx:/var/log/nginx
    networks:
      - lcbp3-blue-network
      - lcbp3-green-network
    healthcheck:
      test: ['CMD', 'nginx', '-t']
      interval: 30s
      timeout: 10s
      retries: 3

networks:
  lcbp3-blue-network:
    external: true
  lcbp3-green-network:
    external: true
```
### 4. NGINX Configuration
```nginx
# File: /volume1/lcbp3/nginx-proxy/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    client_max_body_size 50M;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml+rss;

    # Upstream backends (switch between blue/green)
    upstream backend {
        server lcbp3-blue-backend:3000 max_fails=3 fail_timeout=30s;
        keepalive 32;
    }
    upstream frontend {
        server lcbp3-blue-frontend:3000 max_fails=3 fail_timeout=30s;
        keepalive 32;
    }

    # HTTP to HTTPS redirect
    server {
        listen 80;
        server_name lcbp3-dms.example.com;
        return 301 https://$server_name$request_uri;
    }

    # HTTPS server
    server {
        listen 443 ssl http2;
        server_name lcbp3-dms.example.com;

        # SSL configuration
        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;

        # Security headers
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header X-XSS-Protection "1; mode=block" always;

        # Frontend (Next.js)
        location / {
            proxy_pass http://frontend;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection 'upgrade';
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_cache_bypass $http_upgrade;
        }

        # Backend API. "^~" stops the static-file regex location below from
        # capturing API URLs that end in .png, .js, etc.
        location ^~ /api {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            # Timeouts for file uploads
            proxy_connect_timeout 300s;
            proxy_send_timeout 300s;
            proxy_read_timeout 300s;
        }

        # Health check endpoint (no logging)
        location /health {
            proxy_pass http://backend/health;
            access_log off;
        }

        # Static files caching (upstream keepalive needs HTTP/1.1 and an
        # empty Connection header)
        location ~* \.(jpg|jpeg|png|gif|ico|css|js|svg|woff|woff2|ttf|eot)$ {
            proxy_pass http://frontend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_set_header Host $host;
            expires 1y;
            add_header Cache-Control "public, immutable";
        }
    }
}
```
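The config above hard-codes the blue containers in both `upstream` blocks, and the deployment strategy describes rollback as an NGINX switch. A sketch of that switch, assuming the upstream lines keep the `lcbp3-<color>-backend` / `lcbp3-<color>-frontend` naming; the function names are hypothetical, and this is not the content of the `rollback.sh` listed in the directory tree.

```bash
#!/bin/bash
# Hypothetical sketch of the blue/green "NGINX switch".
# set_upstream_color <nginx.conf> <blue|green>: point both upstreams at one color.
set_upstream_color() {
  local conf=$1 color=$2 tmp
  case "$color" in
    blue|green) ;;
    *) echo "color must be blue or green" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  # Rewrite lcbp3-<color>-backend: / lcbp3-<color>-frontend: container names
  if ! sed -E "s/lcbp3-(blue|green)-(backend|frontend):/lcbp3-${color}-\2:/g" "$conf" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Overwrite in place (not sed -i) to keep the inode the bind mount points at
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}
# reload_nginx: validate first, then reload gracefully. nginx -t also fails
# when the target containers' hostnames cannot be resolved.
reload_nginx() {
  docker exec lcbp3-nginx nginx -t && docker exec lcbp3-nginx nginx -s reload
}
# Example:
#   set_upstream_color /volume1/lcbp3/nginx-proxy/nginx.conf green &&
#     reload_nginx && echo green > /volume1/lcbp3/current
```

The file is rewritten in place because `nginx.conf` is bind-mounted as a single file; `sed -i` would replace it with a new inode that the running container does not see.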
---
## 🚀 Initial Deployment
### Step 1: Prepare Docker Images
```bash
# Build images (on development machine)
cd /path/to/lcbp3/backend
docker build -t lcbp3-backend:1.0.0 .
docker tag lcbp3-backend:1.0.0 lcbp3-backend:latest
cd /path/to/lcbp3/frontend
docker build -t lcbp3-frontend:1.0.0 .
docker tag lcbp3-frontend:1.0.0 lcbp3-frontend:latest
# Save images to tar files
docker save lcbp3-backend:latest | gzip > lcbp3-backend-latest.tar.gz
docker save lcbp3-frontend:latest | gzip > lcbp3-frontend-latest.tar.gz
# Transfer to QNAP
scp lcbp3-backend-latest.tar.gz admin@qnap-ip:/volume1/lcbp3/
scp lcbp3-frontend-latest.tar.gz admin@qnap-ip:/volume1/lcbp3/
# Load images on QNAP
ssh admin@qnap-ip
cd /volume1/lcbp3
docker load < lcbp3-backend-latest.tar.gz
docker load < lcbp3-frontend-latest.tar.gz
```
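Large tarballs copied with `scp` are worth verifying before `docker load`. A minimal sketch, assuming `sha256sum` is available on both the build machine and QNAP; `make_checksums` and `verify_checksums` are hypothetical helpers, and the `SHA256SUMS` file has to be copied along with the images.

```bash
#!/bin/bash
# Hypothetical transfer check for the image tarballs.
# make_checksums <dir>: run on the build machine before scp
make_checksums() {
  (cd "$1" && sha256sum lcbp3-*-latest.tar.gz > SHA256SUMS)
}
# verify_checksums <dir>: run on QNAP after scp; non-zero exit on any mismatch
verify_checksums() {
  (cd "$1" && sha256sum -c SHA256SUMS)
}
# Example on QNAP: verify_checksums /volume1/lcbp3 && docker load < lcbp3-backend-latest.tar.gz
```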
### Step 2: Initialize Database
```bash
# Start MariaDB only
cd /volume1/lcbp3/blue
docker-compose up -d mariadb
# Wait for MariaDB to be ready (mysqladmin ping exits 0 once the server answers)
until docker exec lcbp3-mariadb mysqladmin ping -h localhost --silent; do
  sleep 2
done
# Run migrations
docker-compose up -d backend
docker exec lcbp3-blue-backend npm run migration:run
# Seed initial data (if needed)
docker exec lcbp3-blue-backend npm run seed
```
### Step 3: Start Blue Environment
```bash
cd /volume1/lcbp3/blue
# Start all services
docker-compose up -d
# Check status
docker-compose ps
# View logs
docker-compose logs -f
# Wait for health checks
sleep 30
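# Test health endpoint (port 3000 is not published on the host, so curl from inside the container)
docker exec lcbp3-blue-backend curl -f http://localhost:3000/health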
```
### Step 4: Start NGINX Proxy
```bash
cd /volume1/lcbp3/nginx-proxy
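# Create networks if they do not exist yet (the blue stack already creates lcbp3-blue-network)
for net in lcbp3-blue-network lcbp3-green-network; do
  docker network inspect "$net" >/dev/null 2>&1 || docker network create "$net"
done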
# Start NGINX
docker-compose up -d
# Test NGINX configuration
docker exec lcbp3-nginx nginx -t
# Check NGINX logs
docker logs lcbp3-nginx
```
### Step 5: Set Current Environment
```bash
# Mark blue as current
echo "blue" > /volume1/lcbp3/current
```
### Step 6: Verify Deployment
```bash
# Test HTTPS endpoint
curl -k https://lcbp3-dms.example.com/health
# Test API
curl -k https://lcbp3-dms.example.com/api/health
# Check all containers
docker ps --filter "name=lcbp3"
# Check logs for errors
docker-compose -f /volume1/lcbp3/blue/docker-compose.yml logs --tail=100
```
---
## 🔄 Blue-Green Deployment Process
### Deployment Script
```bash
#!/bin/bash
# File: /volume1/lcbp3/scripts/deploy.sh
set -e # Exit on error
# Configuration
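LCBP3_DIR="/volume1/lcbp3"
# DB_PASSWORD must be exported before running this script (used by the backup step)
: "${DB_PASSWORD:?DB_PASSWORD must be set}"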
CURRENT=$(cat $LCBP3_DIR/current)
TARGET=$([[ "$CURRENT" == "blue" ]] && echo "green" || echo "blue")
echo "========================================="
echo "LCBP3-DMS Blue-Green Deployment"
echo "========================================="
echo "Current environment: $CURRENT"
echo "Target environment: $TARGET"
echo "========================================="
# Step 1: Backup database
echo "[1/9] Creating database backup..."
BACKUP_FILE="$LCBP3_DIR/shared/backups/db-backup-$(date +%Y%m%d-%H%M%S).sql"
docker exec lcbp3-mariadb mysqldump -u root -p${DB_PASSWORD} lcbp3_dms > $BACKUP_FILE
gzip $BACKUP_FILE
echo "✓ Backup created: $BACKUP_FILE.gz"
# Step 2: Pull latest images
echo "[2/9] Pulling latest Docker images..."
cd $LCBP3_DIR/$TARGET
docker-compose pull
echo "✓ Images pulled"
# Step 3: Update configuration
echo "[3/9] Updating configuration..."
# Copy .env if changed
if [ -f "$LCBP3_DIR/.env.production.new" ]; then
cp $LCBP3_DIR/.env.production.new $LCBP3_DIR/$TARGET/.env.production
echo "✓ Configuration updated"
fi
# Step 4: Start target environment
echo "[4/9] Starting $TARGET environment..."
docker-compose up -d
echo "$TARGET environment started"
# Step 5: Wait for services to be ready
echo "[5/9] Waiting for services to be healthy..."
sleep 10
# Check backend health
for i in {1..30}; do
if docker exec lcbp3-${TARGET}-backend curl -f http://localhost:3000/health > /dev/null 2>&1; then
echo "✓ Backend is healthy"
break
fi
if [ $i -eq 30 ]; then
echo "✗ Backend health check failed!"
docker-compose logs backend
exit 1
fi
sleep 2
done
# Step 6: Run database migrations
echo "[6/9] Running database migrations..."
docker exec lcbp3-${TARGET}-backend npm run migration:run
echo "✓ Migrations completed"
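# Step 7: Switch NGINX to target environment
# (mount the nginx-proxy directory rather than nginx.conf alone: sed -i writes a new file,
#  which a single-file bind mount inside the container would not see)
echo "[7/9] Switching NGINX to $TARGET..."
sed -i "s/lcbp3-${CURRENT}-backend/lcbp3-${TARGET}-backend/g" $LCBP3_DIR/nginx-proxy/nginx.conf
sed -i "s/lcbp3-${CURRENT}-frontend/lcbp3-${TARGET}-frontend/g" $LCBP3_DIR/nginx-proxy/nginx.conf
docker exec lcbp3-nginx nginx -t
docker exec lcbp3-nginx nginx -s reload
# Record the switch now so that rollback.sh, which reads this file, switches back to $CURRENT
echo "$TARGET" > $LCBP3_DIR/current
echo "✓ NGINX switched to $TARGET"
# Step 8: Verify new environment
echo "[8/9] Verifying new environment..."
sleep 5
if curl -f -k https://lcbp3-dms.example.com/health > /dev/null 2>&1; then
echo "✓ New environment is responding"
else
echo "✗ New environment verification failed!"
echo "Rolling back..."
# Absolute path: the working directory is $LCBP3_DIR/$TARGET at this point
$LCBP3_DIR/scripts/rollback.sh
exit 1
fi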
# Step 9: Stop old environment
echo "[9/9] Stopping $CURRENT environment..."
cd $LCBP3_DIR/$CURRENT
docker-compose down
echo "$CURRENT environment stopped"
# Update current pointer
echo "$TARGET" > $LCBP3_DIR/current
echo "========================================="
echo "✓ Deployment completed successfully!"
echo "Active environment: $TARGET"
echo "========================================="
# Send notification (optional)
# /scripts/send-notification.sh "Deployment completed: $TARGET is now active"
```
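Before the deployment relies on the backup from Step 1, it can be worth checking that the compressed dump is intact. The sketch below is one way to do that; `verify_backup` is a hypothetical helper, and it assumes the dump is written with mysqldump's default comments, which end the output with a `-- Dump completed on ...` line.

```bash
# Hypothetical helper: returns 0 only if a compressed mysqldump looks complete.
verify_backup() {
  local file="$1"
  # The file must exist and be non-empty
  if [ ! -s "$file" ]; then
    echo "Backup missing or empty: $file" >&2
    return 1
  fi
  # gzip -t tests archive integrity without extracting it
  if ! gzip -t "$file" 2>/dev/null; then
    echo "Backup is not a valid gzip archive: $file" >&2
    return 1
  fi
  # With default options, mysqldump ends its output with a "-- Dump completed" comment
  if ! gzip -dc "$file" | tail -n 5 | grep -q '^-- Dump completed'; then
    echo "Backup looks truncated (no completion marker): $file" >&2
    return 1
  fi
  return 0
}
```

In deploy.sh it would sit right after the `gzip` line, for example `verify_backup "$BACKUP_FILE.gz"`, so that `set -e` stops the deployment if the check fails.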
### Rollback Script
```bash
#!/bin/bash
# File: /volume1/lcbp3/scripts/rollback.sh
set -e
LCBP3_DIR="/volume1/lcbp3"
CURRENT=$(cat $LCBP3_DIR/current)
PREVIOUS=$([[ "$CURRENT" == "blue" ]] && echo "green" || echo "blue")
echo "========================================="
echo "LCBP3-DMS Rollback"
echo "========================================="
echo "Current: $CURRENT"
echo "Rolling back to: $PREVIOUS"
echo "========================================="
# Start the previous environment first: NGINX resolves upstream host names when it
# loads its configuration, so the previous containers must be running before the reload
echo "[1/3] Ensuring $PREVIOUS environment is running..."
cd $LCBP3_DIR/$PREVIOUS
docker-compose up -d
sleep 10
echo "✓ $PREVIOUS environment is running"
# Switch NGINX back
echo "[2/3] Switching NGINX to $PREVIOUS..."
sed -i "s/lcbp3-${CURRENT}-backend/lcbp3-${PREVIOUS}-backend/g" $LCBP3_DIR/nginx-proxy/nginx.conf
sed -i "s/lcbp3-${CURRENT}-frontend/lcbp3-${PREVIOUS}-frontend/g" $LCBP3_DIR/nginx-proxy/nginx.conf
docker exec lcbp3-nginx nginx -t
docker exec lcbp3-nginx nginx -s reload
echo "✓ NGINX switched"
# Verify
echo "[3/3] Verifying rollback..."
if curl -f -k https://lcbp3-dms.example.com/health > /dev/null 2>&1; then
echo "✓ Rollback successful"
echo "$PREVIOUS" > $LCBP3_DIR/current
else
echo "✗ Rollback verification failed!"
exit 1
fi
echo "========================================="
echo "✓ Rollback completed"
echo "Active environment: $PREVIOUS"
echo "========================================="
```
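Both scripts derive the other environment from the `current` file and assume NGINX already points at the environment named there. The helpers below, a sketch with the hypothetical names `other_env` and `env_in_conf`, make both assumptions checkable: `other_env` rejects anything other than `blue` or `green` (the inline expression in the scripts would silently select `blue` for an empty or corrupted file), and `env_in_conf` reads the `lcbp3-<env>-backend` name that the `sed` commands rewrite.

```bash
# Hypothetical helper: print the opposite environment, failing on unexpected input.
other_env() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "Invalid environment: '$1'" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print which environment an nginx.conf routes to,
# based on the lcbp3-<env>-backend upstream name.
env_in_conf() {
  local conf="$1" envs
  envs=$(grep -oE 'lcbp3-(blue|green)-backend' "$conf" | sort -u | sed -E 's/lcbp3-(blue|green)-backend/\1/')
  # Exactly one environment should be referenced
  if [ "$(printf '%s\n' "$envs" | grep -c .)" -ne 1 ]; then
    echo "Expected exactly one backend environment in $conf, found: ${envs:-none}" >&2
    return 1
  fi
  echo "$envs"
}
```

For example, running `[ "$(env_in_conf "$LCBP3_DIR/nginx-proxy/nginx.conf")" = "$(cat "$LCBP3_DIR/current")" ]` before a deployment confirms that the `current` file and NGINX agree.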
### Make Scripts Executable
```bash
chmod +x /volume1/lcbp3/scripts/deploy.sh
chmod +x /volume1/lcbp3/scripts/rollback.sh
```
---
## 📋 Deployment Checklist
### Pre-Deployment
- [ ] Backup current database
- [ ] Tag Docker images with version
- [ ] Update `.env.production` if needed
- [ ] Review migration scripts
- [ ] Notify stakeholders of deployment window
- [ ] Verify SSL certificate validity (> 30 days)
- [ ] Check disk space (> 20% free)
- [ ] Review recent error logs
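The disk-space item above can be checked from a shell. A minimal sketch, assuming a POSIX `df -P`; `disk_free_ok` is a hypothetical helper, and the 20% threshold comes from the checklist:

```bash
# Hypothetical helper: succeeds if the filesystem holding PATH has at least MIN_FREE percent free.
disk_free_ok() {
  local path="$1" min_free="$2" used
  # df -P prints one line per filesystem; field 5 is "Use%", e.g. "63%"
  used=$(df -P "$path" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  # Treat a missing or non-numeric value (e.g. the path does not exist) as a failure
  case "$used" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ $((100 - used)) -ge "$min_free" ]
}
```

Usage: `disk_free_ok /volume1/lcbp3 20 || echo "WARNING: less than 20% free on /volume1/lcbp3"`.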
### During Deployment
- [ ] Pull latest Docker images
- [ ] Start target environment (blue/green)
- [ ] Run database migrations
- [ ] Verify health checks pass
- [ ] Switch NGINX proxy
- [ ] Verify application responds correctly
- [ ] Check for errors in logs
- [ ] Monitor performance metrics
### Post-Deployment
- [ ] Monitor logs for 30 minutes
- [ ] Check performance metrics
- [ ] Verify all features working
- [ ] Test critical user flows
- [ ] Stop old environment
- [ ] Update deployment log
- [ ] Notify stakeholders of completion
- [ ] Archive old Docker images
---
## 🔍 Troubleshooting
### Common Issues
#### 1. Container Won't Start
```bash
# Check logs
docker logs lcbp3-blue-backend
# Check resource usage
docker stats
# Restart container
docker restart lcbp3-blue-backend
```
#### 2. Database Connection Failed
```bash
# Check MariaDB is running
docker ps | grep mariadb
# Test connection
docker exec lcbp3-mariadb mysql -u lcbp3_user -p -e "SELECT 1"
# Check environment variables (the password is filtered out so it is not printed)
docker exec lcbp3-blue-backend env | grep '^DB_' | grep -v '^DB_PASS'
```
#### 3. NGINX 502 Bad Gateway
```bash
# Check backend is running
curl http://localhost:3000/health
# Check NGINX configuration
docker exec lcbp3-nginx nginx -t
# Check NGINX logs
docker logs lcbp3-nginx
# Reload NGINX
docker exec lcbp3-nginx nginx -s reload
```
#### 4. Migration Failed
```bash
# Check migration status
docker exec lcbp3-blue-backend npm run migration:show
# Revert last migration
docker exec lcbp3-blue-backend npm run migration:revert
# Re-run migrations
docker exec lcbp3-blue-backend npm run migration:run
```
---
## 📊 Monitoring
### Health Checks
```bash
# Backend health
curl https://lcbp3-dms.example.com/health
# Database health
docker exec lcbp3-mariadb mysqladmin ping
# Redis health
docker exec lcbp3-redis redis-cli ping
# All containers status
docker ps --filter "name=lcbp3" --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```
### Performance Monitoring
```bash
# Container resource usage
docker stats --no-stream
# Disk usage
df -h /volume1/lcbp3
# Database size
docker exec lcbp3-mariadb mysql -u root -p -e "
SELECT table_schema AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
WHERE table_schema = 'lcbp3_dms'
GROUP BY table_schema;"
```
---
## 🔐 Security Best Practices
1. **Change Default Passwords:** Update all passwords in `.env.production`
2. **SSL/TLS:** Always use HTTPS in production
3. **Firewall:** Only expose ports 80, 443, and 22 (SSH)
4. **Regular Updates:** Keep Docker images updated
5. **Backup Encryption:** Encrypt database backups
6. **Access Control:** Limit SSH access to specific IPs
7. **Secrets Management:** Never commit `.env` files to Git
8. **Log Monitoring:** Review logs daily for suspicious activity
---
## 📚 Related Documentation
- [Environment Setup Guide](04-02-environment-setup.md)
- [Backup & Recovery](04-04-backup-recovery.md)
- [Monitoring & Alerting](04-03-monitoring-alerting.md)
- [Maintenance Procedures](04-05-maintenance-procedures.md)
- [ADR-015: Deployment Infrastructure](../05-decisions/ADR-015-deployment-infrastructure.md)
---
**Version:** 1.8.0
**Last Updated:** 2025-12-02
**Next Review:** 2026-06-01

# Maintenance Procedures
**Project:** LCBP3-DMS
**Version:** 1.8.0
**Last Updated:** 2025-12-02
---
## 📋 Overview
This document outlines routine maintenance tasks, update procedures, and optimization guidelines for LCBP3-DMS.
---
## 📅 Maintenance Schedule
### Daily Tasks
- Monitor system health and backups
- Review error logs
- Check disk space
### Weekly Tasks
- Database optimization
- Log rotation and cleanup
- Security patch review
- Performance monitoring review
### Monthly Tasks
- SSL certificate check
- Dependency updates (Security patches)
- Database maintenance
- Backup restoration test
### Quarterly Tasks
- Full system update
- Capacity planning review
- Security audit
- Disaster recovery drill
---
## 🔄 Update Procedures
### Application Updates
#### Backend Update
```bash
#!/bin/bash
# File: /scripts/update-backend.sh
set -e # Stop at the first failing step
# Step 1: Backup database
/scripts/backup-database.sh
# Step 2: Pull latest code
cd /app/lcbp3/backend
git pull origin main
# Step 3: Install dependencies
docker exec lcbp3-backend npm install
# Step 4: Build application (before migrations, in case they run from the compiled output)
docker exec lcbp3-backend npm run build
# Step 5: Run migrations
docker exec lcbp3-backend npm run migration:run
# Step 6: Restart backend
docker restart lcbp3-backend
# Step 7: Verify health
sleep 10
curl -f http://localhost:3000/health || {
echo "Health check failed! Rolling back..."
docker exec lcbp3-backend npm run migration:revert
docker restart lcbp3-backend
exit 1
}
echo "Backend updated successfully"
```
#### Frontend Update
```bash
#!/bin/bash
# File: /scripts/update-frontend.sh
set -e # Stop at the first failing step
# Step 1: Pull latest code
cd /app/lcbp3/frontend
git pull origin main
# Step 2: Install dependencies
docker exec lcbp3-frontend npm install
# Step 3: Build application
docker exec lcbp3-frontend npm run build
# Step 4: Restart frontend
docker restart lcbp3-frontend
# Step 5: Verify
sleep 10
curl -f http://localhost:3001 || {
echo "Frontend failed to start!"
exit 1
}
echo "Frontend updated successfully"
```
### Zero-Downtime Deployment
```bash
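#!/bin/bash
# File: /scripts/zero-downtime-deploy.sh
# Using blue-green deployment strategy
set -e
# Step 1: Start new "green" backend
docker-compose -f docker-compose.green.yml up -d backend
# Step 2: Wait for health check (up to ~60 seconds)
HEALTHY=false
for i in {1..30}; do
  if curl -sf http://localhost:3002/health > /dev/null; then
    HEALTHY=true
    break
  fi
  sleep 2
done
if [ "$HEALTHY" != "true" ]; then
  echo "Green backend did not become healthy; blue stays active"
  exit 1
fi
# Step 3: Switch NGINX to green
# A reload alone does not change routing: the NGINX upstream must first be
# edited to point at the green backend
docker exec lcbp3-nginx nginx -t
docker exec lcbp3-nginx nginx -s reload
# Step 4: Stop old "blue" backend
docker stop lcbp3-backend-blue
echo "Deployment completed with zero downtime"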
```
---
## 🗄️ Database Maintenance
### Weekly Database Optimization
```sql
-- File: /scripts/optimize-database.sql
-- Optimize tables
OPTIMIZE TABLE correspondences;
OPTIMIZE TABLE rfas;
OPTIMIZE TABLE workflow_instances;
OPTIMIZE TABLE attachments;
-- Analyze tables for query optimization
ANALYZE TABLE correspondences;
ANALYZE TABLE rfas;
-- Check for table corruption
CHECK TABLE correspondences;
CHECK TABLE rfas;
-- Rebuild indexes if fragmented
ALTER TABLE correspondences ENGINE=InnoDB;
```
```bash
#!/bin/bash
# File: /scripts/weekly-db-maintenance.sh
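# -i is required so the SQL file on stdin reaches the mysql client inside the container.
# For unattended (cron) runs, supply credentials non-interactively instead of the -p prompt.
docker exec -i lcbp3-mariadb mysql -u root -p lcbp3_dms < /scripts/optimize-database.sql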
echo "Database optimization completed: $(date)"
```
### Monthly Database Cleanup
```sql
-- Archive old audit logs (older than 1 year)
-- Use one fixed cutoff so no row is deleted without first being archived
SET @cutoff = DATE_SUB(NOW(), INTERVAL 1 YEAR);
START TRANSACTION;
INSERT INTO audit_logs_archive
SELECT * FROM audit_logs
WHERE created_at < @cutoff;
DELETE FROM audit_logs
WHERE created_at < @cutoff;
COMMIT;
-- Clean up deleted notifications (older than 90 days)
DELETE FROM notifications
WHERE deleted_at IS NOT NULL
AND deleted_at < DATE_SUB(NOW(), INTERVAL 90 DAY);
-- Clean up expired temp uploads (older than 24h)
DELETE FROM temp_uploads
WHERE created_at < DATE_SUB(NOW(), INTERVAL 1 DAY);
-- Optimize after cleanup
OPTIMIZE TABLE audit_logs;
OPTIMIZE TABLE notifications;
OPTIMIZE TABLE temp_uploads;
```
---
## 📦 Dependency Updates
### Security Patch Updates (Monthly)
```bash
#!/bin/bash
# File: /scripts/update-dependencies.sh
cd /app/lcbp3/backend
# Check for security vulnerabilities
npm audit
# Update security patches only (no major versions)
npm audit fix
# Run tests; stop here if they fail
npm test || { echo "Tests failed; security updates not committed"; exit 1; }
# If tests pass, commit and deploy
git add package*.json
git commit -m "chore: security patch updates"
git push origin main
```
### Major Version Updates (Quarterly)
```bash
# Check for outdated packages
npm outdated
# Update one major dependency at a time
npm install @nestjs/core@latest
# Test thoroughly
npm test
npm run test:e2e
# If successful, commit
git commit -am "chore: update @nestjs/core to vX.X.X"
```
---
## 🧹 Log Management
### Log Rotation Configuration
```bash
# File: /etc/logrotate.d/lcbp3-dms
/app/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
# Node.js reserves SIGUSR1 to start its debugger, so do not signal the backend;
# copytruncate rotates the file in place while the process keeps writing to it
copytruncate
}
```
### Manual Log Cleanup
```bash
#!/bin/bash
# File: /scripts/cleanup-logs.sh
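# Delete logs (plain or already compressed) older than 90 days
find /app/logs \( -name "*.log" -o -name "*.log.gz" \) -type f -mtime +90 -delete
# Compress logs older than 7 days (gzip keeps the original modification time)
find /app/logs -name "*.log" -type f -mtime +7 -exec gzip {} \;
# Prune stopped containers, unused networks, dangling images and build cache older than 30 days.
# Volumes are not pruned: they hold persistent data, and Docker rejects the "until" filter with --volumes.
# (Container log size is capped with the json-file driver's max-size/max-file options, not by prune.)
docker system prune -f --filter "until=720h"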
echo "Log cleanup completed: $(date)"
```
---
## 🔐 SSL Certificate Renewal
### Check Certificate Expiry
```bash
#!/bin/bash
# File: /scripts/check-ssl-cert.sh
CERT_FILE="/app/nginx/ssl/cert.pem"
EXPIRY_DATE=$(openssl x509 -enddate -noout -in "$CERT_FILE" | cut -d= -f2)
EXPIRY_EPOCH=$(date -d "$EXPIRY_DATE" +%s)
NOW_EPOCH=$(date +%s)
DAYS_LEFT=$(( ($EXPIRY_EPOCH - $NOW_EPOCH) / 86400 ))
echo "SSL certificate expires in $DAYS_LEFT days"
if [ $DAYS_LEFT -lt 30 ]; then
echo "WARNING: SSL certificate expires soon!"
# Send alert
/scripts/send-alert-email.sh "SSL Certificate Expiring" "Certificate expires in $DAYS_LEFT days"
fi
```
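The script above relies on GNU `date -d` to parse the expiry date, which a BusyBox-based system may not support. An alternative sketch uses `openssl x509 -checkend`, which compares the certificate's expiry against "now plus N seconds"; `cert_expires_within` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds if CERT expires within DAYS days.
cert_expires_within() {
  local cert="$1" days="$2"
  # -checkend N exits 0 if the certificate is still valid N seconds from now and
  # 1 if it will have expired by then; an unreadable file also returns non-zero,
  # which this helper reports as "expiring" so the problem is not missed
  ! openssl x509 -checkend $((days * 86400)) -noout -in "$cert" > /dev/null 2>&1
}
```

Usage: `if cert_expires_within /app/nginx/ssl/cert.pem 30; then /scripts/send-alert-email.sh "SSL Certificate Expiring" "Certificate expires within 30 days"; fi`.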
### Renew SSL Certificate (Let's Encrypt)
```bash
#!/bin/bash
# File: /scripts/renew-ssl.sh
# Renew certificate
certbot renew --webroot -w /app/nginx/html
# Copy new certificate
cp /etc/letsencrypt/live/lcbp3-dms.example.com/fullchain.pem /app/nginx/ssl/cert.pem
cp /etc/letsencrypt/live/lcbp3-dms.example.com/privkey.pem /app/nginx/ssl/key.pem
# Reload NGINX
docker exec lcbp3-nginx nginx -s reload
echo "SSL certificate renewed: $(date)"
```
---
## 🧪 Performance Optimization
### Database Query Optimization
```sql
-- Find slow queries (requires slow_query_log = ON and log_output = 'TABLE')
SELECT * FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;
-- Add indexes for frequently queried columns
CREATE INDEX idx_correspondences_status ON correspondences(status);
CREATE INDEX idx_rfas_workflow_status ON rfas(workflow_status);
CREATE INDEX idx_attachments_entity ON attachments(entity_type, entity_id);
-- Analyze query execution plan
EXPLAIN SELECT * FROM correspondences
WHERE status = 'PENDING'
AND created_at > DATE_SUB(NOW(), INTERVAL 30 DAY);
```
### Redis Cache Optimization
```bash
#!/bin/bash
# File: /scripts/optimize-redis.sh
# Check Redis memory usage
docker exec lcbp3-redis redis-cli INFO memory
# Set max memory policy
docker exec lcbp3-redis redis-cli CONFIG SET maxmemory 1gb
docker exec lcbp3-redis redis-cli CONFIG SET maxmemory-policy allkeys-lru
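# Save configuration (only works if Redis was started with a config file)
docker exec lcbp3-redis redis-cli CONFIG REWRITE
# Clear stale cache only if really needed: FLUSHDB deletes every key in the
# database, including user sessions, not just cached data
# docker exec lcbp3-redis redis-cli FLUSHDB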
```
### Application Performance Tuning
```typescript
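// Enable production optimizations in NestJS
// File: backend/src/main.ts
// Assumes @nestjs/cache-manager (with CacheModule registered in AppModule),
// compression and connect-timeout are installed
import { NestFactory, Reflector } from '@nestjs/core';
import { CACHE_MANAGER, CacheInterceptor } from '@nestjs/cache-manager';
import * as compression from 'compression';
import * as timeout from 'connect-timeout';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule, {
    // Only warnings and errors in production; verbose logging elsewhere
    logger:
      process.env.NODE_ENV === 'production'
        ? ['error', 'warn']
        : ['log', 'error', 'warn', 'debug'],
  });
  // Enable gzip compression of responses
  app.use(compression());
  // Enable response caching; CacheInterceptor needs the cache manager and Reflector
  app.useGlobalInterceptors(
    new CacheInterceptor(app.get(CACHE_MANAGER), app.get(Reflector)),
  );
  // Set global request timeout
  app.use(timeout('30s'));
  await app.listen(3000);
}
bootstrap();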
```
---
## 🔒 Security Maintenance
### Monthly Security Tasks
```bash
#!/bin/bash
# File: /scripts/security-maintenance.sh
# Update system packages
apt-get update && apt-get upgrade -y
# Update ClamAV virus definitions
docker exec lcbp3-clamav freshclam
# Scan for rootkits
rkhunter --check --skip-keypress
# Check for unauthorized users
awk -F: '($3 >= 1000) {print $1}' /etc/passwd
# Review sudo access
cat /etc/sudoers
# Check firewall rules
iptables -L -n -v
echo "Security maintenance completed: $(date)"
```
---
## ✅ Maintenance Checklist
### Pre-Maintenance
- [ ] Announce maintenance window to users
- [ ] Backup database and files
- [ ] Document current system state
- [ ] Prepare rollback plan
### During Maintenance
- [ ] Put system in maintenance mode (if needed)
- [ ] Perform updates/changes
- [ ] Run smoke tests
- [ ] Monitor system health
### Post-Maintenance
- [ ] Verify all services running
- [ ] Run full test suite
- [ ] Monitor performance metrics
- [ ] Communicate completion to users
- [ ] Document changes made
---
## 🔧 Emergency Maintenance
### Unplanned Maintenance Procedures
1. **Assess Urgency**
- Can it wait for scheduled maintenance?
- Is it causing active issues?
2. **Communicate Impact**
- Notify stakeholders immediately
- Estimate downtime
- Provide updates every 30 minutes
3. **Execute Carefully**
- Always backup first
- Have rollback plan ready
- Test in staging if possible
4. **Post-Maintenance Review**
- Document what happened
- Identify preventive measures
- Update runbooks
---
## 📚 Related Documents
- [Deployment Guide](04-01-deployment-guide.md)
- [Backup & Recovery](04-04-backup-recovery.md)
- [Monitoring & Alerting](04-03-monitoring-alerting.md)
---
**Version:** 1.8.0
**Last Review:** 2025-12-01
**Next Review:** 2026-03-01

# Security Operations
**Project:** LCBP3-DMS
**Version:** 1.8.0
**Last Updated:** 2025-12-02
---
## 📋 Overview
This document outlines security monitoring, access control management, vulnerability management, and security incident response for LCBP3-DMS.
---
## 🔒 Access Control Management
### User Access Review
**Monthly Tasks:**
```bash
#!/bin/bash
# File: /scripts/audit-user-access.sh
# Export active users (redirected mysql output is tab-separated, hence .tsv)
docker exec lcbp3-mariadb mysql -u root -p -e "
SELECT user_id, username, email, primary_organization_id, is_active, last_login_at
FROM lcbp3_dms.users
WHERE is_active = 1
ORDER BY last_login_at DESC;
" > /reports/active-users-$(date +%Y%m%d).tsv
# Find dormant accounts (no login > 90 days)
docker exec lcbp3-mariadb mysql -u root -p -e "
SELECT user_id, username, email, last_login_at,
DATEDIFF(NOW(), last_login_at) AS days_inactive
FROM lcbp3_dms.users
WHERE is_active = 1
AND (last_login_at IS NULL OR last_login_at < DATE_SUB(NOW(), INTERVAL 90 DAY));
"
echo "User access audit completed: $(date)"
```
### Role & Permission Audit
```sql
-- Review users with elevated permissions
SELECT u.username, u.email, r.role_name, r.scope
FROM users u
JOIN user_assignments ua ON u.user_id = ua.user_id
JOIN roles r ON ua.role_id = r.role_id
WHERE r.role_name IN ('Superadmin', 'Document Controller', 'Project Manager')
ORDER BY r.role_name, u.username;
-- Review Global scope roles (highest privilege)
SELECT u.username, r.role_name
FROM users u
JOIN user_assignments ua ON u.user_id = ua.user_id
JOIN roles r ON ua.role_id = r.role_id
WHERE r.scope = 'Global';
```
---
## 🛡️ Security Monitoring
### Log Monitoring for Security Events
```bash
#!/bin/bash
# File: /scripts/monitor-security-events.sh
# Check for failed login attempts (2>&1 also captures what the container writes to stderr)
docker logs lcbp3-backend 2>&1 | grep "Failed login" | tail -20
# Check for unauthorized access attempts (403)
docker logs lcbp3-backend 2>&1 | grep "403" | tail -20
# Check for unusual activity patterns
docker logs lcbp3-backend 2>&1 | grep -E "DELETE|DROP|TRUNCATE" | tail -20
# Check for SQL injection attempts ("legitimate" is a placeholder: replace it with a pattern for known-good queries)
docker logs lcbp3-backend 2>&1 | grep -i "SELECT.*FROM.*WHERE" | grep -v "legitimate" | tail -20
```
### Failed Login Monitoring
```sql
-- Find accounts with multiple failed login attempts
SELECT username, failed_attempts, locked_until
FROM users
WHERE failed_attempts >= 3
ORDER BY failed_attempts DESC;
-- Unlock user account after verification
UPDATE users
SET failed_attempts = 0, locked_until = NULL
WHERE user_id = ?;
```
---
## 🔐 Secrets & Credentials Management
### Password Rotation Schedule
| Credential | Rotation Frequency | Owner |
| ---------------------- | ------------------------ | ------------ |
| Database Root Password | Every 90 days | DBA |
| Database App Password | Every 90 days | DevOps |
| JWT Secret | Every 180 days | Backend Team |
| Redis Password | Every 90 days | DevOps |
| SMTP Password | When provider requires | Operations |
| SSL Private Key | With certificate renewal | Operations |
### Password Rotation Procedure
```bash
#!/bin/bash
# File: /scripts/rotate-db-password.sh
# Generate new password (hex output avoids "/", "+" and "=", which would break the sed command below)
NEW_PASSWORD=$(openssl rand -hex 32)
# Update database user password
docker exec lcbp3-mariadb mysql -u root -p -e "
ALTER USER 'lcbp3_user'@'%' IDENTIFIED BY '$NEW_PASSWORD';
FLUSH PRIVILEGES;
"
# Update application .env file
sed -i "s/^DB_PASS=.*/DB_PASS=$NEW_PASSWORD/" /app/backend/.env
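# Restart backend to apply new password
# (if .env is loaded through docker-compose env_file, recreate the container with
#  "docker-compose up -d backend" instead: docker restart does not re-read env_file)
docker restart lcbp3-backend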
# Verify connection
sleep 10
curl -f http://localhost:3000/health || {
echo "FAILED: Backend cannot connect with new password"
# Rollback procedure...
exit 1
}
echo "Database password rotated successfully: $(date)"
# Store password securely (e.g., password manager)
```
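A `sed` substitution breaks as soon as the new value contains its delimiter or `&`. If passwords are generated with a wider character set, a safer way to update the `.env` file is sketched below; `set_env_var` is a hypothetical helper that writes `KEY=value` literally and appends the key if it is missing.

```bash
# Hypothetical helper: set KEY=VALUE in an env file without regex/escaping pitfalls.
set_env_var() {
  local file="$1" tmp
  tmp=$(mktemp) || return 1
  # Pass key and value through the environment: awk -v would interpret backslash escapes
  KEY="$2" VAL="$3" awk '
    BEGIN { key = ENVIRON["KEY"]; val = ENVIRON["VAL"] }
    index($0, key "=") == 1 { print key "=" val; found = 1; next }
    { print }
    END { if (!found) print key "=" val }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the existing file so its owner and permissions are kept
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: `set_env_var /app/backend/.env DB_PASS "$NEW_PASSWORD"` in place of the `sed -i` line.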
---
## 🚨 Vulnerability Management
### Dependency Vulnerability Scanning
```bash
#!/bin/bash
# File: /scripts/scan-vulnerabilities.sh
# Backend dependencies
cd /app/backend
# --omit=dev replaces the deprecated --production flag
npm audit --omit=dev
# Critical/High vulnerabilities
VULNERABILITIES=$(npm audit --omit=dev --json | jq '(.metadata.vulnerabilities.high // 0) + (.metadata.vulnerabilities.critical // 0)')
if [ "$VULNERABILITIES" -gt 0 ]; then
echo "WARNING: $VULNERABILITIES critical/high vulnerabilities found!"
npm audit --omit=dev > /reports/security-audit-$(date +%Y%m%d).txt
# Send alert
/scripts/send-alert-email.sh "Security Vulnerabilities Detected" "Found $VULNERABILITIES critical/high vulnerabilities"
fi
# Frontend dependencies
cd /app/frontend
npm audit --omit=dev
```
### Container Image Scanning
```bash
#!/bin/bash
# File: /scripts/scan-images.sh
# Install Trivy (if not installed)
# wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | apt-key add -
# echo "deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | tee -a /etc/apt/sources.list.d/trivy.list
# apt-get update && apt-get install trivy
# Scan Docker images
trivy image --severity HIGH,CRITICAL lcbp3-backend:latest
trivy image --severity HIGH,CRITICAL lcbp3-frontend:latest
trivy image --severity HIGH,CRITICAL mariadb:11.8
trivy image --severity HIGH,CRITICAL redis:7.2-alpine
```
---
## 🔍 Security Hardening
### Server Hardening Checklist
- [ ] Disable root SSH login
- [ ] Use SSH key authentication only
- [ ] Configure firewall (allow only necessary ports)
- [ ] Enable automatic security updates
- [ ] Remove unnecessary services
- [ ] Configure fail2ban for brute-force protection
- [ ] Enable SELinux/AppArmor
- [ ] Regular security patch updates
### Docker Security
```yaml
# docker-compose.yml - Security best practices
services:
  backend:
    # Run as non-root user
    user: 'node:node'
    # Read-only root filesystem (paths the app writes to, such as /tmp,
    # must be provided as tmpfs or volumes)
    read_only: true
    tmpfs:
      - /tmp
    # No new privileges
    security_opt:
      - no-new-privileges:true
    # Drop all capabilities; NET_BIND_SERVICE is only needed to bind ports below 1024
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    # Resource limits
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          memory: 512M
```
### Database Security
```sql
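-- On MariaDB 10.4 and later mysql.user is a view, so remove accounts with DROP USER
-- (list them first with: SELECT User, Host FROM mysql.user;)
-- Remove anonymous users
DROP USER IF EXISTS ''@'localhost', ''@'%';
-- Remove test database
DROP DATABASE IF EXISTS test;
-- Remove remote root login (repeat for any other non-local root host listed above)
DROP USER IF EXISTS 'root'@'%';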
-- Create dedicated backup user with minimal privileges
CREATE USER 'backup_user'@'localhost' IDENTIFIED BY 'STRONG_PASSWORD';
GRANT SELECT, LOCK TABLES, SHOW VIEW, EVENT, TRIGGER ON lcbp3_dms.* TO 'backup_user'@'localhost';
-- Enable SSL for database connections
-- GRANT USAGE ON *.* TO 'lcbp3_user'@'%' REQUIRE SSL;
FLUSH PRIVILEGES;
```
---
## 🚨 Security Incident Response
### Incident Classification
| Type | Examples | Response Time |
| ----------------------- | ---------------------------- | ---------------- |
| **Data Breach** | Unauthorized data access | Immediate (< 1h) |
| **Account Compromise** | Stolen credentials | Immediate (< 1h) |
| **DDoS Attack** | Service unavailable | Immediate (< 1h) |
| **Malware/Ransomware** | Infected systems | Immediate (< 1h) |
| **Unauthorized Access** | Failed authentication spikes | High (< 4h) |
| **Suspicious Activity** | Unusual patterns | Medium (< 24h) |
### Data Breach Response
**Immediate Actions:**
1. **Contain the breach**
```bash
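# Block suspicious IPs at firewall level (-I puts the rule before any existing ACCEPT rules;
# traffic to Docker-published ports is filtered in the DOCKER-USER chain, not INPUT)
iptables -I INPUT -s <SUSPICIOUS_IP> -j DROP
iptables -I DOCKER-USER -s <SUSPICIOUS_IP> -j DROP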
# Disable compromised user accounts
docker exec lcbp3-mariadb mysql -u root -p -e "
UPDATE lcbp3_dms.users
SET is_active = 0
WHERE user_id = <COMPROMISED_USER_ID>;
"
```
2. **Assess impact**
```sql
-- Check audit logs for unauthorized access
SELECT * FROM audit_logs
WHERE user_id = <COMPROMISED_USER_ID>
AND created_at >= '<SUSPECTED_START_TIME>'
ORDER BY created_at DESC;
-- Check what documents were accessed
SELECT DISTINCT entity_id, entity_type, action
FROM audit_logs
WHERE user_id = <COMPROMISED_USER_ID>;
```
3. **Notify stakeholders**
- Security officer
- Management
- Affected users (if applicable)
- Legal team (if required by law)
4. **Document everything**
- Timeline of events
- Data accessed/compromised
- Actions taken
- Lessons learned
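Because the containment step feeds an operator-supplied address straight into `iptables`, it can help to validate it first. A minimal bash sketch, assuming only single IPv4 addresses are blocked (no CIDR ranges) and rejecting octets with leading zeros, which some address parsers read as octal; `is_valid_ipv4` is a hypothetical helper:

```bash
# Hypothetical helper: succeeds only for a dotted-quad IPv4 address (octets 0-255, no leading zeros).
is_valid_ipv4() {
  local octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
  local re="^${octet}\\.${octet}\\.${octet}\\.${octet}\$"
  [[ "$1" =~ $re ]]
}
```

Usage: `is_valid_ipv4 "$SUSPICIOUS_IP" && iptables -I INPUT -s "$SUSPICIOUS_IP" -j DROP`.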
### Account Compromise Response
```bash
#!/bin/bash
# File: /scripts/respond-account-compromise.sh
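USER_ID=$1
# Accept only a numeric ID: the value is interpolated into SQL and Redis commands below
[[ "$USER_ID" =~ ^[0-9]+$ ]] || { echo "Usage: $0 <numeric user_id>"; exit 1; }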
# 1. Immediately disable account
docker exec lcbp3-mariadb mysql -u root -p -e "
UPDATE lcbp3_dms.users
SET is_active = 0,
locked_until = DATE_ADD(NOW(), INTERVAL 24 HOUR)
WHERE user_id = $USER_ID;
"
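# 2. Invalidate all sessions (DEL does not expand wildcards, so find matching keys with --scan first)
docker exec lcbp3-redis redis-cli --scan --pattern "session:user:$USER_ID:*" | xargs -r docker exec lcbp3-redis redis-cli DEL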
# 3. Generate audit report
docker exec lcbp3-mariadb mysql -u root -p -e "
SELECT * FROM lcbp3_dms.audit_logs
WHERE user_id = $USER_ID
AND created_at >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
ORDER BY created_at DESC;
" > /reports/compromise-audit-user-$USER_ID-$(date +%Y%m%d).txt
# 4. Notify security team
/scripts/send-alert-email.sh "Account Compromise" "User ID $USER_ID has been compromised and disabled"
echo "Account compromise response completed for User ID: $USER_ID"
```
---
## 📊 Security Metrics & KPIs
### Monthly Security Report
| Metric | Target | Actual |
| --------------------------- | --------- | ------ |
| Failed Login Attempts | < 100/day | Track |
| Locked Accounts | < 5/month | Track |
| Critical Vulnerabilities | 0 | Track |
| High Vulnerabilities | < 5 | Track |
| Unpatched Systems | 0 | Track |
| Security Incidents | 0 | Track |
| Mean Time To Detect (MTTD) | < 1 hour | Track |
| Mean Time To Respond (MTTR) | < 4 hours | Track |
---
## 🔐 Compliance & Audit
### Audit Log Retention
- **Access Logs:** 1 year
- **Security Events:** 2 years
- **Admin Actions:** 3 years
- **Data Changes:** 7 years (as required)
### Compliance Checklist
- [ ] Regular security audits (quarterly)
- [ ] Penetration testing (annually)
- [ ] Access control reviews (monthly)
- [ ] Encryption at rest and in transit
- [ ] Secure password policies enforced
- [ ] Multi-factor authentication (if required)
- [ ] Data backup and recovery tested
- [ ] Incident response plan documented and tested
---
## ✅ Security Operations Checklist
### Daily
- [ ] Review security alerts and logs
- [ ] Monitor failed login attempts
- [ ] Check for unusual access patterns
- [ ] Verify backup completion
### Weekly
- [ ] Review user access logs
- [ ] Scan for vulnerabilities
- [ ] Update virus definitions
- [ ] Review firewall logs
### Monthly
- [ ] User access audit
- [ ] Role and permission review
- [ ] Security patch application
- [ ] Compliance review
### Quarterly
- [ ] Full security audit
- [ ] Penetration testing
- [ ] Disaster recovery drill
- [ ] Update security policies
---
## 🔗 Related Documents
- [Incident Response](04-07-incident-response.md)
- [Monitoring & Alerting](04-03-monitoring-alerting.md)
- [ADR-004: RBAC Implementation](../05-decisions/ADR-004-rbac-implementation.md)
---
**Version:** 1.8.0
**Last Review:** 2025-12-01
**Next Review:** 2026-03-01

# Incident Response Procedures
**Project:** LCBP3-DMS
**Version:** 1.8.0
**Last Updated:** 2025-12-02
---
## 📋 Overview
This document outlines incident classification, response procedures, and post-incident reviews for LCBP3-DMS.
---
## 🚨 Incident Classification
### Severity Levels
| Severity | Description | Response Time | Examples |
| ----------------- | ---------------------------- | ----------------- | ----------------------------------------------- |
| **P0 - Critical** | Complete system outage | 15 minutes | Database down, All services unavailable |
| **P1 - High** | Major functionality impaired | 1 hour | Authentication failing, Cannot create documents |
| **P2 - Medium** | Degraded performance | 4 hours | Slow response time, Some features broken |
| **P3 - Low** | Minor issues | Next business day | UI glitch, Non-critical bug |
---
## 📞 Incident Response Team
### Roles & Responsibilities
**Incident Commander (IC)**
- Coordinates response efforts
- Makes final decisions
- Communicates with stakeholders
**Technical Lead (TL)**
- Diagnoses technical issues
- Implements fixes
- Coordinates with engineers
**Communications Lead (CL)**
- Updates stakeholders
- Manages internal/external communications
- Documents incident timeline
**On-Call Engineer**
- First responder
- Initial triage and investigation
- Escalates to appropriate team
---
## 🔄 Incident Response Workflow
```mermaid
flowchart TD
Start([Incident Detected]) --> Acknowledge[Acknowledge Incident]
Acknowledge --> Assess[Assess Severity]
Assess --> P0{Severity?}
P0 -->|P0/P1| Alert[Page Incident Commander]
P0 -->|P2/P3| Assign[Assign to On-Call]
Alert --> Investigate[Investigate Root Cause]
Assign --> Investigate
Investigate --> Mitigate[Implement Mitigation]
Mitigate --> Verify[Verify Resolution]
Verify --> Resolved{Resolved?}
Resolved -->|No| Escalate[Escalate/Re-assess]
Escalate --> Investigate
Resolved -->|Yes| Communicate[Communicate Resolution]
Communicate --> PostMortem[Schedule Post-Mortem]
PostMortem --> End([Close Incident])
```
---
## 📋 Incident Response Playbooks
### P0: Database Down
**Symptoms:**
- Backend returns 500 errors
- Cannot connect to database
- Health check fails
**Immediate Actions:**
1. **Verify Issue**
```bash
docker ps | grep mariadb
docker logs lcbp3-mariadb --tail=50
```
2. **Attempt Restart**
```bash
docker restart lcbp3-mariadb
```
3. **Check Database Process**
```bash
# docker top lists the container's processes from the host (look for mariadbd or mysqld)
docker top lcbp3-mariadb
```
4. **If Restart Fails:**
```bash
# Check disk space
df -h
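# Check database logs for corruption (the official MariaDB image logs to stderr, i.e. docker logs;
# /var/log/mysql/error.log exists only if log_error is configured)
docker logs lcbp3-mariadb --tail=200 2>&1 | grep -iE "corrupt|crash|error"
# If corrupted, restore from backup
# See Backup & Recovery (04-04-backup-recovery.md)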
```
5. **Escalate to DBA** if not resolved in 30 minutes
---
### P0: Complete System Outage
**Symptoms:**
- All services return 502/503
- Health checks fail
- Users cannot access system
**Immediate Actions:**
1. **Check Container Status**
```bash
docker-compose ps
# Identify which containers are down
```
2. **Restart All Services**
```bash
docker-compose restart
```
3. **Check QNAP Server Resources**
```bash
top
df -h
free -h
```
4. **Check Network**
```bash
ping -c 4 8.8.8.8
# Listening TCP ports (run as admin to see the owning processes)
netstat -tlnp
```
5. **If Server Issue:**
- Reboot QNAP server
- Contact QNAP support
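
To see at a glance which containers are down or unhealthy, you can filter the output of `docker ps -a`. Here is a minimal sketch. `down_containers` is a hypothetical helper, and the `lcbp3` name filter assumes that all project containers share that prefix, as the container names in this runbook suggest.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "name<TAB>status" lines on stdin, as produced by
#   docker ps -a --format '{{.Names}}\t{{.Status}}'
# and prints every container that is not "Up" or is reported "(unhealthy)".
down_containers() {
  awk -F'\t' '$2 !~ /^Up/ || $2 ~ /\(unhealthy\)/ { print $1 }'
}
# Usage on the QNAP host:
#   docker ps -a --filter name=lcbp3 --format '{{.Names}}\t{{.Status}}' | down_containers
```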
---
### P1: Authentication System Failing
**Symptoms:**
- Users cannot log in
- JWT validation fails
- 401 errors
**Immediate Actions:**
1. **Check Redis (Session Store)**
```bash
docker exec lcbp3-redis redis-cli ping
# Should return PONG
```
2. **Check JWT Secret Configuration**
```bash
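# Confirm the secret is set without printing it to the terminal or logs
docker exec lcbp3-backend sh -c 'if [ -n "$JWT_SECRET" ]; then echo "JWT_SECRET is set"; else echo "JWT_SECRET is EMPTY"; fi'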
```
3. **Check Backend Logs**
```bash
# 2>&1: error logs go to stderr, which a plain pipe would skip
docker logs --tail=100 lcbp3-backend 2>&1 | grep -E "JWT|Auth"
```
4. **Temporary Mitigation:**
```bash
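# Restart backend (clears in-memory state). `docker restart` keeps the
# container's existing environment: if JWT_SECRET was changed in .env,
# recreate the container instead with docker-compose up -d <backend-service>
docker restart lcbp3-backend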
```
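
When only some users receive 401 errors, an expired token or a clock difference between client and server is one possible cause. The sketch below decodes a JWT's payload (without verifying it) and reports the seconds left until its `exp` claim. The helper names are hypothetical, and the sketch assumes GNU or BusyBox `base64 -d`. Tokens are credentials, so do not paste them into shared incident channels.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode (NOT verify) the payload segment of a JWT.
jwt_payload() {
  local seg
  # Take the middle segment and convert base64url to standard base64
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops padding; restore it to a multiple of 4 characters
  while (( ${#seg} % 4 )); do seg+="="; done
  printf '%s' "$seg" | base64 -d
}
# Hypothetical helper: seconds until "exp" (negative = already expired).
jwt_seconds_left() {
  local exp
  exp=$(jwt_payload "$1" | sed -n 's/.*"exp":[[:space:]]*\([0-9]\{1,\}\).*/\1/p')
  [ -n "$exp" ] || { echo "no exp claim" >&2; return 1; }
  echo $(( exp - $(date +%s) ))
}
# Usage: jwt_seconds_left "$TOKEN"
```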
---
### P1: File Upload Failing
**Symptoms:**
- Users cannot upload files
- 500 errors on file upload
- "Disk full" errors
**Immediate Actions:**
1. **Check Disk Space**
```bash
df -h /var/lib/docker/volumes/lcbp3_uploads
```
2. **If Disk Full:**
```bash
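# Clean up temp uploads: -mtime +1 matches files more than one full day
# old (i.e. at least 48 hours). Run once with -print instead of -delete
# to review what will be removed.
find /var/lib/docker/volumes/lcbp3_uploads/_data/temp \
  -type f -mtime +1 -delete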
```
3. **Check ClamAV (Virus Scanner)**
```bash
docker logs lcbp3-clamav --tail=50
docker restart lcbp3-clamav
```
4. **Check File Permissions**
```bash
docker exec lcbp3-backend ls -la /app/uploads
```
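
A threshold check turns step 1 into something that a cron job or monitoring hook can run. Here is a minimal sketch, assuming POSIX `df -P` output. `check_disk` and the default 90% limit are illustrative choices, not values taken from the monitoring configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: used-space percentage (integer) of the filesystem
# holding PATH; -P forces one line per filesystem so column 5 is stable.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
# Prints OK/WARN and returns 1 when usage is at or above the limit.
check_disk() {
  local path="$1" limit="${2:-90}" pct
  pct=$(disk_usage_pct "$path")
  [ -n "$pct" ] || { echo "cannot read usage for ${path}" >&2; return 2; }
  if (( pct >= limit )); then
    echo "WARN: ${path} is ${pct}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: ${path} is ${pct}% full"
}
# Usage: check_disk /var/lib/docker/volumes/lcbp3_uploads 85
```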
---
### P2: Slow Performance
**Symptoms:**
- Pages load slowly
- API response time > 2s
- Users complain about slowness
**Actions:**
1. **Check System Resources**
```bash
docker stats --no-stream
# Identify high CPU/memory containers
```
2. **Check Database Performance**
```sql
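-- Run in a client session, e.g. docker exec -it lcbp3-mariadb mariadb -u root -p
-- Show currently running queries (look for large Time values);
-- this is a live view, not the slow query log
SHOW FULL PROCESSLIST;
-- Check connections
SHOW STATUS LIKE 'Threads_connected';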
```
3. **Check Redis**
```bash
docker exec lcbp3-redis redis-cli --stat
```
4. **Check Application Logs**
```bash
docker logs lcbp3-backend 2>&1 | grep "Slow request"
```
5. **Temporary Mitigation:**
- Restart slow containers
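- Clear cache keys in Redis if needed (avoid `FLUSHALL`: Redis also holds sessions and the BullMQ queues)
- Kill long-running queries (`KILL QUERY <id>`, using the Id column from the process list)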
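
To narrow step 1 down to the offending containers, you can filter a single `docker stats` snapshot against thresholds. This is a sketch: `hot_containers` and the default 80% thresholds are hypothetical. CPU percentages above 100% are normal on multi-core hosts, so the CPU threshold may need raising.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "name cpu% mem%" lines, as produced by
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}'
# and prints containers whose CPU or memory usage exceeds the thresholds.
hot_containers() {
  local cpu_limit="${1:-80}" mem_limit="${2:-80}"
  awk -v c="$cpu_limit" -v m="$mem_limit" '
    {
      cpu = $2; mem = $3
      sub(/%/, "", cpu); sub(/%/, "", mem)   # strip the % suffix
      if (cpu + 0 > c || mem + 0 > m)
        printf "%s cpu=%s%% mem=%s%%\n", $1, cpu, mem
    }'
}
# Usage:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}' | hot_containers 80 80
```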
---
### P2: Email Notifications Not Sending
**Symptoms:**
- Users not receiving emails
- Email queue backing up
**Actions:**
1. **Check Email Queue**
```bash
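# Access BullMQ dashboard or check Redis (queue "email", default "bull" prefix;
# BullMQ keeps waiting jobs in the ":wait" list and failed jobs in ":failed")
docker exec lcbp3-redis redis-cli LLEN bull:email:wait
docker exec lcbp3-redis redis-cli ZCARD bull:email:failed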
```
2. **Check Email Processor Logs**
```bash
docker logs lcbp3-backend 2>&1 | grep -iE "email|smtp"
```
3. **Test SMTP Connection**
```bash
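docker exec lcbp3-backend node -e "
  const nodemailer = require('nodemailer');
  const port = Number(process.env.SMTP_PORT);
  const transport = nodemailer.createTransport({
    host: process.env.SMTP_HOST,
    port,
    // implicit TLS on 465; other ports use STARTTLS if the server offers it
    secure: port === 465,
    auth: {
      user: process.env.SMTP_USER,
      pass: process.env.SMTP_PASS
    }
  });
  transport.verify()
    .then(() => console.log('SMTP connection OK'))
    .catch((err) => console.error('SMTP check failed:', err.message));
"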
```
4. **Check SMTP Credentials**
- Verify the SMTP credentials have not expired or been rotated
- Check firewall/network access to the SMTP host and port
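
If `transport.verify()` times out rather than failing authentication, the next thing to rule out is raw TCP reachability. Here is a minimal sketch that relies on bash's `/dev/tcp` redirection and the `timeout` utility, run from the QNAP host. `port_open` is a hypothetical helper, and port 587 in the usage line is only an assumed default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within
# SECS seconds (default 5). Bash-only: uses the /dev/tcp pseudo-device.
port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  # $0/$1 inside the inner bash are the host and port passed after the script
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
# Usage (SMTP_HOST/SMTP_PORT assumed to match the backend's .env values):
#   port_open "$SMTP_HOST" "${SMTP_PORT:-587}" && echo reachable || echo blocked
```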
---
## 📝 Incident Documentation
### Incident Report Template
```markdown
# Incident Report: [Brief Description]
**Incident ID:** INC-YYYYMMDD-001
**Severity:** P1
**Status:** Resolved
**Incident Commander:** [Name]
## Timeline
| Time | Event |
| ----- | --------------------------------------------------------- |
| 14:00 | Alert: High error rate detected |
| 14:05 | On-call engineer acknowledged |
| 14:10 | Identified root cause: Database connection pool exhausted |
| 14:15 | Implemented mitigation: Increased pool size |
| 14:20 | Verified resolution |
| 14:30 | Incident resolved |
## Impact
- **Duration:** 30 minutes
- **Affected Users:** ~50 users
- **Affected Services:** Document creation, Search
- **Data Loss:** None
## Root Cause
Database connection pool was exhausted due to slow queries not releasing connections.
## Resolution
1. Increased connection pool size from 10 to 20
2. Optimized slow queries
3. Added connection pool monitoring
## Action Items
- [ ] Add connection pool size alert (Owner: DevOps, Due: Next Sprint)
- [ ] Implement automatic query timeouts (Owner: Backend, Due: 2025-12-15)
- [ ] Review all queries for optimization (Owner: DBA, Due: 2025-12-31)
## Lessons Learned
- Connection pool monitoring was insufficient
- Need automated remediation for common issues
```
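
You can derive the incident ID and the Impact duration in the template instead of typing them by hand. Here is a sketch with two hypothetical helpers. `incident_id` zero-pads the daily sequence number to the `INC-YYYYMMDD-NNN` form used in the template. `duration_min` computes the minutes between two `HH:MM` timeline entries and assumes the incident lasts less than 24 hours.

```bash
#!/usr/bin/env bash
# Hypothetical helper: incident_id DATE SEQ -> INC-YYYYMMDD-NNN
incident_id() {
  # 10# forces base 10 so a sequence like "008" is not read as octal
  printf 'INC-%s-%03d\n' "$1" "$((10#$2))"
}
# Hypothetical helper: duration_min START END -> minutes between HH:MM times,
# wrapping past midnight (assumes the incident spans less than 24 hours).
duration_min() {
  local sh=${1%%:*} sm=${1##*:} eh=${2%%:*} em=${2##*:}
  local d=$(( (10#$eh * 60 + 10#$em) - (10#$sh * 60 + 10#$sm) ))
  if (( d < 0 )); then d=$(( d + 1440 )); fi
  echo "$d"
}
# Usage: incident_id 20251201 1     # INC-20251201-001
#        duration_min 14:00 14:30   # 30
```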
---
## 🔍 Post-Incident Review (PIR)
### PIR Meeting Agenda
1. **Timeline Review** (10 min)
- What happened and when?
- What was the impact?
2. **Root Cause Analysis** (15 min)
- Why did it happen?
- What were the contributing factors?
3. **What Went Well** (10 min)
- What did we do right?
- What helped us resolve quickly?
4. **What Went Wrong** (15 min)
- What could we have done better?
- What slowed us down?
5. **Action Items** (10 min)
- What changes will prevent this?
- Who owns each action?
- When will they be completed?
### PIR Best Practices
- **Blameless Culture:** Focus on systems, not individuals
- **Actionable Outcomes:** Every PIR should produce concrete actions
- **Follow Through:** Track action items to completion
- **Share Learnings:** Distribute PIR summary to entire team
---
## 📊 Incident Metrics
### Track & Review Monthly
- **MTTR (Mean Time To Resolution):** Average time to resolve incidents
- **MTBF (Mean Time Between Failures):** Average time between incidents
- **Incident Frequency:** Number of incidents per month
- **Severity Distribution:** Breakdown by P0/P1/P2/P3
- **Repeat Incidents:** Same root cause occurring multiple times
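
MTTR and MTBF can be computed from a simple export of the incident tracker. This sketch assumes a hypothetical CSV of `incident_id,start_epoch,resolved_epoch` rows sorted by start time. It takes MTBF as the mean gap between one incident's resolution and the next incident's start. Some teams measure from start to start instead, so pick one definition and use it consistently from month to month.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads incident_id,start_epoch,resolved_epoch rows
# (sorted by start) and prints the incident count, MTTR and MTBF in minutes.
incident_metrics() {
  awk -F, '
    NF < 3 || $2 !~ /^[0-9]+$/ { next }   # skip header and malformed rows
    {
      n++
      repair += $3 - $2                   # time to resolve this incident
      if (n > 1) gaps += $2 - prev_end    # uptime since the previous one
      prev_end = $3
    }
    END {
      if (n == 0) { print "no incidents"; exit 1 }
      printf "incidents=%d MTTR_min=%.1f", n, repair / n / 60
      if (n > 1) printf " MTBF_min=%.1f", gaps / (n - 1) / 60
      printf "\n"
    }'
}
# Usage (epochs via GNU date, e.g. date -d '2025-12-01 14:00' +%s):
#   incident_metrics < incidents-2025-12.csv
```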
---
## ✅ Incident Response Checklist
### During Incident
- [ ] Acknowledge incident in tracking system
- [ ] Assess severity and assign IC
- [ ] Create incident channel (Slack/Teams)
- [ ] Begin documenting timeline
- [ ] Investigate and implement mitigation
- [ ] Communicate status updates every 30 min (P0/P1)
- [ ] Verify resolution
- [ ] Communicate resolution to stakeholders
### After Incident
- [ ] Create incident report
- [ ] Schedule PIR within 48 hours
- [ ] Identify action items
- [ ] Assign owners and deadlines
- [ ] Update runbooks/playbooks
- [ ] Share learnings with team
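
For the "Schedule PIR within 48 hours" item, you can compute the deadline from the resolution time. Here is a minimal sketch that uses epoch arithmetic with `date`, which avoids ambiguities in relative date strings. `pir_deadline` is a hypothetical helper. It assumes a `date` that supports `-d` (GNU coreutils or BusyBox) and times given in UTC.

```bash
#!/usr/bin/env bash
# Hypothetical helper: latest time to hold the PIR, 48 hours after the
# incident was resolved. Input and output are interpreted as UTC (-u).
pir_deadline() {
  local resolved
  resolved=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$(( resolved + 48 * 3600 ))" '+%Y-%m-%d %H:%M UTC'
}
# Usage: pir_deadline '2025-12-01 14:30'   # 2025-12-03 14:30 UTC
```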
---
## 🔗 Related Documents
- [Monitoring & Alerting](04-03-monitoring.md)
- [Backup & Recovery](04-02-backup-recovery.md)
- [Security Operations](04-06-security-operations.md)
---
**Version:** 1.8.0
**Last Review:** 2025-12-01
**Next Review:** 2026-03-01

# Infrastructure & Operations (OPS) Guide
**Project:** LCBP3-DMS
**Version:** 1.8.0
**Last Updated:** 2026-02-23
---
## 📋 Overview
This directory (`04-Infrastructure-OPS/`) serves as the single source of truth for all infrastructure setups, networking rules, Docker Compose configurations, backups, and site reliability operations for the LCBP3-DMS project.
It consolidates what was previously split across multiple operations and specification folders into a cohesive set of manuals for DevOps, System Administrators, and On-Call Engineers.
---
## 📂 Document Index
| File | Purpose | Key Contents |
| ------------------------------------------------------------------------ | ---------------------- | ------------------------------------------------------------------------------------------- |
| **[04-01-docker-compose.md](./04-01-docker-compose.md)** | Core Environment Setup | `.env` configs, Blue/Green Docker Compose, MariaDB & Redis optimization |
| **[04-02-backup-recovery.md](./04-02-backup-recovery.md)**             | Disaster Recovery      | RTO/RPO strategies, QNAP to ASUSTOR backup scripts, Restic/mysqldump config                 |
| **[04-03-monitoring.md](./04-03-monitoring.md)**                       | Observability          | Prometheus metrics, Alertmanager rules (including Document Numbering DB), Grafana alerts    |
| **[04-04-deployment-guide.md](./04-04-deployment-guide.md)** | Production Rollout | Step-by-step Blue-Green deployment scripts, rollback playbooks, Nginx Reverse Proxy |
| **[04-05-maintenance-procedures.md](./04-05-maintenance-procedures.md)** | Routine Care           | Log rotation, zero-downtime dependency updates, scheduled DB optimizations                  |
| **[04-06-security-operations.md](./04-06-security-operations.md)** | Hardening & Audit | User access review scripts, SSL renewals, vulnerability scanning procedures |
| **[04-07-incident-response.md](./04-07-incident-response.md)** | Escalation | P0-P3 classifications, incident commander roles, Post-Incident Review (PIR) |
---
## 🎯 Guiding Principles
1. **Zero Downtime Deployments**: Utilize the Blue/Green architecture outlined in `04-04` wherever possible.
2. **Infrastructure as Code**: No manual unscripted changes. Modify the `docker-compose.yml` specs and `.env.production` templates directly.
3. **Automated Backups**: Backups must be validated automatically using the ASUSTOR pulling mechanism in `04-02`.
4. **Actionable Alerts**: No noisy monitoring. Prometheus alerts in `04-03` should route to Slack/PagerDuty only when action is required.