04.2 Backup & Disaster Recovery

Project: LCBP3-DMS
Version: 1.8.0
Status: Active
Owner: Nattanin Peancharoen / DevOps Team
Last Updated: 2026-02-23

📍 Backup Target Server: ASUSTOR AS5403T (Infrastructure & Backup)
🖥️ Primary Source Server: QNAP TS-473A (Application & Database)


📖 Overview

This document outlines the backup strategies, scripts (ASUSTOR pulling from QNAP), recovery procedures, and comprehensive disaster recovery planning for LCBP3-DMS.


Backup & Recovery Procedures

Project: LCBP3-DMS
Version: 1.8.0
Last Updated: 2025-12-02


📋 Overview

This document outlines backup strategies, recovery procedures, and disaster recovery planning for LCBP3-DMS.


🎯 Backup Strategy

Backup Schedule

| Data Type | Frequency | Retention | Method |
|---|---|---|---|
| Database (Full) | Daily at 02:00 | 30 days | mysqldump + compression |
| Database (Incremental) | Every 6 hours | 7 days | Binary logs |
| File Uploads | Daily at 03:00 | 30 days | rsync to backup server |
| Configuration Files | Weekly | 90 days | Git repository |
| Elasticsearch Indexes | Weekly | 14 days | Snapshot to S3/NFS |
| Application Logs | Daily | 90 days | Rotation + archival |

Backup Locations

Primary Backup: QNAP NAS /backup/lcbp3-dms
Secondary Backup: External backup server (rsync)
Offsite Backup: Cloud storage (optional, for critical data)


💾 Database Backup

Automated Daily Backup Script

#!/bin/bash
# File: /scripts/backup-database.sh

# Make the pipeline fail if mysqldump fails, not only if gzip fails
set -o pipefail

# Configuration
BACKUP_DIR="/backup/lcbp3-dms/database"
DB_CONTAINER="lcbp3-mariadb"
DB_NAME="lcbp3_dms"
DB_USER="backup_user"
DB_PASS="<BACKUP_USER_PASSWORD>"
RETENTION_DAYS=30

# Create backup directory and build the timestamped file name
mkdir -p "$BACKUP_DIR"
BACKUP_FILE="$BACKUP_DIR/lcbp3_$(date +%Y%m%d_%H%M%S).sql.gz"

# Perform backup
echo "Starting database backup to $BACKUP_FILE"
docker exec "$DB_CONTAINER" mysqldump \
  --user="$DB_USER" \
  --password="$DB_PASS" \
  --single-transaction \
  --routines \
  --triggers \
  --databases "$DB_NAME" \
  | gzip > "$BACKUP_FILE"

# Check backup success (with pipefail, $? covers mysqldump as well as gzip)
if [ $? -eq 0 ]; then
  echo "Backup completed successfully"

  # Delete old backups
  find "$BACKUP_DIR" -name "*.sql.gz" -type f -mtime +"$RETENTION_DAYS" -delete
  echo "Old backups cleaned up (retention: $RETENTION_DAYS days)"
else
  echo "ERROR: Backup failed!" >&2
  # Do not leave a partial dump that looks like a valid backup
  rm -f "$BACKUP_FILE"
  exit 1
fi

Schedule with Cron

# Edit crontab
crontab -e

# Add backup job (runs daily at 2 AM)
0 2 * * * /scripts/backup-database.sh >> /var/log/backup-database.log 2>&1

Manual Database Backup

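# Back up a specific database.
# The password is read up front and passed via MYSQL_PWD: without a TTY,
# docker exec cannot show the "-p" prompt, and allocating a TTY (-t)
# would corrupt the redirected dump.
read -rs -p "MariaDB root password: " DB_ROOT_PASSWORD; echo
BACKUP_SQL="backup_$(date +%Y%m%d).sql"
docker exec -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysqldump \
  -u root \
  --single-transaction \
  lcbp3_dms > "$BACKUP_SQL"

# Compress backup
gzip "$BACKUP_SQL"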

📂 File Uploads Backup

Automated Rsync Backup

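#!/bin/bash
# File: /scripts/backup-uploads.sh

# Abort on the first failed command so a failed rsync is never reported as success
set -euo pipefail

SOURCE="/var/lib/docker/volumes/lcbp3_uploads/_data"
DEST="/backup/lcbp3-dms/uploads"
RETENTION_DAYS=30

mkdir -p "$DEST/current"

# Mirror uploads into current/. Files changed or deleted since the last run
# are moved into a dated backup-YYYYMMDD directory instead of being lost.
rsync -av --delete \
  --backup --backup-dir="$DEST/backup-$(date +%Y%m%d)" \
  "$SOURCE/" "$DEST/current/"

# Cleanup old backups
find "$DEST" -maxdepth 1 -type d -name "backup-*" -mtime +"$RETENTION_DAYS" -exec rm -rf {} +

echo "Upload backup completed: $(date)"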

Schedule Uploads Backup

# Run daily at 3 AM
0 3 * * * /scripts/backup-uploads.sh >> /var/log/backup-uploads.log 2>&1

🔄 Database Recovery

Full Database Restore

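# Read the root password once: "-p" cannot prompt while stdin carries the dump
read -rs -p "MariaDB root password: " DB_ROOT_PASSWORD; echo

# Step 1: Stop backend application
docker stop lcbp3-backend

# Step 2: Restore database from backup
gunzip < backup_20241201.sql.gz | \
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root lcbp3_dms

# Step 3: Verify restore
docker exec -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root -e "
  USE lcbp3_dms;
  SELECT COUNT(*) FROM users;
  SELECT COUNT(*) FROM correspondences;
"

# Step 4: Restart backend
docker start lcbp3-backend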

Point-in-Time Recovery (Using Binary Logs)

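# Read the root password once: "-p" cannot prompt while stdin carries the dump
read -rs -p "MariaDB root password: " DB_ROOT_PASSWORD; echo

# Step 1: Restore last full backup
gunzip < backup_20241201_020000.sql.gz | \
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root lcbp3_dms

# Step 2: Apply binary logs written between the backup and the target time
docker exec lcbp3-mariadb mysqlbinlog \
  --start-datetime="2024-12-01 02:00:00" \
  --stop-datetime="2024-12-01 14:30:00" \
  /var/lib/mysql/mysql-bin.000001 | \
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root lcbp3_dms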

📁 File Uploads Recovery

Restore from Backup

# Stop backend to prevent file operations
docker stop lcbp3-backend

# Restore files
rsync -av \
  /backup/lcbp3-dms/uploads/current/ \
  /var/lib/docker/volumes/lcbp3_uploads/_data/

# Restart backend
docker start lcbp3-backend

# Fix ownership (docker exec only works while the container is running)
docker exec lcbp3-backend chown -R node:node /app/uploads

🚨 Disaster Recovery Plan

RTO & RPO

  • RTO (Recovery Time Objective): 4 hours
  • RPO (Recovery Point Objective): 24 hours (for files), 6 hours (for database)

DR Scenarios

Scenario 1: Database Corruption

Detection: Database errors in logs, application errors
Recovery Time: 30 minutes
Steps:

  1. Stop backend
  2. Restore last full backup
  3. Apply binary logs (if needed)
  4. Verify data integrity
  5. Restart services

Scenario 2: Complete Server Failure

Detection: Server unresponsive
Recovery Time: 4 hours
Steps:

  1. Provision new QNAP server or VM
  2. Install Docker & Container Station
  3. Clone Git repository
  4. Restore database backup
  5. Restore file uploads
  6. Deploy containers
  7. Update DNS (if needed)
  8. Verify functionality

Scenario 3: Ransomware Attack

Detection: Encrypted files, ransom note
Recovery Time: 6 hours
Steps:

  1. DO NOT pay ransom
  2. Isolate infected server
  3. Provision clean environment
  4. Restore from offsite backup
  5. Scan restored backup for malware
  6. Deploy and verify
  7. Review security logs
  8. Implement additional security measures

Backup Verification

Weekly Backup Testing

#!/bin/bash
# File: /scripts/test-backup.sh
# Expects DB_ROOT_PASSWORD in the environment: "-p" cannot prompt when the
# script runs unattended or when stdin carries the dump.
set -euo pipefail

# Run the mysql client inside the container as root
mysql_root() {
  docker exec -i -e MYSQL_PWD="$DB_ROOT_PASSWORD" lcbp3-mariadb mysql -u root "$@"
}

# Create temporary test database
mysql_root -e "CREATE DATABASE IF NOT EXISTS test_restore;"

# Restore latest backup to test database
LATEST_BACKUP=$(ls -t /backup/lcbp3-dms/database/*.sql.gz | head -1)
gunzip < "$LATEST_BACKUP" | \
  sed 's/USE `lcbp3_dms`/USE `test_restore`/g' | \
  mysql_root

# Verify table counts
mysql_root -e "
  SELECT COUNT(*) FROM test_restore.users;
  SELECT COUNT(*) FROM test_restore.correspondences;
"

# Cleanup
mysql_root -e "DROP DATABASE test_restore;"

echo "Backup verification completed: $(date)"

Monthly DR Drill

  • Test full system restore on standby server
  • Document time taken and issues encountered
  • Update DR procedures based on findings

📊 Backup Monitoring

Backup Status Dashboard

Monitor:

  • Last successful backup timestamp
  • Backup file size (detect anomalies)
  • Backup success/failure rate
  • Available backup storage space

Alerts

Send alert if:

  • Backup fails
  • Backup file size < 50% of average (possible corruption)
  • No backup in last 48 hours
  • Backup storage < 20% free
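
The alert conditions above could be checked by a small set of shell helpers. The sketch below is illustrative only: the function names are assumptions, it assumes backups are `*.sql.gz` files in a single directory (as in the database backup script), and how an alert is actually delivered (mail, chat webhook, monitoring agent) is left open.

```shell
# Succeeds if no *.sql.gz file in DIR was modified within the last MAX_HOURS hours.
backup_is_stale() {
  local dir="$1" max_hours="$2" recent
  recent=$(find "$dir" -maxdepth 1 -type f -name '*.sql.gz' \
             -mmin "-$((max_hours * 60))" | head -1)
  [ -z "$recent" ]
}

# Succeeds if the newest backup in DIR is smaller than 50% of the
# average size of the other backups in the same directory.
backup_is_undersized() {
  local dir="$1" latest size total=0 count=0 f
  latest=$(ls -t "$dir"/*.sql.gz 2>/dev/null | head -1)
  [ -z "$latest" ] && return 1
  for f in "$dir"/*.sql.gz; do
    [ "$f" = "$latest" ] && continue
    total=$(( total + $(wc -c < "$f") ))
    count=$(( count + 1 ))
  done
  [ "$count" -eq 0 ] && return 1        # nothing to compare against
  size=$(wc -c < "$latest")
  # size < 0.5 * (total / count)  is equivalent to  2 * size * count < total
  [ $(( 2 * size * count )) -lt "$total" ]
}

# Succeeds if the filesystem holding DIR has less than MIN_FREE_PCT percent free.
storage_is_low() {
  local dir="$1" min_free_pct="$2" used_pct
  used_pct=$(df -P "$dir" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
  [ $(( 100 - used_pct )) -lt "$min_free_pct" ]
}
```

A cron job could call, for example, `backup_is_stale /backup/lcbp3-dms/database 48 && echo "ALERT: no backup in 48 hours"`, and do the same for the other two checks with thresholds taken from the list above.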

🔧 Maintenance

Optimize Backup Performance

-- Enable InnoDB compression for large tables
ALTER TABLE correspondences ROW_FORMAT=COMPRESSED;
ALTER TABLE workflow_history ROW_FORMAT=COMPRESSED;

-- Archive old audit logs
-- Move records older than 1 year to archive table.
-- A fixed cutoff keeps the INSERT and the DELETE on exactly the same rows.
SET @cutoff = DATE_SUB(NOW(), INTERVAL 1 YEAR);

INSERT INTO audit_logs_archive
SELECT * FROM audit_logs
WHERE created_at < @cutoff;

DELETE FROM audit_logs
WHERE created_at < @cutoff;

📚 Backup Checklist

Daily Tasks

  • Verify automated backups completed
  • Check backup log files for errors
  • Monitor backup storage space

Weekly Tasks

  • Test restore from random backup
  • Review backup size trends
  • Verify offsite backups synced

Monthly Tasks

  • Full DR drill
  • Review and update DR procedures
  • Test backup restoration on different server

Quarterly Tasks

  • Audit backup access controls
  • Review backup retention policies
  • Update backup documentation
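
The weekly "Verify offsite backups synced" task could be backed by a checksum manifest that travels with the backups. This is a minimal sketch under stated assumptions: the function names and the manifest file name `SHA256SUMS` are hypothetical, and it relies on the `sha256sum` utility being available on both sides.

```shell
# Record a SHA-256 checksum for every regular file in DIR
# (the manifest itself is excluded).
write_manifest() {
  ( cd "$1" && \
    find . -maxdepth 1 -type f ! -name 'SHA256SUMS' -exec sha256sum {} + \
      | sort -k2 > SHA256SUMS )
}

# Succeed only if every file listed in DIR/SHA256SUMS exists and is unchanged.
verify_manifest() {
  ( cd "$1" && sha256sum -c SHA256SUMS >/dev/null 2>&1 )
}
```

The idea is to run `write_manifest` on the backup directory after each backup, let the offsite sync copy the manifest along with the files, and run `verify_manifest` against the offsite copy. A missing or altered file makes the check fail.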


Version: 1.8.0
Last Review: 2025-12-01
Next Review: 2026-03-01


Backup Strategy for LCBP3-DMS

📍 Deploy on: ASUSTOR AS5403T (Infrastructure Server)
🎯 Backup Source: QNAP TS-473A (Application & Database)
📄 Version: v1.8.0


Overview

Backups are pull-based: ASUSTOR pulls data from QNAP for security. If QNAP is compromised, the attacker cannot delete the backups stored on ASUSTOR.

┌─────────────────────────────────────────────────────────────────┐
│                     BACKUP ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   QNAP (Source)                    ASUSTOR (Backup Target)       │
│   192.168.10.8                     192.168.10.9                  │
│                                                                  │
│   ┌──────────────┐   SSH/Rsync    ┌──────────────────────┐       │
│   │  MariaDB     │ ─────────────▶ │  /volume1/backup/db/ │       │
│   │  (mysqldump) │   Daily 2AM    │  (Restic Repository) │       │
│   └──────────────┘                └──────────────────────┘       │
│                                                                  │
│   ┌──────────────┐                ┌──────────────────────┐       │
│   │  Redis RDB   │ ─────────────▶ │  /volume1/backup/    │       │
│   │  + AOF       │   Daily 3AM    │  redis/              │       │
│   └──────────────┘                └──────────────────────┘       │
│                                                                  │
│   ┌──────────────┐                ┌──────────────────────┐       │
│   │  App Config  │ ─────────────▶ │  /volume1/backup/    │       │
│   │  + Volumes   │   Weekly Sun   │  config/             │       │
│   └──────────────┘                └──────────────────────┘       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

1. MariaDB Backup

1.1 Daily Database Backup Script

#!/bin/bash
# File: /volume1/np-dms/scripts/backup-mariadb.sh
# Run on: ASUSTOR (Pull from QNAP)
# Requires MARIADB_ROOT_PASSWORD and RESTIC_PASSWORD in the environment

# Stop on the first error so an empty or partial dump is never archived
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/volume1/backup/db"
RESTIC_REPO="/volume1/backup/restic-repo"   # same repository as in Section 5
QNAP_IP="192.168.10.8"
DB_NAME="lcbp3_db"
DB_USER="root"
# Expanded locally on ASUSTOR and sent inside the SSH command line
DB_PASSWORD="${MARIADB_ROOT_PASSWORD}"

echo "🔄 Starting MariaDB backup at $DATE"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Remote mysqldump via SSH
ssh "admin@$QNAP_IP" "docker exec mariadb mysqldump \
  --single-transaction \
  --routines \
  --triggers \
  -u $DB_USER -p$DB_PASSWORD $DB_NAME" > "$BACKUP_DIR/lcbp3_$DATE.sql"

# Compress
gzip "$BACKUP_DIR/lcbp3_$DATE.sql"

# Add to Restic repository
restic -r "$RESTIC_REPO" backup "$BACKUP_DIR/lcbp3_$DATE.sql.gz"

# Keep only last 30 days of raw files
find "$BACKUP_DIR" -name "lcbp3_*.sql.gz" -mtime +30 -delete

echo "✅ MariaDB backup complete: lcbp3_$DATE.sql.gz"

1.2 Cron Schedule (ASUSTOR)

# MariaDB daily backup at 2 AM
0 2 * * * /volume1/np-dms/scripts/backup-mariadb.sh >> /var/log/backup-mariadb.log 2>&1

2. Redis Backup

2.1 Redis Backup Script

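#!/bin/bash
# File: /volume1/np-dms/scripts/backup-redis.sh
# Run on: ASUSTOR (Pull from QNAP)

set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/volume1/backup/redis"
QNAP_IP="192.168.10.8"
RETENTION_DAYS=7   # matches the Redis row of the retention policy (Section 4)

echo "🔄 Starting Redis backup at $DATE"

mkdir -p "$BACKUP_DIR"

# Trigger BGSAVE on QNAP Redis, then wait (up to about 60 s) until LASTSAVE
# changes, which means the background save has finished
LAST_SAVE=$(ssh "admin@$QNAP_IP" "docker exec cache redis-cli LASTSAVE")
ssh "admin@$QNAP_IP" "docker exec cache redis-cli BGSAVE"
for _ in $(seq 1 30); do
  [ "$(ssh "admin@$QNAP_IP" "docker exec cache redis-cli LASTSAVE")" != "$LAST_SAVE" ] && break
  sleep 2
done

# Copy RDB and AOF files
scp "admin@$QNAP_IP:/share/np-dms/services/cache/data/dump.rdb" "$BACKUP_DIR/redis_$DATE.rdb"
scp "admin@$QNAP_IP:/share/np-dms/services/cache/data/appendonly.aof" "$BACKUP_DIR/redis_$DATE.aof"

# Compress; -C stores bare file names so the archive extracts without
# the /volume1/backup/redis path prefix
tar -czf "$BACKUP_DIR/redis_$DATE.tar.gz" -C "$BACKUP_DIR" \
  "redis_$DATE.rdb" \
  "redis_$DATE.aof"

# Cleanup raw files
rm "$BACKUP_DIR/redis_$DATE.rdb" "$BACKUP_DIR/redis_$DATE.aof"

# Apply retention
find "$BACKUP_DIR" -name "redis_*.tar.gz" -mtime +"$RETENTION_DAYS" -delete

echo "✅ Redis backup complete: redis_$DATE.tar.gz"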

2.2 Cron Schedule

# Redis daily backup at 3 AM
0 3 * * * /volume1/np-dms/scripts/backup-redis.sh >> /var/log/backup-redis.log 2>&1

3. Application Config Backup

3.1 Weekly Config Backup Script

#!/bin/bash
# File: /volume1/np-dms/scripts/backup-config.sh
# Run on: ASUSTOR (Pull from QNAP)

set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR="/volume1/backup/config"
QNAP_IP="192.168.10.8"
RETENTION_DAYS=28   # 4 weeks, per the retention policy (Section 4)

echo "🔄 Starting config backup at $DATE"

mkdir -p "$BACKUP_DIR"

# Sync Docker compose files and configs
rsync -avz --delete \
  --exclude='*/data/*' \
  --exclude='*/logs/*' \
  --exclude='node_modules' \
  "admin@$QNAP_IP:/share/np-dms/" \
  "$BACKUP_DIR/np-dms_$DATE/"

# Compress; -C stores the np-dms_$DATE directory without the
# /volume1/backup/config path prefix
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" -C "$BACKUP_DIR" "np-dms_$DATE"

# Cleanup
rm -rf "$BACKUP_DIR/np-dms_$DATE"

# Apply retention
find "$BACKUP_DIR" -name "config_*.tar.gz" -mtime +"$RETENTION_DAYS" -delete

echo "✅ Config backup complete: config_$DATE.tar.gz"

3.2 Cron Schedule

# Config weekly backup on Sunday at 4 AM
0 4 * * 0 /volume1/np-dms/scripts/backup-config.sh >> /var/log/backup-config.log 2>&1

4. Retention Policy

| Backup Type | Frequency | Retention | Storage Est. |
|---|---|---|---|
| MariaDB | Daily | 30 days | ~5 GB/month |
| Redis | Daily | 7 days | ~500 MB |
| Config | Weekly | 4 weeks | ~200 MB |
| Restic | Daily | 6 months | Deduplicated |
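
A minimal sketch of how the raw-file retention periods in the table might be enforced with one helper. The function name `prune_older_than` is an assumption; the directories and file patterns in the usage comments follow the backup scripts in this document. The Restic row is handled separately by `restic forget`.

```shell
# prune_older_than DIR PATTERN DAYS
# Deletes regular files directly inside DIR whose names match PATTERN and
# whose modification time is more than DAYS days in the past.
# Prints each deleted path so the cron log shows what was removed.
prune_older_than() {
  local dir="$1" pattern="$2" days="$3"
  find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime "+$days" -print -delete
}

# Usage, mirroring the retention table:
#   prune_older_than /volume1/backup/db     'lcbp3_*.sql.gz'   30
#   prune_older_than /volume1/backup/redis  'redis_*.tar.gz'   7
#   prune_older_than /volume1/backup/config 'config_*.tar.gz'  28
```

Matching on a file-name pattern rather than deleting everything old keeps the helper from touching unrelated files that happen to live in the same directory.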

5. Restic Repository Setup

# Set password in environment (needed by every restic command, including init)
export RESTIC_PASSWORD="your-secure-backup-password"

# Initialize Restic repository (one-time)
restic init -r /volume1/backup/restic-repo

# Check repository status
restic -r /volume1/backup/restic-repo snapshots

# Prune old snapshots (keep 30 daily, 4 weekly, 6 monthly)
restic -r /volume1/backup/restic-repo forget \
  --keep-daily 30 \
  --keep-weekly 4 \
  --keep-monthly 6 \
  --prune

6. Verification Script

#!/bin/bash
# File: /volume1/np-dms/scripts/verify-backup.sh

echo "📋 Backup Verification Report"
echo "=============================="
echo ""

# Check latest MariaDB backup
LATEST_DB=$(ls -t /volume1/backup/db/*.sql.gz 2>/dev/null | head -1)
if [ -n "$LATEST_DB" ]; then
  echo "✅ Latest DB backup: $LATEST_DB"
  echo "   Size: $(du -h "$LATEST_DB" | cut -f1)"
else
  echo "❌ No DB backup found!"
fi

# Check latest Redis backup
LATEST_REDIS=$(ls -t /volume1/backup/redis/*.tar.gz 2>/dev/null | head -1)
if [ -n "$LATEST_REDIS" ]; then
  echo "✅ Latest Redis backup: $LATEST_REDIS"
else
  echo "❌ No Redis backup found!"
fi

# Check Restic repository
echo ""
echo "📦 Restic Snapshots:"
restic -r /volume1/backup/restic-repo snapshots --latest 5
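
The report above confirms that backup files exist, but not that they are readable. One possible extension is a gzip integrity test, sketched below. The function name `verify_gzip_backup` is an assumption; `gzip -t` only checks the compressed stream, so it detects truncated or corrupted archives but not a logically incomplete dump.

```shell
# verify_gzip_backup FILE
# Prints a status line and returns non-zero if FILE is missing, empty,
# or not a valid gzip stream.
verify_gzip_backup() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "FAIL (empty or missing): $file"
    return 1
  fi
  if ! gzip -t "$file" 2>/dev/null; then
    echo "FAIL (corrupt gzip): $file"
    return 1
  fi
  echo "OK: $file"
}

# Usage inside the verification script, for example:
#   verify_gzip_backup "$LATEST_DB"
#   verify_gzip_backup "$LATEST_REDIS"
```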

📝 Note: This document is based on Architecture Document v1.8.0


Disaster Recovery Plan for LCBP3-DMS

📍 Version: v1.8.0
🖥️ Primary Server: QNAP TS-473A (Application & Database)
💾 Backup Server: ASUSTOR AS5403T (Infrastructure & Backup)


RTO/RPO Targets

| Scenario | RTO | RPO | Priority |
|---|---|---|---|
| Single backend node failure | 0 min | 0 | P0 |
| Redis failure | 5 min | 0 | P0 |
| MariaDB failure | 10 min | 0 | P0 |
| QNAP total failure | 2 hours | 15 min | P1 |
| Data corruption | 4 hours | 1 day | P2 |

1. Quick Recovery Procedures

1.1 Service Not Responding

# Check container status
docker ps -a | grep <service-name>

# Restart specific service
docker restart <container-name>

# Check logs for errors
docker logs <container-name> --tail 100

1.2 Redis Failure

# Check status
docker exec cache redis-cli ping

# Restart
docker restart cache

# Verify
docker exec cache redis-cli ping

1.3 MariaDB Failure

# Check status (-it gives the "-p" password prompt a terminal)
docker exec -it mariadb mysql -u root -p -e "SELECT 1"

# Restart
docker restart mariadb

# Wait for startup
sleep 30

# Verify
docker exec -it mariadb mysql -u root -p -e "SHOW DATABASES"
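
The fixed `sleep 30` above assumes the service is ready within 30 seconds. A polling loop is a common alternative: it returns as soon as the service answers and fails loudly if it never does. The sketch below is generic; the function name `wait_until` is an assumption, and the readiness commands in the usage comments assume `mysqladmin` and `redis-cli` are present in the respective containers.

```shell
# wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds. Returns 0 on the first success, or 1
# after MAX_ATTEMPTS failed attempts, sleeping DELAY_SECONDS in between.
wait_until() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Usage (assumed readiness probes):
#   wait_until 30 2 docker exec mariadb mysqladmin ping --silent \
#     || echo "MariaDB did not come up in time"
#   wait_until 15 1 docker exec cache redis-cli ping \
#     || echo "Redis did not come up in time"
```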

2. Full System Recovery

2.1 Recovery Prerequisites (ASUSTOR)

Verify that the backup files are available:

# SSH to ASUSTOR
ssh admin@192.168.10.9

# List available backups
ls -la /volume1/backup/db/
ls -la /volume1/backup/redis/
ls -la /volume1/backup/config/

# Check Restic snapshots
restic -r /volume1/backup/restic-repo snapshots

2.2 QNAP Recovery Script

#!/bin/bash
# File: /volume1/np-dms/scripts/disaster-recovery.sh
# Run on: ASUSTOR (Push to QNAP)

QNAP_IP="192.168.10.8"
BACKUP_DIR="/volume1/backup"
RESTORE_TMP=$(mktemp -d)   # scratch directory for extracted archives

echo "🚨 Starting Disaster Recovery..."
echo "================================"

# 1. Restore Docker Network
echo "1️⃣ Creating Docker network..."
ssh admin@$QNAP_IP "docker network create lcbp3 || true"

# 2. Restore config files
echo "2️⃣ Restoring configuration files..."
LATEST_CONFIG=$(ls -t $BACKUP_DIR/config/*.tar.gz | head -1)
tar -xzf "$LATEST_CONFIG" -C "$RESTORE_TMP"
# The archive holds a dated np-dms_YYYYMMDD directory, possibly nested
# under the path it was archived from
CONFIG_ROOT=$(find "$RESTORE_TMP" -type d -name 'np-dms_*' | head -1)
rsync -avz "$CONFIG_ROOT/" admin@$QNAP_IP:/share/np-dms/

# 3. Start infrastructure services
echo "3️⃣ Starting MariaDB..."
ssh admin@$QNAP_IP "cd /share/np-dms/mariadb && docker-compose up -d"
sleep 30

# 4. Restore database
# \$MYSQL_ROOT_PASSWORD is expanded by the remote shell on QNAP, so it
# must be set in that SSH session's environment
echo "4️⃣ Restoring database..."
LATEST_DB=$(ls -t $BACKUP_DIR/db/*.sql.gz | head -1)
gunzip -c "$LATEST_DB" | ssh admin@$QNAP_IP "docker exec -i mariadb mysql -u root -p\$MYSQL_ROOT_PASSWORD lcbp3_db"

# 5. Restore Redis data (if needed) before Redis starts, so the restored
#    dump.rdb is loaded and not overwritten by a save on shutdown.
#    If AOF is enabled, Redis loads the AOF file instead of dump.rdb.
echo "5️⃣ Restoring Redis data..."
LATEST_REDIS=$(ls -t $BACKUP_DIR/redis/*.tar.gz | head -1)
tar -xzf "$LATEST_REDIS" -C "$RESTORE_TMP"
REDIS_RDB=$(find "$RESTORE_TMP" -type f -name 'redis_*.rdb' | head -1)
ssh admin@$QNAP_IP "mkdir -p /share/np-dms/services/cache/data"
scp "$REDIS_RDB" admin@$QNAP_IP:/share/np-dms/services/cache/data/dump.rdb

# 6. Start Redis
echo "6️⃣ Starting Redis..."
ssh admin@$QNAP_IP "cd /share/np-dms/services && docker-compose up -d cache"

# 7. Start remaining services
echo "7️⃣ Starting application services..."
ssh admin@$QNAP_IP "cd /share/np-dms/services && docker-compose up -d"
ssh admin@$QNAP_IP "cd /share/np-dms/npm && docker-compose up -d"

# 8. Health check
echo "8️⃣ Running health checks..."
sleep 60
curl -f https://lcbp3.np-dms.work/health || echo "⚠️ Frontend not ready"
curl -f https://backend.np-dms.work/health || echo "⚠️ Backend not ready"

# Remove extracted archives
rm -rf "$RESTORE_TMP"

echo ""
echo "✅ Disaster Recovery Complete"
echo "⚠️ Please verify system functionality manually"

3. Data Corruption Recovery

3.1 Point-in-Time Recovery (Database)

# List available Restic snapshots
restic -r /volume1/backup/restic-repo snapshots

# Restore specific snapshot
# (restic recreates the original absolute path underneath --target)
restic -r /volume1/backup/restic-repo restore <snapshot-id> --target /tmp/restore/

# Apply restored backup
gunzip -c /tmp/restore/volume1/backup/db/lcbp3_*.sql.gz | \
  ssh admin@192.168.10.8 "docker exec -i mariadb mysql -u root -p\$MYSQL_ROOT_PASSWORD lcbp3_db"

3.2 Selective Table Recovery

# Extract one table (DROP/CREATE statements plus data) from the backup.
# In a mysqldump file, each table's section starts at its DROP TABLE line
# and ends at the next UNLOCK TABLES line.
gunzip -c /volume1/backup/db/lcbp3_YYYYMMDD.sql.gz | \
  sed -n '/^DROP TABLE IF EXISTS `documents`/,/^UNLOCK TABLES/p' > /tmp/documents_table.sql

# Restore specific table
ssh admin@192.168.10.8 "docker exec -i mariadb mysql -u root -p\$MYSQL_ROOT_PASSWORD lcbp3_db" < /tmp/documents_table.sql

4. Communication & Escalation

4.1 Incident Response

| Severity | Response Time | Notify |
|---|---|---|
| P0 | Immediate | Admin Team + Management |
| P1 | 30 minutes | Admin Team |
| P2 | 2 hours | Admin Team (next business day) |

4.2 Post-Incident Checklist

  • Identify root cause
  • Document timeline of events
  • Verify all services restored
  • Check data integrity
  • Update monitoring alerts if needed
  • Create incident report

5. Testing Schedule

| Test Type | Frequency | Last Tested | Next Due |
|---|---|---|---|
| Backup Verification | Weekly | - | - |
| Single Service Recovery | Monthly | - | - |
| Full DR Test | Quarterly | - | - |

📝 Note: This document is based on Architecture Document v1.8.0