83 lines
4.2 KiB
Markdown
83 lines
4.2 KiB
Markdown
# Phase 0 Research: Infrastructure Operations & Deployment Automation
|
|
|
|
**Date**: 2026-04-20
|
|
**Feature**: Infrastructure Operations & Deployment Automation
|
|
**Status**: Complete
|
|
|
|
## Research Findings
|
|
|
|
### Blue-Green Deployment Strategy
|
|
|
|
**Decision**: Docker Compose with Nginx Proxy Manager for traffic switching
|
|
**Rationale**: Provides zero-downtime deployments by maintaining two identical production environments (blue/green) and switching traffic via reverse proxy configuration updates
|
|
**Alternatives Considered**: Kubernetes (too complex for current scale), Docker Swarm (limited networking features), Manual deployment scripts (prone to human error)
|
|
|
|
### Backup & Recovery Solution
|
|
|
|
**Decision**: Restic for encrypted backups + MariaDB dump scripts + automated validation
|
|
**Rationale**: Restic provides deduplication, encryption, and cloud storage support. Combined with native database dumps ensures complete system state capture
|
|
**Alternatives Considered**: Borg Backup (steeper learning curve), rsync only (no encryption/deduplication), commercial solutions (cost constraints)
|
|
|
|
### Monitoring Stack
|
|
|
|
**Decision**: Prometheus + Grafana + AlertManager + Node Exporter + cAdvisor
|
|
**Rationale**: Industry-standard monitoring stack with extensive community support, flexible alerting rules, and container-native metrics collection
|
|
**Alternatives Considered**: Zabbix (more complex setup), Nagios (older architecture), Datadog (commercial cost)
|
|
|
|
### Container Security Hardening
|
|
|
|
**Decision**: Docker security hardening with non-root users, read-only filesystems, capability dropping, and Trivy scanning
|
|
**Rationale**: Provides defense-in-depth security while maintaining functionality. Trivy offers comprehensive vulnerability scanning
|
|
**Alternatives Considered**: Podman (better security but ecosystem compatibility issues), Kubernetes security policies (overkill for current scale)
|
|
|
|
### Multi-NAS Architecture
|
|
|
|
**Decision**: QNAP for primary services, ASUSTOR for backup/monitoring registry
|
|
**Rationale**: Leverages existing hardware investment, provides geographic separation for critical services, and maintains established SSH key authentication
|
|
**Alternatives Considered**: Cloud hosting (recurring costs, data sovereignty concerns), Single NAS (single point of failure)
|
|
|
|
### SSL Certificate Management
|
|
|
|
**Decision**: Certbot with Let's Encrypt + automated renewal via cron jobs
|
|
**Rationale**: Free, automated certificate management with established reliability. Integration with Nginx Proxy Manager simplifies deployment
|
|
**Alternatives Considered**: Commercial CAs (cost), Self-signed certificates (browser warnings), Cloudflare certificates (dependency on external service)
|
|
|
|
### Secrets Management
|
|
|
|
**Decision**: Environment files with .gitignore + SSH key authentication
|
|
**Rationale**: Simple, secure approach that works across both NAS environments. No additional infrastructure required
|
|
**Alternatives Considered**: HashiCorp Vault (complex setup), Docker Swarm secrets (limited to single host), Infisical/SOPS (additional learning curve)
|
|
|
|
## Technical Decisions Summary
|
|
|
|
1. **Docker Compose** as primary orchestration tool
|
|
2. **Blue-Green deployment** pattern for zero downtime
|
|
3. **Restic** for backup encryption and deduplication
|
|
4. **Prometheus/Grafana** stack for monitoring
|
|
5. **Nginx Proxy Manager** for reverse proxy and SSL termination
|
|
6. **Trivy** for container vulnerability scanning
|
|
7. **Environment files** for secrets management
|
|
8. **SSH key authentication** for cross-NAS communication
|
|
|
|
## Implementation Constraints
|
|
|
|
- Must maintain existing QNAP/ASUSTOR IP addresses (192.168.10.8/9)
|
|
- Must preserve current data storage locations
|
|
- Must integrate with existing Gitea Actions CI/CD pipeline
|
|
- Must comply with ADR-016 security requirements
|
|
- Must support Thai language documentation per project standards
|
|
|
|
## Success Metrics Alignment
|
|
|
|
All technical decisions support the success criteria defined in the specification:
|
|
|
|
- 99.9% uptime through redundant infrastructure
|
|
- 30-second alert generation via Prometheus monitoring
|
|
- 4-hour RTO through automated backup validation
|
|
- Zero-downtime deployments via blue-green strategy
|
|
- 100% security compliance via container hardening
|
|
|
|
## Next Steps
|
|
|
|
Proceed to Phase 1: Design & Contracts with these technical foundations established.
|