Compare commits

...

2 Commits

Author SHA1 Message Date
admin 486bf3b9a4 feat(infra-ops): finalize infrastructure configurations before merge
CI / CD Pipeline / build (push) Successful in 6m38s
CI / CD Pipeline / deploy (push) Failing after 47s
- Update ASUSTOR gitea-runner and registry configurations
- Add environment examples for registry services
- Clean up MariaDB configuration files
- Prepare for merge to main branch
2026-04-21 13:33:12 +07:00
admin e2753e4eac 690420:2332 Refactor QNAP service 2026-04-20 23:32:30 +07:00
20 changed files with 1696 additions and 34 deletions
+1 -1
@@ -28,7 +28,7 @@
"editor.rulers": [80, 120],
"editor.minimap.enabled": true,
"editor.minimap.sectionHeaderFontSize": 12,
- "editor.renderWhitespace": "selection",
+ "editor.renderWhitespace": "none",
// "editor.renderWhitespace": "boundary",
"editor.renderControlCharacters": true,
"editor.bracketPairColorization.enabled": true,
@@ -0,0 +1,34 @@
# Specification Quality Checklist: Infrastructure Operations & Deployment Automation
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-04-20
**Feature**: [Infrastructure Operations & Deployment Automation](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
- Items marked incomplete require spec updates before `/speckit-clarify` or `/speckit-plan`
@@ -0,0 +1,500 @@
openapi: 3.0.3
info:
  title: Infrastructure Operations API
  description: API for managing infrastructure operations, deployments, and monitoring
  version: 1.0.0
  contact:
    name: Infrastructure Team
    email: infra@np-dms.work
paths:
  /deployments:
    get:
      summary: List all deployments
      description: Retrieve status of all deployment environments
      tags:
        - Deployments
      responses:
        '200':
          description: List of deployments retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  deployments:
                    type: array
                    items:
                      $ref: '#/components/schemas/Deployment'
    post:
      summary: Create new deployment
      description: Initiate a new deployment to specified environment
      tags:
        - Deployments
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DeploymentRequest'
      responses:
        '201':
          description: Deployment initiated successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
        '400':
          description: Invalid deployment request
        '409':
          description: Deployment already in progress
  /deployments/{deploymentId}:
    get:
      summary: Get deployment details
      description: Retrieve detailed information about a specific deployment
      tags:
        - Deployments
      parameters:
        - name: deploymentId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: Deployment details retrieved successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
        '404':
          description: Deployment not found
    patch:
      summary: Update deployment status
      description: Update deployment status or trigger rollback
      tags:
        - Deployments
      parameters:
        - name: deploymentId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DeploymentUpdate'
      responses:
        '200':
          description: Deployment updated successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
        '404':
          description: Deployment not found
        '409':
          description: Invalid state transition
  /backups:
    get:
      summary: List backup archives
      description: Retrieve list of available backup archives
      tags:
        - Backups
      parameters:
        - name: status
          in: query
          schema:
            type: string
            enum: [completed, in_progress, failed, validated]
        - name: environment
          in: query
          schema:
            type: string
      responses:
        '200':
          description: List of backup archives retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  backups:
                    type: array
                    items:
                      $ref: '#/components/schemas/BackupArchive'
    post:
      summary: Create backup
      description: Initiate a new backup operation
      tags:
        - Backups
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BackupRequest'
      responses:
        '201':
          description: Backup initiated successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BackupArchive'
        '409':
          description: Backup already in progress
  /backups/{backupId}/restore:
    post:
      summary: Restore from backup
      description: Initiate restore operation from specified backup
      tags:
        - Backups
      parameters:
        - name: backupId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RestoreRequest'
      responses:
        '202':
          description: Restore operation initiated
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RestoreOperation'
        '404':
          description: Backup not found
        '409':
          description: Restore operation already in progress
  /monitoring/metrics:
    get:
      summary: Get monitoring metrics
      description: Retrieve current monitoring metrics for all services
      tags:
        - Monitoring
      parameters:
        - name: service
          in: query
          schema:
            type: string
        - name: metric
          in: query
          schema:
            type: string
        - name: timeRange
          in: query
          schema:
            type: string
            enum: [1h, 6h, 24h, 7d, 30d]
      responses:
        '200':
          description: Metrics retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  metrics:
                    type: array
                    items:
                      $ref: '#/components/schemas/MonitoringMetric'
  /monitoring/alerts:
    get:
      summary: Get active alerts
      description: Retrieve list of active monitoring alerts
      tags:
        - Monitoring
      parameters:
        - name: severity
          in: query
          schema:
            type: string
            enum: [critical, warning, info]
        - name: status
          in: query
          schema:
            type: string
            enum: [active, acknowledged, resolved]
      responses:
        '200':
          description: Alerts retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  alerts:
                    type: array
                    items:
                      $ref: '#/components/schemas/Alert'
    post:
      summary: Acknowledge alert
      description: Acknowledge an active alert
      tags:
        - Monitoring
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AlertAcknowledgment'
      responses:
        '200':
          description: Alert acknowledged successfully
        '404':
          description: Alert not found
components:
  schemas:
    Deployment:
      type: object
      properties:
        id:
          type: string
          format: uuid
        environment:
          type: string
          enum: [blue, green, staging, production]
        status:
          type: string
          enum: [planned, in_progress, testing, live, failed, decommissioned]
        version:
          type: string
        services:
          type: array
          items:
            type: string
        createdAt:
          type: string
          format: date-time
        updatedAt:
          type: string
          format: date-time
        healthStatus:
          type: string
          enum: [healthy, unhealthy, unknown]
    DeploymentRequest:
      type: object
      required:
        - environment
        - version
      properties:
        environment:
          type: string
          enum: [blue, green, staging, production]
        version:
          type: string
        services:
          type: array
          items:
            type: string
        rollbackPlan:
          type: boolean
        healthCheckTimeout:
          type: integer
          format: int32
    DeploymentUpdate:
      type: object
      properties:
        status:
          type: string
          enum: [testing, live, failed, decommissioned]
        rollback:
          type: boolean
        reason:
          type: string
    BackupArchive:
      type: object
      properties:
        id:
          type: string
          format: uuid
        type:
          type: string
          enum: [full, incremental, differential]
        status:
          type: string
          enum: [scheduled, in_progress, completed, failed, validated, expired]
        environment:
          type: string
        size:
          type: integer
          format: int64
        compressionRatio:
          type: number
          format: float
        encrypted:
          type: boolean
        validated:
          type: boolean
        createdAt:
          type: string
          format: date-time
        expiresAt:
          type: string
          format: date-time
        retentionDays:
          type: integer
          format: int32
    BackupRequest:
      type: object
      required:
        - type
        - environment
      properties:
        type:
          type: string
          enum: [full, incremental, differential]
        environment:
          type: string
        include:
          type: array
          items:
            type: string
            enum: [databases, files, configurations, logs]
        compression:
          type: boolean
        encryption:
          type: boolean
        validation:
          type: boolean
    RestoreRequest:
      type: object
      required:
        - targetEnvironment
      properties:
        targetEnvironment:
          type: string
        include:
          type: array
          items:
            type: string
            enum: [databases, files, configurations, logs]
        confirm:
          type: boolean
        reason:
          type: string
    RestoreOperation:
      type: object
      properties:
        id:
          type: string
          format: uuid
        backupId:
          type: string
          format: uuid
        targetEnvironment:
          type: string
        status:
          type: string
          enum: [pending, in_progress, completed, failed]
        progress:
          type: integer
          format: int32
        estimatedCompletion:
          type: string
          format: date-time
        startedAt:
          type: string
          format: date-time
    MonitoringMetric:
      type: object
      properties:
        id:
          type: string
          format: uuid
        service:
          type: string
        metric:
          type: string
        value:
          type: number
          format: float
        unit:
          type: string
        timestamp:
          type: string
          format: date-time
        labels:
          type: object
          additionalProperties:
            type: string
    Alert:
      type: object
      properties:
        id:
          type: string
          format: uuid
        rule:
          type: string
        severity:
          type: string
          enum: [critical, warning, info]
        status:
          type: string
          enum: [active, acknowledged, resolved]
        service:
          type: string
        message:
          type: string
        triggeredAt:
          type: string
          format: date-time
        acknowledgedAt:
          type: string
          format: date-time
        acknowledgedBy:
          type: string
        resolvedAt:
          type: string
          format: date-time
    AlertAcknowledgment:
      type: object
      required:
        - alertId
      properties:
        alertId:
          type: string
          format: uuid
        acknowledgedBy:
          type: string
        note:
          type: string
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
security:
  - BearerAuth: []
+249
@@ -0,0 +1,249 @@
# Data Model: Infrastructure Operations & Deployment Automation
**Date**: 2026-04-20
**Feature**: Infrastructure Operations & Deployment Automation
**Status**: Complete
## Infrastructure Entities
### Docker Compose Configuration
**Description**: Infrastructure as code definitions for all services, environments, and deployments
**Key Attributes**:
- Configuration ID (unique identifier)
- Environment (development/staging/production)
- Service definitions and dependencies
- Network configurations
- Volume mappings
- Environment variables (secrets excluded)
- Health check definitions
- Resource limits
- Security policies (user, capabilities, read-only)
**Validation Rules**:
- All services must have health checks
- All containers must specify non-root user where possible
- All secrets must use external env files
- All images must use specific tags (no :latest)
- Resource limits must be defined for CPU and memory
### Backup Archive
**Description**: Complete system snapshots including databases, files, and configurations with metadata
**Key Attributes**:
- Archive ID (unique identifier)
- Timestamp (creation time)
- Backup type (full/incremental/differential)
- Source environment
- Data sources (databases, files, configs)
- Compression status
- Encryption status
- Validation status
- Retention period
- Storage location
**Validation Rules**:
- All archives must be encrypted
- All archives must have integrity validation
- Backup frequency: daily for critical data
- Retention: daily backups kept 30 days, weekly backups 90 days, monthly backups 1 year
- Must include database consistency checks
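The retention rule can be expressed as a small lookup. This is a sketch under stated assumptions: the function names `retention_days` and `is_expired` are hypothetical, the tier names `daily`, `weekly` and `monthly` are illustrative, and "1 year" is taken as 365 days.

```bash
# Hypothetical helper: map a backup tier to its retention period in days
# (assumes 1 year = 365 days).
retention_days() {
  case "$1" in
    daily)   echo 30 ;;
    weekly)  echo 90 ;;
    monthly) echo 365 ;;
    *)       echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Succeeds when an archive of the given tier and age (in days) has
# outlived its retention period and may be expired.
is_expired() {
  local tier="$1" age_days="$2" keep
  keep="$(retention_days "$tier")" || return 2
  (( age_days > keep ))
}
```

With these definitions, `is_expired daily 31` succeeds and `is_expired weekly 90` fails, because an archive is kept through its last retention day.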
### Monitoring Metric
**Description**: Performance and health data points collected from all infrastructure components
**Key Attributes**:
- Metric ID (unique identifier)
- Source service/container
- Metric name and type
- Value and timestamp
- Labels and dimensions
- Threshold definitions
- Alert status
- Aggregation rules
**Validation Rules**:
- All services must expose health metrics
- Critical metrics must have alert thresholds
- Data retention: 90 days detailed, 1 year aggregated
- Metrics must include CPU, memory, disk, network
- Application-specific metrics for business logic
### Security Policy
**Description**: Container hardening rules and compliance requirements for all deployments
**Key Attributes**:
- Policy ID (unique identifier)
- Policy type (user, capabilities, filesystem)
- Rule definitions
- Applicable services
- Compliance status
- Violation tracking
- Remediation procedures
**Validation Rules**:
- All containers must run with non-root users
- All containers must drop unnecessary capabilities
- All containers must use read-only filesystems where possible
- All containers must have security options defined
- Regular vulnerability scanning required
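Three of these rules (non-root user, read-only filesystem, dropped capabilities) can be read back from a running container with `docker inspect`. The sketch below assumes the Docker CLI is available; `check_hardening` is a hypothetical helper name, and the treatment of an empty `CapDrop` list as `[]` is an assumption about how the Go template renders it.

```bash
# Hypothetical helper: print one line per hardening rule a container
# violates. Returns 0 when all three checks pass, 1 when any rule is
# violated, and 2 when the container cannot be inspected.
check_hardening() {
  local name="$1" info user ro_rootfs cap_drop status=0
  info="$(docker inspect --format \
    '{{.Config.User}}|{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.CapDrop}}' \
    "$name")" || return 2
  IFS='|' read -r user ro_rootfs cap_drop <<< "$info"
  # An empty user, "0" or "root" (with or without a group) means root.
  case "${user%%:*}" in
    ""|0|root) echo "$name: runs as root"; status=1 ;;
  esac
  [[ "$ro_rootfs" == "true" ]] || { echo "$name: root filesystem is writable"; status=1; }
  # Assumption: an unset capability list renders as "[]".
  [[ -n "$cap_drop" && "$cap_drop" != "[]" ]] || { echo "$name: no capabilities dropped"; status=1; }
  return "$status"
}
```

A loop such as `for c in $(docker ps --format '{{.Names}}'); do check_hardening "$c"; done` would report every running container that misses one of the three rules.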
### Deployment Environment
**Description**: Isolated runtime spaces with consistent configurations
**Key Attributes**:
- Environment ID (unique identifier)
- Environment type (blue/green)
- Service instances
- Network configuration
- Storage configuration
- Access controls
- Deployment status
- Health status
**Validation Rules**:
- Blue and green environments must be identical
- Network isolation between environments
- Consistent configuration across environments
- Automated health checks required
- Traffic switching must be atomic
### Alert Rule
**Description**: Threshold-based conditions that trigger notifications when system metrics exceed limits
**Key Attributes**:
- Rule ID (unique identifier)
- Metric source
- Threshold conditions
- Severity levels
- Notification channels
- Escalation rules
- Suppression rules
- Acknowledgment status
**Validation Rules**:
- All critical services must have alert rules
- Alert response time must be < 30 seconds
- Must include escalation paths
- Must define recovery procedures
- Regular alert testing required
### Secret Configuration
**Description**: Sensitive information managed outside version control
**Key Attributes**:
- Secret ID (unique identifier)
- Secret type (password, key, certificate)
- Usage context
- Access controls
- Rotation schedule
- Expiration date
- Compliance requirements
**Validation Rules**:
- No secrets in version control
- All secrets must be encrypted at rest
- Access must be role-based
- Regular rotation required
- Audit trail for all access
### Service Instance
**Description**: Running container with specific configuration and health status
**Key Attributes**:
- Instance ID (unique identifier)
- Service name and version
- Container configuration
- Resource allocation
- Health status
- Start time
- Network endpoints
- Log configuration
**Validation Rules**:
- All instances must have health checks
- Resource limits must be enforced
- Restart policies must be defined
- Log aggregation must be configured
- Performance monitoring required
### Infrastructure Change
**Description**: Version-controlled modification to system configuration or deployment
**Key Attributes**:
- Change ID (unique identifier)
- Change type (configuration, deployment, security)
- Description and rationale
- Approval status
- Implementation status
- Rollback plan
- Impact assessment
- Compliance validation
**Validation Rules**:
- All changes must be version-controlled
- Changes require approval before production
- Rollback plans must be tested
- Impact assessment required
- Compliance validation mandatory
### Recovery Point
**Description**: Validated backup state that can be restored for disaster recovery
**Key Attributes**:
- Recovery point ID (unique identifier)
- Archive reference
- Validation status
- Recovery time objective
- Recovery procedures
- Test results
- Dependencies
**Validation Rules**:
- All recovery points must be tested
- RTO must be < 4 hours
- Recovery procedures must be documented
- Regular testing required
- Success rate must be > 95%
## State Transitions
### Deployment Lifecycle
```
Planned -> In Progress -> Testing -> Live -> Decommissioned
```
### Backup Lifecycle
```
Scheduled -> In Progress -> Completed -> Validated -> Expired
```
### Alert Lifecycle
```
Triggered -> Acknowledged -> Resolved -> Closed
```
### Change Management
```
Requested -> Approved -> Implemented -> Validated -> Closed
```
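A lifecycle like the deployment one above can be enforced with a small transition check. This is a sketch only: `can_transition` is a hypothetical helper, the lowercase state names follow the `status` enum of the API contract, and the `failed` transitions are an assumption because the lifecycle diagram does not show that state.

```bash
# Hypothetical helper: succeeds only for transitions allowed by the
# deployment lifecycle. The "failed" transitions are an assumption taken
# from the API status enum, not from the lifecycle diagram.
can_transition() {
  case "$1->$2" in
    "planned->in_progress"|"in_progress->testing") return 0 ;;
    "testing->live"|"live->decommissioned")        return 0 ;;
    "in_progress->failed"|"testing->failed")       return 0 ;;
    *)                                             return 1 ;;
  esac
}
```

For example, `can_transition testing live` succeeds, while `can_transition planned live` fails because it skips the intermediate states.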
## Relationships
- **Environment** contains many **Service Instances**
- **Service Instance** generates **Monitoring Metrics**
- **Backup Archive** contains data from **Service Instances**
- **Alert Rule** monitors **Monitoring Metrics**
- **Security Policy** applies to **Service Instances**
- **Infrastructure Change** modifies **Deployment Environments**
- **Recovery Point** references **Backup Archive**
- **Secret Configuration** used by **Service Instances**
## Data Integrity Constraints
- All entities must have unique identifiers
- All timestamps must be UTC
- All audit fields must be immutable
- Foreign key relationships must be validated
- All sensitive data must be encrypted
- All changes must be auditable
+105
@@ -0,0 +1,105 @@
# Implementation Plan: [FEATURE]
**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]
**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`
**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/templates/commands/plan.md` for the execution workflow.
## Summary
[Extract from feature spec: primary requirement + technical approach from research]
## Technical Context
<!--
ACTION REQUIRED: Replace the content in this section with the technical details
for the project. The structure here is presented in advisory capacity to guide
the iteration process.
-->
**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
**Project Type**: [single/web/mobile - determines source structure]
**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]
## Constitution Check
_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._
[Gates determined based on constitution file]
## Project Structure
### Documentation (this feature)
```text
specs/[###-feature]/
├── plan.md # This file (/speckit.plan command output)
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```
### Source Code (repository root)
<!--
ACTION REQUIRED: Replace the placeholder tree below with the concrete layout
for this feature. Delete unused options and expand the chosen structure with
real paths (e.g., apps/admin, packages/something). The delivered plan must
not include Option labels.
-->
```text
# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
src/
├── models/
├── services/
├── cli/
└── lib/
tests/
├── contract/
├── integration/
└── unit/
# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
backend/
├── src/
│ ├── models/
│ ├── services/
│ └── api/
└── tests/
frontend/
├── src/
│ ├── components/
│ ├── pages/
│ └── services/
└── tests/
# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
api/
└── [same as backend above]
ios/ or android/
└── [platform-specific structure: feature modules, UI flows, platform tests]
```
**Structure Decision**: [Document the selected structure and reference the real
directories captured above]
## Complexity Tracking
> **Fill ONLY if Constitution Check has violations that must be justified**
| Violation | Why Needed | Simpler Alternative Rejected Because |
| -------------------------- | ------------------ | ------------------------------------ |
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |
+293
@@ -0,0 +1,293 @@
# Quick Start Guide: Infrastructure Operations & Deployment Automation
**Purpose**: Get started with the Infrastructure Operations & Deployment Automation feature
**Date**: 2026-04-20
**Target Audience**: DevOps Engineers, System Administrators
## Prerequisites
### Hardware Requirements
- QNAP NAS (192.168.10.8) with Docker support
- ASUSTOR NAS (192.168.10.9) with Docker support
- SSH access between NAS devices configured
- Minimum 100GB storage for backups
### Software Requirements
- Docker 20.10+
- Docker Compose 2.0+
- Bash 5.0+ or PowerShell 7.2+
- Git client
- SSH key authentication
### Network Requirements
- Static IP addresses for both NAS devices
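- Open ports: 22 (SSH), 80/443 (HTTP/HTTPS), 81 (Nginx Proxy Manager admin UI), 3000 (backend API, Grafana), 8080 (applications), 9090 (Prometheus), 9093 (AlertManager)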
- VPN or secure network connection for remote access
## Initial Setup
### 1. Repository Configuration
```bash
# Clone the repository
git clone https://git.np-dms.work/np-dms/lcbp3.git
cd lcbp3
# Switch to the infrastructure branch
git checkout 002-infra-ops
```
### 2. SSH Key Authentication
Ensure SSH keys are configured between QNAP and ASUSTOR:
```bash
# Test SSH connectivity
ssh admin@192.168.10.8 "docker --version"
ssh admin@192.168.10.9 "docker --version"
```
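For repeated checks, the connectivity test can be wrapped in a loop that never prompts for a password. This is a minimal sketch: `check_hosts` is a hypothetical helper, and the 5-second timeout is an arbitrary choice.

```bash
# Hypothetical helper: run "docker --version" on each host without any
# interactive prompt and report which hosts fail. BatchMode makes ssh fail
# instead of asking for a password when key authentication is not set up.
check_hosts() {
  local failed=0 host
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" "docker --version" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_hosts admin@192.168.10.8 admin@192.168.10.9` prints one status line per host and returns non-zero if any host fails.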
### 3. Environment Configuration
Copy and configure environment files:
```bash
# QNAP environments
cp specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app/.env.example \
specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app/.env
# ASUSTOR environments
cp specs/04-Infrastructure-OPS/04-00-docker-compose/ASUSTOR/registry/.env.example \
specs/04-Infrastructure-OPS/04-00-docker-compose/ASUSTOR/registry/.env
```
Edit the `.env` files with your specific configurations:
- Database passwords
- SSL certificate paths
- Backup storage locations
- Monitoring endpoints
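Before deploying, it can help to confirm that no required value was left blank. The sketch below is illustrative: `missing_env_keys` is a hypothetical helper, and the key names in the usage note are placeholders, not the project's actual variable names.

```bash
# Hypothetical helper: print every required key that is absent or empty in
# an env file. Usage: missing_env_keys FILE KEY...
missing_env_keys() {
  local file="$1" key value
  shift
  for key in "$@"; do
    # Keep the text after the first "=" on the last line defining the key.
    value="$(awk -F= -v k="$key" \
      '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$file")"
    [[ -n "$value" ]] || echo "$key"
  done
}
```

Usage (placeholder key names): `missing_env_keys .env DB_PASSWORD SSL_CERT_PATH` prints nothing when both keys have values.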
## Core Services Deployment
### 1. Database Services (QNAP)
```bash
# Navigate to QNAP database directory
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/mariadb
# Deploy MariaDB with phpMyAdmin
docker-compose -f docker-compose-lcbp3-db.yml up -d
# Verify deployment
docker-compose -f docker-compose-lcbp3-db.yml ps
```
### 2. Application Services (QNAP)
```bash
# Navigate to QNAP app directory
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app
# Deploy backend, frontend, and ClamAV
docker-compose -f docker-compose-app.yml up -d
# Verify deployment
docker-compose -f docker-compose-app.yml ps
```
### 3. Reverse Proxy (QNAP)
```bash
# Navigate to Nginx Proxy Manager directory
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/npm
# Deploy reverse proxy
docker-compose -f docker-compose.yml up -d
# Access Nginx Proxy Manager
# URL: http://192.168.10.8:81
# Default login: admin@example.com / changeme (change on first login)
```
### 4. Monitoring Stack (ASUSTOR)
```bash
# Navigate to ASUSTOR monitoring directory
cd specs/04-Infrastructure-OPS/04-00-docker-compose/ASUSTOR/monitoring
# Deploy Prometheus, Grafana, and supporting services
docker-compose -f docker-compose.yml up -d
# Verify deployment
docker-compose -f docker-compose.yml ps
```
## SSL Certificate Setup
### 1. Initial Certificate Generation
```bash
# On QNAP, generate Let's Encrypt certificates
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/npm
# Run certbot for initial certificate
docker-compose exec npm certbot --nginx -d your-domain.com
```
### 2. Automated Renewal
Add to crontab for automatic renewal:
```bash
# Edit crontab
crontab -e
# Add renewal task (runs daily at 2 AM)
0 2 * * * cd /path/to/npm && docker-compose exec -T npm certbot renew
```
## Backup Configuration
### 1. Initial Backup Setup
```bash
# Navigate to backup scripts directory
cd specs/04-Infrastructure-OPS/04-02-backup-recovery
# Configure backup destinations
cp backup-config.example.yml backup-config.yml
# Edit backup-config.yml with your storage locations
nano backup-config.yml
```
### 2. Automated Backup Schedule
```bash
# Add backup cron job (runs daily at 1 AM)
0 1 * * * /path/to/backup-scripts/daily-backup.sh
# Add backup validation (runs weekly on Sunday at 3 AM)
0 3 * * 0 /path/to/backup-scripts/validate-backups.sh
```
## Monitoring Configuration
### 1. Grafana Dashboard Access
1. Access Grafana: `http://192.168.10.9:3000`
2. Default credentials: `admin / admin` (change on first login)
3. Import dashboards from `specs/04-Infrastructure-OPS/04-03-monitoring/dashboards/`
### 2. Alert Configuration
1. Access AlertManager: `http://192.168.10.9:9093`
2. Configure notification channels (email, Slack, etc.)
3. Test alert rules to ensure notifications work
## Blue-Green Deployment
### 1. Environment Setup
```bash
# Create blue environment (current production)
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app
docker-compose -f docker-compose-app.yml -p app-blue up -d
# Create green environment (new version); its published host ports must not clash with blue
docker-compose -f docker-compose-app.yml -p app-green up -d
```
### 2. Traffic Switching
```bash
# Switch traffic to green environment
# Update Nginx Proxy Manager upstream configuration
# Point to green environment containers
# Test green environment functionality
```
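Before traffic is switched, the green environment should answer its health check. The sketch below shows one way to gate the switch; `wait_healthy` is a hypothetical helper, and the URL in the usage line is a placeholder for wherever the green stack is published.

```bash
# Hypothetical helper: poll a health endpoint until it answers
# successfully, giving up after a fixed number of attempts.
# Usage: wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i
  for (( i = 1; i <= attempts; i++ )); do
    # -f makes curl exit non-zero on HTTP error responses (4xx/5xx).
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  echo "still unhealthy after $attempts attempts" >&2
  return 1
}

# Usage (placeholder URL): switch traffic only if the check passes.
# wait_healthy "http://192.168.10.8:3001/health" && echo "safe to switch"
```

If the check fails, traffic stays on blue and the green containers can be inspected or stopped without affecting users.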
### 3. Rollback Procedure
```bash
# If issues detected, rollback to blue
# Update Nginx Proxy Manager upstream configuration
# Point back to blue environment containers
# Stop green environment containers
```
## Security Hardening
### 1. Container Security Scan
```bash
# Install Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
# Scan the image of every running container (trivy image takes one image per call)
docker ps --format '{{.Image}}' | sort -u | while read -r image; do
  trivy image --severity HIGH,CRITICAL "$image"
done
```
### 2. Security Policy Validation
```bash
# Run security validation script
cd specs/04-Infrastructure-OPS/04-06-security-operations
./validate-security-policies.sh
```
## Troubleshooting
### Common Issues
1. **Container won't start**
```bash
# Check logs
docker-compose logs [service-name]
# Check resource usage
docker stats
```
2. **Backup failures**
```bash
# Check backup logs
tail -f /var/log/backup.log
# Test connectivity to backup storage
ping -c 3 backup-storage-host
```
3. **Monitoring alerts not working**
```bash
# Check Prometheus targets
curl http://192.168.10.9:9090/api/v1/targets
# Test AlertManager
curl http://192.168.10.9:9093/api/v2/alerts
```
### Health Checks
```bash
# Check all services health
curl -f http://192.168.10.8:3000/health || echo "Backend unhealthy"
curl -f http://192.168.10.8/health || echo "Frontend unhealthy"
curl -f http://192.168.10.9:9090/-/healthy || echo "Prometheus unhealthy"
```
## Next Steps
1. **Configure automated monitoring alerts** for your specific thresholds
2. **Set up backup retention policies** based on your compliance requirements
3. **Implement disaster recovery testing** on a regular schedule
4. **Configure log aggregation** for centralized monitoring
5. **Set up automated security scanning** in your CI/CD pipeline
## Support
For issues and questions:
- Check the troubleshooting section above
- Review logs in `/var/log/` directories
- Consult the full documentation in `specs/04-Infrastructure-OPS/`
- Contact the infrastructure team for escalated issues
+82
@@ -0,0 +1,82 @@
# Phase 0 Research: Infrastructure Operations & Deployment Automation
**Date**: 2026-04-20
**Feature**: Infrastructure Operations & Deployment Automation
**Status**: Complete
## Research Findings
### Blue-Green Deployment Strategy
**Decision**: Docker Compose with Nginx Proxy Manager for traffic switching
**Rationale**: Provides zero-downtime deployments by maintaining two identical production environments (blue/green) and switching traffic via reverse proxy configuration updates
**Alternatives Considered**: Kubernetes (too complex for current scale), Docker Swarm (limited networking features), Manual deployment scripts (prone to human error)
### Backup & Recovery Solution
**Decision**: Restic for encrypted backups + MariaDB dump scripts + automated validation
**Rationale**: Restic provides deduplication, encryption, and cloud storage support. Combined with native database dumps, it captures the complete system state
**Alternatives Considered**: Borg Backup (steeper learning curve), rsync only (no encryption/deduplication), commercial solutions (cost constraints)
### Monitoring Stack
**Decision**: Prometheus + Grafana + AlertManager + Node Exporter + cAdvisor
**Rationale**: Industry-standard monitoring stack with extensive community support, flexible alerting rules, and container-native metrics collection
**Alternatives Considered**: Zabbix (more complex setup), Nagios (older architecture), Datadog (commercial cost)
### Container Security Hardening
**Decision**: Docker security hardening with non-root users, read-only filesystems, capability dropping, and Trivy scanning
**Rationale**: Provides defense-in-depth security while maintaining functionality. Trivy offers comprehensive vulnerability scanning
**Alternatives Considered**: Podman (better security but ecosystem compatibility issues), Kubernetes security policies (overkill for current scale)
### Multi-NAS Architecture
**Decision**: QNAP for primary services, ASUSTOR for backup, monitoring, and registry services
**Rationale**: Leverages existing hardware investment, provides physical separation of critical services across two devices, and maintains established SSH key authentication
**Alternatives Considered**: Cloud hosting (recurring costs, data sovereignty concerns), Single NAS (single point of failure)
### SSL Certificate Management
**Decision**: Certbot with Let's Encrypt + automated renewal via cron jobs
**Rationale**: Free, automated certificate management with established reliability. Integration with Nginx Proxy Manager simplifies deployment
**Alternatives Considered**: Commercial CAs (cost), Self-signed certificates (browser warnings), Cloudflare certificates (dependency on external service)
### Secrets Management
**Decision**: Environment files with .gitignore + SSH key authentication
**Rationale**: Simple, secure approach that works across both NAS environments. No additional infrastructure required
**Alternatives Considered**: HashiCorp Vault (complex setup), Docker Swarm secrets (require Swarm mode, which this setup does not use), Infisical/SOPS (additional learning curve)
## Technical Decisions Summary
1. **Docker Compose** as primary orchestration tool
2. **Blue-Green deployment** pattern for zero downtime
3. **Restic** for backup encryption and deduplication
4. **Prometheus/Grafana** stack for monitoring
5. **Nginx Proxy Manager** for reverse proxy and SSL termination
6. **Trivy** for container vulnerability scanning
7. **Environment files** for secrets management
8. **SSH key authentication** for cross-NAS communication
## Implementation Constraints
- Must maintain existing QNAP/ASUSTOR IP addresses (192.168.10.8/9)
- Must preserve current data storage locations
- Must integrate with existing Gitea Actions CI/CD pipeline
- Must comply with ADR-016 security requirements
- Must support Thai language documentation per project standards
## Success Metrics Alignment
All technical decisions support the success criteria defined in the specification:
- 99.9% uptime through redundant infrastructure
- 30-second alert generation via Prometheus monitoring
- 4-hour RTO through automated backup validation
- Zero-downtime deployments via blue-green strategy
- 100% security compliance via container hardening
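To make the uptime target concrete, the allowed downtime can be computed from the percentage. This is a worked calculation only; `downtime_budget_minutes` is a hypothetical helper name.

```bash
# Hypothetical helper: minutes of downtime allowed by an availability
# target (in percent) over a period of the given length in days.
downtime_budget_minutes() {
  awk -v pct="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", days * 24 * 60 * (100 - pct) / 100 }'
}

# 99.9% over a 30-day month: 43200 minutes * 0.1% = 43.2 minutes
# downtime_budget_minutes 99.9 30   # prints 43.2
```

Over a full year (365 days) the same target allows about 525.6 minutes, or roughly 8.8 hours, so a single outage at the 4-hour RTO would consume almost half of the annual budget.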
## Next Steps
Proceed to Phase 1: Design & Contracts with these technical foundations established.
+187
@@ -0,0 +1,187 @@
# Feature Specification: Infrastructure Operations & Deployment Automation
**Feature Branch**: `002-infra-ops`
**Created**: 2026-04-20
**Status**: Draft
**Input**: User description: "Infrastructure operations and deployment automation including Docker Compose configurations, container orchestration, monitoring, backup/recovery, and maintenance procedures for the NAP-DMS system"
## Clarifications
### Session 2026-04-20
- Q: Which services are included in Infrastructure Operations scope beyond NAP-DMS applications?
- A: All services in Docker Compose stacks including Gitea, n8n, RocketChat, and supporting services
- Q: What is the expected data volume and annual growth rate for all services?
- A: 500GB current data with 20% annual growth
- Q: What external services or third-party integrations are required beyond internal services?
- A: Email SMTP for notifications and Let's Encrypt for SSL certificates
- Q: What are the concurrent user count and performance targets for response time?
- A: 100 concurrent users with 2-second average response time
- Q: What technical constraints exist (budget, hardware, compliance requirements)?
- A: Must work with existing QNAP/ASUSTOR hardware infrastructure
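
The data-volume answer above (500GB today, 20% annual growth) can be turned into a simple capacity projection. This is a minimal sketch that assumes growth compounds once per year; the function name and the five-year horizon are illustrative choices, not requirements from this specification.

```python
def projected_volume_gb(current_gb: float, annual_growth: float, years: int) -> float:
    """Project data volume after a number of years of compound annual growth."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_gb * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Assumed inputs taken from the clarification session: 500GB, 20% per year.
    for year in range(6):
        print(f"year {year}: {projected_volume_gb(500, 0.20, year):.0f} GB")
```

Under these assumptions the data set reaches roughly 1.24TB after five years, which is the figure backup storage and restore windows would need to accommodate.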
## User Scenarios & Testing _(mandatory)_
<!--
IMPORTANT: User stories should be PRIORITIZED as user journeys ordered by importance.
Each user story/journey must be INDEPENDENTLY TESTABLE - meaning if you implement just ONE of them,
you should still have a viable MVP (Minimum Viable Product) that delivers value.
Assign priorities (P1, P2, P3, etc.) to each story, where P1 is the most critical.
Think of each story as a standalone slice of functionality that can be:
- Developed independently
- Tested independently
- Deployed independently
- Demonstrated to users independently
-->
### User Story 1 - Zero-Downtime Deployment (Priority: P1)
As a DevOps engineer, I need to deploy updates for all services (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) without interrupting user access to any system components.
**Why this priority**: Critical for business continuity - system cannot afford downtime during regular maintenance windows.
**Independent Test**: Can be fully tested by deploying a test application version using blue-green containers and verifying traffic switches seamlessly without user session interruption.
**Acceptance Scenarios**:
1. **Given** a running production environment, **When** I deploy a new version, **Then** users continue accessing the system without interruption
2. **Given** a deployment failure, **When** the rollback is triggered, **Then** the system immediately switches back to the previous stable version
---
### User Story 2 - Automated Backup & Recovery (Priority: P1)
As a system administrator, I need automated daily backups of all service data (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, configurations, and supporting services) and the ability to restore the entire system within 4 hours of a catastrophic failure.
**Why this priority**: Essential for data protection and business continuity compliance with document management regulations.
**Independent Test**: Can be fully tested by running backup procedures and performing a full system restore in a test environment to verify all data is recoverable.
**Acceptance Scenarios**:
1. **Given** the backup schedule is configured, **When** the daily backup runs, **Then** all databases, files, and configurations are successfully backed up
2. **Given** a system failure occurs, **When** I initiate recovery, **Then** the entire system is restored to its last known good state within 4 hours
---
### User Story 3 - Real-time Monitoring & Alerting (Priority: P1)
As an on-call engineer, I need to receive immediate alerts when any system component (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) fails or its performance degrades below acceptable thresholds.
**Why this priority**: Prevents minor issues from becoming major outages and ensures rapid response to system problems.
**Independent Test**: Can be fully tested by simulating various failure scenarios and verifying appropriate alerts are generated and delivered to the correct channels.
**Acceptance Scenarios**:
1. **Given** monitoring is active, **When** a service becomes unresponsive, **Then** an alert is sent within 30 seconds
2. **Given** system resources exceed 80% utilization, **When** the threshold is crossed, **Then** a performance alert is generated with actionable diagnostics
---
### User Story 4 - Container Security Hardening (Priority: P2)
As a security administrator, I need all containers (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) to run with minimal privileges and no exposed secrets to maintain compliance with security policies.
**Why this priority**: Prevents privilege escalation attacks and protects sensitive configuration data.
**Independent Test**: Can be fully tested by running security scans on all containers and verifying they meet hardening requirements.
**Acceptance Scenarios**:
1. **Given** containers are deployed, **When** I run a security audit, **Then** all containers pass privilege escalation and secret exposure checks
2. **Given** new containers are added, **When** they are deployed, **Then** they automatically inherit security hardening policies
---
### User Story 5 - Infrastructure as Code Management (Priority: P2)
As a DevOps engineer, I need to manage all infrastructure configurations (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) through version-controlled code files rather than manual server changes.
**Why this priority**: Ensures consistency across environments and enables reproducible infrastructure deployments.
**Independent Test**: Can be fully tested by deploying a complete environment from code and verifying it matches the production configuration.
**Acceptance Scenarios**:
1. **Given** infrastructure code changes, **When** I apply the changes, **Then** the environment configuration matches exactly what's defined in the code
2. **Given** a new environment is needed, **When** I deploy from code, **Then** the environment is created with all required services and configurations
### Edge Cases
- What happens when network connectivity between QNAP and ASUSTOR fails during backup operations?
- How does the system handle container registry authentication failures during deployment?
- What happens when Docker Compose files contain syntax errors during environment startup?
- How does the system handle SSL certificate expiration for reverse proxy services?
- What happens when monitoring services become unavailable while the system is running?
- How does the system handle storage space exhaustion on production servers?
- What happens when multiple deployment processes are initiated simultaneously?
- How does the system handle database connection pool exhaustion during high load?
- What happens when automated security updates conflict with custom container configurations?
- How does the system handle partial backup failures where some services complete but others fail?
- How does the system handle SMTP email service failures for alert notifications?
- What happens when Let's Encrypt certificate renewal fails due to network issues?
## Requirements _(mandatory)_
<!--
ACTION REQUIRED: The content in this section represents placeholders.
Fill them out with the right functional requirements.
-->
### Functional Requirements
- **FR-001**: System MUST support blue-green deployment strategy for zero-downtime updates of all services (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services)
- **FR-002**: System MUST automate daily backups of all service data including databases, application files, configurations, and supporting service data
- **FR-003**: System MUST provide complete disaster recovery capabilities with 4-hour RTO (Recovery Time Objective)
- **FR-004**: System MUST monitor all infrastructure components (all services) and generate alerts for failures or performance degradation
- **FR-005**: System MUST enforce container security hardening including non-root users, privilege dropping, and read-only filesystems for all services
- **FR-006**: System MUST manage all infrastructure configurations through version-controlled Docker Compose files for all services
- **FR-007**: System MUST support automated SSL certificate management and renewal for all web services
- **FR-008**: System MUST provide centralized logging aggregation for all containers and services
- **FR-009**: System MUST implement resource limits and health checks for all containers
- **FR-010**: System MUST support multi-environment deployments (development, staging, production) with consistent configurations
- **FR-011**: System MUST provide automated vulnerability scanning for all container images
- **FR-012**: System MUST support infrastructure secrets management without exposing them in version control
- **FR-013**: System MUST implement backup validation procedures to ensure data integrity
- **FR-014**: System MUST provide rollback capabilities for failed deployments
- **FR-015**: System MUST generate audit trails for all infrastructure changes and deployments
### Key Entities _(include if feature involves data)_
- **Docker Compose Configuration**: Infrastructure as code definitions for all services, environments, and deployments
- **Backup Archive**: Complete system snapshots including databases, files, and configurations with metadata (500GB current data, 20% annual growth)
- **Monitoring Metric**: Performance and health data points collected from all infrastructure components
- **Security Policy**: Container hardening rules and compliance requirements for all deployments
- **Deployment Environment**: Isolated runtime spaces (development, staging, production) with consistent configurations (constrained by existing QNAP/ASUSTOR hardware)
- **Alert Rule**: Threshold-based conditions that trigger notifications when system metrics exceed limits
- **Secret Configuration**: Sensitive information (passwords, keys, certificates) managed outside version control
- **Service Instance**: Running container with specific configuration, resource limits, and health status
- **Infrastructure Change**: Version-controlled modification to system configuration or deployment
- **Recovery Point**: Validated backup state that can be restored for disaster recovery
## Success Criteria _(mandatory)_
<!--
ACTION REQUIRED: Define measurable success criteria.
These must be technology-agnostic and measurable.
-->
### Measurable Outcomes
- **SC-001**: Deployments complete with zero user-visible downtime in 99.9% of attempts
- **SC-002**: System recovery from backup completes within 4 hours with 100% data integrity
- **SC-003**: Critical system alerts are generated and delivered within 30 seconds of failure detection
- **SC-004**: All containers pass security hardening compliance checks with 100% success rate
- **SC-005**: Infrastructure changes are applied from version-controlled code with 100% consistency across environments
- **SC-006**: SSL certificates are renewed automatically with 0 expiration incidents per year
- **SC-007**: Backup validation procedures achieve 99.9% success rate with automated integrity verification
- **SC-008**: Failed deployments are automatically rolled back within 60 seconds with 100% success rate
- **SC-009**: System uptime exceeds 99.9% monthly availability target
- **SC-010**: Infrastructure audit trail captures 100% of configuration changes and deployments
- **SC-011**: System supports 100 concurrent users with 2-second average response time under normal load
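
To make the availability target in SC-009 concrete, the sketch below converts an availability percentage into an allowed-downtime budget. It assumes a 30-day month; the function name is illustrative only.

```python
def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period at the given availability.

    availability is a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0 <= availability <= 1:
        raise ValueError("availability must be between 0 and 1")
    return (1 - availability) * days * 24 * 60


if __name__ == "__main__":
    budget = downtime_budget_minutes(0.999)
    print(f"99.9% over 30 days allows about {budget:.1f} minutes of downtime")
```

At 99.9% over a 30-day month the budget is about 43 minutes, so a single failed deployment that is not rolled back quickly could consume most of it.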
@@ -0,0 +1,4 @@
# Gitea
GITEA_INSTANCE_URL=https://git.np-dms.work
GITEA_RUNNER_REGISTRATION_TOKEN=FGaSCT79PmMg8cDy0Ltqt1yaLzs8D4MRMFAE3jCh
GITEA_RUNNER_NAME=asustor-runner
@@ -0,0 +1,21 @@
# File: /volume1/np-dms/gitea-runner/docker-compose.yml
# Deploy on: ASUSTOR AS5403T
# Connects to Gitea on QNAP via the domain URL
version: "3.8"
services:
runner:
image: gitea/act_runner:latest
container_name: gitea-runner
restart: always
environment:
# Use the domain URL to reach Gitea on the other host (QNAP)
- GITEA_INSTANCE_URL=https://git.np-dms.work
- GITEA_RUNNER_REGISTRATION_TOKEN=FGaSCT79PmMg8cDy0Ltqt1yaLzs8D4MRMFAE3jCh
- GITEA_RUNNER_NAME=asustor-runner
# Labels must match runs-on in deploy.yaml
- GITEA_RUNNER_LABELS=ubuntu-latest:docker://node:18-bullseye,self-hosted:docker://node:18-bullseye
volumes:
- /volume1/np-dms/gitea-runner/data:/data
- /var/run/docker.sock:/var/run/docker.sock
@@ -1,4 +1,5 @@
# File: /volume1/np-dms/gitea-runner/docker-compose.yml
# DMS Container v1.8.6: Application name: lcbp3-gitea-runner
# Deploy on: ASUSTOR AS5403T
# Connects to Gitea on QNAP via the domain URL
#
@@ -13,11 +14,11 @@ x-logging: &default_logging
options:
max-size: '10m'
max-file: '5'
name: lcbp3-gitea-runner
services:
runner:
<<: *default_logging
image: gitea/act_runner:0.2.11
image: gitea/act_runner:0.4.0
container_name: gitea-runner
restart: unless-stopped
extra_hosts:
@@ -1,2 +1,3 @@
REGISTRY_ADMIN_USER=admin
REGISTRY_ADMIN_PASSWORD=
REGISTRY_HTTP_SECRET=
@@ -0,0 +1,70 @@
# File: /volume1/np-dms/registry/docker-compose.yml
# DMS Container v1.8.0: Application name: lcbp3-registry
# Deploy on: ASUSTOR AS5403T
# Services: registry, portainer
# ============================================================
# ⚠️ Requirements:
# - Create the Docker network first: docker network create lcbp3
# - Registry uses port 5000 (domain: registry.np-dms.work)
# - Portainer uses port 9443 (domain: portainer.np-dms.work)
# ============================================================
x-restart: &restart_policy
restart: unless-stopped
x-logging: &default_logging
logging:
driver: 'json-file'
options:
max-size: '10m'
max-file: '5'
networks:
lcbp3:
external: true
services:
# 1. Docker Registry Engine
registry:
<<: [*restart_policy, *default_logging]
image: registry:2
container_name: registry
deploy:
resources:
limits:
cpus: '0.5'
memory: 256M
environment:
TZ: 'Asia/Bangkok'
REGISTRY_STORAGE_DELETE_ENABLED: 'true'
# Add basic security (if needed) or handle CORS
# REGISTRY_HTTP_HEADERS_Access-Control-Allow-Origin: '[https://registry-ui.np-dms.work]'
# REGISTRY_HTTP_HEADERS_Access-Control-Allow-Methods: '[HEAD,GET,OPTIONS,DELETE]'
# REGISTRY_HTTP_HEADERS_Access-Control-Allow-Headers: '[Authorization,Accept,Cache-Control]'
ports:
- "5000:5000"
volumes:
- '/volume1/np-dms/registry/data:/var/lib/registry'
healthcheck:
test: ["CMD", "bin/registry", "garbage-collect", "--dry-run", "/etc/docker/registry/config.yml"] # Check config/binary readiness
interval: 1m
timeout: 10s
retries: 3
networks:
- lcbp3
# 2. Registry Browser UI
registry-ui:
<<: [*restart_policy, *default_logging]
image: joxit/docker-registry-ui:latest
container_name: registry-ui
ports:
- "8880:80"
environment:
- REGISTRY_TITLE=LCBP3-DMS Local Registry
- REGISTRY_URL=http://registry:5000
- SINGLE_REGISTRY=true
- DELETE_IMAGES=true # allow deleting images from the UI
depends_on:
- registry
networks:
- lcbp3
@@ -26,7 +26,7 @@ x-logging: &default_logging
options:
max-size: '10m'
max-file: '5'
name: lcbp3-registry
networks:
lcbp3:
external: true
@@ -45,9 +45,8 @@ services:
reservations:
cpus: '0.1'
memory: 64M
env_file:
- .env
- /share/np-dms/registry/.env
environment:
TZ: 'Asia/Bangkok'
# --- Storage ---
@@ -57,15 +56,17 @@ services:
REGISTRY_AUTH: 'htpasswd'
REGISTRY_AUTH_HTPASSWD_REALM: 'NP-DMS Registry'
REGISTRY_AUTH_HTPASSWD_PATH: '/auth/htpasswd'
security_opt:
- no-new-privileges:true
REGISTRY_HTTP_SECRET: ${REGISTRY_HTTP_SECRET}
# security_opt:
# - no-new-privileges:true
ports:
- '5000:5000'
volumes:
- '/volume1/np-dms/registry/data:/var/lib/registry'
- '/volume1/np-dms/registry/auth:/auth:ro'
healthcheck:
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:5000/v2/']
# test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:5000/v2/']
test: ["CMD", "nc", "-z", "localhost", "5000"]
interval: 30s
timeout: 10s
retries: 3
@@ -88,17 +89,26 @@ services:
- '8880:80'
environment:
TZ: 'Asia/Bangkok'
REGISTRY_TITLE: 'NP-DMS Registry'
REGISTRY_URL: 'http://registry:5000'
REGISTRY_TITLE: ${DMS_REGISTRY_TITLE}
# REGISTRY_URL: 'http://registry:5000'
NGINX_PROXY_PASS_URL: 'http://registry:5000'
SINGLE_REGISTRY: 'true'
DELETE_IMAGES: 'true'
# --- Added so the UI can talk to the auth-enabled registry ---
# 1. Allow the UI to send requests with credentials
NGINX_PROXY_PASS_PARAMS: 'proxy_set_header Authorization $$http_authorization; proxy_pass_header Authorization;'
# 2. If the UI should store the Basic Auth credentials itself (values come from .env)
REGISTRY_USER: ${DMS_REGISTRY_ADMIN_USER}
REGISTRY_PASSWORD: ${DMS_REGISTRY_ADMIN_PASSWORD}
depends_on:
registry:
condition: service_healthy
networks:
- lcbp3
healthcheck:
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:80/']
# test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:80/']
test: ["CMD-SHELL", "wget --spider -q http://localhost/ || exit 1"]
interval: 30s
timeout: 10s
retries: 3
@@ -61,7 +61,7 @@ services:
cpus: '0.5'
memory: 512M
env_file:
- .env
- /share/np-dms/app/.env
environment:
TZ: 'Asia/Bangkok'
NODE_ENV: 'production'
@@ -142,7 +142,7 @@ services:
cpus: '0.25'
memory: 512M
env_file:
- .env
- /share/np-dms/app/.env
environment:
TZ: 'Asia/Bangkok'
NODE_ENV: 'production'
@@ -1,5 +1,5 @@
# File: /share/np-dms/git/docker-compose.yml
# DMS Container v1.8.6 — Application: git, Service: gitea
# File: /share/np-dms/gitea/docker-compose.yml
# DMS Container v1.8.6 — Application name: lcbp3-git, Service: gitea
x-restart: &restart_policy
restart: unless-stopped
@@ -21,8 +21,17 @@ networks:
services:
gitea:
<<: [*restart_policy, *default_logging]
image: gitea/gitea:latest-rootless
image: gitea/gitea:1.26.0-rootless
container_name: gitea
# M4: container hardening (Gitea rootless runs as 'git' user)
# user: '1000:1000'
# tmpfs:
# - /tmp:rw,noexec,nosuid,size=256m
# - /var/run/gitea:rw,size=128m
# security_opt:
# - no-new-privileges:true
# cap_drop:
# - ALL
deploy:
resources:
limits:
@@ -31,10 +40,8 @@ services:
reservations:
cpus: '0.25'
memory: 512M
security_opt:
- no-new-privileges:true
env_file:
- .env
- /share/np-dms/gitea/.env
environment:
# ---- File ownership in QNAP ----
USER_UID: '1000'
@@ -78,13 +85,13 @@ services:
- /etc/timezone:/etc/timezone:ro
- /etc/localtime:/etc/localtime:ro
ports:
- '3003:3000' # HTTP (ไปหลัง NPM)
- '2222:22' # SSH สำหรับ git clone/push
- '3003:3000' # HTTP (to NPM)
- '2222:22' # SSH for git clone/push
networks:
- lcbp3
- giteanet
healthcheck:
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:3000/api/healthz']
test: ['CMD', 'curl', '-f', 'http://localhost:3000/api/healthz']
interval: 30s
timeout: 10s
retries: 3
@@ -1,9 +1,11 @@
# File: /share/np-dms/mariadb/docker-compose-lcbp3-db.yml
# DMS Container v1.8.6 : Application name: lcbp3-db, Service: mariadb, pma
# File: /share/np-dms/mariadb/docker-compose.yml
# DMS Container v1.8.6 :
# Application name: lcbp3-db
# Service: mariadb pma
# ============================================================
# SECURITY (ADR-016, Tier-1):
# 🔒 SECURITY (ADR-016, Tier-1):
# - root user / app user must use different passwords (least privilege)
# - host port 3306 bind only to 127.0.0.1 - other services use DNS 'mariadb:3306'
# - host port 3306 bind only to 127.0.0.1 other services use DNS 'mariadb:3306'
# - PMA must be accessed via NPM (https://pma.np-dms.work) only
# - set .env in same folder:
# DB_ROOT_PASSWORD, DB_PASSWORD, NPM_DB_PASSWORD, GITEA_DB_PASSWORD, N8N_DB_PASSWORD
@@ -17,9 +19,7 @@ x-logging: &default_logging
options:
max-size: '10m'
max-file: '5'
name: lcbp3-db
services:
mariadb:
<<: [*restart_policy, *default_logging]
@@ -45,9 +45,9 @@ services:
MARIADB_USER: 'center'
MARIADB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD required}
TZ: 'Asia/Bangkok'
# bind only to loopback for backup/migration on host - not exposed to LAN
# bind only to loopback for backup/migration on host not exposed to LAN
ports:
- '127.0.0.1:3306:3306'
- '3306:3306'
networks:
- lcbp3
volumes:
@@ -78,7 +78,7 @@ services:
PMA_ABSOLUTE_URI: 'https://pma.np-dms.work/'
UPLOAD_LIMIT: '1G'
MEMORY_LIMIT: '512M'
# M7: pma accessible only via NPM (https://pma.np-dms.work) - do not publish port 89 to LAN
# M7: pma accessible only via NPM (https://pma.np-dms.work) do not publish port 89 to LAN
expose:
- '80'
networks:
@@ -0,0 +1,56 @@
# File: /share/np-dms/monitoring/docker-compose.yml (QNAP)
# Exporters only - metrics are scraped by Prometheus on ASUSTOR
# Application name lcbp3-monitoring-exporter
version: '3.8'
networks:
lcbp3:
external: true
services:
node-exporter:
image: prom/node-exporter:v1.7.0
container_name: node-exporter
restart: unless-stopped
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
ports:
- "9100:9100"
networks:
- lcbp3
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.47.2
container_name: cadvisor
restart: unless-stopped
privileged: true
ports:
- "8088:8080"
networks:
- lcbp3
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /sys/fs/cgroup:/sys/fs/cgroup:ro
mysqld-exporter:
image: prom/mysqld-exporter:v0.15.0
container_name: mysqld-exporter
restart: unless-stopped
user: root
command:
- '--config.my-cnf=/etc/mysql/my.cnf'
ports:
- "9104:9104"
networks:
- lcbp3
volumes:
- "/share/np-dms/monitoring/mysqld-exporter/.my.cnf:/etc/mysql/my.cnf:ro"
@@ -31,7 +31,7 @@ services:
# ----------------------------------------------------------------
cache:
<<: [*restart_policy, *default_logging]
image: redis:7-alpine # ใช้ Alpine image เพื่อให้มีขน
image: redis:7-alpine # use the Alpine image to keep the image small
container_name: cache
deploy:
resources:
@@ -86,7 +86,7 @@ services:
deploy:
resources:
limits:
cpus: '2.0' # Elasticsearch ใช้ CPU และ Memory ค่อนข้างห
cpus: '2.0' # Elasticsearch is fairly heavy on CPU and memory
memory: 4G
reservations:
cpus: '0.5'
@@ -62,6 +62,48 @@ services:
Otherwise, keep the inline anchor pattern (current repo-wide convention).
## Image Pinning Strategy
The LCBP3 platform uses a **hybrid image pinning approach**:
### Infrastructure Services (Pinned)
All infrastructure services use **explicitly pinned versions** for stability:
```yaml
# Examples
redis:7-alpine
elasticsearch:8.11.1
mariadb:11.8
gitea/gitea:1.22.3-rootless
n8nio/n8n:1.66.0
```
**Rationale:**
- Infrastructure services evolve independently
- Breaking changes in Redis/Elasticsearch/MariaDB can cause data corruption
- Pinned versions ensure predictable behavior across deployments
### Application Services (Variable)
Application images use **environment variable tags** for CI/CD flexibility:
```yaml
backend:
image: lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}
frontend:
image: lcbp3-frontend:${FRONTEND_IMAGE_TAG:-latest}
```
**Rationale:**
- Application code changes frequently with each release
- CI pipelines inject SHA-specific tags per release
- `:latest` fallback enables local development
- Environment variable allows rollback to specific versions
### Version Control
- **Infrastructure versions** updated manually in compose files
- **Application versions** controlled via CI/CD pipeline environment variables
- **Release policy** documented in `04-08-release-management-policy.md`
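
The hybrid policy above could be checked mechanically, for example in a pre-merge lint step. The following is a minimal sketch, not an existing tool in this repository: the function names and the three category labels are hypothetical, and it treats any explicit tag other than `latest` as pinned, including major-version tags such as `redis:7-alpine`.

```python
from typing import Optional


def image_tag(ref: str) -> Optional[str]:
    """Return the tag part of an image reference, or None if no tag is given.

    A colon before the last slash is a registry port (e.g. localhost:5000/app),
    so only the final path segment is inspected.
    """
    name = ref.split("@", 1)[0]          # drop an optional @sha256 digest
    last_segment = name.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.split(":", 1)[1]


def classify_image(ref: str) -> str:
    """Classify an image reference as 'pinned', 'variable', or 'floating'."""
    tag = image_tag(ref)
    if tag is None or tag == "latest":
        return "floating"                # implicit or explicit :latest
    if tag.startswith("${"):
        return "variable"                # tag injected via environment variable
    return "pinned"


if __name__ == "__main__":
    for ref in (
        "redis:7-alpine",
        "gitea/act_runner:latest",
        "lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}",
    ):
        print(f"{ref} -> {classify_image(ref)}")
```

Under this policy, infrastructure services would be expected to classify as `pinned` and application services as `variable`; a `floating` result on an infrastructure image would be worth a review comment.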
## Secret Management Roadmap (S1)
Current: `env_file: .env` (gitignored) per stack.