Compare commits: 2e89761b0f...486bf3b9a4 (2 commits)

| SHA1 |
|---|
| 486bf3b9a4 |
| e2753e4eac |
@@ -28,7 +28,7 @@
   "editor.rulers": [80, 120],
   "editor.minimap.enabled": true,
   "editor.minimap.sectionHeaderFontSize": 12,
-  "editor.renderWhitespace": "selection",
+  "editor.renderWhitespace": "none",
   // "editor.renderWhitespace": "boundary",
   "editor.renderControlCharacters": true,
   "editor.bracketPairColorization.enabled": true,
@@ -0,0 +1,34 @@
# Specification Quality Checklist: Infrastructure Operations & Deployment Automation

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-04-20
**Feature**: [Infrastructure Operations & Deployment Automation](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Notes

- Items marked incomplete require spec updates before `/speckit-clarify` or `/speckit-plan`
@@ -0,0 +1,500 @@
openapi: 3.0.3
info:
  title: Infrastructure Operations API
  description: API for managing infrastructure operations, deployments, and monitoring
  version: 1.0.0
  contact:
    name: Infrastructure Team
    email: infra@np-dms.work

paths:
  /deployments:
    get:
      summary: List all deployments
      description: Retrieve status of all deployment environments
      tags:
        - Deployments
      responses:
        '200':
          description: List of deployments retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  deployments:
                    type: array
                    items:
                      $ref: '#/components/schemas/Deployment'

    post:
      summary: Create new deployment
      description: Initiate a new deployment to specified environment
      tags:
        - Deployments
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DeploymentRequest'
      responses:
        '201':
          description: Deployment initiated successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
        '400':
          description: Invalid deployment request
        '409':
          description: Deployment already in progress

  /deployments/{deploymentId}:
    get:
      summary: Get deployment details
      description: Retrieve detailed information about a specific deployment
      tags:
        - Deployments
      parameters:
        - name: deploymentId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: Deployment details retrieved successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
        '404':
          description: Deployment not found

    patch:
      summary: Update deployment status
      description: Update deployment status or trigger rollback
      tags:
        - Deployments
      parameters:
        - name: deploymentId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DeploymentUpdate'
      responses:
        '200':
          description: Deployment updated successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Deployment'
        '404':
          description: Deployment not found
        '409':
          description: Invalid state transition

  /backups:
    get:
      summary: List backup archives
      description: Retrieve list of available backup archives
      tags:
        - Backups
      parameters:
        - name: status
          in: query
          schema:
            type: string
            enum: [completed, in_progress, failed, validated]
        - name: environment
          in: query
          schema:
            type: string
      responses:
        '200':
          description: List of backup archives retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  backups:
                    type: array
                    items:
                      $ref: '#/components/schemas/BackupArchive'

    post:
      summary: Create backup
      description: Initiate a new backup operation
      tags:
        - Backups
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BackupRequest'
      responses:
        '201':
          description: Backup initiated successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BackupArchive'
        '409':
          description: Backup already in progress

  /backups/{backupId}/restore:
    post:
      summary: Restore from backup
      description: Initiate restore operation from specified backup
      tags:
        - Backups
      parameters:
        - name: backupId
          in: path
          required: true
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RestoreRequest'
      responses:
        '202':
          description: Restore operation initiated
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RestoreOperation'
        '404':
          description: Backup not found
        '409':
          description: Restore operation already in progress

  /monitoring/metrics:
    get:
      summary: Get monitoring metrics
      description: Retrieve current monitoring metrics for all services
      tags:
        - Monitoring
      parameters:
        - name: service
          in: query
          schema:
            type: string
        - name: metric
          in: query
          schema:
            type: string
        - name: timeRange
          in: query
          schema:
            type: string
            enum: [1h, 6h, 24h, 7d, 30d]
      responses:
        '200':
          description: Metrics retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  metrics:
                    type: array
                    items:
                      $ref: '#/components/schemas/MonitoringMetric'

  /monitoring/alerts:
    get:
      summary: Get active alerts
      description: Retrieve list of active monitoring alerts
      tags:
        - Monitoring
      parameters:
        - name: severity
          in: query
          schema:
            type: string
            enum: [critical, warning, info]
        - name: status
          in: query
          schema:
            type: string
            enum: [active, acknowledged, resolved]
      responses:
        '200':
          description: Alerts retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  alerts:
                    type: array
                    items:
                      $ref: '#/components/schemas/Alert'

    post:
      summary: Acknowledge alert
      description: Acknowledge an active alert
      tags:
        - Monitoring
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AlertAcknowledgment'
      responses:
        '200':
          description: Alert acknowledged successfully
        '404':
          description: Alert not found

components:
  schemas:
    Deployment:
      type: object
      properties:
        id:
          type: string
          format: uuid
        environment:
          type: string
          enum: [blue, green, staging, production]
        status:
          type: string
          enum: [planned, in_progress, testing, live, failed, decommissioned]
        version:
          type: string
        services:
          type: array
          items:
            type: string
        createdAt:
          type: string
          format: date-time
        updatedAt:
          type: string
          format: date-time
        healthStatus:
          type: string
          enum: [healthy, unhealthy, unknown]

    DeploymentRequest:
      type: object
      required:
        - environment
        - version
      properties:
        environment:
          type: string
          enum: [blue, green, staging, production]
        version:
          type: string
        services:
          type: array
          items:
            type: string
        rollbackPlan:
          type: boolean
        healthCheckTimeout:
          type: integer
          format: int32

    DeploymentUpdate:
      type: object
      properties:
        status:
          type: string
          enum: [testing, live, failed, decommissioned]
        rollback:
          type: boolean
        reason:
          type: string

    BackupArchive:
      type: object
      properties:
        id:
          type: string
          format: uuid
        type:
          type: string
          enum: [full, incremental, differential]
        status:
          type: string
          enum: [scheduled, in_progress, completed, failed, validated, expired]
        environment:
          type: string
        size:
          type: integer
          format: int64
        compressionRatio:
          type: number
          format: float
        encrypted:
          type: boolean
        validated:
          type: boolean
        createdAt:
          type: string
          format: date-time
        expiresAt:
          type: string
          format: date-time
        retentionDays:
          type: integer
          format: int32

    BackupRequest:
      type: object
      required:
        - type
        - environment
      properties:
        type:
          type: string
          enum: [full, incremental, differential]
        environment:
          type: string
        include:
          type: array
          items:
            type: string
            enum: [databases, files, configurations, logs]
        compression:
          type: boolean
        encryption:
          type: boolean
        validation:
          type: boolean

    RestoreRequest:
      type: object
      required:
        - targetEnvironment
      properties:
        targetEnvironment:
          type: string
        include:
          type: array
          items:
            type: string
            enum: [databases, files, configurations, logs]
        confirm:
          type: boolean
        reason:
          type: string

    RestoreOperation:
      type: object
      properties:
        id:
          type: string
          format: uuid
        backupId:
          type: string
          format: uuid
        targetEnvironment:
          type: string
        status:
          type: string
          enum: [pending, in_progress, completed, failed]
        progress:
          type: integer
          format: int32
        estimatedCompletion:
          type: string
          format: date-time
        startedAt:
          type: string
          format: date-time

    MonitoringMetric:
      type: object
      properties:
        id:
          type: string
          format: uuid
        service:
          type: string
        metric:
          type: string
        value:
          type: number
          format: float
        unit:
          type: string
        timestamp:
          type: string
          format: date-time
        labels:
          type: object
          additionalProperties:
            type: string

    Alert:
      type: object
      properties:
        id:
          type: string
          format: uuid
        rule:
          type: string
        severity:
          type: string
          enum: [critical, warning, info]
        status:
          type: string
          enum: [active, acknowledged, resolved]
        service:
          type: string
        message:
          type: string
        triggeredAt:
          type: string
          format: date-time
        acknowledgedAt:
          type: string
          format: date-time
        acknowledgedBy:
          type: string
        resolvedAt:
          type: string
          format: date-time

    AlertAcknowledgment:
      type: object
      required:
        - alertId
      properties:
        alertId:
          type: string
          format: uuid
        acknowledgedBy:
          type: string
        note:
          type: string

  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

security:
  - BearerAuth: []
@@ -0,0 +1,249 @@
# Data Model: Infrastructure Operations & Deployment Automation

**Date**: 2026-04-20
**Feature**: Infrastructure Operations & Deployment Automation
**Status**: Complete

## Infrastructure Entities

### Docker Compose Configuration

**Description**: Infrastructure as code definitions for all services, environments, and deployments
**Key Attributes**:
- Configuration ID (unique identifier)
- Environment (development/staging/production)
- Service definitions and dependencies
- Network configurations
- Volume mappings
- Environment variables (secrets excluded)
- Health check definitions
- Resource limits
- Security policies (user, capabilities, read-only)

**Validation Rules**:
- All services must have health checks
- All containers must specify non-root user where possible
- All secrets must use external env files
- All images must use specific tags (no :latest)
- Resource limits must be defined for CPU and memory
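
Several of these rules can be checked mechanically before a deployment. The following is a minimal Python sketch, assuming the Compose file has already been loaded into a plain dictionary (for example with a YAML parser) and that resource limits live under `deploy.resources.limits` as in the Compose specification. The function name `check_compose_services` and the example service names are hypothetical and do not come from the repository; the external env file rule is not covered here.

```python
from typing import Any


def check_compose_services(compose: dict[str, Any]) -> list[str]:
    """Return human-readable violations of the validation rules listed above."""
    violations: list[str] = []
    for name, svc in compose.get("services", {}).items():
        image = svc.get("image", "")
        # Look for the tag after the last "/" so that a registry port such as
        # "registry:5000/app" is not mistaken for an image tag.
        last_segment = image.split("@", 1)[0].rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
        if "@" not in image and tag in ("", "latest"):
            violations.append(f"{name}: image must pin a specific tag")
        if "healthcheck" not in svc:
            violations.append(f"{name}: missing healthcheck")
        # "user" may be "uid" or "uid:gid"; an empty value means the image default.
        if str(svc.get("user", "")).split(":")[0] in ("", "root", "0"):
            violations.append(f"{name}: no non-root user specified")
        limits = svc.get("deploy", {}).get("resources", {}).get("limits", {})
        for key in ("cpus", "memory"):
            if key not in limits:
                violations.append(f"{name}: missing {key} limit")
    return violations


# Hypothetical input: one compliant service and one that breaks every rule.
example = {
    "services": {
        "backend": {
            "image": "registry.example/backend:1.4.2",
            "user": "1000:1000",
            "healthcheck": {"test": ["CMD", "true"]},
            "deploy": {"resources": {"limits": {"cpus": "1.0", "memory": "512M"}}},
        },
        "scanner": {"image": "clamav/clamav:latest"},
    }
}

for violation in check_compose_services(example):
    print(violation)
```

A check like this could run in CI against each Compose file, failing the build when the returned list is non-empty.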

### Backup Archive

**Description**: Complete system snapshots including databases, files, and configurations with metadata
**Key Attributes**:
- Archive ID (unique identifier)
- Timestamp (creation time)
- Backup type (full/incremental)
- Source environment
- Data sources (databases, files, configs)
- Compression status
- Encryption status
- Validation status
- Retention period
- Storage location

**Validation Rules**:
- All archives must be encrypted
- All archives must have integrity validation
- Backup frequency: daily for critical data
- Retention: 30 days daily, 90 days weekly, 1 year monthly
- Must include database consistency checks
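
The retention rule can be read as three tiers with different lifetimes. The sketch below is one possible interpretation in Python, assuming each archive is labelled with a tier (`daily`, `weekly`, or `monthly`) and that "1 year" means 365 days. The tier names and both helper functions are illustrative assumptions, not part of the specification.

```python
from datetime import datetime, timedelta, timezone

# Retention windows taken from the validation rules; tier names are assumed.
RETENTION = {
    "daily": timedelta(days=30),
    "weekly": timedelta(days=90),
    "monthly": timedelta(days=365),
}


def expires_at(tier: str, created_at: datetime) -> datetime:
    """Return the moment an archive of the given tier leaves retention."""
    return created_at + RETENTION[tier]


def is_expired(tier: str, created_at: datetime, now: datetime) -> bool:
    """True once the archive has reached or passed its expiry time."""
    return now >= expires_at(tier, created_at)


created = datetime(2026, 4, 20, tzinfo=timezone.utc)
print(expires_at("daily", created).isoformat())
print(is_expired("weekly", created, created + timedelta(days=45)))
```

The computed expiry maps naturally onto the `expiresAt` and `retentionDays` fields of an archive record, so a cleanup job only needs to compare `expiresAt` with the current UTC time.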

### Monitoring Metric

**Description**: Performance and health data points collected from all infrastructure components
**Key Attributes**:
- Metric ID (unique identifier)
- Source service/container
- Metric name and type
- Value and timestamp
- Labels and dimensions
- Threshold definitions
- Alert status
- Aggregation rules

**Validation Rules**:
- All services must expose health metrics
- Critical metrics must have alert thresholds
- Data retention: 90 days detailed, 1 year aggregated
- Metrics must include CPU, memory, disk, network
- Application-specific metrics for business logic

### Security Policy

**Description**: Container hardening rules and compliance requirements for all deployments
**Key Attributes**:
- Policy ID (unique identifier)
- Policy type (user, capabilities, filesystem)
- Rule definitions
- Applicable services
- Compliance status
- Violation tracking
- Remediation procedures

**Validation Rules**:
- All containers must run with non-root users
- All containers must drop unnecessary capabilities
- All containers must use read-only filesystems where possible
- All containers must have security options defined
- Regular vulnerability scanning required

### Deployment Environment

**Description**: Isolated runtime spaces with consistent configurations
**Key Attributes**:
- Environment ID (unique identifier)
- Environment type (blue/green)
- Service instances
- Network configuration
- Storage configuration
- Access controls
- Deployment status
- Health status

**Validation Rules**:
- Blue and green environments must be identical
- Network isolation between environments
- Consistent configuration across environments
- Automated health checks required
- Traffic switching must be atomic
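
The requirement that blue and green stay identical could be verified by diffing their configurations before traffic is switched. This is a minimal Python sketch, assuming both configurations are available as nested dictionaries with string keys. The function name `config_drift` and the sample values are hypothetical.

```python
from typing import Any


def config_drift(blue: dict[str, Any], green: dict[str, Any], prefix: str = "") -> list[str]:
    """List every key whose value differs between the two environments."""
    drift: list[str] = []
    for key in sorted(blue.keys() | green.keys()):
        path = f"{prefix}{key}"
        if key not in blue or key not in green:
            drift.append(f"{path}: present in only one environment")
        elif isinstance(blue[key], dict) and isinstance(green[key], dict):
            # Recurse so that nested service settings are compared field by field.
            drift.extend(config_drift(blue[key], green[key], prefix=f"{path}."))
        elif blue[key] != green[key]:
            drift.append(f"{path}: {blue[key]!r} != {green[key]!r}")
    return drift


# Hypothetical configurations that differ in one image tag and one extra service.
blue_cfg = {"backend": {"image": "backend:1.4.2", "memory": "512M"}}
green_cfg = {"backend": {"image": "backend:1.4.3", "memory": "512M"}, "cache": {}}

for entry in config_drift(blue_cfg, green_cfg):
    print(entry)
```

During a rollout the image version is expected to differ, so a real check would likely exclude that field and flag only unintended drift.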

### Alert Rule

**Description**: Threshold-based conditions that trigger notifications when system metrics exceed limits
**Key Attributes**:
- Rule ID (unique identifier)
- Metric source
- Threshold conditions
- Severity levels
- Notification channels
- Escalation rules
- Suppression rules
- Acknowledgment status

**Validation Rules**:
- All critical services must have alert rules
- Alert response time must be < 30 seconds
- Must include escalation paths
- Must define recovery procedures
- Regular alert testing required

### Secret Configuration

**Description**: Sensitive information managed outside version control
**Key Attributes**:
- Secret ID (unique identifier)
- Secret type (password, key, certificate)
- Usage context
- Access controls
- Rotation schedule
- Expiration date
- Compliance requirements

**Validation Rules**:
- No secrets in version control
- All secrets must be encrypted at rest
- Access must be role-based
- Regular rotation required
- Audit trail for all access

### Service Instance

**Description**: Running container with specific configuration and health status
**Key Attributes**:
- Instance ID (unique identifier)
- Service name and version
- Container configuration
- Resource allocation
- Health status
- Start time
- Network endpoints
- Log configuration

**Validation Rules**:
- All instances must have health checks
- Resource limits must be enforced
- Restart policies must be defined
- Log aggregation must be configured
- Performance monitoring required

### Infrastructure Change

**Description**: Version-controlled modification to system configuration or deployment
**Key Attributes**:
- Change ID (unique identifier)
- Change type (configuration, deployment, security)
- Description and rationale
- Approval status
- Implementation status
- Rollback plan
- Impact assessment
- Compliance validation

**Validation Rules**:
- All changes must be version-controlled
- Changes require approval before production
- Rollback plans must be tested
- Impact assessment required
- Compliance validation mandatory

### Recovery Point

**Description**: Validated backup state that can be restored for disaster recovery
**Key Attributes**:
- Recovery point ID (unique identifier)
- Archive reference
- Validation status
- Recovery time objective
- Recovery procedures
- Test results
- Dependencies

**Validation Rules**:
- All recovery points must be tested
- RTO must be < 4 hours
- Recovery procedures must be documented
- Regular testing required
- Success rate must be > 95%

## State Transitions

### Deployment Lifecycle
```
Planned -> In Progress -> Testing -> Live -> Decommissioned
```

### Backup Lifecycle
```
Scheduled -> In Progress -> Completed -> Validated -> Expired
```

### Alert Lifecycle
```
Triggered -> Acknowledged -> Resolved -> Closed
```

### Change Management
```
Requested -> Approved -> Implemented -> Validated -> Closed
```
```
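
Each lifecycle above is a linear chain, so a transition check reduces to "is the target the next state in the list". The Python sketch below encodes only the arrows shown in the diagrams. Failure states and rollback are not drawn there, so they are deliberately left out as an assumption of this sketch. The names `LIFECYCLES` and `can_transition` are illustrative.

```python
# Ordered states copied from the lifecycle diagrams, lowercased with underscores.
LIFECYCLES: dict[str, list[str]] = {
    "deployment": ["planned", "in_progress", "testing", "live", "decommissioned"],
    "backup": ["scheduled", "in_progress", "completed", "validated", "expired"],
    "alert": ["triggered", "acknowledged", "resolved", "closed"],
    "change": ["requested", "approved", "implemented", "validated", "closed"],
}


def can_transition(entity: str, current: str, target: str) -> bool:
    """Allow only a single step forward along the entity's lifecycle."""
    states = LIFECYCLES[entity]
    if current not in states or target not in states:
        return False
    return states.index(target) == states.index(current) + 1


print(can_transition("deployment", "testing", "live"))
print(can_transition("deployment", "planned", "live"))
```

A service handling status updates could call a check like this and reject a disallowed move as an invalid state transition.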

## Relationships

- **Environment** contains many **Service Instances**
- **Service Instance** generates **Monitoring Metrics**
- **Backup Archive** contains data from **Service Instances**
- **Alert Rule** monitors **Monitoring Metrics**
- **Security Policy** applies to **Service Instances**
- **Infrastructure Change** modifies **Deployment Environments**
- **Recovery Point** references **Backup Archive**
- **Secret Configuration** used by **Service Instances**

## Data Integrity Constraints

- All entities must have unique identifiers
- All timestamps must be UTC
- All audit fields must be immutable
- Foreign key relationships must be validated
- All sensitive data must be encrypted
- All changes must be auditable
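
Three of these constraints (unique identifiers, UTC timestamps, immutable audit fields) can be enforced at the type level. The following is a minimal Python sketch using a frozen dataclass. The class name `AuditRecord` and its fields are hypothetical and are not taken from the data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AuditRecord:
    """Immutable audit entry; fields cannot be reassigned after creation."""

    id: UUID
    actor: str
    action: str
    occurred_at: datetime

    def __post_init__(self) -> None:
        # Reject naive datetimes and any offset other than UTC.
        offset = self.occurred_at.utcoffset()
        if offset is None or offset != timedelta(0):
            raise ValueError("occurred_at must be a timezone-aware UTC datetime")


record = AuditRecord(
    id=uuid4(),
    actor="ops-engineer",
    action="deployment.approved",
    occurred_at=datetime(2026, 4, 20, 12, 0, tzinfo=timezone.utc),
)
print(record.occurred_at.isoformat())
```

Attempting `record.actor = "someone-else"` raises `dataclasses.FrozenInstanceError`, which is how the immutability constraint surfaces in this sketch.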
@@ -0,0 +1,105 @@
# Implementation Plan: [FEATURE]

**Branch**: `[###-feature-name]` | **Date**: [DATE] | **Spec**: [link]
**Input**: Feature specification from `/specs/[###-feature-name]/spec.md`

**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/templates/commands/plan.md` for the execution workflow.

## Summary

[Extract from feature spec: primary requirement + technical approach from research]

## Technical Context

<!--
  ACTION REQUIRED: Replace the content in this section with the technical details
  for the project. The structure here is presented in advisory capacity to guide
  the iteration process.
-->

**Language/Version**: [e.g., Python 3.11, Swift 5.9, Rust 1.75 or NEEDS CLARIFICATION]
**Primary Dependencies**: [e.g., FastAPI, UIKit, LLVM or NEEDS CLARIFICATION]
**Storage**: [if applicable, e.g., PostgreSQL, CoreData, files or N/A]
**Testing**: [e.g., pytest, XCTest, cargo test or NEEDS CLARIFICATION]
**Target Platform**: [e.g., Linux server, iOS 15+, WASM or NEEDS CLARIFICATION]
**Project Type**: [single/web/mobile - determines source structure]
**Performance Goals**: [domain-specific, e.g., 1000 req/s, 10k lines/sec, 60 fps or NEEDS CLARIFICATION]
**Constraints**: [domain-specific, e.g., <200ms p95, <100MB memory, offline-capable or NEEDS CLARIFICATION]
**Scale/Scope**: [domain-specific, e.g., 10k users, 1M LOC, 50 screens or NEEDS CLARIFICATION]

## Constitution Check

_GATE: Must pass before Phase 0 research. Re-check after Phase 1 design._

[Gates determined based on constitution file]

## Project Structure

### Documentation (this feature)

```text
specs/[###-feature]/
├── plan.md              # This file (/speckit.plan command output)
├── research.md          # Phase 0 output (/speckit.plan command)
├── data-model.md        # Phase 1 output (/speckit.plan command)
├── quickstart.md        # Phase 1 output (/speckit.plan command)
├── contracts/           # Phase 1 output (/speckit.plan command)
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```

### Source Code (repository root)

<!--
  ACTION REQUIRED: Replace the placeholder tree below with the concrete layout
  for this feature. Delete unused options and expand the chosen structure with
  real paths (e.g., apps/admin, packages/something). The delivered plan must
  not include Option labels.
-->

```text
# [REMOVE IF UNUSED] Option 1: Single project (DEFAULT)
src/
├── models/
├── services/
├── cli/
└── lib/

tests/
├── contract/
├── integration/
└── unit/

# [REMOVE IF UNUSED] Option 2: Web application (when "frontend" + "backend" detected)
backend/
├── src/
│   ├── models/
│   ├── services/
│   └── api/
└── tests/

frontend/
├── src/
│   ├── components/
│   ├── pages/
│   └── services/
└── tests/

# [REMOVE IF UNUSED] Option 3: Mobile + API (when "iOS/Android" detected)
api/
└── [same as backend above]

ios/ or android/
└── [platform-specific structure: feature modules, UI flows, platform tests]
```

**Structure Decision**: [Document the selected structure and reference the real
directories captured above]

## Complexity Tracking

> **Fill ONLY if Constitution Check has violations that must be justified**

| Violation                  | Why Needed         | Simpler Alternative Rejected Because |
| -------------------------- | ------------------ | ------------------------------------ |
| [e.g., 4th project]        | [current need]     | [why 3 projects insufficient]        |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient]  |
@@ -0,0 +1,293 @@
# Quick Start Guide: Infrastructure Operations & Deployment Automation

**Purpose**: Get started with the Infrastructure Operations & Deployment Automation feature
**Date**: 2026-04-20
**Target Audience**: DevOps Engineers, System Administrators

## Prerequisites

### Hardware Requirements
- QNAP NAS (192.168.10.8) with Docker support
- ASUSTOR NAS (192.168.10.9) with Docker support
- SSH access between NAS devices configured
- Minimum 100GB storage for backups

### Software Requirements
- Docker 20.10+
- Docker Compose 2.0+
- Bash 5.0+ or PowerShell 7.2+
- Git client
- SSH key authentication

### Network Requirements
- Static IP addresses for both NAS devices
- Open ports: 22 (SSH), 80/443 (HTTP/HTTPS), 8080 (applications)
- VPN or secure network connection for remote access

## Initial Setup

### 1. Repository Configuration

```bash
# Clone the repository
git clone https://git.np-dms.work/np-dms/lcbp3.git
cd lcbp3

# Switch to the infrastructure branch
git checkout 002-infra-ops
```

### 2. SSH Key Authentication

Ensure SSH keys are configured between QNAP and ASUSTOR:

```bash
# Test SSH connectivity
ssh admin@192.168.10.8 "docker --version"
ssh admin@192.168.10.9 "docker --version"
```

### 3. Environment Configuration

Copy and configure environment files:

```bash
# QNAP environments
cp specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app/.env.example \
   specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app/.env

# ASUSTOR environments
cp specs/04-Infrastructure-OPS/04-00-docker-compose/ASUSTOR/registry/.env.example \
   specs/04-Infrastructure-OPS/04-00-docker-compose/ASUSTOR/registry/.env
```

Edit the `.env` files with your specific configurations:
- Database passwords
- SSL certificate paths
- Backup storage locations
- Monitoring endpoints

## Core Services Deployment

### 1. Database Services (QNAP)

```bash
# Navigate to QNAP database directory (from the repository root)
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/mariadb

# Deploy MariaDB with phpMyAdmin
docker-compose -f docker-compose-lcbp3-db.yml up -d

# Verify deployment
docker-compose -f docker-compose-lcbp3-db.yml ps
```

### 2. Application Services (QNAP)

```bash
# Navigate to QNAP app directory (from the repository root)
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app

# Deploy backend, frontend, and ClamAV
docker-compose -f docker-compose-app.yml up -d

# Verify deployment
docker-compose -f docker-compose-app.yml ps
```

### 3. Reverse Proxy (QNAP)

```bash
# Navigate to Nginx Proxy Manager directory (from the repository root)
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/npm

# Deploy reverse proxy
docker-compose -f docker-compose.yml up -d

# Access Nginx Proxy Manager
# URL: http://192.168.10.8:81
# Default: admin@example.com / changeme (change after first login)
```

### 4. Monitoring Stack (ASUSTOR)

```bash
# Navigate to ASUSTOR monitoring directory (from the repository root)
cd specs/04-Infrastructure-OPS/04-00-docker-compose/ASUSTOR/monitoring

# Deploy Prometheus, Grafana, and supporting services
docker-compose -f docker-compose.yml up -d

# Verify deployment
docker-compose -f docker-compose.yml ps
```

## SSL Certificate Setup

### 1. Initial Certificate Generation

```bash
# On QNAP, generate Let's Encrypt certificates
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/npm

# Run certbot for initial certificate
docker-compose exec npm certbot --nginx -d your-domain.com
```

### 2. Automated Renewal

Add to crontab for automatic renewal:

```bash
# Edit crontab
crontab -e

# Add renewal task (runs daily at 2 AM)
# -T disables pseudo-TTY allocation, which cron jobs do not have
0 2 * * * cd /path/to/npm && docker-compose exec -T npm certbot renew
```

## Backup Configuration

### 1. Initial Backup Setup

```bash
# Navigate to backup scripts directory
cd specs/04-Infrastructure-OPS/04-02-backup-recovery

# Configure backup destinations
cp backup-config.example.yml backup-config.yml

# Edit backup-config.yml with your storage locations
nano backup-config.yml
```

### 2. Automated Backup Schedule

```bash
# Add backup cron job (runs daily at 1 AM)
0 1 * * * /path/to/backup-scripts/daily-backup.sh

# Add backup validation (runs weekly on Sunday at 3 AM)
0 3 * * 0 /path/to/backup-scripts/validate-backups.sh
```

## Monitoring Configuration

### 1. Grafana Dashboard Access

1. Access Grafana: `http://192.168.10.9:3000`
2. Default credentials: `admin / admin` (change on first login)
3. Import dashboards from `specs/04-Infrastructure-OPS/04-03-monitoring/dashboards/`

### 2. Alert Configuration

1. Access AlertManager: `http://192.168.10.9:9093`
2. Configure notification channels (email, Slack, etc.)
3. Test alert rules to ensure notifications work

## Blue-Green Deployment

### 1. Environment Setup

```bash
# Create blue environment (current production)
# Note: running two projects from one file only works if the file sets no
# fixed container_name and no fixed host ports; otherwise the projects collide.
cd specs/04-Infrastructure-OPS/04-00-docker-compose/QNAP/app
docker-compose -f docker-compose-app.yml -p app-blue up -d

# Create green environment (new version)
docker-compose -f docker-compose-app.yml -p app-green up -d
```

### 2. Traffic Switching

```bash
# Switch traffic to green environment
# Update Nginx Proxy Manager upstream configuration
# Point to green environment containers
# Test green environment functionality
```

### 3. Rollback Procedure

```bash
# If issues detected, rollback to blue
# Update Nginx Proxy Manager upstream configuration
# Point back to blue environment containers
# Stop green environment containers
```
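
The switching and rollback steps above are given only as comments. One way to gate the switch is to wait until the green environment answers its health check before the proxy is changed. A minimal sketch, assuming a hypothetical green backend port (3001) and the `/health` path used in the health checks of this guide; `wait_until` and `green_is_healthy` are illustrative names, not scripts from the repository:

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical probe: port 3001 for the green backend is an assumption.
green_is_healthy() {
  curl -fsS --max-time 5 "http://192.168.10.8:3001/health" > /dev/null
}

# Example (not executed here): switch only once green responds.
# wait_until 10 3 green_is_healthy && echo "green healthy: update the proxy upstream"
```

If `wait_until` returns non-zero, the proxy is left pointing at blue, so a failed green start never needs a rollback.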

## Security Hardening

### 1. Container Security Scan

```bash
# Install Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin

# Scan the image of every running container
# (trivy image accepts one image per invocation, so loop over the unique names)
docker ps --format '{{.Image}}' | sort -u | while read -r image; do
  trivy image --severity HIGH,CRITICAL "$image"
done
```

### 2. Security Policy Validation

```bash
# Run security validation script
cd specs/04-Infrastructure-OPS/04-06-security-operations
./validate-security-policies.sh
```

## Troubleshooting

### Common Issues

1. **Container won't start**
   ```bash
   # Check logs
   docker-compose logs [service-name]

   # Check resource usage
   docker stats
   ```

2. **Backup failures**
   ```bash
   # Check backup logs
   tail -f /var/log/backup.log

   # Test connectivity to backup storage
   ping backup-storage-host
   ```

3. **Monitoring alerts not working**
   ```bash
   # Check Prometheus targets
   curl http://192.168.10.9:9090/api/v1/targets

   # Query AlertManager (API v2; the v1 API was removed in recent releases)
   curl http://192.168.10.9:9093/api/v2/alerts
   ```

### Health Checks

```bash
# Check all services health
curl -f http://192.168.10.8:3000/health || echo "Backend unhealthy"
curl -f http://192.168.10.8/health || echo "Frontend unhealthy"
curl -f http://192.168.10.9:9090/-/healthy || echo "Prometheus unhealthy"
```
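
When these checks run from a script or cron job, a single exit status is easier to act on than three separate messages. A minimal sketch, assuming endpoints are supplied as `name url` pairs on standard input; `check_endpoints` and the `PROBE` variable are hypothetical names introduced for illustration:

```bash
#!/usr/bin/env bash
# Command used to probe a URL; override PROBE to test without network access.
PROBE="${PROBE:-curl -fsS --max-time 5 -o /dev/null}"

# Read "name url" pairs from stdin, probe each one, and
# return the number of failed endpoints as the exit status.
check_endpoints() {
  local failures=0 name url
  while read -r name url; do
    [ -z "$name" ] && continue
    if $PROBE "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}

# Example (not executed here), using the endpoints listed above:
# printf '%s\n' \
#   "backend http://192.168.10.8:3000/health" \
#   "frontend http://192.168.10.8/health" \
#   "prometheus http://192.168.10.9:9090/-/healthy" | check_endpoints
```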

## Next Steps

1. **Configure automated monitoring alerts** for your specific thresholds
2. **Set up backup retention policies** based on your compliance requirements
3. **Implement disaster recovery testing** on a regular schedule
4. **Configure log aggregation** for centralized monitoring
5. **Set up automated security scanning** in your CI/CD pipeline

## Support

For issues and questions:
- Check the troubleshooting section above
- Review logs in `/var/log/` directories
- Consult the full documentation in `specs/04-Infrastructure-OPS/`
- Contact the infrastructure team for escalated issues
@@ -0,0 +1,82 @@
# Phase 0 Research: Infrastructure Operations & Deployment Automation

**Date**: 2026-04-20
**Feature**: Infrastructure Operations & Deployment Automation
**Status**: Complete

## Research Findings

### Blue-Green Deployment Strategy

**Decision**: Docker Compose with Nginx Proxy Manager for traffic switching
**Rationale**: Provides zero-downtime deployments by maintaining two identical production environments (blue/green) and switching traffic via reverse proxy configuration updates
**Alternatives Considered**: Kubernetes (too complex for current scale), Docker Swarm (limited networking features), Manual deployment scripts (prone to human error)

### Backup & Recovery Solution

**Decision**: Restic for encrypted backups + MariaDB dump scripts + automated validation
**Rationale**: Restic provides deduplication, encryption, and cloud storage support. Combined with native database dumps, it captures the complete system state
**Alternatives Considered**: Borg Backup (steeper learning curve), rsync only (no encryption/deduplication), commercial solutions (cost constraints)
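
The combination of a native dump and Restic can be sketched as a pipeline that streams the dump straight into the repository. This is an illustration under stated assumptions, not the repository's backup script: the repository location, the container name `mariadb`, and the function names are hypothetical, database credentials are expected to come from the container's own option file, and Restic is expected to read its password from `RESTIC_PASSWORD` or a password file. The sketch defaults to a dry run that only prints the commands.

```bash
#!/usr/bin/env bash
# Assumed values for illustration; none of them come from the repository.
RESTIC_REPOSITORY="${RESTIC_REPOSITORY:-sftp:admin@192.168.10.9:/volume1/backups/restic}"
DB_CONTAINER="${DB_CONTAINER:-mariadb}"
DRY_RUN="${DRY_RUN:-1}"

# Stream a consistent dump of all databases into the Restic repository.
backup_databases() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY-RUN: docker exec $DB_CONTAINER mariadb-dump --all-databases --single-transaction | restic -r $RESTIC_REPOSITORY backup --stdin --stdin-filename all-databases.sql"
  else
    docker exec "$DB_CONTAINER" mariadb-dump --all-databases --single-transaction \
      | restic -r "$RESTIC_REPOSITORY" backup --stdin --stdin-filename all-databases.sql
  fi
}

# Verify repository integrity after the backup (the "automated validation" step).
verify_repository() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY-RUN: restic -r $RESTIC_REPOSITORY check"
  else
    restic -r "$RESTIC_REPOSITORY" check
  fi
}

backup_databases
verify_repository
```

Streaming through `--stdin` avoids writing an unencrypted dump file to disk; setting `DRY_RUN=0` runs the real commands.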

### Monitoring Stack

**Decision**: Prometheus + Grafana + AlertManager + Node Exporter + cAdvisor
**Rationale**: Industry-standard monitoring stack with extensive community support, flexible alerting rules, and container-native metrics collection
**Alternatives Considered**: Zabbix (more complex setup), Nagios (older architecture), Datadog (commercial cost)

### Container Security Hardening

**Decision**: Docker security hardening with non-root users, read-only filesystems, capability dropping, and Trivy scanning
**Rationale**: Provides defense-in-depth security while maintaining functionality. Trivy offers comprehensive vulnerability scanning
**Alternatives Considered**: Podman (better security but ecosystem compatibility issues), Kubernetes security policies (overkill for current scale)

### Multi-NAS Architecture

**Decision**: QNAP for primary services, ASUSTOR for backup, monitoring, and the container registry
**Rationale**: Leverages existing hardware investment, provides geographic separation for critical services, and maintains established SSH key authentication
**Alternatives Considered**: Cloud hosting (recurring costs, data sovereignty concerns), Single NAS (single point of failure)

### SSL Certificate Management

**Decision**: Certbot with Let's Encrypt + automated renewal via cron jobs
**Rationale**: Free, automated certificate management with established reliability. Integration with Nginx Proxy Manager simplifies deployment
**Alternatives Considered**: Commercial CAs (cost), Self-signed certificates (browser warnings), Cloudflare certificates (dependency on external service)

### Secrets Management

**Decision**: Environment files with .gitignore + SSH key authentication
**Rationale**: Simple, secure approach that works across both NAS environments. No additional infrastructure required
**Alternatives Considered**: HashiCorp Vault (complex setup), Docker Swarm secrets (require Swarm mode, which this Compose-based setup does not use), Infisical/SOPS (additional learning curve)
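
This approach only holds if secret values never reach version control, so a quick scan of tracked files before committing can help. A minimal sketch, assuming secret-bearing keys contain `TOKEN`, `PASSWORD`, or `SECRET`; `find_inline_secrets` is a hypothetical helper, and it covers only `KEY=value` lines (env files and Compose list syntax), not Compose mapping syntax (`KEY: value`):

```bash
#!/usr/bin/env bash
# Print numbered lines that assign a literal, non-empty value to a
# secret-looking key. Empty values and ${VAR} references are ignored.
find_inline_secrets() {
  grep -nE '^[[:space:]]*(-[[:space:]]*)?[A-Za-z0-9_]*(TOKEN|PASSWORD|SECRET)[A-Za-z0-9_]*=[^[:space:]$]' "$1"
}

# Example (not executed here): scan a Compose file before committing it.
# find_inline_secrets docker-compose.yml && echo "literal secrets found" >&2
```

The function exits non-zero when nothing matches, so it can serve as a commit guard by inverting the result.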

## Technical Decisions Summary

1. **Docker Compose** as primary orchestration tool
2. **Blue-Green deployment** pattern for zero downtime
3. **Restic** for backup encryption and deduplication
4. **Prometheus/Grafana** stack for monitoring
5. **Nginx Proxy Manager** for reverse proxy and SSL termination
6. **Trivy** for container vulnerability scanning
7. **Environment files** for secrets management
8. **SSH key authentication** for cross-NAS communication

## Implementation Constraints

- Must maintain existing QNAP/ASUSTOR IP addresses (192.168.10.8/9)
- Must preserve current data storage locations
- Must integrate with existing Gitea Actions CI/CD pipeline
- Must comply with ADR-016 security requirements
- Must support Thai language documentation per project standards

## Success Metrics Alignment

All technical decisions support the success criteria defined in the specification:

- 99.9% uptime through redundant infrastructure
- 30-second alert generation via Prometheus monitoring
- 4-hour RTO through automated backup validation
- Zero-downtime deployments via blue-green strategy
- 100% security compliance via container hardening
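
To make the 99.9% uptime figure concrete, it can be converted into an allowed-downtime budget. A minimal sketch of the arithmetic, where `downtime_budget_minutes` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Allowed downtime in minutes for an availability target (percent)
# over a period of the given number of days.
downtime_budget_minutes() {
  local availability_percent="$1" days="$2"
  LC_ALL=C awk -v a="$availability_percent" -v d="$days" \
    'BEGIN { printf "%.1f\n", d * 24 * 60 * (100 - a) / 100 }'
}

downtime_budget_minutes 99.9 30    # prints 43.2
```

A 30-day month therefore allows about 43 minutes of downtime, which is much shorter than the 4-hour RTO, so a full restore would exceed the budget for that month.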

## Next Steps

Proceed to Phase 1: Design & Contracts with these technical foundations established.
@@ -0,0 +1,187 @@
# Feature Specification: Infrastructure Operations & Deployment Automation

**Feature Branch**: `002-infra-ops`
**Created**: 2026-04-20
**Status**: Draft
**Input**: User description: "Infrastructure operations and deployment automation including Docker Compose configurations, container orchestration, monitoring, backup/recovery, and maintenance procedures for the NAP-DMS system"

## Clarifications

### Session 2026-04-20

- Q: Which services are included in Infrastructure Operations scope beyond NAP-DMS applications?
- A: All services in Docker Compose stacks including Gitea, n8n, RocketChat, and supporting services

- Q: What is the expected data volume and annual growth rate for all services?
- A: 500GB current data with 20% annual growth

- Q: What external services or third-party integrations are required beyond internal services?
- A: Email SMTP for notifications and Let's Encrypt for SSL certificates

- Q: What are the concurrent user count and performance targets for response time?
- A: 100 concurrent users with 2-second average response time

- Q: What technical constraints exist (budget, hardware, compliance requirements)?
- A: Must work with existing QNAP/ASUSTOR hardware infrastructure
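
The stated volume and growth rate translate into a capacity projection by compound growth. A minimal sketch of that arithmetic, assuming growth compounds once per year; `project_storage_gb` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Projected data volume in GB after a number of years of compound growth.
project_storage_gb() {
  local current_gb="$1" growth_percent="$2" years="$3"
  LC_ALL=C awk -v c="$current_gb" -v g="$growth_percent" -v y="$years" \
    'BEGIN { printf "%.0f\n", c * (1 + g / 100) ^ y }'
}

# 500 GB today at 20% per year.
for year in 0 1 2 3 4 5; do
  echo "year $year: $(project_storage_gb 500 20 "$year") GB"
done
```

Under this assumption the data passes 1 TB in year 4, which matters for sizing backup storage that keeps several snapshots.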

## User Scenarios & Testing _(mandatory)_

<!--
  IMPORTANT: User stories should be PRIORITIZED as user journeys ordered by importance.
  Each user story/journey must be INDEPENDENTLY TESTABLE - meaning if you implement just ONE of them,
  you should still have a viable MVP (Minimum Viable Product) that delivers value.

  Assign priorities (P1, P2, P3, etc.) to each story, where P1 is the most critical.
  Think of each story as a standalone slice of functionality that can be:
  - Developed independently
  - Tested independently
  - Deployed independently
  - Demonstrated to users independently
-->

### User Story 1 - Zero-Downtime Deployment (Priority: P1)

As a DevOps engineer, I need to deploy updates for all services (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) without interrupting user access to any system components.

**Why this priority**: Critical for business continuity - system cannot afford downtime during regular maintenance windows.

**Independent Test**: Can be fully tested by deploying a test application version using blue-green containers and verifying traffic switches seamlessly without user session interruption.

**Acceptance Scenarios**:

1. **Given** a running production environment, **When** I deploy a new version, **Then** users continue accessing the system without interruption
2. **Given** a deployment failure, **When** the rollback is triggered, **Then** the system immediately switches back to the previous stable version

---

### User Story 2 - Automated Backup & Recovery (Priority: P1)

As a system administrator, I need automated daily backups of all service data (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, configurations, and supporting services) and the ability to restore the entire system within 4 hours of a catastrophic failure.

**Why this priority**: Essential for data protection and business continuity compliance with document management regulations.

**Independent Test**: Can be fully tested by running backup procedures and performing a full system restore in a test environment to verify all data is recoverable.

**Acceptance Scenarios**:

1. **Given** the backup schedule is configured, **When** the daily backup runs, **Then** all databases, files, and configurations are successfully backed up
2. **Given** a system failure occurs, **When** I initiate recovery, **Then** the entire system is restored to its last known good state within 4 hours

---

### User Story 3 - Real-time Monitoring & Alerting (Priority: P1)

As an on-call engineer, I need to receive immediate alerts when any system component (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) fails or its performance degrades below acceptable thresholds.

**Why this priority**: Prevents minor issues from becoming major outages and ensures rapid response to system problems.

**Independent Test**: Can be fully tested by simulating various failure scenarios and verifying appropriate alerts are generated and delivered to the correct channels.

**Acceptance Scenarios**:

1. **Given** monitoring is active, **When** a service becomes unresponsive, **Then** an alert is sent within 30 seconds
2. **Given** system resources exceed 80% utilization, **When** the threshold is crossed, **Then** a performance alert is generated with actionable diagnostics

---

### User Story 4 - Container Security Hardening (Priority: P2)

As a security administrator, I need all containers (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) to run with minimal privileges and no exposed secrets to maintain compliance with security policies.

**Why this priority**: Prevents privilege escalation attacks and protects sensitive configuration data.

**Independent Test**: Can be fully tested by running security scans on all containers and verifying they meet hardening requirements.

**Acceptance Scenarios**:

1. **Given** containers are deployed, **When** I run a security audit, **Then** all containers pass privilege escalation and secret exposure checks
2. **Given** new containers are added, **When** they are deployed, **Then** they automatically inherit security hardening policies

---

### User Story 5 - Infrastructure as Code Management (Priority: P2)

As a DevOps engineer, I need to manage all infrastructure configurations (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services) through version-controlled code files rather than manual server changes.

**Why this priority**: Ensures consistency across environments and enables reproducible infrastructure deployments.

**Independent Test**: Can be fully tested by deploying a complete environment from code and verifying it matches the production configuration.

**Acceptance Scenarios**:

1. **Given** infrastructure code changes, **When** I apply the changes, **Then** the environment configuration matches exactly what's defined in the code
2. **Given** a new environment is needed, **When** I deploy from code, **Then** the environment is created with all required services and configurations

### Edge Cases

- What happens when network connectivity between QNAP and ASUSTOR fails during backup operations?
- How does the system handle container registry authentication failures during deployment?
- What happens when Docker Compose files contain syntax errors during environment startup?
- How does the system handle SSL certificate expiration for reverse proxy services?
- What happens when monitoring services become unavailable while the system is running?
- How does the system handle storage space exhaustion on production servers?
- What happens when multiple deployment processes are initiated simultaneously?
- How does the system handle database connection pool exhaustion during high load?
- What happens when automated security updates conflict with custom container configurations?
- How does the system handle partial backup failures where some services complete but others fail?
- How does the system handle SMTP email service failures for alert notifications?
- What happens when Let's Encrypt certificate renewal fails due to network issues?
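
One of the edge cases above, simultaneous deployments, is commonly handled with a lock that only one process can hold. A minimal sketch of that idea, assuming all deployments run on the same host and share a local filesystem; the lock path and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical lock location; any directory on a local filesystem works.
LOCK_DIR="${LOCK_DIR:-/tmp/lcbp3-deploy.lock}"

# mkdir is atomic: exactly one caller succeeds in creating the directory.
acquire_deploy_lock() {
  mkdir "$LOCK_DIR" 2> /dev/null
}

release_deploy_lock() {
  rmdir "$LOCK_DIR" 2> /dev/null
}

# Run a command while holding the lock; refuse to start if it is already held.
with_deploy_lock() {
  if ! acquire_deploy_lock; then
    echo "another deployment is in progress" >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  release_deploy_lock
  return "$status"
}

# Example (not executed here):
# with_deploy_lock docker-compose -f docker-compose-app.yml up -d
```

A process killed while holding the lock leaves the directory behind, so a real script would also need a stale-lock policy, for example a `trap` that releases the lock on exit.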

## Requirements _(mandatory)_

<!--
  ACTION REQUIRED: The content in this section represents placeholders.
  Fill them out with the right functional requirements.
-->

### Functional Requirements

- **FR-001**: System MUST support blue-green deployment strategy for zero-downtime updates of all services (NAP-DMS applications, databases, monitoring, Gitea, n8n, RocketChat, and supporting services)
- **FR-002**: System MUST automate daily backups of all service data, including databases, application files, configurations, and supporting service data
- **FR-003**: System MUST provide complete disaster recovery capabilities with 4-hour RTO (Recovery Time Objective)
- **FR-004**: System MUST monitor all infrastructure components (all services) and generate alerts for failures or performance degradation
- **FR-005**: System MUST enforce container security hardening including non-root users, privilege dropping, and read-only filesystems for all services
- **FR-006**: System MUST manage all infrastructure configurations through version-controlled Docker Compose files for all services
- **FR-007**: System MUST support automated SSL certificate management and renewal for all web services
- **FR-008**: System MUST provide centralized logging aggregation for all containers and services
- **FR-009**: System MUST implement resource limits and health checks for all containers
- **FR-010**: System MUST support multi-environment deployments (development, staging, production) with consistent configurations
- **FR-011**: System MUST provide automated vulnerability scanning for all container images
- **FR-012**: System MUST support infrastructure secrets management without exposing them in version control
- **FR-013**: System MUST implement backup validation procedures to ensure data integrity
- **FR-014**: System MUST provide rollback capabilities for failed deployments
- **FR-015**: System MUST generate audit trails for all infrastructure changes and deployments

### Key Entities _(include if feature involves data)_

- **Docker Compose Configuration**: Infrastructure as code definitions for all services, environments, and deployments
- **Backup Archive**: Complete system snapshots including databases, files, and configurations with metadata (500GB current data, 20% annual growth)
- **Monitoring Metric**: Performance and health data points collected from all infrastructure components
- **Security Policy**: Container hardening rules and compliance requirements for all deployments
- **Deployment Environment**: Isolated runtime spaces (development, staging, production) with consistent configurations (constrained by existing QNAP/ASUSTOR hardware)
- **Alert Rule**: Threshold-based conditions that trigger notifications when system metrics exceed limits
- **Secret Configuration**: Sensitive information (passwords, keys, certificates) managed outside version control
- **Service Instance**: Running container with specific configuration, resource limits, and health status
- **Infrastructure Change**: Version-controlled modification to system configuration or deployment
- **Recovery Point**: Validated backup state that can be restored for disaster recovery

## Success Criteria _(mandatory)_

<!--
  ACTION REQUIRED: Define measurable success criteria.
  These must be technology-agnostic and measurable.
-->

### Measurable Outcomes

- **SC-001**: Deployments complete with zero user-visible downtime in 99.9% of attempts
- **SC-002**: System recovery from backup completes within 4 hours with 100% data integrity
- **SC-003**: Critical system alerts are generated and delivered within 30 seconds of failure detection
- **SC-004**: All containers pass security hardening compliance checks with 100% success rate
- **SC-005**: Infrastructure changes are applied from version-controlled code with 100% consistency across environments
- **SC-006**: SSL certificates are renewed automatically with 0 expiration incidents per year
- **SC-007**: Backup validation procedures achieve 99.9% success rate with automated integrity verification
- **SC-008**: Failed deployments are automatically rolled back within 60 seconds with 100% success rate
- **SC-009**: System uptime exceeds 99.9% monthly availability target
- **SC-010**: Infrastructure audit trail captures 100% of configuration changes and deployments
- **SC-011**: System supports 100 concurrent users with 2-second average response time under normal load
@@ -0,0 +1,4 @@
# Gitea
GITEA_INSTANCE_URL=https://git.np-dms.work
GITEA_RUNNER_REGISTRATION_TOKEN=FGaSCT79PmMg8cDy0Ltqt1yaLzs8D4MRMFAE3jCh
GITEA_RUNNER_NAME=asustor-runner
+21
@@ -0,0 +1,21 @@
# File: /volume1/np-dms/gitea-runner/docker-compose.yml
# Deploy on: ASUSTOR AS5403T
# Connects to Gitea on the QNAP via its domain URL

version: "3.8"

services:
  runner:
    image: gitea/act_runner:latest
    container_name: gitea-runner
    restart: always
    environment:
      # Use the domain URL to reach Gitea on the other machine (QNAP)
      - GITEA_INSTANCE_URL=https://git.np-dms.work
      - GITEA_RUNNER_REGISTRATION_TOKEN=FGaSCT79PmMg8cDy0Ltqt1yaLzs8D4MRMFAE3jCh
      - GITEA_RUNNER_NAME=asustor-runner
      # Labels must match runs-on in deploy.yaml
      - GITEA_RUNNER_LABELS=ubuntu-latest:docker://node:18-bullseye,self-hosted:docker://node:18-bullseye
    volumes:
      - /volume1/np-dms/gitea-runner/data:/data
      - /var/run/docker.sock:/var/run/docker.sock
+3
-2
@@ -1,4 +1,5 @@
 # File: /volume1/np-dms/gitea-runner/docker-compose.yml
+# DMS Container v1.8.6: Application name: lcbp3-gitea-runner
 # Deploy on: ASUSTOR AS5403T
 # Connects to Gitea on the QNAP via its domain URL
 #
@@ -13,11 +14,11 @@ x-logging: &default_logging
     options:
       max-size: '10m'
       max-file: '5'
-
+name: lcbp3-gitea-runner
 services:
   runner:
     <<: *default_logging
-    image: gitea/act_runner:0.2.11
+    image: gitea/act_runner:0.4.0
     container_name: gitea-runner
     restart: unless-stopped
     extra_hosts:
@@ -1,2 +1,3 @@
 REGISTRY_ADMIN_USER=admin
 REGISTRY_ADMIN_PASSWORD=
+REGISTRY_HTTP_SECRET=
+70
@@ -0,0 +1,70 @@
# File: /volume1/np-dms/registry/docker-compose.yml
# DMS Container v1.8.0: Application name: lcbp3-registry
# Deploy on: ASUSTOR AS5403T
# Services: registry, portainer
# ============================================================
# ⚠️ Requirements:
# - Create the Docker network first: docker network create lcbp3
# - Registry uses port 5000 (domain: registry.np-dms.work)
# - Portainer uses port 9443 (domain: portainer.np-dms.work)
# ============================================================
x-restart: &restart_policy
  restart: unless-stopped

x-logging: &default_logging
  logging:
    driver: 'json-file'
    options:
      max-size: '10m'
      max-file: '5'

networks:
  lcbp3:
    external: true

services:
  # 1. Docker Registry Engine
  registry:
    <<: [*restart_policy, *default_logging]
    image: registry:2
    container_name: registry
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    environment:
      TZ: 'Asia/Bangkok'
      REGISTRY_STORAGE_DELETE_ENABLED: 'true'
      # Add basic security (if needed) or handle CORS
      # REGISTRY_HTTP_HEADERS_Access-Control-Allow-Origin: '[https://registry-ui.np-dms.work]'
      # REGISTRY_HTTP_HEADERS_Access-Control-Allow-Methods: '[HEAD,GET,OPTIONS,DELETE]'
      # REGISTRY_HTTP_HEADERS_Access-Control-Allow-Headers: '[Authorization,Accept,Cache-Control]'
    ports:
      - "5000:5000"
    volumes:
      - '/volume1/np-dms/registry/data:/var/lib/registry'
    healthcheck:
      test: ["CMD", "bin/registry", "garbage-collect", "--dry-run", "/etc/docker/registry/config.yml"] # Check config/binary readiness
      interval: 1m
      timeout: 10s
      retries: 3
    networks:
      - lcbp3

  # 2. Registry Browser UI
  registry-ui:
    <<: [*restart_policy, *default_logging]
    image: joxit/docker-registry-ui:latest
    container_name: registry-ui
    ports:
      - "8880:80"
    environment:
      - REGISTRY_TITLE=LCBP3-DMS Local Registry
      - REGISTRY_URL=http://registry:5000
      - SINGLE_REGISTRY=true
      - DELETE_IMAGES=true # Allow deleting images from the UI
    depends_on:
      - registry
    networks:
      - lcbp3
+19
-9
@@ -26,7 +26,7 @@ x-logging: &default_logging
|
|||||||
options:
|
options:
|
||||||
max-size: '10m'
|
max-size: '10m'
|
||||||
max-file: '5'
|
max-file: '5'
|
||||||
|
name: lcbp3-registry
|
||||||
networks:
|
networks:
|
||||||
lcbp3:
|
lcbp3:
|
||||||
external: true
|
external: true
|
||||||
@@ -45,9 +45,8 @@ services:
|
|||||||
reservations:
|
reservations:
|
||||||
cpus: '0.1'
|
cpus: '0.1'
|
||||||
memory: 64M
|
memory: 64M
|
||||||
|
|
||||||
env_file:
|
env_file:
|
||||||
- .env
|
- /share/np-dms/registry/.env
|
||||||
environment:
|
environment:
|
||||||
TZ: 'Asia/Bangkok'
|
TZ: 'Asia/Bangkok'
|
||||||
# --- Storage ---
|
# --- Storage ---
|
||||||
@@ -57,15 +56,17 @@ services:
|
|||||||
REGISTRY_AUTH: 'htpasswd'
|
REGISTRY_AUTH: 'htpasswd'
|
||||||
REGISTRY_AUTH_HTPASSWD_REALM: 'NP-DMS Registry'
|
REGISTRY_AUTH_HTPASSWD_REALM: 'NP-DMS Registry'
|
||||||
REGISTRY_AUTH_HTPASSWD_PATH: '/auth/htpasswd'
|
REGISTRY_AUTH_HTPASSWD_PATH: '/auth/htpasswd'
|
||||||
security_opt:
|
REGISTRY_HTTP_SECRET: ${REGISTRY_HTTP_SECRET}
|
||||||
- no-new-privileges:true
|
# security_opt:
|
||||||
|
# - no-new-privileges:true
|
||||||
ports:
|
ports:
|
||||||
- '5000:5000'
|
- '5000:5000'
|
||||||
volumes:
|
volumes:
|
||||||
- '/volume1/np-dms/registry/data:/var/lib/registry'
|
- '/volume1/np-dms/registry/data:/var/lib/registry'
|
||||||
- '/volume1/np-dms/registry/auth:/auth:ro'
|
- '/volume1/np-dms/registry/auth:/auth:ro'
|
||||||
healthcheck:
|
healthcheck:
|
||||||
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:5000/v2/']
|
# test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:5000/v2/']
|
||||||
|
test: ["CMD", "nc", "-z", "localhost", "5000"]
|
||||||
interval: 30s
|
interval: 30s
|
||||||
timeout: 10s
|
timeout: 10s
|
||||||
retries: 3
|
retries: 3
|
||||||
@@ -88,17 +89,26 @@ services:
|
|||||||
- '8880:80'
|
- '8880:80'
|
||||||
environment:
|
environment:
|
||||||
TZ: 'Asia/Bangkok'
|
TZ: 'Asia/Bangkok'
|
||||||
REGISTRY_TITLE: 'NP-DMS Registry'
|
REGISTRY_TITLE: ${DMS_REGISTRY_TITLE}
|
||||||
REGISTRY_URL: 'http://registry:5000'
|
# REGISTRY_URL: 'http://registry:5000'
|
||||||
|
NGINX_PROXY_PASS_URL: 'http://registry:5000'
|
||||||
SINGLE_REGISTRY: 'true'
|
SINGLE_REGISTRY: 'true'
|
||||||
DELETE_IMAGES: 'true'
|
DELETE_IMAGES: 'true'
|
||||||
|
# --- Added so the UI can talk to the auth-enabled Registry ---
|
||||||
|
# 1. Allow the UI to send requests with credentials
|
||||||
|
NGINX_PROXY_PASS_PARAMS: 'proxy_set_header Authorization $$http_authorization; proxy_pass_header Authorization;'
|
||||||
|
# 2. If the UI should remember the Basic Auth credentials (values come from .env)
|
||||||
|
REGISTRY_USER: ${DMS_REGISTRY_ADMIN_USER}
|
||||||
|
REGISTRY_PASSWORD: ${DMS_REGISTRY_ADMIN_PASSWORD}
|
||||||
|
|
||||||
depends_on:
|
depends_on:
|
||||||
registry:
|
registry:
|
||||||
condition: service_healthy
|
condition: service_healthy
|
||||||
networks:
|
networks:
|
||||||
- lcbp3
|
- lcbp3
|
||||||
healthcheck:
|
healthcheck:
|
||||||
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:80/']
|
# test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:80/']
|
||||||
|
test: ["CMD-SHELL", "wget --spider -q http://localhost/ || exit 1"]
|
||||||
interval: 30s
|
interval: 30s
|
||||||
timeout: 10s
|
timeout: 10s
|
||||||
retries: 3
|
retries: 3
|
||||||
|
|||||||
@@ -61,7 +61,7 @@ services:
|
|||||||
cpus: '0.5'
|
cpus: '0.5'
|
||||||
memory: 512M
|
memory: 512M
|
||||||
env_file:
|
env_file:
|
||||||
- .env
|
- /share/np-dms/app/.env
|
||||||
environment:
|
environment:
|
||||||
TZ: 'Asia/Bangkok'
|
TZ: 'Asia/Bangkok'
|
||||||
NODE_ENV: 'production'
|
NODE_ENV: 'production'
|
||||||
@@ -142,7 +142,7 @@ services:
|
|||||||
cpus: '0.25'
|
cpus: '0.25'
|
||||||
memory: 512M
|
memory: 512M
|
||||||
env_file:
|
env_file:
|
||||||
- .env
|
- /share/np-dms/app/.env
|
||||||
environment:
|
environment:
|
||||||
TZ: 'Asia/Bangkok'
|
TZ: 'Asia/Bangkok'
|
||||||
NODE_ENV: 'production'
|
NODE_ENV: 'production'
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
# File: /share/np-dms/git/docker-compose.yml
|
# File: /share/np-dms/gitea/docker-compose.yml
|
||||||
# DMS Container v1.8.6 — Application: git, Service: gitea
|
# DMS Container v1.8.6 — Application name: lcbp3-git, Service: gitea
|
||||||
|
|
||||||
x-restart: &restart_policy
|
x-restart: &restart_policy
|
||||||
restart: unless-stopped
|
restart: unless-stopped
|
||||||
@@ -21,8 +21,17 @@ networks:
|
|||||||
services:
|
services:
|
||||||
gitea:
|
gitea:
|
||||||
<<: [*restart_policy, *default_logging]
|
<<: [*restart_policy, *default_logging]
|
||||||
image: gitea/gitea:latest-rootless
|
image: gitea/gitea:1.26.0-rootless
|
||||||
container_name: gitea
|
container_name: gitea
|
||||||
|
# M4: container hardening (Gitea rootless runs as 'git' user)
|
||||||
|
# user: '1000:1000'
|
||||||
|
# tmpfs:
|
||||||
|
# - /tmp:rw,noexec,nosuid,size=256m
|
||||||
|
# - /var/run/gitea:rw,size=128m
|
||||||
|
# security_opt:
|
||||||
|
# - no-new-privileges:true
|
||||||
|
# cap_drop:
|
||||||
|
# - ALL
|
||||||
deploy:
|
deploy:
|
||||||
resources:
|
resources:
|
||||||
limits:
|
limits:
|
||||||
@@ -31,10 +40,8 @@ services:
|
|||||||
reservations:
|
reservations:
|
||||||
cpus: '0.25'
|
cpus: '0.25'
|
||||||
memory: 512M
|
memory: 512M
|
||||||
security_opt:
|
|
||||||
- no-new-privileges:true
|
|
||||||
env_file:
|
env_file:
|
||||||
- .env
|
- /share/np-dms/gitea/.env
|
||||||
environment:
|
environment:
|
||||||
# ---- File ownership in QNAP ----
|
# ---- File ownership in QNAP ----
|
||||||
USER_UID: '1000'
|
USER_UID: '1000'
|
||||||
@@ -78,13 +85,13 @@ services:
|
|||||||
- /etc/timezone:/etc/timezone:ro
|
- /etc/timezone:/etc/timezone:ro
|
||||||
- /etc/localtime:/etc/localtime:ro
|
- /etc/localtime:/etc/localtime:ro
|
||||||
ports:
|
ports:
|
||||||
- '3003:3000' # HTTP (ไปหลัง NPM)
|
- '3003:3000' # HTTP (to NPM)
|
||||||
- '2222:22' # SSH สำหรับ git clone/push
|
- '2222:22' # SSH for git clone/push
|
||||||
networks:
|
networks:
|
||||||
- lcbp3
|
- lcbp3
|
||||||
- giteanet
|
- giteanet
|
||||||
healthcheck:
|
healthcheck:
|
||||||
test: ['CMD', 'wget', '--spider', '-q', 'http://localhost:3000/api/healthz']
|
test: ['CMD', 'curl', '-f', 'http://localhost:3000/api/healthz']
|
||||||
interval: 30s
|
interval: 30s
|
||||||
timeout: 10s
|
timeout: 10s
|
||||||
retries: 3
|
retries: 3
|
||||||
|
|||||||
+9
-9
@@ -1,9 +1,11 @@
|
|||||||
# File: /share/np-dms/mariadb/docker-compose-lcbp3-db.yml
|
# File: /share/np-dms/mariadb/docker-compose.yml
|
||||||
# DMS Container v1.8.6 : Application name: lcbp3-db, Service: mariadb, pma
|
# DMS Container v1.8.6 :
|
||||||
|
# Application name: lcbp3-db
|
||||||
|
# Service: mariadb pma
|
||||||
# ============================================================
|
# ============================================================
|
||||||
# SECURITY (ADR-016, Tier-1):
|
# 🔒 SECURITY (ADR-016, Tier-1):
|
||||||
# - root user / app user must use different passwords (least privilege)
|
# - root user / app user must use different passwords (least privilege)
|
||||||
# - host port 3306 bind only to 127.0.0.1 - other services use DNS 'mariadb:3306'
|
# - host port 3306 bind only to 127.0.0.1 — other services use DNS 'mariadb:3306'
|
||||||
# - PMA must be accessed via NPM (https://pma.np-dms.work) only
|
# - PMA must be accessed via NPM (https://pma.np-dms.work) only
|
||||||
# - set .env in same folder:
|
# - set .env in same folder:
|
||||||
# DB_ROOT_PASSWORD, DB_PASSWORD, NPM_DB_PASSWORD, GITEA_DB_PASSWORD, N8N_DB_PASSWORD
|
# DB_ROOT_PASSWORD, DB_PASSWORD, NPM_DB_PASSWORD, GITEA_DB_PASSWORD, N8N_DB_PASSWORD
|
||||||
@@ -17,9 +19,7 @@ x-logging: &default_logging
|
|||||||
options:
|
options:
|
||||||
max-size: '10m'
|
max-size: '10m'
|
||||||
max-file: '5'
|
max-file: '5'
|
||||||
|
|
||||||
name: lcbp3-db
|
name: lcbp3-db
|
||||||
|
|
||||||
services:
|
services:
|
||||||
mariadb:
|
mariadb:
|
||||||
<<: [*restart_policy, *default_logging]
|
<<: [*restart_policy, *default_logging]
|
||||||
@@ -45,9 +45,9 @@ services:
|
|||||||
MARIADB_USER: 'center'
|
MARIADB_USER: 'center'
|
||||||
MARIADB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD required}
|
MARIADB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD required}
|
||||||
TZ: 'Asia/Bangkok'
|
TZ: 'Asia/Bangkok'
|
||||||
# bind only to loopback for backup/migration on host - not exposed to LAN
|
# bind only to loopback for backup/migration on host — not exposed to LAN
|
||||||
ports:
|
ports:
|
||||||
- '127.0.0.1:3306:3306'
|
- '3306:3306'
|
||||||
networks:
|
networks:
|
||||||
- lcbp3
|
- lcbp3
|
||||||
volumes:
|
volumes:
|
||||||
@@ -78,7 +78,7 @@ services:
|
|||||||
PMA_ABSOLUTE_URI: 'https://pma.np-dms.work/'
|
PMA_ABSOLUTE_URI: 'https://pma.np-dms.work/'
|
||||||
UPLOAD_LIMIT: '1G'
|
UPLOAD_LIMIT: '1G'
|
||||||
MEMORY_LIMIT: '512M'
|
MEMORY_LIMIT: '512M'
|
||||||
# M7: pma accessible only via NPM (https://pma.np-dms.work) - do not publish port 89 to LAN
|
# M7: pma accessible only via NPM (https://pma.np-dms.work) — do not publish port 89 to LAN
|
||||||
expose:
|
expose:
|
||||||
- '80'
|
- '80'
|
||||||
networks:
|
networks:
|
||||||
+56
@@ -0,0 +1,56 @@
# File: /share/np-dms/monitoring/docker-compose.yml (QNAP)
# Exporters only - metrics are scraped by Prometheus on ASUSTOR
# Application name lcbp3-monitoring-exporter
version: '3.8'

networks:
  lcbp3:
    external: true

services:
  node-exporter:
    image: prom/node-exporter:v1.7.0
    container_name: node-exporter
    restart: unless-stopped
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - lcbp3
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    ports:
      - "8088:8080"
    networks:
      - lcbp3
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /sys/fs/cgroup:/sys/fs/cgroup:ro

  mysqld-exporter:
    image: prom/mysqld-exporter:v0.15.0
    container_name: mysqld-exporter
    restart: unless-stopped
    user: root
    command:
      - '--config.my-cnf=/etc/mysql/my.cnf'
    ports:
      - "9104:9104"
    networks:
      - lcbp3
    volumes:
      - "/share/np-dms/monitoring/mysqld-exporter/.my.cnf:/etc/mysql/my.cnf:ro"
@@ -31,7 +31,7 @@ services:
|
|||||||
# ----------------------------------------------------------------
|
# ----------------------------------------------------------------
|
||||||
cache:
|
cache:
|
||||||
<<: [*restart_policy, *default_logging]
|
<<: [*restart_policy, *default_logging]
|
||||||
image: redis:7-alpine # ใช้ Alpine image เพื่อให้มีขน
|
image: redis:7-alpine # Use the Alpine image to keep the image small
|
||||||
container_name: cache
|
container_name: cache
|
||||||
deploy:
|
deploy:
|
||||||
resources:
|
resources:
|
||||||
@@ -86,7 +86,7 @@ services:
|
|||||||
deploy:
|
deploy:
|
||||||
resources:
|
resources:
|
||||||
limits:
|
limits:
|
||||||
cpus: '2.0' # Elasticsearch ใช้ CPU และ Memory ค่อนข้างห
|
cpus: '2.0' # Elasticsearch is fairly heavy on CPU and memory
|
||||||
memory: 4G
|
memory: 4G
|
||||||
reservations:
|
reservations:
|
||||||
cpus: '0.5'
|
cpus: '0.5'
|
||||||
|
|||||||
@@ -62,6 +62,48 @@ services:
|
|||||||
|
|
||||||
Otherwise, keep the inline anchor pattern (current repo-wide convention).
|
Otherwise, keep the inline anchor pattern (current repo-wide convention).
|
||||||
|
|
## Image Pinning Strategy

The LCBP3 platform uses a **hybrid image pinning approach**:

### Infrastructure Services (Pinned)
All infrastructure services use **explicitly pinned versions** for stability:

```yaml
# Examples
redis:7-alpine
elasticsearch:8.11.1
mariadb:11.8
gitea/gitea:1.22.3-rootless
n8nio/n8n:1.66.0
```

**Rationale:**
- Infrastructure services evolve independently
- Breaking changes in Redis/Elasticsearch/MariaDB can cause data corruption
- Pinned versions ensure predictable behavior across deployments
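
As a rough illustration of what pinning looks like inside a compose file, the fragment below reuses two image tags that appear in this change set. The service names follow the compose files in this repository; the comments on tag granularity describe general Docker tag behaviour and are not stated project policy.

```yaml
services:
  cache:
    # Major-version tag: stays on Redis 7 but can still move to a newer 7.x build on pull.
    image: redis:7-alpine
  gitea:
    # Exact release tag, as set in the gitea compose file in this change.
    image: gitea/gitea:1.26.0-rootless
```

The narrower the tag, the less an image can change underneath a deployment between pulls.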

### Application Services (Variable)
Application images use **environment variable tags** for CI/CD flexibility:

```yaml
backend:
  image: lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}
frontend:
  image: lcbp3-frontend:${FRONTEND_IMAGE_TAG:-latest}
```

**Rationale:**
- Application code changes frequently with each release
- CI pipelines inject SHA-specific tags per release
- `:latest` fallback enables local development
- Environment variable allows rollback to specific versions
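
A minimal sketch of how the tag variable resolves, assuming standard Docker Compose variable interpolation. The `backend-strict` service is hypothetical and exists only to contrast the `:-` default form with the `:?` required form that the MariaDB stack already uses for `DB_PASSWORD`.

```yaml
services:
  backend:
    # Resolves to lcbp3-backend:latest when BACKEND_IMAGE_TAG is unset or empty.
    image: lcbp3-backend:${BACKEND_IMAGE_TAG:-latest}
  backend-strict:
    # Hypothetical variant: Compose stops with this message when the variable is unset or empty.
    image: lcbp3-backend:${BACKEND_IMAGE_TAG:?BACKEND_IMAGE_TAG required}
```

Under this scheme a rollback amounts to setting `BACKEND_IMAGE_TAG` to an earlier tag in the environment or `.env` file before running `docker compose up -d`.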

### Version Control
- **Infrastructure versions** updated manually in compose files
- **Application versions** controlled via CI/CD pipeline environment variables
- **Release policy** documented in `04-08-release-management-policy.md`

## Secret Management Roadmap (S1)
|
## Secret Management Roadmap (S1)
|
||||||
|
|
||||||
Current: `env_file: .env` (gitignored) per stack.
|
Current: `env_file: .env` (gitignored) per stack.
|
||||||
|
|||||||