# Infrastructure Structure Overview

## New Multi-Environment Structure

```
infra/
├── README.md                          # Main infrastructure documentation
├── DEPLOYMENT_GUIDE.md                # Complete deployment guide
├── MIGRATION_GUIDE.md                 # Migration from old structure
├── STRUCTURE_OVERVIEW.md              # This file
│
├── base/                              # Base compose files (environment-agnostic)
│   ├── infrastructure.yaml            # Core infrastructure services
│   ├── services.yaml                  # Application microservices
│   ├── monitoring.yaml                # Monitoring stack
│   └── external.yaml                  # External services (Traefik, Authentik, etc.)
│
├── environments/                      # Environment-specific configurations
│   ├── local/                         # Local development
│   │   ├── .env.example               # Template
│   │   └── .env                       # Actual config (gitignored)
│   ├── development/                   # Development server
│   │   ├── .env.example               # Template
│   │   └── .env                       # Actual config (gitignored)
│   └── production/                    # Production server
│       ├── .env.example               # Template
│       └── .env                       # Actual config (gitignored)
│
├── configs/                           # Service configuration files
│   ├── traefik/                       # Traefik configs
│   │   ├── config/                    # Dynamic configuration
│   │   │   ├── middlewares.yml
│   │   │   ├── routers.yml
│   │   │   └── services.yml
│   │   ├── traefik.yml                # Static configuration
│   │   └── .provider.env              # GoDaddy API credentials (gitignored)
│   ├── grafana/                       # Grafana configs
│   │   ├── dashboards/                # Dashboard JSON files
│   │   └── provisioning/              # Datasources, dashboards
│   ├── prometheus/                    # Prometheus config
│   │   └── prometheus.yml
│   ├── loki/                          # Loki config
│   │   └── loki-config.yml
│   ├── promtail/                      # Promtail config
│   │   └── promtail-config.yml
│   ├── vault/                         # Vault config
│   │   └── config/
│   └── authentik/                     # Authentik bootstrap
│       ├── bootstrap.yaml
│       ├── custom-templates/
│       └── media/
│
├── certs/                             # SSL certificates (gitignored)
│   ├── local/                         # Self-signed certs
│   ├── development/                   # Let's Encrypt certs
│   └── production/                    # Let's Encrypt certs
│
├── docker/                            # Dockerfile templates
│   ├── base-runtime.Dockerfile        # Base image for all services
│   ├── base-ml.Dockerfile             # Base image for ML services
│   └── Dockerfile.ml-service.template
│
└── scripts/                           # Deployment and utility scripts
    ├── deploy.sh                      # Main deployment script
    ├── setup-networks.sh              # Create Docker networks
    └── cleanup.sh                     # Cleanup script
```

## Base Compose Files

### infrastructure.yaml
Core infrastructure services needed by the application:
- **Vault** - Secrets management
- **MinIO** - Object storage (S3-compatible)
- **PostgreSQL** - Relational database
- **Neo4j** - Graph database
- **Qdrant** - Vector database
- **Redis** - Cache and session store
- **NATS** - Message queue (with JetStream)

### services.yaml
Application microservices (14 services):
- **svc-ingestion** - Document ingestion
- **svc-extract** - Data extraction
- **svc-kg** - Knowledge graph
- **svc-rag-indexer** - RAG indexing (ML)
- **svc-rag-retriever** - RAG retrieval (ML)
- **svc-forms** - Form processing
- **svc-hmrc** - HMRC integration
- **svc-ocr** - OCR processing (ML)
- **svc-rpa** - RPA automation
- **svc-normalize-map** - Data normalization
- **svc-reason** - Reasoning engine
- **svc-firm-connectors** - Firm integrations
- **svc-coverage** - Coverage analysis
- **ui-review** - Review UI (Next.js)

### monitoring.yaml
Monitoring and observability stack:
- **Prometheus** - Metrics collection
- **Grafana** - Dashboards and visualization
- **Loki** - Log aggregation
- **Promtail** - Log collection

### external.yaml (optional)
External services that may already exist:
- **Traefik** - Reverse proxy and load balancer
- **Authentik** - SSO and authentication
- **Gitea** - Git repository and container registry
- **Nextcloud** - File storage
- **Portainer** - Docker management UI

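How `deploy.sh` combines these files is not shown in this overview. As a minimal sketch, assuming each stack is started from one base compose file plus one environment's `.env` file, the invocation could be built like this. The function name `compose_cmd` is hypothetical, and the real script may pass additional flags.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the docker compose command for one stack in one environment.
# Assumes the layout above: infra/base/<stack>.yaml and infra/environments/<env>/.env.
compose_cmd() {
  local env="$1" stack="$2"
  case "$env" in
    local|development|production) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  case "$stack" in
    infrastructure|services|monitoring|external) ;;
    *) echo "unknown stack: $stack" >&2; return 1 ;;
  esac
  # --env-file and -f are standard docker compose options.
  printf 'docker compose --env-file infra/environments/%s/.env -f infra/base/%s.yaml up -d\n' \
    "$env" "$stack"
}
```

Printing the command instead of running it keeps the sketch safe to try: `compose_cmd production monitoring` shows what would be executed without touching any containers.
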
## Environment Configurations

### Local Development
- **Domain**: `localhost` or `*.local.harkon.co.uk`
- **SSL**: Self-signed certificates
- **Auth**: Optional (can disable Authentik)
- **Registry**: Local Docker registry or Gitea
- **Passwords**: Simple (postgres, admin, etc.)
- **Purpose**: Local development and testing
- **Traefik Dashboard**: Exposed on port 8080

### Development Server
- **Domain**: `*.dev.harkon.co.uk`
- **SSL**: Let's Encrypt (DNS-01 via GoDaddy)
- **Auth**: Authentik SSO enabled
- **Registry**: Gitea container registry
- **Passwords**: Strong (auto-generated)
- **Purpose**: Staging and integration testing
- **Traefik Dashboard**: Protected by Authentik

### Production Server
- **Domain**: `*.harkon.co.uk`
- **SSL**: Let's Encrypt (DNS-01 via GoDaddy)
- **Auth**: Authentik SSO enabled
- **Registry**: Gitea container registry
- **Passwords**: Strong (auto-generated)
- **Purpose**: Production deployment
- **Traefik Dashboard**: Protected by Authentik
- **Monitoring**: Full stack enabled

## Docker Networks

All environments use two networks:

### frontend
- Public-facing services
- Connected to Traefik
- Services: UI, Grafana, Vault, MinIO console

### backend
- Internal services
- Not directly accessible
- Services: Databases, message queues, internal APIs

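The contents of `setup-networks.sh` are not reproduced here. The following is a rough sketch of how the two networks could be created idempotently, so the script can be re-run safely. The function names are hypothetical, and any extra network options the real script sets are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of idempotent network creation; the real setup-networks.sh may differ.
ensure_network() {
  local name="$1"
  # `docker network inspect` exits non-zero when the network does not exist.
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null && echo "network $name created"
  fi
}

# Create both networks named in this section, stopping at the first failure.
ensure_networks() {
  local net
  for net in frontend backend; do
    ensure_network "$net" || return 1
  done
}
```

Calling `ensure_networks` a second time only reports that both networks exist, which is the property a setup script needs when it runs before every deployment.
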
## Volume Naming

Volumes are named consistently across environments:
- `postgres_data`
- `neo4j_data`
- `neo4j_logs`
- `qdrant_data`
- `minio_data`
- `vault_data`
- `redis_data`
- `nats_data`
- `prometheus_data`
- `grafana_data`
- `loki_data`

## Deployment Workflow

### 1. Set Up Environment
```bash
cp infra/environments/production/.env.example infra/environments/production/.env
vim infra/environments/production/.env
```

### 2. Generate Secrets
```bash
./scripts/generate-production-secrets.sh
```

### 3. Set Up Networks
```bash
./infra/scripts/setup-networks.sh
```

### 4. Deploy Infrastructure
```bash
./infra/scripts/deploy.sh production infrastructure
```

### 5. Deploy Monitoring
```bash
./infra/scripts/deploy.sh production monitoring
```

### 6. Deploy Services
```bash
./infra/scripts/deploy.sh production services
```

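Steps 4 to 6 depend on each other, so running them as one ordered sequence that stops on the first failure can be convenient. The wrapper below is a sketch, not part of the repository: `deploy_all` and the `DEPLOY_SCRIPT` override are hypothetical names, and it assumes `deploy.sh` exits non-zero when a stack fails to deploy.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run steps 4-6 in order and stop at the first failure.
# DEPLOY_SCRIPT is an assumed override hook, useful for dry runs.
deploy_all() {
  local env="$1" stack
  local deploy="${DEPLOY_SCRIPT:-./infra/scripts/deploy.sh}"
  for stack in infrastructure monitoring services; do
    echo "==> deploying $stack to $env"
    "$deploy" "$env" "$stack" || {
      echo "deploy of $stack failed; stopping" >&2
      return 1
    }
  done
}
```

Setting `DEPLOY_SCRIPT=echo` before calling `deploy_all production` prints the three invocations in order without deploying anything.
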
## Key Features

### ✅ Multi-Environment Support
Single codebase deploys to local, development, and production with environment-specific configurations.

### ✅ Modular Architecture
Services split into logical groups (infrastructure, monitoring, services, external) for independent deployment.

### ✅ Unified Deployment
Single `deploy.sh` script handles all environments and stacks.

### ✅ Environment Isolation
Each environment has its own `.env` file with appropriate secrets and configurations.

### ✅ Shared Configurations
Common service configs in `configs/` directory, referenced by all environments.

### ✅ Security Best Practices
- Secrets in gitignored `.env` files
- Strong password generation
- Authentik SSO integration
- SSL/TLS everywhere (Let's Encrypt)

### ✅ Easy Maintenance
- Clear directory structure
- Comprehensive documentation
- Migration guide from old structure
- Troubleshooting guides

## Service Access

### Local
- http://localhost:3000 - Grafana
- http://localhost:9093 - MinIO
- http://localhost:8200 - Vault
- http://localhost:8080 - Traefik Dashboard

### Development
- https://grafana.dev.harkon.co.uk
- https://minio.dev.harkon.co.uk
- https://vault.dev.harkon.co.uk
- https://ui-review.dev.harkon.co.uk

### Production
- https://grafana.harkon.co.uk
- https://minio.harkon.co.uk
- https://vault.harkon.co.uk
- https://ui-review.harkon.co.uk

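The development and production hostnames above follow a single pattern: the service name, followed by the environment's domain. A small helper can encode that pattern for use in smoke tests or scripts. The function name `service_url` is hypothetical; local access is left out because it uses per-service ports, not hostnames.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the URL of a Traefik-routed service from the
# hostname pattern listed above. Local ports are per-service and not covered.
service_url() {
  local service="$1" env="$2"
  case "$env" in
    development) echo "https://${service}.dev.harkon.co.uk" ;;
    production)  echo "https://${service}.harkon.co.uk" ;;
    *) echo "no hostname pattern for environment: $env" >&2; return 1 ;;
  esac
}
```

For example, `service_url grafana development` prints `https://grafana.dev.harkon.co.uk`.
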
## Configuration Management

### Environment Variables
All configuration via environment variables in `.env` files:
- Domain settings
- Database passwords
- API keys
- OAuth secrets
- Registry credentials

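Because every setting lives in a `.env` file, a blank or missing value is an easy mistake to make when copying from `.env.example`. The check below is a sketch of a pre-deploy guard. `missing_env_vars` is a hypothetical name, and the variable names in the usage note are placeholders, not taken from the actual templates.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy check: print every variable that is missing or empty
# in a .env file, and return non-zero if any were found.
missing_env_vars() {
  local file="$1" var status=0
  shift
  for var in "$@"; do
    # Match VAR=<at least one character> at the start of a line.
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "$var"
      status=1
    fi
  done
  return "$status"
}
```

A call such as `missing_env_vars infra/environments/production/.env DOMAIN POSTGRES_PASSWORD` (placeholder names) prints nothing and succeeds when both values are set.
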
### Service Configs
Static configurations in `configs/` directory:
- Traefik routing rules
- Grafana dashboards
- Prometheus scrape configs
- Loki retention policies

### Secrets Management
- Development/Production: Vault
- Local: Environment variables
- Rotation: `generate-production-secrets.sh`

## Monitoring and Observability

### Metrics (Prometheus)
- Service health
- Resource usage
- Request rates
- Error rates

### Logs (Loki)
- Centralized logging
- Query via Grafana
- Retention policies
- Log aggregation

### Dashboards (Grafana)
- Infrastructure overview
- Service metrics
- Application performance
- Business metrics

### Alerts
- Prometheus Alertmanager
- Slack/Email notifications
- PagerDuty integration

## Backup Strategy

### What to Back Up
- PostgreSQL database
- Neo4j graph data
- Vault secrets
- MinIO objects
- Qdrant vectors
- Grafana dashboards

### How to Back Up
```bash
# Automated backup script
./scripts/backup-volumes.sh production

# Manual backup (archives the postgres_data volume into the current directory)
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup alpine tar czf /backup/postgres.tar.gz /data
```

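The manual command above covers one volume. As a sketch only, the same approach can be looped over several of the volumes listed under "Volume Naming". This is not the content of `backup-volumes.sh`; `backup_volumes` is a hypothetical name, and the destination must be an absolute path because it is used as a bind mount.

```bash
#!/usr/bin/env bash
# Hypothetical loop applying the manual backup approach to several volumes.
# Usage: backup_volumes <absolute-destination-dir> <volume>...
backup_volumes() {
  local dest="$1" vol
  shift
  for vol in "$@"; do
    # Mount the volume read-only and write one archive per volume.
    docker run --rm \
      -v "${vol}:/data:ro" \
      -v "${dest}:/backup" \
      alpine tar czf "/backup/${vol}.tar.gz" -C / data || return 1
    echo "backed up ${vol} -> ${dest}/${vol}.tar.gz"
  done
}
```

Archiving the volume of a running database can capture an inconsistent state, so it is usually safer to stop the service first or to use the database's own dump tooling.
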
### Backup Schedule
- Daily: Databases
- Weekly: Full system
- Monthly: Archive

## Disaster Recovery

### Recovery Steps
1. Restore infrastructure
2. Restore volumes from backup
3. Deploy services
4. Verify functionality
5. Update DNS if needed

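For step 2, an archive produced by the manual `tar czf ... /data` command stores its files under a top-level `data/` directory, so it can be unpacked from `/` with the volume mounted at `/data`. The function below is a sketch under that assumption; `restore_volume` is a hypothetical name, and archives produced by `backup-volumes.sh` may be laid out differently.

```bash
#!/usr/bin/env bash
# Hypothetical restore for an archive made with the manual backup command.
# Usage: restore_volume <volume> <absolute-archive-dir> <archive-file-name>
restore_volume() {
  local vol="$1" archive_dir="$2" archive="$3"
  # Extracting at / places the archive's data/ entries into the mounted volume.
  docker run --rm \
    -v "${vol}:/data" \
    -v "${archive_dir}:/backup:ro" \
    alpine tar xzf "/backup/${archive}" -C /
}
```

For example, `restore_volume postgres_data "$(pwd)" postgres.tar.gz` restores the archive created by the manual backup command. The service that uses the volume should be stopped while the restore runs.
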
### RTO/RPO
- **RTO**: 4 hours (Recovery Time Objective)
- **RPO**: 24 hours (Recovery Point Objective)

## Next Steps

1. Review [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md) for deployment instructions
2. Review [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) if migrating from old structure
3. Set up environment files
4. Deploy to local first
5. Test in development
6. Deploy to production