ai-tax-agent/infra/STRUCTURE_OVERVIEW.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00


# Infrastructure Structure Overview
## New Multi-Environment Structure
```
infra/
├── README.md                       # Main infrastructure documentation
├── DEPLOYMENT_GUIDE.md             # Complete deployment guide
├── MIGRATION_GUIDE.md              # Migration from old structure
├── STRUCTURE_OVERVIEW.md           # This file
├── base/                           # Base compose files (environment-agnostic)
│   ├── infrastructure.yaml         # Core infrastructure services
│   ├── services.yaml               # Application microservices
│   ├── monitoring.yaml             # Monitoring stack
│   └── external.yaml               # External services (Traefik, Authentik, etc.)
├── environments/                   # Environment-specific configurations
│   ├── local/                      # Local development
│   │   ├── .env.example            # Template
│   │   └── .env                    # Actual config (gitignored)
│   ├── development/                # Development server
│   │   ├── .env.example            # Template
│   │   └── .env                    # Actual config (gitignored)
│   └── production/                 # Production server
│       ├── .env.example            # Template
│       └── .env                    # Actual config (gitignored)
├── configs/                        # Service configuration files
│   ├── traefik/                    # Traefik configs
│   │   ├── config/                 # Dynamic configuration
│   │   │   ├── middlewares.yml
│   │   │   ├── routers.yml
│   │   │   └── services.yml
│   │   ├── traefik.yml             # Static configuration
│   │   └── .provider.env           # GoDaddy API credentials (gitignored)
│   ├── grafana/                    # Grafana configs
│   │   ├── dashboards/             # Dashboard JSON files
│   │   └── provisioning/           # Datasources, dashboards
│   ├── prometheus/                 # Prometheus config
│   │   └── prometheus.yml
│   ├── loki/                       # Loki config
│   │   └── loki-config.yml
│   ├── promtail/                   # Promtail config
│   │   └── promtail-config.yml
│   ├── vault/                      # Vault config
│   │   └── config/
│   └── authentik/                  # Authentik bootstrap
│       ├── bootstrap.yaml
│       ├── custom-templates/
│       └── media/
├── certs/                          # SSL certificates (gitignored)
│   ├── local/                      # Self-signed certs
│   ├── development/                # Let's Encrypt certs
│   └── production/                 # Let's Encrypt certs
├── docker/                         # Dockerfile templates
│   ├── base-runtime.Dockerfile     # Base image for all services
│   ├── base-ml.Dockerfile          # Base image for ML services
│   └── Dockerfile.ml-service.template
└── scripts/                        # Deployment and utility scripts
    ├── deploy.sh                   # Main deployment script
    ├── setup-networks.sh           # Create Docker networks
    └── cleanup.sh                  # Cleanup script
```
## Base Compose Files
### infrastructure.yaml
Core infrastructure services needed by the application:
- **Vault** - Secrets management
- **MinIO** - Object storage (S3-compatible)
- **PostgreSQL** - Relational database
- **Neo4j** - Graph database
- **Qdrant** - Vector database
- **Redis** - Cache and session store
- **NATS** - Message queue (with JetStream)
### services.yaml
Application microservices (14 services):
- **svc-ingestion** - Document ingestion
- **svc-extract** - Data extraction
- **svc-kg** - Knowledge graph
- **svc-rag-indexer** - RAG indexing (ML)
- **svc-rag-retriever** - RAG retrieval (ML)
- **svc-forms** - Form processing
- **svc-hmrc** - HMRC integration
- **svc-ocr** - OCR processing (ML)
- **svc-rpa** - RPA automation
- **svc-normalize-map** - Data normalization
- **svc-reason** - Reasoning engine
- **svc-firm-connectors** - Firm integrations
- **svc-coverage** - Coverage analysis
- **ui-review** - Review UI (Next.js)
### monitoring.yaml
Monitoring and observability stack:
- **Prometheus** - Metrics collection
- **Grafana** - Dashboards and visualization
- **Loki** - Log aggregation
- **Promtail** - Log collection
### external.yaml (optional)
External services that may already exist:
- **Traefik** - Reverse proxy and load balancer
- **Authentik** - SSO and authentication
- **Gitea** - Git repository and container registry
- **Nextcloud** - File storage
- **Portainer** - Docker management UI
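Taken together, the four base files and the per-environment `.env` files suggest that a deployment pairs one compose file with one env file. The sketch below illustrates that pairing only. The function name `compose_command` is hypothetical, the real `deploy.sh` may work differently, and the paths assume the command is run from the repository root.

```bash
# Hypothetical helper: print the docker compose command for an environment/stack pair.
# Assumes the repository root as working directory; the real deploy.sh may differ.
compose_command() {
  env_name=$1
  stack=$2

  # Only the three environments described in this document are accepted.
  case "$env_name" in
    local|development|production) ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac

  # Only the four base compose files are accepted.
  case "$stack" in
    infrastructure|services|monitoring|external) ;;
    *) echo "unknown stack: $stack" >&2; return 1 ;;
  esac

  # One environment-agnostic compose file, parameterised by one env file.
  echo "docker compose --env-file infra/environments/$env_name/.env -f infra/base/$stack.yaml up -d"
}
```

For example, `compose_command production monitoring` prints the command for the production monitoring stack without running it, which makes the sketch safe to try on a machine without Docker.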
## Environment Configurations
### Local Development
- **Domain**: `localhost` or `*.local.harkon.co.uk`
- **SSL**: Self-signed certificates
- **Auth**: Optional (can disable Authentik)
- **Registry**: Local Docker registry or Gitea
- **Passwords**: Simple (postgres, admin, etc.)
- **Purpose**: Local development and testing
- **Traefik Dashboard**: Exposed on port 8080
### Development Server
- **Domain**: `*.dev.harkon.co.uk`
- **SSL**: Let's Encrypt (DNS-01 via GoDaddy)
- **Auth**: Authentik SSO enabled
- **Registry**: Gitea container registry
- **Passwords**: Strong (auto-generated)
- **Purpose**: Staging and integration testing
- **Traefik Dashboard**: Protected by Authentik
### Production Server
- **Domain**: `*.harkon.co.uk`
- **SSL**: Let's Encrypt (DNS-01 via GoDaddy)
- **Auth**: Authentik SSO enabled
- **Registry**: Gitea container registry
- **Passwords**: Strong (auto-generated)
- **Purpose**: Production deployment
- **Traefik Dashboard**: Protected by Authentik
- **Monitoring**: Full stack enabled
## Docker Networks
All environments use two networks:
### frontend
- Public-facing services
- Connected to Traefik
- Services: UI, Grafana, Vault, MinIO console
### backend
- Internal services
- Not directly accessible
- Services: Databases, message queues, internal APIs
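A minimal sketch of how the two networks could be created idempotently is shown here. It is not the content of `setup-networks.sh`, which this document does not reproduce; the function names are hypothetical, and only `docker network inspect` and `docker network create` are real CLI commands.

```bash
# Create a Docker network only if it does not exist yet, so re-running is harmless.
ensure_network() {
  network=$1
  if docker network inspect "$network" >/dev/null 2>&1; then
    echo "network $network already exists"
  else
    docker network create "$network" >/dev/null || return 1
    echo "network $network created"
  fi
}

# Ensure both networks named in this document exist.
setup_networks() {
  for network in frontend backend; do
    ensure_network "$network" || return 1
  done
}
```

Calling `setup_networks` once before the first deployment is enough; later calls only report that the networks already exist.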
## Volume Naming
Volumes are named consistently across environments:
- `postgres_data`
- `neo4j_data`
- `neo4j_logs`
- `qdrant_data`
- `minio_data`
- `vault_data`
- `redis_data`
- `nats_data`
- `prometheus_data`
- `grafana_data`
- `loki_data`
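To check that a host actually has these volumes, something like the following sketch could be used. It assumes the volumes exist under exactly the names listed above; Compose normally prefixes volume names with the project name unless the compose file sets an explicit `name:` or marks the volume as external, so adjust the list if that applies. `EXPECTED_VOLUMES` and `missing_volumes` are hypothetical names.

```bash
# Volume names as listed in this document (assumed to be unprefixed).
EXPECTED_VOLUMES="postgres_data neo4j_data neo4j_logs qdrant_data minio_data vault_data redis_data nats_data prometheus_data grafana_data loki_data"

# Print every expected volume that Docker does not know about.
missing_volumes() {
  for volume in $EXPECTED_VOLUMES; do
    docker volume inspect "$volume" >/dev/null 2>&1 || echo "$volume"
  done
}
```

An empty result from `missing_volumes` means every listed volume is present on the host.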
## Deployment Workflow
### 1. Setup Environment
```bash
cp infra/environments/production/.env.example infra/environments/production/.env
vim infra/environments/production/.env
```
### 2. Generate Secrets
```bash
./scripts/generate-production-secrets.sh
```
### 3. Setup Networks
```bash
./infra/scripts/setup-networks.sh
```
### 4. Deploy Infrastructure
```bash
./infra/scripts/deploy.sh production infrastructure
```
### 5. Deploy Monitoring
```bash
./infra/scripts/deploy.sh production monitoring
```
### 6. Deploy Services
```bash
./infra/scripts/deploy.sh production services
```
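Steps 4 to 6 can be chained so that a failure in one stack stops the rollout before the next stack starts. The wrapper below is a sketch under that assumption; `deploy_all` and the `DEPLOY_SCRIPT` override are hypothetical and not part of the repository as described here.

```bash
# Path to the deployment script from this document; overridable for dry runs.
DEPLOY_SCRIPT=${DEPLOY_SCRIPT:-./infra/scripts/deploy.sh}

# Deploy the stacks in the documented order, stopping at the first failure.
deploy_all() {
  env_name=$1
  for stack in infrastructure monitoring services; do
    echo "==> deploying $stack to $env_name"
    "$DEPLOY_SCRIPT" "$env_name" "$stack" || {
      echo "deployment of $stack failed, aborting" >&2
      return 1
    }
  done
}
```

Running `DEPLOY_SCRIPT=echo` before `deploy_all production` gives a dry run that prints the arguments each call would receive instead of deploying anything.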
## Key Features
### ✅ Multi-Environment Support
A single codebase deploys to local, development, and production, each with its own environment-specific configuration.
### ✅ Modular Architecture
Services are split into logical groups (infrastructure, monitoring, services, external) so that each group can be deployed independently.
### ✅ Unified Deployment
A single `deploy.sh` script handles all environments and stacks.
### ✅ Environment Isolation
Each environment has its own `.env` file with appropriate secrets and configurations.
### ✅ Shared Configurations
Common service configs live in the `configs/` directory and are referenced by all environments.
### ✅ Security Best Practices
- Secrets in gitignored `.env` files
- Strong password generation
- Authentik SSO integration
- SSL/TLS everywhere (Let's Encrypt)
### ✅ Easy Maintenance
- Clear directory structure
- Comprehensive documentation
- Migration guide from old structure
- Troubleshooting guides
## Service Access
### Local
- http://localhost:3000 - Grafana
- http://localhost:9093 - MinIO
- http://localhost:8200 - Vault
- http://localhost:8080 - Traefik Dashboard
### Development
- https://grafana.dev.harkon.co.uk
- https://minio.dev.harkon.co.uk
- https://vault.dev.harkon.co.uk
- https://ui-review.dev.harkon.co.uk
### Production
- https://grafana.harkon.co.uk
- https://minio.harkon.co.uk
- https://vault.harkon.co.uk
- https://ui-review.harkon.co.uk
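After a deployment, the remote endpoints above can be probed with a small smoke test. This is a sketch only: `service_urls` and `smoke_test` are hypothetical helpers, and because `curl -f` fails only on HTTP status 400 or higher, a redirect to the Authentik login page still counts as reachable.

```bash
# Print the HTTPS endpoints listed above for a remote environment.
service_urls() {
  case "$1" in
    development) domain=dev.harkon.co.uk ;;
    production)  domain=harkon.co.uk ;;
    *) echo "usage: service_urls development|production" >&2; return 1 ;;
  esac
  for name in grafana minio vault ui-review; do
    echo "https://$name.$domain"
  done
}

# Probe each endpoint; succeed only if none of them fails.
smoke_test() {
  urls=$(service_urls "$1") || return 1
  failures=0
  for url in $urls; do
    # -f: fail on HTTP >= 400, -sS: quiet but show errors, 10 s timeout per URL.
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

`smoke_test development` prints one line per endpoint and returns a non-zero status if any endpoint failed, so it can gate a later production deployment.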
## Configuration Management
### Environment Variables
All configuration is supplied through environment variables in `.env` files:
- Domain settings
- Database passwords
- API keys
- OAuth secrets
- Registry credentials
### Service Configs
Static configurations live in the `configs/` directory:
- Traefik routing rules
- Grafana dashboards
- Prometheus scrape configs
- Loki retention policies
### Secrets Management
- Development/Production: Vault
- Local: Environment variables
- Rotation: `generate-production-secrets.sh`
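The contents of `generate-production-secrets.sh` are not shown in this document. As an illustration of what generating and rotating a secret in an `.env` file can involve, here is a minimal sketch; the helper names and the `POSTGRES_PASSWORD` key used in the usage note are assumptions, not taken from the repository.

```bash
# Print a random alphanumeric secret of the given length (default 32).
gen_secret() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}"
}

# Set KEY=value in an existing env file, replacing any previous KEY line.
set_env_var() {
  env_file=$1
  key=$2
  value=$3
  tmp_file=$(mktemp)
  # Keep every line except a previous definition of this key.
  grep -v "^${key}=" "$env_file" > "$tmp_file" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
  mv "$tmp_file" "$env_file"
  # Secrets should be readable by the owner only.
  chmod 600 "$env_file"
}
```

A rotation would then look like `set_env_var infra/environments/production/.env POSTGRES_PASSWORD "$(gen_secret)"`, followed by redeploying the services that read the value.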
## Monitoring and Observability
### Metrics (Prometheus)
- Service health
- Resource usage
- Request rates
- Error rates
### Logs (Loki)
- Centralized logging
- Query via Grafana
- Retention policies
- Log aggregation
### Dashboards (Grafana)
- Infrastructure overview
- Service metrics
- Application performance
- Business metrics
### Alerts
- Prometheus Alertmanager
- Slack/Email notifications
- PagerDuty integration
## Backup Strategy
### What to Backup
- PostgreSQL database
- Neo4j graph data
- Vault secrets
- MinIO objects
- Qdrant vectors
- Grafana dashboards
### How to Backup
```bash
# Automated backup script
./scripts/backup-volumes.sh production
# Manual backup of one volume (stop the database first so the copied files are consistent)
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup alpine tar czf /backup/postgres.tar.gz /data
```
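The manual command above can be generalised to any named volume. The function below is a sketch of that idea and not the content of `backup-volumes.sh`; `backup_volume` and the dated archive name are assumptions.

```bash
# Archive one named volume into dest_dir as <volume>-<YYYYMMDD>.tar.gz.
backup_volume() {
  volume=$1
  dest_dir=$2

  # Bind mounts are given as absolute paths here to avoid ambiguity with volume names.
  case "$dest_dir" in
    /*) ;;
    *) echo "dest_dir must be an absolute path: $dest_dir" >&2; return 1 ;;
  esac

  stamp=$(date +%Y%m%d)
  # The volume is mounted read-only so the backup cannot modify it.
  docker run --rm \
    -v "${volume}:/data:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar czf "/backup/${volume}-${stamp}.tar.gz" /data
}
```

For example, `backup_volume neo4j_data /srv/backups` would write a dated archive of the Neo4j data volume to `/srv/backups`, where `/srv/backups` is only an example path.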
### Backup Schedule
- Daily: Databases
- Weekly: Full system
- Monthly: Archive
## Disaster Recovery
### Recovery Steps
1. Restore infrastructure
2. Restore volumes from backup
3. Deploy services
4. Verify functionality
5. Update DNS if needed
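Step 2 can be sketched as the inverse of the manual backup command in this document. The function assumes the archive was created with `tar czf ... /data`, so its entries are stored under `data/` and are extracted relative to `/`. `restore_volume` is a hypothetical name, and any service using the volume should be stopped before restoring into it.

```bash
# Restore a volume from an archive created with "tar czf <archive> /data".
restore_volume() {
  volume=$1
  archive=$2   # absolute path to the .tar.gz file on the host
  archive_dir=$(dirname "$archive")
  archive_name=$(basename "$archive")

  # Entries are stored as data/..., so extracting at / repopulates /data.
  docker run --rm \
    -v "${volume}:/data" \
    -v "${archive_dir}:/backup:ro" \
    alpine tar xzf "/backup/${archive_name}" -C /
}
```

A call such as `restore_volume postgres_data /srv/backups/postgres.tar.gz` (example path) would repopulate the PostgreSQL volume before the services are deployed in step 3.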
### RTO/RPO
- **RTO**: 4 hours (Recovery Time Objective)
- **RPO**: 24 hours (Recovery Point Objective)
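An RPO of 24 hours only holds if the newest backup is never older than that. One way to check this is sketched below; `backup_within_rpo` is a hypothetical helper, and it relies on `find -mmin`, which GNU, BSD, and BusyBox `find` provide but POSIX does not require.

```bash
# Succeed if the archive exists and was modified less than rpo_hours ago (default 24).
backup_within_rpo() {
  archive=$1
  rpo_hours=${2:-24}
  # find prints the path only when the modification time is recent enough.
  [ -n "$(find "$archive" -mmin "-$((rpo_hours * 60))" 2>/dev/null)" ]
}
```

A scheduled job could call this against the latest database archive and raise an alert when it returns a non-zero status.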
## Next Steps
1. Review [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md) for deployment instructions
2. Review [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) if migrating from old structure
3. Setup environment files
4. Deploy to local first
5. Test in development
6. Deploy to production