# Infrastructure Structure Overview

## New Multi-Environment Structure

```
infra/
├── README.md                          # Main infrastructure documentation
├── DEPLOYMENT_GUIDE.md                # Complete deployment guide
├── MIGRATION_GUIDE.md                 # Migration from old structure
├── STRUCTURE_OVERVIEW.md              # This file
│
├── base/                              # Base compose files (environment-agnostic)
│   ├── infrastructure.yaml            # Core infrastructure services
│   ├── services.yaml                  # Application microservices
│   ├── monitoring.yaml                # Monitoring stack
│   └── external.yaml                  # External services (Traefik, Authentik, etc.)
│
├── environments/                      # Environment-specific configurations
│   ├── local/                         # Local development
│   │   ├── .env.example               # Template
│   │   └── .env                       # Actual config (gitignored)
│   ├── development/                   # Development server
│   │   ├── .env.example               # Template
│   │   └── .env                       # Actual config (gitignored)
│   └── production/                    # Production server
│       ├── .env.example               # Template
│       └── .env                       # Actual config (gitignored)
│
├── configs/                           # Service configuration files
│   ├── traefik/                       # Traefik configs
│   │   ├── config/                    # Dynamic configuration
│   │   │   ├── middlewares.yml
│   │   │   ├── routers.yml
│   │   │   └── services.yml
│   │   ├── traefik.yml                # Static configuration
│   │   └── .provider.env              # GoDaddy API credentials (gitignored)
│   ├── grafana/                       # Grafana configs
│   │   ├── dashboards/                # Dashboard JSON files
│   │   └── provisioning/              # Datasources, dashboards
│   ├── prometheus/                    # Prometheus config
│   │   └── prometheus.yml
│   ├── loki/                          # Loki config
│   │   └── loki-config.yml
│   ├── promtail/                      # Promtail config
│   │   └── promtail-config.yml
│   ├── vault/                         # Vault config
│   │   └── config/
│   └── authentik/                     # Authentik bootstrap
│       ├── bootstrap.yaml
│       ├── custom-templates/
│       └── media/
│
├── certs/                             # SSL certificates (gitignored)
│   ├── local/                         # Self-signed certs
│   ├── development/                   # Let's Encrypt certs
│   └── production/                    # Let's Encrypt certs
│
├── docker/                            # Dockerfile templates
│   ├── base-runtime.Dockerfile        # Base image for all services
│   ├── base-ml.Dockerfile             # Base image for ML services
│   └── Dockerfile.ml-service.template
│
└── scripts/                           # Deployment and utility scripts
    ├── deploy.sh                      # Main deployment script
    ├── setup-networks.sh              # Create Docker networks
    └── cleanup.sh                     # Cleanup script
```

## Base Compose Files

### infrastructure.yaml

Core infrastructure services needed by the application:

- **Vault** - Secrets management
- **MinIO** - Object storage (S3-compatible)
- **PostgreSQL** - Relational database
- **Neo4j** - Graph database
- **Qdrant** - Vector database
- **Redis** - Cache and session store
- **NATS** - Message queue (with JetStream)

### services.yaml

Application microservices (14 services):

- **svc-ingestion** - Document ingestion
- **svc-extract** - Data extraction
- **svc-kg** - Knowledge graph
- **svc-rag-indexer** - RAG indexing (ML)
- **svc-rag-retriever** - RAG retrieval (ML)
- **svc-forms** - Form processing
- **svc-hmrc** - HMRC integration
- **svc-ocr** - OCR processing (ML)
- **svc-rpa** - RPA automation
- **svc-normalize-map** - Data normalization
- **svc-reason** - Reasoning engine
- **svc-firm-connectors** - Firm integrations
- **svc-coverage** - Coverage analysis
- **ui-review** - Review UI (Next.js)

### monitoring.yaml

Monitoring and observability stack:

- **Prometheus** - Metrics collection
- **Grafana** - Dashboards and visualization
- **Loki** - Log aggregation
- **Promtail** - Log collection

### external.yaml (optional)

External services that may already exist:

- **Traefik** - Reverse proxy and load balancer
- **Authentik** - SSO and authentication
- **Gitea** - Git repository and container registry
- **Nextcloud** - File storage
- **Portainer** - Docker management UI

## Environment Configurations

### Local Development

- **Domain**: `localhost` or `*.local.harkon.co.uk`
- **SSL**: Self-signed certificates
- **Auth**: Optional (can disable Authentik)
- **Registry**: Local Docker registry or Gitea
- **Passwords**: Simple (postgres, admin, etc.)
- **Purpose**: Local development and testing
- **Traefik Dashboard**: Exposed on port 8080

### Development Server

- **Domain**: `*.dev.harkon.co.uk`
- **SSL**: Let's Encrypt (DNS-01 via GoDaddy)
- **Auth**: Authentik SSO enabled
- **Registry**: Gitea container registry
- **Passwords**: Strong (auto-generated)
- **Purpose**: Staging and integration testing
- **Traefik Dashboard**: Protected by Authentik

### Production Server

- **Domain**: `*.harkon.co.uk`
- **SSL**: Let's Encrypt (DNS-01 via GoDaddy)
- **Auth**: Authentik SSO enabled
- **Registry**: Gitea container registry
- **Passwords**: Strong (auto-generated)
- **Purpose**: Production deployment
- **Traefik Dashboard**: Protected by Authentik
- **Monitoring**: Full stack enabled

## Docker Networks

All environments use two networks:

### frontend

- Public-facing services
- Connected to Traefik
- Services: UI, Grafana, Vault, MinIO console

### backend

- Internal services
- Not directly accessible
- Services: Databases, message queues, internal APIs

## Volume Naming

Volumes are named consistently across environments:

- `postgres_data`
- `neo4j_data`
- `neo4j_logs`
- `qdrant_data`
- `minio_data`
- `vault_data`
- `redis_data`
- `nats_data`
- `prometheus_data`
- `grafana_data`
- `loki_data`

## Deployment Workflow

### 1. Set Up Environment

```bash
# Copy the template, then fill in the real values
cp infra/environments/production/.env.example infra/environments/production/.env
vim infra/environments/production/.env
```

### 2. Generate Secrets

```bash
./scripts/generate-production-secrets.sh
```

### 3. Set Up Networks

```bash
./infra/scripts/setup-networks.sh
```

### 4. Deploy Infrastructure

```bash
./infra/scripts/deploy.sh production infrastructure
```

### 5. Deploy Monitoring

```bash
./infra/scripts/deploy.sh production monitoring
```

### 6. Deploy Services

```bash
./infra/scripts/deploy.sh production services
```

## Key Features

### ✅ Multi-Environment Support

A single codebase deploys to local, development, and production with environment-specific configurations.
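
To make the multi-environment idea concrete, the sketch below shows one way an environment and a stack could be combined into a single Docker Compose invocation. The internals of `deploy.sh` are not shown in this document, so the function name `compose_cmd` and the `INFRA_DIR` variable are assumptions for illustration; only the directory layout described here and the standard `--env-file` and `-f` options of `docker compose` are taken as given.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the real deploy.sh): prints the docker compose
# command for an environment/stack pair. It only echoes the command, so it
# is safe to run on a machine without Docker.
INFRA_DIR="${INFRA_DIR:-infra}"

compose_cmd() {
  local env="$1" stack="$2"

  # Only the three environments described in this document are accepted.
  case "$env" in
    local|development|production) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac

  # One compose file per stack under base/.
  case "$stack" in
    infrastructure|services|monitoring|external) ;;
    *) echo "unknown stack: $stack" >&2; return 1 ;;
  esac

  printf 'docker compose --env-file %s -f %s up -d\n' \
    "${INFRA_DIR}/environments/${env}/.env" \
    "${INFRA_DIR}/base/${stack}.yaml"
}

compose_cmd production infrastructure
```

The same base compose file is reused for every environment; only the `.env` file passed to Compose changes. A real script would likely run the command instead of printing it and add checks that the `.env` file exists.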

### ✅ Modular Architecture

Services are split into logical groups (infrastructure, monitoring, services, external) for independent deployment.

### ✅ Unified Deployment

A single `deploy.sh` script handles all environments and stacks.

### ✅ Environment Isolation

Each environment has its own `.env` file with appropriate secrets and configurations.

### ✅ Shared Configurations

Common service configs live in the `configs/` directory and are referenced by all environments.

### ✅ Security Best Practices

- Secrets in gitignored `.env` files
- Strong password generation
- Authentik SSO integration
- SSL/TLS everywhere (Let's Encrypt on the servers, self-signed certificates locally)

### ✅ Easy Maintenance

- Clear directory structure
- Comprehensive documentation
- Migration guide from old structure
- Troubleshooting guides

## Service Access

### Local

- http://localhost:3000 - Grafana
- http://localhost:9093 - MinIO
- http://localhost:8200 - Vault
- http://localhost:8080 - Traefik Dashboard

### Development

- https://grafana.dev.harkon.co.uk
- https://minio.dev.harkon.co.uk
- https://vault.dev.harkon.co.uk
- https://ui-review.dev.harkon.co.uk

### Production

- https://grafana.harkon.co.uk
- https://minio.harkon.co.uk
- https://vault.harkon.co.uk
- https://ui-review.harkon.co.uk

## Configuration Management

### Environment Variables

All configuration is supplied via environment variables in `.env` files:

- Domain settings
- Database passwords
- API keys
- OAuth secrets
- Registry credentials

### Service Configs

Static configurations in the `configs/` directory:

- Traefik routing rules
- Grafana dashboards
- Prometheus scrape configs
- Loki retention policies

### Secrets Management

- Development/Production: Vault
- Local: Environment variables
- Rotation: `generate-production-secrets.sh`

## Monitoring and Observability

### Metrics (Prometheus)

- Service health
- Resource usage
- Request rates
- Error rates

### Logs (Loki)

- Centralized logging
- Query via Grafana
- Retention policies
- Log aggregation

### Dashboards (Grafana)

- Infrastructure overview
- Service metrics
- Application performance
- Business metrics

### Alerts

- Prometheus Alertmanager
- Slack/Email notifications
- PagerDuty integration

## Backup Strategy

### What to Back Up

- PostgreSQL database
- Neo4j graph data
- Vault secrets
- MinIO objects
- Qdrant vectors
- Grafana dashboards

### How to Back Up

```bash
# Automated backup script
./scripts/backup-volumes.sh production

# Manual backup: archive the postgres_data volume into the current directory
docker run --rm \
  -v postgres_data:/data \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/postgres.tar.gz /data
```

### Backup Schedule

- Daily: Databases
- Weekly: Full system
- Monthly: Archive

## Disaster Recovery

### Recovery Steps

1. Restore infrastructure
2. Restore volumes from backup
3. Deploy services
4. Verify functionality
5. Update DNS if needed

### RTO/RPO

- **RTO**: 4 hours (Recovery Time Objective)
- **RPO**: 24 hours (Recovery Point Objective)

## Next Steps

1. Review [DEPLOYMENT_GUIDE.md](DEPLOYMENT_GUIDE.md) for deployment instructions
2. Review [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) if migrating from old structure
3. Set up environment files
4. Deploy to local first
5. Test in development
6. Deploy to production
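
For step 3, a small helper can create a missing `.env` from its template without ever overwriting an existing one. This is a sketch that assumes the `environments/<name>/` layout described in this document; the function name `init_env_file` and its arguments are hypothetical and not part of the repository's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create <infra_dir>/environments/<env>/.env from
# .env.example. An existing .env is left untouched so real secrets are
# never clobbered.
init_env_file() {
  local infra_dir="$1" env="$2"
  local dir="${infra_dir}/environments/${env}"

  if [ ! -f "${dir}/.env.example" ]; then
    echo "missing template: ${dir}/.env.example" >&2
    return 1
  fi

  if [ -f "${dir}/.env" ]; then
    echo "kept existing ${dir}/.env"
  else
    cp "${dir}/.env.example" "${dir}/.env"
    chmod 600 "${dir}/.env"   # restrict access, since the file will hold secrets
    echo "created ${dir}/.env"
  fi
}

# Example (run from the repository root):
# for env in local development production; do init_env_file infra "$env"; done
```

The copied file still contains only template values, so it needs to be edited (or filled in by the secrets script) before deploying.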