ai-tax-agent/infra/STRUCTURE_OVERVIEW.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00


Infrastructure Structure Overview

New Multi-Environment Structure

infra/
├── README.md                      # Main infrastructure documentation
├── DEPLOYMENT_GUIDE.md            # Complete deployment guide
├── MIGRATION_GUIDE.md             # Migration from old structure
├── STRUCTURE_OVERVIEW.md          # This file
│
├── base/                          # Base compose files (environment-agnostic)
│   ├── infrastructure.yaml        # Core infrastructure services
│   ├── services.yaml              # Application microservices
│   ├── monitoring.yaml            # Monitoring stack
│   └── external.yaml              # External services (Traefik, Authentik, etc.)
│
├── environments/                  # Environment-specific configurations
│   ├── local/                     # Local development
│   │   ├── .env.example          # Template
│   │   └── .env                  # Actual config (gitignored)
│   ├── development/               # Development server
│   │   ├── .env.example          # Template
│   │   └── .env                  # Actual config (gitignored)
│   └── production/                # Production server
│       ├── .env.example          # Template
│       └── .env                  # Actual config (gitignored)
│
├── configs/                       # Service configuration files
│   ├── traefik/                  # Traefik configs
│   │   ├── config/               # Dynamic configuration
│   │   │   ├── middlewares.yml
│   │   │   ├── routers.yml
│   │   │   └── services.yml
│   │   ├── traefik.yml           # Static configuration
│   │   └── .provider.env         # GoDaddy API credentials (gitignored)
│   ├── grafana/                  # Grafana configs
│   │   ├── dashboards/           # Dashboard JSON files
│   │   └── provisioning/         # Datasources, dashboards
│   ├── prometheus/               # Prometheus config
│   │   └── prometheus.yml
│   ├── loki/                     # Loki config
│   │   └── loki-config.yml
│   ├── promtail/                 # Promtail config
│   │   └── promtail-config.yml
│   ├── vault/                    # Vault config
│   │   └── config/
│   └── authentik/                # Authentik bootstrap
│       ├── bootstrap.yaml
│       ├── custom-templates/
│       └── media/
│
├── certs/                        # SSL certificates (gitignored)
│   ├── local/                    # Self-signed certs
│   ├── development/              # Let's Encrypt certs
│   └── production/               # Let's Encrypt certs
│
├── docker/                       # Dockerfile templates
│   ├── base-runtime.Dockerfile   # Base image for all services
│   ├── base-ml.Dockerfile        # Base image for ML services
│   └── Dockerfile.ml-service.template
│
└── scripts/                      # Deployment and utility scripts
    ├── deploy.sh                 # Main deployment script
    ├── setup-networks.sh         # Create Docker networks
    └── cleanup.sh                # Cleanup script

Base Compose Files

infrastructure.yaml

Core infrastructure services needed by the application:

  • Vault - Secrets management
  • MinIO - Object storage (S3-compatible)
  • PostgreSQL - Relational database
  • Neo4j - Graph database
  • Qdrant - Vector database
  • Redis - Cache and session store
  • NATS - Message queue (with JetStream)

services.yaml

Application microservices (14 services):

  • svc-ingestion - Document ingestion
  • svc-extract - Data extraction
  • svc-kg - Knowledge graph
  • svc-rag-indexer - RAG indexing (ML)
  • svc-rag-retriever - RAG retrieval (ML)
  • svc-forms - Form processing
  • svc-hmrc - HMRC integration
  • svc-ocr - OCR processing (ML)
  • svc-rpa - RPA automation
  • svc-normalize-map - Data normalization
  • svc-reason - Reasoning engine
  • svc-firm-connectors - Firm integrations
  • svc-coverage - Coverage analysis
  • ui-review - Review UI (Next.js)

monitoring.yaml

Monitoring and observability stack:

  • Prometheus - Metrics collection
  • Grafana - Dashboards and visualization
  • Loki - Log aggregation
  • Promtail - Log collection

external.yaml (optional)

External services that may already exist:

  • Traefik - Reverse proxy and load balancer
  • Authentik - SSO and authentication
  • Gitea - Git repository and container registry
  • Nextcloud - File storage
  • Portainer - Docker management UI

Environment Configurations

Local Development

  • Domain: localhost or *.local.harkon.co.uk
  • SSL: Self-signed certificates
  • Auth: Optional (can disable Authentik)
  • Registry: Local Docker registry or Gitea
  • Passwords: Simple (postgres, admin, etc.)
  • Purpose: Local development and testing
  • Traefik Dashboard: Exposed on port 8080

Development Server

  • Domain: *.dev.harkon.co.uk
  • SSL: Let's Encrypt (DNS-01 via GoDaddy)
  • Auth: Authentik SSO enabled
  • Registry: Gitea container registry
  • Passwords: Strong (auto-generated)
  • Purpose: Staging and integration testing
  • Traefik Dashboard: Protected by Authentik

Production Server

  • Domain: *.harkon.co.uk
  • SSL: Let's Encrypt (DNS-01 via GoDaddy)
  • Auth: Authentik SSO enabled
  • Registry: Gitea container registry
  • Passwords: Strong (auto-generated)
  • Purpose: Production deployment
  • Traefik Dashboard: Protected by Authentik
  • Monitoring: Full stack enabled

Docker Networks

All environments use two networks:

frontend

  • Public-facing services
  • Connected to Traefik
  • Services: UI, Grafana, Vault, MinIO console

backend

  • Internal services
  • Not directly accessible
  • Services: Databases, message queues, internal APIs
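
As a rough illustration, the two networks could be created idempotently along the following lines. This is only a sketch of what a script such as setup-networks.sh might do, not its actual contents; the function names and the DOCKER override variable are assumptions introduced here.

```shell
# Hypothetical sketch; the real setup-networks.sh may differ.
# Network names are taken from the section above.
NETWORKS="frontend backend"

# Create a network only if it does not exist yet, so reruns are harmless.
# DOCKER can be overridden to point at a different binary or a test stub.
ensure_network() {
  local name="$1"
  if ${DOCKER:-docker} network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    ${DOCKER:-docker} network create "$name"
  fi
}

# Ensure every network in NETWORKS exists.
setup_networks() {
  local name
  for name in $NETWORKS; do
    ensure_network "$name"
  done
}
```

Calling setup_networks on a host with Docker installed creates whichever of the two networks is missing and leaves existing ones untouched.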

Volume Naming

Volumes are named consistently across environments:

  • postgres_data
  • neo4j_data
  • neo4j_logs
  • qdrant_data
  • minio_data
  • vault_data
  • redis_data
  • nats_data
  • prometheus_data
  • grafana_data
  • loki_data

Deployment Workflow

1. Set Up Environment

cp infra/environments/production/.env.example infra/environments/production/.env
vim infra/environments/production/.env

2. Generate Secrets

./scripts/generate-production-secrets.sh

3. Set Up Networks

./infra/scripts/setup-networks.sh

4. Deploy Infrastructure

./infra/scripts/deploy.sh production infrastructure

5. Deploy Monitoring

./infra/scripts/deploy.sh production monitoring

6. Deploy Services

./infra/scripts/deploy.sh production services
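
Steps 4 to 6 can be chained so that a failure in one stack stops the rollout before the next stack starts. The wrapper below is a hypothetical sketch: run, deploy_all, and the DRY_RUN variable are not part of the repository; only the deploy.sh invocations come from the steps above.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Deploy the three stacks in the order used above and stop at the first failure.
deploy_all() {
  local env_name="$1"
  local stack
  for stack in infrastructure monitoring services; do
    run ./infra/scripts/deploy.sh "$env_name" "$stack" || return 1
  done
}
```

Running DRY_RUN=1 deploy_all production prints the three commands without executing them, which is a cheap way to check the order before a real rollout.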

Key Features

Multi-Environment Support

A single codebase deploys to local, development, and production, each with its own environment-specific configuration.

Modular Architecture

Services are split into logical groups (infrastructure, monitoring, services, external) so that each group can be deployed independently.

Unified Deployment

A single deploy.sh script handles all environments and stacks.
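
To make the idea concrete, here is a minimal sketch of how a wrapper in the style of deploy.sh could turn an environment and a stack into a docker compose call. It assumes that each stack maps directly to a file under infra/base/ and each environment to infra/environments/<env>/.env; the real script may do more, or work differently. compose_command is a hypothetical name.

```shell
# Hypothetical sketch: build the docker compose command for <environment> <stack>.
# Assumes one compose file per stack in infra/base/ and one .env per environment.
compose_command() {
  local env_name="$1"
  local stack="$2"

  # Reject anything that is not one of the three documented environments.
  case "$env_name" in
    local|development|production) ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac

  # Reject anything that is not one of the four base compose files.
  case "$stack" in
    infrastructure|services|monitoring|external) ;;
    *) echo "unknown stack: $stack" >&2; return 1 ;;
  esac

  echo "docker compose --env-file infra/environments/$env_name/.env -f infra/base/$stack.yaml up -d"
}
```

For example, compose_command production monitoring prints the command for the monitoring stack, which can be reviewed before it is run from the repository root.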

Environment Isolation

Each environment has its own .env file with appropriate secrets and configurations.
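
Because each .env is created by hand from its .env.example, the two files can drift apart. A small check like the one below could flag keys that exist in the template but not in the real file. It is a sketch under the assumption that both files use plain KEY=value lines; missing_env_keys is a hypothetical helper, and a key with an empty value still counts as present.

```shell
# Hypothetical helper: list keys defined in the template but absent from the real file.
# Usage: missing_env_keys <path to .env.example> <path to .env>
missing_env_keys() {
  local example="$1"
  local actual="$2"
  local key
  # Take the KEY part of every KEY=value line; comments and blank lines do not match.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$example" | cut -d= -f1 | while read -r key; do
    grep -q "^${key}=" "$actual" || echo "$key"
  done
}
```

Running missing_env_keys infra/environments/production/.env.example infra/environments/production/.env before a deployment prints nothing when every template key is present.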

Shared Configurations

Common service configs live in the configs/ directory and are referenced by all environments.

Security Best Practices

  • Secrets in gitignored .env files
  • Strong password generation
  • Authentik SSO integration
  • SSL/TLS everywhere (Let's Encrypt on development and production, self-signed certificates locally)

Easy Maintenance

  • Clear directory structure
  • Comprehensive documentation
  • Migration guide from old structure
  • Troubleshooting guides

Service Access

Local

Development

Production

Configuration Management

Environment Variables

Environment-specific settings are supplied via environment variables in .env files:

  • Domain settings
  • Database passwords
  • API keys
  • OAuth secrets
  • Registry credentials

Service Configs

Static configurations live in the configs/ directory:

  • Traefik routing rules
  • Grafana dashboards
  • Prometheus scrape configs
  • Loki retention policies

Secrets Management

  • Development/Production: Vault
  • Local: Environment variables
  • Rotation: generate-production-secrets.sh
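
The contents of generate-production-secrets.sh are not shown here. As an illustration of what "strong, auto-generated" can mean in practice, the sketch below draws a random alphanumeric string from /dev/urandom. generate_secret is a hypothetical helper, and the default length of 32 characters is an assumption, not a project setting.

```shell
# Hypothetical helper: print a random alphanumeric secret of the given length.
# Default length (32) is an assumption made for this sketch.
generate_secret() {
  local length="${1:-32}"
  # tr drops every byte that is not a letter or digit; head stops after <length> characters.
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$length"
}
```

A value produced this way could be pasted into a gitignored .env file or written to Vault, depending on the environment.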

Monitoring and Observability

Metrics (Prometheus)

  • Service health
  • Resource usage
  • Request rates
  • Error rates

Logs (Loki)

  • Centralized logging
  • Query via Grafana
  • Retention policies
  • Log aggregation

Dashboards (Grafana)

  • Infrastructure overview
  • Service metrics
  • Application performance
  • Business metrics

Alerts

  • Prometheus Alertmanager
  • Slack/Email notifications
  • PagerDuty integration

Backup Strategy

What to Back Up

  • PostgreSQL database
  • Neo4j graph data
  • Vault secrets
  • MinIO objects
  • Qdrant vectors
  • Grafana dashboards

How to Back Up

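# Automated backup script
./scripts/backup-volumes.sh production

# Manual backup of a single volume (stop the database container first so the copied files are consistent)
# The volume is mounted read-only; "-C / data" stores relative paths and avoids tar's leading-slash warning
docker run --rm -v postgres_data:/data:ro -v "$(pwd)":/backup alpine tar czf /backup/postgres.tar.gz -C / data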
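
The manual command covers one volume. A loop over the volumes that correspond to the "what to back up" list could look like the sketch below. It only prints the commands so they can be reviewed first; backup_commands, the BACKUP_VOLUMES list, and the archive naming scheme are assumptions made here, and backup-volumes.sh may work differently.

```shell
# Volumes matching the backup list above (names from the Volume Naming section).
BACKUP_VOLUMES="postgres_data neo4j_data vault_data minio_data qdrant_data grafana_data"

# Hypothetical helper: print one backup command per volume.
# Usage: backup_commands <absolute destination directory> <date stamp>
backup_commands() {
  local dest="$1"
  local stamp="$2"
  local volume
  for volume in $BACKUP_VOLUMES; do
    # Each volume is mounted read-only and archived as <volume>-<stamp>.tar.gz.
    echo "docker run --rm -v $volume:/data:ro -v $dest:/backup alpine tar czf /backup/$volume-$stamp.tar.gz -C / data"
  done
}
```

For instance, backup_commands /srv/backups "$(date +%F)" prints six commands, one per volume, where /srv/backups is a placeholder path.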

Backup Schedule

  • Daily: Databases
  • Weekly: Full system
  • Monthly: Archive

Disaster Recovery

Recovery Steps

  1. Restore infrastructure
  2. Restore volumes from backup
  3. Deploy services
  4. Verify functionality
  5. Update DNS if needed
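
Step 2 is the mirror image of the manual backup command. The sketch below prints a restore command for one volume, assuming the archive was produced by that manual command, so its members are stored under data/. restore_volume_command is a hypothetical helper, and the services using the volume should be stopped before the printed command is run.

```shell
# Hypothetical helper: print the command that restores one volume from an archive.
# Usage: restore_volume_command <volume name> <absolute path to .tar.gz on the host>
restore_volume_command() {
  local volume="$1"
  local archive="$2"
  local dir file
  dir=$(dirname "$archive")
  file=$(basename "$archive")
  # Archive members are stored as data/..., so extracting at / fills the /data mount.
  echo "docker run --rm -v $volume:/data -v $dir:/backup:ro alpine tar xzf /backup/$file -C /"
}
```

For example, restore_volume_command postgres_data /srv/backups/postgres.tar.gz prints the command for the PostgreSQL volume; the path is a placeholder.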

RTO/RPO

  • RTO: 4 hours (Recovery Time Objective)
  • RPO: 24 hours (Recovery Point Objective)

Next Steps

  1. Review DEPLOYMENT_GUIDE.md for deployment instructions
  2. Review MIGRATION_GUIDE.md if migrating from old structure
  3. Set up environment files
  4. Deploy to local first
  5. Test in development
  6. Deploy to production