harkon f0f7674b8d
clean up base infra
2025-10-11 11:42:43 +01:00


Isolated Stacks Deployment Plan

Executive Summary

This plan outlines the strategy to host both the AI Tax Agent application and company services (Nextcloud, Gitea, Portainer, Authentik) on the remote server at 141.136.35.199 while maintaining an efficient local development workflow.

Current State Analysis

Remote Server (141.136.35.199)

  • Location: /opt/compose/
  • Existing Services:
    • Traefik v3.5.1 (reverse proxy with GoDaddy DNS challenge)
    • Authentik 2025.8.1 (SSO/Authentication)
    • Gitea 1.24.5 (Git hosting)
    • Nextcloud (Cloud storage)
    • Portainer 2.33.1 (Docker management)
  • Networks: frontend and backend (external)
  • Domain: harkon.co.uk
  • SSL: Let's Encrypt via GoDaddy DNS challenge
  • Exposed Subdomains:
    • traefik.harkon.co.uk
    • auth.harkon.co.uk
    • gitea.harkon.co.uk
    • cloud.harkon.co.uk
    • portainer.harkon.co.uk

Local Repository (infra/compose/)

  • Compose Files:
    • docker-compose.local.yml - Full stack for local development
    • docker-compose.backend.yml - Backend services (appears to be production-ready)
  • Application Services:
    • 13 microservices (svc-ingestion, svc-extract, svc-forms, svc-hmrc, etc.)
    • UI Review application
    • Infrastructure: Vault, MinIO, Qdrant, Neo4j, Postgres, Redis, NATS, Prometheus, Grafana, Loki
  • Networks: ai-tax-agent-frontend and ai-tax-agent-backend
  • Domain: local.lan (for development)
  • Authentication: Authentik with ForwardAuth middleware

Challenges & Conflicts

1. Duplicate Services

  • Both environments have Traefik and Authentik
  • Need to decide: shared vs. isolated

2. Network Naming

  • Remote: frontend, backend
  • Local: ai-tax-agent-frontend, ai-tax-agent-backend
  • Production: Needs consistent naming

3. Domain Management

  • Remote: *.harkon.co.uk (public)
  • Local: *.local.lan (development)
  • Production: Needs subdomains such as app.harkon.co.uk and api.harkon.co.uk

4. SSL Certificates

  • Remote: GoDaddy DNS challenge (production)
  • Local: Self-signed certificates
  • Production: Must use GoDaddy DNS challenge

5. Resource Isolation

  • Company services need to remain stable
  • Application services need independent deployment/rollback

Decision: Keep Stacks Completely Separate

We will deploy the company services and the AI Tax Agent as two fully isolated stacks, each with its own Traefik and Authentik. This confines the blast radius of failures and upgrades to a single stack and avoids naming and DNS conflicts across environments.

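Key implications:

  • Separate external networks and DNS namespaces per stack
  • Each stack runs its own edge proxy (Traefik) and IdP (Authentik), so each can be upgraded and rolled back independently
  • Slightly higher resource usage in exchange for strong isolation
  • Both stacks share one host and public IP, so only one Traefik instance can bind ports 80/443; the other must listen on different host ports or be reached through the first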

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Internet (*.harkon.co.uk)                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                    ┌────▼────┐
                    │ Traefik │ (Port 80/443)
                    │ v3.5.1  │
                    └────┬────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
   ┌────▼─────┐    ┌────▼────┐     ┌────▼─────┐
   │Authentik │    │ Company │     │   App    │
   │   SSO    │    │Services │     │ Services │
   └──────────┘    └─────────┘     └──────────┘
                         │                │
                    ┌────┴────┐      ┌────┴────┐
                    │ Gitea   │      │ Vault   │
                    │Nextcloud│      │ MinIO   │
                    │Portainer│      │ Neo4j   │
                    └─────────┘      │ Qdrant  │
                                     │ Postgres│
                                     │ Redis   │
                                     │ NATS    │
                                     │ 13 SVCs │
                                     │ UI      │
                                     └─────────┘

Directory Structure (per stack)

/opt/compose/<stack>/
├── traefik/                    # Stack-local reverse proxy
│   ├── compose.yaml
│   ├── config/
│   │   ├── traefik.yaml       # Static config
│   │   ├── dynamic-company.yaml
│   │   └── dynamic-app.yaml
│   └── certs/
├── authentik/                  # Stack-local SSO
│   ├── compose.yaml
│   └── ...
├── company/                    # Company services namespace
│   ├── gitea/
│   │   └── compose.yaml
│   ├── nextcloud/
│   │   └── compose.yaml
│   └── portainer/
│       └── compose.yaml
└── ai-tax-agent/              # Application namespace (if this is the app stack)
    ├── .env                   # Production environment
    ├── infrastructure.yaml    # Vault, MinIO, Neo4j, Qdrant, etc.
    ├── services.yaml          # All microservices
    └── monitoring.yaml        # Prometheus, Grafana, Loki
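The layout above can be created with a small scaffolding helper. This is a minimal sketch that assumes the root is /opt/compose and that the stack names are caller-chosen (both are illustrative). The `app` and `company` kinds follow the tree's note that ai-tax-agent/ exists only in the app stack.

```bash
#!/usr/bin/env bash
# Create the per-stack directory layout described in this plan (sketch).
set -euo pipefail

# scaffold_stack ROOT STACK KIND, where KIND is "company" or "app" (hypothetical convention).
scaffold_stack() {
  local root="$1" stack="$2" kind="$3" base
  base="$root/$stack"
  # Every stack gets its own Traefik and Authentik directories.
  mkdir -p "$base/traefik/config" "$base/traefik/certs" "$base/authentik"
  if [[ "$kind" == "company" ]]; then
    mkdir -p "$base/company/gitea" "$base/company/nextcloud" "$base/company/portainer"
  else
    mkdir -p "$base/ai-tax-agent"
    # The production .env holds secrets, so keep it owner-only.
    touch "$base/ai-tax-agent/.env"
    chmod 600 "$base/ai-tax-agent/.env"
  fi
}

# Only create directories when called with all three arguments.
if [[ $# -eq 3 ]]; then
  scaffold_stack "$@"
fi
```

For example, `./scaffold.sh /opt/compose apa app` creates the app stack skeleton. The script name and stack name are hypothetical.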

Network Strategy

  • Use stack-scoped network names to avoid collisions: apa-frontend, apa-backend.
  • Only attach services that must be public to apa-frontend.
  • Keep internal communication on apa-backend.
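Because compose files that declare these networks as external expect them to exist already, the networks can be created once, idempotently, before any stack starts. The sketch below makes two assumptions: a `DOCKER` override so it can be dry-run with a stub, and an `apply` argument that gates any real change.

```bash
#!/usr/bin/env bash
# Idempotently create the stack-scoped external networks (sketch).
set -euo pipefail

# Overridable so the logic can be exercised without a Docker daemon.
DOCKER="${DOCKER:-docker}"

ensure_network() {
  local name="$1"
  if "$DOCKER" network inspect "$name" >/dev/null 2>&1; then
    echo "exists: $name"
  else
    "$DOCKER" network create --driver bridge "$name" >/dev/null
    echo "created: $name"
  fi
}

# Only touch Docker when explicitly asked to.
if [[ "${1:-}" == "apply" ]]; then
  for net in apa-frontend apa-backend; do
    ensure_network "$net"
  done
fi
```

Each compose file then lists `apa-frontend` and/or `apa-backend` under its top-level `networks:` key with `external: true`. Only publicly routed services join apa-frontend.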

Domain Mapping

Company Services (existing):

  • traefik.harkon.co.uk - Traefik dashboard
  • auth.harkon.co.uk - Authentik SSO
  • gitea.harkon.co.uk - Git hosting
  • cloud.harkon.co.uk - Nextcloud
  • portainer.harkon.co.uk - Docker management

Application Services (app stack):

  • review.<domain> - Review UI
  • api.<domain> - API Gateway (microservices via Traefik)
  • vault.<domain> - Vault UI (admin only)
  • minio.<domain> - MinIO Console (admin only)
  • neo4j.<domain> - Neo4j Browser (admin only)
  • qdrant.<domain> - Qdrant UI (admin only)
  • grafana.<domain> - Grafana (monitoring)
  • prometheus.<domain> - Prometheus (admin only)
  • loki.<domain> - Loki (admin only)
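To show how one row of this mapping turns into routing configuration, the helper below prints Docker labels for a single Traefik router. It is a sketch with several assumed names: the entrypoint `websecure`, the certificate resolver `godaddy`, and the router name and port in the example. Match them to the stack's actual static Traefik config.

```bash
#!/usr/bin/env bash
# Print compose labels routing one host to one container (sketch).
# Assumed names: entrypoint "websecure", cert resolver "godaddy".
set -euo pipefail

# traefik_labels ROUTER HOST PORT [MIDDLEWARES]
traefik_labels() {
  local name="$1" host="$2" port="$3" middlewares="${4:-}"
  printf -- '- "traefik.enable=true"\n'
  # Tell Traefik which network to use when the container is on several.
  printf -- '- "traefik.docker.network=apa-frontend"\n'
  printf -- '- "traefik.http.routers.%s.rule=Host(`%s`)"\n' "$name" "$host"
  printf -- '- "traefik.http.routers.%s.entrypoints=websecure"\n' "$name"
  printf -- '- "traefik.http.routers.%s.tls.certresolver=godaddy"\n' "$name"
  if [[ -n "$middlewares" ]]; then
    printf -- '- "traefik.http.routers.%s.middlewares=%s"\n' "$name" "$middlewares"
  fi
  printf -- '- "traefik.http.services.%s.loadbalancer.server.port=%s"\n' "$name" "$port"
}

if [[ $# -ge 3 ]]; then
  traefik_labels "$@"
fi
```

For example, `traefik_labels review-ui review.harkon.co.uk 3000 reviewer-auth@file` prints a block that can be pasted under the service's `labels:` key. The router name `review-ui` and port 3000 are hypothetical.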

Authentication Strategy

Authentik Configuration:

  1. Company Group - Access to Gitea, Nextcloud, Portainer
  2. App Admin Group - Full access to all app services
  3. App User Group - Access to Review UI and API
  4. App Reviewer Group - Access to Review UI only

Middleware Configuration:

  • authentik-forwardauth - Standard auth for all services
  • admin-auth - Requires admin group (Vault, MinIO, Neo4j, etc.)
  • reviewer-auth - Requires reviewer or higher
  • rate-limit - Standard rate limiting
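  • api-rate-limit - Stricter API rate limiting

Traefik's ForwardAuth middleware only delegates each request to Authentik and acts on the response. The group requirements behind admin-auth and reviewer-auth therefore have to be enforced in Authentik (policy bindings on the corresponding applications), not in Traefik.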
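These middlewares could be declared once in the app stack's dynamic config (e.g. dynamic-app.yaml) and referenced from routers as `authentik-forwardauth@file`. The sketch below writes such a file from a heredoc and rests on several assumptions. The outpost host `apa-authentik-server:9000`, the forwarded header subset and the rate-limit numbers are all assumed. The `/outpost.goauthentik.io/auth/traefik` path is the embedded outpost endpoint from Authentik's Traefik integration docs. admin-auth and reviewer-auth are omitted here because, as noted above, their group checks live in Authentik.

```bash
#!/usr/bin/env bash
# Write Traefik dynamic middleware config for the app stack (sketch).
set -euo pipefail

# Assumed internal address of the Authentik server/outpost.
AUTHENTIK_URL="${AUTHENTIK_URL:-http://apa-authentik-server:9000}"

write_middlewares() {
  cat <<EOF
http:
  middlewares:
    authentik-forwardauth:
      forwardAuth:
        address: "${AUTHENTIK_URL}/outpost.goauthentik.io/auth/traefik"
        trustForwardHeader: true
        authResponseHeaders:
          - X-authentik-username
          - X-authentik-groups
          - X-authentik-email
          - X-authentik-uid
    rate-limit:
      rateLimit:
        average: 100
        burst: 50
    api-rate-limit:
      rateLimit:
        average: 20
        burst: 10
EOF
}

# Usage: ./write-middlewares.sh write <path-to-dynamic-app.yaml>
if [[ "${1:-}" == "write" ]]; then
  write_middlewares > "${2:?usage: $0 write <path>}"
fi
```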

Implementation Notes

  • infra/base/infrastructure.yaml now includes Traefik and Authentik in the infrastructure stack with stack-scoped networks and service names.
  • All infrastructure component service keys and container names use the apa- prefix to avoid DNS collisions on shared Docker hosts.
  • Traefik static and dynamic configs live under infra/base/traefik/config/.

Local Development Workflow

Development Environment

Keep Existing Setup:

  • Use docker-compose.local.yml as-is
  • Domain: *.local.lan
  • Self-signed certificates
  • Isolated networks: ai-tax-agent-frontend, ai-tax-agent-backend
  • Full stack runs locally

Benefits:

  • No dependency on remote server
  • Fast iteration
  • Complete isolation
  • Works offline

Development Commands

# Local development
make bootstrap          # Initial setup
make up                 # Start all services
make down               # Stop all services
make logs SERVICE=svc-ingestion

# Build and test
make build              # Build all images
make test               # Run tests
make test-integration   # Integration tests

# Deploy to production
make deploy-production  # Deploy to remote server
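This plan does not spell out what `make deploy-production` runs. One plausible shape is a script that syncs the compose files and then brings the three files up in dependency order. This sketch is hypothetical, not the repository's target. It assumes the deploy user and host from Phase 1, the /opt/compose/ai-tax-agent layout, and rsync plus SSH as transport. `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical body for "make deploy-production" (sketch).
set -euo pipefail

REMOTE="${REMOTE:-deploy@141.136.35.199}"
REMOTE_DIR="${REMOTE_DIR:-/opt/compose/ai-tax-agent}"
DRY_RUN="${DRY_RUN:-0}"

# Print instead of execute when DRY_RUN=1.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  local src="$1" f
  # Trailing slash copies the directory contents; --delete is deliberately omitted.
  run rsync -az "$src"/ "$REMOTE:$REMOTE_DIR/"
  # Infrastructure first, then services, then monitoring.
  for f in infrastructure.yaml services.yaml monitoring.yaml; do
    run ssh "$REMOTE" "cd $REMOTE_DIR && docker compose -f $f pull && docker compose -f $f up -d"
  done
}

if [[ $# -eq 1 ]]; then
  deploy "$1"
fi
```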

Production Deployment Strategy

Phase 1: Preparation (Week 1)

  1. Backup Current State

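    ssh deploy@141.136.35.199
    cd /opt/compose
    tar -czf ~/backup-$(date +%Y%m%d).tar.gz .

    This archives only what lives under /opt/compose (compose files, configs and bind mounts). Named Docker volumes are stored outside this tree by default (under /var/lib/docker/volumes) and need a separate backup.
    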
  2. Create Production Environment File

    • Copy infra/compose/env.example to infra/compose/.env.production
    • Update all secrets and passwords
    • Set DOMAIN=harkon.co.uk
    • Configure GoDaddy API credentials
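  3. Update Traefik Configuration

    • Adapt the local Traefik config for the app stack's own Traefik (infra/base/traefik/config/) rather than merging it into the company Traefik
    • Add application routes
    • Configure Authentik ForwardAuth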
  4. Prepare Docker Images

    • Build all application images
    • Tag them with version numbers
    • Push them to a container registry (Gitea registry or Docker Hub)
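Step 2 can be scripted so that no placeholder secret reaches production. The sketch below copies the example file, sets `DOMAIN`, replaces or appends individual keys, and generates secrets from /dev/urandom. Key names other than `DOMAIN` are assumptions about env.example. `GODADDY_API_KEY` and `GODADDY_API_SECRET` are the variables Traefik's GoDaddy DNS provider reads, but check them against the existing remote Traefik setup.

```bash
#!/usr/bin/env bash
# Prepare .env.production from env.example (sketch).
set -euo pipefail

# 32 random bytes rendered as 64 hex characters.
gen_secret() {
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}

# Replace KEY=... in FILE, or append it if the key is missing.
# Values are passed via ENVIRON so awk does not interpret backslashes.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"  # mktemp creates the file with mode 0600, so the result is owner-only
  K="$key" V="$value" awk '
    index($0, ENVIRON["K"] "=") == 1 { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$file" > "$tmp"
  mv "$tmp" "$file"
}

prepare_env() {
  local example="$1" target="$2"
  cp "$example" "$target"
  set_env_var "$target" DOMAIN harkon.co.uk
}

if [[ $# -eq 2 ]]; then
  prepare_env "$1" "$2"
fi
```

After `./prepare-env.sh infra/compose/env.example infra/compose/.env.production`, the functions can be sourced to fill in each secret. For example: `set_env_var infra/compose/.env.production POSTGRES_PASSWORD "$(gen_secret)"`. The script name and `POSTGRES_PASSWORD` are illustrative.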

Phase 2: Infrastructure Deployment (Week 2)

  1. Deploy Application Infrastructure

    # On remote server
    cd /opt/compose/ai-tax-agent
    docker compose -f infrastructure.yaml up -d
    
  2. Initialize Services

    • Vault: Initialize, unseal, and configure
    • Postgres: Run migrations
    • Neo4j: Install plugins
    • MinIO: Create buckets
  3. Configure Authentik

    • Create application groups
    • Configure OAuth providers
    • Set up ForwardAuth outpost
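Initialization in step 2 only works once the containers are actually ready. The polling helper sketched here waits for a container's Docker health status to become `healthy`. It assumes each infrastructure service defines a `healthcheck` in infrastructure.yaml. The container names in the usage line are hypothetical apa- names.

```bash
#!/usr/bin/env bash
# Wait for containers to report a healthy Docker health status (sketch).
set -euo pipefail

DOCKER="${DOCKER:-docker}"

# wait_healthy CONTAINER [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_healthy() {
  local container="$1" timeout="${2:-120}" interval="${3:-5}"
  local waited=0 status
  while (( waited < timeout )); do
    # Empty status if the container is missing or has no healthcheck.
    status="$("$DOCKER" inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
    if [[ "$status" == "healthy" ]]; then
      echo "$container is healthy"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "timed out waiting for $container (last status: ${status:-unknown})" >&2
  return 1
}

for c in "$@"; do
  wait_healthy "$c"
done
```

For example, `./wait-healthy.sh apa-vault apa-postgres apa-neo4j apa-minio` runs before the Vault, Postgres, Neo4j and MinIO initialization steps.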

Phase 3: Application Deployment (Week 3)

  1. Deploy Microservices

    docker compose -f services.yaml up -d
    
  2. Deploy Monitoring

    docker compose -f monitoring.yaml up -d
    
  3. Verify Health

    • Check all service health endpoints
    • Verify Traefik routing
    • Test authentication flow
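Routing and the authentication gate can be checked together from outside by comparing each URL's HTTP status with an expected code. This sketch assumes that protected routes answer an unauthenticated request with a redirect to Authentik (302). That assumption depends on how the outpost responds, so adjust the expected code if it returns 401. The host list is illustrative, and `CURL` can be overridden for testing.

```bash
#!/usr/bin/env bash
# Check Traefik routing and the ForwardAuth gate by HTTP status (sketch).
set -euo pipefail

CURL="${CURL:-curl}"

# check URL EXPECTED_CODE; redirects are reported, not followed (no -L).
check() {
  local url="$1" expected="$2" code
  code="$("$CURL" -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)"
  if [[ "$code" == "$expected" ]]; then
    echo "OK   $url ($code)"
  else
    echo "FAIL $url (got ${code:-none}, want $expected)"
    return 1
  fi
}

# Protected routes should redirect unauthenticated requests to Authentik.
main() {
  local domain="$1" failed=0
  check "https://review.$domain/" 302 || failed=1
  check "https://grafana.$domain/" 302 || failed=1
  check "https://vault.$domain/" 302 || failed=1
  return "$failed"
}

if [[ $# -eq 1 ]]; then
  main "$1"
fi
```

A 200 here for an admin-only host means the route is reachable without authentication and should be treated as a failure.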

Phase 4: Testing & Validation (Week 4)

  1. Smoke Tests
  2. Integration Tests
  3. Performance Tests
  4. Security Audit

Deployment Files Structure

Create three new compose files for production:

  1. infrastructure.yaml - Vault, MinIO, Neo4j, Qdrant, Postgres, Redis, NATS
  2. services.yaml - All 13 microservices + UI
  3. monitoring.yaml - Prometheus, Grafana, Loki

Rollback Strategy

  1. Service-Level Rollback: Use Docker image tags
  2. Full Rollback: Restore from backup
  3. Gradual Rollout: Deploy services incrementally
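Service-level rollback is simplest if services.yaml references images by a tag variable, for example `image: <registry>/svc-extract:${IMAGE_TAG}`. That layout is an assumption about how the file will be written. Compose gives shell environment variables precedence over .env during interpolation, so the override below applies only to this invocation. If the pinned tag should survive the next full deploy, record it in .env. `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Redeploy one service at a previous image tag (sketch).
set -euo pipefail

SERVICES_FILE="${SERVICES_FILE:-services.yaml}"
DRY_RUN="${DRY_RUN:-0}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# rollback_service SERVICE TAG; --no-deps leaves dependent containers untouched.
rollback_service() {
  local service="$1" tag="$2"
  run env IMAGE_TAG="$tag" docker compose -f "$SERVICES_FILE" pull "$service"
  run env IMAGE_TAG="$tag" docker compose -f "$SERVICES_FILE" up -d --no-deps "$service"
}

if [[ $# -eq 2 ]]; then
  rollback_service "$1" "$2"
fi
```

For example, `./rollback.sh svc-extract v1.4.2` recreates only svc-extract. The tag is a hypothetical version number.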

Monitoring & Maintenance

  • Logs: Centralized in Loki
  • Metrics: Prometheus + Grafana
  • Alerts: Configure Grafana alerts
  • Backups: Daily automated backups of volumes
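A common pattern for the daily volume backups mounts each named volume read-only into a throwaway container and archives its contents. The sketch assumes the alpine image for `tar`, a hypothetical /opt/backups target and 14-day retention. It is meant to be run from cron or a systemd timer. Databases (Postgres, Neo4j) are safer to back up with their own dump tools than by archiving live files. `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Archive named Docker volumes and prune old archives (sketch).
set -euo pipefail

BACKUP_DIR="${BACKUP_DIR:-/opt/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"
DRY_RUN="${DRY_RUN:-0}"

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

backup_volume() {
  local volume="$1" stamp="$2"
  # Mount the volume read-only and write a tarball into BACKUP_DIR.
  run docker run --rm \
    -v "$volume:/data:ro" \
    -v "$BACKUP_DIR:/backup" \
    alpine tar -czf "/backup/$volume-$stamp.tar.gz" -C /data .
}

prune_old() {
  # Delete archives older than RETENTION_DAYS days.
  run find "$BACKUP_DIR" -name '*.tar.gz' -mtime +"$RETENTION_DAYS" -delete
}

main() {
  local stamp v
  stamp="$(date +%Y%m%d)"
  run mkdir -p "$BACKUP_DIR"
  for v in "$@"; do
    backup_volume "$v" "$stamp"
  done
  prune_old
}

if [[ $# -gt 0 ]]; then
  main "$@"
fi
```

Volume names are passed as arguments, for example `./backup-volumes.sh apa-postgres-data apa-minio-data`. These names are hypothetical; `docker volume ls` shows the real ones.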

Next Steps

  1. Review and approve this plan
  2. Create production environment file
  3. Create production compose files
  4. Set up a CI/CD pipeline for automated deployment
  5. Execute Phase 1 (Preparation)