ai-tax-agent/docs/DEPLOYMENT_PLAN.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00

Unified Infrastructure Deployment Plan

Executive Summary

This plan outlines the strategy to host both the AI Tax Agent application and company services (Nextcloud, Gitea, Portainer, Authentik) on the remote server at 141.136.35.199 while maintaining an efficient local development workflow.

Current State Analysis

Remote Server (141.136.35.199)

  • Location: /opt/compose/
  • Existing Services:
    • Traefik v3.5.1 (reverse proxy with GoDaddy DNS challenge)
    • Authentik 2025.8.1 (SSO/Authentication)
    • Gitea 1.24.5 (Git hosting)
    • Nextcloud (Cloud storage)
    • Portainer 2.33.1 (Docker management)
  • Networks: frontend and backend (external)
  • Domain: harkon.co.uk
  • SSL: Let's Encrypt via GoDaddy DNS challenge
  • Exposed Subdomains:
    • traefik.harkon.co.uk
    • authentik.harkon.co.uk
    • gitea.harkon.co.uk
    • cloud.harkon.co.uk
    • portainer.harkon.co.uk

Local Repository (infra/compose/)

  • Compose Files:
    • docker-compose.local.yml - Full stack for local development
    • docker-compose.backend.yml - Backend services (appears to be production-ready)
  • Application Services:
    • 13 microservices (svc-ingestion, svc-extract, svc-forms, svc-hmrc, etc.)
    • UI Review application
    • Infrastructure: Vault, MinIO, Qdrant, Neo4j, Postgres, Redis, NATS, Prometheus, Grafana, Loki
  • Networks: ai-tax-agent-frontend and ai-tax-agent-backend
  • Domain: local.lan (for development)
  • Authentication: Authentik with ForwardAuth middleware

Challenges & Conflicts

1. Duplicate Services

  • Both environments have Traefik and Authentik
  • Need to decide: shared vs. isolated

2. Network Naming

  • Remote: frontend, backend
  • Local: ai-tax-agent-frontend, ai-tax-agent-backend
  • Production needs: One naming scheme shared by the company and application stacks

3. Domain Management

  • Remote: *.harkon.co.uk (public)
  • Local: *.local.lan (development)
  • Production: Need subdomains like app.harkon.co.uk, api.harkon.co.uk

4. SSL Certificates

  • Remote: GoDaddy DNS challenge (production)
  • Local: Self-signed certificates
  • Production: Must use GoDaddy DNS challenge

5. Resource Isolation

  • Company services need to remain stable
  • Application services need independent deployment/rollback

Option A: Shared Traefik and Authentik

Pros:

  • Single point of entry
  • Shared authentication across all services
  • Simplified SSL management
  • Cost-effective (one Traefik, one Authentik)

Cons:

  • Application deployments could affect company services
  • Requires careful configuration management

Implementation:

/opt/compose/
├── traefik/              # Shared Traefik (existing)
├── authentik/            # Shared Authentik (existing)
├── company/              # Company services
│   ├── gitea/
│   ├── nextcloud/
│   └── portainer/
└── ai-tax-agent/         # Application services
    ├── infrastructure/   # App-specific infra (Vault, MinIO, Neo4j, etc.)
    └── services/         # Microservices

Option B: Isolated Stacks

Pros:

  • Complete isolation
  • Independent scaling
  • No cross-contamination

Cons:

  • Duplicate Traefik/Authentik
  • More complex SSL management
  • Higher resource usage
  • Users need separate logins

Proposed Solution: Hybrid Approach

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Internet (*.harkon.co.uk)                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                    ┌────▼────┐
                    │ Traefik │ (Port 80/443)
                    │ v3.5.1  │
                    └────┬────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
   ┌────▼─────┐    ┌────▼────┐     ┌────▼─────┐
   │Authentik │    │ Company │     │   App    │
   │   SSO    │    │Services │     │ Services │
   └──────────┘    └─────────┘     └──────────┘
                         │                │
                    ┌────┴────┐      ┌────┴────┐
                    │ Gitea   │      │ Vault   │
                    │Nextcloud│      │ MinIO   │
                    │Portainer│      │ Neo4j   │
                    └─────────┘      │ Qdrant  │
                                     │ Postgres│
                                     │ Redis   │
                                     │ NATS    │
                                     │ 13 SVCs │
                                     │ UI      │
                                     └─────────┘

Directory Structure

/opt/compose/
├── traefik/                    # Shared reverse proxy
│   ├── compose.yaml
│   ├── config/
│   │   ├── traefik.yaml       # Static config
│   │   ├── dynamic-company.yaml
│   │   └── dynamic-app.yaml
│   └── certs/
├── authentik/                  # Shared SSO
│   ├── compose.yaml
│   └── ...
├── company/                    # Company services namespace
│   ├── gitea/
│   │   └── compose.yaml
│   ├── nextcloud/
│   │   └── compose.yaml
│   └── portainer/
│       └── compose.yaml
└── ai-tax-agent/              # Application namespace
    ├── .env                   # Production environment
    ├── infrastructure.yaml    # Vault, MinIO, Neo4j, Qdrant, etc.
    ├── services.yaml          # All microservices
    └── monitoring.yaml        # Prometheus, Grafana, Loki
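
As a rough sketch, the new application namespace from the tree above could be scaffolded as follows. The helper name scaffold_app_namespace and the idea of creating empty placeholder files are assumptions for illustration; only the paths come from the plan.

```bash
#!/usr/bin/env bash
# Sketch: create the ai-tax-agent namespace shown in the directory tree.
# scaffold_app_namespace is a hypothetical helper, not an existing script.
scaffold_app_namespace() {
  local root="${1:-/opt/compose}"
  local app="$root/ai-tax-agent"
  local f
  mkdir -p "$app" || return 1
  # Create empty placeholders without overwriting files that already exist.
  for f in .env infrastructure.yaml services.yaml monitoring.yaml; do
    [ -e "$app/$f" ] || : > "$app/$f"
  done
  # The production .env holds secrets, so restrict it to the owner.
  chmod 600 "$app/.env"
}
```

Passing a scratch directory as the first argument (for example a temporary directory) allows a dry run without touching /opt/compose.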

Network Strategy

Shared Networks:

  • frontend - For all services exposed via Traefik
  • backend - For internal service communication

Application-Specific Networks (optional):

  • ai-tax-agent-internal - For app-only internal communication
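
Because both stacks attach to the shared networks as external networks, those networks have to exist before any compose file is started. A minimal, idempotent sketch is shown here; the helper names are hypothetical, and the network names are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch: make sure the shared external networks exist before any stack starts.
# ensure_network and ensure_shared_networks are hypothetical helper names.
ensure_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "exists $name"
  else
    docker network create "$name" >/dev/null && echo "created $name"
  fi
}

ensure_shared_networks() {
  local net
  for net in frontend backend; do
    ensure_network "$net" || return 1
  done
}
```

On the remote server both networks already exist, so ensure_shared_networks should only report them as present; on a fresh host it would create them.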

Domain Mapping

Company Services (existing):

  • traefik.harkon.co.uk - Traefik dashboard
  • authentik.harkon.co.uk - Authentik SSO
  • gitea.harkon.co.uk - Git hosting
  • cloud.harkon.co.uk - Nextcloud
  • portainer.harkon.co.uk - Docker management

Application Services (new):

  • app.harkon.co.uk - Review UI
  • api.harkon.co.uk - API Gateway (all microservices)
  • vault.harkon.co.uk - Vault UI (admin only)
  • minio.harkon.co.uk - MinIO Console (admin only)
  • neo4j.harkon.co.uk - Neo4j Browser (admin only)
  • qdrant.harkon.co.uk - Qdrant UI (admin only)
  • grafana.harkon.co.uk - Grafana (monitoring)
  • prometheus.harkon.co.uk - Prometheus (admin only)
  • loki.harkon.co.uk - Loki (admin only)
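
Since every new hostname needs a DNS record before Traefik can route to it, a quick pre-check may help. The sketch below derives the application hostnames listed above from one domain value and reports those that do not resolve yet. The function names are hypothetical, and getent is assumed to be available, as it is on most Linux hosts.

```bash
#!/usr/bin/env bash
# Sketch: derive the new application hostnames from a single domain value.
app_hosts() {
  local domain="$1" sub
  for sub in app api vault minio neo4j qdrant grafana prometheus loki; do
    printf '%s.%s\n' "$sub" "$domain"
  done
}

# Read hostnames on stdin and print the ones that do not resolve yet.
unresolved_hosts() {
  local host
  while read -r host; do
    getent hosts "$host" >/dev/null || echo "$host"
  done
}
```

Running `app_hosts harkon.co.uk | unresolved_hosts` would print nothing once all nine records are in place.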

Authentication Strategy

Authentik Configuration:

  1. Company Group - Access to Gitea, Nextcloud, Portainer
  2. App Admin Group - Full access to all app services
  3. App User Group - Access to Review UI and API
  4. App Reviewer Group - Access to Review UI only

Middleware Configuration:

  • authentik-forwardauth - Standard auth for all services
  • admin-auth - Requires admin group (Vault, MinIO, Neo4j, etc.)
  • reviewer-auth - Requires the reviewer group or higher
  • rate-limit - Standard rate limiting
  • api-rate-limit - Stricter API rate limiting
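
One way to confirm that a middleware is actually attached to a route is to send an unauthenticated request and look at the status code. The sketch below assumes that a protected route answers with a redirect to the login page (302) or a refusal (401 or 403), and that a 2xx answer means the route is open. That mapping is an assumption about how the ForwardAuth setup behaves, not something this plan specifies, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: classify the response an unauthenticated client gets from a host.
# Assumption: 302/401/403 means the auth middleware intercepted the request.
classify_auth_status() {
  case "$1" in
    302|401|403) echo "protected" ;;
    2??) echo "open" ;;
    *) echo "unknown" ;;
  esac
}

# Fetch only the status code, without following redirects.
probe_host() {
  local host="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host/")" || code="000"
  echo "$host $(classify_auth_status "$code")"
}
```

For example, `probe_host vault.harkon.co.uk` should not report "open" for any of the admin-only hosts.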

Local Development Workflow

Development Environment

Keep Existing Setup:

  • Use docker-compose.local.yml as-is
  • Domain: *.local.lan
  • Self-signed certificates
  • Isolated networks: ai-tax-agent-frontend, ai-tax-agent-backend
  • Full stack runs locally

Benefits:

  • No dependency on remote server
  • Fast iteration
  • Complete isolation
  • Works offline

Development Commands

# Local development
make bootstrap          # Initial setup
make up                 # Start all services
make down               # Stop all services
make logs SERVICE=svc-ingestion

# Build and test
make build              # Build all images
make test               # Run tests
make test-integration   # Integration tests

# Deploy to production
make deploy-production  # Deploy to remote server

Production Deployment Strategy

Phase 1: Preparation (Week 1)

  1. Backup Current State

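    ssh deploy@141.136.35.199
    cd /opt/compose
    # Archives the compose files and configs under /opt/compose only;
    # Docker named volumes live outside this directory and are not captured.
    tar -czf ~/backup-$(date +%Y%m%d).tar.gz .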
    
  2. Create Production Environment File

    • Copy infra/compose/env.example to infra/compose/.env.production
    • Update all secrets and passwords
    • Set DOMAIN=harkon.co.uk
    • Configure GoDaddy API credentials

  3. Update Traefik Configuration

    • Merge local Traefik config with remote
    • Add application routes
    • Configure Authentik ForwardAuth

  4. Prepare Docker Images

    • Build all application images
    • Push to container registry (Gitea registry or Docker Hub)
    • Tag with version numbers
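
Step 2 of the list above can be checked mechanically before anything is deployed. The sketch below flags empty values and a wrong DOMAIN in the production environment file. It assumes plain KEY=value lines with comments starting in the first column, and the "changeme" placeholder is a hypothetical convention, not something taken from env.example.

```bash
#!/usr/bin/env bash
# Sketch: fail early if the production env file has unset values or the wrong domain.
# Assumes simple KEY=value lines; "changeme" is a hypothetical placeholder value.
check_env_file() {
  local file="$1" expected_domain="$2" status=0 key value
  while IFS='=' read -r key value || [ -n "$key" ]; do
    # Skip blank lines and comments.
    case "$key" in ''|'#'*) continue ;; esac
    if [ -z "$value" ] || [ "$value" = "changeme" ]; then
      echo "unset: $key"
      status=1
    fi
    if [ "$key" = "DOMAIN" ] && [ "$value" != "$expected_domain" ]; then
      echo "wrong DOMAIN: $value"
      status=1
    fi
  done < "$file"
  return "$status"
}
```

A call such as `check_env_file infra/compose/.env.production harkon.co.uk` returns non-zero and names each offending key, which makes it usable as a gate in a deploy script.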

Phase 2: Infrastructure Deployment (Week 2)

  1. Deploy Application Infrastructure

    # On remote server
    cd /opt/compose/ai-tax-agent
    docker compose -f infrastructure.yaml up -d
    
  2. Initialize Services

    • Vault: Unseal and configure
    • Postgres: Run migrations
    • Neo4j: Install plugins
    • MinIO: Create buckets

  3. Configure Authentik

    • Create application groups
    • Configure OAuth providers
    • Set up ForwardAuth outpost
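
The initialization steps above only work once the containers are actually accepting connections, so a small retry helper can be useful between "up -d" and the first migration. This is a generic sketch; wait_for is a hypothetical name, and the compose service name used in the usage note is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: retry a readiness command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

Assuming the Postgres service is named postgres in infrastructure.yaml, `wait_for 30 2 docker compose -f infrastructure.yaml exec -T postgres pg_isready` would block until the database accepts connections or a minute has passed.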

Phase 3: Application Deployment (Week 3)

  1. Deploy Microservices

    # On remote server, from /opt/compose/ai-tax-agent
    docker compose -f services.yaml up -d
    
  2. Deploy Monitoring

    # On remote server, from /opt/compose/ai-tax-agent
    docker compose -f monitoring.yaml up -d
    
  3. Verify Health

    • Check all service health endpoints
    • Verify Traefik routing
    • Test authentication flow
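
For the health verification step, one low-effort check is to ask Docker which containers report a failing or pending health status. The sketch below only covers containers that define a healthcheck in their compose file, which is an assumption about how the services are written; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list containers whose Docker health check is failing or still starting.
# Containers without a healthcheck are not covered by this check.
containers_with_health() {
  docker ps --filter "health=$1" --format '{{.Names}}'
}

verify_health() {
  local bad
  bad="$(containers_with_health unhealthy; containers_with_health starting)"
  if [ -n "$bad" ]; then
    echo "not healthy:"
    echo "$bad"
    return 1
  fi
  echo "all containers with health checks report healthy"
}
```

Running verify_health a minute or two after deployment gives a quick pass or fail before moving on to routing and authentication checks.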

Phase 4: Testing & Validation (Week 4)

  1. Smoke Tests
  2. Integration Tests
  3. Performance Tests
  4. Security Audit
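
A possible smoke test for the SSL requirement is to confirm that each public hostname serves a certificate issued by Let's Encrypt rather than a self-signed or default one. The sketch below uses the openssl command-line tool; the function names are hypothetical, and matching on the issuer text is a simplification.

```bash
#!/usr/bin/env bash
# Sketch: smoke-test that a host serves a Let's Encrypt certificate.
# Prints the issuer line of the certificate presented for the given hostname.
cert_issuer() {
  local host="$1"
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -issuer
}

# Pure check on the issuer line, so it can be exercised without a network.
is_letsencrypt() {
  case "$1" in
    *"Let's Encrypt"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A check such as `is_letsencrypt "$(cert_issuer app.harkon.co.uk)"` can then be looped over every subdomain in the domain mapping.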

Deployment Files Structure

Create three new compose files for production:

  1. infrastructure.yaml - Vault, MinIO, Neo4j, Qdrant, Postgres, Redis, NATS
  2. services.yaml - All 13 microservices + UI
  3. monitoring.yaml - Prometheus, Grafana, Loki
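
Before the three files are used on the server, each can be parsed once to catch syntax and interpolation errors. The sketch below relies on `docker compose config --quiet`, which validates a file and prints nothing on success; the wrapper function is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: validate each production compose file before deploying it.
# validate_compose_files is a hypothetical helper name.
validate_compose_files() {
  local dir="${1:-/opt/compose/ai-tax-agent}" file status=0
  for file in infrastructure.yaml services.yaml monitoring.yaml; do
    if docker compose -f "$dir/$file" config --quiet; then
      echo "ok $file"
    else
      echo "invalid $file"
      status=1
    fi
  done
  return "$status"
}
```

The function returns non-zero if any file fails to parse, so it can gate the Phase 2 and Phase 3 commands.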

Rollback Strategy

  1. Service-Level Rollback: Use Docker image tags
  2. Full Rollback: Restore from backup
  3. Gradual Rollout: Deploy services incrementally
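
A service-level rollback could look roughly like the sketch below. It assumes that services.yaml pins each image with a tag read from the environment; the variable name IMAGE_TAG and the function name are hypothetical, and the real compose files may pin tags differently.

```bash
#!/usr/bin/env bash
# Sketch: roll one service back to an earlier image tag.
# Assumption: services.yaml references images as "...:${IMAGE_TAG}".
rollback_service() {
  local service="$1" tag="$2"
  # --no-deps recreates only this service and leaves its dependencies running.
  IMAGE_TAG="$tag" docker compose -f services.yaml up -d --no-deps "$service"
}
```

Because the override only lives in the environment of that one command, the previous tag would also need to be recorded in the .env file, otherwise the next plain "up -d" would move the service forward again.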

Monitoring & Maintenance

  • Logs: Centralized in Loki
  • Metrics: Prometheus + Grafana
  • Alerts: Configure Grafana alerts
  • Backups: Daily automated backups of volumes
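
For the daily volume backups, a common pattern is to archive a named volume through a throwaway container. The sketch below assumes the alpine image is acceptable for this purpose; the function names are hypothetical, and the volume name in the usage note is only an example.

```bash
#!/usr/bin/env bash
# Sketch: archive one named Docker volume through a throwaway container.
backup_name() {
  # $1 = volume name, $2 = date stamp such as 20251011
  printf '%s-%s.tar.gz\n' "$1" "$2"
}

backup_volume() {
  local volume="$1" dest="$2" name
  name="$(backup_name "$volume" "$(date +%Y%m%d)")"
  # Mount the volume read-only and write the archive into the destination directory.
  docker run --rm -v "$volume:/data:ro" -v "$dest:/backup" alpine \
    tar -czf "/backup/$name" -C /data .
}
```

A call like `backup_volume some-volume /var/backups` could be scheduled once a day. For databases such as Postgres and Neo4j, a native dump is usually safer than copying files from a live volume.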

Next Steps

  1. Review and approve this plan
  2. Create production environment file
  3. Create production compose files
  4. Set up CI/CD pipeline for automated deployment
  5. Execute Phase 1 (Preparation)