# Isolated Stacks Deployment Plan

## Executive Summary

This plan outlines the strategy to host both the **AI Tax Agent application** and **company services** (Nextcloud, Gitea, Portainer, Authentik) on the remote server at `141.136.35.199` while maintaining an efficient local development workflow.

## Current State Analysis

### Remote Server (`141.136.35.199`)

- **Location**: `/opt/compose/`
- **Existing Services**:
  - Traefik v3.5.1 (reverse proxy with GoDaddy DNS challenge)
  - Authentik 2025.8.1 (SSO/Authentication)
  - Gitea 1.24.5 (Git hosting)
  - Nextcloud (Cloud storage)
  - Portainer 2.33.1 (Docker management)
- **Networks**: `frontend` and `backend` (external)
- **Domain**: `harkon.co.uk`
- **SSL**: Let's Encrypt via GoDaddy DNS challenge
- **Exposed Subdomains**:
  - `traefik.harkon.co.uk`
  - `auth.harkon.co.uk`
  - `gitea.harkon.co.uk`
  - `cloud.harkon.co.uk`
  - `portainer.harkon.co.uk`

### Local Repository (`infra/compose/`)

- **Compose Files**:
  - `docker-compose.local.yml` - Full stack for local development
  - `docker-compose.backend.yml` - Backend services (appears to be production-ready)
- **Application Services**:
  - 13+ microservices (svc-ingestion, svc-extract, svc-forms, svc-hmrc, etc.)
  - UI Review application
  - Infrastructure: Vault, MinIO, Qdrant, Neo4j, Postgres, Redis, NATS, Prometheus, Grafana, Loki
- **Networks**: `ai-tax-agent-frontend` and `ai-tax-agent-backend`
- **Domain**: `local.lan` (for development)
- **Authentication**: Authentik with ForwardAuth middleware

## Challenges & Conflicts

### 1. **Duplicate Services**

- Both environments have Traefik and Authentik
- Need to decide: shared vs. isolated

### 2. **Network Naming**

- Remote: `frontend`, `backend`
- Local: `ai-tax-agent-frontend`, `ai-tax-agent-backend`
- Production needs: Consistent naming

### 3. **Domain Management**

- Remote: `*.harkon.co.uk` (public)
- Local: `*.local.lan` (development)
- Production: Need subdomains like `app.harkon.co.uk`, `api.harkon.co.uk`

### 4. **SSL Certificates**

- Remote: GoDaddy DNS challenge (production)
- Local: Self-signed certificates
- Production: Must use GoDaddy DNS challenge

### 5. **Resource Isolation**

- Company services need to remain stable
- Application services need independent deployment/rollback

# Decision: Keep Stacks Completely Separate

We will deploy the company services and the AI Tax Agent as two fully isolated stacks, each with its own Traefik and Authentik. This maximizes blast-radius isolation and avoids naming and DNS conflicts across environments.

Key implications:

- Separate external networks and DNS namespaces per stack
- Duplicate edge (Traefik) and IdP (Authentik), independent upgrades and rollbacks
- Slightly higher resource usage in exchange for strong isolation

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                  Internet (*.harkon.co.uk)                  │
└────────────────────────┬────────────────────────────────────┘
                         │
                    ┌────▼────┐
                    │ Traefik │ (Port 80/443)
                    │ v3.5.1  │
                    └────┬────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
   ┌────▼─────┐     ┌────▼────┐      ┌────▼─────┐
   │Authentik │     │ Company │      │   App    │
   │   SSO    │     │Services │      │ Services │
   └──────────┘     └─────────┘      └──────────┘
                         │                │
                    ┌────┴────┐      ┌────┴────┐
                    │  Gitea  │      │  Vault  │
                    │Nextcloud│      │  MinIO  │
                    │Portainer│      │  Neo4j  │
                    └─────────┘      │  Qdrant │
                                     │ Postgres│
                                     │  Redis  │
                                     │  NATS   │
                                     │ 13 SVCs │
                                     │   UI    │
                                     └─────────┘
```

### Directory Structure (per stack)

```
/opt/compose//
├── traefik/                  # Stack-local reverse proxy
│   ├── compose.yaml
│   ├── config/
│   │   ├── traefik.yaml      # Static config
│   │   ├── dynamic-company.yaml
│   │   └── dynamic-app.yaml
│   └── certs/
├── authentik/                # Stack-local SSO
│   ├── compose.yaml
│   └── ...
├── company/                  # Company services namespace
│   ├── gitea/
│   │   └── compose.yaml
│   ├── nextcloud/
│   │   └── compose.yaml
│   └── portainer/
│       └── compose.yaml
└── ai-tax-agent/             # Application namespace (if this is the app stack)
    ├── .env                  # Production environment
    ├── infrastructure.yaml   # Vault, MinIO, Neo4j, Qdrant, etc.
    ├── services.yaml         # All microservices
    └── monitoring.yaml       # Prometheus, Grafana, Loki
```

### Network Strategy

- Use stack-scoped network names to avoid collisions: `apa-frontend`, `apa-backend`.
- Only attach services that must be public to `apa-frontend`.
- Keep internal communication on `apa-backend`.

### Domain Mapping

**Company Services** (existing):

- `traefik.harkon.co.uk` - Traefik dashboard
- `auth.harkon.co.uk` - Authentik SSO
- `gitea.harkon.co.uk` - Git hosting
- `cloud.harkon.co.uk` - Nextcloud
- `portainer.harkon.co.uk` - Docker management

**Application Services** (app stack):

- `review.` - Review UI
- `api.` - API Gateway (microservices via Traefik)
- `vault.` - Vault UI (admin only)
- `minio.` - MinIO Console (admin only)
- `neo4j.` - Neo4j Browser (admin only)
- `qdrant.` - Qdrant UI (admin only)
- `grafana.` - Grafana (monitoring)
- `prometheus.` - Prometheus (admin only)
- `loki.` - Loki (admin only)

### Authentication Strategy

**Authentik Configuration**:

1. **Company Group** - Access to Gitea, Nextcloud, Portainer
2. **App Admin Group** - Full access to all app services
3. **App User Group** - Access to Review UI and API
4. **App Reviewer Group** - Access to Review UI only

**Middleware Configuration**:

- `authentik-forwardauth` - Standard auth for all services
- `admin-auth` - Requires admin group (Vault, MinIO, Neo4j, etc.)
- `reviewer-auth` - Requires reviewer or higher
- `rate-limit` - Standard rate limiting
- `api-rate-limit` - Stricter API rate limiting

## Implementation Notes

- `infra/base/infrastructure.yaml` now includes Traefik and Authentik in the infrastructure stack with stack-scoped networks and service names.
- All infrastructure component service keys and container names use the `apa-` prefix to avoid DNS collisions on shared Docker hosts.
- Traefik static and dynamic configs live under `infra/base/traefik/config/`.
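
The stack-scoped naming described under Network Strategy can be illustrated with a small helper that creates the two networks before any compose file is brought up. This is a minimal sketch under stated assumptions, not part of the existing repository: the function names `stack_network` and `ensure_stack_networks` and the `STACK_PREFIX` variable are hypothetical, and it assumes the compose files declare `apa-frontend` and `apa-backend` as external networks.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and STACK_PREFIX are hypothetical.
# The "apa" prefix comes from the plan's stack-scoped naming.
STACK_PREFIX="${STACK_PREFIX:-apa}"

# Print the stack-scoped network name for a role,
# e.g. "frontend" becomes "apa-frontend".
stack_network() {
  printf '%s-%s\n' "$STACK_PREFIX" "$1"
}

# Create the frontend and backend networks if they do not exist yet.
# Assumes the compose files reference them as external networks.
ensure_stack_networks() {
  local role name
  for role in frontend backend; do
    name="$(stack_network "$role")"
    # "docker network inspect" exits non-zero when the network is missing.
    if ! docker network inspect "$name" >/dev/null 2>&1; then
      docker network create "$name"
    fi
  done
}
```

Calling `ensure_stack_networks` once on the host before `docker compose up -d` would keep the network names in one place, so a second stack could reuse the helper with a different `STACK_PREFIX`.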
## Local Development Workflow

### Development Environment

**Keep Existing Setup**:

- Use `docker-compose.local.yml` as-is
- Domain: `*.local.lan`
- Self-signed certificates
- Isolated networks: `ai-tax-agent-frontend`, `ai-tax-agent-backend`
- Full stack runs locally

**Benefits**:

- No dependency on remote server
- Fast iteration
- Complete isolation
- Works offline

### Development Commands

```bash
# Local development
make bootstrap          # Initial setup
make up                 # Start all services
make down               # Stop all services
make logs SERVICE=svc-ingestion

# Build and test
make build              # Build all images
make test               # Run tests
make test-integration   # Integration tests

# Deploy to production
make deploy-production  # Deploy to remote server
```

## Production Deployment Strategy

### Phase 1: Preparation (Week 1)

1. **Backup Current State**

   ```bash
   ssh deploy@141.136.35.199
   cd /opt
   tar -czf ~/backup-$(date +%Y%m%d).tar.gz .
   ```

2. **Create Production Environment File**
   - Copy `infra/environments/production/.env.example` to `infra/environments/production/.env`
   - Update all secrets and passwords
   - Set `DOMAIN=harkon.co.uk`
   - Configure GoDaddy API credentials

3. **Update Traefik Configuration**
   - Merge local Traefik config with remote
   - Add application routes
   - Configure Authentik ForwardAuth

4. **Prepare Docker Images**
   - Build all application images
   - Push to container registry (Gitea registry or Docker Hub)
   - Tag with version numbers

### Phase 2: Infrastructure Deployment (Week 2)

1. **Deploy Application Infrastructure**

   ```bash
   # On remote server
   cd /opt/ai-tax-agent
   docker compose -f infrastructure.yaml up -d
   ```

2. **Initialize Services**
   - Vault: Unseal and configure
   - Postgres: Run migrations
   - Neo4j: Install plugins
   - MinIO: Create buckets

3. **Configure Authentik**
   - Create application groups
   - Configure OAuth providers
   - Set up ForwardAuth outpost

### Phase 3: Application Deployment (Week 3)

1. **Deploy Microservices**

   ```bash
   docker compose -f services.yaml up -d
   ```

2. **Deploy Monitoring**

   ```bash
   docker compose -f monitoring.yaml up -d
   ```

3. **Verify Health**
   - Check all service health endpoints
   - Verify Traefik routing
   - Test authentication flow

### Phase 4: Testing & Validation (Week 4)

1. **Smoke Tests**
2. **Integration Tests**
3. **Performance Tests**
4. **Security Audit**

## Deployment Files Structure

Create three new compose files for production:

1. **`infrastructure.yaml`** - Vault, MinIO, Neo4j, Qdrant, Postgres, Redis, NATS
2. **`services.yaml`** - All 13 microservices + UI
3. **`monitoring.yaml`** - Prometheus, Grafana, Loki

## Rollback Strategy

1. **Service-Level Rollback**: Use Docker image tags
2. **Full Rollback**: Restore from backup
3. **Gradual Rollout**: Deploy services incrementally

## Monitoring & Maintenance

- **Logs**: Centralized in Loki
- **Metrics**: Prometheus + Grafana
- **Alerts**: Configure Grafana alerts
- **Backups**: Daily automated backups of volumes

## Next Steps

1. Review and approve this plan
2. Create production environment file
3. Create production compose files
4. Set up CI/CD pipeline for automated deployment
5. Execute Phase 1 (Preparation)
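
The service-level rollback listed under Rollback Strategy could look roughly like the helper below. It is a sketch under assumptions rather than the project's actual procedure: it assumes `services.yaml` references each image with a tag variable named `IMAGE_TAG` (a hypothetical name), and the function name `rollback_service` and the `DRY_RUN` switch are likewise introduced here for illustration. The `--no-deps` flag of `docker compose up` recreates only the named service and leaves its dependencies untouched.

```bash
#!/usr/bin/env bash
# Sketch only: rollback_service, DRY_RUN and IMAGE_TAG are hypothetical names.
# Assumes services.yaml uses ${IMAGE_TAG} as the image tag for each service.

# Recreate a single service from a previously published image tag.
rollback_service() {
  local service="$1" tag="$2"
  if [ -z "$service" ] || [ -z "$tag" ]; then
    echo "usage: rollback_service SERVICE TAG" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, for review before a rollback.
    echo "IMAGE_TAG=$tag docker compose -f services.yaml up -d --no-deps $service"
  else
    # --no-deps restarts only the named service, not the services it depends on.
    IMAGE_TAG="$tag" docker compose -f services.yaml up -d --no-deps "$service"
  fi
}
```

For example, `DRY_RUN=1 rollback_service svc-ingestion v1.2.3` prints the command that would pin `svc-ingestion` to the tag `v1.2.3` (an illustrative tag, not a real release), which can be checked before running it without `DRY_RUN`.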