ai-tax-agent/docs/DEPLOYMENT_CHECKLIST.md
harkon f0f7674b8d
clean up base infra
2025-10-11 11:42:43 +01:00

Deployment Checklist

Pre-Deployment Checklist

Local Development

  • Docker and Docker Compose installed
  • Git repository cloned
  • Environment file created: cp infra/environments/local/.env.example infra/environments/local/.env
  • Docker networks created: ./infra/scripts/setup-networks.sh
  • Sufficient disk space (10GB+)

Development Server

  • Server accessible via SSH
  • Docker and Docker Compose installed on server
  • Domain configured: *.dev.harkon.co.uk
  • DNS records pointing to server
  • GoDaddy API credentials available
  • Environment file created: cp infra/environments/development/.env.example infra/environments/development/.env
  • Secrets generated: ./scripts/generate-secrets.sh
  • Docker networks created: ./infra/scripts/setup-networks.sh

Production Server

  • Server accessible via SSH (deploy@141.136.35.199)
  • Docker and Docker Compose installed
  • Domain configured: *.harkon.co.uk
  • DNS records verified
  • GoDaddy API credentials configured
  • Environment file exists: infra/environments/production/.env
  • All secrets verified (no CHANGE_ME values)
  • Docker networks created: ./infra/scripts/setup-networks.sh
  • Backup of existing data (if migrating)
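
The "no CHANGE_ME values" check above can be scripted rather than done by eye. The following is a minimal sketch, assuming placeholders appear as the literal string CHANGE_ME; the function name `check_env_placeholders` is hypothetical and not part of the repository's scripts.

```shell
#!/bin/sh
# Hypothetical helper: fail if an env file still contains CHANGE_ME placeholders.
# Only the line number and variable name are printed, never the value.
check_env_placeholders() {
    env_file="$1"
    if [ ! -f "$env_file" ]; then
        echo "missing env file: $env_file" >&2
        return 2
    fi
    matches=$(grep -n 'CHANGE_ME' "$env_file" | cut -d= -f1)
    if [ -n "$matches" ]; then
        echo "placeholder values remain in $env_file:" >&2
        echo "$matches" >&2
        return 1
    fi
    echo "ok: no placeholders in $env_file"
}
```

For example, `check_env_placeholders infra/environments/production/.env` returns non-zero while any placeholder remains, so it can gate the later deploy steps.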

Deployment Checklist

Phase 1: External Services (Production Only)

Traefik

  • Navigate to: cd /opt/ai-tax-agent/infra/compose/traefik
  • Verify config: cat config/traefik.yaml
  • Verify provider credentials: cat .provider.env
  • Deploy: docker compose up -d
  • Check logs: docker compose logs -f
  • Verify running: docker ps | grep traefik
  • Test dashboard: https://traefik.harkon.co.uk
  • Verify SSL certificate obtained

Authentik

  • Navigate to: cd /opt/ai-tax-agent/infra/compose/authentik
  • Verify environment: cat .env
  • Deploy: docker compose up -d
  • Wait for startup: sleep 30
  • Check logs: docker compose logs -f authentik-server
  • Verify running: docker ps | grep authentik
  • Access UI: https://authentik.harkon.co.uk
  • Complete initial setup
  • Create admin user
  • Note down API token

Gitea

  • Navigate to: cd /opt/ai-tax-agent/infra/compose/gitea
  • Verify environment: cat .env
  • Deploy: docker compose up -d
  • Wait for startup: sleep 30
  • Check logs: docker compose logs -f gitea-server
  • Verify running: docker ps | grep gitea
  • Access UI: https://gitea.harkon.co.uk
  • Complete initial setup
  • Enable container registry
  • Create access token
  • Test docker login: docker login gitea.harkon.co.uk

Nextcloud (Optional)

  • Navigate to: cd /opt/ai-tax-agent/infra/compose/nextcloud
  • Deploy: docker compose up -d
  • Access UI: https://nextcloud.harkon.co.uk
  • Complete setup

Portainer (Optional)

  • Navigate to: cd /opt/ai-tax-agent/infra/compose/portainer
  • Deploy: docker compose up -d
  • Access UI: https://portainer.harkon.co.uk
  • Create admin user

Phase 2: Application Infrastructure

Infrastructure Services

  • Navigate to: cd /opt/ai-tax-agent
  • Verify environment: cat infra/environments/production/.env
  • Deploy: ./infra/scripts/deploy.sh production infrastructure
  • Wait for services: sleep 30
  • Check status: docker ps | grep -E "apa-vault|apa-minio|apa-postgres|apa-neo4j|apa-qdrant|apa-redis|apa-nats"
  • Verify Vault: curl https://vault.harkon.co.uk/v1/sys/health
  • Verify MinIO: curl https://minio-api.harkon.co.uk/minio/health/live
  • Verify PostgreSQL: docker exec apa-postgres pg_isready
  • Verify Neo4j: curl http://localhost:7474
  • Verify Qdrant: curl http://localhost:6333/healthz
  • Verify Redis: docker exec apa-redis redis-cli ping
  • Verify NATS: docker logs apa-nats | grep "Server is ready"
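
A fixed `sleep 30` can be too short on a cold start and wasteful on a warm one. As an alternative, the sketch below polls a check command until it succeeds. The helper name `wait_for` and the attempt and delay values are assumptions for illustration, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "ready after $i attempt(s): $*"
            return 0
        fi
        # Sleep between attempts, but not after the final failure.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "not ready after $attempts attempt(s): $*" >&2
    return 1
}

# Example invocations, using checks from the list above:
#   wait_for 30 2 docker exec apa-postgres pg_isready
#   wait_for 30 2 curl -fsS https://minio-api.harkon.co.uk/minio/health/live
```

Because the helper returns non-zero on timeout, a deploy script can stop at the first service that never becomes ready.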

Initialize Vault

  • Access Vault: docker exec -it apa-vault sh
  • Initialize: vault operator init (if first time)
  • Save unseal keys and root token
  • Unseal: vault operator unseal (3 times with different keys)
  • Login: vault login <root-token>
  • Enable KV secrets: vault secrets enable -path=secret kv-v2
  • Exit: exit
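
Whether the init and unseal steps are still needed can be read from the HTTP status of Vault's `/v1/sys/health` endpoint, which by default returns 200 when initialized, unsealed and active, 429 for an unsealed standby, 501 when not initialized, and 503 when sealed. The sketch below maps those codes to the next action; the function name `vault_state_from_code` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: translate a /v1/sys/health status code into a next step.
# Returns 0 only when Vault is usable (active or standby).
vault_state_from_code() {
    case "$1" in
        200) echo "initialized, unsealed, active" ;;
        429) echo "unsealed, standby" ;;
        501) echo "not initialized: run vault operator init"; return 1 ;;
        503) echo "sealed: run vault operator unseal"; return 1 ;;
        *)   echo "unexpected status: $1"; return 2 ;;
    esac
}

# Example invocation against the production URL from this checklist:
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://vault.harkon.co.uk/v1/sys/health)
#   vault_state_from_code "$code"
```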

Initialize MinIO

  • Access MinIO console: https://minio.harkon.co.uk
  • Login with credentials from .env
  • Create buckets:
    • documents
    • embeddings
    • models
    • backups
  • Set bucket policies (public/private as needed)
  • Create access keys for services
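
The bucket creation above can also be done from the command line with the MinIO Client (`mc`) instead of the console. The sketch below only prints the commands so they can be reviewed before running. The helper name `minio_bucket_commands`, the alias `prod`, and the credential variable names in the comments are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the mc commands that would create the four buckets.
# --ignore-existing makes each command safe to re-run.
minio_bucket_commands() {
    alias_name="$1"
    for bucket in documents embeddings models backups; do
        echo "mc mb --ignore-existing $alias_name/$bucket"
    done
}

# Example invocation (variable names are assumptions, taken from no file in this repo):
#   mc alias set prod https://minio-api.harkon.co.uk "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
#   minio_bucket_commands prod        # review the output
#   minio_bucket_commands prod | sh   # then execute it
```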

Initialize Databases

  • PostgreSQL:

    • Access: docker exec -it apa-postgres psql -U postgres
    • Create databases: CREATE DATABASE tax_system;
    • Verify: \l
    • Exit: \q
  • Neo4j:

    • Access: docker exec -it apa-neo4j cypher-shell -u neo4j -p <password>
    • Create constraints (if needed)
    • Exit: :exit
  • Qdrant:

    • Create collections via API or wait for services to create them

Phase 3: Monitoring Stack

  • Deploy: ./infra/scripts/deploy.sh production monitoring
  • Wait for services: sleep 30
  • Check status: docker ps | grep -E "prometheus|grafana|loki|promtail"
  • Access Grafana: https://grafana.harkon.co.uk
  • Login with credentials from .env
  • Verify Prometheus datasource
  • Verify Loki datasource
  • Import dashboards
  • Test queries

Phase 4: Application Services

Build and Push Images

  • Verify Gitea registry access: docker login gitea.harkon.co.uk
  • Build base images: ./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.1 harkon
  • Build service images: ./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 harkon
  • Verify images in Gitea: https://gitea.harkon.co.uk/harkon/-/packages

Deploy Services

  • Deploy: ./infra/scripts/deploy.sh production services
  • Wait for services: sleep 60
  • Check status: docker ps | grep -E "svc-|ui-review"
  • Check logs: docker compose -f infra/base/services.yaml --env-file infra/environments/production/.env logs -f
  • Verify all 14 services running
  • Check health endpoints
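
Counting 14 containers by eye is error-prone, so the check can be scripted. The sketch below reads container names on stdin and reports any expected service that is absent. The service names are an assumption based on the images this repository's CI pipeline builds; the real container names may carry a prefix, which the substring match tolerates. The function name `missing_services` is hypothetical.

```shell
#!/bin/sh
# Assumed service list (14 entries); adjust to match the deployed compose file.
EXPECTED_SERVICES="svc-coverage svc-extract svc-firm-connectors svc-forms svc-hmrc svc-ingestion svc-kg svc-normalize-map svc-ocr svc-rag-indexer svc-rag-retriever svc-reason svc-rpa ui-review"

# Hypothetical helper: read running container names on stdin, print missing ones.
# Returns 0 only if every expected service appears in the input.
missing_services() {
    running=$(cat)
    missing=0
    for name in $EXPECTED_SERVICES; do
        if ! printf '%s\n' "$running" | grep -qF -- "$name"; then
            echo "missing: $name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example invocation:
#   docker ps --format '{{.Names}}' | missing_services
```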

Phase 5: Configure Authentik OAuth

For each service that needs OAuth:

Grafana

  • Create OAuth provider in Authentik
  • Note client ID and secret
  • Update GRAFANA_OAUTH_CLIENT_SECRET in .env
  • Restart Grafana: docker restart grafana
  • Test OAuth login

MinIO

  • Create OAuth provider in Authentik
  • Note client ID and secret
  • Update AUTHENTIK_MINIO_CLIENT_SECRET in .env
  • Restart MinIO: docker restart apa-minio
  • Test OAuth login

Vault

  • Create OAuth provider in Authentik
  • Note client ID and secret
  • Update AUTHENTIK_VAULT_CLIENT_SECRET in .env
  • Configure Vault OIDC
  • Test OAuth login

UI Review

  • Create OAuth provider in Authentik
  • Note client ID and secret
  • Update AUTHENTIK_UI_REVIEW_CLIENT_SECRET in .env
  • Restart UI Review: docker restart ui-review
  • Test OAuth login

Post-Deployment Verification

Service Accessibility

  • Traefik Dashboard: https://traefik.harkon.co.uk
  • Authentik: https://auth.harkon.co.uk
  • Gitea: https://gitea.harkon.co.uk
  • Grafana: https://grafana.harkon.co.uk
  • Prometheus: https://prometheus.harkon.co.uk
  • Vault: https://vault.harkon.co.uk
  • MinIO: https://minio.harkon.co.uk
  • UI Review: https://app.harkon.co.uk

Health Checks

  • All services show as healthy in docker ps
  • No error logs in docker compose logs
  • Grafana shows metrics from Prometheus
  • Loki receiving logs
  • Traefik routing working correctly
  • SSL certificates valid

Functional Tests

  • Can log in to Authentik
  • Can log in to Grafana via OAuth
  • Can access MinIO console
  • Can push/pull from Gitea registry
  • Can access UI Review
  • Can query Prometheus
  • Can view logs in Loki

Performance Checks

  • Response times acceptable (<2s)
  • No memory leaks (check docker stats)
  • No CPU spikes
  • Disk usage reasonable

Rollback Plan

If deployment fails:

Rollback External Services

  • Stop service: cd infra/compose/<service> && docker compose down
  • Restore previous version
  • Restart: docker compose up -d

Rollback Application Infrastructure

  • Stop services: ./infra/scripts/deploy.sh production down
  • Restore data from backup
  • Deploy previous version
  • Verify functionality

Restore Data

  • PostgreSQL: docker exec -i apa-postgres psql -U postgres -d tax_system < backup.sql
  • Neo4j: docker exec apa-neo4j neo4j-admin load --from=/backup/neo4j.dump (Neo4j 4.x syntax; Neo4j 5 uses neo4j-admin database load)
  • MinIO: Restore from backup bucket
  • Vault: Restore from snapshot

Maintenance Checklist

Daily

  • Check service status: make status
  • Check logs for errors: make logs | grep ERROR
  • Check disk space: df -h
  • Check Grafana dashboards
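
The daily disk space check can be reduced to a pass/fail list. The sketch below filters `df -P` output for mounts at or above a usage threshold. The helper name `disks_over_threshold` and the 85% figure are assumptions, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print every mount point
# whose usage is at or above the given percentage.
disks_over_threshold() {
    threshold="$1"
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the trailing percent sign
            if (use + 0 >= limit) print $6 " " $5
        }'
}

# Example invocation:
#   df -P | disks_over_threshold 85
```

An empty result means every filesystem is below the threshold.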

Weekly

  • Review Grafana metrics
  • Check for security updates
  • Review logs for anomalies
  • Test backups

Monthly

  • Update Docker images
  • Rotate secrets
  • Review and update documentation
  • Test disaster recovery

Emergency Contacts

  • Infrastructure Lead: [Name]
  • DevOps Team: [Contact]
  • On-Call: [Contact]

Notes

  • Keep this checklist updated
  • Document any deviations
  • Note any issues encountered
  • Update runbooks based on experience