ai-tax-agent/infra/DEPLOYMENT_GUIDE.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00


# AI Tax Agent Infrastructure Deployment Guide
This guide covers deploying the AI Tax Agent infrastructure to the local, development, and production environments.
## Table of Contents
1. [Prerequisites](#prerequisites)
2. [Quick Start](#quick-start)
3. [Local Development](#local-development)
4. [Development Server](#development-server)
5. [Production Server](#production-server)
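6. [Troubleshooting](#troubleshooting)
7. [Maintenance](#maintenance)
8. [Security Best Practices](#security-best-practices)
9. [Support](#support)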
---
## Prerequisites
### Required Software
- Docker 24.0+ with Compose V2
- Git
- SSH access (for remote deployments)
- Domain with DNS access (for dev/prod)
### Required Accounts
- GoDaddy account (for DNS-01 challenge)
- Gitea account (for container registry)
- OpenAI/Anthropic API keys (optional)
### Network Requirements
- Ports 80, 443 open (for Traefik)
- Docker networks: `frontend`, `backend`
---
## Quick Start
### 1. Clone Repository
```bash
git clone <repository-url>
cd ai-tax-agent
```
### 2. Choose Environment
```bash
# Pick exactly one environment (uncomment the line you need):
export ENV=local            # Local development
# export ENV=development    # Development server
# export ENV=production     # Production server
```
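The later commands interpolate `$ENV` into file paths, so a typo only surfaces as a missing file further down. A minimal sketch of a guard, assuming the three environment names used in this guide; `validate_env` is a hypothetical helper, not a script shipped in the repository.

```bash
# Hypothetical helper (not part of the repository): accept only the three
# environment names used in this guide.
validate_env() {
  case "$1" in
    local|development|production)
      echo "Using environment: $1"
      ;;
    *)
      echo "Unknown environment: '$1' (expected local, development, or production)" >&2
      return 1
      ;;
  esac
}

# Usage: validate_env "$ENV" || exit 1
```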
### 3. Setup Environment File
```bash
# Copy template
cp infra/environments/$ENV/.env.example infra/environments/$ENV/.env
# Edit configuration
vim infra/environments/$ENV/.env
```
### 4. Generate Secrets (Dev/Prod only)
```bash
./scripts/generate-production-secrets.sh
```
### 5. Deploy
```bash
# Setup networks
./infra/scripts/setup-networks.sh
# Deploy all services
./infra/scripts/deploy.sh $ENV all
```
---
## Local Development
### Setup
1. **Create environment file**:
```bash
cp infra/environments/local/.env.example infra/environments/local/.env
```
2. **Edit configuration**:
```bash
vim infra/environments/local/.env
```
Key settings for local:
```env
DOMAIN=localhost
POSTGRES_PASSWORD=postgres
MINIO_ROOT_PASSWORD=minioadmin
GRAFANA_PASSWORD=admin
```
3. **Generate self-signed certificates** (optional):
```bash
./scripts/generate-dev-certs.sh
```
### Deploy
```bash
# Setup networks
./infra/scripts/setup-networks.sh
# Deploy infrastructure
./infra/scripts/deploy.sh local infrastructure
# Deploy monitoring
./infra/scripts/deploy.sh local monitoring
# Deploy services
./infra/scripts/deploy.sh local services
```
### Access Services
- **Grafana**: http://localhost:3000 (admin/admin)
- **MinIO Console**: http://localhost:9093 (minioadmin/minioadmin)
- **Vault**: http://localhost:8200 (token: dev-root-token)
- **Traefik Dashboard**: http://localhost:8080
### Development Workflow
1. Make code changes
2. Build images: `./scripts/build-and-push-images.sh localhost:5000 latest local`
3. Restart services: `./infra/scripts/deploy.sh local services`
4. Test changes
5. Check logs: `docker compose -f infra/base/services.yaml --env-file infra/environments/local/.env logs -f`
---
## Development Server
### Prerequisites
- Server with Docker installed
- Domain: `dev.harkon.co.uk`
- GoDaddy API credentials
- SSH access to server
### Setup
1. **SSH to development server**:
```bash
ssh deploy@dev-server.harkon.co.uk
```
2. **Clone repository**:
```bash
cd /opt
git clone <repository-url> ai-tax-agent
cd ai-tax-agent
```
3. **Create environment file**:
```bash
cp infra/environments/development/.env.example infra/environments/development/.env
```
4. **Generate secrets**:
```bash
./scripts/generate-production-secrets.sh
```
5. **Edit environment file**:
```bash
vim infra/environments/development/.env
```
Update:
- `DOMAIN=dev.harkon.co.uk`
- `EMAIL=dev@harkon.co.uk`
- API keys
- Registry credentials
6. **Setup GoDaddy DNS**:
```bash
# Create Traefik provider file
vim infra/configs/traefik/.provider.env
```
Add:
```env
GODADDY_API_KEY=your-api-key
GODADDY_API_SECRET=your-api-secret
```
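Because this file holds API credentials, it may be worth creating it with owner-only permissions rather than the editor's defaults. A minimal sketch, assuming the two values are already exported in the calling shell; `write_provider_env` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: write the GoDaddy credentials file so that only the
# current user can read it. Values are read from the calling environment.
# The body runs in a subshell, so the umask change does not leak out.
write_provider_env() (
  target="$1"
  : "${GODADDY_API_KEY:?GODADDY_API_KEY is not set}"
  : "${GODADDY_API_SECRET:?GODADDY_API_SECRET is not set}"
  umask 077   # files created from here on get mode 600
  printf 'GODADDY_API_KEY=%s\nGODADDY_API_SECRET=%s\n' \
    "$GODADDY_API_KEY" "$GODADDY_API_SECRET" > "$target"
  chmod 600 "$target"   # also tighten a file that already existed
)

# Usage: write_provider_env infra/configs/traefik/.provider.env
```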
### Deploy
```bash
# Setup networks
./infra/scripts/setup-networks.sh
# Deploy infrastructure
./infra/scripts/deploy.sh development infrastructure
# Give infrastructure services time to start (a fixed delay; it does not verify health)
sleep 30
# Deploy monitoring
./infra/scripts/deploy.sh development monitoring
# Deploy services
./infra/scripts/deploy.sh development services
```
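The `sleep 30` step is a fixed delay and does not confirm that anything is healthy. One possible replacement is to poll Docker's health status, sketched below. It assumes the container defines a Docker health check and is named as in the troubleshooting commands later in this guide (for example `postgres`); `wait_healthy` is a hypothetical helper.

```bash
# Hypothetical helper: poll a container's health status until it reports
# "healthy" or the timeout (in seconds) expires.
# Containers without a HEALTHCHECK report "none" here and will time out.
wait_healthy() (
  name="$1"
  timeout="${2:-120}"
  interval="${3:-5}"
  elapsed=0
  status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    status="$(docker inspect \
      --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
      "$name" 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$name not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
)

# Usage: wait_healthy postgres 120 && ./infra/scripts/deploy.sh development monitoring
```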
### Verify Deployment
```bash
# Check services
docker ps
# Check logs
docker compose -f infra/base/infrastructure.yaml --env-file infra/environments/development/.env logs -f
# Test endpoints
curl https://vault.dev.harkon.co.uk
curl https://grafana.dev.harkon.co.uk
```
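Plain `curl` prints the response body and exits successfully even on an HTTP error page. A small sketch that reports only the status code per URL and fails on 4xx/5xx responses or connection errors; `check_endpoints` is a hypothetical helper.

```bash
# Hypothetical helper: print the HTTP status for each URL and return non-zero
# if any request fails to connect or returns a 4xx/5xx status.
check_endpoints() (
  failed=0
  for url in "$@"; do
    if code="$(curl -sS -o /dev/null -w '%{http_code}' --fail --max-time 10 "$url")"; then
      echo "OK   $code $url"
    else
      echo "FAIL $code $url"
      failed=1
    fi
  done
  return "$failed"
)

# Usage:
#   check_endpoints https://vault.dev.harkon.co.uk https://grafana.dev.harkon.co.uk
```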
### Access Services
- **Grafana**: https://grafana.dev.harkon.co.uk
- **MinIO**: https://minio.dev.harkon.co.uk
- **Vault**: https://vault.dev.harkon.co.uk
- **UI Review**: https://ui-review.dev.harkon.co.uk
---
## Production Server
### Prerequisites
- Production server (141.136.35.199)
- Domain: `harkon.co.uk`
- Existing Traefik, Authentik, Gitea
- SSH access as `deploy` user
### Pre-Deployment Checklist
- [ ] Backup existing data
- [ ] Test in development first
- [ ] Generate production secrets
- [ ] Update DNS records
- [ ] Configure Authentik OAuth providers
- [ ] Setup Gitea container registry
- [ ] Build and push Docker images
### Setup
1. **SSH to production server**:
```bash
ssh deploy@141.136.35.199
```
2. **Navigate to project**:
```bash
cd /opt/ai-tax-agent
git pull origin main
```
3. **Verify environment file**:
```bash
grep '^DOMAIN=' infra/environments/production/.env
```
Should show:
```env
DOMAIN=harkon.co.uk
```
4. **Verify secrets are set**:
```bash
# List any values still set to the CHANGE_ME placeholder
grep -i "CHANGE_ME" infra/environments/production/.env
```
Should return nothing.
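
For scripted use, note that `grep` exits with status 0 when it finds a match, which is the failure case here. The sketch below inverts that and prints only the variable names, not their values; `check_no_placeholders` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: fail when the env file still contains CHANGE_ME
# placeholders. Only the variable names are printed, never the values.
check_no_placeholders() (
  env_file="$1"
  if [ ! -r "$env_file" ]; then
    echo "Cannot read $env_file" >&2
    return 2
  fi
  matches="$(grep -i 'CHANGE_ME' "$env_file" | cut -d= -f1 || true)"
  if [ -n "$matches" ]; then
    echo "Placeholder values remain for:" >&2
    echo "$matches" >&2
    return 1
  fi
  echo "No placeholders found in $env_file"
)

# Usage: check_no_placeholders infra/environments/production/.env || exit 1
```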
### Deploy Infrastructure
```bash
# Setup networks (if not already created)
./infra/scripts/setup-networks.sh
# Deploy infrastructure services
./infra/scripts/deploy.sh production infrastructure
```
This deploys:
- Vault (secrets management)
- MinIO (object storage)
- PostgreSQL (relational database)
- Neo4j (graph database)
- Qdrant (vector database)
- Redis (cache)
- NATS (message queue)
### Deploy Monitoring
```bash
./infra/scripts/deploy.sh production monitoring
```
This deploys:
- Prometheus (metrics)
- Grafana (dashboards)
- Loki (logs)
- Promtail (log collector)
### Deploy Services
```bash
./infra/scripts/deploy.sh production services
```
This deploys all 14 microservices.
### Post-Deployment
1. **Verify all services are running**:
```bash
docker ps | grep ai-tax-agent
```
2. **Check health**:
```bash
# Vault answers 501 until it is initialized and 503 while sealed (see step 4)
curl https://vault.harkon.co.uk/v1/sys/health
curl https://minio-api.harkon.co.uk/minio/health/live
```
3. **Configure Authentik OAuth**:
- Create OAuth providers for each service
- Update environment variables with client secrets
- Restart services
4. **Initialize Vault**:
```bash
# Access Vault
docker exec -it vault sh
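# Initialize (first time only); store the unseal keys and root token securely
vault operator init
# Unseal (if needed); repeat with different unseal keys until the threshold is met
vault operator unseal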
```
5. **Setup MinIO buckets**:
```bash
# Access MinIO console
# https://minio.harkon.co.uk
# Create buckets:
# - documents
# - embeddings
# - models
# - backups
```
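The buckets listed in step 5 could also be created from the command line with the MinIO client. This is a sketch under several assumptions: `mc` is installed, the S3 API is served at `minio-api.harkon.co.uk` (the host used in the health check in step 2), and `MINIO_ROOT_USER` / `MINIO_ROOT_PASSWORD` hold the root credentials. The alias name `taxagent` and the helper `create_buckets` are hypothetical.

```bash
# Hypothetical helper: create each bucket under an existing mc alias.
# --ignore-existing makes the call safe to repeat.
create_buckets() (
  alias_name="$1"
  shift
  for bucket in "$@"; do
    mc mb --ignore-existing "$alias_name/$bucket" || return 1
  done
)

# Usage (alias name "taxagent" is an assumption):
#   mc alias set taxagent https://minio-api.harkon.co.uk "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
#   create_buckets taxagent documents embeddings models backups
```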
### Access Services
All services available at `https://<service>.harkon.co.uk`:
- **UI Review**: https://ui-review.harkon.co.uk
- **Grafana**: https://grafana.harkon.co.uk
- **Prometheus**: https://prometheus.harkon.co.uk
- **Vault**: https://vault.harkon.co.uk
- **MinIO**: https://minio.harkon.co.uk
---
## Troubleshooting
### Services Not Starting
```bash
# Check logs
docker compose -f infra/base/infrastructure.yaml --env-file infra/environments/production/.env logs -f
# Check specific service
docker logs vault
# Check Docker daemon
sudo systemctl status docker
```
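When Vault is the service that appears down, the HTTP status of its `/v1/sys/health` endpoint usually narrows the cause faster than the logs. The sketch below maps Vault's default status codes to their meaning; `vault_health_meaning` is a hypothetical helper.

```bash
# Hypothetical helper: describe the default HTTP status codes returned by
# Vault's /v1/sys/health endpoint.
vault_health_meaning() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "disaster recovery secondary" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage:
#   code="$(curl -s -o /dev/null -w '%{http_code}' https://vault.harkon.co.uk/v1/sys/health)"
#   vault_health_meaning "$code"
```

A result of `not initialized` or `sealed` points to the `vault operator init` and `vault operator unseal` steps rather than to a container or routing fault.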
### Network Issues
```bash
# Check networks exist
docker network ls | grep -E "frontend|backend"
# Inspect network
docker network inspect frontend
# Recreate networks (removal fails while containers are still attached; stop them first)
docker network rm frontend backend
./infra/scripts/setup-networks.sh
```
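Removing the networks is disruptive, so a gentler first step is to create only what is missing. A minimal sketch of that idempotent check follows; `ensure_network` is a hypothetical helper, and what `setup-networks.sh` itself does is not shown in this guide.

```bash
# Hypothetical helper: create a Docker network only if it does not exist yet.
ensure_network() {
  if docker network inspect "$1" >/dev/null 2>&1; then
    echo "network $1 already exists"
  else
    docker network create "$1" >/dev/null && echo "network $1 created"
  fi
}

# Usage: ensure_network frontend && ensure_network backend
```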
### Traefik Routing Issues
```bash
# Check Traefik logs
docker logs traefik | grep -i error
# Check container labels
docker inspect vault | grep -A 20 Labels
# Check Traefik dashboard (open in a browser):
# https://traefik.harkon.co.uk/dashboard/
```
### Database Connection Issues
```bash
# Check PostgreSQL
docker exec -it postgres psql -U postgres -c "\l"
# Check Neo4j
docker exec -it neo4j cypher-shell -u neo4j -p "$NEO4J_PASSWORD"
# Check Redis
docker exec -it redis redis-cli ping
```
### Volume/Data Issues
```bash
# List volumes
docker volume ls
# Inspect volume
docker volume inspect postgres_data
# Backup volume
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup alpine tar czf /backup/postgres_backup.tar.gz /data
```
### SSL Certificate Issues
```bash
# Check Traefik logs for ACME errors
docker logs traefik | grep -i acme
# Check GoDaddy credentials
cat infra/configs/traefik/.provider.env
# Force certificate re-issuance (deletes every stored certificate; the ACME CA's rate limits may apply)
docker exec traefik rm -f /var/traefik/certs/acme.json
docker restart traefik
```
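Before deleting stored certificates, it can help to confirm what is actually being served. The sketch below reads the expiry date of the live certificate with `openssl`; `cert_expiry` is a hypothetical helper, and port 443 is assumed.

```bash
# Hypothetical helper: print the expiry date (notAfter) of the certificate
# currently served for a host on port 443.
cert_expiry() (
  host="$1"
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
)

# Usage: cert_expiry grafana.harkon.co.uk
```

If the date is still well in the future, the problem is more likely routing or DNS than certificate issuance.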
---
## Maintenance
### Update Services
```bash
# Pull latest code
git pull origin main
# Rebuild images
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.2 harkon
# Deploy updates
./infra/scripts/deploy.sh production services --pull
```
### Backup Data
```bash
# Backup all volumes
./scripts/backup-volumes.sh production
# Backup specific service
docker run --rm -v postgres_data:/data -v "$(pwd)":/backup alpine tar czf /backup/postgres_backup.tar.gz /data
```
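The single-volume command above always writes to the same file name, so each run overwrites the previous archive. A sketch of a timestamped variant, assuming the same `alpine` image and an absolute destination directory; `backup_volume` is a hypothetical helper and is unrelated to `backup-volumes.sh`.

```bash
# Hypothetical helper: archive a named volume into a directory, with a UTC
# timestamp in the file name so repeated runs do not overwrite each other.
# The destination must be an absolute path (Docker bind-mount requirement).
backup_volume() (
  volume="$1"
  dest="${2:-$(pwd)}"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  archive="${volume}_${stamp}.tar.gz"
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest":/backup \
    alpine tar czf "/backup/$archive" -C / data \
  && echo "$dest/$archive"
)

# Usage: backup_volume postgres_data /opt/backups
```

The volume is mounted read-only, but archiving the files of a running database can still capture an inconsistent state, so stopping the container first is the safer choice.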
### Scale Services
```bash
# Scale a service
docker compose -f infra/base/services.yaml --env-file infra/environments/production/.env up -d --scale svc-ingestion=3
```
### View Logs
```bash
# All services
docker compose -f infra/base/services.yaml --env-file infra/environments/production/.env logs -f
# Specific service
docker logs -f svc-ingestion
# With Loki (via Grafana, open in a browser):
# https://grafana.harkon.co.uk/explore
```
---
## Security Best Practices
1. **Rotate secrets regularly** - Use `generate-production-secrets.sh`
2. **Use Authentik SSO** - Enable for all services
3. **Keep images updated** - Regular security patches
4. **Monitor logs** - Check for suspicious activity
5. **Backup regularly** - Automated daily backups
6. **Use strong passwords** - Minimum 32 characters
7. **Limit network exposure** - Only expose necessary ports
8. **Enable audit logging** - Track all access
---
## Support
For issues:
1. Check logs
2. Review documentation
3. Check Traefik dashboard
4. Verify environment variables
5. Test in development first