harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00

# Unified Infrastructure Deployment Plan
## Executive Summary
This plan outlines the strategy to host both the **AI Tax Agent application** and **company services** (Nextcloud, Gitea, Portainer, Authentik) on the remote server at `141.136.35.199` while maintaining an efficient local development workflow.
## Current State Analysis
### Remote Server (`141.136.35.199`)
- **Location**: `/opt/compose/`
- **Existing Services**:
  - Traefik v3.5.1 (reverse proxy with GoDaddy DNS challenge)
  - Authentik 2025.8.1 (SSO/Authentication)
  - Gitea 1.24.5 (Git hosting)
  - Nextcloud (Cloud storage)
  - Portainer 2.33.1 (Docker management)
- **Networks**: `frontend` and `backend` (external)
- **Domain**: `harkon.co.uk`
- **SSL**: Let's Encrypt via GoDaddy DNS challenge
- **Exposed Subdomains**:
  - `traefik.harkon.co.uk`
  - `authentik.harkon.co.uk`
  - `gitea.harkon.co.uk`
  - `cloud.harkon.co.uk`
  - `portainer.harkon.co.uk`
### Local Repository (`infra/compose/`)
- **Compose Files**:
  - `docker-compose.local.yml` - Full stack for local development
  - `docker-compose.backend.yml` - Backend services (appears to be production-ready)
- **Application Services**:
  - 13 microservices (svc-ingestion, svc-extract, svc-forms, svc-hmrc, etc.)
  - Review UI application
  - Infrastructure: Vault, MinIO, Qdrant, Neo4j, Postgres, Redis, NATS, Prometheus, Grafana, Loki
- **Networks**: `ai-tax-agent-frontend` and `ai-tax-agent-backend`
- **Domain**: `local.lan` (for development)
- **Authentication**: Authentik with ForwardAuth middleware
## Challenges & Conflicts
### 1. **Duplicate Services**
- Both environments have Traefik and Authentik
- Need to decide: shared vs. isolated
### 2. **Network Naming**
- Remote: `frontend`, `backend`
- Local: `ai-tax-agent-frontend`, `ai-tax-agent-backend`
- Production: Needs one consistent naming scheme across both stacks
### 3. **Domain Management**
- Remote: `*.harkon.co.uk` (public)
- Local: `*.local.lan` (development)
- Production: Needs subdomains such as `app.harkon.co.uk` and `api.harkon.co.uk`
### 4. **SSL Certificates**
- Remote: GoDaddy DNS challenge (production)
- Local: Self-signed certificates
- Production: Must use GoDaddy DNS challenge
### 5. **Resource Isolation**
- Company services need to remain stable
- Application services need independent deployment/rollback
## Recommended Architecture
### Option A: Unified Traefik & Authentik (RECOMMENDED)
**Pros**:
- Single point of entry
- Shared authentication across all services
- Simplified SSL management
- Cost-effective (one Traefik, one Authentik)
**Cons**:
- Application deployments could affect company services
- Requires careful configuration management
**Implementation**:
```
/opt/compose/
├── traefik/              # Shared Traefik (existing)
├── authentik/            # Shared Authentik (existing)
├── company/              # Company services
│   ├── gitea/
│   ├── nextcloud/
│   └── portainer/
└── ai-tax-agent/         # Application services
    ├── infrastructure/   # App-specific infra (Vault, MinIO, Neo4j, etc.)
    └── services/         # Microservices
```
### Option B: Isolated Stacks
**Pros**:
- Complete isolation
- Independent scaling
- No cross-contamination
**Cons**:
- Duplicate Traefik/Authentik
- More complex SSL management
- Higher resource usage
- Users need separate logins
## Proposed Solution: Hybrid Approach
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                  Internet (*.harkon.co.uk)                  │
└──────────────────────────────┬──────────────────────────────┘
                          ┌────▼────┐
                          │ Traefik │ (Ports 80/443)
                          │ v3.5.1  │
                          └────┬────┘
              ┌────────────────┼────────────────┐
              │                │                │
         ┌────▼─────┐     ┌────▼────┐      ┌────▼─────┐
         │Authentik │     │ Company │      │   App    │
         │   SSO    │     │Services │      │ Services │
         └──────────┘     └────┬────┘      └────┬─────┘
                               │                │
                          ┌────┴────┐      ┌────┴────┐
                          │  Gitea  │      │  Vault  │
                          │Nextcloud│      │  MinIO  │
                          │Portainer│      │  Neo4j  │
                          └─────────┘      │ Qdrant  │
                                           │Postgres │
                                           │  Redis  │
                                           │  NATS   │
                                           │ 13 SVCs │
                                           │   UI    │
                                           └─────────┘
```
### Directory Structure
```
/opt/compose/
├── traefik/                    # Shared reverse proxy
│   ├── compose.yaml
│   ├── config/
│   │   ├── traefik.yaml        # Static config
│   │   ├── dynamic-company.yaml
│   │   └── dynamic-app.yaml
│   └── certs/
├── authentik/                  # Shared SSO
│   ├── compose.yaml
│   └── ...
├── company/                    # Company services namespace
│   ├── gitea/
│   │   └── compose.yaml
│   ├── nextcloud/
│   │   └── compose.yaml
│   └── portainer/
│       └── compose.yaml
└── ai-tax-agent/               # Application namespace
    ├── .env                    # Production environment
    ├── infrastructure.yaml     # Vault, MinIO, Neo4j, Qdrant, etc.
    ├── services.yaml           # All microservices
    └── monitoring.yaml         # Prometheus, Grafana, Loki
```
### Network Strategy
**Shared Networks**:
- `frontend` - For all services exposed via Traefik
- `backend` - For internal service communication
**Application-Specific Networks** (optional):
- `ai-tax-agent-internal` - For app-only internal communication
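Because both shared networks are external, they must exist before any stack that joins them is started. A minimal sketch of an idempotent helper, assuming it runs on the remote host with access to the Docker CLI. Creating the optional `ai-tax-agent-internal` network with `--internal` (Docker's flag for a network without external connectivity) is a suggestion, not something this plan specifies:

```bash
# Idempotently create a Docker network; any extra arguments are passed
# through to `docker network create` (e.g. --internal).
ensure_network() {
  local name="$1"
  shift
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "exists: $name"
    return 0
  fi
  docker network create "$@" "$name" >/dev/null || return 1
  echo "created: $name"
}

# Shared networks used by Traefik, the company services and the application.
ensure_shared_networks() {
  ensure_network frontend || return 1
  ensure_network backend || return 1
  # Optional app-only network; --internal gives it no route to the outside.
  ensure_network ai-tax-agent-internal --internal
}
```

Running `ensure_shared_networks` a second time only reports the networks as existing. Each compose file then declares the networks it joins as `external: true`.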
### Domain Mapping
**Company Services** (existing):
- `traefik.harkon.co.uk` - Traefik dashboard
- `authentik.harkon.co.uk` - Authentik SSO
- `gitea.harkon.co.uk` - Git hosting
- `cloud.harkon.co.uk` - Nextcloud
- `portainer.harkon.co.uk` - Docker management
**Application Services** (new):
- `app.harkon.co.uk` - Review UI
- `api.harkon.co.uk` - API Gateway (all microservices)
- `vault.harkon.co.uk` - Vault UI (admin only)
- `minio.harkon.co.uk` - MinIO Console (admin only)
- `neo4j.harkon.co.uk` - Neo4j Browser (admin only)
- `qdrant.harkon.co.uk` - Qdrant UI (admin only)
- `grafana.harkon.co.uk` - Grafana (monitoring)
- `prometheus.harkon.co.uk` - Prometheus (admin only)
- `loki.harkon.co.uk` - Loki (admin only)
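Each hostname above becomes one Traefik router. A sketch of a helper that prints the Docker labels for a single route. It assumes the HTTPS entrypoint is named `websecure`, the GoDaddy certificate resolver is named `godaddy`, and the middlewares are defined in Traefik's file provider (hence the `@file` suffix). The real names come from the Traefik static configuration already on the server:

```bash
# Print the Traefik v3 Docker labels that route one hostname to one container.
# Assumed names (check the server's Traefik static config): entrypoint
# "websecure", certificate resolver "godaddy", middlewares in the file provider.
traefik_labels() {
  local router="$1" host="$2" port="$3" middlewares="${4:-}"
  printf '%s\n' \
    "traefik.enable=true" \
    "traefik.docker.network=frontend" \
    "traefik.http.routers.${router}.rule=Host(\`${host}\`)" \
    "traefik.http.routers.${router}.entrypoints=websecure" \
    "traefik.http.routers.${router}.tls.certresolver=godaddy" \
    "traefik.http.services.${router}.loadbalancer.server.port=${port}"
  # Optional comma-separated middleware chain, e.g. "authentik-forwardauth@file".
  if [ -n "$middlewares" ]; then
    printf '%s\n' "traefik.http.routers.${router}.middlewares=${middlewares}"
  fi
}
```

For example, `traefik_labels review-ui app.harkon.co.uk 3000 authentik-forwardauth@file` prints the labels for the Review UI. The router name and port `3000` are placeholders. Each printed line can become one `- key=value` entry under the service's `labels:`.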
### Authentication Strategy
**Authentik Configuration**:
1. **Company Group** - Access to Gitea, Nextcloud, Portainer
2. **App Admin Group** - Full access to all app services
3. **App User Group** - Access to Review UI and API
4. **App Reviewer Group** - Access to Review UI only
**Middleware Configuration**:
- `authentik-forwardauth` - Standard auth for all services
- `admin-auth` - Requires admin group (Vault, MinIO, Neo4j, etc.)
- `reviewer-auth` - Requires reviewer or higher
- `rate-limit` - Standard rate limiting
- `api-rate-limit` - Stricter API rate limiting
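The shared middlewares could live in `dynamic-app.yaml`, which Traefik's file provider loads. A sketch that writes the ForwardAuth and rate-limit middlewares, assuming the Authentik server is reachable from Traefik as `authentik-server:9000`. The rate-limit figures are placeholders to tune. Group restrictions such as `admin-auth` and `reviewer-auth` are usually enforced by Authentik policy bindings on each application rather than by Traefik, so they are not modelled here:

```bash
# Write the shared middlewares to a Traefik file-provider config.
# Assumptions: Authentik answers on AUTHENTIK_OUTPOST (default
# http://authentik-server:9000) from Traefik's network; rate limits are placeholders.
write_app_middlewares() {
  local out="$1"
  local outpost="${AUTHENTIK_OUTPOST:-http://authentik-server:9000}"
  cat >"$out" <<EOF
http:
  middlewares:
    authentik-forwardauth:
      forwardAuth:
        address: "${outpost}/outpost.goauthentik.io/auth/traefik"
        trustForwardHeader: true
        authResponseHeaders:
          - X-authentik-username
          - X-authentik-groups
          - X-authentik-email
    rate-limit:
      rateLimit:
        average: 100
        burst: 200
    api-rate-limit:
      rateLimit:
        average: 20
        burst: 40
EOF
}
```

Routers would then reference `authentik-forwardauth@file`, or a chain such as `authentik-forwardauth@file,api-rate-limit@file` for the API. If the file provider is configured to watch its directory, Traefik picks up a rewritten file without a restart.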
## Local Development Workflow
### Development Environment
**Keep Existing Setup**:
- Use `docker-compose.local.yml` as-is
- Domain: `*.local.lan`
- Self-signed certificates
- Isolated networks: `ai-tax-agent-frontend`, `ai-tax-agent-backend`
- Full stack runs locally
**Benefits**:
- No dependency on remote server
- Fast iteration
- Complete isolation
- Works offline
### Development Commands
```bash
# Local development
make bootstrap # Initial setup
make up # Start all services
make down # Stop all services
make logs SERVICE=svc-ingestion
# Build and test
make build # Build all images
make test # Run tests
make test-integration # Integration tests
# Deploy to production
make deploy-production # Deploy to remote server
```
## Production Deployment Strategy
### Phase 1: Preparation (Week 1)
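1. **Backup Current State**
   ```bash
   # Open a shell on the server; the next two commands run there
   ssh deploy@141.136.35.199
   cd /opt/compose
   # Archives compose files and bind mounts only; named volumes are not included
   tar -czf ~/backup-$(date +%Y%m%d).tar.gz .
   ```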
2. **Create Production Environment File**
   - Copy `infra/compose/env.example` to `infra/compose/.env.production`
   - Update all secrets and passwords
   - Set `DOMAIN=harkon.co.uk`
   - Configure GoDaddy API credentials
3. **Update Traefik Configuration**
   - Merge local Traefik config with remote
   - Add application routes
   - Configure Authentik ForwardAuth
4. **Prepare Docker Images**
   - Build all application images
   - Tag with version numbers
   - Push to container registry (Gitea registry or Docker Hub)
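Step 2 is easy to complete only partially. A minimal pre-flight check for the production env file, assuming the GoDaddy credentials use `GODADDY_API_KEY` and `GODADDY_API_SECRET` (the variables Traefik's GoDaddy DNS provider reads) and that unfilled values in `env.example` contain the string `changeme`. The placeholder convention is an assumption, so check the actual example file:

```bash
# Keys the production env file must define with a non-empty value.
# Add the application's own secrets to this space-separated list.
REQUIRED_KEYS="DOMAIN GODADDY_API_KEY GODADDY_API_SECRET"

check_env_file() {
  local file="$1" key value status=0
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  for key in $REQUIRED_KEYS; do
    # If a key appears more than once, the last occurrence is checked.
    value="$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)"
    # Strip one pair of surrounding double quotes, so KEY="" counts as empty.
    value="${value#\"}"
    value="${value%\"}"
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  # Assumed placeholder convention; comment lines are ignored.
  if grep -qiE '^[^#]*=.*changeme' "$file"; then
    echo "placeholder value still present" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `check_env_file infra/compose/.env.production && echo ready`. The function reports every problem it finds before returning non-zero, rather than stopping at the first one.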
### Phase 2: Infrastructure Deployment (Week 2)
1. **Deploy Application Infrastructure**
   ```bash
   # On remote server
   cd /opt/compose/ai-tax-agent
   docker compose -f infrastructure.yaml up -d
   ```
2. **Initialize Services**
   - Vault: Unseal and configure
   - Postgres: Run migrations
   - Neo4j: Install plugins
   - MinIO: Create buckets
3. **Configure Authentik**
   - Create application groups
   - Configure OAuth providers
   - Set up ForwardAuth outpost
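Parts of step 2 can be scripted. A sketch covering the Vault seal check and MinIO bucket creation, assuming compose services named `vault` and `minio`, `VAULT_ADDR` set inside the Vault container, and an `mc` client plus `MINIO_ROOT_USER`/`MINIO_ROOT_PASSWORD` available inside the MinIO container. All of these are assumptions, and the bucket names in the usage line are hypothetical. Postgres migrations and Neo4j plugins depend on tooling the plan does not name, so they are left out:

```bash
# Compose file for the infrastructure stack (run from /opt/compose/ai-tax-agent).
INFRA_FILE="${INFRA_FILE:-infrastructure.yaml}"

# Print commands instead of running them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# `vault status` exits 0 when unsealed, 2 when sealed and 1 on error.
vault_sealed() {
  local rc=0
  docker compose -f "$INFRA_FILE" exec -T vault vault status >/dev/null 2>&1 || rc=$?
  [ "$rc" -eq 2 ]
}

# Register an mc alias inside the minio container, then create each bucket.
# The credentials are expanded inside the container, not on the host.
create_buckets() {
  local bucket
  run docker compose -f "$INFRA_FILE" exec -T minio sh -c \
    'mc alias set local http://localhost:9000 "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"' || return 1
  for bucket in "$@"; do
    run docker compose -f "$INFRA_FILE" exec -T minio mc mb --ignore-existing "local/$bucket" || return 1
  done
}
```

If `vault_sealed` succeeds, run `vault operator unseal` interactively inside the container so the key shares never appear on a command line. Then call, for example, `create_buckets documents exports`. Because of `--ignore-existing`, re-running it is harmless.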
### Phase 3: Application Deployment (Week 3)
1. **Deploy Microservices**
   ```bash
   docker compose -f services.yaml up -d
   ```
2. **Deploy Monitoring**
   ```bash
   docker compose -f monitoring.yaml up -d
   ```
3. **Verify Health**
   - Check all service health endpoints
   - Verify Traefik routing
   - Test authentication flow
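The health checks in step 3 can be partly automated. A sketch that polls each URL until it answers with a 2xx status, assuming `curl` is installed. The `/healthz` path in the usage line is hypothetical, so substitute each service's real health route. Requests through `authentik-forwardauth` are redirected to the login page (a 3xx) when unauthenticated, and this check counts that as a failure. Either target routes that are exempt from ForwardAuth or run the check from a container on the `backend` network:

```bash
# Poll one URL until it returns an HTTP 2xx status or the retries run out.
check_health() {
  local url="$1" retries="${2:-10}" delay="${3:-3}" code="" i=1
  while [ "$i" -le "$retries" ]; do
    # -w prints only the status code; a connection failure yields "000".
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    case "$code" in
      2??) echo "OK $url ($code)"; return 0 ;;
    esac
    if [ "$i" -lt "$retries" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "FAIL $url (last status: ${code:-none})" >&2
  return 1
}

# Check every URL; fail if any one of them fails.
check_all() {
  local url failed=0
  for url in "$@"; do
    check_health "$url" "${RETRIES:-10}" "${DELAY:-3}" || failed=1
  done
  return "$failed"
}
```

Usage: `check_all https://app.harkon.co.uk/ https://api.harkon.co.uk/healthz`. Every URL is checked even after one fails, so a single run lists all unhealthy endpoints.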
### Phase 4: Testing & Validation (Week 4)
1. **Smoke Tests**
2. **Integration Tests**
3. **Performance Tests**
4. **Security Audit**
## Deployment Files Structure
Create three new compose files for production:
1. **`infrastructure.yaml`** - Vault, MinIO, Neo4j, Qdrant, Postgres, Redis, NATS
2. **`services.yaml`** - All 13 microservices + UI
3. **`monitoring.yaml`** - Prometheus, Grafana, Loki
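Run in sequence, the three files form one deployment. A sketch, assuming it runs in `/opt/compose/ai-tax-agent`. It gives each file its own compose project name because files in the same directory otherwise share a default project name, and Compose then treats the other files' containers as orphans. It also uses `--wait` so each stack is running (or healthy, where health checks are defined) before the next one starts. Cross-stack traffic then relies on the shared `backend` network. Pick the project-name convention before the first deployment, since changing it later makes Compose create a second set of containers:

```bash
# Bring the stacks up in dependency order: infrastructure before the
# services that use it, monitoring last. DRY_RUN=1 prints the commands.
deploy_stacks() {
  local f project
  for f in infrastructure.yaml services.yaml monitoring.yaml; do
    # One project per file, e.g. ai-tax-agent-infrastructure.
    project="ai-tax-agent-${f%.yaml}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "docker compose -p $project -f $f up -d --wait"
    elif ! docker compose -p "$project" -f "$f" up -d --wait; then
      echo "deployment stopped at $f" >&2
      return 1
    fi
  done
}
```

`DRY_RUN=1 deploy_stacks` prints the three commands without running them. A failure in one stack stops the rollout before anything that depends on it is started.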
## Rollback Strategy
1. **Service-Level Rollback**: Use Docker image tags
2. **Full Rollback**: Restore from backup
3. **Gradual Rollout**: Deploy services incrementally
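Service-level rollback only works if every release is pushed under its own immutable tag. A sketch assuming images live in the Gitea container registry under a hypothetical `harkon` owner, that locally built images are tagged `<service>:latest`, and that `services.yaml` references images as `${REGISTRY}/<service>:${IMAGE_TAG}`. All three are assumptions. `--no-deps` recreates only the named service:

```bash
# Registry prefix; Gitea's registry uses <host>/<owner>/<image>:<tag>.
REGISTRY="${REGISTRY:-gitea.harkon.co.uk/harkon}"

image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY" "$1" "$2"
}

# Tag a locally built image with a release version and push it.
release_image() {
  local service="$1" version="$2"
  docker tag "$service:latest" "$(image_ref "$service" "$version")" &&
    docker push "$(image_ref "$service" "$version")"
}

# Redeploy one service at an earlier tag, leaving the others untouched.
rollback_service() {
  local service="$1" tag="$2"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "IMAGE_TAG=$tag docker compose -f services.yaml up -d --no-deps $service"
    return 0
  fi
  docker pull "$(image_ref "$service" "$tag")" &&
    IMAGE_TAG="$tag" docker compose -f services.yaml up -d --no-deps "$service"
}
```

With a single `IMAGE_TAG` shared by every service, a later `up` of the whole file moves all services to that tag. Per-service tag variables avoid this at the cost of a longer env file.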
## Monitoring & Maintenance
- **Logs**: Centralized in Loki
- **Metrics**: Prometheus + Grafana
- **Alerts**: Configure Grafana alerts
- **Backups**: Daily automated backups of volumes
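The daily volume backups can be a small script run from cron. A sketch that archives each named volume read-only through a throwaway container, assuming the `alpine` image and a hypothetical `BACKUP_DIR` on the host. A file-level copy of a running Postgres or Neo4j volume may be inconsistent, so stop those containers first or use the database's own dump tool for them:

```bash
# Host directory that receives the archives (hypothetical default).
BACKUP_DIR="${BACKUP_DIR:-/var/backups/ai-tax-agent}"

# Archive one named volume to <BACKUP_DIR>/<volume>-YYYYMMDD.tar.gz.
backup_volume() {
  local volume="$1" stamp
  stamp="$(date +%Y%m%d)"
  docker run --rm \
    -v "${volume}:/data:ro" \
    -v "${BACKUP_DIR}:/backup" \
    alpine tar -czf "/backup/${volume}-${stamp}.tar.gz" -C /data .
}

# Back up every volume given; report failures but keep going.
backup_volumes() {
  local v failed=0
  for v in "$@"; do
    backup_volume "$v" || { echo "backup failed: $v" >&2; failed=1; }
  done
  return "$failed"
}
```

Volume names carry the compose project prefix, so list the real ones with `docker volume ls` before passing them to `backup_volumes`. The date stamp matches the naming of the Phase 1 backup archive.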
## Next Steps
1. Review and approve this plan
2. Create production environment file
3. Create production compose files
4. Set up CI/CD pipeline for automated deployment
5. Execute Phase 1 (Preparation)