# Isolated Stacks Deployment Plan

## Executive Summary

This plan outlines the strategy to host both the **AI Tax Agent application** and **company services** (Nextcloud, Gitea, Portainer, Authentik) on the remote server at `141.136.35.199` while maintaining an efficient local development workflow.

## Current State Analysis

### Remote Server (`141.136.35.199`)

- **Location**: `/opt/compose/`
- **Existing Services**:
  - Traefik v3.5.1 (reverse proxy with GoDaddy DNS challenge)
  - Authentik 2025.8.1 (SSO/Authentication)
  - Gitea 1.24.5 (Git hosting)
  - Nextcloud (Cloud storage)
  - Portainer 2.33.1 (Docker management)
- **Networks**: `frontend` and `backend` (external)
- **Domain**: `harkon.co.uk`
- **SSL**: Let's Encrypt via GoDaddy DNS challenge
- **Exposed Subdomains**:
  - `traefik.harkon.co.uk`
  - `auth.harkon.co.uk`
  - `gitea.harkon.co.uk`
  - `cloud.harkon.co.uk`
  - `portainer.harkon.co.uk`

### Local Repository (`infra/compose/`)

- **Compose Files**:
  - `docker-compose.local.yml` - Full stack for local development
  - `docker-compose.backend.yml` - Backend services (appears to be production-ready)
- **Application Services**:
  - 13+ microservices (svc-ingestion, svc-extract, svc-forms, svc-hmrc, etc.)
  - UI Review application
  - Infrastructure: Vault, MinIO, Qdrant, Neo4j, Postgres, Redis, NATS, Prometheus, Grafana, Loki
- **Networks**: `ai-tax-agent-frontend` and `ai-tax-agent-backend`
- **Domain**: `local.lan` (for development)
- **Authentication**: Authentik with ForwardAuth middleware

## Challenges & Conflicts

### 1. **Duplicate Services**

- Both environments have Traefik and Authentik
- Need to decide: shared vs. isolated

### 2. **Network Naming**

- Remote: `frontend`, `backend`
- Local: `ai-tax-agent-frontend`, `ai-tax-agent-backend`
- Production needs: Consistent naming

### 3. **Domain Management**

- Remote: `*.harkon.co.uk` (public)
- Local: `*.local.lan` (development)
- Production: Need subdomains like `app.harkon.co.uk`, `api.harkon.co.uk`

### 4. **SSL Certificates**

- Remote: GoDaddy DNS challenge (production)
- Local: Self-signed certificates
- Production: Must use GoDaddy DNS challenge

### 5. **Resource Isolation**

- Company services need to remain stable
- Application services need independent deployment/rollback

## Decision: Keep Stacks Completely Separate

We will deploy the company services and the AI Tax Agent as two fully isolated stacks, each with its own Traefik and Authentik. This maximizes blast-radius isolation and avoids naming and DNS conflicts across environments.

Key implications:

- Separate external networks and DNS namespaces per stack
- Duplicate edge (Traefik) and IdP (Authentik), independent upgrades and rollbacks
- Slightly higher resource usage in exchange for strong isolation

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                  Internet (*.harkon.co.uk)                  │
└────────────────────────┬────────────────────────────────────┘
                         │
                    ┌────▼────┐
                    │ Traefik │ (Port 80/443)
                    │ v3.5.1  │
                    └────┬────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
   ┌────▼─────┐     ┌────▼────┐      ┌────▼─────┐
   │Authentik │     │ Company │      │   App    │
   │   SSO    │     │Services │      │ Services │
   └──────────┘     └─────────┘      └──────────┘
                         │                │
                    ┌────┴────┐      ┌────┴────┐
                    │  Gitea  │      │  Vault  │
                    │Nextcloud│      │  MinIO  │
                    │Portainer│      │  Neo4j  │
                    └─────────┘      │ Qdrant  │
                                     │ Postgres│
                                     │  Redis  │
                                     │  NATS   │
                                     │ 13 SVCs │
                                     │   UI    │
                                     └─────────┘
```

### Directory Structure (per stack)

```
/opt/compose/<stack>/
├── traefik/                 # Stack-local reverse proxy
│   ├── compose.yaml
│   ├── config/
│   │   ├── traefik.yaml     # Static config
│   │   ├── dynamic-company.yaml
│   │   └── dynamic-app.yaml
│   └── certs/
├── authentik/               # Stack-local SSO
│   ├── compose.yaml
│   └── ...
├── company/                 # Company services namespace
│   ├── gitea/
│   │   └── compose.yaml
│   ├── nextcloud/
│   │   └── compose.yaml
│   └── portainer/
│       └── compose.yaml
└── ai-tax-agent/            # Application namespace (if this is the app stack)
    ├── .env                 # Production environment
    ├── infrastructure.yaml  # Vault, MinIO, Neo4j, Qdrant, etc.
    ├── services.yaml        # All microservices
    └── monitoring.yaml      # Prometheus, Grafana, Loki
```

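As a rough sketch, the directory skeleton for one stack could be created with a small helper before any compose files are copied in. The function name `scaffold_stack` and the example path in the usage comment are illustrative assumptions, not something the repository is known to provide.

```bash
# Sketch only: create the per-stack directory skeleton.
# scaffold_stack is a hypothetical helper name.
scaffold_stack() {
  local root="$1"
  mkdir -p \
    "$root/traefik/config" "$root/traefik/certs" \
    "$root/authentik" \
    "$root/company/gitea" "$root/company/nextcloud" "$root/company/portainer" \
    "$root/ai-tax-agent"
}

# Usage (path is an example):
#   scaffold_stack /opt/compose/<stack>
```

`mkdir -p` leaves existing directories alone, so the helper can be re-run safely.
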
### Network Strategy

- Use stack-scoped network names to avoid collisions: `apa-frontend`, `apa-backend`.
- Only attach services that must be public to `apa-frontend`.
- Keep internal communication on `apa-backend`.

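A minimal sketch of how the stack-scoped networks might be created ahead of deployment, assuming they are managed as external Docker networks on the host. The helper names `stack_networks` and `ensure_network` are hypothetical; only the `apa-` names come from this plan.

```bash
# Sketch only: create the stack-scoped networks if they are missing.

# Print the two network names for a stack prefix, one per line.
stack_networks() {
  local prefix="$1"
  printf '%s-frontend\n%s-backend\n' "$prefix" "$prefix"
}

# Create a network only when `docker network inspect` cannot find it.
ensure_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name"
  fi
}

# Usage (on the Docker host):
#   for n in $(stack_networks apa); do ensure_network "$n"; done
```

Compose files that join these networks would then need to declare them with `external: true`, so that Compose does not try to create its own project-prefixed copies.
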
### Domain Mapping

**Company Services** (existing):

- `traefik.harkon.co.uk` - Traefik dashboard
- `auth.harkon.co.uk` - Authentik SSO
- `gitea.harkon.co.uk` - Git hosting
- `cloud.harkon.co.uk` - Nextcloud
- `portainer.harkon.co.uk` - Docker management

**Application Services** (app stack):

- `review.<domain>` - Review UI
- `api.<domain>` - API Gateway (microservices via Traefik)
- `vault.<domain>` - Vault UI (admin only)
- `minio.<domain>` - MinIO Console (admin only)
- `neo4j.<domain>` - Neo4j Browser (admin only)
- `qdrant.<domain>` - Qdrant UI (admin only)
- `grafana.<domain>` - Grafana (monitoring)
- `prometheus.<domain>` - Prometheus (admin only)
- `loki.<domain>` - Loki (admin only)

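Before routing is configured, it can help to confirm that every application hostname resolves. The sketch below expands the app-stack names listed above for a given domain and reports any that do not resolve. The function names are hypothetical, and `getent` is assumed to be available (typical on glibc-based Linux hosts).

```bash
# Sketch only: list the app-stack hostnames and find ones that do not resolve.

# Print the application hostnames for a domain, one per line.
app_hosts() {
  local domain="$1" name
  for name in review api vault minio neo4j qdrant grafana prometheus loki; do
    printf '%s.%s\n' "$name" "$domain"
  done
}

# Print each hostname that cannot be resolved from this machine.
unresolved_hosts() {
  local host
  for host in $(app_hosts "$1"); do
    getent hosts "$host" >/dev/null 2>&1 || echo "$host"
  done
}

# Usage:
#   unresolved_hosts "$DOMAIN"
```
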
### Authentication Strategy

**Authentik Configuration**:

1. **Company Group** - Access to Gitea, Nextcloud, Portainer
2. **App Admin Group** - Full access to all app services
3. **App User Group** - Access to Review UI and API
4. **App Reviewer Group** - Access to Review UI only

**Middleware Configuration**:

- `authentik-forwardauth` - Standard auth for all services
- `admin-auth` - Requires admin group (Vault, MinIO, Neo4j, etc.)
- `reviewer-auth` - Requires reviewer or higher
- `rate-limit` - Standard rate limiting
- `api-rate-limit` - Stricter API rate limiting

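One way to spot-check that the auth middlewares are actually attached is to request a protected URL without credentials and look at the status code. The sketch below treats a `302`, `401`, or `403` as a sign that the request was intercepted. That mapping is an assumption about how the ForwardAuth setup responds, and both function names are hypothetical.

```bash
# Sketch only: coarse check that an unauthenticated request is intercepted.

# Map an HTTP status code to a verdict for an unauthenticated request.
classify_status() {
  case "$1" in
    302|401|403) echo "protected" ;;  # redirected to login, or denied
    2??)         echo "open" ;;       # content served without auth
    *)           echo "unknown" ;;    # errors, other redirects, no response
  esac
}

# Fetch only the status code; redirects are not followed.
probe() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Usage:
#   classify_status "$(probe "https://vault.$DOMAIN/")"
```

An `open` result for an admin-only hostname would suggest the router is missing its middleware. A `protected` result only shows that something intercepted the request, not that group membership is enforced correctly.
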
## Implementation Notes

- `infra/base/infrastructure.yaml` now includes Traefik and Authentik in the infrastructure stack, with stack-scoped networks and service names.
- All infrastructure component service keys and container names use the `apa-` prefix to avoid DNS collisions on shared Docker hosts.
- Traefik static and dynamic configs live under `infra/base/traefik/config/`.

## Local Development Workflow

### Development Environment

**Keep Existing Setup**:

- Use `docker-compose.local.yml` as-is
- Domain: `*.local.lan`
- Self-signed certificates
- Isolated networks: `ai-tax-agent-frontend`, `ai-tax-agent-backend`
- Full stack runs locally

**Benefits**:

- No dependency on remote server
- Fast iteration
- Complete isolation
- Works offline

### Development Commands

```bash
# Local development
make bootstrap          # Initial setup
make up                 # Start all services
make down               # Stop all services
make logs SERVICE=svc-ingestion

# Build and test
make build              # Build all images
make test               # Run tests
make test-integration   # Integration tests

# Deploy to production
make deploy-production  # Deploy to remote server
```

## Production Deployment Strategy

### Phase 1: Preparation (Week 1)

1. **Backup Current State**

   ```bash
   ssh deploy@141.136.35.199
   cd /opt/compose
   tar -czf ~/backup-$(date +%Y%m%d).tar.gz .
   ```

2. **Create Production Environment File**
   - Copy `infra/compose/env.example` to `infra/compose/.env.production`
   - Update all secrets and passwords
   - Set `DOMAIN=harkon.co.uk`
   - Configure GoDaddy API credentials

3. **Update Traefik Configuration**
   - Merge local Traefik config with remote
   - Add application routes
   - Configure Authentik ForwardAuth

4. **Prepare Docker Images**
   - Build all application images
   - Push to container registry (Gitea registry or Docker Hub)
   - Tag with version numbers

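A small pre-flight check can catch an incomplete environment file before anything is deployed. The sketch below lists variables that are absent or empty. `DOMAIN` comes from this plan. `GODADDY_API_KEY` and `GODADDY_API_SECRET` are the names Traefik's GoDaddy DNS provider reads, but whether `env.example` uses them is an assumption. The helper name `missing_vars` is hypothetical.

```bash
# Sketch only: report required variables that are missing or empty in an env file.
missing_vars() {
  local file="$1" var
  shift
  for var in "$@"; do
    # A variable counts as set only if the file has a line VAR=<non-empty value>.
    grep -Eq "^${var}=.+" "$file" || echo "$var"
  done
}

# Usage:
#   missing_vars infra/compose/.env.production DOMAIN GODADDY_API_KEY GODADDY_API_SECRET
```

Empty output means every listed variable has a value. The check does not validate the values themselves.
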
### Phase 2: Infrastructure Deployment (Week 2)

1. **Deploy Application Infrastructure**

   ```bash
   # On remote server
   cd /opt/compose/ai-tax-agent
   docker compose -f infrastructure.yaml up -d
   ```

2. **Initialize Services**
   - Vault: Unseal and configure
   - Postgres: Run migrations
   - Neo4j: Install plugins
   - MinIO: Create buckets

3. **Configure Authentik**
   - Create application groups
   - Configure OAuth providers
   - Set up ForwardAuth outpost

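Initialization steps such as running migrations only make sense once the underlying container is ready. The sketch below shows one way to wait for that, assuming the compose file defines a healthcheck for the container. The names `retry` and `is_healthy` are hypothetical, and `apa-postgres` in the usage comment is a guess at a container name based on the `apa-` prefix convention.

```bash
# Sketch only: retry a command until it succeeds or attempts run out.
# retry <attempts> <delay-seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# True when Docker reports the container's healthcheck as "healthy".
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Usage (container name is an assumption):
#   retry 30 5 is_healthy apa-postgres && echo "postgres ready"
```
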
### Phase 3: Application Deployment (Week 3)

1. **Deploy Microservices**

   ```bash
   # On remote server, in /opt/compose/ai-tax-agent
   docker compose -f services.yaml up -d
   ```

2. **Deploy Monitoring**

   ```bash
   docker compose -f monitoring.yaml up -d
   ```

3. **Verify Health**
   - Check all service health endpoints
   - Verify Traefik routing
   - Test authentication flow

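As a first pass at the health verification, the services defined in a compose file can be compared with those Compose reports as running. This is a minimal sketch with a hypothetical helper name, `not_running`. It does not replace checking each service's own health endpoint.

```bash
# Sketch only: print services from the first list that are absent from the second.
# Both arguments are newline-separated service names.
not_running() {
  local expected="$1" running="$2" svc
  for svc in $expected; do
    # -x matches whole lines, -F treats the name as a literal string.
    printf '%s\n' "$running" | grep -qxF "$svc" || echo "$svc"
  done
}

# Usage (in the directory that holds services.yaml):
#   expected="$(docker compose -f services.yaml config --services)"
#   running="$(docker compose -f services.yaml ps --status running --services)"
#   not_running "$expected" "$running"
```

Empty output means every defined service has a running container.
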
### Phase 4: Testing & Validation (Week 4)

1. **Smoke Tests**
2. **Integration Tests**
3. **Performance Tests**
4. **Security Audit**

## Deployment Files Structure

Create three new compose files for production:

1. **`infrastructure.yaml`** - Vault, MinIO, Neo4j, Qdrant, Postgres, Redis, NATS
2. **`services.yaml`** - All 13 microservices + UI
3. **`monitoring.yaml`** - Prometheus, Grafana, Loki

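Once the three files exist, each can be validated before it is copied to the server. The sketch below wraps `docker compose config --quiet`, which parses a file without printing it. The `COMPOSE` override variable and the function name `validate_all` are assumptions added so the loop can be dry-run.

```bash
# Sketch only: validate compose files one by one.
# COMPOSE is a hypothetical override for the Compose command.
COMPOSE="${COMPOSE:-docker compose}"

validate_all() {
  local file failed=0
  for file in "$@"; do
    # $COMPOSE is intentionally unquoted so "docker compose" splits into two words.
    if $COMPOSE -f "$file" config --quiet; then
      echo "ok: $file"
    else
      echo "invalid: $file" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   validate_all infrastructure.yaml services.yaml monitoring.yaml
```
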
## Rollback Strategy

1. **Service-Level Rollback**: Use Docker image tags
2. **Full Rollback**: Restore from backup
3. **Gradual Rollout**: Deploy services incrementally

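A service-level rollback could look like the sketch below. It assumes that `services.yaml` takes each image tag from a per-service environment variable such as `SVC_OCR_TAG`. That naming scheme is purely illustrative, as are the helper names and the version number in the usage comment.

```bash
# Sketch only: roll one service back to an earlier image tag.

# Derive the (hypothetical) tag variable name, e.g. svc-ocr -> SVC_OCR_TAG.
tag_var() {
  printf '%s_TAG\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Recreate a single service with the given tag, leaving its dependencies untouched.
rollback_service() {
  local service="$1" tag="$2"
  env "$(tag_var "$service")=$tag" \
    docker compose -f services.yaml up -d --no-deps "$service"
}

# Usage (tag value is an example):
#   rollback_service svc-ocr 1.4.2
```

`--no-deps` keeps Compose from restarting linked services, which limits the rollback to the one container that changed.
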
## Monitoring & Maintenance

- **Logs**: Centralized in Loki
- **Metrics**: Prometheus + Grafana
- **Alerts**: Configure Grafana alerts
- **Backups**: Daily automated backups of volumes

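The daily volume backup could be sketched as follows, using a throwaway container to archive a named volume into a host directory. The helper names, the `alpine` image choice, and the volume name in the usage comment are assumptions. The destination must be an absolute path for the bind mount to work.

```bash
# Sketch only: archive a named Docker volume into a dated tarball.

# Build the archive file name from a volume name and a date stamp.
backup_name() {
  printf '%s-%s.tar.gz\n' "$1" "$2"
}

# Mount the volume read-only and tar its contents into the destination directory.
backup_volume() {
  local volume="$1" dest="$2" stamp
  stamp="$(date +%Y%m%d)"
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest":/backup \
    alpine tar -czf "/backup/$(backup_name "$volume" "$stamp")" -C /data .
}

# Usage (volume name and path are examples), e.g. from a daily cron job:
#   backup_volume apa-minio-data /opt/backups
```

For databases such as Postgres or Neo4j, a file-level copy of a live volume may be inconsistent, so a native dump is likely the safer source for those.
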
## Next Steps

1. Review and approve this plan
2. Create production environment file
3. Create production compose files
4. Set up CI/CD pipeline for automated deployment
5. Execute Phase 1 (Preparation)