clean up base infra
Some checks failed. All 26 CI/CD Pipeline jobs (push) were cancelled: Code Quality & Linting, Policy Validation, Test Suite, Build Docker Images (14 images), Security Scanning (5 images), Generate SBOM, Deploy to Staging, Deploy to Production, and Notifications.

harkon
2025-10-11 11:42:43 +01:00
parent b324ff09ef
commit f0f7674b8d
52 changed files with 663 additions and 5224 deletions


@@ -1,4 +1,4 @@
-# Unified Infrastructure Deployment Plan
+# Isolated Stacks Deployment Plan
 ## Executive Summary
@@ -19,7 +19,7 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 - **SSL**: Let's Encrypt via GoDaddy DNS challenge
 - **Exposed Subdomains**:
 - `traefik.harkon.co.uk`
-- `authentik.harkon.co.uk`
+- `auth.harkon.co.uk`
 - `gitea.harkon.co.uk`
 - `cloud.harkon.co.uk`
 - `portainer.harkon.co.uk`
@@ -61,48 +61,14 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 - Company services need to remain stable
 - Application services need independent deployment/rollback
-## Recommended Architecture
+# Decision: Keep Stacks Completely Separate
-### Option A: Unified Traefik & Authentik (RECOMMENDED)
+We will deploy the company services and the AI Tax Agent as two fully isolated stacks, each with its own Traefik and Authentik. This maximizes blast-radius isolation and avoids naming and DNS conflicts across environments.
-**Pros**:
-- Single point of entry
-- Shared authentication across all services
-- Simplified SSL management
-- Cost-effective (one Traefik, one Authentik)
-**Cons**:
-- Application deployments could affect company services
-- Requires careful configuration management
-**Implementation**:
-```
-/opt/compose/
-├── traefik/ # Shared Traefik (existing)
-├── authentik/ # Shared Authentik (existing)
-├── company/ # Company services
-│ ├── gitea/
-│ ├── nextcloud/
-│ └── portainer/
-└── ai-tax-agent/ # Application services
-├── infrastructure/ # App-specific infra (Vault, MinIO, Neo4j, etc.)
-└── services/ # Microservices
-```
-### Option B: Isolated Stacks
-**Pros**:
-- Complete isolation
-- Independent scaling
-- No cross-contamination
-**Cons**:
-- Duplicate Traefik/Authentik
-- More complex SSL management
-- Higher resource usage
-- Users need separate logins
-## Proposed Solution: Hybrid Approach
+Key implications:
+- Separate external networks and DNS namespaces per stack
+- Duplicate edge (Traefik) and IdP (Authentik), independent upgrades and rollbacks
+- Slightly higher resource usage in exchange for strong isolation
 ### Architecture Overview
@@ -136,18 +102,18 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 └─────────┘
 ```
-### Directory Structure
+### Directory Structure (per stack)
 ```
-/opt/compose/
-├── traefik/ # Shared reverse proxy
+/opt/compose/<stack>/
+├── traefik/ # Stack-local reverse proxy
 │ ├── compose.yaml
 │ ├── config/
 │ │ ├── traefik.yaml # Static config
 │ │ ├── dynamic-company.yaml
 │ │ └── dynamic-app.yaml
 │ └── certs/
-├── authentik/ # Shared SSO
+├── authentik/ # Stack-local SSO
 │ ├── compose.yaml
 │ └── ...
 ├── company/ # Company services namespace
@@ -157,7 +123,7 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 │ │ └── compose.yaml
 │ └── portainer/
 │ └── compose.yaml
-└── ai-tax-agent/ # Application namespace
+└── ai-tax-agent/ # Application namespace (if this is the app stack)
 ├── .env # Production environment
 ├── infrastructure.yaml # Vault, MinIO, Neo4j, Qdrant, etc.
 ├── services.yaml # All microservices
@@ -166,32 +132,29 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 ### Network Strategy
-**Shared Networks**:
-- `frontend` - For all services exposed via Traefik
-- `backend` - For internal service communication
-**Application-Specific Networks** (optional):
-- `ai-tax-agent-internal` - For app-only internal communication
+- Use stack-scoped network names to avoid collisions: `apa-frontend`, `apa-backend`.
+- Only attach services that must be public to `apa-frontend`.
+- Keep internal communication on `apa-backend`.
 ### Domain Mapping
 **Company Services** (existing):
 - `traefik.harkon.co.uk` - Traefik dashboard
-- `authentik.harkon.co.uk` - Authentik SSO
+- `auth.harkon.co.uk` - Authentik SSO
 - `gitea.harkon.co.uk` - Git hosting
 - `cloud.harkon.co.uk` - Nextcloud
 - `portainer.harkon.co.uk` - Docker management
-**Application Services** (new):
-- `app.harkon.co.uk` - Review UI
-- `api.harkon.co.uk` - API Gateway (all microservices)
-- `vault.harkon.co.uk` - Vault UI (admin only)
-- `minio.harkon.co.uk` - MinIO Console (admin only)
-- `neo4j.harkon.co.uk` - Neo4j Browser (admin only)
-- `qdrant.harkon.co.uk` - Qdrant UI (admin only)
-- `grafana.harkon.co.uk` - Grafana (monitoring)
-- `prometheus.harkon.co.uk` - Prometheus (admin only)
-- `loki.harkon.co.uk` - Loki (admin only)
+**Application Services** (app stack):
+- `review.<domain>` - Review UI
+- `api.<domain>` - API Gateway (microservices via Traefik)
+- `vault.<domain>` - Vault UI (admin only)
+- `minio.<domain>` - MinIO Console (admin only)
+- `neo4j.<domain>` - Neo4j Browser (admin only)
+- `qdrant.<domain>` - Qdrant UI (admin only)
+- `grafana.<domain>` - Grafana (monitoring)
+- `prometheus.<domain>` - Prometheus (admin only)
+- `loki.<domain>` - Loki (admin only)
 ### Authentication Strategy
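Note on the hunk above: a minimal Compose sketch of the stack-scoped networks and the `review.<domain>` route it describes. This is an illustration under stated assumptions, not the contents of `infra/base/infrastructure.yaml`. The image tags, the `DOMAIN` variable, the `websecure` entrypoint name, and port 3000 are hypothetical; only the `apa-` network names and the `ui-review` service name come from this commit page.

```yaml
# Sketch only: image tags, DOMAIN, the "websecure" entrypoint, and port 3000
# are assumptions for illustration, not values taken from the repository.
networks:
  apa-frontend:
    name: apa-frontend   # explicit name, so Compose adds no project prefix
  apa-backend:
    name: apa-backend

services:
  apa-traefik:
    image: traefik:v3.1
    container_name: apa-traefik
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"  # opt in per service
      - "--entrypoints.websecure.address=:443"
    ports:
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - apa-frontend

  ui-review:
    image: ui-review:latest
    networks:
      - apa-frontend   # public, reached through Traefik
      - apa-backend    # internal calls to other services
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=apa-frontend"
      - "traefik.http.routers.ui-review.rule=Host(`review.${DOMAIN}`)"
      - "traefik.http.routers.ui-review.entrypoints=websecure"
      - "traefik.http.routers.ui-review.tls=true"
      - "traefik.http.services.ui-review.loadbalancer.server.port=3000"
```

Setting `traefik.docker.network` matters once a container sits on both networks, since it tells Traefik which network address to route to. Services that should stay private would list only `apa-backend` and carry no Traefik labels.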
@@ -208,6 +171,12 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 - `rate-limit` - Standard rate limiting
 - `api-rate-limit` - Stricter API rate limiting
+## Implementation Notes
+- infra/base/infrastructure.yaml now includes Traefik and Authentik in the infrastructure stack with stack-scoped networks and service names.
+- All infrastructure component service keys and container names use the `apa-` prefix to avoid DNS collisions on shared Docker hosts.
+- Traefik static and dynamic configs live under `infra/base/traefik/config/`.
 ## Local Development Workflow
 ### Development Environment
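Note on the hunk above: the `rate-limit` and `api-rate-limit` middlewares it lists could be defined in a Traefik dynamic config file under `infra/base/traefik/config/`. The sketch below assumes the file provider is enabled; the numeric limits are placeholders chosen for illustration, and only the two middleware names come from the plan.

```yaml
# Hypothetical Traefik dynamic config (file provider).
# All average/period/burst values are assumed, not taken from the repository.
http:
  middlewares:
    rate-limit:            # standard limit
      rateLimit:
        average: 100       # allowed requests per period
        period: 1m
        burst: 50
    api-rate-limit:        # stricter limit for API routes
      rateLimit:
        average: 30
        period: 1m
        burst: 10
```

A router defined through Docker labels would then reference one of these with a label such as `traefik.http.routers.<name>.middlewares=api-rate-limit@file`.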
@@ -342,4 +311,3 @@ Create three new compose files for production:
 3. Create production compose files
 4. Set up CI/CD pipeline for automated deployment
 5. Execute Phase 1 (Preparation)