ai-tax-agent/docs/INFRASTRUCTURE_ARCHITECTURE.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00

Infrastructure Architecture

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                         Internet / Users                             │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ HTTPS
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         Traefik (Reverse Proxy)                      │
│  - SSL Termination (Let's Encrypt)                                  │
│  - Routing (Host-based)                                              │
│  - Load Balancing                                                    │
│  - Rate Limiting                                                     │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                ┌────────────────┼────────────────┐
                │                │                │
                ▼                ▼                ▼
┌───────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│    Authentik      │  │   External       │  │   Application    │
│    (SSO/Auth)     │  │   Services       │  │   Services       │
│                   │  │                  │  │                  │
│  - User Auth      │  │  - Gitea         │  │  - UI Review     │
│  - OAuth Provider │  │  - Nextcloud     │  │  - API Services  │
│  - SAML Provider  │  │  - Portainer     │  │  - ML Services   │
└───────────────────┘  └──────────────────┘  └──────────────────┘
                                                       │
                                                       │
                        ┌──────────────────────────────┼──────────────────────────────┐
                        │                              │                              │
                        ▼                              ▼                              ▼
        ┌───────────────────────────┐  ┌───────────────────────────┐  ┌───────────────────────────┐
        │   Infrastructure Layer    │  │    Data Layer             │  │   Monitoring Layer        │
        │                           │  │                           │  │                           │
        │  - Vault (Secrets)        │  │  - PostgreSQL             │  │  - Prometheus (Metrics)   │
        │  - MinIO (Object Storage) │  │  - Neo4j (Graph DB)       │  │  - Grafana (Dashboards)   │
        │  - Redis (Cache)          │  │  - Qdrant (Vector DB)     │  │  - Loki (Logs)            │
        │  - NATS (Message Queue)   │  │                           │  │  - Promtail (Collector)   │
        └───────────────────────────┘  └───────────────────────────┘  └───────────────────────────┘

Deployment Architecture

Production Environment

┌─────────────────────────────────────────────────────────────────────┐
│                    Production Server (141.136.35.199)                │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    External Services                         │   │
│  │  (Deployed from infra/compose/)                              │   │
│  │                                                               │   │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │   │
│  │  │ Traefik  │  │Authentik │  │  Gitea   │  │Nextcloud │   │   │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │   │
│  │                                                               │   │
│  │  Deployment: cd infra/compose/<service> && docker compose up │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │              Application Infrastructure                       │   │
│  │  (Deployed from infra/base/ + infra/environments/production/) │   │
│  │                                                               │   │
│  │  ┌──────────────────────────────────────────────────────┐   │   │
│  │  │  Infrastructure Services                              │   │   │
│  │  │  - Vault, MinIO, PostgreSQL, Neo4j, Qdrant           │   │   │
│  │  │  - Redis, NATS                                        │   │   │
│  │  └──────────────────────────────────────────────────────┘   │   │
│  │                                                               │   │
│  │  ┌──────────────────────────────────────────────────────┐   │   │
│  │  │  Application Services (14 microservices)             │   │   │
│  │  │  - svc-ingestion, svc-extract, svc-kg, etc.          │   │   │
│  │  │  - ui-review                                          │   │   │
│  │  └──────────────────────────────────────────────────────┘   │   │
│  │                                                               │   │
│  │  ┌──────────────────────────────────────────────────────┐   │   │
│  │  │  Monitoring Services                                  │   │   │
│  │  │  - Prometheus, Grafana, Loki, Promtail               │   │   │
│  │  └──────────────────────────────────────────────────────┘   │   │
│  │                                                               │   │
│  │  Deployment: ./infra/scripts/deploy.sh production <stack>    │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    Docker Networks                           │   │
│  │                                                               │   │
│  │  ┌──────────────┐              ┌──────────────┐            │   │
│  │  │   frontend   │◄────────────►│   backend    │            │   │
│  │  │  (external)  │              │  (external)  │            │   │
│  │  └──────────────┘              └──────────────┘            │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Local Development Environment

┌─────────────────────────────────────────────────────────────────────┐
│                    Local Machine (localhost)                         │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │              All-in-One Development Stack                    │   │
│  │  (Deployed from infra/compose/docker-compose.local.yml)      │   │
│  │                                                               │   │
│  │  ┌──────────────────────────────────────────────────────┐   │   │
│  │  │  All Services in One Compose File                    │   │   │
│  │  │  - Traefik, Authentik, Vault, MinIO                  │   │   │
│  │  │  - PostgreSQL, Neo4j, Qdrant, Redis, NATS            │   │   │
│  │  │  - Prometheus, Grafana, Loki                          │   │   │
│  │  │  - All 14 microservices + UI                          │   │   │
│  │  └──────────────────────────────────────────────────────┘   │   │
│  │                                                               │   │
│  │  Deployment: make run                                         │   │
│  │  OR: cd infra/compose && docker compose -f docker-compose... │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                       │
│  Alternative: Multi-Environment Structure (same as production)       │
│  Deployment: ./infra/scripts/deploy.sh local all                     │
└─────────────────────────────────────────────────────────────────────┘

Network Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Frontend Network                             │
│  (Public-facing services connected to Traefik)                       │
│                                                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │ Traefik  │  │Authentik │  │  Vault   │  │  MinIO   │           │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
│                                                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │ Grafana  │  │  Qdrant  │  │  Neo4j   │  │UI Review │           │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
└─────────────────────────────────────────────────────────────────────┘
                                 │
                                 │ Bridge
                                 │
┌─────────────────────────────────────────────────────────────────────┐
│                         Backend Network                              │
│  (Internal services, not directly accessible)                        │
│                                                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │PostgreSQL│  │  Redis   │  │   NATS   │  │  Vault   │           │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
│                                                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │  Neo4j   │  │  Qdrant  │  │  MinIO   │  │Authentik │           │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘           │
│                                                                       │
│  ┌────────────────────────────────────────────────────────┐         │
│  │         All Application Microservices                   │         │
│  │  (svc-ingestion, svc-extract, svc-kg, etc.)            │         │
│  └────────────────────────────────────────────────────────┘         │
└─────────────────────────────────────────────────────────────────────┘
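
The two diagrams above list several services on both networks. As a rough sketch (service names are lower-cased labels chosen here, and the "All Application Microservices" box is left out), the membership can be written down as sets to see which services are dual-homed and which are reachable only from the backend:

```python
# Network membership transcribed from the diagram above.
# The application microservices box is omitted for brevity.
FRONTEND = {"traefik", "authentik", "vault", "minio",
            "grafana", "qdrant", "neo4j", "ui-review"}
BACKEND = {"postgresql", "redis", "nats", "vault",
           "neo4j", "qdrant", "minio", "authentik"}


def classify(frontend: set[str], backend: set[str]) -> dict[str, list[str]]:
    """Split services into dual-homed, frontend-only, and backend-only."""
    return {
        "dual_homed": sorted(frontend & backend),
        "frontend_only": sorted(frontend - backend),
        "backend_only": sorted(backend - frontend),
    }


if __name__ == "__main__":
    for group, members in classify(FRONTEND, BACKEND).items():
        print(f"{group}: {', '.join(members)}")
```

Under this reading, PostgreSQL, Redis, and NATS are the only listed infrastructure services that never touch the frontend network.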

Data Flow

Document Ingestion Flow

User → Traefik → Authentik (Auth) → UI Review
                                        │
                                        ▼
                                  svc-ingestion
                                        │
                    ┌───────────────────┼───────────────────┐
                    ▼                   ▼                   ▼
                MinIO              PostgreSQL            NATS
            (Store file)         (Store metadata)    (Publish event)
                                                           │
                    ┌──────────────────────────────────────┤
                    │                   │                  │
                    ▼                   ▼                  ▼
              svc-extract          svc-ocr           svc-forms
                    │                   │                  │
                    └───────────────────┼──────────────────┘
                                        ▼
                                  svc-normalize-map
                                        │
                    ┌───────────────────┼───────────────────┐
                    ▼                   ▼                   ▼
                 Neo4j              Qdrant            PostgreSQL
            (Knowledge Graph)   (Vector Embeddings)  (Structured Data)
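
The first half of the flow above (store the file, store the metadata, publish an event, fan out to three consumers) might look roughly like the following. This is a sketch only: the in-memory `InMemoryBackends` class stands in for MinIO, PostgreSQL, and NATS, and the `doc.ingested` subject name and the metadata fields are hypothetical, not taken from the services themselves.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBackends:
    """Hypothetical stand-ins for the three stores svc-ingestion writes to."""
    objects: dict[str, bytes] = field(default_factory=dict)        # MinIO
    metadata: dict[str, dict] = field(default_factory=dict)        # PostgreSQL
    events: list[tuple[str, dict]] = field(default_factory=list)   # NATS


# Consumers of the ingestion event, as drawn in the diagram.
CONSUMERS = ("svc-extract", "svc-ocr", "svc-forms")


def ingest(backends: InMemoryBackends, doc_id: str,
           filename: str, content: bytes) -> dict:
    """Store the file, store its metadata, then publish an event."""
    backends.objects[doc_id] = content
    backends.metadata[doc_id] = {"filename": filename, "size": len(content)}
    event = {"doc_id": doc_id}
    backends.events.append(("doc.ingested", event))  # subject name is assumed
    return event


def fan_out(backends: InMemoryBackends) -> dict[str, list[str]]:
    """Deliver every published event to each consumer."""
    doc_ids = [payload["doc_id"] for _, payload in backends.events]
    return {consumer: list(doc_ids) for consumer in CONSUMERS}


if __name__ == "__main__":
    backends = InMemoryBackends()
    ingest(backends, "doc-1", "p60.pdf", b"%PDF-")
    print(fan_out(backends))
```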

Query/Retrieval Flow

User → Traefik → Authentik (Auth) → UI Review
                                        │
                                        ▼
                                  svc-rag-retriever
                                        │
                    ┌───────────────────┼───────────────────┐
                    ▼                   ▼                   ▼
                 Qdrant              Neo4j            PostgreSQL
            (Vector Search)    (Graph Traversal)   (SQL Queries)
                    │                   │                  │
                    └───────────────────┼──────────────────┘
                                        ▼
                                   svc-reason
                                        │
                                        ▼
                                  svc-coverage
                                        │
                                        ▼
                                   UI Review
                                        │
                                        ▼
                                      User
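
The fan-out from svc-rag-retriever to the three stores could be sketched as below. The diagram does not say whether the lookups run concurrently; this example assumes they do, and the three coroutine functions are hypothetical placeholders for real Qdrant, Neo4j, and PostgreSQL clients.

```python
import asyncio


# Hypothetical stand-ins; real clients would perform network I/O here.
async def vector_search(query: str) -> list[str]:
    return [f"qdrant:{query}"]


async def graph_traversal(query: str) -> list[str]:
    return [f"neo4j:{query}"]


async def sql_query(query: str) -> list[str]:
    return [f"postgresql:{query}"]


async def retrieve(query: str) -> dict[str, list[str]]:
    """Query all three stores concurrently and group results by source."""
    vectors, graph, rows = await asyncio.gather(
        vector_search(query), graph_traversal(query), sql_query(query)
    )
    return {"qdrant": vectors, "neo4j": graph, "postgresql": rows}


if __name__ == "__main__":
    print(asyncio.run(retrieve("allowable expenses")))
```

The grouped result is what would then be handed on to svc-reason in the diagram.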

Deployment Sequence

Production Deployment Order

1. External Services (One-time setup)
   ├── Traefik (reverse proxy)
   ├── Authentik (SSO)
   ├── Gitea (registry)
   ├── Nextcloud (optional)
   └── Portainer (optional)

2. Application Infrastructure
   ├── Vault (secrets)
   ├── PostgreSQL (database)
   ├── Neo4j (graph database)
   ├── Qdrant (vector database)
   ├── MinIO (object storage)
   ├── Redis (cache)
   └── NATS (message queue)

3. Monitoring Stack
   ├── Prometheus (metrics)
   ├── Loki (logs)
   ├── Promtail (log collector)
   └── Grafana (dashboards)

4. Application Services
   ├── Core Services (ingestion, extract, kg)
   ├── ML Services (ocr, rag-indexer, rag-retriever)
   ├── Processing Services (forms, normalize-map, reason)
   ├── Integration Services (hmrc, firm-connectors, rpa)
   ├── Analysis Services (coverage)
   └── UI (ui-review)
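
The ordering above is a simple dependency chain. One way to encode it so that a script can derive the order rather than hard-code it is a topological sort. The phase labels below are illustrative names chosen for this sketch, not necessarily the `<stack>` arguments that `deploy.sh` accepts.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases that must be deployed before it.
PHASE_DEPENDENCIES = {
    "external-services": set(),
    "application-infrastructure": {"external-services"},
    "monitoring": {"application-infrastructure"},
    "application-services": {"monitoring"},
}


def deployment_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return the phases in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, phase in enumerate(deployment_order(PHASE_DEPENDENCIES), start=1):
        print(f"{step}. {phase}")
```

`TopologicalSorter` also raises `CycleError` if a circular dependency is introduced by mistake, which a hard-coded list would not catch.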

Configuration Hierarchy

Environment Variables (.env files)
    │
    ├── infra/environments/production/.env
    │   ├── DOMAIN=harkon.co.uk
    │   ├── Database passwords
    │   ├── API keys
    │   └── OAuth secrets
    │
    ├── infra/compose/traefik/.provider.env
    │   └── GoDaddy API credentials
    │
    └── infra/compose/authentik/.env
        └── Authentik secrets

Service Configurations
    │
    ├── infra/compose/traefik/config/
    │   └── traefik.yaml (static config)
    │
    ├── infra/configs/traefik/
    │   └── app-middlewares.yml (dynamic config)
    │
    ├── infra/configs/grafana/
    │   ├── dashboards/
    │   └── provisioning/
    │
    └── infra/configs/prometheus/
        └── prometheus.yml
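
Before deploying, it can help to check that an `.env` file actually defines the values listed above. The following is a simplified sketch: it handles plain `KEY=value` lines and comments only, not the full quoting and interpolation rules Docker Compose applies, and `POSTGRES_PASSWORD` is a hypothetical key name used for illustration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return sorted(key for key in required if not values.get(key))


if __name__ == "__main__":
    sample = "# production\nDOMAIN=harkon.co.uk\nPOSTGRES_PASSWORD=\n"
    env = parse_env(sample)
    print(missing_keys(env, ["DOMAIN", "POSTGRES_PASSWORD"]))
```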

Security Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Security Layers                              │
│                                                                       │
│  1. Network Layer                                                    │
│     ├── Traefik (SSL/TLS termination)                               │
│     ├── Let's Encrypt (automatic certificates)                       │
│     └── Rate limiting & DDoS protection                              │
│                                                                       │
│  2. Authentication Layer                                             │
│     ├── Authentik (SSO/OAuth/SAML)                                  │
│     ├── ForwardAuth middleware                                       │
│     └── Session management                                           │
│                                                                       │
│  3. Authorization Layer                                              │
│     ├── Authentik policies                                           │
│     ├── Service-level permissions                                    │
│     └── API key validation                                           │
│                                                                       │
│  4. Secrets Management                                               │
│     ├── Vault (runtime secrets)                                      │
│     ├── Environment variables (.env files)                           │
│     └── Docker secrets                                               │
│                                                                       │
│  5. Network Isolation                                                │
│     ├── Frontend network (public)                                    │
│     ├── Backend network (private)                                    │
│     └── Service-to-service communication                             │
│                                                                       │
│  6. Data Encryption                                                  │
│     ├── TLS in transit                                               │
│     ├── Database encryption at rest                                  │
│     └── Object storage encryption                                    │
└─────────────────────────────────────────────────────────────────────┘
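
The ForwardAuth step in layer 2 follows a simple rule in Traefik: the request is first sent to the authentication service, a 2xx answer lets the original request through, and any other answer is returned to the client as-is (for example a redirect to a login page). A minimal model of that decision, with a hypothetical `Response` type and a `call_backend` callable standing in for the protected service:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    status: int
    body: str = ""


def forward_auth(auth_response: Response,
                 call_backend: Callable[[], Response]) -> Response:
    """Pass the request on for a 2xx auth answer, else return the auth answer."""
    if 200 <= auth_response.status < 300:
        return call_backend()
    return auth_response


if __name__ == "__main__":
    backend = lambda: Response(200, "review queue")
    print(forward_auth(Response(200), backend))
    print(forward_auth(Response(302, "redirect to login"), backend))
```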

Monitoring & Observability

┌─────────────────────────────────────────────────────────────────────┐
│                    Monitoring Architecture                           │
│                                                                       │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                      Grafana                                  │  │
│  │  (Unified dashboard for metrics and logs)                    │  │
│  └────────────┬─────────────────────────────────┬───────────────┘  │
│               │                                 │                   │
│               ▼                                 ▼                   │
│  ┌────────────────────────┐      ┌────────────────────────┐       │
│  │      Prometheus        │      │         Loki           │       │
│  │  (Metrics collection)  │      │  (Log aggregation)     │       │
│  └────────────┬───────────┘      └────────────┬───────────┘       │
│               │                                │                   │
│               │                                │                   │
│  ┌────────────┴───────────┐      ┌────────────┴───────────┐       │
│  │   Service Metrics      │      │      Promtail          │       │
│  │  - /metrics endpoints  │      │  (Log collection)      │       │
│  │  - Health checks       │      └────────────┬───────────┘       │
│  │  - Custom metrics      │                   │                   │
│  └────────────────────────┘      ┌────────────┴───────────┐       │
│                                   │   Container Logs       │       │
│                                   │  - stdout/stderr       │       │
│                                   │  - Application logs    │       │
│                                   └────────────────────────┘       │
└─────────────────────────────────────────────────────────────────────┘
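
Each service's `/metrics` endpoint returns plain text in the Prometheus exposition format. The sketch below renders a single counter by hand to show what Prometheus scrapes; in practice a client library would normally produce this. The metric name `documents_ingested_total` is a made-up example, and label values are not escaped here.

```python
def render_counter(name: str, help_text: str,
                   samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter metric in Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        suffix = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_counter(
        "documents_ingested_total",
        "Documents accepted by the ingestion service.",
        [({"service": "svc-ingestion"}, 3)],
    ), end="")
```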

Backup & Disaster Recovery

┌─────────────────────────────────────────────────────────────────────┐
│                    Backup Strategy                                   │
│                                                                       │
│  Daily Backups:                                                      │
│  ├── PostgreSQL (pg_dump)                                           │
│  ├── Neo4j (neo4j-admin dump)                                       │
│  ├── Qdrant (snapshot)                                              │
│  ├── Vault (snapshot)                                               │
│  └── MinIO (bucket sync)                                            │
│                                                                       │
│  Weekly Backups:                                                     │
│  ├── Full system snapshot                                           │
│  ├── Configuration files                                            │
│  └── SSL certificates                                               │
│                                                                       │
│  Retention:                                                          │
│  ├── Daily: 7 days                                                  │
│  ├── Weekly: 4 weeks                                                │
│  └── Monthly: 12 months                                             │
│                                                                       │
│  Recovery:                                                           │
│  ├── RTO: 4 hours                                                   │
│  └── RPO: 24 hours                                                  │
└─────────────────────────────────────────────────────────────────────┘
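
The retention rules above can be applied mechanically when pruning old backups. The sketch below assumes each backup is tagged with its kind and the date it was taken, and approximates "12 months" as 365 days; both are assumptions made for illustration.

```python
from datetime import date, timedelta

# Retention windows from the table above; 12 months approximated as 365 days.
RETENTION = {
    "daily": timedelta(days=7),
    "weekly": timedelta(weeks=4),
    "monthly": timedelta(days=365),
}


def is_expired(kind: str, taken_on: date, today: date) -> bool:
    """Return True when a backup is older than its retention window."""
    return today - taken_on > RETENTION[kind]


if __name__ == "__main__":
    today = date(2025, 10, 11)
    print(is_expired("daily", date(2025, 10, 1), today))    # 10 days old
    print(is_expired("weekly", date(2025, 9, 20), today))   # 21 days old
```

With backups taken once a day, the worst-case data loss is the interval between two backups, which is consistent with the stated 24-hour RPO.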