clean up base infra
Some checks failed
CI/CD Pipeline / Code Quality & Linting (push) Has been cancelled
CI/CD Pipeline / Policy Validation (push) Has been cancelled
CI/CD Pipeline / Test Suite (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-firm-connectors) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-forms) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-hmrc) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ingestion) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-normalize-map) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ocr) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-indexer) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-reason) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rpa) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (ui-review) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (ui-review) (push) Has been cancelled
CI/CD Pipeline / Generate SBOM (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Notifications (push) Has been cancelled
@@ -99,13 +99,13 @@
 - [ ] Verify environment: `cat infra/environments/production/.env`
 - [ ] Deploy: `./infra/scripts/deploy.sh production infrastructure`
 - [ ] Wait for services: `sleep 30`
-- [ ] Check status: `docker ps | grep -E "vault|minio|postgres|neo4j|qdrant|redis|nats"`
+- [ ] Check status: `docker ps | grep -E "apa-vault|apa-minio|apa-postgres|apa-neo4j|apa-qdrant|apa-redis|apa-nats"`
 - [ ] Verify Vault: `curl https://vault.harkon.co.uk/v1/sys/health`
 - [ ] Verify MinIO: `curl https://minio-api.harkon.co.uk/minio/health/live`
-- [ ] Verify PostgreSQL: `docker exec postgres pg_isready`
+- [ ] Verify PostgreSQL: `docker exec apa-postgres pg_isready`
 - [ ] Verify Neo4j: `curl http://localhost:7474`
 - [ ] Verify Qdrant: `curl http://localhost:6333/health`
-- [ ] Verify Redis: `docker exec redis redis-cli ping`
+- [ ] Verify Redis: `docker exec apa-redis redis-cli ping`
 - [ ] Verify NATS: `docker logs nats | grep "Server is ready"`
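The verification steps above can be wrapped in a small retry helper so that transient startup delays do not fail the checklist. A minimal sketch (the endpoints and container names in the comments are the ones from the checklist; tune attempts and delay to taste):

```shell
#!/bin/sh
# wait_for: retry a command until it succeeds or attempts run out.
wait_for() {
  # $1 = command string, $2 = max attempts (default 10), $3 = delay in seconds (default 3)
  _cmd=$1; _max=${2:-10}; _delay=${3:-3}; _i=1
  while [ "$_i" -le "$_max" ]; do
    if eval "$_cmd" >/dev/null 2>&1; then
      echo "ok: $_cmd"
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  echo "failed after $_max attempts: $_cmd" >&2
  return 1
}

# Example checks from the list above:
# wait_for 'curl -fsS https://vault.harkon.co.uk/v1/sys/health' 10 5
# wait_for 'docker exec apa-postgres pg_isready' 10 3
# wait_for 'docker exec apa-redis redis-cli ping' 10 3
```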
#### Initialize Vault

@@ -133,13 +133,13 @@
 #### Initialize Databases

 - [ ] PostgreSQL:
-  - [ ] Access: `docker exec -it postgres psql -U postgres`
+  - [ ] Access: `docker exec -it apa-postgres psql -U postgres`
   - [ ] Create databases: `CREATE DATABASE tax_system;`
   - [ ] Verify: `\l`
   - [ ] Exit: `\q`

 - [ ] Neo4j:
-  - [ ] Access: `docker exec -it neo4j cypher-shell -u neo4j -p <password>`
+  - [ ] Access: `docker exec -it apa-neo4j cypher-shell -u neo4j -p <password>`
   - [ ] Create constraints (if needed)
   - [ ] Exit: `:exit`
@@ -219,13 +219,13 @@ For each service that needs OAuth:
 ### Service Accessibility

 - [ ] Traefik Dashboard: `https://traefik.harkon.co.uk`
-- [ ] Authentik: `https://authentik.harkon.co.uk`
+- [ ] Authentik: `https://auth.harkon.co.uk`
 - [ ] Gitea: `https://gitea.harkon.co.uk`
 - [ ] Grafana: `https://grafana.harkon.co.uk`
 - [ ] Prometheus: `https://prometheus.harkon.co.uk`
 - [ ] Vault: `https://vault.harkon.co.uk`
 - [ ] MinIO: `https://minio.harkon.co.uk`
-- [ ] UI Review: `https://ui-review.harkon.co.uk`
+- [ ] UI Review: `https://app.harkon.co.uk`

 ### Health Checks
@@ -274,8 +274,8 @@ If deployment fails:
 ### Restore Data

-- [ ] PostgreSQL: `docker exec -i postgres psql -U postgres -d tax_system < backup.sql`
-- [ ] Neo4j: `docker exec neo4j neo4j-admin load --from=/backup/neo4j.dump`
+- [ ] PostgreSQL: `docker exec -i apa-postgres psql -U postgres -d tax_system < backup.sql`
+- [ ] Neo4j: `docker exec apa-neo4j neo4j-admin load --from=/backup/neo4j.dump`
 - [ ] MinIO: Restore from backup bucket
 - [ ] Vault: Restore from snapshot

@@ -320,4 +320,3 @@ If deployment fails:
 - Document any deviations
 - Note any issues encountered
 - Update runbooks based on experience
-
@@ -1,4 +1,4 @@
-# Unified Infrastructure Deployment Plan
+# Isolated Stacks Deployment Plan

 ## Executive Summary

@@ -19,7 +19,7 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 - **SSL**: Let's Encrypt via GoDaddy DNS challenge
 - **Exposed Subdomains**:
   - `traefik.harkon.co.uk`
-  - `authentik.harkon.co.uk`
+  - `auth.harkon.co.uk`
   - `gitea.harkon.co.uk`
   - `cloud.harkon.co.uk`
   - `portainer.harkon.co.uk`
@@ -61,48 +61,14 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 - Company services need to remain stable
 - Application services need independent deployment/rollback

-## Recommended Architecture
+# Decision: Keep Stacks Completely Separate

-### Option A: Unified Traefik & Authentik (RECOMMENDED)
+We will deploy the company services and the AI Tax Agent as two fully isolated stacks, each with its own Traefik and Authentik. This maximizes blast-radius isolation and avoids naming and DNS conflicts across environments.
-
-**Pros**:
-- Single point of entry
-- Shared authentication across all services
-- Simplified SSL management
-- Cost-effective (one Traefik, one Authentik)
-
-**Cons**:
-- Application deployments could affect company services
-- Requires careful configuration management
-
-**Implementation**:
-```
-/opt/compose/
-├── traefik/          # Shared Traefik (existing)
-├── authentik/        # Shared Authentik (existing)
-├── company/          # Company services
-│   ├── gitea/
-│   ├── nextcloud/
-│   └── portainer/
-└── ai-tax-agent/     # Application services
-    ├── infrastructure/   # App-specific infra (Vault, MinIO, Neo4j, etc.)
-    └── services/         # Microservices
-```
-
-### Option B: Isolated Stacks
-
-**Pros**:
-- Complete isolation
-- Independent scaling
-- No cross-contamination
-
-**Cons**:
-- Duplicate Traefik/Authentik
-- More complex SSL management
-- Higher resource usage
-- Users need separate logins
-
-## Proposed Solution: Hybrid Approach
+Key implications:
+- Separate external networks and DNS namespaces per stack
+- Duplicate edge (Traefik) and IdP (Authentik), independent upgrades and rollbacks
+- Slightly higher resource usage in exchange for strong isolation

 ### Architecture Overview
@@ -136,18 +102,18 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 └─────────┘
 ```

-### Directory Structure
+### Directory Structure (per stack)

 ```
-/opt/compose/
-├── traefik/                # Shared reverse proxy
+/opt/compose/<stack>/
+├── traefik/                # Stack-local reverse proxy
 │   ├── compose.yaml
 │   ├── config/
 │   │   ├── traefik.yaml    # Static config
 │   │   ├── dynamic-company.yaml
 │   │   └── dynamic-app.yaml
 │   └── certs/
-├── authentik/              # Shared SSO
+├── authentik/              # Stack-local SSO
 │   ├── compose.yaml
 │   └── ...
 ├── company/                # Company services namespace

@@ -157,7 +123,7 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 │   │   └── compose.yaml
 │   └── portainer/
 │       └── compose.yaml
-└── ai-tax-agent/           # Application namespace
+└── ai-tax-agent/           # Application namespace (if this is the app stack)
     ├── .env                # Production environment
     ├── infrastructure.yaml # Vault, MinIO, Neo4j, Qdrant, etc.
     ├── services.yaml       # All microservices
@@ -166,32 +132,29 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 ### Network Strategy

-**Shared Networks**:
-- `frontend` - For all services exposed via Traefik
-- `backend` - For internal service communication
-
-**Application-Specific Networks** (optional):
-- `ai-tax-agent-internal` - For app-only internal communication
+- Use stack-scoped network names to avoid collisions: `apa-frontend`, `apa-backend`.
+- Only attach services that must be public to `apa-frontend`.
+- Keep internal communication on `apa-backend`.
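As a sketch, the network rules above could look like this in a stack's compose file (service and image names here are illustrative, not taken from the repo):

```yaml
# Sketch only: stack-scoped networks for the app stack.
networks:
  apa-frontend:
    external: true   # created once per stack: docker network create apa-frontend
  apa-backend:
    internal: true   # internal-only traffic, no published routes

services:
  ui-review:
    image: ui-review:latest
    networks: [apa-frontend, apa-backend]   # public via Traefik
  svc-reason:
    image: svc-reason:latest
    networks: [apa-backend]                 # internal only
```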

 ### Domain Mapping

 **Company Services** (existing):
 - `traefik.harkon.co.uk` - Traefik dashboard
-- `authentik.harkon.co.uk` - Authentik SSO
+- `auth.harkon.co.uk` - Authentik SSO
 - `gitea.harkon.co.uk` - Git hosting
 - `cloud.harkon.co.uk` - Nextcloud
 - `portainer.harkon.co.uk` - Docker management

-**Application Services** (new):
-- `app.harkon.co.uk` - Review UI
-- `api.harkon.co.uk` - API Gateway (all microservices)
-- `vault.harkon.co.uk` - Vault UI (admin only)
-- `minio.harkon.co.uk` - MinIO Console (admin only)
-- `neo4j.harkon.co.uk` - Neo4j Browser (admin only)
-- `qdrant.harkon.co.uk` - Qdrant UI (admin only)
-- `grafana.harkon.co.uk` - Grafana (monitoring)
-- `prometheus.harkon.co.uk` - Prometheus (admin only)
-- `loki.harkon.co.uk` - Loki (admin only)
+**Application Services** (app stack):
+- `review.<domain>` - Review UI
+- `api.<domain>` - API Gateway (microservices via Traefik)
+- `vault.<domain>` - Vault UI (admin only)
+- `minio.<domain>` - MinIO Console (admin only)
+- `neo4j.<domain>` - Neo4j Browser (admin only)
+- `qdrant.<domain>` - Qdrant UI (admin only)
+- `grafana.<domain>` - Grafana (monitoring)
+- `prometheus.<domain>` - Prometheus (admin only)
+- `loki.<domain>` - Loki (admin only)

 ### Authentication Strategy
@@ -208,6 +171,12 @@ This plan outlines the strategy to host both the **AI Tax Agent application** an
 - `rate-limit` - Standard rate limiting
 - `api-rate-limit` - Stricter API rate limiting

+## Implementation Notes
+
+- `infra/base/infrastructure.yaml` now includes Traefik and Authentik in the infrastructure stack, with stack-scoped networks and service names.
+- All infrastructure component service keys and container names use the `apa-` prefix to avoid DNS collisions on shared Docker hosts.
+- Traefik static and dynamic configs live under `infra/base/traefik/config/`.
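A minimal sketch of the `apa-` prefixing described in the notes above (the images shown are placeholders, not the repo's actual service definitions):

```yaml
# Sketch: stack-scoped service keys and container names.
services:
  apa-postgres:
    image: postgres:16
    container_name: apa-postgres   # avoids clashing with another stack's "postgres"
    networks: [apa-backend]
  apa-redis:
    image: redis:7
    container_name: apa-redis
    networks: [apa-backend]

networks:
  apa-backend: {}
```

Both the Docker-internal DNS name and the `docker exec` target become `apa-postgres`, which is why the checklist commands elsewhere in this commit switch from `postgres` to `apa-postgres`.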

 ## Local Development Workflow

 ### Development Environment

@@ -342,4 +311,3 @@ Create three new compose files for production:
 3. Create production compose files
 4. Set up CI/CD pipeline for automated deployment
 5. Execute Phase 1 (Preparation)
-
docs/NATS_README.md (new file, 282 lines)

@@ -0,0 +1,282 @@
# NATS.io Event Bus with JetStream

This document describes the NATS.io event bus implementation with JetStream support for the AI Tax Agent project.

## Overview

The `NATSEventBus` class provides a robust, scalable event streaming solution using NATS.io with JetStream for persistent messaging. It implements the same `EventBus` interface as the other event bus implementations (Kafka, SQS, Memory) for consistency.

## Features

- **JetStream Integration**: Uses NATS JetStream for persistent, reliable message delivery
- **Automatic Stream Management**: Creates and manages JetStream streams automatically
- **Pull-based Consumers**: Uses pull-based consumers for better flow control
- **Cluster Support**: Supports NATS cluster configurations for high availability
- **Error Handling**: Comprehensive error handling with automatic retries
- **Message Acknowledgment**: Explicit message acknowledgment with configurable retry policies
- **Durable Consumers**: Creates durable consumers for guaranteed message processing

## Configuration

### Basic Configuration

```python
from libs.events import NATSEventBus

# Single server
bus = NATSEventBus(
    servers="nats://localhost:4222",
    stream_name="TAX_AGENT_EVENTS",
    consumer_group="tax-agent"
)

# Multiple servers (cluster)
bus = NATSEventBus(
    servers=[
        "nats://nats1.example.com:4222",
        "nats://nats2.example.com:4222",
        "nats://nats3.example.com:4222"
    ],
    stream_name="PRODUCTION_EVENTS",
    consumer_group="tax-agent-prod"
)
```

### Factory Configuration

```python
from libs.events import create_event_bus

bus = create_event_bus(
    "nats",
    servers="nats://localhost:4222",
    stream_name="TAX_AGENT_EVENTS",
    consumer_group="tax-agent"
)
```
## Usage

### Publishing Events

```python
from libs.events import EventPayload

# Create event payload
payload = EventPayload(
    data={"user_id": "123", "action": "login"},
    actor="user-service",
    tenant_id="tenant-456",
    trace_id="trace-789"
)

# Publish event
success = await bus.publish("user.login", payload)
if success:
    print("Event published successfully")
```

### Subscribing to Events

```python
async def handle_user_login(topic: str, payload: EventPayload) -> None:
    print(f"User {payload.data['user_id']} logged in")
    # Process the event...

# Subscribe to topic
await bus.subscribe("user.login", handle_user_login)
```

### Complete Example

```python
import asyncio

from libs.events import NATSEventBus, EventPayload

async def handle_user_created(topic: str, payload: EventPayload) -> None:
    print(f"New user created: {payload.data['user_id']}")

async def main():
    bus = NATSEventBus()

    try:
        # Start the bus
        await bus.start()

        # Subscribe to events
        await bus.subscribe("user.created", handle_user_created)

        # Publish an event
        payload = EventPayload(
            data={"user_id": "123", "email": "user@example.com"},
            actor="registration-service",
            tenant_id="tenant-456"
        )
        await bus.publish("user.created", payload)

        # Wait for processing
        await asyncio.sleep(1)

    finally:
        await bus.stop()

asyncio.run(main())
```
## JetStream Configuration

The NATS event bus automatically creates and configures JetStream streams with the following settings:

- **Retention Policy**: Work Queue (messages are removed after acknowledgment)
- **Max Age**: 7 days (messages older than 7 days are automatically deleted)
- **Storage**: File-based storage for persistence
- **Subject Pattern**: `{stream_name}.*` (e.g., `TAX_AGENT_EVENTS.*`)

### Consumer Configuration

- **Durable Consumers**: Each topic subscription creates a durable consumer
- **Ack Policy**: Explicit acknowledgment required
- **Deliver Policy**: New messages only (does not replay old messages)
- **Max Deliver**: 3 attempts before a message is considered failed
- **Ack Wait**: 30-second timeout for acknowledgment

## Error Handling

The NATS event bus includes comprehensive error handling:

### Publishing Errors

- Network failures are logged and return `False`
- Automatic retry logic can be implemented at the application level
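Since `publish()` returns `False` on failure, application-level retry is a thin wrapper. A sketch with exponential backoff (the attempt count and delays are example values, not part of the library):

```python
import asyncio

async def publish_with_retry(bus, topic, payload, attempts=3, base_delay=0.5):
    """Retry bus.publish() with exponential backoff; returns True on success."""
    for attempt in range(1, attempts + 1):
        if await bus.publish(topic, payload):
            return True
        if attempt < attempts:
            # Backoff doubles each time: 0.5s, 1.0s, 2.0s, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Any object with an async `publish(topic, payload) -> bool` works here, so the same wrapper covers the Kafka, SQS, and Memory implementations as well.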

### Consumer Errors

- Handler exceptions are caught and logged
- Failed messages are negatively acknowledged (NAK) for retry
- Messages that fail multiple times are moved to a dead letter queue (if configured)
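The ack/NAK flow above can be sketched as a small dispatch function. This is illustrative only — `dispatch` and `StubMsg` are hypothetical names, standing in for the real nats-py JetStream message object:

```python
import asyncio

async def dispatch(msg, handler):
    """Ack on success; NAK on handler failure so JetStream redelivers."""
    try:
        await handler(msg.subject, msg.data)
        await msg.ack()
        return True
    except Exception:
        # Redelivered up to the consumer's max-deliver limit (3 in this setup)
        await msg.nak()
        return False

class StubMsg:
    """Minimal stand-in for a JetStream message, for illustration."""
    def __init__(self, subject, data):
        self.subject, self.data = subject, data
        self.acked = self.naked = False
    async def ack(self):
        self.acked = True
    async def nak(self):
        self.naked = True
```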

### Connection Errors

- Automatic reconnection is handled by the NATS client
- Consumer tasks are shut down gracefully on connection loss

## Monitoring and Observability

The implementation includes structured logging with the following information:

- Event publishing success/failure
- Consumer subscription status
- Message processing metrics
- Error details and stack traces

### Log Examples

```
INFO: Event published topic=user.created event_id=01HK... stream_seq=123
INFO: Subscribed to topic topic=user.login consumer=tax-agent-user.login
ERROR: Handler failed topic=user.created event_id=01HK... error=...
```

## Performance Considerations

### Throughput

- Pull-based consumers allow controlled message processing
- Batch fetching (up to 10 messages per fetch) improves throughput
- Async processing enables high concurrency

### Memory Usage

- File-based storage keeps memory usage low
- Configurable message retention prevents unbounded growth

### Network Efficiency

- Binary protocol with minimal overhead
- Connection pooling and reuse
- Efficient subject-based routing

## Deployment

### Docker Compose Example

```yaml
services:
  nats:
    image: nats:2.10-alpine
    ports:
      - "4222:4222"
      - "8222:8222"
    command:
      - "--jetstream"
      - "--store_dir=/data"
      - "--http_port=8222"
    volumes:
      - nats_data:/data

volumes:
  nats_data:
```
### Kubernetes Example

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.10-alpine
          args:
            - "--cluster_name=nats-cluster"
            - "--jetstream"
            - "--store_dir=/data"
          ports:
            - containerPort: 4222
            - containerPort: 6222
            - containerPort: 8222
          volumeMounts:
            - name: nats-storage
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: nats-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

## Dependencies

The NATS event bus requires the following Python package:

```
nats-py>=2.6.0
```

This is automatically included in `libs/requirements.txt`.

## Comparison with Other Event Buses

| Feature | NATS | Kafka | SQS |
|---------|------|-------|-----|
| Setup Complexity | Low | Medium | Low |
| Throughput | High | Very High | Medium |
| Latency | Very Low | Low | Medium |
| Persistence | Yes (JetStream) | Yes | Yes |
| Ordering | Per Subject | Per Partition | FIFO Queues |
| Clustering | Built-in | Built-in | Managed |
| Operational Overhead | Low | High | None |

## Best Practices

1. **Use meaningful subject names**: Follow a hierarchical naming convention (e.g., `service.entity.action`)
2. **Handle failures gracefully**: Implement proper error handling in event handlers
3. **Monitor consumer lag**: Track message processing delays
4. **Use appropriate retention**: Configure message retention based on business requirements
5. **Test failure scenarios**: Verify behavior during network partitions and service failures
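Point 1's `service.entity.action` convention can be enforced with a tiny helper. This is a sketch — the token rule (lowercase alphanumerics, dash, underscore) is an assumption of this project's style, not a NATS requirement; NATS itself treats `.` as the token separator and reserves `*` and `>` as wildcards:

```python
import re

# Assumed token rule for this project's subjects.
_TOKEN = re.compile(r"^[a-z0-9_-]+$")

def subject(service: str, entity: str, action: str) -> str:
    """Build a hierarchical subject, e.g. subject('user', 'account', 'created')."""
    parts = (service, entity, action)
    for part in parts:
        if not _TOKEN.match(part):
            raise ValueError(f"invalid subject token: {part!r}")
    return ".".join(parts)
```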

@@ -27,12 +27,12 @@ EOF

 ```bash
 # Copy production compose files
-scp infra/compose/production/infrastructure.yaml deploy@141.136.35.199:/opt/ai-tax-agent/compose/production/
-scp infra/compose/production/services.yaml deploy@141.136.35.199:/opt/ai-tax-agent/compose/production/
-scp infra/compose/production/monitoring.yaml deploy@141.136.35.199:/opt/ai-tax-agent/compose/production/
+scp infra/base/infrastructure.yaml deploy@141.136.35.199:/opt/ai-tax-agent/compose/production/
+scp infra/base/services.yaml deploy@141.136.35.199:/opt/ai-tax-agent/compose/production/
+scp infra/base/monitoring.yaml deploy@141.136.35.199:/opt/ai-tax-agent/compose/production/

 # Copy environment file
-scp infra/compose/.env.production deploy@141.136.35.199:/opt/ai-tax-agent/compose/.env.production
+scp infra/environments/production/.env deploy@141.136.35.199:/opt/ai-tax-agent/compose/.env

 # Copy monitoring configs
 scp infra/compose/prometheus/prometheus.yml deploy@141.136.35.199:/opt/ai-tax-agent/compose/prometheus/
@@ -123,17 +123,17 @@ ssh deploy@141.136.35.199 "rm ~/vault-keys.txt"

 ```bash
 # MinIO is ready immediately, access at:
-# https://minio-console.harkon.co.uk
+# https://minio.harkon.co.uk
 # Username: admin (from .env.production MINIO_ROOT_USER)
 # Password: <from .env.production MINIO_ROOT_PASSWORD>

 # Create required buckets
 ssh deploy@141.136.35.199 << 'EOF'
-docker exec minio mc alias set local http://localhost:9000 admin <MINIO_ROOT_PASSWORD>
-docker exec minio mc mb local/documents
-docker exec minio mc mb local/processed
-docker exec minio mc mb local/models
-docker exec minio mc mb local/temp
+docker exec apa-minio mc alias set local http://localhost:9000 admin <MINIO_ROOT_PASSWORD>
+docker exec apa-minio mc mb local/documents
+docker exec apa-minio mc mb local/processed
+docker exec apa-minio mc mb local/models
+docker exec apa-minio mc mb local/temp
 EOF
 ```

@@ -147,7 +147,7 @@ EOF

 # Verify connection
 ssh deploy@141.136.35.199 << 'EOF'
-docker exec neo4j cypher-shell -u neo4j -p <NEO4J_PASSWORD> "RETURN 'Connected' as status;"
+docker exec apa-neo4j cypher-shell -u neo4j -p <NEO4J_PASSWORD> "RETURN 'Connected' as status;"
 EOF
 ```
@@ -181,7 +181,7 @@ EOF

 ### Step 10: Configure Authentik OAuth for Grafana

-1. **Login to Authentik**: https://authentik.harkon.co.uk
+1. **Login to Authentik**: https://auth.harkon.co.uk
 2. **Create OAuth Provider**:
    - Applications → Providers → Create
    - Type: OAuth2/OpenID Provider

@@ -210,7 +210,7 @@ EOF

 # Restart Grafana
 cd /opt/ai-tax-agent
-docker compose -f compose/production/monitoring.yaml restart grafana
+docker compose -f compose/production/monitoring.yaml restart apa-grafana
 ```

 ### Step 11: Verify Deployment

@@ -375,4 +375,3 @@ For issues or questions:
 - Check logs: `./scripts/verify-deployment.sh`
 - Review documentation: `docs/DEPLOYMENT_CHECKLIST.md`
 - Contact: [Your support contact]
-