Initial commit

commit b324ff09ef by harkon, 2025-10-11 08:41:36 +01:00
276 changed files with 55220 additions and 0 deletions

libs/events/NATS_README.md

@@ -0,0 +1,282 @@
# NATS.io Event Bus with JetStream
This document describes the NATS.io event bus implementation with JetStream support for the AI Tax Agent project.
## Overview
The `NATSEventBus` class provides a robust, scalable event streaming solution using NATS.io with JetStream for persistent messaging. It implements the same `EventBus` interface as the other implementations (Kafka, SQS, Memory), so application code can switch transports without changes.
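Because every implementation shares that interface, application code can stay transport-agnostic; a minimal sketch (the `notify` helper is illustrative):
```python
from libs.events import EventBus, EventPayload


async def notify(bus: EventBus) -> None:
    # Works the same whether `bus` is backed by NATS, Kafka, SQS, or Memory.
    payload = EventPayload(data={"ok": True}, actor="demo", tenant_id="tenant-1")
    await bus.publish("demo.ping", payload)
```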
## Features
- **JetStream Integration**: Uses NATS JetStream for persistent, reliable message delivery
- **Automatic Stream Management**: Creates and manages JetStream streams automatically
- **Pull-based Consumers**: Uses pull-based consumers for better flow control
- **Cluster Support**: Supports NATS cluster configurations for high availability
- **Error Handling**: Comprehensive error handling with automatic retries
- **Message Acknowledgment**: Explicit message acknowledgment with configurable retry policies
- **Durable Consumers**: Creates durable consumers for guaranteed message processing
## Configuration
### Basic Configuration
```python
from libs.events import NATSEventBus

# Single server
bus = NATSEventBus(
    servers="nats://localhost:4222",
    stream_name="TAX_AGENT_EVENTS",
    consumer_group="tax-agent"
)

# Multiple servers (cluster)
bus = NATSEventBus(
    servers=[
        "nats://nats1.example.com:4222",
        "nats://nats2.example.com:4222",
        "nats://nats3.example.com:4222"
    ],
    stream_name="PRODUCTION_EVENTS",
    consumer_group="tax-agent-prod"
)
```
### Factory Configuration
```python
from libs.events import create_event_bus

bus = create_event_bus(
    "nats",
    servers="nats://localhost:4222",
    stream_name="TAX_AGENT_EVENTS",
    consumer_group="tax-agent"
)
```
## Usage
### Publishing Events
```python
from libs.events import EventPayload

# Create event payload
payload = EventPayload(
    data={"user_id": "123", "action": "login"},
    actor="user-service",
    tenant_id="tenant-456",
    trace_id="trace-789"
)

# Publish event
success = await bus.publish("user.login", payload)
if success:
    print("Event published successfully")
```
### Subscribing to Events
```python
async def handle_user_login(topic: str, payload: EventPayload) -> None:
    print(f"User {payload.data['user_id']} logged in")
    # Process the event...

# Subscribe to topic
await bus.subscribe("user.login", handle_user_login)
```
### Complete Example
```python
import asyncio

from libs.events import NATSEventBus, EventPayload


async def handle_user_created(topic: str, payload: EventPayload) -> None:
    print(f"User created: {payload.data}")


async def main():
    bus = NATSEventBus()
    try:
        # Start the bus
        await bus.start()

        # Subscribe to events
        await bus.subscribe("user.created", handle_user_created)

        # Publish an event
        payload = EventPayload(
            data={"user_id": "123", "email": "user@example.com"},
            actor="registration-service",
            tenant_id="tenant-456"
        )
        await bus.publish("user.created", payload)

        # Wait for processing
        await asyncio.sleep(1)
    finally:
        await bus.stop()


asyncio.run(main())
```
## JetStream Configuration
The NATS event bus automatically creates and configures JetStream streams with the following settings:
- **Retention Policy**: Work Queue (messages are removed after acknowledgment)
- **Max Age**: 7 days (messages older than 7 days are automatically deleted)
- **Storage**: File-based storage for persistence
- **Subject Pattern**: `{stream_name}.>` (e.g., `TAX_AGENT_EVENTS.>`); the `>` wildcard is needed because topics such as `user.created` span more than one subject token
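For reference, a sketch of the nats-py calls these settings amount to (mirroring `_ensure_stream_exists` in `libs/events/nats_bus.py`; run inside an async context):
```python
import nats
from nats.js import api

nc = await nats.connect("nats://localhost:4222")
js = nc.jetstream()

# Mirrors the stream settings listed above.
await js.add_stream(
    name="TAX_AGENT_EVENTS",
    subjects=["TAX_AGENT_EVENTS.>"],           # one subject per topic
    retention=api.RetentionPolicy.WORK_QUEUE,  # removed after acknowledgment
    max_age=7 * 24 * 60 * 60,                  # 7 days, in seconds
    storage=api.StorageType.FILE,              # persisted to disk
)
```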
### Consumer Configuration
- **Durable Consumers**: Each topic subscription creates a durable consumer
- **Ack Policy**: Explicit acknowledgment required
- **Deliver Policy**: New messages only (doesn't replay old messages)
- **Max Deliver**: 3 attempts before message is considered failed
- **Ack Wait**: 30 seconds timeout for acknowledgment
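These bullets correspond to the `ConsumerConfig` the bus builds for each subscription (from `nats_bus.py`; the durable name shown is for a `user.created` subscription, with dots replaced because consumer names may not contain `.`):
```python
from nats.js import api

config = api.ConsumerConfig(
    durable_name="tax-agent-user-created",  # one durable consumer per topic
    ack_policy=api.AckPolicy.EXPLICIT,      # every message must be acked
    deliver_policy=api.DeliverPolicy.NEW,   # no replay of old messages
    max_deliver=3,                          # redelivery attempts before giving up
    ack_wait=30,                            # seconds to ack before redelivery
)
```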
## Error Handling
The NATS event bus includes comprehensive error handling:
### Publishing Errors
- Network failures are logged and return `False`
- Automatic retry logic can be implemented at the application level, as sketched below
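A minimal sketch of such application-level retry logic, building only on the fact that `publish()` returns a boolean (the `publish_with_retry` helper is illustrative, not part of the library):
```python
import asyncio

from libs.events import EventBus, EventPayload


async def publish_with_retry(
    bus: EventBus, topic: str, payload: EventPayload, attempts: int = 3
) -> bool:
    """Retry failed publishes with exponential backoff."""
    delay = 0.5
    for _ in range(attempts):
        if await bus.publish(topic, payload):
            return True
        await asyncio.sleep(delay)  # back off before the next attempt
        delay *= 2
    return False
```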
### Consumer Errors
- Handler exceptions are caught and logged
- Failed messages are negatively acknowledged (NAK) for retry
- Messages that fail multiple times are moved to a dead letter queue (if configured)
### Connection Errors
- Automatic reconnection is handled by the NATS client
- Consumer tasks are gracefully shut down on connection loss
## Monitoring and Observability
The implementation includes structured logging with the following information:
- Event publishing success/failure
- Consumer subscription status
- Message processing metrics
- Error details and stack traces
### Log Examples
```
INFO: Event published topic=user.created event_id=01HK... stream_seq=123
INFO: Subscribed to topic topic=user.login consumer=tax-agent-user-login
ERROR: Handler failed topic=user.created event_id=01HK... error=...
```
## Performance Considerations
### Throughput
- Pull-based consumers allow for controlled message processing
- Batch fetching (up to 10 messages per fetch) improves throughput
- Async processing enables high concurrency
### Memory Usage
- File-based storage keeps memory usage low
- Configurable message retention prevents unbounded growth
### Network Efficiency
- Binary protocol with minimal overhead
- Connection pooling and reuse
- Efficient subject-based routing
## Deployment
### Docker Compose Example
```yaml
services:
  nats:
    image: nats:2.10-alpine
    ports:
      - "4222:4222"
      - "8222:8222"
    command:
      - "--jetstream"
      - "--store_dir=/data"
      - "--http_port=8222"
    volumes:
      - nats_data:/data

volumes:
  nats_data:
```
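To verify that JetStream is actually enabled, one option is to query the monitoring port mapped above; a small sketch (assumes the `8222` mapping from the Compose file and the server's standard `/jsz` monitoring endpoint):
```python
import json
import urllib.request

# /jsz reports JetStream status; it errors out if JetStream is disabled.
with urllib.request.urlopen("http://localhost:8222/jsz") as resp:
    info = json.load(resp)

print(info.get("config", {}).get("store_dir"))  # expected: "/data"
```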
### Kubernetes Example
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nats
spec:
  serviceName: nats
  replicas: 3
  selector:
    matchLabels:
      app: nats
  template:
    metadata:
      labels:
        app: nats
    spec:
      containers:
        - name: nats
          image: nats:2.10-alpine
          args:
            - "--cluster_name=nats-cluster"
            - "--jetstream"
            - "--store_dir=/data"
          ports:
            - containerPort: 4222
            - containerPort: 6222
            - containerPort: 8222
          volumeMounts:
            - name: nats-storage
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: nats-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```
## Dependencies
The NATS event bus requires the following Python package:
```
nats-py>=2.6.0
```
This is automatically included in `libs/requirements.txt`.
## Comparison with Other Event Buses
| Feature | NATS | Kafka | SQS |
|---------|------|-------|-----|
| Setup Complexity | Low | Medium | Low |
| Throughput | High | Very High | Medium |
| Latency | Very Low | Low | Medium |
| Persistence | Yes (JetStream) | Yes | Yes |
| Ordering | Per Subject | Per Partition | FIFO Queues |
| Clustering | Built-in | Built-in | Managed |
| Operational Overhead | Low | High | None |
## Best Practices
1. **Use meaningful subject names**: Follow a hierarchical naming convention (e.g., `service.entity.action`; see the sketch after this list)
2. **Handle failures gracefully**: Implement proper error handling in event handlers
3. **Monitor consumer lag**: Track message processing delays
4. **Use appropriate retention**: Configure message retention based on business requirements
5. **Test failure scenarios**: Verify behavior during network partitions and service failures
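This repository already centralizes subject names in `libs/events/topics.py`; publishing through those constants keeps the hierarchy consistent (run inside an async context):
```python
from libs.events import EventPayload, EventTopics, create_event_bus

bus = create_event_bus("nats")
await bus.start()

payload = EventPayload(
    data={"doc_id": "doc-123"},
    actor="svc-ingestion",
    tenant_id="tenant-456",
)
await bus.publish(EventTopics.DOC_INGESTED, payload)  # subject "doc.ingested"
```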

libs/events/__init__.py

@@ -0,0 +1,20 @@
"""Event-driven architecture with Kafka, SQS, NATS, and Memory support."""
from .base import EventBus, EventPayload
from .factory import create_event_bus
from .kafka_bus import KafkaEventBus
from .memory_bus import MemoryEventBus
from .nats_bus import NATSEventBus
from .sqs_bus import SQSEventBus
from .topics import EventTopics
__all__ = [
"EventPayload",
"EventBus",
"KafkaEventBus",
"MemoryEventBus",
"NATSEventBus",
"SQSEventBus",
"create_event_bus",
"EventTopics",
]

libs/events/base.py

@@ -0,0 +1,68 @@
"""Base event classes and interfaces."""
import json
from abc import ABC, abstractmethod
from collections.abc import Awaitable, Callable
from datetime import datetime
from typing import Any
import ulid
# Each payload MUST include: `event_id (ulid)`, `occurred_at (iso)`, `actor`, `tenant_id`, `trace_id`, `schema_version`, and a `data` object (service-specific).
class EventPayload:
"""Standard event payload structure"""
def __init__( # pylint: disable=too-many-arguments,too-many-positional-arguments
self,
data: dict[str, Any],
actor: str,
tenant_id: str,
trace_id: str | None = None,
schema_version: str = "1.0",
):
self.event_id = str(ulid.new())
self.occurred_at = datetime.utcnow().isoformat() + "Z"
self.actor = actor
self.tenant_id = tenant_id
self.trace_id = trace_id
self.schema_version = schema_version
self.data = data
def to_dict(self) -> dict[str, Any]:
"""Convert to dictionary for serialization"""
return {
"event_id": self.event_id,
"occurred_at": self.occurred_at,
"actor": self.actor,
"tenant_id": self.tenant_id,
"trace_id": self.trace_id,
"schema_version": self.schema_version,
"data": self.data,
}
def to_json(self) -> str:
"""Convert to JSON string"""
return json.dumps(self.to_dict())
class EventBus(ABC):
"""Abstract event bus interface"""
@abstractmethod
async def publish(self, topic: str, payload: EventPayload) -> bool:
"""Publish event to topic"""
@abstractmethod
async def subscribe(
self, topic: str, handler: Callable[[str, EventPayload], Awaitable[None]]
) -> None:
"""Subscribe to topic with handler"""
@abstractmethod
async def start(self) -> None:
"""Start the event bus"""
@abstractmethod
async def stop(self) -> None:
"""Stop the event bus"""


@@ -0,0 +1,163 @@
"""Example usage of NATS.io event bus with JetStream."""
import asyncio
import logging
from libs.events import EventPayload, NATSEventBus, create_event_bus
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
async def example_handler(topic: str, payload: EventPayload) -> None:
"""Example event handler."""
logger.info(
f"Received event on topic '{topic}': "
f"ID={payload.event_id}, "
f"Actor={payload.actor}, "
f"Data={payload.data}"
)
async def main():
"""Main example function."""
# Method 1: Direct instantiation
nats_bus = NATSEventBus(
servers="nats://localhost:4222", # Can be a list for cluster
stream_name="TAX_AGENT_EVENTS",
consumer_group="tax-agent",
)
# Method 2: Using factory
# nats_bus = create_event_bus(
# "nats",
# servers="nats://localhost:4222",
# stream_name="TAX_AGENT_EVENTS",
# consumer_group="tax-agent",
# )
try:
# Start the event bus
await nats_bus.start()
logger.info("NATS event bus started")
# Subscribe to a topic
await nats_bus.subscribe("user.created", example_handler)
await nats_bus.subscribe("user.updated", example_handler)
logger.info("Subscribed to topics")
# Publish some events
for i in range(5):
payload = EventPayload(
data={"user_id": f"user-{i}", "name": f"User {i}"},
actor="system",
tenant_id="tenant-123",
trace_id=f"trace-{i}",
)
success = await nats_bus.publish("user.created", payload)
if success:
logger.info(f"Published event {i}")
else:
logger.error(f"Failed to publish event {i}")
# Wait a bit for messages to be processed
await asyncio.sleep(2)
# Publish an update event
update_payload = EventPayload(
data={"user_id": "user-1", "name": "Updated User 1", "email": "user1@example.com"},
actor="admin",
tenant_id="tenant-123",
)
await nats_bus.publish("user.updated", update_payload)
logger.info("Published update event")
# Wait for processing
await asyncio.sleep(2)
except Exception as e:
logger.error(f"Error in example: {e}")
finally:
# Stop the event bus
await nats_bus.stop()
logger.info("NATS event bus stopped")
async def cluster_example():
"""Example with NATS cluster configuration."""
# Connect to a NATS cluster
cluster_bus = NATSEventBus(
servers=[
"nats://nats1.example.com:4222",
"nats://nats2.example.com:4222",
"nats://nats3.example.com:4222",
],
stream_name="PRODUCTION_EVENTS",
consumer_group="tax-agent-prod",
)
try:
await cluster_bus.start()
logger.info("Connected to NATS cluster")
# Subscribe to multiple topics
topics = ["document.uploaded", "document.processed", "tax.calculated"]
for topic in topics:
await cluster_bus.subscribe(topic, example_handler)
logger.info(f"Subscribed to {len(topics)} topics")
# Keep running for a while
await asyncio.sleep(10)
finally:
await cluster_bus.stop()
async def error_handling_example():
"""Example showing error handling."""
async def failing_handler(topic: str, payload: EventPayload) -> None:
"""Handler that sometimes fails."""
if payload.data.get("should_fail"):
raise ValueError("Simulated handler failure")
logger.info(f"Successfully processed event {payload.event_id}")
bus = NATSEventBus()
try:
await bus.start()
await bus.subscribe("test.events", failing_handler)
# Publish a good event
good_payload = EventPayload(
data={"message": "This will succeed"},
actor="test",
tenant_id="test-tenant",
)
await bus.publish("test.events", good_payload)
# Publish a bad event
bad_payload = EventPayload(
data={"message": "This will fail", "should_fail": True},
actor="test",
tenant_id="test-tenant",
)
await bus.publish("test.events", bad_payload)
await asyncio.sleep(2)
finally:
await bus.stop()
if __name__ == "__main__":
# Run the basic example
asyncio.run(main())
# Uncomment to run other examples:
# asyncio.run(cluster_example())
# asyncio.run(error_handling_example())

libs/events/factory.py

@@ -0,0 +1,23 @@
"""Factory function for creating event bus instances."""
from typing import Any
from .base import EventBus
from .kafka_bus import KafkaEventBus
from .nats_bus import NATSEventBus
from .sqs_bus import SQSEventBus
def create_event_bus(bus_type: str, **kwargs: Any) -> EventBus:
"""Factory function to create event bus"""
if bus_type.lower() == "kafka":
return KafkaEventBus(kwargs.get("bootstrap_servers", "localhost:9092"))
if bus_type.lower() == "sqs":
return SQSEventBus(kwargs.get("region_name", "us-east-1"))
if bus_type.lower() == "nats":
return NATSEventBus(
servers=kwargs.get("servers", "nats://localhost:4222"),
stream_name=kwargs.get("stream_name", "TAX_AGENT_EVENTS"),
consumer_group=kwargs.get("consumer_group", "tax-agent"),
)
raise ValueError(f"Unsupported event bus type: {bus_type}")
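One way this factory is typically wired up is from configuration; a sketch assuming an environment variable (the `EVENT_BUS_TYPE` name is illustrative):
```python
import os

from libs.events import create_event_bus

# Fall back to the in-memory bus for local development and tests.
bus = create_event_bus(
    os.environ.get("EVENT_BUS_TYPE", "memory"),
    servers=os.environ.get("NATS_SERVERS", "nats://localhost:4222"),
)
```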

libs/events/kafka_bus.py

@@ -0,0 +1,140 @@
"""Kafka implementation of EventBus."""
import asyncio
import json
from collections.abc import Awaitable, Callable
import structlog
from aiokafka import AIOKafkaConsumer, AIOKafkaProducer # type: ignore
from .base import EventBus, EventPayload
logger = structlog.get_logger()
class KafkaEventBus(EventBus):
"""Kafka implementation of EventBus"""
def __init__(self, bootstrap_servers: str):
self.bootstrap_servers = bootstrap_servers.split(",")
self.producer: AIOKafkaProducer | None = None
self.consumers: dict[str, AIOKafkaConsumer] = {}
self.handlers: dict[
str, list[Callable[[str, EventPayload], Awaitable[None]]]
] = {}
self.running = False
async def start(self) -> None:
"""Start Kafka producer"""
if self.running:
return
self.producer = AIOKafkaProducer(
bootstrap_servers=",".join(self.bootstrap_servers),
value_serializer=lambda v: v.encode("utf-8"),
)
await self.producer.start()
self.running = True
logger.info("Kafka event bus started", bootstrap_servers=self.bootstrap_servers)
async def stop(self) -> None:
"""Stop Kafka producer and consumers"""
if not self.running:
return
if self.producer:
await self.producer.stop()
for consumer in self.consumers.values():
await consumer.stop()
self.running = False
logger.info("Kafka event bus stopped")
async def publish(self, topic: str, payload: EventPayload) -> bool:
"""Publish event to Kafka topic"""
if not self.producer:
raise RuntimeError("Event bus not started")
try:
await self.producer.send_and_wait(topic, payload.to_json())
logger.info(
"Event published",
topic=topic,
event_id=payload.event_id,
actor=payload.actor,
tenant_id=payload.tenant_id,
)
return True
except Exception as e: # pylint: disable=broad-exception-caught
logger.error(
"Failed to publish event",
topic=topic,
event_id=payload.event_id,
error=str(e),
)
return False
async def subscribe(
self, topic: str, handler: Callable[[str, EventPayload], Awaitable[None]]
) -> None:
"""Subscribe to Kafka topic"""
if topic not in self.handlers:
self.handlers[topic] = []
self.handlers[topic].append(handler)
if topic not in self.consumers:
consumer = AIOKafkaConsumer(
topic,
bootstrap_servers=",".join(self.bootstrap_servers),
value_deserializer=lambda m: m.decode("utf-8"),
group_id=f"tax-agent-{topic}",
auto_offset_reset="latest",
)
self.consumers[topic] = consumer
await consumer.start()
# Start consumer task
asyncio.create_task(self._consume_messages(topic, consumer))
logger.info("Subscribed to topic", topic=topic)
async def _consume_messages(self, topic: str, consumer: AIOKafkaConsumer) -> None:
"""Consume messages from Kafka topic"""
try:
async for message in consumer:
try:
if message.value is not None:
payload_dict = json.loads(message.value)
else:
continue
payload = EventPayload(
data=payload_dict["data"],
actor=payload_dict["actor"],
tenant_id=payload_dict["tenant_id"],
trace_id=payload_dict.get("trace_id"),
schema_version=payload_dict.get("schema_version", "1.0"),
)
payload.event_id = payload_dict["event_id"]
payload.occurred_at = payload_dict["occurred_at"]
# Call all handlers for this topic
for handler in self.handlers.get(topic, []):
try:
await handler(topic, payload)
except Exception as e: # pylint: disable=broad-exception-caught
logger.error(
"Handler failed",
topic=topic,
event_id=payload.event_id,
handler=handler.__name__,
error=str(e),
)
except json.JSONDecodeError as e:
logger.error("Failed to decode message", topic=topic, error=str(e))
except Exception as e: # pylint: disable=broad-exception-caught
logger.error("Failed to process message", topic=topic, error=str(e))
except Exception as e: # pylint: disable=broad-exception-caught
logger.error("Consumer error", topic=topic, error=str(e))

libs/events/memory_bus.py

@@ -0,0 +1,64 @@
"""In-memory event bus for local development and testing."""
import asyncio
import logging
from collections import defaultdict
from collections.abc import Awaitable, Callable
from .base import EventBus, EventPayload
logger = logging.getLogger(__name__)
class MemoryEventBus(EventBus):
"""In-memory event bus implementation for local development"""
def __init__(self) -> None:
self.handlers: dict[
str, list[Callable[[str, EventPayload], Awaitable[None]]]
] = defaultdict(list)
self.running = False
async def publish(self, topic: str, payload: EventPayload) -> bool:
"""Publish event to topic"""
try:
if not self.running:
logger.warning(
"Event bus not running, skipping publish to topic: %s", topic
)
return False
handlers = self.handlers.get(topic, [])
if not handlers:
logger.debug("No handlers for topic: %s", topic)
return True
# Execute all handlers concurrently
tasks = [handler(topic, payload) for handler in handlers]
await asyncio.gather(*tasks, return_exceptions=True)
logger.debug(
"Published event to topic %s with %d handlers", topic, len(handlers)
)
return True
except Exception as e:
logger.error("Failed to publish event to topic %s: %s", topic, e)
return False
async def subscribe(
self, topic: str, handler: Callable[[str, EventPayload], Awaitable[None]]
) -> None:
"""Subscribe to topic with handler"""
self.handlers[topic].append(handler)
logger.debug("Subscribed handler to topic: %s", topic)
async def start(self) -> None:
"""Start the event bus"""
self.running = True
logger.info("Memory event bus started")
async def stop(self) -> None:
"""Stop the event bus"""
self.running = False
self.handlers.clear()
logger.info("Memory event bus stopped")

libs/events/nats_bus.py

@@ -0,0 +1,269 @@
"""NATS.io with JetStream implementation of EventBus."""
import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import Any
import nats # type: ignore
import structlog
from nats.aio.client import Client as NATS # type: ignore
from nats.js import JetStreamContext # type: ignore
from .base import EventBus, EventPayload
logger = structlog.get_logger()
class NATSEventBus(EventBus): # pylint: disable=too-many-instance-attributes
"""NATS.io with JetStream implementation of EventBus"""
def __init__(
self,
servers: str | list[str] = "nats://localhost:4222",
stream_name: str = "TAX_AGENT_EVENTS",
consumer_group: str = "tax-agent",
):
if isinstance(servers, str):
self.servers = [servers]
else:
self.servers = servers
self.stream_name = stream_name
self.consumer_group = consumer_group
self.nc: NATS | None = None
self.js: JetStreamContext | None = None
self.handlers: dict[
str, list[Callable[[str, EventPayload], Awaitable[None]]]
] = {}
self.subscriptions: dict[str, Any] = {}
self.running = False
self.consumer_tasks: list[asyncio.Task[None]] = []
async def start(self) -> None:
"""Start NATS connection and JetStream context"""
if self.running:
return
try:
# Connect to NATS
self.nc = await nats.connect(servers=self.servers)
# Get JetStream context
self.js = self.nc.jetstream()
# Ensure stream exists
await self._ensure_stream_exists()
self.running = True
logger.info(
"NATS event bus started",
servers=self.servers,
stream=self.stream_name,
)
except Exception as e:
logger.error("Failed to start NATS event bus", error=str(e))
raise
async def stop(self) -> None:
"""Stop NATS connection and consumers"""
if not self.running:
return
# Cancel consumer tasks
for task in self.consumer_tasks:
task.cancel()
if self.consumer_tasks:
await asyncio.gather(*self.consumer_tasks, return_exceptions=True)
# Unsubscribe from all subscriptions
for subscription in self.subscriptions.values():
try:
await subscription.unsubscribe()
except Exception as e: # pylint: disable=broad-exception-caught
logger.warning("Error unsubscribing", error=str(e))
# Close NATS connection
if self.nc:
await self.nc.close()
self.running = False
logger.info("NATS event bus stopped")
async def publish(self, topic: str, payload: EventPayload) -> bool:
"""Publish event to NATS JetStream"""
if not self.js:
raise RuntimeError("Event bus not started")
try:
# Create subject name from topic
subject = f"{self.stream_name}.{topic}"
# Publish message with headers
headers = {
"event_id": payload.event_id,
"tenant_id": payload.tenant_id,
"actor": payload.actor,
"trace_id": payload.trace_id or "",
"schema_version": payload.schema_version,
}
ack = await self.js.publish(
subject=subject,
payload=payload.to_json().encode(),
headers=headers,
)
logger.info(
"Event published",
topic=topic,
subject=subject,
event_id=payload.event_id,
stream_seq=ack.seq,
)
return True
except Exception as e: # pylint: disable=broad-exception-caught
logger.error(
"Failed to publish event",
topic=topic,
event_id=payload.event_id,
error=str(e),
)
return False
async def subscribe(
self, topic: str, handler: Callable[[str, EventPayload], Awaitable[None]]
) -> None:
"""Subscribe to NATS JetStream topic"""
if not self.js:
raise RuntimeError("Event bus not started")
if topic not in self.handlers:
self.handlers[topic] = []
self.handlers[topic].append(handler)
if topic not in self.subscriptions:
try:
# Create subject pattern for topic
subject = f"{self.stream_name}.{topic}"
# Create durable consumer
consumer_name = f"{self.consumer_group}-{topic}"
# Subscribe with pull-based consumer
subscription = await self.js.pull_subscribe(
subject=subject,
durable=consumer_name,
config=nats.js.api.ConsumerConfig(
durable_name=consumer_name,
ack_policy=nats.js.api.AckPolicy.EXPLICIT,
deliver_policy=nats.js.api.DeliverPolicy.NEW,
max_deliver=3,
ack_wait=30, # 30 seconds
),
)
self.subscriptions[topic] = subscription
# Start consumer task
task = asyncio.create_task(self._consume_messages(topic, subscription))
self.consumer_tasks.append(task)
logger.info(
"Subscribed to topic",
topic=topic,
subject=subject,
consumer=consumer_name,
)
except Exception as e:
logger.error("Failed to subscribe to topic", topic=topic, error=str(e))
raise
async def _ensure_stream_exists(self) -> None:
"""Ensure JetStream stream exists"""
if not self.js:
return
try:
# Try to get stream info
await self.js.stream_info(self.stream_name)
logger.debug("Stream already exists", stream=self.stream_name)
except nats.js.errors.NotFoundError:
# Stream doesn't exist, create it
try:
await self.js.add_stream(
name=self.stream_name,
subjects=[f"{self.stream_name}.*"],
retention=nats.js.api.RetentionPolicy.WORK_QUEUE,
max_age=7 * 24 * 60 * 60, # 7 days in seconds
storage=nats.js.api.StorageType.FILE,
)
logger.info("Created JetStream stream", stream=self.stream_name)
except Exception as e:
logger.error(
"Failed to create stream", stream=self.stream_name, error=str(e)
)
raise
async def _consume_messages(self, topic: str, subscription: Any) -> None:
"""Consume messages from NATS JetStream subscription"""
while self.running:
try:
# Fetch messages in batches
messages = await subscription.fetch(batch=10, timeout=20)
for message in messages:
try:
# Parse message payload
payload_dict = json.loads(message.data.decode())
payload = EventPayload(
data=payload_dict["data"],
actor=payload_dict["actor"],
tenant_id=payload_dict["tenant_id"],
trace_id=payload_dict.get("trace_id"),
schema_version=payload_dict.get("schema_version", "1.0"),
)
payload.event_id = payload_dict["event_id"]
payload.occurred_at = payload_dict["occurred_at"]
# Call all handlers for this topic
for handler in self.handlers.get(topic, []):
try:
await handler(topic, payload)
except (
Exception
) as e: # pylint: disable=broad-exception-caught
logger.error(
"Handler failed",
topic=topic,
event_id=payload.event_id,
error=str(e),
)
# Acknowledge message
await message.ack()
except json.JSONDecodeError as e:
logger.error(
"Failed to decode message", topic=topic, error=str(e)
)
await message.nak()
except Exception as e: # pylint: disable=broad-exception-caught
logger.error(
"Failed to process message", topic=topic, error=str(e)
)
await message.nak()
except asyncio.TimeoutError:
# No messages available, continue polling
continue
except Exception as e: # pylint: disable=broad-exception-caught
logger.error("Consumer error", topic=topic, error=str(e))
await asyncio.sleep(5) # Wait before retrying

libs/events/sqs_bus.py

@@ -0,0 +1,212 @@
"""AWS SQS/SNS implementation of EventBus."""
import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import Any
import boto3 # type: ignore
import structlog
from botocore.exceptions import ClientError # type: ignore
from .base import EventBus, EventPayload
logger = structlog.get_logger()
class SQSEventBus(EventBus): # pylint: disable=too-many-instance-attributes
"""AWS SQS/SNS implementation of EventBus"""
def __init__(self, region_name: str = "us-east-1"):
self.region_name = region_name
self.sns_client: Any = None
self.sqs_client: Any = None
self.topic_arns: dict[str, str] = {}
self.queue_urls: dict[str, str] = {}
self.handlers: dict[
str, list[Callable[[str, EventPayload], Awaitable[None]]]
] = {}
self.running = False
self.consumer_tasks: list[asyncio.Task[None]] = []
async def start(self) -> None:
"""Start SQS/SNS clients"""
if self.running:
return
self.sns_client = boto3.client("sns", region_name=self.region_name)
self.sqs_client = boto3.client("sqs", region_name=self.region_name)
self.running = True
logger.info("SQS event bus started", region=self.region_name)
async def stop(self) -> None:
"""Stop SQS/SNS clients and consumers"""
if not self.running:
return
# Cancel consumer tasks
for task in self.consumer_tasks:
task.cancel()
if self.consumer_tasks:
await asyncio.gather(*self.consumer_tasks, return_exceptions=True)
self.running = False
logger.info("SQS event bus stopped")
async def publish(self, topic: str, payload: EventPayload) -> bool:
"""Publish event to SNS topic"""
if not self.sns_client:
raise RuntimeError("Event bus not started")
try:
# Ensure topic exists
topic_arn = await self._ensure_topic_exists(topic)
# Publish message
response = self.sns_client.publish(
TopicArn=topic_arn,
Message=payload.to_json(),
MessageAttributes={
"event_id": {"DataType": "String", "StringValue": payload.event_id},
"tenant_id": {
"DataType": "String",
"StringValue": payload.tenant_id,
},
"actor": {"DataType": "String", "StringValue": payload.actor},
},
)
logger.info(
"Event published",
topic=topic,
event_id=payload.event_id,
message_id=response["MessageId"],
)
return True
except ClientError as e:
logger.error(
"Failed to publish event",
topic=topic,
event_id=payload.event_id,
error=str(e),
)
return False
async def subscribe(
self, topic: str, handler: Callable[[str, EventPayload], Awaitable[None]]
) -> None:
"""Subscribe to SNS topic via SQS queue"""
if topic not in self.handlers:
self.handlers[topic] = []
self.handlers[topic].append(handler)
if topic not in self.queue_urls:
# Create SQS queue for this topic
queue_name = f"tax-agent-{topic}"
queue_url = await self._ensure_queue_exists(queue_name)
self.queue_urls[topic] = queue_url
# Subscribe queue to SNS topic
topic_arn = await self._ensure_topic_exists(topic)
await self._subscribe_queue_to_topic(queue_url, topic_arn)
# Start consumer task
task = asyncio.create_task(self._consume_messages(topic, queue_url))
self.consumer_tasks.append(task)
logger.info("Subscribed to topic", topic=topic, queue_name=queue_name)
async def _ensure_topic_exists(self, topic: str) -> str:
"""Ensure SNS topic exists and return ARN"""
if topic in self.topic_arns:
return self.topic_arns[topic]
try:
response = self.sns_client.create_topic(Name=topic)
topic_arn = response["TopicArn"]
self.topic_arns[topic] = topic_arn
return str(topic_arn)
except ClientError as e:
logger.error("Failed to create topic", topic=topic, error=str(e))
raise
async def _ensure_queue_exists(self, queue_name: str) -> str:
"""Ensure SQS queue exists and return URL"""
try:
response = self.sqs_client.create_queue(QueueName=queue_name)
return str(response["QueueUrl"])
except ClientError as e:
logger.error("Failed to create queue", queue_name=queue_name, error=str(e))
raise
async def _subscribe_queue_to_topic(self, queue_url: str, topic_arn: str) -> None:
"""Subscribe SQS queue to SNS topic"""
try:
# Get queue attributes
queue_attrs = self.sqs_client.get_queue_attributes(
QueueUrl=queue_url, AttributeNames=["QueueArn"]
)
queue_arn = queue_attrs["Attributes"]["QueueArn"]
# Subscribe queue to topic
self.sns_client.subscribe(
TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn
)
except ClientError as e:
logger.error("Failed to subscribe queue to topic", error=str(e))
raise
async def _consume_messages(self, topic: str, queue_url: str) -> None:
"""Consume messages from SQS queue"""
# pylint: disable=too-many-nested-blocks
while self.running:
try:
response = self.sqs_client.receive_message(
QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
)
messages = response.get("Messages", [])
for message in messages:
try:
# Parse SNS message
sns_message = json.loads(message["Body"])
payload_dict = json.loads(sns_message["Message"])
payload = EventPayload(
data=payload_dict["data"],
actor=payload_dict["actor"],
tenant_id=payload_dict["tenant_id"],
trace_id=payload_dict.get("trace_id"),
schema_version=payload_dict.get("schema_version", "1.0"),
)
payload.event_id = payload_dict["event_id"]
payload.occurred_at = payload_dict["occurred_at"]
# Call all handlers for this topic
for handler in self.handlers.get(topic, []):
try:
await handler(topic, payload)
# pylint: disable=broad-exception-caught
except Exception as e:
logger.error(
"Handler failed",
topic=topic,
event_id=payload.event_id,
error=str(e),
)
# Delete message from queue
self.sqs_client.delete_message(
QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
)
except Exception as e: # pylint: disable=broad-exception-caught
logger.error(
"Failed to process message", topic=topic, error=str(e)
)
except Exception as e: # pylint: disable=broad-exception-caught
logger.error("Consumer error", topic=topic, error=str(e))
await asyncio.sleep(5) # Wait before retrying

libs/events/topics.py

@@ -0,0 +1,17 @@
"""Standard event topic names."""
class EventTopics: # pylint: disable=too-few-public-methods
"""Standard event topic names"""
DOC_INGESTED = "doc.ingested"
DOC_OCR_READY = "doc.ocr_ready"
DOC_EXTRACTED = "doc.extracted"
KG_UPSERTED = "kg.upserted"
RAG_INDEXED = "rag.indexed"
CALC_SCHEDULE_READY = "calc.schedule_ready"
FORM_FILLED = "form.filled"
HMRC_SUBMITTED = "hmrc.submitted"
REVIEW_REQUESTED = "review.requested"
REVIEW_COMPLETED = "review.completed"
FIRM_SYNC_COMPLETED = "firm.sync.completed"