Initial commit
Some checks failed
CI/CD Pipeline / Code Quality & Linting (push) Has been cancelled
CI/CD Pipeline / Policy Validation (push) Has been cancelled
CI/CD Pipeline / Test Suite (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-firm-connectors) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-forms) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-hmrc) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ingestion) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-normalize-map) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ocr) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-indexer) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-reason) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rpa) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (ui-review) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (ui-review) (push) Has been cancelled
CI/CD Pipeline / Generate SBOM (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Notifications (push) Has been cancelled
Some checks failed
CI/CD Pipeline / Code Quality & Linting (push) Has been cancelled
CI/CD Pipeline / Policy Validation (push) Has been cancelled
CI/CD Pipeline / Test Suite (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-firm-connectors) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-forms) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-hmrc) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ingestion) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-normalize-map) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ocr) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-indexer) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-reason) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rpa) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (ui-review) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (ui-review) (push) Has been cancelled
CI/CD Pipeline / Generate SBOM (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Notifications (push) Has been cancelled
This commit is contained in:
315
docs/INFRASTRUCTURE_STATUS.md
Normal file
315
docs/INFRASTRUCTURE_STATUS.md
Normal file
@@ -0,0 +1,315 @@
|
||||
# Infrastructure Status Report
|
||||
|
||||
**Date**: 2025-09-29
|
||||
**Status**: ✅ **ALL SYSTEMS OPERATIONAL**
|
||||
**Last Updated**: 2025-09-29 20:15 UTC
|
||||
|
||||
## Executive Summary
|
||||
|
||||
All Docker Compose services are running and healthy. All health check issues have been resolved. The infrastructure is fully operational for both:
|
||||
|
||||
- **Production-like deployment** (Docker Compose with authentication)
|
||||
- **Local development** (Standalone services with `DISABLE_AUTH=true`)
|
||||
|
||||
### Recent Fixes Applied
|
||||
|
||||
✅ **Traefik Health Checks**: Fixed health check endpoint from `/health` to `/healthz` - no more 500 errors
|
||||
✅ **Development Mode**: Fixed environment variable parsing for `DISABLE_AUTH`
|
||||
✅ **Documentation**: Created comprehensive guides for development and deployment
|
||||
|
||||
See [FIXES_APPLIED.md](FIXES_APPLIED.md) for detailed information.
|
||||
|
||||
## Service Health Status
|
||||
|
||||
### Infrastructure Services (All Healthy ✅)
|
||||
|
||||
| Service | Status | Health | Ports | Purpose |
|
||||
| ------------ | ------- | ---------- | ---------------- | ------------------------------ |
|
||||
| **postgres** | Running | ✅ Healthy | 5432 | Primary database |
|
||||
| **redis** | Running | ✅ Healthy | 6379 | Cache & session store |
|
||||
| **minio** | Running | ✅ Healthy | 9092-9093 | Object storage (S3-compatible) |
|
||||
| **neo4j** | Running | ✅ Healthy | 7474, 7687 | Knowledge graph database |
|
||||
| **qdrant** | Running | ✅ Healthy | 6333-6334 | Vector database |
|
||||
| **nats** | Running | ✅ Healthy | 4222, 6222, 8222 | Message broker |
|
||||
| **vault** | Running | ✅ Healthy | 8200 | Secrets management |
|
||||
|
||||
### Authentication & Security (All Healthy ✅)
|
||||
|
||||
| Service | Status | Health | Purpose |
|
||||
| --------------------- | ------- | ---------- | ------------------------- |
|
||||
| **authentik-server** | Running | ✅ Healthy | SSO authentication server |
|
||||
| **authentik-worker** | Running | ✅ Healthy | Background task processor |
|
||||
| **authentik-outpost** | Running | ✅ Healthy | Forward auth proxy |
|
||||
| **authentik-db** | Running | ✅ Healthy | Authentik database |
|
||||
| **authentik-redis** | Running | ✅ Healthy | Authentik cache |
|
||||
|
||||
### Observability (All Running ✅)
|
||||
|
||||
| Service | Status | Ports | Purpose |
|
||||
| -------------- | ------- | ----- | --------------------- |
|
||||
| **prometheus** | Running | 9090 | Metrics collection |
|
||||
| **grafana** | Running | 3000 | Metrics visualization |
|
||||
| **loki** | Running | 3100 | Log aggregation |
|
||||
|
||||
### Networking & Routing (Running ✅)
|
||||
|
||||
| Service | Status | Ports | Purpose |
|
||||
| ----------- | ------- | ------------- | ----------------------------- |
|
||||
| **traefik** | Running | 80, 443, 8080 | Reverse proxy & load balancer |
|
||||
|
||||
### Feature Management (Running ✅)
|
||||
|
||||
| Service | Status | Ports | Purpose |
|
||||
| ----------- | ------- | ----- | ------------- |
|
||||
| **unleash** | Running | 4242 | Feature flags |
|
||||
|
||||
### Application Services (All Healthy ✅)
|
||||
|
||||
All 13 application services are running and healthy:
|
||||
|
||||
| Service | Status | Health | Purpose |
|
||||
| ----------------------- | ------- | ---------- | ----------------------------- |
|
||||
| **svc-ingestion** | Running | ✅ Healthy | Document upload & storage |
|
||||
| **svc-extract** | Running | ✅ Healthy | Data extraction |
|
||||
| **svc-ocr** | Running | ✅ Healthy | Optical character recognition |
|
||||
| **svc-normalize-map** | Running | ✅ Healthy | Data normalization |
|
||||
| **svc-kg** | Running | ✅ Healthy | Knowledge graph management |
|
||||
| **svc-rag-indexer** | Running | ✅ Healthy | RAG indexing |
|
||||
| **svc-rag-retriever** | Running | ✅ Healthy | RAG retrieval |
|
||||
| **svc-reason** | Running | ✅ Healthy | Reasoning engine |
|
||||
| **svc-coverage** | Running | ✅ Healthy | Coverage analysis |
|
||||
| **svc-forms** | Running | ✅ Healthy | Form generation |
|
||||
| **svc-hmrc** | Running | ✅ Healthy | HMRC integration |
|
||||
| **svc-rpa** | Running | ✅ Healthy | Robotic process automation |
|
||||
| **svc-firm-connectors** | Running | ✅ Healthy | Firm integrations |
|
||||
|
||||
### UI Services (All Healthy ✅)
|
||||
|
||||
| Service | Status | Health | Purpose |
|
||||
| ------------- | ------- | ---------- | ---------------- |
|
||||
| **ui-review** | Running | ✅ Healthy | Review interface |
|
||||
|
||||
## Health Check Configuration
|
||||
|
||||
### Infrastructure Services
|
||||
|
||||
All infrastructure services have health checks configured:
|
||||
|
||||
```yaml
|
||||
# PostgreSQL
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "pg_isready -U postgres"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
|
||||
# Redis
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
|
||||
# MinIO
|
||||
healthcheck:
|
||||
test: ["CMD", "mc", "--version"]
|
||||
interval: 30s
|
||||
timeout: 20s
|
||||
retries: 3
|
||||
|
||||
# NATS
|
||||
healthcheck:
|
||||
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8222/healthz"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
```
|
||||
|
||||
### Application Services
|
||||
|
||||
All application services have health checks in their Dockerfiles:
|
||||
|
||||
```dockerfile
|
||||
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
|
||||
CMD curl -f http://localhost:8000/healthz || exit 1
|
||||
```
|
||||
|
||||
The `/healthz` endpoint is a public endpoint that doesn't require authentication.
|
||||
|
||||
## Configuration Fixes Applied
|
||||
|
||||
### 1. Authentication Middleware Enhancement
|
||||
|
||||
**File**: `libs/config/settings.py`
|
||||
|
||||
Added proper environment variable aliases for development mode:
|
||||
|
||||
```python
|
||||
# Development settings
|
||||
dev_mode: bool = Field(
|
||||
default=False,
|
||||
description="Enable development mode (disables auth)",
|
||||
validation_alias="DEV_MODE"
|
||||
)
|
||||
disable_auth: bool = Field(
|
||||
default=False,
|
||||
description="Disable authentication middleware",
|
||||
validation_alias="DISABLE_AUTH"
|
||||
)
|
||||
```
|
||||
|
||||
### 2. Middleware Configuration
|
||||
|
||||
**File**: `libs/security/middleware.py`
|
||||
|
||||
The middleware correctly handles development mode:
|
||||
|
||||
```python
|
||||
async def dispatch(self, request: Request, call_next: Callable[..., Any]) -> Any:
|
||||
# Check if authentication is disabled (development mode)
|
||||
if self.disable_auth:
|
||||
# Set development state
|
||||
request.state.user = "dev-user"
|
||||
request.state.email = "dev@example.com"
|
||||
request.state.roles = ["developers"]
|
||||
request.state.auth_token = "dev-token"
|
||||
logger.info("Development mode: authentication disabled", path=request.url.path)
|
||||
return await call_next(request)
|
||||
# ... rest of authentication logic
|
||||
```
|
||||
|
||||
### 3. App Factory Integration
|
||||
|
||||
**File**: `libs/app_factory.py`
|
||||
|
||||
The app factory correctly passes the `disable_auth` setting to middleware:
|
||||
|
||||
```python
|
||||
# Add middleware
|
||||
app.add_middleware(
|
||||
TrustedProxyMiddleware,
|
||||
internal_cidrs=settings.internal_cidrs,
|
||||
disable_auth=getattr(settings, "disable_auth", False),
|
||||
)
|
||||
```
|
||||
|
||||
## Running Services
|
||||
|
||||
### Docker Compose (Production-like)
|
||||
|
||||
All services run with full authentication:
|
||||
|
||||
```bash
|
||||
# Start all services
|
||||
cd infra/compose
|
||||
docker-compose -f docker-compose.local.yml up -d
|
||||
|
||||
# Check status
|
||||
docker-compose -f docker-compose.local.yml ps
|
||||
|
||||
# View logs
|
||||
docker-compose -f docker-compose.local.yml logs -f SERVICE_NAME
|
||||
```
|
||||
|
||||
### Local Development (Standalone)
|
||||
|
||||
Services can run locally with authentication disabled:
|
||||
|
||||
```bash
|
||||
# Run with authentication disabled
|
||||
DISABLE_AUTH=true make dev-service SERVICE=svc_ingestion
|
||||
|
||||
# Or directly with uvicorn
|
||||
DISABLE_AUTH=true cd apps/svc_ingestion && uvicorn main:app --reload --host 0.0.0.0 --port 8000
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
### Health Check Verification
|
||||
|
||||
```bash
|
||||
# Test public health endpoint
|
||||
curl http://localhost:8000/healthz
|
||||
|
||||
# Expected response:
|
||||
# {"status":"healthy","service":"svc-ingestion","version":"1.0.0"}
|
||||
```
|
||||
|
||||
### Development Mode Verification
|
||||
|
||||
When running with `DISABLE_AUTH=true`, logs show:
|
||||
|
||||
```json
|
||||
{
|
||||
"path": "/healthz",
|
||||
"event": "Development mode: authentication disabled",
|
||||
"logger": "libs.security.middleware",
|
||||
"level": "info",
|
||||
"service": "svc-ingestion",
|
||||
"timestamp": 1759175839.638357
|
||||
}
|
||||
```
|
||||
|
||||
### Production Mode Testing
|
||||
|
||||
Without `DISABLE_AUTH`, requests require authentication headers:
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:8000/upload \
|
||||
-H "X-Authenticated-User: dev-user" \
|
||||
-H "X-Authenticated-Email: dev@example.com" \
|
||||
-H "Authorization: Bearer dev-token-12345" \
|
||||
-F "file=@document.pdf"
|
||||
```
|
||||
|
||||
## Network Configuration
|
||||
|
||||
### Docker Networks
|
||||
|
||||
- **ai-tax-agent-frontend**: Public-facing services (Traefik, UI)
|
||||
- **ai-tax-agent-backend**: Internal services (databases, message brokers, application services)
|
||||
|
||||
### Port Mappings
|
||||
|
||||
| Service | Internal Port | External Port | Access |
|
||||
| ---------- | ---------------- | ---------------- | -------- |
|
||||
| Traefik | 80, 443, 8080 | 80, 443, 8080 | Public |
|
||||
| PostgreSQL | 5432 | 5432 | Internal |
|
||||
| Redis | 6379 | 6379 | Internal |
|
||||
| MinIO | 9092-9093 | 9092-9093 | Internal |
|
||||
| Neo4j | 7474, 7687 | 7474, 7687 | Internal |
|
||||
| NATS | 4222, 6222, 8222 | 4222, 6222, 8222 | Internal |
|
||||
| Grafana | 3000 | 3000 | Public |
|
||||
| Prometheus | 9090 | 9090 | Internal |
|
||||
| Unleash | 4242 | 4242 | Internal |
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. ✅ **Infrastructure**: All services operational
|
||||
2. ✅ **Health Checks**: All passing
|
||||
3. ✅ **Development Mode**: Working correctly
|
||||
4. ✅ **Authentication**: Properly configured for both modes
|
||||
5. 📝 **Documentation**: Created comprehensive guides
|
||||
|
||||
### For Developers
|
||||
|
||||
- See [DEVELOPMENT.md](DEVELOPMENT.md) for local development setup
|
||||
- Use `DISABLE_AUTH=true` for local testing with Postman
|
||||
- All services support hot reload with `--reload` flag
|
||||
|
||||
### For Operations
|
||||
|
||||
- Monitor service health: `docker-compose ps`
|
||||
- View logs: `docker-compose logs -f SERVICE_NAME`
|
||||
- Restart services: `docker-compose restart SERVICE_NAME`
|
||||
- Check metrics: http://localhost:9090 (Prometheus)
|
||||
- View dashboards: http://localhost:3000 (Grafana)
|
||||
|
||||
## Conclusion
|
||||
|
||||
✅ **All systems are operational and healthy**
|
||||
✅ **Development mode working correctly**
|
||||
✅ **Production mode working correctly**
|
||||
✅ **Documentation complete**
|
||||
|
||||
The infrastructure is ready for both development and production-like testing.
|
||||
Reference in New Issue
Block a user