Base Image Architecture

Overview

To optimize Docker image sizes and build times, we use a layered base image architecture:

python:3.12-slim (150MB)
    └─> base-runtime (300MB) - Core deps for ALL services
            └─> base-ml (1.2GB) - base-runtime + ML deps (sentence-transformers, PyTorch, etc.)
                    ├─> svc-ocr (1.25GB = base-ml + 50MB app)
                    ├─> svc-rag-indexer (1.25GB = base-ml + 50MB app)
                    └─> svc-rag-retriever (1.25GB = base-ml + 50MB app)

Benefits

1. Build ML Dependencies Once

  • Heavy ML libraries (PyTorch, transformers, sentence-transformers) are built once in base-ml
  • All ML services reuse the same base image
  • No need to rebuild 1GB+ of dependencies for each service

2. Faster Builds

  • Before: Each ML service took 10-15 minutes to build
  • After: ML services build in 1-2 minutes (only app code + small deps)

3. Faster Pushes

  • Before: Pushing 1.3GB per service = 3.9GB total for 3 ML services
  • After: Push base-ml once (1.2GB) + 3 small app layers (50MB each) = 1.35GB total
  • Savings: 65% reduction in push time
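
The savings figure can be sanity-checked with quick shell arithmetic (sizes in MB, taken from the numbers above):

```shell
# Before: 3 ML services, each pushing a full ~1.3GB image
before=$((3 * 1300))             # 3900 MB
# After: base-ml pushed once, plus a ~50MB app layer per service
after=$((1200 + 3 * 50))         # 1350 MB
# Integer percentage saved
echo "$(( (before - after) * 100 / before ))% saved"   # → 65% saved
```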

4. Layer Caching

  • Docker reuses base-ml layers across all ML services
  • Only the small application layer (~50MB) needs to be pushed/pulled
  • Faster deployments and rollbacks

5. Easy Updates

  • Update ML library versions in one place (base-ml)
  • Rebuild base-ml once, then rebuild all ML services quickly
  • Consistent ML library versions across all services

Image Sizes

| Image Type | Size | Contents |
|------------|------|----------|
| base-runtime | ~300MB | FastAPI, uvicorn, database drivers, Redis, NATS, MinIO, Qdrant, etc. |
| base-ml | ~1.2GB | base-runtime + sentence-transformers, PyTorch, transformers, numpy, scikit-learn, spacy, nltk |
| ML Service | ~1.25GB | base-ml + service-specific deps (faiss, tiktoken, etc.) + app code (~50MB) |
| Non-ML Service | ~350MB | python:3.12-slim + base deps + service deps + app code |

Architecture

Base Images

1. base-runtime

  • Location: infra/docker/base-runtime.Dockerfile
  • Registry: gitea.harkon.co.uk/harkon/base-runtime:v1.0.1
  • Contents: Core dependencies for ALL services
    • FastAPI, uvicorn, pydantic
    • Database drivers (asyncpg, psycopg2, neo4j, redis)
    • Object storage (minio)
    • Vector DB (qdrant-client)
    • Event bus (nats-py)
    • Secrets (hvac)
    • Monitoring (prometheus-client)
    • HTTP client (httpx)
    • Utilities (ulid-py, python-dateutil, orjson)
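
A minimal sketch of what infra/docker/base-runtime.Dockerfile might look like (the non-root appuser convention and the exact RUN steps are assumptions; only the file paths are from this repo):

```dockerfile
FROM python:3.12-slim

# Non-root user shared by all derived images (assumed convention)
RUN useradd --create-home appuser

# Install the core dependencies used by every service
COPY libs/requirements-base.txt /tmp/requirements-base.txt
RUN pip install --no-cache-dir -r /tmp/requirements-base.txt \
    && rm /tmp/requirements-base.txt

USER appuser
WORKDIR /app
```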

2. base-ml

  • Location: infra/docker/base-ml.Dockerfile
  • Registry: gitea.harkon.co.uk/harkon/base-ml:v1.0.1
  • Contents: base-runtime + ML dependencies
    • sentence-transformers (includes PyTorch)
    • transformers
    • scikit-learn
    • numpy
    • spacy
    • nltk
    • fuzzywuzzy
    • python-Levenshtein
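
Since base-ml is base-runtime plus the ML stack, its Dockerfile can be sketched along these lines (the ARG names mirror the service pattern shown below; treat the details as assumptions):

```dockerfile
ARG REGISTRY=gitea.harkon.co.uk
ARG OWNER=harkon
ARG BASE_VERSION=v1.0.1
FROM ${REGISTRY}/${OWNER}/base-runtime:${BASE_VERSION}

USER root
# Layer the heavy ML dependencies on top of the runtime base
COPY libs/requirements-ml.txt /tmp/requirements-ml.txt
RUN pip install --no-cache-dir -r /tmp/requirements-ml.txt \
    && rm /tmp/requirements-ml.txt
USER appuser
```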

Service Images

ML Services (use base-ml)

  1. svc-ocr - OCR and document AI

    • Additional deps: pytesseract, PyMuPDF, pdf2image, Pillow, opencv-python-headless, torchvision
    • System deps: tesseract-ocr, poppler-utils
  2. svc-rag-indexer - Document indexing and embedding

    • Additional deps: tiktoken, beautifulsoup4, faiss-cpu, python-docx, python-pptx, openpyxl, sparse-dot-topn
  3. svc-rag-retriever - Semantic search and retrieval

    • Additional deps: rank-bm25, faiss-cpu, sparse-dot-topn

Non-ML Services (use python:3.12-slim directly)

  • All other services (svc-ingestion, svc-extract, svc-kg, svc-forms, etc.)
  • Build from scratch with base requirements + service-specific deps

Build Process

Step 1: Build Base Images (One Time)

IMPORTANT: Build base-ml on the remote server to avoid pushing 1.2GB+ over the network!

# Build base-ml on remote server (fast push to Gitea on same network)
./scripts/remote-build-base-ml.sh deploy@141.136.35.199 /home/deploy/ai-tax-agent gitea.harkon.co.uk v1.0.1 harkon

# Or use defaults (deploy user, /home/deploy/ai-tax-agent)
./scripts/remote-build-base-ml.sh

This will:

  1. Sync code to remote server
  2. Build base-ml on remote (~1.2GB, 10-15 min)
  3. Push to Gitea from remote (fast, same network)

Why build base-ml remotely?

  • Faster push to Gitea (same datacenter/network)
  • Saves local network bandwidth
  • Image is cached on remote server for faster service builds
  • Only need to do this once

Time: 10-15 minutes (one time only)

Alternative: build both base images locally

# Build both base images locally
./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.1 harkon

This builds:

  • gitea.harkon.co.uk/harkon/base-runtime:v1.0.1 (~300MB)
  • gitea.harkon.co.uk/harkon/base-ml:v1.0.1 (~1.2GB)

Note: Pushing the 1.2GB base-ml image from a local machine is slow and may fail due to network issues.

Step 2: Build Service Images

# Build and push all services
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 harkon

ML services will:

  1. Pull base-ml:v1.0.1 from registry (if not cached)
  2. Install service-specific deps (~10-20 packages)
  3. Copy application code
  4. Build final image (~1.25GB)

Time per ML service: 1-2 minutes (vs 10-15 minutes before)

Step 3: Update Base Images (When Needed)

When you need to update ML library versions:

# 1. Update libs/requirements-ml.txt
vim libs/requirements-ml.txt

# 2. Rebuild base-ml with new version
./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.2 harkon

# 3. Update service Dockerfiles to use new base version
# Change: ARG BASE_VERSION=v1.0.2

# 4. Rebuild ML services
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.2 harkon
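
Step 3 above edits BASE_VERSION in each service Dockerfile by hand; a one-liner can do the bump across services (the apps/svc_*/Dockerfile glob is an assumption about the repo layout):

```shell
# Bump the default base image version in every service Dockerfile
sed -i 's/^ARG BASE_VERSION=v1\.0\.1$/ARG BASE_VERSION=v1.0.2/' apps/svc_*/Dockerfile

# Verify nothing still references the old tag
grep -rn 'BASE_VERSION=v1\.0\.1' apps/ || echo "all bumped"
```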

Requirements Files

libs/requirements-base.txt

Core dependencies for ALL services (included in base-runtime and base-ml)

libs/requirements-ml.txt

ML dependencies (included in base-ml only)

apps/svc_*/requirements.txt

Service-specific dependencies:

  • ML services: Only additional deps NOT in base-ml (e.g., faiss-cpu, tiktoken)
  • Non-ML services: Service-specific deps (e.g., aiofiles, openai, anthropic)
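
For example, a requirements.txt for svc-rag-retriever would carry only the deltas listed earlier, since PyTorch, numpy, and friends already live in base-ml (versions omitted here; pin them in the real file):

```text
# apps/svc_rag_retriever/requirements.txt
# Only what base-ml does NOT already provide
rank-bm25
faiss-cpu
sparse-dot-topn
```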

Dockerfile Templates

ML Service Dockerfile Pattern

# Use pre-built ML base image
ARG REGISTRY=gitea.harkon.co.uk
ARG OWNER=harkon
ARG BASE_VERSION=v1.0.1
FROM ${REGISTRY}/${OWNER}/base-ml:${BASE_VERSION}

USER root
WORKDIR /app

# Install service-specific deps (minimal)
COPY apps/SERVICE_NAME/requirements.txt /tmp/service-requirements.txt
RUN pip install --no-cache-dir -r /tmp/service-requirements.txt

# Copy app code
COPY libs/ ./libs/
COPY apps/SERVICE_NAME/ ./apps/SERVICE_NAME/

RUN chown -R appuser:appuser /app
USER appuser

# Health check, expose, CMD...

Non-ML Service Dockerfile Pattern

# Multi-stage build from scratch
FROM python:3.12-slim AS builder

# Install build deps
RUN apt-get update && apt-get install -y build-essential curl && rm -rf /var/lib/apt/lists/*

# Create venv and install deps
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY libs/requirements-base.txt /tmp/libs-requirements.txt
COPY apps/SERVICE_NAME/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/libs-requirements.txt -r /tmp/requirements.txt

# Production stage
FROM python:3.12-slim
# ... copy venv, app code, etc.
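
The elided production stage typically just copies the venv and app code, then drops privileges. A hedged completion follows (the port, module path, and appuser are placeholders, not taken from the repo):

```dockerfile
# Production stage (sketch)
FROM python:3.12-slim

ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app

# Reuse the dependencies built in the builder stage
COPY --from=builder /opt/venv /opt/venv
COPY libs/ ./libs/
COPY apps/SERVICE_NAME/ ./apps/SERVICE_NAME/

RUN useradd --create-home appuser && chown -R appuser:appuser /app
USER appuser

# Placeholder entrypoint; real services set their own module and port
CMD ["uvicorn", "apps.SERVICE_NAME.main:app", "--host", "0.0.0.0", "--port", "8000"]
```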

Comparison: Before vs After

Before (Monolithic Approach)

Each ML service:
- Build time: 10-15 minutes
- Image size: 1.6GB
- Push time: 5-10 minutes
- Total for 3 services: 30-45 min build + 15-30 min push = 45-75 minutes

After (Base Image Approach)

Base-ml (one time):
- Build time: 10-15 minutes
- Image size: 1.2GB
- Push time: 5-10 minutes

Each ML service:
- Build time: 1-2 minutes
- Image size: 1.25GB (but only 50MB new layers)
- Push time: 30-60 seconds (only new layers)
- Total for 3 services: 3-6 min build + 2-3 min push = 5-9 minutes

Total time savings: 40-66 minutes, roughly 88% faster (once base-ml exists)

Best Practices

  1. Version base images: Always tag with version (e.g., v1.0.1, v1.0.2)
  2. Update base images infrequently: Only when ML library versions need updating
  3. Keep service requirements minimal: Only add deps NOT in base-ml
  4. Use build args: Make registry/owner/version configurable
  5. Test base images: Ensure health checks pass before building services
  6. Document changes: Update this file when modifying base images

Troubleshooting

Issue: Service can't find ML library

Cause: The library was removed from the service's requirements but is not present in base-ml.
Solution: Add the library to libs/requirements-ml.txt and rebuild base-ml.

Issue: Base image not found

Cause: The base image was not pushed to the registry, or the wrong version is referenced.
Solution: Run ./scripts/build-base-images.sh first.

Issue: Service image too large

Cause: Dependencies duplicated between the service requirements and the base image.
Solution: Remove deps already in base-ml from the service's requirements.txt.
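
The duplicate-dependency case can be caught mechanically by comparing package names (bash; the file paths are illustrations):

```shell
# List packages that appear both in the ML base and in a service's requirements
# (strip comments and version pins before comparing)
names() { grep -v '^\s*#' "$1" | cut -d'=' -f1 | sed 's/[<>~!].*//' | sort -u; }

comm -12 <(names libs/requirements-ml.txt) \
         <(names apps/svc_rag_indexer/requirements.txt)
```

Any package printed here can be deleted from the service's requirements.txt.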

Future Improvements

  1. base-runtime for non-ML services: Use base-runtime instead of building from scratch
  2. Multi-arch builds: Support ARM64 for Apple Silicon
  3. Automated base image updates: CI/CD pipeline to rebuild base images on dependency updates
  4. Layer analysis: Tools to analyze and optimize layer sizes