# Base Image Architecture
## Overview
To optimize Docker image sizes and build times, we use a **layered base image architecture**:
```
python:3.12-slim (150MB)
├─> base-runtime (300MB) - Core deps for ALL services
└─> base-ml (1.2GB) - ML deps (sentence-transformers, PyTorch, etc.)
    ├─> svc-ocr (1.25GB = base-ml + 50MB app)
    ├─> svc-rag-indexer (1.25GB = base-ml + 50MB app)
    └─> svc-rag-retriever (1.25GB = base-ml + 50MB app)
```
## Benefits
### 1. **Build ML Dependencies Once**
- Heavy ML libraries (PyTorch, transformers, sentence-transformers) are built once in `base-ml`
- All ML services reuse the same base image
- No need to rebuild 1GB+ of dependencies for each service
### 2. **Faster Builds**
- **Before**: Each ML service took 10-15 minutes to build
- **After**: ML services build in 1-2 minutes (only app code + small deps)
### 3. **Faster Pushes**
- **Before**: Pushing 1.3GB per service = 3.9GB total for 3 ML services
- **After**: Push base-ml once (1.2GB) + 3 small app layers (50MB each) = 1.35GB total
- **Savings**: ~65% less data to push (1.35GB vs 3.9GB), with a corresponding cut in push time
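The savings figure follows directly from the sizes above; a quick sanity-check of the arithmetic (sizes in MB, taken from this document):

```shell
# Sanity-check the push-size arithmetic (sizes in MB, from the doc above).
before=$((3 * 1300))          # before: 3 monolithic ML images at ~1.3GB each
after=$((1200 + 3 * 50))      # after: base-ml once + 3 app layers of ~50MB each
echo "before=${before}MB after=${after}MB"
awk -v b="$before" -v a="$after" 'BEGIN { printf "reduction=%.0f%%\n", (b - a) / b * 100 }'
```

This prints `before=3900MB after=1350MB` and `reduction=65%`, matching the savings claimed above.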
### 4. **Layer Caching**
- Docker reuses base-ml layers across all ML services
- Only the small application layer (~50MB) needs to be pushed/pulled
- Faster deployments and rollbacks
### 5. **Easy Updates**
- Update ML library versions in one place (`base-ml`)
- Rebuild base-ml once, then rebuild all ML services quickly
- Consistent ML library versions across all services
## Image Sizes
| Image Type | Size | Contents |
| ------------------ | ------- | --------------------------------------------------------------------------------------------- |
| **base-runtime**   | ~300MB  | FastAPI, uvicorn, database drivers, and client libraries for Redis, NATS, MinIO, Qdrant, etc. |
| **base-ml** | ~1.2GB | base-runtime + sentence-transformers, PyTorch, transformers, numpy, scikit-learn, spacy, nltk |
| **ML Service** | ~1.25GB | base-ml + service-specific deps (faiss, tiktoken, etc.) + app code (~50MB) |
| **Non-ML Service** | ~350MB | python:3.12-slim + base deps + service deps + app code |
## Architecture
### Base Images
#### 1. base-runtime
- **Location**: `infra/docker/base-runtime.Dockerfile`
- **Registry**: `gitea.harkon.co.uk/harkon/base-runtime:v1.0.1`
- **Contents**: Core dependencies for ALL services
- FastAPI, uvicorn, pydantic
- Database drivers (asyncpg, psycopg2, neo4j, redis)
- Object storage (minio)
- Vector DB (qdrant-client)
- Event bus (nats-py)
- Secrets (hvac)
- Monitoring (prometheus-client)
- HTTP client (httpx)
- Utilities (ulid-py, python-dateutil, orjson)
#### 2. base-ml
- **Location**: `infra/docker/base-ml.Dockerfile`
- **Registry**: `gitea.harkon.co.uk/harkon/base-ml:v1.0.1`
- **Contents**: base-runtime + ML dependencies
- sentence-transformers (includes PyTorch)
- transformers
- scikit-learn
- numpy
- spacy
- nltk
- fuzzywuzzy
- python-Levenshtein
### Service Images
#### ML Services (use base-ml)
1. **svc-ocr** - OCR and document AI
- Additional deps: pytesseract, PyMuPDF, pdf2image, Pillow, opencv-python-headless, torchvision
- System deps: tesseract-ocr, poppler-utils
2. **svc-rag-indexer** - Document indexing and embedding
- Additional deps: tiktoken, beautifulsoup4, faiss-cpu, python-docx, python-pptx, openpyxl, sparse-dot-topn
3. **svc-rag-retriever** - Semantic search and retrieval
- Additional deps: rank-bm25, faiss-cpu, sparse-dot-topn
#### Non-ML Services (use python:3.12-slim directly)
- All other services (svc-ingestion, svc-extract, svc-kg, svc-forms, etc.)
- Build from scratch with base requirements + service-specific deps
## Build Process
### Step 1: Build Base Images (One Time)
**IMPORTANT**: Build `base-ml` on the remote server to avoid pushing 1.2GB+ over the network!
#### Option A: Build base-ml on Remote Server (Recommended)
```bash
# Build base-ml on remote server (fast push to Gitea on same network)
./scripts/remote-build-base-ml.sh deploy@141.136.35.199 /home/deploy/ai-tax-agent gitea.harkon.co.uk v1.0.1 harkon
# Or use defaults (deploy user, /home/deploy/ai-tax-agent)
./scripts/remote-build-base-ml.sh
```
This will:
1. Sync code to remote server
2. Build `base-ml` on remote (~1.2GB, 10-15 min)
3. Push to Gitea from remote (fast, same network)
**Why build base-ml remotely?**
- ✅ Faster push to Gitea (same datacenter/network)
- ✅ Saves local network bandwidth
- ✅ Image is cached on remote server for faster service builds
- ✅ Only need to do this once
**Time**: 10-15 minutes (one time only)
#### Option B: Build Locally (Not Recommended for base-ml)
```bash
# Build both base images locally
./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.1 harkon
```
This builds:
- `gitea.harkon.co.uk/harkon/base-runtime:v1.0.1` (~300MB)
- `gitea.harkon.co.uk/harkon/base-ml:v1.0.1` (~1.2GB)
**Note**: Pushing 1.2GB base-ml from local machine is slow and may fail due to network issues.
### Step 2: Build Service Images
```bash
# Build and push all services
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 harkon
```
ML services will:
1. Pull `base-ml:v1.0.1` from registry (if not cached)
2. Install service-specific deps (~10-20 packages)
3. Copy application code
4. Build final image (~1.25GB)
**Time per ML service**: 1-2 minutes (vs 10-15 minutes before)
### Step 3: Update Base Images (When Needed)
When you need to update ML library versions:
```bash
# 1. Update libs/requirements-ml.txt
vim libs/requirements-ml.txt
# 2. Rebuild base-ml with new version
./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.2 harkon
# 3. Update service Dockerfiles to use new base version
# Change: ARG BASE_VERSION=v1.0.2
# 4. Rebuild ML services
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.2 harkon
```
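The Dockerfile edits in step 3 can be scripted rather than done by hand. A minimal sketch (the helper name is hypothetical; it assumes each ML service Dockerfile declares a single `ARG BASE_VERSION=...` line, as in the ML Dockerfile pattern in this document):

```shell
# Hypothetical helper: rewrite the ARG BASE_VERSION line in each given
# Dockerfile. Assumes exactly one `ARG BASE_VERSION=...` declaration per file.
bump_base_version() {
  local new_version="$1"; shift
  local f
  for f in "$@"; do
    sed -i.bak -E "s|^ARG BASE_VERSION=.*$|ARG BASE_VERSION=${new_version}|" "$f" \
      && rm -f "${f}.bak"
  done
}

# Example (paths illustrative):
#   bump_base_version v1.0.2 apps/svc-ocr/Dockerfile apps/svc-rag-indexer/Dockerfile
```

`sed -i.bak` keeps the edit portable across GNU and BSD sed; the backup file is removed once the substitution succeeds.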
## Requirements Files
### libs/requirements-base.txt
Core dependencies for ALL services (included in base-runtime and base-ml)
### libs/requirements-ml.txt
ML dependencies (included in base-ml only)
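For illustration, such a file might look like the following. The package names come from the base-ml contents listed above; this is a sketch, not the project's actual file, and in practice each entry would carry a version pin:

```
sentence-transformers
transformers
scikit-learn
numpy
spacy
nltk
fuzzywuzzy
python-Levenshtein
```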
### apps/svc\_\*/requirements.txt
Service-specific dependencies:
- **ML services**: Only additional deps NOT in base-ml (e.g., faiss-cpu, tiktoken)
- **Non-ML services**: Service-specific deps (e.g., aiofiles, openai, anthropic)
## Dockerfile Templates
### ML Service Dockerfile Pattern
```dockerfile
# Use pre-built ML base image
ARG REGISTRY=gitea.harkon.co.uk
ARG OWNER=harkon
ARG BASE_VERSION=v1.0.1
FROM ${REGISTRY}/${OWNER}/base-ml:${BASE_VERSION}
USER root
WORKDIR /app
# Install service-specific deps (minimal)
COPY apps/SERVICE_NAME/requirements.txt /tmp/service-requirements.txt
RUN pip install --no-cache-dir -r /tmp/service-requirements.txt
# Copy app code
COPY libs/ ./libs/
COPY apps/SERVICE_NAME/ ./apps/SERVICE_NAME/
RUN chown -R appuser:appuser /app
USER appuser
# Health check, expose, CMD...
```
### Non-ML Service Dockerfile Pattern
```dockerfile
# Multi-stage build from scratch
FROM python:3.12-slim AS builder
# Install build deps
RUN apt-get update && apt-get install -y build-essential curl && rm -rf /var/lib/apt/lists/*
# Create venv and install deps
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY libs/requirements-base.txt /tmp/libs-requirements.txt
COPY apps/SERVICE_NAME/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/libs-requirements.txt -r /tmp/requirements.txt
# Production stage
FROM python:3.12-slim
# ... copy venv, app code, etc.
```
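The elided production stage could look like the following — a sketch assuming the same non-root `appuser` convention as the ML pattern above, with health check and CMD omitted as in both templates:

```dockerfile
# Production stage (illustrative completion of the pattern above)
FROM python:3.12-slim
WORKDIR /app
# Reuse the venv built in the builder stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY libs/ ./libs/
COPY apps/SERVICE_NAME/ ./apps/SERVICE_NAME/
# Run as a non-root user, mirroring the ML pattern
RUN useradd --create-home appuser && chown -R appuser:appuser /app
USER appuser
# Health check, expose, CMD...
```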
## Comparison: Before vs After
### Before (Monolithic Approach)
```
Each ML service:
- Build time: 10-15 minutes
- Image size: 1.6GB
- Push time: 5-10 minutes
- Total for 3 services: 30-45 min build + 15-30 min push = 45-75 minutes
```
### After (Base Image Approach)
```
Base-ml (one time):
- Build time: 10-15 minutes
- Image size: 1.2GB
- Push time: 5-10 minutes
Each ML service:
- Build time: 1-2 minutes
- Image size: 1.25GB (but only 50MB new layers)
- Push time: 30-60 seconds (only new layers)
- Total for 3 services: 3-6 min build + 2-3 min push = 5-9 minutes
Total time savings: 40-66 minutes (89% faster!)
```
## Best Practices
1. **Version base images**: Always tag with version (e.g., v1.0.1, v1.0.2)
2. **Update base images infrequently**: Only when ML library versions need updating
3. **Keep service requirements minimal**: Only add deps NOT in base-ml
4. **Use build args**: Make registry/owner/version configurable
5. **Test base images**: Ensure health checks pass before building services
6. **Document changes**: Update this file when modifying base images
## Troubleshooting
### Issue: Service can't find ML library
**Cause**: The library was removed from the service's requirements on the assumption it ships in base-ml, but it is not actually present there
**Solution**: Add library to `libs/requirements-ml.txt` and rebuild base-ml
### Issue: Base image not found
**Cause**: Base image not pushed to registry or wrong version
**Solution**: Run `./scripts/build-base-images.sh` first
### Issue: Service image too large
**Cause**: Duplicate dependencies in service requirements
**Solution**: Remove deps already in base-ml from service requirements.txt
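Finding those duplicates can be automated. A hedged sketch (the helper name is hypothetical; it assumes plain `name` or `name==version` lines, one per line, with no `-r` includes or comment lines):

```shell
# Hypothetical helper: print package names that appear in both the base-ml
# requirements and a service's requirements (candidates for removal).
find_duplicate_deps() {
  local base="$1" service="$2"
  comm -12 \
    <(sed -E 's/[=<>!~;[].*//' "$base"    | tr '[:upper:]' '[:lower:]' | sort -u) \
    <(sed -E 's/[=<>!~;[].*//' "$service" | tr '[:upper:]' '[:lower:]' | sort -u)
}

# Example (paths illustrative):
#   find_duplicate_deps libs/requirements-ml.txt apps/svc-rag-indexer/requirements.txt
```

The `sed` strips version specifiers, `tr` lowercases names (pip treats package names case-insensitively), and `comm -12` prints only lines common to both sorted lists.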
## Future Improvements
1. **base-runtime for non-ML services**: Use base-runtime instead of building from scratch
2. **Multi-arch builds**: Support ARM64 for Apple Silicon
3. **Automated base image updates**: CI/CD pipeline to rebuild base images on dependency updates
4. **Layer analysis**: Tools to analyze and optimize layer sizes