# Base Image Architecture

## Overview
To optimize Docker image sizes and build times, we use a layered base image architecture:
```
python:3.12-slim (150MB)
├─> base-runtime (300MB) - Core deps for ALL services
└─> base-ml (1.2GB)      - ML deps (sentence-transformers, PyTorch, etc.)
    ├─> svc-ocr           (1.25GB = base-ml + 50MB app)
    ├─> svc-rag-indexer   (1.25GB = base-ml + 50MB app)
    └─> svc-rag-retriever (1.25GB = base-ml + 50MB app)
```
## Benefits

1. **Build ML Dependencies Once**
   - Heavy ML libraries (PyTorch, transformers, sentence-transformers) are built once in `base-ml`
   - All ML services reuse the same base image
   - No need to rebuild 1GB+ of dependencies for each service
2. **Faster Builds**
   - Before: each ML service took 10-15 minutes to build
   - After: ML services build in 1-2 minutes (only app code + small deps)
3. **Faster Pushes**
   - Before: pushing 1.3GB per service = 3.9GB total for 3 ML services
   - After: push `base-ml` once (1.2GB) + 3 small app layers (50MB each) = 1.35GB total
   - Savings: ~65% reduction in push time
4. **Layer Caching**
   - Docker reuses `base-ml` layers across all ML services (see the sketch after this list)
   - Only the small application layer (~50MB) needs to be pushed/pulled
   - Faster deployments and rollbacks
5. **Easy Updates**
   - Update ML library versions in one place (`base-ml`)
   - Rebuild `base-ml` once, then rebuild all ML services quickly
   - Consistent ML library versions across all services
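The layer sharing is easy to observe once two ML services are built. A quick check, assuming service images follow the same naming pattern as the base images (the `svc-ocr`/`svc-rag-indexer` tags below are illustrative):

```bash
# Inspect the layer stack of one ML service image; the base-ml layers
# dominate the size and are shared byte-for-byte across services
docker history gitea.harkon.co.uk/harkon/svc-ocr:v1.0.1

# Matching layer digests between two services confirm the registry
# stores and transfers those layers only once
docker image inspect --format '{{json .RootFS.Layers}}' \
  gitea.harkon.co.uk/harkon/svc-ocr:v1.0.1 \
  gitea.harkon.co.uk/harkon/svc-rag-indexer:v1.0.1
```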
## Image Sizes
| Image Type | Size | Contents |
|---|---|---|
| base-runtime | ~300MB | FastAPI, uvicorn, database drivers, Redis, NATS, MinIO, Qdrant, etc. |
| base-ml | ~1.2GB | base-runtime + sentence-transformers, PyTorch, transformers, numpy, scikit-learn, spacy, nltk |
| ML Service | ~1.25GB | base-ml + service-specific deps (faiss, tiktoken, etc.) + app code (~50MB) |
| Non-ML Service | ~350MB | python:3.12-slim + base deps + service deps + app code |
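These figures can be sanity-checked locally once the images exist; a minimal check (the name filter assumes the image names used in this document):

```bash
# Compare local image sizes against the table above
docker images --format 'table {{.Repository}}:{{.Tag}}\t{{.Size}}' \
  | grep -E 'base-runtime|base-ml|svc-'
```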
## Architecture

### Base Images

1. **base-runtime**
   - Location: `infra/docker/base-runtime.Dockerfile`
   - Registry: `gitea.harkon.co.uk/harkon/base-runtime:v1.0.1`
   - Contents: Core dependencies for ALL services
     - FastAPI, uvicorn, pydantic
     - Database drivers (asyncpg, psycopg2, neo4j, redis)
     - Object storage (minio)
     - Vector DB (qdrant-client)
     - Event bus (nats-py)
     - Secrets (hvac)
     - Monitoring (prometheus-client)
     - HTTP client (httpx)
     - Utilities (ulid-py, python-dateutil, orjson)
2. **base-ml**
   - Location: `infra/docker/base-ml.Dockerfile`
   - Registry: `gitea.harkon.co.uk/harkon/base-ml:v1.0.1`
   - Contents: base-runtime + ML dependencies
     - sentence-transformers (includes PyTorch)
     - transformers
     - scikit-learn
     - numpy
     - spacy
     - nltk
     - fuzzywuzzy
     - python-Levenshtein
### Service Images

#### ML Services (use base-ml)

- **svc-ocr** - OCR and document AI
  - Additional deps: pytesseract, PyMuPDF, pdf2image, Pillow, opencv-python-headless, torchvision
  - System deps: tesseract-ocr, poppler-utils
- **svc-rag-indexer** - Document indexing and embedding
  - Additional deps: tiktoken, beautifulsoup4, faiss-cpu, python-docx, python-pptx, openpyxl, sparse-dot-topn
- **svc-rag-retriever** - Semantic search and retrieval
  - Additional deps: rank-bm25, faiss-cpu, sparse-dot-topn

#### Non-ML Services (use python:3.12-slim directly)

- All other services (svc-ingestion, svc-extract, svc-kg, svc-forms, etc.)
- Build from scratch with base requirements + service-specific deps
## Build Process

### Step 1: Build Base Images (One Time)

**IMPORTANT:** Build base-ml on the remote server to avoid pushing 1.2GB+ over the network!

#### Option A: Build base-ml on Remote Server (Recommended)

```bash
# Build base-ml on remote server (fast push to Gitea on same network)
./scripts/remote-build-base-ml.sh deploy@141.136.35.199 /home/deploy/ai-tax-agent gitea.harkon.co.uk v1.0.1 harkon

# Or use defaults (deploy user, /home/deploy/ai-tax-agent)
./scripts/remote-build-base-ml.sh
```
This will:

- Sync code to the remote server
- Build `base-ml` on the remote (~1.2GB, 10-15 min)
- Push to Gitea from the remote (fast, same network)
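The script's internals are not reproduced here, but the flow amounts to sync, build, push over SSH. A minimal sketch of that flow, assuming rsync plus SSH access to the host and Docker with registry credentials already configured on the remote (this is a hypothetical reconstruction, not the actual script):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the remote-build flow
set -euo pipefail

REMOTE="${1:-deploy@141.136.35.199}"
REMOTE_DIR="${2:-/home/deploy/ai-tax-agent}"
REGISTRY="${3:-gitea.harkon.co.uk}"
VERSION="${4:-v1.0.1}"
OWNER="${5:-harkon}"

# 1. Sync the build context to the remote server
rsync -az --delete --exclude '.git' ./ "${REMOTE}:${REMOTE_DIR}/"

# 2. Build and push base-ml from the remote; the 1.2GB upload stays
#    on the datacenter network, so the push is fast
ssh "${REMOTE}" "cd '${REMOTE_DIR}' && \
  docker build -f infra/docker/base-ml.Dockerfile \
    -t '${REGISTRY}/${OWNER}/base-ml:${VERSION}' . && \
  docker push '${REGISTRY}/${OWNER}/base-ml:${VERSION}'"
```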
**Why build base-ml remotely?**

- ✅ Faster push to Gitea (same datacenter/network)
- ✅ Saves local network bandwidth
- ✅ Image is cached on the remote server for faster service builds
- ✅ Only needs to be done once

**Time:** 10-15 minutes (one time only)
#### Option B: Build Locally (Not Recommended for base-ml)

```bash
# Build both base images locally
./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.1 harkon
```

This builds:

- `gitea.harkon.co.uk/harkon/base-runtime:v1.0.1` (~300MB)
- `gitea.harkon.co.uk/harkon/base-ml:v1.0.1` (~1.2GB)

**Note:** Pushing the 1.2GB base-ml image from a local machine is slow and may fail due to network issues.
### Step 2: Build Service Images

```bash
# Build and push all services
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 harkon
```

ML services will:

- Pull `base-ml:v1.0.1` from the registry (if not cached)
- Install service-specific deps (~10-20 packages)
- Copy application code
- Build the final image (~1.25GB)

**Time per ML service:** 1-2 minutes (vs 10-15 minutes before)
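Under the hood, each ML service build is a plain `docker build` against the template shown later in this document. A minimal sketch for a single service, assuming the service Dockerfiles live at `apps/<service>/Dockerfile` (path and tag are illustrative):

```bash
# Hypothetical per-service build, mirroring the ML Dockerfile pattern
docker build \
  --build-arg REGISTRY=gitea.harkon.co.uk \
  --build-arg OWNER=harkon \
  --build-arg BASE_VERSION=v1.0.1 \
  -f apps/svc-ocr/Dockerfile \
  -t gitea.harkon.co.uk/harkon/svc-ocr:v1.0.1 \
  .

# Only the ~50MB of new layers actually travel over the wire
docker push gitea.harkon.co.uk/harkon/svc-ocr:v1.0.1
```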
### Step 3: Update Base Images (When Needed)

When you need to update ML library versions:

```bash
# 1. Update libs/requirements-ml.txt
vim libs/requirements-ml.txt

# 2. Rebuild base-ml with the new version
./scripts/build-base-images.sh gitea.harkon.co.uk v1.0.2 harkon

# 3. Update service Dockerfiles to use the new base version
#    Change: ARG BASE_VERSION=v1.0.2

# 4. Rebuild ML services
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.2 harkon
```
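Step 3 can be done in one pass across all service Dockerfiles; a hedged one-liner, assuming the `apps/<service>/Dockerfile` layout and GNU sed (verify the result with `git diff` before rebuilding):

```bash
# Bump the default base version in every service Dockerfile
grep -rl --include=Dockerfile 'ARG BASE_VERSION=v1.0.1' apps/ \
  | xargs sed -i 's/^ARG BASE_VERSION=v1\.0\.1$/ARG BASE_VERSION=v1.0.2/'
```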
## Requirements Files

### libs/requirements-base.txt

Core dependencies for ALL services (included in base-runtime and base-ml).

### libs/requirements-ml.txt

ML dependencies (included in base-ml only).

### apps/svc_*/requirements.txt

Service-specific dependencies:

- ML services: only additional deps NOT in base-ml (e.g., faiss-cpu, tiktoken)
- Non-ML services: service-specific deps (e.g., aiofiles, openai, anthropic)
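When deciding where a pin belongs, it helps to check which layer already provides a package. A minimal sketch, run from the repo root (the `faiss-cpu` example and glob are illustrative):

```bash
# Find where a package is pinned across the layered requirements files
pkg=faiss-cpu
grep -in "^${pkg}" \
  libs/requirements-base.txt \
  libs/requirements-ml.txt \
  apps/*/requirements.txt \
  || echo "${pkg} is not pinned in any requirements file"
```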
## Dockerfile Templates

### ML Service Dockerfile Pattern

```dockerfile
# Use pre-built ML base image
ARG REGISTRY=gitea.harkon.co.uk
ARG OWNER=harkon
ARG BASE_VERSION=v1.0.1
FROM ${REGISTRY}/${OWNER}/base-ml:${BASE_VERSION}

USER root
WORKDIR /app

# Install service-specific deps (minimal)
COPY apps/SERVICE_NAME/requirements.txt /tmp/service-requirements.txt
RUN pip install --no-cache-dir -r /tmp/service-requirements.txt

# Copy app code
COPY libs/ ./libs/
COPY apps/SERVICE_NAME/ ./apps/SERVICE_NAME/

RUN chown -R appuser:appuser /app
USER appuser

# Health check, expose, CMD...
```
### Non-ML Service Dockerfile Pattern

```dockerfile
# Multi-stage build from scratch
FROM python:3.12-slim AS builder

# Install build deps
RUN apt-get update && apt-get install -y build-essential curl && rm -rf /var/lib/apt/lists/*

# Create venv and install deps
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY libs/requirements-base.txt /tmp/libs-requirements.txt
COPY apps/SERVICE_NAME/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/libs-requirements.txt -r /tmp/requirements.txt

# Production stage
FROM python:3.12-slim
# ... copy venv, app code, etc.
```
## Comparison: Before vs After

### Before (Monolithic Approach)

Each ML service:

- Build time: 10-15 minutes
- Image size: 1.6GB
- Push time: 5-10 minutes
- Total for 3 services: 30-45 min build + 15-30 min push = 45-75 minutes

### After (Base Image Approach)

Base-ml (one time):

- Build time: 10-15 minutes
- Image size: 1.2GB
- Push time: 5-10 minutes

Each ML service:

- Build time: 1-2 minutes
- Image size: 1.25GB (but only ~50MB of new layers)
- Push time: 30-60 seconds (only new layers)
- Total for 3 services: 3-6 min build + 2-3 min push = 5-9 minutes

**Total recurring time savings: 40-66 minutes (~89% faster, excluding the one-time base-ml build)**
## Best Practices

- **Version base images**: always tag with a version (e.g., v1.0.1, v1.0.2)
- **Update base images infrequently**: only when ML library versions need updating
- **Keep service requirements minimal**: only add deps NOT in base-ml
- **Use build args**: make registry/owner/version configurable
- **Test base images**: ensure health checks pass before building services (a quick smoke test is sketched below)
- **Document changes**: update this file when modifying base images
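As a lightweight base-image test before any service builds, an import smoke test catches most packaging mistakes. A minimal sketch, assuming the base-ml tag above and that the image allows running `python` directly:

```bash
# Verify the heavy ML stack actually imports inside base-ml
docker run --rm gitea.harkon.co.uk/harkon/base-ml:v1.0.1 \
  python -c "import torch, transformers, sentence_transformers, sklearn, numpy; print('base-ml OK')"
```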
## Troubleshooting

### Issue: Service can't find an ML library

- **Cause**: the library was removed from the service requirements but is not present in base-ml
- **Solution**: add the library to `libs/requirements-ml.txt` and rebuild base-ml

### Issue: Base image not found

- **Cause**: the base image was not pushed to the registry, or the wrong version is referenced
- **Solution**: run `./scripts/build-base-images.sh` first

### Issue: Service image too large

- **Cause**: duplicate dependencies in the service requirements
- **Solution**: remove deps already provided by base-ml from the service requirements.txt (see the check below)
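Duplicates can be found mechanically by intersecting the requirements files; a minimal sketch, assuming pip-style requirement lines (the service path is illustrative):

```bash
# List packages pinned both in base-ml's requirements and in a service's
norm() {
  # Reduce each requirement line to a bare, lowercase package name:
  # strip version specifiers, extras, and comments, then dedupe
  sed -E 's/[<>=!~;#[].*//' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[[:space:]]+//g' | grep -v '^$' | sort -u
}
comm -12 <(norm libs/requirements-ml.txt) \
         <(norm apps/svc_rag_indexer/requirements.txt)
```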
## Future Improvements

- **base-runtime for non-ML services**: use base-runtime instead of building from scratch
- **Multi-arch builds**: support ARM64 for Apple Silicon
- **Automated base image updates**: CI/CD pipeline to rebuild base images on dependency updates
- **Layer analysis**: tools to analyze and optimize layer sizes