Docker Image Size Optimization
Problem Identified
Initial Docker images were 1.6GB each, which is unacceptably large for microservices.
Root Causes
- Heavy ML dependencies in all services - sentence-transformers (~2GB with PyTorch) was included in base requirements
- Development dependencies in production - pytest, mypy, black, ruff, etc. were being installed in Docker images
- Unnecessary dependencies - Many services don't need ML but were getting all ML libraries
- Redundant dependencies - Multiple overlapping packages (transformers + sentence-transformers both include PyTorch)
Solution
1. Split Requirements Files
Before: Single libs/requirements.txt with everything (97 lines)
After: Modular requirements:
- libs/requirements-base.txt - Core dependencies (~30 packages, ~200MB)
- libs/requirements-ml.txt - ML dependencies (only for 3 services, ~2GB)
- libs/requirements-pdf.txt - PDF processing (only for services that need it)
- libs/requirements-rdf.txt - RDF/semantic web (only for KG service)
- libs/requirements-dev.txt - Development only (NOT in Docker)
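One way to keep the split from eroding is a small check that fails when an ML or development package creeps back into the base file. The sketch below is illustrative and not part of the repository: the function names are hypothetical, the forbidden set is taken from the packages this document lists as removed from base, and the sample contents (including version pins) are made up for the demonstration.

```python
import re

# Packages this document says were removed from the base requirements.
FORBIDDEN_IN_BASE = {
    "sentence-transformers", "transformers", "spacy", "nltk",
    "scikit-learn", "numpy", "pytest", "mypy", "black", "ruff",
}


def requirement_names(text: str) -> set[str]:
    """Return normalized package names from requirements.txt-style text."""
    names = set()
    for raw in text.splitlines():
        # Drop trailing comments (a simplification: URLs with '#' fragments are not handled).
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as -r, -e, --index-url.
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalize like PyPI does: lowercase, runs of '-', '_', '.' become '-'.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def forbidden_in_base(text: str) -> set[str]:
    """Return the forbidden packages that appear in the given base requirements."""
    return requirement_names(text) & FORBIDDEN_IN_BASE


# Hypothetical file contents, for illustration only.
sample = """\
fastapi>=0.110
pydantic[email]==2.6.0  # extras are ignored
Sentence_Transformers==2.7.0
pytest>=8
"""
print(sorted(forbidden_in_base(sample)))  # ['pytest', 'sentence-transformers']
```

Running a check like this against libs/requirements-base.txt in CI would catch a regression before it reaches an image build.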
2. Service-Specific Optimization
Services WITHOUT ML (11 services) - ~300MB each
- svc-ingestion
- svc-extract
- svc-forms
- svc-hmrc
- svc-rpa
- svc-normalize-map
- svc-reason
- svc-firm-connectors
- svc-coverage
- svc-kg
- ui-review
Dockerfile pattern:
COPY libs/requirements-base.txt /tmp/libs-requirements.txt
COPY apps/svc_xxx/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/libs-requirements.txt -r /tmp/requirements.txt
Services WITH ML (3 services) - ~1.2GB each
- svc-ocr (needs transformers for document AI)
- svc-rag-indexer (needs sentence-transformers for embeddings)
- svc-rag-retriever (needs sentence-transformers for retrieval)
Dockerfile pattern:
COPY libs/requirements-base.txt /tmp/libs-requirements.txt
COPY apps/svc_xxx/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/libs-requirements.txt -r /tmp/requirements.txt
3. Additional Optimizations
Removed from Base Requirements
- ❌ sentence-transformers - Only 3 services need it
- ❌ transformers - Only 3 services need it
- ❌ spacy - Only 2 services need it
- ❌ nltk - Only 2 services need it
- ❌ scikit-learn - Not needed by most services
- ❌ numpy - Only needed by ML services
- ❌ aiokafka - Using NATS instead
- ❌ boto3/botocore - Not needed
- ❌ asyncio-mqtt - Not used
- ❌ ipaddress - Built-in to Python
- ❌ All OpenTelemetry packages - Moved to dev
- ❌ All testing packages - Moved to dev
- ❌ All code quality tools - Moved to dev
Optimized in Service Requirements
- ✅ opencv-python → opencv-python-headless (smaller, no GUI)
- ✅ langchain → tiktoken (just the tokenizer, not the whole framework)
- ✅ Removed presidio (PII detection) - can be added later if needed
- ✅ Removed layoutparser - using transformers directly
- ✅ Removed cohere - using OpenAI/Anthropic only
4. Expected Results
| Service Type | Before | After | Savings |
|---|---|---|---|
| Non-ML services (11) | 1.6GB | ~300MB | 81% reduction |
| ML services (3) | 1.6GB | ~1.2GB | 25% reduction |
| Total (14 services) | 22.4GB | 6.9GB | 69% reduction |
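The totals in the table follow directly from the per-service estimates. As a quick cross-check, the arithmetic can be reproduced in a few lines. The figures are the approximate sizes quoted above, with 1GB treated as 1000MB.

```python
# Approximate sizes from the table, in MB.
NON_ML_SERVICES = 11
ML_SERVICES = 3
BEFORE_MB = 1600
NON_ML_AFTER_MB = 300
ML_AFTER_MB = 1200


def reduction_pct(before, after):
    """Percentage saved when going from `before` to `after`, rounded to an integer."""
    return round((before - after) / before * 100)


total_before = (NON_ML_SERVICES + ML_SERVICES) * BEFORE_MB
total_after = NON_ML_SERVICES * NON_ML_AFTER_MB + ML_SERVICES * ML_AFTER_MB

print(total_before / 1000, total_after / 1000)      # 22.4 6.9
print(reduction_pct(BEFORE_MB, NON_ML_AFTER_MB))    # 81
print(reduction_pct(BEFORE_MB, ML_AFTER_MB))        # 25
print(reduction_pct(total_before, total_after))     # 69
```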
Implementation Checklist
Phase 1: Requirements Files ✅
- Create libs/requirements-base.txt
- Create libs/requirements-ml.txt
- Create libs/requirements-pdf.txt
- Create libs/requirements-rdf.txt
- Create libs/requirements-dev.txt
- Update libs/requirements.txt to point to base
Phase 2: Service Requirements ✅
- Optimize svc_ingestion/requirements.txt
- Optimize svc_extract/requirements.txt
- Optimize svc_ocr/requirements.txt
- Optimize svc_rag_retriever/requirements.txt
- Optimize svc_rag_indexer/requirements.txt
Phase 3: Dockerfiles 🟡
- Update svc_ingestion/Dockerfile
- Update svc_extract/Dockerfile
- Update svc_kg/Dockerfile
- Update svc_rag_retriever/Dockerfile
- Update svc_rag_indexer/Dockerfile
- Update svc_forms/Dockerfile
- Update svc_hmrc/Dockerfile
- Update svc_ocr/Dockerfile
- Update svc_rpa/Dockerfile
- Update svc_normalize_map/Dockerfile
- Update svc_reason/Dockerfile
- Update svc_firm_connectors/Dockerfile
- Update svc_coverage/Dockerfile
- Update ui_review/Dockerfile
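Since this phase is still in progress, it can help to track which Dockerfiles have been migrated. The sketch below classifies a Dockerfile by whether it still copies the old libs/requirements.txt or the new libs/requirements-base.txt. The function names and the apps/*/Dockerfile layout are assumptions, inferred from the COPY paths used in this document.

```python
import re
from pathlib import Path

# A COPY of the old monolithic file; requirements-base.txt does not match this pattern.
OLD_COPY = re.compile(r"^\s*COPY\s+libs/requirements\.txt\s", re.MULTILINE)
NEW_COPY = re.compile(r"^\s*COPY\s+libs/requirements-base\.txt\s", re.MULTILINE)


def dockerfile_status(text: str) -> str:
    """Classify Dockerfile text as 'old', 'updated', or 'unknown'."""
    if OLD_COPY.search(text):
        return "old"
    if NEW_COPY.search(text):
        return "updated"
    return "unknown"


def scan(repo_root: str = ".") -> dict[str, str]:
    """Map each service directory under apps/ (assumed layout) to its status."""
    return {
        path.parent.name: dockerfile_status(path.read_text())
        for path in sorted(Path(repo_root).glob("apps/*/Dockerfile"))
    }
```

Calling scan() from the repository root would return a dictionary such as {"svc_ingestion": "updated", ...}, which maps directly onto the checklist above.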
Phase 4: Rebuild & Test
- Clean old images: docker system prune -a
- Rebuild all images
- Verify image sizes: docker images | grep gitea.harkon.co.uk
- Test services locally
- Push to registry
Dockerfile Template
For Non-ML Services (Most Services)
# Multi-stage build for svc_xxx
FROM python:3.12-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy requirements and install dependencies
COPY libs/requirements-base.txt /tmp/libs-requirements.txt
COPY apps/svc_xxx/requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r /tmp/libs-requirements.txt -r /tmp/requirements.txt
# Production stage
FROM python:3.12-slim
# Install runtime dependencies (curl is used by the health check) and create the non-root user
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd -r appuser \
    && useradd -r -g appuser appuser
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Set working directory
WORKDIR /app
# Copy application code
COPY libs/ ./libs/
COPY apps/svc_xxx/ ./apps/svc_xxx/
# Give the non-root user ownership of the application code and switch to it
RUN chown -R appuser:appuser /app
USER appuser
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/healthz || exit 1
# Expose port
EXPOSE 8000
# Run the application
CMD ["python", "-m", "uvicorn", "apps.svc_xxx.main:app", "--host", "0.0.0.0", "--port", "8000"]
For ML Services (OCR, RAG Indexer, RAG Retriever)
Use the same template as above. No extra COPY or pip install step is needed, because each ML service's own requirements.txt already includes its ML dependencies.
Verification Commands
# Check image sizes
docker images --format '{{.Repository}}:{{.Tag}} {{.Size}}' | grep gitea.harkon.co.uk
# Check what's installed in an image
docker run --rm gitea.harkon.co.uk/blue/svc-ingestion:v1.0.0 pip list
# Compare sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" | grep gitea
# Check layer sizes
docker history gitea.harkon.co.uk/blue/svc-ingestion:v1.0.0
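The size check can also be automated. The sketch below parses lines in the form produced by docker images --format '{{.Repository}}:{{.Tag}} {{.Size}}' and reports images above a budget. The function names, the budgets (400MB and 1.5GB, chosen as headroom above the ~300MB and ~1.2GB targets), and the sample listing are all assumptions for illustration, not measured results.

```python
import re

# Docker prints sizes in decimal units such as 7.8kB, 312MB, 1.21GB.
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9}
ML_SERVICES = {"svc-ocr", "svc-rag-indexer", "svc-rag-retriever"}


def parse_size(text: str) -> float:
    """Convert a Docker size string such as '312MB' to bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*(B|kB|MB|GB)", text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def oversized(listing: str, non_ml_budget: float = 400e6, ml_budget: float = 1.5e9):
    """Return (service, size) pairs for images that exceed their budget."""
    offenders = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        ref, size = line.rsplit(None, 1)
        # 'registry/namespace/svc-name:tag' -> 'svc-name'
        service = ref.rsplit("/", 1)[-1].split(":", 1)[0]
        budget = ml_budget if service in ML_SERVICES else non_ml_budget
        if parse_size(size) > budget:
            offenders.append((service, size))
    return offenders


# Hypothetical listing: the sizes are invented to exercise the check.
sample = """\
gitea.harkon.co.uk/blue/svc-ingestion:v1.0.1 312MB
gitea.harkon.co.uk/blue/svc-ocr:v1.0.1 1.21GB
gitea.harkon.co.uk/blue/svc-forms:v1.0.1 1.6GB
"""
print(oversized(sample))  # [('svc-forms', '1.6GB')]
```

An image that still reports 1.6GB after the rebuild most likely has a Dockerfile that was not switched to the base requirements.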
Next Steps
- Update all Dockerfiles to use requirements-base.txt
- Clean Docker cache: docker system prune -a --volumes
- Rebuild images: ./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 blue
- Verify sizes: Should see ~300MB for most services, ~1.2GB for ML services
- Update deployment: Change version to v1.0.1 in production compose files
Benefits
- Faster builds - Less to download and install
- Faster deployments - Smaller images to push/pull
- Lower storage costs - 69% reduction in total storage
- Faster startup - Less to pull before a container can start
- Better security - Fewer dependencies = smaller attack surface
- Easier maintenance - Clear separation of concerns
Notes
- Development dependencies are now in libs/requirements-dev.txt - install locally with pip install -r libs/requirements-dev.txt
- ML services still need PyTorch, but we're using CPU-only versions where possible
- Consider using python:3.12-alpine for even smaller images (but this requires more build dependencies)
- Monitor for any missing dependencies after deployment