ai-tax-agent/docs/REMOTE_BUILD_TROUBLESHOOTING.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00

Remote Build Troubleshooting Guide

Problem: Docker Push Failing on Remote Server

When the base-ml image is built on the remote server and pushed to Gitea, the push fails on large image layers (>1GB).


Root Cause

The issue is likely one of these:

  1. Upload size limit in Traefik (default ~100MB)
  2. Upload size limit in Gitea (default varies)
  3. Network timeout during large uploads
  4. Not logged in to Gitea registry
  5. Disk space issues

Quick Diagnosis

On Remote Server (ssh deploy@141.136.35.199)

Run these commands to diagnose:

# 1. Check if logged in
cat ~/.docker/config.json

# 2. Test registry endpoint
curl -I https://gitea.harkon.co.uk/v2/

# 3. Check Gitea logs for errors (2>&1 so that stderr output is filtered too)
docker logs --tail 50 gitea-server 2>&1 | grep -i error

# 4. Check Traefik logs for 413 errors
docker logs --tail 50 traefik 2>&1 | grep -E "413|error"

# 5. Check disk space
df -h

# 6. Test with small image
docker pull alpine:latest
docker tag alpine:latest gitea.harkon.co.uk/harkon/test:latest
docker push gitea.harkon.co.uk/harkon/test:latest

Solution 1: Automated Fix

Copy the fix script to the remote server and run it:

# On your local machine
scp scripts/fix-gitea-upload-limit.sh deploy@141.136.35.199:~/

# SSH to remote
ssh deploy@141.136.35.199

# Run the fix script
chmod +x fix-gitea-upload-limit.sh
./fix-gitea-upload-limit.sh

This script will:

  • Create Traefik middleware for large uploads (5GB limit)
  • Update Gitea configuration for large files
  • Restart both services
  • Test the registry endpoint

Solution 2: Manual Fix

Step 1: Configure Traefik

# SSH to remote
ssh deploy@141.136.35.199

# Create Traefik middleware config
sudo mkdir -p /opt/traefik/config
sudo tee /opt/traefik/config/gitea-large-upload.yml > /dev/null << 'EOF'
http:
  middlewares:
    gitea-large-upload:
      buffering:
        maxRequestBodyBytes: 5368709120   # 5GB
        memRequestBodyBytes: 104857600    # 100MB
        maxResponseBodyBytes: 5368709120  # 5GB
        memResponseBodyBytes: 104857600   # 100MB
EOF

# Restart Traefik
docker restart traefik
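
The byte values in the middleware config above are plain binary multiples. As a small sketch (the helper names `gib_to_bytes` and `mib_to_bytes` are made up for this example), they can be recomputed in the shell if a different limit is needed:

```shell
# Hypothetical helpers: convert GiB / MiB to the byte counts Traefik expects.
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

gib_to_bytes 5     # 5368709120, the value used for maxRequestBodyBytes
mib_to_bytes 100   # 104857600, the value used for memRequestBodyBytes
```

For a 10GB limit, `gib_to_bytes 10` gives the number to paste into the YAML.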

Step 2: Update Gitea Container Labels

Find your Gitea docker-compose file and add this label:

services:
  gitea:
    labels:
      - "traefik.http.routers.gitea.middlewares=gitea-large-upload@file"

Then restart:

docker-compose up -d gitea

Step 3: Configure Gitea Settings

# Backup config
docker exec gitea-server cp /data/gitea/conf/app.ini /data/gitea/conf/app.ini.backup

# Edit config
docker exec -it gitea-server vi /data/gitea/conf/app.ini

Add these settings:

[server]
LFS_MAX_FILE_SIZE = 5368709120  ; 5GB

[packages]
ENABLED = true
CHUNKED_UPLOAD_PATH = /data/gitea/tmp/package-upload

Restart Gitea:

docker restart gitea-server

Solution 3: Alternative - Use GitHub Container Registry

If Gitea continues to have issues, use GitHub Container Registry instead:

On Remote Server:

# Login to GitHub Container Registry
echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin

# Build and push to GitHub
cd /home/deploy/ai-tax-agent
docker build -f infra/docker/base-ml.Dockerfile -t ghcr.io/harkon/base-ml:v1.0.1 .
docker push ghcr.io/harkon/base-ml:v1.0.1

Update Dockerfiles:

Change FROM statements from:

FROM gitea.harkon.co.uk/harkon/base-ml:v1.0.1

To:

FROM ghcr.io/harkon/base-ml:v1.0.1
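
If several Dockerfiles need this change, the substitution can be scripted. The following is a minimal sketch, not part of the repository: `rewrite_base_image` is a hypothetical helper that reads a Dockerfile on stdin and writes the rewritten version to stdout, assuming every base image lives under `gitea.harkon.co.uk/harkon/`.

```shell
# Hypothetical helper: point FROM lines at ghcr.io instead of the Gitea registry.
# Only lines starting with "FROM gitea.harkon.co.uk/harkon/" are changed.
rewrite_base_image() {
  sed 's|^FROM gitea\.harkon\.co\.uk/harkon/|FROM ghcr.io/harkon/|'
}
```

Example use: `rewrite_base_image < Dockerfile > Dockerfile.new`, then review the diff before replacing the original file.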

Testing the Fix

After applying the fix:

1. Test with Small Image

docker pull alpine:latest
docker tag alpine:latest gitea.harkon.co.uk/harkon/test:latest
docker push gitea.harkon.co.uk/harkon/test:latest

Expected: Push succeeds

2. Test with Large Image

cd /home/deploy/ai-tax-agent
docker build -f infra/docker/base-ml.Dockerfile -t gitea.harkon.co.uk/harkon/base-ml:test .
docker push gitea.harkon.co.uk/harkon/base-ml:test

Expected: Push succeeds (may take 5-10 minutes)

3. Monitor Logs

In separate terminals:

# Terminal 1: Traefik logs
docker logs -f traefik

# Terminal 2: Gitea logs
docker logs -f gitea-server

# Terminal 3: Push image
docker push gitea.harkon.co.uk/harkon/base-ml:test

Look for:

  • 413 Request Entity Too Large - Upload limit still too low
  • 502 Bad Gateway - Timeout issue
  • unauthorized - Not logged in
  • Pushed - Success!
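
The same mapping can be written as a small shell helper for use in scripts. This is only a sketch: `classify_push_output` and its labels are invented for this example, and the patterns cover just the four messages listed above.

```shell
# Hypothetical helper: map docker push output (passed as one string) to a cause.
# Errors are checked before "Pushed", because some layers may succeed
# before a later layer fails.
classify_push_output() {
  case "$1" in
    *"413 Request Entity Too Large"*) echo "upload-limit" ;;
    *"502 Bad Gateway"*)              echo "timeout" ;;
    *unauthorized*)                   echo "not-logged-in" ;;
    *Pushed*)                         echo "success" ;;
    *)                                echo "unknown" ;;
  esac
}
```

Example use: `classify_push_output "$(docker push gitea.harkon.co.uk/harkon/base-ml:test 2>&1)"`.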

Common Errors and Fixes

Error: 413 Request Entity Too Large

Fix: Increase Traefik buffering limit (see Solution 1 or 2 above)

Error: unauthorized: authentication required

Fix: Log in to Gitea registry

docker login gitea.harkon.co.uk
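
To check for an existing login from a script, one option is to look for the registry host in the Docker client config. The helper below is a hypothetical sketch. It only confirms that an entry for the host exists in the file, not that the stored credentials are still valid.

```shell
# Hypothetical helper: succeed if the Docker config file mentions the registry host.
# $1 = path to config.json, $2 = registry host
has_registry_entry() {
  [ -f "$1" ] && grep -qF "\"$2\"" "$1"
}
```

Example use: `has_registry_entry ~/.docker/config.json gitea.harkon.co.uk || docker login gitea.harkon.co.uk`.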

Error: no space left on device

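Fix: Clean up Docker. Note that this command removes all stopped containers and all unused images, networks, and volumes, so run it only if nothing on the host still needs them.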

docker system prune -a --volumes -f
df -h
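
To check disk usage from a script before building, the `df` output can be reduced to a single number. A minimal sketch with hypothetical helper names; it assumes the filesystem name contains no spaces, and that Docker stores its data under the default `/var/lib/docker`:

```shell
# Hypothetical helper: read `df -P` output on stdin, print use% of the first filesystem.
parse_use_pct() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the use% of the filesystem that holds the given path.
disk_use_pct() {
  df -P "$1" | parse_use_pct
}
```

Example use: `[ "$(disk_use_pct /var/lib/docker)" -lt 90 ] || echo "disk almost full"`.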

Error: net/http: request canceled while waiting for connection

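Fix: This is a network timeout. Increase the timeout or use chunked uploads. The Traefik buffering middleware can also be told to retry on network errors:

# Add under the buffering block of the Traefik middleware
retryExpression: "IsNetworkError() && Attempts() < 3"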
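
Retrying can also be done on the client side. This is a different technique from the Traefik setting above: a generic shell retry wrapper, sketched here with made-up names (`retry`, `RETRY_DELAY`). Since `docker push` skips layers that already exist in the registry, a repeated attempt normally only re-sends what failed.

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry <attempts> <command> [args...]
# RETRY_DELAY (seconds, default 5) controls the pause.
retry() {
  attempts=$1
  shift
  n=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$n" -ge "$attempts" ] && return 1  # out of attempts: give up
    n=$((n + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

Example use: `retry 3 docker push gitea.harkon.co.uk/harkon/base-ml:test`.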

Error: received unexpected HTTP status: 500 Internal Server Error

Fix: Check Gitea logs for the actual error

docker logs gitea-server --tail 100

Verification Checklist

After fixing, verify:

  • Traefik middleware created and loaded
  • Gitea container has middleware label
  • Gitea app.ini has LFS_MAX_FILE_SIZE set
  • Gitea packages enabled
  • Both services restarted
  • Registry endpoint returns 401 (not 404)
  • Logged in to registry
  • Small image push works
  • Large image push works
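
The registry endpoint item in the checklist can be scripted. A sketch with hypothetical helper names: `registry_status` fetches only the HTTP status code with curl, and `interpret_registry_status` maps it to a verdict. The 200 case is an assumption for a registry that answers without asking for credentials.

```shell
# Fetch only the HTTP status code of a URL (nothing is called until you invoke it).
registry_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: turn the status code into a short verdict.
interpret_registry_status() {
  case "$1" in
    401) echo "ok: registry reachable, login required" ;;
    200) echo "ok: registry reachable" ;;
    404) echo "fail: registry endpoint not found" ;;
    *)   echo "unknown: HTTP $1" ;;
  esac
}
```

Example use: `interpret_registry_status "$(registry_status https://gitea.harkon.co.uk/v2/)"`.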

Next Steps After Fix

Once the fix is applied and tested:

  1. Build base-ml on remote:
cd /home/deploy/ai-tax-agent
docker build -f infra/docker/base-ml.Dockerfile -t gitea.harkon.co.uk/harkon/base-ml:v1.0.1 .
docker push gitea.harkon.co.uk/harkon/base-ml:v1.0.1
  2. Build services locally (they'll pull base-ml from Gitea):
# On local machine
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 harkon
  3. Deploy to production:
./scripts/deploy-to-production.sh

Support Resources


Files Created

  • scripts/fix-gitea-upload-limit.sh - Automated fix script
  • scripts/remote-debug-commands.txt - Manual debug commands
  • docs/GITEA_REGISTRY_DEBUG.md - Detailed debugging guide
  • docs/REMOTE_BUILD_TROUBLESHOOTING.md - This file