ai-tax-agent/docs/REMOTE_BUILD_TROUBLESHOOTING.md
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00


# Remote Build Troubleshooting Guide
## Problem: Docker Push Failing on Remote Server
When the `base-ml` image is built on the remote server and pushed to Gitea, the push fails on large image layers (>1GB).
---
## Root Cause
The issue is likely one of these:
1. **Upload size limit in Traefik** (a buffering middleware with a low `maxRequestBodyBytes`; Traefik applies no body limit unless one is configured)
2. **Upload size limit in Gitea** (default varies)
3. **Network timeout** during large uploads
4. **Not logged in** to Gitea registry
5. **Disk space** issues
---
## Quick Diagnosis
### On Remote Server (ssh deploy@141.136.35.199)
Run these commands to diagnose:
```bash
# 1. Check if logged in
cat ~/.docker/config.json
# 2. Test registry endpoint
curl -I https://gitea.harkon.co.uk/v2/
# 3. Check Gitea logs for errors (2>&1 so grep also sees the container's stderr)
docker logs --tail 50 gitea-server 2>&1 | grep -i error
# 4. Check Traefik logs for 413 errors
docker logs --tail 50 traefik 2>&1 | grep -E "413|error"
# 5. Check disk space
df -h
# 6. Test with small image
docker pull alpine:latest
docker tag alpine:latest gitea.harkon.co.uk/harkon/test:latest
docker push gitea.harkon.co.uk/harkon/test:latest
```
---
## Solution 1: Automated Fix (Recommended)
Copy the fix script to the remote server and run it:
```bash
# On your local machine
scp scripts/fix-gitea-upload-limit.sh deploy@141.136.35.199:~/
# SSH to remote
ssh deploy@141.136.35.199
# Run the fix script
chmod +x fix-gitea-upload-limit.sh
./fix-gitea-upload-limit.sh
```
This script will:
- ✅ Create Traefik middleware for large uploads (5GB limit)
- ✅ Update Gitea configuration for large files
- ✅ Restart both services
- ✅ Test the registry endpoint
---
## Solution 2: Manual Fix
### Step 1: Configure Traefik
```bash
# SSH to remote
ssh deploy@141.136.35.199
# Create Traefik middleware config
sudo mkdir -p /opt/traefik/config
sudo tee /opt/traefik/config/gitea-large-upload.yml > /dev/null << 'EOF'
http:
  middlewares:
    gitea-large-upload:
      buffering:
        maxRequestBodyBytes: 5368709120   # 5GB
        memRequestBodyBytes: 104857600    # 100MB
        maxResponseBodyBytes: 5368709120  # 5GB
        memResponseBodyBytes: 104857600   # 100MB
EOF
# Restart Traefik
docker restart traefik
```
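The byte values in the middleware config are plain powers of 1024. As a quick sanity check, a minimal sketch (the helper names `gib_to_bytes` and `mib_to_bytes` are hypothetical, not part of the repository scripts) reproduces them with shell arithmetic:

```bash
# Hypothetical helpers: convert GiB / MiB to bytes with integer arithmetic.
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

gib_to_bytes 5     # prints 5368709120 (maxRequestBodyBytes / maxResponseBodyBytes)
mib_to_bytes 100   # prints 104857600  (memRequestBodyBytes / memResponseBodyBytes)
```

If a different limit is needed, the same helpers give the value to paste into the YAML, for example `gib_to_bytes 10` for a 10GB limit.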
### Step 2: Update Gitea Container Labels
Find your Gitea docker-compose file and add this label:
```yaml
services:
  gitea:
    labels:
      - "traefik.http.routers.gitea.middlewares=gitea-large-upload@file"
```
Then restart:
```bash
docker-compose up -d gitea
```
### Step 3: Configure Gitea Settings
```bash
# Backup config
docker exec gitea-server cp /data/gitea/conf/app.ini /data/gitea/conf/app.ini.backup
# Edit config
docker exec -it gitea-server vi /data/gitea/conf/app.ini
```
Add these settings:
```ini
[server]
LFS_MAX_FILE_SIZE = 5368709120 ; 5GB
[packages]
ENABLED = true
CHUNKED_UPLOAD_PATH = /data/gitea/tmp/package-upload
```
Restart Gitea:
```bash
docker restart gitea-server
```
---
## Solution 3: Alternative - Use GitHub Container Registry
If Gitea continues to have issues, use GitHub Container Registry instead:
### On Remote Server:
```bash
# Login to GitHub Container Registry
echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin
# Build and push to GitHub
cd /home/deploy/ai-tax-agent
docker build -f infra/docker/base-ml.Dockerfile -t ghcr.io/harkon/base-ml:v1.0.1 .
docker push ghcr.io/harkon/base-ml:v1.0.1
```
### Update Dockerfiles:
Change `FROM` statements from:
```dockerfile
FROM gitea.harkon.co.uk/harkon/base-ml:v1.0.1
```
To:
```dockerfile
FROM ghcr.io/harkon/base-ml:v1.0.1
```
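With several service Dockerfiles, the `FROM` change can be scripted. The following is a minimal sketch, assuming each Dockerfile references the base image directly after `FROM` (no `--platform` flag); `switch_base_registry` is a hypothetical helper, not an existing script in the repository:

```bash
# Hypothetical helper: point a Dockerfile's FROM lines at another registry.
# Usage: switch_base_registry <dockerfile> <old-registry> <new-registry>
switch_base_registry() {
  local file="$1" old="$2" new="$3"
  local old_re
  # Escape dots so the registry host is matched literally by sed.
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # Rewrite only lines that start with FROM; the original is kept as <file>.bak.
  sed -i.bak -E "s|^(FROM[[:space:]]+)${old_re}/|\1${new}/|" "$file"
}
```

For a single file this could be run as `switch_base_registry path/to/Dockerfile gitea.harkon.co.uk ghcr.io`; review the resulting diff before committing.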
---
## Testing the Fix
After applying the fix:
### 1. Test with Small Image
```bash
docker pull alpine:latest
docker tag alpine:latest gitea.harkon.co.uk/harkon/test:latest
docker push gitea.harkon.co.uk/harkon/test:latest
```
Expected: ✅ Push succeeds
### 2. Test with Large Image
```bash
cd /home/deploy/ai-tax-agent
docker build -f infra/docker/base-ml.Dockerfile -t gitea.harkon.co.uk/harkon/base-ml:test .
docker push gitea.harkon.co.uk/harkon/base-ml:test
```
Expected: ✅ Push succeeds (may take 5-10 minutes)
### 3. Monitor Logs
In separate terminals:
```bash
# Terminal 1: Traefik logs
docker logs -f traefik
# Terminal 2: Gitea logs
docker logs -f gitea-server
# Terminal 3: Push image
docker push gitea.harkon.co.uk/harkon/base-ml:test
```
Look for:
- `413 Request Entity Too Large` - Upload limit still too low
- `502 Bad Gateway` - Timeout issue
- `unauthorized` - Not logged in
- `Pushed` - Success!
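The patterns above can be matched mechanically when a push log is long. This is a rough sketch only; `classify_push_line` and its labels are hypothetical names chosen for illustration:

```bash
# Hypothetical helper: map one line of push or proxy output to a likely cause.
classify_push_line() {
  case "$1" in
    *"413 Request Entity Too Large"*) echo "upload-limit" ;;
    *"502 Bad Gateway"*)              echo "timeout" ;;
    *unauthorized*)                   echo "not-logged-in" ;;
    *Pushed*)                         echo "success" ;;
    *)                                echo "unknown" ;;
  esac
}
```

For example, `classify_push_line "unauthorized: authentication required"` prints `not-logged-in`. Anything reported as `unknown` still needs a manual look at the Gitea and Traefik logs.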
---
## Common Errors and Fixes
### Error: `413 Request Entity Too Large`
**Fix**: Increase Traefik buffering limit (see Solution 1 or 2 above)
### Error: `unauthorized: authentication required`
**Fix**: Log in to Gitea registry
```bash
docker login gitea.harkon.co.uk
```
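To check for an existing login without printing stored credentials, one option is to look only for the registry host in the Docker client config. A minimal sketch, assuming the default config location `~/.docker/config.json`; `has_registry_entry` is a hypothetical helper, and a match only shows that an entry exists, not that the token is still valid:

```bash
# Hypothetical helper: succeed if the Docker client config mentions the registry host.
# Usage: has_registry_entry <config-file> <registry-host>
has_registry_entry() {
  local config="$1" registry="$2"
  # -F: fixed string, -q: quiet; the quotes match the JSON key form "host".
  [ -f "$config" ] && grep -Fq "\"$registry\"" "$config"
}
```

A possible call is `has_registry_entry ~/.docker/config.json gitea.harkon.co.uk || docker login gitea.harkon.co.uk`.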
### Error: `no space left on device`
**Fix**: Clean up Docker
```bash
# Caution: removes ALL unused images, stopped containers, networks, and unused volumes
docker system prune -a --volumes -f
df -h
```
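Before a large build it can help to turn the `df` output into a single number. The sketch below is an illustration only: `avail_gib` is a hypothetical helper, and `/var/lib/docker` in the usage note is Docker's default data directory, which may differ on this server:

```bash
# Hypothetical helper: read `df -Pk` output on stdin and print the available
# space of the first listed filesystem in whole GiB (rounded down).
avail_gib() {
  # Line 1 is the header; column 4 of line 2 is "Available" in 1K blocks.
  awk 'NR==2 { print int($4 / 1024 / 1024) }'
}
```

A typical call would be `df -Pk /var/lib/docker | avail_gib`. How much headroom is enough depends on the image; the guide only states that the layers exceed 1GB.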
### Error: `net/http: request canceled while waiting for connection`
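**Fix**: This is a network timeout. Increase the timeout or use chunked uploads, and add a retry expression to the `buffering` block of the Traefik middleware:
```yaml
# Add under http.middlewares.gitea-large-upload.buffering
retryExpression: "IsNetworkError() && Attempts() < 3"
```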
### Error: `received unexpected HTTP status: 500 Internal Server Error`
**Fix**: Check Gitea logs for the actual error
```bash
docker logs gitea-server --tail 100
```
---
## Verification Checklist
After fixing, verify:
- [ ] Traefik middleware created and loaded
- [ ] Gitea container has middleware label
- [ ] Gitea app.ini has LFS_MAX_FILE_SIZE set
- [ ] Gitea packages enabled
- [ ] Both services restarted
- [ ] Registry endpoint returns 401 (not 404)
- [ ] Logged in to registry
- [ ] Small image push works
- [ ] Large image push works
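
The "returns 401 (not 404)" item can be checked from the status code alone. A minimal sketch, with `registry_status_meaning` as a hypothetical helper and the wording of its messages chosen for illustration:

```bash
# Hypothetical helper: interpret the HTTP status returned by the /v2/ endpoint.
registry_status_meaning() {
  case "$1" in
    401) echo "ok: registry reachable, authentication required" ;;
    404) echo "problem: /v2/ not found at this host" ;;
    *)   echo "check: unexpected status $1" ;;
  esac
}

# Example (needs network access, so it is left commented out here):
#   status=$(curl -s -o /dev/null -w '%{http_code}' https://gitea.harkon.co.uk/v2/)
#   registry_status_meaning "$status"
```

A `404` usually points at routing or a disabled package registry rather than at upload limits, so it is worth resolving before retesting pushes.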
---
## Next Steps After Fix
Once the fix is applied and tested:
1. **Build base-ml on remote**:
```bash
cd /home/deploy/ai-tax-agent
docker build -f infra/docker/base-ml.Dockerfile -t gitea.harkon.co.uk/harkon/base-ml:v1.0.1 .
docker push gitea.harkon.co.uk/harkon/base-ml:v1.0.1
```
2. **Build services locally** (they'll pull base-ml from Gitea):
```bash
# On local machine
./scripts/build-and-push-images.sh gitea.harkon.co.uk v1.0.1 harkon
```
3. **Deploy to production**:
```bash
./scripts/deploy-to-production.sh
```
---
## Support Resources
- **Gitea Registry Docs**: https://docs.gitea.io/en-us/packages/container/
- **Traefik Buffering**: https://doc.traefik.io/traefik/middlewares/http/buffering/
- **Docker Registry API**: https://docs.docker.com/registry/spec/api/
---
## Files Created
- `scripts/fix-gitea-upload-limit.sh` - Automated fix script
- `scripts/remote-debug-commands.txt` - Manual debug commands
- `docs/GITEA_REGISTRY_DEBUG.md` - Detailed debugging guide
- `docs/REMOTE_BUILD_TROUBLESHOOTING.md` - This file