Traefik Operations Runbook¶
Quick reference for common Traefik operations and troubleshooting.
Quick Commands¶
Status & Monitoring¶
# Check Traefik status
ssh production "docker ps | grep traefik"
# View logs (live)
ssh production "docker logs -f traefik"
# View recent errors
ssh production "docker logs traefik 2>&1 | grep -i error | tail -20"
# Check certificate status
ssh production "cat /opt/traefik/acme.json | jq -r '.cloudflare.Certificates[].domain.main'"
# Test SSL certificate
curl -vI https://git.kua.cl 2>&1 | grep -E 'subject|issuer|SSL'
Restart & Reload¶
# Restart Traefik (downtime: 2-3 seconds)
ssh production "cd /opt/traefik && docker-compose restart"
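# Reopen log files (for log rotation; USR1 does not reload configuration)
ssh production "docker exec traefik kill -USR1 1"
# Dynamic configuration (Docker labels, and the file provider when watch is enabled)
# reloads automatically; changes to the static traefik.yml require a restart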
# Full restart (down + up)
ssh production "cd /opt/traefik && docker-compose down && docker-compose up -d"
Check Configuration¶
# View static config
ssh production "cat /opt/traefik/traefik.yml"
# View middleware config
ssh production "cat /opt/traefik/config.yml"
# Check environment variables
ssh production "docker exec traefik env | grep CF_"
# Verify acme.json permissions
ssh production "ls -la /opt/traefik/acme.json" # Should be -rw------- (600)
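The permission check above can be turned into a pass/fail test instead of reading the `ls` output by eye. This is a minimal sketch to run on the production host; `check_mode` is a hypothetical helper name, and it assumes GNU `stat` (Linux).

```shell
# check_mode FILE EXPECTED: succeed only if FILE has exactly the EXPECTED
# octal mode. Assumes GNU coreutils stat (stat -c), as found on Linux hosts.
check_mode() {
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK: $1 is $actual"
    else
        echo "FAIL: $1 is $actual, expected $2" >&2
        return 1
    fi
}

# Usage on the host (path taken from this runbook):
#   check_mode /opt/traefik/acme.json 600
```

A non-zero exit status makes the check usable in scripts, for example before starting Traefik.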
Adding a New Service¶
Step 1: Add Traefik Network to Service¶
Edit the service's docker-compose.yml:
services:
  myservice:
    image: myimage:latest
    networks:
      - myservice  # Internal network (if any)
      - traefik    # Add this line
    # ... rest of config
networks:
  myservice:
    external: false
  traefik:         # Add this block
    external: true
Step 2: Add Traefik Labels¶
Add labels to the service definition:
services:
  myservice:
    # ... existing config
    labels:
      # Enable Traefik for this service
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      # HTTP router (for redirect)
      - "traefik.http.routers.myservice.entrypoints=http"
      - "traefik.http.routers.myservice.rule=Host(`myservice.kua.cl`)"
      # HTTPS redirect middleware
      - "traefik.http.middlewares.myservice-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.myservice.middlewares=myservice-https-redirect"
      # HTTPS router
      - "traefik.http.routers.myservice-secure.entrypoints=https"
      - "traefik.http.routers.myservice-secure.rule=Host(`myservice.kua.cl`)"
      - "traefik.http.routers.myservice-secure.tls=true"
      - "traefik.http.routers.myservice-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.myservice-secure.service=myservice"
      # Backend service port
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
Replace:
- myservice → your service name (lowercase, no spaces)
- myservice.kua.cl → your desired subdomain
- 8080 → your service's internal port
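The three substitutions above are mechanical, so they can be scripted to avoid a missed rename. The sketch below is a hypothetical helper (`traefik_labels` is not part of Traefik or of this repository) that prints the Step 2 label lines for a given service name, hostname, and port, indented for pasting under `labels:`.

```shell
# traefik_labels NAME HOST PORT: print the Step 2 labels with the three
# placeholders filled in. Hypothetical helper for illustration only.
traefik_labels() {
    name=$1; host=$2; port=$3
    r="traefik.http.routers"
    # Single-quoted format strings keep the backticks in Host(`...`) literal.
    printf '      - "traefik.enable=true"\n'
    printf '      - "traefik.docker.network=traefik"\n'
    printf '      - "%s.%s.entrypoints=http"\n' "$r" "$name"
    printf '      - "%s.%s.rule=Host(`%s`)"\n' "$r" "$name" "$host"
    printf '      - "traefik.http.middlewares.%s-https-redirect.redirectscheme.scheme=https"\n' "$name"
    printf '      - "%s.%s.middlewares=%s-https-redirect"\n' "$r" "$name" "$name"
    printf '      - "%s.%s-secure.entrypoints=https"\n' "$r" "$name"
    printf '      - "%s.%s-secure.rule=Host(`%s`)"\n' "$r" "$name" "$host"
    printf '      - "%s.%s-secure.tls=true"\n' "$r" "$name"
    printf '      - "%s.%s-secure.tls.certresolver=cloudflare"\n' "$r" "$name"
    printf '      - "%s.%s-secure.service=%s"\n' "$r" "$name" "$name"
    printf '      - "traefik.http.services.%s.loadbalancer.server.port=%s"\n' "$name" "$port"
}

# Usage (service name, host, and port are examples):
#   traefik_labels blog blog.kua.cl 3000
```

Generating the labels from one place keeps the router, middleware, and service names in sync, which is the usual source of 404s after a copy-and-paste.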
Step 3: Restart Service¶
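No command is given for this step. A minimal sketch, assuming the service's compose project lives in `/opt/<name>` on the production host (the layout used in the Troubleshooting examples): recreate the container with `up -d` rather than `restart`, because new labels and networks only take effect when the container is recreated. `restart_service` and `DRY_RUN` are hypothetical names.

```shell
# restart_service NAME: recreate the service so new labels and networks apply.
# Assumes the compose project lives in /opt/NAME on the host "production".
# Set DRY_RUN=1 to print the command instead of running it.
restart_service() {
    cmd="cd /opt/$1 && docker-compose up -d"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh production \"$cmd\""
    else
        ssh production "$cmd"
    fi
}

# Usage:
#   DRY_RUN=1 restart_service myservice   # show what would run
#   restart_service myservice             # run it
```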
Step 4: Verify¶
# Check if Traefik detected the service
ssh production "docker logs traefik 2>&1 | grep myservice"
# Test HTTP redirect
curl -I http://myservice.kua.cl
# Test HTTPS
curl -I https://myservice.kua.cl
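To check the redirect without reading headers by eye, the response can be piped through a small filter. This is a sketch under the assumption that a correct setup answers plain HTTP with any 3xx status and an `https://` Location header; `is_https_redirect` is a hypothetical helper name.

```shell
# is_https_redirect: read HTTP response headers on stdin; succeed only if the
# status is 3xx and the Location header points at an https:// URL.
is_https_redirect() {
    awk '
        { sub(/\r$/, "") }                                   # strip CR from CRLF line endings
        NR == 1 && $2 ~ /^3[0-9][0-9]$/ { redirect = 1 }     # status line, e.g. "HTTP/1.1 302 Found"
        tolower($1) == "location:" && $2 ~ /^https:\/\// { secure = 1 }
        END { exit !(redirect && secure) }
    '
}

# Usage:
#   curl -sI http://myservice.kua.cl | is_https_redirect && echo "redirect OK"
```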
Troubleshooting¶
Problem: Certificate Not Generating¶
Symptoms:
- HTTP works (redirects to HTTPS)
- HTTPS returns "connection refused" or certificate error
Check:
# 1. Check Traefik logs for ACME errors
ssh production "docker logs traefik 2>&1 | grep -i 'acme\|certificate\|error'"
# 2. Verify Cloudflare credentials
ssh production "docker exec traefik env | grep CF_"
# Should show CF_API_EMAIL and CF_API_KEY
# 3. Check acme.json permissions
ssh production "ls -la /opt/traefik/acme.json"
# Should be -rw------- (600)
# 4. Verify DNS points to VPS
dig myservice.kua.cl
# Should return 116.203.109.220
Fix:
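# If credentials missing/wrong (-t allocates a terminal for vim):
ssh -t production "vim /opt/traefik/.env"
# Recreate the container; a plain restart does not pick up .env changes
ssh production "cd /opt/traefik && docker-compose up -d --force-recreate"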
# If permissions wrong:
ssh production "chmod 600 /opt/traefik/acme.json"
ssh production "cd /opt/traefik && docker-compose restart"
# If DNS not configured:
# Add DNS record via Terraform (see terraform/dns.tf)
Problem: Service Returns 404¶
Symptoms:
- HTTPS works (valid certificate)
- Traefik returns "404 page not found"
Check:
# 1. Verify service is running
ssh production "docker ps | grep myservice"
# 2. Check service labels
ssh production "docker inspect myservice | jq '.[0].Config.Labels'"
# 3. Check if service is on traefik network
ssh production "docker inspect myservice | jq '.[0].NetworkSettings.Networks'"
Fix:
# If labels missing or wrong:
# Edit docker-compose.yml, add/fix labels
ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"
# If not on traefik network:
# Add traefik network to docker-compose.yml
ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"
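# Traefik picks up label and network changes from Docker automatically.
# If the route still does not appear, restart Traefik:
ssh production "cd /opt/traefik && docker-compose restart"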
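The network check in step 3 is easy to misread, because Compose prefixes non-external network names with the project name (for example `myservice_default`). A minimal sketch of an exact-match test; `on_network` is a hypothetical helper name, and the usage line assumes `jq` is available on the host, as elsewhere in this runbook.

```shell
# on_network NETWORK: read network names (one per line) on stdin and succeed
# only if NETWORK appears as an exact, whole-line match.
on_network() {
    grep -qxF -- "$1"
}

# Usage:
#   ssh production "docker inspect myservice | jq -r '.[0].NetworkSettings.Networks | keys[]'" \
#       | on_network traefik && echo "attached to traefik"
```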
Problem: HTTP Not Redirecting to HTTPS¶
Symptoms:
- HTTP works (shows service)
- Doesn't redirect to HTTPS
Check:
# Verify redirect middleware is configured
ssh production "docker inspect myservice | jq '.[0].Config.Labels' | grep redirect"
Fix:
Add redirect middleware to service labels:
labels:
  - "traefik.http.middlewares.myservice-https-redirect.redirectscheme.scheme=https"
  - "traefik.http.routers.myservice.middlewares=myservice-https-redirect"
Problem: "Gateway Timeout" (504)¶
Symptoms:
- HTTPS works
- Traefik returns 504 Gateway Timeout
Causes:
1. Backend service not responding
2. Backend port wrong in Traefik labels
3. Backend service crashed
Check:
# 1. Check if backend is running
ssh production "docker ps | grep myservice"
# 2. Check backend logs
ssh production "docker logs myservice"
# 3. Verify backend port
ssh production "docker port myservice"
# 4. Test backend directly
ssh production "curl http://localhost:8080" # Replace 8080 with the host port shown by 'docker port' (works only if the port is published)
Fix:
# If backend crashed:
ssh production "cd /opt/myservice && docker-compose restart"
# If port wrong:
# Fix port in labels
ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"
Problem: Traefik Won't Start¶
Symptoms:
- Container starts then immediately exits
Check:
# View startup logs
ssh production "docker logs traefik"
# Common errors:
# - "acme.json permissions 0644 too open" → chmod 600
# - "cannot get ACME client" → CF credentials missing
# - "address already in use" → port 80/443 conflict
Fix:
# Fix acme.json permissions
ssh production "chmod 600 /opt/traefik/acme.json"
# Add Cloudflare credentials (-t allocates a terminal for vim)
ssh -t production "vim /opt/traefik/.env"
# Check port conflicts (pattern anchored so that :8080 or :4430 do not match)
ssh production "netstat -tulpn | grep -E ':(80|443) '"
# If something else is using port 80/443, stop it first
# Restart Traefik
ssh production "cd /opt/traefik && docker-compose up -d"
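The error-to-fix mapping above can be sketched as a filter over the startup logs. The patterns are the fragments listed in this section; the exact wording of Traefik's messages may differ between versions, so treat them as assumptions. `diagnose_startup` is a hypothetical helper name.

```shell
# diagnose_startup: read Traefik log lines on stdin and print the matching
# fix from this runbook. Returns non-zero if no known error was found.
diagnose_startup() {
    logs=$(cat)
    status=1
    case $logs in
        *"too open"*)
            echo "acme.json permissions: chmod 600 /opt/traefik/acme.json"; status=0 ;;
    esac
    case $logs in
        *"cannot get ACME client"*)
            echo "Cloudflare credentials: check CF_API_EMAIL and CF_API_KEY in /opt/traefik/.env"; status=0 ;;
    esac
    case $logs in
        *"address already in use"*)
            echo "port conflict: find what else is listening on 80/443"; status=0 ;;
    esac
    return $status
}

# Usage:
#   ssh production "docker logs traefik 2>&1" | diagnose_startup
```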
Certificate Management¶
Viewing Certificates¶
# List all certificates
ssh production "cat /opt/traefik/acme.json | jq -r '.cloudflare.Certificates[].domain.main'"
# Check certificate expiration
echo | openssl s_client -connect git.kua.cl:443 2>/dev/null | openssl x509 -noout -dates
# View certificate details
echo | openssl s_client -connect git.kua.cl:443 2>/dev/null | openssl x509 -noout -text
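The expiration check prints raw dates; converting them to "days remaining" makes it easier to spot a stalled renewal. A minimal sketch, assuming GNU `date` (`date -d`) and the `notAfter=...` line printed by `openssl x509 -noout -enddate`; `days_left` is a hypothetical helper name.

```shell
# days_left ENDDATE [NOW_EPOCH]: print whole days until a certificate expires.
# ENDDATE is a line such as "notAfter=Mar 10 12:00:00 2026 GMT".
# NOW_EPOCH defaults to the current time. Assumes GNU date (date -d).
days_left() {
    end=$(date -u -d "${1#notAfter=}" +%s) || return 1
    now=${2:-$(date -u +%s)}
    echo $(( (end - now) / 86400 ))
}

# Usage:
#   days_left "$(echo | openssl s_client -connect git.kua.cl:443 2>/dev/null \
#       | openssl x509 -noout -enddate)"
```

A negative result means the certificate has already expired.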
Force Certificate Regeneration¶
# 1. Stop Traefik
ssh production "cd /opt/traefik && docker-compose down"
# 2. Delete acme.json
ssh production "rm /opt/traefik/acme.json"
# 3. Recreate with correct permissions
ssh production "touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json"
# 4. Start Traefik (will regenerate certificates)
ssh production "cd /opt/traefik && docker-compose up -d"
# 5. Monitor logs for certificate generation
ssh production "docker logs -f traefik"
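Warning: This causes ~30 seconds of downtime while certificates regenerate. Deleting acme.json also discards the stored ACME account and forces every certificate to be reissued, so repeated use can run into Let's Encrypt rate limits.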
Rotating Cloudflare API Key¶
# 1. Generate new API key in Cloudflare dashboard
# 2. Update in Infisical
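# 3. Update .env file (-t allocates a terminal for vim)
ssh -t production "vim /opt/traefik/.env"
# Update CF_API_KEY
# 4. Recreate Traefik so the new value is loaded (a plain restart keeps the old environment)
ssh production "cd /opt/traefik && docker-compose up -d --force-recreate"
# 5. Verify
ssh production "docker logs traefik 2>&1 | grep -i error"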
Performance Monitoring¶
Resource Usage¶
# Check CPU and memory
ssh production "docker stats traefik --no-stream"
# Typical values:
# CPU: <5%
# RAM: 50-100 MB
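The typical CPU value above can serve as an alert threshold. A sketch that reads the percentage printed by `docker stats --format '{{.CPUPerc}}'` and fails when it exceeds a limit; `cpu_within` is a hypothetical helper name, and the 5% limit is only the "typical" figure from this section.

```shell
# cpu_within LIMIT: read a CPU percentage such as "3.21%" on stdin and
# succeed only if it is at or below LIMIT (a plain number, in percent).
# Empty input counts as a failure.
cpu_within() {
    awk -v limit="$1" '
        NR == 1 { sub(/%$/, ""); ok = ($1 + 0 <= limit + 0); seen = 1 }
        END { exit !(seen && ok) }
    '
}

# Usage:
#   ssh production "docker stats traefik --no-stream --format '{{.CPUPerc}}'" \
#       | cpu_within 5 || echo "Traefik CPU above typical range"
```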
Connection Stats¶
# View active connections (requires dashboard)
# Access dashboard via SSH tunnel:
ssh -L 8090:localhost:8090 production
# Then visit http://localhost:8090
Backup & Recovery¶
Backup Configuration¶
# Backup all Traefik config files
# (cd first so that *.yml expands inside /opt/traefik; it also matches docker-compose.yml)
ssh production "cd /opt/traefik && tar -czf /tmp/traefik-backup.tar.gz *.yml .env"
scp production:/tmp/traefik-backup.tar.gz ~/backups/traefik-$(date +%Y%m%d).tar.gz
# Configuration is also backed up in Git
cd ~/kavi-infra
git pull
# Files in traefik/ directory
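Before relying on a backup, it is worth confirming that the archive actually contains the expected files. A minimal sketch; `archive_has` is a hypothetical helper name, and the file names in the usage line are the ones this runbook mentions.

```shell
# archive_has ARCHIVE FILE...: succeed only if every FILE is listed in the
# gzip-compressed tar ARCHIVE (with or without a leading "./").
archive_has() {
    archive=$1; shift
    listing=$(tar -tzf "$archive") || return 2
    for f in "$@"; do
        printf '%s\n' "$listing" | grep -qxF -e "$f" -e "./$f" || {
            echo "missing from $archive: $f" >&2
            return 1
        }
    done
}

# Usage:
#   archive_has ~/backups/traefik-$(date +%Y%m%d).tar.gz \
#       traefik.yml config.yml docker-compose.yml .env
```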
Restore from Backup¶
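# 1. Extract backup (pick one archive; a wildcard would match several files)
BACKUP=~/backups/traefik-YYYYMMDD.tar.gz   # replace YYYYMMDD with the backup date
scp "$BACKUP" production:/tmp/traefik-restore.tar.gz
ssh production "mkdir -p /opt/traefik && tar -xzf /tmp/traefik-restore.tar.gz -C /opt/traefik"
# 2. Fix permissions (acme.json is not in the backup; create it empty if missing)
ssh production "touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json"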
# 3. Create network if needed
ssh production "docker network create traefik 2>/dev/null || true"
# 4. Start Traefik
ssh production "cd /opt/traefik && docker-compose up -d"
Security Operations¶
Checking for Vulnerabilities¶
# Check Traefik version
ssh production "docker exec traefik traefik version"
# Check for security updates
ssh production "docker pull traefik:v2.11"
ssh production "cd /opt/traefik && docker-compose up -d"
Reviewing Access Logs¶
# Traefik doesn't log access by default
# To enable, add to traefik.yml:
# accessLog:
#   filePath: "/var/log/traefik/access.log"
# Then mount log directory in docker-compose.yml
Adding IP Whitelisting¶
To restrict a service to specific IPs:
labels:
  # ... existing labels
  - "traefik.http.middlewares.myservice-whitelist.ipwhitelist.sourcerange=192.168.1.0/24,10.0.0.0/8"
  - "traefik.http.routers.myservice-secure.middlewares=myservice-whitelist"
Emergency Procedures¶
Total Traefik Failure¶
If Traefik is completely down and all HTTPS services are inaccessible:
# 1. Access services via direct ports (emergency)
ssh production
docker ps # Find service ports
# Access via http://116.203.109.220:<port>
# 2. Check if Traefik is running
docker ps | grep traefik
# 3. If not running, check why
docker logs traefik
# 4. Restart Traefik
cd /opt/traefik && docker-compose up -d
# 5. If restart fails, restore from backup
git clone ssh://git@116.203.109.220:2222/kuatecno/kavi-infra.git /tmp/kavi-infra
cp /tmp/kavi-infra/traefik/* /opt/traefik/
vim /opt/traefik/.env # Add CF credentials from Infisical
touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json # Create it if the clone did not provide one
cd /opt/traefik && docker-compose up -d
Certificate Expired¶
If Let's Encrypt certificate expired (shouldn't happen with auto-renewal):
# 1. Check why renewal failed
ssh production "docker logs traefik 2>&1 | grep -i 'renew\|error'"
# 2. Fix issue (usually Cloudflare credentials; -t allocates a terminal for vim)
ssh -t production "vim /opt/traefik/.env"
# 3. Force regeneration
ssh production "cd /opt/traefik && docker-compose down"
ssh production "rm /opt/traefik/acme.json && touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json"
ssh production "cd /opt/traefik && docker-compose up -d"
Related Documentation¶
- Traefik Service Documentation - Complete Traefik documentation
- Services Overview - All production services
- Disaster Recovery - Recovery procedures
- DNS Management - Adding DNS records for new services
Last updated: December 2025 - Traefik v2.11 operations