Traefik Operations Runbook

Quick reference for common Traefik operations and troubleshooting.


Quick Commands

Status & Monitoring

# Check Traefik status
ssh production "docker ps | grep traefik"

# View logs (live)
ssh production "docker logs -f traefik"

# View recent errors
ssh production "docker logs traefik 2>&1 | grep -i error | tail -20"

# Check certificate status
ssh production "cat /opt/traefik/acme.json | jq -r '.cloudflare.Certificates[].domain.main'"

# Test SSL certificate
curl -vI https://git.kua.cl 2>&1 | grep -E 'subject|issuer|SSL'

Restart & Reload

# Restart Traefik (downtime: 2-3 seconds)
ssh production "cd /opt/traefik && docker-compose restart"

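# Configuration reloads need no command: Traefik picks up Docker label changes
# automatically, and file-provider changes when watch is enabled. Changes to
# the static config (traefik.yml) require a restart.
# SIGUSR1 only makes Traefik reopen its log files (useful after log rotation):
ssh production "docker exec traefik kill -USR1 1"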

# Full restart (down + up)
ssh production "cd /opt/traefik && docker-compose down && docker-compose up -d"

Check Configuration

# View static config
ssh production "cat /opt/traefik/traefik.yml"

# View middleware config
ssh production "cat /opt/traefik/config.yml"

# Check environment variables
ssh production "docker exec traefik env | grep CF_"

# Verify acme.json permissions
ssh production "ls -la /opt/traefik/acme.json"  # Should be -rw------- (600)
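
The permission check above can be scripted instead of read by eye. The following is a minimal sketch, not part of the existing tooling: `check_mode` is a hypothetical helper name, and it assumes GNU coreutils `stat` (as found on most Linux servers).

```shell
# Hypothetical helper: succeed only if FILE has exactly the expected octal mode.
# Assumes GNU coreutils `stat`; BSD/macOS `stat` uses different flags.
check_mode() {
  local file="$1" expected="${2:-600}"
  local actual
  # Read the file's permission bits as an octal number, e.g. 600 or 644.
  actual=$(stat -c '%a' "$file") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file is $actual"
  else
    echo "FIX: $file is $actual, expected $expected (run: chmod $expected $file)"
    return 1
  fi
}
```

On the server it could be called as `check_mode /opt/traefik/acme.json`, with the expected mode defaulting to 600.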

Adding a New Service

Step 1: Add Traefik Network to Service

Edit the service's docker-compose.yml:

services:
  myservice:
    image: myimage:latest
    networks:
      - myservice  # Internal network (if any)
      - traefik    # Add this line
    # ... rest of config

networks:
  myservice:
    external: false
  traefik:         # Add this block
    external: true

Step 2: Add Traefik Labels

Add labels to the service definition:

services:
  myservice:
    # ... existing config
    labels:
      # Enable Traefik for this service
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"

      # HTTP router (for redirect)
      - "traefik.http.routers.myservice.entrypoints=http"
      - "traefik.http.routers.myservice.rule=Host(`myservice.kua.cl`)"

      # HTTPS redirect middleware
      - "traefik.http.middlewares.myservice-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.myservice.middlewares=myservice-https-redirect"

      # HTTPS router
      - "traefik.http.routers.myservice-secure.entrypoints=https"
      - "traefik.http.routers.myservice-secure.rule=Host(`myservice.kua.cl`)"
      - "traefik.http.routers.myservice-secure.tls=true"
      - "traefik.http.routers.myservice-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.myservice-secure.service=myservice"

      # Backend service port
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"

Replace:
- myservice → your service name (lowercase, no spaces)
- myservice.kua.cl → your desired subdomain
- 8080 → your service's internal port
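
Since the three substitutions are mechanical, the label block can be generated rather than edited by hand. This is a sketch under assumptions: `traefik_labels` is a hypothetical helper, and it reproduces the labels from Step 2 with the same entrypoint names (`http`, `https`) and certificate resolver (`cloudflare`).

```shell
# Hypothetical helper: print the Step 2 label block for one service.
# Arguments: service name, hostname, internal port.
traefik_labels() {
  local name="$1" host="$2" port="$3"
  # Backticks are escaped so the heredoc emits them literally.
  cat <<EOF
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=traefik"
      - "traefik.http.routers.${name}.entrypoints=http"
      - "traefik.http.routers.${name}.rule=Host(\`${host}\`)"
      - "traefik.http.middlewares.${name}-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.${name}.middlewares=${name}-https-redirect"
      - "traefik.http.routers.${name}-secure.entrypoints=https"
      - "traefik.http.routers.${name}-secure.rule=Host(\`${host}\`)"
      - "traefik.http.routers.${name}-secure.tls=true"
      - "traefik.http.routers.${name}-secure.tls.certresolver=cloudflare"
      - "traefik.http.routers.${name}-secure.service=${name}"
      - "traefik.http.services.${name}.loadbalancer.server.port=${port}"
EOF
}
```

For example, `traefik_labels blog blog.kua.cl 3000` prints a block that can be pasted under the service definition; the service name and port here are illustrative only.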

Step 3: Restart Service

ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"

Step 4: Verify

# Check if Traefik detected the service
ssh production "docker logs traefik 2>&1 | grep myservice"

# Test HTTP redirect
curl -I http://myservice.kua.cl

# Test HTTPS
curl -I https://myservice.kua.cl
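
The HTTP redirect test can be turned into a pass/fail check. The sketch below assumes response headers in the form printed by `curl -sI`; `is_https_redirect` is a hypothetical helper name.

```shell
# Hypothetical helper: read `curl -sI` output on stdin and succeed only if
# the response is a 3xx redirect whose Location header starts with https://.
is_https_redirect() {
  awk '
    { sub(/\r$/, "") }                          # strip the CR that ends HTTP header lines
    NR == 1 { status = $2 }                     # status line, e.g. "HTTP/1.1 302 Found"
    tolower($1) == "location:" { loc = $2 }     # header names are case-insensitive
    END { exit !(status ~ /^3[0-9][0-9]$/ && loc ~ /^https:\/\//) }
  '
}
```

Usage: `curl -sI http://myservice.kua.cl | is_https_redirect && echo "redirect OK"`.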

Troubleshooting

Problem: Certificate Not Generating

Symptoms:
- HTTP works (redirects to HTTPS)
- HTTPS returns "connection refused" or certificate error

Check:

# 1. Check Traefik logs for ACME errors
ssh production "docker logs traefik 2>&1 | grep -i 'acme\|certificate\|error'"

# 2. Verify Cloudflare credentials
ssh production "docker exec traefik env | grep CF_"
# Should show CF_API_EMAIL and CF_API_KEY

# 3. Check acme.json permissions
ssh production "ls -la /opt/traefik/acme.json"
# Should be -rw------- (600)

# 4. Verify DNS points to VPS
dig myservice.kua.cl
# Should return 116.203.109.220

Fix:

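# If credentials missing/wrong (ssh -t allocates the terminal vim needs):
ssh -t production "vim /opt/traefik/.env"
# Recreate the container; a plain restart does not re-read .env
ssh production "cd /opt/traefik && docker-compose up -d --force-recreate"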

# If permissions wrong:
ssh production "chmod 600 /opt/traefik/acme.json"
ssh production "cd /opt/traefik && docker-compose restart"

# If DNS not configured:
# Add DNS record via Terraform (see terraform/dns.tf)

Problem: Service Returns 404

Symptoms:
- HTTPS works (valid certificate)
- Traefik returns "404 page not found"

Check:

# 1. Verify service is running
ssh production "docker ps | grep myservice"

# 2. Check service labels
ssh production "docker inspect myservice | jq '.[0].Config.Labels'"

# 3. Check if service is on traefik network
ssh production "docker inspect myservice | jq '.[0].NetworkSettings.Networks'"

Fix:

# If labels missing or wrong:
# Edit docker-compose.yml, add/fix labels
ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"

# If not on traefik network:
# Add traefik network to docker-compose.yml
ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"

# Traefik picks up the recreated container automatically.
# If the route still doesn't appear, restart Traefik:
ssh production "cd /opt/traefik && docker-compose restart"

Problem: HTTP Not Redirecting to HTTPS

Symptoms:
- HTTP works (shows service)
- Doesn't redirect to HTTPS

Check:

# Verify redirect middleware is configured
ssh production "docker inspect myservice | jq '.[0].Config.Labels' | grep redirect"

Fix:

Add redirect middleware to service labels:

labels:
  - "traefik.http.middlewares.myservice-https-redirect.redirectscheme.scheme=https"
  - "traefik.http.routers.myservice.middlewares=myservice-https-redirect"

Problem: "Gateway Timeout" (504)

Symptoms:
- HTTPS works
- Traefik returns 504 Gateway Timeout

Causes:
1. Backend service not responding
2. Backend port wrong in Traefik labels
3. Backend service crashed

Check:

# 1. Check if backend is running
ssh production "docker ps | grep myservice"

# 2. Check backend logs
ssh production "docker logs myservice"

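# 3. List published ports (empty if the service is only reachable through Traefik)
ssh production "docker port myservice"

# 4. Test backend directly (works only if the port is published on the host)
ssh production "curl http://localhost:8080"  # Replace 8080 with actual port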

Fix:

# If backend crashed:
ssh production "cd /opt/myservice && docker-compose restart"

# If port wrong:
# Fix port in labels
ssh production "cd /opt/myservice && docker-compose down && docker-compose up -d"

Problem: Traefik Won't Start

Symptoms:
- Container starts then immediately exits

Check:

# View startup logs
ssh production "docker logs traefik"

# Common errors:
# - "acme.json permissions 0644 too open" → chmod 600
# - "cannot get ACME client" → CF credentials missing
# - "address already in use" → port 80/443 conflict

Fix:

# Fix acme.json permissions
ssh production "chmod 600 /opt/traefik/acme.json"

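# Add Cloudflare credentials (ssh -t allocates the terminal vim needs)
ssh -t production "vim /opt/traefik/.env"

# Check port conflicts (the trailing space keeps ports like 8080 or 4430 from matching)
ssh production "netstat -tulpn | grep -E ':(80|443) '"
# If something else is using port 80/443, stop it first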

# Restart Traefik
ssh production "cd /opt/traefik && docker-compose up -d"
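
The error-to-fix mapping above can be sketched as a small lookup. `suggest_fix` is a hypothetical helper, and the matched phrases are the ones quoted in this runbook; the exact wording of Traefik's messages may differ between versions, so treat the patterns as assumptions.

```shell
# Hypothetical helper: map a Traefik startup log line to the fix from this runbook.
# The patterns are the phrases quoted above, not an exhaustive list.
suggest_fix() {
  case "$1" in
    *"too open"*)
      echo "chmod 600 /opt/traefik/acme.json" ;;
    *"cannot get ACME client"*)
      echo "add CF_API_EMAIL and CF_API_KEY to /opt/traefik/.env" ;;
    *"address already in use"*)
      echo "stop the process bound to port 80 or 443" ;;
    *)
      echo "no known fix; read the full startup log" ;;
  esac
}
```

Usage: `suggest_fix "$(ssh production 'docker logs traefik 2>&1 | tail -1')"`.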

Certificate Management

Viewing Certificates

# List all certificates
ssh production "cat /opt/traefik/acme.json | jq -r '.cloudflare.Certificates[].domain.main'"

# Check certificate expiration
echo | openssl s_client -connect git.kua.cl:443 2>/dev/null | openssl x509 -noout -dates

# View certificate details
echo | openssl s_client -connect git.kua.cl:443 2>/dev/null | openssl x509 -noout -text
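
To turn the expiration date into a number that can be alerted on, the `notAfter` line can be converted to days remaining. This is a minimal sketch: `days_until_expiry` is a hypothetical helper, it assumes GNU `date -d`, and it expects the line printed by `openssl x509 -noout -enddate`.

```shell
# Hypothetical helper: print whole days between "now" and a certificate's expiry.
# $1: the "notAfter=..." line from `openssl x509 -noout -enddate`
# $2: optional "now" in epoch seconds (defaults to the current time)
# Assumes GNU `date -d` for parsing the OpenSSL date format.
days_until_expiry() {
  local not_after="${1#notAfter=}"      # drop the "notAfter=" prefix if present
  local now="${2:-$(date +%s)}"
  local expiry
  expiry=$(date -u -d "$not_after" +%s) || return 2
  echo $(( (expiry - now) / 86400 ))
}
```

Usage: `days_until_expiry "$(echo | openssl s_client -connect git.kua.cl:443 2>/dev/null | openssl x509 -noout -enddate)"`. A negative result means the certificate has already expired.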

Force Certificate Regeneration

# 1. Stop Traefik
ssh production "cd /opt/traefik && docker-compose down"

# 2. Delete acme.json
ssh production "rm /opt/traefik/acme.json"

# 3. Recreate with correct permissions
ssh production "touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json"

# 4. Start Traefik (will regenerate certificates)
ssh production "cd /opt/traefik && docker-compose up -d"

# 5. Monitor logs for certificate generation
ssh production "docker logs -f traefik"

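Warning: This causes ~30 seconds of downtime while certificates regenerate. Every certificate is requested again, so Let's Encrypt rate limits apply; avoid repeating this procedure in quick succession.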

Rotating Cloudflare API Key

# 1. Generate new API key in Cloudflare dashboard
# 2. Update in Infisical
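# 3. Update .env file (ssh -t allocates the terminal vim needs)
ssh -t production "vim /opt/traefik/.env"
# Update CF_API_KEY

# 4. Recreate Traefik so the new key is loaded (a plain restart does not re-read .env)
ssh production "cd /opt/traefik && docker-compose up -d --force-recreate"

# 5. Verify
ssh production "docker logs traefik 2>&1 | grep -i error"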

Performance Monitoring

Resource Usage

# Check CPU and memory
ssh production "docker stats traefik --no-stream"

# Typical values:
# CPU: <5%
# RAM: 50-100 MB
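
The typical values above can be checked automatically. The sketch below assumes input in the form produced by `docker stats traefik --no-stream --format '{{.CPUPerc}} {{.MemUsage}}'` (for example `0.05% 45.2MiB / 3.8GiB`); `check_usage` is a hypothetical helper and the thresholds are simply the upper bounds quoted in this section.

```shell
# Hypothetical helper: read one "CPU% MEM / LIMIT" line on stdin and flag
# usage above the typical values from this runbook (CPU < 5%, RAM <= 100 MiB).
check_usage() {
  awk '{
    cpu = $1; sub(/%$/, "", cpu)          # "0.05%"   -> "0.05"
    mem = $2; unit = $2
    sub(/[A-Za-z]+$/, "", mem)            # "45.2MiB" -> "45.2"
    sub(/^[0-9.]+/, "", unit)             # "45.2MiB" -> "MiB"
    # Normalise the memory figure to MiB.
    if (unit == "GiB") mem *= 1024
    else if (unit == "KiB") mem /= 1024
    else if (unit == "B") mem /= 1048576
    ok = (cpu + 0 < 5 && mem + 0 <= 100)
    printf("cpu=%s%% mem=%.1fMiB %s\n", cpu, mem, (ok ? "OK" : "HIGH"))
    exit !ok
  }'
}
```

Usage: `ssh production "docker stats traefik --no-stream --format '{{.CPUPerc}} {{.MemUsage}}'" | check_usage`.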

Connection Stats

# View active connections (requires dashboard)
# Access dashboard via SSH tunnel:
ssh -L 8090:localhost:8090 production
# Then visit http://localhost:8090

Backup & Recovery

Backup Configuration

# Backup all Traefik config files
# (cd first: the remote shell expands *.yml in its current directory, not in tar's -C target)
ssh production "cd /opt/traefik && tar -czf /tmp/traefik-backup.tar.gz *.yml .env"
scp production:/tmp/traefik-backup.tar.gz ~/backups/traefik-$(date +%Y%m%d).tar.gz

# Configuration is also backed up in Git
cd ~/kavi-infra
git pull
# Files in traefik/ directory
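
A backup is only useful if it contains the files a restore needs, so it is worth listing the archive after creating it. This is a minimal sketch: `verify_backup` is a hypothetical helper, and the expected file names are the ones referenced elsewhere in this runbook.

```shell
# Hypothetical helper: confirm a backup archive contains the config files
# this runbook refers to. Prints each missing file and returns non-zero.
verify_backup() {
  local archive="$1" missing=0 f
  for f in traefik.yml config.yml docker-compose.yml .env; do
    # Archive members may be stored as "name" or "./name" depending on how tar was run.
    if ! tar -tzf "$archive" | grep -qxF -e "$f" -e "./$f"; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `verify_backup ~/backups/traefik-$(date +%Y%m%d).tar.gz && echo "backup complete"`.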

Restore from Backup

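# 1. Copy one specific backup to the server and extract it
#    (replace YYYYMMDD with the date of the backup to restore)
scp ~/backups/traefik-YYYYMMDD.tar.gz production:/tmp/traefik-restore.tar.gz
ssh production "mkdir -p /opt/traefik && tar -xzf /tmp/traefik-restore.tar.gz -C /opt/traefik"

# 2. Fix permissions (the backup does not include acme.json, so create it if missing)
ssh production "touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json"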

# 3. Create network if needed
ssh production "docker network create traefik 2>/dev/null || true"

# 4. Start Traefik
ssh production "cd /opt/traefik && docker-compose up -d"

Security Operations

Checking for Vulnerabilities

# Check Traefik version
ssh production "docker exec traefik traefik version"

# Check for security updates
ssh production "docker pull traefik:v2.11"
ssh production "cd /opt/traefik && docker-compose up -d"

Reviewing Access Logs

# Traefik doesn't log access by default
# To enable, add to traefik.yml:
# accessLog:
#   filePath: "/var/log/traefik/access.log"

# Then mount log directory in docker-compose.yml
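
Once access logging is enabled, a quick per-status summary helps when reviewing the log. The sketch below assumes Traefik's default common log format, in which the HTTP status code is the ninth whitespace-separated field; `status_counts` is a hypothetical helper, and the field position should be verified against a real log line before relying on it.

```shell
# Hypothetical helper: read access-log lines on stdin and print
# "<status> <count>" per HTTP status code, sorted by status.
# Assumes the default common log format (status is field 9).
status_counts() {
  awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Usage: `ssh production "cat /var/log/traefik/access.log" | status_counts`, assuming the log path from the configuration above is readable on the host through the mounted directory.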

Adding IP Whitelisting

To restrict a service to specific IPs:

labels:
  # ... existing labels
  - "traefik.http.middlewares.myservice-whitelist.ipwhitelist.sourcerange=192.168.1.0/24,10.0.0.0/8"
  - "traefik.http.routers.myservice-secure.middlewares=myservice-whitelist"

Emergency Procedures

Total Traefik Failure

If Traefik is completely down and all HTTPS services are inaccessible:

# 1. Access services via direct ports (emergency)
ssh production
docker ps  # Find service ports
# Access via http://116.203.109.220:<port>

# 2. Check if Traefik is running
docker ps | grep traefik

# 3. If not running, check why
docker logs traefik

# 4. Restart Traefik
cd /opt/traefik && docker-compose up -d

# 5. If restart fails, restore from backup
git clone ssh://git@116.203.109.220:2222/kuatecno/kavi-infra.git /tmp/kavi-infra
cp /tmp/kavi-infra/traefik/* /opt/traefik/
vim /opt/traefik/.env  # Add CF credentials from Infisical
touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json  # Creates the file if missing
cd /opt/traefik && docker-compose up -d

Certificate Expired

If a Let's Encrypt certificate has expired (shouldn't happen with auto-renewal):

# 1. Check why renewal failed
ssh production "docker logs traefik 2>&1 | grep -i 'renew\|error'"

# 2. Fix issue (usually Cloudflare credentials; ssh -t allocates the terminal vim needs)
ssh -t production "vim /opt/traefik/.env"

# 3. Force regeneration
ssh production "cd /opt/traefik && docker-compose down"
ssh production "rm /opt/traefik/acme.json && touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json"
ssh production "cd /opt/traefik && docker-compose up -d"


Last updated: December 2025 - Traefik v2.11 operations