Troubleshooting Guide

Common issues and their solutions for the Kavi infrastructure.


Traefik Issues

"client version 1.24 is too old"

Symptoms:

Error response from daemon: client version 1.24 is too old.
Minimum supported API version is 1.44

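Cause: A Traefik version older than v3.6 running against Docker 29.x (seen on Ubuntu 24.04). The Docker client inside Traefik requests API version 1.24, which is below the daemon's minimum of 1.44.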

Solution: Update to Traefik v3.6 or later:

# In docker-compose.yml
traefik:
  image: traefik:v3.6 # NOT v3.3 or earlier

Then redeploy:

ssh root@SERVERIP "cd /root/coder-core/services/production && docker compose pull traefik && docker compose up -d traefik"
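To confirm whether a deployed tag already meets the v3.6 minimum, a small version comparison can help. This is a minimal sketch: the function name `version_at_least` is hypothetical, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Succeed when version $1 is at least version $2 (e.g. "v3.6.1" >= "v3.6").
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  have="${1#v}"   # strip a leading "v" if present
  want="${2#v}"
  # The lower of the two versions sorts first; if that is $want, $have is new enough.
  lowest="$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}
```

For example, `version_at_least v3.3 v3.6 || echo "upgrade needed"` prints the message, while `v3.6` or `v3.10` pass. The running version can be read with `docker exec traefik traefik version`.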

ACME Certificate Errors

Symptoms:

Unable to obtain ACME certificate for domains
error: 403 :: urn:ietf:params:acme:error:unauthorized

Cause: DNS not pointing to the server, or Cloudflare proxy blocking TLS challenge

Solution:

  1. Check DNS:
dig +short secrets.kua.cl

Should return the server IP, or Cloudflare IPs if the record is proxied.

  2. For Cloudflare-proxied domains:

     - Temporarily set to "DNS Only" (gray cloud) during certificate issuance
     - Or use Cloudflare origin certificates

  3. Rate limited? Wait 1 hour and retry (see retry time in error message)
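The DNS check above can be turned into a pass/fail test before retrying certificate issuance. This is a sketch only: `dns_points_to` is a hypothetical helper name, and the IP in the usage example is a placeholder for the real server address.

```shell
# Succeed if any A record returned for domain $1 equals the expected IP $2.
# While a Cloudflare record is proxied (orange cloud), dig returns Cloudflare
# IPs instead, so a mismatch is expected in that case.
dns_points_to() {
  # -F: fixed string, -x: match the whole line, -q: no output
  dig +short "$1" | grep -Fxq "$2"
}
```

Usage: `dns_points_to secrets.kua.cl 203.0.113.10 && echo "DNS OK"`, with `203.0.113.10` replaced by the server IP.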


Service Returns 404 (Not Found)

Symptoms: Domain resolves and connects, but Traefik answers with 404

Cause: Missing or incorrect Traefik labels

Solution:

  1. Check container has correct labels:
ssh root@SERVERIP "docker inspect <container> | jq '.[0].Config.Labels'"
  2. Verify required labels:

     - traefik.enable=true
     - traefik.http.routers.<name>.rule=Host(`domain.kua.cl`)
     - traefik.http.services.<name>.loadbalancer.server.port=XXXX

  3. Check container is on proxy network:

    ssh root@SERVERIP "docker inspect <container> | jq '.[0].NetworkSettings.Networks'"
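When only one label is in question, `docker inspect --format` can read it directly without jq. The helper names below (`container_label`, `traefik_enabled`) are hypothetical, and the sketch assumes it runs on the server itself or is wrapped in the same `ssh root@SERVERIP "..."` pattern used above.

```shell
# Print the value of label $2 on container $1 (empty output if the label is unset).
container_label() {
  docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Succeed only if the container carries traefik.enable=true.
traefik_enabled() {
  [ "$(container_label "$1" traefik.enable)" = "true" ]
}
```

Usage: `traefik_enabled <container> || echo "traefik.enable is missing or not true"`.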
    


Docker Issues

Container Keeps Restarting

Symptoms: docker ps shows container restarting

Solution:

  1. Check logs:
ssh root@SERVERIP "docker logs --tail 100 <container>"
  2. Common causes:
     - Missing environment variables (check .env)
     - Database not ready (check healthcheck)
     - Missing volume mounts
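For the missing-variable case, a quick scan of the .env file can list which required names are absent. This is a sketch under assumptions: `missing_env_vars` is a hypothetical helper, it only recognizes plain `KEY=value` lines (not `export KEY=value`), and the list of required names has to come from the service's own documentation.

```shell
# Print each required variable name ($2, $3, ...) that has no KEY= line in env file $1.
missing_env_vars() {
  env_file="$1"
  shift
  for name in "$@"; do
    # A variable counts as present if some line starts with NAME=
    grep -q "^${name}=" "$env_file" || printf '%s\n' "$name"
  done
}
```

Usage on the server: `missing_env_vars /root/coder-core/services/production/.env ENCRYPTION_KEY`. No output means every listed name is present.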

Docker Compose Fails to Start

Symptoms: docker compose up fails with YAML errors

Solution:

  1. Validate YAML syntax:
cd ~/coder-core/services/production
docker compose config
  2. Common issues:
     - Duplicate keys (e.g., two environment: sections)
     - Incorrect indentation
     - Missing quotes around special characters
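If only a pass/fail answer is needed, `docker compose config` accepts `--quiet`, which validates the file without printing the rendered configuration. The wrapper name `compose_valid` below is hypothetical; the sketch assumes the compose file lives in the directory passed as the first argument.

```shell
# Validate the compose file in directory $1 without printing the merged config.
# Errors still go to stderr; the exit status is non-zero when validation fails.
compose_valid() {
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$1" && docker compose config --quiet )
}
```

Usage: `compose_valid ~/coder-core/services/production && echo "compose file OK"`.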

Infisical Issues

Cannot Login to Infisical

Symptoms: Login page loads but authentication fails

Cause: Database not migrated or ENCRYPTION_KEY mismatch

Solution:

  1. Check Infisical logs:
ssh root@SERVERIP "docker logs --tail 50 infisical"
  2. Verify database has data:
ssh root@SERVERIP "docker exec postgres psql -U kavi main -c 'SELECT count(*) FROM users;'"
  3. If database is empty, migrate from source server:
# On source server
ssh root@SOURCE_IP "docker exec postgres pg_dump -U USER DB > /root/dump.sql"

# Transfer and restore
scp root@SOURCE_IP:/root/dump.sql /tmp/
scp /tmp/dump.sql root@TARGET_IP:/root/
ssh root@TARGET_IP "cat /root/dump.sql | docker exec -i postgres psql -U kavi main"

Infisical CLI Authentication Fails

Symptoms: infisical export returns authentication error

Solution:

  1. Check Machine Identity credentials:
infisical login --method=universal-auth \
  --client-id=YOUR_CLIENT_ID \
  --client-secret=YOUR_CLIENT_SECRET \
  --domain=https://secrets.kua.cl
  2. Verify Machine Identity has access to Production environment in Infisical UI

Storage Issues

Storage Box Not Mounted

Symptoms: /mnt/storagebox is empty or mount fails

Solution:

  1. Check mount status:
ssh root@SERVERIP "systemctl status mount-storagebox"
ssh root@SERVERIP "journalctl -u mount-storagebox"
  2. Check SSH key exists:
ssh root@SERVERIP "ls -la /root/.ssh/id_storagebox"
  3. Test manual mount:
    ssh root@SERVERIP "rclone mount storagebox: /mnt/storagebox --daemon"
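The two symptoms (empty directory versus failed mount) can be told apart with `mountpoint` from util-linux. This is a minimal sketch meant to run on the server; `storagebox_status` is a hypothetical helper name and the default path is the one used in this guide.

```shell
# Report whether directory $1 (default /mnt/storagebox) is a mount point
# and whether it contains any entries.
storagebox_status() {
  dir="${1:-/mnt/storagebox}"
  if ! mountpoint -q "$dir"; then
    echo "not mounted"
  elif [ -z "$(ls -A "$dir")" ]; then
    echo "mounted but empty"
  else
    echo "mounted"
  fi
}
```

"not mounted" points at the mount unit or SSH key, while "mounted but empty" suggests the remote itself has no visible content.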
    

S3 Access Denied

Symptoms: Services fail to access S3 storage

Solution:

  1. Test rclone:
ssh root@SERVERIP "rclone ls hetzner-s3: --max-depth 1"
  2. Check credentials in .env:
ssh root@SERVERIP "grep S3_ /root/coder-core/services/production/.env"
  3. Verify endpoint URL is correct for your region

SSH Issues

Cannot SSH to Server

Symptoms: SSH connection refused or times out

Solution:

  1. Check server is running (Hetzner Console)

  2. Verify firewall allows SSH: in the Hetzner Console, check that the firewall rules include port 22
  3. Check SSH key is correct:
ssh -v root@SERVERIP
  4. If using Tailscale, try direct IP instead:
    ssh root@188.34.198.57  # Direct IP, not Tailscale
    

SSH Key Sync Not Working

Symptoms: New device cannot SSH after adding key to Infisical

Solution:

  1. Check sync script ran:
ssh root@SERVERIP "cat /var/log/ssh-key-sync.log"
  2. Manually trigger sync:
ssh root@SERVERIP "/usr/local/bin/sync-ssh-keys.sh"
  3. Verify device is in active list:
     - In Infisical, check SSH_KEYS_ACTIVE_DEVICES includes the device name
     - Check SSH_KEY_DEVICENAME_STATUS is set to active

Database Issues

PostgreSQL Container Unhealthy

Symptoms: Services depending on postgres fail to start

Solution:

  1. Check postgres logs:
ssh root@SERVERIP "docker logs --tail 50 postgres"
  2. Check disk space:
ssh root@SERVERIP "df -h /"
  3. Restart postgres:
    ssh root@SERVERIP "cd /root/coder-core/services/production && docker compose restart postgres"
    

Database Connection Refused

Symptoms: Services log "connection refused" to database

Solution:

  1. Verify postgres is running:
ssh root@SERVERIP "docker ps | grep postgres"
  2. Check services are on same network:
ssh root@SERVERIP "docker network inspect internal"
  3. Verify connection string in .env:
    ssh root@SERVERIP "grep DB_CONNECTION /root/coder-core/services/production/.env"
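PostgreSQL ships `pg_isready`, whose exit code says whether the server accepts connections. The sketch below maps those documented exit codes to messages; `describe_pg_isready` is a hypothetical helper name. Running the check inside the container tests postgres itself, not the network path from other services.

```shell
# Translate a pg_isready exit code ($1) into a human-readable status.
describe_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (for example, still starting up)" ;;
    2) echo "no response" ;;
    3) echo "no attempt made (for example, invalid parameters)" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}
```

Usage: run `ssh root@SERVERIP "docker exec postgres pg_isready -U kavi"` and then `describe_pg_isready $?`. Both ssh and `docker exec` pass the command's exit code through.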
    

DNS Issues

Domain Not Resolving

Symptoms: curl to domain times out or returns wrong IP

Solution:

  1. Check DNS:
dig +short domain.kua.cl
  2. If using Cloudflare, verify:

     - A record exists
     - Points to correct IP
     - Proxy status is as expected (orange/gray cloud)

  3. Clear local DNS cache:

    # macOS
    sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
    


SSL Certificate Invalid

Symptoms: Browser shows certificate warning

Solution:

  1. Check Traefik logs for ACME errors:
ssh root@SERVERIP "docker logs traefik 2>&1 | grep -i certificate"
  2. Delete old certificates and restart (this forces Traefik to re-request every certificate, which counts against Let's Encrypt rate limits):
ssh root@SERVERIP "rm /root/coder-core/services/production/letsencrypt/acme.json"
ssh root@SERVERIP "cd /root/coder-core/services/production && docker compose restart traefik"
  3. Wait for new certificates (check logs)
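Because removing acme.json discards every stored certificate, it may be worth keeping a copy first. This is a minimal sketch meant to run on the server; `backup_file` is a hypothetical helper and the timestamp suffix format is an arbitrary choice.

```shell
# Copy file $1 to a timestamped sibling (name ends in .bak-YYYYmmdd-HHMMSS)
# and print the backup path. cp -p preserves the file mode, which matters
# because Traefik expects acme.json to be mode 600.
backup_file() {
  backup="$1.bak-$(date +%Y%m%d-%H%M%S)"
  cp -p "$1" "$backup" && printf '%s\n' "$backup"
}
```

Usage: `backup_file /root/coder-core/services/production/letsencrypt/acme.json` before the `rm` step. If re-issuance fails, the backup can be copied back and Traefik restarted.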

Quick Diagnostic Commands

# Check all containers
ssh root@SERVERIP "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Check Traefik logs
ssh root@SERVERIP "docker logs --tail 30 traefik"

# Check storage mounts
ssh root@SERVERIP "df -h | grep mnt"

# Test DNS
dig +short domain.kua.cl

# Test HTTPS
curl -sI https://domain.kua.cl | head -5

# Check .env file exists
ssh root@SERVERIP "ls -la /root/coder-core/services/production/.env"

# View .env contents (sensitive!)
ssh root@SERVERIP "cat /root/coder-core/services/production/.env"
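When the goal is only to see which variables are set, the values can be masked before they reach the terminal or a shared log. This is a sketch: `mask_env` is a hypothetical helper, and it only rewrites plain `KEY=value` lines, leaving comments and blank lines untouched.

```shell
# Replace the value of every KEY=value line with ***.
# Reads the files given as arguments, or stdin when none are given.
mask_env() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=***/' "$@"
}
```

Usage: `ssh root@SERVERIP "cat /root/coder-core/services/production/.env" | mask_env`. The masking happens locally, so the unmasked content still travels over the SSH session.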

Getting Help

If issues persist:

  1. Check container logs for specific error messages
  2. Review recent changes to configuration
  3. Verify all secrets are correctly set in Infisical
  4. Check Hetzner status page for infrastructure issues

Last updated: January 2026