Troubleshooting Guide
Common issues and their solutions for the Kavi infrastructure.
Traefik Issues
"client version 1.24 is too old"
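Symptoms: Traefik logs report that client version 1.24 is too old.
Cause: Traefik older than v3.6 with Docker 29.x on Ubuntu 24.04. Docker 29 raised the minimum API version it accepts, and older Traefik releases still request API 1.24.
Solution: Update the Traefik image to v3.6 or later, then redeploy: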
ssh root@SERVERIP "cd /root/coder-core/services/production && docker compose pull traefik && docker compose up -d traefik"
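To confirm the mismatch before and after the upgrade, the daemon's API range and the running Traefik version can be read directly. The following is a minimal sketch, assuming the container is named traefik as in the command above; the helper name traefik_docker_versions is illustrative and not part of the Kavi tooling.

```shell
# Hypothetical helper (not part of the Kavi repo): print the Docker daemon's
# API version range and the version of the running Traefik container.
# Assumes the Traefik container is named "traefik".
traefik_docker_versions() {
  server="$1"
  ssh "root@${server}" \
    "docker version --format 'api={{.Server.APIVersion}} min={{.Server.MinAPIVersion}}'; docker exec traefik traefik version"
}
```

Call it as traefik_docker_versions SERVERIP. If the reported minimum API version is above 1.24, a Traefik release that still requests 1.24 will be rejected by the daemon.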
ACME Certificate Errors
Symptoms: Traefik logs show ACME errors and no valid certificate is issued.
Cause: DNS is not pointing to the server, or the Cloudflare proxy is blocking the TLS challenge.
Solution:
- Check DNS with dig +short domain.kua.cl. It should return the server IP, or Cloudflare IPs if the record is proxied.
- For Cloudflare-proxied domains:
  - Temporarily set the record to "DNS Only" (gray cloud) during certificate issuance
  - Or use Cloudflare origin certificates
- Rate limited? Wait 1 hour and retry (the error message states the retry time)
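The DNS check can be scripted so the answer is compared against the address you expect. This is a sketch only: check_dns is a hypothetical helper name, and the expected IP is whatever your server's public address is.

```shell
# Hypothetical helper: compare what a domain resolves to with an expected IP.
# Prints "match" when one of the returned records equals the expected IP,
# otherwise "mismatch: <records>" (a proxied Cloudflare record will not match
# the server IP, which is expected in that setup).
check_dns() {
  domain="$1"
  expected_ip="$2"
  records="$(dig +short "$domain")"
  if printf '%s\n' "$records" | grep -qxF "$expected_ip"; then
    echo "match"
  else
    echo "mismatch: $(printf '%s' "$records" | tr '\n' ' ')"
  fi
}
```

For example, check_dns domain.kua.cl SERVERIP. A mismatch on a gray-cloud (DNS Only) record means the A record still points elsewhere.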
Service Returns 404 (Not Found)
Symptoms: Domain resolves but Traefik returns 404
Cause: Missing or incorrect Traefik labels
Solution:
- Check that the container has the correct labels
- Verify the required labels:
  - traefik.enable=true
  - traefik.http.routers.<name>.rule=Host(`domain.kua.cl`)
  - traefik.http.services.<name>.loadbalancer.server.port=XXXX
- Check that the container is on the proxy network
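Both checks can be done with docker inspect. The sketch below assumes the shared network is literally named proxy, as the text suggests; the helper names and the container argument are placeholders to adapt per service.

```shell
# Hypothetical helpers; the container name is passed in because it differs
# per service. "proxy" is assumed to be the name of the Traefik network.

# Print the container's labels as JSON.
container_labels() {
  server="$1"; container="$2"
  ssh "root@${server}" "docker inspect --format '{{json .Config.Labels}}' ${container}"
}

# Print the names of all containers attached to the network (default: proxy).
network_members() {
  server="$1"; network="${2:-proxy}"
  ssh "root@${server}" "docker network inspect --format '{{range .Containers}}{{.Name}} {{end}}' ${network}"
}
```

Run container_labels SERVERIP <container> and look for the three labels listed above, then run network_members SERVERIP and confirm the container appears in the output.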
Docker Issues
Container Keeps Restarting
Symptoms: docker ps shows container restarting
Solution:
- Check logs (docker logs --tail 30 <container>)
- Common causes:
  - Missing environment variables (check .env)
  - Database not ready (check healthcheck)
  - Missing volume mounts
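Alongside the logs, the restart count and last exit code often narrow down which of the causes above applies. A minimal sketch, with container_restart_info as a hypothetical helper name:

```shell
# Hypothetical helper: show how often a container restarted, its last exit
# code and error string, followed by the most recent log lines.
container_restart_info() {
  server="$1"; container="$2"
  ssh "root@${server}" \
    "docker inspect --format 'restarts={{.RestartCount}} exit={{.State.ExitCode}} error={{.State.Error}}' ${container}; docker logs --tail 30 ${container}"
}
```

Usage: container_restart_info SERVERIP <container>.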
Docker Compose Fails to Start
Symptoms: docker compose up fails with YAML errors
Solution:
- Validate YAML syntax
- Common issues:
  - Duplicate keys (e.g., two environment: sections)
  - Incorrect indentation
  - Missing quotes around special characters
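One way to validate the file is to let Compose parse it without starting anything. This sketch assumes the compose file lives in the production directory used elsewhere in this guide; validate_compose is an illustrative name.

```shell
# Hypothetical helper: parse and validate the compose file without starting
# containers. "docker compose config --quiet" prints nothing on success and
# reports the parse error otherwise.
validate_compose() {
  server="$1"
  ssh "root@${server}" \
    "cd /root/coder-core/services/production && docker compose config --quiet && echo 'compose file OK'"
}
```

Usage: validate_compose SERVERIP. Dropping --quiet prints the fully resolved configuration, which helps spot indentation mistakes that are valid YAML but not what was intended.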
Infisical Issues
Cannot Login to Infisical
Symptoms: Login page loads but authentication fails
Cause: Database not migrated or ENCRYPTION_KEY mismatch
Solution:
- Check Infisical logs
- Verify database has data
- If database is empty, migrate from source server:
# Dump the database on the source server
ssh root@SOURCE_IP "docker exec postgres pg_dump -U USER DB > /root/dump.sql"
# Copy the dump to the target server via the local machine
scp root@SOURCE_IP:/root/dump.sql /tmp/
scp /tmp/dump.sql root@TARGET_IP:/root/
# Restore the dump into the target database
ssh root@TARGET_IP "docker exec -i postgres psql -U kavi main < /root/dump.sql"
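A quick way to tell whether the database has data is to count its tables. The sketch below assumes the same container, user, and database as the restore command above (postgres, kavi, main) and that Infisical's tables live in the public schema; count_public_tables is a hypothetical helper name.

```shell
# Hypothetical helper: count the tables in the "public" schema of the
# database used by the restore command above. A result of 0 suggests the
# database was never migrated.
count_public_tables() {
  server="$1"
  ssh "root@${server}" \
    "docker exec postgres psql -U kavi -d main -tAc \"SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'\""
}
```

Usage: count_public_tables TARGET_IP, run before and after the restore to confirm the dump was applied.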
Infisical CLI Authentication Fails
Symptoms: infisical export returns authentication error
Solution:
- Check Machine Identity credentials:
infisical login --method=universal-auth \
--client-id=YOUR_CLIENT_ID \
--client-secret=YOUR_CLIENT_SECRET \
--domain=https://secrets.kua.cl
- Verify Machine Identity has access to Production environment in Infisical UI
Storage Issues
Storage Box Not Mounted
Symptoms: /mnt/storagebox is empty or mount fails
Solution:
- Check mount status:
ssh root@SERVERIP "systemctl status mount-storagebox"
ssh root@SERVERIP "journalctl -u mount-storagebox"
- Check SSH key exists
- Test manual mount
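Whether the path is an active mount can be checked independently of the service status. A minimal sketch, assuming the mount point /mnt/storagebox and the mount-storagebox unit named above; the key path and the manual mount command depend on how the unit is defined, so they are left out here.

```shell
# Hypothetical helper: report whether /mnt/storagebox is a mount point and
# whether the mount-storagebox unit is active.
storagebox_mount_state() {
  server="$1"
  ssh "root@${server}" \
    "mountpoint -q /mnt/storagebox && echo mounted || echo 'not mounted'; systemctl is-active mount-storagebox"
}
```

Usage: storagebox_mount_state SERVERIP. An empty directory that reports "not mounted" means services are writing to the local disk instead of the Storage Box.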
S3 Access Denied
Symptoms: Services fail to access S3 storage
Solution:
- Test rclone
- Check credentials in .env
- Verify endpoint URL is correct for your region
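A basic rclone test lists the configured remotes and then the buckets on one of them. This is a sketch under two assumptions that the guide does not confirm: rclone is installed on the server, and the remote name is passed in by you (it is not defined anywhere in this document).

```shell
# Hypothetical helper: list configured rclone remotes, then list the
# top-level buckets/directories of the given remote. The remote name is an
# assumption supplied by the caller.
test_s3_remote() {
  server="$1"; remote="$2"
  ssh "root@${server}" "rclone listremotes && rclone lsd ${remote}:"
}
```

Usage: test_s3_remote SERVERIP <remote>. An access-denied error at this step points at the credentials or endpoint rather than at the service using them.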
SSH Issues
Cannot SSH to Server
Symptoms: SSH connection refused or times out
Solution:
- Check server is running (Hetzner Console)
- Verify firewall allows SSH
- Check SSH key is correct
- If using Tailscale, try direct IP instead
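Before debugging keys, it helps to know whether port 22 is reachable at all. A minimal sketch, assuming nc (netcat) is available on the local machine; ssh_port_open is an illustrative name.

```shell
# Hypothetical helper: test whether TCP port 22 on the host accepts
# connections within 5 seconds.
ssh_port_open() {
  host="$1"
  if nc -z -w 5 "$host" 22; then
    echo "port 22 reachable"
  else
    echo "port 22 unreachable"
  fi
}
```

Usage: ssh_port_open SERVERIP. If the port is reachable but login still fails, running ssh -v root@SERVERIP shows which keys are offered and why they are rejected.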
SSH Key Sync Not Working
Symptoms: New device cannot SSH after adding key to Infisical
Solution:
- Check sync script ran
- Manually trigger sync
- Verify device is in active list:
  - In Infisical, check SSH_KEYS_ACTIVE_DEVICES includes the device name
  - Check SSH_KEY_DEVICENAME_STATUS is set to active
Database Issues
PostgreSQL Container Unhealthy
Symptoms: Services depending on postgres fail to start
Solution:
- Check postgres logs (docker logs --tail 30 postgres)
- Check disk space (df -h)
- Restart postgres
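The three steps above can be sketched as two small helpers. They assume the container and the compose service are both named postgres and that the container defines a healthcheck (the health field is absent otherwise); the helper names are hypothetical.

```shell
# Hypothetical helper: print the health status, recent logs, and disk usage.
# Assumes a container named "postgres" with a healthcheck configured.
postgres_health() {
  server="$1"
  ssh "root@${server}" \
    "docker inspect --format '{{.State.Health.Status}}' postgres; docker logs --tail 30 postgres; df -h"
}

# Hypothetical helper: restart the compose service, assumed to be "postgres".
restart_postgres() {
  server="$1"
  ssh "root@${server}" \
    "cd /root/coder-core/services/production && docker compose restart postgres"
}
```

Usage: postgres_health SERVERIP first, and restart_postgres SERVERIP only once the logs and disk usage have been reviewed, since a full disk will make the restart fail again.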
Database Connection Refused
Symptoms: Services log "connection refused" to database
Solution:
- Verify postgres is running
- Check services are on same network
- Verify connection string in .env
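To compare networks, print the networks of the postgres container and of the failing service and look for a shared name. A minimal sketch; container_networks is a hypothetical helper and the container names are supplied by you.

```shell
# Hypothetical helper: print the networks a container is attached to, as JSON
# keyed by network name.
container_networks() {
  server="$1"; container="$2"
  ssh "root@${server}" "docker inspect --format '{{json .NetworkSettings.Networks}}' ${container}"
}
```

Usage: run container_networks SERVERIP postgres and container_networks SERVERIP <service>. If no network name appears in both outputs, the service cannot reach postgres by container name.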
DNS Issues
Domain Not Resolving
Symptoms: curl to domain times out or returns wrong IP
Solution:
- Check DNS
- If using Cloudflare, verify:
  - A record exists
  - Points to correct IP
  - Proxy status is as expected (orange/gray cloud)
- Clear local DNS cache
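Comparing the local resolver with a public one shows whether a stale local cache is the problem. The sketch below uses Cloudflare's public resolver 1.1.1.1 as the reference, which is a choice made here and not something the guide prescribes; compare_resolvers is an illustrative name.

```shell
# Hypothetical helper: resolve a domain with the default resolver and with
# the public resolver 1.1.1.1, printing both answers for comparison.
compare_resolvers() {
  domain="$1"
  echo "local:  $(dig +short "$domain" | tr '\n' ' ')"
  echo "public: $(dig +short "$domain" @1.1.1.1 | tr '\n' ' ')"
}
```

Usage: compare_resolvers domain.kua.cl. If the two answers differ, flush the local cache: resolvectl flush-caches on Linux with systemd-resolved, or sudo dscacheutil -flushcache followed by sudo killall -HUP mDNSResponder on macOS.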
SSL Certificate Invalid
Symptoms: Browser shows certificate warning
Solution:
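- Check Traefik logs for ACME errors (docker logs --tail 30 traefik)
- Delete old certificates and restart. This forces re-issuance for every domain, so keep the ACME rate limits in mind: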
ssh root@SERVERIP "rm /root/coder-core/services/production/letsencrypt/acme.json"
ssh root@SERVERIP "cd /root/coder-core/services/production && docker compose restart traefik"
- Wait for new certificates (check logs)
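To see which certificate is actually being served, before and after the restart, the issuer and validity dates can be read with openssl from any machine. A minimal sketch; show_cert is a hypothetical helper name.

```shell
# Hypothetical helper: fetch the certificate served for a domain on port 443
# and print its issuer, subject, and validity dates.
show_cert() {
  domain="$1"
  openssl s_client -connect "${domain}:443" -servername "$domain" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject -dates
}
```

Usage: show_cert domain.kua.cl. An issuer of TRAEFIK DEFAULT CERT generally means Traefik is serving its self-signed fallback because no ACME certificate is available for that host yet.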
Quick Diagnostic Commands
# Check all containers
ssh root@SERVERIP "docker ps --format 'table {{.Names}}\t{{.Status}}'"
# Check Traefik logs
ssh root@SERVERIP "docker logs --tail 30 traefik"
# Check storage mounts
ssh root@SERVERIP "df -h | grep mnt"
# Test DNS
dig +short domain.kua.cl
# Test HTTPS
curl -sI https://domain.kua.cl | head -5
# Check .env file exists
ssh root@SERVERIP "ls -la /root/coder-core/services/production/.env"
# View .env contents (sensitive!)
ssh root@SERVERIP "cat /root/coder-core/services/production/.env"
Getting Help
If issues persist:
- Check container logs for specific error messages
- Review recent changes to configuration
- Verify all secrets are correctly set in Infisical
- Check Hetzner status page for infrastructure issues
Last updated: January 2026