
Terraform Migration Guide

Safe migration from manual infrastructure to Terraform - Critical procedures for preserving production data.


CRITICAL: You Have Production Infrastructure

This guide is for migrating existing production infrastructure to Terraform, NOT setting up from scratch. You have:

  • 20+ containers running (Immich, Coolify, Infisical, n8n, Portainer)
  • 5GB+ photos in Immich on Storage Box mount
  • Critical secrets in Infisical (ENCRYPTION_KEY must be backed up)
  • Live databases (PostgreSQL, MongoDB, Redis)

ONE MISTAKE = DATA LOSS. Follow this guide exactly.


Overview

The Challenge

You're in a unique situation:

What You Have:

  • ✅ Production Hetzner VPS (100.80.53.55) with 20+ containers
  • ✅ Immich: 5GB+ photos, using 5.033GB RAM
  • ✅ Coolify: Self-hosting platform (311MB RAM)
  • ✅ Infisical: Secrets manager (743.6MB RAM, contains ENCRYPTION_KEY)
  • ✅ Storage Box: Mounted at /mnt/storagebox via rclone
  • ✅ n8n, Portainer, PostgreSQL, Redis, Traefik all running

What You Want:

  • 🎯 Infrastructure as Code (Terraform)
  • 🎯 Multi-device management (MacBook, iPad, PC)
  • 🎯 Version control for infrastructure
  • 🎯 Disaster recovery via terraform apply

The Problem:

  • ❌ Can't just terraform import the existing VPS (too risky)
  • ❌ Can't destroy and recreate (would lose data)
  • ❌ Need a zero-downtime migration

The Solution: Parallel Infrastructure

Strategy:

  1. Keep existing VPS running (zero risk to production)
  2. Terraform creates NEW VPS alongside it
  3. Test restore procedures on new VPS
  4. Switch DNS when verified
  5. Destroy old VPS after 1 week verification period

Timeline:

  • Week 1: Backups and planning (CRITICAL - do not skip)
  • Week 2: Terraform for DNS and SSH keys (low risk)
  • Week 3: Create new VPS, test restore
  • Week 4: Cutover and verification


Why NOT Just terraform import

The Import Trap

What Seems Easy:

# This SEEMS like it would work:
terraform import hcloud_server.hetzner_vps 12345678
terraform plan  # "No changes needed"
terraform apply # Should be safe, right?

What Actually Happens:

Terraform will perform the following actions:

  # hcloud_server.hetzner_vps must be replaced
  -/+ resource "hcloud_server" "hetzner_vps" {
      ~ id = "12345678" -> (known after apply)
      ~ ... 50 other attributes ...

      # Reason: user_data changed (didn't exist before)
      # Reason: ssh_keys changed (managed differently)
      # Reason: firewall_ids changed (implicit vs explicit)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

⚠️ THIS WILL DESTROY YOUR SERVER AND ALL LOCAL DATA

Why Import Fails

1. Attribute Mismatches

Your running VPS has attributes Terraform doesn't know about:

  • cloud-init scripts that ran once (not in Terraform)
  • manual firewall rules added over time
  • SSH keys added via Hetzner console vs Terraform resources
  • labels/tags that might differ

Even one mismatch = Terraform wants to replace (destroy + recreate)

2. Storage Mount Complexity

Your /mnt/storagebox mount:

  • Setup via /etc/fstab or systemd mount unit
  • Configured with rclone parameters
  • Has specific permissions

Terraform doesn't track this → can't reproduce → forces replacement

3. Docker State

Your 20+ containers:

  • Have local volumes with database data
  • Have network configurations not tracked in Terraform
  • Have environment variables from various sources
  • Have restart policies set manually

Importing VPS ≠ importing Docker state

4. The "Known After Apply" Problem

Many attributes show as "(known after apply)" after import:

~ ipv4_address = "46.224.146.107" -> (known after apply)

This means: "I'll find out the IP AFTER I destroy and recreate"

Result: DNS breaks, Tailscale breaks, everything breaks.

Real Example from Community

From Terraform community (similar case):

User: "I imported my production server. terraform plan shows
      'must be replaced'. Is this normal?"

Expert: "Yes. Importing rarely achieves 'no changes'. Even small
        differences force replacement. For production, rebuild
        from scratch via Terraform is safer than importing."

User: "But I have data on the server..."

Expert: "Exactly why you shouldn't import. Back up, create new
        server with Terraform, restore data, destroy old one."

When Import IS Safe

Import works well for:

  • ✅ DNS records (no data, easy to verify)
  • ✅ SSH keys (additive, can't destroy data)
  • ✅ Firewall rules (can test without applying)
  • ✅ S3 buckets (metadata import, data stays intact)

Import is RISKY for:

  • ❌ VPS instances (complex state, data on disk)
  • ❌ Databases (data loss risk too high)
  • ❌ Any resource with local storage


The Hybrid Storage Reality

Performance Discovery (From Gemini Analysis)

Critical Insight: Not all storage is equal.

| Storage Type | Read/Write Speed | Latency | Use Case |
| --- | --- | --- | --- |
| Local NVMe (VPS disk) | ~3000 MB/s | <0.1ms | Databases (Postgres, Redis, MongoDB) |
| Storage Box (network mount) | ~50-100 MB/s | Network latency | Static files (photos, backups) |

Performance Ratio: NVMe is 30-60x faster than Storage Box.

Your Current Architecture (CORRECT)

From SSH inspection:

docker inspect immich-server

# Mounts:
Mounts: [
  {
    "Type": "volume",
    "Source": "immich_pgdata",           # ✅ LOCAL NVMe (database)
    "Destination": "/var/lib/postgresql/data"
  },
  {
    "Type": "bind",
    "Source": "/mnt/storagebox/immich/upload",  # ✅ Storage Box (photos)
    "Destination": "/usr/src/app/upload"
  }
]

This is PERFECT:

  • ✅ Database on local NVMe (fast, low latency)
  • ✅ Photos on Storage Box (slow is OK for large files)

Why This Matters for Terraform:

  • VPS can be destroyed → recreated via Terraform
  • Database data backs up to Storage Box nightly
  • Photos already on Storage Box (persist through VPS rebuild)
  • Restore: Mount Storage Box → restore DB from backups → photos already there

The "Disposable Server" Architecture

Concept: Server is ephemeral, data is persistent.

┌─────────────────────────────────────────────────────┐
│              HETZNER VPS (Ephemeral)                │
│  ┌────────────────────────────────────────────────┐ │
│  │  LOCAL NVMe (Databases - Fast, Temporary)      │ │
│  │  - immich-postgres (local volume)              │ │
│  │  - coolify-db (local volume)                   │ │
│  │  - redis (local volume)                        │ │
│  │  - infisical mongodb (local volume)            │ │
│  │                                                 │ │
│  │  Nightly Backup Script (via cron):             │ │
│  │  3 AM: pg_dump → /mnt/storagebox/backups/     │ │
│  └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
                     │ rclone mount (CIFS/WebDAV)
┌─────────────────────────────────────────────────────┐
│          STORAGE BOX (Persistent)                   │
│  ┌────────────────────────────────────────────────┐ │
│  │  /immich/upload/                               │ │
│  │    └─ Photos, videos (already here)            │ │
│  │                                                 │ │
│  │  /backups/                                     │ │
│  │    ├─ immich_db_20250115.sql                   │ │
│  │    ├─ coolify_db_20250115.sql                  │ │
│  │    ├─ infisical_env_20250115.gpg (CRITICAL!)   │ │
│  │    └─ ... (7 days retention)                   │ │
│  │                                                 │ │
│  │  /terraform-state/                             │ │
│  │    └─ prod/terraform.tfstate                   │ │
│  └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘

Disaster Recovery Flow:

  1. VPS destroyed (hardware failure, accidental deletion, etc.)
  2. terraform apply → Creates new VPS (10 min)
  3. cloud-init script runs on boot:
       • Installs Docker, rclone
       • Mounts Storage Box at /mnt/storagebox
       • Restores databases from latest backups
       • Pulls docker-compose.yml from Git
       • Starts all containers
  4. Services online (30 min total)

Data Preserved:

  • ✅ Photos (already on Storage Box)
  • ✅ Databases (restored from nightly backup)
  • ✅ Infisical ENCRYPTION_KEY (backed up to Storage Box)
  • ✅ Configuration (in Git)

Data Loss Window: Maximum 24 hours (since last backup)
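The 24-hour window only holds if the nightly job also prunes old dumps without touching fresh ones (the 7-day retention shown in the Storage Box layout above). A minimal sketch of that retention step; the directory is a temp-dir stand-in for /mnt/storagebox/backups/ and file ages are simulated with GNU `touch -d`:

```shell
#!/bin/sh
# Simulated backup directory (in production: /mnt/storagebox/backups/)
backup_dir=$(mktemp -d)

# One fresh dump and one stale dump (age simulated with GNU touch -d)
touch "$backup_dir/immich_db_new.sql.gz"
touch -d "9 days ago" "$backup_dir/immich_db_old.sql.gz"

# Nightly retention step: delete dumps older than 7 days
find "$backup_dir" -name '*.sql.gz' -mtime +7 -delete

ls "$backup_dir"
```

Run from cron after the pg_dump step; `-mtime +7` matches files whose age exceeds 7 full days, so the newest week of dumps always survives.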

Why Gemini Was Right

Gemini's Warning:

"Don't run the DB live on Storage Box. It will kill performance."

Explanation:

  • PostgreSQL does thousands of small random reads/writes per second
  • Network storage latency: ~1-5ms per operation
  • Result: Queries that take 10ms on NVMe take 1000ms on Storage Box
  • Immich would be unusable (slow photo loading, timeouts)
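The 10ms-vs-1000ms claim is easy to sanity-check: a heavy query doing on the order of 1000 random page reads (an illustrative figure, not a measurement) pays the per-read latency 1000 times over. A back-of-envelope sketch:

```shell
# Back-of-envelope: per-read latency multiplied by ~1000 random page reads
reads=1000
nvme_ms=$(awk -v n="$reads" 'BEGIN { printf "%.0f", n * 0.01 }')  # ~0.01 ms/read on NVMe
netfs_ms=$(awk -v n="$reads" 'BEGIN { printf "%.0f", n * 2 }')    # ~2 ms/read over the network
echo "NVMe: ${nvme_ms} ms vs Storage Box: ${netfs_ms} ms"
```

With a ~2ms network round trip the same query lands in the 1000-2000ms range, which matches the two orders of magnitude in the table above.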

You Already Avoided This Mistake:

  • Your current setup has DBs on local volumes ✅
  • Only static files (photos) on Storage Box ✅
  • This is the CORRECT architecture

For Terraform Migration:

  • New VPS must replicate this pattern
  • cloud-init script must mount Storage Box first
  • Docker volumes must be local, not on /mnt/storagebox/


Migration Strategy

Decision Matrix: Import vs Rebuild

| Resource | Import? | Rebuild? | Why |
| --- | --- | --- | --- |
| Existing VPS | ❌ NO | ✅ YES | Too risky, data preservation complex, attribute mismatches |
| DNS records | ✅ YES | - | Safe, no data loss risk, easy to verify |
| SSH keys | ✅ YES | - | Additive operation, can't break existing access |
| Firewall rules | ✅ YES | - | Can test with terraform plan before applying |
| Storage Box | ❌ NO | Manual | Keep mounted, outside Terraform scope, managed by Hetzner |
| Docker containers | ❌ NO | docker-compose | NOT Terraform's job, too dynamic |
| Databases | ❌ NO | Restore from backup | Data managed separately |

Three-Phase Approach

Phase 0: Pre-Migration Backup (Week 1)

  • Backup EVERYTHING (databases, ENCRYPTION_KEY, configs)
  • Test restore procedures
  • Document current state
  • Verify backups are complete and restorable

Phase 1: Low-Risk Resources (Week 2)

  • Import DNS records to Terraform
  • Add SSH public keys via Terraform
  • Test: Can still access servers? Is DNS working?

Phase 2: Parallel Infrastructure (Week 3)

  • Terraform creates NEW VPS (hetzner-vps-v2)
  • Install Docker, mount Storage Box
  • Restore databases from backups
  • Test: Do services work on new VPS?

Phase 3: Cutover (Week 4)

  • Update DNS to point to new VPS (via Terraform)
  • Monitor for 1 week
  • If successful, destroy old VPS manually

Risk Assessment

| Phase | Risk Level | Impact if Failed | Rollback Time |
| --- | --- | --- | --- |
| Phase 0 | CRITICAL | Data loss if skipped | N/A (prevention) |
| Phase 1 | Low | DNS temporarily broken | 5 minutes (git revert) |
| Phase 2 | Medium | Wasted effort | 0 (old VPS still running) |
| Phase 3 | Medium-High | Service downtime | 15 minutes (DNS rollback) |

Mitigation:

  • Phase 0: Multiple backup locations, test restores
  • Phase 1: Terraform plan verification, DNS TTL reduction
  • Phase 2: Parallel running (old VPS unaffected)
  • Phase 3: 1 week verification before destroying old VPS


Phase 0: Pre-Migration Backup

DO NOT SKIP THIS PHASE

A large share of infrastructure migrations fail because of inadequate backups. Spend the time to do this right.

Backup Checklist

1. Database Dumps

# SSH to existing VPS
ssh kavi@100.80.53.55

# Create backup directory
mkdir -p /mnt/storagebox/backups/pre-terraform-migration
cd /mnt/storagebox/backups/pre-terraform-migration

# Immich database
docker exec immich-postgres pg_dump -U postgres immich > immich_db_$(date +%Y%m%d).sql
gzip immich_db_$(date +%Y%m%d).sql

# Coolify database
docker exec coolify-db pg_dump -U postgres coolify > coolify_db_$(date +%Y%m%d).sql
gzip coolify_db_$(date +%Y%m%d).sql

# Infisical database (MongoDB)
# Write the archive to stdout so it lands on the host, not inside the container
docker exec infisical-mongo mongodump --archive --gzip > infisical_db_$(date +%Y%m%d).archive

Verify:

ls -lh /mnt/storagebox/backups/pre-terraform-migration/

# Expected output:
# -rw-r--r-- 1 kavi kavi  245M Jan 15 10:30 immich_db_20250115.sql.gz
# -rw-r--r-- 1 kavi kavi   12M Jan 15 10:31 coolify_db_20250115.sql.gz
# -rw-r--r-- 1 kavi kavi    8M Jan 15 10:32 infisical_db_20250115.archive

Test Restore (critical):

# On local machine or test container:
gunzip -c immich_db_20250115.sql.gz | docker exec -i test-postgres psql -U postgres

# Should complete without errors
# Verify row counts match production

2. Infisical ENCRYPTION_KEY (MOST CRITICAL)

ENCRYPTION_KEY = ALL SECRETS

If you lose this key, ALL secrets in Infisical are permanently lost. No recovery possible.

Location: ~/infisical/.env on Hetzner VPS

# Backup 1: Storage Box (GPG encrypted)
ssh kavi@100.80.53.55
gpg --symmetric --cipher-algo AES256 ~/infisical/.env
# Enter strong passphrase (store in password manager)
mv ~/infisical/.env.gpg /mnt/storagebox/backups/pre-terraform-migration/infisical_env_$(date +%Y%m%d).gpg

# Backup 2: Local encrypted copy
scp kavi@100.80.53.55:/mnt/storagebox/backups/pre-terraform-migration/infisical_env_$(date +%Y%m%d).gpg ~/Backups/

# Backup 3: Kimsufi server (secondary location)
scp /mnt/storagebox/backups/pre-terraform-migration/infisical_env_$(date +%Y%m%d).gpg ubuntu@100.81.231.36:~/backups/

# Backup 4: Print to paper (optional but paranoid-safe)
cat ~/infisical/.env
# Write down ENCRYPTION_KEY value on paper, store in safe

Test Decryption:

# Verify you can decrypt
gpg --decrypt infisical_env_20250115.gpg > test_decrypt.txt
cat test_decrypt.txt
# Should see ENCRYPTION_KEY=... and other vars
rm test_decrypt.txt
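For unattended use (cron, scripted restores) the same round trip can run without a prompt. A sketch with a throwaway file and demo passphrase; the loopback-pinentry flags are needed on GnuPG >= 2.1 for `--passphrase` to work in batch mode, and would be omitted in interactive use:

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'ENCRYPTION_KEY=demo-value\n' > "$workdir/env.demo"

# Symmetric encryption without prompts
gpg --batch --yes --pinentry-mode loopback --passphrase "demo-passphrase" \
    --symmetric --cipher-algo AES256 "$workdir/env.demo"

# Round trip: decrypt and compare byte-for-byte against the original
gpg --batch --yes --pinentry-mode loopback --passphrase "demo-passphrase" \
    --decrypt "$workdir/env.demo.gpg" > "$workdir/env.decrypted" 2>/dev/null

cmp "$workdir/env.demo" "$workdir/env.decrypted" && echo "round trip OK"
```

Never put the real passphrase on a command line on a shared host; read it from a root-only file with `--passphrase-file` instead.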

3. Docker Volumes

# List all volumes
docker volume ls

# Backup Immich data volume (if exists)
docker run --rm \
  -v immich_data:/data \
  -v /mnt/storagebox/backups/pre-terraform-migration:/backup \
  ubuntu tar czf /backup/immich_data_$(date +%Y%m%d).tar.gz /data

# Backup Coolify volumes
docker run --rm \
  -v coolify_data:/data \
  -v /mnt/storagebox/backups/pre-terraform-migration:/backup \
  ubuntu tar czf /backup/coolify_data_$(date +%Y%m%d).tar.gz /data

Size Check:

du -sh /mnt/storagebox/backups/pre-terraform-migration/*.tar.gz

# If any volume is >10GB, consider:
# - Selective backup (exclude cache directories)
# - Split into chunks
# - Verify Storage Box has space
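Size checks catch empty archives but not bytes silently flipped while copying to the secondary locations. A checksum manifest travels with the backups and lets you verify any copy on any host; a sketch with demo files in a temp directory:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for the real volume archives
echo "demo volume archive A" > immich_data_demo.tar.gz
echo "demo volume archive B" > coolify_data_demo.tar.gz

# Write the manifest next to the backups...
sha256sum *.tar.gz > SHA256SUMS

# ...then re-run this on each destination after every scp/rclone copy
sha256sum -c SHA256SUMS
```

A non-zero exit from `sha256sum -c` on the Kimsufi or local copy means re-transfer before trusting that location.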

4. Configuration Files

# Docker Compose files
mkdir -p /mnt/storagebox/backups/pre-terraform-migration/configs
cp ~/docker-compose.yml /mnt/storagebox/backups/pre-terraform-migration/configs/
cp ~/*/docker-compose.yml /mnt/storagebox/backups/pre-terraform-migration/configs/

# Systemd units (if any)
cp /etc/systemd/system/immich* /mnt/storagebox/backups/pre-terraform-migration/configs/ 2>/dev/null || true

# Nginx/Traefik configs (if any)
cp -r ~/traefik/ /mnt/storagebox/backups/pre-terraform-migration/configs/ 2>/dev/null || true

# Storage Box mount config
cp /etc/fstab /mnt/storagebox/backups/pre-terraform-migration/configs/fstab.backup
cp /etc/rclone.conf /mnt/storagebox/backups/pre-terraform-migration/configs/rclone.conf.backup 2>/dev/null || true

5. Current State Documentation

# Container list with details
docker ps -a --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" > /mnt/storagebox/backups/pre-terraform-migration/containers_list.txt

# Full container inspection
for container in $(docker ps -q); do
  # Strip the leading "/" from container names so the output filename is valid
  name=$(docker inspect --format='{{.Name}}' $container | tr -d '/')
  docker inspect $container > /mnt/storagebox/backups/pre-terraform-migration/inspect_${name}.json
done

# Network configuration
docker network ls > /mnt/storagebox/backups/pre-terraform-migration/networks.txt
ip addr > /mnt/storagebox/backups/pre-terraform-migration/ip_addr.txt

# Disk usage
df -h > /mnt/storagebox/backups/pre-terraform-migration/disk_usage.txt
du -sh /var/lib/docker/volumes/* > /mnt/storagebox/backups/pre-terraform-migration/volume_sizes.txt

# Installed packages (for cloud-init script reference)
dpkg -l > /mnt/storagebox/backups/pre-terraform-migration/installed_packages.txt

# Systemd services
systemctl list-units --type=service --state=running > /mnt/storagebox/backups/pre-terraform-migration/running_services.txt

Backup Verification

Create verification script:

cat > /mnt/storagebox/backups/pre-terraform-migration/VERIFY.sh << 'EOF'
#!/bin/bash
set -e

echo "=== Backup Verification ==="

# Check database dumps exist and are not empty
for db in immich_db coolify_db; do
  if ! ls ${db}_*.sql.gz >/dev/null 2>&1; then
    echo "❌ FAIL: ${db} backup missing"
    exit 1
  fi

  # Size of the newest dump (wc -c avoids GNU/BSD stat flag differences)
  size=$(wc -c < "$(ls -t ${db}_*.sql.gz | head -1)")
  if [ "$size" -lt 1000000 ]; then  # Less than 1MB
    echo "❌ FAIL: ${db} backup suspiciously small ($size bytes)"
    exit 1
  fi

  echo "✅ PASS: ${db} backup exists ($size bytes)"
done

# Check ENCRYPTION_KEY backup
if ! ls infisical_env_*.gpg >/dev/null 2>&1; then
  echo "❌ FAIL: Infisical ENCRYPTION_KEY backup missing"
  exit 1
fi
echo "✅ PASS: Infisical ENCRYPTION_KEY backed up"

# Check configs
if [ ! -f configs/docker-compose.yml ]; then
  echo "⚠️  WARN: docker-compose.yml not backed up"
fi

echo ""
echo "=== Verification Complete ==="
echo "Backup date: $(ls immich_db_*.sql.gz | grep -o '[0-9]\{8\}')"
echo "Total backup size: $(du -sh . | cut -f1)"
echo ""
echo "Next steps:"
echo "1. Test database restore on local machine"
echo "2. Verify ENCRYPTION_KEY can be decrypted"
echo "3. Proceed to Phase 1 only after successful restore test"
EOF

chmod +x /mnt/storagebox/backups/pre-terraform-migration/VERIFY.sh
# Run from the backup directory: the script uses relative filenames
cd /mnt/storagebox/backups/pre-terraform-migration
./VERIFY.sh
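A shell pitfall worth knowing for verification scripts like this one: `[ -f pattern_* ]` misbehaves as soon as the glob matches more than one file (test receives extra arguments and errors) or none (it tests the literal pattern string). In bash, `compgen -G` answers "does anything match?" cleanly; a sketch:

```shell
#!/bin/bash
workdir=$(mktemp -d)
touch "$workdir/immich_db_20250114.sql.gz" "$workdir/immich_db_20250115.sql.gz"

# [ -f "$workdir"/immich_db_*.sql.gz ] would error here: the glob expands to
# two filenames and test receives too many arguments
if compgen -G "$workdir/immich_db_*.sql.gz" > /dev/null; then
  echo "backup present"
else
  echo "backup missing"
fi
```

In plain POSIX sh, `ls pattern >/dev/null 2>&1` serves the same purpose.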

Test Restore Procedure

Critical: Actually test restoring before migration.

On local machine or test VPS:

# 1. Start test PostgreSQL
docker run -d --name test-postgres -e POSTGRES_PASSWORD=test postgres:16

# 2. Download backup
scp kavi@100.80.53.55:/mnt/storagebox/backups/pre-terraform-migration/immich_db_*.sql.gz ./

# 3. Restore (create the database first; psql -c ignores piped input, so use two steps)
docker exec test-postgres psql -U postgres -c "CREATE DATABASE immich"
gunzip -c immich_db_*.sql.gz | docker exec -i test-postgres psql -U postgres immich

# 4. Verify
docker exec test-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM users;"
docker exec test-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"

# Expected: Row counts match production

# 5. Cleanup
docker stop test-postgres && docker rm test-postgres

If restore fails:

  • ❌ DO NOT PROCEED to Phase 1
  • Fix backup process
  • Retry until restore succeeds

Backup Completion Checklist

Before proceeding to Phase 1:

  • [ ] Database dumps created and verified (immich, coolify, infisical)
  • [ ] Infisical ENCRYPTION_KEY backed up to 4 locations
  • [ ] Docker volumes backed up
  • [ ] Configuration files backed up
  • [ ] Current state documented (containers, networks, disk usage)
  • [ ] Backup verification script passes
  • [ ] Test restore completed successfully (CRITICAL)
  • [ ] Backups stored in at least 2 physical locations
  • [ ] Decryption passphrase stored in password manager
  • [ ] Total backup size fits on Storage Box with room to spare

Only proceed to Phase 1 after ALL items checked.


Phase 1: Low-Risk Resources

Goal

Migrate resources to Terraform that have:

  • ✅ No data loss risk
  • ✅ Easy rollback
  • ✅ Can verify with terraform plan before applying

Resources: DNS records, SSH public keys, firewall rules

Step 1: Setup Terraform

On MacBook/iPad/PC:

# Install Terraform (macOS)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform

# Verify
terraform --version
# Expected: Terraform v1.6.0 or higher

Create project directory:

mkdir -p ~/Coding/terraform-infra
cd ~/Coding/terraform-infra

Initialize Git:

git init
cat > .gitignore << 'EOF'
# Terraform
.terraform/
*.tfstate
*.tfstate.*
terraform.tfvars
*.tfvars

# Secrets
*.env
.env.*

# macOS
.DS_Store
EOF

git add .gitignore
git commit -m "Initial commit: .gitignore"

Step 2: Configure Providers

Create terraform.tf:

terraform {
  required_version = ">= 1.6.0"

  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.45"
    }

    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4.20"
    }
  }

  # Remote state (we'll configure this in Phase 1, Step 5)
  # backend "s3" {
  #   # Storage Box S3 backend
  # }
}

# Providers
provider "hcloud" {
  token = var.hcloud_token
}

provider "cloudflare" {
  api_token = var.cloudflare_api_token
}

Create variables.tf:

variable "hcloud_token" {
  description = "Hetzner Cloud API token"
  type        = string
  sensitive   = true
}

variable "cloudflare_api_token" {
  description = "Cloudflare API token"
  type        = string
  sensitive   = true
}

variable "cloudflare_zone_id" {
  description = "Cloudflare zone ID for kua.cl"
  type        = string
}

Get API tokens from Infisical:

# Fetch tokens
ssh kavi@100.80.53.55
infisical export --env=prod --projectId=personal-vault --format=dotenv | grep -E '(HCLOUD_TOKEN|CLOUDFLARE_API_TOKEN|CLOUDFLARE_ZONE_ID)'

# Copy values, then create terraform.tfvars on local machine
cat > ~/Coding/terraform-infra/terraform.tfvars << 'EOF'
hcloud_token          = "YOUR_HCLOUD_TOKEN_HERE"
cloudflare_api_token  = "YOUR_CLOUDFLARE_API_TOKEN_HERE"
cloudflare_zone_id    = "YOUR_CLOUDFLARE_ZONE_ID_HERE"
EOF

# Verify .gitignore prevents committing secrets
git status
# Should NOT show terraform.tfvars
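`git status` only proves the file is not currently listed; `git check-ignore` asserts directly that a path is covered by `.gitignore` (exit status 0 means ignored). A sketch in a throwaway repo whose ignore rules mirror the ones created above:

```shell
#!/bin/sh
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf '*.tfvars\n*.tfstate\n' > .gitignore
touch terraform.tfvars

# Exit status 0: the file is ignored and cannot be committed by accident
git check-ignore -q terraform.tfvars && echo "terraform.tfvars is ignored"
```

Running `git check-ignore -q terraform.tfvars` in the real repo makes a good pre-commit hook or CI step.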

Initialize Terraform:

cd ~/Coding/terraform-infra
terraform init

# Expected output:
# Terraform has been successfully initialized!

Step 3: Import DNS Records

List current DNS records (via Cloudflare API or dashboard):

# Via curl (if you have CLOUDFLARE_API_TOKEN):
curl -X GET "https://api.cloudflare.com/client/v4/zones/${CLOUDFLARE_ZONE_ID}/dns_records" \
  -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  -H "Content-Type: application/json" | jq '.result[] | {name, type, content, id}'

Create dns.tf:

# Root A record (kua.cl → Hetzner VPS)
resource "cloudflare_record" "root" {
  zone_id = var.cloudflare_zone_id
  name    = "@"
  value   = "46.224.146.107"  # Current IP (will be dynamic later)
  type    = "A"
  proxied = false
  ttl     = 3600
}

# Service CNAMEs
resource "cloudflare_record" "secrets" {
  zone_id = var.cloudflare_zone_id
  name    = "secrets"
  value   = "kua.cl"
  type    = "CNAME"
  proxied = false
  ttl     = 3600
}

resource "cloudflare_record" "plex" {
  zone_id = var.cloudflare_zone_id
  name    = "plex"
  value   = "kua.cl"
  type    = "CNAME"
  proxied = false
  ttl     = 3600
}

resource "cloudflare_record" "overseerr" {
  zone_id = var.cloudflare_zone_id
  name    = "overseerr"
  value   = "kua.cl"
  type    = "CNAME"
  proxied = false
  ttl     = 3600
}

resource "cloudflare_record" "media" {
  zone_id = var.cloudflare_zone_id
  name    = "media"
  value   = "kua.cl"
  type    = "CNAME"
  proxied = false
  ttl     = 3600
}

Import existing records:

# Get record IDs from Cloudflare API output above
terraform import cloudflare_record.root <RECORD_ID_FOR_ROOT>
terraform import cloudflare_record.secrets <RECORD_ID_FOR_SECRETS>
terraform import cloudflare_record.plex <RECORD_ID_FOR_PLEX>
terraform import cloudflare_record.overseerr <RECORD_ID_FOR_OVERSEERR>
terraform import cloudflare_record.media <RECORD_ID_FOR_MEDIA>

Verify import:

terraform plan

# Expected output:
# No changes. Your infrastructure matches the configuration.

If plan shows changes:

  • Check: Does Terraform want to modify attributes (ttl, proxied, etc.)?
  • Adjust dns.tf to match actual values
  • Re-run terraform plan until it shows "No changes"

Commit:

git add dns.tf terraform.tf variables.tf
git commit -m "feat(dns): import existing DNS records to Terraform"

Step 4: Add SSH Keys

Create ssh-keys.tf:

resource "hcloud_ssh_key" "macbook" {
  name       = "kavi-macbook"
  public_key = file("${path.module}/keys/id_ed25519_macbook.pub")
}

resource "hcloud_ssh_key" "ipad" {
  name       = "kavi-ipad"
  public_key = file("${path.module}/keys/id_ed25519_ipad.pub")
}

resource "hcloud_ssh_key" "pc" {
  name       = "kavi-pc"
  public_key = file("${path.module}/keys/id_ed25519_pc.pub")
}

# Output SSH key IDs for use in VPS resources
output "ssh_key_ids" {
  description = "SSH key IDs for all devices"
  value = [
    hcloud_ssh_key.macbook.id,
    hcloud_ssh_key.ipad.id,
    hcloud_ssh_key.pc.id
  ]
}

Add public keys:

mkdir -p ~/Coding/terraform-infra/keys

# Copy public keys from each device
cat ~/.ssh/id_ed25519_macbook.pub > ~/Coding/terraform-infra/keys/id_ed25519_macbook.pub
# (Repeat for ipad, pc)

# Verify format
cat ~/Coding/terraform-infra/keys/id_ed25519_macbook.pub
# Should start with: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5...
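Hetzner rejects malformed public keys at apply time, so a quick local check before `terraform plan` saves a round trip. A sketch that validates every file in the keys directory (the temp dir and demo key stand in for ~/Coding/terraform-infra/keys; extend the regex if you use other key types):

```shell
#!/bin/sh
keydir=$(mktemp -d)  # stand-in for ~/Coding/terraform-infra/keys
echo "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA demo@macbook" > "$keydir/id_ed25519_macbook.pub"
echo "not a public key" > "$keydir/broken.pub"

for key in "$keydir"/*.pub; do
  if grep -Eq '^(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp[0-9]+) ' "$key"; then
    echo "OK: $(basename "$key")"
  else
    echo "BAD FORMAT: $(basename "$key")"
  fi
done
```

`ssh-keygen -l -f FILE` is a stricter alternative: it parses the key material itself, not just the prefix.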

Apply (creates SSH keys in Hetzner):

terraform plan
# Expected:
# + hcloud_ssh_key.macbook
# + hcloud_ssh_key.ipad
# + hcloud_ssh_key.pc
# Plan: 3 to add, 0 to change, 0 to destroy

terraform apply
# Type: yes

Verify:

# Hetzner Cloud Console → SSH Keys
# Should see: kavi-macbook, kavi-ipad, kavi-pc

# Existing VPS still accessible (keys are additive)
ssh kavi@100.80.53.55
# Should work

Commit:

git add ssh-keys.tf keys/
git commit -m "feat(ssh): add device SSH keys to Terraform"

Step 5: Remote State Backend

Why: Enable multi-device access to Terraform state.

Create S3 bucket on Storage Box:

# Using s3cmd or AWS CLI configured for Storage Box
s3cmd mb s3://terraform-state

# Or via Hetzner console (Storage Box → Create bucket)

Update terraform.tf (uncomment backend):

terraform {
  # ... (providers stay same)

  backend "s3" {
    bucket                      = "terraform-state"
    key                         = "prod/terraform.tfstate"
    region                      = "us-east-1"  # Dummy value, ignored by non-AWS endpoints
    endpoint                    = "https://u522581.your-storagebox.de"
    use_path_style              = true  # Path-style addressing for non-AWS S3 endpoints
    skip_credentials_validation = true
    skip_region_validation      = true
    skip_metadata_api_check     = true
    # Credentials: export AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY before `terraform init`
  }
}

Migrate state:

terraform init -migrate-state

# Terraform will ask: Do you want to copy state to new backend?
# Answer: yes

# Verify
ls -la  # local terraform.tfstate should now be gone or empty (state moved to remote)

Test multi-device access:

# On iPad or PC:
git clone <repo-url>
cd terraform-infra

# Add terraform.tfvars (with API tokens)

terraform init
# Should download state from Storage Box

terraform plan
# Should show: No changes (state synchronized)

Phase 1 Completion Checklist

  • [ ] Terraform installed on all devices
  • [ ] DNS records imported and plan shows "No changes"
  • [ ] SSH keys added to Hetzner (verified via console)
  • [ ] Existing SSH access still works
  • [ ] Remote state backend configured
  • [ ] Multi-device access tested (MacBook + iPad can both run terraform plan)
  • [ ] Git repository created and committed
  • [ ] Secrets NOT committed (verify git log -- terraform.tfvars is empty)

Proceed to Phase 2 only after all items checked.


Phase 2: Parallel Infrastructure

Goal

Create a NEW Hetzner VPS via Terraform while keeping the existing one running. Test restore procedures on the new VPS before cutover.

Step 1: Define New VPS Resource

Create vps.tf:

# New VPS (will be created alongside existing one)
resource "hcloud_server" "hetzner_vps_v2" {
  name        = "hetzner-vps-v2"
  server_type = "cpx42"
  image       = "ubuntu-22.04"
  location    = "fsn1"

  ssh_keys = [
    hcloud_ssh_key.macbook.id,
    hcloud_ssh_key.ipad.id,
    hcloud_ssh_key.pc.id
  ]

  # cloud-init script (we'll define this next)
  user_data = file("${path.module}/cloud-init.yml")

  labels = {
    managed_by = "terraform"
    environment = "production"
  }

  lifecycle {
    # Prevent accidental deletion
    prevent_destroy = true
  }
}

# Output IP for DNS update
output "new_vps_ip" {
  description = "IPv4 address of new VPS"
  value       = hcloud_server.hetzner_vps_v2.ipv4_address
}

Step 2: Create cloud-init Script

Purpose: Automate VPS setup on first boot.

Create cloud-init.yml:

#cloud-config

# Install packages
packages:
  - docker.io
  - docker-compose
  - rclone
  - curl
  - git
  - gpg

# Setup Docker
runcmd:
  # Enable Docker
  - systemctl enable docker
  - systemctl start docker
  - usermod -aG docker root

  # Create user (matching existing VPS) and grant SSH access
  # (Hetzner installs the Terraform-managed keys for root only, so copy them)
  - useradd -m -s /bin/bash kavi
  - usermod -aG docker kavi
  - mkdir -p /home/kavi/.ssh
  - cp /root/.ssh/authorized_keys /home/kavi/.ssh/authorized_keys
  - chmod 700 /home/kavi/.ssh
  - chmod 600 /home/kavi/.ssh/authorized_keys
  - chown -R kavi:kavi /home/kavi/.ssh

  # Configure rclone for Storage Box
  - mkdir -p /root/.config/rclone
  - |
    cat > /root/.config/rclone/rclone.conf << 'RCLONE_EOF'
    [storagebox]
    type = webdav
    url = https://u522581.your-storagebox.de
    vendor = other
    user = u522581
    pass = YOUR_ENCRYPTED_PASSWORD_HERE  # Use `rclone obscure` to generate
    RCLONE_EOF

  # Mount Storage Box (--allow-other requires user_allow_other in /etc/fuse.conf)
  - sed -i 's/^#user_allow_other/user_allow_other/' /etc/fuse.conf
  - mkdir -p /mnt/storagebox
  - rclone mount storagebox:/ /mnt/storagebox --daemon --vfs-cache-mode writes --allow-other

  # Wait for mount (give it 10 seconds)
  - sleep 10

  # Restore Infisical ENCRYPTION_KEY
  # (Glob, not $(date): the backup's date stamp is the day it was created,
  #  not the day this script runs)
  - mkdir -p /root/infisical
  - gpg --decrypt /mnt/storagebox/backups/pre-terraform-migration/infisical_env_*.gpg > /root/infisical/.env || true
  # (Requires GPG passphrase - handle via Infisical or manual)

  # Download docker-compose.yml from Git
  - git clone https://github.com/kavi/infra-config.git /opt/infra-config
  - ln -s /opt/infra-config/docker-compose.yml /root/docker-compose.yml

  # Start services first so the real database containers (with local volumes) exist
  - cd /root && docker-compose up -d
  - sleep 30

  # Restore databases into the running containers (PostgreSQL example)
  - gunzip -c /mnt/storagebox/backups/pre-terraform-migration/immich_db_*.sql.gz | docker exec -i immich-postgres psql -U postgres immich
  # (Add similar commands for other databases)

# Final message
final_message: "VPS setup complete. Services starting..."

Important:

  • Replace YOUR_ENCRYPTED_PASSWORD_HERE with output of rclone obscure YOUR_STORAGE_BOX_PASSWORD
  • GPG decryption requires passphrase (consider storing in Infisical or manual intervention)
  • Adjust database restore commands for your specific databases
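The fixed `sleep 10` before the restore steps is the fragile part of the script: if the rclone mount takes 11 seconds, every later step restores onto the bare local directory instead of the Storage Box. A polling wait with a timeout is safer. The sketch below shows the pattern with a generic readiness check (`wait_for` is an illustrative helper; the demo waits for a file created in the background, whereas in cloud-init you would poll `mountpoint -q /mnt/storagebox`):

```shell
#!/bin/sh
# wait_for CMD...: retry a readiness check up to 30 times, 1s apart
wait_for() {
  tries=0
  until "$@"; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && return 1
    sleep 1
  done
}

# Demo: readiness arrives after ~2s, simulating the mount appearing
flag="$(mktemp -d)/mounted"
( sleep 2; touch "$flag" ) &

if wait_for test -f "$flag"; then
  echo "ready"      # safe to proceed with database restore
else
  echo "timed out"  # abort instead of restoring onto the local disk
fi
```

Failing hard on timeout matters here: restoring databases onto an unmounted /mnt/storagebox would silently split your data between two places.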

Step 3: Create New VPS

terraform plan

# Expected output:
# + hcloud_server.hetzner_vps_v2
# Plan: 1 to add, 0 to change, 0 to destroy

terraform apply
# Type: yes

# Wait for creation (~2-3 minutes)
# Watch output for new VPS IP

Get new IP:

terraform output new_vps_ip
# Example: 46.224.XXX.XXX

Step 4: Verify New VPS

SSH to new VPS:

# Add to ~/.ssh/config temporarily:
Host hetzner-v2
    HostName <NEW_VPS_IP>
    User kavi
    IdentityFile ~/.ssh/id_ed25519_macbook

ssh hetzner-v2

Check cloud-init progress:

# On new VPS:
cloud-init status

# Expected:
# status: done

# If still running:
tail -f /var/log/cloud-init-output.log

Verify services:

# Docker running?
docker ps

# Storage Box mounted?
ls -la /mnt/storagebox/

# Immich container running?
docker ps | grep immich

# Can access Immich?
curl -I http://localhost:3001
# Expected: HTTP/1.1 200 OK

Test from external:

# Add Tailscale to new VPS
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# Get Tailscale IP
tailscale ip
# Example: 100.80.XX.XX

# From MacBook, test access:
curl -I http://100.80.XX.XX:3001
# Should return Immich response

Step 5: Data Integrity Verification

On new VPS, verify restored data:

# Immich database row counts
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM users;"
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"

# Compare with old VPS:
ssh kavi@100.80.53.55
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM users;"
# Numbers should match

Check photos:

# New VPS:
ls /mnt/storagebox/immich/upload/ | head -20
# Should show photos

# Old VPS:
ssh kavi@100.80.53.55
ls /mnt/storagebox/immich/upload/ | head -20
# Should match (same Storage Box mount)

Infisical ENCRYPTION_KEY:

# New VPS:
cat /root/infisical/.env | grep ENCRYPTION_KEY

# Old VPS:
ssh kavi@100.80.53.55
cat ~/infisical/.env | grep ENCRYPTION_KEY

# Values should match EXACTLY
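Eyeballing counts works once, but a small helper makes the comparison repeatable across every table you care about. A sketch with stubbed values (`compare_counts` is an illustrative helper; in practice each number comes from the respective VPS via `psql -tAc`, as in the commented example):

```shell
#!/bin/sh
# compare_counts TABLE OLD NEW: exit non-zero on mismatch
compare_counts() {
  table=$1; old=$2; new=$3
  if [ "$old" -eq "$new" ]; then
    echo "MATCH    $table ($old rows)"
  else
    echo "MISMATCH $table (old=$old new=$new)"
    return 1
  fi
}

# Stubbed values; in practice e.g.:
#   old=$(ssh kavi@100.80.53.55 docker exec immich-postgres \
#           psql -U postgres immich -tAc "SELECT COUNT(*) FROM users;")
compare_counts users  12   12
compare_counts assets 4821 4821
```

Run it for users, assets, albums, and any other tables that matter, and treat any MISMATCH as a blocker for Phase 3.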

Step 6: Parallel Running Period

Run both VPS for 1 week:

  • ✅ Old VPS: Continue serving production traffic (DNS points here)
  • ✅ New VPS: Test via Tailscale IP, verify all services work

Test new VPS services:

# From MacBook (via Tailscale):
NEW_VPS_IP="100.80.XX.XX"  # New VPS Tailscale IP

# Immich
curl -I http://${NEW_VPS_IP}:3001
# Expected: HTTP 200

# Infisical
curl -I http://${NEW_VPS_IP}:8080
# Expected: HTTP 200

# n8n
curl -I http://${NEW_VPS_IP}:5678
# Expected: HTTP 200

Monitor resource usage:

ssh hetzner-v2
docker stats

# Compare with old VPS
ssh kavi@100.80.53.55
docker stats

# Usage should be similar

Phase 2 Completion Checklist

  • [ ] New VPS created via Terraform
  • [ ] cloud-init script executed successfully
  • [ ] Docker installed and running
  • [ ] Storage Box mounted at /mnt/storagebox
  • [ ] Databases restored from backups
  • [ ] Database row counts match old VPS
  • [ ] Infisical ENCRYPTION_KEY matches old VPS
  • [ ] All containers running (docker ps shows 20+ containers)
  • [ ] Services accessible via Tailscale IP
  • [ ] Resource usage (RAM, CPU) similar to old VPS
  • [ ] Both VPS running in parallel for testing

Proceed to Phase 3 only after all items checked and 1 week verification period.


Phase 3: Cutover and Verification

Goal

Switch production traffic to new VPS, monitor for issues, then decommission old VPS.

Step 1: DNS Cutover Preparation

Reduce DNS TTL (before cutover):

Update dns.tf:

resource "cloudflare_record" "root" {
  zone_id = var.cloudflare_zone_id
  name    = "@"
  value   = "46.224.146.107"  # Still old IP
  type    = "A"
  proxied = false
  ttl     = 300  # Reduced from 3600 to 5 minutes
}

Apply:

terraform apply
# Wait 1 hour for old TTL to expire

Step 2: Update DNS to New VPS

Update dns.tf to use new VPS IP:

# Use dynamic reference to new VPS IP
resource "cloudflare_record" "root" {
  zone_id = var.cloudflare_zone_id
  name    = "@"
  value   = hcloud_server.hetzner_vps_v2.ipv4_address  # Dynamic!
  type    = "A"
  proxied = false
  ttl     = 300
}

Plan and verify:

terraform plan

# Expected output:
# ~ cloudflare_record.root
#   value: "46.224.146.107" -> "46.224.XXX.XXX"  # New IP

Apply during maintenance window:

# Choose low-traffic time (e.g., 3 AM)
terraform apply
# Type: yes

# DNS propagation: ~5-15 minutes

Step 3: Monitor Cutover

Verify DNS propagation:

# Check DNS resolution
dig +short kua.cl
# Should show new VPS IP (46.224.XXX.XXX)

# From different DNS servers
dig +short kua.cl @8.8.8.8
dig +short kua.cl @1.1.1.1
# All should show new IP
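Instead of re-running `dig` by hand, a small polling loop can wait for propagation. A sketch, assuming `dig` is installed (dnsutils/bind-utils) and checking only the first A record:

```bash
# Poll until <name> resolves to <expected_ip>, up to <max_attempts> tries.
# Usage: wait_for_dns <name> <expected_ip> [max_attempts]
wait_for_dns() {
  local name="$1" expected="$2" attempts="${3:-30}" i=1 resolved
  while [ "$i" -le "$attempts" ]; do
    resolved=$(dig +short "$name" | head -1)
    if [ "$resolved" = "$expected" ]; then
      echo "propagated: $name -> $resolved"
      return 0
    fi
    echo "attempt $i: $name -> ${resolved:-<none>}, waiting..."
    sleep 30
    i=$((i + 1))
  done
  return 1
}

# wait_for_dns kua.cl 46.224.XXX.XXX   # substitute the real new IP
```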

Monitor service access:

# Test services
curl -I https://secrets.kua.cl  # Infisical
curl -I https://plex.kua.cl     # Plex
curl -I https://media.kua.cl    # KaviCloud

# All should return HTTP 200

Check logs:

ssh hetzner-v2
docker logs immich-server --tail 100
docker logs infisical-api --tail 100

# Look for errors or unusual activity

Monitor resource usage:

docker stats
# CPU and RAM should be normal

df -h
# Disk usage should be stable

Step 4: Verification Period (1 Week)

Daily checks:

# Day 1-7: Run these tests daily

# Services responding?
curl -I https://kua.cl
curl -I https://secrets.kua.cl
curl -I https://plex.kua.cl

# Database integrity?
ssh hetzner-v2
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"
# Count should increase as new photos added

# Backups running?
ls -lt /mnt/storagebox/backups/ | head -5
# Should see fresh backups from new VPS

# Disk space OK?
df -h | grep sda
# Should not be filling up
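The backup check above can be made pass/fail: alert if the newest file in the backup directory is older than a threshold. A sketch assuming GNU `find`; the path is from this guide, and 26 hours gives a daily job some slack:

```bash
# Fail unless <dir> contains at least one file newer than <max_age_minutes>.
# Usage: backups_fresh <dir> <max_age_minutes>
backups_fresh() {
  if [ -n "$(find "$1" -type f -mmin "-$2" -print -quit)" ]; then
    echo "OK: fresh backup found in $1"
  else
    echo "STALE: no file in $1 newer than $2 minutes"
    return 1
  fi
}

# backups_fresh /mnt/storagebox/backups 1560   # 26 hours
```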

Verify old VPS is idle:

ssh kavi@100.80.53.55

# Check web server logs (should be minimal traffic)
docker logs traefik --since 24h 2>&1 | wc -l
# Should be very low (only stragglers with the old DNS still cached)

# Check CPU (should be idle)
top
# %CPU should be <5%

Step 5: Decommission Old VPS

After 1 week with no issues:

Final backup from old VPS (paranoid safety):

ssh kavi@100.80.53.55

# Database dumps (one last time)
mkdir -p /mnt/storagebox/backups/old-vps-final/
docker exec immich-postgres pg_dump -U postgres immich > /mnt/storagebox/backups/old-vps-final/immich_db_$(date +%Y%m%d).sql

# Any local-only data
tar czf /mnt/storagebox/backups/old-vps-final/var_lib_docker_$(date +%Y%m%d).tar.gz /var/lib/docker/volumes/

Stop services on old VPS:

ssh kavi@100.80.53.55
docker-compose down

# Verify stopped
docker ps
# Should be empty

Wait 24 hours (ensure no issues from stopping old VPS)

Destroy old VPS (manual, NOT Terraform):

# Via Hetzner Cloud Console:
# Servers → hetzner-vps (old) → Delete

# Or via hcloud CLI:
hcloud server delete hetzner-vps

Clean up Terraform: remove prevent_destroy and rename the resource

Update vps.tf:

# Rename resource (now the primary VPS)
resource "hcloud_server" "hetzner_vps" {  # Removed "_v2"
  name        = "hetzner-vps"              # Removed "-v2"
  server_type = "cpx42"
  # ... (rest stays same)

  # prevent_destroy lifecycle block removed now that testing is complete
}

Terraform state update:

# Since we renamed the resource, update state
terraform state mv hcloud_server.hetzner_vps_v2 hcloud_server.hetzner_vps

terraform plan
# Expected: No changes (just renamed)
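On Terraform 1.1+, an alternative to running `terraform state mv` by hand is a `moved` block committed alongside the rename; each device's state is then migrated automatically on its next `terraform apply`, which fits the multi-device workflow here:

```hcl
# Records the rename in code; apply performs the state move.
moved {
  from = hcloud_server.hetzner_vps_v2
  to   = hcloud_server.hetzner_vps
}
```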

Commit:

git add vps.tf
git commit -m "feat(vps): migrate production to Terraform-managed VPS"
git push

Phase 3 Completion Checklist

  • [ ] DNS TTL reduced before cutover
  • [ ] DNS updated to new VPS IP
  • [ ] DNS propagation verified (dig shows new IP)
  • [ ] All services accessible on new VPS
  • [ ] 1 week verification period completed
  • [ ] No errors in logs
  • [ ] Database integrity verified daily
  • [ ] Backups running on new VPS
  • [ ] Old VPS idle (confirmed via logs and CPU usage)
  • [ ] Final backup from old VPS completed
  • [ ] Old VPS services stopped
  • [ ] Old VPS destroyed (manually)
  • [ ] Terraform state updated (resource renamed)
  • [ ] Git repository updated and pushed

Migration complete! 🎉


Rollback Procedures

Scenario 1: Issues During Phase 1 (DNS/SSH Import)

Problem: DNS broken after Terraform import.

Rollback:

# Revert Terraform changes
git log --oneline
git revert <COMMIT_HASH>

terraform plan
terraform apply

# DNS should return to previous state in 5-15 minutes

Alternative: Manually fix DNS via Cloudflare console.

Scenario 2: Issues During Phase 2 (New VPS)

Problem: New VPS not working correctly.

Rollback: None needed! Old VPS still running production.

Fix:

# Destroy new VPS
terraform destroy -target=hcloud_server.hetzner_vps_v2

# Fix cloud-init script or restore procedure

# Recreate
terraform apply

Scenario 3: Issues After DNS Cutover (Phase 3)

Problem: Services down or degraded after switching DNS.

Immediate Rollback:

Update dns.tf back to old IP:

resource "cloudflare_record" "root" {
  zone_id = var.cloudflare_zone_id
  name    = "@"
  value   = "46.224.146.107"  # Old VPS IP
  type    = "A"
  proxied = false
  ttl     = 300
}

Apply:

terraform apply
# DNS will propagate in 5-15 minutes (TTL is 300s)

Alternative: Emergency DNS change via Cloudflare console (faster than Terraform).

Fix new VPS issues, then retry cutover.

Scenario 4: Data Loss on New VPS

Problem: Database or photos missing on new VPS.

Recovery:

# SSH to new VPS
ssh hetzner-v2

# Re-mount Storage Box (if unmounted)
rclone mount storagebox:/ /mnt/storagebox --daemon

# Restore databases again from backups
cd /mnt/storagebox/backups/pre-terraform-migration/
gunzip -c immich_db_*.sql.gz | docker exec -i immich-postgres psql -U postgres immich

# Restart containers
docker-compose restart

If photos missing: Check Storage Box mount and /mnt/storagebox/immich/upload/

Scenario 5: Infisical ENCRYPTION_KEY Lost

Problem: ENCRYPTION_KEY not restored correctly on new VPS.

Recovery:

# SSH to old VPS (if still running)
ssh kavi@100.80.53.55
cat ~/infisical/.env | grep ENCRYPTION_KEY

# Copy to new VPS
ssh hetzner-v2
echo "ENCRYPTION_KEY=<VALUE_FROM_OLD_VPS>" >> /root/infisical/.env

# Restart Infisical
docker-compose restart infisical-api

If old VPS destroyed: Restore from backup

# Decrypt backup
gpg --decrypt /mnt/storagebox/backups/pre-terraform-migration/infisical_env_*.gpg > /tmp/infisical.env

# Copy ENCRYPTION_KEY to /root/infisical/.env
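Appending with `>>` can leave two `ENCRYPTION_KEY` lines in the file, and Infisical will read whichever the env parser picks. A safer sketch replaces any existing entry (path from this guide; the key value is assumed already recovered from the old VPS or the GPG backup):

```bash
# Set ENCRYPTION_KEY in an env file, replacing any existing entry.
# Usage: set_encryption_key <env_file> <value>
set_encryption_key() {
  local file="$1" value="$2"
  grep -v '^ENCRYPTION_KEY=' "$file" > "${file}.tmp" 2>/dev/null || true
  echo "ENCRYPTION_KEY=${value}" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# set_encryption_key /root/infisical/.env "<VALUE_FROM_BACKUP>"
# docker-compose restart infisical-api
```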

Summary

Migration Timeline

| Week | Phase | Activities | Risk Level |
|------|-------|------------|------------|
| Week 1 | Phase 0 | Backups, test restores, documentation | CRITICAL (must complete) |
| Week 2 | Phase 1 | Import DNS, add SSH keys, remote state | Low |
| Week 3 | Phase 2 | Create new VPS, test services, parallel run | Medium |
| Week 4 | Phase 3 | DNS cutover, 1 week verification, decommission | Medium-High |

Success Criteria

  • ✅ Zero data loss
  • ✅ Zero downtime (parallel infrastructure approach)
  • ✅ All services running on Terraform-managed VPS
  • ✅ Can destroy VPS → terraform apply → restore in ~30 minutes
  • ✅ Multi-device workflow (MacBook, iPad, PC all manage infrastructure)

What You Achieved

Before Migration:

  • ❌ Manual VPS management
  • ❌ No version control for infrastructure
  • ❌ Disaster recovery = hours of manual work
  • ❌ Multi-device coordination difficult

After Migration:

  • ✅ Infrastructure as Code (entire setup in Git)
  • ✅ Declarative DNS, SSH keys, VPS configuration
  • ✅ Disaster recovery: terraform apply (30 minutes)
  • ✅ Multi-device access via remote state (MacBook, iPad, PC)
  • ✅ Audit trail (Git history shows all infrastructure changes)

Next Steps

  1. Setup Nightly Backups (see docs/runbooks/backup-procedures.md)
     • Database dumps to Storage Box (3 AM cron job)
     • Terraform state backups (already handled by remote backend)

  2. Create cloud-init Restore Script (see docs/terraform/cloud-init.md)
     • Auto-restore from Storage Box backups on VPS creation
     • Fully automated disaster recovery

  3. Document Terraform Workflow (see docs/terraform/workflow.md)
     • Daily operations (plan, apply)
     • Common tasks (adding services, updating configs)

  4. Implement Monitoring (optional)
     • Uptime monitoring for services
     • Disk space alerts
     • Backup verification

Migration complete. Infrastructure now managed as code. 🚀