Terraform Migration Guide¶
Safe migration from manual infrastructure to Terraform - Critical procedures for preserving production data.
CRITICAL: You Have Production Infrastructure
This guide is for migrating existing production infrastructure to Terraform, NOT setting up from scratch. You have:
- 20+ containers running (Immich, Coolify, Infisical, n8n, Portainer)
- 5GB+ photos in Immich on Storage Box mount
- Critical secrets in Infisical (ENCRYPTION_KEY must be backed up)
- Live databases (PostgreSQL, MongoDB, Redis)
ONE MISTAKE = DATA LOSS. Follow this guide exactly.
Table of Contents¶
- Overview
- Why NOT Just terraform import
- The Hybrid Storage Reality
- Migration Strategy
- Phase 0: Pre-Migration Backup
- Phase 1: Low-Risk Resources
- Phase 2: Parallel Infrastructure
- Phase 3: Cutover and Verification
- Rollback Procedures
Overview¶
The Challenge¶
You're in a unique situation:
What You Have:
- ✅ Production Hetzner VPS (100.80.53.55) with 20+ containers
- ✅ Immich: 5GB+ photos, using 5.033GB RAM
- ✅ Coolify: Self-hosting platform (311MB RAM)
- ✅ Infisical: Secrets manager (743.6MB RAM, contains ENCRYPTION_KEY)
- ✅ Storage Box: Mounted at /mnt/storagebox via rclone
- ✅ n8n, Portainer, PostgreSQL, Redis, Traefik all running
What You Want:
- 🎯 Infrastructure as Code (Terraform)
- 🎯 Multi-device management (MacBook, iPad, PC)
- 🎯 Version control for infrastructure
- 🎯 Disaster recovery via terraform apply
The Problem:
- ❌ Can't just terraform import existing VPS (too risky)
- ❌ Can't destroy and recreate (would lose data)
- ❌ Need zero downtime migration
The Solution: Parallel Infrastructure¶
Strategy:
1. Keep existing VPS running (zero risk to production)
2. Terraform creates NEW VPS alongside it
3. Test restore procedures on new VPS
4. Switch DNS when verified
5. Destroy old VPS after 1 week verification period
Timeline:
- Week 1: Backups and planning (CRITICAL - do not skip)
- Week 2: Terraform for DNS and SSH keys (low risk)
- Week 3: Create new VPS, test restore
- Week 4: Cutover and verification
Why NOT Just terraform import¶
The Import Trap¶
What Seems Easy:
# This SEEMS like it would work:
terraform import hcloud_server.hetzner_vps 12345678
terraform plan # "No changes needed"
terraform apply # Should be safe, right?
What Actually Happens:
Terraform will perform the following actions:
# hcloud_server.hetzner_vps must be replaced
-/+ resource "hcloud_server" "hetzner_vps" {
~ id = "12345678" -> (known after apply)
~ ... 50 other attributes ...
# Reason: user_data changed (didn't exist before)
# Reason: ssh_keys changed (managed differently)
# Reason: firewall_ids changed (implicit vs explicit)
}
Plan: 1 to add, 0 to change, 1 to destroy.
⚠️ THIS WILL DESTROY YOUR SERVER AND ALL LOCAL DATA
Why Import Fails¶
1. Attribute Mismatches
Your running VPS has attributes Terraform doesn't know about:
- cloud-init scripts that ran once (not in Terraform)
- manual firewall rules added over time
- SSH keys added via the Hetzner console rather than Terraform resources
- labels/tags that might differ
Even one mismatch = Terraform wants to replace (destroy + recreate)
2. Storage Mount Complexity
Your /mnt/storagebox mount:
- Setup via /etc/fstab or systemd mount unit
- Configured with rclone parameters
- Has specific permissions
Terraform doesn't track this → can't reproduce → forces replacement
3. Docker State
Your 20+ containers:
- Have local volumes with database data
- Have network configurations not tracked in Terraform
- Have environment variables from various sources
- Have restart policies set manually
Importing VPS ≠ importing Docker state
4. The "Known After Apply" Problem
Many attributes show as "(known after apply)" after import:
This means: "I'll find out the IP AFTER I destroy and recreate"
Result: DNS breaks, Tailscale breaks, everything breaks.
Real Example from Community¶
From Terraform community (similar case):
User: "I imported my production server. terraform plan shows
'must be replaced'. Is this normal?"
Expert: "Yes. Importing rarely achieves 'no changes'. Even small
differences force replacement. For production, rebuild
from scratch via Terraform is safer than importing."
User: "But I have data on the server..."
Expert: "Exactly why you shouldn't import. Back up, create new
server with Terraform, restore data, destroy old one."
When Import IS Safe¶
Import works well for:
- ✅ DNS records (no data, easy to verify)
- ✅ SSH keys (additive, can't destroy data)
- ✅ Firewall rules (can test without applying)
- ✅ S3 buckets (metadata import, data stays intact)

Import is RISKY for:
- ❌ VPS instances (complex state, data on disk)
- ❌ Databases (data loss risk too high)
- ❌ Any resource with local storage
The Hybrid Storage Reality¶
Performance Discovery (From Gemini Analysis)¶
Critical Insight: Not all storage is equal.
| Storage Type | Read/Write Speed | Latency | Use Case |
|---|---|---|---|
| Local NVMe (VPS disk) | ~3000 MB/s | <0.1ms | Databases (Postgres, Redis, MongoDB) |
| Storage Box (network mount) | ~50-100 MB/s | Network latency | Static files (photos, backups) |
Performance Ratio: NVMe is 30-60x faster than Storage Box.
Your Current Architecture (CORRECT)¶
From SSH inspection:
docker inspect immich-server
# Mounts:
Mounts: [
{
"Type": "volume",
"Source": "immich_pgdata", # ✅ LOCAL NVMe (database)
"Destination": "/var/lib/postgresql/data"
},
{
"Type": "bind",
"Source": "/mnt/storagebox/immich/upload", # ✅ Storage Box (photos)
"Destination": "/usr/src/app/upload"
}
]
This is PERFECT:
- ✅ Database on local NVMe (fast, low latency)
- ✅ Photos on Storage Box (slow is OK for large files)

Why This Matters for Terraform:
- VPS can be destroyed → recreated via Terraform
- Database data backs up to Storage Box nightly
- Photos already on Storage Box (persist through VPS rebuild)
- Restore: Mount Storage Box → restore DB from backups → photos already there
The "Disposable Server" Architecture¶
Concept: Server is ephemeral, data is persistent.
┌─────────────────────────────────────────────────────┐
│ HETZNER VPS (Ephemeral) │
│ ┌────────────────────────────────────────────────┐ │
│ │ LOCAL NVMe (Databases - Fast, Temporary) │ │
│ │ - immich-postgres (local volume) │ │
│ │ - coolify-db (local volume) │ │
│ │ - redis (local volume) │ │
│ │ - infisical mongodb (local volume) │ │
│ │ │ │
│ │ Nightly Backup Script (via cron): │ │
│ │ 3 AM: pg_dump → /mnt/storagebox/backups/ │ │
│ └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
│
│ rclone mount (CIFS/WebDAV)
▼
┌─────────────────────────────────────────────────────┐
│ STORAGE BOX (Persistent) │
│ ┌────────────────────────────────────────────────┐ │
│ │ /immich/upload/ │ │
│ │ └─ Photos, videos (already here) │ │
│ │ │ │
│ │ /backups/ │ │
│ │ ├─ immich_db_20250115.sql │ │
│ │ ├─ coolify_db_20250115.sql │ │
│ │ ├─ infisical_env_20250115.gpg (CRITICAL!) │ │
│ │ └─ ... (7 days retention) │ │
│ │ │ │
│ │ /terraform-state/ │ │
│ │ └─ prod/terraform.tfstate │ │
│ └────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Disaster Recovery Flow:
1. VPS destroyed (hardware failure, accidental deletion, etc.)
2. terraform apply → Creates new VPS (10 min)
3. cloud-init script runs on boot:
- Installs Docker, rclone
- Mounts Storage Box at /mnt/storagebox
- Restores databases from latest backups
- Pulls docker-compose.yml from Git
- Starts all containers
4. Services online (30 min total)
Data Preserved:
- ✅ Photos (already on Storage Box)
- ✅ Databases (restored from nightly backup)
- ✅ Infisical ENCRYPTION_KEY (backed up to Storage Box)
- ✅ Configuration (in Git)
Data Loss Window: Maximum 24 hours (since last backup)
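The diagram above assumes the nightly 3 AM backup job already exists. A minimal sketch of that cron script, using the container names, paths, and 7-day retention from this guide (adjust for your stack):

```shell
#!/bin/bash
# /usr/local/bin/nightly-backup.sh - sketch only
# crontab entry: 0 3 * * * /usr/local/bin/nightly-backup.sh run
BACKUP_DIR="${BACKUP_DIR:-/mnt/storagebox/backups}"
RETENTION_DAYS=7

backup_databases() {
  local stamp
  stamp=$(date +%Y%m%d)
  docker exec immich-postgres pg_dump -U postgres immich | gzip \
    > "$BACKUP_DIR/immich_db_${stamp}.sql.gz"
  docker exec coolify-db pg_dump -U postgres coolify | gzip \
    > "$BACKUP_DIR/coolify_db_${stamp}.sql.gz"
}

prune_old_backups() {
  # Enforce the "7 days retention" shown in the diagram
  find "$BACKUP_DIR" -name '*_db_*.sql.gz' -mtime +"$RETENTION_DAYS" -delete
}

# Only act when invoked with "run", so the functions can be tested in isolation
if [ "${1:-}" = "run" ]; then
  backup_databases
  prune_old_backups
fi
```

After a few nights, list the backup directory: you should never see more than seven dated dumps per database.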
Why Gemini Was Right¶
Gemini's Warning:
"Don't run the DB live on Storage Box. It will kill performance."
Explanation:
- PostgreSQL does thousands of small random reads/writes per second
- Network storage latency: ~1-5ms per operation
- Result: queries that take 10ms on NVMe take 1000ms on Storage Box
- Immich would be unusable (slow photo loading, timeouts)

You Already Avoided This Mistake:
- Your current setup has DBs on local volumes ✅
- Only static files (photos) on Storage Box ✅
- This is the CORRECT architecture
For Terraform Migration:
- New VPS must replicate this pattern
- cloud-init script must mount Storage Box first
- Docker volumes must be local, not on /mnt/storagebox/
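One way to verify the new VPS honors this rule is to scan container mounts for database data sitting on the network mount. A sketch; the container-name patterns (`postgres|mongo|redis`) are assumptions, so extend them for your stack:

```shell
#!/bin/bash
# Flag any database container whose data lives on the network mount.

list_mounts() {
  # One line per container: "/name: /source1 /source2 ..."
  docker ps -q | xargs -r docker inspect \
    --format '{{.Name}}: {{range .Mounts}}{{.Source}} {{end}}'
}

find_misplaced_dbs() {
  # Database containers with a mount source under /mnt/storagebox
  list_mounts | grep -iE 'postgres|mongo|redis' | grep '/mnt/storagebox'
}

if [ "${1:-}" = "check" ]; then
  if find_misplaced_dbs; then
    echo "WARNING: database data on Storage Box (30-60x slower than NVMe)"
    exit 1
  fi
  echo "OK: all database mounts are local"
fi
```

Note the check deliberately ignores non-database containers: photos on the Storage Box mount (e.g. immich-server's upload bind) are correct and should not be flagged.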
Migration Strategy¶
Decision Matrix: Import vs Rebuild¶
| Resource | Import? | Rebuild? | Why |
|---|---|---|---|
| Existing VPS | ❌ NO | ✅ YES | Too risky, data preservation complex, attribute mismatches |
| DNS records | ✅ YES | - | Safe, no data loss risk, easy to verify |
| SSH keys | ✅ YES | - | Additive operation, can't break existing access |
| Firewall rules | ✅ YES | - | Can test with terraform plan before applying |
| Storage Box | ❌ NO | Manual | Keep mounted, outside Terraform scope, managed by Hetzner |
| Docker containers | ❌ NO | docker-compose | NOT Terraform's job, too dynamic |
| Databases | ❌ NO | Restore from backup | Data managed separately |
Three-Phase Approach¶
Phase 0: Pre-Migration Backup (Week 1)
- Backup EVERYTHING (databases, ENCRYPTION_KEY, configs)
- Test restore procedures
- Document current state
- Verify backups are complete and restorable

Phase 1: Low-Risk Resources (Week 2)
- Import DNS records to Terraform
- Add SSH public keys via Terraform
- Test: Can you still access servers? Is DNS working?
Phase 2: Parallel Infrastructure (Week 3)
- Terraform creates NEW VPS (hetzner-vps-v2)
- Install Docker, mount Storage Box
- Restore databases from backups
- Test: Do services work on new VPS?
Phase 3: Cutover (Week 4)
- Update DNS to point to new VPS (via Terraform)
- Monitor for 1 week
- If successful, destroy old VPS manually
Risk Assessment¶
| Phase | Risk Level | Impact if Failed | Rollback Time |
|---|---|---|---|
| Phase 0 | CRITICAL | Data loss if skipped | N/A (prevention) |
| Phase 1 | Low | DNS temporarily broken | 5 minutes (git revert) |
| Phase 2 | Medium | Wasted effort | 0 (old VPS still running) |
| Phase 3 | Medium-High | Service downtime | 15 minutes (DNS rollback) |
Mitigation:
- Phase 0: Multiple backup locations, test restores
- Phase 1: terraform plan verification, DNS TTL reduction
- Phase 2: Parallel running (old VPS unaffected)
- Phase 3: 1 week verification before destroying old VPS
Phase 0: Pre-Migration Backup¶
DO NOT SKIP THIS PHASE
Many infrastructure migrations fail because of inadequate backups. Spend the time to do this right.
Backup Checklist¶
1. Database Dumps¶
# SSH to existing VPS
ssh kavi@100.80.53.55
# Create backup directory
mkdir -p /mnt/storagebox/backups/pre-terraform-migration
cd /mnt/storagebox/backups/pre-terraform-migration
# Immich database
docker exec immich-postgres pg_dump -U postgres immich > immich_db_$(date +%Y%m%d).sql
gzip immich_db_$(date +%Y%m%d).sql
# Coolify database
docker exec coolify-db pg_dump -U postgres coolify > coolify_db_$(date +%Y%m%d).sql
gzip coolify_db_$(date +%Y%m%d).sql
# Infisical database (MongoDB)
docker exec infisical-mongo mongodump --archive=infisical_db_$(date +%Y%m%d).archive --gzip
Verify:
ls -lh /mnt/storagebox/backups/pre-terraform-migration/
# Expected output:
# -rw-r--r-- 1 kavi kavi 245M Jan 15 10:30 immich_db_20250115.sql.gz
# -rw-r--r-- 1 kavi kavi 12M Jan 15 10:31 coolify_db_20250115.sql.gz
# -rw-r--r-- 1 kavi kavi 8M Jan 15 10:32 infisical_db_20250115.archive
Test Restore (critical):
# On local machine or test container:
gunzip -c immich_db_20250115.sql.gz | docker exec -i test-postgres psql -U postgres
# Should complete without errors
# Verify row counts match production
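To make the "row counts match" check repeatable, the comparison can be wrapped in a small helper. A sketch: the `users`/`assets` table names follow this guide's examples (they may differ across Immich versions), and it assumes both containers are reachable from the same Docker host - across hosts, wrap the production side in ssh:

```shell
#!/bin/bash
# Compare table row counts between two PostgreSQL containers.

count_rows() {
  # $1 = container, $2 = database, $3 = table
  docker exec "$1" psql -U postgres -d "$2" -tAc "SELECT COUNT(*) FROM $3"
}

compare_counts() {
  # $1 = production container, $2 = test-restore container
  local t prod test
  for t in users assets; do
    prod=$(count_rows "$1" immich "$t")
    test=$(count_rows "$2" immich "$t")
    if [ "$prod" = "$test" ]; then
      echo "MATCH    $t: $prod rows"
    else
      echo "MISMATCH $t: prod=$prod restore=$test"
      return 1
    fi
  done
}

if [ "${1:-}" = "run" ]; then compare_counts immich-postgres test-postgres; fi
```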
2. Infisical ENCRYPTION_KEY (MOST CRITICAL)¶
ENCRYPTION_KEY = ALL SECRETS
If you lose this key, ALL secrets in Infisical are permanently lost. No recovery possible.
Location: ~/infisical/.env on Hetzner VPS
# Backup 1: Storage Box (GPG encrypted)
ssh kavi@100.80.53.55
gpg --symmetric --cipher-algo AES256 ~/infisical/.env
# Enter strong passphrase (store in password manager)
mv ~/infisical/.env.gpg /mnt/storagebox/backups/pre-terraform-migration/infisical_env_$(date +%Y%m%d).gpg
# Backup 2: Local encrypted copy (run from your local machine, not the VPS)
scp kavi@100.80.53.55:/mnt/storagebox/backups/pre-terraform-migration/infisical_env_$(date +%Y%m%d).gpg ~/Backups/
# Backup 3: Kimsufi server (run from the VPS)
scp /mnt/storagebox/backups/pre-terraform-migration/infisical_env_$(date +%Y%m%d).gpg ubuntu@100.81.231.36:~/backups/
# Backup 4: Print to paper (optional but paranoid-safe)
cat ~/infisical/.env
# Write down ENCRYPTION_KEY value on paper, store in safe
Test Decryption:
# Verify you can decrypt
gpg --decrypt infisical_env_20250115.gpg > test_decrypt.txt
cat test_decrypt.txt
# Should see ENCRYPTION_KEY=... and other vars
rm test_decrypt.txt
3. Docker Volumes¶
# List all volumes
docker volume ls
# Backup Immich data volume (if exists)
docker run --rm \
-v immich_data:/data \
-v /mnt/storagebox/backups/pre-terraform-migration:/backup \
ubuntu tar czf /backup/immich_data_$(date +%Y%m%d).tar.gz /data
# Backup Coolify volumes
docker run --rm \
-v coolify_data:/data \
-v /mnt/storagebox/backups/pre-terraform-migration:/backup \
ubuntu tar czf /backup/coolify_data_$(date +%Y%m%d).tar.gz /data
Size Check:
du -sh /mnt/storagebox/backups/pre-terraform-migration/*.tar.gz
# If any volume is >10GB, consider:
# - Selective backup (exclude cache directories)
# - Split into chunks
# - Verify Storage Box has space
4. Configuration Files¶
# Docker Compose files
mkdir -p /mnt/storagebox/backups/pre-terraform-migration/configs
cp ~/docker-compose.yml /mnt/storagebox/backups/pre-terraform-migration/configs/
# Per-service compose files (rename by directory - they all share the same filename)
for f in ~/*/docker-compose.yml; do
  cp "$f" /mnt/storagebox/backups/pre-terraform-migration/configs/"$(basename "$(dirname "$f")")_docker-compose.yml"
done
# Systemd units (if any)
cp /etc/systemd/system/immich* /mnt/storagebox/backups/pre-terraform-migration/configs/ 2>/dev/null || true
# Nginx/Traefik configs (if any)
cp -r ~/traefik/ /mnt/storagebox/backups/pre-terraform-migration/configs/ 2>/dev/null || true
# Storage Box mount config
cp /etc/fstab /mnt/storagebox/backups/pre-terraform-migration/configs/fstab.backup
cp /etc/rclone.conf /mnt/storagebox/backups/pre-terraform-migration/configs/rclone.conf.backup 2>/dev/null || true
5. Current State Documentation¶
# Container list with details
docker ps -a --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" > /mnt/storagebox/backups/pre-terraform-migration/containers_list.txt
# Full container inspection
for container in $(docker ps -q); do
docker inspect $container > /mnt/storagebox/backups/pre-terraform-migration/inspect_$(docker inspect --format='{{.Name}}' $container | tr -d '/').json
done
# Network configuration
docker network ls > /mnt/storagebox/backups/pre-terraform-migration/networks.txt
ip addr > /mnt/storagebox/backups/pre-terraform-migration/ip_addr.txt
# Disk usage
df -h > /mnt/storagebox/backups/pre-terraform-migration/disk_usage.txt
du -sh /var/lib/docker/volumes/* > /mnt/storagebox/backups/pre-terraform-migration/volume_sizes.txt
# Installed packages (for cloud-init script reference)
dpkg -l > /mnt/storagebox/backups/pre-terraform-migration/installed_packages.txt
# Systemd services
systemctl list-units --type=service --state=running > /mnt/storagebox/backups/pre-terraform-migration/running_services.txt
Backup Verification¶
Create verification script:
cat > /mnt/storagebox/backups/pre-terraform-migration/VERIFY.sh << 'EOF'
#!/bin/bash
set -e
echo "=== Backup Verification ==="
# Check database dumps exist and are not empty
for db in immich_db coolify_db; do
  latest=$(ls -1t ${db}_*.sql.gz 2>/dev/null | head -1)
  if [ -z "$latest" ]; then
    echo "❌ FAIL: ${db} backup missing"
    exit 1
  fi
  size=$(stat -c%s "$latest" 2>/dev/null || stat -f%z "$latest")
  if [ "$size" -lt 1000000 ]; then # Less than 1MB
    echo "❌ FAIL: ${db} backup suspiciously small ($size bytes)"
    exit 1
  fi
  echo "✅ PASS: ${db} backup exists ($size bytes)"
done
# Check ENCRYPTION_KEY backup (use ls - a [ -f glob ] test breaks with multiple matches)
if ! ls infisical_env_*.gpg >/dev/null 2>&1; then
  echo "❌ FAIL: Infisical ENCRYPTION_KEY backup missing"
  exit 1
fi
echo "✅ PASS: Infisical ENCRYPTION_KEY backed up"
# Check configs
if [ ! -f configs/docker-compose.yml ]; then
echo "⚠️ WARN: docker-compose.yml not backed up"
fi
echo ""
echo "=== Verification Complete ==="
echo "Backup date: $(ls immich_db_*.sql.gz | grep -o '[0-9]\{8\}')"
echo "Total backup size: $(du -sh . | cut -f1)"
echo ""
echo "Next steps:"
echo "1. Test database restore on local machine"
echo "2. Verify ENCRYPTION_KEY can be decrypted"
echo "3. Proceed to Phase 1 only after successful restore test"
EOF
chmod +x /mnt/storagebox/backups/pre-terraform-migration/VERIFY.sh
./VERIFY.sh
Test Restore Procedure¶
Critical: Actually test restoring before migration.
On local machine or test VPS:
# 1. Start test PostgreSQL
docker run -d --name test-postgres -e POSTGRES_PASSWORD=test postgres:16
# 2. Download backup
scp kavi@100.80.53.55:/mnt/storagebox/backups/pre-terraform-migration/immich_db_*.sql.gz ./
# 3. Restore (psql -c ignores stdin, so create the database first, then pipe the dump)
docker exec test-postgres psql -U postgres -c "CREATE DATABASE immich"
gunzip -c immich_db_*.sql.gz | docker exec -i test-postgres psql -U postgres immich
# 4. Verify
docker exec test-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM users;"
docker exec test-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"
# Expected: Row counts match production
# 5. Cleanup
docker stop test-postgres && docker rm test-postgres
If restore fails:
- ❌ DO NOT PROCEED to Phase 1
- Fix backup process
- Retry until restore succeeds
Backup Completion Checklist¶
Before proceeding to Phase 1:
- [ ] Database dumps created and verified (immich, coolify, infisical)
- [ ] Infisical ENCRYPTION_KEY backed up to 4 locations
- [ ] Docker volumes backed up
- [ ] Configuration files backed up
- [ ] Current state documented (containers, networks, disk usage)
- [ ] Backup verification script passes
- [ ] Test restore completed successfully (CRITICAL)
- [ ] Backups stored in at least 2 physical locations
- [ ] Decryption passphrase stored in password manager
- [ ] Total backup size fits on Storage Box with room to spare
Only proceed to Phase 1 after ALL items checked.
Phase 1: Low-Risk Resources¶
Goal¶
Migrate resources to Terraform that have:
- ✅ No data loss risk
- ✅ Easy rollback
- ✅ Can verify with terraform plan before applying
Resources: DNS records, SSH public keys, firewall rules
Step 1: Setup Terraform¶
On MacBook/iPad/PC:
# Install Terraform (macOS)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
# Verify
terraform --version
# Expected: Terraform v1.6.0 or higher
Create project directory:
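A minimal sketch; the path `~/Coding/terraform-infra` is the one assumed by later steps of this guide:

```shell
mkdir -p ~/Coding/terraform-infra
cd ~/Coding/terraform-infra
```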
Initialize Git:
git init
cat > .gitignore << 'EOF'
# Terraform
.terraform/
*.tfstate
*.tfstate.*
terraform.tfvars
*.tfvars
# Secrets
*.env
.env.*
# macOS
.DS_Store
EOF
git add .gitignore
git commit -m "Initial commit: .gitignore"
Step 2: Configure Providers¶
Create terraform.tf:
terraform {
required_version = ">= 1.6.0"
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = "~> 1.45"
}
cloudflare = {
source = "cloudflare/cloudflare"
version = "~> 4.20"
}
}
# Remote state (we'll configure this in Phase 1, Step 5)
# backend "s3" {
# # Storage Box S3 backend
# }
}
# Providers
provider "hcloud" {
token = var.hcloud_token
}
provider "cloudflare" {
api_token = var.cloudflare_api_token
}
Create variables.tf:
variable "hcloud_token" {
description = "Hetzner Cloud API token"
type = string
sensitive = true
}
variable "cloudflare_api_token" {
description = "Cloudflare API token"
type = string
sensitive = true
}
variable "cloudflare_zone_id" {
description = "Cloudflare zone ID for kua.cl"
type = string
}
Get API tokens from Infisical:
# Fetch tokens
ssh kavi@100.80.53.55
infisical export --env=prod --projectId=personal-vault --format=dotenv | grep -E '(HCLOUD_TOKEN|CLOUDFLARE_API_TOKEN|CLOUDFLARE_ZONE_ID)'
# Copy values, then create terraform.tfvars on local machine
cat > ~/Coding/terraform-infra/terraform.tfvars << 'EOF'
hcloud_token = "YOUR_HCLOUD_TOKEN_HERE"
cloudflare_api_token = "YOUR_CLOUDFLARE_API_TOKEN_HERE"
cloudflare_zone_id = "YOUR_CLOUDFLARE_ZONE_ID_HERE"
EOF
# Verify .gitignore prevents committing secrets
git status
# Should NOT show terraform.tfvars
Initialize Terraform:
cd ~/Coding/terraform-infra
terraform init
# Expected output:
# Terraform has been successfully initialized!
Step 3: Import DNS Records¶
List current DNS records (via Cloudflare API or dashboard):
# Via curl (if you have CLOUDFLARE_API_TOKEN):
curl -X GET "https://api.cloudflare.com/client/v4/zones/${CLOUDFLARE_ZONE_ID}/dns_records" \
-H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
-H "Content-Type: application/json" | jq '.result[] | {name, type, content, id}'
Create dns.tf:
# Root A record (kua.cl → Hetzner VPS)
resource "cloudflare_record" "root" {
zone_id = var.cloudflare_zone_id
name = "@"
value = "46.224.146.107" # Current IP (will be dynamic later)
type = "A"
proxied = false
ttl = 3600
}
# Service CNAMEs
resource "cloudflare_record" "secrets" {
zone_id = var.cloudflare_zone_id
name = "secrets"
value = "kua.cl"
type = "CNAME"
proxied = false
ttl = 3600
}
resource "cloudflare_record" "plex" {
zone_id = var.cloudflare_zone_id
name = "plex"
value = "kua.cl"
type = "CNAME"
proxied = false
ttl = 3600
}
resource "cloudflare_record" "overseerr" {
zone_id = var.cloudflare_zone_id
name = "overseerr"
value = "kua.cl"
type = "CNAME"
proxied = false
ttl = 3600
}
resource "cloudflare_record" "media" {
zone_id = var.cloudflare_zone_id
name = "media"
value = "kua.cl"
type = "CNAME"
proxied = false
ttl = 3600
}
Import existing records:
# Get record IDs from Cloudflare API output above
terraform import cloudflare_record.root <RECORD_ID_FOR_ROOT>
terraform import cloudflare_record.secrets <RECORD_ID_FOR_SECRETS>
terraform import cloudflare_record.plex <RECORD_ID_FOR_PLEX>
terraform import cloudflare_record.overseerr <RECORD_ID_FOR_OVERSEERR>
terraform import cloudflare_record.media <RECORD_ID_FOR_MEDIA>
Verify the import by running terraform plan. The goal is the message "No changes. Your infrastructure matches the configuration."

If the plan shows changes:
- Check whether Terraform wants to modify attributes (ttl, proxied, etc.)
- Adjust dns.tf to match the actual values
- Re-run terraform plan until it shows "No changes"
Commit:
git add dns.tf terraform.tf variables.tf
git commit -m "feat(dns): import existing DNS records to Terraform"
Step 4: Add SSH Keys¶
Create ssh-keys.tf:
resource "hcloud_ssh_key" "macbook" {
name = "kavi-macbook"
public_key = file("${path.module}/keys/id_ed25519_macbook.pub")
}
resource "hcloud_ssh_key" "ipad" {
name = "kavi-ipad"
public_key = file("${path.module}/keys/id_ed25519_ipad.pub")
}
resource "hcloud_ssh_key" "pc" {
name = "kavi-pc"
public_key = file("${path.module}/keys/id_ed25519_pc.pub")
}
# Output SSH key IDs for use in VPS resources
output "ssh_key_ids" {
description = "SSH key IDs for all devices"
value = [
hcloud_ssh_key.macbook.id,
hcloud_ssh_key.ipad.id,
hcloud_ssh_key.pc.id
]
}
Add public keys:
mkdir ~/Coding/terraform-infra/keys
# Copy public keys from each device
cat ~/.ssh/id_ed25519_macbook.pub > ~/Coding/terraform-infra/keys/id_ed25519_macbook.pub
# (Repeat for ipad, pc)
# Verify format
cat ~/Coding/terraform-infra/keys/id_ed25519_macbook.pub
# Should start with: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5...
Apply (creates SSH keys in Hetzner):
terraform plan
# Expected:
# + hcloud_ssh_key.macbook
# + hcloud_ssh_key.ipad
# + hcloud_ssh_key.pc
# Plan: 3 to add, 0 to change, 0 to destroy
terraform apply
# Type: yes
Verify:
# Hetzner Cloud Console → SSH Keys
# Should see: kavi-macbook, kavi-ipad, kavi-pc
# Existing VPS still accessible (keys are additive)
ssh kavi@100.80.53.55
# Should work
Commit as in Step 3: git add ssh-keys.tf keys/ and git commit with a message like "feat(ssh): manage SSH keys via Terraform".
Step 5: Remote State Backend¶
Why: Enable multi-device access to Terraform state.
Create S3 bucket on Storage Box:
# Using s3cmd or AWS CLI configured for Storage Box
s3cmd mb s3://terraform-state
# Or via Hetzner console (Storage Box → Create bucket)
Update terraform.tf (uncomment backend):
terraform {
# ... (providers stay same)
backend "s3" {
bucket = "terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1" # Dummy for compatibility
endpoint = "https://u522581.your-storagebox.de"
skip_credentials_validation = true
skip_region_validation = true
skip_metadata_api_check = true
}
}
Migrate state:
terraform init -migrate-state
# Terraform will ask: Do you want to copy state to new backend?
# Answer: yes
# Verify
ls -la # terraform.tfstate should be gone (moved to remote)
Test multi-device access:
# On iPad or PC:
git clone <repo-url>
cd terraform-infra
# Add terraform.tfvars (with API tokens)
terraform init
# Should download state from Storage Box
terraform plan
# Should show: No changes (state synchronized)
Phase 1 Completion Checklist¶
- [ ] Terraform installed on all devices
- [ ] DNS records imported and plan shows "No changes"
- [ ] SSH keys added to Hetzner (verified via console)
- [ ] Existing SSH access still works
- [ ] Remote state backend configured
- [ ] Multi-device access tested (MacBook + iPad can both run terraform plan)
- [ ] Git repository created and committed
- [ ] Secrets NOT committed (verify git log -- terraform.tfvars is empty)
Proceed to Phase 2 only after all items checked.
Phase 2: Parallel Infrastructure¶
Goal¶
Create a NEW Hetzner VPS via Terraform while keeping the existing one running. Test restore procedures on the new VPS before cutover.
Step 1: Define New VPS Resource¶
Create vps.tf:
# New VPS (will be created alongside existing one)
resource "hcloud_server" "hetzner_vps_v2" {
name = "hetzner-vps-v2"
server_type = "cpx42"
image = "ubuntu-22.04"
location = "fsn1"
ssh_keys = [
hcloud_ssh_key.macbook.id,
hcloud_ssh_key.ipad.id,
hcloud_ssh_key.pc.id
]
# cloud-init script (we'll define this next)
user_data = file("${path.module}/cloud-init.yml")
labels = {
managed_by = "terraform"
environment = "production"
}
lifecycle {
# Prevent accidental deletion
prevent_destroy = true
}
}
# Output IP for DNS update
output "new_vps_ip" {
description = "IPv4 address of new VPS"
value = hcloud_server.hetzner_vps_v2.ipv4_address
}
Step 2: Create cloud-init Script¶
Purpose: Automate VPS setup on first boot.
Create cloud-init.yml:
#cloud-config
# Install packages
packages:
- docker.io
- docker-compose
- rclone
- curl
- git
- gpg
# Setup Docker
runcmd:
# Enable Docker
- systemctl enable docker
- systemctl start docker
- usermod -aG docker root
# Create user (matching existing VPS)
- useradd -m -s /bin/bash kavi
- usermod -aG docker kavi
- mkdir -p /home/kavi/.ssh
- chmod 700 /home/kavi/.ssh
- chown kavi:kavi /home/kavi/.ssh
# Configure rclone for Storage Box
- mkdir -p /root/.config/rclone
- |
cat > /root/.config/rclone/rclone.conf << 'RCLONE_EOF'
[storagebox]
type = webdav
url = https://u522581.your-storagebox.de
vendor = other
user = u522581
pass = YOUR_ENCRYPTED_PASSWORD_HERE # Use `rclone obscure` to generate
RCLONE_EOF
# Mount Storage Box
- mkdir -p /mnt/storagebox
- rclone mount storagebox:/ /mnt/storagebox --daemon --vfs-cache-mode writes --allow-other
# Wait for mount (give it 10 seconds)
- sleep 10
# Restore Infisical ENCRYPTION_KEY (pick the newest backup - its filename
# carries the date it was created, not today's date)
- mkdir -p /root/infisical
- gpg --decrypt $(ls -1t /mnt/storagebox/backups/pre-terraform-migration/infisical_env_*.gpg | head -1) > /root/infisical/.env || true
# (Requires GPG passphrase - handle via Infisical or manual)
# Download docker-compose.yml from Git
- git clone https://github.com/kavi/infra-config.git /opt/infra-config
- ln -s /opt/infra-config/docker-compose.yml /root/docker-compose.yml
# Start services first so the compose-managed database containers exist
- cd /root && docker-compose up -d
- sleep 30
# Restore databases into the real containers (PostgreSQL example).
# Restoring into a throwaway container that is later deleted would lose the
# data - the dump must land in the volume the service actually uses.
- gunzip -c /mnt/storagebox/backups/pre-terraform-migration/immich_db_*.sql.gz | docker exec -i immich-postgres psql -U postgres immich
# (Add similar commands for other databases)
# Final message
final_message: "VPS setup complete. Services starting..."
Important:
- Replace YOUR_ENCRYPTED_PASSWORD_HERE with output of rclone obscure YOUR_STORAGE_BOX_PASSWORD
- GPG decryption requires passphrase (consider storing in Infisical or manual intervention)
- Adjust database restore commands for your specific databases
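Because the backup filenames carry the date they were created (not the date the new VPS boots), the restore step should select the newest dump rather than interpolating today's date. A helper sketch that could stand in for the hard-coded gunzip lines (paths follow this guide; the target container name is an assumption):

```shell
#!/bin/bash
# Restore the newest dump for a given prefix into a PostgreSQL container.
BACKUP_DIR="${BACKUP_DIR:-/mnt/storagebox/backups/pre-terraform-migration}"

restore_latest() {
  # $1 = dump prefix (e.g. immich_db), $2 = target database, $3 = container
  local dump
  dump=$(ls -1t "$BACKUP_DIR"/"$1"_*.sql.gz 2>/dev/null | head -1)
  if [ -z "$dump" ]; then
    echo "no dump found for $1 in $BACKUP_DIR"
    return 1
  fi
  echo "restoring $dump into $2"
  gunzip -c "$dump" | docker exec -i "$3" psql -U postgres -d "$2"
}

if [ "${1:-}" = "run" ]; then restore_latest immich_db immich immich-postgres; fi
```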
Step 3: Create New VPS¶
terraform plan
# Expected output:
# + hcloud_server.hetzner_vps_v2
# Plan: 1 to add, 0 to change, 0 to destroy
terraform apply
# Type: yes
# Wait for creation (~2-3 minutes)
# Watch output for new VPS IP
Get the new IP: run terraform output new_vps_ip (the output defined in vps.tf).
Step 4: Verify New VPS¶
SSH to new VPS:
# Add to ~/.ssh/config temporarily:
Host hetzner-v2
HostName <NEW_VPS_IP>
User kavi
IdentityFile ~/.ssh/id_ed25519_macbook
ssh hetzner-v2
Check cloud-init progress:
# On new VPS:
cloud-init status
# Expected:
# status: done
# If still running:
tail -f /var/log/cloud-init-output.log
Verify services:
# Docker running?
docker ps
# Storage Box mounted?
ls -la /mnt/storagebox/
# Immich container running?
docker ps | grep immich
# Can access Immich?
curl -I http://localhost:3001
# Expected: HTTP/1.1 200 OK
Test from external:
# Add Tailscale to new VPS
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Get Tailscale IP
tailscale ip
# Example: 100.80.XX.XX
# From MacBook, test access:
curl -I http://100.80.XX.XX:3001
# Should return Immich response
Step 5: Data Integrity Verification¶
On new VPS, verify restored data:
# Immich database row counts
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM users;"
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"
# Compare with old VPS:
ssh kavi@100.80.53.55
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM users;"
# Numbers should match
Check photos:
# New VPS:
ls /mnt/storagebox/immich/upload/ | head -20
# Should show photos
# Old VPS:
ssh kavi@100.80.53.55
ls /mnt/storagebox/immich/upload/ | head -20
# Should match (same Storage Box mount)
Infisical ENCRYPTION_KEY:
# New VPS:
cat /root/infisical/.env | grep ENCRYPTION_KEY
# Old VPS:
ssh kavi@100.80.53.55
cat ~/infisical/.env | grep ENCRYPTION_KEY
# Values should match EXACTLY
Step 6: Parallel Running Period¶
Run both VPS for 1 week:
- ✅ Old VPS: Continue serving production traffic (DNS points here)
- ✅ New VPS: Test via Tailscale IP, verify all services work
Test new VPS services:
# From MacBook (via Tailscale):
NEW_VPS_IP="100.80.XX.XX" # New VPS Tailscale IP
# Immich
curl -I http://${NEW_VPS_IP}:3001
# Expected: HTTP 200
# Infisical
curl -I http://${NEW_VPS_IP}:8080
# Expected: HTTP 200
# n8n
curl -I http://${NEW_VPS_IP}:5678
# Expected: HTTP 200
Monitor resource usage:
ssh hetzner-v2
docker stats
# Compare with old VPS
ssh kavi@100.80.53.55
docker stats
# Usage should be similar
Phase 2 Completion Checklist¶
- [ ] New VPS created via Terraform
- [ ] cloud-init script executed successfully
- [ ] Docker installed and running
- [ ] Storage Box mounted at /mnt/storagebox
- [ ] Databases restored from backups
- [ ] Database row counts match old VPS
- [ ] Infisical ENCRYPTION_KEY matches old VPS
- [ ] All containers running (docker ps shows 20+ containers)
- [ ] Services accessible via Tailscale IP
- [ ] Resource usage (RAM, CPU) similar to old VPS
- [ ] Both VPS running in parallel for testing
Proceed to Phase 3 only after all items checked and 1 week verification period.
Phase 3: Cutover and Verification¶
Goal¶
Switch production traffic to new VPS, monitor for issues, then decommission old VPS.
Step 1: DNS Cutover Preparation¶
Reduce DNS TTL (before cutover):
Update dns.tf:
resource "cloudflare_record" "root" {
zone_id = var.cloudflare_zone_id
name = "@"
value = "46.224.146.107" # Still old IP
type = "A"
proxied = false
ttl = 300 # Reduced from 3600 to 5 minutes
}
Apply the TTL reduction (terraform apply) at least one hour before cutover, so resolvers still caching the old 3600-second TTL have time to expire.
Step 2: Update DNS to New VPS¶
Update dns.tf to use new VPS IP:
# Use dynamic reference to new VPS IP
resource "cloudflare_record" "root" {
zone_id = var.cloudflare_zone_id
name = "@"
value = hcloud_server.hetzner_vps_v2.ipv4_address # Dynamic!
type = "A"
proxied = false
ttl = 300
}
Plan and verify:
terraform plan
# Expected output:
# ~ cloudflare_record.root
# value: "46.224.146.107" -> "46.224.XXX.XXX" # New IP
Apply during maintenance window:
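The command block was omitted here as well; a sketch of the cutover apply, run inside the maintenance window:

```shell
terraform apply
# Approve only if the plan shows exactly one change:
#   cloudflare_record.root: value "46.224.146.107" -> new VPS IP
```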
Step 3: Monitor Cutover¶
Verify DNS propagation:
# Check DNS resolution
dig +short kua.cl
# Should show new VPS IP (46.224.XXX.XXX)
# From different DNS servers
dig +short kua.cl @8.8.8.8
dig +short kua.cl @1.1.1.1
# All should show new IP
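Optionally, poll both resolvers in a loop until they agree on the new address (a sketch; NEW_IP is a placeholder for the actual new VPS IP):

```shell
NEW_IP="46.224.XXX.XXX"      # placeholder: new VPS public IP
for i in $(seq 1 60); do     # poll for up to ~30 minutes
  g=$(dig +short kua.cl @8.8.8.8)
  c=$(dig +short kua.cl @1.1.1.1)
  echo "$(date +%T) google=${g} cloudflare=${c}"
  [ "$g" = "$NEW_IP" ] && [ "$c" = "$NEW_IP" ] && break
  sleep 30
done
```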
Monitor service access:
# Test services
curl -I https://secrets.kua.cl # Infisical
curl -I https://plex.kua.cl # Plex
curl -I https://media.kua.cl # KaviCloud
# All should return HTTP 200
Check logs:
ssh hetzner-v2
docker logs immich-server --tail 100
docker logs infisical-api --tail 100
# Look for errors or unusual activity
Monitor resource usage:
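The commands were omitted here; a sketch mirroring the Phase 2 resource comparison:

```shell
ssh hetzner-v2
docker stats --no-stream    # one-shot snapshot of container CPU/RAM
free -h                     # overall RAM headroom
df -h /mnt/storagebox       # Storage Box mount still healthy
```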
Step 4: Verification Period (1 Week)¶
Daily checks:
# Day 1-7: Run these tests daily
# Services responding?
curl -I https://kua.cl
curl -I https://secrets.kua.cl
curl -I https://plex.kua.cl
# Database integrity?
ssh hetzner-v2
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"
# Count should increase as new photos are added
# Backups running?
ls -lt /mnt/storagebox/backups/ | head -5
# Should see fresh backups from new VPS
# Disk space OK?
df -h | grep sda
# Should not be filling up
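The checks above can be gathered into one script for the verification week (a sketch; the URL list and the 2xx/3xx pass criterion are assumptions):

```shell
#!/usr/bin/env bash
# daily-check.sh: report health of each production endpoint
set -u
fail=0
for url in https://kua.cl https://secrets.kua.cl https://plex.kua.cl; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)
  echo "${url}: HTTP ${code}"
  case "$code" in 2*|3*) ;; *) fail=1 ;; esac
done
echo "overall: $([ "$fail" -eq 0 ] && echo OK || echo DEGRADED)"
# In cron, add: exit "$fail" so a failure triggers cron's error mail
```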
Verify old VPS is idle:
ssh kavi@100.80.53.55
# Check web server logs (should be minimal traffic)
docker logs traefik --since 1h 2>&1 | wc -l
# Should be very low (only stragglers hitting stale DNS caches)
# Check CPU (should be idle)
top
# %CPU should be <5%
Step 5: Decommission Old VPS¶
After 1 week with no issues:
Final backup from old VPS (paranoid safety):
ssh kavi@100.80.53.55
# Database dumps (one last time)
mkdir -p /mnt/storagebox/backups/old-vps-final
docker exec immich-postgres pg_dump -U postgres immich > /mnt/storagebox/backups/old-vps-final/immich_db_$(date +%Y%m%d).sql
# Any local-only data (Docker volumes are root-owned, so use sudo)
sudo tar czf /mnt/storagebox/backups/old-vps-final/var_lib_docker_$(date +%Y%m%d).tar.gz /var/lib/docker/volumes/
Stop services on old VPS:
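The stop commands were omitted; a sketch, assuming each stack has its own compose directory under the home directory (the paths are assumptions, adjust to the actual layout):

```shell
ssh kavi@100.80.53.55
# 'stop' (not 'down -v') keeps volumes and data intact
cd ~/immich    && docker-compose stop
cd ~/infisical && docker-compose stop
# ...repeat for the remaining stacks (n8n, Coolify, Portainer, etc.)
```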
Wait 24 hours (ensure no issues from stopping old VPS)
Destroy old VPS (manual, NOT Terraform):
# Via Hetzner Cloud Console:
# Servers → hetzner-vps (old) → Delete
# Or via hcloud CLI:
hcloud server delete hetzner-vps
Clean up Terraform: remove prevent_destroy and rename the resource
Update vps.tf:
# Rename resource (now the primary VPS)
resource "hcloud_server" "hetzner_vps" { # Removed "_v2"
name = "hetzner-vps" # Removed "-v2"
server_type = "cpx42"
# ... (rest stays same)
lifecycle {
# Can remove prevent_destroy now that testing is complete
}
}
Terraform state update:
# Since we renamed the resource, update state
terraform state mv hcloud_server.hetzner_vps_v2 hcloud_server.hetzner_vps
terraform plan
# Expected: No changes (just renamed)
Commit:
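Sketch of the commit step (file names follow the layout used throughout this guide):

```shell
git add vps.tf dns.tf
git commit -m "Rename hetzner_vps_v2 to hetzner_vps after cutover"
git push
```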
Phase 3 Completion Checklist¶
- [ ] DNS TTL reduced before cutover
- [ ] DNS updated to new VPS IP
- [ ] DNS propagation verified (dig shows new IP)
- [ ] All services accessible on new VPS
- [ ] 1 week verification period completed
- [ ] No errors in logs
- [ ] Database integrity verified daily
- [ ] Backups running on new VPS
- [ ] Old VPS idle (confirmed via logs and CPU usage)
- [ ] Final backup from old VPS completed
- [ ] Old VPS services stopped
- [ ] Old VPS destroyed (manually)
- [ ] Terraform state updated (resource renamed)
- [ ] Git repository updated and pushed
Migration complete! 🎉
Rollback Procedures¶
Scenario 1: Issues During Phase 1 (DNS/SSH Import)¶
Problem: DNS broken after Terraform import.
Rollback:
# Revert Terraform changes
git log --oneline
git revert <COMMIT_HASH>
terraform plan
terraform apply
# DNS should return to previous state in 5-15 minutes
Alternative: Manually fix DNS via Cloudflare console.
Scenario 2: Issues During Phase 2 (New VPS)¶
Problem: New VPS not working correctly.
Rollback: None needed! The old VPS is still serving production traffic.
Fix:
# Destroy new VPS (remove any prevent_destroy lifecycle rule first, or Terraform refuses)
terraform destroy -target=hcloud_server.hetzner_vps_v2
# Fix cloud-init script or restore procedure
# Recreate
terraform apply
Scenario 3: Issues After DNS Cutover (Phase 3)¶
Problem: Services down or degraded after switching DNS.
Immediate Rollback:
Update dns.tf back to old IP:
resource "cloudflare_record" "root" {
zone_id = var.cloudflare_zone_id
name = "@"
value = "46.224.146.107" # Old VPS IP
type = "A"
proxied = false
ttl = 300
}
Apply:
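Sketch of the rollback apply; with the TTL already at 300 seconds, traffic should drain back to the old VPS within about five minutes:

```shell
terraform plan    # expect: cloudflare_record.root value -> "46.224.146.107"
terraform apply
```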
Alternative: Emergency DNS change via Cloudflare console (faster than Terraform).
Fix new VPS issues, then retry cutover.
Scenario 4: Data Loss on New VPS¶
Problem: Database or photos missing on new VPS.
Recovery:
# SSH to new VPS
ssh hetzner-v2
# Re-mount Storage Box (if unmounted)
rclone mount storagebox:/ /mnt/storagebox --daemon
# Restore databases again from backups
cd /mnt/storagebox/backups/pre-terraform-migration/
gunzip -c immich_db_*.sql.gz | docker exec -i immich-postgres psql -U postgres immich
# Restart containers
docker-compose restart
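After re-restoring, it is worth re-checking against the figures recorded from the old VPS (a sketch; compare the count with your Phase 0 notes):

```shell
# Row count should match the old VPS
docker exec immich-postgres psql -U postgres immich -c "SELECT COUNT(*) FROM assets;"
# All containers back up?
docker ps --format '{{.Names}}: {{.Status}}'
```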
If photos missing: Check Storage Box mount and /mnt/storagebox/immich/upload/
Scenario 5: Infisical ENCRYPTION_KEY Lost¶
Problem: ENCRYPTION_KEY not restored correctly on new VPS.
Recovery:
# SSH to old VPS (if still running)
ssh kavi@100.80.53.55
grep ENCRYPTION_KEY ~/infisical/.env
# Copy to new VPS
ssh hetzner-v2
echo "ENCRYPTION_KEY=<VALUE_FROM_OLD_VPS>" >> /root/infisical/.env
# Restart Infisical
docker-compose restart infisical-api
If the old VPS is already destroyed, restore from backup:
# Decrypt backup
gpg --decrypt /mnt/storagebox/backups/pre-terraform-migration/infisical_env_*.gpg > /tmp/infisical.env
# Copy ENCRYPTION_KEY to /root/infisical/.env
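That copy can be done with grep so only the one needed line is appended, after which the plaintext is removed (a sketch; run docker-compose from the Infisical compose directory):

```shell
grep '^ENCRYPTION_KEY=' /tmp/infisical.env >> /root/infisical/.env
shred -u /tmp/infisical.env             # destroy the decrypted plaintext
docker-compose restart infisical-api
```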
Summary¶
Migration Timeline¶
| Week | Phase | Activities | Risk Level |
|---|---|---|---|
| Week 1 | Phase 0 | Backups, test restores, documentation | CRITICAL (must complete) |
| Week 2 | Phase 1 | Import DNS, add SSH keys, remote state | Low |
| Week 3 | Phase 2 | Create new VPS, test services, parallel run | Medium |
| Week 4 | Phase 3 | DNS cutover, 1 week verification, decommission | Medium-High |
Success Criteria¶
- ✅ Zero data loss
- ✅ Zero downtime (parallel infrastructure approach)
- ✅ All services running on Terraform-managed VPS
- ✅ Can destroy VPS → terraform apply → restore in 45 minutes
- ✅ Multi-device workflow (MacBook, iPad, PC all manage infrastructure)
What You Achieved¶
Before Migration:
- ❌ Manual VPS management
- ❌ No version control for infrastructure
- ❌ Disaster recovery = hours of manual work
- ❌ Multi-device coordination difficult
After Migration:
- ✅ Infrastructure as Code (entire setup in Git)
- ✅ Declarative DNS, SSH keys, VPS configuration
- ✅ Disaster recovery: terraform apply (30 minutes)
- ✅ Multi-device access via remote state (MacBook, iPad, PC)
- ✅ Audit trail (Git history shows all infrastructure changes)
Next Steps¶
- Setup Nightly Backups (see docs/runbooks/backup-procedures.md)
    - Database dumps to Storage Box (3 AM cron job)
    - Terraform state backups (already handled by remote backend)
- Create cloud-init Restore Script (see docs/terraform/cloud-init.md)
    - Auto-restore from Storage Box backups on VPS creation
    - Fully automated disaster recovery
- Document Terraform Workflow (see docs/terraform/workflow.md)
    - Daily operations (plan, apply)
    - Common tasks (adding services, updating configs)
- Implement Monitoring (optional)
    - Uptime monitoring for services
    - Disk space alerts
    - Backup verification
Migration complete. Infrastructure now managed as code. 🚀