Terraform State Management
Understanding and managing Terraform state, the most critical concept in Terraform (planned implementation).
Planned Architecture
This documentation describes the planned Terraform state setup. Terraform is not yet implemented, but state management is documented here in advance to prevent common mistakes.
Critical Concept
The state file is Terraform's memory, NOT a log.
30% of developers get this wrong. The state file tracks the CURRENT state of your infrastructure, not the history of changes. Understanding this is critical to using Terraform safely.
Table of Contents
- What Is Terraform State
- Why State Matters
- The 30% Mistake
- Remote State on Storage Box
- State Locking
- State Operations
- Backup and Recovery
- Common Mistakes
- Security
What Is Terraform State
The Basics
Terraform state is a JSON file (terraform.tfstate) that contains:
- Current infrastructure state: What resources exist
- Resource metadata: IDs, attributes, dependencies
- Mapping: Terraform config → real infrastructure
Example state snippet:
{
  "version": 4,
  "terraform_version": "1.6.5",
  "resources": [
    {
      "mode": "managed",
      "type": "hcloud_server",
      "name": "hetzner_vps",
      "instances": [
        {
          "attributes": {
            "id": "12345678",
            "name": "hetzner-vps",
            "server_type": "cpx42",
            "ipv4_address": "46.224.146.107",
            "status": "running"
          }
        }
      ]
    }
  ]
}
What State Contains
| Data | Example | Purpose |
|---|---|---|
| Resource IDs | id: "12345678" | Map Terraform resource to real infrastructure |
| Attributes | ipv4_address: "46.224.146.107" | Track current values |
| Dependencies | Server depends on SSH key | Determine destroy order |
| Metadata | Creation time, provider version | Internal Terraform use |
| Sensitive Data | Passwords, private keys | ⚠️ Security concern |
How State Is Used
Every Terraform operation uses state:
# terraform plan
1. Read desired state (*.tf files)
2. Read recorded state (terraform.tfstate)
3. Refresh: query provider APIs for the real, current values
4. Compare: desired vs current
5. Output: plan (diff)
# terraform apply
1. Execute plan
2. Call provider APIs (create/update/delete resources)
3. Update state with new values
4. Write state file
# terraform destroy
1. Read state (what exists)
2. Determine destroy order (dependencies)
3. Call provider APIs (delete resources)
4. Update state (remove resources)
5. Write state file
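The compare step of terraform plan can be illustrated with a toy model. This is only a sketch: plan_diff and the flat "address=value" input files are hypothetical, and real Terraform compares structured resource attributes rather than single strings.

```shell
# Toy model of the "compare" step: desired config vs recorded state.
# Each input file holds one "address=value" line per resource.
plan_diff() {
  desired_file=$1
  state_file=$2
  awk -F= '
    FILENAME == ARGV[1] { desired[$1] = $2; next }  # first file: desired config
    { current[$1] = $2 }                            # second file: recorded state
    END {
      for (addr in desired) {
        if (!(addr in current)) {
          print "+ " addr                           # missing from state: create
        } else if (desired[addr] != current[addr]) {
          print "~ " addr                           # value differs: update
        }
      }
      for (addr in current) {
        if (!(addr in desired)) {
          print "- " addr                           # no longer desired: delete
        }
      }
    }
  ' "$desired_file" "$state_file" | LC_ALL=C sort
}
```

A resource present in both files with the same value produces no output, which corresponds to "no changes" for that resource.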
Why State Matters
Without State
Imagine Terraform without state:
# First run
terraform apply
# Creates: hcloud_server.hetzner_vps
# Second run (no changes to *.tf)
terraform apply
# Without state: Terraform doesn't know server exists!
# Result: Tries to create ANOTHER server (conflict)
# With state: Terraform knows server exists
# Result: "No changes. Your infrastructure matches the configuration."
State Enables
1. Resource Tracking
# In .tf file
resource "hcloud_server" "hetzner_vps" {
name = "hetzner-vps"
}
# In state
{
"type": "hcloud_server",
"name": "hetzner_vps",
"id": "12345678" ← Maps to real server in Hetzner
}
2. Dependency Resolution
resource "hcloud_ssh_key" "macbook" { ... }
resource "hcloud_server" "hetzner_vps" {
ssh_keys = [hcloud_ssh_key.macbook.id]
# Terraform reads SSH key ID from state
}
3. Drift Detection
# Someone manually changes server in Hetzner console
# From: CPX42 → To: CPX31 (downgrade)
terraform plan
# Refreshes state from reality (CPX31), then compares it with the config (CPX42)
# Output: "~ hcloud_server.hetzner_vps will be updated"
# server_type: "cpx31" -> "cpx42"
4. Destroy Order
terraform destroy
# State knows dependencies:
# Server depends on SSH key
# Must destroy: Server first, then SSH key
# Without state: Could try to delete SSH key first (fails, still in use)
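Drift detection (item 3 above) can be automated with the -detailed-exitcode flag of terraform plan, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The helper below is a minimal sketch; describe_plan_status is a hypothetical name, and the usage lines assume an already initialized project.

```shell
# Map the exit status of `terraform plan -detailed-exitcode` to a message.
# Documented meanings: 0 = no changes, 1 = error, 2 = changes present.
describe_plan_status() {
  case "$1" in
    0) echo "in sync: no changes" ;;
    2) echo "drift or pending changes" ;;
    *) echo "plan failed" ;;
  esac
}

# Hypothetical usage in a scheduled check:
#   terraform plan -detailed-exitcode -input=false >/dev/null
#   describe_plan_status "$?"
```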
The 30% Mistake
Common Misconception
WRONG: "State file is a log of changes, like git history"
RIGHT: "State file is Terraform's memory of CURRENT infrastructure"
Why This Matters
Scenario: Developer thinks state is a "log"
# Developer creates server
terraform apply
# State now: { server: created, id: 123 }
# Developer thinks:
# "I'll delete state to start fresh, like deleting git history"
rm terraform.tfstate
# Developer runs:
terraform apply
# Terraform reads .tf (wants server)
# Reads state (empty!)
# Thinks: No server exists
# Result: Creates SECOND server (now you have 2!)
# Actual infrastructure: 2 servers (first is "orphaned")
# Terraform only knows about: 1 server (the second one)
Correct understanding:
- State = Terraform's memory of what exists
- Deleting state = Terraform amnesia
- Result: Terraform forgets about existing resources
- Consequence: Can't manage or destroy them anymore
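One way to catch "Terraform amnesia" before it creates duplicates is to refuse to apply when the state unexpectedly lists no resources. The guard below is a sketch under that assumption; require_nonempty_state is a hypothetical helper that reads the output of terraform state list on stdin. On a brand-new project an empty state is legitimate, so this only makes sense once infrastructure exists.

```shell
# Fail when the state lists no resources (input: `terraform state list` output).
require_nonempty_state() {
  # grep -c prints 0 and exits non-zero on empty input, hence the "|| true"
  count=$(grep -c . || true)
  if [ "$count" -eq 0 ]; then
    echo "state lists no resources; refusing to continue" >&2
    return 1
  fi
  echo "state tracks $count resource(s)"
}

# Hypothetical usage:
#   terraform state list | require_nonempty_state && terraform apply
```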
The Fix
If you accidentally delete state:
Option A: Import existing resources (tedious)
# Find resource ID from provider
# Example: Server ID is 12345678
# Import into state
terraform import hcloud_server.hetzner_vps 12345678
# Now Terraform knows about it again
Option B: Restore from backup (if available)
# Restore state from backup
cp terraform.tfstate.backup terraform.tfstate
# Or, with a remote backend, reconnect to the state stored there
# (the state itself never left the backend)
terraform init -reconfigure
Remote State on Storage Box
Why Remote State
Local state (default):
Problems:
- ❌ No collaboration (file on one machine)
- ❌ No locking (concurrent runs corrupt state)
- ❌ No backup (lose file = lose state)
- ❌ Secrets in plaintext on disk
Remote state (recommended):
Benefits:
- ✅ Shared access (multiple developers/machines)
- ✅ State locking (prevent concurrent modifications)
- ✅ Automatic backup (S3 durability)
- ✅ Encryption at rest (Storage Box encryption)
Backend Configuration
terraform.tf:
terraform {
  backend "s3" {
    # Bucket name
    bucket = "terraform-state"
    # State file path within bucket
    key = "infrastructure/terraform.tfstate"
    # Dummy region (required by S3 API but not used by Storage Box)
    region = "us-east-1"
    # Storage Box S3 endpoint
    # (Terraform 1.6+ deprecates `endpoint` in favor of `endpoints = { s3 = "..." }`)
    endpoint = "https://u522581.your-storagebox.de"
    # Credentials (provided via -backend-config flags)
    # access_key = <set via flag>
    # secret_key = <set via flag>
    # S3 compatibility settings
    skip_credentials_validation = true
    skip_region_validation = true
    skip_metadata_api_check = true
    # (Terraform 1.6+ deprecates `force_path_style` in favor of `use_path_style`)
    force_path_style = true
  }
}
Initialize Backend
# From environment (secrets from Infisical)
terraform init \
  -backend-config="access_key=$STORAGEBOX_ACCESS_KEY" \
  -backend-config="secret_key=$STORAGEBOX_SECRET_KEY"
# Terraform downloads state from Storage Box
# Local .terraform/ directory created
# State cached locally, synced on operations
# Note: values passed via -backend-config are saved in plaintext under
# .terraform/, so keep that directory out of Git
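Because the init command depends on two environment variables, it can help to fail early when either is missing. The following is a minimal sketch; require_backend_env is a hypothetical helper, and how the variables get exported (for example by an Infisical wrapper) is outside its scope.

```shell
# Fail early if the backend credentials are missing from the environment.
require_backend_env() {
  missing=""
  for name in STORAGEBOX_ACCESS_KEY STORAGEBOX_SECRET_KEY; do
    # Indirect lookup of a fixed variable name (names are hard-coded above)
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

# Hypothetical usage:
#   require_backend_env && terraform init \
#     -backend-config="access_key=$STORAGEBOX_ACCESS_KEY" \
#     -backend-config="secret_key=$STORAGEBOX_SECRET_KEY"
```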
How It Works
DEVELOPER
↓
terraform plan/apply
↓
TERRAFORM
├─ Reads state from: s3://terraform-state/infrastructure/terraform.tfstate
├─ Locks state (prevents concurrent access)
├─ Performs operation
├─ Writes updated state to S3
└─ Unlocks state
↓
STORAGE BOX
└─ terraform.tfstate (encrypted, backed up)
Multiple Environments
Use different state files per environment:
# Production
backend "s3" {
  key = "prod/terraform.tfstate"
}
# Staging
backend "s3" {
  key = "staging/terraform.tfstate"
}
# Development
backend "s3" {
  key = "dev/terraform.tfstate"
}
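A configuration can contain only one backend block, so the per-environment key is typically supplied at init time through partial backend configuration. The helper below is a sketch of that approach; state_key_for_env is a hypothetical name, and the environment names are the three listed above.

```shell
# Print the backend state key for a known environment name.
state_key_for_env() {
  case "$1" in
    prod|staging|dev) echo "$1/terraform.tfstate" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Hypothetical usage with partial backend configuration:
#   terraform init -reconfigure \
#     -backend-config="key=$(state_key_for_env staging)"
```

Rejecting unknown names guards against a typo silently creating a new, empty state file.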
State Locking
The Problem
Without locking:
DEVELOPER A                   DEVELOPER B
     ↓                             ↓
terraform apply (starts)      terraform apply (starts)
     ↓                             ↓
Read state: server=CPX42      Read state: server=CPX42
     ↓                             ↓
Plan: upgrade to CPX51        Plan: downgrade to CPX31
     ↓                             ↓
Apply: server → CPX51         Apply: server → CPX31
     ↓                             ↓
Write state: CPX51            Write state: CPX31 (overwrites!)
     ↓
State corrupted: Says CPX31, but server is actually CPX51
The Solution: Locking
With locking:
DEVELOPER A                   DEVELOPER B
     ↓                             ↓
terraform apply (starts)      terraform apply (starts)
     ↓                             ↓
Acquire lock ✅               Try lock ❌ (locked by A)
     ↓                             ↓
Read state                    Error: State locked
     ↓                             ↓
Apply changes                 Wait...
     ↓
Write state
     ↓
Release lock
                                   ↓
                              Acquire lock ✅
                              Read updated state
                              Apply changes
Locking Options
Option 1: DynamoDB (AWS) - Full S3 backend support
terraform {
  backend "s3" {
    bucket = "terraform-state"
    key = "infrastructure/terraform.tfstate"
    region = "us-east-1"
    # DynamoDB table for locking
    dynamodb_table = "terraform-state-lock"
    # Requires AWS account + DynamoDB table
  }
}
Pros: Native support, battle-tested
Cons: Requires AWS account, additional service
Option 2: Terraform Cloud - Managed locking
Pros: Built-in locking, free tier, managed
Cons: State stored on Terraform Cloud (not self-hosted)
Option 3: File-based locking - Simple but limited
Storage Box S3 backend doesn't support native locking. Workarounds:
- Single operator (you) - low risk
- Manual coordination ("I'm running terraform, wait 5 min")
- Wrapper script with lock file
Recommendation for Kavi infrastructure:
- Current: Single operator, low risk
- Future (if team grows): Terraform Cloud free tier or AWS DynamoDB
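The "wrapper script with lock file" workaround might look like the sketch below. It relies on mkdir being atomic, so only one caller can create the lock directory. LOCK_DIR and with_state_lock are hypothetical names, and the lock only serializes runs that share the same filesystem; it does nothing for operators on different machines.

```shell
# Run a command while holding a lock directory.
LOCK_DIR=${LOCK_DIR:-/tmp/terraform-state.lock}

with_state_lock() {
  # mkdir either creates the directory (lock acquired) or fails (already held)
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "state is locked ($LOCK_DIR exists); try again later" >&2
    return 1
  fi
  status=0
  "$@" || status=$?
  rmdir "$LOCK_DIR"        # release the lock even if the command failed
  return "$status"
}

# Hypothetical usage:
#   with_state_lock terraform apply
```

If the script is killed mid-run, the lock directory stays behind and has to be removed by hand, which is the main limitation of this approach.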
State Operations
terraform state list
List all managed resources:
terraform state list
# Example output:
# hcloud_ssh_key.macbook
# hcloud_ssh_key.ipad
# hcloud_server.hetzner_vps
# cloudflare_record.root
# cloudflare_record.secrets
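The list output is plain text, so it is easy to summarize. As a small sketch, the hypothetical count_by_type helper below groups addresses by resource type; it assumes root-module addresses of the form type.name (module or data source prefixes would be counted under their first path segment).

```shell
# Summarize `terraform state list` output (read from stdin) by resource type.
count_by_type() {
  awk -F. 'NF { counts[$1]++ } END { for (t in counts) print counts[t], t }' |
    LC_ALL=C sort -k2
}

# Hypothetical usage:
#   terraform state list | count_by_type
```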
terraform state show
Show detailed attributes of a resource:
terraform state show hcloud_server.hetzner_vps
# Output:
# resource "hcloud_server" "hetzner_vps" {
# id = "12345678"
# name = "hetzner-vps"
# server_type = "cpx42"
# ipv4_address = "46.224.146.107"
# status = "running"
# location = "fsn1"
# ...
# }
terraform state mv
Rename a resource in state (doesn't change infrastructure):
# Renamed in .tf files:
# hcloud_server.hetzner_vps → hcloud_server.main_vps
# Update state to match:
terraform state mv hcloud_server.hetzner_vps hcloud_server.main_vps
# Now Terraform knows they're the same resource
Use case: Refactoring Terraform code without recreating resources
terraform state rm
Remove a resource from state (infrastructure still exists!):
# Remove from Terraform management
terraform state rm hcloud_server.hetzner_vps
# Resource still exists in Hetzner
# But Terraform no longer manages it
Use case:
- Migrating resource to a different Terraform project
- Handing off resource to manual management
Danger: Easy to forget about "orphaned" resources
terraform import
Add existing resource to state:
# Resource exists in Hetzner (ID: 12345678)
# But not in Terraform state
# Import it
terraform import hcloud_server.hetzner_vps 12345678
# Now Terraform manages it
Use case:
- Importing manually-created resources
- Recovering from state deletion
- Migrating to Terraform
terraform state pull
Download state file:
# Get current state from backend
terraform state pull > current-state.json
# Useful for inspection or backup
terraform state push
Upload state file (dangerous!):
# Upload modified state
terraform state push modified-state.json
# ⚠️ Use with extreme caution
# Can corrupt state if wrong
Backup and Recovery
Automatic Backups
With the local backend, Terraform creates a backup automatically:
# Before each state write
terraform apply
# Creates:
# terraform.tfstate ← Current state
# terraform.tfstate.backup ← Previous state
Storage Box backups:
- Remote state lives on the Storage Box
- Storage Box has a snapshot feature
- Daily snapshots retained (check Hetzner docs for retention)
Manual Backup
Before risky operations:
# Download current state (works with any backend)
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).json
# Or, if using local state, copy the file directly
cp terraform.tfstate terraform.tfstate.backup-$(date +%Y%m%d)
Recommended frequency:
- Before major changes (new VPS, DNS migration)
- Weekly scheduled backup
- Before running terraform destroy
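The manual backup can be wrapped in a small function that also rejects empty output, which makes it easier to schedule. This is a sketch only: backup_state, TF_BIN, and BACKUP_DIR are hypothetical names, not Terraform features.

```shell
# Save a timestamped copy of the current state and reject empty output.
backup_state() {
  tf=${TF_BIN:-terraform}              # command to run; defaults to terraform
  dir=${BACKUP_DIR:-./state-backups}   # where backups are written
  mkdir -p "$dir" || return 1
  target="$dir/backup-$(date +%Y%m%d-%H%M%S).json"
  if "$tf" state pull > "$target" && [ -s "$target" ]; then
    echo "$target"                     # print the path of the new backup
  else
    rm -f "$target"                    # do not keep a useless empty file
    echo "state pull failed or returned nothing" >&2
    return 1
  fi
}

# Hypothetical usage before a risky change:
#   backup_state && terraform destroy
```

Backups contain the same secrets as the state itself, so the backup directory needs the same protection.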
Recovery Scenarios
Scenario 1: Accidentally deleted local state (remote backend exists)
# Solution: Reinitialize
rm -rf .terraform terraform.tfstate*
terraform init \
-backend-config="access_key=$STORAGEBOX_ACCESS_KEY" \
-backend-config="secret_key=$STORAGEBOX_SECRET_KEY"
# State downloaded from Storage Box
# Back to working state
Scenario 2: Remote state corrupted
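# Restore from manual backup
# (push is rejected if the backup's serial is older than the remote's;
# add -force only after confirming the backup is the state you want)
terraform state push backup-20250115.json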
# Or restore from Storage Box snapshot
# (Requires accessing Storage Box, downloading old snapshot)
Scenario 3: Lost Storage Box state entirely
# Last resort: Rebuild state via imports
# For each resource:
terraform import hcloud_server.hetzner_vps 12345678
# Cloudflare records are imported as <zone_id>/<record_id>
terraform import cloudflare_record.root <zone_id>/<record_id>
# Tedious but possible
# This is why backups are critical
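Rebuilding state by hand is less error-prone when the address-to-ID mapping is kept in a file and the commands are generated from it. The sketch below assumes a hypothetical mapping file with one "address id" pair per line; print_import_commands only prints the commands so they can be reviewed before being run.

```shell
# Print one `terraform import` command per "address id" line of a mapping file.
# Blank lines and lines starting with # are skipped.
print_import_commands() {
  while read -r address id; do
    case "$address" in ''|'#'*) continue ;; esac
    [ -n "$id" ] || { echo "no id for $address" >&2; return 1; }
    printf 'terraform import %s %s\n' "$address" "$id"
  done < "$1"
}

# Hypothetical usage:
#   print_import_commands resources.txt        # review the commands
#   print_import_commands resources.txt | sh   # then run them
```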
Common Mistakes
1. Deleting State File
Mistake: Running rm terraform.tfstate to "start fresh"
Result: Terraform forgets all resources (amnesia)
Fix: Restore from backup or import resources
2. Committing State to Git
Mistake: Running git add terraform.tfstate and committing it
Problems:
- Secrets exposed in Git history
- Merge conflicts on collaborative projects
- State diverges from reality
Fix: Add to .gitignore, use remote backend
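Adding the ignore rules can be scripted so that repeated runs do not duplicate entries. This is a minimal sketch; ignore_state_files is a hypothetical helper, and the three patterns are a common choice rather than something this project has settled on.

```shell
# Append Terraform state patterns to a .gitignore file, skipping any that
# are already present.
ignore_state_files() {
  gitignore=${1:-.gitignore}
  touch "$gitignore"
  for pattern in '*.tfstate' '*.tfstate.*' '.terraform/'; do
    # -x matches whole lines, -F treats the pattern as a literal string
    grep -qxF -- "$pattern" "$gitignore" || printf '%s\n' "$pattern" >> "$gitignore"
  done
}
```

Ignoring the file does not remove a state file that was already committed; it would still be in the Git history.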
3. Editing State Manually
Mistake: Hand-editing terraform.tfstate in a text editor (for example, changing server_type)
Result: State says CPX42, reality is CPX31 (drift)
Fix: Use terraform state commands or terraform apply
4. Concurrent Runs Without Locking
Mistake: Two operators (or two terminals) running terraform apply at the same time
Result: State corruption
Fix: Use state locking (DynamoDB or Terraform Cloud)
5. Losing State Backups
Mistake: Only one copy of state (no backups)
Result: Corruption = total loss
Fix: Multiple backups (remote + local + manual)
Security
Secrets in State
Problem: State contains sensitive data
{
  "resources": [
    {
      "type": "random_password",
      "attributes": {
        "result": "MySecretPassword123!" ← Plaintext in state
      }
    }
  ]
}
Mitigation:
- Encrypt state at rest: Storage Box encryption
- Restrict access: Only necessary team members
- Never commit to Git: Use remote backend only
- Use Infisical for secrets: Don't generate secrets in Terraform
State File Permissions
Local state (if used): restrict the file to its owner (for example, chmod 600 terraform.tfstate).
Remote state:
- Storage Box credentials in Infisical only
- TF_VAR_* environment variables (not .tfvars files)
- Never commit backend credentials to Git
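For local state, owner-only permissions can be applied and verified in one step. The helper below is a sketch; secure_state_file is a hypothetical name, and it checks the mode string printed by ls -l rather than relying on a platform-specific stat invocation.

```shell
# Restrict a local state file to its owner and confirm the result.
secure_state_file() {
  state_path=${1:-terraform.tfstate}
  chmod 600 "$state_path" || return 1
  # The first 10 characters of `ls -l` output are the file type and mode bits
  perms=$(ls -l "$state_path" | cut -c1-10)
  [ "$perms" = "-rw-------" ]
}
```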
Audit Access
Who can access state?
- Anyone with Storage Box credentials
- Anyone who runs terraform in the project
Best practices:
- Rotate Storage Box password periodically
- Audit who has Infisical access to STORAGEBOX_* secrets
- Monitor Storage Box access logs (if available)
Summary
Terraform State Essentials:
- ✅ State = Terraform's memory (not a log)
- ✅ Contains current infrastructure state
- ✅ Enables resource tracking, dependency resolution, drift detection
- ✅ Remote backend (Storage Box S3) for collaboration
- ✅ State locking prevents corruption
- ✅ Backup before risky operations
The 30% Mistake:
- ❌ State is NOT a log or history
- ✅ State is Terraform's memory of current infrastructure
- ❌ Deleting state = amnesia (resources orphaned)
- ✅ Restore from backup or import resources
Remote State on Storage Box:
- Location: s3://terraform-state/infrastructure/terraform.tfstate
- Backend: S3-compatible endpoint
- Encryption: At rest on Storage Box
- Locking: Manual coordination or Terraform Cloud (future)
Critical Operations:
- terraform state list - List managed resources
- terraform state show - Show resource details
- terraform import - Import existing resources
- terraform state pull - Backup state
- Always backup before major changes
What's Next:
- Learn Providers - Hetzner, Cloudflare, Docker setup
- Learn Workflow - best practices with state
- Learn VPS Management - provision infrastructure