Terraform State Management

Understanding and managing Terraform state, the most critical concept in Terraform (planned implementation).


Planned Architecture

This documentation describes the planned Terraform state setup. Terraform is not yet implemented, but state management is documented here in advance to prevent common mistakes.


Critical Concept

The state file is Terraform's memory, NOT a log.

30% of developers get this wrong. The state file tracks the CURRENT state of your infrastructure, not the history of changes. Understanding this is critical to using Terraform safely.


What Is Terraform State

The Basics

Terraform state is a JSON file (terraform.tfstate) that contains:

  • Current infrastructure state: What resources exist
  • Resource metadata: IDs, attributes, dependencies
  • Mapping: Terraform config → real infrastructure

Example state snippet:

{
  "version": 4,
  "terraform_version": "1.6.5",
  "resources": [
    {
      "mode": "managed",
      "type": "hcloud_server",
      "name": "hetzner_vps",
      "instances": [
        {
          "attributes": {
            "id": "12345678",
            "name": "hetzner-vps",
            "server_type": "cpx42",
            "ipv4_address": "46.224.146.107",
            "status": "running"
          }
        }
      ]
    }
  ]
}

What State Contains

Data             Example                           Purpose
Resource IDs     id: "12345678"                    Map Terraform resource to real infrastructure
Attributes       ipv4_address: "46.224.146.107"    Track current values
Dependencies     Server depends on SSH key         Determine destroy order
Metadata         Creation time, provider version   Internal Terraform use
Sensitive Data   Passwords, private keys           ⚠️ Security concern
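
Because state is plain JSON, its contents can be inspected directly. A minimal sketch using jq (assumes jq is installed; the file name matches the local default):

```shell
# List every managed resource as type.name together with its provider-side ID.
STATE=terraform.tfstate

# With a remote backend, fetch a local copy first:
#   terraform state pull > "$STATE"
if [ -f "$STATE" ]; then
  jq -r '.resources[]
         | select(.mode == "managed")
         | "\(.type).\(.name) => \(.instances[0].attributes.id)"' "$STATE"
fi
```

Against the example state above, this prints one line per resource, e.g. the server mapped to its Hetzner ID.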

How State Is Used

Every Terraform operation uses state:

# terraform plan
1. Read desired state (*.tf files)
2. Read current state (terraform.tfstate)
3. Compare: desired vs current
4. Output: plan (diff)

# terraform apply
1. Execute plan
2. Call provider APIs (create/update/delete resources)
3. Update state with new values
4. Write state file

# terraform destroy
1. Read state (what exists)
2. Determine destroy order (dependencies)
3. Call provider APIs (delete resources)
4. Update state (remove resources)
5. Write state file

Why State Matters

Without State

Imagine Terraform without state:

# First run
terraform apply
# Creates: hcloud_server.hetzner_vps

# Second run (no changes to *.tf)
terraform apply
# Without state: Terraform doesn't know server exists!
# Result: Tries to create ANOTHER server (conflict)

# With state: Terraform knows server exists
# Result: "No changes. Infrastructure is up-to-date."

State Enables

1. Resource Tracking

# In .tf file
resource "hcloud_server" "hetzner_vps" {
  name = "hetzner-vps"
}

# In state
{
  "type": "hcloud_server",
  "name": "hetzner_vps",
  "id": "12345678"        ← Maps to real server in Hetzner
}

2. Dependency Resolution

resource "hcloud_ssh_key" "macbook" { ... }

resource "hcloud_server" "hetzner_vps" {
  ssh_keys = [hcloud_ssh_key.macbook.id]
  # Terraform reads SSH key ID from state
}

3. Drift Detection

# Someone manually changes server in Hetzner console
# From: CPX42 → To: CPX31 (downgrade)

terraform plan
# Compares state (CPX42) vs reality (CPX31)
# Output: "~ hcloud_server.hetzner_vps will be updated"
#         server_type: "cpx31" -> "cpx42"

4. Destroy Order

terraform destroy

# State knows dependencies:
# Server depends on SSH key
# Must destroy: Server first, then SSH key

# Without state: Could try to delete SSH key first (fails, still in use)


The 30% Mistake

Common Misconception

WRONG: "State file is a log of changes, like git history"

RIGHT: "State file is Terraform's memory of CURRENT infrastructure"

Why This Matters

Scenario: Developer thinks state is a "log"

# Developer creates server
terraform apply
# State now: { server: created, id: 123 }

# Developer thinks:
# "I'll delete state to start fresh, like deleting git history"
rm terraform.tfstate

# Developer runs:
terraform apply
# Terraform reads .tf (wants server)
# Reads state (empty!)
# Thinks: No server exists
# Result: Creates SECOND server (now you have 2!)

# Actual infrastructure: 2 servers (first is "orphaned")
# Terraform only knows about: 1 server (the second one)

Correct understanding:

  • State = Terraform's memory of what exists
  • Deleting state = Terraform amnesia
  • Result: Terraform forgets about existing resources
  • Consequence: Can't manage or destroy them anymore

The Fix

If you accidentally delete state:

Option A: Import existing resources (tedious)

# Find resource ID from provider
# Example: Server ID is 12345678

# Import into state
terraform import hcloud_server.hetzner_vps 12345678

# Now Terraform knows about it again

Option B: Restore from backup (if available)

# Restore state from backup
cp terraform.tfstate.backup terraform.tfstate

# Or from remote backend
terraform init -reconfigure


Remote State on Storage Box

Why Remote State

Local state (default):

terraform.tfstate in project directory

Problems:

  • ❌ No collaboration (file on one machine)
  • ❌ No locking (concurrent runs corrupt state)
  • ❌ No backup (lose file = lose state)
  • ❌ Secrets in plaintext on disk

Remote state (recommended):

s3://terraform-state/infrastructure/terraform.tfstate

Benefits:

  • ✅ Shared access (multiple developers/machines)
  • ✅ State locking (prevent concurrent modifications)
  • ✅ Automatic backup (S3 durability)
  • ✅ Encryption at rest (Storage Box encryption)

Backend Configuration

terraform.tf:

terraform {
  backend "s3" {
    # Bucket name
    bucket = "terraform-state"

    # State file path within bucket
    key    = "infrastructure/terraform.tfstate"

    # Dummy region (required by S3 API but not used by Storage Box)
    region = "us-east-1"

    # Storage Box S3 endpoint
    endpoint = "https://u522581.your-storagebox.de"

    # Credentials (provided via -backend-config flags)
    # access_key = <set via flag>
    # secret_key = <set via flag>

    # S3 compatibility settings
    skip_credentials_validation = true
    skip_region_validation      = true
    skip_metadata_api_check     = true

    # Path-style addressing (Terraform 1.6+ renamed force_path_style
    # to use_path_style)
    use_path_style              = true
  }
}

Initialize Backend

# From environment (secrets from Infisical)
terraform init \
  -backend-config="access_key=$STORAGEBOX_ACCESS_KEY" \
  -backend-config="secret_key=$STORAGEBOX_SECRET_KEY"

# Terraform downloads state from Storage Box
# Local .terraform/ directory created
# State cached locally, synced on operations

How It Works

DEVELOPER
terraform plan/apply
TERRAFORM
    ├─ Reads state from: s3://terraform-state/infrastructure/terraform.tfstate
    ├─ Locks state (prevents concurrent access)
    ├─ Performs operation
    ├─ Writes updated state to S3
    └─ Unlocks state
STORAGE BOX
    └─ terraform.tfstate (encrypted, backed up)

Multiple Environments

Use different state files per environment:

# Production
backend "s3" {
  key = "prod/terraform.tfstate"
}

# Staging
backend "s3" {
  key = "staging/terraform.tfstate"
}

# Development
backend "s3" {
  key = "dev/terraform.tfstate"
}
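
Rather than editing the backend block per environment, the key can be supplied at init time through Terraform's partial backend configuration. A sketch; the helper name and the .tfbackend file names are illustrative:

```shell
# make_backend_config ENV -- writes ENV.s3.tfbackend containing the state
# key for that environment, for use with partial backend configuration.
make_backend_config() {
  env=$1
  cat > "$env.s3.tfbackend" <<EOF
key = "$env/terraform.tfstate"
EOF
}

# Illustrative use:
#   make_backend_config staging
#   terraform init -backend-config=staging.s3.tfbackend
```

Terraform merges the generated key with the settings already present in the backend block, so one terraform.tf serves all environments.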

State Locking

The Problem

Without locking:

DEVELOPER A                  DEVELOPER B
    ↓                            ↓
terraform apply (starts)    terraform apply (starts)
    ↓                            ↓
Read state: server=CPX42    Read state: server=CPX42
    ↓                            ↓
Plan: upgrade to CPX51      Plan: downgrade to CPX31
    ↓                            ↓
Apply: server → CPX51       Apply: server → CPX31
    ↓                            ↓
Write state: CPX51          Write state: CPX31 (overwrites!)
State corrupted: Says CPX31, but server is actually CPX51

The Solution: Locking

With locking:

DEVELOPER A                  DEVELOPER B
    ↓                            ↓
terraform apply (starts)    terraform apply (starts)
    ↓                            ↓
Acquire lock ✅             Try lock ❌ (locked by A)
    ↓                            ↓
Read state                  Error: state locked
    ↓                       (or waits, with -lock-timeout)
Apply changes                    ↓
    ↓                       Wait...
Write state                      ↓
    ↓                            ↓
Release lock                Acquire lock ✅
                                 ↓
                            Read updated state
                                 ↓
                            Apply changes

Locking Options

Option 1: DynamoDB (AWS) - Full S3 backend support

terraform {
  backend "s3" {
    bucket         = "terraform-state"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"

    # DynamoDB table for locking
    dynamodb_table = "terraform-state-lock"

    # Requires AWS account + DynamoDB table
  }
}

Pros: Native support, battle-tested
Cons: Requires AWS account, additional service

Option 2: Terraform Cloud - Managed locking

terraform {
  backend "remote" {
    organization = "kavi-infra"

    workspaces {
      name = "production"
    }
  }
}

Pros: Built-in locking, free tier, managed
Cons: State stored on Terraform Cloud (not self-hosted)

Option 3: File-based locking - Simple but limited

The Storage Box S3 backend doesn't support native locking. Workarounds:

  • Single operator (you) - low risk
  • Manual coordination ("I'm running terraform, wait 5 min")
  • Wrapper script with lock file

Recommendation for Kavi infrastructure:

  • Current: Single operator, low risk
  • Future (if the team grows): Terraform Cloud free tier or AWS DynamoDB
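
The wrapper-script workaround can be sketched with an atomic mkdir-based lock. The script name and lock path are assumptions, and this only serializes runs on a single machine, not across machines:

```shell
#!/bin/sh
# tf-locked.sh -- illustrative wrapper (hypothetical name); serializes
# terraform runs on one machine with an atomic mkdir-based lock.
set -eu

LOCK_DIR=".terraform-run.lock"   # assumption: lock lives in the project dir

# mkdir is atomic: exactly one concurrent caller can create the directory.
if ! mkdir "$LOCK_DIR" 2>/dev/null; then
  echo "State locked by another run (remove $LOCK_DIR if stale)" >&2
  exit 1
fi
trap 'rmdir "$LOCK_DIR"' EXIT INT TERM

# Run the requested command while holding the lock, e.g.:
#   ./tf-locked.sh terraform apply
"$@"
```

The trap releases the lock even if the wrapped command fails or is interrupted.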


State Operations

terraform state list

List all managed resources:

terraform state list

# Example output:
# hcloud_ssh_key.macbook
# hcloud_ssh_key.ipad
# hcloud_server.hetzner_vps
# cloudflare_record.root
# cloudflare_record.secrets

terraform state show

Show detailed attributes of a resource:

terraform state show hcloud_server.hetzner_vps

# Output:
# resource "hcloud_server" "hetzner_vps" {
#     id              = "12345678"
#     name            = "hetzner-vps"
#     server_type     = "cpx42"
#     ipv4_address    = "46.224.146.107"
#     status          = "running"
#     location        = "fsn1"
#     ...
# }

terraform state mv

Rename a resource in state (doesn't change infrastructure):

# Renamed in .tf files:
# hcloud_server.hetzner_vps → hcloud_server.main_vps

# Update state to match:
terraform state mv hcloud_server.hetzner_vps hcloud_server.main_vps

# Now Terraform knows they're the same resource

Use case: Refactoring Terraform code without recreating resources

terraform state rm

Remove a resource from state (infrastructure still exists!):

# Remove from Terraform management
terraform state rm hcloud_server.hetzner_vps

# Resource still exists in Hetzner
# But Terraform no longer manages it

Use cases:

  • Migrating a resource to a different Terraform project
  • Handing off a resource to manual management
  • Danger: Easy to forget about "orphaned" resources

terraform import

Add existing resource to state:

# Resource exists in Hetzner (ID: 12345678)
# But not in Terraform state

# Import it
terraform import hcloud_server.hetzner_vps 12345678

# Now Terraform manages it

Use cases:

  • Importing manually-created resources
  • Recovering from state deletion
  • Migrating existing infrastructure to Terraform

terraform state pull

Download state file:

# Get current state from backend
terraform state pull > current-state.json

# Useful for inspection or backup

terraform state push

Upload state file (dangerous!):

# Upload modified state
terraform state push modified-state.json

# ⚠️ Use with extreme caution
# Can corrupt state if wrong
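
Terraform itself refuses a push when the lineage differs or the serial goes backwards (unless -force is used); even so, comparing serials explicitly before pushing is a cheap sanity check. A sketch assuming jq and local copies of both state files (file names are illustrative):

```shell
# serial_ok OLD NEW -- succeeds only if NEW's serial is >= OLD's serial.
# Terraform increments the serial on every state write, so a lower serial
# means the file you are about to push is older than the backend copy.
serial_ok() {
  old=$(jq .serial "$1")
  new=$(jq .serial "$2")
  [ "$new" -ge "$old" ]
}

# Illustrative use (fetch the backend state first):
#   terraform state pull > backend.json
#   serial_ok backend.json modified-state.json \
#     && terraform state push modified-state.json
```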

Backup and Recovery

Automatic Backups

Terraform creates backups automatically:

# Before each state write
terraform apply

# Creates:
# terraform.tfstate        ← Current state
# terraform.tfstate.backup ← Previous state

Storage Box backups:

  • Remote state lives on the Storage Box
  • Storage Box has a snapshot feature
  • Daily snapshots retained (check Hetzner docs for retention)

Manual Backup

Before risky operations:

# Download current state (works with any backend)
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).json

# Or, with local state, copy the file directly
cp terraform.tfstate terraform.tfstate.backup-$(date +%Y%m%d)

Recommended frequency:

  • Before major changes (new VPS, DNS migration)
  • Weekly scheduled backup
  • Before running terraform destroy
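
The manual backup can be wrapped in a small helper that also prunes old copies. A sketch; the backup directory and retention count are assumptions:

```shell
# backup_state STATE_FILE BACKUP_DIR KEEP
# Copies STATE_FILE into BACKUP_DIR with a timestamp and deletes all but
# the KEEP newest backups.
backup_state() {
  state_file=$1; backup_dir=$2; keep=$3
  mkdir -p "$backup_dir"
  cp "$state_file" "$backup_dir/state-$(date +%Y%m%d-%H%M%S).json"
  # Zero-padded, year-first timestamps sort lexicographically, so the
  # newest backups sort first under sort -r; remove everything after them.
  ls -1 "$backup_dir"/state-*.json | sort -r | tail -n +"$((keep + 1))" | xargs -r rm --
}

# With a remote backend, pull a local copy first:
#   terraform state pull > current.json && backup_state current.json backups 10
```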

Recovery Scenarios

Scenario 1: Accidentally deleted local state (remote backend exists)

# Solution: Reinitialize
rm -rf .terraform terraform.tfstate*
terraform init \
  -backend-config="access_key=$STORAGEBOX_ACCESS_KEY" \
  -backend-config="secret_key=$STORAGEBOX_SECRET_KEY"

# State downloaded from Storage Box
# Back to working state

Scenario 2: Remote state corrupted

# Restore from manual backup
terraform state push backup-20250115.json

# Or restore from Storage Box snapshot
# (Requires accessing Storage Box, downloading old snapshot)

Scenario 3: Lost Storage Box state entirely

# Last resort: Rebuild state via imports
# For each resource:

terraform import hcloud_server.hetzner_vps 12345678
# Cloudflare records import as <zone-id>/<record-id>
terraform import cloudflare_record.root <zone-id>/abc123def456

# Tedious but possible
# This is why backups are critical

Common Mistakes

1. Deleting State File

Mistake:

rm terraform.tfstate  # "Start fresh"

Result: Terraform forgets all resources (amnesia)

Fix: Restore from backup or import resources

2. Committing State to Git

Mistake:

git add terraform.tfstate
git commit -m "Add Terraform state"

Problems:

  • Secrets exposed in Git history
  • Merge conflicts on collaborative projects
  • State diverges from reality

Fix: Add to .gitignore, use remote backend

3. Editing State Manually

Mistake:

# vim terraform.tfstate
# Change server_type: "cpx31" → "cpx42"

Result: State says CPX42, reality is CPX31 (drift)

Fix: Use terraform state commands or terraform apply

4. Concurrent Runs Without Locking

Mistake:

# Terminal A
terraform apply &

# Terminal B (while A is running)
terraform apply

Result: State corruption

Fix: Use state locking (DynamoDB or Terraform Cloud)

5. Losing State Backups

Mistake: Only one copy of state (no backups)

Result: Corruption = total loss

Fix: Multiple backups (remote + local + manual)


Security

Secrets in State

Problem: State contains sensitive data

{
  "resources": [
    {
      "type": "random_password",
      "attributes": {
        "result": "MySecretPassword123!"   ← Plaintext in state
      }
    }
  ]
}

Mitigation:

  1. Encrypt state at rest: Storage Box encryption
  2. Restrict access: Only necessary team members
  3. Never commit to Git: Use remote backend only
  4. Use Infisical for secrets: Don't generate secrets in Terraform
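
To check whether secrets have leaked into state, the pulled state can be scanned for attribute names that look sensitive. A rough heuristic sketch using jq (the name list is illustrative, not exhaustive, and it matches key names rather than values):

```shell
# scan_state FILE -- print JSON paths whose final key suggests a secret.
scan_state() {
  jq -r 'paths
         | select(.[-1] | tostring
                        | test("password|secret|private_key|token|^result$"; "i"))
         | join(".")' "$1"
}

# Illustrative use:
#   terraform state pull > state.json && scan_state state.json
```

Any hit is a candidate for moving the secret out of Terraform (e.g. into Infisical) rather than generating it in-config.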

State File Permissions

Local state (if used):

# Restrictive permissions
chmod 600 terraform.tfstate
chmod 600 terraform.tfstate.backup
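
A small helper can apply the restrictive mode and verify it took effect in one step (a sketch; the BSD stat fallback is an assumption):

```shell
# ensure_private FILE -- tighten permissions to owner read/write only,
# then fail unless the resulting mode really is 600.
ensure_private() {
  chmod 600 "$1"
  # GNU stat uses -c %a; the fallback covers BSD/macOS stat.
  [ "$(stat -c %a "$1" 2>/dev/null || stat -f %Lp "$1")" = "600" ]
}

# Illustrative use:
#   ensure_private terraform.tfstate
#   ensure_private terraform.tfstate.backup
```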

Remote state:

  • Storage Box credentials in Infisical only
  • TF_VAR environment variables (not .tfvars files)
  • Never commit backend credentials to Git

Audit Access

Who can access state?

  • Anyone with Storage Box credentials
  • Anyone who runs terraform in the project

Best practices:

  • Rotate the Storage Box password periodically
  • Audit who has Infisical access to STORAGEBOX_* secrets
  • Monitor Storage Box access logs (if available)


Summary

Terraform State Essentials:

  • ✅ State = Terraform's memory (not a log)
  • ✅ Contains current infrastructure state
  • ✅ Enables resource tracking, dependency resolution, drift detection
  • ✅ Remote backend (Storage Box S3) for collaboration
  • ✅ State locking prevents corruption
  • ✅ Backup before risky operations

The 30% Mistake:

  • ❌ State is NOT a log or history
  • ✅ State is Terraform's memory of current infrastructure
  • ❌ Deleting state = amnesia (resources orphaned)
  • ✅ Restore from backup or import resources

Remote State on Storage Box:

  • Location: s3://terraform-state/infrastructure/terraform.tfstate
  • Backend: S3-compatible endpoint
  • Encryption: At rest on Storage Box
  • Locking: Manual coordination or Terraform Cloud (future)

Critical Operations:

  • terraform state list - List managed resources
  • terraform state show - Show resource details
  • terraform import - Import existing resources
  • terraform state pull - Backup state
  • Always backup before major changes

What's Next:

  • Learn Providers - Hetzner, Cloudflare, Docker setup
  • Learn Workflow - best practices with state
  • Learn VPS Management - provision infrastructure