Storing your Terraform state remotely in an S3 bucket isn’t just about not losing your state file; it’s fundamentally about enabling collaboration and preventing accidental overwrites.
Let’s watch Terraform in action, managing a simple S3 bucket and configuring itself to use that same bucket for its state.
# main.tf
resource "aws_s3_bucket" "terraform_state_bucket" {
  bucket = "my-unique-terraform-state-bucket-20231027" # Must be globally unique

  tags = {
    Name        = "Terraform State Bucket"
    Environment = "Development"
  }
}
resource "aws_s3_bucket_versioning" "terraform_state_bucket_versioning" {
  bucket = aws_s3_bucket.terraform_state_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_dynamodb_table" "terraform_state_lock_table" {
  name         = "terraform-state-lock-dynamodb"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-unique-terraform-state-bucket-20231027" # Matches the bucket name above
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-dynamodb" # Matches the DynamoDB table name above
    encrypt        = true
  }
}
A common point of confusion: terraform init does not create the S3 bucket or the DynamoDB table. The backend resources must already exist before Terraform can initialize against them, which creates a bootstrapping problem. The usual approach is to create the bucket and table first using local state, then add the backend block and run terraform init -migrate-state to move the existing state into S3. From then on, terraform apply reads the state from s3://my-unique-terraform-state-bucket-20231027/dev/terraform.tfstate and uses the DynamoDB table for locking.
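The bootstrap dance looks roughly like this (a sketch, assuming AWS credentials are configured and the files above sit in the current directory):

```shell
# 1. With backend.tf absent (or its backend block commented out),
#    create the bucket and lock table using local state:
terraform init
terraform apply

# 2. Restore backend.tf, then migrate the local state into S3:
terraform init -migrate-state

# 3. Once the migration succeeds, the local terraform.tfstate is no
#    longer authoritative and can be removed.
```

Some teams keep these bootstrap resources in a separate, dedicated Terraform configuration so the state backend isn't managed by the state it stores.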
The core problem this solves is the single, local terraform.tfstate file. Imagine two engineers running terraform apply simultaneously. Without remote state and locking, they could each download the same terraform.tfstate, make different changes, and then one’s apply would overwrite the other’s, leading to a desynchronized infrastructure. The S3 backend stores the state file in a central, accessible location. The DynamoDB table acts as a gatekeeper: before Terraform modifies the state, it tries to acquire a lock in the DynamoDB table. If the lock exists, it means someone else is currently modifying the state, and Terraform will wait or error out, preventing concurrent writes.
You control the bucket name (which must be globally unique across all AWS accounts), the key (the path within the bucket where the state file will live), the region where these resources reside, and the dynamodb_table name for locking. Setting encrypt = true ensures that your state file is encrypted at rest within S3.
The locking mechanism is crucial. When Terraform attempts to acquire a lock, it writes an item to the DynamoDB table whose LockID is the state path (here, my-unique-terraform-state-bucket-20231027/dev/terraform.tfstate), along with metadata about who holds the lock, which operation is running, and when it was acquired. There is no TTL: the lock persists until Terraform releases it. If a terraform apply is killed before it can clean up (e.g., a machine crash or a hard kill), the lock remains in the table, blocking future operations. To clear a stale lock, run terraform force-unlock with the lock ID, or, as a last resort, delete the item from the DynamoDB table directly.
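Clearing a stale lock looks something like this (a sketch; the lock ID below is a made-up example, taken in practice from the "Error acquiring state lock" message):

```shell
# Preferred: let Terraform remove its own lock item. Run this from the
# working directory of the configuration that owns the state.
terraform force-unlock 6638cc05-8b5e-4f0c-9c5a-1234567890ab  # example ID only

# Last resort: inspect and delete the lock item directly.
aws dynamodb scan --table-name terraform-state-lock-dynamodb
aws dynamodb delete-item \
  --table-name terraform-state-lock-dynamodb \
  --key '{"LockID": {"S": "my-unique-terraform-state-bucket-20231027/dev/terraform.tfstate"}}'
```

Only force-unlock a state you are certain no one is actively applying; removing a live lock reintroduces exactly the concurrent-write problem the table exists to prevent.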
When you run terraform plan or terraform apply, Terraform first attempts to acquire the lock in the specified DynamoDB table, then downloads the state file from the S3 bucket and reads the current infrastructure state. If the lock is acquired, it proceeds with generating the plan or applying the changes. When apply finishes, whether it succeeded or failed, Terraform uploads the updated state to S3 and releases the lock. The lock is left behind only when the process dies before it can clean up, and that is when manual intervention is needed.
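The protocol above boils down to a conditional write: the lock is acquired only if no item with that LockID exists yet. This is an illustrative model in Python, not Terraform's actual code, using an in-memory dict to stand in for DynamoDB's conditional PutItem:

```python
class LockTable:
    """In-memory stand-in for a DynamoDB table with conditional puts."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, lock_id, info):
        # Mirrors PutItem with a condition that the LockID must not exist.
        if lock_id in self.items:
            return False  # someone else holds the lock
        self.items[lock_id] = info
        return True

    def delete(self, lock_id):
        self.items.pop(lock_id, None)


def apply_with_lock(table, state_path, who, work):
    """Acquire the lock keyed by the state path, run the work, release."""
    if not table.put_if_absent(state_path, {"Who": who}):
        holder = table.items[state_path]["Who"]
        raise RuntimeError(f"state locked by {holder}")
    try:
        return work()  # plan/apply against the downloaded state
    finally:
        table.delete(state_path)  # released even if the work raises


table = LockTable()
path = "my-unique-terraform-state-bucket-20231027/dev/terraform.tfstate"
print(apply_with_lock(table, path, "alice", lambda: "applied"))  # applied
```

Note the finally block: the lock is released on success or failure alike, which is why only a hard kill, never an ordinary error, strands a lock.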
The ability to encrypt your state file at rest in S3 using encrypt = true is a good default, but by itself it uses SSE-S3 (Server-Side Encryption with Amazon S3-Managed Keys). For control over the encryption keys, set kms_key_id in the backend configuration to the ARN of an AWS KMS key, which switches the state object to SSE-KMS. This adds another layer of security and auditability for sensitive environments.
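A sketch of the backend block using a customer-managed KMS key (the key ARN below is a placeholder; substitute your own):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-unique-terraform-state-bucket-20231027"
    key            = "dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-dynamodb"
    encrypt        = true
    # Placeholder ARN: point this at a real KMS key you manage.
    kms_key_id     = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
  }
}
```

Anyone running Terraform against this state then also needs kms:Decrypt and kms:GenerateDataKey permissions on that key, which is part of the appeal: key policy becomes another gate on who can read the state.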
The most surprising part of the S3 backend for many is how robust the locking is, but also how brittle it can be if not managed. A common misconception is that Terraform automatically cleans up stale locks. It doesn't. If a process is killed mid-apply, the lock remains, and subsequent runs fail with Error: Error acquiring state lock, printing the lock's ID, path, holder, and creation time. You then need to run terraform force-unlock with that ID (or, as a last resort, delete the item from the DynamoDB table).
Understanding the interplay between S3 for state storage and DynamoDB for locking is key to managing infrastructure as code effectively in a team.