Terraform’s remote state management is surprisingly complex, and the common setup with S3 and DynamoDB is often misunderstood.
Let’s see it in action. Imagine you have two Terraform configurations, project-a and project-b, both targeting the same AWS account.
# project-a/main.tf
provider "aws" {
  region = "us-east-1"
}

terraform {
  backend "s3" {
    bucket         = "my-unique-terraform-state-bucket"
    key            = "project-a.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-dynamodb-table"
  }
}

resource "aws_instance" "example_a" {
  ami           = "ami-0c55b159cbfafe1f0" # Example AMI, replace with a valid one
  instance_type = "t2.micro"

  tags = {
    Name = "ProjectAInstance"
  }
}
# project-b/main.tf
provider "aws" {
  region = "us-east-1"
}

terraform {
  backend "s3" {
    bucket         = "my-unique-terraform-state-bucket"
    key            = "project-b.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock-dynamodb-table"
  }
}

resource "aws_instance" "example_b" {
  ami           = "ami-0c55b159cbfafe1f0" # Example AMI, replace with a valid one
  instance_type = "t2.micro"

  tags = {
    Name = "ProjectBInstance"
  }
}
When terraform init is run in project-a, it configures the S3 backend. Note that Terraform does not create the backend resources for you: if my-unique-terraform-state-bucket or terraform-state-lock-dynamodb-table doesn't exist, init fails with an error. You must provision both ahead of time, either manually or with a separate bootstrap configuration. Once they exist, the S3 bucket stores your project-a.tfstate file, and the DynamoDB table is used for state locking.
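Using the names from the configurations above, both backend resources can be bootstrapped once with the AWS CLI before any terraform init. This is a sketch; adjust the bucket name, table name, and region to your setup:

```shell
# Create the state bucket (us-east-1 needs no LocationConstraint)
aws s3api create-bucket \
  --bucket my-unique-terraform-state-bucket \
  --region us-east-1

# Enable versioning so earlier state revisions can be recovered
aws s3api put-bucket-versioning \
  --bucket my-unique-terraform-state-bucket \
  --versioning-configuration Status=Enabled

# Create the lock table; Terraform requires a string hash key named LockID
aws dynamodb create-table \
  --table-name terraform-state-lock-dynamodb-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1
```

On-demand billing (PAY_PER_REQUEST) is usually the right choice here: lock traffic is tiny and bursty, so provisioned capacity would sit idle.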
The magic happens during terraform apply. Before Terraform makes any changes to your infrastructure, it attempts to acquire a lock by writing an item to the DynamoDB table. The item's LockID is the full state path in bucket/key form (e.g., my-unique-terraform-state-bucket/project-a.tfstate), and it carries a unique ID identifying the Terraform process. If another Terraform process is already holding the lock for that state file, your apply fails with an "Error acquiring the state lock" message. Once the apply is complete (successfully or with errors), Terraform releases the lock.
This setup solves the critical problem of concurrent Terraform runs corrupting your state. Without the DynamoDB lock, if two engineers (or two CI/CD pipelines) tried to terraform apply at the exact same time, they could both read the same state, make different changes, and then one would overwrite the other’s work, leading to an inconsistent and broken infrastructure.
The S3 bucket acts as the durable storage for your state file. The DynamoDB table, with its primary key LockID (String), acts as a lightweight, high-throughput service for coordinating access. When Terraform tries to lock, it inserts an item into DynamoDB whose LockID is the state file's full bucket/key path, along with metadata about the locking process. If an item with that LockID already exists, the insert fails, indicating the lock is held. When Terraform releases the lock, it deletes that item.
Here’s a breakdown of the DynamoDB table structure for locking:
- Table name: terraform-state-lock-dynamodb-table (or whatever you configure)
- Primary key: LockID (String) - the full state path, e.g., my-unique-terraform-state-bucket/project-a.tfstate
- Attributes: Terraform stores the lock metadata - a unique lock ID, the operation being run, who holds the lock, and a Created timestamp - as a JSON payload in an Info attribute on the item.
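An acquired lock item looks roughly like this (the field values are illustrative, and the exact Info payload varies by Terraform version):

```json
{
  "LockID": {"S": "my-unique-terraform-state-bucket/project-a.tfstate"},
  "Info": {"S": "{\"ID\":\"2f9e...\",\"Operation\":\"OperationTypeApply\",\"Who\":\"alice@workstation\",\"Version\":\"1.5.7\",\"Created\":\"2024-01-15T10:30:00Z\",\"Path\":\"my-unique-terraform-state-bucket/project-a.tfstate\"}"}
}
```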
If you ever find yourself stuck with a stale lock (e.g., a Terraform process crashed mid-apply), the safest fix is terraform force-unlock <LOCK_ID> from the affected configuration, which releases the lock through Terraform itself. Alternatively, you can manually delete the corresponding item from the DynamoDB table. Be cautious with this, as you’re essentially forcing a lock release, so ensure no Terraform process is actually running. Remember that the LockID is the full bucket/key path. You can query the table using the AWS CLI:
aws dynamodb query \
  --table-name terraform-state-lock-dynamodb-table \
  --key-condition-expression "LockID = :lockid" \
  --expression-attribute-values '{
    ":lockid": {"S": "my-unique-terraform-state-bucket/project-a.tfstate"}
  }'
If you see an item returned, you can delete it:
aws dynamodb delete-item \
  --table-name terraform-state-lock-dynamodb-table \
  --key '{"LockID": {"S": "my-unique-terraform-state-bucket/project-a.tfstate"}}'
The most surprising part of this setup is how DynamoDB’s atomic operations are leveraged. When Terraform inserts a lock item, it uses a conditional PutItem operation. This operation only succeeds if an item with the specified primary key (LockID) does not already exist. If it does exist, the operation fails, and Terraform interprets this as the lock being held. This atomic check-and-set behavior is what guarantees that only one process can acquire the lock at a time, even under heavy concurrency.
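You can reproduce this check-and-set behavior yourself with the AWS CLI against the table above. A sketch (the Info payload here is a stand-in for what Terraform actually writes):

```shell
# First attempt succeeds: no item with this LockID exists yet
aws dynamodb put-item \
  --table-name terraform-state-lock-dynamodb-table \
  --item '{"LockID": {"S": "my-unique-terraform-state-bucket/project-a.tfstate"},
           "Info":   {"S": "{\"ID\":\"demo-1\",\"Operation\":\"OperationTypeApply\"}"}}' \
  --condition-expression "attribute_not_exists(LockID)"

# An identical second attempt fails with ConditionalCheckFailedException -
# the lock is held, which is exactly what a concurrent terraform apply sees
aws dynamodb put-item \
  --table-name terraform-state-lock-dynamodb-table \
  --item '{"LockID": {"S": "my-unique-terraform-state-bucket/project-a.tfstate"},
           "Info":   {"S": "{\"ID\":\"demo-2\",\"Operation\":\"OperationTypeApply\"}"}}' \
  --condition-expression "attribute_not_exists(LockID)"
```

The condition and the write happen atomically on the DynamoDB side, so there is no window between "check" and "set" that two clients could race through.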
The next major concept to grapple with is managing state drift and using Terraform’s refresh functionality in conjunction with this remote backend.