Terraform’s create_before_destroy lifecycle argument, when used with resource replacements, doesn’t actually guarantee zero downtime; it only ensures that the new resource is fully provisioned before the old one is destroyed.
Let’s see this in action. Imagine you have a load balancer and a web server instance behind it. You want to replace the web server with a new one, perhaps to update its configuration or AMI.
resource "aws_instance" "webserver" {
  ami           = "ami-0c55b159cbfafe1f0" # Example AMI
  instance_type = "t2.micro"

  tags = {
    Name = "HelloWorld"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_lb_target_group" "web_targets" {
  name     = "web-targets"
  port     = 80
  protocol = "HTTP"
  vpc_id   = "vpc-0123456789abcdef0" # Example VPC ID

  health_check {
    path = "/"
  }
}

resource "aws_lb_target_group_attachment" "web_attachment" {
  target_group_arn = aws_lb_target_group.web_targets.arn
  target_id        = aws_instance.webserver.id
  port             = 80
}

# Assume a load balancer (aws_lb) is already configured
# and attached to the web_targets target group.
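When you change ami in the aws_instance.webserver resource and run terraform apply, Terraform’s plan will show a replacement. Because create_before_destroy is set, the resource is marked +/- (create replacement, then destroy) rather than -/+ (destroy, then create replacement):
+/- resource "aws_instance" "webserver" {
      id  = "i-0abcdef1234567890"
      ami = "ami-0c55b159cbfafe1f0" -> "ami-0fedcba9876543210" # Example new AMI
      # ... other attributes
    }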
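Here’s the sequence create_before_destroy = true is meant to produce:
- Create New: Terraform creates a new aws_instance with the updated AMI and waits until the provider reports it as created. For EC2 this means the instance has reached the running state, not that the application on it is ready to serve traffic.
- Attach New: Once the new instance exists, Terraform registers it with the aws_lb_target_group. Because target_id changes, this happens by replacing the aws_lb_target_group_attachment.
- Detach Old: The old instance is deregistered from the target group. The attachment follows its own lifecycle settings: as written above it does not set create_before_destroy, so Terraform’s default destroy-then-create order may deregister the old instance before the new one is registered.
- Destroy Old: Finally, the old aws_instance is destroyed.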
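The attachment is replaced whenever its target_id changes, so its own lifecycle block decides whether the new registration is created before the old one is removed. The following is a minimal sketch, assuming the resource names from the configuration above; it is one possible arrangement rather than the only one, and the resulting order is worth confirming with terraform plan in your own setup.

```hcl
# Sketch: the same attachment as above, with create_before_destroy added so
# that the replacement attachment (pointing at the new instance) is created
# before the old attachment (pointing at the old instance) is destroyed.
resource "aws_lb_target_group_attachment" "web_attachment" {
  target_group_arn = aws_lb_target_group.web_targets.arn
  target_id        = aws_instance.webserver.id
  port             = 80

  lifecycle {
    create_before_destroy = true
  }
}
```

Even with this ordering, registering a target is not the same as the target being healthy; the load balancer still has to run its health checks against the new instance.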
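The "zero-downtime" aspect comes from the load balancer’s health checks and its awareness of target group membership. The load balancer only routes traffic to a registered target once it passes health checks, so the handover is seamless only if the new instance becomes healthy before the old one is deregistered. Terraform’s part is limited to registering and deregistering targets; the health checks themselves are the load balancer’s job.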
However, "downtime" is a nuanced term here. The service might remain available, but the specific instance you were interacting with is gone. If you were in the middle of a long-running request on the old instance, that request will be terminated when the instance is destroyed. The critical factor for service availability is the speed of provisioning the new instance and its ability to pass health checks before the old one is de-registered.
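Because the handover hinges on how quickly the new target is marked healthy and how long the old one is allowed to drain, the target group’s own settings matter. Here is a hedged variation of the web_targets target group from earlier; the numeric values are illustrative assumptions, not recommendations, and should be tuned to your application.

```hcl
# Sketch: the earlier target group with explicit draining and health check
# settings. All numeric values below are assumed for illustration.
resource "aws_lb_target_group" "web_targets" {
  name     = "web-targets"
  port     = 80
  protocol = "HTTP"
  vpc_id   = "vpc-0123456789abcdef0" # Example VPC ID

  # Seconds the load balancer keeps a deregistering target in the draining
  # state so that in-flight requests can finish.
  deregistration_delay = 60

  health_check {
    path              = "/"
    interval          = 10    # seconds between health checks
    timeout           = 5     # seconds before a single check counts as failed
    healthy_threshold = 2     # consecutive successes before a target is healthy
    matcher           = "200" # HTTP status code treated as a passing check
  }
}
```

Draining only helps while the old instance is still running, so it is worth checking how soon after deregistration your apply destroys it.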
The most surprising thing about create_before_destroy is that it doesn’t inherently solve the problem of stateful connections or long-running operations. If your application needs to gracefully shut down, you’ll need to implement that logic within the application itself, perhaps by listening for termination signals and completing active tasks before exiting. Terraform can trigger the replacement, but the application must manage its own shutdown sequence.
Mechanically, create_before_destroy changes the order of operations for a resource that is being replaced. When a change forces replacement (for example, a new ami), Terraform first creates the new resource and waits for it to become "ready" as the provider defines it, which usually means provisioned and reachable through the cloud API. It then updates or replaces dependent resources, such as load balancer attachments, so that they reference the new resource, and only then destroys the old one. This sequence is what makes a smooth handover possible for traffic managed by external services like load balancers.
The next challenge you’ll face is handling database schema migrations or other stateful data changes during these rolling updates.