Terraform drift detection in CI isn’t about preventing manual changes; it’s about making the inevitable manual changes visible and manageable before they break your production infrastructure.

Let’s see what this looks like in practice. Imagine you have a Terraform configuration for an AWS S3 bucket, and you’ve defined it like this:

resource "aws_s3_bucket" "example" {
  bucket = "my-unique-terraform-example-bucket-12345"
  acl    = "private"

  tags = {
    Environment = "Development"
    ManagedBy   = "Terraform"
  }
}

This code is committed to your Git repository. Now, someone on your team, perhaps under pressure, decides to quickly change the ACL of this bucket directly in the AWS console to public-read to troubleshoot an issue, forgetting it’s managed by Terraform.

Here’s how you’d integrate drift detection into your CI pipeline, typically using GitHub Actions or GitLab CI. The core command is terraform plan.

name: Terraform CI

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0 # Use your desired Terraform version

      - name: Terraform Init
        run: terraform init

      - name: Terraform Validate
        run: terraform validate

      - name: Terraform Plan
        run: terraform plan -detailed-exitcode -input=false
        # Without -detailed-exitcode, terraform plan exits 0 even when changes are pending.
        # With it: 0 = no changes, 1 = error, 2 = changes present, so drift fails this step.
        # For a real CI, you'd also capture the plan output and post it for review.

When the CI pipeline runs after the manual change, terraform plan will detect that the actual state of the aws_s3_bucket.example resource in AWS no longer matches what Terraform thinks it should be.

Terraform will perform the following actions:

  # aws_s3_bucket.example will be updated in-place
  ~ resource "aws_s3_bucket" "example" {
      ~ acl    = "public-read" -> "private" # This is the drift: Terraform plans to revert the manual change
        id     = "my-unique-terraform-example-bucket-12345"
        tags   = {
            "Environment" = "Development"
            "ManagedBy"   = "Terraform"
        }
        # (other attributes unchanged)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

The CI job fails only if the plan step runs terraform plan -detailed-exitcode: with that flag, the command exits with status 2 when changes are pending, whereas a bare terraform plan exits 0 as long as the plan itself succeeds. The failure immediately alerts the team that the infrastructure state managed by Terraform has diverged from the actual deployed state.
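To make the exit-code handling explicit, here is a minimal shell sketch of a wrapper a plan step could call. The function name check_drift and the verdict strings are hypothetical; the exit codes (0, 1, 2) are the ones Terraform documents for -detailed-exitcode.

```shell
# Run a plan and translate the -detailed-exitcode status into a verdict.
#   0 = no changes, 1 = error, 2 = changes present.
# Plan output goes to stderr so stdout carries only the verdict.
check_drift() {
  _drift_status=0
  terraform plan -detailed-exitcode -input=false >&2 || _drift_status=$?
  case "$_drift_status" in
    0) echo "in-sync" ;;
    2) echo "drift"; return 2 ;;
    *) echo "plan-error"; return 1 ;;
  esac
}
```

Separating "drift" from "plan-error" is a design choice: it lets the pipeline report a broken plan (bad credentials, syntax errors) differently from real drift.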

The mental model here is that Terraform maintains a state file (terraform.tfstate locally, or an equivalent object in a remote backend) that records what Terraform believes is deployed. When you run terraform plan, it first refreshes that state by querying your cloud provider, then compares the desired state in your .tf files against the refreshed state. A mismatch between the live infrastructure and the previously recorded state is "drift"; a mismatch between your .tf files and the refreshed state is what the plan proposes to change.
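The two comparisons can be looked at separately. The sketch below assumes Terraform 0.15.4 or later, where the -refresh-only planning mode is available; the function names are hypothetical.

```shell
# Show only drift: how live infrastructure differs from the recorded state.
# No configuration changes are proposed in this mode.
plan_drift_only() {
  terraform plan -refresh-only -input=false "$@"
}

# Show everything a normal apply would do: .tf files vs. the refreshed state.
plan_all_changes() {
  terraform plan -input=false "$@"
}
```

Extra arguments are passed through, so plan_drift_only -no-color works as you would expect.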

The levers you control are your Terraform configurations and your CI pipeline’s triggers and steps. By running terraform plan as a gatekeeper in your CI, you ensure that any deviation from the declared infrastructure is flagged before a pull request is merged or a terraform apply is approved.

Common Causes of Drift and How to Fix Them:

  1. Manual Changes via Cloud Provider Console/CLI:

    • Diagnosis: Run terraform plan. Look for resources that are marked for update (~) or destruction (-) that you didn’t intend to change. Compare the Plan: output with your current .tf files.
    • Fix:
      • Option A (Revert Manual Change): If the manual change was accidental or can be easily undone, revert it in the cloud provider console/CLI. Then, re-run terraform plan to confirm drift is gone. This is the ideal scenario.
      • Option B (Adopt the Change in Terraform): If the manual change was intentional and necessary, bring your configuration and Terraform’s state file in sync with it.
        1. Update your .tf file to match the live setting (e.g., set acl = "public-read" on aws_s3_bucket.example).
        2. Run terraform apply -refresh-only and approve it to record the live attributes in the state file.
        3. Run terraform plan again. It should now show no changes for that resource.
        Note that terraform import (e.g., terraform import aws_s3_bucket.example my-unique-terraform-example-bucket-12345) is for resources that are not yet in the state file; it fails on a resource Terraform already manages.
    • Why it works: terraform apply -refresh-only reads the live resource’s attributes from the cloud provider and writes them into your state file without changing any infrastructure, and the updated .tf file makes the new setting the declared one.
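For an intentional manual change on a resource Terraform already tracks, one way to accept it is a refresh-only apply followed by a confirming plan. This is a minimal sketch that assumes the .tf file has already been edited to match the live setting; accept_drift is a hypothetical helper name.

```shell
# Record live attributes in state without touching infrastructure,
# then confirm the configuration and state agree.
# Returns 0 if the follow-up plan is clean, 2 if differences remain,
# 1 if either command fails.
accept_drift() {
  terraform apply -refresh-only -auto-approve -input=false >&2 || return 1
  terraform plan -detailed-exitcode -input=false >&2
}
```

In a shared environment you may prefer to drop -auto-approve and review the refresh-only plan interactively.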
  2. Changes by Other Tools or Automation:

    • Diagnosis: Same as above: terraform plan will show the drift.
    • Fix: If another automation tool is making changes, you must decide:
      • Option A (Consolidate): Modify the other tool to stop managing the resource and let Terraform handle it. If the resource is not yet in Terraform’s state, bring it in with terraform import <resource_address> <resource_id>.
      • Option B (Co-manage with Caution): If co-management is unavoidable, ensure the other tool’s changes are reflected in your Terraform code before running terraform plan. This often involves manual updates to .tf files or a more complex workflow.
    • Why it works: Terraform’s plan command is the ultimate arbiter. If it detects a difference between its state and reality, it flags it. Bringing the state file in sync is key.
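When taking over a resource from another tool, it helps to check whether Terraform already tracks it before importing, since importing an already-managed address fails. A minimal sketch, with adopt_resource as a hypothetical helper and the address and ID supplied by you:

```shell
# Import a resource only if its address is not already in state.
# $1 = resource address (e.g. aws_s3_bucket.example), $2 = provider-side ID.
adopt_resource() {
  addr="$1"
  id="$2"
  if terraform state list 2>/dev/null | grep -Fxq -- "$addr"; then
    echo "$addr is already tracked"
  else
    terraform import "$addr" "$id"
  fi
}
```

The matching resource block must already exist in your .tf files for the import to succeed.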
  3. Terraform Provider Bugs or Updates:

    • Diagnosis: Run terraform plan. If a resource shows unexpected changes after a provider update or even without one, it might be a provider issue. Check the Terraform provider’s GitHub issues.
    • Fix:
      • Option A (Pin Provider Version): In your .tf files, specify the exact provider version being used:
        terraform {
          required_providers {
            aws = {
              source  = "hashicorp/aws"
              version = "5.0.0" # Pin to a known good version
            }
          }
        }
        
        Then run terraform init and terraform plan.
      • Option B (Update Terraform Code): If the provider update requires changes to your .tf files, update them accordingly.
      • Option C (Report Bug): If it’s a genuine provider bug, report it to the provider’s maintainers.
    • Why it works: Pinning a provider version ensures that terraform plan uses the same logic it did previously, preventing unexpected drift due to provider changes.
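Alongside the version constraint, CI can refuse to run if the dependency lock file is missing or would need to change. The sketch below assumes .terraform.lock.hcl is committed to the repository; ci_init is a hypothetical name, and -lockfile=readonly is the terraform init option that forbids lock file updates.

```shell
# Initialize in CI without letting provider selections change.
ci_init() {
  if [ ! -f .terraform.lock.hcl ]; then
    echo "missing .terraform.lock.hcl; commit it so CI uses the same provider builds" >&2
    return 1
  fi
  terraform init -input=false -lockfile=readonly
}
```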
  4. Resource Deletion Outside Terraform:

    • Diagnosis: terraform plan will detect during refresh that the resource is gone and, because it is still declared in your .tf files, mark it for creation (+). This is a specific type of drift.
    • Fix:
      1. If the resource should exist, run terraform apply to recreate it.
      2. If it should stay deleted, remove its block from your .tf files and run terraform state rm <resource_address> (e.g., terraform state rm aws_s3_bucket.example) if it is still listed in the state.
      3. Run terraform plan. The resource should no longer appear in the plan.
    • Why it works: terraform state rm removes the resource from Terraform’s state file, and removing the block from the configuration tells Terraform it no longer needs to manage it, so nothing is left to create or destroy.
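When a resource was deleted outside Terraform and should stay deleted, a small guard around terraform state rm avoids an error if the address has already dropped out of state. A minimal sketch, with forget_resource as a hypothetical helper name:

```shell
# Remove an address from state only if it is still listed there.
# $1 = resource address (e.g. aws_s3_bucket.example).
forget_resource() {
  addr="$1"
  if terraform state list 2>/dev/null | grep -Fxq -- "$addr"; then
    terraform state rm "$addr"
  else
    echo "$addr is not in state"
  fi
}
```

Remember to delete the matching resource block from your .tf files as well, otherwise the next plan will propose creating the resource again.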
  5. Changes to Resource Attributes Not Managed by Terraform:

    • Diagnosis: terraform plan might show changes to attributes that your Terraform code doesn’t explicitly define.
    • Fix:
      • Option A (Ignore): If these are attributes you don’t care about or that are managed by another system, you can tell Terraform to ignore them.
        resource "aws_s3_bucket" "example" {
          bucket = "my-unique-terraform-example-bucket-12345"
          acl    = "private"
        
          tags = {
            Environment = "Development"
            ManagedBy   = "Terraform"
          }
        
          lifecycle {
            ignore_changes = [
              # Ignore changes to the 'acl' attribute if it's managed elsewhere
              acl,
              # Ignore changes to specific tags
              tags["ManagedBy"]
            ]
          }
        }
        
        Then run terraform plan.
      • Option B (Manage): If you do want Terraform to manage these attributes, add them to your .tf file and let Terraform apply them.
    • Why it works: The ignore_changes lifecycle block tells terraform plan to disregard differences in the listed attributes between your configuration and the live resource, so Terraform never proposes an update for them, effectively silencing that specific drift.
  6. Incorrectly Configured Remote State Backend:

    • Diagnosis: If your remote state backend (like an S3 bucket for state) is misconfigured, inaccessible, or corrupted, terraform plan might fail or report spurious drift because it cannot read the correct state.
    • Fix:
      1. Verify your remote backend configuration in main.tf (or similar).
      2. Ensure the credentials/roles used by your CI runner have permission to access the backend.
      3. Check the backend itself for integrity (e.g., is the S3 bucket name correct? Does it exist? Are there versioning issues if applicable?).
      4. If the state file itself is corrupted, you might need to restore from a backup or, in extreme cases, re-initialize the backend and import existing resources.
    • Why it works: Terraform relies on an accurate, accessible state file to compare against. If the state file is unavailable or incorrect, it cannot perform drift detection reliably.
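The backend checks above can be run as a quick preflight before any plan. This is a minimal sketch under the assumption that the CI runner already has its credentials configured; check_backend and its status strings are hypothetical, while terraform init and terraform state pull are standard commands.

```shell
# Verify the backend initializes and the state can actually be read.
# Command output goes to stderr or is discarded; stdout carries one status word.
check_backend() {
  terraform init -input=false >&2 || { echo "backend-init-failed"; return 1; }
  terraform state pull >/dev/null || { echo "state-unreadable"; return 1; }
  echo "backend-ok"
}
```

Running this first means a plan failure later is more likely to be real drift than a broken backend.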

After fixing drift, the next failure you’ll often hit is at terraform apply, typically when your pre-commit hooks don’t run terraform plan or when your code has other validation issues.
