How to detect, manage and remediate Drift in Terraform, OpenTofu and Terragrunt with Terramate

Sören Martius Chief Product Officer
Reading Time:9 min read

This guide provides a detailed explanation of how to use Terramate’s drift detection and remediation capabilities to keep your Terraform, OpenTofu and Terragrunt environments drift-free and efficient.

Introduction

In the world of infrastructure as code (IaC), one critical concept is drift: the disparity that can arise between the planned, desired state of your infrastructure and its actual real-world condition. This discrepancy typically originates from two sources: changes introduced directly by external actors (either humans or machines), and changes that occur because your resources depend on external data sources.

When managing complex IaC environments using tools such as Terraform, OpenTofu, Pulumi and others, drift is almost inevitable. To detect and manage drift efficiently, organizations must implement processes and workflows. If left undetected or ignored, drift can significantly impact your infrastructure's stability and add risk to your business by causing cost explosions or security vulnerabilities. If you want to learn more about drift as a fundamental concept in IaC, the different reasons why it exists and what risks it can cause, we advise you to take a look at our guide, “The Ultimate Guide For Terraform and OpenTofu Drift Detection and Remediation”.

Terramate, a modern Infrastructure as Code management platform, has built-in drift detection and management features designed to help teams efficiently detect, manage, observe and remediate drift.

This guide examines Terramate’s drift detection and remediation capabilities and how they help keep your IaC environments drift-free and efficient.

About Orchestration in Terramate

Before we look at how drift detection works in Terramate, let’s look at how Terramate CLI and Terramate Cloud work fundamentally. First and foremost, it’s important to understand that Terramate Cloud is not a CI/CD platform, meaning that we don’t host or run IaC tools such as Terraform, OpenTofu or Terragrunt for you.

Instead, Terramate shifts the orchestration capabilities to the client side. You can use Terramate CLI to orchestrate IaC tools such as Terraform, OpenTofu, and Terragrunt locally or on any CI/CD platform such as GitHub Actions, GitLab CI/CD or Bitbucket Pipelines. This differentiates Terramate fundamentally from most other vendors and platforms in the market because you can fully configure your orchestration workflows on the client side without relying on Terramate Cloud. This also means your workflows can be invoked and tested locally and ported to any other system or CI/CD platform.

Another great side-effect of shifting the orchestration to the client side is that Terramate Cloud doesn’t require access to your cloud accounts or state buckets, making it highly secure to use.

Terramate Cloud seamlessly integrates with Terramate CLI and offers features such as better Pull Request Previews, Observability, Asset Management, Policies, Alerts and Notifications, Drift Management and more that help teams collaborate better.

To use Terramate Cloud, you must sync data from Terramate CLI to Terramate Cloud for operations such as previews, deployments and drift checks. For example, the following command creates plan files called drift.tfplan in all available stacks and syncs those plans to Terramate Cloud. It explicitly marks the data sync as a drift detection workflow, meaning if any plans contain diffs (which means that your live infrastructure has drifted away from your desired configuration), Terramate Cloud will process them as drifts and relate them to your stacks in Terramate Cloud.

terramate run \
  --sync-drift-status \
  --terraform-plan-file=drift.tfplan \
  --continue-on-error \
  --parallel 5 \
  -- \
  terraform plan -out drift.tfplan -detailed-exitcode -lock=false
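
The -detailed-exitcode flag is what turns a plan into a drift signal: with it, terraform plan exits with 0 when the plan is empty, 1 when the plan fails and 2 when the plan succeeded and contains changes. As a minimal sketch for experimenting with the command in a single stack, a small shell helper could translate those codes into labels. The name classify_plan_exit is hypothetical and not part of Terraform or Terramate.

```shell
# classify_plan_exit is a hypothetical helper, not part of Terraform or Terramate.
# It maps the exit code of `terraform plan -detailed-exitcode` to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;  # plan succeeded and contains no changes
    1) echo "error" ;;    # plan failed
    2) echo "drifted" ;;  # plan succeeded and contains changes
    *) echo "unknown" ;;  # any other code is unexpected for this flag
  esac
}

# Usage (assumes terraform is installed and the stack is initialized):
#   terraform plan -out drift.tfplan -detailed-exitcode -lock=false
#   classify_plan_exit "$?"
```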

Here’s a quick overview of the data that will be synced to Terramate Cloud whenever you decide to sync pull requests, deployments or drift detection workflows:

  • Sanitized Terraform and OpenTofu plans
  • Git metadata
  • GitHub/GitLab, etc. metadata

Plans are always sanitized locally by Terramate CLI before being synced to and processed by Terramate Cloud. We filter out all sensitive values, such as secrets and credentials, that might be persisted in your plan files. For a detailed overview, see the Security and Data Access Overview section of the documentation.

If you want to learn more about how Terramate CLI and Terramate Cloud work under the hood, we advise you to read the How it Works section in our documentation.

How Terramate Manages Drift

Now that we have learned about Terramate CLI and Terramate Cloud fundamentals, let’s understand how drift detection works in Terramate.

There are two primary ways in Terramate to detect drift:

  • Using scheduled drift detection jobs
  • Using drift detection health checks after failed or successful deployments

Let’s look at both options in detail, understand their use cases and see how we can configure them.

Using Scheduled Drift Detection Jobs

The first option is to run scheduled drift detection jobs in your CI/CD. You can configure different workflows and intervals for stacks, environments, etc. We recommend running a drift check in all relevant stacks at least once every 24 hours. For example, the following GitHub Actions workflow runs a drift detection job in all stacks at 2:00 AM (UTC) daily.

name: Scheduled Terraform Drift Detection

on:
  schedule:
    - cron: '0 2 * * *'

jobs:
  drift-detection:
    name: Check Drift

    permissions:
      id-token: write
      contents: read
      pull-requests: read
      checks: read

    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}
          fetch-depth: 0

      - name: Install Terramate
        uses: terramate-io/terramate-action@v1

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.4
          terraform_wrapper: false

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-region: 'CHANGEME: AWS REGION'
          role-to-assume: 'CHANGEME: IAM ROLE ARN'

      - name: Run Terraform init on all stacks
        id: init
        run: terramate run -- terraform init

      - name: Run drift detection
        id: drift
        run: |
          terramate run \
          --sync-drift-status \
          --terraform-plan-file=drift.tfplan \
          --continue-on-error \
          --parallel 5 \
          -- \
          terraform plan -out drift.tfplan -detailed-exitcode -lock=false
        env:
          GITHUB_TOKEN: ${{ github.token }}

In addition to the workflow above, our documentation includes blueprints for GitLab CI/CD, Bitbucket Pipelines and others.
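
Because orchestration happens on the client side, the init and drift detection steps of this workflow can also be tried from a local checkout before you commit them to CI. The following is a sketch under assumptions, not an official script: the helper names require_tools and run_local_drift_check are hypothetical, and it assumes that cloud credentials are already available in your shell and that Terramate CLI is connected to your Terramate Cloud organization. It uses only the commands and flags shown in the workflow.

```shell
# require_tools and run_local_drift_check are hypothetical helper names.

# Return 0 only if every named command is available on PATH.
require_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}

# Local equivalent of the init and drift detection steps of the workflow.
# Assumes cloud credentials are configured and Terramate CLI can sync to
# your Terramate Cloud organization.
run_local_drift_check() {
  require_tools terramate terraform || return 1

  # Initialize every stack first so that the plans can run.
  terramate run -- terraform init || return 1

  # Same drift detection command as in the CI step.
  terramate run \
    --sync-drift-status \
    --terraform-plan-file=drift.tfplan \
    --continue-on-error \
    --parallel 5 \
    -- \
    terraform plan -out drift.tfplan -detailed-exitcode -lock=false
}
```

Nothing runs until you call run_local_drift_check from the root of your repository. The tool check fails early with a readable message if either tool is missing.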

Using Drift Detection Health Checks after Failed or Successful Deployments

We recommend you run drift detection health checks for all stacks deployed with, e.g., terraform apply, tofu apply, or terragrunt apply after a successful or failed deployment.

For successful deployments, the drift detection health check verifies that the infrastructure resources don’t drift right away, which often happens with auto scalers that scale up and down right after a deployment.

For failed deployments, the drift detection health check will create a new plan, which is extremely helpful in the case of partially applied deployments. This will help your team understand what part of the desired configuration has been successfully applied and what part is missing. We will dive into this further later in this article.

For example, the following GitHub Actions workflow shows a deployment pipeline with a drift detection health check after each deployment.

name: Terraform Deployment

on:
  push:
    branches:
      - main

jobs:
  deploy:
    name: Deploy Terraform changes in changed Terramate stacks

    permissions:
      id-token: write
      contents: read
      pull-requests: read
      checks: read

    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Terramate
        uses: terramate-io/terramate-action@v1

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.4
          terraform_wrapper: false

      - name: List changed stacks
        id: list
        run: terramate list --changed

      - name: Configure AWS credentials via OIDC
        if: steps.list.outputs.stdout
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-region: 'CHANGEME: AWS REGION'
          role-to-assume: 'CHANGEME: IAM ROLE ARN'

      - name: Run Terraform init in each changed stack
        if: steps.list.outputs.stdout
        run: |
          terramate run \
            --changed \
            -- \
            terraform init

      - name: Create Terraform plan on changed stacks
        if: steps.list.outputs.stdout
        run: |
          terramate run \
            --changed \
            -- \
            terraform plan -lock-timeout=5m -out out.tfplan

      - name: Apply planned changes on changed stacks
        id: apply
        if: steps.list.outputs.stdout
        run: |
          terramate run \
            --changed \
            --sync-deployment \
            --terraform-plan-file=out.tfplan \
            -- \
            terraform apply -input=false -auto-approve -lock-timeout=5m out.tfplan
        env:
          GITHUB_TOKEN: ${{ github.token }}

      - name: Run drift detection
        if: steps.list.outputs.stdout && ! cancelled() && steps.apply.outcome != 'skipped'
        run: |
          terramate run \
            --changed \
            --sync-drift-status \
            --terraform-plan-file=drift.tfplan \
            -- \
            terraform plan -out drift.tfplan -detailed-exitcode
        env:
          GITHUB_TOKEN: ${{ github.token }}

By looking at the workflow, you will see that Terramate CLI orchestrates the terraform apply command in all stacks that contain changes (hence the --changed flag) and syncs the operation as a deployment to Terramate Cloud. After the deployment, Terramate will orchestrate the terraform plan command in exactly the same stacks and sync those as a drift detection health check to Terramate Cloud.
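
The important property of this pipeline is the ordering: the health check runs whether the apply succeeded or failed, and the job still reports the result of the apply. Outside of GitHub Actions, the same behaviour could be approximated with a small shell wrapper. This is a sketch only. The name deploy_then_check is hypothetical, and the two commands are passed in as strings so that the wrapper itself makes no assumptions about Terramate.

```shell
# deploy_then_check is a hypothetical wrapper, not part of Terramate.
# It runs a deploy command, then always runs a health check command,
# and finally returns the exit code of the deploy command.
deploy_then_check() {
  deploy_cmd="$1"
  check_cmd="$2"

  # Remember the deployment result without aborting on failure.
  deploy_status=0
  sh -c "$deploy_cmd" || deploy_status=$?

  # Run the health check even if the deployment failed, so that partially
  # applied changes show up as a fresh plan.
  sh -c "$check_cmd" || echo "health check reported drift or failed" >&2

  return "$deploy_status"
}

# Usage with the commands from the workflow above:
#   deploy_then_check \
#     "terramate run --changed --sync-deployment --terraform-plan-file=out.tfplan -- terraform apply -input=false -auto-approve -lock-timeout=5m out.tfplan" \
#     "terramate run --changed --sync-drift-status --terraform-plan-file=drift.tfplan -- terraform plan -out drift.tfplan -detailed-exitcode"
```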

Drift Visibility in Terramate Cloud

Now that we have learned how Terramate can be used to run scheduled drift detection workflows and how to check the integrity of deployments with drift detection health checks after a successful or failed deployment, let’s look at how Terramate Cloud helps you manage drift efficiently.

Whenever you sync plans that contain diffs and thus indicate a drift with the --sync-drift-status flag, Terramate Cloud will do the following things:

  • Mark all stacks in Terramate Cloud that contain drift as drifted.
  • Save the plan as a health check for each stack so that developers can review the drift for each stack and decide how to remediate it.
  • Create an Alert for each drifted stack in Terramate Cloud and automatically assign it to the relevant individuals or teams (owners of the related stacks).
  • Send new Slack Notifications to all relevant individuals and teams via the Slack Bot Integration (Optionally, you can also configure notifications to be sent to a centralized Slack channel to inform a broader audience).

Let’s address each action in more detail below.

View Drifted Stacks in Terramate Cloud

In the stacks detail view in Terramate Cloud, you can review each stack's drift in detail. For example, the following screenshot shows a stack used to manage the settings for a GitHub repository using Terraform. The drift indicates that a branch protection configuration was changed outside of Terraform.

It also shows you:

  • Current alerts, such as failed deployments or drift indicating that a stack is unhealthy
  • Which resources contain drift, and what the drift looks like
  • The latest drift runs and deployments
  • All policy checks and if any of your resources contain policy violations. Every time you check a stack for drifts, Terramate Cloud automatically executes all policy checks to detect any misconfiguration caused by drift.

Since you can sync data from multiple repositories and even from multiple VCS accounts and organizations, Terramate Cloud is an ideal place to keep a holistic overview of all your infrastructure managed with IaC.

Using Alerts in Terramate Cloud to manage Drift efficiently

Having a single dashboard that allows us to keep track of drift is great, but an overview alone is often not actionable for teams. For example, how do I know which person is responsible for remediating this drift? How can I stay up to date on newly detected drift or find out when a drift has been resolved?

Because of this, we’ve added Alerts to Terramate Cloud. Alerts help teams assign and manage incidents such as failed deployments and drift to individuals and teams, allowing you to effectively manage such events without losing track.

Alerts automatically assign the relevant individuals and teams to new incidents and integrate seamlessly with the Slack Bot Integration. This allows you to directly inform users in your Slack workspace about alerts assigned to them and about continuous updates for previously created and assigned alerts.

Remediating Drift with Terramate

Terramate CLI and Terramate Cloud help you detect drift with scheduled workflows, alerts, and Slack notifications, but how can we use both to remediate drift so that we can keep our infrastructure drift-free?

How to reconcile Drift with Terramate

Reconciling drift means we overwrite changes that were made to our live infrastructure outside of Terraform. This is typically done by running another apply, e.g., terraform apply or tofu apply. With Terramate, you can easily query your Terramate Cloud organization for drifted stacks, allowing you to filter the stacks in a repository by their status.

Running commands such as terraform apply on drifted stacks

One way to reconcile drift is to run commands such as terraform apply in all drifted stacks using Terramate CLI.

terramate run --status=drifted -- terraform apply
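
Since an apply overwrites whatever was changed outside of Terraform, it can be worth reviewing the plans for the drifted stacks before reconciling them. As a rough sketch, the shell function below first runs terraform plan in all drifted stacks, then asks for confirmation and only then runs the apply. The function name reconcile_drifted and the TERRAMATE environment variable are hypothetical. The variable exists only so that the sketch can be dry-run with TERRAMATE=echo.

```shell
# reconcile_drifted is a hypothetical helper, not part of Terramate.
# TERRAMATE is a hypothetical override for the binary (default: terramate),
# e.g. TERRAMATE=echo prints the commands instead of running them.
reconcile_drifted() {
  tm="${TERRAMATE:-terramate}"

  # 1. Show what an apply would change in every drifted stack.
  "$tm" run --status=drifted -- terraform plan || return 1

  # 2. Ask for confirmation before touching live infrastructure.
  printf 'Apply to all drifted stacks? [y/N] '
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "aborted; nothing was applied"
    return 0
  fi

  # 3. Reconcile the drifted stacks.
  "$tm" run --status=drifted -- terraform apply
}
```

Running TERRAMATE=echo reconcile_drifted prints the two Terramate invocations without executing them, which is a cheap way to check the flags before running against real stacks.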

Adding a change trigger for all drifted stacks to be considered by the change detection in the CI/CD run

When running Terramate in automation, you often configure deployment pipelines that use Terramate's change detection to only deploy stacks that contain changes. To reconcile stacks using the normal Pull Request flow, it’s recommended to mark the relevant stacks as triggered using Change Triggers so that they will be considered in the next CI run even though they don’t contain any changes.

terramate experimental trigger --status=drifted

The status filter argument can also be combined with other filters such as change detection, tags and directories, allowing you to filter the graph of orchestrated stacks in an advanced manner, e.g.:

# --status accepts 'drifted', 'failed' and 'unhealthy'
terramate run \
  --status=drifted \
  --chdir stacks/prod \
  --tags terraform \
  --changed \
  --parallel 5 \
  -- \
  terraform apply

Automatically reconciling drift in scheduled workflows

A common way of remediating drift is to automatically reconcile it once it is detected with commands such as terraform apply or tofu apply. Add a reconciliation step to the drift detection workflow discussed earlier in this guide to achieve this.

The following workflow runs at 2:00 AM daily and automatically runs terraform apply on all stacks that contain drift and are tagged with reconcile.

name: Scheduled Terraform Drift Detection and Reconciliation

on:
  schedule:
    - cron: '0 2 * * *'

jobs:
  drift-detection:
    name: Check Drift

    permissions:
      id-token: write
      contents: read
      pull-requests: read
      checks: read

    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}
          fetch-depth: 0

      - name: Install Terramate
        uses: terramate-io/terramate-action@v1

      - name: Install Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.4
          terraform_wrapper: false

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-region: 'CHANGEME: AWS REGION'
          role-to-assume: 'CHANGEME: IAM ROLE ARN'

      - name: Run Terraform init on all stacks
        id: init
        run: terramate run -- terraform init

      - name: Run drift detection
        id: drift
        run: |
          terramate run \
          --sync-drift-status \
          --terraform-plan-file=drift.tfplan \
          --continue-on-error \
          --parallel 5 \
          -- \
          terraform plan -out drift.tfplan -detailed-exitcode -lock=false
        env:
          GITHUB_TOKEN: ${{ github.token }}
          
      - name: Reconcile Drift
        id: reconcile-drift
        run: |
          terramate run \
          --status=drifted \
          --tags=reconcile \
          --sync-deployment \
          --terraform-plan-file=drift.tfplan \
          --continue-on-error \
          --parallel 5 \
          -- \
          terraform apply -input=false -auto-approve -lock-timeout=5m drift.tfplan
        env:
          GITHUB_TOKEN: ${{ github.token }}

This will add a reconcile drift step to the scheduled drift detection workflow, running a deployment on all stacks marked as drifted and tagged with reconcile right after the drift detection step.

Automated reconciliation is usually advisable for all stacks that manage production infrastructure, especially for mission-critical services.

Summary

Congratulations! You just learned how to detect, manage and remediate drift in your Terraform, OpenTofu and Terragrunt projects using Terramate CLI and Terramate Cloud. Use alerts, orchestration, drift detection, cloud sync and more to keep your infrastructure drift-free while preserving full control and flexibility.

Feel free to join the Terramate Community on Discord if you have any questions or to book a demo to learn more about how Terramate can help you manage drift in your Infrastructure as Code projects.