Mastering Terraform Workflows: apply-before-merge vs apply-after-merge

If you have ever managed cloud infrastructure, you have likely encountered Terraform, the most popular IaC tool in the space. In 2023, HashiCorp Configuration Language (HCL) again emerged as one of the fastest-growing languages in GitHub’s yearly Octoverse report, showing 36% year-over-year growth.

With growing adoption, teams face the same questions over and over again:

“Should we run terraform apply before or after merging a Pull Request back to main?”

In this article, we will examine the two different approaches and answer the ultimate question of life.

About the Stateful Nature of Terraform

CI/CD for Infrastructure as Code tools such as Terraform and OpenTofu is different from CI/CD for applications because Terraform relies heavily on state, which describes the result of the latest deployment triggered with terraform apply.

Think of the state in Terraform and OpenTofu as the single source of truth that contains all information, such as configuration and metadata, but also sensitive values, such as secrets for your cloud infrastructure.

Teams usually use remote backends such as AWS S3 or Google Cloud Storage to version, secure and encrypt state files.
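
As a minimal sketch of such a setup, a pipeline could initialize an S3 backend through partial backend configuration. This assumes the Terraform configuration already declares an empty backend "s3" {} block; the function name init_remote_state and the bucket, key and region values are hypothetical placeholders, not part of any official tooling.

```shell
set -eu

# Binary to invoke; set TF=tofu to use OpenTofu instead.
TF="${TF:-terraform}"

# init_remote_state BUCKET KEY REGION
# Initializes the working directory against an S3 backend with
# server-side encryption of the state file enabled. All three
# arguments are placeholders you would replace with your own values.
init_remote_state() {
  "$TF" init -input=false \
    -backend-config="bucket=$1" \
    -backend-config="key=$2" \
    -backend-config="region=$3" \
    -backend-config="encrypt=true"
}
```

A call could look like init_remote_state example-state-bucket network/terraform.tfstate eu-central-1. Versioning is a property of the bucket itself and would have to be enabled separately.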

Unlike other CI pipelines where each pipeline can run independently and in parallel, CI pipelines for Terraform read and modify the state. This makes them unsuitable for parallelism when working with a single state file.

Here are some facts about the state in Terraform and its impact:

- The primary purpose of Terraform’s state is to store bindings between objects in a remote system and resource instances declared in your configuration.

- Because of that, Terraform’s state acts as a database to map Terraform configurations to the real world (deployed cloud infrastructure).

- Terraform state also tracks metadata such as resource dependencies.

- Terraform uses remote locking to prevent multiple users from accidentally running Terraform on the same configuration and state simultaneously. This ensures that each Terraform run begins with the most recently updated state.

- You can only invoke one terraform apply per state at a given time. By default, each apply locks the state exclusively for the current run, making it unsuitable for parallel deployments.

The stateful nature of Terraform makes every run of terraform apply potentially dependent on any other run. This means that, unlike traditional CI where every pipeline can run independently and in parallel, Terraform pipelines typically run sequentially and are thus blocking (you can only run one workflow that invokes terraform apply at a time).
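
To illustrate the blocking behaviour, here is a small sketch of an apply step that waits for a held lock instead of failing right away. By default, Terraform returns an error immediately when the state is locked; the -lock-timeout flag makes it retry for the given duration. The function name apply_saved_plan and the default of 5m are our own assumptions.

```shell
set -eu

# Binary to invoke; set TF=tofu to use OpenTofu instead.
TF="${TF:-terraform}"

# apply_saved_plan PLANFILE [LOCK_TIMEOUT]
# Applies a saved plan. If another run holds the state lock, keep
# retrying for up to LOCK_TIMEOUT (default 5m, an arbitrary choice)
# before giving up.
apply_saved_plan() {
  "$TF" apply -input=false -lock-timeout="${2:-5m}" "$1"
}
```

Waiting for the lock only queues runs one after another. It does not make them independent, so a plan created before another apply finished may still need to be regenerated.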

Terraform Workflows

Now that we know Terraform pipelines must run sequentially, let’s examine the two most common workflows.

Apply after Merge

This workflow makes it clear that the main branch always reflects the desired state of your infrastructure. The aim is to reach a 1:1 mapping between this desired state and the live state of the deployed resources.

Workflow

1. Open a PR that introduces some changes.
2. The CI/CD pipeline checks the code for correctness (e.g. terraform fmt, terraform validate, terraform test) and runs tests and third-party tooling. It also generates a Plan with terraform plan for each commit and makes it visible to reviewers.
3. Reviewers review the code and either request changes or approve the Pull Request.
4. Once the Pull Request gets merged back to main, the pipeline applies the latest reviewed Plan created in step 2 to deploy exactly the changes approved by the reviewers.
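
The two automated steps of this workflow can be sketched as a pair of shell functions. This is an illustration under assumptions, not a complete pipeline: it presumes terraform init has already run, that the CI system stores the plan file between the Pull Request run and the merge run, and the function names pr_checks and apply_after_merge are hypothetical.

```shell
set -eu

# Binary to invoke; set TF=tofu to use OpenTofu instead.
TF="${TF:-terraform}"

# pr_checks PLANFILE
# Runs on every commit of a Pull Request (step 2): static checks plus
# a saved plan that reviewers can inspect. Stops at the first failure.
pr_checks() {
  "$TF" fmt -check -recursive &&
    "$TF" validate &&
    "$TF" plan -input=false -out="$1"
}

# apply_after_merge PLANFILE
# Runs once the Pull Request is merged (step 4): applies exactly the
# saved plan. Terraform rejects a saved plan as stale if the state
# changed after the plan was created.
apply_after_merge() {
  "$TF" apply -input=false "$1"
}
```

Applying the saved plan file rather than planning again is what ties the deployment to the changes the reviewers actually saw.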

Benefits

- The main branch always acts as the single source of truth (and thus reflects the current state of your infrastructure).
- Conflicts between different in-flight PRs are addressed before the apply action, which prevents impacts on the live state of resources.
- CI/CD pipelines stay simpler because they avoid the complexity of pre-merge validations.

Cons

- Developers get feedback from the apply only after merging, which slows down debugging and issue resolution.
- Recovering from failed deployments post-merge can be challenging and disruptive. For example, if an apply fails after merging back to main, an additional Pull Request is needed. This can clutter the main branch with commits from iterative steps to correct the failed apply actions.

The beauty of the apply-after-merge workflow lies in its simplicity. No additional infrastructure or tooling is required to configure this workflow in most general-purpose CI/CD providers such as GitHub Actions, GitLab CI/CD, Bitbucket Pipelines, etc.

The Problem with Flaky Applies

The biggest drawback of the apply-after-merge workflow is that most errors in Terraform are not visible during the Plan phase. For example, permission problems, misconfigurations, or edge cases such as hitting quotas often surface only when running an apply and cannot be caught during the Plan phase.

The reason is that many providers cannot validate changes in depth before applying them. For example, the AWS provider is mostly capable of validating whether the configuration for an S3 bucket is correct before trying to apply it; however, the provider can’t be aware of any policies configured in your AWS account that prevent the creation of public buckets. If such a policy is in place, an apply will fail.

This often gives teams the false belief that a Pull Request that appears ready to merge will deploy without any issues, only to be unpleasantly surprised after hitting the merge button.

These possibilities make the apply phase an unpredictable part of the workflow. When an apply fails, the only ways to fix it are to re-run the pipeline (in case of networking issues such as a timeout) or to open a new Pull Request that fixes the misconfiguration. This sends developers through the entire loop again, which costs time and wears on developers' nerves.

Apply before Merge

Flaky applies are the main reason why some teams are adopting the apply-before-merge workflow, which allows them to apply changes inside a Pull Request by using a comment or any other trigger before merging changes back to the main branch.

This workflow deals with failures inside the PR if required and only then merges the new infrastructure code into the default branch. It promises to keep the main branch clean without the clutter of iterative steps.

Workflow

1. Open a PR that introduces some changes.
2. The CI/CD pipeline checks the code for correctness (e.g. terraform fmt, terraform validate, terraform test) and runs tests and third-party tooling. It also generates a Plan with terraform plan for each commit and makes it visible to reviewers.
3. A reviewer looks at the code changes and the plan file.
4. A reviewer or the author triggers an apply inside the PR (optionally, new plans can be triggered as well).
5. The pipeline fetches the persisted plan file from step 2 and runs terraform apply using the previously generated plan file.
6. Once terraform apply has run successfully, the PR gets merged into the main branch.
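
The comment-triggered part of this workflow (steps 4 and 5) could be sketched as a small dispatcher. The comment keywords /plan and /apply are hypothetical, as is the function name handle_pr_comment; we assume the CI system passes in the comment body and the path of the persisted plan file.

```shell
set -eu

# Binary to invoke; set TF=tofu to use OpenTofu instead.
TF="${TF:-terraform}"

# handle_pr_comment COMMENT PLANFILE
# Maps a Pull Request comment to a Terraform command. Comments that
# are not a known keyword are ignored with a note on stderr.
handle_pr_comment() {
  case "$1" in
    /plan)  "$TF" plan -input=false -out="$2" ;;
    /apply) "$TF" apply -input=false "$2" ;;
    *)      echo "ignoring comment: $1" >&2 ;;
  esac
}
```

Tools built for this workflow add more on top of such a dispatcher, for example locking and automatic merging, which this sketch deliberately leaves out.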

Benefits

- Only changes that have been successfully validated and applied are merged and integrated into the main branch.
- Errors and issues are caught before they reach the main branch, which keeps the git history of the main branch clean.
- Iteration speed increases, which is useful for fixing failed applies in specific environments.

Cons

- The main branch often falls behind the actual state of your cloud infrastructure, which goes against everything that GitOps has taught us. Also, what if there are multiple PRs? Which one represents the desired state?
- Developers start spending a lot of time rebasing branches to pull in the latest changes and avoid conflicts.
- Setting up and maintaining pipelines that support pre-merge applies takes more time and adds complexity.
- Additional tooling is needed to prevent race conditions and conflicts, which often leads to locking conflicts and, thus, a poor collaboration experience.
- RBAC or similar permission management is needed to control who can trigger plans and applies.

While apply-before-merge promises to overcome the problem of flaky applies, it comes with added complexity and costs. This means that whenever you want to apply changes from inside a Pull Request, you will face a whole new set of problems to solve:

- How do we prevent multiple Pull Requests from adding changes to the same resources and thus causing conflicts?
- How do we ensure that PRs are always merged after running an apply?
- How do we orchestrate commands such as terraform plan and terraform apply on events such as a comment inside the PR?
- How do we restrict the ability to run e.g. terraform apply to certain individuals and teams only?
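
To make the last question concrete, here is a deliberately simple sketch of a permission gate that a pipeline could run before an apply. The function name may_apply and the idea of passing a space-separated allowlist are our own assumptions; in practice you would more likely rely on the permission model of your CI/CD provider or a dedicated tool.

```shell
set -eu

# may_apply USER ALLOWED_USERS
# Succeeds only if USER appears in the space-separated ALLOWED_USERS
# list. Both values are assumed to be supplied by the CI system.
may_apply() {
  # An empty user name is never allowed.
  [ -n "$1" ] || return 1
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}
```

A pipeline step could then guard the apply with something like: may_apply "$PR_COMMENT_AUTHOR" "alice bob" || exit 1, where PR_COMMENT_AUTHOR is a hypothetical variable set by the CI system.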

Open-source and commercial tools such as Atlantis, Digger and Terrateam exist to overcome those challenges with features such as PR-level locks, autoplanning and automerging.

While it's great that tools exist to solve problems introduced by complex workflows, it remains debatable if you should consider the apply-before-merge workflow at all.

When to use apply-after-merge vs apply-before-merge

So now that we have learned about the differences between the apply-after-merge and apply-before-merge workflows, which one should you use? Take a look at the following comparison to understand the potential implications based on your environment in detail:

Workflow Decision Matrix

Based on the above-mentioned facts, we think that most teams should be using the apply-after-merge workflow as it is simple to adopt and manage, has less potential for conflicts, doesn’t require additional tooling and ensures that the main branch always acts as a single source of truth!

The problem of monolithic states

apply-before-merge tends to be more problematic with large, monolithic state files. For example, if you use apply-before-merge, a single Pull Request introducing changes to a Terraform state will lock out all other Pull Requests that touch the same state, using a concept called PR-level locks, to avoid conflicts. Thus, teams can only ever work on one Pull Request at a time whenever they work with a monolithic state file.

Unless you split up the state into multiple smaller state files, a workflow such as apply-before-merge has a high chance of blocking your engineering velocity.

In fact, splitting up state is considered a best practice for limiting blast radius, speeding up CI/CD runtimes and enabling better collaboration when working with large IaC code bases.
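
As a rough sketch of what working with split state can look like in a pipeline, the following function plans several directories one by one. It assumes that every directory is its own root module with its own backend key (and therefore its own state and lock) and has already been initialized; the function name plan_stacks and the directory names in the usage note are hypothetical.

```shell
set -eu

# Binary to invoke; set TF=tofu to use OpenTofu instead.
TF="${TF:-terraform}"

# plan_stacks DIR...
# Plans each directory separately and stops at the first failure.
# With -chdir, Terraform switches into the directory first, so the
# plan file tfplan is written inside that directory.
plan_stacks() {
  for dir in "$@"; do
    "$TF" -chdir="$dir" plan -input=false -out=tfplan || return 1
  done
}
```

For example, plan_stacks stacks/network stacks/database would produce one plan per stack. Because each stack locks only its own state, a change to the network stack does not block work on the database stack.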

Bonus: A lesson learned by looking at 5000 real-life IaC deployments

At Terramate, we work closely with our customers to determine the best workflows for their needs. Collectively, our team has reviewed more than 5000 Terraform deployments, and the trend is clear: apply-before-merge is more likely to create issues and slow down teams in the long run, not to mention the added complexity of needing additional tools.

With the apply-before-merge strategy, changes may apply successfully in multiple passes within a Pull Request, which gives a false sense of stability. At any point, code in the main branch that seemed stable can break due to external data sources like SSM parameters or less strict version pinning. Therefore, regardless of how you merge, identifying failures (e.g., via drift detection) is critical. By implementing apply-after-merge, you drastically reduce the likelihood of introducing cycles.
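
A basic form of drift detection can be sketched with the -detailed-exitcode flag of terraform plan, which exits with 0 when there are no changes, 1 on errors and 2 when changes are present. The function name detect_drift is hypothetical, and the sketch assumes it runs on a schedule against a branch whose code has already been applied, so that any reported change points to drift.

```shell
set -eu

# Binary to invoke; set TF=tofu to use OpenTofu instead.
TF="${TF:-terraform}"

# detect_drift
# Prints "clean", "drift" or "error" depending on the exit code of
# the plan. Terraform's own output is redirected to stderr so that
# stdout carries only the verdict.
detect_drift() {
  rc=0
  "$TF" plan -input=false -detailed-exitcode >&2 || rc=$?
  case "$rc" in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error"; return 1 ;;
  esac
}
```

A scheduled job could call detect_drift and notify the team whenever the verdict is anything other than clean.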

To sum up this article, here’s an anecdote about the Toyota Production System “andon cord.” In the nineties, Toyota had the most productive car factories in the world. Car executives from other car manufacturers visited the production facilities and could not believe that every person working on the assembly line could shut down the whole thing—what a waste of productivity.

But in reality, shutting down the whole thing, identifying problems and resolving them, one at a time, was one of the key success factors in keeping the assembly line running smoothly. The same principle applies to workflows in Infrastructure as Code. If the main branch is flaky, it absolutely has to be tackled first, and the root cause identified and resolved.

The above problem cannot occur when you apply after merge. The code that gets merged gets applied, and you have a nice, orderly sequence of events.

Summary

In this article, we learned about the different workflows available in Terraform and OpenTofu. The key takeaway of this article is that simplicity is a key driver of reliability and stability, which is why teams should focus on adopting the apply-after-merge workflow for managing their IaC deployments.

Thank you if you stayed with us until the end. If you are interested in learning how to set up apply-after-merge pipelines using general-purpose CI/CD systems, we recommend you look at our Terraform and OpenTofu CI/CD Blueprints, which are a collection of pre-configured and production-grade workflows to adopt the apply-after-merge workflow in GitHub Actions and GitLab CI/CD. Each blueprint comes with pre-configured pipelines for preview, deployment and drift detection and helps you get your Terraform and OpenTofu automation up and running in no time!

Also, we’d love to learn what workflow you are using and why. Feel free to join our Discord Community and share details about your setup.