Octoverse report, showing a 36% year-over-year growth.
With growing adoption, teams face the same questions over and over again:
“Shall we run terraform apply after merging a Pull Request back to main or before?”
In this article, we will examine the two different approaches and answer the ultimate question of life.
CI/CD pipelines for Infrastructure as Code tools such as Terraform and OpenTofu differ from pipelines used for applications because Terraform relies heavily on state, which describes the result of the latest deployment triggered with terraform apply.
Think of the state in Terraform and OpenTofu as the single source of truth that contains all information, such as configuration and metadata, but also sensitive values, such as secrets for your cloud infrastructure.
Teams usually use remote backends such as AWS S3 or Google Cloud Storage to version, secure and encrypt state files.
Unlike other CI pipelines, where each pipeline can run independently and in parallel, CI pipelines for Terraform read and modify shared state. This makes them unsuitable for parallelism when working with a single state file.
Here are some facts about the state in Terraform and its impact:
- The primary purpose of Terraform’s state is to store bindings between objects in a remote system and resource instances declared in your configuration.
- Because of that, Terraform’s state acts as a database to map Terraform configurations to the real world (deployed cloud infrastructure).
- Terraform state also tracks metadata such as resource dependencies.
- Terraform uses remote locking to prevent multiple users from accidentally running Terraform on the same configuration and state simultaneously. This ensures that each Terraform run begins with the most recently updated state.
- You can only invoke one terraform apply per state at a given time. By default, each apply locks the state exclusively for the current run, making it unsuitable for parallel deployments.
The stateful nature of Terraform makes every run of terraform apply potentially dependent on any other run. This means that, unlike traditional CI where every pipeline can run independently and in parallel, Terraform pipelines typically run sequentially and are thus blocking (you can only run one workflow that invokes terraform apply at a time).
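
The blocking behaviour described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: `StateLock` and `try_apply` are hypothetical names, and a `threading.Lock` stands in for the remote lock of a backend. It does not call Terraform at all; it only shows why a second apply against the same state is rejected while the first one is still running.

```python
import threading
from typing import Optional


class StateLock:
    """Toy stand-in for a backend's state lock (hypothetical, not Terraform's API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.holder: Optional[str] = None  # run that currently holds the lock

    def acquire(self, run_id: str) -> bool:
        # Non-blocking: a competing run is rejected instead of waiting.
        if self._lock.acquire(blocking=False):
            self.holder = run_id
            return True
        return False

    def release(self, run_id: str) -> None:
        if self.holder != run_id:
            raise RuntimeError(f"{run_id} does not hold the lock")
        self.holder = None
        self._lock.release()


def try_apply(lock: StateLock, run_id: str) -> str:
    """Simulate the start of an apply: proceed only if the state lock is free."""
    if not lock.acquire(run_id):
        return f"{run_id}: state is locked by {lock.holder}"
    return f"{run_id}: lock acquired"


lock = StateLock()
first = try_apply(lock, "run-1")
second = try_apply(lock, "run-2")  # rejected while run-1 holds the lock
lock.release("run-1")
third = try_apply(lock, "run-2")   # succeeds once the lock is released
```

In this model the second run only gets through after the first one releases the lock, which is the sequential behaviour the rest of this article builds on.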
Now that we know Terraform pipelines must run sequentially, let’s examine the two most common workflows.
The apply-after-merge workflow ensures that the main branch always reflects the desired state of your infrastructure. The aim is to reach a 1:1 mapping between this desired state and the live state of the deployed resources.
For each Pull Request, the CI pipeline runs checks (terraform fmt, terraform validate, terraform test), runs tests and third-party tooling. It also generates a Plan with terraform plan for each commit, visible to reviewers.
The beauty of the apply-after-merge workflow lies in its simplicity. No additional infrastructure or tooling is required to configure this workflow in most general-purpose CI/CD providers such as GitHub Actions, GitLab CI/CD, Bitbucket Pipelines, etc.
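
As a rough illustration of how little is needed, the following Python sketch lists the Terraform commands a CI job might run for each event in an apply-after-merge setup. The event labels `pull_request` and `merge_to_main`, the function name `pipeline_commands` and the plan file name `tfplan` are assumptions made for this example. The Terraform subcommands and flags themselves (`fmt -check`, `validate`, `test`, `plan -out`, `apply <planfile>`, `-input=false`) are standard CLI options.

```python
from typing import List

PLAN_FILE = "tfplan"  # hypothetical plan file name


def pipeline_commands(event: str) -> List[List[str]]:
    """Return the Terraform commands a CI job would run for a given event.

    `event` is an assumed label: "pull_request" for commits on a PR,
    "merge_to_main" for a merge into the main branch.
    """
    init = ["terraform", "init", "-input=false"]
    plan = ["terraform", "plan", "-input=false", f"-out={PLAN_FILE}"]

    if event == "pull_request":
        # Preview only: checks plus a plan for reviewers, never an apply.
        return [
            init,
            ["terraform", "fmt", "-check"],
            ["terraform", "validate"],
            ["terraform", "test"],
            plan,
        ]
    if event == "merge_to_main":
        # Deployment: plan again on main, then apply exactly that plan file.
        return [
            init,
            plan,
            ["terraform", "apply", "-input=false", PLAN_FILE],
        ]
    raise ValueError(f"unknown event: {event}")
```

A CI job could pass each inner list to `subprocess.run(..., check=True)` in order. The point of the sketch is that terraform apply appears only in the branch that handles merges to main.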
The biggest drawback of the apply-after-merge workflow is that most errors in Terraform are not visible during the Plan phase. For example, you often encounter issues such as permission problems, misconfigurations, or edge cases like hitting quotas when running an apply that cannot be caught during the Plan phase.
The reason is that many providers cannot validate changes in depth before applying them. For example, the AWS provider is mostly capable of validating whether the configuration for an S3 bucket is correct before trying to apply it; however, the provider can’t be aware of any policies configured in your AWS account that prevent the creation of public buckets. If such a policy is in place, an apply will fail.
This often gives teams the false belief that a Pull Request that appears ready to merge will deploy without any issues, only to be unpleasantly surprised after hitting the merge button.
These possibilities make the apply phase an unpredictable part of the workflow. When an apply fails, the only ways to fix it are to re-run the pipeline (in the case of transient problems such as a network timeout) or to open a new Pull Request that fixes the misconfiguration. This sends developers through the entire loop again, which costs time and wears on their nerves.
Flaky applies are the main reason why some teams are adopting the apply-before-merge workflow, which allows them to apply changes from inside a Pull Request, using a comment or any other trigger, before merging changes back to the main branch.
The apply-before-merge workflow deals with failures inside the PR where required and only then merges the new infrastructure code into the default branch. It promises to keep the main branch clean without the clutter of iterative steps.
- The CI pipeline runs checks (terraform fmt, terraform validate, terraform test), runs tests and third-party tooling. It also generates a Plan with terraform plan for each commit, visible to reviewers.
- terraform apply is run from inside the Pull Request using the previously generated plan file.
- Once terraform apply has run successfully, the PR gets merged into the main branch.
While apply-before-merge promises to overcome the problem of flaky applies, it comes with added complexity and costs. Whenever you want to apply changes from inside a Pull Request, you will face a whole new set of problems to solve:
- How do you trigger terraform plan and terraform apply on events such as a comment inside the PR?
- How do you restrict terraform apply to certain individuals and teams only?
Open source and commercial vendors such as Atlantis, Digger and Terrateam exist to provide tooling to overcome those challenges with features such as PR-level locks, autoplanning and automerging.
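
To make two of these problems concrete, here is a minimal Python sketch of comment-triggered runs with an authorization check. Everything in it is an assumption for illustration: the `/plan` and `/apply` comment syntax, the `APPLY_ALLOWED` allow-list and the function names are invented here and are not the interface of Atlantis, Digger or Terrateam.

```python
from typing import Optional, Set

# Hypothetical allow-list; in practice this would come from your VCS teams.
APPLY_ALLOWED: Set[str] = {"alice", "bob"}


def command_from_comment(body: str) -> Optional[str]:
    """Map a PR comment to a Terraform action, or None if it is not a command.

    The "/plan" and "/apply" syntax is made up for this sketch.
    """
    text = body.strip().lower()
    if text == "/plan":
        return "plan"
    if text == "/apply":
        return "apply"
    return None


def is_authorized(user: str, action: str) -> bool:
    """Anyone may request a plan; only allow-listed users may apply."""
    if action == "plan":
        return True
    return user in APPLY_ALLOWED


def handle_comment(user: str, body: str) -> str:
    """Decide what the automation should do with a single PR comment."""
    action = command_from_comment(body)
    if action is None:
        return "ignored"
    if not is_authorized(user, action):
        return f"denied: {user} may not run {action}"
    return f"run terraform {action}"
```

Even this toy version needs a comment parser, a permission model and an event handler, all of which you get for free from branch protection and merge permissions in the apply-after-merge workflow.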
While it's great that tools exist to solve problems introduced by complex workflows, it remains debatable whether you should consider the apply-before-merge workflow at all.
So now that we have learned about the differences between the apply-after-merge and apply-before-merge workflows, which one should you use? Take a look at the following comparison to understand the potential implications based on your environment in detail:
Based on the above-mentioned facts, we think that most teams should be using the apply-after-merge workflow, as it is simple to adopt and manage, has less potential for conflicts, doesn’t require additional tooling and ensures that the main branch always acts as a single source of truth!
apply-before-merge tends to be more problematic with monolithic and large state files. For example, if you use apply-before-merge, a single Pull Request introducing changes to a single Terraform state will lock out all other Pull Requests that touch the same state, using a concept called PR-level locks, to avoid conflicts. Thus teams can only ever work on one Pull Request at a time when working with a monolithic state file.
Unless you split up the state into multiple smaller state files, a workflow such as apply-before-merge has a high chance of blocking your engineering velocity.
In fact, splitting up state is considered a best practice for limiting blast radius, speeding up CI/CD runtimes and enabling better collaboration when working with large IaC code bases.
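
The effect of PR-level locks on monolithic versus split state can be sketched with a toy lock table in Python. The class `PrLocks`, the all-or-nothing locking rule and the state names used below are assumptions for this example; real tools implement their locking differently in the details.

```python
from typing import Dict, Iterable, Set


class PrLocks:
    """Toy PR-level lock table: each state can be held by one PR at a time."""

    def __init__(self) -> None:
        self._owner: Dict[str, int] = {}  # state name -> PR number holding it

    def try_lock(self, pr: int, states: Iterable[str]) -> Set[str]:
        """Lock every state a PR touches and return the states that are blocked.

        All-or-nothing: if any state is held by another PR, nothing is locked.
        An empty result means the PR now holds all requested states.
        """
        wanted = set(states)
        blocked = {s for s in wanted if self._owner.get(s, pr) != pr}
        if not blocked:
            for s in wanted:
                self._owner[s] = pr
        return blocked


# Monolithic state: the second PR is blocked by the first.
mono = PrLocks()
mono_first = mono.try_lock(1, ["infra"])
mono_second = mono.try_lock(2, ["infra"])

# Split state: PRs touching different states proceed independently.
split = PrLocks()
split_first = split.try_lock(1, ["network"])
split_second = split.try_lock(2, ["database"])
```

With a single state named `infra`, the second PR cannot proceed until the first is merged or closed. With separate `network` and `database` states, both PRs can be planned and applied side by side.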
At Terramate, we work closely with our customers to determine the best workflows for their needs. Collectively, our team has reviewed more than 5000 Terraform deployments, and the trend is clear: apply-before-merge is more likely to create issues and slow down teams in the long run, not to mention the added complexity of needing additional tools.
With the apply-before-merge strategy, changes may apply successfully in multiple passes within a Pull Request, which gives a false sense of stability. At any point, code in the main branch that seemed stable can break due to external data sources like SSM parameters or less strict version pinning. Therefore, regardless of how you merge, identifying failures (e.g., via drift detection) is critical. By implementing apply-after-merge, one drastically reduces the likelihood of introducing cycles.
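
One common way to implement the drift detection mentioned above is to run terraform plan on a schedule with the `-detailed-exitcode` flag, which exits with 0 when there are no changes, 1 on error and 2 when the plan contains changes. The Python sketch below wraps that convention. The function names and the labels `in_sync`, `drift` and `error` are assumptions for this example, and `detect_drift` expects an already initialized working directory.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Interpret the exit code of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "in_sync"  # plan succeeded and the diff is empty
    if code == 2:
        return "drift"    # plan succeeded and there are pending changes
    return "error"        # exit code 1 (or anything unexpected)


def detect_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report whether live resources have drifted.

    Assumes `terraform init` has already been run in that directory.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return classify_plan_exit(result.returncode)
```

A scheduled CI job could call `detect_drift` for each state on the main branch and raise an alert for any result other than `in_sync`.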
To sum up this article, here’s an anecdote about the Toyota Production System “andon cord.” In the nineties, Toyota had the most productive car factories in the world. Executives from other car manufacturers visited the production facilities and could not believe that every person working on the assembly line could shut down the whole thing. What a waste of productivity, they thought.
But in reality, shutting down the whole thing, identifying problems and resolving them, one at a time, was one of the key success factors in keeping the assembly line running smoothly. The same principle applies to workflows in Infrastructure as Code. If the main branch is flaky, it absolutely has to be tackled first, and the root cause identified and resolved.
The above problem cannot occur with apply-after-merge. The code that gets merged gets applied, and you have a nice, orderly sequence of events.
In this article, we learned about the different workflows available in Terraform and OpenTofu. The key takeaway is that simplicity is a key driver of reliability and stability, which is why teams should focus on adopting the apply-after-merge workflow for managing their IaC deployments.
Thank you if you stayed with us until the end. If you are interested in learning how to set up apply-after-merge pipelines using general-purpose CI/CD systems, we recommend you look at our Terraform and OpenTofu CI/CD Blueprints, a collection of pre-configured, production-grade workflows for adopting apply-after-merge in GitHub Actions and GitLab CI/CD. Each blueprint comes with pre-configured pipelines for preview, deployment and drift detection and helps you get your Terraform and OpenTofu automation up and running in no time!
Also, we’d love to learn what workflow you are using and why. Feel free to join our Discord Community and share details about your setup.