Development velocity relies upon a short feedback loop that allows the developer to immediately see the effects of changes, which is impossible with large-state projects. Moreover, most teams also have strict SLAs and the slowness of fixing things correctly through code means they will resort to “ClickOps” to resolve production issues, and the changes may never be reflected into the codebase. A good rule of thumb is that executing terraform plan
should never take significantly longer than making the change in code.
In a large-state project, it can be challenging for multiple team members to work simultaneously, as they might face conflicts and state-locking issues. It’s harder to isolate changes, leading to unintentional impacts on other parts of the infrastructure. Slow execution times are often compounded because the developer must wait for other applies to finish, leading to wasted time. Another common issue with large-state projects is that the developer runs a plan and notices that some resources they didn’t modify have planned changes, which begins a frustrating and time-consuming series of chat threads to try to identify who changed what, and whether it’s safe to continue with the apply.
Large-state projects are also dangerous for a variety of reasons. For one thing, the plan
output can often become unreadably large, even for trivial changes. "Flapping" resources (those planned changes that can safely be ignored, often due to provider bugs) can add to the noise. If developers aren't carefully reading the plan and are just trusting the changes, they made to the code are correct, then dangerous and unwanted changes can occur in production. Worse still, large-state projects allow no separation of responsibilities or principle of least privilege. As soon as someone is permitted to approve PRs and run an apply, they can potentially change things they have little understanding of, and that may be maintained by developers in other teams. Similarly, the permissions required for the CI/CD to run must, by necessity, be very broad with a large-state project because they are required to modify potentially anything. Finally, the larger the state, the larger the blast radius. Incorrect changes can be devastating and state corruption or conflicts can be more challenging to resolve.
Large-state projects can become complex and hard to manage as the number of resources and dependencies grows. The complexity can make it difficult to understand the entire infrastructure and can lead to errors.
Testing and debugging can be more challenging in a large-state project due to the sheer size and interconnectedness of the resources.
So, as you can see, there are a lot of disadvantages and challenges when grouping a large deployment into a single state file. Stay tuned for the next installment where we help you mitigate these issues. Also, if you liked this, and are interested in more like it, be sure to follow us on X (Formerly Twitter), and join our Discord channel!