Working with Terraform for over five years has taught me some key lessons. Five practices have been critical to keeping a Terraform setup sane and usable, regardless of the size of the team or the nature of the project.
This one might seem obvious, but I’ve seen it go wrong several times. When organising Terraform code, whether you are standardising the directory structure or defining naming conventions, it’s vital to consider the intended audience. Will your team be using these Terraform scripts and modules? Are you handing the work over to another team? Will new people be joining your team sooner or later? Are you working on this project solo? Will you be using this setup in 6 months or a year, or will it be assigned to someone else?
Questions like these will affect several decisions. Ideally, you should have Remote State and State Locking in place regardless of the team size now or in the future. Remote State will ensure your laptop is not the only place your Terraform works and State Locking will ensure that only one person at a time is changing the infrastructure.
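As a minimal sketch of what this can look like on AWS, assuming state is kept in an S3 bucket and locks in a DynamoDB table (the bucket, key, region, and table names below are hypothetical placeholders):

```hcl
terraform {
  backend "s3" {
    # Hypothetical bucket that stores the state file remotely.
    bucket = "example-terraform-state"
    # Path of this project's state file inside the bucket.
    key    = "network/terraform.tfstate"
    region = "eu-west-1"

    # Hypothetical DynamoDB table used for state locking, so that
    # only one apply can change the infrastructure at a time.
    dynamodb_table = "example-terraform-locks"

    # Encrypt the state file at rest.
    encrypt = true
  }
}
```

The bucket and the lock table have to exist before `terraform init` is run. Recent Terraform releases also offer other locking options for the S3 backend, so check the backend documentation for the version you use.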
The naming convention should make sense to the eventual owners of the project, not just the team that is writing the code. If the project is for another team, make sure they have a say in the naming convention. If non-technical stakeholders or internal security/GRC teams review the code, make sure they check the naming convention too. In addition to resource names, you should use resource tags to highlight any data classification or privacy requirements (high, medium, low) so that reviewers can examine those resources more carefully.
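A tagging scheme along these lines might look like the following sketch. The resource, the bucket name, and the tag keys (`Owner`, `DataClassification`) are hypothetical and would come from your own conventions:

```hcl
resource "aws_s3_bucket" "customer_exports" {
  # Hypothetical name following a <team>-<purpose> convention.
  bucket = "payments-customer-exports"

  tags = {
    # Team that owns the resource after handover.
    Owner = "payments-team"
    # Data classification that reviewers can filter on: high, medium, or low.
    DataClassification = "high"
  }
}
```

With a tag like this in place, a security reviewer can search a plan or the code base for `DataClassification = "high"` and focus on those resources first.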
The Terraform Registry provides a library of ready-to-use modules for the most common use-cases. I’ve written about the extensive parameterisation available in the VPC module and security groups. Simply calling modules with different parameters will be enough to handle most, if not all, potential use-cases. Reuse these shared modules as much as possible to avoid useless typing/testing/checking/fixing/refactoring.
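For illustration, calling the community VPC module from the Terraform Registry could look like the sketch below. The name, CIDR ranges, availability zones, and version constraint are assumed example values, not recommendations:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  # Illustrative version constraint; pin to whatever you have tested.
  version = "~> 5.0"

  # Hypothetical network layout for a two-AZ environment.
  name = "example-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["eu-west-1a", "eu-west-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # Give the private subnets outbound internet access.
  enable_nat_gateway = true
}
```

A second environment can reuse the same module call with a different `name`, `cidr`, and subnet list, which is usually far less work than maintaining hand-written VPC resources.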
I’ve also found that separating modules and resources based on the frequency of use or change is beneficial. For example, infrastructure scaffolding used only once belongs together, such as setting up the VPC, security groups, routing tables, VPC endpoints, etc. But things like private hosted zone entries, autoscaling groups, target groups, load balancers, etc., might change with every deployment, so separating these from the one-time scaffolding will make code reviews easier and debugging faster.
Terraform code often contains incorrect assumptions baked into it. Teams assume that the Terraform version used to write the code today will never change, or the external modules won’t change, or the providers they are using won’t change. These lead to invisible issues a few weeks down the road when these external dependencies inevitably get updated.
Ensure you are explicitly defining versions everywhere possible: in the main `terraform` block, in the provider requirements, in each module block, and so on. Defining versions will ensure that your dependencies stay frozen, so that you can update them deliberately after thorough discussions, reviews, and testing.
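A minimal sketch of pinning at each of these levels is shown below. All version numbers, the module inputs, and the VPC ID are placeholders chosen for illustration, so substitute the versions you have actually tested:

```hcl
terraform {
  # Pin the Terraform CLI itself (placeholder version).
  required_version = "~> 1.5.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Pin the provider (placeholder version constraint).
      version = "~> 5.0"
    }
  }
}

module "security_group" {
  source = "terraform-aws-modules/security-group/aws"
  # Pin the external module to an exact release (placeholder version).
  version = "5.1.0"

  # Hypothetical inputs.
  name   = "example-sg"
  vpc_id = "vpc-12345678"
}
```

In current Terraform versions, provider constraints are declared in `required_providers` rather than inside the `provider` block itself. Committing the `.terraform.lock.hcl` file that `terraform init` generates additionally records the exact provider versions that were selected.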
Leveraging automation at every stage of the deployment process can avoid future problems before they even arise.
Use Git pre-commit hooks to run `terraform fmt` and `terraform validate` before you commit your code. Pre-commit hooks ensure that code is, at a bare minimum, properly formatted and syntactically correct. Commit the pre-commit configuration to the repo so that everyone on your team benefits from the same automation. This small but vital quality control at the first step of the process can save substantial time as your project progresses.
All modern deployment tools have CI processes. You can use these to run SAST and unit testing tools when pushing your code to origin. I’ve written about how Checkov can test Terraform code for security and compliance and create custom checks for organisation-specific conventions. Add these unit testing tools to your CI pipeline to improve code quality and robustness.
We all like to think that Terraform code is self-documenting. Sure it is, but only if your future team already knows your company’s naming conventions and guidelines and secret handshakes and inside jokes and whatever else your repo contains besides valid Terraform code. Getting into the habit of having a good `README.md` can be a huge time saver, and it will keep your team honest by holding them accountable for everything explicitly committed to in the README.
At a minimum, your README should contain the steps to initialise the right Terraform environment on your workstations (Mac, Windows, Linux, etc.), including the Terraform version to install. It should specify the required dependencies (Checkov, Terragrunt, etc.) with versions and any handy shell aliases your team uses (some people like to define `tff` as shorthand for `terraform fmt`). Most importantly, it should specify the branching and PR review process, the naming conventions, and the resource tagging standards.
The README should pass a simple test: if a new member joins your team tomorrow, is the README enough to teach them what to do and how to do it correctly? If not, you may find yourself hosting the same standards and process meetings over and over for the next few months.