Terraform Skeleton Part 1: Organizing Terragrunt
Terraform is my go-to infrastructure definition tool. I love that it enables declarative management of different Cloud, PaaS, and SaaS platforms with its unified HCL language and provider model. At times, however, I’ve wished it was easier to start a project off on the right foot.
When I was getting started, it wasn’t obvious to me how to organize a terraform project in a way that makes “the easy things easy and the hard things possible” and would scale well as a team grows. Terragrunt helps a great deal, but there are many configuration options, adding to the organizational complexity.
This blog series takes lessons learned from using terraform and terragrunt in production across multiple teams and, over many posts, condenses those lessons into a walking skeleton repo to serve as a starting point for teams working with terraform and terragrunt. I’ve used this skeleton’s approach for a moderately complex suite of applications spanning multiple clouds and multiple Kubernetes clusters. I’m also using it with a smaller team running a much simpler monolith application. I believe it is a solid starting point, worth sharing with others for consideration.
I’m using semantic versioning to track changes to the skeleton repo. This post covers major-minor version 1.0. The code this post arrives at is available on the release branch for that version here.
Goals
- Version control all infrastructure as code
- Provide a basic safety net of explicit tool versions and pre-commit checks
- Re-use the same infrastructure definition files across multiple environments
- Have multiple levels of blast radius; allow the engineer to easily apply a small stack of infrastructure or an entire environment
- Focus on manual infrastructure deployment for now. Continuous Integration will follow. Continuous Delivery will be case-by-case.1
Setup
You’ll need the following installed on your workstation:
- git
- tfenv for managing terraform versions2
- tgenv for managing terragrunt versions
- pre-commit for running syntax, semantic, and style checks on `git commit`
The ‘infra’ repo
I currently start teams off with a monorepo structure for terraform infrastructure. I admit that having one or more separate modules repositories, as Terraform Up & Running advises, is a good destination, but my teams have found it easier to get started making changes within a single repository. The organization that follows works both with mono and distributed repositories, however. Teams I support have made the jump from monorepo to distributed, and it was not painful.3
Our team calls our monorepo `infra`, and that's the name I will use here.
Step 1: Boilerplate
After installing the tools listed above and creating a new `infra` repository:
- Create a `CHANGELOG.md`

  I like the style of keepachangelog.com.
- Add a `.gitignore`

  I use gitignore.io as a starting point.
- Set tool versions

  For instance, if using terraform 0.13.5 and terragrunt 0.26.4, set the versions with:

  ```shell
  echo "0.13.5" > .terraform-version
  echo "0.26.4" > .terragrunt-version
  ```

  Then install those versions with:

  ```shell
  tfenv install
  tgenv install
  ```

  Commit and update your changelog accordingly, e.g.:

  ```markdown
  ## [Unreleased]

  ### Added

  - terraform version set to 0.13.5
  - terragrunt version set to 0.26.4
  ```
- Install pre-commit hooks

  These hooks provide a safety net that gets executed on each `git commit`. I'd start with at least the following:4

  - terraform_fmt formats `*.tf` files using the `terraform fmt` command
  - terraform_validate similarly runs `terraform validate`
  - terragrunt-hclfmt runs `terragrunt hclfmt` on `terragrunt.hcl` files

  You specify these in a `.pre-commit-config.yaml` file. Here's what it looks like. Next, run `pre-commit install` to install the hooks.
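For reference, a `.pre-commit-config.yaml` along these lines could look like the following sketch. The hooks come from the community antonbabenko/pre-commit-terraform and gruntwork-io/pre-commit projects; the `rev` values are illustrative, so pin whichever releases you verify:

```yaml
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.45.0          # illustrative; pin a verified release
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
  - repo: https://github.com/gruntwork-io/pre-commit
    rev: v0.1.10          # illustrative; pin a verified release
    hooks:
      - id: terragrunt-hclfmt
```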
Step 2: Directory Structure
Covering the directory structure introduces a lot of terms. I start with two top-level directories:
```
deployments/
modules/
```
Deployments
Deployments are instantiations of infrastructure. Directories under `deployments` contain `terragrunt.hcl` files. You run `terragrunt` commands inside deployment directories.5
Deployments are organized in a directory tree. The first level in the tree is the infrastructure tiers. Small projects may have just a single tier of infrastructure. For larger projects, there are often multiple tiers relying or building on each other. Sometimes these tiers are managed by different teams. For instance, my team has a `foundation` tier containing shared services like our docker image registry and CI/CD tooling, and a `service` tier containing application infrastructure:
```
deployments/
  foundation/
  service/
```
Each tier consists of one or more instantiations called environments. For instance: development, staging, and production.
```
deployments/
  foundation/
    dev/
    stage/
    prod/
  service/
    dev/
    stage/
    prod/
```
Underneath each environment, you have stacks and/or layers of stacks that make up that tier-environment. Layers are optional directories grouping stacks together. Stacks are bundles of infrastructure that are managed as a unit using terragrunt commands.
For instance, our foundation tier might consist of a `network` and a `k8s` stack, and a `jenkins` stack underneath an `apps` layer:
```
deployments/
  foundation/
    dev/
      network/
        terragrunt.hcl
      k8s/
        terragrunt.hcl
      apps/
        jenkins/
          terragrunt.hcl
  root.hcl
```
Bringing it all together, each deployment is an instantiation of a stack for a tier-environment. For example, the jenkins stack for the foundation tier's dev environment, which my team would summarize as foundation-dev-jenkins. To manage foundation-dev-jenkins, you run `terragrunt` commands from the `deployments/foundation/dev/apps/jenkins` directory.
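For example, assuming the tree above, managing the foundation-dev-jenkins deployment looks like this (illustrative commands, run from the repo root):

```
cd deployments/foundation/dev/apps/jenkins
terragrunt plan    # preview changes for just this stack
terragrunt apply   # apply just this stack
```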
Deployment directories do not contain the infrastructure definition (`.tf`) files themselves, however. Instead, each deployment references a stack underneath `modules/stacks/`. This allows us to have a single stack definition deployed multiple times, across multiple environments for example.
Modules
Expanding `modules`, I have:
```
deployments/
modules/
  components/
  stacks/
```
Stacks are root terraform modules and group together infrastructure that is managed as a unit. Stacks are defined as directories under `modules/stacks/`, organized by tier, and contain the `*.tf` files defining the stack's infrastructure. Using our foundation example from above, `modules/stacks` looks like this:
```
modules/
  stacks/
    foundation/
      network/
        ...
      k8s/
        ...
      jenkins/
        main.tf
        outputs.tf
        providers.tf
        variables.tf
```
Stacks may contain terraform provider resources directly (e.g., `aws_s3_bucket`), but often will contain child terraform modules, which I call components.
Any components specific to this `infra` repository reside in directories under `modules/components`, and those directories similarly contain `*.tf` files defining the infrastructure belonging to the component. For example:
```
modules/
  components/
    some-component/
      main.tf
      variables.tf
      outputs.tf
    another-component/
      ...
```
Summarized, each deployment references a stack, and each stack may instantiate any number of components.
You probably noticed the `terragrunt.hcl` and `root.hcl` files in the example above. Let's talk about those next.
Step 3: HCL files
Each deployment contains a `terragrunt.hcl` file, marking it as a stack to be deployed with terragrunt. These files are usually extremely simple, containing only an include of the `root.hcl` file.6
```hcl
# terragrunt.hcl
include {
  path = find_in_parent_folders("root.hcl")
}
```
The `root.hcl` file contains all the terragrunt configuration. A simple starting point is:
```hcl
# root.hcl
locals {
  root_deployments_dir       = get_parent_terragrunt_dir()
  relative_deployment_path   = path_relative_to_include()
  deployment_path_components = compact(split("/", local.relative_deployment_path))

  tier  = local.deployment_path_components[0]
  stack = reverse(local.deployment_path_components)[0]
}

# Default the stack each deployment deploys based on its directory structure.
# Can be overridden by redefining this block in a child terragrunt.hcl.
terraform {
  source = "${local.root_deployments_dir}/../modules/stacks/${local.tier}/${local.stack}"
}
```
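As the comment notes, a deployment can opt out of the convention by redefining the `terraform` block in its own `terragrunt.hcl`. A hypothetical example (the `jenkins-v2` stack name is purely illustrative):

```hcl
# deployments/foundation/dev/apps/jenkins/terragrunt.hcl
include {
  path = find_in_parent_folders("root.hcl")
}

# Override the default source derived in root.hcl (hypothetical stack name)
terraform {
  source = "${get_parent_terragrunt_dir()}/../modules/stacks/foundation/jenkins-v2"
}
```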
This makes it so each deployment directory, by default, deploys the stack under `modules/stacks/<tier>` with the same name as the deployment directory. For instance, `deployments/foundation/dev/apps/jenkins/` deploys `modules/stacks/foundation/jenkins`.7 I haven't decided whether I like making the layer part of that path, so thus far I have left it out.
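To make the convention concrete, here is a small Python sketch (not part of the repo; the function name and example paths are mine) that mirrors the tier/stack derivation the `root.hcl` locals perform:

```python
def stack_source(relative_deployment_path: str, root_deployments_dir: str) -> str:
    """Mirror the root.hcl locals: derive tier and stack from a deployment path."""
    # compact(split("/", path)) in HCL drops empty components
    components = [part for part in relative_deployment_path.split("/") if part]
    tier = components[0]    # first path component
    stack = components[-1]  # reverse(...)[0], i.e., the last component
    return f"{root_deployments_dir}/../modules/stacks/{tier}/{stack}"

# foundation-dev-jenkins resolves to the jenkins stack of the foundation tier;
# the apps/ layer directory drops out of the resolved source path.
print(stack_source("foundation/dev/apps/jenkins", "/repo/deployments"))
# → /repo/deployments/../modules/stacks/foundation/jenkins
```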
With an understanding of the tools, directory structure, and HCL files, we can put together a minimal skeleton repo.
Step 4: The Skeleton
Here’s a minimal skeleton I would start with, not including boilerplate files:
```
deployments/
  app/
    dev/
      test-stack/
        terragrunt.hcl
    stage/
      test-stack/
        terragrunt.hcl
    prod/
      test-stack/
        terragrunt.hcl
  root.hcl
modules/
  components/
  stacks/
    app/
      test-stack/
        main.tf
        outputs.tf
        providers.tf
```
Our basic test-stack could use the random provider to create a single resource and output:
```hcl
# main.tf
resource "random_pet" "pet" {
}
```

```hcl
# outputs.tf
output "pet" {
  value = random_pet.pet.id
}
```

```hcl
# providers.tf
terraform {
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0.0"
    }
  }
}
```
With this in place, we can run `terragrunt plan` and `terragrunt apply` from any directory under `deployments/`. We can also run `plan-all` and `apply-all` from:

- The root of the repository to affect all tiers and all environments
- Any directory within `deployments` to affect everything underneath that directory

We can also use `--terragrunt-exclude-dir` and `--terragrunt-include-dir` to target `*-all` commands. For example:

```shell
terragrunt apply-all --terragrunt-exclude-dir deployments/*/prod/**
```
would apply all non-production environments across all tiers.
What’s next?
This skeleton is quite minimal. We haven’t covered variables, backends, RBAC, external modules, build tooling, continuous integration, and so on.
The next post will dive deeper into what we can do with `root.hcl`, including introducing a variable loading hierarchy that makes our deployments easier to work with.
Footnotes
1. In short, CI/CD with terraform is tricky. The authors of terragrunt discuss some of the challenges here. I'm a proponent of pushing CI/CD for infrastructure, but doing so carefully and expecting times of manual intervention. Some things are difficult to automate out of the box, state migrations (e.g., `terraform state mv`) being one example. I've trended towards starting teams off with CI and manual applies of infrastructure and working towards CD as the team becomes more comfortable. More on CI/CD in future posts. ↩

2. Terraform stores its state in state files, and those state files contain the version of terraform last used. The version specified in the state file is auto-bumped whenever a newer version of terraform touches that state file. Once that happens, you may force anyone working on that infrastructure, or relying on outputs from your state file via terraform_remote_state data sources, to upgrade. The second case is the most damaging because it can affect teams outside of your own, depending on who's using your outputs. terraform 0.14 addresses this, making state files backward and forward compatible across terraform versions as much as possible, but anyone using older versions of terraform should be particularly careful with terraform versioning. ↩

3. We've since split our infrastructure across multiple repositories, but only after demonstrated needs: distributed ownership, differing change/release rates, and re-use across teams. Much like starting with a monolith before evolving to microservices, starting with an infra monorepo avoids bloat slowing your team unnecessarily. ↩

4. There are lots of other tools/hooks worth looking at that I haven't had time for yet. Some on my list to explore include: ↩

5. If you've read Terraform Up & Running, my `deployments/` directory equates to Yevgeniy's `live/` directory. My team just found the name deployments to be more understandable. ↩

6. I use the name `root.hcl` instead of `terragrunt.hcl` because the latter causes errors running `plan-all` and `apply-all` from the root or `deployments/` directory. Terragrunt seems to treat the parent hcl file as a stack to be deployed and errors out. I tried terragrunt's skip option, but to no avail, at least as of version 0.26.2. ↩

7. Terragrunt glues the deployment to the stack definition by copying everything in the deployment directory and everything in the referenced stack directory (containing the `*.tf` files) into a `.terragrunt-cache` directory and then executing commands from that directory. By default, the `.terragrunt-cache` directory lives in your deployment directory. ↩