Terraform is my go-to infrastructure definition tool. I love that it enables declarative management of different Cloud, PaaS, and SaaS platforms with its unified HCL language and provider model. At times, however, I’ve wished it was easier to start a project off on the right foot.

When I was getting started, it wasn’t obvious to me how to organize a terraform project in a way that makes “the easy things easy and the hard things possible” and would scale well as a team grows. Terragrunt helps a great deal, but there are many configuration options, adding to the organizational complexity.

This blog series takes lessons learned from using terraform and terragrunt in production across multiple teams and, over many posts, condenses those lessons into a walking skeleton repo to serve as a starting point for teams working with terraform and terragrunt. I’ve used this skeleton’s approach for a moderately complex suite of applications spanning multiple clouds and multiple Kubernetes clusters. I’m also using it with a smaller team running a much simpler monolith application. I believe it is a solid starting point, worth sharing with others for consideration.

I’m using semantic versioning to track changes to the skeleton repo. This post covers major-minor version 1.0. The code this post arrives at is available on the release branch for that version.

Goals

  1. Version control all infrastructure as code
  2. Provide a basic safety net of explicit tool versions and pre-commit checks
  3. Re-use the same infrastructure definition files across multiple environments
  4. Have multiple levels of blast radius; allow the engineer to easily apply a small stack of infrastructure or an entire environment
  5. Focus on manual infrastructure deployment for now. Continuous Integration will follow. Continuous Delivery will be case-by-case.1

Setup

You’ll need the following installed on your workstation:

  • git
  • tfenv for managing terraform versions2
  • tgenv for managing terragrunt versions
  • pre-commit for running syntax, semantic, and style checks on git commit

The ‘infra’ repo

I currently start teams off with a monorepo structure for terraform infrastructure. I admit that having one or more separate modules repositories, as Terraform Up & Running advises, is a good destination, but my teams have found it easier to get started making changes within a single repository. The organization that follows works with both mono and distributed repositories, however. Teams I support have made the jump from monorepo to distributed, and it was not painful.3

Our team calls our monorepo infra, and that’s the name I will use here.

Step 1: Boilerplate

After installing the tools listed above and creating a new infra repository:

  1. Create a CHANGELOG.md

    I like the style of keepachangelog.com.

  2. Add a .gitignore

    I use gitignore.io as a starting point.

  3. Set tool versions

    For instance, if using terraform 0.13.5 and terragrunt 0.26.4, set the versions with:

     echo "0.13.5" > .terraform-version
     echo "0.26.4" > .terragrunt-version
    

    Then install the versions with:

     tfenv install
     tgenv install
    

    Commit and update your changelog accordingly, e.g.:

     ## [Unreleased]
    
     ### Added
     - terraform version set to 0.13.5
     - terragrunt version set to 0.26.4
    
  4. Install pre-commit hooks

    These hooks provide a safety net that gets executed on each git commit.4

    You specify the hooks in a .pre-commit-config.yaml file at the root of the repository.
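    A typical configuration might look like the following. The specific hooks and pinned revisions here are my suggestion, drawing on the widely used pre-commit-hooks and pre-commit-terraform repositories; adjust the revs to current releases:

     # .pre-commit-config.yaml (illustrative)
     repos:
       - repo: https://github.com/pre-commit/pre-commit-hooks
         rev: v3.3.0
         hooks:
           - id: check-merge-conflict
           - id: end-of-file-fixer
           - id: trailing-whitespace
       - repo: https://github.com/antonbabenko/pre-commit-terraform
         rev: v1.45.0
         hooks:
           - id: terraform_fmt
           - id: terraform_validate
           - id: terragrunt_fmt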

    Next, run pre-commit install to install the hooks.

Step 2: Directory Structure

Covering the directory structure introduces a lot of terms. I start with two top-level directories:

deployments/
modules/

Deployments

deployments are instantiations of infrastructure. Directories under deployments contain terragrunt.hcl files. You run terragrunt commands inside deployment directories.5

Deployments are organized in a directory tree. The first level in the tree is the infrastructure tiers. Small projects may have just a single tier of infrastructure. For larger projects, there are often multiple tiers that rely on or build on each other. Sometimes these tiers are managed by different teams. For instance, my team has a foundation tier containing shared services like our docker image registry and CI/CD tooling, and a service tier containing application infrastructure:

deployments/
    foundation/
    service/

Each tier consists of one or more instantiations called environments. For instance: development, staging, and production.

deployments/
    foundation/
        dev/
        stage/
        prod/
    service/
        dev/
        stage/
        prod/

Underneath each environment, you have stacks and/or layers of stacks that make up that tier-environment. Layers are optional directories grouping stacks together. Stacks are bundles of infrastructure that are managed as a unit using terragrunt commands.

For instance, our foundation tier might consist of a network and k8s stack, and a jenkins stack underneath an apps layer:

deployments/
    foundation/
        dev/
            network/
                terragrunt.hcl
            k8s/
                terragrunt.hcl
            apps/
                jenkins/
                    terragrunt.hcl
    root.hcl

Bringing it all together, each deployment is an instantiation of a stack for a tier-environment. For example, the jenkins stack for the foundation tier’s dev environment, which my team would summarize as foundation-dev-jenkins. To manage foundation-dev-jenkins, you run terragrunt commands from the deployments/foundation/dev/apps/jenkins directory.

Deployment directories do not contain the infrastructure definition (.tf) files themselves, however. Instead, each deployment references a stack underneath modules/stacks/. This allows us to have a single stack definition deployed multiple times - across multiple environments for example.

Modules

Expanding modules, I have:

deployments/
modules/
    components/
    stacks/

stacks are root terraform modules and group together infrastructure that is managed as a unit. Stacks are defined as directories under modules/stacks/, organized by tier, and contain the *.tf files defining the stack’s infrastructure. Using our foundation example from above, modules/stacks looks like this:

modules/
    stacks/
        foundation/
            network/
                ...
            k8s/
                ...
            jenkins/
                main.tf
                outputs.tf
                providers.tf
                variables.tf

Stacks may contain terraform provider resources directly (e.g., aws_s3_bucket), but often will contain child terraform modules, which I call components.

Any components specific to this infra repository reside in directories under modules/components, and those directories similarly contain *.tf files defining the infrastructure belonging to the component. For example:

modules/
    components/
        some-component/
            main.tf
            variables.tf
            outputs.tf
        another-component/
            ...

Summarized, each deployment references a stack, and each stack may instantiate any number of components.
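
As a sketch, a stack’s main.tf might instantiate a component like this (the some-component module and its inputs are hypothetical):

 # modules/stacks/foundation/jenkins/main.tf (illustrative)
 module "some_component" {
   # Relative path from the stack directory to the component directory
   source = "../../../components/some-component"

   # Hypothetical inputs defined by the component's variables.tf
   name        = "jenkins"
   environment = var.environment
 }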

You probably noticed the terragrunt.hcl and root.hcl files in the example above. Let’s talk about those next.

Step 3: HCL files

Each deployment contains a terragrunt.hcl file, marking it as a stack to be deployed with terragrunt. These files are usually extremely simple, often containing only an include of the root.hcl file.6

# terragrunt.hcl
include {
  path = find_in_parent_folders("root.hcl")
}

The root.hcl file contains all the terragrunt configuration. A simple starting point is:

# root.hcl
locals {
  root_deployments_dir       = get_parent_terragrunt_dir()
  relative_deployment_path   = path_relative_to_include()
  deployment_path_components = compact(split("/", local.relative_deployment_path))

  tier  = local.deployment_path_components[0]
  stack = reverse(local.deployment_path_components)[0]
}

# Default the stack each deployment deploys based on its directory structure
# Can be overridden by redefining this block in a child terragrunt.hcl
terraform {
  source = "${local.root_deployments_dir}/../modules/stacks/${local.tier}/${local.stack}"
}

This makes it so each deployment directory, by default, deploys the stack under modules/stacks/<tier> with the same name as the deployment directory. For instance, deployments/foundation/dev/apps/jenkins/ deploys modules/stacks/foundation/jenkins.7 I haven’t decided whether I like making the layer part of that path, so I’ve left it out thus far.
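
For the occasional deployment whose stack doesn’t match its directory name, the terraform block can be redefined in the child terragrunt.hcl, as noted above (the stack path below is hypothetical):

 # terragrunt.hcl
 include {
   path = find_in_parent_folders("root.hcl")
 }

 # Override the default stack resolution from root.hcl
 terraform {
   source = "${get_parent_terragrunt_dir()}/../modules/stacks/foundation/some-other-stack"
 }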

With an understanding of the tools, directory structure, and HCL files, we can put together a minimal skeleton repo.

Step 4: The Skeleton

Here’s a minimal skeleton I would start with, not including boilerplate files:

deployments/
    app/
        dev/
            test-stack/
                terragrunt.hcl
        stage/
            test-stack/
                terragrunt.hcl
        prod/
            test-stack/
                terragrunt.hcl
    root.hcl
modules/
    components/
    stacks/
        app/
            test-stack/
                main.tf
                outputs.tf
                providers.tf

Our basic test-stack could use the random provider to create a single resource and output:

# main.tf
resource "random_pet" "pet" {
}
# outputs.tf
output "pet" {
  value = random_pet.pet.id
}
# providers.tf
terraform {
  required_providers {
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0.0"
    }
  }
}

With this in place, we can run terragrunt plan and terragrunt apply from any stack directory under deployments/ (i.e., any directory containing a terragrunt.hcl). We can also run plan-all and apply-all from:

  1. The root of the repository to affect all tiers and all environments
  2. Any directory within deployments to affect everything underneath that directory

We can also use --terragrunt-exclude-dir and --terragrunt-include-dir to target *-all commands. For example:

terragrunt apply-all --terragrunt-exclude-dir deployments/*/prod/**

would apply all non-production environments across all tiers.

What’s next?

This skeleton is quite minimal. We haven’t covered variables, backends, RBAC, external modules, build tooling, continuous integration, and so on.

The next post will dive deeper into what we can do with root.hcl, including introducing a variable loading hierarchy that makes our deployments easier to work with.

Footnotes

  1. In short, CI/CD with terraform is tricky. The authors of terragrunt discuss some of the challenges here. I’m a proponent of pushing CI/CD for infrastructure, but doing so carefully and expecting times of manual intervention. Some things are difficult to automate out of the box - state migrations (e.g., terraform state mv) being one example. I’ve trended towards starting teams off with CI and manual applies of infrastructure and working towards CD as the team becomes more comfortable. More on CI/CD in future posts. 

  2. Terraform stores its state in state files and those state files contain the version of terraform last used. The version specified in the state file is auto-bumped whenever a newer version of terraform touches that state file. Once that happens, you may force anyone working on that infrastructure, or relying on outputs from your state file via terraform_remote_state data sources to upgrade. The second case is the most damaging because it can affect teams outside of your own, depending on who’s using your outputs. terraform 0.14 addresses this, making state files backward and forward compatible across terraform versions as much as possible, but anyone using older versions of terraform should be particularly careful with terraform versioning. 
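     One mitigation (my suggestion, not something the skeleton requires) is to pin the exact terraform version in each stack, so a stray newer binary fails fast instead of silently bumping the state file:

      # providers.tf (illustrative)
      terraform {
        # Matches the version in .terraform-version; a newer binary will
        # refuse to run rather than rewrite the state file's version.
        required_version = "= 0.13.5"
      }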

  3. We’ve since split our infrastructure across multiple repositories, but only after demonstrated needs: distributed ownership, differing change/release rates, and re-use across teams. Much like starting with a monolith before evolving to microservices, starting with an infra monorepo avoids bloat slowing your team unnecessarily. 

  4. There are lots of other tools/hooks that are worth looking at that I haven’t had time to explore yet.

  5. If you’ve read Terraform Up & Running, my deployments/ directory equates to Yevgeniy’s live/ directory. My team just found the name deployments to be more understandable. 

  6. I use the name root.hcl instead of terragrunt.hcl because the latter causes errors running plan-all and apply-all from the root or deployments/ directory. Terragrunt seems to treat the parent hcl file as a stack to be deployed and errors out. I tried terragrunt’s skip option, but to no avail, at least as of version 0.26.2. 

  7. Terragrunt glues the deployment to the stack definition by copying everything in the deployment directory and everything in the referenced stack directory (containing the *.tf files) into a .terragrunt-cache directory and then executing commands from that directory. By default, the .terragrunt-cache directory lives in your deployment directory.