Part 5 moved Terraform’s operational infrastructure into a CloudFormation stack so that Terragrunt no longer manages the state bucket, lock table, and log bucket. Doing so offers us the opportunity to protect these resources in ways not supported by Terragrunt. We will capitalize on that opportunity today.

We’ll cover adding a bucket policy to our CloudFormation template to restrict access to Terraform’s state, leveraging the backend role and Terraformer principal tag introduced in Part 4. I hope my example will ease your navigation of IAM policy syntax and save you the pain of deciphering unhelpful IAM error messages.

We’ll also add some basic protections to the log bucket that Terragrunt fails to implement out of the box.

Goals

  • Grant only authorized principals access to Terraform state
  • Protect the log bucket in a similar fashion to the state bucket
  • Remove unnecessary permissions from the backend role

If you prefer to jump to the end, the code implementing this post’s final result is available on branch release/1.5 on GitHub. Additionally, you can view the diffs from Part 5, if that’s more your speed.

Access Roles

There are three types of authorized access to Terraform state we will cover today.

First is our backend role, which Terragrunt configures Terraform to use for all state operations via the root.hcl from Part 4, reprinted here:

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket   = "terraform-skeleton-state"
    region   = "us-east-1"
    encrypt  = true
    role_arn = "arn:aws:iam::${get_aws_account_id()}:role/terraform/TerraformBackend"

    key = "${dirname(local.relative_deployment_path)}/${local.stack}.tfstate"

    dynamodb_table            = "terraform-skeleton-state-locks"
    accesslogging_bucket_name = "terraform-skeleton-state-logs"
  }
}

Second are humans, who may wish to interact with the state for any number of reasons. One example is changing the name of an existing stack (which in our setup will change the name of the state file).

I like to differentiate between developer-level and administrative-level access to Terraform state, the primary difference being that only administrators should be allowed to delete state files permanently. Deleting a state file is not something to take lightly. If you accidentally delete one you still need, and you have no backups, your only choice will be to reconstruct it by hand.

The policy we implement today will grant:

  • Full access to administrative IAM users, including the AWS account’s root user
  • Limited access to developer IAM users and the backend role
  • No access to anyone else

State Bucket Policy

We will use an S3 bucket policy to implement the above restrictions. An S3 bucket policy is an ideal choice for two reasons. First, it directly applies permissions to the resource we want to protect (the state bucket). Second, developers can likely create it themselves, unlike adding permissions to IAM users, something typically controlled by an enterprise team.

We’ll add the bucket policy as a resource to our init-admin-account.cf.yml CloudFormation template. We’ll break the policy down statement by statement.

The first statement requires TLS encryption for any request accessing Terraform state. Terragrunt creates this statement for you; we preserve it here.

  TerraformStateBucketPolicy:
    Type: 'AWS::S3::BucketPolicy'
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      Bucket: !Ref TerraformStateBucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: 'AllowTLSRequestsOnly'
            Principal: '*'
            Condition:
              Bool:
                'aws:SecureTransport': false
            Effect: Deny
            Action: '*'
            Resource:
              - !GetAtt "TerraformStateBucket.Arn"
              - !Sub
                - "${Bucket}/*"
                - Bucket: !GetAtt "TerraformStateBucket.Arn"

The second statement uses the aws:PrincipalType and aws:PrincipalTag condition keys to deny access to any IAM users lacking the Terraformer principal tag we introduced in Part 4:

          - Sid: DenyNonTerraformerUsers
            Principal: "*"
            Condition:
              StringEquals:
                aws:PrincipalType: User
              StringNotLike:
                'aws:PrincipalTag/Terraformer': '*'
            Effect: Deny
            Action: '*'
            Resource:
              - !GetAtt "TerraformStateBucket.Arn"
              - !Sub
                - "${Bucket}/*"
                - Bucket: !GetAtt "TerraformStateBucket.Arn"
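
The reason StringNotLike with '*' catches untagged users is that IAM evaluates negated condition operators as true when the condition key is absent from the request. Here is a minimal Python sketch of that rule. It uses fnmatchcase as a rough stand-in for IAM’s wildcard matching, and the function name and dictionary-based request context are illustrative assumptions, not an AWS API:

```python
from fnmatch import fnmatchcase


def string_not_like(context: dict, key: str, pattern: str) -> bool:
    """Model IAM's StringNotLike for a single-valued key.

    Negated operators evaluate to True when the key is absent from
    the request context, which is what makes the "lacks the tag"
    check work.
    """
    if key not in context:
        return True
    return not fnmatchcase(context[key], pattern)


TAG_KEY = "aws:PrincipalTag/Terraformer"

# Hypothetical request contexts for two IAM users.
untagged_user = {"aws:PrincipalType": "User"}
tagged_user = {"aws:PrincipalType": "User", TAG_KEY: "User"}

# The Deny statement applies only when this condition is True.
print(string_not_like(untagged_user, TAG_KEY, "*"))  # True: denied
print(string_not_like(tagged_user, TAG_KEY, "*"))    # False: this statement does not apply
```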

The third statement begins differentiating between administrative and non-administrative access to the state. We do so again using the Terraformer tag, this time inspecting its value.

          - Sid: RestrictTerraformNonAdmins
            Principal: "*"
            Condition:
              StringEquals:
                aws:PrincipalType: User
              StringLike:
                'aws:PrincipalTag/Terraformer': '*'
              StringNotEquals:
                'aws:PrincipalTag/Terraformer': 'Admin'
            Effect: Deny
            NotAction:
              - 's3:List*'
              - 's3:Get*'
              - 's3:Describe*'
              - 's3:PutObject'
              - 's3:DeleteObject'
            Resource:
              - !GetAtt "TerraformStateBucket.Arn"
              - !Sub
                - "${Bucket}/*"
                - Bucket: !GetAtt "TerraformStateBucket.Arn"

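If the IAM user has the Terraformer tag, but its value is not Admin, we limit that user to non-administrative access. We use IAM’s NotAction to whitelist the permitted actions: the statement denies everything outside the list and grants nothing by itself, so the user’s own IAM policy must still allow the listed actions.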

Notably, non-administrative access permits s3:DeleteObject but not s3:DeleteObjectVersion. Since our state bucket is versioned (see Part 5), granting s3:DeleteObject is not inherently dangerous because all it does is add a delete marker to the object; you can always restore the version before the delete marker. Granting developers the ability to add delete markers aids state migration, so we do so here.
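
To see how the Deny/NotAction pairing treats individual actions, here is a minimal Python sketch. It uses fnmatchcase as a rough stand-in for IAM’s wildcard matching (real IAM action matching is case-insensitive), and the function name is illustrative rather than any AWS API:

```python
from fnmatch import fnmatchcase

# Actions exempted from the Deny in RestrictTerraformNonAdmins.
NOT_ACTION = [
    "s3:List*",
    "s3:Get*",
    "s3:Describe*",
    "s3:PutObject",
    "s3:DeleteObject",
]


def denied_by_not_action(action: str, not_action: list) -> bool:
    """Return True when a Deny/NotAction statement covers the action.

    The statement denies every action that matches none of the
    listed patterns. It never allows anything by itself.
    """
    return not any(fnmatchcase(action, pattern) for pattern in not_action)


print(denied_by_not_action("s3:GetObject", NOT_ACTION))            # False
print(denied_by_not_action("s3:DeleteObject", NOT_ACTION))         # False
print(denied_by_not_action("s3:DeleteObjectVersion", NOT_ACTION))  # True
print(denied_by_not_action("s3:PutBucketPolicy", NOT_ACTION))      # True
```

Because 's3:DeleteObject' is listed without a wildcard, it does not cover 's3:DeleteObjectVersion', which is why non-admins can add delete markers but cannot remove versions.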

The fourth statement denies access to all IAM roles other than our backend role:

          - Sid: DenyNonBackendRoles
            Principal: "*"
            Condition:
              StringEquals:
                aws:PrincipalType: AssumedRole
              StringNotLike:
                aws:userId:
                  - !Sub
                    - "${TerraformBackendRoleId}:*"
                    - TerraformBackendRoleId: !GetAtt "TerraformBackendRole.RoleId"
            Effect: Deny
            Action: '*'
            Resource:
              - !GetAtt "TerraformStateBucket.Arn"
              - !Sub
                - "${Bucket}/*"
                - Bucket: !GetAtt "TerraformStateBucket.Arn"

I did not find the syntax for restricting access to a specific IAM role intuitive. Specifying the role ARN in the Principal property does not work. This AWS post explains why and demonstrates using the StringNotLike and aws:userId combination I use here.
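
The pattern works because, for an assumed role, aws:userId takes the form role-id:session-name, so a pattern of role-id:* matches every session of that one role. A small Python sketch of the match follows. The role IDs and session name are made up for illustration, and fnmatchcase only approximates IAM’s wildcard matching:

```python
from fnmatch import fnmatchcase

# Hypothetical role ID. Real IDs start with "AROA"; the template
# obtains this one via !GetAtt TerraformBackendRole.RoleId.
BACKEND_ROLE_ID = "AROAEXAMPLEBACKENDID"


def is_backend_session(user_id: str, role_id: str = BACKEND_ROLE_ID) -> bool:
    """Model the aws:userId match used by the policy.

    For an assumed role, aws:userId is "<role-id>:<session-name>",
    so "<role-id>:*" matches any session of that role and nothing else.
    """
    return fnmatchcase(user_id, f"{role_id}:*")


print(is_backend_session("AROAEXAMPLEBACKENDID:terraform-session"))  # True
print(is_backend_session("AROAEXAMPLEOTHERROLE:terraform-session"))  # False
```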

The fifth statement restricts the backend role to read and write operations:

          - Sid: RestrictBackendRoleToReadWrite
            Principal: "*"
            Condition:
              StringEquals:
                aws:PrincipalType: AssumedRole
              StringLike:
                aws:userId:
                  - !Sub
                    - "${TerraformBackendRoleId}:*"
                    - TerraformBackendRoleId: !GetAtt "TerraformBackendRole.RoleId"
            Effect: Deny
            NotAction:
              - 's3:ListBucket'
              - 's3:GetBucketVersioning'
              - 's3:GetObject'
              - 's3:PutObject'
            Resource:
              - !GetAtt "TerraformStateBucket.Arn"
              - !Sub
                - "${Bucket}/*"
                - Bucket: !GetAtt "TerraformStateBucket.Arn"

And our final statement denies access to any other principal type (e.g., FederatedUser), as we’re not considering those as part of this skeleton.

          - Sid: DenyAllOtherPrincipals
            Principal: "*"
            Condition:
              StringNotEquals:
                aws:PrincipalType:
                  - AssumedRole
                  - Account
                  - User
            Effect: Deny
            Action: '*'
            Resource:
              - !GetAtt "TerraformStateBucket.Arn"
              - !Sub
                - "${Bucket}/*"
                - Bucket: !GetAtt "TerraformStateBucket.Arn"
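
Putting the six statements together, the following Python sketch models which statement, if any, denies a given request. It is a simplified local model for reasoning about the policy, not a reproduction of how S3 evaluates it: the function and parameter names are illustrative, is_backend_role stands in for the aws:userId match, and wildcard matching is approximated with fnmatchcase.

```python
from fnmatch import fnmatchcase
from typing import Optional

# NotAction lists from statements 3 and 5.
USER_ACTIONS = ["s3:List*", "s3:Get*", "s3:Describe*", "s3:PutObject", "s3:DeleteObject"]
BACKEND_ACTIONS = ["s3:ListBucket", "s3:GetBucketVersioning", "s3:GetObject", "s3:PutObject"]


def matches(action: str, patterns: list) -> bool:
    return any(fnmatchcase(action, pattern) for pattern in patterns)


def denying_statement(
    principal_type: str,
    action: str,
    tag: Optional[str] = None,
    is_backend_role: bool = False,
    secure: bool = True,
) -> Optional[int]:
    """Return the number (1-6) of a statement that denies the request, or None."""
    if not secure:
        return 1  # non-TLS request
    if principal_type == "User":
        if tag is None:
            return 2  # IAM user without the Terraformer tag
        if tag != "Admin" and not matches(action, USER_ACTIONS):
            return 3  # non-admin attempting an unlisted action
    elif principal_type == "AssumedRole":
        if not is_backend_role:
            return 4  # any role other than the backend role
        if not matches(action, BACKEND_ACTIONS):
            return 5  # backend role attempting an unlisted action
    elif principal_type != "Account":
        return 6  # any remaining principal type
    return None


print(denying_statement("User", "s3:GetObject"))                          # 2
print(denying_statement("User", "s3:DeleteObjectVersion", tag="User"))    # 3
print(denying_statement("User", "s3:DeleteObjectVersion", tag="Admin"))   # None
print(denying_statement("AssumedRole", "s3:GetObject"))                   # 4
print(denying_statement("FederatedUser", "s3:GetObject"))                 # 6
```

A result of None only means that no Deny statement applies. The principal’s own IAM policy still has to allow the action.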

A summary of all changes is available by viewing the diffs from Part 5.

There are three last items to note about this policy.

First, the bucket policy does not contain any permissions for users who have the Terraformer tag set to Admin. The lack of permissions means such users will have whatever access the IAM policy attached to their IAM user grants, presumably full access to S3.

Second, the bucket policy does not explicitly grant the AWS account’s root user access. AWS always allows the root user access to remove or modify bucket policies of buckets owned by that root user’s account, making it unnecessary to specify here.

Finally, none of the permissions in the policy grant either adding or modifying the bucket policy, which means that aside from the root user, only Admin Terraformers can do so (assuming those admins have the requisite permissions on their IAM user).

Applying the State Bucket Policy

We’re now ready to deploy the bucket policy.

First, change the Terraformer tag on your IAM user to have a value of Admin.

aws iam tag-user \
  --user-name ${IAM_USER} \
  --tags '{
    "Key": "Terraformer",
    "Value": "Admin"
  }'

Second, if Terragrunt created the state bucket for you, it may already have a bucket policy attached to it. Delete it using the following CLI command, replacing the bucket name as appropriate:

aws s3api delete-bucket-policy --bucket terraform-skeleton-state

Third, deploy the updated CloudFormation template using the init-admin Makefile target we added in Part 4 (see footnote 1).

make init-admin
aws cloudformation deploy \
		--template-file init/admin/init-admin-account.cf.yml \
		--stack-name tf-admin-init \
		--capabilities CAPABILITY_NAMED_IAM \
		--parameter-overrides \
			AdminAccountId=<omitted> \
			StateBucketName=terraform-skeleton-state \
			StateLogBucketName=terraform-skeleton-state-logs \
			LockTableName=terraform-skeleton-state-locks

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - tf-admin-init
aws cloudformation update-termination-protection \
		--stack-name tf-admin-init \
		--enable-termination-protection
{
    "StackId": "arn:aws:cloudformation:us-east-1:<omitted>:stack/tf-admin-init/8704b070-5f61-11eb-9ff1-0eea077046db"
}
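
If the deployment fails instead, the reason ends up in the stack’s event log (footnote 1 walks through reading it). As a convenience, here is a small Python sketch that picks the failed events out of the JSON printed by aws cloudformation describe-stack-events. The function name and the sample data are made up for illustration; only the StackEvents, LogicalResourceId, ResourceStatus, and ResourceStatusReason field names come from the CLI output:

```python
import json


def failed_events(describe_output: str) -> list:
    """Pick the failed resource events out of describe-stack-events JSON."""
    events = json.loads(describe_output).get("StackEvents", [])
    return [
        {
            "resource": event.get("LogicalResourceId"),
            "status": event["ResourceStatus"],
            "reason": event.get("ResourceStatusReason"),
        }
        for event in events
        if event.get("ResourceStatus", "").endswith("_FAILED")
    ]


# Abbreviated, made-up sample in the shape the CLI returns.
sample = json.dumps({
    "StackEvents": [
        {
            "LogicalResourceId": "TerraformStateBucketPolicy",
            "ResourceStatus": "CREATE_FAILED",
            "ResourceStatusReason": "Invalid policy syntax.",
        },
        {
            "LogicalResourceId": "TerraformStateBucket",
            "ResourceStatus": "UPDATE_COMPLETE",
        },
    ]
})

for event in failed_events(sample):
    print(event["resource"], event["status"], event["reason"])
```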

Let’s see if it works as expected.

Testing the State Bucket Policy

First, we can quickly verify the backend role has the access it requires by running terragrunt apply-all.

Verifying user-level access requires manipulating the Terraformer tag. Ideally, we’d have these tests automated and run as part of a CI pipeline. Perhaps we’ll get to that in another post.

Remove the Terraformer tag from your user altogether:

aws iam untag-user --user-name ${IAM_USER} --tag-keys Terraformer

and verify you can’t even list the state bucket now:

aws s3 ls s3://terraform-skeleton-state

An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

Next, set the Terraformer tag on your IAM user to something other than Admin. For instance:

aws iam tag-user \
  --user-name ${IAM_USER} \
  --tags '{
    "Key": "Terraformer",
    "Value": "User"
  }'

Then verify you cannot update the state bucket’s policy:

aws s3api put-bucket-policy --bucket terraform-skeleton-state --policy ""

An error occurred (AccessDenied) when calling the PutBucketPolicy operation: Access Denied

or delete it:

aws s3api delete-bucket-policy --bucket terraform-skeleton-state

An error occurred (AccessDenied) when calling the DeleteBucketPolicy operation: Access Denied

Also, verify you can delete a state object:

aws s3 rm s3://terraform-skeleton-state/app/dev/test-stack.tfstate
delete: s3://terraform-skeleton-state/app/dev/test-stack.tfstate

but can’t delete an object version:

DELETE_MARKER_VERSION=$(aws s3api list-object-versions \
  --bucket terraform-skeleton-state \
  --prefix app/dev/test-stack.tfstate \
  --query 'DeleteMarkers[?IsLatest==`true`].VersionId' | jq -r '.[0]' \
)

aws s3api delete-object \
  --bucket terraform-skeleton-state \
  --key app/dev/test-stack.tfstate \
  --version-id ${DELETE_MARKER_VERSION}

An error occurred (AccessDenied) when calling the DeleteObject operation: Access Denied

The inability to remove the delete marker does introduce a hurdle users will have to clear to restore state files, but restoration is still possible. See footnote 2 for more information.
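
A toy Python model of a single key in a versioned bucket shows why restoration remains possible without deleting anything: copying an older version on top creates a new latest version. The class and method names are made up for illustration, and nothing here talks to S3:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedKey:
    """Toy model of one key in a versioned bucket (newest version last)."""

    versions: list = field(default_factory=list)

    def put(self, body: str) -> None:
        self.versions.append(("object", body))

    def delete(self) -> None:
        # A delete without a version ID only adds a delete marker.
        self.versions.append(("delete-marker", None))

    def get(self) -> Optional[str]:
        # The key looks deleted whenever the latest version is a delete marker.
        if not self.versions or self.versions[-1][0] == "delete-marker":
            return None
        return self.versions[-1][1]

    def restore_by_copy(self, index: int) -> None:
        # Copying an older version on top creates a new latest version,
        # so no version (or delete marker) has to be removed.
        kind, body = self.versions[index]
        if kind != "object":
            raise ValueError("cannot copy a delete marker")
        self.put(body)


state = VersionedKey()
state.put("serial 1")
state.put("serial 2")
state.delete()
print(state.get())          # None: hidden behind the delete marker
state.restore_by_copy(1)
print(state.get())          # serial 2
print(len(state.versions))  # 4: nothing was removed
```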

Change the Terraformer tag to Admin again:

aws iam tag-user \
  --user-name ${IAM_USER} \
  --tags '{
    "Key": "Terraformer",
    "Value": "Admin"
  }'

and try deleting the delete marker again, verifying it works this time.

aws s3api delete-object \
  --bucket terraform-skeleton-state \
  --key app/dev/test-stack.tfstate \
  --version-id ${DELETE_MARKER_VERSION}

{
    "DeleteMarker": true,
    "VersionId": "..."
}

That about covers it. We now have a bucket policy restricting access to Terraform’s state to only authorized principals.

Cleanup

With the bucket policy in place, let’s turn our attention to a couple of other hardening items we can tackle in the CloudFormation template: protecting the logs bucket and removing unnecessary permissions from the backend role.

Protecting the Logs Bucket

As discussed in Part 3, when Terragrunt creates the log bucket for you, it does not enable encryption or explicitly block public access. We can rectify both of those issues now that the bucket is under CloudFormation’s control.

Using our TerraformStateBucket resource as a template, add BucketEncryption and PublicAccessBlockConfiguration properties to the log bucket:

  TerraformStateLogBucket:
    Type: 'AWS::S3::Bucket'
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      BucketName: !Ref StateLogBucketName
      AccessControl: LogDeliveryWrite
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
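              # S3 server access log delivery supports only SSE-S3 (AES256)
              # default encryption on the target bucket, not SSE-KMS.
              SSEAlgorithm: AES256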
      PublicAccessBlockConfiguration:
        BlockPublicAcls: True
        BlockPublicPolicy: True
        IgnorePublicAcls: True
        RestrictPublicBuckets: True

Run make init-admin to deploy.

Backend Role Permissions

When we created the backend role in Part 4, we granted it permissions to create S3 buckets and DynamoDB tables because Terragrunt managed our state bucket and lock table. We can remove those permissions now that CloudFormation deploys both:

diff --git a/init/admin/init-admin-account.cf.yml b/init/admin/init-admin-account.cf.yml
index b8dcda0..3aed8e5 100644
--- a/init/admin/init-admin-account.cf.yml
+++ b/init/admin/init-admin-account.cf.yml
@@ -122,27 +122,6 @@ Resources:
               - 'dynamodb:PutItem'
               - 'dynamodb:DeleteItem'
             Resource: !Sub "arn:aws:dynamodb:${AWS::Region}:${AWS::AccountId}:table/${LockTableName}"
-          - Sid: AllowStateBucketCreation
-            Effect: Allow
-            Action:
-              - 's3:GetBucketAcl'
-              - 's3:GetBucketLogging'
-              - 's3:CreateBucket'
-              - 's3:PutBucketPublicAccessBlock'
-              - 's3:PutBucketTagging'
-              - 's3:PutBucketPolicy'
-              - 's3:PutBucketVersioning'
-              - 's3:PutEncryptionConfiguration'
-              - 's3:PutBucketAcl'
-              - 's3:PutBucketLogging'
-            Resource:
-              - !Sub "arn:aws:s3:::${StateBucketName}"
-              - !Sub "arn:aws:s3:::${StateLogBucketName}"
-          - Sid: AllowLockTableCreation
-            Effect: Allow
-            Action:
-              - 'dynamodb:CreateTable'
-            Resource: !Sub "arn:aws:dynamodb:${AWS::Region}:${AWS::AccountId}:table/${LockTableName}"

Run terragrunt apply-all to verify all’s well.

What’s Next?

In this post, we’ve significantly enhanced the protections surrounding the Terraform state. Looking back at these first six entries, we’ve covered a lot of ground, and I think it will be worthwhile to take a step back and summarize what we’ve done so far. After a recap, I’d like to start covering continuous integration with Terraform and Terragrunt. We’ll see where that takes us.

Footnotes

  1. I hope that you don’t encounter any errors deploying the bucket policy. If you do, AWS is often not much help diagnosing what went wrong. Here are some pointers.

    If make init-admin fails, it will likely say:

    Failed to create/update the stack. Run the following command to fetch the
    list of events leading up to the failure
    aws cloudformation describe-stack-events --stack-name tf-admin-init
    

    If you do so, you’ll get a JSON dump of events. Since you presumably failed when creating the bucket policy, look for an event with a ResourceStatus field set to CREATE_FAILED. Here’s an abbreviated example:

    {  
        "StackName": "tf-admin-init",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "Invalid policy syntax. (Service: Amazon S3; Status Code: 400; Error Code: MalformedPolicy; Request ID: 9EB1BD50BCF1FCB7; S3 Extended Request ID: hHkqY/snGYyhc4paSxhBT1IzpmgoWjKvz5I/JYiYUKu3PLSn1CWuAceLU7QEckf/omDhF4ZdeGU=; Proxy: null)",
        "ResourceProperties": "{\"Bucket\":\"terraform-skeleton-state\"..."
    }
    

    If you have an Invalid policy syntax error, I recommend removing statements until the policy works, then adding them back in one-by-one until you find the problem. 

  2. Because non-admin users cannot delete object versions, they cannot restore a state file by deleting its delete marker, which is itself an object version.

    As discussed in the AWS docs here and here, users can still restore state files by copying a previous version to become the latest. Here’s an example of how:

    Step 1: Find the version to restore (e.g., using the LastModified field to limit the search area).

    aws s3api list-object-versions \
      --bucket terraform-skeleton-state \
      --prefix app/dev/test-stack.tfstate \
      --query 'Versions[?contains(LastModified, `'"2021-02-20"'`)]' \
      | jq '.[] | { Key, VersionId, LastModified }'
    
    {
      "Key": "app/stage/test-stack.tfstate",
      "VersionId": "lwMes1R1AfkEZ.lQ4U9d217yeU7rWbcj",
      "LastModified": "2021-02-20T13:03:50+00:00"
    }
    

    Step 2: Copy the desired object version to make it the new latest version, replacing the value of VERSION as appropriate.

    BUCKET=terraform-skeleton-state
    PREFIX=app/dev/test-stack.tfstate
    VERSION=TgfUNdVuoZKwSOHF1QeGO_nB8iZGzE3f
    
    aws s3api copy-object \
      --copy-source "${BUCKET}/${PREFIX}?versionId=${VERSION}" \
      --key ${PREFIX} \
      --bucket ${BUCKET}
    
    {  
        ...
        "CopyObjectResult": {
            "ETag": "\"...\"",
            "LastModified": "2021-02-20T13:36:47+00:00"
        }
    }