The Lessons I Learnt the Hard Way Building Terraform Modules
The principles of building good Terraform modules are no different from building good, reusable code in any other language. I learnt some of these lessons the hard way, and I’d like to share them here so you don’t have to.
Hope these tips help you.
Start with the requirements and common scenarios first.
There is no way your module is going to cover EVERY scenario out there.
Adding a layer of abstraction means accepting some extra limitations. It’s unrealistic to expect a module to do everything the vanilla Terraform resources can.
Therefore, instead of trying to cover every hypothetical scenario, which leads everyone nowhere, you can:
- Gather the requirements from the developers. Understand the common scenarios they will be using the modules for.
- Aim for 95% coverage of the scenarios. You will have a much cleaner design, a much smaller scope to work on and a much nicer first release.
- Add more functionality to the modules gradually, in the good old agile way.
Use the public modules when possible.
It’s always tempting to re-invent the wheel; everyone knows re-inventing the wheel is bad, yet we still do it all the time.
After you get the requirements and understand the scenarios, it’s time to check whether an existing module can serve you. Maybe all you need is to figure out how to use a public module and come up with a set of good defaults.
That sounds less exciting, but much faster and cost-effective to do.
We created our own VPC module. Don’t do that, please use one of the thousand VPC modules out there.
Some popular modules:
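For illustration, consuming a community module from the public registry with a pinned version can look roughly like this (the `terraform-aws-modules/vpc/aws` module is one well-known example; the inputs shown are placeholders, not recommended settings):

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # pin the major version to avoid surprise breaking changes

  # Your org's set of good defaults goes here
  name = "example"
  cidr = "10.0.0.0/16"
}
```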
Design before code.
It’s software engineering 101, but it’s so easily overlooked, especially when it comes to Terraform. After all, it’s just a configuration language that people use when they are tired of click-ops, right?
That might be true. When Terraform is used for a small project or even a local module, it’s okay to go rogue a bit; no big deal.
When it comes to a remote terraform module that you’d like to share with multiple developer teams or even external customers, it becomes a proper software engineering problem.
We need to go the whole nine yards – clean code, separation of concerns, you name it.
Think about:
- How each module will work with the others, and which resource should belong to which module.
- Defining the vars and outputs of the modules before implementation, as if you’re designing APIs.
We would have saved so much time if we invested in module design in the beginning.
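A sketch of what “designing the vars and outputs like an API” can look like in practice (all names below are illustrative, not from a real module):

```hcl
# The module's "request": a small, deliberate set of inputs
variable "vpc_id" {
  type        = string
  description = "The VPC the resources are deployed into."
}

# The module's "response": the outputs downstream modules are allowed to depend on
output "security_group_id" {
  value       = aws_security_group.this.id
  description = "Stable contract for downstream modules."
}
```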
Beware of future breaking changes and one-way doors.
We’re aiming for a neat, clean first release that dazzles developers, but keep the extensibility of the modules in mind.
Certain decisions are hard to change later; certain doors are one-way only.
The definition of “breaking change” for the following discussion: if a new release requires the user to change their Terraform template, e.g. adding `moved`/`removed` blocks or changing a variable name, it is a breaking change. A minor version release is one the user can safely bump to and get a `No changes.` from `terraform plan`.
Future breaking changes:
- Variable/output name or type change.
A naming change sounds trivial, but it requires users to update their Terraform configuration, which makes it a BREAKING CHANGE.
For example, from

```hcl
variable "security_group_id" {
  type        = string
  description = "The ID of the security group to associate with the resource."
}
```

to

```hcl
variable "security_group_ids" {
  type        = list(string)
  description = "The IDs of the security groups to associate with the resource."
}
```
- Moving resources between modules. It requires a `moved` block for users to update.

```hcl
moved {
  from = module.ModuleA.resource.name
  to   = module.ModuleB.resource.name
}
```
One-way doors: `lifecycle` rules.
You can’t use variables in `lifecycle` rules, which means you can’t make a feature flag to turn `prevent_destroy` or `ignore_changes` on and off.
Only add `lifecycle` to your module if you’re super confident and it’s absolutely needed.
There are also some decisions that look like a big deal but are actually safe two-way doors:
Two-way door: hard-coded default values in resources.
If the developer is not likely to modify an attribute, it’s perfectly fine to make it a hard-coded value, because it’s only a minor release to upgrade it to a variable.
For example, say this is the release v1.0.0:

```hcl
resource "aws_resource" "example" {
  attr_a = var.var_a
  attr_b = "value_b"
}
```
We hard-coded `attr_b` with a sane default `value_b`. Later, we get a feature request to make `attr_b` a variable. It’s as simple as:
```hcl
variable "var_b" {
  default = "value_b"
  ...
}

resource "aws_resource" "example" {
  attr_a = var.var_a
  attr_b = var.var_b
}
```
It’s a safe minor change. The default value of `var_b` is still `value_b`, so users can safely bump the version from v1.0.0 to v1.0.1 and will get a `No changes.` after the upgrade.
During our implementation, we spent hours debating whether we could leave something as a hard-coded value, even though upgrading it would be an easy minor release in the future anyway.
At the same time, we greenlighted a one-way-door `lifecycle` config without understanding its impact on future extensibility.
Be careful with one-way doors.
Variables should have types WHENEVER possible.
When I say “WHENEVER” I mean it literally. There are almost no good reasons for not having a type.
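A small sketch of the difference (the variable name is illustrative):

```hcl
# Untyped: accepts anything; a bad value only fails deep inside the module
# variable "subnet_ids" {}

# Typed: a wrong value fails fast at plan time with a clear message
variable "subnet_ids" {
  type        = list(string)
  description = "Subnets to attach the resource to."
}
```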
Validation.
Input validation is key to making modules more usable. If someone who is not familiar with your code provides a variable value that won’t work, it’s better to let them know as early as possible.
There are several places in Terraform where you can validate the input and the outcome.
Single Variable.
Variable validation can be used when you’d like to validate a single variable input. It looks like:

```hcl
variable "var_name" {
  ...
  validation {
    condition     = "..."
    error_message = "..."
  }
}
```
Please add input validation whenever you can. It used to be quite a time-consuming process, but now, with AI, it’s much easier to create good validations for inputs.
It’s especially useful for attributes that will only fail at the `terraform apply` stage. Adding validation helps users catch those errors earlier in development.
For example, the `bucket_name` of an S3 bucket cannot include an underscore `_`, and it won’t fail until you run `terraform apply`.
Without input validation, for this Terraform template

```hcl
# root project
module "Example" {
  source             = "./Example"
  aws_s3_bucket_name = "my_bucket-name"
}

# Example module
variable "aws_s3_bucket_name" {
  description = "The name of the S3 bucket"
  type        = string
}

resource "aws_s3_bucket" "example" {
  bucket = var.aws_s3_bucket_name
}
```
we will get the following from `terraform validate` and `terraform plan`:

```console
❯ terraform validate
Success! The configuration is valid.

❯ terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create

Terraform will perform the following actions:

  # module.Example.aws_s3_bucket.example will be created
  + resource "aws_s3_bucket" "example" {
      ...
      + bucket = "my_bucket-name"
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.
```
We will only get errors with `terraform apply`:

```console
module.Example.aws_s3_bucket.example: Creating...
╷
│ Error: validating S3 Bucket (my_bucket-name) name: only lowercase alphanumeric characters and hyphens allowed in "my_bucket-name"
│
│   with module.Example.aws_s3_bucket.example,
│   on Example/main.tf line 11, in resource "aws_s3_bucket" "example":
│   11: resource "aws_s3_bucket" "example" {
```
Firstly, it’s bad because the failure comes too late in the pipeline. If the deployment is fully CI/CD-ed, you might only see the error after the commit has been merged into the release branch.
Secondly, `var.aws_s3_bucket_name` is not mentioned in the error message. Users who are not familiar with the module implementation will have to dig into it to figure out the root cause.
With input validation, it becomes a much more pleasant experience: users catch the error earlier and can more easily find where to fix it.
```hcl
variable "aws_s3_bucket_name" {
  description = "The name of the S3 bucket"
  type        = string
  validation {
    # Just check that it doesn't include "_", as a validation example
    condition     = !strcontains(var.aws_s3_bucket_name, "_")
    error_message = "The S3 bucket name must not include an underscore '_'."
  }
}
```
Now, if you run `terraform validate` or `plan`, it will give you:

```console
❯ terraform validate
╷
│ Error: Invalid value for variable
│
│   on main.tf line 16, in module "Example":
│   16:   aws_s3_bucket_name = "my_bucket-name"
│     ├────────────────
│     │ var.aws_s3_bucket_name is "my_bucket-name"
│
│ The S3 bucket name must not include an underscore '_'.
│
│ This was checked by the validation rule at Example/main.tf:4,3-13.
```
Now the error message contains `var.aws_s3_bucket_name`, which gives users a much easier pointer to look at. You can also customize the error message to leave more instructions when needed.
Multiple Variables Validation.
Chances are that certain attributes only work with each other. For example, a flag `enable_resource` and the configs of that resource. You can only refer to the same variable in an input validation expression, so it’s not feasible to do something like:
```hcl
validation {
  condition = var.enable_resource && var.resource_attr_1 == "something"
  ...
}
```
An easy workaround is to group multiple vars into an object; that way you can reference multiple values in the same condition expression.
```hcl
variable "s3_setting" {
  type = object({
    enable_bucket = bool
    bucket_name   = optional(string, null)
  })
  validation {
    condition     = var.s3_setting.enable_bucket == false || (var.s3_setting.enable_bucket == true && var.s3_setting.bucket_name != null)
    error_message = "When 'enable_bucket' is true, 'bucket_name' must be provided."
  }
}

# The variable is used as
resource "aws_s3_bucket" "example" {
  count  = var.s3_setting.enable_bucket ? 1 : 0
  bucket = var.s3_setting.bucket_name
}
```
When you enable the bucket but forget to provide the required vars, you’ll get:
```console
❯ terraform validate
╷
│ Error: Invalid value for variable
│
│   on main.tf line 16, in module "Example":
│   16:   s3_setting = {
│   17:     enable_bucket = true
│   18:   }
│     ├────────────────
│     │ var.s3_setting.bucket_name is null
│     │ var.s3_setting.enable_bucket is true
│
│ When 'enable_bucket' is true, 'bucket_name' must be provided.
│
│ This was checked by the validation rule at Example/main.tf:8,3-13.
```
You can also use a `check` block if you’d like to keep the vars separated. However, note that when the `condition` in a `check` block fails, it only WARNs you. It won’t stop the `plan` or `apply`:
```hcl
variable "enable_bucket" {
  type    = bool
  default = false
}

variable "bucket_name" {
  type    = string
  default = null
}

check "bucket_name_provided" {
  assert {
    condition     = var.enable_bucket == false || (var.enable_bucket == true && var.bucket_name != null)
    error_message = "bucket_name must be provided when enable_bucket is true"
  }
}
```
```console
❯ terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create

Terraform will perform the following actions:

  # module.Example.aws_s3_bucket.example[0] will be created
  + resource "aws_s3_bucket" "example" {
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.
╷
│ Warning: Check block assertion failed
│
│   on Example/main.tf line 13, in check "bucket_name_provided":
│   13:     condition = var.enable_bucket == false || (var.enable_bucket == true && var.bucket_name != null)
│     ├────────────────
│     │ var.bucket_name is null
│     │ var.enable_bucket is true
│
│ bucket_name must be provided when enable_bucket is true
╵
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if
you run "terraform apply" now.
```
P.S. There is a reason why sometimes we can’t just

```hcl
resource "aws_s3_bucket" "example" {
  count  = var.s3_setting.bucket_name != null ? 1 : 0
  bucket = var.s3_setting.bucket_name
}
```

Terraform needs to know the value of any var used in `count` or `for_each` before the plan. If the var depends on values unknown before apply, you might need to do a multi-stage apply for the first time.
Resource/Output post- or pre-conditions.
You can also add a `postcondition` or `precondition` to resources.
I think these are generally more useful for end users to add at the root-module level, where users may have clearer requirements about what they want pre/post deployment.
Also, since they are configured in the `lifecycle` block, they are technically one-way doors.
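For reference, a minimal sketch of what a `precondition` looks like (available since Terraform v1.2.0; the resource and variable names are illustrative):

```hcl
resource "aws_s3_bucket" "example" {
  bucket = var.bucket_name

  lifecycle {
    precondition {
      condition     = length(var.bucket_name) <= 63
      error_message = "S3 bucket names must be 63 characters or fewer."
    }
  }
}
```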
Tests.
Terraform tests are available in Terraform v1.6.0 and later. We use `command = plan` for unit tests (triggered on every commit) and `command = apply` for integration tests (triggered on every release candidate).
Investing in testing has proven to be a good investment. Highly recommended even if you have a narrow timeframe.
Having unit tests for our modules has significantly increased our velocity in developing them.
It makes it much easier for other developers to contribute to our modules, which helped our goal of company-wide adoption.
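A minimal sketch of what one of those `command = plan` unit tests can look like, reusing the `aws_s3_bucket_name` variable from the earlier examples (the file name and assertion are illustrative):

```hcl
# tests/unit.tftest.hcl
run "bucket_name_is_passed_through" {
  command = plan

  variables {
    aws_s3_bucket_name = "my-bucket-name"
  }

  assert {
    condition     = aws_s3_bucket.example.bucket == "my-bucket-name"
    error_message = "The bucket name variable was not applied to the resource."
  }
}
```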
Documentation.
Terraform-docs is pretty good. To make sure our doc is always up to date, we run a `diff` pipeline on every commit.
The only caveat we had with `terraform-docs` is the format: `markdown table` is quite hard to read when the description of a variable becomes long, although it looks more similar to most Terraform docs out there. `markdown document` renders much better with the default markdown of our code repository.