
Abusing Terraform to Upload Static Websites to S3

October 6, 2021

S3 has been a great option for hosting static websites for a long time, but it's still a pain to set up by hand. You need to traverse dozens of pages in the AWS Console to create and manage users, buckets, certificates, a CDN, and about a hundred different configuration options. If you do this repeatedly, it gets old fast. We can automate the process with Terraform, a well-known "infrastructure as code" tool, which lets us declare resources (e.g. servers, storage buckets, users, policies, DNS records) and leaves it to Terraform to figure out how to build and connect them.

Terraform can create the infrastructure needed for a static website on AWS (e.g. users, bucket, CDN, DNS), *and* it can create and update the content (e.g. webpages, CSS/JS files, images). Managing content goes beyond the *infrastructure* part of "infrastructure as code", which is why I'm labeling this an abuse or misuse of Terraform. Still, it works and has a few benefits:

- You can define the bucket, properties, DNS, CDN, etc. in the same place as your content
- You have a fully-automated process for standing up websites that only requires a single tool, Terraform

... and a few downsides:

- Uploading files is slow compared to something like the [AWS CLI's sync command](https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html)
- Terraform isn't meant for transforming or managing *content*, so you may outgrow Terraform's capabilities if you want advanced features or optimization

This article will breeze over the infrastructure parts of creating a static website on AWS and focus more on how to upload content and manage content metadata (MIME types and caching behavior). If you want to learn more about the infrastructure parts (e.g. setting up CloudFront, an SSL certificate, DNS routes), there are many great tutorials out there. Here are a few:

- [https://www.alexhyett.com/terraform-s3-static-website-hosting/](https://www.alexhyett.com/terraform-s3-static-website-hosting/)
- [https://medium.com/modern-stack/5-minute-static-ssl-website-in-aws-with-terraform-76819a12d412](https://medium.com/modern-stack/5-minute-static-ssl-website-in-aws-with-terraform-76819a12d412)
- [https://medium.com/@dblencowe/hosting-a-static-website-on-s3-using-terraform-0-12-aa5ffe4103e](https://medium.com/@dblencowe/hosting-a-static-website-on-s3-using-terraform-0-12-aa5ffe4103e)

Let's get on to the code! If you want just the code, you can find it here: <https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload>

## The Boilerplate

We need *some* boilerplate to set up infrastructure before we can upload files to an S3 bucket. So, let's create a bucket with Terraform and the [AWS provider](https://registry.terraform.io/providers/hashicorp/aws/latest/docs). We'll configure the provider and create the bucket in a `main.tf` file containing the following:

```hcl
terraform {
 required_providers {
   aws = {
     source  = "hashicorp/aws"
     version = "3.60.0"
   }
 }
}

provider "aws" {
 # This should match the profile name in the credentials file described below
 profile = "aws_admin"
 # Choose the region where you want the S3 bucket to be hosted
 region  = "us-west-1"
}

# To avoid repeatedly specifying the path, we'll declare it as a variable
variable "website_root" {
 type        = string
 description = "Path to the root of website content"
 default     = "../content"
}

resource "aws_s3_bucket" "my_static_website" {
 bucket = "blog-example-m9wtv64y"
 acl    = "private"

 website {
   index_document = "index.html"
 }
}

# To print the bucket's website URL after creation
output "website_endpoint" {
 value = aws_s3_bucket.my_static_website.website_endpoint
}
```

### AWS Credentials

To create or interact with AWS resources, we need to provide credentials. The AWS Terraform provider [accepts authentication in a variety of ways](https://registry.terraform.io/providers/hashicorp/aws/latest/docs#authentication), but I'm going to use a credential file. That file is located at `~/.aws/credentials` and looks like:

```ini
[aws_admin]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
```

If you don't have credentials handy, you can [follow AWS documentation to create a new user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html#id_users_create_console) with a policy that grants S3 permissions.

## Uploading Files to S3 with Terraform

Here's where we start using Terraform... creatively, i.e. for managing content instead of just infrastructure. For the content, I've created a [basic multi-page website](https://gitlab.com/tangram-vision/oss/tangram-visions-blog/-/tree/main/2021.10.06_TerraformS3Upload/content): a couple of HTML files, a CSS file, and a couple of images. By using Terraform's [fileset function](https://www.terraform.io/docs/language/functions/fileset.html) and the AWS provider's [s3_bucket_object resource](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_object), we can collect all the files in a directory and upload each one as an object in S3:

```hcl
# in main.tf, below the aforementioned boilerplate
resource "aws_s3_bucket_object" "file" {
 for_each = fileset(var.website_root, "**")

 bucket      = aws_s3_bucket.my_static_website.id
 key         = each.key
 source      = "${var.website_root}/${each.key}"
 source_hash = filemd5("${var.website_root}/${each.key}")
 acl         = "public-read"
}
```

The [for_each meta-argument](https://www.terraform.io/docs/language/meta-arguments/for_each.html) loops over all files in the website directory tree, binding the file path (`index.html`, `assets/normalize.css`, etc.) to `each.key`, which can be used elsewhere in the block. The `source_hash` argument records a hash of the file's contents (computed here with `filemd5`); when that hash changes, Terraform knows the file has changed and needs to be re-uploaded to the S3 bucket. (There's a similar `etag` argument, but it [doesn't work when some kinds of S3 encryption are enabled](https://github.com/hashicorp/terraform-provider-aws/pull/11522).)
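If you want to preview roughly which keys and hashes Terraform will work with, the same walk can be approximated in the shell. This is only a sketch: `list_site_files` is a hypothetical helper, it assumes GNU `md5sum` is installed, and it may differ from `fileset` in edge cases (symlinks, unusual filenames).

```bash
# Hypothetical helper: print "<md5>  <key>" for every regular file under a root,
# approximating what fileset(root, "**") and filemd5() produce in Terraform.
list_site_files() {
  local root="$1"
  (
    cd "$root" || exit 1
    # -type f keeps regular files only; sed strips the leading "./" so the
    # paths look like S3 keys (index.html, assets/normalize.css, ...).
    find . -type f | sed 's|^\./||' | LC_ALL=C sort | while IFS= read -r key; do
      printf '%s  %s\n' "$(md5sum < "$key" | cut -d' ' -f1)" "$key"
    done
  )
}
```

Running `list_site_files ../content` before and after editing a file shows which hashes changed, and therefore which objects a `terraform plan` should want to update.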

## Terraform Apply

With our trusty `main.tf` file in hand, we can now invoke dark and mysterious powers, conjuring infinite computational power out of nothing! With the merest flourish of our terminal, unfathomable forces precipitate to our whim — we are the tactician, the champion and commander over greater numbers than were ever deployed in any Greek myth!

![206p.gif](https://cdn.prod.website-files.com/5fff85e7332ae877ea9e15ce/615ddb4010f7309ce247e407_206p.gif)

Ahem... anyway, do the following:

```bash
# Initialize terraform in the current directory and download the AWS provider
terraform init
# Preview what changes will be made
terraform plan
# Make the changes (create and populate the S3 bucket)
terraform apply
```

At the end of the output from the apply command, you should see the website endpoint:

```text
...
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

Outputs:

website_endpoint = "blog-example-m9wtv64y.s3-website-us-west-1.amazonaws.com"
```

## Content Types, MIME Types, Oh My

Let's visit that URL in a browser and...

![aws_screenshot.png](https://cdn.prod.website-files.com/5fff85e7332ae877ea9e15ce/615ddb3f94487f170cdf91dc_aws_screenshot.png)

That's not what we expected. It turns out that S3 assigns a content type of `binary/octet-stream` to uploaded files by default. When visiting the website endpoint URL (which serves the `index.html` file), the browser sees that `Content-Type: binary/octet-stream` header and thinks "This is a binary file, so I'll prompt the user to download it".

We would prefer the browser to treat our HTML files as HTML, the CSS files as CSS, and so on. For that, we need the browser to receive the correct [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) (e.g. `text/html`, `text/css`, `image/png`) in the `Content-Type` header. The easiest way to do that is to specify the correct [content type when uploading files](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_object#content_type). There are two approaches to determining the correct type for each file.

### Determining MIME Types with a CLI Tool

The first approach is to use a command-line tool like `file`, `xdg-mime` or `mimetype`. These tools use different approaches:

- `file` uses "magic tests" (looking for identifying bits at a small fixed offset into the file) to determine the type of files
- `xdg-mime` and `mimetype` match against the file extension first, falling back to using `file` if the file doesn't have an extension

The below shell session demonstrates basic usage of each command (a dollar sign is used to distinguish input commands from output results):

```bash
# Demo of file
$ file --brief --mime-type index.html
text/html
$ file --brief --mime-type assets/normalize.css
text/plain

# Demo of xdg-mime
$ xdg-mime query filetype index.html
text/html
$ xdg-mime query filetype assets/normalize.css
text/css

# Demo of mimetype
$ mimetype --brief index.html
text/html
$ mimetype --brief assets/normalize.css
text/css
```

A subtle detail in the above is that `file` may not label text files very precisely — it outputs the CSS file as `text/plain` instead of `text/css` because there's no magic test or consistent file header that can identify CSS files (nor the many other variations of text file types).

To determine MIME types with a CLI tool in our Terraform file, we'll add three pieces:

1. An [external data source](https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source) which, for each file to be uploaded, will call...
2. An external script that calls a CLI tool (e.g. `mimetype`) to determine the file's MIME type
3. The `content_type` argument of the `aws_s3_bucket_object` resource to assign the MIME type for each uploaded file

The external data source is a new block in `main.tf` as follows (I've turned the file list into a [local value](https://www.terraform.io/docs/language/values/locals.html), because we're using it in multiple places now):

```hcl
locals {
 website_files = fileset(var.website_root, "**")
}

data "external" "get_mime" {
 for_each = local.website_files
 program  = ["bash", "./get_mime.sh"]
 query = {
   filepath : "${var.website_root}/${each.key}"
 }
}
```

The data source calls `bash ./get_mime.sh` once for each file, passing the filepath as [JSON to stdin](https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source#external-program-protocol). Using [the example from the Terraform docs](https://registry.terraform.io/providers/hashicorp/external/latest/docs/data-sources/data_source#processing-json-in-shell-scripts), we can implement the bash script to grab the JSON filepath from stdin, run `mimetype` on the file, and export the result as a JSON object on stdout.

```bash
#!/bin/bash

# Exit if any of the intermediate steps fail
set -e

# Extract "filepath" from the input JSON into FILEPATH shell variable.
eval "$(jq -r '@sh "FILEPATH=\(.filepath)"')"

# Run mimetype on filepath to get the correct mime type.
# Quote the variable so paths containing spaces are passed as one argument.
MIME=$(mimetype --brief "$FILEPATH")

# Safely produce a JSON object containing the result value.
jq -n --arg mime "$MIME" '{"mime":$mime}'
```
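The script above depends on `jq` and `mimetype` being installed, and a missing tool can produce a confusing failure in the middle of a plan. One option is a small preflight check. The sketch below is an assumption about how you might structure it, and `require_tools` is a hypothetical helper name.

```bash
# Hypothetical preflight helper: report every missing command on stderr
# and return non-zero if any of them could not be found.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `require_tools jq mimetype` near the top of `get_mime.sh` (after `set -e`) would make the script exit early with a readable message on stderr, which is how the external data source expects errors to be reported.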

And finally, in `main.tf`, we associate the MIME type returned by the bash script with each file when uploading to S3:

```hcl
resource "aws_s3_bucket_object" "file" {
 for_each = local.website_files

 bucket       = aws_s3_bucket.my_static_website.id
 key          = each.key
 source       = "${var.website_root}/${each.key}"
 source_hash  = filemd5("${var.website_root}/${each.key}")
 acl          = "public-read"
 # added:
 content_type = data.external.get_mime[each.key].result.mime
}
```

### Determining MIME Types with a File Extension Map

The second approach to determining correct MIME types for our files is to simply provide a map of file extensions to MIME types. I first ran into this approach (for uploading files with Terraform) in [this article on the State Farm engineering blog](https://engineering.statefarm.com/blog/terraform-s3-upload-with-mime/), but it's a common approach in general:

- The [hashicorp/dir/template Terraform module](https://registry.terraform.io/modules/hashicorp/dir/template/latest) has a [mapping of extensions and MIME types](https://github.com/hashicorp/terraform-template-dir/blob/17b81de441645a94f4db1449fc8269cd32f26fde/variables.tf#L18)
   - Sidenote: An [open Terraform issue](https://github.com/hashicorp/terraform/issues/27737) requesting native MIME type detection directs users to use this Terraform module.
- The [AWS CLI uses the Python `mimetypes` module](https://github.com/aws/aws-cli/blob/8df550b8c28c1fa71d5c680f998e46107596f198/awscli/customizations/s3/utils.py#L340), which has a [built-in mapping](https://github.com/python/cpython/blob/97ea18ecede8bfd33d5ab2dd0e7e2aada2051111/Lib/mimetypes.py#L431) as a fallback if it can't read a mapping from the system (at `/etc/mime.types`)
- In non-desktop environments, the [xdg-mime tool falls back to using the mimetype tool](https://cgit.freedesktop.org/xdg/xdg-utils/tree/scripts/xdg-mime.in#n100), which [checks file extensions before performing magic tests](https://github.com/mbeijen/File-MimeInfo/blob/master/lib/File/MimeInfo/Magic.pm#L31) (for the most part)

To use this approach, we add a `mime.json` file that maps file extensions to MIME types for whatever files we need to upload. It could be as simple as the below:

```json
{
   ".html": "text/html",
   ".css": "text/css",
   ".png": "image/png"
}
```

And we load that file as a local variable in Terraform and use it when looking up the content type:

```hcl
locals {
 website_files = fileset(var.website_root, "**")

 mime_types = jsondecode(file("mime.json"))
}

resource "aws_s3_bucket_object" "file" {
 for_each = local.website_files

 bucket       = aws_s3_bucket.my_static_website.id
 key          = each.key
 source       = "${var.website_root}/${each.key}"
 source_hash  = filemd5("${var.website_root}/${each.key}")
 acl          = "public-read"
 content_type = lookup(local.mime_types, regex("\\.[^.]+$", each.key), null)
}
```

This mapping-based approach has the advantages of being simple and more cross-platform than shelling out to CLI tools. The downside is that you need to make sure all filetypes you're using exist in the extension-to-MIME mapping and are correct.
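One edge case worth knowing about: Terraform's `regex` function raises an error when nothing matches, so a file with no extension (say, `LICENSE`) would likely fail the plan rather than fall through to the `null` default. The shell sketch below shows the same extension lookup with an explicit fallback. `mime_for_key` is a hypothetical helper, and the fallback of `application/octet-stream` is an assumption, not something the Terraform config above does.

```bash
# Hypothetical helper: map a key's file extension to a MIME type,
# falling back to a generic type when the extension is missing or unknown.
mime_for_key() {
  local key="$1" name ext
  name="${key##*/}"                 # drop any leading directories
  case "$name" in
    *.*) ext=".${name##*.}" ;;      # text after the last dot, e.g. ".css"
    *)   ext="" ;;                  # no extension at all
  esac
  case "$ext" in
    .html) echo "text/html" ;;
    .css)  echo "text/css" ;;
    .png)  echo "image/png" ;;
    *)     echo "application/octet-stream" ;;
  esac
}
```

For example, `mime_for_key assets/normalize.css` prints `text/css`, while `mime_for_key LICENSE` prints the fallback. Outside Terraform, a value like this could be passed to the AWS CLI's `--content-type` option when copying files.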

## Fixing a Stale CloudFront Cache

Now we have [a working static website](http://blog-example-m9wtv64y.s3-website-us-west-1.amazonaws.com/) that we can visit in our browser! If you don't care about SSL or caching for some reason, you could stop here. But, I would argue that an important part of modern websites is making them secure and fast, so you'll likely want to put a CloudFront distribution in front of your S3 bucket. There are many other tutorials (such as all the ones linked at the top of this article) that cover CloudFront, so I won't dig into the details of that. However, I do want to dig into a problem that you run into when serving a static website via CloudFront: a stale cache.

By default, CloudFront applies a TTL of 86400 seconds (1 day), meaning CloudFront will fetch website files from your S3 bucket and serve the same files to visitors for a full day before re-fetching from S3. If you update website content (e.g. change CSS styles or JavaScript behavior) in S3, visitors may continue receiving cached versions from CloudFront and won't see your updates for up to a whole day! We'd prefer visitors to see the latest version of all website content, but we'd also like CloudFront to cache files as long as possible, so files can be served faster (directly from cache).

### Cache Busting

One solution is [cache-busting](https://javascript.plainenglish.io/what-is-cache-busting-55366b3ac022), which involves adding a hash (or "fingerprint") to non-HTML files' names. If the files' content changes, then the hash changes, so the browser downloads a completely different file (which can be cached forever).
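To make the naming scheme concrete, here's a minimal shell sketch of the renaming half of cache-busting, done outside Terraform. `fingerprint_name` is a hypothetical helper, the 8-character MD5 prefix is an arbitrary choice, and the sketch assumes GNU `md5sum` is available. It only computes the new name; it doesn't rename anything or rewrite references to the file.

```bash
# Hypothetical helper: print the fingerprinted path for a file, e.g.
# assets/main.css -> assets/main.<first 8 hex chars of its MD5>.css
fingerprint_name() {
  local path="$1" hash dir base stem ext
  hash="$(md5sum < "$path" | cut -c1-8)"
  dir="$(dirname "$path")"
  base="$(basename "$path")"
  stem="${base%.*}"                 # name without the last extension
  ext="${base##*.}"                 # last extension, without the dot
  if [ "$stem" = "$base" ]; then
    echo "$dir/$base.$hash"         # no extension: append the hash
  else
    echo "$dir/$stem.$hash.$ext"    # insert the hash before the extension
  fi
}
```

Because the hash is derived from the file's contents, any edit to the file produces a new name, and every reference to the old name has to be updated to match.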

I tried to implement this with Terraform, but uh... Terraform isn't meant for this sort of thing. Between the Terraform [filemd5](https://www.terraform.io/docs/language/functions/filemd5.html) and [regex](https://www.terraform.io/docs/language/functions/regex.html) functions, you can get close, but I hit a wall when trying to [replace](https://www.terraform.io/docs/language/functions/replace.html) filenames with their hashed version in all files. This could maybe work if you used [template](https://www.terraform.io/docs/language/functions/templatefile.html) variables (e.g. `<link href="${main.css}">` instead of `<link href="main.css">`), but then you can no longer browse your website via the filesystem or a local server. Alas, here dies my ill-advised dream of making a Terraform-based static-site generator/bundler.

![melting_emoji.png](https://cdn.prod.website-files.com/5fff85e7332ae877ea9e15ce/615ddb40a9390dfd8a805bba_melting_emoji.png)

Fun fact: the [melting face emoji](https://www.unicode.org/L2/L2020/20072-melting-face-emoji.pdf) was recently approved!

### Cache Invalidation

The other solution to a stale CloudFront cache is [invalidating files](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html). This approach does not fit into Terraform's declarative paradigm — there are no resources for invalidations in the AWS provider and no third-party modules either. So, it requires more hacky-ness, in the form of a [null_resource](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) that triggers based on changes in file hashes and [shells out](https://www.terraform.io/docs/language/resources/provisioners/local-exec.html) to the AWS CLI to create a new invalidation. That approach might look something like the below:

```hcl
locals {
 website_files = fileset(var.website_root, "**")

 file_hashes = {
   for filename in local.website_files :
   filename => filemd5("${var.website_root}/${filename}")
 }
}

resource "null_resource" "invalidate_cache" {
 triggers = local.file_hashes

 provisioner "local-exec" {
   # Quote the path so the shell never tries to glob-expand the asterisk
   command = "aws --profile=aws_admin cloudfront create-invalidation --distribution-id=${aws_cloudfront_distribution.my_distribution.id} --paths '/*'"
 }
}
```

The `null_resource` comes from the `null` provider, which we haven't used until now, so you'll need to run `terraform init` again to download it.

## What About Browser Caching?

We've talked about CloudFront caching, but there's another cache in between your content and your visitor: the browser. The browser cache and the `Cache-Control` header are a big topic all on their own; [Harry Roberts's Cache-Control for Civilians](https://csswizardry.com/2019/03/cache-control-for-civilians/) is a great resource if you want to learn more.

For the purpose of this article, it's important to note that you shouldn't set an aggressive cache control header (e.g. `Cache-Control: public, max-age=604800, immutable`) on your website files without fingerprinting them. Otherwise, visitors' browsers will keep serving a file from their local cache for the `max-age` duration (one week, in the above example) before they send a request to CloudFront to check if the file is stale. CloudFront invalidations force CloudFront to fetch fresh content, but have no impact on the caching of visitors' browsers.
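That rule can be sketched as a small shell function. It rests on assumptions: fingerprinted files are recognized by an 8-character hex segment before the extension (a naming convention assumed for this sketch), and everything else gets `no-cache`, which tells the browser to revalidate before reusing its copy. `cache_control_for` is a hypothetical helper name.

```bash
# Hypothetical helper: choose a Cache-Control value based on whether the
# filename looks fingerprinted (name.<8 hex chars>.ext).
cache_control_for() {
  local name="${1##*/}"             # drop any leading directories
  case "$name" in
    *.[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f].*)
      # Content-addressed name: safe to cache aggressively.
      echo "public, max-age=604800, immutable" ;;
    *)
      # Name may be reused with new content: always revalidate.
      echo "no-cache" ;;
  esac
}
```

A value chosen this way could then be supplied through the `cache_control` argument of the `aws_s3_bucket_object` resource.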

---

That's all for this adventure — thanks for joining me in pushing Terraform out of its comfort zone! If you have any suggestions or corrections, please let me know or [send us a tweet](https://www.twitter.com/tangramvision), and if you’re curious to learn more about how we improve perception sensors, visit us at [Tangram Vision](https://www.tangramvision.com/).

## Corrections

- 2022-06-13: Thanks to [Antoine Bolvy on Twitter](https://twitter.com/saveman71/status/1535168445523935232) for catching a couple typos (`locals.file_hashes -> local.file_hashes` and `website_content_filepath -> website_root`)!
