How Terraform Tests Saved a Prod Deployment

Picture this. It’s 1 AM, and I am not even joking:

You’ve just refactored your Terraform module to add the auto-scaling magic. You merge. You deploy. You go to bed. The next morning? Production is literally on fire 🔥 because your “tiny” change accidentally nuked the database.

How to stop “Oops” from becoming “OH NO” …

Test-Driven Chaos Prevention 🧪

Terraform tests (available in v1.6+) let you validate config changes before they touch your infrastructure. Think of them as your code’s personal bouncer, checking IDs at the door.

# valid_string_concat.tftest.hcl
run "did_i_break_everything" {
  command = plan
  assert {
    condition = aws_s3_bucket.bucket.bucket == "my-glittery-unicorn-bucket"
    error_message = "Name mismatch! Abort mission! 🚨"
  }
}

Translation: “If the bucket name isn’t ‘my-glittery-unicorn-bucket,’ error and abort.”

How Terraform Tests Save You 🤗

1️⃣ command = plan: Simulate changes without touching real infra. “What if…?” but for adults.
2️⃣ Assertions: Like a clingy ex, they’ll text you 100x if something’s wrong. Example:

assert {
  condition = output.bucket_name == "test-bucket" 
  error_message = "This is NOT the bucket you’re looking for. 👋"
}

3️⃣ Variables & Overrides: Test edge cases without redeploying. Example: “What if someone sets bucket_prefix to 🔥?”
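To make the variables and overrides idea a bit more concrete, here is a minimal sketch. It assumes the module under test has a bucket_prefix variable and builds the name of aws_s3_bucket.bucket from it, so the name is already known at plan time. The file name, run names, and prefix values are made up for illustration.

```hcl
# bucket_prefix.tftest.hcl (hypothetical file name)

# File-level variables apply to every run block in this file.
variables {
  bucket_prefix = "test"
}

run "uses_default_prefix" {
  command = plan

  assert {
    # Assumes the module derives the bucket name from var.bucket_prefix.
    condition     = startswith(aws_s3_bucket.bucket.bucket, "test")
    error_message = "Bucket name should start with the default prefix."
  }
}

run "overrides_prefix" {
  command = plan

  # Run-level variables take precedence over the file-level block above.
  variables {
    bucket_prefix = "edge-case"
  }

  assert {
    condition     = startswith(aws_s3_bucket.bucket.bucket, "edge-case")
    error_message = "Bucket name should start with the overridden prefix."
  }
}
```

Each run block gets its own plan, so you can throw as many weird prefixes at the module as you like without touching real infrastructure.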

Some Tips!

Mock Providers (v1.7+): Fake it ’til you make it. Test AWS without paying AWS 👍
Expect Failure: Want to validate that a config should break? Use expect_failures. Example:

run "expect_chaos" {
  command = plan
  variables {
    input = 1 # Odd number → should fail validation
  }
  # The run passes only if var.input's validation fails.
  expect_failures = [var.input]
}

Translation: “If this doesn’t fail, I’ve lost faith in humanity.” (I have already tbh)
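For expect_failures to have something to catch, the module under test needs a validation rule on that variable. The original module isn't shown here, so this is only a sketch of what it might look like, assuming input is a number that must be even:

```hcl
# variables.tf in the module under test (hypothetical, not from the original module)
variable "input" {
  type = number

  validation {
    # Even numbers only, so input = 1 is rejected.
    condition     = var.input % 2 == 0
    error_message = "The input value must be an even number."
  }
}
```

With a rule like this in place, the test passes precisely because the validation on var.input fails. If someone deletes the validation block later, the test goes red.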

Modules in Tests: Reuse setup/teardown logic like a lazy genius. Example: A “test” module that pre-creates a VPC so you can focus on actual work.

run "setup" {
  # Load a helper module instead of the configuration under test.
  module {
    source  = "hashicorp/consul/aws"
    version = "0.0.5"
  }

  variables {
    servers = 3
  }
}
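Going back to the mock providers tip, here is roughly what that can look like. This is a sketch that assumes Terraform v1.7+ and a module that declares aws_s3_bucket.bucket with a bucket_name variable. The file name, the variable name, and the mocked ARN are all placeholders I made up.

```hcl
# mocked.tftest.hcl (hypothetical file name), requires Terraform v1.7+

# Swap the real AWS provider for a mock: no credentials, no API calls, no bill.
mock_provider "aws" {
  # Optional: pin a computed attribute instead of letting Terraform generate one.
  mock_resource "aws_s3_bucket" {
    defaults = {
      arn = "arn:aws:s3:::mock-bucket" # placeholder value
    }
  }
}

run "bucket_name_with_mock" {
  # No command set, so this runs an apply, which is safe against a mock.
  variables {
    bucket_name = "my-glittery-unicorn-bucket" # assumes the module has this variable
  }

  assert {
    condition     = aws_s3_bucket.bucket.bucket == "my-glittery-unicorn-bucket"
    error_message = "Name mismatch, even with fake AWS."
  }
}
```

Mocked providers return generated values for computed attributes, so this is good for checking your own logic, not for checking what AWS would really do.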

The Takeaway 🚀

Testing is like adding seat belts to your code: boring until you crash!

Use run blocks, assertions, and provider mocking to:

Avoid “Works on My Machine” syndrome
Sleep better (no 3 AM “WHY IS S3 DOWN”)
Brag in PR reviews (“My tests caught 10 bugs. Your move, Karen.”)

TL;DR: Write tests. Save your sanity.

Resources:
[1] https://www.paloaltonetworks.com/blog/prisma-cloud/hashicorp-terraform-cloud-run-tasks-integration
[2] https://developer.hashicorp.com/terraform/language/tests

Site Reliability Engineering 2023 Conference

Last month, I had the incredible opportunity to attend the Conf42 Site Reliability Engineering 2023 conference!

The conference as a whole was really interesting, but my highlights were:

Replacing Privileged Users With Automated Just-in-Time Access Requests by Travis Rodgers

Managing privileged access to resources can be cumbersome, with developers often needing temporary access beyond their regular duties.
Just-in-time access solutions allow engineers to escalate privileges when necessary, applying the principle of least privilege in a secure manner.
Role-Based Access Control (RBAC): Implementing RBAC further enhances security by defining and assigning roles, reducing the need for admin accounts.

Building Automated Quality Gates into your CI pipelines by Craig Risi (My favourite)

How to incorporate automated quality checks at various stages of the CI pipeline to ensure the delivery of high-quality software.
It highlights the benefits of having automated quality gates in place, including early bug detection and prevention.
Practical guidance on implementing quality gates using tools and techniques such as static code analysis, code coverage analysis, and automated testing.

GPT: Revolutionizing Monitoring and Logging Systems by Clay Langston

Use GPT (Generative Pre-trained Transformer) to enhance logging and monitoring performance.
Maximize log value and improve system performance.
Automate the process by integrating with ELK (Elasticsearch, Logstash, and Kibana).
Construct effective prompts to extract valuable insights from logs.

Observability: one of the strongest muscles for SRE by Jhonnatan Gil Chaves

Focus on the big picture when implementing SRE practices.
Recognize the importance of the team and tools in SRE implementations.
Don’t overlook the broader view of your IT components.

CICD – The SRE-DevOps Overlay by Safeer CM and Garima Bajpai

Site reliability engineering (SRE) and DevOps practices have overlapping boundaries in many organizations.
Continuous integration and continuous delivery (CI/CD) are essential aspects of this overlap.
CI/CD serves as a prerequisite for many core SRE practices.

How to achieve the scalability, high availability, and elastic ability of your database infrastructure on Kubernetes by Trista Pan

How to make the clusters scalable, elastic, and highly available.
Traffic governance between applications and databases plays a crucial role in achieving these goals.
Effective ways to manage and distribute traffic.

Measuring Reliability in Production by Ramon Medrano Llamas

Identifying Critical User Journeys (CUJs) and recommendations for selecting appropriate metrics as SLI and SLO targets.
Practical insights and actionable steps for implementing SLIs and SLOs in your own applications.

If you missed out on Conf42 SRE 2023, fear not! The talk abstracts, speakers, and other details are linked here, and you can also watch the talks below on YouTube 🙌

DevOps – Brief explanation

The DevOps movement is built around a group of people who believe that the right combination of technology and attitude can revolutionize the world of software development and delivery. Communication is the key here.

The attitude we are talking about is this: imagine all technical people feeling empowered and capable of helping in all areas, with devs creating scenarios and thinking about business problems, QA thinking about infrastructure solutions, and so on.

DevOps extends the concept of Agile beyond the code to the entire project, from the very start through to the maintenance phase. For this reason I find DevOps very similar to BDD.

Look at this:

One of the key concepts is involving the entire team (DBAs, devs, QA, system administrators…) in the project from the start.

As you can see, DevOps and BDD both have the whole team working together and participating from the beginning of the project, so you can find problems before you actually start developing anything. The aims are to reduce delivery time, increase quality, share knowledge, work cooperatively, and earn money.

“DevOps” doesn’t differentiate between different sysadmin sub-disciplines – “Ops” is a blanket term for systems engineers, system administrators, operations staff, release engineers, DBAs, network engineers, security professionals, and various other subdisciplines and job titles. “Dev” is used as shorthand for developers in particular, but really in practice it is even wider and means “all the people involved in developing the product,” which can include Product, QA, and other kinds of disciplines.

This post on The Agile Admin compares & contrasts DevOps with Agile:
“The best way to define [DevOps] in depth is to compare to the definition of agile development. Agile development, according to Wikipedia and the agile manifesto, consists of a couple different “levels” of thinking.

Agile Principles – like “business/users and developers working together.” These are the core values that inform agile, like collaboration, people over process, software over documentation, and responding to change over planning.

Agile Methods – specific process types used to implement the agile principles. Iterations, Lean, XP, Scrum. “As opposed to waterfall.”

Agile Practices – techniques often found in conjunction with agile development, not linked to a given method flavor, like test driven development, continuous integration, etc.

I believe the different parts of DevOps that people are talking about map directly to these three levels.

DevOps Principles – How we need to think differently about operations. Examples include dev/ops collaboration, “infrastructure as code,” and other high level concepts; things like James Turnbull’s 4-part model seem to be spot on examples of trying to define this arena.

DevOps Methods – Process you use to conduct agile operations – including iterations, lean/kanban, stuff you’d read in Visible Ops.

DevOps Practices – Specific techniques and tools used as part of implementing the processes, like automated build and provisioning, continuous deployment, monitoring, anything you’d have a “toolchain” for.”

So, the DevOps movement is characterized by people with a multidisciplinary skill set – people who are comfortable with infrastructure and configuration, but also happy to roll up their sleeves, write tests, debug, and ship features.

I’ve summarised this from a mix of research and my own opinion. I hope it gives you a better understanding of what this trend is and how you can apply it. See you next week! Thank you 🙂

Sources:

http://www.jedi.be/blog/2010/02/12/what-is-this-devops-thing-anyway/

http://theagileadmin.com/what-is-devops/

http://en.wikipedia.org/wiki/Agile_software_development

http://www.itpi.org/home/visibleops.php

http://www.kartar.net/2010/02/what-devops-means-to-me/

http://agilemanifesto.org/
