Jorge Bernhardt

Terraform - Getting Started with Azure Chaos Studio

Hi everyone! Today, in this blog post, we’ll explore how to deploy Azure Chaos Studio using Terraform. As you may know, Azure Chaos Studio is a powerful service that lets us safely test our infrastructure by simulating real-world failures. This helps us find weaknesses and improve system reliability before problems happen in production.

In this guide, we’ll go step by step to set up and deploy Chaos Studio using Terraform. We’ll cover the basics: creating a resource group, setting up an identity and role assignments, registering targets, enabling capabilities, and deploying Chaos Studio experiments. In a future post, we’ll look at more advanced settings and complex experiments to make the most of Chaos Studio. For now, this guide has everything you need to get started!

Understanding Chaos Studio Components

Before deploying Azure Chaos Studio with Terraform, it’s important to understand the three key components that make up Chaos Studio experiments:

  • Targets: These are the Azure resources where failures will be applied. They can be virtual machines, web apps, databases, or other services.
  • Capabilities: These define the type of failure that will be injected into the target, such as shutting down a VM, increasing CPU usage, or blocking network traffic.
  • Experiments: These are structured workflows that define how and when failures will be applied. An experiment brings together targets and capabilities to test the resilience of a system.

By combining these components, we can simulate real-world failures in a controlled manner, allowing us to identify weaknesses in our infrastructure before problems occur in production.

Prerequisites

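  • The Terraform CLI installed on your local machine. If you’re new to using Terraform to deploy Microsoft Azure resources, I recommend you check out this link.
  • A text editor or IDE of your choice (I recommend Visual Studio Code with the Terraform extension).
  • An Azure subscription, and permissions to create role assignments in it (for example, the Owner or User Access Administrator role), because the configuration assigns roles to the Chaos Studio identity.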

Declare Azure Provider in Terraform

The provider.tf file specifies and configures the providers used in your Terraform configuration. A provider is a plugin that lets Terraform manage resources on a platform through that platform’s API, such as Microsoft Azure, AWS, or Google Cloud.

This file is important because it tells Terraform which provider’s API to use when creating, updating, and deleting resources. Without it, Terraform wouldn’t know where to manage your resources.

provider "azurerm" {
  features {}
  subscription_id = var.subscription_id
}

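Important: Starting with version 4.0, the AzureRM provider requires the subscription ID to be set explicitly, either in the provider block (as shown above) or through the ARM_SUBSCRIPTION_ID environment variable. This makes it clear which subscription Terraform manages and keeps deployments predictable in complex cloud environments.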
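The configuration above doesn’t pin any versions. The following is a minimal sketch of a terraform block that could sit alongside the provider block. It assumes you want the AzureRM 4.x provider, which is the major version that introduced the subscription ID requirement. It also assumes Terraform 1.3 or later, which the optional() attribute defaults in variables.tf need. The version constraints are my assumption, so adjust them to what you have tested:

```hcl
// Version pinning (constraints are assumptions; adjust to your environment)
terraform {
  // optional() with a default value, as used in variables.tf, needs Terraform 1.3+
  required_version = ">= 1.3.0"

  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      // 4.x is the provider major version that requires subscription_id
      version = "~> 4.0"
    }
  }
}
```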

Deploy Azure Resources Using Terraform

The main.tf file defines the infrastructure and configuration required to set up and manage Chaos Studio experiments. It contains the following resource blocks:

  • azurerm_resource_group: This block sets up the Azure Resource Group where all other resources will be deployed.
  • azurerm_user_assigned_identity: This block creates a user-assigned managed identity that the Chaos Studio experiments will use. This identity can be assigned specific roles and permissions to interact with the targeted resources.
  • azurerm_chaos_studio_experiment: This block defines the Chaos Studio experiments that will be deployed. It configures each experiment with a name, location, and resource group. It also includes selectors, which reference registered Chaos Studio targets, and steps, which define the sequence of actions to be performed, using enabled capabilities to inject faults into the targets.
  • azurerm_chaos_studio_target: This block registers specific Azure resources as Chaos Studio targets. Each target is linked to an existing resource ID and resource type, allowing it to be used in experiments. Without this registration, a resource cannot be selected in a Chaos Studio experiment.
  • azurerm_chaos_studio_capability: This block enables fault injection capabilities for the registered targets. Each capability type (e.g., Shutdown-1.0, CPUPressure-1.0) must be explicitly enabled on a Chaos Studio target before it can be used in an experiment.
  • azurerm_role_assignment: This block grants the permissions that the Chaos Studio experiments need. For each experiment, it assigns a built-in Azure role (such as Contributor or Reader) to the user-assigned identity, scoped to that experiment’s target resource. This gives the identity sufficient privileges to execute the fault injection actions on the target.

// Resource Group where the Chaos Studio resources are deployed
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = var.location
}

// User Assigned Identity that the experiments run as
resource "azurerm_user_assigned_identity" "identity" {
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  name                = "chaosstudio-identity"
}

// Role Assignment for Chaos Studio Identity, scoped to each experiment's target resource
resource "azurerm_role_assignment" "chaos_studio_identity_role_assignment" {
  for_each             = { for exp in var.experiments : exp.name => exp }
  scope                = each.value.target_resource_id
  role_definition_name = each.value.role_definition_name
  principal_id         = azurerm_user_assigned_identity.identity.principal_id
}

// Chaos Studio Experiment
resource "azurerm_chaos_studio_experiment" "experiment" {
  for_each            = { for exp in var.experiments : exp.name => exp }
  name                = each.key
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  // Run the experiment as the user-assigned identity that holds the role assignments
  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.identity.id]
  }

  // The selector groups the registered target that the actions apply to
  selectors {
    name                    = "Selector-${each.key}"
    chaos_studio_target_ids = [azurerm_chaos_studio_target.target[each.key].id]
  }

  steps {
    name = "step-${each.key}"
    branch {
      name = "branch-${each.key}"
      actions {
        // The capability URN identifies the fault to inject
        urn           = azurerm_chaos_studio_capability.capability[each.key].urn
        selector_name = "Selector-${each.key}"
        parameters    = each.value.parameters
        action_type   = each.value.action_type
        duration      = each.value.duration
      }
    }
  }
}

// Chaos Studio Target (registers the existing resource with Chaos Studio)
resource "azurerm_chaos_studio_target" "target" {
  for_each           = { for exp in var.experiments : exp.name => exp }
  location           = var.location
  target_resource_id = each.value.target_resource_id
  target_type        = each.value.target_resource_type
}

// Chaos Studio Capability (enables the fault on the registered target)
resource "azurerm_chaos_studio_capability" "capability" {
  for_each               = { for exp in var.experiments : exp.name => exp }
  capability_type        = each.value.capability_type
  chaos_studio_target_id = azurerm_chaos_studio_target.target[each.key].id
}
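The expression { for exp in var.experiments : exp.name => exp } is repeated in four resources. The following is a minimal sketch of computing it once in a local value. The name local.experiments is my own choice and is not part of the original configuration:

```hcl
// Build the name => experiment map once so every resource iterates over the same keys
locals {
  // Terraform fails with a "Duplicate object key" error if two experiments share a name
  experiments = { for exp in var.experiments : exp.name => exp }
}

// In main.tf, each resource would then use:
//   for_each = local.experiments
// instead of repeating the for expression.
```

Because the experiment name is the map key, renaming an experiment changes the resource addresses. Terraform will then destroy and recreate that experiment’s target, capability, role assignment, and experiment.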

Declaration of input variables

The variables.tf file defines the input variables used by the provider.tf and main.tf files. These variables make the code more flexible and reusable.

In this example, the variables defined in the variables.tf include:

  • subscription_id: This variable is used to dynamically pass the Azure subscription ID to Terraform, improving flexibility and security.

  • resource_group_name: This block declares a variable named resource_group_name, which is a string. It is used to specify the name of the Azure Resource Group where all resources will be deployed.

  • location: This variable defines the Azure region where resources will be deployed. It ensures that all resources are created in the specified geographic location.

  • experiments: This variable is a list of objects that define the configuration for Chaos Studio experiments. Each experiment includes several attributes:

    • name: A unique name for the experiment. It is also used as the key for the experiment’s target, capability, and role assignment.
    • duration: How long the fault action runs, as an ISO 8601 duration (for example, PT5M for five minutes).
    • target_resource_id: The resource ID of the Azure resource that will be affected by the experiment.
    • target_resource_type: The Chaos Studio target type of the resource (for example, Microsoft-VirtualMachine). Check the compatibility in this link.
    • role_definition_name: The role assigned to the experiment’s identity on the target resource, so that it has the necessary permissions.
    • capability_type: The Chaos Studio capability used in the experiment, such as Shutdown-1.0 to shut down a virtual machine.
    • action_type: Determines how the action is executed, for example, continuous or discrete.
    • os_type (optional): Specifies the operating system type, if an experiment needs it. The main.tf shown in this post doesn’t reference it.
    • parameters: A map of key-value pairs with fault-specific settings (for example, abruptShutdown for a virtual machine shutdown).

// Azure Subscription ID
variable "subscription_id" {
  type        = string
  description = "Azure subscription where resources will be deployed."
}

// Azure Region
variable "location" {
  type        = string
  description = "The Azure region where resources will be created."
}

// Resource Group Name
variable "resource_group_name" {
  type        = string
  description = "Name of the resource group where resources will be deployed."
}

// Chaos Studio Experiments Configuration
variable "experiments" {
  description = "List of Chaos Studio experiments and their configurations."
  type = list(object({
    name                 = string  // Experiment name
    duration             = string  // Duration (e.g., 'PT5M')
    target_resource_id   = string  // Target Azure resource
    target_resource_type = string  // Resource type (e.g., 'Microsoft-AppService')
    role_definition_name = string  // Assigned role
    capability_type      = string  // Chaos Studio capability (e.g., 'Stop-1.0')
    action_type          = string  // Action type (e.g., 'continuous')
    os_type              = optional(string, null) // OS type (if needed)
    parameters           = map(string) // Key-value parameters
  }))
}
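The following is a hedged terraform.tfvars sketch that shows how the experiments variable could be filled in for a single virtual machine shutdown experiment. Every ID and name is a hypothetical placeholder. The target type, capability, and abruptShutdown parameter follow the Chaos Studio VM shutdown fault. Virtual Machine Contributor is my assumption for the role that fault needs. Because main.tf registers the target in var.location, the sketch also assumes the VM lives in that same region:

```hcl
// terraform.tfvars: example values (all IDs and names are hypothetical placeholders)
subscription_id     = "00000000-0000-0000-0000-000000000000"
location            = "westeurope"
resource_group_name = "rg-chaosstudio-demo"

experiments = [
  {
    name                 = "vm-shutdown"
    duration             = "PT10M" // ISO 8601 duration: 10 minutes
    target_resource_id   = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-workload/providers/Microsoft.Compute/virtualMachines/vm-demo"
    target_resource_type = "Microsoft-VirtualMachine"
    role_definition_name = "Virtual Machine Contributor" // assumed role for this fault
    capability_type      = "Shutdown-1.0"
    action_type          = "continuous"
    // os_type is optional and omitted here
    parameters = {
      abruptShutdown = "false" // request a graceful shutdown
    }
  }
]
```

Terraform loads terraform.tfvars automatically, so terraform plan and terraform apply pick up these values without extra flags.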

Declaration of output values

The output.tf file in Terraform extracts and displays information about the resources created or managed by the configuration. The outputs include details about the user-assigned identity, such as its unique identifier and principal ID. They also provide information on the role assignments for Chaos Studio, detailing the scope, principal ID, and role definition name.

In addition, the file includes details about the Chaos Studio targets, such as their IDs, the IDs of the targeted resources, and the target types. It also presents information on Chaos Studio capabilities, including their IDs, capability types, and the corresponding target IDs.

Finally, the output.tf file provides information about Chaos Studio experiments, including their IDs, names, assigned identities, resource groups, selectors, and steps.

Once Terraform has finished applying the configuration, it displays the defined outputs, allowing users to review the deployed resources and their attributes.

// Output for User Assigned Identity
output "user_assigned_identity" {
  description = "Details of the User Assigned Identity."
  value = {
    id           = azurerm_user_assigned_identity.identity.id
    principal_id = azurerm_user_assigned_identity.identity.principal_id
  }
}

// Output for Chaos Studio Role Assignments
output "chaos_studio_role_assignments" {
  description = "The role assignments for the Chaos Studio identity."
  value = {
    for k, v in azurerm_role_assignment.chaos_studio_identity_role_assignment :
    k => {
      id           = v.id
      scope        = v.scope
      principal_id = v.principal_id
      role_name    = v.role_definition_name
    }
  }
}

// Output for Chaos Studio Targets
output "chaos_studio_targets" {
  description = "The Chaos Studio targets created."
  value = {
    for k, v in azurerm_chaos_studio_target.target :
    k => {
      id              = v.id
      target_resource = v.target_resource_id
      target_type     = v.target_type
    }
  }
}

// Output for Chaos Studio Capabilities
output "chaos_studio_capabilities" {
  description = "The Chaos Studio capabilities created."
  value = {
    for k, v in azurerm_chaos_studio_capability.capability :
    k => {
      id              = v.id
      capability_type = v.capability_type
      target_id       = v.chaos_studio_target_id
    }
  }
}

// Output for Chaos Studio Experiments
output "chaos_studio_experiments" {
  description = "The Chaos Studio experiments created."
  value = {
    for k, v in azurerm_chaos_studio_experiment.experiment :
    k => {
      id                  = v.id
      name                = v.name
      identity_type       = v.identity[0].type
      identity_ids        = v.identity[0].identity_ids
      resource_group_name = v.resource_group_name
      selectors           = v.selectors
      steps               = v.steps
    }
  }
}

Executing the Terraform Deployment

Now that you’ve declared the resources correctly, it’s time to take the following steps to deploy them in your Azure environment.

  • Initialization: To begin, run the terraform init command. This initializes the working directory that contains your .tf files, downloads the provider declared in the provider.tf file, and configures the Terraform backend. If you want to know how to configure the backend, check this link.

  • Planning: Next, run terraform plan. This command creates an execution plan that shows the actions Terraform will take to reach the desired state defined in your .tf files, giving you a chance to review the changes before applying them.

  • Apply: When you’re satisfied with the plan, run the terraform apply command. This makes the changes required to reach the desired infrastructure state. Before making any changes, Terraform asks you to confirm.

  • Inspection: After applying the changes, you can use the terraform show command to see the current state of your infrastructure.

  • Destroy (optional): When the project is no longer needed, or its resources have become outdated, run the terraform destroy command. This removes all the resources managed by this configuration.
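The Initialization step mentions configuring the Terraform backend. The following is a minimal sketch of an azurerm remote backend. It assumes you already have a resource group, storage account, and blob container for the state, and all names are hypothetical placeholders:

```hcl
// backend.tf: keep Terraform state in Azure Storage (names are hypothetical placeholders)
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"                     // existing resource group
    storage_account_name = "sttfstatedemo"                  // existing storage account
    container_name       = "tfstate"                        // existing blob container
    key                  = "chaos-studio.terraform.tfstate" // name of the state blob
  }
}
```

After adding or changing a backend, run terraform init again so Terraform can initialize it and, if needed, migrate any existing state.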

Thank you for taking the time to read my post. I sincerely hope that you find it helpful.