Terraform: a new approach to infrastructure as code

Original author: Kamal Marhubi

Hello, colleagues! While Elon Musk is hatching ambitious plans to terraform Mars, we are more interested in the "Infrastructure as Code" paradigm, so we want to offer you a translation of an article about one of its "magnificent seven" tools, Terraform, written by Kamal Marhubi of Heap. Evgeniy Brikman's book on the topic is also well worth attention, so please speak up in the poll at the end if you would like to see it in Russian.

Our infrastructure runs on AWS, and we manage it with Terraform. In this post we share some practical tips and tricks that have proven useful in our day-to-day work.

Terraform and infrastructure as code


Terraform is a tool from HashiCorp for managing infrastructure declaratively. Instead of creating instances, networks, and so on by hand in your cloud provider's console, you write a configuration, in a human-readable text format, that describes the infrastructure you want. To change your infrastructure, you edit the configuration and run terraform apply: Terraform makes the necessary API calls to your cloud provider to bring the infrastructure in line with the configuration in that file.

Once infrastructure management lives in text files, you can bring all your favorite source control tools and processes to bear on it. Infrastructure goes under version control just like source code; it can be code-reviewed in the same way, and rolled back to an earlier state if something goes wrong.

Here's how, for example, Terraform defines an EC2 instance with an EBS volume:

resource "aws_instance" "example" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
  ebs_block_device {
    device_name = "/dev/xvdb"
    volume_type = "gp2"
    volume_size = 100
  }
}

If you have not tried Terraform yet, this beginner's guide will help you get a feel for its workflow quickly.

Terraform Data Model


At a high level, Terraform's data model is simple: Terraform manages resources, and resources have attributes. A few examples from the AWS world:

  • an EC2 instance is a resource with attributes such as machine type, boot image, availability zone, and security groups
  • an EBS volume is a resource with attributes such as volume size, volume type, and IOPS
  • an elastic load balancer is a resource with attributes for its backend instances, their health checks, and a few others

Terraform maintains a mapping between the resources described in the configuration file and the corresponding resources at the cloud provider. This mapping is called the state, and it is one giant JSON file. When you run terraform apply, Terraform refreshes the state by querying the cloud provider, then compares the returned resources with what is recorded in your Terraform configuration. If there is any difference, it creates a plan: essentially a list of changes that must be made to the cloud provider's resources so that the actual infrastructure matches your configuration. Finally, Terraform applies these changes by making the appropriate calls to the cloud provider.
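
To give a feel for what the state looks like, here is a heavily trimmed sketch of a state entry for the instance above (the exact layout depends on the Terraform version, the instance ID here is made up, and a real file records far more attributes):

{
  "version": 3,
  "modules": [
    {
      "path": ["root"],
      "resources": {
        "aws_instance.example": {
          "type": "aws_instance",
          "primary": {
            "id": "i-0123456789abcdef0",
            "attributes": {
              "ami": "ami-2757f631",
              "instance_type": "t2.micro"
            }
          }
        }
      }
    }
  ]
}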

Not Every Terraform Resource Is an AWS Resource


A data model of resources with attributes is not hard to understand, but it does not necessarily match the cloud provider's API one-to-one. In fact, a single Terraform resource can correspond to one underlying cloud provider object, to several, or to none at all. Here are some examples from AWS:

  • an aws_ebs_volume in Terraform corresponds to one EBS volume in AWS
  • an aws_instance in Terraform with an inline ebs_block_device block, as in the example above, corresponds to two EC2 objects: an instance and a volume
  • an aws_volume_attachment in Terraform corresponds to no EC2 object at all!

That last one may be surprising. When an aws_volume_attachment is created, Terraform makes an AttachVolume request; when it is destroyed, Terraform makes a DetachVolume request. No EC2 object is involved at any point: the aws_volume_attachment is entirely synthetic! Like all Terraform resources, it has an ID, but while most IDs come from the cloud provider, the ID of an aws_volume_attachment is just a hash of the volume ID, instance ID, and device name. There are other synthetic resources in Terraform, for example aws_route53_zone_association, aws_elb_attachment, and aws_security_group_rule. You can often spot them by the words "association" or "attachment" in the resource name, though that does not always work.

There is more than one way to do it, so choose carefully!


In Terraform, exactly the same infrastructure can be represented in several different ways. Here is another description of our example instance with its EBS volume, which yields exactly the same EC2 resources:

resource "aws_instance" "example" {
  ami           = "ami-2757f631"
  instance_type = "t2.micro"
}
resource "aws_ebs_volume" "example-volume" {
  availability_zone = "${aws_instance.example.availability_zone}"
  type              = "gp2"
  size              = 100
}
resource "aws_volume_attachment" "example-volume-attachment" {
  device_name = "/dev/xvdb"
  instance_id = "${aws_instance.example.id}"
  volume_id   = "${aws_ebs_volume.example-volume.id}"
}

Now the EBS volume is a full-fledged Terraform resource, separate from the EC2 instance, with a third, synthetic resource linking the two. Represented this way, volumes can be added to or removed from the instance simply by adding and removing pairs of aws_ebs_volume and aws_volume_attachment resources, as in the sketch below.
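
For example, attaching a second volume is just one more pair of resources. A sketch (the resource names and the device path here are our own illustrative choices, not from the original article):

resource "aws_ebs_volume" "example-volume-2" {
  availability_zone = "${aws_instance.example.availability_zone}"
  type              = "gp2"
  size              = 100
}

resource "aws_volume_attachment" "example-volume-attachment-2" {
  # /dev/xvdc is a hypothetical next free device name on the instance
  device_name = "/dev/xvdc"
  instance_id = "${aws_instance.example.id}"
  volume_id   = "${aws_ebs_volume.example-volume-2.id}"
}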

Often it does not matter which representation you choose. But sometimes, if you choose wrong, certain infrastructure changes become very hard to make!

We made the wrong choice


This is exactly where we got burned. We run a large PostgreSQL cluster on AWS, with 18 EBS volumes attached to each instance for storage. All of these instances were represented in Terraform as single aws_instance resources, with the EBS volumes defined in inline ebs_block_device blocks.

Our database instances store their data on the ZFS file system. ZFS can dynamically add block devices to grow the file system with no downtime, which lets us grow our storage gradually as customers send us more and more data. Since we are an analytics company and collect all kinds of data, this capability is a huge help. We are constantly optimizing the query and insert workloads on our cluster, so rather than being locked into the CPU-to-storage ratio we picked when the cluster was provisioned, we can rebalance on the fly to take advantage of the latest improvements.

This process would be even smoother were it not for those ebs_block_device blocks. You might hope that adding a 19th ebs_block_device block to the aws_instance would just work. Instead, Terraform sees it as a sweeping change: it does not know how to turn an instance with 18 volumes into an instance with 19. What Terraform plans to do is destroy the entire instance and create a new one in its place! That is the last thing we wanted for database instances holding terabytes of data!

Until recently, we worked around this by bringing Terraform back in sync in several steps:

  1. run a script that uses the AWS CLI to create and attach the volumes, as sketched below
  2. run terraform refresh so that Terraform updates its state
  3. finally, update the configuration to match the new reality
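
A rough sketch of steps 1 and 2 (the availability zone, size, device name, and IDs are placeholders, and in practice you would also wait for the new volume to become available before attaching it):

# step 1: create a volume and attach it to the instance
aws ec2 create-volume --availability-zone us-east-1a --volume-type gp2 --size 100
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/xvdt

# step 2: pull the new volume into Terraform's state
terraform refresh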

Between steps 2 and 3, terraform plan would show Terraform intending to destroy and recreate every one of our database instances, so nobody could work with those instances in Terraform until someone updated the configuration. Needless to say, it is frightening to be perpetually in that state!

Terraform state: time for surgery


Once we discovered the aws_volume_attachment approach, we decided to change our representation. Each volume would become two new Terraform resources: an aws_ebs_volume and an aws_volume_attachment. With 18 volumes per instance across the cluster, that meant more than a thousand new resources. Changing the representation is not just a matter of editing the Terraform configuration: we had to reach into Terraform's guts and change its view of the resources.

Since we were adding more than a thousand resources, we certainly were not going to do it by hand. Terraform state is stored as JSON. The format is stable, but the documentation warns that "directly editing state files is not recommended". We were going to edit it anyway, but we wanted to be sure we did it right. Rather than reverse-engineering the JSON format, we wrote a program that uses Terraform's internals as a library to read, modify, and write the state. It was not entirely simple, as it was the first Go program any of us had ever worked on! But we felt it was worth it to be certain we would not scramble the Terraform state of all our database instances into one big heap.

We have posted the tool on GitHub, in case you want to play with it and get a feel for what we went through.

Using Terraform carefully


Running terraform apply is one of the few actions that can seriously damage your company's entire infrastructure. A few tips can reduce the risk and make the whole thing rather less scary.

Always plan with -out and apply that plan

If you run terraform plan -out planfile, Terraform will write the plan to planfile. You can then apply exactly that plan by running terraform apply planfile. This way, the changes applied correspond exactly to what Terraform displayed at planning time; there is no way for the infrastructure to change unexpectedly because a colleague adjusted it between your plan and your apply.
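
Put together, the workflow looks like this (planfile is just a file name of your choosing):

terraform plan -out planfile    # write the planned changes to planfile
# ...review the plan output...
terraform apply planfile        # apply exactly the changes that were planned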

Be careful with this file, though: your Terraform variables are written into it, so if any of them are secret, that information ends up on the file system unencrypted. For example, if you pass your cloud provider credentials in as variables, they will be stored on disk in plain text.

Make a read-only IAM role for planning changes

When you run terraform plan, Terraform refreshes its view of your infrastructure, and all it needs to do that is read-only access to your cloud provider. With such a role set up, you can iterate on configuration changes and check them with terraform plan, without any risk that a careless apply will wipe out a day's, or a week's, worth of work!

With AWS, you can manage IAM roles and their associated access rights in Terraform itself. The role looks like this in Terraform:

resource "aws_iam_role" "terraform-readonly" {
  name = "terraform-readonly"
  path = "/"
  assume_role_policy = "${data.aws_iam_policy_document.assume-terraform-readonly-role-policy.json}"
}

The assume_role_policy simply lists the users who are allowed to assume this role.
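
The role above references a policy document that the article does not show; a minimal sketch of what it might look like is below (the account ID and user name are placeholders for whoever should be allowed to assume the role):

data "aws_iam_policy_document" "assume-terraform-readonly-role-policy" {
  statement {
    # allow the listed principals to assume the read-only role
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::123456789012:user/some-user"]
    }
  }
}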

Finally, you need a policy granting read-only access to all AWS resources. Amazon kindly provides a ready-made policy document that you can simply copy and paste, and that is what we used. We define an aws_iam_policy that refers to this document:

resource "aws_iam_policy" "terraform-readonly" {
  name = "terraform-readonly"
  path = "/"
  description = "Readonly policy for terraform planning"
  policy = "${file("policies/terraform-readonly.json")}"
}

Then we attach the policy to the terraform-readonly role by adding an aws_iam_policy_attachment:

resource "aws_iam_policy_attachment" "terraform-readonly-attachment" {
  name = "Terraform read-only attachment"
  roles = ["${aws_iam_role.terraform-readonly.name}"]
  policy_arn = "${aws_iam_policy.terraform-readonly.arn}"
}

You can now use the Security Token Service's AssumeRole API method to obtain temporary credentials that can query AWS but not change anything. Running terraform plan will still update Terraform's state to reflect the current state of the infrastructure. With local state, this is written to the terraform.tfstate file. If you use remote state, for example in S3, your read-only role will also need write access to the state's location in S3; otherwise the state cannot be updated.
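
Obtaining the temporary credentials might look like this (the account ID is a placeholder; the call returns an access key, secret key, and session token, which you would export as AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN before running terraform plan):

aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/terraform-readonly \
    --role-session-name terraform-plan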

Setting up this role made it far less nerve-wracking to rewrite our entire Terraform state to use aws_volume_attachment for our database volumes. We knew that no changes to the AWS infrastructure were intended, only changes to its representation in Terraform. Since we had no intention of changing the infrastructure at all, why keep the ability to do so?

Ideas for the future


Our team is growing, and new colleagues are learning to make infrastructure changes with Terraform. I want that process to be simple and safe. Most outages are caused by human error and configuration changes, and changes made with Terraform involve both, an alarming combination, you must admit.

For example, on a small team it is easy to guarantee that only one person runs Terraform at any given time. On a larger team that guarantee disappears; you can only hope. If terraform apply is run from two machines at once, the result can be a terrible non-deterministic mess. Terraform 0.9 introduces state locking, which guarantees that only one terraform apply can run at a time.
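
With the S3 remote state backend, for instance, locking in the 0.9 release is backed by a DynamoDB table; a sketch, with placeholder names:

terraform {
  backend "s3" {
    bucket     = "our-terraform-state"
    key        = "terraform.tfstate"
    region     = "us-east-1"
    lock_table = "terraform-state-lock"  # DynamoDB table used for the lock
  }
}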

Another area where we want ease and safety is reviewing infrastructure changes. At the moment we simply copy and paste the terraform plan output into a comment on the code review, and once the change is approved, we apply it manually.

We have already set up our continuous integration tool to validate the Terraform configuration. For now it just runs terraform validate, which checks the code for syntax errors. Our next step is to have continuous integration run terraform plan and post the resulting infrastructure changes as a comment on the code review, and then automatically run terraform apply once the change is approved. This removes a manual step and also gives us a more consistent audit trail of changes, visible in the review comments. Terraform Enterprise offers this capability, so we recommend taking a closer look at it.


Which new book would you buy from us?

  • Ansible: 68.4% (104 votes)
  • Chef: 9.2% (14 votes)
  • Terraform: 67.7% (103 votes)
  • Puppet: 9.8% (15 votes)
  • SaltStack: 10.5% (16 votes)
