
Terraform: a new approach to Infrastructure as code
- Translation
Hello, colleagues! While the brilliant Elon Musk is hatching ambitious plans to terraform Mars, we are interested in newer possibilities connected with the "Infrastructure as Code" paradigm, and we want to offer you a translation of an article about Terraform, one of the representatives of the "magnificent seven". Yevgeniy Brikman's book on the subject is not bad, and it should be out soon, so do speak up if you would like to see it in Russian.
By Kamal Marhubi of Heap.
Our infrastructure runs on AWS, and we manage it with Terraform. In this post we have collected practical tips and tricks that proved useful to us along the way.
Terraform and infrastructure as code
Terraform is a tool from HashiCorp that helps you manage your infrastructure declaratively. Instead of manually creating instances, networks, and so on in your cloud provider's console, you write a configuration that describes what you want your infrastructure to look like. The configuration is a human-readable text format. When you want to change your infrastructure, you edit the configuration and run terraform apply: Terraform issues the API calls to your cloud provider needed to bring the infrastructure in line with that configuration.
Moving infrastructure management into text files lets you pick up all your favorite tools for managing source code and processes and point them at your infrastructure. Infrastructure now falls under version control just like source code; it can be reviewed the same way, or rolled back to an earlier state if something goes wrong.
Here's how, for example, Terraform defines an EC2 instance with an EBS volume:
resource "aws_instance" "example" {
ami = "ami-2757f631"
instance_type = "t2.micro"
ebs_block_device {
device_name = "/dev/xvdb"
volume_type = "gp2"
volume_size = 100
}
}
If you have not tried Terraform yet, this beginner's guide is a good starting point and will quickly get you comfortable with the tool's workflow.
Terraform Data Model
At a high level, the Terraform data model is simple: Terraform manages resources, and resources have attributes. A few examples from the AWS world:
- An EC2 instance is a resource with attributes such as machine type, boot image, availability zone, and security groups
- An EBS volume is a resource with attributes such as volume size, volume type, and IOPS
- An elastic load balancer is a resource with attributes for its backend instances, their health checks, and a few other things
Terraform maintains a mapping between the resources described in your configuration files and the corresponding resources at the cloud provider. This mapping is called the state, and it is one giant JSON file. When you run terraform apply, Terraform refreshes the state by querying the cloud provider, and compares the returned resources against what is recorded in your Terraform configuration. If there is any difference, it creates a plan: essentially a list of changes that need to be made to the cloud provider's resources so that the actual infrastructure matches the configuration. Finally, Terraform applies those changes by issuing the appropriate calls to the cloud provider.
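To make the mapping concrete, here is a heavily abridged sketch of roughly what one entry in a 0.x-era state file looks like; the instance ID is made up, and a real state file carries many more fields:
{
  "version": 3,
  "modules": [
    {
      "path": ["root"],
      "resources": {
        "aws_instance.example": {
          "type": "aws_instance",
          "primary": {
            "id": "i-0123456789abcdef0",
            "attributes": {
              "ami": "ami-2757f631",
              "instance_type": "t2.micro"
            }
          }
        }
      }
    }
  ]
}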
Not Every Terraform Resource Is an AWS Resource
This data model of resources and attributes is not hard to grasp, but it does not map exactly onto the cloud provider's API. In fact, a single Terraform resource can correspond to one, several, or even zero underlying objects at the cloud provider. Here are some examples from AWS:
- An aws_ebs_volume in Terraform corresponds to one AWS EBS volume
- An aws_instance with an embedded ebs_block_device block, as in the example above, corresponds to two EC2 objects: an instance and a volume
- An aws_volume_attachment in Terraform corresponds to no EC2 object at all!
The last one may seem surprising. When an aws_volume_attachment is created, Terraform makes an AttachVolume request; when it is destroyed, Terraform makes a DetachVolume request. No EC2 object is involved at any point: the aws_volume_attachment is entirely synthetic! Like all Terraform resources, it has an ID; but while most IDs come from the cloud provider, the ID of an aws_volume_attachment is just a hash of the volume ID, instance ID, and device name. There are other synthetic resources in Terraform as well, for example aws_route53_zone_association, aws_elb_attachment, and aws_security_group_rule. You can often spot them by the word association or attachment in the resource name, although that heuristic does not always work.
Every task can be solved in several ways, so choose carefully!
In Terraform, exactly the same infrastructure can be represented in several different ways. Here is another way to describe our example instance and its EBS volume, which yields exactly the same EC2 resources:
resource "aws_instance" "example" {
ami = "ami-2757f631"
instance_type = "t2.micro"
}
resource "aws_ebs_volume" "example-volume" {
availability_zone = "${aws_instance.example.availability_zone}"
type = "gp2"
size = 100
}
resource "aws_volume_attachment" "example-volume-attachment" {
device_name = "/dev/xvdb"
instance_id = "${aws_instance.example.id}"
volume_id = "${aws_ebs_volume.example-volume.id}"
}
Now the EBS volume is a full-fledged Terraform resource, separate from the EC2 instance, and a third, synthetic resource ties the two together. With the instance and volume represented this way, we can add and remove volumes simply by adding and removing aws_ebs_volume and aws_volume_attachment resources, as sketched below. Often it does not matter which representation of EBS you choose. But sometimes, if you choose wrong, changing your infrastructure afterwards can be quite difficult!
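For example, attaching a second volume in this representation is just two more resources. A sketch; the device name /dev/xvdc and the size are our own choices for illustration:
resource "aws_ebs_volume" "example-volume-2" {
  availability_zone = "${aws_instance.example.availability_zone}"
  type              = "gp2"
  size              = 100
}

resource "aws_volume_attachment" "example-volume-attachment-2" {
  device_name = "/dev/xvdc"
  instance_id = "${aws_instance.example.id}"
  volume_id   = "${aws_ebs_volume.example-volume-2.id}"
}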
We made the wrong choice
This is where we got burned. We run a large PostgreSQL cluster on AWS, with 18 EBS volumes attached to each instance as storage. Each of these instances was represented in Terraform as a single aws_instance resource with its EBS volumes defined in ebs_block_device blocks.
On our database instances, the data lives in a ZFS file system. ZFS lets us grow the file system by adding block devices on the fly, with no downtime, so we gradually expand our storage as customers send us more and more data. Since we are an analytics company and collect all kinds of data, this flexibility is a huge help. We are constantly optimizing query and insert performance across the cluster, and this way we are not locked into the CPU-to-storage ratio we chose when we first provisioned it: we can rebalance on the fly to take advantage of the latest improvements.
This process would be even smoother if it were not for the ebs_block_device blocks. You might hope that adding a 19th ebs_block_device block to the aws_instance would just work. Unfortunately, Terraform sees that as a sweeping change: it does not "know" how to turn an 18-volume instance into a 19-volume one. Instead, Terraform plans to destroy the entire instance and create a new one in its place! That is the last thing we want for database instances holding terabytes of data!
Until recently, we worked around this by bringing Terraform back in sync in several steps:
- run a script that uses the AWS CLI to create and attach the new volumes
- run terraform refresh so that Terraform updates its state
- finally, update the configuration to match the new reality
Between steps 2 and 3, terraform plan would show that Terraform intended to destroy and recreate every one of our database instances. So nobody could work with these instances in Terraform until someone updated the configuration. Needless to say, it is frightening to remain in that state indefinitely!
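That intermediate plan looked roughly like this (abridged 0.x-era output; the resource name is illustrative):
-/+ aws_instance.database
    ebs_block_device.#: "18" => "19" (forces new resource)
    ...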
Terraform State: Moving on to Surgery
Having discovered the aws_volume_attachment approach, we decided to rebuild our representation. Each volume would turn into two new Terraform resources: an aws_ebs_volume and an aws_volume_attachment. With 18 volumes per instance across the cluster, more than a thousand new resources lined up in front of us.
Rebuilding a representation is not just a matter of editing the Terraform configuration: we had to get inside Terraform's state and change its view of the resources. Given that we were adding more than a thousand resources, we were definitely not going to do that by hand. Terraform state is stored in JSON format. Although the format is stable, the documentation states that "directly editing state files is not recommended". We would have to do it anyway, but we wanted to be sure we were doing it right. Rather than reverse-engineer the JSON format, we wrote a program that uses Terraform's internals as a library to read, modify, and write the state. It was not entirely straightforward, since it was the first Go program any of us had ever worked on! But we thought it necessary, to be sure we would not scramble the Terraform state of all our database instances into one big heap.
We have posted the tool on GitHub, in case you want to play with it and walk a mile in our shoes.
Working with Terraform carefully
Running terraform apply is one of the few actions that can seriously damage your company's entire infrastructure. A few tips can reduce the risks and make the whole thing less scary.
Always create a plan with -out, and apply that plan
If you run terraform plan -out planfile, Terraform will write the plan to planfile. You can then apply exactly that plan by running terraform apply planfile. The changes applied at that moment are exactly the ones Terraform showed at planning time; there is no way the infrastructure can suddenly change because a colleague adjusted it between your "plan" and "apply" steps. Be careful with this file, however: Terraform variables are included in it, so if you pass in anything secret as a variable, that information will land on the file system unencrypted. For example, if you pass your cloud provider credentials as variables, they will be saved to disk as plain text.
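Putting the two commands together, the workflow is simply:
terraform plan -out planfile    # review the proposed changes
terraform apply planfile        # apply exactly those changes, nothing else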
Create a read-only IAM role for reviewing changes
When you run terraform plan, Terraform refreshes its underlying view of your infrastructure. To do that, it only needs read-only access to your cloud provider. If you give it a role with read-only rights, you can go over configuration changes and check them with terraform plan without any risk that a careless apply will wipe out a day's, or a week's, worth of your work!
With AWS, you can manage IAM roles and the associated permissions in Terraform itself. The role looks like this:
resource "aws_iam_role" "terraform-readonly" {
name = "terraform-readonly"
path = "/",
assume_role_policy = "${data.aws_iam_policy_document.assume-terraform-readonly-role-policy.json}"
}
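The data.aws_iam_policy_document it references is not shown in the article; a minimal sketch, assuming a hypothetical account ID and user, might look like this:
data "aws_iam_policy_document" "assume-terraform-readonly-role-policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      # hypothetical ARN: list whoever may assume the role
      identifiers = ["arn:aws:iam::123456789012:user/alice"]
    }
  }
}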
The assume_role_policy simply lists the users who are allowed to assume this role. Finally, you need a policy that grants read-only access to all AWS resources. Amazon kindly provides a ready-made policy document that you can simply copy and paste, and that is the document we used. We define an aws_iam_policy that references it:
resource "aws_iam_policy" "terraform-readonly" {
  name        = "terraform-readonly"
  path        = "/"
  description = "Readonly policy for terraform planning"
  policy      = "${file("policies/terraform-readonly.json")}"
}
Then we attach the policy to the terraform-readonly role by adding an aws_iam_policy_attachment:
resource "aws_iam_policy_attachment" "terraform-readonly-attachment" {
  name       = "Terraform read-only attachment"
  roles      = ["${aws_iam_role.terraform-readonly.name}"]
  policy_arn = "${aws_iam_policy.terraform-readonly.arn}"
}
You can now use the Security Token Service's AssumeRole API method to obtain temporary credentials that can query AWS but cannot change anything. Running terraform plan with these credentials will update the Terraform state so that it reflects the current state of the infrastructure. If you work with local state, that information is written to the terraform.tfstate file. If you use remote state, for example in S3, your read-only role will also need write access to that storage; otherwise it will not be able to update the state in S3.
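As a sketch of how those temporary credentials might be obtained before planning (the account ID is hypothetical):
aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/terraform-readonly \
    --role-session-name terraform-plan
# export AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN
# from the response, then:
terraform plan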
Setting up this role made it far less nerve-wracking to rewrite the entire Terraform state to use aws_volume_attachment with our database volumes. We knew that no changes to the AWS infrastructure itself were planned: only its representation in Terraform was supposed to change. Since we had absolutely no intention of modifying the infrastructure, why carry the ability to do so?
Ideas for the future
Our team is growing, and new engineers are learning to make infrastructure changes with Terraform. We want this process to be simple and safe. Most outages are caused by human error and configuration changes, and changes made through Terraform involve both of those; a scary combination, you must admit.
For example, in a small team it is easy to ensure that only one person works with Terraform at any given time. In a larger team there is no such guarantee; you can only hope. If terraform apply is run from two machines at the same time, the result can be a terrifying, non-deterministic mess. Terraform 0.9 introduces state locking, which ensures that only one terraform apply can run at any given time.
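With remote state in S3, for example, locking in the 0.9 era was backed by a DynamoDB table. A sketch, with hypothetical bucket and table names; note that later Terraform versions renamed lock_table to dynamodb_table:
terraform {
  backend "s3" {
    bucket     = "example-terraform-state"  # hypothetical bucket
    key        = "terraform.tfstate"
    region     = "us-east-1"
    lock_table = "terraform-state-lock"     # hypothetical DynamoDB table
  }
}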
Another area where we want more ease and safety is reviewing changes to the infrastructure. Right now, a review means copying and pasting the terraform plan output into a comment on the code review; once the change is approved, we apply it manually.
We have already adapted our continuous integration tool to validate the Terraform configuration. For now it just runs terraform validate, which checks the code for syntax errors. Our next step is to have the continuous integration tool run terraform plan and post the resulting infrastructure changes as a comment on the code review, and then automatically run terraform apply once the change is approved. That removes one more manual step and gives a more consistent audit trail of changes, traceable through the review comments. Terraform Enterprise already offers this capability, so we recommend taking a closer look at it.
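A sketch of such a pipeline, using only commands mentioned above; how the plan output gets posted as a comment depends on your CI and review tooling:
# on every proposed configuration change:
terraform validate              # fail fast on syntax errors
terraform plan -out planfile    # post this plan's output on the code review

# once the review is approved:
terraform apply planfile        # apply exactly the reviewed plan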
Would you buy a new book from us on:
- Ansible: 68.4% (104 votes)
- Chef: 9.2% (14 votes)
- Terraform: 67.7% (103 votes)
- Puppet: 9.8% (15 votes)
- SaltStack: 10.5% (16 votes)