[Terraform + SaltStack] Preparing a PrestoDB cluster in a pressure cooker (Part 1)
- Tutorial
What is interesting here?

A recipe for a tasty and healthy PrestoDB cluster, cooked in a Terraform and SaltStack based pressure cooker in the AWS public cloud. We will look in detail at the nuances of preparing the pressure cooker itself, the steps needed to cook the dish properly and, of course, talk a little about consuming the finished dish. This part can also be used as a Terraform tutorial.
So, let's get started:
Ingredients for the recipe
- Terraform - 1 pc.
- SaltStack - 1 master and 1+ minions
- PrestoDB - 1 coordinator and 1+ worker
- AWS account - 1 pc.
- Ingenuity and a file (for manual tweaking) to taste
Let's look at the ingredients in more detail (leaving out the rules for preparing them):
1. Terraform - a wonderful tool from the folks at HashiCorp (who also made such useful things as Vagrant, Consul, Packer, Vault, etc.) used to create and modify infrastructure in various cloud (and not only cloud) environments.
2. SaltStack - a tool for automated server configuration and provisioning. Your humble servant has already written about it here and here.
3. PrestoDB - a layer on top of Big Data storage back-ends that lets you query them in plain, familiar SQL. Developed at Facebook, which moved it to OSS status, for which many thanks to them.
4. AWS (or any other public / private cloud, for example GCE or OpenStack) from the list supported by Terraform, in which our PrestoDB cluster will later run. We will use AWS because it is the most common public cloud platform and is familiar to many without much additional explanation.
5. This article describes only the basic principles of combining these products and a few tricks that make the process easier; I will not dwell on the fine details of each component - a whole book could be written about any of them. So adapting these techniques with your own head switched on is very welcome. And please do not write in the comments that something is not configured optimally (PrestoDB in particular) - that is not the goal I am pursuing.
Cooking a pressure cooker!
Any culinary recipe silently assumes that the pots and pans are already prepared for cooking, but in our case the correct preparation of the pressure cooker (Terraform + SaltStack) is almost 80% of the key to a successful dish.
So, let's start with Terraform. There is CloudFormation for AWS and SaltCloud from the creators of SaltStack, so why choose Terraform? Its main appeal is its simplicity and understandable DSL: to create an instance (or 10 of them), the following description is all you need (assuming Terraform is downloaded and available in $PATH):
provider "aws" {
access_key = "XXXXXXXXXXXXXXXXXXXXX" # AWS IAM key
secret_key = "******************************************" # AWS IAM secret
region = "us-east-1" # region used to create resources
}
resource "aws_instance" "example_inst" {
ami = "ami-6d1c2007" # CentOS 7 AMI located in US-East-1
instance_type = "t2.medium"
count = "1" # or "10" can be used for parallel creation
vpc_security_group_ids = [ "default-sg" ] # some security group with at least 22 port opened
key_name = "secure_key" # pre created AWS E2 key pair
subnet_id = "sub-abcdef123" # AWS VPC subnet
}
and a simple sequence of commands:
terraform plan
terraform apply
The description reads naturally and, it seems to me, requires no explanation for anyone familiar with AWS. Read more about the available AWS resources here. It is assumed, of course, that the AWS account whose keys are specified in the Terraform configuration has the privileges to create the necessary resources.
Actually, the most interesting part is in the Terraform commands themselves: terraform plan makes an "estimate" of what needs to be done relative to the last saved state (in our example, a new instance needs to be created) and shows which resources will be created, deleted or modified; terraform apply actually starts the process of creating the planned resources. If Terraform has already been run and you have since changed the configuration (say, added instances), at the planning stage it will show which resources are missing and apply will create them. Finally,
terraform destroy
completely removes all resources created with Terraform (the .tfstate files in the current directory, which store the description of the state of the created infrastructure, are taken into account).
An important point not to forget: in most cases Terraform will not modify existing resources - it simply deletes the old ones and creates new ones. This means, for example, that if you created an instance of type t2.medium and then changed the configuration to a new instance type, say m4.xlarge, then on apply Terraform will first destroy the previously created instance and then create a new one. This may seem strange to AWS users (you could have stopped the instance, changed its type and started it again without losing data on the disk), but it was done to provide the same predictable behavior on all platforms. And one more thing: Terraform cannot (and by its nature should not) manage resources during their life cycle - it provides no commands like stop or reboot for the instances it creates - you must use other tools to manage the infrastructure once it is built.
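If the delete-then-create order is undesirable (for example, you want the replacement instance to exist before the old one disappears), Terraform's lifecycle meta-parameter can change this behavior. A minimal sketch, not part of the cluster configuration in this article:
resource "aws_instance" "example_inst" {
  ami           = "ami-6d1c2007"
  instance_type = "t2.medium"

  lifecycle {
    create_before_destroy = true  # build the replacement first, then destroy the old instance
    prevent_destroy       = false # set to true to make Terraform refuse to destroy this resource
  }
}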
Terraform provides an excellent set of functionality in its DSL: variables (https://www.terraform.io/docs/configuration/variables.html), interpolation functions (needed for iteration and transforming variables), modules, etc. Here is one example that uses several of these:
# Cluster shortname
variable "cluster_name" { default = "example-presto" }
# Count of nodes in cluster
variable "cluster_size" { default = 3 }
# Default owner for all nodes
variable "cluster_owner" { default = "user@example.com" }
# Default AWS AMI to use for cluster provisioning
variable "cluster_node_ami" { default = "ami-6d1c2007" }
# Default AWS type to use for cluster provisioning
variable "cluster_node_type" { default = "t2.large" }
# Default VPC subnet
variable "cluster_vpc_subnet" { default = "subnet-da628fad" }
# Default Security group to apply to instances
variable "cluster_sg" { default = "sg-xxxxxxx" }
# Default KeyPair to use for provisioning
variable "cluster_keyname" { default = "secure_key" }

# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  ami                     = "${var.cluster_node_ami}"
  instance_type           = "${var.cluster_node_type}"
  count                   = "${var.cluster_size - 1}" # one node will be used for coordinator
  vpc_security_group_ids  = [ "${var.cluster_sg}" ]
  key_name                = "${var.cluster_keyname}"
  subnet_id               = "${var.cluster_vpc_subnet}"
  disable_api_termination = true

  tags {
    Name    = "${var.cluster_name}-cluster-worker-${format("%02d", count.index+1)}"
    Owner   = "${var.cluster_owner}"
    Purpose = "PrestoDB cluster '${var.cluster_name}' node ${format("%02d", count.index+1)}"
  }
}
This example shows the use of variables, arithmetic on them, interpolation with format(), the index of the current element (when several instances of the same type are created), as well as resource tagging.
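Variable defaults like these do not have to be edited in place: they can be overridden in a terraform.tfvars file placed next to the configuration, or on the command line. A minimal sketch (the values here are just an illustration):
# terraform.tfvars
cluster_name  = "analytics-presto"
cluster_size  = 5
cluster_owner = "team@example.com"
The same effect can be achieved on the command line, for example: terraform apply -var 'cluster_size=5'.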
But simply creating / destroying instances is not enough - you need to initialize them somehow (copy files, install and configure specific software, update the system, set up the cluster, etc.). For this, Terraform introduces the concept of Provisioners. The main ones are file, remote-exec, chef and null_resource. The typical operations are copying files to a remote instance and running scripts on it.
Here is the previous example with provisioning operations enabled:
# Locally stored SSH private key filename
variable "cluster_keyfile" { default = "~/.ssh/secure_key.pem" }

# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  ami                     = "${var.cluster_node_ami}"
  instance_type           = "${var.cluster_node_type}"
  count                   = "${var.cluster_size - 1}" # one node will be used for coordinator
  vpc_security_group_ids  = [ "${var.cluster_sg}" ]
  key_name                = "${var.cluster_keyname}"
  subnet_id               = "${var.cluster_vpc_subnet}"
  disable_api_termination = true

  tags {
    Name    = "${var.cluster_name}-cluster-worker-${format("%02d", count.index+1)}"
    Owner   = "${var.cluster_owner}"
    Purpose = "PrestoDB cluster '${var.cluster_name}' node ${format("%02d", count.index+1)}"
  }

  # Copy bootstrap script
  provisioner "file" {
    source      = "bootstrap-script.sh"
    destination = "/tmp/bootstrap-script.sh"

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = "${file("${var.cluster_keyfile}")}"
    }
  }

  # Running provisioning commands
  provisioner "remote-exec" {
    inline = [
      "sudo yum -y update",
      "sudo sh /tmp/bootstrap-script.sh"
    ]

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = "${file("${var.cluster_keyfile}")}"
    }
  }
}
The key point is to provide the information needed to connect to the remote host - for AWS this is most often key-based SSH access - so you must specify where that key is stored (a variable was introduced for convenience). Note that the private_key attribute of the connection section cannot take a path to a file (only the key text itself) - instead, the ${file()} interpolation function is used, which reads the file from disk and returns its contents.
At this point we can create a simple cluster of several instances (we will not go into the contents of bootstrap-script.sh - let's assume it installs the necessary software). Now let's look at how to cook a cluster with a dedicated master in our pressure cooker. We will assume that the worker nodes of the cluster must know where the master node is in order to register with it and receive tasks later (let's leave goodies like the Raft and Gossip protocols for electing a master and spreading information around the cluster to other articles) - for simplicity, suppose each worker must know the IP address of the master. How do we implement this in Terraform? First, create a separate instance for the master:
resource "aws_instance" "master_node" {
ami = "${var.cluster_node_ami}"
instance_type = "${var.cluster_node_type}"
count = "1"
<...skipped...>
provisioners {
<...skipped...>
}
}
Then add a dependency to the worker nodes:
# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  depends_on    = ["aws_instance.master_node"] # dependency on the master node introduced
  ami           = "${var.cluster_node_ami}"
  instance_type = "${var.cluster_node_type}"
  count         = "${var.cluster_size - 1}" # one node will be used for coordinator
  <...skipped...>
}
The depends_on resource modifier can be used to set the order in which infrastructure is created - Terraform will not create the worker nodes until the master node is fully created. As the example shows, a dependency (or several) is specified as a list of entries of the form resource_type.resource_name. With AWS you can create not only instances but also VPCs, subnets, etc. - they will need to be listed as dependencies of the resources that use them, which guarantees the correct creation order.
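For illustration, here is a minimal, hypothetical sketch of such a chain: a VPC, a subnet inside it, and an instance in that subnet. Note that referencing an attribute of another resource (as the subnet does with the VPC id) already creates an implicit dependency, so an explicit depends_on is only needed when there is no such reference:
resource "aws_vpc" "cluster_vpc" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "cluster_subnet" {
  vpc_id     = "${aws_vpc.cluster_vpc.id}" # implicit dependency on the VPC
  cidr_block = "10.0.1.0/24"
}

resource "aws_instance" "some_node" {
  ami           = "ami-6d1c2007"
  instance_type = "t2.medium"
  subnet_id     = "${aws_subnet.cluster_subnet.id}" # implicit dependency on the subnet
  depends_on    = ["aws_subnet.cluster_subnet"]     # explicit form, shown here only for illustration
}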
But let's continue with passing the master node's address to all worker nodes. For this, Terraform provides a mechanism for referencing previously created resources - that is, in the description of a worker you can simply extract the IP address of the master node:
# Cluster worker nodes
resource "aws_instance" "worker_nodes" {
  depends_on    = ["aws_instance.master_node"] # dependency on the master node introduced
  ami           = "${var.cluster_node_ami}"
  instance_type = "${var.cluster_node_type}"
  count         = "${var.cluster_size - 1}" # one node will be used for coordinator
  <...skipped...>

  # Running provisioning commands
  provisioner "remote-exec" {
    inline = [
      "sudo yum -y update",
      "sudo sh /tmp/bootstrap-script.sh ${aws_instance.master_node.private_ip}" # master-ip passed to script
    ]

    connection {
      type        = "ssh"
      user        = "centos"
      private_key = "${file("${var.cluster_keyfile}")}"
    }
  }
}
That is, using references of the form ${aws_instance.master_node.private_ip}, you can access almost any attribute of a resource. In this example we assume that bootstrap-script.sh accepts the master node's address as a parameter and uses it later for internal configuration.
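The same reference mechanism works for outputs, which is a convenient way to print, say, the master address once terraform apply finishes. A small sketch (the output name is arbitrary):
output "presto_master_private_ip" {
  value = "${aws_instance.master_node.private_ip}"
}
After apply, the value is shown in the command output and can also be read later with terraform output presto_master_private_ip.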
Sometimes such references are not enough: for example, you may need to run some scripts on the master node after the worker nodes have joined (accept keys, run init tasks on the workers, etc.). Terraform has a mechanism for this called null_resource - a fake resource which, using the dependency mechanism (see above), can be created only after all master and worker nodes exist. Here is an example of such a resource:
resource "null_resource" "cluster_provision" {
depends_on = [
"aws_instance.master_node",
"aws_instance.worker_nodes"
]
# Changes to any instance of the workers' cluster nodes or master node requires re-provisioning
triggers {
cluster_instance_ids = "${aws_instance.master_node.id},${join(",", aws_instance.worker_nodes.*.id)}"
}
# Bootstrap script can run only on the master node
connection {
host = "${aws_instance.master_node.private_ip}"
type = "ssh"
user = "centos"
private_key = "${file("${var.cluster_keyfile}")}"
}
provisioner "remote-exec" {
inline = [
<... some after-provision scripts calls on master node...>
]
}
}
A small explanation:
1. depends_on - a list of the resources that must be ready beforehand.
2. triggers - we build a string (in our case, the ids of all instances separated by commas); any change to it will cause all the provisioners specified in this resource to run again.
3. connection - specifies on which instance the provisioning scripts of this resource should be executed.
If you need to run several steps on different servers, create several null_resources with the necessary dependencies between them, as in the sketch below.
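A minimal, hypothetical sketch of chaining two such steps (the resource names and the inline commands are placeholders, not part of the original configuration):
# Step 1: runs on the master node first
resource "null_resource" "master_post_setup" {
  depends_on = ["aws_instance.master_node"]

  connection {
    host        = "${aws_instance.master_node.private_ip}"
    type        = "ssh"
    user        = "centos"
    private_key = "${file("${var.cluster_keyfile}")}"
  }

  provisioner "remote-exec" {
    inline = [ "echo 'master-side step'" ]
  }
}

# Step 2: runs on the first worker only after step 1 has finished
resource "null_resource" "worker_post_setup" {
  depends_on = ["null_resource.master_post_setup"]

  connection {
    host        = "${aws_instance.worker_nodes.0.private_ip}"
    type        = "ssh"
    user        = "centos"
    private_key = "${file("${var.cluster_keyfile}")}"
  }

  provisioner "remote-exec" {
    inline = [ "echo 'worker-side step'" ]
  }
}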
In general, what has been described is enough to build fairly complex infrastructures with Terraform. For larger setups, the configuration can also be split into the modules mentioned earlier; a small sketch follows.
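A minimal, hypothetical sketch of wrapping the cluster description in a module (the module path and variable names are assumptions, not part of the original configuration):
# Somewhere in the root configuration
module "presto_cluster" {
  source        = "./modules/presto-cluster" # hypothetical local directory containing the cluster .tf files
  cluster_name  = "analytics-presto"
  cluster_size  = 5
  cluster_owner = "team@example.com"
}
After adding or changing a module, terraform get (or terraform init in newer versions) has to be run before plan.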
Here are some more important tips for those who like to learn from other people's mistakes:
1. Do not forget to store the .tfstate files carefully - Terraform keeps the last known state of the created infrastructure in them (besides, it is a JSON file that can serve as an exhaustive source of information about the created resources). One option is to keep the state in a remote backend, as in the sketch after this list.
2. Do not change resources created by Terraform manually (via the management console, the services themselves or other external tools) - on the next plan & apply, any resource that no longer matches its description will be recreated, which can be very unexpected and often deplorable.
3. Test your configurations first on small instances / small numbers of them - many errors are very hard to catch while writing a configuration, and the validator built into Terraform only reports syntax errors (and not even all of them).
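One way to keep .tfstate safe (and shared within a team) is a remote state backend. A minimal sketch, assuming Terraform 0.9+ and a pre-created S3 bucket whose name here is hypothetical:
terraform {
  backend "s3" {
    bucket = "my-terraform-state"               # hypothetical, pre-created S3 bucket
    key    = "presto-cluster/terraform.tfstate" # path to the state object inside the bucket
    region = "us-east-1"
  }
}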
In the second part, we will continue preparing the pressure cooker for work - we will describe how to deploy a SaltStack master + minions on top of this infrastructure in order to install PrestoDB.