Own cloud hosting in 5 minutes. Part 2: Service Discovery


    Hi Habr! In the previous article, I talked about how to build your own cloud hosting in 5 minutes using Ansible, Docker and Docker Swarm. In this part, I'll talk about how services running in the cloud find each other, how the load is balanced between them, and how their fault tolerance is ensured.

    This is an introductory article focused on reviewing the tools that will solve the problem of service discovery in our cloud. In the next part we will get down to practice, so I decided to give you time to get better acquainted with them first.




    Problem


    Let's start with the most common problem and its typical solution: we have a web application, and we must ensure load balancing and fault tolerance for it.

    We can run several copies of our web application and have Supervisor watch over them. Supervisor will restart our web application if any errors occur and will also log such events. The load balancing problem is solved by installing Nginx in front of them. The Nginx configuration will look something like this:

    upstream app {
        server 192.168.1.2:8080 max_fails=3 fail_timeout=5s;
        server 192.168.1.2:8081 max_fails=3 fail_timeout=5s;
        server 192.168.1.2:8082 max_fails=3 fail_timeout=5s;
    }
    server {
        location / {
            proxy_pass http://app;
            health_check;
        }
    }
    

    The configuration above works like this: if the number of failed attempts to reach one of the web applications reaches 3 within 5 seconds, that copy is marked as down for 5 seconds (if it crashed with an error, Supervisor will restart it). The whole load is thus evenly distributed only across the working copies of the applications.
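    For completeness, here is a minimal sketch of the Supervisor side of this setup. The binary path /usr/local/bin/app and its --port flag are hypothetical; substitute whatever your application actually uses:

    [program:app]
    ; run three copies on ports 8080-8082, matching the upstream block above
    command=/usr/local/bin/app --port=%(process_num)d
    process_name=%(program_name)s_%(process_num)d
    numprocs=3
    numprocs_start=8080
    autostart=true
    autorestart=true
    ; crashes and restarts end up in this log
    stderr_logfile=/var/log/app_%(process_num)d.err.log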

    Disadvantages


    This is actually a good setup, and if you have few applications and a more or less uniform load, it may be better to stick with it.

    But we are building a cloud, and we do not know in advance which applications will run on it or how many copies of each. The load on different sites / web applications may vary in different ways, so it should be possible to change the number of running copies of an application depending on the situation. In other words, we cannot configure Nginx / Apache / etc. for this in advance.

    It would be great if Nginx and our other services adapted to the dynamic nature of our cloud. This is exactly the problem we will tackle in this article.

    Requirements


    We need a place where our services can register themselves and get information about each other. Docker Swarm, which we started using in the previous article, works out of the box with etcd, Consul and Zookeeper (a quick sketch of pointing Swarm at one of them follows these requirements).

    We need our services to be registered and deregistered in the systems above automatically (we are not going to teach every application to do this). For this we will use Registrator (covered in more detail below), which works out of the box with Consul, etcd and SkyDNS 2 (Zookeeper support is planned).

    Our services should be able to find each other via DNS queries. Consul and SkyDNS 2 (which works in tandem with etcd) can both handle this.

    We also need health monitoring for our services. Consul (which we will use) provides it out of the box, and Registrator supports it (it passes on to Consul the information on how a particular service should be checked).

    Last but not least, we need something that automatically reconfigures our components. If we launch 10 copies of one web application and 20 copies of another, it should notice and react immediately (by changing the configuration of Nginx, for example). This role will be played by Consul Template (covered in more detail below).
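    As for the first requirement, here is roughly how the standalone Docker Swarm from the previous article is pointed at Consul via a discovery URL (addresses and the /swarm path are illustrative; etcd:// and zk:// URLs work the same way):

    # on each Docker node: advertise this node in the Consul-backed cluster
    $ swarm join --advertise=192.168.1.2:2375 consul://192.168.1.10:8500/swarm

    # on the manager: read the list of nodes from the same backend
    $ swarm manage -H tcp://0.0.0.0:3375 consul://192.168.1.10:8500/swarm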

    Note
    As you can see, there are different solutions to our problem. Before writing this article, I had been running the configuration described here for a little over a month and had not encountered any problems.

    Consul



    Of the options listed above (Consul, Zookeeper, etcd), Consul is the most self-sufficient project, one that can solve our service discovery problem out of the box.

    Even though Consul, Zookeeper and etcd are mentioned here in the same row, I would not compare them with each other. All three projects implement a distributed key/value store, and that is where their similarities end.

    Consul provides us with a DNS server, which Zookeeper and etcd lack (for etcd one can be added with SkyDNS 2). Moreover, Consul gives us health monitoring (which neither etcd nor Zookeeper can boast of), and both are needed for full-fledged Service Discovery.

    On top of that, with Consul we get a Web UI (a demo of which you can look at right now) and high-quality official documentation.
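    To give you a taste of the DNS interface: with an agent running on default settings (DNS on port 8600, names under the .consul domain), looking up a service is an ordinary DNS query, and the SRV records carry port numbers as well as addresses:

    # A records: addresses of healthy instances of the "http" service
    $ dig @127.0.0.1 -p 8600 http.service.consul +short

    # SRV records: addresses plus port numbers
    $ dig @127.0.0.1 -p 8600 http.service.consul SRV +short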

    Note
    Even if you plan to use the same configuration that I describe, and using Zookeeper and SkyDNS 2 is not part of your plans, I would still recommend getting acquainted with those projects.

    Registrator



    Registrator receives information from Docker about container starts and stops (via a socket connection, using the Docker API) and adds / removes the corresponding services in Consul. Registrator automatically derives the information about a service from the published ports and the environment variables of the Docker container. In other words, it works with any containers you have and requires additional configuration only if you need to override the automatically derived parameters. And since all our services run exclusively in Docker containers (including Registrator itself), Consul will always know about all the running services of our cloud.
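    For reference, launching Registrator itself is just one more container. A sketch, assuming the gliderlabs/registrator image and a Consul agent reachable on the same host:

    $ docker run -d --name registrator \
        -v /var/run/docker.sock:/tmp/docker.sock \
        gliderlabs/registrator consul://localhost:8500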

    All of this is cool, of course, but what is even cooler is that Registrator can tell Consul how to check the health of our services. It does so through the same environment variables.

    Note
    Consul can check the health of services when the Consul Service Catalog is used to store information about them (which is what we use).

    If the Consul Key-Value Store is used instead (it is also supported by Registrator and is used, for example, by Docker Swarm to store information about Docker nodes), there is no such feature.

    Let's look at an example:

    $ docker run -d --name nginx.0 -p 4443:443 -p 8000:80 \
        -e "SERVICE_443_NAME=https" \
        -e "SERVICE_443_CHECK_SCRIPT=curl --silent --fail https://our-https-site.com" \
        -e "SERVICE_443_CHECK_INTERVAL=5s" \
        -e "SERVICE_80_NAME=http" \
        -e "SERVICE_80_CHECK_HTTP=/health/endpoint/path" \
        -e "SERVICE_80_CHECK_INTERVAL=15s" \
        -e "SERVICE_80_CHECK_TIMEOUT=3s" \
        -e "SERVICE_TAGS=www" nginx
    

    After such a launch, the list of our services in Consul will look like this:

    {
      "services": [
        {
          "id": "hostname:nginx.0:443",
          "name": "https",
          "tags": [
            "www"
          ],
          "address": "192.168.1.102",
          "port": 4443,
          "checks": [
            {
              "script" : "curl --silent --fail https://our-https-site.com",
              "interval": "5s"
            }
          ]
        },
        {
          "id": "hostname:nginx.0:80",
          "name": "http",
          "tags": [
            "www"
          ],
          "address": "192.168.1.102",
          "port": 8000,
          "checks": [
            {
              "http": "/health/endpoint/path",
              "interval": "15s",
              "timeout": "3s"
            }
          ]
        },
        ...
      ]
    }
    

    As you can see, based on the published ports, Registrator concluded that two services (http and https) should be registered. Moreover, Consul now has all the information it needs to check the health of these services.

    In the first case, the command "curl --silent --fail https://our-https-site.com" will be executed every 5 seconds, and the result of the check will depend on its exit code (for Consul script checks, exit code 0 means passing, 1 means warning, anything else means failing).

    In the second case, Consul will poll the URL we passed every 15 seconds. If the response code is 2xx, our service is "healthy"; if it is 429 Too Many Requests, it is in a "warning" state; any other response means the service is considered dead.

    You can find more examples and more detailed information in the official documentation.
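    By the way, you do not need the Web UI to verify what Registrator has registered: Consul's HTTP API returns the same data. A sketch, assuming an agent on localhost:8500:

    # all registered instances of the "http" service
    $ curl http://localhost:8500/v1/catalog/service/http

    # only the instances whose health checks are passing
    $ curl http://localhost:8500/v1/health/service/http?passing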

    Consul Template


    We have decided where the information about all the services of our cloud will be stored, how it gets there and how it is kept up to date automatically. But we have not yet figured out how we will read that information back and pass it on to our services. This is exactly what Consul Template will do.

    To do this, we take the configuration file of the application we want to configure and turn it into a template, following the rules of Consul Template's templating language (it is based on Go templates).

    Let's look at a simple example with an Nginx configuration file:

    upstream app {
        least_conn;
        # list of all healthy services
        {{range service "tag1.cool-app" "passing"}}server {{.Address}}:{{.Port}} max_fails=3 fail_timeout=60s weight=1;
        {{else}}server 127.0.0.1:65535; # force a 502{{end}}
    }
    ...
    

    Once we tell Consul Template where the template is located, where to put the result, and which command to execute when it changes (it can do that too; in our case, reload Nginx), the magic begins. Consul Template will fetch the addresses and port numbers of all copies of the "cool-app" application that are tagged "tag1" and are in a "healthy" state, and insert them into the configuration file. If there are no such copies, then, as you might have guessed, whatever follows {{else}} remains instead.
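    In practice, "explaining" all of that is a single command-line flag. A sketch of the launch, assuming the agent is reachable on localhost:8500 and the template lives at /etc/consul-templates/app.ctmpl (paths are illustrative):

    $ consul-template \
        -consul localhost:8500 \
        -template "/etc/consul-templates/app.ctmpl:/etc/nginx/conf.d/app.conf:nginx -s reload"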

    Every time a copy of the "cool-app" service with the "tag1" tag is added or removed, the configuration file is rewritten and Nginx is reloaded. All this happens automatically and requires no intervention: we just launch the necessary number of copies of our application and don't worry about a thing.
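    For example, with three healthy copies of "cool-app" registered, the rendered file would look roughly like this (addresses are illustrative):

    upstream app {
        least_conn;
        # list of all healthy services
        server 192.168.1.2:8080 max_fails=3 fail_timeout=60s weight=1;
        server 192.168.1.3:8080 max_fails=3 fail_timeout=60s weight=1;
        server 192.168.1.4:8080 max_fails=3 fail_timeout=60s weight=1;
    }
    ...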

    You can find more examples in the official documentation.

    Conclusion


    Today there are plenty of tools for solving the service discovery problem, but not many of them solve it out of the box and immediately provide us with everything we need.

    In the next part, I will publish a set of Ansible scripts that will configure all of the tools described above for us, and then we can get down to practice.

    That's all. Thank you all for your attention. Stable clouds to you and good luck!

    Follow me on Twitter, where I talk about working at a startup, my mistakes and right decisions, about Python and everything related to web development.

    P.S. I'm looking for developers to join the company; the details are in my profile.
