A New L4 Load Balancer with Native SRV Record and Docker API Service Discovery
How it all began
While working with microservices, we repeatedly ran into service discovery problems during autoscaling, when surplus nodes were added and then scaled down.
We tried almost every solution that existed then or exists now, but, as usual, none of them fit our dynamic environment well (dozens of stops and starts of identical containers per hour). The closest was NGINX + Consul + Consul Template, but it was clunky, required restarts, and made it impossible to use external healthchecks other than through Consul.
So, as often happens, we decided to write our own solution. The discussion surfaced dozens of features worth implementing; from these we picked the ones most critical for us and most likely to interest others.
Functionality included in the first iteration
Types of balancing:
Iphash
Leastconn
Roundrobin
Weight
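The four balancing types differ only in how one backend is chosen from the pool. The Go sketch below shows one straightforward way to implement each; the Backend type and all function names are hypothetical and not taken from gobetween's source, and treating "weight" as weighted random selection is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Backend is a hypothetical backend record; the field names are
// illustrative and not taken from gobetween's source.
type Backend struct {
	Addr   string
	Weight int // relative weight, used by weighted
	Conns  int // currently open connections, used by leastConn
}

// All helpers below assume a non-empty pool.

// ipHash maps a client IP to a backend, so the same client keeps
// reaching the same backend as long as the pool does not change.
func ipHash(pool []Backend, clientIP string) Backend {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return pool[h.Sum32()%uint32(len(pool))]
}

// leastConn picks the backend with the fewest open connections;
// on a tie the earlier backend in the pool wins.
func leastConn(pool []Backend) Backend {
	best := pool[0]
	for _, b := range pool[1:] {
		if b.Conns < best.Conns {
			best = b
		}
	}
	return best
}

// roundRobin returns backends in turn; *next is a caller-owned cursor.
func roundRobin(pool []Backend, next *int) Backend {
	b := pool[*next%len(pool)]
	*next++
	return b
}

// weighted picks a backend with probability proportional to its weight,
// given r drawn uniformly from [0, total weight), e.g. rand.Intn(total).
// An out-of-range r falls through to the last backend.
func weighted(pool []Backend, r int) Backend {
	for _, b := range pool {
		if r < b.Weight {
			return b
		}
		r -= b.Weight
	}
	return pool[len(pool)-1]
}

func main() {
	pool := []Backend{
		{Addr: "10.0.0.1:80", Weight: 3, Conns: 5},
		{Addr: "10.0.0.2:80", Weight: 1, Conns: 2},
	}
	cursor := 0
	fmt.Println(roundRobin(pool, &cursor).Addr, roundRobin(pool, &cursor).Addr)
	fmt.Println(leastConn(pool).Addr)
	fmt.Println(weighted(pool, 2).Addr, weighted(pool, 3).Addr)
	fmt.Println(ipHash(pool, "192.168.1.10").Addr)
}
```

With weights 3 and 1, values of r from 0 to 2 select the first backend and 3 selects the second, which gives a 3:1 split when r is uniform.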
Discovery (how the backend pool for each frontend is determined):
Static - just a list of servers in the config.
Docker - queries the Docker / Swarm API, filtering containers by label and by internal (private) container port.
Exec - periodically runs an external script and parses its stdout (for now the output format is hard-coded).
JSON - fetches a URL over HTTP and parses the response by patterns (nested, multi-level JSON is supported).
Plaintext - fetches a URL over HTTP and parses the response with the regular expression specified in the config.
SRV - queries a given DNS SRV record by service name.
Healthchecks
Since it was clear from the start that we could not match HAProxy's range of healthchecks, we decided to keep it simple and implement only two types:
Ping - a simple TCP ping;
Exec - runs an arbitrary binary with parameters and reads its output from stdout.
Usage examples
SRV Discovery
Suppose we have arbitrary services that register themselves in, for example, Consul. We will use Consul DNS to build the server pool.
In this example, we set the discovery kind to "srv" and specify the DNS server and port that will be queried for service discovery. The frequency of refreshing the server list can be set with interval, as in the Docker example below; this example omits it. Another important setting is the policy for the case when the DNS server does not respond. For maximum consistency with the environment, set failpolicy = "setempty": if DNS does not respond, the entire backend pool is cleared and incoming connections are dropped. Otherwise, use failpolicy = "keeplast", and the balancer will keep using the last data received before the DNS lookup failed.
```toml
[servers.sample2]
bind = "localhost:3001"
protocol = "tcp"
balance = "weight"
[servers.sample2.discovery]
failpolicy = "keeplast"
kind = "srv"
srv_lookup_server = "66.66.66.66:8600" # dns server and port
srv_lookup_pattern = "api.service.ireland.consul." # SRV service pattern
[servers.sample2.healthcheck]
fails = 1
passes = 1
interval = "2s"
kind = "ping"
timeout = "500ms"
```
Docker / Swarm Balancing
In terms of API and configuration, Docker and Docker Swarm are handled identically: the balancer works the same way with a single Docker host and with a Swarm cluster, so one example covers both.
Here we balance services running in Docker Swarm, as the more general case; everything below also works for a standalone Docker host. We set the discovery kind to "docker" and specify the Docker API base URL, the container label, and the container's internal port (as seen from inside the container). The balancer selects containers by these criteria and builds the backend pool from them.
For this example we use a more "advanced" healthcheck type: exec. Besides the usual scheduling parameters, it also has a script execution timeout. The interval between runs should be longer than the script execution time so that runs do not overlap. The healthcheck command is invoked as /path/to/script [ip] [port]. When the script finishes, it must print a string to stdout, which is compared with the expected positive and negative outputs.
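The Docker discovery rule (keep containers with a given label, then take the public port mapped to a given private port) can be sketched in Go over a GET /containers/json response from the Docker Engine API. The function name and the "key=value" label format handling are assumptions for illustration; the Engine API can also filter by label server-side through its filters query parameter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// container mirrors the subset of the Docker Engine API
// GET /containers/json response that this sketch needs.
type container struct {
	Labels map[string]string
	Ports  []struct {
		IP          string
		PrivatePort int
		PublicPort  int
		Type        string
	}
}

// backendsFor (hypothetical) returns "ip:publicPort" for each container
// that carries label "key=value" and publishes privatePort over TCP.
// Note: an IP of 0.0.0.0 means "all host interfaces"; a real
// implementation would substitute the host's address.
func backendsFor(body []byte, label string, privatePort int) ([]string, error) {
	var list []container
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, err
	}
	key, value, _ := strings.Cut(label, "=") // no "=" means an empty value
	var out []string
	for _, c := range list {
		if v, ok := c.Labels[key]; !ok || v != value {
			continue // label missing or different
		}
		for _, p := range c.Ports {
			if p.PrivatePort == privatePort && p.PublicPort != 0 && p.Type == "tcp" {
				out = append(out, fmt.Sprintf("%s:%d", p.IP, p.PublicPort))
			}
		}
	}
	return out, nil
}

func main() {
	// A trimmed-down sample of a GET /containers/json response.
	body := []byte(`[
  {"Labels": {"api": "true"},  "Ports": [{"IP": "10.0.0.5", "PrivatePort": 80, "PublicPort": 32768, "Type": "tcp"}]},
  {"Labels": {"api": "false"}, "Ports": [{"IP": "10.0.0.6", "PrivatePort": 80, "PublicPort": 32769, "Type": "tcp"}]}
]`)
	backends, err := backendsFor(body, "api=true", 80)
	if err != nil {
		panic(err)
	}
	fmt.Println(backends)
}
```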
```toml
[servers.sample3]
bind = "localhost:3002"
protocol = "tcp"
balance = "weight"
[servers.sample3.discovery]
interval = "10s"
timeout = "2s"
kind = "docker"
docker_endpoint = "http://localhost:2377" # Docker / Swarm API
docker_container_label = "api=true" # label to filter containers
docker_container_private_port = 80 # gobetween uses the public port mapped to this private port
[servers.sample3.healthcheck]
kind = "exec"
interval = "2s"
exec_command = "/etc/gobetween/checks/exec_healthcheck.sh"
exec_expected_positive_output = "1"
exec_expected_negative_output = "0"
exec_timeout_duration = "1s"
```
In future articles I plan to show more complex uses of the other discovery types and to describe the specifics of configuring and installing the balancer on Windows.