Load balancing: basic algorithms and methods


    Load distribution should be planned at an early stage in the development of any web project. A server crash (and it always happens unexpectedly, at the most inopportune moment) is fraught with serious consequences, both reputational and financial. At first, problems of insufficient server performance under growing workloads can be solved by increasing server capacity or by optimizing the algorithms, program code, and so on. But sooner or later there comes a time when these measures are no longer enough.

    We have to resort to clustering: several servers are combined into a cluster, and the load is distributed between them using a set of special techniques called balancing. Besides handling high workloads, clustering also provides redundancy: if one server fails, the others take over.
    The efficiency of clustering depends directly on how the load is distributed (balanced) between the elements of the cluster.

    Load balancing can be implemented with both hardware and software tools. In this article we would like to describe the main balancing methods and algorithms.

    Balancing levels


    The balancing procedure is carried out using a whole range of algorithms and methods corresponding to the following levels of the OSI model:

    • network;
    • transport;
    • application.


    Consider these levels in more detail.

    Network Balancing


    Balancing at the network level solves the following problem: several different physical machines must respond to requests addressed to a single server IP address. Such balancing can be accomplished using a variety of methods.

    • DNS balancing. Multiple IP addresses are allocated per domain name. The server to which the client request will be directed is usually determined using the Round Robin algorithm (we will describe the balancing methods and algorithms in detail below).
    • Building an NLB cluster. When using this method, the servers are combined in a cluster consisting of input and computing nodes. The load distribution is carried out using a special algorithm. Used in solutions from Microsoft.
    • IP balancing using an additional router.
    • Territorial balancing is carried out by placing identical services with identical addresses in geographically different regions of the Internet (this is how Anycast DNS technology works, which we have already written about). Territorial balancing is also used in many CDNs.


    Transport Level Balancing


    This type of balancing is the simplest: the client contacts the balancer, which redirects the request to one of the servers for processing. The server can be chosen according to a variety of algorithms (discussed below): by simple round-robin, by selecting the least loaded server from the pool, and so on.

    Sometimes balancing at the transport level is hard to distinguish from balancing at the network level. Consider the following rule for the pf packet filter in BSD systems:

    web_servers = "{ 10.0.0.10, 10.0.0.11, 10.0.0.13 }"
    match in on $ext_if proto tcp to port 80 rdr-to $web_servers round-robin sticky-address
    


    Formally, it deals with balancing traffic on a specific TCP port.

    Now consider another example:

    pass in on $int_if from $lan_net \
       route-to { ($ext_if1 $ext_gw1), ($ext_if2 $ext_gw2) } \
       round-robin
    


    This rule balances outgoing traffic at the network level: it specifies neither a particular port nor a particular protocol.

    The difference between the balancing levels can be explained as follows. Network-level solutions do not terminate user sessions: they simply redirect traffic and do not work in proxy mode.
    At the network level, the balancer merely decides which server to forward packets to; the session itself is established between the client and the server.

    At the transport level, communication with the client is terminated at the balancer, which works as a proxy. It interacts with the servers on its own behalf, passing information about the client in additional data and headers. This is how the popular software balancer HAProxy works, for example.

    Application Level Balancing


    When balancing at the application level, the balancer works in "smart proxy" mode: it analyzes client requests and redirects them to different servers depending on the nature of the requested content. This is how the Nginx web server works, for example, distributing requests between the frontend and backend. In Nginx, balancing is handled by the upstream module; its various balancing algorithms are described in the official documentation.

    Another example of an application-level balancing tool is pgpool, an intermediate layer between the client and a PostgreSQL database server. It can distribute queries across database servers depending on their content: for example, read queries can be sent to one server and write queries to another.

    Algorithms and balancing methods


    There are many different load balancing algorithms and methods. When choosing a specific algorithm, you should proceed, first, from the specifics of the project and, second, from the goals you plan to achieve.

    Among the goals for which balancing is used, the following should be highlighted:

    • fairness: every request must be guaranteed system resources for its processing; situations where one request is being processed while all the others wait in line must be prevented;
    • efficiency: all servers processing requests should be fully busy; ideally no server should sit idle waiting for requests (in real practice, this goal is not always achieved);
    • reduced request execution time: the time between the start of processing a request (or its placement in the queue) and its completion must be minimized;
    • reduced response time: the response time to a user request must be minimized.


    It is also highly desirable that the balancing algorithm has the following properties:

    • predictability: it should be clear in what situations and under what loads the algorithm will be effective;
    • uniform loading of system resources;
    • scalability: the algorithm should remain operational as the load grows.


    Round robin


    Round Robin, or the round-robin algorithm, works in a cycle: the first request is sent to one server, the next to another, and so on until the last server in the pool is reached, after which the cycle starts over.
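    As a minimal sketch, the cycle can be expressed in a few lines of Python (the server names here are invented placeholders):

```python
from itertools import cycle

# Hypothetical pool of backend servers (the names are placeholders).
servers = ["srv-a", "srv-b", "srv-c"]

# cycle() repeats the list endlessly, which is exactly the round-robin
# behaviour: 1st request -> srv-a, 2nd -> srv-b, 3rd -> srv-c, then over again.
rotation = cycle(servers)

def next_server():
    """Return the server that should handle the next request."""
    return next(rotation)

# Six consecutive requests walk the pool twice.
assigned = [next_server() for _ in range(6)]
```

    Each server ends up with the same number of requests, which is both the strength of the approach and, as discussed below, its main weakness.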

    The most common implementation of this algorithm is, of course, the Round Robin DNS balancing method. As you know, any DNS server stores a pair of "host name - IP address" for each machine in a particular domain. This list may look like this:

    example.com xxx.xxx.xxx.2
    www.example.com xxx.xxx.xxx.3
    


    You can associate several IP addresses with each name in the list:

    example.com xxx.xxx.xxx.2
    www.example.com xxx.xxx.xxx.3
    www.example.com xxx.xxx.xxx.4
    www.example.com xxx.xxx.xxx.5
    www.example.com xxx.xxx.xxx.6
    


    The DNS server goes through the records in the table and returns the next IP address for each new request: for example, xxx.xxx.xxx.2 for the first request, xxx.xxx.xxx.3 for the second, and so on. As a result, all servers in the cluster receive the same number of requests.

    Among the undoubted advantages of this algorithm is, first of all, its independence from the high-level protocol: Round Robin works with any protocol in which the server is accessed by name.
    Round Robin balancing also does not depend in any way on server load: caching DNS servers will help cope with any influx of clients.

    Using the Round Robin algorithm does not require communication between servers, so it can be used for both local and global balancing.
    Finally, solutions based on the Round Robin algorithm are notable for their low cost: to get them working, just add a few records to the DNS.

    The Round Robin algorithm also has a number of significant drawbacks. For load distribution under this algorithm to meet the criteria of fairness and efficiency mentioned above, every server must have the same set of resources, and every operation must consume the same amount of them. In real practice, these conditions are in most cases impossible to meet.

    Also, Round Robin balancing completely ignores the current load of each server in the cluster. Imagine the following hypothetical situation: one of the nodes is 100% loaded, while the others are only 10-15% loaded. The Round Robin algorithm does not account for such a situation at all, so the overloaded node will still receive requests. In this case, there can be no talk of fairness, efficiency, or predictability.

    Due to the circumstances described above, the scope of the Round Robin algorithm is very limited.

    Weighted Round Robin


    This is an improved version of the Round Robin algorithm: each server is assigned a weight coefficient according to its performance and capacity. This helps distribute the load more flexibly: servers with a greater weight process more requests. However, this does not solve all fault-tolerance problems. More effective balancing is provided by other methods, which take a larger number of parameters into account when planning and distributing the load.
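    A naive way to sketch weighted round robin in Python is to repeat each server in the rotation proportionally to its weight (the servers and weights below are invented for illustration):

```python
from itertools import cycle

# Hypothetical weights: srv-big is rated twice as powerful as srv-small.
weights = {"srv-big": 2, "srv-small": 1}

# Expand the pool so each server appears as many times as its weight,
# then rotate over the expanded list as in plain round robin.
expanded = [name for name, w in weights.items() for _ in range(w)]
rotation = cycle(expanded)

# Over six requests, srv-big receives twice as many as srv-small.
assigned = [next(rotation) for _ in range(6)]
```

    Real balancers usually implement a "smooth" variant that interleaves servers instead of sending consecutive requests to the same one, but the resulting proportions are the same.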

    Least connections


    In the previous section we listed the main disadvantages of the Round Robin algorithm. Here is one more: it does not take into account the number of currently active connections.

    Consider a practical example. There are two servers, which we will designate A and B. Fewer users are connected to server A than to server B, yet server A is more heavily loaded. How is this possible? The answer is simple: connections to server A are kept open longer than connections to server B.

    The described problem can be solved by the algorithm known as least connections (leastconn for short). It takes into account the number of connections currently maintained by each server: each new request is passed to the server with the fewest active connections.

    There is an improved version of this algorithm, intended primarily for clusters of servers with different technical characteristics and performance. It is called Weighted Least Connections and takes into account not only the number of active connections, but also the servers' weight coefficients.
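    Both variants can be sketched in Python; the connection counts and weights below are an invented snapshot:

```python
# Hypothetical snapshot of active connections per server.
active = {"srv-a": 12, "srv-b": 7, "srv-c": 9}

# Hypothetical weights: srv-a is rated four times as powerful.
weights = {"srv-a": 4, "srv-b": 1, "srv-c": 1}

def least_conn(conns):
    """Plain least connections: pick the server with the fewest
    active connections."""
    return min(conns, key=conns.get)

def weighted_least_conn(conns, w):
    """Weighted variant: pick the server with the smallest
    connections-to-weight ratio."""
    return min(conns, key=lambda s: conns[s] / w[s])
```

    With these numbers, plain leastconn chooses srv-b (7 connections), while the weighted variant prefers srv-a: its 12 connections against a weight of 4 give the best ratio.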

    Other improvements to the Least Connections algorithm include Locality-Based Least Connection Scheduling and Locality-Based Least Connection Scheduling with Replication.

    The first method was created specifically for caching proxy servers. Its essence is as follows: each server is assigned a group of client IPs. Requests from those IPs are sent to their "native" server, provided it is not fully loaded; otherwise the request is redirected to another server (one that is less than half loaded).
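    A simplified sketch of this scheme in Python (the address prefixes, servers, and load figures are all invented; real implementations are considerably more involved):

```python
# Hypothetical mapping of client address prefixes to "native" servers.
native = {"10.0.1.": "srv-a", "10.0.2.": "srv-b"}

# Hypothetical load per server, as a fraction of capacity in use.
load = {"srv-a": 1.0, "srv-b": 0.40, "srv-c": 0.30}

def pick_server(client_ip):
    """Send the request to the client's native server unless it is
    fully loaded; otherwise fall back to the least loaded server
    that is under half load."""
    prefix = client_ip.rsplit(".", 1)[0] + "."
    home = native.get(prefix)
    if home is not None and load[home] < 1.0:
        return home
    candidates = [s for s in load if load[s] < 0.5]
    return min(candidates, key=load.get) if candidates else home
```

    Here a client from 10.0.2.x still reaches its native srv-b, while a client from 10.0.1.x is redirected because srv-a is fully loaded.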

    In the Locality-Based Least Connection Scheduling with Replication algorithm, each IP address or group of IP addresses is assigned not to an individual server but to a whole group of servers. A request is passed to the least loaded server in the group. If all servers in the "native" group are overloaded, a new server is reserved and added to the group serving the IPs from which the request came. The busiest server in that group is then removed, which avoids excessive replication.

    Destination Hash Scheduling and Source Hash Scheduling


    The Destination Hash Scheduling algorithm was created for clusters of caching proxies, but is often used in other cases as well. In this algorithm, the server that will process a request is selected from a static table by the destination IP address.

    The Source Hash Scheduling algorithm is based on the same principle, except that the server is selected from the table by the source IP address.
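    Both hash-scheduling variants boil down to the same operation: a stable hash of an IP address indexes into the server table. A sketch in Python (the pool is made up):

```python
import hashlib

# Hypothetical server pool; the table the hash indexes into.
servers = ["srv-a", "srv-b", "srv-c"]

def hash_schedule(ip):
    """Map an IP address to a server via a stable hash.  Applied to the
    source address this is Source Hash Scheduling; applied to the
    destination address, Destination Hash Scheduling."""
    # md5 is used only as a stable, well-spread hash, not for security.
    digest = hashlib.md5(ip.encode("ascii")).digest()
    return servers[digest[0] % len(servers)]
```

    The point is determinism: as long as the table does not change, the same address always lands on the same server, which is what makes the method attractive for caching proxies.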

    Sticky sessions


    Sticky Sessions is a method of distributing incoming requests in which connections from the same client are always passed to the same server in the group. It is used, for example, in the Nginx web server, where user sessions can be pinned to a specific server using the IP hash method (see the official documentation for details). With this method, requests are distributed across servers based on the client's IP address. As the documentation states, "the method ensures that requests from the same client will be passed to the same server." If the server assigned to an address is unavailable, the request is redirected to another server. An example configuration fragment:

    upstream backend {
        ip_hash;
        server backend1.example.com;
        server backend2.example.com;
        server backend3.example.com;
        server backend4.example.com;
    }
    


    Starting with Nginx version 1.2.2, a weight can be specified for each server.

    This method is fraught with certain problems. Session binding can break if the client uses a dynamic IP address, and when a large number of requests arrive through a single proxy server, balancing can hardly be called effective or fair. These problems, however, can be solved with cookies. The commercial version of Nginx has a special sticky module that uses cookies for balancing; free analogues exist as well, for example nginx-sticky-module.
    The sticky-sessions method can also be used in HAProxy - see its documentation for details.

    Conclusion


    This article is essentially an introduction to load balancing; we will continue discussing the topic in future publications. If you have questions, comments, or additions, welcome to the comments. We would also be grateful if you could share non-trivial practical examples of organizing load balancing for various projects.

    Readers who for one reason or another cannot post comments here are welcome to join our blog.
