nitro2005 November 5, 2014 at 12:41

ZeroMQ: sockets in a new way


In any medium or large application, whether desktop or web, for business or personal use, the programmer has to solve an important architectural problem: how the threads, processes, modules, nodes, clusters, and other parts of the application's ecosystem will communicate with each other.

Many developers take the path of least resistance and delegate this task to, for example, a DBMS: one process writes data to the database, a second one reads it, processes it, writes the result back, and so on.
It is embarrassing to mention exchanging data through files these days, but that happens too.
Other programmers try to build their own specialized solution and, as a rule, choose sockets.

The task of designing and developing the application architecture is extremely interesting, but this is a separate topic. In this post I would like to share my first impression of getting to know the ZeroMQ library.

ZeroMQ offers the developer a higher level of abstraction for working with "sockets". The library takes over part of the work of buffering data, queuing, establishing and restoring connections, and so on. Instead of dealing with this routine, you can focus on the main thing - the architecture and logic of the application.

However, free cheese is found only in a mousetrap. So I tried, to the best of my ability and experience, to find out what this convenience costs and which pros and cons show up when using the library.

A full description of ZeroMQ, its API, and a lot of other useful information can be found on the official ZeroMQ website.

In addition, I highly recommend reading the entire Guide on the official website, even if you do not use the library - it is full of sound ideas and is generally useful for studying various types of network architectures.

We will solve a typical problem and compare the solution based on traditional sockets and “ZeroMQ sockets”.

The task

Suppose we have a service that accepts client connections on a socket, receives requests from each client, and sends back responses.
For simplicity, let it be an echo service: whatever it receives, it sends back.

Next, we need to decide on the exchange format.
A traditional socket works with a sequence of bytes, which is inconvenient for an application that exchanges structured information. So we have to define some kind of “packet” for the data; for simplicity, the packet has a single attribute - its length. That is, we first transmit the length of the packet and then the data itself, of exactly that length. On the receiving side we buffer the incoming bytes and parse them into “packets”.
Inside the “packet” itself we can put anything: a binary structure, text, JSON, BSON, XML, etc.
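
To make the framing scheme concrete, here is a minimal sketch in Python (the tests in this post are written in C; this is not that code). It assumes a 4-byte big-endian length header, which the post does not specify, and the names `encode_packet` and `PacketParser` are invented for the illustration.

```python
import struct

# Assumption: a 4-byte big-endian length header; the post does not fix the width.
HEADER = struct.Struct("!I")


def encode_packet(payload):
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


class PacketParser:
    """Accumulates raw bytes from a stream and cuts them into complete packets."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add freshly received bytes; return the packets completed by this chunk."""
        self._buffer += chunk
        packets = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the body has not fully arrived yet
            packets.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return packets
```

The parser can be fed whatever `recv()` happens to return, even a few bytes at a time: incomplete packets simply stay in the buffer until the rest arrives.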

For simplicity, our server will receive and transmit data in a single thread.
Data processing on the server, however, should run in several threads (we will call them workers).

Solution

As a solution, I wrote two source files, one with regular sockets, the other with ZeroMQ.
I will not publish the source code in the post itself; to view it, follow the links:
1) Traditional sockets (19 Kb)
2) ZeroMQ sockets (11.74 Kb)

More about tests

Each source file is a ready-made test that starts both the server and the clients (in the same process, but in different threads).
The test runs for several seconds and prints the results for each client: how many packets and bytes were received, and the average rate at which packets were received.
When a client thread starts, it sends one or more data packets, and every packet it receives is sent back again.
The test parameters can be changed; they are set with #define in each file.

As you can see, ZeroMQ roughly halved the amount of code, and readability improved.
Now let's see how much we paid for it.

On my machine, with the initial parameters, the test produced approximately the following results:

1) 400 packets per second (traditional sockets);
2) 500 packets per second (ZeroMQ).
* Note: by default the test uses 10 client threads and 2 workers, the packet size is 1 KB, and the server's processing time for one packet (simulated with usleep) is 2 ms.

I should say right away that if data processing ran in the same thread as reception and transmission, ZeroMQ would be 2-4 times slower than ordinary sockets. I checked this with a similar test, but I am not publishing it for now, because a single-threaded server that processes only one request at a time while the other clients wait is not our case.

Let's see why ZeroMQ performed better than regular sockets, despite some overhead due to the level of abstraction.

The main reason, of course, lies in the source code of the test itself. Processing data in multiple threads on top of ordinary sockets is a rather complicated task. In my test it was implemented in a far from optimal way:

1) there is no queue of tasks or of received packets: we simply do not accept data if we cannot process it;
2) when a worker has finished processing a request, it sleeps idly until the main thread writes the next task into its buffer;
3) when all workers are busy, the main thread spins idly through the main loop until a worker becomes free (or an I/O event occurs);
4) while a worker writes the result of a request into the client's send buffer, the main thread is blocked (or the worker waits until the main thread finishes an iteration of the main loop).

Eliminating these shortcomings would significantly increase the amount of code and the complexity of the task, and with them the likelihood of errors.

Now let's turn to the option with ZeroMQ.

The source code is more readable and, most importantly, free of any locks (mutexes, as in the version with ordinary sockets). This is the main advantage of ZeroMQ.

In traditional asynchronous programming locks are inevitable, and as the code grows you end up adding an unnecessary lock in one place and forgetting a necessary one in another. Then nested locks appear, which eventually lead to deadlocks and various race conditions. If the errors show up only in rare cases, you will have a painful time hunting them down in production. And the effect will be spectacular: your service hangs completely, unsaved data is lost, and clients are disconnected.

ZeroMQ solves this problem simply - processes and threads only exchange messages. This comes with a caveat: you should not share any common data between threads and protect it with locks. ZeroMQ removes the need to share sockets and their buffers between threads, but the application's own data remains the developer's headache.
Messaging can also take place between threads within a single process, and not necessarily over TCP. It is enough to pass something like “ipc://mysock” instead of “tcp://127.0.0.1:1010” to zmq_bind / zmq_connect and the exchange already runs over UNIX sockets; pass “inproc://mysock” and the exchange goes through the process's own memory, which is much faster and cheaper than sockets.
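
A small sketch of how little changes when the transport is swapped. It uses Python with the pyzmq binding rather than the C API used in the tests, and the helper name `echo_once` is invented for the illustration; only the endpoint string differs between transports.

```python
import zmq


def echo_once(endpoint, payload):
    """Bind a REP socket and connect a REQ socket on `endpoint`, then echo one message."""
    ctx = zmq.Context()
    server = ctx.socket(zmq.REP)
    client = ctx.socket(zmq.REQ)
    try:
        server.bind(endpoint)
        client.connect(endpoint)
        client.send(payload)        # handed to ZeroMQ's queue, does not wait for delivery
        server.send(server.recv())  # the echo service: send back what was received
        return client.recv()
    finally:
        client.close(linger=0)
        server.close(linger=0)
        ctx.term()
```

Calling `echo_once("inproc://mysock", b"ping")` exchanges the message inside the process. Passing `"ipc://mysock"` (on UNIX-like systems) or a TCP endpoint such as `"tcp://127.0.0.1:5555"` (the port here is an arbitrary choice) should work with the same code.
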
Take the test source as an example.
A thread that processes data (a worker) is just another client, only an internal one. It connects to the main thread through the specified socket (most efficiently via inproc://), receives a task, and then sends the result back to the main thread. The main thread, in turn, forwards the result to the external client.
ZeroMQ lets you stop worrying about distributing tasks and finding a free worker. In this example it automatically queues each packet for processing (for delivery to a worker).
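
The worker scheme can be sketched roughly as follows. This is Python with pyzmq, not the C test itself, and several things are assumptions made for the illustration: DEALER sockets on both ends of the internal link, the endpoint name `inproc://workers`, the `[client_id, payload]` frame layout, and upper-casing as a stand-in for real processing.

```python
import threading
import zmq

BACKEND = "inproc://workers"  # hypothetical endpoint name


def worker(ctx, stop):
    """Internal client: receives [client_id, payload] tasks and sends results back."""
    sock = ctx.socket(zmq.DEALER)
    sock.connect(BACKEND)
    while not stop.is_set():
        if sock.poll(50):  # wait up to 50 ms for a task, then re-check the stop flag
            client_id, payload = sock.recv_multipart()
            sock.send_multipart([client_id, payload.upper()])  # stand-in for real work
    sock.close(linger=0)


def run_pool(tasks, n_workers=2):
    """Hand every (client_id, payload) task to the workers and collect the results."""
    ctx = zmq.Context()
    backend = ctx.socket(zmq.DEALER)
    backend.bind(BACKEND)  # bind before the workers connect
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(ctx, stop))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for client_id, payload in tasks:
        backend.send_multipart([client_id, payload])  # ZeroMQ picks the worker
    results = [tuple(backend.recv_multipart()) for _ in tasks]
    stop.set()
    for t in threads:
        t.join()
    backend.close(linger=0)
    ctx.term()
    return results
```

Note that there is not a single mutex here: the main thread never looks for a free worker, and the only shared object is the context, which ZeroMQ allows to be used from several threads. Results may come back in a different order than the tasks were sent, which is why each one carries the client identifier.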

Undoubtedly, ZeroMQ has its drawbacks too. Although the library takes a lot of worries off your hands, it does not guarantee delivery or the safety of your messages. That is left to the developer, which in my opinion is entirely correct.

Let's go through some of the most important aspects of working with ZeroMQ.

Connections

Pros:
+ ZeroMQ automatically restores outgoing connections. The application may not even notice a disconnection unless, of course, you specifically monitor this event (see zmq_socket_monitor()).

Cons:
- I have not figured out how to find out the real IP address, the host name, or at least the descriptor of the client a message came from. The most that ZeroMQ gives you is a client identifier (for a socket of type ZMQ_ROUTER), which is either assigned automatically by ZeroMQ or set by the client itself before it establishes the connection.
- Likewise, I have not yet figured out how to forcibly disconnect a client (for example, one that failed to log in on time). This is fraught with the accumulation of unnecessary connections.

Queues

Pros:
+ Messages sent through ZeroMQ go into an internal queue, so you do not have to wait for the send to complete, and for an outgoing connection it does not matter whether it has been established yet. The queue size is configurable.
+ There is also a receive queue, through which the so-called “fair queuing” strategy is implemented: for incoming connections, you read messages from a single receive queue shared by all clients.

Cons:
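- as far as I know, you cannot manage the queues: clear them, get their actual size, etc.
- when a queue overflows, new messages are dropped or the send blocks, depending on the socket type (for example, PUB and ROUTER drop, while DEALER and PUSH block)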
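
The queue limit can be seen in a small experiment. The sketch below uses Python with pyzmq; the helper name `fill_queue` and the endpoint passed to it are invented for the illustration. It connects a PUSH socket to an address where nobody is listening, sets the send queue limit (the ZMQ_SNDHWM option), and counts how many non-blocking sends succeed before the queue is full.

```python
import zmq


def fill_queue(endpoint, hwm, attempts=1000):
    """Count how many messages a PUSH socket queues before its send queue is full."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.setsockopt(zmq.SNDHWM, hwm)  # queue limit, set before connecting
    sock.connect(endpoint)            # succeeds even though nobody is listening yet
    accepted = 0
    try:
        for _ in range(attempts):
            sock.send(b"x", zmq.DONTWAIT)  # raises zmq.Again instead of blocking
            accepted += 1
    except zmq.Again:
        pass                          # the queue is full
    sock.close(linger=0)              # discard whatever is still queued
    ctx.term()
    return accepted
```

For example, `fill_queue("tcp://127.0.0.1:5599", hwm=5)` (the port is arbitrary and assumed to be unused) should accept only a handful of messages and then stop. The ZeroMQ documentation does not promise that exactly `hwm` messages are accepted, so treat the limit as approximate.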

Messages

Pros:
+ In ZeroMQ you work not with a stream of bytes but with individual messages of known length.
+ A message in ZeroMQ consists of one or more so-called “frames”, which is quite convenient: as the message passes through nodes, you can add or remove frames with meta-information without touching the data frame. This approach is used, in particular, by the ZMQ_ROUTER socket type: when it receives a message, ZeroMQ automatically prepends the identifier of the sending client as the first frame.
+ Each message is atomic, i.e. it is always received or transmitted in its entirety, including all frames.
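
Frames and the ROUTER identifier are easy to see in a short sketch. It uses Python with pyzmq; the endpoint name, the identity `client-42`, and the frame contents are all made up for the illustration.

```python
import zmq


def router_roundtrip():
    """Send a two-frame message to a ROUTER socket and echo it back."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    client = ctx.socket(zmq.DEALER)
    client.setsockopt(zmq.IDENTITY, b"client-42")  # must be set before connect
    router.bind("inproc://frames-demo")
    client.connect("inproc://frames-demo")

    client.send_multipart([b"meta", b"payload"])   # one message, two frames
    received = router.recv_multipart()             # identity frame is prepended

    router.send_multipart(received)                # first frame selects the recipient
    reply = client.recv_multipart()                # ... and is stripped on the way out

    client.close(linger=0)
    router.close(linger=0)
    ctx.term()
    return received, reply
```

On the ROUTER side the message arrives as three frames, `[b"client-42", b"meta", b"payload"]`, while the client gets back just the two frames it sent.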

Cons:
- Each message has to fit in memory, so if you need to send large amounts of data you have to split them into parts yourself (into separate messages, not frames). The maximum message size, however, can be configured.
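
One possible way to do such splitting, sketched in plain Python without any ZeroMQ calls. The part header of two 4-byte integers (part index, total number of parts) and the function names are assumptions made for the illustration, not anything ZeroMQ prescribes; each returned part would be sent as a separate message.

```python
import struct

# Assumption: each part starts with two 4-byte integers, (index, total).
PART_HEADER = struct.Struct("!II")


def split_message(data, chunk_size):
    """Cut `data` into parts that can each be sent as a separate message."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    total = len(chunks)
    return [PART_HEADER.pack(index, total) + chunk
            for index, chunk in enumerate(chunks)]


def join_parts(parts):
    """Reassemble the original data from parts received in any order."""
    decoded = []
    for part in parts:
        index, total = PART_HEADER.unpack_from(part)
        decoded.append((index, total, part[PART_HEADER.size:]))
    decoded.sort(key=lambda item: item[0])
    if not decoded or len(decoded) != decoded[0][1]:
        raise ValueError("missing parts")
    return b"".join(chunk for _, _, chunk in decoded)
```

Since ZeroMQ itself does not guarantee delivery, a real protocol would also need a way to re-request lost parts; this sketch only detects that something is missing.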

Lyrical digression

In addition to several transports (tcp, ipc, inproc, etc.), ZeroMQ has several types of sockets: REQ, REP, ROUTER, DEALER, PUB, SUB, etc.
I advise you to study them carefully in the documentation. The behavior of the socket at each end of a connection depends on its type, and some socket types use additional mandatory frames.
The Guide mentioned above does a good job of introducing the basic socket types.

Conclusion

If you are just starting to design your application, or some of its simpler individual parts, modules, and subtasks, I highly recommend taking a look at ZeroMQ.
In a real application with asynchronous data processing, ZeroMQ can bring not only a reduction in the amount of code but also a slight gain in performance.
Bindings for the library exist for many programming languages: C++, C#, CL, Delphi, Erlang, F#, Felix, Haskell, Java, Objective-C, Ruby, Ada, Basic, Clojure, Go, Haxe, Node.js, ooc, Perl, Scala.
The library is cross-platform, i.e. it can be used on both Linux and Windows. Unfortunately, I have not yet found an official build for MinGW.
But the project is developing rapidly and is already widely used, so let us hope and believe.

Feedback in the comments is welcome!
