KonstantinKuklin October 8, 2014 at 11:41

Don't be afraid to use HandlerSocket

From the sandbox

(an example of the HandlerSocket protocol in the picture)

Introduction

In the previous project, there was a need to unload the database, then life pushed me against HandlerSocket.

HandlerSocket is a protocol implemented in the plugin for the MySQL RDBMS of the same name that allows you to use the NoSQL technique to access data stored in InnoDB tables. The main reason why NoSQL solutions are used is a very fast primary key search.

More about HandlerSocket

HandlerSocket acts as a daemon inside the mysql process, accepting TCP connections and executing client requests. It does not support SQL queries; instead, it provides a simple query language for CRUD table operations. That is why it is much faster than mysqld / libmysql in some cases:

HandlerSocket operates on data without parsing the SQL query, which reduces the CPU load.
It supports batch query execution. You can send several requests at once and get the result at a time, which again reduces the load on the processor and the network.
HandlerSocket protocol is more compact than mysql / libmysql, which reduces network load.

More details can be read here:

Under the cut you will find:

New library for working with HS, written in PHP;
Comparison of the performance of existing solutions + new;
Symfony2 bundle for working with HS;
Munin plugins for monitoring HS activity;
Different thoughts out loud and stories about "bumps".

Studying the issue of implementing HS in a project, I was very surprised that there wasn’t a normal PHP library for it that would just be convenient to use and cover the protocol functionality. The solutions available at that time were similar to those being developed, but not at all ready for use.

We were not satisfied with the ready-made solutions and they did not give me hours from the project, so I took and threw something of my own, so the HS implementation option was discarded. The project has long been behind, but the idea that the HS cool thing has taken possession of me. And now more than a year has passed, when I again tried to find solutions and found that nothing had moved from that dead center.

The tough guys from Badoo once said that they use HS under their hood, but I could not find in the public domain what they use, I suspect that something is self-written.

Analysis of ready-made solutions

There were three ready-made solutions: 1 was written in PHP ( HSPHP ) and 2 extensions to the interpreter written in C.

All 3 had a couple of common problems:

lack of suffix support;
lack of authorization support;
weak documentation;
test coverage is not very high.

With this set of C problems, extensions were immediately rejected, because there was absolutely no desire to understand them and finish on C, HSPHP remained.

While I was dealing with HSPHP, I managed to make a couple pull requests. Everything is written inside through callbacks, which I don’t really like. All of this together made me abandon this library. By the way, we recently added authorization support in HSPHP, but at the same time everything completely broke.

Why your bike

I needed an easy-to-use tool that out of the box would not require any modifications, was convenient to use, and was clearly kept up to date. Since I did not find anything like this, and the interest in the protocol has not cooled down yet, the time and desire were found by themselves and the body sat down to write code in a semi-automatic mode.

Meet - HandlerSocketLibrary

Briefly about the results:

currently the library covers 100% of the protocol;
distributed under the MIT license;
several ways to send a request;
select queries can return an associative array (for example, array ('id' => '1', 'message' => 'text')), in the rest of the libraries you had to map everything from the columns on your own or remember the order of the columns when opening the index;
HS code was studied, on the basis of which it was possible to create an Exception for each HS protocol error. Inside the class, it is described in a human-readable language, which can lead to the appearance of such an error;
a script was written for the library benchmark ( PHPBenchHs );
The documentation is written in Russian and English ;
covered in tests ;
code quality .

Architecture

(Clickable image)

This is a graph of methods called under the hood, a library drawn using the XHProf profiler . It clearly shows that the 2 slowest sections of code are working with the socket and parsing the response. I use a wrapper for working with STREAM \ sockets SteamLibrary .

The idea is simple - create a connection to a socket, send a command, get a response. The architecture has the ability to flexibly play with various options for reading data through a socket (fgets, fgetc, fread ...). I have not yet tested the speed of work through various wrappers on sockets, but this is all already in the architecture.

Request Examples

I will not consider all possible requests here because they are described in the documentation (link to the documentation), I will try to give general examples and talk about the features.

There are 2 sockets - reading and writing. The difference in them all comes down to the supported teams. The reading socket is capable of processing 3 commands:

authorization team;
index opening command;
reading information from an open index.

To connect to a reading socket:

$reader = new \HS\Reader('localhost', 9998, 'passwordRead');

The constructor accepts - host, port and password for authorization (optional). The authorization command is sent automatically.

Open the index:

$indexId = $reader->getIndexId(
    'database',
    'tableName',
    'PRIMARY',
    array('key', 'text'),
    array('num')
);

It is necessary to specify the database, table and key (if the key is empty, then by default work will continue with the PRIMARY key), then a list of columns (key, text) to be opened, and an additional column for filtering (num). Filtering is optional, but only these columns will be available in the future for filtering, in addition, the values of these columns will not be returned when the server responds.

On subsequent calls, the index will not open; the already open one will be used.

To get data through an open index, execute the code:

$selectQuery = $reader->selectByIndex($indexId, Comparison::EQUAL, array(42));

We take values that are 42, because the sequence is (key, text) in the list of columns of this index, then 42 is mapped to 1 column, namely key, the comparison type is equality (Comparison :: EQUAL).

By default, HS returns just a list of values. You can get the output in the form of an associative array (the values are automatically mapped into the array by the name of the column) or a vector.

To do this, you just need to add the following line after the request and before sending it:

$selectQuery->setReturnType(SelectQuery::ASSOC); // пример для получения ассоциативного массива

To get the data, you need to send a request and parse the results:

$resultList = $reader->getResultList();

The variable $ resultList contains an array of the results of all the commands that were sent for this iteration, but raking all the answers is not fast and inconvenient, so all the answers automatically map to requests.

$selectResult = $selectQuery->getResult(); // вернет не NULL только после выполнения $reader->getResultList();
$arrayResultList = $selectResult->getData(); // является массивом с результатами.

Another way is to open the index and send the command at once:

$selectQuery = $reader->select(
    array('key', 'text'), // колонки
    'database',
    'tableName',
    'PRIMARY',
    Comparison::MORE, // тип сравнения, в нашем случае это >
    array('1'), // sql like: key > 1
    0, // offset
    99, // limit
    array('num'), // колонка фильтрации
    array(new Filter(Comparison::EQUAL, 0, '3')) //  sql like: num = 3
);
// SELECT key,text from database.tableName WHERE key > 1 and num = 3 LIMIT 0,99;
$this->getReader()->getResultList();

First of all, it will be checked if there is an index that contains the entire list of columns. If yes, it will be used, and if not, a new index will open with the required number of columns.

The third way is through “QueryBuilder”, I think it’s not worth commenting already:

$selectQueryBuilder = \HS\QueryBuilder::select(array('key', 'date', 'varchar', 'text', 'set', 'union'))
    ->fromDataBase($this->getDatabase())
    ->fromTable($this->getTableName())
    ->where(Comparison::MORE, array('key' => 2))
    ->andWhere('float', Comparison::EQUAL, 3);
$selectQuery = $reader->addQueryBuilder($selectQueryBuilder);
$reader->getResultList();
$selectQuery->getResult()->getData();

These are all commands that can work through a reading socket. Everyone else will drop out with ReadOnlyError.

The writing socket processes these 3 commands and 5 more:

insert
delete
update
increment
decrement

Examples of how to work with these commands are in the HandlerSocketLibrary docks . The first 3 I generally consider intuitive and it will be enough just to look at the examples, so I skip them.

But increment and decrement are commands that increase or decrease values in the database. You need to know that in the case when the value during the operation should change from negative to positive or vice versa, such an operation will not pass and the value will not change. That is, the value -1 will never become 0 with the increment command. This is a feature of the HS plugin itself.

If you set the flag $ suffix = true in the modifying commands, then the answer will be SelectResult with values that fit your criteria for modification BEFORE the modification process. That is, if you update 2 columns and specify suffix = true, then the values of these columns will be returned to you before the update, and not after.

If the limit and offset are not specified, then by default they are 1 and 0, respectively.

It is important not to forget such a feature of incremental / decremental queries: the limit value is multiplied with the value by which you need to increase or decrease the record.

Performance Comparison with Analogs

On the graphs, the comparison was based on the speed of processing requests and on the RAM used for it. A total of 1,000 requests were completed.

Measurements were made on a synthetic database running Mac Os X, i5 + 4Gb Ram. Absolute values were not critical for me, I needed a percentage ratio, but you can do the same on your hardware. For this you can use the scripts of my tests posted on github .

At Insert I cheated and just inserted all 1000 elements with the 1st request.

More is better.

Less is better.

HSPHP should be given, I thought that it will lose more in speed and memory consumption to SI extensions, but it turned out to be a little. But such a huge performance gap just caught me off guard, I expected a maximum of 30-50% lag.

It was necessary to correct, it was necessary to look for what was the trouble. First of all, I pulled XHProf from under the table and wrote down the result for 1000 Select queries: The first problem is working through an array. During the design, an array was chosen for the internal exchange of information. Of course, I realized that I would lose in performance, but so much so ... The heavily stripped down ParameterBag class was used

from HttpFoundation components to work with array. The point is that if the array contains values with the key N, then we return them, if not, defaultValue, which you can specify yourself for any value.

The graph above shows that working through the ParameterBag takes with it an inadmissible amount of processor time.

The goal is to check if there is something there or not, there are 2 of the most obvious ways: isset and array_key_exists .
Isset does not correctly handle null, so if you use it, you need to remove all possible moments from the architecture when null or an empty string comes into the array, since this will not find the element through isset. In array_key_exists, everything is fine with this, that's why I chose it initially. But thanks to the profiler, the discovery was unexpected and google the speed problem I found that array_key_exists loses at about 10! times isset. The most obvious solution was a temporary crutch - to change the condition for checking the value in the array - add isset before array_key_exists. This check turned out:

if(isset($array[$key]) || array_key_exists($array, $key)) {
// ~(0.0)~
}

By simply adding a check through isset before array_key_exists, the performance of the entire library grew by 4% ! It was fun, but 4% is not a cool result as a whole, so I had to completely get rid of the ParameterBag, rewriting everything to specific values in the queries.

The result was expected: The thrown array reduced the execution time of Select queries by 44%, and the memory consumption by 45%. I can’t say that the speed of work is now excellent, I’m still not happy with it. Then we have to get rid of unnecessary abstractions and internal optimizations for each specific request. This work has yet to be done, but the gap has decreased to 2.5-3 times, instead of 5.

Error list

Another nice thing that appeared due to the large number of questions on the network about decoding HS errors. HS sources were studied and all possible kinds of errors were written out, since it was written in a clear language and there are not many sources. The following is a list of what errors can be returned when running HS using HandlerSocketLibrary and what they mean. I would not post this list in the article if it were not for the lack of information on this topic on the network.

error list

AuthenticationError - the server requires authorization and you have not passed yet;
ComparisonOperatorError - an invalid comparison operator was passed (only 4> =, <=, <,> is available);
InListSizeError - the search mode is selected in the list of values and the number of these values is indicated incorrectly;
KeyIndexError - the specified column number exceeds the total number of columns in the open index;
ModifyOperatorError - invalid modifying operator passed. (+, -, D, U, +?, -?, D?, U?);
UnknownError - super prize - aaaaa! If you win a car - create an issue about this;
ColumnParseError - invalid column values were passed;
FilterColumnError - you are trying to open a column with a number greater than the number of open columns for filtering;
IndexOverFlowError - you are trying to use an uninitialized index number;
KeyLengthError - the length of the list of keys is either greater than the list or <1;
OpenTableError - trying to open a nonexistent table, or nonexistent index;
CommandError - you sent an invalid command;
FilterTypeError - passed the wrong type of filtering. (F, W);
InternalMysqlError - MySQL could not execute the command. (For example, insert an existing value);
LockTableError - you are trying to open a locked table;
ReadOnlyError - a modifying command was sent to the reading socket.

Here in this php file you can see the logic of parsing errors.

Plans

First of all, you need to add all the possible use cases and cover them with tests. As a result of the struggle for the speed of query execution, rudiments appeared in the code that need to be refactored. Not all requests have a complete list of features, that is, some requests may miss the ability to use filters or keylists (yes, the code is slightly damp). In existing tests, there’s an easy mess, because at the beginning I wrote neatly, at the end it’s easy to cover the case. Database modification tests do not check the state of the entire table after a modifying query.

You need to add the ability to return values in the form of an object on Select queries. HS supports CRUD - so you can add a wrapper to ORM, for example, to the same Doctrine.

Why is there so little information about HS and a small community?

I want to make it clear right away: HS is a great thing that you can and should use, it's just that not everyone can do it due to the features of the protocol and its limitations.

So why is the community not growing and why is it so hard to start working in real projects with HS?

I’m sure that there were other developers in my place, when between the idea of implementing HS and the actual implementation of HS, tasks pop up to finish the infrastructure associated with HS, which is absolutely not related to the business tasks of the project. The result is a choice: either your manager \ those director \ someone else pays for this work and generally gives you the go-ahead, or you do it after school hours, and after that you raise the question that everything is ready and can be implemented and will be only profit, or you score on HS, which most often happened, apparently.

Naturally, the fact that HS does not have access control for databases; this sharply limits the scope, all hosting services with the number of clients> 1 drop out.

I tried to contribute slightly to solving this problem.

Monitoring

We decided, therefore, to use HS on the project, but how to measure the load passing through it? HS has the ability to configure the number of workers, here we will monitor their status and load using Munin .

Plugins connect to MySQL and parse the output of the show processlist command. Available for download from here . This data should be enough to assess how busy everything is in peaks.

Symfony2 bundle

If your project uses the Symfony 2nd version framework , then out of the box you get the opportunity to indulge in HS. You just need to add a new dependency to composer.json: "konstantin-kuklin / handlersocket-bundle": "dev-master".

Setting:

Add configs for the socket for reading and writing:

hs:
    reader:
        host: localhost
        port: 9998
        debug: "%kernel.debug%"
        auth_key: "Password_Read1"
    writer:
        host: localhost
        port: 9999
        debug: "%kernel.debug%"

The auth_key parameter is optional and its absence means there is no need for authorization (your QoS). The Debug flag changes the operating mode of the HS library under the hood, requests are sent by the piece and processing time is recorded for each request, and the operational expense for additional storage of requests is also increased.>

Then everything is simple - we work with 2 services:

/** @var \HS\Reader $reader */
$reader = $this->get("hs_reader");
/** @var \HS\Writer $writer */
$writer = $this->get("hs_writer");

There is also “Nishtyaki” for debugging:
In dev mode, we go to the site and see a new icon on the line of the web profiler:

Very simplistic statistics are collected : how many requests and how much time it took to complete them (full time - sending, receiving, parsing). When clicking on this new button, we will see more detailed information about the requests:

Above is the information where the requests were sent and the information about the requests themselves, if you click on Display Data, we get the result of the request:

In fact, my imagination has run out, at this point I’m with the protocol I’ve already met quite well and therefore it has become difficult to come up with cases for displaying such information. I will be glad to hear your ideas about what else information could be useful for debugging.

Of the minuses - the bundle is not tested in any way.

I plan to make 1 entry point in HS with the indication of the database, that is, in the queries you do not have to transfer the database.

Conclusion

All of this can theoretically be used in production, but I need help in testing and I hope this article will bring together people who have experience with people using HS.

Constructive criticism, pull requests, suggestions, bugs found are welcome.

Now starting to use HS is easy, PerconaServer and MariaDB contain this plugin in their builds. Act!

Thanks for attention!

PS On Saturday, the news flew that the Badoo code would be uploaded .

Tags: