Memcached plugin: NoSQL in MySQL



    Hello! My name is Maxim Matyukhin, I'm a PHP programmer at Badoo . In our work, we actively use MySQL. But sometimes we lack its performance, so we are constantly looking for ways to speed up its work.

    In 2010, Yoshinori Matsunobu introduced the NoSQL MySQL plugin called HandlerSocket. It was claimed that this plugin allows you to perform more than 750,000 requests per second. We became curious, and almost immediately we began to use this solution. We liked the result so much that we started making presentations and writing articles promoting HandlerSocket.

    Apparently, we were one of the few users of this plugin - since MySQL 5.7 it stopped working. But in this version another plugin from Oracle appeared - InnoDB memcached plugin, which promised similar functionality.

    Despite the fact that the memcached plugin appeared back in MySQL 5.6 in 2013, there are not so many articles about it and for the most part they repeat the documentation: a simple label is created and requests are made to it through the memcached client.

    We have extensive experience with Memcached and are used to the ease of interacting with it. From InnoDB memcached plugin we expected the same simplicity. But in fact, it turned out that if the patterns for using the plug-in are at least slightly different from those described in the documentation and articles, a lot of nuances and limitations pop up, which are definitely worth considering if you are going to use the plug-in.

    MySQL HandlerSocket


    In this article, we will one way or another compare the new memcached plugin with the old HandlerSocket. Therefore, I recall that it was the latter.

    After installing the HandlerSocket plugin, MySQL started listening to two additional ports:

    1. The first port received client requests for reading data.
    2. The second port received client requests for data recording.

    The client had to establish a regular TCP connection on one of these ports (no authentication was supported), and after that it was necessary to send the “open index” command (a special command with which the client informed which table of which index which fields we were going to read (or write)).

    If the “open index” command worked successfully, then you could send GETs or INSERT / UPDATE / DELETE commands depending on the port to which the connection was established.

    HandlerSocket allowed to perform not only GETs on the primary key, but also simple samples from a non-unique index, range samples, supported multigets and LIMIT. At the same time, it was possible to work with the table both from ordinary SQL, and through the plugin. This, for example, allowed you to make some changes in transactions through SQL, and then read this data through HandlerSocket.

    It is important that HandlerSocket handled all connections with a limited pool of threads through epoll, so it was easy to support tens of thousands of connections, while in MySQL itself a thread was created for each connection and their number was very limited.

    At the same time, it is still an ordinary MySQL server - a technology familiar to us. We know how to replicate and monitor it. Monitoring HandlerSocket is difficult because it does not provide any specific metrics; however, some of the standard MySQL and InnoDB metrics are useful.

    There were, of course, inconveniences, in particular, this plugin did not support working with the timestamp type. Well, the HandlerSocket protocol is harder to read and therefore harder to debug.

    Read more about HandlerSocket here . You can also watch one of our presentations .

    InnoDB memcached plugin


    What does the new memcached plugin offer us?

    As the name implies, his idea is to use the memcached client to work with MySQL and to receive and save data through memcached commands.

    You can read about the main advantages of the plugin here .

    We are most interested in the following:

    1. Low CPU consumption.
    2. Data is stored in InnoDB, which gives certain guarantees.
    3. You can work with data both through Memcached and through SQL; they can be replicated using MySQL built-in tools.

    You can add such pluses to this list as:

    1. Fast and cheap connection. A regular MySQL connection is processed by one thread, and the number of threads is limited, and in the memcached plugin, one thread processes all connections in the event loop.
    2. The ability to request multiple keys with one GET request.
    3. If compared with MySQL HandlerSocket, then in the memcached plugin you do not need to use the “Open Table” command and all read and write operations occur on the same port.


    More details about the plugin can be found in the official documentation . For us, the most useful pages were:

    1. InnoDB memcached Architecture .
    2. InnoDB memcached Plugin Internals .

    After installing the plugin, MySQL starts accepting connections on port 11211 (standard memcached port). A special database (schema) innodb_memcache also appears, in which you will configure access to your tables.

    Simple example


    Suppose you already have a table that you want to work with through the memcached protocol:

    CREATE TABLE `auth` (
      `email` varchar(96) NOT NULL,
      `password` varchar(64) NOT NULL,
      `type` varchar(32) NOT NULL DEFAULT '',
      PRIMARY KEY (`email`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    

    and you want to receive and modify data on the primary key.

    You need to first describe the correspondence between the memcached key and the SQL table in the innodb_memcache.containers table. This table looks something like this (I removed the encoding description to make it easier to read):

    CREATE TABLE `containers` (
      `name` varchar(50) NOT NULL,
      `db_schema` varchar(250) NOT NULL,
      `db_table` varchar(250) NOT NULL,
      `key_columns` varchar(250) NOT NULL,
      `value_columns` varchar(250) DEFAULT NULL,
      `flags` varchar(250) NOT NULL DEFAULT '0',
      `cas_column` varchar(250) DEFAULT NULL,
      `expire_time_column` varchar(250) DEFAULT NULL,
      `unique_idx_name_on_key` varchar(250) NOT NULL,
      PRIMARY KEY (`name`)
    ) ENGINE=InnoDB DEFAULT

    The most important fields:

    • name - prefix of your Memcached key;
    • db_schema - name of the base (circuit);
    • db_table is your table;
    • key_columns - the name of the field in the table by which we will search (usually this is your primary key);
    • value_columns - a list of fields from the table that will be available to the memcached plugin;
    • unique_idx_name_on_key is the index by which to search (although you have already specified key_columns, they can be in different indexes and you need to specify the index explicitly).

    The remaining fields are not very important for a start.

    Add a description of our table to innodb_memcache.containers:

    INSERT INTO innodb_memcache.containers SET 
        name='auth', 
        db_schema='test', 
        db_table='auth', 
        key_columns='email', 
        value_columns='password|type', 
        flags='0', 
        cas_column='0',
        expire_time_column='0',
        unique_idx_name_on_key='PRIMARY';

    In this example, name = 'auth' is the prefix of our memcached key. In the documentation it is often called table_id, and later in the article I will use this term.

    Now TELNET connect to the memcached plugin and try to save and get the data:

    [21:26:22] maxm@localhost: ~> telnet memchached-mysql.dev 11211
    Trying 127.0.0.1...
    Connected to memchached-mysql.dev.
    Escape character is '^]'.
    get @@auth.max@example.com
    END
    set @@auth.max@example.com 0 0 10
    1234567|89
    STORED
    get @@auth.max@example.com
    VALUE @@auth.max@example.com 0 10
    1234567|89
    END

    First we sent a GET request, it didn’t return anything to us. Then we saved the data with a SET request, after which we got it back with a GET.

    GET returned the following line: 1234567 | 89. These are the values ​​of the "password" and "type" fields, separated by the "|" symbol. Fields are returned in the order in which they were described in innodb_memcache.containers.value_columns.

    Perhaps you are now wondering: “What will happen if the symbol“ | ”is found in the“ password ”?” I will discuss this below.

    Through SQL, this data is also available:

    MySQL [(none)]> select * from auth where email='max@example.com';
    +-----------------+----------+------+
    | email       	
    | password | type |
    +-----------------+----------+------+
    | max@example.com | 1234567  | 89   |
    +-----------------+----------+------+
    1 row in set (0.00 sec)

    Default table_id


    There is also such a mode of operation:

    get @@auth
    VALUE @@auth 0 21
    test/auth
    END
    get max@example.com
    VALUE max@example.com 0 10
    1234567|99
    END
    set ivan@example.com 0 0 10
    qwerty|xxx
    STORED
    get ivan@example.com
    VALUE ivan@example.com 0 10
    qwerty|xxx
    END

    In this example, with get @@ auth, we make table_id auth the default prefix for this connection. After that, all subsequent queries can be done without specifying table_id.

    So far, everything is simple and logical. But if you start to understand, then there are many nuances. I’ll tell you what we found.

    Nuances


    Caching innodb_memcache.containers table


    The memcached plugin reads the innodb_memcache.containers table once at startup. Further, if an unknown table_id arrives via the Memcached protocol, the plugin searches for it in the table. Therefore, you can easily add new keys (table_id), but if you want to change the settings of an existing table_id, you have to restart the memcached plugin:

    mysql> UNINSTALL PLUGIN daemon_memcached;
    mysql> INSTALL PLUGIN daemon_memcached soname "libmemcached.so";

    Between these two requests, the Memcached interface will not work. Because of this, it is often easier to create a new table_id than to change the existing one and restart the plugin.

    It was a surprise for us that such an important nuance of the plug-in operation is described on the Adapting a memcached Application for the InnoDB memcached Plugin page , which is not a very logical place for such information.

    Flags, cas_column, expire_time_column


    These fields are needed to simulate some features of Memcached. The documentation for them is inconsistent. Most of the examples in it illustrate working with tables in which these fields are. There may be a concern that you will need to add them to your tables (and these are at least three INT fields). But no. If you do not have such fields in the tables and you are not going to use such Memcached functionality as CAS, expiration or flags, then you do not need to add these fields to the tables.

    When configuring the table in innodb_memcache.containers, you need to enter '0' in these fields, make exactly the line with zero:

    INSERT INTO innodb_memcache.containers SET 
        name='auth', 
        db_schema='test', 
        db_table='auth', 
        key_columns='email', 
        value_columns='password|type', 
        flags='0', 
        cas_column='0',
        expire_time_column='0',
        unique_idx_name_on_key='PRIMARY';

    It is annoying that cas_column and expire_time_column have a default value of NULL, and if you execute INSERT INTO innodb_memcache.containers without specifying a value of '0' for these fields, NULL will be saved in them and this memcache prefix simply will not work.

    Data types


    From the documentation it is not very clear what data types can be used when working with the plugin. In several places it is said that the plugin can only work with text fields (CHAR, VARCHAR, BLOB). Here: Adapting an Existing MySQL Schema for the InnoDB memcached Plugin offers to store numbers in string fields, and if you then need to work with these number fields from SQL, then create a VIEW in which VARCHAR fields with numbers will be converted to INTEGER fields :

    CREATE VIEW numbers AS SELECT c1 KEY, CAST(c2 AS UNSIGNED INTEGER) val
      FROM demo_test WHERE c2 BETWEEN '0' and '9999999999';

    However, in some places in the documentation it is still written that you can work with numbers. So far, we have only real production experience with text fields, but the experimental results show that the plugin also works with numbers:

    CREATE TABLE `numbers` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `counter` int(10) unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB
    INSERT INTO innodb_memcache.containers SET name='numbers', db_schema='test', db_table='numbers', key_columns='id', value_columns='counter', flags='0', cas_column='0',expire_time_column='0',unique_idx_name_on_key='PRIMARY';

    After that, through the Memcached protocol:

    get @@numbers.1
    END
    set @@numbers.1 0 0 2
    12
    STORED
    get @@numbers.1
    VALUE @@numbers.1 0 2
    12
    END

    We see that the memcached plugin can return any data types. But he returns them in the form in which they lie in InnoDB, so, for example, in the case of timestamp / datetime / float / decimal / JSON, a binary string is returned. But integers are returned as we see them through SQL.

    Multiget


    The memcached protocol allows you to request multiple keys with a single request:

    get @@numbers.2 @@numbers.1
    VALUE @@numbers.2 0 2
    12
    VALUE @@numbers.1 0 2
    13
    END

    The fact that multiget works is already good. But it works within the framework of one table_id:

    get @@auth.ivan@example.com @@numbers.2
    VALUE @@auth.ivan@example.com 0 10
    qwerty|xxx
    END

    This point is described in the documentation here: https://dev.mysql.com/doc/refman/8.0/en/innodb-memcached-multiple-get-range-query.html . It turns out that in multiget you can specify table_id only for the first key, if all the other keys are taken from the default table_id (example from the documentation):

    get @@aaa.AA BB
    VALUE @@aaa.AA 8 12
    HELLO, HELLO
    VALUE BB 10 16
    GOODBYE, GOODBYE
    END

    In this example, the second key is taken from the default table_id. We could specify a lot more keys from the default table_id, and for the first key we specified a separate table_id, and this is possible only in the case of the first key.

    We can say that multiget works within the framework of one table, because you don’t feel like relying on such logic in production code: it is not obvious, it’s easy to forget about it, make a mistake.

    If compared with HandlerSocket, then there, too, multiget worked in the same table. But this restriction looked natural: the client opens the index in the table and requests one or more values ​​from it. But when working with the multiget memcached plugin on several keys with different prefixes, this is normal practice. And you expect the same from the MySQL memcached plugin. But no :(

    INCR, DEL


    I have already given examples of GET / SET requests. INCR and DEL queries have a feature. It lies in the fact that they only work when using the default table_id:

    DELETE @@numbers.1
    ERROR
    get @@numbers
    VALUE @@numbers 0 24
    test/numbers
    END
    delete 1
    DELETED

    Memcached protocol limitations


    Memcached has a text protocol, which imposes some limitations. For example, memcached keys should not contain white space characters (space, line feed). If you look again at the description of the table from our example:

    CREATE TABLE `auth` (
      `email` varchar(96) NOT NULL,
      `password` varchar(64) NOT NULL,
      `type` varchar(32) NOT NULL DEFAULT '',
      PRIMARY KEY (`email`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

    this means that in the field “email” there should not be such characters.

    Also, memcached keys must be less than 250 bytes (bytes, not characters). If you send more, you get an error:

    "CLIENT_ERROR bad command line format"

    In addition, one must take into account the fact that the memcached plugin adds its own syntax to the memcached protocol. For example, it uses the character "|" as a field separator in the response. You need to make sure that this symbol is not used in your table. The separator can be configured, but the settings will apply to all tables on the entire MySQL server.

    Field delimiter value_columns


    If you need to return several columns through the memcached protocol, as in our first example:

    get @@auth.max@example.com
    VALUE @@auth.max@example.com 0 10
    1234567|89
    END

    then the column values ​​are separated by the standard separator "|". The question arises: “What will happen if, for example, the character“ | ”is in the first field in the line?” The memcached plugin in this case will return the string as is, something like this: 1234 | 567 | 89. In the general case, it is impossible to understand where there is a field.

    Therefore, it is important to choose the right separator right away. And since it will be used for all keys of all tables, it should be a universal symbol that will not be found in any field with which you will work through the memcached protocol.

    Summary


    This is not to say that the memcached plugin is bad. But one gets the impression that it was written for a specific scheme of work: a MySQL server with one table that can be accessed using the memcached protocol, and this table_id is made default. Clients establish a persistent connection with the Memcached plugin and make requests to the default table_id. Probably, in such a scheme, everything will work flawlessly. If you move away from it, you come across various inconveniences.

    You might have expected to see some plugin performance reports. But we have not yet decided to use it in highly loaded places. We used it only in a few not very loaded systems and there it works at about the same speed as the HandlerSocket, but we did not make honest benchmarks. But nevertheless, the plugin provides such an interface with which the programmer can easily make a mistake - you need to keep a lot of nuances in mind. Therefore, we are not yet ready to use this plugin in bulk.

    We made several feature requests in the MySQL bug tracker:

    https://bugs.mysql.com/bug.php?id=95091
    https://bugs.mysql.com/bug.php?id=95092
    https: // bugs .mysql.com / bug.php? id = 95093
    https://bugs.mysql.com/bug.php?id=95094

    Let's hope the memcached plugin development team will improve its product.

    Also popular now: