Tarantool Data and Protocol


    Tarantool is a great high-performance no-Sql solution developed by Mail.Ru. Sources

    This solution allows you to use both the key / value mode and the selection of many records in the recordset according to one or more criteria (search fields). Analogs in RuNet and not only, I have not yet met. With a stretch, you can compare the radish. But in the radish - list data and they can not be selected by key. Judging by the claims of the developers, the key access speed is faster than memcache, while backdoor data is constantly saved to disk. But unfortunately, this development has a single perl client for accessing data, which is why it does not have such popularity as, for example, redis or memcache.

    In the doc / box-protocol sources there is a description of the Protocol that I have currently revised to write a client in C and PHP. Having studied the Protocol, you can implement the native client in your favorite language. I hope this article is useful to you in this.


    All data in this database is divided into namespace, most likely it is an analogue of the database in MySQL. The numbering of all namespaces is digital (0, 1, 2, etc.). Each namespace can be overlaid with a specific index. Indexing is also digital. Indexes are superimposed on one or several fields. Index can be of type HASH or TREE.

    All Indices and Namespaces are registered in the Config. An example is shown below, where two indices are written, digital and character. Moreover, the second index is composite:namespace[0].enabled = 1
    namespace[0].index[0].type = "HASH"
    namespace[0].index[0].unique = 1
    namespace[0].index[0].key_field[0].fieldno = 0
    namespace[0].index[0].key_field[0].type = "NUM"
    namespace[0].index[0].key_field[1].fieldno = 1
    namespace[0].index[1].type = "TREE"
    namespace[0].index[1].key_field[0].fieldno = 0
    namespace[0].index[1].key_field[0].type = "STR"
    namespace[0].index[1].key_field[1].fieldno = 1

    It should be noted about the keys. Keys can be digital (1,2,3 ... 6.2 * 10 ^ 9) or symbolic.

    All data in the namespace is stored as tuples. A cart is a set of fields. The field can be either digital or symbolic.

    The exchange between the client and server is carried out by Messages. All Messages in the tarantool Protocol are divided into Request Request and Response. Each Message has a mandatory Header Header and may also have a Body.

    The message header includes: message type, body length, and request identifier.

    Header Structure:
    typedef struct { uint32_t type; uint32_t len; uint32_t request_id; } Header;

    The following Message types are defined:
    INSERT 0xd (13)
    SELECT 0x11 (17)
    UPDATE 0x13 (19)
    DELETE 0x14 (29)
    PING 0xff 0xff 0x0 0x0 (65280)

    The request identifier is set by the client and can be zero.

    The general structure of the request:
    typedef struct {
    	Header header
    	union {
    		InsertRequest insert;
    		SelectRequest select;
    		UpdateRequest insert;
    		DeleteRequest insert;
    	};
    } Request;


    The PING command has no body, so there is no PingRequest;)

    The body of the INSERT command consists of the namespace number over which the operation will be performed, a flag and a tuple.

    namespace - This is the space in which tuples are stored. Numberspace names are digital. At each namespace, indices can be defined. The primary index (PRIMARY) must be
    present. Indexes are defined in the configuration file.

    Currently, only a single flag BOX_RETURN_TUPLE (0x01) is defined that indicates whether to return data in the response body.
    The structure of the INSERT request:
    typedef struct {
    	uint32_t namespaceNo;
    	uint32_t flag;
    	Tuple tuple;
    } InsertRequest;


    All data is described using Tuple tuples. A tuple consists of a cardinality field, which is the dimension of the tuple (number of fields) and an array of fields. In general terms, it will look like this:
    typedef struct {
    	uint32_t card;
    	Field field [];
    } Tuple;


    Each field is represented by an array of bytes. A field can have: int32, int64, or a stream of bytes.
    Currently, I have defined it like this:
    typedef Field u_char * data;

    All field data is packed using LED128 en.wikipedia.org/wiki/LEB128 the

    body of a SELECT request includes: namespace number, index number used to select, offset offset and output size limit and number of tuples and tuples themselves. The offset and limit parameters are similar to the selection: SELECT * FROM ... LIMIT

    typedef struct {
    	uint32_t namespaceNo;
    	uint32_t indexNo;
    	uint32_t offset;
    	uint32_t limit;
    	uint32_t count;
    	Tuple tuples [];
    } SelectRequest;


    If we specify SELECT * FROM t0 WHERE k0 = 1, then the number of tuples = 1 and the Tuple value should correspond to 1. If a secondary composite index k1 (a digital field and a character) is defined, then to the query
    SELECT * FROM t0 WHERE k1 = ( 21, 'USSR')
    number of tuples = 2 and two Tuple values ​​must be provided. It is necessary to clarify that the presented sql is schematic and does not comply with the SQL'92 standard. The fact is that the data in tarantool / box is represented by tuples, not tables (columns and rows). A tuple can contain any number of fields. All tuples are stored in non-space. However, on the memspace you can set up a HASH or rbTREE index by which the search will be performed.

    The body of an UPDATE request includes: namespace number, flag, tuple, number of operations, and operations themselves. The flag and tuple fields are similar to the UPDATE operation. The number of operations can be equal to zero. The structure will look like this:

    typedef struct {
    	uint32_t namespaceNo;
    	uint32_t flag;
    	Tuple tuple;
    	int32_t count;
    	Operation operation [];
    } UpdateRequest;


    Each Operation is a structure containing the number of the field through which the operation will be performed, the operation code and the argument.

    The operation codes are used:
    0 - assignment of an argument to this field.
    If the argument is of type int32, then the following actions are also possible:
    1 - add the argument to the existing field
    2 - execute AND with the existing field
    3 - execute XOR with the existing field
    4 - execute OR with the existing field
    typedef struct {
    	int32_t fieldNo;
    	int8_t opcode;
    	Field arg;
    } Operation;


    The DELETE operation is always performed on the primary key and contains the namespace number and tuple. The structure of the DELETE operation is presented below:
    typedef struct {
    	uint32_t namespaceNo;
    	Tuple tuple;
    } SelectRequest;


    Each server response contains: Header header, response code and, if necessary, response body. Response Header - Similar to the request header. The return code 0 is success, or we look at errors in include / iproto.h

    In general, the following response structure is obtained:
    typedef struct {				
    		Header header
    		int32_t code;
    		union {
    			SelectResponce selectBody;
    			insertResponce insertBody;
    			uint8_t * data;
    			int32_t count;
    		};			
    } Responce;
    


    The body of a response to a SELECT query consists of a field containing the total number of tuples and the set of returned tuples. If the query result is empty, then the tuples are not returned and the quantity field contains zero.
    typedef struct {
    	int32_t count;
    	FqTuple tuples [];
    } SelectResponce;


    Each returned tuple (FqTuple) contains the size of the tuple, some identification cardinality, which acts as a separator (boundaries) and the tuple itself.
    typedef struct {
    	int32_t size;
    	uint32_t card;
    	Tuple tuple;
    } FqTuple;


    If the BOX_RETURN_TUPLE flag is set in the InsertRequest request, then the response may contain the body:
    typedef struct {
    	int32_t count;
    	FqTuple tuple;
    } InsertResponce;


    The response to the UPDATE query is similar.

    The Delete query returns the number of deleted records. Since only the primary index is used during deletion, we can delete only one record, respectively, it will return 0 or 1. This is the count field of the Responce structure. Even in the structure, an array of data bytes is allocated for data analysis.

    There are some inaccuracies in the text, somewhere you can use int32_t instead of uint32_t.
    Perhaps I misunderstood something, so I will be happy with business criticism from the authors of this wonderful project.

    Also popular now: