Rocker - rocksdb driver for Erlang

Introduction

There is a lot of information and debate on the Internet about choosing between the SQL and NoSQL approaches, and about the pros and cons of one KV store or another. This article is neither a guide to rocksdb nor a pitch for this storage engine or my driver for it. I would like to share an intermediate result of my work on streamlining NIF development for Erlang: a working rocksdb driver written in a couple of evenings.

In one of our projects we needed to reliably process a large volume of events. Each event takes 50 to 350 bytes, and each node generates more than 80 million events per day. Note that fault-tolerant delivery of messages to the nodes is out of scope here. One further processing constraint is that a group of events must be changed atomically and consistently.
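
To put these figures in perspective, the average load they imply can be estimated with a few lines of Erlang. This is only back-of-the-envelope arithmetic over the numbers quoted above: the module and function names are made up for illustration, and real traffic is unlikely to be spread evenly over the day.

```erlang
-module(load_estimate).
-export([events_per_second/1, bytes_per_day/2]).

%% Average event rate, assuming events are spread evenly over 24 hours.
events_per_second(EventsPerDay) ->
    EventsPerDay / (24 * 60 * 60).

%% Raw payload volume per day for a given event size in bytes
%% (keys, indexes and storage overhead are not included).
bytes_per_day(EventsPerDay, EventSize) ->
    EventsPerDay * EventSize.
```

For 80 million events per day this gives roughly 926 events per second on average, and between 4 GB (50-byte events) and 28 GB (350-byte events) of raw payload per node per day.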

Thus, the main requirements for the driver are:

Reliability
Performance
Security (in a canonical sense)
Functionality:
- All basic kv functions
- Column families
- Transactions
- Data compression
- Support for flexible storage configuration
Minimum code base

Review of existing solutions

erocksdb is a solution from the leofs developers. Its advantage is that it has been proven in a real project. Its drawbacks are an outdated code base and the lack of transactions. This driver is based on rocksdb 4.13.
rockse has a number of limitations, for example the lack of configuration options, but most importantly, all keys and values must be strings. It is included in this review only as an example of the many drivers that implement some functionality at the expense of the rest.
erlang-rocksdb is a full-featured project whose development began in 2014. Like erocksdb, it is used in real projects. It has a large C/C++ code base and broad functionality. This driver is suitable for general use in most projects.

After a cursory analysis of the existing Erlang drivers for rocksdb, it became clear that none of them fully met the project requirements. erlang-rocksdb could have been used, but I had a couple of free evenings and, after successfully developing and deploying a Bloom filter in Rust, I was curious: is it possible to meet all the requirements of the current project and implement most of the functions in a NIF in a short period of time?

Rocker

Rocker is a NIF for Erlang built on the Rust rocksdb wrapper. Its key features are security, performance, and a minimal code base. Keys and values are stored as binaries, which imposes no restrictions on the storage format. At the moment, the project is suitable for use in third-party solutions.
The source code is in the project repository.

API Overview

Opening the database

Working with the database is possible in two modes:

Common key space. In this mode, all your keys are placed in one set. Rocksdb lets you flexibly tune storage options for the task at hand. Depending on them, the database can be opened in two ways:

using the standard set of options
```
rocker:open_default(<<"/project/priv/db_default_path">>) -> {ok, Db}.
```
The result of this operation is a handle for working with the database, and the database is locked against any other attempt to open it. The lock is released automatically as soon as this handle is released.

or with options set for the task

```
{ok, Db} = rocker:open(<<"/project/priv/db_path">>, #{
    create_if_missing => true,
    set_max_open_files => 1000,
    set_use_fsync => false,
    set_bytes_per_sync => 8388608,                % 8 MiB
    optimize_for_point_lookup => 1024,
    set_table_cache_num_shard_bits => 6,
    set_max_write_buffer_number => 32,
    set_write_buffer_size => 536870912,           % 512 MiB
    set_target_file_size_base => 1073741824,      % 1 GiB
    set_min_write_buffer_number_to_merge => 4,
    set_level_zero_stop_writes_trigger => 2000,
    set_level_zero_slowdown_writes_trigger => 0,
    set_max_background_compactions => 4,
    set_max_background_flushes => 4,
    set_disable_auto_compactions => true,
    set_compaction_style => universal
}).
```

Several key spaces. Keys are stored in so-called column families, and each column family can have its own options. Here is an example of opening a database with standard options for all column families:
```
{ok, Db} = case rocker:list_cf(BookDbPath) of
    {ok, CfList} -> rocker:open_cf_default(BookDbPath, CfList);
    _Else -> CfList = [], rocker:open_default(BookDbPath)
end.
```

Deleting the database

To delete the database correctly, call rocker:destroy(Path). The database must not be in use at that time.

Database recovery after a failure

If the system fails, the database can be restored with rocker:repair(Path). This process consists of 4 steps:

searching for files
restoring tables by replaying the WAL
extracting metadata
writing the descriptor

Creating a column family

```
Cf = <<"testcf1">>,
rocker:create_cf_default(Db, Cf) -> ok.
```

Removing a column family

```
Cf = <<"testcf1">>,
rocker:drop_cf(Db, Cf) -> ok.
```

CRUD operations

Writing data by key

```
rocker:put(Db, <<"key">>, <<"value">>) -> ok.
```

Reading data by key

```
rocker:get(Db, <<"key">>) -> {ok, <<"value">>} | notfound.
```

Deleting data by key

```
rocker:delete(Db, <<"key">>) -> ok.
```

Writing data by key within a CF

```
rocker:put_cf(Db, <<"testcf">>, <<"key">>, <<"value">>) -> ok.
```

Reading data by key within a CF

```
rocker:get_cf(Db, <<"testcf">>, <<"key">>) -> {ok, <<"value">>} | notfound.
```

Deleting data by key within a CF

```
rocker:delete_cf(Db, <<"testcf">>, <<"key">>) -> ok.
```
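
Since a read returns either {ok, Value} or notfound, calling code usually has to branch on the result. A minimal sketch of a helper that does this is shown below. The module and function names are hypothetical and not part of rocker, only rocker:get/2 and rocker:get_cf/3 come from the API above, and any other return value (for example an error tuple, if the driver produces one) is deliberately left unhandled.

```erlang
-module(rocker_get_sketch).
-export([value_or/2, fetch/3, fetch_cf/4]).

%% Map the `{ok, Value} | notfound` result of a read to a plain value.
value_or({ok, Value}, _Default) -> Value;
value_or(notfound, Default) -> Default.

%% Read a key from the common key space, falling back to Default.
fetch(Db, Key, Default) ->
    value_or(rocker:get(Db, Key), Default).

%% Same as fetch/3, but inside the column family Cf.
fetch_cf(Db, Cf, Key, Default) ->
    value_or(rocker:get_cf(Db, Cf, Key), Default).
```

Keeping the pattern match in value_or/2 separate from the database calls makes the branching logic easy to check without opening a database.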

Iterators

As you know, one of the basic principles of rocksdb is that keys are stored in sorted order. This feature is very useful in real-world tasks. To use it we need data iterators. In rocksdb, there are several modes of traversing data (detailed code examples can be found in the tests):

From the beginning of the table: {'start'}
From the end of the table: {'end'}
Starting from a specific key, moving forward: {'from', Key, forward}
Starting from a specific key, moving backward: {'from', Key, reverse}

These modes also work for traversing data stored in column families.

Creating an iterator

```
rocker:iterator(Db, {'start'}) -> {ok, Iter}.
```

Checking an iterator

```
rocker:iterator_valid(Iter) -> {ok, true} | {ok, false}.
```

Creating an iterator for a CF

```
rocker:iterator_cf(Db, Cf, {'start'}) -> {ok, Iter}.
```

Creating a prefix iterator

The prefix iterator requires the prefix length to be specified explicitly when the database is opened.

```
{ok, Db} = rocker:open(Path, #{
    prefix_length => 3
}).
```

An example of creating an iterator with the prefix "aaa":

```
{ok, Iter} = rocker:prefix_iterator(Db, <<"aaa">>).
```

Creating a prefix iterator for a CF

Like the previous prefix iterator, this one requires prefix_length to be set explicitly for the column family.

```
{ok, Iter} = rocker:prefix_iterator_cf(Db, Cf, <<"aaa">>).
```

Getting the next item

The method returns the next key/value pair, or ok if the iterator is exhausted.

```
rocker:next(Iter) -> {ok, <<"key">>, <<"value">>} | ok.
```
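
Walking an iterator to the end is a simple recursive loop over these two results. The sketch below is one possible way to write it and is not part of rocker: the module name and both fold functions are hypothetical, and the only driver call used is rocker:next/1. The four-argument variant takes the "next" function as a parameter so that the loop can be tried without a database.

```erlang
-module(rocker_iter_sketch).
-export([fold/3, fold/4]).

%% Fold Fun(Key, Value, Acc) over everything the iterator still yields.
fold(Iter, Fun, Acc) ->
    fold(fun rocker:next/1, Iter, Fun, Acc).

%% Same loop with the "next" function passed in explicitly.
%% Next(Iter) must return {ok, Key, Value} or ok, like rocker:next/1.
fold(Next, Iter, Fun, Acc) ->
    case Next(Iter) of
        {ok, Key, Value} -> fold(Next, Iter, Fun, Fun(Key, Value, Acc));
        ok -> Acc
    end.
```

For example, passing fun(K, V, Acc) -> [{K, V} | Acc] end with an empty list as the initial accumulator collects all remaining pairs, in reverse iteration order.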

Transactions

A requirement that comes up fairly often is writing changes to a group of keys at once. Rocker lets you combine CRUD operations both within the common key space and within CFs.
This example illustrates working with transactions:

```
{ok, 6} = rocker:tx(Db, [
    {put, <<"k1">>, <<"v1">>},
    {put, <<"k2">>, <<"v2">>},
    {delete, <<"k0">>, <<"v0">>},
    {put_cf, Cf, <<"k1">>, <<"v1">>},
    {put_cf, Cf, <<"k2">>, <<"v2">>},
    {delete_cf, Cf, <<"k0">>, <<"v0">>}
]).
```
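
In practice the operation list is usually built from data rather than written by hand. The following sketch shows one way to do that for writes. The module and its functions are hypothetical helpers, only rocker:tx/2 and the {put, ...} and {put_cf, ...} tuple shapes are taken from the example above, and delete operations are left out on purpose.

```erlang
-module(rocker_tx_sketch).
-export([put_ops/1, put_cf_ops/2, write_all/3]).

%% Turn [{Key, Value}] into put operations for the common key space.
put_ops(Pairs) ->
    [{put, Key, Value} || {Key, Value} <- Pairs].

%% Turn [{Key, Value}] into put operations for the column family Cf.
put_cf_ops(Cf, Pairs) ->
    [{put_cf, Cf, Key, Value} || {Key, Value} <- Pairs].

%% Write the pairs to both places in a single rocker:tx/2 call.
write_all(Db, Cf, Pairs) ->
    rocker:tx(Db, put_ops(Pairs) ++ put_cf_ops(Cf, Pairs)).
```

Because the whole group goes through one rocker:tx/2 call, it is submitted as a single transaction rather than as a series of independent writes.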

Performance

The test suite includes a performance test. On my machine it shows about 30k RPS for writes and 200k RPS for reads. Under real conditions you can expect 15-20k RPS for writes and about 120k RPS for reads, with an average data size of about 1 KB per key and more than 1 billion keys in total.

Conclusion

The development and application of Rocker in our project allowed us to reduce the response time of the system, increase reliability, and reduce the restart time. These advantages were obtained with minimal development and implementation costs.

Personally, I concluded that for Erlang projects that need optimization, Rust is an excellent fit. In Erlang you can quickly and efficiently implement 95% of the code, and in Rust you can rewrite or add the 5% that forms the bottleneck, without reducing the overall reliability of the system.

P.S. I also have positive experience developing a NIF for arbitrary-precision arithmetic in Erlang, which could become a separate article. I would like to ask: is the topic of NIFs interesting to the Rust community?
