Under the hood of Redis: Strings

    If you already know why the simple string `strings` occupies 56 bytes of RAM in Redis, this article will probably not interest you. For everyone else, I will try to explain what strings are in Redis and why it is important for a developer using this database to understand how they are implemented. This knowledge is especially important if you are trying to calculate your application's actual memory consumption, if you are building a high-load statistics or data accounting system, or if, as often happens, you urgently need to figure out why your Redis instance has suddenly started consuming unexpectedly large amounts of memory.

    I will cover how strings are stored in Redis, which internal structures are used to store them, and what optimizations Redis applies under the hood; also, how to store large structures effectively, and in which situations it is better not to use strings or the structures built on top of them. Strings are the key structure in Redis: HSET / ZSET / LIST are built on top of them, adding a small overhead for their own internal representation. Why this article exists: I have been reading and actively answering questions under the redis tag on Stack Overflow for more than a year, and throughout this time I have seen a steady stream of questions that, one way or another, stem from developers not understanding how Redis works with RAM and what you have to pay for its high speed.

    How much memory is actually used depends on the operating system, the compiler, the bitness of your process, and the allocator in use (jemalloc by default in Redis). All further calculations are for Redis 3.0.5 built on a 64-bit server running CentOS 7.
    A small interlude seems absolutely necessary here for those who do not write C/C++ or are not very familiar with how things work at a low level. Let's monstrously simplify a few concepts to make the calculations easier to follow. When you declare a structure in a C/C++ program and it has unsigned int fields (4-byte unsigned integers), the compiler will align them to 8 bytes in actual RAM (on the x64 architecture). This article will also regularly mention the memory allocator, the component that allocates memory "intelligently". For example, jemalloc tries to optimize the speed of finding new memory blocks by relying on the alignment of allocated fragments. The allocation and alignment strategy of jemalloc is well documented, but here we will use the simplification that any allocated fragment is rounded up to the nearest power of two: ask for 24 bytes and 32 are allocated; ask for 61 and you get 64. This is a major simplification, but I hope it makes things clearer. These are details you usually do not need to think about in interpreted languages, but here I urge you to pay attention to them.
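The simplification above can be sketched as a toy helper. Note that this is only a model: real jemalloc uses a finer-grained table of size classes, so treat this as a rough upper-bound estimate, not jemalloc's actual behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of allocator size rounding: round the requested
 * allocation size up to the nearest power of two, starting from
 * an assumed minimal 8-byte class. Not real jemalloc behaviour,
 * just the simplification used throughout this article. */
static size_t toy_alloc_size(size_t request) {
    size_t size = 8;
    while (size < request) size <<= 1;
    return size;
}
```

With this model, requesting 24 bytes yields a 32-byte allocation, and requesting 61 bytes yields 64, exactly as in the examples above.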

    The concept and implementation of strings, authored by Salvatore Sanfilippo (aka antirez), lives in a Redis subproject called SDS (Simple Dynamic Strings, github.com/antirez/sds ):
    +--------+-------------------------------+-----------+
    | Header | Binary safe C alike string... | Null term |
    +--------+-------------------------------+-----------+
             |
             `-> Pointer returned to the user.
    


    This is a simple C structure whose header stores the current length and the amount of free space in the already allocated memory, followed by the string itself and the mandatory terminating zero that Redis adds for you. In sds strings we will be most interested in the cost of the header, the strategy for resizing, and the alignment penalties when allocating memory.
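The 3.0-era header and the pointer trick from the diagram above can be sketched as follows (slightly simplified from sds.h; error handling is minimal for brevity):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The Redis 3.0 header from sds.h: current length, free (already
 * allocated but unused) bytes, then the string bytes themselves. */
struct sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[];
};

typedef char *sds;

/* Redis hands out a pointer to buf, not to the header, so an sds
 * can be passed to any function expecting a plain C string. To get
 * the length back, step sizeof(struct sdshdr) bytes backwards. */
static size_t sdslen(const sds s) {
    struct sdshdr *sh = (struct sdshdr *)(s - sizeof(struct sdshdr));
    return sh->len;
}

/* Minimal sketch of string creation: a single allocation holds the
 * header, the payload, and the mandatory trailing zero. */
static sds sdsnew(const char *init) {
    size_t initlen = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (unsigned int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0';
    return sh->buf;
}
```

This is why the diagram shows the pointer returned to the user landing just past the header.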

    On July 4, 2015 a long story of string optimization ended; it should land in Redis 3.1. This optimization brings big memory savings on string headers (from 16% to 200% on synthetic tests) and removes the 512 MB limit on maximum string length. All this becomes possible thanks to a header whose size changes dynamically with the length of the string: the header will occupy only 3 bytes for strings up to 256 bytes long, 5 bytes for strings under 64 KB, 9 bytes for strings up to 512 MB (the current limit), and 17 bytes for strings whose size fits in a uint64_t (64-bit unsigned integer). By the way, with this change our production setup saves about 19.3% of memory (~42 GB). In the current 3.0.x, however, the header is simple: 8 bytes, plus 1 byte for the terminating zero. Let's estimate how much memory the string "strings" takes: 16 (header) + 7 (string length) + 1 (terminating zero) = 24 bytes (16 bytes for the header, because under our simplification the compiler aligns the two unsigned ints for you). For that request jemalloc will allocate 32 bytes. Let's set this aside for now (I hope it will become clear later why).

    What happens when a string is resized? Whenever you grow a string and the already allocated memory is not enough, Redis checks the new length against the constant SDS_MAX_PREALLOC (defined in sds.h as 1,048,576 bytes). If the new length is less than this value, twice the requested memory is allocated. If the string length already exceeds SDS_MAX_PREALLOC, the value of this constant is added to the requested length. This feature will matter in the story about "disappearing memory when using bitmaps". Incidentally, when allocating memory for a bitmap, twice the requested amount will always be allocated, due to how the SETBIT command is implemented (see setbitCommand in bitops.c).
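The growth policy described above (the core of sdsMakeRoomFor in sds.c) can be sketched like this, leaving the header and trailing zero out for clarity:

```c
#include <assert.h>
#include <stddef.h>

#define SDS_MAX_PREALLOC (1024*1024) /* from sds.h: 1,048,576 bytes */

/* Sketch of the preallocation policy: below SDS_MAX_PREALLOC the
 * new length is doubled; above it, the constant is simply added. */
static size_t sds_grow_target(size_t newlen) {
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;
    return newlen + SDS_MAX_PREALLOC;
}
```

This is exactly why, in the bitmap example later in the article, `val_sds_avail` comes out as 1,048,576: the string is longer than SDS_MAX_PREALLOC, so growing it preallocates a full extra megabyte.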

    At this point we could say that our string occupies 32 bytes in RAM (with alignment taken into account). Those who have read the advice from the hashedin.com folks (the redis memory optimization guide) may recall that they strongly recommend against strings shorter than 100 bytes, because storing a short string, say with the command `set foo bar`, spends ~96 bytes, of which 90 bytes are overhead (on a 64-bit machine). Sounds fishy? Let's figure it out.

    All values in Redis are stored in a structure of type redisObject. This lets Redis know the type of the value, its internal representation (called encoding in Redis), the data needed for LRU, the number of references to the value, and the value itself:
    +------+----------+-----+----------+-------------------------+
    | Type | Encoding | LRU | RefCount | Pointer to  data (ptr*) |
    +------+----------+-----+----------+-------------------------+
    


    A little later we will calculate its size for our string, taking the compiler's alignment and jemalloc's behaviour into account. Where strings are concerned, what matters most to us is which encodings are used to store them. Right now Redis uses three different storage strategies:

    1. REDIS_ENCODING_INT is quite simple. A string can be stored in this form if its value, cast to long, falls into the range LONG_MIN .. LONG_MAX. Thus the string “dict” would be stored in this encoding as the number 1952672100 (0x74636964). The same encoding is used for the pre-allocated range of shared values defined by REDIS_SHARED_INTEGERS (declared in redis.h, 10,000 by default); the values of this range are allocated as soon as Redis starts.
    2. REDIS_ENCODING_EMBSTR is used for strings up to 39 bytes long (the value of the constant REDIS_ENCODING_EMBSTR_SIZE_LIMIT from object.c). In this case the redisObject and the structure holding the sds string are placed in a single memory region returned by the allocator, which lets us calculate the alignment correctly. No less importantly, this matters for understanding memory fragmentation in Redis and how to live with it.
    3. REDIS_ENCODING_RAW is used for all strings whose length exceeds REDIS_ENCODING_EMBSTR_SIZE_LIMIT . In this case our ptr * stores an ordinary pointer to the memory region containing the sds string.
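The three-way choice above can be modeled in a few lines. This is a simplified sketch of the decision, not the real code from tryObjectEncoding / createStringObject (the real code, for instance, also rejects candidates longer than 20 characters before attempting numeric conversion, since LONG_MAX has 19 digits):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define REDIS_ENCODING_EMBSTR_SIZE_LIMIT 39 /* object.c, Redis 3.0 */

/* Simplified model of string encoding selection:
 *   - parses fully as a long  -> "int"
 *   - short enough            -> "embstr"
 *   - otherwise               -> "raw" */
static const char *string_encoding(const char *s) {
    size_t len = strlen(s);
    char *end;
    errno = 0;
    (void)strtol(s, &end, 10);
    if (len > 0 && *end == '\0' && errno == 0)
        return "int";
    if (len <= REDIS_ENCODING_EMBSTR_SIZE_LIMIT)
        return "embstr";
    return "raw";
}
```

So "12345" would be stored as an integer, "strings" as embstr, and any string longer than 39 bytes as raw.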

    EMBSTR appeared in 2012 and brought a 60-70% performance increase when working with short strings, but there is still no serious research into its effect on memory usage and fragmentation.

    Our string “strings” is only 7 bytes long, so its internal representation is EMBSTR. A string created this way is laid out in memory like this:
    +--------------+--------------+------------+--------+----+
    | robj data... | robj->ptr    | sds header | string | \0 |
    +--------------+-----+--------+------------+--------+----+
                         |                       ^
                         +-----------------------+
    

    Now we are ready to calculate how much RAM Redis will need to store our string “strings”:
    8 (type and encoding) + 8 (lru) + 8 (refcount) + 8 (ptr) + 16 (sds header) + 7 (the string itself) + 1 (terminating zero) = 56 bytes.

    The type and encoding in redisObject occupy only the 4 low and 4 high bits of the same number, so after alignment these two fields together take 8 bytes.

    Let's check that I am not leading you astray, and look at the encoding and the value. We will use a little-known command for debugging strings, DEBUG SDSLEN. By the way, the command is not in the official documentation; it was added in Redis 2.6 and can be very useful:
    set key strings
    +OK
    debug object key
    +Value at:0x7fa037c35dc0 refcount:1 encoding:embstr serializedlength:8 lru:3802212 lru_seconds_idle:14
    debug sdslen key
    +key_sds_len:3, key_sds_avail:0, val_sds_len:7, val_sds_avail:0
    

    The encoding used is embstr and the string length is 7 bytes (val_sds_len). What about the 96 bytes the hashedin.com folks were talking about? As I understand it, they were slightly mistaken: their `set foo bar` example requires 112 bytes of RAM (56 bytes for the value and the same for the key), of which 106 are overhead.

    A little earlier I promised a story about memory disappearing when using BITMAP. The gotcha I want to talk about constantly escapes the attention of some developers who use it; memory-optimization consultants such as redis-labs or datadog regularly make money on this. The "bit and byte level operations" family of commands appeared in Redis 2.2 and was immediately positioned as a silver bullet for real-time counters (for example, the article from Spool) that lets you save memory. The official memory optimization guide also carries the advertising slogan that for 100 million users this data will occupy only 12 MB of RAM. The documentation for SETBIT and SETRANGE warns about possible server lags when allocating large strings, but omits what seems to me an important section: when you should not use BITMAP, or when a SET is the better choice.

    Armed with an understanding of how strings grow in Redis, you can see that with bitmaps you should:
    • not use them for sparse data;
    • understand the ratio of payload to actual memory used (see the example below);
    • account for how your bitmap fills up over time.

    Consider an example. Suppose you have up to 10 million registered users and your ten-millionth user comes online:
    setbit online 10000000 1
    :0
    debug sdslen online
    +key_sds_len:6, key_sds_avail:0, val_sds_len:1250001, val_sds_avail:1048576
    

    Your actual memory consumption is 2,298,577 bytes, of which only 1,250,001 bytes are “useful” to you. Storing this single user cost you ~2.3 MB. With a SET you would have needed ~64 bytes (with 4 bytes of payload). You need to choose your aggregation intervals so as to reduce the sparseness of the data, and try to keep the bitmap at least ~30% filled; then you will actually be using memory effectively for this data structure. My point is that if you have a multi-million audience but only, say, 10,000 to 100,000 people are online, using a bitmap for this purpose can be a waste of memory.
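The arithmetic behind this example can be captured in two small helpers. The break-even estimate is my own back-of-the-envelope calculation, built on the article's ~64-bytes-per-SET-member figure, not something from the Redis sources:

```c
#include <assert.h>
#include <stddef.h>

/* Payload bytes a bitmap string needs to hold a given zero-based
 * bit offset: offset/8 + 1 (what SETBIT grows the string to,
 * before the preallocation discussed earlier). */
static size_t bitmap_bytes(size_t max_bit_offset) {
    return max_bit_offset / 8 + 1;
}

/* Rough break-even point (an assumption, not from Redis sources):
 * if each SET member costs ~64 bytes, a bitmap only starts winning
 * once more than bitmap_bytes(max_id)/64 users are present. */
static size_t bitmap_breakeven_users(size_t max_id) {
    return bitmap_bytes(max_id) / 64;
}
```

For a maximum user id of 10,000,000 this gives a 1,250,001-byte payload (matching val_sds_len above) and a break-even of roughly 19,500 simultaneously online users: below that, a SET is cheaper.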

    Finally, resizing strings in Redis means constantly reallocating memory blocks. Memory fragmentation is another Redis specialty that developers rarely think about.
    info memory
    # Memory
    used_memory:506920
    used_memory_human:495.04K
    used_memory_rss:7565312
    used_memory_peak:2810024
    used_memory_peak_human:2.68M
    used_memory_lua:36864
    mem_fragmentation_ratio:14.92
    mem_allocator:jemalloc-3.6.0
    

    The mem_fragmentation_ratio metric shows the ratio between the memory allocated by the operating system ( used_memory_rss ) and the memory used by Redis ( used_memory ). Both used_memory and used_memory_rss already include the data itself as well as the cost of the internal Redis structures used for its storage and presentation. Redis treats RSS (Resident Set Size) as the amount of memory allocated by the operating system, which, besides user data (and the cost of its internal representation), also includes the cost of fragmentation in how the OS physically allocates that memory.

    How should you read mem_fragmentation_ratio? A value of 2.1 tells us that we are using 2.1 times as much memory as our data actually needs, i.e. 110% overhead. A value below 1 signals that memory has run out and the operating system is swapping.
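The metric itself is just a division of the two INFO fields. A sketch, using the numbers from the INFO output above:

```c
#include <assert.h>
#include <math.h>

/* mem_fragmentation_ratio as reported by INFO: RSS allocated by
 * the OS divided by the memory Redis thinks it is using. */
static double frag_ratio(double used_memory_rss, double used_memory) {
    return used_memory_rss / used_memory;
}
```

Plugging in used_memory_rss = 7,565,312 and used_memory = 506,920 from the snippet above reproduces the reported ratio of 14.92, i.e. the OS holds almost fifteen times more memory for this process than Redis is actually using.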

    In practice, if mem_fragmentation_ratio falls outside the range 1 - 1.5, something is wrong. Try the following:
    • Restart your Redis instance. The longer an instance under active writes has run without a restart, the higher its mem_fragmentation_ratio will be, largely due to the allocator's behaviour. In particular, this is guaranteed to help if there is a big gap between used_memory and used_memory_peak; the latter shows the maximum amount of memory your Redis instance has ever needed since it started.
    • Look at what data you plan to store, and how much. If 4 GB is enough for your data, use a 32-bit build of Redis. If you are on a 64-bit build, at least try loading your dump into a 32-bit version (RDB does not depend on the bitness of Redis, so an RDB file created by a 64-bit instance can easily be loaded by a 32-bit one). This almost guarantees a reduction in fragmentation (and in the amount of memory used) of ~7%, thanks to savings on alignment.
    • If you understand the differences and trade-offs, try changing the allocator. Redis can be built with glibc malloc, jemalloc (read what the Facebook engineers think about it), or tcmalloc.


    When talking about fragmentation, I am not considering the specifics of Redis with LRU enabled, or the additional complications that come with a large number of plain string keys; all of that deserves a separate article. I would be grateful if you share whether it is worth writing about, and what else seems important to you when working with Redis.

    User pansa rightly notes that in a swapping situation Redis will not recalculate used_memory_rss after the operating system returns part of the RAM to the process; Redis will recalculate this value only when the data is accessed again.
