Three types of memory leaks

Original author: Nelson Elhage
Hello colleagues.

Our long search for timeless bestsellers on code optimization has so far produced only its first results, but we are glad to report that we have just finished translating Ben Watson's legendary book "Writing High-Performance .NET Code". It should reach stores around April; watch for the announcements.

Today we invite you to read a thoroughly practical article on the most pressing kinds of memory leaks, written by Nelson Elhage of Stripe.

So, you have a program that uses more and more memory the longer it runs. You have probably already recognized this as a likely sign of a memory leak.
However, what exactly do we mean by “memory leak”? In my experience, memory leaks fall into three main categories, each with its own characteristic behavior, and each requiring its own tools and techniques to debug. In this article I want to describe all three classes and suggest how to recognize which one you are dealing with and how to find the leak.

Type (1): unreachable allocated memory

This is the classic memory leak in C/C++. Someone allocated memory with new or malloc and never called free or delete to release it once they were done with it.

  char *leaked = malloc(4096);
  /* Oops, forgot to call free() */

How to determine that a leak belongs to this category

  • If you are writing C or C++, especially C++ that does not consistently use smart pointers to manage memory lifetimes, this is the first possibility to consider.
  • If the program runs in a garbage-collected environment, a leak of this type may be caused by a native code extension, but you should rule out types (2) and (3) first.

How to find such a leak

  • Use ASAN . Use ASAN. Use ASAN.
  • Use another detector. I have used Valgrind and tcmalloc's heap tools; other environments have their own tools.
  • Some memory allocators let you dump a heap profile that shows all allocations that have not been freed. If you have a leak, after a while it will account for almost all live allocations, so it should be easy to spot.
  • If nothing else helps, take a core dump and study it as hard as you can. But you definitely should not start there.

Type (2): unplanned, long-lived memory allocations

These situations are not “leaks” in the classical sense, because a reference to the memory still exists somewhere, so it could eventually be freed (if the program gets that far without running out of memory).
Situations in this category can arise for many specific reasons. The most common are:

  • Inadvertent accumulation of state in a global structure; for example, an HTTP server that appends every Request object it receives to a global list.
  • Caches without a well-thought-out expiration policy. For example, an ORM cache that retains every object ever loaded, used in a migration job that loads every record in a table.
  • Capturing too much state in a closure. This is especially common in JavaScript, but can happen in other environments too.
  • More broadly, unintentionally retaining every element of an array or stream when you meant to process the elements in a streaming fashion.
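
The cache case above can be illustrated with a minimal Python sketch. The BoundedCache class and the sizes used here are hypothetical and exist only for illustration; the eviction policy (least recently used) is one assumption among many possible expiration policies.

```python
from collections import OrderedDict


class BoundedCache:
    """A tiny LRU cache that holds at most max_entries items (illustrative only)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, key, load):
        # On a hit, mark the entry as recently used and return it.
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]
        # On a miss, load the value and evict the oldest entry if over capacity.
        value = load(key)
        self._items[key] = value
        if len(self._items) > self.max_entries:
            self._items.popitem(last=False)
        return value

    def __len__(self):
        return len(self._items)


unbounded = {}                          # cache with no expiration policy
bounded = BoundedCache(max_entries=100)

# Simulate a job that touches every record in a large table.
for record_id in range(10_000):
    unbounded[record_id] = bytearray(64)
    bounded.get(record_id, lambda _id: bytearray(64))

print(len(unbounded), len(bounded))  # 10000 100
```

The unbounded dictionary grows with the size of the table, while the bounded cache stays at its configured limit no matter how many records pass through it.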

How to determine that a leak belongs to this category

  • If the program runs in a garbage-collected environment, this is the first possibility to consider.
  • Compare the heap size reported by the garbage collector's statistics with the memory size reported by the operating system. If the leak falls into this category, the numbers will be comparable and, most importantly, will track each other over time.

How to find such a leak

Use the profilers or heap dump tools available in your environment. I know of guppy for Python and memory_profiler for Ruby, and I have also written tooling directly on top of ObjectSpace in Ruby.
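
As one possible illustration, here is a minimal sketch that uses tracemalloc from the Python standard library (swapped in here in place of guppy, which the article mentions) to compare two heap snapshots. The retained list and the handle_request function are hypothetical stand-ins for a real application that accumulates state by mistake.

```python
import tracemalloc

retained = []  # hypothetical global list that accumulates state by mistake


def handle_request(i):
    # Each "request" leaves a 10 kB buffer behind in the global list.
    retained.append(bytearray(10_000))


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(200):
    handle_request(i)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group the difference by source line, largest growth first.
top = after.compare_to(before, "lineno")
for stat in top[:3]:
    print(stat)
```

The first entries of the report should point at the line that appends to the global list, since that is where nearly all of the growth between the two snapshots was allocated.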

Type (3): free, but unused or unusable memory

This category is the hardest to characterize, but it is also the most important to understand and account for.

This type of leak lives in the gray area between memory that is considered “free” from the point of view of the allocator inside the VM or runtime, and memory that is “free” from the point of view of the operating system. The most common (but not the only) cause is heap fragmentation. Some allocators also simply never return memory to the operating system once it has been allocated.

A case of this kind can be seen in a short program written in Python:

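import sys
from guppy import hpy
hp = hpy()
# RSS as seen by the OS: field 24 of /proc/self/stat, in pages (4096-byte pages assumed)
def rss():
    return 4096 * int(open('/proc/self/stat').read().split(' ')[23])
# Heap size as seen by Python (via guppy)
def gcsize():
    return hp.heap().size
rss0, gc0 = (rss(), gcsize())
# Allocate 200*1024 buffers of 1kb each
buf = [bytearray(1024) for i in range(200*1024)]
print("start rss={}   gcsize={}".format(rss()-rss0, gcsize()-gc0))
# Keep every other buffer, freeing the rest
buf = buf[::2]
print("end   rss={}   gcsize={}".format(rss()-rss0, gcsize()-gc0))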

We allocate 200,000 1kb buffers and then keep only every other one, freeing the rest. At each step we print the state of memory as seen by the operating system and as seen by Python's own garbage collector.

I get something like this on my laptop:

start rss=232222720 gcsize=11667592
end rss=232222720 gcsize=5769520

We can see that Python really did free half of the buffers, because gcsize dropped to roughly half of its peak value, but it could not return a single byte of that memory to the operating system. The freed memory remains available to this Python process, but to no other process on the machine.

Such free but unused memory can be either a problem or harmless. If a Python program behaves this way and then goes on to allocate another batch of 1kb chunks, that space is simply reused, and all is well.

But if we did this during initial setup and allocated very little memory afterwards, or if all subsequent allocations were 1.5kb each and so did not fit into the 1kb holes left behind, then all of that memory would sit wasted forever.

Problems of this kind are especially relevant in one specific environment: multiprocess server systems written in languages such as Ruby or Python.

Suppose we set up a system in which:

  • Each server runs N single-threaded workers that handle requests concurrently. For concreteness, let N = 10.
  • Each worker normally has a nearly constant memory footprint. For concreteness, say 500MB.
  • At some low frequency we receive requests that need far more memory than the median request. For concreteness, suppose that once a minute we receive a request that needs an extra 1GB of memory while it runs, and that this memory is freed once the request has been handled.

Once a minute a “whale” request arrives, and we dispatch it to one of the 10 workers, say roughly at random. Ideally, the worker would allocate 1GB of RAM while handling the request and return it to the operating system when done, so that it can be reused later. Handling requests this way indefinitely, the server would need only 10 * 500MB + 1GB = 6GB of RAM.

However, suppose that because of fragmentation or for some other reason, the virtual machine can never return this memory to the operating system. That is, the amount of RAM it holds from the OS equals the largest amount of memory it has ever needed at one time. In that case, once a particular worker serves such a memory-hungry request, that process's memory footprint is permanently inflated by a full gigabyte.
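
To make the effect of this assumption concrete, here is a small simulation sketch using the numbers from the scenario above. The function name, the fixed random seed, and the 120-minute duration are assumptions chosen for illustration; the model simply records each worker's high-water mark and never lowers it.

```python
import random

WORKERS = 10
BASELINE_GB = 0.5   # steady-state memory per worker
WHALE_GB = 1.0      # extra memory needed while serving a whale request


def simulate(minutes, seed=0):
    """Total memory (GB) after each whale request, if workers never return memory."""
    rng = random.Random(seed)
    high_water = [BASELINE_GB] * WORKERS
    totals = []
    for _ in range(minutes):
        worker = rng.randrange(WORKERS)  # the whale lands on a random worker
        high_water[worker] = max(high_water[worker], BASELINE_GB + WHALE_GB)
        totals.append(sum(high_water))
    return totals


ideal = WORKERS * BASELINE_GB + WHALE_GB          # memory is returned after each whale
upper_bound = WORKERS * (BASELINE_GB + WHALE_GB)  # every worker has served a whale
totals = simulate(minutes=120)

print("ideal:", ideal, "upper bound:", upper_bound)
print("after first whale:", totals[0], "after two hours:", totals[-1])
```

In this model the total never decreases: it steps up by 1GB each time a whale reaches a worker that has not seen one before, and stays flat otherwise.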

When you start the server, you will see 10 * 500MB = 5GB of memory in use. As soon as the first large request arrives, the first worker grabs 1GB of memory and never gives it back, so total usage jumps to 6GB. Some later whale requests will land on a process that has already handled a whale, and then memory usage does not change. But sometimes a large request will go to a different worker, and memory grows by another 1GB, and so on until every worker has handled such a request at least once. At that point these operations occupy 10 * (500MB + 1GB) = 15GB of RAM, far more than the ideal 6GB! Moreover, if we consider

How to determine that a leak belongs to this category

  • Compare the heap size reported by the garbage collector's statistics with the memory size (RSS) reported by the operating system. If the leak falls into this (third) category, the numbers will diverge over time.
  • I like to set up my application servers so that both numbers are periodically reported to my time series infrastructure, which makes them easy to graph.
  • On Linux, read the operating system's view from field 24 of /proc/self/stat, and the memory allocator's view through a language-specific or VM-specific API.
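
Reading field 24 might look like the following sketch. It is Linux-only, the helper names are hypothetical, and it assumes the page size reported by os.sysconf rather than a hard-coded 4096. It also parses around the process name in field 2, which is wrapped in parentheses and may itself contain spaces.

```python
import os


def rss_pages_from_stat(stat_text):
    """Extract field 24 (rss, in pages) from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is in parentheses and may contain spaces,
    # so only split the part that follows the last ')'.
    after_comm = stat_text.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 24 sits at index 24 - 3 = 21.
    return int(after_comm[21])


def current_rss_bytes():
    """Resident set size of the current process in bytes (Linux only)."""
    with open("/proc/self/stat") as f:
        pages = rss_pages_from_stat(f.read())
    return pages * os.sysconf("SC_PAGE_SIZE")


# Only attempt the real read where /proc is available.
if os.path.exists("/proc/self/stat"):
    print("rss bytes:", current_rss_bytes())
```

Sampling this value next to the heap size reported by the runtime, at regular intervals, gives the two series to compare.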

How to find such a leak

As already mentioned, this category is somewhat more insidious than the previous ones, since the problem often arises even when every component is working “as intended”. Still, a number of good practices can help reduce the impact of such “virtual leaks”:

  • Restart your processes more often. If the problem grows slowly, restarting all of the application's processes every 15 minutes or every hour may be perfectly feasible.
  • An even more radical approach: teach every process to restart itself as soon as its memory footprint exceeds a threshold or grows by a specified amount. However, make sure that your entire server fleet cannot end up restarting spontaneously in lockstep.
  • Change the memory allocator. Over the long run, tcmalloc and jemalloc usually handle fragmentation much better than the default allocator, and they are easy to experiment with using the LD_PRELOAD environment variable.
  • Find out whether individual requests consume much more memory than others. At Stripe, the API servers measure RSS (resident set size) before and after serving each API request and log the delta. We can then easily query our log aggregation systems to see whether particular endpoints or users (or other patterns) account for the spikes in memory consumption.
  • Tune the garbage collector or memory allocator. Many of them have configurable parameters that control how aggressively memory is returned to the operating system and how hard they work to avoid fragmentation, along with other useful options. This area is also quite tricky: make sure you understand exactly what you are measuring and optimizing, and try to find an expert on the relevant virtual machine to consult.
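
Two of the practices above, logging the RSS delta of each request and restarting once a threshold is exceeded, can be sketched together. Everything here is an assumption made for illustration: the function names, the use of random jitter as one way to keep a fleet from restarting in lockstep, and the simulated RSS counter that stands in for a real measurement.

```python
import random


def make_restart_limit(base_limit, jitter_fraction, rng):
    """Pick a per-worker limit spread randomly around base_limit."""
    jitter = rng.uniform(-jitter_fraction, jitter_fraction)
    return int(base_limit * (1 + jitter))


def serve(requests, handler, read_rss, limit, log):
    """Handle requests, log each RSS delta, and stop once RSS exceeds limit."""
    for request in requests:
        before = read_rss()
        handler(request)
        after = read_rss()
        log(request, after - before)
        if after > limit:
            return "restart"
    return "done"


# Each worker gets a slightly different limit, so they do not all restart at once.
rng = random.Random(42)
limits = [make_restart_limit(1000, 0.10, rng) for _ in range(10)]

# Demo with a simulated RSS counter (arbitrary units) instead of a real process.
state = {"rss": 500}


def fake_handler(request):
    state["rss"] += request  # pretend each request permanently grows RSS by its size


deltas = []
result = serve([1, 2, 1000, 3], fake_handler, lambda: state["rss"],
               limit=1200, log=lambda req, delta: deltas.append((req, delta)))
print(result, deltas)  # restart [(1, 1), (2, 2), (1000, 1000)]
```

In a real worker, read_rss would return the process's actual RSS and the log callback would write to whatever logging system is in use, so that outlier requests can be found later by querying for large deltas.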
