Alternative memory allocators
- Translation
Posted by Stephen Tovey at 2:29 am in the "programming" category
A foreword from the translator: this article, recommended by Alena C++, is aimed mainly at console game developers, but it will probably be useful to anyone who has to deal with heavy dynamic memory allocation. Fans of comparing memory management in C++ and Java may also find something to think about here.
Original with interesting discussion in the comments: altdevblogaday.org/2011/02/12/alternatives-to-malloc-and-new
Mandatory Opening Fable
I really like sushi. It is tasty and convenient. I like that you can walk into a conveyor-belt sushi restaurant, take a seat, and grab something fresh and tasty from the belt without spending your whole lunch hour. But for all that, the one thing I really would not want to be is a waiter in a sushi restaurant, especially if it were my job to seat the visitors.
A party of three comes in and asks to be seated. Pleased to have customers, you kindly show them to their seats. No sooner have they sat down and ordered some tasty Tekka Maki than the door opens again and four more walk in! Business is booming today! The restaurant now looks like this ...
As lunchtime gets going, the restaurant quickly fills to capacity. Finally, having eaten all they could and paid the bill, the first party (the three) leaves, and a couple walks in to take the newly vacant seats. This happens a few more times, and finally the restaurant looks like this ...
And then a party of four arrives and asks to be seated. Pragmatic by nature, you have been carefully tracking how many empty seats are left, and look, today is your day: there are exactly four! There is one "but": these four are terribly sociable and want to sit next to each other. You look around desperately, but even though you have four empty seats, you cannot seat this party together! It would be rude to ask the existing diners to move mid-meal, so, unfortunately, you have no choice but to turn the group away, and they may never come back. All this is terribly sad.
If you are wondering how this fable relates to game development, read on. In our programs, we must manage memory. Memory is a precious resource that must be carefully managed, just like the seats in our metaphorical sushi restaurant. Every time we dynamically allocate memory, we reserve space in a thing called the heap. In C and C++ this is usually done with the malloc function and the new operator, respectively. Continuing our fishy analogy (I'll stop, I promise!), this is like our fearless diners asking to be shown to their seats. The really rotten part is that our hypothetical scenario also plays out when working with memory, and its consequences are far worse than a couple of empty tummies. It is called fragmentation, and it is a nightmare!
What is wrong with malloc and new?
Unfortunately, the fish-based fables end here, and the rest of this text is about malloc and new, and why their usefulness is very limited in the context of embedded systems (such as game consoles). Although fragmentation is only one aspect of the problems associated with dynamic memory allocation, it is perhaps the most serious. But before we come up with alternatives, we need to look at all the problems:
- malloc and new try to be a one-size-fits-all solution for all programmers ...
They will hand you a few bytes just as readily as a few megabytes. They have no idea what the data is that they are giving you space for, or what its lifetime will be. In other words, they lack the bigger picture that programmers have.
- Relatively poor performance ...
Allocating memory through the standard library functions or operators usually requires kernel calls. This can have all sorts of nasty side effects on your application's performance, including flushing of translation lookaside buffers, copying blocks of memory, and so on. This reason alone is enough to make dynamic memory allocation very expensive in terms of performance. The cost of free or delete in some allocation schemes can also be high, since in many cases a lot of extra work is done to try to improve the state of the heap for subsequent allocations. "Extra work" is a rather vague term, but it can mean coalescing memory blocks, and in some cases it can mean walking the entire list of allocations your application has made! This is definitely not something you want to spend precious processor cycles on if you can avoid it!
- They cause heap fragmentation ...
If you have never worked on a project that suffers from fragmentation problems, consider yourself very lucky, but the rest of us know that heap fragmentation can be a complete and utter nightmare.
- Tracking dynamically allocated memory can be a daunting task ...
With dynamic allocation comes the inevitable risk of memory leaks. I am sure we all know what memory leaks are, but if not, read up on them. Most studios build infrastructure on top of their dynamic allocations to keep track of what memory is being used and where.
- Poor locality of reference ...
In essence, there is no way to know where the memory returned by malloc or new will be relative to other memory in your application. This can lead to more costly cache misses than necessary, leaving us hopping around memory like it's hot coals.

So what is the alternative? The purpose of this post is to give you information (and a bit of code!) about several different alternatives that can be used instead of malloc and new to deal with the problems just described.
Modest linear allocator
The title of this section rather gives it away: this allocator is by far the simplest of all I am going to present, although in truth they are all simple (which is what makes them beautiful). A linear allocator essentially forgoes any fine-grained freeing of allocated resources and doles out pieces of memory one after another, linearly, from a fixed pool. Look at the picture.
A good example of where a linear allocator is extremely useful can be found almost everywhere in SPU programming (the Synergistic Processing Units of the PlayStation 3's Cell processor). The persistence of data in local storage beyond the scope of the currently executing job is not important, and in many cases the amount of data coming into local storage (at least data that varies in size) is fairly limited. However, do not be fooled: this is far from its only application. Here is an example of how to implement a simple linear allocator in C.
/** Header structure for linear buffer. */
typedef struct _LinearBuffer {
    uint8_t *mem;       /*!< Pointer to buffer memory. */
    uint32_t totalSize; /*!< Total size in bytes.      */
    uint32_t offset;    /*!< Offset.                   */
} LinearBuffer;

/* non-aligned allocation from a linear buffer. */
void* linearBufferAlloc(LinearBuffer* buf, uint32_t size) {
    if(!buf || !size)
        return NULL;

    const uint32_t newOffset = buf->offset + size;
    if(newOffset <= buf->totalSize) {
        void* ptr = buf->mem + buf->offset;
        buf->offset = newOffset;
        return ptr;
    }
    return NULL; /* out of memory */
}
By applying bitwise operations to the offset value during allocation, aligned allocations can be supported. This can be very useful, but be aware that depending on the size of the data you place in your buffer (and in some cases the order in which allocations are made), you can end up with unused space between allocations in the buffer. For reasonably sized alignments this is generally acceptable, but it can become extremely wasteful if you allocate with much larger alignments, say 1 MB. In such situations the order in which you place objects in the linear allocator can have a dramatic effect on the amount of wasted memory. To reset the allocator (perhaps at the end of a level), all you need to do is set the offset to zero. As always when dealing with dynamic memory, clients of the allocator must be sure they hold no pointers into memory freed this way, otherwise you risk corrupting the allocator. Any C++ objects you placed in the buffer will need their destructors called manually.
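To make that bitwise-alignment trick concrete, here is a minimal sketch of what an aligned variant might look like. The function name `linearBufferAllocAligned` and the power-of-two restriction are my own assumptions, not code from the original article:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct _LinearBuffer {
    uint8_t *mem;       /*!< Pointer to buffer memory. */
    uint32_t totalSize; /*!< Total size in bytes.      */
    uint32_t offset;    /*!< Offset.                   */
} LinearBuffer;

/* Aligned allocation: round the current offset up to the next multiple
 * of 'alignment' (which must be a power of two) before handing out the
 * pointer. The rounding can waste a few bytes between allocations. */
void* linearBufferAllocAligned(LinearBuffer* buf, uint32_t size, uint32_t alignment) {
    if(!buf || !size || !alignment || (alignment & (alignment - 1)))
        return NULL; /* alignment must be a non-zero power of two */

    const uint32_t alignedOffset = (buf->offset + (alignment - 1)) & ~(alignment - 1);
    if(alignedOffset + size <= buf->totalSize) {
        void* ptr = buf->mem + alignedOffset;
        buf->offset = alignedOffset + size;
        return ptr;
    }
    return NULL; /* out of memory */
}
```

Note that the alignment is only relative to the start of the buffer, so `mem` itself must already be aligned at least as strictly as any alignment you request.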
Stack
A few caveats before we start talking about this allocator. When I say "stack allocator" here, I am not talking about the call stack; however, as we will see later, the call stack does play an important role in getting rid of dynamic heap allocations. So what am I talking about, then? The linear allocator described above is a great solution for many allocation tasks, but what if you want a little more control over how you free your resources? The stack allocator gives you that. Toward the end of the linear allocator description, I noted that to reset it you simply set the offset to zero, which frees all of the allocator's resources. The principle of setting the offset to a specific value is the basic idea behind the stack allocator's implementation. If you are not familiar with the stack as a data structure, now is probably a good time to fix that, for example here. Done? Excellent. Each piece of memory handed out by our stack allocator can optionally have associated with it a handle representing the state of the allocator immediately before the allocation. This means that if we restore that state (by resetting the offset), we effectively 'free' the memory so it can be reused. This is shown in the diagram below.
This can be very useful if you want to temporarily allocate memory for data with a limited lifetime. For example, the limited lifetime of a particular function or subsystem. This strategy can also be useful for things like level resources that have a clearly defined release order (inverse to the placement order). Here is a small C example illustrating what was just said.
typedef uint32_t StackHandle;

/* Header structure for the stack allocator; the fields mirror LinearBuffer. */
typedef struct _StackBuffer {
    uint8_t *mem;       /*!< Pointer to buffer memory. */
    uint32_t totalSize; /*!< Total size in bytes.      */
    uint32_t offset;    /*!< Current offset.           */
} StackBuffer;

void* stackBufferAlloc(StackBuffer* buf, uint32_t size, StackHandle* handle) {
    if(!buf || !size)
        return NULL;

    const uint32_t currOffset = buf->offset;
    if(currOffset + size <= buf->totalSize) {
        uint8_t* ptr = buf->mem + currOffset;
        buf->offset += size;
        if(handle)
            *handle = currOffset; /* set the handle to the old offset */
        return (void*)ptr;
    }
    return NULL;
}

void stackBufferSet(StackBuffer* buf, StackHandle handle) {
    buf->offset = handle;
}
Double-ended stack
If you have absorbed the stack concept described above, we can move on to the double-ended stack. This is like the stack allocator, except that there are two stacks: one growing from the bottom of the buffer up, and the other growing from the top of the buffer down. See the picture:
Where might this be used? A good example would be any situation where you have data of a similar type but with completely different lifetimes, or data that is closely related and should live in the same memory area but differs in size. It should be noted that the point where the two stacks' offsets meet is not fixed. In particular, the stacks need not each occupy half the buffer: the bottom stack can grow very large while the top stays small, and vice versa. This extra flexibility is sometimes needed to make the best use of a memory buffer. I think you can do without advice from Captain Obvious, so I will not give code for the double-ended stack allocator here, since it follows naturally from the one-ended stack allocator we just discussed.
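Still, for readers who want it spelled out, the two allocation paths might be sketched like this (the names and struct layout are my own illustration, not code from the original article):

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed header: two offsets walking toward each other in one buffer. */
typedef struct _DoubleStack {
    uint8_t *mem;
    uint32_t totalSize;
    uint32_t bottomOffset; /* grows upward from 0           */
    uint32_t topOffset;    /* grows downward from totalSize */
} DoubleStack;

/* Allocate from the bottom stack (grows up). */
void* doubleStackAllocBottom(DoubleStack* s, uint32_t size) {
    if(!s || !size || size > s->topOffset - s->bottomOffset)
        return NULL; /* the two stacks would collide */
    void* ptr = s->mem + s->bottomOffset;
    s->bottomOffset += size;
    return ptr;
}

/* Allocate from the top stack (grows down). */
void* doubleStackAllocTop(DoubleStack* s, uint32_t size) {
    if(!s || !size || size > s->topOffset - s->bottomOffset)
        return NULL; /* the two stacks would collide */
    s->topOffset -= size;
    return s->mem + s->topOffset;
}
```

Freeing works exactly as in the one-ended version: roll `bottomOffset` back down, or `topOffset` back up, to a previously saved handle.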
Pool
We will now shift focus a little away from the family of allocators described above, which are all based on linearly moving pointers or offsets, and move on to something different. The pool allocator, which I will talk about now, is designed to work with data of a single type or size. It divides the memory buffer it manages into chunks of equal size, and then lets the client allocate and free these chunks at will (see the diagram below). To do this, it must constantly keep track of which chunks are free, and I know of several ways to implement this. I personally avoid implementations that use a stack of indices (or pointers) to the free chunks, because they need extra space, which can often be unacceptable, but they do exist. The implementation I will describe here requires no additional space for managing the free chunks in the pool. In fact, the header structure of this allocator has only two fields, making it the smallest of all the allocators described in this post.
So how does it work? To manage the free chunks, we will use a data structure known as a linked list. Again, if you are not familiar with this data structure, try reading up on it. Having experience with the PlayStation 3 and Xbox 360, where memory access is expensive, I generally find node-based data structures (like the linked list) rather bleak, but this is probably one of their uses that I approve of. The trick is that while the allocator's header holds a pointer to the linked list, the list itself is spread throughout the pool, occupying the same space as the free chunks in the memory buffer. When we initialise the allocator, we walk the chunks of the buffer and write into the first four (or eight) bytes of each chunk a pointer holding the address of the next free chunk. The header points to the first element of this list. One limitation of storing pointers inside the free chunks is that the chunk size must be at least the size of a pointer on the target hardware. See the diagram below.
When allocating memory from the pool, you simply point the head of the linked list in the header at the second element of the list and return a pointer to the first element. It is very simple: when allocating, we always return the first element of the linked list. Similarly, when we free a chunk and return it to the pool, we simply insert it at the head of the linked list. Inserting a freed chunk at the head rather than the tail has a couple of advantages: first, we do not have to walk the linked list (or store an extra tail pointer in the header), and second (more importantly), the just-freed node is more likely to still be in the cache on subsequent allocations. After several allocations and frees, your pool may end up looking something like this:
Here's a bit of C code for initializing an allocator and allocating and freeing memory.
/* Pool header structure: just the two fields mentioned above. */
typedef struct _Pool {
    uint8_t *head; /*!< Head of the free list.        */
    uint8_t *mem;  /*!< Pointer to the buffer memory. */
} Pool;

/* allocate a chunk from the pool. */
void* poolAlloc(Pool* buf) {
    if(!buf)
        return NULL;
    if(!buf->head)
        return NULL; /* out of memory */

    uint8_t* currPtr = buf->head;
    buf->head = (*((uint8_t**)(buf->head))); /* unlink the first free chunk */
    return currPtr;
}

/* return a chunk to the pool. */
void poolFree(Pool* buf, void* ptr) {
    if(!buf || !ptr)
        return;

    *((uint8_t**)ptr) = buf->head; /* push the chunk onto the free list */
    buf->head = (uint8_t*)ptr;
}

/* initialise the pool header structure, and set all chunks in the pool as free. */
void poolInit(Pool* buf, uint8_t* mem, uint32_t size, uint32_t chunkSize) {
    if(!buf || !mem || !size || !chunkSize)
        return;

    const uint32_t chunkCount = (size / chunkSize) - 1;
    for(uint32_t chunkIndex = 0; chunkIndex < chunkCount; ++chunkIndex) {
        uint8_t* currChunk = mem + (chunkIndex * chunkSize);
        *((uint8_t**)currChunk) = currChunk + chunkSize; /* link to the next free chunk */
    }
    *((uint8_t**)(mem + (chunkCount * chunkSize))) = NULL; /* last chunk terminates the list */
    buf->mem = buf->head = mem;
}
A few words about stack allocations (alloca to the rescue)
If you remember, I said earlier that I would mention stack allocations in the context of the call stack. I'm sure you've seen code that conceptually looks something like this:
myFunction() {
    myTemporaryMemoryBuffer = malloc(myMemorySize);
    /* processing that does not escape the scope of this function */
    free(myTemporaryMemoryBuffer);
}
Most C compilers support a function which means that (depending on the size of the allocation) you may not have to resort to heap allocation for temporary buffers of this kind. That function is alloca. The internal mechanics of alloca depend on the architecture, but essentially it allocates memory by adjusting your function's stack frame, letting you write data into an area of the call stack. It may be as simple as moving the stack pointer (just like the linear allocator mentioned at the beginning). The memory returned by alloca is freed when you return from the function. There are a few pitfalls to be aware of when using alloca, however. Remember to check the size of the request to make sure you are not asking for an insanely large chunk of stack. If your stack overflows, the consequences will be unpleasant. For the same reason, if you are thinking of allocating a large chunk with alloca, it is best to know all the places your function will be called from in the overall flow of the program.
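As an illustration of that "check the size first" advice, here is a sketch that uses alloca for small scratch buffers and falls back to the heap for large ones. The 1 KiB threshold and the function name are my own choices, and keep in mind that alloca itself is non-standard (on glibc it lives in alloca.h):

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

#define STACK_SCRATCH_LIMIT 1024 /* arbitrary 'sane size' threshold */

/* Reverse 'src' into 'dst' via a temporary buffer that lives on the
 * call stack when small enough, and on the heap otherwise. 'dst' must
 * have room for strlen(src) + 1 bytes. */
void reverseString(const char* src, char* dst) {
    const size_t len = strlen(src);
    int onHeap = 0;
    char* tmp;

    if(len + 1 <= STACK_SCRATCH_LIMIT) {
        tmp = (char*)alloca(len + 1); /* released automatically on return */
    } else {
        tmp = (char*)malloc(len + 1);
        onHeap = 1;
    }
    if(!tmp)
        return;

    for(size_t i = 0; i < len; ++i)
        tmp[i] = src[len - 1 - i];
    tmp[len] = '\0';

    memcpy(dst, tmp, len + 1);
    if(onHeap)
        free(tmp); /* only the heap path needs an explicit free */
}
```

Note that alloca memory lives until the function returns, not until the enclosing block ends, so calling it in a loop is another classic way to blow the stack.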
You can find further details in your favorite search engine.
And finally ...
Often, memory allocated during the execution of a task is temporary, kept only for the lifetime of a single frame. To combat the unpleasant consequences of fragmentation in memory-limited systems, it is important to exploit this knowledge and place this kind of memory in separate buffers. Any of the allocation schemes above would work, but I would probably suggest trying one of the linear ones, since they are much easier to reset than the pool. Noel Llopis covered this subject in more detail in this excellent article. Which allocator is best for you depends on many factors and on what the task you are trying to solve requires. I would advise you to think carefully about the allocation and freeing patterns you expect in your system, think about the sizes and lifetimes of the allocations, and try to manage them with allocators that make sense in each context. It sometimes helps to draw the memory layout on paper (yes, with a pencil and everything) or to sketch a graph of the memory usage over time you expect. Believe it or not, this can really help you understand how a system produces and consumes data. If you do this, it will be easier to work out how to separate your allocations so as to make memory management as easy, fast, and fragmentation-free as possible.
Keep in mind that I have covered only a small selection of the simplest strategies, the ones I usually favour when programming games for consoles. I hope that if you have read all the way here and have not yet done such things yourself, your brain is now buzzing with ideas about code you wrote in the past that could have used these techniques to mitigate the significant drawbacks of dynamic memory allocation, or about other strategies you could use to solve limited-memory management tasks without dynamic allocation.
In conclusion, I believe programmers should consider the consequences of dynamic memory allocation, especially in console games, and think twice about using the malloc function or the new operator. It is easy to convince yourself that you are not making that many allocations, so it does not matter much, but this kind of thinking snowballs across a team and leads to a slow, painful death. Fragmentation and the performance costs of dynamic memory, if not nipped in the bud, can have catastrophic, hard-to-fix consequences later in your development cycle. Projects in which memory management and allocation are not well thought out often suffer from random crashes after long play sessions due to out-of-memory conditions (which, incidentally, tend to be extremely hard to reproduce).
Remember: it’s never too early to start thinking about memory, and whenever you start to do it, you will regret not having done it even earlier!
Additional Information...
Here are some links to related topics, or topics that I couldn’t cover:
gamesfromwithin.com/start-pre-allocating-and-stop-worrying
en.wikipedia.org/wiki/Circular_buffer
en.wikipedia.org/wiki/Buddy_memory_allocation
www.memorymanagement.org/articles/alloc.html
Thanks to Sarah, Jameen, Richard and Dat for proofreading this nonsense.
Also from the translator: I'm not a game developer, so my translations of specialised terms may not be quite right. For example, I have no idea what a synergistic processor is :) Translating the English wordplay was not easy either, but I hope this version isn't too bad.