Garbage collection in V8: how the new Orinoco GC works
- Transfer
To be honest, this is one of the most cruel articles I've read lately: there is a lot about death at a young age, about persecution from one area of memory to another, and about a fierce struggle for productivity. In general, welcome to kat - there is a translation of an excellent article by Peter Marshall on how garbage collection works in V8 today.
Over the past few years, the approach to garbage collection in V8 has changed a lot. As part of the Orinoco project, he has gone from a consistent stop-the-world approach to parallel and competitive approaches with incremental fallback.
Note: if you prefer to watch the report than read the article, you can do it here . If not, then read on.
Any garbage collector has a set of tasks that need to be performed periodically:
These tasks can be performed sequentially, or you can alternate. The easiest way is to stop the execution of JavaScript and do everything sequentially in the main thread. However, this can lead to delays, which we talked about in previous posts , as well as to a decrease in the performance of the program as a whole.
The main GC collects garbage from the entire heap.
Garbage cleaning takes place in three stages: labeling, disposal and compaction
Determining which objects you can free up memory from is an essential part of the garbage collector. He considers the object alive based on information about its reachability. This means that any object referenced from the current runtime must be stored in memory, and all inaccessible objects can be assembled by GC.
Marking is the process of finding reachable objects. GC has a set of pointers with which it starts to search, the so-called root set. This set includes objects from the current execution stack and global object. Starting with this set, GC follows each pointer to a JavaScript object and marks each as reachable, after which it moves to pointers from objects to other objects and repeats this process recursively until every reachable object is marked.
Disposal is a process in which memory areas left over from dead objects are entered into a list called free-list. Once the marking process is complete, GC finds such areas and adds them to the appropriate list. Free-lists differ from each other in what size memory areas are stored in them, which allows you to quickly find the right one. Subsequently, when we want to allocate memory, we will look in one of the lists and find a section of suitable size.
Also, the main GC sometimes makes decisions about cleaning / compacting some memory pages based on its own heuristic estimates based on the degree of page fragmentation. You can think of compaction as an analogue of defragmenting a hard drive on older PCs. We copy the surviving objects to other pages that have not yet been compressed (here just use the free-list). Thus we can reuse the small scattered pieces of memory left over from dead objects.
One of the drawbacks of the GC that copies surviving objects is that when you create a lot of long-lived objects, you have to pay a high price for copying them. It is for this reason that only some highly fragmented pages of memory are compressed, while the rest are simply disposed of, which does not require copying of surviving objects.
The heap in V8 is broken down into areas called generations. There is a young generation (which in turn is subdivided into a “manger” and “intermediate” generation) and old generations. Created objects are placed in the "manger". Subsequently, if they survive the next garbage collection, they remain in the younger generation, but pass into the category of “intermediate”. If they survive after the next assembly, they are placed in the older generation.
A bunch of V8 is broken into generations. Objects move from younger to older if they survive garbage collection
In the collection of garbage, there is the important term “generational hypothesis”. In simple terms, this means that most of the objects "die young." In other words, most objects are created and die almost immediately from the point of view of the GC. And this statement is true not only for JavaScript, but for most dynamic programming languages.
The heap organization in V8 is based on the above hypothesis. For example, at first glance, it might seem counterintuitive that the GC is compacting / moving objects that have survived garbage collection, because copying objects is quite an expensive operation to perform during garbage collection. But, based on the generational hypothesis, we know that very few objects will survive this procedure. So if you move only surviving objects, everything that was not moved can automatically be considered garbage. This means that the price we pay for copying is proportional to the number of surviving objects, and not all created.
There are actually two garbage collectors in V8. The main (mark-compact) collects garbage from the entire heap quite efficiently, while the secondary one collects garbage only in young memory, because the generation hypothesis tells us that the main garbage collection efforts should be directed there.
The operating principle of the auxiliary GC is that surviving objects always move to a new memory page. In V8, young memory is divided into two halves. One is always free to allow surviving objects to be moved into it, and during assembly this initially empty area is called To-space. The area from which copying occurs is called From-space. In the worst case, every object can survive, and then you have to copy them all.
For this type of assembly, there is a separate set of pointers that refer from old memory to young. And instead of scanning the entire heap, we use write barriers to maintain this set. So, combining this set with a stack and a global object, we get all the links in the young memory without having to scan all the objects from the old memory.
When copying objects from From-space to To-space, all surviving objects are placed in a continuous section of memory. Thus, it is possible to get rid of fragmentation - memory gaps left from dead objects. After the transfer is complete, To-space becomes From-space, and vice versa. As soon as the GC finishes its work, memory for new objects will be allocated starting from the first free address in From-space.
Scavenger transfers the surviving objects to a new memory page.
If you use only this strategy and do not move objects from young memory, the memory will end quite quickly. Therefore, objects that survived two garbage collections are moved to the old memory.
The final step is updating the pointers to objects that have been moved. Each copied object leaves its original address, leaving forwarding address instead, which is necessary to find the original object in the future.
Scavenger transfers “intermediate” objects to the old memory, and objects from the “manger” - to the new page.
Thus, garbage collection in the young memory consists of three steps: marking objects, copying them, updating pointers.
Most of these algorithms are described in various sources and are often used in runtime environments that support automatic garbage collection. But the GC in V8 has come a long way before becoming a truly modern tool. One of the significant metrics that describe its performance is how often and how long the main thread pauses while the garbage collector performs its functions. For the classic stop-the-world builders, this time leaves its mark on the experience of using the page due to delays, poor-quality rendering and increased response time.
Orinoco GC V8 Logo
Orinoco is the GC code name using state-of-the-art parallel, incremental and competitive garbage collection techniques. There are some terms that have specific meanings in the context of the GC, so let's first give their definitions.
Parallelism is when the main and auxiliary threads do approximately the same amount of work per unit of time. This is still the stop-the-world approach, but the pause duration in this case is divided by the number of threads participating in the work (minus the cost of synchronization).
This is the simplest of the three techniques. The heap does not change because JavaScript is not executed, so it is enough for threads to maintain synchronization of access to objects.
Main and auxiliary threads work on the same task at the same time
Incrementality is when the main thread does a small amount of work intermittently. Instead of complete garbage collection, small tasks for partial collection are done.
This is a more difficult task, because JavaScript runs between incremental assemblies, which means that the heap state changes, which in turn can invalidate part of the work done in the previous iteration.
As can be seen from the diagram, this approach does not reduce the total amount of work (and, as a rule, even increases it), but it distributes this work in time. Therefore, this is a good way to solve one of the main tasks - reducing the response time of the main stream.
By allowing JavaScript to run with little interruption in garbage collection, the application can continue to be responsive: respond to user input and update animations.
Small areas of GC work in the main thread
Competition is when the main thread runs JavaScript continuously and the auxiliary threads collect garbage in the background. This is the most difficult of the three techniques: the heap can change at any time, invalidating the work done by GC before.
On top of that, there are also read / write races, since the auxiliary and main streams simultaneously read or modify the same objects.
The assembly takes place completely in the background, the main thread at this time can execute JavaScript
V8 distributes garbage collection work between auxiliary threads in young memory (scavenging). Each thread receives a set of pointers, following which it moves all living objects to To-space.
While moving objects in To-space, threads need to synchronize through atomic read / write / compare and swap operations to avoid a situation where, for example, another thread has detected the same object, but following a different path, and also tries to move it.
The thread that moved the object to To-space then returns and leaves a forwarding pointer so that other threads that find this object can follow the correct address. For fast and synchronization-free memory allocation for surviving objects, threads use thread local buffers.
Parallel assembly distributes work between several auxiliary threads and the main thread
The main GC in V8 begins by marking objects. As soon as the heap reaches a certain limit (calculated dynamically), competitive markers begin their work. Each of the streams receives a set of pointers, and, following them, they mark each object found as reachable.
Competitive labeling happens completely in the background while JavaScript is running in the main thread. Entry barriers ( the write Barriers ) are used to keep track of new links between objects that are created in JavaScript, while streams are engaged in labeling.
Primary GC uses competitive labeling, disposal, and parallel compaction and updating of pointers
At the end of competitive labeling, the main thread carries out a quick step to end labeling. During this, JavaScript execution in the main thread is paused.
The root set is scanned again to make sure that all living objects are marked, and then memory compression and updating of pointers begin in several threads.
Not all pages in the old memory are compacted - those that are not will be scanned to the freed memory areas (sweeping) for listing them in free-lists.
During this pause, sweeping tasks that compete with the tasks of memory compression and the main thread and can continue even when JavaScript is executed in the main thread are started.
JavaScript developers do not have access to the GC - it is part of the implementation environment. And although the JS code cannot invoke the GC directly, V8 provides such access to the environment that embeds the engine.
GC can send tasks (idle tasks) that can be done "in your free time" and which are parts of the work that would have to be done anyway. An environment like Chrome, where the engine is embedded, may have an idea of what to consider as free time. For example, in Chrome, at a frame rate of 60 frames per second, the browser has approximately 16.6 ms to render an animation frame.
If the animation work is completed earlier, in your free time, before the next frame, Chrome can perform some of the tasks received from the GC.
The GC uses the free time of the main thread to do a preliminary cleanup. For
more details, see our Idle-time GC publication .
GC in V8 has come a long way since its introduction. Adding parallel, incremental and competitive techniques to it took several years, but paid off, allowing you to do most of the work in the background.
Everything related to pauses of the main stream, response time and page load has significantly improved, which allows making animations, scrolling and user interaction on the page much smoother. The parallel collector allowed to reduce the total processing time of young memory by 20-50%, depending on the load.
Idle-time GC reduces the size of the used heap for Gmail by 45%. Competitive labeling and disposal (sweeping) can reduce the duration of GC-pauses in heavy WebGL-games up to 50%.
However, the work is not finished yet. Reducing pauses remains an important task to simplify the life of web users, and we are looking for the possibility of using more advanced techniques to achieve the goal.
On top of that, Blink (a renderer in Chrome) is also equipped with an oilpan, and we are working to improve the interaction between the two GCs, as well as to use Orinoco techniques in Oilpan.
Most JavaScript developers do not need to think about how the GC works, but some understanding of this can help you make the best decisions regarding memory usage and programming patterns. For example, given the generational structure of the V8 heap, low-living objects are actually quite cheap from a GC point of view, since we pay mainly for surviving objects. And this kind of patterns are characteristic not only of JavaScript, but also of many languages with support for garbage collection.
Over the past few years, the approach to garbage collection in V8 has changed a lot. As part of the Orinoco project, he has gone from a consistent stop-the-world approach to parallel and competitive approaches with incremental fallback.
Note: if you prefer to watch the report than read the article, you can do it here . If not, then read on.
Any garbage collector has a set of tasks that need to be performed periodically:
- Find living / dead objects in memory.
- Reuse memory occupied by dead objects.
- Compact / defragment memory (optional).
These tasks can be performed sequentially, or you can alternate. The easiest way is to stop the execution of JavaScript and do everything sequentially in the main thread. However, this can lead to delays, which we talked about in previous posts , as well as to a decrease in the performance of the program as a whole.
Main GC (full mark-compact)
The main GC collects garbage from the entire heap.
Garbage cleaning takes place in three stages: labeling, disposal and compaction
Marking
Determining which objects you can free up memory from is an essential part of the garbage collector. He considers the object alive based on information about its reachability. This means that any object referenced from the current runtime must be stored in memory, and all inaccessible objects can be assembled by GC.
Marking is the process of finding reachable objects. GC has a set of pointers with which it starts to search, the so-called root set. This set includes objects from the current execution stack and global object. Starting with this set, GC follows each pointer to a JavaScript object and marks each as reachable, after which it moves to pointers from objects to other objects and repeats this process recursively until every reachable object is marked.
Disposal
Disposal is a process in which memory areas left over from dead objects are entered into a list called free-list. Once the marking process is complete, GC finds such areas and adds them to the appropriate list. Free-lists differ from each other in what size memory areas are stored in them, which allows you to quickly find the right one. Subsequently, when we want to allocate memory, we will look in one of the lists and find a section of suitable size.
Seal
Also, the main GC sometimes makes decisions about cleaning / compacting some memory pages based on its own heuristic estimates based on the degree of page fragmentation. You can think of compaction as an analogue of defragmenting a hard drive on older PCs. We copy the surviving objects to other pages that have not yet been compressed (here just use the free-list). Thus we can reuse the small scattered pieces of memory left over from dead objects.
One of the drawbacks of the GC that copies surviving objects is that when you create a lot of long-lived objects, you have to pay a high price for copying them. It is for this reason that only some highly fragmented pages of memory are compressed, while the rest are simply disposed of, which does not require copying of surviving objects.
Memory generation device
The heap in V8 is broken down into areas called generations. There is a young generation (which in turn is subdivided into a “manger” and “intermediate” generation) and old generations. Created objects are placed in the "manger". Subsequently, if they survive the next garbage collection, they remain in the younger generation, but pass into the category of “intermediate”. If they survive after the next assembly, they are placed in the older generation.
A bunch of V8 is broken into generations. Objects move from younger to older if they survive garbage collection
In the collection of garbage, there is the important term “generational hypothesis”. In simple terms, this means that most of the objects "die young." In other words, most objects are created and die almost immediately from the point of view of the GC. And this statement is true not only for JavaScript, but for most dynamic programming languages.
The heap organization in V8 is based on the above hypothesis. For example, at first glance, it might seem counterintuitive that the GC is compacting / moving objects that have survived garbage collection, because copying objects is quite an expensive operation to perform during garbage collection. But, based on the generational hypothesis, we know that very few objects will survive this procedure. So if you move only surviving objects, everything that was not moved can automatically be considered garbage. This means that the price we pay for copying is proportional to the number of surviving objects, and not all created.
Auxiliary GC (scavenger)
There are actually two garbage collectors in V8. The main (mark-compact) collects garbage from the entire heap quite efficiently, while the secondary one collects garbage only in young memory, because the generation hypothesis tells us that the main garbage collection efforts should be directed there.
The operating principle of the auxiliary GC is that surviving objects always move to a new memory page. In V8, young memory is divided into two halves. One is always free to allow surviving objects to be moved into it, and during assembly this initially empty area is called To-space. The area from which copying occurs is called From-space. In the worst case, every object can survive, and then you have to copy them all.
For this type of assembly, there is a separate set of pointers that refer from old memory to young. And instead of scanning the entire heap, we use write barriers to maintain this set. So, combining this set with a stack and a global object, we get all the links in the young memory without having to scan all the objects from the old memory.
When copying objects from From-space to To-space, all surviving objects are placed in a continuous section of memory. Thus, it is possible to get rid of fragmentation - memory gaps left from dead objects. After the transfer is complete, To-space becomes From-space, and vice versa. As soon as the GC finishes its work, memory for new objects will be allocated starting from the first free address in From-space.
Scavenger transfers the surviving objects to a new memory page.
If you use only this strategy and do not move objects from young memory, the memory will end quite quickly. Therefore, objects that survived two garbage collections are moved to the old memory.
The final step is updating the pointers to objects that have been moved. Each copied object leaves its original address, leaving forwarding address instead, which is necessary to find the original object in the future.
Scavenger transfers “intermediate” objects to the old memory, and objects from the “manger” - to the new page.
Thus, garbage collection in the young memory consists of three steps: marking objects, copying them, updating pointers.
Orinoco
Most of these algorithms are described in various sources and are often used in runtime environments that support automatic garbage collection. But the GC in V8 has come a long way before becoming a truly modern tool. One of the significant metrics that describe its performance is how often and how long the main thread pauses while the garbage collector performs its functions. For the classic stop-the-world builders, this time leaves its mark on the experience of using the page due to delays, poor-quality rendering and increased response time.
Orinoco GC V8 Logo
Orinoco is the GC code name using state-of-the-art parallel, incremental and competitive garbage collection techniques. There are some terms that have specific meanings in the context of the GC, so let's first give their definitions.
Parallelism
Parallelism is when the main and auxiliary threads do approximately the same amount of work per unit of time. This is still the stop-the-world approach, but the pause duration in this case is divided by the number of threads participating in the work (minus the cost of synchronization).
This is the simplest of the three techniques. The heap does not change because JavaScript is not executed, so it is enough for threads to maintain synchronization of access to objects.
Main and auxiliary threads work on the same task at the same time
Incrementality
Incrementality is when the main thread does a small amount of work intermittently. Instead of complete garbage collection, small tasks for partial collection are done.
This is a more difficult task, because JavaScript runs between incremental assemblies, which means that the heap state changes, which in turn can invalidate part of the work done in the previous iteration.
As can be seen from the diagram, this approach does not reduce the total amount of work (and, as a rule, even increases it), but it distributes this work in time. Therefore, this is a good way to solve one of the main tasks - reducing the response time of the main stream.
By allowing JavaScript to run with little interruption in garbage collection, the application can continue to be responsive: respond to user input and update animations.
Small areas of GC work in the main thread
Competitiveness
Competition is when the main thread runs JavaScript continuously and the auxiliary threads collect garbage in the background. This is the most difficult of the three techniques: the heap can change at any time, invalidating the work done by GC before.
On top of that, there are also read / write races, since the auxiliary and main streams simultaneously read or modify the same objects.
The assembly takes place completely in the background, the main thread at this time can execute JavaScript
GC Status in V8
Scavenging
V8 distributes garbage collection work between auxiliary threads in young memory (scavenging). Each thread receives a set of pointers, following which it moves all living objects to To-space.
While moving objects in To-space, threads need to synchronize through atomic read / write / compare and swap operations to avoid a situation where, for example, another thread has detected the same object, but following a different path, and also tries to move it.
The thread that moved the object to To-space then returns and leaves a forwarding pointer so that other threads that find this object can follow the correct address. For fast and synchronization-free memory allocation for surviving objects, threads use thread local buffers.
Parallel assembly distributes work between several auxiliary threads and the main thread
Core gc
The main GC in V8 begins by marking objects. As soon as the heap reaches a certain limit (calculated dynamically), competitive markers begin their work. Each of the streams receives a set of pointers, and, following them, they mark each object found as reachable.
Competitive labeling happens completely in the background while JavaScript is running in the main thread. Entry barriers ( the write Barriers ) are used to keep track of new links between objects that are created in JavaScript, while streams are engaged in labeling.
Primary GC uses competitive labeling, disposal, and parallel compaction and updating of pointers
At the end of competitive labeling, the main thread carries out a quick step to end labeling. During this, JavaScript execution in the main thread is paused.
The root set is scanned again to make sure that all living objects are marked, and then memory compression and updating of pointers begin in several threads.
Not all pages in the old memory are compacted - those that are not will be scanned to the freed memory areas (sweeping) for listing them in free-lists.
During this pause, sweeping tasks that compete with the tasks of memory compression and the main thread and can continue even when JavaScript is executed in the main thread are started.
Idle-time GC
JavaScript developers do not have access to the GC - it is part of the implementation environment. And although the JS code cannot invoke the GC directly, V8 provides such access to the environment that embeds the engine.
GC can send tasks (idle tasks) that can be done "in your free time" and which are parts of the work that would have to be done anyway. An environment like Chrome, where the engine is embedded, may have an idea of what to consider as free time. For example, in Chrome, at a frame rate of 60 frames per second, the browser has approximately 16.6 ms to render an animation frame.
If the animation work is completed earlier, in your free time, before the next frame, Chrome can perform some of the tasks received from the GC.
The GC uses the free time of the main thread to do a preliminary cleanup. For
more details, see our Idle-time GC publication .
Summary
GC in V8 has come a long way since its introduction. Adding parallel, incremental and competitive techniques to it took several years, but paid off, allowing you to do most of the work in the background.
Everything related to pauses of the main stream, response time and page load has significantly improved, which allows making animations, scrolling and user interaction on the page much smoother. The parallel collector allowed to reduce the total processing time of young memory by 20-50%, depending on the load.
Idle-time GC reduces the size of the used heap for Gmail by 45%. Competitive labeling and disposal (sweeping) can reduce the duration of GC-pauses in heavy WebGL-games up to 50%.
However, the work is not finished yet. Reducing pauses remains an important task to simplify the life of web users, and we are looking for the possibility of using more advanced techniques to achieve the goal.
On top of that, Blink (a renderer in Chrome) is also equipped with an oilpan, and we are working to improve the interaction between the two GCs, as well as to use Orinoco techniques in Oilpan.
Most JavaScript developers do not need to think about how the GC works, but some understanding of this can help you make the best decisions regarding memory usage and programming patterns. For example, given the generational structure of the V8 heap, low-living objects are actually quite cheap from a GC point of view, since we pay mainly for surviving objects. And this kind of patterns are characteristic not only of JavaScript, but also of many languages with support for garbage collection.