How to deal with GC pauses
In this topic, we will talk about the causes of the long pauses of the garbage collector and how to deal with them. I will talk about CMS (low pause), because at the moment it is the most commonly used algorithm for applications with large memory and low latency requirements. The description is given on the assumption that your application is spinning on a box with a large amount of memory and a large number of processors.
The general principles of operation of GC and CMS in particular are described in detail here . I only briefly summarize here what we need to understand this topic:
So, in our case, we have the following moments when our application stops completely.
First you need to determine if the pauses in these four cases are so large that they spoil the life of your application. It’s better for everyone to look at the production system, since there are exactly those conditions that you need. Thank God this can be done absolutely without performance degradation by launching the JVM with the following parameters. Details on how to read CMS logs can be found here . If you want time to be printed not in relative seconds from the start of jvm, but humanly, you can use the parser that I wrote in python . We are now only interested in parts of the logs about stop-the-world events.
The main algorithm for minor assemblies is copying, so the more live objects in YougGen, the longer the minor assembly works. I see at least three things to look at when the pauses are too long and you are not happy.
a. Your JVM uses an inappropriate algorithm. The above log uses a single-threaded algorithm (DefNew). I recommend trying a new multi-threaded algorithm (in the logs it will be called ParNew), which can be forcibly turned on with the -XX: + UseParNewGC parameter. You can also mention that if you see the name PSYoungGen in the logs, it means that your JVM uses a parallel algorithm, but an old implementation. Although coupled with CMS, it seems to be not available.
b.You have allocated too large a chunk of memory for YoungGen (in the given log it is 235968K). It can be reduced by setting the -Xmn parameter.
c. You are using a too large survivor space with a large allowed age for objects, so objects are copied back there and cannot get trapped in OldGen, giving an unnecessary minor GC load. This situation can be corrected with the -XX: SurvivorRatio and -XX: MaxTenuringThreshold parameters. For a more detailed analysis of this case, you can run the JVM with the -XX: + PrintTenuringDistribution parameter to get more information about generations of objects in GC logs.
In this phase, the garbage collector marks all objects directly accessible from GC Roots. If you saw that this phase takes a lot of time, then perhaps you have a lot of links from local variables and static fields.
Here you can try to increase the number of threads (ParallelCMSThreads) participating in this phase. By default, it is calculated as (ParallelGCThreads + 3) / 4). Those. if ParallelGCThreads = 8, then only two threads will participate in the init-mark phase, which may not give any particular growth due to overhead arising due to parallelism.
During the mark phase, a special process runs that monitors all link changes. Remark phase is needed just to view all the changed links.
a. If you see the phrase “Rescan (non-parallel)” in the logs, then I recommend that you enable the -XX: + CMSParallelRemarkEnabled option to enable multiple threads for this phase.
b. Since the week reference is being cleaned up in this phase, then see if you use too many of them. (For example, java.util.WeakHashMap)
c.Perhaps your links are changing a lot. See how much time passes between inital-mark and remark. The less time has passed between these phases, the less links will be changed and the faster its remark function. Starting with the fifth java just before the remark phase, the abortable-preclean phase was added, which essentially does nothing, I just hangs and waits until the minor assembly works, then it waits a little longer and ends, thus starting the next remark phase. There are two reasons for this logic. The first - remark also scans YoungGen and for the possibility of working in multi-threaded mode, a minor assembly is required, after which it becomes possible to effectively divide the remaining objects in YoungGen into areas for parallel processing. And the second is a remarkably long stop-the-workd phase, and if it works right after the minor build, you get one big long pause. There are several parameters that allow you to control this behavior: CMSScheduleRemarkEdenSizeThreshold, CMSScheduleRemarkEdenPenetration, CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark which will cause a minor assembly to be called immediately before remark. Thus, you will reduce the time between init-mark and remark as much as possible and there will be less work for the remark phase. This will be especially effective if pauses of minor assemblies are much less than remark, which usually happens. CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark which will cause a minor assembly to be called immediately before remark. Thus, you will reduce the time between init-mark and remark as much as possible and there will be less work for the remark phase. This will be especially effective if pauses of minor assemblies are much less than remark, which usually happens. CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark which will cause a minor assembly to be called immediately before remark. Thus, you will reduce the time between init-mark and remark as much as possible and there will be less work for the remark phase. This will be especially effective if pauses of minor assemblies are much less than remark, which usually happens.
This and several other cases causing Full GC I have already described in detail here .
That's all I wanted to tell about the pauses. Oh yes, please do not use incremental CMS (-XX: + CMSIncrementalMode) unless you have one or two kernels. Everything will only work more slowly.
And a few words about other algorithms.
Garbage First (G1), which should by default appear in Java 7 and it is possible to enable it in the sixth Java starting with Java SE 6 Update 14 with the options -XX: + UnlockExperimentalVMOptions and -XX: + UseG1GC. The idea is to divide the entire area into small sections of memory that can be collected in different sections of time, thereby making very small pauses. There are various JVM parameters that allow you to set the desired pauses, based on which the memory is divided into areas. It should be noted that this approach cannot be called universal, since its effectiveness depends very much on the topology of objects in memory. If you actively use various caches, the objects of which you have scattered links throughout the application, then the assembly of one cavity can pull along scans of a large number of other areas, which will cause noticeable pauses.
Recently, I often come across posts about Azul GC, which works without pauses at all, regardless of the topology of objects, size and memory area. It sounds very promising, but their solution has long been available only on their own hardware (Azul's Vega systems), since the algorithm requires special LVB instructions (loaded value barrier). The good news is that finally there is an opportunity to implement a similar mechanism on the x86-64 architecture of Intel processors. If I were to write an ultra-low-latency application from scratch, I would definitely consider using this JVM, but if your application is already in production and its stability is one of the most important requirements, then switch from Oracle HotSpot JVM to any another rather risky move. Let us recall how many problems the users stumbled upon, even moving from fifth to sixth Java.
The general principles of operation of GC and CMS in particular are described in detail here . I only briefly summarize here what we need to understand this topic:
- The memory is divided into two areas YoungGen and OldGen
- YoungGen's newly created objects
- OldGen gets objects that survive several minor garbage collections
- Minor garbage collection clean YoungGen
- Major garbage collection cleans OldGen
- Full GC clean both areas
- Stop-the-world means your application is completely stopped when garbage collection is running
- Concurrent algorithms and phases do not stop the application, i.e. garbage collection runs parallel to the application
- Parallel algorithms and phases are activities working in several threads. They can be either concurrent or stop-the-world. Unless explicitly stated, usually the documentation implies exactly stop-the-world.
- Minor garbage collection (minor GC) - always just stop-the-world
- Full GC is stop-the-world
- CMS (Concurrent Mark Sweep) has the following main phases:
- initial mark - stop-the-world
- mark - concurrent
- preclean - concurrent
- remark - Stop-the-world
- sweep - concurrent
- Search for dead objects (for which there are no links in the application) in traditional garbage collectors is carried out by searching for all living objects (which are reachable via links from GC roots)
- CMS does not defragment the memory and uses free lists to manage it.
- GC Ergonomic (parameters specifying the desired maximum pauses) does not work with CMS
So, in our case, we have the following moments when our application stops completely.
- Minor garbage collection
- Init-mark CMS phase
- Remark Phase CMS
- Full gc
First you need to determine if the pauses in these four cases are so large that they spoil the life of your application. It’s better for everyone to look at the production system, since there are exactly those conditions that you need. Thank God this can be done absolutely without performance degradation by launching the JVM with the following parameters. Details on how to read CMS logs can be found here . If you want time to be printed not in relative seconds from the start of jvm, but humanly, you can use the parser that I wrote in python . We are now only interested in parts of the logs about stop-the-world events.
-verbose:gc
-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
1. Slows minor garbage collection.
[GC [DefNew: 209100K->25808K(235968K), 0.0828063 secs] 209100K->202964K(1284544K), 0.0828351 secs] [Times: user=0.02 sys=0.08, real=0.08 secs]
Total time for which application threads were stopped: 0.0829205 seconds
The main algorithm for minor assemblies is copying, so the more live objects in YougGen, the longer the minor assembly works. I see at least three things to look at when the pauses are too long and you are not happy.
a. Your JVM uses an inappropriate algorithm. The above log uses a single-threaded algorithm (DefNew). I recommend trying a new multi-threaded algorithm (in the logs it will be called ParNew), which can be forcibly turned on with the -XX: + UseParNewGC parameter. You can also mention that if you see the name PSYoungGen in the logs, it means that your JVM uses a parallel algorithm, but an old implementation. Although coupled with CMS, it seems to be not available.
b.You have allocated too large a chunk of memory for YoungGen (in the given log it is 235968K). It can be reduced by setting the -Xmn parameter.
c. You are using a too large survivor space with a large allowed age for objects, so objects are copied back there and cannot get trapped in OldGen, giving an unnecessary minor GC load. This situation can be corrected with the -XX: SurvivorRatio and -XX: MaxTenuringThreshold parameters. For a more detailed analysis of this case, you can run the JVM with the -XX: + PrintTenuringDistribution parameter to get more information about generations of objects in GC logs.
2. Long-running init-mark
[GC [1 CMS-initial-mark: 680168K(1048576K)] 706792K(1284544K), 0.0001161 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0002740 seconds
In this phase, the garbage collector marks all objects directly accessible from GC Roots. If you saw that this phase takes a lot of time, then perhaps you have a lot of links from local variables and static fields.
Here you can try to increase the number of threads (ParallelCMSThreads) participating in this phase. By default, it is calculated as (ParallelGCThreads + 3) / 4). Those. if ParallelGCThreads = 8, then only two threads will participate in the init-mark phase, which may not give any particular growth due to overhead arising due to parallelism.
3. Large pauses in the Remark phase
[GC[YG occupancy: 26624 K (235968 K)][Rescan (non-parallel) [grey object rescan, 0.0056478 secs][root rescan, 0.0001873 secs], 0.0059038 secs][weak refs processing, 0.0000090 secs] [1 CMS-remark: 750825K(1048576K)] 777449K(1284544K), 0.0059808 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
Total time for which application threads were stopped: 0.0061668 seconds
During the mark phase, a special process runs that monitors all link changes. Remark phase is needed just to view all the changed links.
a. If you see the phrase “Rescan (non-parallel)” in the logs, then I recommend that you enable the -XX: + CMSParallelRemarkEnabled option to enable multiple threads for this phase.
b. Since the week reference is being cleaned up in this phase, then see if you use too many of them. (For example, java.util.WeakHashMap)
c.Perhaps your links are changing a lot. See how much time passes between inital-mark and remark. The less time has passed between these phases, the less links will be changed and the faster its remark function. Starting with the fifth java just before the remark phase, the abortable-preclean phase was added, which essentially does nothing, I just hangs and waits until the minor assembly works, then it waits a little longer and ends, thus starting the next remark phase. There are two reasons for this logic. The first - remark also scans YoungGen and for the possibility of working in multi-threaded mode, a minor assembly is required, after which it becomes possible to effectively divide the remaining objects in YoungGen into areas for parallel processing. And the second is a remarkably long stop-the-workd phase, and if it works right after the minor build, you get one big long pause. There are several parameters that allow you to control this behavior: CMSScheduleRemarkEdenSizeThreshold, CMSScheduleRemarkEdenPenetration, CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark which will cause a minor assembly to be called immediately before remark. Thus, you will reduce the time between init-mark and remark as much as possible and there will be less work for the remark phase. This will be especially effective if pauses of minor assemblies are much less than remark, which usually happens. CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark which will cause a minor assembly to be called immediately before remark. Thus, you will reduce the time between init-mark and remark as much as possible and there will be less work for the remark phase. This will be especially effective if pauses of minor assemblies are much less than remark, which usually happens. CMSMaxAbortablePrecleanTime. I suggest trying CMSScavengeBeforeRemark which will cause a minor assembly to be called immediately before remark. Thus, you will reduce the time between init-mark and remark as much as possible and there will be less work for the remark phase. This will be especially effective if pauses of minor assemblies are much less than remark, which usually happens.
4. Full GC in the log
(concurrent mode failure): 798958K->74965K(1048576K), 0.0270334 secs] 1033467K->74965K(1284544K), [CMS Perm : 3022K->3022K(21248K)], 0.0270963 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0271630 seconds
This and several other cases causing Full GC I have already described in detail here .
That's all I wanted to tell about the pauses. Oh yes, please do not use incremental CMS (-XX: + CMSIncrementalMode) unless you have one or two kernels. Everything will only work more slowly.
And a few words about other algorithms.
Garbage First (G1), which should by default appear in Java 7 and it is possible to enable it in the sixth Java starting with Java SE 6 Update 14 with the options -XX: + UnlockExperimentalVMOptions and -XX: + UseG1GC. The idea is to divide the entire area into small sections of memory that can be collected in different sections of time, thereby making very small pauses. There are various JVM parameters that allow you to set the desired pauses, based on which the memory is divided into areas. It should be noted that this approach cannot be called universal, since its effectiveness depends very much on the topology of objects in memory. If you actively use various caches, the objects of which you have scattered links throughout the application, then the assembly of one cavity can pull along scans of a large number of other areas, which will cause noticeable pauses.
Recently, I often come across posts about Azul GC, which works without pauses at all, regardless of the topology of objects, size and memory area. It sounds very promising, but their solution has long been available only on their own hardware (Azul's Vega systems), since the algorithm requires special LVB instructions (loaded value barrier). The good news is that finally there is an opportunity to implement a similar mechanism on the x86-64 architecture of Intel processors. If I were to write an ultra-low-latency application from scratch, I would definitely consider using this JVM, but if your application is already in production and its stability is one of the most important requirements, then switch from Oracle HotSpot JVM to any another rather risky move. Let us recall how many problems the users stumbled upon, even moving from fifth to sixth Java.
Links to the topic:
- Official java memory management documentation
- White papers on setting up the garbage collector
- Blog of a Sun employee working on the GC. Here he describes in detail various aspects of garbage collection. Highly recommend.
- FAQ on various aspects of JVM memory management
- Description of the alternative Azul GC. It also points out the shortcomings of existing solutions and explains what caused them.