Ahaha, HotSpot, what are you doing, stop it!

Original author: Gleb Smirnov
  • Transfer
  • Tutorial


As you probably already know , the next Joker conference will be held in St. Petersburg soon . I am going to give her a report on how to investigate the behavior of the JVM, which at first seems mysterious and mysterious. This post is a teaser designed to give you an opportunity to understand what to expect from the report.

Suppose you suddenly receive information about a problem: when collecting garbage, the reasons for “Last Ditch Collection” and “No GC” are displayed , and an Internet search does not give anything intelligible. Fortunately, HotSpot is almost completely assembled from OpenJDK, and therefore, at least in theory, we can find the answers to all our questions directly in the source. What will we do!


Last ditch collection

Armed with the OpenJDK source code , let's look for the line we are interested in:

$ grep -irn "Last Ditch Collection" .
./src/share/vm/gc_interface/gcCause.cpp:94:      return "Last ditch collection";

We go into the file and find there a reference to _last_ditch_collection- an element enumcontaining all the possible reasons for garbage collection. Immediately we will look at the necessary constant for “No GC”, it will turn out to be _no_gc. Continue to grab:

$ grep -rn "GCCause::_last_ditch_collection" .
...
./src/share/vm/gc_implementation/shared/vmGCOperations.cpp:286:        heap->collect_as_vm_thread(GCCause::_last_ditch_collection);
...


That seems to be it! After reading the method in which the call is located, and in particular its comments, we quickly understand what is happening.

Apparently, when HotSpot cannot allocate space for something that should be in metaspace (for example, for a class), garbage collection with the cause starts first GCCause::_metadata_GC_threshold. If this does not help, then an attempt is made to expand metaspace. If this does not help, then the full garbage collection with the cause is started GCCause::_last_ditch_collection. During this build, soft references are cleared. If even this doesn’t help, then the place really ended, and it's time to throw OOM.

It sounds plausible, but it is worth learning to reproduce it stably. If you believe the above, then with active littering metaspace we should see at least one assembly with the Last Ditch collection reason:

ClassPool pool = ClassPool.getDefault();
for(long l = 0; l < Long.MAX_VALUE; l++) {
    pool.makeClass("com.example.Kitty" + l).toClass();
}

If you run this with the default settings, then you have to wait quite a while. But we can reduce the size of metaspace and get the result faster:

$ java -cp build/libs/labs-8-all.jar -XX:+PrintGCDetails -XX:MaxMetaspaceSize=10m ru.gvsmirnov.perv.labs.gc.NoGcTrigger
...
[Full GC (Last ditch collection) [...] 14470K->14425K(32768K), [Metaspace: 8971K->8971K(1056768K)], 0.0481030 secs]
...
Exception in thread "main" javassist.CannotCompileException: by java.lang.OutOfMemoryError: Metaspace
    at javassist.ClassPool.toClass(ClassPool.java:1099)
    ...


In most cases, we will see more than one assembly with a “last ditch collection” cause. This is quite expected, since otherwise they would have been no use, and it would hardly have been implemented at all.

By the way, here we looked at the sources of Java 9, but the logic of the last ditch collection has not changed for a long time, so there is nothing to worry about.

Another funny thing worth mentioning: we, in general, did not need to write code that actively spoiled in metaspace. Any code at all: after all, the JVM is also very willing to load classes. If we just run it java -XX:+PrintGCDetails -XX:MaxMetaspaceSize=2m X, we get several times the last ditch collection, and then the OOM. The virtual machine does not even have time to understand that the class Xdoes not exist.

No gc

Let's move on to No GC. The captain tells us that the developers of HotSpot would hardly have named the normal reason for garbage collection. Anyway, the only case in which, logically, we should see this is if we have time to run jstat before at least one garbage collection occurs:

$ jstat -gccause -t `jps | grep NoGc | cut -d' ' -f1` 100
Timestamp     S0     S1      E      O      P    YGC     YGCT   FGC     FGCT      GCT     LGCC     GCC
      0.7   0.00   0.00   4.00   0.00  40.34      0    0.000     0    0.000    0.000    No GC   No GC
      0.8   0.00   0.00   6.00   0.00  45.82      0    0.000     0    0.000    0.000    No GC   No GC
      0.9   0.00   0.00   6.00   0.00  51.44      0    0.000     0    0.000    0.000    No GC   No GC

Or if he does a check when the garbage collector is not active:

Timestamp     S0     S1      E      O      P    YGC     YGCT   FGC     FGCT      GCT                 LGCC     GCC
      2.8   0.00   0.00   0.00   0.78  31.00      5    0.080     5    1.051    1.131   Allocation Failure   No GC


So if it was discovered in a different situation, then it must be a bug. And since when we analyzed the last ditch collection, such oddities were not observed, that is, there is a chance that it has already been fixed. Let's show a little optimism and look for all suitable commits:

$ hg grep --all '::_no_gc'
...
src/share/vm/gc_implementation/shared/vmGCOperations.cpp:2097:+:  assert(((_gc_cause != GCCause::_no_gc) &&
...


Yeah! It seems that the problem was really fixed in the commit number 2097 , back in February 2013. Looking at the hotspot_version file , we find that the last HotSpot in which the problem should be observed is 21.0-b01 . I had on hand Java 6 with the HotSpot version 20.45-b01 :

$ java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)


When you run our cat example, immediately the following result is obtained:

Timestamp     S0     S1      E      O      P    YGC     YGCT   FGC     FGCT      GCT                 LGCC                     GCC
      2.7   0.00   0.00   0.00   0.78 100.00      5    0.080     5    1.051    1.131      unknown GCCause      Allocation Failure
      2.8   0.00   0.00   0.00   0.78 100.00      5    0.080     5    1.051    1.131      unknown GCCause      Allocation Failure


This is pretty close, but not quite what we were looking for. However, with a more detailed study of the patch, it becomes obvious that you just need to use a different garbage collector. Adding -XX:+UseG1GCwill give us the desired result:

Timestamp     S0     S1      E      O      P    YGC     YGCT    FGC    FGCT      GCT                  LGCC                        GCC
      1.6      ?      ?      ?      ?  96.78      ?        ?     ?        ?        ?   G1 Evacuation Pause                      No GC
      1.7      ?      ?      ?      ? 100.00      ?        ?     ?        ?        ?   No GC                Permanent Generation Full
      1.8      ?      ?      ?      ? 100.00      ?        ?     ?        ?        ?   No GC                Permanent Generation Full
      1.9      ?      ?      ?      ? 100.00      ?        ?     ?        ?        ?   No GC                Permanent Generation Full
      2.0      ?      ?      ?      ? 100.00      ?        ?     ?        ?        ?   No GC                Permanent Generation Full


Success! By the way, the person who will be the first to answer why there are some question marks, and what exactly should lead us to use G1, will receive an invitation to Joker Unconference;) SerCe has already given the correct answer .

Afterword

So, just based on one single tweet and using basic command line utilities, we answered a non-google question, found an old bug in HotSpot, were able to evaluate the JVM version from above, and even found out which garbage collector was used.

The moral is simple: HotSpot is not some kind of mysterious shaitan machine. There is nothing mysterious in the world at all, it is only in our heads .

PS If you have an interesting example from this opera, then feel free to send it. It is possible that I will analyze it in a report, or even just like that :)

PPS Several invited speakers who do not know the Russian language will come to the conference. they will be happy if there are more reports in English. I added a survey to understand how the majority of students can relate to this.

Only registered users can participate in the survey. Please come in.

Are you ready to listen to this report in English, but not in Russian?

  • 57.3% Yes, ready. Let foreign colleagues listen too. 129
  • 42.6% No, not ready. 96

Also popular now: