
Everything you wanted to know about stack traces and heap dumps. Part 2
Here is the second part of the transcript of the talk by Andrey Pangin aka apangin from Odnoklassniki, given at one of the JUG meetups (a polished and expanded version of his JPoint 2016 talk). This time we will finish the conversation about stack traces and also talk about thread dumps and heap dumps.
So, we continue... Since we are talking about recursion, what happens if we run a recursive method that never returns:

static int depth;

static void recursion() {
    depth++;
    recursion();
}

public static void main(String[] args) {
    recursion();
}
How many calls will complete before a StackOverflowError is thrown (with the default stack size)?
Let's measure:
package demo4;

public class Recursion {
    static int depth;

    static void recursion() {
        depth++;
        recursion();
    }

    public static void main(String[] args) {
        try {
            recursion();
        } catch (StackOverflowError e) {
            System.out.println(depth);
        }
    }
}
The same code, with just a catch for StackOverflowError added.
On a 64-bit system with a 1 MB stack, the result varies from 22 to 35 thousand calls. Why such a large spread? The reason is the JIT: methods are compiled in the background compiler thread in parallel with the execution of Java code. At some point (after the recursion method has already been called a number of times), compilation of this method starts while execution continues in the interpreter. As soon as the compiler finishes its work, the next call goes into the compiled code.
Starting with Java 8, there are by default two compilers in one VM: the "light" C1 and the "heavy" C2. So three kinds of frames may appear on the stack at once: interpreted, C1-compiled and C2-compiled. Frame sizes can differ greatly. The interpreter has the bulkiest frames, because everything is kept on the stack (all arguments, local variables, the current bytecode pointer, etc.). Much of this is unnecessary in compiled code, and the more optimizing the compiler, the less it needs to keep on the stack. C2, for example, does not allocate stack slots for local variables at all: it keeps everything in registers, and it also inlines one level of the recursive call.

If the same code is executed in purely interpreted mode with the flag
-Xint
the result is almost always 12,500 (give or take a few frames).
Now the same thing, but with the C1 compiler (-Xcomp -XX:TieredStopAtLevel=1): the results are also quite stable, about 25 thousand. If everything is compiled right away with C2 (-Xcomp -XX:-TieredCompilation), it runs longer, but the result is 62 thousand frames. Dividing the default stack size (1 MB) by 62 thousand gives about 16 bytes per frame. I checked against the compiled code, and indeed: the frame size is actually 32 bytes, not 16, but one frame covers two levels of recursion at once, because C2 inlines one level.
By default, on a 64-bit architecture the stack size is 1 MB, but it can be adjusted with the flags
-Xss, -XX:ThreadStackSize
(these two flags are synonyms). A lesser known fact is that you can also change the stack size of one particular thread via the constructor
Thread(ThreadGroup group, Runnable target, String name, long stackSize)
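As a quick illustration of that constructor (a sketch of mine, not code from the talk; the class name and the 16 MB figure are arbitrary), the recursive test above can be run on a thread with its own stack size:

```java
public class StackSizeDemo {
    static int depth;

    static void recursion() {
        depth++;
        recursion();
    }

    public static void main(String[] args) throws InterruptedException {
        // The fourth constructor argument requests a 16 MB stack for this one
        // thread; all other threads keep the default (-Xss) size.
        // Per the Javadoc, the JVM is free to treat this value as a hint.
        Thread t = new Thread(null, () -> {
            try {
                recursion();
            } catch (StackOverflowError e) {
                System.out.println("depth = " + depth);
            }
        }, "deep-recursion", 16 * 1024 * 1024);
        t.start();
        t.join();
    }
}
```

With a stack 16 times larger, the printed depth should grow roughly proportionally, modulo the JIT effects discussed above.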
But when creating large stacks, do not forget that they occupy memory, and a situation may arise where many threads with large stacks lead to out of memory:
java.lang.OutOfMemoryError: Unable to create new native thread
An interesting fact: if you print the JVM flags (-XX:+PrintFlagsFinal) on Linux, it will indeed report a ThreadStackSize of 1 MB, while on Windows the default value of the ThreadStackSize flag is 0. So where does the 1 MB come from? It was a revelation to me that the default stack size is set in the executable file: the default size for the application is specified in an attribute of the EXE format. The minimum stack size on a 64-bit system is approximately 228 KB (it can vary from one JDK version to another). How is the stack laid out, and where does this minimum come from?

On the stack, besides the frames of your Java methods, some space is always reserved. This is at least one red zone (one page, 4 KB, in size) at the very top of the stack and several pages of the yellow zone.
The red and yellow zones are used to detect stack overflow. Initially, both zones are write-protected. Each Java method checks for the red or yellow zone by attempting a write at an address relative to the current stack pointer; on such a write, the operating system raises an exception that the virtual machine catches and handles. When the yellow zone is reached, it is unprotected so that there is enough room to run the stack overflow handler, and control is transferred to a special method that creates a StackOverflowError instance and throws it. Hitting the red zone is an unrecoverable error, and the virtual machine terminates fatally.
There is also a so-called shadow zone. It has a rather odd size: 6 pages on Windows; 20 pages on Linux, Solaris and other operating systems. This space is reserved for native methods inside the JDK and for the needs of the virtual machine itself.
While preparing the presentation, I ran my recursive test on Java 8 and Java 9. On the latter I got this wonderful crash of the virtual machine (output fragment below). Naturally, I downloaded the latest available build (at the time of the talk it was 9.0+119); the problem reproduces on it as well. This is a very good case for crash dump analysis (see Andrey Pangin, "Analysis of JVM crash dumps"). All the skills came in handy here, disassembly in particular.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_STACK_OVERFLOW (0xc00000fd) at pc=0x0000019507acb5e0, pid=9048, tid=10544
#
# JRE version: Java(TM) SE Runtime Environment (9.0+119) (build 9-ea+119)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (9-ea+119, mixed mode, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# J 155 C2 demo4.Recursion.recursion()V (12 bytes) @ 0x0000019507acb5e0 [0x0000019507acb5e0+0x0000000000000000]
#
# No core dump will be written. Minidumps are not enabled by default on client versions of Windows
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
...
Here is an instruction that writes a value at an offset from the current stack pointer, and the crash occurred exactly on it. Using the value of the RSP register, we can compute the address being written to: subtract 0x6000 from it. The crash dump also lists the stack range of the current thread, and the computed value falls on the very first (topmost) page of that stack, i.e. right in the red zone. So there is indeed such a bug. I analyzed it and found the cause: for some JVM functions, the 6 available shadow pages on Windows are not enough (they use more at runtime). The developers of the virtual machine miscalculated. By the way, the sizes of these zones can be changed with JVM flags.
Instructions:
00000000: 89 84 24 00 a0 ff ff mov DWORD PTR [rsp-0x6000],eax
00000007: 55 push rbp
00000008: 48 83 ec 10 sub rsp,0x10
0000000c: 49 ba 78 71 88 8d 00 00 00 00 movabs r10,0x8d887178
00000016: 41 83 42 70 02 add DWORD PTR [r10+0x70],0x2
0000001b: e8 e0 ff ff ff call 0x00000000
Registers:
RSP=0x0000007632e00ff8
Java Threads:
=>0x0000019571d71800 JavaThread "main" [_thread_in_Java, id=10544,
stack(0x0000007632e00000,0x0000007632f00000)]


Why do we need big stacks at all? For Java EE, of course.
Here is one of my favorite pictures on this subject.

Two lines of business logic generate hundreds of frames from various frameworks and application servers.
Stacks in performance measurement
Profiling is an essential part of measuring the performance of your system. All profilers fall into two large groups: sampling and instrumenting.
An instrumenting profiler modifies the methods: it inserts a notification about entering the method at the beginning and about leaving it at the end. Clearly, if every method is instrumented this way, the overhead is large, although the measurement is quite accurate.
public void someMethod(String... args) {
    Profiler.onMethodEnter("myClass.someMethod");
    // method body
    Profiler.onMethodExit("myClass.someMethod");
}
In production a different approach is most often used: a sampling profiler. At some interval (10-100 times per second) it takes a dump of the threads and looks at which threads are executing which stack traces at that moment. The methods that appear most often in these stack traces are the hot ones.
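Such a sampler can be sketched in pure Java in a few lines (a toy illustration of the idea, not how real profilers are built; all names here are mine):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TinySampler {
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // 100 samples per second: grab all stack traces, count topmost frames
        timer.scheduleAtFixedRate(() -> {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                // Note: like the profilers described here, this samples sleeping
                // threads too, and only at safepoints
                if (e.getKey() != Thread.currentThread() && stack.length > 0) {
                    String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                    hits.merge(top, 1, Integer::sum);
                }
            }
        }, 0, 10, TimeUnit.MILLISECONDS);
    }

    public Map<String, Integer> stopAndReport() {
        timer.shutdown();
        hits.entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .limit(10)
            .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
        return hits;
    }
}
```

It inherits all the flaws discussed below, which makes it a useful reference point for the rest of this section.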
Let's look right away at an example of how this works. I wrote a small program; even though it is small, you cannot immediately tell where it might be slow.
First, it generates a random geographic coordinate (a latitude/longitude pair). Then, in a loop, it calculates the distance from that randomly generated coordinate to another fixed point (Moscow): the distanceTo function contains a lot of math.
The results are put into a hash map.
All of this runs in a loop many, many times:
package demo5;

import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class Location {
    static final double R = 6371009;

    double lat;
    double lng;

    public Location(double lat, double lng) {
        this.lat = lat;
        this.lng = lng;
    }

    public static Location random() {
        double lat = ThreadLocalRandom.current().nextDouble() * 30 + 40;
        double lng = ThreadLocalRandom.current().nextDouble() * 100 + 35;
        return new Location(lat, lng);
    }

    private static double toRadians(double x) {
        return x * Math.PI / 180;
    }

    public double distanceTo(Location other) {
        double dlat = toRadians(other.lat - this.lat);
        double dlng = toRadians(other.lng - this.lng);
        double mlat = toRadians((this.lat + other.lat) / 2);
        return R * Math.sqrt(Math.pow(dlat, 2) + Math.pow(Math.cos(mlat) * dlng, 2));
    }

    private static Map<Location, Double> calcDistances(Location target) {
        Map<Location, Double> distances = new IdentityHashMap<>();
        for (int i = 0; i < 100; i++) {
            Location location = Location.random();
            distances.put(location, location.distanceTo(target));
        }
        return distances;
    }

    public static void main(String[] args) throws Exception {
        Location moscow = new Location(55.755773, 37.617761);
        for (int i = 0; i < 10000000; i++) {
            calcDistances(moscow);
        }
    }
}
Everything here could be slow: generating a random coordinate, measuring the distance (lots of math), and inserting into the map. Let's run a profiler and see what the time is actually spent on.
I'll take Java VisualVM (bundled with the standard JDK; nothing simpler), find our process on the Sampler tab, click CPU and start measuring (let it work for half a minute). The default sampling interval is once every 100 ms.
What happened:

According to Java VisualVM, almost all the time is spent in IdentityHashMap.put.
If you look at the flat view of methods, sorted by Self Time:

As if nothing else is being executed at all.
The same can be measured with other profilers (JProfiler, YourKit, etc.); I assure you, the result will be the same.
Is HashMap really that slow? No. It's just that profilers lie.
And they all lie in the same way, because they work the same way: at a given frequency they call a method via JMX or JVMTI that takes a dump of all threads (JVMTI, for example, has a GetAllStackTraces method), then summarize what the virtual machine told them and display it.
Hand on heart, it is not the profilers that lie, but the JVM: it returns wrong stack traces. All such profilers share one big problem: thread stack traces can be taken only at safepoints, certain points in the code where the virtual machine knows a thread can be safely stopped. And there are really few such points: inside loops and at method exits. If you have a long stretch of straight-line code, like the math here, with no loops in it, it may contain no safepoint at all, which means it will never show up in stack traces.
Another problem is that sleeping threads and running threads are sampled alike, because the dump covers all threads. This is not great: sleeping threads are not what we are interested in.
Also, most profilers cannot distinguish native methods that sleep in some blocking system call. For example, while waiting for data from a socket, the thread is in the RUNNABLE state, and the profiler will claim it eats 100% CPU. But it does not consume any CPU; the virtual machine simply cannot tell a working native method from one blocked in a system call.
What to do?
The OS provides facilities for profiling native code. On Linux there is the setitimer system call, which sets up a timer that, at a given frequency, delivers a special profiling signal (SIGPROF) to whichever thread is currently executing. It would be good to use this OS capability and collect the current thread's stack trace right in the SIGPROF handler, even if the thread is not at a safepoint. The HotSpot virtual machine provides such an opportunity: an undocumented private API, AsyncGetCallTrace, which can be called to get the current thread's stack outside a safepoint.
This hole was cut specifically for Oracle Developer Studio, which is almost the only profiler that gets honest stack traces.
While preparing this talk, I checked whether anyone else uses this API. I found literally two projects: one old and already abandoned, and another that appeared relatively recently (in 2015), called honest-profiler.
The API is quite simple: we prepare a buffer for the stack trace and call the method:

The third parameter of this method is the current context that arrives in the signal handler.
A link to my own open-source profiler: https://github.com/apangin/async-profiler. Take it and use it. It is now in a state I am not ashamed to show to people. True, at the moment it is implemented only for Linux (note: macOS support has been added since the talk).
Let's check on the same example.
We say which process we are profiling.

Our pid is 3202.
A feature of my profiler, which I have not seen elsewhere, is that it can attach on the fly (the honest-profiler mentioned above must be started as a Java agent together with the application).
Let's give a few seconds to profiling. Here is the result:

At the bottom is a flat list of methods; a little higher, the details (all thread stacks). The picture is radically different: almost a third of the time goes to math, calculating the distance. IdentityHashMap.put is near the bottom with a result of 2% (according to the first profiler it took 100%). Computing the object's identityHashCode does take real time, though, and so do put and resize themselves. By the way, generating random locations is not free either (a good 12%).
Feel the difference.
The overhead of this profiler is much lower. You can sample even 1000 times per second and it will be fine, since it takes the stack trace of the active thread only. It also accumulates the results in a very compact structure: during profiling only jmethodIDs (essentially pointers to methods) are recorded; method and class names are resolved only when the report is printed.
Thread dumps
There are many ways to take a thread dump: from Java code or from native code, from inside the process or from outside.
Speaking of analyzing the process from the inside, there is the Java API Thread.getAllStackTraces, which gives us StackTraceElement arrays with all that this implies.

When we tried to use it in production, with 2 thousand threads each having a stack 50-60 frames deep, this array alone occupied about 50 MB.
There is a similar method in JMX (useful because it can be called remotely). It returns the same StackTraceElement arrays and, in addition, information about acquired monitors.
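For illustration, the JMX variant looks roughly like this (ThreadMXBean is the standard java.lang.management API; the same bean can also be obtained over a remote JMX connection):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class MXBeanDump {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // true, true: also report locked monitors and ownable synchronizers
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            // ThreadInfo.toString() prints the thread state and a stack fragment
            System.out.print(info);
        }
    }
}
```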
A much better way to produce stack traces from the application itself is JVMTI (the JVM Tool Interface), a native interface for building tools: profilers, analyzers and the like.

It has a GetAllStackTraces method, which is exactly what profilers usually use. Compared to the Java API, the representation is very compact.
When taking dumps from the outside, the easiest way is to send the process SIGQUIT (either kill -3 or the appropriate key combination in the console):

The advantage of this method is that the Java machine prints the stack traces itself, at maximum speed. This still happens at a safepoint, but no intermediate structures need to be created.
An alternative is the jstack utility. It works through the dynamic attach mechanism, on which I will dwell in more detail.
It is important to understand that the jstack and jmap utilities each have two modes of operation. They differ by a single -F switch, but in essence these are two different utilities serving the same purpose in two completely different ways.
Let me explain what the difference between them is.
Dynamic attach is a mechanism for communicating the utility with the JVM through a special interface. How does this happen (using Linux as an example)?

The jstack utility creates a certain file in the current directory, a signal that the utility wants to connect to the JVM, and sends the SIGQUIT signal to the virtual machine. The virtual machine handles this signal, sees the .attach_pid signal file in the current directory and in response starts a special thread, AttachListener (if it is already running, nothing happens). In this thread a UNIX domain socket is opened for communication between the jstack utility and the JVM. When the utility connects to this socket, the JVM checks the credentials of the user on the other side, so that other users cannot connect to the virtual machine and obtain private information. The check is very simple, though: only an exact match of the effective UID and GID is accepted. Hence the popular problem: when you run jstack as a different user, even root, the attach fails.
After the UNIX socket connection is established, the utility sends a command, the virtual machine executes it and sends the response back to the utility over the same socket.
On Windows everything works a bit differently (I do not know why it could not be done the same way; Windows has no UNIX sockets, but it does have named pipes). There is another beautiful API there that I like, so I could not help mentioning it.
The beginning is about the same: a named pipe is created. Then the Windows API has a WriteProcessMemory function that can write data directly into the memory of another process, given sufficient rights. Through it a temporary auxiliary memory page is created in the address space of the Java process, and the command to execute, its arguments and the name of the pipe for the response are written there. Another no less wonderful function, CreateRemoteThread, allows injecting a thread into another process: the jstack utility starts a remote thread, which executes in the context of the virtual machine process and receives as an argument a pointer to the previously created memory area with all the information about the command.
Then everything is the same: the JVM itself executes the command and sends the result back.

The advantages of this approach:
- all operations are performed by the virtual machine itself in the most efficient way;
- since the interface does not depend on the VM version, a single jstack utility can take dumps from processes running different versions of Java.
The disadvantages:
- the user-mismatch restriction already mentioned;
- it only works on a "live" virtual machine, since the commands are executed by the virtual machine itself;
- the mechanism can be disabled (for security reasons, for example) with a special JVM option:
-XX:+DisableAttachMechanism
As a proof of concept, I wrote a simple utility in C that attaches to a remote Java process this way and executes the command passed on its command line ( https://github.com/apangin/jattach ).
The virtual machine supports the following commands:

These are thread dumps, heap dumps, heap histograms, printing and setting virtual machine flags, and executing everything the jcmd utility can; load is probably the most interesting command, as it loads a JVMTI agent library into the remote virtual machine. It is the load command that my async-profiler uses to upload its library into the remote JVM.
Let me briefly demonstrate how this works. I'll start some process, for example Tomcat.
The pid of the process is 8856.

The same thread dump is produced on request. Since this is a C utility, not a Java one, no time is wasted on starting up a JVM. The utility is very short, literally a hundred lines for Windows and for Linux; it is available on GitHub.
Not only jstack works through this mechanism, but also the jmap, jinfo and jcmd utilities (in fact, my single jattach plays the role of all of them).
The second mode is jstack -F. The difference is that there is no cooperation from the JVM at all: the utility does everything itself.
On Linux, via the PTRACE_ATTACH system call (Windows has an analogue), it completely stops the target process, freezing its state. Then, through the debugging API that allows reading another process's memory, the jstack utility pulls everything it needs out of the remote JVM process, reconstructs the JVM structures, walks them and restores the stacks itself.
Since one PTRACE_PEEKDATA operation can read only one machine word of another process's memory at a time, a large heap requires a huge number of system calls, which is of course very slow.

The pluses:
- no cooperation from the virtual machine is needed; you can run jstack -F even against a frozen or looping VM;
- root can attach to a virtual machine started by any user.
The minuses are also clear:
- it is slower than if the virtual machine were doing the work itself;
- since the jstack utility reconstructs the virtual machine's internal structures, it must know their exact layout, so the version of jstack must exactly match the version of the JVM it runs against.

What is this for? For example, a server starts to slow down for some reason and stops keeping up with the flow of incoming requests. When the application detects this situation (say, the thread pool is exhausted), it triggers a thread dump so that the developer can later see in the logs what the system was doing at that moment and where exactly it was stuck.
In our project we previously used the Java API to obtain thread stacks; it was very slow and consumed a lot of memory. Now we collect this information through Dynamic Attach instead: it has a Java API, the same one the jstack utility uses. We find out the pid of our own process, attach to ourselves via Dynamic Attach and make the virtual machine produce the dump for us.
public static void dump() throws AttachNotSupportedException, IOException {
    String vmName = ManagementFactory.getRuntimeMXBean().getName();
    String pid = vmName.substring(0, vmName.indexOf('@'));
    HotSpotVirtualMachine vm = (HotSpotVirtualMachine) VirtualMachine.attach(pid);
    try {
        vm.localDataDump();
    } finally {
        vm.detach();
    }
}
GitHub link for example: https://github.com/odnoklassniki/one-nio/blob/master/src/one/nio/mgt/
Heap dumps
With heap dumps everything is basically the same; the utility here is jmap.
jmap -dump:live,format=b,file=heap.bin PID
The utility can dump the entire contents of the heap or, if the whole heap is not needed, just build a class histogram: the virtual machine walks the heap and counts how much memory the instances of each class occupy, then prints a histogram sorted by total size per class. From it you can see which objects litter the heap the most.
Jmap also has 2 modes of operation: through Dynamic Attach and through the Serviceability Agent (with the same pros and cons).

jmap without -F is fast and can take dumps from applications running other Java versions, but only from a live JVM. jmap -F, in turn, can take dumps even from hung applications, but very slowly.
In which cases might jmap not work? You usually want a heap dump exactly when some problem occurs. Suppose there is a memory leak and the VM has gone into an endless garbage collection cycle; at this very moment you want to dump the heap to see what it is littered with. But right then the virtual machine cannot oblige, because connecting through the Dynamic Attach mechanism is simply impossible. You will have to either rerun the utility, hoping to hit an interval when application code is executing, or use the -F option. But trouble may lurk there too: if garbage collection is in progress, the heap state may be inconsistent, and jmap cannot reconstruct anything, simply because some pointers in the heap have already been rewritten.
Fortunately, there is a tricky way to quickly take dumps of dead or frozen processes in forced mode. You do not need to walk the whole heap and analyze objects: instead of a heap dump you can take a dump of the whole process, a core dump. This is much faster: nothing is scanned, the memory is simply written to disk sequentially at full disk speed, after which the application can continue working. Then jmap can be pointed at the resulting core dump. Let me show an example. I have Tomcat running with pid 2362. Let's try jmap in forced mode: it takes a long time; we wait a couple of minutes and interrupt it. Meanwhile, a call to gcore produces the core dump in a second. It occupies 227 MB.
$ sudo gcore 1234
$ jmap -dump:format=b,file=heap.bin /path/to/java core.1234

The application keeps working: your users did not even notice that Tomcat stopped accepting connections for a second.
Then jmap can be pointed at this core file.

It will also run for a long time, because the same mechanism is at work, but now you do not care: the application is running, and the processing happens offline (it is even a bit faster than jmap -F, because the data comes from disk in large portions rather than being pulled out of a remote process word by word through system calls). True, if the dump is inconsistent, jmap -F will not be able to process it in this case either.
When something has gone badly wrong, you want the heap dump to be generated automatically. There is a virtual machine option for this:
-XX:+HeapDumpOnOutOfMemoryError
It is useful in the most common case: an OutOfMemoryError. There is no point waiting for the next GC cycle; it is better to dump the heap immediately for later analysis and restart the application.
You can also dump the heap before and after a full garbage collection, with the -XX:+HeapDumpBeforeFullGC and -XX:+HeapDumpAfterFullGC options.
Optionally, you can specify the path or a particular file name where all the dumps will go (-XX:HeapDumpPath).

Interestingly, all of the listed options are manageable: you can turn them on and off right at runtime, either externally with the jinfo command or from the application through the JMX interface.
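From inside the application this can be done through HotSpotDiagnosticMXBean (a sketch; the option name must be a manageable one, otherwise setVMOption throws):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class ToggleHeapDumpOption {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // HeapDumpOnOutOfMemoryError is manageable, so it can be flipped at runtime
        bean.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        System.out.println(bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }
}
```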

Java 8 update 92 added two more options for automating heap dumps (precisely to keep downtime as small as possible): -XX:+ExitOnOutOfMemoryError and -XX:+CrashOnOutOfMemoryError.
Honestly, the options are useful, of course, but one could have lived without them: nothing prevented you from combining the two existing options (dump the heap on out of memory and then kill the same application):
-XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError="kill -9 %p"
What are the ways to generate a heap dump?
It can be done from Java or from native code, from inside the process or from outside.
From within the process, the easiest way is through the corresponding MXBean:
HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
        ManagementFactory.getPlatformMBeanServer(),
        "com.sun.management:type=HotSpotDiagnostic",
        HotSpotDiagnosticMXBean.class);
bean.dumpHeap("/tmp/heap.bin", true);
Similarly, if a remote JMX interface is exposed, you can connect through it and force a heap dump on the remote virtual machine.
An important point: jmap works only on the local machine, so if you cannot connect via ssh to the production server but need a heap dump, the only way to get one is through the remote JMX interface.
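A sketch of such a remote dump (the host name, port 9010 and the absence of authentication are my assumptions; the target must be started with JMX remote enabled, e.g. -Dcom.sun.management.jmxremote.port=9010):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import com.sun.management.HotSpotDiagnosticMXBean;

public class RemoteHeapDump {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://remote-host:9010/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    conn, "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            // The path is resolved on the remote machine, not locally
            bean.dumpHeap("/tmp/heap.bin", true);
        }
    }
}
```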
JVMTI lets you do the same from the application faster and more flexibly. It has a wonderful IterateOverInstancesOfClass function.

You can tell this function right away that only instances of a particular class or interface should be visited. For example, you want to find objects of just one type, and you have a 16 GB heap.
An interesting point is that the function is used in two stages. By itself the first call returns nothing; it only lets you tag the objects you want to retrieve later. Only then, by calling GetObjectsWithTags, do you get the tagged objects as a jobject array.
My favorite topic is the Serviceability Agent, an API that HotSpot has. It was originally created for JDK and JVM developers, but has found uses among ordinary Java programmers.

There is a sa-jdi.jar file among the Java libraries; this is the Serviceability Agent API. It knows everything about the internals of the virtual machine: what the internal JVM structures are, what lies at which offsets, and there is a Java API to use all of it. In particular, you can connect to another VM process and walk the heap of that process.
Let's look at an example.
Imagine you are a hacker who got access to a server for a short time. You want to extract some information, for example private keys. But if you dump the whole heap, it will be noticed very quickly, because the application stall monitor will fire immediately. With the Serviceability Agent, however, you can walk only the objects of the classes you need in the heap and print them.
The API is very simple.
package demo6;

import sun.jvm.hotspot.oops.DefaultHeapVisitor;
import sun.jvm.hotspot.oops.Klass;
import sun.jvm.hotspot.oops.Oop;
import sun.jvm.hotspot.runtime.VM;
import sun.jvm.hotspot.tools.Tool;

public class KeyScanner extends Tool {
    @Override
    public void run() {
        Klass klass = VM.getVM().getSystemDictionary().find("java/security/PrivateKey", null, null);
        VM.getVM().getObjectHeap().iterateObjectsOfKlass(new DefaultHeapVisitor() {
            @Override
            public boolean doObj(Oop oop) {
                oop.iterate(new FieldPrinter("key"), false);
                return false;
            }
        }, klass);
    }

    public static void main(String[] args) {
        new KeyScanner().execute(args);
    }
}
The whole framework is already prepared (in the form of the Tool class); the utility simply extends this class and calls its execute method with the arguments (even argument parsing is done for you). All that remains is to implement the logic in the run method.
This all works through the Serviceability Agent mechanism: the target process is completely stopped, and the tool communicates with the Java process by reading its memory. When the utility finishes, the process resumes.
Let's try to examine our Tomcat this way.
After a couple of seconds of scanning the heap, the result is ready: all objects of the desired type present in the heap have been printed.

Of course, I cheated a little. The print shows a flat description of the object, while the key itself is inside this byte array.

So this field still has to be visited. For this case I prepared the FieldPrinter class, which walks the object's fields and, if a field's name matches the given fieldName, treats it as a byte array and prints it.
package demo6;

import sun.jvm.hotspot.oops.DefaultOopVisitor;
import sun.jvm.hotspot.oops.OopField;
import sun.jvm.hotspot.oops.TypeArray;

public class FieldPrinter extends DefaultOopVisitor {
    private String fieldName;

    FieldPrinter(String fieldName) {
        this.fieldName = fieldName;
    }

    @Override
    public void doOop(OopField field, boolean isVMField) {
        if (field.getID().getName().equals(fieldName)) {
            TypeArray array = (TypeArray) field.getValue(getObj());
            long length = array.getLength();
            System.out.print(fieldName + ": ");
            for (long i = 0; i < length; i++) {
                System.out.printf("%02x", array.getByteAt(i));
            }
            System.out.println();
        }
    }
}
The output is, in pure form, the private key from the server.
The Serviceability Agent also has all sorts of capabilities that no other API can offer. For example, an exotic case: you need to walk not all objects, but only those in the old generation. Since the Serviceability Agent knows everything about the internal structures, it has an API even for this: you obtain the representation of the heap in the remote Java machine, get the oldGen representation from it, then with the same iterator go through all the objects, check whether each object lies in oldGen, and only then print it.
In this example I will print all the Tomcat strings that are in the old generation:
package demo6;

import sun.jvm.hotspot.gc_implementation.parallelScavenge.PSOldGen;
import sun.jvm.hotspot.gc_implementation.parallelScavenge.ParallelScavengeHeap;
import sun.jvm.hotspot.gc_interface.CollectedHeap;
import sun.jvm.hotspot.oops.DefaultHeapVisitor;
import sun.jvm.hotspot.oops.Klass;
import sun.jvm.hotspot.oops.Oop;
import sun.jvm.hotspot.runtime.VM;
import sun.jvm.hotspot.tools.Tool;

public class OldGen extends Tool {
    @Override
    public void run() {
        CollectedHeap heap = VM.getVM().getUniverse().heap();
        PSOldGen oldGen = ((ParallelScavengeHeap) heap).oldGen();
        Klass klass = VM.getVM().getSystemDictionary().find("java/lang/String", null, null);
        VM.getVM().getObjectHeap().iterateObjectsOfKlass(new DefaultHeapVisitor() {
            @Override
            public boolean doObj(Oop oop) {
                if (oldGen.isIn(oop.getHandle())) {
                    oop.printValue();
                    System.out.println();
                }
                return false;
            }
        }, klass);
    }

    public static void main(String[] args) {
        new OldGen().execute(args);
    }
}
Instead of conclusions:
- use thread dumps and heap dumps in production;
- preferably, of course, in a legitimate way. There are ample APIs for this (for taking dumps from inside and outside the process, with the JVM's help and independently of it).
This concludes the second part of the post.
You will be able to meet Andrey in person very soon in Moscow: on April 7-8 at JPoint 2017. This time he will give the talk "JVM profiler with a sense of tact", in which he will explain how to obtain honest measurements of application performance by combining several approaches to profiling. The talk is "only" an hour long, but in the discussion area no one will keep you from questions and heated debates!
In addition, JPoint has a whole bunch of cool talks on almost everything in the Java world: we gave an overview of the planned talks in another post, and you can simply look at the conference program on the event website.