How not to litter in java

From the sandbox

There is a popular misconception that if you don’t like the garbage collection, you should write not in Java, but in C / C ++. For the last three years I have been writing low-latency Java code for currency trading, and I had to avoid creating unnecessary objects in every possible way. As a result, I formulated for myself a few simple rules on how to reduce allocations in Java, if not to zero, then to a certain reasonable minimum, without resorting to manual memory management. Perhaps someone from the community will also benefit.

Why avoid creating garbage at all

About what are the GC and how to customize them said and written a lot. But ultimately, no matter how you customize the GC - the code that litters will work suboptimally. There is always a trade-off between throughput and latency. It becomes impossible to improve one without deteriorating the other. As a rule, GC overheads are measured by studying logs - you can see from them at what time there were pauses and how long they took. However, the GC logs contain far from all the information about these overheads. The object created by the thread is automatically placed in the L1 cache of the processor core on which the thread is running. This leads to crowding out of other potentially useful data. With a large number of allocations, payload can be pushed out of the L3 cache. The next time the thread accesses this data, the miss cache will occur, which will lead to delays in the execution of the program. Moreover, since the L3 cache is common to all cores within the same processor, the trash will push data and other threads / applications out of the L3 cache, and they will already be confronted with extra expensive cache missas, even if they themselves are written in C and do not create garbage. No settings, no garbage collectors (neither C4 nor ZGC) will help to cope with this problem. The only way to improve the situation as a whole is not to create unnecessary objects unnecessarily. Java, unlike C ++, does not have a rich arsenal of mechanisms for working with memory, but nevertheless there are a number of ways to minimize allocations. About them will be discussed. The trash stream will push data and other threads / applications out of the L3 cache, and they will already be confronted with extra expensive cache missas, even if they themselves are written in bare C and do not create garbage. No settings, no garbage collectors (neither C4 nor ZGC) will help to cope with this problem. The only way to improve the situation as a whole is not to create unnecessary objects unnecessarily. Java, unlike C ++, does not have a rich arsenal of mechanisms for working with memory, but nevertheless there are a number of ways to minimize allocations. About them will be discussed. The trash stream will push data and other threads / applications out of the L3 cache, and they will already be confronted with extra expensive cache missas, even if they themselves are written in bare C and do not create garbage. No settings, no garbage collectors (neither C4 nor ZGC) will help to cope with this problem. The only way to improve the situation as a whole is not to create unnecessary objects unnecessarily. Java, unlike C ++, does not have a rich arsenal of mechanisms for working with memory, but nevertheless there are a number of ways to minimize allocations. About them will be discussed. neither ZGC) will help to cope with this problem. The only way to improve the situation as a whole is not to create unnecessary objects unnecessarily. Java, unlike C ++, does not have a rich arsenal of mechanisms for working with memory, but nevertheless there are a number of ways to minimize allocations. About them will be discussed. neither ZGC) will help to cope with this problem. The only way to improve the situation as a whole is not to create unnecessary objects unnecessarily. Java, unlike C ++, does not have a rich arsenal of mechanisms for working with memory, but nevertheless there are a number of ways to minimize allocations. About them will be discussed.

Lyrical digression

Разумеется, не нужно писать весь код в стиле garbage free. Фишка языка Java как раз в том, что можно сильно упростить себе жизнь, убирая только основные источники мусора. Можно также не заниматься safe memory reclamation при написании lock-free алгоритмов. Если некий код выполняется только один раз при старте приложения, то он может аллоцировать сколько угодно, и это не страшно. Ну и разумеется, основной рабочий инструмент при избавлении от лишнего мусора — это allocation profiler.

Using primitive types

The simplest thing to do in many cases is to use primitive types instead of object ones. The JVM has a number of optimizations that minimize the overhead of object types, such as caching small values of integer types and inlining simple classes. But these optimizations are not always worth relying on, because they may not work: the integer value may not be cached, and inlineing may not occur. Moreover, when working with a conditional Integer, we are forced to follow the link, which potentially leads to a cache mission. Also, all objects have headers that take up extra space in the cache, displacing other data from there. Let's assume that a primitive int is 4 bytes. ObjectIntegeroccupies 16 bytes + the size of the link to this Integer is 4 bytes minimum (in the case of compressed oops). In total, it turns out that it Integertakes five (!) Times more space than int. Therefore, it is better to personally use primitive types. I will give a few examples.

Example 1. Conventional calculations

Suppose we have a regular function that just counts something.

Integer getValue(Integer a, Integer b, Integer c){
   return (a + b) / c;
}

Such code is most likely to be inlined (both the method and the classes) and will not lead to redundant allocations, but you cannot be sure of this. Even if this happens, there will be a problem with the fact that it can fly out of here NullPointerException. One way or another, the JVM will either have to put checks on nullunder the hood, or somehow understand from the context that nullit cannot come as an argument. Anyway, it is better to just write the same code in primitives.

intgetValue(int a, int b, int c){
   return (a + b) / c;
}

Example 2. Lambda

Sometimes objects are created without our knowledge. For example, if we transfer primitive types to where object is expected. This often happens when using lambda expressions.
Imagine that we have the following code:

voidcalculate(Consumer<Integer> calculator){
   int x = System.currentTimeMillis();
   calculator.accept(x);
}

Despite the fact that the variable x is a primitive, an object of type Integer will be created that will be passed to the calculator. To avoid this, you should use IntConsumerinstead of Consumer<Integer>:

voidcalculate(IntConsumer calculator){
   int x = System.currentTimeMillis();
   calculator.accept(x);
}

This code will not lead to the creation of an extra object. In java.util.function there are a set of standard interfaces adapted for use primitive types DoubleSupplier, LongFunctionetc. Well, if something is missing, you can always add the necessary interface with primitives. For example, instead BiConsumer<Integer, Double>you can use a homemade interface.

interfaceIntDoubleConsumer{
    voidaccept(int x, double y);
}

Example 3. Collections

The use of a primitive type can be complicated by the fact that a variable of this type lies in a certain collection. Suppose that we have a certain one List<Integer>and we want to find out what numbers it contains and count how many times each number repeats. For this we use HashMap<Integer, Integer>. The code looks like this:

List<Integer> numbers = new ArrayList<>();
// fill numbers somehow
Map<Integer, Integer> counters = new HashMap<>();
for (Integer x : numbers) {
    counters.compute(x, (k, v) -> v == null ? 1 : v + 1);
}

This code is bad in several ways. First, it uses an intermediate data structure, without which one could probably do without. Well, okay, for simplicity, we assume that this list will be needed later, i.e. it can not be completely removed. Secondly, in both places the object is used Integerinstead of the primitive int. Thirdly, many allocations occur in the method compute. Fourth, the iterator is allocated. This allocation is most likely to be inline, but nonetheless. How to turn this code into a garbage free code? You just need to use the collection on primitives from some third-party library. There are a number of libraries containing such collections. The following piece of code uses the agrona library .

IntArrayList numbers = new IntArrayList();
// fill numbers somehow
Int2IntCounterMap counters = new Int2IntCounterMap(0);
for (int i = 0; i < numbers.size(); i++) {
    counters.incrementAndGet(numbers.getInt(i));
}

The objects that are created here are two collections and two int[]that are inside these collections. Both collections can be reused by calling their method clear(). Using collections on primitives, we did not complicate our code (and even simplified by removing the compute method with a complex lambda inside it) and received the following additional bonuses compared to using standard collections:

Almost complete lack of allocations. If the collection is reused, then there will be no allocation at all.
Significant memory savings ( IntArrayListtakes about five times less space than ArrayList<Integer>. As already mentioned, we care about the economical use of processor caches, and not RAM.
Sequential memory access. A lot has been written about why this is important, so I will not stop there. Here are a couple of articles: Martin Thompson and Ulrich Drepper .

Another small comment about the collections. It may be that the collection contains values of different types, and therefore it cannot be replaced with a collection with primitives. In my opinion, this is a sign of a poorly designed data structure or algorithm as a whole. Most likely in this case, the allocation of unnecessary objects is not the main problem.

Mutable objects

And what to do if primitives do not work? For example, if the method we need should return several values. The answer is simple - use mutable objects.

Small retreat

В некоторых языках делается упор на использование immutable объектов, например в Scala. Основной аргумент в их пользу заключается в том, что сильно упрощается написание многопоточного кода. Тем не менее, имеются и накладные расходы, связанные с избыточной аллокацией мусора. Если мы хотим этого их избежать, то нам не следует создавать короткоживущие immutable объекты.

How does it look in practice? Suppose we need to calculate the quotient and the remainder of the division. And for this we use the following code.

classIntPair{
   int x;
   int y;
}
IntPair divide(int value, int divisor){
   IntPair result = new IntPair();
   result.x = value / divisor;
   result.y = value % divisor;
   return result;
}

How can I get rid of allocation in this case? That's right, pass IntPairin the argument and write the result there. In this case, you need to write a detailed javadoc, and even better to use some kind of convention for the names of variables where the result is written. For example, you can start with the prefix out. Garbage free code in this case will look like this:

voiddivide(int value, int divisor, IntPair outResult){
   outResult.x = value / divisor;
   outResult.y = value % divisor;
}

I want to note that the method divideshould not save the link to the pair anywhere or pass it to the methods that can do this, otherwise we may have big problems. As we see, mutable objects are harder to use than primitive types, so if there is an opportunity to use primitives, then it is better to do so. In fact, in our example, we transferred the problem with allocation from the inside of the divide method to the outside. In all the places where we call this method, we will need to have some kind of dummy IntPairthat we will pass to divide. It is often enough to store this dummy in the finalfield of the object from which we call the method divide. Let me give you a far-fetched example: suppose that our program deals only with the fact that it receives a stream of numbers over the network, divides them and sends the result to the same socket.

classSocketListener{
   privatefinal IntPair pair = new IntPair();
   privatefinal BufferedReader in;
   privatefinal PrintWriter out;
   SocketListener(final Socket socket) throws IOException {
       in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
       out = new PrintWriter(socket.getOutputStream(), true);
   }
   voidlistenSocket()throws IOException {
       while (true) {
           int value = in.read();
           int divisor = in.read();
           divide(value, divisor, pair);
           out.print(pair.x);
           out.print(pair.y);
       }
   }
}

For brevity, I did not write “extra” code for error handling, correct program shutdown, etc. The main idea of this piece of code is that the object we use is IntPaircreated once and saved in the finalfield.

Object Pools

When we use mutable objects, we must first get an empty object from somewhere, then write the data we need into it, use it somewhere, and then return the object “in place”. In the above example, the object was always “in place”, i.e. atfinalfield. Unfortunately, it is not always possible to do it in a simple way. For example, we may not know in advance exactly how many objects we need. In this case, object pools come to our rescue. When we need an empty object, we get it from the object pool, and when it ceases to be needed, we return it there. If there is no free object in the pool, the pool creates a new object. This is in fact a manual memory management with all the ensuing consequences. It is desirable not to resort to this method if it is possible to use the previous methods. What can go wrong?

We can forget to return the object to the pool, and then create garbage ("memory leak"). This is a small problem - the performance will slightly decrease, but the GC will work and the program will continue to work.
We can return the object to the pool, but save the link to it somewhere. Then someone else will get an object from the pool, and at this point in our program there will already be two links to the same object. This is a classic use-after-free problem. It is difficult to debug, because unlike C ++, the program will not fall with a segfolt, but will continue to work incorrectly .

In order to reduce the likelihood of making the errors described above, you can use the standard try-with-resources construction. It may look like this:

publicinterfaceStorage<T> {
   T get();
   voiddispose(T object);
}
classIntPairimplementsAutoCloseable{
    privatestaticfinal Storage<IntPair> STORAGE = new StorageImpl(IntPair::new);
    int x;
    int y;
    privateIntPair(){}
    publicstatic IntPair create(){
        return STORAGE.get();
    }
    @Overridepublicvoidclose(){
        STORAGE.dispose(this);
    }
}

The divide method might look like this:

IntPair divide(int value, int divisor){
    IntPair result = IntPair.create();
    result.x = value / divisor;
    result.y = value % divisor;
    return result;
}

And the method is listenSocketlike this:

voidlistenSocket()throws IOException {
    while (true) {
        int value = in.read();
        int divisor = in.read();
        try (IntPair pair = divide(value, divisor)) {
            out.print(pair.x);
            out.print(pair.y);
        }
    }
}

In the IDE, as a rule, you can customize the highlighting of all use cases of AutoCloseableobjects outside the try-with-resources block. But this is not an absolute option, because Highlighting in the IDE can simply be turned off. Therefore, there is another way to ensure that the object is returned to the pool - inversion of control. I will give an example:

classIntPairimplementsAutoCloseable{
    privatestaticfinal Storage<IntPair> STORAGE = new StorageImpl(IntPair::new);
    int x;
    int y;
    privateIntPair(){}
    privatestaticvoidapply(Consumer<IntPair> consumer){
        try(IntPair pair = STORAGE.get()) {
            consumer.accept(pair);
        }
    }
    @Overridepublicvoidclose(){
        STORAGE.dispose(this);
    }
}

In this case, in principle, we cannot access the class object IntPairfrom the outside. Unfortunately, this method does not always work either. For example, it will not work if one thread pulls objects from the pool and puts them in a queue, while another thread pulls them out of the queue and returns to the pool.

Obviously, if we store in the pool the self-written objects, but some library objects that do not implement AutoCloseable, then the variant with try-with-resources also does not work.

An additional problem here is multithreading. The implementation of the object pool must be very fast, which is quite difficult to achieve. A slow pool can do more harm for performance than good. In turn, the allocation of new objects in TLAB occurs very quickly, much faster than malloc in C. Writing a fast object pool is a separate topic that I would not like to develop now. I can only say that I did not see any good "ready-made" implementations.

Instead of conclusion

In short, reusing objects with object pools is a serious hemorrhoid. Fortunately, almost always you can do without it. My personal experience suggests that excessive use of object pools signals problems with the application architecture. As a rule, we only need one instance of the object cached in the finalfield. But even this is overkill, if it is possible to use primitive types.

Update:

Yes, I remembered another way for those who are not afraid of bitwise shifts: packing several small primitive types into one big one. Suppose we need to return two int. In this particular case, you can not use the object IntPair, and return one long, the first 4 bytes in which will correspond to the first int'y, and the second 4 bytes - to the second. The code might look like this:

longcombine(int left, int right){
   return ((long)left << Integer.SIZE) | (long)right & 0xFFFFFFFFL;
}
intgetLeft(long value){
   return (int)(value >>> Integer.SIZE);
}
intgetRight(long value){
   return (int)value;
}
longdivide(int value, int divisor){
    int x = value / divisor;
    int y = value % divisor;
    return combine(left, right);
}
voidlistenSocket()throws IOException {
    while (true) {
        int value = in.read();
        int divisor = in.read();
        long xy = divide(value, divisor);
        out.print(getLeft(xy));
        out.print(getRight(xy));
    }
}

Of course, such methods need to be thoroughly tested, because it is quite easy to jiggle when writing them. But then just use it.

Tags: