The Trove Library: Primitive Type Collections in Java
The standard Java library has no collections for primitive types such as int, long, etc. The usual workaround is to use objects of the wrapper classes Integer, Long, etc.
This approach works well only for small numbers of elements: firstly, autoboxing/unboxing happens on every operation, and secondly, the collection stores references to objects on the heap. Heap objects not only add memory overhead, they also put load on the GC.
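For reference, a minimal sketch of what this standard approach looks like; the class and loop here are mine, for illustration only:

import java.util.ArrayList;
import java.util.List;

public class BoxingExample {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < 1000000; i++) {
            list.add(i);      // autoboxing: each int becomes an Integer object on the heap
        }
        long sum = 0;
        for (Integer v : list) {
            sum += v;         // unboxing: Integer back to int on every access
        }
        System.out.println(sum);
    }
}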
There is another, less obvious downside of heap objects: caching in modern processors. The processor loads data into the cache in blocks, so when an array is processed sequentially, several elements end up in the cache at once. With objects scattered around the heap, there will be fewer cache hits. Read more about caching and the memory hierarchy here.
The Trove library provides a standard collection interface for working with primitive types. For most applications, Trove collections are faster and consume less memory.
The set of collections includes (a short usage sketch follows the list):
- List - ArrayList and LinkedList
- Map - HashMap
- Set - HashSet
- Stack - ArrayStack
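A rough mapping of these to concrete classes, assuming the Trove 3.x package layout; the exact package names are an assumption and may differ between versions:

import gnu.trove.list.array.TIntArrayList;
import gnu.trove.list.linked.TIntLinkedList;
import gnu.trove.map.hash.TLongLongHashMap;
import gnu.trove.set.hash.TIntHashSet;
import gnu.trove.stack.array.TIntArrayStack;

public class TroveCollectionsExample {
    public static void main(String[] args) {
        TIntArrayList arrayList = new TIntArrayList();    // List of int backed by an int[]
        TIntLinkedList linkedList = new TIntLinkedList(); // linked List of int
        TLongLongHashMap map = new TLongLongHashMap();    // Map with long keys and long values
        TIntHashSet set = new TIntHashSet();              // Set of int
        TIntArrayStack stack = new TIntArrayStack();      // Stack of int backed by an array

        arrayList.add(1);
        linkedList.add(2);
        map.put(1L, 2L);
        set.add(3);
        stack.push(4);

        System.out.println(arrayList.get(0) + " " + map.get(1L) + " " + set.contains(3) + " " + stack.pop());
    }
}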
Unlike the JDK, Trove's hash-based collections use open addressing for collision resolution.
The naming principle is T + primitive type + collection type.
For example:
The TIntList interface is a List of int; TIntArrayList is an implementation:
TIntList l = new TIntArrayList();
TLongLongMap is a Map interface with long keys and long values; TLongLongHashMap is an implementation:
TLongLongMap m = new TLongLongHashMap();
In JDK collections, null is returned when an element is not found. Trove instead returns the collection's "no entry value", which is 0 by default; you can query it and set it when creating the collection.
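A small sketch of this behaviour; the four-argument constructor (initial capacity, load factor, no-entry key, no-entry value) and getNoEntryValue() follow the Trove 3 API and should be checked against your version:

import gnu.trove.map.TLongLongMap;
import gnu.trove.map.hash.TLongLongHashMap;

public class NoEntryValueExample {
    public static void main(String[] args) {
        TLongLongMap m = new TLongLongHashMap();
        // a missing key cannot return null for a primitive long,
        // so the map returns its "no entry value", 0 by default
        System.out.println(m.get(42L));          // prints 0
        System.out.println(m.getNoEntryValue()); // prints 0

        // the no entry value can be set when the collection is created:
        // (initial capacity, load factor, no-entry key, no-entry value)
        TLongLongMap m2 = new TLongLongHashMap(10, 0.5f, -1L, -1L);
        System.out.println(m2.get(42L));         // prints -1
    }
}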
An advantage of Trove collections is their processing methods, such as forEach:
public static long troveEach() {
    final long[] rvalue = {0};
    // troveMap - TLongLongHashMap
    // TLongProcedure will be called for every element of the collection
    // until it returns false
    // or there are no elements left
    troveMap.forEachValue(new TLongProcedure() {
        @Override
        public boolean execute(long value) {
            rvalue[0] += value;
            return true;
        }
    });
    return rvalue[0];
}
as well as grep and inverseGrep, which build a new list from the elements that match (or do not match) a condition (for TList implementations), and transformValues, which changes the collection's elements in place.
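A hedged sketch of grep and transformValues on a TIntArrayList; the method and interface names follow the Trove 3 API:

import gnu.trove.function.TIntFunction;
import gnu.trove.list.TIntList;
import gnu.trove.list.array.TIntArrayList;
import gnu.trove.procedure.TIntProcedure;

public class GrepTransformExample {
    public static void main(String[] args) {
        TIntList list = new TIntArrayList(new int[]{1, 2, 3, 4, 5});

        // grep builds a new list from the elements for which the procedure returns true
        TIntList even = list.grep(new TIntProcedure() {
            @Override
            public boolean execute(int value) {
                return value % 2 == 0;
            }
        });
        System.out.println(even); // {2, 4}

        // transformValues replaces every element in place, no copy is made
        list.transformValues(new TIntFunction() {
            @Override
            public int execute(int value) {
                return value * 10;
            }
        });
        System.out.println(list); // {10, 20, 30, 40, 50}
    }
}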
A useful feature: for hash maps / hash sets with an object (a descendant of Object) as the key, you can plug in your own hash function via the HashingStrategy interface.
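For illustration, a sketch with a custom HashingStrategy that compares byte[] keys by content rather than by reference; the class locations (gnu.trove.strategy.HashingStrategy, gnu.trove.map.hash.TCustomHashMap) are my assumption of the Trove 3 layout:

import java.util.Arrays;

import gnu.trove.map.hash.TCustomHashMap;
import gnu.trove.strategy.HashingStrategy;

public class CustomHashExample {
    // compare byte[] keys by content instead of by reference
    static class ByteArrayStrategy implements HashingStrategy<byte[]> {
        @Override
        public int computeHashCode(byte[] object) {
            return Arrays.hashCode(object);
        }

        @Override
        public boolean equals(byte[] o1, byte[] o2) {
            return Arrays.equals(o1, o2);
        }
    }

    public static void main(String[] args) {
        TCustomHashMap<byte[], String> map =
                new TCustomHashMap<byte[], String>(new ByteArrayStrategy());
        map.put(new byte[]{1, 2, 3}, "value");
        // found, because the strategy compares array contents, not references
        System.out.println(map.get(new byte[]{1, 2, 3}));
    }
}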
For benchmarking, we used the excellent jmh benchmark tool.
It would be great if its developers published it in the Maven repository.
The output has been lightly edited to preserve formatting; one operation = 1 million elements (full report here):
$ java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
$ java -server -XX:+AggressiveOpts -Xms2048m \
-Xmx2048m -jar microbenchmarks.jar ".*Trove.*" -prof gc -i 3 -r 5s
Benchmark                  Mode Thr Cnt Sec      Mean Mean error Units
// Insert into List
IntListJdkInsert           thrpt  1   3   5   104.950      6.756 ops/sec
// Full traversal of List
IntListJdkTraverse         thrpt  1   3   5   774.100     71.809 ops/sec
// Insert into TIntArrayList
IntListTroveInsert         thrpt  1   3   5   424.556     28.239 ops/sec
// Full traversal of TIntArrayList
IntListTroveTraverse       thrpt  1   3   5  3548.806      7.712 ops/sec
// Insert into HashMap
LongHashMapJdkInsert       thrpt  1   3   5    24.683      1.994 ops/sec
// Search for every element in the HashMap in turn
LongHashMapJdkSearch       thrpt  1   3   5    67.789      1.119 ops/sec
// Full traversal of HashMap values
LongHashMapJdkTraverse     thrpt  1   3   5    99.761      0.882 ops/sec
// Insert into TLongLongMap
LongHashMapTroveInsert     thrpt  1   3   5    28.750      0.165 ops/sec
// Search for every element in the TLongLongMap in turn
LongHashMapTroveSearch     thrpt  1   3   5   145.933      0.416 ops/sec
// Full traversal of TLongLongMap values via forEach
LongHashMapTroveTraverse   thrpt  1   3   5   318.528      0.980 ops/sec
The amount of memory occupied is not so easy to measure directly, but an indirect conclusion can be drawn from GC activity:
JDK insert into List:
Iteration 1 (5s in 1 thread): 103,950 ops/sec
GC | wall time = 5,002 secs, GC time = 0,331 secs, GC% = 6,62%, GC count = +24
Trove insert into TIntArrayList:
Iteration 1 (5s in 1 thread): 428,400 ops/sec
GC | wall time = 5,002 secs, GC time = 0,019 secs, GC% = 0,38%, GC count = +32
If you look at the sources of TIntArrayList, it will become clear where the performance gain comes from - the data is stored as an array:
public class TIntArrayList implements TIntList, Externalizable {
static final long serialVersionUID = 1L;
/** the data of the list */
protected int[] _data;
The TLongLongMap traversal speed is explained by the use of forEach: no temporary objects are created and unboxing is avoided.
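For contrast, a sketch of what the JDK-side traversal looks like (my illustration, not the benchmark source); every iteration goes through boxed Long objects:

import java.util.HashMap;
import java.util.Map;

public class JdkTraverseExample {
    public static void main(String[] args) {
        Map<Long, Long> jdkMap = new HashMap<Long, Long>();
        for (long i = 0; i < 1000; i++) {
            jdkMap.put(i, i);   // both key and value are boxed into Long objects
        }
        long sum = 0;
        for (Long value : jdkMap.values()) {
            sum += value;       // unboxing on every element
        }
        System.out.println(sum);
    }
}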
The same benchmark, but with one operation = 1 thousand elements:
Benchmark                  Mode Thr Cnt Sec        Mean Mean error Units
IntListJdkInsert           thrpt  1   3   5  239478.011    871.469 ops/sec
IntListJdkTraverse         thrpt  1   3   5 1326701.717   1649.389 ops/sec
IntListTroveInsert         thrpt  1   3   5  315562.594   2483.415 ops/sec
IntListTroveTraverse       thrpt  1   3   5 3630599.806  10822.903 ops/sec
LongHashMapJdkInsert       thrpt  1   3   5   45315.689     47.630 ops/sec
LongHashMapJdkSearch       thrpt  1   3   5  114759.789    424.996 ops/sec
LongHashMapJdkTraverse     thrpt  1   3   5  210012.550    139.001 ops/sec
LongHashMapTroveInsert     thrpt  1   3   5   33078.583    119.127 ops/sec
LongHashMapTroveSearch     thrpt  1   3   5  148311.567    267.613 ops/sec
LongHashMapTroveTraverse   thrpt  1   3   5  351955.550    901.902 ops/sec
It can be seen that as the number of elements in the collection decreases, the performance gap narrows.
The conclusion: if the collections are small and used infrequently, there is little point in adding an extra dependency to the project.
But if the project makes heavy use of large collections of primitive types, using Trove can be justified.
Project code is available on GitHub.
P.S. The initial sizes of the collections were intentionally not specified at creation time.