Library Trove. Primitive Type Collections in Java

    The standard Java library lacks the ability to handle collections of primitive types such as int, long, etc. The standard way out is to use objects of classes Integer, Long, etc.

    This approach works well on a small number of elements, because, firstly, autoboxing / autounboxing occurs during any operation, and secondly, references to objects in heap are stored in the collection. Objects in heap not only introduce additional overhead from memory, but also create a load on the GC .

    There is another unobvious minus of objects in heap - caching in modern processors. The processor loads data into the cache in blocks. In the case of sequential processing of the array, several elements are loaded into the cache at once. In the case of objects scattered on heap, hits in the cache will be less. Read more about caching and memory hierarchy here .

    The Trove library provides a standard collection interface for working with primitive types. For most applications, Trove collections are faster and consume less memory.

    The collection set includes:

    Unlike jdk, Trove Hash uses the Open addressing collision resolution method.

    Naming Principle - T.
    For example:
    TIntList interface - List of int, implementation of TIntArrayList:
    TIntList l = new TIntArrayList();
    

    TLongLongMap - Map interface with long keys and long values, TLongLongHashMap implementation:
    TLongLongMap m = new TLongLongHashMap();
    

    In jdk collections, if the element is not found, null is returned. “NoEntryValue” returns to Trove, the default is 0. You can learn NoEntryValue , set NoEntryValue when creating a collection.

    The advantage of Trove collections is the processing methods - forEach,
        public static long troveEach() {
            final long [] rvalue = {0};
    // troveMap - TLongLongHashMap
    // TLongProcedure будет вызываться для каждого элемента коллекции, 
    // пока не вернет false
    // или не кончатся элементы
            troveMap.forEachValue(new TLongProcedure() {
                @Override
                public boolean execute(long value) {
                    rvalue[0] += value;
                    return true;
                }
            });
            return rvalue[0];
        }
    

    grep , inverseGrep - conditional list compilation (for TList ...) and transformValues - inplace change operations on collection elements.

    A useful feature - in the case of a HashMap / HashSet with an object (the descendant of Object) as a key, you can use your hash function Interface HashingStrategy.

    For benchmarking, we used the excellent benchmark tool jmh .
    It would be great if the developers published it in the maven repository.

    The output had to be slightly edited to maintain formatting, one operation - 1 million elements ( full report here ):
    $ java -version
    java version "1.7.0_21"
    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
    $ java -server -XX:+AggressiveOpts -Xms2048m \
      -Xmx2048m -jar microbenchmarks.jar ".*Trove.*" -prof gc -i 3 -r 5s
    

    Benchmark Mode Thr Cnt Sec Mean Mean error Units
    // Insert To List
    IntListJdkInsert thrpt 1 3 5 104.950 6.756 ops / sec
    // Complete enumeration List
    IntListJdkTraverse thrpt 1 3 5 774.100 71.809 ops / sec
    // Insert into TIntArrayList
    IntListTroveInsert thrpt 1 3 5 424.556 28.239 ops / sec
    // Full enumeration TIntArrayList
    IntListTroveTraverse thrpt 1 3 5 3548.806 7.712 ops / sec
    // Insert In HashMap
    LongHashMapJdkInsert thrpt 1 3 5 24.683 1.994 ops / sec
    // search for all elements in the HashMap take turns
    LongHashMapJdkSearch thrpt 1 3 5 67.789 1.119 ops / sec
    // full enumeration of HashMap values
    LongHashMapJdkTraverse thrpt 1 3 5 99.761 0.882 ops / sec
    // Insert In TLongLongMap
    LongHashMapTroveInsert thrpt 1 3 5 28.750 0.165 ops / sec
    // search for all elements in TLongLongMap in turn
    LongHashMapTroveSearch thrpt 1 3 5 145.933 0.416 ops / sec
    // full enumeration of TLongLongMap values ​​using forEach
    LongHashMapTroveTraverse thrpt 1 3 5 318.528 0.980 ops / sec
    

    The amount of occupied memory is not so easy to calculate, but an indirect conclusion can be drawn from the GC activity:
    Вставка jdк в List:
    Iteration   1 (5s in 1 thread): 103,950 ops/sec
     GC | wall time = 5,002 secs,  GC time = 0,331 secs, GC% = 6,62%, GC count = +24
    Вставка Trove в TIntArrayList:
    Iteration   1 (5s in 1 thread): 428,400 ops/sec
      GC | wall time = 5,002 secs,  GC time = 0,019 secs, GC% = 0,38%, GC count = +32
    

    If you look at the sources of TIntArrayList, it will become clear where the performance gain comes from - the data is stored as an array:
    public class TIntArrayList implements TIntList, Externalizable {
    	static final long serialVersionUID = 1L;
        /** the data of the list */
        protected int[] _data;
    

    TLongLongMap search speed is explained using forEach - no temporary objects are created and unboxing is excluded.

    The same benchmark, but one operation - 1 thousand elements:
    Benchmark Mode Thr Cnt Sec Mean Mean error Units
    IntListJdkInsert thrpt 1 3 5 239478.011 871.469 ops / sec
    IntListJdkTraverse thrpt 1 3 5 1326701.717 1649.389 ops / sec
    IntListTroveInsert thrpt 1 3 5 315562.594 2483.415 ops / sec
    IntListTroveTraverse thrpt 1 3 5 3630599.806 10822.903 ops / sec
    LongHashMapJdkInsert thrpt 1 3 5 45315.689 47.630 ops / sec
    LongHashMapJdkSearch thrpt 1 3 5 114759.789 424.996 ops / sec
    LongHashMapJdkTraverse thrpt 1 3 5 210012.550 139.001 ops / sec
    LongHashMapTroveInsert thrpt 1 3 5 33078.583 119.127 ops / sec
    LongHashMapTroveSearch thrpt 1 3 5 148311.567 267.613 ops / sec
    LongHashMapTroveTraverse thrpt 1 3 5 351955.550 901.902 ops / sec
    

    It can be seen that with a reduction in the number of elements in the collection, the gap in performance falls.

    It can be concluded that if the collections are small and are used infrequently, then there is no particular sense in including an additional dependence in the project.
    But if the project massively uses large collections of primitive types, then the use of Trove can be justified.

    Project code is available on GitHub .

    PS. The values ​​of the number of elements when creating collections were not set intentionally.

    Also popular now: