How ConcurrentHashMap Works

    In October, a great article about how HashMap works appeared on Habr. Continuing that topic, I am going to talk about the implementation of java.util.concurrent.ConcurrentHashMap.
    So: how did ConcurrentHashMap come about, what advantages does it have, and how is it implemented?

    Prerequisites for creating ConcurrentHashMap


    Prior to the introduction of ConcurrentHashMap in JDK 1.5, there were several ways to get a hash table in Java.
    Initially, JDK 1.0 had the Hashtable class. Hashtable is a thread-safe and easy-to-use hash table implementation. The problem with Hashtable was, first of all, that any access to the table's elements locked it completely: all Hashtable methods were synchronized. This was a serious limitation in a multi-threaded environment, since the cost of locking the entire table was very high.
    JDK 1.2 added HashMap and its thread-safe wrapper, Collections.synchronizedMap, alongside Hashtable. There were several reasons for this separation:
    • Not every programmer or every solution needs a thread-safe hash table
    • Programmers needed to be able to choose the option that suited their task

    Thus, with JDK 1.2, the list of hash map options in Java grew by two more. However, these options did not save developers from race conditions in their code, which could lead to a ConcurrentModificationException; the possible causes of such exceptions in code deserve a separate discussion.
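    As a minimal sketch (names and values here are illustrative, not from any particular codebase): Collections.synchronizedMap locks the wrapper on every method call, but iteration is not covered by that lock, so the caller must synchronize manually or risk a ConcurrentModificationException.

    ```java
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class SynchronizedMapDemo {
        public static void main(String[] args) {
            // Wrap a plain HashMap: every method call locks the whole wrapper.
            Map<String, Integer> map =
                    Collections.synchronizedMap(new HashMap<String, Integer>());
            map.put("a", 1);
            map.put("b", 2);

            // Iteration is NOT protected by the wrapper's internal lock:
            // without this synchronized block, a concurrent modification
            // by another thread could cause ConcurrentModificationException.
            synchronized (map) {
                for (Map.Entry<String, Integer> e : map.entrySet()) {
                    System.out.println(e.getKey() + "=" + e.getValue());
                }
            }
        }
    }
    ```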
    Finally, JDK 1.5 brought a more performant and scalable option.

    ConcurrentHashMap


    By the time ConcurrentHashMap appeared, Java developers needed a hash map implementation with the following properties:
    • Thread safety
    • No locks on the entire table during access
    • Ideally, no table locks at all when performing read operations

    Doug Lea presented an implementation of such a data structure, and it was included in JDK 1.5.
    What are the main ideas for implementing ConcurrentHashMap?

    1. Map elements

    Unlike HashMap's elements, an Entry in ConcurrentHashMap declares its value field as volatile. This is an important feature, related to the changes in the Java Memory Model. Doug Lea's explanation of why volatile is needed and what race conditions are possible can be read here.

    static final class HashEntry<K,V> {
        final K key;
        final int hash;
        volatile V value;
        final HashEntry<K,V> next;

        HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
            this.key = key;
            this.hash = hash;
            this.next = next;
            this.value = value;
        }

        @SuppressWarnings("unchecked")
        static final <K,V> HashEntry<K,V>[] newArray(int i) {
            return new HashEntry[i];
        }
    }
    

    2. Hash function

    ConcurrentHashMap also uses an improved hash function.
    Let me remind you what it was like in HashMap from JDK 1.2:

    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
    

    Version from ConcurrentHashMap JDK 1.5:

    private static int hash(int h) {
        h += (h << 15) ^ 0xffffcd7d;
        h ^= (h >>> 10);
        h += (h << 3);
        h ^= (h >>> 6);
        h += (h << 2) + (h << 14);
        return h ^ (h >>> 16);
    }
    

    Why complicate the hash function? Tables in a hash map have a power-of-two length, so the bucket index is taken from the low-order bits of the hash code. Hash codes whose binary representations differ only in the high-order bits would therefore always collide. The more elaborate hash function solves exactly this problem by spreading the bits, reducing the likelihood of collisions in the map.
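    A small illustration (the hash values are arbitrary, chosen for the example): with a table of length 16 the bucket index is simply hash & 15, so two hash codes differing only in their upper bits land in the same bucket unless they are re-hashed first.

    ```java
    public class HashSpreadDemo {
        // The JDK 1.5 ConcurrentHashMap re-hash function.
        private static int hash(int h) {
            h += (h << 15) ^ 0xffffcd7d;
            h ^= (h >>> 10);
            h += (h << 3);
            h ^= (h >>> 6);
            h += (h << 2) + (h << 14);
            return h ^ (h >>> 16);
        }

        public static void main(String[] args) {
            int length = 16;          // power-of-two table length
            int h1 = 0x1000ABCD;      // these two differ only in upper bits
            int h2 = 0x2000ABCD;

            // Without re-hashing, both keys fall into the same bucket:
            System.out.println((h1 & (length - 1)) == (h2 & (length - 1))); // true

            // After re-hashing, the upper bits influence the index as well:
            System.out.println(hash(h1) & (length - 1));
            System.out.println(hash(h2) & (length - 1));
        }
    }
    ```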

    3. Segments

    The map is divided into N independent segments (16 by default; the maximum is 2^16, and the value must be a power of two). Each segment is a thread-safe hash table of map elements.
    The segment for a given key is determined from the upper bits of its hash code by applying a bit mask.
    Here's how the elements are stored in the map:

        final Segment<K,V>[] segments;
        transient Set<K> keySet;
        transient Set<Map.Entry<K,V>> entrySet;
        transient Collection<V> values;
    

    Let's see what the Segment class looks like:

    static final class Segment<K,V> extends ReentrantLock
                        implements Serializable {
        private static final long serialVersionUID = 2249069246763182397L;
        transient volatile int count;
        transient int modCount;
        transient int threshold;
        transient volatile HashEntry<K,V>[] table;
        final float loadFactor;

        Segment(int initialCapacity, float lf) {
            loadFactor = lf;
            setTable(HashEntry.<K,V>newArray(initialCapacity));
        }
        ...
    }
    

    Given the pseudo-random distribution of key hashes across the table, it is clear that increasing the number of segments makes modification operations more likely to touch different segments, which reduces the likelihood of blocking at runtime.
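    The segment lookup can be sketched as follows (the field names mirror the JDK source; the hash value is arbitrary): with 16 segments, segmentShift is 28 and segmentMask is 15, so the top four bits of the already-spread hash code pick the segment.

    ```java
    public class SegmentSelectDemo {
        public static void main(String[] args) {
            int ssize = 16;                    // number of segments (power of two)
            int sshift = 4;                    // log2(ssize)
            int segmentShift = 32 - sshift;    // = 28
            int segmentMask = ssize - 1;       // = 15

            // Segment index = top bits of the (already spread) hash code,
            // exactly as in ConcurrentHashMap.segmentFor(int).
            int hash = 0xF0000000;
            int index = (hash >>> segmentShift) & segmentMask;
            System.out.println(index);         // 15: the top four bits are 1111
        }
    }
    ```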

    4. ConcurrencyLevel

    This parameter affects the map's memory footprint and the number of segments in it.
    Let's look at the map's constructor and see how the concurrencyLevel parameter passed to it is used:

    public ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0) || initialCapacity < 0
                || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (concurrencyLevel > MAX_SEGMENTS)
            concurrencyLevel = MAX_SEGMENTS;
        // Find power-of-two sizes best matching arguments
        int sshift = 0;
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ++sshift;
            ssize <<= 1;
        }
        segmentShift = 32 - sshift;
        segmentMask = ssize - 1;
        this.segments = Segment.newArray(ssize);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity)
            ++c;
        int cap = 1;
        while (cap < c)
            cap <<= 1;
        for (int i = 0; i < this.segments.length; ++i)
            this.segments[i] = new Segment<K,V>(cap, loadFactor);
    }
    


    The number of segments is chosen as the nearest power of two greater than or equal to concurrencyLevel. The capacity of each segment, in turn, is the requested map capacity divided by the number of segments and rounded up to the nearest power of two.
    It is very important to understand two things. Underestimating concurrencyLevel makes it more likely that threads will block while writing to the map's segments. Overestimating it leads to inefficient use of memory.
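    To make the sizing concrete, here is the constructor's arithmetic replayed for hypothetical arguments initialCapacity = 64, concurrencyLevel = 10:

    ```java
    public class SizingDemo {
        public static void main(String[] args) {
            int initialCapacity = 64;
            int concurrencyLevel = 10;

            // Number of segments: nearest power of two >= concurrencyLevel.
            int sshift = 0, ssize = 1;
            while (ssize < concurrencyLevel) {
                ++sshift;
                ssize <<= 1;
            }

            // Per-segment capacity: ceil(initialCapacity / ssize),
            // rounded up to a power of two.
            int c = initialCapacity / ssize;
            if (c * ssize < initialCapacity)
                ++c;
            int cap = 1;
            while (cap < c)
                cap <<= 1;

            System.out.println(ssize); // 16 segments
            System.out.println(cap);   // each with capacity 4
        }
    }
    ```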

    How to choose concurrencyLevel?

    If only one thread will modify the map while the rest read it, it is recommended to use the value 1.

    Remember that resizing the tables stored inside the map is an operation that takes additional time (and is often not fast). Therefore, when creating a map, you should have at least rough estimates of the expected mix of read and write operations.
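    For example, a map written by one thread and read by many could be created like this (the sizes here are arbitrary, chosen for illustration):

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SingleWriterDemo {
        public static void main(String[] args) {
            // One writer thread: concurrencyLevel = 1 avoids wasting
            // memory on segments that would never reduce contention.
            Map<String, Long> stats =
                    new ConcurrentHashMap<String, Long>(1024, 0.75f, 1);
            stats.put("requests", 1L);
            System.out.println(stats.get("requests")); // 1
        }
    }
    ```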

    Scalability Estimates


    On javamex, you can find an article on comparing the scalability of synchronizedMap and ConcurrentHashMap:
    [Chart: scalability comparison of synchronizedMap and ConcurrentHashMap]
    As the graph shows, a serious gap opens up between 5 and 10 million map access operations, which demonstrates how effective ConcurrentHashMap is with large volumes of stored data and heavy access to it.

    Total


    So, the main advantages and features of the ConcurrentHashMap implementation:
    • The map exposes an interaction interface similar to HashMap
    • Read operations do not require locks and run in parallel
    • Write operations can often also run in parallel without locks
    • The required concurrencyLevel, determined from read and write statistics, is specified at creation
    • The value field of the map's entries is declared volatile

    In conclusion, ConcurrentHashMap should be applied deliberately, with a preliminary estimate of the read-to-write ratio for the map. And in programs where the stored map is not accessed by multiple threads, it still makes sense to use a plain HashMap.
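    As a closing illustration, ConcurrentHashMap also offers atomic compound operations that a synchronized wrapper cannot provide without external locking; putIfAbsent is a typical example:

    ```java
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class PutIfAbsentDemo {
        public static void main(String[] args) {
            ConcurrentMap<String, Integer> map =
                    new ConcurrentHashMap<String, Integer>();

            // putIfAbsent is atomic: no check-then-act race between threads.
            Integer prev = map.putIfAbsent("key", 1);
            System.out.println(prev);           // null: the key was absent
            prev = map.putIfAbsent("key", 2);
            System.out.println(prev);           // 1: the existing value is kept
            System.out.println(map.get("key")); // 1
        }
    }
    ```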
    Thank you for your attention!

    Useful resources on parallel collections in Java and, in particular, on the work of ConcurrentHashMap:
    1. www.ibm.com/developerworks/en/library/j-jtp07233/index.html
    2. stas-blogspot.blogspot.com/2010/08/concurrenthashmap-revealed.html
    3. www.codercorp.com/blog/java/why-concurrenthashmap-is-better-than-hashtable-and-just-as-good-hashmap.html
