ConcurrentDictionary's interesting surprises (plus a puzzle from DotNext 2017 Moscow)

    Hi to everyone who writes code for .NET, especially multi-threaded code. You rarely see thread-safe code without thread-safe collections, so you need to know how to use them well. This article covers the most popular of them, ConcurrentDictionary. It turns out to hide quite a few surprises: some pleasant, some not so much.


    First we will look at how ConcurrentDictionary is built and at the computational complexity of its operations, and then we will cover convenient tricks and the pitfalls related to memory traffic and garbage collection.



    Applicability Area


    All observations in this article were tested on the latest version of the .NET Framework at the time of publication (4.7.1). They probably hold in .NET Core as well, but verifying that is left as an exercise for the reader.


    A brief look at the internals


    Under the hood, ConcurrentDictionary is a familiar hash table. It is based on an array of so-called buckets, each of which corresponds to a certain range of hash function values. Each bucket holds a singly linked list of elements: on a collision, the key is found by a linear search through that list.


    Thread safety is provided by a technique called striped locking. There is a small array of ordinary locks, each responsible for a whole range of buckets (hence the "stripe" in the name). To write to an arbitrary bucket, a thread must acquire the corresponding lock. There are usually far fewer locks than buckets.



    Here is how these concepts map to the parameters of the constructor ConcurrentDictionary(int concurrencyLevel, int capacity):


    • capacity — the number of buckets. The default is 31.
    • concurrencyLevel — the number of locks. The default is 4 × the number of cores.
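
    To make the mapping concrete, here is a minimal sketch. The sizes below are invented for illustration, not tuning advice:

    ```csharp
    using System;
    using System.Collections.Concurrent;

    // Hypothetical sizing for a dictionary written from ~8 threads.
    // concurrencyLevel = number of striped locks,
    // capacity = initial number of buckets (a prime spreads hashes evenly).
    var dictionary = new ConcurrentDictionary<int, string>(
        concurrencyLevel: 8,
        capacity: 100_003);

    dictionary.TryAdd(42, "answer");
    Console.WriteLine(dictionary[42]); // prints "answer"
    ```

    Sizing capacity close to the expected element count up front avoids the expansions described below.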

    The implementation tries to keep buckets small, increasing their number as necessary. During such an expansion, the following happens:


    • The capacity is roughly doubled. Roughly, because the new value is chosen so that it is not divisible by 2, 3, 5, or 7 (it is useful for the capacity to be prime, or at least hard to factor).
    • The number of locks also doubles, unless concurrencyLevel was set explicitly. In that case it stays the same.

    Complexity of operations


    A summary for a dictionary with N elements and K locks:


    • TryGetValue — lock-free — O(1) on average
    • TryAdd, TryUpdate, TryRemove — one lock — O(1) on average
    • GetEnumerator — lock-free — O(N)
    • Count, IsEmpty — all K locks — O(K)
    • Keys, Values — all K locks plus a full copy — O(N)

    The remaining operations are derived from these:


    • GetOrAdd = TryGetValue + TryAdd
    • AddOrUpdate = TryGetValue + TryAdd + TryUpdate
    • Indexer (get) = TryGetValue
    • Indexer (set) = TryAdd with overwrite
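
    A small sketch of how these derived operations behave (the key name and values are arbitrary):

    ```csharp
    using System;
    using System.Collections.Concurrent;

    var counters = new ConcurrentDictionary<string, int>();

    // GetOrAdd = TryGetValue + TryAdd: returns the existing value if the key
    // is present, otherwise inserts the given one and returns it.
    var first = counters.GetOrAdd("hits", 1);   // inserts 1
    var second = counters.GetOrAdd("hits", 99); // key exists, 99 is ignored

    // Indexer set = unconditional overwrite.
    counters["hits"] = 5;

    // AddOrUpdate = TryGetValue + TryAdd + TryUpdate, retried until one succeeds.
    var updated = counters.AddOrUpdate(
        "hits", addValue: 1, updateValueFactory: (key, old) => old + 1);

    Console.WriteLine($"{first} {second} {updated}"); // 1 1 6
    ```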

    The bad news


    The Count and IsEmpty properties insidiously acquire every lock in the dictionary. It is better to refrain from calling them frequently from multiple threads.


    The Keys and Values properties are even more insidious: they not only take all the locks but also copy all the keys or values into a separate List. Unlike the classic Dictionary, whose properties of the same names return "thin" wrappers, here you need to be prepared for large memory allocations.


    Such blatant inefficiency stems from the desire to provide snapshot semantics: to return a consistent state that existed at some point in time.


    Good news


    It is not all bad. First: plain enumeration (via GetEnumerator) is non-blocking and copies no data. The price is the lack of snapshot semantics: concurrent writes may be reflected in the results as you enumerate.
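
    As an illustration of this first point, here is a sketch of an emptiness check that rides on the lock-free enumerator instead of the lock-grabbing IsEmpty (whether it is actually cheaper depends on your workload, so measure before relying on it):

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Linq;

    var dictionary = new ConcurrentDictionary<int, int>();
    dictionary.TryAdd(1, 100);

    // IsEmpty acquires every lock; Any() only asks the non-blocking
    // enumerator for a single element, so it does not stall writers.
    bool hasItems = dictionary.Any();

    Console.WriteLine(hasItems); // True
    ```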


    The second piece of good news: more often than not this behavior is either acceptable or outright desirable, and you can exploit it, for example, to enumerate keys or values efficiently:


    var keys = dictionary.Select(pair => pair.Key);
    var values = dictionary.Select(pair => pair.Value);

    Tips and tricks


    Unlike a regular Dictionary, a ConcurrentDictionary allows inserts and removals right in the middle of an enumeration! This makes it convenient, for example, to evict expired entries from a cache dictionary with per-item lifetimes:


    foreach (var pair in cache)
    {
        if (IsStale(pair.Value))
        {
            cache.TryRemove(pair.Key, out _);
        }
    }

    You can remove elements not only by key but also by an exact key + value match, and atomically! This undocumented feature hides behind the explicit implementation of the ICollection interface. It lets you safely evict from such a cache even when racing with updates of the value:


    var collection = cache as ICollection<KeyValuePair<MyKey, MyValue>>;
    foreach (var pair in cache)
    {
        if (IsStale(pair.Value))
        {
            // Remove() will return false in case of race with value update
            var removed = collection.Remove(pair); 
        }
    }

    Everyone knows this already, but a reminder: under contention, GetOrAdd may invoke the factory delegate for the same key more than once. If that is unacceptable or expensive, just wrap the value in Lazy:


    var dictionary = new ConcurrentDictionary<MyKey, Lazy<MyValue>>();
    var lazyMode = LazyThreadSafetyMode.ExecutionAndPublication;
    var value = dictionary
        .GetOrAdd(key, _ => new Lazy<MyValue>(() => new MyValue(), lazyMode))
        .Value;

    GC overhead


    When using large dictionaries (tens of thousands of elements and up), remember that, unlike a regular Dictionary, ConcurrentDictionary creates an extra heap object for every element. A long-lived ConcurrentDictionary with tens of millions of elements can easily cause multi-second gen 2 garbage collection pauses.


    // Only a handful of objects in the final dictionary.
    // Value types dominate its internals.
    Dictionary<int, int> ordinary = Enumerable
        .Range(0, 10 * 1000 * 1000)
        .ToDictionary(i => i, i => i);

    // ~10 million objects in the concurrent version (due to the linked lists).
    var concurrent = new ConcurrentDictionary<int, int>(ordinary);

    Overwriting an existing value can also allocate a new object, generating memory traffic. This happens whenever the runtime does not guarantee that writes of the value type are atomic. Specifically:


    • If the value is an int or a reference type, its writes are atomic. Overwrites then happen in place, with no new allocation.
    • If the value is a Guid or another "wide" value type, its writes are not atomic. Here the implementation is forced to allocate a new internal node.
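
    A sketch making the two cases concrete (the loop count and keys are arbitrary; the allocation behavior in the comments assumes the .NET Framework implementation described above):

    ```csharp
    using System;
    using System.Collections.Concurrent;

    var narrow = new ConcurrentDictionary<int, int>();
    var wide = new ConcurrentDictionary<int, Guid>();

    for (var i = 0; i < 1000; i++)
    {
        // int writes are atomic: the existing node is updated in place.
        narrow[0] = i;

        // Guid (16 bytes) writes are not atomic: each overwrite makes
        // the dictionary allocate a fresh internal node.
        wide[0] = Guid.NewGuid();
    }

    Console.WriteLine($"{narrow[0]} {wide.Count}"); // 999 1
    ```

    In overwrite-heavy scenarios this per-write allocation is a reason to prefer narrow or reference-typed values.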

    The puzzle at DotNext 2017 Moscow


    I first published this article in the fall of 2017 on an internal social network. And for the DotNext conference in Moscow last November, colleagues turned it into a puzzle that could be solved at the Kontur booth.


    There was a snippet of code:


    public void TestIteration(ConcurrentDictionary<int, int> dictionary)
    {
        Parallel.For(0, 1000, new ParallelOptions { MaxDegreeOfParallelism = 8 }, (i) =>
        {
            foreach (var keyValuePair in dictionary)
            {
                dictionary.AddOrUpdate(keyValuePair.Key, (key) => i, (key, value) => i);
            }
        });
    }

    public void TestKeysProperty(ConcurrentDictionary<int, int> dictionary)
    {
        Parallel.For(0, 1000, new ParallelOptions { MaxDegreeOfParallelism = 8 }, (i) =>
        {
            foreach (var key in dictionary.Keys)
            {
                dictionary.AddOrUpdate(key, (k) => i, (k, value) => i);
            }
        });
    }

    And three questions: compare the TestIteration and TestKeysProperty methods by number of operations, by memory, and by running time. If you have read the article carefully, it should be clear that TestIteration wins on all three counts.


    Conclusions


    Parallel programming tools are full of unobvious subtleties when it comes to performance: careless use of ConcurrentDictionary, for example, can run into global locks and linear memory costs. I hope this cheat sheet helps you avoid stepping on a rake when writing the next cache or index in your application.


    Good and efficient code to you all!

