Lock-free data structures. Concurrent maps: skip list

    In previous articles ( one , two ), we considered the classic hash map with a hash table and a list of collisions. A lock-free ordered list was built, which served as the basis for the lock-free hash map.
    Unfortunately, lists are characterized by linear search complexity O(N), where Nis the number of elements in the list, so our lock-free ordered list algorithm is of little interest in itself for large ones N.
    Or does it still represent? ..

    There is a rather curious data structure - skip-list , based on the usual singly linked list. It was first described in 1990 by W. Pugh, a translation of his original article ison a habr. This structure is of interest to us, since having the lock-free ordered list algorithm, it is quite easy to implement lock-free skip-list.
    For starters - a little about the principles of building skip-list. Skip-list is a multi-level list: each element (sometimes called a tower) has a certain height h.

    The height his randomly selected from the range [1, Hmax]where Hmaxis the maximum possible height, usually 16 or 32. There is a probability that the height of the tower is one P[h == 1] = 1/2. The probability of the height kis:

    Despite the apparent complexity of calculating the height, it can be calculated quite simply, for example, like this:, h = lsb( rand() )where lsbis the number of the least significant bit of the number.
    Magic 1/2
    Actually, the constant 1/2is my simplification, the original article deals with 0 < p < 1and explores the behavior of skip-list for various values p. But for practical implementation p = 1/2- the best value, it seems to me.

    It turns out that the higher the level, the more the list at this level is sparse in comparison with the underlying ones. This sparseness along with the probabilistic nature gives an estimate of the complexity of the search O(log N), the same as for binary self-balancing trees.
    The search in the skip-list is quite simple: starting from the head-tower, which has the maximum height, we go through the very top level, until we find an element with a key more than the required one (or hit the tail tail), go down one level, look for similarly in it, etc., until we get to level 0, the lowest; Key search example 34:

    Lock-free skip list

    To build a lock-free skip-list, we already have a lock-free algorithm for each level. It remains to develop techniques for working with levels. It would seem that it is impossible to atomically insert a knot-tower with a height of, say, 20, because for this you need to atomically change 20 pointers! It turns out that this is not necessary, it is enough to be able to atomically change one pointer - that we already know how to do in the lock-free list.
    Let's see how insertion occurs in lock-free skip-list. We will insert a tower element with a height of 5 with key 23. The first step is to look for the insertion position, moving through the levels from top to bottom. As a result, we get an array prev[]of insertion positions at each level:

    Next, insert a new element at level 0, the lowest one. We already know how to insert into the lock-free list:

    That's it - the element becomes part of the list, it can be found, it becomes reachable , despite the fact that the whole tower is not yet inserted into the skip-list.
    Further, we rush to insert our tower one level higher, from the bottom up:

    These inserts are already of secondary importance, are designed to increase the efficiency of the search, but in no way affect the principal reachability of the new key.
    An element is deleted in two phases: first we find the element to be deleted and mark it as logically deleted using the marked pointer technique. The difficulty is that for the tower we have to mark all levels of the element, starting from the very top:

    After all the levels of the tower element have been marked, a physical deletion is done (more precisely, the exclusion of the element from the list, since the memory for the element is deleted by Hazard Pointer or RCU), also from top to bottom:

    At each level, the insert / delete algorithm from the usual lock-free ordered list, which we have already considered.

    As you can see, the procedures for inserting / deleting from the lock-free skip-list are multi-step, interference with competing operations is possible at each step, therefore, special care must be taken when programming the skip-list. For example, when inserting, we first look for positions in lists at all levels and form arrays prev[]. It is possible that during the insertion process the list will change at some level and these positions will become invalid. In this case, you should updateprev[], that is, find the element to be inserted, and continue the insertion, starting from the level at which the bummer occurred.
    A more interesting situation is when a key is simultaneously inserted Kand deleted. This is entirely possible: the insertion is considered successful when we linked an element at level 0 of the list. After that, the element is already reachable and it can be completely removed, despite the fact that it is not yet fully inserted into the upper levels. To resolve the collision of insertion and deletion, the order is very important: the insertion is performed from the bottom up, and the deletion (more precisely, its first phase is logical deletion, marked pointer) is counter-directed, from top to bottom. In this case, the insertion procedure will certainly see a deletion mark at some level and will immediately stop its work.
    The removal procedure can also be competitive in both of its phases. In the phase of logical deletion, when labels are placed on the tower from top to bottom, we are not afraid of competition. But at the phase of exclusion of the deleted item from the lists, any change to the skip-list, i.e. violation of arrays prev[]andfound[], determining the position of the deleted item, leads to the fact that we need to form these arrays again. But the tags are already set and the search function simply will not find the item to be deleted! To resolve this situation, we endow the search function with work that is unusual for it: when an element marked at any level is detected, the search function excludes (unlink) this element from the list of this level, that is, it helps to delete elements. After an exception, the function resumes the search from the very beginning, that is, from the highest level (variations are possible here, but the simplest is to start from the beginning). This is a typical example of mutual assistance, often found in lock-free programming: one thread helps another to do its job. That is why the function find()in many lock-free containers is not constant - the search can change the container.
    What else is characterized by skip-list? Firstly, this is an ordered map, unlike a hash map, that is, it supports operations to extract the minimum extract_min()and maximum extract_max()keys.
    Secondly, the skip-list is voracious for the Hazard Pointrer (HP) scheme: with a maximum tower height of 32, the elements of the arrays prev[]and found[]that determine the desired position must be protected by hazard pointers, i.e. we need at least 64 hazard pointers (on the in fact, in the implementation of libcds - 66). This is quite a lot for the HP circuit, see its device for more details . For the RCU scheme , the implementation of the method presents some difficulty find(), since this method can delete elements, and the RCU scheme requires that the removaldeletion (unlink) the element from the list below was RCU critical section, and deletion deallocation of memory - is a critical section, otherwise possible deadlock.
    An interesting practical problem is the implementation of towers for heights greater than 1. Now in the skip-list implementation in the libcds library , memory under towers with a height of more than 1 is allocated separately from the memory for an element even for the intrusive version. Considering the probabilistic nature of the height, it turns out that for 50% of the elements a memory allocation is done - this can affect performance and also negatively affect memory fragmentation. There is an idea of ​​a tower with a height of no more than h_minone block to distribute, and only for tall ones “memory” under the tower:
    struct tower {
        tower *  next[Hmin];
        tower *  high_next; // массив для указателей уровней >= Hmin    

    If Hmin = 4, then with this construction, 93% of the elements will not require the allocation of additional memory for pointers next- high_nextthere will be for them nullptr.
    skip-list: bibliography

    W.Pugh, a year or two after publishing his work on skip-list, presented another work on a competitive implementation based on fine-grained locks.
    The most cited work on lock-free skip-list is the K.Fraser Practical lock-freedom dissertation of 2003, which presents several skip-list algorithms both based on transactional memory and on the software implementation of MCAS (CAS on several memory cells) . This work is now, in my opinion, of purely theoretical interest, since software emulation of transactional memory is quite difficult, as is MCAS. Although, with the release of Haswell, the implementation of transactional memory in hardware went to the masses ...
    The implementation of the skip-list in libcds is based on the monograph The Art of Multiprocessor Programming, which I have repeatedly quoted . “Motivated” because the original algorithm is provided for Java and, it seems to me, has several errors, which, perhaps, were fixed in the second edition. Or, these are not errors at all for Java.
    There is also Fomichev’s dissertation on the resumption of the search in case of conflicts in the skip-list. As I mentioned above, find()when it detects a marked element, it tries to remove it from the list and resumes searching from the very beginning. Fomichev offers a mechanism for arranging back-reference - back links, so that work can be resumed from some previous element, and not from the beginning, which, in principle, should affect performance for the better. The back-reference support algorithm in the current state is quite complicated and it is not clear whether there will be a win or whether the support code for relevance will eat it.

    In the next article, we will try to consider another extensive class of data structures that are the basis for associative containers - trees, more precisely, their competitive implementation.

    Also popular now: