Effective implementation of Readers – writer lock based on “Interlocked Variable Access”

Introduction


The specifics of the project in which I work is such that, on the one hand, the use of third-party libraries is not allowed (with a few exceptions), and on the other, the emphasis is on very deep code optimization. So you often have to reinvent the wheel in the form of your own implementations.

In the course of this publication, I want to share the idea of ​​implementing the well-known synchronization primitive readers-writer lock based on the so-called atomic operations. As you know, readers-writer lock is designed to solve the problem of synchronizing access to a shared resource in such a way as to avoid simultaneous reading and writing, but at the same time to allow parallel reading of an arbitrarily large number of threads.

Search for a solution


Initially, I reacted to the task rather frivolously, but as it turned out in vain. To begin with, after viewing the available resources, I never found an implementation that I would like. As it turned out, the biggest problem was to block incoming "readers" and "writers" so that they did not remain in this state forever. If, for example, organize waiting using condition variables, then it is very easy to get into a situation where an incoming stream must be blocked, but while blocking occurs, the thread that has the resource finishes working with it and the completion signal sent to it does not reach the destination. That, in turn, successfully completes the transition to the "wait" state, and this implementation fails. The problem is actually solvable by introducing additional logic, but in my case, such an implementation ended up being even slower than the meaningless and merciless lock of each incoming stream. Of course, I do not pretend that this is an absolute truth and maybe I missed something, but began to look for other possibilities ... All this time I had a feeling that the problem can be solved using much simpler mechanisms of interlocked variable access, with the help of which I somewhat earlier quite elegantly dealt with double check locking optimization. So I began to think in this direction and as a result the following turned out ...

Implementation


The project requires support for QNX systems (which the product itself is oriented to) and Windows (emulation, debugging, etc.). Since the second one is much more popular, I will give the C ++ implementation specifically for it. However, I will dwell on one moment of porting to POSIX. So:

Class blank

class CRwLock
{
public:
	CRwLock();
	void readLock();
	void writeLock();
	void readUnlock();
	void writeUnlock();
private:
	static const LONG WRITER_BIT = 1L << (sizeof(LONG) * 8 - 2);
	LONG mWriterReaders;
};


With a set of functions, everything is obvious and in my opinion does not require comments. Let's talk about a single field of the mWriterReaders class :

  • The last non-sign bit is given under the sign of the presence of one single "writer" and should be set when he owns the resource. Actually, to work with it, the constant WRITER_BIT
  • The rest will be used as a 30-bit prime integer. They will record the number of threads reading the resource (if the "writer" bit is not set) or waiting to read (when the "writer" bit is set simultaneously, respectively).


Class constructor

CRwLock::CRwLock()
: mWriterReaders(0)
{
}


It is called upon to fulfill the only task - to establish the initial state corresponding to the absence of someone owning the resource, or, more simply put, resetting all bits to 0.

Read lock

As you probably already guessed, I am going to use atomic operations on a single integer variable to achieve our goal.

void CRwLock::readLock()
{
	if(::InterlockedExchangeAdd(&mWriterReaders, 1) >= WRITER_BIT)
	{
		while(::InterlockedCompareExchange(&mWriterReaders, 0, 0) >= WRITER_BIT)
		{
			Sleep(0);
		}
	}	
}


At the input, we immediately add ourselves to the readers and at the same time check if there is someone recording (this is so if the bit of the writing stream is set, and therefore the number itself is greater than or equal to this value). If so, then we must wait for the end of the write operation along with the rest. To do this, I chose the spin lock option during which the same “writer” bit is checked and as soon as it is reset, everyone who is waiting for reading gets access (this implementation gives priority to reading threads, but more on that later).

Here I would like to dwell on the question of the so-called busy wait , when the waiting thread is actively consuming CPU resources. In this case, I compromised by adding the Sleep (0) statementtransferring the remainder of the time from the allocated process to the scheduler, who can issue it to other threads depending on their priority. In other words, time is not burned idle, but can be used to good use. On the other hand, the severity of the reaction from the waiting stream to the change in the state of the flags is dulled, which in the case of my configuration of the iron of the flows and the operations performed by them resulted in an approximately 10% increase in the time of the test program. But in no case should you forget that the system as a whole certainly wins with a free CPU at its disposal.

In the case of POSIX, you can use sched_yield instead of Sleep (0)

Write lock

void CRwLock::writeLock()
{
	while(::InterlockedCompareExchange(&mWriterReaders, WRITER_BIT, 0) != 0)
	{
		Sleep(0);
	}	
}


The thread that needs to write must wait until everyone releases the resource (absolute zero) and only then does it establish itself for the reader. Hence the priority of readers - even when they switch to the standby state with the existing recording stream, they are guaranteed to have priority over the same blocked "writers" in advance.

Unlocking

void CRwLock::readUnlock()
{
	::InterlockedExchangeAdd(&mWriterReaders, -1);
}
void CRwLock::writeUnlock()
{
	::InterlockedExchangeAdd(&mWriterReaders, -WRITER_BIT);
}


A simple decrement in the case of reading and resetting the bit in the case of writing.

Instead of an afterword


I want to say that in my case I got a good performance result compared to the use of critical sections, which behaves predictably in this case when the ratio of writing and reading streams changes.

I will be glad to your criticism and comments.

Also popular now: