Memory Barriers and Non-Blocking Sync in .NET
Introduction
In this article I want to talk about constructs used to implement non-blocking synchronization: the volatile keyword and the VolatileRead, VolatileWrite, and MemoryBarrier methods. We will look at the problems that force us to use these language constructs and at their solutions. Along the way, we will briefly review the .NET memory model.
Optimizations introduced by the compiler
The main problems a programmer encounters when using non-blocking synchronization are compiler optimizations and instruction reordering by the processor.
Let's start by looking at an example where the compiler introduces a problem into a multi-threaded program:
class ReorderTest
{
    private int _a;

    public void Foo()
    {
        var task = new Task(Bar);
        task.Start();
        Thread.Sleep(1000);
        _a = 0;
        task.Wait();
    }

    public void Bar()
    {
        _a = 1;
        while (_a == 1)
        {
        }
    }
}
By running this example, you can see that the program hangs. The reason is that the compiler is free to cache the _a field in a processor register, so the loop in Bar never observes the write made in Foo.
To solve such problems, C# provides the volatile keyword. Applying this keyword to a field prevents the compiler from optimizing accesses to it in this way.
This is how the revised declaration of _a looks:

private volatile int _a;
Disabling compiler optimizations is not the only effect of using this keyword. Other effects will be discussed later.
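Putting the fix together, here is a runnable sketch of the corrected class. Two details are my additions, made only so the demo always terminates: the sleep is shortened and the reset of _a is retried in a loop until the task exits.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ReorderTest
{
    // volatile: the compiler may not cache this field in a register
    private volatile int _a;

    public void Foo()
    {
        var task = Task.Run(Bar);
        Thread.Sleep(100);
        // keep resetting the flag until the task observes it and finishes
        while (!task.Wait(50))
        {
            _a = 0;
        }
        Console.WriteLine("finished");
    }

    public void Bar()
    {
        _a = 1;
        while (_a == 1)
        {
        }
    }
}
```

Without volatile, the spin loop in Bar may never see the reset and the program hangs; with it, the write made in Foo becomes visible and Foo prints "finished".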
Instruction Reordering
Let us now consider the case where the source of problems is instruction reordering by the processor.
Suppose we have the following code:
class ReorderTest2
{
    private int _a;
    private int _b;

    public void Foo()
    {
        _a = 1;
        _b = 1;
    }

    public void Bar()
    {
        if (_b == 1)
        {
            Console.WriteLine(_a);
        }
    }
}
The Foo and Bar methods are run simultaneously from different threads.
Is this code correct; that is, can we say with confidence that the program will never output zero? If we were talking about a single-threaded program, running it once would be enough to check. But since we are dealing with multithreading, that is not enough. Instead, we need to understand whether we have guarantees that the program will work correctly.
.NET memory model
As already mentioned, incorrect behavior of a multi-threaded program can be caused by instruction reordering on the processor. Let's consider this problem in more detail.
Any modern processor can reorder memory read and write instructions as an optimization. I will explain this with an example:
int a = _a;
_b = 10;
In this code, the field _a is read first, and then _b is written. But when executing this program, the processor may reorder the read and write instructions: _b will be written first, and only then will _a be read. For a single-threaded program this reordering is invisible, but for a multi-threaded program it can become a problem. What we have just examined is a load–store reordering; similar reorderings are possible for other combinations of instructions.
The set of rules governing which of these reorderings are allowed is called a memory model. The .NET platform has its own memory model, which abstracts us away from the memory model of any particular processor.
This is how the .NET memory model looks:
| Reordering type | Allowed |
|-----------------|---------|
| Load–Load       | Yes     |
| Load–Store      | Yes     |
| Store–Load      | Yes     |
| Store–Store     | No      |
Now we can analyze our example in terms of the .NET memory model. Since store–store reordering is prohibited, the write to _a will always happen before the write to _b, so the Foo method works correctly. The problem is in the Bar method: since reordering of read instructions is not prohibited, _a can be read before _b.
After the reordering, the code is executed as if it were written like this:
var tmp = _a;
if (_b == 1)
{
Console.WriteLine(tmp);
}
When we talk about instruction reordering, we mean reordering of instructions within one thread that read or write different variables. If different threads write to the same variable, their relative order is unpredictable in any case. And if we are talking about a read and a write of the same variable within one thread, for example:
var a = GetA();
UseA(a);
then, of course, no reordering is possible.
Memory barriers
To solve this problem there is a universal tool: the memory barrier.
There are several kinds of memory barriers: the full barrier, the release fence, and the acquire fence.
The full barrier guarantees that all reads and writes located before/after the barrier are actually performed before/after it; in other words, no memory access instruction can cross the barrier.
Now let's deal with the other two kinds:
An acquire fence guarantees that instructions located after the barrier are not moved before it.
A release fence guarantees that instructions located before the barrier are not moved after it.
A few words about terminology. The term volatile write means a memory write combined with a release fence. The term volatile read means a memory read combined with an acquire fence.
.NET provides the following tools for working with memory barriers:
- The Thread.MemoryBarrier() method creates a full memory barrier.
- The volatile keyword turns every operation on a field marked with it into a volatile write or volatile read, respectively.
- The Thread.VolatileRead() method performs a volatile read.
- The Thread.VolatileWrite() method performs a volatile write.
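As an illustration of volatile write/read semantics, here is a sketch of the classic safe-publication pattern. The Publisher class and the value 42 are my own, for illustration only:

```csharp
using System;

class Publisher
{
    private int _data;
    private volatile bool _ready; // writes are volatile writes, reads are volatile reads

    public void Produce()
    {
        _data = 42;    // ordinary write: cannot move after the volatile write below
        _ready = true; // volatile write: acts as a release fence
    }

    public bool TryConsume(out int value)
    {
        if (_ready)        // volatile read: acts as an acquire fence
        {
            value = _data; // this read cannot move before the read of _ready
            return true;
        }
        value = 0;
        return false;
    }
}
```

If TryConsume returns true on another thread, the release/acquire pair guarantees that the thread also sees _data equal to 42.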
Let's get back to our example. As we have seen, the problem can arise from reordering of the read instructions. To fix it, we add a memory barrier between the reads of _b and _a. After that, we have a guarantee that the thread executing the Bar method will see the writes in the correct order.
class ReorderTest2
{
    private int _a;
    private int _b;

    public void Foo()
    {
        _a = 1;
        _b = 1;
    }

    public void Bar()
    {
        if (_b == 1)
        {
            Thread.MemoryBarrier();
            Console.WriteLine(_a);
        }
    }
}
Using a full memory barrier here is redundant. To rule out the reordering of the reads, it is enough to make the read of _b a volatile read. This can be achieved with the Thread.VolatileRead method or the volatile keyword.
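One way to rule out the read reordering is to mark _b as volatile, so that reading it becomes a volatile read (an acquire fence). This is a sketch of my variant of the fix:

```csharp
using System;

class ReorderTest2
{
    private int _a;
    private volatile int _b; // reads of _b are volatile reads (acquire)

    public void Foo()
    {
        _a = 1;
        _b = 1; // also a volatile write: the write to _a cannot move after it
    }

    public void Bar()
    {
        if (_b == 1) // acquire: the read of _a below cannot move before this read
        {
            Console.WriteLine(_a); // prints 1 whenever the write to _b was observed
        }
    }
}
```

The acquire fence on the read of _b makes the full barrier unnecessary while still preventing the read of _a from moving ahead of it.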
Thread.VolatileWrite and Thread.VolatileRead methods
Let's take a look at the Thread.VolatileWrite and Thread.VolatileRead methods in more detail.
MSDN says about VolatileWrite: "Writes a value to a field immediately, so that the value is visible to all processors in the computer."
In fact, this description is not entirely accurate. These methods guarantee two things: the absence of compiler optimizations [1] and the absence of instruction reordering, according to the volatile read/write semantics. Strictly speaking, the VolatileWrite method does not guarantee that the value will immediately become visible to other processors, and the VolatileRead method does not guarantee that the value will not be read from a cache [2]. But thanks to the absence of compiler optimizations and to the coherence of processor caches, we can treat the MSDN description as correct.
Consider how these methods are implemented:
[MethodImpl(MethodImplOptions.NoInlining)]
public static int VolatileRead(ref int address)
{
    int num = address;
    Thread.MemoryBarrier();
    return num;
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static void VolatileWrite(ref int address, int value)
{
    Thread.MemoryBarrier();
    address = value;
}
What else is interesting here? Both methods use a full memory barrier. As we said, a volatile write only needs a release fence. Since the full barrier is a stronger guarantee that subsumes the release fence, this implementation is correct but redundant: with just a release fence, the processor and compiler would have more room for optimization. It is hard to say why the .NET team implemented these methods through a full barrier. But it is important to remember that this is merely a detail of the current implementation, and nobody guarantees that it will not change in the future.
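As a side note, .NET Framework 4.5 introduced the System.Threading.Volatile class, whose Read and Write methods provide exactly the acquire/release guarantees described above. A minimal sketch (the class and field names are mine):

```csharp
using System.Threading;

class VolatileClassDemo
{
    private int _flag;

    public void Set()
    {
        // release semantics: earlier writes cannot move after this write
        Volatile.Write(ref _flag, 1);
    }

    public int Get()
    {
        // acquire semantics: later reads cannot move before this read
        return Volatile.Read(ref _flag);
    }
}
```

Unlike Thread.VolatileRead/VolatileWrite, these methods are not documented as full barriers, so they leave more room for optimization.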
Compiler and processor optimizations
I want to note again: the volatile keyword and all three of the barrier-setting methods we have considered suppress both processor optimizations and compiler optimizations.
For example, this code is a completely correct solution to the problem shown in the first example:
public void Bar()
{
    _a = 1;
    while (_a == 1)
    {
        Thread.MemoryBarrier();
    }
}
The dangers of volatile
Looking at the implementation of the VolatileWrite and VolatileRead methods, it becomes clear that a pair of instructions like this can be reordered:

Thread.VolatileWrite(ref _b, 1);
Thread.VolatileRead(ref _a);

Since this behavior follows directly from the definitions of volatile read and volatile write, it is not a bug, and operations on fields marked with the volatile keyword behave the same way.
But in practice, this behavior may be unexpected.
Consider an example:
class Program
{
    volatile int _firstBool;
    volatile int _secondBool;
    volatile string _firstString;
    volatile string _secondString;
    int _okCount;
    int _failCount;

    static void Main(string[] args)
    {
        new Program().Go();
    }

    private void Go()
    {
        while (true)
        {
            Parallel.Invoke(DoThreadA, DoThreadB);
            if (_firstString == null && _secondString == null)
            {
                _failCount++;
            }
            else
            {
                _okCount++;
            }
            Console.WriteLine("ok - {0}, fail - {1}, fail percent - {2}",
                _okCount, _failCount, GetFailPercent());
            Clear();
        }
    }

    private float GetFailPercent()
    {
        return (float)_failCount / (_okCount + _failCount) * 100;
    }

    private void Clear()
    {
        _firstBool = 0;
        _secondBool = 0;
        _firstString = null;
        _secondString = null;
    }

    private void DoThreadA()
    {
        _firstBool = 1;
        //Thread.MemoryBarrier();
        if (_secondBool == 1)
        {
            _firstString = "a";
        }
    }

    private void DoThreadB()
    {
        _secondBool = 1;
        //Thread.MemoryBarrier();
        if (_firstBool == 1)
        {
            _secondString = "a";
        }
    }
}
If the program's instructions were executed in exactly the order in which they are written, at least one of the two strings would always be equal to "a". In practice, because of store–load reordering, this is not always the case. Replacing the volatile keyword with the corresponding methods, as expected, does not change the result.
To fix the behavior of this program, it is enough to uncomment the lines with the full memory barriers.
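For reference, this is how the two methods look with the barriers uncommented. In this sketch the fields are made public purely so the result can be inspected from outside; when the methods are run sequentially on one thread, at least one of the strings is always set:

```csharp
using System;
using System.Threading;

class BarrierDemo
{
    public volatile int FirstBool;
    public volatile int SecondBool;
    public volatile string FirstString;
    public volatile string SecondString;

    public void DoThreadA()
    {
        FirstBool = 1;
        // full barrier: the read of SecondBool cannot move before the write above
        Thread.MemoryBarrier();
        if (SecondBool == 1)
        {
            FirstString = "a";
        }
    }

    public void DoThreadB()
    {
        SecondBool = 1;
        Thread.MemoryBarrier();
        if (FirstBool == 1)
        {
            SecondString = "a";
        }
    }
}
```

The full barrier is needed here precisely because the dangerous reordering is store–load, which neither volatile nor the release/acquire fences prevent.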
Performance of Thread.Volatile* and the volatile keyword
On most platforms (more precisely, on all platforms supported by Windows, except the dying IA64), all writes and reads are already volatile writes and volatile reads, respectively. Thus, at run time, the volatile keyword has no effect on performance. In contrast, the Thread.Volatile* methods, firstly, incur the overhead of the method call itself (they are marked with MethodImplOptions.NoInlining), and secondly, in the current implementation, they create a full memory barrier. So, in terms of performance, the keyword is preferable in most cases.
References
[1] See page 514, Joe Duffy. Concurrent Programming on Windows.
[2] See "VolatileWrite implemented incorrectly" on MS Connect.
Used literature:
- Joseph Albahari. Threading in C#
- Vance Morrison. Understand the Impact of Low-Lock Techniques in Multithreaded Apps
- Pedram Rezaei. CLR 2.0 memory model
- MS Connect: VolatileWrite implemented incorrectly
- ECMA-335 Common Language Infrastructure (CLI)
- C# Language Specification
- Jeffrey Richter. CLR via C#, Third Edition
- Joe Duffy. Concurrent Programming on Windows
- Joseph Albahari. C# 4.0 in a Nutshell