Different versions of JIT in .NET
Every C# developer knows that the C# compiler translates an application's source code into Intermediate Language (IL). Turning IL into machine instructions is most often the job of the Just-In-Time (JIT) compiler. Yes, today there are NGen, Mono AOT, and .NET Native, but JIT compilation still dominates the world of .NET applications. Not everyone knows, however, how this JIT actually works. Even if you consider only Microsoft's .NET implementation, it is worth distinguishing between JIT-x86 and JIT-x64. RyuJIT is also waiting in the wings and will very soon take over as the main JIT compiler. And if you are fond of old .NET versions, it is useful to know that the JIT logic differed between CLR versions. The source code is now open, so you can read it and see how large and complex this topic is. We will not try to cover all of it today; instead, we will briefly look at a few interesting features of individual JIT compiler versions. In today's issue:
- Why a short method may fail to be inlined, and how to avoid that
- JIT bugs: dangerous and merciless
- Who unrolls loops, and how
- How unrolling small loops differs from unrolling large ones
JIT-x86 and starg
Open the source of the Decimal constructor that takes an int parameter, from the .NET Reference Source:
// Constructs a Decimal from an integer value.
//
public Decimal(int value) {
    // JIT today can't inline methods that contains "starg" opcode.
    // For more details, see DevDiv Bugs 81184: x86 JIT CQ: Removing the inline striction of "starg".
    int value_copy = value;
    if (value_copy >= 0) {
        flags = 0;
    }
    else {
        flags = SignMask;
        value_copy = -value_copy;
    }
    lo = value_copy;
    mid = 0;
    hi = 0;
}
Intrigued? The point is that JIT-x86 cannot inline methods whose IL code contains the starg or ldarga instructions. Inlining the Decimal constructor is highly desirable, so the developers of this standard type resorted to a trick: they copy the parameter into a local variable to avoid the "bad" instruction. In JIT-x64, this "feature" is gone. For those interested, recommended reading:
- The story about inlining for JIT-x86 and starg
- .NET Reference Source: Constructs a Decimal from an integer value
- CoreCLR, JIT sources: flowgraph.cpp (Feb 26, 2015)
- CoreCLR, JIT sources: importer.cpp (Feb 26, 2015)
- MSDN: starg
- MSDN: ldarga
- Stackoverflow: .NET local variable optimization
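To make the trick from the Decimal constructor concrete, here is a minimal sketch of the same pattern on a toy method. The class and method names (StargDemo, AbsWithStarg, AbsWithLocalCopy) are hypothetical and not part of any library. Assigning to a parameter makes the C# compiler emit starg, while copying the parameter into a local first produces only ldarg/stloc/ldloc. Whether a particular JIT then actually inlines either method depends on the JIT version and its other heuristics, so treat this as an illustration of the IL difference, not as a guarantee.

```csharp
using System;

public static class StargDemo
{
    // The assignment "value = -value" writes to the argument slot,
    // so the compiled IL contains a "starg" instruction.
    public static int AbsWithStarg(int value)
    {
        if (value < 0)
            value = -value;
        return value;
    }

    // Same logic, but the parameter is only read. The write goes to a
    // local variable, so the IL contains no "starg".
    public static int AbsWithLocalCopy(int value)
    {
        int valueCopy = value;
        if (valueCopy < 0)
            valueCopy = -valueCopy;
        return valueCopy;
    }

    public static void Main()
    {
        // Both variants compute the same result; only the IL differs.
        Console.WriteLine(AbsWithStarg(-5));
        Console.WriteLine(AbsWithLocalCopy(-5));
    }
}
```

You can confirm the difference by compiling in Release mode and inspecting both methods with an IL disassembler such as ildasm.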
Strange bug in JIT-x64
Dear experts, attention, a question: what will the following code print for step=1?
private int bar;
public void Foo(int step)
{
    for (int i = 0; i < step; i++)
    {
        bar = i + 10;
        for (int j = 0; j < 2 * step; j += step)
            Console.WriteLine(j + 10);
    }
}
The correct answer: it depends. Most likely you expect to see 10 11, but a bug in the JIT-x64 optimizer ruins everything and gives us 10 21. On JIT-x86 and RyuJIT everything works correctly. You have to live with the bug; Microsoft does not intend to fix it. The example is very fragile, and running into it in real life is highly unlikely. Someone will ask: if the bug is so rare, why know about it at all? Well, if you have a playful nature, you can put the bug to use, for example, to determine at run time which JIT version is currently in use:
public enum JitVersion
{
    Mono, MsX86, MsX64, RyuJit
}

public class JitVersionInfo
{
    public JitVersion GetJitVersion()
    {
        if (IsMono())
            return JitVersion.Mono;
        if (IsMsX86())
            return JitVersion.MsX86;
        if (IsMsX64())
            return JitVersion.MsX64;
        return JitVersion.RyuJit;
    }

    private int bar;

    // Relies on the JIT-x64 bug described above: for step = 1 the correct
    // final value is 11, while the buggy code produces 21.
    private bool IsMsX64(int step = 1)
    {
        var value = 0;
        for (int i = 0; i < step; i++)
        {
            bar = i + 10;
            for (int j = 0; j < 2 * step; j += step)
                value = j + 10;
        }
        return value == 20 + step;
    }

    public static bool IsMono()
    {
        return Type.GetType("Mono.Runtime") != null;
    }

    public static bool IsMsX86()
    {
        return !IsMono() && IntPtr.Size == 4;
    }
}
Material for additional reading:
- The story about the bug in JIT-x64
- Determining the JIT version in runtime
- Stackoverflow: JIT .Net compiler bug?
- MS Connect: x64 jitter sub-expression elimination optimizer bug
- StackOverflow: How to detect which .NET runtime is being used (MS vs. Mono)?
- StackOverflow: How do I verify that ryujit is jitting my app?
Loop unrolling
Loop unrolling is a very good optimization that many compilers like to perform. The idea is to replace a loop of the form
for (int i = 0; i < 1024; i++)
    Foo(i);
with
for (int i = 0; i < 1024; i += 4)
{
    Foo(i);
    Foo(i + 1);
    Foo(i + 2);
    Foo(i + 3);
}
Besides reducing the number of increment and comparison operations, this gives the processor better conditions for its own optimizations (for example, branch prediction and instruction-level parallelism). Alas, JIT-x86 and RyuJIT are not really able to unroll an ordinary loop. JIT-x64 sometimes can, although it does so in its own peculiar way. For example, if the number of iterations is divisible by 2 or 3, then the code
int sum = 0;
for (int i = 0; i < 1024; i++)
    sum += i;
Console.WriteLine(sum);
turns into something like
; int sum = 0;
00007FFCC8710090 sub rsp,28h
; for (int i = 0; i < 1024; i++)
00007FFCC8710094 xor ecx,ecx
00007FFCC8710096 mov edx,1 ; edx = i + 1
00007FFCC871009B nop dword ptr [rax+rax]
00007FFCC87100A0 lea eax,[rdx-1] ; eax = i
; sum += i;
00007FFCC87100A3 add ecx,eax ; sum += i
00007FFCC87100A5 add ecx,edx ; sum += i + 1
00007FFCC87100A7 lea eax,[rdx+1] ; eax = i + 2
00007FFCC87100AA add ecx,eax ; sum += i + 2;
00007FFCC87100AC lea eax,[rdx+2] ; eax = i + 3
00007FFCC87100AF add ecx,eax ; sum += i + 3;
00007FFCC87100B1 add edx,4 ; i += 4
; for (int i = 0; i < 1024; i++)
00007FFCC87100B4 cmp edx,401h
00007FFCC87100BA jl 00007FFCC87100A0
This is quite important to know. For example, many people are looking forward to the switch from JIT-x64 to RyuJIT because Microsoft promises a lot of goodies: SIMD support and faster JIT compilation. The performance of the generated code itself is mentioned far less often. Keep in mind that the absence of some optimizations in RyuJIT (compared to JIT-x64) may slightly slow your program down. Useful links:
- RyuJIT CTP5 and loop unrolling
- Wikipedia: Unwind cycle
- Wikipedia: Loop unrolling
- JC Huang, T. Leng, Generalized Loop-Unrolling: a Method for Program Speed-Up (1998)
- Wikipedia: branch prediction
- Wikipedia: instruction-level parallelism
- Wikipedia: Inline expansion
- Wikipedia: Cache miss
- StackOverflow: http://stackoverflow.com/questions/2349211/when-if-ever-is-loop-unrolling-still-useful
- Blogs.Msdn: RyuJIT: The next-generation JIT compiler for .NET
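The examples above use an iteration count that divides evenly by the unroll factor. As a hedged sketch of how unrolling by 4 can be written by hand for an arbitrary count, the snippet below adds a tail loop for the leftover iterations. The names UnrollDemo, SumSimple, and SumUnrolledBy4 are hypothetical, and this is an illustration of the general technique, not a description of what any particular JIT emits.

```csharp
using System;

public static class UnrollDemo
{
    // Straightforward loop: one addition, one increment,
    // and one comparison per element.
    public static int SumSimple(int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += i;
        return sum;
    }

    // Manually unrolled by 4: one increment and one comparison
    // per four additions, plus a tail loop for the remainder.
    public static int SumUnrolledBy4(int n)
    {
        int sum = 0;
        int i = 0;
        for (; i + 3 < n; i += 4)
        {
            sum += i;
            sum += i + 1;
            sum += i + 2;
            sum += i + 3;
        }
        // Handles the last n % 4 iterations when n is not a multiple of 4.
        for (; i < n; i++)
            sum += i;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumSimple(1024));      // 523776
        Console.WriteLine(SumUnrolledBy4(1024)); // 523776
        Console.WriteLine(SumUnrolledBy4(1027) == SumSimple(1027)); // True
    }
}
```

Hand-unrolling like this costs readability, so it is usually worth measuring first whether it helps on the JIT you actually target.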
More interesting JIT bugs
Here's another puzzle for you:
struct Point
{
    public int X;
    public int Y;
}

static void Print(Point p)
{
    Console.WriteLine(p.X + " " + p.Y);
}

static void Main()
{
    var p = new Point();
    for (p.X = 0; p.X < 2; p.X++)
        Print(p);
}
This loop can be unrolled too. There are only two iterations, so the conditional jumps can be eliminated entirely: just repeat the loop body twice. An interesting fact: JIT-x86 in CLR2 had a bug that made this code print 2 0 2 0 instead of 0 0 1 0. It is not that hard to run into. Fortunately, it was fixed in CLR 4, and the other JIT versions never had it. Keep in mind that if you work with .NET Framework 3.5 (yes, some people still have to), that means CLR2. Be prepared for such simple code to turn into:
; var p = new Point();
05C5178C push esi
05C5178D xor esi,esi ; p.Y = 0
; for (p.X = 0; p.X < 2; p.X++)
05C5178F lea edi,[esi+2] ; p.X = 2
; Print(p);
05C51792 push esi ; push p.Y
05C51793 push edi ; push p.X
05C51794 call dword ptr ds:[54607F4h] ; Print(p)
05C5179A push esi ; push p.Y
05C5179B push edi ; push p.X
05C5179C call dword ptr ds:[54607F4h] ; Print(p)
05C517A2 pop esi
05C517A3 pop edi
05C517A4 pop ebp
05C517A5 ret
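To read the disassembly above in C# terms, here is a small sketch that contrasts what a correct full unroll of the loop is equivalent to with what the buggy machine code effectively does. The names SmallLoopDemo, CorrectUnroll, BuggyUnroll, and Format are hypothetical, and the output is collected into a list instead of being printed so that both variants can be compared directly.

```csharp
using System;
using System.Collections.Generic;

public static class SmallLoopDemo
{
    struct Point
    {
        public int X;
        public int Y;
    }

    static string Format(Point p)
    {
        return p.X + " " + p.Y;
    }

    // A correct full unroll: the loop body is repeated twice,
    // with p.X updated between the two copies.
    public static List<string> CorrectUnroll()
    {
        var output = new List<string>();
        var p = new Point();
        p.X = 0;
        output.Add(Format(p)); // "0 0"
        p.X = 1;
        output.Add(Format(p)); // "1 0"
        return output;
    }

    // What the buggy code in the disassembly amounts to: p.X is set
    // to its final value 2 up front and both calls see the same struct.
    public static List<string> BuggyUnroll()
    {
        var output = new List<string>();
        var p = new Point();
        p.X = 2;
        output.Add(Format(p)); // "2 0"
        output.Add(Format(p)); // "2 0"
        return output;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", CorrectUnroll())); // 0 0 1 0
        Console.WriteLine(string.Join(" ", BuggyUnroll()));   // 2 0 2 0
    }
}
```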
In general, unrolling small loops is a topic of particular interest. While JIT-x86 likes to unroll them (a large loop is hard to unroll, but a small one is much easier), RyuJIT (which is based on the 32-bit JIT code base) refuses to unroll them. JIT-x64, on the other hand, can pleasantly surprise us here. For example, it can take the code
int sum = 0;
for (int i = 0; i < 4; i++)
    sum += i;
Console.WriteLine(sum);
and compute the result ahead of time:
; int sum = 0;
00007FFCC86F3EC0 sub rsp,28h
; Console.WriteLine(sum);
00007FFCC86F3EC4 mov ecx,6 ; sum = 6
00007FFCC86F3EC9 call 00007FFD273DCF10
00007FFCC86F3ECE nop
00007FFCC86F3ECF add rsp,28h
00007FFCC86F3ED3 ret
But don't think that RyuJIT is worse than JIT-x64. Yes, the new-generation JIT compiler is weaker on some optimizations, but on average the code it generates is saner. Learn more about unrolling small loops here:
Want to know more about .NET internals?
Then come and join us! A series of CLRium #2 seminars will soon be held in Moscow (April 3–4), Yekaterinburg (May 17), and St. Petersburg (May 29–30); an online broadcast is included. We will discuss the future of .NET: the anatomy of the new CoreCLR, RyuJIT features, hardcore Roslyn examples, and CoreFx. A steady stream of interesting and useful knowledge will help you not only understand much better how your own C# programs work, but also prepare for a bright .NET future in which you can use the platform's full power!