DreamWalker March 4, 2015 at 07:25

Different versions of JIT in .NET

Every C# developer knows that the C# compiler translates an application's source code into an intermediate representation called Intermediate Language (IL). Turning IL into a sequence of machine instructions is most often the job of the Just-In-Time (JIT) compiler. Yes, today there are NGen, Mono AOT and .NET Native, but JIT compilation still dominates the world of .NET applications. Not everyone knows how that JIT actually works, though. Even if you consider only Microsoft's .NET implementation, it is worth distinguishing between JIT-x86 and JIT-x64. RyuJIT is also on the doorstep and will very soon take the place of honor as the main JIT compiler. And if you are fond of old .NET versions, it is useful to know that the JIT logic differed between CLR versions. The source code is now open, so you can look at it and appreciate how large and complex this topic is. We will not try to cover all of it today; we will only take a brief look at a few interesting features of individual JIT compiler versions. In this issue:

Why a short method may fail to be inlined, and how to avoid it
JIT bugs: dangerous and merciless
Who unrolls loops, and how
How unrolling differs for small and large loops

JIT-x86 and starg

Open the source of the Decimal constructor that takes an int parameter, from the .NET Reference Source:

// Constructs a Decimal from an integer value.
//
public Decimal(int value) {
    //  JIT today can't inline methods that contains "starg" opcode.
    //  For more details, see DevDiv Bugs 81184: x86 JIT CQ: Removing the inline striction of "starg".
    int value_copy = value;
    if (value_copy >= 0) {
        flags = 0;
    }
    else {
        flags = SignMask;
        value_copy = -value_copy;
    }
    lo = value_copy;
    mid = 0;
    hi = 0;
}

Intrigued? The thing is that JIT-x86 cannot inline methods whose IL code contains the starg or ldarga instructions. Inlining the Decimal constructor is highly desirable, so the developers of this standard type resorted to a trick: they copy the parameter into a local variable to avoid the “bad” instruction. JIT-x64 does not have this "feature".
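The same trick can be tried on any small method. Below is a minimal sketch; the class and method names are hypothetical and are not taken from the Reference Source. The first method assigns to its own parameter, which the C# compiler translates into a starg instruction; the second copies the parameter into a local first, so the assignment becomes an ordinary store to a local.

```csharp
using System;

public static class StargDemo
{
    // Assigning to the parameter itself makes the C# compiler emit "starg".
    public static int AbsWithStarg(int value)
    {
        if (value < 0)
            value = -value;
        return value;
    }

    // Same logic with the Decimal-style workaround: the parameter is only read,
    // and all writes go to a local variable, so no "starg" is needed.
    public static int AbsWithLocalCopy(int value)
    {
        int valueCopy = value;
        if (valueCopy < 0)
            valueCopy = -valueCopy;
        return valueCopy;
    }

    public static void Main()
    {
        Console.WriteLine(AbsWithStarg(-5));     // prints 5
        Console.WriteLine(AbsWithLocalCopy(-5)); // prints 5
    }
}
```

Both methods return the same result; the difference is visible only in the IL (for example, in ildasm). Whether a particular runtime actually inlines either of them is not something this sketch demonstrates.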

Strange bug in JIT-x64

Dear experts, attention, here is the question: what will the following code print for step = 1?

private int bar;
public void Foo(int step)
{
    for (int i = 0; i < step; i++)
    {
        bar = i + 10;
        for (int j = 0; j < 2 * step; j += step)
            Console.WriteLine(j + 10);
    }
}

The correct answer: it depends. You most likely expect to see 10 11, but a bug in the JIT-x64 optimizer ruins everything and gives us 10 21. On JIT-x86 and RyuJIT everything works fine. You have to live with the bug; Microsoft does not want to fix it. The example is very fragile, and running into it in real life is extremely unlikely. Someone will ask: if the bug is this rare, why know about it at all? Why bother with such things? Well, if you have a cheerful disposition, you can put the bug to work for your own purposes. For example, to determine at runtime which JIT version is currently in use:

public enum JitVersion
{
    Mono, MsX86, MsX64, RyuJit
}
public class JitVersionInfo
{
    public JitVersion GetJitVersion()
    {
        if (IsMono())
            return JitVersion.Mono;
        if (IsMsX86())
            return JitVersion.MsX86;
        if (IsMsX64())
            return JitVersion.MsX64;
        return JitVersion.RyuJit;
    }
    private int bar;
    private bool IsMsX64(int step = 1)
    {
        var value = 0;
        for (int i = 0; i < step; i++)
        {
            bar = i + 10;
            for (int j = 0; j < 2 * step; j += step)
                value = j + 10;
        }
        return value == 20 + step;
    }
    public static bool IsMono()
    {
        return Type.GetType("Mono.Runtime") != null;
    }
    public static bool IsMsX86()
    {
        return !IsMono() && IntPtr.Size == 4;
    }
}


Loop unrolling

Loop unrolling is a very good optimization that many compilers like to perform. The idea is to replace a loop of the form

for (int i = 0; i < 1024; i++)
    Foo(i);

with

for (int i = 0; i < 1024; i += 4)
{
    Foo(i);
    Foo(i + 1);
    Foo(i + 2);
    Foo(i + 3);
}

Besides reducing the number of increment operations, this gives the processor better conditions for its own optimizations (for example, branch prediction and instruction-level parallelism). Alas, JIT-x86 and RyuJIT are not really able to unroll an ordinary loop. JIT-x64 sometimes can, although it does so in its own peculiar way. For example, if the number of iterations is divisible by 2 or 3, then the code

int sum = 0;
for (int i = 0; i < 1024; i++)
    sum += i;
Console.WriteLine(sum);

will turn into something like this:

;        int sum = 0;                               
00007FFCC8710090  sub         rsp,28h              
;        for (int i = 0; i < 1024; i++)             
00007FFCC8710094  xor         ecx,ecx              
00007FFCC8710096  mov         edx,1                ; edx = i + 1
00007FFCC871009B  nop         dword ptr [rax+rax]  
00007FFCC87100A0  lea         eax,[rdx-1]          ; eax = i
;            sum += i;                              
00007FFCC87100A3  add         ecx,eax              ; sum += i
00007FFCC87100A5  add         ecx,edx              ; sum += i + 1
00007FFCC87100A7  lea         eax,[rdx+1]          ; eax = i + 2
00007FFCC87100AA  add         ecx,eax              ; sum += i + 2;
00007FFCC87100AC  lea         eax,[rdx+2]          ; eax = i + 3
00007FFCC87100AF  add         ecx,eax              ; sum += i + 3;
00007FFCC87100B1  add         edx,4                ; i += 4
;        for (int i = 0; i < 1024; i++)             
00007FFCC87100B4  cmp         edx,401h             
00007FFCC87100BA  jl          00007FFCC87100A0
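Read back into C#, the listing above corresponds roughly to the hand-unrolled loop below. This is only an illustrative sketch: the names UnrollDemo, SumSimple and SumUnrolledBy4 are hypothetical, and the counter named next plays the role of edx (that is, i + 1), which is why the loop is compared against count + 1 (401h for 1024 iterations).

```csharp
using System;

public static class UnrollDemo
{
    // The straightforward loop from the article.
    public static int SumSimple(int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += i;
        return sum;
    }

    // Mirrors the shape of the JIT-x64 listing: four additions per iteration.
    // Assumes count is divisible by 4, so no remainder loop is needed.
    public static int SumUnrolledBy4(int count)
    {
        if (count % 4 != 0)
            throw new ArgumentException("count must be divisible by 4", "count");

        int sum = 0;
        for (int next = 1; next < count + 1; next += 4) // next == i + 1
        {
            sum += next - 1; // i
            sum += next;     // i + 1
            sum += next + 1; // i + 2
            sum += next + 2; // i + 3
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumSimple(1024));      // prints 523776
        Console.WriteLine(SumUnrolledBy4(1024)); // prints 523776
    }
}
```

Unrolling by hand like this is mainly useful for comparing the two shapes in a benchmark; it does not show what any particular JIT will generate for the simple version.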

This is quite important to know. For example, many people are looking forward to the switch from JIT-x64 to RyuJIT, because Microsoft promises plenty of goodies: SIMD support and faster JIT compilation. But little is said about the performance of the generated code itself. Keep in mind that the absence of some optimizations in RyuJIT (compared with JIT-x64) can slightly reduce the speed of your program.

More interesting JIT bugs

Here's another puzzle for you:

struct Point
{
    public int X;
    public int Y;
}
static void Print(Point p)
{
    Console.WriteLine(p.X + " " + p.Y);
}
static void Main()
{
    var p = new Point();
    for (p.X = 0; p.X < 2; p.X++)
        Print(p);
}

This loop can be unrolled as well. There are only two iterations, so the conditional jumps can be dropped altogether: just repeat the loop body twice. An interesting fact: CLR2 JIT-x86 had a bug that spoiled this, and instead of 0 0 and 1 0 the program printed 2 0 and 2 0. It is not that hard to stumble upon. Fortunately, the bug was fixed in CLR 4, and the other JIT versions never had it. Keep in mind that if you work with .NET Framework 3.5 (yes, some people still have to), that means CLR2. You need to be prepared for such simple code to turn into

;        var p = new Point();                  
05C5178C  push        esi                     
05C5178D  xor         esi,esi                 ; p.Y = 0
;        for (p.X = 0; p.X < 2; p.X++)         
05C5178F  lea         edi,[esi+2]             ; p.X = 2
;            Print(p);                         
05C51792  push        esi                     ; push p.Y
05C51793  push        edi                     ; push p.X
05C51794  call        dword ptr ds:[54607F4h] ; Print(p)
05C5179A  push        esi                     ; push p.Y
05C5179B  push        edi                     ; push p.X
05C5179C  call        dword ptr ds:[54607F4h] ; Print(p)
05C517A2  pop         esi                     
05C517A3  pop         edi                     
05C517A4  pop         ebp                     
05C517A5  ret
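In the spirit of the IsMsX64 trick shown earlier, the same loop can be turned into a runtime self-check. The sketch below is an assumption-laden illustration: the names PointLoopCheck, Record and LoopBehavesCorrectly are hypothetical, and it has not been verified that this exact shape (with an extra list argument) triggers the bug on an affected CLR2 JIT-x86. On a correct JIT it simply returns true.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public struct Point
{
    public int X;
    public int Y;
}

public static class PointLoopCheck
{
    // Kept out of line so that the struct is really passed by value to a call,
    // as it is with Print(p) in the article's example.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Record(Point p, List<int> seen)
    {
        seen.Add(p.X);
    }

    // Runs the two-iteration loop and checks that X was observed as 0, then 1.
    public static bool LoopBehavesCorrectly()
    {
        var seen = new List<int>();
        var p = new Point();
        for (p.X = 0; p.X < 2; p.X++)
            Record(p, seen);
        return seen.Count == 2 && seen[0] == 0 && seen[1] == 1;
    }

    public static void Main()
    {
        Console.WriteLine(LoopBehavesCorrectly());
    }
}
```

A check like this could run once at startup on machines that still target .NET Framework 3.5, so that a miscompiled loop is reported instead of silently producing wrong values.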

In general, unrolling small loops is a topic of particular interest. JIT-x86 likes to unroll them (unrolling a large loop is hard, but a small one is much easier), whereas RyuJIT (which is based on the code base of the 32-bit JIT) refuses to. JIT-x64, however, can please us here. For example, it can take the code

int sum = 0;
for (int i = 0; i < 4; i++)
    sum += i;
Console.WriteLine(sum);

and compute the result in advance:

;        int sum = 0;                            
00007FFCC86F3EC0  sub         rsp,28h           
;        Console.WriteLine(sum);                 
00007FFCC86F3EC4  mov         ecx,6             ; sum = 6
00007FFCC86F3EC9  call        00007FFD273DCF10  
00007FFCC86F3ECE  nop                           
00007FFCC86F3ECF  add         rsp,28h           
00007FFCC86F3ED3  ret

But do not conclude that RyuJIT is worse than JIT-x64. Yes, the new-generation JIT compiler is not as strong on optimizations, but on the whole its code is more sane.

Want to know more about .NET internals?

Then come and join us! A series of CLRium #2 seminars will soon be held in Moscow (April 03–04), Yekaterinburg (May 17) and St. Petersburg (May 29–30), with an online broadcast included. We will discuss the future of .NET: the anatomy of the new CoreCLR, RyuJIT features, hardcore Roslyn examples and the offspring of CoreFx! An endless stream of interesting and useful knowledge will help you not only understand much better how your own C# programs work, but also prepare you for a bright .NET future in which you can use the platform's full power!
