C#: Internal structure of array initializers

Original author: Bart De Smet
Surely almost everyone who has worked with C# knows a construct like this:

int[] ints = new int[3] { 1, 2, 3 }; // and if it somehow wasn't known before, it certainly is from now on

It would be logical to expect this construct to turn into something like the following:

int[] ints = new int[3]; 
ints[0] = 1; 
ints[1] = 2; 
ints[2] = 3;

Alas, things are trickier than they seem at first glance, and there are some subtleties that will be pointed out later. Until then, let's put on a worn "IL freak" T-shirt (whoever has one) and dive into the guts of the implementation.

Ultimately, the compiler turns the first construct into something like this:

What on earth is this? What is all this gibberish? Before I explain what's what, let's take a look at Q::Main, where I have annotated the contents of the evaluation stack before each instruction:

.method private hidebysig static void  Main() cil managed 
{ 
  // Code size       20 (0x14) 
  .maxstack  3 
  .locals init (int32[] V_0) 
  IL_0000:  nop 
  // {} 
  IL_0001:  ldc.i4.3 
  // {3} 
  IL_0002:  newarr     [mscorlib]System.Int32 
  // {&int[3]} 
  IL_0007:  dup 
  // {&int[3], &int[3]} 
  IL_0008:  ldtoken    field valuetype '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'/'__StaticArrayInitTypeSize=12' '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'::'$$method0x6000001-1' 
  // {&int[3], &int[3], #'$$method0x6000001-1'} 
  IL_000d:  call       void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, 
                                                                                                      valuetype [mscorlib]System.RuntimeFieldHandle) 
  // {&int[3]} 
  IL_0012:  stloc.0 
  // {} 
  IL_0013:  ret 
} // end of method Q::Main

Let's now do a line-by-line analysis:
IL_0001 and IL_0002: a new array of System.Int32 with 3 elements is created.
At IL_0007 we come across the first surprise: the array reference is duplicated. Why? Assume for now that the array gets initialized at IL_0008 and IL_000d (we will return to these lines very soon). Now look at IL_0012, where the value at the top of the stack, again the array reference, is stored into the local variable with index 0, i.e. ints. But what if we stored the reference into ints right after newarr, instead of executing dup at IL_0007? We would get this:

newarr     [mscorlib]System.Int32 
stloc.0 // note this line
ldloc.0 // and this one
ldtoken    field valuetype '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'/'__StaticArrayInitTypeSize=12' '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'::'$$method0x6000001-1' 
call       void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, 
                                                                                          valuetype [mscorlib]System.RuntimeFieldHandle)

The assignment would no longer be atomic: an outside observer could see the array before its elements have been filled in. Filling them in is exactly what IL_0008 and IL_000d do. Thus, the code given at the very beginning is not equivalent to this:

int[] ints = new int[3]; 
ints[0] = 1; 
ints[1] = 2; 
ints[2] = 3;

But rather, it is something like this:

int[] t = new int[3]; 
t[0] = 1; 
t[1] = 2; 
t[2] = 3; 
int[] ints = t;
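
The "build in a temporary, then assign" ordering can also be observed from C# itself. The following is only a sketch: Fail() is a hypothetical helper, and because one element is not a constant the compiler will not use InitializeArray here, so it illustrates the ordering of the assignment and nothing more.

```csharp
using System;

class AtomicInitDemo
{
    // Hypothetical helper: fails while the initializer is being evaluated.
    static int Fail()
    {
        throw new InvalidOperationException("element could not be computed");
    }

    public static int[] Run()
    {
        int[] ints = new int[] { 7, 8, 9 };
        try
        {
            // The new array is built first; because Fail() throws,
            // the assignment to ints never takes place.
            ints = new int[] { 1, Fail(), 3 };
        }
        catch (InvalidOperationException)
        {
            // Ignored on purpose: we only care about the state of ints.
        }
        return ints;
    }

    static void Main()
    {
        int[] ints = Run();
        // ints still refers to the old, fully initialized array.
        Console.WriteLine("{0},{1},{2}", ints[0], ints[1], ints[2]); // prints 7,8,9
    }
}
```

A half-filled array { 1, 0, 0 } is never visible through ints, which matches the temporary-variable reading of the initializer.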

That said, the actual implementation avoids a second local variable by using dup. This brings us to the two obscure lines of code:

 IL_0008:  ldtoken    field valuetype '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'/'__StaticArrayInitTypeSize=12' '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'::'$$method0x6000001-1' 
  IL_000d:  call       void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array,  valuetype [mscorlib]System.RuntimeFieldHandle)

But there is nothing complicated or scary here. What we see is a call to RuntimeHelpers.InitializeArray, which fills the array whose reference was left on the stack by IL_0007 with the data of the field whose token is pushed onto the stack at IL_0008. The token refers to the field highlighted in the picture below:

The highlighted line is a static field in a private, obviously compiler-generated class with an equally obviously unpronounceable name. Two points deserve attention. First, this class has a nested type called __StaticArrayInitTypeSize=12. It stands for the array data, whose actual size is 12 bytes (3 elements of System.Int32 at 4 bytes each). Second, the type derives from System.ValueType (I sincerely hope readers know what happens to value type instances once they are created on the stack, so we won't dwell on this; author's note). But how does the type get those 12 bytes? Obviously, the name alone is not enough for the CLR to allocate the required amount of memory, so if you look at the implementation in ILDASM you will see this:

.class private auto ansi '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}' 
       extends [mscorlib]System.Object 
{ 
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) 
  .class explicit ansi sealed nested private '__StaticArrayInitTypeSize=12' 
         extends [mscorlib]System.ValueType 
  { 
    .pack 1 
    .size 12 // note this line in particular
  } // end of class '__StaticArrayInitTypeSize=12'
  .field static assembly valuetype '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'/'__StaticArrayInitTypeSize=12' '$$method0x6000001-1' at I_00002050 
} // end of class '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'
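
For comparison, a type of a similar shape can be declared in C# itself: the Pack and Size values of StructLayoutAttribute are what show up as .pack and .size in the IL. The sketch below is only an approximation of the compiler-generated type; Blob12 is a made-up name, and Marshal.SizeOf reports the size as seen by the marshaller.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for '__StaticArrayInitTypeSize=12':
// no fields at all, only a declared size and packing.
[StructLayout(LayoutKind.Explicit, Size = 12, Pack = 1)]
struct Blob12
{
}

class BlobSizeDemo
{
    public static int MarshalledSize()
    {
        return Marshal.SizeOf(typeof(Blob12));
    }

    static void Main()
    {
        Console.WriteLine(MarshalledSize()); // prints 12
    }
}
```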

The .size directive tells the CLR to allocate a 12-byte block of memory whenever an instance of this type is created. If you are curious about the .pack directive, the idea is simple: it specifies the alignment of fields within the type, and its value must be a power of two (values from 2 to 128 are supported; with a value of 1 there is effectively no alignment, translator's note). It is needed for COM compatibility. Let's go back to the field:

.field static assembly valuetype '{8C802ECE-B24C-4A20-AE34-9303FE2DD066}'/'__StaticArrayInitTypeSize=12' '$$method0x6000001-1' at I_00002050

The type is quite simple, even though its name is long because of the nesting. In our case, '$$method0x6000001-1' is the name of the field. The fun begins after the "at" keyword. What follows it is a so-called data label, which refers to a piece of data located somewhere in the PE file at a given offset. In ILDASM you will see something like this:

.data cil I_00002050 = bytearray ( 
                 01 00 00 00 02 00 00 00 03 00 00 00)

This is the declaration of the data label, which, as you can see, is the sequence of bytes of the final array in little-endian order. Now we need to understand how InitializeArray works:

call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)

It receives the array instance (which we already created at IL_0001 and IL_0002) and a handle to the field declared with the "at" keyword, which wraps the array data. Thus, the runtime is able to work out how many bytes to read from the given address and fill the array with them. The value I_00002050 is no mystery either: it is a perfectly ordinary RVA. You can verify this using dumpbin.
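
As a rough analogy for what happens to the data (not the runtime's actual implementation), the same bytes can be copied into an int array by hand with Buffer.BlockCopy. The FromBlob helper is a name invented for this sketch, and the result shown assumes a little-endian machine.

```csharp
using System;

class BlobCopyDemo
{
    // Hypothetical helper: reinterprets a raw byte blob as Int32 elements.
    public static int[] FromBlob(byte[] blob)
    {
        int[] result = new int[blob.Length / sizeof(int)];
        // Raw byte copy into the array's storage, without per-element stores.
        Buffer.BlockCopy(blob, 0, result, 0, blob.Length);
        return result;
    }

    static void Main()
    {
        // The same 12 bytes as in the .data declaration above.
        byte[] blob =
        {
            0x01, 0x00, 0x00, 0x00,
            0x02, 0x00, 0x00, 0x00,
            0x03, 0x00, 0x00, 0x00
        };
        int[] ints = FromBlob(blob);
        Console.WriteLine("{0},{1},{2}", ints[0], ints[1], ints[2]); // prints 1,2,3 on little-endian hardware
    }
}
```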

There is another equally interesting detail: the compiler reuses the __StaticArrayInitTypeSize type when arrays occupy the same amount of memory. Thus, the following listing:

int[]  ints  = { 1, 2, 3, 4, 5, 6, 7, 8 }; 
long[] longs = { 1, 2, 3, 4 }; 
byte[] bytes = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32 };

makes the compiler use the same type for all three, because the data of each array occupies 32 bytes:

.field static assembly valuetype '{AA6C9D77-5FAD-47E0-8B55-1D8739074F1F}'/'__StaticArrayInitTypeSize=32' '$$method0x6000001-1' at I_00002050 
.field static assembly valuetype '{AA6C9D77-5FAD-47E0-8B55-1D8739074F1F}'/'__StaticArrayInitTypeSize=32' '$$method0x6000001-2' at I_00002070 
.field static assembly valuetype '{AA6C9D77-5FAD-47E0-8B55-1D8739074F1F}'/'__StaticArrayInitTypeSize=32' '$$method0x6000001-3' at I_00002090 
.data cil I_00002050 = bytearray ( 
                 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00 
                 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00) 
.data cil I_00002070 = bytearray ( 
                 01 00 00 00 00 00 00 00 02 00 00 00 00 00 00 00 
                 03 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00) 
.data cil I_00002090 = bytearray ( 
                 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 
                 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20)
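
The 32-byte figure is easy to check from C#. This small sketch uses Buffer.ByteLength, which returns the number of bytes occupied by the elements of a primitive array; the class and method names are made up for the example.

```csharp
using System;

class SameSizeDemo
{
    public static int[] ByteLengths()
    {
        int[] ints = { 1, 2, 3, 4, 5, 6, 7, 8 };
        long[] longs = { 1, 2, 3, 4 };
        byte[] bytes = new byte[32];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = (byte)(i + 1); // 1..32, as in the listing above
        }
        return new int[]
        {
            Buffer.ByteLength(ints),  // 8 elements * 4 bytes
            Buffer.ByteLength(longs), // 4 elements * 8 bytes
            Buffer.ByteLength(bytes)  // 32 elements * 1 byte
        };
    }

    static void Main()
    {
        int[] sizes = ByteLengths();
        Console.WriteLine("{0} {1} {2}", sizes[0], sizes[1], sizes[2]); // prints 32 32 32
    }
}
```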

However, the compiler does not use this mechanism for every array. For arrays of 1 or 2 elements, IL like this is generated:

.method private hidebysig static void  Main() cil managed 
{ 
  // Code size       19 (0x13) 
  .maxstack  3 
  .locals init (int32[] V_0, 
           int32[] V_1) 
  IL_0000:  nop 
  IL_0001:  ldc.i4.2 
  IL_0002:  newarr     [mscorlib]System.Int32 
  IL_0007:  stloc.1 
  // V_1[0] = 1 
  IL_0008:  ldloc.1 
  IL_0009:  ldc.i4.0 
  IL_000a:  ldc.i4.1 
  IL_000b:  stelem.i4 
  // V_1[1] = 2 
  IL_000c:  ldloc.1 
  IL_000d:  ldc.i4.1 
  IL_000e:  ldc.i4.2 
  IL_000f:  stelem.i4 
  // V_0 = V_1 
  IL_0010:  ldloc.1 
  IL_0011:  stloc.0 
  IL_0012:  ret 
} // end of method Q::Main

Here we actually see the trick with two local variables: one of them is a temporary that holds the array while it is being filled, after which the reference is copied to the main variable. The reason for the InitializeArray approach (with a separate method that fills the array) is obvious: a naive implementation needs 4 instructions per element, so the code that constructs the array would grow linearly with the array size, whereas with InitializeArray the code size is constant.
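
The difference in code volume can be put into numbers with a back-of-the-envelope sketch. The two constants are read off the listings in this article: each element store occupies 4 bytes of IL there, and dup + ldtoken + call occupy 11 bytes (IL_0007 through IL_0011). Larger indexes and values need longer ldc.i4 encodings, so the per-element figure is a lower bound, and the names below are invented for the illustration.

```csharp
using System;

class CodeSizeDemo
{
    // ldloc.1, ldc.i4.<index>, ldc.i4.<value>, stelem.i4: 1 byte each in the listing.
    const int BytesPerElementStore = 4;
    // dup (1 byte) + ldtoken (5 bytes) + call (5 bytes).
    const int BytesForHelperCall = 11;

    // Lower bound for the IL spent on element-by-element stores.
    public static int ElementStoreBytes(int elementCount)
    {
        return elementCount * BytesPerElementStore;
    }

    // IL spent on the InitializeArray path, independent of the array length.
    public static int HelperCallBytes()
    {
        return BytesForHelperCall;
    }

    static void Main()
    {
        foreach (int n in new int[] { 2, 3, 100, 1000 })
        {
            Console.WriteLine("{0} elements: stores >= {1} bytes, InitializeArray = {2} bytes",
                n, ElementStoreBytes(n), HelperCallBytes());
        }
    }
}
```

This counts only the IL in the method body; the initial data itself still has to live somewhere in the PE file.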

P.S. This article describes the behavior of Microsoft's C# 2.0 and 3.0 compilers. Code generated by other compiler versions or by third-party compilers (for example, Mono) may differ from what is shown here.
