IL2CPP: generated code tour

Original author: Josh Peterson
This is the second article in a series on IL2CPP. This time we will look at the C++ code generated by the il2cpp.exe utility, and also consider how managed types are represented in native code, the run-time checks used to support the .NET virtual machine, loop generation, and much more.



The code shown here comes from one specific version of Unity and is likely to change in future versions, but the basic principles will remain the same.

Project Example

For this example, I will use the latest available version of Unity, 5.0.1p1. As in the previous article, I will create a new empty project and add one script with the following contents:

using System;
using UnityEngine;
public class HelloWorld : MonoBehaviour {
  private class Important {
    public static int ClassIdentifier = 42;
    public int InstanceIdentifier;
  }
  void Start () {
    Debug.Log("Hello, IL2CPP!");
    Debug.LogFormat("Static field: {0}", Important.ClassIdentifier);
    var importantData = new [] { 
      new Important { InstanceIdentifier = 0 },
      new Important { InstanceIdentifier = 1 } };
    Debug.LogFormat("First value: {0}", importantData[0].InstanceIdentifier);
    Debug.LogFormat("Second value: {0}", importantData[1].InstanceIdentifier);
    try {
      throw new InvalidOperationException("Don't panic");
    }
    catch (InvalidOperationException e) {
      Debug.Log(e.Message);
    }
    for (var i = 0; i < 3; ++i) {
      Debug.LogFormat("Loop iteration: {0}", i);
    }
  }
}


I will build this project for WebGL using the Unity editor on Windows. To get relatively readable names in the generated C++ code, I enabled the Development Player option in Build Settings. In addition, I set Enable Exceptions to Full in the WebGL Player Settings.

Overview of the generated code

After the build completes, the generated C++ code can be found in the Temp\StagingArea\Data\il2cppOutput directory in the project folder. This directory is deleted as soon as I close the editor, but while the editor is open you can examine it at leisure.

The il2cpp.exe utility has generated many files even for such a small project: 4625 header files and 89 C++ source files. To navigate this much code, I prefer to use a text editor with Exuberant CTags support. CTags usually generates a tag file quickly, which greatly simplifies code navigation.

You may notice that many of the generated C++ files do not contain the simple code from our script, but converted code from standard libraries such as mscorlib.dll. As mentioned in the previous article, the IL2CPP scripting backend uses the same standard library code as Mono. Note that mscorlib.dll and the other standard libraries are converted every time il2cpp.exe runs. This may seem unnecessary, since that code does not change.

The reason is that IL2CPP always strips the bytecode to reduce the size of the executable. Even small changes in the script code can therefore change which parts of the standard library are used. As a result, mscorlib.dll must be converted on every build. We are trying to improve incremental builds, but so far without much success.

Mapping managed code to generated C++ code

For each type in managed code, il2cpp.exe generates two header files: one that defines the type and one that declares the methods of the type. For example, let's look at the converted type UnityEngine.Vector3. The header file for this type is called UnityEngine_UnityEngine_Vector3.h. The name is built from the assembly name (UnityEngine.dll), the namespace, and the type name. The code is as follows:

// UnityEngine.Vector3
struct Vector3_t78 
{
  // System.Single UnityEngine.Vector3::x
  float ___x_1;
  // System.Single UnityEngine.Vector3::y
  float ___y_2;
  // System.Single UnityEngine.Vector3::z
  float ___z_3;
};


The il2cpp.exe utility converts each of the three instance fields and mangles the names slightly, adding leading underscores to avoid conflicts with reserved words. Names with leading underscores are themselves reserved in C++, but so far we have not seen them conflict with C++ standard library code.
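The generated names follow a mechanical scheme. Names that appear elsewhere in the generated output, such as ObjectU5BU5D_t4 for System.Object[] and AssemblyU2DCSharp for Assembly-CSharp, suggest that characters which are not valid in a C++ identifier are replaced by the letter U followed by their hexadecimal character code. The sketch below illustrates that inferred rule only. The function name escape_identifier is hypothetical, and this is not the actual il2cpp.exe implementation.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: replaces every character that is not a letter,
// a digit, or an underscore with 'U' plus its two-digit uppercase hex
// code. The rule is inferred from names shown in this article; it is an
// assumption, not code taken from il2cpp.exe.
std::string escape_identifier(const std::string& managed_name)
{
    std::string result;
    for (unsigned char c : managed_name)
    {
        if (std::isalnum(c) || c == '_')
        {
            result += static_cast<char>(c);
        }
        else
        {
            char buffer[4]; // 'U' + two hex digits + terminator
            std::snprintf(buffer, sizeof(buffer), "U%02X", static_cast<unsigned>(c));
            result += buffer;
        }
    }
    return result;
}
```

Under this assumption, escape_identifier("Object[]") yields ObjectU5BU5D and escape_identifier("Assembly-CSharp") yields AssemblyU2DCSharp, which matches the names seen in the generated files.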

The UnityEngine_UnityEngine_Vector3MethodDeclarations.h file contains declarations for all methods in Vector3. For example, Vector3 overrides the Object.ToString method:

// System.String UnityEngine.Vector3::ToString()
extern "C" String_t* Vector3_ToString_m2315 (Vector3_t78 * __this, MethodInfo* method) IL2CPP_METHOD_ATTR;


Notice the comment, which identifies the managed method that the declaration represents. Searching the output files for the name of a managed method in this format can be useful, especially for methods with common names such as ToString.

The methods converted by il2cpp.exe have several interesting features:

• They are not C++ member functions but free functions that take the this pointer as the first argument. For static methods in managed code, IL2CPP always passes NULL as this first argument. Always declaring methods with the this pointer as the first argument simplifies the code generation in il2cpp.exe and makes it easier to invoke methods through other methods (such as delegates) in the generated code.

• Each method has an additional argument of type MethodInfo* that carries metadata about the method, used for things like virtual method invocation. Mono uses platform-specific trampolines to pass this metadata. For IL2CPP we decided to avoid trampolines to improve portability.

• All methods are declared extern "C" so that il2cpp.exe can, when necessary, lie to the C++ compiler and treat all methods as if they had the same type.

• Type names carry the suffix "_t" and method names the suffix "_m". Naming conflicts are resolved by appending a unique number to each name. These numbers change whenever anything in the user script code changes, so you should not rely on them from one build to the next.

The first two points mean that each method has at least two parameters: the this pointer and the MethodInfo pointer. Do these extra parameters add overhead? Yes, they do, but not as much as it might seem at first glance: so far, profiling has shown no measurable effect on performance.
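To make the calling convention concrete, here is a minimal sketch of the same shape in hand-written C++. The types Counter and MethodInfo and all function names below are simplified stand-ins invented for illustration; the real definitions live in the generated headers and in libil2cpp. The parameter called self plays the role of __this in the generated code.

```cpp
// Simplified stand-ins for illustration only (assumptions, not the
// real IL2CPP definitions).
struct MethodInfo { const char* name; };
struct Counter { int value; };

// An instance method becomes a free function: the object comes first,
// the method metadata comes last.
extern "C" int Counter_Increment(void* self, MethodInfo* method)
{
    (void)method; // metadata is not needed in this sketch
    Counter* counter = static_cast<Counter*>(self);
    return ++counter->value;
}

// A static method has the same shape; callers pass NULL for self.
extern "C" int Counter_Zero(void* self, MethodInfo* method)
{
    (void)self;
    (void)method;
    return 0;
}

// Because both functions share one signature, a single function pointer
// type can invoke either of them, much like a delegate would.
typedef int (*MethodPointer)(void* self, MethodInfo* method);

int invoke(MethodPointer pointer, void* target, MethodInfo* method)
{
    return pointer(target, method);
}
```

A caller can then write invoke(Counter_Increment, &counter, &info) for an instance method and invoke(Counter_Zero, nullptr, &info) for a static one, without caring which kind of method sits behind the pointer.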

Let's jump to the definition of the ToString method using CTags. It is located in the Bulk_UnityEngine_0.cpp file. The code in this method definition does not look much like the C# code in Vector3::ToString(). However, if you use a tool like ILSpy to view the IL of Vector3::ToString(), you will see that the generated C++ code is very similar to the IL code.

Why does il2cpp.exe not generate a separate C++ file for the method definitions of each type, as it does for the method declarations? The Bulk_UnityEngine_0.cpp file is quite large: 20,481 lines! The C++ compilers we use have trouble with a large number of source files. Compiling 4,000 .cpp files took longer than compiling the same source code in 80 .cpp files. Therefore, il2cpp.exe divides the method definitions for types into groups and generates one C++ file for each group.

Now let's go back to the method declarations header file and look at the line at the top of the file:

#include "codegen/il2cpp-codegen.h"


The il2cpp-codegen.h file contains the interface through which the generated code accesses the libil2cpp runtime. We will discuss several ways the generated code uses the runtime later.

Method prologue

Let's look at the definition of the Vector3::ToString() method, namely the common prologue that il2cpp.exe emits for all methods.

StackTraceSentry _stackTraceSentry(&Vector3_ToString_m2315_MethodInfo);
static bool Vector3_ToString_m2315_init;
if (!Vector3_ToString_m2315_init)
{
  ObjectU5BU5D_t4_il2cpp_TypeInfo_var = il2cpp_codegen_class_from_type(&ObjectU5BU5D_t4_0_0_0);
  Vector3_ToString_m2315_init = true;
}


The first line of the prologue creates a local variable of type StackTraceSentry. It is used to track the managed call stack so that, for example, Environment.StackTrace works. Generating this code is optional; here it was enabled by passing the --enable-stacktrace argument to il2cpp.exe (because I set Enable Exceptions to Full in the WebGL Player Settings). We found that for small functions this variable adds overhead and hurts performance, so we never emit it for iOS and other platforms where stack trace information can be obtained without it. The WebGL platform has no stack walking support of its own, so this code is needed for managed exceptions to work correctly.
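One plausible way to implement such a sentry is a scope guard that records the method on entry and removes it on exit. The sketch below assumes exactly that; the class StackTraceSentrySketch and the per-thread vector g_managed_stack are hypothetical names, and the real StackTraceSentry in libil2cpp may work differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-thread record of the managed methods currently running.
thread_local std::vector<std::string> g_managed_stack;

// Scope guard: pushes the method name when constructed and pops it when
// the enclosing scope ends, including when an exception unwinds the stack.
class StackTraceSentrySketch
{
public:
    explicit StackTraceSentrySketch(const char* method_name)
    {
        g_managed_stack.push_back(method_name);
    }
    ~StackTraceSentrySketch()
    {
        g_managed_stack.pop_back();
    }
    StackTraceSentrySketch(const StackTraceSentrySketch&) = delete;
    StackTraceSentrySketch& operator=(const StackTraceSentrySketch&) = delete;
};

// Returns the depth of the recorded managed stack while it is running.
std::size_t inner_method()
{
    StackTraceSentrySketch sentry("Inner");
    return g_managed_stack.size();
}

std::size_t outer_method()
{
    StackTraceSentrySketch sentry("Outer");
    return inner_method();
}
```

Every call pays for one push and one pop, which is negligible in a large method but noticeable in a very small one. That is consistent with the overhead described above.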

The second part of the prologue performs lazy initialization of the type metadata for any array or generic types used in the method body. Here, ObjectU5BU5D_t4 is the name of the type System.Object[]. This part of the prologue runs only once and does nothing if the type has already been initialized, so we have seen no negative impact on performance.

But what about thread safety? What if two threads call Vector3::ToString() at the same time? That is fine: all the code in the libil2cpp runtime used to initialize types is safe to call from multiple threads. The il2cpp_codegen_class_from_type function may well be called several times, but the actual initialization happens only once, on one thread. Method execution does not continue until initialization is complete. Therefore, this method prologue is thread safe.
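The pattern can be sketched in standard C++ as follows. All names here are hypothetical stand-ins: class_from_type_sketch plays the role of il2cpp_codegen_class_from_type, and a function-local static (which C++11 guarantees is initialized exactly once, even under concurrent calls) stands in for the thread-safe initialization inside libil2cpp. The generated code uses a plain bool flag; this sketch uses std::atomic so that it stays within the C++ memory model.

```cpp
#include <atomic>

// Simplified stand-in for type metadata (an assumption for illustration).
struct TypeInfo { const char* name; };

// Counts how often the expensive initialization really runs.
std::atomic<int> g_initialization_runs(0);

TypeInfo make_object_array_type()
{
    ++g_initialization_runs;
    return TypeInfo{ "System.Object[]" };
}

// Safe to call from many threads and any number of times: the static
// local is initialized once, and concurrent callers wait until it is done.
TypeInfo* class_from_type_sketch()
{
    static TypeInfo info = make_object_array_type();
    return &info;
}

std::atomic<TypeInfo*> g_object_array_type_info(nullptr);

// A method whose prologue lazily fetches the metadata it needs.
const char* method_with_prologue()
{
    static std::atomic<bool> s_initialized(false);
    if (!s_initialized.load())
    {
        g_object_array_type_info.store(class_from_type_sketch());
        s_initialized.store(true);
    }
    return g_object_array_type_info.load()->name;
}
```

Two threads may both see the flag as false and both call class_from_type_sketch, but they receive the same pointer and the initialization itself runs once, which mirrors the behavior described above.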

Checks at Run Time

The next part of the method creates an object array, stores the value of the x field of Vector3 in a local variable, then boxes that local and stores it in the array at index zero. The generated C++ code (with comments) looks like this:

// Create a new single-dimension, zero-based object array
ObjectU5BU5D_t4* L_0 = ((ObjectU5BU5D_t4*)SZArrayNew(ObjectU5BU5D_t4_il2cpp_TypeInfo_var, 3));
// Store the Vector3::x field in a local
float L_1 = (__this->___x_1);
float L_2 = L_1;
// Box the float instance, since it is a value type.
Object_t * L_3 = Box(InitializedTypeInfo(&Single_t264_il2cpp_TypeInfo), &L_2);
// Here are three important runtime checks
NullCheck(L_0);
IL2CPP_ARRAY_BOUNDS_CHECK(L_0, 0);
ArrayElementTypeCheck (L_0, L_3);
// Store the boxed value in the array at index 0
*((Object_t **)(Object_t **)SZArrayLdElema(L_0, 0)) = (Object_t *)L_3;


The il2cpp.exe utility adds three checks that are not present in the IL code:

• If the array is NULL, the NullCheck check throws a NullReferenceException.
• If the array index is incorrect, the IL2CPP_ARRAY_BOUNDS_CHECK check throws an IndexOutOfRangeException.
• If the type of the element being added to the array is incorrect, the ArrayElementTypeCheck throws an ArrayTypeMismatchException.

These runtime checks are guarantees provided by the .NET virtual machine. Instead of injecting code, Mono uses mechanisms of the target platform to handle these checks. With IL2CPP we wanted to cover as many platforms as possible, including ones such as WebGL that have no such mechanism of their own. Therefore, the il2cpp.exe utility injects these checks itself.

Do these checks cause performance problems? In most cases we have not noticed any. Moreover, the checks provide the benefits and safety that the .NET virtual machine requires. In a few specific cases we have seen a decrease in performance, especially in tight loops. We are now looking for a way to let managed code remove these runtime checks when il2cpp.exe generates C++ code. Stay tuned for updates.
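As a rough illustration of what injected checks amount to, the sketch below implements a null check and a bounds check as small inline functions that throw C++ exceptions. Everything in it is a hypothetical stand-in: the exception types, IntArray, and the function names are invented for this example, and the element type check is left out because this array holds plain integers.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-ins for the managed exception types (assumptions for illustration).
struct NullReferenceSketch : std::runtime_error
{
    NullReferenceSketch() : std::runtime_error("NullReferenceException") {}
};
struct IndexOutOfRangeSketch : std::runtime_error
{
    IndexOutOfRangeSketch() : std::runtime_error("IndexOutOfRangeException") {}
};

// Simplified managed array of integers.
struct IntArray { std::vector<int> items; };

inline void null_check(const void* pointer)
{
    if (pointer == nullptr)
        throw NullReferenceSketch();
}

inline void bounds_check(const IntArray* array, std::size_t index)
{
    if (index >= array->items.size())
        throw IndexOutOfRangeSketch();
}

// Every store is preceded by explicit checks, as in the generated code.
void store_element(IntArray* array, std::size_t index, int value)
{
    null_check(array);
    bounds_check(array, index);
    array->items[index] = value;
}
```

Each check is a compare and a branch that is almost never taken, which is cheap in isolation but adds up when the same store runs millions of times in a tight loop.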

Static fields

Now that we have seen what instance fields look like (using Vector3 as an example), let's see how static fields are converted and how they are accessed. First, find the definition of the HelloWorld_Start_m3 method, which is in the Bulk_Assembly-CSharp_0.cpp file in my build, and then jump to the Important_t1 type (in the AssemblyU2DCSharp_HelloWorld_Important.h file):

struct Important_t1  : public Object_t
{
  // System.Int32 HelloWorld/Important::InstanceIdentifier
  int32_t ___InstanceIdentifier_1;
};
struct Important_t1_StaticFields
{
  // System.Int32 HelloWorld/Important::ClassIdentifier
  int32_t ___ClassIdentifier_0;
};


Note that il2cpp.exe has generated a separate C++ struct to hold the static field, which is shared by all instances of this type. At run time, a single instance of Important_t1_StaticFields is created, and all instances of Important_t1 share it for their static data. In the generated code, the static field is accessed as follows:

int32_t L_1 = (((Important_t1_StaticFields*)InitializedTypeInfo(&Important_t1_il2cpp_TypeInfo)->static_fields)->___ClassIdentifier_0);


The type metadata for Important_t1 holds a pointer to the single instance of Important_t1_StaticFields, and that instance is used to obtain the value of the static field.
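The same layout can be reproduced in a few lines of hand-written C++. The struct and variable names below are simplified, hypothetical versions of the generated ones; in particular, TypeInfoSketch contains only the one member that matters here.

```cpp
#include <cstdint>

// Instance data: one copy per object.
struct ImportantSketch
{
    int32_t instance_identifier;
};

// Static data: one copy for the whole type.
struct ImportantStaticFieldsSketch
{
    int32_t class_identifier;
};

// Minimal stand-in for type metadata: it only holds the untyped pointer
// to the block of static fields.
struct TypeInfoSketch
{
    void* static_fields;
};

ImportantStaticFieldsSketch g_important_statics = { 42 };
TypeInfoSketch g_important_type_info = { &g_important_statics };

// Reading a static field goes through the type metadata, then casts the
// untyped pointer back to the concrete static fields struct.
int32_t read_class_identifier()
{
    return static_cast<ImportantStaticFieldsSketch*>(
        g_important_type_info.static_fields)->class_identifier;
}

void write_class_identifier(int32_t value)
{
    static_cast<ImportantStaticFieldsSketch*>(
        g_important_type_info.static_fields)->class_identifier = value;
}
```

Because the metadata stores the pointer as void*, one TypeInfoSketch layout can serve every type, regardless of which static fields that type declares.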

Exceptions

The il2cpp.exe utility converts managed exceptions to C++ exceptions. Again, we chose this approach to avoid depending on specific platforms. When il2cpp.exe needs to generate code that raises a managed exception, it emits a call to the il2cpp_codegen_raise_exception function. The code for throwing and catching managed exceptions in our HelloWorld_Start_m3 method looks like this:

try
{ // begin try (depth: 1)
  InvalidOperationException_t7 * L_17 = (InvalidOperationException_t7 *)il2cpp_codegen_object_new (InitializedTypeInfo(&InvalidOperationException_t7_il2cpp_TypeInfo));
  InvalidOperationException__ctor_m8(L_17, (String_t*) &_stringLiteral5, /*hidden argument*/&InvalidOperationException__ctor_m8_MethodInfo);
  il2cpp_codegen_raise_exception(L_17);
  // IL_0092: leave IL_00a8
  goto IL_00a8;
} // end try (depth: 1)
catch(Il2CppExceptionWrapper& e)
{
  __exception_local = (Exception_t8 *)e.ex;
  if(il2cpp_codegen_class_is_assignable_from (&InvalidOperationException_t7_il2cpp_TypeInfo, e.ex->object.klass))
  goto IL_0097;
  throw e;
}
IL_0097:
{ // begin catch(System.InvalidOperationException)
  V_1 = ((InvalidOperationException_t7 *)__exception_local);
  NullCheck(V_1);
  String_t* L_18 = (String_t*)VirtFuncInvoker0< String_t* >::Invoke(&Exception_get_Message_m9_MethodInfo, V_1);
  Debug_Log_m6(NULL /*static, unused*/, L_18, /*hidden argument*/&Debug_Log_m6_MethodInfo);
// IL_00a3: leave IL_00a8
  goto IL_00a8;
} // end catch (depth: 1)


All managed exceptions are wrapped in the C++ type Il2CppExceptionWrapper. When the generated code catches an exception of this type, it unwraps the C++ representation of the managed exception (which has the type Exception_t8). In this case we are only looking for an InvalidOperationException, so if the exception is not of that type, a copy of the C++ exception is thrown again. If it is of that type, the code jumps to the catch handler and logs the exception message.
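The wrap, catch, test, and rethrow sequence can be sketched independently of IL2CPP. All types and functions below are hypothetical stand-ins: ClassSketch models a managed class with a single parent pointer, and is_assignable_from walks that chain in the spirit of il2cpp_codegen_class_is_assignable_from.

```cpp
#include <string>

// Minimal model of a managed class hierarchy (an assumption for this sketch).
struct ClassSketch { const ClassSketch* parent; };

struct ManagedExceptionSketch
{
    const ClassSketch* klass;
    std::string message;
};

// The only C++ type that is ever thrown; it carries the managed exception.
struct ExceptionWrapperSketch { ManagedExceptionSketch* ex; };

const ClassSketch g_exception_class = { nullptr };
const ClassSketch g_invalid_operation_class = { &g_exception_class };
const ClassSketch g_argument_class = { &g_exception_class };

// True if 'actual' is 'target' or derives from it.
bool is_assignable_from(const ClassSketch* target, const ClassSketch* actual)
{
    for (const ClassSketch* k = actual; k != nullptr; k = k->parent)
    {
        if (k == target)
            return true;
    }
    return false;
}

void raise_exception(ManagedExceptionSketch* ex)
{
    throw ExceptionWrapperSketch{ ex };
}

// Handles only InvalidOperationException-like exceptions and returns the
// message; every other managed exception keeps propagating.
std::string run_and_catch_invalid_operation(ManagedExceptionSketch* to_throw)
{
    try
    {
        raise_exception(to_throw);
    }
    catch (ExceptionWrapperSketch& e)
    {
        if (is_assignable_from(&g_invalid_operation_class, e.ex->klass))
            return e.ex->message;
        throw; // not ours: pass it on to an outer handler
    }
    return std::string();
}
```

The sketch rethrows with a bare throw, which passes on the original exception object, whereas the generated code shown above uses throw e, which throws a copy of the wrapper. Since the wrapper only holds a pointer to the managed exception, both approaches deliver the same managed exception to the outer handler.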

Goto?!

An interesting question arises: what are labels and goto statements doing here? These constructs are not needed in structured code. The reason is that IL has no structured programming constructs such as loops and conditional statements. It is a low-level language, so il2cpp.exe sticks to low-level concepts in the generated code.

As an example, consider the for loop in the HelloWorld_Start_m3 method:

IL_00a8:
{
  V_2 = 0;
  goto IL_00cc;
}
IL_00af:
{
  ObjectU5BU5D_t4* L_19 = ((ObjectU5BU5D_t4*)SZArrayNew(ObjectU5BU5D_t4_il2cpp_TypeInfo_var, 1));
  int32_t L_20 = V_2;
  Object_t * L_21 = Box(InitializedTypeInfo(&Int32_t5_il2cpp_TypeInfo), &L_20);
  NullCheck(L_19);
  IL2CPP_ARRAY_BOUNDS_CHECK(L_19, 0);
  ArrayElementTypeCheck (L_19, L_21);
  *((Object_t **)(Object_t **)SZArrayLdElema(L_19, 0)) = (Object_t *)L_21;
  Debug_LogFormat_m7(NULL /*static, unused*/, (String_t*) &_stringLiteral6, L_19, /*hidden argument*/&Debug_LogFormat_m7_MethodInfo);
  V_2 = ((int32_t)(V_2+1));
}
IL_00cc:
{
  if ((((int32_t)V_2) < ((int32_t)3)))
  {
    goto IL_00af;
  }
}


Variable V_2 is the loop index. It starts at 0 and is incremented at the bottom of the loop on this line:

V_2 = ((int32_t)(V_2+1));


The termination condition is checked here:

if ((((int32_t)V_2) < ((int32_t)3)))


As long as V_2 is less than three, the goto statement jumps to the label IL_00af, which is the top of the loop body. As you might have guessed, il2cpp.exe currently generates C++ code directly from IL, without an intermediate abstract syntax tree representation. You may also have noticed fragments like this in the code in the "Checks at Run Time" section:

float L_1 = (__this->___x_1);
float L_2 = L_1;


Clearly, the variable L_2 is unnecessary here. Although most C++ compilers optimize it away, we would like to avoid emitting it at all. We are now considering using an abstract syntax tree to better understand the IL code and generate better C++ code for cases involving local variables and loops.
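To see that the goto-based form of the loop shown earlier and an ordinary for loop do the same work, the following sketch writes the loop both ways. The function names and the visited vector are made up for this comparison; the label layout mirrors the generated code, with the condition check placed after the loop body.

```cpp
#include <vector>

// The loop in the shape il2cpp.exe emits: initialize, jump to the
// condition, and jump back to the body while the condition holds.
std::vector<int> loop_with_goto()
{
    std::vector<int> visited;
    int v_2 = 0;
    goto check;
body:
    visited.push_back(v_2);
    v_2 = v_2 + 1;
check:
    if (v_2 < 3)
    {
        goto body;
    }
    return visited;
}

// The same loop as it was written in the original script.
std::vector<int> loop_structured()
{
    std::vector<int> visited;
    for (int i = 0; i < 3; ++i)
    {
        visited.push_back(i);
    }
    return visited;
}
```

Both functions visit the indices 0, 1, and 2 in that order, so the two forms are equivalent. The structured form is simply what an abstract syntax tree would allow the generator to recover.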

Conclusion

We have covered only a small part of the C++ code generated by IL2CPP for a very simple project. I now encourage you to look at the generated code for your own project. Keep in mind that the C++ code will look different in future versions of Unity, as we continue to improve the quality and performance of IL2CPP.

By converting IL code to C++, we have been able to strike a good balance between portability and performance. We keep many of the managed code features that developers find useful, while gaining the benefits of the native code that C++ compilers produce for various platforms.

In future posts we will look at the generated code in more detail: method calls, sharing of method implementations, and wrappers for calls to native libraries. But next time we will debug the generated code for a 64-bit iOS build using Xcode.
