Orlovski March 19, 2018 at 15:28

Using Reflection.Emit to Precompile Expressions in MSIL

Hello, Habr! Here is my translation of the article "Using Reflection.Emit to Precompile Expressions to MSIL" by Steve Marsh.

Introduction

The classes in this project let you parse text expressions entered by the user and compile them into a .NET assembly. This assembly can be executed on the fly or saved to a DLL. Precompiling expressions gives a high degree of portability and allows very efficient evaluation of user-entered logic. In addition, we can use Microsoft's ildasm.exe tool to open and inspect the generated MSIL code.

Many interesting features ship with the .NET platform, and in my opinion the Reflection.Emit namespace offers more than you might expect. It lets you create .NET code at run time by dynamically defining .NET types and emitting MSIL instructions into their method bodies. MSIL (Microsoft Intermediate Language) is the intermediate language of the .NET platform: it is what your C# and VB.NET code compiles into, and it is what is handed to the JIT compiler when you run a .NET program. MSIL is a very low-level language that runs very fast, and working with it gives you exceptional control over your programs. I will not go into the details of MSIL in this article, but many resources are available on the Internet, and if you are interested in learning more, I have included some links at the end of this article.

Background

Let's take a quick look at what our parser/compiler will do. The user enters a string expression that matches the grammar of our parser, for example 3 * 2 + 1. This expression is turned into a tiny .NET program that runs and outputs the result. To do this, the parser reads the sequential list of characters and splits it into a hierarchical tree. The nodes are then evaluated in order. When a node is matched, the command corresponding to that type of node is emitted. For example, when a number is matched, we push that number onto the stack. When the token “*” is matched, we emit the multiplication instruction, and so on. Adding all the instructions in the correct order gives us the “program” for the expression.

Now let's see how our program executes and compare it with the original text expression. The first two commands push the numbers 3 and 2 onto the stack. The multiply command pops these two values from the stack, multiplies them and pushes the result, 6, back onto the stack. Instruction 4 pushes the number 1 onto the stack. Instruction 5 pops two values (6 and 1), adds them and pushes the result (7) back onto the stack. Finally, the return command pops the value 7 from the stack and returns it as the result. Brilliant! This may seem simple and obvious to most programmers, but this clever idea is pretty much the basis of programming and compilation, and I think it is worth a look. This is what the program looks like in MSIL. For example, ldc.r8 is the "load constant" command, and here it loads the double 3.0 onto the stack.

IL_0000: ldc.r8 3.
IL_0009: ldc.r8 2.
IL_0012: mul
IL_0013: ldc.r8 1.
IL_001c: add
IL_0023: ret
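To make the stack walk-through concrete, here is a minimal sketch of a stack machine that runs the same push/mul/add sequence. The names StackMachineSketch, Op and Instr are hypothetical and are not part of the article's project; this only imitates how the runtime treats the evaluation stack for these three instructions.

```csharp
using System;
using System.Collections.Generic;

public static class StackMachineSketch
{
    // Hypothetical instruction set: just enough for the example program.
    public enum Op { Push, Mul, Add }

    public sealed class Instr
    {
        public readonly Op Code;
        public readonly double Value;   // only meaningful for Push
        public Instr(Op code, double value = 0.0) { Code = code; Value = value; }
    }

    // Executes the instructions in order and returns the value left on the stack.
    public static double Run(IEnumerable<Instr> program)
    {
        var stack = new Stack<double>();
        foreach (Instr instr in program)
        {
            if (instr.Code == Op.Push)
            {
                stack.Push(instr.Value);
                continue;
            }
            // Binary operators pop the right operand first, then the left one.
            double right = stack.Pop();
            double left = stack.Pop();
            stack.Push(instr.Code == Op.Mul ? left * right : left + right);
        }
        return stack.Pop();
    }

    // The program for 3 * 2 + 1, in the same order as the IL listing.
    public static double Example()
    {
        return Run(new[]
        {
            new Instr(Op.Push, 3.0),
            new Instr(Op.Push, 2.0),
            new Instr(Op.Mul),
            new Instr(Op.Push, 1.0),
            new Instr(Op.Add)
        });
    }
}
```

Calling StackMachineSketch.Example() walks through exactly the steps described above and leaves 7 as the final value.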

Using the code

This project contains two classes for parsing an expression and compiling it into MSIL. The first class is RuleParser, an abstract parsing class that contains all the lexing and parsing logic for our particular grammar. This class parses the expression but takes no action itself. When a ttAdd token is found, the parser calls the matchAdd() method, which is an abstract method declared in the RuleParser class. The implementation of that method, and the corresponding semantic action, is left to the concrete class. This pattern lets us implement a separate concrete class for the semantic actions, which means we can plug in different concrete classes depending on what we are trying to do. This code was previously set up to evaluate expressions on the fly by computing nodes as soon as they were found. Now we can swap in our MsilParser to compile the expression into an IL program using the same parser class. MsilParser does this by implementing all the necessary token methods and emitting the appropriate IL instructions. For example, the matchAdd() method simply emits the Add instruction. When a variable is matched, we load the variable name with the Ldstr instruction and then call the GetVar method.

protected override void matchAdd()
{
    this.il.Emit(OpCodes.Add);
}
protected override void matchVar()
{
    string s = tokenValue.ToString();
    il.Emit(OpCodes.Ldstr, s);
    il.Emit(OpCodes.Call, typeof(MsilParser).GetMethod(
            "GetVar", new Type[] { typeof(string) }));
}
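The separation between the abstract parser and its concrete classes can be sketched as follows. This is a hypothetical, heavily reduced stand-in, not the article's RuleParser: TinySumParser only understands "number + number + ..." and all class names here are assumptions made for illustration. It shows one parser driving two different sets of semantic actions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the abstract parser: all parsing lives here,
// all semantic actions are left to subclasses.
public abstract class TinySumParser
{
    protected double tokenValue;

    protected abstract void matchNumber();
    protected abstract void matchAdd();

    public void Run(string expr)
    {
        string[] parts = expr.Split('+');
        for (int i = 0; i < parts.Length; i++)
        {
            tokenValue = double.Parse(parts[i].Trim(), CultureInfo.InvariantCulture);
            matchNumber();
            // Operands first, operator afterwards: the order a stack machine needs.
            if (i > 0) matchAdd();
        }
    }
}

// Concrete class 1: evaluates on the fly as tokens are matched.
public sealed class EvaluatingParser : TinySumParser
{
    private readonly Stack<double> stack = new Stack<double>();
    protected override void matchNumber() { stack.Push(tokenValue); }
    protected override void matchAdd() { stack.Push(stack.Pop() + stack.Pop()); }
    public double Result { get { return stack.Peek(); } }
}

// Concrete class 2: records an instruction listing instead of computing.
public sealed class ListingParser : TinySumParser
{
    public readonly List<string> Lines = new List<string>();
    protected override void matchNumber()
    {
        Lines.Add("ldc.r8 " + tokenValue.ToString(CultureInfo.InvariantCulture));
    }
    protected override void matchAdd() { Lines.Add("add"); }
}
```

Both subclasses reuse the same Run method; only the overridden match methods differ, which is the same division of labour the article describes between RuleParser and MsilParser.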

Once all the token methods are implemented, we can call the CompileMsil() method of our MsilParser class, which runs the parser and returns the compiled .NET type using the AssemblyBuilder classes in the Reflection.Emit namespace.

/// <summary>
/// Builds and returns a dynamic assembly
/// </summary>
public Type CompileMsil(string expr)
{
    // Build the dynamic assembly
    string assemblyName = "Expression";
    string modName = "expression.dll";
    string typeName = "Expression";
    string methodName = "RunExpression";
    AssemblyName name = new AssemblyName(assemblyName);
    AppDomain domain = System.Threading.Thread.GetDomain();
    AssemblyBuilder builder = domain.DefineDynamicAssembly(
      name, AssemblyBuilderAccess.RunAndSave);
    ModuleBuilder module = builder.DefineDynamicModule
      (modName, true);
    TypeBuilder typeBuilder = module.DefineType(typeName,
      TypeAttributes.Public | TypeAttributes.Class);
    MethodBuilder methodBuilder = typeBuilder.DefineMethod(methodName,
      MethodAttributes.HideBySig | MethodAttributes.Static
      | MethodAttributes.Public,
      typeof(Object), new Type[] {  });
    // Create the ILGenerator to insert code into our method body
    ILGenerator ilGenerator = methodBuilder.GetILGenerator();
    this.il = ilGenerator;
    // Parse the expression. This will insert MSIL instructions
    this.Run(expr);
    // Finish the method by boxing the result as Double
    this.il.Emit(OpCodes.Conv_R8);
    this.il.Emit(OpCodes.Box, typeof(Double));
    this.il.Emit(OpCodes.Ret);
    // Create and save the Assembly and return the type
    Type myClass = typeBuilder.CreateType();
    builder.Save(modName);
    return myClass;
}

The end result is a .NET assembly that can be executed, cached, or saved to disk. Here's a look at the IL code for our method, which was created by our compiler:

.method public hidebysig static object
        RunExpression() cil managed
 {
   // Code size       36 (0x24)
   .maxstack  2
   IL_0000:  ldc.r8     3.
   IL_0009:  ldc.r8     2.
   IL_0012:  mul
   IL_0013:  ldc.r8     1.
   IL_001c:  add
   IL_001d:  conv.r8
   IL_001e:  box        [mscorlib]System.Double
   IL_0023:  ret
 } // end of method Expression::RunExpression
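When the generated code only needs to run in memory and no DLL has to be saved, the same instruction sequence can also be emitted into a System.Reflection.Emit.DynamicMethod. The sketch below is an alternative to the AssemblyBuilder route, not what CompileMsil does; the class name DynamicMethodSketch is hypothetical, and the instructions are hard-coded for 3 * 2 + 1 instead of coming from a parser.

```csharp
using System;
using System.Reflection.Emit;

public static class DynamicMethodSketch
{
    // Emits the IL for 3 * 2 + 1 and returns it as a callable delegate.
    public static Func<object> Build()
    {
        // No parameters, returns object (the boxed double).
        DynamicMethod dm = new DynamicMethod(
            "RunExpression", typeof(object), Type.EmptyTypes);
        ILGenerator il = dm.GetILGenerator();

        il.Emit(OpCodes.Ldc_R8, 3.0);         // push 3.0
        il.Emit(OpCodes.Ldc_R8, 2.0);         // push 2.0
        il.Emit(OpCodes.Mul);                 // pop two values, push 6.0
        il.Emit(OpCodes.Ldc_R8, 1.0);         // push 1.0
        il.Emit(OpCodes.Add);                 // pop two values, push 7.0
        il.Emit(OpCodes.Box, typeof(double)); // box the result as object
        il.Emit(OpCodes.Ret);

        return (Func<object>)dm.CreateDelegate(typeof(Func<object>));
    }
}
```

The trade-off is that a DynamicMethod cannot be written to disk, so it does not produce a file that ildasm.exe can open.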

The main advantage of this approach is that parsing an expression takes much longer than simply executing its instructions. By compiling the expression to IL, we only need to parse it once, not every time it is evaluated. Although this example uses only one expression, a real implementation may include thousands of expressions that are precompiled and executed on demand. In addition, our code is packaged in a neat .NET DLL that we can do whatever we want with. This example can be evaluated more than 1 million times in less than 3 hundredths of a second!
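One way to get the "parse once, evaluate many times" behaviour for many expressions is to cache the compiled delegates by their source text. The following is a minimal sketch under that assumption; ExpressionCache and its compile callback are hypothetical names, and the callback stands in for whatever turns an expression string into a callable method (for example, a wrapper around CompileMsil).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: compiles each distinct expression once, then reuses it.
public sealed class ExpressionCache
{
    private readonly Dictionary<string, Func<object>> cache =
        new Dictionary<string, Func<object>>();
    private readonly Func<string, Func<object>> compile;

    // Number of times the (expensive) compile step actually ran.
    public int CompileCount { get; private set; }

    public ExpressionCache(Func<string, Func<object>> compile)
    {
        this.compile = compile;
    }

    public object Evaluate(string expr)
    {
        Func<object> compiled;
        if (!cache.TryGetValue(expr, out compiled))
        {
            compiled = compile(expr);   // parse and emit IL only on a cache miss
            CompileCount++;
            cache[expr] = compiled;
        }
        return compiled();              // cheap path: just run the instructions
    }
}
```

This sketch is not thread-safe; a shared cache would need a lock or a concurrent dictionary.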

Using the sample project

The sample project lets you enter an expression in the upper left text box. When you click Analysis, the form parses the expression and creates a .NET assembly with your compiled code in the RunExpression() function. The program then calls this function a set number of times and shows how long the calls took. Finally, the program saves the assembly as expression.dll and runs Microsoft's ildasm.exe to display the complete MSIL code for the assembly, so that you can see the code that was generated for your expression.

Points of Interest

How the dynamic method is called has a significant effect on performance. For example, simply calling Invoke() on the dynamic method is noticeably slow when it is done 1 million times. Calling it through a typed generic delegate, as in the code below, gives us roughly a 20-fold increase in performance.

// Parse the expression and build our dynamic method
MsilParser em = new MsilParser();
Type t = em.CompileMsil(textBox1.Text);
// Get a typed delegate reference to our method. This is very
// important for efficient calls!
MethodInfo m = t.GetMethod("RunExpression");
Delegate d = Delegate.CreateDelegate(typeof(MsilParser.ExpressionInvoker<Object>), m);
MsilParser.ExpressionInvoker<Object> method =
    (MsilParser.ExpressionInvoker<Object>)d;
// Call the function
Object result = method();
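The difference between the two calling styles can be measured with a small harness like the one below. This is a sketch, not the sample project's timing code: InvokeTimingSketch is a hypothetical class, its RunExpression is an ordinary static method standing in for the emitted one, and Func<object> is used in place of the project's ExpressionInvoker delegate. The measured ratio will vary with runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class InvokeTimingSketch
{
    // Stand-in for the generated method; the real one is emitted at run time.
    public static object RunExpression() { return 3.0 * 2.0 + 1.0; }

    // Times the same method called through reflection and through a typed delegate.
    public static void Compare(int iterations,
        out TimeSpan viaInvoke, out TimeSpan viaDelegate)
    {
        MethodInfo m = typeof(InvokeTimingSketch).GetMethod("RunExpression");
        Func<object> typed =
            (Func<object>)Delegate.CreateDelegate(typeof(Func<object>), m);

        // Late-bound call: argument checks and reflection overhead on every call.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) m.Invoke(null, null);
        sw.Stop();
        viaInvoke = sw.Elapsed;

        // Typed delegate: bound once, then called directly.
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) typed();
        sw.Stop();
        viaDelegate = sw.Elapsed;
    }
}
```

Running Compare with a large iteration count, such as 1,000,000, and printing both time spans gives a rough picture of how much the late-bound call costs on a given machine.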

Calling ILDASM.EXE

The sample project also lets you view all of the MSIL code for your newly created assembly. It does this by calling ildasm.exe in the background and displaying the output in a text box. Ildasm.exe is a very useful tool for anyone working with IL code or the System.Reflection.Emit namespace. The code below shows how to run this executable from your program using the System.Diagnostics namespace. Check out the Microsoft documentation for ildasm.exe from the links below.

// Save the Assembly and generate the MSIL code with ILDASM.EXE
string modName = "expression.dll";
Process p = new Process();
p.StartInfo.FileName = "ildasm.exe";
p.StartInfo.Arguments = "/text /nobar \"" + modName + "\"";
p.StartInfo.UseShellExecute = false;
p.StartInfo.CreateNoWindow = true;
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
p.Start();
string s = p.StandardOutput.ReadToEnd();
p.WaitForExit();
p.Close();
txtMsil.Text = s;

References:
