ViacheslavMezentsev June 29, 2014 at 01:27

Code injection with benefit

The article describes a way to build a bridge between unmanaged and managed code, using the Mathcad mathematical package as an example. The picture shows Chipmunk Tot about to process his image with the package's tools. To do this, he “used” a custom function written in VB.Net that connects to a webcam and takes a picture. The result of the function is immediately available in the working document.

Source code

For the impatient who want to understand everything at once by skimming the code, here is the repository: NetEFI. There you can also find test user libraries in three languages: C#, VB.Net and C++/CLI (VS2012, .Net 2.0, x86-32). So far, only a 32-bit implementation is available.

Background

Mathcad can load third-party libraries. The interface for this is called User EFI and was developed more than 10 years ago. It has not changed at all since then, although Mathcad itself has changed beyond recognition. At one point the interface was dropped from the package, but long-time users asked for it back, and in the new versions of Mathcad Prime this old interface is once again alive and well.

There is a fairly clear guide to creating custom libraries; it is listed in the references at the end of the article. In short, the process looks like this. We create a regular DLL and, at its entry point (that is, when it is loaded), register our functions. In each function descriptor we specify the address that Mathcad will later call directly. In addition, one table of error messages can be registered. The result returned by a user function in case of an error is used to select a message from this table. That is all there is to it.

The function descriptor looks like this:

FUNCTIONINFO structure

typedef LRESULT (* LPCFUNCTION ) ( void * const, const void * const, ... );    
// The FUNCTIONINFO structure contains the information that Mathcad uses to register a
// user function. Refer below for each member and its description.
typedef struct tagFUNCTIONINFO {
    // Points to a NULL-terminated string that specifies the name of the user
    // function.
    char *  lpstrName;
    // Points to a NULL-terminated string that specifies the parameters of the
    // user function.
    char *  lpstrParameters; 
    // Points to a NULL-terminated string that specifies the function description.
    char *  lpstrDescription;
    // Pointer to the code that executes the user function.
    LPCFUNCTION lpfnMyCFunction;
    // Specifies the type of value returned by the function. The values are
    // COMPLEX_ARRAY or COMPLEX_SCALAR.
    long unsigned int returnType;
    // Specifies the number of arguments expected by the function. Must be
    // between 1 and MAX_ARGS.
    unsigned int nArgs;
    // Specifies an array of long unsigned integers containing input parameter
    // types.
    long unsigned int argType[ MAX_ARGS ];
} FUNCTIONINFO;
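
As a rough illustration of how such a descriptor might be filled in, here is a minimal self-contained sketch for a one-argument string function. The structure is re-declared locally so that the snippet compiles without the Mathcad headers. The names `LResult`, `FunctionInfo`, `Echo` and `MakeEchoInfo`, as well as the numeric values of `kMaxArgs` and `kStringType`, are placeholders invented for this sketch and are not taken from the SDK.

```cpp
#include <cassert>

// Local stand-ins so the sketch builds without the Mathcad SDK headers.
// The numeric values are placeholders, not the SDK's real constants.
typedef long LResult;
const unsigned int kMaxArgs = 10;
const unsigned long kStringType = 8;

typedef LResult (*LpcFunction)(void* const, const void* const, ...);

// Same fields as the FUNCTIONINFO structure above (const added for literals).
struct FunctionInfo {
    const char*   lpstrName;
    const char*   lpstrParameters;
    const char*   lpstrDescription;
    LpcFunction   lpfnMyCFunction;
    unsigned long returnType;
    unsigned int  nArgs;
    unsigned long argType[kMaxArgs];
};

// Hypothetical user function; a real one would copy the input to the output.
LResult Echo(void* const out, const void* const in, ...) {
    (void)out;
    (void)in;
    return 0;  // treated as "no error" in this sketch
}

// Fills a descriptor for a one-argument string function.
FunctionInfo MakeEchoInfo() {
    FunctionInfo fi = {};
    fi.lpstrName = "echo";
    fi.lpstrParameters = "s";
    fi.lpstrDescription = "return string";
    fi.lpfnMyCFunction = &Echo;
    fi.returnType = kStringType;
    fi.nArgs = 1;
    fi.argType[0] = kStringType;
    assert(fi.nArgs >= 1 && fi.nArgs <= kMaxArgs);
    return fi;
}
```

The point to notice is that `lpfnMyCFunction` is the only field that ties the descriptor to executable code, which is why the rest of the article concentrates on what address to put there.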

The problem is that today it would be much more convenient to write our own functions in .NET languages. The direct way to do this is C++/CLI. Wrapping each user function in a C++/CLI adapter, or marshaling structures by hand, can be dismissed at once, I think, as impractical and as requiring non-trivial knowledge from the user of a mathematical program. Instead, I want to offer a universal “wrapper” called .Net User EFI.

The question is how to create a universal function that can be registered in place of every function of every connected assembly, yet at its entry point has all the information needed to call a specific function from a particular assembly. The intermediary library that hosts this function should automatically work with any number of connected assemblies and functions in them.

One significant problem stands in the way of such universality. Mathcad requires the address of the function to call, and the prototype is declared with a variable number of parameters. As a result, the stack holds a different number of parameters at the entry point of the universal function depending on which function was called, and there is no standard way to pass this information along with the call, because it is fixed by the compiled code itself. In the structure above, the address is the only field by which we could distinguish a call of one function from a call of another.

This leads us to a well-known technique called code injection. It has been written about on Habr more than once, but practical, useful examples of it are rare. In a sense we will also be intercepting calls to functions from a DLL; it will look a little more specific, but much simpler.

Idea

So what are we going to inject, where, and why? Let us restate the situation. We want to write a universal function that handles all calls uniformly and dispatches them according to the function being called. Mathcad should not “suspect” anything, yet additional information about the call must somehow reach the entry point of the universal function.

The solution is to generate code dynamically at the address that we register in Mathcad. We reserve a sufficiently large block of memory for this dynamic code. The code does the auxiliary work of passing parameters to the universal function. Two parameters are enough: the index of the assembly in the array of loaded assemblies and the index of the function within that assembly. There are two ways to pass them: global variables and the stack. I chose global variables, because it is easy to upset the balance of the stack (which holds the call parameters), and restoring it in our case would, I think, be difficult.

I should also mention that a user function has only three types of parameters, all passed by pointer: MCSTRING, COMPLEXSCALAR and COMPLEXARRAY. Their number is also limited to 10. This simplifies parsing the parameters in the universal function.
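
The idea of "one distinct address per function, one shared handler" can be modelled in portable C++ before any machine code is involved. The sketch below is only a compile-time analogue and not the project's implementation: templates stand in for the generated code, so the indexes must be known at compile time, whereas the article generates the stubs at run time because assemblies are discovered only when the library loads. All names (`Stub`, `UniversalFunction`, `kStubs`, `g_assemblyId`, `g_functionId`) are hypothetical.

```cpp
#include <cassert>

// Globals that each stub fills in before handing control to the universal
// function (the same role as assemblyId/functionId in the article).
static int g_assemblyId = -1;
static int g_functionId = -1;

// Hypothetical universal function. A real one would look up the target by
// the two indexes; here it just encodes them in the result.
long UniversalFunction(void** args) {
    assert(g_assemblyId >= 0 && g_functionId >= 0);
    (void)args;
    return g_assemblyId * 100 + g_functionId;
}

typedef long (*StubPtr)(void**);

// Every instantiation is a separate function with its own address, so the
// address alone tells the universal function who was called.
template <int K, int N>
long Stub(void** args) {
    g_assemblyId = K;
    g_functionId = N;
    return UniversalFunction(args);
}

// The addresses that would be handed to the host application.
static const StubPtr kStubs[] = { &Stub<0, 0>, &Stub<0, 1>, &Stub<1, 0> };
```

Calling `kStubs[2](nullptr)` sets the globals to assembly 1, function 0 and then runs the shared handler. The dynamic code described in this article does the same job with a few bytes of x86 instructions written at run time.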

Implementation

Now we are ready to walk through the sequence of events that takes place during registration and afterwards.

Step 1. The user creates a .NET class that implements the IFunction interface, which carries the necessary information about the function, compiles it into an assembly and copies the assembly to the userefi folder. The intermediary assembly, which we will call netefi, must also be in this folder.

Step 2. When Mathcad starts, it treats the netefi intermediary assembly as a user library. netefi searches the current folder for all .NET assemblies and enumerates the classes in them that implement the IFunction interface.

Step 3. netefi stores information about the assemblies and their functions in internal arrays, so a function is identified by two numbers: the index of the assembly and the index of the function within it.

Step 4. netefi iterates over all the functions and registers them in Mathcad in the standard way, but in the address field of the FUNCTIONINFO structure it writes a pointer to dynamic code whose content is determined by the two indexes from the previous step.

This is what a concrete implementation looks like:

Dynamic code

static int assemblyId = -1;
static int functionId = -1;
static PBYTE pCode = NULL;
#pragma unmanaged
LRESULT CallbackFunction( void * out, ... ) {
    return ::UserFunction( & out );
}
#pragma managed
// TODO: 64-bit.
void Manager::InjectCode( PBYTE & p, int k, int n ) {
    // Store the constant (assembly index) in the global variable.
    * p++ = 0xB8; // mov eax, imm32
    ( int & ) p[0] = k; // write all four bytes of the immediate, not just one
    p += sizeof( int );
    * p++ = 0xA3; // mov [assemblyId], eax
    ( int * & ) p[0] = & assemblyId; 
    p += sizeof( int * ); 
    // Store the constant (function index) in the global variable.
    * p++ = 0xB8; // mov eax, imm32
    ( int & ) p[0] = n;
    p += sizeof( int );
    * p++ = 0xA3; // mov [functionId], eax
    ( int * & ) p[0] = & functionId; 
    p += sizeof( int * );         
    // jmp rel32 to CallbackFunction; the offset is counted from the end of the instruction.
    * p++ = 0xE9;
    ( UINT & ) p[0] = ( PBYTE ) ::CallbackFunction - 4 - p;
    p += sizeof( UINT );
}
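
To make the byte layout easier to check, the same 25-byte sequence can be assembled into an ordinary buffer and inspected without executing it. This is a sketch under stated assumptions rather than the project's code: `BuildStub` and `Emit32` are hypothetical helpers, and the addresses are passed in as plain 32-bit numbers instead of real pointers. Actually running such a stub would additionally require memory that is marked executable, which is outside the scope of this snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in little-endian byte order, as x86 expects.
static void Emit32(std::vector<std::uint8_t>& code, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        code.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Builds the stub for assembly index k and function index n.
// stubAddr is the (assumed) address at which the stub will be placed.
std::vector<std::uint8_t> BuildStub(std::uint32_t stubAddr,
                                    std::uint32_t assemblyIdAddr,
                                    std::uint32_t functionIdAddr,
                                    std::uint32_t callbackAddr,
                                    std::int32_t k, std::int32_t n) {
    std::vector<std::uint8_t> code;
    code.push_back(0xB8);                        // mov eax, imm32
    Emit32(code, static_cast<std::uint32_t>(k));
    code.push_back(0xA3);                        // mov [assemblyId], eax
    Emit32(code, assemblyIdAddr);
    code.push_back(0xB8);                        // mov eax, imm32
    Emit32(code, static_cast<std::uint32_t>(n));
    code.push_back(0xA3);                        // mov [functionId], eax
    Emit32(code, functionIdAddr);
    code.push_back(0xE9);                        // jmp rel32
    // rel32 is measured from the address of the byte after the jmp operand.
    std::uint32_t next = stubAddr + static_cast<std::uint32_t>(code.size()) + 4;
    Emit32(code, callbackAddr - next);           // wraps correctly for backward jumps
    assert(code.size() == 25);
    return code;
}
```

For example, a stub placed at 0x1000 that jumps to 0x3000 ends at 0x1019, so the encoded displacement is 0x3000 - 0x1019 = 0x1FE7. This is the same arithmetic as the `CallbackFunction - 4 - p` expression in the listing above.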

The InjectCode() method is called in a loop while functions are being registered in Mathcad. The global variables assemblyId and functionId identify the function at the moment it is invoked. It works like this. For each function Mathcad receives a pointer to its own piece of dynamic code. When that code runs, it writes the assembly index known at load time (parameter k) to assemblyId and the function index (parameter n) to functionId. It then jumps unconditionally to CallbackFunction(), which calls our universal function. The extra hop exists so that managed code can be called in UserFunction(): CallbackFunction() itself is compiled as unmanaged code because of the unmanaged/managed directives and cannot contain managed calls.

Note that the parameter of the universal function is a pointer into the stack of CallbackFunction(), i.e. to the array of parameters (the pointer for the return value is in the same place). The dynamic code leaves the stack untouched, so when CallbackFunction() finishes, control returns to Mathcad. That is all the magic.

Step 5. Once registration is complete, the user-defined function can be called in a Mathcad document. The universal function UserFunction() can now identify the user function from the global variables assemblyId and functionId and parse the stack, since it knows the number and types of the parameters.

Step 6. Each unmanaged parameter type is replaced with a managed counterpart: MCSTRING with String, COMPLEXSCALAR with TComplex (I did not use Complex from .Net 4.0, to avoid a conflict) and COMPLEXARRAY with TComplex[,].

Step 7. The function's implementation of IFunction.NumericEvaluation is called. The returned result goes through the reverse sequence of conversions and is handed back to Mathcad.
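
Steps 5 to 7 can be sketched in plain C++ as follows. This is a simplified model, not the code from netefi.cpp: `McString` and `ComplexScalar` are stand-ins whose field layout is assumed for this sketch, `Value` plays the role of the managed object, and the parameter array is passed in explicitly instead of being read from the stack. Slot 0 holds the pointer for the result and the inputs follow it, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for the Mathcad types named in the article (layout assumed).
struct McString      { const char* str; };
struct ComplexScalar { double real; double imag; };

enum ArgKind { kString, kScalar };

// A decoded argument, the counterpart of the managed String/TComplex value.
struct Value {
    ArgKind     kind;
    std::string text;
    double      re;
    double      im;
};

// slots[0] is the result pointer, slots[1..] are the input pointers.
// kinds lists the expected input types, known from registration.
std::vector<Value> DecodeArgs(void* const* slots,
                              const std::vector<ArgKind>& kinds) {
    assert(kinds.size() <= 10);  // the interface allows at most 10 arguments
    std::vector<Value> out;
    for (std::size_t i = 0; i < kinds.size(); ++i) {
        const void* p = slots[i + 1];
        Value v = { kinds[i], "", 0.0, 0.0 };
        if (kinds[i] == kString) {
            v.text = static_cast<const McString*>(p)->str;
        } else {
            const ComplexScalar* s = static_cast<const ComplexScalar*>(p);
            v.re = s->real;
            v.im = s->imag;
        }
        out.push_back(v);
    }
    return out;
}
```

The decoded values would then be passed to the user's evaluation method, and its result converted back and written through `slots[0]`. Arrays are left out of the sketch to keep it short.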

About implementation

I hope I have explained this particular implementation clearly enough. As for the source code of the project, a few words about the environment and some details. The project is built with Visual Studio 2012 in C++/CLI and targets .Net Framework 2.0 (set in the project properties). Since the dynamic code depends on the bitness of the platform and I do not yet know exactly how to port it to 64 bits, all projects are configured to build for 32-bit. I have been told, though, that not many changes will be needed.

Using global variables is not good practice, but Mathcad does not call several functions at once. Everything there is evaluated in order, one call after another.

The intermediary assembly implements a few more ideas that let the old interface be used fully in the new environment. This concerns error handling, which deserves a separate write-up. All the main code is concentrated in a single Manager class (netefi.cpp). The test cases show how to work with the IFunction interface. The test cases in the different languages all do the same thing and have almost the same names.

The examples were tested in Mathcad 15 and Mathcad Prime 3.0. Since the User EFI interface has not changed for more than 10 years (and is unlikely to change now), the described method should work in other versions of Mathcad, probably starting with version 11. In Mathcad Prime 3.0, user functions were given a new name, Custom Functions, although the substance is the same.

Test cases

As mentioned above, they can be found in the repository. But the article would not be complete without showing what .NET custom functions for Mathcad actually look like.

Let's see what an echo function with one string parameter looks like.

C# option

using System;
using NetEFI;
public class csecho: IFunction {
    public FunctionInfo Info {
        get { 
            return new FunctionInfo(  "csecho", "s", "return string",
                typeof( String ), new[] { typeof( String ) } );
        }
    }
    public FunctionInfo GetFunctionInfo( string lang ) { return Info; }
    public bool NumericEvaluation( object[] args, out object result ) {
        result = args[0];
        return true;
    }
}

VB.Net option

Imports NetEFI
Public Class vbecho
    Implements IFunction
    Public ReadOnly Property Info() As FunctionInfo _
        Implements IFunction.Info
        Get
            Return New FunctionInfo("vbecho", "s", "return string", _
                GetType([String]), New Type() {GetType([String])})
        End Get
    End Property
    Public Function GetFunctionInfo(lang As String) As FunctionInfo _
        Implements IFunction.GetFunctionInfo
        Return Info
    End Function
    Public Function NumericEvaluation(args As Object(), ByRef result As Object) As Boolean _
        Implements IFunction.NumericEvaluation
        result = args(0)
        Return True
    End Function
End Class

C++/CLI option

#pragma once
using namespace System;
using namespace System::Text;
using namespace System::Runtime::InteropServices; // needed for [Out]
using namespace NetEFI;
public ref class cppecho: public IFunction {
public:
    virtual property FunctionInfo^ Info {
        FunctionInfo^ get() { 
            return gcnew FunctionInfo( "cppecho", "s", "return string",
                String::typeid, gcnew array< Type^ > { String::typeid } );
        }
    }
    virtual FunctionInfo^ GetFunctionInfo(String^ lang) { return Info; }
    virtual bool NumericEvaluation( array< Object^ > ^ args, [Out] Object ^ % result ) {
        result = args[0];
        return true;
    }
};

Other

Although the main functionality is almost ready, some rough edges remain. For example, the universal function should preferably do its work in a separate thread; this is one of the first things to do. Interruption via isUserInterrupted is not exposed in the new interface. For now the only hope is that Mathcad itself can interrupt the function. I will think about it; it is related to the threading work.

The project currently works only on 32-bit systems. Adding 64-bit configurations requires testing the dynamic code on 64-bit systems, which I have not yet had the opportunity to do.

Working with COM inside a user-defined function also appears to be impossible for now. I ran into this when implementing the function that takes a picture from a webcam. One of the standard approaches went through the Clipboard interface, and it failed with a message that the thread must be marked with STAThreadAttribute. I worked around the problem with Graphics.CopyFromScreen. This also needs to be looked into.

Loading of missing assemblies is not yet reliable either, since Assembly::LoadFile() is used. With Assembly::LoadFrom(), Mathcad hangs at that point. There is also a problem with debugging mixed code: for some reason it did not work for me as it should. I debugged the code practically in my head; only the logs saved me.

Perhaps someone has done something similar and can suggest good ideas for simplifying my code. I am open to any practical suggestions. It would be great if someone got my project working under the Visual Studio debugger in mixed mode. So far only breakpoints in unmanaged code work. In the test cases, of course, you can step through the code.

References

0. How to generate and run native code dynamically?
1. Sources and test cases on GitHub.
2. Creating a User DLL (pdf).
3. .Net User EFI interface (thread on the main PTC forum).
4. Sources and builds of the webcam example (in the same branch below).
5. Mathcad EFI plugin (my other project, which performs the inverse function, calls unmanaged code from managed code).
