zebraxxl December 10, 2014 at 05:06

.NET / Mono in Java? Easy!

Hello. I want to introduce my project - the .NET / Mono compiler in Java. The aim of the project is to create a compiler, and a set of standard libraries that allow you to transfer written applications and libraries to the Java platform, version 1.6 and higher. Of the similar projects, I only know the dot42 project. But it is imprisoned for Android and has its own standard library that is not entirely compatible with .NET / Mono.

So far, there is only an alpha version, and therefore the compiler is not suitable for real use, but it is partially operational, it generates valid Java code and supports part of the ECMA-335 standard.

Source codes on github.com: https://github.com/zebraxxl/CIL2Java

The compiler is a console application that, when launched without parameters, displays help. So with the use of problems should not arise.

I also want to note that when working, the compiler proceeds from the assumption that all the code submitted to it at the input is valid.

What is not supported

Immediately I want to stipulate that at the moment it is not supported:

Unmanaged pointers
Mathematics over pointers (both managed and unmanaged)
P / invoke
Non-vector arrays (arrays with a lower border not equal to 0)
The switch statement on type long
Opcode calli
Exception filters

The last three points are planned to be implemented in the near future, at the expense of the rest - these are long-term plans, if they can be implemented at all.

How it works

Compilation is divided into three large stages. In the first step, the compiler loads and converts all types used into internal representation. The second is preparing for the third stage. The third stage converts meta-information and compiles the code.

The conversion of types to internal representation occurs on the fly during their loading. That is, a type is taken, converted to an internal representation, then added to the compilation list. Then all fields and methods are taken from the original type and are also converted into an internal representation. Thus, all types, their fields and methods from explicitly specified as input assemblies are added to the compilation list.

But also, during the conversion of any type, field or method (hereinafter referred to as a member), they are also loaded, converted into an internal representation and added to the compilation list, all members on which the original member depends. In this case, only really used members are added. Thus, after the first stage in the compilation list there will be only those types that are in the source assemblies, plus those types and their members that are necessary. Due to this, at the output we get compiled source assemblies and the necessary pieces of the rest. For example, take the code:

using System;
namespace TestConsole
{
    public class Program
    {
        public static void  Main(string[] args)
        {
            Console.WriteLine("Hello world");
        }
        // Точка входа для Java
        public static void main(string[] args)
        {
            Main(args);
        }
    }
}

After the first step, the compilation list will look like this:

0: type Foo.Program
    methods:
        Main
        main
1: type System.Console
    methods:
        Writeline
2: type System.String

It is also necessary to note here about the mechanism of substitution of assemblies. If a reference is found to a type that is in an external assembly (not specified as an input), this assembly will be automatically loaded. However, what if the loaded assembly has an implementation that is not compatible with Java? For example, standard mscorlib? For this, we need a mechanism for replacing assemblies. By default, at the moment, mscorlib is replaced with a special implementation that uses Java mechanisms to work. You can also specify other assemblies, for substitution using the –r compilation key. In short, it works as follows: when Mono.Cecil starts looking for where to load the assembly, it addresses this issue to the AssemblyResolver, which was passed to it as a parameter when reading the original assembly. The compiler's AssemblyResolver first looks for an assembly with the same name in previously loaded ones; if it doesn’t find it, then it looks for substitution in the lists. If there is, it loads and returns the assembly indicated in the list for spoofing. If, however, it is not in the lists for substitution, then the standard assembly is loaded by standard means.

Before the second compilation stage, the precompilation stage takes place, in which the compiler performs additional type processing to prepare them for direct compilation. For example, it is at this stage that methods are added that are not explicitly called anywhere, but are overloaded virtual methods of explicitly used methods.

And actually the third stage is the most basic. The transformation of meta-information is a fairly simple and straightforward process. The only thing I would like to note is that all types are declared as a result with public access due to incompatibility of the Java and CIL visibility levels. Global types in Java can have either public access or access only from the package (namespace). And nested types in Java, having for example a closed level of visibility, cannot be used at all outside the type in which they are declared. If it addresses any member of this type from an external class, an exception will be thrown. So all types automatically become open.

But compiling the code is a more complex and lengthy process, which is also divided into several stages. The first step is to build a code graph. This is done by the ICSharpCode.Decompiler library from ILSpy. In general, a graph is almost ready for compilation, but nevertheless, several additional transformations are performed. For example, the pseudo CompoundAssignment instructions generated by ICSharpCode.Decompiler are inversely converted. Well, after that, the Java bytecode is actually compiled.

This is how the compiler works. Now I’ll talk in more detail about some aspects of the work and how support for certain things is implemented.

Generics

From the point of view of JVM - generics do not exist. Generics in Java is just a compiler extension, and generics from the point of view of the JVM is a regular object like java.lang.Object. Generics in the CLI are compiled at runtime. This means that when the compiler encounters a generic, it substitutes the real type in its place and, in fact, creates a new type or method based on the original one. CIL2Java acts in the same way, skipping methods and types that have jerick parameters and creates them only when it encounters a link indicating which types to replace these parameters with.

Significant Types

This is probably one of the main reasons why .NET / Mono is better than Java in disputes, which is better. Yes, Java does not support these types. Therefore, all significant types are compiled as pointer types. But that would not be a problem due to the difference in the behavior of significant and pointer types, the behavior of significant types is emulated. First, a constructor without parameters is generated, and three internal methods are added:

c2j __ $ __ ZeroFill () - fills with zeros the contents of type
c2j __ $ __ CopyTo (ValueType) - copies the contents of the source type to the specified
c2j __ $ __ GetCopy () - creates a new instance of the type, and copies the data from the original into it

Using these three methods, the behavior of significant types is fully emulated. For example, the code is "Foo (valType);" will be converted to "Foo (valType.c2j __ $ __ GetCopy ());" and a valType copy will be passed to the Foo method.

Also, for the correct operation, all significant types are automatically initialized by the default constructor in the constructors and at the very beginning of the methods (prolog).

Thus, the main advantage of these types is that when used correctly, they increase the speed of the application, it’s not just lost, but, on the contrary, their use will slow down the application.

Transfers

In .NET / Mono, enumerations are essentially meaningful types, but with additional restrictions. They cannot have any methods, just one non-static field of a primitive type (int, short, etc.) named “value__” and static fields of the type of the enumeration itself.

When compiling, instead of the enumeration type, its base type is substituted. That is, the method "void Foo (EnumType val);" after compilation it will become “void Foo (int val);”.

Packaging

Packaging of significant types is divided into three categories: packaging of primitive types, packaging of significant types and packaging of listings.

Primitive type packaging is implemented in two ways: packaging into CIL types or packaging into Java types. In the first case, standard types for CIL from the System namespace (System.Int32, System.Single, etc.) are used as types for packaging. In the second - standard types for Java (java.lang.Integer, java.lang.Float, etc.)

In case of packing into CIL types, we save information about unsigned types and a code of the form “uintType.ToString ()” will have the correct result. However, when passing such parameters to Java, to methods where you want to pass a packed primitive type (for example java.lang.reflect.Method.invoke), the compiler will need to generate repackaging code (although this function is not in the compiler for now), and thus performance drop.

In the case of packaging in Java types, the opposite is true. The code “uintType.ToString ()” will give an incorrect result if the uintType value is more than 2 147 483 647, but there will be no unnecessary repackaging from CIL to Java and vice versa. Which method to apply is up to you. The compilation parameter box is responsible for this. By default, packaging is done in CIL types.

With packaging of significant types, everything is simpler. We take a copy of the type and just pass it on. It is already in fact, after compilation, becomes a pointer type.

But the listings are packed in their true type. That is, if there is an enumeration of type EnumType that has the base type int, then, as mentioned above, when compiling, instead of EnumType, the type int is substituted. But in the case of packaging, an object of the EnumType type will be created, and the value of this enumeration will be put in its value__ field. Thus, the type information will be saved.

Pointers

As already mentioned, the compiler does not support unsafe pointers. But the link transfer works quite successfully. If a value is passed to the method by reference, the type of this parameter will be the type CIL2Java.VES.ByRef [type], where [type] is the type to which the link is created (possible values: Byte, Short, Int, Long, Float, Double, Bool, Char, Ref). Separate types for primitive types are necessary in order not to pack / unpack them with each call. The link type itself is an abstract class with two abstract methods: get_Value and set_Value to get and set the value by reference, respectively. This is how it looks:

public abstract class ByRef[type]
{
    public abstract [type] get_Value();
    public abstract void set_Value([type] newValue);
}

When creating a reference to a value, an object is created that implements the corresponding abstract class. And it implements, depending on where the value to which we create the link is

stored : LocalByRef [type] - a link to a local variable or method parameter. It simply stores the value until it leaves the called place, after which the value of the variable or parameter is restored.
Take the following code:

public class Program
{
    public static void Foo(ref int refValue)
    {
        refValue = 10;
    }
    public static void  Main(string[] args)
    {
        int localVar = 0;
        Foo(ref localVar);
    }
    // Точка входа для Java
    public static void main(string[] args)
    {
        Main(args);
    }
}

After compilation, the code will look like this:

public class LocalByRefInt : ByRefInt
{
    private int value;
    public LocalByRefInt(int initialValue) { value = initialValue; }
    public override int get_Value() { return value; }
    public override void set_Value(int newValue) { value = newValue; }
}
public class Program
{
    public static void Foo(ByRefInt refValue)
    {
        refValue.set_Value(10);
    }
    public static void  Main(string[] args)
    {
        int localVar = 0;
        LocalByRefInt tmpByRef = new LocalByRefInt(localVar);
        Foo(tmpByRef);
        localVar = tmpByRef.get_Value();
    }
    // Точка входа для Java
    public static void main(string[] args)
    {
        Main(args);
    }
}

FieldByRef [type] - a reference to the field of the object. It is realized by the forces of reflection. This is what this type looks like after compilation:

public class FieldByRef[type] : ByRef[type]
{
    private object target;
    private java.lang.reflect.Field field;
    private [type] value;
    public FieldByRefInt(object target, Field targetField)
    {
        this.target = target;
        this.field = targetField;
        paramField.setAccessible(true);
        this.value = targetField.get[type](target);
    }
    public [type] get_Value()
    {
        return this.value;
    }
    public void set_Value([type] newValue)
    {
        this.field.set[type](this.target, newValue);
        this.value = newValue;
    }
}

ArrayByRef [type] - a reference to an array element. Everything is simple here - we save the array itself (which is a pointer type) and the index in this array. This is how it looks after compilation:

public class ArrayByRef[type] : ByRef[type]
{
    private [type][] array;
    private int index;
    private int value;
    public ArrayByRefInt([type][] paramArray, int index)
    {
        this.array = paramArray;
        this.index = index;
        this.value = paramArray[index];
    }
    public int get_Value()
    {
        return this.value;
    }
    public void set_Value(int newValue)
    {
        this.array[this.index] = newValue;
        this.value = newValue;
    }
}

Pointers to methods and delegates

This is what I miss most in Java. One way to implement method pointers is reflection. But I did not like this option because it requires the packaging of parameters, which reduces performance. Thus, the second method was used.

In the following description, I will use this example:

using System;
namespace TestConsole
{
    public delegate void Deleg(int f);
    public class Program
    {
        public void Foo(int f)
        {
            Console.WriteLine(f);
        }
        public static void  Main(string[] args)
        {
            Program p = new Program();
            Deleg d = new Deleg(p.Foo);
            d(10);
        }
        // Точка входа для Java
        public static void main(string[] args)
        {
            Main(args);
        }
    }
}

The method is that if we encounter the ldftn or ldvirtftn instructions, then an interface is first generated in the CIL2Java.VES.MethodPointers namespace with a name that depends on the method signature and with a single invoke method that has almost the same signature as the method to which we we get the pointer by adding the first parameter a link to the object in which you need to call the method. In our example, such an interface would look like this:

public interface __void_int
{
    void invoke(object target, int param);
}

Then, each ldftn or ldvirtftn instruction generates a nested type that implements the method pointer interface. The invoke method simply calls the method to which the instruction receives a pointer. In the above example, it looks like this:

public class C2J_anon_0 : __void_int
{
    public void invoke(object target, int paramInt)
    {
        ((Program)target).Foo(paramInt);
    }
}

And already in the delegate constructor, an instance of this class is passed as a pointer to a method.

After compilation, the delegate takes the following form:

public sealed class Deleg : MulticastDelegate
{
    public Deleg(object target, __void_int method_pointer)
        : super(paramObject, method_pointer)
    {
    }
    public sealed void Invoke(int paramInt)
    {
        ((__void_int)this.method).invoke(this.target, paramInt);
        if (this.next != null)
            ((Deleg)this.next).Invoke(paramInt);
    }
}

This is the default compiler behavior. As you can see, the signature of the delegate constructor is changed - the last parameter has the interface type of the method pointer, and not native int as it is necessary by standard. This is done again for optimization. However, you can tell the compiler to compile method pointers according to the standard using the "-method_pointers standart" parameter. In this case, the creation of the delegate in our example takes the form:

Deleg d = new Deleg(p, Global.AddMethodPointer("TestConsole.Program$C2J_anon_0"));

And the delegate himself becomes like this:

public sealed class Deleg : MulticastDelegate
{
    public Deleg(object target, int paramInt)
        : base(target, Integer.valueOf(paramInt));
    {
    }
    public sealed void Invoke(int paramInt)
    {
        ((__void_int)Global.GetMethodPointer(((Integer)this.method).intValue())).invoke(this.target, paramInt);
        if (this.next != null)
            ((Deleg)this.next).Invoke(paramInt);
    }
}

As you can see, in this case, the method pointer is of type int, but in fact, this is just an index in the global list of method pointers. In this way, we adhere to the standard, but lose in performance.

yield return / break

There is nothing honest to tell. It just works.

Async / await

There is also nothing special to tell. Code using async / await compiles, but does not work. It does not work because there is no implementation of the types necessary for the operation (System.Threading.Tasks.Task, System.Runtime.CompilerServices.AsyncTaskMethodBuilder and so on)

Unsigned numbers

Support for unsigned numbers in the compiler is available, but is included separately by the "-unsigned" parameter. The article http://habrahabr.ru/post/225901/ by elw00d was very helpful in the implementation . In general, this article describes everything and all operations with unsigned numbers were done on this article.

Exceptions

In general, exceptions in Java and CIL are very similar. While the exception filters are not supported (ICSharpCode.Decompiler does not support them).

Additionally, a mechanism for linking Java and CIL exception types has been added. For example, CIL has a System.ArithmeticException exception. Java has its own type java.lang.ArithmeticException. How to make sure that catching a System.ArithmeticException is caught in the same way as java.lang.ArithmeticException? To do this, a JavaExceptionMapAttribute attribute will be introduced that tells the compiler a similar exception in Java. And when the compiler encounters a System.ArithmeticException catch, it also adds a catch to a similar Java exception. The only condition that is added is that an additional constructor must be introduced in System.ArithmeticException that accepts only one parameter of the java.lang.ArithmeticException type so that an instance of an exception of the same type is passed to the interceptor.

Debugging

The compiler supports the generation of debugging information (if it is in the source assemblies) by specifying the -debug compilation key. Here is an example of how a test application is being debugged in Eclipse:

Type substitution

This mechanism was created so that types having similar ones in Java could be converted into these very analogues when compiling. An example of this type is System.String. In the mscorlib implementation, this type is marked with the TypeMapAttribute attribute, and when compiled it turns into java.lang.String. The substitution of individual methods is also possible. To do this, they must be marked with the MethodMapAttribute attribute.

Conclusion

That's all. This is only an alpha version of the project, and while the stability of the work leaves much to be desired. So a further vector of work is to improve stability and implement a standard library. Thank you for reading to the end.

Tags: