Algorithms and solutions for developing the JavaScript engine in C #
Hello, dear Khabrovites! 
A little less than a year ago, in the sandbox, I published an article about the beginning of the development of the JavaScript engine in C # . A year has passed since the creation of the project and I am pleased to present you the first version of this creation, which can be downloaded on nuget .
But in this article I will not be promoting, making comparisons with competitors, measuring performance and more. Here I will write about what I had to go through, what kind of blood was given and what I had to face.
Typing conflict
This problem was one of the first. And it follows from the fact that at the very beginning the goal was set to provide easy interaction with .NET classes. As a result, all built-in types of the standard JavaScript library are implemented by the same type in C #.
Everyone knows that in JavaScript you can call any function in the context of any object. Some of these cases are even described in the specification. For example, what happens if the arguments object that stores the arguments to the function call and their number is sent to some function from Array.prototype ? That's right, it will work as with a regular array and will return the correct result. And how to implement this in C # ?! It turned out there is such a way:
Now we can call this delegate by passing the object of any reference type as the first argument.
As you can see, there is no magic here, standard documented functions are used. But the creators of the platform obviously did not prepare for what comes after this - Visual Studio cannot determine the type and does not even try to call ToString when debugging , and the is statement returns true when checking for the type that declared the method, even if it is being checked their object is related only by inheritance from object . Saves GetType , which does not cheat and returns the real type.
Understanding the dangers of such an operation, I added the AllowUnsafeCall attributewhose constructor requires you to explicitly specify an alternative type that you are ready to receive inside the method marked by it (the heirs of the specified type will also come).
Sparse Array (Sparse Array)
JavaScript arrays have the wonderful property of storing elements with indices 0, 1, 2, and 10005000, not occupying several gigabytes of memory, and even sorting out existing indexes in ascending order, which is not characteristic of hash tables that you might think about. In search of a suitable data structure having such properties, and even fast, a bunch of books and sites were shoveled. At some point, all this hash of information degenerated into a "prefix binary search tree with optimization for the processor cache." It sounds loud and long, but in fact, the implementation of this structure takes a little more than 100 lines of code.
To store objects, three fields are created: one array for the "tree node", the second array for the values themselves and the integer number of selected elements.
Each node has 4 fields:
By default, we assume that an object with index 0 has already been added to the array.
Addendum:
Receiving:
Here it is already without formalization, since it is the same as adding, with the only difference being that there is no step 3, in the second step we return the value, not write it, and if the element does not have the desired child, we process the absence of the requested value , but do not add a new one.
During the experiments, it turned out that this structure (and this is pure binary tree) does not work much slower than the hash table. At the same time, there is one peculiarity: if, before the formal execution of the algorithm for adding or receiving, you “step” into a random element, then there is a chance that it will lead us to the desired element in fewer steps (here the value of the bit index that was processed during adding this item stored in field 4). Another nice feature was that when elements are added sequentially, they are stored by indexes that exactly match their keys, which reduces the search time to O (1).
Here is the structure. When I implemented it, the transit time of SunSpider 0.9.1 decreased by 10%. In my opinion, this is not bad.
Rope String (Rope string)
On a habr already wrote about it , therefore here I will write only idea. When string concatenation occurs in .NET, a new string is created, the value of which is first copied from the first string, then from the second. On the face of O (n + m). Rope String is an object that stores references to the original objects, and does not copy their value. In essence, it is a promise that these two lines, if required, will be one line. It turns out O (1). Of course, if for each such object we call ToString, which, in this case, will take O (n + m), we won’t get a win, but when they create Rope from Rope and Rope from Rope ... you can connect them in linear time using StringBuilder. When I added this solution, the time for the same SunSpider'e decreased by 5 times.
Type prediction
This optimization is specific to the dynamic nature of JavaScript. Again an example:
Which of the many "+" implementations should be called in the third line? Well, of course, adding up numbers is not worth thinking here. But what is obvious to a person is not always clear to the machine. And so when the execution reaches the third line, the virtual machine should think and “walk around” the huge mapping table. Type prediction is an attempt to teach a computer to analyze operations a little closer to how a person does it. Therefore, in the third line the operator of addition of numbers will be called, which will only follow if we were mistaken in the prediction and will call the standard implementation if this all the same happened. Now this optimization is still in the process of integration, but already when replacing the operators "+", "<", ">", "<=", "> ="
Separately, I will mention once again about concatenation. Now this effect is overlapped by Rope String, but sometime, replacing
On the multiple line merge operator (with one pass through all the elements), I got double acceleration.
In conclusion, a little about the project. The engine has remained interpretive. There were attempts to add JIT using System.Linq.Expressions, but for some reason this solution only led to a slowdown, therefore this direction was abandoned.
A little less than a year ago, in the sandbox, I published an article about the beginning of the development of the JavaScript engine in C # . A year has passed since the creation of the project and I am pleased to present you the first version of this creation, which can be downloaded on nuget .
But in this article I will not be promoting, making comparisons with competitors, measuring performance and more. Here I will write about what I had to go through, what kind of blood was given and what I had to face.
Typing conflict
This problem was one of the first. And it follows from the fact that at the very beginning the goal was set to provide easy interaction with .NET classes. As a result, all built-in types of the standard JavaScript library are implemented by the same type in C #.
Everyone knows that in JavaScript you can call any function in the context of any object. Some of these cases are even described in the specification. For example, what happens if the arguments object that stores the arguments to the function call and their number is sent to some function from Array.prototype ? That's right, it will work as with a regular array and will return the correct result. And how to implement this in C # ?! It turned out there is such a way:
- Through Reflection, get MethodInfo of the method you want to call.
- Get MethodHandle.GetFunctionPointer ()
- Through the Activator to create a delegate with the arguments of the types used in the original function, but add the first argument of type object .
Now we can call this delegate by passing the object of any reference type as the first argument.
As you can see, there is no magic here, standard documented functions are used. But the creators of the platform obviously did not prepare for what comes after this - Visual Studio cannot determine the type and does not even try to call ToString when debugging , and the is statement returns true when checking for the type that declared the method, even if it is being checked their object is related only by inheritance from object . Saves GetType , which does not cheat and returns the real type.
Understanding the dangers of such an operation, I added the AllowUnsafeCall attributewhose constructor requires you to explicitly specify an alternative type that you are ready to receive inside the method marked by it (the heirs of the specified type will also come).
Sparse Array (Sparse Array)
JavaScript arrays have the wonderful property of storing elements with indices 0, 1, 2, and 10005000, not occupying several gigabytes of memory, and even sorting out existing indexes in ascending order, which is not characteristic of hash tables that you might think about. In search of a suitable data structure having such properties, and even fast, a bunch of books and sites were shoveled. At some point, all this hash of information degenerated into a "prefix binary search tree with optimization for the processor cache." It sounds loud and long, but in fact, the implementation of this structure takes a little more than 100 lines of code.
To store objects, three fields are created: one array for the "tree node", the second array for the values themselves and the integer number of selected elements.
Each node has 4 fields:
- The key used to access. Unsigned 32 bit integer
- Index of the first child (or zero, which is more appropriate in this case)
- Second child index
- Bit index Designed for optimization
By default, we assume that an object with index 0 has already been added to the array.
Addendum:
- Let i = 31;
- If the key is equal to the key of the current element, write the value to be added to the array of values at the index of the current element and exit.
- If the key of the current element is larger than the key to be added, write the key to be added to the current element, the value to be added to the array of values, then call the addition recursively with the previous key and value, and then exit.
- Take the i-th bit of the key. If it is 0, make the current element of the first descendant, otherwise the second descendant, reduce i by 1. If the element with index 0 is considered to be a descendant, add a new element, assign it an element of index 0 to be the descendants of the new element, make the key equal to the key to be added , index the bits equal to i and write the value. Otherwise, go to step 2.
Receiving:
Here it is already without formalization, since it is the same as adding, with the only difference being that there is no step 3, in the second step we return the value, not write it, and if the element does not have the desired child, we process the absence of the requested value , but do not add a new one.
During the experiments, it turned out that this structure (and this is pure binary tree) does not work much slower than the hash table. At the same time, there is one peculiarity: if, before the formal execution of the algorithm for adding or receiving, you “step” into a random element, then there is a chance that it will lead us to the desired element in fewer steps (here the value of the bit index that was processed during adding this item stored in field 4). Another nice feature was that when elements are added sequentially, they are stored by indexes that exactly match their keys, which reduces the search time to O (1).
Here is the structure. When I implemented it, the transit time of SunSpider 0.9.1 decreased by 10%. In my opinion, this is not bad.
Rope String (Rope string)
On a habr already wrote about it , therefore here I will write only idea. When string concatenation occurs in .NET, a new string is created, the value of which is first copied from the first string, then from the second. On the face of O (n + m). Rope String is an object that stores references to the original objects, and does not copy their value. In essence, it is a promise that these two lines, if required, will be one line. It turns out O (1). Of course, if for each such object we call ToString, which, in this case, will take O (n + m), we won’t get a win, but when they create Rope from Rope and Rope from Rope ... you can connect them in linear time using StringBuilder. When I added this solution, the time for the same SunSpider'e decreased by 5 times.
Type prediction
This optimization is specific to the dynamic nature of JavaScript. Again an example:
var a = 1;
var b = 2;
var c = a + b;
Which of the many "+" implementations should be called in the third line? Well, of course, adding up numbers is not worth thinking here. But what is obvious to a person is not always clear to the machine. And so when the execution reaches the third line, the virtual machine should think and “walk around” the huge mapping table. Type prediction is an attempt to teach a computer to analyze operations a little closer to how a person does it. Therefore, in the third line the operator of addition of numbers will be called, which will only follow if we were mistaken in the prediction and will call the standard implementation if this all the same happened. Now this optimization is still in the process of integration, but already when replacing the operators "+", "<", ">", "<=", "> ="
Separately, I will mention once again about concatenation. Now this effect is overlapped by Rope String, but sometime, replacing
"abc" + 1234 + 567On the multiple line merge operator (with one pass through all the elements), I got double acceleration.
In conclusion, a little about the project. The engine has remained interpretive. There were attempts to add JIT using System.Linq.Expressions, but for some reason this solution only led to a slowdown, therefore this direction was abandoned.