
The future of WinRT, or Going Native 2.0
Alexandre Mutel, the creator of the fastest and most complete .NET wrapper for DirectX (and the only one supporting Windows 8 Metro), works as an R&D developer on the game engine at Silicon Studio and is a member of the French demo group FRequency.
Lately we have been hearing a lot of buzz about the return of the "Going Native" idea after the era of managed languages such as Java and .NET. Last year, when WinRT was first introduced, shortsighted comments began to appear claiming that .NET was dead and that C++ was coming back in all its glory as the one true way to develop applications, while JIT compilation was meanwhile spreading through the world of scripting languages (JavaScript being its most active adopter). One way or another, any code becomes native before execution; the only differences are the length of the path it takes to get there and how well optimized the result is. The meaning of the word "native" has shifted slightly and has become inextricably linked with the word "performance". Even as a strong promoter of a managed language [C#], I have to admit that its performance is in fact below that of a well-written C++ application. Does that mean we should simply accept this fact and go back to C++, with things like WinRT serving as the basis for cross-language interaction? In truth, I would like .NET to die, and this post is about why, and to what end.
The era of managed languages
Let's review the recent history of managed language development and highlight the current issues. Remember Java's slogan? "Write once, run anywhere." It introduced a new paradigm: a completely "safe" language running on a virtual machine, coupled with a rich set of APIs, that would make it easy to develop applications for any OS and platform. This was the beginning of the era of managed languages. While Java was adopted quite successfully across various development industries, it was also rejected by many developers who were aware of the peculiarities of its memory management, its insufficiently optimized JIT (although things have improved considerably since then), and a number of poor architectural decisions, such as the lack of support for structs and for direct memory access (they even recently considered removing all primitive types and making everything an object: what a terrible idea!).
Java also could not fulfill the promise made in its own slogan: in practice it is impossible to cover all the capabilities of every platform with a single API, which led to things like Swing, to put it mildly not the most optimal UI framework. Moreover, Java was originally designed around a single programming language, even though many saw in the JIT and bytecode an opportunity to port scripting languages to the JVM.
At the beginning of the managed era, Microsoft tried to enter the Java market with its own extensions to the language (everyone knows how that story ended) and eventually built its own managed-language platform, which in several respects was better designed and engineered: the bytecode, the unsafe keyword, native code interop, a lightweight but quite effective JIT plus NGEN, the fast-growing C# language, C++/CLI, and so on. It was designed for cross-language interaction from the start and was not burdened by the Java slogan (although Silverlight on MacOS and Moonlight were decent attempts).
Both platforms used a similar monolithic stack: metadata, bytecode, JIT, and garbage collector, all tightly coupled. Accordingly, they had similar performance problems: the JIT implies a startup delay, and code execution is not as fast as it could be. The main reasons:
- The JIT performs far fewer optimizations than a C++ compiler at -O2, because it has to generate code very quickly (and, unlike the Java HotSpot JVM, the .NET JIT cannot replace already-compiled code with more optimized code on the fly).
- .NET types such as arrays always perform bounds checks on access (except in simple loops, where the JIT can remove the checks if the loop termination condition is less than or equal to the array length; see the sketch after this list).
- The garbage collector stops all threads during collection (although the new garbage collector in .NET 4.5 is somewhat better in this respect), which can lead to unpredictable performance drops.
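As an illustration of the bounds-check point, here is a minimal C# sketch; assuming the JIT behavior described above, the first loop matches the pattern the JIT recognizes and the per-access check is removed, while the second does not and keeps a check on every access.

```csharp
// A minimal sketch of .NET array bounds-check elimination.
using System;

static class BoundsCheckDemo
{
    // Canonical pattern: the JIT proves i < data.Length and drops the per-access check.
    static int SumFast(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    // 'count' is not provably bounded by data.Length, so every data[i] is range-checked.
    static int SumChecked(int[] data, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += data[i];
        return sum;
    }

    static void Main()
    {
        var data = new[] { 1, 2, 3, 4 };
        Console.WriteLine(SumFast(data));                 // 10
        Console.WriteLine(SumChecked(data, data.Length)); // 10, but with checks
    }
}
```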
But even with such modest performance, a managed ecosystem with a universal framework is the king of developer productivity and cross-language interaction, with decent overall performance across all supported languages. The high point of the managed era was probably the launch of Windows Phone and Visual Studio 2010 (which used WPF to render its interface, although WPF itself runs on top of a fair amount of native code). On Windows Phone, managed languages were the only permitted way to develop applications. This was not the best thing that could have happened, given the long list of unresolved .NET performance issues: long enough to provoke the "native developers" to strike back, and they had every right to do so.
And striking back meant, in a sense, abandoning .NET. I don't know much about the inner workings of Microsoft, but judging by frequent reports there is strong rivalry between divisions. For better or worse, in recent years Microsoft seems to have been running out of steam with .NET (for example, there have been practically no significant improvements to the JIT/NGEN, and many performance requests remain unresolved, including things like the SIMD support developers have been awaiting for a very long time). It seems to me that such changes are only possible if .NET is a company-wide strategy with strong support and participation from every division.
At the same time, Google began promoting its NativeClient technology, which allows native code to run in a sandbox right in the browser. Last year, riding the Going Native trend, Microsoft announced that even the HTML5 in the upcoming IE would be native! Sic.
In "Reader Q&A: When will better JITs save managed code?" Herb Sutter, one of the Going Native evangelists, gives an interesting look at how the "Going Native" philosophy views JITs (responding to "Can JITs be faster?", a post by Miguel de Icaza). It contains many inaccurate facts, but let's focus on the key claim: even if JITs get better in the future, managed languages have already made the choice between performance and safety in favor of safety.
And this is where WinRT appears, smoothing out some of the sharp corners. Taking part of the .NET philosophy (metadata and a few common types such as strings and arrays) and the good old COM model (as the common denominator for native cross-language interaction), WinRT tries to solve the problem of language interoperability outside the CLR world (meaning no performance loss for C++) and to provide a more modern API for the OS. Is this the answer to the ultimate question of life, the universe, and everything? Not really. With WinRT a course was set toward a clear convergence of technologies that could potentially lead to great things, but there is still no certainty that the right path has been chosen. So what might this "right path" be?
Going Native 2.0 - performance for everyone
Security checks can have a negative impact on performance, but managed code is not doomed to spend its whole life running on a slow JIT (for example, Mono can run C# code natively compiled through LLVM on iOS/Linux), and it would be fairly easy to extend the bytecode with unsafe instructions that provide controlled performance improvements (such as overriding array bounds checking).
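The closest thing C# offers today is the unsafe/fixed escape hatch; here is a minimal sketch (compile with /unsafe) that pins an array and reads it through a raw pointer, bypassing bounds checks entirely:

```csharp
// A sketch of bypassing array bounds checks with today's unsafe C#.
using System;

static class UnsafeSum
{
    static unsafe int Sum(int[] data)
    {
        int sum = 0;
        fixed (int* p = data)              // pin the array so the GC cannot move it
        {
            for (int i = 0; i < data.Length; i++)
                sum += p[i];               // raw pointer access: no bounds check
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3, 4 })); // 10
    }
}
```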
But the most obvious problem today is the lack of a solid shared infrastructure for cross-language compilation. The compiler behind the IE 10 JavaScript JIT, the .NET JIT and NGEN compilers, the Visual C++ compiler (and many others): they all use different code bases for almost the same time-consuming and complex task of generating efficient machine code. Having a single compiler infrastructure is a very important step toward making high-performance code available to every language.
Felix9 on Channel9 discovered that Microsoft may actually be working on this very problem. That is definitely good news, but "performance for everyone" is only a small part of the picture. The "right path" mentioned earlier is really a broader integrated architecture: not just an improved LLVM-like stack, but one backed by Microsoft's many years of experience in various fields (C++ compilers, JITs, garbage collectors, metadata, etc.), a system that would provide a fully extensible and modular "CLR" consisting of:
- An intermediate language. Reflective, very similar to LLVM IR or .NET bytecode, defining common data types (primitives, strings, arrays, etc.). An API similar to System.Reflection.Emit should be available (a small sketch using today's System.Reflection.Emit follows this list). Vectorized (SIMD) types must be as fundamental as int and double. The IL should not be limited to the CPU but should also allow GPU computing (as the C++ AMP extensions do). It should be possible to express HLSL bytecode in this IL, taking advantage of the unified compiler infrastructure (see below). A typeless variant of the IL should also be available to make porting dynamic languages easier.
- Dynamically linked libraries and executables, similar to .NET assemblies, carrying metadata and IL code and supporting reflection. At design time, code should link against assemblies (IL code) rather than against legacy C/C++ header files.
- A compiler from IL to machine code that can be integrated into a JIT, a desktop application, or a cloud compiler, or any combination of these. It should vectorize code as far as the target platform allows. IL should be compiled to machine code at install or deployment time using knowledge of the target architecture (during development this can happen immediately after compilation to IL). The compilation steps should be accessible through an API and should expose extension points wherever possible (access to the IL, IL optimization passes, or plugging in custom transformations from IL to machine code). Optimization settings should range from fast compilation (JIT-like) to aggressive optimization (for precompiled applications, or for hot-swapping JIT-compiled code with more optimized code). An application profile could also be used to automatically fine-tune localized optimizations. The compiler should support advanced JIT scenarios such as dynamic code analysis and On-Stack Replacement (OSR, which lets a long-running computation be swapped for more optimal code right at runtime), unlike the current .NET JIT, which compiles a method once, the first time it is called. Optimizations of this kind are very important in dynamic scenarios where type inference happens after compilation (as with JavaScript).
- An extensible memory allocation component that supports concurrent allocation. A garbage collector would be just one possible implementation. Most applications would use it for most objects, while the most performance-critical objects would use other allocation strategies (such as the reference counting used in COM/WinRT). There should be no restriction on mixing several allocation strategies within one application (which is exactly what happens in .NET today when an application has to resort to native calls to create objects outside the CLR).
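As promised above, here is a minimal sketch of the emit-style code generation such an IL layer could expose, shown with today's real System.Reflection.Emit API: it builds a tiny squaring method at runtime and lets the JIT compile it.

```csharp
// Emitting IL at runtime with System.Reflection.Emit.
using System;
using System.Reflection.Emit;

static class EmitSketch
{
    static void Main()
    {
        var square = new DynamicMethod("Square", typeof(int), new[] { typeof(int) });
        ILGenerator il = square.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);  // push the argument
        il.Emit(OpCodes.Dup);      // duplicate it
        il.Emit(OpCodes.Mul);      // multiply
        il.Emit(OpCodes.Ret);      // return the result

        var fn = (Func<int, int>)square.CreateDelegate(typeof(Func<int, int>));
        Console.WriteLine(fn(21)); // 441
    }
}
```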
The idea is very close to the CLR stack, except that it does not force applications to run on top of a JIT compiler (yes, .NET has NGEN, but it was designed to speed up startup, not overall execution; besides, it is a black box that only works with assemblies installed in the GAC) and it allows mixed memory allocation strategies: with a garbage collector and without one.
In such a system, cross-language interaction would be simpler without trading performance for simplicity or vice versa. Ideally, the OS itself would be built on such an architecture. Perhaps this idea was (is?) at the heart of projects like Redhawk (on the compiler side) or Midori (on the OS side). In such an integrated system, perhaps only drivers would need direct access to the hardware.
Felix9 also unearthed that an intermediate bytecode called MDIL, lower level than MSIL (the .NET bytecode), may already be in use, and it might be exactly the intermediate bytecode described above. However, if you look at the corresponding patent, "INTERMEDIATE LANGUAGE SUPPORT FOR CHANGE RESILIENCE", you will find x86 instructions in the specification, which does not quite fit the definition of an architecture-independent bytecode. Perhaps they will leave MSIL unchanged and use MDIL at a lower level. We will find out soon.
So, which of these problems does WinRT actually solve? Metadata, a bit of sandbox-friendly API, and cross-language interaction in its infancy (there are common data types and metadata). Not much, as you can see: a kind of COM++. It is also clear that WinRT permits no advanced optimizations when we use its APIs. For example, we are not allowed to have a struct with methods. Every method call in WinRT is a virtual call that goes through a virtual method table (and in some cases several virtual calls are required, for example when a static method is invoked). Even the simplest read-write property requires a virtual call. This is clearly inefficient (the sketch below illustrates the difference). Apparently WinRT is aimed only at higher-level APIs and does not allow scenarios where we would want high-performance code wherever possible, bypassing the layer of virtual calls and non-inlined code. What we end up with is an extended COM model, and that is not exactly what you could call "Building the Future".
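A rough illustration in plain C# (not actual WinRT interop code) of the dispatch cost being described, assuming typical JIT behavior: the interface call always goes through a vtable, while the struct method is a candidate for inlining.

```csharp
// Interface (vtable) dispatch vs. an inlinable struct method call.
using System;

interface ICounter
{
    void Increment();   // always a virtual (vtable) call
    int Value { get; }  // even this property getter is virtual
}

class ComStyleCounter : ICounter
{
    private int value;
    public void Increment() { value++; }
    public int Value { get { return value; } }
}

struct InlinedCounter
{
    public int Value;
    public void Increment() { Value++; }  // candidate for JIT inlining
}

static class Program
{
    static void Main()
    {
        ICounter boxed = new ComStyleCounter();
        var direct = new InlinedCounter();
        for (int i = 0; i < 1000000; i++)
        {
            boxed.Increment();   // indirect call every iteration
            direct.Increment();  // typically inlined to a single add
        }
        Console.WriteLine("{0} {1}", boxed.Value, direct.Value);
    }
}
```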
Productivity and Performance for C# 5.0
A language like C# would be an ideal candidate for such a modular CLR system and could easily be ported to the intermediate bytecode described above. But to use such a system effectively, C# would need to improve in several respects:
- More unsafe constructs, letting us turn off "managed" behavior such as array bounds checking (a kind of "super unsafe mode" where we could, say, use CPU cache instructions when accessing array elements; "advanced" things of this kind are currently impossible with managed arrays without undocumented tricks).
- A configurable new operator that supports different memory allocation strategies.
- Vectorized types (like float4 in HLSL) added to the basic types. This has been requested for a long time (with terrible workarounds in XNA on Windows Phone to "solve" the problem).
- Lightweight native interop: in the current state, the transition from managed to unmanaged code is quite costly even when no parameters are passed. Calling into unmanaged code should be possible without the x86/x64 prologue/epilogue instructions that the .NET JIT currently generates (see the sketch after this list).
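For reference, a minimal sketch of what the managed-to-native transition looks like today through P/Invoke; SuppressUnmanagedCodeSecurity trims one per-call security check, but the JIT still wraps the call in a marshalling prologue/epilogue:

```csharp
// Calling a parameterless native function from C# via P/Invoke.
using System;
using System.Runtime.InteropServices;
using System.Security;

static class NativeInterop
{
    [DllImport("kernel32.dll")]
    [SuppressUnmanagedCodeSecurity]             // trims the per-call security demand
    private static extern uint GetTickCount();  // parameterless, yet still not free

    static void Main()
    {
        uint start = GetTickCount();
        // ... do some work ...
        Console.WriteLine(GetTickCount() - start);
    }
}
```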
Beyond performance, there are other, equally important areas:
- Generics everywhere: in constructors and implicit type conversions, with more advanced constructs (contracts on operators, etc.), approaching the flexibility of C++ templates while staying safer and less cluttered.
- Inheritance and finalizers for structs (to allow lightweight code to run when a method completes, without bulky patterns like try/finally and using).
- More metaprogramming. Extension methods for static types; mixins (injecting the contents of one class into another, convenient for things like math functions); modifying classes/types/methods at compile time (for example, methods invoked at compile time to add other methods or properties to a class, something like eigenclasses in Ruby, instead of generating code with T4 templates).
- A built-in literal or type for expressing a reference to a language object (class, property, method) with a simple construct like symbol LinkToMyMethod = @MyClass.MyMethod; instead of using Linq expressions. This would make code more robust for things like INotifyPropertyChanged and would simplify property-based systems like WPF (which currently requires a lot of duplicated code). For contrast, a sketch of today's expression-based approach follows this list.
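Here is the expression-tree workaround commonly used today to avoid magic strings in INotifyPropertyChanged; the proposed symbol literal (hypothetical syntax) would replace all of this plumbing with a single compiler-checked token.

```csharp
// Extracting a property name from a Linq expression tree instead of a string.
using System;
using System.ComponentModel;
using System.Linq.Expressions;

class Person : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string name;
    public string Name
    {
        get { return name; }
        set { name = value; RaisePropertyChanged(() => Name); }
    }

    // Walks the expression tree just to recover the property's name.
    private void RaisePropertyChanged<T>(Expression<Func<T>> property)
    {
        var member = (MemberExpression)property.Body;
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(member.Member.Name));
    }
}
```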
The main point is that less needs to be added to C# than removed from C++ to take full advantage of such an integrated system, increasing developer productivity without the associated performance losses. Some might argue that C++ already offers all of this and more, but that is exactly why C++ is so cluttered (syntactically) and dangerous for most developers. It allows unsafe code absolutely everywhere, while in every application there are well-defined places where it is actually needed (this leads to memory bugs that would be easier to fix if those places were explicitly marked in the code, as is done with the asm keyword). It is much easier and safer to track such areas in your code than to have them everywhere.
What next?
One can hope that Microsoft has chosen the path from the general to the specific: starting with the release of WinRT, which provides a universal API for all languages and simple cross-language interaction, and then delivering all these more advanced features in the next versions of the OS. But that is the ideal scenario, and it will be interesting to see whether Microsoft can pull it off. Even though it was recently announced that .NET applications on WP8 will benefit from compilation in the cloud, we still know little about it: is it just an adapted NGEN (which, I remind you, is not performance-oriented and generates code very similar to what the JIT produces), or the unreleased RedHawk compiler?
Microsoft probably has something up its sleeve, given its many years of work on C++ compilers, JITs, and garbage collectors, and all the related R&D projects it runs...
To summarize: .NET should die and give way to a more integrated, performance-oriented common environment in which the managed world (safety and productivity) and the unmanaged world (performance) are closely intertwined. And that environment should be a structural part of the next round of WinRT's development.