Compiling C/C++ code to WebAssembly
WebAssembly is a new binary format that web applications can be compiled to. It is being designed and implemented as you read these lines, and the developers of all major browsers are moving it forward. Everything is changing very quickly! In this article we will show the current state of the project, with a fairly deep dive into the tools for working with WebAssembly.
For WebAssembly to work, we need two main components: tools that compile code into a binary in the WebAssembly format, and browsers that can load and execute that binary. Neither is finished yet, and both depend heavily on the completion of the WebAssembly specification, but in general they are separate components and are being developed in parallel. This separation is a good thing: it allows compilers to produce WebAssembly applications that work in any browser, and browsers to run WebAssembly programs regardless of which compiler created them. In other words, we get open competition among development tools and among browsers, which will keep moving all of this forward and give the end user an excellent choice. In addition, the separation lets the toolchain and browser teams work in parallel and independently.
The new WebAssembly toolchain project I want to talk about today is called Binaryen. Binaryen is a library, written in C++, for supporting WebAssembly in compilers. If you are not personally working on a WebAssembly compiler, you probably do not need to know about Binaryen directly. If you use a WebAssembly compiler, though, it probably uses Binaryen under the hood - we will look at examples below.
The core of the Binaryen library is intended for parsing and generating WebAssembly, as well as representing its code as an abstract syntax tree (AST). Based on these features, the following useful utilities have been created:
- A command-line utility that can load a WebAssembly module, parse it and execute its code in an interpreter (perform actions, print the result to the console). To load the module and display the result, it uses a temporary s-expression notation with the .wast suffix (remember that the final format for representing binary WebAssembly modules is still under development).
- asm2wasm - a utility for compiling asm.js to WebAssembly.
- wasm2asm - a utility for compiling WebAssembly to asm.js (still under development).
- s2wasm - a utility that compiles .s files (the format produced by the new WebAssembly backend in LLVM) to WebAssembly.
- wasm.js - a JavaScript port of Binaryen, which makes it possible to run all of the above tools directly in the browser.
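To make the idea of "code as an AST plus an interpreter" more concrete, here is a minimal C++ sketch. It is not Binaryen's API: the names Expr, Const, Binary, Op and makeSample are hypothetical, and the tiny expression language (32-bit integer constants with add, sub and mul) is an assumption chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical node types for illustration only; these are NOT Binaryen's classes.
enum class Op { Add, Sub, Mul };

struct Expr {
    virtual ~Expr() = default;
    // Interpret this node and return its 32-bit integer result.
    virtual int32_t evaluate() const = 0;
};

// Leaf node: a constant value.
struct Const : Expr {
    int32_t value;
    explicit Const(int32_t v) : value(v) {}
    int32_t evaluate() const override { return value; }
};

// Inner node: an operator applied to two child expressions.
struct Binary : Expr {
    Op op;
    std::unique_ptr<Expr> left, right;

    Binary(Op o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), left(std::move(l)), right(std::move(r)) {
        assert(left && right);  // both operands must be present
    }

    int32_t evaluate() const override {
        // Children are evaluated first, then combined (a bottom-up walk).
        // Overflow handling is omitted for brevity.
        const int32_t a = left->evaluate();
        const int32_t b = right->evaluate();
        switch (op) {
            case Op::Add: return a + b;
            case Op::Sub: return a - b;
            case Op::Mul: return a * b;
        }
        throw std::logic_error("unknown operator");
    }
};

// Builds the tree for (2 + 3) * 4.
std::unique_ptr<Expr> makeSample() {
    auto sum = std::make_unique<Binary>(Op::Add,
                                        std::make_unique<Const>(2),
                                        std::make_unique<Const>(3));
    return std::make_unique<Binary>(Op::Mul, std::move(sum),
                                    std::make_unique<Const>(4));
}
```

Calling makeSample()->evaluate() walks the tree for (2 + 3) * 4 bottom-up and yields 20. In this sketch, further node kinds (calls, memory access, control flow) would simply be more subclasses of Expr.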
For more about Binaryen, see these slides.
Once again, remember that WebAssembly is under active design, which means that Binaryen's input and output formats (.wast, .s) are not final. Binaryen is constantly updated to keep up with the WebAssembly specification. The changes are becoming less drastic over time, but of course no one can guarantee compatibility.
Let's look at a few areas where Binaryen can be useful.
Compiling to WebAssembly using Emscripten
Emscripten can compile C and C++ code to asm.js, and Binaryen (through the asm2wasm utility) can compile asm.js to WebAssembly, so the Emscripten + Binaryen combination gives us a complete toolchain for compiling C and C++ code to WebAssembly. You can run asm2wasm on asm.js code yourself, but it is easier to let Emscripten do it for you, something like this:
emcc file.cpp -o file.js -s 'BINARYEN="path-to-binaryen"'
Emscripten will compile file.cpp and produce a JavaScript file, plus a separate .wast file containing the WebAssembly. Under the hood, Emscripten compiles the code to asm.js, runs asm2wasm on it, and saves the result. This is described in more detail on the Emscripten project wiki.
But wait, what is the point of compiling something to WebAssembly if browsers do not support it yet? Great question! :) It is true that we cannot run this code natively in any browser yet. But we can already test it. Say we want to check whether the Emscripten + Binaryen combination produced a correct binary. To do this, we can use wasm.js, which Emscripten embeds in the .js file produced by the emcc command above. wasm.js contains the JavaScript port of Binaryen, including the interpreter. If you run file.js (in node.js or in a browser), you get the result of executing the WebAssembly. This lets us actually confirm that the compiled WebAssembly binary works correctly. You can look at an example of such a program, and a couple more examples are in the repository for testing purposes.
Of course, we are not yet standing on solid ground with all these tools. The test environment is unusual: C++ code is compiled to WebAssembly and then executed in a WebAssembly interpreter, which is itself written in C++ but ported to JavaScript. And so far there is no other way to run it. But we have several reasons to trust the results:
- The output code passes all of the Emscripten tests. They include many real-world code bases (Python, zlib, SQLite), plus many “suspicious” corner cases in C and C++. From experience, if the Emscripten tests pass for all of these, other code processed by Emscripten will behave correctly too.
- The Binaryen interpreter passes all of the internal WebAssembly tests that determine whether WebAssembly is executed correctly. In other words, when browsers get WebAssembly support, they will have to behave the same way (except maybe faster).
- Most of the work is done by Emscripten, a stable compiler that has been used in production for a long time; only a relatively small part on top of it is done by Binaryen (its code is only a couple of thousand lines). Less code means fewer bugs.
All this shows that we already have a working result: we can compile C and C++ code to WebAssembly and even run it.
Note that WebAssembly is just one more new feature; apart from it, everything else in Emscripten still works: libc and syscalls, OpenGL / WebGL code, browser integration, integration with node.js, and so on. As a result, projects that already use Emscripten will be able to switch to WebAssembly simply by adding a new command-line parameter. This will let C++ projects be compiled to WebAssembly and run in browsers without extra effort.
Using the new experimental LLVM backend for WebAssembly with Emscripten
We have just seen an important new stage in the development of Emscripten, which gave it the ability to create WebAssembly modules and even test them. But the work does not stop there: so far we have only used the existing asm.js compiler together with the asm2wasm utility. A new LLVM backend for WebAssembly is being actively written, directly in the main LLVM development branch. Although it is not yet ready for real use, over time it will become a very important tool. Binaryen supports its output format.
The LLVM backend for WebAssembly, like most LLVM backends, emits assembly code, in this case in a special .s format. This format is close to WebAssembly but not identical to it - it looks more like typical C compiler output (a linear list of instructions, one instruction per line) than like the abstract syntax tree of WebAssembly. Such a .s file can be converted to WebAssembly in a fairly trivial way; Binaryen includes the s2wasm utility, which does just that (see how simple it is). You can run it by itself, or let Emscripten do it through the new WASM_BACKEND option, which you can use like this:
emcc file.cpp -o file.js -s 'BINARYEN="path-to-binaryen"' -s WASM_BACKEND=1
Note that you also need the BINARYEN option, since s2wasm is part of Binaryen. When both options are specified, Emscripten uses the new WebAssembly backend instead of the asm.js compiler. After calling the backend and receiving a .s file from it, Emscripten calls s2wasm to convert it to WebAssembly. A few examples of programs that you can already build this way can be found on the Emscripten project wiki.
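The contrast between a tree-shaped representation and a linear, one-instruction-per-line listing, mentioned above for the .s format, can be illustrated with a small C++ sketch. It does not reproduce the real .s or .wast syntax: the Node type, the helper names and the mnemonics "const", "add" and "mul" are all hypothetical, and the operands-before-operator ordering of the linear form is an assumption made for simplicity.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node: a constant (no children) or a binary operation.
struct Node {
    std::string op;  // "const", "add" or "mul"; illustrative names, not real opcodes
    int value = 0;   // used only when op == "const"
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> constant(int v) {
    auto n = std::make_unique<Node>();
    n->op = "const";
    n->value = v;
    return n;
}

std::unique_ptr<Node> binary(std::string op, std::unique_ptr<Node> l,
                             std::unique_ptr<Node> r) {
    assert(l && r);  // a binary operation needs both operands
    auto n = std::make_unique<Node>();
    n->op = std::move(op);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Tree-shaped text: the nesting of parentheses mirrors the AST.
std::string toNested(const Node& n) {
    if (n.op == "const") return "(const " + std::to_string(n.value) + ")";
    return "(" + n.op + " " + toNested(*n.left) + " " + toNested(*n.right) + ")";
}

// Linear text: one instruction per entry, operands emitted before their operator.
void toLinear(const Node& n, std::vector<std::string>& out) {
    if (n.op == "const") {
        out.push_back("const " + std::to_string(n.value));
        return;
    }
    toLinear(*n.left, out);
    toLinear(*n.right, out);
    out.push_back(n.op);
}
```

For the tree built by binary("mul", binary("add", constant(2), constant(3)), constant(4)), toNested returns a single nested string, while toLinear fills the vector with five flat entries. Going the other way, from a flat listing back to a tree, is roughly the kind of conversion the text describes s2wasm performing.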
Thus, we have two ways to build a WebAssembly module using Binaryen:
- Emscripten + asm.js backend + asm2wasm, which works right now and should be a relatively simple and acceptable option.
- Emscripten + the new WebAssembly backend + s2wasm, which is not yet fully operational, but will come to the fore as the WebAssembly backend matures.
The goal at the moment is to make the transition from the first method to the second as painless as possible. Ideally, it should come down to changing one argument on the command line.
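The "one argument" difference between the two methods can be sketched as a small helper that assembles the emcc arguments, for example inside a build script. This is only an illustration, assuming the two invocations shown in this article are used as-is; the names Pipeline and emccArgs are hypothetical, and path-to-binaryen remains a placeholder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The two ways of building a WebAssembly module described in the article.
enum class Pipeline { Asm2Wasm, WasmBackend };

// Returns the emcc argument list for the chosen pipeline.
// Only the settings shown in the article are used here.
std::vector<std::string> emccArgs(const std::string& source,
                                  const std::string& output,
                                  const std::string& binaryenPath,
                                  Pipeline pipeline) {
    assert(!source.empty() && !output.empty());
    std::vector<std::string> args = {
        "emcc", source, "-o", output,
        "-s", "BINARYEN=\"" + binaryenPath + "\""};
    if (pipeline == Pipeline::WasmBackend) {
        // The only difference between the two pipelines: one extra setting.
        args.push_back("-s");
        args.push_back("WASM_BACKEND=1");
    }
    return args;
}
```

With this helper, switching a project from the first method to the second means passing Pipeline::WasmBackend instead of Pipeline::Asm2Wasm; everything else in the invocation stays the same.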
Thus, we have a clear plan:
- We use Emscripten to generate asm.js code (today).
- We move to generating WebAssembly through asm2wasm (already possible, but browsers are not ready yet).
- We move to generating WebAssembly through the new LLVM backend (as soon as it is ready).
Each step brings new benefits to users (speed!) and causes practically no difficulties for developers.
In conclusion, although this article discusses Binaryen in the context of Emscripten, it is a separate, general-purpose library for WebAssembly. If you have ideas for tools that work with WebAssembly, you can take the Binaryen library and use it without depending on Emscripten, LLVM or anything else.