Comparing the same project in Rust, Haskell, C++, Python, Scala, and OCaml

Original author: Tristan Hume
In my last semester of university I took CS444, the compilers course. In it, each group of one to three people had to write a compiler from a substantial subset of Java to x86; the implementation language was up to the group. This was a rare opportunity to compare implementations of the same large program, written by quite competent programmers in different languages, and to see how design and language choice differed. The comparison prompted a lot of interesting thoughts; such a controlled comparison of languages is rarely seen. It isn't perfect, but it's much better than most of the subjective anecdotes that people's opinions about programming languages are based on.

We wrote our compiler in Rust, and I first compared it with a team that used Haskell. I expected their program to be much shorter, but it turned out to be the same size or larger. The same went for OCaml. Then I compared it with a C++ compiler, which was, as expected, about 30% larger, mostly due to headers and the lack of sum types and pattern matching. The next comparison was with a friend who wrote her compiler solo in Python and used less than half as much code as we did, thanks to the power of metaprogramming and dynamic types. Another friend had a smaller Scala program. What surprised me most was the comparison with another team that also used Rust: they ended up with three times as much code as us because of different design decisions.

I'll explain why I consider this a good comparison, give some background on each project, and go through some of the reasons for the differences in compiler size, drawing conclusions from each comparison. Feel free to skip to the section that interests you.


Why I think this comparison is meaningful

Before you object that code size (I compared both lines and bytes) is a terrible metric, I'll note that in this case it can provide real insight. This is at least the most controlled setting I've ever heard or read about in which different teams wrote the same large program.

  • No one (including me) knew I would measure this, so nobody tried to game the metric; everyone just tried to finish the project quickly and correctly.
  • Everyone (except the Python project, which I'll discuss later) implemented the program for the sole purpose of passing the same automated test suite on the same deadline, so the results can't be skewed much by teams solving different problems.
  • The project was completed over a few months, in teams, and had to be extended incrementally while passing both known and unknown tests. That made it worthwhile to write clean, understandable code.
  • Beyond passing the course tests, the code would never be used for anything, and nobody would read it; as a compiler from a limited subset of Java to textual assembly, it wouldn't be useful anyway.
  • No libraries beyond the standard library were allowed, and no parsing helpers even if they were in the standard library. So the comparison can't be skewed by powerful compiler libraries that only some teams knew about.
  • There were secret tests as well as public ones, run only once after the final submission. This created an incentive to write your own tests and make sure the compiler was robust, correct, and handled tricky edge cases.
  • Although the participants were all students, I consider them quite competent programmers. Each had at least two years of internship experience, mostly at high-profile tech companies, sometimes even working on compilers. Nearly all had been programming for 7-13 years and were enthusiasts who read a lot online outside their courses.
  • Generated code was not counted, but grammar files and code that generated other code were.

So I think the amount of code gives a decent sense of how much effort each project would take to maintain if it were long-lived. I also think the modest differences between projects refute some of the more extravagant claims I've read, such as that a Haskell compiler would be less than half the size of a C++ one purely by virtue of the language.

Rust (basis for comparison)

One of my teammates and I had each written more than 10k lines of Rust before; the third had written perhaps 500 lines at hackathons. Our compiler came to 6,806 lines by wc -l, 5,900 source lines (excluding blanks and comments), and 220 KB by wc -c.

I found that these ratios held roughly across the other projects, with a few exceptions I'll note. For the rest of the article, when I refer to lines or size I mean wc -l, but it doesn't much matter (I'll point out where the ratios differ), and you can convert with a rough factor.

I wrote another article describing our design, which passed all the public and secret tests. The count also includes some extra features we built for fun rather than to pass tests, probably about 400 lines, plus about 500 lines of our unit tests.

Haskell

The Haskell team included two of my friends, each of whom had written perhaps a couple thousand lines of Haskell, plus read a lot online about Haskell and similar functional languages such as OCaml and Lean. They had a third teammate I didn't know well, but he seemed to be a strong programmer and had used Haskell before.

Their compiler came to 9,750 lines by wc -l, 357 KB, and 7,777 source lines (SLOC). This team is also the only one where the ratios diverge significantly: their compiler is 1.4x ours in lines, 1.3x in SLOC, and 1.6x in bytes. They implemented no extra features and passed 100% of the public and secret tests.

It's important to note that counting tests affects this team most of all. Since they took correctness seriously, they included 1,600 lines of tests. They caught several edge cases our team didn't, but those cases simply weren't checked by the course tests. So excluding tests on both sides (6.3k lines vs 8.1k), their compiler is only about 30% larger than ours.

Here I lean toward bytes as the more reasonable measure, because the Haskell project has longer lines on average: it doesn't have lots of lines consisting of a single closing brace, and it doesn't have rustfmt breaking long single-line function chains across multiple lines.

After digging through the code with one of my teammates, we came up with the following explanations for the difference:

  • We used a hand-written lexer and recursive descent parsing, while they used an NFA-to-DFA generator and an LR parser, followed by a pass converting the parse tree to an AST (abstract syntax tree, a more convenient representation of the code). This gave them significantly more code: 2,677 lines to our 1,705, about 1,000 lines more.
  • They used a fancy generic AST that was instantiated with different type parameters as more information was added in each pass. That, plus more helper functions for the rewriting, probably explains why their AST code is about 500 lines longer than our implementation, where we construct struct literals and mutate Option<_> fields to add information as we go.
  • They had about 400 more lines in code generation, mostly from the extra abstraction needed to generate and combine code in a purely functional way, where we simply used mutation and wrote out strings.
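The Option<_>-field approach mentioned above can be sketched roughly like this (the node type and pass here are invented for illustration, not taken from our actual compiler): the parser creates nodes with analysis fields set to None, and later passes mutate them in place.

```rust
// A rough sketch, with invented names, of mutating Option<_> fields on an
// AST node as later passes learn more about it.

#[derive(Debug)]
struct VarExpr {
    name: String,
    // Filled in later by the type-checking pass.
    resolved_ty: Option<String>,
}

fn type_check(expr: &mut VarExpr) {
    // A real pass would look the name up in an environment; hard-coded here.
    expr.resolved_ty = Some("int".to_string());
}

fn main() {
    let mut e = VarExpr { name: "x".to_string(), resolved_ty: None };
    type_check(&mut e);
    assert_eq!(e.resolved_ty.as_deref(), Some("int"));
    println!("{} : {}", e.name, e.resolved_ty.as_deref().unwrap());
}
```

The trade-off versus the Haskell team's re-parameterized AST is that the type system no longer guarantees a field is present by the time a pass reads it; you pay with an unwrap instead of a rewrite pass.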

These differences plus the tests account for the entire size gap. In fact, our constant-folding and name-resolution files are very close in size, though some difference in bytes remains from the longer lines: probably because rewriting the whole tree on each pass takes more code.

So, setting design decisions aside, Rust and Haskell strike me as about equally expressive, perhaps with a slight edge to Rust from being able to reach for mutation whenever it's convenient. It was also interesting to learn that my choice of recursive descent and a hand-written lexer paid off: it was a risk that went against the professor's recommendations, but I judged it to be simpler, and that turned out to be right.

Haskell fans will argue that this team probably didn't take full advantage of Haskell's power, and that if they knew the language better they could have written the project in less code. I agree; someone like Edward Kmett could probably write the same compiler in far less code. Indeed, my friends' team avoided many super-advanced abstractions and fancy combinator libraries such as lens. But all of that affects readability. Everyone on the team was an experienced programmer who knew Haskell was capable of very fancy things, yet they chose not to use them, judging that understanding them would cost more time than it saved and would make the code harder for others to follow. That seems like a real trade-off to me, and it reduces the claim that Haskell is magically suited to compilers to something like "Haskell lets extremely skilled Haskellers write compilers in very little code, provided you don't care whether the code can be maintained by people who aren't equally adept at Haskell."

It's also interesting that at the start of each offering of the course, the professor tells students they can use any language that runs on the university server, but warns that Haskell teams stand apart from the rest: they have the widest spread in grades. Many people overestimate their abilities, and Haskell teams account for many of the worst grades, though others do great, like my friends.

C++

Next I talked to a friend on the C++ team. I knew only one person on that team, but C++ is used in several courses at our university, so probably everyone on the team had C++ experience.

Their project was 8,733 lines and 280 KB, not counting test code but including about 500 lines of extra features, which makes it 1.4x our code without tests (which likewise includes about 500 lines of extra features). They passed 100% of the public tests but only 90% of the secret tests, presumably because they didn't implement the fancy array vtables the spec required, which take perhaps 50-100 lines of code.

I didn't dig very deeply into this size difference. I'd guess it is mainly due to:

  • Their use of an LR parser and tree rewriting instead of recursive descent.
  • C++'s lack of sum types and pattern matching, which we used extensively and found very useful.
  • The need to duplicate all signatures in header files, which Rust doesn't require.
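To make concrete what C++ was missing, here is a toy example (an invented expression type, not our actual AST) of the sum types and pattern matching we relied on; in C++ without std::variant this tends to become a class hierarchy with virtual methods or a tagged union with switch statements.

```rust
// A sum type: a value is exactly one of these variants.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

// Pattern matching handles every variant in one place; the compiler
// checks that no case is forgotten.
fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Neg(a) => -eval(a),
    }
}

fn main() {
    // -(1 + 2)
    let e = Expr::Neg(Box::new(Expr::Add(
        Box::new(Expr::Lit(1)),
        Box::new(Expr::Lit(2)),
    )));
    assert_eq!(eval(&e), -3);
}
```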

We also compared compile times. On my laptop, a clean debug build of our compiler takes 9.7 s, a clean release build 12.5 s, and an incremental debug build 3.5 s. My friend didn't have timings for their C++ build on hand (it used parallel make), but he said the numbers were similar, with the caveat that they put the implementations of many small functions in the header files to reduce signature duplication at the cost of longer builds (which is also why I can't measure the pure line overhead of their headers).

Python

A friend of mine, a very good programmer, decided to do the project alone in Python. She also implemented more advanced features (for fun) than any other team, including an intermediate SSA representation with register allocation and other optimizations. On the other hand, since she worked alone and built many extra features, she paid the least attention to code quality: for example, throwing undifferentiated exceptions for all errors (relying on backtraces for debugging) instead of implementing error types and proper messages as we did.

Her compiler came to 4,581 lines and passed all the public and secret tests. It's hard to say how much extra code her advanced features took, because many of them were more powerful versions of things everyone had to implement, such as constant folding and code generation. The extra features are probably 1,000-2,000 lines at the least, so I'm confident her code is at least twice as expressive as ours.

One big part of the difference is probably dynamic typing. Our ast.rs alone contains 500 lines of type definitions, and many more types are defined elsewhere in the compiler. We were also constantly constrained by the type system itself. For example, we needed infrastructure to ergonomically attach new information to the AST during a pass and access it later, whereas in Python you can just set new fields on the AST nodes.

Powerful metaprogramming also explains part of the difference. For example, although she used an LR parser rather than recursive descent, I think it took her less code than ours because, instead of a tree-rewriting pass, her LR grammar embedded snippets of Python code to build the AST, which the generator turned into Python functions via eval. Part of the reason we didn't use an LR parser is that building the AST without a rewriting pass would have required a lot of ceremony (generating Rust files, or procedural macros) to associate grammar rules with fragments of Rust code.

Another example of the power of metaprogramming and dynamic typing is our 400-line file of essentially repetitive boilerplate implementing a visitor over a bunch of AST structs. In Python this could be a short function of about 10 lines that recursively introspects the fields of an AST node (via its __dict__ attribute) and visits them.

As a fan of Rust and of statically typed languages in general, I'm inclined to point out that a type system is very useful for preventing errors and for performance, and that exotic metaprogramming can make it harder to understand how code works. Still, this comparison surprised me: I didn't expect the difference in code size to be so large. If the gap really is close to writing twice as much code, I still think Rust is a reasonable trade-off, but half the code is a serious argument, and in the future I'd lean toward Ruby or Python when I just need to build something quickly on my own and then throw it away.

Rust (another group)

The most interesting comparison for me was with a friend who was also doing the project in Rust, with one teammate I didn't know. My friend had solid Rust experience: he had contributed to the Rust compiler itself and read widely. I know nothing about his teammate.

Their project was 17,211 raw lines, 15k source lines, and 637 KB, not counting test code or generated code. It had no extra features, and it passed only 4 of 10 secret tests and 90% of the public code-generation tests, because they ran out of time before the deadline to implement the fancier parts of the spec. Their program is three times larger than ours, written in the same language, with less functionality!

This result genuinely amazed me, and it dwarfs all the between-language differences I've covered so far. So we compared wc -l file listings and looked at how each of us had implemented some specific things that ended up taking different amounts of code.

It seems to come down to consistently different design decisions. For example, their frontend (lexing, parsing, AST construction) is 7,597 lines to our 2,164. They used a DFA lexer and an LALR(1) parser, but other groups did similar things without nearly as much code. Looking at their weeder file, I noticed a number of design decisions that differed from ours:

  • They used a fully typed parse tree instead of a generic, uniform, string-based one. This probably required many more type definitions and either extra conversion code in the parsing stage or a more complex parser.
  • They used TryFrom trait implementations to convert between parse-tree types and AST types while validating them, which leads to many 10-20 line impl blocks. We used functions returning Result instead, which takes fewer lines and also loosens the coupling to the type structure, simplifying parameterization and reuse. Some things that were single match arms for us were 10-line impl blocks for them.
  • Our types were structured to reduce copy-paste. They used separate is_abstract, is_native, and is_static fields, so their constraint-checking code had to be written twice, once for void methods and once for methods with a return type, with minor changes. We made void just another type, and devised a taxonomy of modifiers with mode and visibility that imposed constraints at the type level; constraint errors fell out of the default arm of the match that translated modifier sets into mode and visibility.
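The function-returning-Result style we preferred over per-type TryFrom impls might look roughly like this (the parse-tree and modifier types here are invented for illustration): each validation becomes a single match arm rather than a separate 10-20 line impl block.

```rust
// A rough sketch with invented types. A generic, uniform parse-tree node:
#[allow(dead_code)]
struct ParseNode {
    label: String,
    children: Vec<ParseNode>,
}

// A small typed AST fragment the node converts into.
#[derive(Debug, PartialEq)]
enum Modifier {
    Public,
    Protected,
}

// A plain function returning Result: one match arm per valid case, and the
// default arm produces the validation error.
fn convert_modifier(node: &ParseNode) -> Result<Modifier, String> {
    match node.label.as_str() {
        "public" => Ok(Modifier::Public),
        "protected" => Ok(Modifier::Protected),
        other => Err(format!("invalid modifier: {}", other)),
    }
}

fn main() {
    let node = ParseNode { label: "public".to_string(), children: vec![] };
    assert_eq!(convert_modifier(&node), Ok(Modifier::Public));
}
```

Because these are ordinary functions rather than trait impls, they can take extra parameters (context, error locations) and be reused or composed without fitting a fixed trait signature.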

I didn't look at the code of their compiler's analysis passes, but they are also much larger. Talking to my friend, it seems they didn't implement anything like our visitor infrastructure, which, along with other smaller design differences, I'd guess explains the size difference in that part. The visitor lets each of our analysis passes touch only the parts of the AST it needs, instead of pattern matching over the entire AST structure, and that saves a lot of code.
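The visitor idea follows a common pattern: a trait whose default methods delegate to a shared walk function, so each pass overrides only the node types it cares about. A rough sketch over a toy AST (not our real one):

```rust
// Toy AST for illustration.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

trait Visitor {
    // Hook for passes that care about literals; default does nothing.
    fn visit_lit(&mut self, _n: i64) {}
    // Default behavior: just keep walking the tree.
    fn visit_expr(&mut self, e: &Expr) {
        walk_expr(self, e);
    }
}

// Shared traversal logic, written once for all passes.
fn walk_expr<V: Visitor + ?Sized>(v: &mut V, e: &Expr) {
    match e {
        Expr::Lit(n) => v.visit_lit(*n),
        Expr::Add(a, b) => {
            v.visit_expr(a);
            v.visit_expr(b);
        }
    }
}

// A pass that only counts literals overrides a single method.
struct LitCounter {
    count: usize,
}

impl Visitor for LitCounter {
    fn visit_lit(&mut self, _n: i64) {
        self.count += 1;
    }
}

fn main() {
    let e = Expr::Add(Box::new(Expr::Lit(1)), Box::new(Expr::Lit(2)));
    let mut c = LitCounter { count: 0 };
    c.visit_expr(&e);
    assert_eq!(c.count, 2);
}
```

The boilerplate cost is paid once in the walk functions; every analysis pass afterward stays as short as the logic it actually adds.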

Their code generation is 3,594 lines to our 1,560. Looking at their code, almost all of the difference seems to be that they chose an intermediate data structure for assembly instructions, where we just used string formatting to emit assembly directly. They had to define types and output functions for every instruction and operand type they used, and constructing the instructions took more code. Where we had a single format call with short instructions like mov ecx, [edx], they needed a giant expression, broken by rustfmt across 6 lines, that built the instruction out of intermediate nested operand types with up to 6 levels of nested brackets. We could also emit whole blocks of related instructions, such as a function prologue, in one format statement, where they had to spell out the full construction of each instruction.

Our team considered an abstraction like theirs. It would have let us emit either textual assembly or machine code directly, but that wasn't a course requirement, and the same could be done with less code and better performance using a trait X86Writer with methods like push(reg: Register). We also considered that it might help debugging and testing, but realized that generated textual assembly is actually easier to read, and easy to test with snapshot tests, if you sprinkle comments liberally. We (apparently correctly) predicted it would take a lot of extra code for no real benefit given our actual needs, so we didn't bother.
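A sketch of what such an X86Writer trait could look like: the trait name and the push(reg: Register) method come from the discussion above, while everything else is my guess at an implementation. The point is that a text backend stays as cheap as string formatting, yet a machine-code backend could implement the same trait later.

```rust
use std::fmt::Write;

#[derive(Clone, Copy)]
enum Register {
    Eax,
    Ecx,
}

impl Register {
    fn name(self) -> &'static str {
        match self {
            Register::Eax => "eax",
            Register::Ecx => "ecx",
        }
    }
}

// One method per instruction; a binary emitter could implement this too.
trait X86Writer {
    fn push(&mut self, reg: Register);
    fn pop(&mut self, reg: Register);
}

// Text-assembly backend: each method is just a format call.
struct TextWriter {
    out: String,
}

impl X86Writer for TextWriter {
    fn push(&mut self, reg: Register) {
        writeln!(self.out, "push {}", reg.name()).unwrap();
    }
    fn pop(&mut self, reg: Register) {
        writeln!(self.out, "pop {}", reg.name()).unwrap();
    }
}

fn main() {
    let mut w = TextWriter { out: String::new() };
    w.push(Register::Eax);
    w.pop(Register::Ecx);
    assert_eq!(w.out, "push eax\npop ecx\n");
    print!("{}", w.out);
}
```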

It's instructive to compare this with the intermediate representation the C++ team built as an extra feature, which took them only about 500 extra lines. They used a very simple structure (hence simple type definitions and construction code) whose operations stayed close to what Java required. That meant their IR was much smaller, and needed less construction code, than the resulting assembly, since many language operations such as calls and casts expand into many assembly instructions. They also say it genuinely helped debugging by cutting out noise and improving readability, and the higher-level representation let them perform some simple optimizations on it.

Overall, the threefold difference in size seems to come from consistently making design decisions, large and small, in the direction of more code. They built a number of abstractions we didn't, which added code, and skipped some of ours that reduce it.

This result really surprised me. I knew design decisions mattered, but I wouldn't have guessed in advance that they could produce differences of this size among people I consider strong, competent programmers. Of all the comparisons, this one means the most to me. It probably helped that I had read a lot about writing compilers before taking the course, so I could borrow good designs that others had invented and shown to work well, like AST visitors and recursive descent, even though they weren't taught in our course.

What it really made me think about is the cost of abstraction. Abstractions can ease future extension or guard against certain kinds of errors, but that has to be weighed against ending up with three times as much code to understand and refactor, three times as many places for bugs to hide, and less time for testing and further development. Our course differed from the real world in that we knew for certain we'd never touch the code after the deadline, which eliminates the benefits of proactive abstraction. Even so, if I had to pick which compiler to extend with an arbitrary feature named later, I would pick ours, setting aside my familiarity with it, simply because there is much less code to understand and I could then choose the abstraction best suited to the actual requirements.

This also sharpened my taxonomy of abstractions: there are abstractions that reduce code given only the current requirements, like our visitor pattern, and abstractions that add code but buy extensibility, debuggability, or correctness.

Scala

I also talked with a friend who did the project in Scala in a previous semester, though the project and tests were exactly the same. Their compiler was 4,141 lines and ~160 KB, not counting tests. They passed 8 of 10 secret tests and 100% of the public tests, with no extra features. So against our 5,906 lines without extra features and tests, their compiler is about 30% smaller.

One small design factor was a different approach to parsing. The course allowed a command-line LR table generator tool, and nobody used it except this team, which saved them from implementing an LR table generator themselves. They also avoided writing the LR grammar by hand, using a 150-line Python script that scraped a Java grammar webpage they found online and translated it into the generator's input format. They still needed to build some kind of tree in Scala, but overall their parsing stage came to 1,073 lines against our 1,443, even though our recursive descent approach was itself smaller than every other team's.

The rest of their compiler was also smaller than ours, without any obvious large design differences, though I didn't dig into the code. I suspect this reflects differences in expressiveness between Scala and Rust. The two languages have similar features useful for compilers, such as pattern matching, but Scala's managed memory saves the code needed to satisfy Rust's borrow checker, and Scala has more varied syntactic sugar than Rust.

OCaml

Since everyone on our team had interned at Jane Street (a technology trading firm), I was especially interested in the results of other former Jane Street interns, who chose OCaml for their compiler.

Their compiler was 10,914 lines and 377 KB, including a small amount of test code, with no extra features. They passed 9 of 10 secret tests and all of the public tests.

As with other groups, the main size difference seems to come from using an LR parser plus tree rewriting for parsing, and a regex -> NFA -> DFA pipeline for lexing. Their frontend (lexing, parsing, AST construction) is 5,548 lines to our 2,164, with similar ratios in bytes. They also used expect tests for their parser, similar to our snapshot tests except that the expected output lives in the code rather than in separate files, so their parser tests contributed ~600 lines to the total, versus about 200 for ours.

That leaves 5,366 lines for the rest of their compiler (461 of which are interface files with type declarations) against our 4,642: a difference of only 15% counting their interface files, and almost identical not counting them. So apart from our parsing design decisions, Rust and OCaml look about equally expressive, except that OCaml requires interface files and Rust doesn't.

Conclusion

Overall, I'm very glad I did this comparison; I learned a lot and was surprised many times. I think the broad conclusion is that design decisions matter much more than language, but language matters insofar as it gives you the tools to implement different designs.
