Compiling Kotlin: JetBrains VS ANTLR VS JavaCC

    How fast can Kotlin be parsed, and why does it matter? JavaCC or ANTLR? Is the source code from JetBrains suitable?

    We compare, fantasize, and marvel.

    tl;dr

    JetBrains' code is too heavy to carry around, ANTLR is hyped but unexpectedly slow, and it is still too early to write JavaCC off.

    Parsing a simple Kotlin file with three different implementations:
    Implementation             First run    1000th run    Jar size (parser)
    JetBrains (w/o analyzer)   1423 ms      0.9 ms        35.3 MB
    ANTLR                      3705 ms      137 ms        n/a
    JavaCC                     19 ms        0.1 ms        n/a

    One serene sunny day...

    I decided to build a translator from some convenient language into GLSL. The idea was to program shaders right in IDEA and get "free" IDE support: syntax highlighting, debugging, and unit tests. It turned out to be really convenient.

    Since then the idea of using Kotlin has stuck with me: in it you can use the name vec3, it is stricter, and it works better with the IDE. Besides, it is hyped. From the point of view of my inner manager these are all insufficient reasons, but the idea came back so many times that I decided to get rid of it simply by implementing it.

    Why not Java? It has no operator overloading, so the syntax of vector arithmetic would be too different from what you are used to seeing in game dev; the sketch below shows the difference.
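    A minimal sketch, assuming a hand-rolled vec3 class (hypothetical, not a library type):

    // A hypothetical vec3 with operator overloading, GLSL-style
    data class vec3(val x: Float, val y: Float, val z: Float) {
        operator fun plus(other: vec3) = vec3(x + other.x, y + other.y, z + other.z)
        operator fun times(s: Float) = vec3(x * s, y * s, z * s)
    }

    // Reads like shader code; in Java this would be p.plus(q.times(0.5f))
    val p = vec3(1f, 0f, 0f) + vec3(0f, 1f, 0f) * 0.5f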


    The folks at JetBrains have published the code of their compiler on GitHub. You can see how to use it here and here.

    At first I used their parser together with the analyzer, because to translate into another language you have to know a variable's type even when it is not declared explicitly: val x = vec3(). Here the type is obvious to the reader, but in the AST this information is not so easy to get, especially when the right-hand side is another variable or a function call.
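    For illustration, here are the three situations a type resolver has to handle (this builds on the vec3 sketch above; cross is a hypothetical helper):

    // Assumes the vec3 class from the earlier sketch; cross is a hypothetical helper
    fun cross(a: vec3, b: vec3) = vec3(
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x
    )

    val x = vec3(1f, 0f, 0f) // type is clear from the constructor call
    val y = x                // type must be propagated from another variable
    val z = cross(x, y)      // type comes from the function's return type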

    Here disappointment awaited me. The first run of the parser on a primitive file takes 3 s (THREE SECONDS):

    Kotlin JetBrains parser
    first call elapsed : 3254.482ms
    min time in next 10 calls: 70.071ms
    min time in next 100 calls: 29.973ms
    min time in next 1000 calls: 16.655ms
    Whole time for 1111 calls: 40.888756 seconds

    Such a warm-up brings the following obvious inconveniences:

    1. it adds three seconds to the launch of a game or application;
    2. during development I use hot shader reload and see the result immediately after changing the code;
    3. I restart the application often and like that it starts quickly enough (a second or two).

    An extra three seconds to warm up the parser is unacceptable. True, it immediately became clear that on subsequent calls the parsing time drops to 50 ms and even 20 ms, which (almost) strikes inconvenience No. 2 off the list. But the other two are not going anywhere. Besides, 50 ms per file means an extra 2500 ms for 50 files (and one shader is 1-2 files). And what if it is Android? (So far we are talking only about time.)

    The furious work of the JIT draws attention: the parsing time of a simple file falls from 70 ms to 16 ms. That means, first, that the JIT itself consumes resources, and second, that the results on another JVM could be very different.

    Trying to figure out where these numbers come from, I found a way to use their parser without the analyzer. After all, I only need to resolve the types, and that can be done relatively easily, while the JetBrains analyzer does something far more complex and collects far more information. Without the analyzer the startup time drops by half (though almost a second and a half is still considerable), and the timing of subsequent calls becomes much more interesting: from 8 ms within the first ten calls down to 0.9 ms somewhere around the thousandth.

    Kotlin JetBrains parser (without analyzer) (source)
    first call elapsed : 1423.731ms
    min time in next 10 calls: 8.275ms
    min time in next 100 calls: 2.323ms
    min time in next 1000 calls: 0.974ms
    Whole time for 1111 calls: 3.6884801 seconds
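    For the curious, a minimal sketch of what analyzer-free usage can look like with kotlin-compiler-embeddable; the exact entry points differ between compiler versions, so treat the setup below as an assumption rather than a recipe:

    import com.intellij.openapi.util.Disposer
    import com.intellij.psi.PsiFileFactory
    import org.jetbrains.kotlin.cli.jvm.compiler.EnvironmentConfigFiles
    import org.jetbrains.kotlin.cli.jvm.compiler.KotlinCoreEnvironment
    import org.jetbrains.kotlin.config.CompilerConfiguration
    import org.jetbrains.kotlin.idea.KotlinFileType
    import org.jetbrains.kotlin.psi.KtFile

    fun parseOnly(code: String): KtFile {
        // Heavy environment setup: this is where the long first call is spent
        val environment = KotlinCoreEnvironment.createForProduction(
            Disposer.newDisposable(), CompilerConfiguration(), EnvironmentConfigFiles.JVM_CONFIG_FILES
        )
        // Parsing without the analyzer: we get the PSI tree, but no types
        val factory = PsiFileFactory.getInstance(environment.project)
        return factory.createFileFromText("shader.kt", KotlinFileType.INSTANCE, code) as KtFile
    }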

    These are exactly the numbers I ended up collecting. The time of the first run matters when the first shaders are loaded. It is critical, because at that point you cannot distract the user while the shader loads in the background; the application simply waits. The decline in execution time matters for seeing the dynamics: how the JIT works and how efficiently we can load a shader once the application has warmed up.
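    A minimal sketch of a harness that produces numbers in this shape (one cold call, then minima over the next 10, 100, and 1000 calls, 1111 calls in total); an illustration, not my exact measurement code:

    // Sketch of the measurement scheme: one cold call, then the minimum
    // over the next 10, 100, and 1000 calls (1111 calls in total).
    fun bench(name: String, parse: () -> Unit) {
        println(name)
        var total = 0L
        val first = System.nanoTime()
        parse()
        total += System.nanoTime() - first
        println("first call elapsed : ${total / 1e6}ms")
        for (calls in listOf(10, 100, 1000)) {
            var best = Long.MAX_VALUE
            repeat(calls) {
                val t = System.nanoTime()
                parse()
                val dt = System.nanoTime() - t
                best = minOf(best, dt)
                total += dt
            }
            println("min time in next $calls calls: ${best / 1e6}ms")
        }
        println("Whole time for 1111 calls: ${total / 1e9} seconds")
    }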

    The main reason to look at the JetBrains parser first was to use their type resolver. But once dropping it became an option under discussion, it was worth trying other parsers. Besides, non-JetBrains parsers are likely to be much smaller, less demanding of the environment, and easier to support and to include as code in the project.


    There was no ready-made Kotlin parser for JavaCC, but for the hyped ANTLR, as expected, there are some (one, two).
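    Plugging such a grammar in is straightforward. A minimal sketch using the ANTLR runtime; KotlinLexer, KotlinParser, and the start rule kotlinFile are names generated from the grammar, so treat them as assumptions that depend on the grammar you pick:

    import org.antlr.v4.runtime.CharStreams
    import org.antlr.v4.runtime.CommonTokenStream

    // KotlinLexer and KotlinParser are generated by ANTLR from the grammar;
    // the start rule name (kotlinFile) is assumed and depends on the grammar.
    fun parseWithAntlr(code: String) {
        val lexer = KotlinLexer(CharStreams.fromString(code))
        val tokens = CommonTokenStream(lexer)
        val parser = KotlinParser(tokens)
        val tree = parser.kotlinFile()
        println(tree.toStringTree(parser)) // Lisp-style dump of the parse tree
    }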

    But the speed was unexpected: the same 3 s for loading (the first call) and a fantastic 140 ms for subsequent calls. Not only does the first launch take unpleasantly long, the situation never improves afterwards. Apparently the folks at JetBrains did some magic that lets the JIT optimize their code so well, because ANTLR's times barely improve at all over time.

    Kotlin ANTLR parser (source)
    first call elapsed : 3705.101ms
    min time in next 10 calls: 139.596ms
    min time in next 100 calls: 138.279ms
    min time in next 1000 calls: 137.20099ms
    Whole time for 1111 calls: 161.90619 seconds


    In general, we are surprised to find ourselves declining ANTLR's services. Parsing should not take that long! There are no cosmic ambiguities in the Kotlin grammar, and I tested on nearly empty files. So it is time to dig out the old JavaCC, roll up our sleeves, and "do it ourselves, the way it should be done."

    This time the numbers turned out as expected, though compared with the alternatives they were unexpectedly pleasant:

    Kotlin JavaCC parser (source)
    first call elapsed : 19.024ms
    min time in next 10 calls: 1.952ms
    min time in next 100 calls: 0.379ms
    min time in next 1000 calls: 0.114ms
    Whole time for 1111 calls: 0.38707677 seconds

    Unexpected advantages of my JavaCC parser

    Of course, instead of writing my own parser I would have preferred to use a ready-made solution. But the existing ones have huge drawbacks:

    - performance (pauses when loading a new shader are unacceptable, and so are three seconds of warm-up at startup)
    - a huge runtime; I am not even sure the parser can be packaged with it into the final product
    - incidentally, the current Groovy-based solution has the same problem: it drags a runtime along

    The resulting JavaCC parser, on the other hand, offers:

    + great speed both at startup and in steady state
    + just a few classes making up the parser itself
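    Using a JavaCC-generated parser is just as plain. A sketch; KotlinGrammar and the production method KotlinFile() are placeholder names here, since JavaCC turns the grammar class and each production into ordinary Java:

    import java.io.StringReader

    // KotlinGrammar is a placeholder for the JavaCC-generated parser class:
    // JavaCC emits plain Java with a Reader-taking constructor and one method
    // per grammar production.
    fun parseWithJavaCC(code: String) {
        val parser = KotlinGrammar(StringReader(code))
        val root = parser.KotlinFile() // assumed name of the top-level production
        println(root)
    }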


    JetBrains' code is too heavy to carry around, ANTLR is hyped but unexpectedly slow, and it is still too early to write JavaCC off.

    At some point I decided to look at the size of the jar with all the dependencies. JetBrains is huge, as expected, but the size of the ANTLR runtime is surprising. UPDATE: initially I wrote 15 MB here, but, as suggested in the comments, if you depend on antlr4-runtime instead of antlr4, the size drops to the expected range. Even so, the JavaCC parser itself remains 10 times smaller than the ANTLR one (if you strip out everything except the parsers themselves).

    Parsing a simple Kotlin file with three different implementations:

    Implementation             First run    1000th run    Jar size (parser)
    JetBrains (w/o analyzer)   1423 ms      0.9 ms        35.3 MB
    ANTLR                      3705 ms      137 ms        n/a
    JavaCC                     19 ms        0.1 ms        n/a

    The size of the jar is, of course, important for mobile. But it matters on the desktop too, because it is, in effect, a measure of the extra code in which bugs can be found, which the IDE has to index, and which affects the first-load and warm-up times. Besides, with complex code there is little hope of translating it into another language.

    I am not calling for counting kilobytes, and I value a programmer's time and convenience, but economy is still worth thinking about, because this is how projects become sluggish and hard to maintain.

    A couple more words about ANTLR and JavaCC

    A serious advantage of ANTLR is the separation of grammar from code. It would be nice not to have to pay so dearly for it, though. Besides, it really matters only for "serial grammar developers"; for a final product it is less important, because even a ready-made grammar's tree still has to be walked with your own code. And if we economize and take a third-party grammar, it may simply turn out to be inconvenient: you will still have to understand it thoroughly and reshape the tree for your needs. JavaCC, of course, mixes the grammar and the code into one pot, but does that cost much in practice, and is it really so bad?

    Another point in ANTLR's favor is its set of target platforms. But you can look at that from the other side: the code JavaCC generates is very simple. And very simple code is very simple... to translate! Together with your custom code, whether into C# or into JS.


    All code is here

    The result of my parsing is a tree built of YastNode objects (a very simple class: essentially a map with convenience methods and an ID). But YastNode is not some "spherical node in a vacuum." It is the class I actively use, and on top of it I have built several tools: a type resolver, several translators, and an optimizer/inliner.
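    To give a feel for it, a hypothetical sketch of what a YastNode-like class can amount to (the real source is in the repository):

    // Hypothetical sketch of a YastNode-like class: essentially a map with an ID
    class YastNode(val id: String) {
        private val attributes = mutableMapOf<String, Any?>()
        val children = mutableListOf<YastNode>()

        operator fun get(key: String): Any? = attributes[key]
        operator fun set(key: String, value: Any?) { attributes[key] = value }

        // Depth-first walk: the kind of convenience method the tools build on
        fun visit(action: (YastNode) -> Unit) {
            action(this)
            children.forEach { it.visit(action) }
        }
    }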

    The JavaCC parser does not yet cover the whole grammar; about 10 percent remains. But it does not look like that remainder could affect performance: I checked the speed as rules were added, and it did not change noticeably. Besides, I have already done far more than I needed, and I am simply sharing an unexpected result discovered along the way.
