A new do-it-yourself programming language

    Hello, Habr! I'll get straight to the point. I am currently reading the “Dragon Book” and developing a compiler for my own programming language, Lolo (named after the penguin from the Soviet-Japanese cartoon). I plan to finish within a year if nothing gets in the way. Along the way I will post interesting notes from my experience with translation, intermediate code generation, optimization and so on. Today I'll just introduce you to the language. Let's go.

    The language is compiled, imperative and not object-oriented. Its semantics are shamelessly borrowed from C and supplemented with many useful features. Let's start with those.

    Semantic modifications

    Safe pointers

    You may be thinking of Rust's smart pointers right now, but this is something different. In my language, memory-access safety is provided by two idioms. First: there is no pointer dereferencing operation. Instead, accessing a declared pointer accesses the object itself. That is, you can and should write this:

    int # pointer ~~ new int(5)
    int variable ~ pointer + 7

    The variable named variable now contains the number 12. The syntax may look unfamiliar and a little perplexing, but I will explain everything as the article goes on. The second idiom: there are no operations on pointers themselves. Again, all operations on a pointer, including assignment, increment and decrement, are performed on the object it refers to. The only operation that applies to the pointer itself is assignment by address, or, as I call it, identification. The first line of the code example above is exactly that. A pointer can only be set to the address of an already allocated memory area, which is what the new operation returns. You can also set a pointer to the address of another variable, whether it lives on the heap or on the stack. Here is an example:

    int variable ~ 5
    int # pointer ~~ variable

    Here "~" is the ordinary assignment operation. You can also identify a pointer with the special null pointer, which acts as a pointer to the null address. After that, both value comparison and identity comparison (same address) against null yield true:

    int # pointer ~~ null
    if (pointer = null) nop  ;; true
    if (pointer == null) nop  ;; true

    Here "=" compares values, "==" compares addresses, "nop" is an empty operation, and ";;" starts a comment. And yes, null is the only pointer that can take part in operations without a type-compatibility check.
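For readers more at home in Python, the difference between the two comparisons can be pictured roughly like this. It is only an analogy, not Lolo code: Python's `==` stands in for Lolo's "=", Python's `is` stands in for Lolo's "==", and the `Cell` class is a hypothetical stand-in for a heap-allocated int.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a heap-allocated int (not part of Lolo)."""
    value: int


a = Cell(5)   # roughly: int # a ~~ new int(5)
b = Cell(5)   # a second, separate allocation holding the same value
c = a         # roughly: int # c ~~ a  (c now refers to the same object as a)

same_value = (a == b)       # value comparison, like Lolo "="
same_address = (a is b)     # identity comparison, like Lolo "=="
alias_identity = (c is a)   # c was identified with a, so the identities match
```

Two separately allocated objects with equal contents compare equal by value but not by identity, while a pointer identified with another one matches it in both senses.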

    Thus, pointers can only be set to allocated memory or to null, and cannot be moved anywhere. However, these measures do not fully protect against segmentation faults. To get one, just do the following:

    int # pointer1 ~~ new int(5)
    int # pointer2 ~~ pointer1
    delete pointer1
    int variable ~ pointer2  ;; segmentation fault!

    I think everything is clear here. But you can only make such a mistake on purpose, and even then only with some effort. After all, the delete operation does the same job as the garbage collector, only less safely. Speaking of which...

    Garbage collector

    A garbage collector is a garbage collector, in Lolo too. There is probably no need to explain what it is. I will only say that it can be disabled with a special compiler option. Once you have tested the program with the collector and everything works as it should, you can set the option and try to optimize the program with manual memory management.

    Built-in Arrays

    Although I said that the language's semantics are borrowed from C, the differences are quite significant. Here arrays are not pointers. Arrays have their own syntax and safe indexing. No, not range checking. With them it is, in principle, hard to get a runtime error. That is because each array stores its length in a size field, as in Java, and on every indexing operation the index is replaced by ... its remainder modulo that size! A silly decision at first glance, until we look at negative indices. The remainder of dividing -1 by the array length is size-1, that is, the very last element. This lets us index not only from the beginning of the array but also from the end. Another trick is casting any primitive type to a byte[] array. But how do you get a runtime error, then? I will leave that as an easy riddle for you.
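The wrap-around indexing described above can be modelled in a few lines of Python. This is only a sketch of the idea, not how the Lolo compiler does it: the `WrapArray` class is hypothetical, and it relies on Python's `%` returning a non-negative result for a positive divisor, which happens to match the behaviour described for negative indices.

```python
class WrapArray:
    """Toy model of an array whose index is reduced modulo its size."""

    def __init__(self, items):
        self._items = list(items)
        self.size = len(self._items)  # the length is stored with the array

    def __getitem__(self, index):
        # For a positive size, Python's % always lands in 0..size-1,
        # so -1 % size == size - 1, i.e. the last element.
        return self._items[index % self.size]


arr = WrapArray([10, 20, 30, 40, 50])
first = arr[0]     # ordinary index
last = arr[-1]     # -1 % 5 == 4, the last element
wrapped = arr[7]   # 7 % 5 == 2, wraps past the end
```

Note that in languages where the remainder keeps the sign of the dividend (C, for example), the same effect needs an extra adjustment step for negative indices.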


    References

    I don't know for sure whether the current C standard includes references, but they will definitely be in Lolo. Perhaps the lack of references in earlier versions of C is one of the main reasons pointers to pointers exist. References are needed to pass arguments by address and to return values from functions without copying. Pointers and arrays can also be passed by reference (because when passed by value, an array is copied in full, and a pointer set to a new location with the ~~ operation does not keep it).


    Multithreading

    It just keeps getting better. I'm already falling in love with my language. Its next feature is multithreading. Honestly, I have not fully decided what tools it will offer. Most likely a synchronized keyword with all the properties it has in Java and, possibly, a concurrent keyword in front of non-inline functions, meaning “run these functions in parallel threads”.

    Built-in strings

    These are real strings, not just string literals as in C++. Each string has its own length, and indexing takes the remainder, as with arrays. In general, strings in Lolo are very similar to character arrays, except that arrays have no concatenation via "+", repetition via "*", or comparison via "<" and ">". And since we are talking about strings, we must mention characters. Characters in Lolo are not numbers, unlike in C++. And they occupy not one byte, but 4 for DKOTI characters and 6 for UTF characters. I'll talk about DKOTI next time; for now, just know that Lolo supports characters and strings in two encodings. And yes, the length property can even be taken from constants:

    int len ~ "Hello, world!".length  ;; len = 13

    Boolean type with three values

    The vast majority of programming languages that have a logical data type use binary logic. In Lolo it will be ternary, or rather, fuzzy ternary. Three values: true, false and none (nothing). So far I have not found any operations in the language that return none, but I remember many cases from practice where flags with three values would have been very useful. I had to use enumerations or an integer type. Not any more. The only thing I cannot settle on is a name for this type. The most obvious is “logical”, but it is too long. Other options are “luk” in honor of Jan Łukasiewicz, “brus” in honor of N. P. Brusentsov, and “trit”, although strictly speaking this type is not a trit. Anyway, the poll is at the end of the article.
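The article does not say how none should combine with true and false. One common convention is Kleene's strong three-valued logic, where none means "unknown". The Python sketch below uses that convention purely as an assumption, to show what such a type could look like. The function names `not3`, `and3` and `or3` are hypothetical, and Python's `None` stands in for Lolo's none.

```python
def not3(a):
    """Negation: unknown stays unknown."""
    return None if a is None else (not a)


def and3(a, b):
    """Conjunction: a single False decides the result, otherwise unknown wins."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Disjunction: a single True decides the result, otherwise unknown wins."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

Under this convention `and3(False, None)` is still false, because no value of the unknown operand could make the conjunction true. Whether Lolo ends up with these rules is not stated in the article.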

    Initializer lists for structures and arrays

    If, after declaring a structure variable, you put the ~ sign and open square brackets, you can set the values of its fields either in order or in dictionary form. Doing the same with an array sets the values of its cells, only without the dictionary form. There is not much to explain, just look at the code:

    struct {
        int i;
        real r;
        str s;
    } variable ~ [ i: 5, r: 3.14, s: "Hello!" ]
    int[5] arr ~ [ 1, 2, 3, 4, 5 ]

    Returning multiple values from functions

    Just like in Go! You can write several variable names separated by commas and assign them all the values ​​returned from the function at once:

    int, real function() {
        return 5, 3.14
    }
    byte § {
        int i; real r
        i, r ~ function
    }

    Modules instead of headers

    Everything is clear here. Instead of C-style headers, modules as in Java.

    for (auto item: array)

    Again straight from Java. Since our arrays know their length, it would be a sin not to have a for-each loop.

    The selection operator is not just for int

    I don't know about you, but in C and C++ the inability to use switch-case on non-integer values drives me mad. The syntax infuriates me too. Pascal is a different matter. And now so is Lolo:

    case variable {
        "hello", "HELLO": nop
        "world": {
            nop; nop
        }
        "WORLD": nop
    }

    Exponentiation and integer division operators

    And this is from Python.

    real r ~ 3.14 ** 2
    int i ~ r // 3

    Function parameter tuples

    Remember that all operations on pointers are forbidden in Lolo, except identification? Now recall how function parameters are accessed from variable-length parameter lists in C. You declare a pointer to the first element and then increment it until a check returns true. You cannot increment a pointer in Lolo. But that's okay, because here the parameter list is presented as a tuple of fixed (call-dependent) length, with safe indexing, as in arrays. Its name is "?". Type checking is performed only for the parameters declared in the function definition. The remaining (“ellipsis”) parameters can be cast to any type, and one careless move leads to undefined behavior. Still, such a tuple is much safer and more convenient than the macros in C.

    void function(...) {
        if (?.size > 1) {
            int i ~ ?[0]
            real r ~ ?[1]
        }
    }
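As a rough analogy, Python's `*args` behaves much like the "?" tuple: its length depends on the call, and its elements are read by index rather than by walking a pointer. The sketch below is not Lolo code, and the name `params` is just a placeholder for "?".

```python
def function(*params):
    """params plays the role of Lolo's "?" tuple (analogy only)."""
    if len(params) > 1:           # like: if (?.size > 1)
        i = int(params[0])        # like: int i ~ ?[0]
        r = float(params[1])      # like: real r ~ ?[1]
        return i, r
    return None                   # fewer than two arguments were passed


result = function(5, 3.14, "extra")
```

Unlike the Lolo version, Python raises an exception if an extra argument cannot be converted to the requested type, so nothing here is undefined behavior.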

    Numerical intervals

    And one more thing: a family of interval types (range, urange, lrange, etc.). They are written as two integers separated by two dots (..) and can cut a subarray out of an array or a substring out of a string. A useful thing, I think.

    int[5] arr ~ [ 1, 2, 3, 4, 5 ]
    int[3] subarr ~ arr[1..3]  ;; [ 2, 3, 4 ]
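Judging by the comment in the example, both bounds of the interval appear to be included. In Python, where slices exclude the upper bound, the same cut could be sketched like this. The helper `slice_inclusive` is hypothetical and assumes non-negative bounds.

```python
def slice_inclusive(items, start, stop):
    """Return the elements from index start to index stop, both included."""
    return items[start:stop + 1]


arr = [1, 2, 3, 4, 5]
subarr = slice_inclusive(arr, 1, 3)   # elements at indices 1, 2 and 3
```

With inclusive bounds, the interval 1..3 yields exactly three elements, which agrees with the declared type int[3].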

    In operator

    From Pascal. Works with strings, arrays, the "?" tuple and ranges.

    int[5] arr ~ [ 1, 2, 3, 4, 5 ]
    if (4 in arr) nop

    Function Parameter Dictionary

    Honestly, I'm no longer sure what this thing is properly called, but with it you can specify the arguments of non-pure functions directly by name:

    int pos ~ str_find(string, npos: -1)

    Default parameters

    From C++. No example is needed here, everything is clear as it is.


    Exceptions

    Well, where would we be without them?

    try {
    } except (Exception e) {
    }

    No unconditional jump

    Because in 2019, using the goto operator is akin to death.


    Syntax

    Finally, a few words about the syntax. As you have noticed, the semicolon is optional. Modern programming languages do very well without this source of errors. Python and Kotlin are examples. The arrow operator (->) is merged with the dot operator. When calling functions without arguments, the parentheses are optional. Strings are converted to numbers and vice versa. Logical and bitwise operators are merged. There are function modifiers for tabulation, nested functions, and type_of. And most importantly: multilingualism. Yes, I am going to duplicate the keywords, the properties of strings and arrays, and all identifiers of the standard library in all languages of international communication, namely: English, Russian, Japanese, Chinese, Spanish, Portuguese, Arabic, French, German and Latin.

    In fact, all of the above does not cover even half of Lolo's capabilities. I just can't recall all its features right away. I will add more as the compiler gets closer to being ready.


    What should the boolean type be called?

    • 45.2% logical 19
    • 7.1% luk 3
    • 14.2% brus 6
    • 21.4% trit 9
    • 19% your option in the comments 8
