New do-it-yourself programming language
Hello, Habr! Straight to the point. I am currently reading “The Dragon Book” and developing a compiler for my own programming language, Lolo (named after the penguin from the Soviet-Japanese cartoon). I plan to finish within a year if nothing gets in the way. Along the way I will post interesting excerpts from my experience with translation, intermediate code generation, optimization and so on; today I will simply introduce the language. Make yourself comfortable, and let's go.
The language is compiled, imperative and not object-oriented; its semantics are shamelessly borrowed from C and extended with many useful features. Let's start with those.
Semantic modifications
Safe pointers
You might be thinking of Rust's smart pointers right now, but this is something different. In Lolo, memory-access safety rests on two idioms. The first is the absence of a pointer dereference operation: using a declared pointer refers to the object it points to. That is, you can and should write:
int # pointer ~~ new int(5)
int variable ~ pointer + 7
The variable named variable now holds 12. The syntax is unfamiliar and may be a little puzzling, but I will explain everything as the article goes on. The second idiom is the absence of pointer operations. Again, every operation applied to a pointer, including assignment, increment and decrement, is performed on the object. The only operation that acts on the pointer itself is assignment by address, which I call identification; the first line of the example above is exactly that. A pointer can only be identified with an already allocated memory area, such as the one returned by new, or with the address of another variable, whether it lives on the heap or on the stack. Here is an example:
int variable ~ 5
int # pointer ~~ variable
Here "~" is ordinary assignment. A pointer can also be identified with the special null pointer, which acts as a pointer to the null address. After such an identification, both the value comparison and the identity comparison (same address) with null yield true:
int # pointer ~~ null
if (pointer = null) nop ;; true
if (pointer == null) nop ;; true
Here "=" compares values, "==" compares addresses, "nop" is the empty operation, and everything after ";;" is a comment. And yes, null is the only pointer that can take part in operations without a type-compatibility check.
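Lolo has no compiler yet, so the pointer rules above can only be illustrated by analogy. Below is a small Python model written under my own assumptions: Cell, Pointer, identify, same_address and NULL are hypothetical names, not Lolo features, and Python object identity stands in for addresses. It shows that identification (~~) changes what the pointer refers to, while every other use reads or writes the object.

```python
class Cell:
    """A memory cell standing in for a heap or stack variable (illustrative only)."""

    def __init__(self, value):
        self.value = value


# One shared cell models Lolo's null: every null pointer refers to the same "address".
NULL = Cell(None)


class Pointer:
    """Hypothetical model of a Lolo pointer: all reads and writes go to the target object."""

    def __init__(self):
        self._target = NULL  # assumption: a freshly declared pointer starts out null

    def identify(self, cell):
        # Models "~~": the only operation that changes the pointer itself.
        self._target = cell

    @property
    def value(self):
        # Models the implicit dereference that happens whenever the pointer is used.
        return self._target.value

    @value.setter
    def value(self, new_value):
        # Models "~" through a pointer: the object is overwritten, the pointer is not moved.
        self._target.value = new_value

    def same_address(self, cell):
        # Models "==": compares addresses (object identity), not values.
        return self._target is cell


# int # pointer ~~ new int(5);  int variable ~ pointer + 7
pointer = Pointer()
pointer.identify(Cell(5))
variable = pointer.value + 7  # 12

# int variable ~ 5;  int # pointer ~~ variable
stack_var = Cell(5)
alias = Pointer()
alias.identify(stack_var)
alias.value = 8  # writes through to stack_var

# int # pointer ~~ null
null_ptr = Pointer()
null_ptr.identify(NULL)
```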
Thus, pointers can only be identified with allocated memory or with null, and they cannot be moved anywhere else. These measures still do not fully protect against segmentation faults, though. To get one, just do this:
int # pointer1 ~~ new int(5)
int # pointer2 ~~ pointer1
delete pointer1
int variable ~ pointer2 ;; segmentation fault!
I think this is self-explanatory. But such a mistake can only be made on purpose, and with some effort: delete does the same job as the garbage collector, only less safely. Speaking of the collector...
Garbage collector
A garbage collector is a garbage collector, in Lolo as anywhere else. There is probably no need to explain what it is. I will only say that it can be disabled with a special compiler option. Once you have tested the program with the collector and everything works as it should, you can set the option and try to optimize the program with manual memory management.
Built-in Arrays
Although I said that Lolo's semantics are borrowed from C, the differences are quite significant. Here, arrays are not pointers. Arrays have their own syntax and safe addressing. No, not with a bounds check. With them it is hard to get a runtime error at all. That is because each array stores its length in a size field, as Java arrays do, and every index is reduced ... modulo that size! A silly decision at first glance, until we look at negative indices. The remainder of dividing -1 by the array length (taken as non-negative, unlike the result of C's % operator) is size-1, that is, the very last element. With this maneuver we can index not only from the beginning but also from the end of the array. Another trick: any primitive type can be cast to a byte[] array. But how do you get a runtime error, then? I will leave that to you as an easy riddle.
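The wrap-around indexing can be sketched in Python. LoloArray and its methods are hypothetical names for illustration; the sketch relies on Python's % returning a non-negative remainder for a positive divisor, which is what makes -1 land on the last element.

```python
class LoloArray:
    """Illustrative model of a Lolo array: it knows its size and wraps every index."""

    def __init__(self, items):
        self._items = list(items)
        self.size = len(self._items)  # stored length, like Java's array.length

    def _wrap(self, index):
        # Python's % takes the sign of the divisor, so -1 % size == size - 1.
        # C's % would return -1 here, so a C implementation needs an extra adjustment.
        return index % self.size

    def __getitem__(self, index):
        return self._items[self._wrap(index)]

    def __setitem__(self, index, value):
        self._items[self._wrap(index)] = value

    def __len__(self):
        return self.size

    def __iter__(self):
        # Explicit iterator: __getitem__ never raises IndexError, so the
        # legacy sequence protocol would otherwise loop forever.
        return iter(self._items)


arr = LoloArray([1, 2, 3, 4, 5])
last = arr[-1]    # 5: -1 wraps to size - 1
wrapped = arr[7]  # 3: 7 % 5 == 2
```

In C the same effect needs ((i % n) + n) % n, because C's % keeps the sign of the dividend.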
References
As far as I know, even the current C standard has no references, but Lolo will definitely have them. Perhaps the lack of references in C is one of the main reasons pointers to pointers exist. References are needed to pass arguments by address and to return values from functions without copying. Pointers and arrays can be passed by reference too: an array passed by value is copied in full, and a pointer passed by value does not keep a new target that the function sets with the ~~ operation.
Multithreading
It just keeps getting better; I am already in love with my language. Its next feature is multithreading. Honestly, I have not fully decided which tools it will offer. Most likely there will be a synchronized keyword with all of its Java-style properties and, possibly, a concurrent keyword placed before non-inline functions, meaning “run these functions in parallel threads”.
Built-in strings
Real strings, not just string literals as in C++. Each string will store its own length, and indexing will wrap around by taking the remainder, as with arrays. On the whole, Lolo strings are very similar to character arrays, except that arrays lack concatenation with "+", repetition with "*", and comparison with "<" and ">". And since we are on strings, characters deserve a mention. In Lolo, characters are not numbers as they are in C++. They also occupy not one byte but 4 for DKOTI characters and 6 for UTF characters. I will talk about DKOTI next time; for now, just know that Lolo supports characters and strings in two encodings. And yes, you can even take the length property of a constant:
int len ~ "Hello, world!".length ;; len = 13
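Python's built-in str already offers +, * and <, so a sketch of Lolo-style strings only needs the wrap-around indexing. char_at is a hypothetical helper; note that Python orders strings by code point, and the article does not say which ordering Lolo will use.

```python
def char_at(s: str, index: int) -> str:
    """Hypothetical helper: wrap-around indexing, as described for Lolo strings."""
    return s[index % len(s)]


length = len("Hello, world!")  # 13, like "Hello, world!".length
joined = "Lo" + "lo"           # concatenation with +
repeated = "ab" * 3            # repetition with *
ordered = "apple" < "banana"   # lexicographic comparison with <
```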
Boolean type with three values
Most programming languages that have a Boolean type use two-valued logic. In Lolo it will be three-valued, or more precisely fuzzy three-valued, with the values true, false and none (“nothing”). So far I have not found any operations in the language that return none, but I remember many cases from practice where a three-valued flag would have been very useful and I had to use an enumeration or an integer type instead. Not anymore. The only thing I cannot settle on is the name of this type. The most obvious is “logical”, but it is too long. Other options are “luk” after Jan Łukasiewicz, “brus” after N. P. Brusentsov, and “trit”, although strictly speaking this type is not a trit. So there is a poll at the end of the article.
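The article does not define truth tables for none. Here is a minimal sketch, assuming Kleene's strong three-valued logic, whose and, or and not coincide with those of Łukasiewicz's three-valued logic (the two differ only in implication). Python's None stands in for none, and tri_not, tri_and and tri_or are hypothetical names.

```python
from typing import Optional

# True, False, or None standing in for Lolo's none.
Tri = Optional[bool]


def tri_not(a: Tri) -> Tri:
    return None if a is None else not a


def tri_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # false wins even against none
    if a is None or b is None:
        return None   # otherwise an unknown operand keeps the result unknown
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True   # true wins even against none
    if a is None or b is None:
        return None
    return False
```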
Initializer lists for structures and arrays
If you declare a structure variable and then write ~ followed by square brackets, you can set its fields either in order or as a dictionary of field names. Doing the same with an array sets its cells, but only in order, without a dictionary. There is not much more to say; just look at the code:
struct {
int i;
real r;
str s;
} variable ~ [ i: 5, r: 3.14, s: "Hello!" ]
int[5] arr ~ [ 1, 2, 3, 4, 5 ]
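For comparison, Python's dataclasses accept field values either in declaration order or by name, which mirrors the two initializer forms. Record is a hypothetical name for the anonymous struct above; the article does not say whether Lolo allows fields to be omitted in the dictionary form, so the sketch sets all of them.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical named counterpart of the anonymous struct in the Lolo example."""
    i: int
    r: float
    s: str


by_position = Record(5, 3.14, "Hello!")     # fields in declaration order
by_name = Record(i=5, r=3.14, s="Hello!")   # "dictionary" form: fields by name
arr = [1, 2, 3, 4, 5]                       # arrays: values in order only
```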
Return multiple values from functions
Just like in Go! You can write several variable names separated by commas and assign them all the values returned from the function at once:
int, real function() {
return 5, 3.14
}
byte § {
int i; real r
i, r ~ function
}
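Python supports the same pattern, although it packs the values into a single tuple, whereas Go returns them as separate results; how Lolo will pass them is not specified. A minimal sketch:

```python
def function() -> tuple[int, float]:
    # Both values are packed into a single tuple object.
    return 5, 3.14


# The tuple is unpacked into two variables in one assignment, like "i, r ~ function".
i, r = function()
```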
Modules instead of headers
Nothing to explain here: instead of C-style header files, Lolo uses Java-style modules.
for (auto item: array)
Java again. Since arrays know their length, it would be a sin not to have a for-each loop.
A case statement not just for int
I don't know about you, but in C and C++ it drives me mad that switch-case cannot be used with non-integer variables. The syntax is infuriating too. Pascal is another matter. And now Lolo:
case variable {
"hello", "HELLO": nop
"world": {
nop; nop
}
"WORLD": nop
}
Exponentiation and integer division operators
And this is from Python.
real r ~ 3.14 ** 2
int i ~ r // 3
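Here is what the two operators do in Python, where they come from. I am assuming Lolo keeps Python's floor semantics for //; the article does not say how it rounds negative results.

```python
r = 3.14 ** 2    # exponentiation: about 9.8596
i = int(r // 3)  # // on a float yields the float 3.0; int() turns it into 3

# // rounds toward negative infinity, whereas C's integer "/" truncates toward zero
# (in C, -7 / 2 == -3).
floor_neg = -7 // 2  # -4
```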
Function parameter tuples
Remember that Lolo forbids every pointer operation except identification? Now recall how C accesses parameters from a variable-length parameter list: you take a pointer to the first element and increment it until a check tells you to stop. Lolo has no increment for pointers. But that is fine, because here the parameter list is represented as a tuple of fixed (per-call) length with safe indexing, as in arrays. Its name is "?". Types are checked only for the parameters declared in the function definition. The remaining (“ellipsis”) parameters are converted to the any type, and one careless move makes their behavior undefined. Still, such a tuple is much safer and more convenient than C's macros.
void function(...) {
if (?.size > 1) {
int i ~ ?[0]
real r ~ ?[1]
}
}
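In C you would walk such a list with the va_start, va_arg and va_end macros from <stdarg.h>. Python's *args is the closest everyday analogue of the ? tuple: an ordinary tuple with a per-call length. Unlike Lolo's ?, a Python tuple raises IndexError on an out-of-range index instead of wrapping around.

```python
def function(*args):
    # args is an ordinary tuple whose length is fixed for each call, like Lolo's "?".
    # No types are checked: each element is whatever the caller passed.
    if len(args) > 1:
        i = args[0]
        r = args[1]
        return i, r
    return None
```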
Numeric ranges
Another feature is a family of range types (range, urange, lrange, etc.). A range is written as two integers separated by two dots (..) and can slice a subarray out of an array or a substring out of a string; all in all, a useful thing, I think.
int[5] arr ~ [ 1, 2, 3, 4, 5 ]
int[3] subarr ~ arr[1..3] ;; [ 2, 3, 4 ]
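The example arr[1..3] yielding [ 2, 3, 4 ] shows that both ends of a Lolo range are inclusive, whereas Python slices and range() exclude the upper bound. A small sketch with a hypothetical slice_inclusive helper:

```python
def slice_inclusive(seq, lo, hi):
    """Hypothetical helper: seq[lo..hi] with both ends included."""
    return seq[lo:hi + 1]  # Python slices exclude the upper bound, hence + 1


sub_array = slice_inclusive([1, 2, 3, 4, 5], 1, 3)    # [2, 3, 4]
sub_string = slice_inclusive("Hello, world!", 7, 11)  # "world"
as_values = list(range(1, 3 + 1))                     # the values of 1..3
```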
In operator
From Pascal. It works with strings, arrays, ? tuples and ranges.
int[5] arr ~ [ 1, 2, 3, 4, 5 ]
if (4 in arr) nop
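Python's in behaves much the same on lists, tuples, strings and ranges, with two caveats: for strings it tests for a substring (the article does not say what Lolo will do with a string operand), and range() needs a + 1 to include the upper end that Lolo's .. includes.

```python
arr = [1, 2, 3, 4, 5]
found = 4 in arr                 # membership in a list
sub = "ell" in "Hello"           # for str, Python's in tests for a substring
in_range = 5 in range(1, 5 + 1)  # 1..5 includes 5; range() needs the + 1
```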
Function Parameter Dictionary
Honestly, I am no longer sure what this feature is properly called; it lets you pass a function's arguments explicitly by parameter name:
int pos ~ str_find(string, npos: -1)
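Python calls these keyword arguments. Since the semantics of str_find are not described, the sketch uses a hypothetical open_window function instead; it shows how naming one argument lets a call skip the parameters before it, which only works when those parameters have default values.

```python
def open_window(title: str, width: int = 640, height: int = 480, resizable: bool = True) -> dict:
    """Hypothetical function, used only to illustrate named arguments."""
    return {"title": title, "width": width, "height": height, "resizable": resizable}


# Set only height by name; width and resizable keep their defaults.
w = open_window("Lolo", height=600)
```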
Default parameters
From C++. No example is needed here; everything is clear as it is.
Exceptions
Where would we be without them?
try {
raise SEGMENTATION_FAULT_EXCEPTION
} except (Exception e) {
print(e.rus)
}
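Python's try/except reads almost the same. The e.rus in the Lolo example suggests that exceptions carry messages in several languages; the sketch models that with a hypothetical SegmentationFaultException whose eng and rus attributes are my own assumption.

```python
class SegmentationFaultException(Exception):
    """Hypothetical counterpart of SEGMENTATION_FAULT_EXCEPTION with messages in two languages."""

    def __init__(self):
        self.eng = "Segmentation fault"
        self.rus = "Ошибка сегментации"
        super().__init__(self.eng)


try:
    raise SegmentationFaultException()
except Exception as e:
    message = e.rus  # pick the Russian message, like print(e.rus)
    print(message)
```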
No unconditional jump
Because in 2019, using the GOTO statement is suicidal.
Syntax
Now a few words about syntax. As you may have noticed, the semicolon is optional. Modern languages such as Python and Kotlin do very well without this source of errors. The arrow operator (->) is merged with the dot operator. Parentheses are optional when calling a function without arguments. Strings can be converted to numbers and vice versa. Logical and bitwise operators are merged. There are function modifiers for tabulation. There are also nested functions and type_of. And most importantly, multilingualism: I am going to duplicate the keywords, the properties of strings and arrays, and all identifiers of the standard library in all the languages of international communication, namely English, Russian, Japanese, Chinese, Spanish, Portuguese, Arabic, French, German and Latin.
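Duplicating keywords in many languages is mostly a lexer concern. Here is a minimal sketch of one possible approach, not necessarily the one Lolo will use: a table maps every localized spelling to a single canonical token, so the parser never sees the difference. The keyword set and the Russian spellings are hypothetical examples.

```python
# Hypothetical lexer fragment: every localized spelling maps to one canonical token.
KEYWORDS = {
    "if": "IF", "если": "IF",
    "else": "ELSE", "иначе": "ELSE",
    "while": "WHILE", "пока": "WHILE",
    "return": "RETURN", "вернуть": "RETURN",
}


def classify(word: str) -> str:
    return KEYWORDS.get(word, "IDENT")  # anything not in the table is an identifier


def tokenize(line: str) -> list:
    # Pairs of (canonical kind, original spelling); whitespace splitting only.
    return [(classify(w), w) for w in line.split()]
```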
In fact, all of the above does not cover even half of Lolo's capabilities; I simply cannot recall every feature at once. I will add more as the compiler progresses.
Poll: what should the Boolean type be called?
- logical: 45.2% (19 votes)
- luk: 7.1% (3 votes)
- brus: 14.2% (6 votes)
- trit: 21.4% (9 votes)
- your option in the comments: 19% (8 votes)