Subjective vision of an ideal programming language

    What follows is my point of view. Perhaps it will let someone take a fresh look at programming language design, or see the advantages and disadvantages of specific features. I will not go into small details like "the language should have a while construct"; I will simply describe the general approaches. P.S. I once had the idea of creating my own programming language, but it turned out to be a rather complicated undertaking that I have not yet mastered.


    Impact of previous experience


    I was inspired to write this by another article. Its author came up with his own programming language, and that language, with its syntax and peculiarities, turned out to be suspiciously similar to Free Pascal, the language in which its VM was implemented. And this is no coincidence. The programming languages we have written in before push our thinking into their framework. We may not notice this ourselves, but an outsider with different experience may suggest something unexpected or teach us something new.


    This framework widens somewhat after you master several languages. Then in language A you start wanting a feature from B and vice versa, and you become aware of the strengths and weaknesses of each language.


    For example, when I tried to invent and create my own language, my thoughts ran in a completely different direction from those described in the article above; I thought about completely different things in completely different terms. Below I will describe the features I would like to see in an "ideal" programming language.


    My background: I started with Pascal, later got acquainted with Java, Kotlin, C++, Python, and Scheme, and I consider Scala my main language. As in the case above, my "ideal" language has a lot in common with Scala. At least I am aware of this similarity :)


    The effect of syntax on code style


    "You can write in Fortran in any language"


    It would seem that almost any idea can be expressed in any programming language, so syntax should not matter. But a typical program is written in the simplest and shortest way available, so some features of a language end up dominating over others. Here are some code examples (I have not checked them for correctness; they are just a demonstration of the idea).


    Python:


    filtered_lst = [elem for elem in lst if elem.y > 2]
    filtered_lst = list(filter(lambda elem: elem.y > 2, lst))

    In Python, anonymous functions have a long, heavyweight declaration. It is more convenient to write it as in the first line, although the second option seems algorithmically more elegant.


    Scala:


    val filteredLst = lst.filter(_.y > 2)

    IMHO, this is close to ideal: nothing extra. If Python allowed a shorter lambda declaration, at least something like it => it.y > 2, list comprehensions would hardly be necessary.


    The most interesting thing is that the Scala-style approach scales well to chains of calls such as lst.map(_.x).filter(_ > 0).distinct. We read and write code from left to right, and the elements move along the chain of transformations from left to right as well, which is convenient and natural. In addition, the IDE can give adequate hints based on the code to the left of the cursor.


    In Python, while you are typing [elem for elem in, the IDE has no idea until the very end what type elem has. Large constructions have to be read from right to left, which is why such big constructions are usually not written in Python.


    ... = set(filter(lambda it: it > 2, map(lambda it: it.x, lst)))

    This is terrible!


    An approach like lst.filter(...).map(...) could exist in Python, but it is killed in the bud by dynamic typing and imperfect IDE support: the environment is not always aware of a variable's type. Suggesting that numpy has a max function, on the other hand, is always possible. As a result, most libraries are designed not around an object with a bunch of methods, but around a primitive object and a bunch of functions that accept it and do something with it.


    Another example, this time in Java:


    int x = func();
    final int x = func();

    The constant version is stricter, but it takes up more space, reads worse, and is used far less often than it could be. In Rust, the developers deliberately made the declaration of a mutable variable longer so that programmers would use constants more often.


    let x = 1;
    let mut x = 1;

    It turns out that the syntax of a language really is important and should be as simple and concise as possible. A language should be designed from the start around its most frequently used features. An anti-example is C++, where for historical reasons a class definition is scattered across a pair of files, and the declaration of a simple function may not fit on one line because of words like template, typename, inline, virtual, override, const, constexpr and the no less "short" descriptions of the argument types.


    Static typing


    Looking at static typing in C, one might perhaps conclude that it cannot describe the whole variety of relationships between objects and their types, and that only dynamic typing will save us.


    But that is not so. There is no escape from types: if we do not write them down, it does not mean they are not there. Code written without a rigid framework makes it very easy to produce errors and hard to prove the correctness of what is happening. As the program grows, this only gets worse, and I think nothing really big gets written in dynamic languages.


    There are powerful type systems that allow you to build flexible things while keeping rigor. Besides, nothing forbids you, in a statically typed language, from creating a dynamic island with the help of the same hash map or something similar.
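

    As a rough REPL-style Scala sketch of such an island (my own illustration, not tied to any particular library): a mutable map from string keys to Any gives you a fully "dynamic" bag of values inside otherwise statically typed code, and the price is an explicit cast at the point of use.


    import scala.collection.mutable

    // A "dynamic island": values of arbitrary type keyed by name.
    val bag = mutable.Map[String, Any]()
    bag("port") = 8080
    bag("host") = "localhost"

    // The price of the dynamism: we have to cast when reading the value back.
    val port = bag("port").asInstanceOf[Int]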


    And while an island of dynamic objects can be made in a static program without problems, the reverse does not work. Third-party code written without any regard for types cannot always be correctly described with the existing Python tooling. (For example, a function that is given some tricky parameter may return a different type than it does without that parameter.)


    So static typing is a must-have for a modern language. Among its advantages are the greater rigor of the program, error detection at compile time, and more room for code optimization and analysis. Among the minuses: types sometimes have to be written out, but type inference reduces this problem.


    Unit, void, and the difference between a function and a procedure


    In Pascal/Delphi there is a division into procedures (which return nothing) and functions (which return something). But nothing forbids us from calling a function and ignoring its return value. Hmm. So what is the difference between a function and a procedure? None at all; it is inertia of thinking, a peculiar legacy that has crept into Java, C++ and a bunch of other languages. You say: "there is void!" But the problem is that void in them is not quite a type, and once you get into templates or generics this difference becomes noticeable. For example, in Java HashSet<T> is implemented as HashMap<T, Boolean>. The Boolean there is just a stub, a crutch: it is not actually needed, since HashMap requires no value to say that a key is absent. In C/C++ there are also nuances with sizeof(void).


    So an ideal language should have a type Unit, which occupies 0 bytes and has exactly one value (it does not matter which; there is only one, and if you have a Unit, that is it). It should be a full-fledged type; then the compiler becomes simpler and the design of the language more beautiful and logical. In an ideal language it would be possible to implement HashSet<T> as HashMap<T, Unit> without any overhead for storing unnecessary objects.
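

    A minimal Scala sketch of the idea (the class and method names are mine): a set built on top of a hash map, with Unit standing in for "the key is present". Note that on the JVM the Unit value is still represented by a boxed object, so this only illustrates the design, not the zero-byte promise of an ideal language.


    import scala.collection.mutable

    // Hypothetical UnitSet: the presence of a key is the whole payload.
    class UnitSet[T] {
      private val underlying = mutable.HashMap[T, Unit]()

      def add(value: T): Unit         = underlying.put(value, ())
      def contains(value: T): Boolean = underlying.contains(value)
      def remove(value: T): Unit      = underlying.remove(value)
    }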


    Tuples


    Here we have another piece of historical legacy, probably from mathematics: functions can take many values but return only one. What kind of asymmetry is that?! Most languages do it this way, which leads to the following problems:


    • Functions with a variable number of arguments require special syntax, which makes the language more complicated and a universal proxy function harder to write.
    • To return multiple values at once, you have to declare a special structure or pass output arguments by reference. That is inconvenient.

    The funny thing is that from the hardware's point of view there are no such limitations: just as we lay out the arguments in registers or on the stack, we can do the same with the returned values.


    There are some steps in this direction, such as std::tuple in C++, but it seems to me that this should not live in the standard library; it should exist directly in the type system of the language and be written, for example, as (T1, T2). (By the way, Unit can be viewed as a tuple with no elements.) A function signature should then be described as T => U, where T and U are some types; either of them may be Unit or a tuple. Frankly, I am surprised that most languages do not have this. Apparently, inertia of thinking again.
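

    Scala comes fairly close to this: tuple types are written (T1, T2) and function types as A => B, so returning several values needs no helper struct. A small REPL-style sketch (divMod is my own example name):


    // Return two values at once as a tuple and destructure them at the call site.
    def divMod(a: Int, b: Int): (Int, Int) = (a / b, a % b)

    val (quotient, remainder) = divMod(17, 5)   // (3, 2)

    // A function type written over tuples: (Int, String) in, (String, Int) out.
    val swap: ((Int, String)) => (String, Int) = { case (n, s) => (s, n) }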


    Since we can always return Unit, we can completely abandon the division into expressions and statements and make every construction in the language return something. This is already implemented in relatively young languages like Scala, Kotlin and Rust, and it is convenient.


    val a = 10 * 24 * 60 * 60
    val b = {
        val secondsInDay = 24 * 60 * 60
        val daysCount = 10
        daysCount * secondsInDay
    }

    Enums, Union and Tagged Union


    This feature is higher-level, but it seems to me that it is also needed, so that programmers do not suffer from null pointer errors or from returning (value, error) pairs as in Go.


    First, the language must support a lightweight declaration of enum types. It is desirable that at runtime they turn into ordinary numbers and carry no extra overhead. Otherwise we get the usual pain and sadness where some functions return 0 on success or an error code otherwise, while others return true (1) on success and false (0) on failure. Ideally, declaring an enum type should be so short that the programmer can write directly in the function signature that it returns something like success | fail or ok | failReason1 | failReason2.


    In addition, enumeration types whose variants can carry values are very convenient: for example, ok | error(code) or Pointer[MyAwesomeClass] | null. This approach avoids heaps of errors in the code.
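

    A REPL-style Scala sketch of how "ok | error(code)" can be approximated today with a sealed hierarchy (the names are mine); a dedicated sum-type syntax in an ideal language would be much terser:


    sealed trait OpenResult
    case object Ok extends OpenResult
    final case class Error(code: Int) extends OpenResult

    def open(path: String): OpenResult =
      if (path.nonEmpty) Ok else Error(code = 2)

    open("") match {
      case Ok          => println("opened")
      case Error(code) => println(s"failed with code $code")
    }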


    In general terms these can be called sum types: a value holds one of several alternatives. The difference between a plain union and a tagged union is what happens when the alternatives coincide, for example int | int. From the point of view of a plain union, int | int == int, since we have an int in either case (that is essentially how union in C behaves). A tagged union int | int additionally stores information about which of the two ints we have, the first or the second.
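

    To illustrate the difference, a REPL-style Scala 3 sketch: a plain union type collapses identical alternatives, while Either, playing the role of a tagged union, remembers which side the value came from.


    val plain: Int | Int = 42                // Int | Int is just Int
    val tagged: Either[Int, Int] = Left(42)  // "the first of the two ints"

    tagged match {
      case Left(n)  => println(s"first int: $n")
      case Right(n) => println(s"second int: $n")
    }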


    A small digression


    Generally speaking, if we take tuples and sum types (unions), we can draw an analogy between them and arithmetic operations.


    List(x) = Unit | (x, List(x))

    Well, almost like lists in Lisp.
    If we replace the sum type with addition (it is called a sum for good reason) and interpret the tuple as a product, we get:


    f(x) = 1 + x * f(x)

    Or, in other words, f(x) = 1 + x + x*x + x*x*x + ..., but from the point of view of product types (tuples) and sum types it looks like


    List(x) = Unit | (x, Unit) | (x, (x, Unit)) | ...  = Unit | x | (x, x) | (x, x, x) | ...

    A list of x is an empty list, or a single element x, or a tuple of two, or ...
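

    The same equation written as an algebraic data type in Scala (a sketch; MyList is a made-up name): the empty case plays the role of Unit, and the non-empty case is a pair of a head and the rest of the list.


    sealed trait MyList[+A]
    case object Empty extends MyList[Nothing]
    final case class Cons[A](head: A, tail: MyList[A]) extends MyList[A]

    val threeOnes: MyList[Int] = Cons(1, Cons(1, Cons(1, Empty)))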


    It can be said that (x, Unit) == x; the analogy in the world of numbers is x * 1 = x. Likewise, (x, (x, (x, Unit))) can be flattened into (x, x, x).


    Unfortunately, it follows that theorems about ordinary numbers can be expressed in the language of types, and just as such theorems are not always easy to prove (if they can be proved at all), relationships between types can also be quite complex. Perhaps that is why such possibilities are severely limited in real languages. Still, this is not an insurmountable obstacle: the template language in C++, for example, is Turing-complete, which does not prevent the compiler from digesting reasonably written code.


    In short, a language needs sum types, and it needs them right in the type system so that they combine naturally with product types (tuples). That leaves plenty of room for type transformations such as (A, B | C) == (A, B) | (A, C).


    Constants


    It may sound unexpected, but immutability can be understood in different ways. I see as many as four degrees of mutability:


    1. an ordinary mutable variable;
    2. a variable that "we" cannot change but which is mutable in general (for example, a container passed to a function by constant reference);
    3. a variable that has been initialized and will never change again;
    4. a constant whose value can be computed right at compile time.

    The difference between points 2 and 3 is not entirely obvious, so here is an example: suppose in C++ we are handed a pointer to const memory holding an object. If we save this pointer somewhere inside our class, we have no guarantee that the contents of the memory behind it will not change during the lifetime of our object.
    In some cases we need exactly the third kind of immutability, for example when reading an object from several threads, or when precomputing something based on the properties of the finished object. It is also this third kind that allows the compiler to perform some clever optimizations. An example of its use is a final field in Java.


    Personally, it seems to me that the nuances of the first two degrees can be handled with interfaces and present or absent getters/setters. For example, suppose we have an immutable object that holds a pointer to mutable memory. We may well want several "interfaces" for using such an object: one that forbids changing only the object itself, and one that, say, also closes off access to the external memory.
    (As you might guess, I have been influenced by JVM languages, which have no const.)


    Computation at compile time is also a very interesting topic. In my opinion, the most beautiful approach is the one used in D: you write something like static value = func(42); and a perfectly ordinary function is simply evaluated during compilation.


    Kotlin tricks


    If you have ever used Gradle, then perhaps, looking at a broken build file, you have been visited by a thought like "wtf? What am I supposed to do with this?"


    android {
        compileSdkVersion 28
    }

    This is just Groovy code. The android object simply accepts the closure { compileSdkVersion 28 }, and somewhere in the wilds of the Android plugin that closure is given an object in whose context it will actually run. The problem here is the dynamism of Groovy: the IDE has no idea which fields and methods are available inside our closure and cannot highlight errors.


    Kotlin, however, has clever receiver types, and the same thing could be implemented roughly like this:


    class UnderlyingAndroid {
        var compileSdkVersion: Int = 42
    }

    fun android(func: UnderlyingAndroid.() -> Unit) { /* ... */ }

    Right in the function signature we say that we accept something that works with the fields and methods of UnderlyingAndroid, and the IDE will immediately highlight errors.


    One could say that this is all syntactic sugar and that we could instead write it like this:


    android { it ->
        it.compileSdkVersion = 28
    }

    But that is ugly! And what if several such blocks are nested inside each other? The Kotlin approach plus static types allows for very concise and convenient DSLs. Hopefully, sooner or later all of Gradle will be rewritten in Kotlin and usability will improve significantly. I would like to have such a feature in my language, although it is not critical.


    The same goes for extension methods: syntactic sugar, but quite convenient. You do not have to be the author of a class to add another method to it. They can also be nested inside some scope and thus not clutter the global namespace. Another interesting application is hanging such methods on existing collections. For example, if a collection contains objects of type T that can be added to one another, you can give the collection a sum method that exists only when T allows it.
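

    A REPL-style sketch of that last example in Scala 3 (in Scala 2 an implicit class would play the same role; sumAll is my own name, chosen to avoid clashing with the standard sum): the method exists only for element types with a Numeric instance, i.e. types that can be added together.


    extension [T](xs: Seq[T])(using num: Numeric[T])
      def sumAll: T = xs.foldLeft(num.zero)(num.plus)

    Seq(1, 2, 3).sumAll      // 6
    // Seq("a", "b").sumAll  // does not compile: no Numeric[String] in scope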


    Call-by-name semantics


    This is again syntactic sugar, but it is convenient and in addition lets you write lazy computations. For example, in code like map.getOrElse(key, new Smth()) the second argument is not taken by value, so the new object is created only if the key is missing from the table. Likewise, calls like assert(cond, makeLogMessage()) look much nicer and are more pleasant to use.
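

    A REPL-style sketch of by-name parameters in Scala (the function names are mine): the message argument is not evaluated unless the assertion actually fails.


    def assertLog(cond: Boolean, msg: => String): Unit =
      if (!cond) println(s"assertion failed: $msg")

    def makeLogMessage(): String = {
      println("building an expensive message")  // runs only when cond is false
      "something went wrong"
    }

    assertLog(cond = true, makeLogMessage())    // the message is never built
    assertLog(cond = false, makeLogMessage())   // the message is built and printed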


    Moreover, nothing forces the compiler to build an actual anonymous function: it can, for example, inline the assert function, which then turns into just if (!cond) { log(makeLogMessage()) }, which is also not bad.


    I will not say this is a must-have language feature, but it clearly deserves attention.


    Co-, contra-, in- and non-variance of template parameters


    All of this is needed. "Input" types may be widened, "output" types may be narrowed, some types allow neither, and some can be ignored altogether. IMHO, a modern language needs support for this directly in its type system.
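

    A sketch of declaration-site variance in Scala (the Producer/Consumer traits are my own illustration): a producer may be treated as producing a wider type, and a consumer as consuming a narrower one.


    class Animal
    class Dog extends Animal

    trait Producer[+A] { def produce(): A }             // covariant "output" type
    trait Consumer[-A] { def consume(value: A): Unit }  // contravariant "input" type

    val dogs: Producer[Dog] = () => new Dog
    val animals: Producer[Animal] = dogs        // fine: every Dog is an Animal

    val animalSink: Consumer[Animal] = a => ()
    val dogSink: Consumer[Dog] = animalSink     // fine: it can certainly consume Dogs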


    Explicit Implicit Conversions


    On the one hand, implicit conversions from one type to another can lead to errors. On the other hand, the experience of the same Kotlin shows that writing conversions explicitly gets rather tedious. IMHO, the language should ideally let you explicitly allow specific implicit conversions, so that they are used consciously and only where needed: for example, the conversion from int to long.
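

    A REPL-style Scala sketch (Meters is a made-up type): the conversion is applied implicitly at the call site, but only because it was explicitly declared and explicitly enabled, so it stays an opt-in feature.


    import scala.language.implicitConversions

    final case class Meters(value: Double)

    implicit def doubleToMeters(x: Double): Meters = Meters(x)

    def describe(distance: Meters): String = s"${distance.value} m"

    describe(3.5)   // converted implicitly, because we allowed it above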


    Where to store object types?


    You can choose not to store them at all: in C, for example, all types are known at compile time, and at runtime this information no longer exists. You can store the type together with the object (this is done in languages with virtual machines, and for classes with virtual methods in C++). Personally, I find a third approach more interesting, where the type (a pointer to the method table) is stored directly in the pointer itself.


    Values, references, pointers


    The language should hide implementation details from the programmer. In C++ there are problems when writing templates, because the template parameter T can turn out to be some unexpected kind of type: a value, a pointer, a reference or an rvalue reference, some of them seasoned with const. I cannot say how this should be done, but I can clearly see how it should not be. Something close to ideal in terms of convenience exists in Scala and Kotlin, where primitive types "pretend" to be objects, so everything we work with looks uniform and burdens neither the programmer's brain nor the syntax of the language.


    Minimum of entities


    This is what I dislike about C#: a lot of things were dragged into the language, they combine with each other rather strangely, and the complexity of the language grows. (I may be very wrong about the details, since I wrote C# a long time ago and only for Unity.) For example, there are class fields, properties and methods. Three entities! They do not mix well: you can declare several methods with the same name but different signatures, yet for some reason you cannot declare a property with that name. Or, if an interface requires a property, you cannot simply declare a field in the class; it has to be a property.


    In Kotlin/Scala this is done better: all fields are private and are accessed from outside through generated getters and setters. Technically these are just methods with special names and can be overridden at any time. That is all; no lopsidedness.


    Another example is the word inline in C++/Kotlin. Do not drag it into the language! In both cases inline changes the logic of compilation and code execution, and people start writing inline not for the sake of inlining itself, but for the ability to define a function in a header (C++) or to perform a non-local return from a lambda out of the enclosing function (Kotlin). Then __forceinline, noinline and crossinline appear in the language, each affecting some nuance and complicating the language further. It seems to me that a language should be as flexible and simple as possible, and things like inline could be annotations that do not affect execution logic and merely help the compiler.


    Macros


    The language should have macros that take a syntax tree as input and transform it. For dull, repetitive code, macros can protect against errors and make the code several times shorter. Unfortunately, serious languages like C++ also have a complex syntax with a bunch of nuances, which is probably why proper macros have not appeared there yet. In languages like Lisp and Scheme, where the program itself is suspiciously similar to a list, writing macros poses no great problem.


    Functions inside functions


    A flat structure is sad. If something is used in only one or two places, why not make it as local as possible, for example by allowing local functions or classes to be declared inside functions? It is simply convenient: the namespace is not cluttered, and when a function's code is removed, its "internal" details disappear with it.
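

    A tiny Scala sketch (render and bullet are made-up names): the helper is visible only inside the function that needs it and disappears together with it.


    def render(items: List[String]): String = {
      def bullet(s: String): String = s"  * $s"   // local helper, not exported anywhere
      items.map(bullet).mkString("\n")
    }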


    Substructural type system


    It is possible to build a type system that imposes restrictions on how values are used: for example, a variable may be used exactly once, or no more than once.
    Why is this useful? Move semantics and the idea of ownership rest on the fact that ownership of an object can be given away only once. Besides, any object with internal state implies a certain usage scenario: for example, we first open a file, read or write something, and then close it. Today the state of the file rests on the programmer's conscience, although operations on it could (in theory) be pushed into the type system, eliminating a class of errors.


    Some specific applications, such as object ownership, are needed right now; others will become popular once the feature appears in mainstream languages.


    Dependent types


    On the one hand, they look very interesting and promising, but I have no idea how to implement them. Of course I would like to have clever types such as "a list of at least one element" or "a number divisible by 5 but not by 3", but I have little idea how such properties could be proved in a reasonably complex program.


    Building


    First, the language must be able to work without the standard library. Second, the library should consist of separate pieces (possibly with dependencies between them) so that, if desired, only some of them can be included. Third, a modern language should come with a convenient build system (in C++ this is pain and sadness).
    Functions, variables and classes are there to describe the computation; they are not things that must all be stuffed into the binary. The pieces that need to be exported can be annotated somehow, and everything else should be handed to the compiler to do with the code whatever it likes.


    Conclusions


    So, in my opinion, an ideal programming language should have:


    • a powerful type system supporting, from the very beginning:
      • union types and tuples
      • constraints on template parameters and their relationships with each other
      • perhaps something exotic like linear or even dependent types
    • convenient syntax that is:
      • laconic
      • nudging you to write in a declarative style and to use constants
      • uniform for value types and pointer (reference) types
      • as simple and flexible as possible
      • designed with the IDE's capabilities in mind, for convenient interaction with it
    • a bit of syntactic sugar such as extension methods and lazy function arguments
    • the ability to move part of the computation to compile time
    • macros that work directly with the AST
    • convenient accompanying tooling

    I have not considered ambiguous features such as the presence or absence of a garbage collector, because in real life languages are needed both with and without one.


    I am not sure that everything on my wishlist combines well together. I have tried to describe my vision of a programming language I would like to write code in.


    Most likely your views differ; it will be interesting to read the comments.

