Reliable programming from the point of view of languages: a newbie's overview. Part 1

    After once again spending two days writing and debugging a mere four hundred lines of system library code, I caught myself thinking: “it would be nice if programs could be written in some less painful way”.

    First of all, since debugging takes far more time than writing the code, you need foolproofing (against yourself too) at the writing stage. And I would like to get it from the programming language (PL) in use.

    Of course, we must invent a new PL, the best one ever!
    No, first we will try to state our wishes and look at what has already been invented.

    So, what we would like to get:

    • Resistance to human error, with ambiguities eliminated at compile time
    • Resistance to input data
    • Resistance to damage to the program or data: media failure, hacking
    • All of this with a more or less tolerable syntax and feature set.

    The target application area is machinery, transport, industrial control systems, IoT and embedded systems, phones included.

    The Web hardly needs this; it is built (for now) on the “drop it and restart” (fire and forget) principle.

    You quickly come to the conclusion that the language should be compiled (at least to P-code), so that as many checks as possible are done at compile time, without the VS (“versus”, used from here on to mark a negative counterexample) “oh, this object has no such property” at runtime. Even scripting of interface descriptions already makes full test coverage of those scripts necessary.

    Personally, I do not want to pay for interpretation with my own resources (including money for faster, more expensive hardware), so JIT compilation should be kept to a minimum as well.

    So, the requirements.

    Human error tolerance


    After diligently going through the tomes published by PVS-Studio, I found that the most common errors are typos and copy-paste that was not fully edited.

    I will add a few more incidents from my own experience, and some found in various literature, as negative examples. I also refreshed my memory of the MISRA C rules.

    After some thought, I concluded that linters applied after the fact suffer from survivorship bias, since the serious errors in old projects have already been fixed.

    a) Getting rid of similar names

    - the visibility of variables and functions should be checked strictly. With a typo, you can end up using an identifier from an enclosing scope instead of the intended one
    - use case-insensitive names. (VS) “Let's give a function the same name as a variable, only in CamelCase” and then compare it with something: C allows this (you get the address of the function, which is just a number)
    - names that differ by one letter should trigger a warning (the IDE could highlight them), although a very common copy-paste error involves exactly such names: .x, .y, .w, .h.
    - different entities must not share a name: if there is a constant with a given name, there must be no variable or type with the same name
    - it is highly desirable to check names across all modules of a project; they are easy to mix up, especially when different people write different modules

    b) Since modules came up: there should be modularity, preferably hierarchical. (VS) A project of 12,000 files in a single directory is search hell.
    Modularity is also needed for describing the data exchange structures shared between different parts (modules, programs) of one project. (VS) I have run into an error caused by different alignment of numbers in an exchange structure on the receiver and on the transmitter.

    - Rule out duplicate definitions at link time.
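
    The alignment error described above usually comes from sending a struct's in-memory image as is. A minimal sketch of one way around it, assuming a hypothetical message with an 8-bit id and a 32-bit value sent little-endian; the names Sample, SAMPLE_WIRE_SIZE, sample_encode and sample_decode are illustrative only:

```c
#include <stdint.h>

/* Hypothetical exchange message: in memory the compiler may insert padding
   between the fields, so sizeof(Sample) is usually larger than 5. */
typedef struct {
    uint8_t  id;
    uint32_t value;
} Sample;

/* The wire size is fixed by the protocol, not by the compiler. */
enum { SAMPLE_WIRE_SIZE = 5 };

/* Write the fields one by one, little-endian, with no padding. */
void sample_encode(const Sample *s, uint8_t out[SAMPLE_WIRE_SIZE])
{
    out[0] = s->id;
    out[1] = (uint8_t)(s->value & 0xFFu);
    out[2] = (uint8_t)((s->value >> 8) & 0xFFu);
    out[3] = (uint8_t)((s->value >> 16) & 0xFFu);
    out[4] = (uint8_t)((s->value >> 24) & 0xFFu);
}

/* Rebuild the struct from the same fixed layout on the receiving side. */
void sample_decode(const uint8_t in[SAMPLE_WIRE_SIZE], Sample *s)
{
    s->id = in[0];
    s->value = (uint32_t)in[1]
             | ((uint32_t)in[2] << 8)
             | ((uint32_t)in[3] << 16)
             | ((uint32_t)in[4] << 24);
}
```

    Both sides then agree on five bytes regardless of compiler, target or packing options; keeping the encoder and decoder in one shared module is what the modularity requirement is about.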

    c) Ambiguities
    - There must be a defined order of function calls. When someone writes X = funcA() + fB() or Fn(funcA(), fB(), callC()), it should be understood that the person expects the calls to be made in the order written, (VS) and not in whatever order the optimizer came up with.
    - Rule out similar-looking operators. Not like in C: + ++, < <<, | ||, & &&, = ==
    - It is advisable to have few operators, all clear and with obvious precedence. Greetings from the ternary operator.
    - Operator overloading is rather harmful. You write i := 2, but (VS) in reality this implicitly creates an object for which there is not enough memory, the disk fails while swapping, and your satellite crashes into Mars :-(

    In fact, from personal experience: I once watched a crash on the line ConnectionString = “DSN”; it turned out to be a setter that opened the database (and the server was not visible on the network).
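
    The call-order point above can be illustrated in C, where the order in which function arguments and the operands of + are evaluated is unspecified. A minimal sketch, reusing the names funcA, fB, callC and Fn from the text with made-up bodies; the call log, ordered_call and last_call_order are assumptions added only to make the order observable:

```c
#include <stddef.h>

/* Hypothetical call log so the order of calls can be observed. */
static char call_log[8];
static size_t call_count;

static void record(char tag)
{
    if (call_count < sizeof call_log - 1) {
        call_log[call_count++] = tag;
        call_log[call_count] = '\0';
    }
}

static int funcA(void) { record('A'); return 1; }
static int fB(void)    { record('B'); return 2; }
static int callC(void) { record('C'); return 3; }

static int Fn(int a, int b, int c) { return a * 100 + b * 10 + c; }

/* Writing Fn(funcA(), fB(), callC()) directly leaves the call order to the
   compiler. Separate statements are sequenced, so the calls below always
   happen in the written order A, B, C. */
int ordered_call(void)
{
    call_count = 0;
    call_log[0] = '\0';
    int a = funcA();
    int b = fB();
    int c = callC();
    return Fn(a, b, c);
}

/* Returns the tags of the calls made by the last ordered_call(). */
const char *last_call_order(void) { return call_log; }
```

    A language built for reliability would give the one-line form the same guarantee, so the temporaries would not be needed.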

    - All variables need to be initialized with default values.
    - The OOP approach also saves you from forgetting to assign all of an object's fields in some newly added function.
    - The type system must be safe: the sizes of assigned objects need to be checked, which gives protection against memory overwrites, arithmetic overflow such as 65535 + 1, and loss of precision and significance in casts; comparing the incomparable must be ruled out, since the integer 2 is not equal to 2.0 in the general case.

    And even an ordinary division by 0 may yield a perfectly well-defined +INF instead of an error; the result needs a precise definition.
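
    What a “precisely defined result” for overflow and division might look like can be sketched in C with explicit checks. The function names checked_add_u16 and checked_div_i32 are hypothetical, and returning a success flag is just one possible convention:

```c
#include <stdbool.h>
#include <stdint.h>

/* Add two 16-bit values; report overflow instead of silently wrapping
   65535 + 1 to 0. The result is written only on success. */
bool checked_add_u16(uint16_t a, uint16_t b, uint16_t *out)
{
    if (a > UINT16_MAX - b) {
        return false;
    }
    *out = (uint16_t)(a + b);
    return true;
}

/* Integer division with an explicitly defined outcome: division by zero and
   INT32_MIN / -1 (which overflows) are reported, never executed. */
bool checked_div_i32(int32_t a, int32_t b, int32_t *out)
{
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        return false;
    }
    *out = a / b;
    return true;
}
```

    The caller is forced to decide what happens in the failure case; a safe language would perform the same checks without relying on the programmer to remember them.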

    Resistance to input data


    - The program should work on any input data and, preferably, in roughly the same time. (VS) Hello, Android, with a response to the call button of anywhere from 0.2 s to 5 s; it is a good thing that Android does not control automotive ABS.

    For example, the program must correctly process both 1 KB and 1 TB of data without exhausting system resources.
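
    One common way to keep resource use independent of input size is to process the data in fixed-size chunks. A minimal sketch in C, assuming the input is available as a standard FILE stream; the function name stream_checksum, the 4096-byte chunk size and the byte-sum “checksum” are illustrative choices:

```c
#include <stdint.h>
#include <stdio.h>

enum { CHUNK_SIZE = 4096 };

/* Sum all bytes of a stream modulo 2^32 using one fixed-size buffer, so
   memory use does not depend on the input length. Returns 0 on success and
   -1 on a read error. Not reentrant because the buffer is static. */
int stream_checksum(FILE *in, uint32_t *sum_out)
{
    static unsigned char chunk[CHUNK_SIZE];
    uint32_t sum = 0;
    size_t n;

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            sum += chunk[i];   /* unsigned arithmetic wraps by definition */
        }
    }
    if (ferror(in)) {
        return -1;
    }
    *sum_out = sum;
    return 0;
}
```

    The same loop handles a kilobyte or a terabyte; only the running time grows with the input.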

    - It is highly desirable for the PL to have RAII and reliable, unambiguous error handling that causes no side effects (resource leaks, for example). (VS) A handle leak is great fun: it may show up only after many months.
    - Protection against stack overflow would be nice: prohibit recursion.
    - The problem of exceeding the available memory, and of uncontrolled growth in consumption due to fragmentation under dynamic allocation and release. If the language has a runtime that depends on the heap, things are most likely bad: hello, STL and Phobos. (VS) There was a story with the old C runtime from Microsoft, which failed to return memory to the system properly, because of which msbackup crashed on large (for that time) volumes.
    - We need good, safe string handling that is not limited by resources. This depends heavily on the implementation (immutable, COW, R/W arrays)
    - System response time being exceeded for reasons outside the programmer's control. This is the typical garbage collector problem. Although collectors save you from some programming errors, they introduce others that are hard to diagnose.
    - For a certain class of tasks, it turns out, you can do without dynamic memory altogether, or allocate it once at startup.
    - Check for out-of-bounds array access; it is quite acceptable to log a runtime warning and ignore the access. Very often these errors are not critical.
    - Have protection against access to memory the program has not initialized, including the null region, and to a foreign address space.
    - Interpreters and JIT: extra layers reduce reliability, and there are problems with garbage collection (a very complex subsystem that will bring in errors of its own) and with guaranteed response time. We rule them out, although in principle there are Java Micro Edition (where so much has been cut out of Java that very little is left; there was an interesting article by dernasherbrezon (sorry shot)) and the .NET Micro Framework with C#.

    However, on closer examination these options fell away:

    • .NET Micro turned out to be an ordinary interpreter (ruled out on speed);
    • Java Micro is only suitable for deployed applications, since its API is cut down too far; you would have to move at least to SE Embedded, which has already been closed down, or to regular Java, which is too monstrous and has unpredictable response times.
      However, there are still options, and although this does not look like the makings of a workable foundation, it can be compared with other languages, even obsolete ones or ones with certain drawbacks.
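
    Returning to the points above about allocating memory once at startup and checking array bounds, here is a minimal sketch in C that combines the two. The names readings_set, readings_get and readings_out_of_range and the capacity of 16 are hypothetical; the “log a warning and ignore” policy follows the text:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum { READINGS_CAPACITY = 16 };

/* All storage is reserved statically at program start; no heap is used. */
static int readings[READINGS_CAPACITY];
static unsigned out_of_range_count;

/* Store a value; an out-of-range index is logged and ignored instead of
   overwriting neighbouring memory. */
bool readings_set(size_t index, int value)
{
    if (index >= READINGS_CAPACITY) {
        out_of_range_count++;
        fprintf(stderr, "warning: index %zu out of range\n", index);
        return false;
    }
    readings[index] = value;
    return true;
}

/* Read a value; an out-of-range index yields the caller-supplied fallback. */
int readings_get(size_t index, int fallback)
{
    if (index >= READINGS_CAPACITY) {
        out_of_range_count++;
        fprintf(stderr, "warning: index %zu out of range\n", index);
        return fallback;
    }
    return readings[index];
}

/* Number of rejected accesses so far, for diagnostics. */
unsigned readings_out_of_range(void) { return out_of_range_count; }
```

    Memory consumption is known at link time, and a bad index costs one log line instead of a corrupted neighbour.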


    - Robustness under multithreading: protection of thread-private data, and mechanisms for guaranteed exchange between threads. A program with 200 threads may behave quite differently from one with two.
    - Contract programming plus built-in unit tests also help you sleep much more soundly.
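
    Contracts and built-in tests can be approximated even in plain C. A minimal sketch using the standard assert() macro; REQUIRE, ENSURE, clamp and clamp_selftest are hypothetical names, and languages with native support express the same idea declaratively:

```c
#include <assert.h>

/* Hypothetical contract macros built on the standard assert(). */
#define REQUIRE(cond) assert(cond)   /* precondition  */
#define ENSURE(cond)  assert(cond)   /* postcondition */

/* Clamp value into [lo, hi]. Contract: lo <= hi on entry; the result lies
   within [lo, hi] on exit. */
int clamp(int value, int lo, int hi)
{
    REQUIRE(lo <= hi);
    int result = value;
    if (result < lo) {
        result = lo;
    }
    if (result > hi) {
        result = hi;
    }
    ENSURE(result >= lo && result <= hi);
    return result;
}

/* A unit test kept next to the code it checks; returns the number of
   failed checks. */
int clamp_selftest(void)
{
    int failures = 0;
    failures += clamp(5, 0, 10) != 5;
    failures += clamp(-3, 0, 10) != 0;
    failures += clamp(42, 0, 10) != 10;
    return failures;
}
```

    Note that defining NDEBUG removes assert() checks, so in this sketch the contracts only guard debug builds.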

    Resistance to damage to the program or data - media failure, hacking


    - The program must be loaded into memory in full, without loading modules on demand, especially remotely.
    - Memory is cleared on release (and not only on allocation).
    - Check for overflow of the stack and of variable storage areas, especially strings.
    - Restart after failure.
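
    Clearing memory on release can be sketched in C as follows. Writing through a volatile pointer is a widely used idiom rather than a guarantee from the language standard, and the helper names wipe and wipe_and_free are hypothetical:

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite a buffer with zeros through a volatile pointer, so the compiler
   is less likely to drop the stores as dead writes before a free(). */
void wipe(void *p, size_t n)
{
    volatile unsigned char *bytes = (volatile unsigned char *)p;
    for (size_t i = 0; i < n; i++) {
        bytes[i] = 0;
    }
}

/* Hypothetical release helper: wipe first, then free, then clear the
   caller's pointer so a stale pointer cannot be reused by accident. */
void wipe_and_free(void **pp, size_t n)
{
    if (pp == NULL || *pp == NULL) {
        return;
    }
    wipe(*pp, n);
    free(*pp);
    *pp = NULL;
}
```

    A runtime that does this on every release removes one more thing the programmer can forget.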

    By the way, I find very appealing the approach where the runtime keeps its own log, instead of merely showing that everything has gone belly up.

    Languages and the compliance table


    As a first pass, we take for analysis PLs that were specifically designed to be safe:

    1. Active Oberon
    2. Ada
    3. BetterC (a subset of D)
    4. IEC 61131-3 ST
    5. Safe-C

    And go through them against the criteria above.

    But that is already enough material for a follow-up article, karma permitting.

    There the factors listed above will be laid out in a table, and perhaps something else will be gleaned from the comments.

    As for other interesting languages (C++, Crystal, Go, Delphi, Nim, Red, Rust, Zig; add to taste), I will leave filling in the compliance table for them to anyone who wishes.

    Disclaimers:

    • In principle, if a program in, say, Python consumes 30 MB, the response requirements are in seconds, and the microcomputer has 600 MB of free memory and a 600 MHz CPU, then why not? It is just that such a program will be reliable only with some probability (even if it is 96%), no more.
    • In addition, the language should try to be convenient for the programmer, otherwise no one will use it. Articles along the lines of “I came up with an ideal programming language so that I and only I could write comfortably” are not uncommon on Habr either, but this article is about something else.
