defanor July 7, 2014 at 15:26

Idris language overview

From the sandbox

Agda is too mainstream!

"Foresight"

There are practically no materials about the Idris language in Russian, I will try to fix it with a brief overview. I will also try to make the text understandable for beginners, non-functional and unfamiliar with dependent types of programmers, and therefore those who are familiar with such languages may find it easier to rewind to the end or immediately read the official documentation. I do not presume to write a serious introduction to language and theory, but I hope to show what it is all about.

So, Idris is a pure functional general-purpose programming language with dependent types.

Why is this?

For a brief answer, Dijkstra's famous quote is appropriate; although the approach has changed since then, the goal remains the same:

The one who thinks that the products of programmers are the programs they write is deeply mistaken. The programmer is obliged to create credible decisions and present them in the form of convincing arguments, and the text of the written program is only supporting material to which this evidence applies.

To be more precise, there can be several reasons for creating and using it: the author, Edwin Brady, uses it to study application programming with dependent types; many, like me, like the ability to make accurate program specifications; I also hope to significantly reduce the debugging time of application programs through formal verification thereof. And here the second quote begs:

Testing a program can very effectively demonstrate the presence of errors, but is hopelessly inadequate to demonstrate their absence.

Static typing and dependent types

One of the ways to classify the typification of programming languages is the separation of static and dynamic. With dynamic typing, types are unknown at compile time, and therefore cannot be checked; this helps to write programs faster, but at the same time it adds a class of typification errors that are detected during program execution, which is usually not very pleasant. Idris uses static typing, so what follows will be discussed only about it; but in order not to frighten people who are accustomed to dynamic typing, I immediately notice that type inference is also used , i.e. types can be omitted explicitly if they are obvious to the compiler from the context.

Typing can be limited to several primitive types - for example, strings, numbers, floating point numbers; then all that we can “say” about the type of some value (for example, a function argument, or the value returned by it) is “string” , “number” , “floating point number” . But usually this is complemented by something like structures: simple C-like structures, classes, (generalized) algebraic data types, etc., and at the same time synonymous with types ( typedefin C, typein Haskell), which already allows you to say, for example, “ date ” , having previously defined the structure, class or type“ date ”.

You can add this with the phrase "pointer to", which at the same time allows you to use arrays, but in higher-level languages, generalized programming (templates in C ++, generics in Java) is more often supported , which allows you to build phrases like “date list” , “road graph” , “event tree” to describe types . In other words, types can be parameterized by other types.

And finally, you can parameterize types with values, which makes it possible to say “a list of five lines” , “a message signed by Vasily” ; types of functions can then take the form “a function that receives a message signed by some user and returns the data of this user”, instead of the more common "function that receives a message and returns user data" , which leads to much more accurate specifications. Moreover, this makes it possible, so to speak, to switch from subject to predicate, to use the principle of “utterances as types”: “5 is greater than 3” , “it is not true that 5 is greater than 3” , “7 is a prime number” , “Vasily received 2 messages . " These are the dependent types that are implemented in Idris, as well as in several other languages (Agda, Coq, Epigram, etc.), but Idris is distinguished from them by its desire to be a modern applied language.

Also, under conditions of total functional programming, it allows using the correspondence between computer programs and mathematical proofs ( Curry – Howard correspondence ): the type of function can be, for example, “A is less than B, B is less than C, so A is less than C” or “the message has been sent to some user, that means this user has access to this message ” ; then the body of the function becomes the proof, which simply must return the value of the given type, and the proof check is the type check.

Total functional programming

As said, Idris is a pure functional language . But the optional requirement of the totality property of functions is also supported, which means two properties:

Certainty is everywhere: a function must be defined for any input. In pure languages, the absence of this property is almost the only opportunity for the program to “crash” (others are IO, in particular, killing a process by the system due to some of its actions, and possible compiler bugs).
Strict normalization: during recursive function calls, at least one of its arguments must strictly decrease (structurally), or the function must be productive at each iteration - i.e. return some part of the result, and the promise to calculate the next iteration. In the first case, this guarantees the completion of the function, bypassing the stopping problem, and in the second, the ability to read the final prefix of any length in a finite time.

This is important, first of all, for building evidence, but the correctness of ordinary programs is beneficial, judging by my observations. By the way, by the presence of this property it is guaranteed that the result of lazy and greedy calculations of this function will be the same, and the calculations in Idris are greedy by default, but any argument can be made lazy.

You can enable this option for individual functions, for files, or with the flag of the compiler.

Examples

The unusual syntax seems to distract attention from the essence of what is being described, and therefore I will not describe the syntax; code examples are provided for people familiar with the languages of the ML family, and for the rest, text descriptions are given before these examples.

Division

The operation, familiar to everyone, which is available in almost all programming languages (or in their standard libraries) is division. But dividing by 0 is forbidden, which leads to various controversial decisions:

To return something special, like Infinity: then we must also determine the rules for the operation of all other functions / operations with Infinity, which complicates the arithmetic as a whole, but, for example, the simple property a / b = c → b * c = a will be lost , and further, an attempt to use it will easily lead to results other than expected.
To give rise to an exception: this not only leads to disruption of the normal program flow, at the same time again creating the ground for bugs with uncaught exceptions, but also significantly complicates the discussion (in particular, formal) about functions, if it does not make it impossible.
Wrap the result in Maybe , returning Nothing when trying to divide by 0: something better than the previous two, except that the function itself will not encourage the programmer to write the correct programs, and the formulas can become less accurate and / or overly saturated with category theory, simultaneously complicating discussions about functions using such a division implementation.

In languages with dependent types, and in Idris in particular, another option appears: to prohibit division by 0. In other words, division by 0 is prohibited - and we just make it impossible; the type of the function is then described as "a function that takes two numbers, the second of which is not 0, and returns a number." If desired, you can supplement this specification with a property that speaks of the connection of the return value with the arguments of the function, i.e. "... such that ...", but it’s easier to leave this for separate theorems, but for now we return to the description of the type of the division function: to the usual two arguments, we need to add a third one - proof that the second argument is not zero.

Now we will consider three cases of using division (recall that we are talking about compilation time):

The divisor is known: the proof is trivial because the equivalence of natural numbers is decidable (in other words, the compiler sees for himself whether the divisor is equal to zero or not), and therefore we can, if desired, wrap such a division function in a function with the usual two arguments, substituting the same simple third argument.
The divisor is unknown, but something is known about it that allows us to make sure that it is not equal to zero: for example, that it is greater than some non-negative number. Then we need to either write the proof in place, or use a function that takes what we have (proof that the number is greater than some non-negative number) and returns what we need (proof that this number is not equal to zero). Such functions can be called simply functions, or lemmas or theorems, but, as already mentioned, in this context it is one and the same.
We ourselves are not sure that the divisor will not be equal to zero (after careful consideration of the situation in an attempt to prove inequality, the previous case can go into this one): only in this case we really need to introduce a check before division; if the divisor is zero, then use an alternative solution, otherwise, use the information received and perform the division.

This approach, of course, is not necessary in Idris, but it makes the programmer make sure that he performs the calculations correctly.

And finally, the code example is the framework for the division function:

-- (total) функция деления для натуральных чисел
-- "==" - функция проверки разрешимой эквивалентности, возвращающая Bool
-- "=" - эквивалентность высказываний, тип
total div : (n, m: Nat) -> ((m == Z) = False) -> Nat
-- деление на 0 - невозможно; просто показываем это компилятору
-- с тем же успехом, впрочем, можно и вернуть любое число
div n Z f = FalseElim $ trueNotFalse f
-- деление на положительное число
div n (S k) f = {-todo-}?div_rhs

Since this is just a wireframe, the meta variable div_rhs,; people familiar with Agda already know the “holes”, the weaker versions of which are meta variables in Idris, and for the rest I’ll explain: Idris allows you to see its context (variables that are visible from its position) and the purpose (which should be constructed / returned from this position), which greatly facilitates the writing of both evidence and programs. This is how it looks in this case:

  n : Nat
  k : Nat
  f : S k == 0 = False
--------------------------------------
div_rhs : Nat

It is also possible to fill them semi-automatically: having the goal to return Nat, you can use the (REPL / IDE) "refine" command, passing it, for example, the name of the function plus, and if the function plusis able to return a value of the required type, it will be substituted for this meta -variable, and for its arguments two new meta will be substituted. The process is the same as that used for another function - a fully automatic replacement of meta variables, i.e. Search for evidence: in some cases, the required evidence can be found automatically.

Types (slightly edited for readability):

λΠ> :t div
div : Nat -> (m : Nat) -> (m == 0 = False) -> Nat

λΠ> :t div 1 2
div 1 2 : (2 == 0 = False) -> Nat

λΠ> :t div 1 2 refl
div 1 2 refl : Nat

λΠ> :t div 1 0 refl
(ошибка, программа с подобным вызовом не пройдёт проверку типов)

Now let's digress from the implementation of the division function and see what this approach gives us. Let's start with the aforementioned correctness: the function works exactly as it should. We can think of a function as we usually think of what it implements (arithmetic, in this case: we cannot divide by zero, but in the process of division, no concepts like exceptions are involved, and the result is a number). We can talk about the function in the same way as about the concept that it represents, including formally, i.e. writing down reasoning with code. In functions using it, reasoning and conclusions can now be intertwined with calculations and be checked, rather than remain in the head of the programmer or text comments. And the program becomes a construct from the correct reasoning and decisions, and not just a recipe for calculations.

Lists

Working with lists (or arrays) is another common task that leads to about the same set of controversial decisions and subsequent errors. In particular, we consider a function that returns the nth element of a list: the problem with its implementation is that we can request an element from a position that is not in the list. The usual solutions are to fall with an “access violation” or “segmentation fault”, throw an exception (and, possibly, then fall with it), use Maybe.

In languages with dependent types, a couple more possible solutions appear. The first is similar to solving the problem of division by zero, i.e. the last argument to the function is the proof that the requested position is less than the length of the list.

Here you can see that various numbers, lists, "if-then-else", etc. defined in the language itself. Natural numbers implemented in the library are replaced with GMP numbers when compiled in C, and functions such as addition and multiplication are replaced with the corresponding GMP functions, which gives quick calculations in compiled programs and the ability to talk about them. Now take a look at the code:

data Nat = Z | S Nat
data List a = Nil | (::) a (List a)
length : List a -> Nat
length []      = 0
length (x::xs) = 1 + length xs
index : (n : Nat) -> (l : List a) -> (ok : lt n (length l) = True) -> a
index Z     (x::xs) p    = x
index (S n) (x::xs) p    = index n xs ?indexTailProof
index _     []      refl   impossible
indexTailProof = proof {
  intros;
  rewrite sym p;
  trivial;
}

indexTailProof- proof using tactics, in the Coq style, but you can (and I like something more) write them in the Agda style, i.e. more familiar features (they say it’s true that tactics will soon be in Agda too).

The second solution is to use not just a list, but a list of fixed length: a vector, as it is called in the context of languages with dependent types. An example was mentioned above: this is what allows you to say "a list of five lines." The so-called final type - that is, a type whose number of different values is finite: we can consider it as the same type of natural numbers, but with an upper bound. This type is convenient in particular in that it is easier to say “a number less than n” with it than with natural numbers, where it is more likely “a number and it is less than n”. The type of the function for obtaining the nth element of the vector is then read as “a function that takes a number less than n and a list of n elements of type a and returns an element of type a”.

In this case, the task is not to provide proof, but to construct a number of the desired type (i.e., less than n). By the way, the logic in the proofs here differs from the classical (is intuitionistic), and consists in constructing the proofs, that is, the proof of the statement / type is any value of this type, so that some similarities with the previous approach can be traced, but at the same time, the approach to writing the program as a whole, it changes.

The code:

data Fin : (n : Nat) -> Type where
    fZ : Fin (S k)
    fS : Fin k -> Fin (S k)
data Vect : Nat -> Type -> Type where
  Nil  : Vect Z a
  (::) : (x : a) -> (xs : Vect n a) -> Vect (S n) a
index : Fin n -> Vect n a -> a
index fZ     (x::xs) = x
index (fS k) (x::xs) = index k xs
-- запросить что-либо из пустого вектора невозможно, в данном случае
-- компилятор это "видит" сам
index fZ     [] impossible

So we go even further from clear evidence, but we build, however, the correct program.

Well, since we got to the vectors, here is the function of combining them, often given as an example to demonstrate dependent types:

(++) : Vect m a -> Vect n a -> Vect (m + n) a
(++) []      ys = ys
(++) (x::xs) ys = x :: xs ++ ys

The body is the same as for regular lists, but the type is more interesting: pay attention to m + n.

More examples

The above are examples of individual small functions that Idris helps to make correct, but here are a few larger things offhand:

Protocols: there is a Protocols library that allows you to describe protocols using DSL, and then implement them; implementation mismatch with protocol description will result in an error.
Physics and games : not only is the calculation correct, but it is also useful to prove theorems after writing the main programs.
The laws of functors, applicative functors and monads can be included in the code , in the time classes themselves.
Cryptography : An obvious example of what is important.
Parsing and compiling text / binary formats: it can be formally proved that any composed lines are parsed into the source data, and the parsed data is compiled into the source lines. That's what I chose as the first project on Idris.
Access rights systems, databases, device drivers - Idris has not seen implementations yet, but also things that I want to see reliable.

Both the web framework and the library for interacting with the DOM are written on it (by the way, at the time of writing, the compiler supports compilation in C, LLVM, Java and JavaScript), and all sorts of parsers ( one , two ), and it is aimed at system programming. Corresponding programs from Haskell to Idris is quite simple.

Idris is not yet used in production - at least widely (I heard that it is actively used at the Potsdam Institute for Climate Impact Research, but this may not be exactly what is usually understood by production). The main technical obstacle is “non-roll-in”, which should be decided by “rolling-in”.

Conclusion

The language is new, and therefore “raw,” but modern, using modern theory and the practice accumulated by other languages.

Allowing ourselves to dream a little about the future use of Idris (and the languages that will appear after it), one can imagine, in addition to a general increase in the reliability of programs, technical tasks accompanied by formal specifications or even consisting of them; growth of services like Proof Market ; libraries of physical, mathematical, and any other theories and languages like Lojban; articles and books as lists of formal specifications of proven theorems; communication built from automatically checked replicas and question-answer pairs in particular.

PS

Thanks to those on whom this review was tested, for their comments and ideas.

Tags: