Dependent Types - The Future of Programming Languages

Original author: Marin Benčević
  • Transfer

Despite the outlandishness and some distraction of the topic under consideration today - we hope that she will be able to diversify your weekend. At the end of the post we put three links from the author, allowing you to get acquainted with dependent typing in Idris, F * and JavaScript

Sometimes it seems as if programming languages ​​have hardly changed since the 60s. When they talk to me about it, I often remember how many cool tools and opportunities we now have, and how they simplify life. Offhand: it’s integrated debuggers, unit tests, static analyzers, cool IDEs, typed arrays, and more. The development of languages ​​is a long and progressive process, and there are no such “silver bullets” with the appearance of which the development of languages ​​would change once and for all.

Today I want to tell you about one of the final stages in this ongoing process. The technology, which will be discussed, is being actively explored, but everything indicates that it will soon take root in mainstream languages. And our story begins with one of the most fundamental concepts in computer science: with types .

World of types

Typification is one of those things that are so inseparable from our thinking that we hardly even think about the concept of types as such? Why is 1 - this int, but, if you only put this value in quotes - and it turns into string? What is “type”, in essence? As is often the case in programming, the answer depends on the wording of the question.

Types are diverse. In some type systems, there are very clear boundaries between types and values. So, 3, 2, and 1 are values ​​of type integer, but integerare not values. This construct is “embedded” in the language and is fundamentally different from the meaning. But, in fact, such a distinction is not necessary and can only limit us.

If you release the types and turn them into another category of values, a number of amazing possibilities open up. Values ​​can be stored, converted and passed to functions. Thus, it would be possible to make a function that takes a type as a parameter, creating generic functions: those that can work with many types without overloads. You can have an array of values ​​of a given type, rather than doing strange pointer arithmetic and type casting, as you have to do in C. You can also collect new types as the program runs and provide features such as, for example, JSON automatic deserialization. But, even if you interpret types as values, you still can’t do anything with types that you can do with values. So, using user instances, you can, for example, compare their names,

if == "Marin" && user.age < 65 {
    print("You can't retire yet!")

However, when trying to do the same with a type User, you could only compare type names and, possibly, property names. Since this is a type, not an instance, you cannot check the values ​​of its properties.

if typeof(user) == User {
    print("Well, it's a user. That's all I know")

How cool would it be if we had a function that could only get a non-empty list of users? Or a function that would accept an email address only if it was recorded in the correct format? For these purposes, you would need the types of "non-empty array" or "email address." In this case we are talking about a type that depends on the value, i.e. about dependent type . In mainstream languages, this is not possible.

That types could use, the compiler should check them. If you claim that the variable contains an integer, then it would be better if it didn’t exist string, otherwise the compiler will swear. In principle, this is good, because it does not allow us to sabotage. It is very simple to check the types: if the function returns integer, and we try to return it "Marin", then this is an error.

However, with dependent types everything is more complicated. The problem is when exactly the compiler checks the types. How can he make sure that there are three values ​​in the array, if the program is not even running yet? How to make sure that an integer is greater than 3 if it is not even assigned yet? There is magic for it ... or, in other words, mathematics . If you can mathematically prove that the set of numbers is always greater than 3, then the compiler can verify this.

Math in the studio!

Mathematical induction is used to formulate evidence. Induction allows you to unconditionally confirm the truth of some statement. For example, we want to prove that the following mathematical formula holds for any positive number:

1 + 2 + 3 + ... + x = x * (x + 1) / 2

There are an infinite number of possible x, so it would take us a very long time to check all the numbers manually. Fortunately, this is not necessary. We just need to prove two things:

  1. This statement is respected for the first number. (Usually it is 0 or 1)
  2. If this statement is true for a number n, then it will be observed for the next numbern + 1

Since the statement is observed for both the first number and all the following numbers, we know that it is true for all possible numbers.

Proving this is not difficult:

1 = 1 * (1 + 1) / 2
1 = 1

Now we also have to prove that the statement holds for all other numbers. To do this, assume that it works for some number n, and then make sure that it also works for n + 1.

Assuming that the following expression is true:

1 + 2 + 3 + ... + n = n * (n + 1) / 2

Check it out for n + 1:

(1 + 2 + 3 + ... + n) + (n + 1) = (n + 1) * ((n + 1) + 1) / 2

Thus, it is possible to replace with the "(1 + 2 + 3 + ... + n)" above equation:

(n * (n + 1) / 2) + (n + 1) = (n + 1) * ((n + 2) / 2)

and simplify to

(n + 1) * (n/2 + 1) = (n + 1) * (n/2 + 1)

Since both parts of the expression are equal, we are convinced that this statement is true. This is one of the ways in which one can check the truth of statements without manually calculating each case, and dependent types work on the basis of this principle. You write a mathematical statement to make sure that the thesis about the type is correct.

The beauty of this approach lies in the fact that any mathematical proof can be made in the form of a computer program - and that’s what we need!

Back to programming

So, we have established that some things can be proved first, and then proceed to specific values. To do this in a programming language, you need a way to express these statements in code that will be written to the type system itself, that is, the type system needs to be improved.

Consider an example. Here we have the append function that takes two arrays and combines them. As a rule, the signature of such a function will look like this:

append: (arr1: Array, arr2: Array) -> Array

However, just by looking at the signature, we cannot be sure of the correct implementation. The fact that the function returns an array does not mean that it has done something. One way to check the result is to make sure that the length of the resulting array is equal to the sum of the lengths of the parameter arrays.

newArray = append([1], [2, 3])
assert(length(newArray) == 3)

But why check it at runtime if you can create a constraint that will be checked at program compilation time:

append: (arr1: Array, arr2: Array) -> newArray: Array
         where length(newArray) == length(arr1) + length(arr2)

We declare that append- this is a function that takes two arguments Arrayand returns a new argument Array, which we called newArray. Only this time we add a clause that the length of the new array should be equal to the sum of the lengths of all the arguments of the function. The statement we had above at runtime is converted to a type at compile time.

The above code refers to the world of types, not values, that is, the sign ==indicates a comparison of the return type length, not its value. For such a mechanism to work, the return type length must give us any information about the actual number.

To ensure the operation of such a mechanism, you need to make sure that each number is of a separate type. Type One can contain only one value: 1. The same applies to Two, Three and all other numbers. Naturally, such work is very tiring, but it is for this kind of work that we have programming. You can write a compiler that will do it for us.

By doing this, you can create separate types for arrays containing 1, 2, 3, and a different number of elements. ArrayOfOne, ArrayOfTwo, Etc.

Thus, you can define the function length, which will take one of the above types of arrays and have a dependent return type Onefor ArrayOfOne, Twofor ArrayOfTwo, etc. for every number.

Now that we have a separate type for any particular array length, we can make sure (at compile time) that both arrays are of equal length. To do this, compare their types. And since types are the same values ​​as any others, you can assign operations to them. You can define the addition of two specific types by specifying that the sum ArrayOfOneand ArrayOfTwois equal ArrayOfThree.

That's all the information the compiler needs to make sure that the code you write is correct.

Suppose we want to create a type variable ArrayOfThree:

result: ArrayOfThree = append([1], [2, 3])

The compiler can determine that [1] has only one value, so you can assign a type ArrayOfOne. It can also assign ArrayOfTwoto [2, 3].

The compiler knows that the result type must be equal to the sum of the types of the first and second argument. He also knows that ArrayOfOne + ArrayOfTwo is equal to ArrayOfThree, that is, he knows that the entire expression on the right side of the identity is of type ArrayOfThree. It coincides with the expression on the left side, and the compiler is satisfied.

If we wrote the following:

result: ArrayOfTwo = append([1], [2, 3])

then the compiler would be completely unhappy because it would know that the type is incorrect.

Dependent typing is very cool.

In this case, a huge number of bugs is simply impossible to prevent. With dependent typing, you can avoid errors by one, calls to non-existent array indexes, null pointer exceptions, infinite loops, and inoperative code.

With the help of dependent types, you can express almost anything. The factorial function will only accept natural numbers, the function loginwill not accept empty strings, the function removeLastwill only accept non-empty arrays. Moreover, all this is checked before you run the program.

The problem with runtime checks is that they fail if the program is already running. This is normal if the program is run only by you, but not by the user. Dependent types allow you to bring such checks to the level of types, so failure of this kind during program execution becomes impossible.

I think dependent typing is the future of mainstream programming languages, and I can't wait to wait for it!


F *

Add dependent types in JavaScript

Also popular now: