# How to make even more incorrect states even more inexpressible

Not so long ago on Habré translated an article about how using algebraic data types to ensure that incorrect states were ineffable. Today we will look at a slightly more generalized, scalable and secure way to express the inexpressible, and haskel will help us in this.

In short, in that article, an entity is considered with a postal address and an e-mail address, as well as with the additional condition that there must be at least one of these addresses. How is it proposed to express this condition at the type level? It is proposed to record addresses as follows:

``````typeContactInfo =
| EmailOnlyofEmailContactInfo
| PostOnlyofPostalContactInfo
| EmailAndPostofEmailContactInfo * PostalContactInfo``````

What are the problems with this approach?

The most obvious (and it was noted several times in the comments to that article) - this approach is not scalable at all. Imagine that we have not two types of addresses, but three or five, and the condition of correctness looks like “there must be either a postal address or an email address and an office address at the same time, and there should not be several addresses of the same type”. Anyone can write the appropriate type as an exercise for self-checking. The task with an asterisk is to rewrite this type for the case when the condition about the absence of duplicates disappeared from the TZ.

### We divide

How can this problem be solved? Let's try to fantasize. Only first we will decompose and divide the address class (for example, mail / email / table number in the office) and the contents corresponding to this class:

``````data AddrType = Post
| Email| Office``````

We will not think about the content yet, because there is nothing about it in the statement of work on the condition of the validity of the address list.

If we checked the corresponding condition in the runtime of some constructor of some ordinary OOP-language, then we would just write a function like

``````valid :: [AddrType] -> Boolvalid xs = let hasNoDups = nub xs == xs  -- не делайте так в продакшене
hasPost = Post `elem` xs
hasEmail = Email `elem` xs
hasOffice = Office `elem` xs
in hasNoDups && (hasPost || (hasEmail && hasOffice))``````

and would throw any action if it returns `False`.

Can we instead check for a similar condition using a tip checker when compiling? It turns out that yes, we can, if the type system of the language is sufficiently expressive, and the rest of the article we will pick this approach.

Here dependent types will help us a lot, and since the most adequate way to write seized code on a Haskel is to first write it on aqda or Idris, we will sharply change our shoes and we will write on Idris. Idris syntax is close enough to a Haskel: for example, the above function only needs to change the signature a little:

``valid : List AddrType -> Bool``

Now remember that in addition to the address classes, we also need their contents, and encode the dependence of the fields on the address class as GADT:

``````data AddrFields : AddrType -> Type where
PostFields : (city : String) -> (street : String) -> AddrFields Post
EmailFields : (email : String) -> AddrFields Email
OfficeFields : (floor : Int) -> (desk : Nat) -> AddrFields Office``````

That is, if we are given a value of `fields`type `AddrFields t`, then we know that `t`- this is a kind of class `AddrType`, and that `fields`the set of fields corresponding to this particular class lies in it.

This is not the type-safe coding as GADT need not be injective, and it would be better to announce three separate data type `PostFields`, `EmailFields`, `OfficeFields`and write function

``````addrFields : AddrType -> Type
addrFields Post = PostFields
addrFields Email = EmailFields
addrFields Office = OfficeFields``````

but this is too much writing, which for the prototype does not give a significant gain, but in Haskel there are still shorter and more pleasant mechanisms for this.

What is the entire address in this model? This is a pair from the address class and the corresponding fields:

``````Addr : Type

Fans of the theory of types will say that this is an existential dependent type: if we are given a certain value of type `Addr`, then this means that there is a value of `t`type `AddrType`and a corresponding `t`set of fields `AddrFields t`. Naturally, the addresses of different classes are of the same type:

``````someEmailAddr : Addr
someEmailAddr = (Email ** EmailFields "that@feel.bro")
someOfficeAddr = (Office ** OfficeFields (-2) 762)``````

Moreover, if fields are given to us `EmailFields`, then the only address class that fits is `Email`, so you can omit it, the tipchecker will output it himself:

``````someEmailAddr : Addr
someEmailAddr = (_ ** EmailFields "that@feel.bro")
someOfficeAddr = (_ ** OfficeFields (-2) 762)``````

Let us write an auxiliary function that, according to the list of addresses, gives the corresponding list of classes of addresses, and immediately summarizes it before working on an arbitrary functor:

``````types : Functor f => f Addr -> f AddrType
types = map fst``````

Here, the existential type `Addr`behaves like a familiar couple: in particular, you can ask for its first component `AddrType`(task with an asterisk: why ask for the second component so it won't work?).

#### Raise

Now we come to the key part of our story. So, we have a list of addresses `List Addr`and some predicate `valid : List AddrType -> Bool`, the execution of which for this list we want to guarantee at the level of types. How can we combine them? Of course, another type!

``````data ValidatedAddrList : List Addr -> Typewhere
MkValidatedAddrList : (lst : List Addr) ->
(prf : valid (types lst) = True) ->

Now we will understand what we have written here.

`data ValidatedAddrList : List Addr -> Type where`means that the type is `ValidatedAddrList`parameterized by the address list itself.

Let's look at the signature of the sole designer `MkValidatedAddrList`of this type: `(lst : List Addr) -> (prf : valid (types lst) = True) -> ValidatedAddrList lst`. That is, it takes some address list `lst`and one more `prf`type argument `valid (types lst) = True`. What does this type mean? So it means that the value to the left of `=`is equal to the value to the right of `=`, that is, that `valid (types lst)`, in fact, is equal to True.

How it works? The signature `=`looks like `(x : A) -> (y : B) -> Type`. That is, it `=`takes two arbitrary values `x`and `y`(perhaps even different types `A`and `B`, which means that the inequality in Idris is heterogeneous, and that is somewhat ambiguous from the point of view of the theory of types, but this is a topic for a separate discussion). Due to what then equality is demonstrated? And due to the fact that the only constructor `=`- `Refl`with a signature almost`(x : A) -> x = x` . That is, if we have a value of type `x = y`, then we know that it was built using `Refl`(because there are no other constructors), which means that it is `x`in fact equal `y`.

Note that this is why we will always pretend at best in Haskel that we are proving something, because there is one in Haskel `undefined`that inhabits any type, so the above reasoning doesn’t work there: for any `x`, the `y`term type `x = y`could be created through `undefined`(or through infinite recursion, let's say that by and large the same thing from the point of view of type theory).

Note also that equality here is not meant in the sense of Haskele `Eq`or some `operator==`in C ++, but much more rigorous: structural, which, simplifying, means that two values ​​have the same form . That is, to deceive him so simply will not work. But equality issues are traditionally pulled into a separate article.

In order to consolidate our understanding of equality, we write unit tests for the function `valid`:

``````testPostValid : valid [Post] = True
testPostValid = Refl
testEmptyInvalid : valid [] = False
testEmptyInvalid = Refl
testDupsInvalid : valid [Post, Post] = False
testDupsInvalid = Refl
testPostEmailValid : valid [Post, Email] = True
testPostEmailValid = Refl``````

These tests are good because you don’t even need to run them; it’s enough that the TIP checker checked them. Indeed, let's replace `True`with `False`, for example, in the very first of them and see what will happen:

``````testPostValid : valid [Post] = False
testPostValid = Refl``````

Typcheker cuss

as expected. Wonderful.

#### Simplified

Now a little refute our `ValidatedAddrList`.

First, the comparison pattern of a certain value with is `True`quite common, so there is a special type for this `So`: you can be taken `So x`as a synonym for `x = True`. Let's correct the definition `ValidatedAddrList`:

``````data ValidatedAddrList : List Addr -> Typewhere
MkValidatedAddrList : (lst : List Addr) ->
(prf : So (valid \$ types lst)) ->

In addition, `So`there is a convenient helper function `choose`, meaning to raise the check for the type level:

``````> :doc choose
Data.So.choose : (b : Bool) -> Either (So b) (So (not b))
Perform a case analysis on a Boolean, providing clients with a So proof``````

We need it when we write functions that modify this type.

Secondly, sometimes (especially with interactive development), the idris can find the appropriate value on `prf`its own. In order in such cases it was not necessary to design it by hand, there is an appropriate syntactic sugar:

``````data ValidatedAddrList : List Addr -> Typewhere
MkValidatedAddrList : (lst : List Addr) ->
{auto prf : So (valid \$ types lst)} ->

Braces mean that this is an implicit argument that idris will try to pull out of context, and `auto`means that he will also try to construct it himself.

So what does this new one give us `ValidatedAddrList`? And it gives such a chain of reasoning: let `val`- the value of the type `ValidatedAddrList lst`. This means that `lst`- some address list, and, moreover, `val`it was created with the help of the constructor `MkValidatedAddrList`, to which we passed this same `lst`and one more value of the `prf`type `So (valid \$ types lst)`, which is almost `valid (types lst) = True`. And in order for us to build `prf`, we need, in fact, to prove that this equality holds.

And the most beautiful thing is that everything is checked by the tickpicker. Yes, it will be `valid`necessary to perform the check at runtime (because the addresses can be read from the file or from the network), but the type checker guarantees that this check will be done: it is impossible to create without it `ValidatedAddrList`. At least in Idris. In Haskel it is possible, alas.

#### We insert

To make sure that verification is inevitable, we will try to write a function to add an address to the list. First try:

``````insert : (addr : Addr) ->

Nope, the ticker gives on the fingers (although not very readable, the costs of being `valid`too complicated):

How do we get a copy of this here `So`? None other than the above `choose`. Second attempt:

``````insert : (addr : Addr) ->
casechoose (valid \$ types (addr :: lst)) ofLeft l => MkValidatedAddrList (addr :: lst)
Right r => ?rhs``````

It is almost secret check. “Almost” because it is not clear what to substitute instead `rhs`. Rather, it is clear: in this case, the function must somehow report the error. So, we need to change the signature and wrap the return value, for example, in `Maybe`:

``````insert : (addr : Addr) ->
casechoose (valid \$ types (addr :: lst)) ofLeft l => Just \$ MkValidatedAddrList (addr :: lst)
Right r => Nothing``````

This is tipped and works as it should.

But now the following is not a very obvious problem, which was, in fact, in the original article. The type of this function does not interfere with writing such an implementation:

``````insert : (addr : Addr) ->

That is, we always say that we could not build a new address list. Tips? Yes. Correctly? Well, hardly. Can this be avoided?

It turns out that it is possible, and we have all the necessary tools for this. If successful, `insert`returns `ValidatedAddrList`, which contains evidence of this very success. So add elegant symmetry and ask the function to return more and proof of failure!

``````insert : (addr : Addr) ->
Either
(So (not \$ valid \$ types (addr :: lst)))
casechoose (valid \$ types (addr :: lst)) ofLeft l  => Right \$ MkValidatedAddrList (addr :: lst)
Right r => Left r``````

Now we can not just take and always return `Nothing`.

Similarly, you can do for address removal functions and the like.

Let's see how it all looks in the end now.

Let's try to create an empty address list:

It is impossible, the empty list is not valid.

How about a list of just one email address?

OK, let's try to insert the postal address into the list, in which there is already an postal address:

Let's try to insert an email:

In the end, everything works exactly as expected.

Ffuh. I thought it would be three lines, but it worked a little longer. So explore how far we can go to Haskel, we will be in the next article. In the meantime, a little

#### Ponder

What is the result of the profit of such a decision in comparison with the one given in the article to which we referred at the very beginning?

1. It is, again, much more scalable. Complex validation functions are easier to write.
2. It is more isolated. The client code is generally not obliged to know what is inside the validation function, whereas the form `ContactInfo`from the original article requires it to be tied up.
3. The logic of validation is written in the form of ordinary and familiar functions, so that you can immediately check it with thoughtful reading and test with time-tests, and not deduce the meaning of the test from the form of the data type representing the already verified result.
4. It becomes possible to slightly more accurately specify the behavior of functions that work with the type of data that we are interested in, especially in the case of failure to check. For example, written as a result, it is `insert`simply impossible to write wrong . Similarly, one could write `insertOrReplace`, `insertOrIgnore`and the like, whose behavior is fully specified in the type.

What is the profit compared to the OOP solution in this spirit?

``````classValidatedAddrListClass
{
public:
{
throw ValidationError {};
}
};``````

1. The code is more modularized and secure. In the case of the above, verification is an action that is checked once and which was later forgotten. Everything rests on the word of honor and agreement that, if you have it `ValidatedAddrListClass`, its implementation once there made a check. The very fact of this check from the class as some value can not be picked out. In the case of values ​​of some type, this value can be transferred between different parts of the program, used to build more complex values ​​(for example, again, negating this test), explore (see the next paragraph) and generally do the same thing that we used to do. with values.
2. Such checks can be used with (dependent) pattern matching. True, not in the case of this function `valid`and not in the case of Idris, it is painfully complicated, and Idris is painfully stupid so that `valid`information useful for patterns can be extracted from the structure . Nevertheless, `valid`it is possible to rewrite it in a slightly more pattern-friendly style, but this is beyond the scope of this article and is generally nontrivial in itself.

What are the disadvantages?

I see only one serious fundamental flaw: `valid`- too stupid function. It returns only one bit of information - validated or not. In the case of smarter types, we could achieve something more interesting.

For example, imagine that the requirement of unique addresses is missing from the TK. In this case, it is obvious that adding a new address to an existing list of addresses will not make the list invalid, so you could prove this theorem by writing a function with a type `So (valid \$ types lst) -> So (valid \$ types \$ addr :: lst)`and use it, for example, to write a type safe always successful

``````insert : (addr : Addr) ->
But, alas, theorems like recursion and induction, and in our problem there is no elegant inductive structure, therefore, in my opinion, the code with oak boolean is `valid`also not bad.