Vitter January 16, 2014 at 04:08

Algebraic Data Types Zoo

In this article we will try to consider the whole variety of Algebraic Data Types.

I must say, the task is quite heavy, and to understand a person if he had not dealt with Algebraic Types before is not very simple.
ADTs were first used in the Hope language, but they gained major popularity thanks to the ML languages such as Standart ML, OCaml, F #, and the Haskell language.
Today ADTs are supported in one way or another in a much larger number of languages: Scala, Rust, Nemerle, Racket, ...
ADT is a universal data type. Using it, you can imagine most data types.
ADTs are called algebraic, because they can be represented as a kind of algebraic composition of the types of its components. This knowledge gives its advantage: understanding the properties of algebraic composition, you can calculate which type is necessary for constructing certain data.
We will consider types based on the Haskell language, however, similar with slight changes can be achieved in other languages with support for ADT.

It should be noted that the type theories developed in the twentieth century by Russell, Church, Martin-Löf gained such popularity that now (already in the twenty-first century) the theory of Homotopy Type Theory appeared, which attempted to explain the foundation of mathematics.

HoTT

If anyone is interested in theory, you can read for free fresh work in English Homotopy Type Theory: Univalent Foundations of Mathematics

Our task is to look at Algebraic Data Types. Often, ADT data is called boxed or wrapped. They are so named because they look like data packed in a box. The most interesting thing is that even the implementation does not mean storing data, but pointers. In addition to ease of implementation, this provides the effective mechanisms of laziness that Haskell has.
In fairness, I note that Haskell allows you to create non-lazy, including non-wrapped data, which helps in processing highly loaded data volumes.
In Haskell, there are 2 ways to create an ADT using the dataor declaration newtype. The difference between the two is thatnewtype de facto not wrapped, and therefore cheaper (by machine resources), and therefore more profitable, but with it you can record only a narrow circle of ADTs, so it is not suitable in the general case.

Data Types Implemented Through Primitives

In Haskell there are not many, first of all it is these types of data, such as
Int, Integer,Float, Double, Char, ...
It's not that you can not make these types of this (via ATD), a computer is able to work with numbers, and sin not to use it.
I want to draw attention to the fact that all types are written with a capital letter, this is due to the fact that all type and data constructors in Haskell are written exclusively with a capital letter.

Zero Data Types

Most imperative languages are designed in such a way that it is convenient to create programs from bottom to top in hierarchical complexity.
In Haskell, it is very easy to write code from top to bottom. In this connection, it became necessary to have empty data types (without implementations), since it is known what this data type will do, but it is not yet decided how to fill it out. An extension has appeared that allows you to do this.

These are not real, pseudo-null, empty data types.

data S
data T a
data K a b

The syntax, as we see it, is simple to disgrace.
Here we created 3 types of data, the second of which is parametric. We see what ais written with a lowercase letter, which means it cannot be a constructor. Conclusion - This option, which can substitute any type, it is possible to have the following types: T Int,T Char, T S, T (T Double)and even T (T ( T String)).
We also see that the constructor is at the beginning of the type, and then the data is, as it were, packed into a box.
You ask why they are needed?
For example, we can write the following valid function code:

doubleT :: T a -> (T a, T a)
doubleT t = (t, t)

Of course, in most cases it is impossible to write all the functions without internal knowledge of the data, but you can leave some of the functions for later, doing other tasks, such as this:

foo :: S -> Int -> T a
foo = undefined
nextT :: S -> Int -> (Int, T a)
nextT s i = (i+1, foo s (i+1))

This program will be compiled and checked for errors, despite the fact that neither the data nor part of the functions are defined.

The true null data type looks unusual enough for the untrained eye, but we can figure it out.
It was found relatively recently, it is still not in the Platform (so that it can be used out of the box), and is still little used.

Void

newtype Void = Void Void

Here we see a recursive type (we will talk about this later), first comes the type name - Void, then =, then the constructor of this type Voidand the parameter, which is the type Void.
If we rename, this will not change anything except our understanding:

newtype VoidData = Void VoidData

This data type is infinitely looped on itself.

Unit Data Types

Single data types give nothing but knowledge of what it is.
Of the common data types used

data () = ()      --X

hereinafter commentary
-- X
data that is so defined will be shown, but it cannot be determined by itself because invalid characters are used.
In almost all programs, the signature is used:

main :: IO ()

because the program must return something. Therefore, it ()is the most popular type when you need to return anything.
Or, here's another well-known unit type:

data Unit = Unit

the same, only regular constructors are used.
I must say that in these types there is no recursion, that is, you can write:

data UnitData = Unit

the fact is that the type namespace does not intersect with the type constructor space, so the compiler understands where there is recursion and where not.

Types of works

Types of works are named so that they can be represented as the product of several types that make up it.
Very popular data types are used universally.
The most famous of them are tuples or stupid. Haskell uses special syntactic sugar for this:

data (,) a b        = (,) a b            --X  (a, b)
data (,,) a b c     = (,,) a b c         --X  (a, b, c)
data (,,,,) a b c d = (,,,,) a b c d     --X  (a, b, c, d)

we can write the same thing using ordinary constructors:

data Pair a b = Pair a b
data Triple a b c = Triple a b c

Wherever you need to store more than one given in the data - this is the type of work.

Total Data Types

Total data types are named because they summarize the data types that make up them.
For example, if you summarize single data types, an enumeration is obtained.
Presenting one of the most famous listings:

data Bool = False | True

also a famous enumeration is the type Orderingused in comparisons

data Ordering = LT | EQ | GT

If we move on to more complex ADTs, we need to recall the zero-abelian type:

data Maybe a = Nothing | Just a

Data that has a field for the result when it is not, so to speak, nullfrom many programming languages.

() :: Int -> Int -> Maybe Int
_  0 = Nothing
a  b = Just (a `div` b)

If you need to learn more about the error, an even more complex data type is used:

data Either a b = Left a | Right b

a data type where there is either a “left” result (error description or the error itself) or a “correct” result.
Also this data type is used for binary selection.

type IdPerson = Int
type ErrorMsg = String
personChecker :: Person -> Either ErrorMsg IdPerson
personChecker p = if age p < 0          then Left "this guy is not born yet!"
            else if null $ first_name p then Left "this guy is unnamed!"
            else Right $ registeredId p

You can compose more exotic data types, for example:

data Both a b = None | First a | Second b | Both a b

here we either have no result, or 1 result, or 2nd, or both 1st and 2nd.

Recursive Data Types

In Haskell, you can easily create recursive data types.
The best known syntax sugar type is lists:

data [] a = [] | a : [a]     --X

However, it can be easily rewritten with the words:

data List a = Nil | Cons a (List a)

I want to note that special characters can be used as constructors, where the first character is necessarily a colon (:), then they use infix notation.
Want to build a tree? No problem.
For example, a simple binary tree:

data Tree a = Leaf a | Branch (Tree a) (Tree a)
data Tree2 a = Empty | Branch2 (Tree2 a) a (Tree2 a)

Or else, natural numbers:

data Nat = Zero | Succ Nat

If you have not yet opened a spoiler about a null data type (or opened, but did not understand) - it's time to open and read and understand that an Voidelementary infinitely recursive data type.

Power Data Types

Do not think that only data can be written to the ADT. Functions can also easily be written in the ADT: power types (the general type has the power of raising to the power of one data type the power of another):

data Fun a b = Fun (a -> b)
newtype ContT r m a = Writer ((a -> m r) -> m r)

If we use the ADT of works, where the data will be mixed with functions, this is nothing more than records, and very similar structures to objects.

Phantom Data Types

Sometimes you have to create quite unusual behavior, and you have to invent types that do nothing, but pretend they do. For this, in particular, phantom types are used. One of the most famous types is:

data Proxy a = Proxy

As you can see, the type header says that the type is parametric, but in reality it is a dummy.
These data types are often used as universal gags, as they are easily converted from one data type to another.

proxyСonverter :: Proxy a -> Proxy b
proxyСonverter Proxy = Proxy

Posts

The type of works is so often used that there is a special syntactic sugar that helps to work with it easier and more conveniently (with built-in getters and setters). These types are called records.
May we have a type

data Sex =  Female | Male
data Person = Person String String Int Int Sex (String -> String)

much more convenient to rewrite to:

data Sex =  Female | Male
data Person = Person  { lastName  :: String
                      , firstName :: String
                      , age       :: Int 
                      , socialId  :: Int 
                      , sex       :: Sex
                      , greeting  :: String -> String
                      }

Inserts or Wrappers

Inserts are used where it is necessary either to hide the type implementation, to protect it from outside interference, or to give other properties than the mother type.
It is written in a similar way (with the syntax of the entries):

newtype Wrapper a = Wrapper { unwrapper :: a }
newtype Dollar = Dollar Int

Existential data types

Existential data types are named so because of the existence quantifier ∃.
The paradox is that in Haskell there is no such quantifier.
But there is a quantifier of universality ∀. But these quantifiers can be easily transformed into each other.
To be completely precise, then every existentially-quantified type of rank n+1can be transformed into a universally-quantified type of rank n.

data HeteroData = forall a. HeteroData a
heteroList :: [HeteroData]
heteroList = [HeteroData 3.7, HeteroData "message", HeteroData True]

As you can see, they were able to create a heterogeneous list, despite the fact that it is homogeneous.
But something looks like an object:

data Counter a = forall self. NewCounter
    { _this    :: self
    , _inc     :: self -> self
    , _display :: self -> IO ()
    , tag      :: a
    }

With functions written with an underscore, we cannot work directly.
True, working with such objects is not at all like working with OOP:

Existential Variable Record

inc :: Counter a -> Counter a
inc (NewCounter x i d t) = NewCounter { _this = i x, _inc = i, _display = d, tag = t }
display :: Counter a -> IO ()
display NewCounter{ _this = x, _display = d } = d x
counterA :: Counter String
counterA = NewCounter { _this = 0, _inc = (1+), _display = print, tag = "A" }
counterB :: Counter String
counterB = NewCounter { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
main = do
    display (inc counterA)         -- prints "1"
    display (inc (inc counterB))   -- prints "##"

Generic Algebraic Data Types (GADTs)

Generalized ADTs differ from ordinary ones in that they trim and specialize the final type.
Haskell uses a “functional” record of this data. Let's rewrite a simple data type in the new syntax:

data Maybe a = Nothing | Just a
data Maybe a where
    Nothing ::      Maybe a
    Just    :: a -> Maybe a
-- --------------------
data List a = Nil | Cons a (List a)
data List a where
    Nil  ::                List a
    Cons :: a -> List a -> List a

We see that the final data type is the same for all: Maybe a(or List a). OATD allow you to have different resulting data types.
If we have an abstract tree:

  data Term a where
      Lit    :: Int -> Term Int
      Succ   :: Term Int -> Term Int
      IsZero :: Term Int -> Term Bool
      If     :: Term Bool -> Term a -> Term a -> Term a
      Pair   :: Term a -> Term b -> Term (a,b)

it’s easy to build a function to calculate a similar tree:

  eval :: Term a -> a
  eval (Lit i) 	= i
  eval (Succ t)     = 1 + eval t
  eval (IsZero t)   = eval t == 0
  eval (If b e1 e2) = if eval b then eval e1 else eval e2
  eval (Pair e1 e2) = (eval e1, eval e2)

Parametric data types with limited parameters

Sometimes it really hurts to limit parametric data types. They are cut constraint type (kind) type.
View - this is essentially a type of type, for example

Just 3 :: (Maybe Int :: *)

And here is the data with a restriction on the parameter:

data F (a :: * -> *) where ...

Basically, these types are necessary for very high levels of abstraction.

Vector type

Vector with natural numbers as length

data Ze
data Su n
data Vec :: * -> * -> * where
  Nil  :: Vec a Ze
  Cons :: a -> Vec a n -> Vec a (Su n)

Conclusion

Algebraic Data Types is a very simple, universal, elegant, easily extensible and powerful toolkit for creating custom data types for very many programmers needs.

Tags: