Introduction to OCaml: Data Types and Mapping [3]
- Transfer
[approx. Lane: continued translation. first part , second part ]
OCaml, like Perl, has language-level built-in list support. All list items must be of the same type. To determine the type, the expression is used:
Note: a semicolon, not a comma.
The list has a head (the first element) and a tail (other elements except the head). The head is an element. The tail is a list. In the above example, the head is an integer of 1, and the tail is a list
An alternative form of notation is to use the construct operator (cons) on the form
Why did we mention the construction operator? It is useful when we begin the pattern matching for lists, we will discuss this a bit later.
The data type for a linked list of integers will be
This implies that all list items must be of the same type. But we can use polymorphic lists (for example,
A good example is a function
In C and C ++ there is a concept
Take a look at a simple structure in C:
The simplest equivalent in OCaml is a tuple , such as
A somewhat more complex embodiment of the structure is to use Si'shnoy recording (record). Records, like C structures, allow you to use named elements. Tuples do not have named elements, but in return store the order in which elements appear in the tuple. Here is the equivalent to a C structure using records:
The above line defines the type, and below we really create an object of this type:
Note that we used ":" in the type definition and "=" when creating an object of a given type.
Here are some top-level examples (toplevel):
OCaml prohibits leaving entries undefined.
The "qualified union" object type does not exist in C, although gcc has some support for it. Here is an example of how often a qualified C join is implemented:
As you can see, this is not a very pleasant sight. And in general, it is not too safe: a programmer can use a value
Here is an elegant and compact version on OCaml:
This is a type definition. The first part of each plot limited | called a constructor. It can be called whatever you like, but it must begin with a capital letter. If the constructor can be used to determine the value, it follows
For a real creation , the following is written:
Each expression in the example above has a type
Note: it is
Continuing. A simple C listing is defined as
It can also be written on OCaml as follows:
Options can be recursive. Most often, this is used to define tree structures. This is where the expressiveness of functional languages becomes visible:
Here are some binary trees. For practice, try drawing them on paper:
The binary tree in the examples above stored an integer in each sheet. But what if we want to describe the shape of a binary tree, and we will leave the specification of what to store in leaves for later? We can do this using parameterized (polymorphic) options, something like this:
This is a common type. The exact type that stores integers in each sheet is called
Note that the type name is written in reverse order. Compare this to the type name for lists, for example
In fact, this is not a coincidence, which is
Although this is not a complete definition. Here is a much more accurate definition:
Recall that we said earlier - lists can be written in two ways: either with a small amount of syntactic sugar
One of the Really Cool Features of functional programming languages is the ability to parse structures and do pattern matching for data. This is not quite a “functional” feature - we can imagine some semblance of a variation in C, which will allow us to do this, but in any case, this is a Cool Feature.
Let's start with a real programming problem: I want to have a writing tool for mathematical expressions such as n * (x + y) and perform symbolic multiplication in order to get n * x + n * y from the expression above.
Let's define the type of these expressions:
An expression of the form n * (x + y) can be written as:
Now let's write a function that will output
(Note: the operator is
And here is the function to display on the screen:
General form for pattern matching:
The left side of the pattern matching can be simple (as in the case
Here it is in work:
How does it work
The second pattern does the same, except for the use of view expressions
The remaining samples do not change the expression, but, in fact, they recursively call
Can we do the opposite? (Factorization of the general part of subexpressions?) Of course! (although this is slightly more complicated). The version below works only for the top level of expressions. Of course, you can expand it for all levels of expression (and for more complex cases):
The factorization function shows us a few more features of the language. You can add custodians to each pattern match. A keeper is a condition that follows a match. Pattern matching only works if there is a match for the pattern AND the condition in is
The second feature of the language is an operator
OCaml can verify at compile time that you have covered all the possible cases in your templates. I changed the type definition by
Then I tried to copy the function
Linked Lists
OCaml, like Perl, has language-level built-in list support. All list items must be of the same type. To determine the type, the expression is used:
[1; 2; 3]
Note: a semicolon, not a comma.
[]
means empty list. The list has a head (the first element) and a tail (other elements except the head). The head is an element. The tail is a list. In the above example, the head is an integer of 1, and the tail is a list
[2; 3]
. An alternative form of notation is to use the construct operator (cons) on the form
head :: tail
. The lines below are completely equivalent to each other.[1; 2; 3] 12; 3] 1 :: 2 :: [3] 1 :: 2 :: 3 :: []
Why did we mention the construction operator? It is useful when we begin the pattern matching for lists, we will discuss this a bit later.
Linked List Data Type
The data type for a linked list of integers will be
int list
; a generic type for a linked list of type objects foo
will be foo list
. This implies that all list items must be of the same type. But we can use polymorphic lists (for example,
'a list
), which are very useful when writing general functions that work with a “list of something”. (Note that this 'a list
does not mean that the elements of the list can be of different types, you still cannot, for example, make a list consisting of a mixture of integers and strings. This form of writing means “a list of elements of any type, but of the same type "). A good example is a function
length
defined in a moduleList
. It does not matter if the list contains integers, or strings, or objects, or small furry animals, the function List.length
can be used for any type of list. Thus type List.lenght
:List.length: 'a list -> int
Structures
In C and C ++ there is a concept
struct
(short for structure). Java has classes that can be used in a similar way, although using them requires more painstaking work. Take a look at a simple structure in C:
struct pair_of_ints { int a, b; };
The simplest equivalent in OCaml is a tuple , such as
(3,4)
whose type int * int
. Unlike lists, tuples can contain elements of different types, so, for example, (3, “helo”, 'x') has a type int * string * char
. A somewhat more complex embodiment of the structure is to use Si'shnoy recording (record). Records, like C structures, allow you to use named elements. Tuples do not have named elements, but in return store the order in which elements appear in the tuple. Here is the equivalent to a C structure using records:
type pair_of_ints = {a: int; b: int} ;;
The above line defines the type, and below we really create an object of this type:
{a = 3; b = 5}
Note that we used ":" in the type definition and "=" when creating an object of a given type.
Here are some top-level examples (toplevel):
# type pair_of_ints = {a: int; b: int} ;; type pair_of_ints = {a: int; b: int; } # {a = 3; b = 5} ;; -: pair_of_ints = {a = 3; b = 5} # {a = 3} ;; Some record field labels are undefined: b
OCaml prohibits leaving entries undefined.
Options (qualified associations and transfers)
The "qualified union" object type does not exist in C, although gcc has some support for it. Here is an example of how often a qualified C join is implemented:
struct foo { int type; #define TYPE_INT 1 #define TYPE_PAIR_OF_INTS 2 #define TYPE_STRING 3 union { int i; // If type == TYPE_INT. int pair [2]; // If type == TYPE_PAIR_OF_INTS. char * str; // If type == TYPE_STRING. } u; };
As you can see, this is not a very pleasant sight. And in general, it is not too safe: a programmer can use a value
u.i
at a time when the structure contains a string. And the compiler here will not help to verify that all possible values have been checked inside the switch expression (specifically, this problem is partially solved by using enum
). The programmer may forget to set the value of the field type
, which will lead to all kinds of fun and games with bugs. And all this is cumbersome. Here is an elegant and compact version on OCaml:
type foo = Nothing | Int of int | Pair of int * int | String of string ;;
This is a type definition. The first part of each plot limited | called a constructor. It can be called whatever you like, but it must begin with a capital letter. If the constructor can be used to determine the value, it follows
of part
where it type
always starts with a lowercase letter. In the example above Nothing
, this is a constant, and all other constructors are used with values. For a real creation , the following is written:
Nothing Int 3 Pair (4, 5) String "hello" & c.
Each expression in the example above has a type
foo
. Note: it is
of
used in defining a type, but NOT in creating elements of a given type. Continuing. A simple C listing is defined as
enum sign {positive, zero, negative};
It can also be written on OCaml as follows:
type sign = Positive | Zero | Negative ;;
Recursive options (for trees)
Options can be recursive. Most often, this is used to define tree structures. This is where the expressiveness of functional languages becomes visible:
type binary_tree = Leaf of int | Tree of binary_tree * binary_tree ;;
Here are some binary trees. For practice, try drawing them on paper:
Leaf 3 Tree (Leaf 3, Leaf 4) Tree (Tree (Leaf 3, Leaf 4), Leaf 5) Tree (Tree (Leaf 3, Leaf 4), Tree (Tree (Leaf 3, Leaf 4), Leaf 5))
Parameterized Options
The binary tree in the examples above stored an integer in each sheet. But what if we want to describe the shape of a binary tree, and we will leave the specification of what to store in leaves for later? We can do this using parameterized (polymorphic) options, something like this:
type 'a binary_tree = Leaf of' a | Tree of 'a binary_tree *' a binary_tree ;;
This is a common type. The exact type that stores integers in each sheet is called
int binary_tree
. Similarly, the exact type that stores the rows in each sheet is called string binary_tree
. In the following example, we will determine the types of some instances at the top level and let the type inference system show their types for us:# Leaf "hello" ;; -: string binary_tree = Leaf "hello" # Leaf 3.0 ;; -: float binary_tree = Leaf 3.
Note that the type name is written in reverse order. Compare this to the type name for lists, for example
int list
. In fact, this is not a coincidence, which is
a' list
also written in the same “in reverse order”. Lists are just parameterized options with this (slightly strange) definition:type 'a list = [] | :: of 'a *' a list (* this is not real OCaml * code)
Although this is not a complete definition. Here is a much more accurate definition:
# type 'a list = Nil | :: of 'a *' a list ;; type 'a list = Nil | :: of 'a *' a list # Nil ;; -: 'a list = Nil # 1 :: Nil ;; -: int list = :: (1, Nil) # 1 :: 2 :: Nil ;; -: int list = :: (1, :: (2, Nil))
Recall that we said earlier - lists can be written in two ways: either with a small amount of syntactic sugar
[1; 2; 3]
, or more formally, like 1 :: 2 :: 3 :: []
. If we look at the definition a' list
above, we will see an explanation of the formal notation.Lists, structures, options - the result
Name in OCaml | Example Type Definition | Using example |
---|---|---|
list | int list | [1; 2; 3] |
tuple | int * string | (3, "hello") |
record | type pair = {a: int; b: string} | {a = 3; b = "hello"} |
variant | type foo = Int of int | Pair of int * string | Int 3 |
variant | type sign = Positive | Zero | Negative | Positive zero |
parameterized variant | type 'a my_list = Empty | Cons of 'a *' a my_list | Cons (1, Cons (2, Empty)) |
Pattern matching (for data types)
One of the Really Cool Features of functional programming languages is the ability to parse structures and do pattern matching for data. This is not quite a “functional” feature - we can imagine some semblance of a variation in C, which will allow us to do this, but in any case, this is a Cool Feature.
Let's start with a real programming problem: I want to have a writing tool for mathematical expressions such as n * (x + y) and perform symbolic multiplication in order to get n * x + n * y from the expression above.
Let's define the type of these expressions:
type expr = Plus of expr * expr (* means a + b *) | Minus of expr * expr (* means a - b *) | Times of expr * expr (* means a * b *) | Divide of expr * expr (* means a / b *) | Value of string (* "x", "y", "n", etc. *) ;;
An expression of the form n * (x + y) can be written as:
Times (Value "n", Plus (Value "x", Value "y"))
Now let's write a function that will output
(Value "n", Plus (Value "x", Value "y"))
as something more similar to n * (x+y)
. To do this, we will write two functions, one of which converts the expression into a beautiful string and one that displays it (the reason for the separation is that perhaps I will need to write the same line to the file and I do not want to repeat the whole function for this).let rec to_string e = match e with Plus (left, right) -> "(" ^ (to_string left) ^ "+" ^ (to_string right) ^ ")" | Minus (left, right) -> "(" ^ (to_string left) ^ "-" ^ (to_string right) ^ ")" | Times (left, right) -> "(" ^ (to_string left) ^ "*" ^ (to_string right) ^ ")" | Divide (left, right) -> "(" ^ (to_string left) ^ "/" ^ (to_string right) ^ ")" | Value v -> v ;; let print_expr e = print_endline (to_string e) ;;
(Note: the operator is
^
used to concatenate strings) And here is the function to display on the screen:
# print_expr (Times (Value "n", Plus (Value "x", Value "y"))) ;; (n * (x + y))
General form for pattern matching:
match object with pattern -> result | pattern -> result ...
The left side of the pattern matching can be simple (as in the case
to_string
in the example above), or complex, with attachments. The following example is our function for multiplying expressions in a form n * (x + y)
or in a form (x + y) * n
. For this we will use nested samples:let rec multiply_out e = match e with Times (e1, Plus (e2, e3)) -> Plus (Times (multiply_out e1, multiply_out e2), Times (multiply_out e1, multiply_out e3)) | Times (Plus (e1, e2), e3) -> Plus (Times (multiply_out e1, multiply_out e3), Times (multiply_out e2, multiply_out e3)) | Plus (left, right) -> Plus (multiply_out left, multiply_out right) | Minus (left, right) -> Minus (multiply_out left, multiply_out right) | Times (left, right) -> Times (multiply_out left, multiply_out right) | Divide (left, right) -> Divide (multiply_out left, multiply_out right) | Value v -> Value v ;;
Here it is in work:
# print_expr (multiply_out (Times (Value "n", Plus (Value "x", Value "y")))) ;; ((n * x) + (n * y))
How does it work
multiply_out
? The key point is the first two samples. The first pattern is one Time (e1, Plus (e2, e3))
that matches view expressions e1 * (e2 + e3)
. Now let's look at the right side of the first comparison with the sample and make sure that it is equivalent (e1 * e2) + (e1 * e3)
. The second pattern does the same, except for the use of view expressions
(e1 + e2) * e3
. The remaining samples do not change the expression, but, in fact, they recursively call
multiply_out
on subexpressions. This ensures that all subexpressions are multiplied (if you want to multiply only the top level of an expression, then you should replace all remaining patterns with a simple rule e -> e
).Can we do the opposite? (Factorization of the general part of subexpressions?) Of course! (although this is slightly more complicated). The version below works only for the top level of expressions. Of course, you can expand it for all levels of expression (and for more complex cases):
let factorize e = match e with Plus (Times (e1, e2), Times (e3, e4)) when e1 = e3 -> Times (e1, Plus (e2, e4)) | Plus (Times (e1, e2), Times (e3, e4)) when e2 = e4 -> Times (Plus (e1, e3), e4) | e -> e ;;
# factorize (Plus (Times (Value "n", Value "x"), Times (Value "n", Value "y"))) ;; -: expr = Times (Value "n", Plus (Value "x", Value "y"))
The factorization function shows us a few more features of the language. You can add custodians to each pattern match. A keeper is a condition that follows a match. Pattern matching only works if there is a match for the pattern AND the condition in is
when
satisfied.match object with pattern [when condition] -> result pattern [when condition] -> result ...
The second feature of the language is an operator
=
that checks for “structural correspondence” between expressions. This means that it goes recursively into each expression and checks them accordingly at all levels. OCaml can verify at compile time that you have covered all the possible cases in your templates. I changed the type definition by
expr
adding a variant to it Product
:type expr = Plus of expr * expr (* means a + b *) | Minus of expr * expr (* means a - b *) | Times of expr * expr (* means a * b *) | Divide of expr * expr (* means a / b *) | Product of expr list (* means a * b * c * ... *) | Value of string (* "x", "y", "n", etc. *) ;;
Then I tried to copy the function
to_string
without adding it. OCaml issued the following warning:Warning: this pattern-matching is not exhaustive. Here is an example of a value that is not matched: Product _