Introduction to OCaml: Data Types and Mapping [3]

Original author: www.ocaml-tutorial.org
  • Transfer
[approx. Lane: continued translation. first part , second part ]

Linked Lists


OCaml, like Perl, has language-level built-in list support. All list items must be of the same type. To determine the type, the expression is used:

[1; 2; 3]

Note: a semicolon, not a comma.

[]means empty list.

The list has a head (the first element) and a tail (other elements except the head). The head is an element. The tail is a list. In the above example, the head is an integer of 1, and the tail is a list [2; 3].

An alternative form of notation is to use the construct operator (cons) on the form head :: tail. The lines below are completely equivalent to each other.

[1; 2; 3]
12; 3]
1 :: 2 :: [3]
1 :: 2 :: 3 :: []

Why did we mention the construction operator? It is useful when we begin the pattern matching for lists, we will discuss this a bit later.

Linked List Data Type


The data type for a linked list of integers will be int list; a generic type for a linked list of type objects foowill be foo list.

This implies that all list items must be of the same type. But we can use polymorphic lists (for example, 'a list), which are very useful when writing general functions that work with a “list of something”. (Note that this 'a listdoes not mean that the elements of the list can be of different types, you still cannot, for example, make a list consisting of a mixture of integers and strings. This form of writing means “a list of elements of any type, but of the same type ").

A good example is a function lengthdefined in a moduleList. It does not matter if the list contains integers, or strings, or objects, or small furry animals, the function List.lengthcan be used for any type of list. Thus type List.lenght:

List.length: 'a list -> int


Structures


In C and C ++ there is a concept struct(short for structure). Java has classes that can be used in a similar way, although using them requires more painstaking work.

Take a look at a simple structure in C:

struct pair_of_ints {
  int a, b;
};

The simplest equivalent in OCaml is a tuple , such as (3,4)whose type int * int. Unlike lists, tuples can contain elements of different types, so, for example, (3, “helo”, 'x') has a type int * string * char.

A somewhat more complex embodiment of the structure is to use Si'shnoy recording (record). Records, like C structures, allow you to use named elements. Tuples do not have named elements, but in return store the order in which elements appear in the tuple. Here is the equivalent to a C structure using records:

type pair_of_ints = {a: int; b: int} ;;

The above line defines the type, and below we really create an object of this type:

{a = 3; b = 5}

Note that we used ":" in the type definition and "=" when creating an object of a given type.

Here are some top-level examples (toplevel):
# type pair_of_ints = {a: int; b: int} ;;
type pair_of_ints = {a: int; b: int; }
# {a = 3; b = 5} ;;
-: pair_of_ints = {a = 3; b = 5}
# {a = 3} ;;
Some record field labels are undefined: b

OCaml prohibits leaving entries undefined.

Options (qualified associations and transfers)


The "qualified union" object type does not exist in C, although gcc has some support for it. Here is an example of how often a qualified C join is implemented:

struct foo {
  int type;
#define TYPE_INT 1
#define TYPE_PAIR_OF_INTS 2
#define TYPE_STRING 3
  union {
    int i; // If type == TYPE_INT.
    int pair [2]; // If type == TYPE_PAIR_OF_INTS.
    char * str; // If type == TYPE_STRING.
  } u;
};

As you can see, this is not a very pleasant sight. And in general, it is not too safe: a programmer can use a value u.iat a time when the structure contains a string. And the compiler here will not help to verify that all possible values ​​have been checked inside the switch expression (specifically, this problem is partially solved by using enum). The programmer may forget to set the value of the field type, which will lead to all kinds of fun and games with bugs. And all this is cumbersome.

Here is an elegant and compact version on OCaml:

type foo = Nothing | Int of int | Pair of int * int | String of string ;;

This is a type definition. The first part of each plot limited | called a constructor. It can be called whatever you like, but it must begin with a capital letter. If the constructor can be used to determine the value, it follows of partwhere it typealways starts with a lowercase letter. In the example above Nothing, this is a constant, and all other constructors are used with values.

For a real creation , the following is written:

Nothing
Int 3
Pair (4, 5)
String "hello"
     & c.

Each expression in the example above has a type foo.

Note: it is ofused in defining a type, but NOT in creating elements of a given type.

Continuing. A simple C listing is defined as

enum sign {positive, zero, negative};

It can also be written on OCaml as follows:

type sign = Positive | Zero | Negative ;;


Recursive options (for trees)


Options can be recursive. Most often, this is used to define tree structures. This is where the expressiveness of functional languages ​​becomes visible:

type binary_tree = Leaf of int | Tree of binary_tree * binary_tree ;;


Here are some binary trees. For practice, try drawing them on paper:

Leaf 3
Tree (Leaf 3, Leaf 4)
Tree (Tree (Leaf 3, Leaf 4), Leaf 5)
Tree (Tree (Leaf 3, Leaf 4), Tree (Tree (Leaf 3, Leaf 4), Leaf 5))


Parameterized Options


The binary tree in the examples above stored an integer in each sheet. But what if we want to describe the shape of a binary tree, and we will leave the specification of what to store in leaves for later? We can do this using parameterized (polymorphic) options, something like this:

type 'a binary_tree = Leaf of' a | Tree of 'a binary_tree *' a binary_tree ;;

This is a common type. The exact type that stores integers in each sheet is called int binary_tree. Similarly, the exact type that stores the rows in each sheet is called string binary_tree. In the following example, we will determine the types of some instances at the top level and let the type inference system show their types for us:

# Leaf "hello" ;;
-: string binary_tree = Leaf "hello"
# Leaf 3.0 ;;
-: float binary_tree = Leaf 3.


Note that the type name is written in reverse order. Compare this to the type name for lists, for example int list.

In fact, this is not a coincidence, which is a' listalso written in the same “in reverse order”. Lists are just parameterized options with this (slightly strange) definition:

type 'a list = [] | :: of 'a *' a list (* this is not real OCaml * code)

Although this is not a complete definition. Here is a much more accurate definition:

# type 'a list = Nil | :: of 'a *' a list ;;
type 'a list = Nil | :: of 'a *' a list
# Nil ;;
-: 'a list = Nil
# 1 :: Nil ;;
-: int list = :: (1, Nil)
# 1 :: 2 :: Nil ;;
-: int list = :: (1, :: (2, Nil))

Recall that we said earlier - lists can be written in two ways: either with a small amount of syntactic sugar [1; 2; 3], or more formally, like 1 :: 2 :: 3 :: []. If we look at the definition a' listabove, we will see an explanation of the formal notation.

Lists, structures, options - the result


Name in OCamlExample Type DefinitionUsing example
list int list [1; 2; 3]
tuple int * string (3, "hello")
record type pair = {a: int; b: string}{a = 3; b = "hello"}
variant type foo = Int of int
| Pair of int * string
Int 3
variant type sign = Positive | Zero
| Negative
Positive
zero
parameterized
variant
type 'a my_list = Empty
| Cons of 'a *' a my_list
Cons (1, Cons (2, Empty))


Pattern matching (for data types)


One of the Really Cool Features of functional programming languages ​​is the ability to parse structures and do pattern matching for data. This is not quite a “functional” feature - we can imagine some semblance of a variation in C, which will allow us to do this, but in any case, this is a Cool Feature.

Let's start with a real programming problem: I want to have a writing tool for mathematical expressions such as n * (x + y) and perform symbolic multiplication in order to get n * x + n * y from the expression above.

Let's define the type of these expressions:

type expr = Plus of expr * expr (* means a + b *)
          | Minus of expr * expr (* means a - b *)
          | Times of expr * expr (* means a * b *)
	  | Divide of expr * expr (* means a / b *)
          | Value of string (* "x", "y", "n", etc. *)
	  ;;

An expression of the form n * (x + y) can be written as:

Times (Value "n", Plus (Value "x", Value "y"))


Now let's write a function that will output (Value "n", Plus (Value "x", Value "y"))as something more similar to n * (x+y). To do this, we will write two functions, one of which converts the expression into a beautiful string and one that displays it (the reason for the separation is that perhaps I will need to write the same line to the file and I do not want to repeat the whole function for this).

let rec to_string e =
  match e with
    Plus (left, right) -> "(" ^ (to_string left) ^ "+" ^ (to_string right) ^ ")"
  | Minus (left, right) -> "(" ^ (to_string left) ^ "-" ^ (to_string right) ^ ")"
  | Times (left, right) -> "(" ^ (to_string left) ^ "*" ^ (to_string right) ^ ")"
  | Divide (left, right) -> "(" ^ (to_string left) ^ "/" ^ (to_string right) ^ ")"
  | Value v -> v
  ;;
let print_expr e =
  print_endline (to_string e) ;;

(Note: the operator is ^used to concatenate strings)

And here is the function to display on the screen:
# print_expr (Times (Value "n", Plus (Value "x", Value "y"))) ;;
(n * (x + y))


General form for pattern matching:
match object with
  pattern -> result
| pattern -> result
    ...

The left side of the pattern matching can be simple (as in the case to_stringin the example above), or complex, with attachments. The following example is our function for multiplying expressions in a form n * (x + y)or in a form (x + y) * n. For this we will use nested samples:

let rec multiply_out e =
  match e with
    Times (e1, Plus (e2, e3)) ->
      Plus (Times (multiply_out e1, multiply_out e2),
            Times (multiply_out e1, multiply_out e3))
  | Times (Plus (e1, e2), e3) ->
      Plus (Times (multiply_out e1, multiply_out e3),
            Times (multiply_out e2, multiply_out e3))
  | Plus (left, right) -> Plus (multiply_out left, multiply_out right)
  | Minus (left, right) -> Minus (multiply_out left, multiply_out right)
  | Times (left, right) -> Times (multiply_out left, multiply_out right)
  | Divide (left, right) -> Divide (multiply_out left, multiply_out right)
  | Value v -> Value v
  ;;

Here it is in work:

# print_expr (multiply_out (Times (Value "n", Plus (Value "x", Value "y")))) ;;
((n * x) + (n * y))

How does it work multiply_out? The key point is the first two samples. The first pattern is one Time (e1, Plus (e2, e3))that matches view expressions e1 * (e2 + e3). Now let's look at the right side of the first comparison with the sample and make sure that it is equivalent (e1 * e2) + (e1 * e3).

The second pattern does the same, except for the use of view expressions (e1 + e2) * e3.

The remaining samples do not change the expression, but, in fact, they recursively call multiply_outon subexpressions. This ensures that all subexpressions are multiplied (if you want to multiply only the top level of an expression, then you should replace all remaining patterns with a simple rule e -> e).

Can we do the opposite? (Factorization of the general part of subexpressions?) Of course! (although this is slightly more complicated). The version below works only for the top level of expressions. Of course, you can expand it for all levels of expression (and for more complex cases):

let factorize e =
  match e with
    Plus (Times (e1, e2), Times (e3, e4)) when e1 = e3 -> Times (e1, Plus (e2, e4))
  | Plus (Times (e1, e2), Times (e3, e4)) when e2 = e4 -> Times (Plus (e1, e3), e4)
  | e -> e
  ;;


# factorize (Plus (Times (Value "n", Value "x"), Times (Value "n", Value "y"))) ;;
-: expr = Times (Value "n", Plus (Value "x", Value "y"))

The factorization function shows us a few more features of the language. You can add custodians to each pattern match. A keeper is a condition that follows a match. Pattern matching only works if there is a match for the pattern AND the condition in is whensatisfied.

match object with
  pattern [when condition] -> result
  pattern [when condition] -> result
    ...

The second feature of the language is an operator =that checks for “structural correspondence” between expressions. This means that it goes recursively into each expression and checks them accordingly at all levels.

OCaml can verify at compile time that you have covered all the possible cases in your templates. I changed the type definition by expradding a variant to it Product:
type expr = Plus of expr * expr (* means a + b *)
          | Minus of expr * expr (* means a - b *)
          | Times of expr * expr (* means a * b *)
	  | Divide of expr * expr (* means a / b *)
	  | Product of expr list (* means a * b * c * ... *)
          | Value of string (* "x", "y", "n", etc. *)
	  ;;

Then I tried to copy the function to_stringwithout adding it. OCaml issued the following warning:
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
Product _

Also popular now: