# Haskell applicative parsers

- Tutorial

# Motivation

When I first started to learn Haskell, I was very annoyed by the widespread use of complex abstractions instead of some specific solutions. It seemed to me that it would be much better to always follow the KISS principle and write bicycles using elementary language constructions than to sort through all these types of classes in order to write one supposedly convenient structure somewhere in the end.

I lacked a good example where the efforts spent on mastering the "materiel" would pay off. For me, one of the most successful such examples was the parsers. Now I often talk about them when they ask me for which common tasks you can beautifully use Haskell.

I want to suggest that beginners also go through this path and create from scratch a small base of functions for conveniently implementing parsers, and then use it to write your own parser, the code of which will almost literally repeat the grammar that is being analyzed.

I hope someone can help it to overcome the fear of abstractions and teach *appropriate to* use them (yes, I still believe that it is sometimes more effective to write a bicycle).

I have no goal and no desire to make a Haskell course out of the article from scratch, so I assume that the reader is familiar with the syntax and independently developed simple programs. Just in case, I will briefly talk about the type classes before proceeding to the description of the implementation.

For those who have never written on Haskell, but want to understand what is going on here, I recommend that you first look at the corresponding page on Learn X in Y minutes . As an excellent Russian-language book for beginners, I advise Denis Shevchenko, "About Haskell Humanly" .

I will try to use the most simple language constructs that beginners can understand. At the end of the article there is a link to the source repository, where in some parts of the code a more convenient and short record is used, which may be less clear at a glance.

And yes, gentlemen, Haskelists, many things are explained very simply and clumsily, for particular cases, not very abstract, without the use of terms from category theory and other scary words. I am glad that you know them and of course easily mastered them. I also know them, but I do not consider it necessary to throw out such a volume of information in this context to unprepared readers.

# Type Classes

Type classes in Haskell have nothing to do with classes in C ++ and other object-oriented languages. If we draw an analogy with OOP, the types of classes are more like the overloading of methods and functions.

Classes define what actions can be performed with objects of the types that are included in the class. For example, all numbers can be compared by equality, but everything can be ordered except complex ones, and functions in general cannot be compared at all. The class of types that can be compared, called `Eq`

, ordered - `Ord`

(types do not have to be numeric). What can be printed by translating into a string belongs to a class `Show`

, it has an “opposite” class `Read`

that determines how to convert strings to objects of the desired type.

For a set of standard types of classes (such as `Eq`

, `Show`

, `Read`

...) you can ask the compiler to implement the desired functionality in the standard way using a keyword `deriving`

after determining the type:

```
data Point = Point
{ xCoord :: Float
, yCoord :: Float
} deriving (Eq, Show)
```

You can define your own type classes:

```
class PrettyPrint a where
pPrint :: a -> String
```

Here `PrettyPrint`

is the class name, `a`

is the type variable. The keyword is `where`

followed by a list of the so-called class methods, i.e. functions that can be applied to objects of type from this class.

In order to denote the belonging of a data type to a class, the following construction is used:

```
instance PrettyPrint Point where
pPrint (Point x y) = "(" ++ show x ++ ", " ++ show y ++ ")"
```

The language allows you to specify restrictions on the types of classes to which the function arguments should refer:

```
showVsPretty :: (Show a, PrettyPrint a) => a -> (String, String)
showVsPretty x = (show x, pPrint x)
```

For each function call, the compiler checks whether these requirements for the type are met, and if it fails, it displays an error (of course, this happens at the compilation stage).

```
>>> showVsPretty (Point 2 3)
("Point {xCoord = 2.0, yCoord = 3.0}","(2.0, 3.0)")
>>> showVsPretty "str"
error:
No instance for (PrettyPrint [Char]) arising from a use of ‘showVsPretty’
```

# Implementation

The parser receives as input a string that should be parsed according to predefined rules and get the value of the type we need (for example, an integer). In this case, the input line may not end, and the remainder will serve as an input for further parsing. In addition, our parser will generally be non-deterministic, i.e. will return several possible parse results as a list.

For the description of one result of the parser, a tuple of two elements is suitable `(String, a)`

, where `a`

is a type variable that can denote any custom type.

Since the parser parses the string according to some rules, we describe it as a function that takes a string as input and returns a list of results:

`newtype Parser a = Parser { unParser :: String -> [(String, a)] }`

We will consider the parsing successful if the list of results consists of one element and the input string has been completely processed. We implement an auxiliary function that attempts to perform an unambiguous parsing of the entire string:

```
parseString :: String -> Parser a -> Maybe a
parseString s (Parser p) = case (p s) of
[("", val)] -> Just val
_ -> Nothing
```

## Simple parsers

We implement several simple parsers, which will then come in handy in building more complex combinations.

We begin by parsing a single character that must satisfy the predicate. If the input string is empty, then the result of the work is an empty list. Otherwise, check the value of the predicate on the first character of the string. If the value is returned `True`

, the parsing result is this character; return it along with the rest of the string. Otherwise, parsing also ends in failure.

```
predP :: (Char -> Bool) -> Parser Char
predP p = Parser f
where
f "" = []
f (c : cs) | p c = [(cs, c)]
| otherwise = []
```

Now we can write a parser that accepts a specific character at the beginning of a line. To do this, use the newly written one `predP`

and pass it to it as an argument a function that compares its argument with the symbol we need:

```
charP :: Char -> Parser Char
charP char = predP (\c -> c == char)
```

The following simplest case: a parser that accepts only a specific string entirely. Let's call him `stringP`

. The function inside the parser compares the input string with the required one and, if the lines are equal, returns a list of one element: a pair of empty lines (nothing left at the input) and the original one. Otherwise, the parsing failed, and an empty list of results is returned.

```
stringP :: String -> Parser String
stringP s = Parser f
where
f s' | s == s' = [("", s)]
| otherwise = []
```

Quite often, you need to skip characters that have a certain property while they are at the beginning of a line (for example, whitespace characters). At the same time, the result of the analysis is not important to us and will not be useful in the future. Let's write a function `skip`

that skips the initial characters of the string, while the true value of the predicate is preserved. As a result of the analysis we use an empty tuple.

```
skip :: (Char -> Bool) -> Parser ()
skip p = Parser (\s -> [(dropWhile p s, ())])
```

The following two parsers are very similar to each other. Both check the input string prefix, only the first returns the prefix if successful, and the second returns an empty tuple, i.e. allows you to skip an arbitrary string at the beginning of the entry. For implementation, the function `isPrefixOf`

defined in the module is used `Data.List`

.

```
prefixP :: String -> Parser String
prefixP s = Parser f
where
f input = if s `isPrefixOf` input
then [(drop (length s) input, s)]
else []
skipString :: String -> Parser ()
skipString s = Parser f
where
f input = if s `isPrefixOf` input
then [(drop (length s) input, ())]
else []
```

A little later, we will look at a simpler implementation of the last function and get rid of code duplication.

## Parser as a functor

We can distinguish a whole class of container types for which the following is true: if you know how to transform objects inside a container, then you can convert the containers themselves. The simplest example is a list as a container and a function `map`

that exists in almost all high-level languages. Indeed, you can go through all the elements of a type list `[a]`

, apply a function to each `a -> b`

and get a type list `[b]`

.

This type class is called `Functor`

; the class has one method `fmap`

:

```
class Functor f where
fmap :: (a -> b) -> f a -> f b
```

Suppose that we already know how to parse strings into objects of a certain type `a`

, and, moreover, we know how to convert objects of a type `a`

into objects of a type `b`

. Is it possible to say that then there is a parser for type objects `b`

?

If we express this in the form of a function, then it will have the following type:

`(a -> b) -> Parser a -> Parser b`

This type coincides with the type of the function `fmap`

, so we will try to make the parser a functor. Create a parser of type values from scratch `b`

that will first call the first parser (we already have one), and then apply the function to the results of its parsing.

```
instance Functor Parser where
fmap :: (a -> b) -> Parser a -> Parser b
fmap f (Parser p1) = Parser p2
where
p2 :: String -> [(String, b)]
p2 s = convert (p1 s)
convert :: [(String, a)] -> [(String, b)]
convert results = map (\(s, val) -> (s, f val)) results
```

At the function `fmap`

there is a convenient infix synonym: `fmap f x == f <$> x`

.

If, as an argument, we `fmap`

use a function that simply replaces its first argument with a new value, we will get another useful operation, which has already been implemented for all functors even in two copies (they differ only in the order of the arguments):

```
(<$) :: Functor f => a -> f b -> f a
($>) :: Functor f => f a -> b -> f b
```

Remember the parser that skips a certain string ( `skipString`

)? Now you can implement it as follows:

```
skipString :: String -> Parser ()
skipString s = () <$ prefixP s
```

## Parser Combinations

In Haskell, all functions are curried by default and allow partial use. This means that the function of the `n`

arguments is actually a function of one argument, which returns a function of the `n-1`

arguments:

```
cons :: Int -> [Int] -> [Int]
cons = (:)
cons1 :: [Int] -> [Int]
cons1 = cons 1 -- функция cons применена частично
```

Apply the function of three arguments to some value inside the parser, using `fmap`

. Types will be as follows:

```
f :: c -> a -> b
p :: Parser c
(fmap f p) :: Parser (a -> b)
```

The parser of function has turned out ?! Of course, it is possible that the input line actually contains the function representation, but I would like to be able to use this function, or rather combine parsers, `Parser (a -> b)`

and `Parser a`

to get `Parser b`

:

`applyP :: Parser (a -> b) -> Parser a -> Parser b`

The type of this function is very similar to the type `fmap`

, only the function itself that needs to be applied is also in the container. This gives an intuitive understanding of how the implementation of the function should look like `applyP`

: get the function from the container (as a result of applying the first parser), get the values to which the function should be applied (the result of applying the second parser) and “pack” the values converted using this function back to container (create a new parser). In the implementation we will use list comprehension:

```
applyP :: Parser (a -> b) -> Parser a -> Parser b
applyP (Parser p1) (Parser p2) = Parser f
where f s = [ (sx, f x) | (sf, f) <- p1 s, -- p1 применяется к исходной строке
(sx, x) <- p2 sf] -- p2 применяется к строке, оставшейся после предыдущего разбора
```

There is a class `Applicative`

that has a method with the same prototype. The second class method is called `pure`

and is used to “wrap” or “lift” ( *lift* ) a value, including a functional one. In the case of implementation for the parser, the function `pure`

adds its argument to the result of the parser, without changing the input string.

```
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
instance Applicative Parser where
pure x = Parser (\s -> [(s, x)])
pf <*> px = Parser (\s -> [ (sx, f x) | (sf, f) <- unParser pf $ s,
(sx, x) <- unParser px $ sf])
```

The function `applyP`

is `<*>`

from the class `Applicative`

. Types belonging to this class are called applicative functors.

For applicative functors, two auxiliary functions are implemented that will be useful to us:

```
(*>) :: f a -> f b -> f b
(<*) :: f a -> f b -> f a
```

These functions perform two consecutive actions and return the result of only one of them. For parsers, they can be used, for example, in order to skip leading gaps before parsing the part of the line that carries the meaning.

By combining `<$>`

and `<*>`

creating very convenient designs. Consider the following data type:

```
data MyStructType = MyStruct
{ field1 :: Type1
, field2 :: Type2
, field3 :: Type3
}
```

The value constructor `MyStruct`

is also a function, in this case it has a type `Type1 -> Type2 -> Type3 -> MyStructType`

. You can work with the constructor as with any other function. Suppose that parsers are already written for the structure field types:

```
parser1 :: Parser Type1
parser2 :: Parser Type2
parser3 :: Parser Type3
```

Using the function, `fmap`

you can partially apply `MyStruct`

to the first of these parsers:

```
parserStruct' :: Parser (Type2 -> Type3 -> MyStructType)
parserStruct' = MyStruct <$> parser1
```

Let's try to continue to use the function, which is now "inside" the parser. For this you need to use `<*>`

:

```
parserStruct'' :: Parser (Type3 -> MyStructType)
parserStruct'' = parserStruct' <*> parser2
parserStruct :: Parser MyStructType
parserStruct = parserStruct'' <*> parser3
```

As a result, we got a parser for the whole structure (of course, here we use the assumption that in the initial line of the presentation of its fields go in a row). The same can be done in one line:

```
parserStruct :: Parser MyStructType
parserStruct = MyStruct <$> parser1 <*> parser2 <*> parser3
```

Such designs will often be found in the example of use.

Now suppose that we are trying to write a parser that parses simple arithmetic expressions in which integers and identifiers can be present as operands. Create a separate type for them `Operand`

:

```
data Operand
= IntOp Int
| IdentOp String
```

If we can already parse integers and identifiers (for example, as in C), then we need *one* parser for operands that can parse one or the other. This parser is an alternative of the other two, so we need a function that can combine parsers so that the results of their work are combined. The result of the parser is a list, and the union of lists is their concatenation. We implement the function `altP`

combining two parsers:

```
altP :: Parser a -> Parser a -> Parser a
altP (Parser p1) (Parser p2) = Parser (\s -> p1 s ++ p2 s)
```

Then the operand parser can be implemented using this function (here it is assumed that `parserInt`

they are `parserIdent`

already described somewhere:

```
parserOperand :: Parser Operand
parserOperand = altP parserIntOp parserIdentOp
where
parserIntOp = IntOp <$> parserInt
parserIdentOp = IdentOp <$> parserIdent
```

Of course, for the alternatives have already come up with a separate class, which is called `Alternative`

. It has another method `empty`

, describing a neutral element for an alternative operation. In our case, this is a parser that never parses anything, i.e. always returns an empty list of results. For the parser, the implementation of the class methods `Alternative`

looks like this:

```
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
instance Alternative Parser where
empty = Parser (const [])
px <|> py = Parser (\s -> unParser px s ++ unParser py s)
```

An operation `<|>`

is a function `altP`

, only in the infix notation, which is more convenient to use, combining several parsers in a row.

For all types in this class two functions are implemented, `some`

and `many`

types `f a -> f [a]`

. Each of them can be expressed through the other:

```
some v = (:) <$> v <*> many v
many v = some v <|> pure []
```

In terms of parsers, these functions allow you to parse data sequences, if you know how to parse one data element. If used, the `some`

sequence must be non-empty.

# Usage example

Now we are ready to write our parser, for example, for simple arithmetic expressions with the following grammar:

```
expr ::= constExpr | binOpExpr | negExpr
const ::= int
int ::= digit{digit}
digit ::= '0' | ... | '9'
binOpExpr ::= '(' expr ' ' binOp ' ' expr ')'
binOp ::= '+' | '*'
negExpr ::= '-' expr
```

The expression consists of integer constants, unary minus and two infix binary operations: addition and multiplication. Brackets are required around an expression with a binary operation, the operation symbol is separated from the operands by exactly one space, leading and trailing spaces are not allowed.

Examples of correct expression writing:

```
"123"
"-(10 + 42)"
"(1 + ((2 + 3) * (4 + 5)))"
```

Examples of incorrect entries:

```
" 666 "
"2 + 3"
"(10 * 10)"
```

We declare the necessary data types (the expression itself and the binary operation):

```
data Expr = ConstExpr Int
| BinaryExpr Expr Operator Expr
| NegateExpr Expr
data Operator = Add | Mul
```

You can start parsing! The expression itself consists of three alternatives. So we write:

```
-- expr ::= constExpr | binOpExpr | negExpr
exprParser :: Parser Expr
exprParser = constParser <|> binParser <|> negParser
```

The constant is a positive integer. In our data type, it is "wrapped" in the constructor, so we cannot use the parser for an integer directly, but we can use `fmap`

it to get the value of the desired type.

```
-- const ::= int
constParser :: Parser Expr
constParser = ConstExpr <$> intParser
```

The integer, according to the grammar, is represented as a non-empty sequence of numbers. To parse a single digit, we use an auxiliary function `predP`

and a predicate `isDigit`

from a module `Data.Char`

. Now, to build a parser to parse a sequence of numbers, we use the function `some`

(not `many`

, because there must be at least one digit). The result of this parser returns a list of all possible parsing options, starting with the longest entry. For example, if the input string "123ab", the results list will be as follows: `[("ab", "123"), ("3ab", "12"), ("23ab", "1")]`

. We need to parse the longest sequence of numbers and convert it to a type `Int`

. The whole implementation is as follows:

```
-- int ::= digit{digit}
-- digit ::= '0' | ... | '9'
intParser :: Parser Int
intParser = Parser $ \s -> let res = unParser (some digitParser) s in
case res of
[] -> []
((rest, i) : xs) -> [(rest, read i)]
where
digitParser = predP isDigit
```

The next variant of the expression is the use of a binary operation. According to the grammar, the input string must first include an opening bracket, the first operand, a space, an operation symbol, another space, the second operand, and a closing bracket. To parse individual characters (brackets and spaces) use the function `charP`

. Operands are expressions, and there is already a parser to parse them ( `exprParser`

). To parse the binary operation symbol, we will describe the auxiliary parser just below. It remains to carefully combine this set of parsers. At the beginning and at the end of the expression there should be brackets: you need to check this, but discard the result itself. For this we use `*>`

and `<*`

:

```
binParser :: Parser Expr
binParser = charP '(' *> ??? <* charP ')'
```

Between these parsers for brackets, the construction of the expression with the help of the constructor `BinaryExpr`

and the parsers for the expression and operation should occur . Let's not forget about the spaces around the operation symbol, using the same method as for parentheses. This part is as follows:

```
BinaryExpr <$> exprParser -- первый операнд
<*> (charP ' ' *> binOpParser <* charP ' ') -- операция, окружённая пробелами
<*> exprParser -- второй операнд
```

Substitute this expression instead of question marks:

```
-- binOpExpr ::= '(' expr ' ' binOp ' ' expr ')'
binParser :: Parser Expr
binParser =
charP '(' *>
(BinaryExpr <$> exprParser
<*> (charP ' ' *> binOpParser <* charP ' ')
<*> exprParser
)
<* charP ')'
```

A binary operation is either a symbol `+`

that understands the value `Add`

, or `*`

that understands `Mul`

:

```
-- binOp ::= '+' | '*'
binOpParser :: Parser Operator
binOpParser = plusParser <|> multParser
where
plusParser = charP '+' $> Add
multParser = charP '*' $> Mul
```

Remained the simplest part of the grammar, the negation of expression. We `-`

do the same with the symbol as with brackets and spaces. Next, apply the constructor `NegateExpr`

to the result of recursive parsing:

```
-- negExpr ::= '-' expr
negParser = charP '-' *> (NegateExpr <$> exprParser)
```

So, all parts of the parser are implemented. The code in many ways resembles a grammar and completely coincides with it in structure.

The source code is available on GitLab: https://gitlab.com/fierce-katie/applicative-parsers-demo .

It is easier to estimate its volume and degree of expressiveness there, since there are far fewer comments. The project can be assembled with the Stack utility and you can launch a primitive interpreter using the parser we wrote:

```
$ stack build
$ stack exec demo-parser
```

For those who want to practice further on their own, I can advise the following:

- Grammar can be improved in every way, for example, to allow leading and trailing spaces, add new operations, etc.
- The parser translates the string into the internal representation of the expression. This expression can be calculated and converted by the interpreter so that it prints not the result of the parsing but the result of the calculation.
- To explore the possibilities of libraries
`parsec`

,`attoparsec`

,`applicative-parsec`

and`optparse-applicative`

try to apply them.

Thanks for attention!