Principles of Fast Haskell under GHC

GHC (Glasgow Haskell Compiler) is the standard Haskell compiler. GHC is one of the most advanced compilers in the world, but unfortunately, without extra effort, the programs it produces behave more like interpreted ones, that is, they run very slowly. However, if you unleash the full potential of the compiler, Haskell comes close in performance to equivalent C code.

In this article, I summarize the experience of squeezing the maximum out of GHC while building the Yarr dataflow framework.

Taken individually, the observations in this article are rather trivial, but I think it will serve as a memo or an entry point into the topic. The principles described are empirical; I do not know GHC's internals, nor compiler theory in general, so if accepted terms already exist for any of this, please suggest them in the comments.

The current version of GHC is 7.6.1.

One final note: in fact, with "default" GHC things are not as bad as described in the first paragraph, provided you at least compile with the -O2 option. Sometimes the compiler produces fast machine code without any prompting from the programmer, but this depends entirely on the alignment of the stars: changing a single letter in the program can be enough for GHC to suddenly stop understanding how to optimize it.

There are two main areas of optimization that GHC performs: inlining functions and unboxing data structures.

They support each other: if data structures are not unboxed, functions inline worse, and vice versa. Therefore it makes no sense to follow only a couple of the tips below; there will be almost no result.

Data types

Avoid data types with multiple constructors

This includes Maybe, Either, and lists, no matter how "standard" they are. The reason is obvious: such data types cannot be unboxed, because their structure is not statically known.

The only exception is enumeration types, whose constructors take no parameters: Bool, Ordering, etc.

If you think about it, lists in Haskell are just singly linked lists, which are avoided in imperative languages because of their slowness. So why not use vectors instead (and ByteString instead of standard strings)? They are hardly inferior to lists in convenience and capability. For replacing short lists of a few elements, vectors of statically known length are even better.
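As a quick sketch (assuming the vector package; the function name is illustrative), replacing a list pipeline with an unboxed vector lets GHC fuse the intermediate steps into a tight loop:

```haskell
import qualified Data.Vector.Unboxed as U

-- Sum of squares over an unboxed vector instead of a list;
-- the intermediate vector from U.map is fused away.
sumSquares :: U.Vector Int -> Int
sumSquares = U.sum . U.map (\x -> x * x)

main :: IO ()
main = print (sumSquares (U.enumFromN 1 4))  -- 1 + 4 + 9 + 16 = 30
```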

See also: HaskellWiki - Performance / Data types .

Declare fields in structures strict unless you know exactly why they should be lazy

A regular, lazy field is a reference to a closure that will compute the field's value. A strict field is a reference to the field's value. An unboxed field is the value itself (if the value is in turn a structure, it is embedded wholesale into the parent). An unboxed field is obtained by adding the {-# UNPACK #-} pragma to a strict field.

As with lists, you can live quite well without option 1. In each case, make an informed choice between options 2 and 3. Fields by reference are sometimes more memory-efficient, and faster when constructing structures and checking equality. Unboxed fields are faster to read. If you mostly need unboxed fields (most likely you do), compile with the -funbox-strict-fields flag and add the {-# NOUNPACK #-} pragma to the "reference" fields.
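The three flavors of field can be written out as a sketch (type names are illustrative):

```haskell
-- Three flavors of an Int field:
data Lazy    = Lazy    Int                  -- lazy: pointer to a thunk
data Strict  = Strict  !Int                 -- strict: pointer to an evaluated Int
data Unboxed = Unboxed {-# UNPACK #-} !Int  -- unboxed: the raw Int# stored inline
```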

See also: Haskell Style Guide / Dealing with laziness .

Explicitly evaluate structures before reuse

In this example, the structure t is used many times while mapping over a vector:
import qualified Data.Vector as V

data T = T !Int !Int

fSum :: T -> Int
fSum (T x y) = x + y

-- given some t :: T in scope:
V.map (+ fSum t) longVectorOfInts

GHC does not hoist the evaluation of the structure out of the "loop":
-- Roughly how GHC compiles the last line:
V.map (\x -> case t of (T y z) -> x + y + z) longVectorOfInts

In machine code, every iteration will chase the pointer and re-read the same fields.

However, if you "mention" the structure to the compiler outside the loop, it does not repeat calculations already performed higher up in the AST (abstract syntax tree):
case t of (T x y) -> V.map (+ (fSum t)) longVectorOfInts
-- GHC compiles this properly:
case t of (T x y) -> let s = x + y in V.map (\x -> x + s) longVectorOfInts

Instead of writing the cases by hand, you can use the generalized forcing from the deepseq package.
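A hedged sketch of that approach (using a plain list instead of a vector for brevity; NFData and deepseq come from the deepseq package):

```haskell
import Control.DeepSeq (NFData (..), deepseq)

data T = T !Int !Int

-- Both fields are strict, so evaluation to WHNF is already "deep":
instance NFData T where
    rnf (T _ _) = ()

fSum :: T -> Int
fSum (T x y) = x + y

-- deepseq forces t once, before the loop, instead of on every iteration:
process :: T -> [Int] -> [Int]
process t xs = t `deepseq` map (+ fSum t) xs
```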

The example is not very convincing, because the structure t is used only once and you could simply bind fSum t to a variable, but I think the essence of the technique is clear.


Functions

Rule number one: add an INLINE pragma to all small functions that are called often (at runtime) and live far from the root of the AST. If functions are not inlined, the tips in this section become harmful.
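For instance, the kind of "leaf" function this rule targets might look like this (dot2 is a made-up name):

```haskell
-- A small function called in an inner loop; without the pragma
-- GHC may leave it as an out-of-line call and block further optimization.
{-# INLINE dot2 #-}
dot2 :: (Double, Double) -> (Double, Double) -> Double
dot2 (x1, y1) (x2, y2) = x1 * x2 + y1 * y2
```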

More functions, lambdas, currying!

Or, to put it smartly, use continuation-passing style.

The advantage of closures over data structures is that GHC is much more positively disposed toward combining and reducing them.

For example, in the Yarr framework I use a whole system of functions to parameterize computations instead of configuration structures. The diagram shows the hierarchy of function types; an arrow "A → B" means "B is obtained from A by partial application", and the arguments, which are in turn also functions, are shown in burgundy:

To guarantee that a partially applied function will be specialized, split the arguments with a lambda:
f :: A -> B -> C
{-# INLINE f #-}
-- Instead of:
f a b =
    ... -- body
  where ... -- declarations
-- write:
f a = \b ->
    let ... -- declarations
    in  ... -- body

This trick is described in the INLINE pragma section of the GHC documentation.
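As a sketch of the trick instantiated for a concrete loop (the names fill, get, and write are illustrative, not Yarr's actual API):

```haskell
-- 'fill' is parameterized by a reader and a writer; after the lambda split,
-- a partial application 'fill get write' can still be inlined and specialized.
fill :: (Int -> Double)          -- read an element
     -> (Int -> Double -> IO ()) -- write an element
     -> Int -> Int -> IO ()      -- loop over [start, end)
{-# INLINE fill #-}
fill get write = \start end ->
    let go i | i >= end  = return ()
             | otherwise = write i (get i) >> go (i + 1)
    in  go start
```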

Be careful with generic functions

A call to a generic function (akin to true runtime polymorphism in imperative languages) cannot be inlined, by definition. Therefore such functions must be specialized at the call site. If GHC knows the exact type of the argument and the generic function itself carries an INLINE pragma, it usually does so. But when genericity stretches across several levels of functions, GHC fails to specialize.
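When inlining alone does not trigger specialization, the SPECIALIZE pragma can force GHC to generate a monomorphic copy. A small sketch (sumSq is a made-up example):

```haskell
import Data.List (foldl')

-- A generic function: as written, every call goes through a Num dictionary.
{-# INLINE sumSq #-}
sumSq :: Num a => [a] -> a
sumSq = foldl' (\acc x -> acc + x * x) 0

-- Force a monomorphic copy for Double, for call sites where inlining fails:
{-# SPECIALIZE sumSq :: [Double] -> Double #-}
```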

Lift everything you can to the type level

If the goal is a fast program, it is obviously better to spend longer compiling it than executing it. Haskell has many mechanisms for moving computation to the compilation stage.

In Yarr, I used marker index types for static double dispatch, static arithmetic (the type-level library ships ready-made type-level arithmetic and logic), and a few other things.
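A minimal sketch of marker index types (the names D1, D2, Index, and rank are illustrative, not Yarr's actual types):

```haskell
{-# LANGUAGE EmptyDataDecls #-}

data D1   -- marker type: one dimension
data D2   -- marker type: two dimensions

newtype Index d = Index [Int]   -- phantom-indexed container

class Dim d where
    rank :: Index d -> Int      -- dispatched at compile time, no runtime tag

instance Dim D1 where rank _ = 1
instance Dim D2 where rank _ = 2
```

Since the instance is chosen from the phantom parameter d at compile time, each call to rank is resolved statically and can be inlined to a constant.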

Compilation flags

-O2, always.

-funbox-strict-fields was discussed above.

Except in rare cases, the LLVM backend generates faster machine code: -fllvm -optlo-O3.

To keep GHC from reading coffee grounds when deciding what to inline, and make it inline whatever it is told to, Ben Lippmeier recommends the flags -funfolding-use-threshold1000 -funfolding-keeness-factor1000.
