Principles of Fast Haskell under GHC

GHC (the Glasgow Haskell Compiler) is the standard Haskell compiler. GHC is one of the most impressive compilers in the world, but unfortunately, without extra effort the programs it produces behave more like interpreted ones, that is, they run very slowly. However, if you unleash the full potential of the compiler, Haskell comes close in performance to equivalent C code.

In this article I summarize the experience of squeezing the maximum out of GHC, gained while creating the Yarr dataflow framework.

Taken separately, the observations in this article are rather trivial, but I think it will serve as a memo or a starting point on the topic. The principles described are empirical; I do not know the internals of GHC, nor compiler theory, so if generally accepted terms exist for what I describe, please suggest them in the comments.

The current version of GHC is 7.6.1.

One final note: in fact, "default" GHC is not as bad as described in the first paragraph, provided you at least compile with -O2. Sometimes the compiler produces fast machine code without any prompting from the programmer, but this depends entirely on the alignment of the stars: changing a single letter in the program can be enough for GHC to suddenly stop understanding how to optimize it.



There are two main kinds of optimization that GHC performs:

  • unboxing (splitting apart) data structures;
  • inlining and reducing functions.

They reinforce each other: if data structures are not unboxed, functions inline worse, and vice versa. Therefore, there is no point in following only a couple of the tips below; there will be almost no effect.

Data types


Avoid data types with multiple constructors

Including Maybe, Either, and lists, no matter how "standard" they are. The reason is obvious: such data types cannot be unboxed, because their structure is not statically known.

The only exception is enumeration types, whose constructors take no parameters: Bool, Ordering, and so on.

If you think about it, Haskell lists are just singly linked lists, which are avoided in imperative languages because of their slowness. So why not use vectors instead of them in Haskell (and ByteString instead of standard strings)? They are hardly inferior to lists in convenience and capability. For replacing short lists of a few elements, vectors of statically known length are even better.
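A minimal sketch of the swap (the function names are illustrative, not from the article): an unboxed vector keeps its elements in one contiguous buffer, while a list allocates a cons cell and a boxed value per element.

import qualified Data.Vector.Unboxed as U

-- Sum of squares over an unboxed vector: no per-element heap cells,
-- and the map/sum pipeline fuses into a single loop.
sumSquares :: U.Vector Double -> Double
sumSquares = U.sum . U.map (\x -> x * x)

-- The list version allocates a cons cell and a boxed Double per element.
sumSquaresList :: [Double] -> Double
sumSquaresList = sum . map (\x -> x * x)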

See also: HaskellWiki, Performance/Data types.

Declare structure fields strict unless you know exactly why they should be lazy

A regular, lazy field is a reference to a closure that will compute the value of the field. A strict field is a reference to the field's value. An unboxed field is the value itself (if the value is itself a structure, it is embedded in the parent wholesale). An unboxed field is obtained by adding the {-# UNPACK #-} pragma to a strict field.

As in the case of lists, you can get by quite well without the first option. In each case make an informed choice between options 2 and 3. Fields stored by reference are sometimes more memory-efficient and faster when constructing structures and checking equality. Unboxed fields are faster to read. If you need unboxed fields more often (most likely), compile with the -funbox-strict-fields flag and add the {-# NOUNPACK #-} pragma to the "by reference" fields.
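A minimal sketch of the three kinds of fields side by side (an illustrative type, not one from Yarr):

data Point = Point
    { px ::                Double   -- lazy: a pointer to a thunk
    , py ::               !Double   -- strict: a pointer to an evaluated Double
    , pz :: {-# UNPACK #-} !Double  -- unboxed: the raw value stored inline
    }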

See also: Haskell Style Guide, Dealing with laziness.

Explicitly evaluate structures before repeated use

In this example, the structure t is used many times while mapping over a vector:
import qualified Data.Vector as V
...
data T = T !Int !Int
fSum (T x y) = x + y
...
    -- t :: T
    V.map (+ (fSum t)) longVectorOfInts

GHC does not hoist the evaluation of the structure out of the "loop":
-- roughly how GHC compiles the last line
V.map (\x -> case t of (T y z) -> x + y + z) longVectorOfInts

In machine code, on each iteration we will follow the reference and read the same fields over and over.

However, if you take the structure apart outside the loop, GHC will not repeat calculations it has already performed higher up in the AST (abstract syntax tree):
case t of (T x y) -> V.map (+ (fSum t)) longVectorOfInts
-- GHC compiles this properly:
case t of (T x y) -> let s = x + y in V.map (\x -> x + s) longVectorOfInts

Instead of writing the cases by hand, you can use generic forcing from the deepseq package.

The example is not the best one, because the structure t is used in a single place and you could simply bind fSum t to a variable, but I think the essence of the technique is clear.
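A hedged sketch of the deepseq route mentioned above (the NFData instance is written by hand here; with a recent deepseq it can also be derived generically):

import Control.DeepSeq (NFData (..), deepseq)
import qualified Data.Vector as V

data T = T !Int !Int

instance NFData T where
    rnf (T x y) = x `seq` y `seq` ()   -- evaluate both fields

fSum :: T -> Int
fSum (T x y) = x + y

-- t `deepseq` ... fully evaluates t before the mapping closure captures it
mapShifted :: T -> V.Vector Int -> V.Vector Int
mapShifted t vec = t `deepseq` V.map (+ fSum t) vec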

Functions


Rule number 1: add the INLINE pragma to all small functions that are called frequently (at run time) and live far from the root of the AST. If the functions are not inlined, the tips in this section become harmful.
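For example, a small leaf helper that clearly deserves the pragma (an illustrative function, not from Yarr): it is cheap to duplicate at every call site, and inlining it lets GHC keep optimizing around the call.

dot2 :: Double -> Double -> Double -> Double -> Double
{-# INLINE dot2 #-}
dot2 x1 y1 x2 y2 = x1 * x2 + y1 * y2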

More functions, lambdas, currying!

Or, to put it cleverly, use continuation-passing style.

The advantage of closures over data structures is that GHC is much more eager to combine and reduce them.

For example, in the Yarr framework I use a whole system of functions to parameterize computations instead of configuration structures. The diagram shows the hierarchy of function types; an arrow "A → B" means "B is obtained from A by partial application", and the arguments, which are themselves functions, are shown in burgundy:


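As a hedged sketch of the idea (types and names are illustrative, not Yarr's actual API), the computation is parameterized by passing the element "getter" and "writer" as closures rather than packing them into a configuration record:

type Get   = Int -> IO Double          -- read an element by index
type Write = Int -> Double -> IO ()    -- write an element by index

-- A fill loop parameterized by closures; when everything is inlined,
-- GHC combines and reduces the closures at the use site.
fillLinear :: Get -> Write -> Int -> Int -> IO ()
{-# INLINE fillLinear #-}
fillLinear get write = \start end ->
    let go i | i >= end  = return ()
             | otherwise = do
                 x <- get i
                 write i x
                 go (i + 1)
    in go start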
To make sure a partially applied function gets specialized, split off the remaining arguments with a lambda:
f :: A -> B -> C
{-# INLINE f #-}
f a b =
    ... -- body
  where ... -- declarations
-->
f a = \b ->
    let ... -- declarations
    in ... -- body

This trick is described in the INLINE pragma section of the GHC documentation.
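A concrete, hedged instance of the schema above (an illustrative function): after partial application to step, the result is a single lambda that GHC can still inline and specialize at the use site, and the former where declarations have moved into the let.

iterN :: (Double -> Double) -> Int -> Double -> Double
{-# INLINE iterN #-}
iterN step = \n x0 ->
    let go i x | i >= n    = x
               | otherwise = go (i + 1) $! step x
    in go 0 x0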

Be careful with generic functions

A call to a generic function (akin to true run-time polymorphism in imperative languages) cannot be inlined by definition. Therefore, such functions must be specialized at the call site. If GHC knows the exact type of the argument and the generic function itself carries an INLINE pragma, it usually does this. But when genericity spreads across several levels of functions, GHC fails to specialize.
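In such cases you can also request specialization explicitly. A hedged sketch with an illustrative function:

-- A generic function plus an explicit request for a monomorphic copy
-- at the type actually used in the hot path.
normSq :: Num a => a -> a -> a
{-# SPECIALIZE normSq :: Double -> Double -> Double #-}
normSq x y = x * x + y * y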

Move everything you can to the type level

If the goal is a fast program, it is obviously better for it to spend more time compiling than executing. Haskell has many mechanisms for moving computation to the compilation stage.

In Yarr, I used marker index types for static double dispatch, static arithmetic (ready-made type-level arithmetic and logic are available in the type-level library), and some other things.
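A hedged sketch of the marker-type idea (illustrative types, not the real Yarr ones): the instance is chosen at compile time from a phantom parameter, so no run-time tag is ever inspected.

{-# LANGUAGE EmptyDataDecls #-}

data RowMajor
data ColumnMajor

-- The layout marker lives only in the type; Extent stores just the sizes.
newtype Extent layout = Extent (Int, Int)

class Layout layout where
    toLinear :: Extent layout -> (Int, Int) -> Int

instance Layout RowMajor where
    {-# INLINE toLinear #-}
    toLinear (Extent (_, w)) (y, x) = y * w + x

instance Layout ColumnMajor where
    {-# INLINE toLinear #-}
    toLinear (Extent (h, _)) (y, x) = x * h + y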

Compilation flags


-O2: always.

-funbox-strict-fields was discussed above.

Except in rare cases, the LLVM backend generates faster machine code: -fllvm -optlo-O3.

To keep GHC from reading coffee grounds and make it inline everything it is told to, Ben Lippmeier recommends the flags -funfolding-use-threshold1000 -funfolding-keeness-factor1000.
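Putting the flags from this section together, a typical invocation for a hot-path module might look like this (flag spellings as above, for GHC 7.x):

ghc -O2 -funbox-strict-fields -fllvm -optlo-O3 \
    -funfolding-use-threshold1000 -funfolding-keeness-factor1000 Main.hs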
