Parallel quick sort on Haskell and how hard it was to write

Original author: Jon Harrop
  • Transfer
Note perev .: This is a translation of the story of how hard it was to write parallel quicksort in Haskell. The original article was written in 2010, but it seems to me that it is still instructive and largely relevant.

There are many examples of how Haskell makes simple problems difficult. Probably the most famous of them is the sieve of Eratosthenes, which is easy to write in any imperative language, but so difficult to write in Haskell that almost all the solutions that were taught at universities and used in research for the past 18 years turned out to be wrong. Melissa O'Neill drew attention to their failure in her important scientific work " The Real Sieve of Eratosthenes". It provides an excellent description of what is wrong with the old approaches and how to fix them. Melissa’s decision was to use the priority queue to implement the sieve. The correct solution turned out to be 10 times longer than the much simpler solution F # and a whole 100 times longer than the original mutilated algorithm in Haskell.

Today quick sorting is the new sieve of Eratosthenes. And Haskell programmers have again overcome the inability of the language to express this algorithm by mutilating the latter . The new version is slower by p Ordinary, but it can be easily recorded on Haskell.

qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

This code is completely inconsistent with the essence of the real quick sort algorithm, which makes it so efficient (see Tony Hoar's original article on fast sorting , 1962 ). Namely, partitioning an array without additional memory allocation [in-place partitioning using swaps].

Faced with writing a general-purpose parallel parallel quicksort in Haskell, Jim Apple (who is writing a Haskell Ph.D. at the University of California, Davis, UC Davis) started the case by writing the following code:

import Data.HashTable as H
import Data.Array.IO
import Control.Parallel.Strategies
import Control.Monad
import System
exch a i r =
    do tmpi <- readArray a i
       tmpr <- readArray a r
       writeArray a i tmpr
       writeArray a i tmpi
bool a b c = if c then a else b
quicksort arr l r =
  if r <= l then return () else do
    i <- loop (l-1) r =<< readArray arr r
    exch arr i r
    withStrategy rpar $ quicksort arr l (i-1)
    quicksort arr (i+1) r
  where
    loop i j v = do
      (i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1))
      if (i' < j') then exch arr i' j' >> loop i' j' v
                   else return i'
    find p f i = if i == l then return i
                 else bool (return i) (find p f (f i)) . p =<< readArray arr i
main = 
    do [testSize] <- fmap (fmap read) getArgs
       arr <- testPar testSize
       ans <- readArray arr  (testSize `div` 2)
       print ans
testPar testSize =
    do x <- testArray testSize
       quicksort x 0 (testSize - 1)
       return x
testArray :: Int -> IO (IOArray Int Double)
testArray testSize = 
    do ans <- newListArray (0,testSize-1) [fromIntegral $ H.hashString $ show i | i <- [1..testSize]]
       return ans

This algorithm uses Haskell’s parallel “ strategies ”. This concept was developed to give Haskell programmers more control over parallelization, but it turned out that the only available implementation was memory and no one was able to make it work in this code: Jim's solution contains a multithreading error [concurrency], due to which it returns incorrect results on almost every call.

Then Peaker proposed his solution on Haskell:

import Data.Array.IO
import Control.Monad
import Control.Concurrent
bool t _f True = t
bool _t f False = f
swap arr i j = do
  (iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j)
  writeArray arr i jv
  writeArray arr j iv
parallel fg bg = do
  m <- newEmptyMVar
  forkIO (bg >> putMVar m ())
  fg >> takeMVar m
sort arr left right = when (left < right) $ do
  pivot <- read right
  loop pivot left (right - 1) (left - 1) right
  where
    read = readArray arr
    sw = swap arr
    find n pred i = bool (find n pred (n i)) (return i) . pred i =<< read i
    move op d i pivot = bool (return op)
                        (sw (d op) i >> return (d op)) =<<
                        liftM (/=pivot) (read i)
    loop pivot oi oj op oq = do
      i <- find (+1) (const ( cell>pivot && idx/=left) oj
      if i < j
        then do
          sw i j
          p <- move op (+1) i pivot
          q <- move oq (subtract 1) j pivot
          loop pivot (i + 1) (j - 1) p q
        else do
          sw i right
          forM_ (zip [left..op-1] [i-1,i-2..]) $ uncurry sw
          forM_ (zip [right-1,right-2..oq+1] [i+1..]) $ uncurry sw
          let ni = if left >= op then i + 1 else right + i - oq
              nj = if right-1 <= oq then i - 1 else left + i - op
          let thresh = 1024
              strat = if nj - left < thresh || right - ni < thresh
                      then (>>)
                      else parallel
          sort arr left nj `strat` sort arr ni right
main = do
  arr <- newListArray (0, 5) [3,1,7,2,4,8]
  getElems arr >>= print
  sort (arr :: IOArray Int Int) 0 5
  getElems arr >>= print

This solution also turned out to be with bugs. Firstly, it contains a more subtle concurrency bug, which leads to incorrect results only occasionally. Picker fixed this bug in the following code:

import System.Time
import System.Random
import Data.Array.IO
import Control.Monad
import Control.Concurrent
import Control.Exception
import qualified Data.List as L
bool t _ True = t
bool _ f False = f
swap arr i j = do
  (iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j)
  writeArray arr i jv
  writeArray arr j iv
background task = do
  m <- newEmptyMVar
  forkIO (task >>= putMVar m)
  return $ takeMVar m
parallel fg bg = do
  wait <- background bg
  fg >> wait
sort arr left right = when (left < right) $ do
  pivot <- read right
  loop pivot left (right - 1) (left - 1) right
  where
    read = readArray arr
    sw = swap arr
    find n pred i = bool (find n pred (n i)) (return i) . pred i =<< read i
    move op d i pivot = bool (return op)
                        (sw (d op) i >> return (d op)) =<<
                        liftM (/=pivot) (read i)
    swapRange px x nx y ny = if px x then sw x y >> swapRange px (nx x) nx (ny y) ny else return y
    loop pivot oi oj op oq = do
      i <- find (+1) (const ( cell>pivot && idx/=left) oj
      if i < j
        then do
          sw i j
          p <- move op (+1) i pivot
          q <- move oq (subtract 1) j pivot
          loop pivot (i + 1) (j - 1) p q
        else do
          sw i right
          nj <- swapRange (oq) (right-1) (subtract 1) (i+1) (+1)
          let thresh = 1024000
              strat = if nj - left < thresh || right - ni < thresh
                      then (>>)
                      else parallel
          sort arr left nj `strat` sort arr ni right
timed act = do
  TOD beforeSec beforeUSec <- getClockTime
  x <- act
  TOD afterSec afterUSec <- getClockTime
  return (fromIntegral (afterSec - beforeSec) +
          fromIntegral (afterUSec - beforeUSec) / 1000000000000, x)
main = do
  let n = 1000000
  putStrLn "Making rands"
  arr <- newListArray (0, n-1) =<< replicateM n (randomRIO (0, 1000000) >>= evaluate)
  elems <- getElems arr
  putStrLn "Now starting sort"
  (timing, _) <- timed $ sort (arr :: IOArray Int Int) 0 (n-1)
  print . (L.sort elems ==) =<< getElems arr
  putStrLn $ "Sort took " ++ show timing ++ " seconds"

This solution does work on small input arrays, but increasing the size of the array to 1,000,000 elements causes the stack to overflow. Two attempts were made to analyze this bug, here and here , but both turned out to be wrong. In fact, this is a bug in the getElems function of the Haskell standard library, which overflows the stack on large arrays.

Oddly enough, the fixes to a few more bugs seem to have led to the implementation of the world's first parallel general purpose quicksort written in Haskell. Moreover, the final solution on Haskell is only about 55% slower than the equivalent solution on F #. Be careful, this solution requires the latest version of GHC, which was released a few weeks ago ( approx. Pere .: article 2010, so the reader has nothing to worry about ).

First comments on the original article


Ganesh Sittampalam:
Congratulations on learning how to fork and synchronize in Haskell!

Jon Harrop (original author):
Congratulations on testing your theory that it will be "trivial" ...

Also popular now: