D_Bushenko May 7, 2015 at 14:58

A REST server for a simple blog in Haskell

Some time ago I got thoroughly tired of dynamically typed languages and decided to learn something brutally static. What attracted me to Haskell was the beauty of its code and its uncompromising insistence on separating pure functions from those that produce side effects. I devoured several Haskell books and decided it was time to write something.

And then disappointment awaited me: I could not write anything beyond hello world. That is, I had a rough idea of how to write a console utility like find, but my very first encounter with IO destroyed all my plans. There seem to be plenty of libraries for Haskell, but almost no documentation for them, and examples of solving typical problems are also scarce.

The symptoms are clear and the diagnosis is simple: lack of practice. For Haskell this is quite painful, because the language is extremely unusual. Even knowing Clojure well did not help me much, because Clojure focuses on functions, while Haskell focuses on their types.

I think many newcomers to Haskell face this lack of practice. Writing something with no interface at all is not very interesting, and building a desktop or web application is quite difficult for a novice Haskeller. In this article I offer a simple example of how to write a web application server in Haskell, aimed at those who want to practice Haskell but do not know where to start.

For the most impatient: the source is here.

Let me say right away: this is not another Yesod tutorial. That framework dictates too many of its own ideas about how web applications should be built, and I do not agree with all of them. So the foundation will be Scotty, a small library that offers a neat route description syntax for the Warp web server.

Task

Design a web application server for a simple blog. The following routes will be available:

GET /articles - a list of articles.
GET /articles/:id - a single article.
POST /admin/articles - create an article.
PUT /admin/articles - update an article.
DELETE /admin/articles/:id - delete an article.

All routes that begin with "/admin" require user authentication. Basic authentication is very convenient for a stateless service, because each request carries the user's name and password.

What do you need?

Some basic knowledge of Haskell: a general understanding of monads and functors, how I/O works, etc.
The Cabal utility and the ability to use sandboxes, add libraries, and compile and run a project.
MySQL and the most basic knowledge of it.

Architecture

To implement the architecture, I propose using the following libraries.

Web server - Warp.
Router - Scotty.
Application configuration - configurator.
Database access - mysql and mysql-simple.
Database connection pool - resource-pool.
Interaction with the client - REST using JSON, with the aeson library.
Basic authentication - wai-extra, since the application will be stateless.

We will divide our application into modules.

Main.hs will contain the code to run the application, the router, and the configuration of the application.
Db.hs - everything related to access to the database.
View.hs - data presentation.
Domain.hs - domain types and functions.
Auth.hs - functions for authentication.

Getting started

Let's create a simple cabal project for our application.

	mkdir hblog
	cd hblog
	cabal init

You will need to answer a few questions: choose Executable as the project type, Main.hs as the main file, and src as the source directory. Here are the libraries that need to be added to build-depends in the hblog.cabal file:

   base                          >= 4.6        && < 4.7
 , scotty                        >= 0.9.1
 , bytestring                    >= 0.9        && < 0.11
 , text                          >= 0.11       && < 2.0
 , mysql                         >= 0.1.1.8
 , mysql-simple                  >= 0.2.2.5
 , aeson                         >= 0.6        && < 0.9
 , HTTP                          >= 4000.2.19
 , transformers                  >= 0.4.3.0
 , wai                           >= 3.0.2.3
 , wai-middleware-static         >= 0.7.0.1
 , wai-extra                     >= 3.0.7
 , resource-pool                 >= 0.2.3.2
 , configurator                  >= 0.3.0.0
 , MissingH                      >= 1.3.0.1

Now, to avoid a mess with library versions and their dependencies, we will create a sandbox.

	cabal sandbox init
	cabal install --dependencies-only

Remember to create the src/Main.hs file.

Let's see how a minimal Scotty web application works. The documentation and examples for this micro-framework are very good, so everything is clear at a glance. And if you have experience with Sinatra, Compojure or Scalatra, consider yourself lucky, because that experience carries over directly.

This is what a minimal src/Main.hs looks like:

{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Data.Monoid (mconcat)
main = scotty 3000 $ do
  get "/:word" $ do
    beam <- param "word"
    html $ mconcat ["Scotty, ", beam, " me up!"]

The first line of code may puzzle a beginner: what are overloaded strings? Let me explain.

Since I, like many others, started learning Haskell from the books “Learn You a Haskell for Great Good!” and “Real World Haskell”, working with text immediately became a problem for me. The best description of working with text in Haskell that I found is in chapter 10 of the book “Beginning Haskell”.

Very briefly, three basic string types are used in practice:

String - a list of characters. This data type is built into the language.
Text - a data type intended for both ASCII and Unicode characters. It lives in the text library and comes in two forms: strict and lazy. Read more here.
ByteString - designed for serializing strings into a stream of bytes. It comes from the bytestring library, also in two versions: strict and lazy.

Back to the OverloadedStrings pragma. With several string types around, the source would be full of calls like T.pack "Hello", where the literal "Hello" has to be converted to Text, or B.pack "Hello", where the literal has to be converted to a ByteString. The OverloadedStrings pragma removes this syntactic noise by converting a string literal to the required string type automatically.
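To make the mechanism a little more concrete, here is a minimal sketch that uses only base. The Title type and the names plain and wrapped are made up for this illustration and are not part of the blog project; the point is only that, with the pragma enabled, a literal is turned into whatever IsString type the context asks for.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

-- Hypothetical wrapper type, used only for this illustration.
newtype Title = Title String
  deriving (Show, Eq)

-- With OverloadedStrings, a literal "..." means `fromString "..."`,
-- so any type with an IsString instance can be written as a literal.
instance IsString Title where
  fromString = Title

-- Here the literal is an ordinary String ...
plain :: String
plain = "Hello"

-- ... and here it becomes a Title, with no explicit conversion call.
wrapped :: Title
wrapped = "Hello"
```

Text and ByteString provide IsString instances in the same way, which is why the Scotty example can pass plain literals as routes and response fragments.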

The Main.hs file

Main function:

main :: IO ()
main = do
-- Load the configuration file application.conf, which stores the database connection settings
    loadedConf <- C.load [C.Required "application.conf"]
    dbConf <- makeDbConfig loadedConf
    case dbConf of
      Nothing -> putStrLn "No database configuration found, terminating..."
      Just conf -> do
-- Create the connection pool (an unused connection lives for 5 seconds, at most 10 database connections)
          pool <- createPool (newConn conf) close 1 5 10
-- Start the Scotty router
          scotty 3000 $ do
-- Serve static files from the "static" directory
              middleware $ staticPolicy (noDots >-> addBase "static")
-- Log all requests. In production use logStdout instead of logStdoutDev
              middleware $ logStdoutDev
-- Require authentication for the protected routes
              middleware $ basicAuth (verifyCredentials pool)
                           "Haskell Blog Realm" { authIsProtected = protectedResources }
              get    "/articles" $ do articles <- liftIO $ listArticles pool
                                      articlesList articles
-- Read the :id parameter from the request and find the matching record in the database
              get    "/articles/:id" $ do id <- param "id" :: ActionM TL.Text
                                          maybeArticle <- liftIO $ findArticle pool id
                                          viewArticle maybeArticle
-- Parse the request body into an Article and create a new Article record in the database
              post   "/admin/articles" $ do article <- getArticleParam
                                            insertArticle pool article
                                            createdArticle article
              put    "/admin/articles" $ do article <- getArticleParam
                                            updateArticle pool article
                                            updatedArticle article
              delete "/admin/articles/:id" $ do id <- param "id" :: ActionM TL.Text
                                                deleteArticle pool id
                                                deletedArticle id

To configure the application we will use the configurator package. The configuration is stored in the application.conf file; here are its contents:

database {
  name = "hblog"
  user = "hblog"
  password = "hblog"
}

For the connection pool we use the resource-pool library. Opening a database connection is expensive, so it is better not to create one for every request but to reuse existing ones. The type of the createPool function is:

createPool :: IO a -> (a -> IO ()) -> Int -> NominalDiffTime -> Int -> IO (Pool a)
createPool create destroy numStripes idleTime maxResources

Here, create and destroy are functions for creating and terminating a database connection, numStripes is the number of separate connection sub-pools, idleTime is the lifetime of an unused connection (in seconds), maxResources is the maximum number of connections in the sub-pool.
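The reuse idea behind a pool can be sketched with base alone. The code below is not how resource-pool is implemented: it is a deliberately simplified toy with no stripes, no idle timeout and no upper bound, and the names ToyPool, newToyPool, withToyResource and demo are invented for this illustration.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar)
import Control.Exception (bracket)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- A toy pool: a list of idle resources plus an action that creates new ones.
data ToyPool a = ToyPool
  { idle   :: MVar [a]
  , create :: IO a
  }

newToyPool :: IO a -> IO (ToyPool a)
newToyPool mk = do
  var <- newMVar []
  return (ToyPool var mk)

-- Take an idle resource if there is one, otherwise create a fresh one;
-- put it back on the idle list afterwards, even if the action throws.
withToyResource :: ToyPool a -> (a -> IO b) -> IO b
withToyResource pool action = bracket acquire release action
  where
    acquire = do
      found <- modifyMVar (idle pool) $ \rs -> case rs of
        (r : rest) -> return (rest, Just r)
        []         -> return ([], Nothing)
      maybe (create pool) return found
    release r = modifyMVar_ (idle pool) (return . (r :))

-- Usage sketch: counts how many times the fake "connection" is created
-- when the pool is used three times in a row.
demo :: IO Int
demo = do
  created <- newIORef (0 :: Int)
  pool <- newToyPool (modifyIORef created (+ 1) >> return "connection")
  mapM_ (\_ -> withToyResource pool (\_ -> return ())) [1 :: Int, 2, 3]
  readIORef created
```

In demo the fake connection is created once and then reused for the second and third calls, which is exactly the saving a real pool provides for expensive database connections.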

To open a connection, use the newConn function (from Db.hs).

data DbConfig = DbConfig {
     dbName :: String,
     dbUser :: String,
     dbPassword :: String
     }
     deriving (Show, Generic)
newConn :: DbConfig -> IO Connection
newConn conf = connect defaultConnectInfo
                       { connectUser = dbUser conf
                       , connectPassword = dbPassword conf
                       , connectDatabase = dbName conf
                       }

DbConfig itself is created like this:

makeDbConfig :: C.Config -> IO (Maybe Db.DbConfig)
makeDbConfig conf = do
  name <- C.lookup conf "database.name" :: IO (Maybe String)
  user <- C.lookup conf "database.user" :: IO (Maybe String)
  password <- C.lookup conf "database.password" :: IO (Maybe String)
  return $ DbConfig <$> name
                    <*> user
                    <*> password

The function takes a Data.Configurator.Config, which we load and parse from application.conf, and returns a Maybe DbConfig wrapped in IO.

This notation may look a little cryptic to beginners, so I will try to explain what is happening here.
The type of the expression C.lookup conf "database.name" is Maybe String wrapped in IO. You can extract the value from IO like this:

name <- C.lookup conf "database.name" :: IO (Maybe String)

Accordingly, name, user and password all have type Maybe String.

The type of the DbConfig data constructor is:

DbConfig :: String -> String -> String -> DbConfig

This function takes three strings and returns a DbConfig.

The type of the function (<$>) is:

(<$>) :: Functor f => (a -> b) -> f a -> f b

That is, it takes an ordinary function and a functor, and returns a functor with the function applied to its value. In short, this is an ordinary map.

The expression DbConfig <$> name takes the string out of name (whose type is Maybe String), passes it as the first argument of the DbConfig constructor, and returns the partially applied DbConfig wrapped in Maybe:

DbConfig <$> name :: Maybe (String -> String -> DbConfig)

Note that there is now one String fewer to pass.

The type of (<*>) is similar to that of (<$>):

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

It takes a functor whose value is a function, takes another functor, and applies the function from the first functor to the value from the second, returning a new functor.

Thus, the expression DbConfig <$> name <*> user has type:

DbConfig <$> name <*> user :: Maybe (String -> DbConfig)

There remains the last String parameter, which we will fill in with the password:

DbConfig <$> name
         <*> user
         <*> password
         :: Maybe DbConfig
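The practical consequence of this chain is that a single missing setting makes the whole result Nothing. Here is a small self-contained sketch of that behaviour; the record repeats the shape of DbConfig from the article, while mkConfig, complete and incomplete are hypothetical names introduced only for the example.

```haskell
import Control.Applicative ((<$>), (<*>))

-- Same shape as the article's DbConfig, repeated so the sketch stands alone.
data DbConfig = DbConfig
  { dbName     :: String
  , dbUser     :: String
  , dbPassword :: String
  } deriving (Show, Eq)

-- Hypothetical helper: builds a DbConfig only if all three settings are present.
mkConfig :: Maybe String -> Maybe String -> Maybe String -> Maybe DbConfig
mkConfig name user password = DbConfig <$> name <*> user <*> password

-- All three values are present, so the result is a Just.
complete :: Maybe DbConfig
complete = mkConfig (Just "hblog") (Just "hblog") (Just "hblog")

-- The user is missing, so the whole result is Nothing.
incomplete :: Maybe DbConfig
incomplete = mkConfig (Just "hblog") Nothing (Just "hblog")
```

This is why main only needs a single case on the result: either every setting was found, or the configuration as a whole is treated as absent.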

Authentication

One last complex construct remains in the main function: the basicAuth middleware. The type of the basicAuth function is:

basicAuth :: CheckCreds -> AuthSettings -> Middleware

The first parameter is a function that checks the user's credentials against the database; the second determines which routes require authentication. Their types:

type CheckCreds = ByteString -> ByteString -> IO Bool
data AuthSettings = AuthSettings
    { authRealm :: !ByteString
    , authOnNoAuth :: !(ByteString -> Application)
    , authIsProtected :: !(Request -> IO Bool)
    }

The AuthSettings data type is fairly complex; if you want to understand it more deeply, see the source here. We are interested in only one field, authIsProtected. This is a function that decides, given a Request, whether authentication is required. Here is its implementation for our blog:

protectedResources ::  Request -> IO Bool
protectedResources request = do
    let path = pathInfo request
    return $ protect path
    where protect (p : _) =  p == "admin"
          protect _        =  False

The pathInfo function has the following type:

pathInfo :: Request -> [Text]

It takes a Request and returns the list of strings obtained by splitting the request path on the "/" separator.
Thus, if the request path starts with "/admin", protectedResources returns True (in IO), requiring authentication.
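The decision itself is a pure function of the path segments, so it can be tried out without a web server. The sketch below is a simplified stand-in: it works on plain String rather than Text so that it needs nothing beyond base, and segments is a hypothetical helper that only approximates the splitting WAI performs for real requests.

```haskell
-- Same logic as the article's `protect`, but on Strings instead of Text.
protect :: [String] -> Bool
protect (p : _) = p == "admin"
protect _       = False

-- Hypothetical helper: splits a path on '/' and drops empty segments.
segments :: String -> [String]
segments = filter (not . null) . splitOn '/'
  where
    splitOn c s = case break (== c) s of
      (chunk, [])       -> [chunk]
      (chunk, _ : rest) -> chunk : splitOn c rest
```

With these definitions, protect (segments "/admin/articles") is True, while protect (segments "/articles/5") and protect (segments "/") are both False.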

The verifyCredentials function, which checks the user name and password, belongs to the database interaction code, so it is covered below.

Database Interaction

Utility functions for extracting data from a database using a connection pool:

fetchSimple :: QueryResults r => Pool M.Connection -> Query -> IO [r]
fetchSimple pool sql = withResource pool retrieve
       where retrieve conn = query_ conn sql
fetch :: (QueryResults r, QueryParams q) => Pool M.Connection -> q -> Query -> IO [r]
fetch pool args sql = withResource pool retrieve
      where retrieve conn = query conn sql args

The fetchSimple function is for queries without parameters, and fetch is for queries with parameters. Data can be modified with the execSql function:

execSql :: QueryParams q => Pool M.Connection -> q -> Query -> IO Int64
execSql pool args sql = withResource pool ins
       where ins conn = execute conn sql args

If you need to use a transaction, here is the execSqlT function:

execSqlT :: QueryParams q => Pool M.Connection -> q -> Query -> IO Int64
execSqlT pool args sql = withResource pool ins
       where ins conn = withTransaction conn $ execute conn sql args

Using the fetch function you can, for example, find a user's password hash in the database by login:

findUserByLogin :: Pool Connection -> String -> IO (Maybe String)
findUserByLogin pool login = do
         res <- liftIO $ fetch pool (Only login)
                                "SELECT * FROM user WHERE login=?" :: IO [(Integer, String, String)]
         return $ password res
         where password [(_, _, pwd)] = Just pwd
               password _ = Nothing

It is used in the Auth.hs module:

verifyCredentials :: Pool Connection -> B.ByteString -> B.ByteString -> IO Bool
verifyCredentials pool user password = do
   pwd <- findUserByLogin pool (BC.unpack user)
   return $ comparePasswords pwd (BC.unpack password)
   where comparePasswords Nothing _ = False
         comparePasswords (Just p) password = p == (md5s $ Str password)

As you can see, if a password hash is found in the database, it is compared with the MD5 hash of the password from the request.

But the database stores not only users but also the articles that the blog must be able to create, edit and display. In the Domain.hs file we define the Article data type with the fields id, title and bodyText:

data Article = Article Integer Text Text
     deriving (Show)

Now we can define the CRUD functions for this type:

listArticles :: Pool Connection -> IO [Article]
listArticles pool = do
     res <- fetchSimple pool "SELECT * FROM article ORDER BY id DESC" :: IO [(Integer, TL.Text, TL.Text)]
     return $ map (\(id, title, bodyText) -> Article id title bodyText) res
findArticle :: Pool Connection -> TL.Text -> IO (Maybe Article)
findArticle pool id = do
     res <- fetch pool (Only id) "SELECT * FROM article WHERE id=?" :: IO [(Integer, TL.Text, TL.Text)]
     return $ oneArticle res
     where oneArticle ((id, title, bodyText) : _) = Just $ Article id title bodyText
           oneArticle _ = Nothing
insertArticle :: Pool Connection -> Maybe Article -> ActionT TL.Text IO ()
insertArticle pool Nothing = return ()
insertArticle pool (Just (Article id title bodyText)) = do
     liftIO $ execSqlT pool [title, bodyText]
                            "INSERT INTO article(title, bodyText) VALUES(?,?)"
     return ()
updateArticle :: Pool Connection -> Maybe Article -> ActionT TL.Text IO ()
updateArticle pool Nothing = return ()
updateArticle pool (Just (Article id title bodyText)) = do
     liftIO $ execSqlT pool [title, bodyText, (TL.decodeUtf8 $ BL.pack $ show id)]
                            "UPDATE article SET title=?, bodyText=? WHERE id=?"
     return ()
deleteArticle :: Pool Connection -> TL.Text -> ActionT TL.Text IO ()
deleteArticle pool id = do
     liftIO $ execSqlT pool [id] "DELETE FROM article WHERE id=?"
     return ()

The most important functions here are insertArticle and updateArticle. They take a Maybe Article and insert or update the corresponding record in the database. But where does this Maybe Article come from?

It is simple: the user must pass the Article encoded as JSON in the body of the PUT or POST request. Here are the instances for decoding Article from JSON and encoding it to JSON:

instance FromJSON Article where
     parseJSON (Object v) = Article <$>
                            v .:? "id" .!= 0 <*>
                            v .:  "title"       <*>
                            v .:  "bodyText"
instance ToJSON Article where
     toJSON (Article id title bodyText) =
         object ["id" .= id,
                     "title" .= title,
                     "bodyText" .= bodyText]

To process JSON we use the aeson library; more about it here.

As you can see, the id field is optional when decoding: if it is missing from the JSON, the default value 0 is used. The id field is absent when an Article record is created, because the database itself generates the id, but it is present in an update request.
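The defaulting rule can be illustrated without aeson. In the sketch below a decoded object is modelled, purely as an assumption for this example, as a list of field names with integer values; Fields and articleId are hypothetical names, and the lookup only mirrors what `v .:? "id" .!= 0` does, it is not aeson's API.

```haskell
import Data.Maybe (fromMaybe)

-- Stand-in for a decoded JSON object: field names mapped to integer values.
-- This is not aeson's representation, only an illustration of the rule.
type Fields = [(String, Integer)]

-- Mirrors `v .:? "id" .!= 0`: use the field when present, otherwise 0.
articleId :: Fields -> Integer
articleId fields = fromMaybe 0 (lookup "id" fields)
```

A create request without an id therefore yields 0, as in articleId [], while an update request that carries the field keeps it, as in articleId [("id", 7)].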

Data presentation

Let's go back to the Main.hs file and see how we obtain the request parameters. A route parameter can be obtained with the param function:

param :: Parsable a => TL.Text -> ActionM a

And the request body can be obtained with the body function:

body :: ActionM Data.ByteString.Lazy.Internal.ByteString

Here is a function that gets the request body, parses it and returns a Maybe Article:

getArticleParam :: ActionT TL.Text IO (Maybe Article)
getArticleParam = do b <- body
                     return $ (decode b :: Maybe Article)

The last thing left is to return data to the client. To do this, define the following functions in the Views.hs file:

articlesList :: [Article] -> ActionM ()
articlesList articles = json articles
viewArticle :: Maybe Article -> ActionM ()
viewArticle Nothing = json ()
viewArticle (Just article) = json article
createdArticle :: Maybe Article -> ActionM ()
createdArticle article = json ()
updatedArticle :: Maybe Article -> ActionM ()
updatedArticle article = json ()
deletedArticle :: TL.Text -> ActionM ()
deletedArticle id = json ()

Server performance

For tests, I used a Samsung 700Z laptop with 8GB of memory and a quad-core Intel Core i7.

1000 consecutive PUT requests to create an article entry.
Average response time: 40 milliseconds, which is approximately 25 requests per second.
100 threads with 100 PUT requests each.
Average response time: 1248 milliseconds, approximately 80 concurrent requests per second.
100 threads of 1000 GET requests, returning 10 article entries.
Average response time: 165 milliseconds, approximately 600 requests per second.

Just to have something to compare against, I implemented exactly the same server in Java 7 and Spring 4 with the Tomcat 7 web server and got the following numbers.

1000 consecutive PUT requests to create an article entry.
Average response time: 51 milliseconds, which is approximately 19-20 requests per second.
100 threads with 100 PUT requests each.
Average response time: 104 milliseconds, approximately 960 concurrent requests per second.
100 threads of 1000 GET requests, returning 10 article entries.
Average response time: 26 milliseconds, approximately 3800 requests per second.
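The requests-per-second figures above follow from the average response times under one simplifying assumption: every client thread sends its next request as soon as the previous one completes. A minimal sketch of that arithmetic, where throughput is a name chosen for this example:

```haskell
-- Approximate requests per second for a given number of client threads and
-- an average response time in milliseconds, assuming each thread issues its
-- next request immediately after the previous one completes.
throughput :: Int -> Double -> Double
throughput threads avgLatencyMs = fromIntegral threads * 1000 / avgLatencyMs
```

For the Haskell server, throughput 1 40 gives 25, throughput 100 1248 gives about 80 and throughput 100 165 gives about 606. For the Java server, throughput 1 51 gives about 19.6, throughput 100 104 about 962 and throughput 100 26 about 3846, in line with the rounded numbers quoted in both lists.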

Conclusions

If you lack practice in Haskell and would like to try writing web applications in it, this article gives you an example of a simple server with CRUD operations for a single entity, Article. The application is implemented as a JSON REST service and requires basic authentication on protected routes. MySQL is used for data storage, and a connection pool is used to improve performance. Since the application does not store state in a session, it is very easy to scale horizontally; in addition, a stateless server is well suited to a microservice architecture.

Using Haskell to develop a JSON REST server gave us short and elegant source code that is also easy to maintain: refactoring, changes and additions do not require much work, because the compiler checks the correctness of every change. The downside of using Haskell is the relatively low performance of the resulting web service compared to an equivalent one written in Java.

PS

Following advice from the comments, I ran additional tests. Reducing the number of threads down to N = 8 inclusive does not affect performance. As N decreases further, performance drops, because my laptop has 8 logical cores.

Another interesting observation. If you disable saving the record to the database, the average response latency of the Haskell service drops to 6 milliseconds (!), while for the equivalent Java service it is 80 ms. That is, the bottleneck in this project is the interaction with the database; with it turned off, Haskell is about 13 times faster than the equivalent functionality in Java. Memory consumption is also several times lower: approximately 80 MB versus 400 MB.
