Hasura: High-Performance GraphQL-to-SQL Engine Architecture
Hi, Habr! I present to your attention a translation of the article “Architecture of high performance GraphQL to SQL engine”.
This is a translation of an article about how Hasura is designed internally and which optimizations and architectural decisions it incorporates. Hasura is a high-performance, lightweight GraphQL server that acts as a layer between your web application and a PostgreSQL database.
It can generate a GraphQL schema from an existing database or create a new one. It supports GraphQL Subscriptions out of the box via Postgres triggers, dynamic access control, automatic generation of joins, solves the N + 1 query problem (batching), and much more.

You can use foreign key constraints in PostgreSQL to fetch hierarchical data in a single query. For example, you can execute this query to get albums and their corresponding tracks (assuming the “track” table has a foreign key pointing to the “album” table):
{
  album (where: {year: {_eq: 2018}}) {
    title
    tracks {
      id
      title
    }
  }
}
As you may have guessed, you can request data of arbitrary depth. This API, combined with access control, lets web applications query data from PostgreSQL without writing their own backend. It is designed to execute queries quickly with high throughput, while keeping CPU time and memory consumption on the server low. We will talk about the architectural decisions that allowed us to achieve this.
Query life cycle
The request sent to Hasura goes through the following stages:
- Session handling: the request reaches the gateway, which verifies the key (if any) and adds headers such as the user ID and role.
- Request parsing: Hasura receives the request, parses the headers to extract user information, and builds a GraphQL AST from the request body.
- Request validation: the query is checked for semantic correctness, then the access rules for the user's role are applied.
- Query execution: the query is converted to SQL and sent to Postgres.
- Response generation: the result of the SQL query is processed and sent to the client (the gateway may apply gzip if needed).
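The stages above can be sketched as a small pipeline of functions. This is purely illustrative: all names, shapes, and the stubbed result are invented for the example and are not Hasura's actual internals.

```python
# Toy sketch of the request life cycle; every name here is illustrative.

def authenticate(request):
    # Gateway step: verify the key and attach user id / role headers.
    headers = dict(request.get("headers", {}))
    headers.setdefault("x-user-id", "anonymous")
    headers.setdefault("x-user-role", "public")
    return {**request, "headers": headers}

def parse(request):
    # Build a (toy) AST from the request body.
    return {**request, "ast": {"query": request["body"].strip()}}

def validate(request):
    # Semantic checks and role-based permissions would go here.
    if not request["ast"]["query"].startswith("{"):
        raise ValueError("not a GraphQL document")
    return request

def execute(request):
    # Translate the AST to SQL and run it against Postgres (stubbed here).
    return {"rows": [{"title": "Album1"}]}

def handle(request):
    for stage in (authenticate, parse, validate):
        request = stage(request)
    return execute(request)

result = handle({"body": "{ album { title } }", "headers": {}})
```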
Goals
The requirements are approximately as follows:
- The HTTP stack should add minimal overhead and allow many concurrent requests to be processed for high throughput.
- Fast SQL generation from a GraphQL query.
- The generated SQL query must be efficient for Postgres.
- The result of the SQL query must be transferred back from Postgres efficiently.
Processing GraphQL Request
There are several approaches to obtaining the data required for a GraphQL query:
Conventional resolvers
Executing a GraphQL query usually involves calling a resolver for each field.
In the example query, we fetch the albums released in 2018 and then, for each of them, request the corresponding tracks: the classic N + 1 query problem. The number of queries grows exponentially with query depth.
The requests made to Postgres will be:
SELECT id, title FROM album WHERE year = 2018;
This query returns all the albums. Let N be the number of albums returned. Then for each album we would execute the following query:
SELECT id, title FROM track WHERE album_id = <album-id>;
In total, you end up with N + 1 queries to fetch all the necessary data.
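The N + 1 pattern is easy to reproduce. Here is a sketch using Python's built-in sqlite3 module as a stand-in for Postgres; the schema and data are invented for the example.

```python
import sqlite3

# In-memory database standing in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT,
                        album_id INTEGER REFERENCES album(id));
    INSERT INTO album VALUES (1, 'Album1', 2018), (2, 'Album2', 2018);
    INSERT INTO track VALUES (1, 'track1', 1), (2, 'track2', 1);
""")

query_count = 0

def run(sql, params=()):
    # Count every round trip to the database.
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# 1 query for the root selection...
albums = run("SELECT id, title FROM album WHERE year = ?", (2018,))

# ...plus N queries, one per album, for its tracks.
tracks = {album_id: run("SELECT id, title FROM track WHERE album_id = ?",
                        (album_id,))
          for album_id, _ in albums}

# For 2 albums this makes N + 1 = 3 queries.
```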
Batching requests
Tools like dataloader solve the N + 1 problem with batching. The number of SQL queries for nested data no longer depends on the size of the initial result set; it depends only on the number of nodes in the GraphQL query. In this case, two queries to Postgres suffice to obtain the required data:
Get the albums:
SELECT id, title FROM album WHERE year = 2018;
Get the tracks for the albums returned by the previous query:
SELECT id, title FROM track WHERE album_id IN (<list of album ids>);
That makes two queries in total. We avoided running a separate SQL query for the tracks of each individual album; instead, a single WHERE clause fetches all the necessary tracks at once.
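The batched version of the same fetch, again sketched with sqlite3 standing in for Postgres (schema and data invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT,
                        album_id INTEGER REFERENCES album(id));
    INSERT INTO album VALUES (1, 'Album1', 2018), (2, 'Album2', 2018);
    INSERT INTO track VALUES (1, 'track1', 1), (2, 'track2', 1);
""")

# Query 1: the root selection.
albums = conn.execute(
    "SELECT id, title FROM album WHERE year = ?", (2018,)).fetchall()

# Query 2: one batched IN (...) query for every album's tracks.
album_ids = [album_id for album_id, _ in albums]
placeholders = ", ".join("?" * len(album_ids))
tracks = conn.execute(
    f"SELECT id, title, album_id FROM track "
    f"WHERE album_id IN ({placeholders})", album_ids).fetchall()

# Two queries in total, no matter how many albums matched.
```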
Joins
Dataloader is designed to work with different data sources and therefore cannot exploit the capabilities of any particular one. In our case, the only data source is Postgres, which, like all relational databases, can collect data from several tables in a single query using JOINs. We can determine all the tables needed for a GraphQL query and generate one SQL query with JOINs to fetch all the data. It turns out that the data required for any GraphQL query can be obtained with a single SQL query. This data is transformed before being sent to the client.
Such a request:
SELECT
  album.id as album_id,
  album.title as album_title,
  track.id as track_id,
  track.title as track_title
FROM
  album
LEFT OUTER JOIN
  track
ON
  (album.id = track.album_id)
WHERE
  album.year = 2018;
Returns the following data:
album_id | album_title | track_id | track_title
-------- | ----------- | -------- | -----------
1        | Album1      | 1        | track1
1        | Album1      | 2        | track2
2        | Album2      | NULL     | NULL
This is then converted to JSON and sent to the client:
[
  {
    "title": "Album1",
    "tracks": [
      {"id": 1, "title": "track1"},
      {"id": 2, "title": "track2"}
    ]
  },
  {
    "title": "Album2",
    "tracks": []
  }
]
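Folding the flat join rows into this nested JSON is a simple grouping pass. Here is a Python sketch of the conversion (Hasura originally did this work in Haskell):

```python
import json

# The flat rows produced by the LEFT OUTER JOIN above.
rows = [
    (1, "Album1", 1, "track1"),
    (1, "Album1", 2, "track2"),
    (2, "Album2", None, None),
]

# Group rows by album; a NULL track_id marks an album with no tracks.
albums = {}
for album_id, album_title, track_id, track_title in rows:
    album = albums.setdefault(album_id, {"title": album_title, "tracks": []})
    if track_id is not None:
        album["tracks"].append({"id": track_id, "title": track_title})

payload = json.dumps(list(albums.values()))
```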
Response generation optimization
We found that most of the request-processing time was spent in the function converting the SQL result to JSON.
After several attempts to optimize this function in various ways, we decided to move the work into Postgres itself. Postgres 9.4 (released around the time of Hasura's first release) added a JSON aggregation function, which let us do exactly that. After this optimization, the SQL queries began to look like this:
SELECT json_agg(r.*) FROM (
  SELECT
    album.title as title,
    json_agg(track.*) as tracks
  FROM
    album
  LEFT OUTER JOIN
    track
  ON
    (album.id = track.album_id)
  WHERE
    album.year = 2018
  GROUP BY
    album.id
) r;
The result of this query is a single row with a single column, and that value is sent to the client without any further transformation. According to our tests, this approach is about 3–6 times faster than the Haskell conversion function.
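For readers without a Postgres instance at hand, the same idea can be sketched with SQLite, whose built-in JSON functions (SQLite 3.38+, as bundled with recent Python builds) play the role of json_agg here. The exact Postgres syntax differs; schema and data are invented for the example, and every album is given at least one track to keep the aggregation simple.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT,
                        album_id INTEGER REFERENCES album(id));
    INSERT INTO album VALUES (1, 'Album1', 2018), (2, 'Album2', 2018);
    INSERT INTO track VALUES (1, 'track1', 1), (2, 'track2', 1),
                             (3, 'track3', 2);
""")

# json_group_array/json_object stand in for Postgres's json_agg; json()
# makes the inner aggregate nest as JSON instead of a quoted string.
(payload,) = conn.execute("""
    SELECT json_group_array(
               json_object('title', title, 'tracks', json(tracks)))
    FROM (
        SELECT album.title AS title,
               json_group_array(
                   json_object('id', track.id, 'title', track.title)) AS tracks
        FROM album
        LEFT OUTER JOIN track ON album.id = track.album_id
        WHERE album.year = 2018
        GROUP BY album.id
        ORDER BY album.id
    ) AS r
""").fetchone()

# payload is a single row/column already holding the final JSON string.
```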
Prepared statements
The generated SQL queries can be quite large and complex, depending on the nesting level of the query and the conditions used. Web applications typically have a set of queries that are re-executed with different parameters; for example, the previous query should run for 2017 instead of 2018. Prepared statements are a perfect fit for such cases: a recurring, complex SQL query in which only the parameters change.
Suppose such a query is executed for the first time:
{
  album (where: {year: {_eq: 2018}}) {
    title
    tracks {
      id
      title
    }
  }
}
Instead of executing the SQL query right away, we create a prepared statement for it:
PREPARE prep_1 AS SELECT json_agg(r.*) FROM (
  SELECT
    album.title as title,
    json_agg(track.*) as tracks
  FROM
    album
  LEFT OUTER JOIN
    track
  ON
    (album.id = track.album_id)
  WHERE
    album.year = $1
  GROUP BY
    album.id
) r;
After that, we immediately execute it:
EXECUTE prep_1('2018');
When you need to execute a GraphQL query for 2017, we simply call the same prepared statement with a different argument:
EXECUTE prep_1('2017');
This gives roughly a 10–20% speedup, depending on the complexity of the GraphQL query.
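Python's sqlite3 module offers an analogue of this flow: one parameterized statement whose compiled form the connection caches and re-executes with different arguments, whereas Postgres makes the two phases explicit with PREPARE and EXECUTE. Schema and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    INSERT INTO album VALUES (1, 'Album1', 2018), (2, 'Album2', 2018),
                             (3, 'Album0', 2017);
""")

# The SQL text stays constant; only the bound parameter changes,
# mirroring EXECUTE prep_1('2018') / EXECUTE prep_1('2017').
SQL = "SELECT id, title FROM album WHERE year = ?"
albums_2018 = conn.execute(SQL, (2018,)).fetchall()
albums_2017 = conn.execute(SQL, (2017,)).fetchall()
```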
Haskell
Haskell works well for several reasons:
- A compiled language with excellent performance.
- A very efficient HTTP stack (warp).
- Our previous experience with the language.
Conclusion
All the optimizations described above add up to significant performance benefits:

In fact, the low memory consumption and the negligible latency overhead compared to querying PostgreSQL directly mean that, in most cases, you can replace the ORM in your backend with GraphQL API calls.
Benchmarks:
Test setup:
- A laptop with 8 GB of RAM and an i7 CPU
- Postgres running on the same machine
- wrk as the benchmarking tool; for each query type we tried to “max out” the requests per second
- A single instance of Hasura GraphQL Engine
- Connection pool size: 50
- Dataset: chinook
Request 1: tracks_media_some
query tracks_media_some {
  tracks (where: {composer: {_eq: "Kurt Cobain"}}) {
    id
    name
    album {
      id
      title
    }
    media_type {
      name
    }
  }
}
- Requests per second: 1375 req/s
- Latency: 17.5 ms
- CPU: ~30%
- RAM: ~30 MB (Hasura) + 90 MB (Postgres)
Request 2: tracks_media_all
query tracks_media_all {
  tracks {
    id
    name
    media_type {
      name
    }
  }
}
- Requests per second: 410 req/s
- Latency: 59 ms
- CPU: ~100%
- RAM: ~30 MB (Hasura) + 130 MB (Postgres)
Request 3: albums_tracks_genre_some
query albums_tracks_genre_some {
  albums (where: {artist_id: {_eq: 127}}) {
    id
    title
    tracks {
      id
      name
      genre {
        name
      }
    }
  }
}
- Requests per second: 1029 req/s
- Latency: 24 ms
- CPU: ~30%
- RAM: ~30 MB (Hasura) + 90 MB (Postgres)
Request 4: albums_tracks_genre_all
query albums_tracks_genre_all {
  albums {
    id
    title
    tracks {
      id
      name
      genre {
        name
      }
    }
  }
}
- Requests per second: 328 req/s
- Latency: 73 ms
- CPU: 100%
- RAM: ~30 MB (Hasura) + 130 MB (Postgres)