We make the GraphQL API on PHP and MySQL. Part 3: Solving the problem of N + 1 queries

  • Tutorial
image

In this third article on creating a GraphQL server with graphql-php, I will talk about how to deal with the problem of N + 1 queries.

Foreword


I will continue to modify the code obtained at the end of the previous article . You can also see it in the article repository on Github . If you have not read the previous articles, I recommend that you read them before continuing.

I will also comment out two lines in graphql.php that add a restriction on the maximum depth and complexity of the request, since they are not very necessary during development, but can only cause debugging problems:

// DocumentValidator::addRule('QueryComplexity', new QueryComplexity(6));
// DocumentValidator::addRule('QueryDepth', new QueryDepth(1));


Problem N + 1 Queries


Problem


The easiest way to explain what the problem of N + 1 queries is with an example. Suppose you need to request a list of articles and their authors. Without hesitation, you can do it like this:

$articles = DB::table('articles')->get();
foreach ($articles as &$article) {
    $article->author = DB::table('users')->where('id', $article->author_id)->first();
}

As a rule, DB::table('articles')->get()as a result, it sends one request to the database, something like this:

SELECT * FROM articles;

And then another N queries to the database are sent in the loop:

SELECT * FROM users WHERE id = 1;
SELECT * FROM users WHERE id = 2;
SELECT * FROM users WHERE id = 3;
SELECT * FROM users WHERE id = 4;
SELECT * FROM users WHERE id = 5;
...
SELECT * FROM users WHERE id = N;

Where N is the number of articles received in the first request.

For example, we perform one request that returns 100 articles to us, and then for each article we perform one more request from the author. In total, 100 + 1 = 101 requests are obtained. This is an extra load on the database server and is called the N + 1 query problem.

Decision


The most common way to solve this problem is to group queries.

If we rewrite the same example using the query grouping, we get something like this:

$articles = DB::table('articles')->get();
$authors_ids = get_authors_ids($articles);
$authors = DB::table('users')->whereIn('id', $authors_ids)->get();
foreach ($articles as &$article) {
    $article->author = search_author_by_id($authors, $article->author_id);
}

That is, we do the following:

  1. Request an array of articles
  2. Remember the id of all authors of these articles
  3. We request an array of users by these id
  4. Insert authors into articles from an array of users

At the same time, no matter how many articles we request, only two requests will be sent to the database:

SELECT * FROM articles;
SELECT * FROM users WHERE id IN (1, 2, 3, 4, 5, ..., N);


The problem of N + 1 queries in GraphQL


Now let's go back to our GraphQL server in the state it is after the previous article, and pay attention to how the request for the number of user friends is implemented.

If we request a list of users indicating the number of friends of each, then first the GraphQL server will request all the entries from the user table:

'allUsers' => [
    'type' => Types::listOf(Types::user()),
    'description' => 'Список пользователей',
    'resolve' => function () {
        return DB::select('SELECT * from users');
    }
]

And then for each user, he will query the database for the number of his friends:

'countFriends' => [
    'type' => Types::int(),
    'description' => 'Количество друзей пользователя',
    'resolve' => function ($root) {
        return DB::affectingStatement("SELECT u.* FROM friendships f JOIN users u ON u.id = f.friend_id WHERE f.user_id = {$root->id}");
    }
]

This is where the problem of N + 1 queries comes about.

To solve this problem by grouping queries graphql-php offers us to postpone the execution of resolvers of such fields until the values ​​of all other (not deferred) fields are received.

The idea is simple: instead of the result, the “resolve” function of the field should return an object of the GraphQL \ Deferred class, into the constructor of which the function is passed to get the very result.

That is, now we can connect the Deferred class:

use GraphQL\Deferred;

And defer execution by overwriting the resolver of the “countFriends” field as follows:

'countFriends' => [
    'type' => Types::int(),
    'description' => 'Количество друзей пользователя',
    'resolve' => function ($root) {
        return new Deferred(function () use ($root) {
            return DB::affectingStatement("SELECT u.* FROM friendships f JOIN users u ON u.id = f.friend_id WHERE f.user_id = {$root->id}");
        });
    }
]

But simply postponing the execution of the request, we will not solve the problem of N + 1. Therefore, we need to create a buffer that will accumulate the id of all users for which we need to request the number of friends, and in the future will be able to return the results for all users.

To do this, I will create a small class that will have three simple static methods:

  • add - Adding user id to the buffer
  • load - Loading the number of friends from the database for all users in the buffer
  • get - Getting the number of user friends from the buffer

You can also implement this class in any way convenient for you, I will only give its code for a specific example:

App / Buffer.php
id] = $row->count;
        }
    }
    /**
     * Добавление id пользователя в буфер
     *
     * @param int $id
     */
    public static function add($id)
    {
        // Если такой id уже есть в буфере, то не добавляем его
        if (in_array($id, self::$ids)) return;
        self::$ids[] = $id;
    }
    /**
     * Получение количества друзей пользователя из буфера
     * 
     * @param $id
     * @return int
     */
    public static function get($id)
    {
        if (!isset(self::$results[$id])) return null;
        return self::$results[$id];
    }
}

Now connect our buffer to UserType.php:

use App\Buffer;

And again, we rewrite the resolver for the “countFriends” field:

'countFriends' => [
    'type' => Types::int(),
    'description' => 'Количество друзей пользователя',
    'resolve' => function ($root) {
        // Добавляем id пользователя в буфер
        Buffer::add($root->id);
        return new Deferred(function () use ($root) {
            // Загружаем результаты в буфер из БД (если они еще не были загружены)
            Buffer::load();
            // Получаем количество друзей пользователя из буфера
            return Buffer::get($root->id);
        });
    }
],

Done. Now when executing the query: The number of friends for all users will be obtained from the database only once. Moreover, the request for data on the number of friends will be performed only once even with such a query GraphQL: Of course, in this form, our buffer is very highly specialized. It turns out that for another field we need to create another buffer. But this is just an example and nothing prevents us from making a universal buffer, which, for example, will store data for different fields by their keys, and also take as an argument a function to get the results. At the same time, the buffer can receive data not only from the database, but also from some API.

GraphQL: Requesting Users with Friends



GraphQL: Nested user query with number of friends



Conclusion


That's all. Suggest your options for solving such problems and ask questions if they arise.

Source code from an article on Github.

Other parts of this article:

  1. Installation, schematic, and queries
  2. Mutations, Variables, Validation and Security
  3. Solving the problem of N + 1 queries

Also popular now: