m4rt1n August 21, 2017 at 14:49

Reinventing the wheel for the JSON API

Hello! At the recent Superjob IT Meetup, I talked about how we at Superjob develop the API for a project with an audience of millions and a whole range of different platforms.

In this article I would like to explain why we could not settle on any of the dozens of ready-made solutions, how painful it was to write our own, and what awaits you if you decide to follow the same path. If you are interested, read on.

Instead of an introduction

The history of the API at Superjob began with an austere XML API. We moved from it to concise JSON and later, tired of arguing over whether {success: true} or {result: true} is more correct, we adopted the JSON API spec. Over time we dropped some of its features, agreed on data formats and wrote our own version of the spec, which remained backward compatible with the original. The latest, third version of our API runs on this spec, and we are gradually migrating all our services to it.

For our tasks, where most API endpoints accept or return objects of some kind, the JSON API turned out to be an almost perfect fit. At the heart of the spec are entities and the relationships between them. Entities are typed, have a fixed set of attributes and relationships, and are essentially very similar to the models we are used to working with in code. Entities are manipulated according to REST principles: there is no extra protocol on top of HTTP, as there is in SOAP or JSON-RPC. The request format mirrors the response format almost exactly, which greatly simplifies life for both the server and the client. For example, a typical JSON API response looks like this:

{
    "data": {
        "type": "resume",
        "id": 100,
        "attributes": {
            "position": "Courier"
        },
        "relationships": {
            "owner": {
                "data": {
                    "type": "user",  
                    "id": 200
                }
            }
        }
    },
    "included": [
        {
            "type": "user",
            "id": 200,
            "attributes": {
                "name": "Vasily Batareykin"
            }
        }
    ]
}

Here we see an entity of type resume whose owner relationship points to an entity of type user. If the client wanted to send us such an entity, it would put exactly the same JSON in the request body.
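To make the structure concrete, here is a minimal sketch that assembles a document of the same shape in plain PHP. The helper functions resourceIdentifier and resourceObject are hypothetical names invented for this illustration; they are not part of our framework or of any library.

```php
<?php
declare(strict_types=1);

/**
 * Builds a resource identifier ("type" + "id"), the form used inside relationships.
 * Hypothetical helper, for illustration only.
 */
function resourceIdentifier(string $type, int $id): array
{
    return ['type' => $type, 'id' => $id];
}

/**
 * Builds a full resource object. $relationships maps a relation name
 * to the resource identifier it points at.
 */
function resourceObject(string $type, int $id, array $attributes, array $relationships = []): array
{
    $resource = resourceIdentifier($type, $id) + ['attributes' => $attributes];
    foreach ($relationships as $name => $identifier) {
        $resource['relationships'][$name] = ['data' => $identifier];
    }
    return $resource;
}

$document = [
    // The primary entity, linked to its owner by identifier only.
    'data' => resourceObject(
        'resume',
        100,
        ['position' => 'Courier'],
        ['owner' => resourceIdentifier('user', 200)]
    ),
    // The related entity itself travels in "included".
    'included' => [resourceObject('user', 200, ['name' => 'Vasily Batareykin'])],
];

echo json_encode($document, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), PHP_EOL;
```

Because requests and responses share this shape, the same two helpers would serve a client that sends an entity as well as a server that returns one.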

First steps

Initially, our API implementation was very naive: endpoint responses were built directly in the controller actions, client data was read through a thin layer on top of Yii1 (the framework our server application runs on), and the documentation lived in a separate file that was filled in by hand.

With the transition to the JSON API, we turned that layer into a full-fledged framework that controlled the transformation (mapping) of models into entities and also managed the transport layer (parsing requests and generating responses).

To map a model to an entity, two additional classes had to be written: a DTO for the entity and a hydrator that filled the DTO with data from the model. This approach made mapping quite flexible, but in practice the flexibility turned out to be a curse: over time our hydrators became overgrown with copy-paste, and the need to create two more classes for every model bloated our code base.

The transport layer was also far from ideal. The developer had to think about the internal structure of the JSON API all the time: as with model mapping, full control over the process meant dragging almost identical code from action to action.

We began to think about switching to a third-party solution for the JSON API. The JSON API website has a rather impressive list of implementations of the spec in a variety of languages, for both the server and the client. At the time of writing, there were 18 projects implementing the server side in PHP, and not one of them suited us:
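The DTO-plus-hydrator pattern described above can be sketched roughly as follows. All three class names (ResumeModel, ResumeDto, ResumeHydrator) are hypothetical and chosen only for illustration; the point is that every model drags two extra hand-written classes along with it.

```php
<?php
declare(strict_types=1);

// Hypothetical model, standing in for an ActiveRecord or Doctrine class.
class ResumeModel
{
    public $id = 100;
    public $position = 'Courier';
}

// Extra class no. 1: a DTO that mirrors the entity's attributes.
class ResumeDto
{
    public $id;
    public $position;
}

// Extra class no. 2: a hydrator that copies data from the model to the DTO.
class ResumeHydrator
{
    public function hydrate(ResumeModel $model): ResumeDto
    {
        $dto = new ResumeDto();
        $dto->id = $model->id;
        $dto->position = $model->position;
        return $dto;
    }
}

$dto = (new ResumeHydrator())->hydrate(new ResumeModel());
echo json_encode($dto), PHP_EOL;
```

The hydrate method is trivial here, which is exactly why such classes tend to be copied and pasted from one model to the next.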

Firstly, third-party solutions had all the same problems as our own: too much boilerplate code and too little automation. Some of them imposed requirements on models (for example, implementing an interface), which, given the size of our code base, could have meant serious refactoring. And to work with requests and responses we would in any case have had to write an adapter connecting the chosen solution to Yii.
Secondly, the overwhelming majority of third-party solutions supported only one-to-one mapping: you have one model and you can turn it into one entity. That is fine when the data in your models is stored in the form in which you want to give it to the client, but in reality this is not always the case. For example, the resume model has attributes with contact details, but the client may receive those contacts only under certain conditions. It would be great to make contacts a separate entity related to the resume entity, thus turning one model into several entities, but in third-party solutions this can only be done with hacks.
Thirdly, we wanted to simplify the development of typical endpoints as much as possible, so that a programmer who needs to write an endpoint that selects models from the database and sends them to the client does not have to write the same kind of code every time. However, third-party solutions offered no integration with a DBAL.
Finally, we wanted to simplify writing documentation and tests, but most third-party solutions exposed no information about which attributes and relationships an entity has.

The need to write our own solution once again became obvious :)

Framework development

After analyzing the shortcomings of our previous implementation and of third-party solutions, we formed a vision of what our new framework should be. It received the very original name Mapper:

First of all, instead of writing DTOs and hydrators, we decided to describe the entire mapping in a config.
Transparently to the developer, this config would be compiled into PHP code, which in turn would be used to hydrate entities.
All work with the JSON API was to happen behind the scenes: for typical endpoints, the developer's job would come down to describing the business logic and obtaining models.
Finally, as mentioned above, we wanted to integrate our solution with the DBAL, the documentation, and the tests.

Core

The framework is built around compiled hydrators, that is, objects that populate models and build entities. What does a hydrator need to know to do its job? First of all, it must know which entity it builds and from which models. It must also know which properties and relationships the entity has and how they relate to the properties and relationships of the source models.

Let's try to describe the config for such a hydrator. The config format is YAML, which is easy to write, easy to read, and easy to parse (we use symfony/yaml).

entities:
    TestEntity:
        classes:
            - TestModel
        attributes:
            id:
                type: integer
                accessor: '@getId'
                mutator: '@setId'
            name:
                type: string
                accessor: name
                mutator: name
        relations:
            relatedModel:
                type: TestEntity2
                accessor: relatedModel
            relatedModels:
                type: TestEntity3[]
                accessor: '@getRelatedModels'

Here, the TestEntity entity is assembled from the TestModel model. The entity has two attributes: id, which is read through the getId getter, and name, which is read from the name property. The entity also has two relationships: a to-one relatedModel, which holds an entity of type TestEntity2, and a to-many relatedModels, which holds entities of type TestEntity3.

The hydrator compiled using this configuration is as follows:

class TestEntityHydrator extends Hydrator
{
    public static function getName(): string
    {
        return 'TestEntity';
    }
    protected function getClasses(): array
    {
        return [Method::DEFAULT_ALIAS => TestModel::class];
    }
    protected function buildAttributes(): array
    {
        return [
            'id' => (new CompiledAttribute('id', Type::INTEGER))
                ->setAccessor(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS, function (array $modelArray) {
                            return $modelArray[Method::DEFAULT_ALIAS]->getId();
                        }
                    )
                )
                ->setMutator(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray, $value) {
                            $modelArray[Method::DEFAULT_ALIAS]->setId($value);
                        }
                    )
                ),
            'name' => (new CompiledAttribute('name', Type::STRING))
                ->setAccessor(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS, function (array $modelArray) {
                            return $modelArray[Method::DEFAULT_ALIAS]->name;
                        }
                    )
                )
                ->setMutator(
                    new MethodCallable(
                        Method::DEFAULT_ALIAS,
                        function (array $modelArray, $value) {
                            $modelArray[Method::DEFAULT_ALIAS]->name = $value;
                        }
                    )
                )
                ->setRequired(false),
        ];
    }
    protected function buildRelations(): array
    {
        return [
            'relatedModel' => (new CompiledRelation('relatedModel', TestEntity2Hydrator::getName()))->setAccessor(
                new MethodCallable(
                    Method::DEFAULT_ALIAS, function (array $modelArray) {
                        return $modelArray[Method::DEFAULT_ALIAS]->relatedModel;
                    }
                )
            ),
            'relatedModels' => (new CompiledRelation('relatedModels', TestEntity3Hydrator::getName()))->setAccessor(
                new MethodCallable(
                    Method::DEFAULT_ALIAS, function (array $modelArray) {
                        return $modelArray[Method::DEFAULT_ALIAS]->getRelatedModels();
                    }
                )
            )->setMultiple(true),
        ];
    }
}

All this monstrous code does nothing more than describe the data the entity contains. You will agree that writing it by hand, for every entity in the project, would be no fun.
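The accessor convention used in the config (a leading "@" means a method call, a bare name means a property) can be illustrated with a small runtime sketch. This is a simplification: Mapper compiles the config into PHP code ahead of time, whereas the hypothetical makeAccessor function below interprets the string on the fly. AccessorDemoModel is likewise an invented stand-in for TestModel. The sketch assumes PHP 7.2 or later for the object type hint.

```php
<?php
declare(strict_types=1);

/**
 * Turns an accessor string from the config into a closure that reads a
 * value from a model: "@getId" calls getId(), "name" reads the public
 * property $name. Hypothetical helper, for illustration only.
 */
function makeAccessor(string $accessor): Closure
{
    if (strncmp($accessor, '@', 1) === 0) {
        $method = substr($accessor, 1);
        return function (object $model) use ($method) {
            return $model->{$method}();
        };
    }
    return function (object $model) use ($accessor) {
        return $model->{$accessor};
    };
}

// Hypothetical stand-in for the TestModel class from the config.
class AccessorDemoModel
{
    public $name = 'demo';

    public function getId(): int
    {
        return 42;
    }
}

$model = new AccessorDemoModel();
echo makeAccessor('@getId')($model), PHP_EOL; // value obtained through the getter
echo makeAccessor('name')($model), PHP_EOL;   // value read from the property
```

Generating the closures as source code, as in the compiled hydrator above, avoids repeating this string inspection on every request.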

In order for everything described above to work, we needed to implement three services: a config parser, a validator, and a compiler.

The parser was responsible for watching for config changes (symfony/config helped us with this) and, when changes were detected, re-reading all the config files, merging them and passing the result to the validator.

The validator checked that the config was correct: first it validated the config against the JSON Schema we had written for it (using justinrainbow/json-schema), and then it checked that all the classes, properties and methods mentioned in the config actually exist.

Finally, the compiler took the validated config and compiled PHP code from it.
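The second stage of validation, checking that referenced classes and members exist, might look roughly like the sketch below. The function validateEntity and the class ValidatorDemoModel are hypothetical; only class_exists, method_exists and property_exists are real PHP functions. Note one assumption that does not hold everywhere: property_exists sees only declared properties, so models that expose attributes through magic __get (as Yii1 ActiveRecord does) would need a different check.

```php
<?php
declare(strict_types=1);

/**
 * Checks that the class, methods and properties mentioned in one entity
 * config exist. Returns a list of error messages (empty means valid).
 * Only the first class of the entity is considered in this sketch.
 */
function validateEntity(string $entityName, array $config): array
{
    $class = $config['classes'][0] ?? null;
    if (!is_string($class) || !class_exists($class)) {
        return ["$entityName: model class is missing or does not exist"];
    }

    $errors = [];
    foreach (['attributes', 'relations'] as $section) {
        foreach ($config[$section] ?? [] as $name => $member) {
            foreach (['accessor', 'mutator'] as $kind) {
                if (!isset($member[$kind])) {
                    continue; // relations in the config have no mutator
                }
                $target = $member[$kind];
                // "@foo" refers to a method, a bare name to a property.
                $exists = strncmp($target, '@', 1) === 0
                    ? method_exists($class, substr($target, 1))
                    : property_exists($class, $target);
                if (!$exists) {
                    $errors[] = "$entityName.$name: $kind '$target' not found in $class";
                }
            }
        }
    }
    return $errors;
}

// Hypothetical model that deliberately lacks a setId() method.
class ValidatorDemoModel
{
    public $name;

    public function getId(): int
    {
        return 1;
    }
}

$config = [
    'classes' => [ValidatorDemoModel::class],
    'attributes' => [
        'id' => ['type' => 'integer', 'accessor' => '@getId', 'mutator' => '@setId'],
        'name' => ['type' => 'string', 'accessor' => 'name', 'mutator' => 'name'],
    ],
];

$errors = validateEntity('TestEntity', $config);
print_r($errors);
```

Failing at validation time, before any code is compiled, means a typo in the config surfaces as a readable message rather than as a fatal error inside generated code.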

Integration with DBAL

For historical reasons, two DBALs coexist in our project: ActiveRecord, which is standard for Yii1, and Doctrine, and we wanted our framework to work with both. By integration we meant that Mapper would be able to fetch data from the database and save it on its own.

To achieve this, we first needed to make small changes to the config. In general, the name of a relation in the model may differ from the name of the getter or property that returns it (this is especially true for Doctrine), so we needed a way to tell Mapper under what name the DBAL knows each relation. To do this, we added an internalName parameter to the relation description. Later, the same internalName appeared in attributes as well, so that Mapper could select by field on its own.

In addition to internalName, we added to the config the knowledge of which DBAL an entity belongs to: the adapter parameter holds the name of a service implementing the interface through which Mapper interacts with the DBAL.

The interface was as follows:

interface IDbAdapter
{
    /**
     * Statement by context.
     *
     * @param string $className
     * @param mixed  $context
     * @param array  $relationNames
     *
     * @return IDbStatement
     */
    public function statementByContext(string $className, $context, array $relationNames): IDbStatement;
    /**
     * Statement by attribute values.
     *
     * @param string $className
     * @param array  $attributes
     * @param array  $relationNames
     *
     * @return IDbStatement
     */
    public function statementByAttributes(string $className, array $attributes, array $relationNames): IDbStatement;
    /**
     * Instantiate a model of the given class.
     *
     * @param string $className
     *
     * @return mixed
     */
    public function create(string $className);
    /**
     * Save the model.
     *
     * @param mixed $model
     */
    public function save($model);
    /**
     * Link one model to another.
     *
     * @param mixed  $parent
     * @param mixed  $child
     * @param string $relationName
     */
    public function link($parent, $child, string $relationName);
    /**
     * Unlink one model from another.
     *
     * @param mixed  $parent
     * @param mixed  $child
     * @param string $relationName
     */
    public function unlink($parent, $child, string $relationName);
}

To simplify interaction with the DBAL, we introduced the concept of a context. A context is an object from which the DBAL can work out which query it should execute. For ActiveRecord the context is a CDbCriteria, for Doctrine a QueryBuilder.

For each DBAL we wrote an adapter implementing IDbAdapter. There were surprises: for example, it turned out that in the entire lifetime of Yii1 not a single extension had been written that supports saving every kind of relation, so we had to write our own wrapper.
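To show what the write side of such an adapter is responsible for, here is a toy in-memory version. It is only a sketch under heavy assumptions: it covers create, save, link and unlink, leaves out both statement methods (IDbStatement is not described in this article), and therefore does not formally implement IDbAdapter. The classes InMemoryAdapter, DemoResume and DemoSkill are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory adapter that mimics the write-side methods of
 * IDbAdapter. Relations are stored as plain arrays on the parent model.
 */
class InMemoryAdapter
{
    /** @var object[] models that have been "persisted" */
    public $saved = [];

    public function create(string $className)
    {
        return new $className();
    }

    public function save($model)
    {
        // Strict comparison means identity for objects, so a model is stored once.
        if (!in_array($model, $this->saved, true)) {
            $this->saved[] = $model;
        }
    }

    public function link($parent, $child, string $relationName)
    {
        $related = $parent->{$relationName} ?? [];
        $related[] = $child;
        $parent->{$relationName} = $related;
    }

    public function unlink($parent, $child, string $relationName)
    {
        $parent->{$relationName} = array_values(array_filter(
            $parent->{$relationName} ?? [],
            function ($item) use ($child) {
                return $item !== $child;
            }
        ));
    }
}

// Hypothetical models: a resume with a to-many "skills" relation.
class DemoResume
{
    public $skills = [];
}

class DemoSkill
{
    public $title = '';
}

$adapter = new InMemoryAdapter();
$resume = $adapter->create(DemoResume::class);
$skill = $adapter->create(DemoSkill::class);

$adapter->link($resume, $skill, 'skills');
$adapter->save($resume);
echo count($resume->skills), PHP_EOL; // one linked skill

$adapter->unlink($resume, $skill, 'skills');
echo count($resume->skills), PHP_EOL; // relation is empty again
```

A real adapter has to translate link and unlink into foreign-key updates or pivot-table rows depending on the relation type, which is where most of the effort went.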

Documentation and Tests

In-house we use Behat for integration tests and Swagger for documentation. Both tools natively support JSON Schema, which allowed us to integrate Mapper support into them without any problems.

Tests for Behat are written in Gherkin. Each test is a sequence of steps, and each step is a sentence in natural language.

We added steps that integrated support for the JSON API and Mapper into Behat:

# Describe the entity
When I have entity "resume"
And I have entity attributes:
  | name       | value       |
  | profession | Storekeeper |
# Describe the relationship
And I have entity relationship "owner" with data:
  | name | value |
  | id   | 100   |
# Send the request and check that a resume entity came back
Then I send entity via "POST" to "/resume/" and get entity "resume"

In this test we create a resume entity, fill in its attributes and relationships, send a request and validate the response. All the routine work is automated: we do not need to compose the request body, because our Behat helpers do that, and we do not need to describe the JSON Schema of the expected response, because Mapper generates it.

With the documentation, things are somewhat more interesting. The JSON Schema files for Swagger were already being generated on the fly from YAML sources: as mentioned above, YAML is much easier to write than JSON, but Swagger only understands JSON. We extended this mechanism so that the final JSON Schema includes not only the contents of the YAML files but also the entity descriptions from Mapper. For example, we taught Swagger to understand references like:
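Generating a JSON Schema from an entity description can be sketched as below. The function entitySchema is hypothetical, and the shape of the schema (required keys, additionalProperties) is an assumption made for the example rather than what Mapper actually emits. The sketch relies on the attribute types in the config (integer, string) coinciding with JSON Schema type names.

```php
<?php
declare(strict_types=1);

/**
 * Builds a JSON Schema (as a PHP array) for a JSON API resource object
 * from an entity type and a map of attribute name => type.
 * Hypothetical helper, for illustration only.
 */
function entitySchema(string $type, array $attributes): array
{
    $properties = [];
    foreach ($attributes as $name => $attributeType) {
        $properties[$name] = ['type' => $attributeType];
    }

    return [
        'type' => 'object',
        'required' => ['type', 'id'],
        'properties' => [
            // The "type" member must be exactly the entity name.
            'type' => ['enum' => [$type]],
            'id' => ['type' => 'integer'],
            'attributes' => [
                'type' => 'object',
                // An empty PHP array would encode as [], so fall back to an object.
                'properties' => $properties ?: new stdClass(),
                'additionalProperties' => false,
            ],
        ],
    ];
}

$schema = entitySchema('resume', ['profession' => 'string']);
echo json_encode($schema, JSON_PRETTY_PRINT), PHP_EOL;
```

A schema like this can then be handed to a validator such as justinrainbow/json-schema inside a Behat step to check the response body.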

$ref: '#mapper:resume'

Or:

$ref: '#mapper:resume.collection.response'

Swagger then rendered the resume entity object or the whole server response object with a collection of resume entities, respectively. Thanks to such references, the documentation was updated automatically as soon as the Mapper config changed.

Conclusions
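One way to expand such references while converting the YAML sources to JSON is a recursive walk over the parsed document. The sketch below assumes the document has already been parsed into a PHP array; resolveMapperRefs, the $schemas registry and its contents are hypothetical and only demonstrate the substitution, not how Mapper builds its schemas.

```php
<?php
declare(strict_types=1);

/**
 * Recursively replaces {"$ref": "#mapper:<name>"} nodes with the schema
 * returned by $lookup(<name>). All other nodes are copied unchanged.
 */
function resolveMapperRefs(array $node, callable $lookup): array
{
    $prefix = '#mapper:';
    if (isset($node['$ref']) && is_string($node['$ref']) && strpos($node['$ref'], $prefix) === 0) {
        return $lookup(substr($node['$ref'], strlen($prefix)));
    }
    foreach ($node as $key => $value) {
        if (is_array($value)) {
            $node[$key] = resolveMapperRefs($value, $lookup);
        }
    }
    return $node;
}

// Hypothetical registry standing in for schemas generated from the Mapper config.
$schemas = [
    'resume' => ['type' => 'object', 'title' => 'resume'],
];
$lookup = function (string $name) use ($schemas): array {
    return $schemas[$name];
};

// A fragment of an endpoint description as it might look after YAML parsing.
$source = [
    'responses' => [
        '200' => ['schema' => ['$ref' => '#mapper:resume']],
        '404' => ['schema' => ['$ref' => '#/definitions/error']],
    ],
];

$resolved = resolveMapperRefs($source, $lookup);
echo json_encode($resolved, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Ordinary references such as #/definitions/error are left untouched, so the standard Swagger reference mechanism keeps working alongside the custom prefix.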

With a lot of effort, we built a tool that makes life easier for developers. To create a trivial endpoint, it is now enough to describe the entity in the config and write a couple of lines of code. Automating the routine of writing tests and documentation saved us time when developing new endpoints, and Mapper's flexible architecture made it easy to extend when we needed to.

The time has come to answer the main question I raised at the beginning of the article: what did it cost us to reinvent the wheel? And should you build your own?

The intensive phase of Mapper development took us about three months. We are still adding new features to it, but at a much slower pace. Overall we are satisfied with the result: since Mapper was designed with the specifics of our project in mind, it copes with its tasks much better than any third-party solution would.

Should you follow our path? If your project is still young and the code base is small, reinventing the wheel may well be an unjustified waste of time, and integrating a third-party solution would be the better choice. However, if your code has been written over many years and you are not ready for serious refactoring, you should definitely consider building your own solution. Despite the initial difficulties, it can save you significant time and effort in the future.
