Working with data when building an API based on GraphQL

Preamble

First of all, this article is aimed at readers who are already familiar with GraphQL: it is more about the intricacies and nuances of working with it. Nevertheless, I hope beginners will find it useful too.

GraphQL is a great tool, and I think many people already know and understand its advantages. However, there are some nuances you should be aware of when you build an API on top of GraphQL.

For example, GraphQL lets you return to the consumer (a user or a program) only the part of the data that this consumer actually requested. However, when building the server it is quite easy to make a mistake that leads to complete data sets being shuttled around inside the server (which may itself be distributed), even though only part of them is needed. This is mainly because GraphQL itself does not provide convenient tools for parsing an incoming request out of the box, and the interfaces it does expose for this are poorly documented.

Source of the problem

Let's look at a typical example of a suboptimal implementation.

Suppose our consumer is an application or component, such as a "phone book", that requests from our API only the identifier, name and phone number of the users we store. Our API, however, is much broader: it also gives access to other data, such as users' physical addresses and email addresses.

At the point of data exchange between the consumer and the API, GraphQL does everything we need: only the requested data is sent in the response. The problem lies at the point where data is fetched from the database, i.e. in the internal implementation of our server: for every incoming request we select all of the user data from the database, even though part of it is not needed. This puts unnecessary load on the database and causes excess traffic to circulate within the system. With a significant number of requests, switching to an approach that selects only the requested fields can yield a substantial optimization. It does not matter what acts as the data source: a relational database, a NoSQL store, or another service (internal or external). Any implementation can suffer from this behavior; MySQL is simply used here as an example.

Solution

This server behavior can be optimized by analyzing the arguments passed to the resolve() function:

async resolve(source, args, context, info) {
    // ...
}

In this case it is the last argument, info, that interests us. Let's turn to the documentation and look in detail at what the resolve() function and this argument are:

type GraphQLFieldResolveFn = (
  source?: any,
  args?: {[argName: string]: any},
  context?: any,
  info?: GraphQLResolveInfo
) => any
type GraphQLResolveInfo = {
  fieldName: string,
  fieldNodes: Array<Field>,
  returnType: GraphQLOutputType,
  parentType: GraphQLCompositeType,
  schema: GraphQLSchema,
  fragments: { [fragmentName: string]: FragmentDefinition },
  rootValue: any,
  operation: OperationDefinition,
  variableValues: { [variableName: string]: any },
}

So, the first three arguments passed to the resolver are: source, the data passed down from the parent node in the GraphQL schema tree; args, the query arguments (which come from the query); and context, an execution context object defined by the developer, often used to pass some global data to resolvers. Finally, the fourth argument, info, carries meta-information about the request.

What can we learn from GraphQLResolveInfo to solve our problem?

The most interesting parts of it are:

fieldName - the name of the current field in the GraphQL schema, i.e. the field name specified in the schema for this resolver. If we catch the info object on the users field, as in our example, fieldName will contain "users".
fieldNodes - the collection (array) of AST nodes for the current field as it appears in the query; their selection sets hold the subfields that were REQUESTED. Exactly what we need!
fragments - the collection of query fragments (in case the query uses fragments). This is also important for extracting the final list of data fields.
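To make the role of fieldNodes concrete, here is a minimal sketch of collecting the names of the subfields requested for the current field. It works on a hypothetical, simplified node type (SimpleFieldNode) that mirrors only the name and selectionSet parts of the real AST nodes from the graphql package; fragment spreads, inline fragments and aliases are deliberately left out, and requestedSubfields is an illustrative helper, not part of any library.

```typescript
// Hypothetical, simplified stand-in for the real FieldNode type from `graphql`:
// only the parts needed to read the requested subfields are modeled.
interface SimpleFieldNode {
  name: { value: string };
  selectionSet?: { selections: SimpleFieldNode[] };
}

// Collect the names of the direct subfields requested for the current field.
// fieldNodes is an array because the same field may appear more than once in
// a query; the subselections of all occurrences are merged here.
function requestedSubfields(fieldNodes: SimpleFieldNode[]): string[] {
  const names: string[] = [];
  for (const node of fieldNodes) {
    const selections = node.selectionSet ? node.selectionSet.selections : [];
    for (const selection of selections) {
      if (names.indexOf(selection.name.value) === -1) {
        names.push(selection.name.value);
      }
    }
  }
  return names;
}
```

For the phone-book request, the single users node would yield id, name and phone.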

So, as a solution, we need to parse the info argument, extract the list of fields requested in the query, and then pass them to the SQL query. Unfortunately, the graphql package from Facebook offers nothing out of the box to simplify this task. As practice has shown, the task is not entirely trivial, given that queries can contain fragments. Besides, this kind of parsing has a universal solution that ends up simply being copied from project to project.

So I decided to release it as an open-source library under the ISC license. With it, parsing the fields of an incoming request becomes quite simple; in our case, for example:

const { fieldsList } = require('graphql-fields-list');
// ...
async resolve(source, args, context, info) {
  const requestedFields = fieldsList(info);
  // `database` is your own data-access layer (not part of the library)
  return await database.query(`SELECT ${requestedFields.join(',')} FROM users`);
}

Here fieldsList(info) does all the work for us and returns a "flat" array of the child fields for the given resolver, so for the phone-book request (id, name and phone) our final SQL query looks like this:

SELECT id, name, phone FROM users;

If we change the incoming request to:

query UserListQuery {
  users {
    id
    name
    phone
    email
  }
}

then the SQL query will turn into:

SELECT id, name, phone, email FROM users;
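Because the field list is joined straight into the SQL string, it can be worth filtering it first. The sketch below assumes a hypothetical allowlist, USER_COLUMNS, of the columns that really exist in the users table, and a hypothetical helper, buildUserSelect. Any requested field without a backing column (for example the __typename meta-field or a computed field, if one ends up in the list) is dropped instead of producing an invalid query.

```typescript
// Hypothetical allowlist: the columns that actually exist in the `users` table.
const USER_COLUMNS: string[] = ["id", "name", "email", "phone", "address"];

// Build a SELECT that only names requested fields backed by a real column.
function buildUserSelect(requested: string[]): string {
  const columns = requested.filter((field) => USER_COLUMNS.indexOf(field) !== -1);
  // Fall back to the primary key so the statement is never "SELECT  FROM users".
  if (columns.length === 0) {
    columns.push("id");
  }
  return `SELECT ${columns.join(", ")} FROM users`;
}
```

For the second request above, buildUserSelect(["id", "name", "phone", "email"]) gives the same statement as shown, minus the trailing semicolon.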

However, a simple call like this is not always enough. Real-world applications often have a much more complex structure. In some implementations we may need to define the resolver at a level above the data itself in the final GraphQL schema. For example, if we decide to use the Relay library and want its ready-made mechanism for paginating collections of data objects, our GraphQL schema has to be built according to certain rules. Let's rework our schema accordingly (TypeScript):

import {
    GraphQLObjectType,
    GraphQLResolveInfo,
    GraphQLSchema,
    GraphQLString,
} from 'graphql';
import {
    connectionDefinitions,
    connectionArgs,
    nodeDefinitions,
    fromGlobalId,
    globalIdField,
    connectionFromArray,
} from 'graphql-relay';
import { fieldsList } from 'graphql-fields-list';
// Assumed to exist: your own data-access layer.
declare const database: { query(sql: string): Promise<any> };
export const { nodeInterface, nodeField } = nodeDefinitions(async (globalId: string) => {
    const { type, id } = fromGlobalId(globalId);
    let node: any = null;
    if (type === 'User') {
        // NOTE: interpolating `id` is unsafe; use parameterized queries in real code.
        node = await database.query(`SELECT id FROM users WHERE id="${id}"`);
    }
    return node;
});
const User = new GraphQLObjectType({
    name: 'User',
    interfaces: [nodeInterface],
    fields: {
        id: globalIdField('User', (user: any) => user.id),
        name: { type: GraphQLString },
        email: { type: GraphQLString },
        phone: { type: GraphQLString },
        address: { type: GraphQLString },
    },
});
export const { connectionType: userConnection } =
    connectionDefinitions({ nodeType: User });
const Query = new GraphQLObjectType({
    name: 'Query',
    fields: {
        node: nodeField,
        users: {
            type: userConnection,
            args: { ...connectionArgs },
            async resolve(
                source: any,
                args: { [argName: string]: any },
                context: any,
                info: GraphQLResolveInfo,
            ) {
                // TODO: implement
            },
        },
    },
});
export const schema = new GraphQLSchema({
    query: Query,
});

Here connectionDefinitions from Relay wraps User in a connection type with the fields edges and pageInfo, where each edge has the fields node and cursor. So our queries now have to be structured differently (we won't dwell on pagination here):

query UserListQuery {
  users {
    edges {
      node {
        id
        name
        phone
        email
      }
    }
  }
}

So the resolve() function implemented on the users field now has to determine which fields were requested not for users itself, but for its nested child node, which, as we can see, sits under users at the path edges.node.
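The reason the column list has to come from edges.node is visible in the shape of a Relay-style connection: each database row ends up as the node of an edge. A minimal sketch of that shape, with a hypothetical helper wrapAsConnection that puts all rows into a single unpaginated page. Its cursors are plain index strings, not the cursor format graphql-relay's connectionFromArray produces.

```typescript
// Shapes of a Relay-style connection: edges carry a cursor and the node itself.
interface PageInfoShape {
  hasNextPage: boolean;
  hasPreviousPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}
interface EdgeShape<T> {
  cursor: string;
  node: T;
}
interface ConnectionShape<T> {
  edges: EdgeShape<T>[];
  pageInfo: PageInfoShape;
}

// Hypothetical helper: wrap all rows into one unpaginated connection.
// Each row becomes `edges[i].node`; cursors are simple index strings here.
function wrapAsConnection<T>(rows: T[]): ConnectionShape<T> {
  const edges = rows.map((node, i) => ({ cursor: String(i), node }));
  return {
    edges,
    pageInfo: {
      hasNextPage: false,
      hasPreviousPage: false,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}
```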

The fieldsList function from the graphql-fields-list library helps with this as well: you just pass it the appropriate path option. Here is the implementation for our case:

async resolve(
    source: any,
    args:  {[argName: string]: any},
    context: any,
    info: GraphQLResolveInfo,
) {
    const fields = fieldsList(info, { path: 'edges.node' });
    return connectionFromArray(
        await database.query(`SELECT ${fields.join(',')} FROM users`),
        args
    );
}
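To illustrate what a path option like 'edges.node' has to do, here is a minimal sketch (not the library's implementation) that walks a dotted path through a hypothetical nested tree of requested fields, in which nested selections are objects and leaf fields are marked with false. The helpers subtreeAt and fieldsAtPath are illustrative names.

```typescript
// Hypothetical tree of requested fields: objects for nested selections,
// `false` for leaf (scalar) fields.
type FieldTree = { [name: string]: FieldTree | false };

// Follow a dotted path such as "edges.node" and return the subtree found
// there, or undefined if that branch was not requested.
function subtreeAt(tree: FieldTree, path: string): FieldTree | undefined {
  let current: FieldTree = tree;
  for (const key of path.split(".").filter((part) => part.length > 0)) {
    // Own-property check, so names like "toString" do not hit Object.prototype.
    if (!Object.prototype.hasOwnProperty.call(current, key)) return undefined;
    const next = current[key];
    if (next === false) return undefined; // a leaf has no children
    current = next;
  }
  return current;
}

// The flat list of fields at a path is just the keys of that subtree.
function fieldsAtPath(tree: FieldTree, path: string): string[] {
  const subtree = subtreeAt(tree, path);
  return subtree ? Object.keys(subtree) : [];
}
```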

Also, in the real world a field may have one name in the GraphQL schema and a different name in the database schema. For example, suppose the users table in the database was defined differently:

CREATE TABLE users (
  id BIGINT PRIMARY KEY AUTO_INCREMENT,
  fullName VARCHAR(255),
  email VARCHAR(255),
  phoneNumber VARCHAR(15),
  address VARCHAR(255)
);

In this case, the fields from the GraphQL query must be renamed before they are embedded into the SQL query. fieldsList helps here too if you pass it a name-conversion map in the transform option:

async resolve(
    source: any,
    args:  {[argName: string]: any},
    context: any,
    info: GraphQLResolveInfo,
) {
    const fields = fieldsList(info, {
        path: 'edges.node',
        transform: { phone: 'phoneNumber', name: 'fullName' },
    });
    return connectionFromArray(
        await database.query(`SELECT ${fields.join(',')} FROM users`),
        args
    );
}
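One thing to keep in mind: after renaming, the rows come back keyed by the database column names (fullName, phoneNumber), while GraphQL's default field resolver reads the property named after the GraphQL field (name, phone). Unless those fields get their own resolvers, they would resolve to null. One way to bridge this, sketched here with a hypothetical helper aliasedColumns, is to alias each renamed column back to its GraphQL name in the SQL itself.

```typescript
// Build "column AS graphqlName" expressions from a GraphQL-name -> column-name
// map, so rows come back keyed by the GraphQL field names that the default
// field resolvers read. Unmapped fields are selected as they are.
function aliasedColumns(fields: string[], transform: Record<string, string>): string[] {
  return fields.map((field) => {
    const column = Object.prototype.hasOwnProperty.call(transform, field)
      ? transform[field]
      : field;
    return column === field ? field : `${column} AS ${field}`;
  });
}
```

With the transform map from the example above, the fields id, name and phone become "id, fullName AS name, phoneNumber AS phone" once joined with ", ".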

Sometimes, however, a flat array of fields is not enough (for example, if the data source returns a complex nested structure). In that case the fieldsMap function from graphql-fields-list comes to the rescue: it returns the entire tree of requested fields as an object:

const { fieldsMap } = require('graphql-fields-list');
// ... some resolver implementation on `users`:
resolve(source, args, ctx, info) {
   const map = fieldsMap(info);
   /*
    RESULT:
    {
      edges: {
        node: {
          id: false,
          name: false,
          phone: false,
        }
      }
    }
   */
}
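The tree shape shown above (objects for nested selections, false for leaves) can be produced by a simple recursion over the selection sets. A minimal sketch over a hypothetical simplified node type, MapFieldNode, that ignores fragments, aliases and the merging of duplicate fields; selectionsToTree is an illustrative name, not the library's code.

```typescript
// Hypothetical simplified field node: a name plus an optional subselection.
interface MapFieldNode {
  name: { value: string };
  selectionSet?: { selections: MapFieldNode[] };
}
type RequestedTree = { [name: string]: RequestedTree | false };

// Recursively turn a selection list into a nested map: fields with a
// subselection become objects, scalar (leaf) fields become `false`.
function selectionsToTree(selections: MapFieldNode[]): RequestedTree {
  const tree: RequestedTree = {};
  for (const selection of selections) {
    const children = selection.selectionSet ? selection.selectionSet.selections : [];
    tree[selection.name.value] = children.length > 0 ? selectionsToTree(children) : false;
  }
  return tree;
}
```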

If the user is described by a complex nested structure, fieldsMap returns all of it. This function also accepts an optional path argument that lets you get a map of only the required branch of the tree, for example:

const { fieldsMap } = require('graphql-fields-list');
// ... some resolver implementation on `users`:
resolve(source, args, ctx, info) {
   const map = fieldsMap(info, 'edges.node');
   /*
    RESULT:
    {
      id: false,
      name: false,
      phone: false,
    }
   */
}

Name transformation is not currently supported for maps and is left to the developer.
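If you do need renamed keys in a map, a minimal do-it-yourself sketch is a recursive walk that applies a flat GraphQL-name to source-name map at every level. The helper renameTree is hypothetical. Since the map is applied everywhere, it assumes a given GraphQL field name maps to the same source name in every nested type.

```typescript
type NamedTree = { [name: string]: NamedTree | false };

// Rename keys at every level of a requested-fields tree using a flat
// GraphQL-name -> source-name map; unmapped keys are kept unchanged.
function renameTree(tree: NamedTree, transform: Record<string, string>): NamedTree {
  const out: NamedTree = {};
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const newKey = Object.prototype.hasOwnProperty.call(transform, key) ? transform[key] : key;
    out[newKey] = value === false ? false : renameTree(value, transform);
  }
  return out;
}
```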

Query fragments

GraphQL supports query fragments. For example, a consumer may send us a request like this (here we return to the original schema; the example is a bit contrived, but syntactically correct):

query UsersFragmentedQuery { 
  users {
    id
    ...NamesFragment
    ...ContactsFragment
  }
}
fragment NamesFragment on User {
    name
}
fragment AddressFragment on User {
    address
}
fragment ContactsFragment on User {
  phone
  email
  ...AddressFragment
}

There is nothing to worry about here: both fieldsList(info) and fieldsMap(info) return the expected result, because they take fragments into account. So fieldsList(info) returns ['id', 'name', 'phone', 'email', 'address'], and fieldsMap(info) returns:

{
  id: false,
  name: false,
  phone: false,
  email: false,
  address: false
}
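The idea behind handling fragments can be shown with a minimal sketch: when a fragment spread is met, look up the fragment definition by name (info.fragments plays that role in the real API) and expand it recursively. The selection types and the flattenFields helper below are hypothetical simplifications that cover only plain fields and named spreads. No cycle guard is needed, because GraphQL validation rejects queries whose fragments spread each other in a cycle.

```typescript
// Hypothetical simplified selections: plain fields and named fragment spreads.
type FragSelection =
  | { kind: "Field"; name: string }
  | { kind: "FragmentSpread"; name: string };

// Fragment definitions looked up by name.
type FragmentTable = { [fragmentName: string]: FragSelection[] };

// Flatten selections into field names, expanding spreads recursively and
// keeping only the first occurrence of each field.
function flattenFields(
  selections: FragSelection[],
  fragments: FragmentTable,
  out: string[] = [],
): string[] {
  for (const selection of selections) {
    if (selection.kind === "Field") {
      if (out.indexOf(selection.name) === -1) out.push(selection.name);
    } else if (Object.prototype.hasOwnProperty.call(fragments, selection.name)) {
      flattenFields(fragments[selection.name], fragments, out);
    } else {
      throw new Error(`Unknown fragment: ${selection.name}`);
    }
  }
  return out;
}
```

Fed the UsersFragmentedQuery selections and its three fragments, this yields id, name, phone, email, address, the same order fieldsList(info) reports.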

PS

I hope this article has shed light on some of the nuances of working with GraphQL on the server, and that the graphql-fields-list library will help you build optimal solutions in the future.

UPD 1

Library version 1.1.0 has been released, adding support for the @skip and @include directives in queries. This option is enabled by default; if necessary, disable it as follows:

fieldsList(info, { withDirectives: false })
fieldsMap(info, null, false);
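For reference, the GraphQL specification leaves a field out of the result when @skip(if: true) applies to it or when @include(if: false) applies to it. A minimal sketch of that rule over a hypothetical simplified directive shape, where the if argument is either a literal boolean or the name of a request variable. shouldIncludeField is an illustrative name, not the library's API.

```typescript
// Hypothetical simplified directive: its name and the value of its `if`
// argument, given either as a literal or as a variable reference.
interface SimpleDirective {
  name: string; // only "skip" and "include" affect the result here
  ifArg: boolean | { variable: string };
}

// A field is kept unless @skip(if: true) or @include(if: false) applies.
function shouldIncludeField(
  directives: SimpleDirective[],
  variables: { [name: string]: unknown },
): boolean {
  for (const directive of directives) {
    const condition =
      typeof directive.ifArg === "boolean"
        ? directive.ifArg
        : variables[directive.ifArg.variable] === true;
    if (directive.name === "skip" && condition) return false;
    if (directive.name === "include" && !condition) return false;
  }
  return true;
}
```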
