API accessibility: natural language interfaces
Application programming interfaces (APIs) are playing an increasingly important role in both the virtual and physical worlds thanks to technologies such as service-oriented architecture, cloud computing, and the Internet of Things (IoT). Today, our colleagues from Microsoft Research share their insights into natural language interfaces (NLIs).
Cloud-hosted weather, sports, and finance web services provide data and services to end users via web APIs, and IoT devices expose their functionality to other devices on the network.
Typically, APIs are used in a variety of software: desktop applications, websites, and mobile applications, which in turn serve users through a graphical user interface (GUI). GUIs have contributed greatly to the popularization of computers, but as computing technology has evolved, their limitations have become increasingly apparent. As devices become smaller, more mobile, and smarter, the graphical display on a screen becomes a constraint, for example on wearable devices or devices connected to the IoT.
Users are also forced to learn a large variety of graphical interfaces across different services and devices. As the number of available services and devices grows, so does the cost of user training and adaptation. Natural language interfaces (NLIs) show significant potential as a unified, intelligent gateway to a wide range of back-end services and devices. NLIs are remarkably good at identifying user intent and recognizing contextual information, which makes applications such as virtual assistants much more convenient for users.
We studied natural language interfaces to APIs (NL2APIs). Unlike general-purpose NLIs such as virtual assistants, we tried to figure out how to build NLIs for individual web APIs, such as the API of a calendar service. In the future, such NL2APIs could democratize APIs by helping users interact with software systems. They could also address the scalability problem of general-purpose virtual assistants by enabling distributed development. The usefulness of a virtual assistant largely depends on the breadth of its capabilities, that is, on the number of services it supports.
However, integrating web services into a virtual assistant one by one requires an enormous amount of work. If individual web service providers had an easy way to build NLIs for their own APIs, integration costs could be reduced significantly. A virtual assistant would no longer need to handle a different interface for each web service; it would simply integrate the individual NL2APIs, which achieve uniformity through natural language. NL2APIs can also facilitate web service discovery, programming recommendation systems, and API help: developers would no longer have to memorize the large number of available web APIs and their syntax.
Figure 1. Two ways to build a natural language interface for an API. Left: traditional methods train a language understanding model to recognize intents in natural language and other models to extract the slots associated with each intent, and then map them to API requests manually. Right: our method translates natural language directly into API requests.
The main task of an NL2API is to recognize a user's natural language utterance and translate it into an API request. More precisely, we focused on web APIs that follow the REST architectural style, that is, RESTful APIs. RESTful APIs are widely used in web services, IoT devices, and smartphone applications. An example from the Microsoft Graph API is shown in Figure 1.
The left side of the figure shows the traditional method of building a natural language interface: a language understanding model is trained to recognize intents, other models extract the slots associated with each intent, and the results are then mapped to API requests manually, by writing code. Instead, as shown on the right side of the figure, we can learn to translate natural language utterances directly into API requests. In our study, we built our system for APIs from the Microsoft Graph API. The Microsoft Graph web APIs give developers access to productivity data: mail, calendar, contacts, documents, directories, devices, and more.
Figure 2. The Microsoft Graph web APIs give developers access to productivity data: mail, calendar, contacts, documents, directories, devices, and more.
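To make the translation target concrete, below is a minimal sketch of the kind of RESTful request an NL2API would produce against the Microsoft Graph messages endpoint, using Graph's standard OData query options. The endpoint and query options are real Graph API features; the access token handling and function name are simplified assumptions for illustration.

```python
import requests

# Real Microsoft Graph endpoint for the signed-in user's messages.
GRAPH_MESSAGES_URL = "https://graph.microsoft.com/v1.0/me/messages"

def get_unread_messages(access_token: str):
    """The API-request form of an utterance like
    'show my unread emails, newest first'."""
    params = {
        "$filter": "isRead eq false",                 # FILTER(isRead) = False
        "$orderby": "receivedDateTime desc",          # ORDERBY(receivedDateTime)
        "$select": "subject,from,receivedDateTime",   # SELECT a few fields
    }
    headers = {"Authorization": f"Bearer {access_token}"}
    resp = requests.get(GRAPH_MESSAGES_URL, params=params,
                        headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()["value"]
```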
One of the requirements for our model is the ability to support fine-grained user interaction. Most existing NLIs offer little help to users when a command is recognized incorrectly. We hypothesized that more fine-grained interaction can make an NLI far more usable.
To enable fine-grained interaction with the NLI, we developed a modular sequence-to-sequence model (see Figure 3). We keep the overall sequence-to-sequence architecture but decompose the decoding process into several interpretable units called modules.
Each module tries to predict a predefined output, for example a specific parameter, based on the utterance that arrives at the NL2API. With a simple mapping, users can easily understand any module's prediction and interact with the system at the module level. Each module in our model generates a sequence output, not a continuous state.
Figure 3. The modular sequence-to-sequence model. The controller activates several modules, each of which generates one parameter.
Modules: first, we define what a module is. A module is a specialized neural network designed to perform a specific sequence prediction task. In NL2API, different modules correspond to different parameters. For example, for the GET-Messages API the modules are FILTER(sender), FILTER(isRead), SELECT(attachments), ORDERBY(receivedDateTime), SEARCH, and so on. If triggered, a module's task is to read the input utterance and generate a full parameter; to do that, it must determine the values of its parameter from the input utterance.
For example, if the input utterance is "unread emails about the doctoral dissertation", the SEARCH module should predict that the value of the SEARCH parameter is "doctoral dissertation" and generate the full parameter "SEARCH doctoral dissertation" as its output sequence. Similarly, the FILTER(isRead) module should learn that phrases such as "unread emails", "emails that have not been read", and "emails not read yet" indicate that its parameter value should be False.
It is natural, then, to implement the modules as attentive decoders, as in a standard sequence-to-sequence model. However, instead of a single decoder used for everything, we now have multiple decoders, each specializing in predicting specific parameters. Moreover, because each module has well-defined semantics, user interaction at the module level becomes much easier to set up.
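As a concrete illustration, here is a minimal PyTorch sketch of one such module: a small attentive GRU decoder that turns the encoded utterance into the token sequence of its full parameter (for example, the SEARCH module emitting "doctoral dissertation"). The class name, layer sizes, and vocabulary are illustrative assumptions, not the exact architecture of our system.

```python
import torch
import torch.nn as nn

class ParameterModule(nn.Module):
    """One NL2API module, e.g. SEARCH or FILTER(isRead): an attentive
    decoder specialized for predicting a single parameter."""
    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.attn = nn.Linear(hidden, hidden)
        # Decoder input at each step = previous-token embedding + attention context.
        self.gru = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc_states, prev_token, hidden):
        # enc_states: (B, T, H) encoder outputs over the utterance tokens
        # prev_token: (B,) id of the last emitted token; hidden: (1, B, H)
        query = hidden[-1].unsqueeze(2)                             # (B, H, 1)
        scores = torch.bmm(self.attn(enc_states), query)            # (B, T, 1)
        weights = torch.softmax(scores, dim=1)                      # attention over the utterance
        context = (weights * enc_states).sum(dim=1, keepdim=True)   # (B, 1, H)
        emb = self.embed(prev_token).unsqueeze(1)                   # (B, 1, H)
        output, hidden = self.gru(torch.cat([emb, context], dim=-1), hidden)
        logits = self.out(output.squeeze(1))                        # (B, vocab) next-token scores
        return logits, hidden
```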
Controller: only a few modules are used for any given utterance. The controller's task is to determine which modules to trigger. The controller is thus also an attentive decoder: it encodes the utterance and generates a sequence of modules, called a layout. The triggered modules then generate their respective parameters, and finally the parameters are combined to form the final API request.
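The sketch below illustrates this composition step with mocked module outputs: the controller's layout determines which modules fire, each module contributes one full parameter, and the parameters are assembled into a single request. The mapping from module families to OData query options is an assumption mirroring the GET-Messages examples above; in the real model the parameters come from decoders like the one sketched earlier.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    module: str      # e.g. "FILTER(isRead)" or "SEARCH"
    parameter: str   # the full decoded parameter value

def compose_request(base_url: str, layout: list) -> str:
    """Combine the parameters predicted by the triggered modules
    into one RESTful request (here, an OData-style query string)."""
    option = {"FILTER": "$filter", "SELECT": "$select",
              "ORDERBY": "$orderby", "SEARCH": "$search"}
    parts = [f"{option[p.module.split('(')[0]]}={p.parameter}" for p in layout]
    return base_url + "?" + "&".join(parts)

# "unread emails about the doctoral dissertation" -> a layout of two modules
layout = [
    Prediction("FILTER(isRead)", "isRead eq false"),
    Prediction("SEARCH", '"doctoral dissertation"'),
]
print(compose_request("https://graph.microsoft.com/v1.0/me/messages", layout))
```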
By decomposing the complex prediction process of a standard sequence-to-sequence model into small, highly specialized prediction units called modules, the model's predictions become easy to explain to users, and user feedback can then correct prediction errors at a fine-grained level. In our study, we test this hypothesis by comparing the interactive NLI with its non-interactive version, both in simulation and in experiments with people using a real working API. We show that with the interactive NLI, users succeed more often and faster, which leads to higher user satisfaction.
We will soon publish an article with more details on building natural language interfaces to web APIs. Stay tuned!