Android AI with an open API


    These days, hardly anyone is unaware of Apple's Siri voice assistant. When this product was shown two years ago at the presentation of the new iPhone 4S, many took a fresh look at where the IT industry was heading. Indeed, no one had yet shipped an artificial intelligence in your pocket that understands natural speech.

    Many people at the time began to say that Apple might, at the very next WWDC, give all iOS programmers the opportunity to use an open Siri API in their own programs. The picture looked bright: any application could respond to user phrases by executing various commands. Indeed, if the App Store has so many useful applications, why not give them voice control? Moreover, speech as a way of communicating with the user quickly became trendy after the release of the iPhone 4S.

    Read on to find out whether Apple managed to do this, and what we managed to do ourselves.



    Time passed, but the Siri API did not appear



    It should be noted that most people simply confuse speech recognition with the actual capabilities of an assistant as artificial intelligence. There is a huge difference between these two concepts: speech recognition (speech-to-text) solutions have been on the market for a long time (in Android OS, for example, it is available to everyone), but no one had yet succeeded in creating an open technology for a dialogue system that maintains context and extracts meaning. Few also thought about how many problems would arise if many programs shared access to a single AI "brain" in the person of Siri, or about the completely new technologies programmers would have to deal with.

    The idea of creating a voice assistant with an open and accessible “artificial intelligence” API was already in our heads at that time, and we decided to implement it.

    Assistant in Russian



    Our small group of enterprising developers took up the project now known as “Assistant in Russian”.

    It is worth noting that creating such a voice platform requires knowledge in specialized areas: speech recognition (ASR) and speech synthesis (TTS), as well as NLP, which extracts meaning from user speech and manages the dialogue context. NLP is the binding component of any such artificial intelligence system: it not only turns speech into text but also works out what the user wants. This is what distinguishes speech recognition technology from artificial intelligence technology.

    Our goal was to make an accessible tool for using these technologies.

    By launch time, the application could competently handle the user's everyday tasks by voice. And users of Android Jelly Bean could execute voice commands even without an Internet connection.

    Artificial Intelligence Open API


    From day one, every “Assistant in Russian” service was built on the same platform that we planned to open to everyone in the future; in English this principle is called “eating your own dog food”. It let us design the voice architecture and the assistant's own functionality at the same time.

    The result of our work was an application with an open API and a “hybrid” NLP technology, which, on the one hand, makes it possible to program a voice interface without any servers, using only your device and the Android SDK, and, on the other hand, to move parts of the solution to the cloud as needed. For example, your contacts are not sent to any servers (hello, Siri), while the list of all cities that the Weather service works with is not stored on the client.

    All of the assistant's services were created by different programmers, some of whom have no special knowledge of ASR, TTS or NLP. Even so, there were no particular difficulties in using our “Assistant” API, since we set ourselves the task of making a platform that is open, accessible and understandable to everyone.

    “Assistant in Russian” takes advantage of interprocess communication (IPC) in Android OS, so that the assistant itself acts as a voice interface between the user and your own application. At the same time, your application can display its GUI inside the assistant's interface; RemoteViews and other similar techniques are used for this.

    What the API can do


    With the “Assistant in Russian” API you can create far more interesting scenarios, where the assistant's functionality extends beyond the device it runs on. For example, the third-party application “AssistantConnect” uses our assistant's API to voice-control various smart home devices and a home theater.



    At the same time, “AssistantConnect” is a regular Android application that sends requests over HTTP to the XBMC media center and over Z-Wave to the Vera smart home controller.
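
    To make this concrete, here is a minimal sketch of the kind of HTTP request involved: toggling playback through XBMC's JSON-RPC endpoint. This is our own illustration, not “AssistantConnect” code; the host address and player id are assumptions (XBMC's web server listens on port 8080 by default), and on Android such a call must of course run off the main thread.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public final class XbmcRemote {
        // Assumed address of the XBMC machine on the local network
        private static final String ENDPOINT = "http://192.168.1.10:8080/jsonrpc";

        // Toggles play/pause via XBMC's JSON-RPC API (playerid 1 is assumed)
        public static void playPause() throws Exception {
            String payload = "{\"jsonrpc\":\"2.0\",\"method\":\"Player.PlayPause\","
                    + "\"params\":{\"playerid\":1},\"id\":1}";
            HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT).openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            try {
                out.write(payload.getBytes("UTF-8"));
            } finally {
                out.close();
            }
            conn.getResponseCode(); // 200 means XBMC accepted the call
            conn.disconnect();
        }
    }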

    You can also see how the same add-on can control, for example, a regular browser. All this demonstrates the capabilities of the assistant's API, which lets you create a new type of communication with users.

    How to get the API


    You can try the API in your own projects right now by downloading it from our website. For now we give only a brief description of how to use it. In the following articles, we will go into the technical details of the implementation of the entire “Assistant in Russian” platform, and also talk about the nuances of using the API itself.

    This article is the very first step in publishing the assistant API. Much will change in the near future; we plan to provide more features, including a catalog of add-ons with which users can find all voice-controlled applications in the Play Store, as well as a commercial SDK for creating your own voice assistants.

    The basics


    To use the assistant API library in your application, you do not need to learn any new programming languages or technologies. All you need is the Android SDK and an IDE for development; we suggest Android Studio. The libraries are connected simply by specifying the dependencies in your build.gradle file:

    repositories {
        maven {
            url 'http://voiceassistant.mobi/m2/repository'
        }
    }
    dependencies {
        compile 'mobi.voiceassistant:base:0.1.0-SNAPSHOT'
        compile 'mobi.voiceassistant:client:0.1.0-SNAPSHOT'
    }
    


    The API lets you establish a connection between your application and “Assistant in Russian” so that all user phrases appropriate for your application are redirected to a special service that you implement. We call these services Agents.

    Agents and Modules


    The assistant extracts all the necessary data from the text of the phrase in advance and hands it to the agent as a semantic parse tree, a Token. This is done by means of special grammars (Modules) that you create for your service.

    A module is a set of commands with phrase templates (patterns) that your agent should respond to (pattern syntax is described in detail in the API documentation). At any time, the agent can limit the set of such modules available to the user, thereby shaping the dialogue context.



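    A module is just an XML file stored in the xml resource directory of your application. Here is an example of a simple module with two commands and very simple patterns. Treat it as a minimal sketch: the element names and pattern syntax are illustrative assumptions (the real syntax is described in the API documentation), while the command ids match those used by the agent below.

    <?xml version="1.0" encoding="utf-8"?>
    <!-- A hypothetical module: two commands with simple phrase patterns.
         The element names and pattern syntax are assumptions, not the
         documented schema. -->
    <module>
        <command id="@+id/cmd_hello">
            <pattern value="привет"/>
        </command>
        <command id="@+id/cmd_name">
            <pattern value="меня зовут *"/>
        </command>
    </module>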
    As you can see, the module contains no control code: all the code is described in your agent's class. This reflects the basic principle of our approach to the voice API: the declarative part, which describes the grammar of the dialogue, is separated from the control code, which implements the processing logic and is completely language-independent.

    An agent is essentially a layer on top of regular Android services. It implements the interface between the assistant and your application's logic.

    public class HelloAgent extends AssistantAgent {
        @Override
        protected void onCommand(Request request) {
            // Dispatch by command id; the ids come from the module file
            switch (request.getDispatchId()) {
                case R.id.cmd_hello:
                    onHello(request);
                    break;
                case R.id.cmd_name:
                    onName(request);
                    break;
            }
        }
        ...
    }
    


    This is a simple example of how an agent can process the commands described earlier in the module. The AssistantAgent abstraction provides many methods for processing commands, managing the dialogue context, invoking third-party activities, and more.

    A Request contains all the necessary information about the user's request: the command identifier, the request content (a token or something else), the session, and so on. For any request, the agent must produce a Response containing the response content and, if necessary, instructions to the assistant about switching the dialogue context.

    request.addQuickResponse("Привет!");


    This is an example of forming a quick response in one line. Here is a slightly more complex example:

    Response response = request.createResponse();
    // The string content is what the assistant will say or display
    response.setContent(getString(R.string.hello_say_name));
    // Limit the next user utterance to commands from the R.xml.name module
    response.enterModalQuestionScope(R.xml.name);
    request.addResponse(response);
    


    Here the response, in addition to content in the form of a string (other content types can be passed as well, such as a GUI), also carries information about changing the dialogue context. The user will now only have access to commands from the R.xml.name module, and after the assistant speaks the agent's answer, the microphone will turn on automatically; this is called “modal mode”.
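
    For GUI content, a natural Android mechanism is RemoteViews, mentioned earlier. Here is a minimal sketch under two assumptions: that setContent() accepts a RemoteViews instance, and that R.layout.weather_card and R.id.temperature are resources defined in your application.

    // A sketch of returning a small GUI card instead of plain text.
    // Assumptions: setContent() accepts RemoteViews; the layout and
    // view ids below are hypothetical resources of your application.
    RemoteViews card = new RemoteViews(getPackageName(), R.layout.weather_card);
    card.setTextViewText(R.id.temperature, "+21 °C");

    Response response = request.createResponse();
    response.setContent(card);
    request.addResponse(response);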

    Each agent is a service, so it must be declared in the application manifest, AndroidManifest.xml:



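    A sketch of what such a declaration might look like follows; the intent action and meta-data names are our assumptions, chosen only to illustrate the two pieces of information the declaration carries.

    <!-- Hypothetical declaration: the action and meta-data names are
         assumptions, not the documented ones. -->
    <service android:name=".HelloAgent">
        <intent-filter>
            <action android:name="mobi.voiceassistant.intent.action.COMMAND"/>
        </intent-filter>
        <!-- The agent's main module -->
        <meta-data android:name="mobi.voiceassistant.MODULE"
                   android:resource="@xml/hello"/>
        <!-- The package of the "Assistant in Russian" build to work with -->
        <meta-data android:name="mobi.voiceassistant.PACKAGE"
                   android:value="mobi.voiceassistant.ru"/>
    </service>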
    Indicated here are the agent's main module and the package of the specific “Assistant in Russian” build that the agent can work with.

    After you build your application and install it on the device, “Assistant in Russian” will pick up the information from your manifest and load the module. It will then redirect all suitable user requests to your agent, whenever the assistant's NLP engine decides that the phrase best fits the patterns of your module's commands.

    To be continued


    In this post we have very briefly covered the basics of using our API and the main principles of working with it. Of course, the assistant library provides many more advanced features: remote and fuzzy patterns, RemoteViews, dynamically changing response content, extracting data from phrases and much more. All of this is covered in the documentation, which we will keep expanding as the library itself improves.

    We invite you to try voice control in your own projects, join the developer community and help improve this tool.
