How I searched for the perfect tool for designing conversational interfaces, or In search of the Holy Grail
Pavel Guay, KODE Android Developer
Hi, my name is Pavel (pavelgvay). I work at KODE, a mobile app development studio in Kaliningrad. About a year ago I dove into building applications for Google Assistant and got hooked on the interface design stage, which became a real creative outlet after all the lines of code.
Having developed a dozen projects, spoken at several conferences, met the developers of Google Assistant (which, incidentally, will very soon speak Russian), and exchanged experience with developers, studios and even the author of the book, I started thinking seriously about optimizing the design and testing of voice applications, which can now be built even for Alice.
This idea gave me a motivational kick, sent me on a long journey through the existing tools and their shortcomings, and led to a predictable conclusion. More on that at the end of the article; for now, let's look at the present.
For those who have not yet seen conversational interfaces from the inside, let me explain what designing such an application involves.
A good conversational application differs from a chatbot in that it does not force the user to use specific commands: the user carries on a free-form dialogue with the service, much like talking to a real person. Voice and text come first, but if the device has a screen, the application can add visual support in the form of cards, carousels and lists to present information better.
Take “ordering a pizza” as an example. Imagine how many different phrases you can use to tell the app that you want pizza. The user may name a specific pizza, ask for suggestions with mushrooms and ham, ask to hear the whole menu and choose from it, or simply say they are hungry.
All of these are ways the plot can develop. And this is what we have to provide for: every single step on every possible path of every scenario in the application. An unplowed field! And we still haven't ordered the pizza!
The design of a conversational interface, regardless of platform, goes through a standard set of steps. Detailed guidelines are available from the developers of Google Assistant, Amazon Alexa and Microsoft Cortana themselves, but I have condensed them into a short checklist:
- We define personas - each persona is a composite portrait of a representative of one group of the application's audience, with a set of phrases behind it based on the typical behavior of that group.
- We filter scenarios - we sort the possible conversations by how plausible they would be in a real dialogue with a person. Sounds weird? Then discard it. We write sample dialogues for the scenarios that remain.
- We create a character - since we are aiming for a natural dialogue, the user should form an image of the person they are talking to. We give the application a name, an appearance, skills, a brief biography, a personality and, of course, a voice (SSML, a markup language for speech).
- We build a dialogue tree - to account for every possible course of events, we visualize all the steps and actions that lead the user to the hypothetical “order pizza”.
- We work on phrases - each step needs at least 5-10 variations of both the interface's replies, which keeps the conversation lively, and the user's phrases, which helps speech recognition.
- We test - we check that all branches of the dialogue are covered and that there are no logical dead ends or clipped phrases. To do this, every scenario has to be walked through by speaking it aloud with someone.
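To give a feel for how quickly "every possible path" grows, here is a minimal sketch of a dialogue tree as plain data. The step names and the `DIALOGUE_MAP` structure are made up for illustration and do not come from any of the tools discussed below; the sketch only lists the paths from the first step to a final one.

```python
from typing import Dict, List

# Hypothetical dialogue map: each step lists the steps that may follow it.
DIALOGUE_MAP: Dict[str, List[str]] = {
    "greeting": ["ask_pizza_name", "suggest_options", "read_menu"],
    "ask_pizza_name": ["confirm_order"],
    "suggest_options": ["confirm_order", "read_menu"],
    "read_menu": ["confirm_order"],
    "confirm_order": [],  # final step: the pizza is ordered
}


def all_paths(dialogue_map: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Return every path from `start` to a step with no follow-ups."""
    paths: List[List[str]] = []

    def walk(step: str, trail: List[str]) -> None:
        trail = trail + [step]
        next_steps = dialogue_map.get(step, [])
        if not next_steps:
            paths.append(trail)
            return
        for nxt in next_steps:
            # Steps already visited are skipped so the walk always terminates.
            if nxt not in trail:
                walk(nxt, trail)

    walk(start, [])
    return paths


PATHS = all_paths(DIALOGUE_MAP, "greeting")
for path in PATHS:
    print(" -> ".join(path))
```

Even this toy map with five steps has several distinct routes to the order, and each of them needs its own sample dialogue.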
Houston, we have a problem
The root of all the conversational interface designer's problems is the sheer amount of information. Scenarios, the ways through them, dialogue trees, steps: even a small application can have a hundred of them. All this information has to be stored somewhere, organized, verified, tested, handed over to development and presented to the customer, yet the guidelines from the voice assistant developers offer no recommendations at all on choosing a tool.
After designing my first applications, I boiled my pains down to a core set of problems:
- A huge dialogue map - a detailed, visual path from point A to point B, the whole maze of the user's tangled route to the goal. An ordinary whiteboard is not up to the task (just imagine how small you would have to write, and then having to haul this tome over to the developers), and on top of that you still have to agree with the team on the notation used on the map. Misery!
- Manual slave labor - a lot of time goes not only into entering the information but also into synchronizing edits and changes. Not all phrases fit on the map, so they have to be kept in a table, and the two have to be kept in sync by hand. Because everything is done manually, nothing protects you from ordinary mistakes and typos, so you have to double-check yourself a hundred times.
- Quality control - every time you want to check the quality of the work, you have to assemble a dialogue transcript by hand, constantly switching between the transcript document, the dialogue map and the phrase table. It is a terribly boring and slow process that kills any desire to check the quality of your own work.
The result of this constant struggle is not only longer development time but also a loss of quality due to carelessness, fatigue and, of course, loss of motivation.
A number of tools have already appeared online that are supposed to ease the process, but their functionality is quite limited.
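Part of that double-checking could in principle be automated. The sketch below assumes the map and the phrase table have been exported into two plain dictionaries; the function `check_sync` and all step names are hypothetical and exist only to illustrate the idea.

```python
from typing import Dict, List, Set


def check_sync(
    dialogue_map: Dict[str, List[str]],
    phrases: Dict[str, List[str]],
    final_steps: Set[str],
) -> Dict[str, Set[str]]:
    """Compare a dialogue map with a phrase table and report mismatches."""
    # Collect every step mentioned in the map, as a source or as a target.
    steps: Set[str] = set(dialogue_map)
    for targets in dialogue_map.values():
        steps.update(targets)

    return {
        # Steps on the map that have no phrases in the table.
        "missing_phrases": {s for s in steps if not phrases.get(s)},
        # Rows in the table that no longer match any step on the map.
        "orphan_phrases": set(phrases) - steps,
        # Steps with no way forward that are not meant to end the dialogue.
        "dead_ends": {
            s for s in steps if not dialogue_map.get(s) and s not in final_steps
        },
    }


# Hypothetical data with three deliberate mistakes.
EXAMPLE_MAP = {
    "greeting": ["read_menu", "suggest_options"],
    "read_menu": ["confirm_order"],
    "suggest_options": [],  # forgot to connect this step
    "confirm_order": [],
}
EXAMPLE_PHRASES = {
    "greeting": ["Hi! What would you like?"],
    "read_menu": ["Today we have Margherita and Pepperoni."],
    "confirm_order": ["Your pizza is on its way."],
    "goodbye": ["See you!"],  # step was removed from the map
}

REPORT = check_sync(EXAMPLE_MAP, EXAMPLE_PHRASES, final_steps={"confirm_order"})
print(REPORT)
```

A report like this does not replace talking the scenarios through with someone, but it catches the typos and forgotten rows that manual synchronization tends to produce.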
Criteria for evaluation
So that my analysis and subjective criticism would not be unfounded, I followed the best traditions of scientific research: I took one and the same part of a real application I had worked on and tried to implement it with each of the tools.
I summarized all the results in a table and rated each set of services on a 5-point scale against three main criteria:
- visibility of the dialogue map;
- ease and quality of testing;
- ease of editing and synchronization.
Whiteboard (Realtimeboard)
Let's start with the “classical” approach: we build the dialogue map on a whiteboard, or rather on its digital analogue, Realtimeboard. Character descriptions and phrases are stored in Google Docs.
Before building the map you have to work out your own notation, which again costs time, and while building it every step is drawn and aligned by hand. It is slow, but the resulting map is visually clear.
Collecting materials for testing takes a lot of time. It goes something like this: look at the map, take a phrase from the table, enter it into the document. No flexibility, pure routine and constant switching between tools.
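The look-up-and-copy routine described above is mechanical enough to sketch in a few lines. This is only an illustration under assumptions: the path through the map and the phrase table are hand-written dictionaries, and `build_transcript` is a hypothetical helper, not a feature of Realtimeboard or Google Docs.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical phrase table: step name -> phrase variations.
PHRASES: Dict[str, List[str]] = {
    "user_wants_pizza": ["I want pizza", "I'm hungry", "Order me a pizza"],
    "suggest_options": ["How about Margherita?", "May I suggest Pepperoni?"],
    "user_agrees": ["Sounds good", "Yes, please"],
}

# One path through the dialogue map: (speaker, step) pairs in order.
PATH: List[Tuple[str, str]] = [
    ("User", "user_wants_pizza"),
    ("App", "suggest_options"),
    ("User", "user_agrees"),
]


def build_transcript(
    path: List[Tuple[str, str]],
    phrases: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Assemble a readable transcript, picking one phrase variation per step."""
    rng = rng or random.Random()
    lines = []
    for speaker, step in path:
        # A KeyError here means the step has no row in the phrase table.
        lines.append(f"{speaker}: {rng.choice(phrases[step])}")
    return "\n".join(lines)


print(build_transcript(PATH, PHRASES))
```

Running it several times gives different combinations of phrase variations for the same path, which is roughly what a designer assembles by hand for each test read-through.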
Editing and Syncing
Editing the map is easy: you can swap steps, move entire branches and select elements in groups. But you have to synchronize the map with the phrase table by hand, which again brings the nagging feeling of lost data.
Realtimeboard earns a “good” for the clarity of the map and for letting designers adapt the working method to themselves. It gets a wagging finger for the lengthy testing process and the manual synchronization of the phrase table with the map.
- Map - 5/5
- Editing and Syncing - 0/5
- Testing - 0/5
Sayspring
The map is built step by step: there are markers for the user and the interface, and it can be divided into scenarios. While building it you run into minor inconveniences, such as having to save changes constantly. At the same time the map is strictly linear: transitions are not displayed at all (the links and forks shown on screen were added separately by hand).
The service lets you test scenarios by voice, but there is no text equivalent of the phrases, no way to go back a couple of steps (you have to start over), and speech recognition is available in only three languages and works poorly. For testing this mode is useless: there is no way to view the dialogue history, so you still have to collect the dialogues into a file. Fortunately, collecting dialogues is easy here: at the click of a button the tool shows the possible dialogues. There are plenty of problems and inconveniences (for example, you cannot combine two scenarios into one file, and the file cannot be downloaded, only viewed in the tool), but this already saves time on testing.
Editing and Syncing
Making changes to the map is inconvenient: elements can be dragged and dropped only within one scenario, and grouping is not available.
Sayspring eliminates the routine work of collecting materials for testing and of synchronizing the phrase table with the map, since replies are attached to steps. Those are its only advantages.
The map is poor; working with it is difficult and inconvenient. Voice testing works but is useless, since there is no way to read the replies or view the history, and downloading dialogues is limited.
- Map - 0/5
- Editing and Syncing - 3/5
- Testing - 3/5
On the map, the forks and connections between steps are clearly visible. It is interactive: clicking a step opens that element for editing.
There is no division into scenarios, which leads to many repetitions and a huge, confusing flowchart.
However, there is no way to choose the steps: in effect we do not control the process but just watch a video, which makes the mode useless.
Editing and Syncing
Since phrases and the map are stored separately, the synchronization problem remains. Changing the map is fairly convenient thanks to drag-and-drop, but you cannot select several elements and apply one action to all of them.
By the way, the service has a so-called build mode: you can embed variables in phrases and access them through the API. The tool could thus serve as a content store. For what exactly is unclear, though, because only one variant of each phrase can be specified.
The tool seems to be made for rapid prototyping of simple applications rather than for full-fledged design. Testing does not work, which leaves the problem of collecting materials open. Dialogues can be downloaded only in MP4, GIF or AVI format.
- Map - 2/5
- Editing and Syncing - 1/5
- Testing - 1/5
The connections between steps are poorly implemented: the curves cannot be adjusted, and they are drawn on top of everything else, which greatly reduces the readability of the map.
As with Realtimeboard, you have to work out your own notation before building the map.
The tool offers nothing for collecting testing materials; the problem is not addressed at all.
Editing and Syncing
Working with the map is convenient: elements can be selected and dragged. Since phrases are stored separately, the synchronization problem remains.
Building the map is very convenient and the map itself is fairly clear, but the connections between steps are a problem. The problems of testing and of synchronizing the phrase table with the map remain unsolved.
- Map - 3/5
- Editing and Syncing - 0/5
- Testing - 0/5
I've complained; what's next?
Clearly, this study did not cover every available option (I would welcome your suggestions in the comments), but for the services analyzed the conclusion is clear: none of them is the Holy Grail. My personal stopgap is the combination of Realtimeboard + Google Sheets + Google Docs.
However, I was not willing to accept the loss of time and effort on design, so I set out to develop my own tool, Tortu.
Its feature development depends directly on the opinions of interested developers. I have prepared a few questions specifically for this, to help me get my bearings. I would be grateful if you could fill out the form; it takes no more than 5-7 minutes.
If you are interested in conversational interfaces and want to learn more about design or development, or if you have any questions, feel free to write to my Telegram chat dedicated to conversational interfaces, where a small community of developers and designers has already gathered.