How speech recognition is used in business

A revolution is brewing in the call tracking market: speech recognition technology is about to be introduced. It has recently found application in American services. Yevgeny Vlasov, CEO of Calltouch, explains how soon the trend will reach Russia and what benefits business owners can expect.
Background
Let us first examine what speech recognition is. In the scientific community, the term refers to the process of converting a speech signal into digital information (for example, text data). The opposite technology is speech synthesis: talking robots that convert digital information into a speech signal.
The first speech recognition device appeared in 1952; it understood digits spoken by a person. In the early 1990s, programs were released that allowed people with disabilities to work with text. But the technology did not become widespread, because recognition was inaccurate due to several problems:
- arbitrary, untrained users;
- spontaneous speech, accompanied by agrammatisms and “verbal clutter”;
- acoustic noise and distortion;
- speech interference.

In addition, the same word can sound different if a person speaks with an accent, places the stress incorrectly, or changes the pace and volume of speech. Sometimes these details hinder understanding between people, not to mention computers.
Gradually, however, programs learned to recognize speech, and the first language they understood was English: it is widespread, simple enough (simpler than Russian and Chinese), and therefore requires less complex mathematical algorithms. The Western IT market grew rapidly and, thanks to strong competition, speech recognition soon became widely used in business.
Conversation business
In Russia, Yandex has achieved the greatest success so far: in 2013, the company launched the SpeechKitCloud cloud technology, which helps to synthesize and recognize speech.
Synthesis relies on a statistical approach to acoustic modeling. Simply put, the program forms a new voice based on the intonations of real people. This makes it possible to give artificial speech an emotional coloring (kind, evil, neutral) or a gender (male, female). At the time of writing, the service offered free testing for a month, with an estimated cost of $5 per 1,000 requests after that.
Unfortunately, large companies are in no hurry to use this technology to the fullest, but there are enthusiasts. For example, Oktell, a Russian developer of call center automation systems, uses SpeechKitCloud to create the greetings and voice menus that callers hear, as well as to record answers to frequently asked questions. The technology works alongside the call center, reducing the burden on operators.
Colleagues from Repka.UA have almost succeeded in replacing people with machines. They connected the SpeechKitCloud speech synthesizer to the online store's accounting system and developed an order confirmation script. The result was Christina, a robot that automatically checks the availability and price of a product when an order is received, calculates the dispatch date, and calls the customer for confirmation.

If a person has questions, the call is transferred to a call center operator. Naturally, Christina's speech recognition rate was low at first: she could not replace a person in unusual situations and was brought in only during peaks of incoming calls.

But by creating its own speech model, the company managed to increase recognition accuracy, and now the robot costs 5 times less than an operator and 8 times less than an external call center.
Another SpeechKitCloud feature, speech recognition, allows customers to place an order over the phone in automatic mode. Today, in addition to the standard answers (“issue”, “delivery”, “confirm”), the technology recognizes phrases such as “let's arrange”, “I don’t know”, “I will pick it up”, “okay”. The system recognizes about 82-95% of Russian speech, depending on the quality of the original sound and its encoding, the intelligibility and pace of speech, and the complexity and length of phrases. As with speech synthesis, the technology primarily reduces the load on the call center, and in the future, provided that quality improves, it could replace the call center completely.
Near future
Today, such systems are used only for receiving and distributing outgoing and incoming calls. However, we at Calltouch are sure that this is not the limit: by the end of the year we plan to complete the integration of speech recognition technology with the call tracking service, which will allow us to take the optimization of advertising campaigns and business processes to a whole new level.
Take the distribution of calls, for example. Most business owners today want to learn how to manage the flow of phone calls and to separate those who call to make a purchase from those who only want a consultation.
For example, a toy store that advertises on Yandex.Direct, the Yandex Advertising Network (YAN), and the VKontakte social network wants to know which source brings calls that end in purchases. Suppose that VKontakte mainly brings people who want a free consultation, YAN brings calls to the service department, and Yandex.Direct brings sales. In this case, it is worth redistributing the budget in favor of Yandex.Direct while minimizing spending on social network advertising. But this is impossible without knowing how the calls of potential customers are distributed.
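The toy store reasoning can be sketched in a few lines of Python. This is only an illustration: the call list, the category labels, and the function name `sales_share_by_source` are hypothetical and do not describe how any particular call tracking service stores its data.

```python
from collections import Counter

def sales_share_by_source(calls):
    """Return {source: share of that source's calls tagged 'sales'}.

    `calls` is an iterable of (source, category) pairs.
    """
    totals = Counter()
    sales = Counter()
    for source, category in calls:
        totals[source] += 1
        if category == "sales":
            sales[source] += 1
    return {source: sales[source] / totals[source] for source in totals}

# Hypothetical, already tagged calls (illustrative data only).
calls = [
    ("Yandex.Direct", "sales"),
    ("Yandex.Direct", "sales"),
    ("Yandex.Direct", "service"),
    ("YAN", "service"),
    ("YAN", "service"),
    ("VKontakte", "consultation"),
    ("VKontakte", "consultation"),
    ("VKontakte", "sales"),
]

shares = sales_share_by_source(calls)
# The source with the highest share of sales calls is the first
# candidate for a larger budget.
best_source = max(shares, key=shares.get)
```

The sketch only works if every call already carries a category, which is exactly the information the store is missing.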
Today, there are two separation methods:
- Automatic. When a customer calls, a voice menu is triggered and offers a choice: button “1” transfers the call to the sales department, button “2” to the service center. This information is sent to the call tracking system and analyzed.
- Manual. The company secretary presses “1” on realizing that the customer is interested in buying, or “2” if the customer needs the service department. The call tracking system marks the first group of calls as “sales” and the second as “service”, and builds its analysis on this data.
Both methods depend on the human factor. In the first case, the customer finds it inconvenient to perform an additional action (pressing buttons) and may hang up or press the wrong number. In the second, the secretary may forget to mark the call or may inflate the result if, for example, their KPI depends on the number of “sales” calls.
With the advent of the new technology, the human factor can be eliminated. If you teach the system to understand the keywords that are most often used in advertisements, it will sort the calls into groups by itself and mark each call as “sales” or “service”.
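As a rough illustration of keyword-based tagging, the sketch below assigns a category to a call transcript that has already been converted to text. The keyword lists and the function name `classify_call` are assumptions made for this example; a real service would need far richer vocabularies and would have to cope with recognition errors.

```python
import re

# Hypothetical keyword lists; real ones would come from the ad campaigns.
CATEGORY_KEYWORDS = {
    "sales": {"buy", "order", "price", "delivery"},
    "service": {"repair", "warranty", "broken", "return"},
}

def classify_call(transcript):
    """Tag a transcript as 'sales', 'service', or 'unknown' by keyword hits."""
    words = re.findall(r"[a-z']+", transcript.lower())
    scores = {
        category: sum(word in keywords for word in words)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    # No hits at all, or a tie between categories: leave the call untagged.
    if best_score == 0 or best_score == second_score:
        return "unknown"
    return best
```

For instance, `classify_call("I want to order a toy, what is the price?")` yields `"sales"`, while a call with no keyword from either list is left as `"unknown"` for a person to review.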
Speech recognition will also help monitor employees. Suppose the owner of a company suspects that subordinates are underperforming, are rude to customers, or miss calls. Today there is only one way to verify this: by listening to recordings of telephone conversations. That costs time, or money if you hire an employee for the task, whereas a call tracking service with speech recognition will point out existing problems automatically. To do this, scripts and templates for “correct” communication with clients are uploaded to the system, along with the number of times an employee must say each phrase. If the program detects the phrases in a conversation, the manager behaved correctly. Call tracking will also help identify aggressive behavior or an improper conversation. Naturally, the method will not give an absolute result.
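A minimal sketch of such a script check is shown below, assuming the conversation has already been transcribed. The phrases, the required counts, and the function name `check_script` are hypothetical. Plain substring matching is used for brevity, so a phrase is also counted when it occurs inside a longer word.

```python
def check_script(transcript, required_phrases):
    """Compare a transcript against required script phrases.

    `required_phrases` maps each phrase to the minimum number of times
    it must be said. Returns {phrase: (found, required, ok)}.
    """
    text = transcript.lower()
    report = {}
    for phrase, required in required_phrases.items():
        found = text.count(phrase.lower())  # non-overlapping occurrences
        report[phrase] = (found, required, found >= required)
    return report

# Hypothetical script template and transcript.
script = {"thank you for calling": 1, "can i help": 1, "goodbye": 1}
transcript = (
    "Thank you for calling the toy store. How can I help you? "
    "Your order is confirmed. Thank you, goodbye."
)
report = check_script(transcript, script)
call_is_compliant = all(ok for _, _, ok in report.values())
```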
By analyzing the information the technology provides, you can increase sales. For example, the program will identify words that have never been used in advertising campaigns but that most customers say when they call the store. It is useful to insert such phrases into ads: this will expand the audience of contextual advertising and have more effect than the usual text written by marketers.
Of course, it will take time for Russian call tracking services to learn to understand speech. But there is no doubt that users will appreciate the opportunities the technology opens up.
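The search for such words can be sketched as a comparison of two vocabularies. Everything here is illustrative: the ad text, the transcripts, the stopword list, and the function name `missing_ad_terms` are assumptions, and a production system would also need to handle word forms and multi-word phrases.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())

def missing_ad_terms(transcripts, ad_texts, min_calls=2, stopwords=frozenset()):
    """Return words said in at least `min_calls` calls but absent from all ads."""
    ad_words = set()
    for ad in ad_texts:
        ad_words.update(tokenize(ad))
    calls_per_word = Counter()
    for transcript in transcripts:
        # Count each word once per call, ignoring stopwords.
        calls_per_word.update(set(tokenize(transcript)) - stopwords)
    return sorted(
        word
        for word, n_calls in calls_per_word.items()
        if n_calls >= min_calls and word not in ad_words
    )

# Hypothetical data.
ads = ["Buy toys for kids with fast delivery"]
transcripts = [
    "Do you have wooden toys",
    "I need a wooden train",
    "Is the train in stock",
]
stopwords = frozenset({"do", "you", "have", "i", "need", "a", "is", "the", "in"})
candidates = missing_ad_terms(transcripts, ads, min_calls=2, stopwords=stopwords)
```

With this sample data the candidates are the words "train" and "wooden", which callers use repeatedly but the ad never mentions.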
Source: SearchEngines.ru.