
Asterisk speech recognition through Google + smart IVR

Good day, dear Habr users.
In one project I needed to build a smart IVR on top of the Asterisk IP PBX. By "smart" I mean the following: when someone calls a certain number, the PBX asks for the name of the subscriber, the caller says the name, and the PBX connects the caller to that subscriber.
In my case I used the ready-made AsteriskNOW distribution with FreePBX preinstalled, although that hardly matters here: the only difference is in how the dial plan is edited.
Step 1:
Working with Google
The first task is to recognize the caller's speech. There are already articles on Habr ( one , two ) on how to do this using Google Translate. I decided to take ready-made scripts found on GitHub: googletts.agi, to teach Asterisk to speak, and speech-recog.agi, so that Asterisk can recognize speech.
The files googletts.agi and speech-recog.agi go into the /var/lib/asterisk/agi-bin folder.
For the scripts to work, the following packages are required: Perl, perl-libwww, IO-Socket-SSL, flac, sox, mpg123. I installed all of them from the repositories (via yum install), except for mpg123, which I had to download separately.
In googletts.agi, change the value of the $lang variable from en to ru, because we want Asterisk to speak Russian.
In speech-recog.agi, change the value of the $language variable from en-US to ru-RU so that Google returns the result in Russian.
That is all; I did not touch anything else in the scripts.
Step 2:
Writing a dial plan
As I said above, I have FreePBX installed, so I make all changes in the extensions_custom.conf file.
To begin with, it is polite to greet the caller and explain what to do next. Then speech-recog.agi listens to what the caller says, records it, converts it, sends it to Google and gets the result back. Next, using the GotoIf application, we check how the script worked. The script returns the following values:
- status: the execution status; 0 means success
- utterance: the string returned by Google
- confidence: a value from 0 to 1 indicating the probability that the recognition is correct
exten => 100,1,Answer()
; prompt: "Hello! After the beep, say the name of the subscriber."
exten => 100,n,agi(googletts.agi,"Здравствуйте! После звукового сигнала произнесите имя абонента.",ru)
exten => 100,n(record),agi(speech-recog.agi,ru-RU)
exten => 100,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.8"]]?if1:retry)
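If the check succeeds, we jump to the if1 label; if it fails, we jump to the retry label, where we ask the caller to repeat and then go back to the record label.
; prompt: "Please repeat"
exten => 100,n(retry),agi(googletts.agi,"Пожалуйста, повторите",ru)
exten => 100,n,Goto(record)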
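While tuning the confidence threshold, it may help to see what the script actually returned. A minimal sketch, assuming nothing beyond the three variables described above and the standard NoOp application, whose arguments are printed to the Asterisk console when the step is executed:

```
exten => 100,n(record),agi(speech-recog.agi,ru-RU)
; debugging aid only: visible in the console at verbose level 3 and above
exten => 100,n,NoOp(status=${status} utterance=${utterance} confidence=${confidence})
```

The NoOp line does nothing to the call itself, so it can be removed once the threshold has been chosen.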
Next we work with the string received from Google. We need to compare the received string ${utterance} with some template and decide what to do next. GotoIf is used again.
exten => 100,n(if1),GotoIf($["${utterance}" = "вася"]?vasya:retry)
If the string received from Google matches "вася" (Vasya), we jump to the vasya label; if it does not match, we ask the caller to repeat.
All that remains is to call Vasya:
exten => 100,n(vasya),Dial(SIP/101)
Dial plan in full:
exten => 100,1,Answer()
; prompt: "Hello! After the beep, say the name of the subscriber."
exten => 100,n,agi(googletts.agi,"Здравствуйте! После звукового сигнала произнесите имя абонента.",ru)
exten => 100,n(record),agi(speech-recog.agi,ru-RU)
exten => 100,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.8"]]?if1:retry)
exten => 100,n(if1),GotoIf($["${utterance}" = "вася"]?vasya:retry)
exten => 100,n(vasya),Dial(SIP/101)
; prompt: "Please repeat", then listen again
exten => 100,n(retry),agi(googletts.agi,"Пожалуйста, повторите",ru)
exten => 100,n,Goto(record)
Variations on a Theme
- To keep the example simple, I showed only one comparison of the recognized text with a template; in practice there can and should be more.
- The phrases spoken by Google's "iron lady" do not sound very nice. In a production version it is of course better to record the phrases with a live voice and play them with the Playback application.
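Both variations might look roughly like the sketch below. It is an illustration only: the names "петя" and "маша", the extensions 102 and 103, and the sound files custom/ivr-greeting and custom/ivr-repeat are made up, and the recordings would have to exist in the Asterisk sounds directory.

```
exten => 100,1,Answer()
; pre-recorded greeting instead of googletts.agi (hypothetical file name)
exten => 100,n,Playback(custom/ivr-greeting)
exten => 100,n(record),agi(speech-recog.agi,ru-RU)
exten => 100,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.8"]]?check:retry)
; one comparison per subscriber; a GotoIf without a false branch
; simply continues with the next priority
exten => 100,n(check),GotoIf($["${utterance}" = "вася"]?vasya)
exten => 100,n,GotoIf($["${utterance}" = "петя"]?petya)
exten => 100,n,GotoIf($["${utterance}" = "маша"]?masha:retry)
exten => 100,n(vasya),Dial(SIP/101)
exten => 100,n,Hangup()
exten => 100,n(petya),Dial(SIP/102)
exten => 100,n,Hangup()
exten => 100,n(masha),Dial(SIP/103)
exten => 100,n,Hangup()
; pre-recorded "please repeat" prompt (hypothetical file name)
exten => 100,n(retry),Playback(custom/ivr-repeat)
exten => 100,n,Goto(record)
```

The Hangup() after each Dial keeps a failed or finished call from falling through into the next subscriber's Dial line.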
Subtleties
With this way of working with Google Translate, keep in mind that recognition works well but not perfectly, and this has to be taken into account when creating the templates that the result from Google is compared with.
Here is an example of a rake I stepped on:
My name is Kirill (Кирилл, with two "л" at the end). For reasons known only to itself, Google returned sometimes «Кирилл» and sometimes «Кирил».
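One way to live with this is to accept every spelling that is known to come back in the same check. A sketch under that assumption, for a name that Google returns with either one or two «л» at the end; the label kirill and the extension 104 are hypothetical:

```
; | is the logical OR operator of Asterisk expressions
exten => 100,n(if1),GotoIf($[$["${utterance}" = "кирилл"] | $["${utterance}" = "кирил"]]?kirill:retry)
exten => 100,n(kirill),Dial(SIP/104)
```

Plain string equality is used here on purpose: it avoids depending on how regular expressions treat multi-byte Cyrillic characters on a particular system.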
Afterword
I suspect the comparison could be implemented in a more elegant way; I will be glad to hear your opinions and suggestions in the comments.
The question of scale also remains open: what happens if there are many subscribers, and how long will it take to go through all the comparisons if they are implemented the way I proposed? For a small PBX with about 20 subscribers, though, this method is acceptable.
Thanks for your attention.
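One possible way to avoid a long chain of comparisons is to keep the name-to-extension table in the built-in Asterisk database (AstDB) and look up the recognized string in a single step. This is only a sketch, not a tested setup: the family name ivr_names and the extension numbers are made up, the record label is the one from the dial plan above, and it assumes that AstDB keys containing Cyrillic text behave like any other keys, which should be verified on the actual installation.

```
; fill the table once from the Asterisk CLI (hypothetical family "ivr_names"):
;   database put ivr_names вася 101
;   database put ivr_names кирилл 104
;   database put ivr_names кирил 104
exten => 100,n(if1),GotoIf($[${DB_EXISTS(ivr_names/${utterance})}]?found:retry)
; DB_EXISTS also stores the value of the key in ${DB_RESULT}
exten => 100,n(found),Dial(SIP/${DB_RESULT})
exten => 100,n,Hangup()
exten => 100,n(retry),agi(googletts.agi,"Пожалуйста, повторите",ru)
exten => 100,n,Goto(record)
```

With this layout, adding a subscriber or an alternative spelling means one more database entry rather than one more GotoIf line.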
UPD
Examples
For the examples I used a slightly different dial plan, but the essence does not change.
Dial plan example:
exten => 8251,1,Answer()
exten => 8251,n,MixMonitor(/var/spool/asterisk/monitor/8251/${CDR(start)}-${DST-NUM}-${ID_CALL}-full.wav)
exten => 8251,n,agi(googletts.agi,"Please tell the name of the subscriber with whom to connect you.",ru)
exten => 8251,n(record),agi(speech-recog.agi,ru-RU)
exten => 8251,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.5"]]?if1:retry)
exten => 8251,n(if1),GotoIf($["${utterance}" = "alexander"]?al:retry)
exten => 8251,n(al),Dial(SIP/8201)
exten => 8251,n(retry),agi(googletts.agi,"Please repeat?",ru)
exten => 8251,n,Goto(record)
Link to a recording with successful recognition.
Asterisk output:
-- Executing [8251@from-internal:1] Answer("SIP/8211-00000000", "") in new stack
-- Executing [8251@from-internal:2] MixMonitor("SIP/8211-00000000", "/var/spool/asterisk/monitor/8251/2013-04-24 10:28:03---full.wav") in new stack
-- Executing [8251@from-internal:3] AGI("SIP/8211-00000000", "googletts.agi,"Please tell the name of the subscriber with whom you are connected.",ru") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/googletts.agi
== Begin MixMonitor Recording SIP/8211-00000000
-- Playing '/tmp/16ae8d012843179807cfdabd9a34608f' (escape_digits=) (sample_offset 0)
-- Playing '/tmp/ef3ccb070117857a8045932052f3fd7b' (escape_digits=) (sample_offset 0)
-- AGI Script googletts.agi completed, returning 0
-- Executing [8251@from-internal:4] AGI("SIP/8211-00000000", "speech-recog.agi,ru-RU") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/speech-recog.agi
-- Playing 'beep.ulaw' (language 'en')
-- AGI Script speech-recog.agi completed, returning 0
-- Executing [8251@from-internal:5] GotoIf("SIP/8211-00000000", "1?if1:retry") in new stack
-- Goto (from-internal,8251,6)
-- Executing [8251@from-internal:6] GotoIf("SIP/8211-00000000", "1?al:retry") in new stack
-- Goto (from-internal,8251,7)
-- Executing [8251@from-internal:7] Dial("SIP/8211-00000000", "SIP/8201") in new stack
== Using SIP RTP TOS bits 184
== Using SIP RTP CoS mark 5
-- Called SIP/8201
-- SIP/8201-00000001 is ringing
-- SIP/8201-00000001 answered SIP/8211-00000000
-- Executing [h@from-internal:1] Hangup("SIP/8211-00000000", "") in new stack
== Spawn extension (from-internal, h, 1) exited non-zero on 'SIP/8211-00000000'
== MixMonitor close filestream
== End MixMonitor Recording SIP/8211-00000000
Link to a recording with unsuccessful recognition.
Asterisk output:
-- Executing [8251@from-internal:1] Answer("SIP/8211-00000002", "") in new stack
-- Executing [8251@from-internal:2] MixMonitor("SIP/8211-00000002", "/var/spool/asterisk/monitor/8251/2013-04-24 10:36:29---full.wav") in new stack
-- Executing [8251@from-internal:3] AGI("SIP/8211-00000002", "googletts.agi,"Please tell the name of the subscriber with whom you are connected.",ru") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/googletts.agi
== Begin MixMonitor Recording SIP/8211-00000002
-- Playing '/tmp/16ae8d012843179807cfdabd9a34608f' (escape_digits=) (sample_offset 0)
-- Playing '/tmp/ef3ccb070117857a8045932052f3fd7b' (escape_digits=) (sample_offset 0)
-- AGI Script googletts.agi completed, returning 0
-- Executing [8251@from-internal:4] AGI("SIP/8211-00000002", "speech-recog.agi,ru-RU") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/speech-recog.agi
-- Playing 'beep.ulaw' (language 'en')
-- AGI Script speech-recog.agi completed, returning 0
-- Executing [8251@from-internal:5] GotoIf("SIP/8211-00000002", "1?if1:retry") in new stack
-- Goto (from-internal,8251,6)
-- Executing [8251@from-internal:6] GotoIf("SIP/8211-00000002", "0?al:retry") in new stack
-- Goto (from-internal,8251,8)
-- Executing [8251@from-internal:8] AGI("SIP/8211-00000002", "googletts.agi,"Please repeat?",ru") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/googletts.agi
-- Playing '/tmp/0c5de11c17dda57dabeaebe335110036' (escape_digits=) (sample_offset 0)
-- AGI Script googletts.agi completed, returning 0