descine March 9, 2017 at 18:57

Speech AI with Python & Google API

Good afternoon!

Recently I came up with the idea of making a Russian-speaking "talker". I had a simple scheme in mind:

1) Recognize speech from the microphone.
2) Come up with a more or less sensible answer.
At this stage you can do a lot of interesting things,
for example, control something physical (or not so physical).
3) Convert this answer to speech and play it back.
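The three steps above form a simple loop. Here is a minimal sketch that passes each step in as a function, which makes it easy to swap services or to test the loop without a microphone. The names `run_dialog`, `listen`, `answer` and `speak` are hypothetical and are not taken from the project code.

```python
def run_dialog(listen, answer, speak, max_turns=None):
    """Repeat the recognize -> answer -> speak cycle.

    listen() returns the recognized phrase, or None if nothing was understood;
    answer(text) returns the reply; speak(text) voices it.
    Runs forever when max_turns is None.
    """
    turns = 0
    while max_turns is None or turns < max_turns:
        phrase = listen()           # step 1: speech -> text
        if phrase:                  # skip silence and unintelligible audio
            reply = answer(phrase)  # step 2: text -> reply
            speak(reply)            # step 3: reply -> speech
        turns += 1


if __name__ == "__main__":
    # Console stand-ins for the real recognizer, bot and synthesizer.
    run_dialog(input, lambda text: "You said: " + text, print, max_turns=2)
```

Because the loop only sees plain functions, the speech recognizer, the chatbot, or the synthesizer can each be replaced without touching the others.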

The most interesting part is that I found Python libraries for every one of these steps, and those are what I used.

The result is a chain of components that is almost independent of the chosen spoken language.

Speech recognition

SpeechRecognition

This library is a wrapper around many popular speech recognition services and libraries.
Of all the services the library supports, Google Speech Recognition was the first one that worked for me, so I kept using it.
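Here is a minimal sketch of step 1 with this library. It assumes a recent SpeechRecognition release with PyAudio installed and an Internet connection. `recognize_phrase` is a hypothetical helper name, and `"ru-RU"` is the language code for Russian.

```python
def recognize_phrase(language="ru-RU"):
    """Record one phrase from the default microphone and return it as text.

    Returns None when Google could not make out any words.
    """
    # Imported here so this module loads even without the package installed.
    # pip install SpeechRecognition (sr.Microphone also needs PyAudio)
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold to the background noise (about 1 s).
        recognizer.adjust_for_ambient_noise(source)
        # Record until the speaker pauses.
        audio = recognizer.listen(source)
    try:
        # Sends the audio to Google's Web Speech API; needs Internet access.
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None
    except sr.RequestError as err:
        raise RuntimeError(
            "Google Speech Recognition request failed: {}".format(err)
        ) from err


if __name__ == "__main__":
    print(recognize_phrase())
```

Switching to another service the library wraps mostly comes down to calling a different `recognize_*` method on the same `audio` object.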

Speech processing

ChatterBot

The library uses machine learning methods and is trained on datasets of dialogs.

The learning process in the chatterbot library

Files in a very simple format can serve as training data.
Essentially, each one is a set of dialogs of the form:

- Question
- Answer
- ...
- Answer
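The project's own training code is not reproduced here. The following sketch shows one way to feed dialogs in this format to the bot. It assumes one utterance per line prefixed with "- " and a blank line between dialogs, and it uses ChatterBot 1.0.x's `ListTrainer`; releases from around the time of this article used `chatbot.set_trainer(ListTrainer)` followed by `chatbot.train(...)` instead. `parse_dialogs` and `train_bot` are hypothetical helpers.

```python
def parse_dialogs(text):
    """Split "- utterance" lines into dialogs separated by blank lines.

    Lines that are neither "- ..." utterances nor blank are ignored.
    """
    dialogs, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            current.append(line[2:].strip())
        elif not line and current:
            # A blank line closes the current dialog.
            dialogs.append(current)
            current = []
    if current:
        dialogs.append(current)
    return dialogs


def train_bot(dialogs, name="talker"):
    """Train a ChatterBot instance on parsed dialogs (ChatterBot 1.0.x API)."""
    from chatterbot import ChatBot  # lazy import: pip install chatterbot
    from chatterbot.trainers import ListTrainer

    bot = ChatBot(name)
    trainer = ListTrainer(bot)
    for dialog in dialogs:
        # ListTrainer stores each utterance as a response to the previous one.
        trainer.train(dialog)
    return bot
```

For example, with a hypothetical file `dialogs.txt`: `bot = train_bot(parse_dialogs(open("dialogs.txt", encoding="utf-8").read()))`, then `print(bot.get_response("Hi"))`.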

For English, there is a good set of ready-made trainer classes: one takes dialogs from the Ubuntu Dialog Corpus, another from Twitter.

Unfortunately, I did not find a Russian-language alternative to the Ubuntu Dialog Corpus of comparable size, although TwitterTrainer should work for Russian too.

As an experiment, I tried training on the dialogs from the first volume of War and Peace.

The result was funny but not very plausible, because the dialogs there are often addressed to specific characters of the novel.

Since it is hard to give the bot an interesting personality without a large amount of data, I am still looking for a good source of dialogs.

ChatterBot also provides a set of logic adapters ("LogicAdapter"). With them you can, for example, filter responses, teach the bot to do arithmetic, or have it tell the current time.
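Here is a sketch of enabling some of the built-in adapters. It assumes ChatterBot 1.0.x, where adapters are passed to `ChatBot` as import-path strings through `logic_adapters`. `make_bot` is a hypothetical helper. Out of the box, `TimeLogicAdapter` recognizes English questions about the time, so a Russian bot would likely need its own adapter for that.

```python
# Built-in ChatterBot logic adapters, referenced by import path. Each adapter
# that can process the input proposes a response; roughly, the one with the
# highest confidence is returned.
LOGIC_ADAPTERS = [
    "chatterbot.logic.MathematicalEvaluation",  # answers arithmetic questions
    "chatterbot.logic.TimeLogicAdapter",        # answers questions about the time
    "chatterbot.logic.BestMatch",               # falls back to the closest learned reply
]


def make_bot(name="talker"):
    """Create a bot that can count and tell the time as well as chat."""
    from chatterbot import ChatBot  # lazy import: pip install chatterbot

    return ChatBot(name, logic_adapters=LOGIC_ADAPTERS)
```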

The library is quite flexible: it lets you write your own trainer classes and logic adapters.
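To illustrate a custom logic module, here is a sketch of an adapter that recognizes a light-switch command, in the spirit of controlling "something physical". The command table, `match_command`, `build_adapter_class` and `LightCommandAdapter` are hypothetical. The `LogicAdapter` base class, its `can_process`/`process` methods and `Statement` come from ChatterBot 1.0.x; older releases used slightly different signatures. The English patterns are only for readability, and a Russian bot would match Russian phrases instead.

```python
import re

# Hypothetical commands: (regex, reply). A real bot would also act on them.
COMMANDS = [
    (r"\b(turn|switch) on the light\b", "Turning the light on."),
    (r"\b(turn|switch) off the light\b", "Turning the light off."),
]


def match_command(text):
    """Return the reply for the first matching command, or None."""
    for pattern, reply in COMMANDS:
        if re.search(pattern, text, re.IGNORECASE):
            return reply
    return None


def build_adapter_class():
    """Create the adapter class lazily, so match_command works without ChatterBot."""
    from chatterbot.conversation import Statement
    from chatterbot.logic import LogicAdapter

    class LightCommandAdapter(LogicAdapter):
        def can_process(self, statement):
            return match_command(statement.text) is not None

        def process(self, input_statement, additional_response_selection_parameters=None):
            response = Statement(text=match_command(input_statement.text))
            response.confidence = 1.0  # a certain match should beat BestMatch
            return response

    return LightCommandAdapter
```

ChatterBot loads adapters by import path, so one option is to put `LightCommandAdapter = build_adapter_class()` in a module (say, a hypothetical `my_adapters.py`) and list `"my_adapters.LightCommandAdapter"` in `logic_adapters` before `"chatterbot.logic.BestMatch"`.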

Speech synthesis and reproduction

Google Text to Speech

This library converts a string into an mp3 file with the spoken text. Since Google is behind it, there are many languages to choose from, including Russian.
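Here is a sketch of step 3 that synthesizes the answer and plays it. gTTS only produces the mp3 file, so playback here assumes an external command-line player (`mpg123` by default). `say` is a hypothetical helper, and the file name is arbitrary.

```python
import subprocess


def say(text, lang="ru", path="answer.mp3", player=("mpg123", "-q")):
    """Synthesize text with gTTS, save it as mp3 and play it back.

    Assumes an external command-line mp3 player (mpg123 by default).
    """
    from gtts import gTTS  # lazy import: pip install gTTS; needs Internet access

    gTTS(text=text, lang=lang).save(path)  # text -> mp3 file
    subprocess.run(list(player) + [path], check=True)  # play the file
```

For example, `say("Hello!", lang="en")` voices an English phrase. Any other player that accepts a file path can be passed as `player`.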

First successes

Project code

Available at: GitHub

How to install and run?

First of all, I recommend creating a separate virtual environment for Python,
for example with conda:

conda create --name speech_ai
source activate speech_ai
conda install python=3.5

Python 3 is a good fit for experimenting with this set of libraries, since, unlike Python 2, it has no hassle with non-ASCII characters.

Install the packages following the instructions on their websites.

Also, when installing SpeechRecognition, its PyAudio dependency sometimes needs a helping hand:

sudo apt-get install python-pyaudio python3-pyaudio 
pip3 install pyaudio
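After PyAudio is installed, a quick way to check it is to list the audio devices it exposes. `sr.Microphone.list_microphone_names()` and the `device_index` argument belong to SpeechRecognition. The `find_microphone` helper and the "USB" search string are hypothetical.

```python
def find_microphone(name_part, names):
    """Return the index of the first device whose name contains name_part."""
    for index, name in enumerate(names):
        if name_part.lower() in name.lower():
            return index
    return None


if __name__ == "__main__":
    import speech_recognition as sr

    # Listing devices only works once PyAudio is installed correctly.
    names = sr.Microphone.list_microphone_names()
    for index, name in enumerate(names):
        print(index, name)
    # "USB" is just an example substring; pick one that matches your device.
    index = find_microphone("USB", names)
    microphone = sr.Microphone(device_index=index)  # None means the default device
```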

ChatterBot recommends MongoDB for production use.
By default, a JSON file is used as the data store, which makes training many times slower even on medium-sized datasets.
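Switching the storage backend is a constructor argument. This sketch assumes ChatterBot 1.0.x, `pymongo`, and a MongoDB server on localhost. Releases from the time of this article may expect different parameter names, and newer releases default to an SQLite database rather than a JSON file. `make_mongo_bot` and the database name are hypothetical.

```python
# Assumed local MongoDB server; the database name is arbitrary.
MONGO_URI = "mongodb://localhost:27017/chatterbot-database"


def make_mongo_bot(name="talker", uri=MONGO_URI):
    """Create a bot that keeps its statements in MongoDB (ChatterBot 1.0.x API)."""
    from chatterbot import ChatBot  # lazy import: pip install chatterbot pymongo

    return ChatBot(
        name,
        storage_adapter="chatterbot.storage.MongoDatabaseAdapter",
        database_uri=uri,
    )
```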

What's next?

Some ideas:

Diversify the bot's logic, for example by adding an adapter that sends search queries to Google
Use computer vision, for example to voice the names of objects the bot sees or of people passing by
Give the bot emotions using a state machine
Try training the bot on the Ubuntu Dialog Corpus
Use something similar in robotics (for a smart home)
