Speech AI with Python & Google API





    Good afternoon!


    Recently I had the idea of building a "talker" in Russian. I had a simple scheme in mind:


    1) Recognize speech from the microphone.
    2) Come up with a more or less reasonable answer.
    A lot of interesting things can be done at this step,
    for example controlling something physical (or not so physical).
    3) Convert that answer to speech and play it back.


    Best of all, Python libraries turned out to exist for each of these steps, and I used them.


    The result is a stack that is almost independent of the chosen spoken language.
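
    The three steps above can be sketched as a single loop. This is only a minimal illustration, not the project's actual code: the function name run_dialog, its parameters and the stop phrase are all hypothetical, and each step is passed in as a plain callable so the libraries described below can be plugged in later.

```python
from typing import Callable, Optional


def run_dialog(
    listen: Callable[[], Optional[str]],
    reply: Callable[[str], str],
    speak: Callable[[str], None],
    stop_phrase: str = "stop",  # hypothetical way to end the conversation
) -> int:
    """Run the listen -> answer -> speak loop until the stop phrase is heard.

    Returns the number of answers that were spoken.
    """
    answered = 0
    while True:
        text = listen()  # step 1: speech -> text (None if nothing was understood)
        if text is None:
            continue
        if text.strip().lower() == stop_phrase:
            break
        answer = reply(text)  # step 2: text -> answer
        speak(answer)  # step 3: answer -> speech
        answered += 1
    return answered


if __name__ == "__main__":
    # Stand-ins for the microphone, the bot and the speaker.
    phrases = iter(["hello", None, "stop"])
    run_dialog(lambda: next(phrases), lambda t: "You said: " + t, print)
```

    Because the steps are injected, the recognizer, the bot and the synthesizer can each be swapped or tested separately.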


    Speech recognition


    SpeechRecognition


    This library is a wrapper around many popular speech recognition services and libraries.
    Of all the services the library supports, Google Speech Recognition was the first one I got working, so I kept using it.
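
    A minimal sketch of recognizing one phrase through the Google backend might look like this. The function name and the "ru-RU" language code are my assumptions. The import sits inside the function only so that the file loads on a machine without PyAudio.

```python
from typing import Optional


def recognize_from_microphone(language: str = "ru-RU") -> Optional[str]:
    """Record one phrase from the default microphone and return its text.

    Returns None when the speech was unintelligible or the service failed.
    """
    # Imported lazily: the package needs PyAudio for microphone access.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
        audio = recognizer.listen(source)  # blocks until the phrase ends

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError:
        return None  # the recognition service could not be reached
```

    Changing the language argument is the only thing that ties this step to a particular spoken language.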


    Speech processing


    Chatterbot


    The library uses machine learning methods. Training takes place on data sets in dialogue format.


    [Figure: the learning process in the chatterbot library]


    Files in a simple format like this can serve as training data.
    In essence, they are a set of dialogs of the form:


    - Question
    - Answer
    - ...
    - Answer
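
    To make the format concrete, here is a small sketch of a parser for such files. It assumes that every reply starts with a dash and that dialogs are separated by blank lines. The separator is my assumption, since the original files are not shown, and parse_dialogs is a hypothetical helper name.

```python
from typing import Iterable, List


def parse_dialogs(lines: Iterable[str]) -> List[List[str]]:
    """Group dash-prefixed lines into dialogs separated by blank lines."""
    dialogs: List[List[str]] = []
    current: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current dialog (assumed separator).
            if current:
                dialogs.append(current)
                current = []
        elif line.startswith("-"):
            utterance = line[1:].strip()
            if utterance:
                current.append(utterance)
        # Lines without the dash marker are ignored in this sketch.
    if current:
        dialogs.append(current)
    return dialogs


if __name__ == "__main__":
    sample = ["- Hi", "- Hello", "", "- How are you?", "- Fine"]
    print(parse_dialogs(sample))
```

    Each inner list is an ordered sequence of replies, which is the shape a list-based trainer would usually expect.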

    For English, there is a good set of trainer classes: one takes dialogs from the Ubuntu Dialogue Corpus, another from Twitter.


    Unfortunately, I did not find a Russian-language alternative to the Ubuntu Dialogue Corpus of comparable size, although the same TwitterTrainer should work.


    As an experiment, I tried training on dialogs from the first volume of War and Peace.


    The result was funny but not very plausible, because the dialogues there are often addressed to specific characters of the novel.


    Since it is hard to get an interesting personality out of the bot without a large amount of data, the search for a good dialog base continues.


    The chatterbot library also provides a set of LogicAdapter classes, which let you, for example, filter the answer or teach the bot to do arithmetic or tell the current time.


    The library is quite flexible: it lets you write your own training classes and logic modules.
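
    As an illustration of the logic adapters, the sketch below enables the arithmetic and current-time adapters alongside the ordinary best-match lookup. The dotted adapter paths are the ones I know from chatterbot's documentation, but they have moved between releases, so treat them as assumptions to check against the installed version. The helper names and the bot name "Talker" are hypothetical.

```python
def build_bot_options(name: str = "Talker") -> dict:
    """Return keyword arguments for a bot that can also count and tell the time."""
    return {
        "name": name,
        "logic_adapters": [
            "chatterbot.logic.BestMatch",  # closest known reply
            "chatterbot.logic.MathematicalEvaluation",  # simple arithmetic
            "chatterbot.logic.TimeLogicAdapter",  # current time
        ],
    }


def make_bot(name: str = "Talker"):
    """Create the bot; chatterbot is imported lazily so this file loads without it."""
    from chatterbot import ChatBot

    return ChatBot(**build_bot_options(name))
```

    With such a bot, make_bot().get_response("What time is it?") would be routed to whichever adapter reports the highest confidence.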


    Speech synthesis and reproduction


    Google Text to Speech


    This library converts a string into an mp3 file with speech. Since it relies on Google, there are many languages to choose from, including Russian.
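
    A minimal sketch of this step, assuming the library in question is the gTTS package. The file name, the helper names and the use of the external mpg123 player for playback are my assumptions, and any mp3 player would do.

```python
import subprocess


def synthesize_to_mp3(text: str, path: str = "answer.mp3", lang: str = "ru") -> str:
    """Convert text to speech, save it as an mp3 file and return the file path."""
    # Imported lazily so this file loads even when gTTS is not installed.
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(path)
    return path


def play_mp3(path: str, player: str = "mpg123") -> None:
    """Play the file with an external command-line player (assumed to be installed)."""
    subprocess.run([player, path], check=True)
```

    Calling play_mp3(synthesize_to_mp3("Привет!")) would then speak the answer aloud, provided the network and the player are available.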




    First successes



    Project code


    Available on GitHub.


    How to install and run?

    First of all, I recommend creating a separate virtual environment for Python,
    for example with conda.


    conda create --name speech_ai
    source activate speech_ai
    conda install python=3.5

    The following is suitable for experimenting with this set of libraries:


    • Python 3 (it avoids the non-ASCII character hassles of Python 2)

    Install the packages for the libraries above following the instructions on their sites.



    Also, when installing SpeechRecognition, you sometimes need to install one dependency (PyAudio) by hand:


    sudo apt-get install python-pyaudio python3-pyaudio 
    pip3 install pyaudio

    chatterbot recommends MongoDB for production use.
    By default a JSON file is used as the data store, which slows training down several-fold on medium-sized samples.
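
    Switching the store is a matter of bot configuration. The sketch below only builds the keyword arguments. The adapter path and the keyword names follow chatterbot's documentation as I remember it and differ between releases, so they are assumptions to verify. The database name "speech_ai" and the local URI are hypothetical.

```python
def mongo_storage_options(
    database: str = "speech_ai",
    host_uri: str = "mongodb://localhost:27017/",
) -> dict:
    """Return keyword arguments that point a bot at a MongoDB store."""
    return {
        # Assumed adapter path; older releases used a different module layout.
        "storage_adapter": "chatterbot.storage.MongoDatabaseAdapter",
        # Some releases read the database name from the URI, others from
        # a separate "database" argument, so both are supplied here.
        "database_uri": host_uri + database,
        "database": database,
    }
```

    These options would be passed to the bot constructor together with its name, for example ChatBot("Talker", **mongo_storage_options()).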


    What's next?


    Some ideas:


    • Diversify the bot's logic, for example by adding an adapter that sends search queries to Google
    • Add computer vision, for example to announce the objects the bot sees or the names of people passing by
    • Give the bot emotions using a state machine
    • Try training the bot on the Ubuntu Dialogue Corpus
    • Use something similar in robotics (or for a smart home)
