
ROS Speech Recognition with Google Speech API
I have already covered using pocketsphinx for speech recognition in ROS. In this article I want to look at gspeech, a ROS package that uses the Google Speech API: wiki.ros.org/gspeech .
Getting a Google API Key
So, let's begin. First you need a Google API key. To get one, you need a Google account, and you also need to subscribe to the chromium-dev@chromium.org mailing list.
Now you can get your Google API key. Open the Google developer console at cloud.google.com/console and create a project. After creating the project, activate the Speech API in the APIs section, under the APIs & auth item in the left menu. Be careful: the Speech API may be missing from the list, as it was for me. If you don’t see it, check that you subscribed to chromium-dev and that you are logged in with the Google account whose email address you used for that subscription.
The key itself is created in the Credentials section under the same APIs & auth item: click the Create new Key button in the Public API access section.
Gspeech installation
Now all that remains is to install the gspeech package. Clone it from its GitHub page: github.com/kusha/gspeech . gspeech requires sox to work:
sudo apt-get install sox
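Before moving on, it can help to confirm that sox really ended up on the PATH. Here is a minimal sketch using only the standard library; it assumes Python 3 (for `shutil.which`), and the helper name `tool_available` is mine, not part of gspeech:

```python
import shutil


def tool_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


# Print a hint if sox is missing; stay silent if it is installed.
if not tool_available("sox"):
    print("sox not found; install it with: sudo apt-get install sox")
```

If the script prints nothing, sox is installed and visible to the shell that will run the node.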
You also need to paste your Google API key into the gspeech.py script, on this line:
api_key = "" # PASTE HERE YOUR GOOGLE API KEY
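If you would rather not keep the key inside a file that lives in a git clone, one option is to read it from the environment instead. This is only a sketch of an alternative, not something gspeech.py does out of the box; the variable name `GSPEECH_API_KEY` and the helper `load_api_key` are hypothetical:

```python
import os


def load_api_key(env=None, var_name="GSPEECH_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    GSPEECH_API_KEY is a made-up variable name for this example.
    """
    if env is None:
        env = os.environ
    # Treat a missing variable and a blank one the same way.
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError("set %s to your Google API key" % var_name)
    return key
```

With something like this in place, the line in the script could read `api_key = load_api_key()`, and the key would be exported in the shell before the node is started.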
Gspeech launch
Everything is ready and you can start the gspeech ROS node:
rosrun gspeech gspeech.py
Gspeech recognition
During recognition, gspeech publishes each recognized phrase on the /speech topic as a String message, and the “confidence” of the recognition on the /confidence topic as an Int8 message.
Recognition may take some time, because gspeech sends requests to Google’s servers. Nevertheless, its accuracy is rather high: gspeech recognizes phrases much better than the pocketsphinx package. In my tests, gspeech recognized phrases with a “confidence” of 70-80, and in some cases up to 94.
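Since the phrase and its confidence arrive on two separate topics, a listener has to pair them up before deciding whether to trust a phrase. Below is a minimal sketch of such a listener. It assumes that gspeech publishes exactly one /confidence message per /speech message (I make no assumption about which comes first), and it uses a threshold of 70 simply because that was the low end of my test results. The class `ConfidenceGate`, the function `run_node` and the node name `speech_listener` are my own names, not part of gspeech.

```python
import threading


class ConfidenceGate:
    """Pair each phrase with its confidence and keep only confident ones."""

    def __init__(self, threshold=70):
        self.threshold = threshold
        self._phrase = None
        self._confidence = None
        # rospy may run the two subscriber callbacks in different threads.
        self._lock = threading.Lock()

    def on_speech(self, text):
        with self._lock:
            self._phrase = text
            return self._flush()

    def on_confidence(self, value):
        with self._lock:
            self._confidence = value
            return self._flush()

    def _flush(self):
        # Wait until both halves of the pair have arrived.
        if self._phrase is None or self._confidence is None:
            return None
        phrase, confidence = self._phrase, self._confidence
        self._phrase = None
        self._confidence = None
        return phrase if confidence >= self.threshold else None


def run_node():
    """Subscribe to the gspeech topics; needs a machine with ROS installed."""
    import rospy
    from std_msgs.msg import Int8, String

    gate = ConfidenceGate(threshold=70)

    def report(phrase):
        if phrase is not None:
            rospy.loginfo("accepted: %s", phrase)

    rospy.init_node("speech_listener")  # hypothetical node name
    rospy.Subscriber("/speech", String,
                     lambda msg: report(gate.on_speech(msg.data)))
    rospy.Subscriber("/confidence", Int8,
                     lambda msg: report(gate.on_confidence(msg.data)))
    rospy.spin()
```

Calling `run_node()` while the gspeech node is running should log only the phrases that reach the threshold. The pairing logic is kept in a plain class so that it can be tried out without ROS.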
I wish you the best of luck in speech recognition with the Google Speech API.