AI translates brain activity into speech
For many paralyzed people who cannot speak, the signals of what they want to say are locked inside their brains, and until now no one has been able to decipher them. Recently, however, three teams of researchers made progress in translating data from electrodes surgically placed on the brain into computer-synthesized speech.
Using models built on neural networks, they reconstructed words and even whole sentences that, in some cases, were quite intelligible to an ordinary human listener.
None of the attempts described in the bioRxiv preprints succeeded in recreating speech from thought alone. Instead, the researchers monitored activity in various brain areas while patients read aloud, read silently while moving their lips (internally “pronouncing” the text), or listened to recordings.
“Showing that the reconstructed speech is completely intelligible is really exciting,” says Stephanie Martin, a neural engineer at the University of Geneva in Switzerland who was involved in the work on this project.
People who have lost the ability to speak after a stroke or illness can use their eyes or other small movements to control a cursor or select letters on a screen (the cosmologist Stephen Hawking tensed his cheek to activate a switch mounted on his glasses). But if a brain-computer interface could reproduce patients’ speech directly, it would expand their capabilities enormously: giving them control over tone and letting them take part in fast-moving conversations.
“We are trying to work out the pattern of … neurons that activate at different points in time, and infer what the speech sounds like,” says Nima Mesgarani, an engineer at Columbia University. “The mapping from one to the other is not that straightforward.”
The way these neural signals translate into speech varies from person to person, so computer models must be trained separately for each individual. And the models do best when trained on extremely precise data, which requires opening the skull.
Researchers get that opportunity only in rare cases. One is when a patient has a brain tumor removed: surgeons use sensors that read electrical signals directly from the brain to locate and avoid speech and motor areas. Another is when patients with epilepsy have electrodes implanted for several days to localize the source of their seizures before surgery.
“We have a maximum of 20, sometimes 30, minutes to collect data,” says Stephanie Martin. “We are very, very limited in time.”
The best results came from teams that “fed” recordings of brain activity to artificial neural networks. As the output (that is, as the training labels), the networks were given the speech that the patient had either spoken aloud or heard.
Nima Mesgarani’s team relied on data from five patients with epilepsy. Their neural networks were trained on recordings from the auditory cortex (which is active both during one’s own speech and while listening to someone else’s), made while the patients were played recordings of stories and of a voice naming the digits 0 through 9. A computer model then synthesized speech pronouncing the same digit sequences, and a control group of listeners was able to recognize the digits 75% of the time.
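The final step, turning predicted audio features back into sound, is a vocoder. Mesgarani’s team used a neural vocoder; as a generic stand-in, the sketch below recovers a waveform from a magnitude spectrogram with the classic Griffin-Lim phase-recovery loop. All parameters and the 440 Hz test tone are arbitrary choices for the demo, not details from the study.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=256):
    """Recover a waveform whose spectrogram magnitude matches `mag`
    by alternating ISTFT/STFT projections from a random phase start."""
    rng = np.random.default_rng(0)
    spec = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(spec, nperseg=nperseg)
        _, _, s = stft(x, nperseg=nperseg)
        spec = mag * np.exp(1j * np.angle(s))  # keep the target magnitude
    _, x = istft(spec, nperseg=nperseg)
    return x

# Demo: pretend the decoder predicted the spectrogram of a 440 Hz tone.
fs = 16000
t = np.arange(fs) / fs
_, _, target = stft(np.sin(2 * np.pi * 440 * t), nperseg=256)
waveform = griffin_lim(np.abs(target))
print(waveform.shape)
```

Neural vocoders replace this iterative loop with a learned network, which is what makes the synthesized digits natural enough for listeners to recognize.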
Computer-synthesized speech, generated from a patient’s brain activity recorded while listening to spoken digits
Another team, led by Tanja Schultz of the University of Bremen in Germany, used data from six people undergoing surgery to remove brain tumors. A microphone recorded their voices as they read monosyllabic words aloud, while electrodes placed on their brains captured the activity of the planning and motor areas that send commands to the vocal tract to articulate words.
Engineers Miguel Angrick and Christian Herff, of Maastricht University, trained a neural network that matched the electrode readings to the audio recordings, then had it reconstruct words and phrases from data sets the model had never seen. On those data, about 40% of the synthesized speech proved intelligible to human listeners.
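Evaluation on unseen data can be sketched in the same spirit: fit a decoder on most of a paired recording and score it on held-out frames, using per-frame correlation as a crude objective proxy before any listening test. Everything here (synthetic data, linear model, split sizes) is an illustrative assumption, not the team’s pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic paired recording: brain-activity frames and the matching
# audio-feature frames captured while words were read aloud.
neural = rng.normal(size=(1200, 48))
true_map = rng.normal(size=(48, 24))
audio = neural @ true_map + 0.2 * rng.normal(size=(1200, 24))

# Hold out the last 200 frames: the model is fit without ever seeing
# them, mirroring reconstruction from previously unseen data sets.
train, test = slice(0, 1000), slice(1000, None)
decoder, *_ = np.linalg.lstsq(neural[train], audio[train], rcond=None)
predicted = neural[test] @ decoder

# Mean per-frame correlation between predicted and actual audio
# features; listening tests then measure real intelligibility.
corrs = [np.corrcoef(predicted[i], audio[test][i])[0, 1]
         for i in range(predicted.shape[0])]
print(round(float(np.mean(corrs)), 2))
```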
Computer-synthesized speech, generated from electrode recordings
Finally, neurosurgeon Edward Chang and his team at the University of California, San Francisco, reconstructed entire sentences from the activity of speech areas, read out by electrodes in six epilepsy patients while they read aloud. In an online test, 166 people listened to sentences generated by the computer model and had to pick, from 10 written options, the one they thought had been read. Some sentences were correctly identified more than 80% of the time. The researchers then pushed further, making the model recreate speech from brain activity recorded while a person read words silently, moving his lips as if “inwardly pronouncing” them.
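A 10-way forced-choice test like this one can be scored against chance with a simple binomial check. The counts below are hypothetical, chosen only to match the reported ~80% identification rate; the article does not give per-trial numbers.

```python
from scipy.stats import binomtest

n_trials = 166     # hypothetical: one trial per listener
n_correct = 133    # hypothetical: ~80% picked the right sentence
chance = 1 / 10    # ten candidate transcripts per trial

accuracy = n_correct / n_trials
result = binomtest(n_correct, n_trials, chance, alternative="greater")
print(f"accuracy={accuracy:.2f}, p={result.pvalue:.3g}")
```

The point of the test against a 10% chance level is that even modest identification rates would be meaningful; 80% is far beyond what guessing could produce.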
“This is a very important result,” says Christian Herff. “We are one step closer to a speech prosthesis.”
“What we are really waiting to see, though, is how these methods perform when a patient cannot speak at all,” responds Stephanie Riès, a neuroscientist at the University of San Diego in California. “The brain signals produced when a person reads silently or listens to others differ from those produced during reading aloud or live conversation. Without external sound to compare brain activity against, it will be very hard for computer models to predict where inner speech begins and where it ends.”
“Decoding imagined speech will require a huge leap forward,” says Gerwin Schalk, a neuroscientist at the National Center for Adaptive Neurotechnology at the New York State Department of Health. “And right now it is completely unclear how to achieve that.”
One approach, Herff suggests, could be feedback: a computer model reproduces speech in real time as the patient mentally pronounces words, and the patient hears the result. With enough training for both the patient and the AI, the brain and the computer might meet somewhere in the middle.