musictheory March 13, 2017 at 12:14

“I hear voices,” or does Siri have a face?

We hear voices all the time: in the subway, in navigation apps, and on our smartphones. There is no doubt that the voices in the metro belong to real people, but the answer to the question of who voices virtual assistants and robots may soon become far less clear-cut.

On the other hand, voice actors need not fear for their jobs: even for the voice of the BB-8 robot from Star Wars, the filmmakers brought in Bill Hader, a longtime cast member of NBC's famous Saturday Night Live. More on all of this in today's article.

Photo: Vancouver Film School, CC BY

Siri

Almost everyone has heard the American Siri voice, but few people realize that it belongs to a real person, professional voice actress Susan Bennett. The actress herself, while working on the recording, had no idea that her voice would one day speak from every pocket: the recording was made for a text-to-speech company, and the voice only later ended up at Apple.

In 2005, Susan spent 20 hours a week in the recording studio, and they were a very stressful 20 hours: she often had to take breaks, drink a lot of water, and read absolute nonsense made up of all kinds of unrelated words. For recorded sounds to be combined later into whatever words are needed, and still sound natural, the speaker has to pronounce every possible combination of sounds in the language. Finalizing the voice in 2011 took another four months, although the “voice of Siri” worked only two hours a day.
Susan Bennett herself tells more about Siri and how the recording went in her TED talk.
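Why every sound combination has to be recorded can be illustrated with a toy sketch. It assumes the simplest possible scheme, in which speech is glued together from pairs of adjacent sounds (diphones). The phoneme spellings and function names below are invented for the example and do not describe how Siri's voice was actually built.

```python
# Toy model of unit-based (concatenative) synthesis.
# ASSUMPTION: the "phonemes" here are made-up letter codes, not real transcriptions.
PRONUNCIATIONS = {
    "siri": ["s", "i", "r", "i"],
    "iris": ["i", "r", "i", "s"],
    "is": ["i", "s"],
    "sir": ["s", "i", "r"],
}


def diphones(phonemes):
    """Return pairs of adjacent sounds, padded with silence '_' at both ends."""
    padded = ["_"] + list(phonemes) + ["_"]
    return list(zip(padded, padded[1:]))


def required_units(words):
    """Collect every sound pair that appears in the given recorded words."""
    units = set()
    for word in words:
        units.update(diphones(PRONUNCIATIONS[word]))
    return units


def can_synthesize(word, recorded_units):
    """A word can be assembled only if all of its sound pairs were recorded."""
    return all(unit in recorded_units for unit in diphones(PRONUNCIATIONS[word]))


# Suppose the actress read only these two "nonsense" words in the studio.
recorded = required_units(["siri", "iris"])

print(can_synthesize("is", recorded))   # every pair of "is" was covered
print(can_synthesize("sir", recorded))  # the pair ("r", "_") was never recorded
```

In this sketch a single missing pair is enough to make a word impossible to assemble, which is why the studio script has to cover the combinations systematically rather than read like meaningful text.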

The actress is worried about how unprotected voice actors are: their voices can be used for any purpose, and they receive no additional payment even for this kind of commercial use.

The British male version of Siri, known as Daniel, was voiced by television and radio presenter Jon Briggs, who also had no idea that his voice would be used for Siri until he saw an advertisement on TV. He, too, recorded his voice for ScanSoft in 2005; the company was later bought by Nuance, which worked with Apple on the development of Siri. Jon recorded 5,000 sentences in three weeks, but unlike Susan, he is quite satisfied with the fee he received.

Women vs Men

The actress who records the voice for Google Now, by contrast, prefers not to show her face. Footage of the recording process itself, however, has been published.

The actress notes that the process is quite demanding: she has to speak at the same pace and with the same timbre, and the voice must not change over the course of the recording, all while keeping the intonation correct. At Google, this is monitored by a team consisting of a linguist and a voice coach, which ultimately makes the speech sound more natural.

With Microsoft's Cortana, the situation is completely different: the image and name of the virtual assistant were borrowed from the Halo series of games. So the assistant was voiced by the same actress who voiced the character of the same name in the video games. Jen Taylor knew exactly what the recordings would be used for, made no attempt to hide, and even played Cortana in the 2012 mini-series Halo 4: Forward Unto Dawn.

Most virtual assistants speak in a female voice or have female names. Some even see this as a manifestation of digital sexism. However, research shows that women themselves more often choose a female voice. People find that it sounds friendlier, while a male voice is perceived as more aggressive.

This, of course, is not always the case; intonation and timbre play a large role. How differently male voices can be perceived is clear from the example of Mark Zuckerberg's home virtual assistant. The assistant is called Jarvis, and with Morgan Freeman's voice it comes across as a very courteous, well-mannered system.

We ride, ride, ride

Even more people encounter such voices when using navigation apps. The male voice of Yandex.Navigator was recorded by a professional announcer, while the female version was recorded by a company employee. The recording took only 3 hours, and the text fit on 4 sheets of paper, which is very little compared with the voice work for virtual assistants.

The sentences the navigator speaks are assembled from separate words, but whole phrases had to be spoken during recording so that the result sounds more natural. For the Olympics, sports commentator Vasily Utkin was invited to voice the navigator; he spent several hours in the studio and recorded 160 phrases. Only 120 are used in the navigator, but the creators promised to rotate some of them to add variety to the trip. Vasily even came up with some of the phrases himself.

Subway announcements have their own peculiarities. For example, the first recordings of the current metro voices were made more than 20 years ago, which means they were recorded on reel-to-reel tape. The actors therefore had no room for error: if a mistake was made, everything had to be recorded again from the start. Even now, if new information has to be added to an announcement, the announcements for the entire line have to be re-recorded.

So it is not only Siri that has a face; the Moscow metro has one too. In fact, it has three: the actors and radio and television presenters Yulia Romanova-Kutina, Sergey Kulikovskikh and Alexei Rossoshansky. For various holidays, celebrities or children are invited to record the announcements. Ordinary people can also influence what is said in the subway. For example, after activists objected to the phrase “Please vacate the cars”, it was replaced with “Please leave the cars”.

In the near future, however, speech synthesis will work very differently thanks to a development from Google. WaveNet does not assemble speech from fragments of recorded human voice: the program generates the sound wave itself, having learned its structure with convolutional neural networks.
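The convolutional idea can be sketched in a few lines. According to its public description, WaveNet stacks causal convolutions with growing dilation, so that each output sample depends only on past samples while the span of "visible" history grows quickly. The code below is a toy in plain Python with made-up weights; it is not Google's implementation and omits everything that makes the real network work (training, nonlinearities, many channels).

```python
def causal_dilated_conv(signal, weights, dilation):
    """y[t] = sum over k of weights[k] * signal[t - k * dilation].

    Samples before t = 0 are treated as zero, so no output ever
    looks at the future (that is what "causal" means here).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past-and-present samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# ASSUMPTION: illustrative weights and dilations, chosen only for readability.
dilations = [1, 2, 4, 8]
layer_out = [1.0] + [0.0] * 15  # a single impulse at t = 0
for d in dilations:
    layer_out = causal_dilated_conv(layer_out, [1.0, 1.0], d)

print(layer_out)                      # the impulse has spread over 16 samples
print(receptive_field(2, dilations))  # 16
```

Four small layers already cover 16 samples, and each extra layer with doubled dilation doubles that reach, which is how such a network can take a long stretch of the waveform into account without enormous filters.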

Besides the human voice, it can even imitate music. So far the technology is still quite expensive, since training the networks and processing the recordings takes a lot of resources and time, but already 50% of the people in the control group took WaveNet's speech for human. In the future it will be possible to imitate the voice and intonation of any person, although for now training still requires voice recordings of real people.
