A chat bot gains hearing, or the sufferings of an amateur

    Not long ago I presented a syntactic chat bot named Vanya the Reasonable ("Creating AI using the 'cool kuzdra' method. An intellectual odyssey"). The obvious next stage, one I wanted to go through like other creators of artificial minds, was to give my brainchild a voice. What could be easier, it would seem?

    I had to suffer, however. Some problems I could not solve, certainly because of my amateurism. And yet I doubt that professionals find such side problems interesting to solve either. Nobody finds them interesting, and neither did I: I was hoping to bolt on the sound quickly and move on to the next idea ...

    But first things first.

    (I am writing in the hope that my struggles with voicing the bot will help other amateurs like me. The old hands have no need of this.)

    It is clear that the task of voicing the bot splits into two independent parts:

    1. Text-to-speech synthesis
    2. Speech recognition

    I start with the first point as the easier one. I immediately stumble on some code for beginners, just a few lines.

    Preset voice synthesis
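    using System.Speech.Synthesis;

    public static void getSpeech(string text)
    {
        SpeechSynthesizer speaker = new SpeechSynthesizer();
        string selectedVoice = Properties.Settings.Default.Voice; // read from settings, unused in this listing
        speaker.SelectVoice("Microsoft Irina Desktop"); // the preinstalled Russian voice
        speaker.Rate = 1;     // speaking rate, from -10 to 10
        speaker.Volume = 100; // volume, from 0 to 100
        speaker.Speak(text);  // assumed: the original listing breaks off before this call
    }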

    I paste it into my source and, imagine, the machine speaks! I am a little crazy with joy. That simple?!

    Only a trifle remains: to attach a male voice. Unfortunately, Windows comes with a single preinstalled Russian voice, and it is female: “Microsoft Irina Desktop”. My chat bot is a boy, not a girl, and I do not plan a sex change operation for him.

    I google again and after a while am convinced that male Russian voices are few. Free voices, that is, because paid services are beyond my meagre finances. But free male voices do exist, for example the voice “Aleksandr” from the Russian RHVoice library. Well, let it be Aleksandr.

    Unfortunately, the installation is (for me) rather complicated. But there are ready-made builds. I download one, install it, go into the Windows settings (Speech Recognition / Text to Speech) and, lo and behold, find the voice “Aleksandr” next to “Microsoft Irina Desktop”. With a sinking heart, I press play ...

    In Windows, everything works!

    I replace "Microsoft Irina Desktop" with "Aleksandr" in the source code and ... nothing works any more! Sad, but not fatal. I will fix it in a moment.

    I study the RHVoice project, in particular the description of the configuration file, and experiment this way and that ... The result is the same: instead of speech, Aleksandr produces an indistinct growl or nothing at all, while Irina goes on talking like a TV announcer.

    For a couple of days I keep hoping and fumbling, but then I give up. Yes, my hands are crooked. I do not know why Aleksandr refuses to talk, and there is no answer on the forums.

    All right, I study the other free voices; fortunately there are no more than ten of them.

    Then it dawns on me that if I want users of Vanya the Reasonable to hear the same voice I hear, I will have to bundle a voice installer into the package. That is beyond my abilities, and I am reluctant to do it anyway, so the first point, "Text-to-speech synthesis", ends in shameful capitulation.

    I make a decision of principle:

    1. To hell with it all! Let the chat bot's users install whatever voices they want and choose from a list. Offering a list of the installed voices is a feasible task.
    2. I voice Vanya the Reasonable with a female voice, because Vanya is young and his voice has not broken yet. "And not at all because my hands grow from the wrong place," I tell myself by way of psychotherapy.
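    The "choose from a list" decision can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the class name VoiceList and both method names are made up for the example, while GetInstalledVoices() and VoiceInfo.Name are standard members of System.Speech.Synthesis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Speech.Synthesis;

// Hypothetical helper, named for this example only.
public static class VoiceList
{
    // Returns the names of all enabled voices installed on this machine.
    public static List<string> GetVoiceNames()
    {
        using (var speaker = new SpeechSynthesizer())
        {
            return speaker.GetInstalledVoices()
                          .Where(v => v.Enabled)
                          .Select(v => v.VoiceInfo.Name)
                          .ToList();
        }
    }

    // Picks the preferred voice if it is installed, otherwise the first
    // available one. Returns null when no voices are installed at all.
    public static string ChooseVoice(IList<string> installed, string preferred)
    {
        if (installed.Contains(preferred)) return preferred;
        return installed.Count > 0 ? installed[0] : null;
    }
}
```

    The names returned by GetVoiceNames() can be shown in a combo box, and the chosen one passed to SelectVoice. Falling back to the first installed voice keeps the bot talking even when the saved voice is missing on the user's machine.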

    With a clear conscience, I move on to point number 2: speech recognition.

    The second point is decisive. Who needs a chat bot that can speak its own phrases aloud but does not understand the voice of the person it is talking to? If this fails, the whole voice idea collapses.

    I google again, this time frantically, with my last breath.

    What do I find? The options are mostly paid. Free ones exist, but for the Russian language there are only a handful.

    SpeechKit from Yandex looks like the simplest option on the web, but I save it for later, in case the more complex options do not work. I would prefer recognition that works offline.

    Here is a completely free solution, CMUSphinx. I study the reviews:

    • first, there are no joyful cries of "Brothers, everything works!";
    • second, the installation instructions are completely abstruse to me. It seems that after installation the library also has to be trained!

    That one is out.

    Next: Microsoft Speech Platform, free.

    I google and find a link with an accessible description plus the source code of an example. I download the source and compile it. I say “One, two,” and the program displays the recognized text. It wo-o-orks! ..

    I am a little troubled that text is not recognized on its own: the phrases must first be added to a dictionary. But that is not a problem: instead of “one, two” I attach a large file from a spelling dictionary.

    I transfer the code from the example into my own source and try to get the same effect ... It does not compile; it falls over with an error.

    Then I read the comments with my left eye and find out that the solution is suitable for recognizing commands but cannot handle continuous text. I check the original example. Aha: “one, two” is recognized, but “one, two, three” is not recognized in full: it does not hear the “three”. With my right eye I spot something like “Continuous text is recognized for a fee” in the comments, and the Microsoft Speech Platform ceases to exist for me.

    I have heard that Google provides its recognition free for a year; that needs checking.

    I check. Not any more, as far as I can judge.

    I do not rule out that I am wrong about Google, but do not blame me: I am sharing my personal experience with fellow beginners.

    I go to surrender to Yandex. Here there are plenty of exclamations of “Brothers, everything works!”, and individuals can get the service for free on individual request: I saw the announcement with my own eyes. I can connect to SpeechKit through its API, and instructions are available.

    I come to surrender, and what do I see? The company has just launched Yandex.Cloud, to which it has moved its speech technology service. I am not proud, I will register in the Cloud: probably everything there is the same as before ...

    And here a terrible letdown awaits me:

    1. First, there is no longer any mention of free use of the service. True, they give a grant that pays for the service for a while. Okay, I carry on registering ...
    2. And what is this?! To work with the service, I am required to provide my bank card details. An excerpt from the letter sent to me as the creator of a Cloud profile:

    I have seen this somewhere before: at Google, I believe. So Yandex has followed the example of its senior comrade.

    I am surprised at the absence of indignant posts on Habr. They are not just demanding money for services, they are asking for the key to the apartment where the money is kept! It infuriates me that a bank may, at the bank manager's discretion, refuse for a while to hand over my own money, and here a second uncle demands essentially the same right. And this before I have even agreed to switch to the paid version. I have not yet signed any agreement with this uncle, but I am to hand over the keys to the apartment already, just in case. My, how kind and prudent!

    You know, Uncle Yandex, I have nothing against you and I use your services with pleasure, but, excuse me, as long as I have a choice, you will not get the details of my bank card. And opening a card with two rubles on it specially for you is too much trouble for me, and expensive.

    Then my eye falls on a piece about the imperfection of speech recognition technologies. Its point is that so far nothing good has been done in the field of speech recognition, and it is not worth using. I curse and resign myself to the thought that Vanya the Reasonable will not be talking any time soon.

    A paragraph below is the next article, about online speech recognition services. Online services are no good to me, of course: I need to recognize sound in WinForms, not on a website ... Without the slightest hope I click on the link and ...

    The next day, the chat bot finds its voice.

    Introducing the magic wand: Speechpad. I warn you that the service works only in Chrome. That does not stop me: I use Chrome anyway. And its engine is Google's: apparently there remain some ways, unknown to me, to use that engine for free.

    Speechpad has a simple but quite workable interface.

    It takes minimal time to connect the recognizer.

    After reading the instructions, the first thing I do is integrate the service with the OS. True, the integration is paid, but 100 rubles a month regardless of the recognition volume is another matter entirely! These are not draconian tariffs charged for every recognized fragment. Besides, a two-day trial period is given.

    I register on the site, press the button that turns on the trial period, spend a minute installing a couple of components mentioned in the instructions, and everything works. The principle: the recognized text is inserted at the cursor position. It really is recognized and really is inserted. Not without errors, but satisfactorily, from my point of view.

    After a couple of hours of testing, I conclude that it is more sensible to use the clipboard, and this feature is free. Here, of course, is an amateur's reasoning:

    • with OS integration, the cursor must be in a particular field of the chat bot. During testing I forgot several times and switched from the chat bot to VS, with the result that the recognized text was typed into the source code;
    • with the clipboard, I must accordingly not use the clipboard for anything else, otherwise text copied from a third-party program instantly appears in the chat bot. I fall into this trap a couple of times too, but soon get used to it.

    In the end, I settle on the clipboard.
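    One way to pick up recognized text from the clipboard is to poll it on a timer. The sketch below is an assumption about how this could be wired, not the bot's actual code: ClipboardWatcher, Poll and TextArrived are names invented for the example, and the clipboard read is passed in as a delegate so the logic does not depend on a window.

```csharp
using System;

// Hypothetical helper: reports text that newly appears on the clipboard.
// In WinForms the delegate could be
//   () => Clipboard.ContainsText() ? Clipboard.GetText() : null
public class ClipboardWatcher
{
    private readonly Func<string> readText;
    private string last;

    // Raised once for each new, non-empty clipboard text.
    public event Action<string> TextArrived;

    public ClipboardWatcher(Func<string> readText)
    {
        this.readText = readText;
        this.last = readText(); // ignore whatever was there at start-up
    }

    // Call periodically, e.g. from a System.Windows.Forms.Timer Tick handler.
    public void Poll()
    {
        string current = readText();
        if (string.IsNullOrEmpty(current) || current == last) return;
        last = current;
        TextArrived?.Invoke(current);
    }
}
```

    Two limitations of this sketch: the same phrase dictated twice in a row is not reported the second time, and, exactly as described above, anything copied from another program is treated as a recognized phrase.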

    That's it, the problem is solved.

    What takes more time than connecting the recognizer is stopping it from recognizing the phrases uttered by the chat bot itself. I sweat quite a bit until it dawns on me that the easiest thing is to turn off the microphone. I paste in some code that mutes the microphone.

    Microphone On / Off Code
    using System;
    using NAudio.CoreAudioApi;

    // Turn the microphone off or on (a method of the calling class)
    public static void Mute(bool start)
    {
        CoreAudioMicMute CAMM = new CoreAudioMicMute();
        CAMM.SetMute(start); // assumed: the original listing breaks off before this call
    }

    internal class CoreAudioMicMute
    {
        private MMDevice[] rgMicDevice; // the devices found for us
        int MaxMicro = 0;

        public CoreAudioMicMute() // assumed: the constructor header is missing in the original
        {
            try
            {
                MMDeviceEnumerator DevEnum = new MMDeviceEnumerator();
                // DataFlow.Capture: microphones (or other devices that receive sound);
                // DeviceState.Active: active devices only
                MMDeviceCollection devices =
                    DevEnum.EnumerateAudioEndPoints(DataFlow.Capture, DeviceState.Active);
                // First pass: count the active devices (for us, microphones)
                MaxMicro = 0;
                for (int i = 0; i < devices.Count; i++) // devices.Count: number of active microphones
                {
                    MMDevice deviceAt = devices[i];
                    // To find your own device, set a breakpoint on the "if" below and look in
                    // deviceAt for the string DeviceFriendlyName (or FriendlyName), which
                    // holds the unique name of the device (headset microphone and so on)
                    if (deviceAt.DataFlow == DataFlow.Capture && deviceAt.State == DeviceState.Active)
                        MaxMicro++;
                }
                // Second pass: store all the microphones (or other devices) found in the array
                rgMicDevice = new MMDevice[MaxMicro];
                MaxMicro = 0;
                for (int i = 0; i < devices.Count; i++)
                {
                    MMDevice deviceAt = devices[i];
                    if (deviceAt.DataFlow == DataFlow.Capture && deviceAt.State == DeviceState.Active)
                    {
                        // Change this to select your own device(s)
                        MaxMicro++;
                        rgMicDevice[MaxMicro - 1] = deviceAt;
                    }
                }
            }
            catch (Exception) { } // enumeration errors are silently ignored
        }

        // Mutes or unmutes the devices stored in the rgMicDevice array
        public void SetMute(bool mute)
        {
            try
            {
                for (int i = 0; i < MaxMicro; i++)
                    rgMicDevice[i].AudioEndpointVolume.Mute = mute; // true: mute the device (for us, the microphone)
            }
            catch (Exception) { }
        }
    }

    * The comments are not mine; they are copy-pasted. I give no link, because its owner admits that he himself googled the code in the depths of the English-language internet.
    ** I have made insignificant changes to the code.

    Before each chat bot phrase I turn the microphone off, and after the phrase I turn it back on. As a result, the service hears only my phrases and not the chat bot's.
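    The mute, speak, unmute sequence can be wrapped so that the microphone is always turned back on, even if speaking fails. This is a minimal sketch under my own assumptions: MutedSpeech and Say are invented names, and the two delegates stand in for the mute routine and the synthesizer call.

```csharp
using System;

// Hypothetical wrapper, named for this example only.
public static class MutedSpeech
{
    // Mutes the microphone, speaks the text, and always unmutes afterwards,
    // even if the speak delegate throws an exception.
    public static void Say(string text, Action<bool> setMute, Action<string> speak)
    {
        setMute(true);
        try
        {
            speak(text);
        }
        finally
        {
            setMute(false);
        }
    }
}
```

    With a CoreAudioMicMute instance named mic and a SpeechSynthesizer named speaker (both names assumed), a call might look like MutedSpeech.Say(phrase, mic.SetMute, speaker.Speak). The blocking Speak fits here; SpeakAsync returns immediately, so the microphone would be unmuted while the bot is still talking.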

    Here is the final result:

    For completeness I look through a dozen more speech recognition sites. In principle they are all alike, and the engines are mostly Google's, but none of them offers an explicit way to get the text on the clipboard. Judging by the comments, some of them can also voice text, but I do not dwell on this topic. As the saying goes, you do not go looking for better when things are already good.

    Now another problem: I think it would be nice to add an animated character that mouths the spoken speech. I want something simple: a C# library with a choice of characters. But I have been told that there is nothing of the kind for .NET ...

    That, actually, is all. The young man has a female voice, but on the whole the voice mode works.

    I hope to present Vanya the Reasonable in a more presentable form in the near future. Since his first appearance he has been thoroughly refreshed and has grown more intelligent: he has moved from Access to PostgreSQL, the algorithms have improved, and I managed to connect dictionaries and clean up the initial base of typical answers. A different person, in short.
