Yandex hears you, dude

    Out of the blue came an order: write an iOS application that uses Yandex SpeechKit to recognize Russian speech, or more precisely, short phrases on arbitrary topics. The goal of the assignment was to compare the results of the Yandex engine with those of our own engine from Sarov.

    An order is an order, so I took the following steps.

    1. Went to the speech recognition section on yandex.ru
    2. Registered and received a key, referred to below as API_KEY
    3. Sent Yandex an email asking them to activate the key


    When asked how the key would be used, I replied that I was releasing a voice-driven card game, Diablo 3-13.

    Two days later, the key was activated. At first I pawed the ground impatiently, then I realized that the people who work at Yandex are thoughtful and synchronous.
    Later, in my own application, I likewise gave up on asynchronous requests to the Yandex API.

    With the magic API_KEY in hand, I downloaded the archive from the link I was given:
    YandexSpeechKit-2.1-ios.zip


    The archive contains two sample projects that demonstrate the library.
    I replaced SAMPLE_API_KEY in the source with my own key, built both samples, and launched the applications.
    Neither works under Xcode 5.1.1: both crash with an internal error buried deep in the bowels of the library.

    So I had to download the current SDK from GitHub.

    I downloaded the archive
    yandex-speechkit-ios-master.zip

    and built the samples, but the error did not go away.

    I immediately sent a bug report to the support service and, while waiting for a reply, wrote another little game.
    A couple of days passed with no reply from support.

    With the game published in the store, I decided to write my own iOS application that talks to the Yandex speech recognition service through plain URL requests.

    After all, the same magic key works there too.

    Step one - command-line check


    From the command line, send the service WAV files with pre-recorded phrases.

    The request is as simple as bamboo:
       curl -v -4 "asr.yandex.net/asr_xml?key=e547b4f5-хрен-вам-ключ-97130fdbcd74&uuid=01ae13cb744628b58fb536d496daa177&topic=notes&lang=ru-RU" -H "Content-Type: audio/x-wav" --data-binary "@recordedFile.wav"
     


    The request needs no commentary: everything follows the excellent documentation on the Yandex website.
    The first attempt failed because, instead of a 32-digit hexadecimal uuid, I passed in the UDID of my iPhone, and that one is not purely hex.
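
    A check like the one below, run before the request is sent, would catch that kind of mistake early. It is only a minimal sketch in plain C (so it can also be compiled inside the project's .mm files); the function name is_valid_request_uuid is hypothetical, and the only rule it encodes is the one mentioned above: exactly 32 hexadecimal digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Yandex or Apple API.
   Returns true when s consists of exactly 32 hexadecimal digits. */
bool is_valid_request_uuid(const char *s)
{
    if (s == NULL) {
        return false;
    }
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        /* Reject strings that are too long or contain a non-hex character. */
        if (n >= 32 || !isxdigit((unsigned char)s[n])) {
            return false;
        }
    }
    return n == 32;
}
```

    The uuid from the curl request above passes this check, while a string with dashes, or one of a different length, does not.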

    Phrases like these:
    thirty-eight parrots
    let's go have a smoke
    Vladimir Sysoev
    cho you watch a club let's help
    someone who doesn’t plow that net
    Anton Subbotin
    Habrahabr - complete fly away


    They were recognized perfectly, even when spoken by different people.

    To my delight, Yandex mercilessly cuts out obscene words.

    Step two - build an iOS application that records speech


    The Apple website has a standard sample project that demonstrates recording and playing back sound.
    Download the SpeakHere project and run it: everything is in order. I respect the folks in Cupertino. The code, of course, is, hmm, but it works.

    Modify the file SpeakHereController.mm.

    Find the method - (void)stopRecord and add one line:

    - (void)stopRecord
    {
        // leave everything here untouched
        ...
        // btn_play.enabled = YES;
        // add this line
        [self yandexTool];
    }
    

    As you can see, we have added a call to a method that processes the audio file produced by the recording.
    Out of the box, the project records sound into the file recordedFile.caff:

     recordFilePath = (CFStringRef)[NSTemporaryDirectory() stringByAppendingPathComponent: @"recordedFile.caff"];
    

    Yandex cannot work with files of this type, so the file extension must be changed throughout SpeakHereController.mm:

     recordFilePath = (CFStringRef)[NSTemporaryDirectory() stringByAppendingPathComponent: @"recordedFile.wav"];
    

    In addition, in the project file AQRecorder.mm, in the body of void AQRecorder::StartRecord(CFStringRef inRecordFile), change the file type parameter in the line

    		OSStatus status = AudioFileCreateWithURL(url, kAudioFileCAFType, &mRecordFormat, kAudioFileFlags_EraseFile, &mRecordFile);
    

    to

    		OSStatus status = AudioFileCreateWithURL(url, kAudioFileWAVEType, &mRecordFormat, kAudioFileFlags_EraseFile, &mRecordFile);
    

    And finally: Yandex expects audio recorded at a sample rate of 16000 Hz, while the Apple sample defaults to 44100 Hz, so the rate must be changed.

    In the project file AQRecorder.mm, in the body of void AQRecorder::SetupAudioFormat(UInt32 inFormatID), add the lines

       Float64 newRate = 16000;
       XThrowIfError(AudioSessionSetProperty(	kAudioSessionProperty_PreferredHardwareSampleRate,
                                              sizeof(newRate),
                                              &newRate), "couldn't set hardware sample rate");
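
    One way to confirm that the recorder really produces 16000 Hz audio is to read the sample rate back from the header of the recorded WAV file. The sketch below is not part of SpeakHere; wav_sample_rate and read_le32 are hypothetical helper names, and the code relies only on the standard RIFF/WAVE chunk layout (a "fmt " chunk whose payload stores the sample rate at byte offset 4).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian 32-bit value starting at p. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the sample rate stored in the "fmt " chunk
   of a RIFF/WAVE buffer, or 0 if the buffer is not a WAV file or the
   chunk is missing or truncated. */
uint32_t wav_sample_rate(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 12 ||
        memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) {
        return 0;
    }
    size_t pos = 12; /* first chunk starts right after "RIFF<size>WAVE" */
    while (len - pos >= 8) {
        uint32_t chunk_size = read_le32(buf + pos + 4);
        if (memcmp(buf + pos, "fmt ", 4) == 0) {
            /* payload: format tag (2), channels (2), sample rate (4), ... */
            if (chunk_size < 8 || len - pos < 16) {
                return 0;
            }
            return read_le32(buf + pos + 12);
        }
        /* Skip this chunk; chunk data is padded to an even byte count. */
        uint64_t advance = 8u + (uint64_t)chunk_size + (chunk_size & 1u);
        if (advance > (uint64_t)(len - pos)) {
            return 0;
        }
        pos += (size_t)advance;
    }
    return 0;
}
```

    Loading the first few dozen bytes of recordedFile.wav into a buffer and passing them to wav_sample_rate should give 16000 once the change above is in effect; a different value means the recording format still needs adjusting.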
     

    All that remains is to add the method that sends the request to the Yandex server. The request uploads the recordedFile.wav file in the same way as the command-line request did. Below is
    the text of the yandexTool method, as simple as a track off a Belarus tractor.

    - (void)yandexTool
    {
        // Build the request URL from the key and the 32-digit hex uuid
        NSString *urltext_temp = [NSString stringWithFormat:@"https://asr.yandex.net/asr_xml?key=%@&uuid=%@&topic=queries&lang=ru-RU", API_KEY, API_UUID];
        NSString *urltext = [urltext_temp stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
        NSLog(@"url=%@", urltext);
        NSURL *url = [NSURL URLWithString:urltext];
        NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
        [request setHTTPMethod:@"PUT"];
        [request setValue:@"audio/x-wav" forHTTPHeaderField:@"Content-Type"];

        // Send the recorded WAV file as the request body
        NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"recordedFile.wav"];
        NSData *myData = [NSData dataWithContentsOfFile:filePath];
        request.HTTPBody = myData;

        // Synchronous request: blocks the calling thread until the server answers
        NSError *error = nil;
        NSURLResponse *response = nil;
        NSData *data2 = [NSURLConnection sendSynchronousRequest:request returningResponse:&response error:&error];
        if (data2 == nil) {
            NSLog(@"request failed: %@", error);
            return;
        }
        NSString *responseString = [[NSString alloc] initWithData:data2 encoding:NSUTF8StringEncoding];
        NSLog(@"responseString=%@", responseString);
        // The response comes back as XML; parse it yourselves, you're not little kids
        // NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data2];
        // [parser setDelegate:self];
        // [parser parse];
    }
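
    For readers who want a starting point for handling the response, here is a rough sketch in plain C. It rests on an assumption that the article does not confirm: that the XML wraps each recognition hypothesis in a <variant ...>text</variant> element. The function name first_variant_text is hypothetical, and the code is a naive substring search, not a real XML parser (it does not decode entities or handle self-closing elements), so NSXMLParser remains the more robust route.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. ASSUMPTION: the response contains elements of the
   form <variant ...>text</variant>; this shape is not shown in the article.
   Copies the text of the first such element into out and returns 1.
   Returns 0 if no complete element is found or the text does not fit. */
int first_variant_text(const char *xml, char *out, size_t out_size)
{
    if (xml == NULL || out == NULL || out_size == 0) {
        return 0;
    }
    const char *open_tag = strstr(xml, "<variant");
    if (open_tag == NULL) {
        return 0;
    }
    const char *start = strchr(open_tag, '>'); /* end of the opening tag */
    if (start == NULL) {
        return 0;
    }
    start++;
    const char *end = strstr(start, "</variant>");
    if (end == NULL) {
        return 0;
    }
    size_t n = (size_t)(end - start);
    if (n >= out_size) {
        return 0; /* not enough room for the text plus the terminating NUL */
    }
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

    In the application, the bytes of data2 would first have to be copied into a NUL-terminated buffer before being handed to a function like this.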
     

    I tweaked my application a little more: I added my own face and the display of the recognized text.

    The recognition, I must say, is awesomely good.


    As promised to Yandex, I built voice recognition of cards into the King of Hearts card game, but the 1-2 second delay in voice control starts to get annoying about 5 minutes into a game.

    Nevertheless, not a single playing-card name was misrecognized during play.
    Bravo, Yandex!

    While I was preparing this post, a reply arrived from Yandex tech support: they asked me to send the full logs of the crashing applications.

    I should probably answer them.
