Atreides07 November 13, 2012 at 15:13

Prototype of a voice shopping list for WP8, Win8, Android with a backend in Azure in 2.5 hours

From November 9 to 11, the Windows 8 hackathon RUWOWZAPP took place. I first registered as a participant, and then I was honored to attend the event as an expert. As an expert, I got to know many wonderful people and their projects. It was so interesting that I kept advising even at night, leaving 4-5 hours for sleep. I was so infected by the positive energy and people's desire to create that I could not resist building a small prototype of my own: a Shopping List with voice recognition support.
In a couple of hours I managed to build a functional prototype demonstrating the idea of the application, with clients for WP, Win8, and Android.

I didn’t want to enter the application contest with such a crude prototype, but I really wanted to show what I had done in a couple of hours. At the last moment, before the last participant spoke, I joined the queue of speakers, and the moderator allowed me to demonstrate my work:

The application aroused great interest among the hackathon participants, and this is the promised article with answers to all the questions I did not have time for then.

For those who want to see the code right away, the source code can be downloaded here.

Unlike in this video, I was in a hurry during the presentation and did not manage to launch the Android version. I expected that the main interest would be in how exactly voice recognition works, but I did not expect so much interest in everything else. Over the next half hour, a couple of dozen people asked various questions about the project: how exactly synchronization works, how the commands are parsed, how to write for Android, what the server side is built on, and so on. I apologize that I did not have the opportunity to show the project source to everyone at the time; this article, promised there, answers all the questions I was asked.

The idea of the application

Initially, I just wanted to finally try voice recognition in WP8, which had become available to developers. And I wanted a solution that would work specifically with the Russian language.
I settled on the following set of commands:

Buy [product] - adds products to the list
Bought [product] - marks the product as bought
Delete [product] - removes a product from the list
Delete list - clears the list
Price [product] [price] - sets the price
[product] in the store [store] - specifies the store where the product can be bought

I figured I could build such an application for three platforms in 6 hours. Looking ahead, I’ll say that I had less time than expected and implemented only the first 4 commands.

Voice recognition - 1 hour

WP8 works very well with English and recognizes it well even with my accent. But it turned out that the recognition capabilities for Russian are much more limited: for Russian, WP8 only recognizes a predefined dictionary. I wasted about half an hour on this.
I really wanted Russian support, and since I already had experience with voice recognition services, I decided to plug in some commercial voice recognition engine for the time being. However, nothing had changed since I last worked with them: essentially none of them offered an automated trial or paid sign-up. Since every service required talking to a manager, I decided to use Google's voice recognition for the demo. I specifically searched for the terms of use of the Google voice recognition engine and could not find them, but I remembered seeing somewhere that it could not be used for commercial purposes (although I might be mistaken). Many thanks to Yakhnev for the excellent article with sources in C#. It took only 10 minutes to turn his desktop project into a web project with an API for voice recognition. But since the application could not save the file to disk there, and there was no time to redo the recognition in memory, I had to abandon the free Web Role in Azure. Fortunately, I had already deployed a couple of virtual machines in Azure, so reworking the project and uploading it to the server caused no problems. As a result, I brought up the recognition service with the access point voice.akhmed.ru/recognize.ashx, where I send a WAV file in a POST request and get the text back.
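For readers who want to call such a service from ordinary .NET code, here is a minimal sketch of the request described above. Only the endpoint address comes from this article; the `RecognitionClient` class, the `audio/wav` content type, and the assumption that the response body is plain recognized text are illustrative guesses, not the actual contract of the service.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original project.
public static class RecognitionClient
{
    // Endpoint named in the article; everything else about the protocol is assumed.
    public const string Endpoint = "http://voice.akhmed.ru/recognize.ashx";

    // Wraps raw WAV bytes in a POST request. The "audio/wav" content type is a guess.
    public static HttpRequestMessage BuildRequest(byte[] wav)
    {
        var content = new ByteArrayContent(wav);
        content.Headers.ContentType = new MediaTypeHeaderValue("audio/wav");
        return new HttpRequestMessage(HttpMethod.Post, Endpoint) { Content = content };
    }

    // Sends the recording and returns the response body,
    // which is assumed here to be the recognized text.
    public static async Task<string> RecognizeAsync(HttpClient http, byte[] wav)
    {
        using (var request = BuildRequest(wav))
        using (var response = await http.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Building the request separately from sending it keeps the part that can be checked offline apart from the network call.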

Application for WP7 - 30 minutes

The WP7 application took most of the time, but only because this platform was the testing ground and the code kept changing during development.

Once the voice recognition service was up, the next question was how to handle voice on the device.
Since this was a functional prototype, I decided to throw out everything non-essential: user authorization, button press handling, a loading indicator, resending, error handling (so the application may crash from time to time), saving to a database, saving WAV files, etc.
Since the application also had to be ported to Android, I decided to build the prototype without MVVM, so I ended up with a terrible mess of code.

Since the application no longer had to target WP8 specifically, I decided to build a WP7 version, which gave an additional advantage: the prototype works on any WP device. Recording from the microphone is a fairly non-trivial task on WP7, but I already had my WPExtensions library, which makes it easy to record voice into a WAV file. In the AppBar, I added one dummy button for adding entries to the list by hand and a button with a microphone which, when pressed the first time, starts recording and, when pressed again, sends the recording to the server and processes the result:

private bool isRecording = false;
private readonly MicrophoneWrapper microphone = new MicrophoneWrapper();
private void ApplicationBarRecordIconButton_Click(object sender, System.EventArgs e)
{
    if (!isRecording)
    {
        microphone.Record();
        PageTitle.Text = "Слушаю..."; // "Listening..."
    }
    else
    {
        microphone.Stop();
        var wav = microphone.GetWavContent();
        Send(wav);
        PageTitle.Text = defaultHeader;
    }
    isRecording = !isRecording;
}

The sending method is also quite trivial: it sends the recording to the server and processes the response.

private void Send(byte[] wav)
{
    var client = new HttpWebClient();
    client.Post("http://voice.akhmed.ru/recognize.ashx", wav, (result) => Dispatcher.BeginInvoke(() => ParseString(result)));
}
private void ParseString(string result)
{
    logicLayer.Parse(result);
    RefreshView();
}

There were a lot of questions about how the commands are parsed, what library I use for text analysis, and how extra words such as “buy” or “and” are filtered out. Of course, a release version needs a much more competent solution with morphological and syntactic analysis, but for now the code is downright ugly. I simply use the first word as the command and filter out all words of up to two letters.

public void Parse(string voiceText)
{
    var words = voiceText.Split(new[]{' '}, StringSplitOptions.RemoveEmptyEntries);
    if(words.Length>1)
    {
        var command = words.First();
        if(command.Equals("купить")) // "buy"
        {
            Add(words.Skip(1));
            IncrementUpdate();
        }
        if(command.StartsWith("купил")) // "bought", with any gender or number ending
        {
            SetBoughtStatusTrue(words.Skip(1));
            IncrementUpdate();
        }
        if (command.Equals("удалить") || command.Equals("очистить")) // "delete" or "clear"
        {
            if (words[1].Equals("список")) // "list"
            {
                shopList.ShopItems.Clear();
            }
            else
            {
                RemoveShopListItems(words.Skip(1));
            }
            IncrementUpdate();
        }
    }
}
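The filtering of short words mentioned above is not visible in the Parse method itself, so here is a minimal sketch of how the "first word is the command, words of up to two letters are dropped" rule could look in isolation. The `CommandText` class and its `Split` method are hypothetical names for illustration and are not taken from the project sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original project.
public static class CommandText
{
    // Splits recognized text into a command (the first word, lowercased)
    // and its arguments, dropping arguments of two letters or fewer,
    // such as the conjunction "и" ("and").
    public static (string Command, List<string> Items) Split(string voiceText)
    {
        var words = (voiceText ?? "").Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0)
            return (null, new List<string>());

        var items = words.Skip(1).Where(w => w.Length > 2).ToList();
        return (words[0].ToLowerInvariant(), items);
    }
}
```

For the phrase "Купить молоко и хлеб" this yields the command "купить" and the items "молоко" and "хлеб". The obvious weakness of a length-based filter is that a real product with a very short name would be dropped as well.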

Application backend - 20 minutes

To synchronize with other devices, a server side was needed. Of course, such a backend is an ideal candidate for hosting in Azure as a web role, but for the prototype it could live on the same Azure virtual machine as voice recognition. Since time was very limited, it made sense to build a SOAP service, because Visual Studio can quickly generate client proxies.
The service is also embarrassingly simple. I have one shopping list class, which I moved from the client to the server (for the client it is generated in the proxy).

public class ShopList
{
    public ShopList()
    {
        ShopItems=new List<ShopItem>();
    }
    public List<ShopItem> ShopItems { get; set; }
    public int Version { get; set; }
}
public class ShopItem
{
    public string Name { get; set; }
    public decimal Price { get; set; }
    public char Valute { get; set; }
    public bool IsBought { get; set; }
}

Honestly, the two fields Price and Valute are superfluous, since I never got to use them, but I quote the code “as is”.
Saving and retrieving the list is also very easy.

public class GroceryService : System.Web.Services.WebService
{
    private LogicLayer logicLayer = new LogicLayer();
    [WebMethod]
    public ShopList GetVersion()
    {
        return logicLayer.GetShopList();
    }
    [WebMethod]
    public void UploadVersion(ShopList request)
    {
        logicLayer.Update(request);
    }
}

Of course, a release version should not replace the whole list as is but should send a partial update of the changed data; for the prototype, though, this will do.
The logic is also embarrassingly simple: since this is a prototype, there is no database, and the current value is kept in memory in a static field. Honestly, there was no point in such a layer, but I quote it “as is”. The method names are unfortunate, but I did not change them.

public class LogicLayer
{
    private static ShopList shopList = new ShopList();
    public ShopList GetShopList()
    {
        return shopList;
    }
    internal void Update(ShopList newshopList)
    {
        shopList = newshopList;
    }
}
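As for the partial update mentioned above for a release version, one possible direction is to send only the differences between two states of the list. The sketch below is purely illustrative: `ListDelta` and `DeltaBuilder` are hypothetical names, items are compared by name only, and nothing like this exists in the prototype.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, not part of the original project.
public class ListDelta
{
    public List<string> Added { get; set; } = new List<string>();
    public List<string> Removed { get; set; } = new List<string>();
}

public static class DeltaBuilder
{
    // Compares item names before and after a change
    // and reports only what was added and what was removed.
    public static ListDelta Diff(IEnumerable<string> before, IEnumerable<string> after)
    {
        var oldSet = new HashSet<string>(before);
        var newSet = new HashSet<string>(after);
        return new ListDelta
        {
            Added = newSet.Except(oldSet).ToList(),
            Removed = oldSet.Except(newSet).ToList()
        };
    }
}
```

A real implementation would also have to carry changes of the IsBought flag and decide how the server merges deltas from several clients.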

In the end, I brought this service up at voicegrocery.akhmed.ru/GroceryService.asmx
Now the question is how to deliver updates to the clients. With push notifications, of course. But implementing them could take a lot of time, which was scarce, so I made the client poll the server every 5 seconds.

DispatcherTimer dispathcerTimer = new DispatcherTimer();
dispathcerTimer.Interval = TimeSpan.FromSeconds(5);
dispathcerTimer.Tick += dispathcerTimer_Tick;
dispathcerTimer.Start();

The logic of updating to and from the client is very simple.
1. If the current version is lower than the one received from the server, the current list is replaced by the server's.
2. If a change occurs on the client, the version is increased by 1 and the list is sent to the server.

void dispathcerTimer_Tick(object sender, System.EventArgs e)
{
    var client = new ServiceReference1.GroceryServiceSoapClient();
    client.GetVersionCompleted += client_GetVersionCompleted;
    client.GetVersionAsync();
}
void client_GetVersionCompleted(object sender, ServiceReference1.GetVersionCompletedEventArgs e)
{
    if (e.Result.Version > logicLayer.GetVersion())
    {
        logicLayer.UpdateShopList(e.Result);
        RefreshView();
    }
}
private void IncrementUpdate()
{
    var shopListItem = new ShopList()
    {
        Version = shopList.Version + 1,
        ShopItems = shopList.ShopItems
    };
    var client = new ServiceReference1.GroceryServiceSoapClient();
    client.UploadVersionAsync(shopListItem); 
}
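The two synchronization rules can also be looked at separately from the timers and SOAP proxies. Below is a minimal sketch with hypothetical names (`VersionedList`, `SyncRules`) and a simplified list of strings instead of ShopItem; it models the rules as described in this article, not the exact project code.

```csharp
using System.Collections.Generic;

// Hypothetical simplified model, not part of the original project.
public class VersionedList
{
    public int Version { get; set; }
    public List<string> Items { get; set; } = new List<string>();
}

public static class SyncRules
{
    // Rule 1: take the server copy only when it is strictly newer than the local one.
    public static VersionedList Pull(VersionedList local, VersionedList server)
    {
        return server.Version > local.Version ? server : local;
    }

    // Rule 2: after a local change, bump the version; the result is what gets uploaded.
    public static VersionedList Push(VersionedList local)
    {
        return new VersionedList
        {
            Version = local.Version + 1,
            Items = new List<string>(local.Items)
        };
    }
}
```

With these rules, two clients that change the list at the same time both upload the same version number, so the last upload wins and the other client does not pick it up, because its own version is not lower. For a prototype this is acceptable; a release version would need conflict handling.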

Porting to Windows 8 - 10 minutes

Porting the application to Win8 was very simple. I did not implement voice recognition on this client, so synchronization is one-way. The XAML was copied with virtually no changes; I only had to tweak the code that talks to the server. It even became a little simpler, fitting in one method:

async void dispathcerTimer_Tick(object sender, object e)
{
    var client = new ServiceReference1.GroceryServiceSoapClient();
    var result = await client.GetVersionAsync();
    if (result.Body.GetVersionResult.Version > logicLayer.GetVersion())
    {
        logicLayer.UpdateShopList(result.Body.GetVersionResult);
        RefreshView();
    }            
}

Porting the application to Android - 15 minutes

I love the Mono platform. The code remains virtually unchanged; only the UI needs tweaking. Since the presentation layer is much harder on Android, I did not spend much time on a custom adapter: after 5 minutes I rolled back and made a simple text list with text check marks in brackets:

void client_GetVersionCompleted(object sender, ru.akhmed.voicegrocery.GetVersionCompletedEventArgs e)
{
    try
    {
        list.Clear();
        var result = e.Result.ShopItems;
        foreach (var item in result)
        {
            var checkBox = item.IsBought ? "( X ) " : "(   ) ";
            list.Add(checkBox + item.Name);
        }
        this.RunOnUiThread(() =>
        {
            this.ListAdapter = new ArrayAdapter<string>(this, Resource.Layout.ListItem, list);
            ((BaseAdapter)this.ListAdapter).NotifyDataSetChanged();
        });
    }
    catch (Exception)
    {
        // Prototype: errors are deliberately ignored
    }
}

Porting to iOS - None

Of course, I thought about porting to iOS, but since I had no i-devices ~~and the hackintosh I use for development at home is not appropriate to show at such events~~ and very little time, I postponed this idea. ~~Besides, I had no hackintosh on the laptop I had with me~~

Summary

If you don't count the 40 minutes spent researching the capabilities of the WP8 platform, then in just under 2 hours, including the time for uploading to the server and small bug fixes, I built a full-fledged prototype that shows the main idea of the application, one that I won't regret throwing away before proceeding to full development.
Of course, the code turned out very dirty and suboptimal, with a bunch of flaws and unfinished features. But functional prototypes exist precisely to serve as a “paper sketch”: a rough draft that shows the customer or management the product that will come out.
