ahriman August 7, 2015 at 13:08

How Microsoft Project Oxford Can Make Your Applications Smarter

Many thanks to Evgeny Grigorenko, Microsoft Student Partner, for help in writing this article. The rest of our Azure-related articles can be found under the azureweek tag.

Let me guess: like me, you have been nursing an idea for a brilliant application for a couple of months now. In an ideal world, on top of its basic functionality, it simply must have many extra features, for example identifying the user (or a cat) from a front-camera photo, or understanding commands in natural language. Or maybe you want to build a second How-Old (which, by the way, was built on Oxford).

But we all know the sad truth. Much of this is possible only with complex machine learning algorithms, which we have absolutely no time to study. And this is precisely what stalls development, since without such innovations the application will be completely lost among its competitors. But there is a solution to this problem, and its name is Microsoft Project Oxford. If you want to know how Microsoft Project Oxford can simplify your life and make your applications truly intelligent, read on.

At the // BUILD conference in April 2015, Microsoft, among many other announcements, introduced a new service in the Microsoft Azure family of cloud services. It was a project codenamed Project Oxford: a set of ready-made REST APIs that give developers, in an accessible form, the full power of machine vision, natural language analysis and voice recognition algorithms for use in their applications. Because the services are exposed as REST APIs, you can use them on any platform that can make HTTP calls and with your favorite development technologies, not only those offered by Microsoft.
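To make the "any platform" point concrete, here is a minimal Python sketch of what calling a REST service of this kind might look like using only the standard library. The endpoint URL, the name of the key header and the shape of the JSON body are assumptions made for illustration, not values taken from the official documentation, so check them against the reference before use.

```python
import json
import urllib.request

# Hypothetical placeholders: take the real endpoint and key from the official docs.
ENDPOINT = "https://example.invalid/face/detect"
SUBSCRIPTION_KEY = "your-subscription-key"


def build_request(image_url, endpoint=ENDPOINT, key=SUBSCRIPTION_KEY):
    """Build (but do not send) a JSON POST request pointing the service at an image."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed header name for the API key.
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


def call_service(request):
    """Send the request and decode the JSON answer (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building and sending are split on purpose: build_request can be inspected or unit-tested offline, while call_service is the only part that touches the network.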

Project Oxford itself extends the existing Azure Machine Learning Gallery with new highly intelligent solutions. The original idea behind Azure ML Gallery, created a year ago, was to gather easy-to-use machine learning services in one place. To use them you don't have to be an expert mathematician: all you need to do is call the API, without thinking about the complex mathematics at all, and the services do everything internally. Project Oxford fits this idea better than ever.

What is Project Oxford made up of? The project consists of four groups of self-contained cloud APIs: Face APIs, Computer Vision APIs, Speech APIs and Language Understanding Intelligent Service (LUIS), which is so far in closed beta.

The Face APIs include cloud algorithms for detecting and recognizing human faces in photographs, namely:

detection of face boundaries as bounding rectangles, with additional characteristics such as the coordinates of facial features, head pose, gender and a heuristic age estimate;
a wide range of recognition services: evaluating the similarity of two faces, searching a series of photos for faces similar to a given sample, automatic grouping of photos, and identification (recognition) of people based on a pre-prepared training set.
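As an illustration of the detection part of the list above, here is a small Python sketch that turns a detection answer into bounding boxes with age and gender. The JSON shape and field names (faceRectangle, attributes, age, gender) are assumed for the example; the real response format should be taken from the Face API documentation.

```python
import json

# Hypothetical sample of what a detection answer might look like.
SAMPLE_RESPONSE = """
[
  {"faceRectangle": {"left": 40, "top": 30, "width": 100, "height": 100},
   "attributes": {"age": 31, "gender": "female"}},
  {"faceRectangle": {"left": 220, "top": 50, "width": 80, "height": 80},
   "attributes": {"age": 8, "gender": "male"}}
]
"""


def summarize_faces(response_text):
    """Convert each detected face into a (left, top, right, bottom) box plus attributes."""
    summary = []
    for face in json.loads(response_text):
        rect = face["faceRectangle"]
        attrs = face.get("attributes", {})
        summary.append({
            "box": (
                rect["left"],
                rect["top"],
                rect["left"] + rect["width"],
                rect["top"] + rect["height"],
            ),
            "age": attrs.get("age"),        # None when the attribute is absent
            "gender": attrs.get("gender"),
        })
    return summary


if __name__ == "__main__":
    for face in summarize_faces(SAMPLE_RESPONSE):
        print(face)
```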

In addition to the standard tasks of finding faces in photographs and automatically categorizing photo galleries, as Apple iPhoto does, the service can be used in many other scenarios. Just think of modern blockbusters. Tracking people's movements with surveillance cameras or automatic authorization on approaching a top-secret facility: all of this is possible and can be implemented with the Microsoft Project Oxford Face API services.

But even if you are not planning to star in a spy drama, the services may turn out to be no less useful: intelligent content targeting based on the user's gender and age, or filtering photos by the faces of people in the contact list. There are plenty of scenarios, and they are limited only by your imagination.
To everyone who is interested, and to those who are not yet, I highly recommend reading more about the Face API on the official website of the project. There, along with the documentation, you will find several interesting demos. Besides visualizing the results, they show the textual response of the API to each request, which makes them a great way to try out the services and understand their capabilities in a couple of minutes.

The how-old.net service, which became popular immediately after its announcement, and its recently announced sibling twinsornot.net were also originally created only as Face API demos for the // BUILD 2015 conference. You can read the success story of the first service, and try to repeat it with your own application, here.

The most intriguing part of Project Oxford is undeniably the language understanding service, Language Understanding Intelligent Service, or LUIS for short. LUIS gives developers the ability to build natural language understanding models that are easy to use in their applications.

Such models can come from several sources. Simple models can be built on existing ones that are already used successfully in projects such as Cortana or Bing. If you want your software to understand basic concepts like time, numbers or temperature and to respond to a request like “remind me of training at 8am”, the standard models will be enough. If you need to handle more specific requests like “start tracking runs” or “turn on the light”, you will need to build your own models, which is also achievable with the tools LUIS provides.

These models can then be published as REST APIs and used on any device and operating system capable of making such calls. The opportunities LUIS offers are hard to overestimate. Just a couple of years ago, virtual assistants such as Cortana and Siri seemed complex and out of reach, and now the technology behind them is becoming available to any developer. Your solution can now become truly intelligent easily and naturally, and maybe one day it will even pass the Turing test.
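Once a model is published as a REST API, the application side mostly comes down to reading the recognized intent and reacting to it. Below is a rough Python sketch of that step. The result structure (a list of intents with scores and a list of typed entities), the intent names and the 0.5 confidence threshold are all hypothetical and chosen only to illustrate the idea; LUIS was in closed beta at the time of writing, so the real format may differ.

```python
# Hypothetical parsed answer for the phrase "turn on the light in the kitchen".
SAMPLE_RESULT = {
    "query": "turn on the light in the kitchen",
    "intents": [
        {"intent": "TurnOnLight", "score": 0.93},
        {"intent": "StartRunTracking", "score": 0.04},
    ],
    "entities": [{"entity": "kitchen", "type": "Room"}],
}

# Application-side actions, keyed by (hypothetical) intent name.
HANDLERS = {
    "TurnOnLight": lambda entities: "Light on: " + entities.get("Room", "everywhere"),
    "StartRunTracking": lambda entities: "Run tracking started",
}


def top_intent(result, min_score=0.5):
    """Return the best-scoring intent name, or None if nothing is confident enough."""
    intents = result.get("intents", [])
    if not intents:
        return None
    best = max(intents, key=lambda item: item["score"])
    return best["intent"] if best["score"] >= min_score else None


def handle(result, handlers):
    """Dispatch a parsed result to the matching handler, with a fallback reply."""
    handler = handlers.get(top_intent(result))
    if handler is None:
        return "Sorry, I did not understand that."
    entities = {item["type"]: item["entity"] for item in result.get("entities", [])}
    return handler(entities)


if __name__ == "__main__":
    print(handle(SAMPLE_RESULT, HANDLERS))
```

The fallback reply matters in practice: a low-confidence guess that triggers the wrong action is usually worse than asking the user to rephrase.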

Unfortunately, a project of this magnitude takes time to complete. Unlike the other Project Oxford services, which are already available for use with some limits, LUIS is in closed testing. Additional information can be found on the official page and in the video of one of the // BUILD 2015 talks, and here you can join the waiting list for access to the project. Do not miss the opportunity to plunge into the world of natural language analysis and be the first to offer these capabilities to your users!

The Computer Vision APIs continue the visual analytics theme, but in a completely different way. They specialize in the analysis of arbitrary photographs and provide the following range of features:

categorization of images, such as food, people, buildings and, of course, cats;
detection of inappropriate adult or racy content in photographs;
determination of the dominant colors of the image and of whether it is black and white, clip art or a line drawing;
text recognition (OCR);
generation of image thumbnails based on intelligent analysis of the image composition.

Frankly, the Computer Vision API services are my favorite in terms of the number of features. Image categorization is probably one of the oldest machine vision tasks, and you have probably already come up with dozens of ways to use it in your project. Not yet? Just try it! Have you ever wanted to add OCR to your application, to read receipts, street signs or any other text that surrounds us in the real world, but thought it was complicated and expensive? Now it is accessible and convenient to use as one of the Project Oxford Computer Vision APIs.

But in addition to the well-known technologies described above, the Computer Vision APIs provide many other features. Take a purely technical problem such as generating reduced copies of images, also known as thumbnails. The task seems simple until you run into it in real life. As long as the proportions are preserved, scaling is easy, but as soon as you try to change them, the cropping problems come pouring in: bodies without heads, “sky” instead of “cat against the sky”, and so on. The Computer Vision API has a solution. It not only hides the technical details of scaling while keeping the thumbnail quality as high as possible, but also first tries to find the significant elements of the scene, which leads to a better choice of crop borders. In most cases this approach preserves as much of the content as possible in the resulting thumbnail. In the demo example of a person on a mountain top, the service was given no additional information other than the image itself.
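Separately from the service's own demo, the idea of choosing crop borders around the significant part of a picture can be sketched in a few lines of Python. This is not the algorithm the Computer Vision API uses (that is not described here); it only shows, assuming the region of interest is already known, how a crop window with a new aspect ratio can be placed around that region instead of around the image center. The function name and argument layout are my own.

```python
def crop_around(image_size, region, target_size):
    """Return (left, top, right, bottom) of the largest crop with the target
    aspect ratio, centered on the region of interest where possible and
    shifted back inside the image where it would stick out."""
    image_w, image_h = image_size
    target_w, target_h = target_size

    # Largest window with the target proportions that still fits in the image.
    scale = min(image_w / target_w, image_h / target_h)
    crop_w = int(target_w * scale)
    crop_h = int(target_h * scale)

    # Center of the region of interest.
    left, top, right, bottom = region
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2

    # Center the window on the region, then clamp it to the image bounds.
    x = int(center_x - crop_w / 2)
    y = int(center_y - crop_h / 2)
    x = max(0, min(x, image_w - crop_w))
    y = max(0, min(y, image_h - crop_h))
    return (x, y, x + crop_w, y + crop_h)


if __name__ == "__main__":
    # A 1000x600 photo with the subject near the right edge, square thumbnail.
    print(crop_around((1000, 600), (700, 100, 900, 300), (100, 100)))
```

For this photo a naive center crop would cover columns 200 to 800 and cut the subject in half, while the window above is shifted to columns 400 to 1000 and keeps it whole.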

Every owner of a site where users can upload images knows the moderation problem. Project Oxford's ability to detect adult and racy content shifts this difficult task onto the machine. All that is required when a photo is uploaded to your service is a parallel call to the Computer Vision APIs; then, based on the returned scores for each group of inappropriate content, you decide whether to send the image for additional human review or to flatly refuse publication. If such a solution is not enough and a more sophisticated approach is required, it may make sense to look at a group of similar services built on top of the Computer Vision APIs: Microsoft Content Moderator and PhotoDNA Cloud Service.
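The decision step described above might look roughly like this in Python. The category names, the assumption that scores lie between 0 and 1, and both thresholds are illustrative and would need tuning for a real site; only the three-way outcome (publish, send to a human, reject) comes from the scenario in the text.

```python
def moderation_decision(scores, review_at=0.4, reject_at=0.8):
    """Map per-category scores (assumed to be in the 0..1 range) to an action.

    The thresholds are placeholders for illustration, not recommended values.
    """
    worst = max(scores.values(), default=0.0)
    if worst >= reject_at:
        return "reject"    # flatly refuse publication
    if worst >= review_at:
        return "review"    # queue for a human moderator
    return "publish"


if __name__ == "__main__":
    # Hypothetical scores for three uploaded images.
    for scores in ({"adult": 0.05, "racy": 0.10},
                   {"adult": 0.50, "racy": 0.10},
                   {"adult": 0.10, "racy": 0.95}):
        print(scores, "->", moderation_decision(scores))
```

Keeping a middle "review" band means borderline images go to a person rather than being silently published or blocked.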

Anyone interested can find more information on the official website of the project, where, as before, additional documentation and convenient demos for experiments are available.

The Speech APIs services expose the algorithms that have been used for many years in the voice services of the Bing search engine, that recently appeared in Skype Translator, and that have now become a natural part of Windows 10 in the form of the already well-known virtual assistant Cortana. As you might guess, the Speech API includes services for converting speech in an audio file to text and vice versa.

The described functionality needs little introduction, so let us discuss only some additional details. First of all, languages, and here things are not in favor of Russian. Speech recognition currently supports only English, German, Spanish, French, Italian and Chinese, but this list is constantly expanding. Speech synthesis from text, pleasingly, supports a number of additional languages, including Russian, and can therefore be used right away. It is also worth noting that the recognition services support streaming processing with the ability to return preliminary results. This lets you significantly speed up the handling of the incoming stream and make the user interface as responsive as possible.
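To show why preliminary results help responsiveness, here is a small Python sketch of how an application might consume them. The event format, a sequence of ("partial" or "final", text) pairs, is invented for the example and does not describe the actual Speech API protocol; the point is only that the screen can be updated on every event instead of waiting for the whole phrase.

```python
def transcribe(events):
    """Consume (kind, text) recognition events.

    Returns the final transcript and the list of strings the UI would have
    shown after each event, so the user sees text while still speaking.
    """
    committed = []        # segments confirmed by "final" events
    display_states = []   # what the UI shows after each event
    for kind, text in events:
        if kind == "final":
            committed.append(text)
            display_states.append(" ".join(committed))
        else:
            # A partial guess is shown but may still be replaced later.
            display_states.append(" ".join(committed + [text]))
    return " ".join(committed), display_states


if __name__ == "__main__":
    sample_events = [
        ("partial", "remind"),
        ("partial", "remind me"),
        ("final", "remind me of training"),
        ("partial", "at"),
        ("final", "at 8 am"),
    ]
    transcript, states = transcribe(sample_events)
    for state in states:
        print(state)
    print("Final:", transcript)
```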

In addition, the Speech API is the only Project Oxford service that does not require a permanent Internet connection. The corresponding algorithms are built into the Universal Windows Platform and can be used offline in your universal applications on all Windows 10 devices.

If, in this age of victorious globalization, the lack of Russian language support is not an insurmountable restriction for you, or you want to find out whether the Speech API fits your particular scenario, I advise you to visit the main page of the project for more information about the solution, technical documentation and the interactive demos I have already recommended more than once.

If you already have a mobile application or website, or are just about to create one, think about how Project Oxford could be useful to you personally; I'm sure you will find something. With it, your solution will become more modern and stand out from many others, and users will appreciate usability and capabilities they have not seen before. And most importantly, you will not need to dig into complex, fearsome mathematics or spend a long time developing a complex algorithm that is ready for anything: no effort at all beyond a couple of lines of code to call the right service. With Project Oxford, using machine learning services is easier than ever.
