Recognizing eco-labels using Azure Custom Vision from a mobile app


In this article I want to talk about using the Custom Vision service to recognize photos of eco-labels from a mobile application.

The Custom Vision service is part of the cloud-based Cognitive Services on the Azure platform.
Below I describe which technologies we had to study, what Custom Vision is, how to work with it, and what it lets you achieve.

The task of recognizing eco-labels came up three years ago, when my wife and I began discussing a mobile application that her organization (an environmental NGO) wanted to build to spread information about eco-labels.

What is eco-labeling?

An eco-label is a certificate and a corresponding logo issued by certifying organizations that check the products or services of a manufacturer or supplier for compliance with certain criteria related to the life cycle of the product or service and focused on its environmental friendliness. After certification, the manufacturer can place the eco-label logo on its products.

Eco-labeling also includes the marks on plastic that indicate its composition to make sorting and recycling easier, and other similar signs.

For example, here is a sign:

The process of selecting recognition technology

The two main features of the application were meant to be searching for stores with eco-products and recognizing eco-labels. Technically, store search is relatively simple, but recognition is not. The word is fashionable, but it was not clear how to actually do it. So I began to study the question.

The label logos are standardized and are ideal objects for recognition: you point the phone at the image on the product packaging, take a picture, and the application shows which label it is, what it means and whether it should be trusted.

I started thinking about how to do recognition and analyzed different options. I tried OpenCV with its recognition algorithms (Haar cascades, SIFT, template matching, etc.), but the recognition quality was not very good: no more than 70% with a training set of several dozen images.

Perhaps I misunderstood something or did something wrong, but we also asked an acquaintance to look into this topic, and he also said that 70% with Haar cascades is the maximum on such a dataset.

Around the same time, more and more material began to appear about neural network frameworks and their successful use for such problems. But it all came with mentions of frighteningly large datasets (hundreds or thousands of images per class), Python and TensorFlow, which were unfamiliar to me, and the need to run our own backend. All of this was somewhat intimidating.

As a .NET developer, I looked at Accord.NET, but I did not quickly find anything there that would fit right away either.

At that time we were busy finishing and launching the application, so I put the recognition work on hold.

About a year ago, I came across an article describing Microsoft's early preview of Custom Vision, a cloud service for classifying images. I tested it on 3 labels and liked it: a clear portal where you can train and test a classifier without any technical knowledge, training on a set of 100 images takes 10-20 seconds, and classification quality is above 90% even with 30 images per label. Exactly what we needed.

I shared the find with my wife, and we started making a less feature-rich international version of the application, which does not contain information about products and stores but is able to recognize eco-labels.

Let's move on to the technical details of the working recognition application.

Custom Vision

Custom Vision (CV below) is part of Cognitive Services in Azure. It can now be provisioned and paid for through an Azure subscription, although it is still listed as a Preview.

Accordingly, like any other Azure product, Cognitive Services are displayed and managed on the Azure portal.

CV provides two REST APIs: one for training (Training) and one for recognition (Prediction). I describe the interaction with Prediction in more detail later.

In addition to the Azure portal and the API, CV users can access the customvision.ai portal, where you can easily and visually upload images, tag them, and see the images and recognition results that passed through the API.

You can start using the customvision.ai portal and the API without any link to Azure: a project for testing purposes can be created even without an Azure subscription. But if you plan to turn your test project into a production one later, it is better to create it under Azure right away. Otherwise you will have to do what we did: manually copy the images from the test project and tag them again in the production one.

To create a project in Azure, you need to register there and create a subscription. This is relatively easy; the only problems that sometimes occur are with entering and validating credit card data.

After registration, you need to create a Custom Vision instance through the Azure portal.

After the resources are created in Azure, they become available in customvision.ai.

On the customvision.ai portal you can upload images and tag them. One image can have several tags, but without selecting regions. That is, an image can belong to several classes, but at this stage of the service's development it is impossible to select a specific fragment of the image and assign it to a class.

After tagging, you start training by pressing the Train button. Training a model with 70 tags and 3 thousand images takes about 30 seconds.

The results of training are stored in an entity called an Iteration. In effect, Iterations implement versioning.

Each Iteration can be used independently. You can create an Iteration, test the result, and then either delete it if it does not fit or make it the default, replacing the current default Iteration. After that, all recognition requests from applications go to the model from this Iteration.

The quality of the model is displayed as Precision and Recall (more here) for all classes at once or for each class separately.
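For readers unfamiliar with these two metrics, here is a minimal sketch of how they are usually computed for a single tag. The class and method names are hypothetical, and this only illustrates the standard definitions, not how the portal calculates its figures internally.

```csharp
public static class ClassifierMetrics
{
    // Precision: of all images the model assigned to a tag,
    // the share that really carries that tag.
    public static double Precision(int truePositives, int falsePositives)
    {
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? 0.0 : (double)truePositives / predicted;
    }

    // Recall: of all images that really carry the tag,
    // the share the model managed to find.
    public static double Recall(int truePositives, int falseNegatives)
    {
        int actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double)truePositives / actual;
    }
}
```

High precision with low recall means the model is rarely wrong when it names a label but misses many real occurrences; the opposite means it finds most occurrences at the cost of more false positives.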

This is what a project looks like with images already uploaded and processed.

On the portal, you can use Quick Test to run recognition on a single image from disk or from a URL.

On the Predictions tab, you can see the results of all recent recognitions. The probability for each tag is displayed directly on the picture.

The ability to see all recognition results and add them to the training set with just a couple of mouse clicks helps a lot. Anyone can do this without any knowledge of AI or programming.

API usage

The Custom Vision service has a very simple and intuitive REST API for training and recognition.

Our application uses only the recognition API, so that is what I will talk about.

The recognition URL looks like this:

https://southcentralus.api.cognitive.microsoft.com/customvision/v2.0/Prediction/{Your project GUID}/image

where
southcentralus is the name of the Azure region where the service is hosted. For now the service is available only in the South Central US region. This does not mean it can only be used from there: it is merely hosted there, and you can call it from anywhere with Internet access.
{Your project GUID} is your project identifier. You can find it on the customvision.ai portal.
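The URL parts described above can be assembled with a small helper. This is a sketch with hypothetical names (PredictionUrl, ForImage). The optional iterationId query parameter reflects my understanding of the v2.0 Prediction API and should be treated as an assumption; without it, requests go to the project's default Iteration, as described earlier.

```csharp
using System;

public static class PredictionUrl
{
    // Builds the v2.0 image-prediction URL for a given region and project.
    // iterationId is optional (assumed query parameter); when it is omitted,
    // the service uses the project's default Iteration.
    public static string ForImage(string region, string projectId, string iterationId = null)
    {
        string url = "https://" + region
            + ".api.cognitive.microsoft.com/customvision/v2.0/Prediction/"
            + projectId + "/image";

        if (!string.IsNullOrEmpty(iterationId))
        {
            url += "?iterationId=" + Uri.EscapeDataString(iterationId);
        }
        return url;
    }
}
```

Keeping the region and project identifier as parameters makes it easy to move them into application settings instead of hard-coding them in the request code.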

For recognition you need to send the image via POST. You can also send the URL of a publicly available image, and the service will download it itself.

In addition, you need to add a "Prediction-Key" header containing one of the access keys issued upon registration. The keys are available both on the customvision.ai portal and on the Azure portal.
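The variant with a public image URL can be sketched as follows. The assumptions here are mine, not the article's: that the endpoint is the same path ending in /url instead of /image, and that the body is a JSON object with a single Url field. The names UrlPrediction and BuildRequest are hypothetical, so check both assumptions against the API reference before relying on them.

```csharp
using System.Net.Http;
using System.Text;

public static class UrlPrediction
{
    // Builds a POST request asking the service to download and classify
    // a publicly available image. predictionUrl is assumed to end in "/url".
    public static HttpRequestMessage BuildRequest(
        string predictionUrl, string predictionKey, string imageUrl)
    {
        // Minimal JSON escaping for the value (backslash and double quote only).
        string escaped = imageUrl.Replace("\\", "\\\\").Replace("\"", "\\\"");

        var request = new HttpRequestMessage(HttpMethod.Post, predictionUrl)
        {
            Content = new StringContent(
                "{\"Url\":\"" + escaped + "\"}", Encoding.UTF8, "application/json")
        };

        // One of the access keys from the customvision.ai or Azure portal.
        request.Headers.Add("Prediction-Key", predictionKey);
        return request;
    }
}
```

The request can then be passed to HttpClient.SendAsync, and the response has the same shape as for an uploaded image.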

The result contains the following field:

"Predictions":[
    {"TagId":"35ac2ad0-e3ef-4e60-b81f-052a1057a1ca","Tag":"dog","Probability":0.102716163},
    {"TagId":"28e1a872-3776-434c-8cf0-b612dd1a953c","Tag":"cat","Probability":0.02037274}
]

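Here Probability indicates the likelihood that the image belongs to the specified tag (class). Note that the field names in this sample (Predictions, Tag, Probability) follow an earlier version of the API; the v2.0 endpoint used in the code below returns predictions, tagName and probability.

In C# it looks like this: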
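If you prefer typed objects over dynamic access, the response can be mapped onto small classes. This sketch uses System.Text.Json instead of the Json.NET library used in this article, purely to keep the example self-contained. The class names are hypothetical, and the property names follow the tagName and probability fields read by the article's C# code.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public class Prediction
{
    public string TagId { get; set; }
    public string TagName { get; set; }
    public double Probability { get; set; }
}

public class PredictionResponse
{
    public List<Prediction> Predictions { get; set; } = new List<Prediction>();
}

public static class PredictionParser
{
    // Case-insensitive matching accepts both "predictions" and "Predictions",
    // so differences in casing between API versions do not break parsing.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    public static PredictionResponse Parse(string json)
    {
        return JsonSerializer.Deserialize<PredictionResponse>(json, Options);
    }
}
```

Fields that are present in the response but not declared in these classes are simply ignored, so the classes only need to list what the application actually uses.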

// Requires: System.Collections.Generic, System.Diagnostics, System.Net.Http,
// System.Net.Http.Headers and Newtonsoft.Json
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Prediction-Key", "{Access key}");
string url = "https://southcentralus.api.cognitive.microsoft.com/customvision/v2.0/Prediction/{Your project GUID}/image";
var recognitions = new List<RecognitionResult>();

// Send the raw image bytes as the request body.
using (var content = new ByteArrayContent(imageBytes))
{
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    HttpResponseMessage response = await client.PostAsync(url, content);
    if (response.IsSuccessStatusCode)
    {
        string strRes = await response.Content.ReadAsStringAsync();
        dynamic res = JsonConvert.DeserializeObject(strRes);
        // Copy each predicted tag and its probability into our own result type.
        foreach (var pr in res.predictions)
        {
            recognitions.Add(new RecognitionResult
            {
                Tag = pr.tagName,
                RecognPercent = pr.probability
            });
        }
    }
    else
    {
        Debug.WriteLine("Unsuccessful response. " + response.ToString());
    }
}

As you can see, there is absolutely nothing complicated here. All the magic happens on the service side.

The application and some chosen parameters

The application is quite simple and consists of a list of eco-labels, information about eco-labels and how they are categorized, and the scanner itself.

The main part is written in Xamarin.Forms, but the scanner screen works with the camera, so it had to be built with custom renderers and implemented for each platform separately.

The application treats an eco-label as recognized when the probability is >= 90%. At this threshold almost all images are still recognized, provided they are of more or less acceptable quality and there are no other signs in the picture.
This number was derived empirically: we started with 80 but realized that 90 reduces false positives. And there are quite a lot of those, because many labels look alike, contain similar elements and use color schemes that lean toward green.
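The thresholding logic can be sketched like this. The shape of RecognitionResult is an assumption (the article does not show the class), as is the convention that RecognPercent holds the raw probability in the range 0 to 1. LabelMatcher and BestMatch are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the result type used in the recognition code.
public class RecognitionResult
{
    public string Tag { get; set; }
    public double RecognPercent { get; set; } // probability, 0..1
}

public static class LabelMatcher
{
    // Empirical threshold from the article: 0.80 gave too many false positives.
    public const double Threshold = 0.90;

    // Returns the most probable tag if it reaches the threshold, otherwise null.
    public static RecognitionResult BestMatch(
        IEnumerable<RecognitionResult> results, double threshold = Threshold)
    {
        return results
            .OrderByDescending(r => r.RecognPercent)
            .FirstOrDefault(r => r.RecognPercent >= threshold);
    }
}
```

A null result is the signal for the scanner to ask the user for another photo rather than show a label it is not confident about.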

For example, this image, which is far from the best quality, is recognized correctly with a probability of 91%.

This class was trained on just 45 images.

I hope the article was useful and will allow interested readers to look at new AI and ML tools.