Clipart and graphics in Yandex.Pictures


One of our main sources of information about what users need is the analysis of their queries. Users of Yandex.Pictures quite often search for specific types of images, for example [Victory Day clipart], [people clipart], [coloring pages], [internal combustion engine diagram], [embroidery pattern], and so on. In short, these queries look for clipart (images of objects on a uniform background) and graphics (hand-drawn pictures, sketches, diagrams, coloring pages, and images of objects drawn with thin lines).

To make such images easier to find, we added new filters to the advanced search in Yandex.Pictures. They are called “clipart” and “graphics”.

 

 

Now some technical details on how our algorithms learned to recognize such images.

First, we had to identify the characteristic features that distinguish clipart and graphics from other images. These include, for example, the presence of a single object in the picture, the uniformity of the background color, certain gradient values, and the distance from the edge of the image to the object in the frame. Having settled on several dozen such attributes, we implemented algorithms that determine whether each attribute is present in a given picture and how strongly it is expressed.
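To make the idea of attributes more concrete, here is a minimal sketch of how three of them could be computed for a grayscale image. It is only an illustration, not the production code: the function names, the tolerance `tol`, and the exact definitions of each attribute are assumptions made for this example.

```python
from statistics import median


def border_pixels(img):
    """Collect the pixels on the outer frame of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    frame = list(img[0])
    if h > 1:
        frame += img[-1]
    for y in range(1, h - 1):
        frame.append(img[y][0])
        if w > 1:
            frame.append(img[y][w - 1])
    return frame


def extract_features(img, tol=10):
    """Compute three illustrative attributes for a grayscale image (values 0..255)."""
    h, w = len(img), len(img[0])
    # Assumption: the outer frame of the picture shows the background.
    background = median(border_pixels(img))

    # A pixel counts as part of the object if it differs noticeably from the background.
    is_object = [[abs(p - background) > tol for p in row] for row in img]
    object_pixels = sum(sum(row) for row in is_object)
    background_share = 1 - object_pixels / (h * w)

    # Mean absolute difference between horizontal and vertical neighbours.
    diffs = [abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1)]
    diffs += [abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w)]
    mean_gradient = sum(diffs) / len(diffs) if diffs else 0.0

    # Smallest distance in pixels from the image edge to the object's bounding box.
    rows = [y for y in range(h) if any(is_object[y])]
    cols = [x for x in range(w) if any(is_object[y][x] for y in range(h))]
    if rows:
        margin = min(rows[0], h - 1 - rows[-1], cols[0], w - 1 - cols[-1])
    else:
        margin = None  # no object was found
    return {
        "background_share": background_share,
        "mean_gradient": mean_gradient,
        "object_margin": margin,
    }


# Toy 6x6 picture: a dark 2x2 square in the middle of a white background.
toy = [[255] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        toy[y][x] = 0
print(extract_features(toy))
```

A picture with a large share of uniform background and a clear margin around a single object would look more like clipart under these toy definitions. A real system would need many more attributes than these three.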

To teach the machine to decide automatically whether a picture belongs to a class, we used a classifier based on the support vector machine (SVM) method. The system was trained on a sample of several thousand images. Each image was manually labeled as “clipart” or “not clipart” and as “graphics” or “not graphics”, and was paired with the precomputed attributes mentioned above.

This data is fed to the classifier, which represents each image as a point in a multidimensional space. The coordinates of the point depend on the presence and strength of the attributes. The classifier's task is to find a function that divides the space into regions, each containing objects of a single class.
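The sketch below shows the general idea on a toy scale. It trains a linear SVM by subgradient descent on the regularized hinge loss, so the separating function is simply f(x) = w·x + b. This is an assumption for illustration: the article does not say which kernel, solver, or parameters were used, and the two attributes and six labeled points are invented.

```python
def dot(a, b):
    """Dot product of two equally long lists of numbers."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Fit w and b for f(x) = w.x + b by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # The point is misclassified or lies inside the margin.
            violated = y * (dot(w, x) + b) < 1
            # Regularization step: shrink the weights slightly.
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss step: push the boundary away from this point.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if dot(w, x) + b >= 0 else -1


# Invented toy sample: (share of uniform background, gradient strength) per image.
points = [
    [0.90, 0.10], [0.85, 0.20], [0.95, 0.15],  # labeled "clipart"
    [0.20, 0.80], [0.30, 0.90], [0.10, 0.70],  # labeled "not clipart"
]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, [0.95, 0.05]))  # a point deep inside the "clipart" region
```

With two labels per image ("clipart" and "graphics"), one simple option is to train two such binary classifiers independently, one per label.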

Training was not without difficulties. In the early stages, the machine found it hard to decide whether some images were clipart and whether others were graphics.

We had to improve the quality of the existing attributes and compute new ones, as well as expand the test sample and collect additional manual assessments.

The classification results are stored in the index as bit flags and are used to filter search results.
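As a rough illustration of filtering by bit flags, the sketch below keeps one small integer per image and tests it with a bitwise AND. The flag values, the dictionary used as an "index", and the function name are all hypothetical. The real index layout is not described here.

```python
# Hypothetical flag layout: one bit per class.
CLIPART = 1 << 0
GRAPHICS = 1 << 1

# Toy "index": image id -> flags set by the classifiers.
index = {
    "img1": CLIPART,
    "img2": GRAPHICS,
    "img3": 0,                   # neither clipart nor graphics
    "img4": CLIPART | GRAPHICS,  # both flags set
}


def filter_results(image_ids, index, required):
    """Keep only the images whose flags include every bit in `required`."""
    return [i for i in image_ids if index[i] & required == required]


results = ["img1", "img2", "img3", "img4"]
print(filter_results(results, index, CLIPART))
```

Checking a flag is a single bitwise operation per image, which keeps the filter cheap to apply to a large result list.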

 

In addition to clipart and graphics, Yandex.Pictures can help you find images by size, color, format, and orientation. You will find these and other options on the advanced search page.

Dmitry Kotlyarov, Nikolay Shturkin and the Yandex.Pictures team
