In search of UFOs. Detecting objects in images

    Breaking CAPTCHAs is, of course, interesting and instructive, but by and large useless. It is just a special case of a problem from one of the most interesting areas of IT: pattern recognition.

    Today we will look at an algorithm (strictly speaking, a technique, since it combines several algorithms) that sits at the junction of machine learning and computer vision.

    Using this technique, we will search for UFOs in images (yes, we are encroaching on the sacred).


    The technique was first described in the paper “Rapid Object Detection using a Boosted Cascade of Simple Features” by Paul Viola and Michael Jones (2001). Since then it has gained recognition and become widely used in its field, which, as you might guess, is finding objects in images and video streams.

    The technique was originally developed for face detection, but nothing prevents training it to find other objects: cars, prohibited items in airport X-ray scans, tumors in medical images. In short, as you can see, this is serious stuff that can be of great benefit to humanity.

    Description of the technique


    The technique is based on the adaptive boosting algorithm (AdaBoost for short). The idea is this: given a set of labeled reference examples, i.e. feature values together with the class each one belongs to (for example, -1 for “no face” and +1 for “face”), and a large pool of simple classifiers, we can combine them into a single, much more powerful classifier. While the final classifier is being built (trained), the emphasis falls on the examples that are recognized “worse”. This is what makes the algorithm adaptive: during training it adapts to the most “difficult” examples. You can see the algorithm in action here.
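
    The reweighting idea described above can be sketched in a few lines. This is a minimal illustration of classic discrete AdaBoost over one-feature threshold classifiers (“stumps”), not the OpenCV implementation; the function names and the exhaustive stump search are assumptions made for this example.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over decision stumps.

    X: (n, d) float array of feature values, y: labels in {-1, +1}.
    Returns a list of (feature_index, threshold, polarity, alpha).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # start with uniform example weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        # exhaustive search for the stump with the lowest weighted error
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * np.log((1 - err) / err)    # vote of this weak classifier
        # misclassified examples (y * pred == -1) gain weight, the rest lose it
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all weak classifiers."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)

# Toy data that no single threshold can separate: +1 at both ends, -1 in the middle.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1, 1, -1, -1, 1, 1])
model = train_adaboost(X, y, n_rounds=3)
print(predict(model, X))
```

    On this toy set a single stump gets at most four of the six examples right, while three boosted stumps classify all of them correctly, because each round concentrates on the examples the previous rounds got wrong.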

    In general, AdaBoost is a very efficient and fast algorithm. In my own projects I use it to detect weak anomalies against a background of strong interference, in data of a quite different nature that has nothing to do with images. That is, the algorithm is universal, and I recommend paying attention to it. It is common in data mining, which is currently popular on this site, and even made it into the “Top 10 algorithms in data mining” list. A very informative publication; I recommend it to everyone.

    Haar-like features

    The question is: how do we describe an image? What should serve as the features for classification, given that this has to be fast and that our objects can vary in shape, color, and tilt angle? This technique uses so-called Haar-like features (I will call them primitives below).

    In the figure above you can see a set of such primitives. To grasp the idea, imagine that we take a reference image and overlay one of the primitives on it, for example 1a. We then compute the sum of the pixel values under the white area of the primitive (left side) and under the black area (right side), and subtract the second from the first. The result is a generalized measure of the anisotropy of that part of the image.
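
    As a rough sketch of this computation, the snippet below evaluates a two-rectangle primitive (white left half minus black right half). It uses an integral image (summed-area table), the trick from the Viola-Jones paper that makes every rectangle sum cost four lookups; the article itself does not go into this detail, and the function names are made up for the example.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column,
    so that ii[y, x] equals the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y)."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Edge primitive: white left half minus black right half (w must be even)."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black

# A 4x4 patch whose left half is brighter than its right half.
patch = np.zeros((4, 4), dtype=np.uint8)
patch[:, :2] = 10
ii = integral_image(patch)
print(haar_two_rect_horizontal(ii, 0, 0, 4, 4))  # 80: eight pixels of 10 minus zero
```

    A uniform patch gives 0 for this primitive, which is why the value works as a measure of how strongly brightness changes across that region.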

    But there is a problem. Even for a small image, the number of primitives that can be overlaid on it is very large: for a 24x24 image there are about 180,000 of them. The job of the AdaBoost algorithm is to pick the primitives that most effectively discriminate the given object.

    For instance:

    For the object on the left, the algorithm chose two primitives. For obvious reasons, the eye region is darker than the middle of the face and the nose. Primitives of this configuration and size “characterize” this image best.

    A cascade is then built from such classifiers, each made up of the most effective primitives. Every subsequent stage of the cascade imposes stricter conditions for passing than the previous one (it uses more primitives), so only the most “correct” candidates reach the end.
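
    The control flow of such a cascade can be sketched as follows. The data layout (a stage as a list of weak classifiers plus a stage threshold) is an assumption chosen for readability and is not the structure OpenCV uses internally.

```python
def stage_score(weak_classifiers, window):
    """Weighted vote of one stage. Each weak classifier is a tuple
    (feature_function, threshold, polarity, alpha)."""
    score = 0.0
    for feature, threshold, polarity, alpha in weak_classifiers:
        vote = 1 if polarity * (feature(window) - threshold) >= 0 else -1
        score += alpha * vote
    return score

def cascade_accepts(stages, window):
    """Run the stages in order and reject as soon as one of them fails."""
    for weak_classifiers, stage_threshold in stages:
        if stage_score(weak_classifiers, window) < stage_threshold:
            return False    # rejected early: later, costlier stages never run
    return True

# Hypothetical two-stage cascade over a "window" that is just a pair of numbers.
first = lambda w: w[0]
second = lambda w: w[1]
stages = [
    ([(first, 5.0, 1, 1.0)], 0.0),                           # cheap first stage
    ([(first, 5.0, 1, 1.0), (second, 3.0, 1, 1.0)], 1.5),    # stricter second stage
]
print(cascade_accepts(stages, [6, 4]), cascade_accepts(stages, [1, 9]))
```

    Most windows in a real image contain no object, so rejecting them in the first, cheapest stages is a large part of what makes the detector fast.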

    Implementation of algorithms

    We are lazy, so we will use the implementation of this technique from the OpenCV library. It already ships modules for creating samples, training a cascade, and testing it. I must say the implementation is quite rough, so be prepared for frequent crashes, hangs during training, and other unpleasant things. Several times I had to dive into the sources and patch them for my own needs. A very detailed and easy-to-follow tutorial on working with this implementation can be found here.

    Training is a very lengthy business. Done properly, it can take 3-7 days, so we will simplify the task as much as possible: I have neither the time nor the computing resources to spend a week on training. Training the cascade for this article took one day on a Core 2 Duo.

    It should be noted that the OpenCV implementation uses a more advanced modification of the AdaBoost algorithm: Gentle AdaBoost.

    Formulation of the problem

    That is all for the theory. Let's move on to practice. Our task is to find this (I am not much of an artist):

    in images like these (bear in mind that all color images are converted to grayscale during processing, otherwise the number of variants would be too large):

    Provided that:

    1. The object may vary in color: ±50 levels from the original.
    2. The object may vary in size, by up to a factor of 3.
    3. The object may be tilted; the angle varies within 30°.
    4. The object is located at a random position in the image.

    Stage 1. Creating the training set

    The first and very important step is to create a training set. There are two ways to go: feed the trainer a pre-compiled database of images (for example, faces), or generate a given number of samples from a single reference object. The latter option suits us, all the more so because OpenCV has a createsamples module that generates samples from one object. A couple of hundred space images serve as background images, i.e. images that do not contain the desired object (example above).

    Depending on the parameters given, the module takes the reference object, applies various distortions to it (rotation, a change in color, added noise), then picks a background image and places the object on it at a random position. The result looks like this:

    For real tasks you should aim for a training set of about 5000 samples. I generated 2000 of them.
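
    The general idea of such a generator can be illustrated with a small dependency-free sketch. This is not the createsamples module: it only varies brightness, size, and position within the limits from the problem statement, leaves out rotation and noise, and all names in it are hypothetical.

```python
import numpy as np

def make_sample(obj, background, rng, max_shift=50, max_scale=3.0):
    """Paste a randomly scaled and brightened copy of `obj` onto a copy of
    `background`. Both are 2-D uint8 grayscale arrays.
    Returns (image, (x, y, w, h)), where the tuple is the object's bounding box."""
    scale = rng.uniform(1.0, max_scale)
    h = int(round(obj.shape[0] * scale))
    w = int(round(obj.shape[1] * scale))
    if h > background.shape[0] or w > background.shape[1]:
        raise ValueError("scaled object does not fit into the background")
    # nearest-neighbour resize: map each target row/column to a source index
    rows = np.arange(h) * obj.shape[0] // h
    cols = np.arange(w) * obj.shape[1] // w
    patch = obj[rows][:, cols].astype(np.int16)
    # shift brightness by up to +/- max_shift, then clip back into 0..255
    shift = int(rng.integers(-max_shift, max_shift + 1))
    patch = np.clip(patch + shift, 0, 255).astype(np.uint8)
    # random placement that keeps the object fully inside the image
    y = int(rng.integers(0, background.shape[0] - h + 1))
    x = int(rng.integers(0, background.shape[1] - w + 1))
    image = background.copy()
    image[y:y + h, x:x + w] = patch
    return image, (x, y, w, h)

rng = np.random.default_rng(0)
ufo = np.full((10, 20), 100, dtype=np.uint8)     # stand-in for the reference object
space = np.zeros((100, 100), dtype=np.uint8)     # stand-in for a background image
image, box = make_sample(ufo, space, rng)
print(image.shape, box)
```

    Returning the bounding box together with the image matters because both the trainer and the tester need to know where the object really is.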

    Stage 2. Training

    Now we need to build a cascade of classifiers from the set of samples we have. For this we use the haartraining module. It takes many parameters, the most important of which are the number of classifiers (stages) in the cascade, the minimum required hit rate of each classifier, and the maximum permissible false alarm rate. There are many more parameters, and those who decide to repeat the experiment can read about them in more detail here.
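
    The interplay of these three parameters is easy to check with a little arithmetic: a window must pass every stage, so the per-stage rates multiply. The values below are purely illustrative and are not the ones used for this article; the calculation also assumes that each stage exactly meets its targets and that stages err independently.

```python
def cascade_rates(n_stages, min_hit_rate, max_false_alarm):
    """Overall bounds for a cascade in which every stage just meets its targets."""
    overall_hit_rate = min_hit_rate ** n_stages          # lower bound on detections kept
    overall_false_alarm = max_false_alarm ** n_stages    # upper bound on false alarms
    return overall_hit_rate, overall_false_alarm

hit, false_alarm = cascade_rates(n_stages=7, min_hit_rate=0.995, max_false_alarm=0.5)
print(f"overall hit rate >= {hit:.4f}, overall false alarm rate <= {false_alarm:.4f}")
```

    This is why a stage may let through as many as half of the negatives it sees: after enough stages the combined false alarm rate is still small, while the hit rate must stay very close to 1 at every stage so that the product does not collapse.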

    Stage 3. Testing the cascade

    After a long wait, the program produces a trained cascade in the form of an XML file that can be used directly for object detection. To test it, we generate another 1000 objects in the way described in the first stage, which gives us a test set.

    The cascade is tested with the performance module. We feed it the test set and the cascade, and a few seconds later the console shows the following:

    +================================+========+========+========+
    | File Name                      |   Hits | Missed |  False |
    +================================+========+========+========+
    | 0001_0032_0126_0138_0066.jpg   |      1 |      0 |      0 |
    +--------------------------------+--------+--------+--------+
    | 0002_0088_0079_0188_0091.jpg   |      1 |      0 |      1 |
    +--------------------------------+--------+--------+--------+
    | 0003_0059_0170_0127_0061.jpg   |      0 |      1 |      0 |
    +--------------------------------+--------+--------+--------+
    | 0004_0035_0143_0134_0065.jpg   |      1 |      0 |      0 |
    +--------------------------------+--------+--------+--------+
    | Total                          |    457 |    543 |    570 |
    +================================+========+========+========+
    Number of stages: 7
    Number of weak classifiers: 34
    Total time: 14.114000

    First of all, look at the time (the “Total time” value) it took to process 1000 images. Even though the images also had to be read from disk, processing one image takes a fraction of a second: 14 s / 1000 = 14 ms. Very fast.

    Now for the classification results themselves. “Hits” is the number of objects found, “Missed” is the number of objects missed, and “False” is the number of false positives (i.e., the cascade gave a positive response in a region where there is no object). Overall, this is a bad result. :) More precisely, it is good enough as an example for this article, but for real-life tasks you should take more care over the training set and find the optimal training parameters. Then a hit rate of 95% with a false positive rate of 0.001 is achievable.
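
    The totals from the table translate into rates with a few divisions. The helper below is just a convenience for this calculation; its name and the choice of metrics are assumptions, while the numbers come from the console output above.

```python
def summarize(hits, missed, false_alarms, n_images, total_seconds):
    """Turn the totals reported by the performance module into rates."""
    n_objects = hits + missed
    return {
        "hit_rate": hits / n_objects,
        "miss_rate": missed / n_objects,
        "false_alarms_per_image": false_alarms / n_images,
        "ms_per_image": 1000.0 * total_seconds / n_images,
    }

stats = summarize(hits=457, missed=543, false_alarms=570,
                  n_images=1000, total_seconds=14.114)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

    So the cascade finds about 46% of the objects and raises roughly one false alarm for every two images, at about 14 ms per image.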

    Some results of the algorithm:

    Here are a couple of examples with false positives:


    The technique described here has fairly wide applications and combines well with other algorithms. For example, it can be used to locate an object in an image, while a classical neural network or some other method handles recognition.

    Thank you for your attention, I hope it was interesting.

    Further reading, in addition to the sources mentioned:
    An empirical analysis of boosting algorithms for rapid objects with an extended set of Haar-like features.
    Implementing Bubblegrams: The Use of Haar-Like Features for Human-Robot Interaction.

    This article shows that pattern recognition does not live by neural networks alone. So, to continue this line of thought, in the next article I would like to talk about object recognition using a statistical approach, namely multidimensional statistical characteristics of the image and Principal Component Analysis (PCA).
