Image classifier

    Given a bit matrix containing a filled image of a circle, square, or triangle.
    The image may be slightly distorted and may contain noise.
    Write an algorithm that determines, from the matrix, which figure is drawn.

    This task, simple at first glance, greeted me at the entrance exam to DM Labs.
    In the first lesson we discussed the solution, and the instructor (Alexander Shlemov, who also supervised the subsequent implementation) showed why it is better to solve it with machine learning.

    During the discussion we found that our solution works in two stages. The first stage is noise filtering; the second is computing the metric by which classification will be done. This raises the problem of choosing boundaries: you need to know what values a metric can take for each figure. You could draw these boundaries by hand, “by eye”, but it is better to entrust this work to a mathematically sound algorithm.
    This learning task was my introduction to machine learning, and I would like to share the experience with you.

    Main idea


    First, we generate many random images of circles, triangles, and quadrilaterals. Then we pass the pictures through filters to remove noise and compute a set of metrics that will be used to classify the figures. We determine the boundary values for the metrics with a decision tree, or rather with its CART variant (more details can be found here). The algorithm is simple to implement and is easily interpreted in terms of boundaries.
    By passing an unknown image through the same filters and computing its metrics, we can ask the decision tree which class the figure in the picture belongs to.

    Generation


    The generator must not only create pictures, but also add noise to them. When generating, we discard shapes with sides shorter than a certain size (for example, 15 pixels) and with very obtuse angles (more than 150 degrees): even a person would have trouble classifying those.
    The script was kindly provided by Victor Evstratov.
    bitbucket.org/rampeer/image-classifier/src/HEAD/picture_generator.py
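    For illustration, here is a minimal sketch in the spirit of the generator (my own simplification, not the repository script): it rasterizes one random filled triangle and sprinkles salt-and-pepper noise over it. The real generator also produces circles and quadrilaterals and checks the angles, which this sketch skips.

        import numpy as np

        def random_triangle_image(size=64, noise=0.05, min_side=15, seed=None):
            rng = np.random.default_rng(seed)
            while True:
                pts = rng.integers(5, size - 5, size=(3, 2)).astype(float)
                sides = [np.linalg.norm(pts[i] - pts[(i + 1) % 3]) for i in range(3)]
                if min(sides) >= min_side:   # discard triangles with too-short sides
                    break
            ys, xs = np.mgrid[0:size, 0:size]
            img = np.ones((size, size), dtype=bool)
            for i in range(3):  # a pixel is inside iff it lies on the inner side of every edge
                a, b, c = pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]
                cross = (b[0] - a[0]) * (ys - a[1]) - (b[1] - a[1]) * (xs - a[0])
                ref = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
                img &= cross * ref >= 0
            img ^= rng.random((size, size)) < noise  # flip random pixels as noise
            return img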

    Filters


    During a discussion with colleagues we came up with good ideas for filters. Black pixels surrounded by pixels of the opposite color are treated as noise (this idea became the basis of the median filter), and a large cluster of black pixels is taken to be the desired figure, that is, we use a bicomponent (connected-component) filter.

    Median filter: for each black pixel we count the number of “black” neighbors within a neighborhood of radius R. If this number is less than some threshold T, we discard the pixel, that is, paint it white. In the “original” median filter a pixel is discarded if fewer than half of its neighbors are black; since we choose the threshold T ourselves, this is really a quantile filter.
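    A straightforward version of this quantile filter might look like the sketch below (my own code, assuming img is a boolean NumPy array where True means a black pixel; the default parameter values are arbitrary):

        import numpy as np

        def quantile_filter(img, radius=2, threshold=6):
            h, w = img.shape
            out = img.copy()
            for y in range(h):
                for x in range(w):
                    if not img[y, x]:
                        continue
                    window = img[max(0, y - radius):y + radius + 1,
                                 max(0, x - radius):x + radius + 1]
                    # count black neighbours (the window includes the pixel itself)
                    if window.sum() - 1 < threshold:
                        out[y, x] = False  # too few black neighbours: treat as noise
            return out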



    I wrote the median filter honestly, “head-on”, so it runs in O(WHR^2), where W and H are the dimensions of the picture. The teacher said the algorithm can be sped up with the Fourier transform. Indeed, the median filter can be expressed through a convolution, and a convolution can be computed quickly using the fast Fourier transform.
    It turns out that the matrix with the neighbor counts can be computed as
    result = FFT^(-1)(FFT(data) · FFT(window))
    This algorithm runs in O(WH·log(W+H)), which is much faster than the naive implementation, and the cost does not depend on the window size. Because the convolution is circular, an artifact appears at the image borders: when processing the leftmost pixels, the rightmost pixels are counted among their neighbors. This can be corrected by padding the image with a frame of blank pixels, or it can be left as is, as I did; in any case this effect does not noticeably affect the result.
    bitbucket.org/rampeer/image-classifier/src/HEAD/filter_median.py
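    For illustration, the same neighbor count can be obtained with a ready-made FFT-based convolution from SciPy (a sketch, not the repository code; note that scipy.signal.fftconvolve zero-pads the input, so it also sidesteps the wrap-around artifact mentioned above):

        import numpy as np
        from scipy.signal import fftconvolve

        def quantile_filter_fft(img, radius=2, threshold=6):
            window = np.ones((2 * radius + 1, 2 * radius + 1))
            # black pixels in the window around every pixel, minus the pixel itself
            neighbours = fftconvolve(img.astype(float), window, mode="same") - img
            # a black pixel survives only if it has at least `threshold` black neighbours
            return img & (np.rint(neighbours) >= threshold)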
    However, I found that this filter has an unpleasant property: it rounds off the sharp corners of triangles. Because of this, the window radius R has to be kept rather small and the filter applied only a few (N) times, although at first it seemed that you could keep applying the median filter for as long as it removes any pixels.

    Bicomponent filter: take an arbitrary black pixel and assign it some number Q. Assign the same number to the “black” neighbors of this pixel within a neighborhood of radius R, then repeat for the neighbors of the neighbors, and so on until the island stops growing (this resembles the Fill tool in Paint, except that we paint with the “color” Q). Then we increment Q, pick the next “unpainted” black pixel, and repeat the process.
    After running this algorithm we get a set of non-touching islands. We can say with high confidence that the largest island is the figure, and the remaining islands are noise.


    <filter_bicomp.py>
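    A sketch of the same idea using ready-made connected-component labelling from SciPy instead of an explicit flood fill (8-connectivity here plays the role of a small radius R; the repository's filter may differ):

        import numpy as np
        from scipy import ndimage

        def largest_component(img):
            # number every island of black pixels 1..count
            labels, count = ndimage.label(img, structure=np.ones((3, 3)))
            if count == 0:
                return img
            sizes = ndimage.sum(img, labels, index=range(1, count + 1))
            return labels == (1 + int(np.argmax(sizes)))  # keep only the biggest island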
    Unlike the previous one, this filter does not spoil the image. It can do harm only if the figure is crossed by a strip of noise wider than T, which is unlikely.
    I also found another property of the median filter, this time a positive one: it breaks space filled with noise into “islands”. So after applying the bicomponent filter we are left with the contour plus some noise stuck to it. After the bicomponent filter it is worth running the median filter again to remove the remaining bumps. Then we build a convex contour around the remaining points, fill it, and compute the metrics on this new image, as in the sketch below.
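    One possible way to fill the convex contour of the remaining points (my own sketch; process.py may do this differently, and it assumes at least three non-collinear black pixels remain):

        import numpy as np
        from scipy.spatial import Delaunay

        def fill_convex_hull(img):
            pts = np.argwhere(img)  # (row, col) coordinates of the black pixels
            if len(pts) < 3:
                return img
            tri = Delaunay(pts)     # triangulates the interior of the convex hull
            ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
            grid = np.column_stack([ys.ravel(), xs.ravel()])
            inside = tri.find_simplex(grid) >= 0  # -1 means "outside the hull"
            return inside.reshape(img.shape)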
    bitbucket.org/rampeer/image-classifier/src/HEAD/process.py
    The filters at work: source image; after the median filter; after median + bicomponent; after median + bicomponent + median + fill.

    Filter parameters are selected based on the size of the images in the sample and their noise. For the tricky quadrilateral shown above:
        median.filter(img, 2, 6, 1)
        bicomp.filter(img, 2)
        median.filter(img, 2, 5, 2)
    

    In our sample the images will be less noisy, so the filter settings can be gentler.

    Metrics


    In the course of the same discussion with colleagues, we came up with many different interesting metrics.

    Counting angles. This is the first thing that comes to mind after reading the task. But because of noise, extra angles close to 0 degrees may appear. I tried, unsuccessfully, to deal with this by “gluing together” almost collinear vectors and setting thresholds. Such methods are hard to tune, and they can still give a wrong result, because the figure is smoothed during filtering. It is better to sum the squares of the sines of the angles and, for angles greater than a certain threshold T, round the square up to one. This gives a fairly good result: sharp corners add one to the counter, while angles close to 0 add almost nothing. By the way, it amused me that with this metric the “number of angles” of a triangle can range from 2.5 to 4.

    bitbucket.org/rampeer/image-classifier/src/HEAD/feature_edges.py
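    A sketch of how this metric could be computed on the convex contour (the function name and the threshold value are my own choices, not necessarily what feature_edges.py does):

        import numpy as np
        from scipy.spatial import ConvexHull

        def corner_score(img, threshold_deg=30.0):
            pts = np.argwhere(img).astype(float)
            hull = pts[ConvexHull(pts).vertices]   # hull vertices in traversal order
            score, n = 0.0, len(hull)
            for i in range(n):
                v1 = hull[i] - hull[i - 1]         # incoming edge
                v2 = hull[(i + 1) % n] - hull[i]   # outgoing edge
                cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
                turn = np.arccos(np.clip(cos, -1.0, 1.0))  # turn angle at this vertex
                score += 1.0 if np.degrees(turn) > threshold_deg else np.sin(turn) ** 2
            return score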

    Mirroring. The idea of this metric is to measure how much the figure coincides with a copy of itself flipped horizontally / vertically / along both axes; that is, it is a kind of measure of symmetry. A circle, whatever you do, always coincides with itself completely. I also intuitively expected a square to coincide with itself more than a triangle does.

    bitbucket.org/rampeer/image-classifier/src/HEAD/feature_mirror.py
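    A possible implementation sketch (the exact overlap measure used in feature_mirror.py may differ; here the figure is first cropped to its bounding box so the flips happen about its own center):

        import numpy as np

        def mirror_score(img):
            ys, xs = np.nonzero(img)
            fig = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # crop to the figure
            def overlap(a, b):  # share of black pixels that coincide after the flip
                return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)
            return (overlap(fig, np.fliplr(fig)) +
                    overlap(fig, np.flipud(fig)) +
                    overlap(fig, np.flipud(np.fliplr(fig)))) / 3.0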

    Ratio of area to squared perimeter. The perimeter must be squared so that the metric is dimensionless and does not depend on the size of the image. The perimeter and area are taken from the convex contour. The circle has the largest value of this metric among the figures: s/p^2 = (πr^2)/(2πr)^2 = 1/(4π) ≈ 0.080. For an equilateral triangle (it has the largest ratio among triangles): s/p^2 = (√3·a^2/4)/(3a)^2 = √3/36 ≈ 0.048. For a square: s/p^2 = a^2/(4a)^2 = 1/16 = 0.0625.

    bitbucket.org/rampeer/image-classifier/src/HEAD/feature_perimeter.py
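    Computed on the convex hull, this metric takes only a couple of lines (a sketch; feature_perimeter.py may differ):

        import numpy as np
        from scipy.spatial import ConvexHull

        def area_over_perimeter_sq(img):
            hull = ConvexHull(np.argwhere(img).astype(float))
            # for a 2-D point set, .volume is the enclosed area and .area is the perimeter;
            # expect ~1/(4π) ≈ 0.080 for a circle, 1/16 for a square, √3/36 ≈ 0.048
            # for an equilateral triangle
            return hull.volume / hull.area ** 2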

    Metrics on the bounding rectangle. The previous metrics did not separate quadrilaterals from triangles very well, so I decided to come up with a new one. Draw a bounding rectangle around the figure. For each side of the rectangle, find the first (“minimum”) and last (“maximum”) intersection with the figure. This gives an “octagon” for which various metrics can be computed.

    For example, the ratio of the figure's area to the area of the bounding rectangle (sbound).
    bitbucket.org/rampeer/image-classifier/src/HEAD/feature_sbound.py
    The ratio of the figure's area to the area of this octagon (sbound2) also looks like a promising metric:
    bitbucket.org/rampeer/image-classifier/src/HEAD/feature_sbound2.py
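    A sketch of sbound, the simpler of the two (sbound2 would additionally build the “octagon” from the extreme intersection points on each side of the box and use its area instead of the rectangle's):

        import numpy as np

        def sbound(img):
            ys, xs = np.nonzero(img)
            box_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
            return img.sum() / box_area  # figure area relative to its bounding rectangle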

    Data collection


    Applying the acquired knowledge, I generated images and collected statistics. To this image set I added shapes that represent potential “bad cases” for the metrics:
    • equilateral triangle
    • right triangle with two sides parallel to the axes
    • a square with sides parallel to the axes.

    I gave these cases a large weight when building the tree. The point is that the probability of randomly generating such pictures is small, yet they are borderline cases for some metrics. Therefore, to be classified correctly, they have to be added to the sample with a large weight.
    I put the image-filtering and metric-collection procedure into a separate file; it will be needed for the analysis. By the way, in decision-tree terminology our metrics are called “input parameters” or “features”.
    bitbucket.org/rampeer/image-classifier/src/HEAD/process.py
    bitbucket.org/rampeer/image-classifier/src/HEAD/stats.py
    Then I plotted the data to make sure that the metrics distinguish the figures well enough. A box-and-whisker plot is suitable for this:

    The "antennae" intersect, which means that inaccuracies are possible in the analysis. As it was written above, just for accuracy we need several metrics, and not one. Then I tried to make sure that the “bad cases” of metrics do not intersect. To do this, I built a dependence of one metric on another. If it turns out that they are monotonously dependent on each other, then their “bad cases" will also coincide.

    As we can see from the plots, the point “clouds” overlap considerably, so large classification errors are possible. On the other hand, the metrics are not monotonically dependent on each other.

    Analysis


    Based on the collected data, you can build a decision tree:
    bitbucket.org/rampeer/image-classifier/src/HEAD/analyze.py
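    For illustration, a minimal sketch of this step with scikit-learn's CART implementation (analyze.py may differ; the feature names, tree depth, and split ratio here are arbitrary, and sample_weight is where the “bad cases” receive their large weights):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier, export_text

        def build_tree(X, y, sample_weight, feature_names):
            # hold out a quarter of the generated images to estimate accuracy
            X_tr, X_te, y_tr, y_te, w_tr, _ = train_test_split(
                X, y, sample_weight, test_size=0.25, random_state=0)
            tree = DecisionTreeClassifier(max_depth=5, random_state=0)
            tree.fit(X_tr, y_tr, sample_weight=w_tr)  # "bad cases" carry large weights
            print(export_text(tree, feature_names=feature_names))  # text view of the splits
            print("accuracy:", tree.score(X_te, y_te))
            return tree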
    I tried to visualize the tree. The result is such a scheme:

    It follows from it that some metrics remained unused. We could not have predicted from the start which metrics would turn out “better” than others.
    The prediction accuracy is about 91 percent, which is not bad, given the distortion of the squares and the noise in the sample:



    Let's try to draw the images ourselves and analyze them:

    Circle

    Rectangle

    Triangle

    Let's raise the stakes.
    We will distort the figures until they start being classified incorrectly.

    Rectangle

    Rectangle

    Triangle

    Rectangle

    That's all.
    In the last image, the corners of the triangle are very rounded, so the angle-counting metric (edges) cannot work correctly, and the perimeter metric gives too large an error. The triangle is also rotated unluckily: when the bounding rectangle is built, only two of its vertices touch it, so sbound and sbound2 do not produce anything sensible. Only the mirroring metric could give the correct answer, but it is not included in the tree. And if only one metric out of five points to a triangle, such a conclusion can reasonably be treated as erroneous.

    Conclusion


    Machine learning methods allowed us to build a system that copes well with the task: it recognizes the figures in images quite reliably.
    The charts were built in R:
    bitbucket.org/rampeer/image-classifier/src/HEAD/charts.R
