# Unwrapping a bottle label: undoing cylindrical distortion programmatically

Our application has a feature similar to Vivino's: identifying a wine from a photograph. Under the hood it uses third-party services: TinEye to find the best-matching label, and Google Vision to read the text on it. The latter is needed to pin down the exact product, because image search ignores the importance of certain regions, which as a rule contain text: the vintage year and the type of wine.

However, the accuracy of both services drops markedly because the label is distorted by the cylindrical surface of the bottle.

This is especially noticeable with Google Vision: almost any text away from the center of the label is practically unreadable, even though a person recognizes it easily. In this article I will describe how to invert the distortion and improve the accuracy of product recognition.

First, let's look at what this distortion is.

A rectangular label, once wrapped around a cylinder, takes on a characteristic barrel shape (b in the diagram above). The curve ABC is, to a rather good approximation, an ellipse, since we are looking at a circle (a cross-section of the cylinder) at an angle. The set of horizontal lines of the label is likewise transformed into a set of ellipses in the photo.

The most interesting part: to unwrap the label, it is enough to specify six markers (ABCDEF):

Using them, we can build a complete surface grid:

Given the surface grid, we can unwrap each tile separately and recover the original surface:
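To make the grid construction concrete, here is a minimal NumPy sketch, not the library's actual code. It assumes a symmetric bottle with a known center axis and half-width, models the top edge (arc A–B–C) and bottom edge (arc F–E–D) as axis-aligned elliptical arcs through the corners and midpoints, and spaces the columns by equal arc length on the cylinder (equal steps of the angle t), so that unwrapping the grid restores equal horizontal spacing. All names (`build_grid`, the marker dictionary keys) are illustrative.

```python
import numpy as np

def edge_y(x, cx, a, y_corner, y_mid):
    """Elliptical arc through the two corners (y_corner) and the midpoint (y_mid)."""
    s = np.sqrt(np.clip(1.0 - ((x - cx) / a) ** 2, 0.0, 1.0))
    return y_mid + (y_corner - y_mid) * (1.0 - s)

def build_grid(markers, rows=8, cols=16):
    """markers: center axis x (cx), half-width, and the corner/midpoint
    y-coordinates yA, yB (top arc) and yF, yE (bottom arc).
    Returns a (rows+1, cols+1, 2) array of (x, y) source-image coordinates."""
    cx, a = markers["cx"], markers["half_width"]
    t = np.linspace(-np.pi / 2, np.pi / 2, cols + 1)  # equal arc steps on the cylinder
    x = cx + a * np.sin(t)                            # foreshortened x in the photo
    y_top = edge_y(x, cx, a, markers["yA"], markers["yB"])  # A-B-C arc
    y_bot = edge_y(x, cx, a, markers["yF"], markers["yE"])  # F-E-D arc
    v = np.linspace(0.0, 1.0, rows + 1)[:, None]      # vertical fraction per row
    grid = np.stack([np.broadcast_to(x, (rows + 1, cols + 1)),
                     y_top + v * (y_bot - y_top)], axis=-1)
    return grid
```

Each cell of the returned grid is one "tile" in the photo; warping every tile to a rectangle of equal size yields the unwrapped label.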

The library code is available on GitHub. The convenience of this method is that the input parameters of the inverse transformation are visually identifiable characteristics of the label (the corners and the top and bottom midpoints), which makes it possible to fully automate the process.

The next part is devoted to detecting the markers. The working code is only partially available in a branch on GitHub, because the truly working solution is held together by hacks and shamanism, and conscience simply does not allow publishing that kind of mess.

Stage one: convert the image to grayscale.
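The grayscale conversion is a one-liner; a sketch using the standard ITU-R BT.601 luminance weights (the function name is illustrative, not from the article's code):

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to grayscale with ITU-R BT.601 weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])
```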

Then we need to extract the contours of the bottle with the label. For this we use the Sobel filter. In short, it convolves the image with small kernels that approximate the brightness gradient in the horizontal and vertical directions. As a result, uniform areas stay dark, while edges (abrupt changes) become bright.
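A minimal sketch of the Sobel step with plain NumPy (no OpenCV or SciPy dependency); `convolve2d` here is a hand-rolled sliding-window correlation, which gives the same gradient magnitude as true convolution for the Sobel kernels:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d(img, kernel):
    """Minimal 'valid' sliding-window correlation of a 2-D image with a kernel."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h, j:j + w]
    return out

def sobel_edges(img):
    """Gradient magnitude: dark in uniform areas, bright at edges."""
    gx = convolve2d(img, SOBEL_X)
    gy = convolve2d(img, SOBEL_Y)
    return np.hypot(gx, gy)
```

In a real pipeline you would use a library implementation (e.g. OpenCV's `cv2.Sobel`), but the principle is the same.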

The next step is to find the two most prominent vertical lines, which, with luck, are the edges of the bottle. Here that is indeed the case, but if you photograph a bottle standing next to other bottles, it no longer holds.

To detect these lines we use the Hough transform. The essence of the technique: we take many candidate lines crossing the entire image (for example, lines running from the top of the picture to the bottom) and compute the average pixel value along each one. Transferring these values onto a new coordinate plane gives something like a heat map. We then look for two extrema on this heat map; they correspond to the two side lines.

The diagram below shows how the left line maps to a point on the new coordinate plane:
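The line search described above can be sketched as follows: parametrize near-vertical lines by their x-intercept at the top row and a small slope, accumulate the mean edge intensity along each candidate line, then take the two strongest peaks (suppressing the neighborhood of the first so the second peak is a genuinely different line). The function name and parameters are illustrative, not the article's actual code:

```python
import numpy as np

def find_vertical_lines(edges, max_slope=0.2, n_slopes=21):
    """Hough-style search for the two strongest near-vertical lines.
    Returns two (x_at_top_row, slope) pairs."""
    h, w = edges.shape
    ys = np.arange(h)
    slopes = np.linspace(-max_slope, max_slope, n_slopes)
    acc = np.zeros((w, n_slopes))       # the "heat map" accumulator
    for si, s in enumerate(slopes):
        for x0 in range(w):
            xs = np.round(x0 + s * ys).astype(int)
            ok = (xs >= 0) & (xs < w)
            if ok.any():
                acc[x0, si] = edges[ys[ok], xs[ok]].mean()
    # strongest line
    x1, s1 = np.unravel_index(np.argmax(acc), acc.shape)
    # suppress its neighborhood, then take the second peak
    acc2 = acc.copy()
    acc2[max(0, x1 - 5):x1 + 6, :] = -1.0
    x2, s2 = np.unravel_index(np.argmax(acc2), acc2.shape)
    return (x1, slopes[s1]), (x2, slopes[s2])
```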

Ellipses are a bit more complicated, but since the Hough transform can be applied to any mathematically defined family of curves, we will use the same method again, this time searching over a set of elliptical curves.

But first the problem has to be reduced to two dimensions. Since the bottle is symmetric about its central axis, we take the central axis as the Y coordinate and one of the sides as the X coordinate. As the values on the new coordinate plane, we take the set of ellipses built between the central axis and the side. This works because an arbitrary point on the side and a point on the central axis can be connected by exactly one such ellipse. This may not be obvious at first glance, but it is much easier to see from the parametric equation of an ellipse:

```
x = a * cos(t)
y = b * sin(t)
```
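To make the two-dimensional search concrete, here is a minimal voting sketch. It assumes the axis-aligned case with a known central axis `cx` and known semi-axis `a` (the half-width of the bottle), so only the ellipse's vertical center `yc` and semi-axis `b` remain unknown; for each edge point and each candidate `yc`, the parametric equation fixes exactly one `b`, which is precisely the one-to-one correspondence mentioned above. The function and parameter names are illustrative:

```python
import numpy as np

def ellipse_hough(points, cx, a, yc_values, b_values):
    """Vote for axis-aligned ellipses x = cx + a*cos(t), y = yc + b*sin(t).
    points: iterable of (x, y) edge pixels; yc_values, b_values: integer
    candidate grids with step 1. Returns the winning (yc, b)."""
    acc = np.zeros((len(yc_values), len(b_values)))
    b0 = b_values[0]
    for x, y in points:
        u = (x - cx) / a
        if abs(u) >= 1.0:
            continue                      # outside the horizontal extent
        denom = np.sqrt(1.0 - u * u)
        for yi, yc in enumerate(yc_values):
            b = abs(y - yc) / denom       # the unique b through this point
            bi = int(round(b)) - b0       # assumes step 1 in b_values
            if 0 <= bi < len(b_values):
                acc[yi, bi] += 1
    yi, bi = np.unravel_index(np.argmax(acc), acc.shape)
    return yc_values[yi], b_values[bi]
```

Points lying on a common ellipse all vote into the same accumulator cell, so the two strongest peaks recover the two label ellipses.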

In exactly the same way we find the two sought extrema, which define the two label ellipses (curves AB and FE). Now that we have all the necessary label parameters (the side curves plus the upper and lower ellipses), we can apply the algorithm from the first part of the article and perform the inverse transformation.

What can be improved? First, the algorithm does not account for the perspective distortion of the ellipse itself, so the side fragments of the label are stretched a little more than they should be. To correct for this you need to know the actual camera angle of view, or at least use the value most typical for phones (which can be chosen empirically).

Second, the Hough transform is rather unstable in difficult conditions: for example, when other bottles stand at the edge of the frame, the edges of the bottle of interest may be detected incorrectly.

Third, if the label is not rectangular (for example, oval), the markers will be detected incorrectly, and the transformation will only distort the image further.

In practice it would be much more interesting to use a neural network to detect the markers, since it can be trained on difficult examples so that, at a minimum, the algorithm skips the transformation when the markers cannot be determined. But I have not yet tried a neural network for this task, so perhaps that will be the topic of a separate article :)
