License Plate Recognition: From A to 9

    License plate recognition has been discussed on Habr a couple of times already, but there has not yet been an article that walks through the different approaches. So here we will try to figure out how it all works. And if the article arouses interest, we will continue and publish a working model that you can experiment with.


    Software vs. hardware

    One of the key parameters of a recognition system is the hardware used to capture the image. The more powerful and better the lighting system, and the better the camera, the more likely the plate is to be recognized. A good infrared (IR) floodlight can shine through even the dust and dirt on the plate and outshine all interfering factors. I think some of you have received a “letter of happiness” of this kind (a traffic fine notice), where nothing is visible except the plate.


    The better the imaging system, the more reliable the result. The best algorithm is useless without a good imaging system: you can always find a plate it fails to recognize. Here are two completely different frames:


    This article covers the software side, with the emphasis on the case where the plate is poorly visible and distorted (simply shot handheld with an arbitrary camera).

    Algorithm structure

    • Preliminary plate search - finding the area that contains the plate
    • Plate normalization - determining the exact plate boundaries and normalizing the contrast
    • Text recognition - reading everything found in the normalized image
    This is the basic structure. Of course, when the plate is positioned straight on and well lit, and you have an excellent text recognition algorithm at your disposal, the first two steps become unnecessary. Some algorithms combine the plate search and its normalization.

    Part 1: pre-search algorithms

    Border and shape analysis, contour analysis

    The most obvious way to locate a plate is to search for a rectangular outline. It works only when there is a clearly visible contour that is not obstructed by anything, with sufficiently high resolution and a smooth border.


    The image is filtered to find edges, after which all the contours found are extracted and analyzed. Almost all student projects in image processing are done this way, and the internet is full of examples. It works poorly, but it does work after a fashion.

    Analysis of only part of the boundaries

    A much more interesting, stable and practical approach analyzes only part of the frame around the plate. Contours are extracted, and then all vertical lines are found. For any two lines that are close to each other, with a slight shift along the y axis and the right ratio of the distance between them to their length, we consider the hypothesis that the plate lies between them. In essence, this approach is similar to a simplified HOG method.
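    The pairing step can be sketched roughly as follows. This is only an illustration, not the implementation behind the article: it assumes the vertical segments have already been extracted, and the `VLine` type, the function name and all thresholds (including the assumed width-to-height ratio of about 4.6) are hypothetical values that would need tuning.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class VLine(NamedTuple):
    """A vertical edge segment: column x, spanning rows y_top..y_bottom."""
    x: float
    y_top: float
    y_bottom: float

    @property
    def length(self) -> float:
        return self.y_bottom - self.y_top


def plate_hypotheses(
    lines: List[VLine],
    aspect: float = 4.6,        # assumed width/height ratio of a plate
    aspect_tol: float = 0.25,   # allowed relative deviation from `aspect`
    max_shift: float = 0.3,     # allowed vertical offset, in units of line length
    max_len_diff: float = 0.3,  # allowed relative difference between the two lengths
) -> List[Tuple[VLine, VLine]]:
    """Return pairs of vertical lines that could be the left and right plate edges."""
    result = []
    # Sorting puts lines in order of x, so `a` is always to the left of `b`.
    for a, b in combinations(sorted(lines), 2):
        height = (a.length + b.length) / 2
        if height <= 0:
            continue
        if abs(a.length - b.length) > max_len_diff * height:
            continue  # the two edges must have similar length
        if abs(a.y_top - b.y_top) > max_shift * height:
            continue  # only a slight shift along the y axis is allowed
        ratio = (b.x - a.x) / height
        if abs(ratio - aspect) <= aspect_tol * aspect:
            result.append((a, b))
    return result
```

    Each returned pair is only a hypothesis; it still has to be confirmed by the later stages.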


    Histogram analysis of regions

    One of the most popular approaches is the analysis of image histograms (1, 2). It is based on the assumption that the spatial frequency content of the plate region differs from that of its surroundings.


    Edges are extracted from the image (which emphasizes its high-frequency spatial components). The projection of the image onto the y axis (sometimes onto the x axis) is computed. The maximum of this projection may coincide with the location of the plate.
    This approach has a significant drawback: the car must be comparable in size to the frame, because the background may contain inscriptions or other detailed objects.
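    A minimal sketch of the projection idea, written in plain Python with the image as a list of rows so that it needs no libraries. The function names and the 0.5 relative threshold are assumptions made for the example. A simple horizontal brightness difference stands in for the edge filter.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of 0..255 values


def edge_projection_y(img: Image) -> List[int]:
    """For each row, sum the absolute horizontal brightness differences."""
    return [sum(abs(row[x + 1] - row[x]) for x in range(len(row) - 1)) for row in img]


def strongest_band(profile: List[int], rel_threshold: float = 0.5) -> Tuple[int, int]:
    """Return (first_row, last_row) of the run around the profile maximum
    in which the profile stays at or above rel_threshold * peak."""
    peak_row = max(range(len(profile)), key=profile.__getitem__)
    level = profile[peak_row] * rel_threshold
    top = bottom = peak_row
    while top > 0 and profile[top - 1] >= level:
        top -= 1
    while bottom < len(profile) - 1 and profile[bottom + 1] >= level:
        bottom += 1
    return top, bottom
```

    Running the same two steps on the transposed band gives a candidate for the left and right limits.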

    Statistical analysis, classifiers

    What is the drawback of all the previous methods? Real plates, caked in dirt, have neither pronounced boundaries nor pronounced statistics. Below are a couple of examples of such plates. And, I must say, for Moscow these are far from the worst cases.


    The best methods, although not the most frequently used, are based on various classifiers. For example, a trained Haar cascade works well. These methods analyze a region for the relations, points or gradients characteristic of a plate. The most elegant method, to my mind, is the one based on a specially synthesized transform. True, I have not tried it, but at first glance it should work reliably.
    Such methods can find a plate even in difficult and atypical conditions. The same Haar cascade, on a dataset collected in winter in the center of Moscow, gave about 90% correct plate detections and 2-3% false detections. No edge-detection or histogram algorithm can deliver that quality of detection on such poor images.
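    To give a feel for what a Haar cascade actually computes, here is a sketch of a single two-rectangle Haar-like feature evaluated through an integral image. It shows only the building block: a real cascade evaluates many such features and thresholds them in stages chosen during training, and none of that is shown here. The function names are made up for the example.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values


def integral_image(img: Image) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x
    (the table has one extra leading row and column of zeros)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: sum of the left half minus the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

    Thanks to the integral image, any rectangle sum costs four lookups regardless of its size, which is what makes scanning a whole frame affordable.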


    Many methods in real algorithms rely, directly or indirectly, on the presence of plate boundaries. Even if the boundaries are not used for detection, they can be used in further analysis.
    Unexpectedly, for statistical algorithms even a relatively clean plate in a chrome (light) frame on a white car can be a difficult case, since it occurs much less often than dirty plates and may not appear often enough during training.

    Part 2: normalization algorithms

    Most of the algorithms above do not locate the plate precisely: its position has to be refined further and the image quality improved. For example, in this case you need to rotate the image and trim the edges:

    Rotating the plate to a horizontal orientation

    Once only the neighborhood of the plate is left, edge extraction works much better, since all the long horizontal lines we manage to extract are the borders of the plate.
    The simplest filter that can pick out such lines is the Hough transform:
    The Hough transform lets you very quickly select the two main lines and crop the image along them:
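    As a rough illustration of the voting idea behind the Hough transform, the sketch below takes a list of edge points and returns the most voted lines. It is a simplified, library-free version; the function name, the one-degree angle step and the integer rounding of rho are assumptions of the example, and production code would normally rely on an optimized implementation.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def hough_lines(
    points: Iterable[Tuple[int, int]],
    theta_step_deg: float = 1.0,
    top: int = 2,
) -> List[Tuple[float, int, int]]:
    """Vote in (theta, rho) space, where a line is x*cos(theta) + y*sin(theta) = rho.

    Returns the `top` most voted cells as (theta_degrees, rho, votes).
    """
    n_theta = int(round(180 / theta_step_deg))
    # Precompute (theta, cos, sin) for every angle hypothesis.
    trig = [
        (i * theta_step_deg,
         math.cos(math.radians(i * theta_step_deg)),
         math.sin(math.radians(i * theta_step_deg)))
        for i in range(n_theta)
    ]
    votes = Counter()
    for x, y in points:
        for theta, c, s in trig:
            votes[(theta, round(x * c + y * s))] += 1
    return [(theta, rho, n) for (theta, rho), n in votes.most_common(top)]
```

    For a perfectly horizontal plate both border lines come out with theta equal to 90 degrees; the deviation of theta from 90 is the angle by which the image should be rotated, and the two rho values give the rows to crop between.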

    Contrast increase

    It is also worth improving the contrast of the resulting image in one way or another. Strictly speaking, we need to amplify the range of spatial frequencies that interests us:



    After the rotation we have a horizontal plate with inaccurate left and right edges. There is no longer any need to trim the excess precisely; it is enough to cut out the characters on the plate and work with them during recognition.

    (The image in the figure has already been binarized, that is, some rule has been applied to divide the pixels into two classes. This operation is not required for splitting the plate into characters, and later on it may even turn out to be harmful.)

    Now it is enough to find the maxima of the horizontal profile; these are the gaps between the letters. If we expect a certain number of characters with approximately equal spacing, splitting into letters by the histogram works very well.
    All that remains is to cut out the letters and move on to recognizing them.
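    The splitting step might look something like the sketch below. It assumes dark characters on a light background, so the gaps show up as maxima of the column brightness sums; the function names and the 0.9 gap level are hypothetical, and the sketch does not use the expected character count or spacing mentioned above.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, dark characters on a light background


def column_profile(img: Image) -> List[int]:
    """Sum of brightness in each column (the projection onto the x axis)."""
    return [sum(col) for col in zip(*img)]


def split_characters(img: Image, gap_level: float = 0.9) -> List[Tuple[int, int]]:
    """Return (first_col, last_col) for each run of columns darker than the gaps.

    A column counts as a gap when its sum reaches gap_level * max(profile).
    """
    profile = column_profile(img)
    threshold = gap_level * max(profile)
    spans, start = [], None
    for x, value in enumerate(profile):
        if value < threshold and start is None:
            start = x                      # a character begins
        elif value >= threshold and start is not None:
            spans.append((start, x - 1))   # the character ended before this gap
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans
```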

    Weak spots

    When the plate is heavily soiled, the periodic maxima may simply not appear when splitting into characters, even though the characters themselves are readable by eye.
    The horizontal borders of the plate are not always a good guide. Plates can be bent to fit the car (Mercedes C-Class) or carefully tucked into an ill-fitting, almost square recess on American cars. And the upper border of the rear plate is often simply hidden by body elements.
    Naturally, accounting for all such problems is a task for serious plate recognition systems.

    Part 3: character recognition algorithms

    Recognizing text or individual characters (optical character recognition, OCR) is, on the one hand, a difficult task and, on the other, quite a classic one. There are many algorithms for it, some of which have reached perfection. However, the best algorithms are not open source. There is, of course, Tesseract OCR and several analogues, but they do not solve every problem. In general, text recognition methods fall into two classes: structural methods, based on morphology and contour analysis of a binarized image, and raster methods, based on direct analysis of the image. In practice, a combination of structural and raster methods is often used.

    Differences from the standard OCR task

    Firstly, at least in Russia, license plates use a standard font. This is simply a gift for an automatic character recognition system: 90% of the effort in OCR goes into handling handwriting.
    Secondly, dirt.


    Here the vast majority of known character recognition methods have to be thrown out, especially if the image is binarized along the way in order to check the connectivity of regions or to separate characters.

    Tesseract OCR

    This is open-source software that automatically recognizes both single letters and whole text. Tesseract is convenient because it is available for any OS, works stably and is easy to train. But it copes very poorly with blurred, broken, dirty and deformed text. When I tried to build plate recognition on it, at best only 20-30% of the plates in the database were recognized correctly, and those were the cleanest and most straight-on ones. Although, of course, when using ready-made libraries, some of the result depends on how straight your hands are.


    K-nearest neighbors is a very easy to understand character recognition method which, despite its primitiveness, can often beat less-than-stellar implementations of SVM or neural network methods.
    It works as follows:
    1) we collect in advance a decent number of images of real characters, already sorted into classes by our own eyes and hands;
    2) we introduce a measure of distance between characters (if the image is binarized, the XOR operation is optimal);
    3) then, when we try to recognize a character, we compute the distance between it and every character in the database in turn. Among the k nearest neighbors there may be representatives of various classes. Naturally, the character being recognized is assigned to the class with the most representatives among the neighbors.

    In theory, if you build a very large database of character examples shot at different angles, under different lighting and with every possible kind of wear, K-nearest is all you need. But then the distance between images has to be computed very quickly, which means binarizing them and using XOR. And that is exactly where problems arise with dirty or worn plates: binarization changes a character completely unpredictably.
    The method has one very important advantage: it is simple and transparent, which means it is easy to debug and tune towards the optimal result. In many cases it is very important to understand how your algorithm works.
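    The three steps can be sketched in a few lines. In this illustration each binarized glyph is packed into a single integer so that the XOR distance reduces to counting set bits; the `pack` helper, the '#'/'.' drawing convention and the choice of k = 3 are assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def pack(rows: Sequence[str]) -> int:
    """Pack a binarized glyph drawn with '#' (ink) and '.' (background) into an int."""
    bits = "".join("1" if ch == "#" else "0" for row in rows for ch in row)
    return int(bits, 2)


def xor_distance(a: int, b: int) -> int:
    """Number of pixels in which two packed glyphs differ (Hamming distance)."""
    return bin(a ^ b).count("1")


def knn_classify(sample: int, base: List[Tuple[int, str]], k: int = 3) -> str:
    """Label of the majority class among the k base glyphs nearest to `sample`."""
    nearest = sorted(base, key=lambda item: xor_distance(sample, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

    All glyphs must be scaled to the same size before packing, otherwise the bit positions do not correspond to the same pixels.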


    Image recognition methods are often built on empirical approaches. But nobody forbids using the mathematical apparatus of probability theory, which has been polished to a shine in signal detection problems for radar systems. We know the font on the license plate, and the camera noise or dust on the plate can, at a stretch, be called Gaussian. There is some uncertainty about the position of the character and its tilt, but these parameters can be searched over. If we leave the image unbinarized, we also do not know the amplitude of the signal, i.e. the brightness of the character.
    I really do not want to go into a rigorous solution of this problem within this article. In the end it all comes down to computing the covariance of the input signal with a hypothesized one (taking the given shifts and rotations into account):
    $$\operatorname{cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big]$$
    Here X is the input signal, Y is the hypothesis, and E denotes the mathematical expectation.
    If you need to choose among different characters, hypotheses for rotation and shift are built for each character. If we know for sure that the input image contains a character, then the maximum covariance over all hypotheses determines the character, its offset and its tilt. Here, of course, the problem of similar-looking characters arises (“p” and “c”, “o” and “c”, etc.). The simplest remedy is to introduce a matrix of weight coefficients for each character.
    These methods are sometimes called “template matching”, which fully reflects their essence: define the templates, then compare the input image with them. If there is uncertainty in some parameters, we either go through all possible options or use adaptive approaches, although for the latter you really do need to know and understand the math.
    Advantages of the method:
    - a predictable and well-studied result, provided the noise matches the chosen model at least roughly;
    - if the font is strictly fixed, as in our case, it can make out a very dusty / dirty / worn character.
    Disadvantages:
    - computationally very expensive.
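    A minimal template-matching sketch along these lines is shown below. It searches over shift hypotheses only (no rotation), and it normalizes the covariance by both standard deviations, i.e. it uses the correlation coefficient; that normalization is my addition so that templates with different amounts of ink and images of unknown brightness can be compared on one scale. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def covariance(xs: List[float], ys: List[float]) -> float:
    """cov(X, Y) = E[(X - E[X]) * (Y - E[Y])] over paired pixel values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs: List[float], ys: List[float]) -> float:
    """Covariance normalized by both standard deviations; 0.0 for flat inputs."""
    denom = math.sqrt(covariance(xs, xs) * covariance(ys, ys))
    return covariance(xs, ys) / denom if denom > 0 else 0.0


def window(img: Image, dx: int, dy: int, w: int, h: int) -> List[float]:
    """Flattened w x h window of img with its top-left corner at (dx, dy)."""
    return [img[dy + y][dx + x] for y in range(h) for x in range(w)]


def best_match(img: Image, templates: Dict[str, Image]) -> Tuple[str, int, int, float]:
    """Try every template at every offset that fits; return (label, dx, dy, score)."""
    best = ("", 0, 0, -2.0)  # any real score lies in [-1, 1]
    for label, tpl in templates.items():
        h, w = len(tpl), len(tpl[0])
        flat = [v for row in tpl for v in row]
        for dy in range(len(img) - h + 1):
            for dx in range(len(img[0]) - w + 1):
                score = correlation(window(img, dx, dy, w, h), flat)
                if score > best[3]:
                    best = (label, dx, dy, score)
    return best
```

    The nested loops make the cost visible: every extra hypothesis (tilt, scale) multiplies the number of covariances to compute, which is exactly the disadvantage listed above.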

    Neural networks


    A lot has already been written about artificial neural networks on Habr. Nowadays they are usually divided into two generations:
    - classic 2-3-layer neural networks trained by gradient methods with backpropagation of errors (a 3-layer neural network is shown in the figure);
    - the so-called deep-learning neural networks and convolutional networks.
    Over the last 7 years, second-generation networks have been winning various image recognition competitions, with results somewhat better than other methods.
    There is an open database of handwritten digit images. Its results table demonstrates very clearly the evolution of various methods, including algorithms based on neural networks.
    It is also worth noting separately that for printed fonts the simplest single-layer or two-layer (a matter of terminology) network works perfectly well, and in essence it is no different from template-matching approaches.
    Advantages of the method:
    - with proper setup and training, it can work better than other known methods;
    - with a large training dataset, it is robust to character distortion.
    Disadvantages:
    - the most complex of the methods described;
    - diagnosing abnormal behavior in multilayer networks is simply impossible.
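    To illustrate the remark that the simplest single-layer network is close to template matching, here is a toy single-layer classifier with one weight vector per character class. It is trained with the classic multi-class perceptron rule, which I picked for brevity instead of the gradient methods mentioned above; the function names and the tiny 3x3 glyphs in the usage note are made up for the example.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (flattened pixels, label)


def predict(weights: Dict[str, List[float]], pixels: Sequence[float]) -> str:
    """Return the class whose weight vector gives the largest score."""
    x = list(pixels) + [1.0]  # constant input for the bias weight
    return max(weights, key=lambda label: sum(w * v for w, v in zip(weights[label], x)))


def train_perceptron(data: List[Sample], epochs: int = 20) -> Dict[str, List[float]]:
    """Train one weight vector per class (the bias is stored as the last weight)."""
    n = len(data[0][0])
    weights = {label: [0.0] * (n + 1) for _, label in data}
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in data:
            guess = predict(weights, pixels)
            if guess != label:
                mistakes += 1
                for i, p in enumerate(list(pixels) + [1.0]):
                    weights[label][i] += p   # pull the right class towards the sample
                    weights[guess][i] -= p   # push the wrong class away from it
        if mistakes == 0:
            break  # every training sample is classified correctly
    return weights
```

    After training on, say, 3x3 drawings of “I”, “L” and “O”, each weight vector can be reshaped back into a 3x3 picture and looked at directly: it resembles a learned template, and the score is a dot product of the input with it.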


    This article has covered the basic recognition methods and their typical glitches and errors. Perhaps it will help you make your plate a little more readable when driving around the city, or vice versa.
    I also hope I have managed to show that there is no magic whatsoever in the plate recognition problem. Everything is absolutely clear and intuitive. It is not a daunting task for a student's term paper in a relevant specialty.
    And in a few days ZlodeiBaal will post a small plate recognizer built on the work this article is based on. You will be able to torment it.
    P.S. All the plates shown in the article were found on Google and Yandex with simple queries.


    2) “A Real-Time Mobile Vehicle License Plate Detection and Recognition”, Kuo-Ming Hung and Ching-Tang Hsieh - a histogram approach to license plate recognition
    3) “Robust License Plate Detection Using Covariance Descriptor in a Neural Network Framework”, Fatih Porikli and Tekin Kocak - a neural network approach to plate search
    4) “Automated Number Plate Recognition Using Hough Lines and Template Matching”, Saqib Rasheed, Asad Naeem and Omer Ishaq - plate search through HOG descriptors of vertical lines
    5) “Survey of Methods for Character Recognition”, Suruchi G. Dedgaonkar, Anjali A. Chandavale and Ashok M. Sapkal - a short review article on the recognition of letters and digits
    7) Textbook “The Basis of Image Processing Theory”, Krasheninnikov V. R.
