Recognizing the passport of the Russian Federation on a mobile phone (UPD 03/28/2015: the program is now available on the App Store)

    Today it is hard to find a person who has never, directly or indirectly, encountered document recognition. Whenever any reasonably serious matter requires identification, we hear “May I see your passport?”, and our data is once again typed into a computer to check whether we are allowed in, whether we have any unpaid debts, and so on.

    Of course, the giants of the recognition field could not pass by such a well-posed task in our age of global automation. Today there are many programs and hardware-software systems (from large companies as well as from relatively new players in this market) that solve this specific practical problem. Despite the local differences between the solutions (some recognize better, some have a more thoughtful and modern interface, some are easier to integrate, some are cheaper or more expensive), globally all existing software solves the problem the same way: a passport image is obtained with a scanner and then recognized on a personal computer.

    It would seem that the problem is solved! But how often do we see such “smart” passport recognition solutions in real life? Alas, in post offices, many banks and even police stations (which probably deal with passports more often than anyone else), passport data is still entered by hand. What is the stumbling block? Why is such a reliable, high-quality solution to a specific applied problem not used everywhere?

    To understand the essence of the problem, let us turn to another example of an innovative technology that is not directly related to recognition: digital photography. Recall the 1990s, when the first consumer digital cameras began to appear on the market. It seemed that happiness had arrived: no film, instant viewing of pictures, easy storage, so just shoot everything in sight for your own pleasure. In practice, most people kept using cameras as rarely as before: on vacation, at celebrations and at memorable events. The real boom in photography came when the camera appeared in the smartphone. Digital photography immediately got a second life and gained immense popularity. Many other technologies in completely different areas went through the same rebirth, among them maps and navigation and Wi-Fi.

    Let us now return to document recognition and try to draw a parallel. Perhaps the low popularity of passport recognition systems is due to the inconvenience of the process itself rather than to its quality? Indeed, it is hard to imagine a local police officer setting up a laptop and a scanner on a lawn to check a migrant's documents. It would be a completely different matter if a passport could be recognized and verified right in the hand, using some compact device that is always within reach (for example, a smartphone). That is how we came up with the idea of writing an ID document recognition program for a mobile phone. And of course, we decided to start with the passport of a citizen of the Russian Federation.

    To make reading more interesting, we will show our application in action. Federal Law 152-FZ forbids us to publish images of real passports. Therefore, for demonstration purposes, we use a printout of a synthesized passport image from Wikipedia.

    Problem statement

    So, the final goal is to recognize the passport of a citizen of the Russian Federation on a mobile phone. Stated this way, the task is very vague. Let's pin it down by setting constraints along several axes, forming something like a set of technical requirements.

    Target platform. The application must run on modern Android devices and on the Apple iPhone, starting with the 5s. These restrictions came out of an analysis of the current mobile device market. An important requirement here is that the program performs recognition on the mobile device itself, rather than acting as a thin layer that captures images, sends them to the cloud and receives the result back. The reason is not slow mobile Internet, as it might seem at first glance. The point is that in our country the federal law “On Personal Data” (152-FZ), which strictly regulates the processing of personal data, is in full force. Under this law, the requirements for all private and state-owned companies and organizations, as well as individuals, that store, collect, transmit or process personal data (including surname, first name and middle name) are becoming substantially stricter. So, from the legal point of view, the sooner a recognition program forgets personal data, the better (and it certainly should not send the data or the passport images anywhere).

    Recognition object. In most applications, the client mainly needs the passport series and number, the photograph, the last name, first name and middle name, the gender and the date of birth. All of this data is located on the third page of the passport (according to the page numbering). Therefore, we first solve the problem of recognizing these “main” fields, that is, the problem of recognizing the third page of the passport of the Russian Federation.

    Input data. Unlike the classical approach (recognition of a scanned image), a smartphone provides a video sequence. Combining the recognition results from different frames can significantly improve the quality of the system as a whole. However, this advantage holds only if individual frames can be processed very quickly, which brings us smoothly to the question of performance.
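
    As an illustration of how results from several frames might be combined, here is a minimal sketch that takes a majority vote per field. The function names, the field names and the sample readings are hypothetical, and simple voting is only one possible strategy; the post does not say which integration method the application actually uses.

```python
from collections import Counter


def combine_field(readings):
    """Return the most frequent non-empty reading of one field, or None."""
    votes = Counter(r for r in readings if r)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def combine_frames(frame_results):
    """Merge per-frame dicts {field name: recognized text} by majority vote."""
    fields = {name for result in frame_results for name in result}
    return {
        name: combine_field([result.get(name) for result in frame_results])
        for name in sorted(fields)
    }


# Hypothetical per-frame recognition results for the same document.
frames = [
    {"surname": "IVANOV", "number": "123456"},
    {"surname": "IVAN0V", "number": "123456"},  # one character misread
    {"surname": "IVANOV", "number": "128456"},  # another character misread
]
print(combine_frames(frames))
```

    Each frame contains a different error, yet the vote recovers both fields, which is why fast per-frame processing pays off: the more frames fit into the time budget, the more votes each field gets.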

    Performance. According to competitors, the best passport recognition software today copes with this task in about 1-3 seconds on an average computer, not counting scanning time. We therefore set ourselves the goal of solving the problem on a mobile phone in no more than 3 seconds. At the same time, we want to process at least three frames per second on devices such as the Apple iPhone 5s; in other words, the average processing time of one frame should not exceed about 0.3 seconds. If we recall that one frame contains approximately 2 million pixels, and that recognition runs on devices much weaker than an average PC (see Table 1), the task looks all but unsolvable. Admittedly, we had to sweat a lot over code optimization and the development of fast algorithms before we reached this speed. Later we will write a separate post about approaches to optimizing recognition programs on mobile devices. For now, we will only recall that a year ago we answered “Challenge accepted” to this bold claim about speed.
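
    The numbers above can be turned into a rough per-pixel time budget. The sketch below is only back-of-the-envelope arithmetic based on the figures in the text (three frames per second, about 2 million pixels per frame); the function names are ours.

```python
def frame_budget_seconds(frames_per_second):
    """Average time available for processing one frame."""
    return 1.0 / frames_per_second


def per_pixel_budget_ns(frames_per_second, pixels_per_frame):
    """Average time available per pixel, in nanoseconds."""
    return frame_budget_seconds(frames_per_second) / pixels_per_frame * 1e9


fps = 3             # target from the text
pixels = 2_000_000  # approximate frame size from the text

print(f"{frame_budget_seconds(fps):.2f} s per frame")           # 0.33 s per frame
print(f"{per_pixel_budget_ns(fps, pixels):.0f} ns per pixel")   # 167 ns per pixel
```

    Roughly 167 nanoseconds per pixel has to cover everything: document localization, normalization, field search and character recognition.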

    Quality. Recognition quality is often the decisive factor in choosing a particular system. Therefore, at the very beginning of development, we set ourselves a fairly high bar: in the first version of the product, 95% of passports should be recognized correctly (not counting passports that cannot be recognized automatically at all). In general, assessing the quality of such recognition systems is a serious task in its own right, which we want to cover in future posts on Habr.

    New problems with recognition on a smartphone

    As our colleagues from various organizations have repeatedly emphasized, recognizing the passport of the Russian Federation is an extremely difficult task. The difficulty comes both from the security elements of the passport form itself (guilloche background, holographic elements, glossy laminate film) and from the high variability in how it is filled in (imprecise printing of personal data, non-standard fonts, mechanical damage).

    However, when the passport is recognized on a phone, fundamentally new problems are added to all of the above, problems that never arose when working with a scanner:

    • Projective distortion of the document image. When shooting with a camera, angles and their ratios, as well as the proportions of objects, change depending on the shooting angle. As a result, classical algorithms (search for reference lines, extraction of text fields, etc.) cannot be applied directly and require preliminary projective normalization of the image.
    • Glare. The glossy film, holograms and other security features that help us distinguish a real passport from a fake one interfere heavily with recognition (they partially destroy information). Try looking at your own passport through a camera lens (for example, in the standard camera application of your smartphone) from different angles, and you will immediately see the depth of the problem.
    • Uneven lighting. Unlike a scanner, which uses its own light source, a camera photographs a document lit by external sources in an uncontrolled way. This causes a number of problems, such as shadows and inaccurate color reproduction.
    • Defocus and blur. These occur because the camera is constantly moving during recognition (after all, shooting is done without a tripod).
    • Digital noise. It often appears in low-light conditions, and the lower the illumination, the stronger its influence.
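
    Some of these defects can be estimated per frame with very cheap measures. The sketch below uses two common heuristics, chosen by us for illustration and not described in the post: the variance of the Laplacian as a focus measure (low values suggest defocus or blur) and the fraction of near-saturated pixels as a glare indicator. The image is assumed to be a grayscale list of rows with values 0-255, and the threshold of 250 is an arbitrary assumption.

```python
def laplacian_variance(gray):
    """Focus measure: variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                   + gray[y][x + 1] - 4 * gray[y][x])
            values.append(lap)
    if not values:  # image smaller than 3x3 has no interior pixels
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def glare_fraction(gray, threshold=250):
    """Share of pixels at or above the (assumed) saturation threshold."""
    total = sum(len(row) for row in gray)
    bright = sum(1 for row in gray for p in row if p >= threshold)
    return bright / total


def make_checkerboard(size):
    """Synthetic image with maximally sharp edges."""
    return [[255 if (x + y) % 2 else 0 for x in range(size)] for y in range(size)]


def make_gradient(size):
    """Synthetic smooth ramp, standing in for a frame without sharp detail."""
    return [[10 * x for x in range(size)] for _ in range(size)]


sharp, smooth = make_checkerboard(8), make_gradient(8)
print(laplacian_variance(sharp) > laplacian_variance(smooth))  # True
print(glare_fraction(sharp), glare_fraction(smooth))
```

    In a video stream, such measures could be used to skip hopeless frames early instead of spending the whole time budget on them; whether the application does this is not stated in the post.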

    In addition to the new problems that arise at the image acquisition stage, no less serious difficulties await us further on. For example, coarse localization and identification of the document in the frame becomes a task in its own right. Unlike with a scanned image, when recognizing a video sequence we have to make sure that the target document is actually present in each incoming frame. Moreover, this problem usually has to be solved before projective normalization.

    Let's move on. To position the text lines precisely, we need to find the borders of the passport and determine the projective basis. To do this, we have to detect straight edges, corners, rounded corners (fillets) and other primitives in the presence of noise, then generate candidate document borders and select the ones that best fit the model. Once the projective basis is determined, the image region must be projectively corrected and the fields positioned.
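
    Once the four corners of the page are known, projective correction reduces to computing a homography. The sketch below is a textbook construction rather than the authors' implementation: it solves the standard 8x8 linear system for the homography that maps four frame points onto a rectangle. The corner coordinates and the 400x600 target rectangle are made-up values.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """3x3 matrix (h33 fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(hm, point):
    """Map one (x, y) point through the homography."""
    x, y = point
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Hypothetical page corners found in the frame (clockwise from top-left)
# and the rectangle they should be mapped to.
frame_corners = [(120, 80), (980, 140), (940, 1300), (60, 1180)]
page_corners = [(0, 0), (400, 0), (400, 600), (0, 600)]

hm = homography(frame_corners, page_corners)
for corner in frame_corners:
    print(corner, "->", apply_homography(hm, corner))
```

    To actually warp an image, one would typically compute the inverse mapping, `homography(page_corners, frame_corners)`, and sample the frame for every pixel of the normalized page.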

    Now we are ready for recognition. Recognizing the data requires special optical recognition methods for both individual characters and text fragments. A peculiarity of video stream processing is the rather low source resolution (no more than 150-200 DPI), combined with noise and distortions, in particular glare and overexposure, defocus and blur.
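
    To get a feeling for these DPI figures, the effective resolution can be estimated from how many pixels one side of the page occupies in the frame. The sketch assumes a page side of about 125 mm; this dimension and the pixel counts are our assumptions for illustration and are not taken from the post.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels_along_side, side_mm):
    """Dots per inch when a side of side_mm millimetres spans the given pixels."""
    return pixels_along_side / (side_mm / MM_PER_INCH)


PAGE_SIDE_MM = 125  # assumed length of the longer page side

for pixels in (750, 900, 1000):
    print(pixels, "px ->", round(effective_dpi(pixels, PAGE_SIDE_MM), 1), "DPI")
```

    Under this assumption, the page has to span roughly 750 to 1000 pixels along its longer side to reach 150-200 DPI, which leaves little margin in a frame of about 2 million pixels.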

    Once all the difficulties of processing a single frame have been overcome, new tasks arise that concern recognition of the video sequence as a whole: contextual analysis and integration of the results. This topic is very interesting, and we will certainly devote a separate article to it. For now, we limit ourselves to noting that such tasks exist.


    Thus, while solving the seemingly “simple” problem of recognizing a Russian citizen's passport, we ran into more than a dozen interesting tasks in computer vision, in the architecture of efficient software, and in writing high-performance programs for mobile devices.

    This post is introductory and gives readers a general picture of our tasks, problems and interests. We will certainly continue with a series of publications on Habr about specific scientific and technical results, covering solutions to individual subtasks of document recognition (and more) on mobile devices.

    As for the ready-made solution for recognizing the Russian passport on a mobile device, we are pleased to announce that you can download the demo recognition program right now for Android (Smart PassportReader on Google Play) and for iOS (Smart PassportReader on the App Store). And if your work makes you interested in the SDK of our product, so that you can try it hands-on and integrate it into your own mobile applications, write to us, and we will be happy to explain how to do this and answer any other questions you have.

    And at the very end, a few screenshots of our program for the Apple iPhone
