Can you stuff license plate recognition into any Tamagotchi?

    We have been writing about license plate recognition on Habr for a long time. I hope it has even been interesting. It seems the time has come to tell how it is applied, why it is needed at all, and where it can be squeezed in. And most importantly, how this has changed in recent years with the arrival of new machine vision algorithms.



    Oddly enough, license plate recognition is used not only in traffic control. There are dozens of applications, and each task has its own specifics: target quality, input images, speed requirements, and so on. Let's start with the simplest and most commonplace one:

    Traffic control


    First, a few stories about how traffic cameras are used. We ourselves have only a superficial connection to this topic, but we have talked to many large firms and more or less understand the situation in the industry.

    For example, a simple question: who, in your opinion, installs the cameras and monitors their work?


    Speed control

    Oddly enough, this is the prerogative of private companies. Traffic police inspectors receive the fines in their database, check that everything is filled out correctly, and send them on to us. The revenue from fines is split 50/50: half goes to the concessionaire who services the camera, half to the budget.

    A concession is a contract for the installation and maintenance of a number of cameras. Typically, a region offers a couple of dozen cameras as one lot, but it varies. One of the most epic concession examples is how the Cossacks issue fines (by the way, this is not the only such case).

    To date, several kinds of violations are detected (the list is not complete, I list the main ones):

    1. Speed control
    2. Traffic light control
    3. Road marking control
    4. Movement control


    Moving control

    With the exception of Moscow, the tastiest morsel is speed control. That is where most of the stream of fines comes from. Cameras at traffic lights and on lanes quickly stop making a profit: people start driving properly after the first fine, especially in a small region. But speeding brings a steady profit, especially on major routes.
    Just a couple of years ago, all speed cameras were radar-based. But the radar is the most expensive part of them, and for about 15 years everyone has been trying to get rid of it.


    Speed control from a tripod

    Only about a year or two ago was the first radarless system licensed. Now there are already several of them. To be honest, I don't know how many there are today; I think three, but I could be wrong. Because many new plate recognition algorithms have appeared, quality has improved, and good hardware has become cheaper, the camera fleet is now being replaced, expanded, and updated on a massive scale. There are many spots where a camera can be installed and pay for itself. Most likely, 80 percent of the installations over the next five years will be radarless systems.

    Given that cameras are installed by concessionaires, the business is terribly profitable and liquid.
    The cameras themselves can be placed in very different places: on a tripod, on a special gantry, in stationary roadside cabinets, or on simple poles.

    About 10 years ago the algorithms were weaker and plates were recognized worse. Now you can get good quality even on a mobile phone, and the variety of installation locations is growing.
    What does “were recognized worse” mean? Actually, on a well-built camera with proper lighting you could recognize about 95-98% of passing plates. That was before neural networks, 5-7 years ago. Now the percentage is slightly higher, but not by much. The plates that remain unrecognized are covered in dirt or snow, or damaged. 90% of what the system fails to recognize, a person cannot read either. But recognition on cameras in poor conditions has improved greatly: hung on a crooked pole off to the side, given too little light, and so on. Now all of this works with a minimal amount of tuning.

    Of course, in a blizzard recognition quality drops to zero, but that doesn't bother anyone.
    Despite the difficulty of entering the market and its regulation by the state (certification of a new plate recognition system takes at least half a year), it is oversaturated with solutions.

    Parkon


    The peculiarity of Parkon units is that the image quality is as bad as it gets. The shooting is done either from moving cars or by pedestrian inspectors. The data that arrives for recognition may look something like this (recorded from a dashcam and recognized by our algorithm, so it has nothing to do with Parkon; the video is just an example):


    Because of the low recognition quality, what the system recognized has to be checked constantly (in a real Parkon the camera is much better, there is lighting, etc.). On top of that, someone has to check whether the same car is still in the parking spot. For example, at DIT, as I heard, the algorithms are still of the previous generation. A year ago, fines were processed by a shop floor of almost 100 workers who verified all the data by eye (all of this is rumor circulating in the field and may not correspond to reality).

    By the way, the data from mobile inspectors is verified too. This is an extra safeguard so that an inspector does not write a fine to a personal enemy.

    With new-generation algorithms, verification is no longer as necessary, so I really hope they have already updated their pipeline.

    In any case, the main difficulty is not recognizing the plate but making the system stable with respect to the incoming data and eliminating errors.

    Recognition on smartphones / mobile devices


    From pedestrian inspectors we can smoothly move on to a wider category: “recognition on smartphones”.



    Car plate recognition can be done in two ways. The first is recognition on the server; the second is on the device. Recognition on the device is more difficult. Two years ago it was unrealistic to do it well.

    We will return to the comparison of these approaches. First, I will talk about where this recognition is needed:

    The first task is to speed up data entry. It arises when an employee has to fill out some document out on the street. This can be a traffic inspector, an insurance agent, a car dealer. The employee photographs the car and presses a button, and the plate number is recognized and entered into the document automatically.

    The task is nothing special, but they say it helps people in winter.

    The second task solved by recognizing plates on a smartphone is monitoring cars on private territory: parking lots, unloading areas, and so on. An inspector walks around and, right after taking a picture, sees the relevant information on a tablet.
    A distinctive feature of recognition on smartphones is that the person edits the plate number if it was recognized incorrectly (which does not happen in Parkon). The psychological threshold of correct recognition for the first task is somewhere around 80%, so even previous-generation algorithms did a good job. We tried to apply our previous algorithm to several such tasks. It works fine, but the economic benefit is doubtful. The pilots did not take off; it was used only once, when the customer of a system ordered such a feature and the contractor turned to us. And even that was just for show.

    For the second task, it is good to have 95%. Only modern algorithms manage that.

    Some level of error is allowed. People still check everything manually, and if something is wrong, they verify and correct it.

    Barriers


    For better or worse, Russia is a country of barriers. Barriers can be absolutely everywhere: at parking lots, at the entrances to company premises, at the entrances to supermarkets, at the entrance to your summer cottage community or the yard of your building.



    Wherever you need to control a territory and respond quickly, the easiest thing is to put up a barrier. But once you start digging into it, it turns out to be a mess.

    No recognition algorithm, even the most modern, gives 100% quality. Of course, there has been a huge breakthrough in recognition, and for barriers the quality has jumped: it was 93-95 percent, it became 97-98. The growth came from the places with dirt and poor installation: not enough light, large angles, a poor combination of light filters. Installation is now much simpler and no longer requires a super-expert installer.

    In reality, only a handful of solutions are built on the new technologies. Usually things are worse.

    Not letting 2-5% of cars onto the territory is unacceptable. You need someone who can correct the recognized plate. A security guard?

    Large warehouses with guarded territory went this way long ago. Their plate recognition is tied to a nearby booth with a security guard. The price of such a solution is usually 50-100 thousand, depending on the hardware, camera, algorithm, and lighting. The plate recognition conditions are close to ideal. Such a solution is usually installed by an installer who understands the topic, and his services can add another ten or twenty thousand to the cost of the project.

    And that is without the barrier, just for the recognition unit!

    The second approach, popular in Moscow yards, is a barrier that opens on a phone call. You call a number, and if your phone number is in the database, the barrier opens. But then again: someone forgot their phone, someone's battery died, guests have come to visit. As a result, entry by plate number is often added. And since hanging a good camera is difficult and expensive, and most solutions are built on old-generation algorithms, it ends with a guard looking at the picture.

    In addition, the phone approach often does not work: people set up private parking in yards they have access to, to the detriment of the neighbors.



    There are many companies specializing in installing barriers in yards, but almost none of them try to add recognition at their sites, although the market is huge. In the half year since we made the new algorithm, we tried to launch several pilots, and one of them is still running successfully. The ideal scheme for any yard is phone / card + plate recognition + access to the camera for recognition.
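
    The yard scheme can be sketched as a small decision function. This is only an illustration under assumptions: the names `decide`, `Decision`, `allowed_plates`, `allowed_phones` and the one-character tolerance are hypothetical and not taken from any real system. A known phone or an exact plate match opens the barrier; a near miss or an unknown plate goes to a person who looks at the camera.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "open" or "ask_guard"
    reason: str


def mismatches(a: str, b: str) -> int:
    """Count differing characters; strings of different length never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def decide(plate, allowed_plates, caller_phone=None,
           allowed_phones=frozenset(), max_mismatch=1):
    """Hypothetical access policy: phone or plate first, a human as fallback."""
    if caller_phone is not None and caller_phone in allowed_phones:
        return Decision("open", "phone")
    if plate:
        plate = plate.upper().replace(" ", "")
        if plate in allowed_plates:
            return Decision("open", "plate")
        # A near miss is probably a misread character: give the guard a hint
        # instead of opening automatically.
        near = sorted(p for p in allowed_plates
                      if mismatches(plate, p) <= max_mismatch)
        if len(near) == 1:
            return Decision("ask_guard", "close to " + near[0])
    return Decision("ask_guard", "unknown plate")
```

    Sending near misses to a person rather than opening on them is a deliberate choice in this sketch: the recognizer is allowed to be wrong, but the final call on a doubtful plate stays with whoever has access to the camera.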

    Statistics


    Another interesting application around barriers is statistics by plate number. Shopping centers sometimes use it to know how often people come (often they also recognize which cars they are).

    It can be installed at car washes to keep track of cars, at car service centers, at ticket booths, and so on.

    Server


    Server recognition is very common. Everything except traffic enforcement can be recognized on servers. Here I'll briefly list the tasks that use only server recognition; a more general story about usage follows below:

    • Blurring car plates in pictures, for example on classified ad sites.
    • Parsing ad databases, for sites that do not blur the plates.
    • Connecting a server to several cameras and recognizing the stream from them: statistics / barriers / surveillance.

    One of our backup servers on a Jetson (it's good to have a good piece of hardware!):
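
    For the first item in the list, once a detector has returned a bounding box, hiding the plate is a simple image operation. A minimal sketch in pure Python, assuming a grayscale image stored as a list of rows and a box `(x0, y0, x1, y1)` coming from some plate detector; the function name `pixelate_region` and the data layout are assumptions for illustration only.

```python
def pixelate_region(image, box, block=4):
    """Return a copy of `image` with the `box` area replaced by coarse tiles.

    image: list of rows, each a list of integer gray values.
    box:   (x0, y0, x1, y1), right and bottom edges exclusive.
    block: tile size in pixels; every tile is filled with its mean value.
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    # Clip the box to the image so a sloppy detection cannot index out of range.
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)

    out = [row[:] for row in image]  # the input image is left untouched
    for ty in range(y0, y1, block):
        for tx in range(x0, x1, block):
            ys = range(ty, min(ty + block, y1))
            xs = range(tx, min(tx + block, x1))
            tile = [image[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

    Pixelation is used here instead of a Gaussian blur only because it needs no libraries; with a tile size close to the character height the plate becomes unreadable.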



    How and where should the algorithms work?

    While listing the applications, I left aside where and how everything should be recognized: where to fit the cascade of networks and where to do the processing.

    In principle, if you generalize, there are not many options:

    • Machines with an x86 processor. Good for previous-generation algorithms. For current ones they are not ideal, but performance is still high. It depends, of course, on the machine and on how well the algorithm is optimized. For us, plate detection ran at around ten to twenty FPS, and recognition of a plate took about ¼ second. Naturally, algorithms differ: some are faster, some are better; it is all a trade-off. Hundreds of FPS are possible if, for example, detection zones are specified in advance. Such machines are often installed for recognition on the road and at barriers. They always have power consumption problems. Most likely this is a dying class of devices.
    • Machines with NVIDIA graphics cards. You do not put these out on the street, and their power consumption is high. But they are used for servers: fast and cost-effective. They only fit tasks that can run over the Internet.
    • ARM boards. Recognition can be deployed on them quite well, even if the speed is not very high. On an RPi 3 we achieved 5 FPS for plate detection and about half a second for recognition. Considering that the RPi is not the strongest board, this is not bad for the quality you can squeeze out.
    • Jetson. This class of devices deserves a separate mention. Jetson is ideal for recognition tasks: performance per watt is phenomenal. Its performance is comparable to an i5 processor, while the Jetson itself consumes only 10-15 W. Many machine vision systems are now being built on them for deployment.

    How to use all this.

    In reality, all the questions above come down to one problem. What to do with recognition errors? What to do with quality?

    If we hold the recognition algorithm fixed, the quality of the system is determined by the quality of the equipment: light, lenses, installation point.

    Optimizing the quality of this combination is a task every installer knows. There is almost nothing new to invent here. Every company that installs cameras has its own stack of solutions, and which one is applied depends on how much money the client has for the task.
    To catch errors, operators are usually put in place. All fines go through them. Parking control systems also go through operators. One operator can be enough for a couple of dozen barriers.

    If recognition is on smartphones, the user does the checking. If recognition is only for the sake of statistics, nobody bothers: 2-3% of losses is considered acceptable.

    What have we come to?

    We constantly received offers to test our algorithm, attach it to something, or use it. Plus we ran a bunch of experiments ourselves. It has not come to full deployment yet, but about five other machine vision systems of ours are already in production.

    As you can see from the description above, plate recognition is an unstable thing. There are always errors, even if only 1%. I wanted to find a way to make the algorithm rock-solid, so that it works in any task without special tuning or add-ons.

    But at the very start we went about it the wrong way. Since we had a large margin of quality and good speed, one of the first thoughts was: “why not run it on an RPi and hang it on every barrier?” The hardware cost of this approach was minimal.



    We took an RPi, soldered up a barrier control board, and stuffed it all into a box.

    But we realized one simple thing. Even for us it is difficult to mount a camera on a barrier so as to rid it of all possible artifacts. We can cope with headlights, the sun, a wrong installation angle, or blur. But how will a person who is setting up plate recognition for the first time cope with all those factors?!

    It is difficult to adjust the picture on a device buried somewhere deep inside the barrier. You need to connect a laptop and align the camera.

    We tried Bluetooth. Same thing. Even if you show the picture on a phone, it does not simplify much. You can set it up perfectly, but after some time the device still stops working. You have to climb back into the guts and reconfigure, often at random. At some point you knock the camera, and for 10 minutes you wonder what's wrong.

    Or the system works for two days and then suddenly stops. What happened?

    When we did recognition from mobile phones about 3 years ago, there were no such problems. Errors coming in? You filter out what was not recognized, try to extract the error pattern, then try to improve the algorithm or retrain the operators. As a result, in 1-2 days the error is fixed.

    This is clearly nicer than crawling under each barrier.

    As a result, we realized that the most stable option is recognition on the server. On the server it is much easier to understand the cause of errors, compensate for them, and show them. A tool that builds statistics for the day and finds the times when the system makes mistakes or does not work can be written in an hour. Recognition graphs:



    The blue graph is recognition quality over time; the red one is the percentage of frames with plates over time. Has the quality dropped? An error is sent to the operator console. And for a quick understanding of the error, a map of the latest recognitions is displayed:



    It is immediately clear that some of the plates are not being caught. What's the matter? We look closer:



    The backlight is gone, the plate is not readable!
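
    The idea behind such a statistics tool fits in a few lines. A rough sketch, not our actual code: it assumes each processed frame is logged as `(timestamp_s, plate_found, plate_read_ok)`, where `plate_read_ok` means the read passed some validity check (an assumption, since the server has no ground truth). The function names and the 10-minute window are hypothetical; the 95% threshold is just a parameter.

```python
from collections import defaultdict


def window_stats(frames, window_s=600):
    """Aggregate per-frame logs into fixed time windows.

    Returns a list of dicts sorted by window start:
      plate_share - share of frames in which a plate was found
      quality     - share of found plates that were read successfully
                    (None if no plate was seen in the window)
    """
    buckets = defaultdict(lambda: [0, 0, 0])  # frames, found, read ok
    for ts, found, ok in frames:
        bucket = buckets[int(ts // window_s) * window_s]
        bucket[0] += 1
        if found:
            bucket[1] += 1
            if ok:
                bucket[2] += 1

    stats = []
    for start in sorted(buckets):
        total, found, ok = buckets[start]
        stats.append({
            "start": start,
            "plate_share": found / total,
            "quality": ok / found if found else None,
        })
    return stats


def low_quality_windows(stats, min_quality=0.95):
    """Window starts that should raise an alert on the operator console."""
    return [s["start"] for s in stats
            if s["quality"] is not None and s["quality"] < min_quality]
```

    With a log like this, a statement such as “quality drops every day in the same half hour” is the result of a query rather than a guess.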

    Moreover, if something does not work, the error can often be fixed on the server. If the camera has turned toward an area where recognition does not work, recompute the homography. If the camera is out of focus, many models can be refocused remotely.

    If there is a problem with the installation, you can give a clear diagnosis: “from 10:00 to 10:30 the sun shines into the camera”. If a camera is installed without Internet access and statistics, catching errors of this kind is very difficult and tedious. With 2-3 such errors, quality may fall below 95% and the client will be disappointed in the system.

    At the same time, the system itself becomes simpler. You can install an RPi + LAN camera + switch:
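
    Recomputing a homography from four reference points takes only a few lines. A minimal sketch, assuming four landmarks (for example, the corners of the recognition zone) are marked in the old and in the new camera view; `fit_homography` and `warp_point` are hypothetical helper names, and the solver is plain Gaussian elimination rather than any particular library routine.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(src, dst):
    """3x3 homography mapping four src points onto four dst points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h, point):
    """Apply the homography to a single (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

    After the camera has moved, the corners of the old recognition zone can be passed through `warp_point` to get the zone in the new view without anyone visiting the barrier.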



    Or you can use just an Arduino with GPRS, which tells the server when to process the camera, gets the response, and opens the barrier:



    (Yes, the Arduino is an expensive and suboptimal board. But when you need 10 units plus stability, developing your own would cost more.)

    This quality control rule applies to any deep learning system. There will always be mistakes. And the only way to keep the customer happy with the system is to build in ways of catching and handling errors before they even appear: collecting a dataset, monitoring statistics online, retraining. All of this is best built in so that errors are caught and compensated for before the client realizes something is not working and gets upset.

    When we recognized goods on shelves, every recognition went to people for verification. When we recognized containers, it went to operators for verification. And so on. Not everything can be sent for verification, and not always. But you always need a way to make sure that the algorithm has not broken down.

    We had a wonderful experience when one of our customers shipped a half-baked system with several hundred modules (not plate recognition, a rather distant task). We developed the mathematical core for that system.

    The system even worked well. But its users were not very educated people, and they began to sabotage it outright.

    Our customer was a very sensible person: the system could be retrained, reflashed, and have its algorithm changed remotely. And he had a whole support team. As a result, the sabotage was cut off in good time: algorithms were made to catch and handle it.

    It is the same with license plates. The quality of the system needs to be monitored constantly. If there are problems, the customer may refuse the system even if the problems are on his side. Of course, there are always lines that cannot be crossed. But simply adding monitoring lets you solve 80% of the problems proactively, through competent communication with the customer.

    And it doesn't matter whether it is a server, a camera on the highway, on a home barrier, or at a car wash. The main thing is to configure monitoring properly, so that it distracts as little as possible but covers as much as possible.
