Computer vision in industry. A lecture at Yandex

    Machine learning continues to spread into industries beyond the internet sector. At the Data & Science conference “The World through the Eyes of Robots”, Alexander Belugin of the company “Tsifra” spoke about the successes, difficulties and current tasks along the way. Introducing technologies such as computer vision requires repeatability and a product approach, which reduce the cost of one-off implementations, because production involves many different types of tasks. The talk covers products, global trends and the experience of Alexander's team in industrial safety and process automation.


    - Good morning. I am glad that everyone came to this interesting conference. I will first talk briefly about the company “Tsifra”, then a little about the tasks that exist in industry and the typical ways of solving them. These are tasks without robots: not assembly, but various process industries. At the end we will look briefly at our own experience.

    We have been on the market for a year now, and we see our goal as the full automation of industrial production, which would make it possible to raise its profitability by 10-15%. To do this completely, you need to solve every problem along the way, starting with basic things such as the Internet of Things, sensors and data collection, and ending with joint optimization of all processes: logistics, procurement and production itself.



    This now goes by the buzzword “digitization”: transferring data about all processes into digital form so that it can later be used to increase efficiency.



    Today we are talking mostly about computer vision. There is also the term “machine vision”, which refers more to the hardware. There are video cameras like those used for surveillance, there are webcams used for communication, and there are special industrial cameras. The industrial ones differ in that they often have no ordinary Ethernet port and use special protocols; they can transmit, for example, 750 frames per second, not in burst mode but continuously and without compression. There are cameras sensitive to ranges outside the visible spectrum. There are even cameras that read a single line: they capture many frames per second, but each frame is one pixel wide. Such a camera is mounted above a conveyor and watches what passes beneath it.

    A distinctive feature of these computer vision tasks is that the output should not be a picture, which interests no one, but a number that characterizes the quality or dimensions of what we observe.



    I want to list a few basic tasks. The first major block is related to security. There is perimeter control, so that nothing is carried out of the enterprise. This is one of the video analytics tasks that has been solved for 15–20 years already and gets better every year. If there is a fence and a video camera and someone tries to climb over, video analytics will certainly catch it.

    There are more complex tasks, such as controlling movement in certain zones. At an enterprise there is always somewhere you can get burned, or wander into a loading and unloading area or onto the tracks where carts travel. This is a harder task: narrow restrictions have to be enforced, and the system must understand which paths people are allowed to walk.

    Another example of a safety task is detecting whether people are wearing helmets, using cameras installed on site. In Russia this sells very poorly. When people hear how much such systems cost, they say: we have a regulation, a person must wear a helmet and will wear it, and if not, he has violated the regulation and that is his problem. Elsewhere in the world this is a popular solution, promoted both by vendors and by private companies.

    The next block of tasks is related to accounting. Mostly this is recognition of labels. Sometimes special stickers with a printed barcode are used, and then things are somewhat easier: there is plenty of ready-made software for recognizing barcodes or cleanly printed characters. Often, though, companies try to save money by keeping their existing marking system and using computer vision to read it. Then the target may be, for example, poorly legible numbers stenciled on a railroad car. That is harder and takes more time to set up. All this is needed to combat theft and to track goods: what arrived at the company, how it moved inside and where it eventually left.



    The last set of tasks is quality control. It can also be divided into two parts. One is physical quality control: you can monitor the dimensions of objects. Most often this applies to small items, such as caps for milk cartons or bottles. They have a fairly simple, cheap production process with many rejects; the rejects just need to be filtered out, because improving the process itself is unprofitable.

    And there is the part shown in the picture, where the tasks are more complex. Here we are trying to understand whether the right action is actually being performed on the product. For example, you need to assess the posture of a mechanic and understand which operation he is performing. Or there was a task involving a site where drilling rigs are assembled and disassembled: in a large field a rig is assembled, driven off to work, then dismantled and taken away. Stationing a person in the North to track these operations is very expensive, especially since he would be idle most of the time. Having someone watch a camera feed is expensive too. With a camera you can instead detect automatically which events are happening and track the assembly and disassembly schedule.



    Another example is a screenshot from a partner's software for detecting defects in castings: all sorts of plastic parts that are cast in molds like these and inspected before painting. Defects can be detected with a camera.

    There are two main approaches to solving these problems. Both were invented long ago; the classic one is to process images with hand-written algorithms.



    On the left is a lever and an attempt to outline it. The image on the right is less clear: the circles are coiled rolls of sheet steel, and it is not obvious what is in the center. The methods are to process the image somehow, increase its contrast, perhaps make it two-color, extract the edges of objects, try to find the objects themselves, and then work with them.
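
    The classical steps just listed (make the image two-color, find an object, turn it into a number) can be sketched in a few lines. This is only an illustration in plain Python, not the tooling from the talk: the tiny `frame`, the threshold of 128 and both function names are assumptions made for the example, and a real system would use an image library on camera frames.

```python
from collections import deque


def binarize(gray, threshold):
    """Make a grayscale grid (0-255) two-color: 1 = object, 0 = background."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def largest_object_size(mask):
    """Return (width, height) in pixels of the largest 4-connected blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best_area, best_size = 0, (0, 0)
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill one blob, tracking its area and bounding box.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            top = bottom = r
            left = right = c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area > best_area:
                best_area = area
                best_size = (right - left + 1, bottom - top + 1)
    return best_size


# Hypothetical 4x6 grayscale frame: one bright part plus a single bright speck.
frame = [
    [10, 12, 11, 10, 10, 11],
    [10, 200, 210, 205, 10, 11],
    [12, 198, 220, 207, 10, 240],
    [10, 11, 10, 12, 10, 11],
]
size = largest_object_size(binarize(frame, threshold=128))
```

    The result is a pair of numbers (width and height in pixels) rather than a picture, which is the kind of output an industrial task actually needs.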



    The second, more modern approach comes from data science and is built on neural networks. It has certain advantages. The first and most important is quality: it achieves better results on most complex tasks that classical methods cannot solve. Some sample tasks are listed on the slide.

    There is also adaptability: you can tune the training procedure of a neural network and carry over from task to task not the trained network itself but the whole pipeline together with the training algorithm, so that slightly different tasks can be solved with the same tool.

    There is a downside that often matters in industry: lack of data. To start detecting defects with classical methods, we need a video stream of the finished products; we look at the defects with our own eyes and make our code see them by tuning a few parameters. No manual labeling is required. A neural network needs a large number of examples, which must either be collected by hand or generated with modern, more sophisticated methods. This is a long and complex process, and it may have to be repeated when moving to other tasks.



    Here is an example of such a picture, related to defect detection, one of the popular topics in published articles. The bottom of the picture shows a small defect on a structure. With neural networks it is possible to detect from 92% to 99% of all defects, depending on the paper, with false positives at the level of 3-4%, which is quite useful. The normal defect rate in different industries ranges from 0.5% to a few percent. Such figures are good enough to replace the person who looks for these defects, or even to improve on that person's results.
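
    To see what these percentages mean on a real line, it helps to run the arithmetic for a batch of parts. The sketch below is a worked example under stated assumptions, not data from the talk: it reads "false positives" as the share of good parts that get flagged, and picks a 1% defect rate, 95% detection and 3% false positives from inside the quoted ranges. The function name and the batch size of 10,000 are made up for illustration.

```python
def inspection_outcomes(n_items, defect_rate, detection_rate, false_positive_rate):
    """Split a batch into caught defects, missed defects and false alarms.

    Assumption: false_positive_rate is the fraction of GOOD items that the
    detector flags as defective.
    """
    defective = n_items * defect_rate
    good = n_items - defective
    caught = defective * detection_rate
    missed = defective - caught
    false_alarms = good * false_positive_rate
    flagged = caught + false_alarms
    return {
        "caught": caught,
        "missed": missed,
        "false_alarms": false_alarms,
        # Share of flagged items that are really defective.
        "flagged_truly_defective": caught / flagged if flagged else 0.0,
    }


# Illustrative values chosen from within the ranges quoted in the talk.
result = inspection_outcomes(
    n_items=10_000,
    defect_rate=0.01,
    detection_rate=0.95,
    false_positive_rate=0.03,
)
```

    Under these assumptions the batch contains 100 defective parts, of which 95 are caught and 5 slip through, while 297 good parts are flagged by mistake. Roughly three out of four flagged parts are therefore good, so in practice the flagged stream may still need a cheap second check.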



    Another example of a digitization task is connecting equipment that has no digital interfaces. The green arrow points to a lever. This is a small frame from the workplace of a driller who controls drilling by switching levers. Drilling is an important, expensive process, a couple of million rubles a day. Yet none of it is registered: he switches levers and there is no record anywhere, or at best a note in a handwritten journal. And this information is critically important.



    This is a furnace that hardens wire. In this example the wire is made of gold. The furnace is about 25 years old. Pure gold goes in, is melted, drawn into a thin thread, heat-treated and comes out as solid material. It is known that sometimes this wire turns out reliable and all kinds of chains are woven from it, while sometimes a batch of wire ends up defective: it cracks and breaks while the chains are being woven. This seems to depend on the heat-treatment regime, given that the raw material varies slightly. The furnace has a data logger: on the right, outside the frame, there is a recorder that writes the parameters onto a roll of paper. There are three parameters: the temperature in the crucible where the gold is melted, the heating temperature, which is the operating mode of the furnace, and the speed at which the wire passes through.

    To understand what the defects are related to, and whether the furnace can be adjusted to reduce them, these parameters need to be digitized. How? The furnace has industrial connectors, but it was built 25 years ago, so it would be very expensive either to reverse-engineer a connection or to pay the furnace manufacturer, if that company is still in business, to provide one. Connecting such equipment to a process control system or MES can cost, for example, a million rubles, or perhaps hundreds of thousands. That is hard to justify when there are only two such furnaces, not a hundred.



    How can this problem be solved with the tools we talked about? The classic approach with OpenCV does not work in this case: there are too many highlights, the image is not sharp, and even a person can hardly tell what the digits are. OCR, that is, ready-made text recognition libraries, is not very suitable either.

    The second option is neural networks. In this case it works, but it involves many steps. You need to collect labeled data for training and testing, choose a network, train it, and then test the whole thing. I estimated the labor costs. The figures can be debated, it can be done faster or slower, but overall it comes to about 72 hours. At a good specialist's rate, this is what it can cost. And for that we get neither infrastructure nor software, only a tuned and tested network that recognizes these digits well.

    The advantage of this approach is that it works. The drawback is that no one is willing to implement it this way either. First you need to learn how to collect the data, and only then can you find out whether there really is a relationship between the data and the defects. If there is, you need to work out what to change and how in order to reduce the defect rate. And what if that turns out to be too much? Yet the pilot, the automation and the connection have to be paid for up front, and that takes at least this much, most likely more.



    Therefore, in our experience over the past three years, not a single project of this kind has been sold. If it is pipe defect inspection where a person stands at the line, the person is much cheaper. If it is something complicated, the risks are too great for the customer.

    Conclusion: this needs to be turned into a product.

    The machine learning market worldwide is now moving strongly towards productization: all sorts of AutoML solutions that partly replace the data scientist, and finished products or solutions for specific applications. The simplest example is recommendations in e-commerce. Products have long existed that take data in a standard format and produce recommendations on their own.

    We tried to do the same in computer vision: to offer a product that automates the connection of old equipment by reading its indicators, such as numeric displays, dial gauges and others, and reduces the manual labor involved by an order of magnitude.



    The first task is to reduce setup costs. Once the camera is installed, we must let people mark the zone of interest: for example, draw a rectangle like this one and say “I want to recognize what is in this zone”.
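
    Marking a zone of interest amounts to storing one rectangle per indicator and cropping each frame to it before recognition. The following is a minimal sketch of that idea and not the product's actual configuration format: the `Zone` fields, the zone name and the toy 8x6 frame are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangle drawn by the operator, in pixel coordinates."""
    name: str
    left: int
    top: int
    width: int
    height: int


def crop(frame, zone):
    """Return the pixels of `frame` (a list of rows) that fall inside `zone`."""
    if (zone.left < 0 or zone.top < 0
            or zone.top + zone.height > len(frame)
            or zone.left + zone.width > len(frame[0])):
        raise ValueError(f"zone {zone.name!r} does not fit inside the frame")
    rows = frame[zone.top:zone.top + zone.height]
    return [row[zone.left:zone.left + zone.width] for row in rows]


# Toy frame where each pixel value encodes its own position (10 * y + x).
frame = [[10 * y + x for x in range(8)] for y in range(6)]
zone = Zone(name="temperature_display", left=2, top=1, width=3, height=2)
patch = crop(frame, zone)
```

    Only `patch` would then be passed to the recognition step, so the operator's single rectangle replaces any per-site coding.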



    The next issue is that all tasks are different, and for each particular place some neural networks have to be trained so that they work well there.



    We know that there are different neural networks. Take digits and text: many smartphones have automatic translators, you point the camera at any text and it more or less starts translating it, regardless of font or angle. Such solutions exist, which means it is possible to train a network on display panels that will work well with any display panel. But it has drawbacks: it will be large and heavy, it will run slowly, and since it is universal, quality on any particular task will suffer. Therefore we used an approach called Tutor-Student. The solution ships with a set of powerful networks for specific kinds of task, for example separately for text, for levers and for dial gauges; there are not many types of such devices. This system runs on its own and recognizes what it can, then gives the operator the chance to add to the labeling: to look through the results by eye and correct the 3-5% of errors he sees. Then a lightweight network, adapted to the customer's specific task and data, is trained on the labels produced by this quick method. This approach significantly reduces the cost of implementation while keeping quality almost the same as if the work had been done by hand.
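
    The flow of the Tutor-Student approach (a heavy model labels the data, the operator fixes the few errors, a small model is trained on the result) can be shown with toy stand-ins. Everything below is hypothetical: `teacher_label` is a hand-written rule standing in for the powerful network, the two-number feature vectors stand in for images of a lever, and a nearest-centroid classifier stands in for the lightweight network. It illustrates the data flow only, not the team's implementation.

```python
def teacher_label(features):
    """Stand-in for the heavy pretrained network; mostly right, sometimes wrong."""
    return "on" if features[0] > 0.82 else "off"


def build_training_set(samples, teacher, corrections):
    """Label every sample with the teacher, then apply the operator's fixes."""
    labels = [teacher(sample) for sample in samples]
    for index, fixed_label in corrections.items():
        labels[index] = fixed_label
    return list(zip(samples, labels))


def fit_student(training_set):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(student, features):
    """Return the label whose centroid is closest to `features`."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(student, key=lambda label: squared_distance(student[label]))


# Hypothetical lever observations; the teacher mislabels sample 3 as "off".
samples = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
corrections = {3: "on"}  # the operator's manual fix after reviewing the output
training_set = build_training_set(samples, teacher_label, corrections)
student = fit_student(training_set)
prediction = predict(student, (0.7, 0.7))
```

    The operator touches only the entries in `corrections`; all other labels come from the teacher, which is where the saving in manual labeling comes from.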



    The lightweight network is needed because not every enterprise is able to connect its video cameras to a video management system (VMS). Where there is a VMS, everything can be done on a server, where resources are limited only by cost. Otherwise there are chips built into the camera, such as Nvidia Jetson, and standalone devices. In particular, our solution runs on an Orange Pi, a single-board computer similar to the Raspberry Pi, and delivers 8–10 frames per second with a Full HD image as input.



    Next comes another product feature. All this data has to be sent somewhere, so the product comes with a set of standard connectors out of the box.



    Let's sum up. Productization of this kind brings machine learning and computer vision to the mass market through low price and low implementation costs, without expensive specialists and data scientists. I think this is the future, including in industry.
