Why do office demonstrations of video analytics look so different from real-world operation?
In this article we will talk about the dominant segment of the video analytics market, which today goes by the name of intelligent video surveillance.
By sheer scale alone, this direction has earned the label “classical”. Moreover, Intel stood at its origins, and that is a classic in its own right. It is on Intel's open computer vision library, OpenCV, that video surveillance developers still build their products. As a point of pride, I should say that the programmers in this field were Russian and were based in Russia, in Intel's Nizhny Novgorod branch. Why “were”? The direction was closed several years ago, and the people dispersed to other firms. Apparently, Intel was the first to sense the futility of its “classic”.
Nevertheless, its cause lives on and is actively developing. Only the laziest developers of surveillance systems have not used OpenCV in their “smart” code. And after its death this library works wonders! According to many sellers of CCTV systems, it spots criminal incidents, detects fights, identifies abandoned and removed objects, finds extremists ... And people fall for it. Billions of rubles are poured into such tasks under projects such as Safe City, Security on the Metro, Operation Anti-Terror, etc. But that is more a matter of politics; we will talk about the technology and why this attractive exhibition wrapper cannot work in practice.
Experts call this direction “hard” because the algorithms of such video analytics rely on precisely set parameters and a fixed order of actions: cross a certain virtual line, exceed a detected area, leave an object ... There is another direction (not Intel's), flexible video analytics, whose operation is not tied to formalized tasks, but we will talk about it next time.
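To make the idea of a “hard” rule concrete, here is a minimal sketch of a virtual-line check. It is an illustration only, not code from any particular product: the function names, the centroid track and the line coordinates are all assumptions, and for simplicity the check treats the line as infinite rather than as a segment.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side_of_line(p: Point, a: Point, b: Point) -> float:
    """Signed test: positive on one side of the line a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossed_line(track: List[Point], a: Point, b: Point) -> bool:
    """True if two consecutive track points lie on strictly opposite sides."""
    for prev, curr in zip(track, track[1:]):
        if side_of_line(prev, a, b) * side_of_line(curr, a, b) < 0:
            return True
    return False


# Hypothetical centroid track of one detected object, in pixel coordinates.
track = [(10.0, 50.0), (30.0, 52.0), (55.0, 49.0), (80.0, 51.0)]
# Hypothetical virtual line: vertical, at x = 50.
line_a, line_b = (50.0, 0.0), (50.0, 100.0)

print(crossed_line(track, line_a, line_b))
```

The rule itself is trivial. Everything depends on whether the centroid track fed into it really belongs to one object, and that is where the problems described in the rest of this article come from.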
Classical “hard” video analytics is based mostly on an object detector that localizes closed regions of detected change, grouping them by common signs that they belong together. But so far there are no principles for clearly distinguishing people from dogs, cats from cars, or a tree branch from a lawn mower. Unfortunately, all this works well only in ideal laboratory conditions, where the demonstrators try to sidestep such slippery points as:
1. The video detector is based on contrast. Areas that merge with the background are not analyzed at all. This means the main parameters of the object of interest cannot be predicted.

The first camera sees a person against a dark background, so only the white shirt is detected; the rest of the body merges with the background and is unavailable for analysis. Given the lighting problems as well, it is practically impossible to distinguish darker from dark, or less dark from dark, because the difference is at the noise level.

The second camera sees a person against a white background, so only the dark head and dark trousers are detected. The white shirt is not taken into account at all, because it gives the detector no information. As a result, this camera will generally see several objects instead of one person.
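The two-camera example can be reproduced with a toy contrast detector. This is a simplified sketch, not how any specific library works: it thresholds a single column of grey levels against a known background value, and the pixel values, the threshold and the function names are all invented for illustration.

```python
from typing import List, Tuple


def detect_blobs(column: List[int], background: int, threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) index runs where |pixel - background| > threshold."""
    blobs = []
    start = None
    for i, value in enumerate(column):
        is_foreground = abs(value - background) > threshold
        if is_foreground and start is None:
            start = i
        elif not is_foreground and start is not None:
            blobs.append((start, i - 1))
            start = None
    if start is not None:
        blobs.append((start, len(column) - 1))
    return blobs


# Hypothetical grey levels (0 = black, 255 = white) down one image column:
# dark head (2 px), white shirt (4 px), dark trousers (4 px).
person = [40, 40, 240, 240, 240, 240, 50, 50, 50, 50]


def scene(background: int) -> List[int]:
    """Place the person between two background pixels above and below."""
    return [background] * 2 + person + [background] * 2


dark_bg, white_bg = 30, 245
print(detect_blobs(scene(dark_bg), dark_bg, threshold=60))    # [(4, 7)]: shirt only
print(detect_blobs(scene(white_bg), white_bg, threshold=60))  # [(2, 3), (8, 11)]: head, trousers
```

The same person yields one small blob in front of the dark wall and two separate blobs in front of the white one, and neither result has the outline of a human.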
2. It is practically impossible to reliably filter out something like a shadow: it takes far too many forms and constantly follows every one of us.

As a result, the proportions of the target are distorted, and the computer does not understand that it is a person.
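Here is a rough sketch of how a shadow breaks a proportion rule. The masks are drawn by hand, and the rule “height is at least twice the width” is an assumed stand-in for whatever proportion test a real system uses.

```python
from typing import List, Tuple


def bounding_box(mask: List[str]) -> Tuple[int, int]:
    """Return (width, height) of the box enclosing all '#' cells."""
    rows = [r for r, line in enumerate(mask) if "#" in line]
    cols = [c for line in mask for c, ch in enumerate(line) if ch == "#"]
    return max(cols) - min(cols) + 1, max(rows) - min(rows) + 1


def looks_like_person(mask: List[str], min_ratio: float = 2.0) -> bool:
    """Assumed rule: a standing person is at least min_ratio times taller than wide."""
    width, height = bounding_box(mask)
    return height / width >= min_ratio


# Hand-drawn foreground mask of a standing person ('#' = detected pixel).
person = [
    ".##......",
    ".##......",
    ".##......",
    ".##......",
    ".##......",
    ".##......",
]
# The same person with a shadow stretching along the ground from the feet.
person_with_shadow = person[:4] + [
    ".#######.",
    ".########",
]

print(bounding_box(person), looks_like_person(person))
print(bounding_box(person_with_shadow), looks_like_person(person_with_shadow))
```

Without the shadow the box is 2 by 6 and passes the rule; with the shadow attached it becomes 8 by 6 and fails, although the person has not changed.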
3. Intersecting targets throw the machine's “mind” into complete chaos. Today's algorithms simply cannot determine whether they are looking at two people, one, or five.

4. Group targets are indistinguishable in their detected shape from unrelated objects: for example, several people and a car.

5. The “object size” parameter, which video analytics demonstrators rely on when proving the ability to distinguish people from cars, is fundamentally unsuitable for 2D video surveillance.

Which is bigger: a bird or a car?
6. You often hear this touted as an achievement: “But we track with several cameras at once!” This should arguably sound like a drawback, because the cameras see the object differently.
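The bird-versus-car question can be checked with simple pinhole-camera arithmetic. This is a sketch under assumed numbers: the focal length in pixels, the object sizes and the distances are all made up for illustration.

```python
def apparent_size_px(real_size_m: float, distance_m: float, focal_px: float = 800.0) -> float:
    """Pinhole projection: image size is proportional to real size / distance."""
    return focal_px * real_size_m / distance_m


# Hypothetical scene: a bird close to the lens, a car far down the street.
bird_px = apparent_size_px(real_size_m=0.25, distance_m=0.5)
car_px = apparent_size_px(real_size_m=4.5, distance_m=12.0)

print(bird_px, car_px)
print("bird looks bigger" if bird_px > car_px else "car looks bigger")
```

With these numbers the bird covers more pixels than the car. Without knowing the distance, a single 2D image gives the detector no way to tell which object is actually larger.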

- The first sees a dark, smooth back of the head; the second sees a bright face with a long protrusion, the nose.
- The first sees a large object because the person is walking toward it; the second sees a small one because the person is walking away. Two-dimensional vision has no sense of perspective.
- The first sees the inscription on the front of the shirt, “Sport”; the second sees the one on the back, “Rest”.
- The first sees a swaying branch above the person's head, which merges with the head in perspective. The second sees a fly sitting in front of the camera that looks the size of an elephant (because it is closer).
In general, the list of reasons why “hard” video analytics is impossible in practice is long, but it has one very interesting feature: these problems are easy to hide with a prepared show.
A chosen uniform background, a chosen contrasting suit, predetermined actions with non-intersecting targets, no interference from bushes, trees, precipitation, glare ... All this is easy to arrange in your own office, and then video analytics turns into a miracle!
PS: We have only talked about the “classics” that their creator buried long ago and whose name is used in many lucrative projects. But there are living video analytics algorithms on the market; we will talk about their advantages and disadvantages in the next article.
It is possible that the corpse will someday rise again, perhaps with new types of computers or X-ray surveillance systems. I would like that, because Intel was most likely not the first: Russian engineers from other Russian companies came to its Computer Vision laboratory in Nizhny Novgorod and stood at the origins of video analytics. In fact, this is a Russian invention. And it is a pity that we have to write articles like this. But is that any reason to deceive other Russians, who are still buying stale, heavily advertised processed meat?