Two opposing schools of VIDEO ANALYTICS, “rigid” and “flexible”: which is stronger?

    Reducing redundant video information is an extremely pressing problem for today's video surveillance, which produces more data than a person can digest. Everyone solves it differently, though: some search for the important moments, others filter out the insignificant ones. Which is more effective?


    In previous articles, as soon as I touched on this topic, I immediately got drawn into a debate with the defenders of video analytics. Even the founders of classical video analytics, former Intel programmers, stated their positions on this issue, for which many thanks! There are many people on this portal who are considered luminaries in this area, and it would be a sin not to take advantage of that. I think I will start with them. In this article I only outline the differences, and I hope for a discussion among professionals. Then we will see how events develop.

    Unfortunately, I can't afford to link to the “analysts'” websites, or I would be banned before the discussion even starts, so I will try to describe the basic concepts in my own words, with a little help from Wikipedia. Having studied a huge number of domestic and foreign companies, I can identify two distinct approaches to video analysis used in video surveillance to reduce the amount of information:

    1. Rigid video analytics is the classic approach, based on the good old Intel OpenCV library, which Intel itself no longer develops. For the most part, its foundation is an object detector: an algorithm that localizes, in a stream of video frames, closed regions that change according to certain criteria. We have already examined it using the example of Synesis. The video surveillance program then tries to analyze these “objects” in order to pick out useful targets among them: people, cars ... Once they are found, the main idea is to analyze their actions and movements and, ultimately, the resulting behavioral pattern, which can be interpreted in a social or criminal sense.


    2. Flexible video analytics is a younger field, which apparently originated on Russian home soil. Wikipedia calls it video semantics and defines it this way: “Video semantics is a brief logical presentation of video information by decomposing it into semantic units (videos), each of which has its own complete meaning that differs from the previous and subsequent video segments. This is a special area of video analytics - the so-called flexible video analytics, which does not have hard parameters and precise formalization.”
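
    To make the “object detector” from point 1 less abstract, here is a minimal sketch of the general idea: compare two frames, mark the pixels that changed, and group them into closed regions. This is not any vendor's algorithm and not OpenCV code; the function names, the brightness threshold and the minimum area are my own assumptions, and frames are plain lists of grayscale values.

```python
from collections import deque

def changed_mask(prev, curr, threshold=30):
    """Mark pixels whose brightness changed by more than `threshold` (assumed value)."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

def find_regions(mask, min_area=2):
    """Group changed pixels into 4-connected regions and return their
    bounding boxes as (x_min, y_min, x_max, y_max)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(y, x)])
            seen[y][x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions smaller than min_area are treated as noise.
            if len(ys) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Toy scene: a flat background, then a 2x2 "object" and one noisy pixel appear.
prev_frame = [[10] * 6 for _ in range(5)]
curr_frame = [row[:] for row in prev_frame]
for y in (1, 2):
    for x in (2, 3):
        curr_frame[y][x] = 200
curr_frame[4][0] = 200  # single-pixel noise

boxes = find_regions(changed_mask(prev_frame, curr_frame))
print(boxes)
```

    Everything the rigid approach does afterwards (classifying the box as a person or a car, following it from frame to frame) sits on top of regions like these, which is why the threshold and minimum area matter so much.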

    Offhand, even after a second reading, the first option appeals to me more personally. After all, what is needed is a clear and immediate statement of who is preparing an attack. Moreover, this is exactly what our “comrades” demand as they spend billions on intelligent video surveillance systems for Safe City programs and subways all over the country. The only frightening part is that the results are often negative. But let's leave politics to the politicians.

    So what makes these two approaches opposites? Judging by the wording, everything. The first look in the video stream for crime, or for actions of people (or cars) that pose a threat. The second deny that this is possible at all, appealing to theories of how the world is built. Apologies if I have misrepresented their descriptions, which usually begin with the claim that rigid video analytics is impossible in principle. Oddly enough, I started my own articles about video analytics the same way, but I based them only on specific examples from specific manufacturers. That does not mean, however, that I will end up declaring flexible video analytics the better one. If we are felling trees, let's fell them all; there is plenty of forest!

    Now, since I have already tipped the scales toward the rigid approach by saying it suits me better, I need to rebalance them by saying something for the other side: I like the word “flexible” more. It's prettier!

    So, the former formalize the behavior of objects (whether they actually can, I don't know), while the latter cannot (or perhaps do not want to). The first shout to the guard: look, a fight! The second: pay attention, something happened! Again, the first come across in a better light because they are easier to understand. The second, though, somehow sound more honest.

    The "rigid" look for the important; the "flexible" remove the unnecessary. After writing this phrase, I suddenly felt that there was no difference between them. Yet they regard each other as technological class enemies.

    We have already said that rigid video analytics is based on classifying objects: a person, a car, a cat ... But how does it look for crime? The vast majority of companies offer algorithms for detecting the crossing of virtual lines, crowding, and various patterns of target movement. That is, most often you need to know the "boundaries of what is permitted" precisely: specific places at the site whose crossing is either a crime or a reason for a check. We will talk about all of this later; for now we are only comparing the approaches. But in every case the rigid approach assumes that the ways and means of unauthorized actions are defined in advance.
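
    The “virtual line” rule is the easiest one to show in code. The sketch below is my own illustration, not taken from any product: it assumes the tracker already gives us a list of (x, y) positions for one target, and it uses the standard orientation test for segment intersection. The names `crosses` and `crossing_events` are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): tells on which side of
    the line a-b the point c lies."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def crosses(p_prev, p_curr, line_a, line_b):
    """True if the step p_prev -> p_curr strictly crosses the segment line_a-line_b."""
    d1 = _orient(line_a, line_b, p_prev)
    d2 = _orient(line_a, line_b, p_curr)
    d3 = _orient(p_prev, p_curr, line_a)
    d4 = _orient(p_prev, p_curr, line_b)
    # The endpoints of each segment must lie on opposite sides of the other one.
    return d1 * d2 < 0 and d3 * d4 < 0

def crossing_events(track, line_a, line_b):
    """Indices of the track points at which the virtual line has just been crossed."""
    return [i for i in range(1, len(track))
            if crosses(track[i - 1], track[i], line_a, line_b)]

# A virtual fence drawn by the operator, and two tracked targets.
fence = ((0, 5), (10, 5))
intruder = [(2, 1), (3, 3), (4, 6), (5, 8)]  # walks across the fence
passerby = [(0, 1), (5, 1), (9, 1)]          # walks alongside it

events_intruder = crossing_events(intruder, *fence)
events_passerby = crossing_events(passerby, *fence)
print(events_intruder, events_passerby)
```

    Note what the rule needs in order to work: somebody has to draw `fence` in the right place beforehand. That is exactly the “boundaries of what is permitted” assumption.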


    Supporters of flexible video analytics ridicule the very framing of the question with lines like “Do you know exactly how they are going to kill you?”. The flexible approach is not tied to anything and does not count anything; it hands security over entirely to the computer. And that phrase is worrying! So how does flexible video analytics protect people's peace of mind? According to Wikipedia, “video semantics tracks the characteristics of video content as a result of the analysis of statistical changes”, i.e. the basis is STATISTICS. Take, say, 1000 frames and check whether any of them contains something new and unusual, or whether the nature of their changes falls entirely within that of the previous 1000, or even the previous 100 000, frames. Suppose everyone has always walked straight along this road, and someone suddenly runs across the lawn. Or jumps where no one has jumped before. Or breaks into a sudden run ...


    Or crouches in the middle of the road, lies down, pulls a gun out of a pocket ... anything out of the ordinary. The only thing that bothers me here is one company's phrase, “pulled a gun or a scarf out of his pocket”: in other words, the threat is not formalized at all. But we won't put pressure on anyone yet.
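
    How might “falls within the previous 1000 frames” look in practice? I can only guess at what the manufacturers actually do, so the following is just one possible reading, under my own assumptions: the scene is divided into grid cells, the model counts how often targets have been seen in each cell, and a position is called unusual if its cell accounts for less than a small share of the history. The class name, the cell size and the 1% threshold are all invented for the illustration.

```python
from collections import Counter

class NoveltyModel:
    """Remembers where targets have been seen and flags rarely visited places."""

    def __init__(self, cell=10, min_share=0.01):
        self.cell = cell              # grid cell size in pixels (assumed)
        self.min_share = min_share    # below this share of history = unusual (assumed)
        self.counts = Counter()
        self.total = 0

    def _key(self, pos):
        return (int(pos[0] // self.cell), int(pos[1] // self.cell))

    def learn(self, pos):
        """Add one observed position to the history."""
        self.counts[self._key(pos)] += 1
        self.total += 1

    def is_unusual(self, pos):
        """True if this position's cell is rare in the accumulated history."""
        if self.total == 0:
            return False  # nothing to compare against yet
        return self.counts[self._key(pos)] / self.total < self.min_share

# History: 1000 observations of people walking along a road at y = 55.
model = NoveltyModel()
for i in range(1000):
    model.learn((i % 100, 55))

on_road = model.is_unusual((42, 57))  # same strip as everyone else
on_lawn = model.is_unusual((42, 5))   # nobody has ever been here
print(on_road, on_lawn)
```

    Notice that the model has no idea what a lawn is or why running across it might matter. It reports “something happened”, not “a crime”, which is precisely the honesty and the weakness discussed above.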


    By the way, in the rigid approach, every step of determining a target's class and its actions requires settings that are, in my opinion, quite complicated, and any disturbance of the camera view (from wind, vibration, etc.) or rearrangement of large objects in the scene causes it to malfunction. The flexible approach, some manufacturers claim, has no settings at all, which, judging by the logic of how it works, may well be true.

    Rigid video analytics, as we have already seen, is very sensitive to interference, especially outdoors. About the flexible kind, Wikipedia claims: “The absence of hard-set parameters and precise formalization protects against interference, since they are included in the general analysis and subtracted from themselves as a result of the difference in statistical changes.” Well, yes: if a spider sits on the camera, that spider will be in every frame, so in theory the statistics should not change at all. Unless another spider crawls in.
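
    The spider argument can be checked on a toy example. The sketch below is my own interpretation of “subtracted from themselves”, not the method of any particular product: each pixel keeps its own mean and spread over the history, and a new value counts as unusual only if it deviates by more than `k` of that pixel's own standard deviations. The parameters `k` and `floor` are assumptions.

```python
from statistics import mean, pstdev

def pixel_stats(history):
    """Per-pixel mean and standard deviation over a list of frames
    (each frame is a flat list of brightness values)."""
    columns = list(zip(*history))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]

def unusual_pixels(frame, means, stds, k=3.0, floor=2.0):
    """Indices of pixels deviating from their own history by more than k sigma.
    `floor` keeps perfectly static pixels from reacting to tiny sensor noise."""
    return [i for i, (v, m, s) in enumerate(zip(frame, means, stds))
            if abs(v - m) > k * max(s, floor)]

# Three pixels over 100 frames:
#   pixel 0 - plain background, always 100
#   pixel 1 - a spider sitting still on the lens, always 20
#   pixel 2 - a branch swaying in the wind, alternating 60 and 100
history = [[100, 20, 60 + 40 * (t % 2)] for t in range(100)]
means, stds = pixel_stats(history)

same_scene = unusual_pixels([100, 20, 100], means, stds)
second_spider = unusual_pixels([20, 20, 60], means, stds)  # pixel 0 goes dark
print(same_scene, second_spider)
```

    The motionless spider and the swaying branch are both absorbed into the statistics, while the second spider crawling over pixel 0 stands out. Whether real scenes are this kind to the method is a separate question.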

    Let me toss in something from the previous discussion about low-contrast targets. Say the villain crawls along in camouflage and blends into the terrain, yet he still has to be detected. To classify a human figure, an object detector needs more sensitivity and more contrast; otherwise it will pick up a lot of small scattered targets, and some parts of the camouflage will blend in completely (we are talking about seriously low contrast, after all). So in this respect rigid video analytics is probably inferior to flexible, for which classifying the target is unimportant in principle. But by how much? For now I am only putting this topic up for discussion; there is no conclusion yet.

    Another topic for discussion is the range of tasks being solved. For example, detecting crowding falls under both rigid and flexible video analytics. Both of them, allegedly, handle this task, only by different methods. So which one is more effective?


    There are still plenty of questions, but I will try not to torment you with the length of the article; we will discuss the rest in future ones. (If I don't get banned.)
