MACROSCOP November 1, 2017 at 15:30

Deep counting: how do 3D technologies help count people and make life easier?

Let's get acquainted

Our team develops intelligent software for IP video surveillance systems. Over the 9 years of the company's existence, we have created dozens of video analysis functions and modules, faced hundreds of problems, and won no fewer victories. In our Macroscop blog, we will talk about some of them, share our view of the development process, and reveal some of our technologies.

"Get to the point"

A few years ago, we realized that one of our functions, interactive search, was unpopular because users never contacted the company with questions or problems about it. When the tech support phone is silent, that is a bad sign for developers.

With the visitor counting module, it was the other way around. Users did not just buy and install it, they really used it! As a result, they regularly called technical support with questions, sent feature requests, told us about non-standard ways of using it, and, of course, shared the difficulties they ran into. The counting accuracy was high (more than 92%), but only when the camera was installed correctly, the shooting conditions were good (no flare, glare, etc.), and the module was painstakingly tuned.

frame from the movie “The Kid Who Counted to Ten”

In our assessment, users of intelligent modules care most about accuracy and ease of use. When we adopted the concept of developing the Macroscop product through simplification two years ago, one of the local milestones of that simplification was the redesign of the popular visitor counting module.

But first things first…

Traditionally, people are counted on video using either tracking technology or the optical flow method.

Tracking builds the trajectories of moving objects, and the counter records the direction in which each trajectory crosses the virtual entry/exit line. Trajectories can be built in several ways:

1. By analyzing a sequence of frames in which moving objects are present. In general, several moving objects may be present in one frame, so the program needs not only to build trajectories but also to tell the objects and their movements apart. When moving objects cross the entry/exit line one at a time, counting is easy: the task reduces to determining the direction in which the line is crossed.

The counting method based on the simplest tracking implementation can solve this task by analyzing foreground (moving) objects in two consecutive frames. First, areas of motion that differ from the background image are extracted in the current and previous frames. Then, from the speed, direction of movement, and size of the objects, the program calculates the probability that an object moved from a trajectory point in the previous frame to a given point in the current frame. The most probable movements of each object add up to its trajectory.

2. In general, people in the frame can move in different ways: their paths can intersect or overlap, and the motion zones of separate objects can merge into one area. In this case, the program needs to identify each object, split the groups of objects, and correctly count the people crossing the virtual line in each direction.

In these cases, building an exact trajectory for each object becomes harder, and constructing the trajectory from two frames is no longer suitable because it produces a large error. Instead, the program analyzes a sequence of frames and continuously post-processes the results: it builds graphs of the transitions of objects from one state (position) to another and analyzes speed, direction of movement, position, and color characteristics. The result is a set of the most probable movements of the object, which forms its trajectory.
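The two-frame variant described above can be illustrated with a minimal sketch. It is not the Macroscop implementation: it assumes that foreground objects have already been reduced to centroid coordinates, it replaces the probability model with plain greedy nearest-neighbor matching, and it assumes a horizontal counting line. The names `match_objects`, `count_crossings`, `max_dist`, and `line_y` are hypothetical.

```python
from math import hypot

def match_objects(prev, curr, max_dist=50.0):
    """Greedily pair centroids from the previous and current frame by distance.

    Assumption: the closest pair is the most probable movement. A real tracker
    would also weigh speed, direction, and object size.
    """
    candidates = sorted(
        (hypot(p[0] - c[0], p[1] - c[1]), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    used_prev, used_curr, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # all remaining candidates are even farther apart
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        pairs.append((prev[i], curr[j]))
    return pairs

def count_crossings(pairs, line_y):
    """Count movements across the horizontal line y = line_y, by direction."""
    entered = exited = 0
    for (_, y0), (_, y1) in pairs:
        if y0 < line_y <= y1:
            entered += 1  # moved down the image: treated here as "in"
        elif y1 < line_y <= y0:
            exited += 1   # moved up the image: treated here as "out"
    return entered, exited

prev_frame = [(100, 90), (300, 120)]
curr_frame = [(104, 105), (298, 95)]
pairs = match_objects(prev_frame, curr_frame)
print(count_crossings(pairs, line_y=100))  # prints (1, 1)
```

Greedy matching of this kind breaks down exactly where the text says it does: once paths intersect or motion zones merge, the nearest centroid is no longer a reliable guess, which is why longer frame sequences and post-processing are needed.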

To increase counting accuracy, methods for separating people within groups are also used. This can be done by estimating the area of a group or by detecting and counting people's heads.
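The area-based variant can be sketched in a few lines. This is only an illustration under the assumption that a typical per-person area in pixels is known from calibration; `estimate_people` and `person_area` are hypothetical names.

```python
def estimate_people(blob_area, person_area):
    """Estimate how many people a merged motion blob contains from its area."""
    if blob_area <= 0 or person_area <= 0:
        return 0
    # Round to the nearest whole person, but never report fewer than one
    # for a blob that actually exists.
    return max(1, int(blob_area / person_area + 0.5))

print(estimate_people(5200, 2000))  # prints 3
print(estimate_people(1500, 2000))  # prints 1
```

An estimate like this depends heavily on perspective and on how much people overlap, which is one reason the per-person area usually has to be tuned for each camera position.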

Tracking-based counting is most accurate when people in the frame overlap as little as possible. In real systems, this can often be achieved only by installing the camera above a narrow passage (an entrance door, an escalator, etc.).

Visitor counting based on optical flow analysis
Whereas tracking-based counting finds an object in the video stream and follows its movements, this method watches the virtual entry/exit line and analyzes how pixels move across it. The method follows an area of a certain brightness and color as it passes through the line and calculates image features (edges, corners, key points, texture information, etc.). It only registers the fact that some object is moving through the line; it does not determine what kind of object it is or how many people it contains. To determine the number of people crossing the line, head detection and analysis of the moving object's area are also used.

This method is applicable to a dense flow of people, where traditional tracking methods are unsuitable. The most accurate result is achieved when the density of the flow is approximately uniform.
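One way to picture line-based counting is to integrate the flow through the line over time and convert the swept area into a number of people. The sketch below is an assumption-laden illustration, not the module's algorithm: it assumes that some optical flow algorithm has already produced, for every frame, the velocity component perpendicular to the line at each pixel of the line, and that a typical per-person area is known. The names `count_by_flow`, `line_velocities`, `person_area`, and `min_speed` are hypothetical.

```python
def count_by_flow(line_velocities, person_area, min_speed=0.5):
    """Integrate signed flow across the line and convert swept area to people.

    line_velocities: one list per frame with the velocity (pixels per frame)
    perpendicular to the line at each pixel of the line; positive means "in".
    """
    area_in = area_out = 0.0
    for frame in line_velocities:
        for v in frame:
            if v >= min_speed:
                area_in += v    # a line pixel moving at v sweeps v pixels of area
            elif v <= -min_speed:
                area_out += -v
    to_people = lambda area: int(area / person_area + 0.5)
    return to_people(area_in), to_people(area_out)

# A 20-pixel-wide object crossing at 4 pixels per frame for 25 frames
# sweeps 20 * 4 * 25 = 2000 pixels of area through the line.
frames = [[4.0] * 20 + [0.0] * 30 for _ in range(25)]
print(count_by_flow(frames, person_area=2000))  # prints (1, 0)
```

The conversion from area to people only holds when each person contributes roughly the same area, which matches the observation that the method is most accurate for a flow of approximately uniform density.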

We implemented both the first and the second method in our counting module. Depending on the conditions, the user can choose the most suitable mode of operation. If the shooting conditions are close to ideal (from the point of view of counting), the setup does not require much effort or time; if they are not, the administrator has to rack their brains.

As a result, counting visitors in complex scenes worked proportionally: the more effort a user put into setting up the module, the more accurately it worked. This did not fit our general concept.

So we set out to find new solutions for counting visitors.

New 3D visitor counting module

The new module is implemented in a fundamentally different way. Whereas the earlier counting used data in two dimensions, the new one adds a third: depth (the distance from the video camera to the person). Counting is now not just a module but a hardware and software system made up of a special device, a depth sensor, and a software data processing module. The sensor measures the distance from the device to objects by emitting and receiving IR signals and builds a depth matrix, which is what the program works with.

Depth gives the height of whoever crosses the entry/exit line and makes it possible to distinguish people from other objects. The user sets the minimum visitor height in the settings, and the system counts everyone of that height or taller.
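The relation between depth and height can be sketched as follows, assuming the sensor is mounted overhead and looks straight down, so that height is simply the distance to the floor minus the measured depth. The functions `heights_from_depth` and `tall_enough_mask` and their parameters are hypothetical names used for illustration only.

```python
def heights_from_depth(depth_map, floor_distance):
    """Convert sensor-to-surface distances (metres) into heights above the floor."""
    return [[max(0.0, floor_distance - d) for d in row] for row in depth_map]

def tall_enough_mask(depth_map, floor_distance, min_height):
    """Mark the cells whose height reaches the configured minimum visitor height."""
    heights = heights_from_depth(depth_map, floor_distance)
    return [[h >= min_height for h in row] for row in heights]

# Sensor 3.0 m above the floor: bare floor, an adult's head, a shopping cart.
depth_row = [[3.0, 1.3, 2.4]]
print(tall_enough_mask(depth_row, floor_distance=3.0, min_height=1.2))
# prints [[False, True, False]]
```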

For the user, 3D visitor counting is extremely simple: only two settings are needed, the height and the entry line.

Its results are practically independent of the conditions in which the counting takes place (unless you deliberately invent some very complex floor relief).

It is highly accurate: 98.5% in real conditions with real users (and not in "greenhouse" laboratory conditions, where programmers often like to test). This accuracy comes from the fact that the module works not with a picture but with a three-dimensional map. In addition, we implemented several technologies to solve a number of key counting tasks:

Separating people. When people stand close to each other, their contours at a given height can merge into one. To avoid "losing" a person, we "cut" the depth map into layers and obtain multilayer contours of objects. A contour with no nested contour inside it corresponds to the top of a person. We count the tops.
Determining people's trajectories. Tracking is used for this, but a completely new one that takes into account the specifics of the depth data.
Processing the depth map. We obtain depth data by evaluating the infrared signals emitted by the device and reflected from surfaces. Different surfaces reflect the rays differently, so in some cases the map comes out with "holes". We created an algorithm that fills in the map based on values in known areas.
Angle compensation. To take as much work as possible off video system installers, we implemented an algorithm that accounts for the deviation of the counting device from the horizontal and corrects the depth values accordingly.
Automatically determining the distance to the floor. Solving this removes the need to measure the mounting height precisely and enter it in the module settings. It, too, is aimed at making counting more convenient to work with.
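The layer-cutting idea from the first task can be illustrated with a small sketch. It is a simplified reading of the description, not the actual algorithm: it assumes the depth map has already been converted into a grid of heights above the floor, slices it at fixed steps, and counts every connected region that contains no region from a higher layer. The names `components`, `count_tops`, and `layer_step` are hypothetical.

```python
def components(mask):
    """Return the 4-connected regions of True cells as a list of cell sets."""
    rows, cols = len(mask), len(mask[0])
    seen, regions = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            stack, region = [(r, c)], set()
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions

def count_tops(height_map, min_height, layer_step):
    """Count contours that have no contour from a higher layer nested inside."""
    top_height = max(max(row) for row in height_map)
    levels = []
    level = min_height
    while level <= top_height:
        levels.append(level)
        level += layer_step
    tops, higher_cells = 0, set()
    for level in reversed(levels):          # walk from the highest layer down
        mask = [[h >= level for h in row] for row in height_map]
        regions = components(mask)
        for region in regions:
            if region.isdisjoint(higher_cells):
                tops += 1                   # nothing nested inside: a new top
        higher_cells = set().union(*regions) if regions else set()
    return tops

# Heights in centimetres: two people standing shoulder to shoulder.
height_map = [
    [0, 150, 150, 150, 140, 160, 160, 160, 0],
    [0, 150, 170, 150, 140, 160, 180, 160, 0],
    [0, 150, 150, 150, 140, 160, 160, 160, 0],
]
merged = components([[h >= 120 for h in row] for row in height_map])
print(len(merged))                          # prints 1
print(count_tops(height_map, 120, 10))      # prints 2
```

At the minimum height of 120 cm the two people form a single contour, while the layered pass finds two separate tops. A practical version would also need to ignore tiny regions caused by noise, raised hands, or bags, which this sketch does not attempt.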

“Be simpler and people will reach for you ...”

The new 3D visitor counting is much simpler than the traditional module, both in terms of the technologies implemented in it and in terms of how users work with it. Moreover, it is significantly more accurate and less "fussy" about shooting conditions.

When we brought real 3D counting to users at real sites, what impressed them most was not even the high accuracy (98.5% in real conditions) but this very simplicity and the almost complete absence of settings. This reassures us once again in our desire to develop the product by balancing simplicity and functionality, and it refutes the stereotype that a cool product has to be sophisticated and complex.
