Detection of palms and fingers in an image
Over time, our ideas about how to interact with a computer change. Touchpads and touch screens have firmly entered our lives as replacements for the "classic" keyboard and mouse, but this is not the last stage in the evolution of input devices. With the advent of augmented reality devices such as Google Glass, there is a need for interfaces that fit harmoniously into this concept. The prerequisites for such interfaces already exist in devices like the Intel Creative Camera, Microsoft Kinect, and Leap Motion, where the main control elements are the user's hands. Therefore, one of the fundamental algorithmic tasks for interacting with such devices is detecting the user's hands and fingers and reconstructing their spatial location.
This article will focus on one of the methods for solving the problem of detecting palms and fingers.
Formulation of the problem
By detection of hands and fingers, we mean finding points from which the position of the palm in the image plane and its orientation can be reconstructed. As such points, it is natural to use the center of mass of the palm and the points describing the fingertips.
Algorithm description
Consider some contour that describes the silhouette of the palm:
Search for a reference point of the palm
First, we define a point that serves as a descriptor of the palm. As mentioned above, we will use the center of mass of the contour as such a point. To find it, we need to calculate the spatial moments. A moment is a contour characteristic computed by integrating (or summing) over all the pixels of the contour. In general form, the moment of order (p, q) can be written as:

$$m_{pq} = \sum_{x} \sum_{y} x^p \, y^q \, I(x, y)$$

where $I(x, y)$ is the intensity of the pixel at $(x, y)$ (for a binary silhouette, 1 inside the contour and 0 outside).
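For a binary silhouette, this summation is easy to write out directly. A minimal sketch (the function name rawMoment and the 0/255 mask format are illustrative assumptions):

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>

// Raw spatial moment m_pq of a binary silhouette mask (CV_8UC1, 0 or 255):
// m_pq = sum over all foreground pixels of x^p * y^q.
double rawMoment(const cv::Mat& mask, int p, int q)
{
    double m = 0.0;
    for (int y = 0; y < mask.rows; ++y)
        for (int x = 0; x < mask.cols; ++x)
            if (mask.at<uchar>(y, x) > 0)
                m += std::pow(x, p) * std::pow(y, q);
    return m;
}
```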
Then the coordinates of the center of mass can be written as:

$$x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}}$$

The approximate location of the center of mass is marked with a red dot in the image.
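In practice there is no need to sum the pixels by hand: OpenCV computes all spatial moments of a contour in a single call. A minimal sketch of obtaining the center of mass this way (the helper name contourCenterOfMass is ours):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Center of mass of a contour from its spatial moments:
// x_c = m10 / m00, y_c = m01 / m00.
cv::Point2f contourCenterOfMass(const std::vector<cv::Point>& contour)
{
    cv::Moments m = cv::moments(contour);
    if (m.m00 == 0.0)
        return cv::Point2f(0.f, 0.f);  // degenerate contour, no area
    return cv::Point2f(static_cast<float>(m.m10 / m.m00),
                       static_cast<float>(m.m01 / m.m00));
}
```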
Search for finger points
Now consider the parts of the contour corresponding to the fingers.
For each point P[n] of the contour, we will also consider the points P[n − r] and P[n + r], where r is some positive integer, small relative to the number of points in the contour.
As can be seen from the image, on the contour corresponding to the silhouette of the fingers, there can be 2 types of points:
1) Points lying on a straight line (these correspond to points along the fingers). The angle P[n − r] P[n] P[n + r] is obtuse.
2) Points lying on arcs (these correspond to the fingertips and the gaps between the fingers). The angle P[n − r] P[n] P[n + r] is acute.
We are interested in points of the second type, because they describe the tips of the fingers.
As is known from a course in mathematical analysis, the cosine decreases monotonically on $[0, \pi]$, so the smaller the angle, the larger its cosine. Therefore, as points describing the fingertips, we will look for points of type 2 with a locally maximal cosine of the angle P[n − r] P[n] P[n + r]:

$$\cos \theta_n = \frac{(P[n-r] - P[n]) \cdot (P[n+r] - P[n])}{\lVert P[n-r] - P[n] \rVert \, \lVert P[n+r] - P[n] \rVert}$$
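This cosine is convenient to compute through the dot product of the two neighbor vectors. A sketch of such a helper (the name angleCosine is illustrative; the indices wrap around because the contour is closed):

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Cosine of the angle P[n-r] P[n] P[n+r] at contour point n.
double angleCosine(const std::vector<cv::Point>& c, int n, int r)
{
    const int N = static_cast<int>(c.size());
    cv::Point u = c[(n - r + N) % N] - c[n];  // vector P[n] -> P[n-r]
    cv::Point v = c[(n + r) % N] - c[n];      // vector P[n] -> P[n+r]
    double dot  = double(u.x) * v.x + double(u.y) * v.y;
    double norm = std::sqrt(double(u.x) * u.x + double(u.y) * u.y) *
                  std::sqrt(double(v.x) * v.x + double(v.y) * v.y);
    return norm > 0.0 ? dot / norm : -1.0;    // -1 ~ straight-line fallback
}
```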
But, as can be seen from the figure above, type 2 points correspond not only to the fingertips but also to the gaps between the fingers. To determine whether a point is a fingertip, we use the traversal direction of the contour. If we walk the pixels of the contour clockwise, then at the fingertips the segment P[n] P[n + r] turns right relative to P[n − r] P[n], while at the points lying in the gaps between the fingers it turns left.
To determine whether the three points P[n − r], P[n], P[n + r] form a right turn, we can use the generalization of the cross product to two-dimensional space; the right-turn condition then looks like this:

$$u \times v = u_x v_y - u_y v_x < 0, \quad \text{where } u = P[n] - P[n-r],\; v = P[n+r] - P[n]$$

(the sign convention depends on the orientation of the coordinate axes; in image coordinates the y-axis points down).
Thus, we obtain the coordinates of the points corresponding to the fingertips.
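Putting the two tests together, the fingertip filter could look like the sketch below. It reuses the angleCosine helper from the previous snippet; the cosine threshold and the sign chosen for the turn test are assumptions, since the correct sign depends on the traversal direction the contour comes in and on the image y-axis pointing down. For brevity it thresholds the cosine instead of searching for its local maxima, so in practice a non-maximum-suppression pass over the candidates would follow:

```cpp
// z-component of the 2D cross product (B - A) x (C - B); its sign gives
// the turn direction at B when walking A -> B -> C.
double turnZ(const cv::Point& a, const cv::Point& b, const cv::Point& c)
{
    return double(b.x - a.x) * (c.y - b.y) - double(b.y - a.y) * (c.x - b.x);
}

// Keep contour points whose neighborhood angle is acute enough and where
// the contour makes a right turn (tips), discarding the gaps between fingers.
std::vector<cv::Point> findFingertips(const std::vector<cv::Point>& c,
                                      int r, double cosThresh = 0.5)
{
    std::vector<cv::Point> tips;
    const int N = static_cast<int>(c.size());
    for (int n = 0; n < N; ++n) {
        const cv::Point& a = c[(n - r + N) % N];
        const cv::Point& b = c[n];
        const cv::Point& d = c[(n + r) % N];
        if (angleCosine(c, n, r) > cosThresh && turnZ(a, b, d) < 0.0)
            tips.push_back(b);
    }
    return tips;
}
```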
Algorithm implementation
Generally speaking, the algorithm described above will work on the video stream of an ordinary webcam, but then there are problems with accurately separating the foreground from the background. To avoid them, an RGB-D sensor (Microsoft Kinect) was used: instead of background subtraction, it simply limits the working volume by a threshold cutoff in depth. In general, Kinect is not very well suited to this task, because its minimum working distance is about 40 cm, which imposes significant restrictions on placement. But it is still better than nothing. OpenNI was used as the driver for working with Kinect.
The OpenCV library was also used to simplify working with Kinect and to extract contours.
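A rough sketch of this preprocessing stage, assuming the depth map arrives as a 16-bit single-channel image in millimeters (the 800 mm cutoff and the function name are illustrative):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Threshold the depth map and extract silhouette contours.
// depthMm: CV_16UC1 depth image in millimeters (as delivered via OpenNI).
std::vector<std::vector<cv::Point>> extractHandContours(const cv::Mat& depthMm,
                                                        int maxDistMm = 800)
{
    cv::Mat fg;
    // Keep pixels with a valid reading (> 0) closer than the cutoff.
    cv::inRange(depthMm, cv::Scalar(1), cv::Scalar(maxDistMm), fg);

    std::vector<std::vector<cv::Point>> contours;
    // CHAIN_APPROX_NONE keeps every contour pixel, which matters here
    // because the algorithm walks the contour point by point.
    cv::findContours(fg, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_NONE);
    return contours;
}
```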
Experiment results
An example of a picture during the operation of the algorithm:
An example video of the algorithm in operation (no tracking is used; hands and fingers are detected anew in every frame):
Source Code: github.com/BelBES/HandDetector