Legs, wings... the main thing is the tail! The human body as seen by Intel RealSense


    A programmer's work is interesting in its variety. Depending on the problem at hand, you find yourself digging into the modeling of climate processes, the biology of cell division, or stellar physics... But it can also go the other way: a problem that looks perfectly ordinary at first glance opens up an abyss of nuances. Developers encountering Intel RealSense technology for the first time are probably surprised at how complex recognizing and tracking the position of a hand or a face turns out to be, given that our brain does it with almost no conscious effort. Which features of our anatomy have to be taken into account when designing natural interfaces, and how far have the creators of RealSense gotten along this path?
    At the end of the post there is an invitation to the Intel RealSense Meet Up in Nizhny Novgorod on April 24. Nizhny Novgorod, don't miss it!

    Look at your hands and try bending different fingers. Notice that their movements are coupled: bending one finger pulls others along with it. That is why tracking just two joints is enough to achieve realistic flexion of four fingers (all except the thumb). Only the index finger can be bent without bending the rest, so it needs its own joint-tracking algorithm. For the other fingers everything is simpler: bend the middle finger and the ring and little fingers bend too; bend the ring finger and the middle and little fingers follow; bend the little finger and the middle and ring fingers follow.
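
    A minimal sketch of this coupling (plain Python, not the RealSense SDK; the coupling factors are invented numbers purely for illustration):

```python
# Toy model of coupled finger flexion: bending one of the last three
# fingers drags its neighbours along, while the index finger is independent.
COUPLING = {
    "index":  {},                             # bends on its own
    "middle": {"ring": 0.7, "little": 0.5},   # assumed coupling factors
    "ring":   {"middle": 0.6, "little": 0.6},
    "little": {"middle": 0.4, "ring": 0.6},
}

def apply_flexion(finger, angle_deg):
    """Return the flexion angles (degrees) induced by bending one finger."""
    angles = {f: 0.0 for f in COUPLING}
    angles[finger] = angle_deg
    for neighbour, factor in COUPLING[finger].items():
        angles[neighbour] = max(angles[neighbour], angle_deg * factor)
    return angles

print(apply_flexion("ring", 60.0))    # middle and little fingers follow
print(apply_flexion("index", 60.0))   # the other fingers stay at 0 degrees
```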

    Let's continue studying our hands. The angle through which a given phalanx can bend depends on the length of that phalanx (not on the joint). The distal (top) phalanx of the middle finger bends through a smaller angle than the middle phalanx of the same finger, and the middle phalanx in turn bends less than the proximal (bottom) phalanx. Note also that tracking a child's hands is considerably harder than tracking an adult's: it is more difficult both to obtain data for small hands and to interpret it accurately.
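
    The same idea as a tiny sketch, with made-up per-phalanx limits that merely reproduce the ordering described above:

```python
# Assumed maximum bend angles; only the ordering distal < middle < proximal
# reflects the text, the numbers themselves are illustrative.
MAX_BEND_DEG = {"proximal": 90.0, "middle": 80.0, "distal": 60.0}

def clamp_bend(phalanx, requested_deg):
    """Limit a requested bend to what this phalanx can physically do."""
    return min(requested_deg, MAX_BEND_DEG[phalanx])

for p in ("proximal", "middle", "distal"):
    print(p, clamp_bend(p, 120.0))
```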

    Currently, RealSense technology can simultaneously track 22 joints in each of two hands (which, by the way, do not have to be a right and a left hand; they can belong to different people). In any case, the computer knows which hand it is looking at. An important step forward was the elimination of the calibration stage, although in some difficult cases (again, say, when a child is in front of the camera) the system will ask for an initial calibration. Beyond simply locating the hand's key joints, the system can also reconstruct parts of the hand that have left the camera's field of view or are poorly lit. And when conditions allow, the hand is separated from the background behind it, even if that background changes from time to time.
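
    To give a feel for what the application actually works with, here is a sketch of such a per-frame result; the data structures below are hypothetical and do not reproduce the actual SDK classes:

```python
# Hypothetical per-frame hand-tracking result: up to two hands, each with
# 22 labelled joints and a handedness flag reported by the tracker.
from dataclasses import dataclass

@dataclass
class Joint:
    name: str
    x: float
    y: float
    z: float            # depth
    confidence: float   # lower when the joint is occluded and reconstructed

@dataclass
class Hand:
    side: str           # "left" or "right"
    joints: list        # 22 Joint entries

def dump_frame(hands):
    for hand in hands:
        print(f"{hand.side} hand, {len(hand.joints)} joints")
        for j in hand.joints:
            marker = "" if j.confidence > 0.5 else " (reconstructed)"
            print(f"  {j.name}: ({j.x:.2f}, {j.y:.2f}, {j.z:.2f}){marker}")

dump_frame([Hand("right", [Joint("wrist", 0.10, 0.20, 0.45, 0.9)])])
```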

    The accuracy with which the positions of some parts of the hand are determined relative to others makes it possible to implement very interesting ways of conveying information. For example, you can use the relative openness of the palm, on a scale from 0 to 100 between a fully open hand and a clenched fist. You will agree that this is somewhat reminiscent of sign language. Incidentally, implementing actual sign language would open up another important and much-needed area for RealSense: the rehabilitation of people with disabilities. It is hard to imagine a more humane application of computer technology...
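
    Such an openness value can drive any continuous control. A sketch (the mapping to volume is an assumption for illustration, not part of the SDK; which end of the 0..100 scale means "fist" depends on the tracker's convention):

```python
# Treat the palm-openness value (0..100) reported by a hand tracker as a
# normalised continuous control, e.g. an audio volume level.
def openness_to_volume(openness, max_volume=1.0):
    openness = max(0, min(100, openness))   # clamp to the 0..100 range
    return max_volume * openness / 100.0

print(openness_to_volume(0))     # one end of the scale -> 0.0
print(openness_to_volume(100))   # the other end        -> 1.0
print(openness_to_volume(42))    # partially open       -> 0.42
```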

    Let's move on to gesture recognition. At the moment Intel RealSense supports 10 ready-made gestures; you can see them in the figure. Recognition can be static (a motionless pose) or active (a pose in motion). Naturally, nothing prevents switching from one mode to the other: a static open palm, for example, becomes an active gesture when you wave it. Recognizing gestures is an order of magnitude more complicated than simply tracking motion, because here you need not only to compute the positions of points but also to match movements against certain patterns. So training is unavoidable, and both sides have to learn: the computer needs to learn to detect your movements, and you need to learn to move correctly.
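
    As a rough idea of how an application consumes recognised gestures, here is a sketch of a simple dispatcher; the gesture names and actions are made up for illustration and are not the SDK's identifiers:

```python
# Map recognised gesture names to application actions.
ACTIONS = {
    "wave":      lambda: print("Hello!"),
    "thumbs_up": lambda: print("Confirmed"),
    "open_palm": lambda: print("Paused"),
}

def on_gesture(name):
    """Called whenever the recogniser reports a gesture."""
    action = ACTIONS.get(name)
    if action is None:
        print(f"Unhandled gesture: {name}")
    else:
        action()

on_gesture("wave")   # handled
on_gesture("fist")   # no handler registered, so just a notice
```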

    The clearer your gesture, the clearer it is to the machine. At first you may feel a certain psychological discomfort: in real "human" life we almost never mark our gestures out one by one, we produce them in a continuous stream. RealSense, however, needs an initial phase, a final phase and an extended action between them (the duration of the gesture, by the way, can also be used as a parameter). A dynamic gesture is delimited either by the motion itself or by time.
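
    A minimal sketch of that start/action/end structure, assuming the application receives separate start and end notifications for a gesture (the class and its methods are hypothetical):

```python
import time

class GestureTimer:
    """Treat a gesture as a start phase, an action, and an end phase."""

    def __init__(self):
        self._start = None

    def on_start(self):
        self._start = time.monotonic()

    def on_end(self):
        if self._start is None:
            return None                      # end without a start: ignore it
        duration = time.monotonic() - self._start
        self._start = None
        return duration                      # seconds; usable as a parameter

timer = GestureTimer()
timer.on_start()
time.sleep(0.2)                              # stand-in for the actual motion
print(f"gesture lasted {timer.on_end():.2f} s")
```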

    As you can see, natural interfaces leave plenty of room for misunderstanding. Gestures that we would call "similar" the computer will most likely interpret as identical, and designers should avoid such situations. Furthermore, the application must constantly make sure the person stays within the frame and, if necessary, issue a warning. The RealSense camera, with peculiarities of its own, adds plenty of extra nuances... but then, solving a simple problem would be no fun, right? So we solve the difficult one.
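
    A sketch of such an in-frame check, assuming hand coordinates normalised to [0, 1] and a made-up margin value:

```python
# Warn the user when the tracked hand approaches the edge of the frame.
MARGIN = 0.1   # assumed fraction of the frame treated as the "danger zone"

def boundary_warning(x, y):
    """Return a warning string if (x, y) is too close to the frame edge."""
    if x < MARGIN or x > 1 - MARGIN or y < MARGIN or y > 1 - MARGIN:
        return "Please move your hand back toward the centre of the frame"
    return None

print(boundary_warning(0.5, 0.5))    # None: hand well inside the frame
print(boundary_warning(0.95, 0.5))   # warning near the right edge
```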

    Next time, if the occasion presents itself, we will talk about face recognition. In the meantime, taking this opportunity, I would like to invite all Nizhny Novgorod programmers interested in Intel RealSense technology to an informal meeting with the company's specialists, to be held on Friday, April 24, at ul. Magistratskaya, 3. On the program: talks on the topic, answers to questions, hardware demonstrations and, of course, lively discussions; what would it be without them? Come along, it will be interesting.

    Articles from the Intel Developer Zone (IDZ) were used in preparing this post.
