
What awaits us in Microsoft Kinect 2.0?
The first version of Microsoft's Kinect, a motion-sensing game controller, took second place in Popular Mechanics magazine's Top 10 Most Innovative Technical Products of 2011. By February 2013, 24 million units had been sold. The first 8 million of those sold within 60 days of launch, which earned Kinect a Guinness World Records entry as the "fastest-selling consumer electronics device".
But time moves on, and the new version of Kinect is about to be released. I enjoyed a recent TechCrunch article about the improvements in the new Kinect (original title: “How Microsoft built the cameras in the upcoming Kinect”). Its author visited Microsoft's Mountain View campus specifically to get acquainted with the new controller, and also talked with the engineers who developed it.
Below are five reasons why the new Kinect has every chance of repeating the success of the first version.

Figure 1. View modes of the new Kinect.
Reason 1: The new Kinect turned out better than expected and is far ahead of its predecessor technically. A wider viewing angle, more pixels in the sensor and higher resolution make it possible to recognize the movement of a child's wrist from 3.5 meters away.
Reason 2: The second version adds several new viewing modes. Although these modes are not available to ordinary users, they can be extremely useful to application developers who want to track movement more accurately and reduce errors.
For example, the Infrared Vision Mode and new human body modeling tools can be used to track muscle movements and the relative orientation of body parts.
The Depth Image Mode works somewhat like radar: each of the sensor's roughly 220 thousand pixels records data independently. This makes it possible to build, for example, a surprisingly accurate and detailed map of the room.
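A depth image stores a distance for every pixel, and a common way to turn it into a 3D map of a room is to back-project each pixel through a pinhole camera model. The sketch below shows that standard technique only; it is not Microsoft's documented pipeline. The intrinsics `FX`, `FY`, `CX`, `CY` and the 512 × 424 frame size are hypothetical placeholders (the article gives only the approximate pixel count), and depth values are assumed to be in metres along the optical axis, with 0 meaning "no reading".

```python
from typing import List, Tuple

# Hypothetical pinhole-camera intrinsics for an assumed 512 x 424 depth sensor.
# Real values would come from the device's calibration; these are placeholders.
FX, FY = 365.0, 365.0   # focal lengths in pixels
CX, CY = 256.0, 212.0   # principal point (image centre)


def depth_to_points(depth_m: List[List[float]]) -> List[Tuple[float, float, float]]:
    """Back-project every valid depth pixel into a 3D point (x, y, z) in metres,
    in the camera's coordinate frame (y grows downward, like image rows).
    Pixels with depth <= 0 are treated as having no measurement."""
    points = []
    for v, row in enumerate(depth_m):          # v = row index
        for u, z in enumerate(row):            # u = column index
            if z <= 0:
                continue                       # skip pixels without a reading
            x = (u - CX) * z / FX
            y = (v - CY) * z / FY
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A tiny synthetic frame: all pixels empty except two readings.
    frame = [[0.0] * 512 for _ in range(424)]
    frame[212][256] = 2.0   # straight ahead, 2 m away
    frame[212][0] = 1.0     # left edge of the image, 1 m away
    for point in depth_to_points(frame):
        print(point)
```

Because every pixel is handled independently, this kind of loop maps naturally onto parallel hardware, which is one reason per-pixel depth sensing scales to whole-room maps.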
Reason 3: The new Kinect also adds light-invariant camera settings, meaning its output stays the same when the lighting changes. With these settings, Kinect produces the same result regardless of how the room is lit. In practice, this means Kinect can be used in the dark or in a room with light interference: for example, two spotlights aimed directly at the sensor will not affect its performance. The author of the article tested this right away and concluded that the feature really does work as the manufacturer promises.

Figure 2. Light-invariant settings for the new Kinect allow it to be used in the dark.
As a result, developers processing Kinect data do not need to worry about how the user is lit, and the data will not be distorted by a sudden change in room lighting, such as someone switching on the overhead lights.
The number of body joints Kinect can recognize has also increased significantly, which allows users' hand movements to be tracked more accurately.
Reason 4: The smallest object the first-generation Kinect could recognize was 7.5 cm across. The new Kinect recognizes objects as small as 2.5 cm while also offering a 60% wider viewing angle. The number of people it can track at once has grown as well, from 2 in the previous version to 6.
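To get a feel for what 2.5 cm means at a distance of 3.5 m, here is a rough back-of-the-envelope calculation of how many sensor pixels such an object covers. It assumes a depth sensor 512 pixels wide with a horizontal field of view of about 70 degrees. Neither figure is given in the article, and real recognition limits also depend on noise and processing, so this only indicates the order of magnitude.

```python
import math

# Assumed sensor parameters (not from the article).
SENSOR_WIDTH_PX = 512        # pixels across the image
HORIZONTAL_FOV_DEG = 70.0    # horizontal field of view, degrees


def angular_size_deg(object_size_m: float, distance_m: float) -> float:
    """Angle in degrees subtended by an object of the given width at the given distance."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))


def pixels_covered(object_size_m: float, distance_m: float) -> float:
    """Approximate number of pixels the object spans horizontally, treating
    the field of view as spread evenly across the sensor width."""
    deg_per_pixel = HORIZONTAL_FOV_DEG / SENSOR_WIDTH_PX
    return angular_size_deg(object_size_m, distance_m) / deg_per_pixel


if __name__ == "__main__":
    px = pixels_covered(0.025, 3.5)
    print(f"A 2.5 cm object at 3.5 m spans about {px:.1f} pixels")
```

Under these assumptions a 2.5 cm object at 3.5 m spans only about three pixels, which shows why extra resolution matters for tracking something as small as a wrist.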
Reason 5: The first Kinect was the fastest-selling consumer electronics device in history, and it helped sustain interest in the Xbox 360 even after the console had become outdated. Microsoft is launching the new Kinect together with the new Xbox One. Both will go on sale in the US in mid-November 2013 and will compete for customers with Sony's upcoming PlayStation 4.
Problems
Microsoft wanted to build a camera based on the “time of flight” principle, that is, on measuring the time it takes light to travel from its source to an object and back. Because this happens extremely quickly, the new Kinect has to process huge data streams in real time, which is a difficult engineering task in itself. Engineers from Microsoft's Israeli office and its Silicon Valley campus teamed up to turn the academic idea of time of flight into a commercial product.

Figure 3. The time-of-flight principle underlies the camera of the new Kinect.
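The time-of-flight principle comes down to one formula: light covers the distance to the object twice, so the distance is d = c·t/2, where t is the round-trip time and c is the speed of light. The short sketch below applies this formula in both directions and shows why the timing is so demanding. For an object 3.5 m away the round trip takes only about 23 nanoseconds, and a 1 cm change in distance shifts it by roughly 67 picoseconds. This is the textbook pulse-timing form of the idea. The article does not describe how Kinect actually measures the delay in hardware, so treat it only as an illustration of the principle.

```python
# Speed of light in vacuum, m/s (exact, by the SI definition of the metre).
SPEED_OF_LIGHT = 299_792_458.0


def distance_from_round_trip(t_seconds: float) -> float:
    """Distance to the object given the round-trip time of the light.
    The light travels there and back, hence the division by 2."""
    return SPEED_OF_LIGHT * t_seconds / 2


def round_trip_time(distance_m: float) -> float:
    """Round-trip time of light for an object at the given distance, in seconds."""
    return 2 * distance_m / SPEED_OF_LIGHT


if __name__ == "__main__":
    t = round_trip_time(3.5)
    print(f"Round trip to 3.5 m: {t * 1e9:.2f} ns")
    dt = round_trip_time(0.01)
    print(f"Timing difference for 1 cm: {dt * 1e12:.1f} ps")
    print(f"Distance for a 20 ns round trip: {distance_from_round_trip(20e-9):.3f} m")
```

Resolving centimetres therefore means resolving tens of picoseconds, for every pixel and every frame. This is why the data volume and processing load described below became the central engineering problem.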
By pooling its best minds, Microsoft solved the problem of data capture. However, other problems remained: the sheer volume of data, which threatened to overload the system, and the blurring of objects.
In short, two requirements had to be met at the same time:
- process about 6.5 million pixels every second;
- keep the load on the Xbox One low so that the console maintains its high performance.
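The figure of 6.5 million pixels per second is consistent with the roughly 220 thousand sensor pixels mentioned earlier, provided that one assumes a depth resolution of 512 × 424 pixels delivered at 30 frames per second. Both of these numbers are assumptions that do not appear in the article. The arithmetic below shows how the per-frame and per-second figures relate and what time budget each frame leaves.

```python
# Assumed depth-stream parameters (not stated in the article).
WIDTH_PX = 512
HEIGHT_PX = 424
FRAMES_PER_SECOND = 30

pixels_per_frame = WIDTH_PX * HEIGHT_PX                  # pixels in one depth frame
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND  # sustained pixel rate
frame_budget_ms = 1000 / FRAMES_PER_SECOND                # time available per frame
ns_per_pixel = 1e9 / pixels_per_second                    # average time per pixel if
                                                          # one unit handled them in turn

print(f"Pixels per frame:       {pixels_per_frame:,}")
print(f"Pixels per second:      {pixels_per_second:,}")
print(f"Time budget per frame:  {frame_budget_ms:.1f} ms")
print(f"Average time per pixel: {ns_per_pixel:.0f} ns")
```

With these assumptions the script reports 217,088 pixels per frame and 6,512,640 pixels per second, which matches both figures quoted in the article. It also shows that each frame must be fully processed in about 33 ms while leaving the console free for the game itself.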
At first, the development team was far from this goal. The engineers developed algorithms that reduce the processor load and process the image so that distant objects neither merge together nor blur when they move. According to the engineers, these software tasks became solvable thanks to calibrating the camera in advance: without calibrating the hardware first, the algorithms would have learned from imperfect or incorrect data. It is better to train the algorithms on the final data rather than on noisy or test data.
The new Kinect's hardware consists of several components: an aggregation unit collects the sensor data and splits it into separate streams, one for each signal component. Microsoft does not go into detail, but this is presumably a separate chip. Microsoft also declined to say where the data is “cleaned”. There is speculation that, to reduce input, this process takes place at least partly on the console itself.
The end result is a multi-component data stream that developers can use in applications for the Xbox or PC.
In Russia, the new console will not be available until next year.
Useful articles
- Original article (in English)