How We Developed a 3D People Scanning System Using Intel RealSense 3D Cameras and Intel Edison Technology

    Cappasity has been developing 3D scanning technologies for two years. This year we are releasing a scanning product for ultrabooks and tablets with an Intel RealSense camera, Cappasity Easy 3D Scan, and next year we will release hardware and software solutions for scanning people and objects.

    Because I am an Intel Software Innovator, and thanks to the Intel team that runs the program, we were invited to show our people-scanning prototype much earlier than planned. Although there was very little time to prepare, we decided to take the chance. In this article I will tell you how we built our demonstration for the Intel Developer Forum 2015, held in San Francisco on August 18-20.



    Our demonstration is based on our previously developed technology for combining depth and RGB cameras into a single scanning rig (US patent pending). The general principle is as follows: we calibrate the positions, tilts, and optical parameters of the cameras, which lets us merge their data for the subsequent reconstruction of a 3D model. To capture an object in 3D, we can place cameras around the subject, rotate the camera system around the object, or rotate the object in front of the camera system.
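    To illustrate the principle (a hedged sketch, not our production code): once each camera's intrinsics and its pose in a shared world frame are known from calibration, every depth frame can be deprojected into a point cloud and transformed into that common frame. The pinhole model and all names below are illustrative assumptions.

```python
import numpy as np

def deproject(depth, fx, fy, cx, cy):
    """Convert a depth map (meters) into a 3D point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop pixels with no depth reading

def to_world(points, R, t):
    """Apply a calibrated extrinsic transform (rotation R, translation t)."""
    return points @ R.T + t

# Each camera contributes points in its own frame; calibration provides
# (R, t) per camera, so all clouds land in one shared world frame:
# cloud = np.vstack([to_world(deproject(d, *K), R, t)
#                    for d, K, (R, t) in zip(depths, intrinsics, extrinsics)])
```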

    We chose Intel RealSense 3D cameras because, in our opinion, they offer the best balance of price and quality. We are currently developing prototypes of two systems built around several Intel RealSense 3D cameras: a scanning box with several 3D cameras for instant object scanning and a full-body human scanning system.

    We showed both prototypes at IDF 2015, and the human scanning prototype successfully handled a steady stream of booth visitors over the three days of the conference.



    Now let's move on to how everything works. On a vertical bar we mounted three Intel RealSense long-range cameras so that the lowest one captures the lower legs, including the soles of the feet, the middle one the legs and most of the body, and the uppermost one the head and shoulders.



    Each camera was connected to a separate Intel NUC computer, and all the computers were connected to a local network.

    Since the cameras are mounted on a stationary rod, we use a turntable to rotate the person. The table has a simple design based on plexiglass, roller bearings, and a stepper motor. An Intel Edison board connects it to the computer and receives commands over the USB port.
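    The article does not detail the command protocol, but driving such a turntable typically looks like the sketch below: the host opens the Edison's USB serial port and sends short text commands that the board's firmware turns into stepper pulses. The port name, baud rate, and command strings here are assumptions for illustration.

```python
import time
import serial  # pyserial

# Hypothetical protocol: the Edison firmware is assumed to accept text
# commands such as "ROTATE <degrees>\n" and reply "DONE\n" when finished.
PORT = "/dev/ttyUSB0"   # assumed device name for the Edison's USB serial link
BAUD = 115200

def rotate_turntable(degrees, port=PORT):
    with serial.Serial(port, BAUD, timeout=30) as link:
        time.sleep(2)  # give the board a moment after the port opens
        link.write(f"ROTATE {degrees}\n".encode("ascii"))
        reply = link.readline().decode("ascii").strip()
        if reply != "DONE":
            raise RuntimeError(f"turntable error: {reply!r}")

# Example: advance the platform in 10-degree steps for a full revolution,
# capturing frames between steps.
# for _ in range(36):
#     rotate_turntable(10)
```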




    In addition, a simple constant lighting system evenly illuminates the front of the person. In the future, all the elements described above will be enclosed in a single housing, but what we showed was an early prototype, so everything was assembled from commercially available components.


    Our software has a client-server architecture, but the server can run on almost any modern computer. We simply call the machine that does the computation the server; often it is an ordinary ultrabook with Intel HD Graphics. The server sends a record command to the NUC computers, downloads the data from them, analyzes it, and reconstructs the 3D model.
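    The article does not describe the wire protocol, so the following is only a minimal sketch of how such a trigger might look: the server opens a TCP connection to a small capture agent on each NUC and sends a one-line command. The host addresses, port, and command verbs are illustrative assumptions.

```python
import socket

NUC_HOSTS = ["192.168.0.11", "192.168.0.12", "192.168.0.13"]  # assumed addresses
PORT = 9000  # assumed capture-agent port

def send_command(host, command):
    """Send a one-line command to a capture agent and return its reply."""
    with socket.create_connection((host, PORT), timeout=5) as sock:
        sock.sendall((command + "\n").encode("ascii"))
        return sock.makefile().readline().strip()

def start_recording():
    # Trigger all three NUCs as close to simultaneously as this simple
    # sequential loop allows; a real system would synchronize more carefully.
    for host in NUC_HOSTS:
        reply = send_command(host, "RECORD_START")
        assert reply == "OK", f"{host} refused: {reply}"
```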

    Now let's turn to the specifics of the problem. The 3D reconstruction we use in Cappasity products is based on our implementation of the Kinect Fusion algorithm. Here, however, the task was much harder: in one month we had to write an algorithm that could reconstruct data from several sources. We called it Multi-Fusion; in its current implementation it can integrate data from an unlimited number of sources into a single voxel volume. In the case of a human scan, we had three data sources.
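    For readers unfamiliar with the approach: Kinect Fusion-style reconstruction accumulates depth measurements into a truncated signed distance field (TSDF) over a voxel grid, and extending it to several sources mainly means folding every camera's frames into the same grid with the usual weighted running average. The simplified CPU sketch below shows only that update rule; it is an assumption-laden toy (the truncation distance, grid layout, and names are ours), not the Cappasity implementation.

```python
import numpy as np

TRUNC = 0.03  # truncation distance in meters (illustrative value)

class TSDFVolume:
    """Toy TSDF volume: one weighted signed-distance value per voxel."""
    def __init__(self, dims, voxel_size, origin):
        self.tsdf = np.ones(dims, dtype=np.float32)     # distances, init to +1
        self.weight = np.zeros(dims, dtype=np.float32)  # accumulation weights
        self.voxel_size = voxel_size
        self.origin = np.asarray(origin, dtype=np.float32)

    def integrate(self, depth, intr, R, t):
        """Fold one depth frame from one camera into the shared volume."""
        fx, fy, cx, cy = intr
        # Voxel centers in world coordinates.
        ii, jj, kk = np.indices(self.tsdf.shape)
        pts_w = self.origin + self.voxel_size * np.stack([ii, jj, kk], -1)
        # World -> this camera's frame ((R, t) come from calibration).
        pts_c = (pts_w - t) @ R
        z = pts_c[..., 2]
        u = np.round(pts_c[..., 0] * fx / z + cx).astype(int)
        v = np.round(pts_c[..., 1] * fy / z + cy).astype(int)
        h, w = depth.shape
        ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        d = np.where(ok, depth[v.clip(0, h - 1), u.clip(0, w - 1)], 0)
        ok &= d > 0
        sdf = np.clip((d - z) / TRUNC, -1.0, 1.0)  # signed distance, truncated
        upd = ok & (sdf > -1.0)
        # Weighted running average -- the same rule regardless of which
        # camera a frame came from, which is what makes multi-source
        # integration into one volume straightforward.
        wn = self.weight + upd
        self.tsdf = np.where(upd, (self.tsdf * self.weight + sdf) / np.maximum(wn, 1), self.tsdf)
        self.weight = wn
```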

    The first step is calibration. Cappasity software calibrates devices in pairs. The R&D behind it once took us a year, and before IDF 2015 those old ideas proved very useful: in a couple of weeks we redesigned the calibration to support the voxel volumes produced by Fusion, where previously it had worked mostly with point clouds. Calibration needs to be performed only once after the cameras are installed, and it takes no more than five minutes.
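    The article does not reveal the calibration algorithm itself, but pairwise extrinsic calibration commonly reduces to estimating the rigid transform between corresponding 3D points seen by two cameras, for example with the SVD-based Kabsch method sketched below. Treat it as a generic illustration under that assumption, not as our method.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points, e.g. calibration-target
    features observed by two cameras (Kabsch algorithm).
    """
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)       # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - c_src @ R.T
    return R, t

# Calibrating cameras in pairs (A-B, then B-C) lets the transforms be
# chained, so every camera can be expressed in one reference camera's frame.
```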

    Next came the question of how to process the data, and after a series of experiments we settled on post-processing. First we record data from all the cameras, then upload it over the network to the server, and only then begin reconstructing it sequentially. Color and depth streams are recorded from each camera, so we keep a complete raw dataset to work with later. This is extremely convenient given that we keep improving the post-processing algorithms; we were still adding some of them in crunch mode in the last days before IDF.
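    A recording pass of this kind might look like the sketch below: grab synchronized depth and color frames for a fixed duration and write them to one archive per camera for the server to pull. The `camera.grab()` interface is hypothetical, since the article does not show the actual SDK calls used with the RealSense devices.

```python
import time
import numpy as np

def record_session(camera, out_prefix, seconds=20.0):
    """Record synchronized color + depth streams to disk for post-processing.

    `camera` stands in for whatever capture API drives the RealSense device;
    `camera.grab()` returning (depth, color) arrays is an assumed interface.
    """
    depths, colors, stamps = [], [], []
    t0 = time.monotonic()
    while time.monotonic() - t0 < seconds:
        depth, color = camera.grab()
        depths.append(depth)
        colors.append(color)
        stamps.append(time.monotonic() - t0)
    # One compressed archive per camera; the server downloads these over
    # the network and reconstructs offline, as described above.
    np.savez_compressed(f"{out_prefix}.npz",
                        depth=np.stack(depths),
                        color=np.stack(colors),
                        t=np.asarray(stamps))
```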

    The long-range Intel RealSense R200 cameras handle black and other difficult materials better than the Intel RealSense F200, and tracking failures were minimal, which of course pleased us. Most importantly, the cameras let us shoot at the distances we need. To make reconstruction fast even on Intel HD Graphics 5500 and above, we optimized our Fusion algorithm for OpenCL. Noise was removed by Fusion itself and by additional segmentation of the data after building a single mesh.
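    The article does not spell out the segmentation step, but a common way to remove residual noise after meshing is to keep only the largest connected component of the mesh, discarding small floating fragments. A minimal sketch of that idea (the criterion itself is an assumption):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def keep_largest_component(vertices, faces):
    """Drop small floating mesh fragments, keeping the biggest piece.

    A common post-meshing cleanup; the article only says "additional data
    segmentation", so this specific rule is illustrative.
    """
    n = len(vertices)
    # Build vertex adjacency from triangle edges.
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    adj = coo_matrix((np.ones(len(e)), (e[:, 0], e[:, 1])), shape=(n, n))
    _, labels = connected_components(adj, directed=False)
    keep = labels == np.bincount(labels).argmax()   # largest component
    remap = np.cumsum(keep) - 1                     # old index -> new index
    faces_kept = faces[keep[faces].all(axis=1)]
    return vertices[keep], remap[faces_kept]
```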


    In addition, we finished the high-resolution texturing algorithm in time for IDF. Our approach is to take photos at the color camera's full resolution and project them onto the mesh. We do not use voxel colors, since that degrades texture quality. The projection method is much harder to implement, but it lets us use not only the built-in cameras but also external ones as a color source. For example, the scanning box we are developing uses DSLR cameras to obtain high-resolution textures, which is extremely important for our e-commerce customers.
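    At its core, projecting a photo onto a mesh means mapping each visible vertex through the color camera's calibrated pose and intrinsics into image coordinates and sampling the pixel there. The sketch below shows just that projection step; it deliberately omits occlusion testing and blending between cameras, which are the genuinely hard parts the article alludes to. The names and interface are illustrative.

```python
import numpy as np

def sample_vertex_colors(vertices, image, intr, R, t):
    """Project mesh vertices into one calibrated color image and sample it.

    intr = (fx, fy, cx, cy); (R, t) map world coordinates into the camera
    frame. Visibility testing and multi-view blending are omitted here.
    """
    fx, fy, cx, cy = intr
    cam = vertices @ R.T + t                 # world -> camera frame
    z = cam[:, 2]
    u = np.round(cam[:, 0] * fx / z + cx).astype(int)
    v = np.round(cam[:, 1] * fy / z + cy).astype(int)
    h, w = image.shape[:2]
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    colors = np.zeros((len(vertices), 3), dtype=image.dtype)
    colors[ok] = image[v[ok], u[ok]]         # nearest-pixel sampling
    return colors, ok
```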

    Even so, the RGB cameras built into the RealSense units gave excellent colors. Here is an example of a model after texture mapping:


    We are currently working on a new algorithm to eliminate texture shifts, and we plan to finish it before the release of our Easy 3D Scan product.

    As you can see, a demonstration that looks simple at first glance is backed by a lot of complex code, which lets us compete with scanning systems that cost around $100K and up. Intel RealSense cameras are affordable and can change the market for B2B solutions.

    The advantages of the human scanning system we are developing:
    • An affordable solution that is easy to set up and use: everything works at the click of a button;
    • Compactness: the scanning rig can be installed in retail spaces, entertainment centers, medical centers, casinos, and so on;
    • Model quality suitable for 3D printing and for AR/VR content development;
    • Mesh accuracy high enough to take measurements from the scanned object.

    We understand that we have not yet unlocked the full potential of Intel RealSense cameras, but we are confident that by CES 2016 we will be able to show significantly improved products. Everything is in our hands!
