
Intel® RealSense™: Working with Raw Data Streams

Developers interested in building touch-free, controller-less interaction into their applications should familiarize themselves with the Intel RealSense SDK, its samples, and the related resources available online. Dive into this toolkit and you will find a wide range of functions for building entirely new kinds of interfaces on top of this technology.
In this article we discuss the various raw data streams, how to access them, and how to use them. Direct access to the raw data lets us go beyond the SDK's prepared metadata and gives us the fastest possible way to determine what the user is doing in the real world.
For this article we used a front-facing Intel RealSense 3D camera, which provides several data streams: a traditional RGB color image, depth data, and images from an infrared sensor. Each of these streams behaves differently, as discussed in detail below. By the end of the article you will know which streams are available and when to use each of them.
To follow the material, it is useful (but not essential) to know C++ so you can read the code samples, and to have a general idea of Intel RealSense technology (or its predecessor, the Intel Perceptual Computing SDK).
Why this is important
If all you need is simple gesture or face recognition, the algorithm modules of the Intel RealSense SDK provide everything required, and you need not worry about raw data streams at all. The problem arises when you need functionality that the SDK's algorithm modules do not offer; without an alternative approach, your application simply cannot be built.
So the first question is: what does your application need, and can those requirements be met by the Intel RealSense SDK algorithm modules? If you need an on-screen pointer that tracks hand movement, the hand or finger tracking module may be enough. The samples shipped with the SDK are a quick way to check whether the built-in functionality covers your needs. If it does not, you can start planning to work with the raw data streams.
For example, the SDK currently detects a set of predefined gestures. But what if you need to detect gestures from the full three-dimensional hand data and extract additional information about the user's hand movements? What if you need to record a high-speed stream of gestures and store it as a sequence rather than as single snapshots? You would have to bypass the computationally expensive finger and hand recognition system and implement your own technique for encoding the motion telemetry dynamically, in real time. In short, you may run into missing functionality and need a more direct solution to a specific software problem.
Another example: suppose you are building an application that detects and recognizes sign language and converts it to text for a teleconference. The current Intel RealSense SDK functionality tracks hands and fingers (but only as single gestures) and has no dedicated support for sign language recognition. The only option in such cases is to develop your own gesture recognition system that quickly converts gestures into sequences of finger and hand positions and then, with a template-matching system, recognizes the signs and reconstructs the text. The only way to achieve this today is to access the raw data stream, record it at high speed, and convert the values on the fly.
The ability to write code that fills the gap between existing and desired functionality is extremely important, and the Intel RealSense SDK provides it.
This technology is still relatively new, and developers are still exploring its capabilities. Access to the raw data streams widens what can be attempted, and new solutions grow out of such experiments.
Streams
The best way to learn about the data streams is to look at them directly. To do so, run the raw streams sample located in the bin folder of your Intel RealSense SDK installation:
\Intel\RSSDK\bin\win32\raw_streams.exe
The sample ships with full source code and a project file, which will be very useful to us. If you run the executable and press the START button when the application starts, you will get a color RGB stream, as shown in Fig. 1.

Figure 1. Typical RGB color stream.
Wave at yourself, then press the STOP button, open the Depth menu, and select 640x480x60. Press START again.

Figure 2. Filtered depth data stream from an Intel RealSense 3D camera.
As you can see in Fig. 2, this image is very different from the RGB color stream. It is a grayscale image in which each pixel represents the distance from the camera: light pixels are closer, dark pixels are farther away, and black marks pixels that are either background or could not be measured reliably.
Move around in front of the camera and you will appreciate how quickly this data lets the computer decide what the user is doing. For example, the hands in the scene are trivially distinguished thanks to the thick black outline separating them from the body and head, which are farther from the camera.

Figure 3. Night vision: the Intel RealSense 3D camera delivers a raw infrared video stream.
The last stream type may be unfamiliar to developers who worked with the earlier Intel Perceptual Computing SDK: as Fig. 3 shows, the IR menu lets you obtain an image shot in the infrared range. This is a raw data stream, and it can be read at a rate well above the refresh rate of a typical monitor.
You can initialize any or all of these streams and read them simultaneously, as your application requires, and for each stream you can select the desired resolution and refresh rate. Note that the final frame rate of the incoming streams depends on the available bandwidth. For example, if you try to initialize the RGB stream at 60 frames per second and the depth and IR streams at 120 frames per second each, and request all of them with synchronized delivery, you will get at most the slowest rate (60 frames per second), and only if the system can keep up.
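As a rough sketch of what such simultaneous initialization looks like in code (the resolutions and rates below are illustrative, and error handling is omitted), the sense manager lets you enable several streams and then pull synchronized frames:

PXCSenseManager *pp = g_session->CreateSenseManager();
// Request three streams at once; the camera must support these modes.
pp->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 640, 480, 60);
pp->EnableStream(PXCCapture::STREAM_TYPE_DEPTH, 640, 480, 60);
pp->EnableStream(PXCCapture::STREAM_TYPE_IR,    640, 480, 60);
if (pp->Init() >= PXC_STATUS_NO_ERROR)
{
    // Passing true asks for synchronized delivery: AcquireFrame returns
    // only when all enabled streams have a new frame, so the effective
    // rate is that of the slowest stream.
    while (pp->AcquireFrame(true) >= PXC_STATUS_NO_ERROR)
    {
        PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();
        // sample->color, sample->depth and sample->ir are valid here.
        pp->ReleaseFrame();
    }
}
pp->Close();
pp->Release();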
The raw streams sample is a good starting point, but it does not let you combine streams, so use it only to discover the stream types, resolutions, and refresh rates your camera supports. Remember that the Intel RealSense SDK is designed to work with a variety of 3D cameras, so the resolutions offered by one camera may be unavailable on another. For that reason, never hard-code a resolution in your application.
Stream creation and data access
You can view the full source code of the raw streams sample by opening the following project in Visual Studio*:
\Intel\RSSDK\sample\raw_streams\raw_streams_vs20XX.sln
The sample contains a simple user interface and a complete set of options, so the code is not the easiest to read. It makes sense to strip out the supporting code and keep only the lines that create, process, and release the stream received from the camera. The code below is a cleaned-up version of that project, yet it retains all the components needed by even the simplest Intel RealSense application.
The first two important functions initialize the Intel RealSense 3D camera and release it when the program ends. They are shown here; the functions they call are described in detail afterwards.
int RSInit ( void )
{
    InitCommonControls();
    g_session = PXCSession::CreateInstance();
    if (!g_session) return 1;
    g_bConnected = false;
    // The camera is driven from its own thread so the main application
    // is not tied to the camera frame rate.
    g_RSThread = CreateThread(0, 0, ThreadProc, g_pGlob->hWnd, 0, 0);
    Sleep(6000);  // crude wait for the thread to connect to the camera
    if (g_bConnected == false)
        return 1;
    else
        return 0;
}
void RSClose ( void )
{
    g_bConnected = false;                        // signal the thread to stop
    WaitForSingleObject(g_RSThread, INFINITE);   // wait until it exits
}
These are the top-level functions of any raw-data application: create a session instance and a thread to run the stream-processing code, then shut the thread down via the global flag g_bConnected. Using CPU threads for stream handling is recommended: it lets the main application run at whatever frame rate it wants, independent of the camera's frame rate, and it spreads the load across multiple cores, improving overall application performance.
In the code above we are mainly interested in the ThreadProc function, which contains all the stream-handling code. Before moving on to the details, note that this source code is not exhaustive; global declarations and minor sections have been removed for readability. For the global declarations, see the source code of the raw_streams sample project.
static DWORD WINAPI ThreadProc(LPVOID arg)
{
    InitializeCriticalSection(&g_depthdataCS);
    HWND hwndDlg = (HWND)arg;
    PopulateDevices(hwndDlg);
    PXCCapture::DeviceInfo dinfo = GetCheckedDevice(hwndDlg);
    PXCCapture::Device::StreamProfileSet profiles = GetProfileSet(hwndDlg);
    StreamSamples((HWND)arg,
                  &dinfo,
                  &profiles,
                  false, false, false,
                  g_file);
    ReleaseDeviceAndCaptureManager();
    g_session->Release();
    DeleteCriticalSection(&g_depthdataCS);
    return 0;
}
When working with the stream data it is essential to create a critical section in the code. Without one, two threads in a multithreaded program could theoretically write to the same global variable at the same time, which is never desirable.
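If critical sections are new to you, the pattern used throughout this article is the standard Win32 one. Here is a minimal sketch (the array dimensions and helper function are illustrative, not part of the sample):

#include <windows.h>

CRITICAL_SECTION g_depthdataCS;   // guards access to g_depthdata
short g_depthdata[320][240];      // shared between the two threads

void WriteDepthRow(const short *row, int count, int y)
{
    EnterCriticalSection(&g_depthdataCS);    // blocks until we own the lock
    for (int x = 0; x < count; x++) g_depthdata[x][y] = row[x];
    LeaveCriticalSection(&g_depthdataCS);    // other threads may now enter
}

// Call InitializeCriticalSection(&g_depthdataCS) once before use and
// DeleteCriticalSection(&g_depthdataCS) once at shutdown, as ThreadProc does.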
For readers unfamiliar with multithreading: this function runs on its own thread and does not return until the main thread (which created it) sets g_bConnected to false. The key call here is StreamSamples; the code above and below it handles setup and teardown. The first function of interest is PopulateDevices, which is almost identical to its namesake in the raw_streams project. It fills the g_devices list with the names of all available devices. If you use an Intel RealSense 3D camera with an ultrabook, you will likely see two devices (the second being the ultrabook's built-in camera). Note the following lines.
static const int ID_DEVICEX = 21000;
static const int NDEVICES_MAX = 100;
int c = ID_DEVICEX;   // menu ID of the chosen device; here simply the first
g_session->CreateImpl<PXCCapture>(g_devices[c], &g_capture);
g_device = g_capture->CreateDevice((c - ID_DEVICEX) % NDEVICES_MAX);
The code, constants, and globals are copied from the original sample and could be trimmed further. The important calls here are CreateImpl and CreateDevice; once they succeed, the pointer to the Intel RealSense 3D camera is stored in g_device.
With a valid device pointer, the rest of the initialization code runs without trouble. The GetProfileSet function is essentially a wrapper around this call:
g_device->QueryDeviceInfo(&dinfo);
GetProfileSet is responsible for collecting the stream types and resolutions to initialize; it can be as simple or as elaborate as your needs dictate. For camera compatibility it is strongly recommended to enumerate the available resolutions and stream types rather than hard-code fixed settings.
PXCCapture::Device::StreamProfileSet GetProfileSet(HWND hwndDlg)
{
    PXCCapture::Device::StreamProfileSet profiles = {};
    if (!g_device) return profiles;
    PXCCapture::DeviceInfo dinfo;
    g_device->QueryDeviceInfo(&dinfo);
    for (int s = 0, mi = IDXM_DEVICE + 1; s < PXCCapture::STREAM_LIMIT; s++)
    {
        PXCCapture::StreamType st = PXCCapture::StreamTypeFromIndex(s);
        if (!(dinfo.streams & st)) continue;   // stream type not offered
        int nprofiles = g_device->QueryStreamProfileSetNum(st);
        for (int p = 0; p < nprofiles; p++)
        {
            PXCCapture::Device::StreamProfileSet profiles1 = {};
            g_device->QueryStreamProfileSet(st, p, &profiles1);
            profiles[st] = profiles1[st];
        }
        mi++;   // menu index, used by the full sample's UI code
    }
    return profiles;
}
QueryStreamProfileSet does most of the work here, returning the available profiles for each stream; in this cleaned-up version we keep only one profile per stream type. You can write your own conditions to search for the streams you need, whether by resolution or by frame rate, as long as you provide fallback conditions so the application can still run with whatever stream format is available.
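As an illustration of such a search, the hypothetical helper below (PickDepthProfile is our name, not the SDK's) prefers a 640x480 depth profile at 60 frames per second and falls back to the first valid profile the device reports:

// Hypothetical helper: prefer 640x480 depth at 60 fps, otherwise take
// the first depth profile the device offers.
PXCCapture::Device::StreamProfile PickDepthProfile(PXCCapture::Device *device)
{
    PXCCapture::Device::StreamProfile chosen = {};
    int n = device->QueryStreamProfileSetNum(PXCCapture::STREAM_TYPE_DEPTH);
    for (int p = 0; p < n; p++)
    {
        PXCCapture::Device::StreamProfileSet set = {};
        device->QueryStreamProfileSet(PXCCapture::STREAM_TYPE_DEPTH, p, &set);
        PXCCapture::Device::StreamProfile &d = set.depth;
        if (!d.imageInfo.format) continue;            // skip empty entries
        if (!chosen.imageInfo.format) chosen = d;     // fallback: first valid
        if (d.imageInfo.width == 640 && d.imageInfo.height == 480 &&
            d.frameRate.max >= 60)                    // preferred mode found
        {
            chosen = d;
            break;
        }
    }
    return chosen;
}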
The final function, and the central block of code for accessing stream data, is StreamSamples. With the safety code and comments removed, it looks like this.
void StreamSamples(HWND hwndDlg, PXCCapture::DeviceInfo *dinfo,
                   PXCCapture::Device::StreamProfileSet *profiles,
                   bool synced, bool isRecord, bool isPlayback, pxcCHAR *file)
{
    PXCSenseManager *pp = g_session->CreateSenseManager();
    pp->QueryCaptureManager()->FilterByDeviceInfo(dinfo);
    // Enable every stream type for which a valid profile was collected.
    for (int s = 0; s < PXCCapture::STREAM_LIMIT; s++)
    {
        PXCCapture::StreamType st = PXCCapture::StreamTypeFromIndex(s);
        PXCCapture::Device::StreamProfile &profile = (*profiles)[st];
        if (!profile.imageInfo.format) continue;
        pp->EnableStream(st, profile.imageInfo.width, profile.imageInfo.height,
                         profile.frameRate.max);
    }
    pp->QueryCaptureManager()->FilterByStreamProfiles(profiles);
    MyHandler handler(hwndDlg);
    if (pp->Init(&handler) >= PXC_STATUS_NO_ERROR)
    {
        pp->QueryCaptureManager()->QueryDevice()->SetMirrorMode(
            PXCCapture::Device::MirrorMode::MIRROR_MODE_DISABLED);
        g_bConnected = true;
        for (int nframes = 0; g_bConnected == true; nframes++)
        {
            pxcStatus sts2 = pp->AcquireFrame(synced);
            if (sts2 >= PXC_STATUS_NO_ERROR)
            {
                PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();
                // Depth pixels matching either marker value are invalid.
                short invalids[2];
                invalids[0] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthSaturationValue();
                invalids[1] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthLowConfidenceValue();
                PXCImage::ImageInfo dinfo = sample->depth->QueryInfo();
                PXCImage::ImageData ddata;
                if (sample->depth->AcquireAccess(PXCImage::ACCESS_READ,
                                                 PXCImage::PIXEL_FORMAT_DEPTH,
                                                 &ddata) >= PXC_STATUS_NO_ERROR)
                {
                    EnterCriticalSection(&g_depthdataCS);
                    memset(g_depthdata, 0, sizeof(g_depthdata));
                    short *dpixels = (short*)ddata.planes[0];
                    int dpitch = ddata.pitches[0] / sizeof(short);
                    for (int y = 0; y < (int)dinfo.height; y++)
                    {
                        for (int x = 0; x < (int)dinfo.width; x++)
                        {
                            short d = dpixels[y*dpitch + x];
                            if (d == invalids[0] || d == invalids[1]) continue;
                            g_depthdata[x][y] = d;
                        }
                    }
                    LeaveCriticalSection(&g_depthdataCS);
                    g_bDepthdatafilled = true;
                    sample->depth->ReleaseAccess(&ddata);
                }
                pp->ReleaseFrame();
            }
        }
    }
    pp->Close();
    pp->Release();
}
At first glance there is a lot of code here, but on closer inspection it boils down to a few setup calls, a conditional loop, and a final cleanup before returning to the ThreadProc function that called it. The main variable is pp, the Intel RealSense SDK sense manager pointer through which the basic operations run. Note: as mentioned above, error handling has been stripped out for readability, but in practice you should never write code on the assumption that every Intel RealSense SDK call will succeed.
The first piece of code, which enables the streams we are interested in, looks like this.
pp->EnableStream(st, profile.imageInfo.width, profile.imageInfo.height, profile.frameRate.max);
This simple request enables a stream of a given type at a specific resolution and frame rate and tells the camera to prepare to send us that raw data. The next important lines initialize the manager so it can start delivering data.
MyHandler handler(hwndDlg);
if (pp->Init(&handler)>=PXC_STATUS_NO_ERROR)
The MyHandler class is defined in the original raw_streams project and derives from PXCSenseManager::Handler. If Init succeeds, you know the camera is on and transmitting the data stream.
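If you write your own handler rather than reusing the sample's class, a minimal version might look like the sketch below; all overrides are optional (the base class supplies defaults), and the debug output is purely illustrative:

class MyHandler : public PXCSenseManager::Handler
{
public:
    virtual pxcStatus PXCAPI OnConnect(PXCCapture::Device *device, pxcBool connected)
    {
        // Returning a status below PXC_STATUS_NO_ERROR aborts streaming.
        OutputDebugStringA(connected ? "camera connected\n" : "camera disconnected\n");
        return PXC_STATUS_NO_ERROR;
    }
};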
Now we start the conditional loop, which runs until some external event changes its condition. Inside the loop we read the stream data one frame at a time, using the AcquireFrame call.
for (int nframes=0;g_bConnected==true;nframes++)
{
pxcStatus sts2=pp->AcquireFrame(synced);
As long as g_bConnected is true, we do this as fast as possible on the dedicated thread created for the purpose. Getting at the actual data takes a few more lines of code.
PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();
short invalids[2];
invalids[0] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthSaturationValue();
invalids[1] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthLowConfidenceValue();
PXCImage::ImageInfo dinfo = sample->depth->QueryInfo();
PXCImage::ImageData ddata;
if (sample->depth->AcquireAccess(PXCImage::ACCESS_READ,
                                 PXCImage::PIXEL_FORMAT_DEPTH,
                                 &ddata) >= PXC_STATUS_NO_ERROR)
The first call obtains a sample pointer from the manager; the final AcquireAccess call then yields a pointer to the actual data in memory. The two queries in between ask the manager which values denote a "saturated" pixel and a "low-confidence" pixel; both conditions occur in depth data from the camera, and such pixels must be ignored when interpreting the results. The net effect of this code is that the ddata structure is now filled with the information needed to access the depth data directly (in this example). By changing the corresponding parameters you can access the COLOR and IR stream data in the same way, provided those streams are enabled.
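For example, reading the color plane of the same frame differs only in the sample member and pixel format; a sketch, assuming the COLOR stream was enabled before Init:

PXCImage::ImageData cdata;
if (sample->color &&
    sample->color->AcquireAccess(PXCImage::ACCESS_READ,
                                 PXCImage::PIXEL_FORMAT_RGB32,
                                 &cdata) >= PXC_STATUS_NO_ERROR)
{
    // planes[0] points at 32-bit BGRA pixels; pitches[0] is the byte
    // width of one row, which may be larger than width*4.
    unsigned char *pixels = cdata.planes[0];
    int pitch = cdata.pitches[0];
    // ... copy or analyze the pixels here ...
    sample->color->ReleaseAccess(&cdata);
}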
That completes the Intel RealSense SDK-specific part of the code, from the first initialization call to obtaining a pointer to the stream data. The rest will look more familiar to developers with image-processing experience.
EnterCriticalSection(&g_depthdataCS);
memset(g_depthdata, 0, sizeof(g_depthdata));
short *dpixels = (short*)ddata.planes[0];
int dpitch = ddata.pitches[0] / sizeof(short);
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short d = dpixels[y*dpitch + x];
        if (d == invalids[0] || d == invalids[1]) continue;
        g_depthdata[x][y] = d;
    }
}
LeaveCriticalSection(&g_depthdataCS);
Note that the critical section object created earlier is used to lock this block so that no other thread can touch our global variables while we write to them; code elsewhere in the application therefore cannot interfere. Tracing the nested loops, you can see that after entering the critical section we clear the global array g_depthdata and fill it from the ddata structure, which holds the pointer to the depth data. Inside the loops we also compare each depth pixel against the two invalid values obtained earlier from QueryDepthSaturationValue and QueryDepthLowConfidenceValue.
Once the data has been copied into the global array, the camera thread can fetch the next frame while the main thread analyzes the data and makes decisions. You could even spin up another worker thread to do that analysis, letting the application run on three threads and make better use of a multicore CPU.
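A sketch of such an analysis thread (AnalysisProc is our name, and the buffer dimensions are illustrative): it snapshots the shared array under the critical section, then does its heavy work outside the lock so the camera thread is never stalled.

// Hypothetical analysis thread: copy the shared depth array under the
// lock, then analyze the private copy without blocking the camera thread.
static DWORD WINAPI AnalysisProc(LPVOID)
{
    static short localdepth[320][240];   // must match g_depthdata's size
    while (g_bConnected)
    {
        if (g_bDepthdatafilled)
        {
            EnterCriticalSection(&g_depthdataCS);
            memcpy(localdepth, g_depthdata, sizeof(localdepth));
            g_bDepthdatafilled = false;  // mark the frame as consumed
            LeaveCriticalSection(&g_depthdataCS);
            // ... run gesture or shape analysis on localdepth here ...
        }
        Sleep(1);                        // yield between frames
    }
    return 0;
}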
What to do with stream data
So now you know how to get the raw data streams from the Intel RealSense 3D camera, and you are probably wondering what to do with the data. You could simply draw it on the screen and admire the image, but sooner or later you will need to turn the data into useful information and feed it into your application.
Just as no two snowflakes are alike, no two raw-data implementations will be identical, yet a few general approaches help organize the data analysis. To keep new code to a minimum, the examples below use the code above as their template.
Finding the closest point
Suppose you want to find the point in the scene closest to the camera. The depth data has just been copied from the stream into the global array by the camera thread, so you can run a nested loop over every value in the array.
short bestvalue = 32767;   // depth values are distances, so smaller is closer
int bestx = 0;
int besty = 0;
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short thisvalue = g_depthdata[x][y];
        if (thisvalue == 0) continue;   // zero marks invalid pixels
        if (thisvalue < bestvalue)
        {
            bestvalue = thisvalue;
            bestx = x;
            besty = y;
        }
    }
}
Each time a closer (smaller) value is found, it replaces the current best value, and its X and Y coordinates are recorded. By the time the loop has visited every pixel in the depth data, the variables bestx and besty hold the coordinates of the point closest to the camera.
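To turn this into the on-screen pointer mentioned at the start of the article, a simple (hypothetical) mapping scales the image coordinates to screen coordinates; screenW and screenH stand for your display dimensions:

// Normalize the closest point to the depth image, then scale to the screen.
float nx = (float)bestx / (float)dinfo.width;
float ny = (float)besty / (float)dinfo.height;
int cursorX = (int)(nx * screenW);
int cursorY = (int)(ny * screenH);
SetCursorPos(cursorX, cursorY);   // Win32 call that moves the mouse pointer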
Ignoring objects in the background
You may need to pick out the shapes of foreground objects without the application confusing them with objects in the background, such as a seated user or people walking by.
// The band limits below are illustrative (roughly 0.8-1.2 m, assuming
// millimeter depth units); tune them to bracket your foreground objects.
short newshape[480][640];                 // must match the depth stream size
memset(newshape, 0, sizeof(newshape));
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short thisvalue = g_depthdata[x][y];
        if (thisvalue > 800 && thisvalue < 1200)   // keep only this depth band
        {
            newshape[y][x] = thisvalue;
        }
    }
}
By adding a condition to the per-pixel read and copying only the pixels that fall within a certain depth range, you can extract objects from the depth data into a second array for further processing.
Tips & Tricks
What to do
- If this is your first time running the samples and you are using an ultrabook with a built-in camera, the application may pick the built-in camera instead of the Intel RealSense camera. Make sure the Intel RealSense camera is connected correctly and that your application selects the Intel RealSense 3D camera device. For how to enumerate the device list, see the discussion of g_devices earlier in this article.
- Always use multiple threads in an Intel RealSense application: the application will then not be tied to the frame rate of the Intel RealSense 3D camera streams, and you will get better performance on multicore systems.
What should not be done
- Do not hard-code device or profile parameters when initializing streams: future Intel RealSense 3D cameras may not support the settings you chose. Always enumerate the available devices and profiles and search them for the ones you need.
- Avoid needlessly copying data into secondary arrays: each extra pass costs CPU time and memory bandwidth. Keep your data analysis as close as possible to the original read of the data.
Conclusion
Knowing how to obtain the raw data streams from the Intel RealSense 3D camera will help you push this technology further and build genuinely new solutions. We have already seen impressive touch-free applications from the first developers in this field, and they represent only a fraction of what the technology can do.
Many people still regard computers as devices that must be actively operated before they will do anything, but computers have now gained "sight" and can observe our every movement. Not spying, mind you, just watching, ready to come to our aid at the right moment. As the saying goes, in the land of the blind the one-eyed man is king. Do we not live in a world populated by blind computers? Imagine the revolution when, in the not-so-distant future, one of them begins to see. As developers we are the architects of that revolution, and together we can create an entirely new paradigm in which computers watch their operators and try to help them.
Learn more about RealSense on Intel.
All about RealSense for Developers
Download the RealSense SDK
RealSense Developer Forum