RealSense Application User Interface Design Guidelines

Intel® RealSense™ technology supports two types of depth cameras: the front-facing, short-range camera (F200), designed for installation in laptops, Ultrabooks, convertibles, and all-in-ones, and the rear-facing, long-range camera (R200), designed for installation in tablets and also available as a separate removable device. Both cameras are sold as stand-alone peripherals and are embedded in computer devices currently on the market. When using Intel RealSense technology to develop applications for such devices, keep in mind that interacting with three-dimensional applications without tactile feedback differs significantly from the interaction model familiar to developers of touch-controlled applications.
In this article, we describe some common user interface principles and problems for the F200 and R200 cameras and show how visual feedback can be built into applications using the Intel® RealSense™ SDK API.
Recommendations for creating user interfaces and using the API for the F200 camera
Guideline 1. Understanding the capture volume and interaction zones for laptops and all-in-ones
UI use case
Consider the use cases shown in Figure 1.

Figure 1. Capture volume
The pyramid emanating from the camera in this figure is called the capture volume, or the camera's field of view. For the F200 camera, the capture volume is determined by the camera's horizontal and vertical viewing angles and by the effective distance between the user and the camera. If the user moves outside this pyramid, the camera cannot track the interaction. The table below lists the field-of-view parameters for reference.
| Parameter | Range |
|---|---|
| Effective gesture recognition range | 0.2–0.6 m |
| Effective face recognition range | 0.35–1.2 m |
| Color camera field of view, degrees | 77 x 43 x 70 (cone) |
| Infrared (IR) camera field of view, degrees | 90 x 59 x 73 (cone) |
| IR illuminator field of view, degrees | n/a x 56 x 72 (pyramid) |
| Color resolution | Up to 1080p at a frame rate of 30 frames per second (fps) |
| Depth Map Resolution | Up to 640 x 480 at 60 fps |
The color camera and the depth camera in the F200 device have different resolutions, so developers should consider the capture volume for the application's intended modes of operation. As the table above shows, the effective gesture recognition range is small, while face tracking works at a greater distance.
Why does this matter for the user interface? End users have no idea how exactly the camera "sees" them. Because they do not know about the interaction zones, working with the application can become frustrating, since it is impossible to tell exactly what went wrong. In the left image in Figure 1 the user's hand is in the camera's field of view; in the right image it is outside the field of view, and tracking may be lost. The problem is compounded if the application uses two-handed control or several control modes at once, for example, both the face and the hands. Also take into account how the camera's field of view changes when the application is deployed on devices of different sizes, such as laptops and all-in-ones: on the latter, the interaction zone is located higher than on laptops. Figure 2 shows various scenarios with users in front of different devices.

Figure 2. Camera field of view and device size
Knowing these parameters helps you build an effective feedback mechanism into the application so that users receive clear instructions on how to use the device and the camera correctly. Now let's see how to obtain some of these field-of-view parameters in an application through the SDK.
Technical implementation
The Intel RealSense SDK provides APIs for obtaining field-of-view and camera range data. The QueryColorFieldOfView and QueryDepthFieldOfView APIs are exposed by the device interface and work regardless of device type. A sketch of how this might look in code is shown below.

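A minimal sketch, assuming the RSSDK C++ headers and an attached camera; the enabled stream, its resolution, and the console output are illustrative only.

```cpp
// Minimal sketch: querying the color and depth fields of view.
#include "pxcsensemanager.h"
#include <cstdio>

int main() {
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    if (!sm) return 1;

    // Enable a stream so that the capture manager selects a device.
    sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 640, 480, 30);
    if (sm->Init() < PXC_STATUS_NO_ERROR) return 1;

    PXCCapture::Device *device = sm->QueryCaptureManager()->QueryDevice();

    // Both calls return a PXCPointF32: x is the horizontal viewing angle,
    // y is the vertical viewing angle, in degrees.
    PXCPointF32 colorFov = device->QueryColorFieldOfView();
    PXCPointF32 depthFov = device->QueryDepthFieldOfView();
    wprintf_s(L"Color FOV: %.1f x %.1f degrees\n", colorFov.x, colorFov.y);
    wprintf_s(L"Depth FOV: %.1f x %.1f degrees\n", depthFov.x, depthFov.y);

    sm->Release();
    return 0;
}
```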
Although the data is returned in a PXCPointF32 structure, the values are the x (horizontal) and y (vertical) viewing angles in degrees. These are the manufacturer-specified values for the camera model, not values configured programmatically on the device.
The next capture-volume parameter is range. The QueryDepthSensorRange API returns the range in millimeters. This value is also a manufacturer default for the camera model and is not configured programmatically on a specific device.
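Continuing the sketch above (reusing the same device handle), the range query might look like this:

```cpp
// Manufacturer-specified depth sensor range, in millimeters, for this camera model.
PXCRangeF32 range = device->QueryDepthSensorRange();
wprintf_s(L"Depth sensor range: %.0f mm to %.0f mm\n", range.min, range.max);
```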
Knowing these APIs and how to use them in code helps you build an effective feedback system for users. Figures 3 and 4 show examples of visual feedback for the capture volume.

Figure 3. Tips on the distance to the camera

Figure 4. Schematic representation of the user's surroundings
Simple hints indicate the near and far boundaries of the interaction zone. Without hints, the user simply will not understand what to do when the system stops responding to their actions. Filter the distance data and show a hint only after a short delay, and use guidance and hints instead of error alerts. A schematic representation of the surroundings helps users orient themselves and become familiar with the depth camera's interaction zone.
It is recommended to use such schematic images on help screens and in tutorial splash screens, as well as in games whose users may be working with the camera for the first time. For maximum effectiveness, show the schematic view of the surroundings only in tutorials and on help screens. Instructions should be simple and easy to understand, and written with the application's intended audience in mind.
Instead of the APIs listed above, you can use the alerts provided by each SDK module to detect specific user actions. Consider, for example, face recognition. The following table lists the PXC[M]FaceData alerts.
As you already know, the SDK supports detecting up to four faces in the field of view. Using the face identifier, you can receive alerts for each face, depending on the application's needs. Tracking can also be lost completely (for example, if a person enters the camera's field of view and then leaves it faster than the camera can track). In this scenario, you can use the capture-volume data together with alerts to build a reliable feedback mechanism for users.
| Alert Type | Description |
|---|---|
| ALERT_NEW_FACE_DETECTED | New face detected. |
| ALERT_FACE_NOT_DETECTED | There is no face in the scene. |
| ALERT_FACE_OUT_OF_FOV | The face is outside the camera's field of view. |
| ALERT_FACE_BACK_TO_FOV | The face returned to the field of view of the camera. |
| ALERT_FACE_LOST | Lost face tracking. |
The SDK also allows you to detect occlusion, that is, cases where the subject is obstructed by a foreign object. For descriptions of unsupported and partially supported scenarios, see the F200 Camera User Interface Guide. Regardless of what type of occlusion you are trying to track, the following set of alerts will be very useful.
| Alert Type | Description |
|---|---|
| ALERT_FACE_OCCLUDED | The face is obstructed. |
| ALERT_FACE_NO_LONGER_OCCLUDED | The face is no longer obstructed. |
| ALERT_FACE_ATTACHED_OBJECT | The face is obstructed by some object, such as a hand. |
| ALERT_FACE_OBJECT_NO_LONGER_ATTACHED | The face is no longer obstructed by any object. |
Now let's move on to the alerts in the hand tracking module. They are available in PXC[M]HandData in the SDK. As you can see, some of these alerts also implicitly use range detection (remember that the face recognition and hand recognition modules have different ranges).
| Alert Name | Description |
|---|---|
| ALERT_HAND_OUT_OF_BORDERS | The tracked hand is outside the two-dimensional bounding box or three-dimensional bounding cube specified by the user. |
| ALERT_HAND_INSIDE_BORDERS | The tracked hand has returned inside the two-dimensional bounding box or three-dimensional bounding cube specified by the user. |
| ALERT_HAND_TOO_FAR | Tracked hand is too far from the camera. |
| ALERT_HAND_TOO_CLOSE | Tracked hand is too close to the camera. |
| ALERT_HAND_DETECTED | The tracked hand is recognized, its mark is available. |
| ALERT_HAND_NOT_DETECTED | A previously detected hand is lost, either because it left the field of view or because it is occluded. |
| And many others... | See the documentation. |
Now that you know what capabilities the SDK provides, you can apply them in application code. An example is shown in the following code snippet.


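A minimal sketch of enabling all hand alerts and polling them each frame; the alert selection and console output are illustrative only.

```cpp
// Sketch: enable all hand alerts and poll the fired alerts every frame.
#include "pxcsensemanager.h"
#include "pxchandmodule.h"
#include "pxchanddata.h"
#include "pxchandconfiguration.h"
#include <cstdio>

int main() {
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    if (!sm || sm->EnableHand() < PXC_STATUS_NO_ERROR) return 1;

    PXCHandModule *handModule = sm->QueryHand();
    if (sm->Init() < PXC_STATUS_NO_ERROR) return 1;

    PXCHandData *handData = handModule->CreateOutput();
    PXCHandConfiguration *handConfig = handModule->CreateActiveConfiguration();
    handConfig->EnableAllAlerts();          // report every alert the module supports
    handConfig->ApplyChanges();

    while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        handData->Update();
        // Walk the alerts that fired on this frame.
        for (pxcI32 i = 0; i < handData->QueryFiredAlertsNumber(); ++i) {
            PXCHandData::AlertData alert;
            if (handData->QueryFiredAlertData(i, alert) < PXC_STATUS_NO_ERROR) continue;
            switch (alert.label) {
            case PXCHandData::ALERT_HAND_TOO_FAR:
                wprintf_s(L"Hand is too far from the camera\n");
                break;
            case PXCHandData::ALERT_HAND_TOO_CLOSE:
                wprintf_s(L"Hand is too close to the camera\n");
                break;
            case PXCHandData::ALERT_HAND_OUT_OF_BORDERS:
                wprintf_s(L"Hand left the interaction zone\n");
                break;
            default:
                break;
            }
        }
        sm->ReleaseFrame();
    }

    handConfig->Release();
    handData->Release();
    sm->Release();
    return 0;
}
```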
Replace the wprintf_s calls with your visual feedback logic. You can also enable only some alerts instead of all of them, as shown below.

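A sketch, assuming the same handConfig object as in the previous snippet:

```cpp
// Enable only the alerts the application actually needs,
// instead of EnableAllAlerts in the previous snippet.
handConfig->DisableAllAlerts();
handConfig->EnableAlert(PXCHandData::ALERT_HAND_DETECTED);
handConfig->EnableAlert(PXCHandData::ALERT_HAND_NOT_DETECTED);
handConfig->EnableAlert(PXCHandData::ALERT_HAND_TOO_FAR);
handConfig->EnableAlert(PXCHandData::ALERT_HAND_TOO_CLOSE);
handConfig->ApplyChanges();
```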
Figures 5 and 6 show examples of effective visual feedback based on alerts.

Figure 5. Image of the user in the camera's field of view

Figure 6. Overlay of the user image
API links in the SDK documentation
- QueryColorFieldOfView
- QueryDepthFieldOfView
- QueryDepthSensorRange
- Face recognition module field-of-view alerts
- Hand recognition module field-of-view alerts
Guideline 2. Reducing user fatigue
User Interface Usage Scenario: Selecting the appropriate input method for the required accuracy.
When creating applications with the Intel RealSense SDK, it is important to keep the input modes in mind. Choosing appropriate input modes for different scenarios plays a crucial role. Keyboard, mouse, and touch input are highly accurate, while gesture input has low accuracy. For example, in data-heavy applications it is preferable to use keyboard and mouse input rather than gestures. Imagine trying to select a specific cell in Excel with a finger in the air instead of the mouse (see Figure 7). Such interaction would cause nothing but extreme annoyance and fatigue. When trying to perform precise actions, users naturally tense their muscles, which in turn increases fatigue.

Figure 7. Choosing the right input method
You can use touch or the mouse to select items in a menu. The input modes supported by the Intel RealSense SDK provide an immediate, natural, touch-free interaction mechanism and let you create engaging applications. Use these modes in a way that does not require many repetitive gestures. Gestures are best suited to continuous actions in which occasional errors do not lead to undesirable consequences.
Gesture direction selection
It is recommended to use gestures directed horizontally or in an arc. When there is a choice, prefer horizontal movements over vertical ones for user comfort. In addition, do not use actions that force users to raise their hands above shoulder height. Remember the gorilla-arm effect?

Figure 8. Selecting the direction of motion
Selecting relative or absolute motion
Allow relative motion instead of absolute motion wherever appropriate. With relative motion, the user can "reset" the position of the virtual hand on the screen to reach a more comfortable position of their own hand in front of the camera. This is roughly the same as lifting the mouse and moving it from the edge of the mat back to the middle when you need to move the pointer further. With absolute motion, the mapping between the position of the pointer on the screen and the position of the hand in front of the camera is always preserved. Applications should use the motion model that is most appropriate for each specific context.
Understanding Speed
An integral part of the accuracy problem is speed. If users move their hands too quickly in front of the camera, there is a risk of losing tracking entirely, since the hands may leave the capture volume. Gestures that require fast movements increase user fatigue and the risk of errors. It is therefore important to account for speed and its effect both on the effective range (close to the camera, at a distance of 20 to 55 cm, fast movement of up to 2 m/s can be detected) and on the capture volume (at a short distance from the camera, only one hand can be in view).
Understanding user actions and interacting with objects
A person's natural movements are not always smooth: the human body often moves unevenly and jerkily, which the camera can interpret as several different interactions. When creating applications for the Intel RealSense SDK, keep the relationship between actions and objects in mind. For example, if objects can be "grabbed" with a hand gesture, consider the size of such objects and their placement, the distance to the screen edges, the places where such objects can be "dragged," and ways to detect tracking failures.
Here are some suggestions to help you overcome these issues.
- Objects should be large enough that they are not affected by trembling or uneven hand movement. The spacing between objects should be large enough that users cannot accidentally grab the wrong object.
- Do not place interaction elements too close to the edges of the screen; otherwise there is an increased risk of the user's hand leaving the field of view and tracking being lost, which will cause inevitable and understandable irritation.
- If dragging objects is important in the interface, it should be obvious where exactly a grabbed object can be dragged and where it can be released.
- If a tracking failure occurs while the user is moving an object, the object should return to its original location, and the user should be notified of the failure.
Technical implementation: speed and accuracy
If the application does not require hand joint data and mostly uses quick hand movements, it makes sense to use the Blob module. The following table lists possible scenarios and the approximate accuracy of each. When tracking the full hand with joint data, movement must be slower, but this limitation can be worked around by using either extremity tracking or Blob mode. In addition, Blob mode gives you several advantages if the application is intended for children.
| Tracking mode | Hands only? | Output | Computing load | Limitations |
|---|---|---|---|---|
| Full hand | Yes | Segmented image, extremity points, hand side, alerts, joint data, finger data, open/closed palm data, gestures | Highest, multiple threads | 2 hands, 60 cm range, low hand speed |
| Extremities | Yes | Segmented image, extremity points, hand side, alerts | Medium, single thread | 2 hands, 60 cm range, medium hand speed |
| Blob | No | Segmented image, extremity points, contour line | Low, single thread | 4 objects, 100 cm range, high speed |
If the application requires fuller control and you need to track movement speed, you can obtain speed data at the level of individual hand joints using PXCMHandConfiguration.EnableJointSpeed. This gives you either the absolute speed, calculated from the current and previous positions of the hand, or the average speed over a certain period of time. However, this approach significantly increases the CPU and memory load, so use it only when absolutely necessary.
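A sketch of enabling average joint speed for the palm center and reading it per frame, assuming the handConfig and handData objects from the earlier snippets; the joint choice and the 100 ms window are illustrative.

```cpp
// Track the average speed of the palm center over a 100 ms window.
// This increases CPU and memory load, so enable it only when needed.
handConfig->EnableJointSpeed(PXCHandData::JOINT_CENTER,
                             PXCHandData::JOINT_SPEED_AVERAGE,
                             100 /* time window in milliseconds */);
handConfig->ApplyChanges();

// Per frame, after handData->Update():
PXCHandData::IHand *hand = 0;
if (handData->QueryHandData(PXCHandData::ACCESS_ORDER_BY_TIME, 0, hand) >= PXC_STATUS_NO_ERROR) {
    PXCHandData::JointData joint;
    if (hand->QueryTrackedJoint(PXCHandData::JOINT_CENTER, joint) >= PXC_STATUS_NO_ERROR) {
        // joint.speed holds the x/y/z speed components of the joint in world
        // coordinates; react when they exceed a threshold chosen by the application.
        wprintf_s(L"Palm speed: (%.2f, %.2f, %.2f)\n",
                  joint.speed.x, joint.speed.y, joint.speed.z);
    }
}
```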
Since you cannot force users to move smoothly without jerking, the SDK also includes the Smoother utility (PXC[M]Smoother), which smooths out jerky hand movement in front of the camera. This utility offers various linear and quadratic algorithms; you can experiment with them and choose the most suitable one. Figure 9 below shows how uneven hand movement is largely smoothed out by this utility.

Figure 9. Data with and without smoothing
Another way to detect overly fast hand movement is the TRACKINGSTATUS_HIGH_SPEED value of the PXCMHandData.TrackingStatusType enumeration. In face detection, rapid movement can lead to loss of tracking; use PXCMFaceData.AlertData.AlertType ALERT_FACE_LOST to detect it. If you use hand gestures to control the operating system through the Touchless Controller, use the SetPointerSensitivity and SetScrollSensitivity functions in PXC[M]TouchlessController to adjust pointer and scrolling sensitivity.
Bounding boxes
An effective way to achieve smooth interaction with objects is to use bounding boxes. They give users clear visual indications of the source and destination of the object they interact with.
The face and hand tracking modules in the SDK provide the PXCMHandData.IHand.QueryBoundingBoxImage API, which returns the location and size of the tracked hand (a two-dimensional bounding box) on the depth map. The PXCMFaceData.DetectionData.QueryBoundingRect API returns the bounding box of a detected face. You can also use PXCMHandData.AlertType ALERT_HAND_OUT_OF_BORDERS to detect a hand leaving the bounding box.
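A sketch of reading the hand's 2D bounding box, assuming the hand object obtained from handData as in the previous snippet:

```cpp
// The 2D bounding box of the tracked hand on the depth map.
PXCRectI32 box;
if (hand->QueryBoundingBoxImage(box) >= PXC_STATUS_NO_ERROR) {
    // box.x, box.y, box.w, box.h are in depth-image coordinates; use them to
    // anchor on-screen feedback or to test proximity to the interaction-zone edges.
    wprintf_s(L"Hand bounding box: %d,%d %dx%d\n", box.x, box.y, box.w, box.h);
}
```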
API links in the SDK documentation
- Blob tracking algorithm
- EnableJointSpeed
- The Smoother utility
- TouchlessController and SetScrollSensitivity
Recommendations for creating user interfaces and using the API for the R200 camera
The R200 camera is built into tablets and is also available as a removable device. It is designed to capture the space around the user. Notable usage scenarios for the R200 camera include augmented reality and full-body capture. Because the surrounding world falls into this camera's field of view, the nature and set of UI design problems differ from those described above for the F200 camera. This section describes some known user interface issues associated with the Scene Perception module (which developers will use in augmented reality applications) and the 3D Scanning module.
Guideline 1. Understanding the capture volume and interaction zones for tablets
UI use case
As Figure 10 shows, the vertical and horizontal viewing angles of the R200 camera, as well as its range, differ significantly from those of the F200 camera. The R200 can be used in two modes: active (the user moves around, capturing a scene) and passive (the user works with a still image). When capturing an object or scene, make sure the object stays within the camera's field of view while the user shoots it in active mode. Also note that this camera's range (which depends on whether it is used indoors or outdoors) differs from the range of the F200. How do you obtain these data points at run time to provide the user with visual feedback?

Figure 10. Capture volume of the R200 camera
Technical implementation
We already discussed the QueryColorFieldOfView() and QueryDepthFieldOfView() APIs in the F200 section above. These functions are device independent and can also be used for the R200 camera. However, to obtain the range of the R200 camera you need a specialized API designed only for this device: QueryDSMinMaxZ, available in the PXCCapture interface, which returns the camera's minimum and maximum range in millimeters.
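A sketch, assuming an initialized PXCSenseManager as in the F200 snippets; QueryDSMinMaxZ is meaningful only for the R200 (DS4) device.

```cpp
// The minimum and maximum depth range of the R200, in millimeters.
PXCCapture::Device *device = sm->QueryCaptureManager()->QueryDevice();
PXCRangeF32 z = device->QueryDSMinMaxZ();
wprintf_s(L"R200 depth range: %.0f mm to %.0f mm\n", z.min, z.max);
```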
API links in the SDK documentation
Guideline 2. Understanding user actions and interaction with the scene
User Interface Usage Scenario: Planning for Scene Features and Camera Capabilities
When working with the camera in active mode, keep its limitations in mind. Depth data is less accurate when shooting scenes with very bright areas or with reflective or black surfaces. Knowing when tracking is likely to fail helps you build a feedback mechanism into the application that gently prompts the user for the necessary action instead of failing with an error.
Technical implementation
The Scene Perception and 3D Scanning modules have different requirements, and therefore they use different mechanisms to check whether those requirements are met.
Scene Perception. Always use the CheckSceneQuality API in the PXCScenePerception module to determine whether a scene is suitable for tracking. The API returns a value between 0 and 1; the higher the value, the better the scene is suited to tracking. A code sketch is shown below.

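A minimal sketch of checking scene quality before relying on Scene Perception tracking; the 0.5 threshold and the console output are illustrative only.

```cpp
// Sketch: check scene quality before starting Scene Perception tracking.
#include "pxcsensemanager.h"
#include "pxcsceneperception.h"
#include <cstdio>

int main() {
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    if (!sm || sm->EnableScenePerception() < PXC_STATUS_NO_ERROR) return 1;

    PXCScenePerception *sp = sm->QueryScenePerception();
    if (sm->Init() < PXC_STATUS_NO_ERROR) return 1;

    while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        PXCCapture::Sample *sample = sm->QuerySample(PXCScenePerception::CUID);
        if (sample) {
            // 0 means the scene is unsuitable for tracking, 1 is ideal.
            pxcF32 quality = sp->CheckSceneQuality(sample);
            wprintf_s(L"Scene quality: %.2f\n", quality);
            if (quality < 0.5f) {
                // Prompt the user to point the camera at a more textured,
                // well-lit scene before starting tracking.
            }
        }
        sm->ReleaseFrame();
    }
    sm->Release();
    return 0;
}
```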
Once the scene quality is considered satisfactory and tracking starts, dynamically check the tracking status using the tracking accuracy API in the PXCScenePerception module. It reports the accuracy values listed in the table below; a sketch follows the table.
| Name | Description |
|---|---|
| HIGH | High tracking accuracy |
| LOW | Low tracking accuracy |
| MED | Medium tracking accuracy |
| FAILED | Tracking Failure |
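A sketch of reacting to the reported accuracy, assuming the QueryTrackingAccuracy accessor described in the SDK documentation and the sp instance from the previous snippet:

```cpp
// React to the current tracking accuracy each frame.
PXCScenePerception::TrackingAccuracy accuracy = sp->QueryTrackingAccuracy();
switch (accuracy) {
case PXCScenePerception::HIGH:
case PXCScenePerception::MED:
    // Tracking is usable; keep rendering the augmented scene.
    break;
case PXCScenePerception::LOW:
    // Warn the user to move the camera more slowly or improve the lighting.
    break;
case PXCScenePerception::FAILED:
    // Tracking is lost; show guidance for re-acquiring the scene.
    break;
}
```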
To maximize the quality of scene data, you can also adjust the voxel resolution (a voxel is the unit of resolution of a three-dimensional image). Depending on what the camera is tracking (a room-sized space, a tabletop surface, or a close-up object), set the voxel resolution according to the table below for the best results; a sketch follows the table.
| Name | Description |
|---|---|
| LOW_RESOLUTION | Low voxel resolution. Use this resolution to track room-sized spaces (voxel size 4/256 m). |
| MED_RESOLUTION | Medium voxel resolution. Use this resolution to track a tabletop surface (voxel size 2/256 m). |
| HIGH_RESOLUTION | High voxel resolution. Use this resolution to track small, close-up objects (voxel size 1/256 m). |
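A sketch, assuming the SetVoxelResolution setter described in the SDK documentation; it is typically called before streaming starts.

```cpp
// Choose the voxel resolution to match what is being tracked:
// LOW_RESOLUTION for a room-sized space, MED_RESOLUTION for a tabletop,
// HIGH_RESOLUTION for a small, close-up object.
sp->SetVoxelResolution(PXCScenePerception::HIGH_RESOLUTION);
```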
3D Scanning. The 3D Scanning module provides the alerts shown in the table below; use PXC3DScan::AlertEvent to receive them. A sketch follows the table.
| Name | Description |
|---|---|
| ALERT_IN_RANGE | The subject is at a suitable distance. |
| ALERT_TOO_CLOSE | The subject is too close to the camera. Prompt the user to move the object farther from the camera. |
| ALERT_TOO_FAR | The subject is too far from the camera. Prompt the user to move the object closer to the camera. |
| ALERT_TRACKING | The subject is tracked correctly. |
| ALERT_LOST_TRACKING | Tracking of the captured subject is lost. |
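A sketch of handling these alerts, assuming the AlertHandler/Subscribe callback pattern described in the PXC3DScan documentation; the messages are placeholders for the application's visual feedback.

```cpp
// Sketch of a 3D Scanning alert callback.
class ScanAlertHandler : public PXC3DScan::AlertHandler {
public:
    virtual void PXCAPI OnAlert(const PXC3DScan::AlertData &data) {
        switch (data.label) {
        case PXC3DScan::ALERT_TOO_CLOSE:
            wprintf_s(L"Move the object farther from the camera\n");
            break;
        case PXC3DScan::ALERT_TOO_FAR:
            wprintf_s(L"Move the object closer to the camera\n");
            break;
        case PXC3DScan::ALERT_LOST_TRACKING:
            wprintf_s(L"Tracking lost; hold the object steady in view\n");
            break;
        default:
            break;
        }
    }
};

// Registration, assuming scanner is the PXC3DScan instance (e.g., from sm->Query3DScan()):
// ScanAlertHandler handler;
// scanner->Subscribe(&handler);
```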
If the application has access to camera tracking data and knows the limitations of the module in use, it can provide visual feedback that clearly shows users how the camera interpreted their actions. If tracking is lost, you can show how to work with the camera more correctly. The visual feedback shown here is for illustration only; it must be adapted to the application's requirements and to the device's user interface.
Example of a tutorial shown at startup.

Figure 11. Tutorial
Preview of the captured area or subject.

Figure 12. Preview
Prompts for the user.

Figure 13. Tips for the user
Reducing fatigue when the user holds the device in their hands
Most applications will use the device in both active and inactive camera modes. (The two modes differ as follows: the camera is in active mode when the user holds the tablet up to view a scene through the camera or to capture it; the camera is in inactive mode when the user sets the tablet down and works with on-screen content, with the camera off.) To reduce user fatigue, you need to understand how the user holds and uses the device in each of these modes and choose interaction zones accordingly. In active mode, the user tires faster because the device is held unsupported, as shown in Figure 14.

Figure 14. Using the device in active and inactive modes
Choosing the appropriate mode for an action
The usage mode also directly determines how the user interacts with the application through the user interface. In active mode, the user holds the device with both hands, so any visual elements of the application, such as buttons, should be placed in easily reachable parts of the screen. Studies show that the edges of the screen work best in such cases. Recommended touch zones are shown in Figure 15. In addition, touch accuracy decreases in active mode, so active mode is best suited for short capture sessions.
In inactive mode, by contrast, it is more convenient to work with the device: the user interacts more precisely with interface elements and can use the application for a long time.

Figure 15. Touch zones in active and inactive modes
API links in the SDK documentation
Conclusion
When developing applications with Intel® RealSense™ technology, developers must take the needs and specifics of the end-user experience into account from the earliest stages. The recommendations in this article provide a basis for solving some important user interface problems and for implementing the necessary components in code using the SDK.