Restoration of three-dimensional models by the active parallax method

Hello dear readers.

I am a student at Bauman Moscow State Technical University. I would like to share my experience in image processing and the reconstruction of three-dimensional objects by the active parallax method.

Three-dimensional modeling and prototyping of real-world objects are now actively used in many fields, including manufacturing, medicine, computer graphics, robotics, and machine vision. This makes the development of 3D scanners and cameras that build a 3D model of a registered object increasingly relevant.

A bit of history

In 1999, 3DVSystems, a leader in three-dimensional video, developed the ZCam camcorder with a unique technology for measuring the distance to objects in real time. This technology made it possible to capture and process a three-dimensional image while viewing an object from only one side. In 2009, Microsoft bought the assets of 3DVSystems, and a controller for the Xbox game console began to be developed on the basis of ZCam. In 2010, Microsoft announced the now-famous Kinect, a game controller that lets you control the game with your body. The Artec Group company produces 3D scanners that digitize the shape of an object in real time; such scanners are used in medicine, in manufacturing and car tuning, and in creating special effects for movies and video games.

Fig. 1. An example of using algorithms in video games

0. Parallax method for registering 3D objects

Three-dimensional object registration systems can be built on various principles, one of which is the stereoscopic principle. A stereoscopic system consists of two cameras recording an object from different, but not too different, angles. Corresponding points are found in the resulting images (stereo matching). Then, knowing the internal parameters of the stereo pair's cameras and their relative position, the three-dimensional coordinates of the object's points can be determined by triangulation [1]. Despite the progress of recent years, a number of issues remain that stem from the fundamental limitations of this method, in particular the stereo matching of points on objects that lack a pronounced texture or contain large untextured (homogeneous) regions [2].
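As a minimal illustration of the triangulation step, here is a sketch in Python assuming the simplest rectified (parallel-camera) stereo model; the function name and all numeric values are illustrative, not taken from the article's setup:

```python
# Depth from disparity for a rectified stereo pair (parallel cameras).
# Once a pair of corresponding points is found, depth follows from
# Z = f * B / d. All numbers below are toy values for illustration.

def depth_from_disparity(d_px: float, f_px: float, baseline_m: float) -> float:
    """Z = f * B / d for a rectified stereo pair.

    d_px       -- disparity between corresponding points, in pixels
    f_px       -- focal length, in pixels
    baseline_m -- distance between the camera centers, in meters
    """
    if d_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_px * baseline_m / d_px

# A point with 50 px disparity, f = 1000 px, 10 cm baseline lies 2 m away.
print(depth_from_disparity(50.0, 1000.0, 0.1))  # 2.0
```

Note that the whole calculation hinges on finding the corresponding point in the second image, which is exactly the step that fails on untextured surfaces.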

Fig. 2. An example of a stereo pair.

To overcome these shortcomings of the stereoscopic method, one of the cameras of the stereo pair can be replaced with a projector, yielding a device for recording three-dimensional objects based on the active parallax principle. A diagram of such a system is shown in the figure: a pattern is projected onto the object (structured illumination), and its distortions caused by the shape of the object are recorded by the camera [3, 4].

Fig. 3. Schematic diagram of the system built on the active parallax method

1. Active parallax method for registering three-dimensional objects

By now, many different patterns have been developed for use in structured illumination systems: both series of changing patterns (time-multiplexed patterns) and static patterns using various color-coding schemes [3, 4]. Time coding uses a sequence of black-and-white patterns, as shown in Fig. 4, a. The idea of this method is to encode the position of a pixel on the projector matrix by the set of intensities it receives over the sequence of projected patterns. The set of patterns shown in Fig. 4, a uses "bit" coding: the set of two-color (black-and-white) patterns forms a binary code that defines the "number" of a pixel in a row. Besides "bit" coding, there are other binary coding methods (binary pattern shift, Gray code, and others [3, 4]). This method is insensitive to the color of the surface and allows every pixel of the projector matrix to be encoded, but because of the large number of patterns it requires the object to remain static.
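The "bit" coding idea can be sketched in a few lines of Python. A toy projector width of 8 columns is assumed here; the names `patterns` and `decode` are illustrative, not from the article:

```python
import numpy as np

W = 8                                 # toy projector width (real ones: ~1024)
n_bits = int(np.ceil(np.log2(W)))     # patterns needed to encode W columns

# Pattern k is black/white: it displays bit k of each column's index.
patterns = np.array([[(col >> k) & 1 for col in range(W)]
                     for k in range(n_bits)])   # shape (n_bits, W)

def decode(bit_sequence):
    """The sequence of intensities observed at one camera pixel over the
    projected patterns is the binary code of the projector column."""
    return sum(int(b) << k for k, b in enumerate(bit_sequence))

col = 5
observed = patterns[:, col]           # intensities seen at one camera pixel
print(decode(observed))               # 5
```

This also makes the method's drawback visible: log2(W) separate exposures are needed, so the object must not move between them.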

Fig. 4. Patterns used to create the structured illumination: a - with time coding, b - with color coding

Color coding, an example of which is shown in Fig. 4, b, uses only one pattern. The position of each pixel is uniquely encoded by its own color value and those of several of its "neighbors". When designing a color-coded pattern, one usually aims for the minimum neighborhood size (number of "neighbors") required for unambiguous decoding, and for the minimum number of distinct colors (to make each color more reliable to identify). M-sequences and de Bruijn sequences have these properties [3, 4]. The advantage of this method is the ability to recover the shape of an object from just a single pattern and, as a result, to register moving objects.
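As a sketch of the second option named above, a de Bruijn sequence B(k, n) over k colors can be generated with the standard recursive (FKM) construction; every window of n consecutive symbols then occurs exactly once, which is what makes local decoding unambiguous:

```python
def de_bruijn(k: int, n: int):
    """Generate a de Bruijn sequence B(k, n): a cyclic sequence over an
    alphabet of k symbols in which every length-n word occurs exactly once
    (standard FKM construction)."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

# Six colors, windows of three symbols: 6**3 = 216 positions can be
# uniquely encoded (far more than the 128 stripes used later in this
# article, which come from an M-sequence generator instead).
seq = de_bruijn(6, 3)
print(len(seq))  # 216
```

Any length-n window of the result identifies its position in the sequence, so observing a stripe together with two neighbors is enough to decode its projector coordinate.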

2. Algorithms for processing registered images

The illumination pattern used, shown in Fig. 5, consists of 128 narrow vertical stripes of six colors (three primary - R, G, B - and three secondary - C, M, Y), separated by black gaps. The color sequence was obtained with an M-sequence generator; each combination of three adjacent stripes occurs only once.
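A pattern of this kind (colored stripes separated by black gaps) can be rendered with NumPy. The stripe and gap widths below are arbitrary, and the short color string is a placeholder rather than the article's 128-stripe M-sequence:

```python
import numpy as np

# The six stripe colors: primaries R, G, B and secondaries C, M, Y.
COLORS = {
    'R': (255, 0, 0), 'G': (0, 255, 0), 'B': (0, 0, 255),
    'C': (0, 255, 255), 'M': (255, 0, 255), 'Y': (255, 255, 0),
}

def render_pattern(code, stripe_w=4, gap_w=4, height=32):
    """Render a vertical-stripe pattern with black gaps between stripes.

    code -- sequence of color letters, e.g. a 128-symbol M-sequence
    """
    width = len(code) * (stripe_w + gap_w)
    img = np.zeros((height, width, 3), dtype=np.uint8)  # black background
    for i, c in enumerate(code):
        x0 = i * (stripe_w + gap_w)
        img[:, x0:x0 + stripe_w] = COLORS[c]
    return img

img = render_pattern("RGBCMY")
print(img.shape)  # (32, 48, 3)
```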

Fig. 5. The color sequence used.

The image processing algorithm solves two main problems:
1. Detecting the stripes in the image and determining the position of the center line of each stripe (stripe extraction);
2. Determining the color of each extracted stripe region (classification by color).

3. Extracting the centers of the stripes

Stripe extraction algorithms for the various color-coded illumination patterns can be divided into two types: edge-based and peak-based. The illumination pattern used here contains black gaps and therefore requires a peak-based algorithm. To extract the stripes in the image, one can use a direct search for local maxima, the zero-crossing of the second derivative (the Laplacian of Gaussian (LoG) detector), or the Canny method [6]. The registered image contains three color channels (R, G, B), so applying these methods requires either converting to a single channel and processing it, or combining the stripe extraction results across several channels.

Various works have proposed the following solutions to this problem: in [9], the Canny method was applied to the luminance component (Y) after conversion to the YCbCr color space; in [7], a direct search for local maxima along scan lines was performed in the three color channels R, G, B, with the results combined at the sub-pixel refinement stage; in [9], the stripe centers were extracted from the second derivative of the color value (V) after conversion to the HSV color space.

Figure 6 shows an image registered in a dark room. In [5], implementations of the algorithms for finding the maxima and determining their colors were considered that work for images captured without external illumination. However, once an external light source was added, these algorithms became unstable.

Fig. 6. Image registered in a dark room without external illumination

To select the optimal color conversion for extracting the maxima, we analyzed cross sections of the images recorded during the experiments and evaluated how suitable various quantities are for choosing reliable threshold values for the algorithm. In [7], the values of various quantities were plotted along a cross section of the object image perpendicular to the direction of the projected stripes: the color channels R, G, and B, the arithmetic mean of these three channels, the luminance component (Y) after conversion to YCbCr, and the color value (V) after conversion to HSV. It was also concluded in [7] that with a linear combination of colors (R + G + B), pure colors are strongly suppressed and can be missed. Based on this analysis, we can conclude that the most stable detection of the stripe centers is obtained using the color value V or the root mean square of the R, G, B channels.

Fig. 7. The values of various quantities along a cross section of the object image perpendicular to the direction of the projected stripes.

In the course of this work, algorithms were implemented in the MatLab environment for extracting the stripe centers from the color value V and from the root mean square of the RGB channels, using both the direct search for local maxima and a method similar to the Canny method. For the direct maximum search, two threshold values were set: the minimum absolute value at the candidate maximum, and the minimum difference between the value at the candidate maximum and its "neighbors". In the Canny-like method, at the non-maximum suppression stage, instead of the gradient magnitude we used the V values themselves in the first case, as in [5], and the root mean square of RGB in the second. Before converting the image to the HSV color space, a Gaussian smoothing filter was applied to it.
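The direct maximum search with the two thresholds described above can be sketched as follows (a minimal single-scan-line version in Python; the function name and the toy data are illustrative, and the original was implemented in MatLab):

```python
import numpy as np

def find_stripe_centers(line, t_abs, t_diff):
    """Direct search for local maxima along one scan line.

    A point i is a stripe center candidate if its value exceeds the
    absolute threshold t_abs and exceeds both neighbors by at least
    t_diff (the two thresholds described in the text).
    """
    centers = []
    for i in range(1, len(line) - 1):
        v = line[i]
        if (v >= t_abs
                and v - line[i - 1] >= t_diff
                and v - line[i + 1] >= t_diff):
            centers.append(i)
    return centers

# Toy scan line: two bright stripes on a dark background.
line = np.array([10, 10, 200, 10, 10, 180, 10, 10], dtype=float)
print(find_stripe_centers(line, t_abs=100, t_diff=50))  # [2, 5]
```

The absolute threshold rejects dark noise, while the difference threshold rejects broad, flat plateaus that are not true stripe peaks.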

To evaluate the results of the two methods with different color channels, we used images recorded during experiments on the test bench from [5]. The characteristics of the devices used and the registration conditions of the experiment are described in detail at the end of this article.
A quantitative assessment of how the stripe detection results depend on the color channel used (R + G + B, Y, or V) was carried out on images of a smooth white plane ("Plane") and a smooth white object (a plaster bust of Lenin, "Bust"). We compared the number of stripe center points detected by the algorithm in the color channel under consideration, with the object illuminated by the colored-stripe pattern, against the number of stripe center points detected by the same algorithm in the luminance channel Y, with the object illuminated by a white-stripe pattern. The threshold value for each color channel was chosen as T = k*I_max, where I_max is the maximum value in that color channel within the image fragment under consideration, and k is a single fixed coefficient for all channels.

The same images were used to evaluate how the stripe detection results depend on the method used (the direct search for local maxima or the Canny method). A quantitative assessment is possible only for the "Plane" object, since for it the "true" number of stripe center points can be determined as the number of stripes multiplied by the number of image lines. For both methods, when detecting stripes on the flat object by V, the same maximum result was obtained. For the "Bust" object a qualitative assessment is possible; the stripe detection results are shown in Fig. 8. It can be seen that the Canny method detects more local maximum points at the same threshold value, thanks to its "strong" and "weak" thresholds, but this advantage is insignificant. The main advantage of the Canny method is its resistance to noise.

Fig. 8. Results of the stripe extraction algorithms: (a) direct method, (b) Canny method

4. Classification of the extracted stripes by color. Clustering

At this stage, the extracted stripe center points are classified by color into 7 types: 6 types corresponding to the 6 colors used, plus "unclassified" points that cannot be reliably assigned to any of the used colors. A number of different methods can be applied to this problem; they fall into two groups: methods with fixed thresholds and adaptive methods.

Consider the possible solutions to this problem presented in the literature for two color spaces: YCbCr and HSV. Classification is performed by thresholding, with the threshold values selected in advance. Using a clustering algorithm allows the method to adapt to changes in ambient light and increases classification reliability when working with colored objects. Consider clustering algorithms in the YCbCr color space (using the chrominance components Cb and Cr) and in HSV (using saturation S and hue H).

The clustering algorithm consists of two alternating steps:
• Assign each point to the cluster whose center is closest to it;
• Using the current assignment of points to clusters, compute the mean value of each cluster and make it the new cluster center.
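The two steps above are the classic k-means (Lloyd's) iteration. A minimal NumPy sketch, with toy 2D data standing in for the real Cb-Cr values of the stripe centers:

```python
import numpy as np

def kmeans(points, centers, n_iter=10):
    """The two alternating steps described above:
    1) assign each point to the cluster with the nearest center;
    2) move each center to the mean of the points assigned to it."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        # Step 1: distances from every point to every center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: recompute each center as the mean of its points.
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Two toy "color clusters" in a 2D chrominance-like plane.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
labels, centers = kmeans(pts, np.array([[0.0, 0.0], [1.0, 1.0]]))
print(labels)  # [0 0 1 1]
```

In the real algorithm there are 6 clusters (one per stripe color), and their initial centers can be seeded from the known projected colors.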

Fig. 9. The histogram obtained from the white plane

Fig. 10. The influence of the structure of the object on the quality of the histogram. On the left is a histogram obtained by processing a frame with a white bust. On the right is a histogram obtained by processing a frame with a colored toy

Fig. 11. Clustering results for a white object: a, b - clustering in the Cb-Cr color space; c, d - clustering in the H-S color space

The cluster membership of each point is shown in color; the location of the points corresponds to their coordinates in the Cb-Cr plane (Fig. 12, a) and in the H-S plane (Fig. 12, b). The figures show that the algorithm misclassifies the set of points with a low saturation value S. The cause of this error is the way the distance from a point to the cluster center is computed: for clustering in a Cartesian space, the distance to the cluster center is calculated without taking the shape of the cluster into account.

This problem can be eliminated by clustering on the hue H and saturation S with an artificial anisotropy coefficient. In this case, the distance from an arbitrary point to the nearest cluster center can be calculated as d = sqrt(ΔH² + k²·ΔS²), where k < 1. Introducing this anisotropy accounts for the "elongation" of the clusters along S. Fig. 11, c, d show the results of clustering on the H and S values with k = 1/3. A hue shift was used to preserve the integrity of the red cluster.
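A sketch of this anisotropic distance, assuming the weighted Euclidean form d = sqrt(ΔH² + (k·ΔS)²) with k = 1/3, and treating hue as cyclic on [0, 1) instead of applying a fixed hue shift (one plausible way to keep the red cluster intact):

```python
import numpy as np

K = 1.0 / 3.0  # anisotropy coefficient (k = 1/3, as in the text)

def hs_distance(h1, s1, h2, s2, k=K):
    """Distance in the H-S plane with the S axis compressed by k < 1,
    so that clusters elongated along saturation stay compact.
    Hue is treated as cyclic on [0, 1)."""
    dh = abs(h1 - h2)
    dh = min(dh, 1.0 - dh)        # shortest way around the hue circle
    ds = s1 - s2
    return float(np.sqrt(dh ** 2 + (k * ds) ** 2))

# Two reds on opposite sides of the hue wraparound stay close:
print(hs_distance(0.95, 0.5, 0.05, 0.5))  # 0.1
```

Compressing the S axis means a large saturation spread within one color contributes little to the distance, so low-saturation points are no longer pulled into the wrong cluster.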

Also, in Fig. 12, b, d one can notice many points that cannot be reliably assigned to any of the clusters. By introducing a threshold on the saturation S, or on the distance to the cluster center, these points can be separated into the "unclassified" cluster.

Fig. 12. The results of the stripe extraction and clustering algorithms

After decoding, we obtain a three-dimensional point cloud:

Fig. 13. Reconstructed three-dimensional point cloud for a white object

Unfortunately, adding even a weak background or texture introduces nonlinear distortions, and the algorithm stops working. How we fought the background and made the algorithm adaptive will be covered in the next part.

List of references

1. Hartley R., Zisserman A. Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press, 2000.
2. Scharstein D., Szeliski R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms // International Journal of Computer Vision. 2002. Vol. 47 (1-3). P. 7-42.
3. Salvi J., Pages J., Batlle J. Pattern codification strategies in structured light systems // Pattern Recognition. 2004. Vol. 37 (4). P. 827–849.
4. Geng J. Structured-light 3d surface imaging: a tutorial // Advances in Optics and Photonics. 2011. Vol. 3. P. 128–160.
5. Safroshkin M.A. Experimental studies of the parallax registration method for 3D objects with color coding // Youth Scientific Technical Bulletin, 2013. URL: .
6. Gonzalez R., Woods R., Eddins S. Digital image processing in MatLab. M .: Technosphere, 2006.
7. Fechteler P., Eisert P. Adaptive color classification for structured light systems // Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference. P. 1-7.
8. Fechteler P., Eisert P., Rurainsky J. Fast and high resolution 3D face scanning // Image Processing, 2007. ICIP 2007. IEEE International Conference. P. 81-84.
9. Xing Lu, Jung-Hong Zhou, Dong-Dong Liu, Jue Zhang, Application of color structured light pattern to measurement of large out-of-plane deformation // Acta-Mech. Sin (2011). Vol. 27 (6). P. 1098-1104.
10. Permuter H., Francos J., Jermyn I.H. A study of Gaussian mixture models of color and texture features for image classification and segmentation // Pattern Recognition, 2006. Vol. 39, No. 4. P. 695-706.
11. Zhang Z. Flexible camera calibration by viewing a plane from unknown orientations // International Conference on Computer Vision, 1999. P. 666–673.
12. Falcao G., Hurtos N., Massich J. Plane-based calibration of a projector-camera system // VIBOT Master, 2008.
13. De Wansa Wickramarante V.K., Ryazanov V.V., Vinogradov A.P. Accurate reconstruction of 3D model of a human face using structured light // Pattern Recognition and Image Analysis, 2008. Vol. 18, No. 3. P. 442-446.

I took the stereo pair from .
