Viz - New 3D visualization module in the OpenCV library



    Good afternoon! In today's blog post I want to review Viz, the new module for 3D visualization in the OpenCV library, whose design and implementation I took part in. I should probably introduce myself: my name is Anatoly Baksheev, I work at Itseez, I have been using the OpenCV library for 7 years now, and together with my colleagues I continue to develop and improve it.

    What does 3D visualization have to do with computer vision, you ask, and why did we even need such a module? And you would be right, if computer vision were only an area that works with images. But we live in the 21st century, and the scope of computer vision has gone far beyond mere image processing, edge detection, or face recognition. Science and technology have learned to measure our three-dimensional world with more or less acceptable quality. This was helped by the appearance of cheap Kinect sensors on the market a few years ago, which made it possible to capture a scene as a three-dimensional colored point cloud with good accuracy and speed; by progress in reconstructing the 3D world from a series of images; and even by mobile technology, where the integrated gyroscope and accelerometer greatly simplify estimating the motion of a mobile device's camera in the 3D world, and therefore improve the accuracy of scene reconstruction.

    All this prompted the development of various methods and algorithms for working with 3D data: 3D segmentation, 3D noise filtering, 3D shape recognition, 3D face recognition, 3D tracking of body position or of hands for gesture recognition. You probably know that when Kinect for Xbox went on sale, Microsoft provided an SDK that let game developers determine the position of the human body, which led to a large number of games with an interesting interface: for example, a game character repeating the movements of the player standing in front of the Kinect. The results of such 3D algorithms must somehow be visualized: three-dimensional trajectories, reconstructed geometry, or, for example, the computed position of a human hand in 3D. And such algorithms must also be debugged, which is hard to do without visualizing intermediate results.


    Various ways to display camera paths in OpenCV Viz

    Thus, since the development vector is shifting toward 3D, algorithms working with 3D data will appear in OpenCV more and more often. And since there is such a trend, we are hurrying to create a convenient infrastructure for it. The Viz module is the first step in this direction. OpenCV has always been a library providing a very convenient base on which computer vision algorithms and applications are developed. It is convenient because of its functionality, since it includes almost all of the most frequently used operations for manipulating images and data, and because of an API (containers, basic types, and operations on them) that has been carefully refined over the years and allows computer vision methods to be implemented very compactly, saving developer time. We hope Viz meets all these requirements.

    For the impatient, here is a video demonstrating the capabilities of the module.



    Viz Philosophy


    The idea of creating such a module came to me when I had to debug a visual odometry (vslam) algorithm under time pressure, and I felt first-hand how much such a module would have helped me and what functionality I would like to see in it. Colleagues agreed that it would be great to have such a module. All this led to the start of its development, and then to bringing it to a more or less mature state together with Ozan Tonkal, our Google Summer of Code student. Work on improving Viz is ongoing.

    The design idea is that it would be nice to have a system of three-dimensional widgets, each of which can be rendered in a 3D visualizer simply by passing the position and orientation of that widget. For example, a point cloud coming from a Kinect is usually stored in a coordinate system associated with the camera position, and for visualization it is often necessary to transform point clouds taken from different camera positions into some global coordinate system. It would be convenient not to recompute the data into the global system every time, but simply to set the pose of the point cloud. Thus, in OpenCV Viz each supported widget object is defined in its own coordinate system and is then translated and oriented during the rendering process.
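    To illustrate the idea, here is a minimal sketch (the file name "scan.ply" and the two camera poses are assumptions for the example): the same cloud, stored in camera coordinates, is rendered at two different global poses without recomputing a single point.

    #include <opencv2/viz.hpp>

    int main()
    {
        cv::viz::Viz3d viz("scene");
        // A cloud stored in the camera's own coordinate system (hypothetical file)
        cv::Mat cloud = cv::viz::readCloud("scan.ply");
        // Camera-to-global transforms, e.g. estimated by your odometry
        cv::Affine3d camera_pose_1 = cv::Affine3d().translate(cv::Vec3d(0.0, 0.0, 0.0));
        cv::Affine3d camera_pose_2 = cv::Affine3d().translate(cv::Vec3d(2.0, 0.0, 0.0));
        // The same data is shown twice; only the widget poses differ
        viz.showWidget("scan1", cv::viz::WCloud(cloud), camera_pose_1);
        viz.showWidget("scan2", cv::viz::WCloud(cloud), camera_pose_2);
        viz.spin();
        return 0;
    }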

    But no good idea comes to only one person. As it turned out, the VTK library implements the same approach for manipulating and visualizing scientific data. So the task boiled down to writing a clean wrapper over a subset of VTK, with an interface and data structures in the OpenCV style, and implementing a basic set of widgets with the ability to extend this set in the future. On top of that, VTK satisfies the cross-platform requirement, so the decision to use it was made almost immediately. I think the slight inconvenience of the VTK dependency is more than offset by usability and future extensibility.

    Representation of the position of objects in Viz


    A position in Euclidean space is determined by a rotation and a translation. A rotation can be represented as a rotation matrix, as a rotation vector (Rodrigues vector), or as a quaternion. A translation is just a three-dimensional vector. Rotation and translation can be stored in separate variables or packed into an expanded 4x4 affine transformation matrix. It is the latter format that is proposed for ease of use. "Oh, how convenient!", you will say, "having to build such a matrix every time I render any object!" And I would agree with you, if no convenient means were provided for creating and manipulating poses in this format. That tool is a specially written class, cv::Affine3d, which, by the way, I recommend using not only for visualization but also when developing odometry algorithms. Yes, quaternion lovers may throw stones at me; in my defense, support for them is planned in the future.

    So let's give a definition. The pose of each object in Viz is a transformation from the Euclidean coordinate system associated with the object to a certain global Euclidean coordinate system. In practice there are different conventions about what the transformation is and what exactly is being converted. In our case we mean the transformation of points from the coordinate system of the object to the global one. That is:

    P_G = M · P_O

    where P_G and P_O are the coordinates of a point in the global coordinate system and in the coordinate system of the object, respectively, and M is the transformation matrix, that is, the pose of the object. Let's look at how the pose of an object can be formed.

    // If the coordinate system associated with the object is known
    cv::Vec3d x_axis, y_axis, z_axis, origin;
    cv::Affine3d pose = cv::makeTransformToGlobal(x_axis, y_axis, z_axis, origin);
    // If you need to compute a camera pose instead
    cv::Vec3d position, view_direction, y_direction;
    Affine3d cam_pose = makeCameraPose(position, view_direction, y_direction);
    // Identity transformations: the object pose coincides with the global system
    Affine3d pose1;
    Affine3d pose2 = Affine3d::Identity();
    // From a rotation matrix and a translation
    cv::Matx33d R;
    cv::Vec3d t;
    Affine3d pose3 = Affine3d(R, t);
    // If you are a fan of hard optimization and store matrices as arrays on the stack
    double rotation[9];
    double translation[3];
    Affine3d pose4 = Affine3d(cv::Matx33d(rotation), cv::Vec3d(translation));
    

    Or maybe you have already developed visual odometry algorithms, and your program already stores these transformation matrices inside cv::Mat? Then a pose in the new format is easy to obtain:
    // From a 4x4 or 3x4 matrix
    cv::Mat pose_in_old_format;
    Affine3d pose = Affine3d(pose_in_old_format);
    // From a 3x3 rotation matrix with the translation given separately
    cv::Mat R;
    cv::Vec3d t;
    Affine3d pose1 = Affine3d(R, t);
    // From a Rodrigues vector and a translation
    cv::Vec3d rotation_vector, translation;
    Affine3d pose2 = Affine3d(rotation_vector, translation);
    

    In addition to construction, this class also lets you manipulate poses and apply them to three-dimensional vectors and points. Examples:
    // Rotation by 90 degrees around Oy, then translation by 5 along Ox
    Affine3d pose = Affine3d().rotate(Vec3d(0, CV_PI/2, 0)).translate(Vec3d(5, 0, 0));
    // Applying a pose
    cv::Vec3d a_vector;
    cv::Point3d a_point;
    cv::Vec3d   transformed_vector = pose * a_vector;
    cv::Point3d transformed_point  = pose * a_point;
    // Combining two poses
    Affine3d camera1_to_global, camera2_to_global;
    Affine3d camera1_to_camera2 = camera2_to_global.inv() * camera1_to_global;
    

    This should be read right to left: multiplying on the right by a point in camera 1's coordinate system, the first (rightmost) transformation takes the point into the global system, and the inverted transformation then takes it from the global system into camera 2's coordinate system. That is, the composition is the pose of camera 1 relative to the coordinate system of camera 2.
    // The distance between two poses can be computed like this
    double distance       = cv::norm((camera2_to_global.inv() * camera1_to_global).translation());
    double rotation_angle = cv::norm((camera2_to_global.inv() * camera1_to_global).rvec());
    

    This probably concludes our excursion into the capabilities of this class. If you liked it, I suggest using it in your algorithms, since code written with it is compact and easy to read. The fact that cv::Affine3d instances are allocated on the stack, and all of its methods are inline, opens up possibilities for optimizing the performance of your application.
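    As an illustration of that recommendation, here is a minimal sketch of accumulating camera poses in a visual odometry loop; estimateMotion() is a hypothetical stand-in for your own per-frame motion estimator:

    #include <opencv2/core/affine.hpp>

    // Hypothetical per-frame motion estimate: the pose of the new camera frame
    // relative to the previous one. In a real pipeline this comes from your
    // odometry algorithm; here it is a constant small motion for illustration.
    cv::Affine3d estimateMotion()
    {
        return cv::Affine3d(cv::Vec3d(0, 0.01, 0), cv::Vec3d(0.05, 0, 0));
    }

    int main()
    {
        // global_pose maps points from the current camera frame to the global frame
        cv::Affine3d global_pose = cv::Affine3d::Identity();
        for (int frame = 0; frame < 100; ++frame)
        {
            cv::Affine3d delta = estimateMotion(); // new frame relative to previous
            global_pose = global_pose * delta;     // P_G = M_prev * delta * P_new
        }
        return 0;
    }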

    Visualization with Viz


    The most important class, in charge of rendering, is called cv::viz::Viz3d. This class is responsible for creating a window, initializing it, displaying widgets, and managing and processing user input. You can use it as follows:
    Viz3d viz1("mywindow"); // prepare a window named "mywindow"
    ... add content ...
    viz1.spin();    // render; execution blocks until the window is closed
    

    Like almost all high-level functionality in OpenCV, this class is essentially a reference-counted smart pointer to its internal implementation, so it can be freely copied or retrieved by name from the internal database.
    Viz3d viz2 = viz1;
    Viz3d viz3 = cv::viz::getWindowByName("mywindow");
    Viz3d viz4("mywindow");
    

    If a window with the requested name already exists, the resulting Viz3d instance will point to it; otherwise a new window with that name is created and registered. This is done to simplify debugging of algorithms: you no longer need to pass the window object deep down the call stack every time you need to display something somewhere. It is enough to open a window at the beginning of the main() function and then access it by name from anywhere in the code. This idea is inherited from the well-proven cv::imshow(window_name, image) function in OpenCV, which also lets you display a picture in a named window from anywhere in the code.
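    For example, a sketch of what this looks like in practice (debugShowCloud() is a hypothetical helper buried deep in an algorithm):

    #include <opencv2/viz.hpp>

    // Somewhere deep inside an algorithm: no Viz3d object has to be passed in,
    // the window opened in main() is found by its name.
    void debugShowCloud(const cv::Mat& cloud)
    {
        cv::viz::Viz3d viz = cv::viz::getWindowByName("mywindow");
        viz.showWidget("debug_cloud", cv::viz::WCloud(cloud));
        viz.spinOnce(1, true); // render one frame and return
    }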

    Widget System

    As mentioned earlier, a system of widgets is used to render various data. Each widget has several constructors and, in some cases, methods for managing its internal data. Each widget is defined in its own coordinate system. For instance:

    // define a line by two points
    WLine line(Point3d(0.0, 0.0, 0.0), Point3d(1.0, 1.0, 1.0), Color::apricot());
    // define a cube by two corners, with faces parallel to the coordinate axes
    WCube cube(Point3d(-1.0, -1.0, -1.0), Point3d(1.0, 1.0, 1.0), true, Color::pink());
    


    As you can see, we can specify an arbitrary line, whereas for a cube we can set only its position and size, not its orientation relative to the coordinate axes. However, this is not a limitation but rather a feature that teaches you to think in the Viz style. As we discussed earlier, any widget can be given an arbitrary pose in the global coordinate system at render time. So with a simple constructor we create the widget in its own coordinate system (for example, setting the dimensions of the cube this way), and then position and orient it in the global system when rendering.

    // A Rodrigues vector defining a rotation around (1.0, 1.0, 1.0) by 3 radians
    Vec3d axis = Vec3d(1.0, 1.0, 1.0);
    Vec3d rvec = axis * (3.0 / cv::norm(axis));
    Viz3d viz("test1");
    viz.showWidget("coo", WCoordinateSystem());
    viz.showWidget("cube", cube, Affine3d(rvec, Vec3d::all(0)));
    viz.spin();
    

    And here is the result:


    As we can see, rendering happens through a call to the Viz3d::showWidget() method, passing it the string name of the object, an instance of the created widget, and its pose in the global coordinate system. The string name makes it possible to add, delete, and update widgets in the 3D scene by name. If a widget with that name is already present, it is deleted and replaced with the new one.
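    For instance, a small sketch of name-based manipulation (reusing viz, cube, and rvec from the example above; the new pose values are arbitrary):

    // Update only the pose of an already-added widget, without re-creating it
    viz.setWidgetPose("cube", Affine3d(rvec, Vec3d(0.0, 1.0, 0.0)));
    // Remove a widget from the scene entirely by its name
    viz.removeWidget("cube");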

    In addition to a cube and a line, Viz implements a sphere, a cylinder, a plane, a 2D circle, images and text in 3D and 2D, various types of trajectories, camera positions, and, of course, point clouds and a widget for working with meshes (colorless, colored, or textured). This set of widgets is not final and will expand. Moreover, it is possible to create custom widgets, but more about that some other time; if you are interested in this feature, read this tutorial here. Now let's look at another example of how to draw point clouds:
    // read a point cloud from disk; a matrix of type CV_32FC3 is returned
    cv::Mat cloud = cv::viz::readCloud("dragon.ply");
    // create a color array for the cloud and fill it with random data
    cv::Mat colors(cloud.size(), CV_8UC3);
    theRNG().fill(colors, RNG::UNIFORM, 50, 255);
    // copy the cloud and set some points to NaN - such points are ignored
    float qnan = std::numeric_limits<float>::quiet_NaN();
    cv::Mat masked_cloud = cloud.clone();
    for(int i = 0; i < (int)cloud.total(); ++i)
        if (i % 16 != 0)
            masked_cloud.at<Vec3f>(i) = Vec3f(qnan, qnan, qnan);
    Viz3d viz("dragons");
    viz.showWidget("coo", WCoordinateSystem());
    // Red dragon
    viz.showWidget("red", WCloud(cloud, Color::red()),
        Affine3d().translate(Vec3d(-1.0, 0.0, 0.0)));
    // Dragon with random colors
    viz.showWidget("colored", WCloud(cloud, colors),
        Affine3d().translate(Vec3d(+1.0, 0.0, 0.0)));
    // Dragon with random colors and filtered points, at the identity pose
    viz.showWidget("masked", WCloud(masked_cloud, colors), Affine3d::Identity());
    // Automatic coloring, useful when no colors are available
    viz.showWidget("painted", WPaintedCloud(cloud),
        Affine3d().translate(Vec3d(+2.0, 0.0, 0.0)));
    viz.spin();
    

    The result of this code:

    For more information about the available widgets, read our documentation.

    Dynamically changing scene

    Often it is not enough just to display objects so that the user can view them; some dynamics are needed. Objects can move and change their attributes. If we have a video stream from a Kinect, we can play so-called point cloud video. To do this, you can do the following:
    cv::VideoCapture capture(CV_CAP_OPENNI);
    Viz3d viz("dynamic");
    //... add content ...
    // place the camera slightly to the side
    viz.setViewerPose(Affine3d().translate(Vec3d(1.0, 0.0, 0.0)));
    Mat color, depth;
    while(!viz.wasStopped())
    {
        //... update the content ...
        // if needed, change the poses of the added widgets
        // if needed, replace the clouds with new ones received from the Kinect
        // if needed, change the camera position
        capture.grab();
        capture.retrieve(color, CV_CAP_OPENNI_BGR_IMAGE);
        capture.retrieve(depth, CV_CAP_OPENNI_DEPTH_MAP);
        Mat cloud = computeCloud(depth);      // user-defined helper
        Mat display = normalizeDepth(depth);  // user-defined helper
        viz.showWidget("cloud", WCloud(cloud, color));
        viz.showWidget("image", WImageOverlay(display, Rect(0, 0, 240, 160)));
        // render and process user input for 30 ms
        viz.spinOnce(30 /*ms*/, true /*force_redraw*/);
    }
    

    This loop will run until the user closes the window. On each iteration, the widget holding the old cloud is replaced with a new one holding the new cloud.
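    If you also want to react to the keyboard inside such a loop, Viz3d provides a callback mechanism. Here is a minimal sketch following the master-branch API (the exact KeyboardEvent fields differ slightly between branches; the 'paused' flag and the choice of the space bar are assumptions for the example):

    #include <opencv2/viz.hpp>

    static bool paused = false;

    // Called by Viz3d for every keyboard event while spinOnce()/spin() runs
    static void onKeyboard(const cv::viz::KeyboardEvent& event, void* /*cookie*/)
    {
        // toggle pausing when the user presses the space bar
        if (event.action == cv::viz::KeyboardEvent::KEY_DOWN && event.code == ' ')
            paused = !paused;
    }

    // before the loop: viz.registerKeyboardCallback(onKeyboard);
    // inside the loop: skip grabbing new frames while 'paused' is true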



    Control interface

    At the moment, camera control is implemented in the so-called trackball style, convenient for inspecting 3D objects. Imagine that in front of the camera there is some point in 3D around which the camera rotates with the mouse. The mouse wheel zooms toward and away from this point, and with the shift/ctrl keys and the mouse you can move this rotation point around the 3D world. In the future it is planned to implement a free-fly mode for navigating large spaces. I also recommend pressing the 'H' hotkey while Viz is running: information on the other hotkeys and features, from saving screenshots to enabling anaglyph stereo mode, is printed to the console.
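    The viewer camera can also be driven programmatically, as was done with setViewerPose() in the loop above. A short sketch, assuming a Viz3d instance named viz (the one-unit step is an arbitrary example):

    // Read the current viewer pose, then shift the camera by one unit along
    // the global -Z axis and apply the new pose
    Affine3d cam_pose = viz.getViewerPose();
    viz.setViewerPose(cam_pose.translate(Vec3d(0.0, 0.0, -1.0)));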

    How to build an OpenCV Viz module


    And finally, this section is intended for those who, after reading this text, feel the urge to start using the module. Viz can be used on all three dominant PC platforms: Windows, Linux, and Mac. You will need to install VTK and compile OpenCV with VTK support. OpenCV itself with the Viz module can currently only be obtained from our GitHub repository https://github.com/Itseez/opencv , in the 2.4 and master branches. So, the instructions:

    1. VTK installation

    Under Linux, the easiest solution is to install VTK from the apt repository using the apt-get install libvtk5-dev command. For Windows, you need to download VTK from the developer's site (version 5.10 works best), generate a Visual Studio project with CMake, and compile it in the Release and Debug configurations. I recommend unchecking the BUILD_SHARED_LIBS option in CMake, which will produce static VTK libraries. In that case, after compilation, the size of the OpenCV Viz module without any external dependencies will be only about 10 MB.

    For Mac, any version of VTK is suitable on OS X 10.8 and earlier; for 10.9 Mavericks you will have to compile VTK 6.2 from the official repository github.com/Kitware/VTK.git (there was no 6.2 release at the time of writing this post). For Mac, it is also recommended to generate an Xcode project with CMake and build static libraries in the Release and Debug configurations.

    2. Compilation of OpenCV with VTK

    This step is simpler and faster. I give the commands for Linux; under Windows things are not much different:
    1. git clone https://github.com/Itseez/opencv.git
    2. [optional] git checkout -b 2.4 origin/2.4
    3. mkdir build && cd build
    4. cmake -DWITH_VTK=ON -DVTK_DIR=<path to VTK build directory> ../opencv


    If you installed VTK through apt-get install, then you do not need to specify the path to it: CMake will find it automatically. Next, check in the CMake console log that it found and enabled VTK, and did not report any incompatibilities. For example, if you compile OpenCV with Qt5 support while VTK is built with Qt4, linking against VTK will make the application crash during initialization, before entering the main() function. The solution is to pick one: either compile VTK without Qt4 by unchecking the corresponding checkbox in VTK's CMake configuration, or take VTK 6.1 or higher and build it with Qt5 support. Finally, to build OpenCV, run make -j6.

    3. Running the tests (optional)

    I also recommend cloning the repository github.com/Itseez/opencv_extra.git , setting the environment variable OPENCV_TEST_DATA_PATH to the path to opencv_extra/testdata, and running the opencv_test_viz binary from the OpenCV build directory. This application demonstrates all the current features of the module, and its source code can be used to study the API.

    Conclusion


    Well, here we are at the conclusion. I hope it was interesting. In this post I wanted to show what, from my point of view, the main trend in computer vision is now, and that the OpenCV library is moving with the times: algorithms for working with the 3D world will keep appearing in OpenCV. We will develop them ourselves and with the help of Google Summer of Code students, and grateful users building on our base will also take part in creating and improving such algorithms in OpenCV.

    I also wanted to interest you in this newly developed tool, or maybe even in this area of research. By the way, if you would like to lead a similar development effort for OpenCV, you are welcome! We accept pull requests through GitHub; the instructions are posted here. We will be glad to see a new, well-working contribution :-)

    And although the essential groundwork is now in place, I think new features will be added to Viz in the future. For example, a model of the skeleton of a human hand and its visualization. Or 3D world maps from algorithms such as PTAM. Or maybe a network client, so that data for visualization could be sent from a mobile device while debugging algorithms on it :) But these are crazy ideas so far :-). If there is interest, in the next blog post I could talk about some algorithm, for example ICP or Kinect Fusion, and how Viz was used to debug and visualize it.

    And for those who have read to the end, a bonus: here lies my optimized and lightweight remake of my own Kinect Fusion implementation from the PCL library.
