Make3D from one photo, part 1



    The “Make3D” project from Stanford University (now at Cornell University) is notable for setting itself a task that had not yet become commonplace: reconstructing a three-dimensional model of a scene from just one photograph. Until then, to achieve a similar result, developers reconstructed three-dimensional information by combining several (two or more) pictures of the same object. Here it was demonstrated that a significant amount of information is contained in the monocular cues of the image itself, which until then had often been ignored. In practice, the system has already achieved satisfactory results on more than 60% of the arbitrary photographs submitted and evaluated by outside users during its testing.

    The publication consists of two parts: Part 1, Part 2
    It is published to satisfy curiosity and to expose the “magic”, so that it becomes clear how it all works.



    Content:


    Part one ( current ):
    • Algorithm Summary
    • List of used theories and algorithms
    • Ising model or where did this formula come from in MRF
    • Effective graph image segmentation
    • We want 3D Model and Polygon Location
    • You’ll have to look wider ...
    • Monocular image features
    • Boundaries and folds
    • MRF Input and Output
    • What will MRF do? Target: min (distance error)
    Part Two ( here ):
    • The wish list is ready, cram it into the MRF Model
    • 1. Local features: preliminary estimation of distance from local features
    • 2. Connections: connectivity of neighboring superpixels
    • 3. Co-planarity
    • 4. Co-linearity
    • Briefly about MRF Training
    • Briefly about MRF Solution
    • Source code and model parameters studied
    • Program execution, testing
    • Total
    • Some files


    Algorithm Summary


    The image is divided into superpixels: small segments of the image that are similar in texture. Homogeneous objects, and even uniform surfaces, are as a rule split into several superpixels; nevertheless, a superpixel lies entirely within a single object or surface whenever that surface has a pronounced boundary.

    For each superpixel, the algorithm tries to determine its position (depth) and orientation in three-dimensional space relative to the point from which the photograph was taken, i.e. to find the surface (plane) on which it lies. This surface can have any location in space, not necessarily horizontal or vertical, as was assumed in similar earlier studies.

    The project uses preliminary training on the relationship between the features of superpixels (image regions) and their distances. The training algorithm uses an MRF model (Markov random field), taking into account constraints on the relative distances between adjacent superpixels, i.e. the fact that, with some probability, two neighboring regions are more likely to lie at approximately the same distance from the observation point, or even in the same plane, than to belong to different objects that are far apart in space (such as a fence and the background behind it).

    By processing the segmented superpixel image with the pre-trained MRF, the algorithm obtains at the output the position and orientation of each superpixel. This is enough to build a three-dimensional model of the scene, with the photograph itself serving as its texture.
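    To make the overall flow concrete, below is a minimal, heavily simplified sketch of this pipeline in Python. Every step is a toy stand-in (a regular grid split instead of graph-based segmentation, mean color instead of the real features, a made-up depth rule instead of the trained MRF); only the overall structure matches the description above.

    import numpy as np

    def segment_into_superpixels(img, grid=16):
        # Toy stand-in for segmentation: split the image into a regular grid of blocks.
        h, w, _ = img.shape
        labels = np.zeros((h, w), dtype=int)
        for i in range(grid):
            for j in range(grid):
                labels[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid] = i * grid + j
        return labels

    def compute_features(img, labels, sp):
        # Toy stand-in for the monocular features: mean color of the superpixel.
        return img[labels == sp].mean(axis=0)

    def infer_plane(feature):
        # Toy stand-in for MRF inference: a fronto-parallel plane whose depth
        # grows with brightness (a made-up rule, not the trained model).
        depth = 1.0 + float(feature.mean())
        return np.array([0.0, 0.0, 1.0 / depth])   # plane parameters alpha, 1/|alpha| = depth

    def reconstruct(img):
        labels = segment_into_superpixels(img)
        planes = {sp: infer_plane(compute_features(img, labels, sp))
                  for sp in np.unique(labels)}
        return labels, planes   # enough to build textured polygons from the photo

    labels, planes = reconstruct(np.random.rand(128, 128, 3))
    print(len(planes), 'superpixels; plane of superpixel 0:', planes[0])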

    Pic.1 Result of processing photos in Make3D


    List of used theories and algorithms


    The method for reconstructing a three-dimensional model of a scene uses the following theories and algorithms, discussed in the sections below:


    Ising model or where did this formula come from in MRF


    You can skim this part quickly; it is just an explanation of why the formula in MRF, and in “Make3D”, looks the way it does. In short, it turned out that way historically...

    The Ising model served as the prototype of the Markov random field, so let us start with it. It is a mathematical model of statistical physics designed to describe the magnetization of a material.

    In the Ising physical model, each node of the crystal lattice is assigned a number called a spin, designated “+1” or “−1”, which determines the direction of the spin: “up” or “down”. Spin is the intrinsic angular momentum of an elementary particle; it has a quantum nature and is not related to the motion of the particle as a whole.

    Pic.2 One-dimensional Ising model


    Let us consider the simplest, one-dimensional case, when the nodes of the crystal lattice lie along a single straight line. The model describes a probability distribution on the space Ω of all possible configurations of the substance ω = (ω_1, ..., ω_n), where n is the number of elementary particles and ω_i ∈ {+1, −1} is the spin of the i-th particle. Such a distribution is called a random field: a random function defined on a set of points of a multidimensional space. P(ω) is the probability that the substance is in the state in which the nodes have the spins ω = (ω_1, ..., ω_n).

    To each of the 2^n possible spin configurations ω an energy is assigned, made up of the pairwise interaction of the spins of neighboring atoms and of an external field:

    Total energy of the system in the Ising model:
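    Written out (this is the standard Ising Hamiltonian; the notation in the omitted formula image may differ slightly):

    U(\omega) \;=\; -J \sum_{\langle i,j \rangle} \omega_i \, \omega_j \;-\; m H \sum_{i} \omega_i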


    In the formula, the first sum is taken over all pairs of neighboring nodes and equals the interaction energy of their spins. Ising simplified the model by taking into account only the interaction between neighboring nodes located in close proximity. The exchange interaction constant J is a characteristic of the material and describes the interaction energy of the spins. If J > 0, it describes the attractive case of a ferromagnet, in which neighboring lattice nodes try to line up in the same direction; for J < 0 it describes the repulsive case of an antiferromagnet, in which neighboring lattice nodes try to line up in opposite directions. The second sum describes the effect of an external magnetic field of strength H, where m characterizes the magnetic properties of the substance.

    When J > 0, the first sum is minimal when all nodes of the crystal lattice point in the same direction. The second sum reaches its minimum when the spins of the lattice nodes are aligned with the direction of the external field H.

    From physics we know that any system, when taken out of equilibrium, tends toward the state with the lowest energy. Therefore, sooner or later the model will “calm down” and settle into a configuration in which the total energy U(ω) takes its lowest value (min). This will be its most probable state (P(ω) is max).

    The probability distribution on the space Ω of all possible configurations of the substance in the Ising model obeys the so-called Gibbs distribution (Gibbs measure):

    Distribution in the Ising model:
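    Written out (a standard form of the Gibbs measure, consistent with the normalizer Z defined just below):

    P(\omega) \;=\; \frac{1}{Z}\, e^{-U(\omega)/(kT)}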


    where k is the Boltzmann constant, T is the temperature of the substance, and Z = ∑_ω e^(−U(ω)/kT) is the normalizing constant (the partition function).
    We do not really need these quantities here; what matters is that the formula has this form and has been in use since the beginning of the 20th century.

    For a more convenient way of writing the probability distribution in the model, associate with each i-th node of the lattice its own (local) energy:

    Own energy of a node of the Ising model:
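    Presumably (this is an assumption about the omitted formula image: each node collects the pair terms with its neighbors plus its own field term), this local energy has the form:

    U_i(\omega) \;=\; -J \sum_{j \in N_i} \omega_i \, \omega_j \;-\; m H \, \omega_i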


    The Gibbs measure function is interesting from two perspectives. The first is related to entropy, which probably led to its widespread use in statistical studies.

    Pic.3 Two-dimensional Ising model


    The second important property of a distribution of this form, from the point of view of probability theory, is the following. Let N_j be the set of all neighbors of node j, i.e. the nodes in its immediate vicinity. Then:

    Markov type property:

    This property is called a Markov-type property. The probability that the spin at node j takes the value a, given the spins at all other nodes of the crystal lattice, is equal to the probability conditioned only on the spins of the neighboring nodes ω_k, k ∈ N_j. This property is called a local characteristic, and a distribution with such a property is called a Markov random field.
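    In symbols, using the notation of the surrounding text, this property reads:

    P\big(\omega_j = a \;\big|\; \omega_k,\ k \neq j\big) \;=\; P\big(\omega_j = a \;\big|\; \omega_k,\ k \in N_j\big)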

    The Markov network was introduced as a generalization of the Ising model, and in it the probability distribution over ω can be written in multiplicative form, as a product of the local-energy terms of all nodes of the crystal lattice:

    Model in multiplicative form:
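    A common way to write such a factorization is the general Hammersley–Clifford form (an assumption about the omitted formula image, which may group the factors per node instead of per clique):

    P(\omega) \;=\; \frac{1}{Z} \prod_{c \,\in\, \text{cliques}} \psi_c(\omega_c)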

    In practice, the main purpose of applying a Markov random field is to find the most likely configuration ω* of the random field ω: ω* = argmax_ω P(ω).

    The main difficulty associated with using this method is the correct choice of the parameters that control the strength of the spatial interaction.

    Now we have the formula on whose basis the MRF model for “Make3D” will be built...




    Effective graph image segmentation


    A picture is worth a thousand words:

    Segmentation example with various parameters


    These large monochrome areas are the superpixels, in the terminology of the authors of “Make3D”. For more detail in an accessible form, see the article “Efficient graph-based image segmentation”.


    We want 3D Model and Polygon Location


    We want to obtain a 3D model. The reconstructed three-dimensional model is represented by a set of polygons and a texture, i.e. parts of the original photograph superimposed on these polygons. A polygon can have an arbitrary location and orientation in space.

    Pic.4 Super Pixel Location and Orientation


    More formally, the three-dimensional location of a polygon is determined by a plane specified by the parameters α ∈ R^3. Every point q ∈ R^3 belonging to this plane α satisfies the equality α^T q = 1. The shortest distance from the observation point (the point from which the photograph was taken) to the plane, along the perpendicular dropped onto it, is 1/|α|. The unit normal vector α̂ = α/|α| uniquely determines the orientation of the plane in space.

    If R_i is the unit vector (called the ray R_i) connecting the observation point (camera center) with an arbitrary pixel i of the image lying on the plane α, then the distance from the viewpoint to pixel i is d_i = 1/(R_i^T α). The distance to a superpixel is determined in the same way.
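    A minimal numeric sketch of these relations in NumPy (the plane parameters α and the ray R_i below are made-up example values, not Make3D output):

    import numpy as np

    # Plane parameters alpha: every point q on the plane satisfies alpha^T q = 1.
    alpha = np.array([0.0, 0.1, 0.2])

    # Shortest distance from the camera center to the plane: 1 / |alpha|.
    plane_distance = 1.0 / np.linalg.norm(alpha)

    # Unit normal of the plane: alpha / |alpha|.
    normal = alpha / np.linalg.norm(alpha)

    # Unit ray R_i from the camera center through pixel i (made-up direction).
    R_i = np.array([0.0, 0.3, 1.0])
    R_i = R_i / np.linalg.norm(R_i)

    # Depth of pixel i along that ray: d_i = 1 / (R_i^T alpha).
    d_i = 1.0 / (R_i @ alpha)

    print(plane_distance, normal, d_i)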



    You’ll have to look wider ...


    A person picks up the monocular cues of an image well; they help him reconstruct a three-dimensional model of the scene and determine the distances to objects. At a single glance he is able to notice differences in textures, color gradients, color differences, fuzzy outlines and out-of-focus objects in the distance.

    However, local monocular cues alone are still not enough, because the blue sky and a blue object on the ground have similar local monocular cues. Therefore, when reconstructing a scene, a person has to take into account not only local cues, but also the relationships between the various parts of the image. Thus, a person can determine the distance to an object with weakly expressed cues, for example a large, uniformly lit gray area, relative to other objects in the photograph, and conclude that it is a wall and even roughly estimate its length.

    Therefore, the MRF model takes into account not only local monocular cues, but also the relationships between different parts of the image:
    • Local features: the local, monocular features of a superpixel largely determine its position and orientation in space
    • Connections: except at the boundaries of different objects, neighboring superpixels are most likely connected at their borders in three-dimensional space as well
    • Co-planarity: if neighboring superpixels have similar monocular features and there is no pronounced edge between them, then most likely they lie in the same plane
    • Co-linearity: long straight lines in the image are most likely long straight lines in three-dimensional space as well
    These properties are taken into account in the model together with a “level of confidence” imposed on each of them, i.e. how sure we can be of the computed result. The level of confidence varies depending on the local characteristics of the image region.

    But before moving on to the MRF model... what did the authors find that is so interesting in just one photograph?



    Image Features


    For each superpixel, the algorithm computes characteristics (features) in order to detect the monocular cues mentioned above, as well as characteristics used to determine the boundaries and folds between superpixels (objects in the image). To make the algorithm more robust and flexible, so that it can reconstruct the scene from an arbitrary photograph, not necessarily similar to those on which the model was trained, a rather large number of features are computed.

    Monocular image features


    For each superpixel i, statistical features related to the image texture, shape and position are computed. The “Make3D” project uses 17 filters F_n(x, y), where n = 1, ..., 17:
    • 9 Laws masks for computing the averaged image intensity (averaging), for spot detection and for edge detection
    • 6 masks for oriented edge detection, rotated by 30°·k, k = 0, ..., 5
    • 2 color channels, Cb and Cr, in the YCbCr format

    Pic.5 Masks used in Make3D

    Pic.6 Two channels of color Cb and Cr


    Applying the filters to superpixel i of the image I(x, y) produces a 34-dimensional feature vector computed as E_i(n) = ∑_((x,y)∈S_i) |I(x,y) * F_n(x,y)|^k, where k = {2, 4}, which corresponds to the energy and the kurtosis. This gives 34 values for each superpixel. Additionally, for superpixel i, 14 features related to its position in the photograph and its shape are computed, such as:

    Pic.7 Super pixel layout features
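    To illustrate the computation of E_i(n), here is a small sketch using SciPy. The two filters are toy stand-ins for the 17 Laws / oriented-edge / color masks, and the superpixel mask is arbitrary; only the formula (|I * F_n|^k summed over the superpixel, for k = 2 and k = 4) matches the text.

    import numpy as np
    from scipy.signal import convolve2d

    # Grayscale image I(x, y) and a boolean mask S_i marking the pixels of
    # superpixel i (both are made-up stand-ins for illustration).
    I = np.random.rand(64, 64)
    S_i = np.zeros_like(I, dtype=bool)
    S_i[10:30, 20:40] = True

    # Two toy filters standing in for the 17 masks F_n used in Make3D:
    # a 3x3 averaging mask and a horizontal edge mask.
    filters = [np.ones((3, 3)) / 9.0,
               np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)]

    # E_i(n) = sum over (x, y) in S_i of |(I * F_n)(x, y)|^k, k = 2 (energy), k = 4 (kurtosis).
    features = []
    for F_n in filters:
        response = convolve2d(I, F_n, mode='same')
        for k in (2, 4):
            features.append(np.sum(np.abs(response[S_i]) ** k))

    print(features)   # 2 filters x 2 powers = 4 values here; Make3D uses 17 x 2 = 34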


    The “Make3D” project takes into account not only the features of an individual superpixel: it also takes into account its “context”, in the form of the features of the 4 largest neighboring superpixels, and this operation is repeated at 3 different scales. Thus each superpixel additionally carries information about the image around it, which leads to better results than if we restricted ourselves to the local features of the superpixel alone. In the end, each superpixel is associated with a vector containing 34*(4+1)*3+14=524 elements, i.e. x ∈ R^524.
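    And a short dimension-bookkeeping sketch of how the 524-element feature vector could be assembled (the feature values and the choice of neighbors are placeholders; only the counts match the text above):

    import numpy as np

    def energy_features():
        # 17 filters x 2 powers = 34 values (placeholder random numbers).
        return np.random.rand(34)

    scales = 3          # 3 different scales
    neighbors = 4       # the 4 largest neighboring superpixels
    shape_features = np.random.rand(14)   # 14 position/shape features of the superpixel

    parts = []
    for _ in range(scales):
        parts.append(energy_features())           # the superpixel itself
        for _ in range(neighbors):
            parts.append(energy_features())       # its 4 largest neighbors at this scale

    x = np.concatenate(parts + [shape_features])
    print(x.shape)      # (524,) = 34 * (4 + 1) * 3 + 14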

    Pic.8 Features of each superpixel are considered in context

    Pic.9 Highlighting image features at various scales

    Pic.10 On the same level, each superpixel is connected to 4 max neighbors



    Boundaries and folds


    To determine the boundaries, parameters y_ij ∈ {0,1} are used, indicating the presence of a boundary (an “edgel”) between two superpixels. Since boundary detection is, as a rule, inexact, the model uses not discrete values but the softer condition y_ij ∈ [0,1]. More formally, y_ij = 0 means that there is an object boundary (occlusion boundary) or a fold between superpixels i and j, and y_ij = 1 means there is none, i.e. the superpixels lie on one flat surface.

    Pic.11 edge detection - edgels

    Pic.12 edge detection

    In some cases, a large gradient or a difference in lighting, which is what boundary-detection (edge detection) algorithms rely on, is also a sign that these parts of the image belong to different objects. For example, the shadow a building casts on a lawn can produce such a difference, and boundary-detection algorithms may erroneously detect a border and split the lawn into 2 objects. Nevertheless, better results can be obtained if, instead of relying on gradients alone, we take into account more visual cues of the image to determine whether there are boundaries between these objects and whether they are connected, coplanar or not.

    In this work an approach to object boundary detection (edge detection) was proposed that uses training and takes into account the lighting, color and texture of image regions. Using their result, in “Make3D” the value y_ij indicating the presence of a boundary between superpixels i and j is derived from a model with the following distribution:

    Distribution for confidence calculation

    • ϵ_ij: features of the pair of superpixels i and j
    • ψ: model parameters (obtained after training)
    If two adjacent superpixels in an image have sharply differing features, then a person will most likely also perceive them as two different objects. Thus, the boundary between two superpixels with sharply differing features is very likely to be an occlusion boundary or a fold. To find out how much the features of the two superpixels i and j differ, Make3D performs 14 different segmentations of the image: at 2 different scales and with 7 different sets of segmentation parameters, in order to reveal texture, color and boundary features.

    Pic.13 An example of segmentation of one image with different parameters


    Pic.14 y - 14-dimensional superpixel difference vectors for different segmentations

    Each element {ϵ_ij,k} of the 14-dimensional vector ϵ_ij obtained after segmentation is an indicator of whether both superpixels i and j fell into the same segment in the k-th segmentation. After all, two superpixels that end up in the same segment in all 14 cases are most likely coplanar or connected. Conclusions about the presence of boundaries between superpixels drawn from multiple segmentations are more reliable than those drawn from a single segmentation. The vector ϵ_ij is an input parameter in determining the “soft” boundary y_ij ∈ [0,1] between superpixels i and j.
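    Below is a sketch of how the 14-dimensional vector ϵ_ij could be computed from 14 segmentation label maps, and how a “soft” boundary value y_ij could then be derived from it. The logistic form in the last lines and the weights psi are illustrative assumptions, not the trained Make3D model; only the construction of ϵ_ij follows the text.

    import numpy as np

    def epsilon_ij(label_maps, mask_i, mask_j):
        # For each of the 14 segmentations, check whether superpixels i and j
        # fall (predominantly) into the same segment: 1.0 if yes, 0.0 otherwise.
        eps = []
        for labels in label_maps:
            seg_i = np.bincount(labels[mask_i]).argmax()   # dominant segment under i
            seg_j = np.bincount(labels[mask_j]).argmax()   # dominant segment under j
            eps.append(1.0 if seg_i == seg_j else 0.0)
        return np.array(eps)                               # 14-dimensional vector

    # Made-up data: 14 segmentations of a 64x64 image and two superpixel masks.
    label_maps = [np.random.randint(0, 20, (64, 64)) for _ in range(14)]
    mask_i = np.zeros((64, 64), dtype=bool)
    mask_i[10:20, 10:20] = True
    mask_j = np.zeros((64, 64), dtype=bool)
    mask_j[10:20, 21:30] = True

    eps = epsilon_ij(label_maps, mask_i, mask_j)

    # Soft boundary indicator y_ij in [0, 1]: an illustrative logistic model with
    # placeholder weights psi (not the actual trained parameters). A value close
    # to 1 means no boundary (superpixels connected), matching the convention above.
    psi = np.full(15, 0.5)
    y_ij = 1.0 / (1.0 + np.exp(-(psi[0] + psi[1:] @ eps)))
    print(eps, y_ij)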

    Pic.15 Parameter y (right) indicating the presence or absence of a border




    MRF Input and Output


    Now we can summarize the intermediate results:
    Five types of parameters participate in the MRF model. The inputs are:
    • X: variables related to the computed image features
    • θ: empirically calculated variables (parameters of the trained model)
    The calculation also takes into account the auxiliary (conditioning) parameters:
    • υ: parameters representing the “level of confidence” in a distance to an object estimated from the local properties of an image region alone
    • y: parameters indicating the presence or absence of a clear boundary between objects in the image; they are used to identify coplanarity and connections between superpixels
    The parameters computed by the MRF are:
    • α: parameters specifying the location and orientation in space of the planes on which the polygons of the three-dimensional model lie
    Pic.16 Model MRF combines all the features of the image to find 3D




    What will MRF do? Target: min (distance error)


    When constructing a three-dimensional model of the scene, the relative error in determining the distance matters most. For the true distance (ground-truth depth) d and the model's estimate d̂, the relative error is defined as (d̂ − d)/d = d̂/d − 1. The MRF model will therefore be constructed so as to minimize this value.
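    For example, with a true depth of d = 10 m and a model estimate of d̂ = 12 m, the relative error is:

    \frac{\hat d - d}{d} \;=\; \frac{12 - 10}{10} \;=\; 0.2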

    Continuation: here
