
Make3D from one photo, part 1
The project from Stanford University (now Cornell University ) " Make3D ", is notable for the fact that he set himself the task, which has not yet become typical, of restoring a three-dimensional model of a scene from just one photograph. Until now, in order to achieve a similar result, developers have restored three-dimensional information by combining several (two or more) pictures of the same object. In this case, it was demonstrated that a significant amount of information is contained in monocular signs ( monocular cues ) of the image itself, which until then were often ignored. In practical implementation, it has already been possible to achieve satisfactory results by more than 60%arbitrary photographs provided and evaluated by external users of the system during its testing.
The publication consists of: Part 1, Part 2
Published to quench curiosity, in order to
Content:
Part one ( current ):
- Algorithm Summary
- List of used theories and algorithms
- Ising model or where did this formula come from in MRF
- Effective graph image segmentation
- We want 3D Model and Polygon Location
- You’ll have to look wider ...
- Monocular image features
- Borders and refractions
- MRF Input and Output
- What will MRF do? Target: min (distance error)
- The wish list is ready, cram it into the MRF Model
- 1. Local features: Preliminary determination of distance by local features
- 2. Connection: Connection
- 3. Co-planar: Coplanarity
- 4. Co-linear: Collinearity
- Briefly about MRF Training
- Briefly about MRF Solution
- Source code and model parameters studied
- Program execution, testing
- Total
- Some files
Algorithm Summary
The image is divided into superpixels , small segments of the image that are similar in texture. Homogeneous objects and even even surfaces, as a rule, are divided into several superpixels, but, nevertheless, superpixels lie entirely within the same object or surface if the surface has a pronounced border.
The algorithm for each superpixel tries to find out its position (depth) and orientation in three-dimensional space relative to the point from which the photograph was taken, i.e. find the surface (plane) on which it lies. This surface can have any location in space, not necessarily horizontal or vertical, as was assumed in similar studies conducted earlier.
The project uses preliminary training in the relationships of the totality of the set of features of superpixels (image sections) and their distances. The training algorithm uses the MRF model ( Markov field ), taking into account restrictions on the relative distances between adjacent superpixels, i.e. given that with some probability two neighboring sections are most likely at approximately the same distance from the observation point, or even they can be on the same plane than belong to different objects that are far apart in space (such as the fence and the background behind it).
By processing a segmented superpixel image in a pre-trained MRF, the algorithm receives at the output the position and orientation of each superpixel. This is enough to build a three-dimensional model of the scene, the texture of which is the photograph itself.
Pic.1 Result of processing photos in Make3D

List of used theories and algorithms
In the method of restoring a three-dimensional model of a scene, the following theories and algorithms are used:
- Image segmentation for splitting an image into superpixels
- The use of filters to the image to identify monocular signs
- Random Markov Fields ( MRF ) for modeling monocular features and the relative position of different parts of the image
- Selecting edges and borders in an image ( edge detection )
- Elements of computational geometry , linear programming , etc.
Ising model or where did this formula come from in MRF
You can scroll quickly, this is just an explanation of why the formula looks like this in MRF and in “ Make3D ”. In general, historically, it happened ...
The Ising model ( Ising model ) served as the prototype of Markovskaya , so let's start with it. This is a mathematical model of statistical physics designed to describe the magnetization of material.
In the Ising physical model, each vertex of the crystal lattice is associated with a number called a spin and a numerical one designated as “ +1 ” or “ −1 ”, which determines the direction of the spin: “ up ” or “ down ”. Spin - this is the proper angular momentum of elementary particles, which has a quantum nature and is not related to the movement of a particle as a whole.
Pic.2 One-dimensional Ising model

Let us consider the simplest one-dimensional case when the nodes of the crystal lattice are located along one straight line. The model describes the probability distribution in a space






Each of the


Total energy of the system in the Ising model:

In the formula, the first sum is taken over all pairs of neighboring nodes, and is equal to the interaction energy of the spins of the nodes. Ising simplified the model, taking into account the interaction only between neighboring nodes located in close proximity. Exchange interaction constant J is a characteristic of the material and describes the spin energy of interaction. If J> 0 , then it describes the attractive energy of the ferromagnet ( attractive case ), when the neighboring nodes of the lattice try to line up in one direction, i.e. collinearly , for values of J <0 it describes the repulsive energy of an antiferromagnet ( repulsive case), when neighboring nodes of the lattice try to line up in opposite directions, i.e. coplanarly . The second sum describes the effect of an external magnetic field of intensity H , where m - characterizes the magnetic properties of the substance.
In the case when J> 0 , then the first sum will be minimal, in the case when all nodes of the crystal lattice will be aligned in a collinear direction. The second sum reaches a minimum when the direction of the spin lattice nodes coincides with the direction of the external field H .
At physics, we know that any system, brought out of equilibrium, tends to the state with the least energy reserve. Therefore, sooner or later, the model will “calm down” and take a position in which the total energy


The probability distribution in space of

Distribution in the Ising model:

where k is the Boltzmann constant, T is the temperature of the substance, the

We do not need these quantities, it is important that the formula has this form and has been working since the beginning of the 19th century .
For a more convenient form of the probability distribution in the model recording associate with each i -th node of its lattice ( personal ) own the total energy:
Own energy of a node of the Ising model:

The Gibbs measure function is interesting from two perspectives. The first is related to entropy, which probably led to its widespread use in statistical studies.
Pic.3 Two-dimensional Ising model

The second important property of the distribution in this form, from the point of view of probability theory, is as follows. Let

Markov type property:

This property is called the Markov type property. The probability that the spin at the point


The Markov network was presented as a generalization of the Ising model, and in it the probability distribution function

Model in multiplicative form:

In practice, the main purpose of applying a Markov random field is to detect the most likely


The main difficulties associated with the use of this method are the right choice of parameters that control the strength of the spatial interaction.
Now we have a formula, on its basis the MRF model for “Make3D” will be built ...
Effective graph image segmentation
A picture is worth a thousand words:
Segmentation example with various parameters

These large monochrome areas are superpixels, in the terminology of the authors of “Make3D”. In more detail in an accessible form, see the article " Effective segmentation of images on graphs ."
We want 3D Model and Polygon Location
We want to get a 3D model . The reconstructed three-dimensional model is represented by a combination of polygons and texture , i.e. parts of the original photograph superimposed on these polygons. The polygon has an arbitrary location and orientation in space.
Pic.4 Super Pixel Location and Orientation

More formally, the three-dimensional location of the polygon will be determined by the plane specified by the parameters




If we are given a unit vector



You’ll have to look wider ...
A person is well distinguished by monocular signs of the image, helping him to restore a three-dimensional model of the scene and determine the distance to objects. At one glance, he is able to notice differences in image textures, color gradients, color differences, fuzzy outlines and out-of-focus objects in the distance.
However, only local monocular signs are still not enough, because the blue sky and the blue object on the earth have similar local monocular signs. Therefore, when reconstructing a scene, a person has to take into account not only local signs, but also the relative relationship between the various parts of this image. So, a person can determine the distance to an object with dimly expressed signs, for example, a uniformly illuminated large gray area, relative to other objects in the photograph. And determine that it is a wall and even approximately see its length.
Therefore, in the model of MRF takes into account not only local signs of monocular ( monocular cue sheet ), but also the relative relationship between different parts of the image:
- Локальные особенности (local features): локальные особенности суперпикселя, монокулярные, во многом влияют на его положение и ориентацию в пространстве
- Соединения (connections): за исключением границ различных объектов, наиболее вероятно, что соседние суперпиксели соединены на границах и в трехмерном пространстве
- Копланарность (co-planarity): если соседние пиксели обладают схожими монокулярными особенностями (feature), и не имеют ярко выраженных границ (edge) между ними, то наиболее вероятно, что они лежат в одной и той же плоскости
- Collinearity ( co-linearity ): long straight line, most likely, it will also be long straight lines in space trehmernomi
But before moving on to the MRF model ... what is so interesting that the authors found in just one photograph?
Image Features
Superpixels For each tuple in the algorithm computed characteristics ( features ), in order to detect signs of monocular ( monocular cues ), mentioned above, as well as the calculated characteristics to determine the boundaries and refractive superpixels (objects in the image). In order to make the algorithm more robust and flexible, so that he could restore the scene in an arbitrary photograph, not necessarily similar to those on which they were trained model, calculated with respect to a large number of features ( features ).
Monocular image features
For each superpixel i , statistical features related to the image texture, shape and position are calculated. The project “Make3D” uses 17 filters

- 9 masks Laws for calculating an average image intensity ( averaging ), detecting spots ( spot detection ) and borders ( edge detection )
- 6 masks for identifying oriented edges ( oriented edge detection ), rotated by ( 30 ° × k ), k = 0, ..., 5 degrees
- 2 color channels in Cb and Cr in YCbCr format
Pic.5 Masks used in Make3D

Pic.6 Two channels of color Cb and Cr

The application of filters to superpixels i image I (x, y) is formed by a 34-dimensional vector features calculated as

Pic.7 Super pixel layout features

В проекте "Make3D" учитываются не только особенности отдельного суперпикселя, а так же принимается в расчет «контекст» в виде, особенностей (features) 4-х самых больших соседних суперпикселей, и данная операция повторяется в 3-х различных масштабах. Таким образом, каждый суперпиксель содержит дополнительно информацию и об изображении вокруг него, что приводит к более качественным результатам, чем если ограничиться только локальными особенностями суперпикселя. В конечном итоге с каждым суперпикселем ассоциируется вектор, который содержит 34*(4+1)*3+14=524 элемента, т.е.

Pic.8 Особенности каждого суперпикселя рассматриваются в контексте

Pic.9 Выделение особенностей изображения на различных масштабах

Pic.10 On the same level, each superpixel is connected to 4 max neighbors

Borders and refractions
To determine the boundaries, parameters

![y_ij∈ [0,1] y_ij∈ [0,1]](http://img267.imageshack.us/img267/6086/form31ysoftbelong.png)


Pic.11 edge detection - edgels

Pic.12 edge detection

In some cases, the presence of a large gradient or difference in lighting, which are taken into account in the algorithms for determining the boundaries of objects ( edge detection ), is also a sign that these parts of the image belong to different objects. For example, the shadow of a building cast on the lawn may cause such a difference, and algorithms for determining the boundaries of objects can erroneously determine the presence of a border and divide the lawn into 2 objects. Nevertheless, if we take into account more visual features ( visual cues ) of the image to determine whether there are borders between these objects, whether they are connected, coplanar or not, and not just gradients, then better results can be obtained.
In this work, an approach was proposed for determining the boundary of objects ( edge detection ), using training that takes into account lighting, color and texture of image areas. Using their result, in “Make3D” a value


features ( features ) superpixels pairs i and j
model parameter (after training)
Pic.13 Пример сегментации одного изображения с различными параметрами

Pic.14 y — 14-мерные вектора различия суперпикселей при различных сегментациях

In the 14-dimensional vector obtained after segmentation,



![y_ij∈ [0,1] y_ij∈ [0,1]](http://img267.imageshack.us/img267/6086/form31ysoftbelong.png)
Pic.15 Parameter y (right) indicating the presence or absence of a border

MRF Input and Output
Now you can take the interim results:
The model MRF participates 5 types of settings. Input are:
- X variables related to computed image features
- Empirically calculated variables θ (parameters of the trained model)
- Parameters υ represents the “level of trust” ( confidence ) to the distance to the object, calculated based only on the local properties of the image area
- The y parameters indicate the presence or absence of a clear border between the objects in the image. These parameters are used to identify coplanarity and superpixel connections.
- Parameters α , specifying the location and orientation of the planes in space, on which the polygons are located in the three-dimensional model
Pic.16 Model MRF combines all the features of the image to find 3D

What will MRF do? Target: min (distance error)
When constructing a three-dimensional model of the scene, the relative error in determining the distance is the most significant. For true distance ( ground-truth depth )



Continuation: here