# Play Tetris in AR

- Tutorial

A slightly strange idea occurred to me: a building could make a good platform for playing Tetris. Not far from me there happened to be a building suitable for this. The result can be seen in the video:


The project is implemented at a fairly low level, without using any ready-made solution.

Source Code

Most often, one of two approaches to implementing augmented reality is used:

- markerless: the camera position is estimated from the motion of key points in its video stream;
- marker-based: an image serves as a marker, relative to which the camera position is computed.

These implementations do not require special preparation or special conditions.

There is a third option: recognize a specific physical object and use it as the marker. This requires the object to actually be present, but in return lets us visually track it. One such recognition method is detecting an object by its edges. It has a limitation: the marker object must have clearly defined edges, which usually means a solid object.

Alternatively, the edges can be clearly outlined some other way, for example by the lighting of this building:

The lighting is easy to separate out in the image and use for detection.

## Implementation

The project is built on Qt. This framework runs on different platforms while staying in C++, and since performance matters here, C++ is the obvious choice.

Qt did not work all that well with Android (slow launch, broken debugging), but this was offset by the ability to debug the algorithm on the desktop.

Three-dimensional graphics are rendered with the raw OpenGL support built into Qt.

The camera is also accessed through Qt. For debugging, a video was recorded, and it was convenient to replace the camera's video stream with a stream from a file.

Output goes through QML. Getting QML and OpenGL to cooperate was not painless, but we will not dwell on that here.

For image processing, the OpenCV library is used.

#### Tracking Algorithm

Now let's move on to the most interesting part - the algorithm for tracking an object along its edges.

We start by extracting the edges in the image. All edges in our case are straight lines, so the first idea that comes to mind is a line detector, e.g. the Hough transform. However, that path does not seem right to me: the Hough transform is fairly expensive, and such a detector is not very reliable (subjectively; it may depend on the task).

Instead, let's take a different, more general route and work directly on a binary image that encodes the presence of edges: a pixel value of zero means there is an edge at that location, and a value greater than zero means there is none. Such an image can be produced with the Canny edge detector or with a simple threshold transform; both are available in OpenCV.

OpenCV has another function useful to us here, distanceTransform, which takes a binary image as input and outputs an image in which each pixel encodes the distance to the nearest zero pixel.

Now suppose we already have a good first approximation of where our model is located. We then define an error function describing how far the edges of our approximation are from the edges found in the image; the distanceTransform output lets us do exactly that. Then we run a function optimization algorithm, varying only our estimate of the object's position in space. In the end, the estimate should describe the object's real position fairly accurately.

As a result, the algorithm can be divided into two stages:

- Image preprocessing: binarization, filtering, and applying distanceTransform.
- Tracking: optimizing the error function.

#### Image preprocessing

At this stage we need to extract the edges in the image. The Canny detector could be used, but in our case a plain threshold transform or its adaptive version works better (in OpenCV, the threshold and adaptiveThreshold functions). Such an image will inevitably contain noise, so filtering is needed. We do it as follows: extract contours with OpenCV's findContours and drop segments that are too small or not line-like enough.

The processing result can be seen in the image:

Sequentially: the original image -> after the threshold transformation -> after filtering.

This image already tells us quite clearly where a valid edge is and where it is not. We then apply the distanceTransform function, obtaining for each point its distance to the nearest edge. We will refer to the resulting image as distanceImage.

This is how it looks if normalized and visualized:

Next, we need some mathematical tools.

##### Function Optimization Algorithm

Function optimization is the task of finding the minimum of a function.

If we are dealing with a linear system of equations, finding the minimum is simple. Writing the system in matrix form $Ax = b$, the solution is $x = A^{-1}b$. For an overdetermined system, the method of least squares applies: $x = (A^T A)^{-1} A^T b$.

If our function is non-linear, the task becomes more complicated. To find the minimum, you can use the Gauss-Newton algorithm, which works as follows:

- It is assumed that we already have some initial approximation of the solution $x_0$, which we will iteratively refine.
- Using a Taylor expansion, we approximate our nonlinear function with a linear one at the current approximation point. We solve the resulting linear system of equations by least squares, obtaining $x_{k+1}$. The result will not be the exact solution, but it will be closer than the current approximation.
- We replace the current approximation with the obtained value and return to step 2, repeating until the difference between $x_{k+1}$ and $x_k$ becomes smaller than some threshold.

Let's analyze the algorithm in more detail.

Let $f(x)$ be the function we are working with and $y$ a known vector of its target values. If $x^*$ is an ideal solution of the equation $f(x) = y$, then $f(x^*) = y$ holds exactly. But we have only an approximation $x$. Denote the error vector of this approximation as $e(x) = y - f(x)$, and the overall error as the function $E(x) = \|e(x)\|^2 = \sum_i e_i(x)^2$. Finding the $x$ at which $E$ reaches its minimum gives us a better approximation of the solution.

Starting from the approximation $x_0$, we will iteratively refine it, obtaining $x_{k+1}$ at each iteration. To do this, at every iteration we compute the Jacobi matrix $J$ of the function $f$ at the current approximation, consisting of its partial derivatives: $J_{ij} = \partial f_i / \partial x_j$.

And the next approximation is given by: $x_{k+1} = x_k + (J^T J)^{-1} J^T e(x_k)$.

Often problems are structured so that we have a large number of data points independent of each other (depending only on the parameter values $x$). As a result, the overall Jacobi matrix is very sparse, and there is a way to optimize the computation.

Suppose the overall function is computed over a set of points. From the *j*-th point we get a residual $e_j$. Rather than computing one Jacobi matrix for all functions at once, we compute a matrix $J_j$ for each point separately. The next approximation is then given by: $x_{k+1} = x_k + \left(\sum_j J_j^T J_j\right)^{-1} \sum_j J_j^T e_j$. In addition, this form allows the computation to be parallelized.

It may happen that the next value $x_{k+1}$ yields a larger error than $x_k$. To address this, a modification of the algorithm can be used: the Levenberg-Marquardt algorithm. We add a damping term to our formula, $\left(\sum_j J_j^T J_j + \lambda I\right)^{-1}$, where $I$ is the identity matrix. The value $\lambda$ is selected as follows:

- initially, $\lambda$ has some rather small value (small enough that the algorithm converges);
- then, if the error for $x_{k+1}$ is greater than for $x_k$, we increase $\lambda$ and recompute $x_{k+1}$.

The more non-linear the function $f$, the larger $\lambda$ should be. However, the larger $\lambda$, the slower the algorithm converges.

We stop the algorithm when $x_{k+1}$ differs from $x_k$ by a sufficiently small amount and take $x_{k+1}$ as the solution.

The algorithm is quite universal and can be used for a variety of tasks.

##### Mathematical Tracking Model

Since we are dealing with coordinates in space, we need to be able to manipulate them. Suppose we have some set of points, and we need to rotate them around the origin. Probably the simplest way is to use a rotation matrix *R* describing the required rotation: $p' = Rp$. If we also need to move the point, we simply add the corresponding vector *t*: $p' = Rp + t$.

Thus, you can arbitrarily change the position of an object in space. It turns out that the object's pose is determined by the three-dimensional matrix *R* and the three-dimensional vector *t*, i.e. 12 parameters. Moreover, these parameters are not independent of each other: the components of a rotation matrix are bound by certain constraints. From the point of view of optimization, then, these parameters are not the best choice; there are more parameters than degrees of freedom. There is another representation of rotation: Rodrigues' rotation formula. Here a rotation is specified by three parameters forming a three-dimensional vector *r*: the normalized vector is the rotation axis, and the length of the vector is the rotation angle around that axis.

We define the rotation of a vector *v* by the parameters *r* of the Rodrigues formula as $\mathrm{rot}(v, r)$. Writing $\theta = \|r\|$ and $k = r/\theta$, we get the following formula: $\mathrm{rot}(v, r) = v\cos\theta + (k \times v)\sin\theta + k\,(k \cdot v)(1 - \cos\theta)$.

In the end, we can set the coordinates of the object with a 6-dimensional vector $x = (r, t)$, transforming each point by the formula $p' = \mathrm{rot}(p, r) + t$.

##### Pinhole camera model

Now we describe a simple mathematical model of the camera used in the project:

$$ u = f_x \frac{x}{z} + c_x, \qquad v = f_y \frac{y}{z} + c_y, $$

where $f_x, f_y$ are the focal lengths in pixels and $(c_x, c_y)$ is the optical center, also in pixels. These individual camera parameters are called the intrinsic parameters of the camera. Typically they are known in advance; in this project they were picked by eye. The model does not take camera lens distortion into account; we assume there is none.

With this model we get a central projection: the farther points are from the camera, the more their projections tend toward the optical center. This is what produces the familiar effect of converging railway tracks:

In space, the camera is aligned with the *z* axis, and the image plane is parallel to the *xy* plane. We complement the model with the ability to move in space: a world point $p$ is first transformed into camera space, $(x, y, z)^T = \mathrm{rot}(p, r) + t$, and then projected by the formula above.

Thus, we obtain a model that lets us simulate, in algebraic form, the projection of points from the outside world onto the camera image (from world coordinates to screen coordinates). The parameters of the camera's position in space, $r$ and $t$, remain unknown in this model; they are called the extrinsic parameters of the camera.

##### Tracking

Tracking is implemented without OpenCV tools. First we need the error function for our approximate solution, described above. Let us write out its computation in stages:

- We select those edges of the tracking model that are visible given the parameters of the current approximation.
- We turn the selected set of edges into a fixed set of points to simplify the computation. One can, for example, take a fixed number of points from each edge, or (a better option) choose the count so that there is a fixed distance in pixels between the points. We will call these control points (in the project: controlPoints, and controlPixelDistance for that fixed pixel distance).
- We project the control points onto the image. Thanks to *distanceImage*, we can get the distance from the projection of a control point to the nearest edge in the image. Ideally, all control points should lie exactly on the image edges, i.e. the distance to an edge should be zero. From this we get the error for a specific control point: $e_j = \mathrm{distanceImage}(P(x, p_j))$, where $P$ projects the point $p_j$ under the pose parameters $x$.
- We obtain the overall error function: $E = \sum_j e_j^2$.

Now we have to find the minimum of *E*. To do this, we use the Levenberg-Marquardt algorithm described above. As we know, the algorithm requires computing the Jacobi matrix, i.e. the derivatives of the function. One can find the derivatives numerically, or use a ready-made implementation of the algorithm. In this project, however, everything was written by hand, so I will give the full derivation.

For each control point we obtain an equation independent of the other points. As described above, in this case we can treat these equations independently, computing the Jacobi matrix for each one separately. Let us go through it in order, using the chain rule for differentiating a composite function:

Denote the projected control point as $s_j = P(x, p_j)$; then $e_j = \mathrm{distanceImage}(s_j)$. From here: $\frac{\partial e_j}{\partial x} = \nabla \mathrm{distanceImage}(s_j) \cdot \frac{\partial s_j}{\partial x}$. Next, we split the pose parameters into the rotation part $r$ and the translation part $t$, leaving us to compute $\frac{\partial s_j}{\partial r}$ and $\frac{\partial s_j}{\partial t}$.

The derivatives of the image *distanceImage* are computed numerically. To compute $\frac{\partial s_j}{\partial r}$ we need the derivative of the Rodrigues rotation formula. I found the Jacobian of this formula in the publication "A compact formula for the derivative of a 3D rotation in exponential coordinates" by Guillermo Gallego and Anthony Yezzi:

$$ \frac{\partial (R p)}{\partial r_i} = \frac{r_i [r]_\times + \left[\, r \times (I - R)\,e_i \,\right]_\times}{\|r\|^2} \, R\, p, $$

wherein *R* is the rotation matrix obtained from the Rodrigues rotation vector $r$; $p$ is the point being rotated; *I* is the identity matrix; $[\cdot]_\times$ is the skew-symmetric cross-product matrix; and $e_i$ is the $i$-th standard basis vector. As we can see, the formula divides by the length of the rotation vector, so it breaks down when the vector is zero. This is likely because the rotation axis is undefined for the zero vector. If the rotation vector is very close to zero, we instead use the limit formula $\frac{\partial (R p)}{\partial r_i} = [e_i]_\times\, p$.

It remains to write out $\frac{\partial s}{\partial r}$ and $\frac{\partial s}{\partial t}$ (here the index *j* is omitted). Thus, we obtain the Jacobi matrix for the point we need and can use it in the optimization algorithm described above.

This algorithm has several problems. First, accuracy: the global camera position jitters slightly from frame to frame. This can be mitigated. We have the a priori knowledge that the camera position cannot change drastically between frames, so we can reduce the jitter by adding extra equations to the error function.

It should be remembered that the displacement vector *t* is not, in our case, the camera's global position. The global position is the world point with zero local coordinates, so it can be expressed as $g = -R^T t$.

We remember the previous frame's position in *prevGlobalPosition*. The previous position, transformed into the current local coordinates, should now be close to zero, i.e. the length of the vector $d = \mathrm{rot}(\mathrm{prevGlobalPosition}, r) + t$ should be small. So, besides the other residuals, the vector *d* must also be minimized. To control the influence of this modification, we introduce a weight $k$ and multiply *d* by it: $d' = k \cdot d$. I.e. in the optimization algorithm we additionally minimize the vector *d'*. Of course, this requires deriving a Jacobi matrix for it too, in the same way as we derived one above for the general error function.

The second problem with the algorithm is that it can get stuck in local minima. In other works this problem is solved with a particle filter; in our case, doing without one turned out to be, in principle, sufficient.

#### Bonuses from tracking an object

Knowing the object's position and shape, you can manipulate it visually, which I tried to demonstrate in the video. The object was distorted using OpenGL shaders: with the help of our model, I projected a point of the object onto the image and thus obtained the color of that point. That point can then be moved, producing interesting effects such as morphing. One must remember, however, that when a point is shifted, something has to remain in its place, or the inconsistency becomes noticeable. In addition, depending on the tracking quality and the object's shape, accumulated errors will produce various undesirable artifacts, which simply have to be taken into account. In the video above, I wanted to show that augmented reality can be used a bit more broadly than usual.

By the way, the Vuforia SDK implements tracking an object by its shape, though I do not think this project could have been built with it, since it does not allow using strictly defined edges and could not be tied to the lighting of the building.