# Least squares methods without tears and pain

- Tutorial

So, another article from the series "Math on the fingers". Today we will continue talking about least squares methods, but this time from a programmer's point of view. This article stands somewhat apart from the rest of the series, since it requires essentially no mathematical background: all it asks of the reader is the ability to turn on a computer and write five lines of code. I will certainly not stop at this article, and will publish a sequel in the near future; if I find enough time, I will turn this material into a book. The target audience is programmers, so Habr is a suitable place for a test run. In general, I don't like writing formulas, and I really like learning from examples; it seems to me that this is very important: not just to stare at the squiggles on the blackboard, but to try everything hands-on.


So, let's begin. Imagine that I have a triangulated surface with a scan of my face (in the picture on the left). What do I need to do to amplify its characteristic features, turning this surface into a grotesque mask?

In this particular case, I solve an elliptic partial differential equation named after Siméon Denis Poisson. Fellow programmers, let's play a game: guess how many lines of C++ code it takes to solve it. No third-party libraries allowed; we have only a bare compiler at our disposal. The answer is under the cut.

In fact, twenty lines of code suffice for the solver. If you count absolutely everything, including the parser for the 3D model file, it fits easily within two hundred lines.

# Example 1: data smoothing

Let me tell you how it works. Let's start from afar: imagine that we have an ordinary array f of, say, 32 elements, initialized as follows:

Then we will perform the following procedure a thousand times: for each cell f[i], we write into it the average value of its neighboring cells: f[i] = (f[i-1] + f[i+1])/2. To make it clearer, here is the full code:

```
import matplotlib.pyplot as plt
f = [0.] * 32
f[0] = f[-1] = 1.
f[18] = 2.
plt.plot(f, drawstyle='steps-mid')
for iter in range(1000):
    f[0] = f[1]
    for i in range(1, len(f)-1):
        f[i] = (f[i-1]+f[i+1])/2.
    f[-1] = f[-2]
plt.plot(f, drawstyle='steps-mid')
plt.show()
```

Each iteration smooths the data in our array, and after a thousand iterations we get a constant value in all cells. Here is an animation of the first one hundred and fifty iterations:

If it is not clear to you why the smoothing occurs, stop right now, grab a pen, and trace through the examples on paper; otherwise there is no point in reading further. A triangulated surface is not fundamentally different from this example. Imagine that for each vertex we find its neighbors, compute their center of mass, move the vertex to that center of mass, and repeat ten times. The result will look like this:

Of course, if we do not stop at ten iterations, then after a while the entire surface will shrink to a single point, exactly as in the previous example, where the entire array ended up filled with the same value.
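To make the mesh analogy concrete, here is a minimal Python sketch of the same smoothing on a toy tetrahedron with a stray spike; the verts and vvadj arrays here are hypothetical stand-ins for the real mesh data described below, not the article's C++ code:

```python
# Laplacian smoothing on a toy mesh, mirroring the 1D averaging example.
# vvadj[i] lists the indices of the vertices adjacent to vertex i.
verts = [(0., 0., 0.), (1., 0., 0.), (0., 1., 0.), (0., 0., 3.)]  # vertex 3 sticks out
vvadj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]              # tetrahedron connectivity

for it in range(10):  # ten smoothing passes, as in the text
    new_verts = []
    for i, nbrs in enumerate(vvadj):
        # center of mass of the neighbors of vertex i
        bx = sum(verts[j][0] for j in nbrs) / len(nbrs)
        by = sum(verts[j][1] for j in nbrs) / len(nbrs)
        bz = sum(verts[j][2] for j in nbrs) / len(nbrs)
        new_verts.append((bx, by, bz))
    verts = new_verts

print(verts)  # all vertices have collapsed toward a single point
```

With no fixed vertices, every pass contracts the mesh toward its centroid, which is exactly the shrink-to-a-point behavior described above.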

# Example 2: Amplification / Attenuation

The full code is available on GitHub; here I will give only the most important part, omitting the reading and writing of 3D models. So, the triangulated model is represented by two arrays: verts and faces. The verts array is simply a set of three-dimensional points, which are the vertices of the polygonal mesh. The faces array is a set of triangles (the number of triangles equals faces.size()); for each triangle it stores indices into the vertex array. I described the data format and how to work with it in detail in my course of lectures on computer graphics. There is also a third array, which I recompute from the first two (more precisely, only from the faces array): vvadj. For each vertex (the first index of this two-dimensional array), it stores the indices of its neighboring vertices (the second index).

```
std::vector<Vec3f> verts;
std::vector<std::vector<int> > faces;
std::vector<std::vector<int> > vvadj;
```

The first thing I do is compute a curvature vector for each vertex of my surface. Let's illustrate: for the current vertex v, I iterate over all its neighbors n1-n4; then I compute their center of mass b = (n1 + n2 + n3 + n4)/4. The resulting curvature vector can then be computed as c = v - b; it is nothing other than the ordinary finite-difference approximation of the second derivative.

Directly in the code, it looks like this:

```
std::vector<Vec3f> curvature(verts.size(), Vec3f(0,0,0));
for (int i=0; i<(int)verts.size(); i++) {
    for (int j=0; j<(int)vvadj[i].size(); j++) {
        curvature[i] = curvature[i] - verts[vvadj[i][j]];
    }
    curvature[i] = verts[i] + curvature[i] / (float)vvadj[i].size();
}
```

Then we repeat the following step many times (see the previous picture): we move v: v = b + const * c. Note that if the constant equals one, our vertex will not move anywhere! If the constant is zero, the vertex is replaced by the center of mass of its neighboring vertices, which smooths our surface. If the constant is greater than one (the title picture was made with const = 2.1), the vertex moves in the direction of the surface curvature vector, amplifying the details. This is what the code looks like:

```
for (int it=0; it<100; it++) {
    for (int i=0; i<(int)verts.size(); i++) {
        Vec3f bary(0,0,0);
        for (int j=0; j<(int)vvadj[i].size(); j++) {
            bary = bary + verts[vvadj[i][j]];
        }
        bary = bary / (float)vvadj[i].size();
        verts[i] = bary + curvature[i]*2.1; // play with the coeff here
    }
}
```

By the way, if the constant is less than one, the details will be attenuated (here const = 0.5), but this is not equivalent to smoothing: the "picture contrast" remains:

Note that my code generates a 3D model file in the Wavefront .obj format; I rendered it with a third-party program. You can view the resulting model, for example, in an online viewer. If you are interested in rendering methods rather than model generation, read my course of lectures on computer graphics.

# Example 3: add restrictions

Let's go back to the very first example and do exactly the same thing, except that we will not overwrite the elements of the array with indices 0, 18, and 31:

```
import matplotlib.pyplot as plt
x = [0.] * 32
x[0] = x[31] = 1.
x[18] = 2.
plt.plot(x, drawstyle='steps-mid')
for iter in range(1000):
    for i in range(len(x)):
        if i in [0,18,31]: continue
        x[i] = (x[i-1]+x[i+1])/2.
plt.plot(x, drawstyle='steps-mid')
plt.show()
```

I initialized the remaining "free" elements of the array with zeros, and as before I iteratively replace them with the average value of their neighboring elements. This is how the array evolves over the first one hundred and fifty iterations:

It is quite obvious that this time the solution converges not to a constant value filling the array, but to two linear ramps. By the way, is it really obvious to everyone? If not, experiment with this code; I deliberately give examples with very short code so that you can thoroughly understand what is happening.

# Lyrical digression: numerical solution of systems of linear equations

Suppose we are given an ordinary system of linear equations Ax = b, that is, a set of n equations of the form a_i1 x_1 + ... + a_in x_n = b_i.

It can be rewritten by isolating x_i on one side of each equation:

x_i = (b_i - Σ_{j≠i} a_ij x_j) / a_ii

Now take an arbitrary vector x^0 approximating the solution (for example, the zero vector).

Plugging it into the right-hand side of our system, we obtain an updated approximation x^1. To make it clearer, x^1 is obtained from x^0 as follows:

x_i^1 = (b_i - Σ_{j≠i} a_ij x_j^0) / a_ii

Repeating the process k times, the solution is approximated by the vector x^k.

Just in case, let's write out the recurrence:

x_i^{k+1} = (b_i - Σ_{j≠i} a_ij x_j^k) / a_ii

Under certain assumptions about the coefficients of the system (for example, it is fairly obvious that the diagonal elements must not be zero, since we divide by them), this procedure converges to the true solution. This exercise is known in the literature as the Jacobi method. Of course, there are other methods for the numerical solution of systems of linear equations, and much more powerful ones, for example the conjugate gradient method, but the Jacobi method is perhaps one of the simplest.
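As a sanity check of the description above, here is a minimal Python sketch of the Jacobi method; the toy system and the helper name jacobi are my own illustration, not part of the article's code:

```python
# A minimal Jacobi solver for Ax = b (assumes a nonzero diagonal).
def jacobi(A, b, iterations=1000):
    n = len(b)
    x = [0.] * n                      # initial guess: the zero vector
    for _ in range(iterations):
        x_new = [0.] * n
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i][i]
        x = x_new                     # all components are updated simultaneously
    return x

# A tiny diagonally dominant system: 4x + y = 9, x + 3y = 5
A = [[4., 1.], [1., 3.]]
b = [9., 5.]
print(jacobi(A, b))  # ≈ [2.0, 1.0]
```

Diagonal dominance guarantees convergence here; for a general matrix the iteration may diverge, which is one reason stronger methods exist.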

# Example 3 again, but from a different angle

And now let's take another close look at the main loop from Example 3:

```
for iter in range(1000):
    for i in range(len(x)):
        if i in [0,18,31]: continue
        x[i] = (x[i-1]+x[i+1])/2.
```

I start from some initial vector x and update it a thousand times, and the update procedure is suspiciously similar to the Jacobi method! Let's write out this system of equations explicitly:

Spend a little time to convince yourself that each iteration of my Python code is exactly one update of the Jacobi method for this system of equations. The values x[0], x[18], and x[31] are fixed, so they are not included in the set of variables and are moved to the right-hand side.

So, every equation in our system looks like -x[i-1] + 2x[i] - x[i+1] = 0. This expression is nothing other than the usual finite-difference approximation of the second derivative. That is, our system of equations simply dictates that the second derivative must be zero everywhere (well, except at the point x[18]). Remember I said it was quite obvious that the iterations should converge to linear ramps? That is exactly why: the second derivative of a linear function is zero.
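If you want to verify this claim numerically, you can run example 3 and check that the discrete second derivative -x[i-1] + 2x[i] - x[i+1] vanishes at every free cell; this little check is my own addition, not part of the original code:

```python
# Run example 3 to convergence, then verify that the second differences vanish.
x = [0.] * 32
x[0] = x[31] = 1.
x[18] = 2.
for it in range(1000):
    for i in range(len(x)):
        if i in [0, 18, 31]: continue
        x[i] = (x[i-1] + x[i+1]) / 2.

# -x[i-1] + 2*x[i] - x[i+1] should be numerically zero at every free cell
for i in range(1, 31):
    if i == 18: continue
    assert abs(-x[i-1] + 2.*x[i] - x[i+1]) < 1e-6
print("second differences vanish at all free cells")
```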

Did you know that we just solved the Dirichlet problem for the Laplace equation ?

By the way, the attentive reader should have noticed that, strictly speaking, my code solves the linear systems not with the Jacobi method, but with the Gauss-Seidel method, which is a kind of optimization of the Jacobi method: freshly updated values of the unknowns are used immediately, within the same sweep.
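To make the distinction concrete, here is a hedged Python sketch: Gauss-Seidel is the Jacobi update performed in place, so later equations in the same sweep already see the new values. The toy system is my own illustration, not the article's code:

```python
# A minimal Gauss-Seidel solver: identical to Jacobi except each component
# is overwritten in place, so no second array is needed and later rows
# immediately use the updated values.
def gauss_seidel(A, b, iterations=1000):
    n = len(b)
    x = [0.] * n
    for _ in range(iterations):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]   # in-place update
    return x

# The same tiny system: 4x + y = 9, x + 3y = 5
A = [[4., 1.], [1., 3.]]
b = [9., 5.]
print(gauss_seidel(A, b))  # ≈ [2.0, 1.0]
```

In the Python loops of examples 1 and 3, x[i-1] has already been overwritten by the time x[i] is computed, which is exactly this in-place behavior.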

# Example 4: Poisson’s equation

And let's change the third example just a little: each cell is placed not simply at the center of mass of its neighboring cells, but at the center of mass *plus some (arbitrary) constant*:

```
import matplotlib.pyplot as plt
x = [0.] * 32
x[0] = x[31] = 1.
x[18] = 2.
plt.plot(x, drawstyle='steps-mid')
for iter in range(1000):
    for i in range(len(x)):
        if i in [0,18,31]: continue
        x[i] = (x[i-1]+x[i+1] + 11./32**2)/2.
plt.plot(x, drawstyle='steps-mid')
plt.show()
```

In the previous example, we found out that replacing a cell with the center of mass of its neighbors is a discretization of the Laplace operator (in our case, the second derivative). So now we are solving a system of equations stating that our signal must have a certain constant second derivative. The second derivative is the curvature of the surface; thus a piecewise-quadratic function should be the solution of our system. Let's check with a sampling of 32 elements:

With an array of 32 elements, our system converges to a solution in a couple of hundred iterations. But what happens if we try an array of 128 elements? Things are much sadder here: the number of iterations must be counted in the thousands:

The Gauss-Seidel method is extremely simple to program, but hardly applicable to large systems of equations. You can try to speed it up by applying, for example, multigrid methods. That may sound cumbersome, but the idea is very simple: if we want a solution with a resolution of a thousand elements, we can first solve with ten elements to get a rough approximation, then double the resolution, solve again, and so on until we reach the desired resolution. In practice, it looks like this:
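Here is a hedged 1D sketch of this coarse-to-fine idea, for the simplest Dirichlet problem with fixed endpoints only (no interior constraint, to keep the interpolation step trivial); the helper names are my own:

```python
def smooth(x, sweeps):
    # Gauss-Seidel sweeps; the endpoints hold the fixed boundary values
    for _ in range(sweeps):
        for i in range(1, len(x)-1):
            x[i] = (x[i-1] + x[i+1]) / 2.

def refine(x):
    # double the resolution by linear interpolation between samples
    out = []
    for a, b in zip(x, x[1:]):
        out += [a, (a + b) / 2.]
    return out + [x[-1]]

x = [1., 0., 0., 0., 2.]   # coarse grid: 5 cells, endpoints fixed at 1 and 2
smooth(x, 50)              # cheap to solve at low resolution
while len(x) < 129:
    x = refine(x)          # upsample the coarse solution...
    smooth(x, 50)          # ...and polish it with a few more sweeps

print(len(x), x[0], x[-1])
```

A few sweeps per level suffice because each level starts from an already-good guess; solving the 129-cell problem from scratch would need far more iterations.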

But we need not show off; we can use real equation-system solvers. Here I solve the same example by constructing the matrix A and the column vector b, and then solving the matrix equation Ax = b:

```
import numpy as np
import matplotlib.pyplot as plt
n = 1000
x = [0.] * n
x[0] = x[-1] = 1.
m = n*57//100
x[m] = 2.
A = np.matrix(np.zeros((n, n)))
for i in range(1, n-2):
    A[i, i-1] = -1.
    A[i, i] = 2.
    A[i, i+1] = -1.
A = A[1:-2, 1:-2]
A[m-2, m-1] = 0
A[m-1, m-2] = 0
b = np.matrix(np.zeros((n-3, 1)))
b[0, 0] = x[0]
b[m-2, 0] = x[m]
b[m-1, 0] = x[m]
b[-1, 0] = x[-1]
for i in range(n-3):
    b[i, 0] += 11./n**2
x2 = ((np.linalg.inv(A)*b).transpose().tolist()[0])
x2.insert(0, x[0])
x2.insert(m, x[m])
x2.append(x[-1])
plt.plot(x2, drawstyle='steps-mid')
plt.show()
```

And here is the result of this program; note that the solution appeared instantly:

Thus, our function is indeed piecewise quadratic (the second derivative equals a constant). In the first example, we imposed a zero second derivative; in this last one, a nonzero one that is the same everywhere. And what was the second example? We solved the discrete Poisson equation by prescribing the curvature of the surface. Let me remind you what happened: we computed the curvature of the input surface. If we solve the Poisson problem prescribing the curvature of the output surface to equal the curvature of the input surface (const = 1), then nothing changes. Amplification of the characteristic features of the face happens when we simply increase the curvature (const = 2.1), and if const < 1, the curvature of the resulting surface decreases.
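The same amplification trick can be reproduced in one dimension, which makes the role of const easy to see. The following sketch mirrors the mesh code with a hand-made signal instead of a face (my own illustration, not the article's code):

```python
# 1D analogue of example 2: precompute the curvature c = v - b of a signal,
# then repeatedly move every sample to the neighbors' midpoint plus const * c.
f = [0., 0., 0., 1., 2., 1., 0., 0., 0.]            # a small bump
c = [f[i] - (f[i-1] + f[i+1]) / 2. for i in range(1, len(f)-1)]

const = 1.0                                          # 1.0 leaves the signal unchanged
g = f[:]
for it in range(100):
    for i in range(1, len(g)-1):
        g[i] = (g[i-1] + g[i+1]) / 2. + const * c[i-1]

print(g)
```

With const = 1.0 every sample stays put, exactly as the text claims; try const = 2.1 to sharpen the bump or const = 0.5 to flatten it.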

# Update: another toy for homework

Developing the idea proposed by SquareRootOfZero , play with this code:


```
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig, ax = plt.subplots()
x = [282, 282, 277, 274, 266, 259, 258, 249, 248, 242, 240, 238, 240, 239, 242, 242, 244, 244, 247, 247, 249, 249, 250, 251, 253, 252, 254, 253, 254, 254, 257, 258, 258, 257, 256, 253, 253, 251, 250, 250, 249, 247, 245, 242, 241, 237, 235, 232, 228, 225, 225, 224, 222, 218, 215, 211, 208, 203, 199, 193, 185, 181, 173, 163, 147, 144, 142, 134, 131, 127, 121, 113, 109, 106, 104, 99, 95, 92, 90, 87, 82, 78, 77, 76, 73, 72, 71, 65, 62, 61, 60, 57, 56, 55, 54, 53, 52, 51, 45, 42, 40, 40, 38, 40, 38, 40, 40, 43, 45, 45, 45, 43, 42, 39, 36, 35, 22, 20, 19, 19, 20, 21, 22, 27, 26, 25, 21, 19, 19, 20, 20, 22, 22, 25, 24, 26, 28, 28, 27, 25, 25, 20, 20, 19, 19, 21, 22, 23, 25, 25, 28, 29, 33, 34, 39, 40, 42, 43, 49, 50, 55, 59, 67, 72, 80, 83, 86, 88, 89, 92, 92, 92, 89, 89, 87, 84, 81, 78, 76, 73, 72, 71, 70, 67, 67]
y = [0, 76, 81, 83, 87, 93, 94, 103, 106, 112, 117, 124, 126, 127, 130, 133, 135, 137, 140, 142, 143, 145, 146, 153, 156, 159, 160, 165, 167, 169, 176, 182, 194, 199, 203, 210, 215, 217, 222, 226, 229, 236, 240, 243, 246, 250, 254, 261, 266, 271, 273, 275, 277, 280, 285, 287, 289, 292, 294, 297, 300, 301, 302, 303, 301, 301, 302, 301, 303, 302, 300, 300, 299, 298, 296, 294, 293, 293, 291, 288, 287, 284, 282, 282, 280, 279, 277, 273, 268, 267, 265, 262, 260, 257, 253, 245, 240, 238, 228, 215, 214, 211, 209, 204, 203, 202, 200, 197, 193, 191, 189, 186, 185, 184, 179, 176, 163, 158, 154, 152, 150, 147, 145, 142, 140, 139, 136, 133, 128, 127, 124, 123, 121, 117, 111, 106, 105, 101, 94, 92, 90, 85, 82, 81, 62, 55, 53, 51, 50, 48, 48, 47, 47, 48, 48, 49, 49, 51, 51, 53, 54, 54, 58, 59, 58, 56, 56, 55, 54, 50, 48, 46, 44, 41, 36, 31, 21, 16, 13, 11, 7, 5, 4, 2, 0]
n = len(x)
cx = x[:]
cy = y[:]
for i in range(0,n):
    bx = (x[(i-1+n)%n] + x[(i+1)%n])/2.
    by = (y[(i-1+n)%n] + y[(i+1)%n])/2.
    cx[i] = cx[i] - bx
    cy[i] = cy[i] - by
lines = [ax.plot(x, y)[0], ax.text(0.05, 0.05, "Iteration #0", transform=ax.transAxes, fontsize=14,bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5)), ax.plot(x, y)[0] ]
def animate(iteration):
    global x, y
    print(iteration)
    for i in range(0,n):
        x[i] = (x[(i-1+n)%n]+x[(i+1)%n])/2. + 0.*cx[i] # play with the coeff here, 0. by default
        y[i] = (y[(i-1+n)%n]+y[(i+1)%n])/2. + 0.*cy[i]
    lines[0].set_data(x, y) # update the data
    lines[1].set_text("Iteration #" + str(iteration))
    plt.draw()
    ax.relim()
    ax.autoscale_view(False,True,True)
    return lines
ani = animation.FuncAnimation(fig, animate, frames=np.arange(0, 100), interval=1, blit=False, save_count=50)
#ani.save('line.gif', dpi=80, writer='imagemagick')
plt.show()
```

This is the result with the default settings: the red Lenin is the initial data, the blue curve is its evolution; at infinity the result will converge to a point:

And here is the result with a coefficient of 2:

Homework: why, in the second case, does Lenin first turn into Dzerzhinsky and then converge back to Lenin again, only larger?

# Conclusion

A great many data-processing problems, in particular in geometry, can be formulated as the solution of a system of linear equations. In this article I did not explain **how to** build these systems; my goal was only to show that it is possible. The topic of the next article will no longer be "why" but "how", and which solvers to use afterwards.

By the way, least squares do appear in the title of the article. Did you spot them in the text? If not, that is not scary at all: they are precisely the answer to the question "how?". Stay on the line: in the next article I will show exactly where they are hiding and how to modify them in order to gain access to an extremely powerful data-processing tool. For example, in a dozen lines of code you can get this:

Stay tuned for more!