# How I fought with cameras, or a locus of points in inept hands

Good evening, dear Khabrovites; good evening, glorious city of Belgorod.

Today I'll tell you a tale about a fool. And he is a fool (that is, I am) because he did not follow one simple truth:

The famous programmer's laziness means that instead of making unnecessary movements (your own or the machine's), it is better to think and find a simpler, more elegant solution.

And the tale is about how the fool tried to teach a computer to find the position of a camera in space.

#### Prologue

At the end of my second year I had to write a programming project. As usual, I decided to freeload and join some group at the end of the year. It did not work out. I agreed with the dean and my instructor to hand in the project in late August and calmly left to teach children art. Returning to Nerizinovaya, I realized that I had essentially two weeks left. From then on I spent almost all my time sitting in a coffee shop, coding quite pleasantly. The essence of the project was to determine the spatial position of fingertips in real time using two webcams.

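It is clear that each pixel in the camera image corresponds to a ray in space. Two cameras means two rays that intersect at the point of interest. In theory, everything is simple. In practice, I spent a night bolting the OpenCV library onto MSVS, and then a week and a half creating various image processing algorithms, quickly writing a simple 3D viewer, putting the two cameras together, and debugging, debugging, debugging... At that time I defined the basis in space the easy way: I put the cameras on one line, pointed them “approximately up”, and took the distance between the cameras to be a conventional 1000 units.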
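In practice two measured rays almost never intersect exactly, so a common workaround is to take the midpoint of the shortest segment between them. The sketch below is only an illustration of that idea, not the project's code; the function name `triangulate` and the sample coordinates are assumptions, and the rays are treated as infinite lines.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate(o1, d1, o2, d2):
    """Return (midpoint, gap) for the lines o1 + t1*d1 and o2 + t2*d2.

    midpoint is the middle of the shortest segment between the two lines,
    gap is the length of that segment (0 when the lines really intersect).
    """
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("the rays are parallel")
    # Parameters of the closest points on each line.
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = [o + t1 * k for o, k in zip(o1, d1)]
    p2 = [o + t2 * k for o, k in zip(o2, d2)]
    midpoint = tuple((x + y) / 2 for x, y in zip(p1, p2))
    gap = math.sqrt(sum((x - y) ** 2 for x, y in zip(p1, p2)))
    return midpoint, gap


if __name__ == "__main__":
    # Two cameras 1000 conventional units apart, both looking at one fingertip.
    print(triangulate((0, 0, 0), (300, 200, 500), (1000, 0, 0), (-700, 200, 500)))
```

A large `gap` is a cheap hint that the two cameras disagree, for example because one of them was moved.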

In general, everything was almost ready. Each camera on its own could track a single finger with good accuracy, all the math functions for converting coordinates were written, and I had even written a feature that allowed not just a black background but almost any motionless one. But something was not right: when I moved my hand, the point in space performed bizarre somersaults with an amplitude of about a centimeter. Trouble! And then I realized that three hours earlier the waitress had simply bumped the camera a little.

#### Problem statement

I sighed and realized that I would have to write a function that determines the position and orientation of the camera itself. Ideally, the camera's field of view is an infinite four-sided pyramid with a rectangular base. It is fully defined by eight values: 3 coordinates of the apex, 2 coordinates of the direction vector (the “bisector” of the pyramid, its axis), 1 for the rotation around that axis, and 2 more for the angular width of the view.

The last two values are known in advance: google the camera's diagonal viewing angle and solve a very simple geometry problem. The rotation around the axis is self-explanatory, but it can change depending on how the camera is positioned. The direction vector has two coordinates because it can be specified as the endpoint of a vector of fixed length l (3 coordinates), and one unknown is eliminated by the equation x^2 + y^2 + z^2 = l^2. Well, the three coordinates of the apex need no explanation. In total, we need to compute 6 quantities.
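That "simplest geometric problem" could look roughly like this. The sketch assumes an ideal pinhole camera with square pixels; the function name and the 640x480 resolution with a 90 degree diagonal angle are made-up example values, not data from the project.

```python
import math


def fov_from_diagonal(diag_deg, width_px, height_px):
    """Split a diagonal viewing angle into horizontal and vertical angles.

    For a pinhole camera the tangents of the half-angles are proportional
    to the half-sizes of the image, so we scale tan(diagonal / 2) by
    width / diagonal and height / diagonal (all measured in pixels).
    """
    diag_px = math.hypot(width_px, height_px)
    t = math.tan(math.radians(diag_deg) / 2)
    horizontal = 2 * math.atan(t * width_px / diag_px)
    vertical = 2 * math.atan(t * height_px / diag_px)
    return math.degrees(horizontal), math.degrees(vertical)


if __name__ == "__main__":
    h, v = fov_from_diagonal(90.0, 640, 480)
    print(f"horizontal {h:.2f} deg, vertical {v:.2f} deg")
```

Note that the angles themselves do not add up like the sides of a right triangle; only the tangents of the half-angles do.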

“Oh! I need a triangle!” I exclaimed. Three points, and from each one we get 2 numbers in the image, 6 in total. The plan: we place an isosceles right triangle in space and declare that its vertices have coordinates (100, 0, 0), (0, 0, 0) and (0, 100, 0). Next, we mark the vertices of this triangle in the camera image, and all that remains is to plug the values into a simple formula. Well, that's what I thought, anyway.
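What the marked vertices really give us is angles: every marked pixel becomes a ray in the camera's own frame, and the angle between two such rays does not depend on where the camera stands. A minimal sketch, assuming a pinhole camera with the principal point at the image center and no lens distortion (the function names and the 640x480 / 90 degree numbers are illustrative):

```python
import math


def pixel_to_ray(px, py, width_px, height_px, hfov_deg):
    """Unit direction of the ray through pixel (px, py) in the camera frame."""
    # Focal length in pixels, derived from the horizontal viewing angle.
    focal = (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    x = px - width_px / 2
    y = py - height_px / 2
    length = math.sqrt(x * x + y * y + focal * focal)
    return (x / length, y / length, focal / length)


def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    cosine = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, cosine)))


if __name__ == "__main__":
    # Two marked triangle vertices as they might appear in a 640x480 image.
    ray_a = pixel_to_ray(200, 300, 640, 480, 90.0)
    ray_b = pixel_to_ray(450, 260, 640, 480, 90.0)
    print(math.degrees(angle_between(ray_a, ray_b)))
```

Three vertices give three such angles, one per side of the triangle.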

No such luck. I killed 4 hours trying to find this formula with the usual methods of exact mathematics, enlisted two friends who had been the best at math in my school, and learned to type the Wolfram Alpha address faster than my password, but all I found out was that an exact solution exists, and that whoever finds it will attain Zen.

And then the fool made a mistake. What I had was a system of 6 trigonometric equations closely tied to the equation of a circle. And the very next semester we had Numerical Methods, which, as you know, covers a method for solving nonlinear systems. The right thing would have been to read the theory and do everything properly: even though it would take more time, the result would be better and faster, and useful for self-development too. But no, the spirit of Peter I awoke in me, and I decided to hack away with an axe.
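For comparison, here is roughly what the "proper" route could look like. This is not the system of 6 equations from the story but a reduced sketch: it solves only for the camera position from the three angles at which the sides of the triangle are seen, using a damped Newton iteration with a numerical Jacobian. All function names, the starting point and the test position are assumptions, and this kind of problem can have several solutions (for instance a mirror image below the plane of the triangle), so the starting guess matters.

```python
import math

A, B, C = (100.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 100.0, 0.0)
SEGMENTS = [(A, B), (B, C), (C, A)]


def cos_view_angle(p, a, b):
    """Cosine of the angle at which segment ab is seen from point p."""
    u = [a[i] - p[i] for i in range(3)]
    v = [b[i] - p[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))


def residuals(p, cos_targets):
    return [cos_view_angle(p, a, b) - c for (a, b), c in zip(SEGMENTS, cos_targets)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def newton_camera_position(cos_targets, start, iterations=50, h=1e-5, tol=1e-12):
    p = list(start)
    for _ in range(iterations):
        f = residuals(p, cos_targets)
        if max(abs(x) for x in f) < tol:
            break
        # Numerical Jacobian by central differences: jac[i][j] = d f_i / d p_j.
        jac = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            hi, lo = p[:], p[:]
            hi[j] += h
            lo[j] -= h
            f_hi, f_lo = residuals(hi, cos_targets), residuals(lo, cos_targets)
            for i in range(3):
                jac[i][j] = (f_hi[i] - f_lo[i]) / (2 * h)
        step = solve3(jac, [-x for x in f])
        # Damping: halve the step until the residual stops growing.
        scale = 1.0
        while True:
            trial = [p[i] + scale * step[i] for i in range(3)]
            f_trial = residuals(trial, cos_targets)
            if sum(x * x for x in f_trial) < sum(x * x for x in f) or scale < 1e-4:
                break
            scale /= 2
        p = trial
    return tuple(p)


if __name__ == "__main__":
    true_position = (40.0, 30.0, 120.0)  # made-up camera position
    targets = [cos_view_angle(true_position, a, b) for a, b in SEGMENTS]
    # Start from where the camera stood before it was bumped.
    print(newton_camera_position(targets, start=(42.0, 28.0, 116.0)))
```

Starting from the last known camera position is a natural choice here, since a bumped camera rarely moves far.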

#### Solution

As is known from school plane geometry, the locus of points from which a given segment (AB) is seen at a given angle (alpha) is an arc of a circle. The figure clarifies everything.
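The arc is easy to check numerically. By the inscribed angle theorem its circle has radius |AB| / (2 sin alpha), and its center lies on the perpendicular bisector of AB at signed distance (|AB| / 2) / tan alpha from the segment. The sketch below puts A and B at (-c, 0) and (c, 0), samples the arc above the segment and measures the viewing angle at each sample; the function names are illustrative.

```python
import math


def arc_of_constant_angle(ab_length, alpha, n=7):
    """Sample n interior points of the arc (y > 0) that sees AB at angle alpha.

    A = (-c, 0) and B = (c, 0) with c = ab_length / 2; alpha is in radians.
    """
    c = ab_length / 2
    radius = c / math.sin(alpha)
    center_y = c / math.tan(alpha)  # negative when alpha is obtuse
    phi_b = alpha - math.pi / 2     # circle parameter of point B
    phi_a = 3 * math.pi / 2 - alpha  # circle parameter of point A
    points = []
    for k in range(1, n + 1):
        phi = phi_b + (phi_a - phi_b) * k / (n + 1)
        points.append((radius * math.cos(phi), center_y + radius * math.sin(phi)))
    return points


def view_angle_2d(p, c):
    """Angle at which the segment from (-c, 0) to (c, 0) is seen from p."""
    ux, uy = -c - p[0], -p[1]
    vx, vy = c - p[0], -p[1]
    cosine = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.acos(max(-1.0, min(1.0, cosine)))


if __name__ == "__main__":
    alpha = math.radians(60)
    for point in arc_of_constant_angle(100.0, alpha):
        print(point, math.degrees(view_angle_2d(point, 50.0)))
```

Every sampled point should report the same angle, whichever part of the arc it sits on.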

In addition, in space this picture can be rotated around the segment. We get something like a torus, only without a hole. Since there are three segments, we get three tori, or rather, three torus surfaces. One surface is a two-dimensional figure, the intersection of two is already a line (in the general case, several closed curves), and the intersection of three is already a point. The tori are in the picture below.
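One way to parametrize such a surface is with two numbers: a position along the arc and a rotation angle around the segment. The sketch below is an assumed construction, not the original code; `surface_point` and its parameters are hypothetical names. It builds a point of the surface for a segment in 3D and checks that the segment really is seen from it at the requested angle.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def surface_point(a, b, alpha, arc_frac, psi):
    """Point from which segment ab is seen at angle alpha (radians).

    arc_frac in (0, 1) moves along the arc from b to a,
    psi rotates the arc around the line ab.
    """
    ab = tuple(b[i] - a[i] for i in range(3))
    half = math.sqrt(sum(x * x for x in ab)) / 2
    e = unit(ab)  # axis direction
    helper = (1.0, 0.0, 0.0) if abs(e[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(e, helper))  # first direction perpendicular to the axis
    w = cross(e, u)             # second one, completing the basis
    radius = half / math.sin(alpha)
    center = half / math.tan(alpha)
    phi = (alpha - math.pi / 2) + arc_frac * (2 * math.pi - 2 * alpha)
    along = radius * math.cos(phi)            # offset along the axis from the midpoint
    away = center + radius * math.sin(phi)    # distance from the axis
    mid = tuple((a[i] + b[i]) / 2 for i in range(3))
    return tuple(mid[i] + along * e[i]
                 + away * (math.cos(psi) * u[i] + math.sin(psi) * w[i])
                 for i in range(3))


def view_angle(p, a, b):
    u = unit(tuple(a[i] - p[i] for i in range(3)))
    v = unit(tuple(b[i] - p[i] for i in range(3)))
    return math.acos(max(-1.0, min(1.0, sum(x * y for x, y in zip(u, v)))))


if __name__ == "__main__":
    a, b = (100.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    p = surface_point(a, b, math.radians(50), 0.5, 1.0)
    print(p, math.degrees(view_angle(p, a, b)))
```

Stepping `arc_frac` and `psi` over a grid gives exactly the kind of mesh nodes the brute-force method needs.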

So, the brute-force method: we need to intersect these three tori. And since computing is discrete, we have to represent each torus surface by the nodes of a mesh stretched over it. Here it is:

Then, head-on, the distances between the nodes are compared and the ones closest to each other are found.

As a result, this function ate more memory than the rest of the project with all its debugging images and mountains of necessary junk, ran for five minutes per camera (long live real-time!), and sometimes got it wrong.

Half a year later, out of boredom during a lecture, I wrote a function that intersects these tori. Everything as it should be, with subtractions, matrices and the rest. It ran instantly, computed correctly, and was generally light and pleasant. But since the project was already over, and the function had been written on the fly with zero design, so that I could not make sense of its code (back then I was still afraid of the word “class”), I shelved the source. And now the time has come to finally finish this project, which is, in fact, what I am doing. But that is a completely different story.

Goodbye, dear Khabrovites; sweet dreams, city of Belgorod.

P.S. Soon I plan to describe the image processing algorithms, as soon as I remember them myself. So see you soon!