# Perspective matrix in the graphics API, or the devil is in the details

Sooner or later, every developer working in computer graphics asks the same question: how do these perspective matrices actually work? The answer is sometimes hard to find and, as usually happens, most developers give up halfway.

Giving up is not a solution. Let's figure it out together!

Let's be practical and take OpenGL 3.3 as the test subject. Starting with this version, every developer has to implement the matrix operations module independently (the core profile has no built-in matrix stack). Great, that is exactly what we need. Let's decompose the task and highlight the main points. Some facts from the OpenGL specification:

- Matrices are stored in column-major order;
- Coordinates are homogeneous;
- The canonical view volume (CVV) uses a left-handed coordinate system.

There are two ways to store a matrix in memory: column-major and row-major. Linear algebra lectures typically use the row-major scheme. By and large, the in-memory representation does not matter, because a matrix can always be converted from one representation to the other by a simple transposition. Since there is no real difference, all subsequent calculations use the classic row-major matrices together with row vectors. When programming OpenGL, a small trick lets you skip the transposition while keeping the classic row-major calculations: pass the matrix to the shader program as is, and in the shader multiply the matrix by the vector rather than the vector by the matrix. This works because a row-major array read as column-major is the transpose of the original matrix, and $(vM)^T = M^T v^T$.
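The trick can be checked numerically. The following is a minimal sketch in plain Python, not OpenGL code; all helper names are illustrative assumptions. It multiplies a row vector by a row-major matrix, then reads the same untransposed flat array as column-major (the layout a GLSL `mat4` expects) and multiplies the matrix by the vector.

```python
def row_vector_times_matrix(v, m):
    """Classic row-major math: v' = v * M, with m given as a list of 4 rows."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]


def flatten_row_major(m):
    """Lay the matrix out in memory row after row, without transposing it."""
    return [m[i][j] for i in range(4) for j in range(4)]


def column_major_matrix_times_vector(flat, v):
    """What `M * v` yields when the flat array is read as column-major:
    element (row i, column j) is taken from flat[j * 4 + i]."""
    return [sum(flat[j * 4 + i] * v[j] for j in range(4)) for i in range(4)]


# Translation by (2, 3, 4) written for row vectors: the offsets sit in the last row.
translation = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [2.0, 3.0, 4.0, 1.0],
]
point = [1.0, 1.0, 1.0, 1.0]

classic = row_vector_times_matrix(point, translation)
shader_style = column_major_matrix_times_vector(flatten_row_major(translation), point)
print(classic, shader_style)  # both computations give the same vector
```

Both paths produce the translated point, which is why no explicit transposition is needed on the CPU side.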

Homogeneous coordinates are a fairly simple system with a handful of rules for converting ordinary Cartesian coordinates into homogeneous ones and back. A homogeneous coordinate is a row matrix of dimension [1x4]. To convert a Cartesian coordinate into a homogeneous one, multiply *x*, *y* and *z* by any real number *w* (except 0), write the results into the first three components, and set the last component to the factor *w*. In other words:

$$(x,\ y,\ z) \rightarrow (x \cdot w,\ y \cdot w,\ z \cdot w,\ w)$$

where $(x,\ y,\ z)$ are the Cartesian coordinates, *w* is a real number not equal to 0, and $(x \cdot w,\ y \cdot w,\ z \cdot w,\ w)$ are the homogeneous coordinates.

A little trick: if *w* equals one, the conversion amounts to copying the components *x*, *y* and *z* and assigning one to the last component. That is, we get the row matrix:

$$(x,\ y,\ z,\ 1)$$

A few words about zero as *w*. From the point of view of homogeneous coordinates this is perfectly acceptable. Homogeneous coordinates make it possible to distinguish points from vectors, which is impossible in the Cartesian coordinate system:

- $(x,\ y,\ z,\ 1)$ is a point, where (*x, y, z*) are its Cartesian coordinates;
- $(x,\ y,\ z,\ 0)$ is a vector, where (*x, y, z*) is the radius vector.

The inverse conversion of a vertex from homogeneous to Cartesian coordinates is carried out as follows: all components of the row matrix are divided by the last component. In other words:

$$(x,\ y,\ z,\ w) \rightarrow \left(\frac{x}{w},\ \frac{y}{w},\ \frac{z}{w}\right)$$

where $(x,\ y,\ z,\ w)$ are the homogeneous coordinates and $(x/w,\ y/w,\ z/w)$ are the Cartesian coordinates.
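The two conversion rules can be sketched in a few lines of Python. The function names are illustrative assumptions, and treating `w == 0` as an error when converting back is a choice made for this sketch, because a vector has no Cartesian position.

```python
def to_homogeneous(x, y, z, w=1.0):
    """Cartesian point -> homogeneous row (x*w, y*w, z*w, w); w must not be 0."""
    if w == 0:
        raise ValueError("w must be non-zero when converting a point")
    return (x * w, y * w, z * w, w)


def to_cartesian(h):
    """Homogeneous row -> Cartesian point: divide everything by the last component."""
    x, y, z, w = h
    if w == 0:
        raise ValueError("w == 0 denotes a vector, which has no Cartesian position")
    return (x / w, y / w, z / w)


scaled = to_homogeneous(1.0, 2.0, 3.0, w=2.0)
print(scaled)                # the same point written with w = 2
print(to_cartesian(scaled))  # the original Cartesian coordinates again
```

Any non-zero *w* describes the same Cartesian point, which is what the round trip demonstrates.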

The main thing to know is that OpenGL rasterization works in Cartesian coordinates, and the clipping test is equivalent to checking those Cartesian coordinates against the CVV, while all preceding transformations are performed in homogeneous coordinates. The transition from homogeneous to Cartesian coordinates (the perspective division) is done by the hardware.

The canonical view volume (CVV) is one of the least documented parts of OpenGL. As Fig. 1 shows, the CVV is an axis-aligned cube centered at the origin with an edge length of two. Everything that falls inside the CVV is rasterized, everything outside it is discarded, and anything that lies partially outside is handled by the clipping algorithms. The most important thing to know is that the CVV coordinate system is left-handed!

Fig. 1. Canonical OpenGL clipping volume (CVV)

A left-handed coordinate system? How can that be, when the OpenGL 1.0 specification clearly states that the coordinate system is right-handed? Let's sort it out.

Fig. 2. Coordinate systems

As Fig. 2 shows, the two coordinate systems differ only in the direction of the *Z* axis. OpenGL 1.0 really does use a right-handed user coordinate system. But the CVV coordinate system and the user coordinate system are two completely different things. Moreover, starting with version 3.3 there is no longer such a thing as a standard OpenGL coordinate system. As mentioned earlier, the programmer implements the matrix operations module. Building rotation matrices, building projection matrices, finding the inverse matrix and multiplying matrices are the minimum set of operations such a module includes. Two logical questions arise. If the view volume is a cube with an edge length of two, why is a scene several thousand arbitrary units in size visible on the screen? And at what point is the user coordinate system converted into the CVV coordinate system? Projection matrices are exactly the entity that answers these questions.

The main idea of the above is that the developer is free to choose the type of user coordinate system and must describe the projection matrices accordingly. This concludes the facts about OpenGL; it is time to put everything together.

One of the most widespread and hardest to grasp matrices is the perspective transformation matrix. How does it relate to the CVV and the user coordinate system? Why do objects become smaller as their distance from the observer increases? To understand this, let's walk through the matrix transformations of a three-dimensional model step by step. It is no secret that any three-dimensional model consists of a finite list of vertices that undergo matrix transformations completely independently of each other. To determine the coordinate of a three-dimensional vertex on a two-dimensional monitor screen, you need to:

1. Convert the Cartesian coordinate into a homogeneous coordinate;
2. Multiply the homogeneous coordinate by the model matrix;
3. Multiply the result by the view matrix;
4. Multiply the result by the projection matrix;
5. Convert the result from homogeneous coordinates back to Cartesian coordinates.

The conversion of a Cartesian coordinate into a homogeneous one was discussed earlier. The geometric meaning of the model matrix is to move the model from the local coordinate system into the global coordinate system or, as they say, to take the vertices out of model space into world space. Put simply, a three-dimensional object loaded from a file lives in model space, where coordinates are measured relative to the object itself. The model matrix then positions, scales and rotates the model. As a result, all vertices of the model receive their actual homogeneous coordinates in the three-dimensional scene. Model space is local relative to world space, so coordinates are carried out of model space into world space (from local to global).
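As an illustration of step two, here is a minimal Python sketch of a model matrix in the row-vector convention used in this article. The helper names and the particular scale and offset are assumptions chosen for the example. It also shows why the distinction between points (w = 1) and vectors (w = 0) matters: translation affects only points.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def vec_mul(v, m):
    """Row vector times 4x4 matrix."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]


def scaling(s):
    """Uniform scaling matrix."""
    return [[s, 0.0, 0.0, 0.0],
            [0.0, s, 0.0, 0.0],
            [0.0, 0.0, s, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def translation(tx, ty, tz):
    """Translation matrix for row vectors: the offsets sit in the last row."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [tx, ty, tz, 1.0]]


# With row vectors the leftmost matrix is applied first: scale, then move.
model = mat_mul(scaling(2.0), translation(10.0, 0.0, 5.0))

local_point = [1.0, 1.0, 1.0, 1.0]      # w = 1: a point
local_direction = [1.0, 0.0, 0.0, 0.0]  # w = 0: a vector

world_point = vec_mul(local_point, model)
world_direction = vec_mul(local_direction, model)
print(world_point)      # scaled and moved
print(world_direction)  # scaled only, the translation is ignored
```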

Now for step three. This is where view space comes into play. In this space, coordinates are measured relative to the position and orientation of the observer, as if the observer were the center of the world. View space is local with respect to world space, so coordinates must be brought into it (not taken out, as in the previous case). A direct matrix transformation takes coordinates out of some space; to bring them into it, the transformation has to be inverted. The view transformation is therefore described by an inverse matrix. How do we obtain this inverse matrix? First, we build the direct matrix of the observer. What characterizes the observer? The observer is described by the coordinate at which it is located and by the direction vectors of the view. The observer always looks along its local *Z* axis. The observer can move around the scene and turn. In many ways this resembles the meaning of the model matrix, and by and large that is what it is. However, scaling is meaningless for the observer, so the observer's model matrix cannot be equated with the model matrix of a three-dimensional object. The observer's model matrix is the direct matrix we are looking for; inverting it gives the view matrix. In practice this means that all vertices in global homogeneous coordinates receive new homogeneous coordinates relative to the observer. Accordingly, if the observer sees a certain vertex, the *z* coordinate of that vertex in view space is definitely positive. If the vertex is behind the observer, its *z* coordinate in view space is definitely negative.

Step four is the most interesting one. The previous steps were covered in such detail on purpose, so that the reader has a complete picture of all the operands of the fourth step. In the fourth step, the homogeneous coordinates are carried from view space into CVV space. Let us stress once more that all potentially visible vertices have a positive value of the homogeneous coordinate *z*.

Consider a matrix of the form:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{1}$$

And a point in the homogeneous space of the observer:

$$(x,\ y,\ z,\ 1)$$

Multiply the homogeneous coordinate by the matrix under consideration:

$$(x,\ y,\ z,\ 1) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} = (x,\ y,\ z,\ z)$$

Convert the resulting homogeneous coordinates into Cartesian coordinates:

$$(x,\ y,\ z,\ z) \rightarrow \left(\frac{x}{z},\ \frac{y}{z},\ 1\right)$$

Suppose there are two points in view space with the same *x* and *y* coordinates but different *z* coordinates. In other words, one point is located behind the other. Thanks to perspective distortion, the observer must see both points. Indeed, the formula shows that division by the *z* coordinate compresses points toward the origin. The larger the *z* value (the farther the point is from the observer), the stronger the compression. This is the explanation of the perspective effect.

According to the OpenGL specification, rasterization is performed in Cartesian coordinates, and the conversion of homogeneous coordinates into Cartesian coordinates happens automatically.
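Steps three and four can be tried out with a short Python sketch. It is only an illustration under assumptions: the helper names are invented here, the observer may only turn about the Y axis, and the inverse is computed with a shortcut that is valid for rotation plus translation without scaling. The matrix called `TEMPLATE` is matrix (1) as described in the text.

```python
import math


def vec_mul(v, m):
    """Row vector times 4x4 matrix given as a list of rows."""
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]


def observer_matrix(yaw, position):
    """Direct matrix of the observer: a turn about the Y axis, then a move
    to `position` (row-vector convention, no scaling)."""
    c, s = math.cos(yaw), math.sin(yaw)
    px, py, pz = position
    return [
        [c, 0.0, -s, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [s, 0.0, c, 0.0],
        [px, py, pz, 1.0],
    ]


def invert_rigid(m):
    """Invert a rotation-plus-translation matrix: transpose the 3x3 rotation
    and map the translation t to -t * R^T."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = m[3][:3]
    t_inv = [-sum(t[k] * r_t[k][j] for k in range(3)) for j in range(3)]
    return [r_t[0] + [0.0], r_t[1] + [0.0], r_t[2] + [0.0], t_inv + [1.0]]


def to_cartesian(h):
    """Divide by the last homogeneous component."""
    return tuple(c / h[3] for c in h[:3])


# Matrix (1): copies z into the last homogeneous component.
TEMPLATE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Observer at z = -5, not turned, looking along its local +Z axis.
view = invert_rigid(observer_matrix(0.0, (0.0, 0.0, -5.0)))

near_view = vec_mul([1.0, 1.0, -3.0, 1.0], view)    # 2 units in front of the observer
far_view = vec_mul([1.0, 1.0, 5.0, 1.0], view)      # 10 units in front of the observer
behind_view = vec_mul([0.0, 0.0, -8.0, 1.0], view)  # 3 units behind the observer

near_screen = to_cartesian(vec_mul(near_view, TEMPLATE))
far_screen = to_cartesian(vec_mul(far_view, TEMPLATE))
print(near_view[2], far_view[2], behind_view[2])  # positive, positive, negative
print(near_screen, far_screen)  # the farther point ends up closer to the origin
```

The two visible points share *x* and *y* in view space, yet after the division the farther one lies nearer the origin, which is the compression described above.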

Matrix (1) is a template for the perspective projection matrix. As mentioned earlier, the projection matrix has two tasks: fixing the user coordinate system (left-handed or right-handed) and mapping the observer's view volume into the CVV. Let's derive the perspective matrix for a left-handed user coordinate system.

The projection matrix can be described using four parameters (Fig. 3):

- Viewing angle in radians (*fovy*);
- Aspect ratio (*aspect*);
- Distance to the near clipping plane (*n*);
- Distance to the far clipping plane (*f*).

Fig. 3. Perspective view volume

Consider the projection of a point in the observer's space onto the near clipping face of the perspective view volume. For clarity, Fig. 4 shows a side view. Note also that the user coordinate system coincides with the CVV coordinate system, that is, a left-handed coordinate system is used everywhere.

Fig. 4. Projecting an arbitrary point

Based on the properties of similar triangles, the following equalities hold:

$$\frac{y'}{n} = \frac{y}{z}, \qquad \frac{x'}{n} = \frac{x}{z}$$

Express *y'* and *x'*:

$$y' = \frac{n \cdot y}{z}, \qquad x' = \frac{n \cdot x}{z} \tag{2}$$

In principle, expressions (2) are enough to obtain the coordinates of the projected points. However, to occlude three-dimensional objects correctly, the depth of each fragment must be known. In other words, the value of the *z* component has to be preserved. This value is used in OpenGL depth tests. Fig. 3 shows that *z'* is not suitable as the fragment depth, because the projections of all points have the same *z'* value. The way out is to use the so-called pseudo-depth.

Pseudo-depth properties:

1. The pseudo-depth is calculated from the value of *z*;
2. The closer a point is to the observer, the smaller its pseudo-depth;
3. For all points lying on the near plane of the view volume, the pseudo-depth is -1;
4. For all points lying on the far plane of the view volume, the pseudo-depth is 1;
5. All fragments lying inside the view volume have a pseudo-depth in the range [-1, 1].

Let's derive the formula for the pseudo-depth. We take the following expression as a basis:

$$z' = \frac{a \cdot z + b}{z} \tag{3}$$

The coefficients *a* and *b* have to be found. To do so, we use pseudo-depth properties 3 and 4 and obtain a system of two equations in two unknowns:

$$\frac{a \cdot n + b}{n} = -1 \tag{4}$$

$$\frac{a \cdot f + b}{f} = 1 \tag{5}$$

Add the two equations of the system and multiply the result by the product *fn*; *f* and *n* cannot be zero. We get:

$$f(a \cdot n + b) + n(a \cdot f + b) = 0$$

Expand the brackets and rearrange the terms so that only the part with *a* remains on the left and only the part with *b* on the right:

$$a = -\frac{b(f + n)}{2fn} \tag{6}$$

Substitute (6) into (5) and reduce the expression to a single fraction:

$$\frac{2bn - b(f + n)}{2fn} = 1$$

Multiply both sides by *-2fn*; *f* and *n* cannot be zero. Collect like terms, rearrange and express *b*:

$$b = -\frac{2fn}{f - n} \tag{7}$$

Substitute (7) into (6) and express *a*:

$$a = \frac{f + n}{f - n}$$

Accordingly, the components *a* and *b* are equal to:

$$a = \frac{f + n}{f - n}, \qquad b = -\frac{2fn}{f - n}$$

Now substitute the obtained coefficients into the template matrix (1) and trace what happens to the *z* coordinate of an arbitrary point in the homogeneous space of the observer. The substitution is performed as follows:

$$(x,\ y,\ z,\ 1) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & a & 1 \\ 0 & 0 & b & 0 \end{pmatrix} = (x,\ y,\ a \cdot z + b,\ z) \tag{8}$$

Let the distance to the near clipping plane *n* be 2 and the distance to the far clipping plane *f* be 10. Consider five points in the homogeneous space of the observer:

| Point | *z* value | Description |
|---|---|---|
| 1 | 1 | The point is in front of the near clipping plane of the view volume. It is not rasterized. |
| 2 | 2 | The point lies on the near clipping plane of the view volume. It is rasterized. |
| 3 | 5 | The point is between the near and far clipping planes of the view volume. It is rasterized. |
| 4 | 10 | The point lies on the far clipping plane of the view volume. It is rasterized. |
| 5 | 20 | The point is beyond the far clipping plane of the view volume. It is not rasterized. |

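Multiply all points by matrix (8) and then convert the resulting homogeneous coordinates into Cartesian coordinates. To do this, we need to calculate the values of the new homogeneous components $z_c = a \cdot z + b$ and $w_c = z$. For $n = 2$ and $f = 10$ the coefficients are $a = 12/8 = 1.5$ and $b = -40/8 = -5$, and the pseudo-depth is $z' = z_c / w_c$.

Point 1: $z_c = 1.5 \cdot 1 - 5 = -3.5$, $w_c = 1$, $z' = -3.5$

Point 2: $z_c = 1.5 \cdot 2 - 5 = -2$, $w_c = 2$, $z' = -1$

Point 3: $z_c = 1.5 \cdot 5 - 5 = 2.5$, $w_c = 5$, $z' = 0.5$

Point 4: $z_c = 1.5 \cdot 10 - 5 = 10$, $w_c = 10$, $z' = 1$

Point 5: $z_c = 1.5 \cdot 20 - 5 = 25$, $w_c = 20$, $z' = 1.25$

Note that the homogeneous coordinate *z* is positioned absolutely correctly in the CVV and, most importantly, the OpenGL depth test is now possible, because the pseudo-depth fully satisfies the test requirements.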
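The five-point check is easy to reproduce in code. The sketch below, with illustrative function names, uses the coefficients that follow from the two boundary conditions (pseudo-depth -1 on the near plane, 1 on the far plane) and reports which points fall inside the [-1, 1] range.

```python
def depth_coefficients(n, f):
    """Coefficients a and b of the pseudo-depth z' = (a*z + b) / z
    for the depth range [-1, 1]."""
    a = (f + n) / (f - n)
    b = -2.0 * f * n / (f - n)
    return a, b


def pseudo_depth(z, n, f):
    """Pseudo-depth of a view-space point with depth z (z must not be 0)."""
    a, b = depth_coefficients(n, f)
    z_c = a * z + b  # new homogeneous z component
    w_c = z          # new homogeneous w component
    return z_c / w_c


n, f = 2.0, 10.0
depths = [pseudo_depth(z, n, f) for z in (1.0, 2.0, 5.0, 10.0, 20.0)]
for z, d in zip((1.0, 2.0, 5.0, 10.0, 20.0), depths):
    print(z, d, "inside" if -1.0 <= d <= 1.0 else "outside")
```

Only the points with *z* between *n* and *f* land inside the range, and a larger *z* always gives a larger pseudo-depth, as property 2 requires.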

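With the *z* coordinate sorted out, let's move on to the *x* and *y* coordinates. As mentioned earlier, the entire perspective view volume has to fit into the CVV. The edge length of the CVV is two. Accordingly, the height and width of the perspective view volume must be compressed to two arbitrary units.

We have the *fovy* angle and the *aspect* value at our disposal. Let's express the height and width in terms of these values.

Fig. 5. View volume

Fig. 5 shows that:

$$H = 2n \tan\left(\frac{fovy}{2}\right), \qquad W = aspect \cdot H$$

where *H* and *W* denote the height and width of the near face of the view volume. Multiplying *x'* and *y'* from (2) by $2/W$ and $2/H$ maps them into the range [-1, 1]; the factor *n* cancels out.

Now we can write the final form of the perspective projection matrix for a left-handed user coordinate system that works with the OpenGL CVV:

$$\begin{pmatrix} \dfrac{1}{aspect \cdot \tan(fovy/2)} & 0 & 0 & 0 \\ 0 & \dfrac{1}{\tan(fovy/2)} & 0 & 0 \\ 0 & 0 & \dfrac{f + n}{f - n} & 1 \\ 0 & 0 & -\dfrac{2fn}{f - n} & 0 \end{pmatrix}$$

This completes the derivation of the matrix.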
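To see the whole matrix at work, here is a hedged Python sketch. It assumes the row-vector convention of this article, a left-handed user coordinate system, *fovy* as the vertical angle and *aspect* as width divided by height; `perspective_lh` and `project` are names invented for the example. It projects the top-right corners of the near and far faces of the view volume, which should land on the corresponding corners of the CVV.

```python
import math


def perspective_lh(fovy, aspect, n, f):
    """Row-vector perspective matrix for a left-handed user coordinate
    system and a CVV with depth range [-1, 1]."""
    t = math.tan(fovy / 2.0)
    return [
        [1.0 / (aspect * t), 0.0, 0.0, 0.0],
        [0.0, 1.0 / t, 0.0, 0.0],
        [0.0, 0.0, (f + n) / (f - n), 1.0],
        [0.0, 0.0, -2.0 * f * n / (f - n), 0.0],
    ]


def project(point, m):
    """Multiply a row vector by the matrix, then divide by the last component."""
    h = [sum(point[k] * m[k][j] for k in range(4)) for j in range(4)]
    return tuple(c / h[3] for c in h[:3])


fovy, aspect, n, f = math.radians(60.0), 16.0 / 9.0, 2.0, 10.0
t = math.tan(fovy / 2.0)
m = perspective_lh(fovy, aspect, n, f)

# Top-right corners of the near and far faces of the view volume.
near_corner = [aspect * n * t, n * t, n, 1.0]
far_corner = [aspect * f * t, f * t, f, 1.0]

near_ndc = project(near_corner, m)
far_ndc = project(far_corner, m)
print(near_ndc)  # approximately (1, 1, -1)
print(far_ndc)   # approximately (1, 1, 1)
```

Points on the observer's axis, such as `[0, 0, z, 1]`, map to the center of the CVV in *x* and *y*, so the frustum is centered as well as scaled.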

A few words about DirectX, OpenGL's main competitor. As far as projection is concerned, DirectX differs from OpenGL only in the dimensions and position of the CVV. In DirectX the CVV is a rectangular box whose extent along the *x* and *y* axes is two, and along the *z* axis is one. The range of *x* and *y* is [-1, 1], and the range of *z* is [0, 1]. As for the CVV coordinate system, DirectX, like OpenGL, uses a left-handed one.

To derive perspective matrices for a right-handed user coordinate system, redraw Fig. 2, Fig. 3 and Fig. 4 taking into account the new direction of the *Z* axis. The subsequent calculations are entirely analogous, up to sign. For DirectX matrices, pseudo-depth properties 3 and 4 are modified for the range [0, 1].

That closes the topic of perspective matrices.
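For readers who want to check the DirectX remark, the sketch below repeats the pseudo-depth derivation with the boundary values 0 on the near plane and 1 on the far plane. The resulting coefficients are not given in the text above; they are derived here by the same steps, from $(a \cdot n + b)/n = 0$ and $(a \cdot f + b)/f = 1$, and the function names are illustrative.

```python
def depth_coefficients_zero_to_one(n, f):
    """Coefficients a and b of z' = (a*z + b) / z for the depth range [0, 1].
    From (a*n + b)/n = 0 follows b = -a*n; substituting into (a*f + b)/f = 1
    gives a = f / (f - n)."""
    a = f / (f - n)
    b = -f * n / (f - n)
    return a, b


def pseudo_depth_zero_to_one(z, n, f):
    """Pseudo-depth of a view-space point with depth z (z must not be 0)."""
    a, b = depth_coefficients_zero_to_one(n, f)
    return (a * z + b) / z


n, f = 2.0, 10.0
on_near = pseudo_depth_zero_to_one(n, n, f)
between = pseudo_depth_zero_to_one(5.0, n, f)
on_far = pseudo_depth_zero_to_one(f, n, f)
print(on_near, between, on_far)  # grows from 0 on the near plane to 1 on the far plane
```

Only the third column of the projection matrix changes; the *x* and *y* scaling stays the same because the CVV extent along those axes is identical.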
