Render Hell 2.0. Book One: Overview

    Preamble


    A year ago I came across a series of very interesting articles by a certain Simon. Simon loves to take apart how games are made, namely how one graphical element or another is implemented: from chipped edges on plates to how slicing pieces off objects is done. But his series of articles under the general title Render Hell is especially interesting: in it he explains in detail how 3D objects are rendered at the hardware level (and at the software level too).

    This is a free translation. I made it for myself, so that at some point I could come back and reread anything I didn't catch the first time or simply forgot.

    Shall we start?

    Book one. Overview

    (the original book is here)

    Guys, hold on: from the PC's point of view, your 3D work is nothing more than a list of vertices and textures. All this data is converted into a next-gen picture, mainly by the central processor (CPU) and the graphics processor (GPU).

    First, the data is loaded from your hard disk (HDD) into main memory (RAM) for quick access. After that, the objects (meshes) and textures needed for display (rendering) are loaded into the video card's memory (VRAM), because the video card can access VRAM much faster.



    If a texture is no longer needed (after it has been uploaded to VRAM), it can be removed from RAM (but you must be sure you won't need it again in the near future, because loading it from the HDD takes a very long time).
    Meshes, on the other hand, should stay in RAM, because the CPU will most likely want access to them, for example, to detect collisions.
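The residency rule above can be sketched in a few lines. This is a toy model with made-up names, not any real engine's API: textures are evicted from RAM once uploaded to VRAM, while meshes keep a RAM copy for the CPU.

```python
# Toy sketch of asset residency: textures can leave RAM after upload,
# meshes must stay because the CPU still needs them (e.g. collisions).

class AssetManager:
    def __init__(self):
        self.ram = {}    # asset name -> data visible to the CPU
        self.vram = {}   # asset name -> data visible to the GPU

    def load_from_hdd(self, name, data):
        # HDD -> RAM: slow, so it is done once
        self.ram[name] = data

    def upload(self, name, kind):
        # RAM -> VRAM: the GPU only works with what lives in VRAM
        self.vram[name] = self.ram[name]
        if kind == "texture":
            # The CPU never reads the pixels again -> free the RAM copy.
            del self.ram[name]
        # Meshes stay in RAM: the CPU may need the vertices later.

mgr = AssetManager()
mgr.load_from_hdd("rock_diffuse", b"...pixels...")
mgr.load_from_hdd("rock_mesh", b"...vertices...")
mgr.upload("rock_diffuse", "texture")
mgr.upload("rock_mesh", "mesh")

print("rock_diffuse" in mgr.ram)  # False - evicted after upload
print("rock_mesh" in mgr.ram)     # True  - kept for the CPU
```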



    Added in the second edition
    Now all the information is in the video card's memory (VRAM). But the transfer speed from VRAM to the GPU is still low: the GPU can process far more data than it receives.

    Consequently, engineers put a small amount of memory directly into the graphics processor (GPU) itself and called this memory the cache. It is small because putting a large amount of memory directly into the processor is incredibly expensive. The GPU copies into the cache only what it needs right now, in small portions.



    Our copied information now sits in the Level 2 cache (L2 Cache). This is basically a small amount of memory (for example, 2048 KB on the NVIDIA GM204) that is located in the GPU and can be read much faster than VRAM.

    But even this is not fast enough to work efficiently! So there is also an even smaller Level 1 cache (L1 Cache). On the NVIDIA GM204 it is 384 KB, and it is available not only to the GPU cores but also to the nearby co-processors.
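The lookup order can be pictured as a tiny simulation. The access costs below are arbitrary illustration units, not measurements of any real chip; the point is only the ordering L1 < L2 < VRAM and the "promote on miss" behavior.

```python
# Toy memory hierarchy: try L1 first, then L2, then VRAM, and pull
# the data closer to the cores on every miss. Costs are illustrative.
COST = {"L1": 1, "L2": 5, "VRAM": 100}

class MemoryHierarchy:
    def __init__(self):
        self.l1, self.l2, self.vram = {}, {}, {}

    def read(self, addr):
        if addr in self.l1:
            return self.l1[addr], COST["L1"]
        if addr in self.l2:
            self.l1[addr] = self.l2[addr]   # promote to L1
            return self.l2[addr], COST["L2"]
        value = self.vram[addr]             # slowest path
        self.l2[addr] = value               # fill both cache levels
        self.l1[addr] = value
        return value, COST["VRAM"]

mem = MemoryHierarchy()
mem.vram[0x10] = "vertex data"
_, first = mem.read(0x10)   # cold read: all the way out to VRAM
_, second = mem.read(0x10)  # warm read: served from L1
print(first, second)  # 100 1
```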



    In addition, there is memory intended for the input and output data of the GPU cores: the registers (the register file). From there a GPU core takes, for example, two values, performs a calculation with them, and writes the result into a register.
    Afterwards these results are pushed back into L1/L2/VRAM to make room for new calculations. As a programmer, you usually don't have to worry about registers at all.

    Why is it all arranged this way? As mentioned above, it is all about access time. If you compare the access times of, say, an HDD and the L1 cache, the difference between them is an abyss. You can read the exact latency figures at this link: gist.github.com/hellerbarde/2843375
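To get a feel for that abyss, here is a quick calculation using approximate numbers from the gist linked above (L1 cache reference ~0.5 ns, main memory reference ~100 ns, one HDD seek ~10 ms):

```python
# Approximate latency figures (in nanoseconds) from the well-known
# "latency numbers every programmer should know" list.
L1_CACHE_REF = 0.5
MAIN_MEMORY_REF = 100
DISK_SEEK = 10_000_000  # a single HDD seek

ram_vs_l1 = MAIN_MEMORY_REF / L1_CACHE_REF
hdd_vs_l1 = DISK_SEEK / L1_CACHE_REF

print(f"RAM is ~{ram_vs_l1:.0f}x slower than L1")          # ~200x
print(f"An HDD seek is ~{hdd_vs_l1:,.0f}x slower than L1") # ~20,000,000x
```

Seven orders of magnitude between L1 and a disk seek: that is why data is staged through RAM, VRAM, and the caches instead of being read on demand.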

    Before rendering can begin, the CPU sets some global values that describe how the meshes should be rendered. These values are called the Render State.

    Render State

    These are, in a sense, parameters for how meshes should be rendered. They contain information about which texture and which vertex and pixel shaders should be used to draw the subsequent meshes, about lighting, transparency, and so on.

    AND IT IS IMPORTANT TO UNDERSTAND: every mesh the CPU sends to the GPU for rendering will be rendered with the parameters (Render State) that were set before it. That is, you can render a sword, a rock, a chair, and a car, and all of them will be rendered with the same texture if you don't set the Render State before each of these objects.
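The "sticky" nature of the render state can be shown in a toy model (all names here are illustrative, not a real graphics API): every draw uses whatever state was set last.

```python
# Toy model: the Render State persists until changed, and every
# draw snapshots the current state.

class GPU:
    def __init__(self):
        self.render_state = {}
        self.frame = []  # (mesh, state) pairs actually drawn

    def set_render_state(self, **params):
        self.render_state.update(params)

    def draw(self, mesh):
        # the mesh is rendered with whatever state is current
        self.frame.append((mesh, dict(self.render_state)))

gpu = GPU()
gpu.set_render_state(texture="stone.png", shader="basic")
for mesh in ["sword", "rock", "chair", "car"]:
    gpu.draw(mesh)  # oops: everything gets the stone texture

print(all(state["texture"] == "stone.png" for _, state in gpu.frame))  # True
```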



    When all preparations are complete, the CPU can finally call the GPU and tell it what needs to be drawn. This command is called a Draw Call.

    Draw Call

    This is a command from the CPU telling the GPU to render one mesh. The command specifies which mesh to render and contains no information about materials or anything else: all of that is set in the Render State.



    The mesh is already loaded into VRAM.

    After the command is sent, the GPU takes the Render State data (material, textures, shaders) and all the information about the object's vertices, and converts this data into (one wants to believe) beautiful pixels on your screen. This conversion process is called the Pipeline.

    Pipeline

    As mentioned earlier, objects are nothing more than a set of vertices plus texture information. To turn this into a mind-blowing picture, the video card builds triangles from the vertices, calculates how they should be lit, draws textures onto them, and so on.

    These actions are called pipeline stages. Most of this work is done by the GPU's cores. But some steps, for example the creation of triangles, are handled by other co-processors of the video card.

    Added in the second edition
    This example is extremely simplified and should be regarded only as a rough overview, a "logical" pipeline: each triangle/pixel goes through these logical steps, but what actually happens differs slightly.

    Here is an example of the steps the hardware performs for a single triangle:



    The image is rendered by solving tens and hundreds of thousands of such tasks, drawing millions of pixels on the screen. And all of this should (one hopes) fit into at least 30 frames per second.
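The per-triangle steps can be sketched as plain functions. This is only the "logical" pipeline described above, drastically simplified: a vertex stage, a naive rasterizer (here just the triangle's bounding box; a real one tests actual coverage), and a pixel stage.

```python
# A very simplified "logical" pipeline for one triangle.
# Real hardware reorders and parallelizes all of this.

def vertex_shader(v, offset):
    # transform each vertex (here: a trivial 2D translation)
    return (v[0] + offset[0], v[1] + offset[1])

def rasterize(tri):
    # yield the integer pixel positions in the triangle's bounding box
    # (a real rasterizer tests per-pixel coverage)
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for x in range(int(min(xs)), int(max(xs)) + 1):
        for y in range(int(min(ys)), int(max(ys)) + 1):
            yield (x, y)

def pixel_shader(pixel):
    # decide the pixel's final color (here: a constant red)
    return (pixel, (255, 0, 0))

triangle = [(0, 0), (2, 0), (0, 2)]
transformed = [vertex_shader(v, (1, 1)) for v in triangle]  # vertex stage
pixels = [pixel_shader(p) for p in rasterize(transformed)]  # raster + pixel stage
print(len(pixels))  # 9
```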

    Modern CPUs have 6-8 cores, while GPUs have several thousand (not as powerful as CPU cores, but powerful enough to churn through a pile of vertices and other data).

    Book two is devoted to the details of how the GPU is organized at the high and low level.

    When information (for example, a bunch of vertices) enters the pipeline, several cores share the work of transforming the vertices into the final image, so many of these elements are processed simultaneously (in parallel).
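As a rough analogy for this parallelism (a real GPU runs thousands of tiny cores; here a thread pool stands in for them), independent per-vertex work can be farmed out to workers:

```python
# Analogy only: a thread pool splitting per-vertex work across workers,
# the way GPU cores split vertices among themselves.
from concurrent.futures import ThreadPoolExecutor

def transform(vertex):
    # stand-in for per-vertex work (e.g. model -> screen space)
    return (vertex[0] * 2, vertex[1] * 2)

vertices = [(i, i + 1) for i in range(1000)]
with ThreadPoolExecutor(max_workers=8) as pool:
    # map preserves input order even though work runs in parallel
    transformed = list(pool.map(transform, vertices))

print(transformed[0], transformed[-1])  # (0, 2) (1998, 2000)
```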



    Now we know that the GPU can process information in parallel. But what about communication between the CPU and the GPU? Does the CPU wait for the GPU to finish before sending it new tasks?


    NO!

    Fortunately not! Otherwise a bottleneck would appear whenever the CPU couldn't send new tasks fast enough. The solution is a list of commands into which the CPU adds instructions for the GPU while the GPU is busy processing the previous ones. This list is called the Command Buffer.

    Command buffer

    The command buffer allows the CPU and GPU to work independently of each other. When the CPU wants to render something, it pushes commands into the queue, and when the GPU is free, it takes them from the list (the buffer) and executes them. Commands are taken in order of arrival: first in, first executed.



    By the way, the commands come in different kinds. For example, one command can be a Draw Call, another can change the Render State to new parameters.
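Putting the two command types together, the buffer behaves like a FIFO queue (command names below are illustrative):

```python
# Toy command buffer: the CPU appends commands, the GPU pops them
# in FIFO order, interleaving state changes and draw calls.
from collections import deque

command_buffer = deque()

# CPU side: record commands without waiting for the GPU
command_buffer.append(("SetRenderState", {"texture": "wood.png"}))
command_buffer.append(("DrawCall", "chair"))
command_buffer.append(("SetRenderState", {"texture": "metal.png"}))
command_buffer.append(("DrawCall", "sword"))

# GPU side: execute whenever free, first in - first out
executed = []
while command_buffer:
    executed.append(command_buffer.popleft())

print(executed[0][0], executed[-1][1])  # SetRenderState sword
```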

    Well, that's the first book. Now you have an idea of how information is rendered, what Draw Calls and the Render State are, and how the CPU and GPU communicate with each other.

    The end.
