How Metal Gear Solid V: The Phantom Pain renders a frame

Original author: Adrian Courrèges
  • Translation

The Metal Gear series gained worldwide recognition after Metal Gear Solid became a bestseller on the first PlayStation almost two decades ago.

The game introduced many players to the genre of “tactical espionage action”, a name coined by the creator of the franchise, Hideo Kojima.

My own first time playing as Snake, however, was not in that game but in Ghost Babel, a spin-off for the GBC console, a lesser-known and yet excellent title with impressive depth.

The latest installment of the franchise, Metal Gear Solid V: The Phantom Pain, was released in 2015. Thanks to the Fox Engine, created by Kojima Productions, it takes the whole series to a new level of visual quality.

The analysis below is based on the PC version of the game at maximum graphics settings. Some of the information presented here has already been made public in the “Photorealism Through the Eyes of a FOX” talk at GDC 2013.

Analyzing the frame


Here's a shot taken from the very beginning of the game: this is the prologue in which Snake tries to get out of the hospital.

Snake lies on the floor, trying to blend in with the corpses around him (he is at the bottom of the screen, the one with the bare shoulder).

It is not the most beautiful scene, but it illustrates well the variety of effects the engine can produce.


Two soldiers are standing right in front of Snake. They are looking at a burning silhouette at the end of the corridor.

I will call this mysterious character “the man on fire” so as not to spoil the game's plot.

So, let's see how the frame is rendered!

[Translator's note: as usual, the original article by Adrian contains many animations and interactive elements, so I recommend checking it out for clarity.]

Depth pre-pass


This pass renders only the terrain geometry beneath the hospital, as seen from the player's point of view, and writes the depth information into a depth buffer.

Below you can see the terrain mesh generated from the height map: a 16-bit floating-point texture containing the elevation values (seen from above). The engine splits the height map into tiles, and for each tile it issues a draw call with a flat 16x16 vertex grid. The vertex shader reads the height map and displaces the vertices on the fly to match the stored elevation. The terrain is rasterized in about 150 draw calls.
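
As a rough illustration of the idea (not the engine's actual shader, and with hypothetical names such as heightmap and tileOrigin), here is what the per-vertex displacement could look like:

#include <algorithm>
#include <vector>

// Minimal sketch of heightmap-driven vertex displacement, assuming a flat
// 16x16 grid per tile and a single-channel float heightmap (hypothetical layout).
struct Float3 { float x, y, z; };

Float3 displaceVertex(const std::vector<float>& heightmap, int mapWidth, int mapHeight,
                      float tileOriginX, float tileOriginZ, float tileSize,
                      int gridX, int gridZ)          // gridX, gridZ in [0, 15]
{
    // Position of this grid vertex on the horizontal plane.
    float worldX = tileOriginX + tileSize * (gridX / 15.0f);
    float worldZ = tileOriginZ + tileSize * (gridZ / 15.0f);

    // Map the world position to heightmap texel coordinates (top-down view).
    int texelU = std::min(std::max(static_cast<int>(worldX), 0), mapWidth - 1);
    int texelV = std::min(std::max(static_cast<int>(worldZ), 0), mapHeight - 1);

    // The "vertex shader" simply lifts the vertex to the stored elevation.
    float worldY = heightmap[texelV * mapWidth + texelU];
    return { worldX, worldY, worldZ };
}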


Elevation


Depth Map: 5%


Depth Map: 10%


Depth Map: 40%


Depth Map: 100%

G-buffer generation


MGS V, like many games of its generation, uses deferred rendering. If you have read my analysis of GTA V (translated on Habr), you will notice similar elements. Instead of computing the final lighting of each pixel directly while rendering the scene, the engine first stores the properties of each pixel (albedo color, normals, etc.) in several render targets, collectively called the G-buffer, and combines all this information later.

All of the following buffers are generated at the same time:

G-buffer generation : 25%


Albedo


Normal


Specular


Depths

G-buffer generation: 50%


Albedo


Normal


Specular


Depths

G-buffer generation: 75%


Albedo


Normal


Specular


Depths

G-buffer generation: 100%


Albedo


Normal


Specular


Depths

Here we have a relatively lightweight G-buffer, with three render targets in B8G8R8A8 format (a rough sketch of the packing follows the list below):

  • Albedo map: the RGB channels contain the diffuse albedo color of the meshes, i.e. their intrinsic color with no lighting applied. The alpha channel contains the opacity / “translucency” of the material (usually 1 for fully opaque objects and 0 for grass or foliage).
  • Normal map: the normal vector (x, y, z) of the pixel is stored in the RGB channels. The alpha channel contains a view-dependent roughness factor for certain materials.
  • Specular map:
    • Red: roughness
    • Green: specular intensity
    • Blue: material ID
    • Alpha: translucency for subsurface scattering (this seems to be used only for skin and hair materials)
  • Depth map: a 32-bit float representing the pixel depth. The depth is reversed (meshes close to the camera have a value of 1) to preserve floating-point precision for distant objects and avoid Z-fighting. This matters for open-world games where the draw distance can be very large.
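
As a rough sketch of how such a layout could be filled (my own illustration, not the engine's code; in particular the [-1, 1] to [0, 1] normal encoding is an assumption):

#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the G-buffer packing described above.
// Each B8G8R8A8 target stores four 8-bit channels in [0, 255].
struct RGBA8 { uint8_t r, g, b, a; };

static uint8_t toByte(float v)               // quantize a [0,1] float to 8 bits
{
    return static_cast<uint8_t>(std::clamp(v, 0.0f, 1.0f) * 255.0f + 0.5f);
}

// Target 0: albedo.rgb + translucency in alpha.
RGBA8 packAlbedo(float r, float g, float b, float translucency)
{
    return { toByte(r), toByte(g), toByte(b), toByte(translucency) };
}

// Target 1: normal remapped from [-1,1] to [0,1] (assumed encoding),
// plus a view-dependent roughness factor in alpha.
RGBA8 packNormal(float nx, float ny, float nz, float viewRoughness)
{
    return { toByte(nx * 0.5f + 0.5f), toByte(ny * 0.5f + 0.5f),
             toByte(nz * 0.5f + 0.5f), toByte(viewRoughness) };
}

// Target 2: roughness / specular intensity / material ID / translucency for SSS.
RGBA8 packSpecular(float roughness, float specular, float materialId, float sss)
{
    return { toByte(roughness), toByte(specular), toByte(materialId), toByte(sss) };
}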



The G-buffer is rendered in the following order: first all the opaque meshes of the main scene (characters, hospital building...), then the whole terrain (again) and, finally, the decals.

This is where the depth pre-pass pays off: it makes the second stage (terrain rendering) very fast. Any terrain pixel covered by another mesh will not have the depth value predicted by the pre-pass, so it is discarded immediately, without fetching the terrain textures or writing any data to the G-buffer.
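
On D3D11 this kind of early rejection typically comes from the depth-test configuration of the G-buffer pass. Here is one possible setup (my assumption, not the engine's actual state objects):

#include <d3d11.h>

// Hypothetical depth-stencil setup exploiting the depth pre-pass:
// the pre-pass fills the depth buffer, then the G-buffer pass only
// accepts pixels whose depth matches exactly, so hidden terrain pixels
// are rejected before the pixel shader runs.
HRESULT CreateGBufferDepthState(ID3D11Device* device, ID3D11DepthStencilState** outState)
{
    D3D11_DEPTH_STENCIL_DESC desc = {};
    desc.DepthEnable    = TRUE;
    desc.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;   // depth already written by the pre-pass
    desc.DepthFunc      = D3D11_COMPARISON_EQUAL;        // keep only pixels matching the pre-pass depth
    desc.StencilEnable  = FALSE;
    return device->CreateDepthStencilState(&desc, outState);
}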

Velocity map


To apply the motion blur effect during post-processing, you need to know the velocity of every pixel on the screen.

If the scene were completely static, finding the velocity of a point would be simple: it can be derived from the pixel depth and the difference between the previous and current frame's projection matrices. But things get more complicated with dynamic objects in the frame, such as running characters, because they can move independently of the camera.

This is where the velocity map comes into play: it stores the motion (velocity) vector of every pixel of the current frame.


Velocity map (dynamic meshes)

First, the engine generates a velocity map for the dynamic meshes only, as shown above.

You may notice that in our scene only the man on fire is treated as a dynamic mesh. Even though Snake and the soldiers are technically not static meshes, the engine treats them as such, which is fine in our case because they barely move. This lets the engine skip some work: computing the velocities of animated characters requires performing vertex skinning twice (for the previous and the current pose), which can be quite expensive.

The red channel is used as a mask (it is 1 wherever the character is drawn), and the velocity vector itself is written into the blue and alpha channels. The man on fire is not moving, so his dynamic velocity is (0, 0).

Then the engine computes the velocity of the static geometry from the current depth buffer and the last two projection matrices, and composites it on top of the dynamic velocity map, using the red channel as the blend factor. Here's what the final velocity map (static + dynamic) looks like:
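
A common way to compute the static part, and roughly what I assume happens here, is to reproject each pixel with the previous frame's view-projection matrix. A minimal sketch (matrix layout and names are my assumptions):

// Minimal sketch of static-geometry velocity: reconstruct the pixel's clip-space
// position from its depth, reproject it with the previous frame's view-projection,
// and take the screen-space difference.
struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };
struct Mat4   { float m[16]; };                      // row-major

static Float4 mul(const Mat4& M, const Float4& v)
{
    return { M.m[0]*v.x  + M.m[1]*v.y  + M.m[2]*v.z  + M.m[3]*v.w,
             M.m[4]*v.x  + M.m[5]*v.y  + M.m[6]*v.z  + M.m[7]*v.w,
             M.m[8]*v.x  + M.m[9]*v.y  + M.m[10]*v.z + M.m[11]*v.w,
             M.m[12]*v.x + M.m[13]*v.y + M.m[14]*v.z + M.m[15]*v.w };
}

// uv in [0,1], depth as stored in the depth buffer (it just has to match
// the convention the matrices expect).
Float2 staticVelocity(Float2 uv, float depth,
                      const Mat4& invCurrentViewProj, const Mat4& previousViewProj)
{
    // Clip-space position of the pixel in the current frame.
    Float4 clip = { uv.x * 2.0f - 1.0f, 1.0f - uv.y * 2.0f, depth, 1.0f };

    // Back to world space...
    Float4 world = mul(invCurrentViewProj, clip);
    world.x /= world.w; world.y /= world.w; world.z /= world.w; world.w = 1.0f;

    // ...then forward with last frame's view-projection.
    Float4 prevClip = mul(previousViewProj, world);
    Float2 prevUV = { (prevClip.x / prevClip.w) * 0.5f + 0.5f,
                      0.5f - (prevClip.y / prevClip.w) * 0.5f };

    // Screen-space motion of this (assumed static) pixel between the two frames.
    return { uv.x - prevUV.x, uv.y - prevUV.y };
}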


Velocity map (static + dynamic)

Don't pay too much attention to the noise: there is almost no motion in this scene (the camera slowly zooms in on the man on fire), so all the pixels have near-zero velocity, and what you see are rounding errors from storing the components in 8-bit channels. I boosted the colors to make the image more readable, and I also swapped the green and alpha channels; the real buffer stores the velocity in the blue and alpha channels.

Screen Space Ambient Occlusion


The SSAO pass adds a bit of darkening in areas that receive less ambient light, such as narrow holes or creases. Interestingly, the Fox Engine performs two separate SSAO passes, using different algorithms, and combines the results in a final pass.

Line-integral SSAO


Line-integral SSAO (LISSAO) is an ambient occlusion technique used by Avalanche Software in Disney's Toy Story 3.

Despite its intimidating name, the algorithm is quite straightforward and well explained in this SIGGRAPH 2010 talk: for each pixel of the scene we consider a sphere centered on it; the spherical volume is then divided into several line-shaped sub-volumes. The occlusion factor of each sub-volume is computed from a single depth-map fetch, and the total occlusion of the sphere is simply a weighted sum of the sub-volume factors.

Here the Fox Engine uses two pairs of symmetric samples, which means five depth-buffer taps per pixel in total.


RGB: linear depth


Alpha: LISSAO

The computation is performed at half resolution, and the result is stored in an RGBA8 texture: the alpha channel contains the actual ambient occlusion result, while the linear depth is stored in the RGB channels (using a float-to-RGB encoding similar to this technique).

Because of the small number of samples, the result in the alpha channel is quite noisy; the SSAO map is blurred later with a depth-aware filter. Since the linear depth is stored in the RGB channels, all the data needed for the blur can be fetched in a single read.
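
To make the line-integral idea more concrete, here is a minimal sketch of one pixel's evaluation, not the engine's shader: each sample offset defines a chord through the pixel's sphere, the depth fetched at that offset decides how much of the chord is occluded, and the contributions are summed weighted by chord length. Offsets, radius and weighting are my assumptions.

#include <algorithm>
#include <cmath>
#include <functional>

// Minimal line-integral ambient-occlusion sketch for one pixel.
// sampleLinearDepth(du, dv) is a placeholder returning the linear view-space
// depth of the neighbor at screen offset (du, dv).
float lineIntegralAO(float centerDepth, float sphereRadius, float screenRadius,
                     const std::function<float(float, float)>& sampleLinearDepth)
{
    // Two symmetric pairs plus the center: five taps in total.
    const float offsets[5][2] = { {0.0f, 0.0f},
                                  {+0.5f, 0.0f}, {-0.5f, 0.0f},
                                  {0.0f, +0.5f}, {0.0f, -0.5f} };
    float occluded = 0.0f;
    float totalWeight = 0.0f;

    for (const auto& o : offsets)
    {
        // Lateral distance of this line from the sphere center, in [0, 1].
        float lateral = std::sqrt(o[0] * o[0] + o[1] * o[1]);
        // Half-length of the chord this line cuts through the sphere.
        float halfChord = sphereRadius * std::sqrt(std::max(0.0f, 1.0f - lateral * lateral));
        if (halfChord <= 0.0f) continue;

        float neighborDepth = sampleLinearDepth(o[0] * screenRadius, o[1] * screenRadius);

        // The portion of the chord lying behind the neighbor's surface is occluded.
        float occludedLength = std::clamp(centerDepth + halfChord - neighborDepth,
                                          0.0f, 2.0f * halfChord);
        occluded    += occludedLength;               // weighted by occluded chord length
        totalWeight += 2.0f * halfChord;
    }
    // 1 = fully open, 0 = fully occluded.
    return 1.0f - occluded / std::max(totalWeight, 1e-5f);
}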

Scalable Ambient Obscurance



SAO

The second SSAO pass uses a variation of the Scalable Ambient Obscurance technique.

It differs from the “official” SAO in that it does not use any depth mip levels and does not reconstruct normals: it reads the raw depth map directly and works at half resolution, performing 11 taps per pixel (with a different sample-placement strategy, though).

It uses the same contrast function and the same bilateral box filter as the original SAO implementation.

Note that the SAO parameters are tuned so that high-frequency variations (for example on the soldiers' legs) stand out much more than in the LISSAO version.

Just like the LISSAO map, the SAO map is then blurred with two depth-aware passes.

After that, a compute shader combines the LISSAO and SAO maps to produce the final SSAO result:


Final SSAO

Spherical irradiance maps


For global illumination, the Fox Engine relies on local spherical irradiance maps: different zones are defined within the game level, and for each zone a spherical map is generated, approximating the light arriving from every direction.


Spherical irradiance maps

At this stage, all the spherical maps used in our scene are generated one after another, and each of them is stored in a 16x16 tile of an HDR texture atlas. The atlas is shown above: the disk in the middle of each tile is an approximation of what a metallic sphere placed in the middle of the corresponding irradiance zone would reflect.

So how are these spherical maps generated? They are computed from spherical harmonics. The math behind them is quite intimidating, but in essence spherical harmonics are a way to encode a 360-degree signal into a set of coefficients (usually nine), which gives reasonably good accuracy (second-order SH).

From these nine numbers alone you can approximately reconstruct the value of the signal in any direction.

If you are familiar with the concept of how the Fourier transform can split a signal into sinusoidal components, then the situation is pretty similar, except that we decompose the signal into functions on the surface of the sphere.

Where do these coefficients come from? They are pre-computed: I assume the environment of each zone marked out by the level designers is captured into a cube map, which is then converted into an irradiance cube map and encoded into spherical harmonics coefficients that the engine reads at runtime.

One may wonder: why not use the cube maps themselves for the lighting? It would work, irradiance cube maps can be used, but they have their drawbacks. The main one is the extra memory needed to store the six faces of a cube map, whereas spherical harmonics bring the cost down to just nine RGB values per map. That saves a lot of memory and GPU bandwidth, which matters when you have to deal with dozens of maps in a scene.

All these spherical maps are generated every frame from the previously baked spherical harmonics coefficients and the current position and orientation of the player's camera.
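
For reference, here is the standard way to evaluate diffuse irradiance from nine second-order SH coefficients (the formula from Ramamoorthi and Hanrahan's irradiance-map paper); this illustrates the general technique, it is not code taken from the Fox Engine:

// Evaluate diffuse irradiance in direction (x, y, z) from 9 second-order SH
// coefficients, following Ramamoorthi & Hanrahan, "An Efficient Representation
// for Irradiance Environment Maps". One float per coefficient here; a real
// engine stores RGB triples.
// Coefficient order: L00, L1-1, L10, L11, L2-2, L2-1, L20, L21, L22.
float shIrradiance(const float L[9], float x, float y, float z)
{
    const float c1 = 0.429043f, c2 = 0.511664f, c3 = 0.743125f,
                c4 = 0.886227f, c5 = 0.247708f;

    return c4 * L[0]                                             // constant band
         + 2.0f * c2 * (L[3] * x + L[1] * y + L[2] * z)          // linear band
         + c3 * L[6] * z * z - c5 * L[6]                         // quadratic band
         + c1 * L[8] * (x * x - y * y)
         + 2.0f * c1 * (L[4] * x * y + L[7] * x * z + L[5] * y * z);
}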

Diffuse Lighting (Global Illumination)


Time to apply all the irradiance maps we just generated! Each zone affected by an irradiance map is sent to the GPU for rasterization. Typically each draw call (one per irradiance map) submits a box mesh representing the map's volume of influence in the world; the idea is that it touches all the pixels the map can possibly affect.

The diffuse lighting is computed into a half-resolution HDR texture, reading from the normals, the depths and the irradiance maps.


Normal


Depths


Illumination


Diffuse Lighting (GI): 15%


Diffuse Lighting (GI): 30%


Diffuse Lighting (GI): 80%


Diffuse lighting (GI): 100%

The process is repeated for each irradiance map, with additive blending of the new fragments on top of the old ones.

Once all the lighting contributed by the global illumination has accumulated in the diffuse buffer, it is upscaled from half to full resolution. Note that the upscaling is not a naive bilinear filter: it is a bilateral filter that reads the half-resolution buffer and, more importantly, the original full-resolution depth map (to weight the contribution of the neighboring color pixels), so the final result still has clean, sharp edges around mesh boundaries. Visually it looks as if we had been working at full resolution the whole time!
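
A minimal sketch of such a depth-aware (bilateral) upsample for one full-resolution pixel; the weighting function and the depthSharpness value are assumptions, not the engine's:

#include <cmath>

// Depth-aware upsampling sketch: the four half-resolution neighbors are blended
// with bilinear weights, but each weight is scaled down when the neighbor's
// (low-res) depth differs too much from the full-resolution depth of the pixel.
struct Color { float r, g, b; };

Color bilateralUpsample(float fullResDepth,
                        const float bilinearWeights[4],       // standard 2x2 weights
                        const Color lowResColor[4],
                        const float lowResDepth[4],
                        float depthSharpness = 32.0f)
{
    Color result = { 0.0f, 0.0f, 0.0f };
    float totalWeight = 0.0f;

    for (int i = 0; i < 4; ++i)
    {
        // Penalize neighbors that belong to a different surface (depth mismatch).
        float depthDiff   = std::fabs(lowResDepth[i] - fullResDepth);
        float depthWeight = 1.0f / (1.0f + depthSharpness * depthDiff);
        float w = bilinearWeights[i] * depthWeight;

        result.r += lowResColor[i].r * w;
        result.g += lowResColor[i].g * w;
        result.b += lowResColor[i].b * w;
        totalWeight += w;
    }
    result.r /= totalWeight; result.g /= totalWeight; result.b /= totalWeight;
    return result;
}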


2x upscale (no filtering)


2x bilinear upscale


2x bilateral upscale

Light sources that do not cast shadows


After computing all this static lighting from the global illumination, it is time to add the dynamic lighting contributed by point and directional light sources. We now work at full resolution and render the volume of influence of each light source in the scene, one after another. For now, only the lights that do not cast shadows are rendered:


Diffuse lighting: 5%


Diffuse lighting: 30%


Diffuse lighting: 60%


Diffuse lighting: 100%

In fact, while the diffuse lighting buffer is being updated, another full-resolution HDR render target is rendered at the same time: the specular lighting buffer. Each light-source draw call shown above actually writes simultaneously into both the diffuse and the specular lighting buffers.
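
As an illustration of writing both buffers at once, here is a generic Lambert + Blinn-Phong sketch; it is only a stand-in for whatever shading model the engine really uses:

#include <cmath>

// Generic sketch: one light source contributing to both lighting buffers at once.
// Vectors are assumed normalized.
struct Vec3 { float x, y, z; };

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

void accumulateLight(Vec3 N, Vec3 L, Vec3 V, Vec3 lightColor, float attenuation,
                     float specularIntensity, float glossiness,
                     Vec3& diffuseOut, Vec3& specularOut)     // the two render targets
{
    float nDotL = std::fmax(dot3(N, L), 0.0f);

    // Half vector for Blinn-Phong.
    Vec3 H = { L.x + V.x, L.y + V.y, L.z + V.z };
    float len = std::sqrt(dot3(H, H));
    if (len > 0.0f) { H.x /= len; H.y /= len; H.z /= len; }
    float spec = std::pow(std::fmax(dot3(N, H), 0.0f), glossiness) * specularIntensity;

    // Additive blending into the diffuse and specular buffers.
    diffuseOut.x  += lightColor.x * nDotL * attenuation;
    diffuseOut.y  += lightColor.y * nDotL * attenuation;
    diffuseOut.z  += lightColor.z * nDotL * attenuation;
    specularOut.x += lightColor.x * spec  * nDotL * attenuation;
    specularOut.y += lightColor.y * spec  * nDotL * attenuation;
    specularOut.z += lightColor.z * spec  * nDotL * attenuation;
}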


Specular lighting: 5%


Specular lighting: 30%


Specular lighting: 60%


Specular lighting: 100%

Shadow maps


One can guess what comes after the lights without shadows: shadow-casting light sources!

Such lights are much more expensive to compute, which is why their number in games is quite limited. The reason for the high cost is that each of them requires generating a shadow map.

In essence, this means re-rendering the scene from the point of view of each light source. Here we have two directional lights on the corridor ceiling shining downwards, and a 4k x 4k shadow map is generated for each of them.
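
For reference, the basic shadow-map test such a light performs per pixel looks roughly like this (a generic sketch with an assumed bias, without the engine's filtering):

#include <functional>

// Generic shadow-map test for one pixel: transform the pixel into the light's
// clip space, fetch the depth stored in the shadow map at that location and
// compare. sampleShadowMap(u, v) is a placeholder; the bias value is assumed.
float shadowFactor(float lightClipX, float lightClipY, float lightClipZ, float lightClipW,
                   const std::function<float(float, float)>& sampleShadowMap,
                   float bias = 0.0015f)
{
    // Perspective divide, then remap from [-1, 1] to [0, 1] texture coordinates.
    float u = (lightClipX / lightClipW) * 0.5f + 0.5f;
    float v = 0.5f - (lightClipY / lightClipW) * 0.5f;
    float pixelDepth = lightClipZ / lightClipW;

    if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f)
        return 1.0f;                                  // outside the map: fully lit

    float storedDepth = sampleShadowMap(u, v);        // closest occluder seen by the light
    return (pixelDepth - bias <= storedDepth) ? 1.0f : 0.0f;   // 1 = lit, 0 = in shadow
}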


Two shadow maps

Shadow-casting light sources


Once the shadow maps have been generated, the lighting from the two directional lights on the ceiling is computed, again updating the diffuse and specular buffers at the same time. Finally, the sunlight is applied (using a spherical map generated earlier from the spherical harmonics).


Diffuse lighting: 0%


Specular lighting: 0%


Diffuse lighting: 30%


Specular lighting: 30%


Diffuse lighting: 70%


Specular lighting: 70%


Diffuse lighting: 100%


Specular lighting: 100%

Combining the lighting and tone mapping


At this stage all the previously generated buffers are combined: the albedo color is multiplied by the diffuse lighting, then the specular lighting is added on top. The result is multiplied by the SSAO value and interpolated with the fog color (fetched from a fog lookup texture using the current pixel depth). Finally, tone mapping converts the result from HDR to LDR. The alpha channel stores an extra piece of information: the original HDR luminance of each pixel.
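
Put together, the composite described above boils down to something like this per pixel (a sketch of the described order, not the engine's shader; the fog factor and the tone-map operator are treated as inputs, and the luminance weights are an assumption):

// Per-pixel composite sketch. toneMap is the HDR->LDR operator (e.g. the one
// described in the next section); fogFactor comes from the fog lookup texture
// and the pixel depth.
struct RGB { float r, g, b; };

RGB combineLighting(RGB albedo, RGB diffuse, RGB specular, float ssao,
                    RGB fogColor, float fogFactor,
                    float (*toneMap)(float), float& outHdrLuminance)
{
    // albedo * diffuse + specular, attenuated by the ambient occlusion.
    RGB hdr = { (albedo.r * diffuse.r + specular.r) * ssao,
                (albedo.g * diffuse.g + specular.g) * ssao,
                (albedo.b * diffuse.b + specular.b) * ssao };

    // Blend toward the fog color.
    hdr.r += (fogColor.r - hdr.r) * fogFactor;
    hdr.g += (fogColor.g - hdr.g) * fogFactor;
    hdr.b += (fogColor.b - hdr.b) * fogFactor;

    // The HDR luminance is kept around (stored in the alpha channel) for the
    // bloom bright-pass later; the Rec.709 weights here are an assumption.
    outHdrLuminance = 0.2126f * hdr.r + 0.7152f * hdr.g + 0.0722f * hdr.b;

    // HDR -> LDR, channel by channel.
    return { toneMap(hdr.r), toneMap(hdr.g), toneMap(hdr.b) };
}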


Depth

Albedo

Diffuse lighting

Specular lighting

SSAO




Lighting Combination

By the way, what does the tone mapping operator used in MGS V look like? From 0 up to a certain threshold (0.6) it is purely linear and returns the channel value unchanged; above the threshold, the values slowly approach a horizontal asymptote.

Here is the function applied to each RGB channel, where $A = 0.6$ and $B = 0.45333$:

$$ToneMap(x) = \begin{cases} x & \text{if } x \le A \\[2ex] \min\left(1,\ A + B - \dfrac{B^2}{x - A + B}\right) & \text{if } x > A \end{cases}$$
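
Translated directly into code, with A and B as above, the operator would read:

#include <algorithm>

// The tone-mapping curve above, applied per RGB channel:
// linear below the threshold A, then rolling off toward a horizontal asymptote.
float ToneMap(float x)
{
    const float A = 0.6f;       // linear threshold
    const float B = 0.45333f;
    if (x <= A)
        return x;
    return std::min(1.0f, A + B - (B * B) / (x - A + B));
}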



So tone mapping has been applied, along with gamma correction for the conversion from linear space to sRGB. In other games this would often mean we have reached the final stages of frame rendering.

But is that really the case here? Not at all, we are just getting started! Interestingly, the Fox Engine performs tone mapping quite early and keeps working in LDR space for the rest of the frame, including the passes for transparent objects, reflections, depth of field and so on.

Emissive and transparent objects


In this pass the engine draws all the objects with an emissive component, such as the green “Exit” sign or the hot spots of the flames on the man on fire. It also draws transparent objects such as glass.


Emissive and transparent objects: before the pass


Emissive and transparent objects: after the pass

It is hard to see in the screenshot above, but the glass also receives reflections from the environment.

All the environment data comes from a 256x256 HDR cube map, shown below (also called a reflection probe).


Reflection probe

The cube map is not dynamic: it is pre-baked and used as-is during the game, so it contains no dynamic meshes. Its job is to provide “good enough” reflections of the static environment. Several probes are placed in different spots of the level. The total number of cube maps in the whole game is huge: not only are probes needed for many locations, there are also different versions of each probe depending on the time of day/night. Weather has to be taken into account as well, so for each time of day and each spot the engine generates four cube maps (sunny, cloudy, rainy and stormy weather). The game ends up handling an impressive number of combinations.

A short clip was shown at GDC 2013 about how the engine generates such lighting probes.

Screen Space Reflections


This pass generates an image of the reflections in the scene using only the pixel information rendered in the previous passes. Ray tracing is performed in screen space at half resolution: rays are cast for each pixel of the screen, their direction computed from the depth buffer (which gives the pixel position) and from the normal. Each ray is then tested for intersection by sampling the depth buffer at four equidistant points along the ray. If a hit is found, the color of the pixel at the hit location is used as the reflection color, modulated by the roughness of the original pixel.
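
A minimal sketch of such a four-tap screen-space march; the reflection direction is assumed to be already projected to screen space, and the hit threshold is my assumption:

#include <functional>

// Screen-space reflection sketch: march 4 equidistant samples along the
// reflected ray and compare against the depth buffer.
// sampleDepth(u, v) is a placeholder depth-buffer fetch.
struct Hit { bool found; float u, v; };

Hit traceScreenSpaceRay(float startU, float startV, float startDepth,
                        float stepU, float stepV, float stepDepth,   // per-step increments
                        const std::function<float(float, float)>& sampleDepth,
                        float thickness = 0.01f)
{
    for (int step = 1; step <= 4; ++step)             // four equidistant taps
    {
        float u = startU + stepU * step;
        float v = startV + stepV * step;
        float rayDepth = startDepth + stepDepth * step;

        if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f)
            break;                                     // left the screen: no reflection

        float sceneDepth = sampleDepth(u, v);
        // The ray passed just behind the surface stored in the depth buffer: call it a hit.
        if (rayDepth > sceneDepth && rayDepth - sceneDepth < thickness)
            return { true, u, v };                     // reflection color is fetched at (u, v)
    }
    return { false, 0.0f, 0.0f };
}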


SSR color


SSR alpha

Obviously, we lack “global” information: objects outside the screen cannot contribute to the reflections. To make the artifacts less noticeable, the reflection map uses an alpha mask that smoothly fades the reflections out toward the edges of the screen.

The value of SSRs is that they can provide real-time dynamic reflections at reasonably low cost.

The SSR map is later denoised with a Gaussian blur and blended on top of the scene.

Heat distortion, decals and particles


The burning area where the man on fire stands is so hot that it distorts the light passing through it. The effect is achieved with several draw calls, each of which copies the whole render target and applies a distortion, locally stretching the pixels in some direction.

It is especially noticeable on the first arch connecting the left wall to the ceiling.

After that, decals are applied (for example the liquid on the floor), and finally particles are drawn to render the fire and the smoke.


Base


Distortion


Decals


Particles 30%


Particles 60%


Particles 100%

Bloom



Bright pass

In this step a bloom texture is generated from the original scene. It works at quite a low resolution: the scene is first downscaled by a factor of four, then a bright-pass filter is applied to keep only the brightest pixels, as shown in the image above.

How does the bright-pass filter separate the “dark” pixels from the “bright” ones? We are no longer in HDR space, tone mapping has moved us to LDR space, where it is harder to tell which colors were originally bright.

Recall that the alpha channel of the scene buffer contains the original HDR luminance of each pixel before tone mapping: the filter uses this information to determine how “bright” a pixel is.
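
So the bright pass boils down to something like this per pixel (a sketch; the threshold and the scaling are assumptions, not the engine's values):

#include <algorithm>

// Bright-pass sketch: the LDR color of the pixel is kept only to the extent
// that the original HDR luminance stored in the alpha channel exceeds a threshold.
struct RGBA { float r, g, b, a; };     // a = pre-tone-mapping HDR luminance

RGBA brightPass(RGBA scenePixel, float threshold = 1.0f)
{
    // How far above the threshold the original luminance was.
    float excess = std::max(scenePixel.a - threshold, 0.0f);
    float scale  = excess / std::max(scenePixel.a, 1e-4f);

    return { scenePixel.r * scale, scenePixel.g * scale, scenePixel.b * scale, scenePixel.a };
}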


Lens flare

In the Fox Engine, bloom is not just about bright pixels bleeding their color around: it also includes lens flares and chromatic aberration, which are procedurally generated from the bright-pass buffer. Our dark scene has no strong light source that would make the lens flares stand out, so they are barely visible, but you can get an idea of what they look like from the image above, where I artificially boosted the colors.

The lens flares are composited on top of the bright-pass buffer, and then a blurred, wider-radius version of the buffer is generated. This is done with four successive iterations of Masaki Kawase's blur algorithm.

Kawase's technique achieves a large-radius blur similar to a Gaussian blur, but with better performance.
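
Each Kawase iteration averages four diagonal taps whose offset grows with the iteration index; repeating it a few times while ping-ponging between two buffers approximates a wide Gaussian. A sketch of one iteration (the offset pattern follows the commonly described version of the algorithm, assumed here):

#include <functional>

// One iteration of Kawase blur for the pixel at (u, v).
// sampleColor(u, v) is a placeholder bilinear fetch; texel sizes are in UV units.
struct Color3 { float r, g, b; };

Color3 kawaseBlurTap(float u, float v, int iteration, float texelWidth, float texelHeight,
                     const std::function<Color3(float, float)>& sampleColor)
{
    float offsetU = (iteration + 0.5f) * texelWidth;
    float offsetV = (iteration + 0.5f) * texelHeight;

    Color3 a = sampleColor(u - offsetU, v - offsetV);
    Color3 b = sampleColor(u + offsetU, v - offsetV);
    Color3 c = sampleColor(u - offsetU, v + offsetV);
    Color3 d = sampleColor(u + offsetU, v + offsetV);

    return { (a.r + b.r + c.r + d.r) * 0.25f,
             (a.g + b.g + c.g + d.g) * 0.25f,
             (a.b + b.b + c.b + d.b) * 0.25f };
}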

Bloom

Depth of field


Metal Gear games are known for their long cinematic cutscenes, so it is only natural that the engine tries to mimic the behavior of real cameras as closely as possible with a depth-of-field (DoF) effect: only a certain area appears sharp, while out-of-focus areas appear blurred.

The scene is downscaled to half resolution and converted back from sRGB space to linear space.

Then the engine generates two images corresponding to the “near field” (the area between the camera and the focal plane) and the “far field” (the area beyond the focal plane). The split is based purely on depth (distance from the camera): every pixel closer than the soldiers is copied to the near-field buffer, everything else goes to the far-field buffer.

Each field is processed and blurred separately. The circle of confusion (CoC) of each pixel is computed purely from its depth and the camera configuration (aperture, focal length...). The CoC value indicates how “out of focus” a pixel is: the larger the CoC, the more the pixel spreads its color around.
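
For reference, here is the thin-lens formula usually used for this; it is a generic sketch, the engine's exact formula and units are unknown to me:

#include <cmath>

// Thin-lens circle of confusion (diameter on the sensor) for an object at
// objectDistance when the camera is focused at focusDistance.
// apertureDiameter and focalLength are in the same units as the distances.
float circleOfConfusion(float objectDistance, float focusDistance,
                        float focalLength, float apertureDiameter)
{
    float coc = apertureDiameter
              * std::fabs(objectDistance - focusDistance) / objectDistance
              * focalLength / (focusDistance - focalLength);
    return coc;    // the bigger the value, the larger the bokeh sprite for this pixel
}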

After blurring, two fields are created:


DoF - Near Field

DoF - Far Field


A few words about this “blur” operation: it is actually a fairly expensive operation in which one sprite is created and rendered for every pixel of the scene.

The sprite contains the image of a disk (shown above), and it can be replaced with any shape, for example a hexagon, if you prefer hexagonal bokeh.

The sprite is centered on the pixel that spawned it, has the same color as the pixel, and is scaled according to the pixel's circle of confusion. The idea is for the pixel to “spread its color around” via the disk: the more out of focus a pixel is, the bigger the sprite it spawns.


All the sprites are drawn on top of each other with additive blending.

This technique is called “sprite scattering”; it is used in many games, for example in Lost Planet, The Witcher series, and in UE4's bokeh DoF post-processing.

After generating the blurred near and far fields, we simply blend them on top of the original scene:


DoF: before


DoF: after

This technique works and produces nice results; however, at high resolution in almost fully defocused scenes it can become very slow: huge sprites overlapping each other lead to an insane amount of overdraw.

So how does the Fox Engine keep the cost of this effect under control?

Well, I actually oversimplified earlier when I said each field is represented by a single half-resolution accumulation buffer: there is not just one buffer, there are several smaller ones as well, at 1/4, 1/8 and 1/16 of the resolution. Depending on a pixel's circle of confusion, the sprite it spawns ends up in one of these buffers: large sprites are typically rendered into the low-resolution buffers to reduce the total number of pixels they touch.

To implement this, the engine processes each buffer level one by one, spawning 100% of the sprites and letting the vertex shader “kill” the sprites that do not belong to the current level. From the circle of confusion of the source pixel, the vertex shader knows how big the sprite will be, and if that size does not match the current level, it simply throws the sprite outside the view frustum by assigning it a negative depth value.

The sprite geometry then never reaches the rasterization stage and the pixel shader.

Then all of these buffers are combined to create a single half-resolution field.

Lens dirt and more lens flares



Dirt and glare 30%


Dirt and glare 60%


Dirt and glare 100%

Snake is in a rough spot, in a hostile environment with explosions going off around him; debris flies across the camera lens and dirties it.

To convey this, a bit of lens dirt is artificially composited over the image; the dirt itself is rendered from sprites.

Then we need more lens flares! Yes, we have already added some, but you can never have too many, right? This time we add anamorphic lens artifacts: long vertical streaks of light in the middle of the screen, caused by the bright flames. They too are generated purely from sprites.

All these steps take about a dozen draw calls rendering into a half-resolution buffer, which is then composited over the scene with additive blending.


Dirt and glare: before


Dirt and glare: after

Motion blur


Remember the velocity buffer we generated at the very beginning of the frame? It is finally time to use it and apply motion blur to the scene. The technique used by the Fox Engine was inspired by a 2012 publication.

It generates a low-resolution map of square tiles containing the maximum pixel velocity per tile, then locally stretches the scene image along the direction of the velocity vectors to give an impression of motion.
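
The tile step amounts to taking, for each tile, the velocity with the largest magnitude (a sketch; the tile size used by the engine is unknown to me):

#include <vector>

// TileMax sketch: for each tile of the velocity buffer, keep the velocity with
// the largest magnitude. The later blur pass stretches pixels along these
// dominant directions. The tile size here is an assumption.
struct Velocity { float x, y; };

std::vector<Velocity> tileMaxVelocity(const std::vector<Velocity>& velocity,
                                      int width, int height, int tileSize = 20)
{
    int tilesX = (width  + tileSize - 1) / tileSize;
    int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<Velocity> tiles(static_cast<size_t>(tilesX) * tilesY, { 0.0f, 0.0f });

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            const Velocity& v = velocity[static_cast<size_t>(y) * width + x];
            Velocity& best = tiles[static_cast<size_t>(y / tileSize) * tilesX + (x / tileSize)];
            if (v.x * v.x + v.y * v.y > best.x * best.x + best.y * best.y)
                best = v;
        }
    return tiles;
}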

In our scene, the motion blur effect is difficult to visualize because there is almost no movement in it.

Color correction


Color correction is applied to adjust the final colors of the scene: the artists balance the colors, apply filters and so on. It all comes down to an operation that takes a pixel's original RGB value and maps it to a new RGB value; the work happens in LDR space.

In some cases you can come up with a mathematical function that performs the transformation (that is exactly what the tone mapping operator does when converting from HDR to LDR), but usually artists need much finer control over the color transformation, and no mathematical function can provide it.

In that case we have to bite the bullet and go for brute force: use a look-up table (LUT) mapping every possible RGB value to another RGB value.

Sounds crazy? Let's do the math: there are 256 x 256 x 256 possible RGB values, so we would have to store more than 16 million mappings!

It would be hard to feed all of that to a pixel shader efficiently... unless we resort to a little trick.

The trick is to treat the RGB space as a 3D cube defined by three axes: red, green and blue.


RGB cube
We take the cube on the left, slice it into 16 “slices” and store each slice in a 16 x 16 texture.

The result is the 16 slices shown on the right.

LUT

So we have “discretized” our cube into 16 x 16 x 16 voxels, i.e. 4096 mappings in total, a tiny fraction of the 16 million. How do we recover the in-between values? Through linear interpolation: for the RGB color we want, we simply look at its 8 nearest neighbors in the cube, whose exact mappings we know.

In practice this means finding the two slices closest to the blue value, and then, within each slice, the four pixels closest to the red and green values.

Then it is just a matter of linear interpolation: a weighted average of the eight colors, with the distances driving the weights. Interpolating from such a reduced set of values works well in practice, because color correction usually consists of low-frequency variations.

You can store the 16 slices as a 3D texture on the GPU, and the shader code becomes trivial: just fetch at the desired 3D coordinates, the hardware performs trilinear filtering of the eight nearest texels and returns the right value. Fast and easy.
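
Written out on the CPU, the same lookup would read as follows (a sketch of the interpolation just described; the blue-major layout of the flat array is my assumption):

// Trilinear lookup into a 16x16x16 color-grading LUT stored as a flat array of
// RGB triples. Input and output colors are in [0, 1].
struct LutColor { float r, g, b; };

LutColor applyLut(const LutColor lut[16 * 16 * 16], float r, float g, float b)
{
    // Continuous coordinates inside the 16^3 grid.
    float fr = r * 15.0f, fg = g * 15.0f, fb = b * 15.0f;
    int r0 = (int)fr, g0 = (int)fg, b0 = (int)fb;
    int r1 = r0 < 15 ? r0 + 1 : 15, g1 = g0 < 15 ? g0 + 1 : 15, b1 = b0 < 15 ? b0 + 1 : 15;
    float tr = fr - r0, tg = fg - g0, tb = fb - b0;

    auto at   = [&](int ri, int gi, int bi) { return lut[(bi * 16 + gi) * 16 + ri]; };
    auto lerp = [](float a, float b2, float t) { return a + (b2 - a) * t; };
    auto mix  = [&](LutColor a, LutColor c, float t) {
        return LutColor{ lerp(a.r, c.r, t), lerp(a.g, c.g, t), lerp(a.b, c.b, t) };
    };

    // Interpolate along red, then green, then blue: 8 taps in total.
    LutColor c00 = mix(at(r0, g0, b0), at(r1, g0, b0), tr);
    LutColor c10 = mix(at(r0, g1, b0), at(r1, g1, b0), tr);
    LutColor c01 = mix(at(r0, g0, b1), at(r1, g0, b1), tr);
    LutColor c11 = mix(at(r0, g1, b1), at(r1, g1, b1), tr);
    return mix(mix(c00, c10, tg), mix(c01, c11, tg), tb);
}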

So we have a way to encode the color mapping with this 16-slice lookup table, but how does an artist actually author such a table?

It is quite simple: just lay all the slices side by side to get an image like this one:


LUT texture 256x16

Then we take a screenshot of the game in the scene that needs color correction, paste the LUT image into a corner of the screenshot, hand the image over to the artists and let them work their magic. They open it in an image editor, make the changes they want, and send us back the corrected image. The LUT embedded in the corner of the picture now reflects the new RGB mapping.

All that is left is to extract the modified 256x16 LUT and feed it straight to the game engine.


Color correction + Bloom: before (LUT: )


Color correction + Bloom: after (LUT: )

It is also in this step, right before the color correction is applied, that the bloom buffer generated earlier is composited over the scene.

Antialiasing


The mesh edges follow the pixel grid of the frame buffer too closely, producing harsh, jagged borders that look unnatural.

This is a limitation of deferred rendering: each pixel holds only a single sample of information. With forward rendering the problem is less pronounced because you can use MSAA, which produces several color samples per pixel and therefore smoother transitions along the edges.

The Fox Engine fixes the aliasing along the edges with an FXAA post-process: a pixel shader tries to detect and smooth out aliased edges based on the colors of the neighboring pixels.

Notice how the “staircase” pattern, clearly visible along the edge of the railing, is smoothed out in the final result.


FXAA: before


FXAA: after

Finishing touches


So, after the antialiasing, are we done? Almost, but not quite! In this last step, artists have the option of applying masks to certain areas of the image to darken or brighten the pixels; it is just a series of sprites drawn on top of the scene. It is interesting to see that the Fox Engine gives artists control over the image even at the very last stage of rendering.


Final touches: 0%


Final touches: 30%


Final touches: 60%


Final touches: 100%

And we are done! The frame can now be sent to the monitor, and the GPU starts the whole process again from scratch to generate a brand-new frame.

Some metrics for this scene: 2331 draw calls, 623 textures and 73 render targets.

Bonus Notes


Let's look at the buffers in action


Here is a short clip showing the different buffers that I talked about earlier (G-buffer, SSAO, etc.).


If you are curious about how the video was recorded: analyzing this game took a lot more effort than the previous ones. None of the usual graphics debuggers could be used, because MGS V exits as soon as it detects a DLL injector hooking certain D3D functions. I had to roll up my sleeves and fork an old version of ReShade, which I extended with my own hooks. Thanks to them I could dump buffers, textures, shader binaries (DXBC containing all the debug data)...

These hooks also made it quite simple to create the video shown above: I could just copy any intermediate buffer into the final frame buffer right before it was presented to the monitor.

Ishmael's true face


Ishmael is the mysterious patient lying in the bed next to Snake who helps him escape from the hospital. His head is wrapped in bandages that hide his real identity. Curious to know what he actually looks like?

Well, let's find out! Here is the diffuse albedo buffer right before and right after the bandages are rendered.

There are no real spoilers here, but just in case, I hid the second image.



Ishmael without the bandages

This is... not exactly what the face should look like.

But let's go one step further!

Instead of just dumping the albedo map in the middle of the G-buffer generation like before, it would be nicer to see Ishmael's face in the game itself, by preventing the engine from rendering the bandages.

This is quite easy to do: with my hooks I can stop certain calls from being submitted to the GPU. After a bit of experimenting, I found the two draw calls that render the bandages and blacklisted them.

If you want to repeat the experiment yourself, here are the necessary calls:

Bandage draw calls:
ID3D11DeviceContext::DrawIndexed( 0x591, 0xF009, 0x0 );
ID3D11DeviceContext::DrawIndexed( 0xB4F, 0xFA59, 0x0 );


Below is a video of the game showing Ishmael's true face. I use a hotkey to toggle the rendering of the bandages; the transition is progressive, gradually reducing the number of triangles used to draw the bandages until they disappear completely.


It is worth noting that this draw-call-blocking trick does not always work: sometimes the original mesh simply contains no data for the hidden surfaces. That makes sense as a performance optimization: less useless geometry is sent to the GPU!

For example, the Third Child's model has no triangles under the gas mask on the lower part of his face: we will never see his nose or mouth, simply because they do not exist.

Links


This concludes the analysis of MGS V; I hope you now have a better understanding of how the Fox Engine renders a frame.

If you want to know more, below I provide links to additional materials:

