Reverse engineering the rendering of The Witcher 3

Recently, I started digging into the rendering of The Witcher 3. This game uses amazing rendering techniques. On top of that, it is great in terms of story / music / gameplay.

In this article, I will talk about the solutions used to render The Witcher 3. It will not be as comprehensive as Adrian Courrèges's graphics study of GTA V, at least for now.

We will start by reverse engineering the tone mapping.

Part 1: Tone Mapping

In most modern AAA games, tone mapping is one of the rendering stages.

Let me remind you that in real life the range of brightness is fairly wide, while on computer screens it is very limited (8 bits per channel, which gives us 0-255). This is where tone mapping comes to the rescue: it lets us fit the wider range into the limited one. The process usually has two inputs: a floating-point HDR image whose color values exceed 1.0, and the average luminance of the scene (the latter can be calculated in several ways, even taking eye adaptation into account to simulate the behavior of human eyes, but that does not matter here).

The next (and last) stage is to obtain the exposure, apply it to the color and process the result with a tone mapping curve. This is where things get confusing, because new concepts appear, such as the “white point” and “middle gray”. There are at least a few popular curves, and some of them are discussed in Matt Pettineo's article “A Closer Look at Tone Mapping”.

To be honest, I have always had problems implementing tone mapping correctly in my own code. There are at least a few different examples on the web, which turned out to be useful to me... to some extent. Some of them take the HDR luminance / white point / middle gray into account, others do not, so they do not really help. I wanted to find a “battle-tested” implementation.

We will work in RenderDoc with a capture of this frame from one of the main Novigrad quests. All settings are set to maximum:

With a bit of searching, I found the draw call for tone mapping! As I mentioned above, there is an HDR color buffer (texture 0, full resolution) and the average scene luminance (texture 1, 1x1, floating point, computed earlier by a compute shader).

Let's take a look at the pixel shader assembly:

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[17], immediateIndexed
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_input_ps_siv v0.xy, position
dcl_output o0.xyzw
dcl_temps 4
   0: ld_indexable(texture2d)(float,float,float,float) r0.x, l(0, 0, 0, 0), t1.xyzw
   1: max r0.x, r0.x, cb3[4].y
   2: min r0.x, r0.x, cb3[4].z
   3: max r0.x, r0.x, l(0.000100)
   4: mul r0.y, cb3[16].x, l(11.200000)
   5: div r0.x, r0.x, r0.y
   6: log r0.x, r0.x
   7: mul r0.x, r0.x, cb3[16].z
   8: exp r0.x, r0.x
   9: mul r0.x, r0.y, r0.x
  10: div r0.x, cb3[16].x, r0.x
  11: ftou r1.xy, v0.xyxx
  12: mov r1.zw, l(0, 0, 0, 0)
  13: ld_indexable(texture2d)(float,float,float,float) r0.yzw, r1.xyzw, t0.wxyz
  14: mul r0.xyz, r0.yzwy, r0.xxxx
  15: mad r1.xyz, cb3[7].xxxx, r0.xyzx, cb3[7].yyyy
  16: mul r2.xy, cb3[8].yzyy, cb3[8].xxxx
  17: mad r1.xyz, r0.xyzx, r1.xyzx, r2.yyyy
  18: mul r0.w, cb3[7].y, cb3[7].z
  19: mad r3.xyz, cb3[7].xxxx, r0.xyzx, r0.wwww
  20: mad r0.xyz, r0.xyzx, r3.xyzx, r2.xxxx
  21: div r0.xyz, r0.xyzx, r1.xyzx
  22: mad r0.w, cb3[7].x, l(11.200000), r0.w
  23: mad r0.w, r0.w, l(11.200000), r2.x
  24: div r1.x, cb3[8].y, cb3[8].z
  25: add r0.xyz, r0.xyzx, -r1.xxxx
  26: max r0.xyz, r0.xyzx, l(0, 0, 0, 0)
  27: mul r0.xyz, r0.xyzx, cb3[16].yyyy
  28: mad r1.y, cb3[7].x, l(11.200000), cb3[7].y
  29: mad r1.y, r1.y, l(11.200000), r2.y
  30: div r0.w, r0.w, r1.y
  31: add r0.w, -r1.x, r0.w
  32: max r0.w, r0.w, l(0)
  33: div o0.xyz, r0.xyzx, r0.wwww
  34: mov o0.w, l(1.000000)
  35: ret

A few things are worth noting here. First, the loaded luminance is not necessarily used as-is, because it is clamped (the max / min calls) to values chosen by the artists (from the constant buffer). This is convenient because it avoids scene exposures that are too high or too low. The trick seems rather trivial, but I had never done it before. Second, anyone familiar with tone mapping curves will instantly recognize the value “11.2”, because it is the white point value from John Hable's Uncharted 2 tone mapping curve.

The A-F parameters are loaded from the cbuffer.

So, we have three more parameters: cb3_v16.x, cb3_v16.y, cb3_v16.z. We can examine their values:

My guesses:

I believe “x” is a kind of “white scale” or middle gray, because it is multiplied by 11.2 (line 4) and then used as the numerator when calculating the exposure (line 10).

“y” - I called it the “u2 numerator multiplier”, and you will soon see why.

“z” is an “exponentiation parameter”, because it is used in the log / mul / exp triple (which is, in fact, exponentiation).

But take these variable names with a grain of salt!

Also:

cb3_v4.yz - min / max values of the allowed luminance,
cb3_v7.xyz - the A-C parameters of the Uncharted2 curve,
cb3_v8.xyz - the D-F parameters of the Uncharted2 curve.
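
The arithmetic of assembly lines 1-10 is easy to check on the CPU. Here is a minimal Python sketch, assuming the parameter meanings guessed above; the function name and the sample values are made up for illustration and are not read from the game.

```python
import math

WHITE_POINT = 11.2  # literal constant that appears in the assembly


def get_exposure(avg_luminance, min_luminance, max_luminance, middle_gray, pow_param):
    """Mirror of assembly lines 1-10 (parameter names are guesses)."""
    lum = min(max(avg_luminance, min_luminance), max_luminance)  # lines 1-2: clamp
    lum = max(lum, 1e-4)                                         # line 3: avoid zero
    scaled_white_point = middle_gray * WHITE_POINT               # line 4
    ratio = lum / scaled_white_point                             # line 5
    # lines 6-8: log2 / mul / exp2 is simply pow(ratio, pow_param)
    powered = 2.0 ** (math.log2(ratio) * pow_param)
    return middle_gray / (scaled_white_point * powered)          # lines 9-10


# Hypothetical inputs, chosen only to exercise the function.
print(get_exposure(avg_luminance=0.5, min_luminance=0.1, max_luminance=2.0,
                   middle_gray=0.18, pow_param=1.0))
```

One thing the sketch makes visible: when the exponent is 1, the 11.2 factor cancels out and the exposure reduces to middle_gray / luminance.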

Now for the hard part: we will write an HLSL shader that gives us exactly the same assembly code.

This can be very difficult, and the longer the shader, the harder the task. Fortunately, some time ago I wrote a tool that lets you quickly view HLSL -> asm.

Ladies and gentlemen... please welcome D3DShaderDisassembler!

After experimenting with the code, I arrived at the finished HLSL for The Witcher 3 tone mapping:

 cbuffer cBuffer : register (b3)  
 {  
   float4 cb3_v0;  
   float4 cb3_v1;  
   float4 cb3_v2;  
   float4 cb3_v3;  
   float4 cb3_v4;  
   float4 cb3_v5;  
   float4 cb3_v6;  
   float4 cb3_v7;  
   float4 cb3_v8;  
   float4 cb3_v9;  
   float4 cb3_v10;  
   float4 cb3_v11;  
   float4 cb3_v12;  
   float4 cb3_v13;  
   float4 cb3_v14;  
   float4 cb3_v15;  
   float4 cb3_v16, cb3_v17;  
 }  
 Texture2D     TexHDRColor          : register (t0);  
 Texture2D     TexAvgLuminance     : register (t1);  
 struct VS_OUTPUT_POSTFX  
 {  
   float4 Position : SV_Position;  
 };  
 float3 U2Func( float A, float B, float C, float D, float E, float F, float3 x )  
 {  
      return ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)) - E/F;  
 }  
 float3 ToneMapU2Func( float A, float B, float C, float D, float E, float F, float3 color, float numMultiplier )  
 {  
      float3 numerator =  U2Func( A, B, C, D, E, F, color );  
      numerator = max( numerator, 0 );  
      numerator.rgb *= numMultiplier;  
      float3 denominator = U2Func( A, B, C, D, E, F, 11.2 );  
      denominator = max( denominator, 0 );  
      return numerator / denominator;  
 }  
 float4 ToneMappingPS( VS_OUTPUT_POSTFX Input) : SV_Target0  
 {  
      float avgLuminance = TexAvgLuminance.Load( int3(0, 0, 0) );  
      avgLuminance = clamp( avgLuminance, cb3_v4.y, cb3_v4.z );  
      avgLuminance = max( avgLuminance, 1e-4 );  
      float scaledWhitePoint = cb3_v16.x * 11.2;  
      float luma = avgLuminance / scaledWhitePoint;  
      luma = pow( luma, cb3_v16.z );  
      luma = luma * scaledWhitePoint;  
      luma = cb3_v16.x / luma;  
      float3 HDRColor = TexHDRColor.Load( uint3(Input.Position.xy, 0) ).rgb;  
      float3 color = ToneMapU2Func( cb3_v7.x, cb3_v7.y, cb3_v7.z, cb3_v8.x, cb3_v8.y,   
         cb3_v8.z, luma*HDRColor, cb3_v16.y);  
      return float4(color, 1);  
 }

A screenshot from my utility to confirm this:

Voila!

I think this is a fairly accurate implementation of the TW3 tone mapping, at least in terms of assembly code. I have already used it in my framework and it works great!

I said “fairly” because I have no idea why the denominator in ToneMapU2Func is passed through max with zero. Division by 0 should be undefined, shouldn't it?
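
To get a feel for the curve, here is a small Python sketch of the same math. The A-F values are the example parameters commonly quoted from John Hable's write-up of the Uncharted 2 curve, used here only as stand-ins: they are an assumption, not the values from TW3's constant buffer.

```python
def u2_func(a, b, c, d, e, f, x):
    """Uncharted 2 curve, same expression as U2Func in the HLSL."""
    return (x * (a * x + c * b) + d * e) / (x * (a * x + b) + d * f) - e / f


def tone_map_u2(params, color, num_multiplier, white_point=11.2):
    """Python port of ToneMapU2Func; color is a list of three channel values."""
    numerator = [max(u2_func(*params, ch), 0.0) * num_multiplier for ch in color]
    denominator = max(u2_func(*params, white_point), 0.0)
    # With the stand-in parameters the denominator is positive. If max() ever
    # clamped it to 0, this division would raise ZeroDivisionError in Python.
    return [n / denominator for n in numerator]


# Stand-in A..F parameters (assumption, not taken from the game).
HABLE_PARAMS = (0.15, 0.50, 0.10, 0.20, 0.02, 0.30)

for value in (0.0, 0.5, 1.0, 4.0, 11.2):
    print(value, tone_map_u2(HABLE_PARAMS, [value] * 3, 1.0)[0])
```

By construction, an input equal to the white point (11.2) maps to exactly 1.0 when the numerator multiplier is 1.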

We could stop here, but almost by chance I found another version of the TW3 tone mapping shader in this frame, used for a beautiful sunset (interestingly, it is used with the minimum graphics settings!)

Let's check it out. First, the shader assembly:

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[18], immediateIndexed
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_input_ps_siv v0.xy, position
dcl_output o0.xyzw
dcl_temps 5
   0: ld_indexable(texture2d)(float,float,float,float) r0.x, l(0, 0, 0, 0), t1.xyzw
   1: max r0.y, r0.x, cb3[9].y
   2: max r0.x, r0.x, cb3[4].y
   3: min r0.x, r0.x, cb3[4].z
   4: min r0.y, r0.y, cb3[9].z
   5: max r0.xy, r0.xyxx, l(0.000100, 0.000100, 0.000000, 0.000000)
   6: mul r0.z, cb3[17].x, l(11.200000)
   7: div r0.y, r0.y, r0.z
   8: log r0.y, r0.y
   9: mul r0.y, r0.y, cb3[17].z
  10: exp r0.y, r0.y
  11: mul r0.y, r0.z, r0.y
  12: div r0.y, cb3[17].x, r0.y
  13: ftou r1.xy, v0.xyxx
  14: mov r1.zw, l(0, 0, 0, 0)
  15: ld_indexable(texture2d)(float,float,float,float) r1.xyz, r1.xyzw, t0.xyzw
  16: mul r0.yzw, r0.yyyy, r1.xxyz
  17: mad r2.xyz, cb3[11].xxxx, r0.yzwy, cb3[11].yyyy
  18: mul r3.xy, cb3[12].yzyy, cb3[12].xxxx
  19: mad r2.xyz, r0.yzwy, r2.xyzx, r3.yyyy
  20: mul r1.w, cb3[11].y, cb3[11].z
  21: mad r4.xyz, cb3[11].xxxx, r0.yzwy, r1.wwww
  22: mad r0.yzw, r0.yyzw, r4.xxyz, r3.xxxx
  23: div r0.yzw, r0.yyzw, r2.xxyz
  24: mad r1.w, cb3[11].x, l(11.200000), r1.w
  25: mad r1.w, r1.w, l(11.200000), r3.x
  26: div r2.x, cb3[12].y, cb3[12].z
  27: add r0.yzw, r0.yyzw, -r2.xxxx
  28: max r0.yzw, r0.yyzw, l(0, 0, 0, 0)
  29: mul r0.yzw, r0.yyzw, cb3[17].yyyy
  30: mad r2.y, cb3[11].x, l(11.200000), cb3[11].y
  31: mad r2.y, r2.y, l(11.200000), r3.y
  32: div r1.w, r1.w, r2.y
  33: add r1.w, -r2.x, r1.w
  34: max r1.w, r1.w, l(0)
  35: div r0.yzw, r0.yyzw, r1.wwww
  36: mul r1.w, cb3[16].x, l(11.200000)
  37: div r0.x, r0.x, r1.w
  38: log r0.x, r0.x
  39: mul r0.x, r0.x, cb3[16].z
  40: exp r0.x, r0.x
  41: mul r0.x, r1.w, r0.x
  42: div r0.x, cb3[16].x, r0.x
  43: mul r1.xyz, r1.xyzx, r0.xxxx
  44: mad r2.xyz, cb3[7].xxxx, r1.xyzx, cb3[7].yyyy
  45: mul r3.xy, cb3[8].yzyy, cb3[8].xxxx
  46: mad r2.xyz, r1.xyzx, r2.xyzx, r3.yyyy
  47: mul r0.x, cb3[7].y, cb3[7].z
  48: mad r4.xyz, cb3[7].xxxx, r1.xyzx, r0.xxxx
  49: mad r1.xyz, r1.xyzx, r4.xyzx, r3.xxxx
  50: div r1.xyz, r1.xyzx, r2.xyzx
  51: mad r0.x, cb3[7].x, l(11.200000), r0.x
  52: mad r0.x, r0.x, l(11.200000), r3.x
  53: div r1.w, cb3[8].y, cb3[8].z
  54: add r1.xyz, -r1.wwww, r1.xyzx
  55: max r1.xyz, r1.xyzx, l(0, 0, 0, 0)
  56: mul r1.xyz, r1.xyzx, cb3[16].yyyy
  57: mad r2.x, cb3[7].x, l(11.200000), cb3[7].y
  58: mad r2.x, r2.x, l(11.200000), r3.y
  59: div r0.x, r0.x, r2.x
  60: add r0.x, -r1.w, r0.x
  61: max r0.x, r0.x, l(0)
  62: div r1.xyz, r1.xyzx, r0.xxxx
  63: add r0.xyz, r0.yzwy, -r1.xyzx
  64: mad o0.xyz, cb3[13].xxxx, r0.xyzx, r1.xyzx
  65: mov o0.w, l(1.000000)
  66: ret

At first the code may look intimidating, but it is actually not that bad. After a brief analysis, you can see that there are two calls to the Uncharted2 function with different sets of inputs (A-F, min / max luminance...). I had never seen such a solution before.

And the HLSL:

 cbuffer cBuffer : register (b3)  
 {  
   float4 cb3_v0;  
   float4 cb3_v1;  
   float4 cb3_v2;  
   float4 cb3_v3;  
   float4 cb3_v4;  
   float4 cb3_v5;  
   float4 cb3_v6;  
   float4 cb3_v7;  
   float4 cb3_v8;  
   float4 cb3_v9;  
   float4 cb3_v10;  
   float4 cb3_v11;  
   float4 cb3_v12;  
   float4 cb3_v13;  
   float4 cb3_v14;  
   float4 cb3_v15;  
   float4 cb3_v16, cb3_v17;  
 }  
 Texture2D     TexHDRColor     : register (t0);  
 Texture2D     TexAvgLuminance     : register (t1);  
 float3 U2Func( float A, float B, float C, float D, float E, float F, float3 x )  
 {  
      return ((x*(A*x+C*B)+D*E)/(x*(A*x+B)+D*F)) - E/F;  
 }  
 float3 ToneMapU2Func( float A, float B, float C, float D, float E, float F, float3 color, float numMultiplier )  
 {  
      float3 numerator =  U2Func( A, B, C, D, E, F, color );  
      numerator = max( numerator, 0 );  
      numerator.rgb *= numMultiplier;  
      float3 denominator = U2Func( A, B, C, D, E, F, 11.2 );  
      denominator = max( denominator, 0 );  
      return numerator / denominator;  
 }  
 struct VS_OUTPUT_POSTFX  
 {  
   float4 Position : SV_Position;  
 };  
 float getExposure(float avgLuminance, float minLuminance, float maxLuminance, float middleGray, float powParam)  
 {  
      avgLuminance = clamp( avgLuminance, minLuminance, maxLuminance );  
      avgLuminance = max( avgLuminance, 1e-4 );  
      float scaledWhitePoint = middleGray * 11.2;  
      float luma = avgLuminance / scaledWhitePoint;  
      luma = pow( luma, powParam);  
      luma = luma * scaledWhitePoint;  
      float exposure = middleGray / luma;  
      return exposure;  
 }  
 float4 ToneMappingPS( VS_OUTPUT_POSTFX Input) : SV_Target0  
 {  
      float avgLuminance = TexAvgLuminance.Load( int3(0, 0, 0) );  
      float exposure1 = getExposure( avgLuminance, cb3_v9.y, cb3_v9.z, cb3_v17.x, cb3_v17.z);  
      float exposure2 = getExposure( avgLuminance, cb3_v4.y, cb3_v4.z, cb3_v16.x, cb3_v16.z);  
      float3 HDRColor = TexHDRColor.Load( uint3(Input.Position.xy, 0) ).rgb;  
      float3 color1 = ToneMapU2Func( cb3_v11.x, cb3_v11.y, cb3_v11.z, cb3_v12.x, cb3_v12.y,   
         cb3_v12.z, exposure1*HDRColor, cb3_v17.y);  
      float3 color2 = ToneMapU2Func( cb3_v7.x, cb3_v7.y, cb3_v7.z, cb3_v8.x, cb3_v8.y,   
         cb3_v8.z, exposure2*HDRColor, cb3_v16.y);  
      float3 finalColor = lerp( color2, color1, cb3_v13.x ); 
      return float4(finalColor, 1);  
 }

In other words, we have two sets of control parameters, we calculate two tone-mapped colors, and at the end we interpolate between them. A clever solution!
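
The final blend corresponds to assembly lines 63-64, which compute (color1 - color2) * t + color2. A tiny Python sketch of that step follows; the colors and the blend factor are made-up values, and the helper names are mine.

```python
def lerp(a, b, t):
    """Same form as the add/mad pair in the assembly: (b - a) * t + a."""
    return (b - a) * t + a


def blend_tonemapped(color1, color2, blend):
    """lerp(color2, color1, blend) per channel, as in the HLSL above."""
    return [lerp(c2, c1, blend) for c1, c2 in zip(color1, color2)]


color1 = [1.0, 0.5, 0.25]  # hypothetical result of the first parameter set
color2 = [0.0, 0.5, 0.75]  # hypothetical result of the second parameter set
print(blend_tonemapped(color1, color2, 0.5))
```

A blend factor of 0 returns the second color unchanged and a factor of 1 returns the first one, so cb3_v13.x acts as a cross-fade between the two parameter sets.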

Part 2: Eye Adaptation

The second part will be much simpler.

In the first part, I showed how tone mapping is performed in TW3. While explaining the theory, I briefly mentioned eye adaptation. And you know what? In this part I will talk about how that eye adaptation is implemented.

But wait, what is eye adaptation and why do we need it? Wikipedia knows everything about it, but let me explain: imagine that you are in a dark room (remember Life is Strange) or in a cave, and you step outside, where it is bright. The main light source could be the sun, for example.

In the dark, our pupils are dilated so that more light reaches the retina through them. When it gets bright, our pupils contract, and sometimes we close our eyes because it “hurts”.

This change does not happen instantly. The eye has to adapt to changes in brightness. That is why we perform eye adaptation in real-time rendering.

A good example of where the lack of eye adaptation is noticeable is the HDRToneMappingCS11 sample from the DirectX SDK. Abrupt changes in average luminance are rather unpleasant and unnatural.

Let's get started! For the sake of consistency, we will analyze the same frame from Novigrad.

Now let's dig into the frame capture in RenderDoc. Eye adaptation is usually performed right before tone mapping, and The Witcher 3 is no exception.

Let's look at the state of the pixel shader:

We have two inputs: 2 textures, R32_FLOAT, 1x1 (one pixel). texture0 contains the average scene luminance from the previous frame. texture1 contains the average scene luminance of the current frame (calculated right before this pass by a compute shader; I marked it in blue).

As expected, there is one output: R32_FLOAT, 1x1. Let's look at the pixel shader.

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[1], immediateIndexed
dcl_sampler s0, mode_default
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_resource_texture2d (float,float,float,float) t1
dcl_output o0.xyzw
dcl_temps 1
   0: sample_l(texture2d)(float,float,float,float) r0.x, l(0, 0, 0, 0), t1.xyzw, s1, l(0)
   1: sample_l(texture2d)(float,float,float,float) r0.y, l(0, 0, 0, 0), t0.yxzw, s0, l(0)
   2: ge r0.z, r0.y, r0.x
   3: add r0.x, -r0.y, r0.x
   4: movc r0.z, r0.z, cb3[0].x, cb3[0].y
   5: mad o0.xyzw, r0.zzzz, r0.xxxx, r0.yyyy
   6: ret

Wow, how simple! Only 7 lines of assembly code. What is going on here? Let me explain each line:

0) Fetch the average luminance of the current frame.
1) Fetch the average luminance of the previous frame.
2) Perform a test: is the current luminance less than or equal to the luminance of the previous frame?
If yes, the luminance is decreasing; if not, it is increasing.
3) Calculate the difference: difference = currentLum - previousLum.
4) This conditional move (movc) picks a speed factor from the constant buffer. Depending on the result of the test in line 2, one of two different values is assigned. This is a clever move, because it allows different adaptation speeds for decreasing and for increasing luminance. In the frame under study, however, both values are the same, and they vary from 0.11 to 0.3.
5) The final calculation of the adapted luminance: adaptedLuminance = speedFactor * difference + previousLuminance.
6) End of the shader.

This is quite simple to implement in HLSL:

 // The Witcher 3 eye adaptation shader  
 cbuffer cBuffer : register (b3)  
 {  
   float4 cb3_v0;  
 }
 struct VS_OUTPUT_POSTFX  
 {  
   float4 Position                                             : SV_Position;  
 };  
 SamplerState samplerPointClamp : register (s0);  
 SamplerState samplerPointClamp2 : register (s1);  
 Texture2D TexPreviousAvgLuminance  : register (t0);  
 Texture2D TexCurrentAvgLuminance  : register (t1);  
 float4 TW3_EyeAdaptationPS(VS_OUTPUT_POSTFX Input) : SV_TARGET  
 {  
   // Get current and previous luminance.  
   float currentAvgLuminance = TexCurrentAvgLuminance.SampleLevel( samplerPointClamp2, float2(0.0, 0.0), 0 );  
   float previousAvgLuminance = TexPreviousAvgLuminance.SampleLevel( samplerPointClamp, float2(0.0, 0.0), 0 );  
   // Difference between current and previous luminance.  
   float difference = currentAvgLuminance - previousAvgLuminance;  
   // Scale factor. Can be different for falling and for rising luminance.  
   // It affects speed of adaptation.  
   // Small conditional test is performed here, so a different speed can be set for each of these cases.  
   float adaptationSpeedFactor = (currentAvgLuminance <= previousAvgLuminance) ? cb3_v0.x : cb3_v0.y;  
   // Calculate adapted luminance.  
   float adaptedLuminance = adaptationSpeedFactor * difference + previousAvgLuminance;  
   return adaptedLuminance;  
 }

These lines give us the same assembly code. I would only suggest changing the output type from float4 to float. There is no need to waste bandwidth. This is how eye adaptation is implemented in The Witcher 3. Pretty simple, right?
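
To see how the speed factor behaves over several frames, the update can be iterated on the CPU. This is a minimal Python sketch of the same formula; the starting luminance, the target and the frame count are made-up values, and 0.11 is simply the lower end of the range mentioned above.

```python
def adapt_luminance(previous, current, speed_down, speed_up):
    """One adaptation step, same logic as the pixel shader."""
    # speed_down plays the role of cb3_v0.x, speed_up the role of cb3_v0.y
    speed = speed_down if current <= previous else speed_up
    return speed * (current - previous) + previous


def simulate(start, target, speed_down, speed_up, frames):
    """Feed the same measured luminance for several frames in a row."""
    luminance = start
    history = []
    for _ in range(frames):
        luminance = adapt_luminance(luminance, target, speed_down, speed_up)
        history.append(luminance)
    return history


# Walking out of a dark cave: adapted luminance creeps from 0.1 towards 1.0.
for frame, value in enumerate(simulate(0.1, 1.0, 0.11, 0.11, 10)):
    print(frame, round(value, 4))
```

Each step closes a fixed fraction of the remaining gap, so the adapted value approaches the measured luminance quickly at first and then ever more slowly, without overshooting.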

P.S. Many thanks to Baldur Karlsson (Twitter: @baldurk) for RenderDoc. The program is just great.

Part 3: Chromatic Aberration

Chromatic aberration is an effect mainly found in cheap lenses. It occurs because lenses have a different refractive index for different wavelengths of visible light. The result is a visible distortion. Not everyone likes it, though. Fortunately, in The Witcher 3 this effect is very subtle and therefore not annoying during gameplay (to me, at least). But you can turn it off if you want.

Let's take a closer look at an example of a scene with and without chromatic aberration:

Chromatic aberration enabled

Chromatic aberration is disabled.

Do you notice any differences near the edges? Me neither. Let's try another scene:

Chromatic aberration is enabled. Notice a slight “red” distortion in the indicated area.

Yeah, much better! Here the contrast between the dark and light areas is stronger, and in the corner we see a slight distortion. As you can see, this effect is very weak. However, I was wondering how it is implemented. Let's move on to the most curious part: the code!

Implementation

The first thing to do is find the right draw call with the pixel shader. In fact, chromatic aberration is part of a large “final post-processing” pixel shader, which consists of chromatic aberration, vignetting and gamma correction. All of this lives inside a single pixel shader. Let's take a closer look at its assembly:

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb3[18], immediateIndexed
dcl_sampler s1, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_input_ps_siv v0.xy, position
dcl_input_ps linear v1.zw
dcl_output o0.xyzw
dcl_temps 4
   0: mul r0.xy, v0.xyxx, cb3[17].zwzz
   1: mad r0.zw, v0.xxxy, cb3[17].zzzw, -cb3[17].xxxy
   2: div r0.zw, r0.zzzw, cb3[17].xxxy
   3: dp2 r1.x, r0.zwzz, r0.zwzz
   4: sqrt r1.x, r1.x
   5: add r1.y, r1.x, -cb3[16].y
   6: mul_sat r1.y, r1.y, cb3[16].z
   7: sample_l(texture2d)(float,float,float,float) r2.xyz, r0.xyxx, t0.xyzw, s1, l(0)
   8: lt r1.z, l(0), r1.y
   9: if_nz r1.z
  10:  mul r1.y, r1.y, r1.y
  11:  mul r1.y, r1.y, cb3[16].x
  12:  max r1.x, r1.x, l(0.000100)
  13:  div r1.x, r1.y, r1.x
  14:  mul r0.zw, r0.zzzw, r1.xxxx
  15:  mul r0.zw, r0.zzzw, cb3[17].zzzw
  16:  mad r0.xy, -r0.zwzz, l(2.000000, 2.000000, 0.000000, 0.000000), r0.xyxx
  17:  sample_l(texture2d)(float,float,float,float) r2.x, r0.xyxx, t0.xyzw, s1, l(0)
  18:  mad r0.xy, v0.xyxx, cb3[17].zwzz, -r0.zwzz
  19:  sample_l(texture2d)(float,float,float,float) r2.y, r0.xyxx, t0.xyzw, s1, l(0)
  20: endif
 ...

And the cbuffer values:

So, let's try to understand what is happening here. Essentially, cb3_v17.xy is the center of the chromatic aberration, so the first lines calculate the 2D vector from the texel coordinates (cb3_v17.zw is the reciprocal of the viewport size) to the “center of chromatic aberration” and its length, and then perform further calculations, a test and a branch. When chromatic aberration is applied, we calculate the offsets using some values from the constant buffer and distort the R and G channels. In general, the closer to the edges of the screen, the stronger the effect. Line 10 is quite interesting because it makes the pixels “move closer”, especially when we exaggerate the aberration. I am pleased to share my implementation of the effect with you. As usual, take the variable names with a (solid) grain of salt. And note that the effect is applied before gamma correction.

void ChromaticAberration( float2 uv, inout float3 color )  
 {  
   // User-defined params  
   float2 chromaticAberrationCenter = float2(0.5, 0.5);  
   float chromaticAberrationCenterAvoidanceDistance = 0.2;  
   float fA = 1.25;  
   float fChromaticAbberationIntensity = 30;  
   float fChromaticAberrationDistortionSize = 0.75;  
   // Calculate vector  
   float2 chromaticAberrationOffset = uv - chromaticAberrationCenter;  
   chromaticAberrationOffset = chromaticAberrationOffset / chromaticAberrationCenter;  
   float chromaticAberrationOffsetLength = length(chromaticAberrationOffset);  
   // To avoid applying chromatic aberration in center, subtract a small value from  
   // the just calculated length.  
   float chromaticAberrationOffsetLengthFixed = chromaticAberrationOffsetLength - chromaticAberrationCenterAvoidanceDistance;  
   float chromaticAberrationTexel = saturate(chromaticAberrationOffsetLengthFixed * fA);  
   float fApplyChromaticAberration = (0.0 < chromaticAberrationTexel);  
   if (fApplyChromaticAberration)  
   {  
     chromaticAberrationTexel *= chromaticAberrationTexel;  
     chromaticAberrationTexel *= fChromaticAberrationDistortionSize;  
     chromaticAberrationOffsetLength = max(chromaticAberrationOffsetLength, 1e-4);  
     float fMultiplier = chromaticAberrationTexel / chromaticAberrationOffsetLength;  
     chromaticAberrationOffset *= fMultiplier;  
     chromaticAberrationOffset *= g_Viewport.zw;  
     chromaticAberrationOffset *= fChromaticAbberationIntensity;  
     float2 offsetUV = -chromaticAberrationOffset * 2 + uv;  
     color.r = TexColorBuffer.SampleLevel(samplerLinearClamp, offsetUV, 0).r;  
     offsetUV = uv - chromaticAberrationOffset;  
     color.g = TexColorBuffer.SampleLevel(samplerLinearClamp, offsetUV, 0).g;  
   }  
 }
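
To check how far the red and green samples actually move, the offset math can be run on the CPU. The following Python sketch mirrors the function above without any texture sampling; the 1920x1080 viewport is an assumption, and the default parameters are the user-defined values from the HLSL with the intensity set to 1.0.

```python
import math


def chromatic_aberration_uvs(uv, inv_viewport, center=(0.5, 0.5), avoidance=0.2,
                             falloff=1.25, distortion=0.75, intensity=1.0):
    """Return the UVs at which the red and green channels would be sampled."""
    offset = [(uv[i] - center[i]) / center[i] for i in range(2)]
    length = math.hypot(offset[0], offset[1])
    # saturate((length - avoidance) * falloff): zero inside the central area
    amount = min(max((length - avoidance) * falloff, 0.0), 1.0)
    if amount <= 0.0:
        return uv, uv  # no aberration near the center
    amount = amount * amount * distortion
    scale = amount / max(length, 1e-4)
    shift = [offset[i] * scale * inv_viewport[i] * intensity for i in range(2)]
    red_uv = (uv[0] - 2.0 * shift[0], uv[1] - 2.0 * shift[1])
    green_uv = (uv[0] - shift[0], uv[1] - shift[1])
    return red_uv, green_uv


inv_viewport = (1.0 / 1920.0, 1.0 / 1080.0)  # hypothetical viewport size
print(chromatic_aberration_uvs((0.5, 0.5), inv_viewport))  # screen center
print(chromatic_aberration_uvs((1.0, 1.0), inv_viewport))  # screen corner
```

Red is always shifted twice as far as green, and blue keeps the original sample, which is what produces the colored fringe towards the screen edges.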

I added “fChromaticAberrationIntensity” to increase the size of the offset, and hence the effect strength, as the name implies (TW3 = 1.0). Intensity = 40:

That's all! Hope you enjoyed this part.

Part 4: Vignetting

Vignetting is one of the most common post-processing effects used in games. It is popular in photography as well. Slightly darkened corners can create a beautiful effect. There are several types of vignetting; Unreal Engine 4, for example, uses natural vignetting. But back to The Witcher 3. An interactive comparison of frames with and without vignetting is available in NVIDIA's performance guide for The Witcher 3.

Screenshot from The Witcher 3 with vignetting turned on.

Note that the upper left corner (the sky) is not darkened as much as other parts of the image. We will come back to this later.

Implementation Details

First, there is a slight difference between the vignetting used in the original version of The Witcher 3 (released on May 19, 2015) and in The Witcher 3: Blood and Wine. In the former, the “inverse gradient” is calculated inside the pixel shader, while in the latter it is precalculated in a 256x256 2D texture:

The 256x256 texture used as the “inverse gradient” in the Blood and Wine expansion.

I will use the shader from Blood and Wine (great game, by the way). As in most other games, vignetting in The Witcher 3 is computed in the final post-processing pixel shader. Take a look at the assembly:

 ...
  44: log r0.xyz, r0.xyzx
  45: mul r0.xyz, r0.xyzx, l(0.454545, 0.454545, 0.454545, 0.000000)
  46: exp r0.xyz, r0.xyzx
  47: mul r1.xyz, r0.xyzx, cb3[9].xyzx
  48: sample_indexable(texture2d)(float,float,float,float) r0.w, v1.zwzz, t2.yzwx, s2
  49: log r2.xyz, r1.xyzx
  50: mul r2.xyz, r2.xyzx, l(2.200000, 2.200000, 2.200000, 0.000000)
  51: exp r2.xyz, r2.xyzx
  52: dp3 r1.w, r2.xyzx, cb3[6].xyzx
  53: add_sat r1.w, -r1.w, l(1.000000)
  54: mul r1.w, r1.w, cb3[6].w
  55: mul_sat r0.w, r0.w, r1.w
  56: mad r0.xyz, -r0.xyzx, cb3[9].xyzx, cb3[7].xyzx
  57: mad r0.xyz, r0.wwww, r0.xyzx, r1.xyzx
 ...

Interesting! It seems that both gamma space (line 46) and linear space (line 51) are used to calculate the vignetting. In line 48, we sample the “inverse gradient” texture. cb3[9].xyz is not related to vignetting. In every frame I checked it had the value float3(1.0, 1.0, 1.0), so it is probably a final filter used for fade-in / fade-out effects that gradually darken or brighten the screen. TW3 has three main vignetting parameters:

Opacity (cb3[6].w) - controls the strength of the vignetting. 0 - no vignetting, 1 - maximum vignetting. According to my observations, it is approximately 1.0 in the base The Witcher 3, and it fluctuates around 0.15 in Blood and Wine.
Color (cb3[7].xyz) - a great feature of TW3 vignetting is the ability to change its color. It does not have to be black, but in practice... It usually has values such as float3(3.0 / 255.0, 4.0 / 255.0, 5.0 / 255.0), in general multiples of 0.00392156 = 1.0 / 255.0.
Weights (cb3[6].xyz) - a very interesting parameter. I had always seen “flat” vignettes, such as:

A typical vignette mask

But with the weights (line 52) you can get very interesting results:

TW3 vignette mask calculated using the weights

The weights are close to 1.0 (see the constant buffer data of one frame from Blood and Wine, the magical world with the rainbow). That is why vignetting did not affect the bright sky pixels mentioned above.
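
The effect of the weights can be checked numerically. Below is a minimal Python sketch of assembly lines 49-55 for a single pixel; the weights of exactly 1.0, the opacity of 0.15 and the mask value of 1.0 are assumptions chosen to match the observations above, and the function name is mine.

```python
def vignette_amount(gamma_color, weights, opacity, mask):
    """How strongly one pixel is pulled towards the vignette color (0..1)."""
    linear = [c ** 2.2 for c in gamma_color]                 # lines 49-51: gamma -> linear
    weighted = sum(l * w for l, w in zip(linear, weights))   # line 52: dp3
    amount = min(max(1.0 - weighted, 0.0), 1.0) * opacity    # lines 53-54
    return min(max(amount * mask, 0.0), 1.0)                 # line 55: mul_sat with the mask


weights = (1.0, 1.0, 1.0)  # assumption: "close to 1.0"
print(vignette_amount((1.0, 1.0, 1.0), weights, 0.15, 1.0))  # bright sky pixel
print(vignette_amount((0.2, 0.2, 0.2), weights, 0.15, 1.0))  # dark pixel
```

A bright pixel drives the weighted sum above 1, the saturate clamps the amount to 0 and the pixel is left untouched, while a dark pixel in the same screen corner receives almost the full opacity.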

Code

Here is my implementation of TW3 vignetting in HLSL.

GammaToLinear = pow (color, 2.2)

/*
 // The Witcher 3 vignette.
 //
 // Input color is in gamma space
 // Output color is in gamma space as well.
 */
 float3 Vignette_TW3( in float3 gammaColor, in float3 vignetteColor, in float3 vignetteWeights,
                      in float vignetteOpacity, in Texture2D texVignette, in float2 texUV )
 {
      // For coloring vignette
      float3 vignetteColorGammaSpace = -gammaColor + vignetteColor;
      // Calculate vignette amount based on color in *LINEAR* color space and vignette weights.
      float vignetteWeight = dot( GammaToLinear( gammaColor ), vignetteWeights );
      // We need to keep vignette weight in [0-1] range
      vignetteWeight = saturate( 1.0 - vignetteWeight );
      // Multiply by opacity
      vignetteWeight *= vignetteOpacity;
      // Obtain vignette mask (here is texture; you can also calculate your custom mask here)
      float sampledVignetteMask = texVignette.Sample( samplerLinearClamp, texUV ).x;
      // Final (inversed) vignette mask
      float finalInvVignetteMask = saturate( vignetteWeight * sampledVignetteMask );
      // Final composite in gamma space
      float3 Color = vignetteColorGammaSpace * finalInvVignetteMask + gammaColor.rgb;
      // * uncomment to debug vignette mask:
      // return 1.0 - finalInvVignetteMask;
      // Return final color
      return Color;
 }

Hope you enjoyed it. You can also try my HLSLexplorer, which helped me a great deal in understanding the HLSL assembly.

As before, take the variable names with a grain of salt: TW3 shaders are processed with D3DStripShader, so I don't really know anything about them and can only guess. Also, I take no responsibility for any damage this shader may do to your hardware ;)

Bonus: Calculating the Gradient

In The Witcher 3 as released in 2015, the inverse gradient was calculated in the pixel shader instead of sampling a precalculated texture. Take a look at the assembly:

  35: add r2.xy, v1.zwzz, l(-0.500000, -0.500000, 0.000000, 0.000000)
  36: dp2 r1.w, r2.xyxx, r2.xyxx
  37: sqrt r1.w, r1.w
  38: mad r1.w, r1.w, l(2.000000), l(-0.550000)
  39: mul_sat r2.w, r1.w, l(1.219512)
  40: mul r2.z, r2.w, r2.w
  41: mul r2.xy, r2.zwzz, r2.zzzz
  42: dp4 r1.w, l(-0.100000, -0.105000, 1.120000, 0.090000), r2.xyzw
  43: min r1.w, r1.w, l(0.940000)

Fortunately for us, it is quite simple. In HLSL it looks something like this:

float TheWitcher3_2015_Mask( in float2 uv )  
 {  
      float distanceFromCenter = length( uv - float2(0.5, 0.5) );  
      float x = distanceFromCenter * 2.0 - 0.55;  
      x = saturate( x * 1.219512 );          // 1.219512 = 100/82
      float x2 = x * x;  
      float x3 = x2 * x;  
      float x4 = x2 * x2;  
      float outX = dot( float4(x4, x3, x2, x), float4(-0.10, -0.105, 1.12, 0.09) );  
      outX = min( outX, 0.94 );  
      return outX;  
 }

That is, we simply calculate the distance from the center to the texel, do some magic with it (multiplication, saturate...), and then... evaluate a polynomial! Awesome.
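
The polynomial is easy to evaluate outside the shader to see what the mask looks like. Here is a direct Python port of the function above, with the constants copied from the assembly; only the sample positions are my choice.

```python
import math


def witcher3_2015_mask(u, v):
    """Inverse gradient of the 2015 vignette for UV coordinates in [0, 1]."""
    distance_from_center = math.hypot(u - 0.5, v - 0.5)
    x = (distance_from_center * 2.0 - 0.55) * 1.219512
    x = min(max(x, 0.0), 1.0)  # saturate
    out = -0.10 * x**4 - 0.105 * x**3 + 1.12 * x**2 + 0.09 * x
    return min(out, 0.94)


for name, (u, v) in {"center": (0.5, 0.5),
                     "right edge": (1.0, 0.5),
                     "corner": (0.0, 0.0)}.items():
    print(name, round(witcher3_2015_mask(u, v), 4))
```

The mask is zero inside a circle of radius 0.275 around the center (where distance * 2 is still below 0.55), grows towards the edges, and is capped at 0.94 in the corners.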

Part 5: The Drunk Effect

Let's see how The Witcher 3: Wild Hunt implements the drunk effect. If you have not played it yet, ~~drop everything, buy it and play~~ watch the videos:

Evening:

Night:

First, we see a doubled and swirling image, which often appears when you drink in real life. The farther a pixel is from the center of the image, the stronger the rotation. I deliberately posted the second video, shot at night, because this rotation is clearly visible on the stars (see the 8 separate points?).

The second part of the drunk effect, perhaps not immediately noticeable, is a slight change in zoom. It is noticeable near the center.

It is probably obvious that this effect is a typical post-process (a pixel shader). However, its place in the rendering pipeline may not be so obvious. It turns out that the drunk effect is applied right after tone mapping and right before motion blur (the “drunk” image is the input for motion blur).

Let's start with the assembly code:

ps_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer cb0[2], immediateIndexed
dcl_constantbuffer cb3[3], immediateIndexed
dcl_sampler s0, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_input_ps_siv v1.xy, position
dcl_output o0.xyzw
dcl_temps 8
   0: mad r0.x, cb3[0].y, l(-0.100000), l(1.000000)
   1: mul r0.yz, cb3[1].xxyx, l(0.000000, 0.050000, 0.050000, 0.000000)
   2: mad r1.xy, v1.xyxx, cb0[1].zwzz, -cb3[2].xyxx
   3: dp2 r0.w, r1.xyxx, r1.xyxx
   4: sqrt r1.z, r0.w
   5: mul r0.w, r0.w, l(10.000000)
   6: min r0.w, r0.w, l(1.000000)
   7: mul r0.w, r0.w, cb3[0].y
   8: mul r2.xyzw, r0.yzyz, r1.zzzz
   9: mad r2.xyzw, r1.xyxy, r0.xxxx, -r2.xyzw
  10: mul r3.xy, r0.xxxx, r1.xyxx
  11: mad r3.xyzw, r0.yzyz, r1.zzzz, r3.xyxy
  12: add r3.xyzw, r3.xyzw, cb3[2].xyxy
  13: add r2.xyzw, r2.xyzw, cb3[2].xyxy
  14: mul r0.x, r0.w, cb3[0].x
  15: mul r0.x, r0.x, l(5.000000)
  16: mul r4.xyzw, r0.xxxx, cb3[0].zwzw
  17: mad r5.xyzw, r4.zwzw, l(1.000000, 0.000000, -1.000000, -0.000000), r2.xyzw
  18: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r5.xyxx, t0.xyzw, s0
  19: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r5.zwzz, t0.xyzw, s0
  20: add r5.xyzw, r5.xyzw, r6.xyzw
  21: mad r6.xyzw, r4.zwzw, l(0.707000, 0.707000, -0.707000, -0.707000), r2.xyzw
  22: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0
  23: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
  24: add r5.xyzw, r5.xyzw, r7.xyzw
  25: add r5.xyzw, r6.xyzw, r5.xyzw
  26: mad r6.xyzw, r4.zwzw, l(0.000000, 1.000000, -0.000000, -1.000000), r2.xyzw
  27: mad r2.xyzw, r4.xyzw, l(-0.707000, 0.707000, 0.707000, -0.707000), r2.xyzw
  28: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0
  29: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
  30: add r5.xyzw, r5.xyzw, r7.xyzw
  31: add r5.xyzw, r6.xyzw, r5.xyzw
  32: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r2.xyxx, t0.xyzw, s0
  33: sample_indexable(texture2d)(float,float,float,float) r2.xyzw, r2.zwzz, t0.xyzw, s0
  34: add r5.xyzw, r5.xyzw, r6.xyzw
  35: add r2.xyzw, r2.xyzw, r5.xyzw
  36: mul r2.xyzw, r2.xyzw, l(0.062500, 0.062500, 0.062500, 0.062500)
  37: mad r5.xyzw, r4.zwzw, l(1.000000, 0.000000, -1.000000, -0.000000), r3.zwzw
  38: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r5.xyxx, t0.xyzw, s0
  39: sample_indexable(texture2d)(float,float,float,float) r5.xyzw, r5.zwzz, t0.xyzw, s0
  40: add r5.xyzw, r5.xyzw, r6.xyzw
  41: mad r6.xyzw, r4.zwzw, l(0.707000, 0.707000, -0.707000, -0.707000), r3.zwzw
  42: sample_indexable(texture2d)(float,float,float,float) r7.xyzw, r6.xyxx, t0.xyzw, s0
  43: sample_indexable(texture2d)(float,float,float,float) r6.xyzw, r6.zwzz, t0.xyzw, s0
  44: add r5.xyzw, r5.xyzw, r7.xyzw
  45: add r5.xyzw, r6.xyzw, r5.xyzw
  46: mad r6.xyzw, r4.zwzw, 

    