Native implementation of omnidirectional shadows in DirectX 11


Hey. Continuing the series about various technologies in graphics game development, I would like to talk about how convenient it is to work with shadows in DirectX 11. I will cover creating a point light source that makes full use of the DirectX 11 graphics API, touching on concepts such as Hardware Depth Bias, GS Cubemap Render, Native Shadow Map Depth and Hardware PCF.
After a bit of surfing around the Internet, I came to the conclusion that most articles about shadows in DX11 are incorrect, implemented rather sloppily, or rely on outdated approaches. In this article I will try to compare the implementation of shadows in DirectX 9 and DirectX 11. Everything described below is also true for OpenGL.


Introduction


And to start this expedition, let me first bring up to speed those who do not quite understand what shadows in games are and how they work.

Back in 1978 Lance Williams introduced a shadow-rendering method called projective shadowing (shadow mapping), and, as of 2015, there is still nothing fundamentally better in production. Yes, there are many modifications of projective shadowing, but one way or another they are all based on it. So what is the essence of this method, and how can we get shadows from geometry of any complexity, and in real time at that? The idea is that the whole scene, without textures, is rendered from the position of the light source, and only the distance from the light source to each scene fragment is stored; the result is simply called a Shadow Map. This is done before the main render of the scene. Then, in the simplest case, the main scene is drawn, and for each fragment (pixel) of this scene we determine whether the light source "sees" that fragment (the light source can be treated as a camera) or not. If it sees it, the light reaches the fragment; if not, it does not. Simple.
A few words about the mathematics of this process: almost all real-time rendering is built on matrices, which I wrote about a little bit here. Determining the "visibility" of a scene fragment from a light source works as follows: the fragment's position is transformed from the main camera's space (the main camera's view and projection matrices) into the light source's space, which produces two comparable values. The first is the distance from the center of the light source to the scene fragment, and the second is the depth (distance) read from the shadow map that was created earlier. Comparing them tells you whether the fragment is in shadow or not.
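To make the idea concrete, here is a minimal HLSL sketch of that basic test for a single spot/directional-style light; LightViewProjection, ShadowMap, ShadowSampler and worldPosition are placeholder names for this illustration only and do not appear in the code later in the article:
float4x4 LightViewProjection; // light view * projection
Texture2D ShadowMap;          // depth rendered from the light's point of view
SamplerState ShadowSampler;

float BasicShadowTest(float3 worldPosition)
{
	// Project the fragment into the light's clip space.
	float4 lightSpace = mul(float4(worldPosition, 1.0), LightViewProjection);
	lightSpace.xyz /= lightSpace.w;
	// Clip space [-1;1] -> texture space [0;1] (Y is flipped in Direct3D).
	float2 shadowUV = lightSpace.xy * float2(0.5, -0.5) + 0.5;
	// Depth stored in the shadow map vs. the fragment's depth in light space.
	float storedDepth = ShadowMap.Sample(ShadowSampler, shadowUV).r;
	float currentDepth = lightSpace.z;
	// Lit if the light "sees" the fragment, shadowed otherwise.
	return currentDepth <= storedDepth ? 1.0 : 0.0;
}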
That is roughly what the simplest shadow implementation looks like. I also want to note that light sources differ, and therefore their shadows differ as well. There are four main types of light source (a short sketch of their projection matrices follows the list):

  • Spot Light (aka Projection Light) - best imagined as an ordinary projector that has a light cone angle and points at a specific spot. The matrix of such a light source is a Perspective Matrix.
  • Point Light (aka Omnidirectional Light) - an omnidirectional light source, easiest to imagine as a point (hence the name) that emits light in all directions. The matrix of such a source is a Perspective Matrix with a field of view (FOV) of 90 degrees, and it has six view matrices, each pointing in its own direction (the sides of a cube).
  • Directional Light (aka Sun Light) - an infinitely distant light source, the sun as seen from the Earth, for example. The matrix of such a light source is an Orthographic Matrix.
  • Ambient Light - a special light source that has no position. It carries information about the uniform light produced by light bouncing off other surfaces. Lately it has been falling out of use, replaced by advanced Global Illumination algorithms.
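Purely as an illustration, the projection matrices mentioned above are typically built roughly like this in SharpDX (all variable names here - coneAngle, aspect, nearPlane, radius, orthoWidth, orthoHeight, farPlane - are placeholders, not code used later in the article):
// Spot light: an ordinary perspective projection, the cone angle acts as the FOV.
var spotProjection = Matrix.PerspectiveFovRH(coneAngle, aspect, nearPlane, radius);

// Point light: the same perspective projection, but FOV = 90 degrees and aspect = 1,
// paired with six view matrices (one per cube face) - covered in detail below.
var pointProjection = Matrix.PerspectiveFovRH(MathUtil.DegreesToRadians(90.0f), 1.0f, nearPlane, radius);

// Directional light: an orthographic projection covering the lit region.
var directionalProjection = Matrix.OrthoRH(orthoWidth, orthoHeight, nearPlane, farPlane);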

In theory, everything sounds good and consistent, but in practice there are some problems.
The first and most important problem is the discreteness of the shadow map. Shadow maps are stored in GPU memory as textures of a special format with a finite size (in games the "shadow quality" setting is often tied to the shadow map size). You can picture the problem by placing a small piece of geometry right in front of the light source so that it casts a shadow onto a large surface. Because the shadow map has a finite size, the distances at the two ends of a light ray become incomparable: one and the same texel of the shadow map corresponds to many different positions on the surface, which leads to an effect called aliasing (the shadow looks stair-stepped).

There are many ways to fight this problem, from non-standard projection matrices (e.g. Trapezoidal Shadow Map, Geometry Pitch Shadow Map) to building a large number of shadow maps (e.g. Cascaded Shadow Map). But all these algorithms are rather narrow in scope and far from universal. The most common way to get rid of strong aliasing is Percentage Closer Filtering: take several samples with a small constant offset and interpolate the results - but more on that later.

In this article I will consider one of the most complex light sources, the omnidirectional light. Since it emits light in all directions, it requires an unusual shadow map: a Cube Shadow Map.
Let's get started. What needs to be implemented for a point light source?
  • Rendering the scene into a special shadow map, the Cube Shadow Map
  • A shader that computes the lighting for some model, for example Lambert
  • Shadow filtering using the DirectX 11 features

Shadow Map Render


In the DirectX 9 era few people implemented honest point lights with shadows, and those who did paid dearly in resources. To compute the shadow for an omnidirectional light source you have to render the world around the light more than once; there are two options: Dual-Paraboloid Shadow Mapping (twice) or Cube Shadow Mapping (six times). Let us focus on the latter, since it is the most traditional for point lights. So, in the DirectX 9 era the world was rendered into a special cube texture consisting of six two-dimensional textures: each time the active side (texture) was switched and the world was drawn again. Unfortunately, many people continue to do the same even on DirectX 11. The second problem was that in DirectX 9 it was impossible to read the hardware depth from a shader, so depth had to be written out manually (often in linear form) for later use.
In DirectX 9, the Cube Shadow Map worked as follows:
  • The world was rendered six times, once for each face
  • The world was rendered into a Render Target of format R32_Float, i.e. an ordinary texture
  • Depth was most often written in linear form
  • Shadows were filtered manually when the shadow map was applied

Separately, I want to explain why the linear format. When checking whether the current fragment is in shadow, two values have to be compared: the first is the depth stored in the shadow map, the second is the current depth of the fragment in the light source's space. In the Spot/Directional case everything was simple: we took a point, projected it into the light source's space (by multiplying by the light's view/projection matrix) and compared the two depths. In the point-light case things get more complicated: there are six different view matrices, which means we would first have to determine which face the fragment falls into and then reproject it with that specific face's matrix. That would require Dynamic Flow Control in the shader, which is quite hard on the GPU. So people took the easier route: they stored depth in linear form (the distance from the light source to the fragment was written into the shadow map) and compared it against a linear depth when applying the shadow. With this kind of rendering both the hardware depth buffer and a Render Target holding the linear depth were used.
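As a purely illustrative sketch of that DX9-style approach (the names LightPosition, LightRadius, ShadowCubeSampler and worldPosition are placeholders, not the identifiers used later in this article):
float3 LightPosition;
float LightRadius;
samplerCUBE ShadowCubeSampler; // the R32_Float cube render target described above

// Shadow-map pass: write the linear (normalized) distance into the color target.
float4 LinearDepthPS(float3 worldPosition : TEXCOORD0) : COLOR0
{
	float linearDepth = length(worldPosition - LightPosition) / LightRadius;
	return float4(linearDepth, 0, 0, 1);
}

// Lighting pass: compare against the stored linear depth, with a small manual bias.
float _shadowTermDX9(float3 worldPosition)
{
	float3 lightVector = worldPosition - LightPosition;
	float currentDepth = length(lightVector) / LightRadius;
	float storedDepth = texCUBE(ShadowCubeSampler, lightVector).r;
	return (currentDepth - 0.001 < storedDepth) ? 1.0 : 0.0;
}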

Much has changed in DirectX 11; starting with DirectX 10 it became possible to use geometry shaders and to read the native hardware depth from a shader.

How do the same shadows work in DirectX 11?
  • The world is rendered once: the geometry shader automatically selects the face to write into
  • The approach no longer uses a Render Target; only the native depth is written into the hardware depth buffer
  • The fragment depth has the standard form z/w
  • Hardware PCF can be used

Practice



Now let's see what all of this looks like in the implementation. The very first thing needed for Cube Shadow Mapping is the view and projection matrices:
_projection = Matrix.PerspectiveFovRH(
                    MathUtil.DegreesToRadians(90.0f),
                    1.0f,
                    0.01f,
                    this.Transform.Scale.X);

The projection matrix always has a 90-degree field of view and an aspect ratio of one (a cube face is square); the far plane is equal to the radius of the light source.
There are six view matrices for this light source:
            _view[0] = Matrix.LookAtRH(position, position + Vector3.Right, Vector3.Up);
            _view[1] = Matrix.LookAtRH(position, position + Vector3.Left, Vector3.Up);
            _view[2] = Matrix.LookAtRH(position, position + Vector3.Up, Vector3.BackwardRH);
            _view[3] = Matrix.LookAtRH(position, position + Vector3.Down, Vector3.ForwardRH);
            _view[4] = Matrix.LookAtRH(position, position + Vector3.BackwardLH, Vector3.Up);
            _view[5] = Matrix.LookAtRH(position, position + Vector3.ForwardLH, Vector3.Up);

Each view matrix describes its own face; in DirectX 11 the cube texture face order is: Right (+X), Left (-X), Up (+Y), Down (-Y), Front (+Z), Back (-Z).
Next comes the description of the hardware depth buffer:
TextureDescription cubeDepthDescription = new TextureDescription()
				{
					ArraySize = 6,
					BindFlags = BindFlags.ShaderResource | BindFlags.DepthStencil,
					CpuAccessFlags = CpuAccessFlags.None,
					Depth = 1,
					Dimension = TextureDimension.TextureCube,
					Format = SharpDX.DXGI.Format.R32_Typeless,
					Height = CommonLight.SHADOW_CUBE_MAP_SIZE,
					MipLevels = 1,
					OptionFlags = ResourceOptionFlags.TextureCube,
					SampleDescription = new SharpDX.DXGI.SampleDescription(1, 0),
					Usage = ResourceUsage.Default,
					Width = CommonLight.SHADOW_CUBE_MAP_SIZE
				};

The bind flags say that the texture is a shader resource and, at the same time, a depth buffer.
It is also important to note that the format is set to R32_Typeless; this is a mandatory requirement when the hardware depth is going to be read from a shader.
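A typeless texture cannot be bound directly: two typed views are created over it - a depth-stencil view in D32_Float for writing and a shader resource view in R32_Float for reading. A rough SharpDX sketch, assuming device and cubeDepthTexture already exist (the pair of view formats is the important part):
var dsvDescription = new SharpDX.Direct3D11.DepthStencilViewDescription
{
	Format = SharpDX.DXGI.Format.D32_Float, // write depth as D32_Float
	Dimension = SharpDX.Direct3D11.DepthStencilViewDimension.Texture2DArray,
	Texture2DArray = new SharpDX.Direct3D11.DepthStencilViewDescription.Texture2DArrayResource
	{
		ArraySize = 6,        // all six cube faces in one view, needed for the GS path
		FirstArraySlice = 0,
		MipSlice = 0
	}
};
var cubeDepthView = new SharpDX.Direct3D11.DepthStencilView(device, cubeDepthTexture, dsvDescription);

var srvDescription = new SharpDX.Direct3D11.ShaderResourceViewDescription
{
	Format = SharpDX.DXGI.Format.R32_Float, // read the same memory back as R32_Float
	Dimension = SharpDX.Direct3D.ShaderResourceViewDimension.TextureCube,
	TextureCube = new SharpDX.Direct3D11.ShaderResourceViewDescription.TextureCubeResource
	{
		MipLevels = 1,
		MostDetailedMip = 0
	}
};
var cubeShadowResource = new SharpDX.Direct3D11.ShaderResourceView(device, cubeDepthTexture, srvDescription);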
Since we do not render into a color texture, it is enough to fill the hardware depth buffer with data:
_graphics.SetViewport(0f, 0f, (float)CommonLight.SHADOW_CUBE_MAP_SIZE, (float)CommonLight.SHADOW_CUBE_MAP_SIZE);
_graphics.SetRenderTargets((DepthStencilBuffer)light.ShadowMap);
_graphics.Clear((DepthStencilBuffer)light.ShadowMap, SharpDX.Direct3D11.DepthStencilClearFlags.Depth, 1f, 0);
_cubemapDepthResolver.Parameters["View"].SetValue(((OmnidirectionalLight)light).GetCubemapView());
_cubemapDepthResolver.Parameters["Projection"].SetValue(((OmnidirectionalLight)light).GetCubemapProjection());
scene.RenderScene(gameTime, _cubemapDepthResolver, false, 0);

We set the viewport size, bind the depth buffer, set the effect and render our scene.
Only one standard shader is needed, the vertex shader; there is no pixel shader because, once again, we do not use Render To Texture:
VertexOutput DefaultVS(VertexInput input)
{
	VertexOutput output = (VertexOutput)0;
	// Only the world transform is applied here; the per-face view and the
	// projection are applied later in the geometry shader.
	float4 worldPosition = mul(input.Position, World);
	output.Position = worldPosition;
	return output;
}

And here another shader appears, a geometry shader; it selects the face into which the depth will be written:
[maxvertexcount(18)]
void DefaultGS( triangle VertexOutput input[3], inout TriangleStream<GeometryOutput> CubeMapStream )
{
	[unroll]
	for( int f = 0; f < 6; ++f )
	{
		GeometryOutput output = (GeometryOutput)0;
		// SV_RenderTargetArrayIndex selects the cube face (array slice) to write into.
		output.RTIndex = f;
		[unroll]
		for( int v = 0; v < 3; ++v )
		{
			// The VS left the vertex in world space; apply this face's view matrix
			// and the shared 90-degree projection here.
			float4 worldPosition = input[v].Position;
			float4 viewPosition = mul(worldPosition, View[f]);
			output.Position = mul(viewPosition, Projection);
			CubeMapStream.Append( output );
		}
		CubeMapStream.RestartStrip();
	}
}

Its task is to take a triangle as input and emit six in response, each one going to its own face (the RTIndex parameter). This is what the structures look like:
cbuffer Params : register(b0)
{
        float4x4 World;
	float4x4 View[6];
	float4x4 Projection;
};
struct VertexInput
{
	float4 Position : SV_POSITION;
	//uint InstanceID : SV_InstanceID;
};
struct VertexOutput
{
	float4 Position : SV_POSITION;   
	//uint InstanceID : SV_InstanceID;
};
struct GeometryOutput
{
	float4 Position : SV_POSITION;   
	uint RTIndex : SV_RenderTargetArrayIndex;
};


Anyone who has rendered the same model multiple times may notice that instead of emitting new geometry you could use instancing and pick the desired RTIndex from the InstanceID. Yes, you can, but I got a noticeable loss of performance and did not dig into why. It turned out to be much cheaper to emit the new triangles than to use the ones produced by instancing (a rough sketch of the instanced variant follows below).
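For reference, the instanced variant might look roughly like this - a sketch only, assuming the scene is drawn with an instance count of 6 and reusing the cbuffer, VertexInput and GeometryOutput declared above:
struct InstancedVertexOutput
{
	float4 Position : SV_POSITION;
	uint InstanceID : TEXCOORD0;
};

InstancedVertexOutput InstancedVS(VertexInput input, uint instanceID : SV_InstanceID)
{
	InstancedVertexOutput output = (InstancedVertexOutput)0;
	// One instance per cube face: transform straight to that face's clip space.
	float4 worldPosition = mul(input.Position, World);
	float4 viewPosition = mul(worldPosition, View[instanceID]);
	output.Position = mul(viewPosition, Projection);
	output.InstanceID = instanceID;
	return output;
}

[maxvertexcount(3)]
void InstancedGS(triangle InstancedVertexOutput input[3], inout TriangleStream<GeometryOutput> CubeMapStream)
{
	// The GS only forwards the triangle and picks the face from the instance id.
	GeometryOutput output = (GeometryOutput)0;
	output.RTIndex = input[0].InstanceID;
	[unroll]
	for (int v = 0; v < 3; ++v)
	{
		output.Position = input[v].Position;
		CubeMapStream.Append(output);
	}
	CubeMapStream.RestartStrip();
}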
After this pass we have the hardware cube depth map, and the geometry was rendered in a single pass.
The next step is applying the shadow; my example uses Deferred Shading, but everything below is equally true for forward rendering. Now, about the problems again: we need to obtain the fragment's depth in the light source's space (to compare against the cube depth buffer), but we cannot simply project it, because we would need to know which of the six view matrices to use. I do not want to use Dynamic Flow Control, so we arrive at an interesting hack based on the fact that all six faces share the same 90-degree FOV projection:
float _vectorToDepth(float3 vec, float n, float f)
{
    // With a 90-degree FOV, the face the vector points into is the one matching its
    // largest absolute component, and that component is the view-space depth on that face.
    float3 AbsVec = abs(vec);
    float LocalZcomp = max(AbsVec.x, max(AbsVec.y, AbsVec.z));
    // Apply the projection's Z transform to that component and remap the result
    // to the [0;1] range stored in the hardware depth buffer.
    float NormZComp = (f+n) / (f-n) - (2*f*n)/(f-n)/LocalZcomp;
    return (NormZComp + 1.0) * 0.5;
}

This way we can determine, for any given vector, the corresponding depth in the light source's space.
Now we can read the depth from the shadow map using the three-dimensional vector [FragmentPosition - LightPosition], use the same vector to compute the fragment's depth in the light source's space, compare the two and determine whether the fragment is in shadow.
After running the shaders that produce the shadow and rendering the light map, we get a shadow with strong aliasing. It would be nice to filter it, and here the DirectX 11 feature Hardware PCF comes to the rescue; it is implemented through special comparison samplers:
SamplerComparisonState LightCubeShadowComparsionSampler : register(s0);

It is described as follows:
var dms4 = SharpDX.Direct3D11.SamplerStateDescription.Default();
dms4.AddressU = SharpDX.Direct3D11.TextureAddressMode.Clamp;
dms4.AddressV = SharpDX.Direct3D11.TextureAddressMode.Clamp;
dms4.Filter = SharpDX.Direct3D11.Filter.ComparisonMinMagMipLinear;
dms4.ComparisonFunction = SharpDX.Direct3D11.Comparison.Less;

And the sampling itself is done like this:
LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, lightVector, obtainedDepth).r

Here obtainedDepth is the depth returned by the _vectorToDepth function.
The result is a smoothed depth comparison (thanks to the Linear comparison filter in the sampler), which is equivalent to 2x2 bilinear PCF.

You can also do an additional 3x3 HPCF and get an even smoother result.

I completely forgot to mention one more problem: as noted earlier, the depth buffer is discrete, which means that any surface stored in it is reconstructed in steps (due to limited precision).

The surface then starts casting a shadow onto itself, producing an incorrect shadow.

This problem can be solved by shifting one of the depths by some small value (a bias) during the comparison, to smooth the issue over. The check usually looks something like cD + 0.0001 < sD. Doing the shift this way is harmful, though, because it very easily produces the "Peter Pan" effect, where the shadow detaches from the object that casts it.

DirectX 11 has standard tools that solve this problem effectively: the bias values are set in the Rasterizer State via the DepthBias and SlopeScaledDepthBias parameters.
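A rough SharpDX sketch of such a rasterizer state, bound while rendering the shadow map (device is assumed to exist and the bias values are illustrative starting points, not tuned numbers from my engine):
var shadowRasterizerDescription = new SharpDX.Direct3D11.RasterizerStateDescription
{
	FillMode = SharpDX.Direct3D11.FillMode.Solid,
	CullMode = SharpDX.Direct3D11.CullMode.Back,
	IsDepthClipEnabled = true,
	DepthBias = 100,             // constant offset in depth-buffer units
	SlopeScaledDepthBias = 2.5f, // extra offset for polygons at a steep angle to the light
	DepthBiasClamp = 0.0f
};
var shadowRasterizerState = new SharpDX.Direct3D11.RasterizerState(device, shadowRasterizerDescription);
// Bind it (e.g. deviceContext.Rasterizer.State = shadowRasterizerState) for the cube-map depth pass.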
And in this fairly simple way, smoothed point-light shadows are implemented using the capabilities of DirectX 11.

I will not publish the full code, because it is tightly coupled to my engine, but I will definitely share the shaders:
DeferredShading.fx
#include "..//pp_GBuffer.fxh"
#include "Lights.fxh"
float4 PointLightPS(float2 UV : TEXCOORD) : SV_TARGET
{
	SurfaceData surfaceData = GetSurfaceData(UV);
	float3 texelPosition = GetPosition(UV);
	float3 texelNormal = surfaceData.Normal;
	float3 vL = texelPosition - LightPosition;
	float3 L = normalize(vL);
	float3 lightColor = _calculationLight(texelNormal, L);
	float3 lightCookie = float3(1, 1, 1);
	if(IsLightCookie)
	{
		float3 rL = mul(float4(L, 1), LightRotation).xyz;
		lightCookie = LightCubeCookie.Sample(LightCubeCookieSampler, float3(rL.xy, -rL.z) ).rgb;
	}
	float shadowed = 1;
	if(IsLightShadow)
		shadowed = _sampleCubeShadowHPCF(L, vL);
	//if(IsLightShadow)
	//	shadowed = _sampleCubeShadowPCFSwizzle3x3(L, vL);
	float atten = _calcAtten(vL);
	return float4(lightColor * lightCookie * shadowed * atten, 1);
}
technique PointLightTechnique
{
	pass 
	{
		Profile = 10.0;
		PixelShader = PointLightPS;
	}
}


Lights.fxh
cbuffer LightSource : register(b1)
{
	float3 LightPosition;
	float LightRadius;
	float4 LightColor;
	float4x4 LightRotation;
	float2 LightNearFar;
	const bool IsLightCookie;
	const bool IsLightShadow;
};
TextureCube LightCubeCookie : register(t3);
SamplerState LightCubeCookieSampler : register(s1);
TextureCube LightCubeShadowMap : register(t4);
SamplerComparisonState LightCubeShadowComparsionSampler : register(s2);
SamplerState LightCubeShadowPointSampler : register(s3);
float _calcAtten(float3 vL)
{
	float3 lVec = vL / LightRadius;
	return max(0.0, 1.0 - dot(lVec,lVec));
}
float3 _calculationLight(float3 N, float3 L)
{
	return LightColor.xyz * saturate(dot(N, -L)) * LightColor.w;
}
float _vectorToDepth(float3 vec, float n, float f)
{
    float3 AbsVec = abs(vec);
    float LocalZcomp = max(AbsVec.x, max(AbsVec.y, AbsVec.z));
    float NormZComp = (f+n) / (f-n) - (2*f*n)/(f-n)/LocalZcomp;
    return (NormZComp + 1.0) * 0.5;
}
float _sampleCubeShadowHPCF(float3 L, float3 vL)
{
	float sD = _vectorToDepth(vL, LightNearFar.x, LightNearFar.y);
	return LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, float3(L.xy, -L.z), sD).r;
}
float _sampleCubeShadowPCFSwizzle3x3(float3 L, float3 vL)
{
	float sD = _vectorToDepth(vL, LightNearFar.x, LightNearFar.y);
	float3 forward = float3(L.xy, -L.z);
	float3 right = float3( forward.z, -forward.x, forward.y );
	right -= forward * dot( right, forward );
	right = normalize(right);
	float3 up = cross(right, forward );
	float tapoffset = (1.0f / 512.0f);
	right *= tapoffset;
	up *= tapoffset;
	float3 v0;
	v0.x = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward - right - up, sD).r;
	v0.y = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward - up, sD).r;
	v0.z = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward + right - up, sD).r;
	float3 v1;
	v1.x = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward - right, sD).r;
	v1.y = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward, sD).r;
	v1.z = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward + right, sD).r;
	float3 v2;
	v2.x = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward - right + up, sD).r;
	v2.y = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward + up, sD).r;
	v2.z = LightCubeShadowMap.SampleCmpLevelZero(LightCubeShadowComparsionSampler, forward + right + up, sD).r;
	return dot(v0 + v1 + v2, .1111111f);
}
// UE4: https://github.com/EpicGames/UnrealEngine/blob/release/Engine/Shaders/ShadowProjectionCommon.usf
static const float2 DiscSamples5[]=
{ // 5 random points in disc with radius 2.500000
	float2(0.000000, 2.500000),
	float2(2.377641, 0.772542),
	float2(1.469463, -2.022543),
	float2(-1.469463, -2.022542),
	float2(-2.377641, 0.772543),
};
float _sampleCubeShadowPCFDisc5(float3 L, float3 vL)
{
	float3 SideVector = normalize(cross(L, float3(0, 0, 1)));
	float3 UpVector = cross(SideVector, L);
	SideVector *= 1.0 / 512.0;
	UpVector *= 1.0 / 512.0;
	float sD = _vectorToDepth(vL, LightNearFar.x, LightNearFar.y);
	float3 nlV = float3(L.xy, -L.z);
	float totalShadow = 0;
	[unroll] for(int i = 0; i < 5; ++i)
	{
			float3 SamplePos = nlV + SideVector * DiscSamples5[i].x + UpVector * DiscSamples5[i].y;
			totalShadow += LightCubeShadowMap.SampleCmpLevelZero(
				LightCubeShadowComparsionSampler, 
				SamplePos, 
				sD);
	}
	totalShadow /= 5;
	return totalShadow;
}


CubeDepthResolver.fxh
cbuffer Params : register(b0)
{
    float4x4 World;
	float4x4 View[6];
	float4x4 Projection;
};
struct VertexInput
{
	float4 Position : SV_POSITION;
	//uint InstanceID : SV_InstanceID;
};
struct VertexOutput
{
	float4 Position : SV_POSITION;   
	//uint InstanceID : SV_InstanceID;
};
struct GeometryOutput
{
	float4 Position : SV_POSITION;   
	uint RTIndex : SV_RenderTargetArrayIndex;
};
VertexOutput DefaultVS(VertexInput input)
{
	VertexOutput output = (VertexOutput)0;
	float4 worldPosition = mul(input.Position, World);
	output.Position = worldPosition;
	//output.InstanceID = input.InstanceID;
	return output;
}
[maxvertexcount(18)]
void DefaultGS( triangle VertexOutput input[3], inout TriangleStream<GeometryOutput> CubeMapStream )
{
	[unroll]
    for( int f = 0; f < 6; ++f )
    {
		{
	        GeometryOutput output = (GeometryOutput)0;
			output.RTIndex = f;
			[unroll]
			for( int v = 0; v < 3; ++v )
			{
				float4 worldPosition = input[v].Position;
				float4 viewPosition = mul(worldPosition, View[f]);
				output.Position = mul(viewPosition, Projection);
				CubeMapStream.Append( output );
			}
			CubeMapStream.RestartStrip();
        }
    }
}
technique CubeDepthResolver
{
	pass DefaultPass
	{
		Profile = 10.0;
		VertexShader = DefaultVS;
		GeometryShader = DefaultGS;
		PixelShader = null;
	}
}



If you have questions or need help, I will be happy to assist; you can find my contacts in my profile.

Upcoming articles:
  • Implementation of Deferred Rendered Water
  • Physically-Based Rendering Without IBL

P.S.
Dear reader, if you like to read articles carefully and you spot a typo or an inaccuracy, please do not rush to write a comment; send me a private message instead, and I will definitely thank you!
