
What's New in Direct3D 11.2

The news in brief.
Most of the work done in DirectX 11.2 concerns performance and efficiency and does not directly affect programmers: your applications will simply run faster and require fewer resources. However, Direct3D 11.2 also includes a number of new APIs:
- Support for hardware overlays: a dynamic scaling tool that enables some interesting scenarios.
- Compiling and linking HLSL shaders at run time: a feature that lets you build shaders while the application is running, including in Windows Store applications.
- Mappable buffers: a feature that eliminates extra data copy operations when exchanging data with the GPU.
- API for reducing input latency: a mechanism that can significantly reduce the delay between user input and output to the screen.
- Tiled resources: better rendering quality from large textures while using less video memory.
Support for hardware overlays.
On almost any modern graphics accelerator, scaling an image is a very cheap operation. This opens up several useful scenarios for cases where resources are scarce or the rendering speed drops.

As the picture shows, a hardware overlay lets you render into a low-resolution buffer, then enlarge that image to the required size and blend it with additional buffers through an alpha mask. A game can draw its 3D scene in the first overlay at reduced quality, while the HUD and other graphic elements of the application are displayed at full quality.
Two main scenarios for using hardware overlays are supported: static and dynamic.
Static overlay.
This type of overlay fixes the scale factor when the swap chain is created and does not change it afterwards. To set it up, specify reduced buffer dimensions together with the DXGI_SCALING_STRETCH scaling mode:
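DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {0};
// Render at two thirds of the screen resolution in each dimension.
swapChainDesc.Width = static_cast<UINT>(screenWidth / 1.5f);
swapChainDesc.Height = static_cast<UINT>(screenHeight / 1.5f);
// Let the hardware stretch the back buffer to the window size.
swapChainDesc.Scaling = DXGI_SCALING_STRETCH;
...
dxgiFactory->CreateSwapChainForCoreWindow(
    m_d3dDevice.Get(),
    reinterpret_cast<IUnknown*>(m_window.Get()),
    &swapChainDesc,
    nullptr,
    &swapChain
);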
This method only suits cases where you know the scale factor in advance.
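To see how much work a static scale factor saves, the arithmetic can be sketched in plain C++. This is only a sketch, not DXGI code: `Extent`, `ScaleExtent`, and `PixelFraction` are hypothetical helpers introduced here for illustration, and rounding to the nearest pixel is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: a width/height pair in pixels.
struct Extent {
    std::uint32_t width;
    std::uint32_t height;
};

// Divide both dimensions by `divisor`, rounding to the nearest pixel
// and never going below 1 pixel (assumed policy, not a DXGI rule).
inline Extent ScaleExtent(std::uint32_t width, std::uint32_t height, float divisor) {
    assert(divisor > 0.0f);
    auto scale = [divisor](std::uint32_t v) {
        const float scaled = std::round(static_cast<float>(v) / divisor);
        return std::max<std::uint32_t>(1u, static_cast<std::uint32_t>(scaled));
    };
    return Extent{scale(width), scale(height)};
}

// Fraction of the full-resolution pixel count that is actually rendered.
inline double PixelFraction(Extent scaled, Extent full) {
    assert(full.width > 0 && full.height > 0);
    return (static_cast<double>(scaled.width) * scaled.height) /
           (static_cast<double>(full.width) * full.height);
}
```

For a 1920x1080 screen and the divisor 1.5 used above, the back buffer becomes 1280x720, which is roughly 44% of the original pixel count.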
Dynamic overlay.
A more interesting option, in which the scale factor can change on the fly without recreating the swap chain. You only need to call SetSourceSize before rendering a frame:
DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {0};
swapChainDesc.Width = screenWidth;
swapChainDesc.Height = screenHeight;
swapChainDesc.Scaling = DXGI_SCALING_STRETCH;
dxgiFactory->CreateSwapChainForCoreWindow( ... );
...
// When the frame rate drops, render into a smaller region of the back buffer.
if (fps_low) {
    swapChain->SetSourceSize(
        static_cast<UINT>(screenWidth * 0.8f),
        static_cast<UINT>(screenHeight * 0.8f));
}
// Render the frame.
...
swapChain->Present(1, 0);
A dynamic overlay lets you change the picture quality instantly, depending on the current load on the hardware, without sacrificing FPS. Sometimes even a 10% reduction in the resolution of the final image noticeably speeds up rendering, which helps in heavily loaded dynamic scenes. Players no longer feel the game lagging when too many objects are displayed on the screen.
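The decision of when to shrink or restore the source size can be sketched as a small controller in plain C++. Everything here is an assumption made for illustration: `ResolutionScaler`, the 10%-over and 15%-under thresholds, the 10-point step, and the 50% floor are hypothetical and not part of DXGI. Only the resulting width and height would be passed to SetSourceSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type: the size to hand to SetSourceSize.
struct SourceSize {
    std::uint32_t width;
    std::uint32_t height;
};

// Hypothetical controller that picks a render scale (in percent)
// from the duration of the previous frame.
class ResolutionScaler {
public:
    explicit ResolutionScaler(double targetFrameMs, int minPercent = 50, int stepPercent = 10)
        : targetFrameMs_(targetFrameMs), minPercent_(minPercent), stepPercent_(stepPercent) {
        assert(targetFrameMs > 0.0);
        assert(minPercent >= 1 && minPercent <= 100);
        assert(stepPercent >= 1);
    }

    // Feed the last frame time; returns the scale (percent) for the next frame.
    int Update(double lastFrameMs) {
        if (lastFrameMs > targetFrameMs_ * 1.10) {
            // Over budget: lower the resolution, but not below the floor.
            percent_ = std::max(minPercent_, percent_ - stepPercent_);
        } else if (lastFrameMs < targetFrameMs_ * 0.85) {
            // Comfortably under budget: restore resolution step by step.
            percent_ = std::min(100, percent_ + stepPercent_);
        }
        // Inside the band nothing changes, which avoids oscillation.
        return percent_;
    }

    // Source size for the current scale, never smaller than 1x1.
    SourceSize Apply(std::uint32_t fullWidth, std::uint32_t fullHeight) const {
        auto scale = [this](std::uint32_t v) {
            const std::uint64_t scaled = static_cast<std::uint64_t>(v) * percent_ / 100;
            return std::max<std::uint32_t>(1u, static_cast<std::uint32_t>(scaled));
        };
        return SourceSize{scale(fullWidth), scale(fullHeight)};
    }

private:
    double targetFrameMs_;
    int minPercent_;
    int stepPercent_;
    int percent_ = 100;
};
```

The dead band between the two thresholds is a design choice: without it, a frame time hovering near the target would make the resolution flip back and forth every frame.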
Compilation and linking of shaders.
Compiling shaders dynamically is a very convenient way to optimize an application while it is running. Unfortunately, in Windows 8.0 this feature was not available to Windows Store applications, and developers had to create binary shader blobs in advance. With the release of Windows 8.1, the feature is available to Windows Store apps again.
In addition, a new 'lib_5_0' compilation target has appeared. It lets you compile shader building blocks into libraries ahead of time and then, while the program runs, only link shaders from these ready-made libraries instead of compiling them. This can significantly reduce the time it takes to build a shader and eliminates the expensive compilation step during application execution.
Mapped buffers.
In Windows 8.0, exchanging data with the GPU for compute shaders requires auxiliary buffers. The extra copy operations have a cost, and for compute shaders in particular that cost can be significant.

If you use Windows 8.1 and DirectX 11.2, you can remove the two auxiliary copy operations by creating the default buffer with CPU access flags (D3D11_CPU_ACCESS_READ / D3D11_CPU_ACCESS_WRITE). The picture then looks as follows:

This makes it possible to improve the performance of compute shaders. Note that so far the feature only works for data buffers, not for textures (Texture1D/2D/3D). In any case, the developer has a simple way to check for support and either map the buffer directly or fall back to an auxiliary buffer:
D3D11_FEATURE_DATA_D3D11_OPTIONS1 featureOptions = {};
m_deviceResources->GetD3DDevice()->CheckFeatureSupport(
    D3D11_FEATURE_D3D11_OPTIONS1,
    &featureOptions,
    sizeof(featureOptions)
);
...
if (featureOptions.MapOnDefaultBuffers) {
    // The default buffer can be mapped directly.
    deviceContext->Map(defaultBuffer, ...);
} else {
    // Fall back to copying through a staging buffer.
    deviceContext->CopyResource(stagingBuffer, defaultBuffer);
    deviceContext->Map(stagingBuffer, ...);
}
API to reduce input delays
The time between user input and the display of its result on the screen is critical for many applications, especially games. If it is too long, the player perceives lag and discomfort. Optimizing this time is painstaking work, but with DirectX 11.2 programmers get an additional mechanism that makes the task much easier. The new IDXGISwapChain2::GetFrameLatencyWaitableObject API returns a waitable handle, which you can pass to WaitForSingleObjectEx to wait for the best moment to start rendering:
DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {0};
...
swapChainDesc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
dxgiFactory->CreateSwapChainForCoreWindow( ... );
HANDLE frameLatencyWaitableObject = swapChain->GetFrameLatencyWaitableObject();
while (m_windowVisible)
{
    WaitForSingleObjectEx(
        frameLatencyWaitableObject,
        INFINITE,
        true
    );
    Render();
    swapChain->Present(1, 0);
}
For example, on devices such as Surface, using this API can cut latency by more than half, from 46 milliseconds to 20 milliseconds.
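To put these numbers in perspective, the reduction can be expressed as a percentage and as a number of display refresh intervals. The sketch below is plain arithmetic. `ReductionPercent` and `LatencyInFrames` are hypothetical helper names, and the 60 Hz refresh rate used in the note afterwards is an assumption, not a figure from the measurement.

```cpp
#include <cassert>

// Percentage by which latency dropped between two measurements.
inline double ReductionPercent(double beforeMs, double afterMs) {
    assert(beforeMs > 0.0);
    return (beforeMs - afterMs) / beforeMs * 100.0;
}

// Latency expressed in refresh intervals of a display running at refreshHz.
inline double LatencyInFrames(double latencyMs, double refreshHz) {
    assert(refreshHz > 0.0);
    return latencyMs * refreshHz / 1000.0;
}
```

Going from 46 ms to 20 ms is a reduction of about 56.5%. On an assumed 60 Hz display this corresponds to going from roughly 2.8 refresh intervals to 1.2.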
Tiled resources

Modern games require more and more video memory, not least for textures. The quality of the final image depends directly on the quality of the textures and their resolution. One way to reduce the amount of video memory used is the tiled resources mechanism in DirectX 11.2. To understand what it is about, it is best to watch the three-minute video from the Build keynote.
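The memory saving can be illustrated with a little arithmetic. Direct3D 11 tiled resources are divided into 64 KB tiles, which for a 2D texture with 32 bits per texel means 128x128 texels per tile. The sketch below is plain C++, not Direct3D code: `TilesNeeded` and `ResidentBytes` are hypothetical helpers, and mip levels are ignored for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Size of one tile of a tiled resource, in bytes.
constexpr std::uint64_t kTileBytes = 64 * 1024;
// Tile edge in texels for a 32-bit-per-texel 2D texture:
// 128 * 128 texels * 4 bytes = 64 KB.
constexpr std::uint32_t kTileEdge32bpp = 128;

// Number of tiles that cover the top mip level of a 32bpp 2D texture.
// Partially covered tiles at the edges count as whole tiles.
inline std::uint64_t TilesNeeded(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    const std::uint64_t tilesX = (width + kTileEdge32bpp - 1) / kTileEdge32bpp;
    const std::uint64_t tilesY = (height + kTileEdge32bpp - 1) / kTileEdge32bpp;
    return tilesX * tilesY;
}

// Video memory occupied by the given number of resident tiles.
inline std::uint64_t ResidentBytes(std::uint64_t residentTiles) {
    return residentTiles * kTileBytes;
}
```

A 16384x16384 texture at 32 bits per texel consists of 16384 tiles, or 1 GB if fully resident. If only 1024 of those tiles are actually needed for the current view, they occupy 64 MB.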