Intel Gen11 GPU architecture and discrete graphics card from Intel
Entry-level discrete graphics card Intel Graphics Xe, officially announced on March 20 at the GDC 2019 game conference
Intel has published documentation on its Gen11 GPUs describing how they differ from the previous generation. The Gen11 architecture is expected to become the basis for the future discrete Xe graphics card, so the technologies described here can be read as a preview of at least some of the features those cards will implement. So far, Intel has not said anything about the future cards; it has only shown a few photos (or renderings).
Architecture of an Intel Core processor: system on a chip (SoC) and ring interconnect
Historically, Intel's mid-range GT2 graphics processors for desktop computers and some mobile chips were inferior in performance to AMD's. In such comparisons, Intel gained its advantage from a more powerful CPU than the APUs derived from AMD's Bulldozer microarchitecture. Now the situation has changed: Ryzen has a much more efficient CPU core, and AMD Ryzen mobile processors are much more competitive with Intel's. Intel therefore needs to respond, including by solving its GPU performance problem.
Detailed Gen11 block diagram
The technical documentation alone makes it difficult to judge Gen11's performance, but some experts believe that Intel will be able to compete with AMD much more effectively than it ever has before.
The new Intel GT2 configuration provides 64 execution units (EUs), compared with 24 in Skylake-class processors. This significant expansion of on-chip resources should improve overall performance over the previous generation. The table below compares the characteristics of the Gen9 and Gen11 graphics subsystems.
Key metrics Gen9 and Gen11
According to the specifications, compute performance in Gen11 grows by about 2.67 times (64 EUs versus 24), as does texture sampling throughput. The throughput of the raster operations units (ROPs) has doubled, as has the number of hierarchical Z (HiZ) tests per clock.
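The scaling figure follows directly from the EU counts quoted above. The short Python sketch below reproduces it. The per-EU figure of 16 FP32 operations per clock (two SIMD4 FPUs, counting a fused multiply-add as two operations) is an assumption that this article does not state, and all names are illustrative.

```python
# EU counts for the GT2 configurations, as quoted in the article.
GEN9_EUS = 24
GEN11_EUS = 64

# Assumption (not from the article): each EU has two SIMD4 FPUs and an FMA
# counts as two operations, giving 2 * 4 * 2 = 16 FP32 operations per clock.
FP32_OPS_PER_EU_PER_CLOCK = 2 * 4 * 2


def peak_fp32_ops_per_clock(eu_count: int) -> int:
    """Theoretical peak FP32 operations per clock for a given EU count."""
    return eu_count * FP32_OPS_PER_EU_PER_CLOCK


scaling = GEN11_EUS / GEN9_EUS
print(f"Gen9:  {peak_fp32_ops_per_clock(GEN9_EUS)} FP32 ops/clock")
print(f"Gen11: {peak_fp32_ops_per_clock(GEN11_EUS)} FP32 ops/clock")
print(f"Scaling: {scaling:.2f}x")
```

The per-EU constant cancels out of the ratio, so the 2.67 figure depends only on the EU counts and assumes equal clock speeds for both generations.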
The L3 cache has quadrupled in size, and the GPU's write throughput has doubled to 64 bytes per clock. Memory bandwidth with DDR4 should remain the same, but LPDDR4 support theoretically allows higher memory clock speeds.
The last-level cache is shared between the GPU and the CPU to reduce data traffic. The video decoder blocks have been improved to reduce bitrate, and they can decode multiple 4K and 8K streams simultaneously. Support for adaptive synchronization has been added, and HD video decoding has been improved.
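To put the per-clock write figure in more familiar units, the sketch below converts bytes per clock into GB/s. The 1.0 GHz GPU clock is a hypothetical value chosen for illustration only, and the 32-byte Gen9 figure is simply half of the 64 bytes quoted above.

```python
def write_bandwidth_gb_s(bytes_per_clock: int, clock_ghz: float) -> float:
    """Peak write bandwidth in GB/s: bytes per clock times clocks per nanosecond."""
    return bytes_per_clock * clock_ghz


# Hypothetical GPU clock, not taken from the article.
ASSUMED_CLOCK_GHZ = 1.0

gen9_write = write_bandwidth_gb_s(32, ASSUMED_CLOCK_GHZ)   # half of the Gen11 figure
gen11_write = write_bandwidth_gb_s(64, ASSUMED_CLOCK_GHZ)  # 64 bytes/clock, per the article
print(f"Gen9: {gen9_write} GB/s, Gen11: {gen11_write} GB/s")
```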
The GPU now has shared local memory, which does not block access to the L3 cache when reading. Intel claims that this reduces latency and improves the efficiency of atomic operations.
The memory hierarchy at the SoC chip level and its maximum theoretical bandwidth
Intel claims to have significantly improved the overall memory bandwidth in Gen11.
The documentation describes two new technologies that Intel implemented in the graphics accelerator:
- Coarse Pixel Shading (CPS);
- Position Only Shading (POSH).
Coarse pixel shading reduces the load on the GPU by cutting the number of color samples used to render the image. The screenshot illustrates that CPS has almost no effect on rendering quality.
Screenshot from Citadel 1 at a resolution of 2560 × 1440 (pixel rate 1 × 1 on the left and 2 × 2 on the right). Although coarse pixel shading reduces the number of shader invocations, there is virtually no noticeable difference on a high pixel density display. For comparison, an upscaled image rendered at 1280 × 720 without anti-aliasing is also shown.
Reducing the number of pixel shader invocations saves energy and improves performance, that is, the frame rate, by 20-40%.
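The saving in shader work can be estimated with simple arithmetic. The sketch below counts pixel shader invocations for one full-screen pass at the screenshot's resolution, under the simplifying assumptions that every pixel is covered exactly once and that one invocation shades a whole block of pixels. The function name and parameters are illustrative and are not part of any Intel API.

```python
def shader_invocations(width: int, height: int, rate_x: int = 1, rate_y: int = 1) -> int:
    """Invocations for one full-screen pass with one invocation per rate_x x rate_y block.

    Simplifying assumption: each pixel is covered exactly once (no overdraw).
    """
    blocks_x = -(-width // rate_x)   # ceiling division
    blocks_y = -(-height // rate_y)
    return blocks_x * blocks_y


full = shader_invocations(2560, 1440)          # 1 x 1 pixel rate
coarse = shader_invocations(2560, 1440, 2, 2)  # 2 x 2 pixel rate
print(full, coarse, full / coarse)
```

At a 2 × 2 rate the invocation count equals the pixel count of a 1280 × 720 frame, which is the comparison resolution in the screenshot. The frame rate gain quoted in the article is smaller than this fourfold reduction, presumably because pixel shading is only part of the per-frame work and the coarse rate is not applied to everything on screen.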
In this image, objects in red frames are identified as distant from the camera and of little importance to overall image quality, so their detail can be reduced without a significant visual impact, which in turn raises the frame rate.
The POSH pipeline runs the position shader in parallel with the main application and usually produces its result much faster, the documentation says. It is part of the Position Only Tile-Based Rendering (PTBR) system.
In general, Gen11 will be a significant update for Intel processors. The first two generations of AMD Ryzen Mobile competed with Skylake's weak graphics. The third-generation Ryzen Mobile APU, whenever it comes out, will have to compete with a much more powerful Intel chip, writes ExtremeTech.