Modern I/O devices are faster than processors. Article overview
I want to talk about the article "I/O Is Faster Than the CPU - Let's Partition Resources and Eliminate (Most) OS Abstractions", published on the personal page of Pekka Enberg, one of the developers of ScyllaDB. I learned about it from a video.
The authors were scheduled to present the paper at the HotOS XVII (Hot Topics in Operating Systems) workshop on May 12-15, 2019. As far as I understand, the workshop is a venue for discussing work that is still at an early stage.
This post is a news item meant to prompt curious readers to think about the topic and to share their thoughts in the comments.
On servers with fast programmable network cards and non-volatile memory, I/O is approaching the speed of volatile RAM, while the speed of a single processor core has stagnated. Applications cannot take advantage of modern hardware because they are forced to use interfaces built on abstractions that assume slow I/O.
The authors propose their own OS structure, called the parakernel. It eliminates most OS abstractions and gives applications an interface through which they can use the full potential of the hardware. The parakernel facilitates application-level parallelism by securely partitioning resources and multiplexing only those resources that cannot be partitioned.
The architecture of modern operating systems was designed when I/O was much slower and applications spent their time waiting for it. Today, I/O devices can easily saturate the processor.
According to the authors, modern network stacks do too much work per packet. In addition, operating systems typically implement the POSIX socket API, which incurs high costs from context switches and CPU cache pollution.
A modern 40 Gbps network card can receive a cache-line-sized packet every 5 ns, while the access latency of the processor's LLC (last-level cache) is about 15 ns.
As an example of the difficulty, Linux developed the POSIX AIO interface, which was meant to provide simple and efficient asynchronous I/O. Implementing, maintaining, and using such an interface while preserving POSIX semantics turned out to be very difficult, and it was abandoned in favor of the new io_uring.
What is the proposed solution
The new OS structure, which the authors call the parakernel, is designed to simplify parallelization. Resources are partitioned among applications, which have full control over them; resources that cannot be partitioned are multiplexed by the kernel.
Sharing resources in multi-core systems requires synchronization between processor cores, which hinders parallelism at the application level. This obstacle can be reduced by partitioning resources between processor cores.
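The idea of partitioning state between cores instead of sharing it can be illustrated with a small user-space sketch. This is not code from the paper or from the prototype: the names `NUM_CORES`, `owner_of`, `Shard`, and `count_keys` are hypothetical, the threads are not pinned to cores, and in CPython the global interpreter lock prevents real parallel speedup. The sketch only shows the ownership rule: every key belongs to exactly one worker, so the worker's data needs no lock, and the only cross-thread channel is its inbox.

```python
import queue
import threading
import zlib

NUM_CORES = 4  # hypothetical core count, chosen only for this sketch


def owner_of(key: str) -> int:
    """Map a key to the single shard (core) that owns it."""
    return zlib.crc32(key.encode()) % NUM_CORES


class Shard(threading.Thread):
    """One worker per core. Only this thread reads or writes `counts`."""

    def __init__(self, shard_id: int):
        super().__init__()
        self.shard_id = shard_id
        self.inbox = queue.Queue()  # the only cross-thread channel
        self.counts = {}            # private state, never shared while running

    def run(self):
        while True:
            key = self.inbox.get()
            if key is None:  # shutdown sentinel
                break
            self.counts[key] = self.counts.get(key, 0) + 1


def count_keys(keys):
    """Route each key to its owning shard, then merge the disjoint results."""
    shards = [Shard(i) for i in range(NUM_CORES)]
    for shard in shards:
        shard.start()
    for key in keys:
        shards[owner_of(key)].inbox.put(key)
    for shard in shards:
        shard.inbox.put(None)
    for shard in shards:
        shard.join()
    merged = {}
    for shard in shards:
        merged.update(shard.counts)  # safe: workers have stopped, keys are disjoint
    return merged
```

Calling `count_keys(["a", "b", "a"])` yields a count of 2 for `"a"` and 1 for `"b"`. A real thread-per-core system would additionally pin each worker to a core and give it its own I/O queues; that part is outside the scope of this sketch.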
Some operating system abstractions limit I/O performance. The authors present an OS structure that partitions the resources that can be partitioned and multiplexes the ones that must be shared. The parakernel simplifies application-level parallelism and complements the thread-per-core design.
The parakernel prototype is written in Rust and is currently under development. The paper does not name the operating system, but I found other material by one of the authors about the Manticore operating system, from which I conclude that its repository is where this work is being developed.
What's in the rest of the world
As it turns out, processor manufacturers are not standing still either and are also trying to solve the problem of the slow layer between their products and their users. Many people dislike having the operating system kernel as a performance bottleneck.
Intel has some interesting innovations, described in more detail in a separate article. Here is an excerpt from it:
- Intel Volume Management Device (Intel VMD). Lets you work with NVM Express drives directly, "handing" the device straight to the storage system. This makes full SSD hot-swap, status indication, and Intel VROC possible.
- Intel Virtual RAID on CPU (Intel VROC). Lets you build RAID arrays from NVMe drives using the processor, so you can do without software solutions or additional adapters when creating arrays of high-speed PCIe SSDs.
- Internet Wide-Area RDMA Protocol (iWARP). This RDMA extension is now supported by the integrated Intel X722 network adapters, which provide four 10-gigabit (or gigabit) Ethernet ports. As a reminder, RDMA lets one machine access data in another machine's memory directly over the network, bypassing the operating system kernel.
It is always interesting to learn about new concepts in long-established systems.
Please point out any errors and suggest additions in the comments.
UPD: This article is being amended with the help of the community.
Thanks for the help:
A quick plug for the Zinc Prod podcast, where we will discuss this article topic by topic.