Cache features and real-time on x86

    [Picture: a Siemens Simatic portable computer for industrial automation]

    This post continues my series on using x86 hardware in real-time systems. In the earlier posts I briefly described how x86 meets real-time requirements and what gets in the way.

    A short lyrical digression. Real-time systems are one of the least-known drivers of computing progress. For instance, the first laptop computer owes its existence to them. For some reason it is now commonly believed that the first mass-produced laptop was the Osborne. In fact, the device in the picture above was built at Siemens as a tool for controlling and programming industrial automation two years before the Osborne. Portable computers of this family (Siemens Simatic) are still produced today, although the hardware has of course changed many times.

    Now, down to business. In this post I will focus on one of the factors that undermine the predictability of real-time code execution time. What follows is not long, but it is rather dry.

    Efficient use of the cache pays off for most workloads, not just real-time ones. A very real example: one core runs code that wakes up once every millisecond, polls sensors, executes PLC code and controls some piece of hardware. Meanwhile, the other core runs the GUI code, which displays all of this on a monitor and lets the operator intervene from time to time. A modern GUI is quite “heavy” and happily uses all the cache it can get, and that cache, by the way, is shared with the first core. So when the real-time code wakes up, it finds none of its data in the shared cache and has to fetch it from memory again, spending tens or hundreds of microseconds.
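    To get a feel for the effect, here is a rough single-threaded C sketch (using POSIX `clock_gettime`) that approximates the scenario on one core: it times a pass over a small working set while it is cached, streams through a large buffer standing in for the GUI, then times the same pass again. The buffer sizes, the 64-byte line size and the function names are assumptions for illustration. Because the accesses are sequential, the hardware prefetchers will hide part of the miss cost, so real numbers depend heavily on the machine.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumed sizes: a small "real-time" working set and a large buffer
   that stands in for the GUI running on the other core. */
#define RT_BYTES  (64u * 1024u)
#define GUI_BYTES (64u * 1024u * 1024u)
#define LINE      64u   /* assumed cache line size */

static volatile uint64_t sink;  /* keeps the loads from being optimized away */

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Read one byte per cache line; return the elapsed time in nanoseconds. */
static uint64_t touch(const volatile uint8_t *buf, size_t len) {
    uint64_t start = now_ns(), acc = 0;
    for (size_t i = 0; i < len; i += LINE)
        acc += buf[i];
    sink = acc;
    return now_ns() - start;
}

/* Time one pass over the RT data while it is cached (*warm_ns) and one
   pass after the "GUI" buffer has pushed it out (*cold_ns).
   Returns 0 on success, -1 if allocation fails. */
int measure_eviction(uint64_t *warm_ns, uint64_t *cold_ns) {
    uint8_t *rt = malloc(RT_BYTES);
    uint8_t *gui = malloc(GUI_BYTES);
    if (rt == NULL || gui == NULL) {
        free(rt);
        free(gui);
        return -1;
    }
    memset(rt, 1, RT_BYTES);
    memset(gui, 2, GUI_BYTES);       /* commit the pages up front */
    touch(rt, RT_BYTES);             /* load the RT data into the cache */
    *warm_ns = touch(rt, RT_BYTES);  /* RT data should now hit in the cache */
    touch(gui, GUI_BYTES);           /* "GUI" activity evicts the RT data */
    *cold_ns = touch(rt, RT_BYTES);  /* RT data comes from memory again */
    free(rt);
    free(gui);
    return 0;
}
```

    On a typical machine `cold_ns` comes out larger than `warm_ns`; the exact ratio depends on the cache sizes and on how well the prefetchers cope.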

    In general, the x86 architecture offers few ways to manage the cache from software. Here are all of them; they fit on the fingers of one hand:

    1. PREFETCHx: pull a line from memory into the cache before it is needed
    2. CLFLUSH, WBINVD (a very “evil” instruction, by the way): flush a single line or the entire cache
    3. Non-temporal moves (MOVNTDQ / MOVNTDQA / MOVNTPS) and ordering control (LFENCE / SFENCE / MFENCE): control over how individual data operations use the cache
    4. Any indirect methods. By this I mean clever ways of storing and accessing data, for example layouts that are friendlier to the hardware prefetchers
    5. Direct writes into the cache by DMA. This little-known feature matters mainly to peripheral manufacturers.
    You can also disable the cache altogether, but that is a bit too extreme.
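    Items 1 to 3 are reachable from C through the compiler intrinsics in `<xmmintrin.h>` and `<emmintrin.h>` (available in GCC and Clang). The sketch below assumes x86-64, where SSE2 is always present, and 64-byte cache lines; the helper names are made up for illustration. WBINVD is not shown: it is a privileged instruction and cannot be executed from user space.

```c
#include <emmintrin.h>  /* SSE2: _mm_clflush, _mm_mfence, _mm_stream_si128 */
#include <xmmintrin.h>  /* SSE: _mm_prefetch, _mm_sfence */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size */

/* Item 1: hint the CPU to pull the line holding p into all cache levels. */
static inline void warm_line(const void *p) {
    _mm_prefetch((const char *)p, _MM_HINT_T0);
}

/* Item 2: write back (if dirty) and invalidate every line of a buffer,
   starting from the line that contains its first byte. */
static void flush_buffer(const void *buf, size_t len) {
    uintptr_t addr = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    for (; addr < end; addr += CACHE_LINE)
        _mm_clflush((const void *)addr);
    _mm_mfence();  /* order the flushes before any later memory access */
}

/* Item 3: fill a buffer with non-temporal stores (MOVNTDQ), which are meant
   to minimize cache pollution instead of evicting other data.
   dst must be 16-byte aligned and len a multiple of 16. */
static void stream_fill(void *dst, size_t len, int32_t value) {
    __m128i v = _mm_set1_epi32(value);
    __m128i *d = (__m128i *)dst;
    for (size_t i = 0; i < len / sizeof(__m128i); i++)
        _mm_stream_si128(d + i, v);
    _mm_sfence();  /* make the weakly ordered streaming stores visible */
}
```

    The fences matter: streaming stores are weakly ordered, so without SFENCE another core could observe them out of order relative to later ordinary stores.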

    As you can see, none of these methods resembles cache lockdown, a feature available on ARM, or the similar features on MIPS. Being able to reserve a slice of the cache would be useful to real-time developers, for example in the scenario described above. Perhaps something like it will appear on x86 one day, although it goes against the idea of memory operations being transparent to software. For now, though, there is a palliative.

    [Picture: physical address bits versus the bits that select a cache set, with 5 bits in common]

    The picture shows that the physical address and the cache set index have 5 bits in common, and these bits lie above the page offset. That is just how it happened to work out. Pages with the same value of these bits map to the same group of cache sets, and pages with different values never share a set. This lets you “color” physical memory page by page in 2^5 = 32 colors. Why bother? When the OS kernel builds an application's address space, it can back that application's virtual memory only with physical pages of a single color. If the main consumers of the cache are given memory of different colors, their data cannot evict each other from the cache.
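    The color arithmetic is short enough to write down. The number of colors is the cache size divided by the associativity times the page size. As a hypothetical example, a 2 MB, 16-way cache with 64-byte lines has 2048 sets, so the set index occupies address bits 6 to 16; with 4 KB pages the frame number starts at bit 12, so the overlap is bits 12 to 16, and 2 MB / (16 × 4 KB) = 32 colors. A page's color is then just the low bits of its physical frame number. The cache parameters and function names are assumptions for illustration.

```c
#include <stdint.h>

/* Number of page colors: how many pages fit in one way of the cache,
   i.e. cache_size / (ways * page_size). */
unsigned num_colors(uint64_t cache_size, unsigned ways, uint64_t page_size) {
    return (unsigned)(cache_size / ((uint64_t)ways * page_size));
}

/* Color of a physical address: the set-index bits above the page offset,
   which are the low bits of the physical page frame number. */
unsigned page_color(uint64_t paddr, uint64_t page_size, unsigned colors) {
    return (unsigned)((paddr / page_size) % colors);
}
```

    With these parameters, frames that are 32 pages (128 KB) apart share a color and compete for the same sets, while neighbouring frames never do.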

    In the example above, giving the two tasks different colors clearly solves the problem. The GUI gets less cache, but the operator will most likely not notice any slowdown. There is, of course, one big drawback: this applies only to user-space virtual memory, so if something in the kernel also wants a lot of cache, nothing can stop it.
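    A frame allocator that enforces such a split could look roughly like the following C sketch. It is entirely hypothetical: the structure, function names and masks are invented for illustration, and it only hands out frames, never frees them. Each task passes a bitmask of allowed colors; here the real-time task is assumed to get colors 0 to 3 (1/8 of the cache) and the GUI the remaining 28.

```c
#include <stdint.h>

#define PAGE_BYTES 4096u
#define NCOLORS    32u

/* Hypothetical bump allocator for physical page frames, one cursor per color. */
typedef struct {
    uint64_t next_pfn[NCOLORS];  /* next free frame number of each color */
    uint64_t limit_pfn;          /* first frame number past the end of RAM */
    unsigned cursor;             /* color to try first on the next call */
} color_alloc;

void color_alloc_init(color_alloc *a, uint64_t mem_bytes) {
    for (unsigned c = 0; c < NCOLORS; c++)
        a->next_pfn[c] = c;      /* frame c is the first frame of color c */
    a->limit_pfn = mem_bytes / PAGE_BYTES;
    a->cursor = 0;
}

/* Return the physical address of a free page whose color is allowed by
   color_mask (bit c set = color c allowed), rotating through the allowed
   colors; return UINT64_MAX when none is left. */
uint64_t alloc_page(color_alloc *a, uint32_t color_mask) {
    for (unsigned i = 0; i < NCOLORS; i++) {
        unsigned c = (a->cursor + i) % NCOLORS;
        if ((color_mask & (1u << c)) && a->next_pfn[c] < a->limit_pfn) {
            uint64_t pfn = a->next_pfn[c];
            a->next_pfn[c] += NCOLORS;   /* next frame of the same color */
            a->cursor = (c + 1) % NCOLORS;
            return pfn * PAGE_BYTES;
        }
    }
    return UINT64_MAX;
}
```

    Calls with `0x0000000F` for the real-time task and `0xFFFFFFF0` for the GUI never return frames of the same color, so the two tasks' data cannot land in the same cache sets.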

    The same technique is used in the Windows and FreeBSD kernels to spread memory more evenly across the sets of a set-associative cache. Since cache associativity is low, this matters: it keeps any part of the cache from going to waste. The programmer does not need to do anything, because the OS handles it all. However, no production OS currently uses cache coloring to separate the data of different processes; only unofficial patches exist.
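    For the even-distribution use, one simple policy is to back each virtual page with a physical frame whose color matches the color of its virtual page number, so a virtually contiguous buffer cycles through all colors. The C sketch below is only in the spirit of what such kernels do, not their actual code, and the function names are made up. It counts how the pages of a contiguous buffer would spread over the colors under this policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Color this policy would request for the frame backing virtual page vaddr. */
unsigned wanted_color(uint64_t vaddr, uint64_t page_size, unsigned colors) {
    return (unsigned)((vaddr / page_size) % colors);
}

/* Count how many of npages consecutive virtual pages starting at vstart
   end up in each color; hist must have room for `colors` entries. */
void color_histogram(uint64_t vstart, size_t npages, uint64_t page_size,
                     unsigned colors, unsigned *hist) {
    for (unsigned c = 0; c < colors; c++)
        hist[c] = 0;
    for (size_t i = 0; i < npages; i++)
        hist[wanted_color(vstart + i * page_size, page_size, colors)]++;
}
```

    With 64 consecutive pages and 32 colors, every color receives exactly two pages. An arbitrary choice of frames could instead pile several pages of one buffer onto the same color, where they would compete for the same few ways.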

    And, by the way, I urge all real-time x86 developers not to forget to disable C-states and SpeedStep.

    One more thing: if anyone knows Russian equivalents for the English terms I used in this post, please let me know in the comments and I will correct the text.
