Memory management in real-mode Windows

    Raymond Chen recently completed a series of posts, begun a year and a half earlier, on managing virtual memory without any support from the processor: Windows up to and including version 3.0 supported the real mode of the 8086. In this mode, translation of an address from "virtual" (as seen by the program) to physical (as put on the system bus) is done by a simple addition of the segment, shifted left by four bits, and the offset: there are no access checks and no invalid addresses. Every address is accessible to everyone. Even so, several programs could run simultaneously in Windows without interfering with one another; Windows could move their segments around in memory, discard unused ones, and load them back when needed, possibly at different addresses.
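    The address arithmetic mentioned above can be illustrated with a minimal sketch. The function name is made up for illustration; the only facts assumed are that the 8086 shifts the segment left by four bits, adds the offset, and wraps the result to its 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086 translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs can name the same physical byte,
# and nothing stops any program from forming either of them.
same_byte = physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000)
```

    Because the mapping is pure arithmetic, moving a segment means changing the segment value that every user of the block holds, which is why the rest of the scheme revolves around handles and stubs.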

    (I wonder whether the ever-present holy warriors of "it was a graphical shell, not an operating system" are aware of these remarkable abilities?)

    So how did it manage all this?

    Data management


    Real-mode Windows had no swap file. Immutable data (resources, for example) was simply discarded from memory and, when needed, loaded again from the executable file. Mutable data could not be discarded, but, like any other data, it could be moved: an application refers to a memory block not by address but by handle; for the duration of an access it "locks" the block, obtaining its address, and then "unlocks" it so that Windows can move it again if necessary. Something similar appeared a dozen years later in .NET under the name pinning.
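    The handle discipline described above can be modelled with a toy heap. This is only a sketch: the class and method names (ToyHeap, lock, unlock, try_move) are hypothetical and are not the Windows API; the lock count stands in for what GlobalLock and GlobalUnlock track.

```python
class ToyHeap:
    """Toy model of movable blocks addressed through handles (hypothetical)."""

    def __init__(self):
        self.blocks = {}       # handle -> {"address", "locks", "data"}
        self.next_handle = 1

    def alloc(self, address, data):
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = {"address": address, "locks": 0, "data": data}
        return handle

    def lock(self, handle):
        """Pin the block and hand out its current address."""
        block = self.blocks[handle]
        block["locks"] += 1
        return block["address"]

    def unlock(self, handle):
        """Drop one lock; at zero the block becomes movable again."""
        block = self.blocks[handle]
        if block["locks"] > 0:
            block["locks"] -= 1

    def try_move(self, handle, new_address):
        """The memory manager may relocate a block only while it is unlocked."""
        block = self.blocks[handle]
        if block["locks"] > 0:
            return False
        block["address"] = new_address
        return True


heap = ToyHeap()
h = heap.alloc(0x2000, b"hello")
addr = heap.lock(h)                             # address is valid while locked
moved_while_locked = heap.try_move(h, 0x1000)   # refused: block is pinned
heap.unlock(h)
moved_after_unlock = heap.try_move(h, 0x1000)   # allowed: nobody holds an address
```

    The application must not keep `addr` after unlocking: the next lock may return a different address.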

    The functions GlobalLock/GlobalUnlock and LockResource/FreeResource survive in the Win32 API for compatibility with those ancient times, although in Win32 memory blocks (including resources) never move.

    The functions LockSegment and UnlockSegment (which lock and unlock memory by address rather than by handle) remained in the documentation for a while, marked "obsolete, do not use", but by now no trace of them is left.

    For those who needed to lock memory for a long time, there was another function, GlobalWire: "so that the block does not stick out in the middle of the address space, move it to the low end of memory and lock it there". Its counterpart was GlobalUnwire, which was completely equivalent to GlobalUnlock. Surprisingly, this pair of functions is still alive in kernel32.dll, although it has been removed from the documentation. Today they simply forward to GlobalLock/GlobalUnlock.

    In protected-mode Windows, GlobalLock was replaced with a stub: Windows can now shuffle memory blocks without changing their "virtual address" as seen by the application (selector:offset), which means the application no longer needs to lock non-discardable objects. In other words, locking now prevents a block from being discarded, but does not prevent it from being moved, invisibly to the application. Therefore, for those who really need data fixed in physical memory (to work with external devices, for example), the pair GlobalFix/GlobalUnfix was added. Just like GlobalWire/GlobalUnwire, these functions became useless in Win32; they too have been removed from the documentation, though they remain in kernel32.dll and forward to GlobalLock/GlobalUnlock.

    Code management


    This is where the trickiest part begins. Blocks of code, like immutable data, were discarded from memory and later reloaded from the executable file. But how did Windows make sure that programs did not try to call functions in discarded blocks? One could access functions through handles and call a hypothetical LockFunction before every function call. But recall that many functions run a message loop (to show a window, say, or to execute DDE commands), and it would be good to be able to discard them as well, because their code is not actually needed during that time. With "function handles", however, a function's segment could not be released until the function returned control to its caller.

    Instead, Windows starts from the assumption that any function that is not executing right now can be discarded; and since it is the Windows memory manager code that is executing right now, any function at all can be discarded. References to it may remain either in program code or on the stack, if the function had not yet returned at the time it was discarded.

    So Windows walks the stacks of all running tasks (as execution contexts were called in Windows before processes and threads were separated), finds the return addresses that lead into discarded segments, and replaces them with the addresses of reload thunks: stubs that load the required segment from the executable file and transfer control into it as if nothing had happened.
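    The stack walk can be sketched as follows. This is a simplified model, not the actual Windows code: the stack is a dictionary from word address to 16-bit value, frames are chained through saved BP values, an odd saved BP is assumed to mark a far frame whose return offset and segment sit at BP+2 and BP+4, and THUNK_SEGMENT, the 8-byte thunk size, and the `thunks` table are all hypothetical.

```python
THUNK_SEGMENT = 0x0F00  # hypothetical segment that holds the reload thunks


def patch_return_addresses(stack, bp, discarded_segment, thunks):
    """Walk a BP-linked frame chain and redirect far returns into a discarded segment.

    `thunks` maps a thunk offset to the original (segment, offset) return
    address, so the thunk knows where to resume after reloading the segment.
    """
    while bp != 0:
        saved_bp = stack[bp]
        # Odd saved BP marks a far frame; near frames are skipped entirely.
        if saved_bp & 1 and stack[bp + 4] == discarded_segment:
            thunk_offset = len(thunks) * 8          # hypothetical thunk size
            thunks[thunk_offset] = (stack[bp + 4], stack[bp + 2])
            stack[bp + 2] = thunk_offset            # new return offset
            stack[bp + 4] = THUNK_SEGMENT           # new return segment
        bp = saved_bp & ~1                          # strip the far marker
    return thunks


# Three frames: far into segment 0x2000, near, far into segment 0x3000.
stack = {
    0x0100: 0x0201, 0x0102: 0x0010, 0x0104: 0x2000,
    0x0200: 0x0300, 0x0202: 0x0042,
    0x0300: 0x0001, 0x0302: 0x0020, 0x0304: 0x3000,
}
thunks = patch_return_addresses(stack, 0x0100, 0x2000, {})
```

    Only the frame that returns into the discarded segment is rewritten; the near frame and the far frame into another segment are left alone.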

    For Windows to be able to walk a stack, programs must keep it in the correct format: no FPO, and each stack frame must begin with BP, the pointer to the calling function's frame. (Since the stack consists of 16-bit words, the value of BP is always even.) In addition, Windows must distinguish on the stack between intra-segment ("near") and inter-segment ("far") calls, and it can ignore near calls, since they certainly do not lead into a discarded segment. So it was decided that an odd BP value on the stack marks a far call; that is, every far function must begin with the prologue INC BP; PUSH BP; MOV BP,SP and end with the epilogue POP BP; DEC BP; RETF. (In fact, the prologue and epilogue were more complicated, but that is beside the point here.)

    That takes care of references from the stack, but what about references from other code segments? Of course, Windows cannot scan all of memory, find every call to a discarded function, and replace each one with a reload thunk. Instead, inter-segment calls are compiled with the knowledge that the called function may not be in memory, and they actually call a stub in the module's entry table. This stub consists of an int 3fh instruction followed by three more service bytes that say where to look for the function. The int 3fh handler finds these service bytes at its return address, determines the required segment, loads it into memory if it is not already loaded, and finally overwrites the stub in the entry table with an absolute jump jmp xxxx:yyyy to the function body, so that subsequent calls to the same function are slowed only by one inter-segment jump, with no interrupt.

    Now, when Windows discards a function, it only has to replace the patched-in jump in the module's entry table with the int 3fh stub again. The system does not need to hunt for every call to the discarded function: they were all found back at compile time! The module's entry table contains every far function for which the compiler knows inter-segment calls exist (this includes, in particular, exported functions and WinMain), as well as every far function that has been passed somewhere by pointer and could therefore be called from anywhere, even from outside the program's code (this includes WndProc, EnumFontFamProc, and other callback functions).
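    The life cycle of a stub (int 3fh, then a patched-in jmp, then int 3fh again after a discard) can be modelled with a small sketch. The class, its method names, and the load addresses are hypothetical; tuples stand in for the five-byte stubs, and `call` plays the role of both the caller and the int 3fh handler.

```python
class StubTable:
    """Toy model of a module's table of stubs (names and addresses are made up)."""

    def __init__(self, entries):
        # entries: (segment_number, offset) for every far function in the module
        self.where = list(entries)
        self.stubs = [("int 3fh", seg_no, off) for seg_no, off in entries]
        self.loaded = {}    # segment_number -> segment address currently in memory
        self.loads = 0      # how many times a segment was read from the file

    def _load_segment(self, seg_no):
        # Pretend to read the segment from the executable; each load lands
        # at a fresh hypothetical address.
        address = 0x1000 + 0x100 * self.loads
        self.loads += 1
        return address

    def call(self, index):
        """Go through the stub; return the (segment, offset) finally jumped to."""
        stub = self.stubs[index]
        if stub[0] == "int 3fh":
            _, seg_no, off = stub
            if seg_no not in self.loaded:
                self.loaded[seg_no] = self._load_segment(seg_no)
            stub = ("jmp", self.loaded[seg_no], off)
            self.stubs[index] = stub     # patch the stub with a direct far jump
        return stub[1], stub[2]

    def discard(self, seg_no):
        """Drop a segment and revert every stub that pointed into it."""
        self.loaded.pop(seg_no, None)
        for i, (s, off) in enumerate(self.where):
            if s == seg_no:
                self.stubs[i] = ("int 3fh", s, off)


table = StubTable([(1, 0x10), (1, 0x80), (2, 0x00)])
first = table.call(0)     # faults, loads segment 1, patches the stub
again = table.call(0)     # plain jump, no load
table.discard(1)          # stubs 0 and 1 revert to int 3fh
reloaded = table.call(0)  # faults again; the segment may land elsewhere
```

    Callers only ever hold the index of a stub, never the address of the function body, so discarding touches nothing outside the table.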

    Wherever a pointer to a far function would be passed, a pointer to its stub is passed instead; so the addresses obtained from GetWindowLong(GWL_WNDPROC) and similar calls also point to a stub, not to the function body. Even GetProcAddress plays the trick: instead of the function's address, it returns the address of its stub in the DLL's entry table. (In Win32, only DLLs kept an analogue of the entry table, under the name "export table".) Static inter-module calls (calls to functions imported from DLLs) are resolved with the same GetProcAddress and therefore end up at a stub in exactly the same way. Either way, when a function is discarded it is enough to fix up the stub, and the calling code itself never needs to be touched.

    All this machinery for relocatable code segments was inherited by Windows from the overlay linker for DOS. Reportedly, the whole scheme first appeared, in exactly this form, in the Zortech C compiler and then in Microsoft C. When the executable file format for Windows was designed, the existing overlay format for DOS was taken as its basis.

    But how does Windows choose which segment to discard? Choosing at random would be risky: it might hit code that has only just been executed and would have to be loaded right back. So Windows uses something like an "accessed bit" for code segments: knowing that every inter-segment call to a function goes through its stub, the designers inserted there (before the int 3fh or the jmp that replaces it) the instruction sar byte ptr cs:[xxx], 1, which resets a counter byte from 1 to 0 each time the function is called. This instruction takes exactly five bytes, so the existing executable file format could be kept: the int 3fh stubs are loaded into every other slot, alternating with the counter instructions.

    The counters for all code segments are initialized to 1, and every 250 ms Windows goes around all modules, collects the updated values, and reorders the code segments in its LRU list. Accesses to data segments can be tracked without any tricks: every such access is already marked by an explicit call to GlobalLock or a similar function. So when the time comes to discard a segment to free up memory, Windows tries to discard the segment that has gone unused the longest: either a code segment whose counter has not been reset to 0 for the longest time, or a data segment that has gone unlocked for the longest time.
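    The counter-and-sweep bookkeeping for code segments can be sketched like this. The class and method names are hypothetical, the periodic sweep is called by hand instead of by a 250 ms timer, and data segments (tracked through their lock calls) are left out.

```python
class SegmentLru:
    """Toy model of the accessed-byte scheme for code segments (hypothetical)."""

    def __init__(self, names):
        self.counters = {name: 1 for name in names}   # 1 means "not called yet"
        self.lru = list(names)                        # index 0 = least recently used

    def call_into(self, name):
        # Models `sar byte ptr cs:[counter], 1`: 1 becomes 0, 0 stays 0.
        self.counters[name] >>= 1

    def sweep(self):
        """Periodic pass: move segments that were called to the fresh end of the list."""
        for name, counter in self.counters.items():
            if counter == 0:
                self.lru.remove(name)
                self.lru.append(name)
                self.counters[name] = 1               # re-arm for the next period

    def pick_victim(self):
        """The segment that has gone uncalled the longest is discarded first."""
        return self.lru[0]


lru = SegmentLru(["A", "B", "C"])
lru.call_into("A")
lru.sweep()
first_victim = lru.pick_victim()    # "B": A was just used, B heads the list
lru.call_into("B")
lru.call_into("B")                  # a second call in the same period changes nothing
lru.sweep()
second_victim = lru.pick_victim()   # "C": the only segment never called
```

    A single shift instruction per call keeps the cost on the hot path tiny, at the price of knowing only whether a segment was called during a period, not how often.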

    Windows 1.0-2.1 advertisements taken from GUIdebook
