Overwriting memory - why?
In the bowels of the Win32 API there is a SecureZeroMemory function with a very concise description: it overwrites a memory area with zeros and is designed so that the compiler never removes the call during optimization. The description also says that memory previously used to store passwords and cryptographic keys should be overwritten with this function. One question remains: why? You can find lengthy arguments about the risk of program memory being written to the page file, the hibernation file, or a crash dump, where an attacker could find it. That sounds like paranoia: not every attacker can get hold of those files.
In fact, there are far more ways to reach data that a program forgot to overwrite; sometimes the attacker does not even need access to the machine. Below we walk through an example, and everyone can decide for themselves how justified the paranoia is.
All examples are in pseudo-code suspiciously similar to C++. The code is verbose and not particularly clean on purpose; it will become clear later that cleaner code would not be in a much better position.
So: in some faraway function we obtain an encryption key, a password, or a credit card number (hereinafter simply "the secret"), use it, and do not overwrite it:
{
    const int secretLength = 1024;
    WCHAR secret[secretLength] = {};
    obtainSecret( secret, secretLength );
    processWithSecret( what, secret, secretLength );
    // secret goes out of scope here without being wiped
}
In another function, completely unrelated to the previous one, our program instance requests a file by name from another instance. It uses RPC, a technology as ancient as the dinosaurs, available on many platforms and widely used by Windows for interprocess and cross-machine communication. To use RPC you typically write an interface description in IDL, which might describe a method like this:
//MAX_FILE_PATH == 1024
error_status_t rpcRetrieveFile( [in] const WCHAR fileName[MAX_FILE_PATH], [out] BYTE_PIPE filePipe );
Here the second parameter has a special type that allows transmitting data streams of arbitrary length, and the first parameter is a character array for the file name. This description is compiled by the MIDL compiler into a header file (.h) containing the function:
error_status_t rpcRetrieveFile ( handle_t IDL_handle, const WCHAR fileName[1024], BYTE_PIPE filePipe);
Here MIDL has added a utility parameter (the binding handle), and the second and third parameters are the same as in the IDL description.
We call this function:
void retrieveFile( handle_t binding )
{
    WCHAR remoteFileName[MAX_FILE_PATH];
    retrieveFileName( remoteFileName, MAX_FILE_PATH );
    CBytePipeImplementation pipe;
    rpcRetrieveFile( binding, remoteFileName, pipe );
}
Everything looks fine: retrieveFileName() produces a string of at most MAX_FILE_PATH − 1 characters terminated by a null character (nobody forgot the null character), the called side receives the string and works with it: it resolves the full path, opens the file, and streams its data back. Everyone is full of optimism, several releases of the product ship with this code, and no one has noticed the elephant yet. Here is the elephant: from the C++ point of view, the function parameter
const WCHAR fileName[1024]
is not an array but a pointer to the first element of an array. The rpcRetrieveFile() function itself is just a stub, also generated by MIDL. It packs all its parameters and always calls the same WinAPI function, NdrClientCall2(), whose meaning is "Windows, please make an RPC call with these parameters", passing its own parameters along. One of the first parameters is a format string generated by MIDL from the IDL description, quite reminiscent of the good old printf(). NdrClientCall2() carefully reads the format string and packs the parameters for transmission to the other side (this is called marshalling). Each parameter is packed according to the type recorded next to it; in our case, the fileName parameter is the address of the first element of an array whose recorded type is "array of 1024 elements of type WCHAR".
Now, somewhere in the code, two calls occur one right after the other:
processWithSecret( whatever );
retrieveFile( binding );
The processWithSecret() function uses 2 kilobytes of stack to store the secret and forgets about them when it returns. Then retrieveFile() is called; it obtains a file name 18 characters long (18 characters plus the terminating null makes 19, i.e. 38 bytes). The file name is again stored on the stack, and it will most likely occupy exactly the memory area that held the secret in the first function. Then the remote call happens, and the marshalling code faithfully packs the entire array (not 38 bytes but 2048) into a packet, and that packet is transmitted over the network.
EXTREMELY UNEXPECTEDLY, the secret is transmitted over the network. The program never even planned to send the secret anywhere, yet there it goes. Such a defect can be far more convenient to "exploit" than digging through the page file. Who is paranoid now?
The example above looks rather complicated. Here is similar code that you can try on codepad.org:
#include <cstdio>
#include <cstring>

const int bufferSize = 32;

void first()
{
    char buffer[bufferSize];
    memset( buffer, 'A', sizeof( buffer ) );  // fill the whole buffer, never wipe it
}

void second()
{
    char buffer[bufferSize];
    memset( buffer, 'B', bufferSize / 2 );    // initialize only the first half
    printf( "%s", buffer );                   // reads past the initialized half: undefined behavior
}

int main()
{
    first();
    second();
}
Its behavior is undefined. At the time of writing, the output was a string of 16 'B' characters followed by 16 'A' characters. Now is the time to raise pitchforks and torches with angry exclamations that no one in their right mind uses plain arrays, that one must use std::vector, std::string, or the Ultimate Universal Class that handles memory correctly, followed by holy wars of no less than nine thousand comments.
In fact, none of that would help here: the marshalling code deep inside RPC would still read more data than the calling code had written. The result would be either reading data at adjacent addresses, or (in some cases) a crash from an invalid memory access. Those adjacent addresses could again hold data that was never meant to be transmitted over the network.
Who is to blame here? As usual, the developer: he misunderstood how the rpcRetrieveFile() function treats its parameters. The result is undefined behavior, which in this case leads to uncontrolled data transmission over the network. The defect can be fixed either by changing the RPC interface and editing the code on both sides, or by using an array of sufficient size and overwriting it completely before copying the parameter into it.
SecureZeroMemory() would have helped here too: had the first function wiped the secret before returning, the bug in the second one would at least transmit an already-overwritten array. That makes it harder to earn a Darwin Award.
Dmitry Meshcheryakov,
product department for developers