ABBYYTeam April 11, 2011 at 11:03

delete, new[] in C++ and urban legends about combining them

If an array of objects was created with “new[]” in C++ code, it must be deleted with “delete[]” and under no circumstances with plain “delete” (without brackets). A reasonable question: or else what?

You can get a wide range of unfounded answers to this question. For example, “only the first object will be deleted and the rest will leak” or “only the first object's destructor will be called.” The “explanations” that follow usually do not stand up to any serious criticism.

According to the C++ Standard, the behavior in this situation is undefined. Every assumption about what happens is nothing more than a popular urban legend. Let us look in detail at why.

We need a cunning example that will stump the supporters of the urban legends. Here is a harmless-looking one:

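#include <cstdio>

class Class {
public:
    ~Class()
    {
        std::printf( "Class::~Class()" );
    }
};

int main()
{
    // Deliberately mismatched: allocated with new[], released with plain delete.
    delete new Class[1];
    return 0;
}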

The array contains just one object. If you believe either of the two legends above, “everything will be fine”: there is nothing left to leak, and the destructor will be called exactly as many times as needed.

We go to codepad.org, paste the code into the form, and get this output:

memory clobbered before allocated block
Exited: ExitFailure 127
42 75 67 20 61 73 73 61 73 73 69 6E 20 77
61 6E 74 65 64 20 2D 20 77 77 77 2E 61 62
62 79 79 2E 72 75 2F 76 61 63 61 6E 63 79

Memory WHAT??? What was that?

Second example:

int main()
{
    delete new char[1];   // same deliberate mismatch, this time with char
         return 0;
}

Output:

No errors or program output.

Here, at least on the surface, everything is fine. What is going on? Why does it happen? Why does the behavior differ?

The reason lies in what happens under the hood.

When the program encounters “new Type[count]”, it must allocate enough memory to store the specified number of objects. To do this, it calls the function “operator new[]()”. This function allocates memory, usually by simply calling malloc() and checking the return value (calling new_handler() and throwing an exception if necessary). The objects are then constructed in the allocated memory: the required number of constructors is called. The result of “new Type[count]” is the address of the first element of the array.

When the program encounters “delete[] pointer”, it must destroy all the objects in the array by calling their destructors. For this (and only for this) it needs to know the number of elements.
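The correctly paired case can be observed directly. The sketch below is only an illustration: the type `Logged`, the `events` log and the function `build_and_destroy` are hypothetical names introduced here. It records each constructor and destructor call so that the work done by “new Type[count]” and “delete[] pointer” becomes visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical event log: a positive id means "constructed", a negative id "destroyed".
static std::vector<int> events;

// Hypothetical element type that records its own construction and destruction.
struct Logged {
    static int next_id;
    int id;
    Logged() : id(++next_id) { events.push_back(id); }
    ~Logged() { events.push_back(-id); }
};
int Logged::next_id = 0;

// Correctly paired: new[] constructs `count` objects, delete[] destroys all of them.
void build_and_destroy(std::size_t count) {
    Logged* items = new Logged[count];
    delete[] items;
}
```

After `build_and_destroy(3)` on a fresh log, `events` holds 1, 2, 3, -3, -2, -1: three constructors, then three destructors in reverse order of construction. That is exactly the bookkeeping for which “delete[]” needs the element count.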

An important point: in “new Type[count]” the number of elements was specified explicitly, whereas “delete[]” receives only the address of the first element.

How does the program learn the number of elements? Since it has only the address of the first element, it must work out the length of the array from that address alone. How this is done depends on the implementation; the following method is typical.

When executing “new Type[count]”, the program allocates enough memory to hold not only the objects but also an unsigned integer (usually of type size_t) holding the number of objects. This number is written at the beginning of the allocated block, and the objects are placed after it. When the compiler compiles “new Type[count]”, it inserts code that implements these details.

So, when executing “new Type[count]”, the program allocates a bit more memory, writes the number of elements at the beginning of the allocated block, calls the constructors, and returns the address of the first element to the calling code. The address of the first element will differ from the address returned by the allocation function “operator new[]()”.

When executing “delete[]”, the program takes the address of the first element passed to “delete[]”, computes the address of the beginning of the block (subtracting exactly the amount that was added when executing “new[]”), reads the number of elements from the beginning of the block, calls the required number of destructors, and then calls the function “operator delete[]()”, passing it the address of the start of the block.
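The scheme described above can be imitated by hand. This is a minimal sketch under stated assumptions, not what any particular compiler actually generates: the names `Tracked`, `kHeader`, `make_array` and `destroy_array` are hypothetical, malloc() stands in for “operator new[]()”, and the header size is assumed to be one maximally aligned unit.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical element type; the counter exists only to observe destructor calls.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Assumed header size: one maximally aligned unit, so the elements stay aligned.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must be able to hold the count");

// Imitates "new Tracked[count]": allocate extra room, store the count, construct.
Tracked* make_array(std::size_t count) {
    void* block = std::malloc(kHeader + count * sizeof(Tracked));
    if (block == nullptr) throw std::bad_alloc();
    new (block) std::size_t(count);  // the count goes at the very start of the block
    Tracked* first = reinterpret_cast<Tracked*>(static_cast<char*>(block) + kHeader);
    for (std::size_t i = 0; i < count; ++i) new (first + i) Tracked();
    return first;  // not the address that malloc() returned
}

// Imitates "delete[] first": step back, read the count, destroy, free the block.
void destroy_array(Tracked* first) {
    if (first == nullptr) return;
    char* block = reinterpret_cast<char*>(first) - kHeader;
    std::size_t count = *reinterpret_cast<std::size_t*>(block);
    for (std::size_t i = count; i > 0; --i) first[i - 1].~Tracked();
    std::free(block);  // the address originally obtained from malloc()
}
```

Passing the pointer returned by `make_array` straight to `std::free` would hand the allocator an address it never issued. That is the hand-written counterpart of mixing up the two forms of delete.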

In both cases, the calling code does not work with the address that was returned by the memory allocation function and later passed to the memory deallocation function.

Now back to the first example. When “delete” (without brackets) is executed, the calling code has no idea that it needs to go through the address-offset routine. Most likely, it calls the destructor of a single object and then passes to “operator delete()” an address that differs from the one previously returned by “operator new[]()”.

What is going to happen? In this implementation, the program crashes. Since the Standard says that the behavior is undefined, this is acceptable.

For comparison, the same program built with Visual C++ 9 emits error messages by default in the debug version, but otherwise seems to work fine (at least _heapchk() returns _HEAP_OK and _CrtDumpMemoryLeaks() produces no messages). This is also acceptable.

Why is the behavior different in the second example? Most likely, the compiler took into account that char has a trivial destructor, i.e. nothing needs to be done to destroy the objects and the memory can simply be freed. So there is no need to store the number of elements, which means the calling code can be given the very same address that “operator new[]()” returned. No address offsets, exactly as with “new” (without brackets). This compiler behavior is fully consistent with the Standard.
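One way to check whether a given compiler stores the count is to replace the global “operator new[]()” and look at how many bytes it is asked for. The sketch below is only a probe under assumptions: `last_array_request`, `escape_hatch`, `WithDtor` and `array_overhead` are hypothetical names, and the values it reports are implementation details, not something the Standard promises.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical probe: remembers the size most recently requested from operator new[].
static std::size_t last_array_request = 0;
// Volatile sink so that the optimizer cannot drop the allocation altogether.
static void* volatile escape_hatch = nullptr;

void* operator new[](std::size_t size) {
    last_array_request = size;
    void* p = std::malloc(size ? size : 1);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}
void operator delete[](void* p) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }

// Hypothetical type with a user-provided (non-trivial) destructor.
struct WithDtor {
    char c;
    ~WithDtor() {}
};

// Extra bytes requested beyond count * sizeof(T) for "new T[count]".
template <typename T>
std::size_t array_overhead(std::size_t count) {
    T* p = new T[count];
    escape_hatch = p;
    std::size_t requested = last_array_request;
    delete[] p;
    return requested - count * sizeof(T);
}
```

Comparing `array_overhead<char>(5)` with `array_overhead<WithDtor>(5)` shows whether this particular implementation adds a header for types with non-trivial destructors. Many compilers report zero for char and a few bytes for the second type, but any result is allowed.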

Something is missing…

Have you noticed that the text above mentions allocation and deallocation functions both with and without square brackets? These are not typos: they are two different pairs of functions, and they can be implemented in completely different ways. Even when the compiler tries to economize, it always calls “operator new[]()” when it sees “new Type[count]” in the code, and it always calls “operator new()” when it sees “new Type”.

Usually the implementations of “operator new()” and “operator new[]()” are identical (both call malloc()), but they can be replaced: you can define your own, you can replace either one pair or both, and you can also replace these functions separately for any chosen class. The Standard lets you do this as much as you like (of course, you must replace the matching deallocation function appropriately).
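A per-class replacement of both pairs might look like the sketch below. It is an illustration only: the class `Counted`, its counters and the function `exercise_pairs` are hypothetical, and malloc()/free() are used as the backing allocator for simplicity. The counters make it visible that each form of new and delete is routed to its own function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class that replaces both pairs of functions and counts the calls.
struct Counted {
    static int scalar_news, array_news, scalar_deletes, array_deletes;
    int value = 0;
    ~Counted() {}

    static void* operator new(std::size_t size)   { ++scalar_news; return allocate(size); }
    static void* operator new[](std::size_t size) { ++array_news;  return allocate(size); }
    static void operator delete(void* p) noexcept   { ++scalar_deletes; std::free(p); }
    static void operator delete[](void* p) noexcept { ++array_deletes;  std::free(p); }

private:
    // Shared backing allocator; the two pairs could just as well use different ones.
    static void* allocate(std::size_t size) {
        void* p = std::malloc(size ? size : 1);
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }
};
int Counted::scalar_news = 0;
int Counted::array_news = 0;
int Counted::scalar_deletes = 0;
int Counted::array_deletes = 0;

// Correct pairing: new with delete, new[] with delete[].
void exercise_pairs() {
    Counted* one = new Counted;
    Counted* many = new Counted[4];
    delete one;
    delete[] many;
}
```

After one call to `exercise_pairs()` each of the four counters is 1. Nothing stops the two pairs from being backed by entirely different allocators, in which case a block released through the wrong function ends up somewhere it was never allocated from.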

This opens up rich opportunities for undefined behavior. If your code frees memory with the “wrong” function, the consequences can be anything, in particular heap damage, memory corruption, or an immediate crash. In the first example, the implementation of “operator delete()” could not cope with the address passed to it, and the program crashed.

The best part of this story is that you can never claim that using “delete” instead of “delete[]” (or vice versa) leads to any specific result. The Standard says the behavior is undefined. Even a fully Standard-compliant compiler is not required to give you a program with any sensible behavior. The behavior you will cite in comments and arguments is merely what you observe; anything can be happening inside.

In the second example everything looks fine... on this implementation. On another implementation, “operator new()” and “operator new[]()” may, for example, be implemented on different heaps (Windows allows a process to create more than one heap). What happens when you try to return a block to the “wrong” heap?

By the way, by counting on some specific behavior in this situation you automatically get non-portable code. Even if “everything works” on the current implementation, you may be extremely unpleasantly surprised when you switch to another compiler, change the compiler version, or even update the C++ runtime.

What to do? Accept it: do not confuse “delete” and “delete[]”, and above all do not waste time on “plausible” explanations of what will happen if you mix them up. While you argue, other developers will be doing something useful, and your chances of earning a Darwin Award will only grow.

Dmitry Meshcheryakov
Department of Products for Developers
