# MIT course "Computer Systems Security". Lecture 3: "Buffer overflow: exploits and protection", part 1

Original author: Nikolai Zeldovich, James Mykens
• Transfer
• Tutorial

### Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems." Nikolai Zeldovich, James Mykens. year 2014

Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.

Lecture 1: “Introduction: threat models” Part 1 / Part 2 / Part 3
Lecture 2: “Control of hacker attacks” Part 1 / Part 2 / Part 3
Lecture 3: “Buffer overflow: exploits and protection” Part 1 /Part 2 / Part 3

Welcome to the lecture on exploits for buffer overflow. Today we end the discussion on Baggy bounds and then move on to other methods of buffer overflow protection.

Next we will talk about the print materials of today's lecture, which are devoted to Blind Return Oriented Programming (BROP) - blind back-oriented programming. This is a technique of exploits that can be carried out even if the attacker does not possess the target binary code. These exploits are aimed at destroying canaries in 64-bit stacks. So, if you were like me when I first read these materials, you should have felt like watching a movie by Christopher Nolan. It was just a brain explosion!

We are going to consider how these gadgets work properly. Therefore, I hope that by the end of the lecture you will be able to understand all these high technologies described in the lecture materials. But first, as I said, we’ll end the discussion with Baggy bounds . Consider a very simple example.

Suppose we are going to assign a pointer p and allocate for it a size of 44 bytes. Suppose also that the slot size is 16 bytes.

What happens when we assign the malloc function ? You already know that in this case the Baggy bounds system will try to supplement this distribution by logarithm 2n. So for our pointer 44 bytes in size 64 bytes of memory will be allocated. But the slot size is 16 bytes, so we will have 64/16 = 4 tables of 16-byte boundaries. Each of these entries will be placed in the size distribution log.

Next, we assign another pointer char * q = p + 60 . We see that this value is out of bounds because the size of p is 44 bytes, and here it is 60 bytes. But Baggy bounds works in such a way that in this case nothing terrible will happen, although the programmer should not have done so.

Now let's assume that the next thing we do is assign another pointer that will be equal to char * r = q + 16. Now it will actually cause an error, because the offset size will be 60 + 16 = 76, which is 12 bytes more than the size of 4 slots (4x16 = 64 bytes), which the Baggy bounds system has allocated . And this excess is indeed more than half the slot.

If you remember, in this case, the Baggy bounds system will immediately respond to a critical synchronization error, which will cause the program to crash and actually stop it.

So let's imagine that we have only two lines:

char * p = malloc (44)
char * q = p + 60

And there is no third line with a code. Instead, we do this:

char * s = q + 8

In this case, the pointer will have a value of 60 + 8 = 68 bits, which is 4 bytes more than the boundaries assigned by Baggy bounds . In fact, this will not cause a critical error, although the value goes beyond the boundaries. What we did here is set for a high order bit pointer. So if someone subsequently tries to dereference it, it will lead to a critical error at this point.

And the last thing we do is assign another pointer:

char * t = s - 32

In fact, we did this - we returned the pointer to the border. So, if s was originally out of bounds, now we have returned it to the initially allocated volume, which was created for the pointer. Therefore, now twill not have in its composition bits of high order, and it can be safely dereferences.

Audience: How does a program know that r has an excess greater than half the stack?

Professor Mickens: notice that when we created r , we received an instrumental code that will work in all these pointer operations. So we can tell where q will be located , and we know that it is within the framework of Baggy bounds . Therefore, when we do this q + 16 operation , the Baggy bounds tools know where the initial value came from. And then, if an offset of this source size occurs , Baggy boundsit will easily determine that the magnitude of this offset is greater than ½ the size of the slot.

In principle, when you perform operations with pointers, you should see whether they have exceeded the selected size or not. At some point, you have a pointer located within the boundaries of Baggy bounds , and then something happens that causes it to go beyond the boundaries. So, just when this happens, we find out that some chicheness has come out of our code.

Hope this is understandable. It was a very brief overview of the homework, but I hope you can easily understand it.

So, we have a pointer that looks like this:

char * p = malloc (256) , then we will add a pointer char * q = p + 256, after which we will try to dereference this pointer.

So what happens when this happens? Well, note that 256 is a 2n sequence , so it will be within Baggy bounds . Therefore, when we add another 256 bits, this means that we make one more pass to the end of the Baggy bounds . As in the previous example, this line is good enough, but it causes the high-order bit to be set for q . So when we try to dereference it, everything will explode and have to call our insurance agent. It's clear?

From these 2 examples you can understand how the Baggy bounds system works. As I mentioned in the last lecture, you really do not need to instrument every pointer operation if you can use static code analysis to find out that a certain set of pointer operations is safe. I will postpone further discussion of some static analyzes, but suffice it to say that you do not always need to perform these mathematical operations, we have already tested this before.

Another question that is mentioned in Piazza is how to ensure the compatibility of the Baggy bounds system with previous, non-tool libraries. The idea is that when baggy boundsinitialize the boundary tables, they establish that all records must be within 31 bits. Therefore, when we read the table of boundaries, each entry in it represents a value of the form 2n + 31 . Thus, by initializing initial bounds of 31 bits, we assume that each pointer will have the maximum possible size of 2n + 31 . Let me give you a very simple example that will clarify this matter.

Suppose we have the memory space that we use for the heap. This memory space consists of two components. Above we have a heap that was allocated by means of a non-tool code, and below we have a heap that was allocated with an instrumental code. So what will Baggy bounds do ?? As you remember, this system has the concept of a slot, whose size is 16 bits. Therefore, the table of boundaries will consist of 2 sections, initiated from 31 bits.

However, when running the tool code, it will actually use the Baggy bounds algorithm to set appropriate values ​​for this row of the table.

When the pointer comes from the top of the memory space, it is always set to the maximum possible limits 2n + 31 . This means that Baggy bounds never thinks that a pointer operation that “came” from a non-instrumental library can go beyond the boundaries.

The idea is that, in the instrumental code, we will always perform these comparisons for pointers, but if we set the boundaries of the pointer record for a non-tool code like 2n + 31 , then we will never have a dereference error. That is, we have a good interaction between the Baggy bounds code entries and the non-instrumental entries of previous libraries.

This means that we have this system, which is good, because it does not cause the program to crash when using non-tool libraries, but it also has a problem. The problem is that we can never define the boundaries of pointers that are generated by non-tool code. Because we will never set a high order bit, for example, when this pointer gets too much or too little space. Thus, we actually cannot ensure the safety of memory for operations that take place when using non-tool code. You also cannot determine when we pass a pointer, which has gone beyond the limits of dimensions, from the tool code to the non-tool code. In this case, something unimaginable can happen. If you have such a pointer pulled from the tool code,

We know that if we simply put this code in the instrumental code, we can clear this flag at some points when it returns to its bounds. But if we just give this huge address to the non-tool code, then it can do something unimaginable. It can even return this pointer back to the border, but we will never have the opportunity to clear this bit of high order. So we still have problems even using the scheme shown here.

Audience: If we have an instrumental code for allocating memory, does it use the same malloc function that the attribute code uses?

Professor:This is a slightly delicate question. If we consider the case present here, then this is strictly observed, since we have two memory areas, each of which obeys the rules established for it. But in principle, this will depend on the code that the selected programming language uses. Imagine that in C ++, for example, you can assign your own determinant. So it depends on certain details of the code.

Audience: how can the determinant check if the 31-bit limit is set or not?

Professor:at the lower levels, the distribution algorithms work in such a way that when an unknown system is called, the pointer moves upwards. So if you have several allocators, then they all try to allocate memory, each of them has its own piece of memory, which they reserve for themselves, basically, correctly. So in real life it may be more fragmented than at a high level.

So, everything that we considered above related to the work of Baggy bounds in 32-bit systems. Consider what happens when using 64-bit systems. In such systems, you can actually get rid of the table of boundaries, because we can store some information about the boundaries in the pointer itself.

Consider what a regular pointer looks like in the Baggy bounds system. It consists of 3 parts. For the first, zero part, 21 bits are allocated, 5 bits are allocated for the size, this is the main log size, and 38 more are the normal address bits.

The reason that this does not limit the size of the address of the program you are using is that the majority of high-order bits of the operating system and / or equipment located in the first 2 parts of the pointer do not allow the application for various reasons. So, as it turned out, we do not greatly reduce the number of applications used in the system. This is what a regular pointer looks like.

What happens when we have only one of these pointers? Well, in a 32-bit system, all we can do is simply set a high order bit and hope that this thing will never get more than half the size of the slot. But now that we have all this additional address space, you can put the offset beyond the boundaries of OOB (out of bound) directly into this pointer. So we can do something like what is shown in the figure, dividing the pointer into 4 parts and redistributing its size.

Thus, we can get 13 bits for the offset boundaries, that is, write how far this OOB pointer will be from where it should be. Then again you can set the actual size of the object specified here, equal to 5, and the remainder of the zero part, which will now be 21-13 = 8 bits. And then our address part of 38 bits follows. In this example, you see the benefits of using 64-bit systems.

Note that here we have the usual size for a regular pointer, in both cases this size is 64 bits, and its description is of an elementary nature. And this is good, because using “thick” pointers we would need a lot of words to describe them.
Also note that it is easy to apply non-tool code here, because it works and uses the same size as regular pointers. We can put these things in a struct , for example, and the size of this struct will remain the same. So it is very good when we have the opportunity to work in the 64-bit world.

Audience: why in the second case is the offset located in front of the size, and not as in the previous case, and what happens if the size of the offset is large?

Professor:I think that in some cases we have certain limiting problems that we have to work on. For example, a problem will arise if there are more bits. But in principle, I don’t think there is a reason why you couldn’t read some of these things. Unless certain strict conditions, which I don’t think about right now, should have stipulated the size of the zero part, otherwise there could be problems with the hardware.

So you can still trigger a buffer overflow in the Baggy bounds system ., since the use of the above approaches does not solve all the problems, right? Another problem you may encounter if you have a non-tool code, because we will not be able to detect any problems in the non-tool code. You may also encounter memory vulnerabilities that arise from dynamic memory allocation systems. If you remember, in the previous lecture we looked at this strange pointer free malloc , and Baggy bounds could not prevent the occurrence of such things.

We also discussed at the last lecture the fact that code pointers do not have boundaries that would be associated with them. Suppose we have a structure in which the memory buffer is located at the bottom, and a pointer at the top, and a buffer overflow occurs. We assume that the buffer overflow is still within Baggy bounds . Then you must override this function pointer. Otherwise, if we try to use it, it can send a malicious code to attack a controlled part of the memory. And in this case, the boundaries will not help us, because we have no boundaries that are associated, associated with these function pointers.

So, what is the price of using the Baggy bounds system ? In fact, we have only 4 components of this price.

The first is space. Because if you use a thick pointer, then obviously you have to make the pointers bigger. If you use the Baggy bounds system that we just talked about, you should keep a table of boundaries. And this table has such a slot size that allows you to control how big this table will be until you have exhausted the possibilities of the memory allocated for it.

In addition, you also increased the load on the CPU, which is forced to do all these instrumental operations with pointers. Since for each or almost every pointer, it is necessary to perform boundary checks using the same modes of operation, which will slow down the execution of your program.

There is also a problem with false alarms. We have already discussed that it can happen that the program generates pointers that go beyond the boundaries, but never tries to dereference them. Strictly speaking, this is not a problem. Baggy bounds flags these "outrageous" pointers if they go beyond 1/2 of the slot size, at least in a 32-bit solution.

What you will see in most security tools is that false alarms reduce the likelihood that people will use these tools. Because in practice, we all hope that we care about security, but what really excites people? They want to be able to upload their stupid photos on Facebook, they want to speed up the upload process and so on. Thus, if you really want your security tools to be in demand, they must have zero false alarms. Attempting to catch all vulnerabilities usually results in false alarms that will annoy either the developers or the users.

Unproductive costs are also caused by the fact that you need compiler support. Because you have to add all the tools to the system, a pointer check bypass, and so on and so forth.

So, for using the Baggy bounds system, we have to pay the price, which consists of space overspending, increasing CPU load, false alarms and the need to use a compiler.

This completes the Baggy bounds discussion .

Now we can think of two other buffer overflow mitigation strategies. In fact, they are much easier to explain and understand.

One of these approaches is called non-executable memory , non-executable memory.. Its basic idea is that the paging hardware will indicate 3 bits R , W, and X — read, write, and execute — for each page that you have in memory. That is, the program can read this memory, write to it, execute it. The first 2 bits are old, they existed before that, but the last, third bit is a new design.

The next idea is that you can make the stack non-executable. If you make the stack non-executable, this means that the adversary will not be able to run the code by simply creating a shell code in order to later jump to some place in the buffer. Many systems define policies like this. Exceptional priority “ W or X"Means that if you have a specific page, you can either write to it or use it as executable code, but you cannot do both at the same time. This is good for confronting an attacker who simply pushes executable code onto the stack and then proceeds to execute it. In this case, a similar trick will not work. Thus, at the hardware level, we eliminated the attacker's vector of the attacker, which puts the executable code on the stack. What are the benefits of this approach?

Potentially, it works without any changes in the application, everything happens at the hardware level and at the OS level. The operating system simply checks that the pages are protected by these bits. So it is very convenient, because you do not need to use a compiler, as in the previous case.

Another useful thing that I mentioned in the last lecture is that the hardware always keeps track of you, even if the OS does not. These bits here indicate that the hardware has scanned them and verified the correctness of each memory reference you made with code. This is also a very good aspect.

The disadvantage of this approach is that it makes it harder for an application to dynamically generate code. And the best example of this is the just-in-time compilers that we discussed in the last lecture.

That is, you can go to the web page, and your JavaScript code will be executed fairly quickly. Downloading this source of JavaScript, at first it just starts to interpret it, but then at some point finds some kind of “hot” path, some kind of “hot” loop, where it is taken to dynamically generate x86 machine code and execute it directly. But for this to work, you must provide the ability to dynamically write code to the page.
There are several ways to get around this inconvenience. For example, you can imagine that a just-in-time compiler initially sets the W bit , then deletes it and sets the X bit . The remaining methods may look complicated, but at a higher level it is quite easy to understand how non-executable memory works.

Another way to protect against buffer overflow is to use random addresses or address spaces. We see that the attacks that we discussed so far use hard-coded addresses.

Recall how the attacks that you developed in your lab work are created. You open a program in GDB, find out the location of some things, and you can create a shell code that actually has hard-coded addresses. Thus, the idea of ​​using a random address database is quite simple. This is mainly done to make it difficult for an attacker to guess the addresses. There are several different ways to implement this approach.

So, one idea is to randomize the stack. Imagine that all this space from top to bottom is the virtual memory space of the program, with the stack always placed on top, always going down, then the program codes are placed, and the heap rises from bottom to top.

And all these segments of the program: the stack, the heap, the program code - start in a well-known place. Instead, imagine that a stack that always starts here, in this famous place, can start here, lower, higher, somewhere else. Similarly, the program code can change its position in this structure, being located above or below the usual, fixed place.

So the idea is that if you are attacking and controlling one of these binaries, you can look in GDB and find out where all these offsets are, but in fact they will not help you figure out where these offsets are in a particular code. that works on this server. This is the basic idea of ​​random address spaces.
Thus, it uses the fact that most of the code you generate is not loaded to a specific place in memory. If you write something like a device driver that interacts with some hardware, then this hardware may require a hard-coded address. This address must be kept in a certain place in the buffer in order to be able to copy information from there. But if you do not write anything like this, you can use relocatable code, since this approach will work very well with such things.

The question is, can you use this approach? Obviously, yes. There are several different ways to do this, which we will discuss today, but an attacker can still extract an element of chance and defeat all these random approaches. To prevent this from happening, you make them nonrandom, or you find a random intervention by the attacker, or somehow use the fact that the attacker took advantage of the leakage of information about the random locations of these things.

27:10 min.

Continuation:

MIT course "Computer systems security". Lecture 3: "Buffer overflow: exploits and protection", part 2

Full version of the course is available here .

Thank you for staying with us. Do you like our articles? Want to see more interesting materials? Support us by placing an order or recommending to friends, 30% discount for Habr's users on a unique analogue of the entry-level servers that we invented for you: The whole truth about VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps from \$ 20 or how to share the server? (Options are available with RAID1 and RAID10, up to 24 cores and up to 40GB DDR4).

Dell R730xd 2 times cheaper? Only we have 2 x Intel Dodeca-Core Xeon E5-2650v4 128GB DDR4 6x480GB SSD 1Gbps 100 TV from \$ 249 in the Netherlands and the USA! Read aboutHow to build the infrastructure of the building. class c using servers Dell R730xd E5-2650 v4 worth 9000 euros for a penny?