MIT course "Computer Systems Security". Lecture 1: Introduction: Threat Models, Part 3

Original authors: Nikolai Zeldovich, James Mickens

Massachusetts Institute of Technology. Course 6.858, "Computer Systems Security." Nikolai Zeldovich, James Mickens. 2014.


Computer Systems Security is a course on the design and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and techniques for achieving security based on recent research. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware security, and web application security.

Lecture 1: “Introduction: threat models” Part 1 / Part 2 / Part 3

Let's run this program under a debugger. You will get well acquainted with it in the first lab. For now, we will set a breakpoint in the redirect function, run the program, and see what happens.



So, I started the program, it began executing main, and fairly quickly it called redirect. The debugger has now stopped at the beginning of redirect. We can look at what is going on here; for example, we can ask it to show the current CPU registers. Here we are looking at a lower level, not at the level of the C source code. We are going to look at the actual instructions my machine executes to see what is really going on, because C can hide things from us. So let's ask the debugger to show us all the registers.

On 32-bit x86 systems, as you recall, there is a pointer to the stack frame: the EBP register (the stack-frame base pointer). And my program, unsurprisingly, also has a stack.



On x86 the stack grows down, as shown on the slide, and we can keep pushing data onto it. Currently, the stack pointer (the ESP register, the address of the top of the stack) points to a specific memory location, ffffd010. There is some value stored there. How did it get there? One way to figure this out is to disassemble the code of the redirect function.



GDB complains that a convenience variable must have an integer value, so instead we disassemble the function by name. Here you can see what this function does. First it does some things with the EBP register, which is not very interesting. But then it subtracts a certain value from the stack pointer. This essentially makes space for all the local variables, such as the buffer and the integer that we saw in the C source code.

Now we want to understand how this function works. The stack pointer value we saw earlier now points into the middle of the stack. Above it are the buffer, the integer, and the return address into main, which is also stored on the stack. So somewhere up here we should have a return address. For now we are just trying to figure out where the different things are on the stack.

We can give the command to print the address of this buffer variable.



Its address is ffffd02c. Now we print the address of the integer i, and it is ffffd0ac. So the integer sits higher on the stack and the buffer sits lower.

That is, our buffer is located on the stack at this place, above it is the integer and possibly some other things, and at the very end is the return address into main, the function that called redirect.



We see that the stack grows down, because the things above it have higher addresses. Inside our buffer, the elements are laid out as follows: [0] at the bottom, and then increasing upward to element [128], as I drew on the board.

Let's see what happens if we enter the same input that crashed the program. But before that, we need to determine exactly where our return address is and how it relates to the EBP register.

On x86 there is a handy convention: the EBP register points to a location in the current stack frame that holds the “saved EBP”. This is a separate slot, located after all the variables but before the return address, as shown in this figure.



It is saved by the first few instructions at the top of the function. Let's examine what the saved EBP contains.

In GDB (the GNU Debugger), you can use the x command to examine memory, for example at the address held in the EBP register.



Here is its position on the stack: ffffd0b8. Indeed, it is located even higher than our variable i. Good.

It holds the value that EBP had before the function was called, and above it there is another memory location, which will be the return address. If we print the contents of the stack at ebp + 4, we are shown 0x08048E5F. Let's see what this points to.
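The addresses printed so far are enough to work out the frame layout with simple arithmetic. Here is a minimal sketch in C that uses only the addresses quoted above. The macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Addresses quoted in the GDB session above. */
#define BUF_ADDR        0xffffd02cu            /* address of the buffer     */
#define I_ADDR          0xffffd0acu            /* address of the integer i  */
#define SAVED_EBP_ADDR  0xffffd0b8u            /* value of $ebp             */
#define RET_ADDR_SLOT   (SAVED_EBP_ADDR + 4u)  /* $ebp + 4                  */

/* Hypothetical helper: distance in bytes from the start of the buffer. */
uint32_t offset_from_buf(uint32_t addr)
{
    return addr - BUF_ADDR;
}
```

With these numbers, `offset_from_buf(I_ADDR)` is 128, `offset_from_buf(SAVED_EBP_ADDR)` is 140, and `offset_from_buf(RET_ADDR_SLOT)` is 144, so the return address occupies the bytes at offsets 144 through 147 from the start of the buffer.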

This is what you will have to do in the lab. You can take this address and try to disassemble it. What is it, and where does it land? GDB really helps here by figuring out which function contains this address.



So what is at 0x08048E5F? This is where the return address points. As you can see, it is the instruction immediately after the call to redirect. So when we return from redirect, this is where we land and where execution of main continues.

So where are we now? To get our bearings, we can disassemble at the instruction pointer. Enter "disass $eip".



We are now at the very beginning of redirect. Let's step over the call to gets() with the “next” command, and then type in the absurdly long input that crashed the program, AAA...A, to see what happens.



So we have executed gets(), and the program is still running. Now let's find out what is in memory at this point and why things will go wrong later.

What do you think is happening now? I typed a sequence of A characters. What did gets() do to memory? It wrote this sequence into the buffer on the stack, which, if you recall, holds elements [0] through [128]. The A's began to fill it from the bottom up, in the direction of the arrow I drew.



But gets() was given only a single pointer, the start of the buffer, which tells it where to begin placing the A's. gets() does not know the length of the buffer, so it simply keeps writing our data up the stack, possibly past the return address and everything else above our stack frame. So I ask the debugger to count the A's and get 180, which exceeds our 128.



This is not so good. We can check again what has happened to the saved EBP by examining $ebp. We get the value 41414141, and 0x41 is the ASCII code for the letter A.



Then I ask it to show the return address slot at $ebp + 4 and get the same value, 41414141.



This is not good at all. It shows what will happen when the program returns from redirect: it will jump to address 41414141. There is nothing there, so the program will crash. That is, we get a segmentation fault.
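To see why 180 bytes are enough to reach the return address, here is a small simulation in C. It does not overflow a real buffer: it copies the input into a byte array that stands in for the stack frame. The offsets 128, 140, and 144 are derived from the addresses printed above (ffffd0ac, ffffd0b8, and ffffd0b8 + 4, each minus ffffd02c). All names are hypothetical, and this is only a sketch of the idea, not the lecture's program.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_OFF        0     /* buffer starts at the lowest address       */
#define I_OFF          128   /* integer i                                 */
#define SAVED_EBP_OFF  140   /* saved EBP                                 */
#define RET_OFF        144   /* return address into main                  */
#define FRAME_SIZE     256   /* simulated memory, big enough for the demo */

/* Mimics gets(): copies the whole input starting at the buffer and ignores
 * the 128-byte buffer size. The only bound is the size of the simulated
 * memory, so the demo itself stays memory-safe. */
static void unchecked_copy(uint8_t *frame, const char *input)
{
    size_t n = strlen(input) + 1;            /* gets() also stores a NUL */
    if (n > (size_t)FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame + BUF_OFF, input, n);
}

/* Reads a 32-bit little-endian word, the way x86 would. */
static uint32_t load32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the simulated return address after feeding in `count` 'A's. */
uint32_t ret_after_overflow(size_t count)
{
    uint8_t frame[FRAME_SIZE] = {0};
    char input[FRAME_SIZE];
    /* The legitimate return address 0x08048e5f, in little-endian order. */
    const uint8_t good[4] = {0x5f, 0x8e, 0x04, 0x08};

    if (count >= (size_t)FRAME_SIZE)
        count = FRAME_SIZE - 1;
    memset(input, 'A', count);
    input[count] = '\0';

    memcpy(frame + RET_OFF, good, sizeof good);
    unchecked_copy(frame, input);
    return load32(frame + RET_OFF);
}
```

In this model, `ret_after_overflow(100)` still returns 0x08048e5f, while `ret_after_overflow(180)` returns 0x41414141, the same value the debugger showed at $ebp + 4.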

So let's just step through and see what happens. Type “next” to run the program further.



Now we are nearing the end of the function and can step over 2 more instructions. Again we type “nexti”.



You see that at the end of the function there is a “leave” instruction, which restores the stack to where it was. It moves the stack pointer all the way back up to the return address using that same EBP, which is mainly what EBP is for. Now the stack pointer points to the return address we are about to use, and in fact it consists entirely of our A characters. If we execute one more instruction, the processor will jump to that address, 41414141, try to execute code there, and crash, because this address is not valid in the page table.
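The effect of “leave” followed by the return can be sketched with a toy model in C. On x86, `leave` copies EBP into ESP and then pops the saved EBP, and `ret` pops the return address into EIP. The struct, the function names, and the offset 140 for the saved-EBP slot (ffffd0b8 minus ffffd02c) are assumptions made for this illustration, with addresses modeled as plain offsets into a small array.

```c
#include <stdint.h>
#include <string.h>

/* A toy 32-bit machine: three registers and a small stack memory. */
struct cpu {
    uint32_t esp, ebp, eip;
    uint8_t  mem[256];
};

static uint32_t load32(const struct cpu *c, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, c->mem + addr, sizeof v);
    return v;
}

/* pop: read the word at ESP, then move ESP up by 4 */
static uint32_t pop(struct cpu *c)
{
    uint32_t v = load32(c, c->esp);
    c->esp += 4;
    return v;
}

/* leave: ESP = EBP, then pop the saved EBP into EBP */
static void leave(struct cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* ret: pop the return address into EIP */
static void ret(struct cpu *c)
{
    c->eip = pop(c);
}

/* Fills the frame with 180 'A's and then runs leave and ret. */
uint32_t eip_after_return(void)
{
    struct cpu c;
    memset(&c, 0, sizeof c);
    c.ebp = 140;                /* saved-EBP slot, 140 bytes above buffer */
    c.esp = 0;                  /* somewhere lower in the frame           */
    memset(c.mem, 'A', 180);    /* the overflow covers offsets 140..147   */
    leave(&c);
    ret(&c);
    return c.eip;
}
```

In this model `eip_after_return()` yields 0x41414141: both the saved EBP and the return address were overwritten with A's, so the return lands on an address made of the attacker's input.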



Let's check what is there. Print the contents of our buffer once more, and it appears to be completely filled with “A” characters, exactly 128 of them.



If you remember, we entered 180 “A” characters in total. So something else happened after the buffer overflow. If you remember, we called atoi to convert the string to the integer i. If the string contains only A's, with no digits, then 0 is written to the memory location of i, because the letters cannot be interpreted as a number. And a 0 byte, as you know, marks the end of a string in C. So GDB thinks we have a nice, complete string of 128 A's.
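This effect can be reproduced safely in a simulated frame. The sketch below assumes the conversion is the standard atoi(), that int is 4 bytes, and that i sits 128 bytes above the start of the buffer (ffffd0ac minus ffffd02c). The function name is made up.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the length of the string a debugger would see in the buffer
 * after 180 'A's are written and i = atoi(buffer) is stored at offset 128. */
size_t visible_length_after_atoi(void)
{
    unsigned char frame[256];
    int i;

    memset(frame, 'A', 180);            /* the overflowing input          */
    frame[180] = '\0';                  /* terminator written by gets()   */

    i = atoi((const char *)frame);      /* no digits, so atoi returns 0   */
    memcpy(frame + 128, &i, sizeof i);  /* storing i overwrites some A's  */

    return strlen((const char *)frame); /* stops at the first zero byte   */
}
```

The function returns 128: the zero stored in i acts as a string terminator right after the buffer, even though the A's above i are still in memory.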



But that doesn't really matter, because all those A's higher up have already corrupted the stack.

Well, that was actually an important lesson. You need to keep in mind that other code will still run after you have managed to overflow the buffer and corrupt memory. You have to make sure that this code does not do something silly. For example, if the conversion of the A's to the integer i had simply bailed out as soon as it saw a non-numeric value, we would never get to jump to address 41414141. So in some cases you have to shape your input carefully. Perhaps it does not matter much here, but in other situations you need to be careful about what kind of input, numeric or alphabetic, the program is going to process.

Now let's see what happens next and step again. Let's look at our registers. Right now EIP, the instruction pointer, points to the last instruction of redirect. If we take one more step, we will finally jump to our unfortunate 41414141.



Indeed, the program does what we told it to, and if we ask GDB to print the current set of registers, the instruction pointer holds this strange value. We try to execute one more instruction and, finally, the program crashes.



This happened because the program tried to follow an instruction pointer that does not correspond to a valid page for this process in the operating system's page table. Is that clear?

Well, I have a question for you. So what exactly is our problem here?

Audience: with this program you can do whatever you want!

Absolutely right! Although, in fact, it was rather silly to enter such a huge number of A's. If you knew exactly where each value ends up, you could put other values there and jump to some other address. Let's see if we can do this.
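As a rough illustration of placing a chosen value instead of A's, the sketch below builds an input whose bytes at offsets 144 through 147 hold a chosen 32-bit address in little-endian order, which is how x86 stores it. The offset 144 comes from the addresses seen in the debugger (ffffd0b8 + 4 minus ffffd02c) and would differ for another binary or compiler. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* (ffffd0b8 + 4) - ffffd02c, from the debugging session above. */
#define RET_OFFSET 144u

/* Hypothetical helper: fills `out` with RET_OFFSET filler bytes followed by
 * `target` in little-endian byte order and a terminating NUL.
 * Returns the number of input bytes (excluding the NUL), or 0 if `out`
 * is too small. */
size_t build_input(unsigned char *out, size_t cap, uint32_t target)
{
    size_t len = RET_OFFSET + 4;

    if (cap < len + 1)
        return 0;

    memset(out, 'A', RET_OFFSET);
    out[RET_OFFSET + 0] = (unsigned char)(target & 0xff);
    out[RET_OFFSET + 1] = (unsigned char)((target >> 8) & 0xff);
    out[RET_OFFSET + 2] = (unsigned char)((target >> 16) & 0xff);
    out[RET_OFFSET + 3] = (unsigned char)((target >> 24) & 0xff);
    out[len] = '\0';
    return len;
}
```

Two caveats apply to this sketch. The filler also overwrites i and the saved EBP on the way up, and gets() stops reading at a newline, so a target address containing the byte 0x0a could not be delivered this way.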

Let's stop our program, restart it, and again enter many A characters to overflow the buffer. I am not going to work out which A lands where on the stack. Instead, suppose I overflow the stack at this point and then manually change things on the stack so that the function jumps where I want. So I type nexti again.



Where are we? We are at the very end of the redirect again. Let's look at our stack.



If we examine ESP, we see our corrupted pointer. Good. Where could we jump from here? What interesting thing could we do? Unfortunately, this program is very limited. There is almost nothing in its code that would let us jump somewhere and do something interesting, but we will try anyway. Perhaps we can find the printf call, jump there, and get it to print some value, or x equal to something. We can disassemble the main function: disass main.



The main function does a whole bunch of things: some setup, the call to redirect, a bit more, and then it calls printf. So what about jumping to this point, <+26>, in the sequence that sets up the arguments for printf (the instruction at <+22> stores %eax as an argument)? We can take the address of the instruction at <+26> and "stick" it onto the stack. This is fairly easy to do in the debugger: you can set {int}$esp to this value.

You can examine ESP again, and indeed, it now holds that value.



We continue with the “c” command and see that the program printed x equal to some garbage. I think this is because of whatever happened to be on the stack when it was printed. We did not set up all the arguments properly, because we jumped into the middle of the calling sequence (the sequence of instructions and data needed to call the function).



Yes, we printed this value, and after that the program crashed. Why did this happen? We jumped to the printf call, and then something went wrong. We changed the return address, so when we returned from redirect we went to this new address, the point we picked next to the printf call. So where did the crash come from?

Audience: due to the return of the main function!

Absolutely right! Here is what happens: this is the point we jumped to, <+26>. It sets up some parameters and calls printf. printf runs and is ready to return. So far everything is fine, because the call instruction pushes a return address onto the stack for printf to use.



The main function keeps running. It executes the leave instruction, which does nothing interesting, and then executes another return at <+39>. But there is no proper return address on the stack at that point. So presumably we return to who knows what location, taken from the memory higher up on the stack, and jump somewhere else. So, unfortunately, our pseudo-attack does not quite work here. It runs some other code, but then it crashes. This is probably not what we wanted.

So if you really want to be careful, you must not only carefully place the return address on the stack, but also figure out where the second ret will get its return address from. Then you need to carefully place something else on the stack so that the program continues to run cleanly after it has been exploited and nobody notices the intrusion.

You will try to do all of this in lab 1, in more detail.

There is one more thing we should think about now: the architecture of the stack in buffer overflows. In this case, our problem is that the return address is located up there at the top, right? The buffer keeps growing and eventually overwrites the return address. But what if we flip the stack upside down? You know, some machines have stacks that grow up. So we could imagine an alternative design where the stack starts at the bottom and grows up instead of down. If you overflow such a buffer, you just keep going up the stack, in which case, you might think, nothing bad will happen.

Let me draw this to explain how it looks. Let the return address be located here, at the bottom of the stack. Above it is the saved EBP, then the integer variables, and at the very top the buffer from [0] to [128]. If we overflow, the data goes up along this arrow.



So the buffer overflow will not affect this return address. But what does our program do next? Right, redirect makes a function call! The stack frame of the called function is placed above the frame of redirect. As a result, the picture looks like this: first comes the called function's return address, then its saved EBP, and all of its other variables sit above that. And it is this function, gets(), that starts overflowing the buffer.



So this is still a problem, basically because the buffer is surrounded by return addresses on all sides, and either way you can overwrite something. Suppose our machine has a stack that grows up. At what point would you be able to take control of the program?

In fact, in some ways it is even easier. You do not need to wait for redirect to return, and you do not have to get past things like the conversion of the A's to i. It is simpler because gets() itself overflows the buffer. That overwrites its own return address, so it returns immediately and jumps to whatever location you arranged.
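The upward-growing case can be sketched with the same kind of simulation in C. The layout below is an assumption made for illustration: the return address of redirect sits below the buffer, and the frame of the called function, including its return address, sits above it. The offsets, the placeholder value 0x11111111, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALLER_RET_OFF  0     /* return address of redirect, below the buffer */
#define BUF_OFF         12    /* 128-byte buffer: offsets 12..139             */
#define CALLEE_RET_OFF  144   /* return address of gets(), above the buffer   */
#define MEM_SIZE        256

struct result { uint32_t caller_ret, callee_ret; };

static void store32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t load32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes `count` 'A's upward from the start of the buffer and reports
 * both return addresses afterwards. */
struct result overflow_on_upward_stack(size_t count)
{
    uint8_t mem[MEM_SIZE] = {0};
    struct result r;

    store32(mem + CALLER_RET_OFF, 0x08048e5fu);  /* return into main        */
    store32(mem + CALLEE_RET_OFF, 0x11111111u);  /* placeholder: return
                                                    into redirect           */
    if (count > (size_t)(MEM_SIZE - BUF_OFF))
        count = MEM_SIZE - BUF_OFF;
    memset(mem + BUF_OFF, 'A', count);           /* writes go upward only   */

    r.caller_ret = load32(mem + CALLER_RET_OFF);
    r.callee_ret = load32(mem + CALLEE_RET_OFF);
    return r;
}
```

With 180 bytes of input the return address of redirect is untouched, but the return address of the called function becomes 0x41414141, so control is lost as soon as that function returns.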

What can we do with such a rather boring program? It seems to contain no interesting code to jump to. All you can do is print a different value of x here, via printf.



Let's try it!

Audience: if you have extra space on the stack, can you put arbitrary code there that, for example, executes a shell?

Yes, yes, yes, that is really clever, because then you can supply other input. There is some protection against this, and you will learn about it in the following lectures. But basically, you have a return address here that gets overwritten on both kinds of machines, with stacks growing up and with stacks growing down. And instead of pointing it at existing code, such as the printf call in main, we could make the return address point into the buffer, since that is just some location in memory. You can jump there and treat its contents as executable code.



As part of your request, you send some bytes of data to the server, then make the return address point to the place in the buffer where you put them, and the program continues running from that point.

This way you can supply the code you want to run, jump to it, and get the server to run it. Indeed, on Unix systems attackers often do exactly this: they ask the operating system to simply execute /bin/sh, which lets you type arbitrary shell commands that are then executed. As a result, this piece of code that you inject into the buffer is called “shellcode”, for historical reasons. In your lab you will try to construct something similar.



Audience: Is there a separation between code and data?

Historically, many machines did not provide any separation of code and data. They simply had a flat memory address space: the stack pointer points there, the code pointer points here, and the machine simply executes whatever the code pointer points to. Modern machines try to provide some protection against such attacks, so they often associate permissions with different regions of memory, and one of these permissions is execute. Thus, the part of your 32-bit or 64-bit address space that contains code has the execute permission, and if your instruction pointer points there, the processor will actually run those instructions. But the stack and other data areas of your address space usually do not have the execute permission.

So if you somehow manage to set the instruction pointer to a location in a memory region that does not hold code, the processor will refuse to execute the instruction. This is a pretty good way to defend against some kinds of attacks, but it does not rule them out entirely.
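The permission check described here can be modeled in a few lines of C. This is only a toy model of what the hardware and operating system do. The region boundaries are invented for illustration, chosen so that the code address 0x08048e5f and the stack address ffffd02c seen earlier fall into the code and stack regions respectively.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

struct region {
    uint32_t start, end;   /* half-open range [start, end) */
    int      perms;
};

/* Hypothetical memory map: the boundaries are made up for this example. */
static const struct region memory_map[] = {
    { 0x08048000u, 0x08049000u, PERM_R | PERM_X },  /* code               */
    { 0xfffdd000u, 0xffffe000u, PERM_R | PERM_W },  /* stack: no PERM_X   */
};

/* Would the processor agree to fetch an instruction from `addr`? */
bool can_execute(uint32_t addr)
{
    size_t k;

    for (k = 0; k < sizeof memory_map / sizeof memory_map[0]; k++) {
        const struct region *r = &memory_map[k];
        if (addr >= r->start && addr < r->end)
            return (r->perms & PERM_X) != 0;
    }
    return false;   /* unmapped address: the fetch faults */
}
```

In this model `can_execute(0x08048e5f)` is true, while `can_execute(0xffffd02c)` (the buffer on the stack) and `can_execute(0x41414141)` (unmapped) are both false.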

So how would you get around this obstacle if the stack were non-executable? In fact, you saw an example earlier, when we simply jumped into the middle of the main function. That was a way to exploit a buffer overflow without having to inject any new code of our own. So even if the stack were not executable, I would still be able to jump into the middle of main. In this particular case that is rather boring, because all it does is print x and then crash.

But in other situations, your program may contain other pieces of code that do interesting things you really want to execute. This is called a “return-to-libc” attack, a buffer overflow attack that returns into library code. In this case, the function's return address on the stack is replaced with the address of another function, and the parameters for that function are written to the next part of the stack. This is a way around such protective measures. So, in the context of buffer overflows, there is no really clear-cut solution that provides perfect protection against these errors, because, in the end, the programmer made a mistake in writing the source code. And the best way to fix it is probably to just change the source code and make sure you do not call gets(), which the compiler did warn you about.
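As one possible shape of such a source-level fix, here is a sketch in C that replaces the gets() call with the standard fgets(), which takes the buffer size, and replaces the unchecked conversion with strtol(), which reports whether any digits were read. The function name and its return convention are hypothetical and are not taken from the lecture's program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bounded replacement for reading a number with gets().
 * Reads at most 127 characters of one line from `in`. Returns 0 and
 * stores the number in *out on success, or -1 on a read error or if
 * the line does not start with a number. Range checking of the long
 * value is omitted to keep the sketch short. */
int read_int_bounded(FILE *in, int *out)
{
    char buf[128];
    char *end;
    long v;

    if (fgets(buf, sizeof buf, in) == NULL)   /* never writes past buf */
        return -1;

    v = strtol(buf, &end, 10);
    if (end == buf)                           /* no digits were found  */
        return -1;

    *out = (int)v;
    return 0;
}
```

With this version, a line of 180 A's is truncated at 127 characters and rejected as non-numeric instead of running past the end of the buffer. A typical call would be `read_int_bounded(stdin, &x)`.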

But there are subtler things that the compiler will not warn you about, and you still have to take them into account. Since in practice it is hard to change all the software, many people try to develop techniques that make such errors harder to exploit. For example, they make the stack non-executable, so that you cannot put shellcode on it and must do something more complicated to achieve your goal. In the next two lectures we will look at these defenses. They are not perfect, but in practice they make an attacker's life harder.

Audience: will there be a test on the topic of today's lecture and when?

Yes, if you look at the schedule, you will see two tests there.

So, to summarize: what should we do about mechanism problems such as buffer overflows? The general answer is to have as little mechanism as possible. As we have seen, if you rely on every piece of software to enforce your security policy, you will inevitably make mistakes. Those mistakes let an adversary get around your mechanism and exploit some flaw in the web server.

In the second lab, you will try to design a better system, one whose security does not depend on all of the software enforcing the security policy. The security policy itself will be enforced by a small number of components.

And whether the rest of the system is correct or not does not matter for security, as long as it cannot violate the security policy itself. So this kind of minimization of the trusted computing base is a rather powerful technique for getting around the mechanism errors and problems that we looked at today in more or less detail.

That's all for today, come to the lecture on Monday and do not forget to post your questions on the site.

To be continued…

