MIT course "Computer Systems Security". Lecture 9: "Web Application Security", part 3
Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems." Nikolai Zeldovich, James Mykens. year 2014
Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.
Lecture 1: “Introduction: threat models” Part 1 / Part 2 / Part 3
Lecture 2: “Control of hacker attacks” Part 1 / Part 2 / Part 3
Lecture 3: “Buffer overflow: exploits and protection” Part 1 /Part 2 / Part 3
Lecture 4: “Privilege Separation” Part 1 / Part 2 / Part 3
Lecture 5: “Where Security System Errors Come From” Part 1 / Part 2
Lecture 6: “Capabilities” Part 1 / Part 2 / Part 3
Lecture 7: “Native Client Sandbox” Part 1 / Part 2 / Part 3
Lecture 8: “Network Security Model” Part 1 / Part 2 / Part 3
Lecture 9: “Web Application Security” Part 1 / Part 2/ Part 3
Audience: so what prevents an attacker from finding the key? Where is this secret key?
Audience: Is it possible to use MAC for normal clients?
Professor:for normal - do you mean browsers?
Audience: for ordinary users.
Professor: the fact is that the question of where the key really lives is extremely important. Because if a key can be stolen as easily as a cookie, then we don’t win anything. Therefore, in many cases, all these things are stored somewhere in the cloud and are used to exchange data between virtual machines and are transferred from server to server in the cloud too. In this way, the application developer starts a VM that uses the outsourcing of a heap of things stored in AWS.
Audience: Is there a network delay problem here, so an attacker can send the same request right after the user and also get access?
Professor:Yes, suffice it to say that several people defended their dissertations on the topic of timestamp security. But you are absolutely right, because we have considered a rather rough example.
Imagine that here, in this example, String To Sign in the DATE line we will have the value "Monday, June 4th." Then, if in some way the attacker can get access to all this, then he will be able to repeat the user's request. The fact is that AWS allows you to use the expiration date of these things. So one thing you can do is add an Expires field here, and assume that an expiration date has been set.
Then I can give this link to a bunch of different people, and the server will check if their requests have expired.
Lecture hall:even if the expiration date is only 200 milliseconds, but the attacker is watching the network, he can send just in case several copies of the request instead of one.
Professor: quite right, if the attacker attacks the network and sees how these things are transmitted by wire, and there is enough room for maneuver on the expiration date, then he can definitely make this attack.
So this was an overview of how stateless cookies work. One interesting question arises here: what does it mean to log out of a system having cookies of this type? The answer is that you are not really leaving. I mean that you have this key and whenever you want to send a request, you just send it. But the server could revoke your key.
Suppose the server has revoked the key. But you can create one of these GET things, and when you send a message to the server, it will say: "Yeah, I already know your User ID, User ID, the key has been revoked, so I will not fulfill your request." However, there are nuances, and if we talk about such things as SSL, then it is much easier to empower a person than to recall them.
Thus, there are several other things that you can use if you want to avoid traditional cookies to implement authentication. One of them is the use of the DOM repository, which contains information about the authentication of the client side. You could use the DOM repository to store some session states that are usually placed inside cookies.
If you remember from the last lecture, the DOM repository is a key interface of the values that the browser provides to each source, that is, the browser takes them from there and inserts them into a string.
The good thing is, the DOM does not have such stupid rules regarding the same policy of the same origin. So, if these were ordinary cookies, you could do all these tricks with subdomains and the like. The DOM repository is actually strictly tied to one source, so you cannot extend any subdomain. Therefore, frameworks such as Meteor use this repository.
One of the problems I mentioned briefly and which we will look at in detail in subsequent lectures is the revocation of certificates. If a user leaves your organization, how can you take a certificate from him? It is quite difficult.
In addition, these things are not very convenient to use, because no one wants to install a bunch of certificates for each site that you visit. Therefore, authentication certificates are not very popular, with the exception of companies or organizations that relate to security with great responsibility. This concludes the discussion of cookies.
Let's talk now about protocol vulnerabilities in the web stack. One of the interesting types of attacks is to use errors in browser components, for example, when parsing URLs. So how can URL parsing bring trouble to us?
Suppose we have a URL of this kind, where for some reason strange characters are embedded at the end:
example.com : 80 @ foo.com.
The question is, what is the origin of this particular URL? Flash would have thought that the hostname is example.com. But when the address analyzes the browser, it will think that the origin of the host in this case is foo.com.
This is very bad, because when we have two different entities that are confused in the origin of the origin of the same resource, this is fraught with unpleasant problems.
For example, a flash code can be malicious and download some material from example.com. If the exploit was built into the page from foo.com, he could also do some evil things there. And then it takes some code from example.com and runs it with the authority of foo.com. Many complicated parsing rules like these make life very difficult. It happens all the time.
We have just considered content disinfection, the basic idea of which is often much better when there are simpler parsing rules for this kind of thing. However, in retrospect it is difficult to do, because HTML is already there.
And now let's talk about my most favorite security vulnerability - about files with the .jar extension, which are a ZIP archive with a part of a Java program. The object of the attack are browser JAR files, mostly Java applets. Around 2007, one great site called lifehacker.com explained how to embed zip files into images. It’s not quite clear who you are trying to hide from by doing this, but lifehacker.com is convinced that this can be done.
They mainly use the fact that if you look at image formats, such as GIF, then, as a rule, the parser runs from top to bottom. He first finds the information in the header, and then considers the remaining bits located at the bottom.
As it turned out, programs that usually manipulate ZIP files work from the bottom up, that is, opposite to the direction of parsing images. First, they find information in the file footer and unpack what is contained inside the archive. Thus, if you place an image file containing a ZIP archive, it will go through all the checks, even Flickr, like any other image, and it will even appear as an image in your browser.
But only you will know the hidden truth. Only you will be aware that if you take this file, you can unpack it and use the information enclosed there. It seems that it seems like a cheap trick, but hackers never sleep, they constantly want to ruin our lives. So how do they implement this idea?
Later, people called this attack method GIFAR, half GIF, half JAR, and both half evil. It was awesome. When people first discovered this opportunity, they found it amazing, but they didn’t understand at all how to use it. But as it turned out, the following things can be done on its basis.
So how can you do this? You just use CAD. Take .gif, take .jar, use self-extracting archive - boom, and GIFAR attacked you!
So as soon as you got this, what can you do? There are some sensitive sites that allow users to provide data, but not just arbitrary data types. So Flickr or something like this may not allow you to send a custom ActiveX or any other custom HTML. But you will be allowed to send images. So you could build one of these things and present it on one of these confidential sites, which allows you to send images. What do you need to do to make a successful attack in this case?
This code uses a cross-site scripting vulnerability, so it will be launched in the site content. GIFAR will pass origin checks because it comes from a site with a common source of origin, despite the fact that this code was inserted by an attacker.
So, now the attacker gets the opportunity to run this Java applet in the context of the victim's site with all the authority of origin. And one of these things will actually be correctly identified as a GIF image. But there is a hidden code. Let me remind you that first the browser unpacks the archived files, so first of all it will launch the JAR part, ignore the top part of the GIF. So it is actually quite amazing.
There are some pretty simple ways to fix this. For example, you can use the applet loader, which understands that there should not be random garbage. In many cases, metadata information is used that indicates the length of this resource. In this case, the loader will start, as expected, from the top, analyze its length, see that the applet ends at the top, and stop. It doesn’t care about the lower part, it is possible that it is even 0. In our case, such a loader will not help, as it will start processing the request from the lower, archived part, and stop before the upper one, ignoring it.
What I like about this is that it really shows how wide the Internet software stack is. Taking only two of these formats, GIF and JAR, you can create a really unpleasant attack.
You can do this with PDF files. You can put PDF instead of GIF and call this attack something harm PDFAR. But in the end, people have dealt with this problem, and vulnerabilities of this kind have now been eliminated.
Audience: What can you do with an attack of this kind that you cannot do with a regular XSS cross-site scripting attack?
So, as I said, this is my favorite attack of all time, mostly only because it made respectable people in the field of computer security come up with a word such as GIFAR.
Another interesting thing is the use of attacks that are based on time. Usually, people do not think of time as a resource that can be a vector for attacks. But as I noted a few minutes ago, time may in fact be a means of allowing an exploit to be introduced into the system.
The specific attack I’m about to talk to you about is a covert channel attack attack. The idea of this attack is that the attacker finds a way to exchange information between the two applications, and this exchange operation is not authorized. The attacker somehow uses some part of the system to transfer bits of information between two different resources.
A good example of this is the sniff-based CSS attack. What is such an attack?
Suppose an attacker has a website that a user can visit. Getting a user to visit a website is, in fact, quite simple. You create an advertisement or send a phishing email.
Thus, the attacker has a website that the user visits. And the attacker's goal is to find out what other sites the user has visited. An attacker may want to find out for several reasons. Perhaps he’s interested in the user's search queries, or he is trying to figure out where this person works, or maybe he wants to know if this person doesn’t go to any “shameful” sites, and so on.
How is the attacker going to do this if the only thing he controls is the website he wants to convince the user to log in to? A possible way is to use link colors. As you know, if you clicked on a link once, the next time it will appear in your browser of a different color, indicating that you have previously clicked on this link. This is actually a security vulnerability.
Interestingly, in many cases you don’t even need to display URLs. You can arrange links in the form of headings on the screen like a domino, and these headings will change color if the user used this link. Perhaps you will think it is not too tedious to scan all these URLs of sites visited by the user? But this process can be optimized by using several filtered passages on the list of addresses. For example, you can first see if the user has visited the top-level URLs — cnn.com, Facebook.com, and so on and so forth. If the answer is yes, you can make a selection of the most visited top-level pages. Thus, you can really limit the amount of search.
So the harmless function that browsers support to help the user, saying: “hey, buddy, that's where you visited!” Can be used by the attacker as compromising evidence on you.
Thus, the next attack that an attacker can organize is a cache-based attack. Its purpose is the same - the hacker wants to know which sites you visit.
As an exploit, it uses cached information that serves to provide the user with quick access to the previously visited site. In fact, the ability to quickly access a page is the reason why you first cache its address.
Thus, the attacker can again generate a list of candidate objects that he believes you could visit, and then simply check how quickly these objects return to him. If objects return quickly, you can guess that you have been there before, because when you visit them, the information is not downloaded again, but the data from the cache is used. It's clear?
Again, the browser is just trying to help you. But you can use his good intentions for evil. It is also important that this attack can use some very interesting information about geographic location.
Imagine we are attacking Google Map Tiles. If I find that you really have access to the Google Map “tiles” series, it means that you are either in this place or you are interested in other people who may be in this place. So this is a pretty powerful attack.
So how can you fix this? Well, this is not entirely clear. You may have a site that does not cache anything at all, but then your site will be slow, and this is no good. So it's not entirely clear how to get around this vulnerability.
Here's the idea: even if you don't cache anything, when you go to a resource for the first time, you must create a DNS query for the hosting associated with that resource. Thus, the attacker, again, can detect the time and see how long it takes to gain access to these candidate sites. And if the request comes back quickly, then this is a good hint that you have previously created a DNS name for this host. And it works even if you do not cache anything, because the DNS cache is associated with the operating system, not with the browser.
Professor: yes it is!
Lecture hall:that is, you can simply make a link that looks like a pixel, and then take a screenshot of where it will go, and thereby make the transition to the link?
Professor: yes, such a possibility exists. Rendering is always difficult, because you need to play this game - if you want to show something to the user, it must flash very quickly, otherwise you can see that someone is on this huge list of URLs. But you are right, if you have access to the screen sharing API, a lot becomes much easier.
Audience: what if you just have some kind of animated image that looks random and you just pay attention to one pixel?
Professor:you are absolutely right. I think the screen sharing API is a bad idea. But I am not the president of the world, what can I do? One way or another, DNS-based attacks work, even if caching is not performed.
The basic idea here is to more quickly visualize the URL you visited before.
При этом злоумышленник может создать плавающий фрейм <iframe>, поместить туда некоторое содержимое, которое, по его мнению, вы, возможно, посетили, а затем постоянно следить, не потеряет ли он доступ к этому <iframe>. Потому что как только <iframe> загружается, браузер обычно считает, что этот <iframe> принадлежит странице злоумышленника и блокирует его. Поэтому как только в браузер поступает контент, исходящий из другого origin, вы начнете получать эти сообщения об ошибке доступа.
Thus, the attacker can no longer touch anything and concludes that he was able to identify the site that you previously visited.
See you at the next lecture!
Full version of the course is available here .
Thank you for staying with us. Do you like our articles? Want to see more interesting materials? Support us by placing an order or recommending to friends, 30% discount for Habr's users on a unique analogue of the entry-level servers that we invented for you: The whole truth about VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps from $ 20 or how to share the server? (Options are available with RAID1 and RAID10, up to 24 cores and up to 40GB DDR4).
VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps until December for free if you pay for a period of six months, you can order here .
Dell R730xd 2 times cheaper? Only here2 x Intel Dodeca-Core Xeon E5-2650v4 128GB DDR4 6x480GB SSD 1Gbps 100 TV from $ 249 in the Netherlands and the USA! Read about How to build an infrastructure building. class c using servers Dell R730xd E5-2650 v4 worth 9000 euros for a penny?