MIT course "Computer Systems Security". Lecture 8: "Model of network security", part 2

Original author: Nikolai Zeldovich, James Mykens

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.

Lecture 1: “Introduction: threat models” Part 1 / Part 2 / Part 3
Lecture 2: “Controlling hacker attacks” Part 1 / Part 2 / Part 3
Lecture 3: “Buffer overflow: exploits and protection” Part 1 / Part 2 / Part 3
Lecture 4: “Privilege Separation” Part 1 / Part 2 / Part 3
Lecture 5: “Where Security System Errors Come From” Part 1 / Part 2
Lecture 6: “Capabilities” Part 1 / Part 2 / Part 3
Lecture 7: “Native Client Sandbox” Part 1 / Part 2 / Part 3
Lecture 8: “Network Security Model” Part 1 / Part 2 / Part 3

So what happens if the browser mishandles an object and cannot correctly determine its type? That can lead to security problems. One of them is known as the MIME sniffing attack.

You are probably familiar with MIME types: labels such as text/html, image/jpeg, and so on, which tell the browser what kind of content a resource contains. Old versions of IE tried to be clever with these types, because they believed it would help the user: sometimes web servers assign the wrong type or extension to a file.

For example, a misconfigured web server might serve an HTML file with a .jpg extension, or vice versa, naming a file foo.jpg when it is really foo.html.

So in the old days, IE tried to help you. It would fetch a resource and think: “OK, this resource claims to be of a particular type, according to its file name extension.” But then it would look at the first 256 bytes of the object. If it found certain magic values there indicating a different type, it would say: “Hey, I found something cool here! I think the web server mislabeled this object, so let me treat it as the type I found in these first 256 bytes.”
And then everyone wins: the browser has helped the web developer, because the site now displays properly, and the user is happy, because they can see content that would otherwise have shown up as garbage.

But this is a clear vulnerability! Suppose a page includes some passive content, for example an image from a domain controlled by an attacker. The victim page reasons: “Even if this content comes from a malicious site, it's just passive content. It can't do anything to me! At worst, a bad image will be displayed, but it can't run any code, because passive content has no authority.”

The problem is that IE may sniff this image, that is, examine its first 256 bytes, and an attacker can deliberately put HTML and JavaScript there. So the victim site embeds what it believes is an image, and IE executes the attacker's code in the context of the embedding page.
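To make the sniffing logic concrete, here is a minimal Python sketch. It is a simplified model, not IE's real algorithm: the marker list and the function name effective_type are hypothetical, and only the 256-byte window comes from the lecture.

```python
# Simplified, hypothetical model of legacy MIME sniffing (not IE's actual algorithm).
SNIFF_WINDOW = 256  # the lecture says IE inspected the first 256 bytes

# Assumed byte patterns that make this sketch conclude "this is really HTML".
HTML_MARKERS = (b"<html", b"<head", b"<body", b"<script")


def effective_type(declared_type: str, body: bytes) -> str:
    """Return the MIME type a sniffing browser would actually use for body."""
    head = body[:SNIFF_WINDOW].lower()
    if any(marker in head for marker in HTML_MARKERS):
        # The server's label is overridden: "passive" content becomes active HTML.
        return "text/html"
    return declared_type


# An attacker-hosted "image" whose first bytes are HTML with a script.
evil_image = b"<html><script>alert(document.cookie)</script></html>"
# Bytes that start like a real JPEG file (SOI + APP0 markers), then padding.
real_image = b"\xff\xd8\xff\xe0" + b"\x00" * 300

print(effective_type("image/jpeg", evil_image))  # text/html
print(effective_type("image/jpeg", real_image))  # image/jpeg
```

Once the “image” is reclassified as text/html, its script runs with the authority of the embedding page, which is exactly the attack described above.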

This is an example of how complex browsers are, and how even a well-intentioned feature can introduce very subtle security bugs. So let's take a closer look at how the browser protects its various resources.

Let's start with frames and window objects (a window object represents a window containing a DOM document). Frames are the independent JavaScript universes we talked about earlier. From JavaScript's point of view, a frame is an instance of a DOM node, as shown in the picture of the DOM tree.

So a frame exists as a DOM node object somewhere in this hierarchy, visible to JavaScript.

In JavaScript, the window object is actually an alias for the global namespace, which sounds like a strange idea. That is, if you define a global variable x, you can also access it as window.x.

So frames and window objects are very powerful references to have access to. They also contain pointers to each other: a frame has a pointer to its window object and vice versa, so in essence the two are equivalent.
Frames and window objects get their origin from the frame's URL or, alternatively, from a suffix of the original domain name, which the frame can adopt by setting document.domain.

For example, a frame might start with the origin x.y.z.com; for a second, ignore the scheme and port.

In this case, the frame can set document.domain to y.z.com, because that is a suffix of its name. Similarly, it can set document.domain to z.com, because z.com is also a suffix of x.y.z.com.

What it cannot use is a.y.z.com, because that is not a suffix of the original origin. Nor can it set document.domain to just .com, because then the site could tamper with cookies and other state belonging to any .com site, which could have rather devastating consequences.

The motivation for allowing these changes is that they reflect some kind of existing trust relationship between the domains. So x.y.z.com, y.z.com and z.com are all acceptable values; the problematic ones are a.y.z.com and .com.
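The suffix rule can be expressed as a small predicate. This is a minimal sketch, not a browser's real implementation: the function name and the three-entry suffix set are hypothetical stand-ins, and scheme and port are ignored as in the example above.

```python
# Hypothetical, tiny stand-in for a real list of public suffixes.
PUBLIC_SUFFIXES = {"com", "edu", "org"}


def can_set_document_domain(current_host: str, new_domain: str) -> bool:
    """Simplified rule: may a page at current_host set document.domain = new_domain?"""
    current = current_host.lower()
    new = new_domain.lower().lstrip(".")
    if new in PUBLIC_SUFFIXES:
        return False  # "com" would let the page claim every .com site
    # The new value must be the host itself or a suffix that begins at a dot.
    return current == new or current.endswith("." + new)


for candidate in ["x.y.z.com", "y.z.com", "z.com", "a.y.z.com", ".com", "yz.com"]:
    print(candidate, can_set_document_domain("x.y.z.com", candidate))
```

The first three candidates are accepted and the last three rejected; the "." + new check is what restricts shortening to whole labels, so a cut in the middle of a label such as yz.com fails.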

Audience: Can you make these cuts at any character, or only at the dots? For example, could x.y.z.com be cut in the middle of a label?

Professor: No, you can only cut at the dots.

Audience: Is there a reason it wasn't designed so that you could name any superdomain or subdomain, with the two sides having to agree on who can access what? Say I want to accept only content that has the same origin as mine, so that none of these other resources could attack me. We could also make the relationship symmetric, so that I could influence them too. As it is, setting the origin suffix to .com would mean that anything ending in .com could affect me.

Professor: Yes, it's a tricky question, and there are several answers. First, people were very worried about attacks through .com, so they wanted the rules for manipulating domains to be easy to understand. That's why they didn't allow much flexibility in the settings.

In a moment I'll tell you about a mechanism that lets you do something like what you describe, but only in terms of domain suffixes. For now, note that the postMessage interface lets domains communicate with each other if they agree to it. In practice, people use postMessage for cross-domain communication when they cannot establish a common origin with the tricks described above. So browsers restrict domains to these suffixes of the original domain. There is also an interesting subtlety here: browsers keep track of whether document.domain has been explicitly written or not.

There's a reason for this, which we'll look at in a second. Two frames can access each other if at least one of the following conditions holds:

both frames have set document.domain to the same value;
neither frame has changed document.domain, and the value is the same in both frames.
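The two conditions can be written as a small predicate. This is a minimal sketch under simplifying assumptions: the Frame class and can_access are hypothetical names, and a real origin check would also compare scheme and port, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    document_domain: str          # current value of document.domain
    domain_was_set: bool = False  # True once script has assigned document.domain


def can_access(a: Frame, b: Frame) -> bool:
    """Simplified version of the two access rules (scheme and port are ignored)."""
    if a.document_domain != b.document_domain:
        return False
    # Rule 1: both frames explicitly set document.domain to the same value.
    # Rule 2: neither frame touched document.domain and the values are equal.
    return a.domain_was_set == b.domain_was_set


victim = Frame("y.z.com")                         # never writes document.domain
attacker = Frame("y.z.com", domain_was_set=True)  # x.y.z.com shortened itself
print(can_access(attacker, victim))  # False
```

Comparing the two flags for equality covers both rules at once: access is granted only when both frames opted in or neither did.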

The main idea of these rules is to protect a domain from attacks caused by bugs in, or malice of, one of its own subdomains.

Imagine that x.y.z.com is trying to attack y.z.com, because x.y.z.com contains a bug or is malicious. It will shorten its own document.domain to y.z.com and then start tampering with y.z.com's JavaScript state, cookies, or other things.

These two rules mean that if y.z.com doesn't want anyone to interact with it, it simply never writes document.domain. So when the x.y.z.com frame shortens its domain, the browser says: “Yes, your value now matches, but you don't get access!” The values match, but y.z.com never indicated that it wants to participate. Clear? That, roughly, is how the same-origin policy works for frames.

Now let's see how DOM nodes are handled. This part is quite simple: DOM nodes normally get their origin from the frame that surrounds them. Cookies are somewhat more complicated, because they have both a domain and a path. For example, a cookie might be associated with *.mit.edu/6.858. In this case, the cookie's domain is *.mit.edu and its path is /6.858.

Note that the cookie's domain can be any suffix of the current page's domain, so you can play the same tricks with it that we discussed earlier. Note also that the path can be just a slash with nothing after it, which means that every path in the domain has access to the cookie.

In our example, though, we have a specific path. Whoever sets the cookie gets to choose what its domain and path look like, and cookies can be set both on the server side and on the client side. On the server side, this is done with the Set-Cookie response header. On the client side, there is a JavaScript object called document.cookie, through which the same domain and path attributes are specified.

There is also a Secure flag that you can set on a cookie to indicate that it should only be sent over HTTPS. In that case, HTTP content should not have access to the cookie. Those are the basic ideas behind cookies.
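On the server side, the domain, path and Secure attributes travel in a Set-Cookie header. As a sketch, Python's standard http.cookies module can generate one; the cookie name and value are hypothetical, and the domain is written as mit.edu because the Domain attribute names the domain itself (it then also covers its subdomains) rather than using a *.mit.edu wildcard.

```python
from http.cookies import SimpleCookie

jar = SimpleCookie()
jar["session"] = "abc123"             # hypothetical cookie name and value
jar["session"]["domain"] = "mit.edu"  # sent to mit.edu and its subdomains
jar["session"]["path"] = "/6.858"     # only sent for URLs under /6.858
jar["session"]["secure"] = True       # only sent over HTTPS

header = jar.output()  # a "Set-Cookie: ..." line for the server's response
print(header)
```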

Whenever the browser sends a request to a particular web server, it includes all the relevant cookies in the request. There is a set of string-matching rules and algorithms that determine exactly which cookies should be attached to a given request, because domain suffixes and the like can lead to some strange cases.
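Here is a minimal sketch of that matching step, loosely following the domain, path and Secure rules of RFC 6265. It is a simplification: the Cookie class and function names are hypothetical, and host-only cookies, expiry and header ordering are ignored.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Cookie:
    name: str
    value: str
    domain: str  # e.g. "mit.edu", matching mit.edu and its subdomains
    path: str = "/"
    secure: bool = False


def domain_matches(host: str, cookie_domain: str) -> bool:
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    if cookie_path == "/" or request_path == cookie_path:
        return True
    # "/6.858" matches "/6.858/..." but not "/6.8580".
    return request_path.startswith(cookie_path.rstrip("/") + "/")


def cookies_for(url: str, jar: list[Cookie]) -> list[str]:
    """Names of the cookies a browser would attach to a request for url."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return [c.name for c in jar
            if domain_matches(host, c.domain)
            and path_matches(path, c.path)
            and (parts.scheme == "https" or not c.secure)]


jar = [
    Cookie("course", "1", "mit.edu", "/6.858"),
    Cookie("pref", "2", "mit.edu"),
    Cookie("token", "3", "mit.edu", secure=True),
]
print(cookies_for("http://css.csail.mit.edu/6.858/2014/", jar))  # ['course', 'pref']
print(cookies_for("https://web.mit.edu/", jar))                  # ['pref', 'token']
```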

Audience: Can frames access each other's cookies if they satisfy these rules?

Professor: Yes, they can. But it depends on how document.domain is set and on the cookie's domain and path. This raises the question: would there be a problem if arbitrary frames could write arbitrary cookies for a user? Suffice it to say, it would be bad. The reason is that cookies are what the client side of an application uses to store user data.

Imagine that an attacker can control or overwrite a user's cookies. He could, for example, change the user's Gmail cookie so that the user ends up logged in to the attacker's Gmail account. Then any email the user sends could be read by the attacker. Or someone could take over your Amazon.com cookies and put all sorts of ridiculous purchases into your shopping cart.

So cookies are a very important resource to protect, and many web attacks aim to steal them and use them for harm.
There is another interesting question related to cookies. Suppose you have a website foo.co.uk. Should a site with this hostname be able to set cookies for co.uk?

There is a subtlety here related to the rules we discussed earlier: by those rules, the first site may shorten its domain and set cookies for the second, and everything seems legitimate. But as humans we look at this with suspicion, because we understand that co.uk is a single atomic domain, equivalent to .com. You could say the British messed up by putting a dot in there, but it isn't really their fault. Morally, co.uk is a single domain that cannot be split. So we need some special infrastructure to make cookies work properly.

Mozilla maintains a website called publicsuffix.org, which contains lists of rules describing how cookie domains and origins may be shortened, taking into account that some names contain dots but must nevertheless be treated as a single atomic whole.

So when your browser decides how to handle various cookies, it should consult this list, or in some other way make sure that foo.co.uk cannot shorten its domain to co.uk. This is a very sensitive issue.
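As a sketch of how such a list might be used, the code below finds the longest matching public suffix, adds one more label to get the “registrable domain”, and refuses cookie domains broader than that. The four-entry rule set and all function names are hypothetical; the real publicsuffix.org list has thousands of rules, including wildcard and exception rules that this sketch ignores.

```python
from typing import Optional

# Tiny hypothetical excerpt of a public suffix list (no "*." or "!" rules).
RULES = {"com", "edu", "uk", "co.uk"}


def public_suffix(host: str) -> str:
    labels = host.split(".")
    # Check the longest candidate first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in RULES:
            return candidate
    return labels[-1]  # default: an unlisted TLD is itself a public suffix


def registrable_domain(host: str) -> Optional[str]:
    """The public suffix plus one more label, e.g. foo.co.uk; None for bare suffixes."""
    suffix = public_suffix(host)
    if host == suffix:
        return None
    prefix = host[: -len(suffix) - 1]  # drop ".<suffix>"
    return prefix.split(".")[-1] + "." + suffix


def may_set_cookie_domain(host: str, cookie_domain: str) -> bool:
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    if not (host == cookie_domain or host.endswith("." + cookie_domain)):
        return False  # not a suffix of the page's own host
    site = registrable_domain(host)
    # The cookie may not be scoped more broadly than the registrable domain.
    return site is not None and (cookie_domain == site or cookie_domain.endswith("." + site))


print(may_set_cookie_domain("foo.co.uk", "co.uk"))          # False
print(may_set_cookie_domain("www.foo.co.uk", "foo.co.uk"))  # True
```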

Many other interesting web security problems arise because much of the original infrastructure was designed with English in mind, for example ASCII text. It was not originally designed for use by the international community.

But as the Internet grew more popular, people said: “Hey, early on we made a lot of design decisions based on a rather narrow understanding of what language means, and now we have to make them work for everyone else.” That's why we now face all these insane problems.

Now consider how XMLHttpRequest requests are handled by the same-origin policy. By default, JavaScript can issue such a request only to its own origin server. However, there is a newer mechanism called Cross-Origin Resource Sharing, or CORS.

So the same-origin rule applies unless the server opts in to CORS. Basically, CORS adds a new HTTP response header called Access-Control-Allow-Origin.

Let's say JavaScript from foo.com wants to make an XMLHttpRequest to bar.com. Under the rules we just described, this is a cross-origin request. If the bar.com server wants to allow it, it returns an HTTP response with a header that says: “yes, I allow foo.com to send me these cross-origin XMLHttpRequests.”

Of course, the bar.com server can also answer “no”, that is, refuse the request. In that case, the browser will not let the XMLHttpRequest succeed. This is a fairly new feature, introduced mostly because of mashup applications: applications from different developers and different domains need a way to exchange data with each other.
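The division of labor can be sketched as two functions: the server decides which Access-Control-Allow-Origin value (if any) to send, and the browser decides whether the page's script may read the response. This is a simplified model, not the full Fetch/CORS algorithm: the function names and bar.com's policy are hypothetical, and preflight requests and credentialed requests (for which “*” is not accepted) are left out.

```python
def cors_headers(request_origin: str, allowed_origins: set[str]) -> dict[str, str]:
    """Server side: the Access-Control-Allow-Origin header bar.com would send, if any."""
    if "*" in allowed_origins:
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in allowed_origins:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}  # no header means "no"


def browser_exposes_response(page_origin: str, headers: dict[str, str]) -> bool:
    """Browser side: may the page's JavaScript read the cross-origin response?"""
    allowed = headers.get("Access-Control-Allow-Origin")
    return allowed == "*" or allowed == page_origin


bar_policy = {"https://foo.com"}  # hypothetical: bar.com trusts only foo.com
h = cors_headers("https://foo.com", bar_policy)
print(browser_exposes_response("https://foo.com", h))   # True
h = cors_headers("https://evil.com", bar_policy)
print(browser_exposes_response("https://evil.com", h))  # False
```

Note that the enforcement happens in the browser: when the header is missing, the response is simply not handed to the requesting script.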

Instead of foo.com, the header can contain an asterisk if the server wants to allow cross-origin requests from anyone, and so on. I think it's pretty simple. There are many other resources we could look at, for example images. A frame can load images from any origin it wants, but it cannot inspect the bits of those images, because the same-origin policy considers it a bad idea for different origins to inspect the contents of each other's files.

Although the frame cannot inspect the bits, it can still infer the size of an image, because it can see how the image is laid out on the page. This is another of those strange cases where the same-origin policy is supposedly trying to prevent all information leaks but in fact cannot, because embedding itself reveals some kinds of information.

CSS is similar to images: a frame can embed CSS from any origin, but it cannot directly inspect the text of a CSS file from another origin. It can, however, figure out what that CSS does, simply by creating a bunch of nodes and then looking at how their styles change. That looks a little weird.

JavaScript is my favorite example of the same-origin policy trying to maintain some kind of intellectual consistency. The idea is that cross-origin inclusion of JavaScript is allowed: you can let external JavaScript run in the context of your own page, but you cannot look at its source code.

So if you have a script tag whose src points somewhere outside your domain, then once that script has run, you can call the functions it defines. But you supposedly cannot view the JavaScript source code.

This all looks good, but it has a bunch of holes. For example, JavaScript is a dynamic scripting language, and functions are first-class objects. So for any function f, you can simply call f.toString(), and it will give you the source code of f. People do this all the time, for dynamic rewriting and the like.

So although the same-origin policy does not let you look directly at the contents of a script tag, you can simply do this and get the source code anyway.

Similarly, you can have your own server, in your own domain, fetch the script and send it back to you. In essence, you just ask your server to run wget to get the source code. So the same-origin policy looks a bit silly here too.

Audience: Presumably the reason for this rule is to keep the page from reading JavaScript fetched by the user's browser, because the user's cookies are sent along with that request. That is, the user might receive JavaScript customized for them.

Professor: Yes, that's right.

Audience: So if you get your server to fetch it, the server won't get the user's customized version, because it doesn't have the user's cookies.

Professor: That's true, although in practice raw source code usually isn't customized for the user. But you're right that this prevents some attacks.

Because a user or an application can easily get JavaScript source code, the code is often obfuscated and minified during deployment. So if you've ever tried to look at how a web page works, you may have seen just one solid block of code. Sometimes people also rename all variables to very short names that look like strings of symbols and exclamation marks, like cartoon characters swearing. So this is a sort of poor man's digital rights management.

But ultimately none of this is serious protection, because you can still run the code in your own browser. When people obfuscate a web page, they are often trying to hide some secrets in their HTML or JavaScript, for example the protocol the client uses to talk to the server. Often, though, minification is done simply to reduce page load time.

We've looked at JavaScript, so now let's look at plugins. As with JavaScript, a frame can run a plugin from any origin.

Nowadays plugins are something of a dinosaur, because HTML5 has many new built-in features, such as the video tag, that do the same things as separate Java plugins. So it's not clear how much longer plugins will be around.

Remember that when the browser issues an HTTP request, it automatically includes the relevant cookies. So what happens if a malicious site constructs a URL like bank.com/xfer?amount=500&to=attacker?

Suppose the attacker lures the user to his page, and that page creates a new child frame whose URL points to bank.com. The URL imitates the request the browser would make to transfer the user's money: in this frame, the attacker tries to trigger a transfer of $500 from the user to his own bank account.

What's interesting is that even though the attacker's page cannot see the contents of this child frame, because it has a different origin, the bank.com server will still do what the attacker wants, because the browser sends all of the user's bank.com cookies along with the request. The bank looks at the command and says: “Aha, the user is probably asking me to transfer $500 to this mysterious user named attacker! Okay, I'll do it.”
So there's a problem here. The attack works because the attacker can figure out empirically what the command should look like: there is no randomness in it. The attacker can try the transfer on his own bank account, learn what the protocol looks like, and then somehow make the user's browser perform the transfer on the attacker's behalf. This is called cross-site request forgery, abbreviated CSRF.
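The root of the problem is a handler whose only check is the session cookie. This is a minimal sketch of such a vulnerable handler, with a hypothetical session table and function name; it shows that a request triggered from the attacker's frame is indistinguishable from a genuine one.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical server-side session table: session cookie value -> user name.
SESSIONS = {"sess-alice": "alice"}


def handle_xfer(url: str, cookies: dict[str, str]) -> str:
    """A vulnerable /xfer handler: the session cookie is its only check."""
    user = SESSIONS.get(cookies.get("session", ""))
    if user is None:
        return "401 not logged in"
    params = parse_qs(urlsplit(url).query)
    return f"moved {params['amount'][0]} from {user} to {params['to'][0]}"


# The attacker's child frame loads this URL; the browser attaches Alice's cookie.
forged = "http://bank.com/xfer?amount=500&to=attacker"
print(handle_xfer(forged, {"session": "sess-alice"}))  # moved 500 from alice to attacker
```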

You can eliminate attacks of this kind by including some randomness in the generated URL, randomness that the attacker cannot guess in advance.

Imagine that there is a form inside the bank's web page. The form is what actually generates the request in our example:

<form action="/transfer.cgi" …>

Inside this form we will have an input element; inputs are normally used for text entry, keystrokes, mouse clicks, and the like. Here, though, we make the input hidden so that it doesn't appear on the user's page: <input type="hidden" name="csrf" value="a72f...">. This form is generated by the server.

So when the user visits this page, the server generates the random value value="a72f..." and embeds it in the HTML the user receives. Then, when the user submits the form, the form's URL

bank.com/xfer?amount=500&to=attacker

is extended with the random token:

http://bank.com/xfer?amount=500&to=attacker&csrf=a72f...

This means the attacker would now have to guess the specific token the server generates for the user each time the user visits the bank's page. If the token is random enough, the attacker cannot forge anything: if he supplies the wrong token, the server rejects the request.
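A minimal sketch of the token scheme, assuming a simple in-memory store keyed by session id; the store and function names are hypothetical, while secrets.token_hex and hmac.compare_digest are real Python standard library calls. The server issues a fresh random token with the form and rejects any transfer request that does not carry it.

```python
import hmac
import secrets
from urllib.parse import parse_qs, urlsplit

# Hypothetical server-side store: session id -> CSRF token issued to that session.
csrf_tokens: dict[str, str] = {}


def render_transfer_form(session_id: str) -> str:
    """Issue a fresh unguessable token and embed it as a hidden form field."""
    token = secrets.token_hex(16)  # 128 random bits
    csrf_tokens[session_id] = token
    return ('<form action="/transfer.cgi">'
            f'<input type="hidden" name="csrf" value="{token}">'
            "</form>")


def handle_xfer(session_id: str, url: str) -> str:
    params = parse_qs(urlsplit(url).query)
    expected = csrf_tokens.get(session_id)
    supplied = params.get("csrf", [""])[0]
    # Constant-time comparison, so timing doesn't reveal how much of the token matched.
    if expected is None or not hmac.compare_digest(expected.encode(), supplied.encode()):
        return "403 bad CSRF token"
    return f"moved {params['amount'][0]} to {params['to'][0]}"


form = render_transfer_form("sess-alice")
token = csrf_tokens["sess-alice"]
print(handle_xfer("sess-alice", "http://bank.com/xfer?amount=500&to=attacker"))  # 403 bad CSRF token
print(handle_xfer("sess-alice", f"http://bank.com/xfer?amount=500&to=bob&csrf={token}"))
```

The attacker's forged URL lacks the token, so it is rejected even though the browser still attaches the user's cookies.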

58:00 min.

Continued:

Course MIT "Security of computer systems." Lecture 8: "Model of network security", part 3

