Web Security: an introduction to HTTP
HTTP is a beautiful thing: a protocol that has existed for more than 20 years without much change.
This is the second part of the web security series: the first part was “How browsers work.”
As we saw in the previous article, browsers interact with web applications via HTTP, and this is the main reason we delve into this topic. If users enter their credit card information on a website and an attacker can intercept the data before it reaches the server, we will certainly have problems.
Understanding how HTTP works, how we can protect the communication between clients and servers, and what security-related functions the protocol offers is the first step towards improving our security.
When discussing HTTP, however, we must always distinguish between semantics and technical implementation, since these are two completely different aspects of HTTP operation.
The key difference between them can be explained by a very simple analogy: 20 years ago, people cared about their relatives as much as they do now, even though the way they interact has changed significantly. Back then, our parents would probably take the car and drive over to their sister's to catch up and spend some time with family.
These days, they are more likely to send messages on WhatsApp, make phone calls, or use a Facebook group, which was not possible before. This does not mean that people communicate or care more or less, but simply that the way they interact has changed.
HTTP is no different: the semantics of the protocol have not changed much, while the technical implementation of how clients and servers interact has been optimized over the years. If you look at an HTTP request from 1996, it will be very similar to the ones we saw in the previous article, even though the way these packets travel through the network is very different.
Overview
As we have seen, HTTP follows a request/response model, in which a client connected to a server sends a request and the server responds to it.
An HTTP message (request or response) consists of several parts:
- the first line
- headers
- body
In a request, the first line indicates the method used by the client, the path to the resource it wants, and the version of the protocol it is going to use:
GET /players/lebron-james HTTP/1.1
In this case, the client is trying to get (GET) the resource at /players/lebron-james using version 1.1 of the protocol: nothing difficult to understand.
After the first line, HTTP allows us to add metadata to the message through headers, which take the form of key-value pairs separated by a colon:
GET /players/lebron-james HTTP/1.1
Host: nba.com
Accept: */*
Coolness: 9000
In this request, for example, the client added three headers: Host, Accept, and Coolness. Wait, Coolness?
Headers do not have to use specific, reserved names, but it is usually recommended to rely on those standardized in the HTTP specification: the more you deviate from the standards, the less the other party in the exchange will understand you.
Cache-Control, for example, is the header used to determine whether (and how) a response is cacheable: most proxies and reverse proxies understand it because they follow the HTTP specification to the letter. If you were to rename the Cache-Control header to Awesome-Cache-Control, proxies would have no idea how to cache the response, since they were not built to follow the specification you just came up with.
However, it sometimes makes sense to include a “custom” header in the message, since you may want to add metadata that is not part of the HTTP specification: the server may decide to include technical information in its response so that the client can, at the same time, perform requests and receive important information about the state of the server returning the response:
...
X-Cpu-Usage: 40%
X-Memory-Available: 1%
...
When using custom headers, it is always preferable to prefix them with a key so that they do not conflict with other headers that may become standard in the future: historically this worked well, until everyone started using the “non-standard” X prefix, which in turn became the norm. The X-Forwarded-For and X-Forwarded-Proto headers are examples of custom headers that are widely used and understood by load balancers and proxies, even though they are not part of the HTTP standard.
If you need to add your own custom header, these days it is usually better to use a company prefix, such as Acme-Custom-Header or A-Custom-Header.
After the headers, the request may contain a body, which is separated from the headers by an empty line:
POST /players/lebron-james/comments HTTP/1.1
Host: nba.com
Accept: */*
Coolness: 9000

Best Player Ever
Our request is complete: the first line (location and protocol information), headers, and body. Note that the body is completely optional and, in most cases, is used only when we want to send data to the server, which is why the example above uses the POST method.
A response is not very different:
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: private, max-age=3600

{"name": "Lebron James", "birthplace": "Akron, Ohio", ...}
The first piece of information sent in the response is the version of the protocol it uses, along with the status of the response. The headers follow and then, if required, a line break followed by the body.
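To make the three-part structure concrete, here is a minimal Python sketch that splits a raw message into its first line, headers, and body. The function name parse_http_message is made up for this illustration, and the sketch ignores many details a real parser has to handle (repeated headers, chunked bodies, binary content), so treat it as a rough model rather than a real implementation.

```python
def parse_http_message(raw: str) -> dict:
    """Split a raw HTTP/1.x message into its first line, headers, and body."""
    # Real messages use CRLF line endings; normalise so the sketch also accepts "\n".
    text = raw.replace("\r\n", "\n")
    # The first empty line separates the head (first line + headers) from the body.
    head, _, body = text.partition("\n\n")
    first_line, *header_lines = head.split("\n")
    headers = {}
    for line in header_lines:
        # Each header is "Name: value"; split on the first colon only.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"first_line": first_line, "headers": headers, "body": body}


# The POST request from the example above, with explicit CRLF line endings.
request = (
    "POST /players/lebron-james/comments HTTP/1.1\r\n"
    "Host: nba.com\r\n"
    "Accept: */*\r\n"
    "Coolness: 9000\r\n"
    "\r\n"
    "Best Player Ever"
)
message = parse_http_message(request)
method, path, version = message["first_line"].split(" ")
print(method, path, version)
print(message["headers"])
print(message["body"])
```

The same function works on a response: only the meaning of the first line changes (protocol version and status instead of method and path).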
As already mentioned, the protocol has undergone numerous revisions and new features have been added over time (new headers, status codes, etc.), but the basic structure has not changed much (first line, headers, and body). What has really changed is how clients and servers exchange these messages: let's look at this in more detail.
HTTP vs HTTPS vs H2
There are two significant semantic versions of HTTP: HTTP/1.0 and HTTP/1.1.
“Where are HTTPS and HTTP2?”, you ask.
HTTPS and HTTP2 (H2 for short) are more technical changes, as they introduced new ways of delivering messages over the Internet without significantly affecting the semantics of the protocol.
HTTPS is a “secure” extension of HTTP and involves establishing a shared secret key between the client and the server, ensuring that we are communicating with the right party, and encrypting the messages exchanged using that shared secret key (more on this later). While HTTPS was aimed at improving the security of the HTTP protocol, H2 was aimed at making it fast.
H2 uses binary rather than text messages, supports multiplexing, uses the HPACK algorithm for header compression, and so on. In short, H2 improves on the performance of HTTP/1.1.
Website owners were reluctant to switch to HTTPS, as it involved additional round trips between the client and the server (as already mentioned, a shared secret key needs to be established between the two parties), thereby slowing down the user experience. With encrypted H2, there is no longer any excuse, since features such as multiplexing and server push make it perform better than plain HTTP/1.1.
HTTPS
HTTPS (HTTP Secure) allows clients and servers to communicate securely through TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer).
The problem TLS addresses is quite simple and can be illustrated with a simple metaphor: your other half calls you in the middle of the day, while you are in a meeting, and asks you to tell them the password of your online banking account, since they need to make a bank transfer so that your son's school fees are paid on time. It is very important that you tell them right now, otherwise your child may be turned away from school the next morning.
Now you have two problems:
- authentication: making sure you are really talking to your other half, because it could be someone pretending to be them
- encryption: passing on the password so that your colleagues cannot understand it and write it down
What are you going to do? This is exactly the problem HTTPS is trying to solve.
To verify who you are talking to, HTTPS uses public key certificates, which are nothing more than certificates that state the identity of a particular server: when you connect via HTTPS to an IP address, the server behind that address presents its certificate so that you can verify its identity. Going back to our analogy, you could simply ask your other half to tell you your social security number. Once you have checked that the number is correct, you gain an additional level of trust.
This, however, does not prevent attackers from finding out the victim's social security number, stealing your other half's smartphone, and calling you. How do we verify the identity of the caller?
Instead of directly asking your other half to spell out your social security number, you call your mom (who lives next door) and ask her to go to your apartment and make sure that it really is your other half who is reciting the social security number. This adds an extra level of trust, since you do not consider your mom a threat and rely on her to verify the identity of the caller.
In HTTPS terms, your mom is called a CA, short for Certificate Authority: the job of a CA is to verify the identity of a particular server and issue a certificate bearing its own digital signature. This means that when you connect to a specific domain, you are presented with a certificate generated not by the owner of the domain (a so-called self-signed certificate) but by the CA.
It is the CA's job to verify a domain's identity and issue a certificate accordingly: when you “order” a certificate (usually called an SSL certificate, although TLS is what is used these days; names really do stick!), the CA may call you or ask you to change a DNS setting to make sure you are in control of the domain. Once the verification process is complete, it issues the certificate, which can then be installed on your web servers.
Clients such as browsers will then connect to your servers and be presented with this certificate so that they can verify its authenticity: browsers have a kind of “relationship” with CAs, in the sense that they keep a list of trusted CAs to check that the certificate is really trustworthy. If the certificate is not signed by a trusted authority, the browser displays a large warning to users.
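Browsers are not the only clients that perform this check. As a rough illustration, the sketch below uses Python's standard ssl module, whose default context loads the system's trusted CA certificates and refuses certificates that are untrusted or issued for a different host name. The helper fetch_certificate is a name invented for this example, and it needs network access, so it is defined but not called here.

```python
import socket
import ssl

# The default context loads the operating system's trusted CA certificates.
context = ssl.create_default_context()

# A certificate chain that does not lead to a trusted CA is rejected...
print(context.verify_mode == ssl.CERT_REQUIRED)
# ...and so is a certificate issued for a different host name.
print(context.check_hostname)


def fetch_certificate(hostname: str, port: int = 443) -> dict:
    """Connect to hostname over TLS and return the certificate it presents.

    Raises ssl.SSLCertVerificationError if the certificate is not signed by
    a trusted CA or does not match hostname.
    """
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

Calling fetch_certificate against a server with a self-signed certificate would raise an error, which is the programmatic equivalent of the browser warning.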
We are halfway there in securing the communication between you and your other half: now that we have solved authentication (verifying the caller's identity), we need to make sure that we can communicate safely, without anyone else eavesdropping. As I mentioned, you are right in the middle of a meeting and need to spell out your online banking password. You need to find a way to encrypt your communication so that only you and your other half can understand the conversation.
You can do this by establishing a shared secret key between the two of you and encrypting messages with that key: for example, you could use a variation of the Caesar cipher based on your wedding date.
This works well if both parties have an established relationship, like you and your other half, as they can create a key based on a shared memory that nobody else knows about. Browsers and servers, however, cannot use the same mechanism, since they do not know each other in advance.
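The wedding-date idea can be sketched in a few lines of Python. The shift value WEDDING_DAY is a hypothetical shared secret chosen for the example, and a Caesar cipher is of course trivial to break; the point is only to show that both sides need the same key.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Shift every lowercase letter by `shift` positions, wrapping around."""
    result = []
    for char in text:
        if char in ALPHABET:
            index = (ALPHABET.index(char) + shift) % len(ALPHABET)
            result.append(ALPHABET[index])
        else:
            # Leave digits, spaces, and punctuation untouched.
            result.append(char)
    return "".join(result)


# Hypothetical shared secret: the day of the month of the wedding.
WEDDING_DAY = 14

secret = caesar("my password", WEDDING_DAY)
print(secret)
# Shifting back by the same amount recovers the original message.
print(caesar(secret, -WEDDING_DAY))
```

Anyone who does not know the shift sees only scrambled letters, while the other half simply shifts back by the same amount.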
Browsers and servers rely instead on variations of the Diffie-Hellman key exchange protocol, which ensure that parties with no prior knowledge of each other establish a shared secret key that no one else can “steal”. This involves the use of mathematics.
Once the secret key is established, the client and server can communicate without fear that someone might intercept their messages. Even if attackers do so, they will not have the shared secret key needed to decrypt them.
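For the curious, the mathematics behind the key exchange can be illustrated with a toy Python sketch. The parameters P and G below are textbook-sized values picked for readability, and the function names are invented for this example; real TLS uses far larger numbers (or elliptic curves) and adds authentication on top, so this is only a model of the core idea.

```python
import secrets

# Public parameters, known to everyone including eavesdroppers.
# These are textbook toy values; real deployments use far larger numbers.
P = 23  # prime modulus
G = 5   # generator


def make_keypair() -> tuple:
    """Return a (private, public) pair; only the public part is ever sent."""
    private = 2 + secrets.randbelow(P - 3)  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our private value with the other party's public value."""
    return pow(peer_public, own_private, P)


client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

# Only the public values cross the network.
client_key = shared_secret(client_private, server_public)
server_key = shared_secret(server_private, client_public)
print(client_key == server_key)
```

Both sides end up with the same number without ever sending it, while an eavesdropper who sees only the public values would have to solve a hard mathematical problem (at realistic sizes) to recover it.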
For more information on HTTPS and Diffie-Hellman, I would recommend reading “How HTTPS Secures Connections” by Hartley Brody and “How does HTTPS actually work?” by Robert Heaton. In addition, Nine Algorithms That Changed the Future has an amazing chapter that explains public key encryption, and I warmly recommend it to computer science fans interested in ingenious algorithms.
HTTPS everywhere
Still deciding whether you should support HTTPS on your site? I have bad news for you: browsers have begun to protect users from websites that do not support HTTPS in order to “force” web developers to provide a fully encrypted browsing experience.
Under the motto “HTTPS everywhere”, browsers began to take a stand against unencrypted connections: Google was the first browser vendor to give web developers a deadline, announcing that starting with Chrome 68 (July 2018) it would mark HTTP websites as “not secure”.
Even more worrying for websites that do not use HTTPS is the fact that as soon as a user types anything into a web page, the “Not secure” label turns red, a step that should prompt users to think twice before sharing data with websites that do not support HTTPS.
Compare this with how a website that runs on HTTPS and has a valid certificate looks.
In theory, a website does not have to be secure, but in practice not being secure scares users away, and rightly so. Back when H2 was not a reality, it could make sense to stick with plain, unencrypted HTTP traffic. These days there is little reason to. Join the “HTTPS everywhere” movement and help make the Internet a safer place to surf.
GET vs POST
As we saw earlier, an HTTP request begins with a kind of “first line”:
GET /players/lebron-james HTTP/1.1
First of all, the client tells the server which method it is using to perform the request: the basic HTTP methods include GET, POST, PUT, and DELETE, but the list goes on with less common (but still standard) methods such as TRACE, OPTIONS, or HEAD.
In theory, no method is safer than the others; in practice, things are not so simple.
GET requests usually do not carry a body, so parameters are included in the URL (for example, www.example.com/articles?article_id=1), while POST requests are usually used to send (“post”) data that is included in the body. Another difference lies in the side effects these methods carry: GET is an idempotent method, meaning that no matter how many requests you send, you will not change the state of the web server. POST, instead, is not idempotent: each request you send may change the state of the server (think, for example, of submitting a new payment; now you probably understand why websites ask you not to refresh the page while a transaction is being processed).
To illustrate an important difference between these methods, we need to look at web server logs, which you may already be familiar with:
192.168.99.1 - [192.168.99.1] - - [29/Jul/2018:00:39:47 +0000] "GET /?token=1234 HTTP/1.1" 200 525 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36" 404 0.002 [example-local] 172.17.0.8:9090 525 0.002 200
192.168.99.1 - [192.168.99.1] - - [29/Jul/2018:00:40:47 +0000] "GET / HTTP/1.1" 200 525 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36" 393 0.004 [example-local] 172.17.0.8:9090 525 0.004 200
192.168.99.1 - [192.168.99.1] - - [29/Jul/2018:00:41:34 +0000] "PUT /users HTTP/1.1" 201 23 "http://example.local/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36" 4878 0.016 [example-local] 172.17.0.8:9090 23 0.016 201
As you can see, web servers log the request path: this means that if you include sensitive data in your URL, it will be leaked by the web server and stored somewhere in your logs. Your confidential data will be sitting somewhere in plain text, which is something we need to avoid at all costs. Imagine that an attacker gained access to one of your old log files, which may contain credit card information, access tokens for your private services, and so on: it would be a complete disaster.
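To see how little effort it takes to pull such data out of a log file, here is a short Python sketch that extracts the query-string parameters from the first log entry shown above. The function name and the regular expression are assumptions made for this example (log formats vary between servers), and the user agent is shortened for readability.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The quoted request line ("METHOD target HTTP/version") of an access-log entry.
REQUEST_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def logged_query_params(log_line: str) -> dict:
    """Return the query-string parameters visible in one access-log line."""
    match = REQUEST_LINE.search(log_line)
    if match is None:
        return {}
    query = urlsplit(match.group("target")).query
    return parse_qs(query)


log_line = (
    '192.168.99.1 - [192.168.99.1] - - [29/Jul/2018:00:39:47 +0000] '
    '"GET /?token=1234 HTTP/1.1" 200 525 "-" "Mozilla/5.0"'
)
print(logged_query_params(log_line))
```

Anyone with read access to the log can recover the token this way, whereas the PUT entry in the same log reveals nothing about the data that was sent in its body.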
Web servers typically do not log HTTP headers or bodies, since the data to store would be too large; that is why sending information through the request body, rather than the URL, is usually safer. From here we can infer that POST (and similar non-idempotent methods) is safer than GET, even though this has more to do with how data is sent when using a particular method than with one method being inherently safer than another: if you included confidential information in the body of a GET request, you would face no more problems than when using POST, even though such an approach would be considered unusual.
We believe in HTTP headers
In this article, we looked at HTTP, its evolution, and how its secure extension combines authentication and encryption to let clients and servers exchange data over a secure channel. But this is not all that HTTP has to offer from a security perspective.
The translation was made with the support of the company EDISON Software, which professionally deals with security and also develops electronic medical verification systems.