
Investigation of the mechanism for blocking sites by Rostelecom and ways to bypass it
In this post I present a short study of how Rostelecom blocks sites, and show how to bypass the blocking without tunnels to third-party hosts (proxies, VPN, etc.). The findings likely apply to some other providers as well.
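Result of blocking
For some time now, RT has been blocking HTTP sites by URL rather than by IP.
When a site is blocked, the browser gets a redirect of the form 95.167.13.50/?st=0&dt=<IP>&rs=<URL>, where <IP> is the address the browser connected to and <URL> is the URL it requested. If you look at the transmitted traffic, it becomes clear that only the beginning of the server response is overwritten; the rest is left as it is.
It looks something like this:

    HTTP/1.1 302 Found
    Connection: close
    Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/

    f-8
    Transfer-Encoding: chunked
    Connection: keep-alive

    6d7
    Грани.Ру: Главное ...

The real response of the site:

    HTTP/1.1 200 OK
    Server: nginx/1.2.1
    Date: Sun, 01 Feb 2015 17:34:03 GMT
    Content-Type: text/html; charset=utf-8
    Transfer-Encoding: chunked
    Connection: keep-alive

    6d7
    Грани.Ру: Главное ...

The stray "f-8" in the first dump is the tail of "charset=utf-8" from the real Content-Type header. I.e., if you find a way to restore the headers, you can bypass the block. Obviously, this is not the easiest way.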
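A client can at least recognise the stub redirect programmatically. The following is a minimal Python sketch, not part of the original investigation: the function name `parse_block_redirect` is made up for illustration, and it assumes the redirect always points at 95.167.13.50 and carries the `dt` and `rs` query parameters seen in the example above.

```python
from urllib.parse import urlsplit, parse_qs

# Stub address taken from the redirect shown above; other providers
# (or RT itself at another time) may use a different one.
BLOCK_HOST = "95.167.13.50"


def parse_block_redirect(location):
    """Return (ip, url) if `location` looks like the block redirect, else None."""
    parts = urlsplit(location)
    if parts.hostname != BLOCK_HOST:
        return None
    query = parse_qs(parts.query, keep_blank_values=True)
    if "dt" not in query or "rs" not in query:
        return None
    # dt = IP the browser connected to, rs = URL it requested
    return query["dt"][0], query["rs"][0]
```

Calling it on the Location value from the dump returns the IP and URL of the blocked request; any other URL yields `None`.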
What is blocked and how
RT's site blocking is evidently managed by hand.
Not all sites from the registry are blocked. At the very least, there are several HTTPS sites that are not blocked at all.
HTTPS sites are typically blocked by IP. Sometimes the provider intercepts the HTTPS connection and substitutes its own certificate; in that case the blocking is by URL.
Sometimes an HTTPS site from the registry is blocked only over HTTP (by URL, accordingly, not by IP) and remains easily accessible over HTTPS.
Digging deeper
A series of experiments revealed the following blocking rules:
- In the first line of the request, the filter looks for the HTTP method name, a space, the URL, and then a space, "?" or "/".
  - It reacts to the GET, POST, HEAD, DELETE, OPTIONS and TRACE methods. The PUT method has apparently been forgotten and is let through. Other method names are let through as well, and so are method names in a different case.
  - Only the first line is checked: if you insert an empty line at the beginning of the request, the request passes.
  - If the URL is "/", only the method name is searched for.
  - With an extra space after the method name, the request also passes without problems, provided the URL is not "/".
  - Apparently, the URL is considered to end at a space, "?" or "/". If you append any other character to the URL, the request passes. This includes the line feed character, i.e. removing "HTTP/1.1" from the request.
- URL encoding (urlencode) does not help to get past the censorship, whichever letter case is used for the escapes. Even if you encode the initial slash (%2F), the request is blocked, although the web server no longer understands it.
- Next, the filter looks for the Host header.
  - It is searched for in the same packet.
  - It must match the form "Host:" exactly. Any extra character or a change in the case of the header name (host, HOST) lets the request pass.
  - Changing the case of the characters of the domain itself does not help, however: the block is still triggered.
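The rules above can be condensed into a small model. This is only a Python sketch of the observed behaviour, not the filter's real implementation: the function name `would_block`, the idea of passing in a single blacklisted `host` and `path`, and the exact "Host: " spelling with one space are assumptions made for illustration. URL decoding is left out.

```python
FILTERED_METHODS = {b"GET", b"POST", b"HEAD", b"DELETE", b"OPTIONS", b"TRACE"}
URL_TERMINATORS = (b" ", b"?", b"/")


def would_block(packet: bytes, host: bytes, path: bytes) -> bool:
    """Guess whether the filter reacts to one TCP payload.

    `host` and `path` describe a hypothetical blacklisted URL.
    """
    lines = packet.split(b"\r\n")
    method, sep, rest = lines[0].partition(b" ")
    # Only the first line is inspected, and the method must be one of
    # the known names in upper case (PUT is not among them).
    if not sep or method not in FILTERED_METHODS:
        return False
    if path != b"/":
        # The URL must follow the single space directly and must be
        # terminated by a space, "?" or "/".
        if not rest.startswith(path):
            return False
        if rest[len(path):len(path) + 1] not in URL_TERMINATORS:
            return False
    # The Host header must sit in the same packet, spelled exactly
    # "Host:"; the domain itself is compared case-insensitively.
    wanted = host.lower()
    return any(
        line.startswith(b"Host: ") and line[6:].lower() == wanted
        for line in lines[1:]
    )
```

Under this model an ordinary request is caught, while the same request with a leading empty line, a doubled space after the method, or a `HOST:` header is not.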
Workarounds
This leads to the following workarounds:
- Add a blank line at the beginning of the request. Not all web servers understand this; nginx, in particular, does not.
- Add a space before the URL. Popular web servers understand this. However, there may be problems in rare cases.
- Add a character after the URL. Obviously, it must be a character that the web server ignores but that the censoring device treats as part of the URL. I could not find such a character.
- Remove the protocol name and version ("HTTP/1.1"). In this case the web server treats the request as HTTP/1.0, and that version of the protocol had no Host header, so this will not work with many sites.
- Send the URL and Host in different packets.
  - You can simply call send on the first line of the request (HTTP method and URL), and then send the rest of the request in the usual way.
  - You can add a sufficiently large header (about 1530 bytes, to fill the whole packet for sure) between these lines.
  - There were no problems with web servers in such cases.
- Modify the Host header.
  - You can change the case of the header name, or add spaces before and after the domain.
  - There were no problems with web servers in such cases.
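The "different packets" workaround can be tried directly from a script. Below is a minimal Python sketch under stated assumptions: `split_request` and `fetch_split` are illustrative names, and setting `TCP_NODELAY` only makes it likely, not certain, that the two `sendall` calls leave as separate TCP segments.

```python
import socket


def split_request(method, path, host):
    """Build the request as two pieces: the request line, then everything else."""
    first = f"{method} {path} HTTP/1.1\r\n".encode("ascii")
    rest = f"Host: {host}\r\nConnection: close\r\n\r\n".encode("ascii")
    return first, rest


def fetch_split(host, path="/", port=80, timeout=10.0):
    """Send the request line and the Host header with separate send calls."""
    first, rest = split_request("GET", path, host)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Disable Nagle's algorithm so the first small write is not
        # held back and merged with the second one.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.sendall(first)   # method and URL
        sock.sendall(rest)    # Host and the remaining headers
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

If the segments do get merged on a particular system, the padding-header variant from the list is the more reliable choice, since it forces the Host header past the end of the first packet.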
Practical implementation
I chose an implementation based on 3proxy. It includes a plugin that can modify all transmitted data using regular expressions. The proxy is also quite lightweight and undemanding, so it can be installed on an ordinary router.
Given the above, the most convenient options in practice are adding an extra header before Host and modifying the Host header itself. Modifying Host is preferable because it does not increase the size of the request. I regularly use this method to decide for myself what information I can consume.
In general, though, both options are easy to configure:
Adding an Extra Header
pcre_rewrite cliheader dunno "Host:" "X-Something:0000...0000\r\nHost:"
(The run of zeros is abbreviated here. It has to be long enough to fill a whole packet, about 1530 bytes as estimated above.)
Header modification
pcre_rewrite cliheader dunno "Host:" "HOST:"
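To make clear what the two rules do to a request, here is a rough Python equivalent. It is a sketch only: the function names are made up, it rewrites just the first occurrence of "Host:", and it does not reproduce how the 3proxy plugin itself applies its patterns.

```python
def uppercase_host(request: bytes) -> bytes:
    """Header modification: spell the header name as HOST:."""
    return request.replace(b"Host:", b"HOST:", 1)


def pad_before_host(request: bytes, pad_len: int = 1530) -> bytes:
    """Extra header: push Host: past the end of the first packet.

    pad_len follows the 1530-byte estimate from the workaround list.
    """
    padding = b"X-Something:" + b"0" * pad_len + b"\r\n"
    return request.replace(b"Host:", padding + b"Host:", 1)
```

The first variant leaves the request length unchanged; the second grows every request by more than a kilobyte, which is the reason the header modification is the preferred option.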
Base config
    # DNS servers
    nserver 77.88.8.8
    nserver 8.8.8.8
    # DNS cache
    nscache 65536
    # run in the background
    daemon
    # load the plugin; it is best to specify the full path
    plugin PCREPlugin.ld.so pcre_plugin
    # one of the rules described above
    pcre_rewrite ...
    # start the proxy; the -a option gets rid of the Forwarded-For and Via headers
    proxy -a -p8080
UPD:
@ValdikSS made a very interesting comment:
It is enough to look at the traffic that arrives at the interface from Rostelecom. The DPI is probably connected in parallel rather than in series, and only client traffic reaches it. Because the DPI is clearly closer than the website, the packet with the Location header from the DPI arrives before the real first packet from the website, and the website's packet is then discarded by the OS kernel as a retransmission. So if you use Linux, a single iptables line is enough to bypass the block:

    iptables -A INPUT -p tcp --sport 80 -m string --algo bm --string "http://95.167.13.50/?st" -j DROP

From me:
Indeed, there is a retransmission. I did look at the traffic, but obviously not carefully enough.
First comes a packet that contains only the HTTP 302 status and the Location header, then comes a packet with the normal response from the site.
However, the system does not discard the second packet; it merges it with the first in a peculiar way.
I.e., the following packets arrive.
Packet 1:

    HTTP/1.1 302 Found
    Connection: close
    Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/

Packet 2:

    HTTP/1.1 200 OK
    Server: nginx/1.2.1
    Date: Sun, 01 Feb 2015 17:34:03 GMT
    Content-Type: text/html; charset=utf-8
    Transfer-Encoding: chunked
    Connection: keep-alive

    6d7
    ...

And the application sees this:

    HTTP/1.1 302 Found
    Connection: close
    Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/

    f-8
    Transfer-Encoding: chunked
    Connection: keep-alive

    6d7
    ...
This is observed on both Windows and Linux.
But the above iptables rule really solves the issue.
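The merge and the effect of the DROP rule can be reproduced with a toy model. This Python sketch is an illustration under assumptions, not a description of any real TCP stack: it treats both payloads as starting at the same sequence number, lets already received bytes win, and assumes the spoofed packet ends with an empty line after Location. The names `reassemble` and `drop_spoofed` are made up.

```python
# Payloads taken from the dumps above (body shortened to "...").
SPOOFED = (b"HTTP/1.1 302 Found\r\n"
           b"Connection: close\r\n"
           b"Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/\r\n"
           b"\r\n")
REAL = (b"HTTP/1.1 200 OK\r\n"
        b"Server: nginx/1.2.1\r\n"
        b"Date: Sun, 01 Feb 2015 17:34:03 GMT\r\n"
        b"Content-Type: text/html; charset=utf-8\r\n"
        b"Transfer-Encoding: chunked\r\n"
        b"Connection: keep-alive\r\n"
        b"\r\n"
        b"6d7\r\n...")
MARKER = b"http://95.167.13.50/?st"  # string used in the iptables rule


def reassemble(segments):
    """Merge payloads that all start at the same sequence number.

    Bytes that were already received are kept; a later, longer
    segment only contributes the part beyond them.
    """
    stream = b""
    for payload in segments:
        if len(payload) > len(stream):
            stream += payload[len(stream):]
    return stream


def drop_spoofed(segments):
    """Model of the string match: discard payloads containing MARKER."""
    return [p for p in segments if MARKER not in p]
```

With these payloads, `reassemble([SPOOFED, REAL])` begins with the 302 headers and continues with `f-8`, matching what the application sees in the dump, while `reassemble(drop_spoofed([SPOOFED, REAL]))` yields the untouched site response.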
So that is one more workaround.
This method can also be used on a gateway or router. In that case the rule must, of course, be added to the FORWARD chain.