peter23 February 2, 2015 at 12:47

Investigation of the mechanism for blocking sites by Rostelecom and ways to bypass it

In this post I present a small study of how Rostelecom blocks sites, and show how to bypass the blocking without tunnels to third-party hosts (proxies, VPN, etc.). This likely applies to some other providers as well.

Result of blocking

For some time now, Rostelecom (RT) has been blocking HTTP sites by URL rather than by IP.
When a request is blocked, the browser receives a redirect of the form 95.167.13.50/?st=0&dt=<IP>&rs=<URL>, where <IP> is the IP address the browser connected to and <URL> is the URL it requested. If you look at the transmitted traffic, it becomes clear that only the beginning of the server response is overwritten; the rest remains as it is.

It looks something like this:

HTTP/1.1 302 Found
Connection: close
Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/
f-8
Transfer-Encoding: chunked
Connection: keep-alive
6d7


  
    
      Грани.Ру:
      Главное
    
...
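The redirect is easy to recognise programmatically. A minimal sketch, assuming the parameter layout seen in the dump above (dt carries the server IP, rs the requested URL); the names parse_block_redirect and BLOCK_PAGE_HOST are illustrative and not part of any existing tool:

```python
from urllib.parse import urlsplit, parse_qs

# Address of the block page seen in the redirect above (assumed to be stable).
BLOCK_PAGE_HOST = "95.167.13.50"


def parse_block_redirect(location):
    """Return (server_ip, requested_url) if `location` points at the block page, else None."""
    parts = urlsplit(location)
    if parts.hostname != BLOCK_PAGE_HOST:
        return None
    params = parse_qs(parts.query)
    # dt = IP the browser connected to, rs = URL it requested.
    if "dt" not in params or "rs" not in params:
        return None
    return params["dt"][0], params["rs"][0]


print(parse_block_redirect(
    "http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/"))
# prints ('192.237.142.117', 'grani.ru/')
```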

The site's real response:

HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Sun, 01 Feb 2015 17:34:03 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
6d7


  
    
      Грани.Ру:
      Главное
    
...

That is, if you find a way to restore the headers, you can bypass the block. Obviously, this is not the most practical way.

What is blocked and how

RT's site blocking is evidently configured by hand.

Not all sites from the registry are blocked. At the very least, there are several HTTPS sites that are not blocked at all.
Typically, HTTPS sites are blocked by IP. Sometimes the provider intercepts HTTPS, substituting its own certificate, and in that case blocks by URL.

Sometimes an HTTPS site from the registry is blocked only over HTTP (by URL, accordingly, and not by IP) and remains easily accessible over HTTPS.

Digging deeper

A series of experiments revealed the following blocking rules:

In the first line of the request, the filter looks for the HTTP method name, a space, the URL, and then a space, "?" or "/".
It reacts to the methods GET, POST, HEAD, DELETE, OPTIONS and TRACE. PUT was apparently forgotten and passes through, as do other method names and method names in a different letter case.
Only the first line is checked: if you insert an empty line at the beginning of the request, the request passes.
If the URL is "/", only the method name is looked for.
With an extra space after the method name, the request also passes without problems, provided the URL is not "/".
Apparently the URL is considered to end at a space, "?" or "/". If you append any other character to the URL, the request passes. This includes a line feed, i.e. removing "HTTP/1.1" from the request.
URL encoding (urlencode) does not help to get past the censorship, whether the hex digits are upper or lower case. Even if you encode the initial slash (%2F), the request is blocked, although the web server no longer understands it.
Next, the filter looks for the Host header.
It looks for it only in the same packet.
It must match the form "Host:" exactly. Any extra character or a change of case in the header name (host, HOST) lets the request through.
Changing the case of the characters in the domain itself, however, does not help: the block is triggered.
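The observations above can be condensed into a toy model of the filter. This is only a sketch of the behaviour described in this post, not the provider's actual logic: the function name would_block, the host/path arguments and the exact "Host: " match with a single space are assumptions made for illustration.

```python
from urllib.parse import unquote_to_bytes

# Methods the filter was observed to react to (PUT is not among them).
WATCHED_METHODS = {b"GET", b"POST", b"HEAD", b"DELETE", b"OPTIONS", b"TRACE"}


def would_block(packet, host, path="/"):
    """Guess whether one TCP payload would trigger the filter for host + path."""
    lines = packet.split(b"\r\n")
    # Only the very first line is inspected, so a leading blank line defeats it.
    method, _, target = lines[0].partition(b" ")
    if method not in WATCHED_METHODS:  # case-sensitive; PUT and others pass
        return False
    if path != "/":
        # Percent-encoding is undone before the comparison, and the URL
        # must be followed by a space, "?" or "/".
        decoded = unquote_to_bytes(target)
        wanted = path.encode()
        if not any(decoded.startswith(wanted + end) for end in (b" ", b"?", b"/")):
            return False
    # The header name must be exactly "Host:" and sit in the same packet;
    # the domain itself is compared case-insensitively.
    expected = host.lower().encode()
    return any(
        line.startswith(b"Host: ") and line[6:].lower() == expected
        for line in lines[1:]
    )


# example.org and /page are hypothetical stand-ins for a blocked URL.
request = b"GET /page HTTP/1.1\r\nHost: example.org\r\n\r\n"
print(would_block(request, "example.org", "/page"))                              # True
print(would_block(b"\r\n" + request, "example.org", "/page"))                    # False
print(would_block(request.replace(b"Host:", b"HOST:"), "example.org", "/page"))  # False
```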

Workarounds

This leads to the following workarounds:

Add a blank line at the start of the request. Not all web servers accept this; nginx, in particular, does not.
Add a space before the URL. Popular web servers accept this, although problems are possible in rare cases.
Add a character after the URL. Obviously, it must be a character that the web server ignores but the censoring device treats as part of the URL. I could not find such a character.
Remove the protocol name and version ("HTTP/1.1"). In this case the web server treats the request as HTTP/1.0, and that version of the protocol had no Host header, so this will not work with many sites.
Send the URL and Host in different packets.
You can simply call send on the first line of the request (HTTP method and URL) and then send the rest of the request in the usual way.
Alternatively, you can add a sufficiently large header (about 1530 bytes, to be sure of filling a whole packet) between these lines.
There were no problems with web servers in these cases.
Modify the Host header.
You can change the case of the header name, or add spaces before and after the domain.
There were no problems with web servers in these cases.
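The last two workarounds are the simplest to try from code. Below is a minimal sketch; the helper names build_request, split_request and send_split are made up for this example, and example.org is a placeholder host.

```python
import socket


def build_request(host, path="/", header_name=b"HOST"):
    """Build a GET request whose Host header name has non-standard case."""
    return (
        b"GET " + path.encode() + b" HTTP/1.1\r\n"
        + header_name + b": " + host.encode() + b"\r\n"
        + b"Connection: close\r\n\r\n"
    )


def split_request(request):
    """Split a request into its first line (with CRLF) and everything after it."""
    first_line, sep, rest = request.partition(b"\r\n")
    return first_line + sep, rest


def send_split(sock, request):
    """Send the request line and the remaining headers with two separate calls."""
    first, rest = split_request(request)
    sock.sendall(first)  # method and URL
    sock.sendall(rest)   # Host and the other headers


if __name__ == "__main__":
    # Hypothetical usage against a placeholder host.
    with socket.create_connection(("example.org", 80), timeout=10) as conn:
        send_split(conn, build_request("example.org"))
        print(conn.recv(200))
```

Two sendall calls do not strictly guarantee two TCP segments, since the kernel is free to coalesce them, so it is worth confirming the result with a packet capture.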

Practical implementation

I chose an implementation based on 3proxy. It includes a plugin that can modify all transmitted data using regular expressions. The proxy is also quite lightweight and undemanding, so it can be installed on an ordinary router.

Given the above, the most convenient options in practice are adding an extra header before Host and modifying the Host header. Host modification is obviously preferable, because it does not increase the size of the request. I regularly use this method to decide for myself what information I can consume.

In any case, both options are easy to configure:

Adding an Extra Header

pcre_rewrite cliheader dunno "Host:" "X-Something:00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\ r \ nHost: "
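Typing the filler by hand is error-prone, so the rule can be generated instead. A small sketch, assuming the pcre_rewrite syntax shown above; the function name padding_rule and its parameters are illustrative, and 1530 is the approximate size mentioned earlier rather than a tuned value.

```python
def padding_rule(pad_bytes=1530, header="X-Something"):
    """Return a pcre_rewrite line that puts a long dummy header before Host."""
    filler = "0" * pad_bytes
    # \r\n stays a literal escape sequence in the generated config line.
    return 'pcre_rewrite cliheader dunno "Host:" "{}:{}\\r\\nHost:"'.format(header, filler)


print(padding_rule(pad_bytes=16))
# prints: pcre_rewrite cliheader dunno "Host:" "X-Something:0000000000000000\r\nHost:"
```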

Header modification

pcre_rewrite cliheader dunno "Host:" "HOST:"

Base config

# DNS servers
nserver 77.88.8.8
nserver 8.8.8.8
# DNS cache
nscache 65536
# run in the background
daemon
# load the plugin; it is best to specify the full path
plugin PCREPlugin.ld.so pcre_plugin
# one of the rules described above
pcre_rewrite ...
# start the proxy; the -a option suppresses the Forwarded-For and Via headers
proxy -a -p8080

UPD:
@ValdikSS made a very interesting comment:

You only need to look at the traffic that arrives on the interface from Rostelecom. The DPI is probably connected in parallel rather than in series, and only client traffic reaches it. Since the DPI is clearly closer than the website, the packet with the Location header from the DPI arrives sooner than the real first packet from the website, and the website's packet is then discarded by the OS kernel as a retransmission. So if you use Linux, a single iptables line is enough to bypass the block:
iptables -A INPUT -p tcp --sport 80 -m string --algo bm --string "http://95.167.13.50/?st" -j DROP

From me:

Indeed, there is a retransmission. I watched the traffic, but obviously not carefully enough.
First comes a packet that contains only HTTP 302 and Location, then comes a packet with a normal site response.
However, the system does not discard the second packet; it combines it with the first in a peculiar way.
That is, these packets arrive:
1
HTTP/1.1 302 Found
Connection: close
Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/
2
HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Sun, 01 Feb 2015 17:34:03 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
6d7

...
And the application sees this:
HTTP/1.1 302 Found
Connection: close
Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/
f-8
Transfer-Encoding: chunked
Connection: keep-alive
6d7

...
This is observed on both Windows and Linux.

But the above iptables rule really solves the issue.
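Both effects can be reproduced in a small model of first-come-first-kept reassembly. This is a sketch under stated assumptions: segments arrive in order without gaps, the injected packet ends with an empty line (with that, the byte counts line up with the stray "f-8" in the dump), and the names reassemble and drop_marker are invented for the example. It does not model a real TCP stack.

```python
INJECTED = (
    b"HTTP/1.1 302 Found\r\n"
    b"Connection: close\r\n"
    b"Location: http://95.167.13.50/?st=0&dt=192.237.142.117&rs=grani.ru/\r\n"
    b"\r\n"  # assumed blank line that ends the injected headers
)
REAL = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: nginx/1.2.1\r\n"
    b"Date: Sun, 01 Feb 2015 17:34:03 GMT\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
    b"6d7\r\n"
)


def reassemble(segments, drop_marker=None):
    """Merge (offset, payload) segments; bytes that arrived first are kept."""
    stream = bytearray()
    for offset, payload in segments:
        if drop_marker is not None and drop_marker in payload:
            continue  # what the iptables string match does: drop the packet
        if offset > len(stream):
            raise ValueError("gap in the stream; not handled by this sketch")
        if offset + len(payload) > len(stream):
            # Overlapping bytes are ignored, only the new tail is appended.
            stream += payload[len(stream) - offset:]
    return bytes(stream)


arrival_order = [(0, INJECTED), (0, REAL)]  # both start at the same sequence number
print(reassemble(arrival_order).decode())
print(reassemble(arrival_order, drop_marker=b"http://95.167.13.50/?st") == REAL)
```

Without the filter, the output starts with the 302 headers and continues with "f-8" and the rest of the real response; with the marker dropped, the application gets the untouched response.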

So that is one more workaround.

This method can also be used on a gateway or router. In that case the rule must, of course, be added to the FORWARD chain.
