Release of an unofficial MTProto proxy in Python, and some protocol features

Recently, the Telegram developers published the source code of a proxy server that uses the MTProto protocol. Articles have since appeared on Habr about building it and about repackaging the Docker container that contains it. The official proxy server, written in C, is surprising in its size: about 23 thousand lines of code. At the same time, and in some cases slightly earlier, several alternative implementations were released, but none of them supported advertising the owner's channel.

In this article I would like, first, to describe some little-known features of the protocol the proxy server uses to talk to external servers and, second, to present my own project: a proxy server implemented in Python, which has just reached its release and is available to everyone under the free MIT license.

How the proxy server interacts with external servers

The official proxy server does not talk to the Telegram servers directly; it goes through at least one layer of intermediate proxy servers. We will call them middle-proxies. Their list is available at core.telegram.org/getProxyConfig and core.telegram.org/getProxyConfigV6. The official proxy server does not yet support IPv6 connections.
Data between the proxy server and the middle-proxy is encrypted with a key derived from the IP addresses of both nodes. A proxy server therefore has to know its own external IP address in order to connect to a middle-proxy; otherwise the two sides will compute different encryption keys. The port numbers of both nodes and a shared secret, available at core.telegram.org/getProxySecret, also take part in key generation. The Telegram developers recommend updating this secret once a day.
When the proxy server connects to a middle-proxy, it sends its current time. If the clocks differ by more than a few minutes, the middle-proxy closes the connection.
A message sent from the client to the middle-proxy is wrapped in an MTProto RPC call. The proxy adds several arguments to each such call: the IP address and port of both nodes, a random connection identifier, and the proxy server tag used to display the advertised channel in the application. These extra arguments take about 96 bytes. Because of this, advertised channels cannot be shown when the proxy works directly rather than through a middle-proxy.
Telegram servers "trust" the client IP address reported by the proxy server. These addresses can be seen in the session information.
A single TCP connection between the proxy server and a middle-proxy carries messages from different users. Requests and responses include the "random connection identifier" argument, which is needed to route the data to the right client.
A proxy server cannot decrypt client data, but it can tell regular messages from file transfers. It also knows the size of each message.
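
Two of the points above, the address-dependent key and the connection identifier, can be illustrated with a small sketch. It is not the real MTProto key derivation or wire format: the function derive_key, the choice of SHA-256, the field order, and the Demultiplexer class are all assumptions made for illustration. It only shows why a proxy that misjudges its own external address ends up with a different key, and how replies arriving over one shared connection can be routed by connection identifier.

```python
import hashlib
import ipaddress
import os


def derive_key(secret: bytes, proxy_ip: str, proxy_port: int,
               middle_ip: str, middle_port: int) -> bytes:
    """Toy key derivation: hash the shared secret with both endpoints.

    NOT the real MTProto algorithm; it only mirrors the idea that the
    key depends on the addresses and ports of both nodes.
    """
    material = b"".join([
        secret,
        ipaddress.ip_address(proxy_ip).packed,
        proxy_port.to_bytes(2, "big"),
        ipaddress.ip_address(middle_ip).packed,
        middle_port.to_bytes(2, "big"),
    ])
    return hashlib.sha256(material).digest()


class Demultiplexer:
    """Routes replies from one shared upstream connection to clients."""

    def __init__(self) -> None:
        # connection identifier -> payloads delivered to that client
        self._clients = {}

    def register(self) -> bytes:
        """Create a random 8-byte connection identifier for a new client."""
        conn_id = os.urandom(8)
        self._clients[conn_id] = []
        return conn_id

    def deliver(self, conn_id: bytes, payload: bytes) -> bool:
        """Hand a reply to the owner of conn_id; return False if unknown."""
        inbox = self._clients.get(conn_id)
        if inbox is None:
            return False
        inbox.append(payload)
        return True

    def inbox(self, conn_id: bytes) -> list:
        """Return a copy of everything delivered to this client so far."""
        return list(self._clients[conn_id])
```

With this sketch, calling derive_key with a private address such as 10.0.0.5 instead of the real external address yields a different key than the one the other side would compute, which is the failure mode described above.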

Phew, I hope the technical details were not too tiring. It should now be clear why many alternative proxies have no advertising support: they send messages directly to the Telegram servers, bypassing the middle-proxy, which turns out to be much simpler. The second part of the article describes the first unofficial implementation of a proxy server that works through the middle-proxy. At the moment, three such implementations are freely available: the official one, one in Erlang, and this one.

Implementing a proxy server in Python

The proxy server was originally written in order to understand the features of the protocol. It grew out of another project, an asynchronous SOCKS proxy, which in turn had been written to try out async/await in Python.

Gradually the project gained users, who flooded it with questions, bug reports and feature requests. After some improvements, the project entered a beta-testing and stabilization stage that lasted about a week and involved five servers with different configurations.

Before talking about the features that the official proxy server does not yet have but the alternative does (and keeping quiet about the features the official one has and the alternative lacks), I will address the thing many people think of first when Python is mentioned.

Performance

Performance was tested on a cloud virtual machine with a minimal configuration: 1 CPU, 1024 MB RAM.

In synthetic tests, the proxy server was able to transfer about 240 megabytes/sec or 3000 messages/sec. With uvloop, an alternative event loop implementation written in C, and with the PyPy interpreter, the figures differ (all measurements are per second):

Testing on real users showed that such a server is enough to comfortably serve 4,000 users, or 8,000 with PyPy. A big surprise was that, even though the test server was promoted in Russian-language channels, 89% of its users were still from Iran (so for other countries the number of simultaneously served users may differ). It looks like this:

I asked several administrators of other servers, and they see the same picture. Perhaps this is because Telegram works well in Russia without proxy servers. In Iran, the test servers were blocked for the public a few hours after they were created.

Server load with 2,000 users. The moment the server was blocked for users in Iran is clearly visible.

Thus, CPU performance is not the bottleneck on the tested node. With 10,000 clients, memory is likely to run out first.

Simultaneous use of multiple CPU cores is not implemented (hello, GIL).

Features that the official proxy does not yet have

IPv6 support.
Without any additional configuration, the proxy server can use IPv6 for outgoing connections. IPv6 connections are not blocked in Russia (for now).

Operation without the middle-proxy.
If channel advertising is not needed, the proxy automatically connects directly to the Telegram servers, bypassing the middle-proxy. This is faster and more reliable.

An optional "fast mode" is also implemented, in which messages from the Telegram server to the proxy and from the proxy to the client are encrypted with the same key. The proxy then does not need to re-encrypt messages and forwards them as is. This should not affect security: in any case, the proxy server administrator has no access to user messages.

Automatic update of the middle-proxy list and the secret once a day.
To update the middle-proxy list, the official proxy server recommends restarting the Docker container once a day, which drops all connections. New connections may then fail to be established if, for example, the server has been blocked in the user's country in the meantime. The Python version periodically fetches the list from the site and updates it.
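
A background refresh of this kind could be sketched with asyncio as follows. The function name refresh_forever, the injected fetch and apply callables, and the decision to keep the previous value when an update fails are assumptions made for illustration, not the project's actual code.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def refresh_forever(fetch: Callable[[], Awaitable[bytes]],
                          apply: Callable[[bytes], None],
                          interval: float = 24 * 60 * 60,
                          max_rounds: Optional[int] = None) -> None:
    """Call fetch() every `interval` seconds and pass the result to apply().

    If fetch() fails, the previous value stays in effect, so existing
    client connections are not dropped because of a failed update.
    max_rounds limits the number of iterations (None means run forever).
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            apply(await fetch())
        except Exception as exc:  # network errors, bad responses, ...
            print("update failed, keeping the old value:", exc)
        rounds += 1
        await asyncio.sleep(interval)
```

In a real proxy, fetch would download the middle-proxy list or the secret, and the coroutine would run as a background task next to the code that serves clients, so no restart is needed.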

Cross-platform.
Any platform that runs Python is supported. It even ran on an iPad, although the device blocked incoming external connections. Windows is supported specifically; it surprised me how many people run a proxy on this OS. That said, the official proxy can also be run on Windows with virtualization or Docker.

Easy start without Docker.
If (suddenly) someone does not like Docker, the proxy can be started without it. At least two parameters must be set in the configuration file, the port and the secret; an optional advertising tag can also be set. Then run the command: python3 mtprotoproxy.py. In this case, however, you will have to take care of autostart in the OS yourself, for example by writing a systemd unit file. You will also need to install pycrypto or pycryptodome; the proxy works without them, but very slowly.

With Docker, the container can be rebuilt with the docker-compose up --build command.

Features scheduled for the next release

Limiting the download speed of large files.
When large files are downloaded, the middle-proxy or the Telegram server can be "asked" at the TCP level to send data more slowly. For now this is done by setting the receive buffer to a small value, which also saves server memory.
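
Shrinking the receive buffer is done with a standard socket option. The following is a minimal sketch for a plain TCP socket; the helper name limit_receive_buffer and the 8192-byte value are illustrative assumptions, not the project's settings.

```python
import socket


def limit_receive_buffer(sock: socket.socket, size: int = 8192) -> int:
    """Request a small receive buffer and return the size actually in effect.

    A small buffer shrinks the TCP window advertised to the sender, so the
    peer slows down on its own. The kernel may round or adjust the value,
    which is why the effective size is read back.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

The option is usually set before the connection is established, since the TCP window parameters are negotiated during the handshake.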

Streaming of messages.
At present, all known proxy servers that work with the middle-proxy first read the whole message from the client and only then forward it. A single message can be up to 1 MB in size. Storing it takes memory, and the transmission delay increases slightly. The data could be streamed instead. This would complicate the code but reduce memory consumption in the worst case.
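
The difference between the two approaches can be sketched with asyncio streams. This is an illustration only: the function names and the 16 KB chunk size are assumptions, and a real proxy would also have to handle encryption and message framing, which is where the extra complexity comes from.

```python
import asyncio

CHUNK = 16 * 1024  # assumed chunk size, chosen only for the example


async def relay_buffered(reader, writer, length: int) -> None:
    """Read the whole message first, then send it (peak memory: `length`)."""
    data = await reader.readexactly(length)
    writer.write(data)
    await writer.drain()


async def relay_streaming(reader, writer, length: int,
                          chunk: int = CHUNK) -> None:
    """Forward the message piece by piece (peak memory: about `chunk`)."""
    remaining = length
    while remaining > 0:
        data = await reader.read(min(chunk, remaining))
        if not data:
            # The peer closed the connection before the message ended.
            raise asyncio.IncompleteReadError(b"", remaining)
        writer.write(data)
        await writer.drain()  # wait here if the receiver is slow
        remaining -= len(data)
```

With a 1 MB message, relay_buffered holds the full megabyte per client, while relay_streaming holds at most one chunk at a time.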

Varying packet lengths to bypass filtering by packet length.
This did not make it into the release in time.

Install and Run

git clone -b stable https://github.com/alexbers/mtprotoproxy.git; cd mtprotoproxy
(optional, recommended) set PORT, USERS and AD_TAG in config.py
docker-compose up --build -d (or python3 mtprotoproxy.py to run without Docker)
(optional, prints a link of the form tg://) docker-compose logs
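
Setting up config.py requires a secret for each user. The sketch below generates one and builds a sharing link. It rests on assumptions not stated above: that the secret is 16 random bytes written as 32 hexadecimal characters, and that the link has the form tg://proxy?server=...&port=...&secret=.... The helper names and the host proxy.example.com are hypothetical.

```python
import secrets


def new_secret() -> str:
    """Return 16 random bytes as 32 lowercase hex characters (assumed format)."""
    return secrets.token_hex(16)


def proxy_link(server: str, port: int, secret: str) -> str:
    """Build a tg:// link for sharing the proxy (link format assumed)."""
    return f"tg://proxy?server={server}&port={port}&secret={secret}"


if __name__ == "__main__":
    secret = new_secret()
    print("secret for config.py:", secret)
    # proxy.example.com and 443 are placeholder values for the example.
    print(proxy_link("proxy.example.com", 443, secret))
```

The secrets module is used instead of random because the value acts as a password and should come from a cryptographically secure source.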

Other implementations of MTProto-proxy with advertising channel support:

Thanks
seriyps - for help with testing on real users
shifttstas - for advice on Docker
forst (github) - for the idea and implementation of IPv6 support
p1ratrulezzz (github) - for advice and for an article about the project
freekzy (github) - for a patch fixing a leak bug

UPD descriptors : a repository that contains various implementations of the MTProto-proxy: github.com/mtProtoProxy
