
Let's make BitTorrent better
Recently, users of file-sharing networks have increasingly been calling for moving underground: to anonymous networks such as I2P, Tor, and the like.
Undoubtedly, this idea has a lot going for it. In essence, though, it means tearing down established traffic-exchange systems, some of which took more than ten years to build, and starting anew, learning from old mistakes while trying not to make new ones. Today the BitTorrent network is tens, probably even hundreds, of millions of established links; "break" them, and it will be extremely difficult to restore everything in full.
Let's take a look at the old-timer and consider whether things are really so bad, and whether its shortcomings can be fixed.
1.
The first thing BitTorrent is usually criticized for is its centralization in the form of the tracker, which, as a rule, is the vulnerable spot for attacks by copyright holders, DDoSers, and authorities of all stripes.
To this one can add the torrent file itself, which initiates a distribution and which officials often confuse with the actual contents of the distribution, with all the ensuing consequences.
However, BitTorrent already has effective solutions to these problems: DHT and peer exchange (PEX) as substitutes for trackers, and operations on hash sums as substitutes for torrent files and magnet links.
I will dwell on DHT and PEX a little more below; for now, let's talk about hash sums.
In fact, the hash sum is the main and essential component of a magnet link, and it is necessary and almost always sufficient for obtaining the distribution it describes.
- A hash is neither a torrent file nor a magnet link: it does not need to be downloaded and saved to disk, it is much easier to publish openly, and since it is not a link, it is very hard to fit into the definition of a "technical means facilitating the dissemination of data", as worded in the latest Russian Federation law "on blocking".
Legislation does not (yet) prohibit computing hash sums and publishing them on the Internet, just as it does not prohibit counting the guardsmen in a crowd or the number of frames in a film's main erotic scene, which, by the way, is always unique and could serve as the film's unique identifier.
From your client's point of view, it makes almost no difference whether you hand it a torrent file, a magnet link, or a bare hash sum. "Almost", because it needs an initial cloud of peers that it can query for the availability of content matching that hash sum.
A simple conclusion follows: the more you download and seed, the larger your cloud of peers, and the faster your client will find the peers that can serve the content for a given hash sum. So keeping many distributions in your client is not only pleasant but also useful.
And most importantly:
- Searching for and starting distributions by hash sum should long since have been built into ALL BitTorrent clients, alongside magnet link support, which for some reason is also not universal. Granted, turning a hash into a magnet link takes a single move in a text editor, but even that can be difficult for the average user.
Browsers can (in fact, must!) learn to recognize hash sums, highlighting them like links and opening them in the BitTorrent client on click.
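As an illustration, here is a minimal Python sketch of both ideas: wrapping a bare info-hash into a magnet link, and recognizing hashes in arbitrary text the way a browser extension might. The regular expression and function names are my own inventions, not any existing client or browser API.

```python
import re

# A BitTorrent v1 info-hash is the SHA-1 of the torrent's "info"
# dictionary: 40 hex characters (or 32 base32 characters).
BTIH_RE = re.compile(r"\b([0-9a-fA-F]{40}|[A-Z2-7]{32})\b")

def hash_to_magnet(info_hash: str, name: str = "") -> str:
    """The 'single move in a text editor': wrap a hash in a magnet URI."""
    magnet = f"magnet:?xt=urn:btih:{info_hash}"
    return magnet + (f"&dn={name}" if name else "")

def linkify(text: str) -> str:
    """What a browser could do: replace bare hashes with magnet links."""
    return BTIH_RE.sub(lambda m: hash_to_magnet(m.group(1)), text)

print(hash_to_magnet("c9e15763f722f23e98a29decdfae341b98d53056"))
```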
And, as a modification of the technology, an algorithm that lets you create an arbitrary number of digital "aliases" for a hash. This would keep copyright holders and censors thoroughly entertained, multiplying and complicating the work of finding and controlling circulating hashes if a full-scale hunt for them ever unfolds.
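No such mechanism exists in the protocol today, so the following is purely hypothetical: one could, for example, publish alias = hash XOR pad together with a random pad, so that the published alias matches no blacklist of known hashes, while any client holding both parts recovers the canonical one.

```python
import hashlib
import os

def make_alias(info_hash: bytes) -> tuple[bytes, bytes]:
    """Mask a hash with a one-time random pad (hypothetical scheme)."""
    pad = os.urandom(len(info_hash))
    alias = bytes(a ^ b for a, b in zip(info_hash, pad))
    return alias, pad

def resolve_alias(alias: bytes, pad: bytes) -> bytes:
    """XOR again with the same pad to recover the canonical hash."""
    return bytes(a ^ b for a, b in zip(alias, pad))

h = hashlib.sha1(b"example info dictionary").digest()
alias, pad = make_alias(h)
assert resolve_alias(alias, pad) == h   # any number of aliases per hash
```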
In addition, hash sums are easily converted into QR codes (or barcodes), which opens new possibilities for outdoor and banner advertising, and also for equipping every police officer with an Internet-connected QR code reader.
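With the third-party Python qrcode package, for example, this is a one-liner (the hash below is just a placeholder):

```python
import qrcode  # third-party: pip install "qrcode[pil]"

# Encode a magnet link as a QR code image, ready for a poster or banner.
img = qrcode.make("magnet:?xt=urn:btih:c9e15763f722f23e98a29decdfae341b98d53056")
img.save("release_qr.png")
```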
Lastly for this section: everything described above is fully compatible with the technology we have now; it is an extension of it, or rather its logical continuation.
2.
The second problem of file-sharing systems, and indeed of the entire Internet, is the safety of the data involved in file sharing. A file hosting service can be shut down by the FBI or Roskomnadzor, and its data will vanish irrevocably. The same can happen to a tracker; the data itself will survive, but the distribution clouds will fall apart, and if they recover at all, it will be with heavy losses.
There is another way to fight file sharing: scare the users. Then, like zombies, they erase everything from their disks and turn the TV up louder.
Nevertheless, the main cause of information loss in peer-to-peer networks is the users themselves... They simply abandon distributions, and distributions thereby "die" (or get deleted from file hosting services as unneeded). If you have added a torrent and the distribution has no seeders, you will wait a month at most, sometimes two. Then you stop waiting: the distribution has died, and only the human factor is to blame. One of the last seeders spilled coffee on his laptop, another drowned his portable drive in the toilet, and a third got married and is no longer interested in the films of Lars von Trier.
The amount of content killed this way is very large and significantly exceeds everything copyright holders have achieved in this field. One can argue that the lost information is of much less value than what users keep alive, let alone what rights holders lock away, but that does not change the essence of the matter: first, the "value" of information is a relative notion, and second, once we solve the problem of undeletability, we solve the others along the way.
- We could arrive at a situation where any information published in a p2p network remains in it forever. Due to low demand it may have limited availability, with only a small number of sources, yet it will remain available, and there will be no way to remove it from the network...
(Here I deliberately omit the discussion of the other consequences of "undeletable" information, since there obviously is content that should not circulate on the network, even from the point of view of The Pirate Bay.)
Oddly enough, implementing this mechanism is not so difficult.
All it takes is an extension on top of the protocol discussed here that lets any client donate part of its bandwidth and disk space to caching, storing, and distributing random blocks of random distributions.
Not forming a coherent whole, such data blocks have no practical value on the computer where they are cached.
A cached distribution is never downloaded in its entirety to that computer, and the user never knows what exactly his machine has cached.
With today's Internet access speeds and disk prices, sacrificing a few gigabytes of drive space and a few percent of channel bandwidth causes no real inconvenience, especially since Windows Update does the same without asking.
As a result, beyond its direct seeders and leechers, any user-initiated distribution will be downloaded and redistributed block by block by random nodes, and the probability of its "death" will drop. Finding and purging its pieces from wherever they are cached, should such a need arise, becomes an extremely difficult task.
Given an algorithm for estimating how widespread particular content is on the network, a client can cache only rare distributions, since popular ones usually need no artificial support. As soon as a distribution's prevalence index falls, the self-preservation mechanism kicks in and clients cache parts of it. By the same principle, cached blocks of distributions whose popularity is growing are replaced by blocks of distributions whose popularity is falling.
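To make the idea concrete, here is a rough Python sketch of such a rarity-driven cache. The budget, the threshold, and all the names are invented for illustration; a real client would refresh prevalence counts from the DHT rather than record them once.

```python
import random

CACHE_BUDGET = 4 * 1024**3   # donate ~4 GB of disk space
RARE_BELOW = 5               # "rare" = fewer than 5 known seeders
BLOCK_SIZE = 256 * 1024

class BlockCache:
    """Holds random blocks of rare distributions, drops recovered ones."""

    def __init__(self, budget: int = CACHE_BUDGET):
        self.budget = budget
        self.blocks = {}     # (info_hash, block_index) -> prevalence

    def update(self, info_hash: str, n_blocks: int, prevalence: int):
        if prevalence >= RARE_BELOW:
            # The distribution is popular again: free its blocks.
            for key in [k for k in self.blocks if k[0] == info_hash]:
                del self.blocks[key]
        elif (len(self.blocks) + 1) * BLOCK_SIZE <= self.budget:
            # Cache one random block, so no node holds a coherent whole.
            self.blocks[(info_hash, random.randrange(n_blocks))] = prevalence
```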
A side effect of this innovation is extra protection for the releaser: having created a distribution, you can wait until it disperses through the cloud of peers before publishing the hash sum somewhere crowded. Tracing the source of the release after that is nearly impossible (the followers of Assange and Snowden will certainly appreciate this).
This extension, too, is easy to make backward compatible with existing technology, as an additional feature in any BitTorrent client (preferably one that cannot be switched off).
3.
If we want to decentralize the network and thereby protect it, we need to move from trackers to DHT and peer exchange as the primary way for BitTorrent clients to obtain peers, using trackers only to bootstrap distributions, and perhaps merely to hand out an initial list of DHT peers regardless of the specific distribution.
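This bootstrap step already works today: the mainline DHT (BEP 5) will answer a get_peers query for any info-hash. A minimal Python sketch of one such query; response parsing and error handling are left out:

```python
import os
import socket

def bencode(obj) -> bytes:
    """Just enough bencoding for a KRPC query."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, dict):
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b"e"
    raise TypeError(obj)

def get_peers(info_hash: bytes, node=("router.bittorrent.com", 6881)):
    """Ask a well-known bootstrap node who has this info-hash."""
    query = bencode({
        b"t": b"aa", b"y": b"q", b"q": b"get_peers",
        b"a": {b"id": os.urandom(20), b"info_hash": info_hash},
    })
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    sock.sendto(query, node)
    return sock.recvfrom(1500)[0]   # bencoded reply with peers or nodes
```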
This would seem an obvious conclusion; however, many trackers, in the person of their administrations, do exactly the opposite, artificially narrowing and limiting their peer clouds!
I mean closed trackers, accessible only by invitation, which pursue internal policies of private distributions, passkeys, and so on, hindering or even completely blocking data exchange via DHT.
Don't get me wrong, I have nothing against closed communities with their own rules and internal etiquette, but what is the point of building such artificial reservations for robots, that is, for BitTorrent clients?
This is not only about private flags and passkeys in torrents, but also about the rules on "repackaging" releases for a specific tracker (with the triumph of vanity that is one's own *.nfo file inside the distribution), which change the distribution's hash. The result is a clone of the content with a different hash, whose peers are "incompatible" with the original's.
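It is easy to see why: the info-hash is the SHA-1 of the bencoded info dictionary, so adding even a 1 KB *.nfo file yields an entirely different hash. A toy demonstration (the piece data is a placeholder, and the bencode helper is the same as in the DHT sketch, extended with lists):

```python
import hashlib

def bencode(obj) -> bytes:
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b"e"
    raise TypeError(obj)

info = {b"name": b"release", b"piece length": 262144, b"pieces": b"...",
        b"files": [{b"path": [b"movie.mkv"], b"length": 700 * 1024**2}]}
print(hashlib.sha1(bencode(info)).hexdigest())

info[b"files"].append({b"path": [b"tracker.nfo"], b"length": 1024})
print(hashlib.sha1(bencode(info)).hexdigest())  # a completely new hash
```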
Your BitTorrent client finds downloading from IP addresses registered on your tracker no more pleasant than from any others obtained via DHT. It simply does not care! Give freedom on the Internet at least to the robots!
To sum up a little, we should probably talk about introducing a kind of pirate code of honor for trackers: a set of standard rules, among them a ban on private torrents and passkeys, a ban on disabling DHT in torrent files, and a ban on "repackaging" releases to suit the needs or rules of a particular tracker or release group.
- We (tracker operators, in this case) should strive to ensure that distributions of the same content on different trackers have the same hash, and therefore a common peer cloud. The "survivability" of such a distribution will then grow in proportion to the number of trackers where it is published, and the trackers themselves will only benefit.
In conclusion, I want to note that, despite loud claims to the contrary, there is still no technology for filtering BitTorrent traffic itself: at most, providers using DPI can throttle it or block it entirely. This is unlike URL-based site blocking, where filtering is already in full swing.