Blockchain analysis, or why did the mixer break?

Based on my talk at the Digital Transformation conference in Moscow on April 16, 2018

I am curious about how the blockchain works, and not only about its algorithms, cryptography, platforms, and cryptocurrencies. For me, the blockchain is not just a technology but a new kind of life, a new universe. If in doubt, take a look at this chart of Aragon token sales:

All these addresses, smart contracts, and tokens constantly interact with each other, and behind them are the actions of people, organizations, and robots. Without this interaction, the blockchain and cryptocurrencies would have no meaning or value.

How businesses work on the blockchain, and what people and robots do there: these are the questions that drove me to study it.

Problem and solutions

The blockchain network (we are talking specifically about public blockchain networks) is in fact completely open. You can read absolutely any information about blocks, addresses, and transactions. Programmers have APIs for this (for example, Web3 [1]), and mere mortals have blockchain explorers such as Etherscan [2]. In addition, every full node downloads all blocks since the beginning of time to its local disk, with complete information inside, since this is required to verify transactions and, all the more so, to mine. In other words, any full node is a complete copy of the blockchain, and one that comes with access interfaces and detailed documentation.

It would seem that everything needed for analysis is there, but no such luck: the blockchain itself gets in the way. Recall what the word literally means: a chain of blocks. The blocks store transaction records plus the meta information that ensures integrity and coherence. To find something in the blockchain, you need to know the block number, the transaction hash, or at least the address. The node has no indexes other than these.
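To make the lack of indexes concrete, here is a minimal Python sketch. The block layout, the `day` field, and the function names are simplified assumptions for illustration, not the real node format. Any question that is not keyed by block number, hash, or address means scanning the whole chain, unless you first build your own index.

```python
from collections import defaultdict

# Hypothetical, heavily simplified chain; real blocks carry far more fields.
BLOCKS = [
    {"number": 1, "day": "2018-06-14", "transactions": [
        {"hash": "0xa1", "from": "alice", "to": "bob", "value": 5},
    ]},
    {"number": 2, "day": "2018-06-15", "transactions": [
        {"hash": "0xb1", "from": "bob", "to": "carol", "value": 2},
        {"hash": "0xb2", "from": "alice", "to": "carol", "value": 50},
    ]},
]


def find_large_transfers(blocks, threshold):
    """No index on value exists, so every block has to be scanned."""
    return [
        tx["hash"]
        for block in blocks
        for tx in block["transactions"]
        if tx["value"] >= threshold
    ]


def index_transfers_by_day(blocks):
    """One pass over the chain builds a secondary index (day -> tx hashes)."""
    index = defaultdict(list)
    for block in blocks:
        for tx in block["transactions"]:
            index[block["day"]].append(tx["hash"])
    return dict(index)


print(find_large_transfers(BLOCKS, threshold=10))
print(index_transfers_by_day(BLOCKS))
```

An analytical database plays the role of `index_transfers_by_day` here, only for many keys at once and for the whole chain.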

Etherscan is only slightly better. It shows the same data that is available through the API, only as web pages. To find anything, you again need to know in advance an address, a transaction hash, or a block number. You see the blockchain through a narrow window bounded by these entities. It is like exploring the universe with a microscope: the existing tools are completely unsuited to analysis at scale.

To philosophize a little, I even drew a diagram that shows the essence of the problem:

With cryptocurrencies, everything is more or less clear: their analysis relies on the well-known methods and tools of exchange trading. Reliable and objective information about every parameter of a cryptocurrency is available on many websites.

The same cannot be said of the blockchain. The information available is either purely technical and aimed at those already in the know (Etherscan-style), or prose about ICO projects [3] and DAOs [4] that has a clear subjective bias and is not checked by mathematical methods.

The blockchain as a whole is not transparent, even though all the information is openly available, so that is what we will work on!

Technological tools for blockchain analytics

First, let us understand the scale of the problem. There are many blockchain networks, and many different platforms on which they are built. We have to start somewhere, and I chose the Ethereum Foundation network for several reasons:

  1. It has many participants;
  2. The capitalization of all currencies on the network, including tokens, is perhaps the largest of any network;
  3. Smart contracts [5] and DAOs [4] widen the scope of possible analysis and make it much more informative and useful.

Even with a single network, we get quite a lot of data (as of June 15, 2018):

Metric                                            Value
Number of cryptocurrency transfers, total         267 million
Cryptocurrency transfers per day, on average      750 thousand
Number of valid addresses                         44 million
Number of smart contracts                         6.8 million
Number of tokens issued                           48 thousand
Smart contract calls per day, on average          690 thousand
Estimated size of compressed data for full node   117 GB

From the start, I wanted the analysis to reflect the real state of the network as closely as possible, that is, in real time. This has two technical aspects:

  1. Information from the blockchain should reach the database as soon as a new block is created. We want to see current information, not an archive;
  2. We want reports to come back quickly, within a second or less, so that we do not lose interest while asking many questions.

The choice fell on the ClickHouse database [6], an open source project from Yandex. I had not used this system before, and the team at Altinity [7] helped me get to grips with it, for which I am especially grateful.

The overall structure of the system is as follows:

The source data is read from a full Ethereum node by an ETL (Extract, Transform, Load) process, which parses the data inside each block and writes it into several tables in the ClickHouse database. The process starts as soon as a new block reaches the node and runs continuously.
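The transform step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names (`blocks`, `transfers`), the column names, and the sample block are hypothetical, and plain Python lists stand in for the database tables. The only borrowed convention is that quantities arrive as hex strings, as in Ethereum JSON-RPC responses.

```python
# Hypothetical raw block, with quantities hex-encoded as in JSON-RPC responses.
SAMPLE_BLOCK = {
    "number": "0x5b8d80",
    "transactions": [
        {"hash": "0xaaa", "from": "0x01", "to": "0x02",
         "value": "0xde0b6b3a7640000"},
        {"hash": "0xbbb", "from": "0x02", "to": "0x03", "value": "0x0"},
    ],
}


def transform_block(block):
    """Split one raw block into rows for two hypothetical tables."""
    number = int(block["number"], 16)
    block_row = {"number": number, "tx_count": len(block["transactions"])}
    transfer_rows = [
        {
            "block_number": number,
            "tx_hash": tx["hash"],
            "sender": tx["from"],
            "receiver": tx["to"],
            "value_wei": int(tx["value"], 16),
        }
        for tx in block["transactions"]
    ]
    return block_row, transfer_rows


def run_etl(blocks, tables):
    """Extract each block, transform it, and load the rows into the tables."""
    for block in blocks:
        block_row, transfer_rows = transform_block(block)
        tables["blocks"].append(block_row)
        tables["transfers"].extend(transfer_rows)
    return tables


tables = run_etl([SAMPLE_BLOCK], {"blocks": [], "transfers": []})
print(tables["blocks"])
print(len(tables["transfers"]))
```

In a real pipeline the `append` and `extend` calls would become batched inserts into the database, and `blocks` would be a stream of new blocks from the node rather than a list.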

The right side of the diagram shows the current consumers of the data:

  1. SuperSet analytics tool [8]. Using it, you can make cool diagrams and quickly combine queries to get answers to analysis questions;
  2. Python with Jupyter [9] for deeper analysis using machine learning tools and statistical algorithms;
  3. Bloxy website and API [10] for public use of information.

Indexing the Ethereum data took a while: there are already almost 6 million blocks, and each one has to be read from the node and processed. That work is now done, and we can finally enjoy the full power of the analytical database, especially since the data is simply delicious!


Let's start with tokens, since they are the most popular application of smart contracts on the Ethereum network and, arguably, the purpose and meaning of its creation. Tokens are cryptocurrencies that anyone can issue using a smart contract of a certain type. The basic token standard is ERC20 [11], but as we shall see, it is not the only one.

Now, with an analytics database and SuperSet in hand, we can see which tokens are issued, how they are used, and what is popular now:

The data cover the entire lifetime of Ethereum. The pie chart shows that ERC20 tokens overwhelmingly outnumber the other types. The number of tokens actively used for transfers has grown steadily so far, which means that ICO enthusiasm is not subsiding, quite the opposite. In fact, several hundred new tokens (read: cryptocurrencies) are sometimes created per day, but the chart includes only those that are actively used.

The chart below it shows less rapid growth over time: it is the number of token transfer transactions per day. Around spring 2018 it leveled off at about 400 thousand transactions per day and has not grown since. In essence, this means that new tokens now see significantly fewer transfers than earlier ones did.

There are two anomalies on this graph: the peak in ERC20 token transfers in November 2017 and the less pronounced “hump” in ERC721 token transfers in December.

The November peak is associated with the InsPromo token, which was handed out for free to almost a million addresses as an “airdrop” advertising campaign [12]. This way of attracting ICO clients has been used many times before and since, but the number of free “coins” scattered in a single day is a record!

The December interest in ERC721 tokens is entirely due to the CryptoKitties game: people enthusiastically bought and bred digital cats. The graph shows a rapid increase in CryptoKitties turnover and a decrease in transactions of the other tokens; apparently people forgot that other tokens existed.

Crypto beasts and more

ERC721 tokens [14] essentially appeared together with the crypto cats [13], although their potential use is much wider. Where the ERC20 standard let anyone issue a cryptocurrency measured as an amount, ERC721 lets anyone record ownership of any object in the virtual, real, or even intellectual world.

Technically, each ERC721 token stores an identifier that is unique within its smart contract. This identifier may refer to a crypto cat, a golden sword, a piece of land, or a patent for an invention. Ownership of the identifier is recorded in the blockchain. Since there is a standard for exchanging ERC721 tokens, you can see them in your wallet, trade them on an exchange, and perform other common operations.
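The idea of a unique identifier with recorded ownership can be modeled in a few lines. The class below is a toy Python model, not contract code: the class and method names are made up for illustration (they only loosely echo the standard's `ownerOf` and `transferFrom`), and approvals, events, and everything else in the real standard are left out.

```python
class ToyNonFungibleRegistry:
    """Toy model of one ERC721-style contract: token id -> current owner."""

    def __init__(self):
        self._owners = {}

    def mint(self, token_id, owner):
        # Each identifier may exist only once within the contract.
        if token_id in self._owners:
            raise ValueError(f"token {token_id} already exists")
        self._owners[token_id] = owner

    def owner_of(self, token_id):
        return self._owners[token_id]

    def transfer(self, sender, receiver, token_id):
        # Only the current owner may hand the token over.
        if self._owners.get(token_id) != sender:
            raise PermissionError("sender does not own this token")
        self._owners[token_id] = receiver


registry = ToyNonFungibleRegistry()
registry.mint(101, "alice")          # token 101 could stand for one cat
registry.transfer("alice", "bob", 101)
print(registry.owner_of(101))
```

Contrast this with an ERC20-style token, where the contract would store a balance per address instead of an owner per identifier.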

The upper graph shows the number of transactions for all ERC721 tokens. In December 2017 there is a sharp increase, 100% of it due to the CryptoKitties game. Interest in the game lasted throughout December, then gradually faded.

The lower graph shows the number of different ERC721 tokens in circulation (read: projects built on this technology). In December there was only CryptoKitties, and by February there were already a few dozen. Token names are shown on the left; the more transactions, the larger the font. So far the kittens, with the symbol CK, are in first place.

Why do we need a mixer?

The analysis of the blockchain as a whole makes it possible to find patterns and anomalies that are not visible on the micro level of transactions, addresses and blocks. One of the most striking examples is the “mixer” of a thousand bots that works on the Ethereum network.

Let's start with the search for anomalies in the distribution of addresses by the number of recipients and senders of the cryptocurrency:

The horizontal axis is the number of addresses that sent money to a given address (its senders); the vertical axis is the number of addresses it sent money to (its recipients). The size of each circle is the number of addresses with that combination.

The left diagram is from December 2016. The largest circle corresponds to addresses with one sender and one recipient; slightly smaller is the circle for addresses with one sender and no recipients. This is easy to understand: most addresses receive currency from one source and either spend it in one place or do not spend it at all and simply hold it.

But in December 2017, the circle with two senders and three recipients grew abnormally large. There are several million such addresses! To sort out the situation, we pick one of the addresses from this circle and build its transfer graph:
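The distribution behind such a diagram can be computed along these lines. This is a minimal sketch, assuming the transfers are available as plain (sender, receiver) pairs; the function name and the sample data are made up for illustration.

```python
from collections import Counter, defaultdict


def sender_recipient_distribution(transfers):
    """Count addresses by (number of distinct senders, number of distinct recipients)."""
    senders_of = defaultdict(set)      # address -> who sent money to it
    recipients_of = defaultdict(set)   # address -> whom it sent money to
    for sender, receiver in transfers:
        senders_of[receiver].add(sender)
        recipients_of[sender].add(receiver)

    addresses = set(senders_of) | set(recipients_of)
    return Counter(
        (len(senders_of.get(addr, ())), len(recipients_of.get(addr, ())))
        for addr in addresses
    )


# Hypothetical transfers: a pays b and d, b pays c.
transfers = [("a", "b"), ("b", "c"), ("a", "d")]
for (n_senders, n_recipients), count in sorted(
    sender_recipient_distribution(transfers).items()
):
    print(n_senders, n_recipients, count)
```

Each key of the resulting counter is one circle on the diagram, and its count is the size of that circle.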

It can be seen that all these addresses are connected into a giant mixer that passes money around internally. Since each address has, on average, more recipients than senders, money from an original sender reaches a huge number of recipients in just a few steps. Of course, this is the work of robots, not people: there are more than 4 million such addresses, and they work smoothly and very quickly, forwarding money within minutes.
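How quickly such a fan-out grows can be illustrated with a toy model. The sketch below makes a simplifying assumption: every address forwards to three brand-new addresses, so the graph is a tree. In the mixer described above each address also has two senders, so paths merge and the real growth is slower. All names are hypothetical.

```python
def build_toy_mixer(depth, fan_out=3):
    """Build a tree where every address forwards to `fan_out` new addresses."""
    graph = {}
    level = ["root"]
    for _ in range(depth):
        next_level = []
        for addr in level:
            children = [f"{addr}.{i}" for i in range(fan_out)]
            graph[addr] = children
            next_level.extend(children)
        level = next_level
    return graph


def reach_by_step(graph, source, max_steps):
    """Breadth-first walk: distinct recipients reached within 1, 2, ... steps."""
    seen = {source}
    frontier = {source}
    reached = []
    for _ in range(max_steps):
        frontier = {
            nxt
            for addr in frontier
            for nxt in graph.get(addr, ())
            if nxt not in seen
        }
        seen |= frontier
        reached.append(len(seen) - 1)  # do not count the source itself
    return reached


print(reach_by_step(build_toy_mixer(depth=3), "root", max_steps=3))
# → [3, 12, 39]
```

The same `reach_by_step` walk works on any transfer graph given as a mapping from an address to its recipients, which is one way to trace how far money from a single sender spreads.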

We estimated the volume of work of this huge robot by separating the transactions of these addresses from the rest of the transactions in the network:

In some months, the mixer's transfer volume (orange columns) exceeds all other transfers in the network (green columns) several times over. Of course, it must be taken into account that it moves currency within itself, and its external turnover is less significant: no more than 17 million ether ($10 billion at today's rate).

Mixer transactions occupied a significant part of the Ethereum network's bandwidth for many months. Its activity peaked at the beginning of 2018, when every fourth transfer transaction on Ethereum was initiated by this robot, as can be seen from the blue graph of the total number of mixer transactions:

But suddenly, at the end of February 2018, it stopped working. Since we do not know what it was used for, we can only guess at the reasons for its life and sudden death. Or maybe it did not die, but changed its algorithm and simply dropped off our radar?

I believe in the blockchain

I believe in the blockchain. Businesses, people, and communities benefit from its use. To use it, you need to understand how it works as a phenomenon: by what laws it develops, and what its internal anomalies, trends, declines, and rises are.

A more transparent blockchain will allow businesses to operate efficiently with their eyes open. Regular users will better understand what exactly they are doing and what they are participating in, and will be better protected and happier.

In the end, the blockchain is not so much networks, platforms, blocks, and transactions as people and communities. The success of this technology depends entirely on how society perceives it, and transparency is important in this process.


[1] Web 3: A platform for decentralized apps

[2] Etherscan

[3] ICO

[4] DAO

[5] Ethereum smart contracts

[6] Yandex Clickhouse

[7] Altinity

[8] SuperSet

[9] Python Jupyter

[10] Bloxy

[11] ERC20

[12] WTF is an Airdrop? A Detailed Guide to Free Cryptocurrency

[13] CryptoKitties

[14] ERC721 standard
