Part 1. Where to store data for decentralized applications on the blockchain?

    Blockchain projects are booming. Some blockchains are powerful enough to serve as a platform for writing applications. Such applications are automatically decentralized and resistant to censorship and blocking. But is everything really so good and simple? In this article, we will try to look at the blockchain as an application platform after taking off the rose-colored glasses.

    So what is a blockchain?

    A blockchain is an immutable data structure consisting of a list of blocks, where each block contains a hash of the previous block. This hashing is what makes the chain immutable: you cannot change or remove a block in the middle of the chain, because even the slightest change requires rebuilding (recalculating the hashes of) all the blocks that follow it.
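
    The chaining described above can be illustrated with a minimal sketch. It assumes SHA-256 and a simple dictionary layout for blocks; the field names and the helper functions `block_hash`, `build_chain` and `first_invalid_block` are hypothetical and not taken from any real blockchain implementation.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    payload = json.dumps([index, data, prev_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(items):
    """Build a list of blocks, each one storing the hash of its predecessor."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(items):
        h = block_hash(index, data, prev_hash)
        chain.append({"index": index, "data": data, "prev_hash": prev_hash, "hash": h})
        prev_hash = h
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
assert first_invalid_block(chain) is None

# Tamper with the middle block: its stored hash no longer matches its contents.
chain[1]["data"] = "bob pays carol 200"
assert first_invalid_block(chain) == 1

# Recomputing only that block's hash just moves the breakage to the next block.
chain[1]["hash"] = block_hash(1, chain[1]["data"], chain[1]["prev_hash"])
assert first_invalid_block(chain) == 2
```

    To hide the change, every block after the modified one would have to be rebuilt as well, which is exactly the property the text relies on.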

    If we also make computing each block's hash computationally or economically expensive, then changing data in the middle of the chain becomes practically impossible. The combination of a hash that is hard to compute for a new block but easy to verify is what gives the blockchain its strong resistance to unauthorized changes. This is what the security of Bitcoin and other blockchains rests on.
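
    The asymmetry between "hard to compute" and "easy to check" can be sketched with a toy proof-of-work scheme. This is only an illustration under simplifying assumptions: the hash must start with a given number of zero hex digits, and the function names `pow_hash`, `mine` and `verify` are hypothetical. Real networks use their own block formats and difficulty rules.

```python
import hashlib


def pow_hash(block_data, nonce):
    """Hash the block data together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data, difficulty):
    """Try nonces one by one until the hash starts with `difficulty` zero hex digits.

    On average this takes about 16 ** difficulty attempts.
    """
    target = "0" * difficulty
    nonce = 0
    while not pow_hash(block_data, nonce).startswith(target):
        nonce += 1
    return nonce


def verify(block_data, nonce, difficulty):
    """Checking a claimed solution costs a single hash computation."""
    return pow_hash(block_data, nonce).startswith("0" * difficulty)


data = "previous-hash|list of transactions"
nonce = mine(data, difficulty=4)          # tens of thousands of hashes on average
assert verify(data, nonce, difficulty=4)  # one hash to check
```

    Each extra zero digit multiplies the expected mining work by 16, while verification stays at one hash. Rewriting a block in the middle of the chain would mean repeating this work for it and for every block after it.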

    Thanks to this property, blockchain projects can be publicly decentralized. That is, anyone can set up a working blockchain node and generate new blocks. In most blockchain implementations, block generation is rewarded; this process is called mining. Since mining is hard and its results are easy to verify, it only pays to act honestly. Otherwise, you spend resources on mining, the other miners reject your block, and all the work is wasted. Thus, even with complete decentralization and independent nodes, the blockchain network works as a single whole.

    A single dishonest miner is easy to detect and ignore. But what if there are many of them and they collude? Imagine that everyone around you thinks a red light is green. :) And they look at you as if you were abnormal for thinking otherwise. Social experiments show that most people in such a situation begin to doubt themselves and join the majority opinion. And in the blockchain, it is precisely the majority that rules!

    The similar problem of establishing the truth when your interlocutors may lie unscrupulously was named “The Byzantine Generals Problem” by Leslie Lamport in a 1982 paper, and had been solved two years earlier, in 1980, by him together with co-authors. It was shown that with n spies who can lie and distort information, consensus between participants can be reached if the total number of participants is at least 3n + 1. And if it is guaranteed that spies cannot distort messages passed through them, then 2n + 1 is enough. In the blockchain, thanks to digital signatures, malicious nodes cannot distort information; therefore, as long as fewer than half of the nodes are malicious, the network remains stable.
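
    The two bounds quoted above are easy to turn into numbers. The sketch below only encodes the arithmetic from the text (3n + 1 without signatures, 2n + 1 when messages cannot be distorted); the function names are hypothetical, and it does not implement any consensus protocol.

```python
def min_participants(traitors, signed_messages):
    """Minimum network size that tolerates `traitors` lying participants,
    using the bounds quoted in the text."""
    if traitors < 0:
        raise ValueError("traitors must be non-negative")
    factor = 2 if signed_messages else 3
    return factor * traitors + 1


def max_tolerated_traitors(total, signed_messages):
    """Largest number of traitors a network of `total` participants tolerates."""
    if total < 1:
        raise ValueError("total must be positive")
    factor = 2 if signed_messages else 3
    return (total - 1) // factor


# Without signatures, 1 traitor needs 4 participants; with signatures, 3.
assert min_participants(1, signed_messages=False) == 4
assert min_participants(1, signed_messages=True) == 3

# A network of 100 nodes: 33 liars tolerated without signatures, 49 with them.
assert max_tolerated_traitors(100, signed_messages=False) == 33
assert max_tolerated_traitors(100, signed_messages=True) == 49
```

    The second case is the "fewer than half" rule: with 100 nodes, 49 malicious ones are tolerated, but 50 are not.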

    A network's resilience to malicious nodes is called Byzantine Fault Tolerance (BFT). BFT is very important for public networks that arbitrary nodes can freely join. Most blockchain projects are systems of this kind.

    The use of blockchain is not limited to cryptocurrencies. Anything can be recorded inside a block. In Bitcoin, a block holds a list of new transactions, which is used to transfer cryptocurrency between its owners. In NameCoin, blocks store arbitrary key-value pairs, which can be used to build a decentralized DNS. Other blockchain implementations add features of their own. But Ethereum went much further. It allows you to store on the blockchain not only transactions, but also full-fledged Turing-complete programs called smart contracts, which let you tailor the blockchain to an application. For example, an analogue of NameCoin can be implemented on Ethereum in 5 lines of code.
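
    To give a feel for how little logic a name registry needs, here is a plain Python model of a first-come, first-served key-value registry. It is not the Ethereum contract mentioned above and does not run on any blockchain; the class `NameRegistry` and its methods are hypothetical names used only for illustration.

```python
class NameRegistry:
    """In-memory model of a first-come, first-served name registry."""

    def __init__(self):
        self._records = {}

    def register(self, name, value):
        """Store `value` under `name` unless the name is already taken."""
        if name in self._records:
            return False
        self._records[name] = value
        return True

    def lookup(self, name):
        """Return the value registered for `name`, or None if it is free."""
        return self._records.get(name)


registry = NameRegistry()
assert registry.register("example.bit", "192.0.2.10") is True
assert registry.register("example.bit", "198.51.100.7") is False  # already taken
assert registry.lookup("example.bit") == "192.0.2.10"
assert registry.lookup("unknown.bit") is None
```

    On a smart-contract platform, the dictionary would be replaced by the contract's persistent storage and each `register` call would be a transaction, so the network as a whole, rather than a single server, decides who registered a name first.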

    Ethereum was conceived as a universal platform for building decentralized blockchain-based projects. Why re-implement an entire blockchain and deploy your own infrastructure if you can implement what you need on Ethereum with a couple of smart contracts, as with the NameCoin analogue? This is why Ethereum has been growing rapidly of late. Starting in March 2017, ETH (the Ethereum cryptocurrency) rose in price 5 times in just two months, and the growth continues. There are already hundreds of applications running on Ethereum, for example the AKASHA social network, the Ethlance freelance exchange, a word game, and many more!

    A blockchain with smart contracts provides applications with the entire infrastructure. Applications keep their code on the blockchain in smart contracts. Applications can store any information on the blockchain by passing it to their smart contracts as data. Applications can read this information back from the blockchain, because the state of the Ethereum blockchain is, in essence, a key-value database.

    It would seem, what more could you want? Applications are truly decentralized and cannot be censored or banned. All in all, blockchain is nothing but advantages! If only everything were that good... When you build really powerful applications, the shortcomings show up immediately.

    Immutability. Immutability is, of course, good. It is immutability that makes the blockchain public and Byzantine fault tolerant. However, there is a flip side to the coin. All the data that applications write to the blockchain stays there forever. You played a word game, and the blockchain remembered it. You posted information on a social network, and it is stored on the blockchain permanently, even if you later delete your profile. The explosive growth in the number of blockchain applications makes the chain swell in size. The full Ethereum blockchain has already exceeded 130 GB, although it has been operating for less than 2 years. Bitcoin's is smaller, despite its respectable age of more than 7 years.

    Of course, some Ethereum implementations include state tree pruning, which lets a node store only the latest state of the blockchain plus a limited history of about a day; this currently reduces the stored data by a factor of about 20. For example, a go-ethereum full node requires 130 GB for blockchain storage, while Parity, which supports this technology, requires only 6 GB. However, given that the growth in the number of applications is only beginning, and that every Ethereum node has to store all the data of all applications, this looks like a necessary measure that nevertheless only postpones the problem. As the blockchain grows, it will no longer fit on mass-market hard drives; only large organizations will be able to afford to store it, which leads to dangerous centralization: the concentration of control over more than 50% of the network in a single organization. This may break BFT.

    Slow transactions. Blockchains pay for their flexibility with transaction speed. Bitcoin handles 7 transactions per second, Ethereum 15. And that is for the entire network, because every node fully replicates the others. Adding a new node increases the resilience of the system, but in no way increases its speed or its maximum storage capacity. That is, changing data (and every data change on the blockchain is a transaction) is the bottleneck. Popular applications will run into this limit immediately.
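
    A quick back-of-the-envelope calculation shows how tight this limit is. The rates are the ones quoted in the text; the figure of one million users is a purely hypothetical assumption, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_capacity(tps):
    """Transactions the whole network can process in one day at `tps` per second."""
    return tps * SECONDS_PER_DAY


def writes_per_user_per_day(tps, users):
    """Average number of data changes each user can make per day."""
    return daily_capacity(tps) / users


assert daily_capacity(7) == 604_800      # Bitcoin, whole network
assert daily_capacity(15) == 1_296_000   # Ethereum, whole network

# Hypothetical: one million users sharing the entire Ethereum network.
per_user = writes_per_user_per_day(15, users=1_000_000)
assert abs(per_user - 1.296) < 1e-9      # just over one write per user per day
```

    Under that assumption, a single application with a million active users would use up the entire network if each user made just over one data change per day, leaving nothing for any other application.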

    Primitive data storage. Although the state of the blockchain is already a key-value database, it is quite primitive. Lookup is possible only by primary key, and the amount of data that can be stored is very limited. For serious applications, this is clearly not enough.
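
    The difference between a lookup by primary key and a real search can be sketched with an ordinary dictionary standing in for the key-value state. The record layout and the functions `get` and `find_by_author` are hypothetical; the point is only that any query other than "give me the value for this key" has to scan everything.

```python
# A dictionary standing in for a key-value state (hypothetical records).
state = {
    "post:1": {"author": "alice", "text": "hello"},
    "post:2": {"author": "bob", "text": "hi there"},
    "post:3": {"author": "alice", "text": "second post"},
}


def get(key):
    """Lookup by primary key: a single direct access."""
    return state.get(key)


def find_by_author(author):
    """Search by a field: without a secondary index, every record is visited."""
    return sorted(key for key, record in state.items() if record["author"] == author)


assert get("post:2") == {"author": "bob", "text": "hi there"}
assert find_by_author("alice") == ["post:1", "post:3"]
assert find_by_author("carol") == []
```

    An application that needs such queries has to maintain its own indexes as additional key-value entries, and each index update is yet another transaction.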

    Thus, when developing applications on blockchains such as Ethereum, the data storage problem is very acute. At present there are no satisfactory ways to solve it.

    Yet existing applications, such as AKASHA, somehow manage to work around it... In the next part, we will look at the existing approaches to solving this problem.

    The second part of the article
    The third part of the article
