Understanding the Stellar Consensus Protocol

Original author: Bob Glickstein
  • Transfer


The Stellar consensus protocol was first described in a 2015 scientific article by David Mazier. This is a “federal system of the Byzantine agreement”, which allows decentralized computing networks without leaders to effectively reach consensus on any decision. The Stellar billing network uses the Stellar Consensus Protocol (SCP) to maintain a consistent transaction history that all members see.

Consensus protocols are considered difficult to understand. SCP is simpler than most of them, but still shares this reputation - in part because of the erroneous idea that the “federal vote,” which the first half of the scientific article is devoted to, is SCP. But this is not so! This is only an important building block, which in the second half of the article is used to create the actualStellar Consensus Protocol.

In this article, we briefly describe what a “system of agreements” is, what can make it “Byzantine,” and why make the Byzantine system “federal.” Then we explain the federated voting procedure described in the SCP article, and finally explain the SCP protocol itself.

Agreement systems


The system of agreements allows a group of participants to come to a consensus on a topic, for example, what to order for lunch.

We at Interstellar have implemented our own dining arrangement system: we order what our operations manager John says. This is a simple and effective agreement system. We all trust John and believe that every day he will find something interesting and nutritious.

But what if John abuses our trust? He can single-handedly decide that we should all become vegans. In a week or two, we will probably overthrow him and hand over the authority to Elizabeth. But suddenly she loves avocados with anchovies and thinks that everyone should become like that. Power corrupts. Therefore, it is better to find some more democratic method: some way to make sure that different preferences are taken into account, while ensuring a timely and unambiguous result so that no one orders lunch or five people place different orders, or the discussion drags on until the evening.

It would seem that the solution is simple: to vote! But this is a misleading impression. Who will collect the ballots and report the results? And why should the rest believe what he says? Maybe we can firstvote for the leader we trust to lead the vote - but who will lead this first vote? What if we can’t agree on a leader? Or if we agree, and this leader gets stuck in a meeting or leaves for sick leave?

Similar problems are found in distributed computer networks. All participants or nodes must agree on some solution, for example, whose turn it is to update the shared file or take the task from the processing queue. In a cryptocurrency network, nodes repeatedly have to choose how the full story looks from among several possible versions that sometimes conflict. This network agreement gives the recipient a guarantee that the coin is (a) valid (not counterfeit) and (b) not yet spent elsewhere. It also ensures that he can spend coins in the future because the new recipient will have the same guarantees for the same reasons.

Any matching system in a distributed computer network must be fault tolerant: it must give consistent results, despite errors, such as slow communication lines, non-responsive nodes, and incorrect message order. The Byzantine system of agreements is additionally resistant to “Byzantine” errors: nodes that provide false information, whether due to an error or in a deliberate attempt to undermine the system or gain some advantage. The “Byzantine” resiliency — the ability to trust a group decision, even when some members of the group may lie or otherwise not follow the rules of decision making — is called the parable of the Byzantine Empire generals who tried to coordinate the attack. Good descriptionAnthony Stevens.

Consider the owner of the crypto coin Alice, who must choose between buying a delicious ice cream from Bob and paying Carol's debt. Perhaps Alice wants to pay both at once, fraudulently spending the same coin. To do this, she must convince Bob's computer that the coin was never paid to Carol, and convince Carol's computer that the coin was never paid to Bob. The Byzantine system of conventions makes this virtually impossible using a majority rule form called a quorum. A node in such a network refuses to switch to a certain version of history until it sees that a sufficient number of peer nodes - a quorum - agree to such a transition. As soon as this happens, they will form a sufficiently large electoral block to force the remaining network nodes to agree with their decision. Alice may make some nodes lie on her behalf, but if the network is large enough, then her attempt will be suppressed by the voices of honest nodes.

How many nodes are required for a quorum? At a minimum, a majority, or rather, a qualified majority, to combat errors and fraud. But in order to calculate the majority, you need to know the total number of participants. At Interstellar's office or at district elections, these numbers are easy to recognize. But if your group is a poorly defined network into which nodes can enter and exit as desired without coordination with the center, then you need a federal Byzantine agreement system that can determine quorums not from a predetermined list of nodes, but dynamically, from a constantly changing and inevitably incomplete snapshot of nodes at a certain point in time.

It may not seem possible to create a quorum in terms of a single node in an extensive network, but it is possible. Such a quorum may even guarantee the results of a decentralized vote. The SCP white paper shows how to do this using a procedure called federated voting .

For the impatient


The rest of the article details the federal voting and the Stellar consensus protocol. If you are not interested in the details, here is a general overview of the process.

  1. The nodes conduct rounds of federal voting on the “nominees”. A round of federal voting means:
    • The node votes for any statement, for example, "I propose the value of V";
    • The node listens to the voices of the feasts until it finds one that can “receive”;
    • The node is looking for a “quorum” for this statement. The quorum “confirms” the nominee.
  2. As soon as a node can confirm one or more nominees, it tries to “prepare” a “ballot” through several rounds of federal voting.
  3. As soon as the node is able to verify the readiness of the ballot, he tries to feed it with even more rounds of federated voting.
  4. Once a site can confirm a newsletter commit, it can “externalize” the value of this newsletter, using it as a result of consensus.

These steps include several rounds of federated voting, which together form one round of SCP. We will examine in more detail what happens at every step.

Federated vote


Federated voting is the process of determining whether a network can agree on a proposal. In a round of voting, each node must select one of potentially many possible values. He cannot do this until he is sure that other nodes on the network will not choose a different result. To be sure of this, the nodes exchange a flurry of messages back and forth so that everyone confirms that the quorum of nodes makes the same decision . The rest of this section explains the terms in this sentence and how the whole procedure goes.

Quorums and Quorum Slices


Let's start by defining a quorum. As we discussed above, in a decentralized network with dynamic membership it is impossible to know in advance the number of nodes and, therefore, how much is needed for most. The federal voting solves this problem by introducing a new idea slice quorum (quorum slice): a small set of peers that unit trusts transmission of information on the status of voting in the rest of the network. Each node defines its own quorum slice (of which it becomes a member in fact).

Quorum formation begins with a quorum cut. For each node, nodes of its slice are added. Then, slicer members of these nodes are added.etc. As you continue, more and more nodes come across that you cannot add, because they are already included in the slice. When there are no more new nodes to add, the process stops: we formed a quorum by “transitive closure” by cutting off the quorum of the initial node.


To find the quorum from this node ...


... add the members of its slice ...


... then add the members of the slices of these nodes.


Continue until there are no nodes to add.




No nodes to add left. This is a quorum.

In fact, each node can go into more than one slice. To form a quorum, select only one of the slices and add members; then select any slice for each member and add members to thisslice and so on. This means that each node is a member of many possible quorums.


Select only one quorum slice at each step.






One possible quorum. Or an alternative ...


... select other slices ...




... (when possible) ...


... creates another quorum.

How does a node know what slices the other nodes are in? In the same way as other information about other nodes: from the transmissions that each node broadcasts to the network when its voting status changes. Each broadcast includes information about slices of the sending node. SCP technical document does not specify a communication mechanism. Implementations typically use the gossip protocol to guarantee broadcast messages across the network.

Recall that in a non-federal Byzantine system of agreements, a quorum is defined as the majority of all nodes. The Byzantine system of agreements was developed from the point of view of the question: how many dishonest nodes can the system endure? In a system of N nodes designed to survive with f failures (tricks), the node should be able to make progress by receiving a response from N − f peers, since f of them may not function. But having received a response from N − f peers, we can assume that all f peers (from which the node did not receive a response) are actually honest. Thus, f from N − f peers (from which a response is received) are malicious. For the nodes to come to the same consensus, the majority of the remaining nodes must be honest, that is, we need N − f to be greater than 2f or N> 3f. So usually a system designed to survive f failures, will have a total of N = 3f + 1 nodes and a quorum size of 2f + 1. As soon as the proposal crosses the quorum threshold, the rest of the network is convinced that any competing proposals will fail. So the network converges to the result.

But in a federal Byzantine system of agreements, not only can there not be a majority (because no one knows the overall size of the network), but the concept of a majority is generally useless! If the membership in the system is open, then someone can get the majority by simply conducting the so-called Sibyl attack: by repeatedly joining the network through several nodes. So why can the transitive closure of a slice be called a quorum , and how is it able to suppress competing offers?

Technically, no way! Imagine a network of six nodes, where two triples are isolated in slices of each other's quorum. The first subgroup can make a decision about which the second will never hear, and vice versa. There is no way for this network to reach consensus (except by chance).

Therefore, SCP requires that the network must have a property called quorum crossing for federated voting (and for applying important article theorems) . On a network with this property, any two quorums that can be built always overlap in at least one node. To determine the prevailing mood of the network is as good as having the majority. Intuitively, this means that if any quorum agrees with statement X, no other quorum can ever agree with something else, because it will necessarily include some node from the first quorum that has already voted for X.


If there is any network intersection of quorums ...


... then any two quorums that you can build ...


... will always intersect.





(Of course, overlapping nodes may turn out to be Byzantine-lying or bad in other respects. In this case, intersecting quorums does not help the network agree at all. For this reason, many of the results in the SCP technical paper are based on explicit assumptions, such as what remains on the network intersection of quorums even after removal of bad nodes . For simplicity, we leave these assumptions implicit in the rest of the article).

It may seem unreasonable to expect that in a network of independent nodes a reliable intersection of quorums is possible. But there are two reasons why this is so.

The first reason is the existence of the Internet itself. The Internet is an ideal example of a network of independent nodes with intersection of quorums. Most sites on the Internet connect to only a few other local sites, but these small sets overlap enough to make each site accessible from every other site on a particular route.

The second reason is specific to the Stellar payment network (the most common use of SCP). Each asset in the Stellar network has an issuer, and Stellar recommendations require that each issuer designate one or more nodes in the network to process repayment requests. It is in your interest to directly or indirectly include these nodes in quorum slices for each asset you are interested in. Then the quorums for all nodes interested in this asset will overlap at least in these repayment nodes. The nodes interested in several assets will include in their quorum slices all the repayment nodes of the respective issuers, and they will strive to combine all the assets together. In addition, any assets that are not related in this way to others on the network and should not be related- it is planned that there would be no quorum crossing for this network (for example, banks from the dollar zone sometimes want to trade with banks from the euro zone and banks from the peso zone, so they are on the same network, but none of them care about a separate network of children trading baseball cards).

Of course, pending quorum crossing is not a guarantee. Other Byzantine agreement systems owe much of their complexity to guaranteeing quorums. An important innovation of SCP is that it removes the responsibility for creating quorums from the consensus algorithm itself and brings it to the application level. Thus, although a federated vote is sufficiently general to vote on any issues, in fact its reliability critically depends on the broader meaning of these values. Some hypothetical uses may not be as convenient for building well-connected networks as others.

Voting, acceptance and confirmation


In the round of federated voting, the node optionally starts voting for some value of V. This means the message is sent to the network: “I am node N, my quorum slices are Q, and I vote for V”. When a node votes this way, he promises that he never voted against V and never will.

In broadcasts from peer-to-peer nodes, each node sees how others vote. As soon as the node collects a sufficient number of such messages, it can track quorum slices and try to find quorums. If he sees a quorum of peers who also vote for V, then he can proceed to acceptV and broadcast this new message to the network: "I am node N, my slices are quorum Q, and I accept V." Acceptance provides a stronger guarantee than a simple vote. When a node votes for V, it can never vote for other options. But if a node accepts V, no node on the Web will ever accept another option (Theorem 8 in the technical white paper SCP proves this).

Of course, it is highly likely that there will not immediately be a quorum of nodes that agree with V. Other nodes may vote for other values. But for the site there is another way to go from simple voting to acceptance. N can take a different value of W, even if it did not vote for it, and even if it does not see a quorum for it. To decide to change your voice, just see the blocking setnodes that accept W. A blocking set is one node for each of the slices of the quorums of N. As the name implies, it can block any other value. If all nodes in such a set accept W, then (by Theorem 8) it will never be possible to form a quorum assuming a different value, and therefore it is also safe for N to accept W.


A node N with three slices of quorums.


BDF is a blocking set for N: it includes one node of each of the slices N.


BE is also a blocking set for N, because E appears in two slices of N.

But the blocking set is not a quorum. It would be too easy to trick the node N into accepting the desired value if it is enough to crack just one node in each of the N slices. Therefore, the adoption of the value is not the end of the vote. Instead, N must confirm the value, that is, see the quorum of the nodes receiving it. If he goes this far, then, as the SCP white paper proves (in Theorem 11), the rest of the network will also end up confirming the same value, so N will end the federated vote with a specific value as the result.


Federated vote.

The process of voting, acceptance and confirmation is one full round of federated voting. The Stellar Consensus Protocol brings together many of these rounds to create a complete consensus system.

Stellar Consensus Protocol


The two most important features of a consensus system are safety and survivability . The consensus algorithm is “safe” if it can never give different results to different participants (a copy of Bob’s story will never contradict Carol). “Vitality” means that the algorithm will always produce a result, that is, it won’t get stuck.

The described federated voting procedure is safe in the sense that if a node confirms the value of V, no other node confirms the other value. But “does not confirm another meaning” - this does not mean that he will necessarily confirm something. Participants can vote for so many different values ​​that nothing reaches the threshold of acceptance. This means that there is no survivability in federal voting.

The Stellar Consensus Protocol uses federated voting in a way that guarantees both security and survivability. (The guarantees of safety and survivability of SCPs have a theoretical limit. The design chooses a very strong guarantee of safety, sacrificing a slight weakening of survivability, but given sufficient time, consensus is likely to be reached.) In a nutshell, the idea is to conduct several federated votes on multiple values ​​until one of them passes completely through all phases of SCP voting described below.

The values ​​by which SCP strives for consensus may be a transaction history or lunch order, or something else, but it is important to note that these are not the values ​​that are accepted or confirmed. Instead, federated voting takes place on claims of these values .

The first rounds of federated voting take place at the nomination phase, on a set of “I nominate V” applications, possibly for many different values ​​of V. The purpose of the nomination is to find one or more applications that go through acceptance and confirmation.

After finding confirmed candidates, SCP proceeds to the voting phase, where the goal is to find a ballot(i.e. a container for the proposed value) and a quorum that can declare a commit for it (commit). If a quorum commits a ballot, its value is taken as consensus. But before the node can vote for the ballot commit, it must first confirm the cancellation of all ballots with a lower counter value. These steps - canceling the ballots to find one for which you can confirm the commit - include several rounds of federated voting on several ballot statements.

The following sections describe in more detail the nomination and voting.

Nomination


At the beginning of the nomination stage, each node can spontaneously select the value V and vote for the statement “I am nominating V”. The goal at this stage is to confirm the nomination of a certain value through a federal vote.

Perhaps a sufficient number of nodes vote for rather different statements, and no nomination can reach the threshold of adoption. Therefore, in addition to broadcasting their own nominee votes, the nodes “reflect” the nominations of their peers. Reflection (echo) means that if the node votes for the nomination of V, but sees a message from the neighbor voting for the nomination of W, now it will vote for the nomination of both V and W. (Not all peer votes are reflected during the nomination because this can lead to an explosion of different nominees. SCP includes a mechanism for regulating these voices. In short, there is a formula for determining the "priority" of the feast from the point of view of the node, and only the votes of high-priority nodes are reflected. The longer the extension takes place, the lower the threshold, therefore the node expands a set of feasts whose voices he will reflect.

Conceptually, the nomination in parallel of both V and W are separate federal votes, each separately able to achieve acceptance or confirmation. In practice, SCP protocol messages pack these individual voices together.

Although voting for nomination V is a promise to never vote against nomination V, at the application level - in this case, SCP - it is defined what means “against”. SCP does not see a statement that contradicts the “I am nominating X” vote, that is, there is no “I am against nominating X” message, so the node can vote for nominating any values. Many of these nominations will lead to nothing, but in the end, the node will be able to accept or confirm one or more values. Once the nominee is confirmed, he becomes a candidate .


Nominating SCP using federated voting. There can be many “B” values ​​pushed by peers and reflected by the peer.

Nomination of candidates may result in several confirmed candidates. Therefore, SCP requires the application layer to provide some method of combining candidates into one composite.(composite). The union method can be anything. The main thing is that if this method is determined, then each node will unite the same candidates. In the lunch voting system, “association” can simply mean a rejection of one of the two candidates. (But in a deterministic way: each node must choose the same value to reset. For example, an earlier selection in alphabetical order). In the Stellar payment network, where the transaction history is voted on, combining the two proposed nominees involves combining the transactions that they contain and the last of their two timestamps.

The technical description of SCP proves (Theorem 12) that by the end of the extension phase, the network eventually converges to a single composite. But there is a problem: federated voting is an asynchronous protocol (like SCP). In other words, the nodes are not coordinated in time, but only according to the messages that they send. From the point of view of the node, it is unclear when the extension phase ended . And although all nodes will eventually come to the same composite, they can choose different routes along this path, creating different composite candidates along the way, and they can never tell which one is final.

But it normal. Nomination is just preparation. The main thing is to limit the number of candidates to achieve a consensus that occurs during the balloting process .

Ballot


The newsletter is a couple , where counter is an integer that starts with 1, and value is a candidate from the nomination stage. This can be a node’s own candidate or a neighbor candidate adopted by that node. Roughly speaking, during the run, repeated attempts are made to force the network to reach consensus on some candidate in a ballot by holding potentially many federated votes on ballot applications. The counters in the ballots keep track of the attempts made, and the ballots with higher counters take precedence over the ballots with lower counters. If the newsletter stuck, a new vote begins, now on the ballot .

It is important to distinguish between values (for example, what should be the order for lunch: pizza or salads), ballots (a pair of counter-value) and statements about the ballots. The SCP round includes several rounds of federated voting, in particular on such statements:

  • “I am ready to commit bulletin B” and
  • “I declare a commit to Bulletin B”

From the point of view of this node, consensus is reached when it finds Bulletin B for which it can confirm (that is, find a quorum that accepts) the statement “I declare a commit to Bulletin B”. From now on, you can safely act on the value specified in B - for example, place this order for lunch. This is called externalizing the value. Once the acceptance of the ballot has been confirmed, the node can be sure that any other node has externalized the same value or will definitely commit it in the future.

Although conceptually many federated ballots are held on applications for many different ballots, they do not exchange as many messages because each message encapsulates a number of ballots. One message, thus, promotes the state of many federated votes at once, for example: “I accept ballot commissions ranging from before ".

What do the terms “prepared” and “commit” mean?

A node votes to commit a ballot when it is convinced that other nodes will not commit ballots with different values. Convincing this is the purpose of preparing the statement. A vote that says “I'm ready to commit bulletin B” is a promise to never commit a bulletin less than B, that is, with a lower counter (SCP requires that the values ​​in the bulletins have a certain order. Thus, newsletter smaller if N1
Why does “I am ready to commit bulletin B” mean “I promise never to commit ballots less than B”? Because SCP defines abort as the opposite of commit. Voting for the preparation of a ballot also implies voting to cancel some other ballots, and, as we discussed earlier, voting for one thing is a promise to never vote against it.

Before broadcasting the commit, the site must first find the bulletin, which it can confirm as prepared. In other words, he holds a federal vote on “I'm ready to commit to Bulletin B,” perhaps for many different ballots, until he finds one that accepts the quorum.

Where do ballot papers for voting come from? First, the node broadcasts the preparations for voting for <1, ​​C>, where C is the composite candidate produced at the nomination stage. However, even after the start of preparations for voting, the nomination may lead to the appearance of additional candidates who will become new ballots. Meanwhile, peers can have different candidates, and they can form a blocking set that accepts “I am ready to commit B2 bulletin”, which will convince the node to accept it too. Finally, there is a timeout mechanism that generates new rounds of federated voting on new ballots with higher counters if current ballots are stuck.

As soon as the node finds Bulletin B, which it can confirm as prepared, it broadcasts a new message, “Bulletin B Commit”. This vote tells the peers that the node will never give up B. In fact, if B is a newsletter, then "Bulletin Commit "Means unconditional consent to vote for the readiness of each ballot from to <∞, s>. This additional value helps other nodes to catch up with a commit, if they are still in the earlier stages of the protocol.

At this stage, it is worth emphasizing once again that these are asynchronous protocols. Just because one node sends votes for a commit does not mean that its peers do it too. Some of them can still vote on applications in preparation for voting, others may have already externalized the meaning. SCP explains how a node should process each type of peer message regardless of its phase.

If the message "I declare a commit»Cannot be accepted or confirmed, that is, the likelihood of receiving or confirming a message or - or, in any case, any newsletter with a value of C, and not any other, since the node has already promised never to cancel . By the time the node broadcasts the votes for the commit, it will be C or nothing, depending on how far the consensus goes. However, this is not enough for the node to externalize C. Some Byzantine feasts (less than a quorum, based on our assumptions about security) may lie to the node. The adoption and then confirmation of a particular ballot (or range of ballots) is what gives the node the confidence to finally externalize C.


Balloting SCP through federated voting. Not shown: at any time a timer may work, increasing the counter in the ballot (and, possibly, producing a new composite from additional nominated candidates).

And it's all! Once the network has reached consensus, it is ready to do it again and again. On the Stellar billing network, this happens approximately every 5 seconds: a feat that requires both security and survivability, guaranteed by SCP.

SCP can achieve this by relying on several rounds of federated voting. Federated voting was made possible thanks to the concept of quorum slices: sets of peer nodes that each node decided to trust as part of its (subjective) quorum. This configuration means that you can reach consensus even on a network with open membership and Byzantine deceit.

Further reading


  • The original SCP technical document can be found here , and here is the draft specifications for its implementation.
  • The original author of the SCP protocol, David Mazier, simplifies (but still technically) explains it here .
  • You may have been surprised not to find the terms “mining” or “proof of work” in this article. SCP does not use these methods, but some other consensus algorithms do. Zane Witherspoon wrote an accessible review of consensus algorithms .
  • A step-by-step description of a simple network that achieves consensus in one full round of SCP.
  • For readers interested in SCP implementations: see the C ++ code used by the Stellar payment network, or the Go code I wrote for a better understanding of SCP.

Also popular now: