NeoResearch has begun the initial implementation of its dBFT 3.0 consensus mechanism design. The proposed upgrade would introduce a redundant primary node in each round of consensus in order to mitigate liveness defects caused by view change events.
The team had completed the original specification for dBFT 3.0 back in 2020. Since then, consensus work had been deprioritized within Neo core development so the team could focus on the N3 changes, and dBFT 2.0 had already proven itself stable.
At the time Neo launched, its choice of consensus mechanism set it apart. Most blockchains of the era opted for Nakamoto consensus or a variant of it, relying on Proof of Work to incentivize block production and drive convergence on a canonical state. A competitive mining market delivers the primary benefit of this style of consensus: a guarantee of liveness. There is money to be made in mining new blocks, so someone should always be mining, and you should always be able to get a transaction processed.
The price of this liveness is a sacrifice in safety. Multiple competing versions of the same blockchain, called forks, can exist at once, and recent blocks (and the transactions they include) have no guaranteed finality until they are cemented deep enough in the chain to be considered canonical. In blockchain security terms, that is the point at which a chain reorganization attack is no longer cost-effective. In practice, it is simply the number of confirmations you have to wait for until both you and the recipient are satisfied that the transaction has really been completed.
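Waiting for confirmations lowers the risk of a reorganization but never removes it. As an illustration of probabilistic finality in general (not of any particular chain's confirmation policy), the sketch below ports the attacker catch-up calculation from the Bitcoin whitepaper to Python: the probability that an attacker controlling a fraction q of the hash power ever overtakes an honest chain that is z blocks ahead. The function name is our own.

```python
import math


def attacker_success(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever catches up
    from z blocks behind (the Bitcoin whitepaper's calculation)."""
    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        return 1.0  # a majority attacker eventually catches up
    lam = z * (q / p)  # expected attacker progress while z honest blocks are mined
    total = 1.0
    for k in range(z + 1):
        # Poisson probability that the attacker has mined exactly k blocks.
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        # Subtract the cases where the attacker never closes the remaining gap.
        total -= poisson * (1 - (q / p) ** (z - k))
    return total
```

With q = 0.1, the probability drops below 0.001 after five confirmations and keeps shrinking with every extra block, but it never reaches zero. That residual risk is exactly what dBFT's up-front commitment removes.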
Neo approached the problem from the opposite side, deriving its solution from the classic PBFT (Practical Byzantine Fault Tolerance) algorithm. Rather than PBFT’s stable leader, Erik Zhang’s dBFT introduced a rotating leader among token-elected consensus nodes. The leader of each three-phase consensus round, called the primary, proposes a new block containing new transactions. The other nodes receive the proposal, verify the block, and sign a commitment to it. Once enough commitments have been gathered, the block is added to the chain.
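The round described above can be sketched as a toy simulation. This is a minimal model under stated assumptions, not Neo's implementation: block validity is reduced to a boolean, message passing is instantaneous, and the quorum of n - f signatures (with f = (n - 1) // 3 tolerated faults) is the standard BFT threshold rather than a figure quoted in this article. The message names PrepareRequest and PrepareResponse follow dBFT terminology; the helper functions are hypothetical.

```python
def quorum(n: int) -> int:
    """Signatures needed to finalize a block: n - f, with f = (n - 1) // 3."""
    f = (n - 1) // 3  # maximum number of faulty validators tolerated
    return n - f


def run_round(n: int, primary: int, honest: set[int]) -> bool:
    """Simulate one consensus round; return True if a block is committed.

    Toy assumptions: a faulty primary always proposes an invalid block, and
    faulty replicas never send any messages.
    """
    # Phase 1 (PrepareRequest): the primary proposes a block.
    block_valid = primary in honest
    # Phase 2 (PrepareResponse): honest replicas verify the block and endorse
    # it only if it is valid.
    endorsements = sum(1 for i in range(n) if i in honest and block_valid)
    prepared = endorsements >= quorum(n)
    # Phase 3 (Commit): honest nodes that saw a quorum of endorsements sign a
    # commitment; the block is appended once enough commitments exist.
    commits = sum(1 for i in range(n) if i in honest and prepared)
    return commits >= quorum(n)
```

With seven validators and node 3 faulty, `run_round(7, 0, set(range(7)) - {3})` commits a block, while `run_round(7, 3, set(range(7)) - {3})` does not. That stalled round is the situation a view change exists to resolve.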
By requiring more than ⅔ of the validator set to commit to a new block up front, dBFT ensures there is never any inconsistency in the true state of the blockchain. As a user, if your transaction is included in a block (even the most recent one), it is 100% final and cannot be undone.
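The safety claim above rests on a counting argument. Any two quorums overlap in more than f validators, so they always share at least one honest node, and an honest node does not sign two different blocks at the same height. The check below verifies that arithmetic for a few validator-set sizes. It assumes the standard n - f quorum with f = (n - 1) // 3 and does not model dBFT's messages.

```python
def max_faulty(n: int) -> int:
    """Largest number of Byzantine validators tolerated: f = (n - 1) // 3."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Signatures required per block: n - f."""
    return n - max_faulty(n)


def min_overlap(n: int) -> int:
    """Smallest possible intersection of two quorums drawn from n validators."""
    return max(0, 2 * quorum(n) - n)


for n in (4, 7, 10, 21):
    f, q, overlap = max_faulty(n), quorum(n), min_overlap(n)
    # A quorum is strictly more than two-thirds of the validator set.
    assert 3 * q > 2 * n
    # Two quorums share more than f members, so at least one of them is honest.
    assert overlap > f
    print(f"n={n}: f={f}, quorum={q}, minimum overlap={overlap}")
```

For seven validators, this gives f = 2 and a quorum of 5. Any two quorums of five share at least three validators, so at least one of them is honest. That shared honest node is what prevents two conflicting blocks from both being finalized.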
The tradeoff for this guarantee of safety is that the validators won’t always reach the required agreement on a new block in a timely manner. This can happen because of hardware errors, DoS attacks, or malicious replicas or primaries that refuse to provide signatures or propose valid blocks.
If not enough signatures are gathered in a given round of dBFT consensus, a view change event occurs. Inherited from PBFT, the view change is a fundamental part of many (though not all) BFT-based mechanisms.
Consider, for example, a case where only the primary (the leader and block proposer) is faulty and the other consensus nodes (replicas) are honest. If the primary submits an invalid block, the replicas reject it. The required signatures are not gathered in the consensus round, and no block is produced in this window of time (a 15-second period).
Now the leader needs to change. Otherwise, it will submit another faulty block and waste time again. The view change selects the next replica in line as the new primary for this round of consensus. In other words, that replica now proposes the new block for this block height.
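Selecting "the next replica in line" can be sketched as a simple round-robin rule. The formula is an assumption for illustration, because the article does not specify it: the primary for a given block height and view number is taken to be (height - view) mod n. The helper names are hypothetical.

```python
def primary_index(height: int, view: int, n: int) -> int:
    """Validator index acting as primary at this height and view.

    Assumed rule: (height - view) mod n. Python's % returns a non-negative
    result for a positive modulus, so negative differences wrap correctly.
    """
    return (height - view) % n


def primaries_for_height(height: int, n: int, views: int) -> list[int]:
    """Primaries tried at one height as consecutive view changes occur."""
    return [primary_index(height, view, n) for view in range(views)]
```

With seven validators at height 100, the first three views would be led by validators 2, 1 and 0. Over n consecutive views, every validator gets exactly one turn, so a single faulty primary (with honest replicas) costs at most one view.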
When a set of Neo nodes fails to reach consensus on a block in a given round, the 15-second window passes without any new transactions being confirmed on the network. In other words, the network loses liveness. Because Neo’s dBFT uses exponential backoff to guarantee that consensus is eventually restored, this window doubles with every consecutive view change.
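The cost of consecutive view changes adds up quickly. The sketch below assumes a simplified timer in which the timeout for view v is the 15-second block interval multiplied by 2^v. Actual node timers may use a different base or offset, so treat the numbers as illustrative. The function names are hypothetical.

```python
BLOCK_TIME_S = 15  # block interval cited in the article, in seconds


def view_timeout(view: int, base: float = BLOCK_TIME_S) -> float:
    """Timeout for a given view, doubling with each consecutive view change.

    Simplified model: base * 2**view.
    """
    return base * 2 ** view


def stall_before_view(view: int, base: float = BLOCK_TIME_S) -> float:
    """Total time spent in failed views 0..view-1 before `view` begins."""
    return sum(view_timeout(v, base) for v in range(view))
```

Under this model, the timeouts for views 0 to 3 are 15, 30, 60 and 120 seconds. Reaching view 3 therefore means the network has already stalled for 15 + 30 + 60 = 105 seconds, which matches the closed form 15 × (2^3 - 1).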
Put simply, dBFT favors safety over liveness. The dBFT 2.0 iteration brought significant upgrades to the original design, among them a new recovery system that quickly brings faulted nodes back online and up to speed with consensus. These changes significantly improved the liveness of the Neo Legacy chain. But if the node that is currently the primary has a fault, there is no easy fix for the resulting liveness failure, even if the outage lasts only one block interval.
The dBFT 3.0 specification introduces a solution to this problem. The proposal would change dBFT from one speaker to two in each consensus round: a priority primary and a fallback primary. In each round, the priority primary sends the first “prepare-request”, seeking approval of its block from the other replicas. After a delay, the fallback primary also sends its own prepare-request to the other nodes.
dBFT 3.0 also introduces an additional consensus stage, the “pre-commit” phase, in which honest nodes commit to either the priority block or the fallback block. In many cases, the “pre-commit” agreement can be skipped, since any node that has received proof of enough other commitments to the priority block can initiate the “commit” step for it. In other cases, nodes will have the chance to commit to the priority or the fallback block, but not both.
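The commit rule described above can be sketched from a single node’s point of view. This is a hypothetical model, not the NeoResearch specification. It assumes that the node counts endorsements and pre-commit signatures for each block, that both use an n - f quorum, and that the node locks its choice once made, so it can never commit to both blocks at one height.

```python
from dataclasses import dataclass, field
from typing import Optional

PRIORITY, FALLBACK = "priority", "fallback"


def _empty_tally() -> dict[str, set[int]]:
    return {PRIORITY: set(), FALLBACK: set()}


@dataclass
class NodeState:
    """One node's view of a double-speaker round at a single height."""

    n: int  # size of the validator set
    endorsements: dict[str, set[int]] = field(default_factory=_empty_tally)
    precommits: dict[str, set[int]] = field(default_factory=_empty_tally)
    committed_to: Optional[str] = None  # locked once set

    def quorum(self) -> int:
        return self.n - (self.n - 1) // 3

    def try_commit(self) -> Optional[str]:
        """Return the block this node commits to, or None if it must wait."""
        if self.committed_to is not None:
            return self.committed_to  # already locked: never switch blocks
        q = self.quorum()
        if len(self.endorsements[PRIORITY]) >= q:
            # Fast path: enough endorsements of the priority block let the
            # node commit straight away, skipping the pre-commit phase.
            self.committed_to = PRIORITY
        else:
            # Slow path: commit to whichever block has a pre-commit quorum.
            for choice in (PRIORITY, FALLBACK):
                if len(self.precommits[choice]) >= q:
                    self.committed_to = choice
                    break
        return self.committed_to
```

Consider a node that has committed to the fallback block after five of seven pre-commits. If a full set of priority endorsements arrives later, the node still returns "fallback". The lock is what enforces "one or the other, but not both".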
The payoff is that an alternate block is essentially always available. If the priority primary quickly gathers enough approvals for its block, the mechanism disregards the fallback primary and plays out the same way as dBFT 2.0, resulting in fast, final consensus on the next block. If something is wrong with the priority primary, the fallback ensures that the other nodes can still reach consensus and deliver a block without a break in liveness.
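This payoff can be illustrated with a toy timing model that compares a single block height under one speaker and under two. All parameters are hypothetical. The model assumes that the fallback primary proposes after a fixed delay, that each proposal needs a fixed time to gather a quorum, and that nodes finalize whichever block reaches a quorum first. The results are not measurements of dBFT 3.0.

```python
from typing import Optional


def single_speaker(primary_ok: bool, quorum_time: float, timeout: float) -> float:
    """dBFT 2.0-style height: a faulty primary costs a full timeout first."""
    return quorum_time if primary_ok else timeout + quorum_time


def double_speaker(priority_quorum_at: Optional[float],
                   fallback_delay: float,
                   quorum_time: float) -> tuple[str, float]:
    """Return (winning block, seconds until it is finalized).

    priority_quorum_at is None when the priority primary never gathers a
    quorum, for example because it is faulty.
    """
    # The fallback block can be finalized once its delay and quorum time pass.
    fallback_at = fallback_delay + quorum_time
    if priority_quorum_at is not None and priority_quorum_at <= fallback_at:
        return "priority", priority_quorum_at  # behaves like dBFT 2.0
    return "fallback", fallback_at
```

Assume a 2-second quorum time, a 5-second fallback delay and a 15-second timeout. A healthy priority primary then finalizes at 2 seconds, just as in the single-speaker case. If the priority primary is faulty, the fallback block is finalized at 7 seconds, while the single-speaker model waits 17 seconds for the next view.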
The NeoResearch spec also came with several other suggestions, detailed more comprehensively in its paper “Challenges of PBFT-Inspired Consensus for Blockchain and Enhancements over Neo dBFT”. These included two alternative upgrades to dBFT 2.0 that do not involve double speakers.
One enhancement, dBFT 2.0+, was designed to improve the synchrony condition of dBFT 2.0 consensus by having nodes record which transactions in a proposed block were verified successfully, even if the block itself did not pass consensus. With enough of these proofs, the next primary should always include those transactions. This could speed up block creation by reusing already-approved transactions: the next primary creates a similar block without any of the problematic parts.
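The dBFT 2.0+ idea can be sketched as a tally of per-transaction verification proofs. The helper below is hypothetical. It assumes that each node reports the set of transaction hashes it verified from a failed proposal, and that a transaction is carried into the next proposal once it has n - f such proofs. The article only says “enough” proofs, so the threshold is an assumption.

```python
from collections import Counter


def split_failed_proposal(proposal: list[str], reports: list[set[str]],
                          n: int) -> tuple[list[str], list[str]]:
    """Split a failed block's transactions into (carried over, dropped).

    proposal: transaction hashes in the failed block, in block order.
    reports: one set of successfully verified hashes per reporting node.
    n: size of the validator set.
    """
    threshold = n - (n - 1) // 3  # assumed n - f proof threshold
    proofs = Counter(tx for report in reports for tx in report)
    carried = [tx for tx in proposal if proofs[tx] >= threshold]
    dropped = [tx for tx in proposal if proofs[tx] < threshold]
    return carried, dropped
```

Take four validators (threshold 3) that report {a, b}, {a, b}, {a, b, c} and {a}. Transactions a and b are carried into the next block, while c, which most nodes could not verify, is left out.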
Another proposed extension took dBFT 2.0+ one step further. NeoResearch highlighted two main attack paths for dBFT consensus: Type-1, where a node dies for some reason, and Type-2, where a node commits to both a priority and a fallback block proposal. If it goes unnoticed, a Type-2 attack could lead to “sporks”, a type of dead-on-arrival fork event that caused issues on the Neo Legacy chain.
The double-speaker upgrade protects against Type-1 attacks, since a faulty primary will not necessarily prevent a block from being produced in the allotted time. This particular proposal, however, targeted Type-2 attacks. Because commitment hashes are recorded as part of the “pre-commit” phase, it becomes essentially impossible for a malicious node to commit to both a priority and a fallback block without being noticed by the other nodes. The other nodes would respond by blacklisting it, which would prevent any block conflicts that could otherwise occur.
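The detection side of this proposal can be sketched as a log keyed by validator and height. This is a hypothetical structure, not the specification’s data format. `commitment_hash` stands in for whatever digest nodes actually record, and an equivocating validator is simply added to a local blacklist.

```python
import hashlib


def commitment_hash(height: int, block_id: str) -> str:
    """Hypothetical stand-in for the digest recorded with each commitment."""
    return hashlib.sha256(f"{height}:{block_id}".encode()).hexdigest()


class CommitLog:
    """Records which block each validator committed to at each height."""

    def __init__(self) -> None:
        self.seen: dict[tuple[int, int], str] = {}
        self.blacklist: set[int] = set()

    def record(self, validator: int, height: int, block_id: str) -> bool:
        """Store a commitment; return False and blacklist on equivocation."""
        digest = commitment_hash(height, block_id)
        # Keep the first commitment seen for this validator and height.
        previous = self.seen.setdefault((validator, height), digest)
        if previous != digest:
            # Same validator, same height, different block: a Type-2 attempt.
            self.blacklist.add(validator)
            return False
        return True
```

If validator 4 commits to the priority block at height 100 and later to the fallback block at the same height, the second `record` call returns False and validator 4 lands on the blacklist. Repeating an identical commitment is harmless.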
NeoResearch also explored the throughput implications of the double-speaker design and provided results from a Mixed-Integer Linear Programming (MILP) model used to verify consensus behavior across a range of scenarios. Its experimental results can be found in the specification at the link below: