In the previous Road to Neo3 article, we considered the potential benefits to be gained from decoupled state distribution through the P2P protocol, along with the possible ways to implement it. Today’s article aims to provide more information on Merkle roots and Patricia tries, fundamental components required for state root derivation, and how they can be used for simplified payment verification by light clients.

Following on from discussions noted in the previous article, Neo software engineer Zhang Tao shared an amended proposal which summarized the current solution being developed. Based on Solution 1 from the initial round of proposals, state will be designated in the PrepareRequest consensus messages, and the signatures will be broadcast as part of the Commit message. After each consensus round, the block and state roots can be relayed for interpretation by nodes on the network.

Tao also outlined the required steps needed to build out the feature, such as the required modifications to P2P messages and the addition of functionality needed for RPC nodes to provide Merkle proofs. These proofs are required to support simplified payment verification (SPV), a technique initially described by Satoshi Nakamoto in the original Bitcoin whitepaper, which is used to allow light clients to validate transactions without maintaining a copy of the whole blockchain.

The most important component required for state root implementation is the Merkle Patricia trie (MPT). In the first episode of the RTN3 series on state root, we touched briefly on MPTs, which are a combination between a Patricia trie and a Merkle root.

Patricia trie

A trie is a data structure consisting of multiple vertices connected by branches (edges), each labelled with a single character. Depending on the path followed down the tree, different values can be found. A radix tree is a compressed version of a trie; instead of a single character, each edge is labelled with a string that acts as a common prefix for subsequent vertices.

Patricia tries are a special type of radix tree, also known as binary radix trees. As the name implies, in a binary radix tree each vertex is followed by two edges at most. This means that at any given vertex, a maximum of two paths are available. Stored values with characters in common are filed under a shared prefix.

Left: A trie that stores 5 animal types across 13 nodes
Right: A Patricia trie storing 10 words across 11 nodes (net, network, networked, etc)

The example above demonstrates the efficiency of radix trees. Rather than storing a single letter on each edge, frequently leading to vertices with only a single child node, the edges are labelled with partial strings.

By organizing the stored words by shared prefixes, the data structure is optimized for storage space and the average lookup speed is also improved. This is very important for a blockchain node, which is expected to store and compute over a large amount of data.

Merkle tree

A Merkle tree is a structure that starts with several data chunks. The values in these chunks are hashed, forming the first row of vertices in the tree. Every subsequent vertex in the Merkle tree is also represented by a hash, which is determined by adding together the hashes of the parent vertices and hashing the result.

This trail of hashes will eventually lead up to a single common vertex, known as the Merkle root or root hash. The Merkle root can be thought of as a unique, cryptographic fingerprint of the data in the tree; any changes in the stored values would change all connected hashes, resulting in a different fingerprint.

All nodes on the Neo network will calculate a local state and determine a state root after each block. By comparing this state root against the state root signed by consensus nodes, nodes can verify consistency of state with the rest of the network.

Merkle proofs are an interesting property of these trees, which provide a mechanism by which it is possible to determine whether a particular key-value can be found within the tree. When a light client requests data from an RPC node, the data can be accompanied by a Merkle proof that a particular value (such as a user’s token balance) exists within the tree.

This means that light clients can verify information from RPC nodes in a trustless manner, requiring only the Merkle proof and the latest state confirmation from consensus nodes received from the P2P network.

Left: An example Merkle tree, the green node holds the tree’s Merkle root (root hash)
Right: An example Merkle proof, the green nodes are used to prove that the value at c is correct

Simplified payment verification

For an example, consider the Merkle proof diagram above. Assume that a light wallet user wants to send a transaction, but before doing so it wants to verify that the RPC node is providing accurate token balance information. This balance data is stored in data chunk c.

Since the light wallet does not store the blockchain, it requests a Merkle proof from the RPC node. The RPC node provides the balance value from c along with a “Proof of Path,” a list of the hashes required to move up the tree from c to the state root. In this case, the RPC node would provide the token balance value for c along with the values determined by hash(d) and h(h(a) + h(b))).

With these values, the light wallet can begin verification. Firstly, the balance value of c is hashed. This hash can then be added to the result of hash(d), which is hashed again in order to determine the result of h(h(c) + h(d))).

This process is repeated once more, adding the newly derived hash h(h(c) + h(d))) to the hash h(h(a) + h(b))) provided by the RPC node, hashed one last time to determine the Merkle root. Equipped with the Merkle root, verification can now be performed.

If the token balance value at c and proof of path provided by the RPC node is correct, this process should determine the same Merkle root that has been signed and broadcast by consensus nodes over the P2P network. If the balance or proof of path is incorrect, a different state root will be calculated, proving that the RPC node is providing faulty information that is not consistent with the consensus nodes.

The light wallet has now verified the balance data without storing a copy of the blockchain, guaranteeing that the RPC server is providing information that is consistent with the consensus nodes.