In the last Road to Neo3 article we covered Patricia tries, Merkle trees, and how Merkle proofs can be used to provide lightweight transaction verification. For the final episode in the state root series, we’ll combine the first two to understand Merkle Patricia tries (MPT).

Node storage

Each node on the Neo blockchain processes the data from blocks, reading and executing each transaction in order. Some transactions are read-only, such as a query for the total supply of a particular asset, but others will require changes to the node’s storage. For example, a transfer of tokens from one address to another would modify the stored balance values for both addresses.

Nodes store data in the form of key-value pairs. A key can be thought of as a name or unique identifier which corresponds to a particular piece of data, such as the name of a smart contract or the token balance information for a particular address.

In order to locate the particular value to be changed, the key is used. Since every node processes every transaction in the exact same order, we expect each node to determine the same keys and values, and therefore share the same state.

To verify that the nodes do indeed share the same state, we can use a Merkle Patricia trie. As we outlined in the previous episode, the Patricia trie component is a data structure that organizes keys by any shared prefixes, and the Merkle tree component provides a cryptographic fingerprint of the whole structure that can be used to authenticate its state.

Merkle Patricia trie

In the image below, we have a simplified Merkle Patricia trie that stores three key-value pairs. You can think of this tree as a snapshot of a node’s storage. In our case, the trie shows how three pieces of balance data are stored.

At the top of the trie, we have the first node which contains the key b3, the shared prefix of our three keys b31d4, b38a2, and b3d6f. If we had more keys that started with b3, they would all be found on a child node that branched out from this same point.

Simple example of a Merkle Patricia trie

To discover the values of each key, we follow the trie down just as we did with the Patricia trie. In the next node, n2, we have 16 branches to choose from, each corresponding to one of the 16 possible hexadecimal digits (0-9, a-f). For example, if we wanted to find the value that corresponds to the key b38a2, we would follow the branch from 8, leading us to node n4. In our simple trie, this node holds the last part of our key, a2, and points us to the stored value, 13.12 GAS.

Inversely, we can also follow the trie upwards to discover the Merkle root, starting with the stored values just as we did with the Merkle tree in the previous episode. To find the hash of n3, n4, and n5, we first need the hashes of each value. We can hash each of those nodes together with their hashed values in order to determine the hash of n2. With the hash of n2, we can determine the hash of n1, which corresponds to our Merkle root.

If all nodes on the network have the exact same information in storage, they will all share the same Merkle root. Conversely, if a node has different data stored, such as an incorrect value for a particular key, the hashing process will result in a completely different Merkle root being generated. This makes it very easy to check if a node is out of alignment with the view held by consensus nodes.

MPT implementation

The implementation of MPT for Neo is ongoing, with development efforts currently being led by Zhang Tao, a software engineer for Neo Global Development. As with our example above, three main node types are used in the structure:

  1. Full node (branch node) – has a branch for each hex digit plus a value (n2)
  2. Short node – has only one key along with a pointer. Two types exist:
    • Extension node – one key and a pointer to the next node (n1)
    • Leaf node – one key that points to a value node (n3, n4, n5)
  3. Value node – stores the final value for a key

Additionally, there are also hash nodes, which corresponds to the hash of a particular node in the trie. These are used for efficiency—since many stored values are not immediately required, there is no need to store the full data for a particular node.

If this information is required later, we can determine the correct value using the state root and Merkle proofs, as Tao explains:

“When we store a node, n1 for example, [we] use a HashNode to represent n2 in [the] next part. So in [the] database, the key is [the] hash of n1 and the value is n1’s key part and hash of n2. So we can use the database and just the state root hash to build the whole tree part by part [that] we need.”

Following the implementation of MPT and other required components, both full nodes and light nodes on the Neo network will be able to provide strong storage consistency guarantees. Additionally, it paves the way for new use cases, such as facilitating local computation on light nodes through checkpointing, and ensures that node storage is auditable and trustworthy, a requirement for both normal end users and enterprise applications alike.