Bacoor Logo.png

Data structure in Ethereum. Episode 3: Patricia trie.

Image source: /r/ethereum by BitcoinIsTehFuture

 

Patricia trie is a main trie used in Ethereum to store data. It is a mixture of Radix trie and Merkle trie.

In order to make it easy to understand, we will give you to an example about an inefficient Patricia trie, then move to the “improved” Patricia trie. This way will help us know the reason why some structures have been generated.

 

https://gist.githubusercontent.com/sontuphan/2a26d8416d0517797fb68e536138f01b/raw/dd6d026cbd92db9500acf04a365fd26f45706dc0/merkledataset.js

 

 

Terminologies

 

 

A node in Ethereum is stored as a key-value with key is hash of node. Value is an array with 17 elements. In 17 elements, first 16 elements are indexed by hex number from 0 to f and the last one is data contained by that node.

Besides, please take notice that key is used for “database lookup” (it means to find a node by database mechanism) and path is used for “trie lookup” (it means to find data by path descending as Radix trie).

 

Inefficient Patricia trie

We build a trie that represents our dataset as below.

To test it again, let’s try to search the value of 395 path step by step.

Notations: “ this is database lookup” — tdl, “this is trie lookup” — ttl;

  1. We descend 395to 3 parts 3, 9, 5 and use them in sequence.

  2. Starting at rootHash, we have to find rootNodecorresponding with rootHash (tdl).

  3. First part of path is 3, so we get element indexed by 3of rootNode and it is hashA (ttl).

  4. Looking for hashA(tdl), getting element indexed by 9 (ttl) and value of element is hashC.

  5. Looking for hashC(tdl), getting element indexed by 5 (ttl)and value of element is hashD.

  6. At this time, we descend entire the path so we will get value contained in data element (the last element) of node corresponding with hashD(ttl), and the result is duck.

You can see that, we use path to find the value (an attribute of Radix trie) and if a value in trie is changed, it will cause the rootHash of trie to change (an attribute of Merkle trie).

Moreover, trie has so many node with null value in data element and we have to improve it for more efficiency.

 

“Improved” Patricia trie

The best answer is already in Ethereum wiki. Please check it right here.

 

 

I have been mentioned about leaf node and extension node in Episode 1+, but we still don’t know what they really are. We met situations that make our trie become degraded and those are the long paths without branch (no divergent path).

For example: 56f0, to get horse value, you need to descend too many empty-value nodes.

But, this leads to 2 sub-problems:

  1. No divergent path that points to a data at the end. For example: 56f0.

  2. No divergent path is branched in the middle. For example: cabof {cabe, cab8}.

To solve the first one leaf node is introduced, and to solve the second one they introduced extension node. They are nodes in form of an array with 2 elements, the first one is partialPath that helps to reduce empty-value node, while the second one contains value that will be data if leaf or merkleHash if extension.

 

 

Using leaf node

 

hashE now becomes a leaf node, to get value of 56f0 path we can do like:

  1. Get element with index 5of rootHash and the value is hashE.

  2. Because hashE may be a leaf or extension node, we have to compare the rest of path to partialPath of hashE. The rest of path is 6f0 and partialPath is 6f0, so node is leaf node, return data field and result is chicken.

 

Using leaf node and extension node

 

hashB now becomes a extension node. For example of getting data of cab8path:

  1. Get element indexed by cof rootHash, value is hashB.

  2. We can see hashBis a node as an array with 2 values, so we keep comparing the rest of path with partialPath to know which kind of node is that. The rest of path is ab8 and partialPath is ab, it means node is extension We remove partialPath from the current rest of path and we get the new rest of path is 8 and the next hash is hashJ.

  3. Finding node corresponding with hashJ, getting element indexed by 8 and the value is hashK. The rest of path is empty now.

  4. Finding hashKand we are received a node with empty partialPath (leaf node because the rest of path is equal to partialPath), return dog.

In addition, we can see hashK and hashL are leaf node as well. And actually, our trie is still not optimized completely.

Optimized trie

 

Patricia trie

Two examples above help us understand why and how Patricia trie was built and improved. Now we will complete it and get final Patricia trie which is used by Ethereum.

Some additional rules:

  1. Every partialPathwill be HP encoded before hand.

  2. Every elements in a node will be RLP encoded.

  3. Value (Node) will be RLP encoded before stored down.

 Patricia trie

 

Conclusion

Everything I say from episode 1 to episode 3 is all about some theoretical stuff, and it seems to be not suitable to explain how Patricia trie works in real life. In the next episode, We will do an example to show how to connect to database and how trie is organized.

Something you need to go first is about geth, web3 and levelDB.

 

Reference

Document from wiki Ethereum

Share on Facebook
Share on Twitter
Please reload

Please reload

Copyright © 2019 Bacoor Inc. All rights reserved.