Bacoor Logo.png

Data structure in Ethereum. Episode 1+: Compact (Hex-prefix) encoding.

 

In the episode 1, we discussed about RLP Encoding/Decoding. However, there is still another encoding of Ethereum, called Compact encoding, or Hex-prefix (HP) encoding. So, what are the differences between RLP and HP encoding? And how does HP encoding work? This article will help you get to this.

For starters, I am going to explain some terminologies. They may seem confused at first, but you’ll find them very helpful to understand this article later, as well as in the next episode about the trie in Ethereum.

 

Some terminologies

Let’s play a game. I will give you a map and a path, you have to find the animal I want.

The path is: 3–2–3

https://gist.githubusercontent.com/sontuphan/a9b2c401dc750292695221aa71a8f0a6/raw/87524eccbc1f5dfbb64b24a4f77fe9c75ba24c9b/trie.js

 

 

In order to solve this game, we do step by step following:

  1. Start at Root and the first element in path is 3, it means we must get the element of Root which is labeled by number 3 and we can see that the value of that element is A.

  2. Find node A, the next element in path is 2 so we get the value in node A is D.

  3. Find node D, the last element in path is 3, so obviously the value we need to search for is ‘Whale’.

Now, try to apply this method with another path: 3–1–3–2. We’ll get the result as ‘Chicken’.

By this game we get familiar with some of the following terminologies:

  • Key: Root, A, B, C, D and E in game is keys

  • Node: the content corresponding with the key in the right part of each row. For example: {1: ‘Dog’, 2: B, 3: A}.

  • Path: path is like 2–2–3 in game.

  • Value: you can see it has some elements in all nodes, every element is a key-value pair. Value is the right part of element, which can be a key or a name of animal like in the examples.

  • Nibble: hex form of 4 bits is a nibble. For example: 0x1, 0x4, 0xf…

Differentiate RLD from HP encoding

It really easy to differentiate RLP from HP encoding. RLP is used for encoding/decoding Value and HP encoding is used for encoding/decoding Path.

 

HP encoding goals

In this part, we are going to introduce to you about 2 more terminologies: leaf and extension. The details of them will be explained more in the episode 2, but firstly, you need to grasp the basic ideas below.

Leaf and extension are 2 kinds of node, however path of leaf has terminator and extension does not. Terminator is the last byte of the path and has value of 16 in dec or 0x10 in hex.

As we can see, there is a chance that the path has odd length. But that is not a thing familiar to machines at all. So we have to convert all odd-length paths to even-length paths.

In summary, HP encoding goals are:

  1. Distinguish leafand extension from each other without terminator.

  2. Convert the path to even length.

HP encoding specification

For now, we call the path inputed in HP encoding as input for convenience.

  • If input has terminator, remove terminatorfrom input.

  • Create the prefix into input which has value as the following table:

https://gist.githubusercontent.com/sontuphan/f77739de1aa99835947b179ddf9eaa35/raw/b8d69d1f2d46ded773ab2a9a895c41287e902d12/HPencoding

 

  • If the prefix is 0x0or 0x2 , add a padding nibble 0 follow the prefix, so the prefix is 0x00 and 0x20 . The main reason to do that is we are trying to maintain the even-length attribute of the path. Convince yourself :).

  • Add prefix to path.

For example:

https://gist.githubusercontent.com/sontuphan/51f1baf260e28d6948171da3d6fe7758/raw/bcceabcaee4e00a93d29d93d6807be56087080e3/ExHPencoding

 

Conclusion

For now, we believe that we got all of weapons before going to the battle with Trie in Ethereum. Please looking forwards to our next episode for more further discussions.

 

**References

Wiki Ethereum

HP encoding on original Golang code

Sontuphan’s Github

 

Share on Facebook
Share on Twitter
Please reload

Please reload

Copyright © 2019 Bacoor Inc. All rights reserved.