GitHub repository: contributions are welcome.
The purpose of this tool is to be an educational resource for people trying to learn about Ethereum's data storage. We assume a good understanding of what Ethereum's data looks like: block headers, accounts, storage slots, etc. If you have ever wondered how Ethereum stores all of this data, this tool is for you.
The text here is meant to help guide you through some common
use-cases for exploring Ethereum's low-level data structures. It
is recommended to follow along with the examples using the query
tool. The best way to understand these data structures is with
visual, real-world examples.
The sections are ordered in a specific way and may depend on
each other. For example, avoid trying to access storage slots
before you understand how to traverse a Patricia trie.
This tool is based on the Geth implementation using LevelDB.
Since the Yellow Paper never provided an actual implementation,
different clients implement the Ethereum database differently,
but because of the use of very specific mathematical concepts,
most clients implement the database very similarly.
This tool works with the Sepolia testnet and is currently updated
until block . To confirm the results you get from this tool, you
can cross-check them on Etherscan.
With this tool you can manually explore the LevelDB data by querying
specific keys. It's as low-level as possible, since its main
purpose is educational.
All the data, both keys and values, is saved in a binary format.
This tool accepts and responds with hex-encoded binary data.
The encoding scheme of the results that come from the LevelDB is
beyond the scope of this text, but this tool provides several
decoders that let you decode the results in different ways.
Different values require different decoders — make sure
you use the appropriate decoder for each piece of data.
Very few resources exist on this topic, which is the main reason
for building this tool. Still, some resources exist and I
encourage you to read them as well. This is not an easy topic,
but you can learn it.
Here are some resources I found useful:
Ethereum stores several top-level keys with mutable values:
- 4c617374426c6f636b ("LastBlock")
- 4c617374486561646572 ("LastHeader")
- 4c61737446617374 ("LastFast")
- 446174616261736556657273696f6e ("DatabaseVersion")
Other keys are built from single-byte prefixes and suffixes:
- 68 ("h")
- 48 ("H")
- 6e ("n")
- 62 ("b")
- 74 ("t")
- 72 ("r")
- 42 ("B")
Retrieving block data requires using the block number and/or hash in order to construct the keys that hold the data that we want.
To find a block hash by its number, you would concatenate the "h"
prefix (68), the block number (in hex and padded to 16 digits) and
the "n" suffix (6e). Once you find the block hash, you can retrieve
its header data.
To find the header data, you would concatenate the "h" prefix (68)
with the block number (in hex and padded to 16 digits) and the
block hash.
Let's take block number 2,505,997 as an example. In hex it is 263d0d. Padded to 16 digits: 0000000000263d0d
We query the following key to find the block hash:
680000000000263d0d6e
This returns the block hash
(9d32afbe77c7d105253b4ed7750caf23063352936ce357b89a9dd54c9fa24ab1).
We then use the "h" prefix along with the block number and
the hash to find the block header:
680000000000263d0d9d32afbe77c7d105253b4ed7750caf23063352936ce357b89a9dd54c9fa24ab1
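The key construction described above can be sketched in JavaScript. This is only an illustration and not Geth's own code; the helper names (encodeBlockNumber, headerHashKey, headerKey) are made up for this example.

```javascript
// Hypothetical helpers, named for this example only.
const HEADER_PREFIX = '68';      // "h"
const HEADER_HASH_SUFFIX = '6e'; // "n"

// Block number as hex, left-padded to 16 digits (8 bytes).
function encodeBlockNumber(blockNumber) {
  return BigInt(blockNumber).toString(16).padStart(16, '0');
}

// Key whose value is the block hash: "h" + number + "n".
function headerHashKey(blockNumber) {
  return HEADER_PREFIX + encodeBlockNumber(blockNumber) + HEADER_HASH_SUFFIX;
}

// Key whose value is the encoded header: "h" + number + hash.
function headerKey(blockNumber, blockHash) {
  return HEADER_PREFIX + encodeBlockNumber(blockNumber) + blockHash.replace(/^0x/, '');
}

console.log(headerHashKey(2505997)); // 680000000000263d0d6e
```

Passing the block hash returned by the first query into headerKey gives the second key shown above.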
Among other fields, the decoded header contains three trie roots:
- stateRoot
- transactionsRoot
- receiptsRoot
Ethereum uses two kinds of tries to save data in a cryptographically secure yet efficient way: Merkle tries and Patricia tries.
These two tries can be constructed from the key/value pairs in the
DB. The keys in the DB are part of the Merkle trie, while the
values are part of the Patricia trie.
Generally speaking, if you are trying to retrieve raw data, you need
to construct the Patricia trie from the values, but if you are
trying to validate the data, you need to construct the Merkle trie
from the keys.
Let's take block 2,500,039 (2625c7 in hex). We can get the hash
and header data as explained above. From the header we can
extract the stateRoot, which is:
644ae129f630e6c5c864b2dbd634c50fe479d631ef76ae1e9ceb5220bca949c5
The stateRoot is the key for our root nodes.
The value stored on this key in the DB is the root Patricia
node, while the key itself is the root Merkle node.
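Querying this key will give us an RLP-encoded value that
represents 1 of 4 kinds of Patricia nodes:
- Null node (an empty value)
- Branch node (17 items)
- Leaf node (2 items, the first item starts with 2 or 3)
- Extension node (2 items, the first item starts with 0 or 1)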
Querying for our block's stateRoot gives us a 17-item long
branch node. Each of the first 16 items in the branch node
represents a hex character from 0 to f.
In order to find the balance of an account in the State Trie, we
need to traverse the Patricia trie following the keccak256 hash
of the address we want to query. Let's take the following
address as an example:
0x3810d4c7eB88dd66ab9bE39A5F567Cf77fF9f8B7
Its keccak256 value (without the 0x part) is:
acf0daf35759515a3118de4ab5ff63ec27518b94b03d601ac7a1e53b3d6603f8
We need to traverse the Patricia trie for every character in the
hash. We start from a, which is the first character in our
keccak hash.
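As a small sketch of this step: each hex character of the hash selects one item of a branch node, so the hash can be turned into a list of indexes up front. The helper name toNibbles is made up for this example.

```javascript
// Hypothetical helper: turn a hex hash into the branch indexes to follow.
function toNibbles(hexHash) {
  return [...hexHash.replace(/^0x/, '')].map((ch) => parseInt(ch, 16));
}

const path = toNibbles('acf0daf35759515a3118de4ab5ff63ec27518b94b03d601ac7a1e53b3d6603f8');
console.log(path.slice(0, 5)); // [ 10, 12, 15, 0, 13 ]
```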
We take the item at index 10 (which is a in hex) of the root
Patricia node, which is:
c4ee4cf0cab88b6932d7380a6e0efdc33c1d4f0ffa05207f7a1450b45a97972a
We then query that key to get the next Patricia node. The next
Patricia node is also a branch node, and so we follow it, taking
the key from the c place in the Patricia branch node, which is:
51f41878a482a7e1a60e91b8e5c66333d119339dc067363b681ad7f7e6581c39
We keep traversing this way: f, 0, d.
At this point, we get the next node's key:
8b97f78fa20cfba908a4953654b4fcdc55c94a3df3305548e1e16eb549c19672
When we query this key, we get a Leaf node. We can
identify it because if we decode it with RLP, we get 2 items and
the value of the first item starts with 3. This type of node is
built of two items: the rest of our "path" and the final value
of our account.
If we take the first item of this node and remove the 3, we get:
af35759515a3118de4ab5ff63ec27518b94b03d601ac7a1e53b3d6603f8
If this string seems familiar, it's because it's part of the
hash that we were searching for:
acf0daf35759515a3118de4ab5ff63ec27518b94b03d601ac7a1e53b3d6603f8
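Removing the leading character can be sketched as follows. The sketch assumes the usual hex-prefix convention: the first character marks the node kind (2 or 3 for a leaf, 0 or 1 for an extension) and whether the remaining path has an odd length, and for even-length paths one extra padding character follows the flag. The helper name decodeCompactPath is made up for this example.

```javascript
// Hypothetical helper: decode the first item of a 2-item Patricia node.
function decodeCompactPath(hex) {
  const flag = parseInt(hex[0], 16);
  if (!(flag >= 0 && flag <= 3)) {
    throw new Error('not a leaf or extension path');
  }
  const isLeaf = flag >= 2;       // 2 or 3: leaf, 0 or 1: extension
  const isOdd = (flag & 1) === 1; // 1 or 3: odd number of path characters
  // Odd: the path starts right after the flag.
  // Even: one padding character follows the flag (assumed convention).
  const path = hex.slice(isOdd ? 1 : 2);
  return { isLeaf, path };
}

const node = decodeCompactPath('3af35759515a3118de4ab5ff63ec27518b94b03d601ac7a1e53b3d6603f8');
// The characters already traversed plus the leaf path give back the full hash.
console.log('acf0d' + node.path);
```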
We traversed through a, c, f, 0, d to get to this node, which
contains the rest of our hash and our desired value: 4 items,
encoded with RLP, that represent (in order) the account data:
nonce, balance, storageRoot and codeHash.
The balance is b1a2bc2ec50000, which converted to decimal becomes
50000000000000000. This balance is in wei, so it's 0.05 ETH.
We can confirm on Etherscan that at block height 2,500,039,
address 0x3810d4c7eB88dd66ab9bE39A5F567Cf77fF9f8B7 had 0.05 ETH
in its balance.
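The conversion from the hex balance to wei and then to ETH can be sketched with BigInt, which avoids floating-point rounding. The helper names hexToWei and weiToEth are made up for this example.

```javascript
const WEI_PER_ETH = 10n ** 18n;

// Hypothetical helper: hex balance from the account node to wei.
function hexToWei(hex) {
  const clean = hex.replace(/^0x/, '');
  return clean === '' ? 0n : BigInt('0x' + clean); // an empty value means zero
}

// Hypothetical helper: format wei as a decimal ETH string.
function weiToEth(wei) {
  const whole = wei / WEI_PER_ETH;
  const fraction = (wei % WEI_PER_ETH).toString().padStart(18, '0').replace(/0+$/, '');
  return fraction ? `${whole}.${fraction}` : `${whole}`;
}

const wei = hexToWei('b1a2bc2ec50000');
console.log(wei.toString()); // 50000000000000000
console.log(weiToEth(wei));  // 0.05
```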
The same technique of traversing the state trie can be used for traversing the transactions trie or the receipts trie.
In order to understand how contract state is stored in the DB, we
need to look at how the EVM handles contract state. This text
assumes a good understanding of how the EVM works and what
opcodes are. Explaining these concepts is out of scope.
One more important thing to note: different languages (Solidity,
Vyper, etc.) compile into different bytecode. The process
described here is based on how Solidity implements its compiler,
but most other EVM languages mimic the same behavior.
Some information about Solidity's scheme is presented in their
documentation:
https://docs.soliditylang.org/en/v0.8.17/internals/layout_in_storage.html
To store contract state in the EVM, we use an opcode called
SSTORE. The SSTORE opcode has two operands: a slot and a
256-bit value to be stored.
The slot is a uint256. When a contract is trying to store some
data, it invokes the SSTORE opcode with the slot number it wants
to write into.
When you compile a Solidity contract, Solidity transforms it
into bytecode. When you write to a variable, Solidity translates
its position in the code into a slot number. So the first
variable goes into slot 0, the second goes into slot 1, etc.
Let's look at a simplified example:
contract Example { uint256 a = 123; }
This contract will compile into something roughly equivalent to
sstore(0, 123).
Now let's get back to LevelDB. How can we access this data
from LevelDB?
When the EVM processes SSTORE, it's actually writing into
LevelDB in the background. For every SSTORE operation, the EVM
translates the slot into a LevelDB key where it will store the
value.
Solidity supports 5 types of data structures:
To retrieve a storage slot from LevelDB, we need to use our
account / contract address, retrieve the storageRoot of the
account and then derive the slot we want to retrieve from its
position in the code and the type of data structure it holds.
Let's look at contract 0x5fb282df60a3264c06b2cb36c74d0fd23d727f82.
It's an ERC20 contract that follows the OpenZeppelin
implementation. Looking at the code we can see the name variable
is fourth, meaning it will be stored in slot 3 (0-based index).
We now retrieve the account details for this contract in the
manner described previously. We will use block 2,505,997 as our
head. Its block header can be found here:
680000000000263d0d9d32afbe77c7d105253b4ed7750caf23063352936ce357b89a9dd54c9fa24ab1
We take the state root and find the account details as we did
previously. The keccak256 of the address is:
c6c986aabcc27ea73df5b336048692ab9cab96645861b869da7b6935a1aa29ab
We traverse the stateRoot trie the same as before, until we
reach the leaf node for this account:
f12c6be1635c47f9a9aaeef51429e19bf43bcde0fe1ee1894b331dd68e7cab74
From that we extract the storageRoot for the account. This is a
Merkle-Patricia root, and we can traverse it like any other
Merkle-Patricia Trie:
df20e5cf9e6aef54d16c6123d87957fe1c7c591a82cb03073432ec7375c65648
Now we can find our storage slots. To find slots, we take their
index and find that index on the Patricia trie, starting from
the storageRoot.
We know the "name" variable is stored in slot 3.
Slot numbers are always padded to 32 bytes and then hashed, so
we take the padded number:
0000000000000000000000000000000000000000000000000000000000000003
Hash it with keccak256:
c2575a0e9e593c00f959f8c92f12db2869c3395a3b0502d05e2516446f71f85b
And traverse the storageRoot trie to find that key.
After traversing c, 2, 5, we get to the leaf node that contains
our value:
0b49a92e9302e8d45d0ce6acd86eee8ea4a83fc447bc7f9e629febb197ece43d
We can now see the value stored in slot 3, but it's encoded:
a04255534420546f6b656e00000000000000000000000000000000000000000014
Solidity encodes strings that are 31 bytes or shorter directly in
a single slot. The first byte (in this example, a0) is the RLP
prefix of the 32-byte slot value, so we ignore it. The last byte
(in this example, 14) encodes the length of the string as twice
its length in bytes. 14 is hex; converted to decimal it's 20, so
the string is 10 bytes long. Our string is therefore the 20 hex
digits following the a0 byte, which are the ASCII/UTF-8 encoded
characters:
4255534420546f6b656e
We can use JavaScript to decode it (any other language can also
work):
'4255534420546f6b656e'.match(/.{1,2}/g).map(v => String.fromCharCode(parseInt(v, 16))).join('')
Or in Node.js:
Buffer.from('4255534420546f6b656e', 'hex').toString('utf8')
And we get:
BUSD Token
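The whole decoding, starting from the raw leaf value, can be sketched in Node.js as below. It assumes the value is a full 32-byte slot behind a single a0 prefix byte and that, as in Solidity's documented layout, the last byte of a short string holds twice its length in bytes. The helper name decodeShortString is made up for this example.

```javascript
// Hypothetical helper: decode a short Solidity string from an encoded slot value.
function decodeShortString(encodedHex) {
  const slot = encodedHex.slice(2);              // drop the leading a0 byte
  const lastByte = parseInt(slot.slice(-2), 16); // lowest-order byte of the slot
  if (lastByte % 2 !== 0) {
    // An odd last byte marks a long string whose data lives in other slots.
    throw new Error('not a short string');
  }
  const byteLength = lastByte / 2;               // short strings store length * 2
  return Buffer.from(slot.slice(0, byteLength * 2), 'hex').toString('utf8');
}

console.log(decodeShortString(
  'a04255534420546f6b656e00000000000000000000000000000000000000000014'
)); // BUSD Token
```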
Mappings and other dynamic types are a bit more complicated to
retrieve from the storage, because of how Solidity allocates
slots for dynamic types.
Every slot in the EVM is 256 bits long. This means that if you
want to save more than 256 bits, you need to come up with a
scheme that would let you save a single variable in multiple
slots. For fixed-size large values, a simple scheme / layout
would be to stack the slots. Let's take as an example a
fixed-size array:
contract Example { uint[2] list; }
While we are defining a single variable, it will actually take up two slots.
list[0] would be located in slot 0 while list[1] will be located
in slot 1. Simple enough. Solidity uses something similar
(remember: storage schemes / layouts are compiler-specific).
Let's continue with our BUSD Token contract from above. We know
that it keeps its _balances variable in slot 0. But if we search
for slot 0, which has the keccak hash:
290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
We can't find it in the Patricia trie. That's because Solidity
doesn't save anything in that slot. Instead, we need to look for
the slot of a specific key inside the mapping. Solidity
generates a different slot for every key in the mapping. To find
the slot where the balance of an address is kept, combine the
slot of the mapping and the key (i.e. the address) we are
looking for. Let's take this address as an example:
0x8ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d.
What Solidity does is construct a key from the address and
the slot, using the following formula:
storageSlotNumber = keccak256(abi.encode(mappingKey, variableSlotPosition))
Where:
- mappingKey is the key we are searching for. In our example
  it's the address 8ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d
- variableSlotPosition is the position of the mapping variable
  in the code. In our case, 0
- abi.encode() is the standard EVM ABI encoding function, which
  mostly means we need to pad every parameter to 32 bytes.
Take the address 8ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d and pad it to 32 bytes:
0000000000000000000000008ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d
Take the slot position 0 and pad it to 32 bytes:
0000000000000000000000000000000000000000000000000000000000000000
Concatenate the two values:
0000000000000000000000008ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d0000000000000000000000000000000000000000000000000000000000000000
Hash the result with keccak256:
a194304cfaa67b7f4640d773719472e36ea5de258553109420ff3fb659aa3d1c
This is the storage slot that holds the balance of
0x8ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d. Slot numbers are
hashed before they are looked up in the trie, so hash it again
with keccak256:
ed98752026e9e727d97d787c433a482f543a72cfc1e944ffc2a72e460ebb2c4a
Traverse the storageRoot trie to find the balance for the
address. After traversing e and d, we reach a leaf node that
contains our value:
891b1ae4d6e2ef500000
Decoded with RLP, this is 1b1ae4d6e2ef500000, which comes out to
500*10^18 in decimal, or 500 "BUSD tokens".
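The non-hashing parts of this lookup can be sketched in JavaScript: building the bytes that get hashed, and decoding the value found in the leaf. keccak256 itself is left out because it is not part of Node's standard library (the built-in sha3-256 is a different function and produces different digests), so a keccak256 library of your choice is needed for that step. The helper names are made up for this example, and the RLP decoder only handles short byte strings.

```javascript
// Left-pad a hex value to 32 bytes (64 hex digits).
function pad32(hex) {
  return hex.replace(/^0x/, '').toLowerCase().padStart(64, '0');
}

// Hypothetical helper: the bytes that are hashed to locate mapping[key].
function mappingSlotPreimage(mappingKey, variableSlotPosition) {
  return pad32(mappingKey) + pad32(BigInt(variableSlotPosition).toString(16));
}

// Minimal RLP decoding for short byte strings only (payload under 56 bytes).
function decodeRlpShortString(hex) {
  const prefix = parseInt(hex.slice(0, 2), 16);
  if (prefix < 0x80) return hex.slice(0, 2); // a single low byte encodes itself
  if (prefix > 0xb7) throw new Error('not a short RLP string');
  const length = prefix - 0x80;              // payload length in bytes
  return hex.slice(2, 2 + length * 2);
}

const preimage = mappingSlotPreimage('0x8ab7b1954fbe6c39b146bffd2fb1e8c38138fb4d', 0);
const rawBalance = decodeRlpShortString('891b1ae4d6e2ef500000');
const balance = BigInt('0x' + rawBalance);
console.log(balance / 10n ** 18n); // 500n
```

Hashing preimage with keccak256 should give the slot number, and hashing that slot number again gives the path to follow in the storageRoot trie.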
Side note: maybe now you understand why it's impossible to iterate over mappings in Solidity — the keys are not saved anywhere in their raw form, only as hashed values which cannot be reverse-engineered into the original keys. To search for a value of a mapping, you must know the original key you are looking for.