State Expiry MVP0.1 Release Blog

Explore the latest state expiry MVP 0.1 release on BNB Smart Chain (BSC)

Introduction

Running a full node can be a daunting task, especially as storage requirements keep growing. In this blog post, we look at the state growth problem on the BNB Smart Chain (BSC) compared to Ethereum, introduce our state expiry solution, and explain how it compares with the approaches of other chains. We also show how to use the new state expiry full node and outline its key advantages.

What’s up with the state?

According to the following figure from NodeReal’s BSC Annual Report 2023, running a BSC full node requires at least 1.6TB of disk space.

At the time of writing, that figure has already exceeded 2TB. BSC grows at a much faster pace than other chains like Ethereum because of its high performance and much shorter block time. However, not all of the state is useful. Only a minority of key-value pairs (i.e. accounts and storage slots) have been accessed in the recent period. With this in mind, BSC developers have devised a non-consensus state expiry scheme to minimize the state data that full nodes have to store.

Unveiling the Smallest BSC Full Node

In short, a full node with the state expiry feature enabled can prune away historical state that has not been accessed in a long time. If a later state access operation needs the pruned state, the node revives it: it requests a proof from a third-party regular full node (which we call a RemoteDB) and uses that proof to restore the expired state locally. More technical details are in the “How it works” section.

Now, let’s check out the results below:

Note: Chain data here refers to all state data excluding the ancient store (older blocks, headers, transactions, receipts, etc.) and the transaction index.

The setups are all based on the path-based state scheme (PBSS), which is the latest and most cost-effective state scheme on BSC. The first setup is a regular full node, used as a benchmark. The second setup is a full node with the state expiry feature enabled, but not yet pruned. Compared to the benchmark, we observed a 0.01% increase in chain data size, which suggests that the new state epoch metadata does not incur a high storage cost. The third setup is a full node with the state expiry feature enabled and the contract trie pruned. Here we observe a 50.8% reduction in chain data size, which is a huge improvement.

But wait, there’s more. The last setup is a full node with the state expiry feature enabled and both the contract trie and the snapshot pruned. Based on the result above, we observe a 79.4% reduction in chain data size, leaving the chain data much smaller than in the previous setup.

The pruned setups show a huge reduction in storage cost, but this comes with a tradeoff: the node’s performance is reduced. Hence, this feature is suitable for node operators who care more about storage cost than about node performance. Between the two pruned setups, the fourth offers the largest storage reduction but also the lowest performance.

If you run multiple full nodes, you now only need to run one regular full node to act as the RemoteDB, and the others can enable this feature to save storage space.

What’s so unique about this feature?

Most chains have redundant state data (i.e. data that is not accessed, but persisted forever in the blockchain), but we don’t see any solutions to tackle this problem. Take Ethereum as an example. Ethereum’s goal is to create stateless nodes through Verkle Tree and ePBS, but it may take a long time to fulfill the vision, while the state is still growing.

That’s why we built this feature. By removing useless state data, we can effectively lower the storage requirements for running a full node, so that more people can run one to help decentralize BSC. At the same time, it’s an out-of-protocol solution that allows our developers to continue developing and improving this feature at a much faster pace to suit our users’ needs.

How to use it

Refer to this guide to experience this feature.

How it works

State Epoch Metadata

In this design, we introduced a new piece of state metadata called the state epoch. A state epoch is a unit of measurement used to determine whether a state is expired. An epoch spans a fixed number of blocks (e.g. 1 epoch for every 100000 blocks). Under our state expiry rule, once a state’s epoch is at least 2 epochs behind the latest epoch, the state is considered expired and can be pruned away.
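
The rule above can be sketched in a few lines. This is a minimal illustration, not the BSC implementation: the names `EPOCH_PERIOD`, `EXPIRY_LAG`, `epoch_of`, and `is_expired` are hypothetical, the period of 100000 blocks is the example value from the text, and numbering epochs from 0 is an assumption.

```python
EPOCH_PERIOD = 100_000  # blocks per epoch (example value from the text)
EXPIRY_LAG = 2          # epochs a state must fall behind before it expires


def epoch_of(block_number: int) -> int:
    """Map a block number to its state epoch (assumes epochs start at 0)."""
    return block_number // EPOCH_PERIOD


def is_expired(state_epoch: int, current_epoch: int) -> bool:
    """A state is expired once it is at least EXPIRY_LAG epochs behind."""
    return current_epoch - state_epoch >= EXPIRY_LAG


# A slot last touched at block 120,000 (epoch 1), checked at block 310,000 (epoch 3).
last_access_epoch = epoch_of(120_000)
now_epoch = epoch_of(310_000)
expired = is_expired(last_access_epoch, now_epoch)
```

With these numbers the slot is two epochs behind the latest epoch, so the sketch treats it as expired; one epoch earlier it would still be live.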

State Expiry

The state epoch metadata is stored in the branch nodes of the contract trie, as shown in the following figure:

The epoch map has one entry per child node (i.e. 16), and each entry records the epoch of the corresponding child.

Every state access operation (i.e. SLOAD and SSTORE) traverses the contract trie from the root to the value in the leaf node. Along the way, the traversal passes through branch nodes, and the state epoch of each child it follows is checked against the state expiry rule. If the child is expired, an error is returned and the parent process performs a state revive operation.

During the offline expired-state pruning process, each contract trie is scanned to identify its expired subtries. Expired subtries are then pruned, which deletes their trie nodes from the database and shrinks the trie. The following figure shows an example:
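
The pruning pass can be sketched as a recursive scan over an in-memory trie. This is an illustrative model only: `Branch`, `Leaf`, `count_nodes`, and `prune_expired` are hypothetical names, and the sketch simply drops the child reference, whereas a real Merkle Patricia Trie would presumably keep the child’s hash in the parent so that the root hash stays unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

EXPIRY_LAG = 2  # epochs behind the latest before a subtrie counts as expired


@dataclass
class Leaf:
    value: bytes


@dataclass
class Branch:
    children: List[Optional[Union["Branch", Leaf]]] = field(
        default_factory=lambda: [None] * 16)
    epochs: List[int] = field(default_factory=lambda: [0] * 16)


def count_nodes(node: Optional[Union[Branch, Leaf]]) -> int:
    """Count every trie node in a subtrie (branches and leaves)."""
    if node is None:
        return 0
    if isinstance(node, Leaf):
        return 1
    return 1 + sum(count_nodes(child) for child in node.children)


def prune_expired(node: Branch, current_epoch: int) -> int:
    """Drop every expired subtrie below `node`; return how many nodes were removed."""
    removed = 0
    for i, child in enumerate(node.children):
        if child is None:
            continue
        if current_epoch - node.epochs[i] >= EXPIRY_LAG:
            removed += count_nodes(child)
            node.children[i] = None  # stands in for deleting nodes from the database
        elif isinstance(child, Branch):
            removed += prune_expired(child, current_epoch)
    return removed


# Child 1 was refreshed in epoch 5 (live); child 3 was last touched in epoch 1 (expired).
live = Branch()
live.children[2], live.epochs[2] = Leaf(b"a"), 5
stale = Branch()
stale.children[0], stale.children[7] = Leaf(b"b"), Leaf(b"c")
root = Branch()
root.children[1], root.epochs[1] = live, 5
root.children[3], root.epochs[3] = stale, 1

removed = prune_expired(root, current_epoch=6)
```

In this example the stale subtrie (one branch and two leaves) is removed, while the recently accessed subtrie is left intact.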

Revive using a RemoteDB

If an expired state has been pruned away and a state operation needs to access it, a state revival must be performed. The node revives the state by requesting an MPT proof of the expired state from an entity called the RemoteDB. A RemoteDB is simply a regular full node that contains the entire state, or an archive node. The following figure illustrates the interaction between the different types of nodes:
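
The revive flow can be illustrated with a toy model. To keep the example short and runnable, it swaps the MPT proof for a plain binary Merkle tree and uses SHA-256 instead of Keccak-256. The `RemoteDB` and `ExpiryNode` classes and their methods are hypothetical stand-ins, not the real BSC interfaces. The point it shows is that the pruned node only needs a trusted state root to check whatever the RemoteDB returns.

```python
import hashlib
from typing import Dict, List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, True if our node is the right child)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(key: bytes, value: bytes) -> bytes:
    return h(b"\x00" + len(key).to_bytes(4, "big") + key + value)


class RemoteDB:
    """Toy full node: holds the entire state and serves Merkle proofs."""

    def __init__(self, state: Dict[bytes, bytes]):
        self.state = dict(state)
        self.keys = sorted(self.state)
        self.levels = [[leaf_hash(k, self.state[k]) for k in self.keys]]
        while len(self.levels[-1]) > 1:
            cur = list(self.levels[-1])
            if len(cur) % 2:
                cur.append(cur[-1])  # pair an odd node with itself
            self.levels.append(
                [h(b"\x01" + cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])

    @property
    def root(self) -> bytes:
        return self.levels[-1][0]

    def get_proof(self, key: bytes) -> Tuple[bytes, Proof]:
        idx = self.keys.index(key)
        proof: Proof = []
        for level in self.levels[:-1]:
            sib = idx ^ 1
            sibling = level[sib] if sib < len(level) else level[idx]
            proof.append((sibling, idx % 2 == 1))
            idx //= 2
        return self.state[key], proof


def verify(root: bytes, key: bytes, value: bytes, proof: Proof) -> bool:
    node = leaf_hash(key, value)
    for sibling, node_is_right in proof:
        pair = sibling + node if node_is_right else node + sibling
        node = h(b"\x01" + pair)
    return node == root


class ExpiryNode:
    """Toy pruned node: knows the state root but has no local values."""

    def __init__(self, trusted_root: bytes, remote: RemoteDB):
        self.root, self.remote, self.local = trusted_root, remote, {}

    def read(self, key: bytes) -> bytes:
        if key not in self.local:  # treated here as "expired and pruned"
            value, proof = self.remote.get_proof(key)
            if not verify(self.root, key, value, proof):
                raise ValueError("invalid proof from RemoteDB")
            self.local[key] = value  # revive locally
        return self.local[key]


remote = RemoteDB({b"slot-a": b"1", b"slot-b": b"2", b"slot-c": b"3"})
node = ExpiryNode(trusted_root=remote.root, remote=remote)
revived = node.read(b"slot-c")
```

Because every response is checked against a root the node already trusts, the RemoteDB in this sketch does not have to be trusted to return correct data; a tampered value fails verification.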

Future roadmap

We have the smallest full node, but we still have problems to solve. The next milestone is to build the smallest and most performant full node. Several components can still be optimized, including RemoteDB queries, pruning, and reviving.

Another concern you might have is the RemoteDB. If a user can only run a single full node, then it’s not possible to enable this feature. Hence, we are preparing a public RemoteDB endpoint so that anyone can try this feature out.

We built it for you

We understand that it has not been easy to run a BSC full node, and that’s why we built this feature for you. Try it out and let us know your thoughts! If you face any issues, raise a GitHub issue on the BSC repository and the team will be happy to assist you.

Conclusion

State growth is a problem, and we found a way to mitigate it. By labeling state with additional state epoch metadata, expired state can be pruned away and recovered later using a proof.

About NodeReal

NodeReal is a leading one-stop blockchain infrastructure and service provider that embraces the high-speed blockchain era and empowers developers by “Make your Web3 Real”. We provide scalable, reliable, and efficient blockchain solutions for everyone, aiming to support the adoption, growth, and long-term success of the Web3 ecosystem.

NodeReal’s Semita helps developers build their custom Application Chains or scale their blockchains with layer 2 solutions, like ZK Rollup and Optimistic Rollup.

Join Our Community

Join our community to learn more about NodeReal and stay up to date!

Discord | Twitter | YouTube | LinkedIn