Our best practices in hosting a BSC validator
In this article, we’re going to disclose our best practices in hosting a BSC validator.
Binance Smart Chain (BSC) is a decentralized and censorship-resistant blockchain network that uses Proof of Staked Authority (PoSA) consensus to support shorter block times and lower fees. At launch, a few trusted nodes ran as the initial validator set; once block production started, anyone could compete to join as a candidate and be elected as a validator.
Topology
One of BSC's goals is to shorten block time. To help with this, the NodeReal validator joins the P2P network directly, which saves a network hop. However, we observed that some peers sent heavy transaction traffic to validator nodes, which delayed block production and caused high system load. We improved the architecture by taking these actions:
- We put the validators behind NAT and disabled peer discovery by setting `NoDiscovery = true`, so that external peers cannot degrade the validators' performance.
- We set up sentry nodes to handle public P2P traffic on the validators' behalf.
- The validator only connects with stable peers and sentry nodes, while the sentry nodes are open to the network and help to propagate blocks and transactions.
- We run a separate service that crawls the P2P network to find stable, fully synced peers and adds them to the validator dynamically. (This service will be launched on NodeReal soon, and we will expose most of our sentry nodes through it to strengthen the BSC network.)
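The peer-feeding idea in the last point can be sketched in Python. This is a minimal illustration, not our production crawler: the candidate record fields (`enode`, `height`, `uptime_hours`) and the thresholds `MAX_HEIGHT_LAG` and `MIN_UPTIME_HOURS` are hypothetical stand-ins for whatever "stable synced" means in your setup. Selected peers are handed to the validator through geth's `admin_addPeer` JSON-RPC method over the local IPC socket, so the validator never needs a public RPC port.

```python
import json
import socket

# Hypothetical thresholds for a "stable synced" peer; tune for your network.
MAX_HEIGHT_LAG = 5          # blocks behind the best known height
MIN_UPTIME_HOURS = 24.0     # how long the crawler has seen the peer online


def select_stable_peers(candidates, best_height):
    """Return enode URLs of crawled peers that look synced and long-lived.

    `candidates` is a list of dicts with hypothetical keys
    "enode", "height" and "uptime_hours" produced by the crawler.
    """
    return [
        p["enode"]
        for p in candidates
        if best_height - p["height"] <= MAX_HEIGHT_LAG
        and p["uptime_hours"] >= MIN_UPTIME_HOURS
    ]


def add_peer_request(enode, request_id):
    """Build a JSON-RPC request asking geth to connect to one enode URL."""
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "admin_addPeer", "params": [enode]}


def ipc_call(ipc_path, request):
    """Send one JSON-RPC request over the node's IPC socket, return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(ipc_path)
        sock.sendall(json.dumps(request).encode())
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("IPC socket closed before a full reply")
            buf += chunk
            try:
                return json.loads(buf)
            except ValueError:  # covers partial JSON and split UTF-8 bytes
                continue


def feed_validator(ipc_path, candidates, best_height, call=ipc_call):
    """Add every selected peer to the validator.

    The validator runs with NoDiscovery = true, so it does not find peers
    on its own; peers added here supplement the sentry nodes.
    """
    for i, enode in enumerate(select_stable_peers(candidates, best_height), start=1):
        reply = call(ipc_path, add_peer_request(enode, i))
        if "error" in reply:
            print(f"admin_addPeer failed for {enode}: {reply['error']}")
```

Running `feed_validator` periodically (for example from cron) adds newly qualified peers; the `call` parameter lets the selection logic be exercised without a live node.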
Hardware
We chose AWS as our infrastructure provider. The validator is located in Ireland and runs on an m5zn.3xlarge instance (12 vCPUs, 48 GiB of memory). It is backed by a 3 TB gp3 solid-state drive (SSD) provisioned with 10k IOPS and 250 MB/s throughput.
RPC
The validator should run inside a NAT environment without exposing its IP address to the public. Because a validator unlocks its wallet and signs blocks, its RPC port should listen only on localhost, or RPC should be disabled entirely; the maintainer can still interact with the node through `geth attach geth.ipc`.
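One way to check this rule is to audit the `[Node]` section of config.toml before starting the node. In geth's configuration, leaving `HTTPHost` or `WSHost` empty disables that server, so the minimal sketch below flags only hosts that would listen beyond the loopback interface. It takes an already-parsed dict, and the helper names are hypothetical.

```python
import ipaddress

# [Node] keys in geth's config.toml that control network-facing RPC servers.
# An empty or missing value means that server is not started.
RPC_HOST_KEYS = ("HTTPHost", "WSHost")


def is_loopback(host):
    """True if `host` only listens on the local machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as exposed


def exposed_rpc(node_section):
    """Return the RPC host settings that would listen beyond localhost."""
    exposed = []
    for key in RPC_HOST_KEYS:
        host = node_section.get(key, "")
        if host and not is_loopback(host):
            exposed.append(f"{key}={host}")
    return exposed
```

On Python 3.11 and later, the dict can come from `tomllib.load(f)["Node"]` with the file opened in binary mode; a non-empty result means the validator would expose RPC.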
Storage
We keep the validator's storage light by pruning it regularly, which improves storage latency.
The steps we use to perform the pruning:
- Stop the bsc node first.
- Run `nohup geth snapshot prune-state --datadir {the data dir of your bsc node} &`. It takes 3–5 hours to finish.
- Start the node once it is done.
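The three steps can be scripted so they always run in the same order. This sketch assumes the node is managed by a systemd unit called `bsc` (a hypothetical name) and that `geth` is on the `PATH`; `run` is injectable so the sequence can be dry-run without touching a real node.

```python
import subprocess

# Hypothetical systemd unit name for the BSC node; adjust to your setup.
SERVICE = "bsc"


def prune_commands(datadir, service=SERVICE):
    """The pruning steps, in order, as argv lists."""
    return [
        ["systemctl", "stop", service],
        ["geth", "snapshot", "prune-state", "--datadir", datadir],
        ["systemctl", "start", service],
    ]


def prune(datadir, run=subprocess.run, service=SERVICE):
    """Run the steps in order, stopping at the first failure.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit status, so later steps are skipped if an earlier one fails.
    """
    for argv in prune_commands(datadir, service):
        run(argv, check=True)
```

Because a prune takes hours, the script itself would be launched detached, for example with `nohup`, just like the single command above. With `check=True`, a failed prune leaves the node stopped for someone to inspect instead of restarting it automatically.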
In order to make our validator always online, we maintain a few backup validator nodes so that we can switch to the backup ones when the online instance needs to stop and prune.
When the node crashes or is force-killed, it resumes syncing from a block that may be several minutes or even hours old. This is because recent state held in memory has not yet been persisted to the database, so the node must replay blocks from the last checkpoint. For a validator, we want this replay to be as short as possible, and its length depends on the `TrieTimeout` setting in config.toml. `TrieTimeout` is a duration in nanoseconds, so our value of 150000000000 equals 150 seconds; because the node measures it against time spent processing blocks rather than wall-clock time, this works out to a checkpoint roughly every 20 minutes on our validator.
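The 20-minute figure can be sanity-checked with a little arithmetic. The per-block processing time of 0.375 s below is an assumed value chosen for illustration, not a measurement from our validator; the 3-second block time is BSC's target. Dividing the `TrieTimeout` budget by per-block processing time gives the number of blocks between flushes, which is also roughly the worst-case number of blocks to replay after a crash.

```python
NS_PER_SECOND = 1_000_000_000


def flush_interval_minutes(trie_timeout_ns, proc_seconds_per_block, block_time_s=3.0):
    """Estimate wall-clock minutes between full trie flushes.

    TrieTimeout is compared with accumulated block-processing time, so the
    node flushes after (timeout / processing time per block) blocks.
    """
    timeout_s = trie_timeout_ns / NS_PER_SECOND
    blocks_between_flushes = timeout_s / proc_seconds_per_block
    return blocks_between_flushes * block_time_s / 60


# Our TrieTimeout (150 s) with an assumed 0.375 s of processing per block:
# 400 blocks between flushes, i.e. about 20 minutes at 3 s per block.
print(flush_interval_minutes(150_000_000_000, 0.375))  # 20.0
```

If blocks get heavier and processing time per block doubles, the same setting flushes about twice as often, so the interval is an estimate that shifts with network load.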
Backup Nodes
Always run 2–3 backup nodes. For backup nodes, `TrieTimeout` can be set much larger. You don't have to restart a backup node when switching validators: attach to its geth.ipc and enable the miner module manually, which avoids missing any blocks.
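The manual switch can also be scripted against the two nodes' IPC sockets. This sketch assumes the backup already runs with the validator key available and its etherbase configured, so that only sealing needs to be toggled through geth's `miner_stop` and `miner_start` JSON-RPC methods (the equivalents of `miner.stop()` and `miner.start()` in the `geth attach` console). The helper names are hypothetical.

```python
import json
import socket


def rpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method,
            "params": params if params is not None else []}


def ipc_send(ipc_path, request):
    """Send one request over a geth IPC socket and return the decoded reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(ipc_path)
        sock.sendall(json.dumps(request).encode())
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("IPC socket closed before a full reply")
            buf += chunk
            try:
                return json.loads(buf)
            except ValueError:  # covers partial JSON and split UTF-8 bytes
                continue


def switch_validator(old_ipc, new_ipc, call=ipc_send):
    """Stop sealing on the old node, then start sealing on the backup.

    The backup is only started after the old node has confirmed that its
    miner is stopped; any error aborts the switch.
    """
    for path, method in ((old_ipc, "miner_stop"), (new_ipc, "miner_start")):
        reply = call(path, rpc_request(method))
        if "error" in reply:
            raise RuntimeError(f"{method} on {path} failed: {reply['error']}")
```

The old node is stopped first because two nodes sealing with the same key could sign conflicting blocks at the same height, which BSC can slash as double signing. If the old node is unreachable, `switch_validator` raises before touching the backup, leaving the operator to confirm the old node is really down.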
Monitoring
We enable Prometheus metrics on the validator and built a Grafana dashboard on top of them. The most important metrics are block height and storage usage. We also monitor the events emitted by the slash contract and raise an alert if a slashed validator address is ours.
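These checks can be sketched as a few small functions. Several details are assumptions to verify before use: that geth started with `--metrics` serves Prometheus text at `/debug/metrics/prometheus` on its metrics port, that the head-block gauge appears there as `chain_head_block`, that the SlashIndicator system contract lives at `0x0000000000000000000000000000000000001001`, and that its slash event carries the validator address as the first indexed topic. The log filter expects entries in the shape returned by `eth_getLogs`.

```python
import shutil

# Assumed address of BSC's SlashIndicator system contract; verify before use.
SLASH_CONTRACT = "0x0000000000000000000000000000000000001001"


def metric_value(prom_text, name):
    """Return the first unlabelled sample called `name` in Prometheus text format."""
    for line in prom_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        fields = line.split()
        if fields[0] == name:
            return float(fields[1])
    return None


def head_is_stalled(prev_height, curr_height):
    """True if the head is missing or has not advanced since the last scrape."""
    return curr_height is None or (
        prev_height is not None and curr_height <= prev_height)


def disk_used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def slashes_of(logs, validator):
    """Keep slash-contract logs whose first indexed topic is our address."""
    # An address used as a topic is left-padded with zeros to 32 bytes.
    padded = "0x" + validator.lower()[2:].rjust(64, "0")
    return [
        log for log in logs
        if log["address"].lower() == SLASH_CONTRACT
        and len(log["topics"]) > 1
        and log["topics"][1].lower() == padded
    ]
```

A scheduler would call these on every scrape and alert on a stalled head, on disk usage above a chosen threshold, and on any non-empty result from `slashes_of`; thresholds and scheduling are left to the alerting stack.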
NodeReal, the top BSC validator
As of November 2021, NodeReal has proven its capabilities: it holds more than 4% of the voting power, has never been slashed, and ranks first among the 21 validators.
Reference: https://www.binance.org/en/staking