Visualising the 7-block reorg on the Ethereum beacon chain
Yes, re-orgs are bad. No, this situation is not what PoS Ethereum is.
On May 25th 2022, at 08:55 UTC, a 7-block re-org occurred on the beacon chain. We aim here to provide a visual guide to understanding the Ethereum Proof-of-Stake protocol and why the re-org happened.
In a nutshell, the re-org is not expected behaviour of the beacon chain. It occurred due to an unhappy combination of three distinct causes:
A late block proposal split views of validators due to a recent fork choice update called proposer boost.
The proposer boost update was released as a soft fork, seen as a local change only that could be rolled out over the network at its own pace. This created a situation where some validators were using proposer boost, and some weren’t, splitting the views.
A known, incorrect implementation of when the fork choice is expected to be run prevailed in some clients, leading to the persistence of the fault.
Importantly, the re-org did not result in loss of finality. Finality was not even delayed. So we need to be more precise when we talk about what exactly re-orged here.
In the following, we recap the basics of Proof-of-Stake Ethereum and look into how the re-org eventually happened.
Proof-of-Stake in Ethereum 101
Proof-of-Stake Ethereum features two consensus mechanisms running in parallel:
The first, FFG Casper, is responsible for providing economic finality to the chain. Every epoch, “source” and “target” votes from attesters are tallied up. When a checkpoint block (the start of an epoch) accumulates enough target votes, it becomes justified. When a recently justified source is used to justify a target checkpoint, the target checkpoint is justified and the source becomes finalised. The gist is that no conflicting checkpoint block can also be finalised without at least 1/3 of the active stake getting slashed. So there can be no re-org of the finalised chain without a massive loss of ETH by some party. But the present re-org has nothing to do with this part of the consensus.
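As a rough illustration of the tallying described above, here is a simplified Python sketch. It assumes the usual two-thirds supermajority as the meaning of “enough target votes”, and it only handles the simplest case, where the source is the currently justified epoch and the target is the epoch right after it. The function name `update_checkpoints` and its arguments are hypothetical; the real specification has more rules than this.

```python
def update_checkpoints(justified, finalized, source, target,
                       target_weight, total_weight):
    """Return the updated (justified, finalized) epochs after one tally.

    Simplified: a target is justified when votes carrying it reach 2/3 of
    the total active weight and name the current justified epoch as source.
    The source is finalised only when the target is the very next epoch.
    """
    supermajority = 3 * target_weight >= 2 * total_weight
    if supermajority and source == justified:
        if target == source + 1:
            finalized = source
        justified = target
    return justified, finalized


# Hypothetical tally: 70 out of 100 units of stake vote source=10, target=11.
print(update_checkpoints(10, 9, source=10, target=11,
                         target_weight=70, total_weight=100))
```

With 70 units out of 100 the target is justified and the source finalised; with 60 units nothing changes.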
The second component, LMD GHOST, aims to provide a dynamically available ledger, i.e., grow the chain while finalisation runs its course. Blocks on that chain accumulate weight bestowed upon them by the “head” vote from attesters. This weight is used to figure out, in the event of a fork, which branch the validator should follow. Because weight is expected to accumulate quickly after a block is released, forks are expected to be rare, and a block that is timely shouldn’t expect to be re-orged.
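To make the weighting rule concrete, here is a minimal Python sketch of a GHOST-style head selection. It is an illustration only, not the consensus specification: the function name `lmd_ghost_head`, the dictionary layout and the tie-break on block label are assumptions made for this example.

```python
from collections import defaultdict


def lmd_ghost_head(parents, latest_votes, root):
    """Walk from `root` to a leaf, always taking the heaviest child subtree.

    parents:      dict block -> parent block (the root maps to None)
    latest_votes: dict validator -> (block, weight), latest vote only
    """
    children = defaultdict(list)
    for block, parent in parents.items():
        if parent is not None:
            children[parent].append(block)

    # A vote for a block also counts for every ancestor of that block.
    subtree_weight = defaultdict(int)
    for block, weight in latest_votes.values():
        node = block
        while node is not None:
            subtree_weight[node] += weight
            node = parents[node]

    head = root
    while children[head]:
        # Ties are broken on the block label (an assumption of this sketch).
        head = max(children[head], key=lambda b: (subtree_weight[b], b))
    return head


# Hypothetical tree: A has two children, B and C; D builds on C.
parents = {"A": None, "B": "A", "C": "A", "D": "C"}
votes = {"v1": ("B", 1), "v2": ("D", 1), "v3": ("C", 1)}
print(lmd_ghost_head(parents, votes, "A"))
```

In this toy tree the branch through C carries two votes against one for B, so the walk ends at D.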
In PoS Ethereum, time is divided into slots, each 12 seconds long. Each slot, a proposer is chosen, who builds their block on top of what they believe to be the head of the chain according to the LMD-GHOST fork choice rule (hereafter referred to simply as the fork choice). Proposers are expected to release their block at the beginning of the slot, while attesters are expected to release their source/target/head vote 4 seconds into the slot or upon receiving the block from their slot, leaving plenty of time for the proposer to make their block seen by attesters of their own slot.
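The timing can be sketched in a few lines of Python. The 12-second slot and the 4-second attestation point come from the description above; the helper names `slot_at` and `seen_in_time` and the example arrival times are hypothetical.

```python
SECONDS_PER_SLOT = 12      # slot length
ATTESTATION_DEADLINE = 4   # seconds into the slot at which attesters vote


def slot_at(seconds_since_genesis):
    """Slot that contains the given time."""
    return int(seconds_since_genesis // SECONDS_PER_SLOT)


def seen_in_time(block_slot, arrival_seconds_since_genesis):
    """True if the block arrived before the attesters of its own slot vote."""
    slot_start = block_slot * SECONDS_PER_SLOT
    deadline = slot_start + ATTESTATION_DEADLINE
    return slot_start <= arrival_seconds_since_genesis < deadline


# Hypothetical arrivals: the block of slot 5 shows up at t = 72 s, which is
# already the start of slot 6, at the same moment as the block of slot 6.
print(slot_at(72), seen_in_time(5, 72), seen_in_time(6, 72))
```

A block that only arrives at the start of the following slot is late for its own attesters, while the next proposer's block arriving at that same moment is timely.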
Despite the buffer, it was recently discovered that a late proposer could maliciously “ex ante” re-org the proposer in the next slot. Meanwhile, proposer boost was suggested to resolve the possibility of split views with balancing attacks, and applied to ex ante re-orgs as well. Using proposer boost, attesters give a weight boost to the proposer of the current slot they are attesting in, in essence preferring timely proposals. We’ll see how this works out in practice in the following.
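As a first intuition, the following Python sketch shows how the same pair of competing blocks can look different to an attester with and without proposer boost. The function `preferred_block`, the weights and the boost value are all made up for illustration; in particular, the numbers are chosen so that the late block is ahead when no boost is applied.

```python
def preferred_block(candidates, current_slot, boost, boost_enabled):
    """Pick the candidate with the highest score.

    candidates: dict label -> (block_slot, attesting_weight, arrived_on_time)
    The boost is only granted to a timely block proposed in the current slot.
    """
    def score(label):
        block_slot, weight, on_time = candidates[label]
        if boost_enabled and on_time and block_slot == current_slot:
            weight += boost
        return weight

    return max(candidates, key=score)


# Hypothetical view at slot 10: a late block from slot 9 competes with the
# timely block of slot 10.
view = {"late": (9, 10, False), "timely": (10, 0, True)}
print(preferred_block(view, current_slot=10, boost=70, boost_enabled=False))
print(preferred_block(view, current_slot=10, boost=70, boost_enabled=True))
```

With the boost switched off the attester prefers the late block; with the boost switched on the timely proposal wins.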
So how did it fork?
Let’s see how the fork happened first, before we break it down step-by-step. We represent blocks as rectangles and attestations (votes) as circles: the larger the circle, the more attestations voting for a specific block. You can see the voting weight accumulate over time as blocks receive more and more attestations.
Roughly, blocks 74 and 75 appeared at the same time, creating a fork. A string of proposers made blocks building on 75, while the fork of block 74 was accumulating more weight than the competing branch. Eventually, proposer 82 built on 74, ending the fork at the cost of re-orging blocks 75 to 81. Each step is now carefully analysed to see how this happened.
The fork step-by-step
At slot 73, attesters of slot 73 vote for block 73, which arrived on time. So far, so good.
At slot 74, no block showed up, so attesters of slot 74 are voting for the block at slot 73 instead, increasing its weight.
Blocks 74 and 75 both show up around the same time, at the start of slot 75, as block 74 is late. Proposer boost is designed to give more weight to 75, so that attesters at slot 75 prefer block 75 rather than block 74, since block 75 is timely in their view. However, not all attesters were using proposer boost, so the votes are split almost 50-50.
Attesters running a client without proposer boost activated prefer block 74.
Attesters running a client with proposer boost activated prefer block 75.
We can see the split clearly as votes accumulate for both sides of the fork, with non-boosted attesters preferring block 74 and boosted attesters preferring block 75. It turns out that slightly more attesters didn’t have proposer boost activated, so the upper fork (block 74) has slightly more weight than the lower fork (block 75).
Now the question is: given that block 74 had more weight than block 75, why did proposer 76 build their block on top of 75?
The reason is subtle and has to do with a client behaviour that is being updated at the time of writing. Proposers are expected to run the fork choice before proposing, with the fork choice's current slot set to their own slot. However, the proposer at 76 was using a fork choice computation run with the current slot set to the previous slot. Proposer boost applies to the proposer of the current fork choice slot. Hence, proposer 76 was incorrectly boosting proposer 75 in their view of the current head, and selected block 75 as the parent of their block. From the point of view of proposer 76:
Attesters on 75 + Proposer boost on 75 > Attesters on 74
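The inequality can be checked with a small Python sketch. The weights (51 and 49) and the boost (70) are hypothetical numbers picked to mirror the near 50-50 split; `proposer_head` is an illustrative helper, not client code.

```python
def proposer_head(attesting_weight, fork_choice_slot, boost):
    """Return the block slot the proposer considers the head.

    attesting_weight: dict block_slot -> weight of the votes for that block
    The boost goes to the block whose slot equals `fork_choice_slot`.
    """
    def score(block_slot):
        bonus = boost if block_slot == fork_choice_slot else 0
        return attesting_weight[block_slot] + bonus

    return max(attesting_weight, key=score)


weights = {74: 51, 75: 49}  # hypothetical attesting weights after slot 75

# Fork choice still set at slot 75: block 75 keeps its boost and wins.
stale = proposer_head(weights, fork_choice_slot=75, boost=70)
# Fork choice set at slot 76: no block of slot 76 exists yet, so no boost.
fresh = proposer_head(weights, fork_choice_slot=76, boost=70)
print(stale, fresh)
```

Under these assumptions the stale computation selects block 75 as the parent, while a computation set at slot 76 would have selected block 74.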
Meanwhile, attesters at slot 76 keep splitting their votes between the heaviest chain without the boost (with head block 74) and the heaviest chain with the boost (with head block 76).
The scenario repeats until slot 81. Notice that all attesters from slots 75 to 81 who are not using the proposer boost keep voting for block 74, increasing its weight.
Finally, a proposer comes in who does not apply the proposer boost on the previous slot to decide where to build their own block. Proposer 82 sees block 74 as heavier than block 81. While block 81 inherits the weight of all blocks behind it, remember that in every slot from 75 to 81 slightly more attesting weight voted for 74 than for the competing branch, so the total for block 74 exceeds the combined total for blocks 75 to 81.
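The whole sequence from slot 75 to slot 82 can be replayed with a toy Python model. Votes are assigned exactly as narrated above rather than derived from a full fork choice, and every number (100 units of attesting weight per slot, a 51/49 split, a boost of 70) is a hypothetical choice for illustration.

```python
COMMITTEE = 100                    # attesting weight per slot (hypothetical)
NO_BOOST = 51                      # weight voting without proposer boost
WITH_BOOST = COMMITTEE - NO_BOOST  # weight voting with proposer boost
BOOST = 70                         # proposer boost weight (hypothetical)


def replay(first_slot=75, last_slot=81):
    weight_on_74 = 0      # votes accumulated by block 74
    weight_on_branch = 0  # votes accumulated by blocks 75..81 combined
    parents = {}
    for slot in range(first_slot, last_slot + 1):
        # Attesters of `slot` split between the two sides of the fork.
        weight_on_74 += NO_BOOST
        weight_on_branch += WITH_BOOST
        proposer = slot + 1
        # Proposers 76..81 use a stale fork choice that still boosts the
        # block of `slot`; proposer 82 does not.
        stale_boost = BOOST if proposer <= last_slot else 0
        if weight_on_branch + stale_boost > weight_on_74:
            parents[proposer] = slot
        else:
            parents[proposer] = 74
    return parents, weight_on_74, weight_on_branch


parents, on_74, on_branch = replay()
print(parents)
print(on_74, on_branch)
```

In this model each of the proposers 76 to 81 extends the boosted branch, while proposer 82, comparing 357 units on block 74 with 343 units on blocks 75 to 81, builds on block 74.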
There is no confusion for attesters at slot 82 here:
Those who do not use proposer boost clearly see 82 as winning, for the same reason why block 82 built on block 74: block 74 is heavier than block 81.
Those who use proposer boost even more so see block 82 as winning. The votes are no longer split.
Finally, the chain resumes its course, with proposer 83 building on block 82. By this point it is clear that blocks 75 to 81 have been effectively re-orged.
Takeaways
The re-org highlights a failure case of the dynamically available chain, one that is in the realm of the theoretically possible but practically unthinkable, much like long re-orgs in Proof-of-Work are possible but rarely seen in practice (barring adversarial behaviour). So it is important to recognise that the contributing factors to the present re-org are purely accidental:
A late block can always happen; there is no way around that. The dynamically available chain is in principle designed to deal with this eventuality fairly, so that more timely proposers see their blocks accepted in the canonical chain.
But it is clear that even changes which appear to be “local only” (as the fork choice computation is) need to be considered in the larger picture of the consensus. Ethereum protocol researchers have become familiar with the idea of “split views” between validators, where one set of validators sees something locally and another set sees something else, and how these split views can contribute to delaying liveness. It should have been recognised that an uneven rollout of proposer boost had the potential to create such a split view. It was made even worse by a known implementation fault.
The issue would not have happened were all validators running the same configuration! In particular, it would not have happened post-Merge, as all validators must hard-fork to the Merge updates prior to the Merge, or be entirely kicked out of the consensus.
PoS Ethereum is a hybrid consensus, designed for robustness in adversarial environments. A recent thread by Sreeram Kannan highlights well the challenges that come with it:
Known limits of the consensus, also mentioned in the research Sreeram links in his thread, are driving protocol changes. While it is deeply unsatisfactory when such failures occur, they do not (in my own opinion) invalidate the approach that Ethereum is taking. But they do ask us to tread carefully, and create a protocol that is not only robust in theory but also in practice.
Many thanks to all the researchers and developers who helped make sense of this issue. Specifically, Paul Hauner detected the re-org within minutes, then triaged and diagnosed the cause of the issue over the next hour in a group chat with other consensus client developers. Thanks to Martin Köppelmann for publicly raising the issue, Jacek Sieka for providing me with data supporting the present analysis, Potuz for his own analysis, Terence Tsao for his thread that quickly gave insights to the community, Michael Sproul for additional data, and Caspar and Francesco for their whiteboarding skills on a late Wednesday evening.
Diagram plotting code available here.