Why Blockchain Systems Use Checkpoints for Recovery and Finality
Table of Links
Abstract and 1. Introduction
-
Key Concepts
2.1 Append-Only Log and 2.2 Virtual Machine State
2.3 Transactions As Curried Functions
2.4 Natural Names of State
2.5 Ground Truth
2.6 Efficient Representations of State
2.7 Checkpoints
2.8 Execution Parameters: callData
2.9 Execution Ordering
2.10 Deciding on the Correct State
-
Ideal Layer 2 Design
3.1 VM Job Queue and Transaction Order Finality
3.2 Data Availability and Garbage Collection
3.3 State Finality
3.4 Checkpoint Finality
-
Conclusion and References
A. Discrepancy Detection Security Parameters
2.7 Checkpoints
Replaying all transactions from genesis is expensive, and as a blockchain system accumulates history the cost quickly becomes prohibitive for new participants wishing to join. The storage cost of the blockchain's blocks grows linearly over time, and for Ethereum, maintaining any sort of availability for old trie state can become quite expensive. In the past, Bitcoin performed a sort of uber-commit by releasing new clients with “checkpoints”, where the state at some block height is built into the client. Blocks with lower block numbers can be safely discarded, since everyone has a state (the checkpoint state) from which to replay newer transactions. Bitcoin has since removed checkpoints because they were viewed as creating confusion and misconceptions about the security model.
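
The replay argument can be illustrated with a minimal sketch. It assumes a toy model rather than any real client: state is a dictionary of balances, a transaction is a `(sender, receiver, amount)` tuple, and `apply_tx`, `replay`, `GENESIS`, and `LOG` are hypothetical names introduced only for this example.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]  # (sender, receiver, amount): a toy transaction format


def apply_tx(state: State, tx: Tx) -> State:
    """Return the state after one transfer; an invalid transfer leaves it unchanged."""
    sender, receiver, amount = tx
    if amount < 0 or state.get(sender, 0) < amount:
        return dict(state)
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def replay(start: State, txs: List[Tx]) -> State:
    """Apply the transactions in logged order, starting from the given state."""
    state = dict(start)
    for tx in txs:
        state = apply_tx(state, tx)
    return state


GENESIS: State = {"alice": 10, "bob": 0, "carol": 0}
LOG: List[Tx] = [
    ("alice", "bob", 4),
    ("bob", "carol", 3),
    ("alice", "carol", 1),
    ("carol", "bob", 2),
]

# Assumed checkpoint position: the state after the first two log entries.
CHECKPOINT_HEIGHT = 2
checkpoint_state = replay(GENESIS, LOG[:CHECKPOINT_HEIGHT])

# A new participant holding only the checkpoint state and the newer entries
# reaches the same state as one who replays everything from genesis.
state_from_genesis = replay(GENESIS, LOG)
state_from_checkpoint = replay(checkpoint_state, LOG[CHECKPOINT_HEIGHT:])
```

In this toy model the entries before `CHECKPOINT_HEIGHT` are never consulted when starting from `checkpoint_state`, which is why they can be discarded.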
\
In addition to reducing the entry cost for joining the system, checkpointing reduces the cost of record keeping for existing participants. This is a “meta” level of finality: the effect of transactions that landed prior to the checkpoint cannot be disputed, since the records associated with them are likely to be unavailable.
\
Beyond the cost of entry and ongoing operational costs, checkpoints are also used in conjunction with governance mechanisms for catastrophic error recovery. Many blockchain systems that have experienced problems, e.g., massive token losses due to bugs in critical blockchain code or smart contracts, have resorted to hard forks to recover from such errors despite the affected transactions having reached state finality. Such recovery works by reverting to the checkpoint state and (optionally) re-executing transactions in their original logged order, using fixed versions of the code. Checkpointing too soon, if records older than the checkpoint are not kept around, would prevent this kind of recovery.
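
A minimal sketch of why checkpoint timing matters for recovery, under loudly stated assumptions: the "bug" is a hypothetical missing balance check, the transaction format is a toy `(sender, receiver, amount)` tuple, and every name below is invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Tx = Tuple[str, str, int]  # (sender, receiver, amount), a toy format
ApplyFn = Callable[[State, Tx], State]


def buggy_apply(state: State, tx: Tx) -> State:
    """Hypothetical bug: no balance check, so a sender can overdraw."""
    sender, receiver, amount = tx
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def fixed_apply(state: State, tx: Tx) -> State:
    """Fixed version: a transfer that would overdraw is aborted."""
    sender, _receiver, amount = tx
    if state.get(sender, 0) < amount:
        return dict(state)
    return buggy_apply(state, tx)


def replay(start: State, txs: List[Tx], apply_fn: ApplyFn) -> State:
    state = dict(start)
    for tx in txs:
        state = apply_fn(state, tx)
    return state


GENESIS: State = {"alice": 5, "mallory": 0}
LOG: List[Tx] = [("mallory", "sink", 100), ("alice", "bob", 2)]  # entry 0 exploits the bug

# Checkpoint taken before the exploit: the exploit is still in the retained log,
# so re-execution with the fixed code aborts it.
early_recovered = replay(GENESIS, LOG[0:], fixed_apply)

# Checkpoint taken too soon after the exploit, with older records discarded:
# the checkpoint state already contains the exploit's effect, and replaying the
# retained suffix with fixed code cannot undo it.
late_checkpoint = replay(GENESIS, LOG[:1], buggy_apply)
late_recovered = replay(late_checkpoint, LOG[1:], fixed_apply)
```

Here `early_recovered` no longer contains the overdraft, while `late_recovered` still carries it, because the record needed to reinterpret the exploit was discarded along with everything older than the checkpoint.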
\
2.7.1 Checkpoints vs Long-Range Attacks
\
Note that checkpoints address a different problem than long-range attacks. In long-range attacks, we are worried about exposure of old cryptographic signing keys used by past consensus committee members in a PoS design [2]. Such members may have exited the ecosystem, and their old keys may no longer be handled carefully; worse, such past members may rationally decide to auction their signing keys on the dark web, since they no longer hold any tokens and have nothing to lose. Such keys are useless for presenting bogus information to blockchain participants who know the current committee composition and have been tracking transactions and committee elections. However, consider a threat model where a Rip van Winkle victim wakes up after a period of inactivity and is somehow placed in an Inception-style virtual world or faux information bubble. That bubble filters out legitimate information about a PoS blockchain’s current state and instead makes available only information constructed to make a forked chain appear legitimate, a fork made feasible by a super-threshold number of exfiltrated or compromised keys of members of an old consensus committee [2].
\
The Inception information bubble is an interesting threat model. If applied to PoW blockchains, a victim cannot know if the chain that they see is indeed the longest chain—the “longest” predicate amounts to a universal quantifier, and global information is needed. Estimates for how long the chain should be might be feasible if there is trusted time, but that’s probabilistic in nature: block production rate depends on both protocol parameter changes and the number of active miners, which has more to do with the economic attractiveness for participating (relative to all other investments) than with computation power limitations. Furthermore, such a length estimate only applies to the whole chain segment since the victim fell asleep and does not help much with the “longest” predicate since the fork can be quite recent.
\
Without Inception-like powers to mount eclipse attacks [6], an adversary should be unable to confuse potential victims. A potential victim can verify that they have a current view of the blockchain: they just securely query N sources for the hashes of recent blocks on the chain. If a majority, M, of these hashes are on the same chain, and the sources reporting them are honest, not eclipsed, and have continued to be blockchain observers, the potential victim will be able to distinguish the global consensus chain from a forked chain created with long-range leaked keys. Here M ≤ N is a security parameter, which can be chosen so that the Inception-esque adversary would have to compromise prohibitively many more keys (or their holders) than just those of old consensus committee members: any blockchain observer can witness recent blocks, and thus N can be much larger than the size of a consensus committee.
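
The N-source check described above might be sketched as follows. This is an illustration under assumptions, not a protocol specification: block hashes are plain strings, secure transport to the sources is taken for granted, and `is_consensus_view` is a hypothetical helper name.

```python
from typing import Iterable, List


def is_consensus_view(candidate_chain: Iterable[str],
                      reported_hashes: List[str],
                      m: int) -> bool:
    """Accept the candidate chain if at least M of the N reported recent-block
    hashes lie on it. M must be a majority of N (assumption taken from the text)."""
    n = len(reported_hashes)
    if not (2 * m > n and m <= n):
        raise ValueError("M must satisfy N/2 < M <= N")
    on_chain = set(candidate_chain)
    agreeing = sum(1 for h in reported_hashes if h in on_chain)
    return agreeing >= m


# Two views that share a prefix and diverge after block "h2" (toy hashes).
consensus_chain = ["h1", "h2", "h3", "h4"]
forked_chain = ["h1", "h2", "f3", "f4"]

# Recent-block hashes reported by N = 7 sources; two of them are compromised
# or eclipsed and report blocks from the fork.
reports = ["h4", "h4", "h3", "h4", "f4", "h3", "f4"]
M = 5

accepts_consensus = is_consensus_view(consensus_chain, reports, M)
accepts_fork = is_consensus_view(forked_chain, reports, M)
```

In this toy run only two of seven sources vouch for the fork, so an adversary would need to control at least M sources, on top of the leaked committee keys, for the forked view to pass.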
\
2.7.2 Zero-day Attacks / Common-mode Failures
\
Checkpointing is intended for handling relatively recent zero-day attacks, where a super-threshold number of the current consensus committee members have been compromised or a newly discovered vulnerability in the rollup software is being exploited. We do not envision it being useful for handling attacks that went unnoticed for a long time, since any actual means of addressing them will be complicated. The cascading causal relationship of newer transactions depending on the output states of older transactions is likely to lead to an explosion of transactions that are aborted under a new interpretation of their effects even though they had successfully committed before.
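
The cascade can be made concrete with a small sketch. It assumes, purely for illustration, that each logged transaction declares the state keys it reads and writes; the tuple format and the `affected_transactions` helper are hypothetical. The propagation is deliberately conservative: a key stays tainted even if a later transaction overwrites it.

```python
from typing import List, Set, Tuple

# (transaction id, keys read, keys written): an assumed toy log format
LoggedTx = Tuple[str, Set[str], Set[str]]


def affected_transactions(log: List[LoggedTx], bad_tx_id: str) -> List[str]:
    """Return ids of transactions whose effects may change if `bad_tx_id`
    is reinterpreted: the bad transaction itself plus every later transaction
    that reads a key written by an already-affected transaction."""
    tainted_keys: Set[str] = set()
    affected: List[str] = []
    for tx_id, reads, writes in log:
        if tx_id == bad_tx_id or (reads & tainted_keys):
            affected.append(tx_id)
            tainted_keys |= writes
    return affected


LOG: List[LoggedTx] = [
    ("t1", {"a"}, {"b"}),
    ("t2", {"b"}, {"c"}),
    ("t3", {"x"}, {"y"}),
    ("t4", {"c", "y"}, {"z"}),
    ("t5", {"z"}, {"a"}),
]

cascade = affected_transactions(LOG, "t1")
```

Even in this five-entry log, reinterpreting `t1` touches four transactions; the longer an attack goes unnoticed, the more of the log falls inside the cascade.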
\
The design decision is whether to perform checkpointing at all, and if so, how old—in real time, block numbers, etc—must a finalized transaction be before it might be included in a checkpoint. This decision is essentially the blockchain version of a statute of limitations in many legal systems. Unlike normal statutes of limitations that specify a time limit that is specific to the type of crime—and for some crimes there are no limits—here the checkpoint is global in scope: all transactions, regardless of which contracts they might be associated with, have to be treated the same way.
\
The checkpoint’s state / value association is a temporal barrier: transactions earlier than the checkpoint have “checkpoint finality”, since records that would enable their replay in the alternate bug-fixed environment are unavailable. Note that, unlike transaction order finality and state finality, which typically occur when their log entries are made (i.e., at log finality), checkpoint finality could occur long after a state is identified as a checkpoint candidate. For example, a system could log that a state will become the next checkpoint once the blockchain’s block height reaches a certain value (which is expected to occur in about six months, say) or when a quorum of time oracles attest that a certain date has been reached. The log entry makes the decision irrevocable, but checkpoint finality does not necessarily occur until the other gating conditions are met.
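
The two example gating conditions could be modeled as in the sketch below. The field names, the `has_checkpoint_finality` function, and the representation of oracle attestations as booleans are all assumptions made for illustration; the paper does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)  # frozen: once logged, the candidate cannot be altered
class CheckpointCandidate:
    state_root: str                          # identifier of the candidate state
    logged_at_height: int                    # height at which the decision was logged
    gate_height: Optional[int] = None        # finality once this height is reached
    gate_oracle_quorum: Optional[int] = None # or once this many time oracles attest


def has_checkpoint_finality(candidate: CheckpointCandidate,
                            current_height: int,
                            oracle_attestations: Sequence[bool] = ()) -> bool:
    """True once any configured gating condition is met."""
    if candidate.gate_height is not None and current_height >= candidate.gate_height:
        return True
    if (candidate.gate_oracle_quorum is not None
            and sum(oracle_attestations) >= candidate.gate_oracle_quorum):
        return True
    return False


candidate = CheckpointCandidate(
    state_root="0xabc",        # placeholder value
    logged_at_height=1_000,
    gate_height=2_000,
    gate_oracle_quorum=3,
)

at_logging = has_checkpoint_finality(candidate, current_height=1_000)
after_height_gate = has_checkpoint_finality(candidate, current_height=2_000)
after_oracle_gate = has_checkpoint_finality(
    candidate, current_height=1_500, oracle_attestations=[True, True, False, True]
)
```

The candidate is fixed at logging time, yet `at_logging` is false: the decision is irrevocable before checkpoint finality itself has occurred.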
\
Separating state finality from checkpoint finality to handle the possibility of catastrophic failures introduces risk for those who need to take actions external to the rollup based on state finality. External actions cannot always be rolled back. We believe that this can be handled using an insurance model: for example, an insurance policy based on the type of external action and risk profile could be offered to make participants whole, should a checkpoint/replay invoked due to a catastrophic failure cause the proper external action to change.
\
:::info
Authors:
(1) Bennet Yee, Oasis Labs;
(2) Dawn Song, Oasis Labs;
(3) Patrick McCorry, Infura;
(4) Chris Buckland, Infura.
:::
:::info
This paper is available on arXiv under the CC BY 4.0 (Attribution 4.0 International) license.
:::
\