How Layer 2 Rollups Handle Data Availability and Garbage Collection Efficiently

16 Apr 2025

Abstract and 1. Introduction

  2. Key Concepts

    2.1 Append-Only Log and 2.2 Virtual Machine State

    2.3 Transactions As Curried Functions

    2.4 Natural Names of State

    2.5 Ground Truth

    2.6 Efficient Representations of State

    2.7 Checkpoints

    2.8 Execution Parameters: callData

    2.9 Execution Ordering

    2.10 Deciding on the Correct State

  3. Ideal Layer 2 Design

    3.1 VM Job Queue and Transaction Order Finality

    3.2 Data Availability and Garbage Collection

    3.3 State Finality

    3.4 Checkpoint Finality

  4. Conclusion and References

A. Discrepancy Detection Security Parameters

3.2 Data Availability and Garbage Collection

The availability of the callData and of the state representation are both important. From the transactions and their ordering, we can reconstruct the state value. From a finalized state, earlier callData is not needed except to recover from catastrophic failures; from a checkpoint finalized state, earlier callData is no longer accepted.
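
To make the reconstruction property concrete, the following is a minimal sketch (not from the paper) of replaying ordered callData from a finalized checkpoint; `State`, `CallData`, and `apply_tx` are hypothetical stand-ins for the rollup's actual state type and transition function:

```python
from dataclasses import dataclass
from typing import Callable, List

State = dict  # assumption: an opaque key/value view of VM state

@dataclass
class CallData:
    sender: str
    payload: bytes

def replay(checkpoint_state: State,
           ordered_calldata: List[CallData],
           apply_tx: Callable[[State, CallData], State]) -> State:
    """Fold each transaction over the checkpoint state, in the finalized
    order, to recover the current state value."""
    state = checkpoint_state
    for tx in ordered_calldata:
        state = apply_tx(state, tx)
    return state
```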

While the callData is typically stored on-chain, the state representation is too large to store on-chain, so cryptographic commitments are used instead. That does not mean, however, that transaction callData has to be on-chain.

One scenario is that an external, trusted, fair scheduling service could be authorized to determine transaction order, with the job schedule stored in an external content-addressed storage (CAS) system with availability guarantees, so that the only on-chain data is a cryptographic hash commitment. The external highly available storage would have to have service-level agreements that keep the data accessible at least until it ages beyond checkpoint finality.
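
As an illustration of that commitment pattern, here is a minimal sketch in which an in-memory dict stands in for the external CAS and SHA-256 is assumed as the commitment hash; neither detail is specified by the paper:

```python
import hashlib
import json

CAS: dict = {}  # stand-in for the external, highly available CAS service

def commit_schedule(schedule: list) -> bytes:
    """Serialize the job schedule, store it off-chain in the CAS, and return
    the hash commitment that would be the only data posted on-chain."""
    blob = json.dumps(schedule, sort_keys=True).encode()
    commitment = hashlib.sha256(blob).digest()
    CAS[commitment] = blob
    return commitment

def fetch_and_verify(commitment: bytes) -> list:
    """Retrieve the schedule from the CAS and check it against the on-chain
    commitment before trusting the transaction order it describes."""
    blob = CAS[commitment]
    if hashlib.sha256(blob).digest() != commitment:
        raise ValueError("CAS returned data that does not match the commitment")
    return json.loads(blob)
```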

The scheduler could run as a decentralized application on a separate blockchain. In this case, the scheduling algorithm / code is open for inspection and audit and does not have to be trusted; its execution would derive integrity from the blockchain on which it runs, and its data availability would derive from the availability of that blockchain as well.

Rather than using a separate blockchain, the scheduler could run on the rollup VM, with the traditional mempool used only for submitting callData to the scheduler via job-submission transactions. In such a design, the callData and transaction order would simply be stored as scheduler state in the existing decentralized data availability layer. The rollup executors would have access to the storage service that holds this state, and no external highly available data storage service would be needed.

With this kind of integration, the normal state garbage collection mechanism handles reclaiming pre-checkpoint callData: the scheduler can keep the callData in its persistent storage and remove these entries only after a checkpoint has occurred, as sketched below.
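
A rough sketch of this integration, under the assumption that the scheduler keys callData by an assigned sequence number; the interface names are illustrative, not the paper's:

```python
class SchedulerState:
    """Scheduler running on the rollup VM; its persistent state doubles as
    the availability layer for callData and transaction order."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.calldata: dict = {}  # sequence number -> raw callData bytes

    def submit(self, calldata: bytes) -> int:
        """Job-submission transaction: record the callData and fix its
        position in the execution order."""
        seq = self.next_seq
        self.calldata[seq] = calldata
        self.next_seq += 1
        return seq

    def garbage_collect(self, checkpoint_seq: int) -> None:
        """Reclaim pre-checkpoint entries; callData at or after the latest
        checkpoint must remain available for recomputation."""
        for seq in [s for s in self.calldata if s < checkpoint_seq]:
            del self.calldata[seq]
```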

3.3 State Finality

State finality determination is where replicated computation occurs, and where efficiency and computation integrity appear to be diametrically opposed goals.

Using reasonable security parameter estimates (see the calculations in Appendix A), we see that if zkSNARK proof generation costs more than about 25× the cost of normal execution, then it will have essentially no replication efficiency advantage, even ignoring the costs of the replicated proof verification. Even if proof generation could be made faster, it increases the time to state finality as compared to DD/DR, where the rollup committee members execute in parallel. If we are willing to move away from EVM compatibility for new smart contracts, then, depending on the contract language and runtime environment design, it should be feasible to allow smart contracts to execute at (near) native code speeds with comparatively less engineering effort than proof generation/verification, making the comparison even more one-sided.
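
Read as a back-of-the-envelope relation, the comparison looks roughly like the sketch below. The ~25× break-even figure comes from the security parameter estimates in Appendix A (not reproduced here), so the committee size used is an assumption for illustration only:

```python
def dddr_cost(exec_cost: float, committee_size: int = 25) -> float:
    """Replicated-execution (DD/DR) work: every committee member re-executes
    the batch in parallel, so latency is ~one execution but total work is
    committee_size executions."""
    return committee_size * exec_cost

def zk_cost(exec_cost: float, proof_overhead: float) -> float:
    """zk-rollup work: one execution plus proof generation, ignoring the
    replicated proof-verification cost (as in the text)."""
    return exec_cost + proof_overhead * exec_cost

# At ~25x proof-generation overhead the two are already comparable, and any
# higher overhead erases the replication-efficiency advantage:
print(dddr_cost(1.0), zk_cost(1.0, 25.0))  # 25.0 vs 26.0
```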

More importantly, the DD/DR approach is straightforward to analyze and its implementation is much simpler. Both are critical traits for practical security: users need to understand why they should trust a system; more importantly, complexity is antithetical to correct, auditable implementations.

Absent a breakthrough in proof generation performance, a DD/DR scheme implemented between the commitchain committee and the bridge contract provides a good design choice for security, efficiency, and generality.

Note that just as there is a verifier’s dilemma for ZK and optimistic rollups, there is a parallel problem in replicated execution / voting schemes. Execution piggybacking is a potential problem, where an executor does not bother to compute their own resultant state but simply re-uses the results from another (hopefully honest) executor. To address this, we could try to do something like commit/reveal for results, or impose a confidential-compute requirement on the rollup. Except for trusted-computing-style mechanisms, such approaches are insufficient: a group of rational executors can collude to save costs. Fortunately, as long as there is at least one altruistic executor to cross-verify, the DD/DR approach works.
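
A minimal commit/reveal sketch for the mitigation mentioned above (illustrative only; as noted, it does not stop a colluding group of rational executors):

```python
import hashlib
import os

def commit(result: bytes) -> tuple:
    """Executor posts H(result || salt) before seeing anyone else's result."""
    salt = os.urandom(32)
    return hashlib.sha256(result + salt).digest(), salt

def verify_reveal(commitment: bytes, result: bytes, salt: bytes) -> bool:
    """After all commitments are in, reveals are checked against them, so
    simply copying another executor's published result is not free."""
    return hashlib.sha256(result + salt).digest() == commitment

# Usage: commit first, reveal (result, salt) only after all commitments are posted.
c, salt = commit(b"resultant-state-root")
assert verify_reveal(c, b"resultant-state-root", salt)
```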

3.4 Checkpoint Finality

DD/DR handles independent failures, e.g., through bribery of key personnel, bugs in custom deployment infrastructures, etc. The existence of zero-day vulnerabilities / bugs means that there are also common-mode failures. The “traditional” way to address such common-mode failures, especially when their exploitation was large scale, is to hard fork and change a checkpoint’s hard-wired mapping from a natural name to a state value/representation.

We don’t have a better solution, just (hopefully) better terminology / clearer analysis: checkpoints must lag the current head of the blockchain significantly to permit detection of and recovery from common-mode failures, but not so far as to make the long-term high-availability data retention too costly. This is a balancing act, and the appropriate value depends on the estimated likelihood of common-mode failures, the business requirements of dependent contracts, etc.

When to perform a checkpoint should be formalized, made part of the system design, and be subject to change by governance. Storage reclamation / garbage collection and catastrophic fault handling are intimately connected. The checkpoint state representation and all newer transaction callData must be available to allow recomputation of transaction results in case of common-mode catastrophic faults, but storage for older state representations or callData can be safely reclaimed. For this to work, the servers in the decentralized data availability layer must be aware of rollup state, e.g., act as light clients and observe checkpoint determination events themselves, or rely on witnesses that relay the information.
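
The retention rule in the last paragraph can be summarized in a small sketch; the light-client / witness interface below is hypothetical:

```python
class RetentionPolicy:
    """A data availability server's view of what may be garbage collected."""

    def __init__(self) -> None:
        self.checkpoint_height = 0  # latest checkpoint finality event observed

    def on_checkpoint_event(self, height: int) -> None:
        """Invoked when the server, acting as a light client (or relying on a
        trusted witness), observes a checkpoint determination event."""
        self.checkpoint_height = max(self.checkpoint_height, height)

    def should_retain(self, item_height: int) -> bool:
        """Keep the checkpoint state representation and all newer callData;
        anything older may be safely reclaimed."""
        return item_height >= self.checkpoint_height
```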

Authors:

(1) Bennet Yee, Oasis Labs;

(2) Dawn Song, Oasis Labs;

(3) Patrick McCorry, Infura;

(4) Chris Buckland, Infura.


This paper is available on arxiv under ATTRIBUTION 4.0 INTERNATIONAL license.