This Study Uses Markov Models to Break Blockchain Fairness

1 Jul 2025

Authors:

(1) Krishnendu Chatterjee, IST Austria, Austria ([email protected]);

(2) Amirali Ebrahimzadeh, Sharif University of Technology, Iran ([email protected]);

(3) Mehrdad Karrabi, IST Austria, Austria ([email protected]);

(4) Krzysztof Pietrzak, IST Austria, Austria ([email protected]);

(5) Michelle Yeo, National University of Singapore, Singapore ([email protected]);

(6) Ðorđe Žikelić, Singapore Management University, Singapore ([email protected]).

Table of Links

Abstract and 1. Introduction

1.1 Related Work

Preliminaries

2.1 System Model

2.2 Selfish Mining Objective

2.3 Markov Decision Processes
Selfish Mining Attack

3.1 Overview

3.2 Formal Model

3.3 Formal Analysis

3.4 Key Features and Limitations
Experimental Evaluation
Conclusion, Acknowledgments, and References

A. NAS Mining Objectives

B. Efficient Proof Systems

C. Proof of Theorem 3.1

ABSTRACT

We study selfish mining attacks in longest-chain blockchains like Bitcoin, but where the proof of work is replaced with efficient proof systems – like proofs of stake or proofs of space – and consider the problem of computing an optimal selfish mining attack which maximizes expected relative revenue of the adversary, thus minimizing the chain quality. To this end, we propose a novel selfish mining attack that aims to maximize this objective and formally model the attack as a Markov decision process (MDP). We then present a formal analysis procedure which computes an 𝜖-tight lower bound on the optimal expected relative revenue in the MDP and a strategy that achieves this 𝜖-tight lower bound, where 𝜖 > 0 may be any specified precision. Our analysis is fully automated and provides formal guarantees on the correctness. We evaluate our selfish mining attack and observe that it achieves superior expected relative revenue compared to two considered baselines.

In concurrent work [Sarenche FC’24] does an automated analysis on selfish mining in predictable longest-chain blockchains based on efficient proof systems. Predictable means the randomness for the challenges is fixed for many blocks (as used e.g., in Ouroboros), while we consider unpredictable (Bitcoin-like) chains where the challenge is derived from the previous block.

1 INTRODUCTION

Bitcoin. Blockchain protocols were proposed as a solution to achieve consensus over some states (e.g., financial transactions) in a distributed and permissionless (i.e., everyone can participate in securing the chain) setting. Participants in blockchain protocols add data into blocks, which are then appended to the blockchain with some probability that depends on the underlying consensus protocol. The earliest and most commonly adopted consensus protocol is proof of work (PoW), which also forms the basis of the Bitcoin blockchain protocol [21].

Proof of work. The parties that maintain a PoW blockchain like Bitcoin are called miners. The general idea is that in order to add a block to the chain, the miners derive a computationally hard but easily verifiable puzzle from the tip of the chain. To add a block to the chain, this block must contain a solution to this puzzle. This mechanism ensures that attacking the chain, in particular rewriting past blocks in a double spending attack, is computationally very expensive. In Bitcoin, the puzzle is a hashcash style PoW [2], with parameters being a difficulty level 𝐷 and a global hash function (e.g., SHA256). A newly generated block can only be added to the chain if the hash of the new block, the previous block, the miner’s public key and some nonce give a value that is less than the current difficulty 𝐷. To mine a block in Bitcoin, miners will continuously generate different nonces and hash them until they find a nonce that passes the threshold. The lucky miner that first finds a block broadcasts it across the network, where other miners can easily verify the validity of the nonce. The Bitcoin protocol specifies that miners should always work towards extending the longest chain they are aware of, hence such chains are called “longest-chain blockchains”. An alternative approach is chains based on Byzantine Fault Tolerance like Algorand [6] where randomly rotating committees approve blocks to be added.

Efficient proof systems. As an alternative to PoW, several other consensus protocols based on efficient proof systems have been proposed [8, 9, 23]. Generally, we say a “proof of X” is efficient if computing a proof for a random challenge is very cheap assuming one has the resource X. The most popular and best investigated efficient proofs are Proofs of Stake (PoStake) as e.g., used in Ouroboros [9] or post-merge Ethereum. Here the coins recorded in the blockchain are the resource. Another proposal of an actual physical resource are Proofs of Space (PoSpace) [10], where the resource is disk-space. The first proposal for a PoSpace based chain was Spacemint [23]. The first deployed chain was the Chia network [8], it uses PoSpace in combination with verifiable delay functions (VDFs) to address some security challenges, and thus is referred to as a Proofs of Space and Time (PoST) based chain. See Appendix B for more details of blockchains based on efficient proof systems that we consider in our work.

Selfish mining. There are various security properties we want from longest-chain blockchains, the most important ones being persistence and liveness [16]. Informally, persistence means that entries added to the chain will remain there forever, while liveness means that the chain remains available. A less obvious requirement is fairness, in the sense that a party that contributes a 𝑝 fraction of the resource (hashing power in PoW, space in PoST, staked coins in PoStake) should contribute a 𝑝 fraction of the blocks, and thus get a 𝑝 fraction of the rewards.

It was first observed in [11] that Bitcoin is not fair in this sense due to selfish mining attacks. In selfish mining, attackers mine blocks but occasionally selectively withhold these (so the honest miners cannot mine on top of those) to later release them, overtaking the honest chain and thus orphaning honest blocks. In doing so, the attackers can reduce the chain quality of the blockchain [11], this measure quantifies the fraction of blocks contributed by an adversarial miner

Although the analysis of selfish mining and its impact on chain quality is well-studied in Bitcoin and other PoW-based blockchains [11, 19, 27, 31], a thorough analysis of optimal selfish mining attacks in blockchains based on efficient proof systems is difficult due the fact that the computational cost of generating proofs is low in these systems. This gives rise to several issues that in the PoStake setting are generally referred to as the “nothing-at-stake” (NaS) problem. [1]

The security vs predictability dilemma. Assume one would naïvely replace PoW in Bitcoin with an efficient proof system like PoStake. As computing proofs is now cheap, an adversarial miner can try to extend blocks at different depths (not just on the tip of the longest chain), growing private trees at each depth. If they manage to create a tree of depth 𝑑 that starts less than 𝑑 blocks deep in the public chain, releasing the longest path in this tree will force the honest miners to switch to this path. Such “tree growing” attacks can be prevented by diverting from Bitcoin-like protocols, and instead of using the block at depth 𝑖 to derive the challenge for the block at depth 𝑖 + 1, one uses some fixed randomness for a certain number of consecutive blocks. Unfortunately, this creates a new security issue as an adversary can now predict when in the future they will be able to create blocks [3, 5]. We note that this approach is used in the PoStake-based Ouroboros [9] chain, where one only creates a fresh challenge every five days. An extreme in the other direction is the PoSpace-based Spacemint [23] chain or an early proposal of the PoST-based Chia chain [7] which, like Bitcoin, derive the challenge from the previous block (the deployed Chia design [8] uses a fresh challenge every 10 minutes, or 32 blocks).

(1) (Model) Although there have also been some very recent works analysing selfish mining attacks in blockchains based on efficient proof systems [8, 12, 14, 19, 28, 29], these analyses so far have only focused on predictable protocols.

(2) (Methodology) Furthermore, they either consider different adversarial objectives [8, 12, 14, 29] or use deep reinforcement learning to obtain selfish mining strategies [19, 28]. However, deep reinforcement learning only empirically maximizes the objective and does not provide formal guarantees on the quality (lower or upper bound) of learned strategies.

Our approach. In this work, we present the first analysis of selfish mining in unpredictable longest-chain blockchains based on efficient proof systems. Recall that in such chains the challenge for each block is determined by the previous block. At each time step, the adversary has to decide whether to mine new blocks or to publish one of their private forks which is longer than the public chain. Our analysis is concerned with finding the optimal sequence of mining and fork reveal actions that the adversary should follow in order to maximize the expected relative revenue (i.e., the ratio of the expected adversarial blocks in the main chain when following a given strategy compared to the total number of blocks in the chain). Note that this is a challenging problem. Since at each time step the adversary can choose between mining new blocks and publishing any of its private chains, the strategy space of the adversary is exponential in the number of privately mined forks which makes the manual formal analysis intractable.

To overcome this challenge, we model our selfish mining attack as a finite-state Markov Decision Process (MDP) [26]. We then present a formal analysis procedure which, given a precision parameter 𝜖 > 0, computes

• an 𝜖-tight lower bound on the optimal expected relative revenue that a selfish mining strategy in the MDP can achieve, and

• a selfish mining strategy achieving this 𝜖-tight lower bound.

At the core of our formal analysis procedure lies a reduction from the problem of computing an optimal selfish mining strategy under the expected relative revenue objective to computing an optimal strategy in the MDP under a mean-payoff objective for a suitably designed reward function. While we defer the details on MDPs and mean-payoff objectives to Section 2, we note that solving meanpayoff finite-state MDPs is a classic and well-studied problem within the formal methods community for which efficient (polynomialtime) algorithms exist [15, 26] and there are well-developed tools that implement them together with further optimizations [18, 20]. These algorithms are fully automated and provide formal guarantees on the correctness of their outputs, and the formal analysis of our selfish mining attack naturally inherits these desirable features.

Contributions. Our contributions can be summarized as follows:

(1) We study selfish mining in unpredictable efficient proof systems blockchain protocols where the adversary’s goal is to maximize expected relative revenue and thus minimize chain quality.

(2) We propose a novel selfish mining attack that optimizes the expected relative revenue in unpredictable blockchain protocols based on efficient proof systems. We formally model the attack as an MDP.

(3) We present a formal analysis procedure for our selfish mining attack. Given a precision parameter 𝜖 > 0, our formal analysis procedure computes an 𝜖-tight lower bound on the optimal expected relative revenue in the MDP together with a selfish mining strategy that achieves it. The procedure is fully automated and provides formal guarantees on its correctness.

(4) Our formal analysis is flexible to changes in system model parameter values. For instance, it allows us to tweak system model parameters and study their impact on the optimal expected relative revenue that selfish mining can achieve in a fully automated fashion and without the need to develop novel analyses for different parameter values. This is in stark contrast to formal analyses whose correctness is proved manually, see Section 3.4 for a more detailed discussion.

(5) We implement the MDP model and the formal analysis procedure and experimentally evaluate the quality of the expected relative revenue achieved by the computed selfish mining strategy. We compare our selfish mining attack to two baselines: (1) honest mining strategy and (2) a direct extension of the PoW selfish mining strategy of [11] to the setting of blockchains based on efficient proof systems. Our experiments show that our selfish mining strategy achieves higher expected relative revenue compared to the two baselines.

A Remark on the Model. While the selfish mining attacks analyzed in this paper apply to PoStake, PoSpace and PoST based longest-chain blockchains, they capture PoST better than PoStake and PoSpace as we will shortly outline now. Due to the use of VDFs, PoST is “strongly unpredictable” in the sense that a miner cannot predict when they will be able to extend any block, while PoStake and PoSpace are “weakly unpredictable” in the sense that a miner can predict when they will find blocks on top of their own (but not other) blocks. Moreover, in PoST a malicious miner must run a VDF on top of every block they try to extend, while in PoStake or PoSpace this comes basically for free. The class of selfish mining attacks analyzed in this paper does not exploit “weak unpredictability” (a necessary assumption for PoST, but not PoStake or PoSpace) and, to be able to give automated bounds, we also assume a bound on the number of blocks one tries to extend, which is a realistic assumption for PoST (as each block requires a VDF) but less so for PoSpace or PoStake.

Optimal selfish mining strategies. There have been some works that go beyond proposing and analysing specific selfish mining strategies to actually claiming the optimality of these strategies. The work of [27] modelled the Bitcoin protocol as an MDP and proposed a method using binary search to solve it approximately in order to find an optimal selfish mining strategy. [31] improved the search method in [27], resulting in a method that finds an optimal strategy but with an order of magnitude less computation. [19] and [28] use deep reinforcement learning to automate discovery of attack strategies in Bitcoin and Nakamoto-PoStake, respectively, but without guarantees on optimality. Finally, [13] suggested modelling mining strategies in PoStake as an MDP and using an MDP solver to find optimal selfish mining strategies. However, due to the infinite size of their MDP, finding even an approximately optimal solution is undecidable.

This paper is available on arxiv under CC BY 4.0 DEED license.

[1] We will slightly abuse terminology in our work and continue to use the terms “mining” and “miners” from PoW-based chains also when discussing chains based on efficient proof systems even though when using PoStake this is sometimes referred to as “proposing” and “proposers” (but it is not coherent, some works like [13, 14] use mining) while in PoSpace it was suggested to use the term “farming” and “farmers”.

← Previous

Mathematical Proofs for Truthful Rebate Mechanisms (TFRM)

Up Next →

A Model for Blockchain Mining Under Adversarial Conditions