Authors:
(1) Chenhao Shi, Shanghai Jiao Tong University;
(2) Ruibang Liu, Shanghai Jiao Tong University;
(3) Guoqiang Li†, Shanghai Jiao Tong University.
Table of Links
Abstract
The communities of blockchains and distributed ledgers have been stirred up by the introduction of zeroknowledge proofs (ZKPs). Originally designed to solve privacy issues, ZKPs have now evolved into an effective remedy for scalability concerns and are applied in Zcash (internet money like Bitcoin). To enable ZKPs, Rank-1 Constraint Systems (R1CS) offer a verifier for bi-linear equations. To accurately and efficiently represent R1CS, several language tools like Circom, Noir, and Snarky have been proposed to automate the compilation of advanced programs into R1CS. However, due to the flexible nature of R1CS representation, there can be significant differences in the compiled R1CS forms generated from circuit language programs with the same underlying semantics. To address this issue, this paper uses a data-flow-based R1CS paradigm algorithm, which produces a standardized format for different R1CS instances with identical semantics. By using the normalized R1CS format circuits, the complexity of circuits’ verification can be reduced. In addition, this paper presents an R1CS normalization algorithm benchmark, and our experimental evaluation demonstrates the effectiveness and correctness of our methods.
Index Terms—Zero-knowledge proof; Rank-1 constraint systems; Data flow graph; ZKP programming; Normalization
I. INTRODUCTION
Zero-knowledge proofs (ZKPs) are increasingly recognized for their importance in modern cryptography [1], as one and more cryptographic communities seek to address some of the blockchain’s most significant challenges: privacy and scalability. It is also the essential technique in Zcash [2], [3]. From both user’s and developer’s perspectives, the heightened emphasis on information privacy and security has led to a greater appreciation for the privacy advantages offered by zero-knowledge proofs. As decentralized finance (DeFi) usage grows, zero-knowledge applications that provide scalability and privacy advantages will have more opportunities to increase industry-wide adoption. However, not all computational problems can be directly addressed using zero-knowledge proofs. Instead, we must transform the issue into the correct form of computation. The rank-1 constraint system (R1CS) describes the execution of statements written in high-level programming languages and is used by many ZKP applications, but there is no standard way of representing them [4]. Circom is a novel domain-specific language for transforming computational problems into R1CS format circuits.[5] In the specific process of a first-order zero-knowledge proof, we first convert the problem into a computational problem in Circom, then into R1CS format circuits.
Due to the flexible nature of R1CS representation and variation of program organizations and compiler optimization levels, there can be significant differences in the compiled R1CS forms generated from circuit language programs with the same underlying semantics, which leads to difficulties in further ZKP program analysis and verification.
This paper proposes a data-flow-based algorithm for generating normalization of R1CS, enabling the conversion of different R1CS constraints into a normal form, facilitating the determination of equivalence and correctness. To achieve this, the algorithm starts by transforming an R1CS into a data flow graph structure resembling an expression tree. It then segments and abstracts the data flow graph, eliminating differences between equivalent R1CS constraints that may arise from the generation process. Finally, sorting rules are proposed to sort the constraints and variables within R1CS, ultimately resulting in a unique normal form for equivalent R1CS.
Moreover, we classify and summarize the reasons and characteristics of the different equivalent R1CS generated, based on the constraint generation logic of mainstream compilers and the expressiveness of R1CS. In addition, based on the identified reasons for producing equivalent R1CS, we create a relatively complete benchmark. Our proposed algorithm, which can pass all test cases in the benchmark, demonstrates that equivalent R1CS can be converted into a unique and identical canonical form under various circumstances.
This work contributes to R1CS optimization by providing a novel algorithm for generating canonical forms of equivalent R1CS constraints. Our algorithm can eliminate unnecessary redundancy and normalize representation, thus it can improve existing methods and facilitates the analysis of equivalence and correctness. Furthermore, the effectiveness and practicality of the proposed algorithm are demonstrated through our comprehensive benchmark.
Related Work Eli et al. design, implement, and evaluate a zero-knowledge succinct non-interactive argument (SNARG) R1CS [6]. Historically, research investigating the factors associated with R1CS has focused on satisfiability. In paper[7], where the prover incurs finite field operations to prove the satisfiability of an n-sized R1CS instance [7]. Alexander et al. introduce Brakedown, the first built system that provides linear-time SNARKs for NP [8].
Considering Circom and R1CS format circuits as two languages before and after compilation, research on the generation of the R1CS paradigm is more akin to research on semantic consistency in the compilation. Currently, the patent applications and research papers propose ideas and solutions for generating compilation paradigms in other languages, mainly exploring data flow [9], syntax tree [10], or semantic mapping [11] aspects. These studies offer crucial insights into the fundamental information semantically identical programs entail in the compilation process. However, due to the inherent constraints embedded within the R1CS form, this paper ultimately elects to use data flow as a starting point for research. Paper Organization The paper is organized as follows: In the next section, a brief preliminary review of zero-knowledge proof and related tools. Section 3 provides the process of the proposed algorithm in this paper. The technical exposition in Section 4 explains in detail the logic of the critical steps and their formal description. Section 5 presents the specific categories of benchmarks and their corresponding experimental results. Lastly, Section 6 concludes the present study.
This paper is available on arxiv under CC BY 4.0 DEED license.