This paper is available on arxiv under CC BY 4.0 license.

**Authors:**

(1) Zhibo Xing, School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China, and the School of Computer Science, The University of Auckland, Auckland, New Zealand;

(2) Zijian Zhang, School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China, and Southeast Institute of Information Technology, Beijing Institute of Technology, Fujian, China;

(3) Jiamou Liu, School of Computer Science, The University of Auckland, Auckland, New Zealand;

(4) Ziang Zhang, School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China;

(5) Meng Li, Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education; School of Computer Science and Information Engineering, Hefei University of Technology, 230601 Hefei, Anhui, China; Anhui Province Key Laboratory of Industry Safety and Emergency Technology; and Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology)

(6) Liehuang Zhu, School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, 100081, China;

(7) Giovanni Russello, School of Computer Science, The University of Auckland, Auckland, New Zealand.

**TABLE OF LINKS**

Zero-Knowledge Proof-Based Verifiable Machine Learning

Existing Scheme , Challenges and Future Research Directions

Conclusion, Acknowledgment, and References

## IV. EXISTING SCHEME

We have summarised the existing schemes associated with ZKP-VML in table IV. For the existing schemes, we first analyse them according to their technical route in subsection IV-A, and then analyse their performance and application scenarios in subsection IV-B.

It is worth mentioning that in the existing work we do not consider general zero-knowledge proof schemes such as DIZK[58], zk-Authfeed[59], etc. that can be used for machine learning and many other scenarios. Also not considered are schemes such as Drynx[60], GOPA[61], etc. that include only a small portion of zero-knowledge proofs in their schemes (e.g., using zero-knowledge proofs to prove the correctness of adding noise in differential privacy). Some schemes such as Ju [62] and Ghaffaripour [63] that have only theoretical analysis without experimental validation are also omitted.

Meanwhile, some non-zero-knowledge schemes based on the sumcheck protocol are considered. One aspect is that they are well suited for outsourced learning scenario, which is an integral part of verifiable machine learning. On the other hand, because these schemes can be zero-knowledge through simple modifications, their introduction actually broadens the possible technological routes.

*A. Technical Route Analysis*

To solve the challenges mentioned in figure 5, existing work has primarily concentrated on two main aspects: 1) the arithmetization and quantization of the machine learning computation process, and 2) the reduction of computational costs and enhancement of proving efficiency. According to their different focuses, the existing schemes can be can be classified into two main categories and three subcategories:

• Transforming machine learning into zero-knowledge proof.

• Improving the efficiency of zero-knowledge proof,

– through optimized proof of specific equation.

– through approximate description of specific computation.

– through changes of specific algorithm structure.

We have also further subdivided the existing schemes according to the above classification, as shown in figure 6.

- Transforming into zero-knowledge proof: This type of work aims to bridge the gap between zero-knowledge proof systems and the machine learning processes, allowing for more practical verification of machine learning procedures. With the exception of schemes designed specifically to address generalizability, most ZKP-VML schemes have also proposed their own simple solutions. We will first present these simple solutions as an introduction to the problem.

In the context of converting decimals to integers, the key concern revolves around determining the mapping method that minimizes the impact of precision loss on model performance, given a specific number of variable bits. On one hand, the range of integers that can be represented by a given number of variable bits is fixed. This constrains decimals from exceeding this representation range after mapping. On the other hand, the converted integer computational process should ideally have as many significant bits as possible to reduce the influence of computational precision on the final model accuracy. In SafetyNets, it deals with computation processes in the form of zi = wiyi + bi1 T . Initially, it ensures that all decimal weights and input data fall within the range [− p−1 2 , p−1 2 ].

Subsequently, these data undergo multiplication by factors, such as wi = ⌊βw’i⌉ and xi = ⌊αx’i⌉, while bi is set as bi = ⌊α 2 i−1 β (2i−1+1)b’i⌉. This approach achieves an integer computation process. Similarly, in the case of VeriML, for each input is constrained to have at most l bits of decimal points. To achieve the integer computation, the inputs are multiplied by 2 l , and appropriate scaling factors are applied in the equations to prevent inconsistent amplification. Ruckel achieve the transformation for numbers with up to x decimal places by multiplying the floating-point numbers by 10x . zkCNN employs the approach proposed by Jacob et al. [64], which involves transforming a real number a into an integer q using a real number L and an integer Z such that a = L(q − Z). Additionally, modifications are made to the addition and multiplication operations to guarantee the correctness of the transformed constraints. These methods

share similarities because they address a common challenge. The challenge arises when considering a simple computation process involving only addition, subtraction, multiplication, and division operations. The constraints of this process can be flattened into several simple binary equations, such as a+b = c or a ∗ b = c. Constraints involving subtraction and division can be equivalently expressed through inverse operations. A naive scaling approach is to identify the largest number within all the computation equations. Based on the upper bounds of the expression ranges, a scaling factor can be determined. By appropriately scaling and truncating both sides of the equation, the integer constraints for the original computation process can be obtained.

Next, we will introduce three solutions that aim to efficiently address the issue of generality. Compared to the previous solutions, these approaches not only consider more practical problems in specific zero-knowledge proof system, but also improve performance and efficiency, allowing future ZKPVML schemes to directly or indirectly utilize them as arithmeticization modules.

Weng et al. proposed Mystique [65], an efficient conversion solution between zero-knowledge proofs and machine learning. Mystique is based on the existing sVOLE-based zeroknowledge proof protocol QuickSilver [66] and gives efficient conversion schemes between arithmetic/boolean values, committed/authenticated values and fixed-point/floating-point values. The first conversion aims at switching circuit types to improve efficiency based on specific computations. The second conversion aims at allowing publicly committed data to be simply used in zero-knowledge proof scheme. The third conversion aims at solving the inconsistency between floating point numbers used in machine learning algorithms and fixedpoint number used in cryptographic algorithms. In addition, the scheme is also optimized for matrix multiplication in terms of proof cost. It is remarkable that the above solutions are integrated into Rosetta [67], a privacy-preserving framework based on TensorFlow [68], which means that developers can simply call these interfaces and ignore the cryptographic details involved.

Feng et al. presented ZEN [69], a compiler for zero-knowledge verifiable neural network accuracy and inference. ZEN can compile pyTorch model to R1CS constraints. ZEN has two key components: one is R1CS friendly quantization, and the other is stranded encoding of R1CS constraints. The quantization algorithm converts the floating-point neural network model into an arithmetic circuit (R1CS constraints) over a finite field. In this quantization algorithm, ZEN avoids the costly bit-decompositions due to two complex operations comparison and division by two optimization methods, sign-bit grouping and remainder-based verification.

And by stranded encoding, ZEN encodes several low-precision unsigned integers in quantized neural networks as finite field elements on an elliptic curve, reducing the number of constraints compared to the previous one-to-one encoding approach. Based on these optimization methods, ZEN can reduce the R1CS constraint by 5.43∼22.19× compared with a general integer-arithmeticonly neural network baseline [64] when generating verifiable neural network inferences.

Kang et al. [70] tailored the ImageNet-scale model MobileNet v2[71] with the zero-knowledge proof scheme halo2 [72] to obtain an ImageNet-scale zk-SNARK circuit. Verifying the division operation in the circuit is expensive, to solve this problem, two optimizations are applied to the Plonkish arithmetization [73] used by halo2 [72]. In linear layers (convolutional layers, residual connection layers, fully connected layers), two custom gates are designed to represent division. And as for the non-linear layers (ReLU, softmax), the lookup argument is applied to reduce the representation cost of division. Privacy for the input and weights of the model is achieved through hash commitments. In addition, three protocols based on this scheme are proposed for verifying model accuracy, prediction results, and searching itemspredicate matches, respectively. Compared with other schemes (including ZEN, vCNN, pvCNN, zkCNN), this scheme improves the proving time on MobileNet by at least ten times.

The three aforementioned solutions start from specific zero-knowledge proof systems and delve into the crucial issues of transforming machine learning into these particular zero-knowledge proof systems. This significantly enhances the usability of these zero-knowledge proof systems in the context of machine learning verification.

There is another issue lies in the transformation process of a specific and widely used machine learning model, neural network. In neural networks, whether matrix multiplication or convolution operations, these can be represented by breaking down into simple additions and multiplications. However, activation functions pose a special challenge. Most activation functions are nonlinear, and this nonlinearity cannot be directly represented by arithmetic circuits, making the proof of neural network computations difficult. Handling the activation function problem in neural networks is relatively straightforward, with three main approaches. One approach is to use a simple activation function, such as the quadratic activation function y = x 2 used in SafetyNets. This directly avoids complex computations but may impact the network’s performance to some extent due to poor activation functions. The second approach is to approximate complex functions with polynomials.

For example, VeriML uses the Remez approximation for the sigmoid function and employs quadratic functions to approximate the ReLU function. zkMLaaS utilizes the method proposed by Sav et al. [74] to minimize the least squares error when approximating activation functions. However, this method can also affect model performance due to the precision loss from approximation. The third approach involves directly representing activation functions through more complex operations. For instance, zkCNN employs a bit decomposition method to calculate the ReLU function directly in a piecewise manner. Fan constructs an operational matrix, where the input of the ReLU activation layer and the operation matrix undergo a Hadamard product to obtain the output of the ReLU layer. This method minimally affects model performance but introduces additional computational and design overhead.

To provide a more intuitive illustration of the three methods for handling activation functions, we conducted specific experiments to demonstrate the impact of different processing techniques on the model accuracy and circuit complexity. First, we consider the differences in approximation accuracy among various approximation methods. Next, we examine the impact of different approximation methods on model accuracy. Finally, we assess the additional costs introduced by different approximation methods during the proof process. We selected a neural network (NN) using the sigmoid activation function and a convolutional neural network (CNN) using the ReLU activation function for our study. Table III provides an overview of the performance of these two activation functions under different processing methods.

From the experimental results, it can be observed that while the bit decomposition method can directly represent the ReLU activation function in arithmetic circuits, thereby avoiding precision loss caused by the activation function, it significantly increases the number of constraints associated with this function due to the construction of logical comparison. This increase in constraints leads to a higher circuit complexity. For many other activation functions, such as sigmoid, softmax, tanh, and others, which involve complex exponential and trigonometric operations that cannot be straightforwardly represented through addition and multiplication, bit decomposition is not a widely applicable representation method. In contrast, approximating the target activation function using polynomials demonstrates greater versatility. The results show that higher polynomial orders result in smaller precision losses but also entail more constraints. In this experiment, the chosen task and network architecture are relatively simple, resulting in minor precision loss due to approximation.

However, in larger tasks and models, the precision loss associated with this approximation method can increase significantly, posing the challenge of balancing precision loss with circuit complexity. Furthermore, when approximating activation functions, the specific shape and characteristics of the function should be taken into account. For example, when using a seconddegree polynomial to approximate the S-shaped curve sigmoid function, a first-degree polynomial is obtained, leading to a significant drop in model accuracy because a straight line is used in the activation function. In addition, the experiment compared various approximation methods, which exhibit taskand model-dependent performance without a clear superior approximation method. Finally, we tested the performance of the quadratic activation function. Although it performed well in the experimental task with relatively low circuit complexity, its adaptability as a non-mainstream activation function in different models and tasks needs to be considered.

*2) Improving the efficiency of zero-knowledge proof:* This type of work performs optimizations on specific algorithms or models based on the properties of the equation relationships or algorithmic structures involved. There are three common ideas for optimization under this type of work. One approach is to modify the proving format or to utilize new representation methods within existing zero-knowledge proof schemes, thereby reducing the cost of proving algorithms involving specified computations. Another is to achieve a better efficiency at the cost of a slight equivalence through certain approximate expressions. The other is to change the structures of algorithm or arithmetic circuit, thus reducing the cost by aggregating and disaggregating.

Optimization **through optimized proof of equation** targets to changing the constraints or representations of specific computation in a zero-knowledge proof schemes, thus most of the computational costs in zero-knowledge proofs can be significantly reduced. For specific computations, there are two main strategies to reduce the proof burden and enhance proof efficiency. On one hand, existing optimization techniques can be employed to optimize the computation, reducing its computational complexity. Proof can then be provided for the optimized computation process. On the other hand, the alignment of the computation process with the zero-knowledge proof scheme can be considered. By designing an appropriate constraint and circuit representation, the circuit size corresponding to the computation process can be reduced, subsequently lowering the proof overhead. Additionally, modifications can be made to existing interactive proof schemes by incorporating cryptographic techniques to imbue them with

zero-knowledge properties.

Ghodsi et al. raised SafetyNets [75], an interactive proof protocol for verifiable execution of a class of neural networks. It is worth mentioning that SafetyNets is the first scheme on verifiable machine learning. Although the concept of zero knowledge is not involved in this scheme, this is because in the model assumptions both the inputs and the model are given by the verifier, so there is no privacy issue. In SafetyNets, efficient verification of neural network computation is achieved by designing GKR protocols based on the equation relations of matrix multiplication. But unfortunately, for activation functions and pooling layers, SafetyNets can only support specific quadratic activation functions and sum pooling, making it relatively impractical. However, it still provides a technical route to efficient verification by constructing GKR protocol for specific computations.

Collantes et al. proposed SafeTPU [76], a verifiable and Trojan resilient hardware accelerator for deep neural networks. SafeTPU bases on the protocol of SafetyNets. By enhancing the original protocol and using an efficient, parallelized hardware implementation, SafeTPU triples the efficiency of the original protocol. For the sparse matrices involved in convolution operations, SafeTPU enhances the original protocol by using more efficient sparse matrix-vector multiplication. By reusing the hardware architecture of convolution, SafeTPU replaces the original software implementation with a hardware implementation that can efficiently perform matrix-vector multiplication in deeper parallels while saving resource area. These two make SafeTPU triple faster compared to the original baseline protocol.

Liu et al. proposed zkCNN [77], a zero knowledge proof scheme for convolutional neural networks based on GKR protocol. It is worth mentioning that the building block GKR protocol it uses is not zero-knowledge per se, but can be converted to zero-knowledge by a zero-knowledge polynomial commitment [44]. zkCNN improves the efficiency of zeroknowledge proof schemes by optimizing the proof cost of convolutional computation in convolutional neural network models. Liu proposed a new GKR protocol for checking the computation of the fast fourier transform (FFT). And based on the properties of the FFT, a protocol for two-dimensional convolution is constructed by using the above protocol, whose additional prover time is surprisingly O(n 2 ), even faster than computing the convolution. Lastly, they made the protocol non-interactive by the Fiat-Shamir transformation. In addition, to improve the performance of GKR on CNN, they also proposed several improvements and generalizations. As the performance, compared to vCNN [78] and ZEN [69], zkCNN is 11.2× faster and 213× faster on LeNet, respectively.

Fan et al. [79] also focus on convolutional computation, converting the computations therein into a simple arithmetic expression in matrix form. For the convolution layer, the 3D convolution is represented using a 2D matrix by the im2col method, which in turn transforms the convolution calculation into the equivalent matrix multiplication. The pooling layer also uses the im2col method, which reduces the 3D data to a 2D representation. The activation functions ReLU and Softmax are also expressed in the form of matrix multiplication. Where ReLU is represented as the input matrix multiplied by a matrix with elements 0 or 1, and Softmax is represented as the output matrix multiplied by a vector of summations of exponents. Further, all the matrix computation in the convolutional neural network can be optimized by the Freivalds’s algorithm [80], which greatly increases the efficiency of setup and proof generation.

Lee et al. introduced vCNN [78], a framework for verifiable convolutional neural network based on zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs). Lee extends the original zk-SNARKs scheme based on quadratic arithmetic programs (QAPs) to a zk-SNARKs scheme based on quadratic polynomial program (QPP) based on the properties of convolutional computation. The way to express vectors by polynomials makes it more adaptable and efficient for proofs of convolutional computations. For the pooling and activation layers, vCNN still retains the QAP-based zk-SNARK. And the proofs about the continuity between the two proofs are generated by the commit and prove SNARK (CP-SNARK) to connect the adjacent layers. In this way, the QAP-based and QPP-based zk-SNARKs proves the correctness of the intra-layer computation, while the CPSNARK proves the continuity of the computation of each layer, and finally the vCNN proves the correctness of the entire convolutional neural network computation. Theoretically, vCNN has a certain improvement in proving time compared to schemes such as SafetyNets [75], VeriML [81], and Embedded proof [82]. Meanwhile, experimental results show that vCNN is 18000 times more efficient on the VGG16 model compared to the original zero-knowledge proof scheme Groth16 [83], the latter of which takes more than a decade to generate a proof.

Inspired by vCNN, Weng et al. further proposed pvCNN [84], also a framework for verifiable convolutional neural network. The main innovation of this paper is the circuit representation method, which proposes a zk-SNARKs scheme based on quadratic matrix program based on the QPPbased method proposed by vCNN. By further expanding the representation capability of the wire in the circuit from array to matrix, pvCNN reduces the size of the circuit by reducing the number of multiplication gates in convolutional operation, thereby improving efficiency. In addition, since the neural network is layered, multiple proofs for different inputs of the same CNN layer can be aggregated into one proof with SnarkPack [85]. To ensure the confidentiality of the model and data, the authors also propose a fully homomorphic encryption strategy based on splitting the model. By splitting the model into a private part running locally at the model developer and a public part running on the server, the computational overhead of fully homomorphic encryption and privacy requirements are balanced. In terms of performance, the scheme is compared theoretically with SafetyNets, zkCNNs, vCNNs, etc., and significantly outperforms the above schemes in terms of proving time. And the experimental results also show that QMP-based zk-SNARKs has higher efficiency than the QAP-based for convolution operations.

Zhang et al. [86] proposed a verifiable zero-knowledge proof scheme for decision tree prediction and accuracy. For a decision tree model, to verify the output, a prior commitment to the decision tree by the prover is required, and then the prover proves the validity of the prediction path to the verifier. Whereas converting each comparison on the prediction path into an arithmetic circuit is very expensive, to improve efficiency, the authors reduce the generation cost of proofs by inserting designed sibling nodes on the prediction paths. And the proof is generated under Aurora [87], a kind of zero knowledge succinct non-interactive argument (SNARG).

Singh et al. [37] presented a zero-knowledge verifiable scheme for decentralized artificial intelligence (AI) pipelines, containing a privacy-preserving verification scheme for decision tree inference. The distributed AI pipeline assigns the different steps of data collating, model training, and using the model to make predictions to independent actors such as data owner, model owner, and model consumer. Compared with ZKDT, this scheme avoids costly hashing operations by changing the way of representing and committing to the decision tree. And it further reduces the number of multiplication gates in the arithmetic circuit by improving the access method in the arithmetic circuit to reduce the access cost of different operations in the prediction path verification. For the decision tree inference task, the complexity of the circuit generated by this scheme is ten times better than that of zkDT.

Optimization through approximate description of computation targets to replacing the original equivalence relation with an approximate equivalence relation, a higher efficiency gain can be obtained at the cost of some degree of accuracy.

Burkhalter et al. presented RoFL [88], a scheme focusing on verifiable robustness of federated learning. Unlike the previous scheme, the proof part in this scheme does not require strict equivalence relations, but rather constrains the output by using range proofs of the norm. To guarantee the confidentiality of the local model during the aggregation process, RoFL uses a cryptographic algorithm based on the Pedersen promise. For the encrypted local model, RoFL bounds it by L − ∞ norm and L − 2 norm, the boundaries are given by the server in advance. And the proof is generated under Bulletproof [89], a zero-knowledge proof system oriented to range proofs. It is worth mentioning that RoFL is more concerned with the robustness of the system rather than the verifiability of the training process, which is slightly different from the previously described scheme, since RoFL only requires that the training results are reasonable within the norm constraints.

Ruckel et al. [90] presented a scheme for zero-knowledge verification of multiple linear regression. The proving idea is similar to that of RoFL, also by performing range proofs. The difference is that the scheme improves efficiency by using an approximate solution that is computationally less expensive and proves the proximity of that approximate solution to the true solution. For the computation of the inverse matrix in the model update, the authors ensure the correctness of the computation by proving the proximity of the computed inverse matrix to the true inverse matrix through a series of range constraints on the norm. Besides, differential privacy techniques are also applied to protect the updated local weights in order to achieve stronger privacy. Unfortunately, although the scheme is relatively complete in terms of process, the limitations of the optimization method lead to the narrow application scenarios.

Angel et al. introduced Otti [91], a compiler for zkSNARKs that focuses on optimization problems including linear programming (LP), semi-definite programming (SDP), and a broad class of stochastic gradient descent (SGD) instances, which are often used in the training of neural networks. Otti can compile programs written in a subset of C that describe optimization problems into rank-1 constraint satisfiability (R1CS). Otti’s idea is to avoid proving the solving process by proving the optimality of the solution, constructing a non-deterministic checker from the certificate of optimality, and then compiling this checker into R1CS. For the LP and SDP problems, Otti proves optimality by using the properties of the primal and dual solutions to the optimization problem. For the SGD problem, Otti proves optimality by showing that the gradient at the solution has certain properties. With the Spartan proof system, Otti can prove the optimality of the solution with zero knowledge in 100ms, which is four orders of magnitude faster than existing methods.

Optimization through changes of algorithm structure targets to dividing the structure of the algorithm and applying appropriate proof strategies, the proof efficiency can be proportionally improved. However, this approach is primarily based on optimizations related to repetitive structures, such as verifying a subset of rounds from the training process to reduce the number of proof generations. Therefore, it is only applicable to the verification of the training process.

Keuffer et al. [82] proposed a hybrid embedded proof scheme for verifiable computation combining GKR protocol and zk-SNARKs. The goal is to achieve a balance between efficiency and usability by first processing individual functions with efficient verifiable computation schemes (EVCs, such as the GKR protocol, which are more efficient but can only handle relatively simple computations), and then processing sequences of functions with general purpose verifiable computation schemes (GVCs, such as zk-SNARKs, which are less efficient but can theoretically handle most computations).

A neural network can be thought of as consisting of several functions, where the correctness of the computation of each function is guaranteed by the GKR protocol, which means that several proofs of the GKR protocol are generated. And zk-SNARKs not only prove the continuity of inputs and outputs between functions, but also verify the proofs generated by each function. Eventually, it generates a total proof. By verifying the total proof, the verifier can know whether all the specific computations proved by GKR protocol pass or not. Experimentally, it is shown that the embedded proof scheme has twice as good proving time on two-layer neural networks compared to the scheme using only zk-SNARKs. However, although the scheme claims that it protects the privacy of the provers’ inputs, this is irrelevant in the case where the functions and inputs are known to the verifier.

Zhao et al. brought VeriML [81], a framework for integrity and fairness in outsourced machine learning. Since several iterations of the same process are performed during training, VeriML chooses to use several iteration rounds of the training process as a challenge for verification, thus reducing the cost of proving. By storing the input and output of some iterations in advance during the training process and committing to them, the provers can retrieve to the specified iterations and generate proofs of their computational processes as requested by the verifier. VeriML supports a total of six machine learning models, including linear regression, support vector machines, and neural networks. For each kind of model, VeriML also proposes some small optimizations for improving the proving efficiency. In addition, VeriML uses an on-chain protocol to protect the confidentiality of the trained models for fair trading of the models.

Zhang et al. introduced Hydra [92], a verifiable computation system for neural networks based on GKR protocol. It shares the same assumptions as those used by SafetyNets in that zeroknowledge is not considered because the verifier has all the knowledge. Unlike SafetyNets, Hydra does not design GKR protocols for specific computational operations, but rather by using GKR protocols in subcircuits that are split from the original circuit. With the polynomial commitment scheme, Hydra makes the proof process much more concise. Meanwhile, since the subcircuits are independent of each other, generating and verifying them in parallel makes the efficiency multiply. In addition, a bottom-up quantization algorithm is proposed to reduce the impact of integerization on accuracy. Compared to SafetyNets, the Hydra protocol is four times more efficient on neural networks.

Huang et al. proposed zkMLaaS [93], a verifiable scheme for machine learning as a service (MLaaS), which focuses on handling volume issue of input data with the random sampling idea. Proof costs are proportionally reduced by randomly selecting and challenging committed epochs and iterations, which is similar to VeriML. As for the convolution operation in CNNs, the optimization idea is similar to Fan et al. The im2col algorithm [94] is applied to convert the convolution into matrix multiplication and the Freivalds’ algorithm [95] is further utilized to reduce the overhead of matrix multiplication. Compared to simply using zk-SNARKs directly, zkMLaaS saves approximately 273 times the proof overhead.

*B. Functionality and Performance Analysis*

The efficiency and functionality analysis of the existing schemes are listed in Table IV and V.

*1) Functionality Analysis:* Here we focus on analyzing how each property is achieved.

For **Verifiability**, most schemes fulfill this by generating the proof of the computational process. However, certain schemes (RoFL, Ruckel, Otti) that express the optimization efficiency by approximation can only indicate the validity of the result and do not fully prove that the result was obtained by some defined computational process, which somehow compromises the verifiability. In addition, some schemes (VeriML, zkMLaaS) that perform verification by sampling proofs of partial rounds also cannot fully guarantee the correctness of the learning process. If the prover overstates the training workload by 5% (or poisons the training process with 5% ratio, which is the common ratio for poisoning attacks) and forges these proofs, then the verifier only needs to select 60 proofs to have an 95% probability of detecting the mistake. However, if the prover directly edit the weight in one of hundreds of training rounds, it is difficult to detect such malicious behavior with this sampling verification approach. In general, the approach to achieving verifiability involves translating the proof of the training process into a paradigm of an existing proof system or making modifications to an existing proof system to efficiently represent the training process. Subsequently, the corresponding zero-knowledge proof system is used to prove the problem.

For **Zero-Knowledge**, most schemes satisfy this by generating proofs using existing zero-knowledge proof protocol. Zero-knowledge proofs often rely on certain one-way hard problems, such as the discrete logarithm problem, integer factorization problem, and others. These one-way computations ensure that verifiers can easily verify results but find it difficult to compute inputs reversely from the proofs. However, additional zero-knowledge property also adds extra computational overhead to the proof system, some schemes (e.g., SafetyNets, VeriML) directly use proof protocols that are not zero-knowledge such as sumcheck and SNARK. This is due to the fact that the scenarios targeted by this class of schemes do not need to consider zero-knowledge, i.e., the verifier has knowledge of the input dataset and model weights. These schemes have a higher execution efficiency compared to zeroknowledge schemes, but also lead to their limited application scenarios. It is worth mentioning that even for the case where the verifier has the knowledge of the training dataset and the model weights, some schemes (Keuffer, zkMLaaS) could still provide protection for auxiliary inputs provided by the prover, such as hyperparameters, by using a zero-knowledge proof protocol.

For **Auditability**, most schemes achieve this by using some sort of commitment scheme, including those existing (e.g., zero-knowledge polynomial commitment, Pedersen commitment) and self-constructed (e.g., using Posidon hash, pseudorandom function) commitment schemes. By committing to the private input and proving that the data corresponding to the commitment is the same as the one used in the zero-knowledge proof, the validity of the input can be proved, which could

be formalized as ”Commit-and-Prove”[96]. However, some schemes (SafetyNets, SafeTPU, VeriML, Hydra) assume that the verifier has the knowledge of the inputs, so this does not involve the auditability. Besides, those schemes (RoFL, Otti) focus on the validity of the results also ignore the auditability of the inputs.

*2) Effciency Analysis:* We focus here on analyzing those specialized schemes that improve the proof efficiency of a particular computation. We discuss them separately according to the type of computation: matrix multiplication computation, convolution computation, and decision tree computation.

For **matrix multiplication**, we consider C = A · B, where A,B,C are all matrices of size n × n in field Fq. we consider a 2-D multiplication between two matrices A and B both of size n × n as a n × n matrix C = A × B. The complexity of input and output is O(n 2 ) and the computational complexity is O(n 3 ).

SafetyNets designs a dedicated sumcheck protocol for verifying matrix multiplication. By randomly selecting a point Ci,j in the matrix C and representing its computation process in the form of sumcheck protocol (i.e. Ci,j = Σk∈(0,n)A[i, k]B[k, j]), a protocol with computational complexity O(n 2 ) for both the prover and verifier can be obtained, while 4 log(n) elements need to be exchanged in the process.

Several schemes like Mystique, Fan, zkMLaaS reduce the computational complexity of the matrix multiplication check through a reduction strategy following the Freivalds algorithm [80]. Rather than check the equation A×B = C directly, a random challenge vector t ∈ F n q is introduced. The original equation can be checked by multiplying a challenge vector as A × (B · t) = C · t, with an error probability of 1/q and computational complexity of O(n 2 ).

For **matrix convolution**, we consider a 2-D convolution between two matrices X and W of size n × n and w × w (n ≫ w) as a (n − w + 1) × (n − w + 1) matrix U = X ∗ W. The complexity of input and output is O(n 2 + w 2 ) = O(n 2 ) and the computational complexity is O(n 2w 2 ).

zkCNN designs a sumcheck protocol for verifying the Fast Fourier Transform (FFT) (actually a matrix-vector multiplication of size n), reducing the original prove complexity from O(n 2 ) to O(n). Further, by verifying the original convolution in the form of U = X ∗ W = IF F T(F F T(X) ⊙ F F T(W)) with sumcheck protocol for FFT and Inverse FFT (IFFT), the overall prover time can be reduced to O(n 2 ), with O(log2 n) proof size and verifier time.

The vCNN framework introduces a novel Quadratic Polynomial Program (QPP) relation, which has been optimizedfor convolution. This innovative approach involves assigning a polynomial value to each wire, allowing for the expression of 1-D convolution computation using a single multi-gate in the arithmetic circuit. As a result, the number of multi-gates required for proving convolution is reduced from O(n 2w 2 ) to O(n 2 + w 2 ) = O(n 2 ). Building on this, vCNN proposes a QPP-based zk-SNARK that offers a prover and verifier complexity of O(n 2 + w 2 ) = O(n 2 ), along with a constant proof size.

Similar to the idea of vCNN, pvCNN proposes Quadratic Matrix Program (QMP) relation, assigning a matrix value to each wire. For batch convolution operations involving M matrices W and Mn2 matrices X, pvCNN transforms them into a matrix multiplication of size Mn2 . This conversion is achieved through a transformation technique, resulting in a proof complexity of O(M2n 4 ) with a QMP-based zkSNARK. In contrast, under similar conditions, vCNN has a proof complexity of O(M2n 2 (m2 + n 2 )).

For **decision tree,** we consider a binary decision tree T with height h, which has N nodes and an attribute set size of d. To prove the inference of a decision tree, it is necessary to demonstrate the existence of a valid prediction path within this decision tree. For a decision tree with a height of h and nodes containing d attributes, proving a decision path that includes h calculations has a complexity of O(d) per comparison. Therefore, the prover computational complexity is O(hd).

The zkDT achieves a reduction in proof complexity to O(d + h) by constructing an additional input permutation a¯, which is based on the sorting of attributes used in the decision path. This allows the proof of the decision path to be split into two parts: the verification of a¯ along the decision path and the verification of the permutation. Due to the construction of a¯, each node only needs to compare a single attribute. Additionally, validating the permutation relationship between a and a¯ adds only O(d) extra computational complexity. Furthermore, zkDT also provides additional sibling nodes along the prediction path to reduce the complexity of the decision tree verification circuit. Overall, during the commitment phase, the prover needs to perform O(N) hash computations, and the size of the proof circuit is O(d + h). With the Aurora system, the computational complexity of the proof generation is O(|C| log |C|) = O((d + h) log(d + h)) = O(d log d), the verification complexity is O(d + h) = O(d), and the proof size is O(log2 |C|) = O(log2 d).

Singh et al. consider a decision tree as a quintuple composed of the splitting feature V, threshold value T, left and right child node identifiers L, R, and the label C. For a batch of n input samples with d dimensions, this scheme reduces the cost of each sample from O(d log d) to O(d + h) through consistent memory access. Consequently, the size of the entire proof circuit is reduced to O(N + n(d + h + wh)), where w is the bit-width of the feature values. This scheme reduces the cost of commitments by utilizing different representations of decision trees. Then, it further reduces the cost of proofs that would have otherwise increased due to these representations through consistent memory access. As a result, the overall circuit complexity is decreased. In comparison, under the same conditions, the circuit size of zkDT is O(c(H)N +n(dlogh+ hw)), where c(H) represents the size of the hash circuit.

*C. Experimental Analysis*

### V. CHALLENGES AND FUTURE RESEARCH DIRECTIONS

Based on the discussion of existing schemes above, in this section, we will explore some challenges and future research directions regarding the application of zero-knowledge proofs in the field of machine learning. For the ZKP-VML scheme, its main developmental direction revolves around enabling individuals to achieve zeroknowledge proof verification of existing machine learning algorithms with minimal additional costs. This goal encompasses two key aspects. On one hand, the emphasis lies on seamlessly converting a wider variety of algorithmic languages into corresponding arithmetic circuits for proof generation, thus necessitating the construction of a robust compiler. This high generality compiler should empower users to directly transform their existing algorithms into arithmetic circuits without the need for further modifications. On the other hand, the focus is on mitigating the additional costs incurred by zero-knowledge proof generation and verification. This entails devising strategies to reduce the computational overhead associated with generating and verifying zero-knowledge proofs within the system, particularly tailored to diverse models and computational processes. Beyond these two aspects, it is also imperative to contemplate how to integrate the aforementioned two approaches.

Simultaneously, as a consideration towards verifiability, one of the aspects of security in machine learning, exploring the integration of the ZKP-VML scheme with other potential cryptographic techniques constitutes a noteworthy avenue for the advancement of ZKP-VML. This integration aims to provide machine learning with enhanced and comprehensive security safeguards. Finally, in order to apply ZKPVML technology to practical industrial production, it is crucial to address the barriers that hinder its adoption in real-world applications. A. Generalizability A high generality compiler can greatly enhance the usability of the ZKP-VML scheme. As a fundamental function of the compiler, it needs to be capable of translating a wide range of machine learning algorithms into arithmetic circuits for generating zero-knowledge proofs. However, there are challenges to address:

- The operations in zero-knowledge proofs are conducted over a group, requiring the elements to be integers.
- Arithmetic circuits are limited to multiplication and addition operations.

As a result, the compilation of machine learning processes involving complex and decimal computations will inevitably lead to a certain degree of precision loss. This precision loss constitutes a significant challenge in making ZKP-VML more practical. These challenges are not novel and are commonly referred to as the quantization problem in machine learning. Quantization, as a technique to map input values from a large set (often continuous) to a small set (usually discrete), has a long history. Rounding and truncation are typical examples of this process. For the ZKP-VML scheme, what is needed is an integer-arithmetic-only quantization method. Fortunately, there are several well-established solutions for addressing the quantization problem as listed in [97]. However, this is not sufficient for ZKP-VML. For instance, as in the ZEN[69] proposed by Feng et al., it builds upon quantization and achieves R1CS-friendly quantization for QAP-based zeroknowledge proof systems. This approach significantly reduces the number of constraints generated compared to the general integer-arithmetic-only quantization scheme[64] proposed by Jacob et al., thereby reducing the computational burden in the subsequent proof process.

This is just one solution for a class of zero-knowledge proof schemes. Similar approaches can be explored within the framework of other zero-knowledge proof schemes to reduce the number of constraints and alleviate the proof burden when compiling machine learning algorithms. In addition to the research approach mentioned above, which involves transforming the machine learning process into integer numbers, Garg et al. have also proposed another solution to address this issue, namely, a succinct zero knowledge for floating point computations[98].

This provides a new perspective for solving the quantization problem and opens up possibilities for further improving the efficiency of ZKP-VML schemes. Beyond the fundamental functionalities, supporting different zero-knowledge proof systems poses a significant challenge. Existing ZKP-VML schemes largely confine themselves to employing QAP-based zero-knowledge proof systems, which are contingent on substantial trusted setups and carry certain limitations.

Other types of zero-knowledge proof systems exhibit varying efficiency trade-offs and applicability to different scenarios. For instance, recent advancements in schemes like STARK[99] and Plonk[73] have gradually reduced the significance of trusted setups, transitioning to requiring setup only once or even eliminating the need for trusted setup entirely. This development significantly broadens the applicability of ZKP-VML by relaxing constraints on scenarios and assumptions. However, the circuit representation and arithmetization processes vary among different types of zeroknowledge proof schemes.

For instance, the halo2[72] scheme employed by Kang et al.[70] is based on the Plonk scheme’s Plonkish process. The circuit description and constraint checking methods of this scheme differ significantly from those of Groth16. These divergent circuit descriptions and constraint representations introduce complexity to the task of converting machine learning algorithms into arithmetic circuits usable for zero-knowledge proofs. This not only necessitates addressing new challenges but also offers potential new avenues for optimization given the novel constraint formats.

### B. Efficiency

The enhancement of efficiency in ZKP-VML schemes is a pivotal concern in ZKP-VML research. By mitigating the additional computational burden imposed by zero-knowledge proofs on machine learning, the practicality of ZKP-VML schemes can be significantly improved. Elevating scheme efficiency and reducing overhead can primarily be achieved through three avenues:

From the perspective of proof system design, it is possible to tailor proof systems for specific computations in machine learning, thereby diminishing the proof costs for these types of computations.

From a practical computation standpoint, the development of specialized hardware can be explored to alleviate the computational burden of generating zero knowledge proofs.

In the context of overall scheme design, a balance between scheme security and efficiency can be sought, thereby mitigating the overall system costs to a certain extent.

From the perspective of zero-knowledge proof schemes, designing efficient zero-knowledge proof schemes for specific models is a valuable research direction. This approach has been adopted by many existing schemes, wherein specialized zero-knowledge proof schemes are designed for specific computation types or model structures to improve efficiency. However, most existing schemes have mainly focused on traditional linear models, neural network models, and convolutional neural network models. Future research can concentrate on developing dedicated zero-knowledge proof schemes for more models with special properties.

For instance, such schemes could target graph neural networks, specifically handling the graph structure of neural nodes; transformers, dealing with multiple stacked encoders and decoders, and their self-attention mechanisms; recurrent neural networks, addressing the recurrent structures and time-series features. By doing so, ZKP-ML can be more efficiently applied to a broader range of models. In addition, it is possible to construct ZKP-VML schemes by using other types of zero-knowledge proof systems.

Existing ZKP-VML schemes are mostly constructed on QAP-based zero-knowledge proof systems, especially based on Groth16. However, distinct types of zero-knowledge proof systems exhibit diverse cost characteristics, potentially offering performance advantages in various scenarios and computations that QAP-based solutions might lack. From a specific computation perspective, exploring enhanced proof efficiency through lower-level computational logic is a valuable research direction.

On one hand, existing works have attempted to boost the efficiency of proof matrix and vector multiplication by implementing and reusing current protocols via hardware [76]. On the other hand, certain endeavors aim to accelerate ZKP efficiency through hardware design, such as pipeline design to expedite the proof process[100] or utilizing hardware like FPGAs to handle number theoretic transform operations in zero-knowledge proofs[101]. These demonstrate the feasibility of harnessing hardware to enhance the practical operational efficiency of ZKP-VML systems, which is a critical advancement for the real-world application of ZKP-VML.

From the perspective from the entire system, exploring ways to enhance efficiency by sacrificing a small fraction of security to enhance practical usability is also a worthwhile research direction. Some current endeavors address multi-round training processes by selectively verifying only a minimal subset of rounds. This approach covers the vast majority of training steps. For instance, if an adversary attempts poisoning attacks in 20% of the rounds, verifying just 11 rounds can detect such malicious behavior with over 90% probability. However, this method might not identify adversaries engaging in malicious behavior in only a few rounds. Similar attempts involve weakening the equality constraints in the verification process through range proofs to boost efficiency.

As machine learning models and datasets grow in size and complexity, the computational and communication overhead of zero-knowledge proofs can become a bottleneck. Hence, achieving scalability is pivotal for practical adoption in realworld scenarios. Scalability denotes the system’s capacity to efficiently and effectively handle extensive-scale machine learning tasks. Determining the ways to reduce proof costs and striking the right balance between security and efficiency constitute significant challenges in the journey of ZKP-VML toward practical applicability. C. Properties In addition, in future research, more complex scenarios can be considered to make ZKP-VML schemes possess more specific properties and greater application value. By integrating cryptographic techniques related to data privacy protection, ZKP-VML can further enhance its capability to safeguard data privacy. For instance, by combining differential privacy and homomorphic encryption techniques, ZKP-VML can achieve stronger privacy protection properties.

On the one hand, differential privacy can enhance the protection of local data or model privacy by ensuring that participants’ individual characteristics are obfuscated. On the other hand, through homomorphic encryption, verifiers can protect computational tasks by enabling provers to perform tasks using homomorphic computations, without revealing the plaintext content of their computations, which safeguards the privacy of the verifier’s tasks. The integration of ZKP-VML with federated learning is also a hot research topic. For different architectures of federated learning, ZKP-VML can provide assurance of training correctness without compromising the data privacy of the training participants, which may be challenging for other technologies to achieve. It is essential to consider the variations in proof process and data privacy resulting from different aggregation strategies and sample alignment methods under diverse architectures. Moreover, this process can be combined with blockchain technology to offer stronger fairness guarantees through consensus protocols and smart contracts. Considering the communication burden in federated learning, reducing the additional communication overhead caused by the proof may be also a crucial challenge.

Meanwhile, in more diverse scenarios and with the application of advanced technologies, it becomes crucial to consider how newly introduced features might pose potential threats to the integrity or privacy of the original ZKP-VML scheme. As the system becomes more complex and incorporates additional functionalities, it is essential to conduct thorough security assessments to ensure that the integrity and privacy guarantees of the ZKP-VML scheme remain robust and uncompromised in a dynamic and evolving environment. This includes evaluating the impact of new features on data privacy, ensuring data integrity, and identifying potential vulnerabilities that could be exploited by malicious actors.

### D. Practical

Furthermore, beyond scheme design, the practical application of ZKP-VML is a crucial consideration. Currently, ZKP-VML is in the development stage, and its real-world deployment is limited, with most implementations at the demo stage to showcase the performance of the schemes. In implementing ZKP-VML, it is essential to consider its interoperability with existing mainstream solutions, allowing developers to integrate privacy protection features into their applications without requiring significant modifications to their existing workflows. Factors such as the ease of integration, performance impact, and compatibility with different hardware and software environments should also be taken into account. This approach fosters broader adoption of zero-knowledge proofs and paves the way for their practical application across various fields and industries.

Additionally, standardization and community collaboration play a crucial role in promoting knowledge sharing, best practices, and innovation in the ZKP-ML domain. Collaborations among academia, industry, and open-source communities can facilitate the development of standardized protocols, efficient libraries, and benchmarking tools. Such collaborations create a robust support network for researchers and practitioners, enabling them to collectively tackle challenges and drive advancements in zero-knowledge-proof technology. By encouraging open communication and fostering community driven initiatives, standardization efforts can accelerate the adoption of ZKP-ML and propel it to become a reliable and widely accepted privacy-preserving tool in machine learning applications.