This paper is available on arXiv under the CC BY 4.0 DEED license.
Authors:
(1) Ehsan Toreini, University of Surrey, UK;
(2) Maryam Mehrnezhad, Royal Holloway University of London;
(3) Aad Van Moorsel, Birmingham University.
2 Background and Related Work
One of the benefits of auditing ML-based products relates to trust. Trust and trustworthiness (in socio-technical terms) are complicated matters. Toreini et al. [32] proposed a framework for trustworthiness technologies in AI solutions based on existing social frameworks on trust (i.e., demonstration of Ability, Benevolence and Integrity, a.k.a. the ABI and ABI+ frameworks) and technological trustworthiness [30]. They comprehensively reviewed the policy documents on regulating AI and the existing technical literature, and derived that any ML-based solution needs to demonstrate fairness, explainability, auditability, and safety and security to establish social trust. When using AI solutions, one cannot be assured of the fairness of such systems without trusting the reputation of the technology provider (e.g., datasets and ML models). It is commonly believed that leading tech companies do not make mistakes in their implementation [8]; however, in practice, we often witness that such products indeed suffer from bias in ML [28, 23].
Table 1: Features of FaaS and comparison with other privacy-oriented fair ML proposals (support: full: ✓, partial: ✙, none: ✗)
2.1 Fairness Metrics
There exist several fairness definitions in the literature. Designing a fair algorithm requires measuring and assessing fairness. Researchers have worked on formalising fairness for a long time. Narayanan [24] lists at least 21 different fairness definitions in the literature, and this number is growing, e.g., [5, 6].
Fairness is typically expressed as discrimination in relation to data features. These features for which discrimination may happen are known as Protected Attributes (PAs) or sensitive attributes. These include, but are not limited to, ethnicity, gender, age, scholarity, nationality, religion and socio-economic group.
The majority of fairness definitions express fairness in relation to PAs. In this paper, we consider Group Fairness, which refers to a family of definitions, all of which consider the performance of a model at the level of population groups. The fairness definitions in this family focus on keeping decisions consistent across groups and are relevant to both the disparate treatment and disparate impact notions, as defined in [9, 15].
In this paper, we will focus on the computations based on the above three fairness metrics. For this computation, the auditor needs access to three pieces of information for each element in the dataset: (1) the sensitive group membership (a binary value for A indicating whether or not a sample belongs to a group with PAs); (2) the actual label of the sample (a binary value for Y); and (3) the predicted label of the sample (a binary value for Ŷ). The ML system transfers this information for each sample from its test set. Then, the auditor uses this information to compute the above fairness metrics.
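The auditor-side computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `group_fairness_metrics` is hypothetical, and the three formulas used (demographic parity difference, disparate impact ratio and equal opportunity difference, in their common textbook forms) are assumed here for illustration only. The only inputs are the per-sample triples (A, Y, Ŷ).

```python
from typing import Dict, Iterable, Tuple

# One audited sample: (a, y, y_hat), each a binary value (0 or 1).
# a = protected-group membership, y = actual label, y_hat = predicted label.
Sample = Tuple[int, int, int]


def _rate(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def group_fairness_metrics(samples: Iterable[Sample]) -> Dict[str, float]:
    """Compute illustrative group fairness metrics from (A, Y, Y_hat) triples.

    The metric choice is an assumption made for this sketch; any metric that
    depends only on these three values could be swapped in.
    """
    # counts[a] = [n_samples, n_predicted_positive, n_actual_positive, n_true_positive]
    counts = {0: [0, 0, 0, 0], 1: [0, 0, 0, 0]}
    for a, y, y_hat in samples:
        c = counts[a]
        c[0] += 1
        c[1] += y_hat
        c[2] += y
        c[3] += y & y_hat

    # Selection rate P(Y_hat = 1 | A = a) and true-positive rate P(Y_hat = 1 | Y = 1, A = a).
    selection = {a: _rate(c[1], c[0]) for a, c in counts.items()}
    tpr = {a: _rate(c[3], c[2]) for a, c in counts.items()}

    return {
        "demographic_parity_diff": selection[1] - selection[0],
        "disparate_impact_ratio": _rate(selection[1], selection[0]),
        "equal_opportunity_diff": tpr[1] - tpr[0],
    }
```

Because the function sees only the three binary values per sample, it needs no access to the model or to the remaining features, which is the property that lets the metric set be exchanged without touching the ML system.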
Note that while we consider the above metrics for our protocol and proof-of-concept implementation in the next sections, our core architecture is independent of the metrics, and the metric set can be replaced by other metrics too (Fig. 1).
2.2 Auditing ML Models for Fairness
The existing research in fair ML normally assumes that the fairness metric is computed locally by the ML system, with full access to the data, including the private attributes [15, 6, 5]. However, these approaches lack verifiability and independence, and so do not necessarily lead to trustworthiness. To increase trust in ML products, providers might make the trained model self-explaining (also known as transparent or explainable). There is also the transparent-by-design approach [12, 2, 34]. While this approach has its benefits, it is both model-specific and scenario-specific [25]; thus it cannot be generalised. There is also no trusted authority to verify such claims and explanations. Moreover, in reality, the trained model, datasets and feature extraction mechanisms are company assets; once exposed, they can make the company vulnerable to its competitors. Another approach to providing transparency for the fairness implementation is black-box auditing, also known as ad hoc [12, 22, 26]. In this way, the model is trained and audited for different purposes [1]. This solution is similar to tax auditing and financial ledgers, where accountants verify and ensure that the calculations are legitimate. However, unlike the well-established body of certifications and qualifications for accountants in tax auditing and financial ledgers, there are no established processes and resources for fairness computation in AI and ML.
The concept of a service that calculates fairness has been proposed before, e.g., in [33]. The authors introduced an architecture that delegates the computation of fairness to a trusted third party, which acts as a guarantor of algorithmic fairness. In this model, the fairness service is trusted both by the ML system and by the other stakeholders (e.g., users and activists). In particular, the ML system must trust the service to maintain the privacy of its data and the secrecy of its model, whilst revealing to the trusted third party the algorithm outcome, sensitive input data and even inner parameters of the model. Trusting that the third party will not misuse this information, and hence that leakage of data and model information is not a threat, is a strong assumption.
To address these limitations, Kilbertus et al. [19] proposed a system known as ‘blind justice’, which utilises multi-party computation protocols to enforce fairness in the ML model. Their proposal considers three groups of participants: the User (data owner), the Model (ML model owner) and the Regulator (which enforces a fairness metric). These three groups collaborate with each other to train a fair ML model using a federated learning approach [35]. The outcome is a fair model that is trained with the participation of these three groups in a privacy-preserving way. They provide only a limited degree of verifiability, in which the trained model is cryptographically certified after training and each participant can check that the algorithm has not been modified. It should be noted that, since they operate in the training stage of the ML pipeline, their approach is highly dependent on the implementation details of the ML model itself. Jagielski et al. [17] proposed a differential privacy approach to train a fair model. Similarly, Hu et al. [16] used a distributed approach to fair learning with only demographic information. Segal et al. [29] used similar cryptographic primitives but took a more holistic approach towards the computation and verification of fairness. They proposed a data-centric approach in which the verifier challenges a trained model via an encrypted and digitally certified dataset, using a Merkle tree and other cryptographic primitives. Furthermore, the regulator certifies that the model is fair based on the data received from the clients and a set of datasets provided to the model. Their approach does not provide universal verifiability, as the regulator is the only party involved in the computation of fairness. More recently, Park et al. [27] proposed a Trusted Execution Environment (TEE) for the secure computation of fairness. Their proposal requires special hardware components that are cryptographically secure and provide sufficient guarantees and verification for the correct execution of the code.
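To make the idea of committing to a dataset with a Merkle tree more concrete, the sketch below computes a single root hash over a list of records. It is a generic illustration only and does not reproduce the construction of Segal et al. [29]: the function name `merkle_root`, the use of SHA-256, the one-byte leaf and node prefixes, and the rule of duplicating the last node on odd-sized levels are all assumptions made for this example.

```python
import hashlib
from typing import List


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: List[bytes]) -> bytes:
    """Return a 32-byte Merkle root committing to the ordered list of records.

    Assumed conventions (illustrative, not taken from the cited work):
    leaves are hashed with a 0x00 prefix, inner nodes with a 0x01 prefix,
    and the last node of an odd-sized level is paired with itself.
    """
    if not records:
        raise ValueError("at least one record is required")

    # Hash every record into a leaf node.
    level = [_sha256(b"\x00" + record) for record in records]

    # Repeatedly hash adjacent pairs until a single root remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

A party that publishes the root before an audit cannot later alter any record without changing the root, which is the property a verifier relies on when challenging a model with a certified dataset.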
Previous research has generally integrated fairness into the ML algorithms themselves; therefore, such algorithms would need to be redesigned to use another fairness metric set. As can be seen in Table 1, FaaS is the only work that is independent of the ML model and the fairness metric and that offers universal verifiability, and hence it can be used as a service.