On the Interpretability of Part-Prototype Based Classifiers: A Human Centric Analysis

17 May 2024

Authors:

(1) Omid Davoodi, Carleton University, School of Computer Science;

(2) Shayan Mohammadizadehsamakosh, Sharif University of Technology, Department of Computer Engineering;

(3) Majid Komeili, Carleton University, School of Computer Science.

Abstract and Intro

Background Information

Methodology

Prototype Interpretability

Prototype-query Similarity

Interpretability of the Decision-Making Process

The Effects of Low Prototype Counts

Discussions

ABSTRACT

Part-prototype networks have recently become methods of interest as an interpretable alternative to many of the current black-box image classifiers. However, the interpretability of these methods from the perspective of human users has not been sufficiently explored. In this work, we have devised a framework for evaluating the interpretability of part-prototype-based models from a human perspective. The proposed framework consists of three actionable metrics and experiments. To demonstrate the usefulness of our framework, we performed an extensive set of experiments using Amazon Mechanical Turk. These experiments not only show the capability of our framework in assessing the interpretability of various part-prototype-based models, but are also, to the best of our knowledge, the most comprehensive work on evaluating such methods in a unified framework.

Introduction

As Artificial Intelligence and Machine Learning have become more ubiquitous in many parts of society and the economy, the need for transparency, fairness, and trust has increased. Many of the state-of-the-art methods and algorithms are black boxes whose decision-making process is opaque to humans. Interpretable and Explainable Artificial Intelligence aims to address this issue by offering methods that either explain the decisions of black-box models or are inherently interpretable themselves.

Figure 1. Example of the decision-making process of a part-prototype method.

Prototype-based classifiers are a category of inherently interpretable methods that use prototypical examples to make their decisions. It is assumed that as long as the prototypes themselves are understandable by a human, the decision itself is interpretable[1]. Prototype-based classifiers are not new inventions. Many existed long before the need for interpretability became so urgent[2–6]. In recent years, newer methods have been proposed that combine the power and expressiveness of neural networks with the decision-making process of a prototype-based classifier to create prototypical neural nets[7], [8], reaching results competitive with the state of the art while being inherently interpretable in the process.

A newer subcategory of prototype-based classifiers is part-prototype networks. These networks, usually operating in the domain of image classification, use regions of a query sample, as opposed to the entire query image, to make their decisions. ProtoPNet[9] was the first such method, offering fine-grained explanations for image classification along with state-of-the-art accuracy. Figure 1 shows an example of how a part-prototype method makes its decisions.
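The decision process described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of any specific method: the function names, the toy two-dimensional embeddings, the class weights, and the log-ratio similarity are all assumptions chosen for readability. Each prototype is compared against every region (patch) of the query image, the best-matching region is kept as evidence, and class scores are a weighted sum of that evidence.

```python
import math

def squared_distance(a, b):
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def similarity(distance, eps=1e-4):
    # Assumed similarity: large when the distance is small.
    return math.log((distance + 1.0) / (distance + eps))

def classify(patch_embeddings, prototypes, class_weights):
    """Hypothetical part-prototype decision.

    patch_embeddings: one embedding vector per region of the query image
    prototypes:       one embedding vector per learned part-prototype
    class_weights:    class_weights[c][j] is the weight of prototype j for class c
    """
    evidence = []
    for proto in prototypes:
        scores = [similarity(squared_distance(patch, proto))
                  for patch in patch_embeddings]
        # Keep the region of the query that best matches this prototype.
        best_patch = max(range(len(scores)), key=scores.__getitem__)
        evidence.append((best_patch, scores[best_patch]))
    # Each class score is a weighted sum of the prototype similarities.
    logits = [sum(w * s for w, (_, s) in zip(weights, evidence))
              for weights in class_weights]
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return predicted, logits, evidence

# Toy example: three query regions, two prototypes, two classes.
patches = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
prototypes = [[1.0, 1.0], [5.0, 4.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]
predicted, logits, evidence = classify(patches, prototypes, class_weights)
```

The `evidence` list is what makes the prediction inspectable: for every prototype it records which region of the query was matched and how strongly, which is the information a human is shown in an explanation of this kind.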

Figure 2. Examples of interpretability problems with prototypes. a) The prototype itself is not interpretable because it is pointing to an irrelevant background region. b) Lack of similarity between a prototype and the corresponding region in the query sample.

The explanations given by these methods can be very different from each other. Even when the general layout of the explanation is similar, the part-prototypes themselves can be vastly different. It is unreasonable to assume that they all offer the same level of interpretability. Therefore, the evaluation of their interpretability is necessary.

While many of these methods evaluate the performance of their models and compare them to the state of the art, few analyze the interpretability of their methods. Most of the analysis in this regard seems to be focused on automatic metrics for assessing interpretability[10]. Such automatic metrics, while useful, are not a replacement for human evaluation of interpretability. Others have worked on human-assisted debugging[11] but have not extended that to a full evaluation of method interpretability.

Kim et al. offered a method for evaluating visual concepts by humans and even performed experiments on ProtoPNet and ProtoTree[12], but their evaluation suffers from a number of issues. The scale of the experiments in Kim et al. is small, with only two part-prototype methods evaluated on a single dataset. The experimental design of that work also relies on fine-grained ratings by human annotators, which can be an unreliable way of measuring human opinion when there is no consensus on what each option means[13]. That work used the class label to measure the quality of the prototypes in the CUB dataset, even though there was no indication that the human users were familiar with the minutiae of the distinctions between 200 classes of birds. Lastly, it used the default rectangular representation of prototypes from ProtoPNet and ProtoTree. These representations are prone to being overly broad and misleading to the human user compared to the actual activation heatmap. To address these issues, we propose a human-centric analysis consisting of a set of experiments to assess the interpretability of part-prototype methods.

Goals

The interpretability of a part-prototype system is not a well-defined concept. In this work, we focus on three properties that such systems should have in order to be interpretable.

• Interpretability of the prototype itself: The concept a prototype is referring to should be recognizable and understandable to a human. Figure 2 (a) shows an example of a prototype that is not interpretable because it points to an irrelevant background region. Machine learning methods, and neural networks in particular, can make correct decisions based on feature combinations in the data that a human might not understand. In addition, the presentation of such features is very important. A prototype might refer to a very unusual concept, but its presentation might lead a human to wrongly believe that they understand the reasoning behind a decision.

• The similarity of a prototype to the corresponding region in the query sample: Even if the prototype itself is easily understood by a human, its activation on the query sample might not show the same concept as the prototype. Figure 2 (b) shows an example of this problem. This is important because it indicates that structural similarity in the embedding space where the prototypes reside does not necessarily match the human understanding of similarity. This problem has been reported in previous literature[14].

• The interpretability of the decision-making process itself: This is also an important aspect of prototype-based methods. Even if the prototypes and their similarity to the activated patches of the query sample are understood by humans, the final decision might not be. For example, a model might select and use unrelated prototypes to correctly classify a sample.

The main novelty of this work is a more robust framework for evaluating the interpretability of part-prototype-based networks using human annotators. Some previous methods have attempted such evaluations with automatic metrics[10], and others have studied human-based evaluation of interpretability for other types of explainable AI methods[15], [16]. The closest work is HIVE[12], which suffers from a number of issues that are addressed in our approach. More on this will follow in the next section.

Another novelty of this work is the proposal of three actionable metrics and experiments for evaluating the interpretability of part-prototype-based classifiers. We believe that if a model fails these tests, it would not be a good interpretable model. These can assist future researchers in providing evidence rather than just making assumptions about the interpretability of their approaches.

Finally, our extensive set of experiments using Amazon Mechanical Turk includes comparisons of six related methods on three datasets. To the best of our knowledge, this is the most comprehensive work on evaluating the interpretability of such methods in a unified framework.

This paper is available on arXiv under a CC 4.0 license.