Explainable AI is a growing topic. As a curious person, I would like to better understand my models, not just know that the accuracy is xy %.
Models are more often than not “machine explainable”: you can track which operations have been performed, and why they lead to the current result. However, this is far from “human explainable”: while a machine can accommodate hundreds of parameters, a human can barely manage 5.
We will take a look at this topic. The first part covers theory: the main principles of explainable AI. The second part looks at examples and caveats.
Black box / white box
There are two ways to study a model: white-box, where you have access to its internals (structure, weights), and black-box, where you can only feed it inputs and observe the outputs.

There are also two possible focuses: the full model (its global behavior) or a single instance \(\mathbf{x}\) (one prediction).

The combination model × black-box is very hard to perform, because you need to generate many inputs to map the decision “areas” (see the sketch below).
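To make this concrete, here is a minimal sketch (toy dataset and model of my choosing) of what mapping the areas of a black-box model costs: even with only 2 features, a coarse map already requires 10,000 queries, and the cost grows exponentially with the number of features.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

# Train an opaque model; from now on, we pretend we can only call .predict().
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# To recover the decision "areas", we must query the model on a dense grid:
# already 10,000 queries for just 2 features.
xx, yy = np.meshgrid(np.linspace(-2, 3, 100), np.linspace(-2, 2, 100))
grid = np.c_[xx.ravel(), yy.ravel()]
areas = black_box.predict(grid).reshape(xx.shape)
print(areas.shape)  # a (100, 100) map of predicted classes
```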
There are two ways to explain why a given \(\mathbf{x}\) gives the corresponding \(\hat{y}\): by features (which attributes drove the decision) or by examples (prototypes and neighbors).
Prototypes cannot be used all the time. In a recommender system recommending books, the explanation could be:

“We suggest you book x because you have read [a, b, c, d], and we found a neighbor who has read [a, b, c, d, e, f, g, x].”

“We suggest you book x because Anna and John, who have the same profile as you, liked it.”

In the first example, the closest neighbor is presented by disclosing its features; with a thousand features, that is just not understandable. In the second example, the neighbors are named, so there is no need to print features. However, if the user does not know any of them, this is useless too.
If items can be named, this is much easier. For instance, suppose you do animal classification based on physical attributes: height, weight, has fur?, has tail?, etc. Here, saying “I [the model] classified your unknown animal as a feline because it looks like a bobcat or a wildcat” is perfectly understandable.
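A minimal sketch of this kind of prototype explanation, with made-up animals and a plain nearest-neighbor lookup:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical labelled animals: [height_cm, weight_kg, has_fur, has_tail]
names = ["bobcat", "wild cat", "beagle", "otter"]
X = np.array([
    [50, 9, 1, 1],    # bobcat
    [40, 5, 1, 1],    # wild cat
    [38, 10, 1, 1],   # beagle
    [30, 8, 1, 1],    # otter
])

unknown = np.array([[45, 7, 1, 1]])
nn = NearestNeighbors(n_neighbors=2).fit(X)
_, idx = nn.kneighbors(unknown)

# Explain with named prototypes instead of raw feature values.
print("Classified as feline: it looks like",
      " and ".join(names[i] for i in idx[0]))
```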
There are two kinds of examples: positive and negative.
Positive examples are the most common. Getting an idea of the typical item and its main characteristics helps to relate the instance to the group.
However, sometimes this prototype is too far from our example, and we do not get why the example is linked to it. Here, negative examples help: the interpretable model states the prototype/characteristics of the nearest group to which the example does not belong.
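A small sketch of retrieving both kinds of examples, here with the Iris dataset and raw Euclidean distances (an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances

X, y = load_iris(return_X_y=True)
x, predicted = X[0], y[0]  # the instance to explain and its (assumed) class

d = pairwise_distances([x], X)[0]
d[0] = np.inf  # exclude the instance itself

same = np.where(y == predicted, d, np.inf).argmin()   # positive example
other = np.where(y != predicted, d, np.inf).argmin()  # negative example
print("closest member of your class:", same)
print("closest member of the nearest other class:", other)
```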
Most “common” models are interpretable: linear and logistic regression, decision trees, \(k\)-NN, KMeans, etc.
However, these common interpretable models lose their interpretability when too many features exist.
If we could list the features that explain 80% of the decision, we would be happy. However, when you have 100 features, this may correspond to 50% of the features, because of redundancy and noise.
If you perform a KMeans with 100 clusters, a \(k\)-NN with 20 neighbors, a decision tree with a depth of 10, or an LR with 1,000 features, the model will certainly not be understandable by a human.
To fix this, we need to tune the model: this may reduce accuracy, but it improves interpretability (and, as a side effect, model speed).
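For instance, with a decision tree, capping the depth is a one-line tuning knob. The sketch below (toy dataset, otherwise untuned) compares an unconstrained tree with a depth-3 one:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# An unconstrained tree may be accurate but too deep to read;
# capping the depth trades a little accuracy for interpretability.
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(tree, X, y, cv=5).mean()
    print(f"max_depth={depth}: accuracy={score:.3f}")
```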
Another option, valid for every model, is feature selection: the removal of useless features (or the selection of the most interesting ones), so that we keep only a restricted subset without losing accuracy.
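A minimal illustration with scikit-learn’s univariate selection; the choice of `SelectKBest` with `f_classif` and `k=5` is arbitrary:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

data = load_breast_cancer()
X, y = data.data, data.target

# Keep only the 5 most informative features out of 30.
selector = SelectKBest(f_classif, k=5).fit(X, y)
kept = selector.get_support(indices=True)
print("selected features:", [data.feature_names[i] for i in kept])
```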
Example with mushrooms: you describe your mushroom with many features, and the model tells you “Mushroom A, because of features xyz.” Later, you look at another mushroom, you see xyz, and you infer it is mushroom A. However, it is not mushroom A: mushroom B also has features xyz, plus uvw. A short explanation can hide the very features that would have discriminated between the two.
Explanations can be local (explaining one single prediction) or global (describing the complete behavior of the model). Local explanations are more specific, and costly if you need one per instance.

Goal: quantify how much each feature contributes to a given prediction.
Marginal contribution over the subsets \(S\) that contain \(i\):
\[m_x(S, i) := v_x(S) - v_x(S \setminus \{i\})\]

Warning: this is not simply “the prediction with the feature minus the prediction without it”! It is averaged between all the sets that contain \(i\) and all the sets that do not.
Averaging these marginal contributions with the right weights gives the Shapley value of feature \(i\):

\[\phi_i(f) = \sum_{S \subseteq \{1, \dots, p\} \setminus \{i\}} \frac{|S|!\,(p - |S| - 1)!}{p!} \left( f(S \cup \{i\}) - f(S) \right)\]

where \(\omega(S) = \frac{|S|!\,(p - |S| - 1)!}{p!}\) is the weight of subset \(S\) and \(p\) is the number of features.
Properties: efficiency (the contributions of all features sum to the prediction minus the average prediction), symmetry, dummy (a feature that never changes the output gets a zero contribution), and additivity.
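Here is a direct, brute-force implementation of the formula above. Note the value function: features outside \(S\) are replaced by their mean, which is a simplifying assumption of mine (proper SHAP variants handle feature removal more carefully):

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small problem (p = 4) so that enumerating all subsets stays cheap.
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)
x = X[0]
baseline = X.mean(axis=0)  # ASSUMPTION: "absent" features take their mean value
p = X.shape[1]

def f(S):
    """Prediction with the features outside S replaced by the baseline."""
    z = baseline.copy()
    idx = list(S)
    z[idx] = x[idx]
    return model.predict([z])[0]

def shapley(i):
    others = [j for j in range(p) if j != i]
    phi = 0.0
    for size in range(p):
        for S in combinations(others, size):
            w = factorial(size) * factorial(p - size - 1) / factorial(p)
            phi += w * (f(S + (i,)) - f(S))
    return phi

contributions = [shapley(i) for i in range(p)]
print(np.round(contributions, 2))
# Efficiency property: the contributions sum to f(x) - f(baseline).
print(round(sum(contributions), 2), round(f(range(p)) - f(()), 2))
```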
Tutorial / Case studies: mostly decision trees
Lipton distinguishes transparent models from post-hoc explanations. “A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.”
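A naive sketch of such a counterfactual search, restricted (my simplification, not part of the definition) to changing a single feature by the smallest multiple of its standard deviation that flips the prediction:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0]
target = 1 - model.predict([x])[0]  # the "predefined output": flip the class

# Brute force: for each feature, scan growing shifts (in units of one standard
# deviation) until the prediction flips; keep the smallest shift overall.
best = None  # (shift_in_stds, feature_index, sign)
for i in range(X.shape[1]):
    std = X[:, i].std()
    for shift in np.arange(0.1, 5.0, 0.1):
        if best is not None and shift >= best[0]:
            break  # a larger shift can no longer improve on the best found
        for sign in (+1, -1):
            z = x.copy()
            z[i] += sign * shift * std
            if model.predict([z])[0] == target:
                best = (round(shift, 1), i, sign)
                break

print("smallest single-feature change found:", best)  # None if nothing flips
```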
PDP: Partial Dependence Plot. ICE: Individual Conditional Expectation.
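Both are available in scikit-learn; a minimal example (model and dataset chosen arbitrarily):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays the PDP (average effect) on the ICE curves
# (one line per individual sample).
PartialDependenceDisplay.from_estimator(model, X, features=[2, 8],
                                        kind="both")
plt.show()
```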
CCA: Canonical Correlation Analysis
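For completeness, a tiny example where two synthetic “views” share one hidden signal, which CCA recovers:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))

# Two noisy "views" of the data driven by the same hidden signal.
X = np.hstack([latent + 0.5 * rng.normal(size=(500, 1)) for _ in range(3)])
Y = np.hstack([latent + 0.5 * rng.normal(size=(500, 1)) for _ in range(2)])

cca = CCA(n_components=1).fit(X, Y)
Xc, Yc = cca.transform(X, Y)
# Correlation of the first canonical pair: close to 1 here.
print(np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1])
```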