Warning: This document is still a draft.

DRAFT - Explainable AI

Short introduction.



Introduction

Explainable AI is a growing topic [BECAUSE].

As a curious person, I would like to better understand my models, not just to know that their accuracy is xy%.

Models are more often than not “machine explainable”: you can track which operations were performed and why they lead to the current result. However, this is far from being “human explainable”: while a machine can accommodate hundreds of parameters, a human can barely manage five.

We will take a look at this topic. The first part covers the theory: the main principles of explainable AI. The second part looks at examples and caveats.

Table of Contents

Keywords

Black box, white box

Theory

Concepts

Access to Model Internals

There are two ways to study a model:

  • Black box: you only see the input and the output, not the internal operations that were performed.
  • White box: you see the inputs/outputs as well as the internal weights and operations that are performed.

Scope

There are two possible scopes:

  • model: you want a simplified description of the whole model
  • instance: you want to explain why a given decision was taken

The model × black-box combination is very hard to handle, because you need to generate many inputs to map out the decision “areas”.

Instance Explainability

There are two ways to explain why a given \(\mathbf{x}\) gives the corresponding \(\hat{y}\):

  • prototype: it uses the training data as supporting evidence: “\(\mathbf{x}\) was classified as \(\hat{y}\) because the similar training examples \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are”.
  • feature: some features are more decisive than others. The goal is to identify them, to spot only the information that was relevant to the decision.

Prototypes cannot be used all the time. In a book recommender system, the explanation could be:

  • “I [the model] suggested reading book x because you have read [a, b, c, d] and we found a neighbor who has read [a, b, c, d, e, f, g, x], so we suggest you book x.”
  • “I suggested reading book x because Anna and John, who have the same profile as you, liked it.”

In the first example, the closest neighbor is presented by disclosing its features; with thousands of features, this is simply not understandable. In the second example, the neighbors are named, so there is no need to print features. However, if the user does not know any of them, this is useless too.

If items can be named, this is much easier. For instance, suppose you classify animals based on physical attributes: height, weight, has fur?, has tail?, … Here, saying “I [the model] classified your unknown animal as a feline because it looks like a bobcat or a wild cat” is perfectly understandable.
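As a rough illustration (the animal data below is made up for the example), a \(k\)-NN classifier can directly return the named training examples that back its decision:

```python
# Minimal sketch of a prototype-style explanation: the prediction is backed
# by *named* training examples. The animal data is purely hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# features: [height_cm, weight_kg, has_fur, has_tail]
X_train = np.array([
    [50, 12, 1, 1],    # bobcat
    [55, 15, 1, 1],    # wild cat
    [110, 200, 0, 0],  # dolphin
    [30, 4, 1, 1],     # house cat
])
names = ["bobcat", "wild cat", "dolphin", "house cat"]
y_train = ["feline", "feline", "cetacean", "feline"]

clf = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)

x = np.array([[52, 13, 1, 1]])   # the unknown animal
pred = clf.predict(x)[0]
_, idx = clf.kneighbors(x)       # indices of the nearest prototypes

print(f"Classified as '{pred}' because it looks like: "
      + ", ".join(names[i] for i in idx[0]))
```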

Contrast

There are two kinds of examples:

  • Positive: relate the instance to the nearest items of the same class
  • Negative: take examples from a different class

Positive examples are the most common. Getting an idea of what the typical item is and what its main characteristics are helps us identify with the group.

However, sometimes this prototype is too far from our example, and we don’t get why the example is linked to it. Here, negative examples help: the interpretable model states the prototype/characteristics of the nearest group you do not belong to.


Interpretable Models

Most “common” models are interpretable:

  • Decision Trees
  • KNN
  • Linear regression
  • Logistic regression
  • KMeans

Issues

The common models that are interpretable lose their interpretability when too many features exist.

If we could list the features that explain 80% of the decision, we would be happy. However, when you have 100 features, that may require 50 of them, because of redundancy and noise.
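To make this concrete, here is a quick sketch (using scikit-learn’s impurity-based importances on a toy dataset, an arbitrary choice) that counts how many features are needed to cover 80% of the total importance:

```python
# Sketch: how many features does it take to reach ~80% of the total importance?
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

importances = np.sort(model.feature_importances_)[::-1]  # sums to 1
cumulative = np.cumsum(importances)
n_needed = int(np.searchsorted(cumulative, 0.8)) + 1
print(f"{n_needed} of {X.shape[1]} features cover 80% of the importance")
```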

Interpretable Models by Design

If you run KMeans with 100 clusters, \(k\)-NN with 20 neighbors, a decision tree with a depth of 10, or a linear regression with 1000 features, the models will certainly not be understandable by a human.

To avoid that, we need to constrain the model; this may reduce accuracy, but it also improves model speed.
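As a sketch of what “interpretable by design” can look like with a decision tree (toy iris data as a stand-in for your own dataset): capping the depth keeps the whole model readable as a handful of rules.

```python
# Sketch: keep a decision tree interpretable by constraining its depth.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
# max_depth=3 trades a bit of accuracy for a rule set a human can read
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# the whole model fits in a few if/else rules
print(export_text(tree, feature_names=list(data.feature_names)))
```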

Feature Selection

This is the removal of useless features (or the selection of the most interesting ones), so that we keep only a restricted subset without losing accuracy. This method is valid for every model.
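A minimal sketch with scikit-learn (a univariate ANOVA F-test here; any other selection criterion would plug in the same way):

```python
# Sketch of feature selection: keep only the k most informative features.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)

X_reduced = selector.transform(X)              # 30 features -> 5 features
kept = selector.get_support(indices=True)      # which features survived
print("kept feature indices:", kept)
```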


Forward VS Backward explanation

Take the example of mushrooms. You describe your mushroom with many features, and the model tells you “mushroom A, because of features xyz”.

Then you look at another mushroom. You see xyz and infer mushroom A. However, this is not mushroom A: mushroom B also has features xyz, but additionally uvw.

Interaction

  • one-way
  • two-way: the user can ask alternative questions

Missing values? Prognostics?


Post Hoc Explainability

Video, GitHub

Local

  • feature importance (see the sketch after this list)
  • rule based
  • saliency maps
  • prototypes / example based
  • counterfactuals
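To illustrate the feature-importance item above, here is a deliberately simple local explanation: a linear model, where each feature’s contribution to one prediction is just coefficient × value. Tools such as LIME or SHAP generalise this idea to arbitrary models.

```python
# Minimal local explanation sketch: per-feature contributions of a linear model.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X, y)

x = X[0]                                   # the instance to explain
contributions = model.coef_ * x            # contribution of each feature
top = np.argsort(np.abs(contributions))[::-1][:3]

print(f"prediction = {model.predict(x.reshape(1, -1))[0]:.1f}")
for i in top:
    print(f"feature {i}: contribution {contributions[i]:+.1f}")
```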

Global / complete behavior

  • collection of local explanations
  • model distillation (fit a new, small model on top; see the sketch after this list)
  • representation based
  • summaries of counterfactuals
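A sketch of model distillation into a global surrogate: fit a shallow decision tree on the predictions of a black-box model (a random forest here, as an arbitrary stand-in), then inspect the tree.

```python
# Sketch of a global surrogate: a shallow tree fitted on the *predictions*
# of a black-box model summarises its behavior.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
y_bb = black_box.predict(X)                # what the black box actually does

surrogate = DecisionTreeClassifier(max_depth=3).fit(X, y_bb)
print(f"surrogate fidelity: {surrogate.score(X, y_bb):.2f}")
print(export_text(surrogate))
```

The score is computed against the black box’s own predictions (not the true labels): it measures how faithful the summary is.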

Examples should be:

  • diverse (i.e., not redundant in their description)
  • representative (reflecting the model’s global behavior, not outliers)

Evaluation of Explanations

  • Functionally grounded evaluation: no human + proxy task
  • Human grounded: real humans + a simple task
  • Application grounded: humans + a real task

(from least to most specific and costly)

Goals:

  • understand the model’s behavior
  • help make decisions
  • debugging

Possible proxy evaluations:

  • feature deletion (see the sketch after this list)
  • feature insertion
  • adding / removing training data
  • creating a simulator
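As an example of the feature-deletion idea, a small sketch (a linear model and mean-imputation as the “removal” baseline, both arbitrary choices): delete the features the explanation ranks highest and watch how fast the prediction moves. A faithful explanation should cause a quick degradation.

```python
# Sketch of a feature-deletion check for an explanation.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X, y)

x = X[0].copy()
baseline = X.mean(axis=0)
importance = np.abs(model.coef_ * x)        # the explanation being tested
order = np.argsort(importance)[::-1]        # most important first

for k in range(4):
    x_del = x.copy()
    x_del[order[:k]] = baseline[order[:k]]  # "delete" the top-k features
    pred = model.predict(x_del.reshape(1, -1))[0]
    print(f"top-{k} features deleted -> prediction {pred:.1f}")
```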

Limits

  • Faithfulness / fidelity: some explanations do not reflect the model
    • security
    • sensitivity to model parameters
    • what if the model is retrained: do we get the same explanation?
    • adversarial attacks: small perturbations
  • Fragility (manipulation)
  • Stability: small changes in the input cause large changes in the explanation
  • Useful in practice?
    • the raw input (e.g. the initial image) is sometimes more useful than extracted features

Shapley

Marginal contribution over the subsets $S$ that contain $i$:

\[m_x(S, i) := v_x(S) - v_x(S \setminus \{i\})\]

Warning: this is not simply the prediction without that feature! The Shapley value averages the difference between all sets that contain the feature and all sets that do not.

\[\phi_i(f) = \sum_{S \subseteq \{1, \dots, p\} \setminus \{i\}} \frac{|S|!\,(p-|S|-1)!}{p!} \left(f(S \cup \{i\}) - f(S)\right)\]
  • $f(S)$: value function, e.g. the accuracy of your model given feature set $S$.
  • $p$: number of features.
\[\phi_i = \sum_{S \in 2^{F \setminus \{i\}}} \omega(S)\, M_i(S)\]
  • $M_i(S) = C(S \cup \{i\}) - C(S)$
  • $\omega(S) = \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}$
  • $|F|$: number of features
  • $C$: evaluation function

Properties:

  • Efficiency
  • Symmetry
  • Dummy / null player: if a feature does not contribute, $\phi_j(f) = 0$
  • Additivity: $(C + K)(S) = C(S) + K(S)$
  • Balanced contributions: $\phi_i(C) - \phi_i(C_j) = \phi_j(C) - \phi_j(C_i)$, with $C_i$ the evaluation function without feature $i$. Each pair of features shares the gain/loss equally.
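A brute-force sketch of the formula above, on a toy evaluation function $C$ (a made-up coalition game here; in practice $C(S)$ could be, say, the accuracy of the model restricted to feature subset $S$). It is exponential in the number of features, so only usable for a handful of them:

```python
# Brute-force Shapley values for a toy evaluation function C(S).
from itertools import combinations
from math import factorial

features = [0, 1, 2]

def C(S):
    # made-up value function: features 0 and 1 interact, feature 2 adds little
    S = set(S)
    return 0.5 * (0 in S) + 0.3 * (1 in S) + 0.2 * (0 in S and 1 in S) + 0.05 * (2 in S)

def shapley(i, features, C):
    others = [f for f in features if f != i]
    p = len(features)
    phi = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
            phi += weight * (C(set(S) | {i}) - C(S))
    return phi

for i in features:
    print(f"phi_{i} = {shapley(i, features, C):.3f}")

# efficiency check: the contributions sum to C(all features) - C(empty set)
print(sum(shapley(i, features, C) for i in features), C(features) - C([]))
```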

  • Book
  • Local rules
  • Black box collaboration
  • Tutorial Kaggle
  • Visual on interpretability
  • Project interpretable
  • Tutorial / case studies: mostly decision trees

Definitions

Google whitepaper

European intro

Four concepts:

  • explanation
  • meaningful
  • explanation accuracy: needs to reflect the system’s behavior
  • knowledge limits

Lipton: transparent VS post hoc. “A counterfactual explanation of a prediction describes the smallest change to the feature values that changes the prediction to a predefined output.”

PDP: Partial Dependence Plot. ICE: Individual Conditional Expectation.
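A quick sketch with scikit-learn (assuming a version >= 1.0, where PartialDependenceDisplay.from_estimator is available); kind="both" overlays the individual ICE curves and their average, the PDP:

```python
# Sketch: PDP + ICE curves for two features of a boosted-tree model.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" draws the ICE curves and the averaged PDP on top
PartialDependenceDisplay.from_estimator(model, X, features=[2, 8], kind="both")
plt.show()
```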

Interpretable ML book, Book 2

CCA: Canonical Correlation Analysis


