
Interpretability & feature importance

The capstone. Regulated industries still ship GBDTs and logistic regression because "explain this decision" remains a hard requirement. The senior signal is knowing which question you're answering: global or local, faithful or stable, correlational or causal.

Two definitions of "interpretable"

Before any method, separate two questions that stakeholders conflate:

Flavor | Question | Easy for | Hard for
Global | How does the model work overall? Which features does it use, and how are they combined? | Linear/logistic regression, shallow trees, GAMs. | Ensembles (RF, GBDT), neural nets, anything with deep interactions.
Local | Why did the model output this prediction for this row? | Any model, via post-hoc attribution (SHAP, LIME, ICE). | Answers are approximate; the local linearization may not hold.

Stakeholders usually want one, not both. A lender issuing an adverse-action notice to a denied applicant, as regulators require, needs local. A risk officer asking "what is this model keying on?" wants global. The methods are different.

Intrinsically interpretable models

The cheapest path to interpretability is to pick a model whose structure is the explanation.
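A minimal sketch of "the structure is the explanation", assuming a hypothetical credit scorecard: the feature names, intercept, and coefficients below are invented for illustration. A logistic model's log-odds is the intercept plus one term per feature, so each β_j · x_j is an exact, additive reason for this applicant's score, with no post-hoc method involved.

```python
import math

# Hypothetical scorecard in log-odds units; names and numbers are illustrative only.
INTERCEPT = -1.5
COEFS = {"debt_to_income": 2.0, "late_payments": 0.8, "years_employed": -0.3}


def explain(row):
    """Return each feature's additive log-odds contribution, the total logit, and P(default)."""
    contributions = {name: beta * row[name] for name, beta in COEFS.items()}
    logit = INTERCEPT + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    return contributions, logit, prob


applicant = {"debt_to_income": 0.6, "late_payments": 2, "years_employed": 5}
contribs, logit, prob = explain(applicant)
# List the reasons from largest to smallest absolute effect.
for name, c in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>15}: {c:+.2f}")
print(f"logit = {logit:+.2f}, P(default) = {prob:.3f}")
```

These contributions are measured from x_j = 0. Subtracting a reference applicant's values first, β_j · (x_j − x_ref,j), gives reasons relative to that reference instead.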

The first sentence of every interpretability answer
"If interpretability is a hard requirement, pick a model that is interpretable. Don't ship a black box and then try to explain it." Juniors immediately reach for SHAP; seniors first ask whether the accuracy gap justifies the explanation tax.

Post-hoc methods — what they all share

For models that aren't intrinsically interpretable (RF, GBDT, NN), you query the trained model after the fact. There are two questions and many methods:

GLOBAL: "which features matter overall?"
  · permutation importance
  · built-in tree split-gain
  · mean |SHAP| across rows
  · partial dependence (PDP)
  · ICE (averaged)

LOCAL: "what drove this prediction?"
  · LIME
  · SHAP (per row)
  · ICE (one row's curve)
  · gradient × input (for NNs)
  · counterfactuals

Two warnings: (1) different methods rank features differently — the disagreement problem, below; (2) every post-hoc method queries the model on points that may not exist in the training distribution.

Permutation feature importance

The simplest model-agnostic global method. For each feature x_j: score on a held-out set (s_0), shuffle column j across rows, score again (s_j), and report s_0 − s_j. Shuffling preserves the marginal distribution of x_j but destroys its joint with the target and other features. Average over several shuffles.
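The recipe above can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: the "model" is a hand-written linear function standing in for anything with a predict method, the metric is MSE (a loss, so importance is reported as loss_j − loss_0, the same quantity as s_0 − s_j for a score), and the data are synthetic.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in held-out MSE when each column is shuffled (loss_j - loss_0)."""
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            # Shuffling keeps the marginal of x_j but breaks its link to y and to other columns.
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            deltas.append(mse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(deltas) / n_repeats)
    return importances


# Synthetic held-out set: y depends strongly on x0, weakly on x1, and not at all on x2.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [3 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in X]
predict = lambda r: 3 * r[0] + 0.5 * r[1]  # stands in for a fitted model
print(permutation_importance(predict, X, y))
```

Averaging over n_repeats shuffles is the "average over several shuffles" step; the unused feature x2 scores exactly zero because its shuffle never changes a prediction.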

Pro | Con
Model-agnostic; reuses your eval pipeline. | Correlated features mask each other. If x_1 ≈ x_2, shuffling x_1 barely hurts because x_2 still carries the signal.
Tied to a real metric (AUC, MSE) you already care about. | Shuffling creates out-of-distribution inputs (shuffled-height + real-weight pairs that never exist).
Cheap: O(features × eval cost). | Conditional permutation fixes the correlation issue but is expensive and finicky.

Built-in tree importance (Gini / split-gain)

Every GBDT/RF library exposes a "feature_importances_" computed during training: sum the impurity reduction (Gini for classification, MSE for regression) across all nodes that split on the feature, weighted by samples reaching the node.

Pro | Con
Free: computed as a side-effect of training. | Biased toward high-cardinality features. A feature with 1000 unique values offers up to 999 split thresholds; a binary feature offers 1. The high-cardinality feature wins some splits by luck.
Captures the model's use of the feature in training. | Measures training-set impurity reduction (or, in split-count variants such as LightGBM's default, the raw number of splits), not how much predictions move. A feature can be split often yet contribute small moves.
No extra compute. | Inconsistent: changing the model so it relies more on feature A can lower A's reported importance. Fixing this was part of the motivation for TreeSHAP (Lundberg et al. 2018).
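A sketch of the split-gain computation for a single regression-tree node, plus a small simulation of the high-cardinality bias. Assumptions: MSE impurity expressed as the sum of squared deviations (so sample weighting is built in), exhaustive threshold search, and a pure-noise target. Neither feature carries signal, yet the continuous feature, with roughly one candidate threshold per row, typically finds a much larger "best" gain than the binary one.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean: n * (MSE impurity) for a node."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split_gain(x, y):
    """Largest impurity reduction over every threshold on feature x, for one node."""
    parent = sse(y)
    best = 0.0
    for t in sorted(set(x))[:-1]:  # one candidate per distinct value; the largest splits nothing off
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        best = max(best, parent - sse(left) - sse(right))
    return best


# Pure-noise target: neither feature carries any signal about y.
rng = random.Random(0)
n, trials = 60, 100
gain_binary = gain_many = 0.0
for _ in range(trials):
    y = [rng.gauss(0, 1) for _ in range(n)]
    x_binary = [rng.randint(0, 1) for _ in range(n)]  # at most 1 candidate threshold
    x_many = [rng.random() for _ in range(n)]          # about n - 1 candidate thresholds
    gain_binary += best_split_gain(x_binary, y) / trials
    gain_many += best_split_gain(x_many, y) / trials
print(f"average best-split gain on noise: binary={gain_binary:.2f}, many-valued={gain_many:.2f}")
```

Summing such gains over every node that splits on a feature gives the built-in importance; the simulation shows why that sum rewards features with more chances to get lucky.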

Partial Dependence Plots (PDP) and ICE

PDP answers "what is the average prediction as x_j varies, marginalizing over everything else?" For a grid of values v:

PDP_j(v) = (1/N) · Σ_i f̂(x_i with x_{i,j} := v)

Replace column j with constant v, score every row, average, plot vs v. Same OOD problem as permutation — the row (age=80, income=20k) becomes (age=20, income=20k), which may not exist.
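The PDP definition translates almost line for line into code. A minimal sketch: predict can be any callable on a feature list, and the toy additive model is an assumption chosen so the answer is easy to check by hand (the curve for column 0 is 2·v plus the average of the other term).

```python
def partial_dependence(predict, X, j, grid):
    """PDP_j(v) = (1/N) * sum_i predict(x_i with column j set to v), for each v in grid."""
    curve = []
    for v in grid:
        # Overwriting column j ignores the other columns, which is how impossible rows appear.
        preds = [predict(row[:j] + [v] + row[j + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve


# Toy additive model and data, chosen so the result is easy to check by hand.
predict = lambda r: 2 * r[0] + r[1] ** 2
X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
print(partial_dependence(predict, X, 0, [0.0, 1.0, 2.0]))
```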

Failure mode. If x_j interacts with x_k (positive for small x_k, negative for large), averaging can hide the effect; when the two directions cancel, the PDP is a flat line. You'd conclude "feature doesn't matter" when in fact it matters for everyone, in opposite directions.

ICE (Individual Conditional Expectation) fixes this: one line per row instead of averaging. If lines slope the same way, the effect is monotone; if they fan out, the feature interacts. ICE = PDP without averaging.
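A toy version of the interaction failure mode, assuming the pure-interaction model f(x) = x0 · x1 with x1 ∈ {−1, +1} in equal shares: every ICE line has slope ±1 in x0, but they cancel, so the PDP (their average) is flat.

```python
def ice_curves(predict, X, j, grid):
    """One curve per row: predict(x_i with column j set to v) for each v in grid, no averaging."""
    return [[predict(row[:j] + [v] + row[j + 1:]) for v in grid] for row in X]


# Pure interaction: x0 pushes the prediction up when x1 = +1 and down when x1 = -1.
predict = lambda r: r[0] * r[1]
X = [[0.3, 1.0], [0.7, 1.0], [0.1, -1.0], [0.9, -1.0]]
grid = [-1.0, 0.0, 1.0]

ice = ice_curves(predict, X, 0, grid)
pdp = [sum(curve[k] for curve in ice) / len(ice) for k in range(len(grid))]  # PDP = averaged ICE
print("ICE:", ice)
print("PDP:", pdp)
```

The two groups of ICE lines fan out in opposite directions, which is exactly the signal the averaged PDP throws away.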

PDP/ICE gotcha
Both marginalize by substitution and hallucinate impossible rows (age=8, retired=yes). Accumulated Local Effects (ALE) plots (Apley & Zhu 2020) instead average local prediction changes within small intervals of x_j that the data actually occupies, then accumulate them; ALE is the standard upgrade to name when asked "what's wrong with PDP?"
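A simplified first-order ALE sketch, assuming quantile bins over the observed values of x_j and a simple count-weighted centering (Apley & Zhu describe the estimator more carefully). The key difference from a PDP is that each row is moved only to the two edges of its own bin, so the model is queried close to where that row actually lives.

```python
from bisect import bisect_left


def ale_curve(predict, X, j, n_bins=4):
    """Simplified first-order ALE for column j; returns (bin edges, centered curve at each edge)."""
    xs = sorted(row[j] for row in X)
    # Quantile-based edges over the observed values (duplicates collapse).
    idx = [round(k * (len(xs) - 1) / n_bins) for k in range(n_bins + 1)]
    edges = sorted(set(xs[i] for i in idx))
    if len(edges) < 2:
        raise ValueError("column j needs at least two distinct values")
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for row in X:
        # Bin k covers (edges[k], edges[k+1]]; the minimum value goes in bin 0.
        k = max(0, bisect_left(edges, row[j]) - 1)
        lo, hi = edges[k], edges[k + 1]
        # Local effect: move this row only across its own bin.
        sums[k] += predict(row[:j] + [hi] + row[j + 1:]) - predict(row[:j] + [lo] + row[j + 1:])
        counts[k] += 1
    # Accumulate the average local effect of each bin.
    curve = [0.0]
    for s, c in zip(sums, counts):
        curve.append(curve[-1] + (s / c if c else 0.0))
    # Center: make the count-weighted average of each bin's midpoint value zero.
    mean = sum(c * (curve[k] + curve[k + 1]) / 2 for k, c in enumerate(counts)) / sum(counts)
    return edges, [v - mean for v in curve]


predict = lambda r: 2 * r[0] + r[1] ** 2
X = [[float(i), (i * 7 % 5) / 5] for i in range(9)]
edges, curve = ale_curve(predict, X, 0)
print(edges, [round(v, 3) for v in curve])
```

For this additive toy model the curve rises by 2 per unit of x0, matching the model's true slope, and the x1 term cancels inside every local difference.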

LIME — local surrogate models

Ribeiro, Singh & Guestrin 2016. For a specific prediction f̂(x_0): sample perturbed inputs around x_0, score them with the black-box, fit a simple model (sparse linear or shallow tree) on the perturbed pairs weighted by proximity to x_0. The simple model's coefficients are the local explanation.

Intuition: any model is approximately linear in a small enough neighborhood. LIME finds that neighborhood and fits a line.

Pros: model-agnostic; works for tabular, text (drop tokens), image (occlude superpixels). Cons: unstable — change the perturbation seed, get a different explanation. The kernel/neighborhood is itself a hyperparameter. In regulated settings the instability is disqualifying.
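A stripped-down LIME-style surrogate for tabular data, written from the description above rather than from the lime package (whose defaults for sampling, discretization, and feature selection differ). It perturbs x_0 with Gaussian noise, weights samples with an exponential kernel on distance, and solves weighted least squares. scale and kernel_width are the neighborhood hyperparameters the text warns about; their defaults here are arbitrary assumptions.

```python
import math
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lime_explain(predict, x0, n_samples=500, scale=0.5, kernel_width=0.75, seed=0):
    """Local weighted linear surrogate around x0; returns [intercept, coef_1, ..., coef_d]."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0, scale) for xi in x0]          # perturb around x0
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        rows.append([1.0] + z)                                # leading 1 = intercept column
        targets.append(predict(z))                            # query the black box
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # proximity kernel
    p = len(x0) + 1
    # Weighted normal equations: (A^T W A) beta = A^T W y.
    AtWA = [[sum(w * r[a] * r[b] for r, w in zip(rows, weights)) for b in range(p)] for a in range(p)]
    AtWy = [sum(w * r[a] * t for r, t, w in zip(rows, targets, weights)) for a in range(p)]
    return solve(AtWA, AtWy)


# A nonlinear black box: the local coefficients shift with the sampling seed.
black_box = lambda z: math.sin(3 * z[0]) * z[1]
for s in range(3):
    print([round(c, 3) for c in lime_explain(black_box, [0.5, -1.0], n_samples=50, seed=s)])
```

On a linear black box the surrogate recovers the true coefficients exactly; on the nonlinear one, re-running with a different seed moves the coefficients, which is the instability described above.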

SHAP — the dominant method

Lundberg & Lee 2017. SHAP computes each feature's average marginal contribution to the prediction across all possible orderings of features being added to a "coalition":

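φ_j(x) = Σ_{S ⊆ F\{j}} [ |S|! · (|F|−|S|−1)! / |F|! ] · [ f_{S∪{j}}(x) − f_S(x) ]

where F is the full feature set and f_S(x) is the model's expected output when only the features in S are known and the rest are marginalized out.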
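The formula can be evaluated literally when there are only a handful of features. This sketch assumes the interventional value function (absent features are filled in from a background sample and the predictions averaged), one of the two TreeSHAP variants; the cost grows as 2^|F| coalitions, which is exactly what TreeSHAP avoids for trees.

```python
import itertools
import math


def shapley_values(predict, x, background):
    """Exact Shapley values for one row x by enumerating every coalition S.

    Value function (interventional): f_S(x) averages predict over background rows,
    with features in S taken from x and the rest from the background row.
    Returns (phi, baseline), where baseline is the mean prediction on the background.
    """
    n = len(x)

    def f_S(S):
        preds = [predict([x[k] if k in S else b[k] for k in range(n)]) for b in background]
        return sum(preds) / len(preds)

    phi = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        total = 0.0
        for size in range(n):  # |S| runs from 0 to |F| - 1
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                S = set(S)
                total += weight * (f_S(S | {j}) - f_S(S))
        phi.append(total)
    return phi, f_S(set())


model = lambda z: 2 * z[0] - z[1] + z[0] * z[2]  # includes an interaction term
background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0]]
x = [1.0, 1.0, 2.0]
phi, base = shapley_values(model, x, background)
print(phi, base, sum(phi), model(x) - base)  # efficiency: the last two agree
```

For a purely linear model this enumeration reduces to φ_j = β_j · (x_j − mean of background column j), the closed form used in the playground below.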

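The reason SHAP became dominant is the axioms. Shapley values are the unique attribution satisfying:

  1. Efficiency: attributions sum to the prediction minus the baseline, Σ_j φ_j(x) = f̂(x) − E[f̂(x)].
  2. Symmetry: two features that contribute identically to every coalition get equal attribution.
  3. Dummy: a feature that never changes the output gets zero.
  4. Additivity: the attribution for a sum of models is the sum of their attributions.

No other scheme satisfies all four. TreeSHAP (Lundberg et al. 2018) put SHAP in production: exact Shapley values for tree ensembles in O(TLD²) time instead of O(TL · 2^M), where T is the number of trees, L the maximum number of leaves, D the maximum depth, and M the number of features. This is why the major GBDT libraries ship a SHAP integration. For neural networks you fall back to DeepSHAP or KernelSHAP, both of which approximate.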

Pro | Con
Principled: unique under four reasonable axioms. | Still queries the model on points that may not exist. TreeSHAP has two variants. The interventional variant fills absent features from a background dataset: it satisfies the Shapley axioms with respect to the model but can go off-manifold. The tree_path_dependent variant needs no background data and weights each branch by the training samples that reached it: it stays closer to the data but mixes feature correlation into the attribution. The built-in SHAP outputs of GBDT libraries, and shap's TreeExplainer when no background data is passed, use tree_path_dependent.
Local and global from one machinery: mean |φ_j| across rows = global importance, consistent with per-row. | Fast for trees only; other models slow or approximate.
Sign matters: positive φ pushed the prediction up, negative pushed it down. | Easy to misinterpret as causal. SHAP attributes a prediction, not an outcome.

Interactive · SHAP-like attribution playground

A tiny linear model on a 4-feature regression. For a linear model, φ_j(x) = β_j · (x_j − E[x_j]) is the exact interventional Shapley value; it also equals the conditional variant when features are independent, while with correlated features the two variants can differ. So this widget computes the real thing. Pick a row to see its waterfall. Toggle "add a correlated copy of feature 1" to watch permutation importance get fooled while SHAP stays sane.

Local attribution + global importance
Move the row slider to see different waterfalls. Each bar is one feature's contribution; together they sum to (prediction − baseline). Then toggle the correlated copy: each copy's permutation bar drops far below the original feature's, while the two copies' SHAP bars still add up to the original attribution.
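A sketch of the correlated-copy effect, under assumptions chosen so the arithmetic is exact: a noise-free linear target, an exact duplicate of feature 1, a model that splits feature 1's weight evenly across the two copies (what a ridge fit does with exact duplicates), and one shared shuffle for permutation importance. Because MSE scales with the squared weight, each copy's permutation importance drops to a quarter of the original and the two together keep only half; each copy's mean |SHAP| halves too, but the two add back up to the original feature's value.

```python
import random


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def perm_importance(predict, X, y, j, perm):
    """Increase in MSE when column j is reordered by the fixed permutation perm."""
    base = mse(y, [predict(r) for r in X])
    X_perm = [r[:j] + [X[p][j]] + r[j + 1:] for r, p in zip(X, perm)]
    return mse(y, [predict(r) for r in X_perm]) - base


def mean_abs_linear_shap(coefs, X, j):
    """Global importance as mean |phi_j|, using phi_j = beta_j * (x_j - mean of x_j)."""
    mean_j = sum(r[j] for r in X) / len(X)
    return sum(abs(coefs[j] * (r[j] - mean_j)) for r in X) / len(X)


rng = random.Random(1)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
y = [2.0 * a + 1.0 * b for a, b in rows]  # noise-free target, so the comparison is exact

# Model A: feature 1 with weight 2. Model B: the same function, with that weight split
# evenly between feature 1 and an exact copy of it (column 2).
coefs_a, X_a = [2.0, 1.0], rows
coefs_b, X_b = [1.0, 1.0, 1.0], [[a, b, a] for a, b in rows]
predict_a = lambda r: sum(c * v for c, v in zip(coefs_a, r))
predict_b = lambda r: sum(c * v for c, v in zip(coefs_b, r))

perm = list(range(len(rows)))
rng.shuffle(perm)  # one shared permutation, so both models see the same shuffle

pi_a = perm_importance(predict_a, X_a, y, 0, perm)
pi_b = [perm_importance(predict_b, X_b, y, j, perm) for j in (0, 2)]
shap_a = mean_abs_linear_shap(coefs_a, X_a, 0)
shap_b = [mean_abs_linear_shap(coefs_b, X_b, j) for j in (0, 2)]
print(f"permutation: original {pi_a:.3f} vs copies {pi_b[0]:.3f} + {pi_b[1]:.3f}")
print(f"mean |SHAP|: original {shap_a:.3f} vs copies {shap_b[0]:.3f} + {shap_b[1]:.3f}")
```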

The disagreement problem

Run permutation, TreeSHAP, and built-in split-gain on the same GBDT and the feature rankings will not match. Not a bug — each method answers a slightly different question.

Method | Question it actually answers
Permutation importance | "How much worse does my eval loss get if I destroy this feature's signal?"
Built-in split-gain | "How much impurity did this feature reduce during training, summed across splits?"
Mean |SHAP| | "How much does this feature move the prediction from baseline, on average per row?"
PDP range | "How much does the average prediction swing as this feature varies?"
LIME coefficient (averaged) | "How much does a local linear surrogate weight this feature, averaged over rows?"

Senior answer to "which one should I report?" — pick the method whose question matches what the stakeholder is asking. Loss when a feature pipeline breaks → permutation. Explaining a single decision → SHAP. Debugging tree structure → split-gain. "What should I change?" → none of them.
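A small demonstration of permutation importance and mean |SHAP| answering different questions, under assumptions chosen for clarity: a hand-set linear model that leans on a feature the target ignores (as an overfit model might), the linear-model SHAP formula for the attribution, and a single shuffle for permutation importance. The feature moves predictions a lot, so its mean |SHAP| is large, but shuffling it barely changes held-out loss.

```python
import random


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(3)
n = 4000
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]  # held-out rows
y = [r[0] + rng.gauss(0, 0.5) for r in X]                   # the target ignores feature 1
coefs = [1.0, 0.8]                                          # hypothetical overfit model leans on it
predict = lambda r: coefs[0] * r[0] + coefs[1] * r[1]

# Permutation importance of feature 1 (one shuffle): change in held-out MSE.
base = mse(y, [predict(r) for r in X])
shuffled = [r[1] for r in X]
rng.shuffle(shuffled)
perm_imp = mse(y, [predict([r[0], v]) for r, v in zip(X, shuffled)]) - base

# Mean |SHAP| of feature 1, with the linear-model formula phi = beta * (x - mean x).
mean_1 = sum(r[1] for r in X) / n
mean_abs_shap = sum(abs(coefs[1] * (r[1] - mean_1)) for r in X) / n
print(f"permutation importance {perm_imp:+.3f}, mean |SHAP| {mean_abs_shap:.3f}")
```

The shuffled column has the same spread as the original and was never tied to the target, so the loss hardly moves; SHAP only measures how far predictions move from the baseline.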

Interpretability ≠ causal inference

SHAP says "feature X contributed +0.3 to this prediction." It does not say "if you changed X in the world, the outcome would change by 0.3." Two reasons:

  1. The model is correlational. If x_j is a proxy for an unobserved confounder, SHAP credits the proxy. Intervening on x_j doesn't move the confounder.
  2. SHAP attributes the trained model, not the world. SHAP for "shoe size" predicting reading ability in children will be positive — older kids read better and have bigger feet. Buying bigger shoes does not improve reading.
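A toy simulation of the shoe-size example, with made-up coefficients: age drives both shoe size and reading score, and the model only sees shoe size. On observational data the fitted slope is strongly positive, so any attribution method applied to that model would credit shoe size; when shoe sizes are assigned at random (a simulated intervention), the slope collapses toward zero.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (single feature, with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


rng = random.Random(7)
n = 5000
age = [rng.uniform(6, 12) for _ in range(n)]         # unobserved confounder
shoe = [0.8 * a + rng.gauss(0, 0.5) for a in age]    # driven by age only
reading = [10 * a + rng.gauss(0, 5) for a in age]    # driven by age only

# Observational fit: shoe size looks predictive, so a model trained on it will rely on it.
slope_obs = ols_slope(shoe, reading)

# Simulated intervention: assign shoe sizes at random, independent of age.
lo, hi = min(shoe), max(shoe)
shoe_do = [rng.uniform(lo, hi) for _ in range(n)]
slope_do = ols_slope(shoe_do, reading)
print(f"observational slope {slope_obs:.2f}, interventional slope {slope_do:.2f}")
```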

Causal questions need experimental data or causal-inference machinery (DAGs, do-calculus, IV, matching). The senior signal is calling this out unprompted whenever someone asks "what should we change to flip the decision?"

The counterfactual trap
"SHAP says feature X has a +0.3 effect on the denial. Let's tell the applicant to fix X." This may be correct, useless, or actively misleading — depending on whether X is causal or a proxy. Surface SHAP as a hypothesis, then validate causally. Most regulators only require the model's reasons, not actionable advice; conflating the two is how you end up advising someone to change something they cannot.

When does the stakeholder actually need interpretability?

Driver | Flavor | Method
Regulation (GDPR Art. 22, ECOA, FCRA adverse-action notices) | Local | SHAP per row, or coefficients if linear. Stable + reproducible matters more than minimum-variance.
Debugging ("the model is doing something weird") | Global | Permutation + mean |SHAP| + PDP/ALE on suspected features. Look for features that shouldn't matter and do.
Trust / adoption | Global | Ship the simplest model that meets accuracy. EBM/GAM is often right. A small AUC tax buys a year of deployment velocity.
Fairness audit | Both | SHAP segmented by demographic + group metrics (TPR parity, calibration parity). Detection, not remediation.
Causal action | Neither | Push back. A/B test the intervention or commission a causal study.

The trade-off table you should have memorized

Property | Linear coef | Shallow tree | TreeSHAP / GBDT | LIME | Permutation
Faithfulness | Exact | Exact | Exact for trees | Local approximation | Real metric, corrupts joint distribution
Model-agnostic | No | No | Trees only | Yes | Yes
Local / global | Both | Both | Both | Local only | Global only
Scalability | Trivial | Trivial | Polynomial in tree size | Slow (sample + fit per row) | O(features × eval)
Stability | High | High | High (deterministic) | Low (random perturbations) | Medium (MC over shuffles)
Handles correlation | OK if regularized | OK | Splits credit between correlated features | Poor | Poor: features mask each other
Causal? | No | No | No | No | No

Interview prompts you should be ready for

  1. "Walk me through SHAP. Why Shapley values?" (The four axioms. Efficiency: attributions sum to prediction − baseline. Symmetry, dummy, additivity. Shapley is the unique attribution that satisfies all four. TreeSHAP made it tractable for the models people actually ship.)
  2. "Permutation importance vs SHAP: when do they disagree?" (Permutation answers a loss-based question, SHAP a prediction-based one. They disagree when a feature moves predictions a lot but those moves don't help loss: e.g., a feature the model has overfit to gets large SHAP values even though shuffling it on held-out data barely changes the loss. They also disagree on correlated features: permutation tends to under-credit each copy because the other copy still carries the signal, while SHAP splits the credit so the copies' attributions add up to the shared effect.)
  3. "Your stakeholders want to know 'what to change to flip the decision.' What's your concern with answering from SHAP?" (SHAP is correlational. A feature with high SHAP may be a proxy for an unobserved confounder, and intervening on it does nothing. For counterfactual advice you need a verified causal structure or an experiment.)
  4. "Why doesn't a tree's built-in feature importance match permutation importance?" (Three reasons: built-in is biased toward high-cardinality features; it tallies training impurity reduction or raw split counts, not how much predictions move; and permutation is computed on held-out data while built-in is computed on training. They're answering different questions.)
  5. "You have a GBDT at AUC 0.83 and a logistic regression at AUC 0.81. Pick one for a regulated lending product." (LR. The 2-point gap rarely outweighs the cost of building adverse-action notices, monitoring, and regulator review for a black box. The exception is when those 2 points represent millions in NPV — in which case ship the GBDT with TreeSHAP and budget for the explanation infrastructure.)
  6. "Your SHAP for feature X is +0.3 on this row. What does that actually mean?" ("Holding the data distribution fixed, this feature's value contributed +0.3 to this prediction relative to the baseline E[ŷ], in the model's output units (often log-odds for a GBDT classifier). It does not mean changing X by one unit changes the prediction by 0.3, and it does not mean changing X in the real world changes the outcome.")
  7. "How do you handle the disagreement between three different feature-importance methods?" (Don't pick the one with the prettiest answer. Identify which question the stakeholder is asking — loss impact, prediction impact, or training behavior — and use the method that answers it. Report the others as sensitivity checks, not as ground truth.)
Takeaway — and end of folder
Interpretability is not a property of a model; it's a property of a question. The senior move, in order: ask which question the stakeholder is solving; pick an intrinsically interpretable model if the accuracy budget allows; otherwise pick the post-hoc method whose question matches; and never claim a causal answer from a correlational tool. The whole folder in one paragraph: bias-variance gave you the trade-off frame, the model lessons the levers, evaluation taught you to measure what you optimize, and this lesson tells you how to talk about what you shipped.