Predictive Analytics - “why” factor & model interpretability


Question


I have data that contains a large number of X variables, mostly categorical/nominal, and my target variable is a multi-class label. I am able to build a couple of models to predict the multi-class target and compare how each of them performs. I have training and testing data, and both gave me good results.

Now, I am trying to find out "why" the model predicted a certain Y variable. Meaning, if I have weather data with X variables: city, state, zip code, temp, year, and Y variable: rain, sun, cloudy, snow, I want to find out "why" the model predicted rain, sun, cloudy, or snow, respectively. I used classification algorithms such as multinomial models, decision trees, etc.

This may be a broad question, but I need somewhere to start researching. I can predict the "what", but I can't see "why" it was predicted as the rain, sun, cloudy, or snow label. Basically, I am trying to find the links between the variables that led the model to its prediction.

So far I have thought of using a correlation matrix and principal component analysis (which happened during the model-building process)...at least to see which variables are good predictors and which ones are not. Is there a way to figure out the "why" factor?

Thanks a bunch!


Answer 1:


Model interpretability is a hyper-active and hyper-hot area of current research (think of the holy grail, or something), which has been brought to the fore lately not least due to the (often tremendous) success of deep learning models in various tasks, plus the necessity of algorithmic fairness & accountability...

Apart from the intense theoretical research, there have been some toolboxes & libraries on a practical level lately, both for neural networks as well as for other general ML models; here is a partial list which arguably should keep you busy for some time:

  • The What-If tool by Google, a brand new (September 2018) feature of the open-source TensorBoard web application, which lets users analyze an ML model without writing code (project page, blog post)

  • The Layer-wise Relevance Propagation (LRP) toolbox for neural networks (paper, project page, code, TF Slim wrapper)

  • FairML: Auditing Black-Box Predictive Models, by Cloudera Fast Forward Labs (blog post, paper, code)

  • LIME: Local Interpretable Model-agnostic Explanations (paper, code, blog post, R port); a short usage sketch for LIME and SHAP follows this list

  • Black Box Auditing and Certifying and Removing Disparate Impact (authors' Python code)

  • A recent (November 2017) paper by Geoff Hinton, Distilling a Neural Network Into a Soft Decision Tree, with an independent PyTorch implementation

  • SHAP: A Unified Approach to Interpreting Model Predictions (paper, authors' Python code, R package)

  • Interpretable Convolutional Neural Networks (paper, authors' code)

  • Lucid, a collection of infrastructure and tools for research in neural network interpretability by Google (code; papers: Feature Visualization, The Building Blocks of Interpretability)

  • Transparency-by-Design (TbD) networks (paper, code, demo)

  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability (paper, code, Google blog post)

  • TCAV: Testing with Concept Activation Vectors (ICML 2018 paper, Tensorflow code)

  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (paper, authors' Torch code, Tensorflow code, PyTorch code, Keras example notebook)

  • Network Dissection: Quantifying Interpretability of Deep Visual Representations, by MIT CSAIL (project page, Caffe code, PyTorch port)

  • GAN Dissection: Visualizing and Understanding Generative Adversarial Networks, by MIT CSAIL (project page, with links to paper & code)

  • Explain to Fix: A Framework to Interpret and Correct DNN Object Detector Predictions (paper, code)

  • InterpretML by Microsoft (code still in alpha)

  • Anchors: High-Precision Model-Agnostic Explanations (paper, code)
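
Since LIME and SHAP from the list above are probably the most common entry points for tabular, multi-class problems like the one described, here is a minimal sketch of how they are typically wired up. The feature names, class names, synthetic data, and the RandomForest model are purely illustrative assumptions (not from the question), and exact return shapes can vary between library versions:

```python
# A minimal sketch (not the answer author's code): explaining a multi-class
# tabular classifier with LIME and SHAP. All names below are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

feature_names = ["city_id", "state_id", "zip_id", "temp", "year"]  # hypothetical encodings
class_names = ["rain", "sun", "cloudy", "snow"]

rng = np.random.default_rng(0)
X = rng.random((500, len(feature_names)))
y = rng.integers(0, len(class_names), size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# --- LIME: local, model-agnostic explanation of a single prediction ---
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names, class_names=class_names,
    mode="classification")
pred = int(model.predict(X_test[:1])[0])
exp = lime_explainer.explain_instance(
    X_test[0], model.predict_proba, num_features=5, labels=[pred])
print(exp.as_list(label=pred))  # per-feature contributions for the predicted class

# --- SHAP: additive per-feature contributions, here for a tree ensemble ---
import shap

shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X_test)  # one set of values per class
shap.summary_plot(shap_values, X_test, feature_names=feature_names)  # global view
```

Both tools answer the question at the level of individual predictions ("which feature values pushed this instance toward rain rather than snow"), which is usually closer to the "why" factor asked about than a correlation matrix or PCA.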

Finally, as interpretability moves toward the mainstream, there are already frameworks and toolboxes that incorporate more than one of the algorithms and techniques mentioned and linked above; here is an (again, partial) list for Python stuff:

  • tf-explain - interpretability methods as Tensorflow 2.0 callbacks (code, docs, blog post)
  • Skater, by Oracle (code, docs)
  • Alibi, by SeldonIO (code, docs); a short usage sketch follows this list
  • AI Explainability 360, by IBM (code, blog post)
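
As a taste of what these unified toolboxes look like in practice, below is a similarly hedged sketch using Alibi's tabular Anchors explainer (the same idea as the Anchors paper listed earlier). The data and model are again hypothetical, and the exact Alibi API may differ slightly between versions:

```python
# A minimal sketch, assuming the same hypothetical weather-style setup as above;
# API details follow Alibi's documented AnchorTabular usage and may vary by version.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from alibi.explainers import AnchorTabular

feature_names = ["city_id", "state_id", "zip_id", "temp", "year"]  # hypothetical
rng = np.random.default_rng(0)
X_train = rng.random((500, len(feature_names)))
y_train = rng.integers(0, 4, size=500)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

explainer = AnchorTabular(model.predict, feature_names)
explainer.fit(X_train, disc_perc=(25, 50, 75))  # discretize numeric features

explanation = explainer.explain(X_train[0], threshold=0.95)
print(explanation.anchor)     # human-readable rules that "anchor" this prediction
print(explanation.precision)  # how often the rules lead to the same prediction
print(explanation.coverage)   # fraction of instances the rules apply to
```

The appeal of such toolboxes is that they expose several of the individual methods listed earlier behind a reasonably uniform explain-an-instance interface.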

See also:

  • Interpretable Machine Learning, an online Gitbook by Christoph Molnar with R code available

  • A Twitter thread, linking to several interpretation tools available for R.

  • A short (4 hrs) online course by Kaggle, Machine Learning Explainability, and the accompanying blog post

  • ... and a whole bunch of resources in the Awesome Machine Learning Interpretability repo



Source: https://stackoverflow.com/questions/52391871/predictive-analytics-why-factor-model-interpretability
