iForest: Interpreting Random Forests via Visual Analytics
Xun Zhao, Yanhong Wu, Dik Lun Lee, Weiwei Cui
Background
• Random Forest applications: Fraud Detection, Medical Diagnosis, Churn Prediction
Icons created by Anatolii Babii, Atif Arshad, and Dinosoft Labs from the Noun Project.
Background – Decision Tree
Background – Random Forest
Motivation – Random Forest
"Random forests are A+ predictors on performance but rate an F on interpretability."
L. Breiman, "Statistical Modeling: The Two Cultures."
Interpretability
Source: https://xkcd.com/1838/
Interpretability
• Reveal the relationships between features and predictions
• Uncover the underlying working mechanisms
• Provide case-based reasoning
Icons created by Melvin, alrigel, and Dinosoft Labs from the Noun Project.
iForest: Interpreting Random Forests via Visual Analytics
iForest – Visual Components
• Data Overview
• Feature View
• Decision Path View
Demo
iForest – Data Overview
• Goal: provide case-based reasoning
iForest – Data Overview
• Methods: confusion matrix and t-SNE projection
[Confusion matrix: rows = actual values, columns = predicted values, partitioning the data into true/false positives and true/false negatives]
iForest – Data Overview
• Methods: confusion matrix and t-SNE projection
• In the projection, each circle represents a data item (colored by positive/negative class)
• Interactions: default view, panning & zooming
iForest – Feature View
• Goal: reveal the relationships between features and predictions
iForest – Feature View
• Methods: data distribution and partial dependence plot
• Each cell illustrates the statistics and importance of a feature
iForest – Feature View
• Methods: data distribution and partial dependence plot
[Plots for Feature A (numerical): data distribution, partial dependence (e.g. at x = 60), and split point distribution]
iForest – Feature View
• Methods: data distribution and partial dependence plot
[Plots for Feature B (ordinal)]
iForest – Decision Path View
• Goal: uncover the underlying working mechanisms
iForest – Decision Path View • Goal: audit the decision process of a particular data item
iForest – Decision Path View
• Decision Path Projection: each circle represents a decision path (positive/negative)
• Shows the ratio between positive and negative decision paths
• Lasso to select a specific set of paths for exploration
iForest – Decision Path View
• Feature Summary: each feature cell summarizes the feature ranges of the selected paths
• Pixel-based bar chart: feature range summary
• Vertical bar: feature value of the current data item
iForest – Decision Path View
• Feature Summary, built layer by layer: Layer 1 (root), Layer 2, Layer 3
• Example Decision Path I: A < 0.5, C < 3.5, C > 1.5
• Example Decision Path II: A < 0.5, C > 2.5
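A decision path like those above is the sequence of split conditions a data item satisfies from a tree's root to its leaf. Extracting one path per tree of the forest can be sketched as follows (an illustrative walk over scikit-learn's internal tree arrays on toy data, not the paper's implementation):

```python
# Illustrative sketch: collect the decision path of one item in every tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

item = X[:1]          # the data item being audited
paths = []
for tree in forest.estimators_:
    t = tree.tree_
    node, path = 0, []
    while t.children_left[node] != -1:        # -1 marks a leaf node
        f, thr = t.feature[node], t.threshold[node]
        if item[0, f] <= thr:
            path.append(f"X[{f}] <= {thr:.2f}")
            node = t.children_left[node]
        else:
            path.append(f"X[{f}] > {thr:.2f}")
            node = t.children_right[node]
    paths.append(path)    # one root-to-leaf condition list per tree
```

Each entry of `paths` corresponds to one circle in the Decision Path Projection; grouping the conditions by depth gives the layer-by-layer feature ranges the Feature Summary and Decision Path Flow aggregate.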
iForest – Decision Path View
• Decision Path Flow: layer-level feature ranges, from the root down to the leaf nodes
Evaluation – Usage Scenarios
• Two usage scenarios using the Titanic shipwreck and German Credit data
• Titanic shipwreck: 891 passengers and 6 features after pre-processing
• German Credit: 1,000 bank accounts and 9 features
Usage Scenario – Titanic
Evaluation – User Study
• Qualitative user study
• 10 participants recruited from a local university and an industry research lab
• 10 tasks covering all important aspects of random forest interpretation
• 12 questions related to iForest usage in a post-session interview
[Bar chart: task completion time in seconds for Tasks 1–10]
Future Work
• Support other tree-based models such as boosting trees
• Support multi-class classification or regression
• Support random forest diagnosis and debugging
Q&A
iForest: Interpreting Random Forests via Visual Analytics
Yanhong Wu
Email: yanwu@visa.com
URL: http://yhwu.me