
  1. iForest: Interpreting Random Forests via Visual Analytics Xun Zhao, Yanhong Wu, Dik Lun Lee, Weiwei Cui

  2. Background • Random Forest is widely used in applications such as fraud detection, medical diagnosis, and churn prediction. Icons created by Anatolii Babii, Atif Arshad, and Dinosoft Labs from the Noun Project.

  3. Background – Decision Tree

  4. Background – Decision Tree

  5. Background – Random Forest

  6. Background – Random Forest

  7. Background – Random Forest
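
The background slides walk through how a random forest aggregates many decision trees into one predictor. Below is a minimal sketch of that setup using scikit-learn on synthetic data; the dataset and parameters are illustrative and not from the talk.

```python
# Minimal sketch: a random forest as an ensemble of decision trees
# (scikit-learn; synthetic, illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is fit on a bootstrap sample and considers random
# feature subsets at each split; the forest predicts by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```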

  8. Motivation – Random Forest

  9. Random Forests are A+ predictors on performance but rate an F on interpretability. (L. Breiman, "Statistical Modeling: The Two Cultures")

  10. Interpretability (Source: https://xkcd.com/1838/)

  11. Interpretability • Reveal the relationships between features and predictions • Uncover the underlying working mechanisms • Provide case-based reasoning. Icons created by Melvin, alrigel, and Dinosoft Labs from the Noun Project.

  12. iForest: Interpreting Random Forests via Visual Analytics

  13. iForest – Visual Components: Data Overview, Feature View, Decision Path View

  14. Demo

  15. iForest – Data Overview • Goal: provide case-based reasoning

  16. iForest – Data Overview • Methods: confusion matrix and t-SNE projection. [Figure: confusion matrix of actual vs. predicted values, with true/false positive and negative cells]

  17. iForest – Data Overview • Methods: confusion matrix and t-SNE projection. In the projection, each circle represents a data item, colored positive or negative; the default view supports panning & zooming.
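
A minimal sketch of the two Data Overview methods named above, assuming a scikit-learn random forest on synthetic data (not the authors' data or code): the confusion matrix tabulates actual vs. predicted classes, and t-SNE embeds each data item as a 2D point that the overview can draw as a circle.

```python
# Sketch of the Data Overview methods: confusion matrix + t-SNE projection.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Confusion matrix: rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_test, forest.predict(X_test)))

# t-SNE projection: embed each test item into 2D; each point becomes a circle
# in the overview, colored by its class.
embedding = TSNE(n_components=2, random_state=0).fit_transform(X_test)
print(embedding.shape)  # (n_test_items, 2)
```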

  18. iForest – Feature View • Goal: reveal the relationships between features and predictions

  19. iForest – Feature View • Methods: data distribution and partial dependence plot. Each cell illustrates the statistics and importance of a feature.

  20. iForest – Feature View • Methods: data distribution and partial dependence plot. [Figure: feature cell for Feature A (numerical)]

  21. iForest – Feature View • Methods: data distribution and partial dependence plot. [Figure: feature cell for Feature A (numerical), annotated at x = 60]

  22. iForest – Feature View • Methods: data distribution and partial dependence plot. [Figure: split point distribution for Feature A (numerical)]

  23. iForest – Feature View • Methods: data distribution and partial dependence plot. [Figure: feature cell for Feature B (ordinal)]
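
A minimal sketch of how the two per-feature summaries shown in these cells can be computed for one feature, using scikit-learn on synthetic data; the feature index and dataset are placeholders, not from the paper.

```python
# Sketch: partial dependence curve and split-point distribution for one feature.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Partial dependence of the prediction on feature 0 ("Feature A"): the model's
# average output as the feature sweeps a grid while other features stay fixed.
pd_result = partial_dependence(forest, X, features=[0], kind="average")
grid = pd_result["grid_values"][0]   # x-axis ("values" in older scikit-learn)
curve = pd_result["average"][0]      # y-axis: averaged predicted probability
print(grid[:3], curve[:3])

# Split point distribution: every threshold where some tree splits on feature 0.
thresholds = np.concatenate(
    [t.tree_.threshold[t.tree_.feature == 0] for t in forest.estimators_]
)
print(len(thresholds), "split points for feature 0")
```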

  24. iForest – Decision Path View • Goal: uncover the underlying working mechanisms

  25. iForest – Decision Path View • Goal: audit the decision process of a particular data item

  26. iForest – Decision Path View • Decision Path Projection: shows the ratio between positive and negative decision paths; each circle represents a decision path; a lasso selects a specific set of paths for exploration.
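
The slide does not spell out how decision paths are embedded in 2D, so the sketch below makes an assumption purely for illustration: each tree's root-to-leaf path for the audited item is encoded by the set of features it tests, the encodings are projected with t-SNE, and each path is labeled positive or negative by its tree's vote. scikit-learn on synthetic data; not the authors' implementation.

```python
# Sketch (assumed encoding): project one item's decision paths into 2D.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

item = X[:1]                                   # the data item being audited
paths, votes = [], []
for tree in forest.estimators_:
    visited = tree.decision_path(item).toarray()[0]   # nodes on this tree's path
    split_feats = tree.tree_.feature                   # split feature per node (-2 = leaf)
    used = np.zeros(X.shape[1])
    used[split_feats[(visited == 1) & (split_feats >= 0)]] = 1.0
    paths.append(used)                          # which features this path tests
    votes.append(int(tree.predict(item)[0]))    # this tree's vote (positive/negative)

# One circle per decision path; in the actual view, circles are colored by vote.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(np.array(paths))
print(embedding.shape, "positive paths:", sum(votes), "of", len(votes))
```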

  27. iForest – Decision Path View • Feature Summary • Feature Cell: summarizes the feature ranges of the selected paths (pixel-based bar chart: feature range summary; vertical bar: feature value of the current data item)

  28. iForest – Decision Path View • Feature Summary: example with two decision paths (Decision Path I and Decision Path II) across Layer 1 (root), Layer 2, and Layer 3, using split rules A < 0.5, C < 3.5, C > 1.5, C > 2.5, and A < 0.5

  29. iForest – Decision Path View • Feature Summary: example continued (Layer 1 (root), Layer 2, Layer 3; Decision Paths I and II as above)

  30. iForest – Decision Path View • Feature Summary: example continued (Layer 1 (root), Layer 2, Layer 3; Decision Paths I and II as above)

  31. iForest – Decision Path View • Feature Summary: example continued (Layer 1 (root), Layer 2, Layer 3; Decision Paths I and II as above)

  32. iForest – Decision Path View • Decision Path Flow: layer-level feature ranges, flowing from the root down to the leaf nodes
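
A minimal sketch of the raw material behind the feature summary and decision path flow: for one tree and one data item, recover the split rule satisfied at each layer from the root down to the leaf. The helper name describe_path and the data are illustrative, not from the paper; iForest summarizes such per-layer rules over all selected paths as feature-range bars.

```python
# Sketch: list the per-layer split rules of one decision path for one item.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def describe_path(tree, item):
    """Return one human-readable rule per layer of the item's decision path."""
    t = tree.tree_
    node_ids = tree.decision_path(item).indices   # visited nodes, root first
    rules = []
    for node in node_ids:
        if t.feature[node] < 0:                   # leaf node: no split rule
            continue
        feat, thr = t.feature[node], t.threshold[node]
        op = "<=" if item[0, feat] <= thr else ">"
        rules.append(f"feature {feat} {op} {thr:.2f}")
    return rules

item = X[:1]
print(describe_path(forest.estimators_[0], item))  # one rule per layer, root to leaf
```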

  33. Evaluation – Usage Scenarios • Two usage scenarios using the Titanic shipwreck and German Credit datasets • Titanic shipwreck statistics: 891 passengers and 6 features after pre-processing • German Credit statistics: 1,000 bank accounts and 9 features

  34. Usage Scenario – Titanic

  35. Evaluation – User Study • Qualitative user study • 10 participants recruited from a local university and an industry research lab • 10 tasks covering all important aspects of random forest interpretation • 12 questions related to iForest usage in a post-session interview. [Chart: task completion time in seconds for Tasks 1–10]

  36. Future Work • Support other tree-based models such as boosting trees • Support multi-class classification and regression • Support random forest diagnosis and debugging

  37. Q&A iForest: Interpreting Random Forests via Visual Analytics Yanhong Wu Email: yanwu@visa.com URL: http://yhwu.me
