  1. 1 Trade-offs in Explanatory Model Learning – Data Analysis Project, Madalina Fiterau – 21st of February 2012 – DAP Committee: Artur Dubrawski, Jeff Schneider, Geoff Gordon

  2. 2 Outline • Motivation: need for interpretable models • Overview of data analysis tools • Model evaluation – accuracy vs complexity • Model evaluation – understandability • Example applications • Summary

  3. 3 Example Application: Nuclear Threat Detection • Border control: vehicles are scanned • Human in the loop interpreting results [diagram: vehicle scan → prediction → feedback]

  4. 4 Boosted Decision Stumps • Accurate, but hard to interpret • How is the prediction derived from the input?

  5. 5 Decision Tree – More Interpretable [flattened tree diagram; tests: Radiation > x%?, Payload type = ceramics?, Uranium level > max. admissible for ceramics?, consider balance of Th232, Ra226 and Co60; leaves: Threat / Clear]

  6. 6 Motivation • Many users are willing to trade accuracy to better understand the results the system yields • Need: a simple, interpretable model • Need: an explanatory prediction process

  7. 7 Analysis Tools – Black-box • Random Forests: very accurate tree ensemble (L. Breiman, ‘Random Forests’, 2001) • Boosting: guaranteed to decrease training error (R. Schapire, ‘The boosting approach to machine learning’) • Bagged boosting (G. Webb, ‘MultiBoosting: A Technique for Combining Boosting and Wagging’)

  8. 8 Analysis Tools – White-box • CART: decision tree based on the Gini impurity criterion • Feating: decision tree with leaf classifiers (K. Ting, G. Webb, ‘FaSS: Ensembles for Stable Learners’) • Subspacing: ensemble in which each discriminator is trained on a random subset of features (R. Bryll, ‘Attribute Bagging’) • EOP: builds a decision list that selects the classifier to deal with a query point

  9. 9 Explanation-Oriented Partitioning [scatter plot: (X,Y) projection of synthetic data – 2 Gaussians and a uniform cube]
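
The synthetic data in this figure is described only as "2 Gaussians" and a "uniform cube"; a minimal sketch of how such a set could be generated (the dimensionality, means, covariances and class assignment below are assumptions, not the talk's parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clusters (one class) -- means and covariances are assumed, not from the talk
gauss_a = rng.multivariate_normal(mean=[2.0, 2.0, 0.0], cov=np.eye(3), size=200)
gauss_b = rng.multivariate_normal(mean=[-2.0, -1.0, 1.0], cov=0.5 * np.eye(3), size=200)

# Points distributed uniformly inside a cube (the other class)
cube = rng.uniform(low=-4.0, high=5.0, size=(400, 3))

X = np.vstack([gauss_a, gauss_b, cube])
y = np.concatenate([np.ones(400), np.zeros(400)])   # 1 = Gaussian class, 0 = uniform cube
# The "(X,Y) plot" of the slide corresponds to scatter-plotting X[:, 0] against X[:, 1].
```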

  10. 10-11 EOP Execution Example – 3D data • Step 1: Select a projection – (X1, X2)

  12. 12-13 EOP Execution Example – 3D data • Step 2: Choose a good classifier – call it h1

  14. 14-15 EOP Execution Example – 3D data • Step 3: Estimate the accuracy of h1 at each point (marking points OK / NOT OK)

  16. 16-17 EOP Execution Example – 3D data • Step 4: Identify high-accuracy regions

  18. 18-19 EOP Execution Example – 3D data • Step 5: Training points covered by the regions are removed from consideration

  20. 20 EOP Execution Example – 3D data Finished first iteration

  21. 21 EOP Execution Example – 3D data Finished second iteration

  22. 22 EOP Execution Example – 3D data Iterate until all data is accounted for or error cannot be decreased
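
Putting the five steps together, a simplified sketch of an EOP-style training loop. The projection search, the choice of base classifier, the per-point accuracy estimate and the region shape (an axis-aligned box around correctly classified points) are all assumptions made for illustration; the talk does not spell these out.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def train_eop(X, y, max_iters=5, min_region_accuracy=0.95):
    """Greedy EOP-style training: one (projection, classifier, region) triple per level."""
    remaining = np.ones(len(y), dtype=bool)
    decision_list = []

    for _ in range(max_iters):
        idx = np.where(remaining)[0]
        if len(idx) < 30 or len(np.unique(y[idx])) < 2:
            break                                        # nothing usable left to explain
        best = None
        for dims in combinations(range(X.shape[1]), 2):  # Step 1: candidate 2-D projections
            cols = list(dims)
            Xp, yp = X[idx][:, cols], y[idx]
            clf = LogisticRegression().fit(Xp, yp)       # Step 2: a simple classifier
            # Step 3: per-point accuracy proxy -- cross-validated correctness
            correct = cross_val_predict(LogisticRegression(), Xp, yp, cv=3) == yp
            if best is None or correct.mean() > best[0]:
                best = (correct.mean(), cols, clf, correct)
        acc, cols, clf, correct = best
        if not correct.any():
            break
        # Step 4: a high-accuracy region -- bounding box of the correctly classified points
        lo = X[idx][correct][:, cols].min(axis=0)
        hi = X[idx][correct][:, cols].max(axis=0)
        decision_list.append((cols, clf, (lo, hi)))
        # Step 5: remove the points covered by the region from further consideration
        covered = np.all((X[:, cols] >= lo) & (X[:, cols] <= hi), axis=1)
        remaining &= ~covered
        if acc < min_region_accuracy:
            break                                        # error can no longer be decreased enough
    return decision_list
```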

  23. 23 Learned Model – Processing a query [x1 x2 x3]: if [x1 x2] is in R1 → predict h1(x1, x2); else if [x2 x3] is in R2 → predict h2(x2, x3); else if [x1 x3] is in R3 → predict h3(x1, x3); else → default value
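
A minimal sketch of how such a decision list answers a query, reusing the (cols, clf, (lo, hi)) levels produced by the training sketch above; the fallback default value is an assumption:

```python
import numpy as np

def predict(decision_list, x, default=0):
    """Walk the decision list: the first region that contains the query's
    projection decides which classifier answers; otherwise fall back to a default."""
    x = np.asarray(x)
    for cols, clf, (lo, hi) in decision_list:
        proj = x[cols]
        if np.all(proj >= lo) and np.all(proj <= hi):    # [x_i x_j] in R_k ?
            return clf.predict(proj.reshape(1, -1))[0]   # answer with h_k
    return default                                       # no region claimed the query
```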

  24. 24 Parametric / Nonparametric Regions • Bounding polyhedra: enclose points in convex shapes (hyper-rectangles / spheres); easy to test inclusion; visually appealing; inflexible • Nearest-neighbor score: consider the k nearest neighbors; region = { X | Score(X) > t }, with t a learned threshold; easy to test inclusion; can look insular; deals with irregularities [diagram: query point p and neighbors n1–n5, correctly vs. incorrectly classified]
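
Both region types reduce to a cheap membership test. A sketch under assumed parameters: k, the threshold t, and the use of "fraction of correctly classified neighbors" as the score are illustrative choices, not the talk's exact definitions.

```python
import numpy as np

def in_bounding_box(x, lo, hi):
    """Parametric region: axis-aligned hyper-rectangle -- inclusion is two comparisons per dimension."""
    return np.all(x >= lo) and np.all(x <= hi)

def knn_score(x, train_X, train_correct, k=5):
    """Nonparametric score: fraction of the k nearest training points that the
    candidate classifier got right (train_correct is a boolean array)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_correct[nearest].mean()

def in_knn_region(x, train_X, train_correct, t=0.8, k=5):
    """Region = { X | Score(X) > t }, with t a learned threshold (0.8 is a placeholder)."""
    return knn_score(x, train_X, train_correct, k) > t
```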

  25. 25 Feating and EOP – structures to pick the right classification model • Feating: tiles in feature space; models trained on subspaces; decision tree • EOP: flexible regions; models trained on all features; decision list

  26. 26 Outline • Motivation: need for interpretable models • Overview of data analysis tools • Model evaluation – accuracy vs complexity • Model evaluation – understandability • Example applications • Summary

  27. 27 Overview of datasets • Real-valued features, binary output • Artificial data – 10 features ▫ Low-dimensional Gaussians / uniform cubes • UCI repository • Application-related datasets • Results obtained by k-fold cross-validation ▫ Complexity = expected number of vector operations performed for a classification task
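
The complexity metric above is stated only in words. One plausible way to evaluate it for a decision list of the kind sketched earlier; the per-level costs (one operation per region test, a configurable cost per classifier evaluation) are assumptions:

```python
import numpy as np

def expected_vector_ops(decision_list, X, ops_per_classifier=1):
    """Average number of vector operations per query: one region-membership test
    per level reached, plus one classifier evaluation when a region fires.
    ops_per_classifier is a placeholder -- e.g. an SVM with many support vectors
    would cost more than a decision stump."""
    total = 0
    for x in X:
        for cols, _clf, (lo, hi) in decision_list:
            total += 1                                   # region test
            proj = x[cols]
            if np.all(proj >= lo) and np.all(proj <= hi):
                total += ops_per_classifier              # classifier evaluation
                break
    return total / len(X)
```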

  28. 28 EOP vs AdaBoost – SVM base classifiers • EOP is often less accurate, but not significantly: mean difference in accuracy 0.5%, p-value of 2-sided test 0.832 • The reduction in complexity is statistically significant: mean difference in complexity 85, p-value of 2-sided test 0.003 [paired bar charts: accuracy and complexity of Boosting vs EOP (nonparametric) across datasets]
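
The slide reports two-sided p-values for paired differences in accuracy and in complexity. A sketch of how such a comparison can be run; a paired t-test on per-dataset results is assumed here (the talk does not name the test), and the numbers are purely illustrative:

```python
from scipy.stats import ttest_rel

# Illustrative per-dataset results for the two methods -- not the talk's data
boosting_acc = [0.97, 0.94, 0.99, 0.91, 0.95]
eop_acc      = [0.96, 0.94, 0.98, 0.90, 0.96]
boosting_cpx = [240, 180, 300, 150, 210]
eop_cpx      = [110,  90, 160,  80, 120]

# Two-sided paired tests: is the mean per-dataset difference zero?
print(ttest_rel(boosting_acc, eop_acc))   # accuracy difference (expected: not significant)
print(ttest_rel(boosting_cpx, eop_cpx))   # complexity difference (expected: significant)
```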

  29. 29 EOP (stumps as base classifiers) vs CART on data from the UCI repository • CART is the most accurate • Parametric EOP yields the simplest models [bar charts: accuracy and complexity of CART, nonparametric EOP (EOP N.) and parametric EOP (EOP P.) on BT, V, MB, BCW] Datasets (features / points): Breast Tissue 10 / 1006, Vowel 9 / 990, MiniBOONE 10 / 5000, Breast Cancer 10 / 596

  30. 30 Why are EOP models less complex? Typical XOR dataset

  31. 31 Why are EOP models less complex? Typical XOR dataset CART • is accurate • takes many iterations • does not uncover or leverage structure of data

  32. 32 Why are EOP models less complex? Typical XOR dataset • CART: is accurate; takes many iterations; does not uncover or leverage structure of data • EOP: equally accurate; uncovers structure [plots: regions found in Iteration 1 and Iteration 2]
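
The contrast is easy to reproduce on an XOR-style dataset; the data generation and depth limits below are assumptions made for illustration, not the talk's setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)       # label = XOR of the two sign bits

# No single axis-aligned split helps on XOR, so a CART-style tree needs extra depth
# before it starts carving out the four quadrants.
for depth in (1, 2, 4):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    print(depth, tree.score(X, y))
# An EOP-style decision list, by contrast, can cover the same data with two
# quadrant-shaped regions found in two iterations, as the slide illustrates.
```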

  33. 33 Error Variation With Model Complexity for EOP and CART [line plot: error vs depth of decision tree/list (1-8) for Breast Cancer Wis, MiniBOONE, Breast Tissue and Vowel, each with CART and EOP] • At low complexities, EOP is typically more accurate

  34. 34 UCI data – Accuracy [bar chart: accuracy of R-EOP, N-EOP, CART, Feating, Sub-spacing, Multiboosting and Random Forests on Vow, BT, MB and BCW]

  35. 35 UCI data – Model complexity [bar chart: complexity of R-EOP, N-EOP, CART, Feating, Sub-spacing and Multiboosting on Vow, BT, MB and BCW] • Complexity of Random Forests is huge – thousands of nodes

  36. 36 Robustness • Accuracy-targeting EOP ▫ identifies which portions of the data can be confidently classified with a given rate [plot: accuracy of EOP vs maximum allowed error, when regions do not include noisy data]

  37. 37 Outline • Motivation: need for interpretable models • Overview of data analysis tools • Model evaluation – accuracy vs complexity • Model evaluation – understandability • Example applications • Summary

  38. 38 Metrics of Explainability • Lift • Bayes Factor • J-Score • Normalized Mutual Information

  39. 39 Evaluation with usefulness metrics • For 3 out of 4 metrics, EOP beats CART (BF = Bayes Factor, L = Lift, J = J-score, NMI = Normalized Mutual Info; higher values are better)

               CART                            EOP
         BF     L      J      NMI        BF     L      J      NMI
  MB     1.982  0.004  0.389  0.040      1.889  0.007  0.201  0.502
  BCW    1.057  0.007  0.004  0.011      2.204  0.069  0.150  0.635
  BT     0.000  0.009  0.210  0.000      Inf    0.021  0.088  0.643
  V      Inf    0.020  0.210  0.010      2.166  0.040  0.177  0.383
  Mean   1.520  0.010  0.203  0.015      2.047  0.034  0.154  0.541
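
The talk does not give formulas for these metrics. As an illustration only, here is how two of them, lift and normalized mutual information, are commonly computed for a binary explanation/label pair; the exact definitions of lift, Bayes factor and J-score used in the talk may differ, and the toy arrays are not the talk's data.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def lift(y_true, y_flagged):
    """Common definition of lift: precision among flagged points relative to the base rate."""
    y_true, y_flagged = np.asarray(y_true), np.asarray(y_flagged)
    precision = y_true[y_flagged == 1].mean()
    base_rate = y_true.mean()
    return precision / base_rate

# Toy example -- illustrative values only
y_true    = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_flagged = np.array([1, 0, 1, 0, 0, 0, 1, 1])
print(lift(y_true, y_flagged))
print(normalized_mutual_info_score(y_true, y_flagged))
```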

  40. 40 Outline • Motivation: need for interpretable models • Overview of data analysis tools • Model evaluation – accuracy vs complexity • Model evaluation – understandability • Example application • Summary

  41. 41 Spam Detection (UCI ‘SPAMBASE’) • 10 features: frequencies of misc. words in e-mails • Output: spam or not [chart: accuracy, number of splits and complexity]

  42. 42 Spam Detection – Iteration 1 ▫ the classifier labels everything as spam ▫ the high-confidence regions enclose mostly spam, where: ▪ incidence of the word ‘your’ is low ▪ length of text in capital letters is high

  43. 43 Spam Detection – Iteration 2 ▫ the required incidence of capitals is increased ▫ the square region on the left also encloses examples that will be marked as ‘not spam’

  44. 44 Spam Detection – Iteration 3 ▫ the classifier marks everything as spam ▫ the frequencies of ‘your’ and ‘hi’ determine the regions

  45. 45 Effects of Cell Treatment • Monitored population of cells • 7 features: cycle time, area, perimeter ... • Task: determine which cells were treated [chart: accuracy, number of splits and complexity]

  46. 46 [figure-only slide]

  47. 47 MIMIC Medication Data • Information about administered medication • Features: dosage for each drug • Task: predict patient return to the ICU [chart: accuracy, number of splits and complexity]

  48. 48 [figure-only slide]
