 
Empirical Confidence Models for Supervised Machine Learning
Margarita Castro (1), Meinolf Sellmann (2), Zhaoyuan Yang (2), Nurali Virani (2)
(1) University of Toronto, Mechanical and Industrial Engineering
(2) General Electric, Global Research Center
May 2020
Motivation
ML in high-stakes contexts: self-driving cars, cybersecurity, healthcare diagnosis.
Main issues:
- We can't expect the models to be perfect.
- Summary statistics (e.g., accuracy) can be misleading when assessing a specific prediction.
Empirical Confidence for Regression
We propose a model that can declare its own incompetence: Regression Model + Competence Assessor.
"We develop techniques that learn when models generated by certain learning techniques on a particular data set can be expected to perform well, and when not."
At run time, an instance $x$ is passed to the regression model, which returns the prediction $y'$, and to the competence assessor, which returns the competence level $C$: Trusted, Cautioned, or Not Trusted.
Outline of the Talk
Part 1: Competence Assessor
- Overall framework.
- Meta-features.
- Meta training data.
Part 2: Numerical Evaluation
- Experimental setting.
- Results.
- Conclusions.
PART 1: Empirical Competence Assessor
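Competence Assessor Pipeline
[Diagram] Training: the primary technique (e.g., Random Forest) is fit on the training set $(X, Y)$ to produce the regressor model, and the same training set supplies the input for the competence assessor. Run time: (1) the run-time input $x$ goes to the regressor, which outputs the prediction $y'$; (2) the meta feature builder combines $x$, $y'$, and the training set $(X, Y)$, and the competence assessor outputs the competence level $C$.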
Meta Feature Builder
Relates the run-time input to the training data and the regressor technique. Given the run-time input $x$, the prediction $R(x) = y'$, and the training set $(X, Y)$, the builder outputs 6 meta features.
Relating input and training set:
- Different distance measures, depending on the regressor technique: $d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}_{+}$.
- Neighborhood $N(x)$ based on the distance measure $d(\cdot, \cdot)$.
- We consider the $k$ nearest neighbors, with $k = 5$.
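The neighborhood lookup described on this slide can be sketched in a few lines of Python. This is only an illustration: the slide says the distance measure depends on the regressor technique and does not name one, so the Euclidean distance used here is an assumption, and the function names `euclidean` and `neighborhood` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def neighborhood(x, X_train, k=5, dist=euclidean):
    """Return the indices of the k training points closest to x under dist."""
    order = sorted(range(len(X_train)), key=lambda i: dist(x, X_train[i]))
    return order[:k]

# Toy one-dimensional training inputs, made up for illustration.
X_train = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0]]
print(neighborhood([1.2], X_train, k=3))  # → [1, 2, 0]
```

Passing a different `dist` callable is one way to swap in a technique-specific distance without changing the lookup itself.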
Our Six Meta Features
1. Average distance to the neighborhood: $m_1(x) = \frac{1}{k} \sum_{(x_i, y_i) \in N(x)} d(x, x_i)$
2. Average prediction distance: $m_2(x) = \frac{1}{k} \sum_{(x_i, y_i) \in N(x)} |R(x) - R(x_i)|$
3. Deviation from the regressor's prediction: $m_3(x)$ compares $R(x)$ with the target values $y_i$ of the neighbors, weighted according to the distances $d(x, x_i)$ [exact formula not recoverable].
- Measure how far the run-time input is from the training data set.
- Capture the relationship between predictions in the vicinity of the current input.
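Our Six Meta Features
4. Average training error on $N(x)$: $m_4(x)$ averages the training errors $|R(x_i) - y_i|$ over $(x_i, y_i) \in N(x)$, weighted according to the distances $d(x_i, x)$ [exact formula not recoverable].
5. Variance of the training error on $N(x)$: $m_5(x) = \frac{1}{k-1} \sum_{(x_i, y_i) \in N(x)} \left( |R(x_i) - y_i| - m_4(x) \right)^2$
6. Target value variability on $N(x)$: $m_6(x) = \frac{1}{k-1} \sum_{(x_i, y_i) \in N(x)} (y_i - \bar{y})^2$, where $\bar{y}$ is the mean target value in $N(x)$.
- Features 4 and 5: accuracy of the regressor in the immediate vicinity.
- Feature 6: variance of the true value in $N(x)$.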
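To make the meta features concrete, the sketch below computes five of the six for a single run-time input. Several choices are assumptions rather than the authors' definitions: the Euclidean distance, the absolute error as the training error, and an unweighted mean for the average training error. The deviation feature (number 3) is left out because its weighting is not legible on the slides. The name `meta_features` and the dictionary keys are hypothetical.

```python
import math
from statistics import mean, variance

def meta_features(x, X_train, y_train, predict, k=5):
    """Compute five neighborhood-based meta features for input x (sketch)."""
    # Assumption: Euclidean distance; the talk uses technique-specific distances.
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    idx = sorted(range(len(X_train)), key=lambda i: dist(x, X_train[i]))[:k]
    y_hat = predict(x)
    # Assumption: training error is the absolute residual on each neighbor.
    errors = [abs(predict(X_train[i]) - y_train[i]) for i in idx]
    return {
        "avg_distance": mean([dist(x, X_train[i]) for i in idx]),
        "avg_prediction_distance": mean([abs(y_hat - predict(X_train[i])) for i in idx]),
        "avg_training_error": mean(errors),            # unweighted (assumption)
        "var_training_error": variance(errors),        # sample variance, divisor k - 1
        "target_variability": variance([y_train[i] for i in idx]),
    }

# Toy data and a toy "regressor" that returns the single input coordinate.
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [0.0, 1.0, 2.0, 3.0, 5.0]
feats = meta_features([2.0], X_train, y_train, predict=lambda v: v[0], k=5)
print(feats)
```

The `predict` argument is any callable mapping a feature vector to a number, so an already fitted regressor can be wrapped and passed in.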
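Training Data for Competence Assessor
[Diagram] A splitter divides the training set into a base training set and a validation set. The technique is trained on the base set to produce a regressor. For each validation input $x$, the regressor's prediction $y'$ and the base training set are passed to the meta feature builder; the resulting meta features, together with the competence label $C$, form the training data for the competence assessor.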
Splitter Procedure
Standard cross-validation:
- Random splitting into $h \in \{3, 5, 10\}$ buckets.
- One validation bucket and the rest as base.
Projection splits:
- Assess the i.i.d. assumption of the technique.
- Create interpolation and extrapolation scenarios.
- Project over the 1st and 2nd PC dimensions and sort the training data before splitting.
[Diagram: projected and sorted data divided into base training set and validation set]
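A projection split can be sketched as follows: project the data onto a principal direction, sort by the projected value, cut the sorted order into contiguous buckets, and hold out one bucket at a time. Holding out an end bucket gives an extrapolation scenario, and holding out a middle bucket gives an interpolation scenario. This sketch covers only the first principal component and approximates it with plain power iteration; both choices, and the function names, are assumptions made for illustration.

```python
import math

def first_principal_direction(X, iters=200):
    """Approximate the first principal direction of X by power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Arbitrary start vector; it must not be orthogonal to the true direction.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        w = [sum(scores[i] * Xc[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            break
        v = [wj / norm for wj in w]
    return means, v

def projection_splits(X, n_buckets=3):
    """Return (base_indices, validation_indices) pairs from sorted projections."""
    means, v = first_principal_direction(X)
    proj = [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in X]
    order = sorted(range(len(X)), key=lambda i: proj[i])
    n = len(order)
    buckets = [order[n * b // n_buckets: n * (b + 1) // n_buckets]
               for b in range(n_buckets)]
    splits = []
    for b, val in enumerate(buckets):
        base = [i for j, other in enumerate(buckets) if j != b for i in other]
        splits.append((base, val))
    return splits

# Toy data lying on a line, made up for illustration.
X = [[float(i), 2.0 * i] for i in range(9)]
splits = projection_splits(X, n_buckets=3)
print(splits[1])  # middle bucket held out: an interpolation scenario
```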
Training Meta Model
Classification label (C):
- Based on the true error of the learned model.
- Sort the absolute residual values in ascending order and set the labels as:
  - Smallest 80% → Trusted
  - 80-95% → Cautioned
  - Last 5% → Not Trusted
Note: the labeling can be modified for specific applications.
Training techniques:
- Off-the-shelf SVM and Random Forest classifiers.
- Our goal is to test the framework on several datasets.
Note: more sophisticated techniques can be used for specific applications.
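The labeling rule above can be sketched directly: rank the validation points by absolute residual and assign labels by rank fraction. The function name `competence_labels` and its parameter names are hypothetical; the 80% and 95% cut-offs come from the slide.

```python
def competence_labels(residuals, trusted_frac=0.80, cautioned_frac=0.95):
    """Label each residual by the rank of its absolute value (ascending)."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: abs(residuals[i]))
    labels = [None] * n
    for rank, i in enumerate(order):
        frac = (rank + 1) / n  # fraction of points with an error at most this large
        if frac <= trusted_frac:
            labels[i] = "Trusted"
        elif frac <= cautioned_frac:
            labels[i] = "Cautioned"
        else:
            labels[i] = "Not Trusted"
    return labels

# Toy residuals with alternating signs and growing magnitude (illustrative only).
residuals = [(-1) ** i * (i + 1) for i in range(20)]
labels = competence_labels(residuals)
print(labels.count("Trusted"), labels.count("Cautioned"), labels.count("Not Trusted"))
# → 16 3 1
```

Exposing the two fractions as parameters reflects the slide's note that the labeling can be adapted to a specific application.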
PART 2: Numerical Evaluation
Experimental Setting
Objective: evaluate our Empirical Confidence Model (ECM) over different scenarios.
- Six UCI benchmark data sets.
- Regressors: Linear, Random Forest, and SVR (off-the-shelf).
- Tasks: standard, interpolation, and extrapolation.
Cross-validation tasks:
- Standard cross-validation.
- Interpolation and extrapolation:
  - Cluster the data and take complete clusters as the test set.
  - PC projections (1st and 3rd).
Proof-of-Concept Experiment
Setting:
- 1-dimensional data following a linear regression with random noise.
- Interpolation task.
- Regressors: linear regression and random forest.
[Figure: ECM predictions for the linear regression model and the random forest model]
Evaluating ECM over Airfoil Dataset
[Figure: MSE for the Trusted, Cautioned, and Not Trusted classes]
Bigger MSE for the Cautioned (C) and Not Trusted (NT) classes.
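The per-class comparison reported on this slide amounts to grouping the regressor's squared errors by the competence label assigned to each test point. A minimal sketch, with a hypothetical function name and made-up numbers that are not results from the talk:

```python
from collections import defaultdict

def mse_by_label(y_true, y_pred, labels):
    """Mean squared error of the regressor within each competence class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, p, lab in zip(y_true, y_pred, labels):
        sums[lab] += (t - p) ** 2
        counts[lab] += 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 5.0, 8.0]
assigned = ["Trusted", "Trusted", "Cautioned", "Not Trusted"]
scores = mse_by_label(y_true, y_pred, assigned)
print(scores)  # → {'Trusted': 0.0, 'Cautioned': 4.0, 'Not Trusted': 16.0}
```

A useful assessor should produce a lower MSE for Trusted than for the other classes, which is the pattern the slide describes.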
Evaluate Effectiveness of Pipeline
Baseline: competence assessor trained over the original data (only standard splitting and no meta features).
[Figure: MSE for the Trusted and Warned classes, ECM versus baseline]
ECM has lower MSE for the Trusted class and higher MSE for the Warned class.
Conclusions & Future Work
- We present an Empirical Confidence Model (ECM) that assesses the reliability of the regression model's predictions.
- We show the effectiveness of ECM for i.i.d. and non-i.i.d. train/test splits.
- Future work:
  - Study other reliability measures as meta-features.
  - Integrate our methodology in an active learning setting.
Thank You! Empirical Confidence Models for Supervised Machine Learning