Using Meta-learning for Model Type Selection in Predictive Big Data Analytics
Mustafa Nural, Hao Peng, John A. Miller
Department of Computer Science, University of Georgia
What is Predictive Analytics?
• The process of building a statistical model from data to capture the relationships between variables in order to
  • make sense of the data
  • predict outcomes
• Model: y = f(x) + ε
• Modeling Technique / Model Type
  • E.g., OLS Regression, Lasso Regression
• Classification
  • Target outcome of the model is a categorical variable
• Prediction
  • Target outcome of the model is a non-categorical variable
  • Includes many types of regression models
What is the Problem?
• Choosing the most predictive model from a set of candidate models is non-trivial
• No free lunch theorem (Wolpert & Macready, 1997)
  • No single modeling technique can consistently outperform others
• Different restrictions per problem
  • Interpretability
  • Parsimony
  • Etc.
Meta-learning
• “Learning to learn”
• Active area of research in machine learning
  • Learning performance of classification algorithms
  • Hyper-parameter optimization
  • Pre-processing of datasets
• Little focus has been given to prediction algorithms
  • No previous work on the regression family
    • OLS Regression
    • Regression with regularization
    • Generalized Linear Models
Overview of Meta-learning
[Diagram: Training datasets are run through the candidate modeling techniques to collect performance statistics, which report the most predictive technique for each dataset and form the meta-learning training set. Feature extraction turns each training dataset into meta-features used to train the meta-learner. For a new candidate dataset, its extracted meta-features are passed to the suggestion engine, which returns the most predictive technique(s).]
Meta-feature Extraction
• Features from the literature
  • base df (degrees of freedom), base rdf, non-negative response, domain of response, distinct ratio of response, % numeric, % categorical, % binary variables
  • Grand mean: stddev, mean, skewness, and kurtosis of numeric variables
  • Grand mean: min, max, mean, stddev of categorical variables
• Additional features particularly relevant for regression problems (see the sketch below)
  • Log Dimensionality
  • Matrix Condition Number
  • Skewness & Kurtosis of Response
  • Coefficient of Variation of Response
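A minimal sketch of how a few of the regression-oriented meta-features above could be computed, assuming a pandas DataFrame with a named response column. The exact definitions (e.g., of log dimensionality) and the column handling are assumptions for illustration, not the authors' extraction code.

```python
import numpy as np
import pandas as pd
from scipy import stats

def extract_meta_features(df: pd.DataFrame, response: str) -> dict:
    """Compute a handful of the meta-features listed on this slide."""
    y = df[response].to_numpy(dtype=float)
    predictors = df.drop(columns=[response])
    numeric = predictors.select_dtypes(include=np.number)
    X = numeric.to_numpy(dtype=float)
    m, n = X.shape                                   # instances, numeric predictors

    return {
        # assumed definition: log of the predictor-to-instance ratio
        "log_dimensionality": float(np.log(n / m)),
        # ratio of largest to smallest singular value of the design matrix
        "condition_number": float(np.linalg.cond(X)),
        "response_skewness": float(stats.skew(y)),
        "response_kurtosis": float(stats.kurtosis(y)),
        # coefficient of variation of the response
        "response_cv": float(np.std(y) / np.mean(y)),
        "pct_numeric": numeric.shape[1] / predictors.shape[1],
    }
```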
Target Modeling Techniques
• Ordinary Least Squares Regression (ScalaTion)
• Weighted Least Squares Regression (ScalaTion)
• Back-elim Regression (ScalaTion)
• Ridge Regression (R, ScalaTion)
• Lasso Regression (R, ScalaTion)
• Partial Least Squares Regression (R)
• Principal Components Regression (R)
• Exponential Regression (R)
• Poisson Regression (R)
• Inverse Gaussian Regression (R)
• Gamma Regression (R)
• Response Surface Analysis (Quadratic Expansion) (ScalaTion)
• Response Surface Analysis (Cubic Expansion) (ScalaTion)
• Log Transformed Regression (ScalaTion)
• Root Transformed Regression (ScalaTion)
Generating Training Set
• Performance metrics
  • Root mean squared error (RMSE)
  • Root relative squared error (RRSE), i.e., √(1 − R²)
• 15 modeling techniques
• 114 datasets
  • UCI, OpenML, R, Luis Torgo collection, Bilkent Univ. collection, etc.
  • https://github.com/scalation/data
• 10-fold cross validation repeated 10 times per dataset/technique to get more reliable estimates (see the sketch below)
• Hyper-parameter optimization is done for some modeling techniques
  • E.g., the λ penalty for L1 (Lasso) and L2 (Ridge) regularization
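A hedged sketch of scoring one candidate technique with 10× repeated 10-fold cross validation, recording RMSE and RRSE = √(1 − R²). The estimator, the data, and the scikit-learn-based setup are illustrative assumptions rather than the actual ScalaTion/R pipeline used for the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import RepeatedKFold, cross_validate

def score_technique(estimator, X, y):
    """10-fold CV repeated 10 times; returns mean RMSE and mean RRSE."""
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
    res = cross_validate(estimator, X, y, cv=cv,
                         scoring=("neg_root_mean_squared_error", "r2"))
    rmse = -res["test_neg_root_mean_squared_error"].mean()
    # RRSE = sqrt(SSE / SST) = sqrt(1 - R^2), averaged over the folds
    rrse = np.sqrt(1.0 - res["test_r2"]).mean()
    return rmse, rrse

# Hypothetical usage for one candidate technique:
# rmse, rrse = score_technique(Lasso(alpha=0.1), X, y)
```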
Training the Meta-learner
• Meta-features are used as predictors
• Top-performing modeling technique as the response
• Random Forest Classifier, k-NN Classifier (see the sketch below)
• Evaluation metrics
  • Mean Average Precision (MAP@k)
    • Rank-wise precision
  • Loose Accuracy (LA@k)
    • If any of the top-k predictions match the actual top-1 => 1
    • Otherwise => 0
  • Normalized Discounted Cumulative Gain (NDCG@k)
    • Graded penalty if rankings are out of order
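A minimal sketch, under assumptions, of fitting a random forest meta-learner on the meta-feature matrix and computing loose accuracy (LA@k) from its class probabilities. Variable names, shapes, and hyper-parameters are hypothetical, not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def loose_accuracy_at_k(proba, classes, y_true, k=3):
    """LA@k: 1 if the actual best technique is among the top-k predicted, else 0."""
    order = np.argsort(proba, axis=1)[:, ::-1][:, :k]   # indices of the top-k classes per dataset
    top_k = classes[order]                              # (n_datasets, k) technique labels
    hits = [actual in row for actual, row in zip(y_true, top_k)]
    return float(np.mean(hits))

# meta_X: meta-feature matrix (one row per dataset)
# meta_y: name of the top-performing technique for each dataset
# rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(meta_X_train, meta_y_train)
# la3 = loose_accuracy_at_k(rf.predict_proba(meta_X_test), rf.classes_, meta_y_test, k=3)
```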
Results (Cont’d)
[Bar chart comparing the Random Forest and kNN meta-learners on LA@1, LA@3, MAP@3, NDCG@1, and NDCG@3.]
Conclusions & Future Work • Meta-learning can be used for predictive analytics including regression family of techniques • Random forest classifier is a viable alternative as a meta-learner for prediction • Dimensionality and characteristics of the response variable are the most important meta-features. • Generalized Linear Models have specific assumptions on the response variable. • Low dimensionality and negative base degrees of freedom are important indicators for using a regularization technique such as Lasso or Ridge. • Future work includes: • More through comparison with AutoWEKA • Comparison with Ontology-based and Subsampling-based
Questions?
Current Approaches
• Exhaustive Search
• Meta-learning
• Ontology-based Semantics
• Other/Proprietary
Exhaustive Search
• Naïve approach
  • Build a model using each modeling technique to find the optimal one (see the sketch below)
  • 238 modeling techniques in the R caret package
  • > 10,000 packages in R in total
• Examples: AutoWEKA, caret (R), performanceEstimation (R), SPSS Auto Modeler, DataRobot, …
• PROS
  • Conceptually simple
  • Not complex to implement
• CONS
  • Might be tedious to implement
  • Time consuming
  • Doesn’t scale well w.r.t. dataset size and number of techniques
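An illustrative sketch of the exhaustive-search baseline: cross-validate every candidate technique and keep the one with the lowest error. The candidate list here is a small assumed subset for demonstration, not the 238 techniques available in caret.

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import cross_val_score

def exhaustive_search(X, y, cv=10):
    """Fit every candidate technique with cross validation and return the best."""
    candidates = {
        "ols": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.1),
    }
    # Negated RMSE: higher is better, so the max corresponds to the lowest error
    scores = {name: cross_val_score(est, X, y, cv=cv,
                                    scoring="neg_root_mean_squared_error").mean()
              for name, est in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```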
Meta-learning
• Applying a learning algorithm to pick a base machine learning algorithm
• Learns a mapping between dataset characteristics and the top-performing technique(s) among the candidates
• Has been studied extensively for classification problems
• Limited work on
  • predictive models & regression-based models
  • mapping data to a model (rather than a technique)
Meta-learning (cont’d)
• PROS
  • Fast once trained
  • Scalable w.r.t. dataset size (let m be the number of instances and n the number of variables)
• CONS
  • Training required
  • Adding new techniques is not possible without re-training
Ontology-based Semantics
• Leverage domain expertise captured formally in an ontology
• Use logical reasoning to suggest optimal model(s)
Ontology-based Semantics (cont’d)
• PROS
  • Fast
  • Scalable
  • Extending is straightforward
• CONS
  • Requires manual curation
Other/Proprietary: A More Modern Approach
• No expertise needed
• Limited analysis capabilities
  • Doesn’t let you change default model criteria and diagnostics
• Not transparent
  • Doesn’t walk you through the decisions it’s making
  • Therefore limited statistical insight
• Emphasizes text analysis
[Screenshot taken from the Watson Analytics platform]
Other/Proprietary (cont’d)
• Examples: IBM Watson Analytics, Google Prediction API, …
• PROS
  • Very simple to use
• CONS
  • Decision-making process is not transparent (Watson Analytics, Google Prediction API)
  • The chosen technique is not known (Google Prediction API)
Generating Training Set
• 114 datasets
  • 43 datasets from the UCI Machine Learning Repository
  • 17 datasets from OpenML
  • 16 datasets from publicly available packages in R
  • 12 datasets from the Luis Torgo regression datasets collection
  • 9 datasets from the Bilkent University Function Approximation Library
  • 9 datasets from the NCI-60 Cell Line panel
    • Similar to (Lee et al., 2011), we used gene expression data obtained from Affymetrix HG-U133A and B chips, normalized using the GCRMA method, as predictors of the 9 proteins with the highest variance measured by reverse-phase protein lysate arrays (RPLA)
  • 8 datasets from various other sources
• https://github.com/scalation/data