A metalearning study for robust nonlinear regression

Jan Kalina & Petra Vidnerová
The Czech Academy of Sciences, Institute of Computer Science
Metalearning: motivation, principles

- Transfer learning for automatic method selection
- Automatic algorithm selection
- Empirical approach for (black-box) comparison of methods
- Attempt to generalize information across datasets
- Learn prior knowledge from previously analyzed datasets and exploit it for a given dataset
- A dataset (instance) viewed as a point in a high-dimensional space
Nonlinear regression

Model: $Y_i = f(\beta_1 X_{i1}, \ldots, \beta_p X_{ip}) + e_i$, $i = 1, \ldots, n$

- Nonlinear least squares (NLS)
- Minimizes the sum of squared residuals
- Vulnerable to outliers (illustrated in the sketch below)
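To make the outlier sensitivity concrete, here is a minimal NLS sketch in Python. It is an illustration only: the single-index form $f(\beta_0 + \beta_1 X_{i1} + \cdots)$ from the next slide is assumed with $f = \exp$, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(b, X, Y):
    # u_i(b) = Y_i - f(b_0 + b_1 X_i1 + ... + b_p X_ip), with f = exp assumed
    return Y - np.exp(b[0] + X @ b[1:])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = np.exp(0.5 + X @ np.array([0.3, -0.2])) + rng.normal(scale=0.1, size=100)
Y[:5] += 10.0  # five gross outliers in the response

# least_squares minimizes sum_i u_i(b)^2, so the outliers pull the estimate away
fit = least_squares(residuals, x0=np.zeros(3), args=(X, Y))
print(fit.x)
```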
Nonlinear least weighted squares estimator (NLWS)

Model: $Y_i = f(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}) + e_i$, $i = 1, \ldots, n$

- $Y = (Y_1, \ldots, Y_n)^T$ = continuous outcome
- $f$ = given nonlinear function
- Nonlinear least squares: sensitive to outliers

Residuals for a fixed $b = (b_0, b_1, \ldots, b_p)^T \in \mathbb{R}^{p+1}$:
$$u_i(b) = Y_i - f(b_0 + b_1 X_{i1} + \cdots + b_p X_{ip}), \quad i = 1, \ldots, n$$

Squared residuals arranged in ascending order:
$$u^2_{(1)}(b) \le u^2_{(2)}(b) \le \cdots \le u^2_{(n)}(b)$$

Nonlinear least weighted squares (NLWS):
$$b_{\mathrm{LWS}} = \arg\min_{b \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} w_i \, u^2_{(i)}(b),$$
where $w_1, \ldots, w_n$ are given magnitudes of weights.
Nonlinear least weighted squares estimator (NLWS)

- Examples of weight functions: $\sum_{i=1}^{n} w_i = 1$
- Nonlinear least trimmed squares (NLTS): 0-1 weights (a sketch of the objective follows below)
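A minimal sketch of the NLWS objective under stated assumptions: $f = \exp$ as in the NLS sketch above, and linearly decreasing weights normalized to sum to one. The sorting makes the objective non-smooth, so a derivative-free optimizer is used; this is an illustration, not the authors' algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def nlws_objective(b, X, Y, w):
    # squared residuals u_i(b)^2 with f = exp assumed, sorted ascending,
    # then combined with the non-increasing weights: sum_i w_i u^2_(i)(b)
    u2 = (Y - np.exp(b[0] + X @ b[1:])) ** 2
    return np.sort(u2) @ w

n = 100
w = np.arange(n, 0, -1, dtype=float)  # linearly decreasing weights ...
w /= w.sum()                          # ... normalized so that sum(w) = 1
# 0-1 weights (1 for the h smallest squared residuals, 0 otherwise)
# would turn this objective into the NLTS instead.

# X, Y as in the NLS sketch above; Nelder-Mead copes with the non-smooth sorting
fit = minimize(nlws_objective, x0=np.zeros(3), args=(X, Y, w), method="Nelder-Mead")
print(fit.x)
```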
Meta-learning process

[diagram]
Data acquisition and pre-processing

- We start with 2000 real publicly available datasets (GitHub)
- Diversity of domains
- Automatic downloading
- Pre-processing in Python (see the sketch below):
  - Reducing n
  - Missing values
  - Categorical variables
  - Reducing p
  - Normalizing the response: $Y_i \mapsto \dfrac{Y_i - \min_i Y_i}{\max_i Y_i - \min_i Y_i}$, $i = 1, \ldots, n$
  - Standardizing continuous regressors
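A sketch of the two numeric pre-processing steps named above. The surrounding steps (reducing n and p, missing values, categorical variables) are dataset-specific and omitted here.

```python
import numpy as np

def preprocess(X, Y):
    # response: min-max normalization Y_i -> (Y_i - min Y) / (max Y - min Y)
    Y = (Y - Y.min()) / (Y.max() - Y.min())
    # continuous regressors: standardize each column to zero mean, unit variance
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, Y
```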
Description of the metalearning study

Datasets
- 721 real datasets

Algorithms
- Fully automatic, including finding suitable parameters
- Least squares & 6 robust nonlinear estimators (NLTS, NLWS with various weights, nonlinear regression median)

Prediction measures
- Mean square error (MSE) evaluated within a cross validation
- Robust versions: trimmed MSE (TMSE), weighted MSE (WMSE)

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} r_i^2, \qquad \mathrm{TMSE}(\alpha) = \frac{1}{h} \sum_{i=1}^{h} r^2_{(i)}, \qquad \mathrm{WMSE} = \sum_{i=1}^{n} w_i \, r^2_{(i)}$$

Features of the datasets
- 10 features (listed on the next slide)

Metalearning (performed over metadata)
- Classification by means of various classifiers
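Written out directly from the formulas, the three prediction measures as plain functions of cross-validated residuals r. The link $h = \lfloor \alpha n \rfloor$ is an assumption; the trimming proportion $\alpha$ and the weights w are user choices.

```python
import numpy as np

def mse(r):
    # MSE = (1/n) * sum_i r_i^2
    return np.mean(r ** 2)

def tmse(r, alpha=0.75):
    # TMSE(alpha) = (1/h) * sum of the h smallest squared residuals,
    # with h = floor(alpha * n) assumed here
    h = int(alpha * len(r))
    return np.sort(r ** 2)[:h].mean()

def wmse(r, w):
    # WMSE = sum_i w_i * r^2_(i), squared residuals sorted ascending
    return np.sort(r ** 2) @ w
```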
Selected 10 features of the datasets

1. The number of observations n
2. The number of variables p
3. The ratio n/p
4. Normality of residuals (p-value of the Shapiro-Wilk test)
5. Skewness of residuals
6. Kurtosis of residuals
7. Coefficient of determination $R^2$
8. Percentage of outliers (estimated by the LTS) – important!
9. Heteroscedasticity (p-value of the Breusch-Pagan test)
10. Donoho-Stahel outlyingness measure of X
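A sketch computing features 1-7 with scipy. The residuals res are assumed to come from an initial fit on the dataset; features 8-10 (LTS outlier percentage, Breusch-Pagan test, Donoho-Stahel outlyingness) need dedicated routines and are omitted here.

```python
import numpy as np
from scipy import stats

def meta_features(X, Y, res):
    # res: residuals from an initial fit on this dataset (assumed given)
    n, p = X.shape
    return {
        "n": n,                                   # feature 1
        "p": p,                                   # feature 2
        "n/p": n / p,                             # feature 3
        "shapiro_p": stats.shapiro(res).pvalue,   # feature 4: normality of residuals
        "skewness": stats.skew(res),              # feature 5
        "kurtosis": stats.kurtosis(res),          # feature 6
        "R2": 1.0 - np.sum(res**2) / np.sum((Y - Y.mean())**2),  # feature 7
    }
```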
Primary learning

Model:
$$Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j X_{ij} + \sum_{j=1}^{p} \beta_{p+j} (X_{ij} - \bar{X}_j)^2 + e_i, \quad i = 1, \ldots, n$$

Leave-one-out cross validation (see the sketch below)

- MSE: NLS yields the minimal prediction error for 23 % of the datasets, NLTS for 26 %, some version of the NLWS for 31 %, the nonlinear median for 20 %
- TMSE: NLTS best for 39 % of the datasets, NLWS for 35 %
- WMSE: NLTS best for 34 % of the datasets, NLWS for 45 %
- Weights for the NLWS: no choice uniformly best
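A minimal leave-one-out wiring, with hypothetical fit/predict callables standing in for the individual estimators (NLS, NLTS, NLWS, nonlinear median); the held-out residuals can then be scored with the mse/tmse/wmse functions above.

```python
import numpy as np

def loo_residuals(fit, predict, X, Y):
    # fit(X, Y) -> parameter estimate b; predict(b, X) -> fitted values
    # (both hypothetical placeholders for one of the estimators above)
    n = len(Y)
    r = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        b = fit(X[mask], Y[mask])
        r[i] = Y[i] - predict(b, X[i:i + 1])[0]
    return r

# the estimator with the smallest mse/tmse/wmse of these residuals
# is recorded as the winner for the dataset
```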
Secondary learning

Results of metalearning evaluated as the classification accuracy in a leave-one-out cross validation study. Three different prediction error measures are compared.

                                    Classification accuracy
Classification method               MSE     TMSE    WMSE
Classification tree                 0.35    0.45    0.47
k-nearest neighbor (k = 3)          0.56    0.61    0.64
LDA                                 0.60    0.68    0.65
SCRDA                               0.60    0.68    0.66
Linear MWCD-classification          0.60    0.68    0.66
Multilayer perceptron               0.56    0.66    0.66
Logistic regression                 0.56    0.67    0.69
SVM (linear)                        0.60    0.69    0.70
SVM (Gaussian kernel)               0.64    0.71    0.70
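A hedged sketch of how the secondary step can be wired: the meta-feature matrix F (721 datasets × 10 features) and the per-dataset winner labels are placeholders (random here), and the Gaussian-kernel SVM that scored best in the table stands in for the classifier.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
F = rng.normal(size=(721, 10))        # placeholder meta-feature matrix
best = rng.integers(0, 7, size=721)   # placeholder winner labels (7 estimators)

clf = SVC(kernel="rbf")               # the best-scoring classifier in the table
acc = cross_val_score(clf, F, best, cv=LeaveOneOut()).mean()
print(f"leave-one-out classification accuracy: {acc:.2f}")
```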
Conclusions

- First comparison of robust nonlinear regression estimators over 721 datasets
- Arguments in favor of the NLWS estimator (robustness & efficiency)
- Metalearning is useful
- Future work: robust metalearning

Limitations of metalearning:
- No theory
- Number of methods/algorithms/features
- Choice of datasets
- Too automatic
- Correct pre-processing (incl. variable selection) of the data is needed!