
Using R for regression model selection with adaptive penalties



  1. Using R for regression model selection with adaptive penalties: procedures based on the FDR criteria. Tal Galili, Tel Aviv University. Based on the paper by Yoav Benjamini and Yulia Gavrilov, "A Simple Forward Selection Procedure Based on False Discovery Rate Control" (Annals of Applied Statistics, 2009).

  2. Setup: a linear regression model in which some of the coefficients are zero. Task: a stopping rule (finding the "best model" on the forward selection path).

  3. Why forward selection? Motivation – big (m) datasets: 1) fast results (simple models, simple procedure); 2) good results; 3) easy to use.

  4. The trade-off: finding the true variables vs. over-fitting. Criterion to minimize: lack of fit + penalty × model size.

  5. How to choose λ? Examples of penalty types for "big" models:
     • Constant: λ = 2 (AIC) – tends to over-fit; λ = log(n) (BIC) – better results; λ = 2 log(m) (universal threshold); …
     • Non-constant (adaptive): λ_{k,m} = ? – faster than bootstrapping.
     In every case the criterion to minimize is: lack of fit + λ × model size.
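To make the comparison concrete, here is a minimal R sketch (my own toy example, not from the talk) that evaluates the penalized criterion for one candidate model under the three constant penalties just listed:

```r
# Toy data and a single candidate model; the criterion is
# (lack of fit) + lambda * (model size), with RSS/sigma^2 as the lack-of-fit term.
set.seed(1)
n <- 100; m <- 10
X <- matrix(rnorm(n * m), n, m)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

fit    <- lm(y ~ X[, 1:3])            # candidate model with k = 3 predictors
rss    <- sum(resid(fit)^2)
sigma2 <- summary(fit)$sigma^2        # rough plug-in estimate of the error variance
k      <- 3

lambda <- c(AIC = 2, BIC = log(n), universal = 2 * log(m))
rss / sigma2 + lambda * k             # the criterion under each constant penalty
```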

  6. Multiple Stage FDR (MSFDR) adaptive penalty:
     $\lambda_{k,m} = \frac{1}{k}\sum_{i=1}^{k} z^2_{\,i q / \left(2\,(m+1-i(1-q))\right)}$
     where $z_\alpha$ denotes the upper-$\alpha$ quantile of the standard normal distribution.
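In R the adaptive penalty can be computed directly with qnorm; the helper below is my own minimal sketch of the formula above (the function name is hypothetical):

```r
# lambda_{k,m} = (1/k) * sum_{i=1}^{k} z^2 at tail probability i*q / (2*(m + 1 - i*(1 - q))),
# where z_alpha is the upper-alpha standard-normal quantile.
msfdr_penalty <- function(k, m, q = 0.05) {
  i <- seq_len(k)
  alpha <- i * q / (2 * (m + 1 - i * (1 - q)))  # per-step tail probabilities
  mean(qnorm(1 - alpha)^2)                      # average of the squared z quantiles
}

msfdr_penalty(k = 1, m = 64)  # about 11.3, consistent with the first step of the diabetes example later on
```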

  7. Model selection as multiple testing: test β_i = 0 vs. β_i ≠ 0 for each i = 1, …, m, i.e., the hypotheses H_{0,1}, …, H_{0,i}, …, H_{0,m}. With an orthogonal X matrix (X'X = nI) the coefficients do not change as variables enter, so they can be computed "at once": β̂ = (β̂_1, …, β̂_m) = (1/n)·X'y. Keeping the coefficients whose p-values fall below α is then equivalent to forward selection. But how should we adjust for the multiplicity of the many tests?
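As a small illustration of the orthogonal case (my own toy construction, not the speaker's code), the coefficients can indeed be computed in one shot and screened by their p-values:

```r
set.seed(2)
n <- 200; m <- 8
X <- qr.Q(qr(matrix(rnorm(n * m), n, m))) * sqrt(n)   # columns scaled so that X'X = n*I
beta <- c(3, 2, rep(0, m - 2))
y <- X %*% beta + rnorm(n)

beta_hat  <- crossprod(X, y) / n                       # all coefficients "at once": (1/n) X'y
sigma_hat <- sqrt(sum((y - X %*% beta_hat)^2) / (n - m))
se        <- sigma_hat / sqrt(n)                       # common SE, since X'X = n*I
pvals     <- 2 * pt(abs(beta_hat / se), df = n - m, lower.tail = FALSE)
which(pvals < 0.05)                                    # naive selection, no multiplicity adjustment yet
```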

  8. How to adjust for multiplicity?
     • FWE (familywise error rate): keep control of the probability of making one or more false discoveries. Conservative, low power.
     • FDR (false discovery rate): control the expected proportion of incorrectly rejected nulls among the rejected, at level q. Not "too permissive", high power.
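Base R's p.adjust already covers both adjustments; a minimal sketch with a hypothetical vector of coefficient p-values:

```r
pvals <- c(0.0004, 0.009, 0.012, 0.031, 0.18, 0.47, 0.62, 0.85)

sum(p.adjust(pvals, method = "bonferroni") < 0.05)  # FWE control: the most conservative
sum(p.adjust(pvals, method = "BH") < 0.05)          # FDR control: typically rejects more
```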

  9. Comparing per-coefficient thresholds. Each p-value P_i comes from the t-statistic β̂_i / SE(β̂_i) ~ t_df, with the coefficients ordered from the largest (smallest p-value) down to the smallest (largest p-value). Three rules for which p-values to keep:
     • A constant threshold of 0.16 (equivalent to AIC) – over-fitting.
     • Bonferroni, 0.05/m (FWE control) – low power.
     • BH step-down (FDR at level q): compare the i-th smallest p-value to q·i/m, i = 1, …, m – more power.

  10. From step-down critical values to a penalty: keeping the coefficients whose ordered p-values fall below the BH critical values q·1/m, q·2/m, …, q·i/m, …, q corresponds to the size-dependent penalty
      $\lambda_{k,m} = \frac{1}{k}\sum_{i=1}^{k} z^2_{\,i q / (2m)}$.

  11. Theoretical motivation – results: the minimax properties of the BH procedure were proved (in ABDJ 2006*) asymptotically for large m, orthogonal variables, and sparse signals. *ABRAMOVICH, F., BENJAMINI, Y., DONOHO, D. and JOHNSTONE, I. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Ann. Statist.

  12. Adding the adaptive BH column to the comparison. For the ordered p-values (coefficients again from largest to smallest):
      • BH (FDR at level q): critical values q·i/m – more power than Bonferroni.
      • Adaptive BH (FDR at level q): critical values q·i/(m + 1 − i(1 − q)) – more power still, especially for richer models.

  13. Model size vs. penalty factor: the two sets of critical values translate into the penalties
      BH: $\lambda_{k,m} = \frac{1}{k}\sum_{i=1}^{k} z^2_{\,i q / (2m)}$
      Adaptive BH (MSFDR): $\lambda_{k,m} = \frac{1}{k}\sum_{i=1}^{k} z^2_{\,i q / \left(2\,(m+1-i(1-q))\right)}$
      The adaptive version penalizes richer models less, giving more power for richer models.
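The difference between the two penalty sequences is easy to see numerically; the helpers below are my own sketch of the two formulas, evaluated for m = 64 and q = 0.05:

```r
penalty_bh <- function(k, m, q = 0.05) {
  i <- seq_len(k)
  mean(qnorm(1 - i * q / (2 * m))^2)                      # from critical values q*i/m
}
penalty_adaptive <- function(k, m, q = 0.05) {
  i <- seq_len(k)
  mean(qnorm(1 - i * q / (2 * (m + 1 - i * (1 - q))))^2)  # from q*i/(m + 1 - i*(1 - q))
}

k <- 1:10
cbind(k,
      BH       = sapply(k, penalty_bh, m = 64),
      adaptive = sapply(k, penalty_adaptive, m = 64))     # the adaptive penalty shrinks faster as k grows
```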

  14. Forward selection – multiple stage FDR (a.k.a. MSFDR): 1. Fit the empty model. 2. Find the "best" variable (X_i*) to enter (the one with the smallest p-value). 3. Does it pass the current adaptive threshold (based on λ_{k,m})? Yes – enter X_i* and repeat step 2. No – finish. (A sketch of this loop in R follows below.)
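A rough R sketch of this loop (my own code, not the authors' implementation): at stage k the candidate with the largest |t| (equivalently, the smallest p-value) is entered only if its squared t-statistic exceeds the adaptive penalty λ_{k,m}:

```r
msfdr_forward <- function(X, y, q = 0.05) {
  m <- ncol(X)
  selected <- integer(0)
  repeat {
    k <- length(selected) + 1
    candidates <- setdiff(seq_len(m), selected)
    # t-statistic of each candidate when added to the current model
    tstat <- sapply(candidates, function(j) {
      fit <- lm(y ~ X[, c(selected, j)])
      coef(summary(fit))[k + 1, "t value"]   # the newly added term is the last coefficient
    })
    best <- candidates[which.max(abs(tstat))]
    i <- seq_len(k)
    lambda <- mean(qnorm(1 - i * q / (2 * (m + 1 - i * (1 - q))))^2)  # adaptive penalty at size k
    if (max(tstat^2) > lambda) selected <- c(selected, best) else break
    if (length(selected) == m) break
  }
  selected
}
```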

  15. R implementation - stepAIC
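MASS::stepAIC takes a fixed penalty multiplier k, so one way to mimic the size-dependent MSFDR penalty with it (a hedged sketch of the idea behind this slide, not necessarily the speaker's exact code; the helper name and the iteration scheme are mine) is to rerun forward selection with the penalty implied by the previously selected model size until the size stabilizes:

```r
library(MASS)

msfdr_stepAIC <- function(data, response, q = 0.05, max_iter = 20) {
  m     <- ncol(data) - 1
  full  <- reformulate(setdiff(names(data), response), response)
  empty <- reformulate("1", response)
  size  <- 1
  for (iter in seq_len(max_iter)) {
    i <- seq_len(size)
    lambda <- mean(qnorm(1 - i * q / (2 * (m + 1 - i * (1 - q))))^2)  # penalty at the current size
    fit <- stepAIC(lm(empty, data = data), scope = list(upper = full),
                   direction = "forward", k = lambda, trace = FALSE)
    new_size <- max(length(coef(fit)) - 1, 1)
    if (new_size == size) break
    size <- new_size
  }
  fit
}
```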

  16. Modeling the diabetes data (Efron et al., 2004): • n = 442 diabetes patients. • m = 64 (10 baseline variables plus 45 pairwise interactions and 9 squared terms). • Y – disease progression (a year after baseline).
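If I recall its layout correctly, the lars package ships this extended diabetes design, so the data can be pulled in roughly like this (a hedged sketch; the column names follow that package, not the talk):

```r
# install.packages("lars")
data(diabetes, package = "lars")
x2 <- as.matrix(diabetes$x2)   # 442 x 64: 10 baseline variables, interactions, squares
y  <- diabetes$y               # disease progression one year after baseline
dim(x2)
```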

  17. Modeling the diabetes data (Efron et al., 2004) – the MSFDR forward path:

      Factor     t²       P-value    P-to-enter   λ_{k,m}   adj. R²
      bmi        230.74   0.000000   0.000781     11.29     0.342
      ltg         93.86   0.000000   0.001585     10.63     0.457
      map         17.36   0.000037   0.002414     10.16     0.477
      age.sex     13.56   0.000259   0.003268      9.78     0.491
      bmi.map      9.60   0.002076   0.004149      9.47     0.501
      hdl          9.00   0.002859   0.005059      9.20     0.510
      sex         16.23   0.000066   0.005998      8.96     0.527
      glu.2        5.75   0.016920   0.006969      8.75     0.531
      age.2        2.58   0.109060   0.007972      8.56     0.533

      The first seven factors pass their P-to-enter thresholds; glu.2 (P = 0.0169 > 0.0070) does not, so the MSFDR path stops at 7 variables (see the next slide).

  18. Modeling the diabetes data (Efron et al., 2004) – model sizes chosen by different criteria:

      Method                                       Number of variables   R²
      MSFDR (q = .05), BIC, universal threshold    7                     0.53
      AIC                                          9                     0.54
      LARS (with Cp)                               16                    0.55   (over-fitting)

  19. Simulation – configurations. Model: Y_i = X_i'β + ε_i, with λ-penalty-based model selection procedures.
      • m = 20, 40, 80, 160; n = 2·m.
      • Proportion of non-zero coefficients: several levels, including m/4, m/3, m/2, 3m/4, and m.
      • Dependencies in X: rows drawn from N(0, Σ) with Σ_{ij} = ρ^{|i−j|}, ρ = 0.5, 0, −0.5.
      • Coefficients: β = 1 constant (with R² = 0.75), plus two rates of decrease (in one, the minimal β is held constant).
      • Computation: average MSPE over 1000 runs, done on 80 computers (distributed computing).
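A rough sketch (my own) of how one such configuration could be generated in R, with AR(1)-type correlation in X and a sparse, constant coefficient vector:

```r
library(MASS)   # for mvrnorm

m <- 40; n <- 2 * m; rho <- 0.5
Sigma <- rho^abs(outer(1:m, 1:m, "-"))         # Sigma_{ij} = rho^|i - j|
X <- mvrnorm(n, mu = rep(0, m), Sigma = Sigma)

n_nonzero <- m / 4                             # one of the sparsity levels above
beta <- c(rep(1, n_nonzero), rep(0, m - n_nonzero))
y <- X %*% beta + rnorm(n)
```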

  20. Simulation – comparison methodology: 1) For each model, compute the ratio MSPE_model / MSPE_random oracle, where the random oracle is the "best" model we could find on our search path. 2) For each procedure, find the worst ratio over all simulation configurations – and compare these worst-case ratios across procedures.

  21. Simulation – results: comparing the minimax performance of the forward selection procedure under different penalties: Cp; the universal threshold of Donoho and Johnstone (1994); Birgé and Massart (2001); Foster and Stine (2004); Tibshirani and Knight (1999); and the multiple-stage procedure of Benjamini, Krieger and Yekutieli (2006) and Gavrilov, Benjamini and Sarkar (2009) – MSFDR.

  22. R implementation – biglm + leaps
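A hedged sketch of the second implementation route named on this slide: leaps::regsubsets can trace the forward path cheaply (and biglm::biglm can stand in for lm when the data no longer fit in memory). The exact call below is my illustration on toy data, not the speaker's code:

```r
library(leaps)

set.seed(3)
n <- 200; m <- 30
X <- matrix(rnorm(n * m), n, m, dimnames = list(NULL, paste0("x", 1:m)))
y <- drop(X[, 1:3] %*% c(2, 1, 0.5)) + rnorm(n)

fwd <- regsubsets(x = X, y = y, method = "forward", nvmax = 15)
summary(fwd)$rss   # RSS along the forward path; the adaptive penalty lambda_{k,m}
                   # can then be applied to these values to pick the stopping point
```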

  23. Future research: • Beyond linear regression? (logistic and more) • Beyond forward selection? (mixed with the Lasso and more) • More variables than observations? (m > n)

  24. Tal Galili, Tel Aviv University. Based on the paper by Yoav Benjamini and Yulia Gavrilov, "A Simple Forward Selection Procedure Based on False Discovery Rate Control" (Annals of Applied Statistics, 2009). www.R-Statistics.com – Tal.Galili@gmail.com. Thank you! Questions?

  25. Simulation – comparison methodology. Challenge (1): path performance depends on the simulation, while an exhaustive search over all subsets is impossible – so what do we compare to? Solution (1): a "random oracle". 1) Find the "best" model on the forward path of nested models. Example: for the path X7, X20, X5, X9, … the possible subsets are {X7}, {X7, X20}, {X7, X20, X5}, … 2) Compare the current model with the random oracle via the ratio MSPE_model / MSPE_random oracle.
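In code, the oracle comparison amounts to taking the smallest test-set MSPE along the nested path; a small sketch (my own, where 'path' is assumed to be a list of the nested lm fits):

```r
mspe <- function(fit, newdata, y_new) mean((y_new - predict(fit, newdata))^2)

# 'path' is a list of nested fits, e.g. for X7; X7 + X20; X7 + X20 + X5; ...
oracle_mspe <- function(path, newdata, y_new) min(sapply(path, mspe, newdata, y_new))

relative_mspe <- function(fit, path, newdata, y_new)
  mspe(fit, newdata, y_new) / oracle_mspe(path, newdata, y_new)
```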

  26. Simulation – comparison methodology. Challenge (2): MSPE changes per configuration, so how do we compare algorithms? Solution (2): search for "empirical minimax performance" – find the minimum, across procedures, of the maximum relative MSPE over the configurations.
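The minimax summary itself is short once the relative MSPEs are tabulated; here 'ratios' is a purely hypothetical matrix (illustrative numbers and generic procedure names, not results from the talk) with one row per simulation configuration and one column per procedure:

```r
ratios <- matrix(c(1.05, 1.20, 1.10,
                   1.30, 1.15, 1.25,
                   1.10, 1.40, 1.12),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("proc_A", "proc_B", "proc_C")))

worst_ratio <- apply(ratios, 2, max)   # each procedure's worst configuration
names(which.min(worst_ratio))          # the procedure with the best worst case
```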

  27. Simulation – conclusions

  28. Simulation – results (extended)

  29. Limitations of earlier studies: 1) constant coefficients (mostly); 2) largest m = 50; 3) NOT compared to other non-constant adaptive penalties.
