RIDGE and LASSO regularization for regression
Feature selection - Some algorithms naturally perform feature selection - for example Decision Trees, Boosting - Other algorithms have difficulty with correlated features - for example Naive Bayes, Regression - Some algorithms have difficulty with too many features
Feature selection - Task (label) independent, Model independent - dimensionality reduction, clustering - PCA - Filter methods: Task dependent, Model independent - compute correlation among pairs of features - compute correlation of each feature with the labels - Wrapper methods: Task dependent, Model dependent - try subsets of features with a given ML algorithm, pick a "best" subset
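A minimal sketch of a filter method along the lines described above: rank features by the absolute correlation of each feature with the label, and inspect pairwise feature correlations to spot redundant features. The dataset choice is an illustrative placeholder, not the one used in the lecture.

```python
# Filter-method sketch: feature-label and feature-feature correlations.
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# correlation of each feature with the label
feat_label_corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
ranking = np.argsort(-np.abs(feat_label_corr))
print("features ranked by |corr with label|:", ranking)

# pairwise feature correlations (to spot strongly correlated features)
pairwise = np.corrcoef(X, rowvar=False)
print("max off-diagonal |correlation|:",
      np.max(np.abs(pairwise - np.eye(X.shape[1]))))
```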
Forward Feature Selection - Task dependent, Model dependent - select one feature at a time, dynamically - depending on how the previously selected features perform
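A minimal sketch of greedy forward selection, assuming a scikit-learn-style estimator scored with cross-validation; the dataset, the linear model, and the choice of 5 features are illustrative assumptions, not the lecture's exact setup.

```python
# Greedy forward feature selection: at each round, add the single remaining
# feature that most improves cross-validated performance.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()
selected, remaining = [], list(range(X.shape[1]))

for _ in range(5):                      # pick (up to) 5 features
    best_feat, best_score = None, -np.inf
    for f in remaining:                 # try adding each remaining feature
        cols = selected + [f]
        score = cross_val_score(model, X[:, cols], y, cv=5).mean()
        if score > best_score:
            best_feat, best_score = f, score
    selected.append(best_feat)          # keep the feature that helped most
    remaining.remove(best_feat)
    print(f"added feature {best_feat}, CV R^2 = {best_score:.3f}")
```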
Problems with regression - free (unconstrained) coefficients can cause problems - features canceling each other - features overwhelming each other - large complexity with no generalization benefit - Solution: constrain the coefficients
Regularization for regression - Regression: same as before, a linear predictor f(x) = wᵀx - regularized regression adds a "complexity" penalty to the objective - the objective contains the traditional least squares term (to be minimized) - but also R(w), a notion of complexity (to be minimized): min_w Σ_i (y_i − wᵀx_i)² + λ R(w) - λ trades off the complexity against the fit to the data
Regularization for regression - RIDGE penalty: L2 norm, R(w) = ||w||₂² = Σ_j w_j² - causes all w coefficients to be small - LASSO penalty: L1 norm, R(w) = ||w||₁ = Σ_j |w_j| - causes some coefficients to be exactly 0 (feature selection) - "elastic-net": mixture of the L1 and L2 norms
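A short sketch of the contrast just described, using scikit-learn's Ridge and Lasso estimators: the L2 penalty shrinks all coefficients, while the L1 penalty drives some of them to exactly zero. The dataset and the alpha (= λ) values are illustrative placeholders.

```python
# Compare the effect of the L2 (RIDGE) and L1 (LASSO) penalties on coefficients.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # penalties are scale-sensitive

ridge = Ridge(alpha=1.0).fit(X, y)      # L2: all coefficients shrink
lasso = Lasso(alpha=1.0).fit(X, y)      # L1: some coefficients may become exactly 0

print("ridge coefs:", np.round(ridge.coef_, 2))
print("lasso coefs:", np.round(lasso.coef_, 2))
print("coefficients zeroed by lasso:", int(np.sum(lasso.coef_ == 0)))
```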
Regularization as constrained optimization - the penalized objective can also be written as a constrained optimization: minimize the least squares error subject to R(w) ≤ t - there is a direct correspondence between λ and t - solved by taking derivatives with Lagrange multipliers
RIDGE vs LASSO - the solution w will lie in the feasible region (the solid blue region in the figure: a disk for the L2/RIDGE constraint, a diamond for the L1/LASSO constraint) - the corners of the L1 diamond are why LASSO tends to set some coefficients exactly to 0
RIDGE vs LASSO - the RIDGE penalty for linear regression is essentially an ordinary regression problem with bigger matrices - Z = data matrix; n = number of data points, p = number of dimensions/features - append the p rows of √λ·I_p to Z and p zeros to the label vector y; ordinary least squares on these bigger matrices gives the RIDGE solution - like regression, it admits an analytical solution: w = (ZᵀZ + λ I_p)⁻¹ Zᵀy
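A numpy sketch of the two equivalent views above: the analytical RIDGE solution and ordinary least squares on the augmented ("bigger") matrices. The synthetic data and the value of λ are placeholders.

```python
# Ridge closed form vs. least squares on augmented matrices.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 100, 5, 2.0
Z = rng.normal(size=(n, p))
y = Z @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# analytical ridge solution: w = (Z^T Z + lam*I)^-1 Z^T y
w_closed = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

# ordinary least squares on the augmented (bigger) matrices
Z_aug = np.vstack([Z, np.sqrt(lam) * np.eye(p)])   # extra rows sqrt(lam)*I_p
y_aug = np.concatenate([y, np.zeros(p)])           # extra zero labels
w_aug, *_ = np.linalg.lstsq(Z_aug, y_aug, rcond=None)

print(np.allclose(w_closed, w_aug))     # True: the two solutions match
```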
RIDGE vs LASSO - LASSO does not have an analytical solution - RIDGE-regularized regression can be solved with Gradient Descent: simply add a term to the gradient - same for RIDGE-Logistic regression - LASSO can be solved via quadratic programming - or via approximation schemes like "forward stagewise" (sketched below)
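A minimal sketch of the forward stagewise approximation mentioned above: repeatedly take a tiny step on the single feature most correlated with the current residual. The step size and iteration count are illustrative choices, and the columns of Z are assumed standardized.

```python
# Forward stagewise: a greedy, small-step approximation to the LASSO path.
import numpy as np

def forward_stagewise(Z, y, eps=0.01, n_steps=2000):
    """Assumes the columns of Z are standardized (zero mean, unit variance)."""
    n, p = Z.shape
    w = np.zeros(p)
    residual = y - y.mean()              # start from the intercept-only fit
    for _ in range(n_steps):
        corr = Z.T @ residual            # correlation of each feature with residual
        j = np.argmax(np.abs(corr))      # most correlated feature
        step = eps * np.sign(corr[j])
        w[j] += step                     # move its coefficient a tiny amount
        residual -= step * Z[:, j]       # update the residual
    return w
```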
Logistic Regression with RIDGE - like before, Logistic Regression maximizes the log likelihood of the data - but now we subtract the L2 RIDGE penalty: maximize Σ_i [ y_i log σ(wᵀx_i) + (1 − y_i) log(1 − σ(wᵀx_i)) ] − λ ||w||₂² - to use Gradient Descent we differentiate with respect to each component j - the gradient is the same as the one for logistic regression, except for the added differential of the RIDGE penalty: ∂/∂w_j = Σ_i ( y_i − σ(wᵀx_i) ) x_ij − 2λ w_j
Logistic Regression with RIDGE - the differential gives the Gradient Descent update rule: w_j ← w_j + α [ Σ_i ( y_i − σ(wᵀx_i) ) x_ij − 2λ w_j ]
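A numpy sketch of this update: the usual logistic regression gradient plus the −2λw term from the RIDGE penalty. The learning rate, λ, iteration count, and synthetic data are illustrative assumptions.

```python
# Gradient-based training of L2-regularized (RIDGE) logistic regression.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def ridge_logistic_gd(X, y, lam=0.1, lr=0.1, n_iter=1000):
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        p_hat = sigmoid(X @ w)
        # gradient of the penalized log likelihood w.r.t. w
        grad = X.T @ (y - p_hat) - 2 * lam * w
        w += lr * grad / n               # step toward higher penalized likelihood
    return w

# tiny synthetic usage example
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (sigmoid(X @ np.array([2.0, -1.0, 0.5])) > rng.uniform(size=200)).astype(float)
print(ridge_logistic_gd(X, y, lam=0.1))
```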