CS/ECE/ISyE 524 Introduction to Optimization, Spring 2017–18

10. Regularization

• More on tradeoffs
• Regularization
• Effect of using different norms
• Example: hovercraft revisited

Laurent Lessard (www.laurentlessard.com)
Review of tradeoffs

• We want to make both J₁(x) and J₂(x) small subject to constraints.
• Choose a parameter λ > 0 and solve

    minimize_x   J₁(x) + λ J₂(x)
    subject to:  constraints

• Each λ > 0 yields a solution x̂_λ.
• We can visualize the tradeoff by plotting J₂(x̂_λ) vs. J₁(x̂_λ). This is called the Pareto curve.
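A minimal numeric sketch of tracing a Pareto curve, using the regularized least-squares tradeoff that appears later in this lecture (A and b are made-up example data):

```julia
using LinearAlgebra

# Made-up example data: J1(x) = ‖Ax − b‖², J2(x) = ‖x‖²
A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
b = [7.0, 8.0, 9.0]

# Sweep λ over several orders of magnitude. Each λ contributes one
# point (J1(x̂_λ), J2(x̂_λ)); plotting them traces the Pareto curve.
pareto = map(10.0 .^ range(-4, 4, length=50)) do λ
    x̂ = (A'A + λ*I) \ (A'b)      # closed-form minimizer (derived later)
    (norm(A*x̂ - b)^2, norm(x̂)^2)
end
```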
Multi-objective tradeoff

• Similar procedure if we have more than two costs we'd like to make small, e.g. J₁, J₂, J₃.
• Choose parameters λ > 0 and µ > 0, then solve:

    minimize_x   J₁(x) + λ J₂(x) + µ J₃(x)
    subject to:  constraints

• Each pair λ > 0, µ > 0 yields a solution x̂_{λ,µ}.
• We can visualize the tradeoff by plotting J₃(x̂_{λ,µ}) vs. J₂(x̂_{λ,µ}) vs. J₁(x̂_{λ,µ}) on a 3D plot. You then obtain a Pareto surface.
Minimum-norm as a regularization

• When Ax = b is underdetermined (A is wide), we can resolve the ambiguity by adding a cost function, e.g. min-norm least squares:

    minimize_x   ‖x‖²
    subject to:  Ax = b

• Alternative approach: express it as a tradeoff!

    minimize_x   ‖Ax − b‖² + λ‖x‖²

  Tradeoffs of this type are called regularization, and λ is called the regularization parameter or regularization weight.
• If we let λ → ∞, we just obtain x̂ = 0.
• If we let λ → 0, we obtain the minimum-norm solution!
Proof of minimum-norm equivalence

    minimize_x   ‖Ax − b‖² + λ‖x‖²

Equivalent to the least-squares problem:

    minimize_x   ‖ [A; √λ I] x − [b; 0] ‖²

The stacked matrix is tall, so the solution is found via the pseudoinverse:

    x̂ = ([A; √λ I]ᵀ [A; √λ I])⁻¹ [A; √λ I]ᵀ [b; 0]
       = (AᵀA + λI)⁻¹ Aᵀ b
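A quick numerical check of this equivalence (a sketch with made-up random data; any wide A works):

```julia
using LinearAlgebra

A = randn(3, 8)                    # wide matrix: 3 equations, 8 unknowns
b = randn(3)
λ = 0.1

# Stack A on top of √λ·I and b on top of zeros, then solve ordinary LS.
Astack = [A; sqrt(λ) * I(8)]
bstack = [b; zeros(8)]
x_stacked = Astack \ bstack        # least-squares solution of the tall system

x_formula = (A'A + λ*I) \ (A'b)    # the closed form above
norm(x_stacked - x_formula)        # ≈ 0, up to roundoff
```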
Proof of minimum-norm equivalence

Solution of the 2-norm regularization is:

    x̂ = (AᵀA + λI)⁻¹ Aᵀ b

• We can't simply set λ → 0, because A is wide and therefore AᵀA is not invertible.
• Instead, use the fact that AᵀAAᵀ + λAᵀ can be factored two ways:

    (AᵀA + λI) Aᵀ = AᵀAAᵀ + λAᵀ = Aᵀ (AAᵀ + λI)

  Multiplying on the left by (AᵀA + λI)⁻¹ and on the right by (AAᵀ + λI)⁻¹ gives:

    Aᵀ (AAᵀ + λI)⁻¹ = (AᵀA + λI)⁻¹ Aᵀ
Proof of minimum-norm equivalence

Solution of the 2-norm regularization is:

    x̂ = (AᵀA + λI)⁻¹ Aᵀ b

which, by the identity above, is also equal to:

    x̂ = Aᵀ (AAᵀ + λI)⁻¹ b

• Since AAᵀ is invertible, we can take the limit λ → 0 by simply setting λ = 0.
• In the limit: x̂ = Aᵀ(AAᵀ)⁻¹ b. This is exactly the solution to the minimum-norm least-squares problem we found before!
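Both the identity and the limit are easy to confirm numerically (continuing the sketch with made-up wide data):

```julia
using LinearAlgebra

A = randn(3, 8)                     # wide, so AAᵀ (3×3) is invertible
b = randn(3)
λ = 1e-3

x1 = (A'A + λ*I) \ (A'b)            # (AᵀA + λI)⁻¹Aᵀb
x2 = A' * ((A*A' + λ*I) \ b)        # Aᵀ(AAᵀ + λI)⁻¹b
norm(x1 - x2)                       # ≈ 0: the two expressions agree

x0 = A' * ((A*A') \ b)              # set λ = 0: minimum-norm solution
norm(x0 - pinv(A) * b)              # ≈ 0: matches the pseudoinverse A†b
```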
Tradeoff visualization

    minimize_x   ‖Ax − b‖² + λ‖x‖²

[Figure: Pareto curve of ‖x‖² versus ‖Ax − b‖². As λ → 0 the curve meets the vertical axis at (0, ‖A†b‖²); as λ → ∞ it meets the horizontal axis at (‖b‖², 0).]
Regularization

Regularization: an additional penalty term added to the cost function to encourage a solution with desirable properties.

Regularized least squares:

    minimize_x   ‖Ax − b‖² + λ R(x)

• R(x) is the regularizer (penalty function)
• λ is the regularization parameter
• The model has different names depending on R(x).
Regularization

    minimize_x   ‖Ax − b‖² + λ R(x)

1. If R(x) = ‖x‖² = x₁² + x₂² + ··· + xₙ², it is called L2 regularization, Tikhonov regularization, or ridge regression, depending on the application. It has the effect of smoothing the solution.
2. If R(x) = ‖x‖₁ = |x₁| + |x₂| + ··· + |xₙ|, it is called L1 regularization or LASSO. It has the effect of sparsifying the solution (x̂ will have few nonzero entries).
3. If R(x) = ‖x‖∞ = max{|x₁|, |x₂|, ..., |xₙ|}, it is called L∞ regularization, and it has the effect of equalizing the solution (it makes most components equal in magnitude).
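To see the sparsifying effect of the L1 penalty concretely, here is a minimal sketch that solves the LASSO problem with proximal gradient descent (ISTA, a standard method not covered in these slides; the data are made up):

```julia
using LinearAlgebra

# Minimize ‖Ax − b‖² + λ‖x‖₁ by alternating a gradient step on the
# least-squares term with soft-thresholding (the prox of λ|·|).
function ista(A, b, λ; iters=5000)
    soft(z, τ) = sign(z) * max(abs(z) - τ, 0)
    x = zeros(size(A, 2))
    Lf = 2 * opnorm(A)^2            # Lipschitz constant of the gradient
    for _ in 1:iters
        g = 2 * A' * (A*x .- b)     # gradient of ‖Ax − b‖²
        x = soft.(x .- g ./ Lf, λ / Lf)
    end
    return x
end

A, b = randn(20, 50), randn(20)     # made-up wide example
x̂ = ista(A, b, 1.0)
count(v -> abs(v) > 1e-6, x̂)        # typically far fewer than 50 nonzeros
```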
Norm balls

For a norm ‖·‖_p, the norm ball of radius r is the set:

    B_r = { x ∈ Rⁿ | ‖x‖_p ≤ r }

[Figure: the three unit balls in R². ‖x‖₂ ≤ 1 is a disk (x² + y² ≤ 1), ‖x‖₁ ≤ 1 is a diamond (|x| + |y| ≤ 1), and ‖x‖∞ ≤ 1 is a square (max{|x|, |y|} ≤ 1).]
Simple example

Consider the minimum-norm problem for different norms:

    minimize_x   ‖x‖_p
    subject to:  Ax = b

• the set of solutions to Ax = b is an affine subspace
• the solution is the point where the smallest norm ball that still meets this subspace touches it
• for p = 2, this occurs at the perpendicular distance from the origin

[Figure: a line of solutions to Ax = b, touched by a growing Euclidean ball at the point x̂.]
Simple example

• for p = 1, the touching point occurs on one of the axes: sparsifying behavior
• for p = ∞, it occurs where the coordinates have equal magnitude: equalizing behavior

This is made concrete in the sketch below.

[Figures: the same solution line, touched by a growing diamond (p = 1) at an axis point, and by a growing square (p = ∞) at a point with equal coordinates.]
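One can minimize ‖x‖_p over a single line in R² using the standard LP reformulations of the 1- and ∞-norms (the line a·x = c is made-up example data):

```julia
using JuMP, HiGHS

a, c = [1.0, 3.0], 6.0               # made-up constraint: x₁ + 3x₂ = 6

function minnorm(p)
    m = Model(HiGHS.Optimizer)
    set_silent(m)
    @variable(m, x[1:2])
    @constraint(m, a' * x == c)
    if p == 1
        @variable(m, t[1:2] >= 0)    # tᵢ ≥ |xᵢ|
        @constraint(m,  x .<= t)
        @constraint(m, -t .<= x)
        @objective(m, Min, sum(t))   # ‖x‖₁
    else                             # p = Inf
        @variable(m, s >= 0)         # s ≥ |xᵢ| for every i
        @constraint(m,  x .<= s)
        @constraint(m, -s .<= x)
        @objective(m, Min, s)        # ‖x‖∞
    end
    optimize!(m)
    return value.(x)
end

minnorm(1)      # ≈ (0.0, 2.0): lands on an axis (sparse)
minnorm(Inf)    # ≈ (1.5, 1.5): equal coordinates (equalized)
```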
Another simple example

Suppose we have data points {y₁, ..., y_m} ⊂ R, and we would like to find the best single-number estimator x of the data according to different norms. Suppose the data are sorted: y₁ ≤ ··· ≤ y_m.

    minimize_x   ‖ (y₁ − x, y₂ − x, ..., y_m − x) ‖_p

• p = 2: x̂ = (y₁ + ··· + y_m)/m. This is the mean of the data.
• p = 1: x̂ = y_⌈m/2⌉. This is the median of the data.
• p = ∞: x̂ = (y₁ + y_m)/2. This is the mid-range of the data.

Julia demo: Data Norm.ipynb
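The demo notebook is not reproduced here, but all three facts are easy to verify with a brute-force search over candidate estimators (the data vector is made up):

```julia
using Statistics

y = sort([3.0, 1.0, 7.0, 2.0, 10.0])   # m = 5 made-up data points
xs = range(minimum(y), maximum(y), length=100_001)  # fine grid of candidates

J(x, p) = p == Inf ? maximum(abs.(y .- x)) : sum(abs.(y .- x) .^ p)
best(p) = xs[argmin([J(x, p) for x in xs])]

best(2), mean(y)                 # both ≈ 4.6  (the mean)
best(1), median(y)               # both ≈ 3.0  (the median)
best(Inf), (y[1] + y[end]) / 2   # both ≈ 5.5  (the mid-range)
```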
Example: hovercraft revisited

One-dimensional version of the hovercraft problem:

• Start at x₁ = 0 with v₁ = 0 (at rest at position zero)
• Finish at x₅₀ = 100 with v₅₀ = 0 (at rest at position 100)
• Same simple dynamics as before, for t = 1, 2, ..., 49:

    x_{t+1} = x_t + v_t
    v_{t+1} = v_t + u_t

• Decide thruster inputs u₁, u₂, ..., u₄₉.
• This time: minimize ‖u‖_p
Example: hovercraft revisited

    minimize_{x_t, v_t, u_t}   ‖u‖_p
    subject to:  x_{t+1} = x_t + v_t   for t = 1, ..., 49
                 v_{t+1} = v_t + u_t   for t = 1, ..., 49
                 x₁ = 0,  x₅₀ = 100
                 v₁ = 0,  v₅₀ = 0

• This model has 149 variables (50 positions, 50 velocities, 49 inputs), but it is very easy to understand.
• We can simplify the model considerably...
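A sketch of what the p = 1 case might look like in JuMP (the demo notebook is not reproduced here; this uses the HiGHS solver and the standard LP reformulation of ‖u‖₁ via auxiliary variables):

```julia
using JuMP, HiGHS

T = 50
m = Model(HiGHS.Optimizer)
set_silent(m)

@variable(m, x[1:T])                  # positions
@variable(m, v[1:T])                  # velocities
@variable(m, u[1:T-1])                # thruster inputs
@variable(m, t[1:T-1] >= 0)           # t_k ≥ |u_k|, so sum(t) models ‖u‖₁

@constraint(m, [k = 1:T-1], x[k+1] == x[k] + v[k])   # dynamics
@constraint(m, [k = 1:T-1], v[k+1] == v[k] + u[k])
@constraint(m, x[1] == 0);  @constraint(m, x[T] == 100)
@constraint(m, v[1] == 0);  @constraint(m, v[T] == 0)
@constraint(m,  u .<= t)
@constraint(m, -t .<= u)

@objective(m, Min, sum(t))            # equals ‖u‖₁ at the optimum
optimize!(m)
u_opt = value.(u)                     # sparse thrust profile
```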
Model simplification

    x_{t+1} = x_t + v_t,   v_{t+1} = v_t + u_t   for t = 1, 2, ..., 49

Telescoping the velocity equation:

    v₅₀ = v₄₉ + u₄₉
        = v₄₈ + u₄₈ + u₄₉
        = ...
        = v₁ + (u₁ + u₂ + ··· + u₄₉)
Model simplification

    x_{t+1} = x_t + v_t,   v_{t+1} = v_t + u_t   for t = 1, 2, ..., 49

Telescoping the position equation:

    x₅₀ = x₄₉ + v₄₉
        = x₄₈ + 2v₄₈ + u₄₈
        = x₄₇ + 3v₄₇ + 2u₄₇ + u₄₈
        = ...
        = x₁ + 49v₁ + (48u₁ + 47u₂ + ··· + 2u₄₇ + u₄₈)
Model simplification

    x_{t+1} = x_t + v_t,   v_{t+1} = v_t + u_t   for t = 1, 2, ..., 49

The constraints can be rewritten as:

    [ 48  47  ···  2  1  0 ]   [ u₁ ]     [ x₅₀ − x₁ − 49v₁ ]
    [  1   1  ···  1  1  1 ] × [ ⋮  ]  =  [ v₅₀ − v₁        ]
                               [ u₄₉]

so we don't need the intermediate variables x_t and v_t!

Julia demo: Hover 1D.ipynb
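With this reduced form, the p = 2 case needs no modeling language at all; it is a two-constraint minimum-norm problem (a sketch of what the demo presumably computes, using the formula derived earlier):

```julia
using LinearAlgebra

# 2×49 constraint matrix from the telescoped dynamics
A = [collect(48.0:-1:0)'; ones(1, 49)]
b = [100.0,   # x₅₀ − x₁ − 49v₁ = 100 − 0 − 0
       0.0]   # v₅₀ − v₁ = 0

u = A' * ((A * A') \ b)   # minimum-norm solution u = Aᵀ(AAᵀ)⁻¹b
# u should vary linearly through zero: the "smooth" thrust profile
# shown in the results that follow.
```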
Results

1. Minimizing ‖u‖₂² (smooth)
2. Minimizing ‖u‖₁ (sparse)
3. Minimizing ‖u‖∞ (equalized)

[Figures: thrust vs. time over t = 0, ..., 50 for each of the three objectives.]
Tradeoff studies

1. Minimizing ‖u‖₂² + λ‖u‖₁ (smooth and sparse)
2. Minimizing ‖u‖∞ + λ‖u‖₁ (equalized and sparse)
3. Minimizing ‖u‖₂² + λ‖u‖∞ (equalized and smooth)

[Figures: thrust vs. time for each combined objective.]
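The first combination can be obtained from the JuMP sketch above by swapping in a quadratic-plus-L1 objective; since the objective is now quadratic, the model is handed to a QP-capable solver (Ipopt here; λ = 0.1 is an assumed example weight):

```julia
using Ipopt

# Continuing with the model m and variables u, t built in the earlier sketch:
λ = 0.1
set_optimizer(m, Ipopt.Optimizer)      # swap the LP solver for a QP-capable one
set_silent(m)
@objective(m, Min, sum(u[k]^2 for k in 1:T-1) + λ * sum(t))   # ‖u‖₂² + λ‖u‖₁
optimize!(m)
u_tradeoff = value.(u)                 # re-solve for several λ to trace the tradeoff
```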