Proof
Assume $\vec{g}$ is a subgradient of $f$ at $\vec{x}$. For any $\alpha \geq 0$,
$$f(\vec{x} + \alpha \vec{e}_i) \geq f(\vec{x}) + \alpha\, \vec{g}^T \vec{e}_i = f(\vec{x}) + \vec{g}[i]\,\alpha,$$
$$f(\vec{x} - \alpha \vec{e}_i) \geq f(\vec{x}) - \alpha\, \vec{g}^T \vec{e}_i = f(\vec{x}) - \vec{g}[i]\,\alpha.$$
Combining both inequalities,
$$\frac{f(\vec{x}) - f(\vec{x} - \alpha \vec{e}_i)}{\alpha} \leq \vec{g}[i] \leq \frac{f(\vec{x} + \alpha \vec{e}_i) - f(\vec{x})}{\alpha}.$$
Letting $\alpha \to 0$ implies $\vec{g}[i] = \frac{\partial f(\vec{x})}{\partial \vec{x}[i]}$.
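As a sanity check, here is a minimal numerical sketch of the sandwich inequality above for a smooth convex function; the matrix `A`, vector `b`, step `alpha`, and tolerances are arbitrary assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

f = lambda x: np.sum((A @ x - b) ** 2)          # smooth convex function
grad = lambda x: 2 * A.T @ (A @ x - b)          # its gradient (the only subgradient)

x = rng.standard_normal(3)
g = grad(x)
alpha = 1e-6
for i in range(3):
    e = np.zeros(3)
    e[i] = 1.0
    lower = (f(x) - f(x - alpha * e)) / alpha   # left end of the sandwich
    upper = (f(x + alpha * e) - f(x)) / alpha   # right end of the sandwich
    assert lower - 1e-6 <= g[i] <= upper + 1e-6
```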
Subgradient
A function $f: \mathbb{R}^n \to \mathbb{R}$ is convex if and only if it has a subgradient at every point $\vec{x} \in \mathbb{R}^n$.
It is strictly convex if and only if for all $\vec{x} \in \mathbb{R}^n$ there exists $\vec{g} \in \mathbb{R}^n$ such that
$$f(\vec{y}) > f(\vec{x}) + \vec{g}^T(\vec{y} - \vec{x}) \quad \text{for all } \vec{y} \neq \vec{x}.$$
Optimality condition for nondifferentiable functions
If $\vec{0}$ is a subgradient of $f$ at $\vec{x}$, then
$$f(\vec{y}) \geq f(\vec{x}) + \vec{0}^T(\vec{y} - \vec{x}) = f(\vec{x}) \quad \text{for all } \vec{y} \in \mathbb{R}^n,$$
so $\vec{x}$ is a minimizer of $f$. Under strict convexity the minimum is unique.
Sum of subgradients
Let $\vec{g}_1$ and $\vec{g}_2$ be subgradients at $\vec{x} \in \mathbb{R}^n$ of $f_1: \mathbb{R}^n \to \mathbb{R}$ and $f_2: \mathbb{R}^n \to \mathbb{R}$.
Then $\vec{g} := \vec{g}_1 + \vec{g}_2$ is a subgradient of $f := f_1 + f_2$ at $\vec{x}$.

Proof: For any $\vec{y} \in \mathbb{R}^n$,
$$f(\vec{y}) = f_1(\vec{y}) + f_2(\vec{y}) \geq f_1(\vec{x}) + \vec{g}_1^T(\vec{y} - \vec{x}) + f_2(\vec{x}) + \vec{g}_2^T(\vec{y} - \vec{x}) = f(\vec{x}) + \vec{g}^T(\vec{y} - \vec{x}).$$
Subgradient of scaled function
Let $\vec{g}_1$ be a subgradient at $\vec{x} \in \mathbb{R}^n$ of $f_1: \mathbb{R}^n \to \mathbb{R}$. For any $\eta \geq 0$,
$\vec{g}_2 := \eta\, \vec{g}_1$ is a subgradient of $f_2 := \eta f_1$ at $\vec{x}$.

Proof: For any $\vec{y} \in \mathbb{R}^n$,
$$f_2(\vec{y}) = \eta f_1(\vec{y}) \geq \eta \left( f_1(\vec{x}) + \vec{g}_1^T(\vec{y} - \vec{x}) \right) = f_2(\vec{x}) + \vec{g}_2^T(\vec{y} - \vec{x}).$$
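Both calculus rules can be checked numerically. The sketch below uses an assumed toy function $h = f_1 + \eta f_2$ with $f_1$ the $\ell_1$ norm (whose subgradient $\operatorname{sign}(\vec{x})$ is justified on the following slides) and $f_2$ the squared $\ell_2$ norm, combines their subgradients via the sum and scaling rules, and verifies the subgradient inequality at random points.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.5

f1 = lambda x: np.sum(np.abs(x))     # l1 norm; sign(x) is a subgradient (next slides)
f2 = lambda x: np.sum(x ** 2)        # smooth, gradient 2x
h = lambda x: f1(x) + eta * f2(x)

x = rng.standard_normal(4)
g = np.sign(x) + eta * 2 * x         # sum rule + scaling rule

for _ in range(1000):
    y = rng.standard_normal(4)
    # subgradient inequality for h at x, up to floating-point slack
    assert h(y) >= h(x) + g @ (y - x) - 1e-9
```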
Subdifferential of absolute value
Consider $f(x) = |x|$. At $x \neq 0$, $f$ is differentiable, so the only subgradient is $g = \operatorname{sign}(x)$.
At $x = 0$, we need
$$f(0 + y) \geq f(0) + g\,(y - 0), \quad \text{i.e.} \quad |y| \geq g\, y \text{ for all } y,$$
which holds if and only if $|g| \leq 1$.
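A short numerical illustration of this characterization, on an assumed grid of test points: every slope with $|g| \leq 1$ satisfies $|y| \geq g\,y$ everywhere, while any $|g| > 1$ violates it somewhere.

```python
import numpy as np

ys = np.linspace(-1.0, 1.0, 201)        # grid of test points y
for g in (-1.0, -0.3, 0.0, 0.7, 1.0):   # slopes with |g| <= 1 are subgradients at 0
    assert np.all(np.abs(ys) >= g * ys)
for g in (1.5, -2.0):                   # slopes with |g| > 1 fail somewhere
    assert np.any(np.abs(ys) < g * ys)
```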
Subdifferential of $\ell_1$ norm
$\vec{g}$ is a subgradient of the $\ell_1$ norm at $\vec{x} \in \mathbb{R}^n$ if and only if
$$\vec{g}[i] = \operatorname{sign}(\vec{x}[i]) \text{ if } \vec{x}[i] \neq 0, \qquad |\vec{g}[i]| \leq 1 \text{ if } \vec{x}[i] = 0.$$
Proof
$\vec{g}$ is a subgradient of $\|\cdot\|_1$ at $\vec{x}$ if and only if $\vec{g}[i]$ is a subgradient of $|\cdot|$ at $\vec{x}[i]$ for all $1 \leq i \leq n$.
First, if $\vec{g}$ is a subgradient of $\|\cdot\|_1$ at $\vec{x}$, then for any $y \in \mathbb{R}$,
$$|y| = |\vec{x}[i]| + \|\vec{x} + (y - \vec{x}[i])\,\vec{e}_i\|_1 - \|\vec{x}\|_1 \geq |\vec{x}[i]| + \|\vec{x}\|_1 + \vec{g}^T (y - \vec{x}[i])\,\vec{e}_i - \|\vec{x}\|_1 = |\vec{x}[i]| + \vec{g}[i]\,(y - \vec{x}[i]),$$
so $\vec{g}[i]$ is a subgradient of $|\cdot|$ at $\vec{x}[i]$ for all $1 \leq i \leq n$.
Conversely, if $\vec{g}[i]$ is a subgradient of $|\cdot|$ at $\vec{x}[i]$ for $1 \leq i \leq n$, then for any $\vec{y} \in \mathbb{R}^n$,
$$\|\vec{y}\|_1 = \sum_{i=1}^{n} |\vec{y}[i]| \geq \sum_{i=1}^{n} \left( |\vec{x}[i]| + \vec{g}[i]\,(\vec{y}[i] - \vec{x}[i]) \right) = \|\vec{x}\|_1 + \vec{g}^T(\vec{y} - \vec{x}),$$
so $\vec{g}$ is a subgradient of $\|\cdot\|_1$ at $\vec{x}$.
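The characterization can be checked directly. The sketch below uses an assumed toy vector, builds a subgradient of the $\ell_1$ norm from the sign pattern, fills the zero entries with arbitrary values in $[-1, 1]$, and verifies the subgradient inequality at random points.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([1.5, 0.0, -0.2, 0.0])                    # assumed toy vector

g = np.sign(x)                                         # sign(x[i]) on the support
g[x == 0] = rng.uniform(-1, 1, size=np.sum(x == 0))    # any value in [-1, 1] off it

for _ in range(1000):
    y = rng.standard_normal(x.size)
    # ||y||_1 >= ||x||_1 + g^T (y - x)
    assert np.sum(np.abs(y)) >= np.sum(np.abs(x)) + g @ (y - x) - 1e-9
```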
Subdifferential of the nuclear norm
Let $X \in \mathbb{R}^{m \times n}$ be a rank-$r$ matrix with SVD $U S V^T$, where $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{n \times r}$ and $S \in \mathbb{R}^{r \times r}$.
A matrix $G$ is a subgradient of the nuclear norm at $X$ if and only if
$$G := U V^T + W,$$
where $W$ satisfies
$$\|W\| \leq 1, \qquad U^T W = 0, \qquad W V = 0.$$
Proof
By Pythagoras' theorem, for any $\vec{x} \in \mathbb{R}^n$ with unit $\ell_2$ norm we have
$$\left\| P_{\operatorname{row}(X)}\, \vec{x} \right\|_2^2 + \left\| P_{\operatorname{row}(X)^\perp}\, \vec{x} \right\|_2^2 = \|\vec{x}\|_2^2 = 1.$$
The rows of $U V^T$ are in $\operatorname{row}(X)$ and the rows of $W$ are in $\operatorname{row}(X)^\perp$; moreover $U^T W = 0$ makes $U V^T \vec{x}$ and $W \vec{x}$ orthogonal. Therefore
$$\|G\|^2 := \max_{\{\vec{x} \in \mathbb{R}^n \,|\, \|\vec{x}\|_2 = 1\}} \|G \vec{x}\|_2^2
= \max_{\{\vec{x} \,|\, \|\vec{x}\|_2 = 1\}} \left\| U V^T \vec{x} \right\|_2^2 + \left\| W \vec{x} \right\|_2^2$$
$$= \max_{\{\vec{x} \,|\, \|\vec{x}\|_2 = 1\}} \left\| U V^T P_{\operatorname{row}(X)}\, \vec{x} \right\|_2^2 + \left\| W P_{\operatorname{row}(X)^\perp}\, \vec{x} \right\|_2^2
\leq \left\| U V^T \right\|^2 \left\| P_{\operatorname{row}(X)}\, \vec{x} \right\|_2^2 + \|W\|^2 \left\| P_{\operatorname{row}(X)^\perp}\, \vec{x} \right\|_2^2
\leq 1.$$
Hölder's inequality for matrices
For any matrix $A \in \mathbb{R}^{m \times n}$,
$$\|A\|_* = \sup_{\{B \in \mathbb{R}^{m \times n} \,|\, \|B\| \leq 1\}} \langle A, B \rangle.$$
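A quick numerical check of this duality, on an assumed random matrix: matrices rescaled to unit spectral norm never exceed the nuclear norm in the inner product, and $B = U V^T$ from the SVD of $A$ attains it.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
nuc = s.sum()                                   # ||A||_* = sum of singular values

for _ in range(500):
    B = rng.standard_normal((4, 3))
    B /= np.linalg.norm(B, 2)                   # rescale to spectral norm 1
    assert np.sum(A * B) <= nuc + 1e-9          # <A, B> <= ||A||_*

B_star = U @ Vt                                 # attains the supremum
assert np.isclose(np.sum(A * B_star), nuc)
```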
Proof
$U^T W = 0$ implies
$$\langle W, X \rangle = \langle W, U S V^T \rangle = \langle U^T W, S V^T \rangle = 0,$$
and
$$\langle U V^T, X \rangle = \operatorname{tr}\left( V U^T X \right) = \operatorname{tr}\left( V U^T U S V^T \right) = \operatorname{tr}\left( V^T V S \right) = \operatorname{tr}(S) = \|X\|_*.$$
By Hölder's inequality, since $\|G\| \leq 1$, for any matrix $Y \in \mathbb{R}^{m \times n}$
$$\|Y\|_* \geq \langle G, Y \rangle = \langle G, X \rangle + \langle G, Y - X \rangle = \langle U V^T, X \rangle + \langle W, X \rangle + \langle G, Y - X \rangle = \|X\|_* + \langle G, Y - X \rangle,$$
so $G$ is a subgradient of the nuclear norm at $X$.
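The construction and both steps of the proof can be verified numerically. The sketch below uses assumed dimensions and random data: it forms $G = U V^T + W$ with $W$ supported on the orthogonal complements of $\operatorname{col}(X)$ and $\operatorname{row}(X)$ and spectral norm below one, then checks $\|G\| \leq 1$ and the subgradient inequality for the nuclear norm.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 6, 5, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))    # rank-r matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, Vt = U[:, :r], Vt[:r, :]                                      # compact SVD factors

# W with U^T W = 0, W V = 0 and ||W|| < 1: project a random matrix onto the
# orthogonal complements of col(X) and row(X), then rescale.
M = rng.standard_normal((m, n))
W = (np.eye(m) - U @ U.T) @ M @ (np.eye(n) - Vt.T @ Vt)
W *= 0.9 / np.linalg.norm(W, 2)

G = U @ Vt + W
assert np.linalg.norm(G, 2) <= 1 + 1e-9                          # ||G|| <= 1
assert np.allclose(U.T @ W, 0) and np.allclose(W @ Vt.T, 0)

nuc = lambda A: np.linalg.svd(A, compute_uv=False).sum()
for _ in range(200):
    Y = rng.standard_normal((m, n))
    # subgradient inequality: ||Y||_* >= ||X||_* + <G, Y - X>
    assert nuc(Y) >= nuc(X) + np.sum(G * (Y - X)) - 1e-9
```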
Sparse linear regression with 2 features
$$\vec{y} := \alpha\, \vec{x}_1 + \vec{z}, \qquad X := \begin{bmatrix} \vec{x}_1 & \vec{x}_2 \end{bmatrix}, \qquad \|\vec{x}_1\|_2 = 1, \quad \|\vec{x}_2\|_2 = 1, \quad \langle \vec{x}_1, \vec{x}_2 \rangle = \rho.$$
Analysis of lasso estimator
Let $\alpha \geq 0$. Then
$$\vec{\beta}_{\text{lasso}} = \begin{bmatrix} \alpha + \vec{x}_1^T \vec{z} - \lambda \\ 0 \end{bmatrix}
\qquad \text{as long as} \qquad
\frac{\vec{x}_2^T \vec{z} - \rho\, \vec{x}_1^T \vec{z}}{1 - |\rho|} \leq \lambda \leq \alpha + \vec{x}_1^T \vec{z}.$$
Lasso estimator
[Figure: lasso coefficients as a function of the regularization parameter.]
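The closed form above can be reproduced numerically. The sketch below uses assumed toy values throughout: it builds the two-feature design with correlation $\rho$, solves the lasso with a simple proximal-gradient (ISTA) loop, and compares the result with $\beta_1 = \alpha + \vec{x}_1^T \vec{z} - \lambda$, $\beta_2 = 0$; sweeping $\lambda$ over a grid reproduces a coefficient path like the one in the figure.

```python
import numpy as np

rng = np.random.default_rng(5)
n, alpha, rho = 50, 1.0, 0.5                      # assumed toy parameters

Q = np.linalg.qr(rng.standard_normal((n, 2)))[0]  # orthonormal pair of columns
x1 = Q[:, 0]
x2 = rho * Q[:, 0] + np.sqrt(1 - rho ** 2) * Q[:, 1]
X = np.column_stack([x1, x2])                     # ||x1|| = ||x2|| = 1, <x1, x2> = rho
z = 0.01 * rng.standard_normal(n)
y = alpha * x1 + z

def lasso_ista(X, y, lam, iters=5000):
    """Minimize 0.5 * ||X b - y||_2^2 + lam * ||b||_1 by proximal gradient."""
    b = np.zeros(X.shape[1])
    t = 1.0 / np.linalg.norm(X, 2) ** 2           # step size 1 / ||X||^2
    for _ in range(iters):
        g = b - t * X.T @ (X @ b - y)             # gradient step on the quadratic
        b = np.sign(g) * np.maximum(np.abs(g) - t * lam, 0.0)   # soft thresholding
    return b

lam = 0.1                                         # should lie in the stated range here
b = lasso_ista(X, y, lam)
print(b, alpha + x1 @ z - lam)                    # b[0] ~ closed form, b[1] ~ 0
```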
Recall the optimality condition for nondifferentiable functions: if $\vec{0}$ is a subgradient of $f$ at $\vec{x}$, then
$$f(\vec{y}) \geq f(\vec{x}) + \vec{0}^T(\vec{y} - \vec{x}) = f(\vec{x}) \quad \text{for all } \vec{y} \in \mathbb{R}^n,$$
and under strict convexity the minimum is unique.
Proof
The cost function is strictly convex if $n \geq 2$ and $\rho \neq 1$.
Aim: show that there is a subgradient equal to $\vec{0}$ at a 1-sparse solution.
The gradient of the quadratic term
$$q(\vec{\beta}) := \frac{1}{2} \left\| X \vec{\beta} - \vec{y} \right\|_2^2$$
at $\vec{\beta}_{\text{lasso}}$ equals
$$\nabla q\left( \vec{\beta}_{\text{lasso}} \right) = X^T \left( X \vec{\beta}_{\text{lasso}} - \vec{y} \right).$$