A Primal-Dual Smooth Perceptron-von Neumann Algorithm

Javier Peña, Carnegie Mellon University
(joint work with Negar Soheili)

Shubfest, Fields Institute, May 2012
Polyhedral feasibility problems

Given A := [a_1 a_2 ··· a_n] ∈ R^{m×n}, consider the alternative feasibility problems

  A^T y > 0,   (D)

and

  Ax = 0, x ≥ 0, x ≠ 0.   (P)

Theme: Condition-based analysis of elementary algorithms for solving (P) and (D).
Perceptron Algorithm

Algorithm to solve
  A^T y > 0.   (D)

Perceptron Algorithm (Rosenblatt, 1958)
  y := 0
  while A^T y ≯ 0
    y := y + a_j/‖a_j‖, where a_j^T y ≤ 0
  end while

Throughout this talk: ‖·‖ = ‖·‖_2.
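For concreteness, here is a minimal NumPy sketch of this loop (our own illustration, not from the talk; the iteration cap max_iter is an added safeguard):

```python
import numpy as np

def perceptron(A, max_iter=10_000):
    """Classical perceptron for A^T y > 0.

    A: m x n array whose columns a_j are the data vectors.
    Returns y with A^T y > 0 if found within max_iter updates, else None.
    """
    y = np.zeros(A.shape[0])
    for _ in range(max_iter):
        scores = A.T @ y
        if np.all(scores > 0):
            return y                              # A^T y > 0: done
        j = int(np.argmin(scores))                # a violated index: a_j^T y <= 0
        y = y + A[:, j] / np.linalg.norm(A[:, j])
    return None
```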
Von Neumann's Algorithm

Algorithm to solve
  Ax = 0, x ≥ 0, x ≠ 0.   (P)

Von Neumann's Algorithm (von Neumann, 1948)
  x_0 := (1/n)·1; y_0 := A x_0
  for k = 0, 1, ...
    if a_j^T y_k := min_i a_i^T y_k > 0 then halt: (P) is infeasible
    λ_k := argmin_{λ∈[0,1]} ‖λ y_k + (1−λ) a_j‖ = (1 − a_j^T y_k)/(‖y_k‖² − 2 a_j^T y_k + 1)
    x_{k+1} := λ_k x_k + (1 − λ_k) e_j, where j = argmin_i a_i^T y_k
    y_{k+1} := A x_{k+1}
  end for
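A NumPy sketch of the same iteration, using the closed-form line search above (our own transcription; the eps stopping rule and max_iter guard are illustrative additions, not part of the slide):

```python
import numpy as np

def von_neumann(A, eps=1e-6, max_iter=100_000):
    """Von Neumann's algorithm for Ax = 0, x in the simplex.

    Assumes the columns of A are unit vectors. Returns an eps-solution x
    with ||Ax|| < eps, or y certifying A^T y > 0 (so (P) is infeasible).
    """
    m, n = A.shape
    x = np.full(n, 1.0 / n)
    y = A @ x
    for _ in range(max_iter):
        if np.linalg.norm(y) < eps:
            return 'eps-solution to (P)', x
        scores = A.T @ y
        j = int(np.argmin(scores))
        if scores[j] > 0:
            return '(P) infeasible', y            # min_i a_i^T y > 0
        a_j = A[:, j]
        # exact line search: lam minimizes ||lam*y + (1-lam)*a_j||
        lam = (1.0 - scores[j]) / (y @ y - 2.0 * scores[j] + 1.0)
        lam = min(max(lam, 0.0), 1.0)
        e_j = np.zeros(n); e_j[j] = 1.0
        x = lam * x + (1.0 - lam) * e_j
        y = lam * y + (1.0 - lam) * a_j
    return 'max_iter reached', x
```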
Elementary algorithms

The perceptron and von Neumann's algorithms are "elementary" algorithms. "Elementary" means that each iteration involves only simple computations.

Why should we care about elementary algorithms?
- Some large-scale optimization problems (e.g., in compressive sensing) are not solvable via conventional Newton-based algorithms.
- In some cases, the entire matrix A may not be explicitly available at once.
- Elementary algorithms have been effective in these cases.
Conditioning

Throughout the sequel assume
  A = [a_1 ··· a_n], where ‖a_j‖ = 1, j = 1, ..., n.

Key parameter
  ρ(A) := max_{‖y‖=1} min_{j=1,...,n} a_j^T y.

Goffin-Cheung-Cucker condition number
  C(A) := 1/|ρ(A)|.

(This is closely related to Renegar's condition number.)
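ρ(A) is itself an optimization over the unit sphere and is not trivial to evaluate exactly. As a quick sanity check one can lower-bound it by sampling; the following sketch (the sampling scheme and n_samples are entirely our own illustration, not from the talk) does just that:

```python
import numpy as np

def rho_lower_bound(A, n_samples=100_000, seed=0):
    """Crude Monte Carlo lower bound on rho(A) = max_{||y||=1} min_j a_j^T y.

    Samples random unit vectors y and keeps the best min_j a_j^T y seen.
    Only a lower bound: the true maximizer may be missed.
    """
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n_samples, A.shape[0]))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # uniform directions on the sphere
    return float((Y @ A).min(axis=1).max())
```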
Conditioning

Notice
  A^T y > 0 feasible ⇔ ρ(A) > 0.
  Ax = 0, x ≥ 0, x ≠ 0 feasible ⇔ ρ(A) ≤ 0.

Ill-posedness
A is ill-posed when ρ(A) = 0. In this case both A^T y > 0 and Ax = 0, x > 0 are on the verge of feasibility.

Theorem (Cheung & Cucker, 2001)
  |ρ(A)| = min{ max_i ‖ã_i − a_i‖ : Ã is ill-posed }.
Some geometry

When ρ(A) > 0, it is a measure of the thickness of the feasible cone:
  ρ(A) = max_{‖y‖=1} { r : B(y, r) ⊆ {z : A^T z ≥ 0} }.

[Figure: two feasible cones, one thin (small ρ(A)) and one wide (large ρ(A)).]
More geometry

Let ∆_n := {x ≥ 0 : ‖x‖_1 = 1}.

Proposition (from Renegar 1995 and Cheung-Cucker 2001)
  |ρ(A)| = dist(0, ∂{Ax : x ∈ ∆_n}).

[Figure: the set {Ax : x ∈ ∆_n}, with 0 outside it when ρ(A) > 0 and inside it when ρ(A) < 0.]
Condition-based complexity

Recall our problems of interest
  A^T y > 0,   (D)
and
  Ax = 0, x ∈ ∆_n.   (P)

Theorem (Block-Novikoff, 1962)
If ρ(A) > 0, then the perceptron algorithm terminates after at most
  1/ρ(A)² = C(A)²
iterations.
Condition-based complexity

Theorem (Dantzig, 1992)
If ρ(A) < 0, then von Neumann's algorithm finds an ε-solution to (P), i.e., an x ∈ ∆_n with ‖Ax‖ < ε, in at most
  1/ε²
iterations.

Theorem (Epelman & Freund, 2000)
If ρ(A) < 0, then von Neumann's algorithm finds an ε-solution to (P) in at most
  (1/ρ(A)²) · log(1/ε)
iterations.
Main Theorem

Theorem (Soheili & P, 2012)
There is a smooth version of the perceptron/von Neumann algorithm such that:
(a) If ρ(A) > 0, then it finds a solution to A^T y > 0 in at most
  O( (√n/ρ(A)) · log(1/ρ(A)) )
iterations.
(b) If ρ(A) < 0, then it finds an ε-solution to Ax = 0, x ∈ ∆_n in at most
  O( (√n/|ρ(A)|) · log(1/ε) )
iterations.
(c) Its iterations are elementary (not much more complicated than those of the perceptron or von Neumann's algorithms).
Perceptron algorithm again

Perceptron Algorithm
  y_0 := 0
  for k = 0, 1, ...
    a_j^T y_k := min_i a_i^T y_k
    y_{k+1} := y_k + a_j
  end for

Observe
  a_j^T y = min_i a_i^T y ⇔ a_j = A x(y), where x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩.

Hence in the above algorithm y_k = A x_k where x_k ≥ 0, ‖x_k‖_1 = k.
Normalized Perceptron Algorithm

Recall x(y) := argmin_{x ∈ ∆_n} ⟨A^T y, x⟩.

Normalized Perceptron Algorithm
  y_0 := 0
  for k = 0, 1, ...
    θ_k := 1/(k+1)
    y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  end for

In this algorithm y_k = A x_k for x_k ∈ ∆_n.
Perceptron-Von Neumann Template

Both the perceptron and von Neumann's algorithms perform similar iterations.

PVN Template
  x_0 ∈ ∆_n; y_0 := A x_0
  for k = 0, 1, ...
    x_{k+1} := (1 − θ_k) x_k + θ_k x(y_k)
    y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  end for

Observe (see the sketch below):
- Recover the (normalized) perceptron if θ_k = 1/(k+1).
- Recover von Neumann's if θ_k = argmin_{λ ∈ [0,1]} ‖(1 − λ) y_k + λ A x(y_k)‖.
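A sketch of the template with the step size θ_k left pluggable, showing how both special cases drop out (our own code; the names x_of_y, pvn_template, and theta_rule are hypothetical, and the stopping tests are omitted for brevity):

```python
import numpy as np

def x_of_y(A, y):
    """A vertex minimizer of <A^T y, x> over the simplex: e_j with j = argmin_i a_i^T y."""
    e = np.zeros(A.shape[1])
    e[int(np.argmin(A.T @ y))] = 1.0
    return e

def pvn_template(A, theta_rule, n_iter=1000):
    """PVN template; theta_rule(k, y, d) -> step in [0,1], where d = A x(y_k)."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)
    y = A @ x
    for k in range(n_iter):
        xk = x_of_y(A, y)
        d = A @ xk
        t = theta_rule(k, y, d)
        x = (1 - t) * x + t * xk
        y = (1 - t) * y + t * d
    return x, y

# normalized perceptron step size
perceptron_rule = lambda k, y, d: 1.0 / (k + 1)

# von Neumann exact line search: minimize ||(1-t) y + t d|| over t in [0,1]
def von_neumann_rule(k, y, d):
    denom = float((y - d) @ (y - d))
    t = float(y @ (y - d)) / denom if denom > 0 else 0.0
    return min(max(t, 0.0), 1.0)
```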
Smooth Perceptron-Von Neumann Algorithm

Apply Nesterov's smoothing technique (Nesterov, 2005).

Key step: use a smooth version of
  x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩,
namely,
  x_µ(y) := argmin_{x ∈ ∆_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² },
for some µ > 0 and x̄ ∈ ∆_n.
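Completing the square shows that x_µ(y) is the Euclidean projection of x̄ − (A^T y)/µ onto ∆_n, so it can be computed by the standard sorting-based simplex projection. A sketch under that observation (project_simplex and x_mu are our own names, not from the talk):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x >= 0, sum x = 1},
    via the classical sorting-based method."""
    u = np.sort(v)[::-1]                 # sort entries in decreasing order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.max(np.where(u - css / ks > 0)[0]) + 1
    tau = css[rho - 1] / rho
    return np.maximum(v - tau, 0.0)

def x_mu(A, y, mu, x_bar):
    """x_mu(y) = argmin_{x in simplex} <A^T y, x> + (mu/2)||x - x_bar||^2.

    Completing the square, this is the projection of x_bar - (A^T y)/mu
    onto the simplex."""
    return project_simplex(x_bar - (A.T @ y) / mu)
```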
Smooth Perceptron-Von Neumann Algorithm

Assume x̄ ∈ ∆_n and δ > 0 are given inputs.

Algorithm SPVN(x̄, δ)
  y_0 := A x̄; µ_0 := n; x_0 := x_{µ_0}(y_0)
  for k = 0, 1, ...
    θ_k := 2/(k+3)
    y_{k+1} := (1 − θ_k)(y_k + θ_k A x_k) + θ_k² A x_{µ_k}(y_k)
    µ_{k+1} := (1 − θ_k) µ_k
    x_{k+1} := (1 − θ_k) x_k + θ_k x_{µ_{k+1}}(y_{k+1})
    if A^T y_{k+1} > 0 then halt: y_{k+1} is a solution to (D)
    if ‖A x_{k+1}‖ ≤ δ then halt: x_{k+1} is a δ-solution to (P)
  end for
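A direct transcription of this pseudocode into NumPy, reusing x_mu from the previous sketch (the spvn name and the max_iter guard are our own additions):

```python
import numpy as np

def spvn(A, x_bar, delta, max_iter=100_000):
    """Smooth perceptron-von Neumann, transcribed from the slide's pseudocode.

    Requires x_mu() from the earlier sketch. Returns ('D', y) when A^T y > 0,
    or ('P', x) when ||Ax|| <= delta."""
    n = A.shape[1]
    y = A @ x_bar
    mu = float(n)
    x = x_mu(A, y, mu, x_bar)
    for k in range(max_iter):
        theta = 2.0 / (k + 3)
        y_new = (1 - theta) * (y + theta * (A @ x)) + theta**2 * (A @ x_mu(A, y, mu, x_bar))
        mu = (1 - theta) * mu
        x = (1 - theta) * x + theta * x_mu(A, y_new, mu, x_bar)
        y = y_new
        if np.all(A.T @ y > 0):
            return 'D', y                      # y solves A^T y > 0
        if np.linalg.norm(A @ x) <= delta:
            return 'P', x                      # x is a delta-solution to (P)
    return 'max_iter', x
```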
PVN update versus SPVN update

Update in PVN template
  y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  x_{k+1} := (1 − θ_k) x_k + θ_k x(y_k)

Update in Algorithm SPVN
  y_{k+1} := (1 − θ_k)(y_k + θ_k A x_k) + θ_k² A x_{µ_k}(y_k)
  µ_{k+1} := (1 − θ_k) µ_k
  x_{k+1} := (1 − θ_k) x_k + θ_k x_{µ_{k+1}}(y_{k+1})
Theorem (Soheili and P, 2011)

Assume x̄ ∈ ∆_n and δ > 0 are given.
(a) If δ < ρ(A), then Algorithm SPVN finds a solution to (D) in at most
  2√(2n)/ρ(A) − 1
iterations.
(b) If ρ(A) < 0, then Algorithm SPVN finds a δ-solution to (P) in at most
  2√(2n)/δ − 1
iterations.
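As a hypothetical smoke test of the spvn sketch above (the instance construction is entirely our own; it tilts every column into a common halfspace, so ρ(A) > 0 and the (D) branch should fire):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 20))
A[0, :] = np.abs(A[0, :]) + 1.0        # every a_j has positive first coordinate
A /= np.linalg.norm(A, axis=0)         # normalize columns so that ||a_j|| = 1
tag, z = spvn(A, np.full(20, 1.0 / 20), 1e-6)
print(tag)                             # expected: 'D', since rho(A) > 0 here
```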
Iterated Smooth Perceptron-Von Neumann Algorithm

Assume γ > 1 is a given constant.

Algorithm ISPVN(γ)
  x̃_0 := (1/n)·1
  for i = 0, 1, ...
    δ_i := ‖A x̃_i‖/γ
    x̃_{i+1} := SPVN(x̃_i, δ_i)
  end for
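A sketch of this outer loop, relying on the spvn function from the earlier sketch (the eps tolerance and max_outer guard are our own illustrative additions):

```python
import numpy as np

def ispvn(A, gamma=2.0, eps=1e-6, max_outer=100):
    """Iterated SPVN: restart SPVN from the last iterate with a
    gradually shrinking target delta; gamma > 1 controls the shrink rate."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)
    for _ in range(max_outer):
        if np.linalg.norm(A @ x) <= eps:
            return 'P', x                      # eps-solution to (P)
        delta = np.linalg.norm(A @ x) / gamma
        tag, z = spvn(A, x, delta)
        if tag == 'D':
            return 'D', z                      # solution to A^T y > 0
        x = z
    return 'max_outer', x
```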
Main Theorem Again

Theorem (Soheili & P, 2012)
(a) If ρ(A) > 0, then each call to SPVN in Algorithm ISPVN halts in at most 2√(2n)/ρ(A) − 1 iterations. Consequently, Algorithm ISPVN finds a solution to (D) in at most
  ( 2√(2n)/ρ(A) − 1 ) · log(1/ρ(A))/log(γ)
SPVN iterations.
(b) If ρ(A) < 0, then each call to SPVN in Algorithm ISPVN halts in at most 2γ√(2n)/|ρ(A)| − 1 iterations. Hence for ε > 0, Algorithm ISPVN finds an ε-solution to (P) in at most
  ( 2γ√(2n)/|ρ(A)| − 1 ) · log(1/ε)/log(γ)
SPVN iterations.
Observe

A "pure" SPVN (δ = 0):
- When ρ(A) > 0, it solves (D) in O(√n/ρ(A)) iterations.
- When ρ(A) < 0, it finds an ε-solution to (P) in O(√n/ε) iterations.

ISPVN (iterated SPVN with gradual reduction of δ):
- When ρ(A) > 0, it solves (D) in O( (√n/ρ(A)) · log(1/ρ(A)) ) iterations.
- When ρ(A) < 0, it finds an ε-solution to (P) in O( (√n/|ρ(A)|) · log(1/ε) ) iterations.
Perceptron and von Neumann's as subgradient algorithms

Let
  φ(y) := −‖y‖²/2 + min_{x ∈ ∆_n} ⟨A^T y, x⟩.

Observe
  max_y φ(y) = min_{x ∈ ∆_n} ½‖Ax‖² = { ½ρ(A)² if ρ(A) > 0;  0 if ρ(A) ≤ 0 }.

PVN Template:
  y_{k+1} = y_k + θ_k(−y_k + A x(y_k))
is a subgradient algorithm for max_y φ(y).

For µ > 0 and x̄ ∈ ∆_n let
  φ_µ(y) := −‖y‖²/2 + min_{x ∈ ∆_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² }
          = −‖y‖²/2 + ⟨A^T y, x_µ(y)⟩ + (µ/2)‖x_µ(y) − x̄‖².
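The case analysis for max_y φ(y) can be checked directly by optimizing first over the direction and then over the norm of y; a short derivation (ours, not on the slides):

```latex
% Write y = t*u with t = ||y|| >= 0 and ||u|| = 1. A linear function is
% minimized over the simplex at a vertex, so
%   phi(y) = -t^2/2 + t * min_j a_j^T u.
% Maximizing over u with ||u|| = 1 gives min_j a_j^T u = rho(A), hence
\max_y \phi(y) \;=\; \max_{t \ge 0}\Bigl(t\,\rho(A) - \tfrac{t^2}{2}\Bigr)
\;=\;\begin{cases}
\tfrac12 \rho(A)^2 & \text{if } \rho(A) > 0 \quad (t^\ast = \rho(A)),\\[2pt]
0 & \text{if } \rho(A) \le 0 \quad (t^\ast = 0).
\end{cases}
```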
Proof of Main Theorem

Apply Nesterov's excessive gap technique (Nesterov, 2005).

Claim. For all x ∈ ∆_n and y ∈ R^m we have φ(y) ≤ ½‖Ax‖².

Claim. For all y ∈ R^m we have φ(y) ≤ φ_µ(y) ≤ φ(y) + 2µ.

Lemma. The iterates x_k ∈ ∆_n, y_k ∈ R^m, k = 0, 1, ..., generated by the SPVN Algorithm satisfy the Excessive Gap Condition
  ½‖A x_k‖² ≤ φ_{µ_k}(y_k).
Proof of Main Theorem (a): ρ(A) > 0

Putting together the two claims and the lemma we get
  ½ρ(A)² ≤ ½‖A x_k‖² ≤ φ_{µ_k}(y_k) ≤ φ(y_k) + 2µ_k.
So φ(y_k) ≥ ½ρ(A)² − 2µ_k.

In the algorithm
  µ_k = n · (1/3)·(2/4) ··· (k/(k+2)) = 2n/((k+1)(k+2)) < 2n/(k+1)².

Thus φ(y_k) > 0 (which forces min_j a_j^T y_k > ‖y_k‖²/2 ≥ 0, i.e., A^T y_k > 0) as soon as
  k ≥ 2√(2n)/ρ(A) − 1.
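The closed form for µ_k follows by unrolling the update µ_{k+1} = (1 − θ_k)µ_k; a quick telescoping check (our own expansion of the slide's product):

```latex
% Unrolling mu_{k+1} = (1 - theta_k) mu_k with theta_k = 2/(k+3) and mu_0 = n:
\mu_k \;=\; n \prod_{j=0}^{k-1}\Bigl(1 - \frac{2}{j+3}\Bigr)
\;=\; n \prod_{j=0}^{k-1}\frac{j+1}{j+3}
\;=\; n \cdot \frac{2\,k!}{(k+2)!}
\;=\; \frac{2n}{(k+1)(k+2)}.
```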