A Primal-Dual Smooth Perceptron-von Neumann Algorithm

Javier Peña, Carnegie Mellon University
(joint work with Negar Soheili)

Shubfest, Fields Institute, May 2012
Polyhedral feasibility problems

Given A := [a_1 a_2 ··· a_n] ∈ R^{m×n}, consider the alternative feasibility problems

  A^T y > 0,   (D)

and

  Ax = 0, x ≥ 0, x ≠ 0.   (P)

Theme: Condition-based analysis of elementary algorithms for solving (P) and (D).
Perceptron Algorithm

Algorithm to solve
  A^T y > 0.   (D)

Perceptron Algorithm (Rosenblatt, 1958)
  y := 0
  while A^T y ≯ 0
    y := y + a_j/‖a_j‖, where a_j^T y ≤ 0
  end while

Throughout this talk: ‖·‖ = ‖·‖_2.
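For concreteness, here is a minimal NumPy sketch of this loop (our own illustration, not from the talk; the iteration cap max_iter is an added safeguard):

```python
import numpy as np

def perceptron(A, max_iter=10_000):
    """Classical perceptron for A^T y > 0.

    A: m x n array whose columns a_j are the data vectors.
    Returns y with A^T y > 0 if found within max_iter updates, else None.
    """
    y = np.zeros(A.shape[0])
    for _ in range(max_iter):
        scores = A.T @ y
        if np.all(scores > 0):
            return y                              # A^T y > 0: done
        j = int(np.argmin(scores))                # a violated index: a_j^T y <= 0
        y = y + A[:, j] / np.linalg.norm(A[:, j])
    return None
```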
Von Neumann's Algorithm

Algorithm to solve
  Ax = 0, x ≥ 0, x ≠ 0.   (P)

Von Neumann's Algorithm (von Neumann, 1948)
  x_0 := (1/n)·1; y_0 := A x_0
  for k = 0, 1, ...
    if a_j^T y_k := min_i a_i^T y_k > 0 then halt: (P) is infeasible
    λ_k := argmin_{λ∈[0,1]} ‖λ y_k + (1−λ) a_j‖ = (1 − a_j^T y_k)/(‖y_k‖² − 2 a_j^T y_k + 1)
    x_{k+1} := λ_k x_k + (1 − λ_k) e_j, where j = argmin_i a_i^T y_k
    y_{k+1} := A x_{k+1}
  end for
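A NumPy sketch of the same iteration, using the closed-form line search above (our own transcription; the eps stopping rule and max_iter guard are illustrative additions, not part of the slide):

```python
import numpy as np

def von_neumann(A, eps=1e-6, max_iter=100_000):
    """Von Neumann's algorithm for Ax = 0, x in the simplex.

    Assumes the columns of A are unit vectors. Returns an eps-solution x
    with ||Ax|| < eps, or y certifying A^T y > 0 (so (P) is infeasible).
    """
    m, n = A.shape
    x = np.full(n, 1.0 / n)
    y = A @ x
    for _ in range(max_iter):
        if np.linalg.norm(y) < eps:
            return 'eps-solution to (P)', x
        scores = A.T @ y
        j = int(np.argmin(scores))
        if scores[j] > 0:
            return '(P) infeasible', y            # min_i a_i^T y > 0
        a_j = A[:, j]
        # exact line search: lam minimizes ||lam*y + (1-lam)*a_j||
        lam = (1.0 - scores[j]) / (y @ y - 2.0 * scores[j] + 1.0)
        lam = min(max(lam, 0.0), 1.0)
        e_j = np.zeros(n); e_j[j] = 1.0
        x = lam * x + (1.0 - lam) * e_j
        y = lam * y + (1.0 - lam) * a_j
    return 'max_iter reached', x
```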
Elementary algorithms

The perceptron and von Neumann's algorithms are "elementary" algorithms. "Elementary" means that each iteration involves only simple computations.

Why should we care about elementary algorithms?
- Some large-scale optimization problems (e.g., in compressive sensing) are not solvable via conventional Newton-based algorithms.
- In some cases, the entire matrix A may not be explicitly available at once.
- Elementary algorithms have been effective in these cases.
Conditioning

Throughout the sequel assume
  A = [a_1 ··· a_n], where ‖a_j‖ = 1, j = 1, ..., n.

Key parameter
  ρ(A) := max_{‖y‖=1} min_{j=1,...,n} a_j^T y.

Goffin-Cheung-Cucker condition number
  C(A) := 1/|ρ(A)|.

(This is closely related to Renegar's condition number.)
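ρ(A) is itself an optimization over the unit sphere and is not trivial to evaluate exactly. As a quick sanity check one can lower-bound it by sampling; the following sketch (the sampling scheme and n_samples are entirely our own illustration, not from the talk) does just that:

```python
import numpy as np

def rho_lower_bound(A, n_samples=100_000, seed=0):
    """Crude Monte Carlo lower bound on rho(A) = max_{||y||=1} min_j a_j^T y.

    Samples random unit vectors y and keeps the best min_j a_j^T y seen.
    Only a lower bound: the true maximizer may be missed.
    """
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n_samples, A.shape[0]))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)   # uniform directions on the sphere
    return float((Y @ A).min(axis=1).max())
```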
Conditioning

Notice
  A^T y > 0 feasible ⇔ ρ(A) > 0.
  Ax = 0, x ≥ 0, x ≠ 0 feasible ⇔ ρ(A) ≤ 0.

Ill-posedness
A is ill-posed when ρ(A) = 0. In this case both A^T y > 0 and Ax = 0, x > 0 are on the verge of feasibility.

Theorem (Cheung & Cucker, 2001)
  |ρ(A)| = min{ max_i ‖ã_i − a_i‖ : Ã is ill-posed }.
Some geometry

When ρ(A) > 0, it is a measure of the thickness of the feasible cone:
  ρ(A) = max_{‖y‖=1} { r : B(y, r) ⊆ {z : A^T z ≥ 0} }.

[Figure: two feasible cones, one thin (small ρ(A)) and one wide (large ρ(A)).]
More geometry

Let ∆_n := {x ≥ 0 : ‖x‖_1 = 1}.

Proposition (from Renegar 1995 and Cheung-Cucker 2001)
  |ρ(A)| = dist(0, ∂{Ax : x ∈ ∆_n}).

[Figure: the set {Ax : x ∈ ∆_n}, with 0 outside it when ρ(A) > 0 and inside it when ρ(A) < 0.]
Condition-based complexity

Recall our problems of interest
  A^T y > 0,   (D)
and
  Ax = 0, x ∈ ∆_n.   (P)

Theorem (Block-Novikoff, 1962)
If ρ(A) > 0, then the perceptron algorithm terminates after at most
  1/ρ(A)² = C(A)²
iterations.
Condition-based complexity

Theorem (Dantzig, 1992)
If ρ(A) < 0, then von Neumann's algorithm finds an ε-solution to (P), i.e., an x ∈ ∆_n with ‖Ax‖ < ε, in at most
  1/ε²
iterations.

Theorem (Epelman & Freund, 2000)
If ρ(A) < 0, then von Neumann's algorithm finds an ε-solution to (P) in at most
  (1/ρ(A)²) · log(1/ε)
iterations.
Main Theorem

Theorem (Soheili & P, 2012)
There is a smooth version of the perceptron/von Neumann algorithm such that:
(a) If ρ(A) > 0, then it finds a solution to A^T y > 0 in at most
  O( (√n/ρ(A)) · log(1/ρ(A)) )
iterations.
(b) If ρ(A) < 0, then it finds an ε-solution to Ax = 0, x ∈ ∆_n in at most
  O( (√n/|ρ(A)|) · log(1/ε) )
iterations.
(c) Its iterations are elementary (not much more complicated than those of the perceptron or von Neumann's algorithms).
Perceptron algorithm again

Perceptron Algorithm
  y_0 := 0
  for k = 0, 1, ...
    a_j^T y_k := min_i a_i^T y_k
    y_{k+1} := y_k + a_j
  end for

Observe
  a_j^T y = min_i a_i^T y ⇔ a_j = A x(y), where x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩.

Hence in the above algorithm y_k = A x_k where x_k ≥ 0, ‖x_k‖_1 = k.
Normalized Perceptron Algorithm

Recall x(y) := argmin_{x ∈ ∆_n} ⟨A^T y, x⟩.

Normalized Perceptron Algorithm
  y_0 := 0
  for k = 0, 1, ...
    θ_k := 1/(k+1)
    y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  end for

In this algorithm y_k = A x_k for x_k ∈ ∆_n.
Perceptron-Von Neumann Template

Both the perceptron and von Neumann's algorithms perform similar iterations.

PVN Template
  x_0 ∈ ∆_n; y_0 := A x_0
  for k = 0, 1, ...
    x_{k+1} := (1 − θ_k) x_k + θ_k x(y_k)
    y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  end for

Observe (see the sketch below):
- Recover the (normalized) perceptron if θ_k = 1/(k+1).
- Recover von Neumann's if θ_k = argmin_{λ ∈ [0,1]} ‖(1 − λ) y_k + λ A x(y_k)‖.
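A sketch of the template with the step size θ_k left pluggable, showing how both special cases drop out (our own code; the names x_of_y, pvn_template, and theta_rule are hypothetical, and the stopping tests are omitted for brevity):

```python
import numpy as np

def x_of_y(A, y):
    """A vertex minimizer of <A^T y, x> over the simplex: e_j with j = argmin_i a_i^T y."""
    e = np.zeros(A.shape[1])
    e[int(np.argmin(A.T @ y))] = 1.0
    return e

def pvn_template(A, theta_rule, n_iter=1000):
    """PVN template; theta_rule(k, y, d) -> step in [0,1], where d = A x(y_k)."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)
    y = A @ x
    for k in range(n_iter):
        xk = x_of_y(A, y)
        d = A @ xk
        t = theta_rule(k, y, d)
        x = (1 - t) * x + t * xk
        y = (1 - t) * y + t * d
    return x, y

# normalized perceptron step size
perceptron_rule = lambda k, y, d: 1.0 / (k + 1)

# von Neumann exact line search: minimize ||(1-t) y + t d|| over t in [0,1]
def von_neumann_rule(k, y, d):
    denom = float((y - d) @ (y - d))
    t = float(y @ (y - d)) / denom if denom > 0 else 0.0
    return min(max(t, 0.0), 1.0)
```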
Smooth Perceptron-Von Neumann Algorithm

Apply Nesterov's smoothing technique (Nesterov, 2005).

Key step: use a smooth version of
  x(y) = argmin_{x ∈ ∆_n} ⟨A^T y, x⟩,
namely,
  x_µ(y) := argmin_{x ∈ ∆_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² },
for some µ > 0 and x̄ ∈ ∆_n.
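Completing the square shows that x_µ(y) is the Euclidean projection of x̄ − (A^T y)/µ onto ∆_n, so it can be computed by the standard sorting-based simplex projection. A sketch under that observation (project_simplex and x_mu are our own names, not from the talk):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x >= 0, sum x = 1},
    via the classical sorting-based method."""
    u = np.sort(v)[::-1]                 # sort entries in decreasing order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.max(np.where(u - css / ks > 0)[0]) + 1
    tau = css[rho - 1] / rho
    return np.maximum(v - tau, 0.0)

def x_mu(A, y, mu, x_bar):
    """x_mu(y) = argmin_{x in simplex} <A^T y, x> + (mu/2)||x - x_bar||^2.

    Completing the square, this is the projection of x_bar - (A^T y)/mu
    onto the simplex."""
    return project_simplex(x_bar - (A.T @ y) / mu)
```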
Smooth Perceptron-Von Neumann Algorithm

Assume x̄ ∈ ∆_n and δ > 0 are given inputs.

Algorithm SPVN(x̄, δ)
  y_0 := A x̄; µ_0 := n; x_0 := x_{µ_0}(y_0)
  for k = 0, 1, ...
    θ_k := 2/(k+3)
    y_{k+1} := (1 − θ_k)(y_k + θ_k A x_k) + θ_k² A x_{µ_k}(y_k)
    µ_{k+1} := (1 − θ_k) µ_k
    x_{k+1} := (1 − θ_k) x_k + θ_k x_{µ_{k+1}}(y_{k+1})
    if A^T y_{k+1} > 0 then halt: y_{k+1} is a solution to (D)
    if ‖A x_{k+1}‖ ≤ δ then halt: x_{k+1} is a δ-solution to (P)
  end for
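A direct transcription of this pseudocode into NumPy, reusing x_mu from the previous sketch (the spvn name and the max_iter guard are our own additions):

```python
import numpy as np

def spvn(A, x_bar, delta, max_iter=100_000):
    """Smooth perceptron-von Neumann, transcribed from the slide's pseudocode.

    Requires x_mu() from the earlier sketch. Returns ('D', y) when A^T y > 0,
    or ('P', x) when ||Ax|| <= delta."""
    n = A.shape[1]
    y = A @ x_bar
    mu = float(n)
    x = x_mu(A, y, mu, x_bar)
    for k in range(max_iter):
        theta = 2.0 / (k + 3)
        y_new = (1 - theta) * (y + theta * (A @ x)) + theta**2 * (A @ x_mu(A, y, mu, x_bar))
        mu = (1 - theta) * mu
        x = (1 - theta) * x + theta * x_mu(A, y_new, mu, x_bar)
        y = y_new
        if np.all(A.T @ y > 0):
            return 'D', y                      # y solves A^T y > 0
        if np.linalg.norm(A @ x) <= delta:
            return 'P', x                      # x is a delta-solution to (P)
    return 'max_iter', x
```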
PVN update versus SPVN update

Update in PVN template
  y_{k+1} := (1 − θ_k) y_k + θ_k A x(y_k)
  x_{k+1} := (1 − θ_k) x_k + θ_k x(y_k)

Update in Algorithm SPVN
  y_{k+1} := (1 − θ_k)(y_k + θ_k A x_k) + θ_k² A x_{µ_k}(y_k)
  µ_{k+1} := (1 − θ_k) µ_k
  x_{k+1} := (1 − θ_k) x_k + θ_k x_{µ_{k+1}}(y_{k+1})
Theorem (Soheili and P, 2011)

Assume x̄ ∈ ∆_n and δ > 0 are given.
(a) If δ < ρ(A), then Algorithm SPVN finds a solution to (D) in at most
  2√(2n)/ρ(A) − 1
iterations.
(b) If ρ(A) < 0, then Algorithm SPVN finds a δ-solution to (P) in at most
  2√(2n)/δ − 1
iterations.
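As a hypothetical smoke test of the spvn sketch above (the instance construction is entirely our own; it tilts every column into a common halfspace, so ρ(A) > 0 and the (D) branch should fire):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 20))
A[0, :] = np.abs(A[0, :]) + 1.0        # every a_j has positive first coordinate
A /= np.linalg.norm(A, axis=0)         # normalize columns so that ||a_j|| = 1
tag, z = spvn(A, np.full(20, 1.0 / 20), 1e-6)
print(tag)                             # expected: 'D', since rho(A) > 0 here
```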
Iterated Smooth Perceptron-Von Neumann Algorithm

Assume γ > 1 is a given constant.

Algorithm ISPVN(γ)
  x̃_0 := (1/n)·1
  for i = 0, 1, ...
    δ_i := ‖A x̃_i‖/γ
    x̃_{i+1} := SPVN(x̃_i, δ_i)
  end for
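A sketch of this outer loop, relying on the spvn function from the earlier sketch (the eps tolerance and max_outer guard are our own illustrative additions):

```python
import numpy as np

def ispvn(A, gamma=2.0, eps=1e-6, max_outer=100):
    """Iterated SPVN: restart SPVN from the last iterate with a
    gradually shrinking target delta; gamma > 1 controls the shrink rate."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)
    for _ in range(max_outer):
        if np.linalg.norm(A @ x) <= eps:
            return 'P', x                      # eps-solution to (P)
        delta = np.linalg.norm(A @ x) / gamma
        tag, z = spvn(A, x, delta)
        if tag == 'D':
            return 'D', z                      # solution to A^T y > 0
        x = z
    return 'max_outer', x
```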
Main Theorem Again

Theorem (Soheili & P, 2012)
(a) If ρ(A) > 0, then each call to SPVN in Algorithm ISPVN halts in at most 2√(2n)/ρ(A) − 1 iterations. Consequently, Algorithm ISPVN finds a solution to (D) in at most
  ( 2√(2n)/ρ(A) − 1 ) · log(1/ρ(A))/log(γ)
SPVN iterations.
(b) If ρ(A) < 0, then each call to SPVN in Algorithm ISPVN halts in at most 2γ√(2n)/|ρ(A)| − 1 iterations. Hence for ε > 0, Algorithm ISPVN finds an ε-solution to (P) in at most
  ( 2γ√(2n)/|ρ(A)| − 1 ) · log(1/ε)/log(γ)
SPVN iterations.
Observe

A "pure" SPVN (δ = 0):
- When ρ(A) > 0, it solves (D) in O(√n/ρ(A)) iterations.
- When ρ(A) < 0, it finds an ε-solution to (P) in O(√n/ε) iterations.

ISPVN (iterated SPVN with gradual reduction of δ):
- When ρ(A) > 0, it solves (D) in O( (√n/ρ(A)) · log(1/ρ(A)) ) iterations.
- When ρ(A) < 0, it finds an ε-solution to (P) in O( (√n/|ρ(A)|) · log(1/ε) ) iterations.
Perceptron and von Neumann's as subgradient algorithms

Let
  φ(y) := −‖y‖²/2 + min_{x ∈ ∆_n} ⟨A^T y, x⟩.

Observe
  max_y φ(y) = min_{x ∈ ∆_n} ½‖Ax‖² = { ½ρ(A)² if ρ(A) > 0;  0 if ρ(A) ≤ 0 }.

PVN Template:
  y_{k+1} = y_k + θ_k(−y_k + A x(y_k))
is a subgradient algorithm for max_y φ(y).

For µ > 0 and x̄ ∈ ∆_n let
  φ_µ(y) := −‖y‖²/2 + min_{x ∈ ∆_n} { ⟨A^T y, x⟩ + (µ/2)‖x − x̄‖² }
          = −‖y‖²/2 + ⟨A^T y, x_µ(y)⟩ + (µ/2)‖x_µ(y) − x̄‖².
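The case analysis for max_y φ(y) can be checked directly by optimizing first over the direction and then over the norm of y; a short derivation (ours, not on the slides):

```latex
% Write y = t*u with t = ||y|| >= 0 and ||u|| = 1. A linear function is
% minimized over the simplex at a vertex, so
%   phi(y) = -t^2/2 + t * min_j a_j^T u.
% Maximizing over u with ||u|| = 1 gives min_j a_j^T u = rho(A), hence
\max_y \phi(y) \;=\; \max_{t \ge 0}\Bigl(t\,\rho(A) - \tfrac{t^2}{2}\Bigr)
\;=\;\begin{cases}
\tfrac12 \rho(A)^2 & \text{if } \rho(A) > 0 \quad (t^\ast = \rho(A)),\\[2pt]
0 & \text{if } \rho(A) \le 0 \quad (t^\ast = 0).
\end{cases}
```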
Proof of Main Theorem

Apply Nesterov's excessive gap technique (Nesterov, 2005).

Claim. For all x ∈ ∆_n and y ∈ R^m we have φ(y) ≤ ½‖Ax‖².

Claim. For all y ∈ R^m we have φ(y) ≤ φ_µ(y) ≤ φ(y) + 2µ.

Lemma. The iterates x_k ∈ ∆_n, y_k ∈ R^m, k = 0, 1, ..., generated by the SPVN Algorithm satisfy the Excessive Gap Condition
  ½‖A x_k‖² ≤ φ_{µ_k}(y_k).
Proof of Main Theorem (a): ρ(A) > 0

Putting together the two claims and the lemma we get
  ½ρ(A)² ≤ ½‖A x_k‖² ≤ φ_{µ_k}(y_k) ≤ φ(y_k) + 2µ_k.
So φ(y_k) ≥ ½ρ(A)² − 2µ_k.

In the algorithm
  µ_k = n · (1/3)·(2/4) ··· (k/(k+2)) = 2n/((k+1)(k+2)) < 2n/(k+1)².

Thus φ(y_k) > 0 (which forces min_j a_j^T y_k > ‖y_k‖²/2 ≥ 0, i.e., A^T y_k > 0) as soon as
  k ≥ 2√(2n)/ρ(A) − 1.
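The closed form for µ_k follows by unrolling the update µ_{k+1} = (1 − θ_k)µ_k; a quick telescoping check (our own expansion of the slide's product):

```latex
% Unrolling mu_{k+1} = (1 - theta_k) mu_k with theta_k = 2/(k+3) and mu_0 = n:
\mu_k \;=\; n \prod_{j=0}^{k-1}\Bigl(1 - \frac{2}{j+3}\Bigr)
\;=\; n \prod_{j=0}^{k-1}\frac{j+1}{j+3}
\;=\; n \cdot \frac{2\,k!}{(k+2)!}
\;=\; \frac{2n}{(k+1)(k+2)}.
```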