Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Marc Teboulle
School of Mathematical Sciences, Tel Aviv University

Joint work with Ronny Luss

Optimization and Statistical Learning (OSL 2013), January 6–11, 2013, Les Houches, France
Sparsity Constrained Rank-One Matrix Approximation ≡ PCA

Principal Component Analysis solves
$$\min\{\|A - xx^T\|_F^2 : \|x\|_2 = 1,\ x \in \mathbb{R}^n\} \;\Longleftrightarrow\; \max\{x^T A x : \|x\|_2 = 1,\ x \in \mathbb{R}^n\}, \quad (A \in \mathbb{S}^n_+)$$

Sparse Principal Component Analysis solves
$$\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_0 \le k,\ x \in \mathbb{R}^n\}, \quad k \in [1, n]$$
where the sparsity measure $\|x\|_0$ counts the number of nonzero entries of $x$.

Difficulties:
1. Maximizing a convex objective.
2. The hard nonconvex constraint $\|x\|_0 \le k$.

Current approaches:
1. SDP convex relaxations [d'Aspremont-El Ghaoui-Jordan-Lanckriet 07]
2. Approximation/modified formulations [Many...]
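To make the contrast concrete, here is a minimal illustrative NumPy sketch (not from the talk; the function names and the brute-force strategy are my own): PCA reduces to a leading-eigenvector computation, while the $\ell_0$-constrained problem is combinatorial, solved exactly below only by enumerating supports, which is feasible only for tiny $n$.

```python
import numpy as np
from itertools import combinations

def pca_leading(A):
    """PCA: max x^T A x s.t. ||x||_2 = 1 is solved by the leading eigenvector."""
    w, V = np.linalg.eigh(A)          # eigenvalues in ascending order
    return V[:, -1], w[-1]

def sparse_pca_bruteforce(A, k):
    """Sparse PCA: max x^T A x s.t. ||x||_2 = 1, ||x||_0 <= k.
    Exact but exponential: for each size-k support S, the best unit vector
    supported on S is the leading eigenvector of the principal submatrix
    A[S, S]. (For PSD A, size-k supports subsume smaller ones, since
    restricting the support can only decrease the maximum.)"""
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    for S in combinations(range(n), k):
        idx = np.array(S)
        w, V = np.linalg.eigh(A[np.ix_(idx, idx)])
        if w[-1] > best_val:
            best_val = w[-1]
            best_x = np.zeros(n)
            best_x[idx] = V[:, -1]
    return best_x, best_val

# Tiny example: the sparse optimum is always <= the PCA optimum.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 6))
A = B.T @ B                            # A is positive semidefinite
x_pca, val_pca = pca_leading(A)
x_sp, val_sp = sparse_pca_bruteforce(A, k=2)
print(val_pca, val_sp)
```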
Sparse PCA via Penalization/Relaxation/Approximation

The problem of interest is the difficult sparse PCA problem as is:
$$\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_0 \le k,\ x \in \mathbb{R}^n\}$$

The literature has focused on solving various modifications:

- $\ell_0$-penalized PCA: $\max\{x^T A x - s\|x\|_0 : \|x\|_2 = 1\},\ s > 0$
- Relaxed $\ell_1$-constrained PCA (using $\|x\|_1 \le \sqrt{\|x\|_0}\,\|x\|_2$ for all $x$): $\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_1 \le \sqrt{k}\}$
- Relaxed $\ell_1$-penalized PCA: $\max\{x^T A x - s\|x\|_1 : \|x\|_2 = 1\}$
- Approximate-penalized, using a concave approximation of $\|x\|_0$: $\max\{x^T A x - s\,\varphi_p(|x|) : \|x\|_2 = 1\}$, where $\varphi_p(x) \simeq \|x\|_0$ as $p \to 0^+$
- SDP convex relaxation: $\max\{\mathrm{tr}(AX) : \mathrm{tr}(X) = 1,\ X \succeq 0,\ \|X\|_1 \le k\}$

Convex relaxations can be computationally expensive for very large problems and will not be discussed here.
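For the record, the inequality behind the $\ell_1$ relaxations is a one-line Cauchy-Schwarz argument (spelled out here; it is stated but not derived above):

$$\|x\|_1 = \sum_{i \in S} |x_i| = \langle \mathbf{1}_S, |x| \rangle \le \|\mathbf{1}_S\|_2\,\|x\|_2 = \sqrt{|S|}\,\|x\|_2 = \sqrt{\|x\|_0}\,\|x\|_2, \qquad S = \{i : x_i \neq 0\}.$$

Hence $\|x\|_2 = 1$ and $\|x\|_0 \le k$ imply $\|x\|_1 \le \sqrt{k}$, which is exactly the relaxed constraint.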
Quick Highlight of Simple Algorithms on "Modified Problems"

$\ell_1$-constrained [Witten et al. (2009)], per-iteration complexity $O(n^2)$, $O(mn)$:
$$x_i^{j+1} = \frac{\operatorname{sgn}\!\big(((A+\sigma^2 I)x^j)_i\big)\big(|((A+\sigma^2 I)x^j)_i| - \lambda^j\big)_+}{\sqrt{\sum_h \big(|((A+\sigma^2 I)x^j)_h| - \lambda^j\big)_+^2}}$$

$\ell_1$-constrained [Sigg-Bühmann (2008)], per-iteration complexity $O(n^2)$, $O(mn)$:
$$x_i^{j+1} = \frac{\operatorname{sgn}\!\big((Ax^j)_i\big)\big(|(Ax^j)_i| - s^j\big)_+}{\sqrt{\sum_h \big(|(Ax^j)_h| - s^j\big)_+^2}}, \quad \text{where } s^j \text{ is the } (k+1)\text{-largest entry of } |Ax^j|$$

$\ell_0$-penalized [Shen-Huang (2008), Journée et al. (2010)], per-iteration complexity $O(mn)$:
$$z^{j+1} = \frac{\sum_i \big[\operatorname{sgn}\big((b_i^T z^j)^2 - s\big)\big]_+ (b_i^T z^j)\, b_i}{\big\|\sum_i \big[\operatorname{sgn}\big((b_i^T z^j)^2 - s\big)\big]_+ (b_i^T z^j)\, b_i\big\|_2}$$

$\ell_0$-penalized [Sriperumbudur et al. (2010)], per-iteration complexity $O(n^2)$:
$$x_i^{j+1} = \frac{\operatorname{sgn}\!\big(2(Ax^j)_i\big)\big(|2(Ax^j)_i| - s\,\varphi_p'(|x_i^j|)\big)_+}{\sqrt{\sum_h \big(|2(Ax^j)_h| - s\,\varphi_p'(|x_h^j|)\big)_+^2}}$$

$\ell_1$-penalized [Zou et al. (2006)]:
$$y^{j+1} = \operatorname*{argmin}_y \Big\{\sum_i \|b_i - x^j y^T b_i\|_2^2 + \lambda\|y\|_2^2 + s\|y\|_1\Big\}, \qquad x^{j+1} = \frac{\big(\sum_i b_i b_i^T\big) y^{j+1}}{\big\|\big(\sum_i b_i b_i^T\big) y^{j+1}\big\|_2}$$

$\ell_1$-penalized [Shen-Huang (2008), Journée et al. (2010)], per-iteration complexity $O(mn)$:
$$z^{j+1} = \frac{\sum_i \big(|b_i^T z^j| - s\big)_+ \operatorname{sgn}(b_i^T z^j)\, b_i}{\big\|\sum_i \big(|b_i^T z^j| - s\big)_+ \operatorname{sgn}(b_i^T z^j)\, b_i\big\|_2}$$

Table: Cheap sparse PCA algorithms for modified problems. (Here the $b_i$ are the $m$ data vectors, so that $A = \sum_i b_i b_i^T$, and $(t)_+ = \max\{t, 0\}$.)
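To make the "cheap" claim concrete, here is an illustrative NumPy sketch of the Sigg-Bühmann-style row above; the function name, initialization, and stopping rule are my own choices, not taken from the cited papers.

```python
import numpy as np

def thresholded_power_iteration(A, k, max_iter=500, tol=1e-8, seed=0):
    """Illustrative sketch of the second row of the table: soft-threshold
    A x^j at s^j = the (k+1)-largest entry of |A x^j|, then renormalize.
    Each iteration costs one matrix-vector product, i.e., O(n^2).
    Assumes k < n."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(max_iter):
        g = A @ x
        s = np.sort(np.abs(g))[-(k + 1)]               # (k+1)-largest entry of |Ax^j|
        y = np.sign(g) * np.maximum(np.abs(g) - s, 0)  # at most k entries survive
        nrm = np.linalg.norm(y)
        if nrm == 0:                                   # degenerate threshold; stop
            break
        y /= nrm
        if np.linalg.norm(y - x) < tol:
            return y
        x = y
    return x                                           # unit vector with ||x||_0 <= k
```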
A Plethora of Models/Algorithms Revisited

All of the previously listed algorithms have been derived from disparate approaches/motivations to solve modifications of SPCA:
- Nonsmooth reformulations
- Expectation Maximization
- Majorization-Minimization techniques
- DC programming
- ... etc.

Q1: Are all these algorithms different? ... Any connection?

Our problem of interest is the difficult sparse PCA problem "as is":
$$\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_0 \le k,\ x \in \mathbb{R}^n\}$$

Q2: Is it possible to derive a simple/cheap scheme to tackle the sparse PCA problem as is, directly? (An illustrative sketch of what such a scheme could look like follows.)
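Q2 is the question the rest of the talk addresses. Purely as an illustrative sketch of the conditional gradient idea in the title (my own reading, not necessarily the scheme the talk derives): since $f(x) = x^T A x$ is convex, $f(y) \ge f(x^j) + \langle \nabla f(x^j), y - x^j \rangle$ for all $y$, so maximizing the linearization of $f$ over the nonconvex feasible set still produces a monotone ascent step; and maximizing a linear form over $\{\|x\|_2 = 1,\ \|x\|_0 \le k\}$ has a closed form: keep the $k$ largest-magnitude entries of $\nabla f(x^j) = 2Ax^j$ and renormalize.

```python
import numpy as np

def conditional_gradient_sparse_pca(A, k, max_iter=200, seed=0):
    """Hypothetical sketch: ascend f(x) = x^T A x over the nonconvex set
    {||x||_2 = 1, ||x||_0 <= k} by maximizing the linearization of f at
    each iterate, i.e., hard-threshold A x^j to its k largest-magnitude
    entries and renormalize."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(max_iter):
        g = A @ x                               # gradient direction (up to a factor 2)
        y = np.zeros(n)
        top = np.argsort(np.abs(g))[-k:]        # support of the k largest |g_i|
        y[top] = g[top]
        nrm = np.linalg.norm(y)
        if nrm == 0:                            # degenerate gradient; stop
            break
        y /= nrm
        if np.allclose(y, x):                   # fixed point of the linearized step
            break
        x = y                                   # convexity of f guarantees ascent
    return x
```

Each step costs a single matrix-vector product, matching the per-iteration cost of the schemes in the table; whether this is in fact the scheme the talk proposes is, of course, for the remaining slides.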