Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Marc Teboulle
School of Mathematical Sciences, Tel Aviv University

Joint work with Ronny Luss

Optimization and Statistical Learning (OSL 2013), January 6–11, 2013, Les Houches, France
Sparsity Constrained Rank-One Matrix Approximation ≡ PCA

Principal Component Analysis solves
$$\min\{\|A - xx^T\|_F^2 : \|x\|_2 = 1,\ x \in \mathbb{R}^n\} \;\Longleftrightarrow\; \max\{x^T A x : \|x\|_2 = 1,\ x \in \mathbb{R}^n\}, \quad (A \in \mathbb{S}^n_+)$$

Sparse Principal Component Analysis solves
$$\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_0 \le k,\ x \in \mathbb{R}^n\}, \quad k \in [1, n]$$
where the sparsity measure $\|x\|_0$ counts the number of nonzero entries of $x$.

Difficulties:
1. Maximizing a convex objective.
2. The hard nonconvex constraint $\|x\|_0 \le k$.

Current approaches:
1. SDP convex relaxations [d'Aspremont-El Ghaoui-Jordan-Lanckriet 07]
2. Approximation/modified formulations [Many...]
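To make the contrast concrete, here is a minimal illustrative NumPy sketch (not from the talk; the function names and the brute-force strategy are my own): PCA reduces to a leading-eigenvector computation, while the $\ell_0$-constrained problem is combinatorial, solved exactly below only by enumerating supports, which is feasible only for tiny $n$.

```python
import numpy as np
from itertools import combinations

def pca_leading(A):
    """PCA: max x^T A x s.t. ||x||_2 = 1 is solved by the leading eigenvector."""
    w, V = np.linalg.eigh(A)          # eigenvalues in ascending order
    return V[:, -1], w[-1]

def sparse_pca_bruteforce(A, k):
    """Sparse PCA: max x^T A x s.t. ||x||_2 = 1, ||x||_0 <= k.
    Exact but exponential: for each size-k support S, the best unit vector
    supported on S is the leading eigenvector of the principal submatrix
    A[S, S]. (For PSD A, size-k supports subsume smaller ones, since
    restricting the support can only decrease the maximum.)"""
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    for S in combinations(range(n), k):
        idx = np.array(S)
        w, V = np.linalg.eigh(A[np.ix_(idx, idx)])
        if w[-1] > best_val:
            best_val = w[-1]
            best_x = np.zeros(n)
            best_x[idx] = V[:, -1]
    return best_x, best_val

# Tiny example: the sparse optimum is always <= the PCA optimum.
rng = np.random.default_rng(0)
B = rng.standard_normal((20, 6))
A = B.T @ B                            # A is positive semidefinite
x_pca, val_pca = pca_leading(A)
x_sp, val_sp = sparse_pca_bruteforce(A, k=2)
print(val_pca, val_sp)
```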
Sparse PCA via Penalization/Relaxation/Approximation

The problem of interest is the difficult sparse PCA problem as is:
$$\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_0 \le k,\ x \in \mathbb{R}^n\}$$

The literature has focused on solving various modifications:

- $\ell_0$-penalized PCA: $\max\{x^T A x - s\|x\|_0 : \|x\|_2 = 1\},\ s > 0$
- Relaxed $\ell_1$-constrained PCA (using $\|x\|_1 \le \sqrt{\|x\|_0}\,\|x\|_2$ for all $x$): $\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_1 \le \sqrt{k}\}$
- Relaxed $\ell_1$-penalized PCA: $\max\{x^T A x - s\|x\|_1 : \|x\|_2 = 1\}$
- Approximate-penalized, using a concave approximation of $\|x\|_0$: $\max\{x^T A x - s\,\varphi_p(|x|) : \|x\|_2 = 1\}$, where $\varphi_p(x) \simeq \|x\|_0$ as $p \to 0^+$
- SDP convex relaxation: $\max\{\mathrm{tr}(AX) : \mathrm{tr}(X) = 1,\ X \succeq 0,\ \|X\|_1 \le k\}$

Convex relaxations can be computationally expensive for very large problems and will not be discussed here.
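For the record, the inequality behind the $\ell_1$ relaxations is a one-line Cauchy-Schwarz argument (spelled out here; it is stated but not derived above):

$$\|x\|_1 = \sum_{i \in S} |x_i| = \langle \mathbf{1}_S, |x| \rangle \le \|\mathbf{1}_S\|_2\,\|x\|_2 = \sqrt{|S|}\,\|x\|_2 = \sqrt{\|x\|_0}\,\|x\|_2, \qquad S = \{i : x_i \neq 0\}.$$

Hence $\|x\|_2 = 1$ and $\|x\|_0 \le k$ imply $\|x\|_1 \le \sqrt{k}$, which is exactly the relaxed constraint.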
Quick Highlight of Simple Algorithms on "Modified Problems"

$\ell_1$-constrained [Witten et al. (2009)], per-iteration complexity $O(n^2)$, $O(mn)$:
$$x_i^{j+1} = \frac{\operatorname{sgn}\!\big(((A+\sigma^2 I)x^j)_i\big)\big(|((A+\sigma^2 I)x^j)_i| - \lambda^j\big)_+}{\sqrt{\sum_h \big(|((A+\sigma^2 I)x^j)_h| - \lambda^j\big)_+^2}}$$

$\ell_1$-constrained [Sigg-Bühmann (2008)], per-iteration complexity $O(n^2)$, $O(mn)$:
$$x_i^{j+1} = \frac{\operatorname{sgn}\!\big((Ax^j)_i\big)\big(|(Ax^j)_i| - s^j\big)_+}{\sqrt{\sum_h \big(|(Ax^j)_h| - s^j\big)_+^2}}, \quad \text{where } s^j \text{ is the } (k+1)\text{-largest entry of } |Ax^j|$$

$\ell_0$-penalized [Shen-Huang (2008), Journée et al. (2010)], per-iteration complexity $O(mn)$:
$$z^{j+1} = \frac{\sum_i \big[\operatorname{sgn}\big((b_i^T z^j)^2 - s\big)\big]_+ (b_i^T z^j)\, b_i}{\big\|\sum_i \big[\operatorname{sgn}\big((b_i^T z^j)^2 - s\big)\big]_+ (b_i^T z^j)\, b_i\big\|_2}$$

$\ell_0$-penalized [Sriperumbudur et al. (2010)], per-iteration complexity $O(n^2)$:
$$x_i^{j+1} = \frac{\operatorname{sgn}\!\big(2(Ax^j)_i\big)\big(|2(Ax^j)_i| - s\,\varphi_p'(|x_i^j|)\big)_+}{\sqrt{\sum_h \big(|2(Ax^j)_h| - s\,\varphi_p'(|x_h^j|)\big)_+^2}}$$

$\ell_1$-penalized [Zou et al. (2006)]:
$$y^{j+1} = \operatorname*{argmin}_y \Big\{\sum_i \|b_i - x^j y^T b_i\|_2^2 + \lambda\|y\|_2^2 + s\|y\|_1\Big\}, \qquad x^{j+1} = \frac{\big(\sum_i b_i b_i^T\big) y^{j+1}}{\big\|\big(\sum_i b_i b_i^T\big) y^{j+1}\big\|_2}$$

$\ell_1$-penalized [Shen-Huang (2008), Journée et al. (2010)], per-iteration complexity $O(mn)$:
$$z^{j+1} = \frac{\sum_i \big(|b_i^T z^j| - s\big)_+ \operatorname{sgn}(b_i^T z^j)\, b_i}{\big\|\sum_i \big(|b_i^T z^j| - s\big)_+ \operatorname{sgn}(b_i^T z^j)\, b_i\big\|_2}$$

Table: Cheap sparse PCA algorithms for modified problems. (Here the $b_i$ are the $m$ data vectors, so that $A = \sum_i b_i b_i^T$, and $(t)_+ = \max\{t, 0\}$.)
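To make the "cheap" claim concrete, here is an illustrative NumPy sketch of the Sigg-Bühmann-style row above; the function name, initialization, and stopping rule are my own choices, not taken from the cited papers.

```python
import numpy as np

def thresholded_power_iteration(A, k, max_iter=500, tol=1e-8, seed=0):
    """Illustrative sketch of the second row of the table: soft-threshold
    A x^j at s^j = the (k+1)-largest entry of |A x^j|, then renormalize.
    Each iteration costs one matrix-vector product, i.e., O(n^2).
    Assumes k < n."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(max_iter):
        g = A @ x
        s = np.sort(np.abs(g))[-(k + 1)]               # (k+1)-largest entry of |Ax^j|
        y = np.sign(g) * np.maximum(np.abs(g) - s, 0)  # at most k entries survive
        nrm = np.linalg.norm(y)
        if nrm == 0:                                   # degenerate threshold; stop
            break
        y /= nrm
        if np.linalg.norm(y - x) < tol:
            return y
        x = y
    return x                                           # unit vector with ||x||_0 <= k
```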
A Plethora of Models/Algorithms Revisited

All of the previously listed algorithms have been derived from disparate approaches/motivations to solve modifications of SPCA:
- Nonsmooth reformulations
- Expectation Maximization
- Majorization-Minimization techniques
- DC programming
- ... etc.

Q1: Are all these algorithms different? ... Any connection?

Our problem of interest is the difficult sparse PCA problem "as is":
$$\max\{x^T A x : \|x\|_2 = 1,\ \|x\|_0 \le k,\ x \in \mathbb{R}^n\}$$

Q2: Is it possible to derive a simple/cheap scheme to tackle the sparse PCA problem as is, directly? (An illustrative sketch of what such a scheme could look like follows.)
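Q2 is the question the rest of the talk addresses. Purely as an illustrative sketch of the conditional gradient idea in the title (my own reading, not necessarily the scheme the talk derives): since $f(x) = x^T A x$ is convex, $f(y) \ge f(x^j) + \langle \nabla f(x^j), y - x^j \rangle$ for all $y$, so maximizing the linearization of $f$ over the nonconvex feasible set still produces a monotone ascent step; and maximizing a linear form over $\{\|x\|_2 = 1,\ \|x\|_0 \le k\}$ has a closed form: keep the $k$ largest-magnitude entries of $\nabla f(x^j) = 2Ax^j$ and renormalize.

```python
import numpy as np

def conditional_gradient_sparse_pca(A, k, max_iter=200, seed=0):
    """Hypothetical sketch: ascend f(x) = x^T A x over the nonconvex set
    {||x||_2 = 1, ||x||_0 <= k} by maximizing the linearization of f at
    each iterate, i.e., hard-threshold A x^j to its k largest-magnitude
    entries and renormalize."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(max_iter):
        g = A @ x                               # gradient direction (up to a factor 2)
        y = np.zeros(n)
        top = np.argsort(np.abs(g))[-k:]        # support of the k largest |g_i|
        y[top] = g[top]
        nrm = np.linalg.norm(y)
        if nrm == 0:                            # degenerate gradient; stop
            break
        y /= nrm
        if np.allclose(y, x):                   # fixed point of the linearized step
            break
        x = y                                   # convexity of f guarantees ascent
    return x
```

Each step costs a single matrix-vector product, matching the per-iteration cost of the schemes in the table; whether this is in fact the scheme the talk proposes is, of course, for the remaining slides.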