Nonlinear Eigenproblems in Data Analysis and Graph Partitioning

Matthias Hein
Department of Mathematics and Computer Science
Saarland University, Saarbrücken, Germany

Minisymposium: Modern matrix methods for large scale data and networks
SIAM Conference on Applied Linear Algebra, Valencia, 19.06.2012
Linear Eigenproblems in Machine Learning

Motivation: eigenvalue problems are abundant in data analysis.

- Principal Component Analysis: largest eigenvectors of the covariance matrix of the data.
  Usage: denoising by projection onto the largest eigenvectors.
- Spectral Clustering: second smallest eigenvector of the graph Laplacian.
  Usage: graph partitioning using the thresholded eigenvector.
- Latent Semantic Analysis: singular value decomposition of the term-document matrix.
  Usage: recovering the underlying latent semantic structure.
- Many more!
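To make the PCA entry concrete, here is a minimal NumPy sketch (not from the talk; the data and the choice k = 2 are made up for illustration) of computing the principal components as eigenvectors of the covariance matrix and denoising by projection onto the largest ones:

```python
import numpy as np

# Toy data: 200 points in R^5 with anisotropic variance (made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

# PCA: eigenvectors of the covariance matrix of the data.
C = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(C)            # eigenvalues in ascending order
order = np.argsort(evals)[::-1]             # largest eigenvalues first
evals, evecs = evals[order], evecs[:, order]

# Denoising by projection onto the k largest eigenvectors.
k = 2
Wk = evecs[:, :k]
mu = X.mean(axis=0)
X_denoised = (X - mu) @ Wk @ Wk.T + mu
```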
The Symmetric Linear Eigenproblem

Generalized Symmetric Linear Eigenproblem: Let $A, B \in \mathbb{R}^{n \times n}$ be symmetric and $B$ positive definite. Then

$$ Ax = \frac{\langle x, Ax \rangle}{\langle x, Bx \rangle} \, Bx \iff x \text{ is a critical point of } \frac{\langle x, Ax \rangle}{\langle x, Bx \rangle}. $$

Variational principle: the Courant-Fischer min-max theorem yields $n$ eigenvalues:

$$ \lambda_m = \min_{U_m \in \mathcal{U}_m} \max_{x \in U_m} \frac{\langle x, Ax \rangle}{\langle x, Bx \rangle}, \quad m = 1, \dots, n, $$

where $\mathcal{U}_m$ is the class of all $m$-dimensional subspaces of $\mathbb{R}^n$.

Critical point theory for ratios of quadratic functions.
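A short numerical sketch of this setup, using SciPy's generalized symmetric eigensolver on arbitrary stand-in matrices, checking that each eigenvector is a critical point of the Rayleigh quotient with value equal to its eigenvalue:

```python
import numpy as np
from scipy.linalg import eigh

# Arbitrary symmetric A and positive definite B (stand-ins).
rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
A = (M + M.T) / 2
B = M @ M.T + 5 * np.eye(5)

# Generalized symmetric eigenproblem A x = lambda B x.
lams, V = eigh(A, b=B)                      # eigenvalues sorted ascending

# The Rayleigh quotient <x, Ax> / <x, Bx> at an eigenvector equals the
# eigenvalue; Courant-Fischer characterizes lambda_m as a min-max of
# this quotient over m-dimensional subspaces.
x = V[:, 0]
assert np.isclose((x @ A @ x) / (x @ B @ x), lams[0])
```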
Robust PCA

Principal Component Analysis (PCA)

[Figure: 2D scatter plot of the data showing the first PCA component for the original data and for the perturbed data.]

Sources of outliers: noisy data, adversarial manipulation.

                PCA (linear)                                                                       Robust PCA (nonlinear)
Ratio           $\frac{\sum_{i=1}^n \langle w, X_i - \frac{1}{n} \sum_{j=1}^n X_j \rangle^2}{\|w\|_2^2}$   $\Rightarrow$   $\frac{V(\langle w, X_1 \rangle, \dots, \langle w, X_n \rangle)}{\|w\|_2}$
Robustness      no                                                                                 yes
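A hedged sketch of the robustness column: the table only requires $V$ to be a robust scale measure of the projections, so the median absolute deviation used below is one illustrative choice, not necessarily the one intended in the talk:

```python
import numpy as np

# Toy 2D data plus one adversarial outlier (made up).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2)) * np.array([3.0, 0.3])
X_out = np.vstack([X, [[0.0, 50.0]]])

def pca_ratio(w, X):
    # Linear PCA objective: centered projected variance / ||w||_2^2.
    p = X @ w
    return np.sum((p - p.mean()) ** 2) / (w @ w)

def robust_ratio(w, X):
    # Nonlinear variant V(<w,X_1>,...,<w,X_n>) / ||w||_2 with V chosen
    # here as the median absolute deviation (one possible robust scale).
    p = X @ w
    return np.median(np.abs(p - np.median(p))) / np.linalg.norm(w)

e2 = np.array([0.0, 1.0])
# A single outlier inflates the quadratic objective along e2
# dramatically, while the MAD-based objective barely moves.
print(pca_ratio(e2, X), pca_ratio(e2, X_out))
print(robust_ratio(e2, X), robust_ratio(e2, X_out))
```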
The Symmetric Linear Eigenproblem

Pros:
- Fast solvers available.

Cons: restriction to ratios of quadratic functionals ⟹ limited modeling abilities:
- quadratic functionals are non-robust against outliers (PCA),
- quadratic functionals cannot induce eigenvectors which are sparse.

Idea: replace the quadratic functionals by convex p-homogeneous functions!
The Nonlinear Eigenproblem

(Homogeneous) Nonlinear Eigenproblem: Let $R, S : \mathbb{R}^n \to \mathbb{R}$ be convex, even and $p$-homogeneous ($R(\gamma x) = |\gamma|^p R(x)$) with $S(x) = 0 \Leftrightarrow x = 0$. Then

$$ 0 \in \partial R(x) - \frac{R(x)}{S(x)} \, \partial S(x) \impliedby x \text{ is a critical point of } \frac{R(x)}{S(x)}. $$

Variational principle: the Lusternik-Schnirelmann min-max theorem yields $n$ nonlinear eigenvalues:

$$ \lambda_m = \min_{K \in \mathcal{K}_m} \max_{x \in K} \frac{R(x)}{S(x)}, \quad m = 1, \dots, n, $$

where $\mathcal{K}_m$ is the class of all compact symmetric subsets of $\{x \in \mathbb{R}^n \mid S(x) > 0\}$ with Krasnoselskii genus greater than or equal to $m$.

New: in general, more than $n$ eigenvectors exist.
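For intuition, a minimal numerical check of these definitions on a concrete 1-homogeneous pair, the graph total variation and the 1-norm (the same pair reappears later in the 1-spectral clustering part); the small weight matrix is invented for illustration:

```python
import numpy as np

# Concrete 1-homogeneous pair on a small weighted graph:
#   R(f) = (1/2) sum_ij w_ij |f_i - f_j|   (graph total variation)
#   S(f) = ||f||_1
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])

def R(f):
    return 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :]))

def S(f):
    return np.sum(np.abs(f))

# Both are convex, even and p-homogeneous with p = 1:
f, gamma = np.array([1.0, -2.0, 0.5]), -3.0
assert np.isclose(R(gamma * f), abs(gamma) * R(f))
assert np.isclose(S(gamma * f), abs(gamma) * S(f))
assert S(np.zeros(3)) == 0.0                # S(x) = 0 iff x = 0
```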
The Nonlinear Eigenproblem II

Pros:
- Stronger modeling power using non-quadratic functions R and S.
- Specific properties of the eigenvectors, such as robustness against outliers or sparsity, can be induced by nonsmooth choices of S and R, respectively.

Challenges:
- The optimization problems for nonlinear eigenproblems are typically nonconvex and nonsmooth.
- Need for new efficient algorithms!
(Inverse) Power Method for Nonlinear Eigenproblems

Inverse Power Method for Linear Eigenproblems:

$$ f^{k+1} = \arg\min_{u \in \mathbb{R}^n} \tfrac{1}{2} \langle u, Au \rangle - \langle u, B f^k \rangle \iff A f^{k+1} = B f^k. $$

The sequence $f^k$ converges to the smallest eigenvector of the generalized eigenproblem.

Inverse Power Method for Nonlinear Eigenproblems (Hein, Bühler (2010)):

Case $p > 1$:
$$ g^{k+1} = \arg\min_{u \in \mathbb{R}^n} \{ R(u) - \langle u, s(f^k) \rangle \}, \qquad f^{k+1} = g^{k+1} / S(g^{k+1})^{1/p}, $$

Case $p = 1$:
$$ f^{k+1} = \arg\min_{\|u\|_2 \le 1} \{ R(u) - \lambda^k \langle u, s(f^k) \rangle \}, $$

where in both cases $s(f^{k+1}) \in \partial S(f^{k+1})$ and $\lambda^{k+1} = R(f^{k+1}) / S(f^{k+1})$.
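A compact sketch of the linear method in the form stated above, additionally assuming $A$ is positive definite so that the inner minimization has the closed-form solution $A f^{k+1} = B f^k$; the nonlinear method replaces this inner quadratic problem by a convex, possibly nonsmooth one:

```python
import numpy as np

def inverse_power_method(A, B, f0, iters=100):
    """Linear inverse power method: each step minimizes
    (1/2)<u, Au> - <u, B f^k>, whose minimizer (for positive definite A)
    solves A f^{k+1} = B f^k. For generic f0 the iterates converge to
    the eigenvector of the smallest generalized eigenvalue."""
    f = f0 / np.linalg.norm(f0)
    for _ in range(iters):
        f = np.linalg.solve(A, B @ f)       # A f^{k+1} = B f^k
        f /= np.linalg.norm(f)              # rescale to avoid blow-up
    lam = (f @ A @ f) / (f @ B @ f)         # Rayleigh quotient
    return lam, f
```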
Properties of the Nonlinear Inverse Power Method

Theorem (Hein, Bühler (2010)): Either $\lambda^{k+1} < \lambda^k$ holds, or $\lambda^{k+1} = \lambda^k$ and the sequence terminates. Moreover, every cluster point $f^*$ of the sequence $f^k$ satisfies

$$ 0 \in \partial R(f^*) - \lambda^* \, \partial S(f^*), \quad \text{where } \lambda^* = \frac{R(f^*)}{S(f^*)}. $$

Guarantees:
- monotonic descent method,
- convergence to some nonlinear eigenvector is guaranteed, but not necessarily to the one associated with the smallest eigenvalue.
Benefits of Nonlinear Eigenproblems

                                        Linear EP    Nonlinear EP
Modeling power                          low          high
Relaxation of combinatorial problems    loose        tight
The Cheeger Cut Problem

Cheeger cut: for a partition $(C, \overline{C})$ of the weighted, undirected graph,

$$ \phi(C) = \frac{\mathrm{cut}(C, \overline{C})}{\min\{|C|, |\overline{C}|\}}, \quad \text{where } \mathrm{cut}(A, B) = \sum_{i \in A, \, j \in B} w_{ij}. $$

Computing the optimal Cheeger cut, $\phi^* = \min_C \phi(C)$, is NP-hard.
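A small sketch evaluating the Cheeger cut of a given partition; the example graph (two triangles joined by one weak edge) is invented for illustration:

```python
import numpy as np

def cheeger_cut(W, C):
    """phi(C) = cut(C, C^c) / min(|C|, |C^c|) for a symmetric weight
    matrix W and a boolean indicator vector C of the partition."""
    C = np.asarray(C, dtype=bool)
    cut = W[np.ix_(C, ~C)].sum()
    return cut / min(C.sum(), (~C).sum())

# Two triangles joined by a single weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

print(cheeger_cut(W, [1, 1, 1, 0, 0, 0]))   # 0.1 / 3
```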
Balanced Graph Cuts - Applications

- Clustering / community detection
- Image segmentation
- Parallel computing (matrix reordering)
Relaxation of the Cheeger Cut Problem

- Relaxation into a semidefinite program with $|V|^3$ constraints.
  Best known (worst case) approximation guarantee: $O(\sqrt{\log |V|})$.

- Spectral relaxation based on the graph Laplacian $L = D - W$.
  Isoperimetric inequality (Alon, Milman (1984)):
  $$ \frac{(\phi^*)^2}{2 \max_i d_i} \le \lambda_2(L) \le 2 \phi^*. $$
  - There are graphs known which realize the lower bound.
  - The bipartition is obtained by optimal thresholding of the second eigenvector (see the sketch below).
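As referenced in the last bullet, a sketch of the spectral relaxation pipeline: build $L = D - W$, take the second smallest eigenvector, and threshold it optimally with respect to the Cheeger ratio (dense NumPy for clarity; a real implementation would use sparse eigensolvers):

```python
import numpy as np

def spectral_cheeger_bipartition(W):
    """Second eigenvector of L = D - W, followed by optimal thresholding
    with respect to the Cheeger ratio."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    _, V = np.linalg.eigh(L)
    f = V[:, 1]                             # second smallest eigenvector
    best_phi, best_C = np.inf, None
    for t in np.sort(f)[:-1]:               # all n - 1 candidate thresholds
        C = f <= t
        if C.all():                         # skip degenerate split
            continue
        phi = W[np.ix_(C, ~C)].sum() / min(C.sum(), (~C).sum())
        if phi < best_phi:
            best_phi, best_C = phi, C
    return best_phi, best_C
```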
1-Spectral Clustering

1-graph Laplacian: the nonlinear graph 1-Laplacian $\Delta_1$ induces the functional $F_1(f)$,

$$ F_1(f) := \frac{\langle f, \Delta_1 f \rangle}{\|f\|_1} = \frac{\frac{1}{2} \sum_{i,j=1}^n w_{ij} |f_i - f_j|}{\|f\|_1}. $$

Theorem (Hein, Bühler (2010)): Let $G$ be connected. Then

$$ \min_C \frac{\mathrm{cut}(C, \overline{C})}{\min\{|C|, |\overline{C}|\}} = \min_{\substack{f \text{ nonconstant} \\ \mathrm{median}(f) = 0}} F_1(f) = \lambda_2(\Delta_1), $$

where $\lambda_2(\Delta_1)$ is the second smallest eigenvalue of $\Delta_1$, and the second eigenvector of $\Delta_1$ is the indicator vector of the optimal partition.

Tight relaxation of the optimal Cheeger cut!
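A quick numerical illustration of tightness on the toy graph from the earlier sketch: evaluating $F_1$ on the median-shifted indicator vector of the optimal partition recovers the optimal Cheeger cut value exactly:

```python
import numpy as np

def F1(W, f):
    """F_1(f) = ((1/2) sum_ij w_ij |f_i - f_j|) / ||f||_1."""
    tv = 0.5 * np.sum(W * np.abs(f[:, None] - f[None, :]))
    return tv / np.sum(np.abs(f))

# Two triangles joined by a weak edge, as before.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

# Median-shifted indicator vector of the optimal partition: F_1 equals
# the optimal Cheeger cut value 0.1 / 3 -- the relaxation is tight.
f = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
f -= np.median(f)
print(F1(W, f))                             # 0.0333... = 0.1 / 3
```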
Quality Guarantee

Tight relaxation of the Cheeger cut: minimization of the continuous relaxation is as hard as the original Cheeger cut problem ⟹ non-convex and non-smooth. There is no guarantee that NIPM attains the optimal solution!

but

Quality guarantee:

Theorem: Let $(A, \overline{A})$ be a given partition of $V$. If NIPM is initialized with $f^0 = \mathbf{1}_A$, then either NIPM terminates after one step or it yields an $f^1$ which after optimal thresholding gives a partition $(B, \overline{B})$ satisfying

$$ \frac{\mathrm{cut}(B, \overline{B})}{\min\{|B|, |\overline{B}|\}} < \frac{\mathrm{cut}(A, \overline{A})}{\min\{|A|, |\overline{A}|\}}. $$

Next goal: global approximation guarantees.
Cheeger Cut: 1-Laplacian (NLEP) vs. 2-Laplacian (LEP)
                          Linear                                               Nonlinear
Ratio                     $\frac{\sum_{i,j=1}^n w_{ij} (x_i - x_j)^2}{\|x\|_2^2}$    $\frac{\sum_{i,j=1}^n w_{ij} |x_i - x_j|}{\|x\|_1}$
Approximation guarantee   loose                                                tight! (Hein, Bühler (2010))
Convergence               globally optimal                                     locally optimal
Scalability               ✓                                                    ✓
Quality                   +                                                    +++

1-Spectral clustering beats state-of-the-art methods on the graph partitioning benchmark.