The k-variance problem: Orthogonal projections
1. The k-variance problem: Orthogonal projections

If V ⊆ R^d, then V⊥ := { y ∈ R^d | ∀x ∈ V: ⟨y, x⟩ = 0 } is the orthogonal complement of V. We have V ∩ V⊥ = {0}, and for all x ∈ R^d there exist unique x′ ∈ V, x″ ∈ V⊥ with x = x′ + x″. The map π_V : R^d → V, π_V(x) = x′, is the orthogonal projection onto V; the component x″ is denoted π_V(x)⊥. If dim(V) = 1 with V = span(v), then

π_V(x) = (⟨x, v⟩ / ⟨v, v⟩) · v   and   π_V(x)⊥ = x − (⟨x, v⟩ / ⟨v, v⟩) · v.
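As a concrete illustration of the one-dimensional case, here is a small numpy sketch (the data and the helper name project_onto_line are mine, not from the slides) that projects a point onto span(v) and checks that x′ and x″ are orthogonal and sum to x.

import numpy as np

def project_onto_line(x, v):
    """Orthogonal projection of x onto span(v): (<x, v> / <v, v>) * v."""
    return (np.dot(x, v) / np.dot(v, v)) * v

x = np.array([3.0, 1.0, 2.0])
v = np.array([1.0, 1.0, 0.0])
x_par = project_onto_line(x, v)                  # x' = pi_V(x), the component in V = span(v)
x_perp = x - x_par                               # x'' = pi_V(x)_perp, the component in V_perp
assert np.isclose(np.dot(x_par, x_perp), 0.0)    # x' and x'' are orthogonal
assert np.allclose(x_par + x_perp, x)            # x = x' + x''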

2. The k-variance problem

Problem 5.1 (k-variance problem). Given P ⊂ R^d, |P| = n, and k ∈ N, find the k-dimensional subspace V_k that minimizes

D(P, V) := Σ_{p ∈ P} ‖p − π_V(p)‖².

The subspace V_k is called the (k-dimensional) singular value decomposition of P.

3. Characterization of the optimal subspace

Lemma 5.2. For all P ⊂ R^d,

V_k = argmin_{V: dim(V)=k} { D(P, V) }   ⇔   V_k = argmax_{V: dim(V)=k} { Σ_{p ∈ P} ‖π_V(p)‖² }.

More generally, for every subspace V ⊆ R^d,

D(P, V) = Σ_{q ∈ P} ‖q‖² − Σ_{q ∈ P} ‖π_V(q)‖².
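The identity in Lemma 5.2 is easy to verify numerically. The following sketch, assuming numpy and a randomly chosen 2-dimensional subspace, is only an illustration with made-up data, not part of the slides.

import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(20, 5))                      # 20 points in R^5, one per row
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))      # orthonormal basis of a random 2-dim subspace V
proj = P @ Q @ Q.T                                # pi_V(p) for every point p

D = np.sum(np.linalg.norm(P - proj, axis=1) ** 2)
rhs = np.sum(np.linalg.norm(P, axis=1) ** 2) - np.sum(np.linalg.norm(proj, axis=1) ** 2)
assert np.isclose(D, rhs)                         # D(P, V) = sum ||q||^2 - sum ||pi_V(q)||^2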

4. Complexity and relation to k-means

Theorem 5.3. For every P ⊂ R^d and k ∈ N, the subspace V_k minimizing D(P, V) can be computed efficiently.

Lemma 5.4. For every P ⊂ R^d and k ∈ N,

D(P, V_k) ≤ opt_k(P).

5. Spectral algorithms

Given P ⊂ R^d:
1. compute the singular value decomposition V_k, i.e. the subspace minimizing D(P, V),
2. solve your favorite clustering problem with your favorite algorithm on input π_{V_k}(P) := { π_{V_k}(p) : p ∈ P },
3. return the solution found in the previous step.
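A minimal sketch of step 1, assuming numpy and a data matrix with one point per row (the helper names best_fit_subspace and project are mine): as Theorem 5.16 below makes precise, the best-fit subspace can be read off from the singular value decomposition of the data matrix, without any centering.

import numpy as np

def best_fit_subspace(X, k):
    """Orthonormal basis (one vector per row) of the k-dim subspace V_k minimizing D(P, V)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)   # no centering, unlike PCA
    return Vt[:k]

def project(X, B):
    """Orthogonal projection pi_V(p) of every row p of X onto V = row span of B."""
    return X @ B.T @ B

X = np.random.default_rng(0).normal(size=(50, 8))      # 50 points in R^8, one per row
B = best_fit_subspace(X, k=3)
D_k = np.sum(np.linalg.norm(X - project(X, B), axis=1) ** 2)   # D(P, V_3)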

6. Orthonormal bases

Definition 5.5. Let V ⊆ R^d be a k-dimensional subspace of R^d and let B = {v_1, ..., v_k} be a basis of V. The basis B is an orthonormal basis (ONB) of V if
1. ‖v_i‖ = 1 for i = 1, ..., k,
2. ⟨v_i, v_j⟩ = 0 for i ≠ j, i, j = 1, ..., k.

Theorem 5.6. Every subspace V ⊆ R^d has an orthonormal basis. Moreover, any orthonormal basis of V can be extended to an orthonormal basis of R^d.
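The existence statement in Theorem 5.6 is constructive: Gram-Schmidt orthonormalization turns any basis of V into an ONB. A small sketch, assuming numpy (in practice one would rather call np.linalg.qr):

import numpy as np

def gram_schmidt(vectors):
    """Turn a list of linearly independent vectors into an orthonormal basis of their span."""
    basis = []
    for w in vectors:
        for b in basis:
            w = w - np.dot(w, b) * b          # remove the component along b
        basis.append(w / np.linalg.norm(w))   # normalize what is left
    return basis

B = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])])
assert all(np.isclose(np.linalg.norm(b), 1.0) for b in B)   # unit length
assert np.isclose(np.dot(B[0], B[1]), 0.0)                  # pairwise orthogonal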

7. Length-preserving linear maps

Let V ⊆ R^d be a subspace with orthonormal basis B_V = {v_1, ..., v_k}, and let U ∈ R^{k×d} be the matrix with rows v_1^T, ..., v_k^T. Π_V denotes the function Π_V : R^d → R^k, x ↦ U·x.

Theorem 5.7. The linear function Π_V has the following properties:
1. Π_V is surjective.
2. Π_V is length-preserving on V, i.e. for all x ∈ V: ‖x‖ = ‖Π_V(x)‖.
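A quick numerical check of Theorem 5.7, assuming numpy and a randomly chosen subspace: Π_V(x) = U·x has k coordinates, preserves the length of vectors in V, and hits every point of R^k.

import numpy as np

rng = np.random.default_rng(8)
B, _ = np.linalg.qr(rng.normal(size=(6, 3)))    # orthonormal basis (columns) of a 3-dim V in R^6
U = B.T                                         # rows v_1^T, ..., v_k^T
x = B @ rng.normal(size=3)                      # an arbitrary point of V
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))   # length-preserving on V
y = rng.normal(size=3)
assert np.allclose(U @ (B @ y), y)              # surjective: B @ y is a preimage of y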

8. Spectral algorithms revisited

Given P ⊂ R^d:
1. compute the singular value decomposition V_k, i.e. the subspace minimizing D(P, V),
2. solve your favorite clustering problem with your favorite algorithm on input π_{V_k}(P) := { π_{V_k}(p) : p ∈ P }, i.e. compute an orthonormal basis for V_k and apply your favorite clustering algorithm to the set Π_{V_k}(π_{V_k}(P)),
3. return the solution found in the previous step.
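Putting the steps together, a minimal end-to-end sketch, assuming numpy and using scikit-learn's KMeans as the "favorite algorithm" (the function name spectral_cluster is mine): the coordinates X·B^T are exactly Π_{V_k}(π_{V_k}(p)), so the clustering runs on n points in R^k instead of R^d.

import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(X, k, seed=0):
    """Spectral algorithm: project onto V_k, then cluster the k coordinates."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    B = Vt[:k]                                   # step 1: orthonormal basis of V_k
    coords = X @ B.T                             # step 2: Pi_{V_k}(pi_{V_k}(P)), n points in R^k
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(coords)

X = np.random.default_rng(1).normal(size=(100, 10))
labels = spectral_cluster(X, k=4)                # step 3: return that clustering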

9. k-variance and k-means

Lemma 5.8. Let P ⊂ R^d and let V be an arbitrary k-dimensional subspace of R^d. Then

opt_k(π_V(P)) ≤ opt_k(P),

where opt_k(P) denotes the cost of an optimal solution of k-means with input P.

10. k-variance and k-means

Lemma 5.9. Let P ⊂ R^d and let V be an arbitrary k-dimensional subspace of R^d. Assume Ĉ = {Ĉ_1, ..., Ĉ_k} is a k-clustering of π_V(P) and denote by C := {C_1, ..., C_k} with C_i := { p ∈ P : π_V(p) ∈ Ĉ_i } the corresponding k-clustering of P. Then

cost(π_V(P), Ĉ) ≤ cost(P, C) ≤ cost(π_V(P), Ĉ) + D(P, V).
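The sandwich in Lemma 5.9 holds for any clustering of the projected points, which makes it easy to check numerically. A sketch with made-up data, assuming numpy; the label vector encodes both Ĉ on π_V(P) and the corresponding C on P.

import numpy as np

def kmeans_cost(points, labels):
    """Sum of squared distances of each point to the centroid of its cluster."""
    return sum(np.sum((points[labels == c] - points[labels == c].mean(axis=0)) ** 2)
               for c in np.unique(labels))

rng = np.random.default_rng(2)
P = rng.normal(size=(60, 6))
Q, _ = np.linalg.qr(rng.normal(size=(6, 2)))     # ONB of an arbitrary 2-dim subspace V
proj = P @ Q @ Q.T                               # pi_V(P)
labels = rng.integers(0, 2, size=60)             # some 2-clustering of pi_V(P) (here: random)
D = np.sum((P - proj) ** 2)                      # D(P, V)
lo = kmeans_cost(proj, labels)                   # cost(pi_V(P), C_hat)
hi = kmeans_cost(P, labels)                      # cost(P, C)
assert lo <= hi + 1e-9 and hi <= lo + D + 1e-9   # the sandwich of Lemma 5.9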

11. Approximation guarantees for spectral algorithms

Spectral algorithms: given P ⊂ R^d,
1. compute the singular value decomposition V_k, i.e. the subspace minimizing D(P, V),
2. solve your favorite clustering problem with your favorite algorithm on input π_{V_k}(P) := { π_{V_k}(p) : p ∈ P },
3. return the solution found in the previous step.

Theorem 5.10. Let P ⊂ R^d and let V_k be the k-dimensional subspace of R^d minimizing D(P, V). If Ĉ is a γ-approximate k-clustering for π_{V_k}(P), then the corresponding k-clustering C as defined in the previous lemma is a (γ + 1)-approximate k-clustering for P.

12. An exact algorithm for k-means

Exact-k-Means(P, k)
  Compute the set K of sets of t hyperplanes, k ≤ t ≤ (k choose 2), where each hyperplane contains d affinely independent points from P;
  for S ∈ K do
    check that S defines an arrangement of exactly k cells;
    for all assignments a_S of points of P on hyperplanes in S to cells do
      for all cells do
        compute the centroid of points of P in the cell;
      end
      C_{S,a_S} := set of centroids computed in the previous step;
    end
    C_S := argmin_{C_{S,a_S}} { D(P, C_{S,a_S}) };
  end
  return argmin_{C_S} { D(P, C_S) };
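For very small instances, optimality can also be certified by brute force. The following sketch, assuming numpy, simply enumerates all k^n assignments; it is not the hyperplane-arrangement algorithm above, only a tiny reference implementation for testing.

import numpy as np
from itertools import product

def exact_kmeans_bruteforce(P, k):
    """Optimal k-means cost and labels by enumerating all k^n assignments (tiny n only)."""
    n = len(P)
    best_cost, best_labels = np.inf, None
    for assignment in product(range(k), repeat=n):
        labels = np.array(assignment)
        cost = sum(np.sum((P[labels == c] - P[labels == c].mean(axis=0)) ** 2)
                   for c in range(k) if np.any(labels == c))
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_cost, best_labels

P = np.random.default_rng(3).normal(size=(8, 2))   # 8 points keep the 2^8 assignments manageable
print(exact_kmeans_bruteforce(P, k=2))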

13. An exact algorithm for k-means

Theorem 5.11. Algorithm Exact-k-Means solves the k-means problem optimally in time O(n^(d·k²/2)).

14. A spectral approximation algorithm

Spectral-k-Means(P, k)
  Compute V_k := argmin_{V: dim(V)=k} { D(P, V) };
  C̄ := Exact-k-Means(π_{V_k}(P), k);
  return C̄;

Theorem 5.12. Spectral-k-Means is an approximation algorithm for the k-means problem with running time O(n·d² + n^(k³/2)) and approximation factor 2.

15. Matrix representation of point sets

Let P = {p_1, ..., p_n} ⊂ R^d. The matrix A ∈ R^{d×n} with columns p_i is called the matrix representation of P; the rows of A^T ∈ R^{n×d} are p_i^T. For every v ∈ R^d:

A^T·v = (⟨p_1, v⟩, ..., ⟨p_n, v⟩)^T ∈ R^n   and   ‖A^T·v‖² = v^T·A·A^T·v = Σ_{i=1}^n ⟨p_i, v⟩².
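A short numpy check of these two identities, with a made-up matrix representation A (columns are the points):

import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 7))                         # columns p_1, ..., p_7 in R^4
v = rng.normal(size=4)
inner = A.T @ v                                     # (<p_1, v>, ..., <p_n, v>)^T
assert np.allclose(inner, [A[:, i] @ v for i in range(7)])
assert np.isclose(inner @ inner, v @ A @ A.T @ v)   # ||A^T v||^2 = v^T A A^T v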

16. Characterization of k-variance solutions

Theorem 5.13. For every set of points P ⊂ R^d, |P| = n, with matrix representation A ∈ R^{d×n}:

argmax_{V: dim(V)=k} { Σ_{p ∈ P} ‖π_V(p)‖² } = argmax_{ONB B: |B|=k} { Σ_{v ∈ B} v^T·A·A^T·v }.

17. Eigenvalues and eigenvectors

Definition 5.14. Let M ∈ R^{d×d}, λ ∈ R and v ∈ R^d, v ≠ 0. Then λ is called an eigenvalue of M to eigenvector v (and vice versa) if M·v = λ·v.

Theorem 5.15. For every A ∈ R^{d×n}, the matrix M = A·A^T ∈ R^{d×d} has non-negative eigenvalues λ_1 ≥ ··· ≥ λ_d ≥ 0. Moreover, there is an orthonormal basis B = {v_1, ..., v_d} such that λ_i is an eigenvalue of M to eigenvector v_i, i = 1, ..., d.

18. Solutions to the k-variance problem

Theorem 5.16. Let P ⊂ R^d be a finite set of points with matrix representation A ∈ R^{d×n} and let k ∈ N. If A·A^T has eigenvalues λ_1 ≥ ··· ≥ λ_d and B = {v_1, ..., v_d} is an orthonormal basis consisting of eigenvectors, i.e. v_i is an eigenvector to eigenvalue λ_i, i = 1, ..., d, then

span{v_1, ..., v_k} = argmin_{V: dim(V)=k} { D(P, V) }.
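A sketch of Theorem 5.16 in action, assuming numpy: the top-k eigenvectors of A·A^T (points as columns of A) span V_k, and the projection cost they achieve is never worse than that of any other k-dimensional subspace, here compared against a random one.

import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 30))                     # columns are the points of P
eigvals, eigvecs = np.linalg.eigh(A @ A.T)       # ascending eigenvalues, orthonormal eigenvectors
order = np.argsort(eigvals)[::-1]
k = 2
B = eigvecs[:, order[:k]]                        # top-k eigenvectors as columns: basis of V_k

def D(A, B):
    """D(P, V) for V spanned by the orthonormal columns of B (points are the columns of A)."""
    return np.sum((A - B @ B.T @ A) ** 2)

R, _ = np.linalg.qr(rng.normal(size=(5, k)))     # some other k-dim subspace, for comparison
assert D(A, B) <= D(A, R) + 1e-9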

19. Singular values and vectors

For M ∈ R^{n×d} in the case d = n: v ∈ R^d is an eigenvector to eigenvalue σ if M·v = σ·v. How does this generalize to n ≠ d? And can one compute eigenvectors and eigenvalues of A·A^T without computing the matrix product?

Singular vectors and singular values: σ ∈ R is called a singular value of M with corresponding singular vectors v ∈ R^d, u ∈ R^n if
1. M·v = σ·u,
2. u^T·M = σ·v^T.

20. Eigenvectors and singular vectors

Lemma 5.17. Let M ∈ R^{n×d}. Then σ ∈ R is a singular value of M with corresponding singular vectors v ∈ R^d and u ∈ R^n if and only if
1. σ² is an eigenvalue of M^T·M,
2. v is a right eigenvector of M^T·M to eigenvalue σ²,
3. u^T is a left eigenvector of M·M^T to eigenvalue σ².
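Numerically, singular values and vectors come from np.linalg.svd, which indeed avoids forming A·A^T explicitly. A small sketch, assuming numpy, that checks the defining relations and Lemma 5.17 for the largest singular value of a random matrix:

import numpy as np

rng = np.random.default_rng(7)
M = rng.normal(size=(6, 4))                        # M in R^{n x d} with n = 6, d = 4
U, s, Vt = np.linalg.svd(M, full_matrices=False)
sigma, u, v = s[0], U[:, 0], Vt[0]                 # largest singular value and its singular vectors

assert np.allclose(M @ v, sigma * u)               # M v = sigma u
assert np.allclose(u @ M, sigma * v)               # u^T M = sigma v^T
assert np.allclose(M.T @ M @ v, sigma**2 * v)      # sigma^2 eigenvalue of M^T M, right eigenvector v
assert np.allclose(u @ (M @ M.T), sigma**2 * u)    # u^T left eigenvector of M M^T to sigma^2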
