

  1. Recitations for 10-701: Randomized Algorithms for Matrices. Mu Li, April 9, 2013.

  2. Low-rank approximation. Given a matrix A ∈ R^{n×m}, we form the rank-k approximation Ã = PΣQ^T such that A ≈ Ã, where Σ ∈ R^{k×k} is typically a diagonal matrix and k ≪ min{n, m}.
[Figure: the rank-k approximation, A (n × m) ≈ P (n × k) · Σ (k × k) · Q^T (k × m).]

  3. Why low-rank? Why does low-rank approximation work?
◮ Data quite often lie in a low-dimensional manifold, so a low-rank representation approximates the data well.
Advantages of a low-rank representation:
◮ denoising
◮ visualization
◮ reduces the storage requirement from O(nm) to O((n + m)k)
◮ reduces computational complexity (see the sketch below):
◮ Ax ≈ P(Σ(Q^T x)), time O(nm) → O((n + m)k)
◮ A⁺ ≈ PΣ⁺Q^T, with a QR decomposition to ensure P, Q are orthogonal, time O(nm min{n, m}) → O((n + m)k² + k³)
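A minimal NumPy sketch (illustrative, not from the slides) of the matrix-vector speedup: once A ≈ PΣQ^T is in hand, Ax can be applied right to left in O((n + m)k) instead of O(nm).

```python
import numpy as np

# Illustrative sizes; any n, m >> k shows the effect.
n, m, k = 2000, 1500, 20
rng = np.random.default_rng(0)

# Build a synthetic rank-k factorization A = P Sigma Q^T.
P = rng.standard_normal((n, k))
Sigma = np.diag(rng.random(k))
Q = rng.standard_normal((m, k))
x = rng.standard_normal(m)

A = P @ Sigma @ Q.T                  # forming A explicitly costs O(nmk) time, O(nm) memory
y_dense = A @ x                      # O(nm) per matrix-vector product
y_lowrank = P @ (Sigma @ (Q.T @ x))  # O((n + m)k), multiplying right to left

print(np.allclose(y_dense, y_lowrank))  # True up to floating point
```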

  4. Best rank-k approximation. Given a matrix A and a rank k, we solve the following problem:
min_{P, Σ, Q} ‖A − PΣQ^T‖  subject to  Σ ∈ R^{k×k}
The closed-form solution is the truncated SVD, namely keeping the top k singular values and the corresponding singular vectors. However, the time complexity O(nmk) is too large...
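For reference, a minimal NumPy sketch of the closed-form solution (the helper name `truncated_svd` is ours, not the slides'):

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A (in Frobenius norm) via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 150))
P, Sigma, Qt = truncated_svd(A, k=10)
print(np.linalg.norm(A - P @ Sigma @ Qt))  # error of the best rank-10 approximation
```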

  5. Johnson-Lindenstrauss. Theorem. For any 0 < ε < 1 and any integer n, let k be a positive integer such that k ≥ 4(ε²/2 − ε³/3)⁻¹ ln n. Then for any set V of n points in R^d, there is a map f : R^d → R^k such that for all u, v ∈ V,
(1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖²
How to construct f(u) (see the sketch below):
◮ f(u) = Ru, with p(R_ij = 0) = 2/3, p(R_ij = +1) = 1/6, p(R_ij = −1) = 1/6
◮ f(u) = GSu, G: random Gaussian matrix, S: diagonal scaling matrix
◮ f(u) = DHSu, D: random row-selection matrix, H: Hadamard transform matrix
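A hedged NumPy sketch of the first (sparse) construction. The slide leaves the scaling implicit; Achlioptas' version of this map multiplies by √(3/k) so that squared norms are preserved in expectation:

```python
import numpy as np

def sparse_jl_map(d, k, rng):
    """Sparse JL projection: entries +1/-1 with prob 1/6 each, 0 with prob 2/3.
    The sqrt(3/k) factor makes E||f(u)||^2 = ||u||^2 (scaling assumed, see above)."""
    R = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * R

rng = np.random.default_rng(0)
d, k, n = 1000, 200, 50
V = rng.standard_normal((n, d))   # n points in R^d
R = sparse_jl_map(d, k, rng)
W = V @ R.T                       # projected points in R^k

# Pairwise squared distances are preserved up to small distortion.
u, v, fu, fv = V[0], V[1], W[0], W[1]
print(np.sum((u - v) ** 2), np.sum((fu - fv) ** 2))
```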

  6. A basic random projection algorithm. Algorithm (see the sketch below):
1. construct an m × ℓ random matrix Ω, e.g. Gaussian or subsampled Hadamard, with ℓ = O(k/ε)
2. B = AΩ
3. perform a truncated SVD B = U_k S_k V_k^T, where U_k ∈ R^{n×k}
4. approximate A by U_k(U_k^T A)
Theoretical guarantee: with high probability
‖A − U_k U_k^T A‖_F ≤ (1 + ε)‖A − A_k‖_F,
where A_k is the best rank-k approximation.
Time complexity: O(n(m log ℓ + k²)), but ℓ may be much bigger than k.
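A minimal NumPy sketch of steps 1-4 with a Gaussian test matrix (the function name `randomized_rank_k` and the synthetic rank-12 test matrix are ours):

```python
import numpy as np

def randomized_rank_k(A, k, ell, rng):
    """Basic random projection: sketch with a Gaussian Omega, then a small SVD."""
    n, m = A.shape
    Omega = rng.standard_normal((m, ell))       # step 1: m x ell random matrix
    B = A @ Omega                               # step 2: n x ell sketch
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    Uk = U[:, :k]                               # step 3: top-k left singular vectors of B
    return Uk, Uk.T @ A                         # step 4: A ~= Uk (Uk^T A)

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 12)) @ rng.standard_normal((12, 300))  # exactly rank 12
Uk, C = randomized_rank_k(A, k=12, ell=40, rng=rng)
print(np.linalg.norm(A - Uk @ C) / np.linalg.norm(A))  # ~0 for a rank-12 matrix
```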

  7. A cheaper but less accurate algorithm. Recall the JL theorem: typically ℓ = O(k ln k) is good enough.
Algorithm: the same as before, but use a much smaller ℓ = k + p, usually p = 5, 10, . . .
Theoretical guarantee: with high probability
‖A − U_k U_k^T A‖₂ ≤ √(10 ℓ min{n, m}) ‖A − A_k‖₂
Compared with the previous algorithm, the error constant here is much worse.
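Reusing the `randomized_rank_k` sketch from the previous slide, this variant is just a different choice of ℓ, e.g. k = 10 with oversampling p = 5:

```python
# Same sketch as above, with ell = k + p (here p = 5).
Uk, C = randomized_rank_k(A, k=10, ell=15, rng=rng)
```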

  8. A still cheap but more accurate algorithm. Iterating several times improves the quality of the subspace. Recall how to quickly compute the leading singular vector:
1. start with a random u,
2. repeat u = A^T A u / ‖u‖₂ until convergence
Algorithm: the same as above, except B = (AA^T)^q AΩ (see the sketch below).
Theoretical guarantee: with high probability
‖A − U_k U_k^T A‖₂ ≤ (10 ℓ min{n, m})^{1/(4q+2)} ‖A − A_k‖₂
Time complexity: O(qnm ln ℓ), by multiplying in the order B = (. . . (A(A^T(AΩ)))).
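A minimal sketch of the power-iteration variant (again with a Gaussian Ω; a production version would re-orthonormalize B between multiplications for numerical stability):

```python
import numpy as np

def randomized_rank_k_power(A, k, ell, q, rng):
    """Power-iteration variant: B = (A A^T)^q A Omega, then a small SVD."""
    n, m = A.shape
    Omega = rng.standard_normal((m, ell))
    B = A @ Omega
    for _ in range(q):          # apply (A A^T) q times, innermost products first,
        B = A @ (A.T @ B)       # so each pass costs two tall-skinny products
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    Uk = U[:, :k]
    return Uk, Uk.T @ A

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 300))
Uk, C = randomized_rank_k_power(A, k=10, ell=15, q=2, rng=rng)
print(np.linalg.norm(A - Uk @ C, 2))  # spectral-norm error
```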

  9. Nyström methods. The previous algorithms need to touch the whole matrix A, which is expensive.
Nyström methods: assume A is symmetric.
1. randomly pick ℓ columns of A to form P
2. the corresponding rows are then Q^T = P^T; denote by B the ℓ-by-ℓ cross matrix
3. then approximate A by Ã = PB⁺P^T (see the sketch below)
[Figure: the Nyström approximation, A (n × n) ≈ P (n × ℓ) · B⁺ (ℓ × ℓ, via SVD) · P^T (ℓ × n).]
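A minimal NumPy sketch of the three steps for a symmetric PSD matrix (uniform column sampling; `pinv` hedges against a rank-deficient cross block):

```python
import numpy as np

def nystrom(A, ell, rng):
    """Nystrom approximation of a symmetric A from ell sampled columns."""
    n = A.shape[0]
    idx = rng.choice(n, size=ell, replace=False)  # step 1: pick ell columns uniformly
    P = A[:, idx]                                 # n x ell column block
    B = A[np.ix_(idx, idx)]                       # step 2: ell x ell cross matrix
    return P @ np.linalg.pinv(B) @ P.T            # step 3: A ~= P B^+ P^T

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 30))
A = X @ X.T                                       # symmetric PSD test matrix, rank 30
A_hat = nystrom(A, ell=60, rng=rng)
print(np.linalg.norm(A - A_hat, 2))               # small when ell >= rank(A)
```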

  10. Nyström methods, cont. Theoretical guarantee:
E‖A − Ã‖₂ ≤ ‖A − A_k‖₂ + (2n/√ℓ) max_i A_ii
The error bound is much worse, but the time complexity reduces to O(nℓk + ℓ²k), with k ≤ ℓ ≪ n.

  11. Example: spectral embedding. Dataset: mnist8m; in total 3,276,294 samples.
[Figures: (a) SVD on 8k samples; (b) Nyström method.]
Mu Li et al. Making Large-Scale Nyström Approximation Possible. ICML 2010.
Mu Li et al. Time and Space Efficient Spectral Clustering via Column Sampling. CVPR 2011.

  12. Example: image segmentation. 1 million pixels.

  13. Example: image segmentation. 1 million pixels, 2 segments, CPU time 1.2 sec.

  14. Example: image segmentation. 1.8 million pixels.

  15. Example: image segmentation. 1.8 million pixels, 4 segments, CPU time 5.9 sec.

  16. Example: image segmentation. 10 million pixels.

  17. Example: image segmentation. 10 million pixels, 18 segments, CPU time 18.9 sec.

  18. Example: image segmentation. 15 million pixels.

  19. Example: image segmentation. 15 million pixels, 8 segments, CPU time 22.6 sec.

  20. Conclusion: it works!
