Recitations for 10-701
Randomized Algorithms for Matrices
Mu Li
April 9, 2013
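Low-rank approximation

Given a matrix $A \in \mathbb{R}^{n \times m}$, we form the rank-$k$ approximation $\tilde{A} = P \Sigma Q^T$ such that $\tilde{A} \approx A$, where $\Sigma \in \mathbb{R}^{k \times k}$ is typically a diagonal matrix and $k \ll \min\{n, m\}$.

[Figure: rank-$k$ approximation, $A$ ($n \times m$) $\approx$ $P$ ($n \times k$) $\cdot$ $\Sigma$ ($k \times k$) $\cdot$ $Q^T$ ($k \times m$)]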
Why low-rank?

Why does low-rank approximation work?
◮ Data often lie near a low-dimensional manifold, so a low-rank representation approximates the data well.

Advantages of a low-rank representation
◮ denoising
◮ visualization
◮ reduces the storage requirement from $O(nm)$ to $O((n + m)k)$
◮ reduces computational complexity
  ◮ $Ax \approx P(\Sigma(Q^T x))$: time $O(nm) \to O((n + m)k)$
  ◮ $A^+ \approx Q \Sigma^+ P^T$, after a QR decomposition to ensure that $P$ and $Q$ have orthonormal columns: time $O(nm \min\{n, m\}) \to O((n + m)k^2 + k^3)$
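The storage and matrix-vector savings listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the sizes, the seed, and the variable names are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 300, 10  # illustrative sizes (assumption)

# Build an exactly rank-k matrix A = P diag(sigma) Q^T with orthonormal columns.
P, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))   # m x k
sigma = np.linspace(10.0, 1.0, k)                  # diagonal of Sigma
A = (P * sigma) @ Q.T                              # n x m, formed only for comparison

x = rng.standard_normal(m)
y_dense = A @ x                       # O(nm) multiply-adds
y_factored = P @ (sigma * (Q.T @ x))  # O((n + m) k) multiply-adds

storage_dense = A.size                           # n * m numbers
storage_factored = P.size + sigma.size + Q.size  # (n + m) k + k numbers
max_abs_diff = np.max(np.abs(y_dense - y_factored))

print(storage_dense, storage_factored, max_abs_diff)
```

The factored product never forms the n-by-m matrix, which is where both savings come from.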
Best rank-$k$ approximation

Given a matrix $A$ and a rank $k$, we solve the following problem:

$$\min_{P, \Sigma, Q} \; \| A - P \Sigma Q^T \| \quad \text{subject to } \Sigma \in \mathbb{R}^{k \times k}$$

The closed-form solution is the truncated SVD, namely keeping the top $k$ singular values and the corresponding singular vectors.

However, the time complexity $O(nmk)$ is too large...
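As a small illustration of the truncated SVD, the sketch below keeps the top k singular triplets and compares the resulting error with the discarded singular values. It assumes NumPy; `truncated_svd` is a hypothetical helper name, and for simplicity it computes a full SVD first, so it does not achieve the $O(nmk)$ cost quoted above.

```python
import numpy as np

def truncated_svd(A, k):
    """Keep the top-k singular triplets of A (via a full SVD, for illustration only)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))  # illustrative test matrix (assumption)
k = 5

P, sigma, Qt = truncated_svd(A, k)
A_k = (P * sigma) @ Qt             # best rank-k approximation P Sigma Q^T

s_all = np.linalg.svd(A, compute_uv=False)
spec_err = np.linalg.norm(A - A_k, 2)      # equals the (k+1)-th singular value
fro_err = np.linalg.norm(A - A_k, 'fro')   # equals the l2 norm of the discarded singular values

print(spec_err, s_all[k])
print(fro_err, np.sqrt(np.sum(s_all[k:] ** 2)))
```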
Johnson-Lindenstrauss Theorem

For any $0 < \epsilon < 1$ and any integer $n$, let $k$ be a positive integer such that

$$k \ge 4 \left( \epsilon^2/2 - \epsilon^3/3 \right)^{-1} \ln n.$$

Then for any set $V$ of $n$ points in $\mathbb{R}^d$, there is a map $f : \mathbb{R}^d \to \mathbb{R}^k$ such that for all $u, v \in V$,

$$(1 - \epsilon) \| u - v \|^2 \le \| f(u) - f(v) \|^2 \le (1 + \epsilon) \| u - v \|^2.$$

How to construct $f(u)$
◮ $f(u) = Ru$, with $p(R_{ij} = 0) = 2/3$, $p(R_{ij} = +1) = 1/6$, $p(R_{ij} = -1) = 1/6$
◮ $f(u) = GSu$, $G$: random Gaussian matrix, $S$: diagonal scaling matrix
◮ $f(u) = DHSu$, $D$: random row selection matrix, $H$: Hadamard transform matrix
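The first construction can be tried out as follows. This is a sketch under stated assumptions: NumPy is used, the helper names `jl_min_dim` and `sparse_projection` are hypothetical, and the entries are rescaled by $\sqrt{3/k}$ so that squared norms are preserved in expectation (the slide lists only the entry probabilities, so this scaling is an added assumption). The theorem is probabilistic, so the observed ratios depend on the random draw.

```python
import numpy as np

def jl_min_dim(n_points, eps):
    """Smallest integer k with k >= 4 (eps^2/2 - eps^3/3)^(-1) ln n."""
    return int(np.ceil(4.0 * np.log(n_points) / (eps ** 2 / 2 - eps ** 3 / 3)))

def sparse_projection(d, k, rng):
    """k x d matrix with entries +1, 0, -1 drawn with probabilities 1/6, 2/3, 1/6.
    The sqrt(3/k) scaling is an assumption added here, not stated on the slide."""
    R = rng.choice([1.0, 0.0, -1.0], size=(k, d), p=[1 / 6, 2 / 3, 1 / 6])
    return np.sqrt(3.0 / k) * R

def pairwise_sq_dists(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)

rng = np.random.default_rng(0)
n_points, d, eps = 50, 1000, 0.5       # illustrative values (assumption)
k = jl_min_dim(n_points, eps)

V = rng.standard_normal((n_points, d))  # the point set, one point per row
R = sparse_projection(d, k, rng)
FV = V @ R.T                            # f(u) = R u applied to every point

iu = np.triu_indices(n_points, k=1)     # all pairs u != v
ratios = pairwise_sq_dists(FV)[iu] / pairwise_sq_dists(V)[iu]
print(k, ratios.min(), ratios.max())    # ratios are expected to lie near [1 - eps, 1 + eps]
```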
A basic random projection algorithm

Algorithm
1. construct an $m \times \ell$ random matrix $\Omega$, e.g. Gaussian or sampled Hadamard, with $\ell = O(k/\epsilon)$
2. $B = A\Omega$
3. perform a truncated SVD $B \approx U_k S_k V_k^T$, where $U_k \in \mathbb{R}^{n \times k}$
4. approximate $A$ by $U_k (U_k^T A)$

Theoretical guarantee: with high probability,

$$\| A - U_k U_k^T A \|_F \le (1 + \epsilon) \| A - A_k \|_F,$$

where $A_k$ is the best rank-$k$ approximation.

Time complexity: $O(n(m \log(\ell) + k^2))$, but $\ell$ may be much bigger than $k$.
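The four steps above can be sketched in a few lines. The version below assumes NumPy, takes $\Omega$ to be a Gaussian matrix with $\ell$ columns, and uses a hypothetical function name `randomized_low_rank`; the test matrix and the choice $\ell = 25$ are illustrative. The dense Gaussian product costs $O(nm\ell)$, so it does not reach the $\log(\ell)$ cost quoted for structured transforms.

```python
import numpy as np

def randomized_low_rank(A, k, ell, rng):
    """Return (U_k, U_k^T A) following steps 1-4 with a Gaussian Omega."""
    m = A.shape[1]
    Omega = rng.standard_normal((m, ell))             # step 1: m x ell random matrix
    B = A @ Omega                                     # step 2: n x ell sketch
    U, _, _ = np.linalg.svd(B, full_matrices=False)   # step 3: SVD of the sketch
    U_k = U[:, :k]                                    # top-k left singular vectors of B
    return U_k, U_k.T @ A                             # step 4: A is approximated by U_k (U_k^T A)

rng = np.random.default_rng(0)
n, m, k, ell = 200, 100, 5, 25  # illustrative sizes (assumption)

# Test matrix with k dominant singular values followed by a decaying tail.
U0, _ = np.linalg.qr(rng.standard_normal((n, m)))
V0, _ = np.linalg.qr(rng.standard_normal((m, m)))
s = np.concatenate([np.linspace(10.0, 6.0, k), 0.1 * 0.9 ** np.arange(m - k)])
A = (U0 * s) @ V0.T

U_k, C = randomized_low_rank(A, k, ell, rng)
err = np.linalg.norm(A - U_k @ C, 'fro')
best = np.sqrt(np.sum(s[k:] ** 2))   # Frobenius error of the best rank-k approximation
ratio = err / best                   # plays the role of (1 + eps) in the guarantee
print(ratio)
```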
A cheaper but less accurate algorithm

Recall the JL theorem: typically $\ell = O(k \ln k)$ is good enough.

Algorithm: the same as before, but with a different $\ell = k + p$, usually $p = 5, 10, \ldots$

Theoretical guarantee: with high probability,

$$\| A - U_k U_k^T A \|_2 \le 10\, \ell \min\{n, m\} \, \| A - A_k \|_2$$

Compared with the previous algorithm, the error constant here is much worse.
Still cheap but more accurate algorithm

Iterating several times improves the quality of the subspace. Recall how to quickly compute the leading singular vector:
1. start with a random $u$
2. repeat $u \leftarrow A^T A u$, $u \leftarrow u / \| u \|_2$ until convergence

Algorithm: the same as above, except that $B = (A A^T)^q A \Omega$.

Theoretical guarantee: with high probability,

$$\| A - U_k U_k^T A \|_2 \le (10\, \ell \min\{n, m\})^{1/(4q+2)} \, \| A - A_k \|_2$$

Time complexity: $O(q n m \ln \ell)$, by following the order $B = (\ldots(A(A^T(A\Omega))))$.
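The effect of the extra multiplications by $A A^T$ can be seen in a small experiment. The sketch below assumes NumPy, a Gaussian $\Omega$ with $\ell = k + p$ columns, and a hypothetical function name `randomized_low_rank_power`; the test matrix (a flat tail at 0.3 below k singular values equal to 1) is an illustrative choice for which the plain sketch (q = 0) does poorly. Practical implementations often re-orthonormalize B between multiplications for numerical stability; that step is left out here to follow the formula for B literally.

```python
import numpy as np

def randomized_low_rank_power(A, k, p, q, rng):
    """Return (U_k, U_k^T A) using B = (A A^T)^q A Omega with ell = k + p columns."""
    m = A.shape[1]
    ell = k + p
    B = A @ rng.standard_normal((m, ell))   # A Omega
    for _ in range(q):
        B = A @ (A.T @ B)                   # multiply right to left: A (A^T B)
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    U_k = U[:, :k]
    return U_k, U_k.T @ A

rng = np.random.default_rng(0)
n, m, k, p = 200, 100, 5, 5  # illustrative sizes (assumption)

# Test matrix: k singular values equal to 1, the rest equal to 0.3.
U0, _ = np.linalg.qr(rng.standard_normal((n, m)))
V0, _ = np.linalg.qr(rng.standard_normal((m, m)))
s = np.concatenate([np.ones(k), 0.3 * np.ones(m - k)])
A = (U0 * s) @ V0.T
sigma_next = s[k]   # spectral-norm error of the best rank-k approximation

ratios = {}
for q in (0, 2):
    U_k, C = randomized_low_rank_power(A, k, p, q, rng)
    ratios[q] = np.linalg.norm(A - U_k @ C, 2) / sigma_next
print(ratios)       # the ratio for q = 2 is expected to be much closer to 1
```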
Nyström Methods

The previous algorithms need to touch the whole matrix $A$, which is expensive.

Nyström method: assume $A$ is symmetric.
1. randomly pick $\ell$ columns of $A$ to form $P$
2. the corresponding rows are then $Q^T = P^T$; denote by $B$ the $\ell$-by-$\ell$ cross matrix
3. then approximate $A$ by $\tilde{A} = P B^+ P^T$

[Figure: Nyström approximation, $A$ ($n \times n$) $\approx$ $P$ ($n \times \ell$) $\cdot$ $B^+$ ($\ell \times \ell$) $\cdot$ $P^T$ ($\ell \times n$)]
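A minimal sketch of the three Nyström steps follows, assuming NumPy, uniform sampling without replacement, and a hypothetical function name `nystrom`. The cross matrix is taken to be the block of A at the sampled rows and columns, and the explicit `rcond` cutoff in the pseudo-inverse is an added choice to discard numerically zero singular values. The full matrix A is formed here only to measure the error; the method itself reads just the sampled columns.

```python
import numpy as np

def nystrom(A, ell, rng):
    """Sample ell columns of a symmetric A and return (P, B^+)."""
    n = A.shape[0]
    idx = rng.choice(n, size=ell, replace=False)  # step 1: random column indices
    P = A[:, idx]                                 # n x ell sampled columns
    B = P[idx, :]                                 # step 2: cross matrix at sampled rows/columns
    B_pinv = np.linalg.pinv(B, rcond=1e-10)       # pseudo-inverse with an explicit cutoff
    return P, B_pinv

rng = np.random.default_rng(0)
n, r, ell = 300, 5, 20  # illustrative sizes (assumption)

# Symmetric positive semidefinite test matrix of exact rank r.
X = rng.standard_normal((n, r))
A = X @ X.T

P, B_pinv = nystrom(A, ell, rng)
A_tilde = P @ B_pinv @ P.T                        # step 3: A is approximated by P B^+ P^T
rel_err = np.linalg.norm(A - A_tilde, 'fro') / np.linalg.norm(A, 'fro')
print(rel_err)
```

For a matrix of exact rank r, sampling at least r columns that span its range makes the approximation exact up to rounding, which is what this test case exercises; on general matrices the error depends on which columns are sampled.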
Nyström Methods, cont.

Theoretical guarantee:

$$\mathbb{E}\, \| A - \tilde{A} \|_2 \le \| A - A_k \|_2 + \frac{2n}{\sqrt{\ell}} A^*_{ii}, \quad \text{where } A^*_{ii} = \max_i A_{ii}$$

The error bound is much worse, but the time complexity reduces to $O(n \ell k + \ell^2 k)$ with $k \le \ell \ll n$.
Example, spectral embedding

Dataset: mnist8m; in total 3,276,294 samples

[Figure: (a) SVD on 8k samples; (b) Nyström method]

Mu Li, Making Large-Scale Nyström Approximation Possible. ICML 2010
Mu Li, Time and Space Efficient Spectral Clustering via Column Sampling. CVPR 2011
Example, image segmentation
◮ 1 million pixels, 2 segments, CPU time 1.2 sec
◮ 1.8 million pixels, 4 segments, CPU time 5.9 sec
◮ 10 million pixels, 18 segments, CPU time 18.9 sec
◮ 15 million pixels, 8 segments, CPU time 22.6 sec
Conclusion It works!