
Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation - PowerPoint PPT Presentation



  1. Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation
     Shusen Wang

  2. Outline
     • CX Decomposition & Approximate SVD
     • CUR Decomposition
     • SPSD Matrix Approximation

  3. CX Decomposition
     • Given any matrix 𝐀 ∈ ℝ^{m×n}
     • The CX decomposition of 𝐀:
       1. Sketching: 𝐂 = 𝐀𝐏 ∈ ℝ^{m×c}
       2. Find 𝐗 such that 𝐀 ≈ 𝐂𝐗, e.g. 𝐗^⋆ = argmin_𝐗 ‖𝐀 − 𝐂𝐗‖_F² = 𝐂^† 𝐀
     • It costs O(mnc)
     • CX decomposition ⇔ approximate SVD:
       𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_C^T 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T
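
A minimal NumPy sketch of the two steps (illustrative only; the slide fixes no particular sketching matrix 𝐏, so a Gaussian one is assumed here, and the function name is invented for this example):

```python
# Illustrative CX decomposition: sketch, then solve the regression for X.
# A Gaussian sketching matrix P is assumed; the next slide lists other choices.
import numpy as np

def cx_decomposition(A, c, rng):
    m, n = A.shape
    P = rng.standard_normal((n, c)) / np.sqrt(c)  # sketching matrix P in R^{n x c}
    C = A @ P                                     # step 1: C = A P in R^{m x c}
    X = np.linalg.pinv(C) @ A                     # step 2: X* = pinv(C) A, costs O(mnc)
    return C, X

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 200))
C, X = cx_decomposition(A, c=50, rng=rng)
print(np.linalg.norm(A - C @ X, "fro") / np.linalg.norm(A, "fro"))  # relative error
```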

  4. CX Decomposition
     • Let the sketching matrix 𝐏 ∈ ℝ^{n×c} be defined as in the table below; then
       min_𝐗 ‖𝐀 − 𝐂𝐗‖_F² ≤ (1 + ε) ‖𝐀 − 𝐀_k‖_F²
     • Uniform sampling: c ≥ O(μk log k + μk/ε)
     • Leverage score sampling: c = O(k log k + k/ε)
     • Gaussian projection: c = O(k/ε)
     • SRHT: c = O((k + log n)/ε)
     • Count sketch: c = O(k² + k/ε)
     (μ is the column coherence of 𝐀)
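
For concreteness, here is one common way to materialize two of the tabulated sketches in NumPy (hypothetical helpers, not from the slides; in practice the product 𝐂 = 𝐀𝐏 is applied without forming 𝐏 densely, and scaling conventions vary across papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_sampling_matrix(n, c):
    # P picks c columns of A uniformly at random, rescaled for unbiasedness.
    idx = rng.choice(n, size=c, replace=False)
    P = np.zeros((n, c))
    P[idx, np.arange(c)] = np.sqrt(n / c)
    return P

def count_sketch_matrix(n, c):
    # Each row of P has a single random +/-1 entry in a random column,
    # so computing A @ P touches every entry of A exactly once.
    P = np.zeros((n, c))
    P[np.arange(n), rng.integers(0, c, size=n)] = rng.choice([-1.0, 1.0], size=n)
    return P
```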

  5. CX Decomposition ⇔ Approximate SVD
     • CX decomposition ⇔ approximate SVD:
       𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_C^T 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T

  6. CX Decomposition ⇔ Approximate SVD
     • CX decomposition ⇔ approximate SVD:
       𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_C^T 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T
     • SVD: 𝐂 = 𝐔_C 𝚺_C 𝐕_C^T ∈ ℝ^{m×c}
     • Time cost so far: O(mc²)

  7. CX Decomposition ⇔ Approximate SVD
     • CX decomposition ⇔ approximate SVD:
       𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_C^T 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T
     • SVD: 𝐂 = 𝐔_C 𝚺_C 𝐕_C^T ∈ ℝ^{m×c}
     • Let 𝐙 = 𝚺_C 𝐕_C^T 𝐗 ∈ ℝ^{c×n}
     • Time cost so far: O(mc² + nc²)

  8. CX Decomposition ⇔ Approximate SVD
     • CX decomposition ⇔ approximate SVD:
       𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_C^T 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T
     • SVD: 𝐂 = 𝐔_C 𝚺_C 𝐕_C^T ∈ ℝ^{m×c}
     • Let 𝐙 = 𝚺_C 𝐕_C^T 𝐗 ∈ ℝ^{c×n}
     • SVD: 𝐙 = 𝐔_Z 𝚺_Z 𝐕_Z^T ∈ ℝ^{c×n}
     • Time cost so far: O(mc² + nc² + nc²)

  9. CX Decomposition ⇔ Approximate SVD
     • CX decomposition ⇔ approximate SVD:
       𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_C^T 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T
     • Let 𝐙 = 𝚺_C 𝐕_C^T 𝐗 ∈ ℝ^{c×n}
     • SVD: 𝐂 = 𝐔_C 𝚺_C 𝐕_C^T ∈ ℝ^{m×c};  SVD: 𝐙 = 𝐔_Z 𝚺_Z 𝐕_Z^T ∈ ℝ^{c×n}
     • 𝐔_C 𝐔_Z is an m×c matrix with orthonormal columns, 𝚺_Z is a diagonal
       matrix, and 𝐕_Z^T is a c×n matrix with orthonormal rows
     • Time cost so far: O(mc² + nc² + nc² + mc²)

  10. CX Decomposition ⇔ Approximate SVD
      • CX decomposition ⇔ approximate SVD
      • Done! Approximate rank-c SVD: 𝐀 ≈ (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Z^T, where 𝐔_C 𝐔_Z has
        orthonormal columns, 𝚺_Z is diagonal, and 𝐕_Z^T has orthonormal rows
      • Time cost: O(mc² + nc² + nc² + mc²) = O(mc² + nc²)

  11. CX Decomposition ⇔ Approximate SVD
      • CX decomposition ⇔ approximate SVD
      • Given 𝐀 ∈ ℝ^{m×n} and 𝐂 ∈ ℝ^{m×c}, the approximate SVD costs:
        • O(mnc) time
        • O(mc + nc) memory
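
The recipe of slides 5 to 10, written out in NumPy (an illustrative sketch; variable names follow the slides, and 𝐗 is recomputed here so the function is self-contained):

```python
import numpy as np

def cx_to_approx_svd(A, C):
    # A ~ C X = U_C (Sigma_C V_C^T X) = U_C Z = (U_C U_Z) Sigma_Z V_Z^T
    X = np.linalg.pinv(C) @ A                               # X* = pinv(C) A, O(mnc)
    U_C, s_C, Vt_C = np.linalg.svd(C, full_matrices=False)  # SVD of C, O(mc^2)
    Z = (s_C[:, None] * Vt_C) @ X                           # Z = Sigma_C V_C^T X, O(nc^2)
    U_Z, s_Z, Vt_Z = np.linalg.svd(Z, full_matrices=False)  # SVD of Z, O(nc^2)
    return U_C @ U_Z, s_Z, Vt_Z                             # left factor is m x c, orthonormal
```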

  12. CX Decomposition
      • The CX decomposition of 𝐀 ∈ ℝ^{m×n}
      • Optimal solution: 𝐗^⋆ = argmin_𝐗 ‖𝐀 − 𝐂𝐗‖_F² = 𝐂^† 𝐀
      • How to make it more efficient?

  13. CX Decomposition
      • The CX decomposition of 𝐀 ∈ ℝ^{m×n}
      • Optimal solution: 𝐗^⋆ = argmin_𝐗 ‖𝐀 − 𝐂𝐗‖_F² = 𝐂^† 𝐀
      • How to make it more efficient? A regression problem!

  14. Fast CX Decomposition
      • Fast CX [Drineas, Mahoney, Muthukrishnan, 2008] [Clarkson & Woodruff, 2013]
      • Draw another sketching matrix 𝐒 ∈ ℝ^{m×s}
      • Compute 𝐗̃ = argmin_𝐗 ‖𝐒^T 𝐀 − 𝐒^T 𝐂𝐗‖_F² = (𝐒^T 𝐂)^† (𝐒^T 𝐀)
      • Time cost: O(ncs) + TimeOfSketch
      • When s = O(c/ε²),
        ‖𝐀 − 𝐂𝐗̃‖_F² ≤ (1 + ε) · min_𝐗 ‖𝐀 − 𝐂𝐗‖_F²
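
A NumPy sketch of the fast CX step (illustrative; a Gaussian 𝐒 is assumed, though the cited papers use sampling or structured sketches to keep TimeOfSketch small):

```python
import numpy as np

def fast_cx(A, C, s, rng):
    # Sketch both sides of the regression min_X ||A - C X||_F^2,
    # then solve the much smaller s x n problem exactly.
    m = A.shape[0]
    S = rng.standard_normal((m, s)) / np.sqrt(s)   # sketching matrix S in R^{m x s}
    return np.linalg.pinv(S.T @ C) @ (S.T @ A)     # X~ = pinv(S^T C) (S^T A)
```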

  15. Outline
      • CX Decomposition & Approximate SVD
      • CUR Decomposition
      • SPSD Matrix Approximation

  16. CUR Decomposition
      • Sketching:
        • 𝐂 = 𝐀𝐏_C ∈ ℝ^{m×c}
        • 𝐑 = 𝐏_R^T 𝐀 ∈ ℝ^{r×n}
      • Find 𝐔 such that 𝐂𝐔𝐑 ≈ 𝐀
      • CUR ⇔ approximate SVD, in the same way as "CX ⇔ approximate SVD"

  17. CUR Decomposition
      • Sketching:
        • 𝐂 = 𝐀𝐏_C ∈ ℝ^{m×c}
        • 𝐑 = 𝐏_R^T 𝐀 ∈ ℝ^{r×n}
      • Find 𝐔 such that 𝐂𝐔𝐑 ≈ 𝐀
      • CUR ⇔ approximate SVD, in the same way as "CX ⇔ approximate SVD"
      • 3 types of 𝐔

  18. CUR Decomposition
      • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:
        𝐔 = (𝐏_R^T 𝐀 𝐏_C)^†

  19. CUR Decomposition
      • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: 𝐔 = (𝐏_R^T 𝐀 𝐏_C)^†
      • Recall the fast CX decomposition:
        𝐀 ≈ 𝐂𝐗̃ = 𝐂 (𝐏_R^T 𝐂)^† (𝐏_R^T 𝐀) = 𝐂𝐔𝐑

  20. CUR Decomposition
      • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: 𝐔 = (𝐏_R^T 𝐀 𝐏_C)^†
      • Recall the fast CX decomposition:
        𝐀 ≈ 𝐂𝐗̃ = 𝐂 (𝐏_R^T 𝐂)^† (𝐏_R^T 𝐀) = 𝐂𝐔𝐑

  21. CUR Decomposition
      • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: 𝐔 = (𝐏_R^T 𝐀 𝐏_C)^†
      • Recall the fast CX decomposition:
        𝐀 ≈ 𝐂𝐗̃ = 𝐂 (𝐏_R^T 𝐂)^† (𝐏_R^T 𝐀) = 𝐂𝐔𝐑
      • They're equivalent: 𝐂𝐗̃ = 𝐂𝐔𝐑

  22. CUR Decomposition
      • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: 𝐔 = (𝐏_R^T 𝐀 𝐏_C)^†
      • Recall the fast CX decomposition:
        𝐀 ≈ 𝐂𝐗̃ = 𝐂 (𝐏_R^T 𝐂)^† (𝐏_R^T 𝐀) = 𝐂𝐔𝐑
      • They're equivalent: 𝐂𝐗̃ = 𝐂𝐔𝐑
      • Require c = O(k/ε²) and r = O(c/ε²) such that
        ‖𝐀 − 𝐂𝐔𝐑‖_F² ≤ (1 + ε) ‖𝐀 − 𝐀_k‖_F²

  23. CUR Decomposition
      • Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: 𝐔 = (𝐏_R^T 𝐀 𝐏_C)^†
      • Efficient: O(rc²) + TimeOfSketch
      • Loose bound: sketch size ∝ ε^{−2}
      • Bad empirical performance
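
With sampling sketches, Type 1 reduces to selecting columns, rows, and pseudo-inverting their intersection; a minimal NumPy illustration (the rescaling factors that unbiased sampling would add are omitted here):

```python
import numpy as np

def cur_type1(A, col_idx, row_idx):
    # C = A P_C (selected columns), R = P_R^T A (selected rows),
    # U = pinv(P_R^T A P_C), the pseudo-inverse of the intersection block.
    C = A[:, col_idx]
    R = A[row_idx, :]
    U = np.linalg.pinv(A[np.ix_(row_idx, col_idx)])
    return C, U, R
```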

  24. CUR Decomposition
      • Type 2: Optimal CUR
        𝐔^⋆ = argmin_𝐔 ‖𝐀 − 𝐂𝐔𝐑‖_F² = 𝐂^† 𝐀 𝐑^†

  25. CUR Decomposition
      • Type 2: Optimal CUR
        𝐔^⋆ = argmin_𝐔 ‖𝐀 − 𝐂𝐔𝐑‖_F² = 𝐂^† 𝐀 𝐑^†
      • Theory [W & Zhang, 2013], [Boutsidis & Woodruff, 2014]:
        • 𝐂 and 𝐑 are selected by the adaptive sampling algorithm
        • c = O(k/ε) and r = O(k/ε)
        • ‖𝐀 − 𝐂𝐔𝐑‖_F² ≤ (1 + ε) ‖𝐀 − 𝐀_k‖_F²

  26. CUR Decomposition
      • Type 2: Optimal CUR
        𝐔^⋆ = argmin_𝐔 ‖𝐀 − 𝐂𝐔𝐑‖_F² = 𝐂^† 𝐀 𝐑^†
      • Inefficient: O(mnc) + TimeOfSketch
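
The optimal 𝐔 is a one-liner, which makes the O(mnc) bottleneck explicit (illustrative NumPy, not code from the slides):

```python
import numpy as np

def optimal_U(A, C, R):
    # U* = pinv(C) A pinv(R); the products against the full m x n matrix A
    # are the O(mnc) cost that Type 3 below avoids.
    return np.linalg.pinv(C) @ A @ np.linalg.pinv(R)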

  27. CUR Decomposition
      • Type 3: Fast CUR [W, Zhang, Zhang, 2015]
      • Draw 2 sketching matrices 𝐒_C and 𝐒_R
      • Solve the problem
        𝐔̃ = argmin_𝐔 ‖𝐒_C^T (𝐀 − 𝐂𝐔𝐑) 𝐒_R‖_F² = (𝐒_C^T 𝐂)^† (𝐒_C^T 𝐀 𝐒_R) (𝐑 𝐒_R)^†
      • Intuition?

  28. CUR Decomposition
      • The optimal 𝐔 matrix is obtained by solving the optimization problem
        𝐔^⋆ = argmin_𝐔 ‖𝐂𝐔𝐑 − 𝐀‖_F²

  29. CUR Decomposition • Approximately solve the optimization problem, e.g. by column selection

  30. CUR Decomposition • Solve the small-scale problem

  31. CUR Decomposition
      • Type 3: Fast CUR [W, Zhang, Zhang, 2015]
      • Draw 2 sketching matrices 𝐒_C ∈ ℝ^{m×s_c} and 𝐒_R ∈ ℝ^{n×s_r}
      • Solve the problem
        𝐔̃ = argmin_𝐔 ‖𝐒_C^T (𝐀 − 𝐂𝐔𝐑) 𝐒_R‖_F² = (𝐒_C^T 𝐂)^† (𝐒_C^T 𝐀 𝐒_R) (𝐑 𝐒_R)^†
      • Theory: for s_c = O(c/ε) and s_r = O(r/ε),
        ‖𝐀 − 𝐂𝐔̃𝐑‖_F² ≤ (1 + ε) · min_𝐔 ‖𝐀 − 𝐂𝐔𝐑‖_F²

  32. CUR Decomposition
      • Type 3: Fast CUR [W, Zhang, Zhang, 2015]
      • Draw 2 sketching matrices 𝐒_C ∈ ℝ^{m×s_c} and 𝐒_R ∈ ℝ^{n×s_r}
      • Solve the problem
        𝐔̃ = argmin_𝐔 ‖𝐒_C^T (𝐀 − 𝐂𝐔𝐑) 𝐒_R‖_F² = (𝐒_C^T 𝐂)^† (𝐒_C^T 𝐀 𝐒_R) (𝐑 𝐒_R)^†
      • Efficient: O(s_c s_r (c + r)) + TimeOfSketch
      • Good empirical performance
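
A NumPy sketch of the fast CUR solve (illustrative; Gaussian sketches are assumed for 𝐒_C and 𝐒_R, and the full 𝐀 is touched only through the sketched product 𝐒_C^T 𝐀 𝐒_R):

```python
import numpy as np

def fast_cur_U(A, C, R, s_c, s_r, rng):
    # U~ = pinv(S_C^T C) (S_C^T A S_R) pinv(R S_R)
    m, n = A.shape
    S_C = rng.standard_normal((m, s_c)) / np.sqrt(s_c)
    S_R = rng.standard_normal((n, s_r)) / np.sqrt(s_r)
    return np.linalg.pinv(S_C.T @ C) @ (S_C.T @ A @ S_R) @ np.linalg.pinv(R @ S_R)
```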

  33. Example: 𝐀 with m = 1920 and n = 1168; 𝐂 and 𝐑 with c = r = 100, by uniform sampling
      [Image panels: Original; Type 1: Fast CX; Type 2: Optimal CUR;
       Type 3: Fast CUR (s_c = 2c, s_r = 2r); Type 3: Fast CUR (s_c = 4c, s_r = 4r)]

  34. Conclusions
      • Approximate truncated SVD
      • CX decomposition
      • CUR decomposition (3 types)
      • Fast CUR is the best

  35. Outline
      • CX Decomposition & Approximate SVD
      • CUR Decomposition
      • SPSD Matrix Approximation

  36. Motivation 1: Kernel Matrix
      • Given n samples 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and a kernel function κ(⋅,⋅)
      • E.g. the Gaussian RBF kernel:
        κ(𝐱_i, 𝐱_j) = exp(−‖𝐱_i − 𝐱_j‖₂² / σ²)
      • Computing the kernel matrix 𝐊 ∈ ℝ^{n×n}, where k_ij = κ(𝐱_i, 𝐱_j), costs O(n²d) time
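
The O(n²d) cost is easy to see from a vectorized implementation (a standard construction shown for illustration, not code from the slides):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    # K[i, j] = exp(-||x_i - x_j||_2^2 / sigma^2) for an n x d data matrix X.
    # The Gram product X @ X.T alone already costs O(n^2 d).
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    return np.exp(-np.maximum(D2, 0.0) / sigma ** 2)  # clip tiny negatives from rounding
```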
