Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation - PowerPoint PPT Presentation
Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation
Shusen Wang
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
CX Decomposition
• Given any matrix A ∈ ℝ^{m×n}
• The CX decomposition of A:
  1. Sketching: C = AS ∈ ℝ^{m×c}
  2. Find X such that A ≈ CX
• E.g. X★ = argmin_X ‖A − CX‖_F² = C†A
• It costs O(mnc) time
• CX decomposition ⇔ approximate SVD
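The two steps above fit in a few lines of NumPy. This is a minimal sketch, not the deck's own code: uniform column sampling stands in for the sketching matrix S, and the sizes (m, n, c) and the rank-10 test matrix are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, c = 100, 80, 20
A = rng.standard_normal((m, 10)) @ rng.standard_normal((10, n))  # rank-10 test matrix

# Step 1 (sketching): C = A S, where S here selects c columns uniformly at random
cols = rng.choice(n, size=c, replace=False)
C = A[:, cols]                            # shape (m, c)

# Step 2: the optimal X for min_X ||A - C X||_F^2 is X* = C^+ A
X = np.linalg.pinv(C) @ A                 # shape (c, n)

err = np.linalg.norm(A - C @ X, 'fro')    # tiny here, since rank(A) <= c
```

Because the test matrix has rank 10 ≤ c, the sampled columns span its column space almost surely, so the residual is essentially zero; for general A the residual is controlled by the sketch-size guarantees quoted in the deck.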
CX Decomposition
• Let the sketching matrix S ∈ ℝ^{n×c} be defined as in the table. Then
  min_X ‖A − CX‖_F² ≤ (1 + ε) ‖A − A_k‖_F²
  provided c is at least:
  - Uniform sampling: O(μk log k + μk/ε)
  - Leverage score sampling: O(k log k + k/ε)
  - Gaussian projection: O(k + k/ε)
  - SRHT: O((k + log n)(log k + 1/ε))
  - Count sketch: O(k²/ε + k)
• μ is the column coherence of A
CX Decomposition ⇔ Approximate SVD
• CX decomposition ⇔ approximate SVD: given A ≈ CX, compute
  - SVD: C = U_C Σ_C V_C^T ∈ ℝ^{m×c} (time O(mc²))
  - Let Z = Σ_C V_C^T X ∈ ℝ^{c×n} (time O(nc²))
  - SVD: Z = U_Z Σ_Z V_Z^T ∈ ℝ^{c×n} (time O(nc²))
  - Then A ≈ CX = U_C Σ_C V_C^T X = U_C Z = (U_C U_Z) Σ_Z V_Z^T (time O(mc²))
• Done! Approximate rank-c SVD: A ≈ (U_C U_Z) Σ_Z V_Z^T
  - U_C U_Z: m×c matrix with orthonormal columns
  - Σ_Z: c×c diagonal matrix
  - V_Z^T: c×n matrix with orthonormal rows
• Time cost: O(mc² + nc² + nc² + mc²) = O(mc² + nc²)
CX Decomposition ⇔ Approximate SVD
• Given A ∈ ℝ^{m×n} and C ∈ ℝ^{m×c}, the approximate SVD costs
  - O(mnc) time
  - O(mc + nc) memory
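The four-step conversion from a CX decomposition to an approximate SVD can be sketched as follows; the sizes and the uniform column sampling are illustrative assumptions, not fixed by the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, c = 100, 80, 15
A = rng.standard_normal((m, 8)) @ rng.standard_normal((8, n))  # rank-8 test matrix

# CX decomposition: C = A S (uniform column sampling), X = C^+ A
C = A[:, rng.choice(n, size=c, replace=False)]
X = np.linalg.pinv(C) @ A

# Step 1: thin SVD of C (m x c), O(m c^2) time
U_C, S_C, VT_C = np.linalg.svd(C, full_matrices=False)

# Step 2: Z = Sigma_C V_C^T X, a small c x n matrix, O(n c^2) time
Z = (S_C[:, None] * VT_C) @ X

# Step 3: SVD of the small matrix Z, O(n c^2) time
U_Z, S_Z, VT_Z = np.linalg.svd(Z, full_matrices=False)

# Step 4: left singular vectors U = U_C U_Z, O(m c^2) time
U = U_C @ U_Z

# The factorization (U, S_Z, VT_Z) reproduces C X exactly
approx = U @ (S_Z[:, None] * VT_Z)
err = np.linalg.norm(approx - C @ X, 'fro')
```

Note that no SVD of the full m×n matrix is ever taken: only the two skinny factors C and Z are decomposed.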
CX Decomposition
• The CX decomposition of A ∈ ℝ^{m×n}
• Optimal solution: X★ = argmin_X ‖A − CX‖_F² = C†A
• How to make it more efficient?
• It is a regression problem!
Fast CX Decomposition
• Fast CX [Drineas, Mahoney, Muthukrishnan, 2008] [Clarkson & Woodruff, 2013]
• Draw another sketching matrix T ∈ ℝ^{m×s}
• Compute X̃ = argmin_X ‖T^T A − T^T C X‖_F² = (T^T C)† T^T A
• Time cost: O(ncs) + TimeOfSketch
• When s = Õ(c/ε),
  ‖A − CX̃‖_F² ≤ (1 + ε) · min_X ‖A − CX‖_F²
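The sketched regression above can be sketched in NumPy; a Gaussian projection is used for T as one valid choice, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, c, s = 2000, 300, 30, 120
A = rng.standard_normal((m, 20)) @ rng.standard_normal((20, n))  # rank-20 test matrix
C = A[:, rng.choice(n, size=c, replace=False)]                   # C = A S_C

# Second sketch T (m x s); a Gaussian projection is one valid choice
T = rng.standard_normal((m, s)) / np.sqrt(s)

# Sketched regression: min_X ||T^T A - T^T C X||_F^2, solved by a small pseudo-inverse
X_fast = np.linalg.pinv(T.T @ C) @ (T.T @ A)

err_fast = np.linalg.norm(A - C @ X_fast, 'fro')
err_opt = np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A), 'fro')  # exact solution, for reference
```

The pseudo-inverse is taken of the small s×c matrix T^T C, so the solve never touches all m rows of the regression problem.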
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
CUR Decomposition
• Sketching:
  - C = A S_C ∈ ℝ^{m×c}
  - R = S_R^T A ∈ ℝ^{r×n}
• Find U such that C U R ≈ A
• CUR ⇔ approximate SVD, in the same way as “CX ⇔ approximate SVD”
• 3 types of U
CUR Decomposition
• Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:
  U = (S_R^T A S_C)† = (S_R^T C)†
• Recall the fast CX decomposition with T = S_R:
  A ≈ C X̃ = C (S_R^T C)† S_R^T A = C U R
• They’re equivalent: C X̃ = C U R
• Require c = O(k/ε) and r = O(c/ε) such that
  ‖A − C U R‖_F² ≤ (1 + ε) ‖A − A_k‖_F²
CUR Decomposition
• Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]: U = (S_R^T A S_C)†
• Efficient
  - O(rc²) + TimeOfSketch
• Loose bound
  - Sketch size ∝ ε⁻²
• Bad empirical performance
CUR Decomposition
• Type 2: Optimal CUR
  U★ = argmin_U ‖A − C U R‖_F² = C† A R†
• Theory [W & Zhang, 2013], [Boutsidis & Woodruff, 2014]:
  - C and R are selected by the adaptive sampling algorithm
  - c = O(k/ε) and r = O(k/ε)
  - ‖A − C U★ R‖_F² ≤ (1 + ε) ‖A − A_k‖_F²
CUR Decomposition
• Type 2: Optimal CUR
  U★ = argmin_U ‖A − C U R‖_F² = C† A R†
• Inefficient
  - O(mnc) + TimeOfSketch
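The optimal U is a one-liner, and the code makes the cost problem visible: forming C†A touches the full m×n matrix. A hedged sketch with uniform sampling and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, c, r = 120, 90, 25, 25
A = rng.standard_normal((m, 15)) @ rng.standard_normal((15, n))  # rank-15 test matrix

C = A[:, rng.choice(n, size=c, replace=False)]   # column sketch C = A S_C
R = A[rng.choice(m, size=r, replace=False), :]   # row sketch R = S_R^T A

# Optimal intersection matrix: U* = C^+ A R^+; the product C^+ A already costs O(mnc)
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)

err = np.linalg.norm(A - C @ U @ R, 'fro')
```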
CUR Decomposition
• Type 3: Fast CUR [W, Zhang, Zhang, 2015]
• Draw 2 sketching matrices P_C and P_R
• Solve the problem
  Ũ = argmin_U ‖P_C^T (A − C U R) P_R‖_F² = (P_C^T C)† (P_C^T A P_R) (R P_R)†
• Intuition?
CUR Decomposition
• The optimal U matrix is obtained by the optimization problem
  U★ = argmin_U ‖C U R − A‖_F²
CUR Decomposition • Approximately solve the optimization problem, e.g. by column selection
CUR Decomposition • Solve the small-scale problem
CUR Decomposition
• Type 3: Fast CUR [W, Zhang, Zhang, 2015]
• Draw 2 sketching matrices P_C ∈ ℝ^{m×s_c} and P_R ∈ ℝ^{n×s_r}
• Solve the problem
  Ũ = argmin_U ‖P_C^T (A − C U R) P_R‖_F² = (P_C^T C)† (P_C^T A P_R) (R P_R)†
• Theory
  - s_c = O(c/ε) and s_r = O(r/ε)
  - ‖A − C Ũ R‖_F² ≤ (1 + ε) · min_U ‖A − C U R‖_F²
CUR Decomposition
• Type 3: Fast CUR [W, Zhang, Zhang, 2015]
• Draw 2 sketching matrices P_C ∈ ℝ^{m×s_c} and P_R ∈ ℝ^{n×s_r}
• Solve the problem
  Ũ = argmin_U ‖P_C^T (A − C U R) P_R‖_F² = (P_C^T C)† (P_C^T A P_R) (R P_R)†
• Efficient
  - O(s_c s_r (c + r)) + TimeOfSketch
• Good empirical performance
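The fast-CUR solve can be sketched as below; Gaussian matrices are used for P_C and P_R as one valid choice, and all sizes are illustrative assumptions. Only small matrices are ever pseudo-inverted.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, c, r = 150, 120, 20, 20
sc, sr = 60, 60                                   # sketch sizes s_c > c, s_r > r
A = rng.standard_normal((m, 12)) @ rng.standard_normal((12, n))  # rank-12 test matrix

C = A[:, rng.choice(n, size=c, replace=False)]    # C = A S_C
R = A[rng.choice(m, size=r, replace=False), :]    # R = S_R^T A

# Two extra sketches P_C (m x s_c) and P_R (n x s_r); Gaussian is one valid choice
P_C = rng.standard_normal((m, sc)) / np.sqrt(sc)
P_R = rng.standard_normal((n, sr)) / np.sqrt(sr)

# U~ = (P_C^T C)^+ (P_C^T A P_R) (R P_R)^+ : every pseudo-inverse is of a small matrix
U_fast = np.linalg.pinv(P_C.T @ C) @ (P_C.T @ A @ P_R) @ np.linalg.pinv(R @ P_R)

err = np.linalg.norm(A - C @ U_fast @ R, 'fro')
```

In this exact low-rank setting the sketched solution coincides with the optimal U, so the residual is essentially zero; in general it is within a (1 + ε) factor of optimal.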
[Figure: image reconstruction comparison]
• A: m = 1920, n = 1168; C and R: c = r = 100, uniform sampling
• Panels: Original; Type 2: Optimal CUR; Type 1: Fast CX; Type 3: Fast CUR (s_c = 2c, s_r = 2r); Type 3: Fast CUR (s_c = 4c, s_r = 4r)
Conclusions
• Approximate truncated SVD
• CX decomposition
• CUR decomposition (3 types)
• Fast CUR is the best
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
Motivation 1: Kernel Matrix
• Given n samples x₁, …, x_n ∈ ℝ^d and a kernel function κ(·,·)
• E.g. the Gaussian RBF kernel
  κ(x_i, x_j) = exp(−‖x_i − x_j‖² / σ²)
• Computing the kernel matrix K ∈ ℝ^{n×n}, where k_ij = κ(x_i, x_j), costs O(n²d) time
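The O(n²d) cost shows up directly when forming the kernel matrix; a minimal NumPy sketch (sample count, dimension, and bandwidth σ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, sigma = 200, 5, 1.0                 # sample count, dimension, bandwidth
X = rng.standard_normal((n, d))           # rows are the samples x_1, ..., x_n

# All pairwise squared distances via ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>;
# the X @ X.T product dominates, giving O(n^2 d) total time
sq = (X ** 2).sum(axis=1)
D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

# Gaussian RBF kernel matrix K, with k_ij = exp(-||x_i - x_j||^2 / sigma^2)
K = np.exp(-D2 / sigma ** 2)
```

The clamp to zero guards against tiny negative squared distances from floating-point round-off; K is symmetric with unit diagonal, and for the RBF kernel it is SPSD, which is what the approximation methods in this section exploit.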