Randomized SVD, CUR Decomposition, and SPSD Matrix Approximation
Shusen Wang
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
CX Decomposition
• Given any matrix 𝐀 ∈ ℝ^{m×n}.
• The CX decomposition of 𝐀:
  1. Sketching: 𝐂 = 𝐀𝐏 ∈ ℝ^{m×c}.
  2. Find 𝐗 such that 𝐀 ≈ 𝐂𝐗, e.g. 𝐗⋆ = argmin_𝐗 ‖𝐀 − 𝐂𝐗‖_F² = 𝐂†𝐀.
• It costs O(mnc) time. (A minimal code sketch follows below.)
• CX decomposition ⇔ approximate SVD (derived on the following slides).
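A minimal NumPy sketch of the two steps above, assuming uniform column sampling stands in for the sketching matrix 𝐏 (any sketch from the next slide would do); the helper name cx_decomposition is illustrative:

```python
import numpy as np

def cx_decomposition(A, c, rng=None):
    """Return C (m x c) and the optimal X = pinv(C) @ A, so that A ~= C @ X."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    idx = rng.choice(n, size=c, replace=False)  # uniform column sampling as the sketch P
    C = A[:, idx]                               # step 1: sketching, C = A P
    X = np.linalg.pinv(C) @ A                   # step 2: X* = argmin_X ||A - C X||_F^2
    return C, X

# Usage: relative Frobenius error of the CX approximation
A = np.random.default_rng(1).standard_normal((500, 200))
C, X = cx_decomposition(A, c=50)
err = np.linalg.norm(A - C @ X) / np.linalg.norm(A)
```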
CX Decomposition
• Let the sketching matrix 𝐏 ∈ ℝ^{n×c} be drawn as below; then
  min_𝐗 ‖𝐀 − 𝐂𝐗‖_F² ≤ (1 + ε) ‖𝐀 − 𝐀_k‖_F²
  holds for the following sketch sizes:
  • Uniform sampling: c = O(μk(log k + 1/ε)), where μ is the column coherence of 𝐀
  • Leverage score sampling: c = O(k(log k + 1/ε))
  • Gaussian projection: c = O(k/ε)
  • SRHT: c = O((k + log n)(log k + 1/ε))
  • Count sketch: c = O((k² + k)/ε)
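Of the options above, leverage score sampling is the least obvious to implement; here is a hedged NumPy sketch. Computing exact scores needs a full SVD of 𝐀 (in practice they are themselves approximated), and the rescaling makes 𝐂𝐂ᵀ an unbiased estimate of 𝐀𝐀ᵀ:

```python
import numpy as np

def leverage_score_sampling(A, k, c, rng=None):
    """Sample c columns of A with probabilities given by rank-k leverage scores."""
    rng = np.random.default_rng(0) if rng is None else rng
    _, _, Vt = np.linalg.svd(A, full_matrices=False)  # exact scores need a full SVD
    lev = np.sum(Vt[:k, :] ** 2, axis=0)              # leverage score of each column
    p = lev / lev.sum()                               # sampling probabilities (scores sum to k)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx] / np.sqrt(c * p[idx])            # rescaling keeps E[C C^T] = A A^T
```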
CX Decomposition ⇔ Approximate SVD
• Given the CX decomposition 𝐀 ≈ 𝐂𝐗:
  1. SVD of the sketch: 𝐂 = 𝐔_C 𝚺_C 𝐕_Cᵀ ∈ ℝ^{m×c}. Time: O(mc²).
  2. Let 𝐙 = 𝚺_C 𝐕_Cᵀ 𝐗 ∈ ℝ^{c×n}. Time: O(nc²).
  3. SVD of the small matrix: 𝐙 = 𝐔_Z 𝚺_Z 𝐕_Zᵀ. Time: O(nc²).
  4. Multiply: 𝐔_C 𝐔_Z ∈ ℝ^{m×c}. Time: O(mc²).
• Done! Approximate rank-c SVD:
  𝐀 ≈ 𝐂𝐗 = 𝐔_C 𝚺_C 𝐕_Cᵀ 𝐗 = 𝐔_C 𝐙 = (𝐔_C 𝐔_Z) 𝚺_Z 𝐕_Zᵀ,
  where 𝐔_C 𝐔_Z is an m×c matrix with orthonormal columns, 𝚺_Z is diagonal, and 𝐕_Zᵀ is a c×n matrix with orthonormal rows.
• Total time: O(mc² + nc² + nc² + mc²) = O(mc² + nc²).
CX Decomposition ⇔ Approximate SVD
• Given 𝐀 ∈ ℝ^{m×n} and 𝐂 ∈ ℝ^{m×c}, the approximate SVD costs
  • O(mnc) time (dominated by computing 𝐗 = 𝐂†𝐀)
  • O(mc + nc) memory.
• A code sketch of the conversion follows below.
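A NumPy sketch of the CX-to-SVD conversion from the preceding slides; only SVDs of c-column or c-row matrices are computed, never an SVD of 𝐀 itself:

```python
import numpy as np

def cx_to_svd(C, X):
    """Turn A ~= C @ X into an approximate SVD A ~= U @ np.diag(s) @ Vt."""
    Uc, sc, Vct = np.linalg.svd(C, full_matrices=False)  # SVD of the m x c sketch: O(m c^2)
    Z = (sc[:, None] * Vct) @ X                          # Z = Sigma_C V_C^T X, a c x n matrix: O(n c^2)
    Uz, sz, Vzt = np.linalg.svd(Z, full_matrices=False)  # SVD of the small matrix: O(n c^2)
    return Uc @ Uz, sz, Vzt                              # U = U_C U_Z has orthonormal columns: O(m c^2)
```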
CX Decomposition
• The CX decomposition of 𝐀 ∈ ℝ^{m×n}:
  • Optimal solution: 𝐗⋆ = argmin_𝐗 ‖𝐀 − 𝐂𝐗‖_F² = 𝐂†𝐀. A regression problem!
• How to make it more efficient?
Fast CX Decomposition
• Fast CX [Drineas, Mahoney, Muthukrishnan, 2008] [Clarkson & Woodruff, 2013]:
  • Draw another sketching matrix 𝐒 ∈ ℝ^{m×s}.
  • Compute 𝐗̃ = argmin_𝐗 ‖𝐒ᵀ(𝐀 − 𝐂𝐗)‖_F² = (𝐒ᵀ𝐂)†(𝐒ᵀ𝐀).
  • Time cost: O(ncs) + TimeOfSketch.
  • When s = O(c/ε),
    ‖𝐀 − 𝐂𝐗̃‖_F² ≤ (1 + ε) · min_𝐗 ‖𝐀 − 𝐂𝐗‖_F².
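A sketch of fast CX, again with uniform row sampling standing in for 𝐒; note the (1+ε) guarantee above assumes a suitable sketch (e.g. leverage score sampling or SRHT), and plain uniform sampling only works well when the row coherence is low:

```python
import numpy as np

def fast_cx(A, C, s, rng=None):
    """Sketched regression: X ~= argmin_X ||S^T (A - C X)||_F^2."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = rng.choice(A.shape[0], size=s, replace=False)  # S^T keeps s rows
    # lstsq solves the small s x c regression, i.e. X = (S^T C)^+ (S^T A)
    return np.linalg.lstsq(C[idx, :], A[idx, :], rcond=None)[0]
```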
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
CUR Decomposition
• Sketching:
  • 𝐂 = 𝐀𝐏_C ∈ ℝ^{m×c}
  • 𝐑 = 𝐏_Rᵀ𝐀 ∈ ℝ^{r×n}
• Find 𝐔 such that 𝐂𝐔𝐑 ≈ 𝐀.
• CUR ⇔ approximate SVD, in the same way as "CX ⇔ approximate SVD".
• 3 types of 𝐔.
CUR Decomposition
• Type 1 [Drineas, Mahoney, Muthukrishnan, 2008]:
  𝐔 = (𝐏_Rᵀ𝐀𝐏_C)†, the pseudoinverse of the intersection of 𝐂 and 𝐑.
• Recall the fast CX decomposition with sketch 𝐏_R:
  𝐗̃ = (𝐏_Rᵀ𝐂)†(𝐏_Rᵀ𝐀) = (𝐏_Rᵀ𝐀𝐏_C)†𝐑 = 𝐔𝐑.
• They're equivalent: 𝐂𝐗̃ = 𝐂𝐔𝐑.
• Require c = O(k/ε²) and r = O(c/ε²) such that
  ‖𝐀 − 𝐂𝐔𝐑‖_F² ≤ (1 + ε) ‖𝐀 − 𝐀_k‖_F².
CUR Decomposition
• Type 1: 𝐔 = (𝐏_Rᵀ𝐀𝐏_C)†
  • Efficient: O(rc²) + TimeOfSketch
  • Loose bound: sketch size ∝ ε⁻²
  • Bad empirical performance.
• A code sketch follows below.
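A minimal sketch of Type-1 CUR under uniform sampling; beyond the sampling itself, the only cost is the pseudoinverse of the r×c intersection matrix:

```python
import numpy as np

def cur_type1(A, c, r, rng=None):
    """Type-1 CUR: U is the pseudoinverse of the row-column intersection of A."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    cols = rng.choice(n, size=c, replace=False)
    rows = rng.choice(m, size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])  # U = (P_R^T A P_C)^+, an r x c pinv: O(r c^2)
    return C, U, R
```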
CUR Decomposition
• Type 2: Optimal CUR
  𝐔⋆ = argmin_𝐔 ‖𝐀 − 𝐂𝐔𝐑‖_F² = 𝐂†𝐀𝐑†
• Theory [W & Zhang, 2013], [Boutsidis & Woodruff, 2014]:
  • 𝐂 and 𝐑 are selected by the adaptive sampling algorithm
  • With c = O(k/ε) and r = O(k/ε),
    ‖𝐀 − 𝐂𝐔⋆𝐑‖_F² ≤ (1 + ε) ‖𝐀 − 𝐀_k‖_F².
CUR Decomposition
• Type 2: Optimal CUR, 𝐔⋆ = 𝐂†𝐀𝐑†
  • Inefficient: O(mnc) + TimeOfSketch (a one-line sketch follows below).
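The closed-form Type-2 solution is a one-liner; the two pseudoinverses are cheap, and the O(mnc) bottleneck is the multiplication by the full matrix 𝐀:

```python
import numpy as np

def cur_optimal_u(A, C, R):
    """Type-2 (optimal) U: argmin_U ||A - C U R||_F^2 = C^+ A R^+."""
    return np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # multiplying by A dominates: O(m n c)
```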
CUR Decomposition
• Type 3: Fast CUR [W, Zhang, Zhang, 2015]
  • Draw 2 sketching matrices 𝐒_C and 𝐒_R.
  • Solve the sketched problem
    𝐔̃ = argmin_𝐔 ‖𝐒_Cᵀ(𝐀 − 𝐂𝐔𝐑)𝐒_R‖_F² = (𝐒_Cᵀ𝐂)†(𝐒_Cᵀ𝐀𝐒_R)(𝐑𝐒_R)†.
• Intuition?
  • The optimal 𝐔 is defined by the optimization problem 𝐔⋆ = argmin_𝐔 ‖𝐂𝐔𝐑 − 𝐀‖_F².
  • Approximately solve this problem by compressing both dimensions, e.g. by column selection.
  • Then solve the resulting small-scale problem exactly.
CUR Decomposition
• Type 3: Fast CUR [W, Zhang, Zhang, 2015]
  • Draw 2 sketching matrices 𝐒_C ∈ ℝ^{m×s_c} and 𝐒_R ∈ ℝ^{n×s_r}.
  • Solve 𝐔̃ = argmin_𝐔 ‖𝐒_Cᵀ(𝐀 − 𝐂𝐔𝐑)𝐒_R‖_F² = (𝐒_Cᵀ𝐂)†(𝐒_Cᵀ𝐀𝐒_R)(𝐑𝐒_R)†.
• Theory: with s_c = O(c/ε) and s_r = O(r/ε),
  ‖𝐀 − 𝐂𝐔̃𝐑‖_F² ≤ (1 + ε) · min_𝐔 ‖𝐀 − 𝐂𝐔𝐑‖_F².
• Efficient: O(s_c s_r (c + r)) + TimeOfSketch.
• Good empirical performance (a code sketch follows below).
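A sketch of the fast CUR 𝐔̃, with uniform sampling standing in for 𝐒_C and 𝐒_R; every pseudoinverse here is of a small sketched matrix, so nothing scales with mn:

```python
import numpy as np

def fast_cur_u(A, C, R, s_c, s_r, rng=None):
    """Type-3 U: solve min_U ||S_C^T (A - C U R) S_R||_F^2 in closed form."""
    rng = np.random.default_rng(0) if rng is None else rng
    m, n = A.shape
    ri = rng.choice(m, size=s_c, replace=False)  # S_C^T keeps s_c rows
    ci = rng.choice(n, size=s_r, replace=False)  # S_R keeps s_r columns
    # U = (S_C^T C)^+ (S_C^T A S_R) (R S_R)^+
    return np.linalg.pinv(C[ri, :]) @ A[np.ix_(ri, ci)] @ np.linalg.pinv(R[:, ci])
```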
[Figure: image-approximation experiment]
𝐀: a 1920×1168 image. 𝐂 and 𝐑: c = r = 100, by uniform sampling.
Panels: Original; Type 2: Optimal CUR; Type 1: Fast CX; Type 3: Fast CUR (s_c = 2c, s_r = 2r); Type 3: Fast CUR (s_c = 4c, s_r = 4r).
Conclusions
• Approximate truncated SVD
• CX decomposition
• CUR decomposition (3 types)
• Fast CUR is the best of the three.
Outline
• CX Decomposition & Approximate SVD
• CUR Decomposition
• SPSD Matrix Approximation
Motivation 1: Kernel Matrix
• Given n samples 𝐱₁, …, 𝐱_n ∈ ℝ^d and a kernel function κ(·,·).
• E.g. the Gaussian RBF kernel:
  κ(𝐱_i, 𝐱_j) = exp(−‖𝐱_i − 𝐱_j‖₂² / σ²).
• Computing the kernel matrix 𝐊 ∈ ℝ^{n×n}, where k_ij = κ(𝐱_i, 𝐱_j), costs O(n²d) time.
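To make the O(n²d) cost concrete, here is a direct NumPy computation of the RBF kernel matrix (sigma is the bandwidth; the clipping guards against tiny negative distances from floating-point rounding):

```python
import numpy as np

def rbf_kernel(X, sigma):
    """X is n x d; returns the n x n matrix K with k_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq = np.sum(X ** 2, axis=1)                        # squared row norms, O(n d)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T     # all pairwise squared distances, O(n^2 d)
    return np.exp(-np.maximum(d2, 0.0) / sigma ** 2)   # clip rounding noise, then exponentiate
```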