Matrix Sketching over Sliding Windows
Zhewei Wei (1), Xuancheng Liu (1), Feifei Li (2), Shuo Shang (1), Xiaoyong Du (1), Ji-Rong Wen (1)
(1) School of Information, Renmin University of China
(2) School of Computing, The University of Utah
Matrix data
• Modern data sets are modeled as large matrices.
• Think of $A \in \mathbb{R}^{n \times d}$ as $n$ rows in $\mathbb{R}^d$.

Data              Rows            Columns         d              n
Textual           Documents       Words           10^5 – 10^7    > 10^10
Actions           Users           Types           10^1 – 10^4    > 10^7
Visual            Images          Pixels, SIFT    10^5 – 10^6    > 10^8
Audio             Songs, tracks   Frequencies     10^5 – 10^6    > 10^8
Machine Learning  Examples        Features        10^2 – 10^4    > 10^6
Financial         Prices          Items, Stocks   10^3 – 10^5    > 10^6
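Singular Value Decomposition (SVD)
$A = U \Sigma V^T$, with $A \in \mathbb{R}^{n \times d}$, $U \in \mathbb{R}^{n \times n}$, $\Sigma \in \mathbb{R}^{n \times d}$ and $V \in \mathbb{R}^{d \times d}$:
$$
\begin{bmatrix} a_{11} & \dots & a_{1d} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nd} \end{bmatrix}
=
\begin{bmatrix} u_{11} & \dots & u_{1n} \\ \vdots & & \vdots \\ u_{n1} & \dots & u_{nn} \end{bmatrix}
\times
\begin{bmatrix} \delta_1 & 0 & \dots & 0 \\ 0 & \delta_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \delta_d \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & & \vdots \end{bmatrix}
\times
\begin{bmatrix} v_{11} & \dots & v_{d1} \\ \vdots & & \vdots \\ v_{1d} & \dots & v_{dd} \end{bmatrix}
$$
• Principal component analysis (PCA)
• K-means clustering
• Latent semantic indexing (LSI)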
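SVD & Eigenvalue decomposition
Covariance matrix: $A^T A \in \mathbb{R}^{d \times d}$, the product of the $d \times n$ matrix $A^T$ and the $n \times d$ matrix $A$.
$$
A^T A = V \Sigma^2 V^T
=
\begin{bmatrix} v_{11} & \dots & v_{1d} \\ \vdots & & \vdots \\ v_{d1} & \dots & v_{dd} \end{bmatrix}
\times
\begin{bmatrix} \delta_1^2 & 0 & \dots & 0 \\ 0 & \delta_2^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \delta_d^2 \end{bmatrix}
\times
\begin{bmatrix} v_{11} & \dots & v_{d1} \\ \vdots & & \vdots \\ v_{1d} & \dots & v_{dd} \end{bmatrix}
$$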
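The link between the two decompositions can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random test matrix and all variable names are illustrative choices and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 4))          # n = 50 rows, d = 4 columns

# Thin SVD: A = U diag(delta) V^T
U, delta, Vt = np.linalg.svd(A, full_matrices=False)

# Eigendecomposition of the symmetric d x d matrix A^T A
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

# eigh returns eigenvalues in ascending order, svd returns singular
# values in descending order, so reverse one of them before comparing.
squared_singular_values = delta ** 2
eigvals_desc = eigvals[::-1]

# Rebuild A^T A as V Sigma^2 V^T from the SVD factors alone.
reconstruction = Vt.T @ np.diag(delta ** 2) @ Vt
```

The eigenvalues of $A^T A$ should match the squared singular values of $A$ up to floating-point error, and `reconstruction` should match `A.T @ A`.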
Matrix Sketching
• Computing SVD is slow (and offline).
• Matrix sketching: approximate a large matrix $A \in \mathbb{R}^{n \times d}$ with $B \in \mathbb{R}^{\ell \times d}$, $\ell \ll n$, in an online fashion.
• Row-update stream: each update receives a row $a_i$.
• Covariance error [Liberty2013, Ghashami2014, Woodruff2016]: $\|A^T A - B^T B\| / \|A\|_F^2 \le \varepsilon$.
• Feature hashing [Weinberger2009], random projection [Papadimitriou2011], …
• Frequent Directions (FD) [Liberty2013]: $B \in \mathbb{R}^{\ell \times d}$, $\ell = 1/\varepsilon$, s.t. covariance error $\le \varepsilon$.
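To make the Frequent Directions idea concrete, here is a minimal sketch, assuming NumPy and assuming $\ell \le d$. It uses the simple variant that runs an SVD on every row and shrinks by the smallest squared singular value, which is not necessarily the exact procedure of [Liberty2013]. The function names `frequent_directions` and `covariance_error` are hypothetical, and the unsubscripted norm in the covariance error is taken to be the spectral norm.

```python
import numpy as np

def frequent_directions(rows, ell):
    """Return an ell x d sketch B of the streamed rows (assumes ell <= d)."""
    B = None
    for a in rows:
        a = np.asarray(a, dtype=float)
        if B is None:
            B = np.zeros((ell, a.shape[0]))
        B[-1] = a                          # the last row is zero before this write
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Shrink every squared singular value by the smallest one.
        shrunk = np.sqrt(np.maximum(s ** 2 - s[-1] ** 2, 0.0))
        B = shrunk[:, None] * Vt           # the last row becomes zero again
    return B

def covariance_error(A, B):
    """||A^T A - B^T B||_2 / ||A||_F^2, with the spectral norm on top."""
    return np.linalg.norm(A.T @ A - B.T @ B, 2) / np.linalg.norm(A, "fro") ** 2

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20))         # n = 500 rows, d = 20 columns
ell = 10
B = frequent_directions(A, ell)
err = covariance_error(A, B)
```

For this variant the covariance error is at most $1/\ell$: each shrink step removes $\ell$ times the shrink amount from $\|B\|_F^2$, so the total shrinkage is bounded by $\|A\|_F^2 / \ell$. Running an SVD per row is slow; practical implementations usually buffer several rows between shrink steps.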
Matrix Sketching over Sliding Windows
• Each row is associated with a timestamp.
• Maintain $B_W$ for $A_W$: the rows in sliding window $W$.
• Covariance error: $\|A_W^T A_W - B_W^T B_W\| / \|A_W\|_F^2 \le \varepsilon$.
• Sequence-based window: the past $N$ rows. $A_W$: $N$ rows.
• Time-based window: rows in a past time period $\Delta$. $A_W$: rows in $\Delta$ time units.
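The two window types differ in which rows form $A_W$. The sketch below illustrates this on a small stream of `(timestamp, row)` pairs. The helper names and the half-open interval convention `(now - delta, now]` are assumptions made for this example.

```python
def sequence_window(stream, N):
    """A_W for a sequence-based window: the N most recent rows (N >= 1)."""
    return [row for _, row in stream[-N:]]

def time_window(stream, now, delta):
    """A_W for a time-based window: rows with timestamp in (now - delta, now]."""
    return [row for t, row in stream if now - delta < t <= now]

# Rows arrive at irregular times.
stream = [
    (1, [1.0, 0.0]),
    (2, [0.0, 2.0]),
    (5, [3.0, 0.0]),
    (6, [0.0, 1.0]),
    (9, [1.0, 1.0]),
]

seq_rows = sequence_window(stream, N=3)             # always exactly 3 rows
time_rows = time_window(stream, now=9, delta=4)     # row count depends on arrivals
```

A sequence-based window always holds exactly $N$ rows once the stream is long enough, while the number of rows in a time-based window varies with the arrival rate.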
Motivation 1: Sliding windows vs. unbounded streams
• The sliding window model is more appropriate than the unbounded stream model in many real-world applications.
• This is particularly so in the areas of data analysis where matrix sketching techniques are widely used.
• Applications:
  • Analyzing tweets from the past 24 hours.
  • Sliding window PCA for detecting changes and anomalies [Papadimitriou2006, Qahtan2015].
Motivation 2: Lower bound
• Unbounded stream solution: use $O(d^2)$ space to store $A^T A$. Update: $A^T A \leftarrow A^T A + a_i^T a_i$.
Theorem 4.1. An algorithm that returns $A^T A$ for any sequence-based sliding window must use $\Omega(Nd)$ bits of space.
• Matrix sketching is necessary for sliding windows, even when the dimension $d$ is small.
• Matrix sketching over sliding windows requires new techniques.
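The contrast between the two settings can be seen in a small sketch, assuming NumPy; the class names are hypothetical. The unbounded-stream tracker needs only the $d \times d$ matrix. The exact sliding-window tracker shown here is a naive baseline that has to remember the rows in the window so that an expiring row can be subtracted again.

```python
from collections import deque
import numpy as np

class StreamCovariance:
    """Unbounded stream: stores d x d numbers and no rows."""
    def __init__(self, d):
        self.C = np.zeros((d, d))

    def update(self, a):
        a = np.asarray(a, dtype=float)
        self.C += np.outer(a, a)           # A^T A <- A^T A + a_i^T a_i

class WindowCovariance:
    """Exact sequence-based window: keeps the last N rows to undo them."""
    def __init__(self, d, N):
        self.C = np.zeros((d, d))
        self.N = N
        self.rows = deque()

    def update(self, a):
        a = np.asarray(a, dtype=float)
        self.rows.append(a)
        self.C += np.outer(a, a)
        if len(self.rows) > self.N:        # the oldest row leaves the window
            old = self.rows.popleft()
            self.C -= np.outer(old, old)

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 3))
stream_cov = StreamCovariance(d=3)
window_cov = WindowCovariance(d=3, N=8)
for a in A:
    stream_cov.update(a)
    window_cov.update(a)
```

The window baseline stores $N$ rows of $d$ numbers each, which is in line with the $\Omega(Nd)$ lower bound for exact answers and is the cost that sketching aims to avoid.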
Three algorithms
• Sampling: sample $a_i$ with probability proportional to $\|a_i\|^2$ [Frieze2004]. Priority sampling [Efraimidis2006] + sliding window top-k.
• LM-FD: Exponential Histogram (logarithmic method) [Datar2002] + Frequent Directions.
• DI-FD: Dyadic interval techniques [Arasu2004] + Frequent Directions.

Sketches   Update                                                  Space                                                   Window            Interpretable?
Sampling   $\frac{d}{\varepsilon^2} \log\log NR$                   $\frac{d}{\varepsilon^2} \log NR$                       Sequence & time   Yes
LM-FD      $d \log \varepsilon NR$                                 $\frac{1}{\varepsilon^2} \log \varepsilon NR$           Sequence & time   No
DI-FD      $\frac{d}{\varepsilon} \log \frac{R}{\varepsilon}$      $\frac{R}{\varepsilon} \log \frac{R}{\varepsilon}$      Sequence          No
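The sampling idea can be illustrated with a naive sketch, assuming NumPy and assuming every row in the window is nonzero. It draws $\ell$ independent samples, each chosen as the row with the largest key $u^{1/w}$ where $w = \|a_i\|^2$, which is our reading of the priority scheme in [Efraimidis2006]. It then rescales each sampled row in the norm-squared sampling style of [Frieze2004]. It recomputes everything from the full window, so it does not show the space-efficient sliding window top-k structure. The function name is hypothetical.

```python
import numpy as np

def sample_window_sketch(A_W, ell, rng):
    """Naive with-replacement sampling sketch of the window rows A_W."""
    A_W = np.asarray(A_W, dtype=float)
    sq_norms = (A_W ** 2).sum(axis=1)      # w_i = ||a_i||^2, assumed > 0
    frob_sq = sq_norms.sum()               # ||A_W||_F^2
    sketch = []
    for _ in range(ell):
        u = rng.random(len(A_W))
        # Largest key u^(1/w) wins; compare log keys to avoid underflow.
        i = int(np.argmax(np.log(u) / sq_norms))
        # Rescale so that the expectation of B^T B equals A_W^T A_W.
        sketch.append(A_W[i] * np.sqrt(frob_sq / (ell * sq_norms[i])))
    return np.vstack(sketch)

rng = np.random.default_rng(2)
A_W = rng.standard_normal((40, 5))         # rows currently in the window
B_W = sample_window_sketch(A_W, ell=12, rng=rng)
```

Every row of `B_W` is a rescaled copy of a row of `A_W`, which is what makes a sampling sketch interpretable, and the rescaling makes $\|B_W\|_F^2$ equal to $\|A_W\|_F^2$.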
Three algorithms

Sketches   Update   Space                Window            Interpretable?
Sampling   Slow     Large                Sequence & time   Yes
LM-FD      Fast     Small                Sequence & time   No
DI-FD      Slow     Best for small $R$   Sequence          No

• Interpretable: rows of the sketch $B$ come from $A$.
• $R$: ratio between the maximum and minimum squared row norms.
Experiments: space vs. error
[Figure: three panels, $R = 8.35$, $R = 1$, $R = 90089$]
Experiments: time vs. space
[Figure: three panels, $R = 8.35$, $R = 1$, $R = 90089$]
Conclusions
• First attempt to tackle the sliding window matrix sketching problem.
• Lower bounds show that the sliding window model differs from the unbounded streaming model for the matrix sketching problem.
• We propose algorithms for both time-based and sequence-based windows, with theoretical guarantees and an experimental evaluation.
Thanks!