

  1. Matrix Sketching over Sliding Windows
  Zhewei Wei¹, Xuancheng Liu¹, Feifei Li², Shuo Shang¹, Xiaoyong Du¹, Ji-Rong Wen¹
  ¹ School of Information, Renmin University of China
  ² School of Computing, The University of Utah

  2. Matrix data
  • Modern data sets are modeled as large matrices. Think of A ∈ ℝ^(n×d) as n rows in ℝ^d.

  | Data             | Rows          | Columns       | n      | d         |
  |------------------|---------------|---------------|--------|-----------|
  | Textual          | Documents     | Words         | >10^10 | 10^5–10^7 |
  | Actions          | Users         | Types         | >10^7  | 10^1–10^4 |
  | Visual           | Images        | Pixels, SIFT  | >10^8  | 10^5–10^6 |
  | Audio            | Songs, tracks | Frequencies   | >10^8  | 10^5–10^6 |
  | Machine Learning | Examples      | Features      | >10^6  | 10^2–10^4 |
  | Financial        | Prices        | Items, Stocks | >10^6  | 10^3–10^5 |

  3. Singular Value Decomposition (SVD)
  • A = U Σ Vᵀ, where U and V have orthonormal columns and Σ is diagonal with the singular values σ₁ ≥ σ₂ ≥ … ≥ σ_d ≥ 0.
  • Used in: principal component analysis (PCA), k-means clustering, latent semantic indexing (LSI).
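The decomposition on this slide can be checked numerically. A minimal NumPy sketch (the random matrix and the rank k = 3 are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8))      # A in R^(n x d): n rows in R^d

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Rank-k truncation, the step underlying PCA / LSI: keep the top-k directions.
k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
```

Truncating to the top k singular values gives the best rank-k approximation of A in both spectral and Frobenius norm, which is why SVD is the reference point that sketching methods try to approximate cheaply.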

  4. SVD & eigenvalue decomposition
  • Covariance matrix: AᵀA = (U Σ Vᵀ)ᵀ (U Σ Vᵀ) = V Σ² Vᵀ.
  • The right singular vectors V are the eigenvectors of AᵀA, with eigenvalues σᵢ².
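The identity AᵀA = V Σ² Vᵀ is easy to verify directly; a small sketch (matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Covariance matrix A^T A equals V @ diag(s**2) @ V^T.
assert np.allclose(A.T @ A, Vt.T @ np.diag(s**2) @ Vt)

# Cross-check: eigenvalues of A^T A are the squared singular values.
eigvals = np.linalg.eigvalsh(A.T @ A)   # eigh/eigvalsh return ascending order
assert np.allclose(np.sort(eigvals)[::-1], s**2)
```

This is why the covariance (spectral-norm) error used on the next slides is the natural accuracy measure: a sketch B with BᵀB ≈ AᵀA preserves the eigenstructure that PCA and friends rely on.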

  5. Matrix Sketching
  • Computing the SVD is slow (and offline).
  • Matrix sketching: approximate a large matrix A ∈ ℝ^(n×d) with a sketch B ∈ ℝ^(ℓ×d), ℓ ≪ n, in an online fashion.
  • Row-update stream: each update receives a row aᵢ.
  • Covariance error [Liberty2013, Ghashami2014, Woodruff2016]: ||AᵀA − BᵀB||₂ / ||A||_F² ≤ ε.
  • Feature hashing [Weinberger2009], random projection [Papadimitriou2011], …
  • Frequent Directions (FD) [Liberty2013]: B ∈ ℝ^(ℓ×d) with ℓ = 1/ε rows, s.t. covariance error ≤ ε.
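A minimal per-row sketch of Frequent Directions, assuming ℓ ≤ d. For clarity it runs an SVD on every insertion; the practical variant buffers 2ℓ rows and shrinks only when the buffer fills, giving amortized O(dℓ) per row:

```python
import numpy as np

def frequent_directions(rows, ell):
    """Frequent Directions [Liberty 2013]: maintain an ell x d sketch B
    such that ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.
    Minimal per-row version (assumes ell <= d)."""
    d = rows.shape[1]
    B = np.zeros((ell, d))
    for a in rows:
        B[-1] = a                       # the last row is always zero before insert
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2              # smallest squared singular value
        s = np.sqrt(np.maximum(s**2 - delta, 0.0))
        B = s[:, None] * Vt             # last row is zeroed again by the shrink
    return B
```

The shrink step subtracts δ = σ_ℓ² from every squared singular value; each shrink removes ℓδ of Frobenius mass from B, so the accumulated spectral error Σδ is at most ||A||_F²/ℓ, which yields the ℓ = 1/ε bound quoted above.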

  6. Matrix Sketching over Sliding Windows
  • Each row is associated with a timestamp.
  • Maintain B_W for A_W: the rows in the sliding window W.
  • Covariance error: ||A_Wᵀ A_W − B_Wᵀ B_W||₂ / ||A_W||_F² ≤ ε.
  • Sequence-based window: the past N rows (A_W: N rows).
  • Time-based window: rows in a past time period Δ (A_W: rows from the last Δ time units).
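The window semantics and the error measure can be pinned down in a few lines. A small sketch of a time-based window (the deque-based eviction, the value Δ = 10, and the helper name `cov_error` are mine, not from the talk):

```python
from collections import deque
import numpy as np

def cov_error(window_rows, B):
    """Relative covariance error of a sketch B against the current window A_W."""
    A_W = np.vstack(window_rows)
    num = np.linalg.norm(A_W.T @ A_W - B.T @ B, 2)
    return num / np.linalg.norm(A_W, 'fro')**2

# Time-based window: evict rows older than Delta time units.
Delta = 10.0
window = deque()                        # (timestamp, row) pairs in arrival order
rng = np.random.default_rng(3)
for t in range(100):
    window.append((float(t), rng.standard_normal(5)))
    while window and window[0][0] <= t - Delta:
        window.popleft()

rows = [r for _, r in window]           # with one row per time unit: last 10 rows
```

A sequence-based window is the special case where eviction is by count rather than by timestamp. Note that storing the raw rows like this is exactly the expensive baseline the paper wants to avoid.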

  7. Motivation 1: Sliding windows vs. unbounded streams
  • The sliding window model is a more appropriate model in many real-world applications.
  • Particularly so in the areas of data analysis where matrix sketching techniques are widely used.
  • Applications:
    – Analyzing tweets from the past 24 hours.
    – Sliding window PCA for detecting changes and anomalies [Papadimitriou2006, Qahtan2015].

  8. Motivation 2: Lower bound
  • Unbounded stream solution: use O(d²) space to store AᵀA.
    – Update: AᵀA ← AᵀA + aᵢᵀ aᵢ.
  • Theorem 4.1: An algorithm that returns AᵀA for any sequence-based sliding window must use Ω(Nd) bits of space.
  • Matrix sketching is necessary for sliding windows, even when the dimension d is small.
  • Matrix sketching over sliding windows requires new techniques.
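The unbounded-stream baseline in code: AᵀA fits in O(d²) space and is maintained with one rank-1 update per row. Over a sliding window you would also need to *subtract* expired rows, which by Theorem 4.1 forces storing them, hence Ω(Nd) bits. A minimal sketch of the streaming update:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
C = np.zeros((d, d))                 # running A^T A: O(d^2) space, independent of n
seen = []
for _ in range(200):
    a = rng.standard_normal(d)
    C += np.outer(a, a)              # rank-1 update: A^T A <- A^T A + a^T a
    seen.append(a)

A = np.vstack(seen)                  # kept here only to verify the invariant
```

The update is order-insensitive and additive, which is exactly what a window breaks: deletions of old rows are not expressible without remembering them.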

  9. Three algorithms
  • Sampling:
    – Sample aᵢ w.p. proportional to ||aᵢ||² [Frieze2004].
    – Priority sampling [Efraimidis2006] + sliding window top-k.
  • LM-FD: Exponential Histogram (logarithmic method) [Datar2002] + Frequent Directions.
  • DI-FD: Dyadic interval techniques [Arasu2004] + Frequent Directions.

  | Sketches | Update             | Space           | Window          | Interpretable? |
  |----------|--------------------|-----------------|-----------------|----------------|
  | Sampling | (d/ε²) log log(NR) | (d/ε²) log(NR)  | Sequence & time | Yes            |
  | LM-FD    | (1/ε) log(εNR)     | (d/ε²) log(εNR) | Sequence & time | No             |
  | DI-FD    | (d/ε) log R        | (R/ε) log R     | Sequence        | No             |
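The sampling primitive behind the first algorithm can be sketched for an unbounded stream: Efraimidis–Spirakis priorities turn weighted sampling into a top-k problem, which the paper then combines with a sliding-window top-k structure (not shown here). Names and sizes in this sketch are mine:

```python
import heapq
import numpy as np

def norm_sample(rows, k, rng):
    """Weighted reservoir sampling [Efraimidis-Spirakis 2006]: keep the k rows
    with the largest priorities u**(1/w), where w = ||a||^2 and u ~ U(0,1).
    This samples rows with probability proportional to their squared norm
    [Frieze et al. 2004]. Assumes no row is all-zero."""
    heap = []                            # min-heap of (priority, row index)
    for i, a in enumerate(rows):
        w = float(a @ a)
        pri = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (pri, i))
        elif pri > heap[0][0]:
            heapq.heapreplace(heap, (pri, i))
    return sorted(i for _, i in heap)
```

Because a row's priority is fixed at arrival, maintaining the sample under a window reduces to tracking the top-k priorities among the non-expired rows, which is what makes the technique window-friendly.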

  10. Three algorithms
  • Sampling:
    – Sample aᵢ w.p. proportional to ||aᵢ||² [Frieze2004].
    – Priority sampling [Efraimidis2006] + sliding window top-k.
  • LM-FD: Exponential Histogram (logarithmic method) [Datar2002] + Frequent Directions.
  • DI-FD: Dyadic interval techniques [Arasu2004] + Frequent Directions.

  | Sketches | Update | Space            | Window          | Interpretable? |
  |----------|--------|------------------|-----------------|----------------|
  | Sampling | Slow   | Large            | Sequence & time | Yes            |
  | LM-FD    | Fast   | Small            | Sequence & time | No             |
  | DI-FD    | Slow   | Best for small R | Sequence        | No             |

  • Interpretable: the rows of the sketch B come from A.
  • R: ratio between the maximum and minimum squared row norms.

  11. Experiments: space vs. error
  (Plots over three datasets with R = 8.35, R = 1, and R = 90089; summary table as on slide 10.)

  12. Experiments: time vs. space
  (Plots over the same three datasets, R = 8.35, R = 1, and R = 90089.)

  13. Conclusions
  • First attempt to tackle the sliding window matrix sketching problem.
  • Lower bounds show that, for matrix sketching, the sliding window model differs from the unbounded streaming model.
  • Propose algorithms for both time-based and sequence-based windows, with theoretical guarantees and experimental evaluation.

  14. Thanks!
