Numerical Linear Algebra in the Streaming Model
David Woodruff
IBM Almaden
Data Streams
• A data stream is a sequence of data that is too large to be stored in available memory
• Examples
  – Internet search logs
  – Network traffic
  – Sensor networks
  – Scientific data streams (astronomical, genomics, physical simulations)…
Data Stream Models
• Underlying object is an n x d matrix A
• Row-Insertion Model
  – See rows (or columns) of A one at a time, in an arbitrary order
  – E.g., document/term entries
• Turnstile Model
  – See entries of A one at a time, in an arbitrary order
  – E.g., customer/item entries
  – Stream may be a long interleaved sequence of arbitrary additive updates A_{i,j} ← A_{i,j} + Δ to entries
• Goals:
  – 1 pass (or a small number of passes) over the data
  – Low space complexity
  – Fast processing time per update
Linear Algebra Problems
• Approximate Matrix Product
  – Given matrices A and B, approximate A*B
• Regression
  – Given a matrix A and a vector b, find an x which approximately minimizes |Ax-b|
  – Least squares, least absolute deviation, M-estimators
• Low Rank Approximation
  – Given a matrix A, find a rank-k matrix A' for which |A'-A| is as small as possible
  – Frobenius, spectral, robust
• Leverage Score Approximation
  – Given a matrix A, if A = Q*R where Q has orthonormal columns, estimate |Q_{i,*}|_2^2 for all rows i
  – Sampling-based algorithms
Linear Algebra Problems Cont'd
• Sketching norms
  – Given a matrix A, approximate its trace, Frobenius, and operator norms
  – Lower bounds imply lower bounds for harder problems, such as low rank approximation in spectral norm
• Graph sparsification
  – Given the Laplacian L of a graph G, approximate the quadratic form x^T L x for all vectors x
  – Approximately preserve all cut values
Talk Outline
• Overview of techniques
  – Oblivious Subspace Embeddings
  – Leverage Score Sampling
• Sample of known results for linear algebra problems
• Open problems
Example Sketching Technique: Least Squares Regression [S]
• Suppose A is an n x d matrix with n ≫ d
• How to find an approximate solution x to min_x |Ax-b|_2?
• Goal: output x' for which |Ax'-b|_2 ≤ (1+ε) min_x |Ax-b|_2 w.h.p.
• Draw S from a k x n random family of matrices, for k ≪ n
• Compute S*A and S*b. Output the solution x' to min_x |(SA)x-(Sb)|_2
• Streaming implementation: maintain S*A and S*b
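The sketch-and-solve recipe above can be sketched in a few lines of numpy. This is a minimal illustration, not code from the talk: the dimensions, the constant in the sketch size k, and the synthetic data are all chosen for the demo.

```python
# Sketch-and-solve least squares with a dense Gaussian sketching matrix S.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 2000, 10, 0.5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

k = int(4 * d / eps**2)             # O(d/eps^2) rows; constant 4 for safety
S = rng.standard_normal((k, n)) / np.sqrt(k)

# Solve the small k x d sketched problem instead of the n x d one.
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_opt, *_ = np.linalg.lstsq(A, b, rcond=None)

err_sk = np.linalg.norm(A @ x_sk - b)
err_opt = np.linalg.norm(A @ x_opt - b)
print(err_sk / err_opt)             # close to 1, within (1+eps) w.h.p.
```

In a stream, only S*A and S*b (k x d and k x 1) are maintained, so space is independent of n.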
How to choose the right sketching matrix S?
• Recall: output the solution x' to min_x |(SA)x-(Sb)|_2
• Lots of matrices work
• S is a d/ε^2 x n matrix of i.i.d. Normal random variables
• Computing S*A may be slow…
Fast JL [AC, S]
• S is a Fast Johnson-Lindenstrauss Transform
  – S = P*H*D
  – D is a diagonal matrix with +1, -1 on the diagonal
  – H is the Hadamard transform
  – P just chooses a random (small) subset of rows of H*D
  – S*A can be computed much faster
• In a stream, useful if you see one column of A at a time
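The S = P*H*D construction can be sketched with a hand-rolled fast Walsh-Hadamard transform. A minimal illustration under the assumption that n is a power of 2; the sizes and the subspace-embedding check at the end are chosen for the demo, not taken from the papers.

```python
# Subsampled randomized Hadamard transform: S*A = P * H * D * A.
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along axis 0, O(n log n)."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i+h].copy(), x[i+h:i+2*h].copy()
            x[i:i+h], x[i+h:i+2*h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(1)
n, d, k = 1024, 8, 128                       # n must be a power of 2 for H

A = rng.standard_normal((n, d))
Dsign = rng.choice([-1.0, 1.0], size=n)      # D: random diagonal signs
rows = rng.choice(n, size=k, replace=False)  # P: keep k random rows

# With H orthonormal (fwht / sqrt(n)) and rescaling sqrt(n/k) for the
# sampling, the net scaling of the kept rows is 1/sqrt(k).
SA = fwht(Dsign[:, None] * A)[rows] / np.sqrt(k)

# Check the subspace-embedding property: for Q with orthonormal columns,
# the singular values of S*Q should all be near 1.
Q, _ = np.linalg.qr(A)
SQ = fwht(Dsign[:, None] * Q)[rows] / np.sqrt(k)
sv = np.linalg.svd(SQ, compute_uv=False)
print(sv.min(), sv.max())
```

The random signs in D flatten the leverage scores of the input, which is what makes uniform row sampling by P safe.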
Even Faster Sketching Matrices S [CW, MM, NN]
• CountSketch matrix
• Define a k x n matrix S, for k ≈ d^2/ε^2
• S is really sparse: a single randomly chosen non-zero entry per column, e.g.,

    [ 0  0  1  0  0  1  0  0 ]
    [ 1  0  0  0  0  0  0  0 ]
    [ 0  0  0 -1  1  0 -1  0 ]
    [ 0 -1  0  0  0  0  0  1 ]

  Surprisingly, this works!
• Easy to maintain in a stream
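Because each column of S has a single nonzero, a turnstile update A_{i,j} ← A_{i,j} + Δ changes exactly one entry of the sketch S*A. A minimal sketch of maintaining CountSketch in a stream (dimensions and the number of updates are illustrative):

```python
# CountSketch maintained under arbitrary additive turnstile updates.
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 1000, 5, 200

h = rng.integers(0, k, size=n)          # row of the single nonzero in column i
s = rng.choice([-1.0, 1.0], size=n)     # its random sign

SA = np.zeros((k, d))                   # the sketch kept in the stream
A = np.zeros((n, d))                    # kept here only to verify correctness

for _ in range(5000):                   # arbitrary additive updates A[i,j] += delta
    i, j, delta = rng.integers(n), rng.integers(d), rng.standard_normal()
    A[i, j] += delta
    SA[h[i], j] += s[i] * delta         # O(1) work per update

# SA equals S @ A for the explicit CountSketch matrix S.
S = np.zeros((k, n))
S[h, np.arange(n)] = s
print(np.allclose(SA, S @ A))           # True
```

The O(1) update time per stream element is what the slide means by "easy to maintain in a stream."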
Leverage Score Sampling [DMM]
• Main reason sketching works:
  – |S(Ax-b)|_2 = (1±ε) |Ax-b|_2 for all x in R^d
  – S is a subspace embedding for the column span of [A, b]
• Leverage score sampling also provides a subspace embedding
  – If [A, b] = Q*R where Q has orthonormal columns, sample row i of [A, b] with probability ∝ |Q_{i,*}|_2^2, for all rows i
  – Let S implement sampling of d log d / ε^2 rows of A. Then |S(Ax-b)|_2 = (1±ε) |Ax-b|_2 for all x in R^d
  – Gives a coreset; not directly implementable in a stream, but possible
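Computing the scores |Q_{i,*}|_2^2 from a thin QR factorization is a few lines of numpy. A minimal offline sketch (the matrix and sample size are illustrative); note the scores always sum to d, so dividing by d gives a sampling distribution:

```python
# Leverage scores of the rows of A via a thin QR factorization.
import numpy as np

rng = np.random.default_rng(3)
n, d = 500, 6
A = rng.standard_normal((n, d))

Q, _ = np.linalg.qr(A)                  # Q: n x d with orthonormal columns
scores = np.sum(Q**2, axis=1)           # leverage score |Q_{i,*}|_2^2 per row

print(scores.sum())                     # sums to d (here 6, up to roundoff)
p = scores / d                          # sampling distribution over rows
rows = rng.choice(n, size=100, replace=True, p=p)  # sample w.pr. ∝ scores
```

Each sampled row is rescaled by 1/sqrt(100 * p[i]) in the actual embedding so that |SAx|_2^2 is unbiased; the scores themselves are what the streaming algorithms approximate.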
Talk Outline
• Overview of techniques
  – Oblivious Subspace Embeddings
  – Leverage Score Sampling
• Sample of known results for linear algebra problems
• Open problems
Regression Example
[plot: example data with a fitted regression line]
• Least Squares Regression [CW, MM, NN]
  – Θ~(d^2/ε) space in a stream, O(1) update time
• Least Absolute Deviation Regression [SW]
  – poly(d/ε) space in a stream, O~(1) update time
Low Rank Approximation [S, CW]
• A is an n x n matrix
• Want to output a rank-k matrix A', so that w.h.p., |A-A'|_F ≤ (1+ε) |A-A_k|_F, where A_k is the best rank-k approximation to A
• O~(n/poly(ε)) space in a stream, O(1) update time
Matrix Norms in a Stream [LNW]
• A is an n x n matrix
• The p-th Schatten norm is (Σ_{i=1}^{rank(A)} σ_i(A)^p)^{1/p}
• p = 2 is the Frobenius norm
  – O~(1) space in a stream, O(1) update time
• p = 1 is the trace norm
  – Ω(n^{1/2}) space in a stream, no nontrivial upper bound!
• p = ∞ is the operator norm max_{unit x,y} x^T A y
  – Ω(n^2) space in a stream
  – Same lower bound for operator norm low rank approximation
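The three special cases on this slide can be checked directly from the singular values. A minimal numpy sanity check (random test matrix is illustrative):

```python
# Schatten p-norms from singular values: p=2 is Frobenius, p=1 is trace
# (nuclear) norm, p=inf is the operator norm.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 50))
sig = np.linalg.svd(A, compute_uv=False)

schatten = lambda p: np.sum(sig**p) ** (1.0 / p)

print(np.isclose(schatten(2), np.linalg.norm(A, 'fro')))   # True
print(np.isclose(schatten(1), np.linalg.norm(A, 'nuc')))   # True
print(np.isclose(sig.max(), np.linalg.norm(A, 2)))         # True
```

The streaming hardness on the slide is about computing these without ever storing A; the SVD here is only to state what is being approximated.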
Graph Sparsification [KLMMS]
• Given a graph G, let H be a subgraph with reweighted edges
• Let L_G be the Laplacian of G and L_H be the Laplacian of H
• Want x^T L_H x = (1±ε) x^T L_G x for all x
• O~(n/ε^2) space in a stream of edges is possible
• Clever recursive leverage score sampling in a stream [MP]
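Why does preserving the quadratic form preserve all cuts? Because x^T L x equals the sum of w_uv (x_u - x_v)^2 over edges, and plugging in a 0/1 indicator vector x of a vertex set recovers the weight of the cut it defines. A minimal numpy check on a small random weighted graph (no sparsification here, just the identity):

```python
# Laplacian quadratic form equals the weighted sum of squared edge differences.
import numpy as np

rng = np.random.default_rng(6)
n = 30
W = rng.random((n, n)) * (rng.random((n, n)) < 0.2)
W = np.triu(W, 1) + np.triu(W, 1).T       # symmetric weighted adjacency
L = np.diag(W.sum(axis=1)) - W            # graph Laplacian L = D - W

x = rng.standard_normal(n)
quad = x @ L @ x
edge_sum = 0.5 * np.sum(W * (x[:, None] - x[None, :])**2)
print(np.isclose(quad, edge_sum))         # True
```

Each edge (u, v) is a rank-one term w_uv (e_u - e_v)(e_u - e_v)^T in L, which is what makes leverage score sampling of edges the right tool for sparsification.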
Open Problems
• Optimal bounds in terms of ε in the streaming model
  – Tradeoff with the number of passes
• Spectral low rank approximation is not possible in a stream, but maybe one can get O(nnz(A)) time offline?
  – Current best: nnz(A) · poly(k/ε)
• Robust low rank approximation: output a rank-k matrix A', so that |A-A'|_1 ≤ (1+ε) |A-A_k|_1