Deterministic Distributed and Streaming Algorithms for Linear Algebra Problems
Charlie Dickens, joint work with Graham Cormode and David P. Woodruff
University of Warwick, Department of Computer Science
WPCCS, 30th June 2017
Motivation

◮ Large data can be abstracted as a matrix A ∈ R^{n×d}
◮ Matrices = linear algebra!
◮ But there are also some problems...
◮ Storage: the data may be too large to store
◮ Time complexity: 'efficient' polynomial-time algorithms might be too slow
◮ Instead of solving exactly, can we find 'efficient' algorithms better suited to large-scale data analysis, perhaps allowing some approximate solution?
◮ Randomised methods have been proposed, but are they necessary?
Computation Models

Streaming Model
◮ See data one item at a time
◮ Cannot store all of the data
◮ Want to optimise storage: sublinear in n
◮ Need to keep a running 'summary' of the data
◮ Use the summary to compute an approximation to the original problem

Distributed Model
◮ Coordinator sends small blocks of input to worker nodes
◮ Worker nodes report back a summary of the data to the coordinator
◮ Coordinator computes an approximation to the original problem using the summaries sent back
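The streaming model above can be made concrete with a classical deterministic summary for the p = 2 case: the Frequent Directions sketch (not the algorithm of this talk, just an illustrative example of a deterministic running summary). It keeps a small buffer of rows and, whenever the buffer fills, compresses it via an SVD with shrunken singular values:

```python
import numpy as np

def frequent_directions(A, ell):
    """Deterministic streaming sketch of A, processed one row at a time.

    Keeps a buffer B with 2*ell rows such that, for the full stream A,
    ||A.T @ A - B.T @ B||_2 <= ||A||_F**2 / ell.  Assumes d >= ell.
    """
    n, d = A.shape
    B = np.zeros((2 * ell, d))
    free = 0                          # index of the next empty row of B
    for a in A:                       # stream the rows one at a time
        if free == 2 * ell:           # buffer full: compress it
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = s ** 2
            delta = s2[ell - 1]       # ell-th largest squared singular value
            s_new = np.sqrt(np.clip(s2 - delta, 0.0, None))
            B = np.zeros((2 * ell, d))
            B[: len(s_new)] = s_new[:, None] * Vt
            free = ell - 1            # rows ell-1, ..., 2*ell-1 are now zero
        B[free] = a
        free += 1
    return B
```

The sketch uses O(ell · d) space regardless of the stream length n, which is exactly the "sublinear in n" storage goal of the model.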
Summary of Results

Previous results are specific to p = 2. Our results are the first deterministic algorithms which generalise to arbitrary p-norm (where applicable).

Problem                        | Solution type      | Time                 | Space
High leverage scores           | 1/poly(d) additive | O(nd^2 + nd^5 log n) | poly(d)
ℓp-regression (p ≠ ∞)          | poly(d) relative   | poly(nd)^{O(1/γ)}    | n^γ d
ℓ∞-regression                  | ε‖b‖_p additive    | poly(nd^5)           | d^{O(p)}/ε^{O(1)}
ℓ1 low-rank (k) approximation  | poly(k) relative   | poly(nd)^{O(1/γ)}    | n^γ poly(d)
Main Algorithmic Techniques: well-conditioned basis

Much of the work relies on the notion of a well-conditioned basis (wcb). A matrix U is a wcb for the column space of A if:
◮ ‖U‖_p ≤ α
◮ for all z, ‖z‖_q ≤ β‖Uz‖_p, where q is the dual norm to p
◮ α and β are at most poly(d).

Mahoney et al. show that a change-of-basis matrix R can be computed in polynomial time such that AR is well-conditioned.
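The general-p construction of R is more involved; as a minimal sketch of the p = 2 case only, a QR decomposition already yields a well-conditioned basis, with change-of-basis matrix R = R_qr^{-1}, α = √d and β = 1:

```python
import numpy as np

def change_of_basis_p2(A):
    # A = Q R_qr with Q orthonormal, so U = A @ inv(R_qr) = Q satisfies
    # ||U||_2 = sqrt(d) (alpha) and ||z||_2 = ||U z||_2 for all z (beta = 1).
    Q, R_qr = np.linalg.qr(A)
    return np.linalg.inv(R_qr)

np.random.seed(0)
A = np.random.randn(100, 5)
R = change_of_basis_p2(A)
U = A @ R    # well-conditioned basis for the column space of A
```

Both wcb conditions can be checked directly: ‖U‖_2 = √d and ‖Uz‖_2 = ‖z‖_2 for every z, so α = √5 and β = 1 here, both poly(d).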
Main Algorithmic Techniques: high leverage rows

Let U = AR for a change-of-basis matrix R. Then the full ℓp-leverage scores are w_i = ‖(AR)_i‖_p^p. Local leverage scores are defined analogously on each worker's block of the data.

Problem: Can the rows of high leverage be found without reading the whole matrix?

Theory: Rows with high global leverage scores have high local leverage scores up to poly(d) factors: ŵ_i ≥ w_{i′}/poly(d).
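Given a change-of-basis matrix R, the leverage scores themselves are cheap row norms of AR. A minimal sketch, using the p = 2 QR construction for R (the general-p case uses the wcb construction above):

```python
import numpy as np

def lp_leverage_scores(A, R, p=2):
    # Full lp-leverage scores w_i = ||(A R)_i||_p^p, where R is a
    # change-of-basis matrix making A R well-conditioned.
    U = A @ R
    return np.sum(np.abs(U) ** p, axis=1)

np.random.seed(1)
A = np.random.randn(30, 4)
A[7] *= 100.0                              # plant one heavy row
R = np.linalg.inv(np.linalg.qr(A)[1])      # p = 2 change of basis via QR
w = lp_leverage_scores(A, R, p=2)
```

For p = 2 the scores sum to d, and the planted heavy row receives by far the largest score, which is what makes thresholding on (local) leverage scores a sensible way to pick out important rows.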