Distributed Statistical Estimation of Matrix Products with Applications

David Woodruff (CMU), Qin Zhang (IUB)
PODS 2018, June 2018

The Distributed Computation Model

• Alice holds $A \in \{0,1\}^{m \times n}$, Bob holds $B \in \{0,1\}^{n \times m}$
• Alice and Bob want to compute some function of $C = A \times B$ (p-norms, heavy hitters, ...) with the minimum amount of communication and number of rounds
• Communication: sum of message lengths (maximized over all choices of A, B, and randomness)
• The protocol can fail with probability 0.01 (over its randomness)

Statistics of Matrix Products: p-Norms

• Alice holds $A \in \{0,1\}^{m \times n}$, Bob holds $B \in \{0,1\}^{n \times m}$
• Let $C = A \cdot B$. Alice and Bob want to approximate
  $\|C\|_p = \left( \sum_{i,j \in [m]} |C_{i,j}|^p \right)^{1/p}$
  – p = 0: number of non-zero entries of C ⇒ size of the set-intersection join.
    View the i-th row of A as a set $A_i$ and the j-th column of B as a set $B_j$; compute #(i, j) s.t. $A_i \cap B_j \neq \emptyset$
  – p = 1: sum of entries of C ⇒ size of the corresponding natural join.
    Compute #(i, k, j) s.t. $k \in A_i \cap B_j$
  – p = ∞: maximum entry of C ⇒ most "similar" $(A_i, B_j)$ pair
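The three special cases can be checked on a toy instance. The sketch below is a centralized illustration only (no communication is modeled); the matrices and the helper names `matmul` and `entrywise_norm` are hypothetical and chosen for this example.

```python
def matmul(A, B):
    """Integer matrix product C = A . B for matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def entrywise_norm(C, p):
    """Entrywise p-norm of C, using the p = 0 and p = inf conventions above."""
    entries = [abs(x) for row in C for x in row]
    if p == 0:
        return sum(1 for x in entries if x != 0)  # number of non-zero entries
    if p == float("inf"):
        return max(entries)                       # maximum entry
    return sum(x ** p for x in entries) ** (1.0 / p)

# Hypothetical data: row i of A encodes the set A_i, column j of B the set B_j.
A = [[1, 1, 0],
     [0, 1, 1]]
B = [[1, 0],
     [1, 0],
     [0, 1]]
C = matmul(A, B)

# The same sets written out explicitly, to compare against the join sizes.
A_sets = [{k for k, a in enumerate(row) if a} for row in A]
B_sets = [{k for k, b in enumerate(col) if b} for col in zip(*B)]

# p = 0: number of intersecting (i, j) pairs (set-intersection join size).
set_join_size = sum(1 for Ai in A_sets for Bj in B_sets if Ai & Bj)
# p = 1: number of (i, k, j) triples with k in both sets (natural join size).
natural_join_size = sum(len(Ai & Bj) for Ai in A_sets for Bj in B_sets)
# p = inf: largest intersection size over all pairs.
max_overlap = max(len(Ai & Bj) for Ai in A_sets for Bj in B_sets)

print(entrywise_norm(C, 0), set_join_size)
print(entrywise_norm(C, 1), natural_join_size)
print(entrywise_norm(C, float("inf")), max_overlap)
```

Each printed pair should agree, since $C_{i,j} = |A_i \cap B_j|$ for binary inputs.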

Application of set-intersection join

Applicant   Skills                      Skills                 Opening
A_1         S_1, S_4, S_9, S_13         S_2, S_3, S_4          B_1
A_2         S_2, S_9, S_10         ⋈    S_3, S_4, S_9, S_11    B_2
...         ...                         ...                    ...
A_m         S_6, S_7, S_8, S_15         S_4, S_8               B_m

Find all candidate (Applicant, Opening) pairs

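$\|C\|_1$ corresponds to natural join

[Figure: a relation over attributes (U, V) joined (⋈) with a relation over attributes (V, W), shown next to the matrix product A × B = C, where A is indexed by U × V and B by V × W. A join result such as (2, 1, 2) adds +1 to the corresponding entry of C.]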
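The correspondence can be reproduced on a small example by encoding each relation as a 0/1 matrix. The relations `R` and `S` below are hypothetical and are not the tuples from the original figure; the point is only that the sum of the entries of C equals the number of join results.

```python
# Hypothetical relations (illustrative data, not taken from the slide).
R = {(1, 1), (1, 2), (2, 1)}   # R(U, V)
S = {(1, 2), (2, 1), (2, 2)}   # S(V, W)

# Attribute domains, sorted so that they can index matrix rows and columns.
U = sorted({u for u, _ in R})
V = sorted({v for _, v in R} | {v for v, _ in S})
W = sorted({w for _, w in S})

# A[u][v] = 1 iff (u, v) is in R; B[v][w] = 1 iff (v, w) is in S.
A = [[1 if (u, v) in R else 0 for v in V] for u in U]
B = [[1 if (v, w) in S else 0 for w in W] for v in V]

# C = A . B: entry (u, w) counts the values v that connect u to w.
C = [[sum(A[i][k] * B[k][j] for k in range(len(V))) for j in range(len(W))]
     for i in range(len(U))]

# Natural join of R and S on V, computed directly from the tuples.
join = sorted((u, v, w) for (u, v) in R for (v2, w) in S if v == v2)

l1_norm = sum(sum(row) for row in C)
print(join)
print(l1_norm, len(join))
```

Every join tuple (u, v, w) contributes exactly one unit to the entry of C indexed by (u, w), which is why the two counts coincide.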

Statistics of Matrix Products: Heavy Hitters

• Alice holds $A \in \{0,1\}^{m \times n}$, Bob holds $B \in \{0,1\}^{n \times m}$
• Let $C = A \cdot B$, and let $HH^p_\phi(C) = \{ (i, j) \mid C_{i,j} \geq \phi \|C\|_p \}$
  – $\ell_p$-$(\phi, \epsilon)$-heavy-hitter ($0 < \epsilon \leq \phi \leq 1$): output a set $S \subseteq \{ (i, j) \mid i, j \in [m] \}$ such that
    $HH^p_\phi(C) \subseteq S \subseteq HH^p_{\phi - \epsilon}(C)$
    These are the pairs $(A_i, B_j)$ that are similar; used for similarity joins

Both p-norm estimation and heavy hitters are well studied in the data stream literature, but not as much in the distributed model introduced above.
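The containment condition on S is easy to misread, so here is a small checker for it. This is a centralized sketch for p > 0; the matrix `C` and the function names are hypothetical and only illustrate the definition, not the communication protocol.

```python
def lp_norm(C, p):
    """Entrywise p-norm of C for p > 0."""
    return sum(abs(x) ** p for row in C for x in row) ** (1.0 / p)

def heavy_hitters(C, p, phi):
    """HH^p_phi(C) = {(i, j) : C[i][j] >= phi * ||C||_p}."""
    threshold = phi * lp_norm(C, p)
    return {(i, j) for i, row in enumerate(C) for j, x in enumerate(row)
            if x >= threshold}

def is_valid_output(S, C, p, phi, eps):
    """True iff HH^p_phi(C) is a subset of S and S is a subset of HH^p_{phi-eps}(C)."""
    return heavy_hitters(C, p, phi) <= S <= heavy_hitters(C, p, phi - eps)

# Hypothetical product matrix with ||C||_1 = 8.
C = [[5, 1],
     [2, 0]]

must_report = heavy_hitters(C, 1, 0.5)    # entries >= 4
may_report = heavy_hitters(C, 1, 0.25)    # entries >= 2

print(must_report, may_report)
print(is_valid_output({(0, 0)}, C, 1, 0.5, 0.25))          # reports only the required pair
print(is_valid_output({(0, 0), (0, 1)}, C, 1, 0.5, 0.25))  # includes an entry below the relaxed threshold
```

Any set between `must_report` and `may_report` is acceptable, which is the slack that an approximate protocol can exploit.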

Our Main Results – $\ell_p$ ($p \in [0, 2]$)

For simplicity, assume m = n.

• For any $p \in [0, 2]$, a 2-round $\tilde{O}(n/\epsilon)$-bit algorithm that approximates $\|AB\|_p$ within a $(1 + \epsilon)$ factor
  – For p = 0, this improves the previous result of $\tilde{O}(n/\epsilon^2)$ (Van Gucht et al., PODS'15)
  – The same paper shows a lower bound of $\Omega(n/\epsilon^{2/3})$.

If we restrict the communication to be one-way, then we have a lower bound of $\Omega(n/\epsilon^2)$.

Our Main Results – $\ell_\infty$

• O(1)-round algorithms that approximate $\|AB\|_\infty$
  – within a factor of $(2 + \epsilon)$ use $\tilde{O}(n^{1.5}/\epsilon)$ bits
  – within a factor of $\kappa$ use $\tilde{O}(n^{1.5}/\kappa)$ bits
• Any algorithm (regardless of the number of rounds used) that approximates $\|AB\|_\infty$
  – within a factor of 2 needs $\Omega(n^2)$ bits
  – within a factor of $\kappa \geq 4$ needs $\Omega(n^{1.5}/\kappa)$ bits
• The above results hold for binary matrices A and B. For general matrices $A, B \in \Sigma^{n \times m}$, the bound is $\tilde{\Theta}(n^2/\kappa^2)$ bits (O(1) rounds for the upper bound, any number of rounds for the lower bound)

Our Main Results – Heavy Hitters

• For binary matrices A and B, for any $p \in (0, 2]$, an O(1)-round $\tilde{O}\left(n + \frac{\phi}{\epsilon^2}\right)$-bit algorithm that computes $\ell_p$-$(\phi, \epsilon)$-heavy-hitters
• For general matrices A and B, for any $p \in (0, 2]$, we obtain O(1)-round $\tilde{O}\left(\frac{\sqrt{\phi}}{\epsilon} \cdot n\right)$-bit algorithms

All of the results above can be easily extended to rectangular matrices where $A \in \Sigma^{m \times n}$ and $B \in \Sigma^{n \times m}$.

Previous Results

• Most relevant: Van Gucht et al. (PODS'15), which studies set-intersection/disjointness/equality/at-least-T joins in the 2-party communication model.
  The only overlap between Van Gucht et al. and this paper is the estimation of $\|AB\|_0$ mentioned before.
• A number of recent works look at distributed linear algebra problems (Balcan et al., KDD'16; Boutsidis et al., STOC'16; Woodruff & Zhong, ICDE'16; etc.).
  These works concern statistics estimation on $C = A + B$, compared with $C = A \cdot B$ studied in this paper.
• Similar problems have been studied in the RAM model (Cohen & Lewis, J. Algorithms, '99; Pagh, TOCT'13; etc.).

$(1 + \epsilon)$-approximate $\ell_0$

• Alice holds $A \in \{0,1\}^{n \times n}$, Bob holds $B \in \{0,1\}^{n \times n}$
• Let $C = A \cdot B$ (all three matrices are n × n). Goal: $(1 + \epsilon)$-approximate $\|C\|_0$

High-level idea:
1. First perform a rough estimation of the number of non-zero entries in the rows of C
2. Use the rough estimation to partition the rows of C into groups such that rows in the same group have similar numbers of non-zero entries
3. Sample rows in each group of C with probability proportional to the (estimated) average number of non-zero entries of the group
4. Use the sampled rows to estimate the number of non-zero entries of C
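The four steps can be simulated in one place to see how the estimator is assembled. This is a centralized sketch under loud assumptions, not the protocol from the paper: the rough estimate is faked by rounding the exact row count up to a power of two, the `scale` parameter is a hypothetical knob controlling the sampling rate, and no communication or sketching is modeled.

```python
import random

def row_nnz(A, B, i):
    """Exact number of non-zero entries in row i of C = A . B (binary inputs)."""
    return sum(1 for col in zip(*B) if any(a and b for a, b in zip(A[i], col)))

def estimate_l0(A, B, scale, rng):
    """Grouped row-sampling estimate of ||A . B||_0 (illustrative only)."""
    exact = [row_nnz(A, B, i) for i in range(len(A))]

    # Step 1 (stand-in): rough estimate = exact count rounded up to a power of two.
    rough = [0 if c == 0 else 1 << (c - 1).bit_length() for c in exact]

    # Step 2: rows with the same rough estimate form one group.
    groups = {}
    for i, r in enumerate(rough):
        if r > 0:
            groups.setdefault(r, []).append(i)
    if not groups:
        return 0.0

    largest = max(groups)
    total = 0.0
    for r, rows in groups.items():
        # Step 3: sampling probability proportional to the group's rough size.
        prob = min(1.0, scale * r / largest)
        for i in rows:
            if rng.random() < prob:
                # Step 4: reweight each sampled row by 1 / prob so that the
                # estimate is unbiased for the total number of non-zeros.
                total += exact[i] / prob
    return total

# Hypothetical 3 x 3 instance; rows of C have 2, 3 and 0 non-zero entries.
A = [[1, 0, 0],
     [1, 1, 0],
     [0, 0, 0]]
B = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]

print(estimate_l0(A, B, scale=10.0, rng=random.Random(0)))  # every row sampled
print(estimate_l0(A, B, scale=1.0, rng=random.Random(0)))   # smaller group subsampled
```

With a large `scale` every row is sampled and the estimate equals the exact count; with a smaller `scale` the rows in sparser groups are sampled less often and reweighted accordingly, which keeps the estimate unbiased while inspecting fewer rows.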
