Distributed Statistical Estimation of Matrix Products with Applications
David Woodruff (CMU), Qin Zhang (IUB)
PODS 2018, June 2018
The Distributed Computation Model
[Figure: Alice holds $A \in \{0,1\}^{m\times n}$, Bob holds $B \in \{0,1\}^{n\times m}$; they communicate to compute statistics of C such as p-norms and heavy hitters]
Alice and Bob want to compute some function of $C = A \cdot B$ with the minimum amount of communication and the minimum number of rounds.
• Communication: the sum of message lengths (maximized over all choices of A, B, and the randomness).
• The protocol may fail with probability 0.01 (over its randomness).
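To make the cost measure concrete, here is a minimal sketch (an illustration, not from the talk) of the trivial one-round protocol in which Alice sends all of A to Bob, who then computes C exactly. The names `Channel` and `trivial_protocol` are hypothetical; the point is only that communication is counted as the total number of bits sent, so this baseline always costs mn bits.

```python
class Channel:
    """Hypothetical channel that records every message and its length in bits."""

    def __init__(self):
        self.bits_sent = 0
        self.rounds = 0

    def send(self, bits):
        # Communication cost = sum of message lengths; each message counts as one round.
        self.bits_sent += len(bits)
        self.rounds += 1
        return list(bits)


def trivial_protocol(A, B):
    """Alice ships all of A (m*n bits); Bob then computes C = A B exactly."""
    channel = Channel()
    flat = [a for row in A for a in row]      # Alice serializes A row by row
    received = channel.send(flat)
    n = len(B)                                # inner dimension (dimensions are public)
    A_bob = [received[i * n:(i + 1) * n] for i in range(len(A))]  # Bob rebuilds A
    C = [[sum(A_bob[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
         for i in range(len(A_bob))]
    return C, channel.bits_sent, channel.rounds
```

With m = n this costs n² bits, which is the baseline that the bounds stated in the results slides are measured against.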
Statistics of Matrix Products: p-Norms
• Alice holds $A \in \{0,1\}^{m\times n}$, Bob holds $B \in \{0,1\}^{n\times m}$.
• Let $C = A \cdot B$. Alice and Bob want to approximate
  $\|C\|_p = \Big(\sum_{i,j \in [m]} |C_{i,j}|^p\Big)^{1/p}$.
  – p = 0: the number of non-zero entries of C ⇒ size of the set-intersection join. View the i-th row of A as a set $A_i$ and the j-th column of B as a set $B_j$; compute $\#\{(i,j) : A_i \cap B_j \neq \emptyset\}$.
  – p = 1: the sum of the entries of C ⇒ size of the corresponding natural join; compute $\#\{(i,k,j) : k \in A_i \cap B_j\}$.
  – p = ∞: the maximum entry of C ⇒ the most "similar" $(A_i, B_j)$ pair.
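The special cases can be checked on a toy instance. The following is a minimal brute-force sketch; the helper names `matmul`, `lp_norm`, and `set_intersection_join_size` are hypothetical. It builds C explicitly, which is exactly what a communication-efficient protocol avoids, so it only illustrates the definitions (with ‖C‖_0 taken as the count of non-zeros, as on the slide).

```python
import math


def matmul(A, B):
    """C = A B for matrices given as lists of rows (integer entries)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def lp_norm(C, p):
    """Entrywise ||C||_p; p = 0 counts non-zeros, p = inf takes the max |entry|."""
    entries = [abs(x) for row in C for x in row]
    if p == 0:
        return sum(1 for x in entries if x != 0)
    if p == math.inf:
        return max(entries)
    return sum(x ** p for x in entries) ** (1.0 / p)


def set_intersection_join_size(A, B):
    """#(i, j) with A_i ∩ B_j non-empty, where A_i = row i of A, B_j = column j of B."""
    rows = [{k for k, a in enumerate(r) if a} for r in A]
    cols = [{k for k in range(len(B)) if B[k][j]} for j in range(len(B[0]))]
    return sum(1 for Ai in rows for Bj in cols if Ai & Bj)


# Example: A_0 = {0, 2}, A_1 = {1}; B_0 = {0, 2}, B_1 = {2}.
A = [[1, 0, 1], [0, 1, 0]]
B = [[1, 0], [0, 0], [1, 1]]
C = matmul(A, B)   # [[2, 1], [0, 0]]
```

Here ‖C‖_0 = 2 matches the two intersecting pairs, ‖C‖_1 = 3 counts the triples (i, k, j), and ‖C‖_∞ = 2 identifies $(A_0, B_0)$ as the most similar pair.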
Application of set-intersection join
Applicant skills: A_1 = {S_1, S_4, S_9, S_13}, A_2 = {S_2, S_9, S_10}, …, A_m = {S_6, S_7, S_8, S_15}
⋈
Opening skills: B_1 = {S_2, S_3, S_4}, B_2 = {S_3, S_4, S_9, S_11}, …, B_m = {S_4, S_8}
Find all candidate (Applicant, Opening) pairs, i.e., all pairs $(A_i, B_j)$ with $A_i \cap B_j \neq \emptyset$.
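A small sketch of this join using only the three applicant rows and three opening rows shown in the example (the rows elided between them are unknown and omitted). The dictionary names and `candidate_pairs` are hypothetical. Encoding each skill as a column index turns this into the ‖C‖_0 computation: every returned pair is a non-zero entry of C = A · B.

```python
applicants = {
    "A_1": {"S1", "S4", "S9", "S13"},
    "A_2": {"S2", "S9", "S10"},
    "A_m": {"S6", "S7", "S8", "S15"},
}
openings = {
    "B_1": {"S2", "S3", "S4"},
    "B_2": {"S3", "S4", "S9", "S11"},
    "B_m": {"S4", "S8"},
}


def candidate_pairs(applicants, openings):
    """Set-intersection join: (applicant, opening) pairs sharing at least one skill."""
    return [(a, b)
            for a, skills_a in applicants.items()
            for b, skills_b in openings.items()
            if skills_a & skills_b]
```

On this excerpt, 6 of the 9 pairs qualify; only (A_2, B_m), (A_m, B_1), and (A_m, B_2) share no skill.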
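$\|C\|_1$ corresponds to the natural join
Example: join R1(U, V) with R2(V, W), where
R1 = {(1,2), (1,4), (2,1), (2,2), (2,3)} and R2 = {(1,1), (2,1), (4,1), (1,2), (3,2)}.
Encode R1 as a 0/1 matrix A (rows indexed by U, columns by V) and R2 as a 0/1 matrix B (rows indexed by V, columns by W):
$A = \begin{pmatrix} 0&1&0&1 \\ 1&1&1&0 \end{pmatrix}, \quad B = \begin{pmatrix} 1&1 \\ 1&0 \\ 0&1 \\ 1&0 \end{pmatrix}, \quad C = A \cdot B = \begin{pmatrix} 2&0 \\ 2&2 \end{pmatrix}.$
Each join tuple (u, v, w) adds +1 to $C_{u,w}$; e.g., the tuple (2, 1, 2) contributes +1 to $C_{2,2}$. Hence $|R1 \bowtie R2| = \|C\|_1 = 6$.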
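A short check, on the tuples of the example above, that the natural-join size equals ‖C‖_1 once the relations are encoded as 0/1 matrices. The relation names R1 and R2 and the helpers `natural_join` and `incidence` are hypothetical; labels start at 1 as in the example.

```python
from collections import Counter

R1 = [(1, 2), (1, 4), (2, 1), (2, 2), (2, 3)]   # tuples (U, V)
R2 = [(1, 1), (2, 1), (4, 1), (1, 2), (3, 2)]   # tuples (V, W)


def natural_join(R1, R2):
    """All (u, v, w) with (u, v) in R1 and (v, w) in R2 (nested-loop join)."""
    return [(u, v, w) for (u, v) in R1 for (v2, w) in R2 if v == v2]


def incidence(pairs, rows, cols):
    """0/1 matrix with a 1 at (r, c) for every pair (r, c); labels start at 1."""
    M = [[0] * cols for _ in range(rows)]
    for r, c in pairs:
        M[r - 1][c - 1] = 1
    return M


A = incidence(R1, 2, 4)   # rows: U in {1, 2}; columns: V in {1..4}
B = incidence(R2, 4, 2)   # rows: V in {1..4}; columns: W in {1, 2}
C = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(2)] for i in range(2)]

# Each join tuple (u, v, w) contributes exactly +1 to C[u][w].
contrib = Counter((u, w) for (u, v, w) in natural_join(R1, R2))
```

The `Counter` makes the correspondence explicit: the count for (u, w) equals $C_{u,w}$, so summing over all entries of C counts the join tuples.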
Statistics of Matrix Products: Heavy Hitters
• Alice holds $A \in \{0,1\}^{m\times n}$, Bob holds $B \in \{0,1\}^{n\times m}$.
• Let $C = A \cdot B$, and let $\mathrm{HH}^p_\phi(C) = \{(i,j) : C_{i,j} \ge \phi \|C\|_p\}$.
  – $\ell_p$-$(\phi, \epsilon)$-heavy-hitters ($0 < \epsilon \le \phi \le 1$): output a set $S \subseteq \{(i,j) : i,j \in [m]\}$ such that $\mathrm{HH}^p_\phi(C) \subseteq S \subseteq \mathrm{HH}^p_{\phi-\epsilon}(C)$.
  These are the pairs $(A_i, B_j)$ that are similar, which is what a similarity join needs.
Both p-norm estimation and heavy hitters are well studied in the data stream literature, but much less so in the distributed model.
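A brute-force reference for this definition, useful for checking a protocol's output S on small inputs. The helper names are hypothetical; the sketch assumes 0 < p < ∞ and indexes pairs from 0. Pairs whose value falls between the thresholds $(\phi-\epsilon)\|C\|_p$ and $\phi\|C\|_p$ may be reported or not.

```python
def lp_norm(C, p):
    """Entrywise p-norm of C, assuming 0 < p < inf."""
    return sum(abs(x) ** p for row in C for x in row) ** (1.0 / p)


def heavy_hitters(C, p, phi):
    """HH^p_phi(C): all (i, j) with C[i][j] >= phi * ||C||_p (0-based indices)."""
    threshold = phi * lp_norm(C, p)
    return {(i, j) for i, row in enumerate(C) for j, x in enumerate(row) if x >= threshold}


def is_valid_output(S, C, p, phi, eps):
    """Check HH^p_phi(C) ⊆ S ⊆ HH^p_{phi - eps}(C) using set subset comparisons."""
    return heavy_hitters(C, p, phi) <= S <= heavy_hitters(C, p, phi - eps)
```

For example, with C = [[4, 1], [2, 0]], p = 1, φ = 0.5 and ε = 0.25, the thresholds are 3.5 and 1.75, so both {(0, 0)} and {(0, 0), (1, 0)} are valid outputs.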
Our Main Results – $\ell_p$ ($p \in [0, 2]$)
For simplicity, assume m = n.
• For any $p \in [0, 2]$, a 2-round $\tilde O(n/\epsilon)$-bit algorithm that approximates $\|AB\|_p$ within a $(1+\epsilon)$ factor.
  – For p = 0, this improves the previous $\tilde O(n/\epsilon^2)$ result (Van Gucht et al., PODS'15).
  – The same paper shows a lower bound of $\Omega(n/\epsilon^{2/3})$.
  – If the communication is restricted to be one-way, we have a lower bound of $\Omega(n/\epsilon^2)$.
Our Main Results – $\ell_\infty$
• O(1)-round algorithms that approximate $\|AB\|_\infty$:
  – within a factor of $(2+\epsilon)$ using $\tilde O(n^{1.5}/\epsilon)$ bits;
  – within a factor of κ using $\tilde O(n^{1.5}/\kappa)$ bits.
• Any algorithm (regardless of the number of rounds) that approximates $\|AB\|_\infty$:
  – within a factor of 2 needs $\Omega(n^2)$ bits;
  – within a factor of κ ≥ 4 needs $\Omega(n^{1.5}/\kappa)$ bits.
• The above results hold for binary matrices A and B. For general matrices $A, B \in \Sigma^{n\times m}$, the bound is $\tilde\Theta(n^2/\kappa^2)$ bits (O(1) rounds for the upper bound; any number of rounds for the lower bound).
Our Main Results – Heavy Hitters
• For binary matrices A and B and any $p \in (0, 2]$, an O(1)-round $\tilde O(n + \phi/\epsilon^2)$-bit algorithm that computes $\ell_p$-$(\phi, \epsilon)$-heavy-hitters.
• For general matrices A and B and any $p \in (0, 2]$, we obtain O(1)-round algorithms using $\tilde O\big(\frac{\sqrt{\phi}}{\epsilon} \cdot n\big)$ bits.
All of the results above extend easily to rectangular matrices $A \in \Sigma^{m\times n}$ and $B \in \Sigma^{n\times m}$.
Previous Results
• Most relevant: Van Gucht et al., PODS'15, which studies set-intersection, set-disjointness, set-equality, and at-least-T joins in the 2-party communication model. The only overlap between Van Gucht et al. and this paper is the estimation of $\|AB\|_0$ mentioned above.
• A number of recent works study distributed linear algebra problems (Balcan et al., KDD'16; Boutsidis et al., STOC'16; Woodruff & Zhong, ICDE'16; etc.). These works estimate statistics of C = A + B, whereas this paper studies C = A · B.
• Similar problems have been studied in the RAM model (Cohen & Lewis, J. Algorithms '99; Pagh, TOCT'13; etc.).
$(1+\epsilon)$-approximate $\ell_0$
• Alice holds $A \in \{0,1\}^{n\times n}$, Bob holds $B \in \{0,1\}^{n\times n}$.
• Let $C = A \cdot B$. Goal: $(1+\epsilon)$-approximate $\|C\|_0$.
[Figure: A (n × n) × B (n × n) = C (n × n)]
High-level idea:
1. First compute a rough estimate of the number of non-zero entries in each row of C.
2. Use the rough estimates to partition the rows of C into groups such that rows in the same group have a similar number of non-zero entries.
3. Sample rows in each group of C with probability proportional to the (estimated) average number of non-zero entries of the group.
4. Use the sampled rows to estimate the number of non-zero entries of C.
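A centralized sketch of steps 2 to 4, assuming the rough per-row estimates of step 1 are already available (here they are simply passed in). It does not model who sends what. The power-of-two grouping, the hypothetical `budget` parameter, and the inverse-probability (Horvitz-Thompson) weighting are illustrative choices of ours, not necessarily the exact choices in the paper.

```python
import math
import random


def row_nnz(A, B, i):
    """Exact number of non-zero entries in row i of C = A B for 0/1 matrices."""
    support = [k for k, a in enumerate(A[i]) if a]   # k with A[i][k] = 1
    return sum(1 for j in range(len(B[0])) if any(B[k][j] for k in support))


def estimate_l0(A, B, rough, budget, rng):
    """Sampling estimate of ||A B||_0 (centralized simulation, no messages).

    rough[i]: rough estimate of the non-zeros in row i of C (step 1, assumed given);
              rows with rough[i] == 0 are assumed all-zero and skipped.
    budget:   hypothetical knob; the expected number of sampled rows is at most budget.
    """
    # Step 2: group rows whose rough estimates fall in the same power-of-two range.
    groups = {}
    for i, r in enumerate(rough):
        if r > 0:
            groups.setdefault(int(math.log2(r)), []).append(i)
    total_rough = sum(rough)
    estimate = 0.0
    for rows in groups.values():
        avg = sum(rough[i] for i in rows) / len(rows)
        # Step 3: sampling probability proportional to the group's average.
        p = min(1.0, budget * avg / total_rough)
        for i in rows:
            if rng.random() < p:
                # Step 4: weight each sampled row by 1/p so the estimate is unbiased.
                estimate += row_nnz(A, B, i) / p
    return estimate
```

When `budget` is large enough that every probability is capped at 1, every row is read and the result equals ‖C‖_0 exactly; smaller budgets read fewer rows at the cost of variance. Grouping rows of similar weight is what keeps that variance under control, since within a group all rows are sampled at a rate matched to their size.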