

  1. Query Processing with Optimal Communication Cost. Magdalena Balazinska and Dan Suciu, University of Washington. AITF 2017.

  2. Context. Past: NSF Big Data grant • PhD student Paris Koutris received the ACM SIGMOD Jim Gray Dissertation Award. Current: AiTF grant • PIs: Magda Balazinska, Dan Suciu • Student: Walter Cai.

  3. Basic Question • How much communication is needed to compute a query Q on p servers? • Parallel data processing: Gamma, MapReduce, Hive, Teradata, Aster Data, Spark, Impala, Myria, TensorFlow (see Magda Balazinska's current class).

  4. Background. Q = a conjunctive query; ρ* = its fractional edge cover number. Thm. [Atserias, Grohe, Marx 2011] If every input relation has size ≤ m, then |Output(Q)| ≤ m^{ρ*}. Example: Q(x,y,z) :- R(x,y) ∧ S(y,z) ∧ T(z,x); if |R|, |S|, |T| ≤ m, then |Output(Q)| ≤ m^{3/2}. (Slide figure: the triangle hypergraph with weight ½ on each edge, giving ρ* = 3/2.)
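Aside (not from the talk): ρ* is the optimum of a small linear program with one weight per atom, where every variable must be covered with total weight ≥ 1. A minimal sketch for the triangle query, assuming scipy as a dependency:

```python
# Fractional edge cover number rho* of Q(x,y,z) :- R(x,y), S(y,z), T(z,x),
# as an LP: minimize u_R + u_S + u_T subject to
#   x: u_R + u_T >= 1,  y: u_R + u_S >= 1,  z: u_S + u_T >= 1.
from scipy.optimize import linprog

c = [1, 1, 1]                       # objective: total edge weight
A_ub = [[-1,  0, -1],               # -u_R - u_T <= -1   (x is covered)
        [-1, -1,  0],               # -u_R - u_S <= -1   (y is covered)
        [ 0, -1, -1]]               # -u_S - u_T <= -1   (z is covered)
res = linprog(c, A_ub=A_ub, b_ub=[-1, -1, -1], bounds=[(0, None)] * 3)
print(res.fun)                      # 1.5 = rho*, so |Output| <= m**1.5 by AGM
```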

  5.–8. Massively Parallel Communication Model (MPC). Extends BSP [Valiant]. The input data, of size m, is initially distributed over p servers, O(m/p) per server. One round = each server computes locally, then communicates; an algorithm = several rounds. L = the maximum communication load per round per server: in every round, each server receives at most L data.
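To make the model concrete, here is a toy single-machine simulation of one MPC round (an illustration under an assumed data layout, not the talk's code): every server routes each of its tuples to a set of destination servers, and the load L is the largest number of tuples any one server receives.

```python
from collections import defaultdict

def mpc_round(local_data, route, p):
    """One MPC round: local_data is a list of p tuple lists; route maps a
    tuple to its list of destination server ids. Returns (new data, load L)."""
    inbox = defaultdict(list)
    for server in range(p):
        for t in local_data[server]:
            for dest in route(t):
                inbox[dest].append(t)
    new_data = [inbox[s] for s in range(p)]
    return new_data, max(map(len, new_data))
```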

  9.–13. MPC cost measures. The cost of an algorithm is the pair (load L, number of rounds r), with the typical regimes (ε ∈ (0,1)):

  Cost     | Ideal | Practical  | Naïve 1 | Naïve 2
  Load L   | m/p   | m/p^{1-ε}  | m       | m/p
  Rounds r | 1     | O(1)       | 1       | p

  14. A Naïve Lower Bound. Take a query Q and inputs R, S, T, … such that |Output(Q)| = m^{ρ*}. Run any algorithm with load L for r rounds. One server then "knows" at most L·r tuples, so by the AGM bound it can output at most (L·r)^{ρ*} tuples. Since the p servers together must output all m^{ρ*} tuples, p·(L·r)^{ρ*} ≥ m^{ρ*}. Thm. Any r-round algorithm has L ≥ m / (r·p^{1/ρ*}).
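Plugging in illustrative numbers (our own, not the talk's): for Triangles (ρ* = 3/2) with m = 10^9 tuples, p = 64 servers, and r = 1 round, the bound is about 62.5M tuples per server.

```python
# Lower bound L >= m / (r * p**(1/rho_star)); the numbers are assumed examples.
m, p, r, rho_star = 10**9, 64, 1, 1.5
print(m / (r * p ** (1 / rho_star)))  # ~6.25e7, since 64**(2/3) = 16
```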

  15. Speedup. Speed = O(1/L). A load of L = m/p corresponds to linear speedup in the number of processors p; a load of L = m/p^{1-ε} corresponds to sub-linear speedup. What is the theoretically optimal load L = f(m,p)? And is this the right question in the field?
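A quick numeric illustration (assumed numbers): with speed proportional to 1/L, doubling p doubles the speed in the linear regime, but buys only a factor 2^{1-ε} in the sub-linear one.

```python
m, eps = 10**9, 0.5
for p in (64, 128):
    # speed ~ 1/L: p/m when L = m/p, p**(1-eps)/m when L = m/p**(1-eps)
    print(p, p / m, p ** (1 - eps) / m)
```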

  16. Join of Two Tables. Join(x,y,z) = R(x,y) ∧ S(y,z); |R| = |S| = m tuples; ρ* = 2 (edge weight 1 on each atom). In the field: hash-join on y achieves L = m/p (without skew); broadcast-join has L ≈ m. In theory: L ≥ m/p^{1/2}.
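A sketch of the one-round hash-join (the tuple tagging, and the reuse of mpc_round from the MPC sketch above, are our illustrative assumptions): both relations route on the join attribute y, so matching tuples land on the same server and the join finishes locally.

```python
def hash_join_route(p):
    def route(tup):
        rel, a, b = tup                  # ('R', x, y) or ('S', y, z)
        y = b if rel == 'R' else a       # join attribute of each relation
        return [hash(y) % p]             # both sides of a given y meet here
    return route

def local_join(tuples):
    r = [(a, b) for rel, a, b in tuples if rel == 'R']
    s = [(a, b) for rel, a, b in tuples if rel == 'S']
    return [(x, y, z) for x, y in r for y2, z in s if y == y2]
```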

  17. Triangles. Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x); |R| = |S| = |T| = m tuples. State of the art: • hash-join in two rounds (problem: the intermediate result is too big!) • broadcast S and T in one round (problem: two of the local tables are huge!).

  18. Triangles in One Round [Afrati & Ullman 2010; Beame et al. 2013, 2014]. Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x); |R| = |S| = |T| = m tuples. Place the servers in a cube: p = p^{1/3} × p^{1/3} × p^{1/3}, each server identified by coordinates (i,j,k).

  19. Triangles in One Round. Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x). Round 1: send R(x,y) to all servers (h1(x), h2(y), *); send S(y,z) to all servers (*, h2(y), h3(z)); send T(z,x) to all servers (h1(x), *, h3(z)). Output: each server locally computes R(x,y) ∧ S(y,z) ∧ T(z,x) over the tuples it received. (Slide figure: a worked example routing tuples over the constants Fred, Alice, Jack, Jim, Carol to their servers (i,j,k).)
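A sketch of this HyperCube routing (hash functions, tuple tagging, and grid encoding are our illustrative assumptions): each relation misses exactly one variable, and its tuples are replicated along exactly that grid axis, so every candidate triangle meets at one server.

```python
def make_hash(salt, side):
    # one hash per variable; side = p**(1/3), assumed to be an integer
    return lambda v: hash((salt, v)) % side

def hypercube_route(side):
    h1, h2, h3 = (make_hash(i, side) for i in (1, 2, 3))
    def route(tup):
        rel, a, b = tup
        if rel == 'R':   # R(x,y) -> all servers (h1(x), h2(y), *)
            return [(h1(a), h2(b), k) for k in range(side)]
        if rel == 'S':   # S(y,z) -> all servers (*, h2(y), h3(z))
            return [(i, h2(a), h3(b)) for i in range(side)]
        # T(z,x)         -> all servers (h1(x), *, h3(z))
        return [(h1(b), j, h3(a)) for j in range(side)]
    return route
```

Each tuple is sent to side = p^{1/3} servers, so the 3m input tuples generate 3m·p^{1/3} messages spread over p servers: an expected load of O(m/p^{2/3}) when no hash bucket is overloaded, which is exactly the skew-free guarantee on the next slide.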

  20. Communication load per server. Theorem: assuming "no skew", HyperCube computes Triangles with L = O(m/p^{2/3}) w.h.p. Can we compute Triangles with L = m/p? No! Theorem: any 1-round algorithm has L = Ω(m/p^{2/3}), even on inputs with no skew.

  21.–22. Experiments: Triangles on 1.1M triples of Twitter data → 220k triangles; p = 64. Plans compared: 2-round hash-join, 1-round broadcast, 1-round HyperCube, each with local computation by a 1- or 2-step hash-join or a 1-step Leapfrog Trie-join (a.k.a. Generic-Join). Metrics reported: total CPU time, number of tuples shuffled, wall-clock time. (Slides show the result charts.)

  23. General Case. Theorem: the optimal load for computing Q in one round on skew-free data is L = O(m / p^{1/τ*}), where τ* is the fractional vertex cover number of Q's hypergraph. Examples: the two-way join has τ* = 1 (weight 1 on y), so L = m/p; Triangles has τ* = 3/2 (weight ½ per variable), so L = m/p^{2/3}. Recall the lower bound: any r-round algorithm has L ≥ m / (r·p^{1/ρ*}), where ρ* is the fractional edge cover number of Q's hypergraph. Examples: Triangles has ρ* = 3/2 (weight ½ per atom), giving m/p^{2/3}; the two-way join has ρ* = 2 (weight 1 per atom), giving m/p^{1/2}.
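The same LP view works for τ* (our sketch, again assuming scipy): one weight per variable, and every atom must be covered with total weight ≥ 1; the optimal one-round load then follows directly.

```python
from scipy.optimize import linprog

def tau_star(variables, atoms):
    """Fractional vertex cover number: minimize total vertex weight such
    that each atom's variables carry total weight >= 1."""
    A_ub = [[-1 if v in atom else 0 for v in variables] for atom in atoms]
    res = linprog([1] * len(variables), A_ub=A_ub, b_ub=[-1] * len(atoms),
                  bounds=[(0, None)] * len(variables))
    return res.fun

m, p = 10**9, 64                      # assumed example numbers
for name, atoms in [('join',     [('x', 'y'), ('y', 'z')]),
                    ('triangle', [('x', 'y'), ('y', 'z'), ('z', 'x')])]:
    t = tau_star(['x', 'y', 'z'], atoms)
    print(name, t, m / p ** (1 / t))  # join: 1.0 -> m/p; triangle: 1.5 -> m/p**(2/3)
```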

  24. Skew. Skewed data is a major impediment to parallel data processing. Practical solutions: • deal with stragglers and hope they eventually terminate • remove heavy hitters from the computation. Our approach: rewrite the query into a residual query, e.g. the join R(x,y) ∧ S(y,z) becomes the Cartesian product R(x) ∧ S(z).

  25. Skewed Values → New Query. Join(x,y,z) = R(x,y) ∧ S(y,z). Without skew: τ* = 1, L = m/p. With skew (y = a single value of degree m): the join degenerates to Product(x,z) = R(x) ∧ S(z), with τ* = 2 and L = m/p^{1/2}. (Slide figure: R(x) and S(z) arranged on a p^{1/2} × p^{1/2} grid of servers.)
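A sketch of the heavy-hitter case (grid encoding and tuple tagging are our illustrative assumptions): for the single heavy y, the residual query is a pure Cartesian product, computed on a √p × √p grid by replicating R(x) along rows and S(z) along columns.

```python
def product_route(p):
    side = int(p ** 0.5)                  # assume p is a perfect square
    def route(tup):
        rel, val = tup                    # ('R', x) or ('S', z)
        h = hash(val) % side
        if rel == 'R':                    # R(x) -> row h, every column
            return [h * side + j for j in range(side)]
        return [i * side + h for i in range(side)]  # S(z) -> column h, every row
    return route
```

Every pair (x, z) meets at exactly one server, and each tuple is replicated √p times, so the load is m·√p / p = m/p^{1/2}, matching τ* = 2.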
