Query Processing with Optimal Communication Cost
Magdalena Balazinska and Dan Suciu
University of Washington
AITF 2017
Context
Past: NSF Big Data grant
• PhD student Paris Koutris received the ACM SIGMOD Jim Gray Dissertation Award
Current: AiTF grant
• PIs: Magda Balazinska, Dan Suciu
• Student: Walter Cai
Basic Question
• How much communication is needed to compute a query Q on p servers?
• Parallel data processing systems: Gamma, MapReduce, Hive, Teradata, Aster Data, Spark, Impala, Myria, TensorFlow
  – See Magda Balazinska’s current class
Background
• Q = conjunctive query; ρ* = its fractional edge cover number
Thm. [Atserias, Grohe, Marx 2011] If every input relation has size ≤ m, then |Output(Q)| ≤ m^{ρ*}
Example: Q(x,y,z) :- R(x,y) ∧ S(y,z) ∧ T(z,x)
If |R|, |S|, |T| ≤ m, then |Output(Q)| ≤ m^{3/2}, since ρ* = 3/2 (weight ½ on each of R, S, T)
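For completeness, here is the linear program behind the slide's claim that ρ* = 3/2 for the triangle query, written out in LaTeX (a minimal sketch; the LP itself is standard, the rendering is ours):

```latex
% Fractional edge cover LP for Q(x,y,z) :- R(x,y), S(y,z), T(z,x):
% one weight per atom; every variable must be covered.
\[
\rho^* \;=\; \min\; w_R + w_S + w_T
\quad \text{s.t.} \quad
\underbrace{w_R + w_T \ge 1}_{x},\;
\underbrace{w_R + w_S \ge 1}_{y},\;
\underbrace{w_S + w_T \ge 1}_{z},\;
w_R, w_S, w_T \ge 0.
\]
\[
\text{Optimum: } w_R = w_S = w_T = \tfrac{1}{2}, \text{ so } \rho^* = \tfrac{3}{2}
\;\Longrightarrow\; |\mathrm{Output}(Q)| \le m^{3/2} \text{ (AGM bound).}
\]
```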
Massively Parallel Communication Model (MPC)
Extends BSP [Valiant]
• Input data = size m; initially each of the p servers holds O(m/p)
• Number of servers = p
• One round = compute locally, then communicate
• Algorithm = several rounds (Round 1, Round 2, Round 3, …)
• Max communication load per round per server = L
Cost regimes (ε ∈ (0,1)):

             Ideal   Practical    Naïve 1   Naïve 2
  Load L     m/p     m/p^{1-ε}    m         m/p
  Rounds r   1       O(1)         1         p
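To make the load metric L concrete, here is a small Python sketch (not from the talk; the names and data are ours) that simulates one MPC communication round of hash partitioning and reports the maximum per-server load:

```python
import random
from collections import Counter

def one_round_load(tuples, p, key):
    """Simulate one MPC round: each tuple is sent to the server chosen by
    hashing its key; return the maximum per-server load L."""
    load = Counter()
    for t in tuples:
        load[hash(key(t)) % p] += 1
    return max(load.values())

# m tuples with uniform (skew-free) keys: expect L close to the ideal m/p.
m, p = 100_000, 64
R = [(random.randrange(m), random.randrange(m)) for _ in range(m)]
print("L =", one_round_load(R, p, key=lambda t: t[0]), " ideal m/p =", m // p)
```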
A Naïve Lower Bound
• Query Q
• Inputs R, S, T, … such that |Output(Q)| = m^{ρ*}
• Consider an algorithm with load L
• After r rounds, one server “knows” ≤ L·r tuples, so by AGM it can output ≤ (L·r)^{ρ*} tuples
• The p servers together must output all m^{ρ*} tuples, hence p·(L·r)^{ρ*} ≥ m^{ρ*}
Thm. Any r-round algorithm has L ≥ m / (r·p^{1/ρ*})
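Spelling out the algebra that takes the slide from the counting inequality to the theorem:

```latex
\[
p \cdot (L r)^{\rho^*} \;\ge\; m^{\rho^*}
\;\Longleftrightarrow\;
L r \;\ge\; \frac{m}{p^{1/\rho^*}}
\;\Longleftrightarrow\;
L \;\ge\; \frac{m}{r \, p^{1/\rho^*}}.
\]
```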
Speedup
• Speed = O(1/L)
• A load of L = m/p corresponds to linear speedup in the number of processors p
• A load of L = m/p^{1-ε} corresponds to sub-linear speedup
What is the theoretically optimal load L = f(m,p)?
Is this the right question in the field?
Join of Two Tables
Join(x,y,z) = R(x,y) ∧ S(y,z)
|R| = |S| = m tuples; ρ* = 2 (weight 1 on each of R, S)
In the field:
• Hash-join on y: L = m/p (without skew)
• Broadcast-join: L ≈ m
In theory: L ≥ m / p^{1/2}
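A minimal sketch of the hash-join shuffle on y (illustrative Python, not the talk's code): both relations are routed by hashing y, so when y is skew-free every server receives about m/p tuples of each relation and joins them locally.

```python
from collections import defaultdict

def hash_join_shuffle(R, S, p):
    """One-round hash join of R(x,y) and S(y,z): route both relations by
    hashing y, then join locally on each server. Per-server load is ~ m/p
    when y is skew-free."""
    servers = [([], []) for _ in range(p)]
    for x, y in R:
        servers[hash(y) % p][0].append((x, y))
    for y, z in S:
        servers[hash(y) % p][1].append((y, z))
    out = []
    for Rs, Ss in servers:                 # local join on each server
        index = defaultdict(list)
        for x, y in Rs:
            index[y].append(x)
        for y, z in Ss:
            out.extend((x, y, z) for x in index[y])
    return out

print(hash_join_shuffle([(1, 2), (3, 2)], [(2, 9)], p=4))  # [(1,2,9), (3,2,9)]
```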
Triangles
Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x)
|R| = |S| = |T| = m tuples
State of the art:
• Hash-join, two rounds — problem: the intermediate result is too big!
• Broadcast S, T, one round — problem: two of the local tables are huge!
Triangles in One Round [Afrati & Ullman ’10] [Beame ’13, ’14]
Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x), |R| = |S| = |T| = m tuples
• Place the p servers in a cube: p = p^{1/3} × p^{1/3} × p^{1/3}
• Each server is identified by its coordinates (i,j,k)
(figure: the p^{1/3} × p^{1/3} × p^{1/3} server cube, axes i, j, k)
Triangles in One Round: the HyperCube shuffle
Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x), |R| = |S| = |T| = m tuples
Round 1:
• Send R(x,y) to all servers (h1(x), h2(y), *)
• Send S(y,z) to all servers (*, h2(y), h3(z))
• Send T(z,x) to all servers (h1(x), *, h3(z))
Output: each server locally computes R(x,y) ∧ S(y,z) ∧ T(z,x) on the tuples it received
(figure: an example instance over persons Fred, Alice, Jim, Jack, Carol; each tuple, e.g. R(Fred, Jim), is replicated to the line of p^{1/3} servers whose fixed coordinates are determined by hashing Fred and Jim)
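A runnable sketch of the HyperCube shuffle just described (illustrative Python; the hash functions and data are placeholders): each R tuple is replicated to p^{1/3} servers along the free k dimension, and similarly for S and T, so every potential triangle meets at exactly one server.

```python
import itertools
from collections import defaultdict

def hypercube_triangles(R, S, T, side):
    """One-round HyperCube algorithm for Triangles(x,y,z) on p = side**3
    servers arranged in a side x side x side cube."""
    h1 = h2 = h3 = lambda v: hash(v) % side      # independent hashes in practice
    recv = defaultdict(lambda: (set(), set(), set()))
    for x, y in R:                               # R(x,y) -> (h1(x), h2(y), *)
        for k in range(side):
            recv[(h1(x), h2(y), k)][0].add((x, y))
    for y, z in S:                               # S(y,z) -> (*, h2(y), h3(z))
        for i in range(side):
            recv[(i, h2(y), h3(z))][1].add((y, z))
    for z, x in T:                               # T(z,x) -> (h1(x), *, h3(z))
        for j in range(side):
            recv[(h1(x), j, h3(z))][2].add((z, x))
    out = set()
    for Rs, Ss, Ts in recv.values():             # local join on each server
        for (x, y), (y2, z) in itertools.product(Rs, Ss):
            if y == y2 and (z, x) in Ts:
                out.add((x, y, z))
    return out

R = [(1, 2), (2, 3)]; S = [(2, 3), (3, 1)]; T = [(3, 1), (1, 2)]
print(hypercube_triangles(R, S, T, side=2))      # {(1, 2, 3), (2, 3, 1)}
```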
Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x), |R| = |S| = |T| = m tuples
Communication load per server:
Theorem. Assuming no skew, HyperCube computes Triangles with L = O(m/p^{2/3}) w.h.p.
Can we compute Triangles with L = m/p? No!
Theorem. Any 1-round algorithm has L = Ω(m/p^{2/3}), even on inputs with no skew.
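The upper bound follows from a short counting argument the slide leaves implicit: each of the 3m input tuples is replicated to p^{1/3} servers (one free dimension), and the total traffic is divided over p servers.

```latex
\[
L \;=\; \frac{3m \cdot p^{1/3}}{p} \;=\; \frac{3m}{p^{2/3}}
\;=\; O\!\left(\frac{m}{p^{2/3}}\right)
\quad \text{(in expectation; w.h.p.\ assuming no skew).}
\]
```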
Experiment: Triangles(x,y,z) = R(x,y) ∧ S(y,z) ∧ T(z,x)
• |R| = |S| = |T| = 1.1M triples of Twitter data; 220k triangles; p = 64
• Local algorithms: 1- or 2-step hash-join; 1-step Leapfrog Trie-join (a.k.a. Generic-Join)
• Compared: 2-round hash-join, 1-round broadcast, 1-round HyperCube
(figure: bar charts of total CPU time, number of tuples shuffled, and wall-clock time for the three algorithms)
General Case
Theorem. The optimal load for computing Q in one round on skew-free data is L = O(m / p^{1/τ*}), where τ* = the fractional vertex cover number of Q’s hypergraph.
• Join: τ* = 1 (weights: y = 1, x = z = 0), so L = m/p
• Triangles: τ* = 3/2 (weight ½ on each variable), so L = m/p^{2/3}
Thm. Any r-round algorithm has L ≥ m / (r·p^{1/ρ*}), where ρ* = the fractional edge cover number of Q’s hypergraph.
• Join: ρ* = 2 (weight 1 on each atom), so L ≥ m/p^{1/2}
• Triangles: ρ* = 3/2 (weight ½ on each atom), so L ≥ m/p^{2/3}
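For the two running examples, the fractional vertex cover LPs behind the stated values of τ* (spelled out here for completeness; the rendering is ours):

```latex
% Join(x,y,z) = R(x,y) ∧ S(y,z): one weight per variable, every atom covered.
\[
\tau^*_{\mathrm{Join}} = \min\; v_x + v_y + v_z
\;\;\text{s.t.}\;\; v_x + v_y \ge 1,\; v_y + v_z \ge 1
\;\;\Longrightarrow\;\; v_y = 1,\; \tau^* = 1,\; L = m/p.
\]
\[
\tau^*_{\triangle} = \min\; v_x + v_y + v_z
\;\;\text{s.t.}\;\; v_x + v_y \ge 1,\; v_y + v_z \ge 1,\; v_z + v_x \ge 1
\;\;\Longrightarrow\;\; v_x = v_y = v_z = \tfrac12,\; \tau^* = \tfrac32,\; L = m/p^{2/3}.
\]
```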
Skew
• Skewed data is a major impediment to parallel data processing
• Practical solutions:
  – Deal with stragglers; hope they eventually terminate
  – Remove heavy hitters from the computation
• Our approach: rewrite the query into a residual query
  – Example: the join R(x,y) ∧ S(y,z) becomes the Cartesian product R(x) ∧ S(z)
Skewed Values → New Query
Join(x,y,z) = R(x,y) ∧ S(y,z)
• No skew: τ* = 1, so L = m/p
• Skewed (y = a single value of degree m): the join becomes Product(x,z) = R(x) ∧ S(z), with τ* = 2, so L = m/p^{1/2}
(figure: R(x) and S(z) partitioned over a p^{1/2} × p^{1/2} grid of servers)
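A sketch of the residual-query idea for heavy hitters (illustrative Python; the degree threshold and grid layout are our simplifications, not the talk's exact algorithm): light y-values go through the ordinary hash shuffle, while tuples with a heavy y are handled as a Cartesian product on a √p × √p server grid, so each (x, z) pair meets at exactly one grid server.

```python
import math
from collections import Counter, defaultdict

def skew_aware_join(R, S, p, threshold):
    """Join R(x,y) with S(y,z), routing heavy y-values (degree > threshold)
    through a sqrt(p) x sqrt(p) grid as a Cartesian product R(x) x S(z)."""
    deg = Counter(y for _, y in R) + Counter(y for y, _ in S)
    heavy = {y for y, d in deg.items() if d > threshold}
    side = int(math.isqrt(p))
    plan = defaultdict(lambda: ([], []))         # server -> (R part, S part)
    for x, y in R:
        if y in heavy:                           # replicate along a grid row
            for j in range(side):
                plan[(hash(x) % side, j)][0].append((x, y))
        else:                                    # ordinary hash shuffle on y
            plan[("light", hash(y) % p)][0].append((x, y))
    for y, z in S:
        if y in heavy:                           # replicate along a grid column
            for i in range(side):
                plan[(i, hash(z) % side)][1].append((y, z))
        else:
            plan[("light", hash(y) % p)][1].append((y, z))
    return [(x, y, z) for Rs, Ss in plan.values()
            for x, y in Rs for y2, z in Ss if y == y2]

print(skew_aware_join([(1, 9), (2, 9)], [(9, 7), (9, 8)], p=4, threshold=1))
```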