WAN-Aware Query Optimization

QUERY:
SELECT T1.user, T1.latency, T2.latency, T3.latency
FROM T1, T2, T3
WHERE T1.user == T2.user AND T1.user == T3.user
AND T1.device == T2.device == T3.device == "mobile";

• T1, T2, T3: tables storing click logs, stored in DC 1, DC 2, and DC 3
• Inter-DC WAN links: DC 1–DC 2 at 40 Gbps, DC 2–DC 3 at 80 Gbps, DC 1–DC 3 at 100 Gbps; the WAN is the only bottleneck (individual transfers range from about 1 s to 40 s)
• (Figure: three candidate join orders — Plan A, Plan B, Plan C — each pushing a σ_Mobile selection down onto the 200 GB base tables before joining, with intermediate results of 10 GB, 12 GB, and 16 GB respectively)
• Plan running times: Plan A 41 s, Plan B 20.96 s, Plan C 17.6 s
• A network-agnostic query optimizer chooses Plan A; a WAN-aware query optimizer instead uses network transfer durations to choose query plans
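The slide's core idea — rank candidate join orders by how long their WAN transfers take, rather than by intermediate data size — can be sketched as below. The per-plan transfer lists and the link each transfer uses are illustrative assumptions, not values recovered from the slide's diagram:

```python
# Hypothetical sketch: rank join orders by WAN transfer duration.
# Transfer sizes and link assignments per plan are assumed for illustration.

def transfer_seconds(gigabytes, gbps):
    """Duration to push `gigabytes` of data over a `gbps` WAN link."""
    return gigabytes * 8 / gbps  # GB -> Gb, then divide by Gb/s

def plan_duration(transfers):
    """A plan is a list of (size_GB, link_Gbps) WAN transfers; transfers
    on independent links overlap, so the slowest one dominates."""
    return max(transfer_seconds(size, bw) for size, bw in transfers)

# Three candidate join orders moving different intermediate results:
plans = {
    "A": [(200, 40)],             # 200 GB table over the 40 Gbps link
    "B": [(200, 80), (12, 40)],   # 200 GB over 80 Gbps, 12 GB over 40 Gbps
    "C": [(200, 100), (16, 80)],  # 200 GB over 100 Gbps, 16 GB over 80 Gbps
}
best = min(plans, key=lambda p: plan_duration(plans[p]))
print(best, {p: plan_duration(t) for p, t in plans.items()})
```

With these assumed numbers the network-agnostic choice (Plan A, smallest intermediate result) is the slowest, while the WAN-aware ranking picks Plan C — the same inversion the slide illustrates.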
Outline
1. Motivation
2. Challenges in choosing query plans based on WAN transfer durations
3. Solution
   • Single query
   • Multiple simultaneous queries
4. Experimental Evaluation
Other factors also affect query plan run time

• (Figure: the T1 ⋈ T2 join of a plan runs as a MapReduce job — MAP: SELECT over T1 and T2 (200 GB each), REDUCE: JOIN)
• Task placement matters: with reduce tasks placed in a single DC, the join stage takes 20 s; with tasks placed uniformly across DC 1 and DC 2, it takes 10 s
• Re-evaluating the plans with better task placement changes their run times — Plan A: 41 s → 20.5 s, Plan B: 20.96 s → 11.2 s, Plan C: 17.6 s — so the best plan can change with placement
• Available bandwidth matters too: a link may be partly used by a high-priority application
• Therefore, choose the query plan based on:
  1. Best available task placements
  2. Schedule of network transfers
Joint plan selection, placement and scheduling

• Query Optimizer: for a query (SELECT * FROM … WHERE … ;), generates multiple query plans (join orders) and assigns parallelism for each stage (logical plan → physical plan)
• Clarinet: performs network-aware task placement and scheduling for each query plan, then chooses the plan with the smallest run time for execution
• Clarinet binds a query to a plan lower in the stack (late binding)
Network aware placement and scheduling

• (Figure: a query DAG — SELECT stages over T1, T2, T3 feeding two JOIN stages)
• Task placement is decided greedily, one stage at a time, minimizing per-stage run time
• Scheduling of network transfers determines the start times of inter-DC transfers
• Scheduling is formulated as a Binary Integer Linear Program that factors in transfer dependencies
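A minimal sketch of the greedy, stage-at-a-time placement idea: for one join stage, pick the destination data center whose incoming WAN transfers finish soonest. The topology and input sizes below are assumptions loosely modeled on the running example; the actual system also places tasks fractionally across DCs and schedules transfers via a binary ILP:

```python
# Sketch (assumed topology): greedily place one reduce (join) stage in the
# DC that minimizes the stage's slowest incoming WAN transfer.

def stage_time(dest, input_sizes, bw):
    """Max time to move each input to `dest`; a co-located input costs 0."""
    return max(
        0.0 if src == dest else size * 8 / bw[(src, dest)]
        for src, size in input_sizes.items()
    )

# Symmetric inter-DC bandwidths in Gbps (illustrative assumption).
bw = {("DC1", "DC2"): 40, ("DC2", "DC1"): 40,
      ("DC2", "DC3"): 80, ("DC3", "DC2"): 80,
      ("DC1", "DC3"): 100, ("DC3", "DC1"): 100}
inputs = {"DC1": 200, "DC2": 200}  # GB of map output per source DC

best_dc = min(("DC1", "DC2", "DC3"), key=lambda d: stage_time(d, inputs, bw))
print(best_dc, stage_time(best_dc, inputs, bw))
```

Note the non-obvious outcome: a third DC holding none of the inputs can still be the best destination, because both 200 GB inputs then travel over the fast links instead of contending on the slow one.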
How to extend the late-binding strategy to multiple queries?
Queries affect each other's run time

• QUERY 1: SELECT … device == "mobile" … ;
• QUERY 2: SELECT … genre == "pc" … ;
• (Figure: both queries join T1, T2, T3 over the same WAN topology, with selections σ_Mobile and σ_PC pushed down onto the 200 GB base tables)
• Same query plan (Plan C) for Query 1 and Query 2: both plans use the same links, and contention increases query run time
• Different query plans for Query 1 (Plan C) and Query 2 (Plan B): no contention on network links
• Choosing execution plans jointly for multiple queries improves performance
Iterative Shortest Job First

• The best plan combination minimizes average completion time, but finding it is computationally intractable
• Clarinet's iterative Shortest Job First (SJF) scheduling heuristic:
  1. In each iteration, pick the shortest physical query plan across all queries
  2. Reserve bandwidth for its transfers to guarantee its completion time
  3. Re-evaluate the remaining queries' plans against the residual bandwidth and repeat
• (Example: Iteration 1 — candidate plan times A: 10/18/12 s, B: 5/8 s, C: 20/30 s; B's 5 s plan is picked and its transfer B1 reserved on Link 1 for t = 0–5. Iteration 2 — remaining plan times A: 15/18/17 s, C: 25/30 s; A's 15 s plan is picked and its transfers A1, A2 reserved)
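The iterative-SJF loop can be sketched as below. The per-query plan durations are the numbers from the slide's first iteration; the re-estimation of the remaining plans after each bandwidth reservation is stubbed out, since it would need the full placement/scheduling machinery:

```python
# Sketch of iterative shortest-job-first plan selection. Plan durations
# are illustrative; re-estimation after each reservation is stubbed out.

def iterative_sjf(plan_times):
    """plan_times: {query: [candidate physical-plan durations]}.
    Returns the pick order as (query, chosen plan duration) tuples."""
    remaining = dict(plan_times)
    order = []
    while remaining:
        # The query whose best remaining plan is shortest goes next.
        q = min(remaining, key=lambda name: min(remaining[name]))
        order.append((q, min(remaining[q])))
        del remaining[q]
        # A full implementation would now reserve bandwidth for the chosen
        # plan and re-run placement/scheduling for the remaining queries.
    return order

order = iterative_sjf({"A": [10, 18, 12], "B": [5, 8], "C": [20, 30]})
print(order)
```

With the slide's iteration-1 numbers this picks query B's 5 s plan first, then A, then C — matching the order the slide walks through (the real heuristic would also revise A's and C's durations after B's reservation).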
Avoid fragmentation and improve completion time

• SJF with bandwidth reservation leads to bandwidth fragmentation: dominant transfers execute sequentially while other links sit idle for extended periods
• (Example: scheduled in SJF order, B1 and then A1, A2 occupy Link 1 back-to-back, finishing at t = 10, 12, and 22 s, with Link 2 mostly idle)
• An alternate schedule with the same query plans overlaps the transfers and finishes everything by t = 12 s
• Re-arranging transfers — even deviating from the SJF schedule — can help
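A toy computation, with assumed completion times shaped like the slide's example, of why deviating from strict SJF can pay off:

```python
# Assumed numbers: under strict SJF reservation the two dominant transfers
# share one link back-to-back (jobs finish at 10 s and 22 s); rearranged
# onto the idle link they overlap and both finish by 12 s.

def avg_completion(finish_times):
    return sum(finish_times) / len(finish_times)

sjf = [10, 22]         # serialized dominant transfers
rearranged = [12, 12]  # overlapping transfers, same query plans
print(avg_completion(sjf), avg_completion(rearranged))
```

The shortest job itself finishes slightly later (12 s instead of 10 s), but the average completion time drops, which is the metric the heuristic targets.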
k-Shortest Jobs First Heuristic

• Start from the offline schedule (transfers laid out across Links 1 … n over time)
• Identify the transfers of the k shortest yet-incomplete jobs
• Relax their transfer schedule: start each as soon as the link is free and the source task's output is available
• Pick the best k from prior observations, or through offline simulations
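One way to sketch the relaxation step, under assumed transfer tuples `(job, link, duration, ready_time)`: transfers belonging to the k shortest jobs become work-conserving — each starts as soon as its link is free and its input is ready — instead of waiting for a reserved slot:

```python
# Sketch of the k-SJF relaxation over an assumed transfer list. Jobs are
# assumed to be listed in shortest-first order; only the k shortest jobs'
# transfers are relaxed to a work-conserving, per-link greedy schedule.

def relaxed_schedule(transfers, k):
    """transfers: [(job, link, duration, ready_time)] in SJF job order.
    Returns {job: finish_time} for the relaxed jobs."""
    jobs_in_order = []
    for job, *_ in transfers:
        if job not in jobs_in_order:
            jobs_in_order.append(job)
    top_k = set(jobs_in_order[:k])

    link_free = {}  # when each link next becomes idle
    finish = {}     # per-job completion time (its last transfer)
    for job, link, dur, ready in transfers:
        if job not in top_k:
            continue  # kept on the original offline schedule
        start = max(ready, link_free.get(link, 0.0))
        link_free[link] = start + dur
        finish[job] = max(finish.get(job, 0.0), start + dur)
    return finish

# Two short jobs sharing Link 1: B's transfer slides in the moment the
# link frees up, rather than waiting for a reserved start time.
print(relaxed_schedule(
    [("A", "L1", 5, 0), ("B", "L1", 3, 0), ("B", "L2", 4, 0)], k=2))
```

This greedy pass is only an illustration of the "start as soon as link is free and task is available" rule; the real system applies it on top of the ILP-derived offline schedule.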
Clarinet Implementation

• A batch of queries (and online query arrivals) feeds existing query optimizers, one QO per query
• Modified Hive to generate multiple plans; the QOs control the set of generated plans
• Existing optimizations still apply: select push-down, partition pruning
• The execution framework enforces Clarinet's schedule: modified Tez's DAGScheduler
• Also provides fairness guarantees
Evaluation

Compare Clarinet with the following GDA approaches:
1. Hive: WAN-agnostic task placement + scheduling
2. Hive + Iridium: WAN-aware task placement across DCs
3. Hive + Reducers in a single DC: distributed filtering + central aggregation

• Geo-Distributed Analytics stack across 10 EC2 regions
• Workload: 30 batches of 12 randomly chosen TPC-DS queries
Evaluation: Reduction in average completion time

• Average gains vs. Hive — Clarinet: 2.7x; Hive + Iridium: 1.5x; Hive + Reducers in single DC: 0.6x
• Clarinet chooses a different plan for 75% of queries
• (Figure: CDF of bytes sent per WAN link, links sorted by bandwidth, for a single batch of 12 queries — relative to Hive, Clarinet shifts its bytes distribution toward the higher-bandwidth links, tracking the WAN bandwidth distribution)
Evaluation: Optimization overhead

1. Generate multiple query plans
2. Iterative multi-query plan selection