Research Collection https://doi.org/10.3929/ethz-b-000294819 information please consult the Terms of use. This page was generated automatically upon download from the ETH Zurich Research Collection. For more Creative Commons Attribution 4.0 International Rights / License: http://doi.org/10.1145/3267809.3267822 Originally published in: Permanent Link: Conference Paper 2018-10 Publication Date: Marroquín, Renato; Müller, Ingo; Makreshanski, Darko; Alonso, Gustavo Author(s): Shared Query Execution Pay One, Get Hundreds for Free: Reducing Cloud Costs through ETH Library
Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution Renato Marroquín , Ingo Müller, Darko Makreshanski, Gustavo Alonso | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 1
Motivation: Query-as-a-service Pros Disadvantages No need to move data outside Expensive with frequent usage the cloud No intuitive way to optimize No infrastructure deployment Cost No database maintenance Throughput Pricing model Pay-per-byte-processed Popular Systems | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 2
Motivation: Use cases Parameter exploration BI reporting Ad-hoc analytics Observations: High overlap among concurrent queries Query burst from single user Similar workload from multiple users Work sharing: carry out redundant work only once. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 3
Our Goal Reduce query-as-a-service costs using work sharing techniques through query rewriting | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 4
Outline Related work Multi-query execution Evaluation Summary Conclusions | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 5
Multi Query Execution: General sharing workflow Our Focus Collect Co-plan Separate Co-execute | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 6
Shared execution techniques 2010 – 2017 1988 – 2005 2005 – 2010 Scan Work MQO Sharing Sharing 1 st generation: Crescando 2 nd generation: SharedDB, MQJoin, BatchDB Name Age Sex query_id Annotate tuples with Renato 30 m Q1,Q3 query_id attribute Darko 30 m Q1,Q2,Q3 Supports all relational Ingo 32 m Q1 operators | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 7
Shared execution techniques 2010 – 2017 1988 – 2005 2005 – 2010 Scan Work MQO Sharing Sharing Previous approaches need dedicated engine High implementation effort Needs vendor support This work: rewrite shared query plans as SQL. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 8
Outline Related work Multi-query execution | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 9
Multi Query Execution: Data-query model Name Age Sex SELECT * FROM Emp WHERE … Renato 30 m Darko 30 m Q1: … age < 40 Q2: … name LIKE ‘%k%’ Ingo 32 m Q3: … age <= 30 AND age > 20 Gustavo 50 m Name Age Sex query_id Renato 30 m { 1, 3 } Darko 30 m { 1, 2, 3 } Ingo 32 m { 1 } Gustavo 50 m { } | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 10
Multi Query Execution: The query_id attribute Name: Age: Sex: query_id: VARCHAR INT ENUM INTEGER ARRAY Renato 30 m { 1, 3 } Darko 30 m { 1, 2, 3 } Ingo 32 m { 1 } Result of Qx is given by SELECT * FROM R WHERE x = ANY(query_id) This is standard SQL! | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 11
Shared operators: Shared scan Name Age Sex SELECT * FROM Emp WHERE … Renato 30 m Q1: … age < 40 Darko 30 m Q2: … name LIKE ‘%k%’ Q3: … age <= 30 AND age > 20 Ingo 32 m Gustavo 50 m SELECT *, ARRAY_REMOVE( ARRAY[ CASE WHEN age < 40 THEN 1 ELSE 0 END, CASE WHEN name LIKE ‘%k%’ THEN 2 ELSE 0 END, CASE WHEN age <= 30 AND age > 20 THEN 3 ELSE 0 END ] , 0) AS query_id FROM Emp WHERE (age < 40) OR (name LIKE ‘%k%’) OR (age <= 30 AND age > 20) | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 12
Shared operators: Shared join Departments Employees Name Age Sex D_ID query_id D_ID D_Name query_id Renato 30 m 1 { 1, 3 } 1 Systems { 2, 3 } Darko 30 m 1 { 1, 2, 3 } 2 Algorithms { 2 } Ingo 32 m 1 { 1 } 3 ML { 1, 2, 3 } SELECT * FROM Emp e JOIN Dep d ON e.did = d.did WHERE … Name Age Sex D_ID D_Name query_id Renato 30 m 1 Systems { 3 } Darko 30 m 1 Systems { 2, 3 } | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 13
Shared operators: Shared join Departments Employees Name Age Sex D_ID query_id D_ID D_Name query_id Renato 30 m 1 { 1, 3 } 1 Systems { 2, 3 } Darko 30 m 1 { 1, 2, 3 } 2 Algorithms { 2 } Ingo 32 m 1 { 1 } 3 ML { 1, 2, 3 } WITH shared_emp AS (…), -- shared scan on emp shared_dep AS (…), -- shared scan on dep shared_join_helper AS ( SELECT R.name, R.age, R.sex, S.d_id, S.d_name, ARRAY_INTERSECT(R.query_id, S.query_id) AS query_id FROM shared_emp e JOIN shared_dep d ON e.d_id = d.d_id) SELECT * FROM shared_join_helper WHERE CARDINALITY(query_id) > 0 | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 14
Shared operators: Other shared operators GROUP BY Use UNNEST to duplicate tuples for each query Then group by query_id LIMIT / TOP K PARTITION BY query_id Then filter by RANK() Enough to express all TPC-H queries! | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 15
Outline Related work Multi-query execution Evaluation Summary Conclusions | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 16
Evaluation: Query-as-a-Service systems Systems-under-test Amazon Athena Google BigQuery Run SQL queries against files in cloud storage Apache Parquet Google internal columnar storage Micro-benchmarks End-to-end query execution | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 17
Evaluation: Shared scan Data Lineitem from TPC-H Scale Factor 100 Workload Parameterized queries Queries in a group/batch have equal selectivity | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 18
Evaluation: Shared scan Queries Selectivity: Google BigQuery Execution time increases sublinearly with query count. 84GB table, native columnar format | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 19
Evaluation: Shared scan Queries Selectivity: Amazon Athena 27GB, Parquet format Pay per processed byte Monetary cost (almost) independent of batch size. Google Big Query: constant 0.011$ | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 20
Evaluation: TPC-H Data TPC-H Scale Factor 100 Simulate multiple apps interacting with QaaS Each emitting different TPC-H queries Workload 128 instantiations of each query type | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 21
Evaluation: TPC-H Throughput improvement over Query-at-a-Time (batch size = 128) Athena BigQuery 100 Throughput Improvement 10 1 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 TPC-H Query Considerable speedup: up to ~128x. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 22
Evaluation: TPC-H Cost compared to Query-at-a-Time (batch size = 128) Query-at-a-time SharedExecution 1000.00 100.00 Cost [$] 16x 10.00 128x 1.00 107x $80 $240 ~$14 <$1 $227 ~$1.7 0.10 Athena BigQuery (all) BigQuery(without Q10) Again: cost (almost) independent of batch size. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 23
Outline Related work Multi-query execution Evaluation Summary Conclusions | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 24
Summary Shared query execution through query rewriting Rewriting can be done with standard SQL Evaluate our approach by executing queries end-to-end Show improvements in execution cost and throughput Conclusion Sharing can be implemented in: Client Library Middleware | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 25
Thanks! | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 26
Backup Slides | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 27
Evaluation: TPC-H Relative cost compared to Query-at-a-Time (batch size = 128) 100 Cost Improvement 10 16x Query-at-a-time SharedExecution 107x 1 128x $80 $240 ~$14 <$1 $227 ~$1.7 0.1 Athena BigQuery (all) BigQuery (without Q10) Again: cost (almost) independent of batch size. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 28
Shared execution techniques 2010 – 2017 1988 – 2005 2005 – 2010 Scan Work MQO Sharing Sharing row value First generation: Crescando Index predicates to reduce work idx Lower complexity of work Only scans … q1 q2 q3 | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 29
Shared plans Sal Emp Dep Emp Dep Emp Dep 𝜏 𝜏 sex = “m” age > 30 𝜏 ⨝ ⨝ d_id d_id Q1 Q2 ⨝ ⨝ Sal Sal Emp Emp e_id d_id 𝜏 𝜏 sex = “f” age < 50 ⨝ ⨝ e_id e_id Q3,Q4 Q1,Q2 Q3 Q4 | | Marroquín, Müller, Makreshanski, Alonso 30.05.2018 30
Recommend
More recommend