Changing the Face of Database Cloud Services with Personalized Service Level Agreements
Jennifer Ortiz, Victor Teixeira de Almeida, Magdalena Balazinska
University of Washington, Computer Science and Engineering
PETROBRAS S.A., Rio de Janeiro, RJ, Brazil
CIDR 2015
Many Data Management & Analytics Systems Available
Many Systems Are Available as Cloud Services
Cloud Services Today: Amazon EMR
• Which Hadoop version?
• Pig or Hive?
• How many instances of the service?
Cloud Services Today: BigQuery
• How long will my query take?
Cloud Services Can Do Better!
System Internals
Cloud Services Can Do Better!
Query: SELECT … FROM … WHERE …
• Query capabilities
• Time
• Money
A New Proposal: Personalized Service Level Agreements
Time to rethink the interface:
• Hide details of cluster deployment and resources
• Show users monetary costs and performance estimates on their data
• Let users pick the desired trade-off between the options shown
A PSLA Example
Tier 1: $0.10/hour
  Within 20 seconds:
    SELECT <up to 10 attributes>
    FROM <Fact | Dimension>
    WHERE <up to 100% of data>
  Within 1 minute:
    SELECT <up to 5 attributes>
    FROM <JOIN Fact + 4 Dimensions>
    WHERE <up to 10% of data>
  Within 10 minutes:
    SELECT <up to 10 attributes>
    FROM <JOIN Fact + 8 Dimensions>
    WHERE <up to 100% of data>
Tier 2: $0.50/hour
  Within 1 second:
    SELECT <up to 10 attributes>
    FROM <Fact | Dimension>
    WHERE <up to 100% of data>
• Each tier has a fixed, hourly price
• Each time bound states the expected performance
• Templates capture query capabilities
• Different tiers of service are offered side by side
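A PSLA like the one above works as a lookup table. The user can check whether a query fits a template, and if so, which time bound applies. The Python sketch below is a minimal illustration under some assumptions:

- Each template is summarized by three caps: projected attributes, dimension tables joined to the fact table, and fraction of data selected.
- `<Fact | Dimension>` is modeled as zero joined dimensions. This is our simplification.
- The names `Entry`, `TIER_1`, `TIER_2`, and `time_bound` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Entry:
    """One PSLA line: a time bound plus the template it covers (hypothetical model)."""
    seconds: float          # "Within ..." time bound
    max_attributes: int     # SELECT <up to N attributes>
    max_dimensions: int     # dimension tables joined with the fact table (0 = single table)
    max_selectivity: float  # WHERE <up to X% of data>, as a fraction

# Tier 1 and the Tier 2 entry from the example; the first element is $/hour.
TIER_1 = (0.10, [Entry(20, 10, 0, 1.0), Entry(60, 5, 4, 0.10), Entry(600, 10, 8, 1.0)])
TIER_2 = (0.50, [Entry(1, 10, 0, 1.0)])

def time_bound(tier, attributes: int, dimensions: int, selectivity: float) -> Optional[float]:
    """Return the tightest bound among entries whose template covers the query, else None."""
    _, entries = tier
    covering = [e.seconds for e in entries
                if attributes <= e.max_attributes
                and dimensions <= e.max_dimensions
                and selectivity <= e.max_selectivity]
    return min(covering) if covering else None
```

Consider a query projecting 3 attributes, joining the fact table with 2 dimensions, and selecting 5% of the data. It matches both the 1-minute and the 10-minute templates of Tier 1, so the sketch reports the tighter 60-second bound.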
Goals
• Inputs: a cloud service C and a database D
• The PSLAManager generates a PSLA P for database D on cloud C
• The PSLA describes money, time, and capabilities
Example of a Real PSLA
Star Schema Benchmark
• Based on TPC-H
• 10 GB
Myria is a data management service in the cloud that we built at UW. Its back end is MyriaX, a parallel, shared-nothing query execution engine.
PSLA for Myria
PSLAManager
• Cloud C + Database D → PSLAManager → PSLA P
PSLAManager Workflow
• Performed offline: service performance modeling
• Performed online, for the user's data:
  – Workload generation
  – Runtime prediction
  – Tier selection
  – Workload compression into a PSLA (repeated for each tier): query clustering, query template generation, dropping queries with similar times
  – Output: PSLA
Query Workload Generation
• Which queries to generate?
  – Joins drive performance
• Think about possible combinations of joins
  Consider all possible 2-way joins, with tables in order by size: Lineitem, Part, Customer, Supplier, Date
  (Lineitem ⋈ Part), (Customer ⋈ Date), (Lineitem ⋈ Supplier), etc.
  – Only consider the most expensive queries
  – Build toward more complex queries that include selections and projections
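The join-enumeration idea above can be sketched in a few lines of Python. Several parts of the sketch are assumptions:

- The row counts are approximate Star Schema Benchmark sizes at scale factor 10.
- The cost proxy is the total number of input rows, not MyriaX's cost model.
- The projection and selectivity levels are arbitrary.
- `TABLE_ROWS`, `join_cost`, `most_expensive_joins`, and `expand` are hypothetical names.

```python
from itertools import combinations

# Approximate SSB row counts at scale factor 10 (~10 GB); assumed values, ordered by size.
# "Lineitem" follows the slide's name for the fact table.
TABLE_ROWS = {"Lineitem": 60_000_000, "Part": 800_000, "Customer": 300_000,
              "Supplier": 20_000, "Date": 2_556}

def join_cost(tables):
    """Crude cost proxy (an assumption): total number of input rows."""
    return sum(TABLE_ROWS[t] for t in tables)

def most_expensive_joins(k, top):
    """Enumerate every k-way combination of tables and keep the `top` most expensive."""
    combos = combinations(TABLE_ROWS, k)  # dict order is size order
    return sorted(combos, key=join_cost, reverse=True)[:top]

def expand(tables, num_attributes=(1, 5), selectivities=(0.01, 0.1, 1.0)):
    """Build toward more complex queries by varying projections and selections (assumed levels)."""
    return [{"tables": tables, "attributes": a, "selectivity": s}
            for a in num_attributes for s in selectivities]
```

For example, `most_expensive_joins(2, 3)` keeps the three 2-way joins that touch the most rows, all of which involve Lineitem. Calling `expand` on each of them then yields six query shapes per join.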
Tier Selection
[Figure: Runtime distributions of the query workload per configuration in Myria. x-axis: workers (configurations), 0 to 18; y-axis: seconds, 0 to 350. Earth Mover's Distance (EMD) values annotated between configurations: 17.43, 7.07, and 13.74.]
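The figure compares runtime distributions with the Earth Mover's Distance. Consider two equal-size samples of query runtimes in one dimension. Their EMD is simply the mean absolute difference between the sorted values. The Python sketch below computes that distance. It then applies a greedy rule that keeps a configuration as a new tier only when its distribution is far enough from the last kept tier. That greedy rule, the `min_emd` threshold, and the names `emd_1d` and `select_tiers` are our assumptions, not necessarily how the PSLAManager picks tiers.

```python
def emd_1d(xs, ys):
    """EMD between two equal-size runtime samples: in 1-D with equal weights,
    this is the mean absolute difference of the sorted values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def select_tiers(runtimes_by_workers, min_emd):
    """Greedy rule (an assumption): scan configurations by worker count and keep one
    as a new tier only if its runtime distribution is at least min_emd away (EMD)
    from the last kept tier."""
    configs = sorted(runtimes_by_workers)
    tiers = [configs[0]]
    for c in configs[1:]:
        if emd_1d(runtimes_by_workers[tiers[-1]], runtimes_by_workers[c]) >= min_emd:
            tiers.append(c)
    return tiers
```

With runtime samples for 4, 6, 8, and 16 workers, a large `min_emd` skips configurations whose runtimes barely differ from a cheaper one. For samples of unequal size, `scipy.stats.wasserstein_distance(u_values, v_values)` computes the same 1-D distance.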
Tier Selection
Workload Compression STEP 1: Query Clustering
• Threshold-based
• Density-based
[Figure: clustering illustration; x-axis: workers]
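The two clustering styles can be contrasted on one-dimensional query runtimes. The Python sketch below is an illustration, not the PSLAManager's code:

- `threshold_clusters` bounds each cluster's spread by a threshold `th`.
- `gap_clusters` is a density-style stand-in that chains together runtimes closer than `eps`. In one dimension this matches DBSCAN with minPts = 1 (counting the point itself).

Both rules and both function names are assumptions.

```python
def threshold_clusters(runtimes, th):
    """Threshold-based (assumed rule): sort runtimes and start a new cluster whenever a
    runtime exceeds the current cluster's smallest runtime by more than th seconds."""
    clusters = []
    for r in sorted(runtimes):
        if clusters and r - clusters[-1][0] <= th:
            clusters[-1].append(r)
        else:
            clusters.append([r])
    return clusters

def gap_clusters(runtimes, eps):
    """Density-style stand-in: split wherever two consecutive sorted runtimes are more
    than eps seconds apart."""
    clusters = []
    for r in sorted(runtimes):
        if clusters and r - clusters[-1][-1] <= eps:
            clusters[-1].append(r)
        else:
            clusters.append([r])
    return clusters
```

The difference shows up on evenly spaced runtimes such as 1, 3, 5, 7 with a parameter of 2:

- The threshold rule caps each cluster's width and returns two clusters.
- The gap rule chains all four runtimes into one cluster.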
Workload Compression STEP 1: Query Clustering
[Figure: clustering with threshold th marked; x-axis: workers]
Workload Compression STEP 1: Query Clustering
Tier 1: $0.XX/hour
  Within <cluster max threshold>: <query templates in cluster>…
  Within <cluster max threshold>: <query templates in cluster>…
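The slide maps each cluster to one PSLA line, with the cluster's maximum runtime as the time bound. The Python sketch below shows one way to do that mapping. It assumes each cluster is given as a list of (template name, runtime in seconds) pairs. The names `psla_entries` and `render` are hypothetical.

```python
def psla_entries(clusters):
    """Turn clusters of (template, runtime) pairs into PSLA lines: each cluster's
    time bound is the largest runtime it contains."""
    entries = []
    for cluster in clusters:
        bound = max(runtime for _, runtime in cluster)
        templates = sorted({name for name, _ in cluster})
        entries.append((bound, templates))
    return sorted(entries)  # tightest bound first

def render(tier, price_per_hour, entries):
    """Format one tier in the style of the PSLA examples."""
    lines = [f"{tier}: ${price_per_hour:.2f}/hour"]
    lines += [f"Within {bound:g} seconds: {', '.join(names)}" for bound, names in entries]
    return lines
```

Using the cluster maximum keeps the promise conservative: every query in the cluster ran within the stated bound on the measured workload.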
Workload Compression STEP 2: Template Generation
Queries are summarized as query templates. Query dominance compares templates along three dimensions: attributes projected (SELECT), tables joined (FROM), and selectivity (WHERE). For example,
  SELECT (5 ATT) FROM (5 TABLES) WHERE 10%
dominates
  SELECT (4 ATT) FROM (4 TABLES) WHERE 1%
because it projects more attributes, joins more tables, and selects a larger fraction of the data.
[Figure; x-axis: configuration; y-axis: time (s)]
Workload Compression STEP 2: Template Generation
[Figure; x-axis: configuration; y-axis: time (s)]
Workload Compression STEP 2: Template Generation
Root Query Template: We call a query template a root query template if no other query template in the same cluster dominates it.
[Figure; x-axis: configuration; y-axis: time (s)]
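Dominance and root templates can be written down directly. The Python sketch below makes some choices of its own:

- It reads dominance Pareto-style: at least as many attributes, a superset of the tables, and at least the selectivity, with a strict gain on at least one of the three.
- Tables are modeled as a set rather than a count.

Both choices are assumptions, as are the names `Template`, `dominates`, and `root_templates`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    attributes: int      # number of attributes projected (SELECT)
    tables: frozenset    # tables joined (FROM)
    selectivity: float   # fraction of data selected (WHERE)

def dominates(a: Template, b: Template) -> bool:
    """Assumed Pareto-style rule: a is at least as capable as b on every dimension
    (a superset of its tables) and strictly more capable on at least one."""
    at_least = (a.attributes >= b.attributes and a.tables >= b.tables
                and a.selectivity >= b.selectivity)
    strictly = (a.attributes > b.attributes or a.tables > b.tables
                or a.selectivity > b.selectivity)
    return at_least and strictly

def root_templates(cluster):
    """Templates that no other template in the same cluster dominates
    (a template never dominates itself, so no self-check is needed)."""
    return [t for t in cluster if not any(dominates(o, t) for o in cluster)]
```

Listing only root templates keeps the PSLA short. Each root template implicitly covers every template it dominates in the same cluster.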
Workload Compression STEP 3: Dropping Queries with Similar Times
[Figure: runtimes of queries, query templates, and root query templates on Configuration 1 and Configuration 2; y-axis: time (s)]
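One reading of this step is that a query whose runtime barely changes on the more expensive configuration gains nothing from the pricier tier. Under that reading, such a query is dropped from that tier's PSLA. The Python sketch below implements that interpretation. The 10% relative tolerance, the per-query dictionaries, and the name `drop_similar` are assumptions.

```python
def drop_similar(lower, higher, rel_tol=0.1):
    """lower / higher map query -> runtime in seconds on the cheaper / pricier configuration.
    Keep a query in the pricier tier only if it runs at least rel_tol (as a fraction) faster
    there; queries never measured on the cheaper configuration are kept."""
    return {q: t_high for q, t_high in higher.items()
            if q not in lower or lower[q] - t_high >= rel_tol * lower[q]}
```

Suppose a query drops from 100 s to 40 s. It stays in the higher tier. A query that goes from 50 s to 48 s is dropped, since a user would pay more for no meaningful speedup.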
PSLA Quality Assessment
PSLA Quality Metrics
• PSLA Query Capabilities
• PSLA Complexity
• PSLA Performance Error Metric
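The three metrics can be approximated as follows. This is a rough Python sketch with assumed definitions, not the paper's formulas:

- Query capabilities is the fraction of workload queries that some template covers.
- Complexity is the total number of templates listed across tiers.
- Performance error is the root-mean-square gap between each query's PSLA time bound and its actual runtime.

The PSLA shape (tier name → list of (bound, template names)) and all function names are hypothetical.

```python
import math

def query_capabilities(workload, is_covered):
    """Assumed metric: fraction of workload queries covered by some PSLA template."""
    return sum(1 for q in workload if is_covered(q)) / len(workload)

def psla_complexity(psla):
    """Assumed proxy: total number of query templates listed across all tiers.
    psla maps tier name -> list of (time_bound_seconds, [template names])."""
    return sum(len(templates) for entries in psla.values() for _, templates in entries)

def performance_error(bound_and_actual):
    """Assumed metric: root-mean-square gap, in seconds, between each query's
    PSLA time bound and its actual runtime."""
    if not bound_and_actual:
        raise ValueError("need at least one (bound, actual) pair")
    squared = sum((bound - actual) ** 2 for bound, actual in bound_and_actual)
    return math.sqrt(squared / len(bound_and_actual))
```

The three numbers pull against each other. More templates can raise capabilities and tighten bounds (lower error), but they make the PSLA harder to read.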