Changing the Face of Database Cloud Services with Personalized Service Level Agreements

  1. Changing the Face of Database Cloud Services with Personalized Service Level Agreements. Jennifer Ortiz, Victor Teixeira de Almeida, Magdalena Balazinska. University of Washington, Computer Science and Engineering; PETROBRAS S.A., Rio de Janeiro, RJ, Brazil. CIDR 2015.

  3. Many Data Management & Analytics Systems Available

  4. Many Systems are Available as Cloud Services

  5. Cloud Services Today: Amazon EMR • Which Hadoop version? • Pig or Hive? • How many instances of the service?

  7. Cloud Services Today: BigQuery • How long will my query take?

  8. Cloud Services Can Do Better! System Internals

  10. Cloud Services Can Do Better! Query: SELECT … FROM … WHERE … • Query Capabilities • Time • Money

  11. A New Proposal: Time to Rethink the Interface… • Hide details of cluster deployment and resources • Show users monetary costs and performance estimates on their data • Let users pick the desired trade-off between the options shown. The result: Personalized Service Level Agreements (PSLAs)

  12. A PSLA Example (templates capture capabilities)
      Tier 1: $0.10/hour (fixed hourly price). Expected performance:
      • Within 20 seconds: SELECT <up to 10 attributes> FROM <Fact | Dimension> WHERE <up to 100% of data>
      • Within 1 minute: SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data>
      • Within 10 minutes: SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data>

  13. A PSLA Example (different tiers of service; templates capture capabilities)
      Tier 1: $0.10/hour (fixed hourly price). Expected performance:
      • Within 20 seconds: SELECT <up to 10 attributes> FROM <Fact | Dimension> WHERE <up to 100% of data>
      • Within 1 minute: SELECT <up to 5 attributes> FROM <JOIN Fact + 4 Dimensions> WHERE <up to 10% of data>
      • Within 10 minutes: SELECT <up to 10 attributes> FROM <JOIN Fact + 8 Dimensions> WHERE <up to 100% of data>
      Tier 2: $0.50/hour (fixed hourly price). Expected performance:
      • Within 1 second: SELECT <up to 10 attributes> FROM <Fact | Dimension> WHERE <up to 100% of data>
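
As an illustration only, the two-tier PSLA from this example could be held in a small data structure. The class and function names (`QueryTemplate`, `Tier`, `cheapest_tier_within`) are hypothetical and do not come from the talk; the prices, time thresholds, and template shapes are copied from the example, and Tier 2 lists only the one template visible on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryTemplate:
    """One PSLA template: what a query may select, join, and filter."""
    max_attributes: int      # SELECT <up to N attributes>
    tables: str              # FROM <...>
    max_data_percent: int    # WHERE <up to N% of data>


@dataclass
class Tier:
    name: str
    price_per_hour: float
    # Maps a time threshold (seconds) to the templates promised within it.
    guarantees: dict = field(default_factory=dict)


tier1 = Tier("Tier 1", 0.10, {
    20: [QueryTemplate(10, "Fact | Dimension", 100)],
    60: [QueryTemplate(5, "JOIN Fact + 4 Dimensions", 10)],
    600: [QueryTemplate(10, "JOIN Fact + 8 Dimensions", 100)],
})
tier2 = Tier("Tier 2", 0.50, {
    1: [QueryTemplate(10, "Fact | Dimension", 100)],
})


def cheapest_tier_within(tiers, template, deadline_s):
    """Return the cheapest tier promising `template` within `deadline_s`, or None."""
    candidates = [
        t for t in tiers
        if any(limit <= deadline_s and template in templates
               for limit, templates in t.guarantees.items())
    ]
    return min(candidates, key=lambda t: t.price_per_hour, default=None)
```

A user who needs a single-table scan within 30 seconds would be pointed at Tier 1, while a 5-second deadline is only met by Tier 2; this mirrors the "pick the desired trade-off" idea from the proposal slide.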

  14. Goals: Cloud C, Database D

  15. Goals: Cloud C, Database D, PSLA P (Money, Time, Capabilities)

  16. Goals: Cloud C and Database D are the inputs to PSLAManager, which produces PSLA P (Money, Time, Capabilities)

  18. Example of a Real PSLA

  19. TPC-H Star Schema Benchmark • Based on TPC-H • 10 GB

  20. Myria is a data management service in the cloud that we built at UW. It has a parallel, shared-nothing back-end query execution engine called MyriaX.

  21. PSLA for Myria

  26. PSLAManager: Cloud C and Database D are the inputs to PSLAManager, which produces PSLA P

  27. PSLAManager Workflow (diagram). Inputs: Service, Data. Perform offline: Service Perf. Modeling. Perform online: Workload Generation, Runtime Prediction, Tier Selection, then Workload Compression into PSLA (repeat for each tier): Query Clustering, Template Generation, Dropping Queries with Similar Times. Output: PSLA.

  34. Query Workload Generation • Which queries to generate?
      – Joins drive performance • Think about possible combinations of joins. Consider all possible 2-way joins over the tables in order by size (Lineitem, Part, Customer, Supplier, Date): (Lineitem ⋈ Part), (Customer ⋈ Date), (Lineitem ⋈ Supplier), etc.
      – Only consider the most expensive queries
      – Build toward more complex queries; include selections and projections
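
The "all possible 2-way joins" step can be sketched in a few lines. This is a minimal illustration rather than the generator used in the talk: the function name `two_way_joins` is made up, and the later steps (keeping only the most expensive queries, adding selections and projections) are not modeled.

```python
from itertools import combinations

# Tables in order by size, as listed on the slide.
TABLES = ["Lineitem", "Part", "Customer", "Supplier", "Date"]


def two_way_joins(tables):
    """Return every unordered pair of tables, preserving the by-size ordering."""
    return list(combinations(tables, 2))


joins = two_way_joins(TABLES)
for left, right in joins:
    print(f"({left} JOIN {right})")
```

With five tables this enumerates ten candidate joins, including the (Lineitem, Part), (Customer, Date), and (Lineitem, Supplier) pairs named on the slide.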

  38. Tier Selection. Figure: Runtime Distributions of Query Workload Per Configuration in Myria (x-axis: Workers (Configurations), 0 to 18; y-axis: Seconds, 0 to 350). EMD values annotated on the plot: 17.43, 7.07, 13.74.

  39. Tier Selection
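
The plot annotates EMD values between the runtime distributions of different configurations. The sketch below shows one way such a comparison could drive tier selection. It rests on several assumptions that are not stated on the slides: EMD is taken to be the Earth Mover's Distance, the same queries are run under every configuration (so the samples have equal size), and a configuration becomes a tier only if its distribution differs from the last selected tier by at least `min_emd`. The runtimes and the threshold are made up.

```python
def emd_1d(a, b):
    """Earth Mover's Distance between two equal-sized 1-D samples.

    For equal-sized samples with uniform weights this equals the mean
    absolute difference between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def select_tiers(runtimes_by_workers, min_emd):
    """Greedily keep configurations that differ enough from the last kept one."""
    configs = sorted(runtimes_by_workers)
    selected = [configs[0]]
    for workers in configs[1:]:
        last = runtimes_by_workers[selected[-1]]
        if emd_1d(last, runtimes_by_workers[workers]) >= min_emd:
            selected.append(workers)
    return selected


# Made-up runtimes (seconds) of the same four queries on 4, 6, 8, and 16 workers.
runtimes = {
    4: [40.0, 80.0, 120.0, 300.0],
    6: [38.0, 78.0, 118.0, 296.0],
    8: [25.0, 50.0, 80.0, 200.0],
    16: [12.0, 26.0, 40.0, 100.0],
}
tiers = select_tiers(runtimes, min_emd=10.0)
```

In this toy data the 6-worker configuration is skipped because its runtimes are nearly identical to the 4-worker ones, so offering it as a separate tier would add a price point without a meaningful performance difference.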

  41. Workload Compression STEP 1: Query Clustering • Threshold-based • Density-based (plot x-axis: Workers, 0 to 18)

  42. Workload Compression STEP 1: Query Clustering (plot with threshold "th" marked; x-axis: Workers, 0 to 18)

  43. Workload Compression STEP 1: Query Clustering (plot x-axis: Workers, 0 to 18)

  44. Workload Compression STEP 1: Query Clustering. Resulting PSLA layout: Tier 1: $0.XX/hour • Within Cluster Max Threshold: Query Templates in Cluster… • Within Cluster Max Threshold: Query Templates in Cluster…
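
A minimal sketch of the threshold-based variant, assuming that each query is placed in the cluster whose maximum threshold is the smallest one bounding its runtime. The threshold values, query names, and runtimes below are hypothetical; the density-based variant is not shown.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical cluster thresholds in seconds, sorted ascending.
THRESHOLDS = [10, 60, 600]


def cluster_by_threshold(runtimes, thresholds=THRESHOLDS):
    """Group queries under the smallest threshold that bounds their runtime."""
    clusters = defaultdict(list)
    for query, seconds in runtimes.items():
        i = bisect_left(thresholds, seconds)
        if i == len(thresholds):
            continue  # slower than every threshold: left out of the PSLA
        clusters[thresholds[i]].append(query)
    return dict(clusters)


example = {"q1": 3.2, "q2": 9.9, "q3": 45.0, "q4": 610.0}
clusters = cluster_by_threshold(example)
```

Each key of `clusters` plays the role of the "Within Cluster Max Threshold" line in the PSLA layout, and the values are the queries that would later be summarized as templates.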

  45. Workload Compression STEP 2: Template Generation. Queries → Query Templates. Query Dominance example: SELECT (5 ATT) FROM (5 TABLES) WHERE 10% versus SELECT (4 ATT) FROM (4 TABLES) WHERE 1% (plot axes: Time(s) vs. Configuration)

  46. Workload Compression STEP 2: Template Generation. Query Dominance is illustrated along three dimensions: Attributes Projected (SELECT), Tables (FROM), and Selectivity (WHERE). Example: SELECT (5 ATT) FROM (5 TABLES) WHERE 10% versus SELECT (4 ATT) FROM (4 TABLES) WHERE 1% (plot axes: Time(s) vs. Configuration)

  49. Workload Compression STEP 2: Template Generation. Root Query Template: We call a query template a root query template if no other query template in the same cluster dominates it. (plot axes: Time(s) vs. Configuration)
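
The root-template definition can be sketched as follows. The dominance rule used here is an assumption inferred from the slide's example: a template dominates another if it is at least as large in attributes projected, tables, and selectivity, and the two are not identical. The names `Template`, `dominates`, and `root_templates` are illustrative only.

```python
from typing import NamedTuple


class Template(NamedTuple):
    attributes: int     # number of attributes projected (SELECT)
    tables: int         # number of tables joined (FROM)
    selectivity: float  # percent of data selected (WHERE)


def dominates(a, b):
    """Assumed rule: `a` dominates `b` if it is at least as large on every
    dimension and the two templates are not identical."""
    return a != b and all(x >= y for x, y in zip(a, b))


def root_templates(cluster):
    """Templates that no other template in the same cluster dominates."""
    return [t for t in cluster
            if not any(dominates(other, t) for other in cluster)]


cluster = [Template(5, 5, 10.0), Template(4, 4, 1.0), Template(6, 3, 1.0)]
roots = root_templates(cluster)
```

Here the (5 ATT, 5 TABLES, 10%) template dominates the (4 ATT, 4 TABLES, 1%) one, as in the slide's example, so only the former and the incomparable third template remain as roots. Showing just the roots keeps each cluster short while still covering the dominated queries.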

  50. Workload Compression STEP 3: Dropping Queries with Similar Times (plot: Time(s) for Configuration 1)

  51. Workload Compression STEP 3: Dropping Queries with Similar Times. Queries → Query Templates → Root Query Templates (plot: Time(s) for Configuration 1 and Configuration 2)
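
The slides for this step are mostly plots, so the following is only a guess at the idea, stated as an assumption: a template is worth showing in a pricier configuration only if it lands under a lower time threshold there than in the cheaper one. The function name, the dictionary layout (threshold in seconds mapped to template names), and the data are all hypothetical.

```python
def drop_similar(cheaper, pricier):
    """Remove from `pricier` every template whose threshold there is not
    lower than its threshold in `cheaper`. Both map threshold -> templates."""
    best_cheaper = {}
    for limit, templates in cheaper.items():
        for t in templates:
            best_cheaper[t] = min(limit, best_cheaper.get(t, limit))
    pruned = {}
    for limit, templates in pricier.items():
        kept = [t for t in templates
                if t not in best_cheaper or limit < best_cheaper[t]]
        if kept:
            pruned[limit] = kept
    return pruned


config1 = {10: ["A"], 60: ["B", "C"]}   # cheaper configuration
config2 = {10: ["A", "B"], 60: ["C"]}   # pricier configuration
pruned = drop_similar(config1, config2)
```

In this toy case only template B improves (from the 60-second cluster to the 10-second cluster), so it is the only one kept in the pricier configuration.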

  55. PSLA Quality Assessment

  56. PSLA Quality Metrics • PSLA Query Capabilities • PSLA Complexity • PSLA Performance Error Metric
