Building an Elastic Main-Memory Database: E-Store


  1. Building an Elastic Main-Memory Database: E-Store AARON J. ELMORE AELMORE@CS.UCHICAGO.EDU

  2. Collaboration Between Many Rebecca Taft, Vaibhav Arora, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andy Pavlo, Amr El Abbadi, Divy Agrawal, Michael Stonebraker E-Store @ VLDB 2015, Squall @ SIGMOD 2015, E-Store++ @ ????

  3. Databases are Great Developer ease via ACID. Turing Award-winning greatness.

  4. But they are Rigid and Complex

  5. Growth… Rapid growth of some web services led to the design of new “web-scale” databases… [Chart: average millions of active users per quarter, Q3 2009 through Q3 2012]

  6. Rise of NoSQL Scaling is needed, so chisel away at functionality: ◦ No transactions ◦ No secondary indexes ◦ Minimal recovery ◦ Mixed consistency. Not always suitable…

  7. Workloads Fluctuate [Chart: resource demand varying over time] Slide Credits: Berkeley RAD Lab

  8. Peak Provisioning [Chart: fixed capacity vs. fluctuating demand over time; the gap is unused resources] Slide Credits: Berkeley RAD Lab

  9. Peak Provisioning isn’t Perfect [Chart: resources vs. time] Slide Credits: Berkeley RAD Lab

  10. Growth is not always sustained [Chart: average millions of active users per quarter, Q3 2009 through Q4 2014] http://www.statista.com/statistics/273569/monthly-active-users-of-zynga-games/

  11. Need Elasticity ELASTICITY > SCALABILITY

  12. The Promise of Elasticity [Chart: capacity tracking demand over time, minimizing unused resources] Slide Credits: Berkeley RAD Lab

  13. Primary use-cases for elasticity: Database-as-a-Service with elastic placement of non-correlated tenants (often low utilization per tenant), and high-throughput transactional systems (OLTP)

  14. No Need to Weaken the Database!

  15. High Throughput = Main Memory Cost per GB for RAM is dropping. Network memory is faster than local disk. Much faster than disk-based DBs.

  16. Approaches for “NewSQL” main-memory systems*: highly concurrent, latch-free data structures, or partitioning into single-threaded executors. *Excuse the generalization

  17. [Diagram: a client application sends a request consisting of a procedure name and input parameters] Slide Credits: Andy Pavlo

  18. Database Partitioning [Diagram: the TPC-C schema as a schema tree rooted at WAREHOUSE, with DISTRICT, STOCK, CUSTOMER, ORDERS, and ORDER_ITEM beneath it; the ITEM table is replicated] Slide Credits: Andy Pavlo

  19. Database Partitioning [Diagram: the schema tree split into partitions P1-P5; each partition holds a slice of WAREHOUSE, DISTRICT, STOCK, CUSTOMER, ORDERS, and ORDER_ITEM, while the replicated ITEM table is copied to every partition] Slide Credits: Andy Pavlo
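
To make the tree-partitioning idea concrete: every table in the tree is co-partitioned on its ancestor WAREHOUSE key, so a transaction that touches one warehouse runs entirely on one partition, while the replicated ITEM table is present everywhere. A minimal sketch in Python, with illustrative names and a simple modulo scheme (not H-Store's actual catalog or routing code):

    # Route TPC-C rows by the warehouse id at the root of the schema tree, so a
    # warehouse and its descendant rows (DISTRICT, STOCK, CUSTOMER, ORDERS,
    # ORDER_ITEM) land on the same partition; ITEM is replicated everywhere.
    NUM_PARTITIONS = 5
    REPLICATED_TABLES = {"ITEM"}

    def partition_for_warehouse(w_id: int) -> int:
        return w_id % NUM_PARTITIONS        # hash/modulo on the root key

    def target_partitions(table: str, row: dict) -> list:
        if table in REPLICATED_TABLES:
            return list(range(NUM_PARTITIONS))           # copy to all partitions
        return [partition_for_warehouse(row["w_id"])]    # follow the warehouse

    # A CUSTOMER row follows its warehouse to a single partition:
    print(target_partitions("CUSTOMER", {"w_id": 42, "c_id": 7}))   # [2]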

  20. The Problem: Workload Skew Many OLTP applications suffer from variable load and high skew: Extreme skew: 40-60% of NYSE trading volume is on 40 individual stocks. Time variation: load “follows the sun”. Seasonal variation: ski resorts have high load in the winter months. Load spikes: the first and last 10 minutes of the trading day have 10X the average volume. Hockey stick effect: a new application goes “viral”.

  21. The Problem: Workload Skew No skew: uniform data access. Low skew: 2/3 of queries access the top 1/3 of data. High skew: a few very hot items.

  22. The Problem: Workload Skew High skew increases latency by 10X and decreases throughput by 4X. Partitioned shared-nothing systems are especially susceptible.

  23. The Problem: Workload Skew Possible solutions: o Provision resources for peak load (Very expensive and brittle!) o Limit load on system (Poor performance!) o Enable system to elastically scale in or out to dynamically adapt to changes in load

  24. Elastic Scaling [Diagram: key-value pairs for keys 1-12 spread across three partitions (keys 1-4, 5-8, 9-12); after scaling out to a fourth partition, keys 11 and 12 migrate to it]

  25. Load Balancing [Diagram: the same key-value pairs (keys 1-12) redistributed across the existing three partitions so that load is balanced, without adding a node]

  26. Two-Tiered Partitioning What if only a few specific tuples are very hot? Deal with them separately! Two tiers: 1. Individual hot tuples, mapped explicitly to partitions 2. Large blocks of colder tuples, hash- or range-partitioned at coarse granularity Possible implementations: o Fine-grained range partitioning o Consistent hashing with virtual nodes o A lookup table combined with any standard partitioning scheme Existing systems are “one-tiered” and partition data only at coarse granularity, so they are unable to handle cases of extreme skew
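
As a concrete illustration of the lookup-table variant described above, the sketch below routes a key first through an explicit hot-tuple map (tier 1) and falls back to coarse range partitioning (tier 2). The key ranges and hot keys are made up for the example; this is not E-Store's actual code.

    import bisect

    # Tier 2: coarse ranges as (lower_bound, partition), sorted by lower bound.
    RANGES = [(0, 0), (100_000, 1), (200_000, 2)]
    BOUNDS = [lo for lo, _ in RANGES]

    # Tier 1: individual hot tuples mapped explicitly to partitions.
    HOT_MAP = {0: 2, 1: 1}          # hypothetical hot keys pinned elsewhere

    def partition_for(key: int) -> int:
        if key in HOT_MAP:                           # tier 1: explicit placement
            return HOT_MAP[key]
        idx = bisect.bisect_right(BOUNDS, key) - 1   # tier 2: coarse range lookup
        return RANGES[idx][1]

    print(partition_for(0))         # hot key, pinned to partition 2
    print(partition_for(150_000))   # cold key, falls in [100000, 200000) -> 1

The lookup table stays small because it only holds the few hot keys; everything else is covered by the coarse ranges.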

  27. E-Store An end-to-end system which extends H-Store (a distributed, shared-nothing, main-memory DBMS) with automatic, adaptive, two-tiered elastic partitioning

  28. E-Store [Diagram: the control loop. Normal operation with high-level monitoring; when a load imbalance is detected, tuple-level monitoring (E-Monitor) produces hot tuples and partition-level access counts; tuple placement planning (E-Planner) produces a new partition plan; online reconfiguration (Squall) applies it; once reconfiguration is complete, the system returns to normal operation]
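
Read as code, the diagram is a monitor-plan-migrate loop: cheap coarse monitoring runs continuously, and the expensive steps only run once an imbalance is detected. The sketch below uses stub components and made-up data; the real E-Monitor, E-Planner, and Squall interfaces are not shown on the slide.

    # One pass of the E-Store control loop, with stubbed components.
    def high_level_stats(cluster):
        return cluster["cpu"]                    # per-partition CPU, ~1 min granularity

    def imbalance_detected(stats, gap=0.30):
        return max(stats.values()) - min(stats.values()) > gap

    def tuple_level_stats(cluster):
        return cluster["hot_tuples"], cluster["block_counts"]   # E-Monitor, 10 s window

    def plan_placement(stats, hot_tuples, block_counts):
        coldest = min(stats, key=stats.get)
        return [(key, coldest) for key, _ in hot_tuples]        # E-Planner (toy plan)

    def reconfigure_online(plan):
        print("Squall would migrate:", plan)                    # live migration stand-in

    cluster = {
        "cpu": {0: 0.95, 1: 0.40, 2: 0.20},
        "hot_tuples": [(0, 20_000), (1, 12_000)],
        "block_counts": {(3, 1000): 5_000},
    }
    stats = high_level_stats(cluster)
    if imbalance_detected(stats):
        hot, blocks = tuple_level_stats(cluster)
        reconfigure_online(plan_placement(stats, hot, blocks))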

  29. E-Monitor: High-Level Monitoring High-level system statistics are collected every ~1 minute o CPU indicates system load, used to determine whether to add or remove nodes, or re-shuffle the data o Accurate in H-Store since partition executors are pinned to specific cores o Cheap to collect o When a load imbalance (or overload/underload) is detected, detailed monitoring is triggered
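
A minimal sketch of the kind of trigger described here, assuming per-partition CPU readings in [0, 1]; the thresholds are illustrative placeholders, not the values E-Store actually uses.

    def needs_action(cpu_by_partition, high=0.90, low=0.10, imbalance=0.30):
        """Decide whether to trigger tuple-level monitoring.

        cpu_by_partition: dict of partition id -> CPU utilization in [0, 1].
        Returns 'overload', 'underload', 'imbalance', or None.
        """
        values = list(cpu_by_partition.values())
        avg = sum(values) / len(values)
        if avg > high:
            return "overload"      # consider adding nodes
        if avg < low:
            return "underload"     # consider removing nodes
        if max(values) - min(values) > imbalance:
            return "imbalance"     # consider re-shuffling the data
        return None

    print(needs_action({0: 0.95, 1: 0.35, 2: 0.20}))   # -> 'imbalance'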

  30. E-Monitor: Tuple-Level Monitoring Tuple-level statistics are collected in case of load imbalance o Finds the top 1% of tuples accessed per partition (read or written) during a 10-second window o Finds the total access count per block of cold tuples o Can be used to determine workload distribution, using tuple access count as a proxy for system load (a reasonable assumption for a main-memory DBMS with an OLTP workload) o Minor performance degradation during collection
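
The sketch below shows one way to compute the two outputs described above from a raw stream of tuple accesses observed during the window; the data layout and helper names are assumptions, not E-Monitor's implementation.

    from collections import Counter
    import heapq

    def summarize_window(accesses, block_size=1000, top_fraction=0.01):
        """accesses: iterable of tuple keys read or written during the window.

        Returns (hot_tuples, block_counts): the top ~1% most-accessed keys with
        their counts, and total access counts per block of cold tuples.
        """
        counts = Counter(accesses)
        k = max(1, int(len(counts) * top_fraction))
        hot_tuples = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
        hot_keys = {key for key, _ in hot_tuples}
        block_counts = Counter()
        for key, n in counts.items():
            if key not in hot_keys:
                block_counts[(key // block_size) * block_size] += n
        return hot_tuples, dict(block_counts)

    hot, blocks = summarize_window([0] * 50 + [1] * 30 + list(range(2, 200)))
    print(hot)      # [(0, 50), (1, 30)] -- the hottest ~1% of keys
    print(blocks)   # remaining accesses aggregated per 1000-key block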

  31. E-Monitor: Tuple-Level Monitoring [Figure: sample output]

  32. E-Planner Given the current partitioning of data, system statistics, and hot tuples/partitions from E-Monitor, E-Planner determines: o Whether to add or remove nodes o How to balance load Optimization problem: minimize data movement (migration is not free) while balancing system load. We tested five different data placement algorithms: o One-tiered bin packing (ILP, computationally intensive!) o Two-tiered bin packing (ILP, computationally intensive!) o First Fit (global repartitioning to balance load) o Greedy (only move hot tuples) o Greedy Extended (move hot tuples first, then cold blocks until load is balanced). A sketch of the Greedy Extended idea follows the worked example below.

  33. E-Planner: Greedy Extended Algorithm
      Hot tuples (key: accesses): 0: 20,000; 1: 12,000; 2: 5,000; …
      Cold ranges (range: accesses): 3-1000: 5,000; 1000-2000: 3,000; 2000-3000: 2,000; …
      Current YCSB partition plan: "usertable": { 0: [0-100000) 1: [100000-200000) 2: [200000-300000) }
      New YCSB partition plan (?): "usertable": { 0: [1000-100000) 1: [1-2),[100000-200000) 2: [200000-300000),[0-1),[2-1000) }

  34. E-Planner: Greedy Extended Algorithm. Target cost per partition: 35,000 tuple accesses.
      Partition 0: keys [0-100000), total cost 77,000
      Partition 1: keys [100000-200000), total cost 23,000
      Partition 2: keys [200000-300000), total cost 5,000
      Hot tuples (key: accesses): 0: 20,000; 1: 12,000; 2: 5,000; …
      Cold ranges (range: accesses): 3-1000: 5,000; 1000-2000: 3,000; 2000-3000: 2,000; …

  35. E-Planner: Greedy Extended Algorithm. (Repeats the state shown on slide 34.)

  36. E-Planner: Greedy Extended Algorithm. After moving hot tuple 0 to partition 2 (target cost per partition: 35,000 tuple accesses):
      Partition 0: keys [1-100000), total cost 57,000
      Partition 1: keys [100000-200000), total cost 23,000
      Partition 2: keys [200000-300000),[0-1), total cost 25,000
      Hot tuples (key: accesses): 0: 20,000; 1: 12,000; 2: 5,000; …
      Cold ranges (range: accesses): 3-1000: 5,000; 1000-2000: 3,000; 2000-3000: 2,000; …

  37. E-Planner: Greedy Extended Algorithm. (Repeats the state shown on slide 36.)

  38. E-Planner: Greedy Extended Algorithm. (Repeats the state shown on slide 36.)
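
The worked example above can be reproduced with a compact version of the hot-tuple phase of Greedy Extended: compute the target load per partition, then move the hottest tuples off overloaded partitions onto the coldest ones as long as the destination stays at or below the target. This is a simplified sketch of the idea, not the paper's exact algorithm; node addition/removal, capacity limits, range splitting, and the cold-block phase are omitted.

    def greedy_extended_hot_pass(loads, hot_tuples):
        """Simplified hot-tuple phase of Greedy Extended.

        loads:      dict of partition -> total tuple accesses (from E-Monitor)
        hot_tuples: list of (key, accesses, source_partition), hottest first
        Returns (moves, loads): the (key, src, dst) moves and the updated loads.
        """
        target = sum(loads.values()) / len(loads)   # balanced load per partition
        moves = []
        for key, cost, src in hot_tuples:
            if loads[src] <= target:
                continue                            # source partition not overloaded
            dst = min(loads, key=loads.get)         # currently coldest partition
            if dst == src or loads[dst] + cost > target:
                continue                            # move would push dst past target
            loads[src] -= cost
            loads[dst] += cost
            moves.append((key, src, dst))
        # The full algorithm then moves cold blocks until load is balanced.
        return moves, loads

    # Numbers from the slides above: target is 35,000 accesses per partition.
    loads = {0: 77_000, 1: 23_000, 2: 5_000}
    hot = [(0, 20_000, 0), (1, 12_000, 0), (2, 5_000, 0)]
    moves, new_loads = greedy_extended_hot_pass(loads, hot)
    print(moves)       # first move is (0, 0, 2): hot tuple 0 goes to partition 2,
    print(new_loads)   # leaving partition 0 at 57,000 and partition 2 at 25,000,
                       # as on slides 36-38; the sketch then moves tuples 1 and 2 too.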
