IN SUPPORT OF WORKLOAD-AWARE STREAMING STATE MANAGEMENT

Vasiliki Kalavri (vkalavri@bu.edu), John Liagouris (liagos@bu.edu). HotStorage 2020, 14 July 2020.


  1. IN SUPPORT OF WORKLOAD-AWARE STREAMING STATE MANAGEMENT
     - Vasiliki Kalavri (vkalavri@bu.edu), John Liagouris (liagos@bu.edu)
     - HotStorage 2020, 14 July 2020

  2. STREAMING DATAFLOWS
     - Nexmark Q4: “Rolling average of winning bids”
     - [Figure: logical dataflow (auctions source, bids source, join, rolling average, sink) and physical dataflow on Worker 1 and Worker 2]
     - Nexmark Streaming Benchmark Suite: https://beam.apache.org/documentation/sdks/java/testing/nexmark/

  3. LARGER-THAN-MEMORY STATE MANAGEMENT
     - Large operator state is backed by key-value stores
     - [Figure: Worker 1 and Worker 2, each issuing put/get on <k,v> pairs]

  4. LARGER-THAN-MEMORY STATE MANAGEMENT
     - Large operator state is backed by key-value stores
     - LSM-based write-optimized store with efficient range scans
     - [Figure: Worker 1 and Worker 2, each issuing put/get on <k,v> pairs]

  5. STATE REQUIREMENTS VARY ACROSS OPERATORS
     - Nexmark Q4: “Rolling average of winning bids”
     - Join: Write-heavy and can potentially accumulate large state
     - Average: Read-Modify-Write a single value
     - Dataflow operators may have different state access patterns and memory requirements
     - [Figure: logical dataflow (auctions source, bids source, join, rolling average, sink)]
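
The contrast between the two access patterns can be sketched in a few lines of Python. This is only an illustration of "write-heavy, growing state" versus "read-modify-write of a single value"; the class names `JoinState` and `RollingAverage` are hypothetical, the input records are made up, and the code does not implement the actual Nexmark Q4 logic.

```python
from collections import defaultdict


class JoinState:
    """Write-heavy state: every incoming record is appended under its key."""

    def __init__(self):
        self.by_key = defaultdict(list)
        self.writes = 0

    def insert(self, key, record):
        self.by_key[key].append(record)
        self.writes += 1

    def size(self):
        # Total number of buffered records; grows with the input.
        return sum(len(records) for records in self.by_key.values())


class RollingAverage:
    """Read-modify-write state: a single (sum, count) pair updated per record."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count


join = JoinState()
avg = RollingAverage()
# Made-up (auction_id, price) records, for illustration only.
for auction_id, price in [(1, 10), (1, 30), (2, 20)]:
    join.insert(auction_id, price)
    latest = avg.update(price)
```

The join state holds one entry per record seen, while the average keeps a constant-size value no matter how many records arrive.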

  6. CURRENT PRACTICE: MONOLITHIC STATE MANAGEMENT
     - One key-value store (RocksDB) per stateful operator instance
     - All key-value stores in the dataflow are globally-configured
     - [Figure: Worker 1 and Worker 2, each with two <k,v> stores]

  7. FLAWS OF MONOLITHIC STATE MANAGEMENT
     - Oblivious store configuration
     - Unnecessary data marshaling
     - Unnecessary key-value store features
     - [Figure: Worker 1 and Worker 2, each with two <k,v> stores]

  8. UNNECESSARY KEY-VALUE STORE FEATURES
     - State partitioning
     - State scoping
     - Concurrent access to state (stream processors guarantee single-thread access to state)
     - State checkpointing
     - All these operations are handled by modern stream processors outside the state store

  9. WORKLOAD-AWARE STREAMING STATE MANAGEMENT
     - Multiple state stores of different types and configurations according to the requirements of the stateful operators
     - Streaming operators are instantiated once and are long-running: their access patterns and state sizes are largely known in advance
     - [Figure: Worker 1 and Worker 2 with stores labeled store:<u64,auction>, store:<u64,bid>, and store:u64, accessed via put/get and rmw_u64]
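
One way to picture per-operator store selection is a small lookup table, sketched below in Python. Everything here is an assumption made for illustration: `StoreSpec`, the operator names, and the `"put_get"` / `"rmw"` labels are hypothetical, and the mapping of the figure's store labels to the join inputs and the average is a reading of the slide, not the testbed's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoreSpec:
    access: str               # hypothetical label: "put_get" or "rmw"
    key_type: Optional[str]   # None when the store holds a single value
    value_type: str


# Monolithic: every operator gets the same generic, byte-oriented store.
MONOLITHIC = StoreSpec("put_get", "bytes", "bytes")

# Workload-aware: one spec per stateful operator (names are assumptions).
WORKLOAD_AWARE = {
    "join.auctions": StoreSpec("put_get", "u64", "auction"),
    "join.bids": StoreSpec("put_get", "u64", "bid"),
    "average": StoreSpec("rmw", None, "u64"),
}


def spec_for(operator, workload_aware=True):
    """Pick the store spec for an operator under either policy."""
    if workload_aware:
        return WORKLOAD_AWARE[operator]
    return MONOLITHIC
```

Because operators are long-running and instantiated once, a table like this could be fixed when the dataflow is deployed.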

  10. A FLEXIBLE TESTBED FOR STREAMING STATE MANAGEMENT
      - Implemented in Rust
      - Based on Timely Dataflow stream processor
      - Supports two key-value stores
        - RocksDB: LSM-based with efficient range scans
        - FASTER: Hybrid log with efficient lookups and in-place updates
      - Supports different window evaluation strategies
      - Testbed: https://github.com/jliagouris/wassm
      - Timely Dataflow: https://github.com/TimelyDataflow/timely-dataflow
      - FASTER: https://github.com/microsoft/FASTER

  11. EXPERIMENTAL RESULTS

  12. EVALUATION GOALS
      1. Study the effect of the backend’s data layout on the evaluation of streaming windows
      2. Study the effect of workload-aware configuration on queries with multiple stateful operators

  13. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - Query 1 (COUNT-30s-1s): Count the number of records in a 30s window that slides every 1s
      - Input rate: 10K records/s
      - Single thread execution
      - Report end-to-end latency (ms) per record
      - [Figure: COUNT-30s-1s latency plot; axis labels not recoverable]
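
To see why a 30s window sliding every 1s stresses the state store, it helps to count how many windows each record belongs to. The Python sketch below assumes integer-second timestamps and windows of the form [start, start + size); the function name `window_starts` is hypothetical and the testbed may assign windows differently.

```python
def window_starts(timestamp_s, size_s=30, slide_s=1):
    """Start times of all windows [start, start + size_s) containing timestamp_s."""
    # Latest window that has already opened at this timestamp.
    last_start = timestamp_s - (timestamp_s % slide_s)
    # Earliest window that has not yet closed at this timestamp.
    first_start = last_start - size_s + slide_s
    return [s for s in range(first_start, last_start + 1, slide_s) if s >= 0]


windows_per_record = len(window_starts(100))   # 30s window, 1s slide
updates_per_second = 10_000 * windows_per_record
```

Under these assumptions each record falls into 30 windows, so an evaluation strategy that touched every window's state separately would perform 300,000 state updates per second at the stated input rate. With `size_s=30, slide_s=30` (a tumbling window) each record belongs to exactly one window.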

  14. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - Complementary CDF: Each point (x,y) indicates that y% of the latency measurements are at least x ms
      - Lower is better
      - [Figure: COUNT-30s-1s complementary CDF of latency, with markers at p90, p99, p99.9, …; axis labels not recoverable]
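
The complementary CDF definition above can be computed directly from a list of latency samples. The Python sketch below is a minimal illustration with made-up sample values; the function name `ccdf` is hypothetical and this is not the plotting code used for the slides.

```python
def ccdf(latencies_ms):
    """Return (x, y) points where y% of the measurements are at least x ms."""
    xs = sorted(latencies_ms)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i > 0 and x == xs[i - 1]:
            continue  # the first occurrence of a value already counts its ties
        # All samples from index i onward are >= x.
        points.append((x, 100.0 * (n - i) / n))
    return points


points = ccdf([1, 2, 2, 10])
```

Reading such a curve, a percentile marker like p99 corresponds to the point where y is about 1%: only 1% of the measurements are at least that slow.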

  15. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - RocksDB PUT/GET: On record, retrieve window contents, apply new record, and put the updated contents back to the store
      - Lower is better
      - [Figure: COUNT-30s-1s latency plot; axis labels not recoverable]

  16. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - RocksDB MERGE: On record, put record to the store using MERGE. The record is applied to the window contents lazily on trigger
      - Lower is better
      - [Figure: COUNT-30s-1s latency plot; axis labels not recoverable]
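
The difference between the eager PUT/GET strategy and the lazy MERGE strategy can be modeled with plain Python dictionaries standing in for the key-value store. This is a sketch for a count aggregate only: the class names are hypothetical, the `reads` counter is added just to make the access pattern visible, and nothing here uses the RocksDB API.

```python
class PutGetWindow:
    """Eager: on each record, read the window state, apply the record, write back."""

    def __init__(self):
        self.store = {}
        self.reads = 0

    def on_record(self, window, record):
        count = self.store.get(window, 0)   # get
        self.reads += 1
        self.store[window] = count + 1      # put

    def on_trigger(self, window):
        return self.store.pop(window, 0)


class MergeWindow:
    """Lazy: on each record, append an operand; fold operands only on trigger."""

    def __init__(self):
        self.operands = {}
        self.reads = 0

    def on_record(self, window, record):
        self.operands.setdefault(window, []).append(record)  # write only

    def on_trigger(self, window):
        self.reads += 1
        return len(self.operands.pop(window, []))


eager, lazy = PutGetWindow(), MergeWindow()
for record in ["a", "b", "c"]:
    eager.on_record(0, record)
    lazy.on_record(0, record)
eager_count = eager.on_trigger(0)
lazy_count = lazy.on_trigger(0)
```

Both strategies produce the same count, but the eager one reads the state once per record while the lazy one defers all reading to the trigger.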

  17. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - FASTER performs better due to in-place updates
      - Lower is better
      - [Figure: COUNT-30s-1s latency plot, annotated “100X in p99”; axis labels not recoverable]

  18. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - Query 2 (RANK-30s-30s): Rank records in a 30s tumbling window
      - Input rate: 1K records/s
      - Single thread execution
      - Report end-to-end latency (ms) per record
      - Lower is better
      - [Figure: RANK-30s-30s latency plot; axis labels not recoverable]

  19. EFFECT OF DATA LAYOUT ON WINDOW EVALUATION
      - RocksDB MERGE performs best due to lazy evaluation
      - Lower is better
      - [Figure: RANK-30s-30s latency plot, annotated “100X” and “1000X”; axis labels not recoverable]

  20. THERE IS NO CLEAR WINNER
      - [Figure: COUNT-30s-1s latency plot, annotated “100X in p99”]
      - [Figure: RANK-30s-30s latency plot, annotated “100X” and “1000X”]

  21. MONOLITHIC VS WORKLOAD-AWARE STATE MANAGEMENT
      - Experiments with six Nexmark queries
      - Different stateful operators (joins, window aggregations, custom aggregations)
      - Simple workload-aware configuration of data types and available memory size
      - Nexmark Streaming Benchmark Suite: https://beam.apache.org/documentation/sdks/java/testing/nexmark/

  22. MONOLITHIC VS WORKLOAD-AWARE STATE MANAGEMENT
      - State store used: FASTER
      - Input rate: 10K records/s
      - Single thread execution
      - Monolithic memory configuration: 8GB
      - Workload-aware memory configuration: 6GB (bids), 1.5GB (auctions), 512MB (average)
      - Report end-to-end latency (ms) per record
      - [Figure: Q4 latency plot (custom join and rolling aggregate); axis labels not recoverable]
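
A quick arithmetic check shows that the workload-aware budgets add up to the monolithic total, so the comparison holds the overall memory constant. The Python sketch below assumes binary units (1GB = 1024MB); the variable names are hypothetical, and how the monolithic 8GB is divided among stores is not stated in the slides.

```python
MB = 1
GB = 1024 * MB  # assumption: binary units

monolithic_budget_mb = 8 * GB

# Per-store budgets from the workload-aware configuration.
workload_aware_mb = {
    "bids": 6 * GB,
    "auctions": 3 * GB // 2,   # 1.5GB
    "average": 512 * MB,
}

total_mb = sum(workload_aware_mb.values())
share_of_bids = workload_aware_mb["bids"] / total_mb
```

Under this assumption the three stores total 8192MB, with three quarters of the budget given to the bids state.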

