
Streaming In Practice KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron - PowerPoint PPT Presentation



  1. Streaming In Practice KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron

  2. TALK OUTLINE
     BEGIN
     I. HERON OVERVIEW
     II. HERON PERFORMANCE
     III. HERON BACKPRESSURE
     IV. HERON LOAD SHEDDING
     END: CONCLUSION

  3. HERON OVERVIEW

  4. STORM/HERON TERMINOLOGY
     Topology: a directed acyclic graph; vertices = computation, edges = streams of data tuples.
     Spouts: sources of data tuples for the topology (examples: Kafka, Kestrel, MySQL, Postgres).
     Bolts: process incoming tuples and emit outgoing tuples (examples: filtering, aggregation, join, arbitrary function).
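To make the spout/bolt vocabulary concrete, here is a minimal, single-process sketch of the classic word count topology in plain Python. It does not use the Storm or Heron APIs (real topologies are usually written against the Java API); the names `sentence_spout`, `split_bolt`, `CountBolt`, `fields_grouping` and `run_word_count` are hypothetical, and routing words by a CRC32 hash modulo the bolt parallelism is an assumed stand-in for key-based ("fields") grouping.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the topology's data source, emitting one tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(tup: Tuple[str]) -> List[Tuple[str]]:
    """Bolt: consumes a sentence tuple and emits one tuple per word."""
    (sentence,) = tup
    return [(word.lower(),) for word in sentence.split()]


class CountBolt:
    """One bolt instance, keeping running counts for the words routed to it."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, tup: Tuple[str]) -> Tuple[str, int]:
        (word,) = tup
        self.counts[word] += 1
        return (word, self.counts[word])  # outgoing tuple: (word, count so far)


def fields_grouping(word: str, parallelism: int) -> int:
    """Route by key so every occurrence of a word reaches the same instance."""
    return zlib.crc32(word.encode("utf-8")) % parallelism


def run_word_count(sentences: Iterable[str], count_parallelism: int = 2) -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt instances and merge their counts."""
    counters = [CountBolt() for _ in range(count_parallelism)]
    for sentence_tuple in sentence_spout(sentences):
        for word_tuple in split_bolt(sentence_tuple):
            target = fields_grouping(word_tuple[0], count_parallelism)
            counters[target].execute(word_tuple)
    merged: Dict[str, int] = {}
    for counter in counters:
        merged.update(counter.counts)  # keys are disjoint across instances
    return merged
```

For example, `run_word_count(["the cow jumped", "over the moon"], count_parallelism=3)` returns `{"the": 2, "cow": 1, "jumped": 1, "over": 1, "moon": 1}` (key order may differ). Grouping by key is what lets each count bolt instance own a disjoint set of words.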

  5. STORM/HERON TOPOLOGY [Figure: an example topology, a directed acyclic graph with two spouts (Spout 1, Spout 2) and five bolts (Bolt 1 to Bolt 5)]
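Because a topology must be a directed acyclic graph, one simple check is a topological sort: if every component can be ordered after all of its upstream components, there is no cycle. The sketch below uses Kahn's algorithm; the wiring in `TOPOLOGY` is hypothetical (the figure's exact edges are not recoverable) and only mirrors the two-spout, five-bolt shape.

```python
from collections import deque
from typing import Dict, List


def topological_order(edges: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: order components so each follows all its upstreams.
    Raises ValueError if the graph has a cycle (i.e. it is not a DAG)."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))  # sources (spouts)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in edges.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    if len(order) != len(nodes):
        raise ValueError("topology contains a cycle; it is not a DAG")
    return order


# Hypothetical wiring with two spouts and five bolts
TOPOLOGY = {
    "spout1": ["bolt1", "bolt2"],
    "spout2": ["bolt3"],
    "bolt1": ["bolt4"],
    "bolt2": ["bolt4"],
    "bolt3": ["bolt5"],
    "bolt4": ["bolt5"],
}
```

With this wiring, `topological_order(TOPOLOGY)` returns `["spout1", "spout2", "bolt1", "bolt2", "bolt3", "bolt4", "bolt5"]`, while a graph such as `{"a": ["b"], "b": ["a"]}` raises `ValueError`.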

  6. WHY HERON? Performance predictability; improved developer productivity; ease of manageability.

  7. HERON DESIGN DECISIONS
     Fully API-compatible with Storm: directed acyclic graph topologies, spouts and bolts.
     Task isolation: ease of debuggability, resource isolation and profiling.
     Use of mainstream languages: C++, Java, Python.

  8. HERON ARCHITECTURE [Figure: topology submission; Topology 1 through Topology N are submitted to the Scheduler]

  9. TOPOLOGY ARCHITECTURE [Figure: a Topology Master holding the logical plan, physical plan and execution state; a ZK (ZooKeeper) cluster; physical plan sync to each container; each container runs a Stream Manager, a Metrics Manager and Heron instances I1 to I4]
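The figure separates the logical plan (which components exist and how many instances each has) from the physical plan (which container each instance runs in). One simple way to derive a physical plan is round-robin placement, sketched below. This is a generic illustration, not Heron's packing code: it ignores the per-container Stream Manager and Metrics Manager and any CPU or memory requirements, and the names `round_robin_pack`, `word_spout` and `count_bolt` are hypothetical.

```python
from typing import Dict, List, Tuple

Instance = Tuple[str, int]  # (component name, task index within that component)


def round_robin_pack(parallelism: Dict[str, int], num_containers: int) -> Dict[int, List[Instance]]:
    """Build a simple physical plan by dealing instances out to containers
    one at a time, so container sizes differ by at most one instance."""
    if num_containers <= 0:
        raise ValueError("need at least one container")
    plan: Dict[int, List[Instance]] = {c: [] for c in range(num_containers)}
    slot = 0
    for component in sorted(parallelism):  # sorted for a deterministic plan
        for task in range(parallelism[component]):
            plan[slot].append((component, task))
            slot = (slot + 1) % num_containers
    return plan
```

For instance, `round_robin_pack({"word_spout": 2, "count_bolt": 6}, 2)` yields two containers with four instances each, matching the I1 to I4 layout in the figure: container 0 gets `count_bolt` tasks 0, 2, 4 and `word_spout` task 0.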

  10. HERON SAMPLE TOPOLOGIES

  11. HERON @TWITTER
      Heron has been in production for 2 years.
      Large cluster; several hundred topologies deployed; several billion messages every day; large amount of data produced every day.
      Topologies ranging from 1 stage to 10 stages.
      3x reduction in cores and memory.

  12. HERON USE CASES: real-time spam detection, trends, BI, ETL, media, ops and ML.

  13. HERON ENVIRONMENT: Laptop/Server; Cluster/Aurora; Cluster/Mesos.

  14. HERON RESOURCE USAGE

  15. HERON PERFORMANCE: SETTINGS
      Components         | Expt #1 | Expt #2 | Expt #3 | Expt #4
      Spout              | 25      | 100     | 200     | 300
      Bolt               | 25      | 100     | 200     | 300
      # Heron containers | 25      | 100     | 200     | 300
      # Storm workers    | 25      | 100     | 200     | 300

  16. HERON PERFORMANCE: word count topology, acknowledgements enabled
      [Figure: throughput (million tuples/min) and latency (ms) vs. spout parallelism (25, 100, 200, 500) for Storm and Heron]
      Heron: 10-14x higher throughput, 5-15x lower latency than Storm.
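These numbers were measured with acknowledgements enabled, meaning each spout tuple is tracked until every tuple derived from it has been processed. Storm's documentation describes doing this with a single 64-bit XOR value per spout tuple: every tuple id is XORed in once when emitted and once when acked, so the value returns to zero exactly when the whole tree is done. The sketch below follows that technique; whether Heron tracks acks in precisely this way is an assumption here, and `AckTracker` and `new_tuple_id` are hypothetical names.

```python
import random
from typing import Dict


def new_tuple_id(rng: random.Random) -> int:
    """Random 64-bit id, so accidental XOR cancellation is extremely unlikely."""
    return rng.getrandbits(64)


class AckTracker:
    """One XOR 'ack value' per root (spout) tuple, in the style of Storm's acker."""

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}

    def _xor(self, root_id: int, tuple_id: int) -> bool:
        value = self.pending.get(root_id, 0) ^ tuple_id
        if value == 0:
            self.pending.pop(root_id, None)
            return True  # every tuple in the tree has been acked
        self.pending[root_id] = value
        return False

    def emit(self, root_id: int, tuple_id: int) -> None:
        """Record a new tuple anchored to the tree rooted at root_id."""
        self._xor(root_id, tuple_id)

    def ack(self, root_id: int, tuple_id: int) -> bool:
        """Record that tuple_id was processed; True means the root is complete."""
        return self._xor(root_id, tuple_id)
```

Usage: emit the root and its children, then ack them in any order; only the final ack returns `True`, at which point the spout could be told the tuple was fully processed. Keeping one fixed-size value per root, rather than the set of outstanding ids, is what keeps tracking cheap.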

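  17. HERON RESOURCE USAGE [Figure: a topology with an Event Spout feeding Filter, Flat-Map and Aggregate bolts, with output to Redis; the aggregate stage uses a 1 sec cache. Rates shown on the diagram: 60-100M/min, 8-12M/min, 25-42M/min, 40-60M/min]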
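The 1 sec cache on the aggregate stage suggests a common pattern: accumulate per-key results in memory and write them out at most once per interval, so many updates to a hot key collapse into one write. Below is a minimal sketch assuming that reading; `CachingAggregateBolt` is hypothetical, the `flush` callback stands in for the Redis writer (no Redis client is used), and flushing is triggered by incoming tuples rather than by a timer or Storm-style tick tuple.

```python
import time
from collections import Counter
from typing import Callable, Dict


class CachingAggregateBolt:
    """Aggregates values per key in memory and hands the batch to `flush`
    at most once every `interval_s` seconds."""

    def __init__(self, flush: Callable[[Dict[str, int]], None],
                 interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush = flush
        self.interval_s = interval_s
        self.clock = clock  # injectable so the behaviour can be tested
        self.cache: Counter = Counter()
        self.last_flush = clock()

    def execute(self, key: str, value: int = 1) -> None:
        self.cache[key] += value
        # Checked only when a tuple arrives; a real bolt would likely use a timer
        if self.clock() - self.last_flush >= self.interval_s:
            self.flush(dict(self.cache))  # one write per key per interval
            self.cache.clear()
            self.last_flush = self.clock()
```

With a fake clock, three updates at t = 0 produce no output, and one more update at t = 1.0 flushes a single batch such as `{"a": 3, "b": 1}`.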

  18. RESOURCE CONSUMPTION
      Component | Cores Requested | Cores Used | Memory Requested (GB) | Memory Used (GB)
      Redis     | 24              | 2-4        | 48                    | N/A
      Heron     | 120             | 30-50      | 200                   | 180
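Assuming the columns are cores requested, cores used, memory requested and memory used (a reading of the flattened header that fits the values), a quick calculation shows how much of the requested capacity each system actually consumed. `utilization` is a hypothetical helper.

```python
from typing import Tuple


def utilization(used_low: float, used_high: float, requested: float) -> Tuple[float, float]:
    """Fraction of a requested resource actually used, as a (low, high) range."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return used_low / requested, used_high / requested


# Values from the table above (cores are counts, memory is GB)
heron_cores = utilization(30, 50, 120)
heron_memory = utilization(180, 180, 200)
redis_cores = utilization(2, 4, 24)

for name, (low, high) in [("Heron cores", heron_cores),
                          ("Heron memory", heron_memory),
                          ("Redis cores", redis_cores)]:
    print(f"{name}: {low:.0%} to {high:.0%} of requested")
```

This prints `Heron cores: 25% to 42% of requested`, `Heron memory: 90% to 90% of requested` and `Redis cores: 8% to 17% of requested`: the Heron job used well under half of its requested cores but most of its requested memory.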

  19. RESOURCE CONSUMPTION [Pie chart: Spout Instances, Bolt Instances, Heron Overhead; segment values 7%, 9%, 84%]

  20. PROFILING SPOUTS [Pie chart: Deserialize, Parse/Filter, Mapping, Kafka Iterator, Kafka Fetch, Rest; segment values 2%, 7%, 16%, 6%, 63%, 6%]

  21. PROFILING BOLTS [Pie chart: Write Data, Serialize, Deserialize, Aggregation, Data Transport, Rest; segment values 2%, 4%, 5%, 2%, 19%, 68%]

  22. RESOURCE CONSUMPTION - BREAKDOWN [Pie chart: Fetching Data, User Logic, Heron Usage, Writing Data; segment values 8%, 11%, 21%, 61%]

  23. HERON BACK PRESSURE

  24. BACK PRESSURE AND STRAGGLERS
      Stragglers are the norm in multi-tenant distributed systems, caused by bad machines, inadequate provisioning and hot keys.
      Back pressure: provides predictability; processes data at maximum rate; reduces recovery times; handles temporary spikes.
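A common way to implement spout back pressure is a bounded buffer with two thresholds: when the buffer in front of a slow component grows past a high watermark, spouts are asked to stop emitting, and they resume only after it drains below a low watermark. The Heron paper describes high and low water marks for this; the sketch below is a simplified single-process model, not Heron's stream manager code, and `BackPressureQueue` and the threshold values are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class BackPressureQueue:
    """Bounded buffer in front of a slow consumer. Reaching the high watermark
    pauses spouts; draining to the low watermark resumes them. Two thresholds
    (hysteresis) avoid flapping on and off around a single value."""

    def __init__(self, high: int, low: int) -> None:
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.high, self.low = high, low
        self.items: Deque[object] = deque()
        self.spouts_paused = False

    def offer(self, item: object) -> None:
        # In a real system spouts stop offering while paused; we only flag it here
        self.items.append(item)
        if not self.spouts_paused and len(self.items) >= self.high:
            self.spouts_paused = True  # start back pressure

    def poll(self) -> Optional[object]:
        item = self.items.popleft() if self.items else None
        if self.spouts_paused and len(self.items) <= self.low:
            self.spouts_paused = False  # relieve back pressure
        return item
```

With `high=4, low=1`, four offers pause the spouts, and they stay paused until three items have been consumed. Pausing the source rather than dropping tuples is what lets the topology ride out temporary spikes while still processing data at the rate the slowest stage allows.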

  25. BACK PRESSURE AND STRAGGLERS
      In most scenarios, back pressure recovers without any manual intervention.
      Sustained back pressure: irrecoverable GC cycles; bad or faulty host.
      Sometimes users prefer dropping data: they care only about the latest data.

  26. LOAD SHEDDING
      Sampling-based approaches: down-sample the incoming stream and scale up the results. Easy to reason about if the sampling is uniform, but uniformity is hard to achieve across distributed spouts.
      Drop-based approaches: simply drop older data. Spouts take a lag threshold and a lag adjustment value. Works well in practice.
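The two families of load shedding can be sketched side by side. `sampled_count` keeps each tuple with probability `rate` and scales the result by `1/rate`, which is unbiased only if sampling is uniform. `next_offset` models a spout reading an offset-ordered log such as a Kafka partition; the exact meaning of "lag threshold" and "lag adjustment value" is an assumption here: once lag exceeds the threshold, the spout jumps ahead so that only `lag_adjustment` messages of lag remain. Both function names are hypothetical.

```python
import random
from typing import Iterable


def sampled_count(stream: Iterable[str], key: str, rate: float, rng: random.Random) -> float:
    """Sampling-based shedding: keep each incoming tuple with probability
    `rate`, count matches among kept tuples, then scale up by 1/rate."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    kept_matches = 0
    for item in stream:
        if rng.random() < rate and item == key:  # sample first, then process
            kept_matches += 1
    return kept_matches / rate


def next_offset(current: int, latest: int, lag_threshold: int, lag_adjustment: int) -> int:
    """Drop-based shedding: if the spout is more than `lag_threshold` messages
    behind `latest`, skip ahead so only `lag_adjustment` messages of lag
    remain. The skipped (oldest) messages are dropped."""
    if not 0 <= lag_adjustment <= lag_threshold:
        raise ValueError("expected 0 <= lag_adjustment <= lag_threshold")
    if latest - current > lag_threshold:
        return latest - lag_adjustment
    return current
```

For example, `next_offset(100, 10_100, 5_000, 1_000)` returns `9_100` (10,000 messages behind, so 9,000 old messages are skipped), while `next_offset(100, 1_100, 5_000, 1_000)` leaves the spout at `100`. The drop-based variant needs no coordination between spouts, which is one reason it is easier to apply than uniform sampling across distributed spouts.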

  27. CURIOUS TO LEARN MORE…
      - Streaming@Twitter. Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry, Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu. Twitter, Inc.
      - Twitter Heron: Stream Processing at Scale. Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*, Karthik Ramasamy, Siddarth Taneja. Twitter, Inc.; *University of Wisconsin–Madison.
      - Storm @Twitter. Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy. Twitter, Inc.; *University of Wisconsin–Madison.

  28. #ThankYou FOR LISTENING

  29. QUESTIONS AND ANSWERS: Go ahead. Ask away.

  30. HERON LOAD SHEDDING
