Realtime Data Processing at Facebook

Realtime Data Processing at Facebook, Abhay Venkatesh (PowerPoint presentation)



  1. Realtime Data Processing at Facebook, by Abhay Venkatesh

  2. Why Streaming at Facebook?
     • Actionable reports
       • e.g. Chorus: what is trending right now?
     • Realtime monitoring
       • e.g. dashboard queries
     • Hybrid realtime-batch pipelines
       • e.g. pre-emptive queries over data warehouse

  3. Workload Assumptions
     • Latency of seconds, not milliseconds, which means
     • a persistent message bus called Scribe can be used for data transfer
     • which makes it easier to enable
       • Fault tolerance
       • Scalability
       • Multiple options for correctness

  4. System Architecture

  5. The Streaming Triad
     • Puma
     • Swift
     • Stylus

  6. Puma
     • For apps written in a SQL-like language
     • Quick to write (< 1 hour)
     • But run over long periods (months to years)
     • Two purposes
       • Pre-computed query results for simple aggregation queries
       • Filtering and processing of Scribe streams

  7. A Puma App

  8. Swift
     • Very basic API
     • Can read() from a Scribe stream
     • Checkpoints every
       • N strings, or
       • B bytes
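The checkpointing rule on the Swift slide can be sketched in a few lines. This is only an illustration, not Swift's actual API: the class `CheckpointingReader`, its fields, and the in-memory `checkpoints` list are hypothetical names, and a plain Python iterable stands in for a Scribe stream. The sketch assumes a checkpoint is taken as soon as either threshold (N strings or B bytes) is reached.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class CheckpointingReader:
    """Hypothetical reader that checkpoints every N strings or B bytes."""

    max_strings: int  # N: checkpoint after this many strings
    max_bytes: int    # B: checkpoint after this many bytes
    checkpoints: List[int] = field(default_factory=list)  # offsets recorded so far

    def read(self, stream: Iterable[str]) -> Iterator[str]:
        strings_since = 0  # strings read since the last checkpoint
        bytes_since = 0    # bytes read since the last checkpoint
        for offset, message in enumerate(stream, start=1):
            yield message
            strings_since += 1
            bytes_since += len(message.encode("utf-8"))
            if strings_since >= self.max_strings or bytes_since >= self.max_bytes:
                # Assumption: a real system would persist this offset durably.
                self.checkpoints.append(offset)
                strings_since = 0
                bytes_since = 0


reader = CheckpointingReader(max_strings=3, max_bytes=10)
consumed = list(reader.read(["a", "b", "c", "0123456789ab", "d"]))
print(reader.checkpoints)  # offsets at which a checkpoint was taken
```

With these inputs the third message triggers a checkpoint by string count and the fourth triggers one by byte count. After a crash, a client of this kind would resume from the last recorded offset, so messages read after that offset are seen again.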

  9. Stylus
     • Low-level stream processing in C++
     • Diagram: Scribe stream -> Stylus processor(s) -> Scribe stream or data store

  10. Sample Application

  11. Design Decisions
     • Language Paradigm
     • Data Transfer
     • Processing Semantics
     • State-saving mechanism
     • Reprocessing

  12. Design Decisions
     • Language Paradigm
     • Data Transfer
     • Processing Semantics
     • State-saving mechanism
     • Reprocessing

  13. Processing Semantics
     • At least once, at most once, or exactly once
     • State semantics (inputs)
     • Output semantics
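One common way to see the difference between at-least-once and at-most-once is the order in which a processor saves its checkpoint and emits its output. The sketch below is an illustration under that assumption, not code from the presentation: `run_with_crash` and its parameters are hypothetical, and the crash is simulated between the two steps for a single event.

```python
from typing import List, Optional


def run_with_crash(events: List[str], checkpoint_first: bool, crash_at: int) -> List[str]:
    """Process events, crash once at index `crash_at`, then restart from the saved offset."""
    outputs: List[str] = []
    saved_offset = 0  # index of the next event to process after a restart

    def attempt(start: int, crash_index: Optional[int]) -> bool:
        nonlocal saved_offset
        for i in range(start, len(events)):
            if checkpoint_first:
                saved_offset = i + 1        # checkpoint before emitting
                if i == crash_index:
                    return False            # crash: the output for event i is lost
                outputs.append(events[i])
            else:
                outputs.append(events[i])   # emit before checkpointing
                if i == crash_index:
                    return False            # crash: event i will be replayed
                saved_offset = i + 1
        return True

    if not attempt(0, crash_at):
        attempt(saved_offset, None)         # restart from the last checkpoint
    return outputs


at_most_once = run_with_crash(["a", "b", "c"], checkpoint_first=True, crash_at=1)
at_least_once = run_with_crash(["a", "b", "c"], checkpoint_first=False, crash_at=1)
print(at_most_once)   # "b" is missing
print(at_least_once)  # "b" appears twice
```

Checkpointing first loses the in-flight event on a crash (at most once), while emitting first replays it (at least once). Exactly-once behaviour needs the checkpoint and the output to be committed atomically, which this sketch does not attempt.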

  14. State-Saving Mechanisms

  15. Reprocessing Data
     • Data warehousing with Hive
     • Stream processing in batch environment
     • Puma -> Hive
     • Stylus -> stateless, stateful, and monoid
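The "monoid" option for Stylus refers to state that has an identity element and an associative merge operation, so partial results can be combined in any grouping. The sketch below illustrates why that property helps when the same logic runs over batch data. It is an assumption-laden example, not Stylus code: the functions `empty`, `combine`, and `aggregate` and the event names are hypothetical, and a `Counter` stands in for the processor state.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def empty() -> Counter:
    """Identity element: merging with it changes nothing."""
    return Counter()


def combine(left: Counter, right: Counter) -> Counter:
    """Associative merge of two partial aggregates."""
    return left + right


def aggregate(events: Iterable[str]) -> Counter:
    """Partial aggregate for one chunk of events."""
    return Counter(events)


events = ["like", "share", "like", "comment", "like", "share"]

# Streaming view: one pass over all events in arrival order.
streaming = aggregate(events)

# Batch view: independent chunks aggregated separately, then merged.
partials = [aggregate(events[:2]), aggregate(events[2:5]), aggregate(events[5:])]
batch = reduce(combine, partials, empty())

print(streaming == batch)
```

Because the merge is associative, the chunk boundaries do not affect the final result, which is what lets a batch job split the input across workers and combine their partial states afterwards.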

  16. Closing Thoughts
     • “Move Fast”
     • Ease of debugging
     • Ease of deployment
     • Ease of monitoring and operation

  17. Comparison with Naiad
     Naiad
     • Milliseconds, not seconds
     • Robust solutions to micro-stragglers
     • Expense: availability in the event of failure
     • Naiad consumes inputs from message queue, and writes to key-value store
     Facebook Realtime Systems
     • Seconds, not milliseconds
     • Does not handle micro-stragglers
     • Persistent message bus ensures no loss
     • Flexible, and easy to use, deploy, debug
