Realtime Data Processing at Facebook
Abhay Venkatesh
Why Streaming at Facebook?
• Actionable reports
  • e.g. Chorus: what is trending right now?
• Realtime monitoring
  • e.g. dashboard queries
• Hybrid realtime-batch pipelines
  • e.g. pre-emptive queries over the data warehouse
Workload Assumptions
• Latency of seconds, not milliseconds, which means
  • the systems can use a persistent message bus called Scribe
  • which makes it easier to provide
    • Fault tolerance
    • Scalability
    • Multiple options for correctness
System Architecture
The Streaming Triad
• Puma
• Swift
• Stylus
Puma
• For apps written in a SQL-like language
• Quick to write (< 1 hour)
• But run over long periods (months to years)
• Two purposes
  • Pre-computed query results for simple aggregation queries
  • Filtering and processing of Scribe streams
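Puma apps themselves are written in Puma's SQL-like language, which is not reproduced here. As a rough illustration of the first purpose, pre-computed results for a simple aggregation query, the C++ sketch below keeps the state needed to answer "top-k events per category" over fixed tumbling windows. The Event fields, the WindowedTopK class, and the tumbling-window choice are assumptions made for illustration, not Puma's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input row, loosely modeled on an event read from a Scribe stream.
struct Event {
  std::int64_t time_sec;  // event time in seconds (assumed non-negative)
  std::string category;
  std::string name;
  double score;
};

// Keeps, per (window start, category), the total score of each event name,
// so a top-k query is answered from pre-computed state.
class WindowedTopK {
 public:
  explicit WindowedTopK(std::int64_t window_sec) : window_sec_(window_sec) {
    assert(window_sec > 0);
  }

  // Fold one event into the tumbling window that contains its timestamp.
  void add(const Event& e) {
    std::int64_t start = e.time_sec - (e.time_sec % window_sec_);
    totals_[{start, e.category}][e.name] += e.score;
  }

  // Up to k names with the highest total score in one window and category;
  // ties are broken alphabetically.
  std::vector<std::string> topK(std::int64_t start, const std::string& category,
                                std::size_t k) const {
    std::vector<std::string> out;
    auto it = totals_.find({start, category});
    if (it == totals_.end()) return out;
    std::vector<std::pair<double, std::string>> ranked;
    for (const auto& [name, total] : it->second) ranked.emplace_back(total, name);
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
      return a.first != b.first ? a.first > b.first : a.second < b.second;
    });
    for (std::size_t i = 0; i < ranked.size() && i < k; ++i) {
      out.push_back(ranked[i].second);
    }
    return out;
  }

 private:
  std::int64_t window_sec_;
  std::map<std::pair<std::int64_t, std::string>, std::map<std::string, double>> totals_;
};
```

For example, with a 300-second window, adding ("sports", "goal", 3.0) at t=10 and ("sports", "goal", 2.0) at t=20 makes topK(0, "sports", 1) return {"goal"}.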
A Puma App
Swift
• Very basic API
• Can read() from a Scribe stream
• Checkpoints every
  • N strings, or
  • B bytes
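The checkpointing rule on the Swift slide can be sketched as follows. InMemoryStream is a hypothetical stand-in for a Scribe stream, and CheckpointingReader, its method names, and the in-memory "checkpoint" are assumptions; a real Swift client would persist the offset durably rather than keep it in a member variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Scribe stream: an append-only log of strings
// addressed by offset.
class InMemoryStream {
 public:
  void append(std::string s) { log_.push_back(std::move(s)); }
  // Returns false when no string exists at `offset` yet.
  bool read(std::size_t offset, std::string* out) const {
    if (offset >= log_.size()) return false;
    *out = log_[offset];
    return true;
  }

 private:
  std::vector<std::string> log_;
};

// Reader that checkpoints its offset after N strings or B bytes,
// whichever threshold is reached first.
class CheckpointingReader {
 public:
  CheckpointingReader(const InMemoryStream& stream, std::size_t n_strings,
                      std::size_t b_bytes)
      : stream_(stream), n_strings_(n_strings), b_bytes_(b_bytes) {
    assert(n_strings > 0 && b_bytes > 0);
  }

  bool read(std::string* out) {
    if (!stream_.read(offset_, out)) return false;
    ++offset_;
    ++strings_since_;
    bytes_since_ += out->size();
    if (strings_since_ >= n_strings_ || bytes_since_ >= b_bytes_) checkpoint();
    return true;
  }

  // Simulates a restart: resume from the last checkpointed offset.
  void restart() {
    offset_ = checkpointed_offset_;
    strings_since_ = 0;
    bytes_since_ = 0;
  }

  std::size_t checkpointed_offset() const { return checkpointed_offset_; }
  std::size_t checkpoints_taken() const { return checkpoints_; }

 private:
  void checkpoint() {
    checkpointed_offset_ = offset_;  // a real client would persist this
    ++checkpoints_;
    strings_since_ = 0;
    bytes_since_ = 0;
  }

  const InMemoryStream& stream_;
  std::size_t n_strings_;
  std::size_t b_bytes_;
  std::size_t offset_ = 0;
  std::size_t checkpointed_offset_ = 0;
  std::size_t strings_since_ = 0;
  std::size_t bytes_since_ = 0;
  std::size_t checkpoints_ = 0;
};
```

Because a restart resumes from the last checkpoint, any strings read after that checkpoint are delivered again, so a client built this way sees at-least-once delivery.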
Stylus
• Low-level stream processing in C++
[Figure: Scribe Stream -> Stylus Stream Processor(s) -> Scribe Stream or Data Store]
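Stylus processors are written in C++ and sit between an input Scribe stream and an output Scribe stream or data store. The sketch below illustrates the stateless/stateful distinction that also appears on the reprocessing slide, using a hypothetical Processor interface (process() appends to an output vector) and a hypothetical runPipeline helper; the real Stylus API is not shown in these slides and will differ.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event flowing between processors.
struct Event {
  std::string key;
  std::string payload;
};

// Hypothetical processor interface, not the real Stylus API.
class Processor {
 public:
  virtual ~Processor() = default;
  // Consume one input event and append zero or more output events.
  virtual void process(const Event& in, std::vector<Event>* out) = 0;
};

// Stateless: the output depends only on the current event.
class DropEmptyPayloads : public Processor {
 public:
  void process(const Event& in, std::vector<Event>* out) override {
    if (!in.payload.empty()) out->push_back(in);
  }
};

// Stateful: keeps a running count per key and emits the updated count.
class CountPerKey : public Processor {
 public:
  void process(const Event& in, std::vector<Event>* out) override {
    int count = ++counts_[in.key];
    out->push_back({in.key, std::to_string(count)});
  }

 private:
  std::map<std::string, int> counts_;  // state that would need saving
};

// Feed the input through each stage in order; each stage's output is the
// next stage's input.
std::vector<Event> runPipeline(const std::vector<std::unique_ptr<Processor>>& stages,
                               const std::vector<Event>& input) {
  std::vector<Event> current = input;
  for (const auto& stage : stages) {
    assert(stage != nullptr);
    std::vector<Event> next;
    for (const Event& e : current) stage->process(e, &next);
    current = std::move(next);
  }
  return current;
}
```

Putting the stateless filter first means the stateful stage only keeps counts for events that survive filtering.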
Sample Application
Design Decisions
• Language paradigm
• Data transfer
• Processing semantics
• State-saving mechanism
• Reprocessing
Processing Semantics
• At least once, at most once, or exactly once
• State semantics (inputs)
• Output semantics
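The slide names the three semantics without showing where they come from. On the state side, one way to see it, which as far as we can tell mirrors the paper's account of Stylus, is the order in which a processor persists its in-memory state and its input offset: state before offset gives at-least-once, offset before state gives at-most-once, and both in one atomic write gives exactly-once. The simulation below is a sketch under assumptions: a single counter as the state, a Durable struct standing in for the state store, and a crash injected between the two writes of one checkpoint. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class Semantics { AtLeastOnce, AtMostOnce, ExactlyOnce };

// Stand-in for the durable store: state (a counter) and the input offset are
// separate fields, so a crash can fall between the two writes.
struct Durable {
  long count = 0;
  std::size_t offset = 0;
};

// Counts `total` input events, checkpointing every `interval` events. During
// checkpoint number `crash_on` (1-based; 0 means never) the process crashes
// after the first write, then recovers from `Durable` and replays from the
// stored offset. Returns the final count; the correct answer is `total`.
long simulate(Semantics sem, std::size_t total, std::size_t interval, int crash_on) {
  assert(interval > 0);
  Durable disk;
  long count = 0;          // in-memory state
  std::size_t offset = 0;  // next input event to read
  int checkpoints = 0;
  bool crashed = false;
  while (offset < total) {
    ++count;  // process the event at `offset`
    ++offset;
    if (offset % interval != 0) continue;
    ++checkpoints;
    const bool crash_now = !crashed && checkpoints == crash_on;
    switch (sem) {
      case Semantics::AtLeastOnce:  // state first, then offset
        disk.count = count;
        if (!crash_now) disk.offset = offset;
        break;
      case Semantics::AtMostOnce:  // offset first, then state
        disk.offset = offset;
        if (!crash_now) disk.count = count;
        break;
      case Semantics::ExactlyOnce:  // one atomic write of both, or neither
        if (!crash_now) {
          disk.count = count;
          disk.offset = offset;
        }
        break;
    }
    if (crash_now) {  // in-memory state is lost; restart from the store
      crashed = true;
      count = disk.count;
      offset = disk.offset;
    }
  }
  return count;
}
```

With 10 events, a checkpoint every 2, and a crash during the second checkpoint, the at-least-once run counts events 3 and 4 twice (12), the at-most-once run drops them (8), and the exactly-once run ends at 10.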
State-Saving Mechanisms
Reprocessing Data
• Data warehousing with Hive
• Stream processing in a batch environment
  • Puma -> Hive
  • Stylus -> stateless, stateful, and monoid processors
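The slide lists monoid processors among those that can be reprocessed in a batch environment. The property that makes this work, sketched below, is that partial aggregates form a monoid: combine() is associative and identity() is neutral, so folding events one at a time (as a stream) and aggregating chunks independently before merging them (as a batch job would) yield the same answer. The Stats type and both helper functions are hypothetical illustrations, not Stylus code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Partial aggregate forming a monoid: combine() is associative and
// identity() is its neutral element.
struct Stats {
  long count = 0;
  long sum = 0;
  long max_value = std::numeric_limits<long>::min();

  static Stats identity() { return Stats{}; }
  static Stats of(long v) { return Stats{1, v, v}; }
  Stats combine(const Stats& o) const {
    return Stats{count + o.count, sum + o.sum, std::max(max_value, o.max_value)};
  }
};

bool operator==(const Stats& a, const Stats& b) {
  return a.count == b.count && a.sum == b.sum && a.max_value == b.max_value;
}

// Streaming: fold events one at a time into running state.
Stats streamFold(const std::vector<long>& events) {
  Stats s = Stats::identity();
  for (long v : events) s = s.combine(Stats::of(v));
  return s;
}

// Batch: aggregate fixed-size chunks independently (e.g. one per worker),
// then merge the partial results.
Stats batchMerge(const std::vector<long>& events, std::size_t chunk) {
  assert(chunk > 0);
  Stats total = Stats::identity();
  for (std::size_t i = 0; i < events.size(); i += chunk) {
    Stats part = Stats::identity();
    std::size_t end = std::min(events.size(), i + chunk);
    for (std::size_t j = i; j < end; ++j) part = part.combine(Stats::of(events[j]));
    total = total.combine(part);
  }
  return total;
}
```

An aggregate such as a median does not decompose this way, which is why only some processor types can be rerun as batch jobs without changing their logic.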
Closing Thoughts
• “Move Fast”
  • Ease of debugging
  • Ease of deployment
  • Ease of monitoring and operation
Comparison with Naiad
Naiad
• Milliseconds, not seconds
• Robust solutions to micro-stragglers
• Gives up availability in the event of failure
• Consumes inputs from a message queue and writes to a key-value store
Facebook Realtime Systems
• Seconds, not milliseconds
• Does not handle micro-stragglers
• Persistent message bus ensures no data loss
• Flexible, and easy to use, deploy, and debug