DSP Frameworks
Corso di Sistemi e Architetture per Big Data, A.A. 2018/19
Valeria Cardellini
Macroarea di Ingegneria – Dipartimento di Ingegneria Civile e Ingegneria Informatica
Laurea Magistrale in Ingegneria Informatica


  1. Heron API: shift to functional style • Processing graphs consist of streamlets – One or more supplier streamlets inject data into the graph, to be processed by downstream operators • Operations similar to Spark's transformations (a sketch follows below)
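A minimal sketch of what such a functional-style topology might look like with the Heron Streamlet API for Java. This is an illustration only: the package names below follow the pre-Apache `com.twitter.heron.streamlet` API of this era and should be checked against the Heron version in use; the topology name and data are arbitrary.

```java
import java.util.concurrent.ThreadLocalRandom;

import com.twitter.heron.streamlet.Builder;
import com.twitter.heron.streamlet.Config;
import com.twitter.heron.streamlet.Runner;

public class RandomIntsTopology {
    public static void main(String[] args) {
        Builder builder = Builder.newBuilder();

        // A supplier streamlet injects data into the processing graph
        builder.newSource(() -> ThreadLocalRandom.current().nextInt(1, 100))
               .setName("random-ints")
               // Spark-like operations applied by downstream operators
               .filter(i -> i > 50)
               .map(i -> i * 10)
               .log();

        // Submit the streamlet-based topology
        new Runner().run("random-ints-topology", Config.defaultConfig(), builder);
    }
}
```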

  2. Heron API: shift to functional style • Operations (continued)

  3. Heron: topology lifecycle • Topology lifecycle managed through Heron's CLI tool • Stages – Submit the topology to the cluster – Activate the topology – Restart an active topology, e.g., after updating the topology configuration – Deactivate the topology – Kill a topology to completely remove it from the cluster (example commands below)
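For illustration, the lifecycle stages map onto `heron` CLI commands along these lines; the cluster name (`local`), example jar path, and topology class are placeholders, and the exact syntax depends on the Heron version:

```
heron submit local ~/.heron/examples/heron-api-examples.jar \
    com.twitter.heron.examples.api.ExclamationTopology ExclamationTopology

heron activate   local ExclamationTopology
heron restart    local ExclamationTopology
heron deactivate local ExclamationTopology
heron kill       local ExclamationTopology
```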

  4. Heron topology: logical and physical plans • Topology's logical plan: analogous to a database query plan, in that it maps out the basic operations associated with a topology • Topology's physical plan: determines the “physical” execution logic of a topology, i.e., how the topology's processes are divided among Heron containers • Logical and physical plans are automatically created by Heron

  5. Heron architecture per topology • Master-worker architecture • One Topology Master (TM) – Manages a topology throughout its entire lifecycle • Multiple Containers – Each container runs multiple Heron Instances, a Stream Manager, and a Metrics Manager – A Heron Instance is a process that handles a single task of a spout or bolt – Containers communicate with the TM to ensure that the topology forms a fully connected graph

  6. Heron architecture per topology

  7. Heron architecture per topology • Stream Manager (SM): routing engine for data streams – Each Heron container connects to its local SM, while all of the SMs in a given topology connect to one another to form a network – Responsible for propagating backpressure

  8. Heron: topology submit sequence

  9. Heron: self-adaptation • Dhalion: framework on top of Heron to autonomously reconfigure topologies to meet throughput SLOs, scaling resource consumption up and down as needed • Phases in Dhalion: - Symptom detection (backpressure, skew, …) - Diagnosis generation - Resolution • Adaptation actions: parallelism changes

  10. Heron environment • Heron supports deployment on Apache Mesos • Can also run on Mesos using Apache Aurora as a scheduler, or using a local scheduler

  11. Batch processing vs. stream processing • Batch processing is just a special case of stream processing

  12. Batch processing vs. stream processing • Batched/stateless: scheduled in batches – Short-lived tasks (Hadoop, Spark) – Distributed streaming over batches (Spark Streaming) • Dataflow/stateful: continuous/scheduled once (Storm, Flink, Heron) – Long-lived task execution – State is kept inside tasks

  13. Native vs. non-native streaming

  14. Apache Flink • Distributed data flow processing system • One common runtime for DSP applications and batch processing applications – Batch processing applications run efficiently as special cases of DSP applications • Integrated with many other projects in the open-source data processing ecosystem • Derives from the Stratosphere project by TU Berlin, Humboldt University and Hasso Plattner Institute • Supports a Storm-compatible API

  15. Flink: software stack • Flink is a layered system • On top: libraries with high-level APIs for different use cases https://ci.apache.org/projects/flink/flink-docs-release-1.8/

  16. Flink: programming model • Data streams – Unbounded, partitioned, immutable sequences of events • Stream operators – Stream transformations that take one or more streams as input and produce one or more output streams as a result

  17. DSP and time • Different notions of time in a DSP application: – Processing time: time at which events are observed in the system (local time of the machine executing the operator) – Event time: time at which events actually occurred • Usually described by a timestamp in the events – Ingestion time: time at which an event enters the dataflow at the source operator(s) See https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

  18. Flink: time • Flink supports all three notions of time – Internally, ingestion time is treated similarly to event time • Event time makes it easy to compute over streams where events arrive out of order, and where events may arrive delayed • How to measure the progress of event time? – Flink uses watermarks (a configuration sketch follows below)
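In the 1.x DataStream API, the notion of time is selected on the execution environment; a minimal sketch:

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Processing time is the default; switch to event time (or ingestion time)
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
```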

  19. Flink: backpressure • Continuous streaming model with backpressure – Flink's streaming runtime provides flow control: slow data sinks backpressure faster sources – Flink's UI allows monitoring the backpressure behavior of running jobs • Backpressure warning (e.g., High) for an upstream operator

  20. Flink: other features • Highly flexible streaming windows – Also user-defined windows • Exactly-once semantics for stateful computations – Based on two-phase commit

  21. Flink: levels of abstraction • Different levels of abstraction to develop streaming/batch applications • APIs in Java and Scala

  22. Flink: APIs and libraries • Streaming data applications: DataStream API – Supports functional transformations on data streams, with user-defined state and flexible windows – Example: computing a sliding histogram of word occurrences over a data stream of texts (WindowWordCount in Flink's DataStream API, with a sliding time window of 5 sec length and 1 sec trigger interval; core sketch below)
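The core of that example (shown as an image on the original slide) is along these lines, assuming a `DataStream<String> text` and a `Splitter` FlatMapFunction that emits (word, 1) tuples — both are defined in the full WordCount example a few slides ahead:

```java
DataStream<Tuple2<String, Integer>> counts = text
        .flatMap(new Splitter())                        // emit (word, 1) per word
        .keyBy(0)                                       // key by the word
        .timeWindow(Time.seconds(5), Time.seconds(1))   // 5 s window, 1 s slide
        .sum(1);                                        // sum the counts
```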

  23. Flink: APIs and libraries • Batch processing applications: DataSet API – Supports a wide range of data types beyond key/value pairs, and a wealth of operators – The slide shows the core loop of the PageRank algorithm for graphs

  24. Anatomy of a Flink program • Let's analyze the DataStream API https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html • Each Flink program consists of the same basic parts: 1. Obtain an execution environment 2. Load/create the initial data

  25. Anatomy of a Flink program 3. Specify transformations on the data 4. Specify where to put the results of your computations 5. Trigger the program execution (skeleton below)
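A sketch of the five parts in a single DataStream program; the socket source and print sink are arbitrary choices for the example:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Anatomy {
    public static void main(String[] args) throws Exception {
        // 1. Obtain an execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 2. Load/create the initial data
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // 3. Specify transformations on the data
        DataStream<String> upper = lines.map(String::toUpperCase);

        // 4. Specify where to put the results
        upper.print();

        // 5. Trigger the program execution
        env.execute("Anatomy of a Flink program");
    }
}
```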

  26. Flink: lazy evaluation • All Flink programs are executed lazily – When the program's main method is executed, the data loading and transformations do not happen directly – Rather, each operation is created and added to the program's plan – Operations are actually executed only when execution is explicitly triggered by an execute() call on the execution environment

  27. Flink: data sources • Several predefined stream sources accessible from the StreamExecutionEnvironment (sketches below) 1. File-based – E.g., readTextFile(path) to read text files – Flink splits the file reading process into two sub-tasks: directory monitoring and data reading • Monitoring is implemented by a single, non-parallel task, while reading is performed by multiple tasks running in parallel, whose parallelism is equal to the job parallelism 2. Socket-based 3. Collection-based 4. Custom – E.g., to read from Kafka: addSource(new FlinkKafkaConsumer08<>(...)) – See Apache Bahir for streaming connectors and SQL data sources https://bahir.apache.org/
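Sketches of the four source types; `env` is the StreamExecutionEnvironment, the host, port, paths, and topic are placeholders, and the Kafka source requires the flink-connector-kafka-0.8 dependency:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;

// 1. File-based
DataStream<String> fileLines = env.readTextFile("file:///path/to/input");

// 2. Socket-based
DataStream<String> socketLines = env.socketTextStream("localhost", 9999);

// 3. Collection-based
DataStream<Integer> numbers = env.fromElements(1, 2, 3, 4, 5);

// 4. Custom source, e.g. Kafka 0.8
Properties props = new Properties();
props.setProperty("zookeeper.connect", "localhost:2181");
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "test");
DataStream<String> kafkaLines =
        env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), props));
```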

  28. Flink: DataStream transformations • Map DataStream → DataStream – Example: double the values of the input stream • FlatMap DataStream → DataStream – Example: split sentences into words (sketches below)
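Sketches of the two examples just mentioned, in the style of the Flink documentation; `dataStream` and `sentences` are placeholder stream names:

```java
// Map: double the values of the input stream
DataStream<Integer> doubled = dataStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) {
        return 2 * value;
    }
});

// FlatMap: split sentences into words
DataStream<String> words = sentences.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String sentence, Collector<String> out) {
        for (String word : sentence.split(" ")) {
            out.collect(word);
        }
    }
});
```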

  29. Flink: DataStream transformations • Filter DataStream → DataStream – Example: filter out zero values • KeyBy DataStream → KeyedStream – Specifies a key that logically partitions a stream into disjoint partitions – Internally implemented with hash partitioning – Different ways to specify keys; the simplest case is grouping tuples on one or more fields of the tuple (examples below)
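Sketches of both transformations; `Event` and its field `someKey` are hypothetical, used only to show keying by a POJO field name:

```java
// Filter: keep only non-zero values
DataStream<Integer> nonZero = dataStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) {
        return value != 0;
    }
});

// KeyBy: partition by a POJO field name, or by tuple position
KeyedStream<Event, Tuple> byField = events.keyBy("someKey");
KeyedStream<Tuple2<String, Integer>, Tuple> byFirstField = tuples.keyBy(0);
```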

  30. Flink: DataStream transformations • Reduce KeyedStream → DataStream – “Rolling” reduce on a keyed data stream – Combines the current element with the last reduced value and emits the new value – Example: create a stream of partial sums (sketch below)
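A sketch of the partial-sums example; `keyedStream` is assumed to be a KeyedStream of integers:

```java
// Rolling reduce: emit a stream of partial sums per key
DataStream<Integer> partialSums = keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2) {
        return value1 + value2;
    }
});
```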

  31. Flink: DataStream transformations • Fold KeyedStream → DataStream – “Rolling” fold on a keyed data stream with an initial value – Combines the current element with the last folded value and emits the new value – Example: emit the sequence "start-1", "start-1-2", "start-1-2-3", ... when applied to the sequence (1,2,3,4,5) (sketch below)
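A sketch of that example, again assuming a keyed stream of integers:

```java
// Rolling fold with initial value "start":
// on the input 1,2,3,4,5 it emits "start-1", "start-1-2", "start-1-2-3", ...
DataStream<String> folded = keyedStream.fold("start", new FoldFunction<Integer, String>() {
    @Override
    public String fold(String current, Integer value) {
        return current + "-" + value;
    }
});
```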

  32. Flink: DataStream transformations • Aggregations KeyedStream → DataStream – To aggregate on a keyed data stream – min returns the minimum value, whereas minBy returns the element that has the minimum value in that field • Window KeyedStream → WindowedStream (examples below)
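Sketches of both, assuming a `keyedStream` of Tuple2<String, Integer> keyed by position 0:

```java
// min returns the minimum value of field 1; minBy returns the whole
// element that has the minimum value in that field
DataStream<Tuple2<String, Integer>> mins     = keyedStream.min(1);
DataStream<Tuple2<String, Integer>> minElems = keyedStream.minBy(1);

// Window: group the keyed stream into 10 s tumbling event-time windows
WindowedStream<Tuple2<String, Integer>, Tuple, TimeWindow> windowed =
        keyedStream.window(TumblingEventTimeWindows.of(Time.seconds(10)));
```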

  33. Flink: DataStream transformations • Other transformations available in Flink – Join: joins two data streams on a given key – Union: union of two or more data streams, creating a new stream containing all the elements from all the streams – Split: splits the stream into two or more streams according to some criterion – Iterate: creates a “feedback” loop in the flow, by redirecting the output of one operator to some previous operator • Useful for algorithms that continuously update a model See https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/

  34. Example: streaming window WordCount • Count the words coming from a web socket in 5 sec windows, keying by the first element of the tuple (full program below)

  35. Example: streaming window WordCount (continued)
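The code on these two slides (shown as images in the original deck) is in all likelihood close to the WindowWordCount example from the Flink documentation; a self-contained version for Flink 1.8:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowWordCount {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts = env
                .socketTextStream("localhost", 9999)  // read words from a web socket
                .flatMap(new Splitter())              // emit (word, 1) per word
                .keyBy(0)                             // key by the first tuple element
                .timeWindow(Time.seconds(5))          // 5 s tumbling windows
                .sum(1);                              // sum the counts

        counts.print();

        env.execute("Window WordCount");
    }

    public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) {
            for (String word : sentence.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
```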

  36. Flink: windows support • Windows can be applied either to keyed streams or to non-keyed ones • General structure of a windowed Flink program (skeleton below)
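The general structure for keyed windows, as sketched in the Flink documentation (parts in square brackets are optional; this is pseudocode, not compilable Java):

```
stream
       .keyBy(...)                    // keyed (parallel) windows; use windowAll for non-keyed
       .window(...)                   // required: window assigner
      [.trigger(...)]                 // optional: trigger, else the assigner's default
      [.evictor(...)]                 // optional: evictor, else none
      [.allowedLateness(...)]         // optional: lateness, else zero
       .reduce/aggregate/fold/apply() // required: window function
```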

  37. Flink: window lifecycle • First, specify whether the stream is keyed or not and define the window assigner – A keyed stream allows the windowed computation to be performed in parallel by multiple tasks – The window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness • Then associate with the window the trigger and the function – The trigger determines when a window is ready to be processed by the window function – The function specifies the computation to be applied to the window contents

  38. Flink: window assigners • How elements are assigned to windows • Support for different window assigners – Each WindowAssigner comes with a default Trigger • Built-in assigners for the most common use cases: – Tumbling windows – Sliding windows – Session windows – Global windows • Except for global windows, they assign elements to windows based on time, which can be either processing time or event time • Also possible to implement a custom window assigner

  39. Flink: window assigners • Session windows – To group elements by sessions of activity – Differently from tumbling and sliding windows, session windows do not overlap and do not have a fixed start and end time – A session window closes when a gap of inactivity occurs • Global windows – Assign all elements with the same key to the same single global window – Only useful if you also specify a custom trigger (assigner examples below)
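Sketches of the four built-in assigners on an event-time stream; `input` is a placeholder keyed-by-position stream, and the window sizes are arbitrary:

```java
// Tumbling event-time windows of 5 s
input.keyBy(0).window(TumblingEventTimeWindows.of(Time.seconds(5)));

// Sliding event-time windows: 10 s size, 5 s slide
input.keyBy(0).window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)));

// Session windows with a 10 min inactivity gap
input.keyBy(0).window(EventTimeSessionWindows.withGap(Time.minutes(10)));

// Global windows: require a custom trigger to ever fire
input.keyBy(0).window(GlobalWindows.create());
```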

  40. Flink: window functions • Different window functions to specify the computation on each window • ReduceFunction – To incrementally aggregate the elements of a window – Example: sum up the second fields of the tuples for all elements in a window (sketch below)
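A sketch of that example, assuming a windowed stream of Tuple2<String, Long>:

```java
// Sum up the second field of the tuples for all elements in a window
windowed.reduce(new ReduceFunction<Tuple2<String, Long>>() {
    @Override
    public Tuple2<String, Long> reduce(Tuple2<String, Long> v1, Tuple2<String, Long> v2) {
        return new Tuple2<>(v1.f0, v1.f1 + v2.f1);
    }
});
```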

  41. Flink: window functions • AggregateFunction: generalized version of a ReduceFunction – Example: compute the average of the second field of the elements in the window (sketch below)
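A sketch in the style of the Flink documentation's average example; the accumulator keeps a running (sum, count) pair:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

public class AverageAggregate
        implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {
    @Override
    public Tuple2<Long, Long> createAccumulator() {
        return new Tuple2<>(0L, 0L);
    }

    @Override
    public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> acc) {
        return new Tuple2<>(acc.f0 + value.f1, acc.f1 + 1L);
    }

    @Override
    public Double getResult(Tuple2<Long, Long> acc) {
        return ((double) acc.f0) / acc.f1;
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
    }
}

// usage: windowed.aggregate(new AverageAggregate());
```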

  42. Flink: window functions • FoldFunction: specifies how an input element of the window is combined with an element of the output type • ProcessWindowFunction: gets an Iterable containing all the elements of the window, and a Context object with access to time and state information – More flexibility than the other window functions, at the cost of performance and resource consumption: elements are buffered until the window is ready for processing • ReduceFunction and AggregateFunction can be executed more efficiently – Flink can incrementally aggregate the elements for each window as they arrive

  43. Flink: control events • Control events: special events injected in the data stream by operators • Two types of control events in Flink – Watermarks – Checkpoint barriers

  44. Flink: watermarks • Watermarks signal the progress of event time within a data stream – Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements with timestamp t' <= t – Crucial for out-of-order streams, where events are not ordered by their timestamps • Flink does not provide ordering guarantees after any form of stream partitioning or broadcasting – In that case, dealing with out-of-order tuples is left to the operator implementation (watermark generation sketch below)
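A common way to generate watermarks in this era of the API is a bounded-out-of-orderness timestamp extractor; `MyEvent` and its `getCreationTime()` accessor are hypothetical:

```java
DataStream<MyEvent> withWatermarks = stream.assignTimestampsAndWatermarks(
        // the watermark trails the maximum seen event timestamp by 10 s
        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(MyEvent element) {
                return element.getCreationTime(); // hypothetical timestamp accessor
            }
        });
```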

  45. Flink: checkpoint barriers • To provide fault tolerance (see next slides), special barrier markers (called checkpoint barriers) are periodically injected at stream sources and then pushed downstream up to the sinks

  46. Fault tolerance • To provide consistent results, DSP systems need to be resilient to failures • How? By periodically capturing a snapshot of the execution graph, which can later be used to restart in case of failures (checkpointing) – Snapshot: global state of the execution graph, capturing all the information necessary to restart the computation from that specific execution state • The common approach is to rely on periodic global state snapshots, but it has drawbacks: – Stalls the overall computation – Eagerly persists all tuples in transit along with operator states, which results in larger snapshots than required

  47. Flink: fault tolerance • Flink offers a lightweight snapshotting mechanism – Allows maintaining high throughput and providing strong consistency guarantees at the same time • Such mechanism: – Draws consistent snapshots of stream flows and operators' state – Even in presence of failures, the application state will reflect every record from the data stream exactly once – State is stored at a configurable place – Checkpointing is disabled by default (configuration sketch below) • Inspired by the Chandy-Lamport algorithm for distributed snapshots and tailored to Flink's execution model
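A minimal sketch of enabling checkpointing and choosing where state is stored; the interval and the HDFS path are placeholders:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpointing is disabled by default; take a snapshot every 10 s
env.enableCheckpointing(10000);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

// Configurable place where snapshots are stored
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));
```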

  48. Chandy-Lamport algorithm • The observer process (the process initiating the snapshot): – Saves its own local state – Sends a snapshot request message bearing a snapshot token to all other processes • If a process receives the token for the first time: – Sends the observer process its own saved state – Attaches the snapshot token to all subsequent messages (to help propagate the snapshot token) • When a process that has already received the token receives a message not bearing the token, it will forward that message to the observer process – This message was sent before the snapshot “cut off” (as it does not bear a snapshot token) and needs to be included in the snapshot • The observer builds up a complete snapshot: a saved state for each process, and all messages “in the ether” are saved

  49. Flink: fault tolerance • Uses checkpoint barriers – When an operator has received a barrier for snapshot n from all of its input streams, it emits a barrier for snapshot n into all of its outgoing streams – Once a sink operator has received barrier n from all of its input streams, it acknowledges that snapshot n to the checkpoint coordinator – After all sinks have acknowledged a snapshot, it is considered completed https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html

  50. Flink: performance and memory management • High throughput and low latency • Memory management – Flink implements its own memory management inside the JVM

  51. Flink: architecture • The usual master-worker architecture

  52. Flink: architecture • Master (JobManager): schedules tasks, coordinates checkpoints, coordinates recovery on failures, etc. • Workers (TaskManagers): JVM processes that execute the tasks of a dataflow, and buffer and exchange the data streams – Workers use task slots to control the number of tasks they accept (at least one) – Each task slot represents a fixed subset of the worker's resources

  53. Flink: application execution • The JobManager receives the JobGraph – Representation of the data flow consisting of operators (JobVertex) and intermediate results (IntermediateDataSet) – Each operator has properties, like parallelism and code that it executes • The JobManager transforms the JobGraph into an ExecutionGraph – Parallel version of the JobGraph

  54. Flink: application execution • Data parallelism – Different operators of the same program may have different levels of parallelism – The parallelism of an individual operator, data source, or data sink can be defined by calling its setParallelism() method (sketch below)
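A sketch of per-operator parallelism, reusing the Splitter from the WordCount example; the parallelism values are arbitrary:

```java
env.setParallelism(4);  // default parallelism for all operators

DataStream<Tuple2<String, Integer>> counts = text
        .flatMap(new Splitter())
        .setParallelism(8)   // override for this operator only
        .keyBy(0)
        .sum(1);

counts.print().setParallelism(1);  // e.g., a non-parallel sink
```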

  55. Flink: application execution • Execution plan can be visualized

  56. Flink: application monitoring • Flink has a built-in monitoring and metrics system • Built-in metrics include – Throughput: in terms of number of records per second (per operator/task) – Latency • Support for latency tracking: special markers are periodically inserted at all sources in order to obtain a distribution of latency between sources and each downstream operator – But markers do not account for time spent in operator processing – Assumes that all machine clocks are in sync – Used JVM heap/non-heap/direct memory • Application-specific metrics can be added (sketch below) – E.g., counters for the number of invalid records • All metrics can be either queried via Flink's REST API or sent to external systems (e.g., Graphite and InfluxDB) See https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html
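A sketch of registering an application-specific counter for invalid records; the class name and the `isValid` check are hypothetical:

```java
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class InvalidRecordFilter extends RichFilterFunction<String> {
    private transient Counter invalidRecords;

    @Override
    public void open(Configuration parameters) {
        // register an application-specific counter with Flink's metrics system
        this.invalidRecords = getRuntimeContext()
                .getMetricGroup()
                .counter("invalidRecords");
    }

    @Override
    public boolean filter(String record) {
        boolean valid = isValid(record);
        if (!valid) {
            invalidRecords.inc();
        }
        return valid;
    }

    private boolean isValid(String record) {  // hypothetical validation logic
        return record != null && !record.isEmpty();
    }
}
```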

  57. Flink: deployment • Designed to run on large-scale clusters with many thousands of nodes • Can be run in a fully distributed fashion on a static (but possibly heterogeneous) standalone cluster • For a dynamically shared cluster, can be deployed on YARN or Mesos • Docker images for Apache Flink available on Docker Hub

  58. A recent need • A common need for many companies: run both batch and stream processing • Alternative solutions 1. Lambda architecture 2. Unified frameworks 3. Unified programming model

  59. Lambda architecture • Data-processing design pattern to integrate batch and real-time processing • Streaming framework used to process real-time events, and, in parallel, batch framework to process the entire dataset • Results from the two parallel pipelines are then merged Source: https://voltdb.com/products/alternatives/lambda-architecture

  60. Lambda architecture: example • Lambda architecture used at LinkedIn before Samza development

  61. Lambda architecture: pros and cons • Pros: – Flexibility in the choice of frameworks • Cons: – Implementing and maintaining two separate frameworks for batch and stream processing can be hard and error-prone – Overhead of developing and managing multiple code bases • The logic in each fork evolves over time, and keeping the two in sync involves duplicated and complex manual effort, often with different languages

  62. Unified frameworks • Use a unified (Lambda-less) design for processing both real-time and batch data using the same data structures • Spark, Flink, Samza and Apex follow this trend

  63. Unified programming model: Apache Beam • A new layer of abstraction • Provides an advanced unified programming model – Allows defining batch and streaming data processing pipelines that run on any supported execution engine (for now: Apex, Flink, Spark, Google Cloud Dataflow) – Java, Python and Go as programming languages • Translates the data processing pipeline defined by the user with the Beam program into the API of the chosen distributed processing engine • Developed by Google and released as an open-source top-level Apache project (sketch below)
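A sketch of a Beam pipeline using the Java SDK, in the style of the Beam WordCount quickstart; file names are placeholders, and the execution engine is selected at launch time via the --runner pipeline option:

```java
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BeamWordCount {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines", TextIO.read().from("input.txt"))
         .apply("ExtractWords", FlatMapElements
                 .into(TypeDescriptors.strings())
                 .via((String line) -> Arrays.asList(line.split("\\s+"))))
         .apply(Count.perElement())
         .apply("Format", MapElements
                 .into(TypeDescriptors.strings())
                 .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
         .apply("WriteCounts", TextIO.write().to("wordcounts"));

        // the same pipeline runs on Flink, Spark, Apex, or Dataflow
        // depending on the --runner option
        p.run().waitUntilFinish();
    }
}
```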

  64. Apache Samza • A distributed framework for stateful and fault-tolerant stream processing – Unified framework for batch and stream processing • Similarly to Flink, streams are a first-class citizen, batch is a special case of streaming – Used in production at LinkedIn

  65. Apache Samza • Why stateful and fault-tolerant processing? User profiles, email digests, aggregate counts, … • Example: the Email Digestion System at LinkedIn – A production application that digests updates into a single email

  66. Samza: features • Unified processing API for stream and batch – Supports both stateless and stateful stream processing – Supports both processing time and event time • Configurable and heterogeneous data sources and sinks (e.g., Kafka, HDFS, AWS Kinesis) • At-least-once processing • Efficient state management – Local state (in-memory or on disk) partitioned among tasks (rather than a remote data store) – Incremental checkpointing: only the delta rather than the entire state • Flexible deployment – As a lightweight embedded library that can be integrated with a larger application – Alternatively, as a managed framework using YARN

  67. Samza: architecture • Task: logical unit of parallelism • Container: physical unit of parallelism • Usual architecture – The coordinator manages the assignment of tasks across containers, monitors the liveness of containers and redistributes the tasks upon a failure – One coordinator per application – Host affinity: during a new deployment, Samza tries to preserve the assignment of tasks to hosts, so that a task can re-use the snapshot of its local state

  68. DSP state management • How to manage state information, i.e., “intermediate information” that needs to be maintained between tuples for processing streams of data correctly? • Common approach (e.g., in Storm) to deal with large amounts of state: use a remote data store (e.g., Redis)

  69. Samza: state management • Samza approach: keep state local to each node and make it robust to failures by replicating state changes across multiple machines – The slide contrasts local state with an external store

  70. Samza: High Level Streams API • Samza offers multiple APIs – High Level Streams API, Low Level Task API, Samza SQL – High Level Streams API: includes common stream processing operations such as filter, partition, join, and windowing – Example: a Wikipedia stream application using Samza that consumes events from Wikipedia and produces statistics to a Kafka topic (a sketch follows below) https://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-code.html
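A rough sketch of a High Level Streams API application. The descriptor and serde class names follow Samza 1.x (package locations vary across releases), the topic names are placeholders, and the filter predicate is invented for illustration:

```java
import org.apache.samza.application.StreamApplication;
import org.apache.samza.application.descriptors.StreamApplicationDescriptor;
import org.apache.samza.serializers.StringSerde;
import org.apache.samza.system.kafka.descriptors.KafkaInputDescriptor;
import org.apache.samza.system.kafka.descriptors.KafkaOutputDescriptor;
import org.apache.samza.system.kafka.descriptors.KafkaSystemDescriptor;

public class FilterBotsApp implements StreamApplication {
    @Override
    public void describe(StreamApplicationDescriptor app) {
        KafkaSystemDescriptor kafka = new KafkaSystemDescriptor("kafka");

        KafkaInputDescriptor<String> pageViews =
                kafka.getInputDescriptor("page-views", new StringSerde());
        KafkaOutputDescriptor<String> filtered =
                kafka.getOutputDescriptor("filtered-page-views", new StringSerde());

        // filter out bot traffic and forward the rest to the output topic
        app.getInputStream(pageViews)
           .filter(view -> !view.contains("bot"))
           .sendTo(app.getOutputStream(filtered));
    }
}
```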

  71. Towards strict delivery guarantees • Most frameworks provide at-least-once delivery guarantees (e.g., Storm, Samza) – For stateful non-idempotent operators such as counting, at-least-once delivery guarantees can give incorrect results • Flink, Storm with Trident, and Google's MillWheel offer stronger delivery guarantees (i.e., exactly-once) – Exactly-once low-latency stream processing in MillWheel works as follows: • The record is checked against de-duplication data from previous deliveries; duplicates are discarded • User code is run for the input record, possibly resulting in pending changes to timers, state, and productions • Pending changes are committed to the backing store • Senders are acked • Pending downstream productions are sent

  72. Comparing DSP frameworks • Let's compare open-source DSP frameworks according to some features:

| Framework | API | Windows | Delivery semantics | Fault tol. | State mgmt. | Flow ctl. | Operator elasticity | Batch |
|-----------|-----|---------|--------------------|------------|-------------|-----------|---------------------|-------|
| Storm | Low-level, high-level, SQL | Yes | At least once; exactly once with Trident | Acking; checkpointing with Trident | Limited; yes with Trident (similar to Flink) | Backpressure | No | No batch |
| Heron | Low-level, high-level, SQL | Yes | At least once; effectively once | — | Limited | Backpressure | Yes, with Dhalion | No batch |
| Flink | High-level, SQL | Yes, also user-defined | At least once; exactly once | Checkpointing | Yes | Backpressure | No | Also batch |
| Samza | Low-level, high-level, SQL | Yes | At least once | Incremental checkpointing | Yes | No | No | Unified |

  73. DSP in the Cloud • Data streaming systems also as Cloud services – Amazon Kinesis Data Streams – Google Cloud Dataflow – IBM Streaming Analytics – Microsoft Azure Stream Analytics • Abstract the underlying infrastructure and support dynamic scaling of computing resources • Appear to execute in a single data center (i.e., no geo-distribution)

  74. Google Cloud Dataflow • Fully-managed data processing service, supporting both stream and batch data processing – Automated resource management – Dynamic work rebalancing – Horizontal auto-scaling • Provides a unified programming model based on Apache Beam – Apache Beam SDKs in Java and Python – Enable developers to implement custom extensions and choose other execution engines • Provides exactly-once processing – MillWheel is Google's internal version of Cloud Dataflow

  75. Google Cloud Dataflow • Can be seamlessly integrated with GCP services for streaming events ingestion (Cloud Pub/Sub), data warehousing (BigQuery), machine learning (Cloud Machine Learning)

  76. Amazon Kinesis Data Streams • Allows collecting and ingesting streaming data at scale for real-time analytics

  77. Kinesis Data Analytics • Allows processing data streams in real time with SQL or Java – Java open-source libraries based on Apache Flink • Usual operators to filter, aggregate, and transform streaming data – Per-hour pricing based on the number of Kinesis Processing Units (KPUs) used to run the application • Horizontal scaling of KPUs
