HDES: A Dynamic Stream Processing Engine


  1. HDES: A Dynamic Stream Processing Engine
     Nico Duldhardt, Torben Meyer, Marvin Thiele, Anton von Weltzien
     14.04.2020
     Master's Project, Winter Semester 2019/20, Data Engineering Systems

  2. Agenda
     1. Goals
     2. Features
     3. Architecture Overview
     4. Query Transformation
     5. Query Execution
     6. Ad-hoc join processing
     7. Benchmark Results

  3. Goals
     1. Build a standalone prototype of a stream processing engine with first-class support for dynamic query deployment and removal
     2. Support processing of simple queries and streams
     3. Support online optimizations for efficient multi-query processing

  4. Features
     - Stream processing framework written in Java 11
     - Ad-hoc addition and removal of arbitrary queries
     - Single-node, multi-threaded execution
     - Optimization for joins and aggregations in multi-query execution
     - Queries are defined in a Flink-like dataflow language
     - Support for sliding and tumbling windows, each with event time or processing time
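
The window types named on the Features slide can be illustrated with a small sketch of how a timestamp maps to window start times. This is not HDES code: the class and method names are hypothetical, and windows are assumed to be aligned to multiples of the window size (tumbling) or slide (sliding). The timestamp would come from the event itself for event time, or from the system clock for processing time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HDES: maps a timestamp to window start times. */
final class WindowAssignmentSketch {

    /** Start of the single tumbling window of length sizeMs that contains timestampMs. */
    static long tumblingWindowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /** Starts of all sliding windows (length sizeMs, advancing by slideMs) that contain timestampMs. */
    static List<Long> slidingWindowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        // Walk backwards while the window [start, start + sizeMs) still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(tumblingWindowStart(12_000L, 5_000L));        // 10000
        System.out.println(slidingWindowStarts(12_000L, 5_000L, 2_000L)); // [12000, 10000, 8000]
    }
}
```

A tumbling window is the special case of a sliding window whose slide equals its size, which is why an event falls into exactly one tumbling window but possibly several sliding ones.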

  5. Dataflow API Overview

  6. Code Example

       JobManager jobManager = new JobManager();
       jobManager.runEngine();

       NetworkSource nws1 = new NetworkSource(7001, ...);
       NetworkSource nws2 = new NetworkSource(7002, ...);

       TopologyBuilder builder = TopologyBuilder.newQuery();
       AStream<Tuple3<String,Float,Long>> s1 = builder.streamOf(nws1);
       AStream<Tuple3<String,Float,Long>> s2 = builder.streamOf(nws2);

       s1.window(TumblingWindow.ofEventTime(Time.seconds(5)))
         .join(s2,
               (t1, t2) -> new Tuple4<>(t1.v1, t1.v2, t2.v2, t1.v3),  // joined result tuple
               Tuple3::v1,                                           // join key of s1
               Tuple3::v1,                                           // join key of s2
               WatermarkGenerator.seconds(1, 1_000),
               t3 -> t3.v4)
         .to(new FileSink("join"));

       Query joinQuery = builder.buildAsQuery();
       jobManager.addQuery(joinQuery, 50, ChronoUnit.SECONDS);

  7. Code Example (code as on slide 6), highlighted step: Create JobManager and start engine

  8. Code Example (code as on slide 6), highlighted step: Define sources

  9. Code Example (code as on slide 6), highlighted step: Create new query with TopologyBuilder

  10. Code Example (code as on slide 6), highlighted step: Define query

  11. Code Example (code as on slide 6), highlighted step: Build and submit query

  12. 3. Architecture Overview

  15. 4. Query Transformation

  16. Transformation Pipeline

  17. Operators
       Source:
       - Reads from a source
       - Attaches metadata
       OneInputOperator:
       - Transforms a single event into n new events
       TwoInputOperator:
       - Transforms events from two different origins
       Sink:
       - Writes to a sink
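
The four operator roles listed above can be sketched as Java interfaces. All type and method names below are hypothetical and chosen only for illustration; the slides do not show the actual HDES interfaces. The metadata attached by the source is assumed here to be just a timestamp.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of the four operator roles; none of these types are the HDES API. */
final class OperatorSketch {

    /** An event plus metadata attached by the source (only a timestamp here, as an assumption). */
    static final class Event<T> {
        final T value;
        final long timestamp;
        Event(T value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Reads from a source and attaches metadata; returns null when exhausted. */
    interface Source<T> { Event<T> read(); }

    /** Transforms a single event into n new events. */
    interface OneInputOperator<I, O> { void process(Event<I> in, Consumer<Event<O>> out); }

    /** Transforms events arriving from two different origins, for example a join. */
    interface TwoInputOperator<A, B, O> {
        void processFirst(Event<A> in, Consumer<Event<O>> out);
        void processSecond(Event<B> in, Consumer<Event<O>> out);
    }

    /** Writes events to a sink. */
    interface Sink<T> { void write(Event<T> event); }

    /** Wires source -> one-input operator -> sink and returns what the sink received. */
    static List<String> runDemo() {
        Iterator<String> lines = List.of("a b", "c").iterator();
        long[] clock = {0L};
        Source<String> source = () ->
                lines.hasNext() ? new Event<String>(lines.next(), clock[0]++) : null;
        // One input line becomes one event per word, keeping the source timestamp.
        OneInputOperator<String, String> splitter = (in, out) -> {
            for (String word : in.value.split(" ")) {
                out.accept(new Event<String>(word, in.timestamp));
            }
        };
        List<String> written = new ArrayList<>();
        Sink<String> sink = event -> written.add(event.value + "@" + event.timestamp);
        for (Event<String> e = source.read(); e != null; e = source.read()) {
            splitter.process(e, sink::write);
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [a@0, b@0, c@1]
    }
}
```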

  18. Source Operator

  19. Logical Plan (diagram): two Source Node, OneInput Node, Sink Node sequences and one BinaryInput Node

  20. Execution Plan (diagram): two Source, OneInput, Slot, PushSlot sequences and one TwoInput with a PullSlot

  21. Transformation Properties
       Layered architecture decouples query definition and execution.
       - Interchangeable query definition
       - Interchangeable execution plan

  22. 5. Query Execution

  23. Routing

  24. Slot (diagram): labels shown are Operator, Collector, Event, Events

  25. Slot Types (diagram): Pull Slot and Push Slot; labels shown are "Thread reads", Slot Buffer, Slot, Event, Events
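
The slide text only names the two slot types, so the following is a sketch under assumptions rather than the HDES design: a push slot is taken to forward each event to the downstream operator on the producer's thread, and a pull slot is taken to place events in a buffer that a separate thread reads. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of push and pull slots; the semantics are assumed, not taken from HDES. */
final class SlotSketch {

    interface Slot<T> { void accept(T event); }

    /** Push slot (assumed): hands the event to the downstream operator immediately. */
    static final class PushSlot<T> implements Slot<T> {
        private final Consumer<T> downstream;
        PushSlot(Consumer<T> downstream) { this.downstream = downstream; }
        @Override public void accept(T event) { downstream.accept(event); }
    }

    /** Pull slot (assumed): buffers events until a consuming thread reads them. */
    static final class PullSlot<T> implements Slot<T> {
        private final BlockingQueue<T> buffer = new LinkedBlockingQueue<>();
        @Override public void accept(T event) { buffer.add(event); }
        T take() throws InterruptedException { return buffer.take(); }
    }

    /** Sends two events through each slot type and returns what the consumers saw. */
    static List<String> runDemo() {
        List<String> pushed = new ArrayList<>();
        PushSlot<String> push = new PushSlot<>(pushed::add);
        push.accept("a");
        push.accept("b");

        PullSlot<String> pull = new PullSlot<>();
        List<String> pulled = new ArrayList<>();
        Thread reader = new Thread(() -> {
            try {
                for (int i = 0; i < 2; i++) {
                    pulled.add(pull.take()); // blocks until an event is buffered
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        pull.accept("c");
        pull.accept("d");
        try {
            reader.join(); // join makes the reader's writes visible to this thread
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        List<String> all = new ArrayList<>(pushed);
        all.addAll(pulled);
        return all;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [a, b, c, d]
    }
}
```

Under these assumptions, a push slot avoids a thread hand-off, while a pull slot decouples producer and consumer threads at the cost of buffering.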

  26. Execution Plan (same diagram as slide 20)

  27. 6. Ad-hoc join processing

  28. Efficient Distributed Join Architecture (diagram): Source Operator, Join operator 1 ... Join operator N, Sink Operator

  29. Efficient Distributed Join Architecture (same diagram), adding:
       ● Indexing
       ● Windowing

  30. Efficient Distributed Join Architecture (same diagram), adding:
       ● Set intersection of join index

  31. Efficient Distributed Join Architecture (same diagram), adding:
       ● Joins matching tuples
       ● Pushes to output channels

  32. AJoin in HDES (diagram): two upstream operators (Source) feed the AJoin operator (Join), which feeds a downstream operator (Sink)

  33. HDES AJoin Example (diagram)
       Source: Orders <OrderID, ItemID, …>
       Source: Shipments <ShipmentID, OrderID, …>
       Join: AJoin
       Sink: Shipped Orders <OrderID, ShipmentID, ItemID …>

  34. HDES AJoin Example (dataflow as on slide 33, with input tuples)
       Orders <OrderID, ItemID, …>: <1, 5,...> <1, 8,...> <4, 7,...> <5, 2,...> <5, 7,...>
       Shipments <ShipmentID, OrderID, …>: <5, 7,...> <6, 8,...> <9, 1,...> <6, 4,...> <3, 5,...>

  35. HDES AJoin Example (dataflow as on slide 33, tuples indexed by OrderID)
       Orders Bucket:
         1 ← [<1, 5,...>, <1, 8,...>]
         4 ← [<4, 7,...>]
         5 ← [<5, 2,...>, <5, 7,...>]
       Shipment Bucket:
         7 ← [<5, 7,...>]
         8 ← [<6, 8,...>]
         1 ← [<9, 1,...>]
         4 ← [<6, 4,...>]
         5 ← [<3, 5,...>]

  36. HDES AJoin Example (dataflow as on slide 33, matching keys of both buckets)
       Orders:    1 ← [<1, 5,...>, <1, 8,...>] | 4 ← [<4, 7,...>] | 5 ← [<5, 2,...>, <5, 7,...>]
       Shipments: 7 ← [<5, 7,...>] | 8 ← [<6, 8,...>] | 1 ← [<9, 1,...>] | 4 ← [<6, 4,...>] | 5 ← [<3, 5,...>]
       Matches:
         1 ← [<1, 5,...>, <1, 8,...>] and [<9, 1,...>]
         4 ← [<4, 7,...>] and [<6, 4,...>]
         5 ← [<5, 2,...>, <5, 7,...>] and [<3, 5,...>]
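
The Orders and Shipments example can be replayed with a short sketch of the steps the slides name: index both inputs by OrderID, intersect the key sets, and join the tuples in matching buckets. This is an illustration only. The class and method names are hypothetical, tuples are reduced to their first two fields, windowing is omitted (all tuples are assumed to fall into the same window), and the output layout follows the Shipped Orders schema <OrderID, ShipmentID, ItemID>.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of index, set intersection, and join; not the HDES AJoin implementation. */
final class AJoinExampleSketch {

    /** Indexing step: groups tuples into buckets by the join key at position keyIndex. */
    static Map<Integer, List<int[]>> index(int[][] tuples, int keyIndex) {
        Map<Integer, List<int[]>> buckets = new LinkedHashMap<>();
        for (int[] tuple : tuples) {
            buckets.computeIfAbsent(tuple[keyIndex], k -> new ArrayList<>()).add(tuple);
        }
        return buckets;
    }

    /** Joins orders <OrderID, ItemID> with shipments <ShipmentID, OrderID> on OrderID. */
    static List<String> join(int[][] orders, int[][] shipments) {
        Map<Integer, List<int[]>> orderBuckets = index(orders, 0);       // OrderID is field 0
        Map<Integer, List<int[]>> shipmentBuckets = index(shipments, 1); // OrderID is field 1

        // Set intersection of the two join indexes: only keys present on both sides can match.
        Set<Integer> matchingKeys = new TreeSet<>(orderBuckets.keySet());
        matchingKeys.retainAll(shipmentBuckets.keySet());

        List<String> shippedOrders = new ArrayList<>();
        for (int key : matchingKeys) {
            for (int[] order : orderBuckets.get(key)) {
                for (int[] shipment : shipmentBuckets.get(key)) {
                    // Output layout: <OrderID, ShipmentID, ItemID>
                    shippedOrders.add("<" + order[0] + ", " + shipment[0] + ", " + order[1] + ">");
                }
            }
        }
        return shippedOrders;
    }

    public static void main(String[] args) {
        int[][] orders = {{1, 5}, {1, 8}, {4, 7}, {5, 2}, {5, 7}};
        int[][] shipments = {{5, 7}, {6, 8}, {9, 1}, {6, 4}, {3, 5}};
        // prints [<1, 9, 5>, <1, 9, 8>, <4, 6, 7>, <5, 3, 2>, <5, 3, 7>]
        System.out.println(join(orders, shipments));
    }
}
```

Intersecting the key sets first means the shipment buckets for keys 7 and 8, which have no matching order, are never scanned.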
