
Tutorial: Stream Processing Languages, Martin Hirzel, IBM Research



  1. Tutorial: Stream Processing Languages
     Martin Hirzel, IBM Research AI, 1 November 2017
     Dagstuhl Seminar on Big Stream Processing Systems

  2. DSLs ⊇ DSELs (aka EDSLs)
     Domain-specific embedded languages (DSELs, also called EDSLs) are a subset of domain-specific languages (DSLs).
     Reference: Modular Domain Specific Languages and Tools [ICSR'98]
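
An embedded DSL is a library that reuses its host language's syntax and tooling. The minimal Python sketch below makes that concrete. The names `Stream`, `Op`, `map_`, and `filter_` are hypothetical and come from none of the systems in this tutorial. They overload Python's `|` operator so that a pipeline reads like a small streaming language while remaining ordinary Python.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator


@dataclass(frozen=True)
class Op:
    """A pipeline stage: a function from an input iterator to an output iterator."""
    fn: Callable[[Iterator[Any]], Iterator[Any]]


def map_(f: Callable[[Any], Any]) -> Op:
    """Stage that applies f to every item."""
    return Op(lambda items: (f(x) for x in items))


def filter_(pred: Callable[[Any], bool]) -> Op:
    """Stage that keeps only the items satisfying pred."""
    return Op(lambda items: (x for x in items if pred(x)))


class Stream:
    """Wraps any iterable; `stream | stage` builds a new, lazily evaluated Stream."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = iter(items)

    def __or__(self, stage: Op) -> Stream:
        return Stream(stage.fn(self._items))

    def __iter__(self) -> Iterator[Any]:
        return self._items


# The "language" is plain Python syntax, reinterpreted by the library.
pipeline = Stream(range(10)) | filter_(lambda x: x % 2 == 0) | map_(lambda x: x * x)
print(list(pipeline))  # [0, 4, 16, 36, 64]
```

The trade-off runs in both directions. The embedded version gets Python's parser, debugger, and libraries for free. A standalone DSL has its own compiler, which can check and optimize the whole stream graph before it runs.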

  3. Definitions
     • A stream is a conceptually infinite, ordered sequence of data items.
     • A streaming application is a computer program that continuously ingests input streams and produces output streams.
     • A stream processing language is a DSL (domain-specific language) for writing streaming applications.
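
These definitions map naturally onto Python generators. The sketch below is not tied to any particular system. A generator models a conceptually infinite stream, and a streaming application is modeled as a function that lazily consumes an input stream and yields an output stream. The names `sensor_readings` and `running_max` are hypothetical.

```python
import itertools
from typing import Iterator, Optional


def sensor_readings() -> Iterator[int]:
    """A conceptually infinite, ordered stream of data items (hypothetical data)."""
    for i in itertools.count():
        yield (i * 7) % 10  # deterministic stand-in for real sensor values


def running_max(stream: Iterator[int]) -> Iterator[int]:
    """A tiny streaming application: continuously ingest, continuously emit."""
    current: Optional[int] = None
    for item in stream:
        current = item if current is None else max(current, item)
        yield current


# Only a finite prefix of the infinite output stream is ever materialized.
print(list(itertools.islice(running_max(sensor_readings()), 5)))  # [0, 7, 7, 7, 8]
```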

  4. Requirements
     • Performance: fast streaming on a cluster.
     • Productivity: hide the complexity of the distributed system.
     • Generality: support diverse application domains.
     → Different prioritizations of these requirements drive language diversity.

  5. Outline
     • Streaming SQL: CQL
     • Synchronous Dataflow: StreamIt
     • Explicit Stream Graph: SPL
     • Complex Events: MatchRegex
     • Reactive: ActiveSheets
     • Controlled Natural Language: META
     • Foundational Calculus: Brooklet

  6. Streaming SQL: CQL
     → SQL, plus IStream and Window.
     Reference: The CQL continuous query language: semantic foundations and query execution [VLDBJ'06]

  7. Streaming SQL: CQL
     • S2R (stream-to-relation) operators: windows, e.g., now, unbounded, sliding, tumbling, time, partitioned, …
     • R2R (relation-to-relation) operators: classic relational algebra, e.g., select, project, union, group-by, aggregate, join, …
     • R2S (relation-to-stream) operators: IStream (inserts), DStream (deletes), RStream (relations)
     → Relational algebra, plus operators that convert streams from/to relations.
     Reference: River: An Intermediate Language for Stream Processing [SP&E'16]
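
The minimal Python sketch below shows how the three operator classes compose. It is not CQL's implementation. It evaluates a query in the spirit of `SELECT ISTREAM(*) FROM S [ROWS 3] WHERE price > 100`:

- A count-based sliding window (S2R) turns the stream into a relation after each arrival.
- A selection (R2R) filters that relation.
- IStream (R2S) emits only the tuples that are new relative to the previous relation.

Relations are modeled as multisets (`collections.Counter`). The `(symbol, price)` schema, the window size of 3, and the predicate are all assumptions.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # hypothetical schema: (symbol, price)


def rows_window(stream: Iterable[Row], size: int) -> Iterator[Counter]:
    """S2R: after each arrival, emit the relation holding the last `size` tuples."""
    buffer: List[Row] = []
    for row in stream:
        buffer.append(row)
        if len(buffer) > size:
            buffer.pop(0)  # evict the oldest tuple
        yield Counter(buffer)


def select(relation: Counter, pred: Callable[[Row], bool]) -> Counter:
    """R2R: relational selection, preserving multiplicities."""
    return Counter({row: n for row, n in relation.items() if pred(row)})


def istream(relations: Iterable[Counter]) -> Iterator[List[Row]]:
    """R2S: at each step, emit the tuples inserted since the previous relation."""
    previous: Counter = Counter()
    for current in relations:
        yield sorted((current - previous).elements())
        previous = current


ticks = [("IBM", 150), ("XYZ", 20), ("IBM", 160), ("XYZ", 25), ("IBM", 140)]
expensive = (select(r, lambda row: row[1] > 100) for r in rows_window(ticks, 3))
for t, inserted in enumerate(istream(expensive)):
    print(t, inserted)
# 0 [('IBM', 150)]
# 1 []
# 2 [('IBM', 160)]
# 3 []
# 4 [('IBM', 140)]
```

To get DStream (deletions) instead, swap `current - previous` for `previous - current`. To get RStream, emit the whole `current` relation at every step.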

  8. Streaming SQL: CQL
     [Figure: a window of size T. The newest item is inserted at the back, the oldest is evicted from the front, and aggregation is triggered at granularity L.]
     • In general: policies are specified as time, count, or delta.
     • The special case T = L is also known as tumbling.
     Reference: The CQL continuous query language: semantic foundations and query execution [VLDBJ'06]
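
The roles of the size T and the granularity L can be shown with a count-based sketch in Python. This is an assumption: the slide also allows time- and delta-based policies, which this sketch does not cover. Each arrival is inserted at the back. Once more than T items are held, the oldest is evicted from the front. Every L arrivals, the aggregation is triggered over the current contents. The function name `count_window` is hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List


def count_window(items: Iterable[int], size: int, granularity: int,
                 aggregate: Callable[[List[int]], int]) -> Iterator[int]:
    """Keep at most `size` (T) items; trigger `aggregate` every `granularity` (L) arrivals."""
    window: Deque[int] = deque()
    arrivals = 0
    for item in items:
        window.append(item)              # insert newest at the back
        if len(window) > size:
            window.popleft()             # evict oldest from the front
        arrivals += 1
        if arrivals % granularity == 0:
            yield aggregate(list(window))  # trigger aggregation


data = [1, 2, 3, 4, 5, 6]
print(list(count_window(data, size=3, granularity=1, aggregate=sum)))  # sliding: [1, 3, 6, 9, 12, 15]
print(list(count_window(data, size=3, granularity=3, aggregate=sum)))  # tumbling (T = L): [6, 15]
```

With T = L, consecutive triggers see disjoint sets of items, which is why that case is called tumbling. In this sketch, the sliding case also fires on partially filled windows during warm-up. That is a design choice that real systems may make differently.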

  9. Synchronous Dataflow: StreamIt
     [Figure: the pipeline … → A → B → C → …, where A pushes 2 items per firing, B pops 3 and pushes 1 per firing, and C pops 2 per firing.]
       float->float pipeline ABC {
         add float->float filter A() {
           work pop … push 2 { … }
         }
         add float->float filter B() {
           work pop 3 push 1 { … }
         }
         add float->float filter C() {
           work pop 2 push … { … }
         }
       }
     → Statically known push/pop rates.
     Reference: StreamIt: A Compiler for Streaming Applications [MIT TR'05]
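
Each filter declares fixed rates, so its behavior can be simulated with nothing more than FIFO queues and item counts. The sketch below is plain Python, not StreamIt. It uses the slide's rates between A and B and between B and C. It assumes, hypothetically, that A pops 1 and C pushes 1, because those rates are elided on the slide. The work functions (duplicate, sum, max) are also placeholders.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Filter:
    name: str
    pop: int   # items consumed per firing
    push: int  # items produced per firing
    work: Callable[[List[float]], List[float]]

    def can_fire(self, inq: Deque[float]) -> bool:
        return len(inq) >= self.pop

    def fire(self, inq: Deque[float], outq: Deque[float]) -> None:
        inputs = [inq.popleft() for _ in range(self.pop)]
        outputs = self.work(inputs)
        if len(outputs) != self.push:
            raise ValueError(f"{self.name} pushed {len(outputs)}, declared {self.push}")
        outq.extend(outputs)


# A->B and B->C rates follow the slide; A's pop and C's push (elided there) are assumed to be 1.
A = Filter("A", pop=1, push=2, work=lambda xs: [xs[0], xs[0]])  # hypothetical: duplicate input
B = Filter("B", pop=3, push=1, work=lambda xs: [sum(xs)])       # hypothetical: sum of 3 items
C = Filter("C", pop=2, push=1, work=lambda xs: [max(xs)])       # hypothetical: max of 2 items


def run(source: List[float]) -> List[float]:
    """Repeatedly fire any filter with enough input until none can fire."""
    q_in: Deque[float] = deque(source)
    q_ab: Deque[float] = deque()
    q_bc: Deque[float] = deque()
    q_out: Deque[float] = deque()
    stages = [(A, q_in, q_ab), (B, q_ab, q_bc), (C, q_bc, q_out)]
    progress = True
    while progress:
        progress = False
        for f, inq, outq in stages:
            if f.can_fire(inq):
                f.fire(inq, outq)
                progress = True
    return list(q_out)


print(run([1.0, 2.0, 3.0]))  # [8.0]: A fired 3 times, B twice, C once
```

This dynamic "fire whenever ready" loop works without any compile-time analysis. The declared rates are what would let a compiler replace it with a fixed, precomputed schedule.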

  10. Synchronous Dataflow: StreamIt
      [Figure: the … → A → B → C → … pipeline (A pushes 2 and B pops 3; B pushes 1 and C pops 2) unrolled over time into its repeating firing schedule.]
      → Statically known firing schedule and FIFO queue sizes.
      Reference: Dynamic Expressivity with Static Optimization for Streaming Languages [DEBS'13]
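
A static schedule follows from balance equations: on every channel, the items pushed per steady-state period must equal the items popped. For the A → B → C pipeline of slide 9, this gives:

- 2·r_A = 3·r_B
- 1·r_B = 2·r_C

The smallest positive integer solution is r_A = 3, r_B = 2, r_C = 1.

The Python sketch below solves these equations for a linear pipeline given as (push, pop) pairs per channel. It is a textbook synchronous-dataflow computation, not StreamIt's compiler code. It does not handle splitjoins or feedback loops. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+
from typing import List, Tuple


def repetitions(channels: List[Tuple[int, int]]) -> List[int]:
    """Smallest positive firing counts for a linear pipeline.

    channels[i] = (push rate of stage i, pop rate of stage i + 1).
    Balance equation per channel: reps[i] * push == reps[i + 1] * pop.
    """
    reps = [Fraction(1)]
    for push, pop in channels:
        reps.append(reps[-1] * push / pop)
    scale = lcm(*(r.denominator for r in reps))
    ints = [int(r * scale) for r in reps]
    common = gcd(*ints)
    return [n // common for n in ints]


def single_appearance_buffers(channels: List[Tuple[int, int]], reps: List[int]) -> List[int]:
    """FIFO capacity per channel when each stage fires all its repetitions back to back."""
    return [push * r for (push, _), r in zip(channels, reps)]


abc = [(2, 3), (1, 2)]  # A pushes 2, B pops 3; B pushes 1, C pops 2
reps = repetitions(abc)
print(reps)                                  # [3, 2, 1]
print(single_appearance_buffers(abc, reps))  # [6, 2]
```

FIFO sizes depend on the chosen interleaving. The schedule A A A B B C needs 6 slots on A → B, while A A B A B C needs only 4. Either way, the sizes are computable before the program runs.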

  11. Explicit Stream Graph: SPL
      ibmstreams.github.io

  12. Explicit Stream Graph: SPL
      Kind of type   Type example                  Literal example
      Number         int32                         42
      String         ustring                       "Saarbrücken"
      Boolean        boolean                       true
      Enumeration    enum<error,info,trace>        LogLevel.info
      XML            xml<"schemaURI">              '<x a="b">55</x>'x
      Tuple          tuple<float64 x, float64 y>   {x=0.5, y=0.8}
      Map            map<ustring, int32>           {"Mon": -1, "Fri": 1}
      List           list<int32>                   [1, 2, 3]
      → Strongly and statically typed.
      → Composite types (tuple, map, list) can nest.
      → Streams can carry any tuple type.
      Reference: SPL: An Extensible Language for Distributed Stream Processing [TOPLAS'17]

  13. Complex Events: MatchRegex
      [Figure: M-shape (double-top) stock pattern: a series of rising peaks and troughs, followed by a deep drop below the start of the match.]
      Source: http://www.cs.cornell.edu/bigreddata/cayuga/

  14. Complex Events: MatchRegex
      [Figure: an annotated MatchRegex operator invocation. Its labeled parts are the simple events (input), the composite events (output), the regular expression, the key, and the aggregation.]
      → Operator only, no extensions to SPL syntax.
      Reference: Partition and Compose: Parallel Complex Event Processing [DEBS'12]
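
The idea behind MatchRegex can be sketched in plain Python: detect composite events by matching a regular expression over simple events, partitioned by key. This is not the MatchRegex operator's interface. The sketch makes several assumptions:

- Each simple event is classified as a rise (`u`) or a fall (`d`) relative to the previous price for the same key.
- It uses a simplified M-shape pattern, `u+d+u+d+`, and omits the slide's extra conditions on rising peaks and a deep final drop.
- The aggregation is the maximum price over the match.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Tick = Tuple[str, float]  # hypothetical simple event: (symbol, price)

# Simplified M-shape: rise, fall, rise, fall. A real double-top pattern would also
# check rising peaks and a deep final drop; those conditions are omitted here.
M_SHAPE = re.compile(r"u+d+u+d+")


def match_m_shapes(ticks: Iterable[Tick]) -> List[Tuple[str, float]]:
    """Emit one composite event (symbol, max price over the match) per detected M-shape."""
    prices: DefaultDict[str, List[float]] = defaultdict(list)  # partition by key
    moves: DefaultDict[str, str] = defaultdict(str)            # per-key "u"/"d" string
    composite = []
    for symbol, price in ticks:
        history = prices[symbol]
        if history:
            # Non-increasing moves count as falls in this sketch.
            moves[symbol] += "u" if price > history[-1] else "d"
        history.append(price)
        m = M_SHAPE.search(moves[symbol])
        if m:
            # moves[i] describes history[i] -> history[i + 1].
            matched = history[m.start(): m.end() + 1]
            composite.append((symbol, max(matched)))  # aggregation over the match
            prices[symbol] = [price]                  # restart matching after a hit
            moves[symbol] = ""
    return composite


ticks = [("IBM", 10), ("XYZ", 5), ("IBM", 12), ("XYZ", 6),
         ("IBM", 11), ("IBM", 13), ("IBM", 9)]
print(match_m_shapes(ticks))  # [('IBM', 13)]
```

Partitioning by key keeps XYZ's ticks from interfering with IBM's match. Re-running the regex over each key's whole move string is simple but costs linear time per event. A real operator would advance an automaton incrementally instead.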

  15. Reactive: ActiveSheets
      [Figure: a spreadsheet with scrolling regions for stream data and formulas such as =B3*C3 through =B10*C10, =SUM(C3:C10), =SUM(G3:G10), =A15, =C15, =C12/G12, and =B15<G15.]
      Reference: Stream Processing with a Spreadsheet [ECOOP'14] (Distinguished Paper Award)

  16. Reactive: ActiveSheets
      [Figure: stream data laid out along columns, time, rows, and sheets.]
      → Need more than two dimensions in practice.
      Reference: Spreadsheets for Stream Processing with Unbounded Windows and Partitions [DEBS'16]
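
The minimal Python sketch below illustrates the spreadsheet idea. It is not ActiveSheets' implementation, and it assumes the following:

- A scrolling region of 8 rows, mirroring rows 3 to 10 on slide 15.
- Hypothetical (quantity, price) columns.
- Per-row and aggregate formulas in the style of =B3*C3 and =SUM(...), recomputed reactively on every arriving tuple.
- A separate sheet per partition key, adding the extra dimension slide 16 calls for.

The names `Sheet`, `sheets`, and `on_tuple` are hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Dict, Tuple

ROWS = 8  # assumed height of the scrolling region (rows 3 to 10 on slide 15)
Row = Tuple[float, float]  # hypothetical columns: (quantity, price)


class Sheet:
    """One sheet: a scrolling region of rows plus formulas recomputed on every update."""

    def __init__(self) -> None:
        self.rows: Deque[Row] = deque(maxlen=ROWS)  # the oldest row scrolls off

    def append(self, row: Row) -> Dict[str, float]:
        self.rows.append(row)
        return self.recompute()

    def recompute(self) -> Dict[str, float]:
        products = [q * p for q, p in self.rows]            # per-row formula, like =B3*C3
        quantity = sum(q for q, _ in self.rows)              # column aggregate, like =SUM(...)
        total = sum(products)
        avg_price = total / quantity if quantity else 0.0    # ratio cell, like =C12/G12
        return {"total": total, "quantity": quantity, "avg_price": avg_price}


# One sheet per partition key.
sheets: DefaultDict[str, Sheet] = defaultdict(Sheet)


def on_tuple(key: str, quantity: float, price: float) -> Dict[str, float]:
    """Route an arriving tuple to its partition's sheet and return the updated cells."""
    return sheets[key].append((quantity, price))


on_tuple("IBM", 10, 150.0)
print(on_tuple("IBM", 30, 110.0))  # {'total': 4800.0, 'quantity': 40, 'avg_price': 120.0}
```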

  17. Controlled Natural Language: META
        --- data model ---
        a Client is a business entity identified by a name.
        a Client is related to a marketer (a Marketer).

        a Read Event is a business event time-stamped by a date.
        a Read Event is related to a client (a Client).
        a Read Event has a topic.
        a Read Event has a length (a number).

        --- agent descriptor ---
        'ClientRules' is an agent related to a Client,
        processing events:
          - Read Event, where this Client comes from the client of this Read Event

        --- event-condition-action rule ---
        when a Read Event occurs
        if
          the length of this Read Event is more than 'Average Read Event Length'
        then
          emit a new Alert where
            the client is 'the Client',
            the topic is the topic of this Read Event,
            the marketer is the marketer of 'the Client';

        --- global event query ---
        define 'Average Read Event Length' as
          the average length of all Read Events
          during the last period of 6 hours.
      Source: http://www.ibm.com/software/products/en/odm
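
A Python paraphrase of the event-condition-action rule makes its evaluation explicit. The sketch keeps a global six-hour window of Read Event lengths for 'Average Read Event Length' and checks each new Read Event against it. Several points are assumptions the slide does not settle:

- The average covers events strictly before the current one.
- Timestamps are in seconds.
- An event exactly six hours old has expired.
- No Alert fires when the window is empty.

All class and function names are hypothetical and are not ODM APIs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple

SIX_HOURS = 6 * 60 * 60  # seconds (assumed time unit)


@dataclass
class Client:
    name: str
    marketer: str


@dataclass
class ReadEvent:
    date: float  # timestamp in seconds
    client: Client
    topic: str
    length: float


@dataclass
class Alert:
    client: Client
    topic: str
    marketer: str


class AverageReadEventLength:
    """Global event query: average length of all Read Events in the last 6 hours."""

    def __init__(self) -> None:
        self.window: Deque[Tuple[float, float]] = deque()  # (date, length), oldest first
        self.total = 0.0

    def expire(self, now: float) -> None:
        while self.window and self.window[0][0] <= now - SIX_HOURS:
            _, length = self.window.popleft()
            self.total -= length

    def add(self, event: ReadEvent) -> None:
        self.window.append((event.date, event.length))
        self.total += event.length

    def value(self) -> Optional[float]:
        return self.total / len(self.window) if self.window else None


def on_read_event(event: ReadEvent, avg: AverageReadEventLength) -> Optional[Alert]:
    """when a Read Event occurs, if its length exceeds the average, emit a new Alert."""
    avg.expire(event.date)
    threshold = avg.value()  # average over earlier events only (assumption)
    avg.add(event)
    if threshold is not None and event.length > threshold:
        return Alert(client=event.client, topic=event.topic,
                     marketer=event.client.marketer)
    return None


alice = Client("Alice", marketer="Bob")
avg = AverageReadEventLength()
on_read_event(ReadEvent(0, alice, "news", 100.0), avg)   # no earlier events: no alert
on_read_event(ReadEvent(60, alice, "news", 50.0), avg)   # 50 is not above 100: no alert
print(on_read_event(ReadEvent(120, alice, "sports", 200.0), avg))
# Alert(client=Client(name='Alice', marketer='Bob'), topic='sports', marketer='Bob')
```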

  18. Controlled Natural Language: META
      [Figure: META runtime architecture. Its components are an event router with a shuffle implemented in X10, event agents and analytics agents hosted on three WXS shards, transactions, replication, exactly-once event processing, and actions as output.]
      Reference: META: Middleware for Events, Transactions, and Analytics [IBMRD'16]
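
The router's job can be read as key-based partitioning, sketched here under assumptions. Each 'ClientRules' agent on slide 17 is related to one Client. A router can therefore send every event to the shard that owns that event's client, so all of one client's events are processed in one place. The slide does not say how META assigns keys to WXS shards. The hash-based placement, the shard count, and the function names below are hypothetical. The sketch also ignores transactions, replication, and exactly-once delivery.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

NUM_SHARDS = 3  # the slide draws three WXS shards; the number is illustrative

Event = Tuple[str, str]  # hypothetical event: (client name, payload)


def shard_for(client_name: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable key-to-shard mapping; CRC32 stays the same across processes and runs."""
    return zlib.crc32(client_name.encode("utf-8")) % num_shards


def route(events: List[Event]) -> Dict[int, List[Event]]:
    """Shuffle events so that all events for one client land on the same shard, in order."""
    shards: DefaultDict[int, List[Event]] = defaultdict(list)
    for client, payload in events:
        shards[shard_for(client)].append((client, payload))
    return dict(shards)


print(route([("alice", "read#1"), ("bob", "read#2"), ("alice", "read#3")]))
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process by default. A router spread over several processes would then disagree about where a key lives.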

  19. Foundational Calculus: Brooklet
      → Pure opaque functions, explicit state.
      Reference: A Universal Calculus for Stream Processing Languages [ESOP'10]

  20. Foundational Calculus: Brooklet
      → Atomic steps, non-determinism, fire on any port.
      Reference: From a Calculus to an Execution Environment for Stream Processing [DEBS'12] (Best Paper Award)
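
Brooklet's execution model can be approximated in a few lines of Python. This is an informal paraphrase, not the calculus's formal semantics or notation:

- A program is a list of operators.
- Each operator is a pure function from an input item, its input port, and its explicit state variables to output items and new state.
- A step nondeterministically picks any input port whose queue holds data and fires that operator atomically.

The random choice stands in for non-determinism. The word-count operators and queue names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
# A pure operator function: (item, input port, state) -> (items per output queue, new state).
PureFn = Callable[[Any, int, State], Tuple[Dict[str, List[Any]], State]]


@dataclass
class Operator:
    inputs: List[str]      # input queue names; list index = port number
    fn: PureFn
    state_vars: List[str]  # the explicit state this operator reads and writes


def step(ops: List[Operator], queues: Dict[str, List[Any]],
         state: State, rng: random.Random) -> bool:
    """Perform one step; return False when no operator can fire."""
    ready = [(op, port, q) for op in ops
             for port, q in enumerate(op.inputs) if queues[q]]
    if not ready:
        return False
    op, port, q = rng.choice(ready)   # non-determinism: any ready port may fire
    item = queues[q].pop(0)           # consume the oldest item on that queue
    local = {v: state[v] for v in op.state_vars}
    outputs, new_local = op.fn(item, port, local)
    for out_q, items in outputs.items():
        queues[out_q].extend(items)
    state.update(new_local)           # write back the explicit state
    return True


def split(line: str, port: int, st: State) -> Tuple[Dict[str, List[Any]], State]:
    return {"words": line.split()}, st


def count(word: str, port: int, st: State) -> Tuple[Dict[str, List[Any]], State]:
    counts = dict(st["counts"])       # copy, so the function stays pure
    counts[word] = counts.get(word, 0) + 1
    return {}, {"counts": counts}


ops = [Operator(["lines"], split, []), Operator(["words"], count, ["counts"])]
queues: Dict[str, List[Any]] = {"lines": ["a b", "b c"], "words": []}
state: State = {"counts": {}}
rng = random.Random(0)
while step(ops, queues, state, rng):
    pass
print(state["counts"])  # {'a': 1, 'b': 2, 'c': 1}
```

Because each firing touches only its declared state and the queues behave as FIFOs, this particular program reaches the same final counts under every interleaving the random choices produce.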

  21. Democratization of Streaming
      [Figure: application domains (Telco, Medical, Science, Finance, …), a streaming engine with a high-level programming experience, and the resulting insights and actions.]
