
Partition and Compose: Parallel Complex Event Processing


  1. Partition and Compose: Parallel Complex Event Processing. Martin Hirzel, IBM Research. Tuesday, 17 July 2012, DEBS.

  2. CEP = Stream Processing?
     • Event (stream) processing: aggregate, enrich, filter, join, parse, …
     • Complex event processing: use a pattern over “simple events” to detect and report “composite events”
     → CEP as an operator in a streaming language?

  3. Background: SPL
     • IBM Streams Processing Language
     • SPL is the language for InfoSphere Streams (IBM product)
     • This paper is based on System S, the research branch of InfoSphere Streams

  4. Scenario: Financial analysis. M-shape (double-top) stock pattern: a series of rising peaks and troughs, followed by a deep drop below the start of the match. Source: http://www.cs.cornell.edu/bigreddata/cayuga/

  5. M-shape pattern in SPL. Callouts on the code: simple events, composite events, key, regular expression, aggregation. → Operator only, no extensions to SPL syntax

  6. Regular expressions → Pattern language familiar from string matching

  7. Aggregations → Operator-specific intrinsic functions

  8. Matching semantics
     • Standard regular expression semantics
     • Non-greedy (right-minimal)
     • Partition-isolated
     • (Partition-)contiguous
     • Non-overlapping (submit longest: left-maximal)

  9. Implementation overview
     • At compile time: the MatchRegex operator invocation (param, output) is fed to the MatchRegex operator generator, which produces the automaton
     • At runtime: upstream operator instance → simple events → MatchRegex operator instance → composite events → downstream operator instance
     → All C++ operators in SPL are code generators

  10. Automaton for the pattern ". rise+ drop+ rise+ drop* deep"
     [Figure: automaton with states 0 to 6 and transitions labeled ".", rise, drop, and deep]
     • Create new partial match
     • Update and filter partial match
     • Report completed match and flush
     → NFA (non-deterministic finite automaton)
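
The three actions on slide 10 can be illustrated with a small simulation. This is a minimal sketch in Python, not the generated C++ from the talk, and it handles a single partition only. The predicate definitions are assumptions inferred from slide 4: `rise` means the price is above the previous event's price, `drop` means it is below, and `deep` means it is below the price of the first event of the match. The state numbers and the names `TRANSITIONS`, `PartialMatch`, and `match_m_shape` are hypothetical and need not coincide with the slide's diagram.

```python
from dataclasses import dataclass

# Assumed transition table for ". rise+ drop+ rise+ drop* deep".
# State 1 is reached after "." consumed the first event; 6 is accepting.
TRANSITIONS = {
    1: [("rise", 2)],
    2: [("rise", 2), ("drop", 3)],
    3: [("drop", 3), ("rise", 4)],
    4: [("rise", 4), ("drop", 5), ("deep", 6)],
    5: [("drop", 5), ("deep", 6)],
}
ACCEPT = 6


@dataclass(frozen=True)
class PartialMatch:
    state: int
    start_index: int
    start_price: float
    prev_price: float


def holds(pred, pm, price):
    """Evaluate an (assumed) predicate against a partial match."""
    if pred == "rise":
        return price > pm.prev_price
    if pred == "drop":
        return price < pm.prev_price
    if pred == "deep":
        return price < pm.start_price
    raise ValueError(pred)


def match_m_shape(prices):
    """Return (start_index, end_index) pairs of non-overlapping matches."""
    matches = []
    partial = []  # ordered oldest start first
    for i, price in enumerate(prices):
        advanced = []
        completed = None
        for pm in partial:
            # Update and filter: keep only partial matches that can advance.
            for pred, nxt in TRANSITIONS[pm.state]:
                if not holds(pred, pm, price):
                    continue
                if nxt == ACCEPT:
                    # Oldest start wins, i.e. the longest match is reported.
                    if completed is None:
                        completed = (pm.start_index, i)
                else:
                    advanced.append(
                        PartialMatch(nxt, pm.start_index, pm.start_price, price)
                    )
        if completed is not None:
            # Report completed match and flush all partial matches.
            matches.append(completed)
            partial = []
        else:
            # Create new partial match: "." accepts any event as a start.
            advanced.append(PartialMatch(1, i, price, price))
            partial = advanced
    return matches
```

For the price series `[10, 12, 11, 13, 9]`, `match_m_shape` returns `[(0, 4)]`: the final price 9 falls below the starting price 10 after a rise, a drop, and another rise.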

  11. Partitioning
     [Figure: class diagram next to the automaton with states 0 to 6]
     • :PartitionMap maps each key to 0..* :PartialMatch objects
     • :PartialMatch fields: state, aggr
     • :SimpleEvent fields: ts, symbol, price, size, seqNum
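
To show what keeping matching state per key buys, here is a minimal sketch in Python. It deliberately uses a much simpler pattern than the M-shape (two consecutive rises within one partition), and the function name `detect_double_rise` and the `(symbol, price)` tuple layout are assumptions made for illustration only.

```python
def detect_double_rise(events):
    """Report (symbol, seq_num) whenever a symbol rises twice in a row.

    events is an iterable of (symbol, price) tuples. State is kept per
    symbol, so interleaved events of other symbols do not break a match.
    """
    partition_map = {}  # key -> (previous price, consecutive rises so far)
    composite = []
    for seq_num, (symbol, price) in enumerate(events):
        prev, rises = partition_map.get(symbol, (None, 0))
        if prev is not None and price > prev:
            rises += 1
        else:
            rises = 0
        if rises == 2:
            composite.append((symbol, seq_num))
            # Flush this partition so that matches do not overlap.
            partition_map.pop(symbol, None)
        else:
            partition_map[symbol] = (price, rises)
    return composite
```

With the interleaved input `[("IBM", 10), ("XYZ", 5), ("IBM", 11), ("XYZ", 4), ("IBM", 12)]` the sketch reports `[("IBM", 4)]`. Read as one unpartitioned stream, the prices 10, 5, 11, 4, 12 never rise twice in a row, so the match exists only because each key is matched in isolation.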

  12. Generated C++ code [Figure: generated code next to the automaton with states 0 to 6] → Incremental aggregation
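
The idea of incremental aggregation can be sketched as follows, in Python rather than the generated C++. The class `RunningAggregate` and its fields are hypothetical; the slides do not list the operator's actual aggregation functions. The point is only that each partial match can carry a constant-size summary that is updated once per event, instead of storing all matched events and re-scanning them when the match completes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningAggregate:
    """Constant-size summary of the values seen so far (illustrative)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")
    first: Optional[float] = None
    last: Optional[float] = None

    def update(self, value: float) -> None:
        """Fold one more value into the summary in O(1) time."""
        if self.count == 0:
            self.first = value
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.last = value

    @property
    def average(self) -> float:
        """Mean of the values seen so far; requires at least one value."""
        return self.total / self.count
```

A partial match would call `update` on each transition, so that the aggregate values are already available when the match is reported.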

  13. Parallelization
     • Sequential: up-stream operator → simple events → MatchRegex (with PartitionMap) → composite events → down-stream operator
     • Parallelized [Schneider et al., PACT’12]: up-stream operator → simple events → MatchRegex replicas 0, 1, 2 (each with its own PartitionMap) → composite events → down-stream operator
     • The symbol attribute of :SimpleEvent serves both as the key for the hash-split and as the key for the partition map
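
A minimal sketch of the hash-split on slide 13, assuming three replicas and events given as `(symbol, price)` tuples. The function names are hypothetical, and CRC32 is used only because it is a deterministic hash available in the Python standard library; the slides do not say which hash function the system uses.

```python
import zlib


def replica_for(key: str, width: int) -> int:
    """Pick a replica for a key; the same key always maps to the same replica."""
    return zlib.crc32(key.encode("utf-8")) % width


def hash_split(events, width):
    """Distribute (symbol, price) events over `width` replica input channels.

    Each event is tagged with its sequence number so that the original
    order can be recovered downstream.
    """
    channels = [[] for _ in range(width)]
    for seq_num, event in enumerate(events):
        symbol = event[0]
        channels[replica_for(symbol, width)].append((seq_num, event))
    return channels
```

Because the split key equals the partition key, every event of a given symbol reaches the same replica, so each replica's partition map sees the complete event sequence for the keys it owns.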

  14. Safety and determinism
     • SPL compiler checks …
       – Syntax and names in expressions
       – Expression and function types
     • MatchRegex operator checks …
       – Syntax and names in regular expression pattern
       – Starting predicate aggregation-free
     • Auto-parallelizer checks …
       – Partitioning
       – Absence of stateful expressions
       – Sequence numbers and pulses
     → Enables simple output validation with “diff”
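
One way sequence numbers can make parallel output deterministic is an ordered merge of the replica outputs. The sketch below is an assumption about the general technique, not a description of the auto-parallelizer's actual merge: it supposes that each replica emits `(seq_num, payload)` pairs in increasing sequence-number order, and it does not model pulses.

```python
import heapq


def ordered_merge(replica_outputs):
    """Merge per-replica outputs into one stream ordered by sequence number.

    replica_outputs is a list of lists of (seq_num, payload) pairs, each
    list already sorted by seq_num.
    """
    return list(heapq.merge(*replica_outputs, key=lambda item: item[0]))
```

If the merged stream has the same order as a sequential run, the two outputs can be compared line by line, which is what makes validation with a plain “diff” feasible.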

  15. Data sets … and benchmarks

  16. Absolute throughput in events per second → Large speedup when sequential throughput is low

  17. Speedups. Panels: 1 machine × 8 cores; 4 machines × 8 cores = 32 cores. → Motivates elasticity and auto-width controller

  18. Related work (rows ordered along a timeline from 2000 to today)
     Engine / language        Complex events          Parallelism
     NiagaraCQ / XML-QL       Algebraic               No
     SQL-TS                   Back-tracking           No
     Amit                     Back-tracking           No
     NFA^b / SASE             Automaton               No
     MATCH_RECOGNIZE          ANSI proposal           No
     EventScript              Automaton               No
     Cayuga / CEL             Automaton               Yes, by hand
     EventJava                Index data structures   Yes, per task
     [Woods, Teubner VLDB]    Automaton               Yes, on FPGA
     This paper               Automaton               Yes, partitioned

  19. Conclusions
     • CEP as an SPL operator
       – Use CEP for pattern matching
       – Use other operators for filtering, enrichment, parsing, joining, etc.
     • Up to 830K events/second
       – Incremental aggregation
       – C++ code generation
       – Parallelism (up to 14x speedup)

  20. Backup

  21. Shuffle in twitter02 and twitter03. Source → raw tweets as XML documents → ParseTweet (replicas 0, 1, 2) → tweets as simple events → MatchRegex (replicas 0, 1, 2) → composite events → down-stream operator
