Partition and Compose: Parallel Complex Event Processing
Martin Hirzel, IBM Research
Tuesday, 17 July 2012, DEBS
CEP = Stream Processing?
Event (stream) processing: Aggregate, Enrich, Filter, Join, Parse, …, and Complex Event Processing
Complex Event Processing: use a pattern over “simple events” to detect and report “composite events”
CEP as an operator in a streaming language?
Background: SPL
• SPL = IBM Streams Processing Language
• SPL is the language for InfoSphere Streams (an IBM product)
• This paper is based on System S, the research branch of InfoSphere Streams
Scenario: Financial analysis
M-shape (double-top) stock pattern
[Figure: price chart with callouts "Series of rising peaks and troughs" and "Deep drop below start of match"]
Source: http://www.cs.cornell.edu/bigreddata/cayuga/
M-Shape pattern in SPL
[Figure: SPL code listing with callouts for composite events, simple events, regular expression, key, and aggregation]
Operator only, no extensions to SPL syntax
Regular expressions
Pattern language familiar from string matching
Aggregations
Operator-specific intrinsic functions
Matching semantics
• Standard regular expression semantics
• Non-greedy (right-minimal)
• Partition-isolated
• (Partition-)Contiguous
• Non-overlapping (submit longest: left-maximal)
Implementation overview
[Figure, at compile time: MatchRegex operator invocation (param, output) → MatchRegex operator generator → automaton]
[Figure, at runtime: upstream operator instance → simple events → MatchRegex operator instance → composite events → downstream operator instance]
All C++ operators in SPL are code generators
Automaton
Pattern: . rise+ drop+ rise+ drop* deep
[Figure: NFA (non-deterministic finite automaton) with states 0 to 6 and transitions labeled ., rise, drop, and deep]
Callouts: create new partial match; update and filter partial match; report completed match and flush
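The three callouts on this slide (create, update and filter, report and flush) can be sketched as a small NFA simulation. This is only an illustration in Python, not the generated C++: the predicate definitions for rise, drop, and deep are assumptions inferred from the scenario slide, the transition table is read off the pattern, and the names `PartialMatch` and `match_m_shape` are hypothetical.

```python
from dataclasses import dataclass

# Assumed predicates: each compares the new price with the first and the
# most recent price of a partial match. Not taken from the slides.
PREDICATES = {
    ".":    lambda first, last, price: True,
    "rise": lambda first, last, price: price >= first and price > last,
    "drop": lambda first, last, price: price >= first and price < last,
    "deep": lambda first, last, price: price < first,
}

# Transitions for the pattern ". rise+ drop+ rise+ drop* deep".
TRANSITIONS = {
    0: [(".", 1)],
    1: [("rise", 2)],
    2: [("rise", 2), ("drop", 3)],
    3: [("drop", 3), ("rise", 4)],
    4: [("rise", 4), ("drop", 5), ("deep", 6)],
    5: [("drop", 5), ("deep", 6)],
}
FINAL_STATE = 6


@dataclass(frozen=True)
class PartialMatch:
    state: int    # current automaton state
    start: int    # index of the first event in this partial match
    first: float  # price of the first event
    last: float   # price of the most recent event


def match_m_shape(prices):
    """Return (start, end) index pairs of non-overlapping matches."""
    partial, matches = [], []
    for i, price in enumerate(prices):
        advanced = []
        # Update and filter: keep only partial matches that can advance.
        for pm in partial:
            for symbol, target in TRANSITIONS[pm.state]:
                if PREDICATES[symbol](pm.first, pm.last, price):
                    advanced.append(PartialMatch(target, pm.start, pm.first, price))
        # Create: every event may start a new partial match via ".".
        advanced.append(PartialMatch(1, i, price, price))
        completed = [pm for pm in advanced if pm.state == FINAL_STATE]
        if completed:
            # Report the longest completed match, then flush everything.
            longest = min(completed, key=lambda pm: pm.start)
            matches.append((longest.start, i))
            partial = []
        else:
            partial = advanced
    return matches
```

Under these assumed predicates, the price series 10, 12, 14, 11, 13, 15, 12, 9 rises, drops, rises again, and finally falls below its starting price, so it is reported as one match covering all eight events.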
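Partitioning
[Figure: object diagram in which a :PartitionMap maps each key to 0..* :PartialMatch objects; each :SimpleEvent supplies the key through its symbol attribute. Field labels shown: ts, symbol, price, size, seqNum, state, aggr. The automaton from the previous slide is shown alongside.]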
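One way to picture the partition map is as a dictionary from key to per-key matching state, so that events with different keys never touch each other's partial matches. The sketch below is an assumption-laden illustration in Python: `RisingRun` is a hypothetical toy matcher standing in for the automaton, and the class and method names are not from the slides.

```python
from collections import defaultdict


class RisingRun:
    """Toy per-partition matcher (hypothetical stand-in for the automaton):
    reports when `length` strictly rising prices arrive back to back."""

    def __init__(self, length=3):
        self.length = length
        self.run = []

    def on_event(self, price):
        # A non-rising price breaks contiguity, so the run restarts here.
        if self.run and price <= self.run[-1]:
            self.run = []
        self.run.append(price)
        if len(self.run) == self.length:
            done, self.run = self.run, []
            return done
        return None


class PartitionMap:
    """Keeps one independent matcher per key (for example, per stock symbol)."""

    def __init__(self, make_matcher):
        self.partitions = defaultdict(make_matcher)

    def on_event(self, key, price):
        result = self.partitions[key].on_event(price)
        return None if result is None else (key, result)


def run_all(events):
    """Feed (key, price) events through a PartitionMap and collect the matches."""
    pmap = PartitionMap(RisingRun)
    outputs = (pmap.on_event(key, price) for key, price in events)
    return [out for out in outputs if out is not None]
```

In this sketch the interleaved XYZ events do not break the IBM run, which is the partition-isolated, partition-contiguous behavior listed under matching semantics.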
Generated C++ code
Incremental aggregation
[Figure: generated C++ code listing shown next to the automaton]
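Incremental aggregation means that a partial match carries running aggregate values that are updated on each transition, rather than a buffer of all matched events. A minimal sketch in Python, assuming count, first, last, and maximum price as the aggregates (the field names are hypothetical and the actual generated C++ is not reproduced here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregates:
    """Running aggregates for one partial match (hypothetical field names)."""
    count: int
    first_price: float
    last_price: float
    max_price: float

    @staticmethod
    def start(price):
        # A partial match that has consumed exactly one event.
        return Aggregates(1, price, price, price)

    def extend(self, price):
        # Constant-time update per event; no matched events are stored.
        return Aggregates(
            self.count + 1,
            self.first_price,
            price,
            max(self.max_price, price),
        )
```

Because each update returns a new immutable value, two partial matches that fork from the same prefix can share it safely, which suits a non-deterministic automaton.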
Parallelization
Before: Up-stream operator → simple events → MatchRegex (PartitionMap) → composite events → Down-stream operator
Parallelize: Schneider et al. [PACT’12]
After: Up-stream operator → simple events → MatchRegex replicas 0, 1, 2 (one PartitionMap each) → composite events → Down-stream operator
The :SimpleEvent attribute symbol is the key for the hash-split and the key for the partition map
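The hash-split can be sketched as routing each event by a hash of its key, so that every event for a given key reaches the same replica and that replica's partition map sees the complete sub-stream for the key. This Python sketch is an assumption-based illustration: the use of CRC32 and the function names are hypothetical, and the sequence numbers are attached at the split only to show where they could originate.

```python
import zlib


def replica_for(key, num_replicas):
    """Pick a replica from a stable hash of the key.
    CRC32 is an arbitrary choice for this sketch; Python's built-in
    hash() of strings varies between processes, so it is avoided."""
    return zlib.crc32(key.encode("utf-8")) % num_replicas


def hash_split(events, num_replicas):
    """Distribute (key, payload) events over per-replica queues,
    tagging each event with its sequence number in the input stream."""
    queues = [[] for _ in range(num_replicas)]
    for seq, (key, payload) in enumerate(events):
        queues[replica_for(key, num_replicas)].append((seq, key, payload))
    return queues
```

Using the same attribute for the split and for the partition map is what keeps the replicas independent: no key is ever shared between two replicas.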
Safety and determinism
• SPL compiler checks …
  – Syntax and names in expressions
  – Expression and function types
• MatchRegex operator checks …
  – Syntax and names in regular expression pattern
  – Starting predicate aggregation-free
• Auto-parallelizer checks …
  – Partitioning
  – Absence of stateful expressions
  – Sequence numbers and pulses
Enables simple output validation with “diff”
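The role of sequence numbers and pulses can be illustrated with an ordered merge: outputs from the replicas are released strictly in sequence order, and a pulse marks a sequence number that produced no output so the merge does not stall. The Python sketch below rests on that assumed reading of the slide; `OrderedMerger` and its methods are hypothetical names, not the actual runtime interface.

```python
class OrderedMerger:
    """Releases replica outputs in sequence-number order (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 0   # next sequence number allowed to leave the merger
        self.pending = {}   # seq -> output, or None for a pulse

    def receive(self, seq, output=None):
        """Accept an output (or a pulse when output is None) and return
        every output that can now be released in order."""
        self.pending[seq] = output
        released = []
        while self.next_seq in self.pending:
            out = self.pending.pop(self.next_seq)
            if out is not None:
                released.append(out)
            self.next_seq += 1
        return released
```

With such a merge, the parallel output order equals the sequential output order regardless of replica timing, which is what makes a plain "diff" against the sequential run a usable check.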
Data sets …
… and benchmarks
Absolute throughput in events per second
Large speedup when sequential throughput is low
Speedups
1 machine x 8 cores
4 machines x 8 cores = 32 cores
Motivates elasticity and auto-width controller
Related work (listed from 2000 at the top to today at the bottom)
Engine / language | Complex events | Parallelism
NiagaraCQ / XML-QL | Algebraic | No
SQL-TS | Back-tracking | No
Amit | Back-tracking | No
NFA^b / SASE | Automaton | No
MATCH_RECOGNIZE | ANSI proposal | No
EventScript | Automaton | No
Cayuga / CEL | Automaton | Yes, by hand
EventJava | Index data structures | Yes, per task
[Woods, Teubner VLDB] | Automaton | Yes, on FPGA
This paper | Automaton | Yes, partitioned
Conclusions
• CEP as an SPL operator
  – Use CEP for pattern matching
  – Use other operators for filtering, enrichment, parsing, joining, etc.
• Up to 830K events/second
  – Incremental aggregation
  – C++ code generation
  – Parallelism (up to 14x speedup)
Backup
Shuffle in twitter02 and twitter03
Source → raw tweets as XML documents → ParseTweet (replicas 0, 1, 2) → tweets as simple events → MatchRegex (replicas 0, 1, 2) → composite events → Down-stream operator