


  1. Uncertainty-Aware Event Analytics over Distributed Settings
     13th ACM International Conference on Distributed & Event-Based Systems (DEBS ’19), 28 June 2019, Darmstadt, Germany
     Nikos Giatrakos §†, Alexander Artikis *‡, Antonios Deligiannakis §†, Minos Garofalakis §†
     § Athena Research & Innovation Center, * University of Piraeus, † Technical University of Crete, ‡ NCSR Demokritos

  2. (Geo)Distributed Architectures
     ~30 B connected devices by 2022 [Cisco VNI ’18]
     Several data generation technologies:
     • Smart Cities, Smart Grids, Smart Houses
     • Industry 4.0, Smart Factories
     • Telecom Infrastructure
     • Banking Infrastructure
     • Social Networks
     • Wearables
     • …

  3. Big (Event) Data Challenges: 1-D B4 4-Vs
     • Distribution: massively distributed data streams → need to reduce communication
     • Volume and Velocity: NETWORK BOUND [e.g. Zeitler & Risch, PVLDB 2011; Karimov et al., ICDE 2018]
     • Veracity (Uncertainty): imprecise attribute values; uncertain event occurrence; rules applied at a certain level of confidence; event forecasting, approximation
     • Variety: various devices produce diverse data formats
     References:
     E. Zeitler and T. Risch. Massive scale-out of expensive continuous queries. PVLDB, 4(11):1181–1188, 2011.
     J. Karimov et al. Benchmarking Distributed Stream Data Processing Systems. ICDE, 1507–1518, 2018.

  4. Big (Event) Data Challenges: 1-D B4 4-Vs
     • Distribution: massively distributed data streams → need to reduce communication
     • Volume and Velocity: NETWORK BOUND [e.g. Zeitler & Risch, PVLDB 2011; Karimov et al., ICDE 2018]
     • Veracity (Uncertainty): imprecise attribute values; uncertain event occurrence; rules applied at a certain level of confidence; event forecasting, approximation
     This Work: handling Distribution + Uncertainty
     → Boost manageable Volume and Velocity
     → Extract Value (Event Analytics) out of Big Event Data

  5. Our Contributions
     Generic tools for integration in the FERARI scalable event-analytics platform prototype
     • Tool 1: In-situ Processing. In-situ filter installation “safely” avoids communication.
     • Tool 2: Monitoring Protocol. Incorporates in-situ filters and orchestrates event detection over the distributed setting.
     [Figure: clusters, a mobile device, a machine, and a sensor, each performing in-situ processing]
     References:
     I. Flouris et al. FERARI: A Prototype for Complex Event Processing over Streaming Multi-cloud Platforms. SIGMOD, 2093–2096, 2016.
     I. Flouris et al. Complex event processing over streaming multi-cloud platforms: the FERARI approach. DEBS, 348–349, 2016.

  6. What Kind of Event-Analytics?
     Event data:
     • SDEs: Simple Derived Events (updates on AGGR_j)
     Target queries / CE detection (CEs: Complex Event Patterns):
         PATTERN NON_AGGR (AGGR_1 > T_1, …, AGGR_m > T_m) Q
         [WHERE conditions]
         [PARTITION BY key]
         HAVING Q.Certainty > C
         WITHIN window_const
     • AGGRegation (thresholded): SUM, COUNT, AVG, etc., lying above/below a threshold T
     • NON_AGGRegative operator:
       AND: logical conjunction
       OR: logical disjunction
       SEQ: time-ordered conjunction
     • C: (un)certainty/confidence threshold

  7. Case Study: Mobile Fraud Detection
     2-tiered architecture:
     • Coordinator: query source
     • N sites: antennas
     Q1: FrequentToVoIPCalls
         PATTERN (COUNT(CDR) > T) Q1
         WHERE CDR.prefix = VoIP
         PARTITION BY CDR.callerID
         HAVING Q1.Certainty > C
         WITHIN Y minutes
     CDR = Call Detail Record. Each VoIP call is fraudulent with probability p.
     SDE stream (with p):
       caller  callee  call start time   duration  p
       62      23      11:10:23  05-10   22        0.41
       38      45      11:10:24  05-10   21        0.43
       34      22      11:10:23  05-10   13        0.41
       83      19      11:10:25  05-10   6         0.42
       10      22      11:10:24  05-10   6         0.4
       34      41      11:10:24  05-10   9         0.41
     Second CDR table on the slide (no p column):
       caller  callee  call start time   duration
       62      23      11:10:23  May-10  22
       38      45      11:10:24  May-10  21
       34      22      11:10:23  May-10  13
       83      19      11:10:25  May-10  5
       10      22      11:10:24  May-10  6
       18      26      11:10:24  May-10  7
       26      30      11:10:24  May-10  8
       34      41      11:10:24  May-10  9
     [Figure labels: Coordinator; Antenna Sites; Smartphone Users Commute; Call Status updates]

  8. Uncertainty-Aware In-situ Filters
     Basic concept: suppress communication if no CEs can be produced
     Random variable (R.V.) $X \equiv AGGR \in \{COUNT, SUM, \ldots\}$
     Global filter @ coordinator:
       $1 - CDF[X, T] = P[X > T] \le C$
     In-situ filters @ each site $A_i$ (N antennas), R.V. $X_i \equiv AGGR_i$:
       If $X = \sum_i X_i$ → $CDF_i[X_i, T/N] \ge \sqrt[N]{1-C}$
       If $X = \prod_i X_i$ → $CDF_i[X_i, \sqrt[N]{T}] \ge \sqrt[N]{1-C}$
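Why the in-situ thresholds are safe can be spelled out in one step. The derivation below is only a sketch for the sum case: it assumes the per-site variables $X_i$ are independent, which the slide does not state explicitly. The product case follows the same pattern with $\sqrt[N]{T}$ in place of $T/N$, under the additional assumption that every $X_i$ is non-negative.

```latex
% Assumption (not stated on the slide): X_1, ..., X_N are independent.
\begin{align*}
\Pr[X \le T]
  &\ge \Pr\Big[\bigcap_{i=1}^{N} \{X_i \le T/N\}\Big]
     && \text{since } X_i \le T/N \text{ for all } i \implies \textstyle\sum_i X_i \le T \\
  &= \prod_{i=1}^{N} \mathrm{CDF}_i[X_i, T/N]
     && \text{by independence} \\
  &\ge \big(\sqrt[N]{1-C}\big)^{N} = 1 - C .
\end{align*}
% Hence Pr[X > T] = 1 - Pr[X <= T] <= C: while every in-situ filter holds,
% the global filter holds as well and no CE can be produced.
```

Under these assumptions, a site only needs to contact the coordinator when its own filter is violated.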

  9. Decomposable Probability Distributions

  10. Case Study: Mobile Fraud Detection
      Q1: FrequentToVoIPCalls
          PATTERN (COUNT(CDR) > T) Q1
          WHERE CDR.prefix = VoIP
          PARTITION BY CDR.callerID
          HAVING Q1.Certainty > C
          WITHIN Y minutes
      CDR = Call Detail Record. Each VoIP call is fraudulent with probability p ~ Bernoulli[p].
      $n_i$ calls @ $A_i$; $n = \sum_i n_i$ total calls for a subscriber; $X = \sum_i X_i$
      $X_i \equiv COUNT_i \sim Binomial[n_i, p]$ → $X \equiv COUNT \sim Binomial[n, p]$
      Global filter @ coordinator:
        $1 - CDF_{Binomial}[X, T] \le C$
      In-situ filters @ each site $A_i$:
        $CDF_{Binomial}[X_i, T/N] \ge \sqrt[N]{1-C}$
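The two Binomial filters on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the FERARI implementation: the function names are hypothetical, the CDF is computed directly from the Binomial probability mass function using only the standard library, and the parameter values in the usage note are invented for the example.

```python
from math import comb, floor


def binom_cdf(k: float, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial[n, p]; k may be fractional (e.g. T/N)."""
    k = floor(k)
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def global_filter_holds(n: int, p: float, T: float, C: float) -> bool:
    """Coordinator check: 1 - CDF_Binomial[X, T] <= C, with X ~ Binomial[n, p]."""
    return 1.0 - binom_cdf(T, n, p) <= C


def local_filter_holds(n_i: int, p: float, T: float, N: int, C: float) -> bool:
    """Site check: CDF_Binomial[X_i, T/N] >= (1 - C)^(1/N), with X_i ~ Binomial[n_i, p]."""
    return binom_cdf(T / N, n_i, p) >= (1.0 - C) ** (1.0 / N)
```

With the hypothetical values N = 3, T = 30, C = 0.9 and p = 0.4, `local_filter_holds(20, 0.4, 30, 3, 0.9)` is true, so a site with 20 cached calls stays silent, while `local_filter_holds(40, 0.4, 30, 3, 0.9)` is false, so a site with 40 calls would have to contact the coordinator.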

  11. 3-Phase Monitoring Protocol
      Initialization Phase (coordinator):
      1. Estimate the PDF if it is not already known
      2. Set X ~ PDF(·)
      3. Transmit X_i ~ PDF(·) to each site A_i
      4. Go to Monitoring Phase
      Monitoring Phase (sites):
      • $CDF_i \ge \sqrt[N]{1-C}$ ⇒ A_i caches relevant events
      • $CDF_i < \sqrt[N]{1-C}$ ⇒ A_i moves to the Synchronization Phase
      [Figure: coordinator above the sites; each site is marked with $CDF_i \ge \sqrt[N]{1-C}$ or $CDF_i < \sqrt[N]{1-C}$; sites transmit SDEs]

  12. 3-Phase Monitoring Protocol
      Synchronization Phase (coordinator):
      1. Request cached events from sites $A_1, \cdots, A_N$
      2.1 SyncCase A, when $\Pr[X > T] > C$ [global filter violated]:
          2.1.1 Produce CEs, receive new events
          2.1.2 Go to 2.1
          2.1.3 If $\Pr[X > T] \le C$ [global filter holds], go to Initialization Phase
      2.2 SyncCase B, when $\Pr[X > T] \le C$:
          2.2.1 Slack allocation
          2.2.2 Go to Monitoring Phase
      Slack allocation: adaptively increase or decrease the $\sqrt[N]{1-C}$ threshold for each site
      [Figure: coordinator above the sites; sites transmit the SDEs]
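The interplay of the monitoring and synchronization phases can be sketched as a small single-process simulation for the Binomial COUNT case. This is a simplified illustration under stated assumptions, not the FERARI code: `monitor_until_sync` and its arguments are hypothetical names, every arriving SDE is reduced to the index of the site that observed it, and the run stops at the first synchronization, so slack allocation and the return to the initialization phase are left out.

```python
from math import comb, floor


def binom_cdf(k: float, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial[n, p]; k may be fractional (e.g. T/N)."""
    k = floor(k)
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def monitor_until_sync(stream, n_sites, p, T, C):
    """Run the monitoring phase over `stream` (one site index per SDE).

    Returns (case, counts): `case` is "SyncCase A", "SyncCase B", or None
    when no in-situ filter was violated; `counts` holds the number of
    events cached at each site at that point.
    """
    bound = (1.0 - C) ** (1.0 / n_sites)  # in-situ bound (1 - C)^(1/N)
    counts = [0] * n_sites
    for site in stream:
        counts[site] += 1  # the site caches the event; nothing is transmitted
        if binom_cdf(T / n_sites, counts[site], p) < bound:
            # In-situ filter violated: the coordinator requests all cached
            # events and evaluates the global filter on the total count.
            total = sum(counts)
            exceed = 1.0 - binom_cdf(T, total, p)  # Pr[X > T]
            return ("SyncCase A" if exceed > C else "SyncCase B"), counts
    return None, counts


if __name__ == "__main__":
    # Hypothetical parameters: 3 sites, p = 0.4, T = 30, C = 0.9;
    # all 100 calls arrive at site 0.
    print(monitor_until_sync([0] * 100, 3, 0.4, 30, 0.9))
```

In this run site 0 stays silent for its first calls and triggers a synchronization only once its own filter fails; because the total count is then still at most T, the global filter holds and the outcome is SyncCase B.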

  13. Implementation in FERARI Platform @ distributed architecture
      [Figure labels: Coordinator + CEP Optimizer; Site Configurations; runtime statistics; FERARI Inter-site Orchestration; real-time input streams; Output]
      References:
      I. Flouris et al. FERARI: A Prototype for Complex Event Processing over Streaming Multi-cloud Platforms. SIGMOD, 2093–2096, 2016.
      I. Flouris et al. Complex event processing over streaming multi-cloud platforms: the FERARI approach. DEBS, 348–349, 2016.

  14. Implementation in FERARI Platform @ each site
      • Each site runs an Apache Storm topology
      • Supports any CEP engine
      • Current implementations:
        ProtonOnStorm – IBM Haifa: https://github.com/ishkin/Proton
        Esper: http://www.espertech.com/esper/
      • Bridging the gap between two prototypes!
      [Figure labels: Input; Communicator; GateKeeper; CEP Engine; Time Machine; Output]

  15. Traditional Implementation in Proton
      • Only @ coordinator [Correia et al., DEBS 2015]
      • No parallelism
      • Naive central data collection at the coordinator
      Reference:
      I. Correia et al. The uncertain case of credit card fraud detection. DEBS, 181–192, 2015.

  16. FERARI Implementation @ coordinator
      • Parallel processing in Apache Storm
      • Monitoring protocol for network orchestration
      • No support for uncertainty

  17. FERARI Implementation @ each site
      • Parallel processing in Apache Storm
      • Monitoring protocol for network orchestration
      • No support for uncertainty

  18. This Work: Uncertainty-aware FERARI @ coordinator
      • Parallel processing in Apache Storm
      • Monitoring protocol for network orchestration
      • Support for uncertainty

  19. This Work: Uncertainty-aware FERARI @ each site
      • Parallel processing in Apache Storm
      • Monitoring protocol for network orchestration
      • Support for uncertainty

  20. Evaluation Results
      Experimental setup:
      • 160M calls from [Flouris et al., SIGMOD 2016]
      • N = 3 to N = 10
      • C = 0.9 to C = 0.5
      Competitors:
      • This Work
      • FERARI + uncertainty-aware coordinator
      • Naïve central data collection (omitted)
      Highlights:
      • N = 3, C = 0.9 → an order of magnitude fewer transmitted messages
      • On average 4 times fewer transmitted messages across various N and C
      • N → 10 or C → 0.5: no gains
      Recall: $CDF_i[X_i, T/N] \ge \sqrt[N]{1-C}$. As N increases:
      • $\sqrt[N]{1-C} \to 1$
      • $T/N \to 0$
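The loss of gains for large N or small C can be seen by tabulating the two quantities in the in-situ filter. The snippet below is a small sketch with hypothetical helper names and an arbitrary threshold T = 30 (the slides do not give the value used in the experiments).

```python
def local_bound(n_sites: int, certainty: float) -> float:
    """Per-site CDF bound (1 - C)^(1/N) from the in-situ filter."""
    return (1.0 - certainty) ** (1.0 / n_sites)


def local_threshold(T: float, n_sites: int) -> float:
    """Per-site share T/N of the global threshold (sum case)."""
    return T / n_sites


if __name__ == "__main__":
    T = 30  # hypothetical global threshold, for illustration only
    for C in (0.9, 0.5):
        for N in (3, 5, 10):
            print(f"C={C} N={N:2d}  bound={local_bound(N, C):.3f}  "
                  f"T/N={local_threshold(T, N):.1f}")
```

For C = 0.9 the bound rises from about 0.464 at N = 3 to about 0.794 at N = 10, and for C = 0.5 it is already about 0.794 at N = 3. A site must therefore keep an ever larger share of its probability mass below an ever smaller T/N, so its filter is violated sooner and less communication is saved.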
