Accommodating Bursts in Distributed Stream Processing Systems

Yannis Drougas, Vana Kalogeraki. Distributed Real-time Systems Lab, University of California, Riverside. {drougas,vana}@cs.ucr.edu, http://www.cs.ucr.edu/~{drougas,vana}


  1. Accommodating Bursts in Distributed Stream Processing Systems
     Yannis Drougas, Vana Kalogeraki
     Distributed Real-time Systems Lab, University of California, Riverside
     {drougas,vana}@cs.ucr.edu
     http://www.cs.ucr.edu/~{drougas,vana}

  2. Stream Processing Applications
     • A large class of emerging applications in which data streams must be processed online
     • Example applications include:
       – Stock exchange data filtering
       – Traffic monitoring
       – Surveillance
       – Sensor network data processing
       – Network monitoring

  3. Distributed Stream Processing Systems
     • High-volume, continuous input streams
     • Processed result streams
     • Online processing functions / continuous query operators implemented on each node: clustering, correlation, filtering, aggregation, ...

  4. Stream Processing Application Characteristics
     [Figure: operators labeled Clustering, Join, Filtering]
     • Data is produced continuously, in large volumes and at high rates
     • Data has to be processed in a timely manner, e.g. within a deadline
     • Application input rates fluctuate notably and abruptly

  5. Previous Work
     • Most previous work [FIT, MPRA, Brown@VLDB 2006, Amini@ICDCS 2006, Xia@ICDCS 2007] has focused on optimizing a given utility function
       – Some solutions [FIT] employ data admission
       – Others [MPRA, Brown@VLDB 2006] consider the optimal placement of tasks on nodes
     • The case where load patterns can be predicted has also been studied [Borealis@ICDE 2005]
     • QoS management [RTStream] is another approach

  6. Our Problem
     • We focus on handling bursts in the input data rate
       – Devise a plan to accommodate the burst
       – Provision for future bursts
     • Benefits:
       – Data units lost due to bursts are minimized
       – No QoS degradation or data admission
       – No under-utilization: resources are reserved dynamically
     • Challenges:
       – Highly dynamic / unpredictable environment
       – Multiple limiting resource types
       – The plan must be applied in time for the burst

  7. Roadmap
     • Motivation and Background
     • System Architecture
     • Burst Handling Mechanism
       – Feasible Region & Index Points
       – Application-based Reservation
       – Online system adjustment
     • Experimental Evaluation
     • Conclusion

  8. System Architecture
     • Application layer: execution of stream processing applications
     • Overlay network: consists of processing nodes and is built over a DHT (currently Pastry)
     • The physical (IP) network

  9. Application Execution
     • A stream processing application is executed collaboratively by peers of the system that invoke the appropriate services
     • A service can be instantiated on more than one node
     • A service instantiation on a node is a component
     [Figure: "Application submitted by the user" (src, s, dest) and "Application executed on the system" (src, c1, c2, dest)]

  10. System Architecture
      [Figure: architecture diagram; labels: Streams, Services, Application Execution and Burst Handling, Components, Scheduling, Monitoring, Discovery, Instantiation, Operating System]

  11. System Model
      • Each component $c_i$ is characterized by its resource requirements:
        \[ u^{c_i} = \begin{pmatrix} u^{c_i}_1 \\ u^{c_i}_2 \\ \vdots \\ u^{c_i}_J \end{pmatrix} \]
      • Selectivity is another component characteristic:
        \[ sel_{c_i} = \frac{\text{average output rate}}{\text{average input rate}} \]
      • Each node $n$ is characterized by the availability of its resources:
        \[ A^n = \begin{pmatrix} A^n_1 \\ A^n_2 \\ \vdots \\ A^n_J \end{pmatrix} \]
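The model on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class names `Component` and `Node`, the field names, and the example numbers are all assumptions introduced here, and the requirement vector is interpreted as the amount of each resource needed per data unit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """A service instantiated on a node (hypothetical representation)."""
    name: str
    requirements: List[float]  # u^{c_i}_j, assumed to be per data unit, one entry per resource j
    avg_input_rate: float      # average input rate (data units per msec)
    avg_output_rate: float     # average output rate (data units per msec)

    @property
    def selectivity(self) -> float:
        # sel_{c_i} = average output rate / average input rate
        return self.avg_output_rate / self.avg_input_rate


@dataclass
class Node:
    """A processing node with J resource types (hypothetical representation)."""
    name: str
    availability: List[float]  # A^n_j, one entry per resource j


# Illustrative values only: a filter that forwards half of its input.
filt = Component("filter", requirements=[4.0, 0.5],
                 avg_input_rate=0.1, avg_output_rate=0.05)
node = Node("A", availability=[1.0, 1.0])
print(filt.name, "selectivity:", filt.selectivity)
```

Keeping requirements and availability as equal-length lists mirrors the J-dimensional vectors on the slide, so a per-resource comparison is a simple element-wise loop.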

  12. Roadmap
      • Motivation and Background
      • System Architecture
      • Burst Handling Mechanism
        – Feasible Region & Index Points
        – Application-based Reservation
        – Online system adjustment
      • Experimental Evaluation
      • Conclusion

  13. Optimization Problem
      • Capacity constraints:
        \[ \sum_{c_i \in n} r_{c_i} \cdot u^{c_i}_j \le A^n_j \qquad \forall n \in N,\ \forall j,\ 1 \le j \le J \]
      • Flow conservation constraints:
        \[ \sum_{c_j \in D(c_i)} r_{c_j} = sel_{c_i} \cdot r_{c_i} \qquad \forall c_i \]
      • We need to come up with a plan that satisfies the above constraints and minimizes the likelihood of missing data due to bursts
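As a rough illustration, the two constraint families can be checked for a candidate rate assignment as below. The function names, the dictionary layout, and the numbers are assumptions made for this sketch, and $D(c_i)$ is taken to mean the components that receive the output of $c_i$, which the slide does not spell out.

```python
from typing import Dict, List

EPS = 1e-9  # tolerance for floating-point comparisons


def capacity_ok(rates: Dict[str, float],
                requirements: Dict[str, List[float]],
                placement: Dict[str, List[str]],
                availability: Dict[str, List[float]]) -> bool:
    """Check sum_{c in n} r_c * u^c_j <= A^n_j for every node n and resource j."""
    for node, comps in placement.items():
        for j, avail in enumerate(availability[node]):
            load = sum(rates[c] * requirements[c][j] for c in comps)
            if load > avail + EPS:
                return False
    return True


def flow_ok(rates: Dict[str, float],
            selectivity: Dict[str, float],
            downstream: Dict[str, List[str]]) -> bool:
    """Check sum_{c_j in D(c_i)} r_{c_j} == sel_{c_i} * r_{c_i} for every listed c_i."""
    for c, succ in downstream.items():
        if succ and abs(sum(rates[d] for d in succ) - selectivity[c] * rates[c]) > EPS:
            return False
    return True


# Illustrative values: one source feeding two instances of the same service.
rates = {"src": 0.2, "c1": 0.1, "c2": 0.1}
requirements = {"c1": [4.0], "c2": [6.0]}
placement = {"A": ["c1"], "B": ["c2"]}
availability = {"A": [1.0], "B": [1.0]}
selectivity = {"src": 1.0}
downstream = {"src": ["c1", "c2"]}

print(capacity_ok(rates, requirements, placement, availability),
      flow_ok(rates, selectivity, downstream))
```

A plan in the sense of the slide would be a rate assignment for which both checks pass; the sketch only verifies a given assignment and does not search for one.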

  14. Feasible Region
      • Assume we have Q applications
      • The state of the system at any given time can be described by a point in the Q-dimensional space
      [Figure: a point p in the plane of application rates; x-axis: rate of application 1 (ADUs/msec); y-axis: rate of application 2 (ADUs/msec)]

  15. Feasible Region
      • The feasible region is the set of all points (combinations of application input rates) that the nodes of the given distributed stream processing system can accommodate without any data unit being dropped
      • The linear form of the constraints suggests that, in the general case of Q applications, the feasible region is a convex polytope
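A short derivation of why linear constraints give a convex polytope, using the symbols of the capacity and flow conservation constraints. The set names $\mathcal{R}$ and $\mathcal{F}$ and the map $\pi$ are notation introduced here for illustration, and nonnegative rates are an assumption.

```latex
\[
\mathcal{R} = \Bigl\{\, r \ge 0 \;:\;
  \sum_{c_i \in n} r_{c_i}\, u^{c_i}_j \le A^n_j \;\; \forall n, j,
  \qquad
  \sum_{c_j \in D(c_i)} r_{c_j} = sel_{c_i}\, r_{c_i} \;\; \forall c_i
\,\Bigr\}
\]
\[
\mathcal{F} = \pi(\mathcal{R}), \qquad
\pi : r \mapsto (r_1, \ldots, r_Q)
\quad \text{(component rates mapped to the $Q$ application input rates)}
\]
```

$\mathcal{R}$ is an intersection of finitely many half-spaces and hyperplanes, so it is a convex polyhedron. If each application input rate is a linear function of the component rates (for example, the sum of the rates of the components fed by its source), then $\pi$ is linear and $\mathcal{F}$ is again a convex polyhedron. If, in addition, every component consumes a positive amount of at least one resource, the capacity constraints bound every rate, so $\mathcal{F}$ is bounded, i.e. a polytope.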

  16. Feasible Region - Example
      • Application 1: src1 → s1 → dest1
      • Application 2: src2 → s2 → dest2

  17. Feasible Region - Example
      • Application 1: src1 → s1 → dest1; Application 2: src2 → s2 → dest2
      • Node A: c1 (u = 4 ms), c2 (u = 6 ms)
      • Node B: c3 (u = 6.67 ms), c4 (u = 10 ms)
      • dest1: u = 4 ms; dest2: u = 5 ms

  18. Feasible Region - Example
      • Application 1: src1 → s1 → dest1; Application 2: src2 → s2 → dest2
      • Node A: c1 (u = 4 ms), c2 (u = 6 ms)
      • Node B: c3 (u = 6.67 ms), c4 (u = 10 ms)
      • dest1: u = 4 ms; dest2: u = 5 ms
      • Constraints:
        \[ r_{c_1} \cdot 4 + r_{c_2} \cdot 6 \le 1 \]
        \[ r_{c_3} \cdot 6.67 + r_{c_4} \cdot 10 \le 1 \]
        \[ r_{dest_1} \cdot 4 \le 1 \]
        \[ r_{dest_2} \cdot 5 \le 1 \]
        \[ r_{dest_1} = r_{c_1} + r_{c_3} \]
        \[ r_{dest_2} = r_{c_2} + r_{c_4} \]

  19. Feasible Region - Example
      • Application 1: src1 → s1 → dest1; Application 2: src2 → s2 → dest2
      • Node A: c1 (u = 4 ms), c2 (u = 6 ms)
      • Node B: c3 (u = 6.67 ms), c4 (u = 10 ms)
      • dest1: u = 4 ms; dest2: u = 5 ms
      • Constraints:
        \[ r_{c_1} \cdot 4 + r_{c_2} \cdot 6 \le 1 \]
        \[ r_{c_3} \cdot 6.67 + r_{c_4} \cdot 10 \le 1 \]
        \[ r_{dest_1} \cdot 4 \le 1 \]
        \[ r_{dest_2} \cdot 5 \le 1 \]
        \[ r_{dest_1} = r_{c_1} + r_{c_3} \]
        \[ r_{dest_2} = r_{c_2} + r_{c_4} \]
      [Figure: the feasible region in the plane of application rates, bounded by the capacity constraints of nodes A and B, the dest1 capacity constraint and the dest2 capacity constraint; x-axis: rate of application 1 (ADUs/msec); y-axis: rate of application 2 (ADUs/msec)]
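The example's constraints can be turned into a small membership test for the feasible region. The sketch below is one possible way to do it, not the method used in the slides: for given application rates it shifts as much load as possible onto node A (greedily, largest saving on node B per unit of load on node A first, which is optimal for a single budget constraint) and then checks whether node B can absorb the rest. The function and constant names are hypothetical.

```python
EPS = 1e-9  # tolerance for floating-point comparisons

# Per-data-unit processing times (msec) taken from the example.
U_NODE_A = {"app1": 4.0, "app2": 6.0}    # c1, c2
U_NODE_B = {"app1": 6.67, "app2": 10.0}  # c3, c4
U_DEST = {"app1": 4.0, "app2": 5.0}      # dest1, dest2


def is_feasible(r1: float, r2: float) -> bool:
    """Return True if input rates (r1, r2), in ADUs/msec, satisfy all constraints."""
    rates = {"app1": r1, "app2": r2}
    # Destination constraints: r_dest * u_dest <= 1.
    if any(rates[a] * U_DEST[a] > 1.0 + EPS for a in rates):
        return False
    # Load on node B if node A processed nothing.
    load_b = sum(rates[a] * U_NODE_B[a] for a in rates)
    # Shift traffic to node A, whose total load may not exceed 1.
    budget_a = 1.0
    for a in sorted(rates, key=lambda k: U_NODE_B[k] / U_NODE_A[k], reverse=True):
        cost_a = rates[a] * U_NODE_A[a]  # load on A if all of this application ran there
        moved = min(cost_a, budget_a)    # load actually placed on A
        load_b -= moved * U_NODE_B[a] / U_NODE_A[a]
        budget_a -= moved
    return load_b <= 1.0 + EPS


for point in [(0.1, 0.1), (0.2, 0.1), (0.2, 0.2), (0.3, 0.0)]:
    print(point, is_feasible(*point))
```

Evaluating `is_feasible` on a grid of points is a simple way to reproduce the shape of the region shown in the figure.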

  20. Dominance & Pareto Points
      • A point $p_1$ dominates a point $p_2$ when, for each application q, $p_1(q) \ge p_2(q)$
      • If $p_1$ dominates $p_2$ and the current system state is $p_2$, one can apply the rate allocations calculated for $p_1$
      • A Pareto point is not dominated by any other point in the feasible region
      • Pareto points represent optimal solutions: there is no point that is "better" than a Pareto point
      [Figure: points p1 to p5 and p in the plane of application rates; x-axis: rate of application 1 (ADUs/msec); y-axis: rate of application 2 (ADUs/msec)]
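The dominance test translates directly into code. The sketch below filters a finite list of candidate points, whereas the definition on the slide ranges over the whole feasible region; the function names and sample coordinates are assumptions made for illustration and are not read off the figure.

```python
from typing import List, Sequence, Tuple


def dominates(p1: Sequence[float], p2: Sequence[float]) -> bool:
    """p1 dominates p2 when p1(q) >= p2(q) for every application q."""
    return all(a >= b for a, b in zip(p1, p2))


def pareto_points(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the candidates that no other (different) candidate dominates."""
    return [p for p in points
            if not any(o != p and dominates(o, p) for o in points)]


# Illustrative candidate rate vectors (ADUs/msec) for two applications.
candidates = [(0.05, 0.2), (0.15, 0.15), (0.25, 0.05), (0.1, 0.1), (0.05, 0.05)]
print(pareto_points(candidates))
```

Because rate allocations computed for a dominating point also serve any point it dominates, keeping only the non-dominated candidates loses no coverage.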

  21. Burst Handling
      • If the input rate of application q increases by $\delta_q$, the input rate of a component $c_i$ of q will increase by $\delta_q \cdot r_{c_i} / r_q$
      • For a stream processing system to be able to sustain such an increase, the following must hold for each node n and each resource j:
        \[ \underbrace{\sum_{c_i \in n} r_{c_i} \cdot u^{c_i}_j}_{\text{initial resource requirements}} + \underbrace{\sum_{c_i \in n \cap C_q} \frac{\delta_q \cdot r_{c_i}}{r_q} \cdot u^{c_i}_j}_{\text{additional resource requirements due to single burst}} \le A^n_j \]
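Rearranging the inequality for $\delta_q$ gives, per node and resource, the largest burst that still fits: the remaining slack divided by the extra load per unit of $\delta_q$. The sketch below takes the minimum over all nodes and resources. It assumes the current state already satisfies the capacity constraints; the function name and data layout are hypothetical, and the numbers reuse the per-unit costs of nodes A and B from the feasible region example with an arbitrary even split of traffic.

```python
import math
from typing import Dict, List, Set


def max_burst(q_rate: float,
              rates: Dict[str, float],
              requirements: Dict[str, List[float]],
              placement: Dict[str, List[str]],
              availability: Dict[str, List[float]],
              app_components: Set[str]) -> float:
    """Largest delta_q for which every node and resource still satisfies the constraint."""
    best = math.inf
    for node, comps in placement.items():
        for j, avail in enumerate(availability[node]):
            # Initial resource requirements of all components on this node.
            initial = sum(rates[c] * requirements[c][j] for c in comps)
            # Additional requirement per unit of delta_q, from components of application q.
            per_unit = sum(rates[c] / q_rate * requirements[c][j]
                           for c in comps if c in app_components)
            if per_unit > 0:
                best = min(best, (avail - initial) / per_unit)
    return best


# Application 1 (rate 0.1) is split evenly over c1 and c3; application 2 over c2 and c4.
rates = {"c1": 0.05, "c2": 0.025, "c3": 0.05, "c4": 0.025}
requirements = {"c1": [4.0], "c2": [6.0], "c3": [6.67], "c4": [10.0]}
placement = {"A": ["c1", "c2"], "B": ["c3", "c4"]}
availability = {"A": [1.0], "B": [1.0]}

delta_1 = max_burst(0.1, rates, requirements, placement, availability, {"c1", "c3"})
print(delta_1)
```

In this toy configuration node B is the bottleneck, which shows why the constraint has to be checked on every node rather than only on the most loaded one in absolute terms.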

  22. Optimization Objective
      • To minimize the amount of dropped data, we wish to maximize $\sum_q \delta_q$
      • If the current input rates are represented by p, we need to configure the system for:
        \[ p' = p + \delta = \begin{pmatrix} r_1 + \delta_1 \\ \vdots \\ r_Q + \delta_Q \end{pmatrix} \]
      • We assume a burst is equally likely to appear in each application, so the $\delta_q$'s must be as equal as possible:
        \[ \delta = \begin{pmatrix} \delta_1 \\ \vdots \\ \delta_Q \end{pmatrix} \approx \begin{pmatrix} c \\ \vdots \\ c \end{pmatrix} \;\Rightarrow\; p' \approx \begin{pmatrix} r_1 + c \\ \vdots \\ r_Q + c \end{pmatrix} \]
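With equal increases, the target point lies on the ray from p in the direction (1, ..., 1). One simple way to find the largest such c, assuming only a yes/no feasibility test is available, is bisection; this works because the feasible region is convex, so the feasible values of c form an interval starting at 0. This is a sketch of that idea and not the mechanism presented in the slides; `largest_equal_increase` and `toy_region` are hypothetical names, and the toy region is made up for the demonstration.

```python
from typing import Callable, Sequence


def largest_equal_increase(p: Sequence[float],
                           is_feasible: Callable[[Sequence[float]], bool],
                           hi: float = 1.0,
                           tol: float = 1e-6) -> float:
    """Bisection for the largest c such that p + (c, ..., c) is feasible."""
    if not is_feasible(p):
        raise ValueError("the current rates p must be feasible")
    if is_feasible([r + hi for r in p]):
        raise ValueError("hi must be large enough to leave the feasible region")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_feasible([r + mid for r in p]):
            lo = mid   # mid is still inside the region
        else:
            hi = mid   # mid overshoots the boundary
    return lo


def toy_region(p: Sequence[float]) -> bool:
    """A made-up two-application convex region used only for this demonstration."""
    r1, r2 = p
    return (r1 >= 0 and r2 >= 0 and r1 <= 0.25 and r2 <= 0.2
            and 2 * r1 + 3 * r2 <= 0.8)


c = largest_equal_increase([0.05, 0.05], toy_region)
print(round(c, 4))
```

The returned c is a lower bound within `tol` of the boundary, so the configured point p' stays inside the feasible region.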
