Accommodating Bursts in Distributed Stream Processing Systems
Yannis Drougas, Vana Kalogeraki
Distributed Real-time Systems Lab
University of California, Riverside
{drougas,vana}@cs.ucr.edu
http://www.cs.ucr.edu/~{drougas,vana}
Stream Processing Applications
• A large class of emerging applications in which data streams must be processed online
• Example applications include:
– Stock exchange data filtering
– Traffic monitoring
– Surveillance
– Sensor network data processing
– Network monitoring
Distributed Stream Processing Systems
[Figure: high-volume, continuous input streams enter the system; processed result streams leave it]
• On-line processing functions / continuous query operators are implemented on each node: clustering, correlation, filtering, aggregation, ...
Stream Processing Application Characteristics
[Figure: example operators: clustering, join, filtering]
• Data is produced continuously, in large volumes and at high rates
• Data has to be processed in a timely manner, e.g. within a deadline
• Application input rates fluctuate notably and abruptly
Previous Work
• The majority of previous work [FIT, MPRA, Brown@VLDB 2006, Amini@ICDCS 2006, Xia@ICDCS 2007] has focused on optimizing a given utility function
– Some solutions [FIT] employ data admission
– Others [MPRA, Brown@VLDB 2006] consider the optimal placement of tasks on nodes
• The case where load patterns can be predicted has also been studied [Borealis @ ICDE 2005]
• QoS management [RTStream] is another solution
Our Problem
• We focus on handling bursts in the input data rate
– Devise a plan to absorb the burst
– Provision for future bursts
• Benefits:
– Data units lost due to bursts are minimized
– No QoS degradation or data admission
– No under-utilization, since reservation is dynamic
• Challenges:
– Highly dynamic / unpredictable environment
– Multiple limiting resource types
– The plan must be applied in time for the burst
Roadmap
• Motivation and Background
• System Architecture
• Burst Handling Mechanism
– Feasible Region & Index Points
– Application-based Reservation
– Online system adjustment
• Experimental Evaluation
• Conclusion
System Architecture
• Application layer: execution of stream processing applications.
• Overlay network: consists of the processing nodes and is built over a DHT (currently Pastry).
• Physical (IP) network.
Application Execution
• A stream processing application is executed collaboratively by peers of the system that invoke the appropriate services.
• A service can be instantiated on more than one node.
• A service instantiation on a node is a component.
[Figure: the application submitted by the user (src, service s, dest) next to the application as executed on the system, where s is instantiated as components c1 and c2]
System Architecture
[Figure: node software architecture. Labels: Streams, Services, Components; Application Execution and Burst Handling; Scheduling, Monitoring, Discovery, Instantiation; Operating System]
System Model
• Each component $c_i$ is characterized by its resource requirements, with one entry per resource type $j = 1, \dots, J$:
$$u_{c_i} = \begin{pmatrix} u_{c_i}^{1} \\ u_{c_i}^{2} \\ \vdots \\ u_{c_i}^{J} \end{pmatrix}$$
• Selectivity is another component characteristic:
$$sel_{c_i} = \frac{\text{average output rate}}{\text{average input rate}}$$
• Each node $n$ is characterized by the availability of its resources:
$$A_n = \begin{pmatrix} A_n^{1} \\ A_n^{2} \\ \vdots \\ A_n^{J} \end{pmatrix}$$
Optimization Problem
• Capacity constraints:
$$\sum_{c_i \in n} r_{c_i} \cdot u_{c_i}^{j} \le A_n^{j} \qquad \forall n \in N,\ 1 \le j \le J$$
• Flow conservation constraints:
$$\sum_{c_j \in D(c_i)} r_{c_j} = sel_{c_i} \cdot r_{c_i} \qquad \forall c_i$$
• We need a plan that satisfies the above constraints and minimizes the likelihood of missing data due to bursts.
Feasible Region
• Assume we have Q applications
• The state of the system at any given time can be described by a point in the Q-dimensional space
[Figure: a point p in the plane of application input rates. x-axis: rate of application 1 (ADUs / msec), 0 to 0.25; y-axis: rate of application 2 (ADUs / msec), 0 to 0.2]
Feasible Region
• The feasible region is the set of all points (combinations of application input rates) that the nodes of the given distributed stream processing system can accommodate without dropping any data unit.
• Because the constraints are linear, in the general case of Q applications the feasible region is a convex polytope.
Feasible Region - Example
• Application 1: src1, s1, dest1. Application 2: src2, s2, dest2.
• Each service is instantiated on both nodes:
– Node A: c1 (instance of s1, u = 4 ms) and c2 (instance of s2, u = 6 ms)
– Node B: c3 (instance of s1, u = 6.67 ms) and c4 (instance of s2, u = 10 ms)
– dest1: u = 4 ms; dest2: u = 5 ms
• Capacity constraints of nodes A and B:
$$r_{c_1} \cdot 4 + r_{c_2} \cdot 6 \le 1$$
$$r_{c_3} \cdot 6.67 + r_{c_4} \cdot 10 \le 1$$
• Capacity constraints of dest1 and dest2:
$$r_{dest_1} \cdot 4 \le 1$$
$$r_{dest_2} \cdot 5 \le 1$$
• Flow conservation:
$$r_{dest_1} = r_{c_1} + r_{c_3}$$
$$r_{dest_2} = r_{c_2} + r_{c_4}$$
[Figure: the feasible region, bounded by the capacity constraints of nodes A and B, the dest1 capacity constraint and the dest2 capacity constraint. x-axis: rate of application 1 (ADUs / msec), 0 to 0.25; y-axis: rate of application 2 (ADUs / msec), 0 to 0.2]
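The constraints of this example can be checked mechanically for a given rate allocation. The following is a minimal sketch, not the authors' implementation: the names `UNIT_COST_MS`, `HOSTS` and `is_feasible` are hypothetical, and dest1 and dest2 are assumed to run on hosts of their own with capacity 1, as their separate constraints suggest.

```python
# Per-data-unit cost u (ms) of each component, taken from the example slide.
UNIT_COST_MS = {
    "c1": 4.0, "c2": 6.0,      # node A
    "c3": 6.67, "c4": 10.0,    # node B
    "dest1": 4.0, "dest2": 5.0,
}

# Which components share a capacity constraint (assumption: the two
# destinations each run on their own host).
HOSTS = {
    "A": ["c1", "c2"],
    "B": ["c3", "c4"],
    "dest1": ["dest1"],
    "dest2": ["dest2"],
}


def is_feasible(rates, eps=1e-9):
    """Return True if the rates (ADUs/msec) of c1..c4 satisfy every constraint."""
    full = dict(rates)
    # Flow conservation: each destination receives the output of both instances.
    full["dest1"] = rates["c1"] + rates["c3"]
    full["dest2"] = rates["c2"] + rates["c4"]
    if any(r < 0 for r in full.values()):
        return False
    # Capacity: the load placed on each host must not exceed 1.
    return all(
        sum(full[c] * UNIT_COST_MS[c] for c in comps) <= 1.0 + eps
        for comps in HOSTS.values()
    )


ok = is_feasible({"c1": 0.10, "c2": 0.05, "c3": 0.05, "c4": 0.05})
overloaded = is_feasible({"c1": 0.20, "c2": 0.10, "c3": 0.0, "c4": 0.0})
print(ok, overloaded)  # True False
```

The second allocation fails because node A would need 0.2 · 4 + 0.1 · 6 = 1.4 units of capacity.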
Dominance & Pareto Points
• A point $p_1$ dominates a point $p_2$ when, for each application $q$, $p_1(q) \ge p_2(q)$
• If the current system state is $p_2$, one can apply the rate allocations calculated for $p_1$, since $p_1$ accommodates a rate at least as high for every application
• A Pareto point is not dominated by any other point in the feasible region
• Pareto points represent optimal solutions: there is no point that is "better" than a Pareto point
[Figure: points p and p1 to p5 in the plane of application input rates. x-axis: rate of application 1 (ADUs / msec), 0 to 0.25; y-axis: rate of application 2 (ADUs / msec), 0 to 0.2]
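The dominance test translates directly into code. The sketch below applies it to a finite list of candidate points, which is a simplification of the continuous feasible region; `dominates`, `pareto_points` and the sample coordinates are hypothetical and are not read off the slide's plot.

```python
def dominates(p1, p2):
    """p1 dominates p2 when p1(q) >= p2(q) for every application q."""
    return all(a >= b for a, b in zip(p1, p2))


def pareto_points(points):
    """Keep the points that no other, different point in the list dominates."""
    return [
        p for p in points
        if not any(q != p and dominates(q, p) for q in points)
    ]


# Hypothetical candidates: (rate of application 1, rate of application 2).
candidates = [(0.05, 0.20), (0.10, 0.15), (0.05, 0.10), (0.25, 0.05)]
front = pareto_points(candidates)
```

Here `(0.05, 0.10)` is dropped because `(0.10, 0.15)` is at least as high in both coordinates; the other three points are mutually non-dominated.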
Burst Handling
• If the input rate of application $q$ increases by $\delta_q$, the input rate of a component $c_i$ of $q$ will increase by $\delta_q \cdot \frac{r_{c_i}}{r_q}$
• For a stream processing system to sustain such an increase, the following must hold for each node $n$ and resource $j$, where $C_q$ is the set of components of application $q$:
$$\underbrace{\sum_{c_i \in n} r_{c_i} \cdot u_{c_i}^{j}}_{\text{initial resource requirements}} + \underbrace{\sum_{c_i \in n \cap C_q} \frac{\delta_q \cdot r_{c_i}}{r_q} \cdot u_{c_i}^{j}}_{\text{additional resource requirements due to single burst}} \le A_n^{j}$$
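Solving the inequality above for δ_q on each node gives the largest single burst that the current allocation can absorb. The sketch below assumes a single resource type (J = 1) and reuses the component costs of the earlier example with a hypothetical rate allocation; `max_burst` and the dictionary layout are illustrative names, not part of the original system.

```python
def max_burst(q, r_q, components, capacity):
    """Largest delta_q that every node can absorb (single resource type).

    components: list of dicts with keys "node", "app", "rate", "cost".
    capacity:   maps a node name to its availability A_n.
    Returns float("inf") if application q places no load on any node.
    """
    best = float("inf")
    for node, a_n in capacity.items():
        here = [c for c in components if c["node"] == node]
        # Initial resource requirements on this node.
        load = sum(c["rate"] * c["cost"] for c in here)
        # Share of that load caused by application q; it scales with delta_q / r_q.
        load_q = sum(c["rate"] * c["cost"] for c in here if c["app"] == q)
        if load_q > 0:
            best = min(best, r_q * (a_n - load) / load_q)
    return max(best, 0.0)


# Hypothetical allocation: application 1 runs at 0.15 ADUs/msec in total.
components = [
    {"node": "A", "app": 1, "rate": 0.10, "cost": 4.0},
    {"node": "A", "app": 2, "rate": 0.05, "cost": 6.0},
    {"node": "B", "app": 1, "rate": 0.05, "cost": 6.67},
    {"node": "B", "app": 2, "rate": 0.05, "cost": 10.0},
    {"node": "dest1", "app": 1, "rate": 0.15, "cost": 4.0},
    {"node": "dest2", "app": 2, "rate": 0.10, "cost": 5.0},
]
capacity = {"A": 1.0, "B": 1.0, "dest1": 1.0, "dest2": 1.0}
delta_1 = max_burst(1, 0.15, components, capacity)
```

With these numbers node B is the bottleneck: it has 1 − 0.8335 = 0.1665 spare capacity, of which application 1 consumes 0.3335 per unit of relative rate increase, so δ_1 is roughly 0.075 ADUs/msec.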
Optimization Objective
• To minimize the amount of dropped data, we wish to maximize $\sum_q \delta_q$
• If the current input rates are represented by $p$, we need to configure the system for:
$$p' = p + \delta = \begin{pmatrix} r_1 + \delta_1 \\ \vdots \\ r_Q + \delta_Q \end{pmatrix}$$
• We assume a burst is equally likely to appear in each application, so the $\delta_q$'s must be as equal as possible:
$$\delta = \begin{pmatrix} \delta_1 \\ \vdots \\ \delta_Q \end{pmatrix} = \begin{pmatrix} c \\ \vdots \\ c \end{pmatrix} \;\Rightarrow\; p' = \begin{pmatrix} r_1 + c \\ \vdots \\ r_Q + c \end{pmatrix}$$
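One way to picture the equal-increment objective is to look for the largest c such that p + (c, ..., c) still lies in the feasible region. The sketch below assumes the region is already available as linear inequalities over the application rates, which the slides do not state; `max_equal_increment` and the three half-spaces are hypothetical values chosen only to resemble the two-application plot.

```python
def max_equal_increment(p, halfspaces):
    """Largest c >= 0 such that p + (c, ..., c) satisfies every inequality.

    halfspaces: list of (g, h) pairs meaning sum(g[k] * x[k]) <= h.
    Assumes p itself is feasible.
    """
    c = float("inf")
    for g, h in halfspaces:
        slack = h - sum(gk * pk for gk, pk in zip(g, p))
        growth = sum(g)  # how fast the left-hand side grows per unit of c
        if growth > 0:
            c = min(c, slack / growth)
    return max(c, 0.0)


# Hypothetical region: r1 <= 0.25, r2 <= 0.2, 2*r1 + 3*r2 <= 0.8.
region = [((1, 0), 0.25), ((0, 1), 0.20), ((2, 3), 0.80)]
p = (0.10, 0.05)
c = max_equal_increment(p, region)
p_prime = tuple(r + c for r in p)
```

For this region the third inequality binds first: its slack at p is 0.45 and it grows by 5 per unit of c, giving c = 0.09 and a target point of about (0.19, 0.14).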