Replica Placement Pipeline Workflows In-network Stream Processing Conclusion
[Figure: Mimi alone at home]
Problems and questions: Where to download from? Where to place the replicas? How to deal with multiple users? Heterogeneity.
Veronika.Sonigo@ens-lyon.fr July 7, 2009 Mapping and Scheduling of Workflow Applications 5/ 48
Introduction and motivation: Replica placement in tree networks
Set of clients (tree leaves): requests with QoS or bandwidth constraints, known in advance
Internal nodes may be provided with a replica; in this case they become servers and process requests (up to their capacity limit)
Research questions: Total replica cost? How many replicas required? Quality of Service? Which locations?
Rule of the game: handle all client requests and minimize the cost of the replicas → Replica Placement problem
Several policies to assign replicas:
Closest: each client is served by the first replica on its path to the root
Upwards: each client is served by a single replica, which may be any node on its path to the root
Multiple: the requests of a client may be split among several replicas on its path to the root
[Figure: example tree with W = 10 showing the assignment obtained with Closest, Upwards and Multiple]
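To make the Closest policy concrete, here is a minimal Python sketch, assuming the tree is given as a hypothetical `parent` map (child → parent) and each internal node `s` has a capacity `W[s]`. The helper names `closest_servers` and `closest_is_feasible` are illustrative only; the sketch does not cover Upwards or Multiple, where a client may choose among several servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def closest_servers(parent: Dict[str, str], clients: Iterable[str],
                    replicas: Set[str]) -> Dict[str, Optional[str]]:
    """Closest policy: each client is served by the first replica on its path to the root."""
    server: Dict[str, Optional[str]] = {}
    for c in clients:
        v = parent.get(c)              # start at the client's parent
        while v is not None and v not in replicas:
            v = parent.get(v)          # climb towards the root
        server[c] = v                  # None means no replica above this client
    return server


def closest_is_feasible(parent: Dict[str, str], r: Dict[str, int],
                        replicas: Set[str], W: Dict[str, int]) -> bool:
    """True if every client has a Closest server and no server exceeds its capacity."""
    server = closest_servers(parent, r.keys(), replicas)
    if any(s is None for s in server.values()):
        return False
    load: Dict[str, int] = defaultdict(int)
    for c, s in server.items():
        load[s] += r[c]
    return all(load[s] <= W[s] for s in load)


# Hypothetical instance: n2 is a child of n1; clients c1, c2 below n2, c3 below n1.
parent = {"c1": "n2", "c2": "n2", "c3": "n1", "n2": "n1"}
r = {"c1": 4, "c2": 5, "c3": 3}
W = {"n1": 10, "n2": 10}
print(closest_is_feasible(parent, r, {"n1"}, W))        # False: 12 requests > W = 10
print(closest_is_feasible(parent, r, {"n1", "n2"}, W))  # True: loads 9 on n2, 3 on n1
```

Adding a replica at n2 splits the load under Closest, since c1 and c2 now stop at n2 instead of climbing to n1.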
Major contributions
Theory: new access policies; problem complexity; LP-based optimal solution for the cost of Replica Placement
Practice: heuristics for each policy; experiments to assess the impact of the new policies; experiments to assess the impact of QoS on the different policies
Definitions and notations
Distribution tree $T$, clients $C$ (leaf nodes), internal nodes $N$
Client $i \in C$: sends $r_i$ requests per time unit (number of accesses to a single object database); quality of service $q_i$ (response time)
Node $j \in N$: may contain a replica of the object database (it is then a server) or not; processing capacity $W_j$; storage cost $sc_j$
Tree edge $l \in L$ (communication link between nodes): communication time $comm_l$; bandwidth limit $BW_l$
Problem instances
Minimize $\sum_{s \in R} sc_s$, where $R \subseteq N$ is the set of nodes holding a replica and $r_{i,s}$ is the number of requests of client $i$ processed by server $s$, under the constraints:
Server capacity – $\forall s \in R,\ \sum_{i \in C \mid s \in \mathrm{Servers}(i)} r_{i,s} \le W_s$
QoS – $\forall i \in C,\ \forall s \in \mathrm{Servers}(i),\ \sum_{l \in \mathrm{path}[i \to s]} comm_l \le q_i$
Link capacity – $\forall l \in L,\ \sum_{i \in C,\ s \in \mathrm{Servers}(i) \mid l \in \mathrm{path}[i \to s]} r_{i,s} \le BW_l$
Restricting to the case $sc_s = W_s$ gives the Replica Counting problem on homogeneous platforms and the Replica Cost problem with heterogeneous servers.
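The three constraints can be checked mechanically for a given assignment. The sketch below is an illustrative checker, not the authors' formulation: it assumes links are named after their lower endpoint, that an assignment is a hypothetical dict `assign[(i, s)] = r_{i,s}`, and that all requests of each client must be handled (the rule of the game). The names `path_links` and `placement_cost` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def path_links(client: str, server: str, parent: Dict[str, str]) -> List[str]:
    """Links of path[client -> server]; a link is named after its lower endpoint."""
    links, v = [], client
    while v != server:
        if v not in parent:
            raise ValueError(f"{server} is not an ancestor of {client}")
        links.append(v)               # link between v and parent[v]
        v = parent[v]
    return links


def placement_cost(assign: Dict[Tuple[str, str], int], parent: Dict[str, str],
                   r: Dict[str, int], q: Dict[str, int], W: Dict[str, int],
                   sc: Dict[str, int], comm: Dict[str, int],
                   BW: Dict[str, int]) -> Optional[int]:
    """assign[(i, s)] = r_{i,s}. Returns the sum of sc_s over used servers, or None if infeasible."""
    handled: Dict[str, int] = defaultdict(int)
    for (i, _), x in assign.items():
        handled[i] += x
    if any(handled[i] != r[i] for i in r):              # every request must be handled
        return None
    load: Dict[str, int] = defaultdict(int)
    link_load: Dict[str, int] = defaultdict(int)
    for (i, s), x in assign.items():
        if x == 0:
            continue
        links = path_links(i, s, parent)
        if sum(comm[l] for l in links) > q[i]:          # QoS
            return None
        load[s] += x
        for l in links:
            link_load[l] += x
    if any(load[s] > W[s] for s in load):               # server capacity
        return None
    if any(link_load[l] > BW[l] for l in link_load):    # link capacity
        return None
    return sum(sc[s] for s in load)                     # R = servers actually used


# Hypothetical instance: n2 below n1, client c1 below n2, client c2 below n1.
parent = {"c1": "n2", "n2": "n1", "c2": "n1"}
r, q = {"c1": 3, "c2": 2}, {"c1": 2, "c2": 1}
W = {"n1": 5, "n2": 2}
sc = dict(W)                                            # sc_s = W_s
comm = {"c1": 1, "n2": 1, "c2": 1}
BW = {"c1": 3, "n2": 5, "c2": 2}
split = {("c1", "n2"): 2, ("c1", "n1"): 1, ("c2", "n1"): 2}
print(placement_cost(split, parent, r, q, W, sc, comm, BW))                               # 7
print(placement_cost({("c1", "n2"): 3, ("c2", "n1"): 2}, parent, r, q, W, sc, comm, BW))  # None
```

In the second call, n2 would receive 3 requests for a capacity of 2, so the server capacity constraint rejects it.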
Example: existence of a solution
[Figure: three small instances (a), (b) and (c) with nodes s1 and s2, W = 1]
(a): solution for all policies (Closest, Upwards, Multiple)
(b): no solution with Closest
(c): no solution with either Closest or Upwards
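The same pattern can be reproduced by brute force on tiny trees. The following sketch assumes two internal nodes s1 (child of s2) of capacity W = 1, integer request splits for Multiple, and no QoS or bandwidth constraints; the three instances are hypothetical and only chosen to behave like (a), (b) and (c), not to reproduce the figure. The search is exponential and only meant for toy instances.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def ancestors(parent: Dict[str, str], v: str) -> List[str]:
    """Nodes on the path from v (excluded) to the root, closest first."""
    out, v = [], parent.get(v)
    while v is not None:
        out.append(v)
        v = parent.get(v)
    return out


def fits(load: Dict[str, int], W: Dict[str, int]) -> bool:
    return all(load.get(s, 0) <= W[s] for s in W)


def closest_ok(parent, r, W) -> bool:
    """Try every replica set R; each client goes to the first replica above it."""
    nodes = list(W)
    for mask in product([False, True], repeat=len(nodes)):
        R = {n for n, m in zip(nodes, mask) if m}
        load: Dict[str, int] = {}
        for c, req in r.items():
            s = next((a for a in ancestors(parent, c) if a in R), None)
            if s is None:
                break                  # some client has no replica above it
            load[s] = load.get(s, 0) + req
        else:
            if fits(load, W):
                return True
    return False


def upwards_ok(parent, r, W) -> bool:
    """Try every choice of a single ancestor per client."""
    clients = list(r)
    for choice in product(*(ancestors(parent, c) for c in clients)):
        load: Dict[str, int] = {}
        for c, s in zip(clients, choice):
            load[s] = load.get(s, 0) + r[c]
        if fits(load, W):
            return True
    return False


def splits(total: int, k: int) -> Iterator[Tuple[int, ...]]:
    """All ordered ways to write total as a sum of k non-negative integers."""
    if k == 0:
        if total == 0:
            yield ()
        return
    for x in range(total + 1):
        for rest in splits(total - x, k - 1):
            yield (x,) + rest


def multiple_ok(parent, r, W) -> bool:
    """Try every integer split of each client's requests among its ancestors."""
    options = []
    for c in r:
        anc = ancestors(parent, c)
        options.append([list(zip(anc, sp)) for sp in splits(r[c], len(anc))])
    for combo in product(*options):
        load: Dict[str, int] = {}
        for parts in combo:
            for s, x in parts:
                load[s] = load.get(s, 0) + x
        if fits(load, W):
            return True
    return False


W = {"s1": 1, "s2": 1}                                   # s1 is a child of s2
inst_a = ({"c1": "s1", "s1": "s2"}, {"c1": 1})
inst_b = ({"c1": "s1", "c2": "s1", "s1": "s2"}, {"c1": 1, "c2": 1})
inst_c = ({"c1": "s1", "s1": "s2"}, {"c1": 2})
for name, (parent, r) in [("a", inst_a), ("b", inst_b), ("c", inst_c)]:
    print(name, closest_ok(parent, r, W), upwards_ok(parent, r, W), multiple_ok(parent, r, W))
# a True True True / b False True True / c False False True
```

In (b), Closest sends both clients to the same replica, while Upwards can send one of them past s1; in (c), only Multiple can split the 2 requests over s1 and s2.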
Complexity results
Homogeneous platform: Replica Counting problem, no bandwidth constraints

            No QoS                       With QoS
Closest     polynomial [Cidon02,Liu06]   polynomial [Liu06]
Upwards     NP-hard                      NP-hard
Multiple    polynomial                   NP-hard

Homogeneous platforms with bandwidth and QoS constraints: Closest remains polynomial
Heterogeneous platforms: all problems are NP-hard
Algo: Homogeneous Platform with QoS and Bandwidth
Basic idea: compute the minimal number of replicas needed in a subtree
Case 1: too many requests. With W = 10 and client requests r = 3, 5, 4, we have $\sum_i r(i) = 12 > W$ → 1 replica
Case 2: QoS constraints. With r = 3, 5, 4 and q = 1, 3, 2, a client whose QoS $q(i)$ is smaller than its number of hops to the server cannot be served from above → 1 replica
Case 3: bandwidth constraints. With link bandwidths b = 5, 2, 4 and r = 3, 5, 4: $b(l) < r(i)$ → 1 replica
Combining the three constraints on the same example (W = 10, b = 5, 2, 4, r = 3, 5, 4, q = 1, 3, 2) → 2 replicas
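The three cases amount to necessary conditions for placing at least one replica inside a subtree. A minimal sketch under the Closest policy with homogeneous capacity W, assuming QoS is counted in hops and that `depth` (a hypothetical field) is the number of hops from a client to the subtree root: without an inner replica, all requests of the subtree go to a single server at least depth + 1 hops away and cross the subtree's uplink, whose bandwidth is the hypothetical parameter `uplink_bw`. It only answers "is at least one replica needed?", not the minimal number that the algorithm computes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Client:
    requests: int  # r(i), requests per time unit
    qos: int       # q(i), maximum number of hops to the server
    depth: int     # hops from the client up to the subtree root (hypothetical field)


def needs_inner_replica(clients: List[Client], W: int, uplink_bw: int) -> bool:
    """Closest policy, homogeneous capacity W: True if the subtree must hold
    at least one replica. Without one, all its requests cross the uplink and
    reach a single server at least depth + 1 hops away."""
    total = sum(c.requests for c in clients)
    too_many = total > W                                       # Case 1
    qos_violated = any(c.qos < c.depth + 1 for c in clients)   # Case 2
    bw_violated = total > uplink_bw                            # Case 3, aggregated on the uplink
    return too_many or qos_violated or bw_violated


subtree = [Client(3, 3, 1), Client(5, 3, 1), Client(4, 3, 1)]
print(needs_inner_replica(subtree, W=10, uplink_bw=100))  # True: 12 > W = 10
print(needs_inner_replica(subtree, W=20, uplink_bw=11))   # True: 12 > bandwidth 11
tight = [Client(3, 1, 1), Client(5, 3, 1), Client(4, 2, 1)]
print(needs_inner_replica(tight, W=20, uplink_bw=100))    # True: q = 1 < 2 hops
```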
ORP - Optimal Replica Placement Algorithm
Preparation: tree transformation
Step 1: bottom-up computation of the contribution of client requests
$C(v, i)$: the contribution of node $v$ on its $i$-th ancestor
$e(v, i)$: the children of $v$ that have to be equipped with a replica to minimize the contribution on the $i$-th ancestor of $v$ (respecting some additional constraints)
[Figure: example tree with bandwidths b, requests r = 3, 5, 4 and QoS q = 1, 3, 2]
ORP - Optimal Replica Placement Algorithm
Step 2: top-down replica placement
procedure Place-replica(v, i)
    if v ∈ C then return
    place a replica at each node of e(v, i)
    forall c ∈ children(v) do
        if c ∈ e(v, i) then Place-replica(c, 0)
        else Place-replica(c, i + 1)
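A direct Python transcription of Place-replica, as a sketch. It assumes that Step 1 has already produced the table `e` as a dict keyed by `(v, i)` (missing entries meaning that no child is selected), that clients are the tree leaves, and that the procedure is started at the root of the transformed tree with i = 0; this calling convention and the toy `e` table are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def place_replica(v: str, i: int,
                  children: Dict[str, List[str]],
                  clients: Set[str],
                  e: Dict[Tuple[str, int], FrozenSet[str]],
                  replicas: Set[str]) -> None:
    """Top-down step of ORP. A child that receives a replica restarts with i = 0;
    any other child is called with i + 1, one level further from the ancestor
    that absorbs its contribution."""
    if v in clients:
        return
    chosen = e.get((v, i), frozenset())   # assumption: missing entry = no child selected
    replicas.update(chosen)               # place a replica at each node of e(v, i)
    for c in children.get(v, []):
        if c in chosen:
            place_replica(c, 0, children, clients, e, replicas)
        else:
            place_replica(c, i + 1, children, clients, e, replicas)


# Toy tree and toy e table (hypothetical values, not computed by Step 1).
children = {"root": ["a", "b"], "a": ["c1", "c2"], "b": ["d"], "d": ["c3"]}
clients = {"c1", "c2", "c3"}
e = {("root", 0): frozenset({"a"}), ("b", 1): frozenset({"d"})}
replicas: Set[str] = set()
place_replica("root", 0, children, clients, e, replicas)
print(sorted(replicas))  # ['a', 'd']
```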
Linear programming
General instance of the problem: heterogeneous platform, QoS + bandwidth, Closest, Upwards and Multiple policies
Solving over the rationals: solution for all practical problem sizes, but not a very precise bound (Upwards and Closest become equivalent to Multiple)
Integer solving: limited to s ≤ 50 nodes and clients
Mixed bound: obtained by solving the Upwards formulation over the rationals and imposing only that the $x_j$ be integers
Resolution for problem sizes s ≤ 400
Improved bound: if a server is used at only 50% of its capacity, the cost of placing a replica at this node is not halved, as it would be with $x_j = 0.5$
→ optimal solution for Multiple
Heuristics
Polynomial heuristics for the Replica Cost problem on heterogeneous platforms, with and without QoS
QoS constraints: the QoS of client i is the maximum distance (number of hops) between i and server(i)
Experimental assessment of the relative performance of the three policies and of the impact of QoS
No QoS: traversals of the tree, bottom-up or top-down
QoS: sorted lists
Worst-case complexity $O(s^2)$, where $s = |C| + |N|$ is the problem size
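For the QoS heuristics, where QoS counts hops, two building blocks could look like the sketch below: the candidate servers of a client (its ancestors within $q_i$ hops, closest first) and the clients sorted by tightest QoS first. Both functions are hypothetical helpers illustrating how such sorted lists may be built; they are not the heuristics themselves.

```python
from typing import Dict, List


def eligible_servers(parent: Dict[str, str], client: str, qos_hops: int) -> List[str]:
    """Ancestors of `client` within `qos_hops` hops, closest first."""
    result: List[str] = []
    v, hops = parent.get(client), 1
    while v is not None and hops <= qos_hops:
        result.append(v)
        v = parent.get(v)
        hops += 1
    return result


def by_tightest_qos(q: Dict[str, int]) -> List[str]:
    """Clients ordered from the smallest QoS bound to the largest."""
    return sorted(q, key=lambda c: q[c])


parent = {"c1": "n3", "n3": "n2", "n2": "n1"}
print(eligible_servers(parent, "c1", 2))              # ['n3', 'n2']
print(by_tightest_qos({"c1": 3, "c2": 1, "c3": 2}))   # ['c2', 'c3', 'c1']
```

Handling the most constrained clients first is one natural order, since they have the fewest eligible servers.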
Heuristics for Closest - No QoS
Closest Top Down Largest First (CTDLF)
Traversal of the tree, treating the subtrees that contain the most requests first
When a node can process the requests of all the clients in its subtree, it is chosen as a server and the traversal of that subtree stops
The procedure is called until no more servers are added
[Figure: example tree with nodes n1, n2, n3, n4 of capacities 18, 9, 1, 2 and client requests 2, 5, 2, 3, 1: solution cost 18. With the capacity of n1 reduced to 8: solution cost 17]
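A minimal sketch of CTDLF as described above, assuming the tree is a hypothetical `children` map, clients are leaves with requests `r`, and every internal node has a capacity `W` equal to its cost (sc = W). The tree shape in the example is an assumption; only the capacities (18 or 8, 9, 1, 2) and the client requests (2, 5, 2, 3, 1) come from the slide. With this assumed shape the sketch returns servers of total cost 18 and 17.

```python
from typing import Dict, List, Optional, Set


def ctdlf(children: Dict[str, List[str]], root: str,
          W: Dict[str, int], r: Dict[str, int]) -> Optional[Set[str]]:
    """Closest Top Down Largest First (sketch). Clients are the keys of r,
    internal nodes the keys of W. Returns the replica set, or None if some
    client remains unserved."""
    served: Set[str] = set()
    servers: Set[str] = set()

    def unserved(v: str) -> int:
        # requests of the clients below v that have no server yet
        if v in r:
            return 0 if v in served else r[v]
        return sum(unserved(c) for c in children.get(v, []))

    def mark(v: str) -> None:
        # under Closest, every still-unserved client below a new server uses it
        if v in r:
            served.add(v)
        for c in children.get(v, []):
            mark(c)

    def visit(v: str) -> bool:
        if v in r:                      # clients cannot hold a replica
            return False
        load = unserved(v)
        if load == 0:
            return False
        if W[v] >= load:                # v can serve its whole unserved subtree
            servers.add(v)
            mark(v)
            return True                 # stop the traversal of this subtree
        added = False
        for c in sorted(children.get(v, []), key=unserved, reverse=True):
            added = visit(c) or added   # largest subtrees first
        return added

    while visit(root):                  # repeat until no server is added
        pass
    return servers if all(c in served for c in r) else None


# Assumed tree shape; capacities and requests reuse the slide's values.
children = {"n1": ["n2", "n4"], "n2": ["n3", "c3"], "n3": ["c1", "c2"], "n4": ["c4", "c5"]}
r = {"c1": 2, "c2": 5, "c3": 2, "c4": 3, "c5": 1}
W18 = {"n1": 18, "n2": 9, "n3": 1, "n4": 2}
W8 = {**W18, "n1": 8}
print(sorted(ctdlf(children, "n1", W18, r)))     # ['n1'], cost 18
sol = ctdlf(children, "n1", W8, r)
print(sorted(sol), sum(W8[s] for s in sol))      # ['n1', 'n2'] 17
```

With W(n1) = 8, the first pass makes n2 a server (its subtree holds exactly 9 requests) and the second pass lets n1 absorb the 4 remaining requests.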
Heuristics for Closest - QoS
Closest Big Subtree First (CBSF)
Traversal of the tree, treating the subtrees that contain the most requests first
When a node can process the requests of all the clients in its subtree, it is chosen as a server and the traversal of that subtree stops
The procedure is called until no more servers are added
[Figure: same example tree (capacities n1 = 18, n2 = 9, n3 = 1, n4 = 2; client requests 2, 5, 2, 3, 1) with client QoS q = 3, 1, 1, 2, 3: solution cost 27]
Results - Percentage of success - QoS
Number of solutions found for each λ and each heuristic, with average(qos) = height/2, where $\lambda = \frac{\sum_{i \in C} r_i}{\sum_{j \in N} W_j}$
[Figure: percentage of success (0 to 100) vs. λ (0.1 to 0.9) for Closest_BigSubtreeFirst, Closest_SmallQoSFirst, Upwards_SQoS_Started, Upwards_SQoS_MinReq, Upwards_DistServer_Indisp, Multiple_SQoS_Close, Multiple_SQoS_MinReq, Multiple_MinQoS_Indisp, MixedBest and LP]
Results - Relative Performance
Distance between the result of a heuristic (in terms of replica cost) and the optimal solution
$T_\lambda$: subset of trees with a solution
Relative performance: $\mathrm{rperf} = \frac{1}{|T_\lambda|} \sum_{t \in T_\lambda} \frac{cost_{LP}(t)}{cost_h(t)}$
$cost_{LP}(t)$: optimal cost on tree $t$
$cost_h(t)$: heuristic cost on tree $t$; $cost_h(t) = +\infty$ if $h$ did not find any solution
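The relative performance can be computed directly from per-tree costs. A small sketch, assuming the inputs already cover exactly the trees of $T_\lambda$ and that a failed heuristic run is encoded as `math.inf`, so that its ratio is 0; the result is scaled to a percentage to match the 0 to 100 axes of the plots. The function name is hypothetical.

```python
import math
from typing import Sequence


def relative_performance(lp_costs: Sequence[float], h_costs: Sequence[float]) -> float:
    """rperf over T_lambda: mean of cost_LP(t) / cost_h(t), as a percentage.
    A failed heuristic run has cost math.inf, so it contributes 0."""
    if len(lp_costs) != len(h_costs) or not lp_costs:
        raise ValueError("need one heuristic cost per tree of T_lambda")
    ratios = [lp / h for lp, h in zip(lp_costs, h_costs)]
    return 100 * sum(ratios) / len(ratios)


# Three trees: optimal match, 20% more expensive, and a failure.
print(relative_performance([10, 20, 30], [10, 25, math.inf]))
```

The example yields about 60: ratios 1.0, 0.8 and 0, averaged. Counting failures as 0 means a heuristic that often finds no solution is penalized even if its successful runs are near-optimal.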
Results - Relative Performance - No QoS
Heterogeneous results: similar to the homogeneous case
[Figure: relative performance (0 to 100) vs. λ (0.1 to 0.9) for ClosestTopDownAll, ClosestTopDownLargestFirst, ClosestBottomUp, UpwardsTopDown, UpwardsBigClientFirst, MultipleGreedy, MultipleTopDown, MultipleBottomUp and MixedBest]
Results - Relative Performance - QoS
average(qos) = height/2
[Figure: relative performance (0 to 100) vs. λ (0.1 to 0.9) for Closest_BigSubtreeFirst, Closest_SmallQoSFirst, Upwards_SQoS_Started, Upwards_SQoS_MinReq, Upwards_DistServer_Indisp, Multiple_SQoS_Close, Multiple_SQoS_MinReq, Multiple_MinQoS_Indisp and MixedBest]
Summary
No QoS:
Striking effect of the new policies: many more solutions to the Replica Placement problem
Multiple ≥ Upwards ≥ Closest: hierarchy observed within our heuristics
The best Multiple heuristic (MB) is always at 85% of the optimal: a satisfactory result
QoS:
The hierarchy also holds under QoS constraints
Performance compared to the optimal solution: qos ∈ {1, 2}: 95%; average(qos) = height/2: 85%; no QoS: 85%
Smaller trees: slightly worse results
Related work
Several papers on replica placement, but... all consider only the Closest policy
Replica Placement in a general graph is NP-complete
Wolfson and Milo: impact of the write cost, use of a minimum spanning tree for updates
Tree networks: polynomial solution
Cidon et al. (multiple objects) and Liu et al. (QoS constraints): polynomial algorithms for homogeneous networks
Kalpakis et al.: NP-completeness of a variant with bidirectional links (requests served by any node in the tree)
Karlsson et al.: comparison of different objective functions and several heuristics; no QoS, but several other constraints
Tang et al.: real QoS constraints
Rodolakis et al.: Multiple policy, but in a very different context
Outline of the Talk
1. Replica Placement in Tree-Networks: Framework; Complexity; Heuristics for Replica Cost Problem; Experiments
2. Pipeline Workflow Applications: Bi-criteria Complexity Results
3. In-network Stream Processing: Heuristics and Experiments
4. Conclusion
Introduction and motivation
Mapping applications onto parallel platforms: a difficult challenge
- Heterogeneous clusters, fully heterogeneous platforms: even more difficult!
Structured programming approach
- Easier to program (avoids deadlocks and process starvation)
- Range of well-known paradigms (pipeline, farm)
- Algorithmic skeletons help with the mapping
Focus on pipeline applications
- Example: mapping the JPEG encoder pipeline onto a cluster of workstations
Multi-criteria scheduling of workflows
[Figure: JPEG encoder workflow, from Source Image Data to Compressed Image Data, through the stages Scaling, YUV Conversion, Block Storage, Subsampling, FDCT, Quantizer and Entropy Encoder, with a Quantization Table and a Huffman Table as auxiliary inputs]
Workflow: several consecutive data sets enter the application graph.
- Period P: time interval between the beginnings of execution of two consecutive data sets
- Latency L: maximal time elapsed between the beginning and the end of execution of a data set
- Failure probability FP: the probability that a processor fails during execution
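As a minimal illustration of the period and latency definitions, the sketch below computes both from a hypothetical execution trace that records, for each data set, when its processing began and ended. The trace format and the name `period_and_latency` are assumptions; since consecutive gaps may vary in a measured trace, the sketch reports the largest one as the period.

```python
from typing import List, Tuple


def period_and_latency(trace: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (period, latency) from (begin, end) times, one pair per data set.

    The trace must be ordered by data-set index. The period is taken as the
    largest gap between the beginnings of two consecutive data sets; the
    latency is the largest begin-to-end time of any single data set.
    """
    if len(trace) < 2:
        raise ValueError("need at least two data sets to measure a period")
    begins = [b for b, _ in trace]
    period = max(b2 - b1 for b1, b2 in zip(begins, begins[1:]))
    latency = max(e - b for b, e in trace)
    return period, latency


# Three data sets entering every 5 time units, each taking 12 units end to end.
print(period_and_latency([(0, 12), (5, 17), (10, 22)]))  # (5, 12)
```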
Rule of the game
The application: a pipeline of n stages S_1, ..., S_n; stage S_k has computation cost w_k, receives input data of size δ_{k-1} and produces output data of size δ_k.
- Cut the pipeline into intervals
- Map each interval onto a single processor...
- ...or replicate it onto several processors to improve reliability
The platform: p processors, fully connected (i.e., a clique)
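The first step of the rule, cutting the pipeline into intervals, can be sketched by enumerating every partition of stages 1..n into consecutive intervals. With n stages there are 2^(n-1) such partitions, one per subset of cut points, which hints at why exhaustive search quickly becomes impractical. The function name is hypothetical, and the sketch leaves out the processor assignment and replication steps.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def interval_partitions(n: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield every way to cut stages 1..n into consecutive intervals.

    Each interval is a pair (first_stage, last_stage), 1-indexed and
    inclusive. A partition is fully determined by its set of cut points.
    """
    for k in range(n):  # number of cuts, from 0 to n - 1
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]
            yield [(bounds[t] + 1, bounds[t + 1]) for t in range(len(bounds) - 1)]


for part in interval_partitions(3):
    print(part)
```

For n = 3 this prints the 4 = 2^2 partitions, from the single interval [(1, 3)] to the fully split [(1, 1), (2, 2), (3, 3)].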
Objective function?
Mono-criterion:
- Minimize P
- Minimize L
- Minimize FP
Bi-criteria: how to define it?
- Minimize α·P + β·L? Minimize α·L + β·FP? These values are not comparable.
- Instead, fix one criterion and optimize the other:
  - Minimize P for a fixed latency
  - Minimize L for a fixed period
  - Minimize FP for a fixed latency
  - Minimize L for a fixed failure probability
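Fixing one criterion turns the bi-criteria problem into a constrained mono-criterion one. The sketch below works over a hypothetical list of already-evaluated candidate mappings, given as (period, latency) pairs, and returns the smallest period among those whose latency stays within the fixed bound. It only illustrates the selection rule, not how candidate mappings are produced; the function name is an assumption.

```python
import math
from typing import Iterable, Tuple


def min_period_for_fixed_latency(candidates: Iterable[Tuple[float, float]],
                                 max_latency: float) -> float:
    """Smallest period among (period, latency) candidates with latency <= max_latency.

    Returns math.inf when no candidate meets the latency bound.
    """
    feasible = [p for p, lat in candidates if lat <= max_latency]
    return min(feasible, default=math.inf)


mappings = [(10.0, 40.0), (6.0, 55.0), (8.0, 45.0)]
print(min_period_for_fixed_latency(mappings, 50.0))  # 8.0
print(min_period_for_fixed_latency(mappings, 30.0))  # inf
```

Unlike a weighted sum α·P + β·L, this rule never adds a period to a latency, so no scaling between the two quantities is needed.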
An Optimal Algorithm
Minimize L with fixed P on a homogeneous platform (all processors have speed s, all links have bandwidth b).
L(i, q): minimum latency with exactly q processors mapping stages S_1 to S_i. The result is the minimum of L(n, q) for 1 ≤ q ≤ p.
Initialization:
$$L(i,1) = \begin{cases} \dfrac{\delta_0}{b} + \displaystyle\sum_{k=1}^{i} \dfrac{w_k}{s} + \dfrac{\delta_i}{b} & \text{if } \dfrac{\delta_0}{b} + \displaystyle\sum_{k=1}^{i} \dfrac{w_k}{s} + \dfrac{\delta_i}{b} \le P \\ \infty & \text{otherwise} \end{cases} \qquad L(1,q) = \infty \text{ if } q > 1$$
Recursion (the last interval covers stages S_{j+1} to S_i; its input δ_j is already counted in L(j, q−1), while its full cycle time must fit within the period):
$$L(i,q) = \min_{1 \le j < i} \left\{ L(j,q-1) + \sum_{k=j+1}^{i} \frac{w_k}{s} + \frac{\delta_i}{b} \;\middle|\; \frac{\delta_j}{b} + \sum_{k=j+1}^{i} \frac{w_k}{s} + \frac{\delta_i}{b} \le P \right\}$$
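The recurrence transcribes directly into a dynamic program. This is a sketch: the function and parameter names are hypothetical, stages are 1-indexed as on the slide, `delta` holds δ_0 through δ_n, and an unreachable period yields infinity.

```python
import math
from typing import Sequence


def min_latency_fixed_period(w: Sequence[float], delta: Sequence[float],
                             s: float, b: float, p: int, period: float) -> float:
    """Dynamic program L(i, q) for interval mapping on a homogeneous platform.

    w[k-1] is the cost of stage S_k (k = 1..n); delta[k] is the size of the
    data sent by S_k, with delta[0] the pipeline input and delta[n] its output.
    Returns the minimum over 1 <= q <= p of L(n, q), or math.inf.
    """
    n = len(w)
    assert len(delta) == n + 1
    prefix = [0.0]  # prefix[i] = w_1 + ... + w_i
    for wk in w:
        prefix.append(prefix[-1] + wk)

    def work(j: int, i: int) -> float:  # computation time of stages j+1..i
        return (prefix[i] - prefix[j]) / s

    # L[i][q] for i in 0..n, q in 0..p; row 0 and column 0 stay at infinity.
    L = [[math.inf] * (p + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # initialization: a single interval S_1..S_i
        cycle = delta[0] / b + work(0, i) + delta[i] / b
        if cycle <= period:
            L[i][1] = cycle
    for q in range(2, p + 1):
        for i in range(2, n + 1):  # L(1, q) stays infinite for q > 1
            for j in range(1, i):
                # Cycle time of the last interval S_{j+1}..S_i must fit the period.
                if delta[j] / b + work(j, i) + delta[i] / b <= period:
                    L[i][q] = min(L[i][q], L[j][q - 1] + work(j, i) + delta[i] / b)
    return min(L[n][q] for q in range(1, p + 1))


args = ([1, 2, 3], [1, 1, 1, 1], 1, 1, 3)  # w, delta, s, b, p
print(min_latency_fixed_period(*args, 100))  # 8.0: one interval suffices
print(min_latency_fixed_period(*args, 5))    # 9.0: intervals {S1, S2} and {S3}
print(min_latency_fixed_period(*args, 4))    # inf: S3 alone needs a cycle of 5
```

The three nested loops give O(n² p) time, which is what makes this case polynomial.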
Example: Minimizing FP with fixed latency
Processors with different speeds, failure-heterogeneous:
- one processor with s = 1, fp = 0.1
- fast processors with s = 100, fp = 0.8
Application: two stages S_1 (w_1 = 1) and S_2 (w_2 = 100), with δ_0 = 10, δ_1 = 1, δ_2 = 0. Fixed latency: 22.
- Both stages on the processor with s = 1: 10 + 101 ≫ 22
- Both stages replicated on 2 fast processors: 20 + 101/100 < 22, FP = 1 − (1 − 0.8²) = 0.64
- Both stages replicated on 3 fast processors: 30 + 101/100 > 22
- S_1 on the processor with s = 1, S_2 replicated on 10 fast processors: 10 + 1/1 + 10 × 1 + 100/100 = 22, FP = 1 − (1 − 0.1) × (1 − 0.8^10) < 0.2
Open complexity!
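The arithmetic of this example can be reproduced with a small evaluator for replicated interval mappings. The cost model is an assumption inferred from the numbers above (bandwidth b = 1): an interval's input is sent to each of its k replicas in turn, the interval runs at the speed of its slowest replica, and the application fails only if all replicas of some interval fail. The function name and data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Processor = Tuple[float, float]  # (speed s, failure probability fp)


def evaluate_mapping(w: Sequence[float], delta: Sequence[float], b: float,
                     intervals: List[Tuple[int, int, List[Processor]]]) -> Tuple[float, float]:
    """Return (latency, FP) of a replicated interval mapping.

    intervals: (first_stage, last_stage, replicas), stages 1-indexed and
    consecutive, covering the whole pipeline in order.
    """
    latency, success = 0.0, 1.0
    for first, last, replicas in intervals:
        k = len(replicas)
        slowest = min(s for s, _ in replicas)
        # Input sent sequentially to the k replicas, then computation.
        latency += k * delta[first - 1] / b + sum(w[first - 1:last]) / slowest
        # The interval fails only if every replica fails.
        success *= 1.0 - math.prod(fp for _, fp in replicas)
    latency += delta[len(w)] / b  # final output of the pipeline
    return latency, 1.0 - success


w, delta, b = [1, 100], [10, 1, 0], 1
slow, fast = (1, 0.1), (100, 0.8)
for name, mapping in [
    ("both stages on 2 fast replicas", [(1, 2, [fast] * 2)]),
    ("S1 slow, S2 on 10 fast replicas", [(1, 1, [slow]), (2, 2, [fast] * 10)]),
]:
    lat, fp = evaluate_mapping(w, delta, b, mapping)
    print(name, round(lat, 2), round(fp, 3))
# both stages on 2 fast replicas 21.01 0.64
# S1 slow, S2 on 10 fast replicas 22.0 0.197
```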
Complexity Results
Bi-criteria interval mapping (Hom.: fully homogeneous platform; Com. Hom.: communication homogeneous; Het.: fully heterogeneous; Failure: homogeneous or heterogeneous failure probabilities):
Objective | Failure | Hom.       | Com. Hom.  | Het.
P & L     | n/a     | polynomial | NP-hard    | NP-hard
FP & L    | hom.    | polynomial | polynomial | NP-hard
FP & L    | het.    | polynomial | open       | NP-hard
Integer linear programming
- Integer LP to solve Interval Mapping on Communication Homogeneous platforms
- Many integer variables: no efficient algorithm to solve it
- Approach limited to small problem instances
- Gives the absolute performance of the heuristics on such instances
[Figure: bucket behavior of LP solutions. (a) Fixed P: optimal latency as a function of the fixed period (curve P_fixed). (b) Fixed L: optimal period as a function of the fixed latency (curve L_fixed).]
Rule of the Game
[Figure: processors, each characterized by its computation speed and network card capacity; application 1 and application 2, each a tree of operators (op_1 to op_5) whose leaves are basic objects (ob_1 to ob_3)]
Goal
- Minimize the total processing power of the target platform while matching all application requirements.
- Assess the impact of reusing intermediate results.
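To see why reusing intermediate results can matter, here is a toy sketch, not the heuristics of the talk: applications are written as hypothetical operator trees (a leaf is a basic object name, an internal node is a tuple of an operator name followed by its inputs), and operator costs are made-up numbers. When two applications contain an identical subtree, computing it once lowers the total work the platform must provide.

```python
from typing import Dict, Iterable, Set, Tuple, Union

Tree = Union[str, Tuple]  # a basic object name, or (operator, child, child, ...)


def total_work(apps: Iterable[Tree], cost: Dict[str, float], reuse: bool) -> float:
    """Sum of operator costs needed to run all applications.

    With reuse=True, an identical subtree (same operator on the same inputs)
    appearing several times is computed only once.
    """
    seen: Set[Tuple] = set()

    def visit(node: Tree) -> float:
        if isinstance(node, str):  # basic object: available, not computed
            return 0.0
        if reuse and node in seen:  # intermediate result already produced
            return 0.0
        seen.add(node)
        op, *children = node
        return cost[op] + sum(visit(c) for c in children)

    return sum(visit(app) for app in apps)


shared = ("op1", "ob1", "ob2")  # same intermediate result in both applications
app1 = ("op2", shared, "ob3")
app2 = ("op3", shared, "ob1")
costs = {"op1": 10.0, "op2": 4.0, "op3": 6.0}
print(total_work([app1, app2], costs, reuse=False))  # 30.0
print(total_work([app1, app2], costs, reuse=True))   # 20.0
```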