Framework Homogeneous+coms Heterogeneous Heterogeneous+coms Conclusion
Static Worksharing Strategies for Heterogeneous Computers with Unrecoverable Failures
Anne Benoit, Yves Robert, Arnold Rosenberg and Frédéric Vivien
École Normale Supérieure de Lyon, France
Anne.Benoit@ens-lyon.fr
http://graal.ens-lyon.fr/~abenoit
HeteroPar’2009, August 25
Anne.Benoit@ens-lyon.fr August 25, 2009 Worksharing with Unrecoverable Interruptions 1/ 25
Problem
Large divisible computational workload
Single-round distribution, one-port model
Assemblage of $p$ different-speed computers
Unrecoverable interruptions
A priori knowledge of risk (failure probability)
Goal: maximize the expected amount of work done
Related work
Landmark paper by Bhatt, Chung, Leighton & Rosenberg on cycle stealing
Hardware failures: fault-tolerant computing (hence scheduling) becomes unavoidable
Well, the same story has been told for a very long time!
Cycle-stealing scenario
Big job of size $W$ to execute during the weekend
Enroll $p$ computers $P_1$ to $P_p$
Assign a load fraction to each $P_i$
How to compute these load fractions?
How to order communications?
Risk increases linearly with time
Machines reclaimed at 8am on Monday with probability 1
Outline
1 Technical framework
2 Homogeneous computers, with communication costs
3 Heterogeneous computers, no communication costs
4 Heterogeneous computers, with communication costs
5 Conclusion
Interruption model
$$d\Pr = \begin{cases} \kappa\,dt & \text{for } t \in [0, 1/\kappa] \\ 0 & \text{otherwise} \end{cases}$$
$$\Pr(w) = \min\left\{1, \int_0^w \kappa\,dt\right\} = \min\{1, \kappa w\}$$
Goal: maximize expected work production
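A minimal Python sketch of this interruption model, assuming $\kappa > 0$ and that $w$ is measured in the same time units as $1/\kappa$; the function names and the sample value of $\kappa$ are hypothetical. Because the density is the constant $\kappa$ on $[0, 1/\kappa]$, the interruption instant is uniform on that interval, which a small Monte Carlo run can confirm against $\min\{1, \kappa w\}$.

```python
import random


def prob_interrupted(w: float, kappa: float) -> float:
    """Pr(w) = min{1, kappa * w}: probability that the unrecoverable
    interruption strikes before time w (linear-risk model)."""
    if w <= 0:
        return 0.0
    return min(1.0, kappa * w)


def empirical_prob(w: float, kappa: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(w): a constant density kappa on
    [0, 1/kappa] means the interruption time is uniform on that interval."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(0.0, 1.0 / kappa) < w for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    kappa = 0.25  # hypothetical risk rate: certain interruption by time 4
    for w in (1.0, 2.0, 4.0, 6.0):
        print(w, prob_interrupted(w, kappa), round(empirical_prob(w, kappa), 3))
```

Beyond $w = 1/\kappa$ the probability saturates at 1, which is the "reclaimed at 8am on Monday with probability 1" situation of the cycle-stealing scenario.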
Rules of the game
Single-round, no overlap, one-port communications
Homogeneous network
Different-speed computers
Failure rate per unit-load communication: $z = \frac{\kappa}{\mathit{bw}}$
Failure rate per unit-load computation by computer $P_i$: $x_i = \frac{\kappa}{\mathit{speed}_i}$
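A small sketch of where these rates come from, assuming (as the notation suggests) that $\mathit{bw}$ is the link bandwidth and $\mathit{speed}_i$ the speed of $P_i$, both in load units per time unit. Sending one unit takes $1/\mathit{bw}$ time and computing it on $P_i$ takes $1/\mathit{speed}_i$ time, so under $\Pr(w) = \kappa w$ each adds $\kappa/\mathit{bw}$ or $\kappa/\mathit{speed}_i$ to the failure probability. The function name and platform values are hypothetical.

```python
def failure_rates(kappa: float, bw: float, speeds: list[float]) -> tuple[float, list[float]]:
    """Convert the global risk rate kappa into per-unit-load rates.

    z   = kappa / bw        (one unit of load takes 1/bw time to send)
    x_i = kappa / speed_i   (one unit of load takes 1/speed_i time on P_i)
    """
    z = kappa / bw
    xs = [kappa / s for s in speeds]
    return z, xs


if __name__ == "__main__":
    # Hypothetical platform: one link of bandwidth 10, three computers.
    z, xs = failure_rates(kappa=0.01, bw=10.0, speeds=[1.0, 2.0, 4.0])
    print(z, xs)
    # Failure probability accumulated while sending, then computing,
    # Y units on P_1 (meaningful only while it stays at most 1).
    Y = 5.0
    print((z + xs[0]) * Y)
```

Faster links and faster computers both shrink the per-unit rates, so the same load exposes the job to less risk.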
With two computers (1/2)
[Figure: timelines of $P_1$ (communication $zY$, then computation $x_1 Y$) and $P_2$ (communication $z(W - Y)$, then computation $x_2 (W - Y)$)]
First send $P_1$ a chunk of size $Y$: $E_1 = Y\,(1 - (z + x_1)Y)$
Then send $P_2$ the remaining load (of size $W - Y$); $P_2$ starts computing only after both transfers, i.e. after the whole load $W$ has been communicated: $E_2 = (W - Y)\,(1 - (zW + x_2 (W - Y)))$
Total expectation: $E(Y) = E_1 + E_2$
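A direct transcription of $E_1$ and $E_2$ into Python, as a sketch; the function name and the sample values are hypothetical, and the formula is valid only while each failure probability stays at most 1 (otherwise $\min\{1, \cdot\}$ saturates).

```python
def expected_work_two(Y: float, W: float, z: float, x1: float, x2: float) -> float:
    """E(Y) = E1 + E2: P_1 first receives Y units, then P_2 receives W - Y."""
    # P_1 is exposed during its own transfer (z*Y) and its computation (x1*Y).
    e1 = Y * (1 - (z + x1) * Y)
    # P_2 waits for both transfers (Y, then W - Y: z*W in total), then computes.
    e2 = (W - Y) * (1 - (z * W + x2 * (W - Y)))
    return e1 + e2


if __name__ == "__main__":
    W, z, x1, x2 = 1.0, 0.1, 0.2, 0.4  # hypothetical values
    for Y in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(Y, expected_work_two(Y, W, z, x1, x2))
```

Sweeping $Y$ from $0$ to $W$ shows that $E(Y)$ rises and then falls, which motivates looking for the best split $Y$.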
With two computers (2/2)
$$E(Y) = Y\,(1 - (z + x_1)Y) + (W - Y)\,(1 - (zW + x_2 (W - Y)))$$
$$E(Y) = W - (z + x_2)W^2 - (z + x_1 + x_2)Y^2 + (z + 2x_2)WY$$
$E(Y)$ is a concave quadratic in $Y$; setting $dE/dY = 0$ gives
$$Y^{(\mathrm{opt})} = \frac{z + 2x_2}{2(z + x_1 + x_2)}\,W$$
$$E_{\mathrm{opt}}(W, 2) = E(Y^{(\mathrm{opt})}) = W - W^2\,\frac{4x_1x_2 + 4(x_1 + x_2)z + 3z^2}{4(x_1 + x_2 + z)}$$
Symmetric in $x_1$ and $x_2$ ⇒ the ordering of the communications has no impact
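The closed forms for $Y^{(\mathrm{opt})}$ and $E_{\mathrm{opt}}(W, 2)$ can be cross-checked numerically. A minimal Python sketch, with hypothetical parameter values chosen so that every failure probability stays below 1: it compares the closed form against direct evaluation of $E(Y)$, against a grid search over $Y$, and against the swapped order ($x_1 \leftrightarrow x_2$).

```python
def expected_work_two(Y, W, z, x1, x2):
    """E(Y) = E1 + E2 with P_1 served first (chunk Y), then P_2 (chunk W - Y)."""
    e1 = Y * (1 - (z + x1) * Y)
    e2 = (W - Y) * (1 - (z * W + x2 * (W - Y)))
    return e1 + e2


def optimal_two(W, z, x1, x2):
    """Closed forms: maximizer of the concave quadratic E(Y) and its value."""
    y_opt = (z + 2 * x2) / (2 * (z + x1 + x2)) * W
    e_opt = W - W**2 * (4 * x1 * x2 + 4 * (x1 + x2) * z + 3 * z**2) / (4 * (x1 + x2 + z))
    return y_opt, e_opt


if __name__ == "__main__":
    W, z, x1, x2 = 1.0, 0.1, 0.2, 0.4  # hypothetical values
    y_opt, e_opt = optimal_two(W, z, x1, x2)
    # Closed form versus direct evaluation of E(Y) at Y(opt).
    print(e_opt, expected_work_two(y_opt, W, z, x1, x2))
    # No point of a fine grid over [0, W] beats the closed form.
    grid_best = max(expected_work_two(W * k / 1000, W, z, x1, x2) for k in range(1001))
    print(grid_best <= e_opt + 1e-12)
    # Serving P_2 first (swap x1 and x2) yields the same optimal expectation.
    print(optimal_two(W, z, x2, x1)[1])
```

Only the split changes when the order is swapped; the optimal expected work does not, in line with the symmetry noted on the slide.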
Extra rule: distribute the entire load
Total load $W$ small enough that we distribute it entirely
Quite reasonable, but a dramatic impact on the solution
Definition
Distrib($p$): compute $E_{\mathrm{opt}}(W, p)$, the optimal expected total amount of work done when distributing the entire workload $W \le \frac{1}{z + \max_i x_i}$ to the $p$ remote computers
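A hedged sketch of Distrib($p$), under an assumption of ours rather than a statement from the slides: the two-computer expectation extends to $p$ computers in the obvious way, with $P_i$ receiving chunk $\alpha_i$ after the transfers to $P_1, \dots, P_{i-1}$, so that $E = \sum_i \alpha_i \bigl(1 - z(\alpha_1 + \dots + \alpha_i) - x_i \alpha_i\bigr)$. The bound $W \le 1/(z + \max_i x_i)$ guarantees $z(\alpha_1 + \dots + \alpha_i) + x_i \alpha_i \le (z + x_i)W \le 1$, so $\min\{1, \cdot\}$ never saturates. Using $\sum_i \alpha_i (\alpha_1 + \dots + \alpha_i) = (W^2 + \sum_i \alpha_i^2)/2$, this becomes $E = W - zW^2/2 - \sum_i (x_i + z/2)\alpha_i^2$, which is maximized when $\alpha_i$ is proportional to $1/(x_i + z/2)$. For $p = 2$ this reproduces $Y^{(\mathrm{opt})}$ and $E_{\mathrm{opt}}(W, 2)$ from the two-computer slide. The function names are hypothetical.

```python
def expected_work(chunks, z, xs):
    """ASSUMPTION: p-computer extension of E1 + E2. Chunk i is sent after
    chunks 1..i-1, so P_i is exposed for z * (prefix sum) + x_i * chunk."""
    total, sent = 0.0, 0.0
    for a, x in zip(chunks, xs):
        sent += a                          # transfers up to and including P_i
        risk = min(1.0, z * sent + x * a)  # Pr(w) = min{1, kappa * w}
        total += a * (1.0 - risk)
    return total


def distrib(W, z, xs):
    """Allocation maximizing expected_work when all of W is distributed.
    Derived (not taken from the slides) by minimizing sum (x_i + z/2) a_i^2
    subject to sum a_i = W."""
    if W > 1.0 / (z + max(xs)):
        raise ValueError("W exceeds 1 / (z + max x_i)")
    weights = [1.0 / (x + z / 2) for x in xs]
    s = sum(weights)
    chunks = [W * w / s for w in weights]
    # E_opt(W, p) = W - z W^2 / 2 - W^2 / sum_i 1/(x_i + z/2)
    e_opt = W - z * W**2 / 2 - W**2 / s
    return chunks, e_opt


if __name__ == "__main__":
    W, z, xs = 1.0, 0.1, [0.2, 0.4, 0.3]  # hypothetical platform
    chunks, e_opt = distrib(W, z, xs)
    print(chunks, e_opt, expected_work(chunks, z, xs))
```

Under this assumption $E_{\mathrm{opt}}$ depends on the $x_i$ only through a symmetric sum, so the order of communications would again have no impact; this is a consequence of the assumed model, not a claim made on the slides.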