
Diversity vs. Parallelism in Distributed Computing with Redundancy



  1. Diversity vs. Parallelism in Distributed Computing with Redundancy. Pei Peng*, Emina Soljanin*, Philip Whiting†. *Rutgers University, †Macquarie University. 2020 IEEE International Symposium on Information Theory

  2. Background and Problem Description

  3. Background  Distributed computing: Numerous machine learning and other algorithms are growing in complexity and data requirements. Distributed computing provides parallelism: the simultaneous execution of the smaller tasks that make up a large computing job.  Redundancy: Large-scale sharing of computing resources causes random fluctuations in task service times. Redundancy provides diversity: a job is completed as soon as any fixed-size subset of its tasks has been executed.

  4. Distributed Computing Distributed computing provides simultaneous execution of the smaller tasks that make up a large computing job. [Figure: Bob splits his homework with Alice]

  5. Straggling Jobs Many factors, e.g., large-scale resource sharing, maintenance activities, and queueing, cause random fluctuations in task service times. [Figure: one straggling worker delays the job]

  6. Redundancy Redundancy, in the form of task replication or erasure coding, allows a job to be completed as soon as only a subset of its (redundant) tasks gets executed, thus avoiding stragglers. [Figure: Bob splits his homework and sends replicas to Alice and Eve]

  7. Straggler Mitigation: Example [Figure: straggler example]

  8. Worst Case for One Straggler [Figure: worst case for one straggler]

  9. Coding Erasure coding is a potentially powerful way to shorten job execution time, especially in the previous example. However, it can be used only in specific scenarios. [Figure: split and encode]

  10. Straggler Mitigation: Example [Figure: straggler example with coding]

  11. Diversity and Parallelism Diversity and parallelism are defined in terms of the redundancy of each job: diversity increases with redundancy, while parallelism decreases with it. Redundancy ranges from its minimum under pure splitting to its maximum under full replication, with coding in between.

  12. Diversity vs. Parallelism Tradeoff Both parallelism and diversity are essential in reducing job service time, but they act in opposite directions. Under replication the task execution times are T_{r,1}, T_{r,2}, …, T_{r,n}; under splitting they are T_{s,1}, T_{s,2}, …, T_{s,n}. Comparing individual tasks, P(T_{r,i} > t) > P(T_{s,i} > t) for i = 1, 2, …, n. The job completion time, however, is Min{T_{r,1}, …, T_{r,n}} under replication but Max{T_{s,1}, …, T_{s,n}} under splitting. To formalize this tradeoff, we should answer the following questions: 1. What distribution has been used for job service time? (T_r, T_s) 2. How does the distribution change with task size?
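The min-vs-max tradeoff above can be sketched numerically. A minimal Python simulation, assuming for illustration that a task of s CUs takes Δ + s·X with X exponential (one of the scaling models discussed later in the deck); the parameter values are illustrative, not from the slides:

```python
import random

random.seed(0)
N = 10                  # number of workers (illustrative)
DELTA, LAM = 1.0, 1.0   # deterministic and random components (illustrative)

def replication_time():
    # Each worker runs the whole job (task size N); the first finisher wins.
    return min(DELTA + N * random.expovariate(1 / LAM) for _ in range(N))

def splitting_time():
    # Each worker runs 1/N of the job; the last finisher ends the job.
    return max(DELTA + random.expovariate(1 / LAM) for _ in range(N))

samples = 100_000
mean_rep = sum(replication_time() for _ in range(samples)) / samples
mean_spl = sum(splitting_time() for _ in range(samples)) / samples
print(f"replication (min): {mean_rep:.3f}, splitting (max): {mean_spl:.3f}")
```

Under this assumed scaling, replication benefits from diversity (the minimum), while splitting pays for parallelism with the maximum of n task times.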

  13. System Model and Prior work

  14. System Model Jobs can be split into tasks that can be executed independently, in parallel, on different workers: 1. J1 is executed with splitting; 2. J2 is executed with replication (the shadings are replicas); 3. J3 is executed with a [4,2] erasure code (the shadings are coded tasks). We consider the expected execution time of each job.

  15. Computing Unit (CU) Fact: a job cannot be split into tasks of arbitrary size. In the homework analogy, the smallest task is a single question. We call the smallest unit of work a computing unit (CU): a job consists of tasks, and each task consists of CUs.

  16. Parameters and Notation
  n − number of workers (= number of CUs in a job)
  k − number of workers that have to execute their tasks for job completion
  s − number of CUs per task, s = n/k
  V − service time of each CU
  X − exponential random variable
  Y − task completion time for each worker
  X_{k:n} − k-th order statistic of n i.i.d. copies of X
  T_{n,k} − job completion time when each worker's task size is s = n/k

  17. References
  [1] G. Liang and U. C. Kozat, "Fast cloud: Pushing the envelope on delay performance of cloud storage with coding," IEEE/ACM Transactions on Networking, vol. 22, no. 6, pp. 2012–2025, 2014.
  [2] A. Vulimiri, P. B. Godfrey, R. Mittal, J. Sherry, S. Ratnasamy, and S. Shenker, "Low latency via redundancy," in Proceedings of the Ninth ACM Conference on Emerging Networking Experiments and Technologies. ACM, 2013, pp. 283–294.
  [3] A. Gorbunova, I. Zaryadov, S. Matyushenko, and E. Sopin, "The estimation of probability characteristics of cloud computing systems with splitting of requests," in International Conference on Distributed Computer and Communication Networks. Springer, 2016, pp. 418–429.
  [4] A. Behrouzi-Far and E. Soljanin, "Redundancy scheduling in systems with bi-modal job service time distributions," in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2019, pp. 9–16.
  [5] A. Reisizadeh, S. Prakash, R. Pedarsani, and A. S. Avestimehr, "Coded computation over heterogeneous clusters," IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4227–4242, 2019.
  [6] S. Dutta, V. Cadambe, and P. Grover, "Short-dot: Computing large linear transforms distributedly using coded short dot products," Advances in Neural Information Processing Systems, vol. 29, pp. 2100–2108, 2016.
  [7] G. Joshi, Y. Liu, and E. Soljanin, "Coding for fast content download," in 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2012, pp. 326–333.
  [8] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran, "Speeding up distributed machine learning using codes," IEEE Transactions on Information Theory, vol. 64, no. 3, pp. 1514–1529, 2017.
  [9] K. Gardner, M. Harchol-Balter, A. Scheller-Wolf, and B. Van Houdt, "A better model for job redundancy: Decoupling server slowdown and job size," IEEE/ACM Transactions on Networking, vol. 25, no. 6, pp. 3353–3367, 2017.
  [10] G. Joshi, E. Soljanin, and G. Wornell, "Efficient redundancy techniques for latency reduction in cloud systems," ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), vol. 2, no. 2, pp. 1–30, 2017.
  [11] M. F. Aktaş and E. Soljanin, "Straggler mitigation at scale," IEEE/ACM Transactions on Networking, vol. 27, no. 6, pp. 2266–2279, 2019.
  [12] M. S. Klamkin and D. J. Newman, "Extensions of the birthday surprise," Journal of Combinatorial Theory, vol. 3, no. 3, pp. 279–282, 1967.
  [13] P. Peng, E. Soljanin, and P. Whiting, "Diversity vs. parallelism in distributed computing with redundancy." [Online]. Available: https://emina.flywheelsites.com

  18. Various Proposed Distributions 1. What distribution has been used for job service time?  For theoretical analysis, the shifted exponential was used in, e.g., [1], Pareto in [2], Erlang in [3], and bi-modal in [4].  In this paper, we assume the service time per CU is shifted exponential: V ~ Δ + X, where Δ is a constant modeling the minimum service time (deterministic component), and X ~ Exp(λ) is an exponentially distributed random variable modeling the straggling (random component). The other distributions are analyzed in [13].
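A shifted-exponential CU service time can be sampled directly. A minimal sketch, assuming λ is the mean of the exponential component (so E[V] = Δ + λ, consistent with the k = 1 case of Theorem 1 later in the deck); the Δ and λ values are illustrative:

```python
import random

random.seed(1)
DELTA, LAM = 0.5, 1.0   # illustrative Δ (minimum service time) and λ (mean straggling)

def cu_service_time():
    # V = Δ + X, X ~ Exp(λ): deterministic minimum plus random straggling.
    # random.expovariate takes a rate, so a mean-λ exponential is expovariate(1/λ).
    return DELTA + random.expovariate(1 / LAM)

n = 100_000
mean_v = sum(cu_service_time() for _ in range(n)) / n
print(f"empirical mean {mean_v:.3f} vs. theoretical {DELTA + LAM:.3f}")
```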

  19. Various Proposed Scaling Models 2. How does the probability distribution scale (change) with task size?  There is no consensus on this question: some works scale the random component, e.g., [5, 6], others scale the deterministic component, e.g., [7, 8]. These papers provide the scaling for specific distributions; none of them provides a model for general distributions. The service time of each CU is V ~ Δ + X. The task execution time T is then: Scaling the random component: T = Δ + s·X. Scaling the deterministic component: T = s·Δ + X, where s is the number of CUs per task (task size).

  20. Prior Related Work We can classify the related references into two categories.  Category 1: many papers design codes for given systems, e.g., [5, 6, 8]; however, they do not focus on optimizing the code rate.  Category 2: a few papers study how much redundancy should be introduced, e.g., [9, 10, 11]; however, [9, 10] focus only on applying replication in queueing systems, and [11] focuses only on the Pareto distribution.

  21. Task Service Time Scaling Models The service time increases with the task size (the number of CUs s). Model 1 (Server-Dependent): the straggling effect depends on the server and is identical for each CU executed on that server; here Δ is some initial handshake time: T = Δ + s·X_i, where X_i ~ Exp(λ) is the service time of a CU on server i. Model 2 (Data-Dependent): each CU in a task of s CUs takes Δ time to complete, and there is some inherent additive randomness at each server: T = s·Δ + X_i, where X_i ~ Exp(λ) is the system randomness on server i. Model 3 (Additive): the execution times of the CUs are independent and identically distributed: T = V_1 + ⋯ + V_s, where the V_i ~ S-Exp(Δ, λ) are i.i.d. CU service times.
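The three scaling models can be compared side by side with a small Monte Carlo sketch. This is illustrative only: the Δ, λ, and s values are assumptions, and λ is treated as the mean of the exponential (so `expovariate(1/λ)` samples it):

```python
import random

random.seed(2)
DELTA, LAM, S = 0.5, 1.0, 4   # illustrative Δ, λ, and task size s

def model1():
    # Server-dependent: T = Δ + s·X, so E[T] = Δ + s·λ
    return DELTA + S * random.expovariate(1 / LAM)

def model2():
    # Data-dependent: T = s·Δ + X, so E[T] = s·Δ + λ
    return S * DELTA + random.expovariate(1 / LAM)

def model3():
    # Additive: T = V_1 + ... + V_s with V_i ~ S-Exp(Δ, λ), so E[T] = s·(Δ + λ)
    return sum(DELTA + random.expovariate(1 / LAM) for _ in range(S))

n = 100_000
for f, expected in [(model1, DELTA + S * LAM),
                    (model2, S * DELTA + LAM),
                    (model3, S * (DELTA + LAM))]:
    mean = sum(f() for _ in range(n)) / n
    print(f"{f.__name__}: empirical {mean:.3f}, expected {expected:.3f}")
```

Note how the same CU-level distribution yields three different task-level means, which is exactly why the choice of scaling model matters for the tradeoff.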

  22. Main Results

  23. Server-Dependent Model For each worker, T = Δ + s·X_i, where X_i ~ Exp(λ); the job completion time T_{n,k} is then the k-th order statistic of the n workers' task execution times. Theorem 1. The expected job completion time for the server-dependent execution model is given by E[T_{n,k}] = Δ + sλ(H_n − H_{n−k}) = Δ + (nλ/k)(H_n − H_{n−k}) ≥ Δ + λ, where H_m denotes the m-th harmonic number. E[T_{n,k}] is minimized by replication / maximal diversity (k = 1).
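Theorem 1 can be sanity-checked by simulation. A minimal sketch under the server-dependent model, with λ taken as the mean of the exponential; the n, k, Δ, λ values are illustrative assumptions:

```python
import random

random.seed(3)
N, K = 12, 3              # illustrative: n workers, k tasks needed for completion
DELTA, LAM = 1.0, 1.0     # illustrative Δ and λ
S = N // K                # task size s = n/k

def job_time():
    # Job completes at the k-th order statistic of the n task times T = Δ + s·X_i.
    times = sorted(DELTA + S * random.expovariate(1 / LAM) for _ in range(N))
    return times[K - 1]

def H(m):
    # m-th harmonic number H_m = 1 + 1/2 + ... + 1/m
    return sum(1 / i for i in range(1, m + 1))

theory = DELTA + S * LAM * (H(N) - H(N - K))   # Theorem 1
runs = 100_000
empirical = sum(job_time() for _ in range(runs)) / runs
print(f"empirical {empirical:.3f} vs. theorem {theory:.3f}")
```

Setting K = 1 (and hence S = N) in the same sketch recovers the lower bound Δ + λ, matching the claim that full replication minimizes E[T_{n,k}] under this model.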
