Genuinely Distributed Byzantine Machine Learning, El-Mahdi El-Mhamdi


  1. Genuinely Distributed Byzantine Machine Learning. El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault (first.last@epfl.ch). Swiss Federal Institute of Technology (EPFL), August 6, 2020.

  8. The Big Picture. Machine learning (ML) tackles critical tasks, so ML should be made robust. Literature: robust when training the model (4y ago). Genuinely distributed, Byzantine ML.

  14. Machine learning (ML). [Figure: a model labeled "~1 to 100 million"; the labels shown across the builds are "Krust"/"ZrOm", "Brust"/"GOrm", "Bost"/"GOat" and finally "Boat"/"Goat".]

  17. Stochastic Gradient Descent (SGD). Training loop: 1. Estimate gradient. 2. Turn potentiometers following the gradient. 3. Loop back to step 1. [Figure: a model with ~1 to 100 million potentiometers and an example gradient estimate (4.2, -0.5, -1.0, 0.8, -5.7, 0.3).]
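
The three-step training loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the quadratic toy loss, the `TARGET` vector, the noise level, the learning rate `lr` and the step count are all hypothetical and do not come from the slides.

```python
import random


def sgd(grad_estimate, params, lr=0.1, steps=200):
    """Plain SGD: estimate the gradient, move against it, repeat."""
    params = list(params)
    for _ in range(steps):  # 3. loop back to step 1
        g = grad_estimate(params)  # 1. estimate gradient
        # 2. "turn the potentiometers" following the gradient
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


# Hypothetical toy problem: minimize sum((p_i - t_i)^2) / 2,
# whose exact gradient is p - t; Gaussian noise makes the estimate stochastic.
TARGET = [4.2, -0.5, -1.0]
rng = random.Random(0)


def noisy_grad(params):
    return [p - t + rng.gauss(0.0, 0.01) for p, t in zip(params, TARGET)]


trained = sgd(noisy_grad, [0.0, 0.0, 0.0])
```

With these assumed settings, `trained` ends up close to `TARGET`; a real model would have millions of parameters rather than three.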

  21. Distributed SGD. [Figure: a parameter server and workers connected by a network, around a model with ~1 to 100 million parameters. Workers compute gradient estimates such as (4.2, -0.5, -1.0, 0.8, -5.7, 0.4), (4.1, -0.5, -1.0, 0.7, -5.7, 0.3) and (4.3, -0.5, -0.9, 0.7, -5.7, 0.3) and send them to the parameter server.]
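
A minimal sketch of one parameter-server step, assuming the server simply averages the workers' gradient estimates before updating the model. The three gradient vectors are the ones visible on the slide; the function names, the zero starting model and the learning rate are hypothetical.

```python
def average(gradients):
    """Coordinate-wise mean of a list of equally sized gradient vectors."""
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


def server_step(params, worker_gradients, lr=0.1):
    """One update at the parameter server: aggregate, then descend."""
    g = average(worker_gradients)
    return [p - lr * gi for p, gi in zip(params, g)]


# Gradient estimates as shown on the slide, one per worker.
worker_gradients = [
    [4.2, -0.5, -1.0, 0.8, -5.7, 0.4],
    [4.1, -0.5, -1.0, 0.7, -5.7, 0.3],
    [4.3, -0.5, -0.9, 0.7, -5.7, 0.3],
]

aggregated = average(worker_gradients)
new_params = server_step([0.0] * 6, worker_gradients)
```

Because the honest estimates agree closely, their mean stays near each of them, which is why plain averaging works when no worker misbehaves.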

  25. Distributed, Byzantine SGD. [Figure: the same parameter server, workers and network, but some workers are Byzantine and send gradients such as (-537, -752, 349, 412, 824, -153) alongside honest ones such as (4.2, -0.5, -1.0, 0.8, -5.7, 0.4) and (4.1, -0.5, -1.0, 0.7, -5.7, 0.3).]

  28. Byzantine-resilient SGD. [Figure: given the gradients (4.2, -0.5, -1.0, 0.8, -5.7, 0.4), (4.1, -0.5, -1.0, 0.7, -5.7, 0.3) and the Byzantine (-537, -752, 349, 412, 824, -153), the slide shows Average ≈ (-537, -752, 349, 412, 824, -153), whereas a robust rule (MDA, Median, Krum, Bulyan, GeoMed; MDA in the final build) gives ≈ (4.1, -0.5, -1.0, 0.7, -5.7, 0.3).]
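
The contrast between averaging and robust aggregation can be reproduced on the slide's three vectors. The sketch below shows two of the rules named on the slide, coordinate-wise median and MDA (minimum-diameter averaging, understood here as averaging the n - f gradients with the smallest pairwise diameter). The brute-force subset search and the function names are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations
from math import dist
from statistics import median


def average(gradients):
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


def coordinate_median(gradients):
    """Median taken independently on every coordinate."""
    return [median(column) for column in zip(*gradients)]


def mda(gradients, f):
    """Average the n - f gradients whose largest pairwise distance is minimal.

    Brute force over all subsets, so only suitable for small n.
    """
    n = len(gradients)

    def diameter(subset):
        return max((dist(a, b) for a, b in combinations(subset, 2)), default=0.0)

    best = min(combinations(gradients, n - f), key=diameter)
    return average(best)


gradients = [
    [4.2, -0.5, -1.0, 0.8, -5.7, 0.4],       # honest
    [4.1, -0.5, -1.0, 0.7, -5.7, 0.3],       # honest
    [-537, -752, 349, 412, 824, -153],       # Byzantine
]

plain = average(gradients)             # dragged far away by the Byzantine vector
robust_median = coordinate_median(gradients)
robust_mda = mda(gradients, f=1)       # tolerates f = 1 Byzantine worker out of 3
```

Here `plain` is dominated by the Byzantine vector, while both robust rules stay within the range of the honest gradients.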

  29. Problem: single point of failure.

  31. Problem… solution: Byzantine consensus.

  32. Problem… solution… nope. [Figure labels: Byzantine consensus, asynchronous network.]

  33. Key problem: divergence. [Figure: animation over nodes labeled A, B, C, D and 1, 2, 3.]

  45. The goal: can we keep the models (each with ~1 to 100 million parameters) "close" to each other, despite network asynchrony and Byzantine behaviors?

  46. Key approach: can we bring the models (each with ~1 to 100 million parameters) back closer to each other, despite network asynchrony and Byzantine behaviors?

  47. Key approach: +1 round. [Figure: nodes labeled A, B, C, D and 1, 2, 3.]

  54. Key approach: toy example. [Figure: four 1-parameter models, numbered 1 to 4, "& one" further node whose icon was lost in extraction. The builds mark the diameter of the four models, then a reduced diameter.]
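
The diameter reduction in the toy example can be illustrated with a small simulation. The sketch below is an assumption-laden reading of the slide, not the authors' protocol: it supposes four honest 1-parameter models, one Byzantine node, and an extra round in which each honest node takes the median of the n - f = 4 values it happens to receive (its own included). The starting values, the adversary's choices and all names are hypothetical.

```python
from statistics import median


def diameter(values):
    """Spread between the largest and smallest honest model."""
    return max(values) - min(values)


def contraction_round(honest, byz_values, dropped):
    """One extra round among 1-parameter models.

    Honest node i hears every honest value except the one at index
    dropped[i] (a message delayed by the asynchronous network), plus
    whatever the Byzantine node chose to send it, byz_values[i].
    It then replaces its model with the median of what it received.
    """
    updated = []
    for i in range(len(honest)):
        received = [v for j, v in enumerate(honest) if j != dropped[i]]
        received.append(byz_values[i])
        updated.append(median(received))
    return updated


honest = [1.0, 2.0, 3.0, 4.0]
# Adversarial schedule: pull low nodes lower and high nodes higher.
byz_values = [-100.0, -100.0, 100.0, 100.0]
dropped = [3, 3, 0, 0]

after = contraction_round(honest, byz_values, dropped)
```

Even under this adversarial schedule the honest models end up closer together than they started, which is the property the "reduced diameter" build appears to illustrate.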

  57. Key approach: last remark. [Figure: the toy example with models 1 to 4 "& one" further node, each model annotated "×2".]
