  1. Numerically Stable Binary Gradient Coding Neophytos Charalambides Hessam Mahdavifar Alfred Hero Department of Electrical Engineering and Computer Science, University of Michigan June, 2020 1 / 21

  2. Outline for section 1 Introduction and Motivation Gradient Coding Problem Setup Binary Scheme Allocation to Heterogeneous Workers 2 / 21

  3. Issues and Motivation — Introduction and Motivation
    Machine Learning Today: the Curse of Dimensionality
    ◮ Large datasets — many samples
    ◮ Complex datasets — high dimension
    ◮ Problems become intractable
    Use distributed methods:
    ◮ Distribute smaller computation assignments
    ◮ Multiple servers complete separate tasks
    Drawbacks of Distributed Synchronous Computations:
    ◮ Requires all servers to respond — communication overhead
    ◮ What if stragglers are present?
    ◮ Stragglers — servers that are delayed or non-responsive
    3 / 21

  4. Gradient Coding 1 — Introduction and Motivation
    1. Speed up distributed computation — gradient methods
    2. Mitigate stragglers
    1 R. Tandon et al. "Gradient Coding: Avoiding Stragglers in Synchronous Gradient Descent". In: stat 1050 (2017), p. 8. 4 / 21

  5. Benefits of our Binary Scheme — Introduction and Motivation
    Few schemes deal with exact recovery. Common issues with current exact-recovery schemes:
    1. construct and search through a decoding matrix A ∈ R^((n choose s) × n)
    2. storage issues, and further delay
    3. work over R and C — further numerical instability
    4. have the strict assumption that (s + 1) | n
    Our scheme:
    1. faster online decoding
    2. only deals with {0, 1} encodings — viewed as "task assignments"
    3. ... this makes encoding and decoding numerically stable
    4. works for any pair s, n
    5. ... we extend our construction to heterogeneous workers also
    5 / 21

  6. Outline for section 2 Introduction and Motivation Gradient Coding Problem Setup Binary Scheme Allocation to Heterogeneous Workers 6 / 21

  7. Distributed Gradient Descent — Gradient Coding
    ◮ Dataset D = {(x_i, y_i)}_{i=1}^N ⊂ R^p × R, or X ∈ R^{N×p}, y ∈ R^N
    ◮ Partition D = ∪_{j=1}^k D_j, s.t. D_i ∩ D_j = ∅ and |D_j| = N/k
    ◮ Partial gradients g_j — gradient on D_j
    ◮ Minimize the loss L(D; θ) = Σ_{j=1}^k ℓ(D_j; θ)
    ◮ Gradient descent updates: θ^(t+1) = θ^(t) − α_t g^(t)
    ◮ g^(t) = ∇_θ L(D; θ^(t)) = Σ_{j=1}^k ∇_θ ℓ(D_j; θ^(t)) = Σ_{j=1}^k g_j^(t)
    ◮ the additive structure allows g^(t) to be computed in parallel!
    7 / 21
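The additive structure on this slide can be checked in a few lines. A minimal numpy sketch (the least-squares loss, sizes, and all names are illustrative assumptions, not the talk's code): summing the k partial gradients reproduces the full gradient exactly.

```python
import numpy as np

def partial_gradient(X_j, y_j, theta):
    # Gradient of the squared loss ||X_j @ theta - y_j||^2 on partition D_j
    return 2.0 * X_j.T @ (X_j @ theta - y_j)

rng = np.random.default_rng(0)
N, p, k = 12, 4, 3
X, y = rng.normal(size=(N, p)), rng.normal(size=N)
theta = rng.normal(size=p)

# Partition the dataset into k equal pieces D_1, ..., D_k
Xs, ys = np.array_split(X, k), np.array_split(y, k)

# Each g_j could be computed by a different worker; the master only adds
g_sum = sum(partial_gradient(Xj, yj, theta) for Xj, yj in zip(Xs, ys))
g_full = 2.0 * X.T @ (X @ theta - y)
assert np.allclose(g_sum, g_full)  # additive structure
```

Each worker only ever sees its own (X_j, y_j), which is what makes the straggler problem on the next slide relevant.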

  8. Synchronous Distributed Computation Gradient Coding ◮ Execute gradient descent distributively ◮ Need all workers to respond Figure: Need all responses — g = g 1 + g 2 + g 3 8 / 21

  9. Table of Contents Introduction and Motivation Gradient Coding Problem Setup Binary Scheme Allocation to Heterogeneous Workers 9 / 21

  10. General Setup Problem Setup 10 / 21

  11. Encoding matrix — Problem Setup
    ◮ Rows: workers {W_i}_{i=1}^n; b_i = encoding vector for W_i
    ◮ Columns: partitions {D_j}_{j=1}^k
    1. nonzero entries: assigned partitions
    2. redundancy in assigned D_j 's
    ◮ Stragglers ≡ erasing rows of B
    11 / 21
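A toy illustration of "stragglers ≡ erasing rows of B" (the 4×4 matrix and worker indices here are made up for illustration): redundancy in the assignments must let the surviving rows still superpose to the all-ones vector.

```python
import numpy as np

# Hypothetical tiny instance: n = 4 workers, k = 4 partitions, s = 1 straggler.
# Row b_i lists the partitions assigned to worker W_i.
B = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])

straggler = 1                              # worker W_2 is slow
surviving = np.delete(B, straggler, axis=0)  # a straggler erases its row

# Rows 0 and 2 of the surviving matrix still cover every partition exactly once
a = np.array([1, 0, 1])
assert (a @ surviving == np.ones(4)).all()
```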

  12. Table of Contents Introduction and Motivation Gradient Coding Problem Setup Binary Scheme Allocation to Heterogeneous Workers 12 / 21

  13. Example of our Binary Scheme — Binary Scheme
    n = k = 11, s = 3 ⇒ r ≡ n ≡ 3 mod (s + 1)
    Workers of the first r congruence classes form B_1, the remaining (s + 1 − r) classes form B_2:
    ◮ B_1 ∈ {0, 1}^{9×11}: six rows of support 4 followed by three rows of support 3, with consecutive supports — close to block diagonal
    ◮ B_2 ∈ {0, 1}^{2×11}: two rows of supports 6 and 5, together covering all 11 partitions
    13 / 21

  14. Example — Encoding and Decoding — Binary Scheme
    Decoding: only take the received workers of the same color (congruence class).
    Example: the workers indexed by I = {2, 6, 10} form a complete residue system; summing their encoding rows gives a_I^T B = 1_{1×11}, where a_I ∈ {0, 1}^{11} is the indicator vector of I.
    [The slide shows B ∈ {0, 1}^{11×11} with the three selected rows highlighted.]
    14 / 21

  15. Main Idea of Our Binary Scheme — Binary Scheme
    ◮ Have B as sparse as possible ⇒ nnzr(B) = k · (s + 1)
    ◮ Work with congruence classes mod (s + 1)
    ◮ the superposition of the rows of each class results in 1_{1×k}
    ◮ Allocate tasks s.t. ‖b_i‖_0 ≃ ‖b_j‖_0 for all i, j ∈ {1, ..., n}, while satisfying the above two constraints
    ◮ Formally, construct B that is a solution to
        min_{B ∈ N_0^{n×k}} Σ_{i=1}^n | ‖b_i‖_0 − (s + 1) · k/n |   s.t.   nnzr(B) = k · (s + 1)
    ◮ Intuition: B is close to being block diagonal
    15 / 21
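The objective above asks every row support ‖b_i‖_0 to sit as close as possible to the average (s + 1)·k/n. A short sketch of that balanced load profile (function name and example sizes are illustrative; the actual B_1/B_2 construction splits the loads per congruence class, so the profile below is only the target of the objective):

```python
def balanced_loads(n, k, s):
    # Distribute the k*(s+1) total nonzeros of B over n rows so that
    # every row support ||b_i||_0 is within 1 of the average (s+1)*k/n.
    total = k * (s + 1)
    base, extra = divmod(total, n)
    return [base + 1] * extra + [base] * (n - extra)

loads = balanced_loads(n=11, k=11, s=3)
assert sum(loads) == 44                  # nnzr(B) = k*(s+1)
assert max(loads) - min(loads) <= 1      # objective value is minimal
```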

  16. Construction and Decoding — Binary Scheme
    ◮ Congruence classes C_1 = {[i]}_{i=0}^{r−1} and C_2 = {[i]}_{i=r}^{s}:
    1. r ≡ n mod (s + 1)
    2. classes within C_1 (resp. C_2) are assigned tasks identically
    3. within each of C_1, C_2, the class cardinalities do not differ by more than one
    4. construct B_1 and B_2 accordingly
    ◮ B = aggregation of B_1 and B_2
    ◮ Decoding: by the pigeonhole principle, for any f = n − s responsive workers, at least one complete residue system is present
    16 / 21
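The pigeonhole claim can be verified exhaustively on a small homogeneous instance. A sketch assuming (s + 1) | n and k = n, with illustrative sizes: for every possible straggler pattern, some residue class mod (s + 1) survives completely, and summing its rows recovers 1_{1×k}.

```python
import numpy as np
from itertools import combinations

# Homogeneous toy case: B is block diagonal, blocks of (s+1) identical rows.
n, s = 12, 3
ell = n // (s + 1)                                   # number of blocks
B = np.kron(np.eye(ell, dtype=int), np.ones((s + 1, s + 1), dtype=int))

for survivors in combinations(range(n), n - s):      # every straggler pattern
    S = set(survivors)
    # residue classes i whose workers i, i+(s+1), ..., i+(ell-1)(s+1) all survived
    ok = [i for i in range(s + 1)
          if all(j * (s + 1) + i in S for j in range(ell))]
    assert ok                                        # pigeonhole: at least one
    a = np.zeros(n, dtype=int)
    a[[j * (s + 1) + ok[0] for j in range(ell)]] = 1
    assert (a @ B == np.ones(n, dtype=int)).all()    # exact recovery
```

The s stragglers can each break at most one of the s + 1 residue classes, so one class is always complete — exactly the pigeonhole argument of the slide.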

  17. Larger Example: n = k = 165 and s = 15 ⇒ r = 5 — Binary Scheme
    Do not want a lot of redundancy — close to block diagonal
    17 / 21

  18. Outline for section 3 Introduction and Motivation Gradient Coding Problem Setup Binary Scheme Allocation to Heterogeneous Workers 18 / 21

  19. Setup a Linear System — Allocation to Heterogeneous Workers
    ◮ Assume two groups of different machines T_1, T_2, s.t.: t_i = E[time for a worker of T_i to compute g_j], with t_1 ≠ t_2
    ◮ Goal: the same expected completion time for each worker
    ◮ Let |J_{T_i}| = # of partitions allocated to each of T_i's workers
    ◮ Let |T_i| = τ_i, with τ_1 = (α/β) · τ_2
    Solve the linear system:
    1. t_1 · |J_{T_1}| = t_2 · |J_{T_2}|
    2. |J_{T_1}| · τ_1 + |J_{T_2}| · τ_2 = (s + 1) · k
    3. τ_2 = (β/α) · τ_1
    19 / 21
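Equations 1 and 2 above form a 2×2 linear system in the unknowns |J_{T_1}|, |J_{T_2}| once t_i, τ_i, s, k are fixed. A sketch with made-up numbers (all values below are illustrative, not from the talk):

```python
import numpy as np

def allocation(t1, t2, tau1, tau2, s, k):
    # Solve  t1*J1 - t2*J2 = 0  and  J1*tau1 + J2*tau2 = (s+1)*k  for (J1, J2)
    A = np.array([[t1, -t2], [tau1, tau2]], dtype=float)
    b = np.array([0.0, (s + 1) * k])
    return np.linalg.solve(A, b)

# t1 = 1, t2 = 2: type-1 machines are twice as fast; tau1 = 2, tau2 = 4 workers
J1, J2 = allocation(t1=1, t2=2, tau1=2, tau2=4, s=3, k=16)
# J1 = 16, J2 = 8: the faster workers get twice the partitions,
# so t1*J1 = t2*J2 and every worker has the same expected finish time
```

In general the solution need not be integral, so an actual assignment would round while preserving the total (s + 1)·k.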

  20. Main Takeaways of Our Scheme ◮ Gave a simple gradient coding scheme ◮ Faster online decoding ◮ Numerically stable in encoding and decoding ◮ Works for any pair s , n ◮ Extended it to accommodate heterogeneous workers also 20 / 21

  21. Thank you for your attention!

  22. Outline for section 4 Additional Slides Details of the constructions Explicit Algorithms 22 / 21

  23. Idea Behind Binary Scheme — Details of the constructions
    ◮ When (s + 1) | n and k = n — B is block diagonal
    ◮ each block of (s + 1) workers is assigned the same partitions, repeated over the ℓ = n/(s+1) blocks
    ◮ For (s + 1) ∤ n, each worker within a block of (s + 1) rows corresponds to a distinct congruence class (c.c.) mod (s + 1)
    ◮ When any f = n − s workers send their computations, at least one congruence class is met in every block — pigeonhole
    ◮ ∃ i ∈ Z/(s + 1)Z s.t. i + j(s + 1) ∈ I, for all j = 0, 1, ..., ℓ − 1
    ◮ the received workers of that class "always form a coset"
    ◮ Decoding: select any such i, and sum the vectors received by the workers of c.c. i — a^T = Σ_{j=0}^{ℓ−1} e_{i+j(s+1)}
    ◮ Want an "even" (balanced) number of assignments — homogeneous servers
    23 / 21
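The coset decoding vector a^T = Σ_{j=0}^{ℓ−1} e_{i+j(s+1)} can be written out directly in the block-diagonal case (sizes and the chosen residue i are illustrative):

```python
import numpy as np

n, s, i = 12, 3, 2                   # pick residue class i = 2 (illustrative)
ell = n // (s + 1)                   # number of blocks
B = np.kron(np.eye(ell, dtype=int), np.ones((s + 1, s + 1), dtype=int))

# a^T = sum_j e_{i + j(s+1)}: one representative per block, all of residue i
a = np.zeros(n, dtype=int)
for j in range(ell):
    a[i + j * (s + 1)] = 1           # e_{i + j(s+1)}
assert (a @ B == 1).all()            # a^T B = 1_{1 x k}
```

Since the entries of a and B are all in {0, 1}, this decoding involves no division or matrix inversion, which is the source of the numerical stability claimed earlier.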

  24. Binary Scheme when (s + 1) ∤ n — Details of the constructions
    ◮ Determine the integer parameters:
        n = ℓ · (s + 1) + r,   0 ≤ r < s + 1
        r = t · ℓ + q,   0 ≤ q < ℓ
        n = λ · (ℓ + 1) + r̃,   0 ≤ r̃ < ℓ + 1
    ◮ Define: C_1 := {[i]_{s+1}}_{i=0}^{r−1} and C_2 := {[i]_{s+1}}_{i=r}^{s}
    ◮ workers of C_1 lie in all (ℓ + 1) blocks, while those of C_2 lie in the first ℓ
    ◮ C_1 load: {s + 1, s} if ℓ + r > s, o.w. {λ + 1, λ}
    ◮ C_2 load: {s + t + 2, s + t + 1} if q > 0, o.w. all have s + t + 1
    24 / 21
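The three Euclidean divisions above determine all the integer parameters from n and s alone. A sketch (function name is illustrative), checked against the deck's two examples — n = 11, s = 3 from slide 13 and n = 165, s = 15 ⇒ r = 5 from slide 17:

```python
def scheme_parameters(n, s):
    # n = ell*(s+1) + r,      0 <= r < s+1
    # r = t*ell + q,          0 <= q < ell
    # n = lam*(ell+1) + r~,   0 <= r~ < ell+1
    ell, r = divmod(n, s + 1)
    t, q = divmod(r, ell)
    lam, r_tilde = divmod(n, ell + 1)
    return ell, r, t, q, lam, r_tilde

assert scheme_parameters(11, 3) == (2, 3, 1, 1, 3, 2)
assert scheme_parameters(165, 15)[1] == 5     # the larger example: r = 5
```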
