Coding for Distributed Computing

Albin Severinson†‡, Alexandre Graell i Amat†, and Eirik Rosnes‡
† Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
‡ University of Bergen / Simula Research Lab, Bergen, Norway

Finse, May 09, 2018
Outline: Introduction · Block-diagonal coding · LT code-based scheme · Numerical results · Conclusion · One More Thing...

Motivation

[Figure: a master node connected to servers S1, S2, ..., SK over a shared communication bus.]

Challenges
• Straggler problem: slow servers may induce a large computational delay.
• Bandwidth scarcity: the communication load must be reduced.

Problem addressed: matrix multiplication
• Given an m × n matrix A and N vectors x1, ..., xN, we want to compute y1 = Ax1, ..., yN = AxN using K servers.
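As a baseline for the problem statement above, the following sketch (our own illustration, not part of the talk) shows the uncoded approach: split A row-wise across K servers, let each server multiply its block by every input vector, and let the master stack the partial results.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                      # number of servers
m, n, N = 6, 4, 2          # matrix dimensions and number of input vectors

A = rng.standard_normal((m, n))
xs = [rng.standard_normal(n) for _ in range(N)]

# Uncoded baseline: split A row-wise into K blocks, one per server.
blocks = np.array_split(A, K, axis=0)

# Map phase: each server multiplies its block by every input vector.
partials = [[Ak @ x for x in xs] for Ak in blocks]

# The master stacks the partial products to recover y_i = A x_i.
ys = [np.concatenate([partials[k][i] for k in range(K)]) for i in range(N)]
```

With no redundancy, the master must wait for all K servers, which is exactly what makes stragglers costly.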
Bandwidth Scarcity (Coded MapReduce, Li et al., 2015)

Goal: compute y1 = Ax1, y2 = Ax2, y3 = Ax3 with three servers.

[Figure: A is split row-wise into A1, A2, A3. Server S1 stores A1 and A3 and still needs A2x1; S2 stores A1 and A2 and needs A3x2; S3 stores A2 and A3 and needs A1x3. A single multicast of the coded packet A3x2 ⊕ A1x3 from S1 serves both S2 and S3, since each can cancel the term it can compute locally.]
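The coded-multicast idea in the figure can be sketched as follows (our own toy illustration with integer data, so that XOR-based coding is exact; the storage assignment follows the three-server example above).

```python
import numpy as np

rng = np.random.default_rng(1)
# Integer data so that XOR-based coding cancels exactly.
A = rng.integers(0, 256, size=(6, 4))
A1, A2, A3 = np.array_split(A, 3)
x1, x2, x3 = (rng.integers(0, 256, size=4) for _ in range(3))

# Map phase: S1 stores (A1, A3), S2 stores (A1, A2), S3 stores (A2, A3),
# and server S_i is responsible for reducing y_i = A x_i.
# After mapping, S1 still needs A2 x1, S2 needs A3 x2, S3 needs A1 x3.

# S1 holds both A3 and A1, so one coded multicast serves S2 and S3 at once:
packet = (A3 @ x2) ^ (A1 @ x3)

# S2 cancels A1 x3 (computable locally from A1) and recovers A3 x2;
# S3 cancels A3 x2 and recovers A1 x3.
recovered_S2 = packet ^ (A1 @ x3)
recovered_S3 = packet ^ (A3 @ x2)
```

One coded transmission replaces two uncoded ones, which is the source of the communication-load savings.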
The Straggler Problem (Speeding up Distributed Machine Learning Using Codes, Lee et al., 2016)

[Figure: two timelines for servers S1, S2, S3 marking when tasks 1, 2, and 3 complete. Without coding, the overall delay is set by the slowest server; with coding, the computation finishes once enough tasks have completed.]
The Straggler Problem

Example: y = Ax. Split A into A1 and A2, assign A1 × x to server S1, A2 × x to S2, and (A1 + A2) × x to S3. The master can decode Ax from the results of any two of the three servers.

In general
• Introduce redundancy by encoding the input matrix A.
• Each server is given more work. However, this may still lower the computational delay!
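The three-server example above can be sketched directly (our own illustration): with the parity block A1 + A2, the result of any straggling server can be recovered from the other two.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

A1, A2 = A[: m // 2], A[m // 2 :]
# Three servers hold A1, A2, and the parity block A1 + A2.
server_blocks = [A1, A2, A1 + A2]
results = [B @ x for B in server_blocks]   # each server's partial product

# Suppose server S2 (holding A2) straggles: recover A2 x from the
# other two results, since (A1 + A2)x - A1 x = A2 x.
A2x = results[2] - results[0]
y = np.concatenate([results[0], A2x])
```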
Coding for distributed computing

• [Lee et al. '17]: Introduce redundant computations using MDS codes to alleviate the straggler problem.
• [Li, Maddah-Ali, Avestimehr '17]: A fundamental tradeoff between computational delay and communication load. A unified coding framework trading higher computational delay for lower communication load.

[Figure: communication load versus computational delay, illustrating the tradeoff curve.]
Unified coding framework [Li, Maddah-Ali, Avestimehr '17]

• Encode the columns of A ∈ F^(m × n) using an (r, m) MDS code by multiplying A by an r × m encoding matrix Ψ_MDS, i.e., C = Ψ_MDS A.
• The code length r is proportional to the number of rows of A → high overall delay!

[Figure: communication load and computational delay versus the number of servers K, with and without the encoding/decoding delay.]

• A with n = 10000 columns and m = 2000K/3 rows, N = 2000K/3 vectors, and code rate 2/3 (2000 rows assigned to each server).
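A minimal sketch of MDS-coded computation (our own illustration, using a real-valued Vandermonde matrix as the encoder, which behaves like an MDS code over the reals because any m of its rows are invertible): the master decodes y = Ax from the m fastest of the r coded results.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 4, 3, 6                 # (r, m) code: r coded rows from m data rows
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Vandermonde encoding matrix with distinct evaluation points: any m of
# its r rows form an invertible m x m matrix (MDS-like over the reals).
evals = np.arange(1, r + 1, dtype=float)
Psi = np.vander(evals, m, increasing=True)    # r x m
C = Psi @ A                                   # r x n coded matrix

# Each coded row yields one inner product (C x)_k; the master decodes
# y = A x from ANY m of the r results, here rows 0, 1, 3, 4.
fastest = [0, 1, 3, 4]
y = np.linalg.solve(Psi[fastest], C[fastest] @ x)
```

The downside noted on the slide is visible here: Ψ_MDS has r rows, so the encoding and decoding cost grows with the full code length r.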
In this talk

Two coding schemes to reduce the overall computational delay
• Block-diagonal coding scheme, based on a block-diagonal encoding matrix and shorter MDS codes.
• LT code-based scheme under inactivation decoding.

Outcome
• Block-diagonal coding scheme: significantly lower overall computational delay than the scheme by [Li, Maddah-Ali, Avestimehr '17], with little or no impact on the communication load.
• LT code-based scheme: very good performance when a deadline must be met with high probability, at the expense of an increased communication load.
Block-diagonal coding scheme

Idea
• Partition A into T disjoint submatrices and apply a smaller (r/T, m/T) MDS code to each submatrix:

  C = Ψ_BDC A,   Ψ_BDC = diag(ψ_1, ..., ψ_T),   ψ_i: (r/T, m/T) MDS code.

  For example, with T = 3:

  Ψ_BDC A = [ψ_1 0 0; 0 ψ_2 0; 0 0 ψ_3] [A_1; A_2; A_3] = [ψ_1 A_1; ψ_2 A_2; ψ_3 A_3],
  with dimensions (r × m)(m × n) = (r × n).

• Any m/T out of the r/T coded rows of each partition suffice to decode.
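The block-diagonal construction can be sketched as follows (our own illustration, again using small Vandermonde matrices as stand-ins for the per-partition MDS codes): each partition is encoded and decoded independently with a short code, which is what lowers the encoding/decoding complexity.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2                              # number of partitions
m, n, r = 8, 3, 12                 # overall (r, m) code of rate 2/3
mt, rt = m // T, r // T            # per-partition dimensions

A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# One small (r/T, m/T) Vandermonde "MDS" encoder, reused per partition.
psi = np.vander(np.arange(1, rt + 1, dtype=float), mt, increasing=True)

# Assemble the block-diagonal encoding matrix Psi_BDC = diag(psi, ..., psi).
Psi = np.zeros((r, m))
for t in range(T):
    Psi[t * rt:(t + 1) * rt, t * mt:(t + 1) * mt] = psi
C = Psi @ A

# Decoding needs ANY m/T of the r/T coded rows from EACH partition; here
# we pretend the last rt - mt coded rows of every partition straggle.
y_parts = []
for t in range(T):
    rows = np.arange(t * rt, t * rt + mt)        # first m/T rows arrive
    y_parts.append(np.linalg.solve(psi[:mt], C[rows] @ x))
y = np.concatenate(y_parts)
```

Each decode is a small m/T × m/T solve instead of one m × m solve, which is where the complexity saving comes from.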
Assignment of coded rows to servers

[Figure: an optimization solver maps the coded partitions ψ_1 A_1, ψ_2 A_2, ..., ψ_T A_T to servers S_1, ..., S_K according to an assignment strategy.]

• The coded rows must be assigned to servers carefully in some instances (e.g., when the number of servers is small).
• This assignment can be formulated as an optimization problem.
Lossless partitioning

Theorem. For T ≤ r/(K choose μq), there exists an assignment matrix such that the communication load and the computational delay (not taking the encoding/decoding delay into account) are equal to those of the unpartitioned scheme by [Li, Maddah-Ali, Avestimehr '17].

However... The overall computational delay of the block-diagonal coding scheme is much lower than that of the scheme by Li et al., owing to its lower encoding and decoding complexity.
Luby-transform code-based scheme

LT code-based scheme
• Encode A as C = Ψ_LT A, where Ψ_LT corresponds to an LT code of fixed rate.
• Decode the LT code using inactivation decoding.

Code design
• Design the LT code for a minimum overhead ε_min and a target failure probability P_f,target, such that P_f(ε_min) ≤ P_f,target.
• Increasing ε_min lowers the encoding/decoding complexity but increases the communication load and may require waiting for more servers → the optimal ε_min depends on the scenario.
• For a given ε_min and P_f,target, optimize the LT code so that the decoding complexity is minimized: for a fixed computational delay of computing Cx_1, ..., Cx_N, minimize the computational delay of the decoding phase.
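For intuition, here is a toy LT encoder/decoder of our own (not the optimized construction from the talk): degrees are drawn from an approximate ideal soliton distribution, and decoding uses plain peeling. Inactivation decoding, used in the actual scheme, additionally solves by Gaussian elimination for the symbols that peeling cannot release, trading a little complexity for a much lower failure probability.

```python
import math
import random

random.seed(5)

def soliton(k):
    """Draw a degree from (an approximation of) the ideal soliton
    distribution: P(1) = 1/k, P(d) = 1/(d(d-1)) for d >= 2."""
    u = random.random()
    if u < 1.0 / k:
        return 1
    return min(k, math.ceil(1.0 / (1.0 - u)))

def peel(buffered, decoded):
    """Cancel already-decoded symbols from every coded symbol and
    release any symbol whose degree drops to 1, until a fixed point."""
    progress = True
    while progress:
        progress = False
        remaining = []
        for nb, val in buffered:
            for i in [i for i in nb if decoded[i] is not None]:
                val ^= decoded[i]
                nb.discard(i)
            if len(nb) == 1:
                decoded[nb.pop()] = val
                progress = True
            elif nb:                     # degree >= 2: keep for later
                remaining.append((nb, val))
        buffered = remaining
    return buffered

k = 16
source = [random.randrange(256) for _ in range(k)]   # data symbols (bytes)

decoded = [None] * k
buffered, n_coded = [], 0
while any(v is None for v in decoded):
    # Encoder: XOR a uniformly chosen set of d source symbols.
    d = soliton(k)
    nb = set(random.sample(range(k), d))
    val = 0
    for i in nb:
        val ^= source[i]
    n_coded += 1
    buffered = peel(buffered + [(nb, val)], decoded)
```

The number of coded symbols consumed, `n_coded`, exceeds k by the overhead; in the scheme above, the design overhead ε_min controls this quantity.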
Computational delay and communication load

[Figure: communication load and computational delay versus the number of servers K, comparing the unified scheme [Li et al.], BDC, LT, and [Lee et al.].]

• A with n = 10000 columns and m = 2000K/3 rows. N = 2000K/3 vectors. Rate 2/3, i.e., 2000 rows assigned to each server, and m/T = 10 rows per partition.
Performance as a function of the number of partitions

[Figure: communication load and computational delay versus the number of partitions T, comparing the unified scheme [Li et al.], BDC, and LT [Lee et al.].]

• A with m = 6000 rows and n = 6000 columns, N = 6 vectors, K = 9 servers, and code rate 2/3.
Distributed computing under a deadline

[Figure: Pr(delay > t) versus t, on a logarithmic scale from 10^0 down to 10^-14, comparing uncoded, unified [Li et al.], BDC, and LT schemes.]

• A with m = 134000 rows and n = 10000 columns, N = 134000 vectors, K = 201 servers, T = 13400 partitions, and code rate 2/3.