Multicore Implementation of LDPC Decoders based on ADMM Algorithm

Multicore Implementation of LDPC Decoders based on ADMM Algorithm
Imen DEBBABI (1), Nadia KHOUJA (1), Fethi TLILI (1), Bertrand LE GAL (2) and Christophe JEGO (2)
(1) SUP’COM, GRESCOM Lab, University of Carthage, Tunisia
(2) Bordeaux-INP, IMS-lab., CNRS UMR 5218, University of Bordeaux, France
B. Le Gal, ICASSP 2016 - Implementation of Signal Processing Systems, March 23, 2016

The LP decoding for LDPC codes

Introduction to LDPC codes
๏ LDPC codes are well-known error correction codes working on blocks:
- K information bits;
- N transmitted values;
- (N-K) redundant values.
๏ The LDPC code structure is defined by a parity-check matrix H:
- Provides the VNs/CNs involved in the parity equations,
- Visually represented as a Tanner graph.
๏ State-of-the-art works for LDPC decoding are based on the message-passing (MP) algorithm:
- Propagates messages between CNs and VNs,
- The MP algorithm is iterative.

Example H matrix (rows: check nodes C0 to C3; columns: variable nodes V0 to V7):
$$
H = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
$$
Figure 1: Tanner graph representation (variable nodes V0 to V7, check nodes C0 to C3).
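As a small illustration of how H defines the Tanner graph, the sketch below builds the two neighbourhood lists from the example matrix and checks a candidate word against the parity equations. The function names `tanner_graph` and `is_codeword` are hypothetical helpers introduced for this example, not part of the presentation.

```python
import numpy as np

# Example parity-check matrix from the slide: rows are check nodes C0..C3,
# columns are variable nodes V0..V7.
H = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
])

def tanner_graph(H):
    """Return (n_c, n_v): VN indices attached to each CN, CN indices attached to each VN."""
    n_c = [np.flatnonzero(row).tolist() for row in H]
    n_v = [np.flatnonzero(col).tolist() for col in H.T]
    return n_c, n_v

def is_codeword(H, c):
    """True when every parity equation holds, i.e. H c = 0 (mod 2)."""
    return not np.any((H @ np.asarray(c, dtype=int)) % 2)

n_c, n_v = tanner_graph(H)
print("VNs in check C2:", n_c[2])
print("CNs touching V0:", n_v[0])
print("valid codeword:", is_codeword(H, [1, 1, 0, 1, 1, 0, 0, 0]))
```

Each edge of the Tanner graph corresponds to one non-zero entry of H, which is why message-passing decoders store one message per such entry.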

Related works on LDPC decoding
๏ During the last decade, many works have focused on LDPC codes. For instance:
- Finding an « efficient » approximation of the sum-product algorithm (SPA),
‣ The SPA algorithm is efficient but complex to implement,
‣ MS, OMS, NMS, 2NMS, lambda-min, ANMS, etc.
- Reducing computation complexity through different computation schedules,
‣ Flooding, TDMP, conditional activation, etc.
- Efficient implementation of LDPC decoders,
‣ Hardware (ASIC, FPGA) for efficiency,
‣ Software (CPU & GPU) for flexibility.
๏ The Linear Programming (LP) approach to LDPC decoding is a more « recent » direction.

LP decoding of LDPC codes
๏ Linear programming formulation of the LDPC decoding problem,
- First proposed by Feldman in [1],
- Huge memory & computation complexities,
- Limited to very short frames (N << 200).
๏ Interesting FER performance
- Especially in error floors (even against SPA),
- ML certificate when the frame is successfully decoded (not decoded otherwise).
๏ Lower complexity formulations,
- Initial LP ADMM algorithm [2],
- Good FER performance of ADMM-l2 against SPA [3],
- Reduced-complexity s-ADMM-l2 [4],
- Complexity increases mainly with the N, N-K and deg(Ci) parameters.
๏ LP LDPC decoding is affordable for implementation purposes.
[1] J. Feldman, Decoding Error-Correcting Codes via Linear Programming. PhD thesis, Massachusetts Institute of Technology, 2003.

LP decoding of LDPC codes
๏ Linear programming formulation of LDPC decoding,
- First proposed by Feldman in [1],
- Huge memory & computation complexities,
- Limited to very short frames (< 200 bits).
๏ Interesting FER performance
- Even against the SPA algorithm,
- ML certificate when the frame is successfully decoded (not decoded otherwise).
๏ Lower complexity formulations,
- Initial LP ADMM algorithm [2],
- Improved ADMM-l2 against SPA [3],
- Computation complexity reduction [4].
๏ LP LDPC decoding has now become realistic for implementation purposes.
Fig. 1. FER comparison of ADMM-l2 penalized decoders with SPA decoders on AWGN channel. Panels: WiMAX 576 × 288 LDPC code and WiMAX 1152 × 288 rate 0.75 B LDPC code; FER versus Eb/N0 for the SPA and ADMM-l2 curves.
[2] X. Zhang and P. H. Siegel, “Efficient iterative LP decoding of LDPC codes with alternating direction method of multipliers,” IEEE International Symposium on Information Theory (ISIT), 2013.
[3] X. Jiao, H. Wei, J. Mu, and C. Chen, “Improved ADMM penalized decoder for irregular low-density parity-check codes,” IEEE Communications Letters, June 2015.
[4] H. Wei, X. Jiao, and J. Mu, “Reduced-complexity linear programming decoding based on ADMM for LDPC codes,” IEEE Communications Letters, June 2015.
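For reference, the LP relaxation and its l2-penalized ADMM variant can be written as below. The notation is an assumption following common usage in the ADMM decoding literature rather than something stated on the slides: γ is the LLR vector, P_j selects the variables taking part in check j, and P_d denotes the parity polytope of dimension d (the convex hull of even-weight binary vectors). The penalty term is written in the form that is consistent with the -α/μ and -2α/μ terms of the variable-node update in Algorithm 1.

```latex
\begin{aligned}
\text{LP decoding:}\quad
  & \min_{x \in [0,1]^N} \; \gamma^{T} x
  \quad \text{s.t.} \quad P_j x \in \mathbb{P}_{d_{cj}} \quad \forall j \in J, \\
\text{ADMM-}\ell_2\text{:}\quad
  & \min_{x \in [0,1]^N} \; \gamma^{T} x - \alpha \sum_{i \in I} (x_i - 0.5)^2
  \quad \text{s.t.} \quad P_j x = z_j,\; z_j \in \mathbb{P}_{d_{cj}} \quad \forall j \in J.
\end{aligned}
```

The replica variables z_j are what turn the problem into per-check projections, which is the step ADMM iterates on.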

The ADMM decoding algorithm

Formulation of the ADMM decoding algorithm
๏ The ADMM algorithm is an MP-based formulation of the LP problem,
- Proposed in [2], with correction performance improved in [3],
- Traditional flooding schedule,
- The key element is the Euclidean projection;
- Formulation maintains LP properties.
๏ Based on 4 distinct kernels
- Kernel 1 initializes the decoder;
- Kernel 2 processes all VNs;
- Kernel 3 processes all CNs;
- Kernel 4 takes the hard decisions.
๏ Kernels 2 and 3 are iterated k times (# iterations)
- The computation complexity is located there.

Algorithm 1: Flooding-based ADMM-l2 algorithm.
Kernel 1: Initialization
    $\forall j \in J,\ i \in N_c(j):\ z^{(0)}_{j \to i} = 0.5,\quad \lambda^{(0)}_{j \to i} = 0$
    $\forall i \in I:\ n_i = \gamma_i / \mu$
for all $k = 1 \to q$ when stop criterion = false do
    Kernel 2: For all variable nodes in the code
    for all $i \in I,\ j \in N_v(i)$ do
        $t_i^{(k)} = \sum_{j \in N_v(i)} \left( z^{(k-1)}_{j \to i} - \lambda^{(k-1)}_{j \to i} \right)$
        $L^{(k)}_{i \to j} = \Pi_{[0,1]}\left( \frac{1}{d_{vi} - \frac{2\alpha}{\mu}} \left( t_i^{(k)} - n_i - \frac{\alpha}{\mu} \right) \right)$
    end for
    Kernel 3: For all check nodes in the code
    for all $j \in J,\ i \in N_c(j)$ do
        $z^{(k)}_{j \to i} = \Pi_{\mathbb{P}_{d_{cj}}}\left[ \rho L^{(k)}_{i \to j} + (1-\rho)\, z^{(k-1)}_{j \to i} + \lambda^{(k-1)}_{j \to i} \right]$
        $\lambda^{(k)}_{j \to i} = \lambda^{(k-1)}_{j \to i} + \rho L^{(k)}_{i \to j} + (1-\rho)\, z^{(k-1)}_{j \to i} - z^{(k)}_{j \to i}$
    end for
end for
Kernel 4: Hard decisions from soft-values
    $\forall i \in I:\ \hat{c}_i = \left( \sum_{j \in N_v(i)} L_{i \to j} \right) > 0.5$

[2] X. Zhang and P. H. Siegel, “Efficient iterative LP decoding of LDPC codes with alternating direction method of multipliers,” IEEE International Symposium on Information Theory (ISIT), 2013.
[3] X. Jiao, H. Wei, J. Mu, and C. Chen, “Improved ADMM penalized decoder for irregular low-density parity-check codes,” IEEE Communications Letters, June 2015.

The VN and CN computation kernels
CN kernel (check node j connected to VNs 1, 2, 3, ..., n): two « messages », (λ, z), per VN
$$\omega_i = \rho \times L^{k}_{i \to j} + (1-\rho)\, z_j^{(k-1)} + \lambda_j^{(k-1)}$$
$$z = \Pi_{\mathbb{P}_{d_{cj}}}(\omega)$$
$$\lambda^{k}_{j \to i} = \omega_i - z_i$$
VN kernel (variable node i connected to CNs 1, 2, 3, ..., n): one broadcasted message, L
$$L^{(k)}_{i \to j} = \Pi_{[0,1]}\left( \frac{\sum_{j} (z_j - \lambda_j) - \frac{LLR_i}{\mu} - \frac{\alpha}{\mu}}{\deg_{VN} - \frac{2\alpha}{\mu}} \right)$$
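A per-node view of the VN kernel, written with the (z - λ) sign convention of Algorithm 1, can be sketched as below. The function name `vn_kernel` and the default values of `mu` and `alpha` are assumptions chosen for illustration only.

```python
def vn_kernel(z_msgs, lam_msgs, llr, mu=3.0, alpha=0.8):
    """Compute the single value L that one variable node broadcasts to its check nodes.

    z_msgs / lam_msgs: the (z, lambda) pairs received from each connected CN.
    llr: channel LLR of this variable node.
    """
    deg = len(z_msgs)
    # Accumulate the incoming check-node messages.
    t = sum(z - lam for z, lam in zip(z_msgs, lam_msgs))
    # Penalized update followed by the projection onto [0, 1].
    x = (t - llr / mu - alpha / mu) / (deg - 2.0 * alpha / mu)
    return min(1.0, max(0.0, x))

# Degree-2 variable node at initialization (z = 0.5, lambda = 0).
print(vn_kernel([0.5, 0.5], [0.0, 0.0], llr=2.0))
print(vn_kernel([0.5, 0.5], [0.0, 0.0], llr=-6.0))
```

Because the same value is sent on every edge, a VN needs to store only one output per node, whereas a CN keeps a (λ, z) pair per edge.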

