Learned Scheduling of LDPC Decoders Based on Multi-armed Bandits
Salman Habib, Allison Beemer, and Jörg Kliewer
The Center for Wireless Information Processing, New Jersey Institute of Technology
June 2020 IEEE International Symposium on Information Theory
Background: Reinforcement Learning (RL)
• RL is a framework for learning sequential decision-making tasks [Sutton, 84], [Sutton, Barto, 18]
• Typical applications include robotics, resource management in computer clusters, and video games
Background: The MAB Problem
[Image credit: Jeremy Zhang, Reinforcement Learning — Multi-Arm Bandit Implementation]
• The MAB problem refers to a special RL task
• A gambler (the learner) must decide which arm of a multi-armed slot machine to pull next, with the goal of maximizing the total reward over a sequence of pulls [Gittins, 79] (see the sketch below)
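To make the setting concrete, here is a minimal sketch of an epsilon-greedy bandit learner; the payout probabilities, exploration rate, and all variable names are illustrative assumptions, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = np.array([0.2, 0.5, 0.8])   # hidden per-arm reward probabilities
Q = np.zeros(3)                       # running action-value estimates
N = np.zeros(3)                       # pull counts
eps = 0.1                             # exploration rate

for t in range(1000):
    # explore with probability eps, otherwise exploit the best estimate
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q))
    r = float(rng.random() < p_true[a])   # Bernoulli reward for this pull
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]             # incremental mean update

print(Q)  # approaches p_true, so argmax identifies the best arm
```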
Background: LDPC Decoding
[Figure: Tanner graph with check nodes (CNs) and variable nodes (VNs)]
• The traditional flooding scheme first updates all the check nodes (CNs) and then all the variable nodes (VNs) in the same iteration
• In comparison, sequential decoding schemes update a single node per iteration and converge faster than flooding [Kfir, Kanter, 03] (a schematic contrast follows below)
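For concreteness, a self-contained toy showing one flooding iteration of a min-sum decoder on a small parity-check matrix; the matrix H, the LLR values, and the choice of the min-sum variant are illustrative assumptions, not taken from the talk.

```python
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])
m, n = H.shape
L = np.array([-1.2, 0.8, -0.3, 2.1, -0.9, 1.5])  # channel LLRs (illustrative)
V2C = H * L                       # initial VN-to-CN messages on each edge
C2V = np.zeros((m, n))

def flooding_iteration(V2C, C2V):
    # 1) update ALL check nodes first (min-sum rule) ...
    for a in range(m):
        for v in np.flatnonzero(H[a]):
            others = [u for u in np.flatnonzero(H[a]) if u != v]
            C2V[a, v] = (np.prod(np.sign(V2C[a, others]))
                         * np.abs(V2C[a, others]).min())
    # 2) ... then update ALL variable nodes in the same iteration
    for v in range(n):
        for a in np.flatnonzero(H[:, v]):
            others = [b for b in np.flatnonzero(H[:, v]) if b != a]
            V2C[a, v] = L[v] + C2V[others, v].sum()
    return V2C, C2V

V2C, C2V = flooding_iteration(V2C, C2V)
print(np.sign(L + C2V.sum(axis=0)))  # hard decisions after one iteration
```

A sequential schedule would instead run the CN update for a single chosen CN per iteration and immediately propagate only the affected VN-to-CN messages.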
Background: Sequential Scheduling
[Figure: Tanner graph; one CN and its neighboring VNs scheduled in iteration 1, another in iteration 2]
• Sequential LDPC decoding: only one CN (and its neighboring VNs) is scheduled per iteration
• Node-wise scheduling (NS) uses the CN residual $r_{m_{a \to v}} \triangleq |m'_{a \to v} - m_{a \to v}|$ as the scheduling criterion [Casado et al., 10]
• The higher the residual, the less reliable the message; propagating such messages first leads to faster decoder convergence
• Disadvantage: the residual calculation makes NS more complex than the flooding scheme for the same total number of propagated messages (see the sketch below)
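A sketch of the NS criterion: compute the residuals $|m'_{a \to v} - m_{a \to v}|$ for all edges and schedule the CN holding the largest one. The dense message tables and the function name are illustrative; a real decoder would exploit the Tanner-graph sparsity.

```python
import numpy as np

def ns_pick_cn(msgs_old, msgs_new, mask):
    """Return the index of the CN with the largest message residual.

    mask[a, v] is True where CN a and VN v are connected in H.
    """
    residuals = np.abs(msgs_new - msgs_old)
    residuals[~mask] = -np.inf            # ignore non-edges
    per_cn = residuals.max(axis=1)        # max residual per CN
    return int(np.argmax(per_cn))

# toy usage: CN 1 holds the largest change, so it is scheduled next
mask = np.array([[1, 1, 0], [1, 0, 1]], dtype=bool)
old = np.zeros((2, 3))
new = np.array([[0.2, -0.1, 0.0], [0.9, 0.0, -0.4]])
print(ns_pick_cn(old, new, mask))   # -> 1
```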
Motivation
• [Nachmani, Marciano, Lugosch, Gross, Burshtein, Be’ery, 17] Deep learning for improved decoding of linear codes
• [Carpi, Häger, Martalò, Raheli, Pfister, 19] Deep RL for channel coding based on hard-decision decoding

In this work: a MAB-based sequential CN scheduling (MAB-NS) scheme for soft decoding of LDPC codes
• Obviates real-time calculation of CN residuals
• Utilizes a novel clustering scheme to significantly reduce the learning complexity induced by soft decoding
The Proposed MAB Framework
[Figure: Tanner graph with CNs and VNs]
• The NS scheme is modeled as a finite Markov decision process (MDP)
• The action $A_\ell$ denotes the index of the scheduled CN out of the $m$ CNs (arms) in iteration $\ell$
• A quantized syndrome vector $\mathbf{S}_\ell = [S_\ell^{(0)}, \ldots, S_\ell^{(m-1)}]$ represents the state of the MDP in iteration $\ell$
• Each decision leads to a future state $s' \in \mathcal{S}^{(M)}$ and a reward $R_a = \max_{v \in \mathcal{N}(a)} r_{m_{a \to v}}$, both of which depend on the current state $s$ and action $a$ (see the sketch below)
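A sketch of these MDP ingredients: a uniform $M$-level quantizer standing in for $g_M$ (the talk specifies only that $g_M$ has $M$ levels, so the thresholds here are an assumption) and the residual-based reward $R_a$.

```python
import numpy as np

def g_M(soft_syndrome, M=4, clip=10.0):
    """Assumed uniform M-level quantizer over a clipped range."""
    x = np.clip(soft_syndrome, -clip, clip)
    edges = np.linspace(-clip, clip, M + 1)[1:-1]   # M-1 interior thresholds
    return np.digitize(x, edges)                    # entries in {0, ..., M-1}

def reward(a, msgs_old, msgs_new, mask):
    """R_a = max over neighboring VNs v of |m'_{a->v} - m_{a->v}|."""
    res = np.abs(msgs_new[a] - msgs_old[a])
    return float(res[mask[a]].max())

# toy usage: quantize a soft syndrome for m = 3 CNs into the state S_ell
print(g_M(np.array([-7.5, 0.4, 12.0])))   # -> [0 2 3]
```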
Solving the MAB Problem
• Compute an action-value known as the Gittins index (GI), under the assumption that all CNs (arms) are independent [Gittins, 79]
• Alternatively, utilize Q-learning, a model-free approach for estimating the action-value of a CN [Watkins, 89]
• The learning complexity of this method grows exponentially with the number of CNs, since the state space contains all $M$-level quantized length-$m$ syndromes
• Solution: group the CNs into clusters with separate state and action spaces (see the sketch below)
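A minimal sketch of tabular Q-learning with CN clustering: each cluster keeps its own Q-table over (local quantized-syndrome state, local CN index), so the table size grows as $M^k$ for cluster size $k$ instead of $M^m$ for all $m$ CNs. The hyperparameters and sizes below are illustrative assumptions.

```python
import numpy as np

class ClusterQLearner:
    def __init__(self, n_states, n_arms, alpha=0.1, gamma=0.9):
        self.Q = np.zeros((n_states, n_arms))
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next):
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        td_target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])

    def greedy(self, s):
        return int(np.argmax(self.Q[s]))

# e.g., m = 8 CNs in 2 clusters of k = 4, with M = 4 levels: 4**4 states each
learners = [ClusterQLearner(n_states=4**4, n_arms=4) for _ in range(2)]
learners[0].update(s=0, a=2, r=1.3, s_next=17)   # one training transition
```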
The MAB-NS Algorithm
• $\mathbf{L}$ is a vector of channel log-likelihood ratios (LLRs); $\mathbf{H}$ is the parity-check matrix of an LDPC code
• Steps 10-25 represent NS (no residuals are computed at decode time)
• An optimized CN scheduling policy, learned by solving the MAB problem, is invoked in Step 12

Input: $\mathbf{L}$, $\mathbf{H}$
Output: reconstructed codeword
1: Initialization:
2:   $\ell \leftarrow 0$
3:   $m_{c \to v} \leftarrow 0$   // for all CN-to-VN messages
4:   $m_{v_i \to c} \leftarrow L_i$   // for all VN-to-CN messages
5:   $\hat{\mathbf{L}}_\ell \leftarrow \mathbf{L}$
6:   $\hat{\mathbf{S}}_\ell \leftarrow \mathbf{H} \hat{\mathbf{L}}_\ell$
7: foreach $a \in [[m]]$ do
8:   $S_\ell^{(a)} \leftarrow g_M(\hat{s}_\ell^{(a)})$   // $M$-level quantization
9: end
// decoding starts
10: while stopping condition not satisfied and $\ell < \ell_{\max}$ do
11:   $s \leftarrow$ index of $\mathbf{S}_\ell$
12:   update CN $a$ according to the optimized scheduling policy
13:   foreach $v_k \in \mathcal{N}(a)$ do
14:     compute and propagate $m_{a \to v_k}$
15:     foreach $c_j \in \mathcal{N}(v_k) \setminus a$ do
16:       compute and propagate $m_{v_k \to c_j}$
17:     end
18:     $\hat{L}_\ell^{(k)} \leftarrow \sum_{c \in \mathcal{N}(v_k)} m_{c \to v_k} + L_k$   // update LLR of $v_k$
19:   end
20:   foreach CN $j$ that is a neighbor of some $v_k \in \mathcal{N}(a)$ do
21:     $\hat{s}_\ell^{(j)} \leftarrow \sum_{v_i \in \mathcal{N}(j)} \hat{L}_\ell^{(i)}$
22:     $S_\ell^{(j)} \leftarrow g_M(\hat{s}_\ell^{(j)})$   // update syndrome $\mathbf{S}_\ell$
23:   end
24:   $\ell \leftarrow \ell + 1$   // update iteration
25: end

(A code sketch of the decode-time scheduling step, Step 12, follows below.)
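A sketch of Step 12 with a learned policy: encode each cluster's quantized syndrome sub-vector as a state index, read the greedy CN from that cluster's Q-table, and pick the (cluster, CN) pair with the highest Q-value. The flat Q-tables, the clustering, and the cross-cluster tie-breaking rule are illustrative assumptions; note that no residuals are computed at decode time.

```python
import numpy as np

M = 4                                     # quantization levels
clusters = [[0, 1, 2, 3], [4, 5, 6, 7]]   # global CN indices per cluster
Q_tables = [np.zeros((M ** len(c), len(c))) for c in clusters]  # learned offline

def state_index(s_cluster):
    """Encode a quantized syndrome sub-vector as one integer in [0, M^k)."""
    return int(np.ravel_multi_index(tuple(s_cluster), (M,) * len(s_cluster)))

def pick_cn(S_ell):
    """Return the global index of the CN to update next (Step 12)."""
    best_q, best_cn = -np.inf, None
    for c, cns in enumerate(clusters):
        s = state_index(S_ell[np.array(cns)])
        a = int(np.argmax(Q_tables[c][s]))
        if Q_tables[c][s, a] > best_q:
            best_q, best_cn = Q_tables[c][s, a], cns[a]
    return best_cn

# toy usage with an all-zeros quantized syndrome for m = 8 CNs
print(pick_cn(np.zeros(8, dtype=int)))
```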