An Efficient, Portable and Generic Library for Successive Cancellation Decoding of Polar Codes
Adrien Cassagne (1, 2), Bertrand Le Gal (1), Camille Leroux (1), Olivier Aumage (2), Denis Barthou (2)
(1) IMS, Univ. Bordeaux, INP, France
(2) Inria / Labri, Univ. Bordeaux, INP, France
LCPC, September 2015
A. Cassagne, B. Le Gal, C. Leroux, O. Aumage, D. Barthou IMS, Inria / Labri, Univ. Bordeaux, INP P-EDGE: Polar ECC Decoder Generation Environment 1 / 20
Context: Error Correction Codes (ECC)
Algorithms that enable reliable delivery of digital data, using redundancy for error correction.
Usually implemented in hardware, with growing interest in software implementations for:
end-user devices (low-power processors), since software is less expensive to produce than dedicated hardware chips;
algorithm validation (typically Monte-Carlo simulations on HPC systems), which requires high performance.
Focus on the decoder, the most time-consuming part.
The communication chain: Source → Encoder (transmitter) → Channel (communication channel) → Decoder → Sink (receiver).
Polar Codes as a New Class of ECC
Explored for upcoming 5G mobile phones (Huawei [1]).
Redundancy is added as bits at fixed positions whose value is always 0: the frozen bits. The other positions carry the information bits.
Example (number of information bits K = 4, frame size N = 8): the information bits 1 0 1 0 are inserted among the frozen bits to form the frame 0 0 0 1 0 0 1 0, which then goes through the encoding process.
Rate R = K / N: ratio of information bits to frame size (here R = 1/2).
[1] http://www.huawei.com/minisite/has2015/img/5g_radio_whitepaper.pdf
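The frame construction and encoding step can be sketched in C++. This is a minimal illustration, not P-EDGE code: the function names are hypothetical, the transform is applied in natural (non bit-reversed) order, and the frozen set {0, 1, 2, 4} used in the test is an assumption chosen because it reproduces the example frame above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds the frame u: information bits go to the non-frozen positions,
// frozen positions are always 0.
std::vector<uint8_t> insert_info_bits(const std::vector<bool>& frozen,
                                      const std::vector<uint8_t>& info) {
    std::vector<uint8_t> u(frozen.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 0; i < frozen.size(); ++i)
        if (!frozen[i]) {
            assert(k < info.size()); // not more information positions than K
            u[i] = info[k++];
        }
    assert(k == info.size()); // exactly K non-frozen positions
    return u;
}

// In-place polar transform x = u * F^(xn), F = [[1,0],[1,1]], over GF(2).
// Each butterfly maps (a, b) to (a XOR b, b).
std::vector<uint8_t> polar_encode(std::vector<uint8_t> x) {
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0); // N must be a power of two
    for (std::size_t half = 1; half < n; half *= 2)
        for (std::size_t i = 0; i < n; i += 2 * half)
            for (std::size_t j = i; j < i + half; ++j)
                x[j] ^= x[j + half];
    return x;
}
```

With the assumed frozen set, the information bits 1 0 1 0 give the frame 0 0 0 1 0 0 1 0, which encodes to 0 1 0 1 1 0 1 0.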
The Successive Cancellation (SC) Algorithm
Depth-first binary tree traversal (search) algorithm.
3 key functions:
$\lambda_c = f(\lambda_a, \lambda_b) = \operatorname{sign}(\lambda_a \lambda_b) \cdot \min(|\lambda_a|, |\lambda_b|)$
$\lambda_c = g(\lambda_a, \lambda_b, s) = (1 - 2s)\,\lambda_a + \lambda_b$
$(s_c, s_d) = h(s_a, s_b) = (s_a \oplus s_b,\ s_b)$
[Figure: per-node downward (f, g) and upward (h) computations between depth d and depth d+1 (λ^d, λ^{d+1}); data layout representation.]
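The three node functions translate directly into scalar C++. This sketch assumes 32-bit float LLRs and the usual convention that a positive LLR favours bit 0 (consistent with the (1 - 2s) factor in g); the names mirror the slide, but the signatures are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// f: LLR of the XOR of two bits, sign(la * lb) * min(|la|, |lb|).
float f(float la, float lb) {
    const float sign = ((la < 0) != (lb < 0)) ? -1.0f : 1.0f;
    return sign * std::min(std::fabs(la), std::fabs(lb));
}

// g: LLR of the right bit given the left partial sum s (0 or 1).
float g(float la, float lb, uint8_t s) {
    assert(s <= 1);
    return (1 - 2 * static_cast<int>(s)) * la + lb;
}

// h: upward combination of partial sums, (s_c, s_d) = (s_a XOR s_b, s_b).
std::pair<uint8_t, uint8_t> h(uint8_t sa, uint8_t sb) {
    return {static_cast<uint8_t>(sa ^ sb), sb};
}
```

For instance, f(2, -3) = -2 and g(2, 1, 1) = -1.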
Polar Decoding Tree
[Figure: full decoding tree for N = 8: each internal node applies f() toward its left child, g() toward its right child and h() on the way up; the leaves (?) are frozen or information bits.]
The positions of the frozen and information bits in the tree specialize it.
The same specialized tree is used for every frame.
Frames are independent.
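A recursive reference version of the traversal helps to see what the specialized tree computes. This is a minimal sketch under the same LLR convention (positive favours 0), not P-EDGE's generated code: the function names, the boolean frozen mask and the frozen set used in the test are assumptions. Each node runs f toward the left child, decodes it, runs g toward the right child, decodes it, then combines the partial sums with h.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

static float f(float a, float b) {
    return ((a < 0) != (b < 0) ? -1.0f : 1.0f) * std::min(std::fabs(a), std::fabs(b));
}
static float g(float a, float b, uint8_t s) { return (1 - 2 * static_cast<int>(s)) * a + b; }

// Decodes the sub-tree whose leaves are bits [first, first + n).
// l: the n LLRs of the node; s: its n partial sums (output);
// u: hard decisions on the leaves (frozen leaves forced to 0).
void sc_decode(const float* l, std::size_t n, const std::vector<bool>& frozen,
               std::size_t first, uint8_t* s, std::vector<uint8_t>& u) {
    if (n == 1) { // leaf
        s[0] = frozen[first] ? 0 : (l[0] < 0 ? 1 : 0);
        u[first] = s[0];
        return;
    }
    const std::size_t half = n / 2;
    std::vector<float> child(half);
    std::vector<uint8_t> sl(half), sr(half);
    for (std::size_t i = 0; i < half; ++i) child[i] = f(l[i], l[i + half]);        // f: left
    sc_decode(child.data(), half, frozen, first, sl.data(), u);
    for (std::size_t i = 0; i < half; ++i) child[i] = g(l[i], l[i + half], sl[i]); // g: right
    sc_decode(child.data(), half, frozen, first + half, sr.data(), u);
    for (std::size_t i = 0; i < half; ++i) { // h: upward
        s[i] = sl[i] ^ sr[i];
        s[i + half] = sr[i];
    }
}

std::vector<uint8_t> sc_decode_frame(const std::vector<float>& llr,
                                     const std::vector<bool>& frozen) {
    assert(llr.size() == frozen.size());
    std::vector<uint8_t> u(llr.size()), s(llr.size());
    sc_decode(llr.data(), llr.size(), frozen, 0, s.data(), u);
    return u;
}
```

With noiseless BPSK input (bit 0 → +1, bit 1 → -1) of the codeword 0 1 0 1 1 0 1 0 and frozen positions {0, 1, 2, 4}, the decoder recovers the frame 0 0 0 1 0 0 1 0.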
A Wide Optimization Space
Simplification of the computations (tree pruning, rewriting rules).
Vectorization of the node functions (f, g, h).
Optimization of the decoder binary size.
Implementation of low-level kernels for various instruction sets (SSE, AVX, NEON, etc.).
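As an illustration of a low-level kernel, f can be vectorized with SSE intrinsics by manipulating sign bits directly: |x| clears the sign bit, and the sign of the result is the XOR of the input signs. This is a sketch assuming float LLRs and the `__SSE__` macro defined by GCC/Clang on x86; on other targets it falls back to scalar code (an AVX or NEON version would use the corresponding intrinsics instead).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Scalar reference: f(a, b) = sign(a*b) * min(|a|, |b|).
static float f_scalar(float a, float b) {
    return ((a < 0) != (b < 0) ? -1.0f : 1.0f) * std::min(std::fabs(a), std::fabs(b));
}

// lc[i] = f(la[i], lb[i]) for i in [0, n); 4 floats per SSE instruction,
// scalar code for the tail (or for everything on non-SSE targets).
void f_kernel(const float* la, const float* lb, float* lc, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    const __m128 sign_mask = _mm_set1_ps(-0.0f); // only the sign bit set
    for (; i + 4 <= n; i += 4) {
        const __m128 a = _mm_loadu_ps(la + i);
        const __m128 b = _mm_loadu_ps(lb + i);
        const __m128 abs_min = _mm_min_ps(_mm_andnot_ps(sign_mask, a),  // |a|
                                          _mm_andnot_ps(sign_mask, b)); // |b|
        const __m128 sign = _mm_and_ps(_mm_xor_ps(a, b), sign_mask);    // sign(a) XOR sign(b)
        _mm_storeu_ps(lc + i, _mm_or_ps(abs_min, sign));
    }
#endif
    for (; i < n; ++i) lc[i] = f_scalar(la[i], lb[i]);
}
```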
Example: Application of the Rewriting Rules
[Figure: rewriting rules applied to an N = 8, K = 4 frame: the left and right sub-trees are replaced by a Rep node and an SPC node, the branches below them are cut, and the root performs f(), gr() and xor(). Legend: Rate 0, Rate 1.]
Rewriting rules are applied recursively.
Repeated application of these rules leads to a simplified tree.
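The recursive application of the rules can be sketched as a classifier over the frozen mask plus a pruning walk. The rule definitions below (Rep: only the last leaf is an information bit; SPC: only the first leaf is frozen, size ≥ 4) follow the usual simplified-SC conventions and are assumptions, as is the frozen set {0, 1, 2, 4}; the "left only" specializations (g0, gr, ...) are not modelled. With that set the N = 8, K = 4 tree reduces to a Rep node and an SPC node, as in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Node { Rate0, Rate1, Rep, SPC, Standard };

// Classifies the sub-tree covering leaves [first, first + n).
Node classify(const std::vector<bool>& frozen, std::size_t first, std::size_t n) {
    std::size_t n_frozen = 0;
    for (std::size_t i = first; i < first + n; ++i) n_frozen += frozen[i];
    if (n_frozen == n) return Node::Rate0;                             // only frozen bits
    if (n_frozen == 0) return Node::Rate1;                             // only information bits
    if (n_frozen == n - 1 && !frozen[first + n - 1]) return Node::Rep; // last leaf only
    if (n >= 4 && n_frozen == 1 && frozen[first]) return Node::SPC;    // first leaf frozen
    return Node::Standard;
}

// A special node becomes a leaf of the pruned tree; a Standard node is split.
void prune(const std::vector<bool>& frozen, std::size_t first, std::size_t n,
           std::vector<Node>& leaves) {
    const Node kind = classify(frozen, first, n);
    if (kind != Node::Standard) { leaves.push_back(kind); return; }
    prune(frozen, first, n / 2, leaves);
    prune(frozen, first + n / 2, n / 2, leaves);
}
```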
Rewriting Rules and Tree Pruning
Sub-tree rewriting rules and tree pruning for processing specialization:
Rate 0 (all leaves frozen) and Rate 1 (all leaves are information bits): the sub-tree is pruned.
Repetition (Rep, leaf children): replaced by a rep() node.
Single Parity Check (SPC, level ≥ 2): replaced by an spc() node.
Specialized internal nodes, depending on the left child: Rate 0 left child → g0() and xor0(); Rate 1 left child → f(), g1() and xor(); Repetition left child → f(), gr() and xor(); standard case → f(), g() and xor().
[Figure: rewriting rules for any node; legend: Rate 0, Rate 1, Repetition, SPC.]
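The slides name the rep() and spc() nodes without detailing their computation. A common choice in the simplified-SC literature, assumed here rather than taken from P-EDGE, is: Rep sums the node's LLRs and broadcasts one hard decision; SPC takes hard decisions and, if the parity is odd, flips the least reliable bit. The names and signatures are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Rep node: all bits are equal, so one decision on the sum of the LLRs
// (positive favours 0) is broadcast to the n partial sums.
void rep_node(const float* l, uint8_t* s, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += l[i];
    const uint8_t bit = sum < 0 ? 1 : 0;
    for (std::size_t i = 0; i < n; ++i) s[i] = bit;
}

// SPC node: the sub-codeword has even parity. Take hard decisions and,
// if the parity is odd, flip the bit with the smallest |LLR|.
void spc_node(const float* l, uint8_t* s, std::size_t n) {
    assert(n >= 1);
    uint8_t parity = 0;
    std::size_t weakest = 0;
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = l[i] < 0 ? 1 : 0;
        parity ^= s[i];
        if (std::fabs(l[i]) < std::fabs(l[weakest])) weakest = i;
    }
    if (parity) s[weakest] ^= 1;
}
```

For LLRs (-3, 2, -0.5, -4), the hard decisions 1 0 1 1 have odd parity, so spc_node flips the third bit and returns 1 0 0 1.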
Vectorization
Intra-frame SIMD strategy: one call to f processes n data elements of the same frame (frames 0 to 3 are decoded one after another).
Inter-frame SIMD strategy: one call to f processes the same element of n different frames (frames 0 to 3 at once).
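The inter-frame strategy relies on a data layout where the same element of several frames is contiguous. A minimal sketch, assuming element i of frame k is stored at index i * F + k (F frames; the names are hypothetical): the inner loop over frames then touches consecutive memory, which a compiler may map onto SIMD lanes when F matches the vector width.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

static float f(float a, float b) {
    return ((a < 0) != (b < 0) ? -1.0f : 1.0f) * std::min(std::fabs(a), std::fabs(b));
}

// Interleaves F frames of N LLRs: element i of frame k goes to index i * F + k.
std::vector<float> interleave(const std::vector<std::vector<float>>& frames) {
    const std::size_t F = frames.size(), N = frames[0].size();
    std::vector<float> out(N * F);
    for (std::size_t k = 0; k < F; ++k)
        for (std::size_t i = 0; i < N; ++i) out[i * F + k] = frames[k][i];
    return out;
}

// f on the interleaved buffer for a node of size 2 * half: the inner loop
// runs over the F frames on contiguous data.
void f_inter(const float* l, float* out, std::size_t half, std::size_t F) {
    for (std::size_t i = 0; i < half; ++i)
        for (std::size_t k = 0; k < F; ++k)
            out[i * F + k] = f(l[i * F + k], l[(i + half) * F + k]);
}
```

The intra-frame strategy keeps the natural layout instead, which avoids the interleaving cost but only fills the SIMD lanes on nodes of at least n elements.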
P-EDGE: a Dedicated Framework for Polar Codes
Features:
1. Code generation: flattening of recursive calls, rewriting rules, generation of templated C++.
2. C++ specialization: loop unrolling, data types, SIMD.
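The code generation step can be sketched as a walk over the pruned tree that emits one flattened call per node instead of recursing at run time. This is a hypothetical generator, not P-EDGE's: it assumes the rewriting rules sketched for the N = 8, K = 4 example, a per-depth LLR buffer (depth 0 holds the N channel LLRs, depth 1 the next N/2 values, and so on) and partial sums indexed by leaf position. Under those assumptions it reproduces the offsets of the generated code for N = 8 and K = 4, without the type and function template arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

enum class Node { Rate0, Rate1, Rep, SPC, Standard };

static Node classify(const std::vector<bool>& fz, std::size_t first, std::size_t n) {
    std::size_t nf = 0;
    for (std::size_t i = first; i < first + n; ++i) nf += fz[i];
    if (nf == n) return Node::Rate0;
    if (nf == 0) return Node::Rate1;
    if (nf == n - 1 && !fz[first + n - 1]) return Node::Rep;
    if (n >= 4 && nf == 1 && fz[first]) return Node::SPC;
    return Node::Standard;
}

// Offset of the LLR buffer of depth d (assumed layout).
static std::size_t llr_offset(std::size_t N, std::size_t d) {
    std::size_t off = 0, size = N;
    for (std::size_t k = 0; k < d; ++k) { off += size; size /= 2; }
    return off;
}

static std::string args(std::initializer_list<std::size_t> v) {
    std::string out;
    for (std::size_t x : v) out += (out.empty() ? "" : ", ") + std::to_string(x);
    return out;
}

// Emits the flattened call sequence for the node at depth d covering [first, first + n).
void generate(const std::vector<bool>& fz, std::size_t N, std::size_t d,
              std::size_t first, std::size_t n, std::vector<std::string>& code) {
    const std::size_t l_in = llr_offset(N, d), l_out = llr_offset(N, d + 1);
    switch (classify(fz, first, n)) {
        case Node::Rate0: code.push_back("rate0<" + args({first, n}) + ">"); return;
        case Node::Rate1: code.push_back("rate1<" + args({l_in, first, n}) + ">"); return;
        case Node::Rep:   code.push_back("rep<" + args({l_in, first, n}) + ">"); return;
        case Node::SPC:   code.push_back("spc<" + args({l_in, first, n}) + ">"); return;
        case Node::Standard: break;
    }
    const std::size_t h = n / 2;
    code.push_back("f<" + args({l_in, l_in + h, l_out, h}) + ">");
    generate(fz, N, d + 1, first, h, code);
    code.push_back("g<" + args({l_in, l_in + h, first, l_out, h}) + ">");
    generate(fz, N, d + 1, first + h, h, code);
    code.push_back("xor<" + args({first, first + h, first, h}) + ">");
}
```

With frozen positions {0, 1, 2, 4}, the output is f<0, 4, 8, 4>, rep<8, 0, 4>, g<0, 4, 0, 8, 4>, spc<8, 4, 4>, xor<0, 4, 0, 4>.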
P-EDGE: Code Generation Example
[Figure: simplified decoding tree for N = 8, K = 4, with a Rep node (left) and an SPC node (right) below the root's f(), gr() and xor() calls. Legend: Rate 0, Rate 1.]
Generated code for N = 8 and K = 4:

void Generated_SC_decoder_N8_K4::decode()
{
    // ------------- template args -------------   - std args -
    //  -types-  --funcs--  ----offsets----  -size-  --buffs--
    f  <   R,    F,  FI,    0, 4, 8,         4>::apply(l   );
    rep<B, R,    H,  HI,    8, 0,            4>::apply(l, s);
    gr <B, R,    G,  GI,    0, 4, 0, 8,      4>::apply(l, s);
    spc<B, R,    H,  HI,    8, 4,            4>::apply(l, s);
    xo <B,       X,  XI,    0, 4, 0,         4>::apply(   s);
}
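A self-contained approximation of such a generated decoder can be written with node templates whose offsets and sizes are template parameters, so every call is a fully specialized instantiation with fixed-bound loops. Everything here is a hypothetical sketch: the template names are invented, the function template arguments (F, FI, H, HI, ...) are omitted, gr is replaced by a generic g, and rep/spc use the usual repetition and single-parity-check decoders. The offsets in decode_N8_K4 are taken from the generated code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

template <typename R, int La, int Lb, int Lc, int N>
struct f_node { static void apply(R* l) {
    for (int i = 0; i < N; ++i) {
        const R a = l[La + i], b = l[Lb + i];
        l[Lc + i] = ((a < 0) != (b < 0) ? R(-1) : R(1)) * std::min(std::abs(a), std::abs(b));
    }
}};

template <typename B, typename R, int La, int Lb, int S, int Lc, int N>
struct g_node { static void apply(R* l, const B* s) {
    for (int i = 0; i < N; ++i) l[Lc + i] = (1 - 2 * int(s[S + i])) * l[La + i] + l[Lb + i];
}};

template <typename B, typename R, int L, int S, int N>
struct rep_node { static void apply(const R* l, B* s) {
    R sum = 0;
    for (int i = 0; i < N; ++i) sum += l[L + i];
    for (int i = 0; i < N; ++i) s[S + i] = sum < 0 ? 1 : 0; // one decision, broadcast
}};

template <typename B, typename R, int L, int S, int N>
struct spc_node { static void apply(const R* l, B* s) {
    B parity = 0;
    int weakest = 0;
    for (int i = 0; i < N; ++i) {
        s[S + i] = l[L + i] < 0 ? 1 : 0;
        parity ^= s[S + i];
        if (std::abs(l[L + i]) < std::abs(l[L + weakest])) weakest = i;
    }
    if (parity) s[S + weakest] ^= 1; // restore even parity
}};

template <typename B, int Sa, int Sb, int Sc, int N>
struct xor_node { static void apply(B* s) {
    for (int i = 0; i < N; ++i) s[Sc + i] = s[Sa + i] ^ s[Sb + i];
}};

using R = float;
using B = uint8_t;

// l: 8 channel LLRs followed by 4 depth-1 LLRs; s: 8 partial sums (output).
void decode_N8_K4(R* l, B* s) {
    f_node  <R,    0, 4, 8, 4>::apply(l);
    rep_node<B, R, 8, 0, 4>::apply(l, s);
    g_node  <B, R, 0, 4, 0, 8, 4>::apply(l, s);
    spc_node<B, R, 8, 4, 4>::apply(l, s);
    xor_node<B,    0, 4, 0, 4>::apply(s);
}
```

After decode_N8_K4, s holds the estimated codeword; for noiseless input of 0 1 0 1 1 0 1 0 it returns exactly that codeword.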
Reducing the L1I Cache Occupancy
Flattening may generate large binaries: the binary size grows with the frame size, and performance drops when the binary exceeds the L1 instruction (L1I) cache.
Moving the offsets from template arguments to function arguments helps the compiler factorize many function calls.
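The effect of moving offsets out of the template can be seen on f alone. In this sketch (hypothetical names, float LLRs), each distinct set of template offsets in f_tmpl yields a separate instantiation in the binary, whereas f_args keeps only the node size as a template parameter, so all calls on nodes of the same size share one instantiation. The likely trade-off is that the compiler can no longer fold the offsets into constant addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Offsets as template parameters: one instantiation per (La, Lb, Lc, N).
template <int La, int Lb, int Lc, int N>
void f_tmpl(float* l) {
    for (int i = 0; i < N; ++i) {
        const float a = l[La + i], b = l[Lb + i];
        l[Lc + i] = ((a < 0) != (b < 0) ? -1.0f : 1.0f) * std::min(std::fabs(a), std::fabs(b));
    }
}

// Offsets as function arguments: one instantiation per size N only.
template <int N>
void f_args(float* l, int la, int lb, int lc) {
    for (int i = 0; i < N; ++i) {
        const float a = l[la + i], b = l[lb + i];
        l[lc + i] = ((a < 0) != (b < 0) ? -1.0f : 1.0f) * std::min(std::fabs(a), std::fabs(b));
    }
}
```

Both versions compute the same values: f_tmpl<0, 4, 8, 4>(l) and f_args<4>(l, 0, 4, 8) write identical results into l[8..12).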
Sub-tree Folding Technique
[Figure: full decoding tree representation (N = 128, K = 64), with nodes N0 to N42 labeled by their operations (f(), g(), g0(), gr(), xo(), xo0(), h(), rep(), spc()). Legend: Default, Rate 0 left, Rate 0, Rate 1, Rep left, Rep, SPC 2.]
Sub-tree Folding Technique: Enabling Compression
[Figure: the folded tree for the same example, in which identical sub-trees are shared. Same legend.]
A single occurrence of each distinct sub-tree traversal is generated and reused wherever needed.
Compression ratio on the example shown: 1.48.
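Folding can be sketched by giving each sub-tree a key (here its frozen pattern, which also fixes its size) and generating one traversal per distinct key. This is a hypothetical illustration: it uses the assumed Rate 0 / Rate 1 / Rep / SPC pruning rules and measures compression as total pruned-tree nodes divided by distinct sub-trees, which may differ from how the 1.48 ratio above was computed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// True if the rewriting rules turn this sub-tree into a leaf of the pruned tree.
static bool is_leaf_node(const std::vector<bool>& fz, std::size_t first, std::size_t n) {
    std::size_t nf = 0;
    for (std::size_t i = first; i < first + n; ++i) nf += fz[i];
    return nf == n || nf == 0                      // Rate 0 / Rate 1
        || (nf == n - 1 && !fz[first + n - 1])     // Rep
        || (n >= 4 && nf == 1 && fz[first]);       // SPC
}

// Key of a sub-tree traversal: its frozen pattern ('F' frozen, 'I' information).
static std::string key(const std::vector<bool>& fz, std::size_t first, std::size_t n) {
    std::string k;
    for (std::size_t i = first; i < first + n; ++i) k += fz[i] ? 'F' : 'I';
    return k;
}

// total: every node of the pruned tree; unique: one entry per distinct sub-tree,
// i.e. the traversals that would actually be generated after folding.
void count_nodes(const std::vector<bool>& fz, std::size_t first, std::size_t n,
                 std::size_t& total, std::map<std::string, int>& unique) {
    ++total;
    ++unique[key(fz, first, n)];
    if (is_leaf_node(fz, first, n)) return;
    count_nodes(fz, first, n / 2, total, unique);
    count_nodes(fz, first + n / 2, n / 2, total, unique);
}
```

For a toy N = 8 mask whose two halves share the pattern F F F I, the pruned tree has 3 nodes but only 2 distinct traversals, a ratio of 1.5 under this measure.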