Jet Propulsion Laboratory, California Institute of Technology
Support Material for Presentation of Orange Book on LDPC Code Selection for CCSDS Standard
CCSDS, Toulouse, Nov. 15, 2004
JPL Proprietary Material

LDPC Code Family Construction Method

• Code is constructed by expanding a seed protograph into a large code, with interconnections organized as “circulants” and short loops avoided. This gives:
    • Lower-complexity decoders, while maintaining the code characteristics relevant for good performance
    • Fast encoders
• Protographs are the “skeletons” of the code family
    • Selected from dozens (hundreds?) of candidates for:
        • Low threshold, determined by Density Evolution
        • Small size, for simple implementation
        • Small edge degrees, to reduce node complexity
• Circulants are used to expand protographs, so:
    • Code description tables are small
    • Hardware has fast, simple memory addressing
• Progressive Edge Growth (PEG) is used to select circulants
    • PEG is a greedy algorithm that chooses “good” circulants
    • The ACE criterion defines how good a set of circulants is
    • ACE (Approximate Cycle EMD) is a low-complexity Extrinsic Message Degree measure
    • Better than minimum loop length for preventing low-weight codewords
• Simulation (in hardware and software) is used to determine the error floor and the number of iterations required.
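The circulant expansion step described above can be sketched as follows. This is a minimal illustration, not the construction used for the proposed codes: the 2x3 protograph `PROTO`, the shift table `SHIFTS`, and the function names are hypothetical, and the shifts are picked by hand rather than by PEG/ACE.

```python
def circulant(size, shift):
    """size x size identity matrix with each row's 1 cyclically moved right by `shift`."""
    return [[1 if (r + shift) % size == c else 0 for c in range(size)]
            for r in range(size)]


def expand(proto, shifts, size):
    """Expand a 0/1 protograph matrix into a parity check matrix.

    proto[i][j] == 1 marks an edge between check i and variable j;
    shifts[i][j] is the circulant shift used for that edge.
    """
    rows, cols = len(proto), len(proto[0])
    H = [[0] * (cols * size) for _ in range(rows * size)]
    for i in range(rows):
        for j in range(cols):
            if proto[i][j]:
                block = circulant(size, shifts[i][j])
                for r in range(size):
                    for c in range(size):
                        H[i * size + r][j * size + c] = block[r][c]
    return H


# Hypothetical protograph and shifts, chosen only for illustration.
PROTO = [[1, 1, 0],
         [0, 1, 1]]
SHIFTS = [[0, 1, 0],
          [0, 2, 3]]
H = expand(PROTO, SHIFTS, 4)
```

Only the protograph and one shift per edge need to be stored, which is why the code description tables stay small, and each block is addressed by a single cyclic offset.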

LDPC Code Selection for Standard (Cont’d)

Threshold table (near-capacity), Eb/N0 in dB:

Code rate   Protograph threshold   Capacity   Difference
1/2         0.516                  0.187      0.329
2/3         1.288                  1.059      0.229
4/5         2.277                  2.040      0.237
7/8         3.129                  2.845      0.284

[Figure: Fast encoder structure: input message, accumulate, permute, and puncture stages built from sparse matrix multiplies (sparse circulant G matrices), producing the output codeword]

Code Family (code rates and block lengths):

Information block length k   Code block length n
                             rate 1/2   rate 2/3   rate 4/5
1024                         2048       1536       1280
4096                         8192       6144       5120
16384                        32768      24576      20480

[Figure: Protograph of ARA family. Family of protographs with code rate = (n+1)/(n+2), n = 0, 1, ...]

This simple seed protograph, replicated enough times to obtain the large code, yields a much more structured code, suitable for high-speed decoding.
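The block lengths in the code family table follow directly from n = k / rate, and the three rates are members of the (n+1)/(n+2) family. A short check, with hypothetical names (`block_length`, `family_rate`, `TABLE`) introduced only for this sketch:

```python
from fractions import Fraction

RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(4, 5)]
INFO_LENGTHS = [1024, 4096, 16384]


def block_length(k, rate):
    """Code block length n for information length k at the given rate."""
    n = Fraction(k) / rate
    if n.denominator != 1:
        raise ValueError("k / rate is not an integer")
    return int(n)


def family_rate(n):
    """Code rate of family member n, using the (n+1)/(n+2) rule from the slide."""
    return Fraction(n + 1, n + 2)


# Maps each information block length to its code block lengths, one per rate.
TABLE = {k: [block_length(k, r) for r in RATES] for k in INFO_LENGTHS}
```

Family members n = 0, 1, and 3 give the rates 1/2, 2/3, and 4/5 listed in the table.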

Parity check matrix

[Figure: ARAx2_4c_64c parity check matrix, with structure indicated]

• Protograph is 3 rows by 5 columns
• Expanded 2 times (by hand) to eliminate parallel edges
• Expanded 4 times with circulants to introduce necessary irregularity
• Expanded 64 times with circulants to construct full code
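The size of the full parity check matrix follows from multiplying the protograph dimensions by each expansion factor in turn. A small sketch (the function name `expand_dims` is hypothetical):

```python
def expand_dims(rows, cols, factors):
    """Apply successive expansion factors to a (rows, cols) protograph size."""
    for f in factors:
        rows, cols = rows * f, cols * f
    return rows, cols


# 3 x 5 protograph, expanded by 2, then 4, then 64 (the ARAx2_4c_64c stages).
ROWS, COLS = expand_dims(3, 5, [2, 4, 64])
```

The overall factor is 2 x 4 x 64 = 512, giving a 1536 x 2560 matrix.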

[Figure: block layout of the parity check matrix (m rows, m+k columns, blocks of size M), with column groups marked transmitted or punctured and blocks labeled Π1, Π2, Π3, Π4+Π5, Π6+Π7, and α]

• M = 512
• k = 1024
• n = 2048
• m = n + (punctured) - k = 1536
• punctured rate = k/n = 1/2
• unpunctured rate = k/(m+k) = 2/5
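The parameter relations on this slide can be checked with a few lines of arithmetic. The sketch assumes that exactly one column group of M = 512 symbols is punctured, which is a reading of the figure labels rather than something the slide states numerically; the variable names are hypothetical.

```python
from fractions import Fraction

M = 512    # block size from the slide
K = 1024   # information block length k
N = 2048   # transmitted code block length n

# Assumption: one column group of M symbols is punctured.
PUNCTURED = M

m = N + PUNCTURED - K                  # number of parity checks
punctured_rate = Fraction(K, N)        # rate seen on the channel
unpunctured_rate = Fraction(K, m + K)  # rate before puncturing
```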

Performance curves

[Figure: performance curves]
• Rates 1/2, 2/3, and 4/5 at k=16384
• Rate 0.87451 at k=7136

Performance curves

[Figure: performance curves]
• Ten proposed codes: blue=1/2, green=2/3, red=4/5, black=0.87451

FY04 Accomplishments (Cont’d)

LDPC Decoder Architecture

• Followed a two-pronged development approach: conceived and developed two promising types of decoders. Selection of the best approach is in progress.

HW platform ($14K purchase from Nallatech):
• BenONE™: single-slot DIME-II™ motherboard PCI card
• BenDATA-WS™: daughter card with 24 MByte ZBT SRAM and Xilinx Virtex II (8M gates)
• Java GUI user interface for remote access to HW platform

K. Andrews, C. Jones

FY04 Accomplishments (Cont’d)

Parallel Decoder Type-1 – for protograph codes

Parallelization method
• Decodes one protograph per clock cycle per half-iteration
• Protograph with k input bits has: Decoder speed = (k/2) x clock speed / iterations
• Example: ARA protograph with 16 input bits (expanded by 8 from seed protograph) yields a 20 Mbps FPGA decoder with a 50 MHz clock and 20 average iterations

[Figure: parallelization method, protograph slices #1, #2, ..., #N. Legend: check node; variable node connected to channel; variable node not connected to channel]

• Expanded protograph has 40 variable nodes, 24 check nodes, and 112 edges.
• FPGA can support up to 512 slices of protograph. This corresponds to an input block size of up to 8192 bits.
• FPGA utilization factor is 39% logic, 67% RAM.

Pros:
• Highly parallel architecture for fast decoders
• Regular structure
Cons:
• Little code flexibility: tailored to protograph codes

D. Divsalar, J. Lee, J. Thorpe, K. Andrews, A. Abbasfar
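The throughput formula and the block size figure on this slide can be reproduced directly. A minimal sketch; the function and variable names are hypothetical:

```python
def decoder_speed_bps(k, clock_hz, iterations):
    """Decoder speed = (k/2) x clock speed / iterations, as given on the slide.

    The factor 1/2 reflects one protograph per clock per half-iteration.
    """
    return (k / 2) * clock_hz / iterations


# Slide example: 16 input bits, 50 MHz clock, 20 average iterations.
speed = decoder_speed_bps(16, 50e6, 20)

# 512 slices of a protograph with 16 input bits each.
max_input_block = 16 * 512
```

With these inputs the formula gives 20e6 bits per second, matching the 20 Mbps quoted, and 512 slices correspond to 8192 input bits.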

FY04 Accomplishments (Cont’d)

Parallel Decoder Type-1 – for protograph codes (Cont’d)

Hardware implementation
• Developed high-speed decoder architecture that needs only simple addition operations at both variable and check nodes
    – Variable nodes add “reliabilities” = log-likelihoods
    – Check nodes add “unreliabilities”
    – Exchanged messages are transformed between reliability and unreliability
• Non-uniform quantizer designed to maximize performance while simplifying this transformation
• Suitable for in-situ communications. Estimates predict 32 Msps using XQR2V6000 radiation-tolerant FPGA (largest rad-tol FPGA available today)

[Figure: quantized reliability/unreliability transformation]
[Figure: decoder implementation for sample protograph: variable nodes, edge memories, check nodes, variable node processors, constraint node processors, rel/unrel transformation]

D. Divsalar, J. Lee, J. Thorpe, K. Andrews, S. Dolinar
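The addition-only node updates can be illustrated in floating point. The slide does not define the reliability/unreliability transformation, so this sketch assumes the textbook sum-product choice phi(x) = -ln tanh(x/2), which is its own inverse. The clipping limits and all names are hypothetical, and the non-uniform quantizer of the hardware design is not modeled.

```python
import math

# Hypothetical clipping limits that keep phi() finite in floating point.
MIN_MAG, MAX_MAG = 1e-12, 30.0


def phi(x):
    """Assumed transform between reliability and unreliability (self-inverse)."""
    x = min(max(x, MIN_MAG), MAX_MAG)
    return -math.log(math.tanh(x / 2.0))


def check_node(incoming):
    """For each edge, combine the signed LLRs arriving on the other edges."""
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1.0
        for msg in others:
            if msg < 0:
                sign = -sign
        unreliability = sum(phi(abs(msg)) for msg in others)  # addition only
        out.append(sign * phi(unreliability))
    return out


def variable_node(channel_llr, incoming):
    """For each edge, add the channel LLR and the messages from the other edges."""
    total = channel_llr + sum(incoming)
    return [total - msg for msg in incoming]
```

In this form the check node output equals the usual tanh-rule result, but every node only adds; the cost moves into the phi lookup between nodes.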

Oct’04 Accomplishments (Cont’d)

Parallel Decoder – Type-1 for Protograph-based LDPC codes (cont.)

• Developed design for efficient use of Virtex FPGA block RAM memories to maximize both the decoder’s speed and decodable code size
    – Comparison: Type-1 protograph decoder processing e edges in parallel every half-iteration is roughly e/(2L) times faster than Type-2 universal decoder processing 2L edges in parallel every half-iteration
        – e.g., e = 140 vs L = 16 yields speedup factor > 4
    – Nearly a factor of 2 additional parallelizability/speedup may be possible if the FPGA logic can make use of the Virtex RAM’s read-before-write mode
        – This would increase the parallelizability limit on e; revised constraint would be e/2 + n/2 < B
        – e.g., e = 18*14 = 252 for the rate-1/2 ARA protograph would yield speedup factor ≈ 8 vs universal decoder with L = 16
    – Maximum decodable code size is (nT, kT), where (n, k) is the size of the protograph and T is the size of the circulant expansion
        – e.g., (nT, kT) = (40960, 20480) for the rate-1/2 ARA protograph expanded to e = 140
        – e.g., (nT, kT) = (73728, 36864) for the rate-1/2 ARA protograph expanded to e = 252

Notation and other relevant details:
    – Virtex block RAMs are 2048 x 9 bits
    – Design achieves protograph expansion factor T = 1024 by using two half-RAMs for 1024 inputs and 1024 outputs for each protograph edge
    – Half-RAM addresses are accessed in sequence, exploiting simplicity of circulant permutations on protograph edges
    – Design achieves decoder parallelizability corresponding to a maximum protograph size with e + n/2 < B, where
        – B = # block RAMs (B = 168 for current FPGA, Virtex II 8000)
        – e = # edges in protograph
        – n = # channel symbols input to protograph
    – Example: small rate-1/2 ARA protograph can be preliminarily expanded T′ = 10 times to yield e = 140, n = 40, e + n/2 = 160 < B = 168

K. Andrews
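The block RAM constraint, the resulting code sizes, and the speedup estimates above can be reproduced with a short calculation. The seed values are inferred from the slide's examples (e = 140 and n = 40 at T′ = 10, kT = 20480 at T = 1024); the constant and function names are hypothetical.

```python
SEED_EDGES = 14     # edges per seed protograph (140 / 10)
SEED_CHANNEL = 4    # channel symbols per seed protograph (40 / 10)
SEED_INFO = 2       # input bits per seed protograph (20480 / 1024 / 10)
BLOCK_RAMS = 168    # B for the Virtex II 8000
T = 1024            # circulant expansion factor


def fits(t_prime, read_before_write=False):
    """Check e + n/2 < B, or e/2 + n/2 < B with read-before-write mode."""
    e = SEED_EDGES * t_prime
    n = SEED_CHANNEL * t_prime
    need = (e / 2 if read_before_write else e) + n / 2
    return need < BLOCK_RAMS


def max_preexpansion(read_before_write=False):
    """Largest preliminary expansion T' that satisfies the RAM constraint."""
    t_prime = 0
    while fits(t_prime + 1, read_before_write):
        t_prime += 1
    return t_prime


def code_size(t_prime):
    """Maximum decodable code size (nT, kT) for a given T'."""
    return SEED_CHANNEL * t_prime * T, SEED_INFO * t_prime * T


def speedup(t_prime, L=16):
    """Rough speedup e / (2L) over a universal decoder handling 2L edges."""
    return SEED_EDGES * t_prime / (2 * L)
```

Under these assumptions the constraint allows T′ = 10 without read-before-write mode and T′ = 18 with it, which are the two cases quoted on the slide.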