
An Efficient GPU-based LDPC Decoder for Long Codewords
Stefan Grönroos, Kristian Nybom, Jerker Björkqvist
18.10.11 Åbo Akademi University - Turku, Finland

Background
- Working on a software real-time DVB-T2 implementation for general-purpose computers
- DVB-T2, DVB-C2, DVB-S2 standards use LDPC codes as part of the FEC scheme
  • Very long codewords: 16200 or 64800 bits
  • One of the most complex operations in the signal processing chain
  • DVB-T2 requires up to ~61 Mbps decoder throughput
- Our CPU implementation is not even close to real-time capable
- Thus we turned to GPUs
  • More specifically, NVIDIA's CUDA framework

LDPC Decoding
- LDPC code can be described by:
  • H matrix
  • Corresponding bipartite graph
- n-bit codeword
  • k data bits
  • (n – k) parity bits
[Figure: example binary H matrix and its bipartite graph, with (n – k) check nodes and n variable nodes]
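The relation between the H matrix, the parity checks, and the bipartite graph can be sketched in a few lines of Python. The 3x6 matrix below is a hypothetical toy example chosen for illustration (it is not the matrix shown on the slide), and the function names are assumptions of this sketch.

```python
# Hypothetical toy parity-check matrix: n = 6 variable nodes, n - k = 3 check nodes.
# NOT the matrix from the slide; chosen only to illustrate the definitions.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def syndrome(H, word):
    """Return H * word^T over GF(2): one bit per check node."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def is_codeword(H, word):
    """A word is a codeword exactly when every parity check is satisfied."""
    return not any(syndrome(H, word))

def edges(H):
    """Edges of the bipartite graph: one (check, variable) pair per 1 in H."""
    return [(c, v) for c, row in enumerate(H) for v, h in enumerate(row) if h]

valid = [1, 0, 1, 1, 1, 0]      # k = 3 data bits followed by 3 parity bits
corrupted = [1, 0, 1, 1, 1, 1]  # same word with the last bit flipped

print(is_codeword(H, valid))    # True
print(syndrome(H, corrupted))   # [0, 0, 1]: only the third check fails
print(len(edges(H)))            # 9 edges, one per 1 in H
```

Each edge returned by `edges` is a place where one message is stored during decoding, which is why the number of ones in H determines the memory footprint of the decoder.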

Iterative message passing
- Each edge in the graph holds a message between check and variable nodes
[Figure: check node update and variable node update on the bipartite graph, with (n – k) check nodes and n variable nodes]
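The update equations were shown as figures and are not preserved in this transcript. As a rough illustration only, the sketch below uses the widely used min-sum approximation, which is an assumption here and not necessarily the rule used by the authors. Messages are treated as log-likelihood ratios stored as plain integers; the saturation needed for a fixed-point format is left out.

```python
def check_node_update(incoming):
    """Min-sum check node update (assumed rule, see lead-in).

    The outgoing message on each edge depends on all OTHER incoming messages:
    its sign is the product of their signs, its magnitude their minimum.
    Requires at least two incoming messages.
    """
    out = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        out.append(sign * min(abs(m) for m in others))
    return out

def variable_node_update(channel_llr, incoming):
    """Variable node update: channel value plus all OTHER incoming messages."""
    total = channel_llr + sum(incoming)
    return [total - m for m in incoming]

print(check_node_update([3, -5, 2]))        # [-2, 2, -3]
print(variable_node_update(1, [4, -2, 3]))  # [2, 8, 3]
```

Both functions exclude the message that arrived on the edge being updated, which is the defining property of message passing on the bipartite graph.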

Hardware Setup
- NVIDIA GeForce GTX 570
- Based on NVIDIA Fermi architecture
- 15 Streaming Multiprocessors (SMs)
  • 32 cores per SM
- Thread warp:
  • Group of 32 consecutive threads
  • The same instruction is run for a half-warp (16 threads) at a time on 16 cores of an SM
[Figure source: NVIDIA]

GPU Memory Accesses
- Access to the large global memory is very slow on the GPU
- Global memory accesses are processed per warp (32 threads)
- If the threads of a warp access 32 aligned, consecutive 32-bit (4-byte) words, we get full memory coalescing
  • Only one memory request for 128 bytes is made, and the memory bus is fully utilized
  • Very low bus utilization if memory accesses are scattered within a warp!
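The effect of the access pattern can be illustrated with a simplified model that counts how many aligned 128-byte segments one warp touches. This is a sketch under the assumption that each touched segment costs one memory request; the real hardware behaviour depends on cache configuration and architecture, and the function names are made up for this example.

```python
SEGMENT_BYTES = 128  # size of one memory request, as stated on the slide
WARP_SIZE = 32

def warp_addresses(base, stride_bytes):
    """Byte address of the 4-byte word read by each of the 32 threads of a warp."""
    return [base + lane * stride_bytes for lane in range(WARP_SIZE)]

def segments_touched(byte_addresses):
    """Number of distinct aligned 128-byte segments hit by the warp."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})

coalesced = segments_touched(warp_addresses(base=0, stride_bytes=4))
misaligned = segments_touched(warp_addresses(base=64, stride_bytes=4))
scattered = segments_touched(warp_addresses(base=0, stride_bytes=512))

print(coalesced, misaligned, scattered)  # 1 2 32
```

In this model the scattered pattern fetches 32 segments (4096 bytes) to deliver 128 useful bytes, which is the low bus utilization the slide warns about.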

Decoder memory accesses
- If we decode one codeword at a time:
  • Memory accesses are scattered in either the check node update or the variable node update
- Solution: Decode several codewords in parallel
  • Efficient memory accesses
  • Increases parallelism
[Figure: bipartite graph with (n – k) check nodes and n variable nodes]

Our LDPC Decoder approach
- Two main kernels (functions), iterated alternately:
  • Check node update
  • Variable node update
- 8-bit fixed-point representation for messages
  • Messages for the same edge for all codewords are stored consecutively in memory
- We decode 128 codewords in parallel
- Each thread updates the outgoing messages from one check/variable node for 4 different codewords
  • A warp processes the same updates for all 128 codewords (32 threads × 4 codewords)
  • Result: 128-byte message reads/writes to global memory
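The memory layout described above can be sketched as an address calculation. The slide states that messages for the same edge are stored consecutively across codewords and that each thread handles 4 codewords; the assumption made here is that lane `t` of a warp takes codewords `4t` to `4t + 3`, and all names are hypothetical.

```python
CODEWORDS = 128   # codewords decoded in parallel (from the slide)
PER_THREAD = 4    # codewords handled by each thread (from the slide)
WARP_SIZE = 32

def message_address(edge, codeword):
    """Byte offset of the 8-bit message for (edge, codeword).

    The codeword index varies fastest, so all 128 messages of one edge
    occupy one contiguous 128-byte block.
    """
    return edge * CODEWORDS + codeword

def thread_addresses(edge, lane):
    """The 4 consecutive bytes read by warp lane `lane` for one edge (assumed mapping)."""
    first = lane * PER_THREAD
    return [message_address(edge, first + i) for i in range(PER_THREAD)]

def warp_addresses(edge):
    """All byte offsets read by the 32 lanes of a warp for one edge."""
    return [a for lane in range(WARP_SIZE) for a in thread_addresses(edge, lane)]

# A warp reading edge 7 covers exactly one aligned 128-byte block.
print(warp_addresses(7) == list(range(7 * 128, 8 * 128)))  # True
print(thread_addresses(0, 1))                              # [4, 5, 6, 7]
```

With this layout each lane reads one 4-byte word and the 32 lanes together read 128 contiguous, aligned bytes, whichever edge of the graph is being processed. The irregular structure of H then only affects which block is read, not how well the read coalesces.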

Performance
- Good memory access patterns
  • Solution is now instruction bound
- No shared ("scratchpad") memory used, just 48 KB L1 cache
  • Allows larger number of active threads
- Throughput:
  • Codeword length: 64800 bits
  • Code rate ½ (32400 information bits, 32400 parity bits)

  Iterations    Throughput
  20            163 Mbps
  30            112 Mbps
  50            69 Mbps
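A quick check of these figures against the ~61 Mbps DVB-T2 requirement quoted on the Background slide is sketched below. The numbers are taken from the slides; the variable names and the interpretation of the product as a rough measure of work are assumptions of this sketch.

```python
REQUIRED_MBPS = 61                       # DVB-T2 requirement from the Background slide
measured = {20: 163, 30: 112, 50: 69}    # iterations -> throughput in Mbps

# Ratio of measured throughput to the real-time requirement.
headroom = {it: round(mbps / REQUIRED_MBPS, 2) for it, mbps in measured.items()}

# Iterations times throughput: roughly constant if decoding time is
# dominated by the per-iteration kernel work.
work = {it: it * mbps for it, mbps in measured.items()}

print(headroom)  # {20: 2.67, 30: 1.84, 50: 1.13}
print(work)      # {20: 3260, 30: 3360, 50: 3450}
```

All three configurations exceed the requirement, with the 50-iteration case leaving only about 13% margin. The near-constant product suggests that throughput scales almost inversely with the iteration count.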

Conclusions
- Real-time LDPC decoding for DVB-T2, DVB-S2, DVB-C2 is possible on a modern GPU
- Some capacity left on the GPU for other complex tasks, such as a QAM constellation demapper
  • Future work

Thank you for listening! Questions?
