Lightweight Compression Methods Achieving 120GBps and More

Lightweight Compression Methods Achieving 120GBps and More. Piotr Przymus, Laboratoire d’Informatique Fondamentale de Marseille, Aix-Marseille University, France. GPU Technology Conference, Silicon Valley, May 2017.


  1. Lightweight Compression Methods Achieving 120GBps and More Piotr Przymus Laboratoire d’Informatique Fondamentale de Marseille Aix-Marseille University, France GPU Technology Conference Silicon Valley May 2017 P. Przymus Lightweight Compression Methods Achieving 120GBps and More 1/25

  2. K. Kaczmarski and P. Przymus, Fixed Length Lightweight Compression for GPU Revised, Journal of Parallel and Distributed Computing, 2017.
     A lightweight compression library for GPU: github.com/mis-wut/feathergpu (MIT-licensed).
     This project was partly funded by the National Science Centre, decision DEC-2012/07/D/ST6/02483.
     Team:
       - Krzysztof Kaczmarski, Warsaw University of Technology, Poland.
       - Piotr Przymus, Aix-Marseille University, France; Nicolaus Copernicus University in Toruń, Poland.

  3. Lightweight compression on GPU – motivation
     Lightweight compression algorithms favour compression and decompression speed over compression ratio.
     Improved data transfer:
       - Disk ↔ RAM ↔ GPU.
       - GPU ↔ GPU: exchange of already compressed data, compress → transfer → decompress.
     Lower memory footprint:
       - Less disk space used.
       - Less RAM used.
       - Less GPU memory used.
     Improved internal memory access:
       - In some cases, improved internal GPU memory access.

  10. Fixed length compression
      Fixed length (FL) is a simple, well-known compression scheme in which a fixed number of bits is suppressed from each value. The suppressed bits must all be equal to 0.
      00000001 00000010 00000011 00000100 00000101 00000110 00000111 00001000
      Figure: Original data, only 4 bits are used in each byte.
      00010010 00110100 01010110 01111000
      Figure: Compressed data (each byte encodes two words of length 4 bits).
      Compression ratio (CR) = uncompressed size / compressed size = 2.
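The example on this slide can be reproduced with a few lines of Python. This is only a minimal sketch of the 4-bit case shown in the figures; the function names `fl_pack4` and `fl_unpack4` are hypothetical and do not come from the feathergpu library.

```python
def fl_pack4(values):
    """Pack bytes that use only their low 4 bits into half as many bytes."""
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    if any(v >> 4 for v in values):
        raise ValueError("the suppressed (high) bits must all be 0")
    # The first value of each pair goes to the high nibble, as in the figure.
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


def fl_unpack4(packed):
    """Restore one byte per 4-bit word."""
    out = bytearray()
    for b in packed:
        out.append(b >> 4)     # high nibble
        out.append(b & 0x0F)   # low nibble
    return bytes(out)


original = bytes(range(1, 9))          # the eight values 1..8 from the figure
packed = fl_pack4(original)
ratio = len(original) / len(packed)    # uncompressed size / compressed size
```

Here `packed` holds the four bytes of the compressed figure and `ratio` evaluates to 2, matching the compression ratio stated on the slide.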

  11. Fixed length compression
      Fixed length (FL) compression is easy to implement, and high data throughput is easy to achieve.
      Many applications: database compression (columns, indexes), time series compression, graph compression, etc.
      Many variants: Patched FL, Adaptive FL, DELTA-*.

  12. Fixed length compression on GPU
      Performance over flexibility (Fang et al. 2010):
        - High performance, but a highly simplified version of the algorithm.
        - Words are mapped to full bytes, e.g. a 4-bit word is mapped to 1 byte.
        - Uses the map primitive.
        - Coalesced reads and writes: YES.
        - Direct memory access: YES.
      Flexibility over performance (Nvbio and Kaczmarski, Przymus 2012-2017):
        - No simplifications, at the cost of lower performance.
        - Supports all possible bit encodings.
        - Uses the allgather or gather primitive.
        - Coalesced reads and writes: NO.
        - Direct memory access: YES.
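To make the trade-off concrete, the sketch below packs values at an arbitrary bit width into 32-bit words and compares the result with a byte-aligned encoding of the same data. It is a sequential CPU illustration only. The names `fl_pack`, `fl_unpack` and `byte_aligned_bits` are hypothetical, and the bit order inside a word is an assumption, not the layout used by Fang et al., Nvbio or feathergpu.

```python
def fl_pack(values, bits, word_bits=32):
    """Pack non-negative integers into words, using exactly `bits` bits each."""
    mask = (1 << bits) - 1
    word_mask = (1 << word_bits) - 1
    words, acc, filled = [], 0, 0
    for v in values:
        if v & ~mask:
            raise ValueError("value does not fit in the chosen bit width")
        acc |= v << filled          # append the value to the bit stream
        filled += bits
        while filled >= word_bits:  # flush every completed word
            words.append(acc & word_mask)
            acc >>= word_bits
            filled -= word_bits
    if filled:
        words.append(acc)           # last, partially filled word
    return words


def fl_unpack(words, bits, count, word_bits=32):
    """Inverse of fl_pack for `count` values."""
    mask = (1 << bits) - 1
    stream = 0
    for i, w in enumerate(words):
        stream |= w << (i * word_bits)
    return [(stream >> (i * bits)) & mask for i in range(count)]


def byte_aligned_bits(bits):
    """Bit width after rounding up to whole bytes (the simplified variant)."""
    return 8 * ((bits + 7) // 8)


values = list(range(32))                                   # all fit in 5 bits
exact = fl_pack(values, 5)                                 # any bit width allowed
aligned = fl_pack(values, byte_aligned_bits(5))            # 5 bits rounded up to 8
```

With 5-bit values, the exact encoding needs 5 words while the byte-aligned one needs 8, which is the flexibility the second approach keeps at the price of more complicated memory access.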

  13. Fixed length compression on GPU
      Figure: Read pattern: GPU version of FL algorithm (grid of element indices 0 to 31, 32 to 63, ..., one row per label 0, 1, ..., 31).
      Figure: Write pattern: GPU version of FL algorithm (element indices 0 to 31, 32 to 63, 64 to 95 in consecutive rows).
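One way to see why such a read pattern is not coalesced is to count how many memory segments a warp touches in a single step. The sketch below assumes that each row of the read-pattern figure belongs to one thread, that elements are 4 bytes wide, and that memory is fetched in 128-byte segments; these are illustrative assumptions, and `fl_read_index` and `segments_touched` are hypothetical names.

```python
WARP = 32        # threads per warp
ELEM_BYTES = 4   # assumed 32-bit elements
SEGMENT = 128    # assumed size of one memory transaction, in bytes


def fl_read_index(thread, step, block=32):
    """Row-per-thread pattern: thread t walks elements t*block .. t*block + block - 1."""
    return thread * block + step


def segments_touched(index_fn, step):
    """Number of distinct memory segments one warp touches in a single step."""
    addresses = [index_fn(t, step) * ELEM_BYTES for t in range(WARP)]
    return len({a // SEGMENT for a in addresses})


per_step = [segments_touched(fl_read_index, s) for s in range(32)]
```

Under these assumptions every step touches 32 different segments, one per thread, so each warp-wide read is served by 32 separate transactions.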

  14. Fixed length on GPU (C+D, GTX Titan Black)
      Figure: bandwidth (GB/s, axis 0 to 120) versus data size (MB, axis 0 to 1000). Series: int max, int min, long max, long min, int, long.

  15. Fixed length on GPU (1 GB of data, GTX Titan Black)
      Figure: compression and decompression throughput (GB/s, axes up to 300) versus bit encoding (8 to 63). Series: int, long.

  16. Can we do better?
      Aligned Fixed Length (AFL) algorithm.
      The FL algorithm is optimized for the CPU memory access scheme. We can do better with a GPU-friendly memory organisation scheme.
      Features:
        - No simplifications, high performance on GPU.
        - Still works quite well on CPU, but loses some of the cache-hit benefits.
        - Supports all possible bit encodings.
        - Uses the allgather or gather primitive.
        - Coalesced reads and writes: YES.
        - Direct memory access: YES.
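A GPU-friendly organisation can be sketched on the CPU by simulating the 32 lanes of a warp. The layout below is an assumption made for illustration: lane `t` handles elements `t`, `t + 32`, `t + 64`, ... of a 1024-element block and stores its `j`-th compressed word at position `j * 32 + t`. The names `afl_pack_block` and `afl_unpack_block` are hypothetical, and the exact format used by feathergpu may differ.

```python
WARP = 32       # lanes per warp
PER_LANE = 32   # values per lane; 32 values of `bits` bits fill exactly `bits` words


def afl_pack_block(values, bits):
    """Pack one block of WARP * PER_LANE values in a lane-interleaved layout."""
    if len(values) != WARP * PER_LANE:
        raise ValueError("expected exactly one block of 1024 values")
    mask = (1 << bits) - 1
    out = [0] * (bits * WARP)
    for lane in range(WARP):
        stream = 0
        for i in range(PER_LANE):
            v = values[i * WARP + lane]    # neighbouring lanes read neighbouring elements
            if v & ~mask:
                raise ValueError("value does not fit in the chosen bit width")
            stream |= v << (i * bits)
        for j in range(bits):
            # neighbouring lanes also write neighbouring words
            out[j * WARP + lane] = (stream >> (32 * j)) & 0xFFFFFFFF
    return out


def afl_unpack_block(words, bits):
    """Inverse of afl_pack_block."""
    mask = (1 << bits) - 1
    values = [0] * (WARP * PER_LANE)
    for lane in range(WARP):
        stream = 0
        for j in range(bits):
            stream |= words[j * WARP + lane] << (32 * j)
        for i in range(PER_LANE):
            values[i * WARP + lane] = (stream >> (i * bits)) & mask
    return values


data = [(i * 7) % 32 for i in range(WARP * PER_LANE)]   # sample 5-bit values
packed = afl_pack_block(data, 5)
restored = afl_unpack_block(packed, 5)
```

Because every lane packs exactly 32 values, any bit width from 1 to 32 produces a whole number of words per lane, so the interleaving does not restrict the supported encodings.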

  17. Aligned FL on GPU
      Figure: Read pattern: GPU version of Aligned FL algorithm (column header 0 to 31 above a grid of element indices 0 to 31, 32 to 63, ...).
      Figure: Write pattern: GPU version of Aligned FL algorithm (column header 0 to 31 above element indices 0 to 31, 32 to 63, 64 to 95).
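The effect of the aligned pattern on memory transactions can be estimated with a small model. It assumes that each column of the figure belongs to one thread of a warp, that elements are 4 bytes wide, and that memory is fetched in 128-byte segments; `afl_read_index` and `segments_touched` are hypothetical names used only for this sketch.

```python
WARP = 32        # threads per warp
ELEM_BYTES = 4   # assumed 32-bit elements
SEGMENT = 128    # assumed size of one memory transaction, in bytes


def afl_read_index(thread, step):
    """Column-per-thread pattern: in step s, thread t reads element s * WARP + t."""
    return step * WARP + thread


def segments_touched(index_fn, step):
    """Number of distinct memory segments one warp touches in a single step."""
    addresses = [index_fn(t, step) * ELEM_BYTES for t in range(WARP)]
    return len({a // SEGMENT for a in addresses})


per_step = [segments_touched(afl_read_index, s) for s in range(32)]
```

Under these assumptions the whole warp reads one contiguous 128-byte segment per step, which is the coalesced behaviour the AFL slides claim.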

  18. Aligned FL on GPU (C+D, GTX Titan Black)
      Figure: bandwidth (GB/s, axis 0 to 120) versus data size (MB, axis 0 to 1000). Series: int max, int min, long max, long min, int, long.
