Gzip Compression Using Altera OpenCL

  1. Gzip Compression Using Altera OpenCL
     Mohamed Abdelfattah (University of Toronto), Andrei Hagiescu, Deshanand Singh

  2. Gzip
     • Widely used lossless compression program
     • Gzip = LZ77 + Huffman
     • Big data needs fast compression: gigabyte-per-second throughput
     • Less disk space used in data centers
     • Less power spent on communication networks

  3. LZ77 Compression Example
     "This sentence is an easy sentence to compress."
     1. Scan the file byte by byte.
     2. Look for matches.
     3. Replace each match with a reference to its previous occurrence.

  9–13. LZ77 Compression Example
     "This sentence is an easy sentence to compress."
     1. Scan the file byte by byte.
     2. Look for matches, tracking:
        • Match length: the match grows one byte at a time (2, 3, ..., 8 for "sentence")
        • Match offset: the distance back to the previous occurrence (20 bytes)
     3. Replace the match with a reference to its previous occurrence.

  14–15. LZ77 Compression Example
     Before: "This sentence is an easy sentence to compress."
     After:  "This sentence is an easy @(8,20) to compress."
     The second "sentence" (match length = 8, match offset = 20) is replaced by a reference made of a marker, the length and the offset.
     Saved 5 bytes! If the marker, length and offset take one byte each, the 8-byte word becomes a 3-byte reference: 8 - 3 = 5.

  16. Altera OpenCL Compiler for FPGAs
     Host CPU, running the host code:
         //host code
         //Enqueue buffer
         //Enqueue Kernel(s)
         //dequeue buffers
         …
     The host is connected over PCIe to the FPGA accelerator and its DDRx memory. Altera's OpenCL compiler turns single-threaded OpenCL code into a custom pipeline on the FPGA (Load x, Load y, Store z):
         kernel void simple(global const int *input, int size, global int *output)
         {
             // input must hold size + 1 elements because input[i + 1] is read
             for (int i = 0; i < size; i++) {
                 int x = input[i];       // Load x
                 int y = input[i + 1];   // Load y
                 int z = x + y;
                 output[i] = z;          // Store z
             }
         }

  17–19. Altera OpenCL Compiler for FPGAs
     Same diagram as slide 16, animated: loop iterations 1, 2 and 3 enter the Load x / Load y / Store z pipeline one after another, so several iterations are in flight at the same time (the loop is pipelined).

  20. FPGAs Can Be VERY Custom
     • Host: a CPU connected over PCIe, or an ARM host on the FPGA chip
     • IO channels into and out of the FPGA accelerator (Load x, Load y, Store z)
     • Different memory types: DDRx, QDR?, RDL?

  21. Implementation Overview
     1. Shift In New Data
     2. Dictionary Lookup/Update
     3. Match Search & Filtering
     4. Write to output

  22. 1. Shift In New Data
     Input streams from DDR memory into the current window.

  23–24. 1. Shift In New Data
     Example with VEC = 4: the current window holds "old_text", and "sample_text" is the new data waiting to enter, separated from the window by a cycle boundary. We use text in this example, but the data can be anything.

  25–26. 1. Shift In New Data
     Each cycle, the oldest VEC bytes ("old_") shift out, leaving "text", and the next VEC bytes ("samp") shift in, so the window becomes "textsamp" while "le_text" waits for later cycles.

  28. 2. Dictionary Lookup/Update
     The current window "textsamp" yields VEC = 4 substrings: "text", "exts", "xtsa" and "tsam".
     1. Compute a hash of each substring.
     2. Look for a match in 4 dictionaries (Dictionary 0 to Dictionary 3).
     3. Update the dictionaries.
     The dictionaries buffer the text that we have already processed.

  29–32. 2. Dictionary Lookup/Update
     Each substring's hash selects one entry in every dictionary, giving 4 possible matches from history per substring:
     • "text": tan_, text, texl, teen
     • "exts": eate, ears, eeps, ente
     • "xtsa": xant, xylo, xely, xirt
     • "tsam": tan_, tame, teal, teen

  34. 2. Dictionary Lookup/Update
     Every dictionary is read by all 4 substrings and written once: Dictionary 0 has read ports RD00 to RD03 and write port W0, and Dictionaries 1 to 3 likewise have RD10 to RD33 and W1 to W3.
     Generate exactly the number of read/write ports that we need, and the width: 256 read ports and 16 write ports, 128 bits wide.
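
The port counts follow from the access pattern if each of the VEC dictionaries is read once by every substring and written once per cycle. The slides' example uses VEC = 4 (16 read ports, 4 write ports, 32-bit ports); assuming the full design uses VEC = 16, which the slide does not state, the arithmetic is consistent with its figures:

```latex
\begin{aligned}
\text{read ports}  &= \mathit{VEC} \times \mathit{VEC} = 16 \times 16 = 256\\
\text{write ports} &= \mathit{VEC} = 16\\
\text{port width}  &= 8 \cdot \mathit{VEC}\ \text{bits} = 8 \cdot 16 = 128\ \text{bits}
\end{aligned}
```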

  36. 3. Match Search & Filtering
     Each incoming substring has a set of candidate matches, its 4 comparison windows:
     • "text": teen, texl, text, tan_
     • "exts": ente, eeps, ears, eate
     • "xtsa": xirt, xely, xylo, xant
     • "tsam": teen, teal, tame, tan_
     Compare each current window against each of its 4 comparison windows.

  37–38. 3. Match Search & Filtering
     Comparators check the current window "text" against each comparison window, byte by byte:
     • teen: match length 2
     • texl: match length 3
     • text: match length 4
     • tan_: match length 1
     We have another 3 of these comparator sets, one per remaining substring. Match reduction then keeps the longest: best length = 4.

  42. 3. Match Search & Filtering
     Typical C code: with fixed loop bounds, the compiler can unroll the loop.
