Gzip Compression Using Altera OpenCL
Mohamed Abdelfattah (University of Toronto), Andrei Hagiescu, Deshanand Singh
Gzip
• Widely used lossless compression program
• Gzip = LZ77 + Huffman coding
• Big data needs fast compression, at gigabytes per second:
  • Less disk space in data centers
  • Less power on communication networks
LZ77 Compression Example
This sentence is an easy sentence to compress.
1. Scan the file byte by byte.
2. Look for matches.
3. Replace each match with a reference to its previous occurrence.
Each match found in step 2 is described by two numbers:
• Match length: grows as each further byte agrees (2, then 3, ... up to 8 for "sentence")
• Match offset: how far back the previous occurrence starts (20 bytes)
Original:   This sentence is an easy sentence to compress.
Compressed: This sentence is an easy @(8,20) to compress.
• The second "sentence" becomes @(8,20): match length 8, match offset 20.
• The reference is a marker, a length, and an offset. It replaces 8 bytes with 3, so 5 bytes are saved.
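The steps above can be sketched in C as a brute-force match search. This is only an illustration under simple assumptions: it compares against every earlier position (gzip itself uses hash chains), `longest_match` and `bytes_saved` are hypothetical helper names, and the 3-byte reference cost is the one implied by the 5-byte saving. Because it compares bytes exactly, it reports length 9 at the second "sentence" (the trailing space matches too) and length 10 if the search starts at the space before it. The offset is 20 in both cases.

```c
#include <assert.h>
#include <stddef.h>

/* Result of a match search. length == 0 means no match (emit a literal). */
typedef struct {
    size_t length; /* number of matching bytes */
    size_t offset; /* distance back to the earlier occurrence */
} Match;

/* Brute-force LZ77 match search: compare the text at `pos` against every
 * earlier start position at most `max_offset` bytes back and keep the
 * longest match (the first one found wins ties). */
static Match longest_match(const char *data, size_t n, size_t pos,
                           size_t max_offset)
{
    assert(pos <= n);
    Match best = {0, 0};
    size_t start = pos > max_offset ? pos - max_offset : 0;
    for (size_t cand = start; cand < pos; cand++) {
        size_t len = 0;
        /* Grow the match one byte at a time while the bytes agree.
         * cand + len < pos + len < n, so both reads stay in bounds. */
        while (pos + len < n && data[cand + len] == data[pos + len])
            len++;
        if (len > best.length) {
            best.length = len;
            best.offset = pos - cand;
        }
    }
    return best;
}

/* Bytes saved by replacing a match with a reference costing `ref_bytes`
 * (marker + length + offset); a negative result means it does not pay off. */
static long bytes_saved(size_t match_length, size_t ref_bytes)
{
    return (long)match_length - (long)ref_bytes;
}
```

With a 3-byte reference, only matches longer than 3 bytes save space, which is why an encoder ignores very short matches.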
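Altera OpenCL Compiler for FPGAs
• Host code runs on the host CPU:
    //host code
    //Enqueue buffer
    //Enqueue Kernel(s)
    //dequeue buffers
    …
• The host CPU communicates with the FPGA accelerator over PCIe; the accelerator has its own DDRx memory.
• Altera's OpenCL compiler turns single-threaded OpenCL kernel code into a custom FPGA datapath (Load x, Load y, add, Store z):

    // input must hold size + 1 elements, because each iteration reads input[i + 1]
    kernel void simple(global const int *input, int size, global int *output)
    {
        for (int i = 0; i < size; i++) {
            int x = input[i];      // Load x
            int y = input[i + 1];  // Load y
            int z = x + y;         // add
            output[i] = z;         // Store z
        }
    }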
The loop becomes a hardware pipeline. Loop iterations 1, 2, 3, ... enter it one after another, so several iterations are in flight at once (for example, iteration 3 loading while iteration 2 is added and iteration 1 is stored).
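A toy C model of the pipelining described above. It assumes a three-stage pipeline (stage 1 = loads, stage 2 = add, stage 3 = store), one cycle per stage, and one new iteration entering per cycle. These numbers are illustrative rather than the real compiler's pipeline depth, and the helper names are hypothetical.

```c
#include <assert.h>

/* Which iteration occupies `stage` at `cycle`, if one new iteration enters
 * stage 1 every cycle and each iteration advances one stage per cycle?
 * Cycles, stages and iterations are numbered from 1; 0 means "empty". */
static int iteration_in_stage(int cycle, int stage, int n_iters)
{
    assert(cycle >= 1 && stage >= 1);
    int iter = cycle - stage + 1;
    return (iter >= 1 && iter <= n_iters) ? iter : 0;
}

/* Cycles to finish n_iters iterations in a pipeline `depth` stages deep. */
static int pipelined_cycles(int n_iters, int depth)
{
    return n_iters == 0 ? 0 : n_iters + depth - 1;
}

/* Cycles if each iteration must finish before the next one starts. */
static int sequential_cycles(int n_iters, int depth)
{
    return n_iters * depth;
}
```

Under these assumptions, 1000 iterations take 1002 cycles when pipelined and 3000 when run one at a time, so a single-threaded loop can still keep the whole datapath busy.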
FPGAs Can Be VERY Custom
• The host can be an ARM processor on the FPGA chip rather than a separate host CPU connected over PCIe
• IO channels
• Different memory types: DDRx, QDR?, RDL?
Implementation Overview
1. Shift In New Data
2. Dictionary Lookup/Update
3. Match Search & Filtering
4. Write to output
1. Shift In New Data
• New data from DDR memory is shifted into the current window.
• Example with VEC = 4: the current window holds "old_text" and the next input is "sample_text".
• At each cycle boundary, the oldest VEC bytes ("old_") shift out and the next VEC bytes ("samp") shift in, so the window becomes "textsamp" while "le_text" waits.
• The example uses text, but the data can be anything.
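A C model of the shift-in step, assuming VEC = 4 as in the example and an 8-byte (2 × VEC) window. `shift_in` is a hypothetical name; in hardware all bytes move in the same cycle rather than through memmove/memcpy.

```c
#include <assert.h>
#include <string.h>

#define VEC 4          /* bytes shifted in per cycle, as in the example */
#define WIN (2 * VEC)  /* window size, assumed from the 8-byte example */

/* Shift VEC new bytes into the window: the oldest VEC bytes fall out,
 * the newer VEC bytes move down, and new_data fills the top half. */
static void shift_in(unsigned char window[WIN], const unsigned char new_data[VEC])
{
    assert(window != NULL && new_data != NULL);
    memmove(window, window + VEC, WIN - VEC);
    memcpy(window + (WIN - VEC), new_data, VEC);
}
```

Starting from "old_text" and shifting in "samp" gives "textsamp"; shifting in the next four bytes, "le_t", gives "sample_t".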
2. Dictionary Lookup/Update
Current window: t e x t s a m p, giving the substrings "text", "exts", "xtsa", and "tsam"
1. Compute the hash of each substring.
2. Look for a match in each of the 4 dictionaries (Dictionary 0 to 3).
3. Update the dictionaries.
The dictionaries buffer the text that has already been processed.
Each hash selects one entry in every dictionary, returning possible matches from history. For "text", the four dictionaries return "tan_", "text", "texl", and "teen"; "exts", "xtsa", and "tsam" each receive four candidates the same way.
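The three dictionary steps can be sketched in C for VEC = 4. Several details are assumptions, not taken from the slides: the hash function `hash4` and the dictionary depth `DICT_SIZE` are hypothetical, each entry stores only the substring bytes (a full design would also need the position, to compute the match offset), and all reads happen before the writes of the same cycle.

```c
#include <assert.h>
#include <string.h>

#define VEC 4                        /* substrings per cycle = number of dictionaries */
#define DICT_BITS 10                 /* hypothetical dictionary depth: 1024 entries */
#define DICT_SIZE (1u << DICT_BITS)

/* One dictionary entry: a VEC-byte substring seen earlier. */
typedef struct {
    unsigned char text[VEC];
} Entry;

/* VEC dictionaries; dictionary d is updated by the substring at offset d. */
static Entry dict[VEC][DICT_SIZE];

/* Hypothetical hash of a VEC-byte substring (not gzip's hash function). */
static unsigned hash4(const unsigned char *s)
{
    unsigned h = 0;
    for (int i = 0; i < VEC; i++)
        h = h * 31u + s[i];
    return h & (DICT_SIZE - 1);
}

/* One cycle of lookup/update for the substrings at window offsets 0..VEC-1.
 * candidates[i][d] receives dictionary d's entry for substring i. */
static void lookup_update(const unsigned char window[2 * VEC],
                          Entry candidates[VEC][VEC])
{
    assert(window != NULL && candidates != NULL);
    unsigned h[VEC];

    /* 1. Compute the hash of each substring. */
    for (int i = 0; i < VEC; i++)
        h[i] = hash4(&window[i]);

    /* 2. Look for a match in every dictionary: VEC * VEC reads. */
    for (int i = 0; i < VEC; i++)
        for (int d = 0; d < VEC; d++)
            candidates[i][d] = dict[d][h[i]];

    /* 3. Update: substring i is written into dictionary i (VEC writes).
     * The reads above saw the dictionaries before these writes. */
    for (int i = 0; i < VEC; i++)
        memcpy(dict[i][h[i]].text, &window[i], VEC);
}
```

After the window "textsamp" has been processed, a later window starting with "text" finds "text" as a candidate from Dictionary 0.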
Dictionary ports: each Dictionary d has one read port per substring (RDd0 to RDd3) and one write port (Wd), so this example needs 16 read ports and 4 write ports.
Generate exactly the number of read/write ports that are needed, at exactly the width that is needed: e.g. 256 read ports and 16 write ports, 128 bits wide.
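The port counts follow from the lookup scheme: VEC substrings each read all VEC dictionaries, and each substring writes one dictionary. The sketch below assumes the 256/16/128-bit figures correspond to VEC = 16. The numbers suggest this (16 × 16 = 256 reads, 16 bytes = 128 bits), but the slide does not state it. `dict_ports` is a hypothetical helper.

```c
#include <assert.h>

/* Port requirements of the dictionary memories for a given VEC. */
typedef struct {
    int read_ports;  /* total over all dictionaries */
    int write_ports; /* total over all dictionaries */
    int width_bits;  /* width of each port: one VEC-byte substring */
} DictPorts;

static DictPorts dict_ports(int vec)
{
    assert(vec > 0);
    DictPorts p;
    p.read_ports = vec * vec; /* vec substrings, each reading all vec dictionaries */
    p.write_ports = vec;      /* each substring updates one dictionary */
    p.width_bits = vec * 8;   /* a substring is vec bytes */
    return p;
}
```

For VEC = 4 this gives the 16 read ports (RD00 to RD33) and 4 write ports (W0 to W3) of the example, each 32 bits wide.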
3. Match Search & Filtering
Each incoming substring (current window) has a set of 4 candidate matches (comparison windows):
• text: teen, texl, text, tan_
• exts: ente, eeps, ears, eate
• xtsa: xirt, xely, xylo, xant
• tsam: teen, teal, tame, tan_
Compare each current window against each of its 4 comparison windows.
Comparators for the current window "text" (there are 3 more sets, one for each of the other substrings):
Comparison window:  teen  texl  text  tan_
Match length:       2     3     4     1
• Compare each byte of the current window with the comparison window; the match length is the number of leading bytes that agree.
• Match reduction: keep the best length, here 4.
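The comparators and the match reduction, sketched in C with the example data. Assumptions: VEC = 4, a match length is the number of leading bytes that agree, and ties keep the first candidate. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VEC 4

/* Compare each byte of the current window with one comparison window and
 * return the number of leading bytes that agree (0..VEC). */
static int match_length(const unsigned char *cur, const unsigned char *cmp)
{
    assert(cur != NULL && cmp != NULL);
    int eq[VEC];
    /* One comparator per byte: the comparisons are independent. */
    for (int i = 0; i < VEC; i++)
        eq[i] = cur[i] == cmp[i];
    /* Count the leading equal bytes. */
    int len = 0;
    while (len < VEC && eq[len])
        len++;
    return len;
}

/* Match reduction: return the index of the longest of the VEC candidates
 * and store its length in *best_len. */
static int best_candidate(const unsigned char *cur,
                          const unsigned char cands[VEC][VEC], int *best_len)
{
    int best = 0;
    *best_len = match_length(cur, cands[0]);
    for (int c = 1; c < VEC; c++) {
        int len = match_length(cur, cands[c]);
        if (len > *best_len) {
            *best_len = len;
            best = c;
        }
    }
    return best;
}
```

For "text" against teen, texl, text, and tan_, the lengths are 2, 3, 4, and 1, and the reduction picks "text" with length 4.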
Typical C code: the loop bounds are fixed, so the compiler can unroll the loop.
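The slide's own C code is not recoverable, so this is only a sketch of the contrast it draws, assuming VEC = 4. Both functions return the same match length. The first exits at the first mismatch, so its iteration count depends on the data. The second always runs exactly VEC iterations, the kind of fixed-bound loop a compiler can unroll into parallel comparators. Altera's OpenCL compiler accepts `#pragma unroll` on such loops; it appears only as a comment here so the code builds as plain C. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VEC 4

/* Data-dependent trip count: the loop stops at the first mismatch. */
static int match_length_variable(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    int len = 0;
    while (len < VEC && a[len] == b[len])
        len++;
    return len;
}

/* Fixed trip count: always VEC iterations. A flag records whether every
 * byte so far has matched, instead of breaking out of the loop. */
static int match_length_fixed(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    int len = 0;
    int still_matching = 1;
    /* In an OpenCL kernel, "#pragma unroll" could be placed on this loop. */
    for (int i = 0; i < VEC; i++) {
        still_matching = still_matching && (a[i] == b[i]);
        len += still_matching;
    }
    return len;
}
```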