Pipelined Compressor Tree Optimization using Integer Linear - PowerPoint PPT Presentation

Pipelined Compressor Tree Optimization using Integer Linear Programming
International Conference on Field Programmable Logic, 03.09.2014
Martin Kumm, Peter Zipf
University of Kassel, Germany

CONTENTS
1. Introduction to Compressor Trees
2. Compressor Trees on FPGAs
3. Optimal Compressor Tree Synthesis

COMPRESSOR TREES
A compressor tree realizes the addition of many (>2) bit-shifted numbers.
The applications are versatile:
- Multipliers (real, complex, squarer)
- Evaluation of polynomials (e.g., for function approximation)
- Linear transforms (e.g., FFT, DCT)
- Digital filters
- …

EXAMPLE 1: MULTI-INPUT ADDITION
Formula: $S = \sum_i X_i$
Dot representation of a 5-bit, 5-input addition (one row per input vector, one column per bit weight):

    2^4 2^3 2^2 2^1 2^0
     1   0   1   0   1      21
     1   1   0   1   1     +27
     0   1   1   0   1     +13
     0   0   1   1   1      +7
     1   0   1   1   0     +22
                          = 90

Column sums: 3 · 2^4 + 2 · 2^3 + 4 · 2^2 + 3 · 2^1 + 4 · 2^0 = 90
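The column sums of this example can be reproduced with a short sketch. The function name `column_heights` is a hypothetical helper introduced here for illustration; it only counts the dots per column and is not part of the presented method.

```python
def column_heights(values, width):
    """Count the set bits per column; index 0 is the least significant column."""
    return [sum((v >> c) & 1 for v in values) for c in range(width)]

inputs = [21, 27, 13, 7, 22]          # the five input vectors of the example
heights = column_heights(inputs, 5)   # dots per column, LSB first

# Weight every column height by its power of two and add everything up.
weighted = sum(h << c for c, h in enumerate(heights))

print(heights[::-1])   # heights listed MSB first, as on the slide
print(weighted, sum(inputs))
```

The weighted column sum equals the plain sum of the inputs, which is why a compressor tree may rearrange bits freely as long as each bit keeps its column.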

EXAMPLE 2: MULTIPLIER
5x5 multiplication, given as formula and as dot representation.
[Figure: dot representation of the 5x5 multiplication]
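The formula and the dot diagram of this slide did not survive extraction. As a rough sketch, assuming the usual AND-array formulation in which partial product bit x_i AND y_j has weight 2^(i+j) (an assumption, not taken from the slide), the column heights of a 5x5 multiplier can be derived as follows. Both function names are hypothetical.

```python
def multiplier_heights(n, m):
    """Dots per column of an n x m AND-array multiplier, LSB first (assumed formulation)."""
    heights = [0] * (n + m - 1)
    for i in range(n):
        for j in range(m):
            heights[i + j] += 1   # partial product x_i & y_j lands in column i + j
    return heights

def multiply_via_partial_products(x, y, n=5, m=5):
    """Sum all weighted partial product bits; equals x * y for n-bit x and m-bit y."""
    total = 0
    for i in range(n):
        for j in range(m):
            total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total

print(multiplier_heights(5, 5)[::-1])
print(multiply_via_partial_products(21, 27), 21 * 27)
```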

EXAMPLE 3: ADVANCED ARITHMETIC
Sine/cosine computation: dot representation for $Z - Z^3/6$ [Dinechin HEART’13]
[Figure: dot representation]

BASIC COMPRESSION
Full adder / (3;2) counter, and a ripple-carry adder built from a chain of full adders (FA).
[Figure: dot diagrams compressed by full adders]
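A minimal behavioural sketch of the two building blocks named here: a full adder seen as a (3;2) counter, and a ripple-carry adder as a chain of full adders. The function names and the LSB-first bit-list convention are assumptions made for this illustration.

```python
def full_adder(a, b, cin):
    """(3;2) counter: three bits of equal weight in, (sum, carry) out."""
    total = a + b + cin
    return total & 1, total >> 1

def ripple_carry_add(x_bits, y_bits):
    """Add two equally long bit lists (LSB first); the carry ripples upwards."""
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)   # final carry becomes the most significant bit
    return out

def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]

result = ripple_carry_add(to_bits(21, 5), to_bits(27, 5))
print(result, sum(b << i for i, b in enumerate(result)))
```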

FLOW OF COMPRESSION
[Figure: flow of compression from one stage to the next]

TABULAR REPRESENTATION
Each row lists the bits per column, most significant column on the left. A (3;2) counter takes 3 bits and produces 2 output bits (+ 1 1).

        5 5 5 5 5   bits in stage 0
                    5 × (3;2) counter (3 bits in, + 1 1 out)
    = 1 4 4 4 4 3   bits in stage 1

      1 4 4 4 4 3   bits in stage 1
                    5 × (3;2) counter (3 bits in, + 1 1 out)
    = 1 3 3 3 3 1   bits in stage 2

      1 3 3 3 3 1   bits in stage 2
                    4 × (3;2) counter (3 bits in, + 1 1 out)
    = 2 2 2 2 1 1   bits in stage 3

      2 2 2 2 1 1   bits in stage 3
      2 2 2 2       ripple carry adder (+ 1 1 1 1 1 out)
  = 1 1 1 1 1 1 1   bits in final stage
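The first step of the table (stage 0 to stage 1) can be replayed with a small sketch. It assumes one (3;2) counter on every column that holds at least 3 bits, which matches the five counters used in stage 0; the function name and the LSB-first list convention are hypothetical.

```python
def full_adder_stage(heights):
    """Apply one (3;2) counter to each column with >= 3 bits (heights LSB first)."""
    nxt = [0] * (len(heights) + 1)
    for c, h in enumerate(heights):
        if h >= 3:
            nxt[c] += (h - 3) + 1   # untouched bits plus the sum output
            nxt[c + 1] += 1         # carry output moves one column up
        else:
            nxt[c] += h             # too few bits: pass through unchanged
    while nxt and nxt[-1] == 0:     # drop empty leading columns
        nxt.pop()
    return nxt

stage0 = [5, 5, 5, 5, 5]
stage1 = full_adder_stage(stage0)
print(stage1[::-1])   # MSB first, as in the table
```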

APPLICATION TO FPGAS
Compression using full adders is unsuitable for FPGAs:
- Mapping a full adder onto FPGA LUTs is inefficient and slow (⇒ large routing delays)
- The fast carry chain is not exploited
Conventional solution: ripple-carry adder tree
Delay reduction is possible by using Generalized Parallel Counters (GPCs) [Parandeh-Afshar TRETS’11]

(1,5;3) GPC ON FPGA
Dot transform and realization with full adders (FA).
[Figure: dot transform and full-adder realization of the (1,5;3) GPC]
(1,5;3) GPC mapping [Parandeh-Afshar TRETS’11]:
Efficiency = bits reduced / #LUTs = (1+5−3)/3 = 1.0 [Dinechin FPL’13]
[Figure: mapping onto the LUTs and carry logic of a slice]
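As an illustration of what a (1,5;3) GPC computes, the following behavioural sketch adds one bit of weight 2 and five bits of weight 1 into a 3-bit result and recomputes the efficiency figure from the slide. The function name is hypothetical and the model says nothing about the actual LUT mapping.

```python
def gpc_1_5_3(high_bit, low_bits):
    """(1,5;3) GPC: one input of weight 2, five inputs of weight 1, 3-bit sum (LSB first)."""
    if len(low_bits) != 5:
        raise ValueError("expected five weight-1 input bits")
    total = 2 * high_bit + sum(low_bits)   # at most 2 + 5 = 7, fits in 3 bits
    return [(total >> i) & 1 for i in range(3)]

inputs, outputs, luts = 1 + 5, 3, 3        # values as given on the slide
efficiency = (inputs - outputs) / luts

print(gpc_1_5_3(1, [1, 1, 1, 1, 1]))
print(efficiency)
```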

EFFICIENT GPCS ON FPGAS
- (1,4,1,5;5) GPC [Kumm MBMV’14]: Efficiency = 1.5
- (1,4,0,6;5) GPC [Kumm MBMV’14]: Efficiency = 1.5
- (1,3,2,5;5) GPC (proposed): Efficiency = 1.5
- (6,0,6;5) GPC (proposed): Efficiency = 1.75
[Figures: each GPC as a full adder (FA) / half adder (HA) network and its mapping onto the LUTs and carry logic of a slice]
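The efficiency figures can be cross-checked with a short sketch. The LUT count of 3 for the (1,5;3) GPC is stated in the slides; the count of 4 LUTs for the other GPCs is an assumption back-computed from the stated efficiencies, so the efficiency values below are consistent with the slides by construction. The independent part is the check that the largest possible input sum fits into the stated number of output bits.

```python
# name -> (inputs per column, MSB column first; output bits; LUTs)
GPCS = {
    "(1,5;3)":     ((1, 5), 3, 3),        # 3 LUTs as stated for this GPC
    "(1,4,1,5;5)": ((1, 4, 1, 5), 5, 4),  # 4 LUTs: assumed, inferred from efficiency 1.5
    "(1,4,0,6;5)": ((1, 4, 0, 6), 5, 4),  # assumed
    "(1,3,2,5;5)": ((1, 3, 2, 5), 5, 4),  # assumed
    "(6,0,6;5)":   ((6, 0, 6), 5, 4),     # assumed, inferred from efficiency 1.75
}

def efficiency(inputs, outputs, luts):
    """Bits reduced per LUT."""
    return (sum(inputs) - outputs) / luts

def max_sum(inputs):
    """Largest value the GPC can see: every input bit set, weighted by its column."""
    return sum(n << w for w, n in enumerate(reversed(inputs)))

for name, (inputs, outputs, luts) in GPCS.items():
    fits = max_sum(inputs) < 2 ** outputs
    print(name, efficiency(inputs, outputs, luts), max_sum(inputs), fits)
```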

COMPRESSOR TREE OPTIMIZATION
Problem 1: The presented GPCs have irregular input patterns. How should they be selected to require the fewest LUT resources?
Problem 2: Pipelining is important on FPGAs to obtain a high throughput. How should they be selected to require the fewest LUT/FF resources (fewest pipeline-balancing FFs)?

EXAMPLE FOR PROBLEM 1
Bits per column, most significant column on the left.

      5 5 5 5 5   bits in stage 0
        1 4 1 5   inputs of a (1,4,1,5;5) GPC
    + 1 1 1 1 1   its outputs
      1 4 1 4     inputs of a (1,4,1,5;5) GPC
  + 1 1 1 1 1     its outputs
  = 1 6 2 2 2 1   bits in stage 1

    1 6 2 2 2 1   bits in stage 1
      6           inputs of a (6;3) GPC
+ 1 1 1           its outputs
= 1 2 1 2 2 2 1   bits in stage 2

EXAMPLE FOR PROBLEM 2
Bits per column, most significant column on the left.

        5 5 5 5 5   bits in stage 0
          2 0 4 5   inputs of a (2,0,4,5;5) GPC
      + 1 1 1 1 1   its outputs
        5 0 5       inputs of a (6,0,6;5) GPC
  + 1 1 1 1 1       its outputs
          3   1     4 FF for pipeline balancing
        + 3   1
  = 1 1 2 5 2 2 1   bits in stage 1

    1 1 2 5 2 2 1   bits in stage 1
    1 1 2 5         inputs of a (1,3,2,5;5) GPC
+ 1 1 1 1 1         its outputs
            2 2 1   5 FF for pipeline balancing
          + 2 2 1
= 1 1 1 1 1 2 2 1   bits in stage 2
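The tabular examples for problem 1 and problem 2 can be replayed with a small sketch that applies a list of GPC placements to the column heights and counts the bits that bypass all GPCs. In the pipelined case each bypassing bit needs one balancing flip-flop. The column positions of the GPCs are not preserved in the extracted text; the positions used below are assumptions chosen so that the resulting column heights match the tables. The function name is hypothetical.

```python
def apply_stage(heights, placements):
    """heights: bits per column, LSB first.
    placements: (GPC inputs per column MSB first, output bits, lowest column).
    Returns (next heights LSB first, number of bits that bypass all GPCs)."""
    remaining = list(heights)
    width = max([len(heights)] + [col + n_out for _, n_out, col in placements])
    nxt = [0] * width
    for inputs, n_out, col in placements:
        for off, cap in enumerate(reversed(inputs)):
            c = col + off
            if c < len(remaining):
                remaining[c] -= min(cap, remaining[c])   # unused GPC inputs stay 0
        for off in range(n_out):
            nxt[col + off] += 1                          # one output bit per column
    bypass = sum(remaining)
    for c, r in enumerate(remaining):
        nxt[c] += r                                      # bypassing bits keep their column
    return nxt, bypass

# Problem 1 (combinatorial): bypassing bits are plain wires.
p1_s1, _ = apply_stage([5] * 5, [((1, 4, 1, 5), 5, 0), ((1, 4, 1, 5), 5, 1)])
p1_s2, _ = apply_stage(p1_s1, [((6,), 3, 4)])

# Problem 2 (pipelined): bypassing bits need balancing flip-flops.
p2_s1, ff1 = apply_stage([5] * 5, [((2, 0, 4, 5), 5, 0), ((6, 0, 6), 5, 2)])
p2_s2, ff2 = apply_stage(p2_s1, [((1, 3, 2, 5), 5, 3)])

print(p1_s1[::-1], p1_s2[::-1])
print(p2_s1[::-1], ff1, p2_s2[::-1], ff2)
```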

PROPOSED OPTIMIZATION
- A generic ILP optimizer was used.
- The main idea of the ILP formulation is to count the GPCs for each column [Matsunaga’13] and to "cover" all bits in each stage by GPCs.
- For that, a "pseudo compressor" with one input and one output is introduced (no compression).
- To optimize a combinatorial compressor tree (problem 1), its cost is set to zero (a wire).
- To optimize a pipelined compressor tree (problem 2), its cost is set to the flip-flop cost.

ILP FORMULATION
ILP variables:
- Number of bits in stage $s$ and column $c$: $N_{s,c}$
- Number of GPCs in stage $s$, of type $e$ and column $c$: $k_{s,e,c}$
- Number of inputs and outputs of GPC (type $e$) in column $c$: $M_{e,c}$ and $K_{e,c}$, respectively
- LUT cost of GPC $e$: $c_e$
- Binary variable to select the active stage:
$$D_s = \begin{cases} 1 & \text{if stage } s \text{ is used} \\ 0 & \text{otherwise} \end{cases}$$

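ILP FORMULATION
$$\text{minimize} \quad \sum_{s=0}^{S-1} \sum_{c=0}^{C-1} \sum_{e=0}^{E-1} c_e \, k_{s,e,c}$$
subject to
$$\text{C1:} \quad N_{s-1,c} \le \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} M_{e,c+c'} \, k_{s-1,e,c+c'} \quad \text{for } s = 1 \ldots S-1,\ c = 0 \ldots C-1, \text{ if } D_s = 0$$
$$\text{C2:} \quad N_{s,c} = \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} K_{e,c+c'} \, k_{s-1,e,c+c'} \quad \text{for } s = 1 \ldots S-1,\ c = 0 \ldots C-1$$
$$\text{C3:} \quad N_{s,c} \le \begin{cases} 2 & \text{for two-input VMA} \\ 3 & \text{for ternary VMA} \end{cases} \quad \text{if } D_s = 1$$
$$\text{C4:} \quad \sum_{s=1}^{S-1} D_s = 1$$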
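To make the roles of C1, C2 and the objective more concrete, the following sketch evaluates them for one stage of a given candidate solution. It is a checker, not an ILP solver. Several things are assumptions made for this illustration: the function and dictionary names, the LUT cost of 4 for the (1,4,1,5;5) GPC (inferred from its stated efficiency), and the indexing convention, in which a GPC anchored at column a contributes its offset-o inputs and outputs to column a + o. This convention may differ from the index notation in the formulas.

```python
# GPC library: name -> (inputs per column LSB first, outputs per column LSB first, cost)
GPCS = {
    "(1,4,1,5;5)": ([5, 1, 4, 1], [1, 1, 1, 1, 1], 4),  # cost of 4 LUTs is assumed
    "pseudo":      ([1], [1], 0),                       # one in, one out; cost 0 = wire
}

def check_stage(n_prev, k):
    """n_prev: bits per column of the previous stage (LSB first).
    k: dict mapping (GPC name, anchor column) -> number of instances.
    Returns (all bits covered?, bits per column of the next stage, stage cost)."""
    width = max([len(n_prev)] + [col + len(GPCS[e][1]) for e, col in k])
    capacity = [0] * width   # right-hand side of C1 per column
    n_next = [0] * width     # right-hand side of C2 per column
    cost = 0                 # this stage's share of the objective
    for (e, col), count in k.items():
        m, k_out, c_e = GPCS[e]
        for off, n_in in enumerate(m):
            capacity[col + off] += n_in * count
        for off, n_out in enumerate(k_out):
            n_next[col + off] += n_out * count
        cost += c_e * count
    covered = all(n <= cap for n, cap in zip(n_prev, capacity))
    return covered, n_next, cost

# Stage 0 of the problem 1 example: two GPCs plus four pseudo compressors in column 4.
solution = {("(1,4,1,5;5)", 0): 1, ("(1,4,1,5;5)", 1): 1, ("pseudo", 4): 4}
covered, n_next, cost = check_stage([5, 5, 5, 5, 5], solution)
print(covered, n_next[::-1], cost)
```

Without the pseudo compressors the four leftover bits in column 4 would not be covered, which is the reason the formulation introduces them.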

ILP FORMULATION
C1 and C3 have to be linearized:
$$\text{C1':} \quad N_{s-1,c} \le \sum_{e=0}^{E-1} \sum_{c'=0}^{C_e-1} M_{e,c+c'} \, k_{s-1,e,c+c'} + I \, D_s$$
$$\text{C3':} \quad N_{s,c} \le \begin{cases} 2 + (1 - D_s) \, I & \text{for two-input VMA} \\ 3 + (1 - D_s) \, I & \text{for ternary VMA} \end{cases}$$
$I$ must be a sufficiently large integer.
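The effect of the linearization can be checked by brute force: for every small combination of values, the conditional form and the linearized form with a large constant I should agree. The sketch below does this for a single column, with the double sum of C1 abbreviated to one number called `capacity`. All names and the value of `BIG_I` are assumptions for this illustration.

```python
BIG_I = 1000  # assumed; must exceed every column height that can occur

def c1_conditional(n_prev, capacity, d_s):
    return n_prev <= capacity if d_s == 0 else True

def c1_linear(n_prev, capacity, d_s, big_i=BIG_I):
    return n_prev <= capacity + big_i * d_s

def c3_conditional(n, d_s, vma_inputs=2):
    return n <= vma_inputs if d_s == 1 else True

def c3_linear(n, d_s, vma_inputs=2, big_i=BIG_I):
    return n <= vma_inputs + (1 - d_s) * big_i

def linearization_matches(max_bits=64):
    """Compare both forms for all column heights and capacities up to max_bits."""
    for d_s in (0, 1):
        for n in range(max_bits + 1):
            for vma in (2, 3):   # two-input and ternary VMA
                if c3_conditional(n, d_s, vma) != c3_linear(n, d_s, vma):
                    return False
            for cap in range(max_bits + 1):
                if c1_conditional(n, cap, d_s) != c1_linear(n, cap, d_s):
                    return False
    return True

print(linearization_matches())
```

If I is chosen too small, the linearized constraint wrongly restricts stages that are not selected; for example, `c3_linear(10, 0, big_i=4)` rejects a column of 10 bits although the stage is not the final one.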

RESULTS
[Figure: #LUT over compressed bits (0 to 300) for Heuristic [8] and the proposed ILP; panels: Virtex 4 FPGA and Virtex 6 FPGA]
- The required LUTs could be reduced by 23% (Virtex 4) and 30% (Virtex 6) compared to Dinechin (FPL’13) [8].
- The slice reduction was 12.5% (Virtex 4) and 19.5% (Virtex 6) after synthesis.

EXAMPLE COMPRESSION TREE WITH 16 INPUTS, 16 BIT EACH
[Figure: compressor tree generated by FloPoCo [Dinechin FPL’13] next to the one from the proposed ILP]

CONCLUSION & OUTLOOK
- A novel ILP formulation for the optimization of pipelined compressor trees was presented.
- There is a notable gap between the former state-of-the-art heuristic and our optimal solution.
- Extensions are proposed for minimal stage count or variable column counters like 4:2 compressors.
- Good heuristics are still required for problem sizes >100 bit due to the runtime of the ILP solver.
- So far there is no heuristic considering pipelining.

THANK YOU!

LITERATURE
[Parandeh-Afshar TRETS’11]: H. Parandeh-Afshar, A. Neogy, P. Brisk, and P. Ienne, “Compressor Tree Synthesis on Commercial High-Performance FPGAs,” ACM TRETS, 2011.
[Dinechin HEART’13]: F. de Dinechin, M. Istoan, and G. Sergent, “Fixed-Point Trigonometric Functions on FPGAs,” HEART 2013, Jun. 2013.
[Dinechin FPL’13]: N. Brunie, F. de Dinechin, M. Istoan, G. Sergent, K. Illyes, and B. Popa, “Arithmetic Core Generation Using Bit Heaps,” FPL 2013.
[Matsunaga’13]: T. Matsunaga, S. Kimura, and Y. Matsunaga, “An Exact Approach for GPC-Based Compressor Tree Synthesis,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Dec. 2013.
