An Efficient Softcore Multiplier Architecture for Xilinx FPGAs
22nd IEEE Symposium on Computer Arithmetic
Martin Kumm, Shahid Abbas and Peter Zipf
University of Kassel, Germany

CONTENTS
1. State-of-the-art
2. Proposed multiplier
3. Results

WHY FPGA SOFTCORE MULTIPLIERS?
- The need for efficient multipliers forced FPGA vendors to embed hard multiplier blocks
- FPGA softcore multipliers are still required:
  - Small word sizes (which map poorly onto embedded multipliers)
  - Large word sizes ("fill gaps")
  - Replacing embedded multipliers on small/low-cost FPGAs

WHY ARE THEY DIFFERENT?
- Research on efficient multipliers has been ongoing for more than 50 years
- Multipliers that are efficient in terms of gates may not be efficient on FPGAs
- FPGA-optimized structures are relatively rare

WHY ARE THEY DIFFERENT?
[Figure: Xilinx slice, 6/7 series]

PREVIOUS WORK
- A Baugh-Wooley-like multiplier was proposed in [Parandeh-Afshar 2011]
- Two partial products are generated and added using the carry chain
- A compression tree is still necessary to sum the already-reduced partial products (PPs)
[Figure: four LUTs feeding the slice carry logic, with a "full adder" annotation]

PREVIOUS WORK
- Another idea was discussed in [Brunie 2013]:
  - Decompose the multiplication into small multipliers that fit into single LUTs, e.g., 3x3, 2x3, 1x4
  - Use a compression tree to add the partial results
\[
\begin{aligned}
p ={} & M_1 + 2^{3} M_2 + 2^{6} M_3 + \dots \\
      & \dots + 2^{3} M_4 + 2^{6} M_5 + 2^{9} M_6 + \dots \\
      & \dots + 2^{6} M_7 + 2^{9} M_8 + 2^{12} M_9
\end{aligned}
\]
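The decomposition above can be illustrated with a small software model. This is only a sketch under assumptions, not the generator described in [Brunie 2013]: the function name `tiled_multiply` and its parameters are hypothetical, operands are taken to be unsigned, and a plain integer sum stands in for the compression tree.

```python
def tiled_multiply(a: int, b: int, n_bits: int, tile_a: int = 3, tile_b: int = 3) -> int:
    """Multiply two unsigned n_bits-wide integers by summing small tile products."""
    if not (0 <= a < (1 << n_bits) and 0 <= b < (1 << n_bits)):
        raise ValueError("operands must be unsigned and fit in n_bits")
    mask_a = (1 << tile_a) - 1
    mask_b = (1 << tile_b) - 1
    total = 0
    for j in range(0, n_bits, tile_b):          # one row of tiles per chunk of b
        b_chunk = (b >> j) & mask_b
        for i in range(0, n_bits, tile_a):      # one tile per chunk of a
            a_chunk = (a >> i) & mask_a
            small_product = a_chunk * b_chunk   # models one small multiplier M_k
            total += small_product << (i + j)   # weight 2^(i+j), as in the sum for p
    return total
```

With the default 3x3 tiles and 9-bit operands, the weights are 2^0, 2^3, 2^6 in the first row, 2^3, 2^6, 2^9 in the second and 2^6, 2^9, 2^12 in the third, matching the terms M_1 to M_9 above.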

BOOTH RECODING
\[
a \cdot b = \sum_{\substack{m = 0 \\ m\ \text{even}}}^{M} a \cdot \mathrm{BE}_m \, 2^{m}
\]
b_{m+1}  b_m  b_{m-1}   BE_m   z_m  c_m  s_m
   0      0      0        0     1    0    0
   0      0      1        1     0    0    0
   0      1      0        1     0    0    0
   0      1      1        2     0    0    1
   1      0      0       -2     0    1    1
   1      0      1       -1     0    1    0
   1      1      0       -1     0    1    0
   1      1      1        0     1    0    0
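The recoding table can be reproduced in a few lines of software. The sketch below is an illustration under assumptions rather than the authors' VHDL: the names `booth_digits` and `booth_multiply` are hypothetical, b is treated as an unsigned M-bit value with all bits outside 0..M-1 read as zero, and z, c and s are derived from BE_m as "zero", "negative" and "magnitude two", which is how the table rows read.

```python
def booth_digits(b: int, m_bits: int) -> list[tuple[int, int, int, int]]:
    """Return (BE_m, z_m, c_m, s_m) for m = 0, 2, ... up to m_bits."""
    if not 0 <= b < (1 << m_bits):
        raise ValueError("b must be unsigned and fit in m_bits")

    def bit(k: int) -> int:
        # Bits below position 0 are zero; bits above m_bits - 1 are zero as well
        return ((b >> k) & 1) if k >= 0 else 0

    digits = []
    for m in range(0, m_bits + 1, 2):
        be = -2 * bit(m + 1) + bit(m) + bit(m - 1)   # BE_m from the recoding rule
        z = int(be == 0)         # digit is zero
        c = int(be < 0)          # digit is negative
        s = int(abs(be) == 2)    # digit has magnitude two
        digits.append((be, z, c, s))
    return digits


def booth_multiply(a: int, b: int, m_bits: int) -> int:
    """Evaluate a*b as the sum of a * BE_m * 2^m over even m."""
    positions = range(0, m_bits + 1, 2)
    return sum(a * be * (1 << m) for m, (be, _, _, _) in zip(positions, booth_digits(b, m_bits)))
```

For example, `booth_digits(0b0110, 4)` gives the digits -2, 2 and 0 for m = 0, 2, 4, and -2 + 2·4 + 0 = 6 recovers b.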

BOOTH MULTIPLIER
[Figure: partial-product array from LSB to MSB, with the c_m bits (c_0, c_2, c_4, c_6) repeated along each row]

BOOTH MULTIPLIER
[Figure: partial-product array from LSB to MSB, with constant ones and the c_m bits (c_0, c_2, c_4, c_6)]

PROPOSED ARCHITECTURE
[Figure: slice mapping of the proposed multiplier, with four LUTs feeding the carry logic and a "full adder" annotation]

PROPOSED ARCHITECTURE
[Figure: proposed multiplier architecture]

RESULTS
- The number of slices can be precisely predicted:
\[
\#\mathrm{slices}(M, N) = \underbrace{\lceil N/4 + 1 \rceil}_{\text{slices per row}} \cdot \underbrace{\lfloor M/2 + 1 \rfloor}_{\text{no. of rows}}
\]
- The design was implemented as generic VHDL
- A pipelined multiplier can be obtained by using the (otherwise unused) slice FFs without much additional cost
- Reference circuits (Parandeh-Afshar and LUT-based) were designed with the FloPoCo library [de Dinechin 2012]
- Xilinx Coregen was used as a commercial reference
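The slice-count formula is easy to evaluate with integer arithmetic. The helper below is a minimal sketch: the name `predicted_slices` is hypothetical, and M and N are assumed to be positive integer word sizes.

```python
def predicted_slices(m: int, n: int) -> int:
    """Evaluate #slices(M, N) = ceil(N/4 + 1) * floor(M/2 + 1)."""
    if m < 1 or n < 1:
        raise ValueError("word sizes must be positive")
    slices_per_row = (n + 3) // 4 + 1   # ceil(N/4 + 1) without floating point
    rows = m // 2 + 1                   # floor(M/2 + 1)
    return slices_per_row * rows
```

For a 16x16 multiplier this gives 5 slices per row and 9 rows, so `predicted_slices(16, 16)` returns 45.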

RESULTS VIRTEX 6 COMBINATORIAL, SLICES
[Plot: #slices (0 to 2,000) vs. input word size N (8 to 64); series: 1x4 LUT multiplier, 3x2 LUT multiplier, 3x3 LUT multiplier, Parandeh-Afshar multiplier, Coregen (area), Coregen (speed), proposed]

RESULTS VIRTEX 6 COMBINATORIAL, SLICE RED.
[Plot: slice reduction in % (0 to 80) vs. input word size N (8 to 64); series: 1x4 LUT multiplier, 3x2 LUT multiplier, 3x3 LUT multiplier, Parandeh-Afshar multiplier, Coregen (area), Coregen (speed)]

RESULTS VIRTEX 6 COMBINATORIAL, FREQ.
[Plot: frequency in MHz (0 to 700) vs. input word size N (8 to 64); series: 1x4 LUT multiplier, 3x2 LUT multiplier, 3x3 LUT multiplier, Parandeh-Afshar multiplier, Coregen (area), Coregen (speed), proposed]

RESULTS VIRTEX 6 PIPELINED, SLICES
[Plot: #slices (0 to 2,000) vs. input word size N (8 to 64); series: 1x4 LUT multiplier, 3x2 LUT multiplier, 3x3 LUT multiplier, Parandeh-Afshar multiplier, Coregen (area), Coregen (speed), proposed]

RESULTS VIRTEX 6 PIPELINED, SLICE RED.
[Plot: slice reduction in % (-10 to 80) vs. input word size N (8 to 60); series: 1x4 LUT multiplier, 3x2 LUT multiplier, 3x3 LUT multiplier, Parandeh-Afshar multiplier, Coregen (area), Coregen (speed)]

RESULTS VIRTEX 6 PIPELINED, FREQ.
[Plot: frequency in MHz (0 to 700) vs. input word size N (8 to 64); series: 1x4 LUT multiplier, 3x2 LUT multiplier, 3x3 LUT multiplier, Parandeh-Afshar multiplier, Coregen (area), Coregen (speed), proposed]

UNFORTUNATELY NOT POSSIBLE ON ALTERA FPGAS
[Figure: Altera ALM]

MAYBE POSSIBLE NEXT?

CONCLUSION
- Compared to the best known design, up to 50% of the slices can be saved for the combinatorial multiplier and 30% for the pipelined multiplier
- Portable to FPGAs providing a 5-input LUT at one full adder input
- "Free addition" supports multiply-accumulate (MAC) operation

THANK YOU!

LITERATURE
[Parandeh-Afshar 2011]: Parandeh-Afshar & Ienne, Measuring and Reducing the Performance Gap between Embedded and Soft Multipliers on FPGAs, FPL 2011
[Brunie 2013]: Brunie, de Dinechin, Istoan, Sergent, Illyes & Popa, Arithmetic Core Generation Using Bit Heaps, FPL 2013
[de Dinechin 2012]: de Dinechin & Pasca, Designing Custom Arithmetic Data Paths with FloPoCo, IEEE Design & Test of Computers, 2012

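BOOTH RECODING
\[
\begin{aligned}
b &= b_{M-1} 2^{M-1} + \dots + b_2 2^{2} + b_1 2^{1} + b_0 \\
  &= b_{M-1} 2^{M-1} + \dots + b_2 2^{2} + 2 b_1 2^{1} + \underbrace{-\, b_1 2^{1} + b_0}_{\mathrm{BE}_0 \,=\, -2 b_1 + b_0} \\
  &= b_{M-1} 2^{M-1} + \dots + 2 b_3 2^{3} + \underbrace{-\, b_3 2^{3} + b_2 2^{2} + 2 b_1 2^{1}}_{\mathrm{BE}_2 2^{2} \,=\, (-2 b_3 + b_2 + b_1) 2^{2}} + \mathrm{BE}_0 \\
  &= \sum_{\substack{m = 0 \\ m\ \text{even}}}^{M} \mathrm{BE}_m \, 2^{m}
  \qquad \text{with } \mathrm{BE}_m = -2 b_{m+1} + b_m + b_{m-1}
\end{aligned}
\]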
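As a quick numeric check of the recoding rule BE_m = -2 b_{m+1} + b_m + b_{m-1}, here is a worked example. The value b = 6 with M = 4 is chosen purely for illustration, and bits outside positions 0 to M-1 (here b_{-1}, b_4, b_5) are assumed to be zero.

\[
\begin{aligned}
b &= 0110_2 = 6, \qquad b_3 = 0,\; b_2 = 1,\; b_1 = 1,\; b_0 = 0 \\
\mathrm{BE}_0 &= -2 b_1 + b_0 + b_{-1} = -2 + 0 + 0 = -2 \\
\mathrm{BE}_2 &= -2 b_3 + b_2 + b_1 = 0 + 1 + 1 = 2 \\
\mathrm{BE}_4 &= -2 b_5 + b_4 + b_3 = 0 + 0 + 0 = 0 \\
\sum_{m \in \{0, 2, 4\}} \mathrm{BE}_m \, 2^{m} &= -2 \cdot 2^{0} + 2 \cdot 2^{2} + 0 \cdot 2^{4} = -2 + 8 + 0 = 6
\end{aligned}
\]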

WHY ARE THEY DIFFERENT?
[Figure: Altera ALM]

WHY ARE THEY DIFFERENT?
[Figure: slice storage elements (FF/LAT) with D, CE, CK, SR and Q pins and INIT0/INIT1, SRHI/SRLO options]
