Efficient Multi-Ported Memories for FPGAs
Eric LaForest, Greg Steffan
University of Toronto
Computer Engineering Research Group
February 22, 2010

Parallelism in FPGAs 2
- Larger SoCs on FPGAs → parallel systems
- Parallel systems on FPGAs will need:
  - Queueing
  - Data sharing
  - Communication
  - Synchronization
- Boils down to:
  - FIFOs
  - Register files
We can do all these with multi-ported memories.

Multi-Ported Memory 3
Existing workarounds are ad-hoc, “roll-your-own”, and have limited parallelism.

Conventional Approaches 4

2W/2R Multi-Ported Memory 5
- Doesn't exist on FPGAs
- Altera used to have one (Mercury)

Stratix III Building Blocks 6
- Adaptive Logic Modules: flexible, but slow
  - Registers
  - LUTs
  - Adders
- Block RAMs: fast, but inflexible
  - M9K (e.g. 32 x 256)
  - M144K (e.g. 32 x 4096)

2W/2R Pure-ALM 7
Scales very poorly with memory depth.

1W/nR Replication 8
- Only one write port
- Multiple read ports
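
As a rough illustration of replication, the following Python sketch models a 1W/nR memory behaviorally. The class name `ReplicatedMemory` and its methods are hypothetical and not taken from the slides; each inner list stands in for one block RAM, and the model ignores timing entirely.

```python
class ReplicatedMemory:
    """Behavioral sketch of a 1W/nR memory built from n identical replicas."""

    def __init__(self, depth, read_ports):
        # Assumption: one replica (standing in for one block RAM) per read port.
        self.replicas = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, value):
        # The single write port updates every replica so they stay identical.
        for replica in self.replicas:
            replica[addr] = value

    def read(self, port, addr):
        # Each read port reads only from its own replica.
        return self.replicas[port][addr]


mem = ReplicatedMemory(depth=4, read_ports=2)
mem.write(1, 42)
values = [mem.read(port, 1) for port in range(2)]  # both read ports see 42
```

Adding read ports only adds replicas; there is still a single write path, which matches the "only one write port" limitation on the slide.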

mW/nR Banking 9
- Multiple write ports
- Fragmented data
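
One way to read "fragmented data" is sketched below in Python: each write port owns a private bank, so a value is visible only in the bank of the port that wrote it. The class `BankedMemory` is a hypothetical behavioral model, not the design shown on the slide.

```python
class BankedMemory:
    """Behavioral sketch of banking: one private bank per write port."""

    def __init__(self, depth, write_ports):
        # None marks a location that this bank has never been asked to store.
        self.banks = [[None] * depth for _ in range(write_ports)]

    def write(self, port, addr, value):
        # A write port can only update its own bank.
        self.banks[port][addr] = value

    def read(self, bank, addr):
        # The reader must already know which bank holds the data it wants.
        return self.banks[bank][addr]


mem = BankedMemory(depth=4, write_ports=2)
mem.write(0, 1, 42)            # write port 0 stores 42 at address 1
mem.write(1, 3, 23)            # write port 1 stores 23 at address 3
found = mem.read(0, 1)         # 42: the right bank was chosen
missing = mem.read(1, 1)       # None: address 1 was never written in bank 1
```

The memory no longer behaves as one consistent address space, since the result of a read depends on which bank is consulted.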

mW/nR “Multipumping” 10
- Multiple read/write ports
- Divides clock speed
- No fragmentation
- Read/write ordering
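
The Python sketch below illustrates the two costs named on this slide under stated assumptions. It assumes that, within one external cycle, every read is serviced before any write, and that the external clock is the internal clock divided by a pump factor. The names `MultipumpedMemory`, `external_cycle`, `external_clock_mhz`, and `pump_factor` are hypothetical, and the 400 MHz figure is an arbitrary illustration, not a measurement from the talk.

```python
class MultipumpedMemory:
    """Behavioral sketch of an mW/nR memory time-multiplexed onto one RAM."""

    def __init__(self, depth):
        self.ram = [0] * depth

    def external_cycle(self, reads, writes):
        """reads: list of addresses; writes: list of (address, value) pairs."""
        # Assumed ordering: all reads happen first, so each read returns the
        # value stored before this external cycle began.
        results = [self.ram[addr] for addr in reads]
        # Writes are then applied one after another, in port order.
        for addr, value in writes:
            self.ram[addr] = value
        return results


def external_clock_mhz(internal_clock_mhz, pump_factor):
    # The ports share one memory over pump_factor internal cycles,
    # so the externally visible clock is divided by that factor.
    return internal_clock_mhz / pump_factor


mem = MultipumpedMemory(depth=4)
first = mem.external_cycle(reads=[1, 3], writes=[(1, 42), (3, 23)])
second = mem.external_cycle(reads=[1, 3], writes=[])
slow_clock = external_clock_mhz(400.0, pump_factor=2)
```

Here `first` holds the old contents and `second` holds the newly written values. A different internal ordering would change what `first` returns, which is why the ordering has to be fixed and documented.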

Block RAMs: Simple Dual Port 11
[Diagram: one port labeled "Write", one port labeled "Read"]

Block RAMs: True Dual Port 12
[Diagram: two ports, each labeled "R / W"]

“Pure Multipumping” 13
Read as banked memory (multiple reads).

“Pure Multipumping” 14
Write as replicated memory (avoids fragmentation).

Methodology 15
- Generate design variations over space
  - Vary # of ports, depth, type of memories
    - 1W/2R to 8W/16R
    - 2 to 256 elements deep
    - Pure-ALM, M9K, MLAB, Multipumped
  - Wrap in testbench for timing and correctness
- Target Quartus 9.0 to Stratix III
  - No synthesis optimizations for speed or area
  - Standard P&R effort (speed, avg. over 10 runs)
- Measure area as Total Equivalent Area
  - Expresses area in a single unit (ALMs)

Conventional Multi-Porting Performance 16

1W/2R Pure-ALM Area vs. Speed 17
[Plot: area vs. speed, with directions labeled "Faster" and "Smaller". Labeled reference point: NiosII/f, 290 MHz, 500 ALMs. Annotation: "Too big and slow!"]

1W/2R Replicated vs. Pure-ALM 18

1W/2R “Pure Multipumping” 19

LVT-Based Multi-Ported Memories 20

LVT-Based Memory 21

LVT-Based Memory 22
Begin with one block RAM.

LVT-Based Memory 23
Replicate for two read ports.

LVT-Based Memory 24
Bank for two write ports.

LVT-Based Memory 25
Select bank to read from.

LVT-Based Memory 26
Add bank lookup table.

LVT-Based Memory 27

Live Value Table Operation 28

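LVT Operation 29
2W/2R, 4-deep

LVT Operation 30
[Diagram: write ports W0 and W1 supply write addresses, and read ports R0 and R1 supply read addresses, to a Live Value Table with entries 0 to 3.]

LVT Operation: Write 31
[Diagram: W0 writes 42 @ 1, so LVT entry 1 holds 0; W1 writes 23 @ 3, so LVT entry 3 holds 1.]
Records which write port last updated a location.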

LVT Operation: Read 32
[Diagram: R0 reads @ 3 and the LVT returns 1; R1 reads @ 1 and the LVT returns 0.]
Steers read port to correct memory bank.
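
The write and read steps above can be sketched as a small behavioral model in Python. This is a minimal sketch, not the authors' implementation: the class `LVTMemory` and its layout (one replica per write-port and read-port pair) are assumptions based on the "replicate for read ports, bank for write ports" build-up, and the model ignores clocking and same-address write conflicts.

```python
class LVTMemory:
    """Behavioral sketch of an mW/nR memory steered by a Live Value Table."""

    def __init__(self, depth, write_ports, read_ports):
        # banks[w][r] stands in for the block RAM that is written by
        # write port w and read by read port r (assumed layout).
        self.banks = [[[0] * depth for _ in range(read_ports)]
                      for _ in range(write_ports)]
        # The LVT stores, per address, the index of the last write port used.
        self.lvt = [0] * depth

    def write(self, port, addr, value):
        # Update every replica in this write port's bank...
        for replica in self.banks[port]:
            replica[addr] = value
        # ...and record which write port last updated the location.
        self.lvt[addr] = port

    def read(self, port, addr):
        # The LVT steers the read port to the bank holding the live value.
        bank = self.lvt[addr]
        return self.banks[bank][port][addr]


# Same scenario as the slides: 2W/2R, 4 deep.
mem = LVTMemory(depth=4, write_ports=2, read_ports=2)
mem.write(0, 1, 42)   # W0 writes 42 @ 1
mem.write(1, 3, 23)   # W1 writes 23 @ 3
r0 = mem.read(0, 3)   # R0 reads @ 3, steered to bank 1
r1 = mem.read(1, 1)   # R1 reads @ 1, steered to bank 0
```

After these operations `r0` is 23 and `r1` is 42, and the LVT entries for addresses 1 and 3 are 0 and 1 respectively, matching the values in the diagrams.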

LVT Implementation 33
LVT remains practical because it is very narrow.

LVT Operation 34
Small Pure-ALM memory controlling larger block RAMs.

Advantages of LVTs 35
- LVTs add a layer of indirection
  - Everything operates in parallel
  - Makes banked memory behave as consistent unit
- LVTs are narrow
  - Word width = log2(# of write ports), typically < 4 bits
  - Pure-ALM, but practical size and speed
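
To make the "narrow" claim concrete, the Python sketch below works through the arithmetic. The function names are hypothetical. The width is rounded up to a whole number of bits, and the block RAM count assumes one replica per write-port and read-port pair, which is an inference from the build-up slides rather than a figure stated in the talk.

```python
def lvt_width_bits(write_ports):
    # Bits needed to name one of write_ports ports: log2, rounded up.
    # Integer arithmetic avoids floating-point rounding issues.
    return (write_ports - 1).bit_length()


def lvt_total_bits(depth, write_ports):
    # One LVT entry per memory location.
    return depth * lvt_width_bits(write_ports)


def block_ram_replicas(write_ports, read_ports):
    # Assumption: one bank per write port, replicated once per read port.
    return write_ports * read_ports


# 2W/4R memory, 256 elements deep (a point inside the studied range).
width_2w = lvt_width_bits(2)            # 1 bit per entry
total_2w = lvt_total_bits(256, 2)       # 256 bits of Pure-ALM storage
replicas_2w4r = block_ram_replicas(2, 4)

# Largest configuration in the methodology: 8W/16R.
width_8w = lvt_width_bits(8)            # 3 bits, under the 4-bit bound
```

Even at 8 write ports the table is only 3 bits wide, so the Pure-ALM part of the design stays small regardless of the data word width.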

LVT Performance 36

2W/4R Pure-ALM 37

2W/4R LVT-based vs. Pure-ALM 38
[Plot annotations: 84% smaller, 43% faster, 412 MHz to 375 MHz]

2W/4R Multipumping 39
Must be careful about read/write ordering!

Multipumping Performance 40

2W/4R Multipumping 41

2W/4R Multipumping 42
[Plot annotation: Pure Multipumping (279 MHz)]

4W/8R Multipumping 43
Worsens as # of ports increases.

2W/4R Multipumping 44
[Plot annotations: 54% slower on average, 28% smaller on average, 193 MHz to 174 MHz]

Conclusions 45
- LVT-based memories are faster and smaller than Pure-ALM memories.
- LVT-based memories are faster than pure multipumping, but at a cost in area.
- Pure multipumped memories are better for memories with few ports or low speed.

Future Work 46
- Pure multipumping for LVT-based memories
  - Build banks with 2W/4R pure multipumping blocks
  - Possible further area improvement
- Relaxing the read/write order for multipumping
  - Allows multiplexing the write ports
  - Leaves designer to watch for WAR violations

Thank You 47
