Efficient Multi-Ported Memories for FPGAs


  1. Efficient Multi-Ported Memories for FPGAs. Eric LaForest, Greg Steffan. University of Toronto, Computer Engineering Research Group. February 22, 2010.

  2. Parallelism in FPGAs
     - Larger SoCs on FPGAs → parallel systems
     - Parallel systems on FPGAs will need:
       - Queueing
       - Data sharing
       - Communication
       - Synchronization
     - Boils down to:
       - FIFOs
       - Register files
     - We can do all of these with multi-ported memories.

  3. Multi-Ported Memory: existing workarounds are ad hoc, “roll-your-own”, and have limited parallelism.

  4. Conventional Approaches

  5. 2W/2R Multi-Ported Memory: doesn't exist on FPGAs. Altera used to have one (Mercury).

  6. Stratix III Building Blocks
     - Adaptive Logic Modules (flexible, but slow)
       - Registers
       - LUTs
       - Adders
     - Block RAMs (fast, but inflexible)
       - M9K (e.g. 32 x 256)
       - M144K (e.g. 32 x 4096)

  7. 2W/2R Pure-ALM: scales very poorly with memory depth.

  8. 1W/nR Replication: only one write port, multiple read ports.
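
To make the replication idea concrete, here is a minimal behavioural sketch in Python (not HDL). It assumes each replica is a plain 1W/1R memory; the class name `ReplicatedMemory` and its methods are hypothetical names chosen for illustration, not taken from the slides.

```python
class ReplicatedMemory:
    """Behavioural model of a 1W/nR memory built from n identical replicas."""

    def __init__(self, depth, read_ports):
        # One full copy of the memory per read port.
        self.replicas = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, data):
        # The single write port updates every replica, so all copies stay identical.
        for replica in self.replicas:
            replica[addr] = data

    def read(self, port, addr):
        # Each read port reads only from its own replica.
        return self.replicas[port][addr]


mem = ReplicatedMemory(depth=4, read_ports=2)
mem.write(1, 42)
values = [mem.read(0, 1), mem.read(1, 1)]
```

Both read ports see the same value because the write is broadcast. The cost is one memory copy per read port, and the scheme still offers only one write port.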

  9. mW/nR Banking: multiple write ports, fragmented data.

  10. mW/nR “Multipumping”
      - Multiple read/write ports
      - Divides clock speed
      - No fragmentation
      - Read/write ordering
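
The trade-off listed for multipumping can be illustrated with a simplified behavioural sketch. It assumes plain time-multiplexing of a memory with a fixed number of physical ports, and it assumes that all reads in an external cycle are serviced before any write; both are assumptions made for illustration, and this is not the exact structure used in the talk. `MultipumpedMemory` and `external_cycle` are hypothetical names.

```python
class MultipumpedMemory:
    """Simplified model: several external ports share a few physical ports over time."""

    def __init__(self, depth, phys_ports=2):
        self.mem = [0] * depth
        self.phys_ports = phys_ports  # accesses the underlying memory can do per internal cycle

    def external_cycle(self, reads, writes):
        """Service one external cycle.

        reads  -- list of read addresses
        writes -- list of (address, data) pairs
        Returns (read_data, internal_cycles_used).
        """
        # Assumed ordering: reads first, so they observe values from before this cycle's writes.
        read_data = [self.mem[addr] for addr in reads]
        for addr, data in writes:
            self.mem[addr] = data
        ops = len(reads) + len(writes)
        internal_cycles = -(-ops // self.phys_ports)  # ceiling division
        return read_data, internal_cycles


mem = MultipumpedMemory(depth=4, phys_ports=2)
first, cycles_first = mem.external_cycle(reads=[1, 1, 2, 3], writes=[(1, 42), (3, 23)])
second, cycles_second = mem.external_cycle(reads=[1, 3, 0, 0], writes=[])
```

In this model a 2W/4R access pattern needs three internal cycles on two physical ports, so the external clock would run at one third of the internal clock. That is the "divides clock speed" cost, while the single underlying memory is what avoids fragmentation.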

  11. Block RAMs: Simple Dual Port (one write port, one read port)

  12. Block RAMs: True Dual Port (two read/write ports)

  13. “Pure Multipumping”: read as banked memory (multiple reads).

  14. “Pure Multipumping”: write as replicated memory (avoids fragmentation).

  15. Methodology
      - Generate design variations over space
        - Vary # of ports, depth, type of memories
          - 1W/2R to 8W/16R
          - 2 to 256 elements deep
          - Pure-ALM, M9K, MLAB, Multipumped
        - Wrap in testbench for timing and correctness
      - Target Quartus 9.0 to Stratix III
        - No synthesis optimizations for speed or area
        - Standard P&R effort (speed, avg. over 10 runs)
      - Measure area as Total Equivalent Area
        - Expresses area in a single unit (ALMs)

  16. Conventional Multi-Porting Performance

  17. 1W/2R Pure-ALM Area vs. Speed: too big and slow! (Plot annotations: Faster, Smaller, NiosII/f, 290 MHz, 500 ALMs.)

  18. 1W/2R Replicated vs. Pure-ALM

  19. 1W/2R “Pure Multipumping”

  20. LVT-Based Multi-Ported Memories

  21. LVT-Based Memory

  22. LVT-Based Memory: begin with one block RAM.

  23. LVT-Based Memory: replicate for two read ports.

  24. LVT-Based Memory: bank for two write ports.

  25. LVT-Based Memory: select bank to read from.

  26. LVT-Based Memory: add bank lookup table.

  27. LVT-Based Memory

  28. Live Value Table Operation

  29. LVT Operation: 2W/2R, 4-deep

  30. LVT Operation (diagram: write ports W0 and W1 supply write addresses, read ports R0 and R1 supply read addresses, and the Live Value Table has entries 0 to 3).

  31. LVT Operation: Write. W0 writes 42 @ 1 and W1 writes 23 @ 3; the LVT stores 0 at address 1 and 1 at address 3. Records which write port last updated a location.

  32. LVT Operation: Read. R0 reads @ 3 and the LVT returns 1; R1 reads @ 1 and the LVT returns 0. Steers read port to correct memory bank.
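
The write and read steps on the LVT slides can be followed in a small behavioural model. This is a Python sketch rather than HDL, assuming one bank per write port with one replica per read port, as in the construction slides; the class name `LVTMemory` and its methods are hypothetical. Two ports writing the same address in the same cycle is not modelled.

```python
class LVTMemory:
    """Behavioural model of an mW/nR memory steered by a Live Value Table."""

    def __init__(self, depth, write_ports, read_ports):
        # banks[w][r] is the replica of write port w's bank that serves read port r.
        self.banks = [[[0] * depth for _ in range(read_ports)]
                      for _ in range(write_ports)]
        # lvt[addr] holds the index of the write port that last wrote addr.
        self.lvt = [0] * depth

    def write(self, port, addr, data):
        # A write port updates every replica in its own bank...
        for replica in self.banks[port]:
            replica[addr] = data
        # ...and records itself in the LVT as the owner of the live value.
        self.lvt[addr] = port

    def read(self, port, addr):
        # The LVT selects which bank holds the most recent value for addr.
        bank = self.lvt[addr]
        return self.banks[bank][port][addr]


# The 2W/2R, 4-deep example from the slides.
mem = LVTMemory(depth=4, write_ports=2, read_ports=2)
mem.write(0, 1, 42)   # W0 writes 42 @ 1
mem.write(1, 3, 23)   # W1 writes 23 @ 3
r0 = mem.read(0, 3)   # R0 reads @ 3, steered to bank 1
r1 = mem.read(1, 1)   # R1 reads @ 1, steered to bank 0
```

Note that stale data remains in the bank that was not written. The LVT entry is what makes the banked storage behave as a single consistent memory.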

  33. LVT Implementation: LVT remains practical because it is very narrow.

  34. LVT Operation: small Pure-ALM memory controlling larger block RAMs.

  35. Advantages of LVTs
      - LVTs add a layer of indirection
        - Everything operates in parallel
        - Makes banked memory behave as consistent unit
      - LVTs are narrow
        - Word width = log2(# of write ports), typically < 4 bits
        - Pure-ALM, but practical size and speed
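
The narrowness claim is easy to check with a little arithmetic. The sketch below assumes the LVT word width is the ceiling of log2 of the number of write ports, and that the replicate-then-bank construction needs one memory copy per (write port, read port) pair. The function name `lvt_sizing` and the dictionary keys are hypothetical, and the copy count is logical copies, not physical block RAMs.

```python
def lvt_sizing(write_ports, read_ports, depth):
    """Estimate LVT size and number of memory copies for an mW/nR LVT-based memory."""
    # Ceiling of log2(write_ports), computed with integers to avoid float rounding.
    width = (write_ports - 1).bit_length()
    return {
        "lvt_width_bits": width,            # bits per LVT entry
        "lvt_total_bits": width * depth,    # one entry per memory location
        "memory_copies": write_ports * read_ports,  # one bank per write port, replicated per read port
    }


small = lvt_sizing(write_ports=2, read_ports=4, depth=256)
large = lvt_sizing(write_ports=8, read_ports=16, depth=256)
```

Under these assumptions even the largest configuration in the methodology (8W/16R, 256 deep) needs only 3 bits per LVT entry, whereas the number of memory copies grows with the product of the port counts.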

  36. LVT Performance

  37. 2W/4R Pure-ALM

  38. 2W/4R LVT-based vs. Pure-ALM: 84% smaller, 43% faster (plot annotation: 412 MHz to 375 MHz).

  39. 2W/4R Multipumping: must be careful about read/write ordering!

  40. Multipumping Performance

  41. 2W/4R Multipumping

  42. 2W/4R Multipumping: Pure Multipumping (279 MHz)

  43. 4W/8R Multipumping: worsens as # of ports increases.

  44. 2W/4R Multipumping: 54% slower on average, 28% smaller on average (plot annotation: 193 MHz to 174 MHz).

  45. Conclusions
      - LVT-based memories are faster and smaller than Pure-ALM memories.
      - LVT-based memories are faster than pure multipumping, but at a cost in area.
      - Pure multipumped memories are better for memories with few ports or low speed.

  46. Future Work
      - Pure multipumping for LVT-based memories
        - Build banks with 2W/4R pure multipumping blocks
        - Possible further area improvement
      - Relaxing the read/write order for multipumping
        - Allows multiplexing the write ports
        - Leaves designer to watch for WAR violations

  47. Thank You
