Liberty Queues for EPIC Architectures


  1. Liberty Queues for EPIC Architectures Thomas Jablin, Yun Zhang, James A. Jablin, Jialu Huang, Hanjun Kim, & David I. August The Liberty Research Group Princeton University

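  2. Comparison: IMT, PMT, CMT, with lat(comm) = 1

     | Cycle | IMT Core 1 | IMT Core 2 | PMT Core 1 | PMT Core 2 | CMT Core 1 | CMT Core 2 |
     |-------|------------|------------|------------|------------|------------|------------|
     | 0     | C:1        | C:2        | LD:1       |            | LD:1       |            |
     | 1     | X:1        | X:2        | LD:2       | X:1        | X:1        | LD:2       |
     | 2     | C:3        | C:4        | LD:3       | X:2        | LD:3       | X:2        |
     | 3     | X:3        | X:4        | LD:4       | X:3        | X:3        | LD:4       |
     | 4     | C:5        | C:6        | LD:5       | X:4        | LD:5       | X:4        |
     | 5     | X:5        | X:6        | LD:6       | X:5        | X:5        | LD:6       |

     Throughput: IMT 1 iter/cycle, PMT 1 iter/cycle, CMT 1 iter/cycle.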

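  3. Comparison: IMT, PMT, CMT, with lat(comm) = 2

     | Cycle | IMT Core 1 | IMT Core 2 | PMT Core 1 | PMT Core 2 | CMT Core 1 | CMT Core 2 |
     |-------|------------|------------|------------|------------|------------|------------|
     | 0     | C:1        | C:2        | LD:1       |            | LD:1       |            |
     | 1     | X:1        | X:2        | LD:2       |            | X:1        |            |
     | 2     | C:3        | C:4        | LD:3       | X:1        |            | LD:2       |
     | 3     | X:3        | X:4        | LD:4       | X:2        |            | X:2        |
     | 4     | C:5        | C:6        | LD:5       | X:3        | LD:3       |            |
     | 5     | X:5        | X:6        | LD:6       | X:4        | X:3        |            |

     Throughput: IMT 1 iter/cycle, PMT 1 iter/cycle, CMT 0.5 iter/cycle.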
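
The PMT and CMT schedules on the two comparison slides can be reproduced with a toy timing model. This is only a sketch under assumptions that are not stated in the slides: every operation takes one cycle, `LD` of one iteration depends on `LD` of the previous iteration, `X` depends on `LD` of the same iteration, a value is visible on the same core one cycle after it is produced, and on the other core `lat` cycles after. The function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Start cycle of X for each iteration under CMT:
// iteration i runs entirely on core i % 2.
std::vector<long> cmt_x_starts(int iters, int lat) {
    std::vector<long> xs;
    long core_free[2] = {0, 0};
    long prev_ld = 0;
    for (int i = 0; i < iters; ++i) {
        int core = i % 2;
        // LD waits for the previous LD, which ran on the other core.
        long ld = (i == 0) ? 0 : std::max(core_free[core], prev_ld + lat);
        long x = ld + 1;           // X follows LD on the same core
        core_free[core] = x + 1;
        prev_ld = ld;
        xs.push_back(x);
    }
    return xs;
}

// Start cycle of X for each iteration under PMT:
// every LD runs on core 1, every X on core 2.
std::vector<long> pmt_x_starts(int iters, int lat) {
    std::vector<long> xs;
    long x_free = 0;
    for (int i = 0; i < iters; ++i) {
        long ld = i;                          // LDs issue back to back
        long x = std::max(x_free, ld + lat);  // X waits for the forwarded value
        x_free = x + 1;
        xs.push_back(x);
    }
    return xs;
}

// Iterations per cycle, measured between the first and last X.
double throughput(const std::vector<long>& xs) {
    return double(xs.size() - 1) / double(xs.back() - xs.front());
}
```

In this model the communication latency sits on the loop-carried path only under CMT, so `throughput(cmt_x_starts(n, 2))` drops to 0.5 while `throughput(pmt_x_starts(n, 2))` stays at 1; PMT only pays the latency once, as pipeline fill.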

  4. Bandwidth: 40MB/s, 430MB/s, 2Gb/s. Technology: Lamport '83, DBLS '07, FastForward '08, MCRB '09, Liberty '10. Applications: DSWP, Transactional Memory, StreamIt, Line-Rate Network Traffic Monitoring, SRMT, Multithreaded Assertions.

  5. Bandwidth: 40 MB/s. Lamport Queues. Cache-line states: Tail is Shared on CPU 0 and Shared on CPU 1; Head is Exclusive on CPU 0 and Invalid on CPU 1. Queue contents: 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0

  6. Bandwidth: 40 MB/s. Lamport Queues. Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Shared on CPU 0 and Shared on CPU 1. Queue contents: 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0. Every producer-consumer pair produces a cache ping-pong!

  7. Bandwidth: 40 MB/s. Lamport Queues. Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Shared on CPU 0 and Shared on CPU 1. Queue contents: 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0. Every producer-consumer pair produces a cache ping-pong!
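
To make the ping-pong concrete, here is a minimal sketch of a Lamport-style single-producer, single-consumer ring buffer. It is not code from the talk: the class name, the use of `std::atomic`, and the memory orderings are assumptions chosen for illustration. Every `produce` reads the shared head and every `consume` reads the shared tail, so each index line keeps moving between the two cores' caches.

```cpp
#include <atomic>
#include <cstddef>

// Lamport-style SPSC ring buffer (illustrative sketch).
// Holds at most N - 1 elements; one slot stays empty to tell full from empty.
template <typename T, std::size_t N>
class LamportQueue {
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // next slot to read; written by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write; written by the producer

public:
    // Called only by the producer thread.
    bool produce(const T& value) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        // Reads the consumer's index on every call: a coherence miss
        // whenever the consumer has advanced head_ since the last read.
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[t] = value;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Called only by the consumer thread.
    bool consume(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        // Reads the producer's index on every call, for the same reason.
        if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```

A producer thread would loop on `produce` and a consumer thread on `consume`, retrying whenever either returns `false`.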

  8. Bandwidth: 430 MB/s. MCRB and DBLS. Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Exclusive on CPU 0 and Invalid on CPU 1; Cached Head is Exclusive on CPU 1; Cached Tail is Exclusive on CPU 0. Queue contents: 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0

  9. Bandwidth: 430 MB/s. MCRB and DBLS. Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Exclusive on CPU 0 and Invalid on CPU 1; Cached Head is Exclusive on CPU 1; Cached Tail is Exclusive on CPU 0. Queue contents: 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0

  10. Bandwidth: 430 MB/s. MCRB and DBLS. Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Exclusive on CPU 0 and Invalid on CPU 1; Cached Head is Exclusive on CPU 1; Cached Tail is Exclusive on CPU 0. Queue contents: 0 0 0 0 0 1 2 3 4 5 6 7 0 0 0 0

  11. Bandwidth: 430 MB/s. MCRB and DBLS. Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Shared on CPU 0 and Shared on CPU 1; Cached Head is Exclusive on CPU 1; Cached Tail is Exclusive on CPU 0. Queue contents: 0 0 0 0 0 1 2 3 4 5 6 7 0 0 0 0

  12. Bandwidth: 430 MB/s. Caching eliminates ping-ponging! Cache-line states: Tail is Invalid on CPU 0 and Exclusive on CPU 1; Head is Shared on CPU 0 and Shared on CPU 1; Cached Head is Exclusive on CPU 1; Cached Tail is Exclusive on CPU 0. Queue contents: 0 0 0 0 0 1 2 3 4 5 6 7 0 0 0 0
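
The cached head and cached tail on these slides can be sketched as private copies of the other side's index that are refreshed only when the queue looks full or empty. The code below is an illustration in that spirit, not the MCRB or DBLS implementation: the class name, the 64-byte alignment (an assumed cache-line size), and the memory orderings are all assumptions.

```cpp
#include <atomic>
#include <cstddef>

// SPSC ring buffer with privately cached indices (illustrative sketch).
// Holds at most N - 1 elements.
template <typename T, std::size_t N>
class CachedIndexQueue {
    // Each index sits on its own (assumed 64-byte) cache line.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the consumer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the producer
    alignas(64) std::size_t cached_head_ = 0;       // producer-private copy of head_
    alignas(64) std::size_t cached_tail_ = 0;       // consumer-private copy of tail_
    T buf_[N];

public:
    // Called only by the producer thread.
    bool produce(const T& value) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == cached_head_) {
            // Looks full: only now touch the consumer's line.
            cached_head_ = head_.load(std::memory_order_acquire);
            if (next == cached_head_) return false;  // really full
        }
        buf_[t] = value;
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Called only by the consumer thread.
    bool consume(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == cached_tail_) {
            // Looks empty: only now touch the producer's line.
            cached_tail_ = tail_.load(std::memory_order_acquire);
            if (h == cached_tail_) return false;  // really empty
        }
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```

When the producer runs well ahead of the consumer, most calls are satisfied from the private copies, so the shared index lines change hands far less often than in the plain Lamport queue.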

  13. Bandwidth: 500 MB/s Liberty Queues Tail Head 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0 1 2 3 4 5 6 7

  14. Bandwidth: 500 MB/s Liberty Queues Tail Head 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0 1 2 3 4 5 6 7

  15. Bandwidth: 500 MB/s Liberty Queues Tail Head 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0 5 2 3 4 5 6 7

  16. Bandwidth: 490 MB/s Liberty Queues: Prefetching doesn’t help. Tail Head 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0 5 0 0 0 5 6 7

  17. Bandwidth: 690 MB/s Liberty Queues Tail Head 0 0 0 0 0 1 2 3 4 5 6 7 8 0 0 0 5 2 3 4

  18. Bandwidth: 2170 MB/s Liberty Queues Tail Head 0 0 0 0 0 1 2 3 4 5 6 7 8 0 0 0 5 6 7 8

  19. Liberty Queue Bandwidth (chart: bandwidth in GB/s on an axis from 0 to 4.5; series: 64-bit and 128-bit)
