Liberty Queues for EPIC Architectures
Thomas Jablin, Yun Zhang, James A. Jablin, Jialu Huang, Hanjun Kim, and David I. August
The Liberty Research Group, Princeton University
Comparison: IMT, PMT, CMT

         IMT             PMT             CMT
Cycle    Core 1  Core 2  Core 1  Core 2  Core 1  Core 2
0        C:1     C:2     LD:1    -       LD:1    -
1        X:1     X:2     LD:2    X:1     X:1     LD:2
2        C:3     C:4     LD:3    X:2     LD:3    X:2
3        X:3     X:4     LD:4    X:3     X:3     LD:4
4        C:5     C:6     LD:5    X:4     LD:5    X:4
5        X:5     X:6     LD:6    X:5     X:5     LD:6

lat(comm) = 1:  1 iter/cycle (IMT), 1 iter/cycle (PMT), 1 iter/cycle (CMT)
Comparison: IMT, PMT, CMT

         IMT             PMT             CMT
Cycle    Core 1  Core 2  Core 1  Core 2  Core 1  Core 2
0        C:1     C:2     LD:1    -       LD:1    -
1        X:1     X:2     LD:2    -       X:1     -
2        C:3     C:4     LD:3    X:1     -       LD:2
3        X:3     X:4     LD:4    X:2     -       X:2
4        C:5     C:6     LD:5    X:3     LD:3    -
5        X:5     X:6     LD:6    X:4     X:3     -

lat(comm) = 2:  1 iter/cycle (IMT), 1 iter/cycle (PMT), 0.5 iter/cycle (CMT)
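The PMT and CMT schedules on these two slides can be reproduced with a small timing model. The sketch below is illustrative only and rests on assumptions not stated in the slides: every LD and X stage takes one cycle, a value produced by a stage that starts at cycle t is usable on the other core at cycle t + lat, and the only loop-carried dependence is LD to LD. The function names are hypothetical. IMT is left out because its iterations never communicate across cores. Under these assumptions the second schedule corresponds to a communication latency of two cycles.

```python
def pmt_schedule(n_iters, lat):
    """Pipelined multithreading: core 1 runs every LD, core 2 runs every X."""
    ld = list(range(n_iters))            # LD stages issue back to back on core 1
    x = []
    for i in range(n_iters):
        ready = ld[i] + lat              # LD result crosses to core 2
        core_free = x[-1] + 1 if x else 0
        x.append(max(ready, core_free))
    return ld, x


def cmt_schedule(n_iters, lat):
    """Cyclic multithreading: iteration i runs entirely on core i % 2."""
    ld, x = [], []
    for i in range(n_iters):
        # Assumed loop-carried dependence: LD of iteration i needs LD of i - 1,
        # which ran on the other core.
        dep = ld[i - 1] + lat if i > 0 else 0
        # The same core must have finished the X stage of iteration i - 2.
        core_free = x[i - 2] + 1 if i >= 2 else 0
        ld.append(max(dep, core_free))
        x.append(ld[i] + 1)
    return ld, x


def throughput(x_starts):
    """Steady-state iterations per cycle, measured between first and last X."""
    return (len(x_starts) - 1) / (x_starts[-1] - x_starts[0])


for lat in (1, 2):
    pmt = throughput(pmt_schedule(6, lat)[1])
    cmt = throughput(cmt_schedule(6, lat)[1])
    print(lat, pmt, cmt)   # prints "1 1.0 1.0" then "2 1.0 0.5"
```

In this model the communication latency sits off the critical path for PMT, where it only delays pipeline fill, but on the critical path for CMT, where every iteration waits for the previous one.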
Bandwidth: 40 MB/s, 430 MB/s, 2 GB/s
Technology: Lamport '83, DBLS '07, FastForward '08, MCRB '09, Liberty '10
Applications: DSWP, Transactional Memory, StreamIt, Line-Rate Network Traffic Monitoring, SRMT, Multithreaded Assertions
Lamport Queues (bandwidth: 40 MB/s)
  Buffer: 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0
  Frame 1:
    Tail: CPU 0: Shared, CPU 1: Shared
    Head: CPU 0: Exclusive, CPU 1: Invalid
  Frame 2:
    Tail: CPU 0: Invalid, CPU 1: Exclusive
    Head: CPU 0: Shared, CPU 1: Shared
  Every produce-consume pair produces a cache ping-pong!
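The ping-pong on this slide can be reproduced with a toy model. The sketch below is an illustration under stated assumptions, not code from the talk: `Line` is a hypothetical stand-in for one cache line under a simplified invalidation protocol, the producer is assumed to run on CPU 0 and advance `head`, and the consumer is assumed to run on CPU 1 and advance `tail`. Traffic on the data buffer itself is ignored.

```python
class Line:
    """One cache line shared by two CPUs (toy invalidation protocol)."""

    def __init__(self, value=0):
        self.value = value
        self.holders = set()   # CPUs that currently hold a valid copy
        self.transfers = 0     # coherence events: read misses and upgrades

    def read(self, cpu):
        if cpu not in self.holders:       # Invalid -> Shared
            self.transfers += 1
            self.holders.add(cpu)
        return self.value

    def write(self, cpu, value):
        if self.holders != {cpu}:         # not Exclusive: invalidate the peer
            self.transfers += 1
            self.holders = {cpu}
        self.value = value


class LamportQueue:
    """Single-producer, single-consumer ring buffer with shared indices."""

    def __init__(self, size):
        self.buf = [0] * size
        self.size = size
        self.head = Line()   # assumed: written by the producer (CPU 0)
        self.tail = Line()   # assumed: written by the consumer (CPU 1)

    def produce(self, item, cpu=0):
        head = self.head.read(cpu)
        nxt = (head + 1) % self.size
        if nxt == self.tail.read(cpu):    # full: one slot is kept empty
            return False
        self.buf[head] = item
        self.head.write(cpu, nxt)
        return True

    def consume(self, cpu=1):
        tail = self.tail.read(cpu)
        if tail == self.head.read(cpu):   # empty
            return None
        item = self.buf[tail]
        self.tail.write(cpu, (tail + 1) % self.size)
        return item


demo = LamportQueue(16)
for i in range(100):
    demo.produce(i)
    demo.consume()
print(demo.head.transfers + demo.tail.transfers)   # 401 in this toy model
```

In this model a strictly alternating workload settles at four coherence events per produce-consume pair, two on each index, because each side reads the index the other side has just written.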
MCRB and DBLS (bandwidth: 430 MB/s)
  Cached Head: CPU 1: Exclusive
  Cached Tail: CPU 0: Exclusive
  Frame 1:
    Tail: CPU 0: Invalid, CPU 1: Exclusive
    Head: CPU 0: Exclusive, CPU 1: Invalid
    Buffer: 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0
  Frame 2 (same cache-line states):
    Buffer: 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0
  Frame 3 (same cache-line states):
    Buffer: 0 0 0 0 0 1 2 3 4 5 6 7 0 0 0 0
  Frame 4:
    Tail: CPU 0: Invalid, CPU 1: Exclusive
    Head: CPU 0: Shared, CPU 1: Shared
    Buffer: 0 0 0 0 0 1 2 3 4 5 6 7 0 0 0 0
  Caching eliminates ping-ponging!
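The cached-index idea on these slides can be sketched in the same style. This is a minimal illustration and not the MCRB or DBLS code: the refresh policy (re-read the shared index only when the private copy makes the queue look full or empty) is an assumption, as are the names `Line`, `CachedIndexQueue` and `run_bursts`, and the choice of CPU 0 as producer and CPU 1 as consumer.

```python
class Line:
    """One cache line shared by two CPUs (toy invalidation protocol)."""

    def __init__(self, value=0):
        self.value = value
        self.holders = set()
        self.transfers = 0     # coherence events: read misses and upgrades

    def read(self, cpu):
        if cpu not in self.holders:
            self.transfers += 1
            self.holders.add(cpu)
        return self.value

    def write(self, cpu, value):
        if self.holders != {cpu}:
            self.transfers += 1
            self.holders = {cpu}
        self.value = value


class CachedIndexQueue:
    """SPSC ring buffer where each side keeps a private copy of the other's index."""

    def __init__(self, size):
        self.buf = [0] * size
        self.size = size
        self.head = Line()     # shared, written by the producer (CPU 0)
        self.tail = Line()     # shared, written by the consumer (CPU 1)
        self.cached_tail = 0   # producer-private, possibly stale
        self.cached_head = 0   # consumer-private, possibly stale

    def produce(self, item, cpu=0):
        head = self.head.read(cpu)
        nxt = (head + 1) % self.size
        if nxt == self.cached_tail:                  # looks full: refresh
            self.cached_tail = self.tail.read(cpu)
            if nxt == self.cached_tail:
                return False
        self.buf[head] = item
        self.head.write(cpu, nxt)
        return True

    def consume(self, cpu=1):
        tail = self.tail.read(cpu)
        if tail == self.cached_head:                 # looks empty: refresh
            self.cached_head = self.head.read(cpu)
            if tail == self.cached_head:
                return None
        item = self.buf[tail]
        self.tail.write(cpu, (tail + 1) % self.size)
        return item


def run_bursts(q, bursts, burst_len):
    """Producer fills burst_len items, then the consumer drains them."""
    out, n = [], 0
    for _ in range(bursts):
        for _ in range(burst_len):
            q.produce(n)
            n += 1
        for _ in range(burst_len):
            out.append(q.consume())
    return out


demo = CachedIndexQueue(16)
items = run_bursts(demo, bursts=10, burst_len=8)
print(len(items), demo.head.transfers + demo.tail.transfers)
```

A stale cached index can only make the queue look fuller to the producer or emptier to the consumer than it really is, so the check stays safe. With bursty traffic the shared indices change hands a few times per burst rather than on every item; with strictly alternating single-item traffic this model shows no benefit, since the consumer's cached head is stale on every call.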
Liberty Queues (Tail/Head diagram, one line per frame)
  Frame 1 (bandwidth: 500 MB/s): 0 0 0 0 0 1 2 3 4 0 0 0 0 0 0 0 1 2 3 4 5 6 7
  Frame 2 (bandwidth: 500 MB/s): 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0 1 2 3 4 5 6 7
  Frame 3 (bandwidth: 500 MB/s): 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0 5 2 3 4 5 6 7
  Frame 4 (bandwidth: 490 MB/s; prefetching doesn't help): 0 0 0 0 0 1 2 3 4 5 0 0 0 0 0 0 5 0 0 0 5 6 7
  Frame 5 (bandwidth: 690 MB/s): 0 0 0 0 0 1 2 3 4 5 6 7 8 0 0 0 5 2 3 4
  Frame 6 (bandwidth: 2170 MB/s): 0 0 0 0 0 1 2 3 4 5 6 7 8 0 0 0 5 6 7 8
Liberty Queue Bandwidth (chart; y-axis: GB/s, 0 to 4.5; series: 64-bit and 128-bit)