Tiny Directory
Making Coherence Tracking Feather-light
Mainak Chaudhuri
Indian Institute of Technology Kanpur
(Joint work with Sudhanshu Shukla, IITK)
Forty-year anniversary
• Forty years of directory-based coherence
• L. M. Censier and P. Feautrier. A New Solution to Coherence Problems in Multicache Systems. In IEEE Transactions on Computers, C-27(12):1112-1118, December 1978.
– “A new solution is presented and discussed here: the presence flag solution.”
Lucien M. Censier, Paul Feautrier; CII-Honeywell Bull, University Pierre et Marie Curie
Sketch
• Talk in one slide
• Result highlights
• Introduction
• Tiny Directory
– In-LLC coherence tracking
– Tiny Directory design
– Spilling into LLC space
• Simulation infrastructure
• Simulation results
• Summary and extensions
Talk in One Slide
[Figure: cores C0 to C3 with private cache(s) holding blocks B1, B2, B3; an interconnection network; shared LLC banks, each with a sparse directory slice; the sparse directory height is labeled.]
• Sparse directory height is an important determinant of performance
• We show how to design very small sparse directories while delivering high performance
• Track privately owned blocks by borrowing bits from the LLC data way
• Critical shared blocks with large-scale read sharing are tracked in a tiny directory
• Entries evicted from the tiny directory can be spilled into LLC space at a controlled rate
Result highlights
• 128-core chip-multiprocessor running scientific computing, general-purpose, and commercial multi-threaded workloads
– Our Tiny Directory proposal using sparse directories with (1/32)x to (1/256)x entries performs within 1% of a 2x sparse directory
  • Tiny Directory capacity ranges from 187KB to 23.75KB
– Our Tiny Directory proposal exercising (1/256)x entries saves 16% energy in the LLC and the sparse directory compared to the 2x baseline
– Our proposal outperforms the state-of-the-art multi-grain directory by large margins
• A significant leap forward in saving on-die SRAM investment for coherence tracking
Introduction
• Sparse directory is a set-associative tagged structure attached to each last-level cache (LLC) bank
– Each sparse directory entry tracks the location(s) of an LLC block in the private cache hierarchy attached to each core
– Sparse directory implementation needs to be space-efficient as the number of cores in the chip-multiprocessor increases
– The number of sparse directory entries imposes an upper bound on the number of distinct blocks tracked at any point in time
  • This parameter plays an important role in determining the overall performance and the total space investment for coherence tracking
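The structure described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the design evaluated in the talk: the class name `SparseDirectorySlice`, the LRU replacement policy, the modulo set index, and the full bit-vector sharer encoding are all hypothetical choices made for illustration. It shows why the number of entries bounds the number of distinct blocks tracked: once a set is full, tracking a new block evicts an existing entry.

```python
from collections import OrderedDict


class SparseDirectorySlice:
    """Toy set-associative sparse directory slice (illustrative assumptions only)."""

    def __init__(self, num_sets, num_ways, num_cores):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.num_cores = num_cores
        # One OrderedDict per set: block address -> sharer bit vector (int).
        # Dictionary order doubles as LRU order (assumed replacement policy).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def record_sharer(self, block_addr, core):
        """Mark `core` as holding `block_addr`.

        Returns (victim_addr, victim_sharer_vector) if a directory entry
        had to be evicted to make room, otherwise None.
        """
        if not 0 <= core < self.num_cores:
            raise ValueError("core id out of range")
        entries = self.sets[block_addr % self.num_sets]  # assumed index function
        victim = None
        if block_addr in entries:
            entries.move_to_end(block_addr)  # refresh LRU position
        else:
            if len(entries) == self.num_ways:
                # Set is full: evict the least recently used entry.
                victim = entries.popitem(last=False)
            entries[block_addr] = 0
        entries[block_addr] |= 1 << core
        return victim

    def sharers(self, block_addr):
        """Return the list of cores recorded as holding `block_addr`."""
        vector = self.sets[block_addr % self.num_sets].get(block_addr, 0)
        return [c for c in range(self.num_cores) if (vector >> c) & 1]

    def tracked_blocks(self):
        """Number of distinct blocks currently tracked by this slice."""
        return sum(len(s) for s in self.sets)
```

In a real protocol, an evicted entry would typically force the tracked private copies to be invalidated, which is one reason a directory with too few entries can hurt performance.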
Sparse directory height
• Sparse directory height is an important determinant of performance
– Number of sparse directory entries is expressed as a fraction of the number of blocks in the last-level private cache (L2 cache in our case)
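The sizing convention above can be made concrete with a small calculation. This is a sketch under assumptions: the helper name `sparse_directory_entries` is hypothetical, the block count is assumed to be aggregated over all cores, and the 128 KB L2 size and 64-byte block size in the test values are illustrative and not taken from the talk. Only the 128-core count and the 2x, (1/32)x, and (1/256)x factors come from the slides.

```python
from fractions import Fraction


def sparse_directory_entries(num_cores, l2_bytes_per_core, block_bytes, size_factor):
    """Total sparse directory entries for a given size factor.

    Assumes the factor is applied to the aggregate number of L2 blocks
    across all cores (an assumption made for this sketch).
    """
    l2_blocks = num_cores * (l2_bytes_per_core // block_bytes)
    entries = l2_blocks * Fraction(size_factor)
    if entries.denominator != 1:
        raise ValueError("size factor does not yield a whole number of entries")
    return int(entries)


if __name__ == "__main__":
    # Assumed cache parameters: 128 KB L2 per core, 64-byte blocks.
    for factor in (2, Fraction(1, 32), Fraction(1, 256)):
        total = sparse_directory_entries(128, 128 * 1024, 64, factor)
        print(f"{factor}x -> {total} entries")
```

Under these assumed parameters, moving from a 2x directory to a (1/256)x directory reduces the entry count by a factor of 512.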