Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures

Jeff Brown, Rakesh Kumar, Dean Tullsen. UC San Diego; University of Illinois at Urbana-Champaign. SPAA19, June 9, 2007.


  1. Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures ● Jeff Brown, Rakesh Kumar, Dean Tullsen ● UC San Diego ● University of Illinois at Urbana-Champaign ● SPAA19 ● June 9, 2007

  2. Introduction ● The chip multiprocessor (CMP) era is upon us! ● Caching complicates writes ● Cache coherence ensures caching is done safely ● Multi-core designs offer new tradeoffs

  10. Background: Directory-based Cache Coherence ● Directory-based: explicit per-block accounting – Doesn't rely on broadcasts ● Directory operation: client/server – Processors request data, permissions – Directory controllers manage memory access ● Updates, conflicts [Diagram: processors (P), directory (Dir), memory (M)]
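
The client/server operation on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' protocol: the three-state model, the names `State`, `DirectoryEntry`, and `Directory`, and the message tuples are all hypothetical. It only shows how explicit per-block accounting lets a controller contact exactly the affected caches instead of broadcasting.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    UNCACHED = "uncached"    # no cache holds the block
    SHARED = "shared"        # one or more read-only copies exist
    EXCLUSIVE = "exclusive"  # exactly one writable copy exists


@dataclass
class DirectoryEntry:
    """Per-block record: coherence state plus the set of caching nodes."""
    state: State = State.UNCACHED
    sharers: set = field(default_factory=set)


class Directory:
    """Hypothetical directory controller for the blocks homed at one node."""

    def __init__(self):
        self.entries = {}  # block address -> DirectoryEntry

    def _entry(self, block):
        return self.entries.setdefault(block, DirectoryEntry())

    def read(self, block, requester):
        """Grant read permission; return the messages the controller sends."""
        e = self._entry(block)
        msgs = []
        if e.state is State.EXCLUSIVE:
            # Conflict: the single owner must give up write permission first.
            (owner,) = e.sharers
            if owner != requester:
                msgs.append(("downgrade", owner))
        e.state = State.SHARED
        e.sharers.add(requester)
        msgs.append(("data", requester))
        return msgs

    def write(self, block, requester):
        """Grant write permission; return the messages the controller sends."""
        e = self._entry(block)
        # Update: invalidate every other recorded copy, point to point.
        msgs = [("invalidate", n) for n in sorted(e.sharers - {requester})]
        e.state = State.EXCLUSIVE
        e.sharers = {requester}
        msgs.append(("data_exclusive", requester))
        return msgs
```

For example, after nodes 1 and 2 read a block, a write by node 3 produces two invalidations (to nodes 1 and 2) followed by an exclusive data grant to node 3; no other node is contacted.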

  16. Background: Historical MP Cache Coherence ● Distributed directory, memory [Diagram sequence, four processors (P) each with memory (M); labels: Cache Miss, "Home Node", Data Request, Reply]

  25. Motivation: Multi-core Cache Coherence ● Multi-core designs present radically different relative latency & bandwidth [Diagram sequence, four processors (P) with memories (M); labels: Cache Miss, Data Request, "Home Node", Reply, Additional Sharer]

  26. Outline ● Introduction & Background ● System Architecture ● Proximity-Aware Coherence ● Results ● Conclusion

  32. Directory-based Cache Coherence ● Directory structures – Directory Memory – Directory Entries – Directory Controller [Diagram: Controller, Directory Memory, Main Memory]

  35. A Traditional Multiprocessor (Chassis, board, etc.) [Diagram: multiple nodes, each with Core, L2 $, Dir, and Mem, joined by an Interconnect]

  36. Our 16-Core Chip Multiprocessor [Diagram: Tile 0, Tile 1, ..., Tile 15; labels within a tile: Core, L2 $, Dir control, Dir $, Bus, Mem. channel, Net. switch]

  50. Proximity-Aware Coherence ● Idea: home node asks sharer nearest requester to forward its cached copy – Stay on-chip when possible – Minimize transit of large data-carrying replies [Diagram sequence, four processors (P) with memories (M); labels: Cache Miss, Data Request, "Home Node", Additional Sharer, Forward Request, Reply]

  51. Proximity-Aware Coherence ● To service read misses for shared data, traditional protocols use main memory ● Other nodes may hold copies ● On the CMP landscape, inter-node latency is much less than memory latency
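
The message flow behind this idea can be sketched as follows. This is an illustrative reading of the slides, not the authors' implementation: the function name `service_read_miss`, the message tuples, and the caller-supplied `distance` function are hypothetical, and the case where the home node itself caches the block is an assumption about how such a protocol would behave.

```python
def service_read_miss(requester, home, sharers, distance):
    """Return the (kind, source, destination) messages for one read miss.

    `sharers` is the set of nodes the directory records as holding the block.
    `distance(a, b)` is an assumed hop-count function supplied by the caller.
    """
    msgs = [("data_request", requester, home)]
    candidates = [s for s in sharers if s != requester]

    if not candidates:
        # No other on-chip copy: the home node goes to main memory.
        msgs.append(("memory_read", home, home))
        msgs.append(("reply", home, requester))
        return msgs

    if home in candidates:
        # Assumption: a home node that caches the block replies directly.
        msgs.append(("reply", home, requester))
        return msgs

    # Proximity-aware step: ask the sharer nearest the requester to forward
    # its copy, so the large data-carrying reply travels a short path.
    nearest = min(candidates, key=lambda s: (distance(s, requester), s))
    msgs.append(("forward_request", home, nearest))
    msgs.append(("reply", nearest, requester))
    return msgs
```

With nodes numbered along a line and `distance = lambda a, b: abs(a - b)`, a miss at node 0 for a block homed at node 3 and shared by nodes 1 and 2 yields a data request to node 3, a forward request from node 3 to node 1, and a reply from node 1 to node 0. Main memory is never touched.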

  57. Sharer Selection ● When the home node lacks a cached copy, it selects a sharer to ask – rand – near1 – via1 ● Retries didn't prove beneficial [Diagram labels: Miss, Home]
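
The slides name three selection policies without defining them. The sketch below is one plausible reading, with every definition labeled as an assumption: `rand` picks a sharer at random, `near1` picks the sharer with the fewest hops to the requester, and `via1` picks the sharer that minimizes the total home-to-sharer-to-requester path. The 4x4 mesh with row-major tile IDs and the `hops` helper are also assumptions.

```python
import random

MESH_WIDTH = 4  # assumption: 16 tiles in a 4x4 mesh, IDs assigned row-major


def hops(a, b):
    """Manhattan distance between tiles a and b on the assumed mesh."""
    ax, ay = a % MESH_WIDTH, a // MESH_WIDTH
    bx, by = b % MESH_WIDTH, b // MESH_WIDTH
    return abs(ax - bx) + abs(ay - by)


def select_rand(sharers, requester, home, rng=random):
    """Assumed reading of 'rand': any sharer, chosen uniformly at random."""
    return rng.choice(sorted(sharers))


def select_near1(sharers, requester, home):
    """Assumed reading of 'near1': the sharer closest to the requester."""
    return min(sorted(sharers), key=lambda s: hops(s, requester))


def select_via1(sharers, requester, home):
    """Assumed reading of 'via1': shortest home -> sharer -> requester path."""
    return min(sorted(sharers), key=lambda s: hops(home, s) + hops(s, requester))
```

Under these assumptions the policies can disagree. With the requester at tile 4, the home at tile 7, and sharers at tiles 0 and 6, `select_near1` returns tile 0 (one hop from the requester), while `select_via1` returns tile 6 (three hops in total, against five via tile 0).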

  59. Methodology ● Detailed, execution-driven processor and network simulation ● "RSIM" simulator, adapted to our CMP model ● Parallel workloads from several suites ● Hardware, benchmark details in paper

  60. Proximity-Aware: Potential Coverage [Chart: fraction of read misses to shared lines (y-axis, 0 to 1) for benchmarks appbt, fft, lu, mp3d, ocean, quicksort, unstruct; series labeled 1 through 6]
