
ZSim Tutorial: Memory System
Nathan Beckmann
MICRO 2015, Waikiki, Hawaii, 5 Dec 2015

What we'll talk about: ZSim has a full-featured memory system (originally designed for caches), covering everything between the core and main memory.


Example: “A day in the life of a memory request”
- Bound-phase function simulation
- Some components add weave-phase modeling
[Diagram: a MemReq created by a core's load()/store() travels via access() calls through the L1I/L1D filter caches, prefetchers, and L2 to the coherence directory, NoC, and memory; inside a cache it drives lookup() and rankCands() on the tag array and replacement policy, the coherence directory sends invalidate() back down to the L1s, and latency/contention models supply the weave-phase timing.]

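To make the bound-phase call chain concrete, here is a minimal hypothetical sketch (not zsim code) of a core issuing a load; it uses the MemReq fields described on the next slides, and SketchCore, lineBits, and l1d are invented names:

    // Hypothetical core-side view of the bound phase: a load builds a MemReq and
    // hands it to the L1D via access(); each level forwards misses to its parent,
    // so the returned value is the completion cycle of the whole chain.
    uint64_t SketchCore::load(uint64_t vAddr, uint64_t curCycle) {
        MESIState state = I;               // requester starts without a copy
        MemReq req;
        req.lineAddr = vAddr >> lineBits;  // caches work at line granularity
        req.type     = GETS;               // a load asks for a readable copy
        req.cycle    = curCycle;
        req.state    = &state;             // responder fills in M/E/S/I
        return l1d->access(req);           // completion cycle bubbles back up
    }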

Important ZSim memory classes: MemReq

MemReq
- Represents an in-flight memory request
- Important fields:
  - uint64_t lineAddr – shifted (line) address
  - AccessType type – GETS, GETX, PUTS, PUTX
  - uint64_t cycle – requesting cycle
  - MESIState* state – coherence state (M, E, S, or I)
- Important methods: none
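A minimal sketch of what this record might look like in C++, based only on the fields listed above; the real zsim struct has more members, and the Flag encoding below is assumed purely so that req.is(MemReq::NOEXCL), used by SimpleMemory further down, makes sense:

    #include <cstdint>

    // Assumed enums matching the slide's terminology.
    enum AccessType { GETS, GETX, PUTS, PUTX };  // read, read-exclusive, clean/dirty writeback
    enum MESIState  { M, E, S, I };              // Modified, Exclusive, Shared, Invalid

    struct MemReq {
        uint64_t   lineAddr;  // address >> lineBits (cache-line granularity)
        AccessType type;      // GETS, GETX, PUTS, PUTX
        uint64_t   cycle;     // cycle at which the request is issued
        MESIState* state;     // requester's coherence state, written by the responder

        // Assumed flag support, for illustration only.
        enum Flag { NOEXCL = 1 };
        uint32_t flags = 0;
        bool is(Flag f) const { return (flags & f) != 0; }
    };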

Important ZSim memory classes: MemReq, MemObject

MemObject
- Generic interface for things that handle memory requests
- Important fields: none
- Important methods:
  - uint64_t access(MemReq& req) – performs an access and returns the completion time
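A sketch of this interface as an abstract C++ class; the signatures are simplified, and getName() is included only because SimpleMemory on the next slide provides one:

    #include <cstdint>

    struct MemReq;  // see the sketch above

    // Minimal sketch: a MemObject is anything that can service a memory request.
    // access() performs the request and returns the cycle at which it completes.
    class MemObject {
      public:
        virtual uint64_t access(MemReq& req) = 0;
        virtual const char* getName() = 0;
        virtual ~MemObject() {}
    };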

Implementing a simple model for main memory:

    class SimpleMemory : public MemObject {
        uint64_t latency;
        g_string name;

      public:
        SimpleMemory(uint64_t _latency, g_string _name)
            : latency(_latency), name(_name) {}

        const char* getName() { return name.c_str(); }

        uint64_t access(MemReq& req) {
            switch (req.type) {
                case PUTS:
                case PUTX:  // write
                    *req.state = I;
                    break;
                case GETS:  // set coherence state in the requestor
                    *req.state = req.is(MemReq::NOEXCL) ? S : E;
                    break;
                case GETX:
                    *req.state = M;
                    break;
            }
            return req.cycle + latency;  // completion cycle
        }
    };
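As a hypothetical usage sketch (driver code invented for illustration, not from the tutorial), a caller fills in a MemReq, hands it to the memory object, and reads back the completion cycle and the coherence state it installed:

    // Issue one read to a 100-cycle SimpleMemory.
    MESIState state = I;
    SimpleMemory mem(100 /*latency*/, "mem-0");

    MemReq req;
    req.lineAddr = 0x1234;  // already shifted to line granularity
    req.type     = GETS;
    req.cycle    = 1000;    // issue cycle
    req.state    = &state;

    uint64_t doneCycle = mem.access(req);  // 1100; state becomes E (or S with NOEXCL)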

Important ZSim memory classes (“is a”): MemReq; MemObject, implemented by SimpleMemory, MD1Memory, and DDRMemory

Memory controllers: different models for main memory
- SimpleMemory: fixed latency, no contention
  - Important fields: latency
- MD1Memory: contention modeled using an M/D/1 queue
  - Important fields: megabytesPerSecond (bandwidth), zeroLoadLatency, etc.
- DDRMemory & DRAMSimMemory: detailed modeling of DDR timings
  - Important fields: lots of configuration parameters (CAS, RAS, bus MHz)
  - Timings modeled in the weave phase; requires TimingCore or OOO core models
  - Similar accuracy, but DDRMemory is much faster
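For intuition about the MD1Memory approach, here is a minimal sketch (not zsim's code; every name is invented) of how an M/D/1 queueing model can turn bandwidth utilization into added latency, using the standard M/D/1 mean waiting time:

    #include <cstdint>

    // Hypothetical M/D/1 contention model: given the memory's service rate and the
    // observed request rate, add the mean queueing delay to the zero-load latency.
    struct MD1Sketch {
        double zeroLoadLatency;  // cycles, with no contention
        double serviceRate;      // requests served per cycle (mu)

        double latency(double arrivalRate /* requests per cycle (lambda) */) const {
            double rho = arrivalRate / serviceRate;  // utilization
            if (rho >= 1.0) rho = 0.99;              // clamp: the queue would be unstable
            // M/D/1 mean waiting time: Wq = rho / (2 * mu * (1 - rho))
            double wait = rho / (2.0 * serviceRate * (1.0 - rho));
            return zeroLoadLatency + wait;
        }
    };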

Important ZSim memory classes (“is a”): MemReq, InvReq; MemObject, implemented by SimpleMemory, MD1Memory, DDRMemory, and BaseCache

InvReq
- Represents an invalidation request from the coherence controller/directory
- Important fields:
  - uint64_t lineAddr – shifted (line) address
  - InvType type – INV, INVX, FWD
  - uint64_t cycle – requesting cycle
- Important methods: none
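A matching sketch for the invalidation record, again limited to the fields listed above (the comment on INVX is an assumption about its meaning):

    #include <cstdint>

    enum InvType { INV, INVX, FWD };  // invalidate, downgrade-to-shared (assumed), forward

    struct InvReq {
        uint64_t lineAddr;  // shifted (line) address
        InvType  type;      // INV, INVX, FWD
        uint64_t cycle;     // cycle at which the invalidation is issued
    };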

BaseCache
- Generic interface for cache-like objects
- Important fields: none
- Important methods:
  - void setParents(…) – register the caches above it in the hierarchy
  - void setChildren(…) – register the caches below it in the hierarchy
  - uint64_t invalidate(const InvReq& req) – invalidate a line locally and in children
  - uint64_t access(MemReq& req)
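A sketch of this interface; the setParents/setChildren parameter lists were elided on the slide, so the ones below are illustrative guesses, and g_vector stands for zsim's std::vector-like container:

    // Sketch: a BaseCache is a MemObject that can also be wired into a hierarchy
    // and receive invalidations from above. Parameter lists are illustrative only.
    class BaseCache : public MemObject {
      public:
        virtual void setParents(uint32_t childId, const g_vector<MemObject*>& parents) = 0;
        virtual void setChildren(const g_vector<BaseCache*>& children) = 0;
        virtual uint64_t invalidate(const InvReq& req) = 0;
        // uint64_t access(MemReq& req) is inherited from MemObject
    };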

Important ZSim memory classes (“is a”): MemReq, InvReq; MemObject, implemented by SimpleMemory, MD1Memory, DDRMemory, and BaseCache; Cache (a BaseCache)

Cache
- Inclusive cache
- Contains the tag array, coherence controller, and replacement policy (discussed later); adds the logic that controls these components
- Important fields (that aren't discussed later):
  - uint32_t accLat – access latency
  - uint32_t invLat – invalidation latency
- Important methods:
  - void setParents(…) – register the caches above it in the hierarchy
  - void setChildren(…) – register the caches below it in the hierarchy
  - uint64_t invalidate(const InvReq& req) – invalidate a line locally and in children
  - uint64_t access(MemReq& req)
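A highly simplified, hypothetical sketch of the control flow inside such a cache's access(), ignoring the locking discussed next; array, cc, evictVictim, and the other helper names are invented stand-ins for the tag array, coherence controller, and replacement logic:

    // Hypothetical flow of an inclusive cache access: check the tag array, on a
    // miss pick and evict a victim (invalidating it in children), fetch the line
    // from the parent level, then update coherence state.
    uint64_t Cache::access(MemReq& req) {
        uint64_t cycle = req.cycle + accLat;            // pay the array access latency

        int32_t lineId = array->lookup(req.lineAddr);   // invented tag-lookup helper
        if (lineId == -1) {                             // miss
            lineId = array->pickVictim(req.lineAddr);   // replacement policy
            cycle  = evictVictim(lineId, cycle);        // writeback + invalidate children
            MemReq fetch = req;                         // fetch the line from the parent
            fetch.cycle  = cycle;
            cycle = parent->access(fetch);
        }
        cc->updateState(req, lineId);                   // invented coherence-controller hook
        return cycle;
    }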

How ZSim allows concurrency
[Diagram: two cores, each with a private L1, below a shared L2 and L3; the animation walks memory accesses up and down this hierarchy.]

How ZSim allows concurrency
- A naïve “big lock” implementation won't work

How ZSim allows concurrency
- There is concurrency available! Requests from different cores can be in flight in different parts of the hierarchy at the same time
- But exploiting it requires handling many complex transients!
[Diagram: two MemReqs advance concurrently through the private L1s, the shared L2, and the L3.]

How ZSim allows concurrency
- Locking each cache leads to deadlock on invalidations:
  - L1 is waiting on L2 for its MemReq
  - L2 is waiting on L1 for its InvReq
  - Deadlock!

How ZSim allows concurrency
- Blocks more accesses going up, allows invalidations going down
- Caches have two locks: an access lock and an invalidation lock
- Invalidations are prioritized:
  - Accesses acquire both locks
  - Invalidations need only the invalidation lock

    uint64_t Cache::access(MemReq& req) {
        invLock.acquire();
        accLock.acquire();
        // look up address, etc.
        invLock.release();
        parent->access(req);
        // check if we got an invalidation!
        accLock.release();
        return completionTime;
    }

    uint64_t Cache::invalidate(InvReq& req) {
        invLock.acquire();
        // do invalidation
        children.invalidate(req);
        invLock.release();
        return completionTime;
    }

How ZSim allows concurrency
[Diagram: the same scenario replayed with each cache's invalidation lock and access lock shown; two MemReqs climb the hierarchy while an InvReq descends from L2 to an L1, and the invalidation completes without waiting on the in-flight accesses, so no deadlock occurs.]

Important ZSim memory classes (“is a”): MemReq, InvReq; MemObject, implemented by SimpleMemory, MD1Memory, DDRMemory, and BaseCache; Cache, NUCACache, and StreamPrefetcher (BaseCaches)

NUCACache
- Non-uniform cache access: banks distributed around the chip
- Important fields:
  - BankDir* bankDir – see below
  - g_vector<BaseCache*> banks – the distributed banks
- Important methods: none over BaseCache
- Supports dynamic NUCA policies via the BankDir class:
  - uint32_t preAccess(MemReq& req) – gives the destination bank
  - int32_t getPrevBank(MemReq& req, uint32_t curBank) – gets the old bank (if the line moved)
- Wide-ranging support: first-touch, R-NUCA [Hardavellas ISCA’09], [Awasthi HPCA’09], idealized private D-NUCA [Herrero ISCA’10], Jigsaw [Beckmann PACT’13, Beckmann HPCA’15]; some yet to be released

NUCACache::access pseudo-code:

    uint64_t NUCACache::access(MemReq& req) {
        uint32_t bank = bankDir->preAccess(req);
        int32_t prevBank = bankDir->getPrevBank(req, bank);
        if (prevBank != -1 && bank != prevBank) {
            // move the line from prevBank to bank
        }
        uint64_t completionCycle = banks[bank]->access(req);
        return completionCycle;
    }

Implementing your own D-NUCA: an idealized “last-touch” bank dir that migrates lines to wherever they are referenced

    uint32_t LastTouchBankDir::preAccess(MemReq& req) {
        // send the access to the bank closest to the requesting core
        uint32_t closestBank = nuca->getSortedRTTs(req.childId)[0].second;
        return closestBank;
    }

    int32_t LastTouchBankDir::getPrevBank(MemReq& req, uint32_t currentBank) {
        ScopedMutex sm(mutex);  // avoid races
        auto it = lineMap.find(req.lineAddr);
        if (it == lineMap.end() || it->second == currentBank) {
            lineMap[req.lineAddr] = currentBank;  // record this line's (new) home bank
            return -1;                            // nothing to migrate
        } else {
            uint32_t prevBank = it->second;
            it->second = currentBank;             // migrate: the line now lives here
            return (int32_t)prevBank;
        }
    }

StreamPrefetcher
- Implements a stream prefetcher
- Important fields:
  - Entry array[16] – the streams it is following
- Important methods: none over BaseCache
- The prefetcher issues its own MemReqs to its parents
- Validated against Westmere
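As a rough illustration of the idea (this is not zsim's StreamPrefetcher logic, and every name below is invented), each tracked entry could remember the last line touched in a page plus a confidence counter, and ask for lines ahead of the demand stream once a forward stream is detected:

    #include <cstdint>

    // Hypothetical stream-detection entry: tracks accesses within one page and,
    // after two consecutive next-line accesses, requests a couple of lines ahead.
    struct StreamEntry {
        uint64_t pageAddr   = 0;  // lineAddr >> (pageBits - lineBits)
        uint64_t lastLine   = 0;  // last line accessed in this page
        uint32_t confidence = 0;  // consecutive +1-line accesses observed

        // Returns how many lines ahead to prefetch for this access (0 = none).
        uint32_t observe(uint64_t lineAddr) {
            confidence = (lineAddr == lastLine + 1) ? confidence + 1 : 0;
            lastLine = lineAddr;
            return (confidence >= 2) ? 2 : 0;  // prefetch degree 2 once confident
        }
    };

A real prefetcher would then build GETS MemReqs for those next lines and send them to its parents via access(), as the slide notes.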

Important ZSim memory classes (recap, “is a”): MemReq, InvReq; MemObject, implemented by SimpleMemory, MD1Memory, DDRMemory, and BaseCache; Cache, NUCACache, and StreamPrefetcher (BaseCaches)
