
Memory Access Scheduler: Matthew Cohen, Alvin Lin, 6.884 Complex Digital Systems



  1. Memory Access Scheduler
     Matthew Cohen, Alvin Lin
     6.884 – Complex Digital Systems
     May 6th, 2005

  2. Why Use Scheduling?
     - Sequential accesses to DRAM are wasteful
     - Improve latency and bandwidth of memory requests
     - Order requests to take advantage of DRAM characteristics

  3. DRAM Bank FSM
     - States: Idle, Row Active
     - Activate Row: Idle to Row Active
     - Reads, Writes: stay in Row Active
     - Bank Precharge: Row Active to Idle
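
The state diagram on slide 3 can be read as a two-state machine. The following is a minimal Python sketch of that reading, not the authors' RTL; the class and method names (`Bank`, `activate`, `access`, `precharge`) are hypothetical, and all DRAM timing is ignored.

```python
from enum import Enum


class BankState(Enum):
    IDLE = "idle"
    ROW_ACTIVE = "row_active"


class Bank:
    """Two-state model of one DRAM bank (names are illustrative only)."""

    def __init__(self):
        self.state = BankState.IDLE
        self.open_row = None

    def activate(self, row):
        # Activate Row: Idle -> Row Active
        if self.state is not BankState.IDLE:
            raise RuntimeError("bank must be precharged before activating a row")
        self.state = BankState.ROW_ACTIVE
        self.open_row = row

    def access(self, row):
        # Reads, Writes: legal only against the open row; state is unchanged
        if self.state is not BankState.ROW_ACTIVE or self.open_row != row:
            raise RuntimeError("requested row is not open")

    def precharge(self):
        # Bank Precharge: Row Active -> Idle
        if self.state is not BankState.ROW_ACTIVE:
            raise RuntimeError("no open row to precharge")
        self.state = BankState.IDLE
        self.open_row = None
```

Reading a different row in the same bank therefore costs a precharge plus a new activate, which is the overhead the scheduler tries to hide.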

  4. Memory Access Scheduling
     Traditional Scheduling (per-bank timeline, left to right):
       Bank 0: Active, R, Precharge, Idle
       Bank 1: Idle, Active, R, Precharge, Idle
       Bank 2: Idle, Active, R, Precharge, Idle
       Bank 3: Idle, Active
     Memory Access Scheduling (per-bank timeline, left to right):
       Bank 0: Active, R, Idle, Precharge, Idle
       Bank 1: Active, R, Idle, Precharge, Idle
       Bank 2: Active, R, Idle, Precharge, Idle
       Bank 3: Active, Idle, R, Idle, Precharge, Idle
     - Avoid data line conflicts (read/write)
     - Avoid control line conflicts
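
To make the two timelines on slide 4 concrete, here is a rough cycle-count sketch for one read per bank. Everything numeric is an assumption: the timing constants are made up for illustration, the control lines are assumed to carry one command per cycle, and the interleaved schedule simply issues all activates, then all reads, then all precharges.

```python
T_ACTIVATE = 3   # assumed cycles from Activate until the row can be read
T_READ = 2       # assumed cycles a read occupies the shared data lines
T_PRECHARGE = 3  # assumed cycles to precharge a bank


def serialized_cycles(num_banks):
    """Traditional: each bank finishes activate/read/precharge before the next starts."""
    return num_banks * (T_ACTIVATE + T_READ + T_PRECHARGE)


def interleaved_cycles(num_banks):
    """Overlap bank operations while sharing one control bus and one data bus."""
    ctrl_free = 0  # first cycle at which the control lines are free
    data_free = 0  # first cycle at which the data lines are free

    # Activates go out back to back, one command per cycle.
    row_ready = []
    for _ in range(num_banks):
        row_ready.append(ctrl_free + T_ACTIVATE)
        ctrl_free += 1

    # Each read waits for its row, the control lines, and the data lines.
    read_done = []
    for ready in row_ready:
        start = max(ready, ctrl_free, data_free)
        ctrl_free = start + 1
        data_free = start + T_READ
        read_done.append(data_free)

    # Each precharge waits for its own read to finish and for the control lines.
    finish = 0
    for done in read_done:
        start = max(done, ctrl_free)
        ctrl_free = start + 1
        finish = max(finish, start + T_PRECHARGE)
    return finish


if __name__ == "__main__":
    print(serialized_cycles(4), interleaved_cycles(4))
```

With a single bank the two functions agree, since there is nothing to overlap; the gap grows as more banks are kept busy at once.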

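  5. High-Level Architecture
     Block diagram: the CPU exchanges instructions with the Inst. Cache Controller and data with the Data Cache Controller; both cache controllers connect to the Memory Scheduler, which connects to the DRAM.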

  6. Instruction and Data Cache
     - Separate I- and D-caches
     - Fully parameterizable sizes
     - Direct mapped caches
     - Write-through, no-write-allocate
     - Four words per cache line
     Cache line layout: V | Tag | Word 0 | Word 1 | Word 2 | Word 3
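
The cache policies on slide 6 can be sketched behaviorally as follows. This is an illustrative model under assumptions, not the project's design: addresses are word addresses, the backing memory is a plain dictionary, misses are serviced instantly, and all names (`DirectMappedCache`, `load`, `store`) are hypothetical.

```python
WORDS_PER_LINE = 4


class DirectMappedCache:
    """Direct-mapped, write-through, no-write-allocate cache (behavioral sketch)."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory  # dict: word address -> value (stand-in for DRAM)
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines
        self.data = [[0] * WORDS_PER_LINE for _ in range(num_lines)]

    def _split(self, word_addr):
        # word address -> (tag, line index, word offset within the line)
        line_addr, offset = divmod(word_addr, WORDS_PER_LINE)
        tag, index = divmod(line_addr, self.num_lines)
        return tag, index, offset

    def load(self, word_addr):
        """Return (value, hit). A miss fills the whole four-word line."""
        tag, index, offset = self._split(word_addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            base = word_addr - offset
            self.data[index] = [self.memory.get(base + i, 0)
                                for i in range(WORDS_PER_LINE)]
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset], hit

    def store(self, word_addr, value):
        tag, index, offset = self._split(word_addr)
        self.memory[word_addr] = value  # write-through: memory is always updated
        if self.valid[index] and self.tags[index] == tag:
            self.data[index][offset] = value  # keep a cached copy coherent
        # On a store miss nothing is fetched: no-write-allocate.
```

Because stores never allocate, a store to an uncached line followed by a load of the same word still misses once.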

  7. Incremental Design
     - Fully blocking, single word per line
     - Fully blocking, four words per line
     - Hit under miss
     - Miss under miss
       - Necessary for full benefits of scheduling

  8. Non-Blocking Cache Architecture
     PRB line layout (four lines shown): BUFTAG | V0 V1 V2 V3 | Tag0 Tag1 Tag2 Tag3
     - On cache load miss, add request to Pending Request Buffer (PRB)
     - Place µP tag in Tag location, set Valid, issue read request to scheduler with tag = PRB index
     - If another read to same line, set tag and valid but no new read request
     - On return of data, match tag to PRB line, retrieve µP tag of valid entries, return data to µP

  9. Non-Blocking Cache Architecture
     PRB line layout (four lines shown): BUFTAG | V0 V1 V2 V3 | Tag0 Tag1 Tag2 Tag3
     - On cache store request, search PRB
     - If already issued read to this line, stall
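
The PRB behavior described on slides 8 and 9 can be sketched as below. This is a hedged illustration rather than the authors' implementation: BUFTAG is assumed to hold the line address, a full PRB or an already-occupied word slot is assumed to stall the processor (the slides do not say), and every name here is hypothetical.

```python
WORDS_PER_LINE = 4


class PendingRequestBuffer:
    """Tracks outstanding line fetches so loads can miss under miss."""

    def __init__(self, num_lines):
        self.buftag = [None] * num_lines  # line address per entry, None = free
        self.valid = [[False] * WORDS_PER_LINE for _ in range(num_lines)]
        self.up_tag = [[None] * WORDS_PER_LINE for _ in range(num_lines)]

    def load_miss(self, word_addr, up_tag):
        """Record a load miss.

        Returns (prb_index, issue_read), or None if the cache must stall.
        issue_read is True only for the first miss to a line.
        """
        line_addr, offset = divmod(word_addr, WORDS_PER_LINE)
        if line_addr in self.buftag:
            idx = self.buftag.index(line_addr)
            if self.valid[idx][offset]:
                return None  # assumption: word slot already taken, so stall
            issue_read = False  # merge into the fetch already in flight
        elif None in self.buftag:
            idx = self.buftag.index(None)
            self.buftag[idx] = line_addr
            issue_read = True  # scheduler request is tagged with idx
        else:
            return None  # assumption: PRB full, so stall
        self.valid[idx][offset] = True
        self.up_tag[idx][offset] = up_tag
        return idx, issue_read

    def store_must_stall(self, word_addr):
        """A store stalls while a read to the same line is outstanding."""
        return (word_addr // WORDS_PER_LINE) in self.buftag

    def data_returned(self, idx, line_words):
        """Match returned data to a PRB line; return [(up_tag, word), ...]."""
        replies = [(self.up_tag[idx][i], line_words[i])
                   for i in range(WORDS_PER_LINE) if self.valid[idx][i]]
        self.buftag[idx] = None
        self.valid[idx] = [False] * WORDS_PER_LINE
        self.up_tag[idx] = [None] * WORDS_PER_LINE
        return replies
```

Tagging the scheduler request with the PRB index is what lets data come back in any order: the returned tag selects the PRB line directly, without an address comparison.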

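  10. High-Level Architecture
      Block diagram (repeated from slide 5): the CPU exchanges instructions with the Inst. Cache Controller and data with the Data Cache Controller; both cache controllers connect to the Memory Scheduler, which connects to the DRAM.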

  11. Scheduler Overview
      - Cache misses are sent to the scheduler
      - Scheduler is responsible for interfacing with the DRAM
      - Requests may be honored out of order

  12. Scheduler Tasks
      - Keep waiting buffers of pending memory requests
      - Prioritize accesses in waiting buffer
      - Respect timing of the DRAM
      - Capture data coming back from DRAM
      - Keep the DRAM busy!

  13. Scheduler RTL Design
      Block diagram: requests from the cache controllers enter one of four waiting buffers (Bank 0, Bank 1, Bank 2, Bank 3); the waiting buffers issue instructions to the DRAM, and data from the DRAM goes back to the cache controllers.
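
A per-bank waiting-buffer scheduler along the lines of slides 11 to 13 might look like the sketch below. The slides do not state the priority policy, so the one used here is an assumption: banks are visited round-robin, and within a bank the oldest request to the currently open row is preferred over older requests to other rows. The `Request` fields and all names are hypothetical, and DRAM timing is not modeled.

```python
from collections import namedtuple

# tag identifies the request to the cache (for example, a PRB index).
Request = namedtuple("Request", "tag bank row col")


class Scheduler:
    """One waiting buffer per bank; requests may be serviced out of order."""

    def __init__(self, num_banks=4):
        self.waiting = [[] for _ in range(num_banks)]
        self.open_row = [None] * num_banks
        self.next_bank = 0  # round-robin pointer across banks

    def enqueue(self, req):
        self.waiting[req.bank].append(req)

    def pick(self):
        """Return the next request to service, or None if nothing is waiting."""
        num_banks = len(self.waiting)
        for step in range(num_banks):
            bank = (self.next_bank + step) % num_banks
            buf = self.waiting[bank]
            if not buf:
                continue
            # Assumed policy: oldest request hitting the open row, else oldest.
            choice = next((r for r in buf if r.row == self.open_row[bank]),
                          buf[0])
            buf.remove(choice)
            self.open_row[bank] = choice.row
            self.next_bank = (bank + 1) % num_banks
            return choice
        return None
```

For example, if bank 0 holds requests to rows 5, 7, and 5 (in arrival order), this policy services both row-5 requests before the row-7 request, saving one precharge and activate pair.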

  14. Incremental Design
      - Blocking In-Order Scheduler
      - FIFOs as Waiting Buffers and In-Order Scheduling
      - Real Waiting Buffers and Interleaved Scheduling

  15. Infinite Compile Time
      - Scheduler exploded in complexity
      - Huge amount of combinational logic
      - Memory access scheduling is a difficult problem
      - DRAM is not designed to work easily with scheduling

  16. Architectural Exploration
      - Change cache size to adjust cache miss percentage
      - Change PRB size to allow for scheduling optimization
      - Larger sizes should yield better results but higher cost

  17. Total Time to Make 6000 Random Accesses to 512 Addresses
      [Plot: Time (ns), 0 to 60000, versus PRB Lines, axis ticks at 1, 10, 100; one series each for 128 Byte Cache, 256 Byte Cache, and 512 Byte Cache]

  18. Synthesis Results (Area = 196,117.6 µm²)

  19. Conclusion
      - Memory is becoming the bottleneck for computer systems
      - In-order memory access is simple in logic but wasteful in performance
      - Memory access scheduling is much more efficient in theory, but complex in implementation

  20. Acknowledgements
      - 6884-bluespec
      - 6884-staff
      - group1, for teaching us how to use Vector, even if you didn’t realize it…
