Improving DRAM Performance by Parallelizing Refreshes with Accesses
Kevin Chang, Donghyuk Lee, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Yoongu Kim, Onur Mutlu

Executive Summary
• DRAM refresh interferes with memory accesses
– Degrades system performance and energy efficiency
– Becomes exacerbated as DRAM density increases
• Goal: Serve memory accesses in parallel with refreshes to reduce refresh interference on demand requests
• Our mechanisms:
– 1. Enable more parallelization between refreshes and accesses across different banks with new per-bank refresh scheduling algorithms
– 2. Enable serving accesses concurrently with refreshes in the same bank by exploiting DRAM subarrays
• Improve system performance and energy efficiency for a wide variety of different workloads and DRAM densities
– 20.2% and 9.0%, respectively, for 8-core systems using 32Gb DRAM
– Very close to the ideal scheme without refreshes

Outline
• Motivation and Key Ideas
• DRAM and Refresh Background
• Our Mechanisms
• Results

Refresh Penalty
Refresh interferes with memory accesses
[Figure: Processor, Memory Controller, and DRAM (cell with access transistor and capacitor); labels: Read, Data, Refresh]
Refresh delays requests by 100s of ns

Existing Refresh Modes
All-bank refresh in commodity DRAM (DDRx)
[Figure: timeline for Bank 0, Bank 1, …, Bank 7 with Refresh]
Per-bank refresh in mobile DRAM (LPDDRx)
[Figure: timeline for Bank 0, Bank 1, …, Bank 7, refreshed in round-robin order]
Per-bank refresh allows accesses to other banks while a bank is refreshing

Shortcomings of Per-Bank Refresh
• Problem 1: Refreshes to different banks are scheduled in a strict round-robin order
– The static ordering is hardwired into DRAM chips
– Refreshes busy banks with many queued requests when other banks are idle
• Key idea: Schedule per-bank refreshes to idle banks opportunistically in a dynamic order

Shortcomings of Per-Bank Refresh
• Problem 2: Banks that are being refreshed cannot concurrently serve memory requests
[Figure: Bank 0 timeline under per-bank refresh; a read (RD) is delayed by refresh]

Shortcomings of Per-Bank Refresh
• Problem 2: Refreshing banks cannot concurrently serve memory requests
• Key idea: Exploit subarrays within a bank to parallelize refreshes and accesses across subarrays
[Figure: Bank 0 timelines for Subarray 0 and Subarray 1; a read (RD) to Subarray 1 is parallelized with a subarray refresh]

Outline
• Motivation and Key Ideas
• DRAM and Refresh Background
• Our Mechanisms
• Results

DRAM System Organization
[Figure: DRAM organized into Rank 0 and Rank 1; each rank contains Bank 0, Bank 1, …, Bank 7]
• Banks can serve multiple requests in parallel

DRAM Refresh Frequency
• DRAM standard requires memory controllers to send periodic refreshes to DRAM
[Figure: timeline of refreshes and read/write accesses]
tRefLatency (tRFC): Varies based on DRAM chip density (e.g., 350ns)
tRefPeriod (tREFI): Remains constant
Read/Write: roughly 50ns

Increasing Performance Impact
• DRAM is unavailable to serve requests for tRefLatency / tRefPeriod of the time
• 6.7% for today's 4Gb DRAM
• Unavailability increases with higher density due to higher tRefLatency
– 23% / 41% for future 32Gb / 64Gb DRAM

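The unavailability figures above are the ratio tRefLatency / tRefPeriod. The sketch below works through that arithmetic. The timing values are assumptions picked to be consistent with the percentages on the slide (tREFI of 3.9 µs, and tRFC of 260 ns, 890 ns, and 1600 ns for 4Gb, 32Gb, and 64Gb chips); the slide itself does not state them.

```python
# Assumed timing values in nanoseconds (illustrative, not given on the slide).
ASSUMED_T_REFI_NS = 3900.0
ASSUMED_T_RFC_NS = {"4Gb": 260.0, "32Gb": 890.0, "64Gb": 1600.0}


def refresh_unavailability(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time DRAM cannot serve requests: tRefLatency / tRefPeriod."""
    return t_rfc_ns / t_refi_ns


# Percentage of time spent refreshing, per assumed chip density.
unavailability_pct = {
    density: round(100.0 * refresh_unavailability(t_rfc, ASSUMED_T_REFI_NS), 1)
    for density, t_rfc in ASSUMED_T_RFC_NS.items()
}

for density, pct in unavailability_pct.items():
    print(f"{density}: {pct}% of time unavailable")
```

Because tRefPeriod stays constant while tRefLatency grows with density, the ratio grows in direct proportion to tRefLatency.
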
All-Bank vs. Per-Bank Refresh
All-Bank Refresh: Employed in commodity DRAM (DDRx, LPDDRx)
[Figure: timeline of Bank 0 and Bank 1 with Read and Refresh; refreshes are staggered across banks to limit power]
Per-Bank Refresh: In mobile DRAM (LPDDRx)
[Figure: timeline of Bank 0 and Bank 1; one bank serves a Read while the other performs a Refresh]
• Shorter tRefLatency than that of all-bank refresh
• More frequent refreshes (shorter tRefPeriod)
Can serve memory accesses in parallel with refreshes across banks

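To make "more frequent refreshes" concrete, here is a minimal sketch of a round-robin per-bank refresh schedule. It assumes the per-bank refresh period equals the all-bank period divided by the number of banks, so that every bank is still refreshed once per all-bank period. The 3.9 µs period and 8 banks are illustrative values, and the function names are hypothetical.

```python
from typing import List, Tuple


def per_bank_refresh_period_ns(all_bank_period_ns: float, num_banks: int) -> float:
    """Assumed relation: one per-bank refresh command every tRefPeriod / num_banks."""
    return all_bank_period_ns / num_banks


def round_robin_schedule(
    num_commands: int, all_bank_period_ns: float, num_banks: int
) -> List[Tuple[float, int]]:
    """Return (issue time in ns, bank index) pairs in strict round-robin order."""
    period = per_bank_refresh_period_ns(all_bank_period_ns, num_banks)
    return [(k * period, k % num_banks) for k in range(num_commands)]


schedule = round_robin_schedule(num_commands=9, all_bank_period_ns=3900.0, num_banks=8)
for time_ns, bank in schedule:
    print(f"t={time_ns:7.1f} ns: refresh bank {bank}")
```

Each command occupies only one bank, which is why accesses to the remaining banks can proceed in parallel.
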
Shortcomings of Per-Bank Refresh
• 1) Per-bank refreshes are strictly scheduled in round-robin order (as fixed by DRAM's internal logic)
• 2) A refreshing bank cannot serve memory accesses
Goal: Enable more parallelization between refreshes and accesses using practical mechanisms

Outline
• Motivation and Key Ideas
• DRAM and Refresh Background
• Our Mechanisms
– 1. Dynamic Access-Refresh Parallelization (DARP)
– 2. Subarray Access-Refresh Parallelization (SARP)
• Results

Our First Approach: DARP
• Dynamic Access-Refresh Parallelization (DARP)
– An improved scheduling policy for per-bank refreshes
– Exploits refresh scheduling flexibility in DDR DRAM
• Component 1: Out-of-order per-bank refresh
– Avoids poor static scheduling decisions
– Dynamically issues per-bank refreshes to idle banks
• Component 2: Write-Refresh Parallelization
– Avoids refresh interference on latency-critical reads
– Parallelizes refreshes with a batch of writes

1) Out-of-Order Per-Bank Refresh
• Dynamic scheduling policy that prioritizes refreshes to idle banks
• Memory controllers decide which bank to refresh

1) Out-of-Order Per-Bank Refresh
Baseline: Round robin
[Figure: request queues for Bank 0 and Bank 1, each holding a read; on the timeline, reads in Bank 0 and Bank 1 are delayed by refresh]
Our mechanism: DARP
[Figure: Bank 1 performs Refresh, then Read; Bank 0 performs Read, then Refresh; saved cycles marked on both banks]
Reduces refresh penalty on demand requests by refreshing idle banks first in a flexible order

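The difference between the two policies can be sketched as a bank-selection function. This is a simplified illustration, not the controller logic from the talk: the function names are hypothetical, and the fallback of picking the least-loaded bank when no bank is idle is an assumption added here.

```python
from typing import Optional, Sequence


def round_robin_next(last_refreshed_bank: int, num_banks: int) -> int:
    """Baseline: the next bank in a fixed order, regardless of pending requests."""
    return (last_refreshed_bank + 1) % num_banks


def pick_bank_to_refresh(
    queued_requests: Sequence[int], needs_refresh: Sequence[bool]
) -> Optional[int]:
    """Out-of-order sketch: among banks that still need a refresh, prefer an
    idle bank (zero queued requests). If none is idle, fall back to the bank
    with the fewest queued requests (assumption, not from the slides)."""
    candidates = [bank for bank, pending in enumerate(needs_refresh) if pending]
    if not candidates:
        return None  # nothing to refresh right now
    return min(candidates, key=lambda bank: (queued_requests[bank], bank))


# Bank 0 has two reads queued, Bank 1 is idle, and both need a refresh.
queued = [2, 0]
pending = [True, True]
print("round robin refreshes bank", round_robin_next(last_refreshed_bank=1, num_banks=2))
print("out-of-order refreshes bank", pick_bank_to_refresh(queued, pending))
```

In this example the round-robin baseline refreshes the busy Bank 0 and delays its reads, while the out-of-order policy refreshes the idle Bank 1 first.
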
Outline
• Motivation and Key Ideas
• DRAM and Refresh Background
• Our Mechanisms
– 1. Dynamic Access-Refresh Parallelization (DARP)
• 1) Out-of-Order Per-Bank Refresh
• 2) Write-Refresh Parallelization
– 2. Subarray Access-Refresh Parallelization (SARP)
• Results

Refresh Interference on Upcoming Requests
• Problem: A refresh may collide with an upcoming request in the near future
[Figure: timeline of Bank 0 and Bank 1; a read arriving in Bank 0 is delayed by a refresh already in progress]

DRAM Write Draining
• Observations:
• 1) Bus-turnaround latency when transitioning from writes to reads or vice versa
– To mitigate bus-turnaround latency, writes are typically drained to DRAM in a batch during a period of time
• 2) Writes are not latency-critical
[Figure: timeline of Bank 0 and Bank 1 showing a read, a bus turnaround, and a batch of three writes]

2) Write-Refresh Parallelization
• Proactively schedules refreshes when banks are serving write batches
Baseline
[Figure: timeline of Bank 0 and Bank 1 with a read, a bus turnaround, and a batch of three writes; a refresh in Bank 0 delays a later read]
Write-refresh parallelization
[Figure: same timeline; 1. Postpone refresh, 2. Refresh during writes; saved cycles marked]
Avoids stalling latency-critical read requests by refreshing with non-latency-critical writes

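The two steps in the figure (postpone the refresh, then issue it during writes) can be sketched as a small decision function. This is an illustrative simplification: `refresh_decision`, `owed_refreshes`, and the `max_owed` limit of 8 are hypothetical names and values introduced here, and the forced refresh at the limit is an assumption about how postponement would be bounded.

```python
def refresh_decision(draining_writes: bool, owed_refreshes: int, max_owed: int = 8) -> str:
    """Decide what to do with a bank's outstanding refreshes.

    draining_writes: True while the controller is draining a write batch.
    owed_refreshes:  refreshes that are due but not yet issued to this bank.
    max_owed:        assumed cap on postponed refreshes (hypothetical value).
    """
    if owed_refreshes == 0:
        return "none"
    if draining_writes:
        return "refresh"   # step 2: overlap the refresh with non-critical writes
    if owed_refreshes < max_owed:
        return "postpone"  # step 1: keep the bank free for latency-critical reads
    return "refresh"       # cap reached: refresh even though reads may be delayed


# A refresh becomes due while reads are being served, then a write batch starts.
print(refresh_decision(draining_writes=False, owed_refreshes=1))  # postpone
print(refresh_decision(draining_writes=True, owed_refreshes=1))   # refresh
```

Postponing is only safe within a bounded window, which is why the sketch forces a refresh once the assumed cap is reached.
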
Outline
• Motivation and Key Ideas
• DRAM and Refresh Background
• Our Mechanisms
– 1. Dynamic Access-Refresh Parallelization (DARP)
– 2. Subarray Access-Refresh Parallelization (SARP)
• Results

Our Second Approach: SARP
Observations:
1. A bank is further divided into subarrays
– Each has its own row buffer to perform refresh operations
[Figure: Bank 0, Bank 1, …, Bank 7; inside a bank, subarrays each with a row buffer, plus the bank I/O; idle subarrays and bank I/O are marked]
2. Some subarrays and bank I/O remain completely idle during refresh

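These observations suggest a simple condition for serving an access while a bank is refreshing: the access must target a subarray other than the one being refreshed. A minimal sketch of that check follows. The row-to-subarray mapping (consecutive rows grouped into equal-sized subarrays) and the value of 512 rows per subarray are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import Optional

ASSUMED_ROWS_PER_SUBARRAY = 512  # illustrative value, not from the slides


def subarray_of(row: int, rows_per_subarray: int = ASSUMED_ROWS_PER_SUBARRAY) -> int:
    """Assumed mapping: consecutive rows belong to the same subarray."""
    return row // rows_per_subarray


def can_serve_during_refresh(access_row: int, refreshing_row: Optional[int]) -> bool:
    """True if the access does not touch the subarray that is being refreshed."""
    if refreshing_row is None:
        return True  # no refresh in progress in this bank
    return subarray_of(access_row) != subarray_of(refreshing_row)


# A refresh is in progress on row 10 (subarray 0).
print(can_serve_during_refresh(access_row=700, refreshing_row=10))  # different subarray
print(can_serve_during_refresh(access_row=100, refreshing_row=10))  # same subarray
```

The check relies on each subarray having its own row buffer, so the refreshing subarray and the accessed subarray do not contend for the same one.
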