Conditions for and effects of CARD cache implementations
Gustaf Räntilä and Mikael Wånggren
{e99_gra,e99_mwa}@e.kth.se
Agenda
• Problem formulation
• Hypothesis
• Our approach and methods
• Results (and their reliability)
• Questions
Problem formulation
• Context switches degrade performance
  – Interactive systems (with short timeslices) are especially sensitive
  – Overhead: saving and loading registers and processor state, scheduling
  – Caches, TLBs, prediction buffers, etc. are flushed → they must be rebuilt in every new timeslice
Hypothesis
• We can reduce the negative effects of context switches by “caching the cache”
• How? On a context switch:
  – Activate a CARD cache
  – Save process-specific data (cache, buffers, etc.)
  – Load the same data for the next process
• CARD: Context switch Active – Run-time Drowsy
  – Sleeps while programs run
  – Wakes up on a context switch
  – The hardware implementation is not discussed in this project
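To make the hypothesis concrete, the toy model below counts cache misses for two processes that alternate timeslices, with and without a CARD. It is a minimal sketch under idealised assumptions that are ours, not the project's: a fully associative LRU cache, disjoint working sets that exactly fill the cache, and a save/restore that costs nothing. The function name `run_alternating` and the PIDs 1 and 2 are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


def run_alternating(timeslices: int, capacity: int, use_card: bool) -> int:
    """Return the number of cache misses when PIDs 1 and 2 alternate timeslices.

    Idealised (hypothetical) model: fully associative LRU cache of `capacity`
    lines, disjoint working sets of `capacity` lines each, free CARD swaps.
    """
    working_sets: Dict[int, List[int]] = {
        pid: [pid * 1_000_000 + i for i in range(capacity)] for pid in (1, 2)
    }
    cache: "OrderedDict[int, None]" = OrderedDict()  # line -> None, oldest first
    card: Dict[int, "OrderedDict[int, None]"] = {}   # PID -> saved cache contents
    misses = 0
    prev_pid: Optional[int] = None

    for t in range(timeslices):
        pid = 1 if t % 2 == 0 else 2
        if use_card and prev_pid is not None:
            card[prev_pid] = cache                 # save the outgoing process's cache
            cache = card.get(pid, OrderedDict())   # load the incoming one (cold if new)
        for line in working_sets[pid]:
            if line in cache:
                cache.move_to_end(line)            # hit: mark most recently used
            else:
                misses += 1
                if len(cache) >= capacity:
                    cache.popitem(last=False)      # evict least recently used line
                cache[line] = None
        prev_pid = pid
    return misses


print(run_alternating(6, 8, use_card=False))  # 48: every timeslice starts cold
print(run_alternating(6, 8, use_card=True))   # 16: only each PID's first slice misses
```

Because the swap itself is free in this model, the gap between the two numbers is an upper bound on what a CARD could save in this scenario.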
Issues not discussed in this project
• Many processes
  – Huge CARD cache
  – The scheduler can prioritize the most suitable processes
• Kernel–CPU interaction
  – New instructions are required
Our approach and methods
• We only save and restore the cache (not registers, etc.)
• Simics 2.0 for full-system simulation
  – g-cache as the cache model
• x86 20 MHz hardware model
• Red Hat Enterprise 7.3 with a Linux 2.6 kernel
Our approach and methods (contd.)
• Cache setup (we mimic an XScale)
  – 32 kB L1 i-cache and 32 kB L1 d-cache
    ∗ 32-way, virtually indexed, physically tagged
    ∗ Replacement policy: LRU for the i-cache, random for the d-cache
    ∗ 1-cycle penalty for a hit
    ∗ 50-cycle penalty for a miss
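The cache parameters above can be modelled with the sketch below: a set-associative cache with LRU or random replacement that returns the hit or miss penalty for each access. The 32-byte line size is our assumption (the slide does not state it), and the model indexes and tags with a single address, ignoring virtual-to-physical translation. The class name `SetAssocCache` is hypothetical and is not g-cache's interface.

```python
import random
from collections import OrderedDict

LINE_SIZE = 32                                # bytes per line (assumed, not from the slides)
CACHE_SIZE = 32 * 1024                        # 32 kB per L1 cache
WAYS = 32                                     # 32-way set associative
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 32 sets
HIT_CYCLES = 1
MISS_CYCLES = 50


class SetAssocCache:
    """Hypothetical model of one L1 cache; returns the cycle cost of each access."""

    def __init__(self, policy: str = "lru", seed: int = 0) -> None:
        if policy not in ("lru", "random"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.rng = random.Random(seed)
        # One OrderedDict per set: keys are tags, ordered oldest to newest use.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> int:
        line = addr // LINE_SIZE
        ways = self.sets[line % NUM_SETS]      # select the set by index bits
        tag = line // NUM_SETS
        if tag in ways:
            ways.move_to_end(tag)              # recency only matters for LRU
            self.hits += 1
            return HIT_CYCLES
        self.misses += 1
        if len(ways) >= WAYS:
            if self.policy == "lru":
                ways.popitem(last=False)       # evict the least recently used tag
            else:
                del ways[self.rng.choice(list(ways))]  # evict a random tag
        ways[tag] = None
        return MISS_CYCLES


icache = SetAssocCache("lru")
addrs = range(0, 16 * 1024, LINE_SIZE)         # touch each line of a 16 kB region once
cold = sum(icache.access(a) for a in addrs)
warm = sum(icache.access(a) for a in addrs)
print(cold, warm)  # 25600 512
```

With 32-byte lines, 32 sets need 5 index bits on top of 5 offset bits; these 10 bits lie inside a 4 kB page offset, so virtual and physical indexing select the same set, and ignoring translation here does not change set selection.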
Implementation
• Requirements
  – Identifying context switches in Simics
    ∗ Break on execution of __switch_to
    ∗ Rebuild the kernel with magic instructions
  – Grabbing the PID to use as the key into the CARD
    ∗ Currently requires magic instructions
Magic instructions in Linux
• Magic instructions do no harm: outside Simics they execute as ordinary no-ops
• Our procedure in Linux
  – Before a context switch
    ∗ Set eax to 0 and execute the magic instruction
  – After a context switch
    ∗ Copy the PID to eax and execute the magic instruction
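A handler for this two-marker protocol can be a small state machine, sketched below under our own assumptions: the class name `MagicHandler`, the list used as a cache snapshot, and the dictionary standing in for the CARD are all hypothetical, and the Simics breakpoint plumbing is left out. One design choice is worth noting: Linux's idle task has PID 0, so an “after” marker for it carries the same eax value as a “before” marker; the sketch therefore alternates between the two phases instead of testing eax alone.

```python
from typing import Dict, List, Optional


class MagicHandler:
    """Hypothetical handler for the markers the instrumented kernel emits:
    eax == 0 just before a context switch, eax == incoming PID just after."""

    def __init__(self) -> None:
        self.card: Dict[int, List[int]] = {}   # PID -> saved cache snapshot
        self.live: List[int] = []              # contents of the simulated cache
        self.current_pid: Optional[int] = None
        self.expect_before = True              # markers alternate before/after
        self.log: List[str] = []

    def on_magic(self, eax: int) -> None:
        if self.expect_before:
            if eax != 0:
                raise ValueError(f"expected before-switch marker, got eax={eax}")
            if self.current_pid is not None:
                self.card[self.current_pid] = list(self.live)   # save outgoing
                self.log.append(f"save {self.current_pid}")
        else:
            # eax holds the incoming PID; PID 0 (the idle task) is valid here.
            self.current_pid = eax
            self.live = list(self.card.get(eax, []))            # cold if unseen
            self.log.append(f"load {eax}")
        self.expect_before = not self.expect_before


h = MagicHandler()
for eax in (0, 100):          # switch to PID 100
    h.on_magic(eax)
h.live.append(0x1000)         # PID 100 brings a line into the cache
for eax in (0, 200, 0, 100):  # switch to PID 200, then back to PID 100
    h.on_magic(eax)
print(h.live, h.log)
```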
Magic instructions in Simics (Python)
• Simics has native support for magic instructions
• Our procedure in Python
  – Break on the magic instruction (MI) and read eax
  – Save the current cache to the CARD (eax = 0) or load the incoming process's cache from it (eax = PID)
  – Start a temporal breakpoint chain
    ∗ At every temporal breakpoint, store statistics
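A temporal breakpoint chain is a timed callback that re-arms itself each time it fires. The sketch below shows the idea with a tiny discrete-event queue standing in for the simulator clock; `EventQueue`, `StatsSampler` and the `read_stats` callback are hypothetical names, not Simics API calls.

```python
import heapq
from typing import Callable, List, Tuple


class EventQueue:
    """Minimal discrete-event clock (a stand-in, not the Simics API)."""

    def __init__(self) -> None:
        self.now = 0
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def post(self, delay: int, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run_until(self, end: int) -> None:
        while self._queue and self._queue[0][0] <= end:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()
        self.now = end


class StatsSampler:
    """Temporal breakpoint chain: each firing stores a sample and re-arms."""

    def __init__(self, queue: EventQueue, interval: int,
                 read_stats: Callable[[], int]) -> None:
        self.queue = queue
        self.interval = interval
        self.read_stats = read_stats
        self.samples: List[Tuple[int, int]] = []

    def start(self) -> None:
        self.queue.post(self.interval, self._fire)

    def _fire(self) -> None:
        self.samples.append((self.queue.now, self.read_stats()))
        self.queue.post(self.interval, self._fire)  # next link in the chain


q = EventQueue()
stats = {"misses": 0}
sampler = StatsSampler(q, 1000, lambda: stats["misses"])
sampler.start()
q.run_until(1500)
stats["misses"] = 5          # pretend some misses happened in between
q.run_until(3500)
print(sampler.samples)  # [(1000, 0), (2000, 5), (3000, 5)]
```

Re-arming from inside the callback keeps exactly one pending sample event, so the sampling interval stays fixed no matter how many context switches happen in between.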
Experimentation
• We simulate applications with different behaviour
• From MiBench
  – lame: calculation heavy
  – dijkstra: both calculation and data heavy
  – crc32: a very common sequential application
• Home-made
  – string search: data heavy
Experimentation (contd.)
• The simulations run on a pre-emptive kernel
  – But it is not easy to force “pre-emption”
• We want context switches!
  – We run “ps >> file” in a loop in the background to force context switches
  – This also gives us the PIDs of the benchmark programs
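The background loop and the PID lookup could look as follows in Python. This is a hedged sketch: `force_switches` is a hypothetical stand-in for the shell loop on the slide (bounded here so that it terminates), and `find_pid` assumes the default `ps` output layout, with a header line containing `PID` and the command name in the last column.

```python
import subprocess
from typing import Optional


def force_switches(path: str, iterations: int) -> None:
    """Bounded stand-in for the shell loop `while true; do ps >> file; done`.

    Each run of ps is a new process competing for the CPU, which provokes
    context switches while the benchmark is running."""
    with open(path, "a") as out:
        for _ in range(iterations):
            subprocess.run(["ps"], stdout=out, check=True)


def find_pid(ps_output: str, command: str) -> Optional[int]:
    """Return the PID of the first process whose command is `command`."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return None
    pid_col = lines[0].split().index("PID")   # locate the PID column in the header
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[-1] == command:  # command name is the last column
            return int(fields[pid_col])
    return None


sample = """  PID TTY          TIME CMD
  812 pts/0    00:00:00 bash
  845 pts/0    00:00:03 dijkstra
  846 pts/0    00:00:00 ps
"""
print(find_pid(sample, "dijkstra"))  # 845
```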
Reliability of the results
• Longer runs would eliminate the start-up slowdown
• Do we use a decent cache setup?
• Should we model an L2 cache?
• Is our clock frequency fair?
• Questions?