
Accelerating Multiprocessor Simulation with a Memory Timestamp Record



  1. Accelerating Multiprocessor Simulation with a Memory Timestamp Record
     Kenneth Barr, Heidi Pan, Michael Zhang, Krste Asanović
     Massachusetts Institute of Technology
     March 21, 2005

  2. Intelligent sampling gives best speed-accuracy tradeoff for uniprocessors (Yi, HPCA '05)
     • Single sample (timeline labels: ignored, detailed)
     • Fast-forward + single sample (timeline labels: ignored, ISA only, detailed, measure)
     • Fast-forward + warmup + sample (timeline labels: ignored, ISA only, detailed)
     • Selective sampling (SimPoints)
     • Statistical sampling
     • Statistical sampling with Fast Functional Warming (SMARTS, FFW): ISA + µarch
     • Memory Timestamp Record: ISA + MTR update, then reconstruct caches
     Barr, Pan, Zhang, and Asanović. ISPASS. March 21, 2005.

  3. Snapshots amortize fast-forwarding, but require slow warming or bind to a particular µarch
     • ISA-only snapshots: slow due to warmup, but allow any µarch
     • ISA + µarch snapshots: fast (less warmup), but tied to µarch
     • MTR snapshots: fast, NOT tied to µarch, support multiprocessors…

  4. Multiprocessor simulation is especially slow
     • More cores → more state/complexity → long, complex simulations
     • Full system, threaded apps → more variability → more simulation time
     [Diagram: CPU1, CPU2, …, CPUn, each with a cache ($), sharing Memory and a Directory]

  5. For full-system simulations of commercial workloads, subtle variation matters! (Alameldeen and Wood, 2003)
     [Figure: three executions of the same workload on CPUs 1 to 4, with Time = 2.5, Time = 1.8, and Time = 2.1]
     • All produce the same result; each has a different runtime
       – DRAM refresh
       – Hard disk arrangement delays DMA
       – Incoming packet interrupts application
       – Locking order reversed
       – Processes migrate
     • Is our new gizmo a success? Maybe the OS just ordered threads differently!
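The point about run-to-run variation can be made concrete with the three runtimes from the figure. The sketch below is illustrative only: the `exceeds_noise` helper and its threshold `k` are assumptions introduced here, not something the slides define, and a real study would use more runs and a proper statistical test.

```python
import statistics

# Runtimes of the three executions shown on the slide (the slide gives no units).
run_times = [2.5, 1.8, 2.1]

mean_time = statistics.mean(run_times)
spread = statistics.stdev(run_times)  # sample standard deviation


def exceeds_noise(new_time, baseline_mean, baseline_spread, k=2.0):
    """Hypothetical rule of thumb: flag a change only if it is larger than k spreads."""
    return abs(new_time - baseline_mean) > k * baseline_spread


print(f"mean = {mean_time:.2f}, spread = {spread:.2f}")
# A "new gizmo" run of 1.9 falls well inside the observed variation.
print(exceeds_noise(1.9, mean_time, spread))
```

With only three baseline runs, a single faster run cannot be told apart from a different thread ordering, which is why each configuration needs many simulated runs.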

  6. What is the Memory Timestamp Record (MTR)?
     • MTR is an abstract picture of a multiprocessor's coherence state
     MTR:
       Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
       0             | …                             | …              | …
       …             |                               |                |
       N-1           | …                             |                |

  7. What is the Memory Timestamp Record (MTR)?
     • MTR is an abstract picture of a multiprocessor's coherence state
       – Allow quick update during fast-forwarding
       – Fill in concrete caches and directory prior to sampling
     MTR:
       Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
       0             | …                             | …              | …
       …             |                               |                |
       N-1           | …                             |                |
     [Diagram: the MTR is used to fill in the caches ($) of CPU1, CPU2, …, CPUn and the Memory Directory]
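The table on these slides can be read as a sparse map from block address to a small record. The following is a minimal sketch of that reading, not the authors' implementation: the names `MTREntry`, `MTR`, and `entry`, the integer timestamps, and the use of a dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MTREntry:
    """One MTR row (hypothetical layout): timestamps for a single memory block."""
    last_read: Dict[int, int] = field(default_factory=dict)  # CPU id -> last read time
    last_write_time: Optional[int] = None                    # time of the most recent write
    last_writer: Optional[int] = None                        # CPU id of the most recent writer


@dataclass
class MTR:
    """Sparse table: only blocks that were actually touched get an entry."""
    num_cpus: int
    entries: Dict[int, MTREntry] = field(default_factory=dict)

    def entry(self, block_addr: int) -> MTREntry:
        # Create the row on first touch, otherwise return the existing one.
        return self.entries.setdefault(block_addr, MTREntry())


mtr = MTR(num_cpus=4)
row = mtr.entry(0x40)  # first touch of block 0x40 creates an empty row
print(row)
```

Keeping one read time per CPU but a single write time and writer per block mirrors the column layout shown on the slide.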

  8. MTR example: update
     • MTR contains one entry per memory block; locality keeps it sparse.
     • New access times overwrite old (self-compressing).
     Memory trace:
       Time | CPU0 | CPU1
       0    |      |
       1    |      |
       2    |      |
       3    |      |
       4    |      |
     MTR:
       Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
       a             |                               |                |
       b             |                               |                |
       c             |                               |                |
       d             |                               |                |
       e             |                               |                |

  9. MTR example: update
     • MTR contains one entry per memory block; locality keeps it sparse.
     • New access times overwrite old (self-compressing).
     Memory trace:
       Time | CPU0   | CPU1
       0    | Read a |
       1    |        |
       2    |        |
       3    |        |
       4    |        |
     MTR:
       Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
       a             | 0 …                           |                |
       b             |                               |                |
       c             |                               |                |
       d             |                               |                |
       e             |                               |                |

  10. MTR example: update
      • MTR contains one entry per memory block; locality keeps it sparse.
      • New access times overwrite old (self-compressing).
      Memory trace:
        Time | CPU0   | CPU1
        0    | Read a |
        1    | Read e |
        2    |        |
        3    |        |
        4    |        |
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             |                               |                |
        c             |                               |                |
        d             |                               |                |
        e             | 1 …                           |                |

  11. MTR example: update
      • MTR contains one entry per memory block; locality keeps it sparse.
      • New access times overwrite old (self-compressing).
      Memory trace:
        Time | CPU0   | CPU1
        0    | Read a |
        1    | Read e |
        2    | Read b |
        3    |        |
        4    |        |
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           |                |
        c             |                               |                |
        d             |                               |                |
        e             | 1 …                           |                |

  12. MTR example: update
      • MTR contains one entry per memory block; locality keeps it sparse.
      • New access times overwrite old (self-compressing).
      Memory trace:
        Time | CPU0   | CPU1
        0    | Read a |
        1    | Read e |
        2    | Read b |
        3    | Read c |
        4    |        |
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           |                |
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |

  13. MTR example: update
      • MTR contains one entry per memory block; locality keeps it sparse.
      • New access times overwrite old (self-compressing).
      Memory trace:
        Time | CPU0   | CPU1
        0    | Read a |
        1    | Read e |
        2    | Read b |
        3    | Read c |
        4    |        | Write b
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
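The update walkthrough above can be replayed in a few lines. This is a sketch under stated assumptions rather than the authors' code: `MTREntry` and `record_access` are hypothetical names, blocks are identified by the letters used on the slides, and each access simply overwrites the previous timestamp, which is the self-compressing behavior the slide describes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MTREntry:
    """One MTR row (hypothetical layout)."""
    last_read: Dict[int, int] = field(default_factory=dict)  # CPU id -> last read time
    last_write_time: Optional[int] = None
    last_writer: Optional[int] = None


def record_access(mtr, time, cpu, op, block):
    """Overwrite the relevant timestamp for one access seen while fast-forwarding."""
    entry = mtr.setdefault(block, MTREntry())  # rows are created on first touch
    if op == "read":
        entry.last_read[cpu] = time
    elif op == "write":
        entry.last_write_time = time
        entry.last_writer = cpu
    else:
        raise ValueError(f"unknown operation: {op}")


# The memory trace from the slides: (time, cpu, operation, block).
trace = [
    (0, 0, "read", "a"),
    (1, 0, "read", "e"),
    (2, 0, "read", "b"),
    (3, 0, "read", "c"),
    (4, 1, "write", "b"),
]

mtr = {}
for time, cpu, op, block in trace:
    record_access(mtr, time, cpu, op, block)

for block in sorted(mtr):
    print(block, mtr[block])
```

Block d never appears in the trace, so it never gets a row; that is the sparseness the first bullet refers to.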

  14. 1. Coalesce: determining correct cache tags
      [Diagram: the MTR (Block Address, Last Readtime for CPU0 … CPUn-1, Last Writetime, Last Writer) shown above a cache with four sets (Set 0 to Set 3) and two ways (Way 0, Way 1)]


  17. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      Two caches, both still empty:
        Set   | Way 0 | Way 1
        Set 0 |       |
        Set 1 |       |

  18. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      • What are the contents of CPU0 cache?
        Set   | Way 0 | Way 1
        Set 0 |       |
        Set 1 |       |

  19. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      • What are the contents of CPU0 cache?
        Set   | Way 0 | Way 1
        Set 0 | a 0   |
        Set 1 |       |

  20. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      • What are the contents of CPU0 cache?
        Set   | Way 0 | Way 1
        Set 0 | a 0   |
        Set 1 | b 2   |

  21. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      • What are the contents of CPU0 cache?
        Set   | Way 0 | Way 1
        Set 0 | a 0   | c 3
        Set 1 | b 2   |

  22. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      • What are the contents of CPU0 cache?
        Set   | Way 0 | Way 1
        Set 0 | e 1   | c 3
        Set 1 | b 2   |

  23. MTR example: coalesce
      • Choose organization
        – Two sets, two ways
      • Coalesce
        – Determine which blocks map to the same set
        – Only the most recent timestamps (as many as there are ways) are present. Check validity later.
      MTR:
        Block Address | Last Readtime (CPU0 … CPUn-1) | Last Writetime | Last Writer
        a             | 0 …                           |                |
        b             | 2 …                           | 4              | CPU1
        c             | 3                             |                |
        d             |                               |                |
        e             | 1 …                           |                |
      • What are the contents of CPU0 cache?
        Set   | Way 0 | Way 1
        Set 0 | e 1   | c 3
        Set 1 | b 2   |
      • CPU1?
        Set   | Way 0 | Way 1
        Set 0 |       |
        Set 1 | b 4   |
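The coalesce step in this example can be sketched as follows. Several things here are assumptions for illustration and are not stated on the slides: the names `MTREntry`, `set_index`, and `coalesce`; the mapping of blocks a to e onto sets by alternating index (chosen because it matches the slide, where a, c, and e share Set 0 and b lands in Set 1); and the rule that a CPU's timestamp for a block is the later of its last read and, if it was the last writer, the last write. The validity check mentioned on the slide is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MTREntry:
    """One MTR row (hypothetical layout)."""
    last_read: Dict[int, int] = field(default_factory=dict)  # CPU id -> last read time
    last_write_time: Optional[int] = None
    last_writer: Optional[int] = None


# MTR state at the end of the update example on the slides.
mtr = {
    "a": MTREntry({0: 0}),
    "b": MTREntry({0: 2}, last_write_time=4, last_writer=1),
    "c": MTREntry({0: 3}),
    "e": MTREntry({0: 1}),
}


def set_index(block, num_sets):
    """Assumed mapping for the example: a, c, e -> set 0; b, d -> set 1."""
    return (ord(block) - ord("a")) % num_sets


def coalesce(mtr, cpu, num_sets, num_ways):
    """Per set, keep the num_ways blocks this CPU touched most recently."""
    sets = {s: [] for s in range(num_sets)}
    for block, entry in mtr.items():
        stamps = []
        if cpu in entry.last_read:
            stamps.append(entry.last_read[cpu])
        if entry.last_writer == cpu:
            stamps.append(entry.last_write_time)
        if stamps:  # this CPU touched the block at least once
            sets[set_index(block, num_sets)].append((block, max(stamps)))
    for s in sets:
        # Newest first, then truncate to the associativity.
        sets[s] = sorted(sets[s], key=lambda bt: bt[1], reverse=True)[:num_ways]
    return sets


print("CPU0:", coalesce(mtr, cpu=0, num_sets=2, num_ways=2))
print("CPU1:", coalesce(mtr, cpu=1, num_sets=2, num_ways=2))
```

The lists are ordered newest first rather than by way, so they match the slide's cache contents as sets of blocks, not way by way. Block b shows up in both caches; deciding which copy is still valid is the later step the slide defers.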
