
Synthetic Full System Traffic Models Capturing Cache Coherence Behaviour - PowerPoint PPT Presentation



  1. Synthetic Full System Traffic Models Capturing Cache Coherence Behaviour
     Mario Badr | Natalie Enright Jerger

  2. (No text.)

  3. Amateur Photography
     (image: http://sevennine.net/archives/2009/10/01/torontos-skyline-at-night/)

  4. Photography 101
     (image: http://sevennine.net/archives/2009/10/01/torontos-skyline-at-night/)

  5. Hundreds of Pictures

  6. SynFull
     (image: http://sevennine.net/archives/2009/10/01/torontos-skyline-at-night/)

  7. Pictures for Facebook

  8. What Does SynFull Do?
     • Model real application traffic to the NoC
     • Generate realistic traffic synthetically for the NoC
     • Iterate over several NoC designs quickly
     Tool available for download.

  9. SynFull's Goals
     • Generic
       – Current and future applications
       – 16 different benchmarks
     • Accurate
       – Comparable performance metrics
       – 10.5% error
     • Fast
       – Faster than full system and traces
       – 52x speed up

  10. NoC Simulation Methodologies
     • Full system
     • Traces
     • Traffic patterns

  11. Full System Simulation
     (diagram: a full-system simulator, with processor, cache, disk and other components running the application, sends packets to a NoC simulator, which reports back when packets arrive)
     The feedback between the two simulators makes this accurate but slow.

  12. Trace Simulation
     (diagram: a trace recorded from a full-system run of the application on NoC A is replayed by a trace simulator into NoC B)
     Faster, but less accurate.

  13. Traffic Patterns
     (diagram: a synthetic traffic driver feeds a traffic pattern, instead of the application, to the NoC simulator)
     Patterns: Uniform Random, Bit Complement, Bit Reverse, Bit Rotation, Shuffle, Transpose, Tornado, Neighbour.
     Very fast, but inaccurate.
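The slide lists the patterns only by name. As a rough sketch, each one can be written as a source-to-destination function over node IDs. The definitions below follow the common textbook formulations (bit operations on the node address; Tornado and Neighbour applied per mesh dimension); the 16-node, 4x4 layout and all names are assumptions for illustration, not SynFull's code.

```python
import random

ADDR_BITS = 4                # assumption: 16 nodes, matching the 16-core evaluation
NUM_NODES = 1 << ADDR_BITS
MASK = NUM_NODES - 1
RADIX = 4                    # assumption: nodes laid out as a 4x4 mesh


def uniform_random(src, rng=random):
    # Any destination with equal probability.
    return rng.randrange(NUM_NODES)


def bit_complement(src):
    # Flip every address bit.
    return ~src & MASK


def bit_reverse(src):
    # Reverse the order of the address bits.
    return int(format(src, f"0{ADDR_BITS}b")[::-1], 2)


def shuffle(src):
    # Rotate the address left by one bit.
    return ((src << 1) | (src >> (ADDR_BITS - 1))) & MASK


def bit_rotation(src):
    # Rotate the address right by one bit.
    return ((src >> 1) | (src << (ADDR_BITS - 1))) & MASK


def transpose(src):
    # Swap the upper and lower halves of the address.
    half = ADDR_BITS // 2
    return ((src << half) | (src >> half)) & MASK


def tornado(src):
    # Move ceil(k/2) - 1 hops along each mesh dimension (wrapping).
    x, y = src % RADIX, src // RADIX
    shift = (RADIX + 1) // 2 - 1
    return (x + shift) % RADIX + ((y + shift) % RADIX) * RADIX


def neighbour(src):
    # Move one hop along each mesh dimension (wrapping).
    x, y = src % RADIX, src // RADIX
    return (x + 1) % RADIX + ((y + 1) % RADIX) * RADIX
```

Every pattern except Uniform Random is a fixed permutation: node 6 (0110) always sends to node 9 (1001) under Transpose, whatever the application is doing. That is why these patterns are fast to simulate but cannot reflect real application behaviour.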

  14. The Opportunity
     (chart: the methodologies plotted by accuracy against speed)

  15. The Opportunity
     (same chart, with SynFull added)

  16. Achieving the Goals
     • Synthetic cache coherence
       – Dependent messages
       – Accuracy
       – Enable research
     • Time-varying behaviour
       – Short and long bursts of traffic
     • Convergence speed
       – Simulation length?

  17. Motivating Cache Coherence
     (charts: traffic of the Shuffle pattern compared with a Fast Fourier Transform application)
     Cache coherence affects traffic behaviour.

  18. Capturing Coherence Traffic
     (diagram: a 3x3 grid of nodes numbered 1 to 9)
     • Example
       – MOESI protocol
       – Can be adapted

  19. Capturing Coherence Traffic
     • Initiate transaction
     • Store miss
       – Source
     (diagram: node 7 issues a store miss)

  20. Capturing Coherence Traffic
     • Store miss
       – Source
       – Destination
     (diagram: the store miss goes to the directory at node 3)

  21. Capturing Coherence Traffic
     • Store miss
     • Forwarded request
       – Destination
     (diagram: the request is forwarded to the owner, node 1)

  22. Capturing Coherence Traffic
     • Store miss
     • Forwarded request
     • Invalidations
       – Quantity
       – Destinations
     (diagram: nodes 2 and 6 are invalidated)

  23. Capturing Coherence Traffic
     • Store miss
     • Forwarded request
     • Invalidations
     • Acknowledgements
     (diagram: nodes 2 and 6 send acknowledgements)

  24. Capturing Coherence Traffic
     • Store miss
     • Forwarded request
     • Invalidations
     • Acknowledgements
     • Data response
     (diagram: data is sent to node 7)

  25. Capturing Coherence Traffic
     • Store miss
     • Forwarded request
     • Invalidations
     • Acknowledgements
     • Data response
     • Unblock
     (diagram: transaction complete)
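The store-miss walkthrough in slides 19 to 25 is a chain of dependent messages: each step can only be sent after an earlier one arrives. A minimal sketch of that chain, assuming (not stated on the slides) that the directory sends the invalidations, that sharers acknowledge directly to the requester, and that the requester sends the unblock to the directory. The `Message` type and message names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str
    src: int
    dst: int


def store_miss_transaction(requester, directory, owner, sharers):
    """Return the dependent messages of one store-miss transaction, in causal order."""
    msgs = [Message("StoreMiss", requester, directory)]       # initiating message
    msgs.append(Message("FwdRequest", directory, owner))      # directory forwards to owner
    msgs += [Message("Inv", directory, s) for s in sharers]   # assumption: directory invalidates
    msgs += [Message("Ack", s, requester) for s in sharers]   # assumption: acks go to requester
    msgs.append(Message("Data", owner, requester))            # owner supplies the data
    msgs.append(Message("Unblock", requester, directory))     # transaction complete
    return msgs


# The example from the slides: node 7 misses, directory at 3, owner 1, sharers 2 and 6.
for m in store_miss_transaction(requester=7, directory=3, owner=1, sharers=[2, 6]):
    print(m)
```

A synthetic model that only reproduced the initiating store misses would lose the other seven messages here; the "Quantity" and "Destinations" bullets on slide 22 are the parts of this chain that vary from one transaction to the next.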

  26. Time-Varying Behaviour
     (diagram: execution separated by barriers)
     Initiating transactions and sharing patterns can change.

  27. Time-Varying Behaviour
     (chart: Fluidanimate benchmark, packets injected per time bin of 500,000 cycles, alternating between high (H) and low (L) levels)
     Applications go through phases.

  28. Modelling Time-Varying Behaviour
     • Create and group phases
       – Clustering
     • Transition from one phase to another
       – Markov chains

  29. Dividing Into Intervals
     Intervals are a fixed size.

  30. Dividing Into Intervals
     Visually we see high, low + high, and low intervals.
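Slides 28 to 30 describe cutting execution into fixed-size intervals and grouping similar intervals into phases by clustering. A minimal sketch, assuming each interval is summarised only by its injected-packet count and grouped with a plain 1-D k-means (Lloyd's algorithm). The real feature vector and clustering method in SynFull may differ; the function names are hypothetical.

```python
def split_into_intervals(injection_cycles, interval_size):
    """Count packets injected in each fixed-size interval of cycles."""
    if not injection_cycles:
        return []
    counts = [0] * (max(injection_cycles) // interval_size + 1)
    for cycle in injection_cycles:
        counts[cycle // interval_size] += 1
    return counts


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars; returns (label per value, centroids)."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced order statistics.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each interval to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Move each centroid to the mean of its members (keep it if empty).
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Hypothetical per-interval packet counts alternating between busy and quiet.
labels, centroids = kmeans_1d([100, 5, 98, 4, 102, 6], k=2)
print(labels, centroids)
```

The number of clusters, the interval size and how "similar" intervals are measured are exactly the modelling parameters named on slide 34.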

  31. Phase Transitions: Markov Chains
     (diagram: phases connected by transitions labelled 17%, 100%, 83%, 45% and 55%)
     Each label is P[next state | current state].
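Each edge on slide 31 is a conditional probability P[next state | current state]. A minimal sketch of how such a matrix could be estimated from a sequence of phase labels (counting transitions and normalising each row) and then sampled to generate a synthetic phase sequence. The label sequence and function names are hypothetical.

```python
import random
from collections import defaultdict


def estimate_transitions(phase_labels):
    """Maximum-likelihood estimate of P[next | current] from consecutive labels."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(phase_labels, phase_labels[1:]):
        counts[cur][nxt] += 1
    P = {}
    for cur, row in counts.items():
        total = sum(row.values())
        P[cur] = {nxt: c / total for nxt, c in row.items()}
    return P


def sample_phases(P, start, steps, rng):
    """Walk the chain for `steps` transitions; a state with no outgoing row stays put."""
    state, out = start, [start]
    for _ in range(steps):
        row = P.get(state, {state: 1.0})
        state = rng.choices(list(row), weights=list(row.values()))[0]
        out.append(state)
    return out


P = estimate_transitions(list("HHLHL"))   # H = high phase, L = low phase
print(P)                                  # H -> H with 1/3, H -> L with 2/3, L -> H with 1.0
print(sample_phases(P, "H", 10, random.Random(1)))
```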

  32. Traffic Comparison
     (chart: actual vs. synthetic traffic)
     Coarse granularity → average behaviour.

  33. Capturing Short Bursts
     • Macro level
       – 100,000s of cycles
       – Long phases
       – Outer loops
     • Micro level
       – 100s of cycles
       – Short bursts
       – Inner loops
     • Hierarchical model
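The hierarchical model on slide 33 nests a short-timescale (micro) chain inside each long-timescale (macro) phase. A minimal sketch, assuming each macro phase owns its own micro chain and each micro state maps to a fixed packet count per micro interval. The model structure, names and numbers are hypothetical, chosen only to show the nesting.

```python
import random


def step(P, state, rng):
    """Draw the next state from row P[state] of a transition table."""
    row = P[state]
    return rng.choices(list(row), weights=list(row.values()))[0]


def generate_hierarchical(macro_P, micro_models, start_macro, n_macro, micro_per_macro, rng):
    """Return (macro state, micro state, packets) for every micro interval.

    micro_models[phase] = (micro transition table, packets per micro state, start micro state)
    """
    trace = []
    macro = start_macro
    for _ in range(n_macro):
        micro_P, packets, micro = micro_models[macro]
        # Inner loop: short bursts within one long phase.
        for _ in range(micro_per_macro):
            trace.append((macro, micro, packets[micro]))
            micro = step(micro_P, micro, rng)
        # Outer loop: move to the next long phase.
        macro = step(macro_P, macro, rng)
    return trace


macro_P = {"H": {"L": 1.0}, "L": {"H": 1.0}}
micro_models = {
    "H": ({"burst": {"quiet": 1.0}, "quiet": {"burst": 1.0}}, {"burst": 40, "quiet": 10}, "burst"),
    "L": ({"idle": {"idle": 1.0}}, {"idle": 1}, "idle"),
}
trace = generate_hierarchical(macro_P, micro_models, "H", 2, 3, random.Random(0))
print([p for _, _, p in trace])   # [40, 10, 40, 1, 1, 1]
```

In this sketch the micro chain restarts from its start state each time a macro phase is entered; carrying the micro state across phases would be an equally plausible choice.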

  34. Modelling Parameters
     • Model accuracy is affected by:
       – Interval size
       – Interval similarity
       – Number of clusters
     See the paper for the parameter sweep and recommendations.

  35. Creating the Models
     (diagram: a full-system simulation of the application, with processor, cache, disk, other components and an ideal network, produces an ideal trace)

  36. Creating the Models
     (diagram adds: the ideal trace and a set of parameters feed SynFull modelling, which produces a model)

  37. Creating the Models
     (diagram adds: the model drives SynFull's traffic generator, which injects traffic into the NoCs under test)

  38. Evaluation Methodology

     Network            meshDOR      meshADAP          fbfly
     Topology           Mesh         Mesh              Flattened Butterfly
     Channel width      8 bytes      4 bytes           4 bytes
     Virtual channels   2 per port   2 per port        4 per port
     Routing            XY           Adaptive YX-XY    UGAL

     • 16 out-of-order cores
     • MOESI protocol
     • 16 benchmarks (SPLASH-2, PARSEC)
     • Comparison: traces with dependencies

  39. Packet Latency Error
     (chart: geomean packet-latency error percentage, 0 to 100%, for Trace Dependency and SynFull on meshDOR, meshADAP and fbfly; lower is better)
     No throttling for initiating transactions.

  40. Distribution Error
     (chart: geomean of Hellinger distances, 0 to 0.20, for Trace Dependency and SynFull on meshDOR, meshADAP and fbfly; lower is better)
     Captures congestion.
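Slides 39 and 40 summarise accuracy as a geometric mean of per-benchmark errors: a percentage error for average packet latency, and a Hellinger distance between latency distributions. A minimal sketch of both metrics, assuming latencies are binned into a fixed-width histogram before comparison (the bin width and function names are hypothetical). The Hellinger distance used here is H(P, Q) = sqrt(½ Σ (√pᵢ − √qᵢ)²), which is 0 for identical distributions and 1 for distributions with no overlap.

```python
import math
from collections import Counter


def percent_error(simulated, reference):
    """Absolute error of a metric (e.g. average packet latency) in percent."""
    return abs(simulated - reference) / reference * 100.0


def latency_histogram(latencies, bin_width):
    """Normalised histogram of packet latencies: {bin index: probability}."""
    counts = Counter(lat // bin_width for lat in latencies)
    total = len(latencies)
    return {b: c / total for b, c in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2.0)


def geomean(values):
    """Geometric mean; every value must be strictly positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

Unlike an average-latency error, the histogram comparison also penalises a model that gets the mean right but misses the long tail produced by congestion, which is the point made on slide 40.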

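  41. What About Speed?
     (diagram: a two-state Markov chain, states 1 and 2, and its Markov probability matrix)

  42. What About Speed?
     The Markov probability matrix converges after a while.
     (diagram: the chain settles at 56% in state 1 and 44% in state 2)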
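"Converges after a while" refers to the chain's steady-state distribution π, which satisfies π = πP. The slide does not give the transition matrix, so the 2x2 matrix below is hypothetical, chosen so that its steady state is the 56% / 44% split on the slide (for two states, π₁ = P₂₁ / (P₁₂ + P₂₁) = 0.28 / 0.50). A minimal power-iteration sketch:

```python
def steady_state(P, tol=1e-12, max_iters=10_000):
    """Power iteration: repeatedly apply pi <- pi P until it stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical matrix: row i holds P[next state | current state i].
P = [[0.78, 0.22],
     [0.28, 0.72]]
print(steady_state(P))   # approximately [0.56, 0.44]
```

Once the fraction of time spent in each state is close to π, simulating further adds little new information, which is what allows a simulation to stop early.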

  43. Speed Up
     (chart: speed-up of Trace Dependency, SynFull and SynFull (SS); the bars are labelled 27, 24 and 52)
     52x speed-up with 11.7% error.

  44. Conclusion
     • Implemented synthetic traffic models that are:
       – Accurate: 10.5% error
       – Fast: over 50x average speed-up
       – Generic: SynFull works for many applications

  45. Try SynFull out: http://www.eecg.toronto.edu/~enright/items/synfull_download.html
     Thank you for listening! Questions & answers.

  46. Back Up: Design Space Exploration
     (chart: average packet latency, 0 to 60, against buffer size of 2, 4, 8 and 16 flits, for full system and SynFull)
     Same conclusion, less time.

  47. Back Up: meshDOR Packet Latency
     (chart: average packet latency, 0 to 140, for full system, Trace Dependency and SynFull)

  48. Back Up: fbfly Packet Latency
     (chart: average packet latency, 0 to 140, for full system, Trace Dependency and SynFull)

  49. Back Up: meshADAP Packet Latency
     (chart: average packet latency, 0 to 140, for full system, Trace Dependency and SynFull)

  50. Back Up: Average Throughput
     (chart: geomean throughput error percentage for meshDOR, meshADAP and fbfly; bars labelled 16%, 12% and 12%)

  51. Back Up: Speed-Up per Application
     (chart: average speed-up, 0 to 160, for Trace Dependency, SynFull and SynFull (SS))
     Averaged over 3 runs (different NoCs).

  52. Back Up: Steady State
     Before simulation:
       Steady state   Acceptable ±
       42%            1.68%
       21%            0.84%
       37%            1.48%
       MSE: 0.000191
     During simulation:
       Current ± steady state: ?% for each state; exit once the MSE falls below 0.000191.
     Exit time depends on the state transitions (RNG) and the MSE.
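The numbers on slide 52 fit together: each acceptable deviation is 4% of its steady-state share (1.68/42 = 0.84/21 = 1.48/37 = 0.04), and the MSE threshold is the mean of their squares, (0.0168² + 0.0084² + 0.0148²) / 3 ≈ 0.000191. A minimal sketch of that threshold and of the exit check, assuming the "current" distribution is the fraction of intervals spent in each state so far. The function names are hypothetical.

```python
def mse(current, steady):
    """Mean squared error between two probability vectors."""
    return sum((c - s) ** 2 for c, s in zip(current, steady)) / len(steady)


def mse_threshold(steady, relative_tolerance):
    """MSE obtained when every state is off by relative_tolerance of its share."""
    return mse([s * (1 + relative_tolerance) for s in steady], steady)


def can_exit(visit_counts, steady, threshold):
    """True once the observed state distribution is close enough to steady state."""
    total = sum(visit_counts)
    current = [v / total for v in visit_counts]
    return mse(current, steady) < threshold


steady = [0.42, 0.21, 0.37]
threshold = mse_threshold(steady, 0.04)
print(threshold)                                  # about 0.000191
print(can_exit([60, 20, 20], steady, threshold))  # False: far from steady state
print(can_exit([42, 21, 37], steady, threshold))  # True
```

Because the state sequence is drawn with a random number generator, the number of intervals needed before `can_exit` succeeds varies from run to run, as the last line of the slide notes.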

  53. Back Up: Shuffle NoC
     (diagram: 16 nodes labelled 4, 9, 8, 12, 2, 3, 1, 6, 14, 10, 7, 5, 13, 15, 11, 0)

  54. Back Up: Shuffle NoC
     (same diagram)
     Ring topology; a maximum of 2 hops is needed.

  55. Triangle Score
     (diagram: a triangle whose corners are cache coherence, time-varying behaviour and speed)

  56. NoC Simulation Methodologies
     (same triangle, scoring trace, full-system and traffic-pattern simulation on cache coherence, time-varying behaviour and speed)

  57. SynFull
     (same triangle, scoring SynFull on cache coherence, time-varying behaviour and speed)
