Synthetic Full System Traffic Models Capturing Cache Coherence Behaviour Mario Badr | Natalie Enright Jerger
Amateur Photography http://sevennine.net/archives/2009/10/01/torontos-skyline-at-night/ 3
Photography 101 4
Hundreds of Pictures 5
SynFull 6
Pictures for Facebook 7
What Does SynFull Do? • Model real application traffic to the NoC • Generate realistic traffic synthetically for the NoC • Iterate over several NoC designs quickly Tool Available for Download 8
SynFull’s Goals • Generic – Current and future applications – 16 different benchmarks • Accurate – Comparable performance metrics – 10.5% error • Fast – Faster than full system and traces – 52x speed up 9
NoC Simulation Methodologies • Full System • Traces • Traffic Patterns 10
Full System Simulation [Diagram: Full System Simulator (Application, Processor, Cache, Disk, Other Components) connected to the NoC Simulator; labels: Packets Sent, Packets Arrived, Feedback!] Accurate But Slow 11
Trace Simulation [Diagram: Trace Simulator feeding a Trace to the NoC Simulator; labels: Packets Sent, Application, Processor, Cache, Disk, Other, NoC A, NoC B] Faster But Less Accurate 12
Traffic Patterns • Uniform Random • Bit Complement • Bit Reverse • Bit Rotation • Shuffle • Transpose • Tornado • Neighbour [Diagram: Synthetic Traffic Driver feeding a Traffic Pattern to the NoC Simulator; labels: Application, NoC] Very Fast But Inaccurate 13
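The bit-permutation patterns listed on this slide map each source node to one fixed destination. A minimal sketch, assuming the commonly used textbook definitions and a node count that is a power of two; the function names are illustrative and not taken from SynFull or any particular simulator.

```python
def bit_complement(src: int, bits: int) -> int:
    """Destination is the bitwise complement of the source address."""
    return ~src & ((1 << bits) - 1)


def bit_reverse(src: int, bits: int) -> int:
    """Destination is the source address with its bit order reversed."""
    dest = 0
    for i in range(bits):
        if src & (1 << i):
            dest |= 1 << (bits - 1 - i)
    return dest


def shuffle(src: int, bits: int) -> int:
    """Destination is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def bit_rotation(src: int, bits: int) -> int:
    """Destination is the source address rotated right by one bit."""
    return (src >> 1) | ((src & 1) << (bits - 1))


def transpose(src: int, bits: int) -> int:
    """Destination swaps the upper and lower halves of the address (bits must be even)."""
    half = bits // 2
    low_mask = (1 << half) - 1
    return ((src & low_mask) << half) | (src >> half)
```

For a 16-node network, `bits` is 4. Because the destination depends only on the source address, none of these patterns reacts to what the application or the coherence protocol is doing.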
The Opportunity Accuracy Speed 14
The Opportunity SynFull Accuracy Speed 15
Achieving the Goals • Accuracy: Synthetic Cache Coherence – Dependent Messages – Enable Research • Accuracy: Time-Varying Behaviour – Short and Long Bursts of Traffic • Speed: Convergence – Simulation length? 16
Motivating Cache Coherence [Figure panels: Shuffle; Fast Fourier Transform] Cache Coherence Affects Traffic Behaviour 17
Capturing Coherence Traffic • Example – MOESI Protocol – Can be adapted [Diagram: 3×3 grid of nodes numbered 1 to 9] 18
Capturing Coherence Traffic • Initiate Transaction • Store Miss – Source [Diagram: Store Miss at node 7] 19
Capturing Coherence Traffic • Store Miss – Source – Destination [Diagram: Directory at node 3] 20
Capturing Coherence Traffic • Store Miss • Forwarded Request – Destination [Diagram: Owner at node 1] 21
Capturing Coherence Traffic • Store Miss • Forwarded Request • Invalidations – Quantity – Destinations [Diagram: Invalidate nodes 2, 6] 22
Capturing Coherence Traffic • Store Miss • Forwarded Request • Invalidations • Acknowledgements [Diagram: ACKs, nodes 2, 6] 23
Capturing Coherence Traffic • Store Miss • Forwarded Request • Invalidations • Acknowledgements • Data Response [Diagram: Data to node 7] 24
Capturing Coherence Traffic • Store Miss • Forwarded Request • Invalidations • Acknowledgements • Data Response • Unblock [Diagram: Transaction Complete] 25
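The store-miss walkthrough on slides 19 to 25 can be written down as a list of dependent messages. This is a minimal sketch rather than SynFull's implementation: the `Message` type and function name are hypothetical, and it assumes that the directory sends the invalidations, that ACKs return to the requester, and that the unblock goes back to the directory once data and all ACKs have arrived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    kind: str
    src: int
    dst: int
    depends_on: tuple = ()  # indices of messages that must arrive first


def store_miss_transaction(requester, directory, owner, sharers):
    """Build the dependent message sequence for one store miss."""
    msgs = [Message("StoreMiss", requester, directory)]            # index 0
    msgs.append(Message("ForwardedRequest", directory, owner, (0,)))  # index 1

    inv_ids = []
    for sharer in sharers:
        inv_ids.append(len(msgs))
        msgs.append(Message("Invalidate", directory, sharer, (0,)))

    ack_ids = []
    for inv_id, sharer in zip(inv_ids, sharers):
        ack_ids.append(len(msgs))
        # Assumption: each sharer acknowledges directly to the requester.
        msgs.append(Message("Ack", sharer, requester, (inv_id,)))

    data_id = len(msgs)
    msgs.append(Message("Data", owner, requester, (1,)))

    # The transaction completes only after the data and every ACK arrive.
    msgs.append(Message("Unblock", requester, directory,
                        tuple(ack_ids) + (data_id,)))
    return msgs


# Values from the slides: requester 7, directory 3, owner 1, sharers 2 and 6.
transaction = store_miss_transaction(7, 3, 1, [2, 6])
```

The `depends_on` field is what separates this from a plain trace: a message is injected only after the messages it depends on have been delivered, so network latency feeds back into injection timing.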
Time-Varying Behaviour [Figure labels: Barrier, Barrier] Initiating Transactions and Sharing Patterns Can Change 26
Time-Varying Behaviour: Fluidanimate Benchmark [Plot: Packets Injected (y-axis, Low to High) per Time Bin (x-axis, 500,000 cycles per bin); bins annotated H (high) or L (low)] Applications go through phases 27
Modelling Time-Varying Behaviour • Create and group phases – Clustering • Transition from one phase to another – Markov Chains 28
Dividing Into Intervals Intervals are a fixed size 29
Dividing Into Intervals Visually we see: High, Low + High, and Low Intervals 30
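Grouping fixed-size intervals into phases can be illustrated with a small clustering routine. The slides do not name the clustering algorithm, so this sketch substitutes a plain one-dimensional k-means over packets injected per interval; the function name, the initialisation and the sample counts are all assumptions for illustration.

```python
def cluster_intervals(counts, k, iterations=20):
    """Assign each interval to one of k phases by its injected-packet count."""
    lo, hi = min(counts), max(counts)
    if k > 1:
        # Deterministic start: spread the centres evenly between min and max.
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centers = [sum(counts) / len(counts)]

    labels = [0] * len(counts)
    for _ in range(iterations):
        # Assignment step: nearest centre wins.
        labels = [min(range(k), key=lambda c: abs(x - centers[c]))
                  for x in counts]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [x for x, label in zip(counts, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical packets-injected counts for seven intervals.
interval_counts = [900, 950, 100, 120, 880, 90, 110]
phase_labels, phase_centers = cluster_intervals(interval_counts, k=2)
```

Intervals with the same label are treated as the same phase, so the model only needs to store one traffic description per phase plus the order in which phases occur.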
Phase Transitions: Markov Chains [Diagram: state-transition graph with edge probabilities 17%, 100%, 83%, 45%, 55%] Edge labels are P[Next State | Current State] 31
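A Markov chain over phases can be sampled one interval at a time. In this sketch the state names and the assignment of the slide's percentages to particular edges are assumptions, since the extracted slide does not show which edge carries which probability.

```python
import random

# Hypothetical transition table: P[next state | current state].
# The percentages appear on the slide; their placement here is assumed.
PHASE_TRANSITIONS = {
    "high":  {"high": 0.17, "mixed": 0.83},
    "mixed": {"low": 1.00},
    "low":   {"low": 0.45, "high": 0.55},
}


def next_phase(current, transitions, rng):
    """Draw the next phase given the current one."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def generate_phases(start, transitions, length, seed=0):
    """Generate a phase sequence of the requested length."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    sequence = [start]
    while len(sequence) < length:
        sequence.append(next_phase(sequence[-1], transitions, rng))
    return sequence
```

Calling `generate_phases("high", PHASE_TRANSITIONS, 20)` yields one phase label per interval; a traffic generator would then inject packets according to the model stored for that phase.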
Traffic Comparison [Plots: Actual; Synthetic] Coarse Granularity → Average Behaviour 32
Capturing Short Bursts • Macro Level – 100,000s of Cycles – Long phases – Outer-Loops • Micro Level – 100s of Cycles – Short Bursts – Inner-Loops Hierarchical Model 33
Modelling Parameters • Model accuracy affected by: – Interval Size – Interval Similarity – Number of Clusters See Paper for Parameter Sweep & Recommendations 34
Creating The Models [Diagram labels: Application, Processor, Cache, Disk, Other, Ideal Network, Ideal Trace] 35
Creating The Models [Diagram labels: Application, Processor, Cache, Disk, Other, Ideal Network, Ideal Trace, Parameters, SynFull Modelling, Model] 36
Creating The Models [Diagram labels: Application, Processor, Cache, Disk, Other, Ideal Network, Ideal Trace, Parameters, SynFull Modelling, Model, Traffic Generator, NoCs] 37
Evaluation Methodology
Network            meshDOR      meshADAP         fbfly
Topology           Mesh         Mesh             Flattened Butterfly
Channel Width      8 bytes      4 bytes          4 bytes
Virtual Channels   2 per port   2 per port       4 per port
Routing            XY           Adaptive YX-XY   UGAL
• 16 Out-of-Order Cores • MOESI Protocol • 16 Benchmarks (Splash-2, PARSEC) • Traces with Dependencies (Comparison) 38
Packet Latency Error [Bar chart: Geomean Error Percentage (0% to 100%, lower is better) for Trace Dependency and SynFull on meshDOR, meshADAP and fbfly] No Throttling For Initiating Transactions 39
Distribution Error [Bar chart: Geomean of Hellinger Distances (0.00 to 0.20, lower is better) for Trace Dependency and SynFull on meshDOR, meshADAP and fbfly] Captures Congestion 40
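The Hellinger distance used on this slide compares two probability distributions and ranges from 0 (identical) to 1 (no overlap). A minimal sketch of the standard discrete formula; how the latency histograms are binned is not stated on the slide, so the helper names and the idea of normalising raw bin counts are assumptions.

```python
import math


def normalise(histogram):
    """Turn raw bin counts into a probability distribution."""
    total = sum(histogram)
    return [count / total for count in histogram]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)
```

To compare a synthetic run against a full-system run, both packet-latency histograms would be built over the same bins, passed through `normalise`, and then given to `hellinger`.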
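What About Speed? [Diagram: two-state Markov chain, states 1 and 2] Markov Probability Matrix 41
What About Speed? [Diagram: two-state Markov chain, states 1 and 2, with steady-state probabilities 56% and 44%] Markov Probability Matrix converges after a while… 42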
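The convergence this slide refers to is the chain's stationary distribution, which can be found by repeatedly multiplying a distribution by the transition matrix. The slide's actual matrix is not recoverable from the extraction, so the matrix below is hypothetical and was chosen only because its stationary distribution reproduces the 56% and 44% shown.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: apply pi <- pi * P until pi stops changing."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical two-state matrix; row i holds P[next state | current state i].
P = [
    [0.78, 0.22],
    [0.28, 0.72],
]
steady = stationary_distribution(P)
```

Once the fraction of time spent in each state settles at these values, running the chain longer adds little new information, which is what allows a simulation to stop early.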
Speed Up [Bar chart: Speed Up (y-axis, 0 to 60) for Trace Dependency, SynFull and SynFull (SS); bar labels as extracted: 52, 27, 24] 52x Speed Up With 11.7% Error 43
Conclusion • Implemented Synthetic Traffic Models that are – Accurate: 10.5% error – Fast: Over 50x average speed up – Generic: SynFull works for many applications 44
Try SynFull Out: http://www.eecg.toronto.edu/~enright/items/synfull_download.html Thank you for listening! QUESTIONS & ANSWERS 45
Back Up: Design Space Exploration [Plot: Average Packet Latency (0 to 60) against Buffer Size (Number of Flits: 2, 4, 8, 16) for Full System and SynFull] Same Conclusion, Less Time 46
Back Up: meshDOR Packet Latency [Bar chart: Avg. Packet Latency (0 to 140) for Full System, Trace Dependency and SynFull] 47
Back Up: fbfly Packet Latency [Bar chart: Avg. Packet Latency (0 to 140) for Full System, Trace Dependency and SynFull] 48
Back Up: meshADAP Packet Latency [Bar chart: Avg. Packet Latency (0 to 140) for Full System, Trace Dependency and SynFull] 49
Back Up: Average Throughput [Bar chart: Geomean Error Percentage (0% to 20%) for meshDOR, meshADAP and fbfly; bar labels as extracted: 16%, 12%, 12%] 50
Back Up: Speed Up Per Application [Bar chart: Average Speed Up (0 to 160) for Trace Dependency, SynFull and SynFull (SS)] Averaged Over 3 Runs (Different NoCs) 51
Back Up: Steady State
Before Simulation
Steady State   Acceptable +/-
42%            1.68%
21%            0.84%
37%            1.48%
MSE 0.000191
During Simulation
Steady State   Current +/-
42%            ?%
21%            ?%
37%            ?%
If < MSE: Exit
Current +/- Depends on State Transitions (RNG) and MSE 52
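The numbers on this slide are consistent with a 4% relative tolerance on each steady-state probability (1.68% is 4% of 42%, and so on), and with an MSE threshold equal to the mean of the squared acceptable deviations. That reading is an inference from the slide's figures, not a documented rule, and the function names below are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mse_threshold(steady_state, tolerance=0.04):
    """Threshold computed before simulation from the acceptable +/- per state."""
    return sum((tolerance * p) ** 2 for p in steady_state) / len(steady_state)


def reached_steady_state(visit_counts, steady_state, tolerance=0.04):
    """During simulation: compare the observed state mix against steady state."""
    total = sum(visit_counts)
    current = [count / total for count in visit_counts]
    return mse(current, steady_state) < mse_threshold(steady_state, tolerance)


STEADY_STATE = [0.42, 0.21, 0.37]            # values from the slide
THRESHOLD = mse_threshold(STEADY_STATE)      # about 0.000191, as on the slide
```

During a run, `visit_counts` would hold how many intervals have been spent in each state so far; the simulation can exit the first time `reached_steady_state` returns true. How soon that happens depends on the random state transitions and on how tight the threshold is.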
Back Up: Shuffle NoC [Diagram: 16 nodes, labels as extracted: 4 9 8 12 2 3 1 6 14 10 7 5 13 15 11 0] 53
Back Up: Shuffle NoC [Diagram: 16 nodes, labels as extracted: 4 9 8 12 2 3 1 6 14 10 7 5 13 15 11 0] Ring Topology; Max. 2 Hops Needed 54
Triangle Score [Triangle chart axes: Cache Coherence, Time Varying, Fast] 55
NoC Simulation Methodologies [Triangle chart axes: Cache Coherence, Time Varying, Fast; plotted: Trace, Full System, Traffic Pattern] 56
SynFull [Triangle chart axes: Cache Coherence, Time Varying, Fast] 57