dist-gem5: Distributed Simulation of Compute Clusters
Mohammad Alian, Umur Darbaz, Gabor Dozsa, Stephan Diestelhorst, Daehoon Kim, Nam Sung Kim
University of Illinois Urbana-Champaign; ARM Ltd., Cambridge, UK

Outline
• motivation
  o accelerating large-scale simulation
• dist-gem5 architecture
  o packet forwarding
  o synchronization
  o checkpointing
  o network model
• evaluation
  o validation, speedup, synchronization overhead
• conclusion
what is gem5 | dist-gem5 architecture | evaluation | conclusion

What is gem5 – overview
• full-system, cycle-level, event-driven simulator
• used and maintained at universities and in industry
[Figure: block diagram of gem5 components, grouped as ARM ISA support (ARMv7a, ARMv8), GPU models (NoMali), CPU models (Atomic, Timing, In Order, Out of Order, KVM), simulation support, interconnect (snoop filter, crossbar, bridges), core integrated IP, memory (Flash, DRAM, HMC), and IO components.]

Why dist-gem5?
• performance and power dissipation of a distributed system
  o complex interplay among system components at scale
• need a full-system, cycle-level simulator that is fast enough to simulate a large-scale computer system
• distributed simulation: simulate a distributed system with many simulation hosts
[Figure: performance and power at the center, surrounded by the interacting factors: scale, devices, OS, memory, cores, network, caches, ISAs.]

dist-gem5 architecture – high-level view
• gem5 processes modeling full systems run in parallel on a cluster of physical machines
• simulated network switch
  o forwards packets among the simulated systems
  o synchronizes the distributed simulation
  o simulates the network topology
[Figure: hosts #1 to #3 are physical machines, each running a gem5 process that models one simulated system (#1 to #3); host #4 runs the simulated network switch.]

dist-gem5 architecture – core components
• packet forwarding
• synchronization
• distributed check-pointing
• simulated network

dist-gem5 architecture – packet forwarding
• physical setup: physical hosts #1 to #3 attach through phys NIC#1 to NIC#3 to phys port1 to port3 of a physical switch
• simulated setup: gem5 #1 on physical host #1 runs simulated system #1 and gem5 #2 on physical host #2 runs simulated system #2, each with a sim NIC; gem5 #3 on physical host #3 runs the simulated switch with sim port0 and sim port1
• simulated packets are embedded into host TCP/IP packets
[Figure: a sim pkt leaves a sim NIC, is wrapped in a TCP packet, crosses the phys NICs and the physical switch, and is unwrapped at a sim port of the simulated switch.]

Asynchronous processing of incoming messages
• simulation thread (main thread)
  o processes and inserts events in the event queue
  o on a send pkt event, encapsulates the simulated Ethernet packet in a message and sends it out
• receiver thread
  o created for each gem5 process
  o waits for incoming packets
  o creates a recv pkt event and inserts it into the event queue
[Figure: a gem5 process on a physical host; the simulation thread and the receiver thread share the eventQ, and send pkt / recv pkt messages pass through the phys NIC.]

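The two-thread split on this slide can be sketched in plain Python. This is only an illustration under stated assumptions, not gem5 code: `EventQueue`, `receiver_thread`, and `run_demo` are hypothetical names, a `queue.Queue` stands in for the TCP socket, and each message is assumed to carry its delivery tick.

```python
import heapq
import itertools
import queue
import threading


class EventQueue:
    """Thread-safe tick-ordered event queue (hypothetical, not gem5's eventQ)."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        self._seq = itertools.count()  # tie-breaker for events on the same tick

    def insert(self, tick, name, payload):
        with self._lock:
            heapq.heappush(self._heap, (tick, next(self._seq), name, payload))

    def pop(self):
        with self._lock:
            tick, _, name, payload = heapq.heappop(self._heap)
            return tick, name, payload

    def __len__(self):
        with self._lock:
            return len(self._heap)


def receiver_thread(channel, event_q):
    """Wait for incoming messages and turn each one into a recv pkt event."""
    while True:
        msg = channel.get()          # blocks until a message arrives
        if msg is None:              # sentinel: the channel was closed
            break
        deliver_tick, packet = msg   # assumption: message carries its delivery tick
        event_q.insert(deliver_tick, "recv_pkt", packet)


def run_demo():
    channel = queue.Queue()          # stands in for the host TCP connection
    event_q = EventQueue()
    rx = threading.Thread(target=receiver_thread, args=(channel, event_q))
    rx.start()
    # A peer sends two packets; they arrive out of tick order.
    channel.put((2000, b"pkt-B"))
    channel.put((1000, b"pkt-A"))
    channel.put(None)
    rx.join()
    # The simulation thread drains events in tick order.
    return [event_q.pop() for _ in range(len(event_q))]
```

The lock around the heap is the design point: the receiver thread can insert events at any wall clock time without the simulation thread having to poll the socket.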
Need for synchronization
• receiver gem5 can run ahead of sender gem5
  o physical host mismatch
  o different events to be processed
• slow down the receiver gem5 to ensure simulation accuracy
• quantum-based synchronization
[Figure: wall clock timeline for gem5#0 and gem5#1 showing send time, simulated network delay, expected delivery time, recv time, and a late packet arrival.]

Accurate packet forwarding
• quantum: interval for periodic synchronization, in simulated time
• sync-event flushes inter-gem5 communication channels
• if quantum ≤ simulated link delay:
  o expected delivery tick falls inside the next quantum
• optimal quantum size for accurate forwarding == simulated link delay
[Figure: wall clock timelines for gem5#0 and gem5#1 with global sync points one quantum apart; a packet sent in one quantum arrives before its expected delivery time, which is send time plus simulated network delay.]

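The quantum rule on this slide can be checked with a little integer arithmetic. The sketch below is an illustration only: the function names are hypothetical, ticks are plain integers, and it assumes that every packet sent during a quantum has reached the receiver by the sync point that ends that quantum.

```python
def next_sync_tick(send_tick, quantum):
    """First global sync point strictly after the quantum containing send_tick."""
    return (send_tick // quantum + 1) * quantum


def delivery_is_safe(send_tick, link_delay, quantum):
    """True when the expected delivery tick is not before the next sync point.

    Under the assumption above, the receiver then cannot have simulated
    past the delivery tick before it holds the packet.
    """
    return send_tick + link_delay >= next_sync_tick(send_tick, quantum)


def check_all_sends(link_delay, quantum, horizon):
    """Check every possible send tick in [0, horizon)."""
    return all(delivery_is_safe(t, link_delay, quantum) for t in range(horizon))


def lands_in_next_quantum(send_tick, link_delay, quantum):
    """True when delivery falls in the quantum right after the sending one."""
    return (send_tick + link_delay) // quantum == send_tick // quantum + 1
```

With `link_delay = 10`, a quantum of 10 or 5 passes `check_all_sends`, while a quantum of 15 fails (a packet sent at tick 0 is due at tick 10, before the sync at tick 15). Choosing the quantum equal to the link delay keeps forwarding accurate with the fewest sync points.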
dist-gem5 architecture – network modeling
[Figure: an aggregate switch connects top of rack switches #0 to #7; each top of rack switch serves eight servers (servers #0 to #7 under switch #0, #8 to #15 under switch #1, up to #56 to #63 under switch #7). The switches carry the label "simulate in one gem5 process".]

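The rack layout on this slide (64 servers, eight per top of rack switch, one aggregate switch) can be written down as a small lookup. This is a sketch of the topology only; the names `rack_of` and `switch_path` and the switch labels are hypothetical and are not gem5 identifiers.

```python
SERVERS_PER_RACK = 8   # from the slide: eight servers under each top of rack switch
NUM_RACKS = 8          # from the slide: top of rack switches #0 to #7


def rack_of(server_id):
    """Index of the top of rack switch that serves server_id."""
    if not 0 <= server_id < SERVERS_PER_RACK * NUM_RACKS:
        raise ValueError(f"no such server: {server_id}")
    return server_id // SERVERS_PER_RACK


def switch_path(src, dst):
    """Switches a packet crosses on its way from server src to server dst."""
    src_rack, dst_rack = rack_of(src), rack_of(dst)
    if src_rack == dst_rack:
        return [f"tor{src_rack}"]            # stays inside one rack
    return [f"tor{src_rack}", "aggregate", f"tor{dst_rack}"]
```

For example, `switch_path(0, 7)` stays on `tor0`, while `switch_path(0, 56)` goes through `tor0`, the aggregate switch, and `tor7`.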
Configurable network model
• configurable baseline Ethernet switch model
  o port number, delay, bandwidth, buffer size
[Figure, left: switch internals with a MAC table, input ports IPORT#0 to IPORT#n, in-order queues In-orderQ#0 to In-orderQ#n, and output ports OPORT#0 to OPORT#n.]
[Figure, right: one gem5 process on a physical host holds the aggregate switch and top of rack switches #0 to #7 as simulated etherSwitch objects joined by simulated etherLink; ports p0 to p7 of each top of rack switch reach the servers through distEtherLink, and port p8 links to the aggregate switch.]

Methodology – simulation techniques
• example: simulating a cluster with 7 nodes and 1 network switch
  o dist-gem5
  o single-threaded-gem5
  o parallel-gem5
[Figure: the same eight simulated components (system#0 to system#6 plus the switch) mapped onto gem5 processes and quad core physical hosts under each of the three techniques.]

Methodology – experimental setup
• focus on off-chip network performance using network intensive applications
  o iperf, memcached, httperf, tcptest, netperf, NAS parallel benchmark
• verification/validation against:
  o single-threaded-gem5
  o physical cluster: 4 node cluster with AMD A10-5800K
• speedup comparison against:
  o single-threaded-gem5
  o parallel-gem5

category   gem5 configuration
core       O3; 4 cores; 4 way superscalar
memory     8GB DDR3 1600 MHz
network    Intel GbE NIC; 1 μs link latency
OS         Linux Ubuntu 14.04 (Kernel 4.3)

Verification
• same node/network config
  o dist-gem5 generates identical simulation statistics compared to single-threaded-gem5
  o different cluster sizes
[Figure: the dist-gem5 mapping (gem5 processes spread over quad core physical hosts) shown as equal to the single-threaded-gem5 mapping of the same systems and switch.]