IN5060 Performance in distributed systems Simulations Introduction - PowerPoint PPT Presentation


  1. IN5060 Performance in distributed systems Simulations

  2. Introduction: What is simulation? Wikipedia says: “Simulation is the imitation of the operation of a real-world process or system over time. The act of simulating something first requires that a model be developed; this model represents the key characteristics or behaviors/functions of the selected physical or abstract system or process. The model represents the system itself, whereas the simulation represents the operation of the system over time.”

  3. The Nature of Simulation
     § Most real-world systems are too complex for realistic models of them to be evaluated analytically. Such models are usually studied by means of simulation.
     § First, check whether the problem can be solved analytically; if it cannot, use simulation.
     § Simulation imitates the operation of a complex real-world system: a computer is generally used to evaluate a model numerically, and data are gathered to estimate the desired true characteristics of the model.

  4. Introduction
     § The system to be characterized may not be available
       − during the design or procurement stage
     § We still want to predict its performance
     § Or the system exists, but we want to evaluate a wide range of workloads
     → Simulation
     § However, simulations may fail
       − they need good programming, statistical analysis and performance evaluation knowledge

  5. IN5060 Approaches to simulation

  6. Terminology: Static and dynamic models
     § Time is not a variable: static
     [Figure: 1x1 square; static simulation to assess the area of a circle. Real value: $\pi/4 \approx 0.785$; this static simulation: 0.773]

  7. Terminology: Static and dynamic models
     § State changes with time: dynamic
     [Figure: queueing system with an arrival process (probability, interarrival time), a queue (queue length), a processor (processor occupancy) and a completion process (probability, interarrival time)]

  8. Terminology: State and state variables
     [Figure: queueing system with arrival process, queue, processor and completion process. A single quantity such as the queue length is labelled "State variable"; a snapshot of the whole system is labelled "State"]

  9. Terminology: State and state variables
     Systems can be simulated at a very high level of abstraction
     § artificial workloads
     § simplified work characteristics
     § removal of most system details
     § focus on a particular component
     [Figure: queueing system with arrival process, queue, processor and completion process]

  10. Terminology: State and state variables
      Systems can be simulated with a large amount of detail
      § replaying traces of real (measured or logged) workloads
      § using components of real-world systems
      § with fine-grained monitoring
      § in controlled environments
      [Figure: system snapshot]

  11. Terminology: Event
      § A change in system state
      § Easily explained when state is discrete
      § Examples:
        − arrival of job
        − beginning of new execution
        − departure of job
      [Figure: event sequence job arrival, job enqueue, job dequeue, processing start, processing end, job departure, with the annotations "job arrival == job enqueue event", "job dequeue event == processing start event" and "processing end event == job departure"]
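
The event notion above can be illustrated with a minimal discrete-event sketch of a single-server queue. It is only an illustration under assumptions that are not on the slide: exponentially distributed interarrival and service times, and the hypothetical names `simulate_queue`, `arrival_rate`, `service_rate` and `end_time`. The state variables (queue length, processor occupancy) change only when an event is processed.

```python
import heapq
import random


def simulate_queue(arrival_rate, service_rate, end_time, seed=None):
    """Single-server FIFO queue driven by an event list (illustrative sketch).

    Assumption: exponential interarrival and service times.
    Returns a log of (time, event kind, queue length, processor busy).
    """
    rng = random.Random(seed)
    # Future event list, ordered by event time.
    events = [(rng.expovariate(arrival_rate), "arrival")]
    queue_length = 0   # state variable: jobs waiting in the queue
    busy = False       # state variable: processor occupancy
    log = []
    while events:
        now, kind = heapq.heappop(events)
        if now > end_time:
            break
        if kind == "arrival":
            # Schedule the next arrival, then enqueue or start processing.
            heapq.heappush(events, (now + rng.expovariate(arrival_rate), "arrival"))
            if busy:
                queue_length += 1
            else:
                busy = True
                heapq.heappush(events, (now + rng.expovariate(service_rate), "departure"))
        else:
            # Departure: dequeue the next job if one is waiting.
            if queue_length > 0:
                queue_length -= 1
                heapq.heappush(events, (now + rng.expovariate(service_rate), "departure"))
            else:
                busy = False
        log.append((now, kind, queue_length, busy))
    return log


if __name__ == "__main__":
    for record in simulate_queue(0.8, 1.0, 10.0, seed=1):
        print(record)
```

Each log entry is a snapshot of the state right after one event; between two entries nothing changes, which is why the simulation can jump from event to event.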

  12. Terminology: Continuous-state or discrete-state models
      § Discrete state
        − State variables have a countable number of states from a finite or infinite range
        − Example: queue length over time
      § Continuous state
        − State variables have an uncountable and infinite number of states; the range may be limited, but the states are not countable
        − Example: water level over time
      Note: conceptually uncountable and infinite; because of the nature of computers, everything in a simulation is countable and finite

  13. Terminology: Continuous-time or discrete-time models
      § Discrete time
        − State is only defined at certain instants in time
        − Especially useful when the state space is very large
      § Continuous time
        − State is defined at all times
        − Different workload sizes may lead to different processing durations d(sz), leading to continuous-time departure processing

  14. Terminology: Deterministic or probabilistic models
      § If the output can be predicted with certainty → deterministic
      § If the output differs between repetitions → probabilistic
      [Figure: plots of output over input]
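
The distinction above can be shown with two toy models. Both functions and their coefficients are hypothetical and chosen only for illustration: the first returns the same output for the same input on every repetition, the second adds random noise, so repetitions differ.

```python
import random


def deterministic_model(x):
    # Same input always gives the same output.
    return 2.0 * x + 1.0


def probabilistic_model(x, rng):
    # Same input, but Gaussian noise makes each repetition differ.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)


rng = random.Random(42)
det_runs = [deterministic_model(3.0) for _ in range(5)]
prob_runs = [probabilistic_model(3.0, rng) for _ in range(5)]

print("deterministic:", det_runs)
print("probabilistic:", prob_runs)
```

For a probabilistic model, a single run says little; results are usually reported as statistics over many repetitions.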

  15. Terminology: Linear or non-linear models
      § Output is a linear combination of the input → linear
      § Otherwise → non-linear
      [Figure: plots of output over input, one linear and one non-linear]
      § Systems that are known to be linear can frequently be handled by analytical studies

  16. Terminology: Open and closed models
      − Input is external and independent → open
      − Model has no external input → closed
      − If the same jobs leave and re-enter the queue, the model is closed; if new jobs enter the system, it is open
      [Figure: two systems, each with a queue and a processor]

  17. Terminology: Stable and unstable models
      − Model output settles down → stable
      − Model output always changes → unstable
      [Figure: system with queue and processor; plots of queue length over time for the unstable and the stable case]
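
One way to see the difference is a discrete-time queue in which, per time step, a job arrives with some probability and a waiting job completes with some probability. This is a sketch under assumptions: the function `queue_length_trace` and the probability values are hypothetical, and "unstable" is illustrated here by a queue length that keeps growing.

```python
import random


def queue_length_trace(p_arrival, p_completion, steps, seed=None):
    """Return the queue length after each time step (illustrative sketch)."""
    rng = random.Random(seed)
    length = 0
    trace = []
    for _ in range(steps):
        # At most one arrival per step.
        if rng.random() < p_arrival:
            length += 1
        # At most one completion per step, only if a job is present.
        if length > 0 and rng.random() < p_completion:
            length -= 1
        trace.append(length)
    return trace


if __name__ == "__main__":
    stable = queue_length_trace(0.3, 0.5, 20_000, seed=1)
    unstable = queue_length_trace(0.5, 0.3, 20_000, seed=1)
    print("stable:   max length", max(stable), "final", stable[-1])
    print("unstable: max length", max(unstable), "final", unstable[-1])
```

When completions are more likely than arrivals, the queue length fluctuates around a small value; when arrivals dominate, it grows roughly linearly with time and never settles.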

  18. IN5060 Simulation platforms

  19. Simulation Platforms
      Platform categories: simulation language, general-purpose language, extended general-purpose language, simulation package
      Simulation language
      • Historical concept
      • Languages dedicated to simulation are quite outdated, but they have strongly inspired general-purpose languages
      • GPSS (General Purpose Simulation System, 1960)
        → CSMP III (Continuous System Modelling Program)
        → APL (A Programming Language, 1966)
        → Matlab (1984)
      • Simula (1962)
        → object-oriented programming in general, and
        → the Beta language in particular

  20. Simulation Platforms: General-purpose language
      • Frequently used in its pure form only for very small simulations, or to achieve extreme performance
      • Non-specific libraries fall into this category
        • MPI-2 for communication in high-performance computing is mostly used for very large-scale simulations
      • The borderline between general-purpose (GP) and extended GP languages is very fuzzy

  21. Simulation Platforms: Extended general-purpose language
      Comes in many forms:
      • language extensions dedicated to simulation
        • rare (e.g. extensions to SysML for simulation in 2017)
      • libraries dedicated to simulation
        • SIM.JS
        • SimPy
        • SystemC
      • tightly integrated scripting and general-purpose language
        • ns-2 (Tcl/Tk + C++ + library)
        • ns-3 (Python + C++ + library)
        • OMNeT++ (NED + C++ + library)

  22. Simulation Platforms: Simulation package
      Very common outside of discrete-state modeling
      • Matlab and Octave
      • VisualSim
      • Blender
      • many more

  23. Selecting a Simulation Language
      § Tradeoffs
        − Cost and flexibility
          • simulation languages require startup time to learn
          • general-purpose language extensions require startup time to learn
          • general-purpose languages may require a lot of code writing
          • packages may be feature-rich, offer visual presentation without extra effort, and make simple simulations quick to build
          • extending packages for special needs may be very hard

  24. Types of Simulations
      For people in networking, operating systems and distributed systems, the main types of simulation are:
      § Monte Carlo simulation
      § trace-driven simulation
      § discrete-event simulation
      § emulation

  25. Monte Carlo Simulation
      § A static simulation that has no time parameter
        − Runs until some equilibrium state is reached
      § Used to model physical phenomena, evaluate probabilistic systems, and numerically estimate complex mathematical expressions
      § Driven by a random number generator
        − the name “Monte Carlo” comes from the random draws in casinos

  26. Monte Carlo Simulation
      [Figure: 1x1 square; static simulation to assess the area of a circle. Real value: $\pi/4 \approx 0.785$; this static simulation: 0.773]
      § Make random draws from a pool of random numbers
        − here: (x,y), where x and y are randomly drawn from [0..1]
        − determine whether (x,y) is inside the circle: $x^2 + y^2 \le r^2$
        − count the fraction of draws that fall inside the circle
        − since the area of the square is 1, this fraction estimates the area of the circle
        − draw random numbers until the accuracy is satisfactory
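
The steps above can be sketched in a few lines. The sketch assumes a quarter circle of radius 1 centred at the origin, which is consistent with draws from [0..1] and a true value of π/4; the function name `estimate_quarter_circle_area` and the fixed number of draws are hypothetical choices.

```python
import math
import random


def estimate_quarter_circle_area(num_draws, seed=None):
    """Estimate the area of a quarter circle of radius 1 inside the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_draws):
        # Draw a random point (x, y) in the unit square.
        x = rng.random()
        y = rng.random()
        # Count the point if it lies inside the circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The square has area 1, so the fraction of hits estimates the area.
    return inside / num_draws


if __name__ == "__main__":
    print("true value:", math.pi / 4)
    for n in (200, 20_000, 200_000):
        print(n, "draws:", estimate_quarter_circle_area(n, seed=1))
```

The error of such an estimate shrinks roughly with the square root of the number of draws, so each additional digit of accuracy costs about a hundred times more draws.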

  27. Monte Carlo Simulation
      § Markov-Chain Monte-Carlo simulation
      [Figure: Markov chain with the states State 1 to State 5 and Finish; the edges are labelled with transition probabilities]
      § Markov chain: for each state, probability ranges are assigned to the alternative transitions to new states (usually including the option of staying in the current state)
      § in each round, a random number is drawn
      § the appropriate transition is taken
      § the movement through the state space is called a random walk
      § if the system converges, the probabilities of being in State N can be computed by repeated Monte-Carlo simulations of complete runs
      § Note: for simple converging models, a mathematical solution exists
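
The random walk described above can be sketched as follows. The transition table is hypothetical and much smaller than the chain in the slide's figure, whose exact edges cannot be recovered from this transcript; the function names are also illustrative. Each state maps to a list of (next state, probability) pairs, and one random number per round selects the transition by its probability range.

```python
import random

# Hypothetical chain (NOT the one from the slide's figure).
TRANSITIONS = {
    "State 1": [("State 1", 0.5), ("State 2", 0.3), ("Finish", 0.2)],
    "State 2": [("State 1", 0.4), ("State 2", 0.6)],
}


def step(state, rng):
    """Draw one random number and take the transition whose range contains it."""
    u = rng.random()
    cumulative = 0.0
    for next_state, probability in TRANSITIONS[state]:
        cumulative += probability
        if u < cumulative:
            return next_state
    return TRANSITIONS[state][-1][0]  # guard against rounding error


def random_walk(start, rng, max_steps=10_000):
    """One complete run: walk from start until Finish is reached."""
    path = [start]
    while path[-1] != "Finish" and len(path) <= max_steps:
        path.append(step(path[-1], rng))
    return path


def visit_fractions(num_runs, seed=None):
    """Fraction of rounds spent in each non-final state over many complete runs."""
    rng = random.Random(seed)
    counts = {}
    total = 0
    for _ in range(num_runs):
        for state in random_walk("State 1", rng):
            if state != "Finish":
                counts[state] = counts.get(state, 0) + 1
                total += 1
    return {state: count / total for state, count in counts.items()}


if __name__ == "__main__":
    print(visit_fractions(5_000, seed=7))
```

For this small chain the expected number of visits can also be solved by hand (5 visits to State 1 and 3.75 to State 2 per run), which matches the slide's note that simple converging models have a mathematical solution.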

  28. Trace-Driven Simulation
      § Uses a time-ordered record of events on a real system as input
      [Figure: one input trace feeding 4 simulated behaviours based on the trace]
      § Note: the trace needs to be independent of the system under test
        − This is very frequently forgotten!
        − For example, the arrival rate of packets in TCP depends on packet loss and RTT and cannot be simulated based on a recorded IP packet trace!
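
A minimal sketch of the idea: one recorded trace is replayed against several simulated system variants. The trace records, the `speed` parameter and the function `replay_trace` are hypothetical. The sketch is only valid under the independence condition stated above, because the arrival times in the trace are used unchanged no matter how fast the simulated processor is.

```python
def replay_trace(trace, speed):
    """Replay (arrival_time, work) records through a FIFO single-server queue.

    Returns the mean response time (departure time minus arrival time).
    Assumption: arrivals do not react to the simulated system.
    """
    server_free_at = 0.0
    total_response = 0.0
    for arrival_time, work in trace:
        # A job starts when it has arrived and the server is free.
        start = max(arrival_time, server_free_at)
        server_free_at = start + work / speed
        total_response += server_free_at - arrival_time
    return total_response / len(trace)


# Hypothetical time-ordered trace: (arrival time, amount of work).
TRACE = [(0.0, 2.0), (1.0, 2.0), (2.0, 1.0), (6.0, 3.0)]

if __name__ == "__main__":
    # Four simulated behaviours based on the same trace.
    for speed in (0.5, 1.0, 2.0, 4.0):
        print("speed", speed, "mean response time", replay_trace(TRACE, speed))
```

A TCP sender would violate the assumption: its packet arrivals would shift with the loss and delay produced by the simulated system, so a recorded arrival sequence could not simply be replayed.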
