CS626 Data Analysis and Simulation


  1. CS626 Data Analysis and Simulation
     Instructor: Peter Kemper, R 104A, phone 221-3462, email: kemper@cs.wm.edu
     Today: Recap before midterm 1

  2. Big Picture: Model-based Analysis of Systems
     [Diagram: a portion/facet of the real world is perceived and transferred into a real-world problem description; the description is transformed into a formal model (probability model, stochastic process, rewards, solution); qualitative and quantitative properties are analyzed, formally / computer-aided; the solution is presented and fed back as a decision, i.e., a solution to the real-world problem.]

  3. Reminder
     This is no pipe! ... and this is no serpentine accumulator in a production line!
     [Images of a pipe and of a serpentine accumulator: a picture, like a model, is not the thing itself.]

  4. System - Model - Study
     Model vs. System
     - Model: a largely simplified formal/mathematical/stochastic model, implemented in software in a fully controlled environment
     - System: a set of physical devices interacting in space-time in a largely uncontrolled, not fully understood environment
     Model
     - includes some of the rules of how the system operates, excludes others
     - includes some aspects of the real world as random variables; ignores others or assumes them constant
     - is parameterized with respect to certain design variables
     Study
     - has an objective, a clear question
     - delivers values that are probabilities, like R(0,t). Interpretation?
     - evaluates effects of different design choices

  5. CS 626 Topics
     From Data to Stochastic Input Models
     - Input Modeling: Probability, Distributions
     - Exploratory Data Analysis, Statistical Tests
     - Stochastic Processes, Markov Processes: DTMC, CTMC
     - Phase-type Distributions, MAPs, MAP Fitting
     - Tools: R for data analysis, KPC toolbox for MAP fitting
     Simulation Modeling
     - Simulation
     - Output Data Analysis
     - Verification, Validation
     - Trace-driven Simulation
     - Debugging of Simulation Models
     - Tools for simulation: Mobius (+ Traviando)
     Applications
     - Reliability Analysis, Dependability Modeling of a LEO Satellite
     - Modeling Traffic in Computer Networks
     - Emulation: Testing, Debugging, Training in Automated Material Handling Systems

  6. From Data to Stochastic Input Models: Probability
     - Axiomatic Definition
     - Frequentist Definition

  7. Frequency Definition of Probability
     If our experiment is repeated over and over again, the proportion of time that event E occurs approaches P(E).
     Frequency definition of probability: P(E) = lim_{m → ∞} m(E) / m, where m(E) is the number of times event E occurs and m is the number of trials.
     Notes:
     - The random experiment can be repeated under identical conditions.
     - If repeated indefinitely, the relative frequency of occurrence of an event converges to a constant.
     - The law of large numbers states that this limit does exist.
     - For small m, the relative frequency m(E)/m can show strong fluctuations.
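A quick illustration (mine, not from the slides) in R, the course's data-analysis tool: estimating P(heads) for a fair coin by relative frequency shows both the early fluctuations and the convergence described above.

```r
# Estimate P(E) for E = "coin shows heads" by relative frequency.
# The running proportion m(E)/m converges to 0.5 (law of large numbers).
set.seed(626)
m <- 10000
flips <- sample(c("H", "T"), size = m, replace = TRUE)   # m fair coin flips
rel_freq <- cumsum(flips == "H") / seq_len(m)            # m(E)/m after each trial
print(rel_freq[c(10, 100, 1000, 10000)])   # fluctuates early, near 0.5 for large m
```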

  8. Axiomatic Definition of Probability
     Definition: For each event E of the sample space S, we assume that a number P(E) is defined that satisfies Kolmogorov's axioms:
     1. 0 ≤ P(E) ≤ 1
     2. P(S) = 1
     3. For any sequence of mutually exclusive events E1, E2, ... (i.e., Ei ∩ Ej = ∅ for i ≠ j): P(E1 ∪ E2 ∪ ...) = P(E1) + P(E2) + ...

  9. Outline on Problem Solving (Goodman & Hedetniemi '77)
     Identify the sample space S
     - All elements must be mutually exclusive, collectively exhaustive.
     - All possible outcomes of the experiment should be listed separately.
       (Root of "tricky" problems: often ambiguity, an inexact formulation of the model of a physical situation.)
     Assign probabilities
     - To all elements of S, consistent with Kolmogorov's axioms.
       (In practice: estimates based on experience, analysis, or common assumptions.)
     Identify events of interest
     - Recast statements as subsets of S.
     - Use laws (algebra of events) for simplification.
     - Use visualizations for clarification.
     Compute desired probabilities (see the sketch below)
     - Use axioms and laws; often helpful: express the event of interest as a union of mutually exclusive events and sum up their probabilities.
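A small worked example of this four-step recipe, sketched in R (the dice experiment and the event are my own choice, not from the slide):

```r
# Step 1: sample space S = all 36 equally likely outcomes of rolling two dice.
S <- expand.grid(d1 = 1:6, d2 = 1:6)
# Step 2: assign probabilities, consistent with the axioms (uniform: 1/36 each).
p <- rep(1 / nrow(S), nrow(S))
# Step 3: event of interest E = "the sum of the dice is 7", as a subset of S.
E <- S$d1 + S$d2 == 7
# Step 4: P(E) = sum over the mutually exclusive outcomes in E.
sum(p[E])   # 6/36 = 1/6
```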

  10. More Relations
     What is the probability of a union of two events?
     P(E ∪ F) = P(E) + P(F) - P(EF)
     What is the probability of a union of a set of events? Inclusion-exclusion:
     P(E1 ∪ ... ∪ En) = Σ_i P(Ei) - Σ_{i<j} P(Ei Ej) + Σ_{i<j<k} P(Ei Ej Ek) - ... + (-1)^{n+1} P(E1 E2 ... En)
     Is there a better way to calculate this? The sum of disjoint products (SDP) formula rewrites the union as a union of mutually exclusive events, e.g. E1 ∪ E2 ∪ E3 = E1 ∪ (E1ᶜ E2) ∪ (E1ᶜ E2ᶜ E3), whose probabilities can simply be summed.
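A sketch in R that checks, by exact enumeration on two dice (the events E1, E2, E3 are my own examples), that inclusion-exclusion and the SDP form agree with the direct union probability:

```r
# Compare inclusion-exclusion and sum-of-disjoint-products for P(E1 u E2 u E3).
S <- expand.grid(d1 = 1:6, d2 = 1:6)
p <- rep(1 / nrow(S), nrow(S))
P <- function(ev) sum(p[ev])    # probability of an event (logical index vector)
E1 <- S$d1 == 6                 # first die shows 6
E2 <- S$d2 == 6                 # second die shows 6
E3 <- S$d1 + S$d2 >= 10         # sum is at least 10
incl_excl <- P(E1) + P(E2) + P(E3) -
  P(E1 & E2) - P(E1 & E3) - P(E2 & E3) + P(E1 & E2 & E3)
# SDP: E1, E1^c E2, E1^c E2^c E3 are mutually exclusive, so just sum.
sdp <- P(E1) + P(!E1 & E2) + P(!E1 & !E2 & E3)
c(incl_excl, sdp, P(E1 | E2 | E3))   # all three agree
```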

  11. Conditional Probabilities
     [Venn diagram: E given F happens; within F, only the overlap EF remains possible.]
     Definition: The conditional probability of E given F is P(E|F) = P(EF) / P(F) if P(F) > 0, and it is undefined otherwise.
     Interpretation: Given F has happened, only outcomes in EF are still possible for E, so the original probability P(EF) is scaled by 1/P(F).
     Multiplication rule: P(EF) = P(F) P(E|F).
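A small R example (events of my own choosing): computing a conditional probability by exact enumeration shows how conditioning rescales the probability.

```r
# P(sum = 8 | first die is even) on two fair dice, by enumeration.
S <- expand.grid(d1 = 1:6, d2 = 1:6)
p <- rep(1 / nrow(S), nrow(S))
P <- function(ev) sum(p[ev])
evE <- S$d1 + S$d2 == 8     # event E: sum is 8, P(E) = 5/36
evF <- S$d1 %% 2 == 0       # event F: first die even, P(F) = 1/2
P(evE & evF) / P(evF)       # P(E|F) = (3/36) / (18/36) = 1/6, not 5/36
```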

  12. Independent Events
     Definition: Two events E and F are independent if P(EF) = P(E) P(F).
     This also means P(E|F) = P(E) when P(F) > 0.
     In English: E and F are independent if knowledge that F has occurred does not affect the probability that E occurs.
     Notes:
     - If E, F are independent, then so are E, Fᶜ and Eᶜ, F and Eᶜ, Fᶜ.
     - Generalizes from 2 to n events; e.g., for n = 3 every subset must be independent.
     - Mutually exclusive is not the same as independent.
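A minimal R check (my own example events): on two fair dice, E = "first die is even" and F = "sum is 7" happen to be independent, and enumeration confirms P(EF) = P(E)P(F).

```r
# Exact check of independence by enumeration.
S <- expand.grid(d1 = 1:6, d2 = 1:6)
p <- rep(1 / nrow(S), nrow(S))
P <- function(ev) sum(p[ev])
evE <- S$d1 %% 2 == 0            # P(E) = 1/2
evF <- S$d1 + S$d2 == 7          # P(F) = 1/6
c(P(evE & evF), P(evE) * P(evF)) # both 1/12: independent
```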

  13. About Independent Events
     Venn diagrams, for events A, B with A, B ≠ ∅ and A, B ≠ S:
     1) If A ⊂ B, then A and B cannot be independent.
     2) If A ∩ B = ∅, then A and B cannot be independent.
     Tree diagrams of sequential sample spaces: throw a coin twice.
     [Tree: first throw H or T, then second throw H or T, giving leaves (H,H), (H,T), (T,H), (T,T).]
     The joint sample space is the cross product of the individual sample spaces; the first and second throws are independent.

  14. Joint and Pairwise Independence
     A ball is drawn from an urn containing four balls numbered 1, 2, 3, 4. For events such as A = {1,2}, B = {1,3}, C = {1,4} we have P(A) = P(B) = P(C) = 1/2 and P(AB) = P(AC) = P(BC) = 1/4, but P(ABC) = 1/4 ≠ 1/8 = P(A)P(B)P(C). The events are pairwise independent, but not jointly independent.
     A sequence of experiments results in either a success or a failure, where E_i, i ≥ 1, denotes a success in the i-th experiment. If for all i1, i2, ..., in: P(E_{i1} E_{i2} ... E_{in}) = P(E_{i1}) P(E_{i2}) ... P(E_{in}), we say the sequence of experiments consists of independent trials.
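A sketch in R verifying the pairwise-but-not-joint phenomenon; the events A = {1,2}, B = {1,3}, C = {1,4} are the standard textbook construction and are my assumption for what the slide's dropped formulas showed.

```r
# Four equally likely balls; events as subsets of 1:4.
S <- 1:4
P <- function(ev) length(ev) / length(S)
A <- c(1, 2); B <- c(1, 3); C <- c(1, 4)
c(P(intersect(A, B)), P(A) * P(B))     # 0.25, 0.25: pairwise independent
c(P(intersect(intersect(A, B), C)),    # P(ABC) = 0.25
  P(A) * P(B) * P(C))                  # product = 0.125: not jointly independent
```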

  15. Independence Is a Very Important Property
     Independence simplifies calculations significantly => a very popular assumption for theoretical results:
     - input modeling, workload modeling
     - statistical tests
     - output analysis of simulation models: confidence intervals for the estimate of the mean
     - ...
     Independence need not be present in real data:
     - data traffic in networks is often correlated
     - output data of a (simulated) system, i.e., the response of a system to some workload
     Ways to investigate independence (see the sketch below):
     - graphics: correlation plot
     - tests: chi-square test for vectors, rank von Neumann test, runs test
     - see Law/Kelton, Chapters 6.3 and 7.4.1
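A minimal sketch in R of the graphical check mentioned above: autocorrelation plots for an independent sample versus a correlated AR(1) sample (the AR(1) coefficient 0.8 is an arbitrary choice of mine).

```r
# Correlation plots to eyeball independence.
set.seed(626)
iid  <- rnorm(500)                         # independent data
corr <- as.numeric(stats::filter(rnorm(500), 0.8,
                                 method = "recursive"))  # AR(1): correlated
acf(iid,  main = "iid sample: spikes near zero beyond lag 0")
acf(corr, main = "AR(1) sample: slowly decaying autocorrelation")
```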

  16. Bayes' Formula
     Let F1, F2, ..., Fn be events of S, all mutually exclusive and collectively exhaustive.
     Theorem of total probability (also called the rule of elimination):
     P(E) = Σ_{i=1..n} P(E|Fi) P(Fi)
     Bayes' formula helps us determine which Fj happened, given we observed E:
     P(Fj|E) = P(E|Fj) P(Fj) / Σ_{i=1..n} P(E|Fi) P(Fi)
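A small R illustration of the two formulas (the scenario and all numbers are invented for illustration): a fault detector with a 95% hit rate and a 5% false-alarm rate monitoring a component that is faulty 1% of the time.

```r
# F1 = "component faulty", F2 = "component ok"; E = "alarm raised".
p_F <- c(0.01, 0.99)           # priors P(F1), P(F2): exclusive, exhaustive
p_E_given_F <- c(0.95, 0.05)   # P(E|F1) hit rate, P(E|F2) false-alarm rate
p_E <- sum(p_E_given_F * p_F)          # total probability: P(E) = 0.059
p_F_given_E <- p_E_given_F * p_F / p_E # Bayes: posterior over F1, F2
p_F_given_E                            # P(F1|E) ~ 0.161 despite the 95% hit rate
```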

  17. Random Variable (RV)
     Definition: A random variable X on a probability space (S, F, P) is a function X : S -> R that assigns a real number X(s) to each sample point s ∈ S, such that for every real number x, the set of sample points {s | X(s) ≤ x} is an event, that is, a member of F.
     RVs can be discrete or continuous.
     More concepts
     - cumulative distribution function
     - density
     - moments E[X^i], centralized moments, variance, skewness, kurtosis
     Particular examples
     - Normal distribution
     - Poisson distribution
     - Exponential distribution
     - Pareto distribution
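A quick R sketch of the listed concepts on a concrete RV (my choice: an exponential with rate 2, whose skewness is 2 and kurtosis is 9):

```r
# Sample versions of the moment-based summaries listed above.
set.seed(626)
x <- rexp(10000, rate = 2)            # exponential RV: mean 1/2, variance 1/4
m <- mean(x)
skew <- mean((x - m)^3) / sd(x)^3     # sample skewness, approx. 2
kurt <- mean((x - m)^4) / sd(x)^4     # sample kurtosis, approx. 9
c(mean = m, variance = var(x), skewness = skew, kurtosis = kurt)
plot(ecdf(x), main = "empirical CDF vs. F(x) = 1 - exp(-2x)")
curve(pexp(x, rate = 2), add = TRUE, col = "red")   # theoretical CDF
```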

  18. Parameterization of Distributions
     Parameters are of 3 basic types.
     Location
     - specifies an x-axis location point of a distribution's range of values
     - usually the midpoint (e.g., the mean of a normal distribution) or the lower end point of the distribution's range
     - sometimes called a shift parameter, since changing its value shifts the distribution to the left or right, e.g., for Y = X + γ
     Scale
     - determines the scale (unit) of measurement of the values in the range of the distribution (e.g., the standard deviation σ of a normal distribution)
     - changing its value compresses/expands the distribution but does not alter its basic form, e.g., for Y = βX
     Shape
     - determines the basic form/shape of a distribution
     - changing its value alters a distribution's properties (e.g., skewness) more fundamentally than a change in location or scale
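A minimal R illustration (my own): applying Y = βX + γ to a standard normal changes location and scale but not the basic shape.

```r
# Location and scale: Y = beta * X + gamma with beta = 2, gamma = 5.
set.seed(626)
x <- rnorm(10000)          # X ~ N(0, 1)
y <- 2 * x + 5             # shift by 5, stretch by 2; still normal in shape
c(mean(x), sd(x))          # approx. 0 and 1
c(mean(y), sd(y))          # approx. 5 and 2: location and scale changed
# Shape is different in kind: no location/scale change turns an
# exponential into a normal.
```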

  19. Properties of Mean, Variance and Covariance
     For a random variable X with density f_X, the distribution function is F_X(x) = P(X ≤ x) = ∫_{-∞}^{x} f_X(y) dy and the expected value is E(X) = ∫_{-∞}^{∞} y f_X(y) dy. For any random variables X, Y, Z and constants a, b, c:
     - E(cX) = c E(X)
     - E(X + Y) = E(X) + E(Y)
     - X, Y independent (P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y)): E(XY) = E(X) E(Y)
     - var(X) = E((X - E(X))²)
     - var(aX + b) = a² var(X)
     - covariance: cov(X, Y) = E((X - E(X))(Y - E(Y)))
     - var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)
     - correlation: ρ(X, Y) = cov(X, Y) / (σ_X σ_Y)
     - X, Y independent: cov(X, Y) = 0
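A numeric spot-check of two of these identities in R; note that the sample versions of var and cov (as computed by R, with n - 1 denominators) satisfy the variance-of-a-sum identity exactly, not just approximately.

```r
# Check var(X + Y) = var(X) + var(Y) + 2 cov(X, Y) and the correlation formula.
set.seed(626)
x <- rnorm(100000)
y <- 0.5 * x + rnorm(100000)                     # correlated pair
c(var(x + y), var(x) + var(y) + 2 * cov(x, y))   # identical
c(cor(x, y), cov(x, y) / (sd(x) * sd(y)))        # identical
```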

  20. Confidence Intervals for the Estimate of the Mean
     Proposition 2.4: X_1, ..., X_N are independently and identically distributed with expected value µ and variance σ². Then the (1 - α) confidence interval about the sample mean x̄ can be expressed as:
        x̄ - t_{N-1}(1 - α/2) s/√N  ≤  µ  ≤  x̄ + t_{N-1}(1 - α/2) s/√N
     where
     - t_{N-1}(1 - α/2) is the 100(1 - α/2)th percentile of the Student's t distribution with N - 1 degrees of freedom (values of this distribution can be found in tables),
     - s = √(s²) is the sample standard deviation,
     - N is the number of observations.
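A sketch in R (the data are simulated for illustration): computing the interval from the formula above with qt(), then cross-checking against t.test().

```r
# 95% confidence interval for the mean, by hand and via t.test.
set.seed(626)
x <- rnorm(30, mean = 10, sd = 2)      # N = 30 iid observations
N <- length(x); alpha <- 0.05
half <- qt(1 - alpha / 2, df = N - 1) * sd(x) / sqrt(N)  # t-percentile * s/sqrt(N)
c(mean(x) - half, mean(x) + half)      # interval from the formula
t.test(x, conf.level = 0.95)$conf.int  # same interval from t.test
```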
