

  1. Lecture 3. Fitting Distributions to Data - Choice of a Model. Igor Rychlik, Chalmers, Department of Mathematical Sciences. Probability, Statistics and Risk, MVE300, Chalmers, April 2013. Click on red text for extra material.

  2. Random variables and cdf. A random variable is a numerical outcome X, say, of an experiment. To describe its properties one needs to find its probability distribution F_X(x). Three approaches will be discussed: I. Use only the observed values of X (data) to model the variability of X, i.e. the normalized histogram and the empirical cdf, see Lecture 2. II. Try to find the proper cdf by means of reasoning; for example, the number of heads in 10 flips of a fair coin is Bin(10, 1/2). III. Assume that F_X belongs to a class of distributions of the form X = b + aY, for example with Y standard normal, and then choose the values of the parameters a, b that best fit the data.

  3. Case II - Example: Roll a fair die. The sample space is S = {1, ..., 6}; let the random variable K be the number shown. All results are equally probable, hence p_k = P(K = k) = 1/6. In 1882, R. Wolf rolled a die n = 20 000 times and recorded the number of eyes shown:

  Number of eyes k:   1     2     3     4     5     6
  Frequency n_k:      3407  3631  3176  2916  3448  3422

  Was his die fair? The χ² test, proposed by Karl Pearson (1857-1936), can be used to investigate this issue.

  4. Pearson's χ² test: Hypothesis H0: we claim that P("Experiment results in outcome k") = p_k, k = 1, ..., r. In our example r = 6 and p_k = 1/6. Significance level α: select the probability (risk) of rejecting a true hypothesis; the constant α is often chosen to be 0.05 or 0.01. Rejecting H0 at a lower α indicates stronger evidence against H0. Data: in n experiments one observed outcome k a total of n_k times. Test: estimate p_k by p_k* = n_k / n. Large distances p_k − p_k* make the hypothesis H0 questionable. Pearson proposed the following statistic to measure the distance:

  Q = \sum_{k=1}^{r} \frac{(n_k - n p_k)^2}{n p_k} = n \sum_{k=1}^{r} \frac{(p_k^* - p_k)^2}{p_k}.   (1)
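As a minimal sketch in Python (the function name pearson_q is our own, and numpy is assumed), the statistic (1) can be computed directly from the observed counts and the hypothesized probabilities:

    import numpy as np

    def pearson_q(counts, probs):
        # Pearson's statistic Q = sum_k (n_k - n*p_k)^2 / (n*p_k), eq. (1).
        counts = np.asarray(counts, dtype=float)
        probs = np.asarray(probs, dtype=float)
        n = counts.sum()              # total number of experiments
        expected = n * probs          # expected counts n*p_k under H0
        return np.sum((counts - expected) ** 2 / expected)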

  5. Details of the χ² test. How large should Q be to reject the hypothesis? Reject H0 if Q > χ²_α(f), where f = r − 1. Further, in order to use the test, as a rule of thumb one should check that n p_k > 5 for all k. Example 1: For Wolf's data, Q = 1.6280 + 26.5816 + 7.4261 + 52.2501 + 3.9445 + 2.3585 = 94.2. Since f = r − 1 = 5 and the quantile χ²_0.05(5) = 11.1, we have Q > χ²_0.05(5), which leads to rejection of the hypothesis of a fair die.¹ Example 2: Are children's birth months uniformly distributed? Data, Matlab code. ¹ Not rejecting the hypothesis does not mean that there is strong evidence that H0 is true. It is recommended to use the terminology "reject hypothesis H0" or "not reject hypothesis H0", but not to say "accept H0".
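As an illustration of the rejection step (a sketch assuming scipy is available; scipy's chisquare computes the same Q, taking equal probabilities as the default null hypothesis), the numbers of Example 1 can be reproduced:

    from scipy.stats import chisquare, chi2

    counts = [3407, 3631, 3176, 2916, 3448, 3422]  # Wolf's die data
    Q, _ = chisquare(counts)                       # Q approx. 94.2
    f = len(counts) - 1                            # degrees of freedom r - 1 = 5
    print(Q > chi2.ppf(1 - 0.05, f))               # True: Q > 11.07, reject H0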

  6. Case III - parametric approach to find F_X. The parametric estimation procedure for F_X contains three main steps: choice of a model; finding the parameters; analysis of error. ◮ Choose a model, i.e. select one of the standard distributions F(x) (normal, exponential, binomial, Poisson, ...). Then postulate that F_X(x) = F((x − b)/a). ◮ Find estimates (a*, b*) such that F_n(x) ≈ F((x − b*)/a*) (since F_X(x) ≈ F_n(x)). Here the method of moments for estimating the parameters is presented first; the more advanced, and often more accurate, maximum likelihood method will be presented in the next lecture.

  7. Moments of a r.v. - Law of Large Numbers (LLN). ◮ Let X_1, ..., X_k be a sequence of iid variables, all having the distribution F_X(x). Let E[X] be a constant, called the expected value of X:

  E[X] = \int_{-\infty}^{+\infty} x f_X(x)\,dx,  or  E[K] = \sum_k k\,p_k.

  ◮ If the expected value of X exists and is finite then, as k increases (we are averaging more and more variables), the average (1/k)(X_1 + X_2 + ... + X_k) ≈ E[X], with equality as k approaches infinity. ◮ Linearity property: E[a + bX + cY] = a + b E[X] + c E[Y]. Example 3.
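A small simulation (our own illustration, not from the lecture) shows the LLN at work: the running average of iid exponential variables with E[X] = 2 drifts toward 2 as more variables are averaged:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=100_000)        # iid with E[X] = 2
    running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
    print(running_mean[[9, 999, 99_999]])               # approaches E[X] = 2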

  8. Other moments. ◮ Let X_i be iid, all having the distribution F_X(x). Let us also introduce constants called the moments of X, defined by

  E[X^n] = \int_{-\infty}^{+\infty} x^n f_X(x)\,dx,  or  E[K^n] = \sum_k k^n p_k.

  ◮ If E[X^n] exists and is finite then, as k increases, the average (1/k)(X_1^n + X_2^n + ... + X_k^n) ≈ E[X^n]. ◮ The same holds for other functions of a random variable.
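The same simulation idea applies to higher moments (again an illustration of ours): for an exponential variable with mean a = 2 one has E[X²] = 2a² = 8, and the sample average of the squares approximates it:

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.exponential(scale=2.0, size=100_000)   # Exp(a) with a = 2
    print(np.mean(x ** 2))                         # approx. E[X^2] = 2*a^2 = 8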

  9. Variance, coefficient of variation. ◮ The variance V[X] and coefficient of variation R[X]:

  V[X] = E[X²] − E[X]²,  R[X] = \sqrt{V[X]} / E[X].

  ◮ If X, Y are independent then V[a + bX + cY] = b² V[X] + c² V[Y]. Example 4. ◮ Note that V[X] ≥ 0; if V[X] = 0 then X is a constant.
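As a quick sanity check of the independence rule (a simulation of our own), the empirical variance of a + bX + cY matches b²V[X] + c²V[Y]:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(0.0, 2.0, size=1_000_000)        # V[X] = 4
    Y = rng.exponential(scale=3.0, size=1_000_000)  # V[Y] = 9
    Z = 1.0 + 2.0 * X + 5.0 * Y                     # a = 1, b = 2, c = 5
    print(Z.var())                                  # approx. 2^2*4 + 5^2*9 = 241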

  10. Example: Expectations and variances. Example 5: Expected yearly wind energy production, on blackboard. Table of standard distributions (E = expectation, V = variance):

  Beta, Beta(a, b):  f(x) = Γ(a+b)/(Γ(a)Γ(b)) x^(a−1) (1−x)^(b−1), 0 < x < 1;  E = a/(a+b);  V = ab/((a+b)²(a+b+1))
  Binomial, Bin(n, p):  p_k = (n choose k) p^k (1−p)^(n−k), k = 0, 1, ..., n;  E = np;  V = np(1−p)
  First success:  p_k = p(1−p)^(k−1), k = 1, 2, 3, ...;  E = 1/p;  V = (1−p)/p²
  Geometric:  p_k = p(1−p)^k, k = 0, 1, 2, ...;  E = (1−p)/p;  V = (1−p)/p²
  Poisson, Po(m):  p_k = e^(−m) m^k / k!, k = 0, 1, 2, ...;  E = m;  V = m
  Exponential, Exp(a):  F(x) = 1 − e^(−x/a), x ≥ 0;  E = a;  V = a²
  Gamma, Gamma(a, b):  f(x) = b^a/Γ(a) x^(a−1) e^(−bx), x ≥ 0;  E = a/b;  V = a/b²
  Gumbel:  F(x) = e^(−e^(−(x−b)/a)), x ∈ R;  E = b + γa;  V = a²π²/6
  Normal, N(m, σ²):  f(x) = 1/(σ√(2π)) e^(−(x−m)²/(2σ²)), F(x) = Φ((x−m)/σ), x ∈ R;  E = m;  V = σ²
  Log-normal, ln X ∈ N(m, σ²):  F(x) = Φ((ln x − m)/σ), x > 0;  E = e^(m+σ²/2);  V = e^(2m+2σ²) − e^(2m+σ²)
  Uniform, U(a, b):  f(x) = 1/(b−a), a ≤ x ≤ b;  E = (a+b)/2;  V = (b−a)²/12
  Weibull:  F(x) = 1 − e^(−((x−b)/a)^c), x ≥ b;  E = b + aΓ(1+1/c);  V = a²(Γ(1+2/c) − Γ²(1+1/c))

  11. Method of moments to fit a cdf to data: ◮ When a cdf F_X(x) is specified, one can compute the expected value, variance, coefficient of variation and other moments E[X^k]. ◮ If the cdf F_X(x) = F((x − b)/a), i.e. depends on two parameters a, b, then the moments are also functions of the parameters: E[X^k] = m_k(a, b). ◮ The LLN tells us that, having independent observations x_1, ..., x_n of X, the average values \bar{m}_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k → E[X^k] as n → ∞. ◮ The method of moments recommends estimating the parameters a, b by the values a*, b* that solve the equation system m_k(a*, b*) = \bar{m}_k, k = 1, 2.

  12. Periods in days between serious earthquakes: Example 6. By experience we choose the exponential family F_X(x) = 1 − e^(−x/a). Since a = E[X], we choose a* = x̄ = 437.2 days. [Figure: left panel, histogram of the 62 observed times between earthquakes; right panel, the fitted exponential cdf compared with the empirical cdf - the two distributions are very close.] Is a = a*, i.e. is the error e = a − a* = a − 437.2 = 0?
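A sketch of the fit (the earthquake data set itself is linked, not printed, so synthetic draws stand in for the 62 observations):

    import numpy as np

    rng = np.random.default_rng(3)
    periods = rng.exponential(scale=437.2, size=62)  # stand-in for the real data
    a_star = periods.mean()                          # method-of-moments estimate of a
    x = np.linspace(0, 2000, 201)
    F_fitted = 1.0 - np.exp(-x / a_star)             # fitted cdf, to compare with the ecdf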

  13. Example 7: Poisson cdf. The following data set gives the number of motorcycle drivers killed in Sweden in 1990-1999: 39 30 28 38 27 29 38 33 33 36. Assume that the number of killed drivers per year is modeled as a random variable K ∈ Po(m) and that the numbers of killed drivers during consecutive years are independent and identically distributed. From the table we read that E[K] = m, hence the method of moments recommends estimating the parameter m by the average number m* = k̄, viz. m* = (39 + ... + 36)/10 = 33.1. Is m = m*, i.e. is the error e = m − m* = m − 33.1 = 0?
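In code, the estimate is just the sample mean of the yearly counts:

    import numpy as np

    killed = np.array([39, 30, 28, 38, 27, 29, 38, 33, 33, 36])  # 1990-1999
    m_star = killed.mean()   # method-of-moments estimate of m; equals 33.1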

  14. Gaussian model. Example 8. Since V[X] = E[X²] − E[X]², the LLN gives the following estimate of the variance:

  s_n^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big)^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \to V[X],  as n tends to infinity.

  We proposed to model the weight of a newborn baby X by the normal (Gaussian) cdf N(m, σ²). Since E[X] = m and V[X] = σ², the method of moments recommends estimating m, σ² by m* = x̄, (σ²)* = s_n². For the data, m* = 3400 g and (σ²)* = 570² g². Are m = m* and σ² = s_n², i.e. are the errors e_1 = m − m* = m − 3400 = 0 and e_2 = σ² − (σ²)* = σ² − 570² = 0?
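A sketch of the Gaussian fit (the birth-weight data are not printed, so synthetic draws stand in; note ddof=0, so that the variance divides by n, matching s_n² above):

    import numpy as np

    rng = np.random.default_rng(4)
    weights = rng.normal(3400.0, 570.0, size=1000)  # stand-in for the real data, in grams
    m_star = weights.mean()                         # estimate of m
    s2_n = weights.var(ddof=0)                      # s_n^2: divides by n, not n - 1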

  15. Weibull model. For environmental variables, the Weibull cdf often fits data well. Suppose that F_X(x) = 1 − exp(−(x/a)^c), where a is the scale parameter and c the shape parameter. Using the table we have that

  E[X] = a\,\Gamma(1 + 1/c),  R[X] = \frac{\sqrt{\Gamma(1 + 2/c) - \Gamma(1 + 1/c)^2}}{\Gamma(1 + 1/c)}.

  Method of moments: estimate the coefficient of variation by \sqrt{s_n^2}/\bar{x}, solve the second equation numerically for c*, see Table 4 on page 256, then a* = x̄ / Γ(1 + 1/c*). Example 9: Fitting a Weibull cdf to bearing lifetimes. Example 10: Fitting a Weibull cdf to wind speed measurements.
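A numerical alternative to the printed table (a sketch assuming scipy; the function name weibull_mom is our own): since R(c) decreases in c, the equation R(c) = s_n/x̄ can be solved by root finding:

    import numpy as np
    from scipy.optimize import brentq
    from scipy.special import gamma

    def weibull_mom(x):
        # Method-of-moments fit of F(x) = 1 - exp(-(x/a)^c).
        xbar = x.mean()
        cv = x.std(ddof=0) / xbar                    # sample coefficient of variation
        def R(c):                                    # theoretical R[X] as a function of c
            g1, g2 = gamma(1 + 1 / c), gamma(1 + 2 / c)
            return np.sqrt(g2 - g1 ** 2) / g1
        c_star = brentq(lambda c: R(c) - cv, 0.1, 50.0)  # solve R(c) = cv
        a_star = xbar / gamma(1 + 1 / c_star)            # then a* = xbar / Gamma(1 + 1/c*)
        return a_star, c_star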
