
Sampling Distribution Of The Variance pierre.douillet@ensait.fr



  1. WSC 2009: Sampling Distribution Of The Variance. pierre.douillet@ensait.fr, École Nationale Supérieure des Arts et Industries Textiles, Roubaix, France

  2. founded 1881

  3. www.douillet.info, WSC 2009, Ensait - Roubaix - France
     ⇒ • Well-Known Results (slide 4): notations, shape, chi-square statistic, batch mean method
     • Closed Form Results for $m_2$ (slide 9)
     • Experimental Results (slide 14)
     • Variations of the Sample Variance (slide 19)
     • Useful and Useless Statistics (slide 25)
     • Conclusions (slide 29)

  4. Well-Known Results: notations
     • random variable $\xi \in \Omega$ with pdf $\varphi(\xi)$
     • sample of size $n$: $\omega \in \Phi \doteq \Omega^n$, where the $x_i \in \omega$ are i.i.d.
     • $\mu = E(\xi)$, $\mu_2 = \sigma^2 = \mathrm{var}(\xi)$, $\mu_4 = E\left((\xi-\mu)^4\right)$
     • $m = E_\omega(x)$, $m_2 = s^2$, $m_4 = \frac{n}{n-1}\, E_\omega\left((x-m)^4\right)$
     • $E_\Phi(f)$, $\mathrm{var}_\Phi(f)$: $\Phi$-distribution of some $f(\omega)$, using the product measure.

  5. shape
     • mean, variance, shape (everything else)
     • centered moments of increasing index depend more and more on rare events
     • Fisher's skewness is $\gamma_1 \doteq E\left((\xi-\mu)^3\right)/\sigma^3$
     • usual values: $\gamma_1(\text{gauss}) = 0$, $\gamma_1(\chi^2_\nu) = \sqrt{8/\nu}$, $\gamma_1(\text{exp.}) = 2$
     • not bounded (e.g. lognormal)

  6. chi-square statistic
     • $A_0, A_1, \cdots, A_\nu$, partition of $\Omega$, $\forall j$: $p_j \doteq \Pr(\xi \in A_j) > 0$.
     • For a sample $\omega$, $n_j$ is the number of $x_i$ that belong to $A_j$:
       $\chi^2_{Pearson}(\omega) = \sum_{j=0}^{\nu} \frac{(n\,p_j - n_j)^2}{n\,p_j}$
     • without any other assumption,
       $E_\Phi\left(\chi^2_{Pearson}(\omega)\right) = \nu$ and $\mathrm{var}_\Phi\left(\chi^2_{Pearson}(\omega)\right) = 2\nu\,\frac{n-1}{n} + \frac{1}{n} \sum_{j=0}^{\nu} \left(\frac{1}{p_j} - \nu - 1\right)$
     • $\chi^2_{std} = \left(\chi^2_{Pearson} - \nu\right)/\sqrt{2\nu}$, even when the $\chi^2_{Pearson}$ statistic is not $\chi^2_\nu$ distributed
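
The two moment formulas of the chi-square slide can be checked exactly on a small case. The following is a minimal sketch, not the author's code: the function names, the three-cell partition and the sample size n = 4 are illustrative assumptions. It enumerates every possible sample, weights it by its probability, and compares the exact variance with the closed form above.

```python
from fractions import Fraction
from itertools import product


def pearson_moments(n, probs):
    """Exact mean and variance of Pearson's statistic over all samples of size n."""
    cells = range(len(probs))
    mean = Fraction(0)
    second = Fraction(0)
    for outcome in product(cells, repeat=n):
        weight = Fraction(1)          # probability of this ordered sample
        counts = [0] * len(probs)     # n_j for each cell A_j
        for cell in outcome:
            weight *= probs[cell]
            counts[cell] += 1
        stat = sum((n * p - c) ** 2 / (n * p) for p, c in zip(probs, counts))
        mean += weight * stat
        second += weight * stat * stat
    return mean, second - mean * mean


def closed_form_variance(n, probs):
    """Variance formula of the slide, with nu = number of cells - 1."""
    nu = len(probs) - 1
    return 2 * nu * Fraction(n - 1, n) + sum(1 / p - nu - 1 for p in probs) / n


# Hypothetical partition (nu = 2) and sample size, chosen only for illustration.
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
mean, var = pearson_moments(4, probs)
print(mean, var, closed_form_variance(4, probs))
```

Exact rational arithmetic is used so that the comparison is an equality rather than a tolerance check; the enumeration is only practical for very small n.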

  7. batch mean method
     • each result has been obtained with $N = 200000$ replications of the $n$-sized sample
     • containing rounding errors, allowing parallelization (with a suitable random generator)
     • estimation of the sd of the estimators (and checking for independence)
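
A batch-means estimate of this kind can be sketched as follows. This is an illustration under stated assumptions, not the setup used for the slides: the helper names, the seed, the 20 batches of 1000 replications and the choice of estimator (the sample variance of an R-uniform sample with a = 10, n = 5) are all hypothetical.

```python
import random
import statistics


def batch_means(estimator, n_batches, batch_size, rng):
    """Average an estimator per batch, then use the spread of the batch
    averages to estimate the standard deviation of the overall mean."""
    means = [
        statistics.fmean(estimator(rng) for _ in range(batch_size))
        for _ in range(n_batches)
    ]
    overall = statistics.fmean(means)
    std_err = statistics.stdev(means) / n_batches ** 0.5
    return overall, std_err


def sample_variance_uniform(rng, n=5, a=10.0):
    """m_2 = s^2 for one n-sized sample drawn uniformly from [-a, a]."""
    return statistics.variance([rng.uniform(-a, a) for _ in range(n)])


rng = random.Random(2009)  # fixed seed, assumed here for reproducibility
estimate, std_err = batch_means(sample_variance_uniform, 20, 1000, rng)
print(estimate, std_err)   # the population variance is a^2 / 3
```

Each batch is an independent unit, so batches could be computed in parallel as long as each worker gets its own suitably seeded generator.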

  8. ✓ • Well-Known Results (slide 4)
     ⇒ • Closed Form Results for $m_2$ (slide 9): normal distribution; normal law behaves abnormally; n=2, n=3; n=2, n=3, R-uniform $-a \le x \le a$
     • Experimental Results (slide 14)
     • Variations of the Sample Variance (slide 19)
     • Useful and Useless Statistics (slide 25)
     • Conclusions (slide 29)

  9. Closed Form Results for $m_2$: normal distribution
     • 200 000 samples ($n = 8$)
     • plot all the $m_2(\omega)$
     • well-known model: $\chi^2_7$
     • goodness of fit: $\chi^2_{Pearson} = 25.10$, $\chi^2_{std} = -1.28$
     [Figure: histogram of $m_2$; legend: obs, nor, chi; ⊕ = observed, solid = chi2(7)]

  10. normal law behaves abnormally
      • Random variates $m$ and $m_2$ are fully independent if and only if the sampled population $\Omega$ is normal. In such a case, $(n-1)\,m_2/\mu_2$ is $\chi^2_{n-1}$ distributed.
      • Most of the time, this is stated in the "Gaussian distribution" chapter of statistics books
      • It is almost never recalled in the "$\chi^2$" chapter...
      • full independence is the key property for $\chi^2$
      • $\chi^2$ is not a model, not even an approximate model, for the sample variance when $\Omega$ is not Gaussian.

  11. n=2, n=3
      • Very special situations, excluded from the general formulae that follow
      • A direct attack leads to:
        $pdf_2(m_2) = \sqrt{\frac{2}{m_2}} \int_{\mathbb{R}} \varphi(t)\, \varphi\left(t + \sqrt{2\,m_2}\right) dt$
        $pdf_3(m_2) = 4\sqrt{3} \int_{t=0}^{t=s} \int_{\mathbb{R}} \frac{\varphi(u-t)\, \varphi(u+t)\, \varphi\left(u + \sqrt{3\,m_2 - 3\,t^2}\right)}{\sqrt{m_2 - t^2}}\, du\, dt$
      • Applied to a Gaussian distribution, this leads back to $\chi^2_1$ and $\chi^2_2$
      • $m$ and $m_2$ are linearly but not fully independent

  12. n=2, n=3, R-uniform $-a \le x \le a$
      • $n = 2$, $0 \le m_2 \le 2a^2$, $pdf_2(m_2) = \frac{1}{a\sqrt{2\,m_2}} - \frac{1}{2a^2}$
      • $n = 3$, $0 \le s^2 = m_2 \le 4a^2/3$ and
        $pdf_3(m_2) = \frac{3\sqrt{3}}{a^2} \left(\frac{\pi}{6} - \frac{s}{2a}\right)$ for $0 < s < a$
        $pdf_3(m_2) = \frac{3\sqrt{3}}{a^2} \left(\arcsin\frac{a}{s} - \frac{\pi}{3} - \frac{s}{2a} + \sqrt{\frac{s^2}{a^2} - 1}\right)$ for $a < s$
      • the sample belongs to a cube; we have to measure the set of all the $\omega$ that share the same value of $m_2$; the shape, and therefore the description, changes when $\omega$ travels from center (hexagon) to corner (triangle).
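
As a sanity check on the two R-uniform closed forms, one can verify numerically that each density integrates to 1 over its support. The sketch below is an assumption-laden illustration: the function names, the value a = 10 and the midpoint rule are choices made here, and the integration is done in the variable s (with m_2 = s^2, so dm_2 = 2 s ds) to avoid the singularity of pdf_2 at m_2 = 0.

```python
import math


def pdf2_uniform(m2, a):
    """Density of m_2 for n = 2, x uniform on [-a, a], 0 < m2 <= 2 a^2."""
    return 1.0 / (a * math.sqrt(2.0 * m2)) - 1.0 / (2.0 * a * a)


def pdf3_uniform(m2, a):
    """Density of m_2 for n = 3, x uniform on [-a, a], 0 < m2 <= 4 a^2 / 3."""
    s = math.sqrt(m2)
    scale = 3.0 * math.sqrt(3.0) / (a * a)
    if s <= a:
        return scale * (math.pi / 6.0 - s / (2.0 * a))
    return scale * (math.asin(a / s) - math.pi / 3.0 - s / (2.0 * a)
                    + math.sqrt(s * s / (a * a) - 1.0))


def total_mass(pdf, a, s_max, steps=20000):
    """Midpoint rule for the integral of pdf(m2) dm2, written in s with dm2 = 2 s ds."""
    h = s_max / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += pdf(s * s, a) * 2.0 * s
    return total * h


a = 10.0
mass2 = total_mass(pdf2_uniform, a, a * math.sqrt(2.0))
mass3 = total_mass(pdf3_uniform, a, 2.0 * a / math.sqrt(3.0))
print(mass2, mass3)
```

Both values should be close to 1; the second branch of pdf_3 (corner region, a < s) carries only a small share of the total mass.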

  13. ✓ • Well-Known Results (slide 4)
      ✓ • Closed Form Results for $m_2$ (slide 9)
      ⇒ • Experimental Results (slide 14): Z-uniform; R-uniform; lognormal; Student-like t-statistic
      • Variations of the Sample Variance (slide 19)
      • Useful and Useless Statistics (slide 25)
      • Conclusions (slide 29)

  14. Experimental Results: Z-uniform
      [Figure: two histograms of $m_2$; legend: obs, nor, chi]
      • neither chi2 nor normal
      • $a = 10$, $n = 5$, # = 617

  15. R-uniform
      [Figure: two histograms of $m_2$; legend: obs, nor, chi]
      • $a = 10$, $n = 5$: $\gamma_1 \approx 0.40 \neq 1.41$
      • $a = 10$, $n = 8$: quite normal, $\gamma_1 \approx 0.27 \neq 1.07$

  16. lognormal
      [Figure: two histograms of $m_2$; legend: obs, nor, chi]
      • $M = 7$, $K = 2$, $\ln M = E(\ln \xi)$, $\ln K = \mathrm{var}(\ln \xi)$, $n = 8$
      • $m_2$, usual scale: $\gamma_1 \approx 39$
      • $m_2$, log scale: $\gamma_1 \approx 0.05$

  17. Student-like t-statistic
      [Figure: two histograms of $t$ over $[-4, 4]$; legend: obs, nor, stu]
      • R-uniform, $n = 5$: $t = (m - \mu)/s$, tail ≈ Student models
      • lognormal, $n = 8$: $t$ very skew, far away from Student models

  18. ✓ • Well-Known Results (slide 4)
      ✓ • Closed Form Results for $m_2$ (slide 9)
      ✓ • Experimental Results (slide 14)
      ⇒ • Variations of the Sample Variance (slide 19): expectation of products; experimental values of theoretical formulae; new results and proof of correctness; some applications
      • Useful and Useless Statistics (slide 25)
      • Conclusions (slide 29)

  19. Variations of the Sample Variance: expectation of products
      • estimation of monomials $\alpha \doteq \prod_j \mu_j^{\alpha_j}$ relative to $\Omega$, using monomials $\beta \doteq \prod_k m_k^{\beta_k}$ relative to $\omega$.
      • degree: $dg_m\,\beta \doteq \sum_k \beta_k$, the number of $m_k$ occurring in $\beta$; $dg_x\,\beta \doteq \sum_k k\,\beta_k$, the number of factors $x_i$ occurring in $\beta$.
      • $E_\Phi(\beta) \in \mathrm{Span}\{\alpha \mid dg_x\,\alpha = dg_x\,\beta\}$

  20. experimental values of theoretical formulae
      "Science is what we understand well enough to explain to a computer. Art is everything else we do." (Knuth)
      • for each $n$ in $[2, N]$, expand $\beta$ as a polynomial in the $x_1 \cdots x_n$
      • substitute each $x_i^j$ ($j > 1$) by $\mu_j$, and then each $x_i$ by 0
      • for each $n$, obtain a polynomial $P_n = \sum_\alpha c(n, \alpha) \times \alpha$, where $c(n, \alpha) \in \mathbb{Q}$
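
The expand-and-substitute steps can be sketched for the single monomial β = m_2^2. This is a minimal illustration, not the author's Maple implementation: the helper names and the dictionary representation of polynomials are assumptions, the population is taken as centered (so a lone x_i has expectation 0), and m_2 = s^2 is taken to be the usual unbiased sample variance.

```python
from collections import defaultdict
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(Fraction)
    for exps_p, coef_p in p.items():
        for exps_q, coef_q in q.items():
            key = tuple(e1 + e2 for e1, e2 in zip(exps_p, exps_q))
            out[key] += coef_p * coef_q
    return dict(out)


def sample_variance_poly(n):
    """s^2 = (sum x_i^2 - (sum x_i)^2 / n) / (n - 1) as a polynomial in x_1..x_n."""
    poly = defaultdict(Fraction)
    for i in range(n):
        exps = [0] * n
        exps[i] = 2
        poly[tuple(exps)] += Fraction(1, n)              # coefficient of x_i^2
        for j in range(i + 1, n):
            exps = [0] * n
            exps[i] = exps[j] = 1
            poly[tuple(exps)] -= Fraction(2, n * (n - 1))  # coefficient of x_i x_j
    return dict(poly)


def expectation(poly):
    """Replace each x_i^j (j > 1) by mu_j and each lone x_i by 0, using independence."""
    out = defaultdict(Fraction)
    for exps, coef in poly.items():
        if 1 in exps:
            continue  # a factor x_i alone has expectation 0 (centered population)
        alpha = tuple(sorted((e for e in exps if e > 1), reverse=True))
        out[alpha] += coef
    return {alpha: coef for alpha, coef in out.items() if coef != 0}


def expected_m2_squared(n):
    """Coefficients c(n, alpha) of E(m_2^2); key (4,) is mu_4, key (2, 2) is mu_2^2."""
    s2 = sample_variance_poly(n)
    return expectation(poly_mul(s2, s2))


for n in range(2, 6):
    print(n, expected_m2_squared(n))
```

The number of monomials grows quickly with n and with the degree of β, which is why the coefficients are computed for a range of n and a closed form is then fitted to the resulting values.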

  21. experimental values of theoretical formulae (2)
      • each $c(n, \alpha)$ has a closed form, a quotient of polynomials in $n$, whose degrees cannot exceed $dg_x\,\beta$
      • general algorithm AeqB, implemented as gfun (Maple)
      • each denominator is a divisor of $n^p (n-1)^q$ where $p + q + 2 = dg_x\,\beta$ and $q + 1 = dg_m\,\beta$.
      • closed form of the polynomial numerator from a list of values: divided differences (Newton)
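
The last step, recovering a polynomial numerator from a list of values, can be illustrated with Newton's divided differences. The sketch below is an assumption-based example rather than the gfun procedure: the function names are hypothetical, and the input values are the coefficient of $\mu_2^2$ in $E(m_2^2)$ for n = 2..5 (3/2, 1, 11/12, 9/10, from the classical variance-of-variance formula), multiplied by the denominator n(n-1).

```python
from fractions import Fraction


def divided_differences(xs, ys):
    """Newton divided-difference coefficients for the points (xs[i], ys[i])."""
    coef = [Fraction(y) for y in ys]
    for level in range(1, len(xs)):
        # Update in place from the end so lower-order differences stay available.
        for i in range(len(xs) - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x by a Horner-like scheme."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


ns = [2, 3, 4, 5]
# c(n) * n * (n - 1) for c(n) = 3/2, 1, 11/12, 9/10
numerators = [3, 6, 11, 18]
coef = divided_differences(ns, numerators)
print(coef)                       # a vanishing last coefficient signals degree 2
print(newton_eval(ns, coef, 10))  # value of the fitted numerator at n = 10
```

Here the Newton form is 3 + 3(n-2) + (n-2)(n-3), that is n^2 - 2n + 3; a trailing zero coefficient is the usual signal that enough values have been supplied.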

  22. new results and proof of correctness
      • Fisher (1929) started the process.
      • $n = 11$ now, $n = 12$ soon after Xmas (?)
      • Error-prone process...
      • Test: the determinant of all the $\beta$ over all the $\alpha$ of same $dg_x$ splits into linear factors.
        $\Delta_4 = \frac{(n-2)(n-3)}{n\,(n-1)}$
        $\Delta_{11} = \frac{(n-2)^{14} (n-3)^{12} (n-4)^{10} (n-5)^{7} (n-6)^{5} (n-7)^{3} (n-8)^{2} (n-9)(n-10)}{n^{28}\,(n-1)^{27}}$
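
The degree-4 test can be worked by hand. The derivation below is a sketch under assumptions not spelled out on the slide: the population is centered, the rows are $\beta \in \{m_4, m_2^2\}$, the columns are $\alpha \in \{\mu_4, \mu_2^2\}$, $m_4$ carries the factor $n/(n-1)$ given in the notations, and the two expectations are the classical ones for i.i.d. samples.

```latex
% Assumed basis: beta = (m_4, m_2^2), alpha = (mu_4, mu_2^2), centered population.
\begin{align*}
E_\Phi(m_4)   &= \frac{n^2-3n+3}{n^2}\,\mu_4 + \frac{3(2n-3)}{n^2}\,\mu_2^2, \\
E_\Phi(m_2^2) &= \frac{1}{n}\,\mu_4 + \frac{n^2-2n+3}{n(n-1)}\,\mu_2^2, \\
\Delta_4 &= \frac{(n^2-3n+3)(n^2-2n+3) - 3(2n-3)(n-1)}{n^3(n-1)}
          = \frac{n^4 - 5n^3 + 6n^2}{n^3(n-1)}
          = \frac{(n-2)(n-3)}{n(n-1)}.
\end{align*}
```

The numerator factors completely, which is the behaviour the test relies on: an error in any one coefficient would generally leave an irreducible factor of higher degree.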
