
Probability and Statistics for Computer Science - PowerPoint PPT Presentation

Probability and Statistics for Computer Science. "In statistics we apply probability to draw conclusions from data." ---Prof. J. Orloff. Credit: wikipedia. Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.06.2020


  1. Probability and Statistics for Computer Science
     "In statistics we apply probability to draw conclusions from data." ---Prof. J. Orloff
     Credit: wikipedia
     Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.06.2020

  2. Last time
     • Cumulative Distribution Function of a continuous RV
     • Normal (Gaussian) distribution ⇒ CLT

  3. Objectives
     • Exponential distribution
     • Sample mean and confidence interval

  4. Exponential distribution
     • Common model for waiting time
     • Associated with the Poisson distribution with the same λ
     • $p(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $p(x) = 0$ otherwise
     • $\int_{-\infty}^{\infty} p(x)\,dx = 1$
     Credit: wikipedia

  5. Exponential distribution
     • A continuous random variable X is exponential if it represents the "time" until the next incident in a Poisson distribution with intensity λ. Proof: see Degroot et al., Pg 324.
       $p(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
     • It's similar to the Geometric distribution, the discrete version of waiting in a queue

  6. Expectations of Exponential distribution
     • A continuous random variable X is exponential if it represents the "time" until the next incident in a Poisson distribution with intensity λ.
       $p(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
     • $E[X] = \int_0^{\infty} x\,p(x)\,dx = \frac{1}{\lambda}$
     • $\text{var}[X] = \int_0^{\infty} (x - E[X])^2\,p(x)\,dx = \frac{1}{\lambda^2}$
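
The two moments above can be checked numerically. The following is a minimal sketch, not part of the slides: the helper names (`exponential_pdf`, `integrate`, `exponential_moments`), the midpoint rule, the truncation point of the integral, and the choice λ = 2 are all assumptions made for illustration.

```python
import math

def exponential_pdf(x, lam):
    """Density p(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def exponential_moments(lam):
    """Approximate E[X] and var[X] by integrating against the density."""
    # Assumption: cut the integral off at 50/lam, where the tail mass is negligible.
    upper = 50.0 / lam
    mean = integrate(lambda x: x * exponential_pdf(x, lam), 0.0, upper)
    var = integrate(lambda x: (x - mean) ** 2 * exponential_pdf(x, lam), 0.0, upper)
    return mean, var

mean, var = exponential_moments(lam=2.0)
print(mean, var)  # should be close to 1/lam = 0.5 and 1/lam**2 = 0.25
```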

  7. Example of exponential distribution
     • How long will it take until the next call is received by a call center? Suppose it's a random variable T. If the number of incoming calls follows a Poisson distribution with intensity λ = 20 per hour, what is the expected time for T?
     • T is exponential with the same λ, so $E[T] = \frac{1}{\lambda} = \frac{1}{20} = 0.05$ (hr)
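
The call-center arithmetic can be written out as a small sketch. The function name `expected_wait_hours` is hypothetical; it only encodes the rule E[T] = 1/λ from the previous slide.

```python
def expected_wait_hours(rate_per_hour):
    """Mean of an exponential waiting time whose Poisson intensity is rate_per_hour."""
    if rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_per_hour

wait_hours = expected_wait_hours(20)   # lambda = 20 calls per hour
wait_minutes = wait_hours * 60
print(wait_hours, wait_minutes)        # about 0.05 hours, i.e. about 3 minutes
```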

  8. Motivation for drawing conclusions from samples
     • In a study of new-born babies' health, random samples from different times, places and groups of people will be collected to see what the overall health of the babies is like.
     • Example: weights of babies at 1 month?

  9. Motivation of sampling: the poll example
     Source: FiveThirtyEight.com
     • This senate election poll tells us:
       • The sample has 1211 likely voters
       • Ms. Hyde-Smith has a realized sample mean equal to 51%
     • What is the estimate of the percentage of votes for Hyde-Smith?
     • How confident is that estimate?

  10. Population
      • What is a population?
      • It's the entire possible data set $\{X\}$
      • It has a countable size $N_p$
      • The population mean $\text{popmean}(\{X\})$ is a number
      • The population standard deviation $\text{popsd}(\{X\})$ is also a number
      • The population mean and standard deviation are the same as defined previously in chapter 1

  11. Population
      • $\{X\} = \{1, 2, 3, \ldots, 12\}$, $N_p = 12$
      • $\text{popmean}(\{X\}) = ?$
      • $\text{popsd}(\{X\}) = ?$

  12. Sample
      • The sample is a random subset of the population and is denoted as $\{x\}$, where sampling is done with replacement
      • The sample size $N$ is assumed to be much less than the population size $N_p$
      • The sample mean of a population is $X^{(N)}$ and is a random variable

  13. Sample $\{x\}$ and Sample Mean $X^{(N)}$
      • $\{X\} = \{1, 2, 3, \ldots, 12\}$
      • One random sample $\{x\}$ of size $N = 5$ is drawn
      • $X^{(N)} = \frac{x_1 + x_2 + \cdots + x_N}{N}$ is a RV. What value does it take?
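
To see why the sample mean is a random variable, one can draw several samples from the population {1, ..., 12} and watch the realized value change. This is an illustrative sketch only: `draw_sample` is a hypothetical helper, and the seed and the number of repetitions are arbitrary choices.

```python
import random
from statistics import mean

population = list(range(1, 13))      # {X} = {1, 2, ..., 12}, so N_p = 12

def draw_sample(pop, n, rng):
    """Draw n items uniformly at random, with replacement."""
    return rng.choices(pop, k=n)

rng = random.Random(0)               # fixed seed only so that reruns repeat
for _ in range(3):
    sample = draw_sample(population, 5, rng)
    print(sample, mean(sample))      # each draw gives its own realized sample mean

print(mean(population))              # the population mean is a fixed number
```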

  14. Sample mean of a population
      • The sample mean of a population is very similar to the sample mean of N random variables if the samples are IID samples, randomly and independently drawn with replacement.
      • Therefore the expected value and the standard deviation of the sample mean can be derived similarly as we did in the proof of the weak law of large numbers.

  15. Sample mean of a population
      • The sample mean is the average of IID samples
        $X^{(N)} = \frac{1}{N}(X_1 + X_2 + \cdots + X_N)$
      • By linearity of the expectation and the fact that the sample items are identically drawn from the same population with replacement
        $E[X^{(N)}] = \frac{1}{N}\left(E[X^{(1)}] + E[X^{(1)}] + \cdots + E[X^{(1)}]\right) = E[X^{(1)}]$

  16. Expected value of one random sample is the population mean
      • Since each sample is drawn uniformly from the population
        $E[X^{(1)}] = \text{popmean}(\{X\})$
        therefore
        $E[X^{(N)}] = \text{popmean}(\{X\})$
      • We say that $X^{(N)}$ is an unbiased estimator of the population mean.

  17. Standard deviation of the sample mean
      • We can also rewrite another result from the lecture on the weak law of large numbers
        $\text{var}[X^{(N)}] = \frac{\text{popvar}(\{X\})}{N}$
      • The standard deviation of the sample mean
        $\text{std}[X^{(N)}] = \frac{\text{popsd}(\{X\})}{\sqrt{N}}$
      • But we need the population standard deviation in order to calculate $\text{std}[X^{(N)}]$!
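
Both results, E[X^(N)] = popmean({X}) and var[X^(N)] = popvar({X})/N, can be confirmed exactly on a tiny population by listing every equally likely sample. The sketch below assumes a made-up population {1, ..., 6} and N = 3; `sample_mean_distribution` is a hypothetical helper name.

```python
from itertools import product
from statistics import mean, pvariance

def sample_mean_distribution(pop, n):
    """Sample mean of every ordered size-n sample drawn with replacement.

    Each ordered sample is equally likely, so this list is the full
    distribution of the sample mean.
    """
    return [mean(s) for s in product(pop, repeat=n)]

population = [1, 2, 3, 4, 5, 6]      # assumed toy population
n = 3
means = sample_mean_distribution(population, n)

expected_value = mean(means)
variance = pvariance(means)
print(expected_value, mean(population))          # the two values agree
print(variance, pvariance(population) / n)       # agree up to rounding
```

Enumeration is only feasible for very small populations and sample sizes, since the number of ordered samples is N_p to the power N.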

  18. Unbiased estimate of population standard deviation & Stderr
      • The unbiased estimate of $\text{popsd}(\{X\})$ is defined as
        $\text{stdunbiased}(\{x\}) = \sqrt{\frac{1}{N-1} \sum_{x_i \in \text{sample}} (x_i - \text{mean}(\{x_i\}))^2}$
      • So the standard error is an estimate of $\text{std}[X^{(N)}]$
        $\text{std}[X^{(N)}] = \frac{\text{popsd}(\{X\})}{\sqrt{N}} \approx \frac{\text{stdunbiased}(\{x\})}{\sqrt{N}} = \text{stderr}(\{x\})$
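
A minimal sketch of the two definitions above in code. The function names mirror the slide's notation, and the sample values are invented purely for illustration.

```python
import math

def stdunbiased(sample):
    """Square root of the sum of squared deviations divided by N - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two items")
    m = sum(sample) / n
    return math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))

def stderr(sample):
    """Standard error: stdunbiased divided by sqrt(N)."""
    return stdunbiased(sample) / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up data, mean 5
print(stdunbiased(sample), stderr(sample))
```

Python's standard library function `statistics.stdev` uses the same N - 1 divisor, so it should give the same value as `stdunbiased` here.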

  19. The reason to use the unbiased standard deviation (s) for popsd: see Hogg et al., ch 7.
      * The notation might be different in this ref.

  20. Standard error: election poll
      • What is the estimate of the percentage of votes for Hyde-Smith? 51%
      • N = 1211
      • Number of sampled voters who selected Ms. Smith is: 1211(0.51) ≈ 618
      • Number of sampled voters who didn't select Ms. Smith is: 1211(0.49) ≈ 593

  21. Standard error: election poll
      • $\text{stdunbiased}(\{x\}) = \sqrt{\frac{1}{1211-1}\left(618(1-0.51)^2 + 593(0-0.51)^2\right)} = 0.5001001$
      • $\text{stderr}(\{x\}) = \frac{0.5}{\sqrt{1211}} \simeq 0.0144$
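
The poll arithmetic can be reproduced in a few lines. This sketch codes each "yes" voter as 1 and each "no" voter as 0 and, as on the slide, plugs in the rounded sample mean 0.51; `poll_stderr` is a hypothetical helper name.

```python
import math

def poll_stderr(n_yes, n_no, sample_mean):
    """Return (stdunbiased, stderr) for a 0/1 sample with the given counts."""
    n = n_yes + n_no
    # Sum of squared deviations: 'yes' items are 1, 'no' items are 0.
    squared = n_yes * (1 - sample_mean) ** 2 + n_no * (0 - sample_mean) ** 2
    std_unbiased = math.sqrt(squared / (n - 1))
    return std_unbiased, std_unbiased / math.sqrt(n)

std_unbiased, se = poll_stderr(618, 593, 0.51)
print(round(std_unbiased, 4), round(se, 4))   # about 0.5001 and 0.0144
```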

  22. Interpreting the standard error
      • Sample mean is a random variable and has its own probability distribution; stderr is an estimate of the sample mean's standard deviation
      • When N is very large, according to the Central Limit Theorem, the sample mean approaches a normal distribution with mean $E[X^{(N)}] = \text{popmean}(\{X\})$ and standard deviation estimated by $\text{stderr}(\{x\}) = \frac{\text{stdunbiased}(\{x\})}{\sqrt{N}}$

  23. Interpreting the standard error
      • Sample mean is a random variable and has its own probability distribution; stderr is an estimate of the sample mean's standard deviation
      • When N is very large, according to the Central Limit Theorem, the sample mean approaches a normal distribution with
        $\mu = \text{popmean}(\{X\})$; $\sigma = \frac{\text{popsd}(\{X\})}{\sqrt{N}} \approx \text{stderr}(\{x\})$
        $\text{stderr}(\{x\}) = \frac{\text{stdunbiased}(\{x\})}{\sqrt{N}}$

  24. Interpreting the standard error
      • The probability distribution of the sample mean tends to normal when N is large
      [Figure: normal curve of the sample mean $X^{(N)}$ centered at the population mean, with the 68%, 95% and 99.7% regions marked in units of stderr = stdunbiased/√N. Credit: wikipedia]

  25. Confidence intervals
      • Confidence interval for a population mean is defined by fraction
      • Given a percentage, find how many units of stderr it covers.
      • For 95% of the realized sample means, the population mean lies in [sample mean - 2 stderr, sample mean + 2 stderr]
      [Figure: standard normal density dnorm(x) for x from -4 to 4, with the central 95% between -2 and 2 marked]

  26. Confidence intervals when N is large
      • For about 68% of realized sample means
        $\text{mean}(\{x\}) - \text{stderr}(\{x\}) \le \text{popmean}(\{X\}) \le \text{mean}(\{x\}) + \text{stderr}(\{x\})$
      • For about 95% of realized sample means
        $\text{mean}(\{x\}) - 2\,\text{stderr}(\{x\}) \le \text{popmean}(\{X\}) \le \text{mean}(\{x\}) + 2\,\text{stderr}(\{x\})$
      • For about 99.7% of realized sample means
        $\text{mean}(\{x\}) - 3\,\text{stderr}(\{x\}) \le \text{popmean}(\{X\}) \le \text{mean}(\{x\}) + 3\,\text{stderr}(\{x\})$
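
The three rules above share one pattern: sample mean plus or minus k units of stderr. A small sketch follows; the function name `confidence_interval` and the lookup table `UNITS` are illustrative assumptions, and the numbers plugged in are the election-poll values (51% and 1.44%) from these slides.

```python
# Number of stderr units for each approximate coverage level (large N).
UNITS = {68: 1, 95: 2, 99.7: 3}

def confidence_interval(sample_mean, stderr, k):
    """Interval [sample_mean - k * stderr, sample_mean + k * stderr]."""
    return (sample_mean - k * stderr, sample_mean + k * stderr)

low, high = confidence_interval(0.51, 0.0144, UNITS[95])
print(round(low, 4), round(high, 4))   # about 0.4812 and 0.5388
```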

  27. Q. Confidence intervals
      • What is the 68% confidence interval for a population mean?
        A. [sample mean - 2 stderr, sample mean + 2 stderr]
        B. [sample mean - stderr, sample mean + stderr]
        C. [sample mean - std, sample mean + std]

  28. Standard error: election poll
      • We estimate the population mean as 51% with stderr 1.44%
      • The 95% confidence interval is
        [51% - 2×1.44%, 51% + 2×1.44%] = [48.12%, 53.88%]

  29. Q.
      • A store's staff mixed their fuji and gala apples, and the apples were individually wrapped, so they are indistinguishable. If I pick 30 apples and find 21 fuji, what is my 95% confidence interval for the estimate that the popmean is 70% fuji? (hint: stderr > 0.05)
        A. [0.7 - 0.17, 0.7 + 0.17]
        B. [0.7 - 0.056, 0.7 + 0.056]
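
One way to work this question is to code each fuji apple as 1 and each gala apple as 0, then apply the stdunbiased and stderr definitions from the earlier slides. The sketch below is an assumption about how to set it up rather than the lecturer's own solution, and `half_width_95` is a hypothetical helper.

```python
import math

def half_width_95(sample):
    """Two units of stderr, where stderr = stdunbiased / sqrt(N)."""
    n = len(sample)
    m = sum(sample) / n
    std_unbiased = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return 2 * std_unbiased / math.sqrt(n)

apples = [1] * 21 + [0] * 9        # 21 fuji and 9 gala out of 30
sample_mean = sum(apples) / len(apples)
half_width = half_width_95(apples)
print(sample_mean, round(half_width, 2))   # 0.7 and roughly 0.17
```

Under these assumptions the stderr is about 0.085, which agrees with the hint, and the half-width of about 0.17 matches option A.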

  30. What if N is small? When is N large enough?
      • If samples are taken from a normally distributed population, the following variable is a random variable whose distribution is Student's t-distribution with N-1 degrees of freedom.
        $T = \frac{\text{mean}(\{x\}) - \text{popmean}(\{X\})}{\text{stderr}(\{x\})}$
      • The degree of freedom is N-1 due to this constraint:
        $\sum_i (x_i - \text{mean}(\{x\})) = 0$

  31. t-distribution is a family of distributions with different degrees of freedom
      [Figure: pdf of the t-distribution with N=5 (degree = 4) and N=30 (degree = 29); horizontal axis X from -10 to 10, vertical axis density. Credit: wikipedia]
      William Sealy Gosset 1876-1937
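
The two curves in the figure can be recomputed from the standard closed-form density of Student's t-distribution. This sketch uses only the standard library; `t_pdf` is a hypothetical name and the evaluation points are arbitrary.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

for x in (0.0, 1.0, 3.0):
    # degree = 4 corresponds to N = 5, degree = 29 to N = 30
    print(x, t_pdf(x, 4), t_pdf(x, 29))
```

With fewer degrees of freedom the peak is lower and the tails are heavier, which is why a small N calls for wider intervals than the normal approximation gives.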
