Review
• Probability
• Basic definitions: random experiment, sample space, elementary outcomes, event
• Basic operations: conditional probability
• Bayes' theorem
Objectives
• Random variables: discrete random variable, continuous random variable
• Two probability distributions: binomial distribution, normal distribution
Random variables
• A random variable is a function that assigns a numeric value to each outcome in a sample space. We usually denote a random variable by a capital letter: X, Y, or Z.
• NOTE: (1) randomness; (2) numeric values.
• Example 1: Randomly select a student from a class. X = the student's number of siblings. X could be 0, 1, 2, …
• Example 2: Randomly select a student from a class. X = the student's height. X could be any value greater than 0.
Two types of random variables
1. Discrete random variable: its possible values form a set of discrete (isolated) values. E.g., X = number of siblings.
2. Continuous random variable: its possible values cannot be enumerated; there are infinitely many values, and every individual value has probability zero, p(x) = 0 for every x. E.g., X = the student's height.
EG1. Tossing two coins
Let X = number of heads.

x | Outcomes
0 | TT
1 | HT, TH
2 | HH

Notation: X is the random variable; x is an observed value.
Probability distribution function
• A probability distribution function (pdf) is a mathematical relationship, or rule, that assigns to each possible value x of a discrete random variable X the probability Pr(X = x).
Probability distribution of the random variable
X = number of heads.

x | Outcomes | P(X = x)
0 | TT       | 1/4
1 | HT, TH   | 1/2
2 | HH       | 1/4

[Figure: probability histogram of X]
EG2. Tossing two dice
Y: the sum of the dots on the two dice. What are the possible values of Y?
Probability distribution of the random variable
Y: the sum of the dots on the two dice.
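The distribution of Y can be worked out by listing all outcomes. The sketch below assumes two fair, independent six-sided dice (so the 36 ordered pairs are equally likely); the name `dice_pmf` is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely (die 1, die 2) outcomes and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Pr(Y = y) = (number of outcomes whose sum is y) / 36
dice_pmf = {y: Fraction(c, 36) for y, c in sorted(counts.items())}

for y, p in dice_pmf.items():
    print(f"Pr(Y = {y:2d}) = {p}")
```

Under these assumptions Y takes the values 2 through 12, with 7 the most likely sum.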
Relative frequency
• In practice, a probability can be estimated by the relative frequency of an event "in the long run":
$\text{Probability} \approx \dfrac{\text{frequency of occurrences}}{\text{frequency of all possible occurrences}}$
• 0 ≤ Probability ≤ 1
• If the experiment is repeated many times, the relative frequency histogram should look very much like the probability histogram.
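The long-run idea can be tried out with a small simulation. This is a sketch that assumes fair coins and uses a fixed seed so the run is repeatable; the function name `relative_frequencies` and the number of trials are arbitrary choices, not part of the slides.

```python
import random
from collections import Counter

def relative_frequencies(n_trials, seed=0):
    """Toss two fair coins n_trials times; return the relative frequency of each head count."""
    rng = random.Random(seed)  # fixed seed: the same run every time
    counts = Counter(
        rng.randint(0, 1) + rng.randint(0, 1)  # 1 = head, 0 = tail
        for _ in range(n_trials)
    )
    return {x: counts[x] / n_trials for x in (0, 1, 2)}

freq = relative_frequencies(100_000)
exact = {0: 0.25, 1: 0.50, 2: 0.25}  # probability distribution of X = number of heads
for x in (0, 1, 2):
    print(f"x = {x}: relative frequency {freq[x]:.3f}, probability {exact[x]:.2f}")
```

With many trials the relative frequencies should sit close to 1/4, 1/2, 1/4; with only a handful of trials they can differ noticeably.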
Data set vs. probability distributions
Sample properties (based on a data set):
• Sample mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$
• Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Model or population properties (based on a probability distribution):
• Population mean: $\mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$
• Population variance: $\sigma^2 = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$
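For the sample side, the two formulas translate directly into code. The data below are hypothetical (sibling counts for 8 students, invented purely for illustration).

```python
import statistics

# Hypothetical data set: number of siblings reported by 8 students.
data = [0, 1, 1, 2, 1, 0, 2, 1]
n = len(data)

# Sample mean: sum of the observations divided by n.
sample_mean = sum(data) / n

# Sample variance: squared deviations from the sample mean, divided by n - 1.
sample_var = sum((x - sample_mean) ** 2 for x in data) / (n - 1)

print(sample_mean, sample_var)

# The standard library's statistics.variance uses the same n - 1 divisor.
assert abs(sample_var - statistics.variance(data)) < 1e-12
```

These are properties of this particular data set; a different sample from the same population would generally give different values.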
Mean of a random variable
• The mean or expected value of X, denoted E(X) or µ, is defined as
$E(X) = \mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$
where R is the number of possible values of X.
• It is the sum of the possible values, each weighted by its probability.
• The expectation represents the "average" value of the random variable.
Mean of X
X = number of heads.

x | Outcomes | P(X = x) | x P(X = x)
0 | TT       | 1/4      | 0
1 | HT, TH   | 1/2      | 1/2
2 | HH       | 1/4      | 1/2

$E(X) = \mu = \sum_{i=1}^{3} x_i \Pr(X = x_i) = 0 + 1/2 + 1/2 = 1$
Variance of a random variable
• The variance of X is the expected squared distance from the population mean:
$Var(X) = \sigma^2 = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$
• The standard deviation σ is the square root of the variance:
$sd(X) = \sigma = \sqrt{Var(X)}$
Variance of X
X = number of heads, with µ = 1.

x     | P(x) | (x − µ)² P(x)
0     | 0.25 | (0 − 1)² × 0.25 = 0.25
1     | 0.5  | (1 − 1)² × 0.5 = 0
2     | 0.25 | (2 − 1)² × 0.25 = 0.25
Total |      | 0.50

Thus σ² = 0.5.
Summary: µ and σ are computed from the probability distribution. They are population properties.
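The same arithmetic can be written as a short sketch that works for any finite discrete distribution. The helper names `mean` and `variance` are illustrative; exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction

# Probability distribution of X = number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(pmf):
    """E(X): each possible value weighted by its probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X): expected squared distance from the mean."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = mean(pmf)
var = variance(pmf)
sd = float(var) ** 0.5  # standard deviation = square root of the variance

print(mu, var, sd)
```

For the two-coin example this reproduces µ = 1 and σ² = 1/2, so σ is about 0.71.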
Two types of random variables
1. Discrete random variable: its possible values form a set of discrete (isolated) values.
2. Continuous random variable: its possible values cannot be enumerated; there are infinitely many values, and every individual value has probability zero, p(x) = 0 for every x.
Continuous random variables
• A balanced spinning pointer can stop anywhere on the circle.
• X = the proportion of the total circumference at which it lands. X can be any value between 0 and 1, so it has infinitely many possible values.
• p(0.25 ≤ X ≤ 0.75) = 0.5
• p(X = 0.5) = 0, because X can take on an infinite number of values.
Probability density function (pdf) of X
• The curve $y = f(x)$ is the probability density function (pdf) of the random variable X.
• Pr(a ≤ X ≤ b) is the area under the curve between the x values a and b:
$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
• The total area under the density curve over the entire range of possible values of the random variable is 1:
$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$
Probability density function (pdf) of X
• The pdf $y = f(x)$ has large values in regions of high probability and small values in regions of low probability.
• Pr(X = x) = 0 for any specific value x.
• Generally, no distinction is made between Pr(X < x) and Pr(X ≤ x), or between Pr(a ≤ X ≤ b) and Pr(a < X < b), when X is a continuous random variable.
Expectation and variance of a continuous random variable
• Mean µ (center of the probability density): $E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$
• Variance σ² (spread of the probability density): $Var(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$
• The standard deviation, σ, is the square root of the variance: $\sigma = \sqrt{Var(X)}$
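These integrals can be approximated numerically. The sketch below assumes the balanced spinner has a uniform density on [0, 1], that is f(x) = 1 there and 0 elsewhere (an assumption consistent with p(0.25 ≤ X ≤ 0.75) = 0.5), and uses a simple midpoint rule; `integrate` is an illustrative helper, not a library function.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def f(x):
    """Assumed density of the spinner: uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

total = integrate(f, 0.0, 1.0)                              # total area under the pdf
prob = integrate(f, 0.25, 0.75)                             # Pr(0.25 <= X <= 0.75)
mu = integrate(lambda x: x * f(x), 0.0, 1.0)                # E(X)
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, 1.0)   # Var(X)

print(total, prob, mu, var)
```

For this assumed density the total area is 1, the mean is 0.5, and the variance is 1/12; the integration range is limited to [0, 1] because the density is zero elsewhere.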
Two distributions
• Binomial: discrete
• Normal: continuous
Bernoulli trial
Examples:
• A heads-or-tails coin toss
• A win-or-lose football game
• A pass-or-fail automotive smog inspection
Properties:
• Two outcomes: success or failure
• The success probability (p) is the same in each trial
• Trials are independent
Binomial random variable
• X is the number of successes in n repeated Bernoulli trials, each with probability p of success.
• The success probability (p) is the same in each trial.
• Trials are independent.
Binomial random variable
Probability distribution: the probability of obtaining k successes in n trials, with success probability p, is
$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n-k}$
• $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ counts all possible ways of getting k successes and n−k failures, where $n! = n \times (n-1) \times \cdots \times 1$.
• $p^{k} (1 - p)^{n-k}$ is the probability of getting k successes and n−k failures.
Mean and variance of the binomial distribution
$\mu = np$
$\sigma^2 = np(1 - p)$
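As a check, the mean and variance can be computed from the binomial probabilities with the general definitions for a discrete random variable and compared with np and np(1 − p). The values n = 10 and p = 0.3 are hypothetical, chosen only for illustration, and `binomial_pmf` is an illustrative helper built on the standard library's `math.comb`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical example values
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# General definitions for a discrete random variable.
mu = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mu) ** 2 * pk for k, pk in enumerate(probs))

print(mu, n * p)             # both close to 3.0
print(var, n * p * (1 - p))  # both close to 2.1
```

The two routes agree up to floating-point rounding, which is why the shortcut formulas are normally used instead of the full sums.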
Exercise
Newborns were screened for HIV in a Massachusetts hospital. The positive rate for inner-city babies is p = 0.01. If 500 newborns are screened,
1. What is the exact binomial probability of 5 HIV-positive test results?
Answer: $P(X = 5) = \binom{500}{5}\, 0.01^{5}\, (1 - 0.01)^{495} = 0.176$
EXCEL: BINOMDIST(5,500,0.01,FALSE)
Exercise
Newborns were screened for HIV in a Massachusetts hospital. The positive rate for inner-city babies is p = 0.01. If 500 newborns are screened,
2. What is the exact binomial probability of at least 5 HIV-positive test results?
Answer: $P(X \ge 5) = 1 - P(X \le 4) = 1 - F(4) = 1 - 0.44 = 0.56$
EXCEL: F(4) = BINOMDIST(4,500,0.01,TRUE)
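Both parts of the exercise can also be checked outside Excel. The following is a minimal sketch using only the Python standard library; `binomial_pmf` and `binomial_cdf` are illustrative helpers that play the roles of BINOMDIST(..., FALSE) and BINOMDIST(..., TRUE).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """F(k) = P(X <= k): sum of the probabilities for 0, 1, ..., k successes."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

n, p = 500, 0.01  # 500 newborns screened, positive rate 0.01

p_exactly_5 = binomial_pmf(5, n, p)        # part 1
p_at_least_5 = 1 - binomial_cdf(4, n, p)   # part 2: 1 - F(4)

print(round(p_exactly_5, 3), round(p_at_least_5, 2))
```

The results should agree with the answers 0.176 and 0.56 after rounding.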
Normal distribution
• The normal distribution is also called the Gaussian distribution, after the well-known mathematician Karl Gauss (1777-1855, "the Prince of Mathematicians").
Normal distribution
• The normal distribution is very useful.
• Many things closely follow a normal distribution:
  • Heights of people
  • Errors in measurement
  • Blood pressure
  • Scores on a test
• Many other distributions, such as the binomial, can be made approximately normal by transformation.
• Most statistical methods considered in this text are based on the normal distribution.
The pdf of the normal distribution
• The normal distribution is defined by its pdf, which for some parameters µ and σ is given as
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$
Other properties of the normal pdf
• Mean = median = mode
• Symmetric about the center
• 50% of values are less than the mean
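The density and the properties listed above can be checked numerically. In this sketch the parameter values µ = 100 and σ = 15 are hypothetical, `integrate` is an illustrative midpoint-rule helper, and the infinite integration range is truncated to µ ± 10σ, outside which the density is negligible.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 100.0, 15.0                    # hypothetical parameter values
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # truncated stand-in for (-inf, inf)

total_area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
area_below_mean = integrate(lambda x: normal_pdf(x, mu, sigma), lo, mu)
symmetric = math.isclose(normal_pdf(mu - 7, mu, sigma), normal_pdf(mu + 7, mu, sigma))

print(total_area, area_below_mean, symmetric)
```

The total area comes out at approximately 1, the area below the mean at approximately 0.5, and the density takes equal values at equal distances on either side of µ.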
Location is measured by µ
• In the graph, $\mu_2 > \mu_1$
Spread is measured by σ²
• In the graph, $\sigma_2 > \sigma_1$