UQ, STAT2201, 2017, Lecture 5: Unit 4 – Joint Distributions and Unit 5 – Descriptive Statistics.
Unit 4 – Joint Probability Distributions
A joint probability distribution describes two (or more) random variables in the same experiment. In the case of two random variables, it is referred to as a bivariate probability distribution.
A joint probability mass function for discrete random variables $X$ and $Y$, denoted $p_{XY}(x, y)$, satisfies the following properties:
(1) $p_{XY}(x, y) \ge 0$ for all $x, y$.
(2) $p_{XY}(x, y) = 0$ for $(x, y)$ not in the range.
(3) $\sum_x \sum_y p_{XY}(x, y) = 1$, where the summation is over all $(x, y)$ in the range.
(4) $p_{XY}(x, y) = P(X = x, Y = y)$.
Example: Throw two independent dice and consider $X \equiv$ Sum and $Y \equiv$ Product.
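
The dice example can be tabulated by brute-force enumeration. The following is a minimal sketch, assuming both dice are fair and six-sided (the slide only states that they are independent); the names `counts` and `p_xy` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two dice.
# Assumption: both dice are fair and six-sided.
counts = Counter((a + b, a * b) for a, b in product(range(1, 7), repeat=2))

# Joint pmf p_XY(x, y) = P(X = x, Y = y) with X = sum and Y = product.
p_xy = {xy: Fraction(c, 36) for xy, c in counts.items()}

# Property (3): the probabilities over the range sum to 1.
total = sum(p_xy.values())

print(p_xy[(7, 12)])  # outcomes (3, 4) and (4, 3)
print(total)
```

Pairs outside the range, such as $(x, y) = (2, 5)$, are simply absent from the dictionary, which corresponds to property (2).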
A joint probability density function for continuous random variables $X$ and $Y$, denoted $f_{XY}(x, y)$, satisfies the following properties:
(1) $f_{XY}(x, y) \ge 0$ for all $x, y$.
(2) $f_{XY}(x, y) = 0$ for $(x, y)$ not in the range.
(3) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y) \, dx \, dy = 1$.
(4) For small $\Delta x$, $\Delta y$: $f_{XY}(x, y) \, \Delta x \, \Delta y \approx P\big((X, Y) \in [x, x + \Delta x) \times [y, y + \Delta y)\big)$.
(5) For any region $R$ of two-dimensional space, $P\big((X, Y) \in R\big) = \iint_R f_{XY}(x, y) \, dx \, dy$.
Example: height and weight.
A joint probability density function can also be defined for $n > 2$ random variables (as can a joint probability mass function). The following needs to hold:
(1) $f_{X_1 X_2 \dots X_n}(x_1, x_2, \dots, x_n) \ge 0$.
(2) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 X_2 \dots X_n}(x_1, x_2, \dots, x_n) \, dx_1 \, dx_2 \dots dx_n = 1$.
The marginal distributions of $X$ and $Y$, as well as the conditional distributions of $X$ given a specific value $Y = y$ (and vice versa), can be obtained from the joint distribution.
If the random variables $X$ and $Y$ are independent, then $f_{XY}(x, y) = f_X(x) f_Y(y)$, and similarly in the discrete case.
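
In the discrete case, marginals are obtained by summing the joint pmf over the other variable, and a conditional pmf by dividing the joint pmf by the relevant marginal. The sketch below illustrates this on a hypothetical joint pmf on $\{0, 1\} \times \{0, 1\}$; the table values and function names are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf on {0, 1} x {0, 1}; the values are illustrative only.
p_xy = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def marginals(p_xy):
    """Sum the joint pmf over the other variable to get p_X and p_Y."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in p_xy.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def conditional_x_given_y(p_xy, y0):
    """pmf of X given Y = y0: joint probability divided by P(Y = y0)."""
    p_y0 = sum(p for (_, y), p in p_xy.items() if y == y0)
    return {x: p / p_y0 for (x, y), p in p_xy.items() if y == y0}

p_x, p_y = marginals(p_xy)
p_x_given_y1 = conditional_x_given_y(p_xy, 1)

# Independence holds only if the joint pmf factorises at every point.
independent = all(p_xy[(x, y)] == p_x[x] * p_y[y] for (x, y) in p_xy)

print(p_x, p_y, p_x_given_y1, independent)
```

For this particular table the factorisation fails, so $X$ and $Y$ are not independent.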
Generalized Moments
The expected value of a function of two random variables is: $E\big[h(X, Y)\big] = \iint h(x, y) \, f_{XY}(x, y) \, dx \, dy$ for $X$, $Y$ continuous.
The covariance is a common measure of the relationship between two random variables (say $X$ and $Y$). It is denoted $\mathrm{cov}(X, Y)$ or $\sigma_{XY}$, and is given by: $\sigma_{XY} = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E(XY) - \mu_X \mu_Y$. The covariance of a random variable with itself is its variance.
The correlation between the random variables $X$ and $Y$, denoted $\rho_{XY}$, is $\rho_{XY} = \dfrac{\mathrm{cov}(X, Y)}{\sqrt{V(X) V(Y)}} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$. For any two random variables $X$ and $Y$, $-1 \le \rho_{XY} \le 1$.
If $X$ and $Y$ are independent random variables then $\sigma_{XY} = 0$ and $\rho_{XY} = 0$. The converse does not always hold: in general, $\rho_{XY} = 0$ does not imply independence. For jointly Normal random variables it does. In any case, if $\rho_{XY} = 0$ then the random variables are called uncorrelated.
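
A standard counterexample, not taken from the slides, makes the gap between "uncorrelated" and "independent" concrete: let $X$ be uniform on $\{-1, 0, 1\}$ and set $Y = X^2$. The sketch below computes the covariance exactly and then checks the factorisation at a single point.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1} and Y = X^2, so Y is a function of X.
third = Fraction(1, 3)
p_xy = {(x, x * x): third for x in (-1, 0, 1)}

def expect(h):
    """E[h(X, Y)] as a sum over the joint pmf (discrete analogue of the integral)."""
    return sum(p * h(x, y) for (x, y), p in p_xy.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - mu_x * mu_y  # E(XY) - mu_X mu_Y

# Compare P(X = 0, Y = 0) with P(X = 0) P(Y = 0).
p_x0 = sum(p for (x, _), p in p_xy.items() if x == 0)
p_y0 = sum(p for (_, y), p in p_xy.items() if y == 0)
joint_00 = p_xy[(0, 0)]

print(cov_xy, joint_00, p_x0 * p_y0)
```

The covariance is zero, yet $P(X = 0, Y = 0) \ne P(X = 0) P(Y = 0)$, so the variables are uncorrelated but dependent.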
When considering several random variables, it is common to consider the (symmetric) Covariance Matrix, $\Sigma$, with $\Sigma_{i,j} = \mathrm{cov}(X_i, X_j)$.
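
As an illustration, the covariance matrix and correlation of the pair (Sum, Product) for two dice can be computed exactly. This is a sketch assuming fair six-sided dice; `expect`, `cov` and `Sigma` are illustrative names.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Assumption: two fair six-sided dice; each of the 36 outcomes has probability 1/36.
outcomes = [(a + b, a * b) for a, b in product(range(1, 7), repeat=2)]
prob = Fraction(1, 36)

def expect(h):
    """E[h(S, P)] where S is the sum and P is the product."""
    return sum(prob * h(s, p) for s, p in outcomes)

mu = [expect(lambda s, p: s), expect(lambda s, p: p)]

def cov(i, j):
    """cov(X_i, X_j) with X_0 = Sum and X_1 = Product."""
    return expect(lambda s, p: ((s, p)[i] - mu[i]) * ((s, p)[j] - mu[j]))

# Sigma[i][j] = cov(X_i, X_j); the diagonal holds the variances.
Sigma = [[cov(i, j) for j in range(2)] for i in range(2)]
rho = float(Sigma[0][1]) / sqrt(float(Sigma[0][0] * Sigma[1][1]))

print(Sigma)
print(rho)
```

The matrix is symmetric because $\mathrm{cov}(X_i, X_j) = \mathrm{cov}(X_j, X_i)$.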
Bivariate Normal
The probability density function of a bivariate normal distribution is
$$f_{XY}(x, y; \sigma_X, \sigma_Y, \mu_X, \mu_Y, \rho) = \frac{1}{2 \pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ \frac{-1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2 \rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right] \right\}$$
for $-\infty < x < \infty$ and $-\infty < y < \infty$. The parameters are $\sigma_X > 0$, $\sigma_Y > 0$, $-\infty < \mu_X < \infty$, $-\infty < \mu_Y < \infty$, $-1 < \rho < 1$.
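
The density can be evaluated directly from the formula. The function below is a minimal sketch with a hypothetical name and argument order; it is checked against the case $\rho = 0$, where the density should factor into the product of two univariate Normal densities.

```python
import math
from statistics import NormalDist

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Evaluate the bivariate normal density at (x, y)."""
    if sigma_x <= 0 or sigma_y <= 0 or not -1 < rho < 1:
        raise ValueError("need sigma_x > 0, sigma_y > 0 and -1 < rho < 1")
    zx = (x - mu_x) / sigma_x  # standardised x
    zy = (y - mu_y) / sigma_y  # standardised y
    quad = zx * zx - 2 * rho * zx * zy + zy * zy  # bracketed term in the exponent
    norm_const = 2 * math.pi * sigma_x * sigma_y * math.sqrt(1 - rho * rho)
    return math.exp(-quad / (2 * (1 - rho * rho))) / norm_const

# With rho = 0 the joint density factors into the two marginal densities.
joint = bivariate_normal_pdf(1.0, -0.5, 0.0, 1.0, 2.0, 0.5, 0.0)
factored = NormalDist(0.0, 2.0).pdf(1.0) * NormalDist(1.0, 0.5).pdf(-0.5)

print(joint, factored)
```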
Linear Combinations of Random Variables
Given random variables $X_1, X_2, \dots, X_n$ and constants $c_1, c_2, \dots, c_n$, the (scalar) linear combination $Y = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$ is often a random variable of interest.
The mean of the linear combination is the linear combination of the means: $E(Y) = c_1 E(X_1) + c_2 E(X_2) + \cdots + c_n E(X_n)$. This holds even if the random variables are not independent.
The variance of the linear combination is as follows: $V(Y) = c_1^2 V(X_1) + c_2^2 V(X_2) + \cdots + c_n^2 V(X_n) + 2 \sum\sum_{i < j} c_i c_j \, \mathrm{cov}(X_i, X_j)$.
If $X_1, X_2, \dots, X_n$ are independent (or even if they are just uncorrelated), then $V(Y) = c_1^2 V(X_1) + c_2^2 V(X_2) + \cdots + c_n^2 V(X_n)$.
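
The mean and variance formulas translate directly into code once the means and the covariance matrix are known. The sketch below uses hypothetical numbers for the constants, the means and the covariance matrix; only the formulas come from the text.

```python
def lincomb_mean(c, means):
    """E(Y) = sum_i c_i E(X_i)."""
    return sum(ci * mi for ci, mi in zip(c, means))

def lincomb_variance(c, Sigma):
    """V(Y) = sum_i c_i^2 V(X_i) + 2 sum_{i<j} c_i c_j cov(X_i, X_j)."""
    n = len(c)
    var_terms = sum(c[i] ** 2 * Sigma[i][i] for i in range(n))
    cov_terms = sum(
        c[i] * c[j] * Sigma[i][j] for i in range(n) for j in range(i + 1, n)
    )
    return var_terms + 2 * cov_terms

# Hypothetical inputs: Y = 2 X_1 - X_2.
c = [2, -1]
means = [1, 3]
Sigma = [[4, 1], [1, 9]]               # V(X_1) = 4, V(X_2) = 9, cov = 1
Sigma_uncorrelated = [[4, 0], [0, 9]]  # same variances, zero covariance

print(lincomb_mean(c, means))                    # 2*1 - 3
print(lincomb_variance(c, Sigma))                # 16 + 9 - 4
print(lincomb_variance(c, Sigma_uncorrelated))   # 16 + 9
```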
Example: Derive the mean and variance of the Binomial distribution.
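
One way to carry out this derivation is to write a Binomial random variable as a sum of indicator variables and apply the linear-combination formulas with all $c_i = 1$. The sketch below assumes $X \sim \mathrm{Binomial}(n, p)$ with $X = B_1 + \cdots + B_n$, where the $B_i$ are i.i.d. Bernoulli($p$); the symbols $B_i$ are introduced here for illustration.

```latex
% Assumption: X = B_1 + ... + B_n with B_i i.i.d. Bernoulli(p), so B_i^2 = B_i.
\begin{aligned}
E(B_i) &= 0 \cdot (1 - p) + 1 \cdot p = p, \\
V(B_i) &= E(B_i^2) - \big(E(B_i)\big)^2 = p - p^2 = p(1 - p), \\
E(X)   &= \sum_{i=1}^{n} E(B_i) = np, \\
V(X)   &= \sum_{i=1}^{n} V(B_i) = np(1 - p)
          \quad \text{(independence removes the covariance terms).}
\end{aligned}
```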
Linear Combinations of Normal Random Variables
Linear combinations of Normal random variables remain Normally distributed: if $X_1, \dots, X_n$ are jointly Normal then the linear combination $Y$ satisfies $Y \sim \mathrm{Normal}\big(E(Y), V(Y)\big)$.
i.i.d. Random Samples
A collection of random variables $X_1, \dots, X_n$ is said to be i.i.d., or independent and identically distributed, if they are mutually independent and identically distributed. The ($n$-dimensional) joint probability density is then a product of the individual densities.
In the context of statistics, a random sample is often modelled as an i.i.d. vector of random variables, $X_1, \dots, X_n$. An important linear combination associated with a random sample is the sample mean: $\overline{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{1}{n} X_1 + \dfrac{1}{n} X_2 + \cdots + \dfrac{1}{n} X_n$.
If $X_i$ has mean $\mu$ and variance $\sigma^2$ then the sample mean (of an i.i.d. sample) has $E(\overline{X}) = \mu$ and $V(\overline{X}) = \dfrac{\sigma^2}{n}$.
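
These two identities can be checked exactly on a small case by enumerating every possible sample. The sketch below assumes each $X_i$ is the result of a fair six-sided die and takes $n = 3$; both choices are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: each X_i is a fair six-sided die roll.
faces = range(1, 7)
mu = Fraction(sum(faces), 6)                             # E(X_i)
sigma2 = Fraction(sum(f * f for f in faces), 6) - mu**2  # V(X_i)

n = 3
# All 6^n equally likely samples and their sample means.
means = [Fraction(sum(sample), n) for sample in product(faces, repeat=n)]

e_bar = sum(means) / len(means)                           # E(X-bar)
v_bar = sum((m - e_bar) ** 2 for m in means) / len(means)  # V(X-bar)

print(e_bar, mu)
print(v_bar, sigma2 / n)
```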
Unit 5 – Descriptive Statistics
Descriptive statistics deals with summarizing data using numbers, qualitative summaries, tables and graphs. There are many possible data configurations:
Single sample: $x_1, x_2, \dots, x_n$.
Single sample over time (time series): $x_{t_1}, x_{t_2}, \dots, x_{t_n}$ with $t_1 < t_2 < \dots < t_n$.
Two samples: $x_1, \dots, x_n$ and $y_1, \dots, y_m$.
Generalizations from two samples to $k$ samples (each of potentially different sample size, $n_1, \dots, n_k$).
Observations in tuples: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.
Generalizations from tuples to vector observations (each vector of length $\ell$): $(x_1^1, \dots, x_1^\ell), \dots, (x_n^1, \dots, x_n^\ell)$.
Individual variables may be categorical or numerical. Categorical variables may be ordinal, meaning that they can be sorted (e.g. “a”, “b”, “c”, “d”), or not ordinal (e.g. “cat”, “dog”, “fish”).
A Statistic
A statistic is a quantity computed from a sample (assume here a single sample $x_1, \dots, x_n$).
The sample mean: $\overline{x} = \dfrac{x_1 + \cdots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.
The sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \overline{x})^2}{n - 1} = \dfrac{\sum_{i=1}^{n} x_i^2 - n \overline{x}^2}{n - 1}$. The sample standard deviation: $s = \sqrt{s^2}$.
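
Both forms of the sample variance can be computed side by side and compared with Python's `statistics` module, which also uses the $n - 1$ divisor. The data values below are hypothetical.

```python
import math
import statistics

# Hypothetical sample, for illustration only.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)

x_bar = sum(x) / n

# Definition: sum of squared deviations divided by n - 1.
s2_definition = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(xi * xi for xi in x) - n * x_bar**2) / (n - 1)

s = math.sqrt(s2_definition)

print(x_bar, s2_definition, s2_shortcut, s)
print(statistics.mean(x), statistics.variance(x), statistics.stdev(x))
```

The two forms agree algebraically, although the shortcut form can lose precision in floating-point arithmetic when the mean is large relative to the spread.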
Order Statistics
Order statistics: Sort the sample to obtain the sequence of sorted observations, denoted $x_{(1)}, \dots, x_{(n)}$, where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$. Some common order statistics:
The minimum: $\min(x_1, \dots, x_n) = x_{(1)}$.
The maximum: $\max(x_1, \dots, x_n) = x_{(n)}$.
The median:
$$\text{median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd}, \\ \frac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2} + 1\right)} \right) & \text{if } n \text{ is even}. \end{cases}$$
The median is the 50th percentile and the 2nd quartile (see below).
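
The definitions above can be sketched in a few lines. Note the shift from the 1-based order-statistic index $x_{(k)}$ to Python's 0-based list index; the function name and data are illustrative.

```python
import statistics

def median_from_order_statistics(sample):
    """Median via the sorted sample, following the odd/even case formula."""
    xs = sorted(sample)  # xs[k - 1] plays the role of x_(k)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # x_((n+1)/2)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # (x_(n/2) + x_(n/2 + 1)) / 2

odd_sample = [7, 1, 5, 3, 9]       # hypothetical data, n = 5
even_sample = [7, 1, 5, 3, 9, 2]   # hypothetical data, n = 6

xs = sorted(even_sample)
minimum, maximum = xs[0], xs[-1]   # x_(1) and x_(n)

print(minimum, maximum)
print(median_from_order_statistics(odd_sample))
print(median_from_order_statistics(even_sample))
```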
The $q$th quantile ($q \in [0, 1]$), or alternatively the $p = 100q$ percentile (measured in percent instead of as a decimal), is the observation such that $p$ percent of the observations are less than it and $(100 - p)$ percent of the observations are greater than it. The first quartile, denoted $Q_1$, is the 25th percentile. The second quartile ($Q_2$) is the median. The third quartile, denoted $Q_3$, is the 75th percentile. Thus half of the observations lie between $Q_1$ and $Q_3$. In other words, the quartiles break the sample into 4 quarters. The difference $Q_3 - Q_1$ is the interquartile range. The sample range is $x_{(n)} - x_{(1)}$.
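
Software packages use several slightly different interpolation conventions for sample quantiles, so computed quartiles may differ between tools. The sketch below uses `statistics.quantiles` with `method="inclusive"`, which is one such convention and not necessarily the one intended in the course; the data are hypothetical.

```python
import statistics

# Hypothetical sample, deliberately unsorted.
data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# Three cut points splitting the sorted sample into four quarters.
# Assumption: the "inclusive" interpolation convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                         # interquartile range Q3 - Q1
sample_range = max(data) - min(data)  # x_(n) - x_(1)

print(q1, q2, q3)
print(iqr, sample_range)
```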
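Interlude: The quantile of a probability distribution. Given $\alpha \in [0, 1]$, what is $x$ such that $P(X \le x) = \alpha$, that is, $F(x) = \alpha$? Equivalently, for a continuous distribution with density $f$, $\int_{-\infty}^{x} f(u) \, du = \alpha$. To find the quantile, solve the equation for $x$.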
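
When $F(x) = \alpha$ has no closed-form solution, it can be solved numerically. The sketch below uses bisection on the standard Normal CDF; the choice of distribution, the bracketing interval and the tolerance are assumptions made for illustration, and the result is compared with the library's own inverse CDF.

```python
from statistics import NormalDist

def quantile_by_bisection(cdf, alpha, lo, hi, tol=1e-10):
    """Solve cdf(x) = alpha on [lo, hi], assuming cdf is continuous and increasing
    and that cdf(lo) <= alpha <= cdf(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid  # the quantile lies to the right of mid
        else:
            hi = mid  # the quantile lies at or to the left of mid
    return (lo + hi) / 2

std_normal = NormalDist(mu=0.0, sigma=1.0)
x_975 = quantile_by_bisection(std_normal.cdf, 0.975, -10.0, 10.0)

print(x_975)
print(std_normal.inv_cdf(0.975))
```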
Visualization
Histogram (with Equal Bin Widths):
(1) Label the bin (class interval) boundaries on a horizontal scale.
(2) Mark and label the vertical scale with frequencies or counts.
(3) Above each bin, draw a rectangle whose height is equal to the frequency (or count).
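
Steps (1)–(3) can be sketched without a plotting library by computing the bin boundaries and counts and printing one bar per bin. The function name, the data and the number of bins are hypothetical; the convention that the last bin includes its right endpoint is also an assumption.

```python
def histogram_counts(data, num_bins):
    """Return (edges, counts) for equal-width bins spanning min(data)..max(data)."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / num_bins
    edges = [lo + k * width for k in range(num_bins + 1)]  # step (1): boundaries
    counts = [0] * num_bins
    for x in data:
        # Bins are [edge_k, edge_{k+1}); the maximum goes into the last bin.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1                                      # step (2): counts
    return edges, counts

data = [1.0, 1.5, 2.0, 2.5, 2.7, 3.1, 3.3, 3.9, 4.0, 5.0]  # hypothetical sample
edges, counts = histogram_counts(data, num_bins=4)

# Step (3): one "rectangle" per bin, drawn here as a row of '#' characters.
for k, count in enumerate(counts):
    print(f"[{edges[k]:.1f}, {edges[k + 1]:.1f}) " + "#" * count)
```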
A Kernel Density Estimate (KDE) is a way to construct a Smoothed Histogram. While its construction is not as straightforward as steps (1)–(3) above, automated tools can be used.
Neither the histogram nor the KDE is unique in the way it summarizes data. With these methods, different settings (e.g. the number of bins in a histogram or the bandwidth in a KDE) may yield different representations of the same data set. Nevertheless, they are both very common, sensible and useful visualisations of data.
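
To make the role of the bandwidth concrete, the sketch below builds a KDE by averaging Gaussian bumps centred at the observations. The Gaussian kernel, the two bandwidth values and the data are assumptions for illustration; the slides do not specify a kernel.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(data)
    scale = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        # Average of Normal(x_i, bandwidth^2) densities evaluated at x.
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / scale

    return density

data = [1.0, 1.5, 2.0, 2.5, 2.7, 3.1, 3.3, 3.9, 4.0, 5.0]  # hypothetical sample
f_narrow = gaussian_kde(data, bandwidth=0.3)
f_wide = gaussian_kde(data, bandwidth=1.0)

# Riemann-sum check that each estimate integrates to approximately 1.
step = 0.01
grid = [-10.0 + k * step for k in range(2600)]  # covers [-10, 16)
area_narrow = sum(f_narrow(x) for x in grid) * step
area_wide = sum(f_wide(x) for x in grid) * step

print(area_narrow, area_wide)
print(f_narrow(2.0), f_wide(2.0))
```

The two bandwidths give different density values at the same point, which is the non-uniqueness described above.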