multivariate probability distributions

Multivariate probability distributions September 1, 2017 STAT 151 - PowerPoint PPT Presentation



  1. Outline Background Discrete bivariate distribution Continuous bivariate distribution Multivariate probability distributions September 1, 2017 STAT 151 Class 2 Slide 1

  2. Outline of Topics: (1) Background; (2) Discrete bivariate distribution; (3) Continuous bivariate distribution

  3. Multivariate analysis. When one measurement is made on each observation in a dataset, univariate analysis is used, e.g., survival time of patients. If more than one measurement is made on each observation, a multivariate analysis is used, e.g., survival time, age, cancer subtype, size of cancer, etc. We focus on bivariate analysis, where exactly two measurements are made on each observation. The two measurements will be called X and Y. Since X and Y are obtained for each observation, the data for one observation is the pair (X, Y).

  4. Bivariate data. Bivariate data can be represented as:

     | Observation | X     | Y     |
     |-------------|-------|-------|
     | 1           | $X_1$ | $Y_1$ |
     | 2           | $X_2$ | $Y_2$ |
     | 3           | $X_3$ | $Y_3$ |
     | 4           | $X_4$ | $Y_4$ |
     | ...         | ...   | ...   |
     | n           | $X_n$ | $Y_n$ |

     Each observation is a pair of values, e.g., $(X_4, Y_4)$ is the 4th observation. X and Y can be both discrete, both continuous, or one discrete and one continuous. We focus on the first two cases. Some examples: X (survived > 1 year) and Y (cancer subtype) of each patient in a sample; X (length of job training) and Y (time to find a job) for each unemployed individual in a job training program; X (income) and Y (happiness) for each individual in a survey.

  5. Bivariate distributions. We can study X and Y separately, i.e., we can analyse $X_1, X_2, ..., X_n$ and $Y_1, Y_2, ..., Y_n$ separately using a probability distribution function, probability density function or cumulative distribution function. These are examples of univariate analyses. When X and Y are studied separately, their distributions and probabilities are called marginal. When X and Y are considered together, many interesting questions can be answered, e.g.: Is subtype I cancer (X) associated with a higher chance of survival beyond 1 year (Y)? Does longer job training (X) result in shorter time to find a job (Y)? Do people with higher income (X) lead a happier life (Y)? The joint behavior of X and Y is summarized in a bivariate probability distribution. A bivariate distribution is an example of a joint distribution.

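  6. Review of a discrete distribution: drawing a marble from an urn. The urn holds five marbles (numbered 1 to 5) in two colors; one color is drawn with probability $2/5$ and the other with probability $3/5$. The probability distribution tells us that the long-run frequency of the color with probability $3/5$ is higher than that of the color with probability $2/5$. [Figure: the urn and a plot of the two probabilities; the marble colors were shown as images.]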

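  7. Discrete bivariate distribution: drawing 2 marbles with replacement. The marble colors were shown as images in the original slide; here A denotes the color with probability $3/5$ and B the color with probability $2/5$.

     | Draw 1 (X) \ Draw 2 (Y) | A      | B      |
     |-------------------------|--------|--------|
     | A                       | $9/25$ | $6/25$ |
     | B                       | $6/25$ | $4/25$ |

     $P(X = A \text{ and } Y = A) = P(A, A) = \left(\tfrac{3}{5}\right)\left(\tfrac{3}{5}\right) = \tfrac{9}{25}$

     $P(X = B \text{ and } Y = A) = P(B, A) = \left(\tfrac{2}{5}\right)\left(\tfrac{3}{5}\right) = \tfrac{6}{25}$

     $P(A, A) + P(A, B) + P(B, A) + P(B, B) = \tfrac{9}{25} + \tfrac{6}{25} + \tfrac{6}{25} + \tfrac{4}{25} = 1$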

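  8. Discrete bivariate distribution: drawing 2 marbles without replacement. The marble colors were shown as images in the original slide; here A denotes the color with three marbles in the urn and B the color with two.

     | Draw 1 (X) \ Draw 2 (Y) | A      | B      |
     |-------------------------|--------|--------|
     | A                       | $6/20$ | $6/20$ |
     | B                       | $6/20$ | $2/20$ |

     $P(X = A \text{ and } Y = A) = P(A, A) = \left(\tfrac{2}{4}\right)\left(\tfrac{3}{5}\right) = \tfrac{6}{20}$

     $P(X = B \text{ and } Y = A) = P(B, A) = \left(\tfrac{3}{4}\right)\left(\tfrac{2}{5}\right) = \tfrac{6}{20}$

     $P(A, A) + P(A, B) + P(B, A) + P(B, B) = \tfrac{6}{20} + \tfrac{6}{20} + \tfrac{6}{20} + \tfrac{2}{20} = 1$

     In each product the second factor is the probability of the first draw and the first factor is the probability of the second draw given the first.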
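The two urn tables can be reproduced by listing every equally likely ordered pair of draws. The following is a minimal sketch, assuming an urn with three marbles of one color (labeled "A" here) and two of another (labeled "B"), which matches the 3/5 and 2/5 probabilities on the slides. The labels and the helper name `joint_distribution` are illustrative and do not come from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

# Assumed urn: three marbles of color "A" and two of color "B".
urn = ["A", "A", "A", "B", "B"]


def joint_distribution(ordered_pairs):
    """Convert equally likely (draw 1, draw 2) outcomes into joint probabilities."""
    pairs = list(ordered_pairs)
    counts = Counter(pairs)
    return {xy: Fraction(n, len(pairs)) for xy, n in counts.items()}


# With replacement: the same marble may be drawn twice (5 * 5 = 25 ordered pairs).
with_replacement = joint_distribution(product(urn, repeat=2))

# Without replacement: the two draws are different marbles (5 * 4 = 20 ordered pairs).
without_replacement = joint_distribution(permutations(urn, 2))

for name, table in [("with", with_replacement), ("without", without_replacement)]:
    cells = {xy: str(p) for xy, p in sorted(table.items())}
    print(name, "replacement:", cells, "total =", sum(table.values()))
```

`Fraction` keeps the arithmetic exact, so the printed values appear in lowest terms (6/20 is shown as 3/10).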

  9. Discrete bivariate distributions. A discrete bivariate distribution is used to model the joint behavior of two variables, X and Y, both of which are discrete. X and Y are discrete random variables if there is a countable number of possible values for X: $a_1, a_2, ..., a_k$ and for Y: $b_1, b_2, ..., b_l$. (X, Y) is the unknown outcome if we randomly draw an observation from the population. $P(X = a_i, Y = b_j)$ is the joint probability distribution function of observing $X = a_i, Y = b_j$. A valid joint probability distribution function must satisfy the following rules:

     $P(X = a_i, Y = b_j)$ must be between 0 and 1.

     We are certain that one of the values will appear, therefore:

     $P[(X = a_1, Y = b_1) \text{ or } (X = a_2, Y = b_1) \text{ or } ... \text{ or } (X = a_k, Y = b_l)] = P(X = a_1, Y = b_1) + P(X = a_2, Y = b_1) + ... + P(X = a_k, Y = b_l) = 1$

  10. Discrete joint distribution: Example 1. $P(X = a, Y = b) = \frac{a + b}{48}$, if $a, b = 0, 1, 2, 3$.

     | X \ Y      | 0      | 1       | 2       | 3       | P(X = a) |
     |------------|--------|---------|---------|---------|----------|
     | 0          | $0/48$ | $1/48$  | $2/48$  | $3/48$  | $6/48$   |
     | 1          | $1/48$ | $2/48$  | $3/48$  | $4/48$  | $10/48$  |
     | 2          | $2/48$ | $3/48$  | $4/48$  | $5/48$  | $14/48$  |
     | 3          | $3/48$ | $4/48$  | $5/48$  | $6/48$  | $18/48$  |
     | P(Y = b)   | $6/48$ | $10/48$ | $14/48$ | $18/48$ | 1        |

     Each marginal is a row or column sum, e.g., $P(X = 3) = \frac{3}{48} + \frac{4}{48} + \frac{5}{48} + \frac{6}{48} = \frac{18}{48}$. [Figure: 3D plot of $P(X, Y)$ over X and Y.]

     $P(X = a, Y = b)$ are the joint probabilities. $P(X = a)$, $P(Y = b)$ are called marginal probabilities. $P(X = a)$ gives us information about X ignoring Y, and $P(Y = b)$ gives us information about Y ignoring X. We can always find marginal probabilities from joint probabilities (as in Example 1) but not the other way around unless X and Y are independent (see next slide).
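The marginal probabilities in Example 1 are just row and column sums of the joint table. A small sketch of that calculation, using exact fractions, is given below. The variable names (`joint`, `marginal_x`, `marginal_y`) are illustrative choices and not part of the slides.

```python
from fractions import Fraction

values = range(4)  # a, b = 0, 1, 2, 3

# Joint probabilities from Example 1: P(X = a, Y = b) = (a + b) / 48.
joint = {(a, b): Fraction(a + b, 48) for a in values for b in values}

# Validity rules: every probability lies in [0, 1] and the total is 1.
assert all(0 <= p <= 1 for p in joint.values())
assert sum(joint.values()) == 1

# Marginals: sum the joint table over the other variable.
marginal_x = {a: sum(joint[(a, b)] for b in values) for a in values}
marginal_y = {b: sum(joint[(a, b)] for a in values) for b in values}

for a in values:
    print(f"P(X = {a}) = {marginal_x[a]}")
for b in values:
    print(f"P(Y = {b}) = {marginal_y[b]}")
```

The printed fractions are reduced to lowest terms, so 18/48 appears as 3/8.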

  11. Discrete joint distribution: Independence. X and Y are independent if, for all $X = a, Y = b$:

     $P(X = a \mid Y = b) = P(X = a) \iff P(Y = b \mid X = a) = P(Y = b) \iff P(X = a, Y = b) = P(X = a)\,P(Y = b)$

     $P(Y = b \mid X = a)$ and $P(X = a \mid Y = b)$ are conditional probabilities. If X and Y are independent, then we can easily (a) calculate $P(X = a, Y = b)$ by $P(X = a)\,P(Y = b)$; (b) write $P(X = a \mid Y = b)$ as $P(X = a)$; (c) write $P(Y = b \mid X = a)$ as $P(Y = b)$.
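The equivalence of the three independence statements follows from the definition of conditional probability. The chain below is a short sketch of that reasoning, assuming $P(X = a) > 0$ and $P(Y = b) > 0$ so that both conditional probabilities are defined (an assumption the slide does not state explicitly).

```latex
\begin{aligned}
P(X=a \mid Y=b) = P(X=a)
  &\iff \frac{P(X=a,\,Y=b)}{P(Y=b)} = P(X=a) \\
  &\iff P(X=a,\,Y=b) = P(X=a)\,P(Y=b) \\
  &\iff \frac{P(X=a,\,Y=b)}{P(X=a)} = P(Y=b) \\
  &\iff P(Y=b \mid X=a) = P(Y=b)
\end{aligned}
```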

  12. Discrete joint distribution: Example 1 (cont'd). [Figure: 3D plot of $P(X, Y)$ over X and Y.] Try

     $P(Y = 1 \mid X = 3) = \frac{P(Y = 1, X = 3)}{P(X = 3)} = \frac{4/48}{18/48} = \frac{4}{18} \neq P(Y = 1) = \frac{10}{48}$.

     Alternatively, try

     $P(X = 3, Y = 1) = \frac{4}{48} \neq P(X = 3)\,P(Y = 1) = \frac{18}{48} \times \frac{10}{48}$.

     Either way is sufficient to show X and Y are not independent. Furthermore, we can try any combination of $X = a, Y = b$: a single combination for which the equality fails is enough to disprove independence.
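Both checks on this slide can be carried out mechanically. The sketch below rebuilds the Example 1 table from its formula, computes the conditional probability, and counts the combinations where the product rule fails. The names `p_x`, `p_y` and `violations` are illustrative only.

```python
from fractions import Fraction

values = range(4)  # a, b = 0, 1, 2, 3

# Example 1 joint table and its marginals.
joint = {(a, b): Fraction(a + b, 48) for a in values for b in values}
p_x = {a: sum(joint[(a, b)] for b in values) for a in values}
p_y = {b: sum(joint[(a, b)] for a in values) for b in values}

# Check 1: conditional probability P(Y = 1 | X = 3) versus marginal P(Y = 1).
conditional = joint[(3, 1)] / p_x[3]
print("P(Y=1 | X=3) =", conditional, " P(Y=1) =", p_y[1])

# Check 2: joint probability versus product of marginals, for every combination.
violations = [(a, b) for (a, b), p in joint.items() if p != p_x[a] * p_y[b]]
print(len(violations), "of", len(joint), "combinations violate the product rule")
```

For this particular table the product rule fails at every combination, which is why any choice of $a$ and $b$ works as a counterexample here.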

  13. Discrete joint distribution: Example 2. $P(X = a, Y = b) = \frac{ab}{18}$, if $a = 1, 2, 3$; $b = 1, 2$.

     | X \ Y    | 1      | 2       | P(X = a) |
     |----------|--------|---------|----------|
     | 1        | $1/18$ | $2/18$  | $3/18$   |
     | 2        | $2/18$ | $4/18$  | $6/18$   |
     | 3        | $3/18$ | $6/18$  | $9/18$   |
     | P(Y = b) | $6/18$ | $12/18$ | 1        |

     Each marginal is a row or column sum, e.g., $P(X = 3) = \frac{3}{18} + \frac{6}{18} = \frac{9}{18}$. [Figure: 3D plot of $P(X, Y)$ over X and Y.]

     $P(X = 1, Y = 1) = \frac{1}{18}$ and $P(X = 1)\,P(Y = 1) = \frac{3}{18} \times \frac{6}{18} = \frac{18}{18^2} = \frac{1}{18}$

     ...

     $P(X = 3, Y = 2) = \frac{6}{18}$ and $P(X = 3)\,P(Y = 2) = \frac{9}{18} \times \frac{12}{18} = \frac{108}{18^2} = \frac{6}{18}$

     To show independence, we must show $P(X = a, Y = b) = P(X = a)\,P(Y = b)$ for all combinations of $X = a, Y = b$.
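Because independence requires the product rule to hold for every combination, it is convenient to check all six cells of Example 2 in a loop. The following is a minimal sketch of that check; the names `p_x`, `p_y` and `independent` are illustrative.

```python
from fractions import Fraction

x_values = (1, 2, 3)
y_values = (1, 2)

# Joint probabilities from Example 2: P(X = a, Y = b) = a * b / 18.
joint = {(a, b): Fraction(a * b, 18) for a in x_values for b in y_values}

# Marginals as row and column sums.
p_x = {a: sum(joint[(a, b)] for b in y_values) for a in x_values}
p_y = {b: sum(joint[(a, b)] for a in x_values) for b in y_values}

# Independence holds only if every cell equals the product of its marginals.
independent = all(
    joint[(a, b)] == p_x[a] * p_y[b] for a in x_values for b in y_values
)
print("X and Y independent:", independent)
```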

  14. Probability under a univariate probability density function (PDF). For the PDF $f(x) = 1.5e^{-1.5x}$, $P(X \le 1)$ can be found by integration:

     $P(X \le 1) = \int_0^1 f(x)\,dx = \int_0^1 1.5e^{-1.5x}\,dx = \left[-e^{-1.5x}\right]_0^1 = 1 - e^{-1.5} \approx 0.777$

     It turns out that, for any $x > 0$, $P(X \le x) = 1 - e^{-1.5x}$. We often write $P(X \le x)$ as $F(x)$ and call $F(x)$ the cumulative distribution function (CDF). $F(1)$ is a probability, but $f(1)$ (a point on $f(x)$) is not a probability. [Figure: PDF $f(x)$ against X, marking $P(X \le 1)$.]
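The integral on this slide can be checked numerically. The sketch below compares the closed form $1 - e^{-1.5}$ with a simple midpoint-rule approximation; the midpoint rule and the helper name `midpoint_integral` are illustrative choices, not something the slides prescribe.

```python
import math

RATE = 1.5  # rate in the slide's density f(x) = 1.5 * exp(-1.5 x)


def pdf(x):
    """Density f(x) for x >= 0."""
    return RATE * math.exp(-RATE * x)


def cdf(x):
    """Closed form F(x) = 1 - exp(-1.5 x) for x > 0."""
    return 1.0 - math.exp(-RATE * x)


def midpoint_integral(f, lower, upper, steps=10_000):
    """Approximate the integral of f over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))


numeric = midpoint_integral(pdf, 0.0, 1.0)
print("numeric integral:", round(numeric, 4))
print("closed form F(1):", round(cdf(1.0), 4))
print("density at 0    :", pdf(0.0))
```

The density at 0 equals 1.5, which is larger than 1 and illustrates why a value of $f(x)$ cannot be read as a probability.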

  15. The univariate cumulative distribution function (CDF). $F(x)$ can be used to find $P(X \le x)$ for any $x$, since $F(x) \equiv P(X \le x)$, e.g., $F(1) = P(X \le 1)$. A plot of $F(x)$ is a convenient way of finding probabilities. A probability is found by drawing a vertical line from $x$ on the horizontal axis until it meets the CDF and then drawing a horizontal line until it meets the vertical axis. All CDF plots have an asymptote at 1: $F(x) \le 1$ because it is a probability. [Figure: PDF with the area $P(X \le x)$, above the CDF plot showing the lines used to read off $F(x)$ and the asymptote at 1.]
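Reading a probability off the CDF plot amounts to evaluating $F(x)$. As a small illustration, the sketch below evaluates the CDF at a few points, assuming the density $f(x) = 1.5e^{-1.5x}$ from the previous slide as the example; the chosen points are arbitrary.

```python
import math


def cdf(x, rate=1.5):
    """F(x) = P(X <= x) for the assumed density f(x) = rate * exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


points = [0.5, 1, 2, 4, 8]
probabilities = [cdf(x) for x in points]

for x, p in zip(points, probabilities):
    print(f"F({x}) = {p:.6f}")
```

The values increase with $x$ and stay below 1, consistent with the asymptote described on the slide.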
