Joint probability density function

For any Borel set $S \subseteq \mathbb{R}^n$,
\[
P(\vec{X} \in S) = \int_{\vec{x} \in S} f_{\vec{X}}(\vec{x}) \, d\vec{x}.
\]
In particular,
\[
\int_{\mathbb{R}^n} f_{\vec{X}}(\vec{x}) \, d\vec{x} = 1.
\]

Example: Triangle lake

[Figure: the triangle lake in the $(x_1, x_2)$ plane with labels A to F; both axes run from $-0.5$ to $1.5$.]

Joint cdf:
\[
F_{\vec{X}}(\vec{x}) =
\begin{cases}
0, & \text{if } x_1 < 0 \text{ or } x_2 < 0, \\
2 x_1 x_2, & \text{if } x_1 \ge 0,\ x_2 \ge 0,\ x_1 + x_2 \le 1, \\
2 x_1 + 2 x_2 - x_2^2 - x_1^2 - 1, & \text{if } x_1 \le 1,\ x_2 \le 1,\ x_1 + x_2 \ge 1, \\
2 x_2 - x_2^2, & \text{if } x_1 \ge 1,\ 0 \le x_2 \le 1, \\
2 x_1 - x_1^2, & \text{if } 0 \le x_1 \le 1,\ x_2 \ge 1, \\
1, & \text{if } x_1 \ge 1,\ x_2 \ge 1.
\end{cases}
\]

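The piecewise cdf can be checked numerically. The sketch below assumes the position is uniformly distributed over the triangle with vertices (0, 0), (1, 0), (0, 1), which is what the term $2 x_1 x_2$ suggests (its mixed derivative is the constant 2). The names `triangle_lake_cdf` and `empirical_cdf` are illustrative, not part of the slides.

```python
import random

def triangle_lake_cdf(x1: float, x2: float) -> float:
    """Joint cdf from the slide, written with clipping instead of six cases."""
    if x1 < 0 or x2 < 0:
        return 0.0
    # Clipping to 1 covers the three cases where x1 >= 1 or x2 >= 1.
    a, b = min(x1, 1.0), min(x2, 1.0)
    if a + b <= 1:
        return 2 * a * b
    return 2 * a + 2 * b - a**2 - b**2 - 1

def empirical_cdf(x1: float, x2: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate, assuming a uniform distribution on the triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u, v = rng.random(), rng.random()
        if u + v > 1:  # reflect points from the upper half of the unit square
            u, v = 1 - u, 1 - v
        hits += (u <= x1 and v <= x2)
    return hits / n
```

For a point such as (0.7, 0.6), the two functions should agree up to Monte Carlo error.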
Marginalization

We can compute the marginal cdf from the joint cdf,
\[
F_X(x) = P(X \le x) = \lim_{y \to \infty} F_{X,Y}(x, y),
\]
or from the joint pdf,
\[
F_X(x) = P(X \le x) = \int_{u=-\infty}^{x} \int_{y=-\infty}^{\infty} f_{X,Y}(u, y) \, dy \, du.
\]
Differentiating with respect to $x$, we obtain
\[
f_X(x) = \int_{y=-\infty}^{\infty} f_{X,Y}(x, y) \, dy.
\]

Marginal pdf of a subvector $\vec{X}_{\mathcal{I}}$, $\mathcal{I} := \{i_1, i_2, \ldots, i_m\}$:
\[
f_{\vec{X}_{\mathcal{I}}}(\vec{x}_{\mathcal{I}}) = \int_{x_{j_1}} \int_{x_{j_2}} \cdots \int_{x_{j_{n-m}}} f_{\vec{X}}(\vec{x}) \, dx_{j_1} \, dx_{j_2} \cdots dx_{j_{n-m}},
\]
where $\{j_1, j_2, \ldots, j_{n-m}\} := \{1, 2, \ldots, n\} / \mathcal{I}$.

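Example: Triangle lake (continued)

Marginal cdf of $x_1$:
\[
F_{X_1}(x_1) = \lim_{x_2 \to \infty} F_{\vec{X}}(\vec{x}) =
\begin{cases}
0, & \text{if } x_1 < 0, \\
2 x_1 - x_1^2, & \text{if } 0 \le x_1 \le 1, \\
1, & \text{if } x_1 \ge 1.
\end{cases}
\]
Marginal pdf of $x_1$:
\[
f_{X_1}(x_1) = \frac{d F_{X_1}(x_1)}{d x_1} =
\begin{cases}
2 (1 - x_1), & \text{if } 0 \le x_1 \le 1, \\
0, & \text{otherwise.}
\end{cases}
\]
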
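The marginal pdf $2(1 - x_1)$ can also be recovered by integrating the joint pdf over $x_2$. The sketch below assumes the joint pdf equals 2 on the triangle $x_1 \ge 0$, $x_2 \ge 0$, $x_1 + x_2 \le 1$ and 0 elsewhere (the mixed derivative of the joint cdf); the function names are hypothetical.

```python
def joint_pdf(x1: float, x2: float) -> float:
    """Assumed joint pdf: constant 2 on the triangle, 0 elsewhere."""
    inside = x1 >= 0 and x2 >= 0 and x1 + x2 <= 1
    return 2.0 if inside else 0.0

def marginal_pdf_x1(x1: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of joint_pdf over x2.

    Integrating over [0, 1] is enough because the pdf vanishes outside it.
    """
    h = 1.0 / n
    return sum(joint_pdf(x1, (k + 0.5) * h) for k in range(n)) * h
```

For example, `marginal_pdf_x1(0.25)` should be close to $2(1 - 0.25) = 1.5$.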
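Joint conditional cdf and pdf given an event

If we know that $(X, Y) \in \mathcal{S}$ for a Borel set $\mathcal{S} \subseteq \mathbb{R}^2$ with $P((X, Y) \in \mathcal{S}) > 0$:
\begin{align*}
F_{X,Y \mid (X,Y) \in \mathcal{S}}(x, y)
&:= P(X \le x, Y \le y \mid (X, Y) \in \mathcal{S}) \\
&= \frac{P(X \le x, Y \le y, (X, Y) \in \mathcal{S})}{P((X, Y) \in \mathcal{S})} \\
&= \frac{\int_{u \le x,\, v \le y,\, (u,v) \in \mathcal{S}} f_{X,Y}(u, v) \, du \, dv}{\int_{(u,v) \in \mathcal{S}} f_{X,Y}(u, v) \, du \, dv},
\end{align*}
\[
f_{X,Y \mid (X,Y) \in \mathcal{S}}(x, y) := \frac{\partial^2 F_{X,Y \mid (X,Y) \in \mathcal{S}}(x, y)}{\partial x \, \partial y}.
\]
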
Conditional cdf and pdf

Distribution of $Y$ given $X = x$? The event has zero probability!

Define
\[
f_{Y \mid X}(y \mid x) := \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad \text{if } f_X(x) > 0,
\qquad
F_{Y \mid X}(y \mid x) := \int_{u=-\infty}^{y} f_{Y \mid X}(u \mid x) \, du.
\]
Chain rule for continuous random variables:
\[
f_{X,Y}(x, y) = f_X(x) \, f_{Y \mid X}(y \mid x).
\]

The definition can be justified by a limit argument. We have
\[
f_X(x) = \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x)}{\Delta x},
\qquad
f_{X,Y}(x, y) = \lim_{\Delta x \to 0} \frac{1}{\Delta x} \frac{\partial P(x \le X \le x + \Delta x, Y \le y)}{\partial y}.
\]
Taking the ratio, the factors $1 / \Delta x$ cancel, so
\begin{align*}
F_{Y \mid X}(y \mid x)
&= \int_{u=-\infty}^{y} \lim_{\Delta x \to 0} \frac{1}{P(x \le X \le x + \Delta x)} \frac{\partial P(x \le X \le x + \Delta x, Y \le u)}{\partial u} \, du \\
&= \lim_{\Delta x \to 0} \frac{1}{P(x \le X \le x + \Delta x)} \int_{u=-\infty}^{y} \frac{\partial P(x \le X \le x + \Delta x, Y \le u)}{\partial u} \, du \\
&= \lim_{\Delta x \to 0} \frac{P(x \le X \le x + \Delta x, Y \le y)}{P(x \le X \le x + \Delta x)} \\
&= \lim_{\Delta x \to 0} P(Y \le y \mid x \le X \le x + \Delta x).
\end{align*}

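Conditional pdf of a random subvector

The conditional pdf of a random subvector $\vec{X}_{\mathcal{I}}$, $\mathcal{I} \subseteq \{1, 2, \ldots, n\}$, given the subvector $\vec{X}_{\{1,\ldots,n\}/\mathcal{I}}$ is
\[
f_{\vec{X}_{\mathcal{I}} \mid \vec{X}_{\{1,\ldots,n\}/\mathcal{I}}}\left(\vec{x}_{\mathcal{I}} \mid \vec{x}_{\{1,\ldots,n\}/\mathcal{I}}\right)
:= \frac{f_{\vec{X}}(\vec{x})}{f_{\vec{X}_{\{1,\ldots,n\}/\mathcal{I}}}\left(\vec{x}_{\{1,\ldots,n\}/\mathcal{I}}\right)}.
\]
Chain rule for continuous random vectors:
\begin{align*}
f_{\vec{X}}(\vec{x})
&= f_{X_1}(x_1) \, f_{X_2 \mid X_1}(x_2 \mid x_1) \cdots f_{X_n \mid X_1, \ldots, X_{n-1}}(x_n \mid x_1, \ldots, x_{n-1}) \\
&= \prod_{i=1}^{n} f_{X_i \mid \vec{X}_{\{1,\ldots,i-1\}}}\left(x_i \mid \vec{x}_{\{1,\ldots,i-1\}}\right).
\end{align*}
Any order works!
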
Example: Triangle lake (continued)

Conditioned on $\{x_1 = 0.75\}$, what are the pdf and cdf of $x_2$?
\[
f_{X_2 \mid X_1}(x_2 \mid x_1) = \frac{f_{\vec{X}}(\vec{x})}{f_{X_1}(x_1)} = \frac{1}{1 - x_1}, \qquad 0 \le x_2 \le 1 - x_1,
\]
\[
F_{X_2 \mid X_1}(x_2 \mid x_1) = \int_{-\infty}^{x_2} f_{X_2 \mid X_1}(u \mid x_1) \, du = \frac{x_2}{1 - x_1}, \qquad 0 \le x_2 \le 1 - x_1.
\]

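Substituting $x_1 = 0.75$ answers the question posed at the start of this example. The step below uses that the joint pdf equals 2 on the lake (the mixed derivative of the joint cdf $2 x_1 x_2$) and that $f_{X_1}(0.75) = 2(1 - 0.75)$:

```latex
\[
f_{X_2 \mid X_1}(x_2 \mid 0.75) = \frac{2}{2\,(1 - 0.75)} = 4,
\qquad
F_{X_2 \mid X_1}(x_2 \mid 0.75) = \frac{x_2}{0.25} = 4\,x_2,
\qquad 0 \le x_2 \le 0.25 .
\]
```

In other words, given $x_1 = 0.75$, $x_2$ is uniform on $[0, 0.25]$.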
Example: Desert

◮ Car traveling through the desert
◮ Time until the car breaks down: $T$
◮ State of the motor: $M$
◮ State of the road: $R$
◮ Model:
    ◮ $M$ uniform between 0 (no problem) and 1 (very bad)
    ◮ $R$ uniform between 0 (no problem) and 1 (very bad)
    ◮ $M$ and $R$ independent
    ◮ $T$ exponential with parameter $M + R$

Joint pdf?
\begin{align*}
f_{M,R,T}(m, r, t)
&= f_M(m) \, f_{R \mid M}(r \mid m) \, f_{T \mid M,R}(t \mid m, r) \\
&= f_M(m) \, f_R(r) \, f_{T \mid M,R}(t \mid m, r) \quad \text{by independence} \\
&= \begin{cases}
(m + r) \, e^{-(m + r) t}, & \text{for } t \ge 0,\ 0 \le m \le 1,\ 0 \le r \le 1, \\
0, & \text{otherwise.}
\end{cases}
\end{align*}

Example: Desert (continued)

◮ Car breaks down after 15 min (0.25 h), $T = 0.25$
◮ Road seems OK, $R = 0.2$
◮ What was the state of the motor $M$?

\[
f_{M \mid R,T}(m \mid r, t) = \frac{f_{M,R,T}(m, r, t)}{f_{R,T}(r, t)}
\]
The denominator is obtained by marginalizing over $m$:
\begin{align*}
f_{R,T}(r, t)
&= \int_{m=0}^{1} f_{M,R,T}(m, r, t) \, dm \\
&= e^{-tr} \left( \int_{m=0}^{1} m \, e^{-tm} \, dm + r \int_{m=0}^{1} e^{-tm} \, dm \right) \\
&= e^{-tr} \left( \frac{1 - (1 + t) e^{-t}}{t^2} + \frac{r (1 - e^{-t})}{t} \right) \\
&= \frac{e^{-tr}}{t^2} \left( 1 + tr - e^{-t} (1 + t + tr) \right) \quad \text{for } t \ge 0,\ 0 \le r \le 1.
\end{align*}
Therefore
\begin{align*}
f_{M \mid R,T}(m \mid r, t)
&= \frac{(m + r) \, e^{-(m + r) t}}{\frac{e^{-tr}}{t^2} \left( 1 + tr - e^{-t} (1 + t + tr) \right)} \\
&= \frac{(m + r) \, t^2 \, e^{-tm}}{1 + tr - e^{-t} (1 + t + tr)},
\end{align*}
\begin{align*}
f_{M \mid R,T}(m \mid 0.2, 0.25)
&= \frac{(m + 0.2) \, 0.25^2 \, e^{-0.25 m}}{1 + 0.25 \cdot 0.2 - e^{-0.25} (1 + 0.25 + 0.25 \cdot 0.2)} \\
&= 1.66 \, (m + 0.2) \, e^{-0.25 m} \quad \text{for } 0 \le m \le 1.
\end{align*}

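The constant 1.66 and the normalization of the conditional pdf can be verified numerically. This is a minimal sketch of the closed-form expression above; `posterior_motor` is an illustrative name, and the midpoint rule is just one convenient way to approximate the integral.

```python
import math

def posterior_motor(m: float, r: float = 0.2, t: float = 0.25) -> float:
    """Conditional pdf of M given R = r and T = t, from the formula above."""
    if not 0 <= m <= 1:
        return 0.0
    norm = 1 + t * r - math.exp(-t) * (1 + t + t * r)
    return (m + r) * t**2 * math.exp(-t * m) / norm

# Leading constant for r = 0.2, t = 0.25 (about 1.66).
constant = 0.25**2 / (1 + 0.25 * 0.2 - math.exp(-0.25) * (1 + 0.25 + 0.25 * 0.2))

# Midpoint-rule check that the conditional pdf integrates to about 1 on [0, 1].
n = 100_000
h = 1.0 / n
total = sum(posterior_motor((k + 0.5) * h) for k in range(n)) * h
```

Because the pdf is increasing in $m$ on $[0, 1]$, observing an early breakdown on a good road makes a motor in bad state more likely.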
State of the car

[Figure: conditional pdf $f_{M \mid R,T}(m \mid 0.2, 0.25)$ plotted against $m \in [0, 1]$.]

Independent continuous random variables

Two random variables $X$ and $Y$ are independent if and only if
\[
F_{X,Y}(x, y) = F_X(x) \, F_Y(y) \quad \text{for all } (x, y) \in \mathbb{R}^2.
\]
Equivalently,
\[
F_{X \mid Y}(x \mid y) = F_X(x) \quad \text{and} \quad F_{Y \mid X}(y \mid x) = F_Y(y) \quad \text{for all } (x, y) \in \mathbb{R}^2.
\]

Two random variables $X$ and $Y$ with joint pdf $f_{X,Y}$ are independent if and only if
\[
f_{X,Y}(x, y) = f_X(x) \, f_Y(y) \quad \text{for all } (x, y) \in \mathbb{R}^2.
\]
Equivalently,
\[
f_{X \mid Y}(x \mid y) = f_X(x) \quad \text{and} \quad f_{Y \mid X}(y \mid x) = f_Y(y) \quad \text{for all } (x, y) \in \mathbb{R}^2.
\]

Mutually independent continuous random variables

The components of a random vector $\vec{X}$ are mutually independent if and only if
\[
F_{\vec{X}}(\vec{x}) = \prod_{i=1}^{n} F_{X_i}(x_i).
\]
Equivalently,
\[
f_{\vec{X}}(\vec{x}) = \prod_{i=1}^{n} f_{X_i}(x_i).
\]

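Mutually conditionally independent random variables

The components of a subvector $\vec{X}_{\mathcal{I}}$, $\mathcal{I} \subseteq \{1, 2, \ldots, n\}$, are mutually conditionally independent given another subvector $\vec{X}_{\mathcal{J}}$, $\mathcal{J} \subseteq \{1, 2, \ldots, n\}$, if and only if
\[
F_{\vec{X}_{\mathcal{I}} \mid \vec{X}_{\mathcal{J}}}(\vec{x}_{\mathcal{I}} \mid \vec{x}_{\mathcal{J}}) = \prod_{i \in \mathcal{I}} F_{X_i \mid \vec{X}_{\mathcal{J}}}(x_i \mid \vec{x}_{\mathcal{J}}).
\]
Equivalently,
\[
f_{\vec{X}_{\mathcal{I}} \mid \vec{X}_{\mathcal{J}}}(\vec{x}_{\mathcal{I}} \mid \vec{x}_{\mathcal{J}}) = \prod_{i \in \mathcal{I}} f_{X_i \mid \vec{X}_{\mathcal{J}}}(x_i \mid \vec{x}_{\mathcal{J}}).
\]
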
Functions of random variables

Let $U = g(X, Y)$ and $V = h(X, Y)$. Then
\begin{align*}
F_{U,V}(u, v)
&= P(U \le u, V \le v) \\
&= P(g(X, Y) \le u, h(X, Y) \le v) \\
&= \int_{\{(x, y) \,\mid\, g(x, y) \le u,\ h(x, y) \le v\}} f_{X,Y}(x, y) \, dx \, dy.
\end{align*}

Sum of independent random variables

If $X$ and $Y$ are independent random variables, what is the pdf of $Z = X + Y$?
\begin{align*}
F_Z(z)
&= P(X + Y \le z) \\
&= \int_{y=-\infty}^{\infty} \int_{x=-\infty}^{z - y} f_X(x) \, f_Y(y) \, dx \, dy \\
&= \int_{y=-\infty}^{\infty} F_X(z - y) \, f_Y(y) \, dy,
\end{align*}
\begin{align*}
f_Z(z)
&= \frac{d}{dz} \lim_{u \to \infty} \int_{y=-u}^{u} F_X(z - y) \, f_Y(y) \, dy \\
&= \int_{y=-\infty}^{\infty} f_X(z - y) \, f_Y(y) \, dy.
\end{align*}
The pdf of the sum is the convolution of the individual pdfs.

Example: Coffee beans

◮ Company buys coffee beans from two local producers
◮ Beans from Colombia: $C$ tons/year
◮ Beans from Vietnam: $V$ tons/year
◮ Model:
    ◮ $C$ uniform between 0 and 1
    ◮ $V$ uniform between 0 and 2
    ◮ $C$ and $V$ independent
◮ What is the distribution of the total amount of beans $B$?

\begin{align*}
f_B(b)
&= \int_{u=-\infty}^{\infty} f_C(b - u) \, f_V(u) \, du \\
&= \frac{1}{2} \int_{u=0}^{2} f_C(b - u) \, du \\
&= \begin{cases}
\frac{1}{2} \int_{u=0}^{b} du = \frac{b}{2}, & \text{if } 0 \le b \le 1, \\
\frac{1}{2} \int_{u=b-1}^{b} du = \frac{1}{2}, & \text{if } 1 \le b \le 2, \\
\frac{1}{2} \int_{u=b-1}^{2} du = \frac{3 - b}{2}, & \text{if } 2 \le b \le 3.
\end{cases}
\end{align*}

[Figure: the pdfs $f_C$ and $f_V$ (one panel) and the pdf $f_B$ of their sum (other panel), each plotted over the interval from 0 to 3.]

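The piecewise result can be compared against a direct numerical evaluation of the convolution integral. This is a sketch under the model of the example ($C$ uniform on $[0, 1]$, $V$ uniform on $[0, 2]$); the function names and the midpoint rule are choices made here for illustration.

```python
def f_C(c: float) -> float:
    """pdf of C, uniform on [0, 1]."""
    return 1.0 if 0 <= c <= 1 else 0.0

def f_V(v: float) -> float:
    """pdf of V, uniform on [0, 2]."""
    return 0.5 if 0 <= v <= 2 else 0.0

def f_B_numeric(b: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the convolution integral over u in [0, 2]."""
    h = 2.0 / n
    return sum(f_C(b - (k + 0.5) * h) * f_V((k + 0.5) * h) for k in range(n)) * h

def f_B_exact(b: float) -> float:
    """Piecewise pdf of B = C + V derived in the example."""
    if 0 <= b <= 1:
        return b / 2
    if 1 < b <= 2:
        return 0.5
    if 2 < b <= 3:
        return (3 - b) / 2
    return 0.0
```

Evaluating both functions at a few points, for instance `b = 0.3`, `1.5` and `2.4`, should give matching values up to discretization error.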
Gaussian random vector

A Gaussian random vector $\vec{X}$ has a joint pdf of the form
\[
f_{\vec{X}}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left( -\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu}) \right),
\]
where the mean $\vec{\mu} \in \mathbb{R}^n$ and the covariance matrix $\Sigma$ is a symmetric positive definite matrix.

Linear transformation of Gaussian random vectors

$\vec{X}$ is a Gaussian random vector of dimension $n$ with mean $\vec{\mu}$ and covariance matrix $\Sigma$.

For any matrix $A \in \mathbb{R}^{m \times n}$ and $\vec{b} \in \mathbb{R}^m$,
\[
\vec{Y} = A \vec{X} + \vec{b}
\]
is Gaussian with mean $A \vec{\mu} + \vec{b}$ and covariance matrix $A \Sigma A^T$.

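As a small illustration, the mean and covariance of $\vec{Y}$ can be computed directly from $A$, $\vec{b}$, $\vec{\mu}$ and $\Sigma$. The sketch below uses plain Python lists; the helper names and the numerical values of `mu`, `Sigma`, `A` and `b` are hypothetical and not taken from the slides.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def transform_gaussian(A, b, mu, Sigma):
    """Mean A mu + b and covariance A Sigma A^T of Y = A X + b."""
    mean = [sum(a * m for a, m in zip(row, mu)) + b_i for row, b_i in zip(A, b)]
    cov = matmul(matmul(A, Sigma), transpose(A))
    return mean, cov

# Hypothetical example: Y = X_1 + X_2 + 3 for a two-dimensional X.
mu = [1.0, 2.0]
Sigma = [[2.0, 0.5], [0.5, 1.0]]
A = [[1.0, 1.0]]
b = [3.0]
mean, cov = transform_gaussian(A, b, mu, Sigma)
```

Here the variance of $Y$ is the sum of the two variances plus twice the covariance, $2 + 1 + 2 \cdot 0.5 = 4$, which is what $A \Sigma A^T$ evaluates to.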
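Marginal distributions are Gaussian

For a Gaussian random vector
\[
\vec{Z} := \begin{bmatrix} \vec{X} \\ \vec{Y} \end{bmatrix}
\quad \text{with mean} \quad
\vec{\mu} := \begin{bmatrix} \mu_{\vec{X}} \\ \mu_{\vec{Y}} \end{bmatrix}
\quad \text{and covariance matrix} \quad
\Sigma_{\vec{Z}} = \begin{bmatrix} \Sigma_{\vec{X}} & \Sigma_{\vec{X}\vec{Y}} \\ \Sigma_{\vec{X}\vec{Y}}^T & \Sigma_{\vec{Y}} \end{bmatrix},
\]
$\vec{X}$ is a Gaussian random vector with mean $\mu_{\vec{X}}$ and covariance matrix $\Sigma_{\vec{X}}$.
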
[Figure: surface plot of the joint pdf $f_{X,Y}(x, y)$ together with the marginal pdfs $f_X(x)$ and $f_Y(y)$.]

Discrete random variables
Continuous random variables
Joint distributions of discrete and continuous random variables

Discrete and continuous random variables

How do we model the relation between a continuous random variable $C$ and a discrete random variable $D$?

Conditional cdf and pdf of $C$ given $D$:
\[
F_{C \mid D}(c \mid d) := P(C \le c \mid D = d),
\qquad
f_{C \mid D}(c \mid d) := \frac{d F_{C \mid D}(c \mid d)}{d c}.
\]
By the Law of Total Probability,
\[
F_C(c) = \sum_{d \in R_D} p_D(d) \, F_{C \mid D}(c \mid d),
\qquad
f_C(c) = \sum_{d \in R_D} p_D(d) \, f_{C \mid D}(c \mid d).
\]

Mixture models

Data are drawn from a continuous distribution whose parameters are chosen from a discrete set.

Important example: Gaussian mixture models

Grizzlies in Yellowstone

Model for the weight of grizzly bears in Yellowstone:

◮ Males: Gaussian with $\mu := 240$ kg and $\sigma := 40$ kg
◮ Females: Gaussian with $\mu := 140$ kg and $\sigma := 20$ kg
◮ There are about the same number of females and males

The distribution of the weight of all bears $W$ can be modeled as a Gaussian mixture with two random variables: $S$ (sex) and $W$ (weight).
\begin{align*}
f_W(w)
&= \sum_{s=0}^{1} p_S(s) \, f_{W \mid S}(w \mid s) \\
&= \frac{1}{2 \sqrt{2\pi}} \left( \frac{e^{-\frac{(w - 240)^2}{3200}}}{40} + \frac{e^{-\frac{(w - 140)^2}{800}}}{20} \right).
\end{align*}

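The mixture pdf is straightforward to evaluate. The sketch below assumes equal weights of 1/2 for the two components, in line with the statement that there are about as many females as males; `gaussian_pdf` and `weight_pdf` are illustrative names.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """pdf of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def weight_pdf(w: float) -> float:
    """Equal-weight mixture of the male (240, 40) and female (140, 20) models."""
    return 0.5 * gaussian_pdf(w, 240, 40) + 0.5 * gaussian_pdf(w, 140, 20)

# Riemann-sum check that the mixture integrates to about 1 over 0 to 600 kg.
h = 0.1
total = sum(weight_pdf(k * h) for k in range(6000)) * h
```

Since $2\sigma^2$ is 3200 for the males and 800 for the females, `weight_pdf` agrees term by term with the closed-form expression for $f_W(w)$.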