EECS 70: Lecture 27. Joint and Conditional Distributions.

Outline:
1. Recap of variance of a random variable
2. Joint distributions
3. Recap of independent random variables: variance of B(n, p)
4. Conditioning of random variables (revisit G(p))

Recap: variance of a random variable.
◮ Variance: var[X] := E[(X − E[X])^2] = E[X^2] − E[X]^2.
◮ Fact: var[aX + b] = a^2 var[X].
◮ Theorem: If X, Y are independent, then var(X + Y) = var(X) + var(Y).
◮ U[1, ..., n]: Pr[X = m] = 1/n, m = 1, ..., n; E[X] = (n + 1)/2; var(X) = (n^2 − 1)/12.
◮ G(p): Pr[X = n] = (1 − p)^(n−1) p, n = 1, 2, ...; E[X] = 1/p; var[X] = (1 − p)/p^2.
◮ B(n, p): Pr[X = m] = (n choose m) p^m (1 − p)^(n−m), m = 0, ..., n; E[X] = np; var(X) = np(1 − p).

Joint distributions.
Two random variables, X and Y, in a probability space (Ω, P).
What is ∑_x P[X = x]? 1. What is ∑_y P[Y = y]? 1.
Let's think about P[X = x, Y = y]. What is ∑_{x,y} P[X = x, Y = y]?
Are the events "X = x, Y = y" disjoint? Yes! X and Y are functions on Ω.
Do they cover the entire sample space? Yes! Again because X and Y are functions on Ω.
So ∑_{x,y} P[X = x, Y = y] = 1.
Joint distribution: P[X = x, Y = y]. Marginal distributions: P[X = x] and P[Y = y].
Important for inference.

Joint distribution: example.
Experiment: pick a random person.
X = number of episodes of Game of Thrones they have seen.
Y = number of episodes of Westworld they have seen.
The marginal distributions are:

  X    0     1     2     3     5     40    All
  P    0.3   0.05  0.05  0.05  0.05  0.1   0.4

  Y    0     1     5     10
  P    0.3   0.1   0.1   0.5

The joint distribution of X and Y is:

  Y\X   0     1     2     3     5     40    All   | row sum
  0     0.15  0     0     0     0     0.1   0.05  | 0.3
  1     0     0.05  0.05  0     0     0     0     | 0.1
  5     0     0     0     0.05  0.05  0     0     | 0.1
  10    0.15  0     0     0     0     0     0.35  | 0.5
  col   0.3   0.05  0.05  0.05  0.05  0.1   0.4   |

Is this a valid distribution? Yes! All the probabilities are non-negative and add up to 1.
Notice that the row and column sums are exactly the (marginal) distributions P[Y = b] and P[X = a].
But the joint table carries more information than the marginals alone: for example, if I tell you someone watched 5 episodes of Westworld, they definitely didn't watch all the episodes of Game of Thrones.

Independent random variables.
Definition (Independence): The random variables X and Y are independent if and only if
  P[Y = b | X = a] = P[Y = b], for all a and b.
Fact: X and Y are independent if and only if
  P[X = a, Y = b] = P[X = a] P[Y = b], for all a and b.
For independent random variables we don't need a huge table of joint probabilities like the one on the previous slide: the marginals determine everything.
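As a quick sanity check (not part of the original slides), here is a minimal Python sketch using the joint table above: it verifies normalization, recovers the marginals by summing rows and columns, and tests the independence condition P[X = a, Y = b] = P[X = a] P[Y = b].

```python
# Joint table from the Game of Thrones / Westworld example above.
x_vals = [0, 1, 2, 3, 5, 40, "All"]
y_vals = [0, 1, 5, 10]

# joint[i][j] = P[Y = y_vals[i], X = x_vals[j]]
joint = [
    [0.15, 0,    0,    0,    0,    0.1, 0.05],  # Y = 0
    [0,    0.05, 0.05, 0,    0,    0,   0   ],  # Y = 1
    [0,    0,    0,    0.05, 0.05, 0,   0   ],  # Y = 5
    [0.15, 0,    0,    0,    0,    0,   0.35],  # Y = 10
]

# Normalization: all entries sum to 1.
print(sum(sum(row) for row in joint))  # ~ 1.0

# Marginalization: sum out the other variable.
p_x = [sum(joint[i][j] for i in range(len(y_vals))) for j in range(len(x_vals))]
p_y = [sum(joint[i][j] for j in range(len(x_vals))) for i in range(len(y_vals))]
print(p_x)  # ~ [0.3, 0.05, 0.05, 0.05, 0.05, 0.1, 0.4]
print(p_y)  # ~ [0.3, 0.1, 0.1, 0.5]

# Independence would require P[X=a, Y=b] = P[X=a] P[Y=b] for every (a, b).
independent = all(
    abs(joint[i][j] - p_x[j] * p_y[i]) < 1e-9
    for i in range(len(y_vals)) for j in range(len(x_vals))
)
print(independent)  # False: e.g. P[X=All, Y=5] = 0 but P[X=All] * P[Y=5] > 0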
Independence: examples.
Example 1: Roll two dice. Let X and Y be the numbers of pips on the two dice. X and Y are independent.
Indeed: P[X = a, Y = b] = 1/36 and P[X = a] = P[Y = b] = 1/6.
Example 2: Roll two dice. Let X be the total number of pips and Y the number of pips on die 1 minus the number on die 2. X and Y are not independent.
Indeed: P[X = 12, Y = 1] = 0, but P[X = 12] P[Y = 1] > 0.

Mean of a product of independent RVs.
Theorem: If X and Y are independent, then E[XY] = E[X] E[Y].
Proof: Recall that E[g(X, Y)] = ∑_{x,y} g(x, y) P[X = x, Y = y]. Hence,
  E[XY] = ∑_{x,y} xy P[X = x, Y = y]
        = ∑_{x,y} xy P[X = x] P[Y = y]            (by independence)
        = ∑_x ( ∑_y xy P[X = x] P[Y = y] )
        = ∑_x ( x P[X = x] ) ( ∑_y y P[Y = y] )
        = ∑_x x P[X = x] E[Y] = E[X] E[Y].

Variance of a sum of two independent random variables.
Theorem: Let X, Y be independent RVs. Then var(X + Y) = var(X) + var(Y).
Proof: Since shifting a random variable does not change its variance, let us subtract the means; that is, assume E(X) = 0 and E(Y) = 0. Then, by independence,
  E(XY) = E(X) E(Y) = 0.
Hence,
  var(X + Y) = E((X + Y)^2) = E(X^2 + 2XY + Y^2)
             = E(X^2) + 2 E(XY) + E(Y^2)
             = E(X^2) + E(Y^2) = var(X) + var(Y).

Examples.
(1) Assume that X, Y, Z are (pairwise) independent, with E[X] = E[Y] = E[Z] = 0 and E[X^2] = E[Y^2] = E[Z^2] = 1. Then
  E[(X + 2Y + 3Z)^2] = E[X^2 + 4Y^2 + 9Z^2 + 4XY + 12YZ + 6XZ]
                     = 1 + 4 + 9 + 4·0 + 12·0 + 6·0 = 14.
(2) Let X, Y be independent and uniform on {1, 2, ..., n}. Then
  E[(X − Y)^2] = E[X^2 + Y^2 − 2XY] = 2 E[X^2] − 2 E[X]^2
               = (1 + 3n + 2n^2)/3 − (n + 1)^2/2 = (n^2 − 1)/6.

Variance of the binomial distribution.
Flip a coin with heads probability p, n times; X = how many heads? Directly,
  E[X^2] = ∑_{i=0}^n i^2 (n choose i) p^i (1 − p)^{n−i}.
Really???!!##... Too hard! Ok... fine, let's do something else. Maybe not much easier... but there is a payoff.
Define the indicators
  X_i = 1 if the i-th flip is heads, 0 otherwise.
Then E(X_i^2) = 1^2 · p + 0^2 · (1 − p) = p, so
  var(X_i) = p − (E(X_i))^2 = p − p^2 = p(1 − p).
Sanity check: p = 0 ⇒ var(X_i) = 0, and p = 1 ⇒ var(X_i) = 0. Ok.
Now X = X_1 + X_2 + ... + X_n, and X_i, X_j are independent: Pr[X_i = 1 | X_j = 1] = Pr[X_i = 1]. Hence
  var(X) = var(X_1 + ... + X_n) = var(X_1) + ... + var(X_n) = np(1 − p).
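A minimal sketch (not from the lecture; the binomial parameters n = 10, p = 0.3 are illustrative choices) that verifies the two results above by exact enumeration: var(X + Y) = var(X) + var(Y) for two independent fair dice, and var(X) = np(1 − p) for X ~ B(n, p).

```python
from itertools import product
from math import comb

def mean(dist):
    """dist: list of (value, probability) pairs."""
    return sum(v * p for v, p in dist)

def var(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist)

# Two independent fair dice: P[X=a, Y=b] = 1/36 for every pair (a, b).
die = [(v, 1 / 6) for v in range(1, 7)]
sum_dist = [(a + b, pa * pb) for (a, pa), (b, pb) in product(die, die)]
print(var(sum_dist), 2 * var(die))  # both ~ 35/6: var(X+Y) = var(X) + var(Y)

# Binomial B(n, p): variance should equal np(1-p).
n, p = 10, 0.3
binom = [(m, comb(n, m) * p**m * (1 - p) ** (n - m)) for m in range(n + 1)]
print(var(binom), n * p * (1 - p))  # both ~ 2.1
```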
Conditioning of RVs.
Recall conditioning on an event A:
  P[X = k | A] = P[(X = k) ∩ A] / P[A].
Conditioning on another RV:
  P[X = k | Y = m] = P[X = k, Y = m] / P[Y = m].
In pmf notation:
  p_{X|Y}(x | y) = p_{XY}(x, y) / p_Y(y).
p_{X|Y}(x | y) is called the conditional distribution, or conditional probability mass function (pmf), of X given Y.

Conditional distributions.
X given Y = y is itself a RV: ∑_x p_{X|Y}(x | y) = ∑_x p_{XY}(x, y) / p_Y(y) = 1.
Multiplication or product rule: p_{XY}(x, y) = p_X(x) p_{Y|X}(y | x) = p_Y(y) p_{X|Y}(x | y).
Total probability theorem: If A_1, A_2, ..., A_N partition Ω and P[A_i] > 0 for all i, then
  p_X(x) = ∑_{i=1}^N P[A_i] P[X = x | A_i].
Nothing is special about just two random variables; this naturally extends to more.
Let's revisit the mean and variance of the geometric distribution using conditional expectation.

Revisiting the mean of a geometric RV X ~ G(p).
X is memoryless: P[X = n + m | X > n] = P[X = m].
Thus E[X | X > 1] = 1 + E[X]. Why? (Recall E[g(X)] = ∑_l g(l) P[X = l].)
  E[X | X > 1] = ∑_{k=1}^∞ k P[X = k | X > 1]
               = ∑_{k=2}^∞ k P[X = k − 1]        (memorylessness)
               = ∑_{l=1}^∞ (l + 1) P[X = l]      (l = k − 1)
               = E[X + 1] = 1 + E[X].
Now condition on whether the first trial succeeds:
  E[X] = P[X = 1] E[X | X = 1] + P[X > 1] E[X | X > 1]
  ⇒ E[X] = p · 1 + (1 − p)(E[X] + 1)
  ⇒ E[X] = p + 1 − p + E[X] − p E[X]
  ⇒ p E[X] = 1
  ⇒ E[X] = 1/p.
Exercise: derive the variance for X ~ G(p) by finding E[X^2] using conditioning.

Summary of conditional distributions.
For random variables X and Y, P[X = x | Y = k] is the conditional distribution of X given Y = k:
  P[X = x | Y = k] = P[X = x, Y = k] / P[Y = k].
Numerator: joint distribution of (X, Y). Denominator: marginal distribution of Y.
(Aside: a surprising result using conditioning of RVs, "sum of independent Poissons is Poisson".)
Theorem: If X ~ Poisson(λ_1) and Y ~ Poisson(λ_2) are independent, then X + Y ~ Poisson(λ_1 + λ_2).

Summary: Joint and Conditional Distributions.
Joint distributions:
◮ Normalization: ∑_{x,y} P[X = x, Y = y] = 1.
◮ Marginalization: ∑_y P[X = x, Y = y] = P[X = x].
◮ Independence: P[X = x, Y = y] = P[X = x] P[Y = y] for all x, y; then E[XY] = E[X] E[Y].
Conditional distributions:
◮ Sum of independent Poissons is Poisson.
◮ Conditional expectation: useful for mean and variance calculations.
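A minimal sketch (not from the lecture; the parameters p = 0.2, λ_1 = 2, λ_2 = 3, and the truncation point 500 are illustrative choices) that numerically checks three facts from this section: memorylessness of G(p), E[X] = 1/p, and that convolving two Poisson pmfs gives the Poisson(λ_1 + λ_2) pmf.

```python
from math import exp, factorial

def geometric_pmf(k, p):
    """P[X = k] for X ~ G(p), k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P[X = k] for X ~ Poisson(lam), k = 0, 1, 2, ..."""
    return exp(-lam) * lam**k / factorial(k)

p = 0.2
# E[X] = sum_k k P[X=k]; truncate at k = 500, where the tail is negligible.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 500))
print(approx_mean, 1 / p)  # both ~ 5.0

# Memorylessness: P[X = n + m | X > n] = P[X = m].
n, m = 3, 4
tail = sum(geometric_pmf(k, p) for k in range(n + 1, 500))  # ~ P[X > n]
print(geometric_pmf(n + m, p) / tail, geometric_pmf(m, p))  # equal

# Sum of independent Poissons: P[X + Y = k] = sum_j P[X = j] P[Y = k - j].
lam1, lam2, k = 2.0, 3.0, 4
conv = sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
print(conv, poisson_pmf(k, lam1 + lam2))  # equal
```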