
Expectation
DS GA 1002 Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

Aim: describe random variables with a few numbers: mean, variance, covariance


1. Chebyshev’s inequality

Define Y := (X − E(X))². By Markov’s inequality,

P(|X − E(X)| ≥ a) = P(Y ≥ a²) ≤ E(Y) / a² = Var(X) / a²
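
A minimal numpy sketch (my illustration, not part of the deck) comparing Chebyshev's bound with the actual tail probability; the exponential distribution, seed, and sample size are arbitrary choices for the demo.

```python
import numpy as np

# Compare Chebyshev's bound Var(X)/a^2 with the empirical tail probability.
# The exponential distribution is an arbitrary choice with finite variance.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # E(X) = 2, Var(X) = 4

for a in [3.0, 4.0, 6.0]:
    empirical = np.mean(np.abs(x - x.mean()) >= a)
    bound = x.var() / a**2
    print(f"a = {a}: P(|X - E(X)| >= a) = {empirical:.4f} <= {bound:.4f}")
```

The bound is loose but valid for any distribution with finite variance, which is what the age example on the next slide exploits.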

2. Age of students at NYU

Mean: 20 years, standard deviation: 3 years. How many are younger than 30?

P(A ≥ 30) ≤ P(|A − 20| ≥ 10) ≤ Var(A) / 10² = 9/100

So at least 91% of the students are younger than 30.

3. Expectation operator
   Mean and variance
   Covariance
   Conditional expectation

4. Covariance

The covariance of X and Y is

Cov(X, Y) := E((X − E(X)) (Y − E(Y)))
           = E(XY − Y E(X) − X E(Y) + E(X) E(Y))
           = E(XY) − E(X) E(Y)

If Cov(X, Y) = 0, X and Y are uncorrelated.
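
A quick check of the shortcut formula Cov(X, Y) = E(XY) − E(X) E(Y) (my illustration; the simulated linear relationship and seed are arbitrary):

```python
import numpy as np

# Verify Cov(X, Y) = E(XY) - E(X)E(Y) against the definition on simulated data.
rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)  # Y depends on X, so Cov(X, Y) != 0

definition = np.mean((x - x.mean()) * (y - y.mean()))
shortcut = np.mean(x * y) - x.mean() * y.mean()
print(definition, shortcut, np.cov(x, y, bias=True)[0, 1])  # all three agree
```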

5. Covariance

[Figure: scatter plots of samples of (X, Y) for Cov(X, Y) = 0.5, 0.9, 0.99 (top row) and Cov(X, Y) = 0, −0.9, −0.99 (bottom row)]

6. Variance of the sum

Var(X + Y) = E((X + Y − E(X + Y))²)
           = E((X − E(X))²) + E((Y − E(Y))²) + 2 E((X − E(X)) (Y − E(Y)))
           = Var(X) + Var(Y) + 2 Cov(X, Y)

If X and Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y).
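
To see the decomposition numerically, a short sketch (again an illustration with arbitrary simulated data):

```python
import numpy as np

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on correlated samples.
rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = -0.3 * x + rng.normal(size=1_000_000)  # negatively correlated with X

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)  # essentially equal
```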

7. Independence implies uncorrelation

If X and Y are independent, then E(XY) = E(X) E(Y), so

Cov(X, Y) = E(XY) − E(X) E(Y) = E(X) E(Y) − E(X) E(Y) = 0

8. Uncorrelation does not imply independence

X and Y are independent Bernoulli random variables with parameter 1/2. Let U := X + Y and V := X − Y. Are U and V independent? Are they uncorrelated?

9. Uncorrelation does not imply independence

p_U(0) = P(X = 0, Y = 0) = 1/4
p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2
p_{U,V}(0, 0) = P(X = 0, Y = 0) = 1/4 ≠ p_U(0) p_V(0) = 1/8

So U and V are not independent.

10. Uncorrelation does not imply independence

Cov(U, V) = E(UV) − E(U) E(V)
          = E((X + Y)(X − Y)) − E(X + Y) E(X − Y)
          = E(X²) − E(Y²) − E²(X) + E²(Y)
          = 0,

since X and Y have the same distribution. So U and V are uncorrelated, yet not independent.
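
The whole counterexample can be simulated (my sketch; seed and sample size arbitrary): the covariance of U and V comes out essentially zero, while the joint pmf fails to factorize.

```python
import numpy as np

# X, Y i.i.d. Bernoulli(1/2); U = X + Y and V = X - Y are uncorrelated
# but not independent, matching the computations above.
rng = np.random.default_rng(2)
n = 1_000_000
x = rng.integers(0, 2, size=n)
y = rng.integers(0, 2, size=n)
u, v = x + y, x - y

print("Cov(U, V)     =", np.cov(u, v, bias=True)[0, 1])      # ~ 0
print("p_U(0) p_V(0) =", np.mean(u == 0) * np.mean(v == 0))  # ~ 1/8
print("p_{U,V}(0, 0) =", np.mean((u == 0) & (v == 0)))       # ~ 1/4
```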

11. Correlation coefficient

The Pearson correlation coefficient of X and Y is

ρ_{X,Y} := Cov(X, Y) / (σ_X σ_Y)

It is the covariance between X/σ_X and Y/σ_Y.
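
The "covariance of the standardized variables" reading can be checked directly (illustrative sketch; the simulated data are arbitrary):

```python
import numpy as np

# rho equals the covariance of X / sigma_X and Y / sigma_Y.
rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)
y = 2.0 * x + rng.normal(size=1_000_000)

rho = np.cov(x, y, bias=True)[0, 1] / (x.std() * y.std())
rho_standardized = np.cov(x / x.std(), y / y.std(), bias=True)[0, 1]
print(rho, rho_standardized, np.corrcoef(x, y)[0, 1])  # all three agree
```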

12. Correlation coefficient

[Figure: three scatter plots illustrating that ρ_{X,Y} corrects for scale: σ_Y = 1, Cov(X, Y) = 0.9, ρ_{X,Y} = 0.9; σ_Y = 3, Cov(X, Y) = 0.9, ρ_{X,Y} = 0.3; σ_Y = 3, Cov(X, Y) = 2.7, ρ_{X,Y} = 0.9]

13. Cauchy-Schwarz inequality

For any X and Y,

|E(XY)| ≤ √(E(X²) E(Y²))

Moreover,

E(XY) = √(E(X²) E(Y²))   ⟺   Y = √(E(Y²)/E(X²)) X
E(XY) = −√(E(X²) E(Y²))  ⟺   Y = −√(E(Y²)/E(X²)) X

14. Cauchy-Schwarz inequality

We have |Cov(X, Y)| ≤ σ_X σ_Y, or equivalently |ρ_{X,Y}| ≤ 1.

In addition, |ρ_{X,Y}| = 1 ⟺ Y = cX + d, where

c := σ_Y/σ_X   if ρ_{X,Y} = 1,
c := −σ_Y/σ_X  if ρ_{X,Y} = −1,
d := E(Y) − c E(X)
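
As a one-line illustration of the equality case (my sketch; the particular c = −3 and d = 5 are arbitrary):

```python
import numpy as np

# Y is an exact linear function of X with negative slope, so rho = -1.
rng = np.random.default_rng(4)
x = rng.normal(size=100_000)
y = -3.0 * x + 5.0

print(np.corrcoef(x, y)[0, 1])  # -1.0 up to floating-point error
```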

15. Covariance matrix of a random vector

The covariance matrix of a random vector X = (X_1, …, X_n) is defined as

Σ_X := ⎡ Var(X_1)       Cov(X_1, X_2)  ···  Cov(X_1, X_n) ⎤
       ⎢ Cov(X_2, X_1)  Var(X_2)       ···  Cov(X_2, X_n) ⎥
       ⎢      ⋮              ⋮          ⋱        ⋮        ⎥
       ⎣ Cov(X_n, X_1)  Cov(X_n, X_2)  ···  Var(X_n)      ⎦

     = E(X Xᵀ) − E(X) E(X)ᵀ
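
The matrix identity Σ_X = E(X Xᵀ) − E(X) E(X)ᵀ can be verified empirically (illustrative sketch; the 3-dimensional Gaussian and its covariance are arbitrary choices):

```python
import numpy as np

# Estimate the covariance matrix via E(X X^T) - E(X) E(X)^T and via np.cov.
rng = np.random.default_rng(5)
true_cov = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
X = rng.multivariate_normal([0, 0, 0], true_cov, size=1_000_000)  # rows = samples

mu = X.mean(axis=0)
sigma = X.T @ X / len(X) - np.outer(mu, mu)  # E(XX^T) - E(X)E(X)^T
print(sigma)
print(np.cov(X, rowvar=False, bias=True))    # matches, both close to true_cov
```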

16. Covariance matrix after a linear transformation

Σ_{AX + b} = E((AX + b)(AX + b)ᵀ) − E(AX + b) E(AX + b)ᵀ
           = A E(X Xᵀ) Aᵀ + A E(X) bᵀ + b E(X)ᵀ Aᵀ + b bᵀ
             − A E(X) E(X)ᵀ Aᵀ − A E(X) bᵀ − b E(X)ᵀ Aᵀ − b bᵀ
           = A (E(X Xᵀ) − E(X) E(X)ᵀ) Aᵀ
           = A Σ_X Aᵀ
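
A numerical check of Σ_{AX+b} = A Σ_X Aᵀ (illustrative; A, b, and Σ_X are arbitrary):

```python
import numpy as np

# The covariance of A X + b is A Sigma A^T; the shift b drops out.
rng = np.random.default_rng(6)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal([0, 0], sigma, size=1_000_000)

A = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([3.0, -4.0])
Y = X @ A.T + b  # apply the affine map to each sample (row)

print(np.cov(Y, rowvar=False, bias=True))  # ~ A sigma A^T
print(A @ sigma @ A.T)
```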

17. Variance in a fixed direction

For any unit vector u,

Var(uᵀ X) = uᵀ Σ_X u
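
This is the special case A = uᵀ, b = 0 of the previous slide; a two-line check (same illustrative setup as above):

```python
import numpy as np

# Var(u^T X) = u^T Sigma u for a unit vector u.
rng = np.random.default_rng(7)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal([0, 0], sigma, size=1_000_000)

u = np.array([1.0, 1.0]) / np.sqrt(2)  # unit vector
print(np.var(X @ u), u @ sigma @ u)    # essentially equal
```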

18. Direction of maximum variance

To find the direction of maximum variance, we must solve

arg max_{‖u‖₂ = 1} uᵀ Σ_X u

19. Linear algebra

Symmetric matrices have orthogonal eigenvectors:

Σ_X = U Λ Uᵀ = [u_1 u_2 ··· u_n] diag(λ_1, λ_2, …, λ_n) [u_1 u_2 ··· u_n]ᵀ

20. Linear algebra

For a symmetric matrix A,

λ_1 = max_{‖u‖₂ = 1} uᵀ A u,   u_1 = arg max_{‖u‖₂ = 1} uᵀ A u
λ_k = max_{‖u‖₂ = 1, u ⊥ u_1, …, u_{k−1}} uᵀ A u,   u_k = arg max_{‖u‖₂ = 1, u ⊥ u_1, …, u_{k−1}} uᵀ A u
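
In code, this maximization is solved by an eigendecomposition; a sketch using np.linalg.eigh (which returns eigenvalues in ascending order) on an arbitrary covariance matrix:

```python
import numpy as np

# The top eigenvector of Sigma is the direction of maximum variance.
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
eigvals, eigvecs = np.linalg.eigh(sigma)  # ascending eigenvalues
u1 = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue

print("lambda_1 =", eigvals[-1], "=", u1 @ sigma @ u1)

# No other unit vector achieves more variance.
rng = np.random.default_rng(8)
for _ in range(3):
    u = rng.normal(size=2)
    u /= np.linalg.norm(u)
    print("random direction:", u @ sigma @ u, "<=", eigvals[-1])
```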

21. Direction of maximum variance

[Figure: three scatter plots with the principal directions overlaid: √λ_1 = 1.22, √λ_2 = 1; √λ_1 = 1.38, √λ_2 = 0.71; √λ_1 = 1, √λ_2 = 0.32]

22. Coloring

Goal: transform uncorrelated samples with unit variance so that they have a prescribed covariance matrix Σ.

1. Compute the eigendecomposition Σ = U Λ Uᵀ.
2. Set y := U √Λ x, where √Λ := diag(√λ_1, √λ_2, …, √λ_n).
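
A direct implementation of the two steps (illustrative; the target Σ is an arbitrary 2×2 example), ending with the empirical covariance that the next slide derives:

```python
import numpy as np

# Coloring: y = U sqrt(Lambda) x maps uncorrelated unit-variance samples
# to samples with covariance matrix sigma.
rng = np.random.default_rng(9)
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # prescribed covariance

eigvals, U = np.linalg.eigh(sigma)          # step 1: sigma = U Lambda U^T
coloring = U @ np.diag(np.sqrt(eigvals))    # step 2: U sqrt(Lambda)

X = rng.normal(size=(1_000_000, 2))         # uncorrelated, unit variance
Y = X @ coloring.T                          # color each sample (row)

print(np.cov(Y, rowvar=False, bias=True))   # ~ sigma
```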

23. Coloring

Σ_Y = U √Λ Σ_X √Λᵀ Uᵀ
    = U √Λ I √Λᵀ Uᵀ
    = U Λ Uᵀ
    = Σ
