Refresher on Discrete Probability
STAT 27725/CMSC 25400: Machine Learning
Shubhendu Trivedi, University of Chicago, October 2015

Background: things you should have seen before (events, ...)

Axioms of Probability

The third ingredient in the model for a random experiment is the specification of the probability of events. The probability of an event A, denoted P(A), must satisfy the following axioms:

1. P(A) ≥ 0
2. P(Ω) = 1
3. For any sequence A₁, A₂, ... of disjoint events: P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ)

Kolmogorov showed that these three axioms lead to the rules of probability theory. de Finetti, Cox and Carnap have also provided compelling reasons for these axioms.

Some Consequences

Probability of the empty set: P(∅) = 0
Monotonicity: if A ⊆ B then P(A) ≤ P(B)
Numeric bound: 0 ≤ P(A) ≤ 1 for all A ⊆ Ω
Addition law: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Complement: P(Aᶜ) = P(Ω \ A) = 1 − P(A)

The axioms of probability are the only system with this property: if you gamble using them, you cannot be unfairly exploited by an opponent using some other system (de Finetti, 1931).

Discrete Sample Spaces

For now, we focus on the case when the sample space is countable: Ω = {ω₁, ω₂, ..., ωₙ}. The probability P on a discrete sample space can be specified by first specifying the probability pᵢ of each elementary event ωᵢ and then defining

P(A) = Σ_{i : ωᵢ ∈ A} pᵢ   for all A ⊆ Ω

Discrete Sample Spaces

In many applications, each elementary event is equally likely, so the probability of an elementary event is 1 divided by the total number of elements in Ω.

Equally likely principle: if Ω has a finite number of outcomes, and all are equally likely, then the probability of an event A is defined as P(A) = |A| / |Ω|.

Finding P(A) then reduces to counting. What is the probability of getting a full house in poker?

13 · C(4, 3) · 12 · C(4, 2) / C(52, 5) ≈ 0.0014 (about 0.14%)
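As a sanity check, the full-house count can be computed directly; this is a small Python sketch of the slide's counting argument (math.comb is the binomial coefficient):

```python
from math import comb

# Choose the rank of the triple, 3 of its 4 suits,
# then the rank of the pair, 2 of its 4 suits.
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)
total = comb(52, 5)                     # all five-card hands
p = full_house / total
print(full_house, total, round(p, 5))   # 3744 2598960 0.00144
```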

Counting

Counting is not easy! Fortunately, many counting problems can be cast into the framework of drawing balls from an urn: with replacement or without replacement, ordered or not ordered.

Choosing k of n distinguishable objects

               with replacement      without replacement
ordered        n^k                   n(n−1)···(n−k+1)
not ordered    C(n+k−1, n−1)         C(n, k)

→ this count usually goes in the denominator
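All four entries of the table can be verified by brute-force enumeration for small n and k; this sketch uses Python's itertools, which implements exactly these four drawing schemes:

```python
from itertools import product, permutations, combinations, combinations_with_replacement
from math import comb, perm

n, k = 5, 3
# ordered, with replacement: n^k
assert len(list(product(range(n), repeat=k))) == n**k
# ordered, without replacement: n(n-1)...(n-k+1)
assert len(list(permutations(range(n), k))) == perm(n, k)
# not ordered, without replacement: C(n, k)
assert len(list(combinations(range(n), k))) == comb(n, k)
# not ordered, with replacement: C(n+k-1, n-1)
assert len(list(combinations_with_replacement(range(n), k))) == comb(n + k - 1, n - 1)
print("all four counts agree")
```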

Indistinguishable Objects

If we choose k balls from an urn with n₁ red balls and n₂ green balls, what is the probability of getting a particular sequence of x red balls and k − x green ones? What is the probability of any such sequence? How many ways can this happen? (This count goes in the numerator.)

               with replacement            without replacement
ordered        n₁^x n₂^(k−x)               n₁···(n₁−x+1) · n₂···(n₂−(k−x)+1)
not ordered    C(k, x) n₁^x n₂^(k−x)       C(n₁, x) C(n₂, k−x)
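A quick enumeration check of the unordered, without-replacement entry: the number of k-ball hands with exactly x red balls is C(n₁, x)·C(n₂, k − x), which divided by C(n₁ + n₂, k) gives the hypergeometric probability. The urn sizes below are illustrative choices, not from the slides:

```python
from itertools import combinations
from math import comb

n1, n2, k, x = 4, 6, 5, 2   # 4 red, 6 green, draw 5, ask for exactly 2 red
# Label positions 0..n1-1 as red; count hands with exactly x red positions.
hits = sum(1 for hand in combinations(range(n1 + n2), k)
           if sum(i < n1 for i in hand) == x)
assert hits == comb(n1, x) * comb(n2, k - x)
p = hits / comb(n1 + n2, k)
print(hits, round(p, 4))  # 120 0.4762
```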

Joint and Conditional Probability

Joint: P(A, B) = P(A ∩ B)
Conditional: P(A | B) = P(A ∩ B) / P(B)

AI is all about conditional probabilities.

Conditional Probability

P(A | B) = fraction of worlds in which B is true that also have A true.

H = "Have a headache", F = "Have flu". P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.

"Headaches are rare and flu is rarer, but if you are coming down with flu, there is a 50-50 chance you'll have a headache."

Conditional Probability

P(H | F): fraction of flu-afflicted worlds in which you have a headache.

P(H | F) = (number of worlds with flu and headache) / (number of worlds with flu)
         = (area of the H-and-F region) / (area of the F region)
         = P(H ∩ F) / P(F)

Conditional probability: P(A | B) = P(A ∩ B) / P(B)
Corollary (the chain rule): P(A ∩ B) = P(A | B) P(B)
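The "fraction of worlds" picture can be checked exactly by enumerating equally likely worlds; the two events below (over two fair dice) are hypothetical choices for illustration:

```python
from fractions import Fraction
from itertools import product

# Enumerate equally likely "worlds": all outcomes of two fair dice.
worlds = list(product(range(1, 7), repeat=2))
B = [w for w in worlds if w[0] + w[1] >= 10]    # event B: sum >= 10
A_and_B = [w for w in B if w[0] == 6]           # event A: first die shows 6
p_cond = Fraction(len(A_and_B), len(B))         # fraction of B-worlds in A
p_ratio = Fraction(len(A_and_B), len(worlds)) / Fraction(len(B), len(worlds))
assert p_cond == p_ratio                        # same as P(A ∩ B) / P(B)
print(p_cond)  # 1/2
```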

Probabilistic Inference

H = "Have a headache", F = "Have flu". P(H) = 1/10, P(F) = 1/40, P(H | F) = 1/2.

Suppose you wake up one day with a headache and think: "50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

Bayes Rule: relates P(A | B) to P(B | A)

Sensitivity and Specificity

              condition TRUE     condition FALSE
predict +     true positive      false positive
predict −     false negative     true negative

Sensitivity = P(+ | disease)    FNR = P(− | disease) = 1 − sensitivity
Specificity = P(− | healthy)    FPR = P(+ | healthy) = 1 − specificity
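With counts from a confusion matrix, all four quantities are one-liners. The counts below are made-up numbers for illustration only:

```python
# Hypothetical study: 100 diseased and 1000 healthy subjects.
tp, fn = 90, 10      # diseased: 90 flagged +, 10 missed
tn, fp = 910, 90     # healthy: 910 flagged -, 90 false alarms

sensitivity = tp / (tp + fn)   # P(+ | disease)
specificity = tn / (tn + fp)   # P(- | healthy)
fnr = fn / (tp + fn)           # false negative rate
fpr = fp / (tn + fp)           # false positive rate
assert abs(fnr - (1 - sensitivity)) < 1e-12
assert abs(fpr - (1 - specificity)) < 1e-12
print(sensitivity, specificity)  # 0.9 0.91
```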

Mammography

Sensitivity of a screening mammogram: P(+ | cancer) ≈ 90%
Specificity of a screening mammogram: P(− | no cancer) ≈ 91%
Probability that a woman aged 40 has breast cancer: ≈ 1%

If a previously unscreened 40 year old woman's mammogram is positive, what is the probability that she has breast cancer?

P(cancer | +) = P(cancer, +) / P(+) = P(+ | cancer) P(cancer) / P(+)
             = (0.01 × 0.9) / (0.01 × 0.9 + 0.99 × 0.09)
             ≈ 0.009 / (0.009 + 0.09) ≈ 0.009 / 0.1 ≈ 9%

Message: P(A | B) ≠ P(B | A).
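The same calculation in a few lines, using the slide's numbers (prevalence 1%, sensitivity 90%, specificity 91%):

```python
p_cancer = 0.01
sens = 0.90                    # P(+ | cancer)
spec = 0.91                    # P(- | no cancer)
# Total probability of a positive test:
p_pos = sens * p_cancer + (1 - spec) * (1 - p_cancer)
# Bayes' rule:
posterior = sens * p_cancer / p_pos
print(round(posterior, 3))  # 0.092
```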

Bayes' Rule

P(B | A) = P(A | B) P(B) / P(A)

(Bayes, Thomas (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society of London.)

Rev. Thomas Bayes (1701–1761)

Prosecutor's Fallacy: Sally Clark

Two kids died with no explanation. Sir Roy Meadow testified that the chance of this happening due to SIDS is (1/8500)² ≈ 1/(73 × 10⁶). Sally Clark (1964–2007) was found guilty and imprisoned. The verdict was later overturned, and Meadow was struck off the medical register.

The fallacy: P(SIDS | 2 deaths) ≠ P(SIDS, 2 deaths), and
P(guilty | evidence) = 1 − P(not guilty | evidence) ≠ 1 − P(evidence | not guilty)

Independence

Two events A and B are independent, denoted A ⊥ B, if P(A, B) = P(A) P(B).

P(A | B) = P(A, B) / P(B) = P(A) P(B) / P(B) = P(A)
P(Aᶜ | B) = (P(B) − P(A, B)) / P(B) = P(B)(1 − P(A)) / P(B) = P(Aᶜ)
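Independence of concrete events can be checked exactly with fractions; the two events below, over two fair dice, are hypothetical picks that happen to be independent:

```python
from fractions import Fraction
from itertools import product

worlds = list(product(range(1, 7), repeat=2))
def P(event):  # exact probability of an event (a predicate on worlds)
    return Fraction(sum(event(w) for w in worlds), len(worlds))

A = lambda w: w[0] % 2 == 0              # first die even
B = lambda w: (w[0] + w[1]) % 2 == 0     # sum even
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)   # A ⊥ B
print(P(A), P(B), P(lambda w: A(w) and B(w)))  # 1/2 1/2 1/4
```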

Independence

A collection of events A₁, A₂, ... is mutually independent if for any finite subset of indices {i₁, i₂, ..., iₙ},

P(A_{i₁} ∩ ... ∩ A_{iₙ}) = P(A_{i₁}) ··· P(A_{iₙ})

If A is independent of B and C, that does not necessarily mean that it is independent of (B, C) (example).
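A standard example of the last point is two fair coin tosses with C the event that both tosses agree: the events are pairwise independent (so A is independent of B and of C separately), but not mutually independent. A small sketch:

```python
from fractions import Fraction
from itertools import product

worlds = list(product("HT", repeat=2))   # two fair coin tosses
def P(event):
    return Fraction(sum(event(w) for w in worlds), len(worlds))

A = lambda w: w[0] == "H"        # first toss heads
B = lambda w: w[1] == "H"        # second toss heads
C = lambda w: w[0] == w[1]       # both tosses agree
# pairwise independent:
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)
assert P(lambda w: A(w) and C(w)) == P(A) * P(C)
assert P(lambda w: B(w) and C(w)) == P(B) * P(C)
# but not mutually independent:
assert P(lambda w: A(w) and B(w) and C(w)) != P(A) * P(B) * P(C)
```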

Conditional Independence

A is conditionally independent of B given C, denoted A ⊥ B | C, if P(A, B | C) = P(A | C) P(B | C). Note that A ⊥ B | C does not imply, and is not implied by, A ⊥ B.

Common Cause

p(x_A, x_B, x_C) = p(x_C) p(x_A | x_C) p(x_B | x_C)

X_A and X_B are dependent, but X_A ⊥ X_B | X_C. Example: Lung cancer ⊥ Yellow teeth | Smoking.
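The common-cause pattern can be verified exactly on a tiny joint distribution built from the factorization above; the conditional probability tables here are hypothetical numbers chosen for illustration:

```python
from fractions import Fraction as F

# Hypothetical CPTs for the factorization p(c) p(a|c) p(b|c):
p_c = {1: F(1, 2), 0: F(1, 2)}
p_a = {1: {1: F(3, 4), 0: F(1, 4)}, 0: {1: F(1, 4), 0: F(3, 4)}}  # p(a|c)
p_b = p_a                                                          # p(b|c), same table

joint = {(a, b, c): p_c[c] * p_a[c][a] * p_b[c][b]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def P(pred):
    return sum(p for k, p in joint.items() if pred(*k))

# Marginally, A and B are dependent:
assert P(lambda a, b, c: a and b) != P(lambda a, b, c: a) * P(lambda a, b, c: b)
# But conditionally independent given C = 1:
pc1 = P(lambda a, b, c: c == 1)
lhs = P(lambda a, b, c: a and b and c == 1) / pc1
rhs = (P(lambda a, b, c: a and c == 1) / pc1) * (P(lambda a, b, c: b and c == 1) / pc1)
assert lhs == rhs
```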

Explaining Away

p(x_A, x_B, x_C) = p(x_A) p(x_B) p(x_C | x_A, x_B)

X_A ⊥ X_B, but X_A and X_B are dependent given X_C. Example: Burglary and Earthquake are dependent given Alarm.

Even if two variables are independent, they can become dependent when we observe an effect that they can both influence.
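Explaining away on a minimal hypothetical model: two independent rare causes (say burglary and earthquake, each with probability 1/10) and an alarm that rings exactly when at least one cause occurs:

```python
from fractions import Fraction as F

pa, pb = F(1, 10), F(1, 10)    # two independent rare causes
joint = {}
for a in (0, 1):
    for b in (0, 1):
        c = int(a or b)        # alarm rings iff either cause occurs
        p = (pa if a else 1 - pa) * (pb if b else 1 - pb)
        joint[(a, b, c)] = joint.get((a, b, c), F(0)) + p

def P(pred):
    return sum(p for k, p in joint.items() if pred(*k))

# Marginally independent:
assert P(lambda a, b, c: a and b) == P(lambda a, b, c: a) * P(lambda a, b, c: b)
# But dependent once the alarm is observed:
pc = P(lambda a, b, c: c == 1)
lhs = P(lambda a, b, c: a and b and c) / pc
rhs = (P(lambda a, b, c: a and c) / pc) * (P(lambda a, b, c: b and c) / pc)
assert lhs != rhs
```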

Bayesian Networks

Simple case: POS tagging. We want to predict an output vector y = {y₀, y₁, ..., y_T} of random variables given an observed feature vector x (Hidden Markov Model).

Random Variables

A random variable is a function X : Ω → ℝ. Example: the sum of two fair dice.

The set of all possible values a random variable X can take is called its range. Discrete random variables can only take isolated values (so the probability of a random variable taking a particular value reduces to counting).

Discrete Distributions

Assume X is a discrete random variable. We would like to specify the probabilities of events {X = x}. If we can specify all probabilities involving X, we say that we have specified the probability distribution of X.

For a countable set of values x₁, x₂, ..., xₙ, we have P(X = xᵢ) > 0 for i = 1, 2, ..., n and Σᵢ P(X = xᵢ) = 1.

We can then define the probability mass function f of X by f(x) = P(X = x), sometimes written f_X.

Discrete Distributions

Example: toss a die and let X be its face value. X is discrete with range {1, 2, 3, 4, 5, 6}; the pmf is f(x) = 1/6 for every x in the range.

Another example: toss two dice and let X be the largest face value. The pmf is f(x) = (2x − 1)/36 for x in {1, ..., 6}.
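The pmf of the larger of two dice can be confirmed by enumerating all 36 equally likely outcomes and checking it against the closed form f(x) = (2x − 1)/36:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

rolls = list(product(range(1, 7), repeat=2))       # 36 equally likely outcomes
counts = Counter(max(r) for r in rolls)            # how often each max occurs
f = {x: Fraction(c, len(rolls)) for x, c in counts.items()}
assert all(f[x] == Fraction(2 * x - 1, 36) for x in range(1, 7))
print(f[6])  # 11/36
```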

Expectation

Assume X is a discrete random variable with pmf f. The expectation of X, E[X], is defined by

E[X] = Σₓ x P(X = x) = Σₓ x f(x)

Sometimes written µ_X. It is a weighted average of the values that X can take (another interpretation is as a center of mass). Example: the expected outcome of the toss of a fair die is 7/2.
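The fair-die example, computed exactly from the definition:

```python
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}   # pmf of a fair die
EX = sum(x * p for x, p in f.items())          # E[X] = sum of x f(x)
print(EX)  # 7/2
```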

Expectation

If X is a random variable, then a function of X, such as X², is also a random variable. The following statement is easy to prove:

Theorem. If X is discrete with pmf f, then for any real-valued function g, E[g(X)] = Σₓ g(x) f(x).

Example: E[X²], when X is the outcome of the toss of a fair die, is 91/6.
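The E[X²] example follows the same recipe, summing g(x) f(x) over the range:

```python
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}       # pmf of a fair die
Eg = sum(x**2 * p for x, p in f.items())           # E[g(X)] with g(x) = x^2
print(Eg)  # 91/6
```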

Linearity of Expectation

A consequence of the theorem from earlier is that expectation is linear, i.e. it has the following two properties for a, b ∈ ℝ and functions g, h:

E(aX + b) = a E[X] + b
(Proof: suppose X has pmf f. Then E(aX + b) = Σₓ (ax + b) f(x) = a Σₓ x f(x) + b Σₓ f(x) = a E[X] + b.)

E(g(X) + h(X)) = E[g(X)] + E[h(X)]
(Proof: E(g(X) + h(X)) = Σₓ (g(x) + h(x)) f(x) = Σₓ g(x) f(x) + Σₓ h(x) f(x) = E[g(X)] + E[h(X)].)
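Both properties can be checked exactly on a fair die; the constants a, b and the functions g, h below are arbitrary illustrative choices:

```python
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}   # pmf of a fair die
def E(g):                                      # E[g(X)] = sum of g(x) f(x)
    return sum(g(x) * p for x, p in f.items())

a, b = 3, -2
assert E(lambda x: a * x + b) == a * E(lambda x: x) + b
g = lambda x: x**2
h = lambda x: 1 - x
assert E(lambda x: g(x) + h(x)) == E(g) + E(h)
```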
