Probability Review
Introduction to Random Processes

Gonzalo Mateos
Dept. of ECE and Goergen Institute for Data Science
University of Rochester
gmateosb@ece.rochester.edu
http://www.ece.rochester.edu/~gmateosb/

September 16, 2020
Markov and Chebyshev's inequalities

◮ Markov and Chebyshev's inequalities
◮ Convergence of random variables
◮ Limit theorems
◮ Conditional probabilities
◮ Conditional expectation
Markov's inequality

◮ RV X with E[|X|] < ∞, constant a > 0
◮ Markov's inequality states ⇒ P(|X| ≥ a) ≤ E[|X|] / a

Proof.
◮ I{|X| ≥ a} = 1 when |X| ≥ a and I{|X| ≥ a} = 0 otherwise. Then (figure to the right)
      a I{|X| ≥ a} ≤ |X|
  [Figure: a I{|X| ≥ a} plotted below |X| as a function of X, with marks at X = −a and X = a]
◮ Use monotonicity and linearity of expected value
      a E[I{|X| ≥ a}] ≤ E[|X|]
◮ Indicator function's expectation = probability of indicated event
      a P(|X| ≥ a) ≤ E[|X|]
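◮ A quick numerical sanity check of the bound (not part of the original slides; the distribution, thresholds a, sample size, and seed are arbitrary illustration choices, and numpy is assumed available):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)    # exponential RV, E[|X|] = 1

for a in [1.0, 2.0, 5.0]:
    tail = np.mean(np.abs(x) >= a)              # Monte Carlo estimate of P(|X| >= a)
    bound = np.mean(np.abs(x)) / a              # Markov bound E[|X|]/a
    print(f"a={a}: P(|X|>=a) ~ {tail:.4f} <= {bound:.4f}")
```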
Chebyshev's inequality

◮ RV X with E(X) = µ and E[(X − µ)^2] = σ^2, constant k > 0
◮ Chebyshev's inequality states ⇒ P(|X − µ| ≥ k) ≤ σ^2 / k^2

Proof.
◮ Markov's inequality for the RV Z = (X − µ)^2 and constant a = k^2
      P(|Z| ≥ k^2) = P((X − µ)^2 ≥ k^2) ≤ E[|Z|] / k^2 = E[(X − µ)^2] / k^2
◮ Notice that (X − µ)^2 ≥ k^2 if and only if |X − µ| ≥ k, thus
      P(|X − µ| ≥ k) ≤ E[(X − µ)^2] / k^2
◮ Chebyshev's inequality follows from definition of variance
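◮ The same kind of Monte Carlo check for Chebyshev's bound (again not from the slides; the Poisson model, values of k, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=4.0, size=100_000)          # Poisson(4): mean 4, variance 4
mu, var = x.mean(), x.var()

for k in [2.0, 3.0, 4.0]:
    tail = np.mean(np.abs(x - mu) >= k)         # Monte Carlo estimate of P(|X - mu| >= k)
    print(f"k={k}: P(|X-mu|>=k) ~ {tail:.4f} <= {var / k**2:.4f}")
```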
Comments and observations

◮ If the absolute expected value is finite, i.e., E[|X|] < ∞
  ⇒ Complementary cdf (ccdf) decreases at least like x^{-1} (Markov's)
◮ If the mean E(X) and variance E[(X − µ)^2] are finite
  ⇒ Ccdf decreases at least like x^{-2} (Chebyshev's)
◮ Most ccdfs decrease exponentially (e.g., like e^{-x^2} for the normal)
  ⇒ Power-law bounds ∝ x^{-α} are loose but still useful
◮ Markov's inequality often derived for nonnegative RV X ≥ 0
  ⇒ Can drop the absolute value to obtain P(X ≥ a) ≤ E(X) / a
  ⇒ General bound P(X ≥ a) ≤ E(X^r) / a^r holds for r > 0
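◮ To see how loose (yet usable) the power-law bounds are, the sketch below compares the true Gaussian tail with the moment bounds E(|X|^r)/a^r for r = 1, 2, 4. This is an illustration added here, not part of the slides; scipy is assumed available, a = 3 is an arbitrary threshold, and the moments E|X| = √(2/π), σ^2 = 1, E[X^4] = 3 are the standard-normal values.

```python
import numpy as np
from scipy.stats import norm

a = 3.0
true_tail = 2 * norm.sf(a)              # P(|X| >= a) for X ~ N(0, 1)
markov    = np.sqrt(2 / np.pi) / a      # E[|X|]/a       (r = 1)
chebyshev = 1.0 / a**2                  # sigma^2/a^2    (r = 2)
fourth    = 3.0 / a**4                  # E[X^4]/a^4     (r = 4)
print(true_tail, markov, chebyshev, fourth)
```

◮ Increasing r tightens the power-law bound, but all of them stay far above the exponentially small true tail.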
Convergence of random variables

◮ Markov and Chebyshev's inequalities
◮ Convergence of random variables
◮ Limit theorems
◮ Conditional probabilities
◮ Conditional expectation
Limits

◮ Sequence of RVs X_N = X_1, X_2, ..., X_n, ...
  ⇒ Distinguish between random process X_N and realizations x_N
Q1) Say something about X_n for n large? ⇒ Not clear, X_n is a RV
Q2) Say something about x_n for n large? ⇒ Certainly, look at lim_{n→∞} x_n
Q3) Say something about P(X_n ∈ 𝒳) for n large? ⇒ Yes, lim_{n→∞} P(X_n ∈ 𝒳)
◮ Translate what we know about regular limits to definitions for RVs
◮ Can start from convergence of sequences: lim_{n→∞} x_n
  ⇒ Sure and almost sure convergence
◮ Or from convergence of probabilities: lim_{n→∞} P(X_n ∈ 𝒳)
  ⇒ Convergence in probability, in mean square and in distribution
Convergence of sequences and sure convergence

◮ Denote sequence of numbers x_N = x_1, x_2, ..., x_n, ...
◮ Def: Sequence x_N converges to the value x if given any ε > 0
  ⇒ There exists n_0 such that for all n > n_0, |x_n − x| < ε
◮ Sequence x_n comes arbitrarily close to its limit ⇒ |x_n − x| < ε
  ⇒ And stays close to its limit for all n > n_0
◮ Random process (sequence of RVs) X_N = X_1, X_2, ..., X_n, ...
  ⇒ Realizations of X_N are sequences x_N
◮ Def: We say X_N converges surely to RV X if
  ⇒ lim_{n→∞} x_n = x for all realizations x_N of X_N
◮ Said differently, lim_{n→∞} X_n(s) = X(s) for all s ∈ S
◮ Not really adequate. Even a (practically unimportant) outcome that happens with vanishingly small probability prevents sure convergence
Almost sure convergence

◮ RV X and random process X_N = X_1, X_2, ..., X_n, ...
◮ Def: We say X_N converges almost surely to RV X if
      P( lim_{n→∞} X_n = X ) = 1
  ⇒ Almost all sequences converge, except for a set of measure 0
◮ Almost sure convergence denoted as ⇒ lim_{n→∞} X_n = X a.s.
  ⇒ Limit X is a random variable

Example
◮ X_0 ∼ N(0, 1) (normal, mean 0, variance 1)
◮ Z_n sequence of Bernoulli RVs, parameter p
◮ Define ⇒ X_n = X_0 − Z_n / n
◮ Z_n / n → 0 so lim_{n→∞} X_n = X_0 a.s. (also surely)
[Figure: one sample path x_n for n = 1, ..., 100, converging to x_0]
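◮ A minimal simulation of this example (an added sketch; the Bernoulli parameter p = 0.5, horizon N, and seed are arbitrary choices). Every sample path stays within 1/n of x_0, so each realization converges:

```python
import numpy as np

rng = np.random.default_rng(2)
p, N = 0.5, 100
x0 = rng.standard_normal()                  # X_0 ~ N(0, 1)
n = np.arange(1, N + 1)
z = rng.binomial(1, p, size=N)              # Z_n ~ Bernoulli(p)
x = x0 - z / n                              # X_n = X_0 - Z_n / n
print(np.max(np.abs(x[-10:] - x0)))         # late deviations are at most 1/n -> 0
```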
Almost sure convergence example

◮ Consider S = [0, 1] and let P(·) be the uniform probability distribution
  ⇒ P([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1
◮ Define the RVs X_n(s) = s + s^n and X(s) = s
◮ For all s ∈ [0, 1) ⇒ s^n → 0 as n → ∞, hence X_n(s) → s = X(s)
◮ For s = 1 ⇒ X_n(1) = 2 for all n, while X(1) = 1
◮ Convergence only occurs on the set [0, 1), and P([0, 1)) = 1
  ⇒ We say lim_{n→∞} X_n = X a.s.
  ⇒ Once more, note the limit X is a random variable
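◮ The same mechanics in code (a sketch under the stated uniform model; the number of draws and seed are arbitrary): draws of s land in [0, 1) with probability 1, where s^n → 0 and hence X_n(s) → X(s) = s.

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(0.0, 1.0, size=5)        # samples from the uniform P on [0, 1]
for n in [1, 10, 100, 1000]:
    xn = s + s**n                        # X_n(s) = s + s^n
    print(n, np.max(np.abs(xn - s)))     # max over drawn s of |X_n(s) - X(s)| -> 0
```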
Convergence in probability

◮ Def: We say X_N converges in probability to RV X if for any ε > 0
      lim_{n→∞} P(|X_n − X| < ε) = 1
  ⇒ Prob. of distance |X_n − X| becoming smaller than ε tends to 1
◮ Statement is about probabilities, not about realizations (sequences)
  ⇒ Probability converges, realizations x_N may or may not converge
  ⇒ Limit and prob. interchanged with respect to a.s. convergence

Theorem
Almost sure (a.s.) convergence implies convergence in probability

Proof.
◮ If lim_{n→∞} X_n = X then for any ε > 0 there is n_0 such that |X_n − X| < ε for all n ≥ n_0
◮ True for almost all sequences, so P(|X_n − X| < ε) → 1
Convergence in probability example

◮ X_0 ∼ N(0, 1) (normal, mean 0, variance 1)
◮ Z_n sequence of Bernoulli RVs, parameter 1/n
◮ Define ⇒ X_n = X_0 − Z_n
◮ X_n converges in probability to X_0 because
      P(|X_n − X_0| < ε) = P(|Z_n| < ε) = 1 − P(Z_n = 1) = 1 − 1/n → 1
◮ Plot of path x_n up to n = 10^2, n = 10^3, n = 10^4
  ⇒ Z_n = 1 becomes ever rarer but still happens
[Figures: one sample path x_n over n = 1, ..., 10^2, 10^3, and 10^4]
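◮ A simulation of this example (horizon, number of runs, ε, and seed are arbitrary choices added here). It estimates P(|X_n − X_0| < ε) over independent runs and also shows that on a single path the disturbances Z_n = 1 keep occurring, only ever more rarely:

```python
import numpy as np

rng = np.random.default_rng(4)
eps, runs = 0.5, 10_000

# estimate P(|X_n - X_0| < eps) = 1 - 1/n over independent runs
for n in [10, 100, 1000]:
    z = rng.binomial(1, 1.0 / n, size=runs)     # Z_n ~ Bernoulli(1/n)
    print(n, np.mean(z < eps), 1 - 1.0 / n)     # here |X_n - X_0| = Z_n

# one sample path: disturbances Z_n = 1 still show up late in the horizon
N = 10_000
z_path = rng.binomial(1, 1.0 / np.arange(1, N + 1))
print("last disturbance at n =", np.max(np.nonzero(z_path)[0]) + 1)
```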
Difference between a.s. and in probability

◮ Almost sure convergence implies that almost all sequences converge
◮ Convergence in probability does not imply convergence of sequences
◮ Latter example: X_n = X_0 − Z_n, Z_n is Bernoulli with parameter 1/n
  ⇒ Showed it converges in probability, P(|X_n − X_0| < ε) = 1 − 1/n → 1
  ⇒ But for almost all sequences, lim_{n→∞} x_n does not exist
◮ Almost sure convergence ⇒ disturbances stop happening
◮ Convergence in prob. ⇒ disturbances happen with vanishing frequency
◮ Difference not irrelevant
  ◮ Interpret Z_n as rate of change in savings
  ◮ With a.s. convergence risk is eliminated
  ◮ With convergence in prob. risk decreases but does not disappear
Mean-square convergence

◮ Def: We say X_N converges in mean square to RV X if
      lim_{n→∞} E[|X_n − X|^2] = 0
  ⇒ Sometimes (very) easy to check

Theorem
Convergence in mean square implies convergence in probability

Proof.
◮ From Markov's inequality
      P(|X_n − X| ≥ ε) = P(|X_n − X|^2 ≥ ε^2) ≤ E[|X_n − X|^2] / ε^2
◮ If X_n → X in mean-square sense, E[|X_n − X|^2] / ε^2 → 0 for all ε

◮ Almost sure and mean square ⇒ neither one implies the other
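◮ For instance, in the earlier example X_n = X_0 − Z_n with Z_n ∼ Bernoulli(1/n), E[|X_n − X_0|^2] = E[Z_n^2] = 1/n → 0, so mean-square convergence is indeed easy to check. A quick Monte Carlo confirmation (an added sketch; sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
runs = 100_000
for n in [10, 100, 1000]:
    z = rng.binomial(1, 1.0 / n, size=runs)     # Z_n ~ Bernoulli(1/n)
    mse = np.mean(z.astype(float) ** 2)         # estimate of E|X_n - X_0|^2 = E[Z_n^2]
    print(n, mse, 1.0 / n)                      # matches the exact value 1/n
```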
Convergence in distribution

◮ Consider a random process X_N. Cdf of X_n is F_n(x)
◮ Def: We say X_N converges in distribution to RV X with cdf F_X(x) if
  ⇒ lim_{n→∞} F_n(x) = F_X(x) for all x at which F_X(x) is continuous
◮ No claim about individual sequences, just the cdf of X_n
  ⇒ Weakest form of convergence covered
◮ Implied by almost sure, in probability, and mean square convergence

Example
◮ Y_n ∼ N(0, 1)
◮ Z_n Bernoulli with parameter p
◮ Define ⇒ X_n = Y_n − 10 Z_n / n
◮ 10 Z_n / n → 0 so lim_{n→∞} F_n(x) "=" N(0, 1)
[Figure: one sample path x_n for n = 1, ..., 100]
Convergence in distribution (continued)

◮ Individual sequences x_n do not converge in any sense
  ⇒ It is the distribution that converges
[Figure: pdfs of X_n for n = 1, n = 10, and n = 100]
◮ As the effect of Z_n / n vanishes, the pdf of X_n converges to the pdf of Y_n
  ⇒ Standard normal N(0, 1)
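◮ A simulation sketch of this example (added here; numpy and scipy assumed available, and the Bernoulli parameter p = 0.5, sample size, grid, and seed are arbitrary choices): compare the empirical cdf of X_n = Y_n − 10 Z_n / n with the N(0, 1) cdf and watch the sup-norm gap shrink as n grows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
p, samples = 0.5, 100_000
grid = np.linspace(-5, 5, 201)

for n in [1, 10, 100]:
    y = rng.standard_normal(samples)                       # Y_n ~ N(0, 1)
    z = rng.binomial(1, p, size=samples)                   # Z_n ~ Bernoulli(p)
    x = y - 10 * z / n                                     # X_n = Y_n - 10 Z_n / n
    ecdf = np.mean(x[None, :] <= grid[:, None], axis=1)    # empirical cdf F_n on the grid
    print(n, np.max(np.abs(ecdf - norm.cdf(grid))))        # sup-norm gap to the N(0,1) cdf
```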