Inequality, Probability, and Joviality
• In many cases, we don’t know the true form of a probability distribution
    E.g., Midterm scores
    But, we know the mean
    May also have other measures/properties
      o Variance
      o Non-negativity
      o Etc.
    Inequalities and bounds still allow us to say something about the probability distribution in such cases
      o May be imprecise compared to knowing true distribution!

Markov’s Inequality
• Say X is a non-negative random variable
    $P(X \ge a) \le \frac{E[X]}{a}$, for all $a > 0$
• Proof:
    I = 1 if X ≥ a, 0 otherwise
    Since X ≥ 0, $I \le \frac{X}{a}$
    Taking expectations:
    $E[I] = P(X \ge a) \le E\left[\frac{X}{a}\right] = \frac{E[X]}{a}$

Andrey Andreyevich Markov
• Andrey Andreyevich Markov (1856-1922) was a Russian mathematician
    Markov’s Inequality is named after him
    He also invented Markov Chains…
      o …which are the basis for Google’s PageRank algorithm
    His facial hair inspires fear in Charlie Sheen

Markov and the Midterm
• Statistics from last quarter’s CS109 midterm
    X = midterm score
    Using sample mean $\bar{X} = 78.1 \approx E[X]$
    What is P(X ≥ 91)?
    $P(X \ge 91) \le \frac{E[X]}{91} = \frac{78.1}{91} \approx 0.8582$
    Markov bound: at most 85.82% of class scored 91 or greater
    In fact, 34.44% of class scored 91 or greater
      o Markov inequality can be a very loose bound
      o But, it made no assumption at all about form of distribution!

Chebyshev’s Inequality
• X is a random variable with $E[X] = \mu$, $Var(X) = \sigma^2$
    $P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$, for all $k > 0$
• Proof:
    Since $(X - \mu)^2$ is a non-negative random variable, apply Markov’s Inequality with $a = k^2$
    $P((X - \mu)^2 \ge k^2) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$
    Note that: $(X - \mu)^2 \ge k^2 \iff |X - \mu| \ge k$, yielding:
    $P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$

Pafnuty Chebyshev
• Pafnuty Lvovich Chebyshev (1821-1894) was also a Russian mathematician
    Chebyshev’s Inequality is named after him
      o But actually formulated by his colleague Irénée-Jules Bienaymé
    He was Markov’s doctoral advisor
      o And sometimes credited with first deriving Markov’s Inequality
    There is a crater on the moon named in his honor
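The Markov and Chebyshev bounds above can be tried numerically. The following is a minimal Python sketch, not part of the original slides: the helper names `markov_bound` and `chebyshev_bound` are hypothetical, and the fair six-sided die is an assumed toy distribution, chosen only because its exact probabilities are easy to enumerate. The one figure taken from the slides is the midterm sample mean 78.1 with threshold 91.

```python
from fractions import Fraction


def markov_bound(mean, a):
    """Markov: upper bound on P(X >= a) for a non-negative X with E[X] = mean."""
    if a <= 0:
        raise ValueError("a must be positive")
    return min(1, mean / a)  # a probability can never exceed 1


def chebyshev_bound(variance, k):
    """Chebyshev: upper bound on P(|X - mu| >= k) when Var(X) = variance."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1, variance / k**2)


# Figures quoted in the slides: sample mean 78.1, threshold 91.
midterm_markov = markov_bound(78.1, 91)

# Assumed toy distribution (not from the slides): one roll of a fair die.
faces = range(1, 7)
p = Fraction(1, 6)
die_mean = sum(p * x for x in faces)
die_var = sum(p * (x - die_mean) ** 2 for x in faces)

# Exact probabilities by enumeration, to compare against the bounds.
exact_tail = sum(p for x in faces if x >= 5)
exact_dev = sum(p for x in faces if abs(x - die_mean) >= Fraction(5, 2))

die_markov = markov_bound(die_mean, 5)
die_chebyshev = chebyshev_bound(die_var, Fraction(5, 2))

print(f"midterm Markov bound on P(X >= 91): {midterm_markov:.4f}")
print(f"die: P(X >= 5) = {float(exact_tail):.4f}, Markov bound = {float(die_markov):.4f}")
print(f"die: P(|X - 3.5| >= 2.5) = {float(exact_dev):.4f}, "
      f"Chebyshev bound = {float(die_chebyshev):.4f}")
```

In both die cases the exact probability sits well below the bound, which matches the slides' remark that these inequalities can be loose while requiring almost no knowledge of the distribution.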
Of the Midterm What Say You Chebyshev?
• Statistics from last quarter’s CS109 midterm
    X = midterm score
    Using sample mean $\bar{X} = 78.1 \approx E[X]$
    Using sample variance $S^2 = (24.5)^2 = 600.25 \approx \sigma^2$
    What is P(|X – 78.1| ≥ 30)?
    $P(|X - E[X]| \ge 30) \le \frac{\sigma^2}{(30)^2} = \frac{600.25}{900} \approx 0.6669$
    $P(|X - E[X]| < 30) = 1 - P(|X - E[X]| \ge 30) \ge 1 - 0.6669 = 0.3331$
    Chebyshev bound: at most 66.69% scored ≥ 108.1 or ≤ 48.1
    In fact, 21.85% of class scored ≥ 108.1 or ≤ 48.1
      o Chebyshev’s inequality is really a theoretical tool

One-Sided Chebyshev’s Inequality
• X is a random variable with $E[X] = 0$, $Var(X) = \sigma^2$
    $P(X \ge a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
• Equivalently, when $E[Y] = \mu$ and $Var(Y) = \sigma^2$:
    $P(Y \ge E[Y] + a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
    $P(Y \le E[Y] - a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
    Follows directly by setting X = Y – E[Y], noting E[X] = 0

Comments on Midterm, One-Sided One?
• Statistics from last quarter’s CS109 midterm
    X = midterm score
    Using sample mean $\bar{X} = 78.1 \approx E[X]$
    Using sample variance $S^2 = (24.5)^2 = 600.25 \approx \sigma^2$
    What is P(X ≥ 103.1)?
    $P(X \ge 78.1 + 25) \le \frac{600.25}{600.25 + (25)^2} \approx 0.4899$
    One-sided Chebyshev bound: at most 48.99% scored ≥ 103.1
    In fact, 13.26% of class scored ≥ 103.1
    Using Markov’s inequality: $P(X \ge 103.1) \le \frac{78.1}{103.1} \approx 0.7575$

Chernoff Bound
• Say we have MGF, M(t), for a random variable X
    Chernoff bounds:
    $P(X \ge a) \le e^{-ta} M(t)$, for all $t > 0$
    $P(X \le a) \le e^{-ta} M(t)$, for all $t < 0$
    Bounds hold for all $t \ne 0$ (with the sign of t matching the bound), so use the t that minimizes $e^{-ta} M(t)$
• Proof:
    X has MGF: $M(t) = E[e^{tX}]$
    Note $P(X \ge a) = P(e^{tX} \ge e^{ta})$ for $t > 0$, use Markov’s inequality:
    $P(X \ge a) = P(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}} = e^{-ta} E[e^{tX}] = e^{-ta} M(t)$, for all $t > 0$
    Similarly for P(X ≤ a) when t < 0

Herman Chernoff
• Herman Chernoff (1923-) is an American mathematician and statistician
    Chernoff Bound is named after him
      o And it actually was derived by him!
    He is Professor Emeritus of Applied Mathematics at MIT and of Statistics at Harvard University
      o I do not know if he is a fan of Charlie Sheen

Chernoff’s Feeling (Unit) Normal
• Z is standard normal random variable: Z ~ N(0, 1)
    Moment generating function: $M_Z(t) = e^{t^2/2}$
    Chernoff bounds for P(Z ≥ a)
    $P(Z \ge a) \le e^{-ta} e^{t^2/2} = e^{t^2/2 - ta}$, for all $t > 0$
    To minimize bound, minimize: $t^2/2 - ta$
      o Differentiate w.r.t. t, and set to 0: $t - a = 0 \Rightarrow t = a$
    $P(Z \ge a) \le e^{-a^2/2}$, for all $t = a > 0$
    Can proceed similarly for t = a < 0 to obtain:
    $P(Z \le a) \le e^{-a^2/2}$, for all $t = a < 0$
    Compare to: $P(Z \ge z) = 1 - P(Z < z) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2} \, dx$
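The one-sided Chebyshev calculation and the Chernoff bound for the standard normal can be checked with a short Python sketch. This is an illustration under stated assumptions rather than anything from the slides: the function names are hypothetical, the grid search over t is a crude stand-in for the calculus step (differentiate and set to 0), and the exact normal tail is computed with the standard-library `math.erfc`.

```python
import math


def one_sided_chebyshev(variance, a):
    """Upper bound on P(Y >= E[Y] + a) for a > 0, given Var(Y) = variance."""
    if a <= 0:
        raise ValueError("a must be positive")
    return variance / (variance + a**2)


def chernoff_objective_normal(t, a):
    """e^{-ta} * M_Z(t) with M_Z(t) = e^{t^2/2}, the quantity to minimize over t > 0."""
    return math.exp(t * t / 2 - t * a)


def chernoff_normal(a):
    """Chernoff bound on P(Z >= a) for a > 0 after choosing t = a."""
    return math.exp(-a * a / 2)


def normal_upper_tail(a):
    """Exact P(Z >= a) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))


# Midterm figures from the slides: variance 600.25, threshold 78.1 + 25.
midterm_one_sided = one_sided_chebyshev(600.25, 25)

# Assumed illustration value a = 2: search a grid of t in (0, 5] for the minimizer.
a = 2.0
grid = [i / 1000 for i in range(1, 5001)]
best_t = min(grid, key=lambda t: chernoff_objective_normal(t, a))

print(f"one-sided Chebyshev bound on P(X >= 103.1): {midterm_one_sided:.4f}")
print(f"grid minimizer of e^(-ta) M(t) for a = {a}: t = {best_t}")
for value in (0.5, 1.0, 2.0, 3.0):
    print(f"a = {value}: exact tail = {normal_upper_tail(value):.5f}, "
          f"Chernoff bound = {chernoff_normal(value):.5f}")
```

The grid minimizer lands on t = a, in line with the derivation, and the exact tail stays below the Chernoff bound at every value tried.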
Chernoff’s Poisson Pill
• X is Poisson random variable: X ~ Poi($\lambda$)
    Moment generating function: $M_X(t) = e^{\lambda(e^t - 1)}$
    Chernoff bounds for P(X ≥ i)
    $P(X \ge i) \le e^{\lambda(e^t - 1)} e^{-it} = e^{\lambda(e^t - 1) - it}$, for all $t > 0$
    To minimize bound, minimize: $\lambda(e^t - 1) - it$
      o Differentiate w.r.t. t, and set to 0: $\lambda e^t - i = 0 \Rightarrow e^t = i/\lambda$
    $P(X \ge i) \le e^{\lambda(i/\lambda - 1)} \left(\frac{\lambda}{i}\right)^i = e^{i - \lambda} \left(\frac{\lambda}{i}\right)^i = \frac{e^{-\lambda} (e\lambda)^i}{i^i}$, for all $i/\lambda > 1$
    Compare to: $P(X = i) = \frac{e^{-\lambda} \lambda^i}{i!}$

Jensen’s Inequality
• If f(x) is a convex function then $E[f(X)] \ge f(E[X])$
    f(x) is convex if $f''(x) \ge 0$ for all x
    Intuition: Convex = “bowl”. E.g.: $f(x) = x^2$, $f(x) = e^x$
    If g(x) = -f(x) is convex, then f(x) is concave
    Proof outline: Taylor series of f(x) about $\mu$. Be happy.
    Note: $E[f(X)] = f(E[X])$ only holds when f(x) is a line
      o That is when: $f''(x) = 0$ for all x

Johan Jensen
• Johan Ludwig William Valdemar Jensen (1859-1925) was a Danish mathematician
    He derived Jensen’s inequality
    He was president of the Danish Mathematical Society from 1892 to 1903
    He has more names than Charlie Sheen

A Brief Digression on Utility Theory
• Utility U(x) is “value” you derive from x
    [Figure: decision tree. Play? yes: $20,000 with probability 0.5, $0 with probability 0.5; no: $10,000]
    [Figure: the same tree with payoffs replaced by U($20,000), U($0), and U($10,000)]
    Can be monetary, but often includes intangibles
      o E.g., quality of life, life expectancy, personal beliefs, etc.

Jensen’s Investment Advice
• Example: risk-taking investor, with two choices:
    Choice 1: Invest money to get return X where $E[X] = \mu$
    Choice 2: Invest money to get return $\mu$ (probability 1)
• Want to maximize utility: u(R), where R is return
    If u(X) convex then $E[u(X)] \ge u(\mu)$, so choice 1 better
    If u(X) concave then $E[u(X)] \le u(\mu)$, so choice 2 better
    Convex u “risk preferring”, concave u “risk averse”

Utility Curves
    [Figure: utility curve, vertical axis Utility, horizontal axis Dollars]
• Utility curve determines your “risk preference”
    Can be different in different parts of the curve
    We’ll talk more about this near the end of the quarter
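Both the Poisson Chernoff bound and the Jensen-based investment comparison can be illustrated with a few lines of Python. This is a sketch under assumptions that are not in the slides: $\lambda = 4$ and $i = 8$ are arbitrary illustration values, the function names are hypothetical, and the utility curves $u(x) = x^2$ (convex) and $u(x) = \sqrt{x}$ (concave) are stand-ins chosen for simplicity. The only figures taken from the slides are the gamble payoffs of $20,000 or $0 with probability 0.5 each, against a sure $10,000.

```python
import math


def chernoff_poisson(lam, i):
    """Optimized Chernoff bound on P(X >= i) for X ~ Poi(lam); requires i > lam."""
    if i <= lam:
        raise ValueError("the optimized bound needs i / lam > 1")
    return math.exp(i - lam) * (lam / i) ** i


def poisson_upper_tail(lam, i):
    """Exact P(X >= i) for X ~ Poi(lam), as one minus the PMF summed over 0..i-1."""
    below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(i))
    return 1.0 - below


# Assumed illustration values (not from the slides).
lam, i = 4, 8
poisson_bound = chernoff_poisson(lam, i)
poisson_exact = poisson_upper_tail(lam, i)

# Gamble from the utility slide: (probability, payoff) pairs.
gamble = [(0.5, 20_000), (0.5, 0)]
mean_return = sum(prob * payoff for prob, payoff in gamble)  # the sure-thing payoff


def expected_utility(u):
    """E[u(X)] for the gamble above."""
    return sum(prob * u(payoff) for prob, payoff in gamble)


def convex_u(x):
    """Assumed risk-preferring utility curve."""
    return x**2


def concave_u(x):
    """Assumed risk-averse utility curve."""
    return math.sqrt(x)


print(f"Poisson: exact tail = {poisson_exact:.4f}, Chernoff bound = {poisson_bound:.4f}")
print(f"convex u:  E[u(X)] = {expected_utility(convex_u):.1f}, "
      f"u(E[X]) = {convex_u(mean_return):.1f}")
print(f"concave u: E[u(X)] = {expected_utility(concave_u):.2f}, "
      f"u(E[X]) = {concave_u(mean_return):.2f}")
```

With the convex curve the gamble has the higher expected utility, and with the concave curve the sure $10,000 does, which is the pattern the investment example describes.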