
One-Sided Chebyshev's Inequality: Of the Midterm What Say You (PDF document)



Inequality, Probability, and Joviality
• In many cases, we don't know the true form of a probability distribution
  - E.g., midterm scores
  - But, we know the mean
  - May also have other measures/properties
    o Variance
    o Non-negativity
    o Etc.
  - Inequalities and bounds still allow us to say something about the probability distribution in such cases
    o May be imprecise compared to knowing true distribution!

Markov's Inequality
• Say X is a non-negative random variable
    $P(X \ge a) \le \frac{E[X]}{a}$, for all $a > 0$
• Proof:
  - $I = 1$ if $X \ge a$, 0 otherwise
  - Since $X \ge 0$, $I \le \frac{X}{a}$
  - Taking expectations:
    $E[I] = P(X \ge a) \le E\left[\frac{X}{a}\right] = \frac{E[X]}{a}$

Andrey Andreyevich Markov
• Andrey Andreyevich Markov (1856-1922) was a Russian mathematician
  - Markov's Inequality is named after him
  - He also invented Markov Chains…
    o …which are the basis for Google's PageRank algorithm
  - His facial hair inspires fear in Charlie Sheen

Markov and the Midterm
• Statistics from last quarter's CS109 midterm
  - X = midterm score
  - Using sample mean $\bar{X} = 78.1 \approx E[X]$
  - What is $P(X \ge 91)$?
    $P(X \ge 91) \le \frac{E[X]}{91} = \frac{78.1}{91} \approx 0.8582$
  - Markov bound: ≤ 85.82% of class scored 91 or greater
  - In fact, 34.44% of class scored 91 or greater
    o Markov inequality can be a very loose bound
    o But, it made no assumption at all about form of distribution!
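The Markov bound on the midterm can be reproduced in a few lines. This is only a sketch: the helper name `markov_bound` is made up for illustration, and the bound is capped at 1 since a probability can never exceed 1.

```python
def markov_bound(mean, a):
    """Markov upper bound on P(X >= a) for a non-negative X with E[X] = mean.

    `markov_bound` is a hypothetical helper name, not part of any library.
    """
    if mean < 0 or a <= 0:
        raise ValueError("Markov's inequality needs E[X] >= 0 and a > 0")
    # E[X] / a can exceed 1 when a < E[X]; cap it, since it bounds a probability.
    return min(1.0, mean / a)


# Midterm numbers from the slide: sample mean 78.1, threshold 91.
bound = markov_bound(78.1, 91)
print(f"P(X >= 91) <= {bound:.4f}")  # prints: P(X >= 91) <= 0.8582
```

The function only needs the mean, which is exactly why the bound is so loose compared with the observed 34.44%.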
Chebyshev's Inequality
• X is a random variable with $E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$
    $P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$, for all $k > 0$
• Proof:
  - Since $(X - \mu)^2$ is a non-negative random variable, apply Markov's Inequality with $a = k^2$:
    $P((X - \mu)^2 \ge k^2) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$
  - Note that $(X - \mu)^2 \ge k^2 \iff |X - \mu| \ge k$, yielding:
    $P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$

Pafnuty Chebyshev
• Pafnuty Lvovich Chebyshev (1821-1894) was also a Russian mathematician
  - Chebyshev's Inequality is named after him
    o But actually formulated by his colleague Irénée-Jules Bienaymé
  - He was Markov's doctoral advisor
    o And sometimes credited with first deriving Markov's Inequality
  - There is a crater on the moon named in his honor
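To see Chebyshev's Inequality at work on a distribution we can compute exactly, here is a small sketch using a fair six-sided die. The die example and the helper name `chebyshev_bound` are assumptions chosen for illustration; they do not come from the slides.

```python
from fractions import Fraction


def chebyshev_bound(variance, k):
    """Chebyshev upper bound on P(|X - mu| >= k), capped at 1 (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1, variance / k**2)


# Assumed example: a fair six-sided die, each face with probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)
mu = sum(p * x for x in outcomes)               # E[X] = 7/2
var = sum(p * (x - mu) ** 2 for x in outcomes)  # Var(X) = 35/12

k = 2
# Exact probability: only the faces 1 and 6 are at least 2 away from 3.5.
exact = sum(p for x in outcomes if abs(x - mu) >= k)
bound = chebyshev_bound(var, k)

print(f"exact P(|X - mu| >= {k}) = {exact}")  # 1/3
print(f"Chebyshev bound          = {bound}")  # 35/48
```

Exact fractions keep the comparison free of rounding noise: the true probability is 1/3, while the bound only guarantees at most 35/48.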

Of the Midterm What Say You Chebyshev?
• Statistics from last quarter's CS109 midterm
  - X = midterm score
  - Using sample mean $\bar{X} = 78.1 \approx E[X]$
  - Using sample variance $S^2 = (24.5)^2 = 600.25 \approx \sigma^2$
  - What is $P(|X - 78.1| \ge 30)$?
    $P(|X - E[X]| \ge 30) \le \frac{\sigma^2}{(30)^2} = \frac{600.25}{900} \approx 0.6669$
    $P(|X - E[X]| < 30) = 1 - P(|X - E[X]| \ge 30) \ge 1 - 0.6669 = 0.3331$
  - Chebyshev bound: ≤ 66.69% scored ≥ 108.1 or ≤ 48.1
  - In fact, 21.85% of class scored ≥ 108.1 or ≤ 48.1
    o Chebyshev's inequality is really a theoretical tool

One-Sided Chebyshev's Inequality
• X is a random variable with $E[X] = 0$, $\mathrm{Var}(X) = \sigma^2$
    $P(X \ge a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
  - Equivalently, when $E[Y] = \mu$ and $\mathrm{Var}(Y) = \sigma^2$:
    $P(Y \ge E[Y] + a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
    $P(Y \le E[Y] - a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
  - Follows directly by setting $X = Y - E[Y]$, noting $E[X] = 0$

Comments on Midterm, One-Sided One?
• Statistics from last quarter's CS109 midterm
  - X = midterm score
  - Using sample mean $\bar{X} = 78.1 \approx E[X]$
  - Using sample variance $S^2 = (24.5)^2 = 600.25 \approx \sigma^2$
  - What is $P(X \ge 103.1)$?
    $P(X \ge 78.1 + 25) \le \frac{600.25}{600.25 + (25)^2} \approx 0.4899$
  - One-sided Chebyshev bound: ≤ 48.99% scored ≥ 103.1
  - In fact, 13.26% of class scored ≥ 103.1
  - Using Markov's inequality:
    $P(X \ge 103.1) \le \frac{78.1}{103.1} \approx 0.7575$

Chernoff Bound
• Say we have MGF, $M(t)$, for a random variable X
  - Chernoff bounds:
    $P(X \ge a) \le e^{-ta} M(t)$, for all $t > 0$
    $P(X \le a) \le e^{-ta} M(t)$, for all $t < 0$
  - Bounds hold for $t \ne 0$, so use t that minimizes $e^{-ta} M(t)$
• Proof:
  - X has MGF: $M(t) = E[e^{tX}]$
  - Note $P(X \ge a) = P(e^{tX} \ge e^{ta})$, use Markov's inequality:
    $P(X \ge a) = P(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}} = e^{-ta} E[e^{tX}] = e^{-ta} M(t)$, for all $t > 0$
  - Similarly for $P(X \le a)$ when $t < 0$

Herman Chernoff
• Herman Chernoff (1923-) is an American mathematician and statistician
  - Chernoff Bound is named after him
    o And it actually was derived by him!
  - He is Professor Emeritus of Applied Mathematics at MIT and of Statistics at Harvard University
    o I do not know if he is a fan of Charlie Sheen

Chernoff's Feeling (Unit) Normal
• Z is standard normal random variable: $Z \sim N(0, 1)$
  - Moment generating function: $M_Z(t) = e^{t^2/2}$
  - Chernoff bounds for $P(Z \ge a)$:
    $P(Z \ge a) \le e^{-ta} e^{t^2/2} = e^{t^2/2 - ta}$, for all $t > 0$
  - To minimize bound, minimize: $t^2/2 - ta$
    o Differentiate w.r.t. t, and set to 0: $t - a = 0 \Rightarrow t = a$
    $P(Z \ge a) \le e^{-a^2/2}$, for all $t = a > 0$
  - Can proceed similarly for $t = a < 0$ to obtain:
    $P(Z \le a) \le e^{-a^2/2}$, for all $t = a < 0$
  - Compare to:
    $P(Z \ge z) = 1 - P(Z < z) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\,dx$
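The midterm bounds and the standard normal Chernoff bound from the slides above can be checked numerically. The sketch below uses only Python's standard `math` module; the function names are hypothetical, and the exact normal tail is computed with `math.erfc` rather than by integrating the density.

```python
import math


def one_sided_chebyshev_bound(variance, a):
    """Upper bound on P(Y >= E[Y] + a) for a > 0 (hypothetical helper name)."""
    return variance / (variance + a**2)


def normal_chernoff_bound(a):
    """Chernoff bound e^{-a^2/2} on P(Z >= a) for Z ~ N(0, 1), a > 0."""
    return math.exp(-a * a / 2)


def normal_upper_tail(a):
    """Exact P(Z >= a) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))


# Midterm statistics from the slides.
mean, var = 78.1, 600.25
print(f"two-sided Chebyshev, k = 30:  {var / 30**2:.4f}")                         # 0.6669
print(f"one-sided Chebyshev, a = 25:  {one_sided_chebyshev_bound(var, 25):.4f}")  # 0.4899
print(f"Markov, threshold 103.1:      {mean / 103.1:.4f}")                        # 0.7575

# Chernoff bound versus the exact standard normal tail.
for a in (1.0, 2.0, 3.0):
    print(f"a = {a}: exact = {normal_upper_tail(a):.5f}, "
          f"Chernoff = {normal_chernoff_bound(a):.5f}")
```

For every positive `a` the exact tail sits below the Chernoff bound, and the one-sided Chebyshev bound (0.4899) is tighter than Markov's (0.7575) on the same midterm question because it also uses the variance.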

Chernoff's Poisson Pill
• X is Poisson random variable: $X \sim \mathrm{Poi}(\lambda)$
  - Moment generating function: $M_X(t) = e^{\lambda(e^t - 1)}$
  - Chernoff bounds for $P(X \ge i)$:
    $P(X \ge i) \le e^{\lambda(e^t - 1)} e^{-it} = e^{\lambda(e^t - 1) - it}$, for all $t > 0$
  - To minimize bound, minimize: $\lambda(e^t - 1) - it$
    o Differentiate w.r.t. t, and set to 0: $\lambda e^t - i = 0 \Rightarrow e^t = i/\lambda$
    $P(X \ge i) \le e^{\lambda(i/\lambda - 1)} \left(\frac{\lambda}{i}\right)^i = e^{-\lambda} \left(\frac{e\lambda}{i}\right)^i$, for all $i/\lambda > 1$
  - Compare to: $P(X = i) = \frac{\lambda^i}{i!} e^{-\lambda}$

Jensen's Inequality
• If $f(x)$ is a convex function then $E[f(X)] \ge f(E[X])$
  - $f(x)$ is convex if $f''(x) \ge 0$ for all x
  - Intuition: Convex = "bowl". E.g.: $f(x) = x^2$, $f(x) = e^x$
  - If $g(x) = -f(x)$ is convex, then $f(x)$ is concave
  - Proof outline: Taylor series of $f(x)$ about $\mu$. Be happy.
  - Note: $E[f(X)] = f(E[X])$ only holds when $f(x)$ is a line
    o That is when: $f''(x) = 0$ for all x

Johan Jensen
• Johan Ludwig William Valdemar Jensen (1859-1925) was a Danish mathematician
  - He derived Jensen's inequality
  - He was president of the Danish Mathematical Society from 1892 to 1903
  - He has more names than Charlie Sheen

A Brief Digression on Utility Theory
• Utility U(x) is "value" you derive from x
  [Figure: decision tree. Play? yes: $20,000 with probability 0.5 or $0 with probability 0.5; no: $10,000. A second tree shows the same choice in utilities: U($20,000), U($0), U($10,000).]
  - Can be monetary, but often includes intangibles
    o E.g., quality of life, life expectancy, personal beliefs, etc.
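Returning to the Poisson Chernoff bound from earlier in this part, the optimized bound can be compared with the exact tail probability. This is a sketch with hypothetical function names; the values λ = 2 and i = 5 are assumptions chosen only to satisfy the condition i/λ > 1.

```python
import math


def poisson_chernoff_bound(lam, i):
    """Optimized Chernoff bound on P(X >= i) for X ~ Poi(lam), valid when i > lam > 0."""
    if not (i > lam > 0):
        raise ValueError("the optimized bound needs i / lam > 1 and lam > 0")
    # e^{lam (i/lam - 1)} (lam/i)^i, written as e^{i - lam} (lam/i)^i.
    return math.exp(i - lam) * (lam / i) ** i


def poisson_upper_tail(lam, i):
    """Exact P(X >= i) = 1 - sum_{j < i} e^{-lam} lam^j / j! for integer i >= 0."""
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(i))
    return 1.0 - below


lam, i = 2.0, 5  # assumed example values with i / lam > 1
print(f"exact P(X >= {i}) = {poisson_upper_tail(lam, i):.4f}")
print(f"Chernoff bound    = {poisson_chernoff_bound(lam, i):.4f}")
```

The guard clause mirrors the restriction on the slide: the minimizing choice $e^t = i/\lambda$ only gives a valid $t > 0$ when $i > \lambda$.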
Jensen's Investment Advice
• Example: risk-taking investor, with two choices:
  - Choice 1: Invest money to get return X where $E[X] = \mu$
  - Choice 2: Invest money to get return $\mu$ (probability 1)
• Want to maximize utility: $u(R)$, where R is return
  - If u is convex then $E[u(X)] \ge u(\mu)$, so choice 1 better
  - If u is concave then $E[u(X)] \le u(\mu)$, so choice 2 better
  - Convex u ⇒ "risk preferring", concave u ⇒ "risk averse"

Utility Curves
[Figure: utility curve, with Dollars on the horizontal axis and Utility on the vertical axis.]
• Utility curve determines your "risk preference"
  - Can be different in different parts of the curve
  - We'll talk more about this near the end of the quarter
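Jensen's investment advice can be illustrated with the $20,000-or-$0 gamble from the utility theory slide. The two utility functions below, u(x) = x² (convex) and u(x) = √x (concave), are assumptions picked for illustration; the slides do not specify any particular utility curve.

```python
import math


def expected_value(lottery, f=lambda x: x):
    """E[f(X)] for a discrete lottery given as (outcome, probability) pairs."""
    return sum(p * f(x) for x, p in lottery)


# The gamble from the utility theory slide: $20,000 or $0, each with probability 0.5.
gamble = [(20_000, 0.5), (0, 0.5)]
mu = expected_value(gamble)  # the sure return of choice 2, $10,000


def convex_u(x):
    """Assumed 'risk preferring' utility: u(x) = x^2."""
    return x**2


def concave_u(x):
    """Assumed 'risk averse' utility: u(x) = sqrt(x)."""
    return math.sqrt(x)


for name, u in (("convex", convex_u), ("concave", concave_u)):
    gamble_utility = expected_value(gamble, u)  # E[u(X)], choice 1
    sure_utility = u(mu)                        # u(E[X]), choice 2
    better = "choice 1 (gamble)" if gamble_utility > sure_utility else "choice 2 (sure thing)"
    print(f"{name}: E[u(X)] = {gamble_utility:.2f}, u(mu) = {sure_utility:.2f} -> {better}")
```

Both choices have the same expected return, so the preference is driven entirely by the curvature of u, which is what Jensen's inequality predicts.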
