Standing Between a Bayesian and a Frequentist: An Empirical Bayes Exploration of Movies, Baseball, and Long Beach Basketball
Arthur Berg
Pennsylvania State University
Introduction Bayes Estimation Empirical Bayes Basketball Arthur Berg Standing Between a Bayesian and a Frequentist 2 / 28
Bayesian and Frequentist Representatives
Sir Ronald Fisher FRS (1890-1962): English Statistician, Evolutionary Biologist, Geneticist. "Let the data speak for itself."
Rev. Thomas Bayes FRS (1702-1761): English Mathematician, Presbyterian Minister.
\[ P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} \]
Bayes Estimator as a Convex Combination
1st Goal: List the top 250 movies of all time.
Movies are rated on a scale of 1 to 10.
Some movies are rated by many people, and some by only a few.
Movies with fewer than 3000 votes are not considered.
The average rating across all movies is C = 6.9.
⋆ $\mu_i$ represents the mean rating by everyone who has seen movie i.
⋆ The real goal is to construct the best estimate of $\mu_i$, then pick the top 250.
The frequentist approach uses only $\bar{X}_i$, the average rating for movie i:
\[ \hat{\mu}_i^{(\text{Fisher})} = \bar{X}_i \]
The Bayesian approach shrinks $\bar{X}_i$ towards C, with more shrinking applied when the number of votes for movie i is small:
\[ \hat{\mu}_i^{(\text{Bayes})} = \alpha_i \bar{X}_i + (1 - \alpha_i)\, C, \qquad \alpha_i \in (0, 1) \]
Internet Movie Database: Top 250
Rank  WR   R    Title                                      Votes
1     9.2  9.2  The Shawshank Redemption (1994)            546,155
2     9.1  9.2  The Godfather (1972)                       427,961
3     9.0  9.0  The Godfather: Part II (1974)              257,643
4     8.9  9.0  The Good, the Bad and the Ugly (1966)      170,045
5     8.9  9.0  Pulp Fiction (1994)                        436,456
6     8.9  8.9  Inception (2010)                           265,531
7     8.9  8.9  Schindler’s List (1993)                    289,170
8     8.9  8.9  12 Angry Men (1957)                        126,983
9     8.8  8.9  One Flew Over the Cuckoo’s Nest (1975)     225,419
10    8.8  8.9  The Dark Knight (2008)                     487,800
⋯
85    8.5  8.7  Black Swan (2010)                          20,326
⋯
142   8.2  8.3  Avatar (2009)                              285,005
⋯
240   8.0  8.5  True Grit (2010)                           6,444
IMDb Weighted Ranking: “a true Bayesian estimate”
\[ WR_i = \frac{v_i R_i + m C}{v_i + m} = \underbrace{\frac{v_i}{v_i + m}}_{\alpha_i} R_i + \underbrace{\frac{m}{v_i + m}}_{1 - \alpha_i} C \]
▸ $R_i$ = average rating of movie i ($\bar{X}_i$)
▸ $v_i$ = total number of votes from regular voters
▸ $m$ = minimum number of votes to make the list = 3000
▸ $C$ = grand mean across all movies in the database = 6.9
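As a quick check of the weighted-rating formula, the sketch below recomputes WR for two rows of the Top 250 table. The function name `weighted_rating` is illustrative and not part of the slides; the inputs are the rounded R values shown in the table, so results can differ slightly from IMDb's own figures.

```python
def weighted_rating(R, v, m=3000, C=6.9):
    """Shrink a movie's raw mean rating R toward the grand mean C.

    v is the number of votes, m the minimum vote count (3000 on the slide),
    and C the grand mean rating (6.9 on the slide).
    """
    alpha = v / (v + m)  # weight on the movie's own average
    return alpha * R + (1 - alpha) * C


# (title, raw average R, votes v) as listed in the Top 250 table
movies = [
    ("Black Swan (2010)", 8.7, 20326),
    ("True Grit (2010)", 8.5, 6444),
]

for title, R, v in movies:
    print(f"{title}: WR = {weighted_rating(R, v):.1f}")
```

True Grit, with only 6,444 votes, is pulled about half a point toward C, while a movie with hundreds of thousands of votes barely moves.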
A Bayesian Calculation
$X_i = (X_{i,1}, \dots, X_{i,v_i})$ represents the $v_i$ ratings of movie i.
prior: $\mu_i \sim N(\mu_0, \sigma_0^2)$
conditional: $X_{i,j} \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2)$, $j = 1, \dots, v_i$
\[ \hat{\mu}_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \left( \frac{v_i}{v_i + \sigma^2/\sigma_0^2} \right) \bar{X}_i + \left( \frac{\sigma^2/\sigma_0^2}{v_i + \sigma^2/\sigma_0^2} \right) \mu_0 \]
Setting $\mu_0 = C$ and $m = \sigma^2/\sigma_0^2$ recovers the IMDb formula:
\[ \hat{\mu}_i^{(\text{Bayes})} = \frac{v_i}{v_i + m} R_i + \frac{m}{v_i + m} C \]
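The identity can be checked numerically. The sketch below computes the posterior mean by the usual precision weighting for a normal prior with normal data, and compares it with the shrinkage form. The variance values are hypothetical, chosen only so that $\sigma^2/\sigma_0^2 = 3000$; the slides do not give them.

```python
def posterior_mean(xbar, v, mu0, sigma2, sigma0_2):
    """Posterior mean of mu_i under a normal prior and normal likelihood."""
    prior_precision = 1.0 / sigma0_2   # precision of the prior on mu_i
    data_precision = v / sigma2        # precision of the sample mean of v ratings
    total = prior_precision + data_precision
    return (prior_precision * mu0 + data_precision * xbar) / total


def shrinkage_form(xbar, v, mu0, m):
    """Same estimate written as a convex combination with m = sigma2 / sigma0_2."""
    return (v / (v + m)) * xbar + (m / (v + m)) * mu0


# Hypothetical variances (not from the slides), giving m = 3000
sigma2, sigma0_2 = 3.0, 0.001
a = posterior_mean(8.5, 6444, 6.9, sigma2, sigma0_2)
b = shrinkage_form(8.5, 6444, 6.9, sigma2 / sigma0_2)
print(abs(a - b) < 1e-9)
```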
1. Does shrinking really help?
2. How much to shrink by?
\[ \text{Prediction Error} = \sum_{i=1}^{n} (\mu_i - \hat{\mu}_i)^2 \]
Standing Between a Bayesian and a Frequentist
▸ In 1956, Charles Stein proved the existence of an estimator better than the sample mean under certain assumptions.
▸ In 1961, Willard James and Charles Stein explicitly constructed such an estimator.
The James-Stein Estimator ($n \ge 4$)
$\mu_i \overset{\text{iid}}{\sim} N(\mu_0, \sigma_0^2)$
$X_i \mid \mu_i \sim N(\mu_i, \sigma^2)$, $i = 1, \dots, n$
\[ \hat{\mu}_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \underbrace{\left( \frac{\sigma^2}{\sigma_0^2 + \sigma^2} \right)}_{\alpha} \mu_0 + \underbrace{\left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \right)}_{1 - \alpha} X_i \]
\[ \hat{\mu}_i^{(\text{JS})} = \underbrace{\left( \frac{(n-3)\,\sigma^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \right)}_{\alpha} \bar{X} + \underbrace{\left( 1 - \frac{(n-3)\,\sigma^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \right)}_{1 - \alpha} X_i \]
In practice, if $\sigma^2$ is unknown, an estimate is used.
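A small simulation gives a feel for whether shrinking helps. The sketch below draws data from the normal model on this slide and compares the average prediction error of the raw observations with that of the James-Stein estimate. The settings (n = 18, unit variances, 2000 replications, seed 0) are arbitrary choices for illustration and do not come from the slides.

```python
import random


def james_stein(x, sigma2):
    """James-Stein estimate shrinking each x[i] toward the grand mean (n >= 4)."""
    n = len(x)
    xbar = sum(x) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    alpha = (n - 3) * sigma2 / s  # weight placed on the grand mean
    return [alpha * xbar + (1 - alpha) * xi for xi in x]


def prediction_error(mu, est):
    """Sum of squared differences between true means and estimates."""
    return sum((m - e) ** 2 for m, e in zip(mu, est))


random.seed(0)  # arbitrary seed, for reproducibility only
n, sigma2, sigma0_2, reps = 18, 1.0, 1.0, 2000  # hypothetical settings

err_raw = err_js = 0.0
for _ in range(reps):
    mu = [random.gauss(0.0, sigma0_2 ** 0.5) for _ in range(n)]
    x = [random.gauss(m, sigma2 ** 0.5) for m in mu]
    err_raw += prediction_error(mu, x) / reps
    err_js += prediction_error(mu, james_stein(x, sigma2)) / reps

print(f"average prediction error: raw = {err_raw:.2f}, James-Stein = {err_js:.2f}")
```

Note that this version can over-shrink when the observations happen to be tightly clustered (alpha above 1); the positive-part variant, which caps alpha at 1, is a common refinement not covered on the slide.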
Predicting Batting Averages
2nd Goal: Predict final batting averages from pre-season performances.
Pre-season batting averages for 18 major league players are provided.
Season final batting averages for the same players are also recorded.
The data are from the 1970 season and were published by Efron and Morris in JASA (1975) and Scientific American (1977).
The frequentist approach uses only $X_i$, the pre-season batting average for player i:
\[ \hat{p}_i^{(\text{Fisher})} = X_i \]
The Empirical Bayes approach shrinks $X_i$ towards $\bar{X}$ by an empirically determined amount:
\[ \hat{p}_i^{(\text{Stein})} = \hat{\alpha} X_i + (1 - \hat{\alpha}) \bar{X}, \qquad \hat{\alpha} \in (0, 1) \]
#   Name        hits/AB  pre-season ($\hat{\mu}^{(\text{ML})}$)  season final ($\mu$)
1   Clemente    18/45    0.400                  0.346
2   Robinson    17/45    0.378                  0.298
3   Howard      16/45    0.356                  0.276
4   Johnstone   15/45    0.333                  0.222
5   Berry       14/45    0.311                  0.273
6   Spencer     14/45    0.311                  0.270
7   Kessinger   13/45    0.289                  0.263
8   Alvarado    12/45    0.267                  0.210
9   Santo       11/45    0.244                  0.269
10  Swoboda     11/45    0.244                  0.230
11  Unser       10/45    0.222                  0.264
12  Williams    10/45    0.222                  0.256
13  Scott       10/45    0.222                  0.303
14  Petrocelli  10/45    0.222                  0.264
15  Rodriguez   10/45    0.222                  0.226
16  Campaneris  9/45     0.200                  0.286
17  Munson      8/45     0.178                  0.316
18  Alvis       7/45     0.156                  0.200
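The table above is enough to try the shrinkage estimate directly. The sketch below applies the James-Stein formula to the raw pre-season averages and compares prediction errors against the season-final values. One assumption is not from the slides: $\sigma^2$ is estimated by the binomial variance at the grand mean, $\bar{X}(1 - \bar{X})/45$. Efron and Morris work on a transformed scale, so their published numbers differ slightly.

```python
# Hits in the first 45 at-bats and season-final averages, from the table
hits = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
final = [0.346, 0.298, 0.276, 0.222, 0.273, 0.270, 0.263, 0.210, 0.269,
         0.230, 0.264, 0.256, 0.303, 0.264, 0.226, 0.286, 0.316, 0.200]
at_bats = 45

x = [h / at_bats for h in hits]  # pre-season averages X_i
n = len(x)
xbar = sum(x) / n                # grand mean, the shrinkage target

# Assumption (not from the slides): binomial variance evaluated at the grand mean
sigma2 = xbar * (1 - xbar) / at_bats

s = sum((xi - xbar) ** 2 for xi in x)
alpha_hat = 1 - (n - 3) * sigma2 / s  # weight kept on each player's own X_i

js = [alpha_hat * xi + (1 - alpha_hat) * xbar for xi in x]

err_raw = sum((f - xi) ** 2 for f, xi in zip(final, x))
err_js = sum((f - e) ** 2 for f, e in zip(final, js))

print(f"alpha_hat = {alpha_hat:.3f}")
print(f"prediction error: raw = {err_raw:.4f}, James-Stein = {err_js:.4f}")
```

Under this variance assumption each player keeps only about a fifth of the distance from the grand mean, and the total squared prediction error falls to well under half of the raw figure.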
[Figure: Batting Average Dataset. Batting average (0.0 to 0.4) for players 1 to 18; series: pre-season, season final.]
[Figure: James-Stein Estimation of Batting Averages. Batting average (0.0 to 0.4) for players 1 to 18; series: pre-season, season final.]
Ranking Bias: Empirical Bayes + Order Statistics
[Figure: Batting average (0.0 to 0.4) for players 1 to 18; series: pre-season, season final.]
▸ Genome-wide association studies
▸ SNPs: AA/Aa/aa or 0/1/2 ($\sim 10^7$)
▸ Estimated effects of the top SNPs are biased up (winner’s curse).
▸ Ranking bias estimator: part frequentist, part Bayesian, with robust properties
▸ Applied to 2 GWAS studies with 2,000 cases and 3,000 controls: Crohn’s Disease and Type 1 Diabetes
49ers Statistics: http://www.longbeachstate.com/
Opponents Over 3 Seasons: 08-09, 09-10, 10-11
opponent              #   opponent              #   opponent              #
alaska anchorage      1   iowa                  1   syracuse              1
arizona state         1   kentucky              1   temple                1
boise state           1   loyola marymount      2   texas                 1
byu cougars           1   montana               1   uc davis              6
byu hawaii            1   montana state         1   uc irvine             6
cal poly              7   new mexico state      1   uc riverside          6
cal state fullerton   6   north carolina        1   uc santa barbara      7
cal state northridge  6   notre dame            1   ucla                  1
clemson               2   oregon                1   univ. san francisco   1
cs monterey bay       1   pacific               8   utah state            2
duke                  1   pepperdine            2   washington            1
green bay             2   saint mary’s          1   weber state           2
idaho                 1   saint peter’s         1   west virginia         1
idaho state           1   san diego state       1   wisconsin             1
                          san francisco state   1
Winning Percentages
                      All Games   Conference Games
All 3 Seasons (93)    56%         67%
08-09 Season (30)     50%         63%
09-10 Season (33)     52%         50%
10-11 Season (30)     67%         88%