DATA SCIENCE AND THE BUSINESS OF MAJOR LEAGUE BASEBALL
Matthew Horton, Josh Hamilton, Aaron Owen, PhD
DATA SCIENCE AT MLB
THERE ARE 2 DISTINCT DATA SCIENCE GROUPS, FOCUSED ON DIFFERENT ASPECTS OF THE GAME
• BUSINESS / FAN FOCUSED
WHERE DOES DATA SCIENCE FIT INTO THE ORGANIZATION AND WHO USES OUR WORK?
FUTURE SCHEDULE EVALUATION
CANDIDATE SCHEDULES ARE FOR 2 YEARS INTO THE FUTURE
• CANDIDATE A
• CANDIDATE B
• CANDIDATE C
FUTURE SCHEDULE EVALUATION
MODEL TRAINING ON 7 SEASONS OF GAME DATA
Features:
• day of week
• month
• opponent
• interleague/intraleague
• game time
• previous attendance
• previous revenue
• summer vacation dates
• holidays
• weather variables
• multi-year aggregates
• feature interactions
• …
Models:
• SINGLE-GAME ATTENDANCE: LINEAR REGRESSION
• SINGLE-GAME REVENUE: LINEAR REGRESSION
FUTURE SCHEDULE EVALUATION
TRAINED MODELS SCORE EACH CANDIDATE SCHEDULE
• CANDIDATE A: PREDICTED LEAGUE-WIDE ATTENDANCE: X; PREDICTED LEAGUE-WIDE REVENUE: $X
• CANDIDATE B: PREDICTED LEAGUE-WIDE ATTENDANCE: X; PREDICTED LEAGUE-WIDE REVENUE: $X
• CANDIDATE C: PREDICTED LEAGUE-WIDE ATTENDANCE: X; PREDICTED LEAGUE-WIDE REVENUE: $X
COMMISSIONER’S SCHEDULING COMMITTEE’S CHOICE
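The scoring step on this slide can be sketched as follows: a trained single-game model predicts each game, and the predictions are summed into a league-wide total per candidate schedule. This is a minimal illustration, not MLB's implementation. The coefficients, the three binary features, and the two tiny candidate schedules are all hypothetical stand-ins for the real regression and data.

```python
# Hypothetical coefficients standing in for a trained single-game attendance
# linear regression. The real features and weights are not given in the slides.
COEF = {"intercept": 20000.0, "weekend": 6000.0, "summer": 3000.0, "interleague": 1000.0}
FEATURES = ("weekend", "summer", "interleague")


def predict_attendance(game):
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    return COEF["intercept"] + sum(COEF[name] * game[name] for name in FEATURES)


def league_wide_total(schedule):
    """Sum single-game predictions over every game in a candidate schedule."""
    return sum(predict_attendance(game) for game in schedule)


# Two made-up candidate schedules with two games each (1 = yes, 0 = no).
candidates = {
    "A": [
        {"weekend": 1, "summer": 0, "interleague": 0},
        {"weekend": 0, "summer": 1, "interleague": 0},
    ],
    "B": [
        {"weekend": 1, "summer": 1, "interleague": 0},
        {"weekend": 0, "summer": 0, "interleague": 1},
    ],
}

totals = {name: league_wide_total(games) for name, games in candidates.items()}
best = max(totals, key=totals.get)
print(totals, "highest predicted attendance:", best)
```

The same aggregation would apply to the revenue model. The totals are an input to the committee's decision rather than the decision itself.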
CANDIDATE C
SINGLE GAME TICKET DEMAND FORECASTING
SINGLE GAME TICKET DEMAND FORECASTING – MODEL
STEP 1: MODEL TRAINING
• Data: 3 PREVIOUS SEASONS OF GAME DATA
• Model: GRADIENT BOOSTED REGRESSOR
• Features: number of days before game, day of week, month, opponent, interleague/intraleague, promo/no promo, …
• Target: NUMBER OF TICKETS SOLD
STEP 2: PREDICTIONS
• CURRENT TICKET SALES FOR ALL GAMES → TRAINED MODEL → TICKET SALES PREDICTIONS
[Figure: GAME 29, TICKET SALES vs. DAYS BEFORE GAME]
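To make the two steps concrete, here is a small pure-Python sketch of gradient boosting for squared error, using one-split regression stumps as the weak learners. It is an illustration of the technique only: a production model would more likely come from a library. The two features (days before game, weekend flag), the synthetic ticket counts, and the settings `n_rounds` and `learning_rate` are assumptions, not values from the slides.

```python
from statistics import mean


def fit_stump(rows, residuals):
    """Find the single feature/threshold split that minimizes squared error."""
    best = None
    for j in range(len(rows[0])):
        # Skip the largest value so the right-hand side is never empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [e for r, e in zip(rows, residuals) if r[j] <= t]
            right = [e for r, e in zip(rows, residuals) if r[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((e - lm) ** 2 for e in left) + sum((e - rm) ** 2 for e in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_boosted(rows, targets, n_rounds=50, learning_rate=0.5):
    """Step 1: start from the mean, then repeatedly fit stumps to the residuals."""
    base = mean(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        j, t, lm, rm = fit_stump(rows, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + learning_rate * (lm if r[j] <= t else rm) for r, p in zip(rows, preds)]
    return base, stumps, learning_rate


def predict(model, row):
    """Step 2: base value plus the shrunken contribution of every stump."""
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)


# Synthetic training data: (days before game, weekend flag) -> tickets sold.
rows = [(d, w) for d in (0, 5, 10, 20, 30) for w in (0, 1)]
targets = [1000.0 - 20.0 * d + 300.0 * w for d, w in rows]

model = fit_boosted(rows, targets)
fitted = [predict(model, r) for r in rows]
baseline_mse = mean([(y - mean(targets)) ** 2 for y in targets])
mse = mean([(y - f) ** 2 for y, f in zip(targets, fitted)])
print(round(baseline_mse, 1), round(mse, 3))
```

Calling `predict(model, (days_before, is_weekend))` for every upcoming game corresponds to the prediction step on the slide.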
SINGLE GAME TICKET DEMAND FORECASTING
PROMOTION OPTIMIZATION
PROMOTION SCHEDULE OPTIMIZATION – MODEL
STEP 1: MODEL TRAINING
• Data: 3 PREVIOUS SEASONS OF GAME DATA
• Model: GRADIENT BOOSTED REGRESSOR
• Features: series start, day/night, day of week, month, opponent, promo type, …
• Output: REVENUE PREDICTION BASED ON CURRENT PROMOTION SCHEDULE
STEP 2: MONTE CARLO SIMULATION
• PROMOTION SCHEDULE RANDOMIZED x10,000
• TRAINED MODEL → 10,000 REVENUE PREDICTIONS
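The Monte Carlo step can be sketched as below: shuffle which game gets which promotion, score each shuffled schedule with the trained model, and rank the results. The `predict_revenue` function here is a made-up multiplicative lift table standing in for the gradient boosted regressor. The promo names, lift values, and base revenues are hypothetical.

```python
import random

# Hypothetical stand-in for the trained revenue model: a fixed revenue lift
# per promo type applied to a per-game base. These numbers are invented.
PROMO_LIFT = {"none": 0.00, "fireworks": 0.10, "shirt": 0.15, "bobblehead": 0.25}


def predict_revenue(base_revenue, promo):
    return base_revenue * (1.0 + PROMO_LIFT[promo])


def schedule_revenue(bases, promos):
    """Total predicted revenue when promos[i] is assigned to game i."""
    return sum(predict_revenue(b, p) for b, p in zip(bases, promos))


def simulate(bases, promos, n_sims=10_000, seed=0):
    """Randomize the promotion schedule n_sims times and score each version."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        shuffled = promos[:]
        rng.shuffle(shuffled)
        results.append((schedule_revenue(bases, shuffled), tuple(shuffled)))
    results.sort(key=lambda r: r[0])  # ascending by predicted revenue
    return results


bases = [100.0, 300.0, 200.0, 400.0]  # hypothetical base revenue per game
original = ["bobblehead", "none", "shirt", "fireworks"]

sims = simulate(bases, original)
original_revenue = schedule_revenue(bases, original)
k = len(sims) // 10
worst_10pct, best_10pct = sims[:k], sims[-k:]
best_revenue, best_schedule = sims[-1]
print(original_revenue, best_revenue, best_schedule)
```

Comparing the original schedule's predicted revenue against the best and worst slices of the simulations gives the kind of view shown on the recommendation slides.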
PROMOTION SCHEDULE OPTIMIZATION – RECOMMENDATION
[Figure: SIMULATIONS vs. HYPOTHETICAL REVENUE; markers: ORIGINAL PROMOTION SCHEDULE, NO PROMOTIONS]
PROMOTION SCHEDULE OPTIMIZATION – RECOMMENDATION
[Figure: SIMULATIONS vs. HYPOTHETICAL REVENUE; markers: ORIGINAL PROMOTION SCHEDULE, NO PROMOTIONS; regions: WORST 10% OF SIMULATED PROMOTION SCHEDULES, BEST 10% OF SIMULATED PROMOTION SCHEDULES]
[Figure panels, each plotting SIMULATIONS: FIREWORKS; SHIRT OR CAP; BOBBLEHEAD -OR- FIGURINE]
TEAM AVIDITY METRIC
• Strong Mets Fan
FAN SEGMENTATION
• Team Fan
LIFETIME VALUE
• Ticketing LTV: $500
• Shop LTV: $100
• MLB.tv LTV: $100
• Overall LTV: $700
PLAYER AVIDITY
• Jacob deGrom
• Pete Alonso
• Mike Trout
TEAM AVIDITY – DEVELOPMENT
6 PREVIOUS YEARS OF FAN DATA
DATA SOURCE → FEATURES → SCORE AND RANK → STANDARDIZE AND SEGMENT
• EXPLICIT SIGNALS: email opt-ins, ballpark app, …
• ENGAGEMENT: MLB.TV streams, ticket scans, …
• SHARE OF FAN’S SPEND: shop purchases, ticket purchases
SCORE
TeamAvidity_{fan, team} = ExplicitSignals × W_ES + Engagement × W_E + Spend × W_S
RANK
Fan ID  ARI  ATL  BAL  BOS  CHC  CIN  CLE  COL  CWS  DET  HOU   KC  LAA  LAD  MIA  MIL  MIN  NYM  NYY  OAK  PHI  PIT   SD  SEA   SF  STL   TB  TEX  TOR  WAS High  Fav
       0.00 0.00 0.12 0.16 0.00 0.00 0.05 0.00 0.06 0.04 0.00 0.04 0.04 0.00 0.00 0.00 0.05 0.03 0.19 0.07 0.05 0.00 0.00 0.04 0.00 0.00 0.09 0.09 1.00 0.02 1.00  TOR
       0.00 0.07 0.00 0.00 0.94 0.06 0.06 0.09 0.05 0.00 0.00 0.00 0.00 0.00 0.10 0.16 0.00 0.05 0.00 0.00 0.05 0.13 0.00 0.00 0.03 0.06 0.00 0.00 0.00 0.00 0.94  CHC
       0.00 0.04 0.15 0.86 0.02 0.00 0.00 0.00 0.05 0.04 0.03 0.03 0.04 0.00 0.04 0.00 0.00 0.00 0.09 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.19 0.06 0.12 0.00 0.86  BOS
       0.34 0.43 0.22 0.35 0.31 0.29 0.27 0.40 0.18 0.14 0.29 0.21 0.35 0.31 0.28 0.31 0.28 0.81 0.31 0.25 0.31 0.19 0.33 0.16 0.24 0.29 0.10 0.21 0.74 0.42 0.81  NYM
       0.06 0.21 0.33 0.72 0.11 0.12 0.12 0.06 0.12 0.17 0.03 0.19 0.17 0.09 0.15 0.06 0.13 0.34 0.81 0.19 0.14 0.06 0.03 0.07 0.07 0.11 0.21 0.00 0.28 0.21 0.81  NYY
       0.00 0.00 0.09 0.13 0.02 0.00 0.05 0.00 0.00 0.04 0.01 0.04 0.07 0.00 0.04 0.00 0.06 0.03 0.81 0.06 0.00 0.01 0.00 0.00 0.00 0.02 0.03 0.00 0.18 0.06 0.81  NYY
STANDARDIZE AND SEGMENT
[Figure: score distribution segmented into Weak, Moderate, Strong; x-axis: STANDARD DEVIATION, -3 to 3]
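The score, standardize, and segment steps can be sketched as follows. The formula is the weighted sum from the slide. The weight values, the cut points at one standard deviation either side of the mean, and the five example fans are all assumptions for illustration, since the slides give no numbers for them.

```python
from statistics import mean, pstdev

# Hypothetical weights: the slide names W_ES, W_E and W_S but gives no values.
W_ES, W_E, W_S = 0.2, 0.3, 0.5


def team_avidity(explicit_signals, engagement, spend_share):
    """TeamAvidity = ExplicitSignals x W_ES + Engagement x W_E + Spend x W_S."""
    return explicit_signals * W_ES + engagement * W_E + spend_share * W_S


def segment(z):
    """Map a standardized score to a segment (cut points at -1 and +1 are assumed)."""
    if z < -1.0:
        return "Weak"
    if z > 1.0:
        return "Strong"
    return "Moderate"


# (explicit signals, engagement, share of spend) for one team, each in [0, 1].
fans = {
    "fan_a": (1.0, 1.0, 1.0),
    "fan_b": (0.0, 0.5, 0.5),
    "fan_c": (0.0, 0.5, 0.5),
    "fan_d": (0.0, 0.5, 0.5),
    "fan_e": (0.0, 0.0, 0.0),
}

scores = {fan: team_avidity(*features) for fan, features in fans.items()}
mu, sigma = mean(scores.values()), pstdev(scores.values())
z_scores = {fan: (s - mu) / sigma for fan, s in scores.items()}
segments = {fan: segment(z) for fan, z in z_scores.items()}
print(segments)
```

Repeating the score for all 30 teams and taking the maximum per fan would give the High and Fav columns of the table.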
TEAM AVIDITY – USE CASES
• HOME FIELD ADVANTAGE
• IDENTIFY AND TARGET OUT-OF-MARKET FANS
• OFTEN MOST PREDICTIVE FEATURE IN MODELS
[Figure: share of Fans vs. Opponent Fans, axis 0% to 100%; bar labels: 97%, 96%, 95%, 95%, 94%, 93%, 93%, 92%, 92%, 92%, 91%, 88%, 86%, 85%, 85%, 84%, 80%, 78%]
FAN SEGMENTATION – MODEL
NETWORK ANALYSIS
3 PREVIOUS YEARS OF FAN DATA
• DATA SOURCE: MLB.TV, Attended Games (participating teams per game)
• FEATURES: Degree centrality, Clustering coefficient
ALL FANS
• ROOKIES: Casual or New Fan
• TEAM FANS: Mostly Interested in a Single Team
• VETERANS: Interested in Many Teams
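The two network features can be computed as in the sketch below. The slides do not say how the graph is built, so the construction here is an assumption: for one fan, each team is a node and each watched or attended game adds an edge between its two participating teams. Under that assumption a single-team fan produces a star-shaped graph and a many-team fan produces a more densely connected one.

```python
from itertools import combinations


def build_graph(games):
    """Undirected graph: teams are nodes, an edge joins the two teams in a game."""
    adj = {}
    for home, away in games:
        adj.setdefault(home, set()).add(away)
        adj.setdefault(away, set()).add(home)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is connected to."""
    n = len(adj)
    if n < 2:
        return {team: 0.0 for team in adj}
    return {team: len(neighbors) / (n - 1) for team, neighbors in adj.items()}


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering_coefficient(adj, team) for team in adj) / len(adj)


# Hypothetical viewing histories.
team_fan = build_graph([("NYM", "ATL"), ("NYM", "PHI"), ("NYM", "MIA")])
veteran = build_graph([("NYY", "BOS"), ("BOS", "TB"), ("TB", "NYY")])

print(degree_centrality(team_fan)["NYM"], average_clustering(team_fan))
print(degree_centrality(veteran)["NYY"], average_clustering(veteran))
```

In the star graph the favorite team has centrality 1.0 and the average clustering is 0.0, while the triangle has clustering 1.0, which is the kind of contrast that could separate TEAM FANS from VETERANS.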
FAN SEGMENTATION – USE CASES
• ROOKIES
• TEAM FANS
• VETERANS
FAN LIFETIME VALUE (LTV) – MODEL
3 PREVIOUS YEARS OF FAN DATA
Business Lines and Features:
• Ticketing: ticket spend, total num tickets, unused tickets, ticket resell ROI, …
• Shop: shop spend, shop returns, shop unique products, …
• MLB.tv: MLB.tv total mins watched, MLB.tv subscriber type, MLB.tv num cancels, MLB.TV num year subscriber, …
Models:
• REPURCHASE Model (Gradient Boosted Classifier) → Probability of Repurchase [0, 1]
• Potential Spend Model* (Gradient Boosted Regressor) → Predicted Potential Spend [0, ∞)
LTV = Probability of Repurchase × Predicted Potential Spend
*only trained on fans that went on to spend again
FAN LIFETIME VALUE (LTV) – USE CASES
TOTAL LTV FOR EACH MLB FAN
• Ticketing LTV = [0, 1] × [0, ∞)
• Shop LTV = [0, 1] × [0, ∞)
• MLB.tv LTV = [0, 1] × [0, ∞)
SEGMENTATION
[Figure: POTENTIAL SPEND (Low to High) vs. PROBABILITY OF REPURCHASE (Low to High)]
EVALUATING MARKETING/ADVERTISING CAMPAIGN EFFICACY
• RESPONSE
• A/B TESTING
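As a worked example of the LTV arithmetic, the sketch below multiplies a repurchase probability by a predicted potential spend for each business line and sums the results. The probabilities and spend figures are hypothetical, chosen only so that the outputs match the example fan profile earlier in the deck (Ticketing $500, Shop $100, MLB.tv $100, Overall $700).

```python
def lifetime_value(p_repurchase, potential_spend):
    """LTV = probability of repurchase in [0, 1] x potential spend in [0, inf)."""
    if not 0.0 <= p_repurchase <= 1.0:
        raise ValueError("probability of repurchase must be in [0, 1]")
    if potential_spend < 0.0:
        raise ValueError("potential spend must be non-negative")
    return p_repurchase * potential_spend


# Hypothetical model outputs per business line:
# (probability of repurchase, predicted potential spend in dollars).
fan_predictions = {
    "ticketing": (0.50, 1000.0),
    "shop": (0.25, 400.0),
    "mlbtv": (0.50, 200.0),
}

ltv_by_line = {line: lifetime_value(p, spend) for line, (p, spend) in fan_predictions.items()}
overall_ltv = sum(ltv_by_line.values())
print(ltv_by_line, overall_ltv)
```

Because the two factors come from separate models, a fan can land anywhere on the segmentation grid: a high potential spend with a low repurchase probability yields the same LTV as the reverse, but may call for a different campaign.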
PLAYER AVIDITY – MODEL
CURRENT FAN + PLAYER DATA
• fan’s team avidity
• fan’s location
• player’s MLB popularity
• player’s team popularity
• player’s performance
LATENT VARIABLE MODEL: FAN-PLAYER AVIDITY
EXPECTATION-MAXIMIZATION ALGORITHM: PREDICTED vs. OBSERVED
• All-Star Votes
• Shop Sales
• Website Views
PLAYER AVIDITY – USE CASES
• IMPACT OF ROSTER CHANGES ON A CLUB’S FANBASE
• CUSTOMIZED FAN CONTENT
TEAM AVIDITY METRIC
• Strong Mets Fan
FAN SEGMENTATION
• Team Fan
LIFETIME VALUE
• Ticketing LTV: $500
• Shop LTV: $100
• MLB.tv LTV: $100
• Overall LTV: $700
PLAYER AVIDITY
• Jacob deGrom
• Pete Alonso
• Mike Trout
TICKET PACKAGE RENEWAL LEAD SCORE
• Top 20% of fans likely to renew
SEASON TICKET HOLDER RISK
• Not a season ticket holder
MLB.TV ENGAGEMENT CAMPAIGN
• Moderately engaged
• Received email last week
TARGETED TICKET GUIDE
• Received notification about CLE vs. NYM series
OTHER FAN-BASED MODELING
LEAD SCORING
• Fan ID, Vars → Model → Prob. → Decile (10 to 1)
• Score: Probability of Upgrading or Renewing (0 to 1)
SEASON TICKET HOLDER RISK SCORE
• Fan ID, Vars → Model → Prob. → Risk Segment (Low, Moderate, High)
• Score: Probability of NOT Renewing (0 to 1)
• Example segments: Low, High, Mod., Low
MLB.TV ENGAGEMENT
• Fan ID, Vars → Model → Engagement Segment (Unengaged, Moderately Engaged, Highly Engaged) → Recommended Game
• Score: Predicted Number of Days with a View (0 to 7)
• Example rows: Un, NYM vs. ATL; High, CHC vs. STL; Mod., NYY vs. BOS; High, SF vs. LAD
TARGETED TICKET GUIDE
• Fan ID, Ticket Buying History, Vars → Model → Recommended Series
• Example recommendations: MIL vs. CHC; HOU vs. TEX; SEA vs. OAK; MIA vs. TB
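The lead scoring step, turning model probabilities into deciles, can be sketched as below. The convention that decile 10 holds the highest probabilities is an assumption, as are the fan IDs and probabilities, which are synthetic.

```python
def assign_deciles(probs):
    """Rank fans by probability; decile 10 = highest tenth, decile 1 = lowest.

    Assumes decile 10 is the best bucket, which the slides do not state.
    """
    ranked = sorted(probs, key=probs.get)  # ascending by probability
    n = len(ranked)
    return {fan: (i * 10) // n + 1 for i, fan in enumerate(ranked)}


# Synthetic model output: probability of upgrading or renewing per fan.
probs = {f"fan_{i}": i / 20 for i in range(20)}

deciles = assign_deciles(probs)
top_20_pct = sorted(fan for fan, d in deciles.items() if d >= 9)
print(top_20_pct)
```

Selecting deciles 9 and 10 gives a "top 20% of fans likely to renew" list like the one in the fan profile. The same ranking approach, with different cut points, could back the Low, Moderate, and High risk segments.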
QUICK SUMMARY
• OVERVIEW OF DATA SCIENCE AT MLB
• METRIC INTRODUCTIONS
  • GAME
  • FAN
BETTER SERVE THE 30 CLUBS & OUR MILLIONS OF FANS
Rate today’s session
• O’Reilly Events App
• Session page on conference website
QUESTIONS?
• DataScience@mlb.com
• OPEN POSITIONS: www.mlb.com/jobs
• TECH BLOG: https://technology.mlblogs.com/