20/10/2011
Predictive modeling competitions: making data science a sport
Anthony Goldbloom, CEO, Kaggle
e-mail anthony.goldbloom@kaggle.com
twitter @antgoldbloom

Outline
1. Motivation
2. Does it Work?
3. Why it Works
4. How it Works
5. Case Studies
Crowdsourcing
There is a mismatch between those who have data and those with the skills to analyse it.
[Figure: Tourism Forecasting Competition. Forecast error (MASE) at: Existing model, Aug 9, 2 weeks later, 1 month later, Competition end.]
[Figure: dunnhumby Shopping Challenge. % correctly predicted visits (axis 9 to 20) vs. competition progress (weeks 1 to 11).]
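The Tourism Forecasting chart reports forecast error as MASE (mean absolute scaled error). The sketch below assumes the standard definition: the out-of-sample mean absolute error divided by the in-sample mean absolute error of a naive forecast that repeats the value from m periods earlier. The function name `mase` and the seasonal-period parameter `m` are illustrative assumptions, not taken from the competition's own scoring code. A value below 1 means the forecast beats the naive benchmark.

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error (sketch; assumes the standard definition).

    train:    in-sample history used to compute the scaling factor.
    actual:   out-of-sample observed values.
    forecast: predictions for the same out-of-sample periods.
    m:        seasonal period of the naive benchmark (m=1 is the plain naive forecast).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    if len(train) <= m:
        raise ValueError("train must be longer than the seasonal period m")

    # In-sample MAE of the naive forecast y_hat[t] = y[t - m]
    naive_errors = [abs(train[t] - train[t - m]) for t in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive in-sample errors are all zero; MASE is undefined")

    # Out-of-sample MAE of the forecast being evaluated
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Naive in-sample errors are all 2; forecast errors are all 1, so MASE = 0.5
print(mase([10, 12, 14, 16], [18, 20], [17, 21]))
```

Scaling by the naive in-sample error makes scores comparable across series of very different magnitudes, which matters when a competition pools many tourism series.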
Kaggle’s Dark Matter Competition on the White House blog:
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”
“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”
User base: ~16,000 registered data scientists
Our User Base
Users apply a wide range of techniques:
• neural networks
• genetic algorithms
• logistic regression
• random forest
• support vector machine
• Monte Carlo methods
• decision trees
• principal component analysis
• ensemble methods
• Kalman filter
• AdaBoost
• evolutionary fuzzy modeling
• Bayesian networks
Not MIT, not SAS … UoL?
Additional slides
1. Upload
2. Submit
3. Evaluate & Exchange
Use the wizard to post a competition.
Participants make their entries.
Competition Mechanics
Competitions are judged on an objective criterion: predictive accuracy.
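Judging on predictive accuracy means every entry is scored against the same held-out answers, so the ranking follows mechanically from the metric. A minimal sketch, assuming root mean squared error as the metric and a single hidden solution set; the names `rmse` and `rank_entries`, the team names, and the data are hypothetical, and real competitions choose their own metric and tie-breaking rules.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error; lower is better."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rank_entries(solution, entries):
    """Score each entry against the held-out solution and rank best-first.

    solution: true values, hidden from participants.
    entries:  dict mapping a (hypothetical) team name to its list of predictions.
    Returns (team, score) tuples sorted by ascending error.
    """
    scores = [(team, rmse(solution, preds)) for team, preds in entries.items()]
    # Ties are broken by team name here only to keep the ordering deterministic.
    return sorted(scores, key=lambda item: (item[1], item[0]))


solution = [3.0, 5.0, 2.0]
entries = {
    "gamma": [4.0, 6.0, 3.0],
    "alpha": [3.0, 5.0, 2.0],
    "beta": [2.0, 5.0, 2.0],
}
for team, score in rank_entries(solution, entries):
    print(f"{team}: {score:.3f}")
```

Because the solution stays hidden and the metric is fixed in advance, no judgment call enters the ranking: the same predictions always earn the same score.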
Benchmarking
Untouched problems
2011: $3 million prize
~25%: successful grant applications
Outcomes of a competition to predict the success of grant applications:
- Better identify likely successes, to avoid wasting resources on hopeless applications
- Identify and communicate the characteristics of a successful application to future applicants
Who to hire?
Branding: “we do analytics”

What could the world’s best analysts find in your data?
e-mail anthony.goldbloom@kaggle.com
phone +1 650 283 9781
Photo by gidzy, www.flickr.com/photos/gidzy