Predictive modeling competitions: making data science a sport
Anthony Goldbloom, CEO, Kaggle
e-mail anthony.goldbloom@kaggle.com, twitter @antgoldbloom
20/10/2011
1. Motivation 2. Does it Work? 3. Why it Works 4. How it Works 5. Case Studies


SLIDE 1

20/10/2011

Predictive modeling competitions

making data science a sport

Anthony Goldbloom, CEO, Kaggle

e-mail anthony.goldbloom@kaggle.com, twitter @antgoldbloom

  • 1. Motivation
  • 2. Does it Work?
  • 3. Why it Works
  • 4. How it Works
  • 5. Case Studies
SLIDE 2

Mismatch between those with data and those with the skills to analyse it

Crowdsourcing

  • 1. Motivation
  • 2. Does it Work?
  • 3. Why it Works
  • 4. How it Works
  • 5. Case Studies
SLIDE 3

[Figure: Tourism Forecasting Competition. Forecast Error (MASE) of the leading entries at Aug 9, 2 weeks later, 1 month later and competition end, compared with the existing model.]

[Figure: dunnhumby Shopping Challenge. % Correctly Predicted Visits against Competition Progress (Weeks).]
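
The slide reports forecast error as MASE (mean absolute scaled error) without defining it. A minimal sketch of the usual definition follows: the mean absolute error of the forecast, divided by the in-sample mean absolute error of a naive forecast that repeats the value from `season` steps earlier. The function name, the `season` parameter and the sample numbers are illustrative assumptions, and the competition may have used a different variant.

```python
def mase(train, actual, forecast, season=1):
    """Mean absolute scaled error (illustrative sketch, not the competition's code).

    train    -- historical series the model was fitted on
    actual   -- true values for the forecast horizon
    forecast -- predicted values for the same horizon
    season   -- lag of the naive benchmark (1 = "repeat the last value")
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("actual and forecast must not be empty")
    if len(train) <= season:
        raise ValueError("train must be longer than the seasonal lag")

    # In-sample error of the naive forecast: |y_t - y_(t-season)|
    naive_errors = [abs(train[i] - train[i - season])
                    for i in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")

    # Out-of-sample error of the forecast being scored
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return (sum(errors) / len(errors)) / scale


# Made-up numbers: naive in-sample error is 2, forecast error is 1.5
score = mase(train=[10, 12, 14, 16], actual=[18, 20], forecast=[17, 22])
print(score)  # 0.75
```

A value below 1 means the forecast beat the naive benchmark on average; a lower MASE is better.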

SLIDE 4

  • 1. Motivation
  • 2. Does it Work?
  • 3. Why it Works
  • 4. How it Works
  • 5. Case Studies
SLIDE 5

Kaggle’s Dark Matter Competition

“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”

“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”

  • the White House blog

User base: ~16,000 registered data scientists

SLIDE 6

Our User Base

  • neural networks
  • logistic regression
  • support vector machine
  • decision trees
  • ensemble methods
  • AdaBoost
  • Bayesian networks
  • genetic algorithms
  • random forest
  • Monte Carlo methods
  • principal component analysis
  • Kalman filter
  • evolutionary fuzzy modeling

Users apply different techniques

SLIDE 7

Additional slides

Not MIT, not SAS … UoL?

SLIDE 8

  • 1. Motivation
  • 2. Does it Work?
  • 3. Why it Works
  • 4. How it Works
  • 5. Case Studies

  • 1. Upload
  • 2. Submit
  • 3. Evaluate & Exchange

SLIDE 9

Use the wizard to post a competition

Participants make their entries

SLIDE 10

Competitions are judged based on predictive accuracy

Competition Mechanics

Competitions are judged on objective criteria
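
Judging on predictive accuracy can be sketched as scoring every entry against the same held-back answers and ranking the results. The code below is an illustrative assumption rather than Kaggle's implementation: the function names, the team names and the choice of plain classification accuracy as the metric are all hypothetical, and a real competition would use whatever metric its host specifies.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the held-back true values."""
    if len(predictions) != len(truth):
        raise ValueError("submission has the wrong number of rows")
    if not truth:
        raise ValueError("truth must not be empty")
    correct = sum(1 for p, t in zip(predictions, truth) if p == t)
    return correct / len(truth)


def leaderboard(submissions, truth):
    """Score every entry with the same metric and rank best-first.

    submissions -- dict mapping a (hypothetical) team name to its predictions
    truth       -- the answers that participants never see
    """
    scores = {team: accuracy(preds, truth) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example: three entries scored against four hidden labels
hidden_truth = [1, 0, 1, 1]
entries = {
    "team_a": [1, 0, 0, 1],
    "team_b": [1, 0, 1, 1],
    "team_c": [0, 1, 0, 0],
}
for team, score in leaderboard(entries, hidden_truth):
    print(team, score)
```

Because every entry is scored by the same function on the same hidden answers, the ranking depends only on the predictions and not on who submitted them or which technique produced them.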

SLIDE 11

  • 1. Motivation
  • 2. Does it Work?
  • 3. Why it Works
  • 4. How it Works
  • 5. Case Studies
SLIDE 12

Benchmarking

SLIDE 13

Untouched problems

SLIDE 14

2011 $3 million prize

Successful grant applications

Outcomes of a competition to predict the success of grant applications:

  • Better identify likely successes to avoid wasting resources on hopeless applications
  • Identify and communicate the characteristics of a successful application to future applicants

~25%

SLIDE 15

Who to hire?

Branding: “we do analytics”

SLIDE 16

Photo by gidzy, www.flickr.com/photos/gidzy

What could the world’s best analysts find in your data?

e-mail anthony.goldbloom@kaggle.com phone +1 650 283 9781