Data Science: The End of Statistics? Larry Wasserman, Carnegie Mellon University (PowerPoint presentation)

  1. Data Science: The End of Statistics? Larry Wasserman, Carnegie Mellon University. Interface 2015

  3. Conclusion: Let’s turn the Interface meeting into the statistics version of NIPS.

  6. This Talk
     • Will be short
     • Will be annoying / provocative

  11. Main Points
     • Statisticians are being left out
     • This should worry everyone (not just statisticians)
     • It’s (partly) our fault
     • We need a culture shift:
       1. modernize training (no more UMVUEs)
       2. embrace the CS conference culture
       3. watch and learn from CS: active learning, deep learning, SVMs, online learning, RKHS, differential privacy ...

  19. Where are the Statisticians?
     • The President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
     • Chief Data Scientist of the United States Office of Science and Technology Policy: not a statistician.
     • Forbes, “World’s 7 Most Powerful Data Scientists”: 0 statisticians.
     • Startups?
     • Google, Microsoft, and Facebook all have Chief Economists. Chief Statisticians?

  24. Everyone Should Care (Not Just Statisticians)
     • Big Data + Bad Analysis = Bad Decisions
     • Gary King: “Big data is not about the data, it’s about the analytics.”
     • Google search for “big data bad analytics”: 10,700,000 hits
     • Statisticians have been doing data science for at least 100 years.
     • You would not get brain surgery done by a cardiologist.

  31. Why Are Statisticians Left Out? Statisticians are:
     • conservative
     • stubborn
     • inflexible
     • bad at selling themselves
     • afraid
     • experts at saying what you can’t do

  39. A (mostly) True Story
     • An astronomer asks us for help.
     • We spend months learning the science, cleaning the data, and carefully analyzing it.
     • After one year: some careful, modest results.
     • In the meantime... my astronomer friend went to see my friends in ML.
     • Two days later the ML people produced fancy plots, analyses, etc.
     • We complain that their analysis was not rigorous.
     • Who will the astronomer go to in the future?

  46. Anecdote: My One Week as Editor of JASA
     I was hired as editor of JASA. I insisted that the journal be made freely available online. I was fired. ASA sold the rights to the journal to Taylor and Francis. JASA is still behind a paywall. Compare this to JMLR (Journal of Machine Learning Research, jmlr.org), NIPS (nips.cc), or ICML (icml.cc), etc.

  49. What to Do?
     • Change “Department of Statistics” to “Department of Statistics and Data Science”
     • Mostly, we need a cultural shift: training, conferences, topics.

  53. Training
     • Get rid of: MVUE, ancillarity, completeness, ...
     • Get rid of assumptions (more on this in a minute)
     • Add: VC dimension, support vector machines, online learning, bandits, deep learning, optimization, coding (not just R), cloud computing, basic software engineering (GitHub etc.)

  55. Assumptions are For Suckers
     • Model-based, assumption-laden methods are useless in the world of big, complex datasets.
