Data Science: The End of Statistics?
Larry Wasserman
Carnegie Mellon University
Interface 2015
Conclusion
Let’s turn the Interface meeting into the statistics version of NIPS.
This Talk
Will be short
Will be annoying provocative
Main Points
• Statisticians are being left out
• This should worry everyone (not just statisticians)
• It’s (partly) our fault
• We need a culture shift:
  1. modernize training (no more UMVUE’s)
  2. embrace the CS conference culture
  3. watch and learn from CS: active learning, deep learning, SVM, online learning, RKHS, differential privacy ...
Where are the Statisticians?
• President’s Council of Advisors on Science and Technology (PCAST) includes ... 0 statisticians!
• Chief Data Scientist of the United States Office of Science and Technology Policy. Not a statistician.
• Forbes: World’s 7 Most Powerful Data Scientists. 0 statisticians.
• Startups?
• Google, Microsoft, Facebook all have Chief Economists. Chief Statisticians?
Everyone Should Care (Not Just Statisticians)
• Big Data + Bad Analysis = Bad Decisions
• Gary King: Big data is not about the data, it’s about the analytics.
• Google search: big data bad analytics = 10,700,000 hits
• Statisticians have been doing data science for at least 100 years.
• You would not get brain surgery done by a cardiologist.
Why Are Statisticians Left Out?
Statisticians are:
conservative
stubborn
inflexible
bad at selling themselves
afraid
experts at saying what you can’t do
A (mostly) True Story
• Astronomer asks us for help.
• We spend months learning the science, cleaning the data and carefully analyzing the data.
• Some careful, modest results after one year.
• In the meantime ... my astronomer friend went to see my friends in ML.
• Two days later the ML people produced fancy plots, analyses etc.
• We complain that their analysis was not rigorous.
• Who will the astronomer go to in the future?
Anecdote: My One Week as Editor of JASA
I was hired as editor of JASA.
I insisted that the journal be made freely available, online.
I was fired.
ASA sold the rights to the journal to Taylor and Francis.
JASA is still behind a paywall.
Compare this to JMLR (Journal of Machine Learning Research, jmlr.org), NIPS (nips.cc), ICML (icml.cc), etc.
What to Do?
• Change “Department of Statistics” to “Department of Statistics and Data Science”
• Mostly, we need a cultural shift: training, conferences, topics.
Training
• Get rid of: MVUE, ancillarity, completeness, ...
• Get rid of assumptions (more on this in a minute)
• Add: VC dimension, support vector machines, online learning, bandits, deep learning, optimization, coding (not just R), cloud computing, basic software engineering (github etc)
Assumptions are For Suckers
• Model-based, assumption-laden methods are useless in the world of big, complex datasets.