What is Data Science?
January 23, 2020
Data Science, CSCI 1951A, Brown University
Instructor: Ellie Pavlick
HTAs: Josh Levin, Diane Mutako, Sol Zitter
Your Phenomenal Staff! Karlly Feng, Shunjia Zhu, Diane Mutako, Sol Zitter, Josh Levin, Shash Sinha, Maggie Wu, Neil Sehgal, Huay, JP Champa, Natalie Delworth, Ben Gershuny, Jonathan Weisskoff, Nazem Aldroubi, Will Glaser, Sunny Deng, Ben Vu, Mounika Dandu, Marcin Kolaszewski, Arvind Yalavarti, Juho Choi, Nam Do, Minna Kimura-
Waitlist
• If you are not registered, make sure you are on the waitlist (the link is on the course webpage).
• We have a *little* wiggle room in the enrollment cap.
• We will prioritize fairly (i.e., graduating and need this course to graduate > graduating > not graduating…).
What is Data Science?
Moneyball! https://en.wikipedia.org/wiki/Moneyball
Obama Campaign http://crowdsourcing-class.org/slides/ab-testing.pdf
Google’s “40 Shades of Blue” Sources: “Why Google has 200m reasons to put engineers over designers,” The Guardian; “The Origin of A/B Testing,” Nicolai Kramer Jakobsen.
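The Obama campaign and Google examples are A/B tests: randomly split users between two variants, then ask whether the difference in outcomes is larger than chance alone would produce. Below is a minimal sketch, assuming the outcome is a binary conversion (e.g., a click or a donation) and using a standard two-sided two-proportion z-test. The function name and the counts in the usage line are hypothetical, not data from either campaign.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B.

    Returns (z statistic, two-sided p-value) using the pooled-proportion standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 of 10,000 users converted on A, 260 of 10,000 on B.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The randomization is what makes this a test rather than an eyeballed comparison: because users are assigned to A or B by chance, a small p-value points at the variant itself rather than at some difference between the groups of users.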
Data Science = Magic
Data Science!
The Scientific Method https://en.wikipedia.org/wiki/Scientific_method
The Scientific Method
• Data Analytics, Visualization, Presentation
• Machine Learning, Forecasting, Modeling
• Data Collection, Sampling, Cleaning and Processing
The Scientific Method 👎 👎 👎 👎
What is Data Science?
Data “Science”
Data “Science” https://www.dailydot.com/unclick/state-googled-2017 http://nerdgeeks.co/us-state-words-map
Data “Science” Natalie Delworth https://www.dailydot.com/unclick/state-googled-2017 http://nerdgeeks.co/us-state-words-map
Data “Science” So many maps! https://xkcd.com/1845/
Data “Science”
• To be fair…
  • Intuition plays a huge role in the scientific method (“make observations” is Step 1).
  • Exploratory analysis is necessary; it’s okay not to be all rigor all the time.
• But!
  • Exploratory analysis (even when it involves the biggest of data) is meant to *form* a hypothesis, not test one.
  • Good experimental design and rigorous statistics are essential if we want to make claims about how the world works.
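One way to see why exploration forms hypotheses rather than testing them: search many candidate features for the strongest pattern, then check that single pre-chosen pattern on data the search never touched. This is a minimal simulation, assuming pure-noise data (so no real relationship exists) and a simple 50/50 split into exploration and confirmation halves; the function names and sizes are hypothetical choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def explore_then_confirm(n_rows=200, n_features=100, seed=0):
    """Pick the 'best' feature on one half of pure-noise data, then re-check it on the other half."""
    rng = random.Random(seed)
    # Every feature and the outcome are independent noise: there is nothing real to find.
    features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    half = n_rows // 2
    # Exploration: search all features for the strongest correlation on the first half.
    best = max(
        range(n_features),
        key=lambda j: abs(pearson(features[j][:half], outcome[:half])),
    )
    r_explore = pearson(features[best][:half], outcome[:half])
    # Confirmation: test only that one hypothesis on the untouched second half.
    r_confirm = pearson(features[best][half:], outcome[half:])
    return best, r_explore, r_confirm


best, r_explore, r_confirm = explore_then_confirm()
print(f"feature {best}: r = {r_explore:.2f} while exploring, r = {r_confirm:.2f} on held-out data")
```

The exploration-half correlation is inflated by the search itself (it is the largest of 100 tries), so it says little on its own. The held-out correlation is the honest test, and for pure noise it will typically land much closer to zero.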
Data “Science” “Eyeballing it” [Figure: words in Facebook posts by age group (13-18, 19-22, 23-29, 30-65)] Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. Schwartz et al. (2013).
Data “Science” “Eyeballing it” Frequent topics observed in 17,000 Science articles Probabilistic Topic Models. Blei (2012).
Data “Science” “Eyeballing it” https://devopedia.org/word-embedding
Data “Science” [Figure: Per capita cheese consumption correlates with number of people who died by becoming tangled in their bedsheets, 2000-2009 (ρ = 0.95). Axes: bedsheet tanglings (200 to 800 deaths) and cheese consumed (28.5 to 33 lbs). Source: tylervigen.com] https://en.wikipedia.org/wiki/Data_dredging http://www.tylervigen.com/spurious-correlations
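The cheese-and-bedsheets chart is what data dredging produces: compare enough pairs of unrelated series and some will line up by chance, especially short series that trend over time. This is a minimal simulation, assuming independent random walks of 10 "yearly" values as stand-ins for trending statistics; the series, sizes, and function names are hypothetical and are not Tyler Vigen's data.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def dredge(n_series=50, n_years=10, seed=1):
    """Generate unrelated random-walk series and return the most correlated pair."""
    rng = random.Random(seed)
    series = []
    for _ in range(n_series):
        value, walk = 0.0, []
        for _ in range(n_years):
            value += rng.gauss(0, 1)  # each step is independent noise
            walk.append(value)
        series.append(walk)
    # Test every pair: 50 series give 50 * 49 / 2 = 1225 comparisons.
    pairs = itertools.combinations(range(n_series), 2)
    i, j = max(pairs, key=lambda p: abs(pearson(series[p[0]], series[p[1]])))
    return i, j, pearson(series[i], series[j])


i, j, r = dredge()
print(f"series {i} and {j}: rho = {r:.2f}, with no causal link by construction")
```

No single pair here was chosen in advance, so the winning correlation reflects the number of comparisons made rather than any relationship between the series.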
Recommend
More recommend