data science @ The New York Times
and how a 164-year-old content company became data-driven
chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins
references: bit.ly/icerm
“data science” jobs, jobs, jobs
data science: mindset & toolset (Drew Conway, 2010)
modern history: 2009
“data science” blogs, blogs, blogs
“The first time I heard 'data science' was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.”
“data science” ancient history: 2001
data science context
home-schooled
PhD in topology
“By the end of late 1945, I was a statistician rather than a topologist”
coined: “bit”
coined: “software”
co-invented: the FFT (with James Cooley, 1965)
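The FFT computes the discrete Fourier transform in O(n log n) time instead of O(n^2) by splitting a sequence into its even- and odd-indexed halves and combining their transforms. A minimal recursive radix-2 sketch, assuming the input length is a power of two; `fft` and `naive_dft` are illustrative names of ours, and real code would call a library routine instead:

```python
import cmath


def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Recursively transform the even- and odd-indexed halves.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^{-2 pi i k / n} applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def naive_dft(x):
    """O(n^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

For example, `fft([1, 1, 1, 1])` returns (up to rounding) `[4, 0, 0, 0]`: a constant signal has all its energy at frequency zero.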
“the progenitor of data science.” - @mshron
“The Future of Data Analysis,” John W. Tukey, 1962
introduces: “exploratory data analysis”
Tukey 1965, via John Chambers
TUKEY BEGAT S WHICH BEGAT R
Tukey 1972
? 1972
Jerome H. Friedman
Tukey 1975: “In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.”
TUKEY BEGAT VDQI
Tukey 1977
TUKEY BEGAT EDA
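Exploratory data analysis leans on simple, robust summaries. As a sketch, here is Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum, where the hinges are the medians of the lower and upper halves) and the 1.5 × H-spread fence his boxplots use to flag outlying values; the function names are ours:

```python
def median(sorted_xs):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def five_number_summary(data):
    """Return (min, lower hinge, median, upper hinge, max) using Tukey's hinges."""
    xs = sorted(data)
    if not xs:
        raise ValueError("need at least one value")
    n = len(xs)
    lower = xs[: (n + 1) // 2]  # lower half; includes the median when n is odd
    upper = xs[n // 2 :]        # upper half; includes the median when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def outliers(data, k=1.5):
    """Values beyond the inner fences: lower hinge - k*spread, upper hinge + k*spread."""
    _, lo, _, hi, _ = five_number_summary(data)
    spread = hi - lo  # the H-spread (distance between hinges)
    return [x for x in data if x < lo - k * spread or x > hi + k * spread]
```

On the values 1 through 9 the summary is `(1, 3, 5, 7, 9)`; appending a 100 moves the upper hinge to 8, and `outliers` flags the 100.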
fast forward -> 2001
“The primary agents for change should be university departments themselves.”
data science histories
1. in academia -> Bell Labs: as heretical statistics (see also Breiman)
2. in industry: as a job description
historical rant: bit.ly/data-rant
biology: 1892 vs. 1995: biology changed for good.
genetics: 1837 vs. 2012: ML toolset; data science mindset (arxiv.org/abs/1105.5821; github.com/rajanil/mkboost)
data science: mindset & toolset
1851
news: 20th century: church / state
church
news: 21st century: church / state / engineering
newspapering: 1851 vs. 1996
example: millions of views per hour, 2015
data science: the web
data science: the web is your “online presence”
data science: the web is a microscope
data science: the web is an experimental tool
data science: the web is an optimization tool
newspapering: 1851 vs. 1996 vs. 2008
“a startup is a temporary organization in search of a repeatable and scalable business model” - Steve Blank
every publisher is now a startup
learnings
- supervised learning
- unsupervised learning
- reinforcement learning
cf. modelingsocialdata.org
stats.stackexchange.com
$L = \sum_{i=1}^{N} \phi\big(y_i\, f(x_i; \beta)\big) + \lambda \lVert \beta \rVert$ (from “are you a bayesian or a frequentist” - michael jordan)
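One concrete reading of this objective: take labels $y_i \in \{-1, +1\}$, a linear score $f(x; \beta) = \beta \cdot x$, the logistic loss $\phi(z) = \log(1 + e^{-z})$, and (our assumption, since the slide does not specify the norm) a squared L2 penalty $\lambda \lVert \beta \rVert^2$. Minimizing $L$ is then L2-regularized logistic regression. A minimal plain-gradient-descent sketch; the step size, iteration count, and function names are illustrative:

```python
import math


def loss(beta, X, y, lam):
    """L = sum_i phi(y_i * beta.x_i) + lam * ||beta||^2, with phi(z) = log(1 + e^-z)."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(b * v for b, v in zip(beta, xi))
        total += math.log1p(math.exp(-margin))
    return total + lam * sum(b * b for b in beta)  # squared L2 penalty (assumed)


def fit(X, y, lam=0.1, step=0.1, iters=500):
    """Gradient descent on the regularized logistic loss."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [2 * lam * b for b in beta]  # gradient of the penalty term
        for xi, yi in zip(X, y):
            margin = yi * sum(b * v for b, v in zip(beta, xi))
            # phi'(z) = -1 / (1 + e^z); chain rule gives -y_i * x_ij / (1 + e^margin)
            coef = -yi / (1.0 + math.exp(margin))
            for j, v in enumerate(xi):
                grad[j] += coef * v
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta
```

On linearly separable toy data the unregularized loss can be driven toward zero only by letting $\beta$ grow without bound; the $\lambda$ term is what keeps the fitted $\beta$ finite.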
supervised learning, e.g., “the funnel” cf. modelingsocialdata.org
interpretable supervised learning: super cool stuff (cf. modelingsocialdata.org; arxiv.org/abs/q-bio/0701021)
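The slide does not spell out the learner. One widely used interpretable choice is boosting over decision stumps: the final model is a weighted vote of single-feature threshold rules that a person can read off directly. A minimal AdaBoost sketch under that assumption (not necessarily the method of the cited paper); labels are in {-1, +1}, and all names are ours:

```python
import math


def stump(row, j, thr, sign):
    """A one-feature rule: predict `sign` if feature j exceeds thr, else -sign."""
    return sign if row[j] > thr else -sign


def best_stump(X, y, w):
    """Exhaustively find the (feature, threshold, sign) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n
    model = []  # list of (alpha, feature, threshold, sign): human-readable rules
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Misclassified examples gain weight, correct ones lose it; renormalize.
        w = [wi * math.exp(-alpha * yi * stump(row, j, thr, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    score = sum(a * stump(row, j, t, s) for a, j, t, s in model)
    return 1 if score >= 0 else -1
```

Each entry of `model` reads as a sentence ("if feature 0 is above 2, vote -1 with weight alpha"), which is what makes this family attractive when the goal is insight as well as prediction.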
optimization & learning, e.g., “How The New York Times Works,” Popular Mechanics, 2015
recommendation as supervised learning
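Framed as supervised learning, recommendation means predicting a label (did this reader click?) from features of the reader and the article, then ranking candidates by that prediction. A deliberately tiny sketch, assuming each article carries a single topic tag and using a smoothed per-topic click rate as a stand-in for a real classifier; the data shape, prior, and names are hypothetical:

```python
def train(history, prior_clicks=1.0, prior_views=2.0):
    """Fit a per-topic click model from one reader's (topic, clicked) history.

    The prior (1 click in 2 views, i.e. 0.5) is an illustrative choice that
    keeps unseen topics from scoring zero.
    """
    clicks, views = {}, {}
    for topic, clicked in history:
        views[topic] = views.get(topic, 0) + 1
        clicks[topic] = clicks.get(topic, 0) + (1 if clicked else 0)

    def predict(topic):
        # Smoothed estimate of P(click | topic).
        return (clicks.get(topic, 0) + prior_clicks) / (views.get(topic, 0) + prior_views)

    return predict


def recommend(predict, candidates, k=2):
    """Rank (article_id, topic) candidates by predicted click probability."""
    ranked = sorted(candidates, key=lambda c: predict(c[1]), reverse=True)
    return [article_id for article_id, _ in ranked[:k]]
```

With two clicks in three science views and none in two sports views, science scores 0.6, an unseen topic 0.5, and sports 0.25, so the ranking is science, then the unseen topic, then sports.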
unsupervised learning, e.g., cf. daeilkim.com; import bnpy
modeling your audience bit.ly/Hughes-Kim-Sudderth-AISTATS15
modeling your audience (optimization, ultimately)
modeling your audience also allows recommendation as inference
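The work cited here uses Bayesian nonparametric models (bnpy). As a much simpler stand-in, and our own substitution, k-means on readers' per-topic reading counts illustrates the same two steps: discover audience segments without labels, then recommend by inferring which segment a new reader belongs to and what that segment reads. Data, topics, and names are hypothetical:

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20):
    """Lloyd's algorithm; initial centers are the first k points (deterministic)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster empties out.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


def recommend_topic(centers, reader, topics):
    """Infer the reader's segment; suggest that segment's top topic the reader hasn't read."""
    segment = min(centers, key=lambda c: dist2(reader, c))
    unread = [i for i, x in enumerate(reader) if x == 0]
    if not unread:
        return None
    return topics[max(unread, key=lambda i: segment[i])]
```

Unlike the nonparametric models behind bnpy, k-means needs the number of segments fixed in advance; that is the main thing this sketch gives up for brevity.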
reinforcement learning: from A/B testing to… (aka “Test”; cf. modelingsocialdata.org)
“Some of the most recognizable personalization in our service is the collection of ‘genre’ rows. …Members connect with these rows so well that we measure an increase in member retention by placing the most tailored rows higher on the page instead of lower.”
[figure labels: supervised: Learning; Test; Reporting: business as usual]
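The A/B starting point is a fixed-allocation experiment analyzed after the fact. One common way to read it is a two-proportion z-test on click (or conversion) rates; a minimal standard-library sketch, with counts made up for illustration:

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click rates, using the pooled rate."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 200 clicks in 10,000 views for A versus 260 in 10,000 for B gives z ≈ 2.83 and a p-value below 0.01. The catch that motivates bandits: half the traffic keeps seeing the worse variant until the test ends.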
real-time A/B -> “bandits” (GOOG blog; cf. modelingsocialdata.org)
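A bandit runs the experiment and the optimization at once, shifting traffic toward the better arm as evidence accumulates. One standard approach is Thompson sampling with a Beta posterior on each arm's click rate; a minimal sketch, where the simulated click rates are invented and the blog post the slide cites may describe a different variant:

```python
import random


def thompson_sampling(true_rates, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was shown."""
    rng = random.Random(seed)
    k = len(true_rates)
    wins, losses, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(rounds):
        # Draw a plausible click rate for each arm from its Beta(1 + wins, 1 + losses) posterior.
        samples = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        arm = max(range(k), key=samples.__getitem__)
        pulls[arm] += 1
        # Simulated reader clicks with the arm's true rate (unknown to the algorithm).
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls
```

With true rates of 2% and 6%, most of the 5,000 impressions end up on the 6% arm, whereas a fixed 50/50 A/B split would have shown each arm 2,500 times.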
unsupervised: Explore; supervised: Learning; reinforcement: Test / Optimizing; Reporting
common requirements in data science: 1. people 2. ideas 3. things cf. USAF
things: what does the DS team deliver?
- build data prototypes (cf. daeilkim.com)
- build APIs: in puppet, w/ python2.7; collaboration w/ pers. team
- impact roadmaps (image: flickr/McJex)
data science: ideas
data skills
- data engineering
- data science
- data visualization
- data product
- data multiliteracies
- data embeds
cf. “Data Scientists at Work”, ch. 1
data science: people - new mindset > new toolset
summary: pay attention to: 1. people 2. ideas 3. things cf. USAF
thanks to the data science team!