
data science @ The New York Times and how a 164-year old content company became data-driven

data science @ The New York Times and how a 164-year old content company became data-driven chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins references: bit.ly/icerm


  1. data science @ The New York Times and how a 164-year old content company became data-driven chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins references: bit.ly/icerm

  2. data science @ The New York Times and how a 164-year old content company became data-driven references: bit.ly/icerm

  3. data science @ The New York Times and how a 164-year old content company became data-driven references: bit.ly/icerm

  4. data science @ The New York Times and how a 164-year old content company became data-driven references: bit.ly/icerm

  5. data science @ The New York Times and how a 164-year old content company became data-driven references: bit.ly/icerm

  6. data science @ The New York Times and how a 164-year old content company became data-driven references: bit.ly/icerm

  7. data science @ The New York Times and how a 164-year old content company became data-driven references: bit.ly/icerm

  8. “data science” jobs, jobs, jobs references: bit.ly/icerm

  9. “data science” jobs, jobs, jobs references: bit.ly/icerm

  10. “data science” jobs, jobs, jobs references: bit.ly/icerm

  11. data science: mindset & toolset drew conway, 2010 references: bit.ly/icerm

  12. modern history: 2009 references: bit.ly/icerm

  13. “data science” blogs, blogs, blogs references: bit.ly/icerm

  14. “The first time I heard "data science" was in 2007 while reading a proposal that my adviser had passed along, outlining an academic program similar to what we think of as data science.” “data science” blogs, blogs, blogs references: bit.ly/icerm

  15. “data science” blogs, blogs, blogs references: bit.ly/icerm

  16. “data science” ancient history: 2001 references: bit.ly/icerm

  17. “data science” ancient history: 2001 references: bit.ly/icerm

  18. data science context references: bit.ly/icerm

  19. home schooled references: bit.ly/icerm

  20. PhD in topology references: bit.ly/icerm

  21. “By the end of late 1945, I was a statistician rather than a topologist” references: bit.ly/icerm

  22. invented: “bit” references: bit.ly/icerm

  23. invented: “software” references: bit.ly/icerm

  24. invented: “FFT” references: bit.ly/icerm

  25. “the progenitor of data science.” - @mshron references: bit.ly/icerm

  26. “The Future of Data Analysis,” 1962 John W. Tukey references: bit.ly/icerm

  27. introduces: “Exploratory data analysis” references: bit.ly/icerm

  28. Tukey 1965, via John Chambers references: bit.ly/icerm

  29. TUKEY BEGAT S WHICH BEGAT R references: bit.ly/icerm

  30. Tukey 1972 references: bit.ly/icerm

  31. ? 1972 references: bit.ly/icerm

  32. Jerome H. Friedman references: bit.ly/icerm

  33. “In 1975, while at Princeton, Tufte was asked to teach a statistics course to a group of journalists who were visiting the school to study economics. He developed a set of readings and lectures on statistical graphics, which he further developed in joint seminars he subsequently taught with renowned statistician John Tukey (a pioneer in the field of information design). These course materials became the foundation for his first book on information design, The Visual Display of Quantitative Information.” Tukey 1975 references: bit.ly/icerm

  34. TUKEY BEGAT VDQI references: bit.ly/icerm

  35. Tukey 1977 references: bit.ly/icerm

  36. TUKEY BEGAT EDA references: bit.ly/icerm

  37. fast forward -> 2001 references: bit.ly/icerm

  38. “The primary agents for change should be university departments themselves.” references: bit.ly/icerm

  39. data science @ The New York Times and how a 164-year old content company became data-driven. histories: 1. in academia -> Bell: as heretical statistics (see also Breiman) 2. in industry: as job description. historical rant: bit.ly/data-rant

  40. data science @ The New York Times and how a 164-year old content company became data-driven chris.wiggins@columbia.edu chris.wiggins@nytimes.com @chrishwiggins references: bit.ly/icerm

  41. biology: 1892 vs. 1995 biology changed for good. references: bit.ly/icerm

  42. genetics: 1837 vs. 2012 ML toolset; data science mindset references: bit.ly/icerm

  43. genetics: 1837 vs. 2012 references: bit.ly/icerm

  44. genetics: 1837 vs. 2012 ML toolset; data science mindset arxiv.org/abs/1105.5821 ; github.com/rajanil/mkboost

  45. data science: mindset & toolset references: bit.ly/icerm

  46. 1851 references: bit.ly/icerm

  47. news: 20th century church state references: bit.ly/icerm

  48. church references: bit.ly/icerm

  49. church references: bit.ly/icerm

  50. church

  51. news: 20th century church state references: bit.ly/icerm

  52. news: 21st century church state engineering references: bit.ly/icerm

  53. newspapering: 1851 vs. 1996 references: bit.ly/icerm

  54. example: millions of views per hour 2015

  55. references: bit.ly/icerm

  56. data science: the web references: bit.ly/icerm

  57. data science: the web is your “online presence” references: bit.ly/icerm

  58. data science: the web is a microscope references: bit.ly/icerm

  59. data science: the web is an experimental tool references: bit.ly/icerm

  60. data science: the web is an optimization tool references: bit.ly/icerm

  61. newspapering: 1851 vs. 1996 vs. 2008 references: bit.ly/icerm

  62. “a startup is a temporary organization in search of a repeatable and scalable business model” —Steve Blank references: bit.ly/icerm

  63. every publisher is now a startup references: bit.ly/icerm

  64. news: 21st century church state engineering references: bit.ly/icerm

  65. news: 21st century church state engineering references: bit.ly/icerm

  66. learnings references: bit.ly/icerm

  67. learnings - supervised learning - unsupervised learning - reinforcement learning references: bit.ly/icerm

  68. learnings - supervised learning - unsupervised learning - reinforcement learning cf. modelingsocialdata.org references: bit.ly/icerm

  69. stats.stackexchange.com references: bit.ly/icerm

  70. $L = \sum_{i=1}^{N} \phi(y_i f(x_i; \beta)) + \lambda \|\beta\|$ from “are you a bayesian or a frequentist” - michael jordan
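
The loss on this slide (a sum over N examples of a margin loss phi applied to y_i times f(x_i; beta), plus lambda times a norm of beta) can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say which phi, f, or norm is meant, so the logistic loss, the linear model, and the L2 norm below are hypothetical choices, as are all function names.

```python
import math


def logistic_phi(margin):
    """Logistic margin loss phi(m) = log(1 + exp(-m)), written to avoid overflow.

    Assumption: the slide leaves phi generic; logistic loss is one common choice.
    """
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def linear_f(x, beta):
    """Assumed model f(x; beta): a plain dot product of features and weights."""
    return sum(b * xi for b, xi in zip(beta, x))


def regularized_loss(xs, ys, beta, lam, phi=logistic_phi, f=linear_f):
    """L = sum_i phi(y_i * f(x_i; beta)) + lam * ||beta||.

    Labels ys are expected in {-1, +1}. The norm is taken to be the
    (unsquared) L2 norm, which is an assumption, not something the slide states.
    """
    data_term = sum(phi(y * f(x, beta)) for x, y in zip(xs, ys))
    penalty = lam * math.sqrt(sum(b * b for b in beta))
    return data_term + penalty


if __name__ == "__main__":
    xs = [[1.0, 2.0], [3.0, 4.0]]
    ys = [1, -1]
    print(regularized_loss(xs, ys, beta=[0.1, -0.2], lam=0.5))
```

Swapping `phi` for a hinge or exponential loss, or the penalty for an L1 norm, changes the estimator but not the overall shape of the objective.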

  71. supervised learning, e.g., cf. modelingsocialdata.org

  72. supervised learning, e.g., “the funnel” cf. modelingsocialdata.org

  73. interpretable supervised learning super cool stuff cf. modelingsocialdata.org

  74. interpretable supervised learning super cool stuff cf. modelingsocialdata.org arxiv.org/abs/q-bio/0701021

  75. optimization & learning, e.g., “How The New York Times Works”, popular mechanics, 2015

  76. recommendation as supervised learning

  77. unsupervised learning, e.g., cf. daeilkim.com ; import bnpy

  78. modeling your audience bit.ly/Hughes-Kim-Sudderth-AISTATS15

  79. modeling your audience (optimization, ultimately)

  80. modeling your audience also allows recommendation as inference

  81. reinforcement learning: from A/B to…. Reporting: business as usual. Learning: (esp. supervised). Test: aka “A/B testing”. Some of the most recognizable personalization in our service is the collection of “genre” rows. …Members connect with these rows so well that we measure an increase in member retention by placing the most tailored rows higher on the page instead of lower. cf. modelingsocialdata.org

  82. real-time A/B -> “bandits” GOOG blog: cf. modelingsocialdata.org
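
One way to picture the step from a fixed A/B test to a "bandit" is Thompson sampling over Bernoulli rewards (click / no click). The sketch below is a generic textbook version, not the method described in the talk or in the blog post the slide cites; the class name, the `simulate` helper, and the click rates are all hypothetical.

```python
import random


class BetaBernoulliBandit:
    """Thompson sampling with a Beta(1, 1) prior on each arm's success rate."""

    def __init__(self, n_arms, seed=None):
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms
        self.rng = random.Random(seed)

    def choose(self):
        # Draw one plausible success rate per arm from its posterior,
        # then play the arm whose draw is largest.
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # reward is truthy for a success (e.g. a click), falsy otherwise.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, n_rounds, seed=0):
    """Run the bandit against made-up click rates; return pulls per arm."""
    bandit = BetaBernoulliBandit(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        arm = bandit.choose()
        reward = env.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(simulate([0.05, 0.5], n_rounds=2000))
```

Unlike a fixed-split A/B test, traffic here shifts toward the better-performing variant while the experiment is still running, which is the sense in which the test becomes "real-time".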

  83. Explore unsupervised: Learning supervised: Test Optimizing reinforcement: Reporting

  84. Explore unsupervised: Learning supervised: Test Optimizing reinforcement: Reporting

  85. common requirements in data science:

  86. common requirements in data science: 1. people 2. ideas 3. things cf. USAF

  87. things: what does DS team deliver?

  88. things: what does DS team deliver? - build data prototypes - build APIs - impact roadmaps

  89. - build data prototypes

  90. - build data prototypes cf. daeilkim.com

  91. - build data prototypes cf. daeilkim.com

  92. - build APIs - in puppet, w/python2.7 - collaboration w/pers. team

  93. - impact roadmaps flickr/McJex

  94. data science: ideas

  95. data skills - data engineering - data science - data visualization - data product - data multiliteracies - data embeds cf. “data scientists at work”, ch 1

  96. data skills - data engineering - data science - data visualization - data product - data multiliteracies - data embeds cf. “data scientists at work”, ch 1

  97. data science: people - new mindset > new toolset

  98. summary: pay attention to: 1. people 2. ideas 3. things cf. USAF

  99. thanks to the data science team!
