What is Data Science? Business efficiency: Wal-Mart - PowerPoint PPT Presentation

  1. What is Data Science?

  2. Business efficiency: Wal-Mart http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html

  3. Business Marketing: Target http://tinyurl.com/7jbntx3

  4. Recommendations: In October 2006, Netflix held a competition for the best algorithm to predict user ratings of movies.
      • The winner had to improve on Netflix's own algorithm (Cinematch) by at least 10% (in root-mean-square error)
      • The award was given in September 2009
      • Based on collaborative filtering
      • Movies that were difficult to predict: “Napoleon Dynamite”, “Lost in Translation”, “Fahrenheit 9/11”, “Kill Bill: Volume 1”
      http://www2.research.att.com/~volinsky/netflix/bpc.html
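
The slide notes that the approach was based on collaborative filtering: predicting a user's rating of a movie from the ratings of users with similar tastes. Below is a minimal user-based sketch, assuming a tiny hypothetical `RATINGS` table (not Netflix data) and cosine similarity over co-rated movies; the helper names `cosine_similarity` and `predict` are illustrative, and the method is far simpler than the blended models that competed for the prize.

```python
import math

# Hypothetical toy ratings: user -> {movie: stars}. Not Netflix data.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(r1, r2):
    """Cosine similarity computed only over movies both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[m] * r2[m] for m in common)
    norm1 = math.sqrt(sum(r1[m] ** 2 for m in common))
    norm2 = math.sqrt(sum(r2[m] ** 2 for m in common))
    return dot / (norm1 * norm2)

def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`.
    Returns None when no other user has rated the movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine_similarity(ratings[user], theirs)
        num += sim * theirs[movie]
        den += abs(sim)
    return num / den if den else None

print(round(predict(RATINGS, "alice", "D"), 2))
```

Alice's tastes are closer to Bob's than to Carol's, so her predicted rating for movie D leans toward Bob's 4 rather than Carol's 2. Because all star ratings are positive, raw cosine similarity is never negative here; practical systems often mean-center each user's ratings first (Pearson correlation) so that opposite tastes count against a prediction.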

  5. Sports Analytics

  6. Beyond Moneyball: The defensive shift http://www.sporttechie.com/2014/11/11/sports/mlb/beyond-moneyball-how-big-data-is-changing-baseball/

  7. Lesson for Data Scientists:
      - Question your assumptions (be especially skeptical when predicting a rare event that depends on human behavior and has limited history)
      - Examine data quality: in the 2016 U.S. presidential election, polls were not reaching all likely voters
      - Beware of your own biases: many pollsters were likely Clinton supporters and did not want to question results that favored their candidate
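
The data-quality lesson can be made concrete with post-stratification, a standard survey correction that reweights each group of respondents to its share of the electorate. The sketch below uses hypothetical group names and numbers, not real 2016 polling data; `raw_estimate` and `weighted_estimate` are illustrative helpers.

```python
# Hypothetical poll results by group (not real 2016 data).
SAMPLE = {
    "easy_to_reach": {"respondents": 800, "support_x": 360},
    "hard_to_reach": {"respondents": 200, "support_x": 120},
}
# Hypothetical share of each group among likely voters.
POPULATION_SHARE = {"easy_to_reach": 0.6, "hard_to_reach": 0.4}

def raw_estimate(sample):
    """Unweighted: pool every respondent, ignoring which group was under-sampled."""
    respondents = sum(g["respondents"] for g in sample.values())
    support = sum(g["support_x"] for g in sample.values())
    return support / respondents

def weighted_estimate(sample, shares):
    """Post-stratified: weight each group's support rate by its population share."""
    return sum(shares[name] * g["support_x"] / g["respondents"]
               for name, g in sample.items())

print(f"raw: {raw_estimate(SAMPLE):.2f}, "
      f"weighted: {weighted_estimate(SAMPLE, POPULATION_SHARE):.2f}")
```

With these made-up numbers the unweighted poll puts support for X at 48%, while weighting gives 51%: under-sampling one group is enough to move the estimate across 50%. Weighting only helps if the hard-to-reach group appears in the sample at all and its true share of likely voters is known, which is exactly what can fail when polls miss part of the electorate.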

  8. Cholera outbreak in London, 1854
      • Physician John Snow linked the outbreak to a contaminated well by plotting the cases on a map
      • Often cited as a founding moment of modern epidemiology
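
Snow's key step was to aggregate cases by their location relative to each water source. A minimal sketch of that idea assigns every case to its nearest pump and counts cases per pump. The coordinates in `PUMPS` and `CASES` are hypothetical, not Snow's data, and straight-line distance is a simplification of how people actually walked to a pump.

```python
import math
from collections import Counter

# Hypothetical pump positions and case locations on an arbitrary grid.
PUMPS = {"pump_A": (0.0, 0.0), "pump_B": (10.0, 0.0), "pump_C": (0.0, 10.0)}
CASES = [(1.0, 1.0), (0.5, -1.0), (2.0, 0.0), (-1.0, 2.0), (9.0, 1.0), (1.0, 9.0)]

def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` (straight-line distance)."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))

def cases_per_pump(cases, pumps):
    """Count cases by their nearest pump: the tally a map makes visible."""
    return Counter(nearest_pump(case, pumps) for case in cases)

print(cases_per_pump(CASES, PUMPS).most_common())
```

A pump with a disproportionate share of nearby cases is the one worth investigating, which is the pattern Snow's map made visible.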

  9. The Book of Winchester (1086), a.k.a. the Domesday Book
      • Commissioned in 1085 by William the Conqueror
      • Record of the Great Survey of England
      • Last used to settle a dispute in court in the 1960s!
      http://www.domesdaybook.co.uk/

  10. Data in the 20th century
      What problems were solved?
      • Engineering: design of machines
      • Sciences: formulation of theories
      How were problems solved?
      • Empirically
      • Theories
      • Computation

  11. Data in the 21st Century
      How is today different?
      • More data is available
      • More data is digital
      • More data is observed, rather than generated by a designed experiment

  12. Data in the 21st Century
      What problems are solved today?
      • Spell checking
      • Face recognition
      • Sentiment analysis
      • Optimal routing
      • High-frequency trading algorithms
      • … just to name a few
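
Spell checking is a good example of these problems. One common building block, shown here as a sketch rather than how any particular product works, is the Levenshtein edit distance: suggest the dictionary word that needs the fewest single-character insertions, deletions, or substitutions. The tiny `VOCABULARY` list and the `suggest` helper are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]

# Hypothetical dictionary; a real checker would use a large word list.
VOCABULARY = ["data", "science", "statistics", "model", "mining"]

def suggest(word, vocabulary):
    """Return the vocabulary word closest to `word` by edit distance."""
    return min(vocabulary, key=lambda w: edit_distance(word, w))

print(suggest("sceince", VOCABULARY))
```

Practical spell checkers typically also rank candidates by how often each word occurs in a large text corpus, which is where the data comes in; this sketch uses distance alone.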

  13. Data in the 21st Century
      How are problems solved today?
      • Empirically
      • Theories
      • Computation
      • Data exploration
      http://research.microsoft.com/en-us/collaboration/fourthparadigm/

  14. For Example
      Network security:
      • 20th century: based on rules and signatures
      • 21st century: data mining traffic logs
      http://www.bro.org/
      Artificial Intelligence: VS.
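
To contrast the two network-security approaches on the slide: a signature rule flags only sources already on a known-bad list, while a data-driven rule flags behavior that is unusual relative to the rest of the log. The sketch below assumes a hypothetical log of `(source_host, destination_port)` records and treats a host that contacts far more distinct ports than the median host as a possible port scan. The threshold `factor=3.0` and all addresses are illustrative, and the code does not use the Bro tool linked on the slide.

```python
from collections import defaultdict
from statistics import median

# Hypothetical connection log: one (source_host, destination_port) per connection.
LOG = (
    [("10.0.0.1", 80)] * 5
    + [("10.0.0.2", 443)] * 4
    + [("10.0.0.3", 22)] * 6
    + [("10.0.0.9", port) for port in range(1000, 1040)]  # touches 40 ports
)

# Hypothetical signature list of known-bad sources (documentation address range).
KNOWN_BAD = {"203.0.113.7"}

def signature_alerts(log, known_bad):
    """Rule/signature approach: flag only sources already known to be bad."""
    return {src for src, _ in log if src in known_bad}

def anomaly_alerts(log, factor=3.0):
    """Data-driven approach: flag sources that contact many more distinct
    ports than the median source in this log."""
    ports = defaultdict(set)
    for src, port in log:
        ports[src].add(port)
    counts = {src: len(p) for src, p in ports.items()}
    typical = median(counts.values())
    return {src for src, n in counts.items() if n > factor * typical}

print(signature_alerts(LOG, KNOWN_BAD))
print(anomaly_alerts(LOG))
```

The signature rule stays silent because the scanning host is not on any list, while the data-driven rule catches it from its behavior alone. Using the median rather than the mean keeps the single extreme host from inflating the baseline it is compared against.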

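  15. IBM Watson: The Jeopardy Challenge
      Not everything is perfect!
      Category: U.S. Cities
      Clue: ITS LARGEST AIRPORT IS NAMED FOR A WORLD WAR II HERO; ITS SECOND LARGEST, FOR A WORLD WAR II BATTLE.
      Watson responded “What is Toronto?????”; the correct response was Chicago (O'Hare and Midway).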

  16. A good question
      So, what is data science?

  17. Who are the Data Scientists?
      https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
      Skills:
      • Make discoveries while swimming in data
      • Don’t allow technical limitations to bog down solutions
      • Often fashion their own tools
      • Skilled in storytelling with data
      Some data-driven companies: Google, Wal-Mart, Twitter, LinkedIn, Amazon

  18. What data scientists do
      • Ask a question
      • Get relevant data
      • Prepare data for analysis
        - outliers, missing values, incorrect values
      • Explore data
        - understand the world as it is (was)
      • Statistical model
        - estimate/train and validate the model
        - predict what will (likely) happen
      • Communicate results
        - tell a story
        - recommend
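
The workflow on this slide can be sketched end to end with Python's standard library. The records, their meaning (advertising spend vs. sales), and the cleaning rules are hypothetical: rows with a missing value or an impossible negative spend are dropped, a straight line is fit by ordinary least squares on a training split, checked on a held-out validation split, and then used to predict a new value.

```python
import random
from statistics import mean

# Hypothetical raw records: (advertising_spend, sales).
# None marks a missing value; a negative spend is an incorrect value.
RAW = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (-1.0, 5.0),
       (5.0, 9.8), (6.0, 12.1), (7.0, 14.2), (8.0, 15.9), (9.0, 18.1)]

def clean(records):
    """Prepare: drop rows with missing values or impossible negative spend."""
    return [(x, y) for x, y in records if y is not None and x >= 0]

def fit_line(points):
    """Model: ordinary least squares fit of y = a + b * x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mean_abs_error(model, points):
    """Validate: average absolute gap between observed and predicted y."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in points)

data = clean(RAW)
# Explore: a quick summary of the cleaned data.
print(f"{len(data)} rows, sales from {min(y for _, y in data)} to {max(y for _, y in data)}")

random.Random(0).shuffle(data)  # fixed seed so the split is reproducible
train, valid = data[:6], data[6:]
model = fit_line(train)
print("validation MAE:", round(mean_abs_error(model, valid), 3))
print("predicted sales at spend 10:", round(model[0] + model[1] * 10, 2))
```

Validating on rows the model never saw during fitting is what separates "estimate/train" from "validate" on the slide; a real project would use more data, several splits, and a model chosen after exploring the data.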

  19. The Data Science Process
      • Data Extraction
      • Data Cleaning
      • Exploratory Data Analysis
      • Machine Learning, Statistical Models
      • Communicate and Report Findings
      • Build Data Product

  20. Data Scientist skills
      • Computer science: programming, hacking skills
      • Statistics: probability, distributions, modelling
      • Mathematics: linear algebra, calculus, optimization
      • Domain expertise: storytelling, posing questions, interpreting results
      • Communication: presentation, data visualization

  21. Drew Conway’s Venn diagram
      • Extract insight
      • Acquire and clean data
      • Familiarity with statistical
      • Text file manipulation tools
      • Think algorithmically
      • Understand algorithms
      • Interpret results
      • Real world motivating questions
      • Hypothesis Testing
      http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

  22. IBM Predictive Analytics for Asset Management https://www.youtube.com/watch?v=b9LrXxG5SjY