
Big Data – so what’s the big deal?

Jevin West, Information School, University of Washington, DataLab (MGH 310E), jevinw@uw.edu. January 26, 2017.


  1. Big Data – so what’s the big deal? Jevin West Information School, University of Washington DataLab (MGH 310E) jevinw@uw.edu January 26, 2017

  2. What is Data Science?

  3. Spring Quarter, 2017 http://callingbullshit.org

  4. http://www.r2d3.us/visual-intro-to-machine-learning-part-1/

  5. Want to be a data scientist?

  6. ‘The Data Scientist’ • Communication skills • Ethical Reasoning • Information/Data Management • Personnel Management • Interdisciplinary • Adaptable

  7. Data Scientist Drew Conway, NYU

  8. Examples of data science

  9. Agenda • What is data science? • Cautionary Tales • Data Science at UW and in Seattle • Big data – why should you care? • More cautionary Tales (Data and Society) • Data Science, in action • DataLab • Data for Social Good

  10. Universities are going big

  11. Big Data at UW • LSST • CS (Farecast) • Libraries (digital content) • Oceanography • Neuroscience

  12. Data Science at the Information School • Data Science Option (~ Spring 2016) • INFO 370: Introduction to Data Science (Fall) • INFO 371: Machine Learning (Spring) • INFO 445: Advanced Database Design, Management, and Maintenance • INFO 474: Interactive Data Visualization

  13. Other Classes in iSchool • INFX 551 (4 credits) – Fundamentals of Data Curation • INFX 576 (4 credits) – Social Network Analysis • INFO 470 (5 credits) – Research Methods • INFX 573 (4 credits) – Introduction to Data Science • INFX 574 (4 credits) – Core Methods in Data Science and Analytics • INFX 575 (4 credits) – Advanced Methods in Data Science and Analytics

  14. Extra Credit

  15. What is big data?

  16. “Yes, some of the best theorizing comes after collecting data because then you become aware of another reality…” Robert Shiller, Nobel Prize in Economics (2013)

  17. Data Exhaust: by-product of human activity. Examples: cell phone locations, purchase transactions, social media. Barabasi et al., Nature (2008); Ginsberg et al., Nature (2009)

  18. Why big data? • Cheaper sensors (climate research, astronomy, high energy physics, high-throughput gene sequencing, cell phones) • Cheaper storage (4 TB, $168) • People willing to share their personal information (Facebook, social media) • Faster communication (internet, cell phones) • Other reasons?

  19. The Four A’s and V’s • Architecture • Acquisition • Analysis • Archiving • Volume • Velocity • Variety • Veracity

  20. References

  21. Why should you care about big data? A shortage of 1.5 million jobs!

  22. Concerns • Privacy • Overconfidence and Overfitting • Correlation versus causation • Who owns big data? • What else?
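
The overfitting concern above can be made concrete with a small simulation. This is only a sketch under assumed conditions: the data are synthetic (a hypothetical linear trend `y = 2x` plus noise), and the two models, a least-squares line and a nearest-neighbour "memorizer", are chosen for illustration rather than taken from the talk.

```python
import random


def make_data(n, rng):
    # Hypothetical data: a linear trend y = 2x plus Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    # Ordinary least squares for a single predictor (closed form).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    # 1-nearest-neighbour: return the y of the closest training x.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    # Mean squared error of a model on a data set.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(n=500, seed=0):
    rng = random.Random(seed)
    train_x, train_y = make_data(n, rng)
    test_x, test_y = make_data(n, rng)
    line = fit_line(train_x, train_y)
    memo = fit_memorizer(train_x, train_y)
    return {
        "line_train": mse(line, train_x, train_y),
        "line_test": mse(line, test_x, test_y),
        "memorizer_train": mse(memo, train_x, train_y),
        "memorizer_test": mse(memo, test_x, test_y),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.3f}")
```

The memorizer reproduces its training data exactly, so its training error says nothing about how it will do on new data; comparing the held-out errors is what exposes the overconfidence.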

  23. Big Data is messy

  24. http://www.theatlantic.com/magazine/archive/2013/12/theyre-watching-you-at-work/354681/

  25. New MIT algorithm rubs shoulders with human intuition in big data analysis https://www.washingtonpost.com/news/speaking-of-science/wp/2015/10/19/new-mit-algorithm-rubs-shoulders-with-human-intuition-in-big-data-analysis/

  26. Correlation versus Causation
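
One way to see the difference between correlation and causation is a simulation with a hidden common cause. The sketch below is illustrative only: the variables `z`, `x`, and `y` are hypothetical, `x` and `y` never influence each other, and the adjustment step assumes the confounder `z` is observed and enters both variables with a coefficient of exactly 1.

```python
import random


def pearson(xs, ys):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, seed=0):
    rng = random.Random(seed)
    # z is a hidden common cause; x and y each depend on z but not on each other.
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    raw = pearson(x, y)
    # Remove the known contribution of z; what remains is independent noise.
    adjusted = pearson(
        [xi - zi for xi, zi in zip(x, z)],
        [yi - zi for yi, zi in zip(y, z)],
    )
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"correlation of x and y: {raw:.2f}")
    print(f"after removing z:       {adjusted:.2f}")
```

The raw correlation is strong even though neither variable causes the other, and it largely disappears once the shared cause is accounted for. In real data the confounder is often unmeasured, which is why a correlation alone cannot settle the causal question.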

  27. http://www.washingtonpost.com/news/wonkblog/wp/2015/10/01/the-hidden-inequality-of-who-dies-in-car-crashes/

  28. Sampling

  29. Big Data in action

  30. DJ Patil

  31. If you had access to the personal calendars of 200 million people, what could you do with it? What products could you create?

  32. Is there a secondary market for the data that companies are collecting?

  33. Big data is about asking good questions

  34. Jevin West: Science of Science. jevinw@uw.edu | @jevinwest | jevinwest.org

  35. [Figure: map of citation flow among scientific fields (e.g. Physics, Computer Science, Economics, Neuroscience, Medicine). Legend: citation flow within field; citation flow from A to B; citation flow from B to A; citation flow out of field.]

  36. JW

  37. West, Wesley-Smith, Bergstrom (2016). A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data (in press)

  38. Mining the literature. In collaboration with P. I. Imoukhuede, University of Illinois

  39. http://jevinwest.org

  40. Why should you care about big data? • Jobs • Privacy

  41. Enjoy the wave but be cautious…

  42. Big Data involves people

  43. “Data is increasingly digital air: the oxygen we breathe and the carbon dioxide that we exhale. It can be a source of both sustenance and pollution.” -- danah boyd. d. boyd & K. Crawford (2011), Six Provocations for Big Data. SSRN

  44. Jevin West jevinw@uw.edu @jevinwest Website: jevinwest.org Lab: datalab.ischool.uw.edu
