DATA SCIENCE ECOSYSTEM
M. Tamer Özsu (U. Waterloo), Nancy Reid (U. Toronto), Raymond Ng (UBC)
DATA SCIENCE/BIG DATA IN THE NEWS …
Canadian Data Science Workshop
DATA SCIENCE EVERYWHERE!...
DATA SCIENCE VOCABULARY
WHAT IS DATA SCIENCE?
• “Data science, also known as data-driven science, is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.”
• “Data science intends to analyze and understand actual phenomena with ‘data’. In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena with data from a different point of view from the established or traditional theory and method.”
WHAT IS DATA SCIENCE?
• Fourth paradigm
  • “… change of all sciences moving from observational, to theoretical, to computational and now to the 4th Paradigm – Data-Intensive Scientific Discovery”
WHAT IS IMPORTANT?
Need to solve a real problem using data…
No applications, no data science.
DATA SCIENCE AS A UNIFIER
[Figure: Data Science at the centre, unifying Humanities, Machine/Statistical Learning, Data Management, Law, Application Domain Expertise, Social Science, Visualization, and Mathematical Optimization]
DATA SCIENCE AND BIG DATA
• They are not the “same thing”
• Big data = crude oil
• Big data is about extracting “crude oil”, transporting it in “mega tankers”, siphoning it through “pipelines”, and storing it in “massive silos”
• Data science is about refining the “crude oil”
Carlos Somohano, Founder, Data Science London
DATA SCIENCE AND ARTIFICIAL INTELLIGENCE
[Figure: relationship between Data Science, Artificial Intelligence, and ML/DM/Analytics]
“Data science produces insights. Machine learning produces predictions”
DATA SCIENCE APPLICATION EXAMPLES
• Fraud detection
  • Investigate fraud patterns in past data
  • Early detection is important
    • Before damage propagates
    • Harder than late detection
  • Precision is important
    • False positive and false negative are both bad
  • Real-time analytics
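The point about precision and about false positives and false negatives both being costly can be made concrete with a small sketch. The labels and predictions below are made-up illustrative values, not data from the talk, and `confusion_counts` and `precision_recall` are hypothetical helper names introduced only for this example.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # legitimate, flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # fraud, missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # legitimate, passed
    return tp, fp, fn, tn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical transactions: 1 = fraud, 0 = legitimate.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall = precision_recall(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A false positive here is a legitimate transaction that gets flagged, and a false negative is a fraud that slips through; precision penalizes the former and recall the latter, which is why both are usually tracked together.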
DATA SCIENCE APPLICATION EXAMPLES
• Recommender systems
  • The ability to offer unique personalized service
  • Increase sales, click-through rates, conversions, …
  • Netflix recommender system valued at $1B per year
  • Amazon recommender system drives a 20-35% lift in sales annually
  • Collaborative filtering at scale
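As a rough illustration of what collaborative filtering means, here is a minimal user-based sketch on a toy ratings table. The users, items, and ratings are invented, the function names are hypothetical, and computing cosine similarity over co-rated items only is one common simplification chosen for this example. It is not a description of how Netflix or Amazon build their systems, and it does not address the "at scale" part at all.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None


# Hypothetical ratings on a 1-5 scale; "ann" has not rated item "C".
ratings: Ratings = {
    "ann": {"A": 5, "B": 1},
    "bob": {"A": 5, "B": 1, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 2},
}

prediction = predict_rating(ratings, "ann", "C")
print(f"predicted rating of C for ann: {prediction:.2f}")
```

Because ann's tastes match bob's more closely than cat's, the prediction is pulled toward bob's rating of C rather than cat's.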
DATA SCIENCE APPLICATION EXAMPLES
• Predicting why patients are being readmitted
  • Reduce costs
  • Improve population health
  • Find the “why” behind specific populations being readmitted
  • Data lakes of multiple data sources
  • Investigate ties between readmission and socioeconomic data points, patient history, genetics, …
DATA SCIENCE APPLICATION EXAMPLES
• “Smart cities”
  • Not well-defined
  • Generally refers to using data and ICT to
    • Better plan communities
    • Better manage assets
    • Reduce costs
    • Deploy open data to better engage with community
DATA SCIENCE APPLICATION EXAMPLES
• Moneyball
  • How to build a baseball team on a very low budget by relying on data
  • Sabermetrics: the statistical analysis of baseball data to objectively evaluate performance
  • 2002 record of 103-59 was joint best in MLB
    • Team salary budget: $40 million
  • Other team: Yankees
    • Team salary budget: $120 million
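The budget comparison on this slide can be turned into a quick payroll-per-win calculation. The sketch below uses only the figures quoted on the slide and assumes, as "joint best" implies, that both teams finished with 103 wins; `cost_per_win` and `win_pct` are hypothetical helper names.

```python
def cost_per_win(payroll_usd: float, wins: int) -> float:
    """Payroll dollars spent per regular-season win."""
    if wins <= 0:
        raise ValueError("wins must be positive")
    return payroll_usd / wins


def win_pct(wins: int, losses: int) -> float:
    """Fraction of games won."""
    return wins / (wins + losses)


# Figures from the slide; 103 wins for both teams is an assumption
# drawn from the phrase "joint best".
WINS = 103
LOSSES = 59
moneyball_payroll = 40_000_000
yankees_payroll = 120_000_000

moneyball_cpw = cost_per_win(moneyball_payroll, WINS)
yankees_cpw = cost_per_win(yankees_payroll, WINS)

print(f"winning percentage: {win_pct(WINS, LOSSES):.3f}")
print(f"low-budget team: ${moneyball_cpw:,.0f} per win")
print(f"Yankees:         ${yankees_cpw:,.0f} per win")
print(f"ratio: {yankees_cpw / moneyball_cpw:.1f}x")
```

With these inputs the low-budget team pays roughly $388 thousand per win against roughly $1.17 million for the Yankees, a threefold difference for the same number of wins.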
HOLISTIC APPROACH TO DATA SCIENCE
[Figure: Core: Making Data Trustable & Usable; Management of Big Data; Data Modeling & Analysis; Data Dissemination & Visualization. Other labels in the core: Acquisition; Preservation. Surrounding layers: Data Security & Privacy; Ethics, Policy & Social Impact. Four Application boxes.]
CORE RESEARCH ISSUES & INTERACTIONS
Making Data Trustable & Usable
• Data cleaning
• Sampling
• Data provenance
Big Data Management
• Data lakes
• Batch & online access
• Platforms
Modelling & Analysis
• Models & methods for data lakes
• Unsupervised classification & AI
Data Visualization & Dissemination
• Visualization for wider audience
• Visualization for data exploration
• Open data technologies
Interactions
• DM support for provenance
• Data preparation for big data management
• Cleaning for data analysis
• DM for ML
• ML for DM
• Visual analytics
• …