THE 3-R'S OF DATA-SCIENCE: REPEATABILITY, REPRODUCIBILITY & REPLICABILITY

By Suneeta Mall
06/05/2019, reveal.js (The HTML Presentation Framework)


  1. THE 3-R'S OF DATA-SCIENCE: REPEATABILITY, REPRODUCIBILITY & REPLICABILITY
      By Suneeta Mall

  2. OVERVIEW
      - Industry adaptation
      - Importance of 3-Rs
      - Peek into the reproducibility crisis
      - Define 3-Rs: repeatability, reproducibility & replicability
      - Down the memory lane of confused terminology
      - Techniques to ensure repeatability & reproducibility
      - In-depth review of a few of the promising tools
      - Techniques to ensure replicability
      - A few examples
      - One last point!

  3. INDUSTRY ADAPTATION
      “58% of respondents indicated that they were seriously building data science based solutions, with only 14% indicating no involvement just yet.”
      Evolving Data Infrastructure - Ben Lorica and Paco Nathan (O’Reilly, Oct 2018)
      https://www.kdnuggets.com/2017/05/machine-learning-overtaking-big-data.html

  4. INDUSTRY ADAPTATION
      “As per a survey in the UK, 84% of startups focus on data science, with 52% of companies preferring to build/use their own models.”
      David Kelnar, MMC Ventures, 2016
      https://www.kdnuggets.com/2017/05/machine-learning-overtaking-big-data.html

  5. DATA-SCIENCE
      https://xkcd.com/1838/

  8. IMPORTANCE OF 3-RS IN DATA-SCIENCE
      Know what... why... & how... recreate it... & solve it.

  11. IMPORTANCE OF 3RS .. CTD..
      “We are continually faced by great opportunities brilliantly disguised as insoluble problems.” Lee Iacocca
      The opportunities here are building reliable and robust predictive solutions that can be trusted.

  15. IMPORTANCE OF 3RS: IN RESEARCH
      “Non-reproducible single occurrences are of no significance to science.” Popper (The Logic of Scientific Discovery)
      Yet more than 70% of researchers have failed to reproduce another scientist's experiments, and more than 50% have failed to reproduce their own experiments (Nature's Survey, 2016).

  18. A reproducibility crisis

  19. International Conference on Learning Representations (ICLR)
      Annual reproducibility challenge (since 2018), led by Dr. Joelle Pineau, an Associate Professor at McGill University and lead for Facebook’s Artificial Intelligence Research lab (FAIR).

  23. That's research, and the reproducibility crisis is very real! But why are we talking about it?
      - Industry adaptation is making data science accessible to people
      - Thus changing our community and society
      - We have a moral and social obligation to provide confident and reliable answers!

  24. 1 REPEATABILITY, REPRODUCIBILITY & REPLICABILITY

  26. 1.1 REPEATABILITY
      Repeatability is the closeness of agreement between the results of successive attempts of the same experiment/process carried out under the same conditions, e.g. replay, repeat.

  27. 1.1 REPEATABILITY

          import matplotlib.pyplot as plt
          import numpy as np
          from sklearn import datasets, linear_model
          from sklearn.metrics import mean_squared_error, r2_score
          from sklearn.model_selection import train_test_split

          # Load the diabetes dataset and keep a single feature (column 9).
          diabetes = datasets.load_diabetes()
          diabetes_X = diabetes.data[:, np.newaxis, 9]

          # random_state=None: a different train/test split on every run.
          xtrain, xtest, ytrain, ytest = train_test_split(
              diabetes_X, diabetes.target,
              test_size=0.33, random_state=None)

          # Fit ordinary least squares and predict on the held-out split.
          regr = linear_model.LinearRegression()
          regr.fit(xtrain, ytrain)
          diabetes_y_pred = regr.predict(xtest)

      A simple linear regression example on the scikit-learn diabetes dataset.

  28. [Figure: results of Run 1 and Run 2]
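
The difference between Run 1 and Run 2 can be put into numbers. The following is a minimal sketch, assuming the same single-feature setup as the slide's example; the helper `fit_once` and the choice of five runs are illustrative and not part of the talk.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_once(random_state=None):
    """Hypothetical helper: fit the single-feature regression once.

    Returns the fitted slope and the R^2 score on the held-out split.
    """
    diabetes = datasets.load_diabetes()
    X = diabetes.data[:, np.newaxis, 9]
    xtrain, xtest, ytrain, ytest = train_test_split(
        X, diabetes.target, test_size=0.33, random_state=random_state)
    regr = linear_model.LinearRegression().fit(xtrain, ytrain)
    return float(regr.coef_[0]), float(r2_score(ytest, regr.predict(xtest)))


# Five unseeded runs: each call draws a different train/test split.
runs = [fit_once(random_state=None) for _ in range(5)]
slopes = np.array([slope for slope, _ in runs])
scores = np.array([score for _, score in runs])

print("slopes:", np.round(slopes, 2))
print("R^2   :", np.round(scores, 3))
print("slope spread (max - min):", round(float(slopes.max() - slopes.min()), 2))
```

A non-zero spread across runs is what "not repeatable" looks like here: the code and data are unchanged, yet the results of successive attempts disagree because the split is drawn at random each time.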

  29. 1.1 REPEATABILITY

          import matplotlib.pyplot as plt
          import numpy as np
          from sklearn import datasets, linear_model
          from sklearn.metrics import mean_squared_error, r2_score
          from sklearn.model_selection import train_test_split

          # Load the diabetes dataset and keep a single feature (column 9).
          diabetes = datasets.load_diabetes()
          diabetes_X = diabetes.data[:, np.newaxis, 9]

          # random_state=32: the same train/test split on every run.
          xtrain, xtest, ytrain, ytest = train_test_split(
              diabetes_X, diabetes.target,
              test_size=0.33, random_state=32)

          # Fit ordinary least squares and predict on the held-out split.
          regr = linear_model.LinearRegression()
          regr.fit(xtrain, ytrain)
          diabetes_y_pred = regr.predict(xtest)

      Linear regression example on the scikit-learn diabetes dataset with a fixed seed.

  30. [Figure: results of Repeat 1 and Repeat 2]
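
The effect of the fixed seed can also be checked programmatically rather than by comparing plots. This is a small sketch under the same assumptions as the slide's example; `predict_with_seed` is a hypothetical helper introduced only for illustration, and the comparison uses a floating-point tolerance rather than exact equality.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split


def predict_with_seed(seed):
    """Hypothetical helper: split with a fixed seed, fit, and predict."""
    diabetes = datasets.load_diabetes()
    X = diabetes.data[:, np.newaxis, 9]
    xtrain, xtest, ytrain, ytest = train_test_split(
        X, diabetes.target, test_size=0.33, random_state=seed)
    regr = linear_model.LinearRegression().fit(xtrain, ytrain)
    return regr.predict(xtest)


# Two repeats with the same seed (32, as on the slide).
repeat_1 = predict_with_seed(32)
repeat_2 = predict_with_seed(32)

# With an identical split, both repeats should agree within tolerance.
repeatable = bool(np.allclose(repeat_1, repeat_2))
print("repeatable:", repeatable)
```

Fixing `random_state` only pins the train/test split. It covers this example because ordinary least squares has no other source of randomness; models that draw random numbers elsewhere would need those seeded as well.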
