
Introduction to Data Science using R, Lecture 1 (PowerPoint presentation)



  1. Introduction to Data Science using R: Lecture 1

  2. 58M Aims: The aim of this module is to develop your skills in some specific types of ‘data analysis’ by providing supported practice in workshops and opportunities to apply those skills independently in ‘projects’. These will help you become independent researchers and highly employable.

  3. Options: You do one of (1) Analysing and using 3D structures in molecular bioscience research; (2) Biological data science; (3) Image analysis; (4) Sequence analysis. Each option is about 15 hours of contact time.

  4. Learning Outcomes (broadest sense): At the end of this module the successful student will be able to (1) demonstrate the acquisition of skills in experimental design and data analysis related to the option chosen within the module, and (2) apply the skills learned to address novel bioscience problems.

  5. Learning Outcomes (this option): At the end of this module the successful student will be able to (1) demonstrate the acquisition of skills in experimental design and data analysis related to the option chosen within the module, and (2) apply the skills learned to address novel bioscience problems. For this option, that means: devise reproducible strategies to import, tidy, transform, model and present data in R.

  6. Overview: What is Data Science? It is not the same as numeracy: you don’t have to be good at maths. It is not the same as statistics: it includes statistical analysis, but also what you have to do before and after the analysis. Data science: reproducible workflows for the simulation, collection, organisation, processing, analysis and presentation of data.

  7. Science (diagram): Experiments (tests of ideas). Experimental activity: choose / set / manipulate explanatory variables; measure response variables. Then analyse, visualise, interpret and report. Also labelled: abstraction, simulation, data skills.

  8. What is data science (diagram)? Stages shown: import, tidy (a mental model and an activity), transform, explore, model (statistics), simulate and report.

  9. How much of data science is using statistics? Less than you probably think: roughly 80% of your time is spent getting data, cleaning data, aggregating data, reshaping data, and exploring data using exploratory data analysis and data visualization. Data analysis means getting data, reshaping it, exploring it and visualizing it, as well as modelling it. Reproducibility: same data + same analysis = same results.

  10. Reproducibility is a key feature: every stage (import, tidy, transform, explore, simulate, model, report) should be carried out reproducibly.

  11. Rationale (diagram): Experiments (tests of ideas). Experimental design: choose / set / manipulate explanatory variables; measure response variables. Then analyse, visualise, interpret and report. Repeatable: protocol, lab book. Reproducible: scripting.

  12. Reproducible, Repeatable, Replicated. Replication: within a study. Repeatable: between studies, carried out independently, without the use of the original data but generally using the same methods. Reproducible: the original data and original methods reproduce all of the findings of a study; the methods need to be perfectly described (Patil et al., "A statistical definition for reproducibility and replicability").

  13. That’s what it is... how will it work in this module? What am I trying to do? My objectives.

  14. Additional learning objectives, or my objectives: to create a learning environment characterised by a focus on progress and improvement; enjoyment and satisfaction; interaction and exchange of ideas; initiative and independence; and supported problem solving.

  15. Assessment, learning objectives and approach. I didn’t want: one size fits all; artificial or meaningless jumping through hoops; fear of failure and judgement to interfere. I did want you to: be able to work on problems you are interested in; be able to develop the skills needed for that; have more supported unstructured time; be assessed on what you can do (not what you can’t do). Core skills are taught with some recipes and lots of suggested practice examples. Six workshops support you in learning how to create the assessed output: a reproducible analysis related to your project, a past ‘project’, or one of the provided ‘projects’.

  16. Module. Do you need to revise? You have access to the latest versions of the Stage 1 and Stage 2 materials. Sessions: L01 Introduction to Data Science; W01 Developing independence and good practice: tips; W02 Importing data; W03 Reproducibility 1; W04 Reproducibility 2; W05 Tidy data and tidying data; W06 An Introduction to Machine Learning; W07 Project work. Weeks 6, 7 and 8: drop-ins. Note: timetable session titles may be incorrect; the VLE is correct.

  17. Assessment and the learning objectives, i.e., devise reproducible strategies to import, tidy and model data in R. The submission is a zip file of organised files including, at least, the Rmd, the knitted output (HTML, PDF or Word) and the data. An example is available on the VLE. The Rmd should be well commented and contain everything needed to recreate, and to understand the recreation of, the knitted output. The knitted output should be no more than 1000 words.

  18. Advice. Do you need to revise? You have access to the latest versions of the Stage 1 and Stage 2 materials. Other sources: Google; talking to people; R-bloggers; Stack Overflow; the Foundation for Open Access Statistics. Teach others, especially house-keeping.

  19. Reading (#biol58M): Genome Res. 2015. 25: 1417-1422; Good Enough Practices in Scientific Computing; R for Data Science, Garrett Grolemund & Hadley Wickham.
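
The import, tidy, transform, model and report stages (slides 8 and 10) and the rule "same data + same analysis = same results" (slide 9) can be sketched in a few lines of R. This is a minimal, hypothetical example, not the module's own code: the simulated plant heights, the column names, the seed and the function name `run_analysis()` are all assumptions. It uses base R only so it runs without extra packages, whereas the module's workflow may rely on tidyverse functions instead. The script simulates data with a fixed seed, tidies it to one row per observation, adds a derived variable, fits a linear model, and then checks that running the whole analysis twice gives identical results.

```r
# Hypothetical sketch of a reproducible workflow in base R:
# simulate -> tidy -> transform -> model -> report.
run_analysis <- function(seed = 1) {
  # Simulate: a wide table with one column per week (hypothetical data).
  # With real data this step would be an import, e.g.
  # read.csv("data/plants.csv"), where the file name is hypothetical.
  set.seed(seed)  # fixes the random numbers so the simulation is repeatable
  wide <- data.frame(
    plant = c("A", "B", "C"),
    week1 = rnorm(3, mean = 1, sd = 0.1),
    week2 = rnorm(3, mean = 2, sd = 0.1),
    week3 = rnorm(3, mean = 3, sd = 0.1)
  )

  # Tidy: one row per observation (plant x week), one column per variable
  tidy_data <- data.frame(
    plant  = rep(wide$plant, times = 3),
    week   = rep(1:3, each = nrow(wide)),
    height = c(wide$week1, wide$week2, wide$week3)
  )

  # Transform: add a derived variable (assumes heights are in cm)
  tidy_data$height_mm <- tidy_data$height * 10

  # Model: simple linear regression of height on week
  fit <- lm(height_mm ~ week, data = tidy_data)

  # Report: return the tidy data and the rounded model coefficients
  list(data = tidy_data, coefficients = round(coef(fit), 3))
}

first  <- run_analysis()
second <- run_analysis()

print(first$coefficients)

# Reproducibility: same data + same analysis = same results
identical(first, second)
```

Changing the seed changes the simulated data and therefore the results; leaving out `set.seed()` would make each run differ, which is exactly what a reproducible script should avoid.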
