1 Reproducibility 1: Good practice
Workshop 3
2 Aim
In this session you will practise creating reproducible work. Doing this well will allow you to reuse code easily for the assessment.

Objectives
By following the slides and applying the techniques to the workshop examples, the successful student will be able to:
- explain the importance and principles of reproducible research
- follow good programming practice in terms of directory structure, code formatting and design, variable naming and comments
- design reproducible analyses and evaluate their success in doing so
3 Rationale: Extension of scientific good practice
[Diagram: Experiments (tests of ideas). Experimental design: choose / set / manipulate explanatory variables; measure response variables. Then analyse, visualise, interpret and report.]
Repeatable: protocol, lab book
Reproducible: scripting
4 Extension of scientific good practice
Lab book for computational work
- Readability: Could future you, or others, understand what you did and why? Could you repeat it?
- Reproducibility: Could you (or others) recreate everything from data import to results communication?
- Reproducibility plus: Could you track the development of reproducible work?
5 Reproducibility
- Is best practice, important in research collaboration and mandatory in many industry settings
- Will likely become mandatory for science publication and funding
- Will ultimately make your life much easier
- Requires time, diligence and practice
- Some reproducibility is better than none
- Has ‘impact’
6 Reproducibility continuum
In order of increasing reproducibility:
- Organise your files
- Script everything
- Organisation and Comments
- Code: formatting and style
- Code: ‘algorithmically’ / ‘algebraically’
- Use RStudio, R Markdown, knitr
- Collaboration and Version control: Git and GitHub
- Public repositories of protocols, raw data, and source code
7 Reproducibility continuum
In order of increasing reproducibility:
- Organise your files
- Script everything
- Organisation and Comments
- Code: formatting and style
- Code: ‘algorithmically’ / ‘algebraically’
- Use RStudio, R Markdown, knitr
- Collaboration and Version control: Git and GitHub
- Public repositories of protocols, raw data, and source code
Slide labels alongside the list: Required here; Today; Next week
8 Organise your files
9 Reproducibility: Script everything
Write everything down.
Already introduced: getting started in every practical
1. Start RStudio.
2. Make a new script file called workshop1.R
3. Set your working directory (to the script file location?) in the script
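As a minimal sketch of the start of such a script: the working directory is set once near the top, and every later path is written relative to it. The folder names `workshop1`, `data` and `figures` and the file `example.csv` are illustrative assumptions, not part of the workshop materials, and a temporary directory stands in for your real project location so that the example runs anywhere.

```r
# workshop1.R (sketch): set the working directory once, then use relative paths

# Hypothetical project folder; tempdir() stands in for your real project location
project_dir <- file.path(tempdir(), "workshop1")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "figures"), showWarnings = FALSE)

# Set the working directory in the script itself, not through the menus
old_wd <- setwd(project_dir)

# All later paths are relative to the working directory
write.csv(data.frame(x = 1:3), file.path("data", "example.csv"), row.names = FALSE)
example <- read.csv(file.path("data", "example.csv"))

# Restore the previous working directory (only needed for this demonstration)
setwd(old_wd)
```

Because the paths are relative, the same script should run unchanged on another machine once the single `setwd()` line points at that machine's copy of the project folder.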
10 Reproducibility: Organisation and Comments
For .R scripts:
- Use plain text format (easy in RStudio)
- Divide the script into sections
- Use comments extensively
- Use space extensively
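One possible layout for a sectioned, commented script is sketched below. It uses `chickwts`, a dataset supplied with R, purely for illustration; the section names and the object names `chicks` and `mean_weight` are assumptions rather than a prescribed template. Comment lines that end in four or more dashes are shown by RStudio as navigable sections.

```r
# Title:   Example analysis script (illustrative skeleton)
# Purpose: Summarise chick weights by feed type
# Data:    chickwts, a dataset supplied with R

# 1. Import ----
# Copy the built-in data so the original is left untouched
chicks <- chickwts   # columns: weight, feed

# 2. Summarise ----
# Mean weight for each feed type
mean_weight <- tapply(chicks$weight, chicks$feed, mean)

# 3. Report ----
# Round only when reporting, never when calculating
print(round(mean_weight, 1))
```

Blank lines between sections and a comment before each step make it easier for future you to see what was done and why.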
11 Reproducibility: Code
“The only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code”

Formatting and style
- Most important: be consistent
- Spaces after commas, around operators (except :)
- Indentation of blocks, layout
- Limit width

Formatted:

    names(pigeon)[1] <- "interorbital"
    hist(pigeon$interorbital,
         xlim = c(8, 14),
         main = NULL,
         xlab = "Width (mm)",
         ylab = "Number of pigeons",
         col = "grey")

Unformatted:

    names(pigeon)[1]<-"interorbital"
    hist(pigeon$interorbital,xlim=c(8,14),main=NULL,xlab="Width (mm)",ylab="Number of pigeons",col="grey")

Yihui Xie (2016) http://yihui.name/formatR/
http://adv-r.had.co.nz/Style.html
12 Reproducibility: Code
Emulate others!

Formatting and style: Naming
“There are only two hard things in Computer Science: cache invalidation and naming things.”
- Files: localisation.R
- Variables: lowercase, meaningful, use _ between words, e.g. max_value
- Most important: be consistent
13 Reproducibility: Code
‘Algorithmically’ / ‘algebraically’ / ‘superparametrically’
Code which expresses the structure of the problem/solution

Hard-coded numbers:

    > sum(3, 5, 6, 7, 8) / 5
    [1] 5.8
    > (3 - 5.8)^2 + (5 - 5.8)^2 + (6 - 5.8)^2 + (7 - 5.8)^2 + (8 - 5.8)^2
    [1] 14.8

Expressing the structure of the calculation:

    > x <- c(3, 5, 6, 7, 8)
    > aver <- sum(x) / length(x)
    > sum((x - aver)^2)
    [1] 14.8
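Taking the same idea one step further, the calculation can be wrapped in a function so that it works on any numeric vector. This is a sketch only: the name `sum_of_squares` is a hypothetical choice, and the cross-check relies on the fact that the sample variance is the sum of squares divided by n - 1.

```r
# Sum of squared deviations from the mean, for any numeric vector
sum_of_squares <- function(x) {
  aver <- sum(x) / length(x)   # the mean, calculated as on the slide
  sum((x - aver)^2)
}

x <- c(3, 5, 6, 7, 8)
ss_x <- sum_of_squares(x)

# Cross-check against the built-in sample variance: SS = var(x) * (n - 1)
check <- all.equal(ss_x, var(x) * (length(x) - 1))

# The same code works unchanged on different data
ss_other <- sum_of_squares(c(2, 4, 4, 4, 5, 5, 7, 9))
```

Nothing in the function refers to a particular number of values or a particular mean, so changing the data does not require changing the code.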
14 Citing packages
- Packages should be cited, and R helps with that too:

    > citation("MASS")
    To cite the MASS package in publications use:
      Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0

    > citation("ggplot2")
    To cite ggplot2 in publications, please use:
      H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
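The same function can be used without a package name to cite R itself, and the result can be converted to BibTeX for a reference manager. This is a brief sketch; the object names `r_citation`, `r_bibtex` and `r_version` are illustrative, and the text printed will depend on the version of R you have installed.

```r
# Citation for R itself (no package name given)
r_citation <- citation()
print(r_citation)

# The same citation as a BibTeX entry, for use in a reference manager
r_bibtex <- toBibtex(r_citation)
print(r_bibtex)

# Record the R version used, to report alongside the citation
r_version <- R.version.string
print(r_version)
```

Reporting the R version and package citations together lets readers see exactly which software produced the results.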
15 Reproducibility continuum
In order of increasing reproducibility:
- Organise your files
- Script everything
- Organisation and Comments
- Code: formatting and style
- Code: ‘algorithmically’ / ‘algebraically’
- Use RStudio, R Markdown, knitr
- Collaboration and Version control: Git and GitHub
- Public repositories of protocols, raw data, and source code
Slide labels alongside the list: Required here; Recommend; More recommend