Experimental epidemiology analyses with R and R commander Lars T. - PowerPoint PPT Presentation

  1. Experimental epidemiology analyses with R and R commander. Lars T. Fadnes, Centre for International Health, University of Bergen

  3. How to install R commander?
     • install.packages("Rcmdr", dependencies=TRUE)
     • Download the necessary web files and put them in a folder: http://statistics.fadnes.net/epi
       – Fieldtrials12.RData
       – Exercises-for-the-exp-epi-2012-03.rtf

  4. Installation of RcmdrPlugin.Cih.Epi
     • Install R (if not done already)
       – http://www.r-project.org/
     • Install package Rcmdr (if not done already)
       – Packages -> Install package(s) -> scroll down to Rcmdr -> click OK
     • Install package Epi
       – Packages -> Install package(s) -> scroll down to Epi -> click OK
     • Download RcmdrPlugin.Cih.Epi from http://statistics.fadnes.net/epi/
     • Install package(s) from local zip file (choose the package you just downloaded)
     • Open program
       – Load package Rcmdr
       – Tools -> Load Rcmdr plug-in(s) -> RcmdrPlugin.Cih.Epi
     • Click OK and Yes
     • Now you are ready

  5. Aim for this session
     • Introduce a brilliant tool
     • Analyse a dataset with the tool
     • Guide to further knowledge

  6. What is R?
     • R is a free software environment that includes a set of base packages for graphics, math, and statistics.
     • You can make use of specialized packages contributed by R users or write your own new functions.

  7. Why R?
     • Very powerful
     • Developing extremely quickly
     • Works on different platforms (not only Microsoft Windows…)
     • Free of charge

  8. Why doesn’t everyone use R?

  9. What is R commander?

  10. Why R commander?
      • Powerful
      • Free of charge
      • Works on different platforms (not only Microsoft Windows…)
      • Easy to learn and to use…

  11. How to install R?
      For Windows:
      • http://cran.r-project.org/bin/windows/base/
        – Easy
      • If you are here at UiB, the IT department will install it for you if you ask them
      For Linux (Ubuntu etc.):
      • Very easy
      • Just search for r-base-core in Synaptic Package Manager and add it
      • http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html (also contains a good description of installation on Mac)

  12. How to install R commander?
      • It is already installed on UiB computers
      • If not: install.packages("Rcmdr", dependencies=TRUE)
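Before installing, it can help to see which packages are already present. The following is a minimal sketch, assuming the two package names mentioned on the installation slides (Rcmdr and Epi); the variable names are made up for illustration, and the install line is left commented out because it needs an internet connection.

```r
# Packages named on the installation slides.
pkgs <- c("Rcmdr", "Epi")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Uncomment to install whatever is missing (needs an internet connection):
# install.packages(missing_pkgs, dependencies = TRUE)
```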

  13. Some things to note first
      • R is case-sensitive
        – help, Help and HELP are different…
        – Recommendation:
          • Choose one style and stick to it
      • If there is something you don’t know:
        – There is a lot of good information on the web
          • Particularly for R
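A small illustration of case sensitivity, using a made-up object name (help_topic is not part of the course material):

```r
# Create an object with a lower-case name.
help_topic <- "mean"

exists("help_topic")   # TRUE: the name matches exactly
exists("Help_topic")   # FALSE: different capitalisation is a different name
```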

  14. How to start?
      • Open R
        – Load the package Rcmdr from the menu, or write
          • library(Rcmdr)

  15. Menu
      • File Menu:
        – items for loading and saving script files;
        – for saving output and the R workspace;
        – and for exiting
      • Edit Menu:
        – items (Cut, Copy, Paste, etc.) for editing the contents of the script and output windows.
      • Data
        – Submenus containing menu items for reading and manipulating data.
      • Statistics
        – Submenus containing menu items for a variety of basic statistical analyses.

  16. Menu
      • Graphs
        – Menu items for creating simple statistical graphs.
      • Models
        – Menu items and submenus for obtaining numerical summaries, confidence intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and for adding diagnostic quantities, such as residuals, to the data set.
      • Distributions
        – Probabilities, quantiles, and graphs of standard statistical distributions (to be used, for example, as a substitute for statistical tables) and samples from these distributions.
      • Tools
        – Menu items for loading R packages unrelated to the Rcmdr package (e.g., to access data saved in another package), and for setting some options.
      • Help Menu
        – items to obtain information about the R Commander (including this manual). As well, each R Commander dialog box has a Help button (see below).

  17. • Script Window
        – R commands generated by the R Commander
        – You can also type R commands directly into the script window or the R Console
        – The main purpose of the R Commander, however, is to avoid having to type commands.
      • Output Window
        – Printed output
      • Messages Window
        – Displays error messages, warnings, and notes
      • Graphics Device window
        – When you create graphs, these will appear in a separate window outside of the main R Commander window.

  18. Available functions:

  20. Save dataset under your documents folder
      Files and documents are available at http://statistics.fadnes.net/epi

  21. Let’s get started…
      • Change directory (under File)
        – Find the folder where you placed your data file
      • Import data
        – Give it the name: fieldtrials
      • Save workspace as
        – Give a name to your file
        – The file contains the dataset and any models you might have generated
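The same three menu steps can also be typed as commands. This is only a sketch under stated assumptions: it uses a temporary folder and a tiny made-up data frame saved as example.RData in place of the course file, and the file name my-session.RData is hypothetical.

```r
# Hypothetical working folder; replace with the folder holding your data file.
work_dir <- tempdir()
old_dir <- setwd(work_dir)          # "Change directory"

# Stand-in for the course file: a made-up data frame saved in R format.
fieldtrials <- data.frame(id = 1:4, childage = c(8, 14, 20, 27))
save(fieldtrials, file = "example.RData")
rm(fieldtrials)

# "Import data": load() restores the stored objects and returns their names.
loaded <- load("example.RData")
print(loaded)

# "Save workspace as": write the objects in the workspace to one file.
save.image(file = "my-session.RData")

# Go back to the folder we started in.
setwd(old_dir)
```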

  22. Data Types
      • Vectors
        – Quantitative difference (one vs. two apples)
        – including continuous (numerical) variables
        – Number variables are coded as vectors by default
      • Factors
        – Qualitative difference (apples vs. pears)
        – Categorical
        – Text variables are coded as factors by default
      • Matrices, lists, arrays and data frames
      http://www.statmethods.net/input/datatypes.html
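To make the distinction concrete, here is a short sketch with made-up values (not taken from the course dataset) showing a numeric vector next to a factor:

```r
# A numeric vector: differences between values are quantitative.
childage <- c(8, 14, 20, 27)

# A factor: values are categories, stored with a fixed set of levels.
gender <- factor(c("male", "female", "female", "male"))

is.numeric(childage)   # TRUE
is.factor(gender)      # TRUE
levels(gender)         # "female" "male" (levels are sorted alphabetically)

mean(childage)         # arithmetic makes sense for a numeric vector
table(gender)          # counting makes sense for a factor
```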

  23. Variables in dataset - define the datatypes
      – id: id number
      – gender: male/female (factor)
      – treatmentarm: Treatment (1=zinc, 0=placebo) (factor)
      – childage: Age of the child in months (vector)
      – breastfed: Is the child breast fed? (factor)
      – lentils: Does the child eat lentils? (0=no, 1=yes) (factor)
      – meat: Does the child eat meat? (0=no, 1=yes) (factor)
      – duration: Duration of diarrhoea in days (vector)
      – diarsev: Severe diarrhoea ≥ 10 stools per day (factor)
      – fever: Did the child have fever at enrollment? (factor)
      – clusterzn2/4/8/16: cluster variables identifying living areas; coded as vector, but needs to be transformed into a factor

  24. How to save?
      • Save R workspace as…
        – This will save your data (in the R format)
      • Save output as…
        – This will save your output
        – Another strategy is to cut and paste what you want to save
      • Always save the commands (syntax)
        – essential if you want to re-run the analyses later
        – WordPad is a better option than Word etc. (it does not autocorrect, e.g. change to upper case)

  25. How to write and run a command?
      • Simply write the command in the script window, mark it and click 'Submit' or press Ctrl+R

  26. Nice to know:
      • When writing comments in the syntax, start with the # sign
        – R will then not treat the line as a command
      • If you are uncertain about a function, use Google or help(name-of-function)
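A brief sketch of both tips; the object x is made up for illustration, and the help calls are commented out so the snippet runs without opening a help window:

```r
# This whole line is a comment and is not run.
x <- sqrt(16)  # text after the # sign is ignored as well
print(x)

# Both forms open the help page for mean(); uncomment to try them.
# help(mean)
# ?mean
```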

  27. – The cluster variable contains numbers and is by default coded as a vector, but needs to be recoded into a factor (a categorical variable for grouping etc.)
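In plain R the conversion could look roughly like this. The column name clusterzn4 is an assumption based on the "clusterzn2/4/8/16" entry in the variable list, and the values are made up:

```r
# Made-up stand-in for the course dataset; clusterzn4 is an assumed column name.
fieldtrials <- data.frame(clusterzn4 = c(1, 2, 3, 4, 1, 2))

# Numbers are read as a numeric vector by default.
is.numeric(fieldtrials$clusterzn4)

# Convert the cluster codes into a factor so they are treated as groups.
fieldtrials$clusterzn4 <- as.factor(fieldtrials$clusterzn4)
is.factor(fieldtrials$clusterzn4)
nlevels(fieldtrials$clusterzn4)
```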

  28. Now we’re ready to answer some scientific questions…

  29. Compute new variable
      • Child age in years
      • Child age now given in months
      • of vaccination is often calculated by measuring antibodies before and after vaccination
      • childageyear = childage/12
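Written as a command, the computation adds a new column to the data frame. A minimal sketch with made-up ages in months:

```r
# Made-up stand-in for the course dataset (ages in months).
fieldtrials <- data.frame(childage = c(6, 12, 18, 30))

# New variable: age in years = age in months divided by 12.
fieldtrials$childageyear <- fieldtrials$childage / 12
print(fieldtrials)
```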

  30. Does the new variable look reasonable?
      • View data set

  31. Summarize variable
      • Calculate mean, median and standard deviation for
        – childageyear
          • for each intervention arm (treatmentarm)
      • Numerical summaries
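R Commander generates its own command for the Numerical summaries menu item; the sketch below is a base-R substitute, not that command, and uses made-up data with the variable names from the slides.

```r
# Made-up stand-in for the course dataset.
fieldtrials <- data.frame(
  treatmentarm = factor(c("placebo", "placebo", "placebo", "zinc", "zinc", "zinc")),
  childageyear = c(0.5, 1.0, 1.5, 1.0, 2.0, 3.0)
)

# Split the ages by treatment arm, then summarise each group.
by_arm <- split(fieldtrials$childageyear, fieldtrials$treatmentarm)
summaries <- sapply(by_arm, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})

# Rows are the statistics, columns are the treatment arms.
print(summaries)
```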

  32. Doing calculations for subsets by generating new datasets

  33. • Placebo: treatmentarm == "placebo"
      • Zinc: treatmentarm == "zinc"
      Make the other subset by changing the syntax and run ('Submit') the syntax:
        zinc <- subset(fieldtrials3, subset=treatmentarm=="zinc")
        placebo <- subset(fieldtrials3, subset=treatmentarm=="placebo")

  34. – You can now easily change between the datasets

  35. Make a histogram of childageyear
      • First for the 'zinc' dataset
      • Then for the 'placebo' dataset
      • Do they look similar?
      Note:
      – The histograms will be printed in the R window (not inside R commander)
      – Right-click on the graph and you can copy it as a metafile to paste it into a document, print it or save it
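A command-line sketch of the same exercise, assuming simulated ages between 0.5 and 2.5 years in place of the course data (the real distribution will differ); the titles and axis labels are illustrative.

```r
# Simulated stand-in for the course dataset (not real trial data).
set.seed(1)
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 50)),
  childageyear = runif(100, min = 0.5, max = 2.5)
)

# One dataset per treatment arm.
zinc <- subset(fieldtrials, treatmentarm == "zinc")
placebo <- subset(fieldtrials, treatmentarm == "placebo")

# hist() draws the plot and also returns the bin counts.
h_zinc <- hist(zinc$childageyear, main = "Zinc arm", xlab = "Child age (years)")
h_placebo <- hist(placebo$childageyear, main = "Placebo arm", xlab = "Child age (years)")
```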

  36. • Does it look normally distributed?
      [Histogram: frequency against placebo$childageyear]

  37. Box plot
      • Box plot for childageyear
        – By treatmentarm
        – (first remember to select the complete fieldtrials dataset)
      [Box plot: childageyear by treatmentarm (placebo, zinc)]
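The grouped box plot can be sketched with the formula interface of boxplot(). The data here are simulated and only reuse the variable names from the slides:

```r
# Simulated stand-in for the course dataset (not real trial data).
set.seed(1)
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 50)),
  childageyear = runif(100, min = 0.5, max = 2.5)
)

# "childageyear ~ treatmentarm" means: one box of childageyear per arm.
b <- boxplot(childageyear ~ treatmentarm, data = fieldtrials,
             xlab = "treatmentarm", ylab = "childageyear")

# The returned object lists the groups that were plotted.
print(b$names)
```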

  38. Is the baseline child age different in the zinc and placebo arms?
      • This can be checked with a robust test that does not assume a normal distribution
      • Check with a
        – non-parametric
          » two-sample Wilcoxon test (rank-sum test)
          » Use the 'Exact' test
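As a command, the comparison could be run with wilcox.test() from base R. This is a sketch on made-up ages, chosen without ties so that an exact p-value can be computed; it is not a result from the course dataset.

```r
# Made-up ages in years, all distinct (ties prevent an exact p-value).
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 10)),
  childageyear = c(0.52, 0.75, 0.91, 1.10, 1.33, 1.48, 1.67, 1.82, 2.05, 2.31,
                   0.60, 0.83, 0.98, 1.21, 1.40, 1.55, 1.74, 1.93, 2.12, 2.44)
)

# Two-sample Wilcoxon rank-sum test of childageyear between the two arms.
result <- wilcox.test(childageyear ~ treatmentarm, data = fieldtrials, exact = TRUE)
print(result)
```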

  39. Recoding variables
      Diarrhoea duration can be recoded into an additional categorised variable (diarlong)
      – Data -> Manage variables
      • value = "factor"
        – Factor can be either number or word
      • value, value, value = "factor"
        – Listed with comma
      • value:value = "factor"
        – From lowest to highest values
      • else
        – all other values
      • NA
        – missing
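The recode dialog writes its own command; as a plainly different base-R alternative, cut() can produce the same kind of categorised variable. The cut-off of 4 days and the labels "short" and "long" are hypothetical choices for illustration, and the durations are made up.

```r
# Made-up durations in days, including one missing value.
fieldtrials <- data.frame(duration = c(1, 3, 4, 7, 10, NA))

# Hypothetical rule: 4 days or fewer is "short", more than 4 days is "long".
# With the default right = TRUE the intervals are (-Inf, 4] and (4, Inf).
fieldtrials$diarlong <- cut(fieldtrials$duration,
                            breaks = c(-Inf, 4, Inf),
                            labels = c("short", "long"))

# Missing durations stay missing in the new factor.
table(fieldtrials$diarlong, useNA = "ifany")
```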
