Experimental epidemiology analyses with R and R commander
Lars T. Fadnes
Centre for International Health
University of Bergen
How to install R commander?
- install.packages("Rcmdr", dependencies=TRUE)
- Download the necessary files and put them in a folder: http://statistics.fadnes.net/epi
  – Fieldtrials12.RData
  – Exercises-for-the-exp-epi-2012-03.rtf
Installation of RcmdrPlugin.Cih.Epi
• Install R (if not done already)
  – http://www.r-project.org/
• Install package Rcmdr (if not done already)
  – Packages -> Install package(s) -> scroll down to Rcmdr -> click OK
• Install package Epi
  – Packages -> Install package(s) -> scroll down to Epi -> click OK
• Download RcmdrPlugin.Cih.Epi from http://statistics.fadnes.net/epi/
• Install package(s) from local zip file (choose the package you just downloaded)
• Open the program
  – Load package Rcmdr
  – Tools -> Load Rcmdr plug-in(s) -> RcmdrPlugin.Cih.Epi
• Click OK and Yes
• Now you are ready
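The menu steps above can also be run from the R console. The following is a minimal sketch, not the course's own script: the helper name `install_if_missing` and the zip file path are hypothetical, and the install calls are left commented out because they need an internet connection or the downloaded file.

```r
# Hypothetical helper: install only the packages that are not yet present.
install_if_missing <- function(pkgs) {
  missing <- pkgs[!pkgs %in% rownames(installed.packages())]
  if (length(missing) > 0) {
    install.packages(missing, dependencies = TRUE)
  }
  invisible(missing)  # return the names that had to be installed
}

# Run these interactively (internet connection required):
# install_if_missing(c("Rcmdr", "Epi"))

# Install the downloaded plug-in from a local file. The path is a placeholder;
# point it at the file you saved from http://statistics.fadnes.net/epi/
# install.packages("path/to/downloaded-plugin.zip", repos = NULL)
```

Setting `repos = NULL` tells `install.packages()` to treat its first argument as a local file rather than a package name to look up on CRAN.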
Aim for this session
• Introduce a brilliant tool
• Analyse a dataset with the tool
• Guide to further knowledge
What is R?
• R is a free software environment that includes a set of base packages for graphics, math, and statistics.
• You can make use of specialized packages contributed by R users or write your own new functions.
Why R?
• Very powerful
• Developing extremely quickly
• Working on different platforms (not only Microsoft Windows…)
• Free of all costs
Why doesn't everyone use R?
What is R commander?
Why R commander?
• Powerful
• Free of all costs
• Working on different platforms (not only Microsoft Windows…)
• Easy to learn and to use…
How to install R?
For Windows:
• http://cran.r-project.org/bin/windows/base/
  – Easy
• If you are at UiB, the IT department will install it for you if you ask them
For Linux (Ubuntu etc.):
- Very easy
- Just search for r-base-core in Synaptic Package Manager and add it
- http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/installation-notes.html (also contains a good description of installation on Mac)
How to install R commander?
- It is already installed on UiB computers
- If not: install.packages("Rcmdr", dependencies=TRUE)
Some things to note first
• R is case-sensitive
  – help, Help, HELP and HELF are different…
  – Recommendation: choose one style and stick to it
• If there is something you don't know:
  – There is lots of good information on the web, particularly for R
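A small illustration of case sensitivity, using a made-up variable name: an object created as `childage` cannot be found under the name `ChildAge`.

```r
# R treats names that differ only in case as different objects.
childage <- 14                          # illustrative value, in months

found_lower <- exists("childage")       # TRUE: this is the name we created
found_mixed <- exists("ChildAge")       # FALSE: different capitalisation
```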
How to start?
• Open R
  – Load package Rcmdr, or write:
  – library(Rcmdr)
Menu
• File Menu:
  – Items for loading and saving script files
  – Items for saving output and the R workspace
  – An item for exiting
• Edit Menu:
  – Items (Cut, Copy, Paste, etc.) for editing the contents of the script and output windows
• Data
  – Submenus containing menu items for reading and manipulating data
• Statistics
  – Submenus containing menu items for a variety of basic statistical analyses
Menu
• Graphs
  – Menu items for creating simple statistical graphs
• Models
  – Menu items and submenus for obtaining numerical summaries, confidence intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and for adding diagnostic quantities, such as residuals, to the data set
• Distributions
  – Probabilities, quantiles, and graphs of standard statistical distributions (to be used, for example, as a substitute for statistical tables) and samples from these distributions
• Tools
  – Menu items for loading R packages unrelated to the Rcmdr package (e.g., to access data saved in another package), and for setting some options
• Help Menu
  – Items to obtain information about the R Commander (including this manual). As well, each R Commander dialog box has a Help button (see below).
• Script Window
  – R commands generated by the R Commander
  – You can also type R commands directly into the script window or the R Console
  – The main purpose of the R Commander, however, is to avoid having to type commands
• Output Window
  – Printed output
• Messages Window
  – Displays error messages, warnings, and notes
• Graphics Device window
  – When you create graphs, these will appear in a separate window outside of the main R Commander window
Available functions:
Save dataset under your documents folder
Files and documents are available at http://statistics.fadnes.net/epi
Let's get started…
• Change directory (under File)
  – Find the folder where you placed your data file
• Import data
  – Give it the name: fieldtrials
• Save workspace as
  – Give a name to your file
  – The file contains the dataset and any models you might have generated
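For readers who prefer the console, the save-and-reload cycle can be sketched in base R. This is an illustration on a toy data frame, not the course data: the folder (`tempdir()`), the file name `fieldtrials-demo.RData` and the values are assumptions made for the example.

```r
# Toy stand-in for the course data (the real file is Fieldtrials12.RData).
work_dir <- tempdir()   # assumption: replace with the folder holding your data
fieldtrials <- data.frame(id = 1:3, childage = c(8, 14, 20))

# Save the data set to an .RData file ...
rdata_file <- file.path(work_dir, "fieldtrials-demo.RData")
save(fieldtrials, file = rdata_file)

# ... and load it again, as you would in a later session.
rm(fieldtrials)
loaded_names <- load(rdata_file)   # load() returns the names of restored objects
loaded_names
```

In a real session, `setwd()` changes the working directory (the console equivalent of File -> Change directory), and `save.image()` saves the whole workspace rather than a single object.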
Data Types
• Vectors
  – Quantitative difference (one vs. two apples)
  – Including continuous (numerical) variables
  – Number variables are coded as vectors by default
• Factors
  – Qualitative difference (apples vs. pears)
  – Categorical
  – Text variables are coded as factors by default
• Matrices, lists, arrays and data frames
http://www.statmethods.net/input/datatypes.html
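The difference between a numeric vector and a factor can be seen in a few lines of base R. The values are made up for illustration. Note that since R 4.0.0, `data.frame()` and `read.table()` no longer convert text columns to factors unless asked to, so the default mentioned above may not hold on a current installation.

```r
childage <- c(8, 14, 20)                        # numeric vector (quantitative)
gender <- factor(c("male", "female", "male"))   # factor (categorical)

is.numeric(childage)   # TRUE
is.factor(gender)      # TRUE
levels(gender)         # "female" "male" (levels are sorted alphabetically)
mean(childage)         # 14; a mean only makes sense for the numeric vector
```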
Variables in dataset - define the datatypes
id: id number
gender: male/female (factor)
treatmentarm: Treatment (1=zinc, 0=placebo) (factor)
childage: Age of the child in months (vector)
breastfed: Is the child breast fed? (factor)
lentils: Does the child eat lentils? (0=no, 1=yes) (factor)
meat: Does the child eat meat? (0=no, 1=yes) (factor)
duration: Duration of diarrhoea in days (vector)
diarsev: Severe diarrhoea, ≥ 10 stools per day (factor)
fever: Did the child have fever at enrollment? (factor)
clusterzn2/4/8/16: cluster variables identifying living areas (coded as vector, but needs to be transformed into a factor)
How to save?
• Save R workspace as…
  – This will save your data (in the R format)
• Save output as…
  – This will save your output
  – Another strategy is to cut and paste what you want to save
• Always save the commands (syntax)
  – Essential if you want to re-run the analyses later
  – WordPad is a better option than Word etc. (it does not autocorrect, e.g. change to upper case)
How to write and run a command?
• Simply write the command in the script window, select it and click 'Submit' or press Ctrl+R
Nice to know:
• When writing comments in the syntax, start with the # sign
  – R will then not consider the line as a command
• If you are uncertain about a function, use Google or help(name-of-function)
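A short sketch of both points, with made-up numbers. The `interactive()` guard is an addition for this example so that the help page is opened only when you are working at the console.

```r
# This whole line is a comment and is ignored by R.
durations <- c(3, 5, 8)            # a comment can also follow a command
mean_duration <- mean(durations)   # average of the three values

# Open the help page for a function (only when working interactively).
if (interactive()) help("mean")
```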
– The cluster variables contain numbers and are therefore coded as vectors by default, but they need to be recoded into factors (categorical variables for grouping etc.)
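In syntax, this conversion is a single call to `as.factor()`. A minimal sketch on a toy data frame; the column name `clusterzn2` follows the variable list, while the values are invented.

```r
# Toy data: the cluster id is read in as plain numbers.
fieldtrials <- data.frame(id = 1:4, clusterzn2 = c(1, 2, 2, 1))
is.numeric(fieldtrials$clusterzn2)   # TRUE before the conversion

# Convert the numeric codes into a categorical grouping variable.
fieldtrials$clusterzn2 <- as.factor(fieldtrials$clusterzn2)
levels(fieldtrials$clusterzn2)       # "1" "2"
```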
Now we're ready to answer some scientific questions…
Compute new variable
• Child age in years
• Child age is now given in months
• … of vaccination is often calculated by measuring antibodies before and after vaccination
• childageyear = childage / 12
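Written as R syntax, the computation adds a new column to the data set. A minimal sketch on toy ages; the values are invented for illustration.

```r
# Toy data: ages in months.
fieldtrials <- data.frame(id = 1:3, childage = c(6, 12, 30))

# New variable: age in years.
fieldtrials$childageyear <- fieldtrials$childage / 12
fieldtrials$childageyear   # 0.5 1.0 2.5
```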
Does the new variable look reasonable?
• View data set
Summarize variable
• Calculate mean, median and standard deviation for childageyear
  – For each intervention arm (treatmentarm)
• Numerical summaries
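The same summaries can be computed in base R by splitting the variable by arm. This is a sketch on invented data and is not the syntax that the Numerical summaries dialog itself generates.

```r
# Toy data: three children per arm.
fieldtrials <- data.frame(
  treatmentarm = factor(c("placebo", "placebo", "placebo", "zinc", "zinc", "zinc")),
  childageyear = c(0.5, 1.0, 1.5, 1.0, 2.0, 2.4)
)

# One column per arm, one row per statistic.
summary_by_arm <- sapply(
  split(fieldtrials$childageyear, fieldtrials$treatmentarm),
  function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)
summary_by_arm
```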
Doing calculations for subsets by generating new datasets
• Placebo: treatmentarm == "placebo"
• Zinc: treatmentarm == "zinc"
Make the other subset by changing the syntax, then run ('Submit') it:
zinc <- subset(fieldtrials3, subset=treatmentarm=="zinc")
placebo <- subset(fieldtrials3, subset=treatmentarm=="placebo")
– You can now easily change between the datasets
Make a histogram of childageyear
• First for the 'zinc' dataset
• Then for the 'placebo' dataset
• Do they look similar?
Note:
  – The histograms will be shown in the R window (not inside R commander)
  – Right-click on the graph to copy it as a metafile and paste it into a document, print it or save it
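In syntax, each histogram is one call to `hist()`. A minimal sketch with invented ages; the data set names `zinc` and `placebo` follow the slides, while the titles and axis labels are choices made for this example.

```r
# Toy data sets standing in for the two subsets.
zinc    <- data.frame(childageyear = c(0.6, 0.9, 1.1, 1.2, 1.4, 1.9, 2.3))
placebo <- data.frame(childageyear = c(0.5, 0.8, 1.0, 1.3, 1.5, 1.8, 2.4))

# One histogram per arm; each call draws a new plot.
hist(zinc$childageyear, main = "Zinc arm", xlab = "Child age (years)")
hist(placebo$childageyear, main = "Placebo arm", xlab = "Child age (years)")
```

Calling `par(mfrow = c(1, 2))` before the two `hist()` calls places the plots side by side, which makes the comparison easier.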
• Does it look normally distributed?
[Figure: histogram of placebo$childageyear; y-axis: frequency]
Box plot
• Box plot for childageyear
  – By treatmentarm
  – (first remember to select the complete fieldtrials dataset)
[Figure: box plot of childageyear by treatmentarm (placebo, zinc)]
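The box plot by arm can be drawn with the formula interface of `boxplot()`. A minimal sketch on invented data; the axis labels are choices made for this example.

```r
# Toy data: four children per arm.
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 4)),
  childageyear = c(0.5, 1.0, 1.5, 2.4, 0.7, 1.1, 1.6, 2.2)
)

# childageyear on the y-axis, one box per treatment arm.
bp <- boxplot(childageyear ~ treatmentarm, data = fieldtrials,
              xlab = "treatmentarm", ylab = "childageyear")
bp$names   # group labels in plotting order: "placebo" "zinc"
```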
Is the baseline child age different in the zinc and placebo arms?
• This can be checked with a robust test that does not assume a normal distribution
• Check with a non-parametric test
  – Two-sample Wilcoxon test (rank-sum test)
  – Use the 'Exact' test
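In base R the two-sample Wilcoxon test is `wilcox.test()`, and `exact = TRUE` requests the exact p-value. A minimal sketch on invented data; the toy ages contain no ties, since R cannot compute an exact p-value when ties are present and then falls back to a normal approximation with a warning.

```r
# Toy data: five children per arm, no tied values.
fieldtrials <- data.frame(
  treatmentarm = factor(rep(c("placebo", "zinc"), each = 5)),
  childageyear = c(0.5, 0.9, 1.3, 1.7, 2.1, 0.6, 1.0, 1.4, 1.8, 2.2)
)

# Compare the age distributions of the two arms.
result <- wilcox.test(childageyear ~ treatmentarm, data = fieldtrials,
                      exact = TRUE)
result$p.value   # a large p-value gives no evidence of a baseline difference
```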
Recoding variables
Diarrhoea duration can be recoded into an additional categorised variable (diarlong)
  – Data -> Manage variables
• value = "factor"
  – Factor can be either number or word
• value, value, value = "factor"
  – Listed with comma
• value:value = "factor"
  – From lowest to highest values
• else: all other values
• NA: missing
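A comparable recoding can be done in plain syntax with base R's `cut()`, used here in place of the recode dialog. This is a sketch only: the cut-off (7 days or more counts as "long") and the labels are hypothetical, since the slides do not state which categories to use.

```r
# Toy durations in days, including one missing value.
duration <- c(1, 3, 6, 7, 12, NA)

# Hypothetical cut-off: up to 6 days = "short", 7 days or more = "long".
diarlong <- cut(duration, breaks = c(-Inf, 6, Inf),
                labels = c("short", "long"))

table(diarlong, useNA = "ifany")   # missing durations stay missing
```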