Practice in analysis of multistate models using Epi::Lexis



Practice in analysis of multistate models using Epi::Lexis

Bendix Carstensen
Steno Diabetes Center Copenhagen, Gentofte, Denmark
& Department of Biostatistics, University of Copenhagen
b@bxc.dk
http://BendixCarstensen.com

University of Aberdeen, 18 August 2017
http://BendixCarstensen/AdvCoh/courses/Frias-2016

Introducing R (Data)

Bendix Carstensen, Martyn Plummer

The best way to learn R

◮ The best way to learn R is to use it!
◮ This is a very short introduction before you sit down in front of a computer.
◮ R is a little different from other packages for statistical analysis.
◮ These differences make R very powerful, but for a new user they can sometimes be confusing.
◮ Our first job is to help you up the initial learning curve so that you can be comfortable with R.

Nothing is lost or hidden

◮ Statistical software provides "canned" procedures to address common statistical problems.
◮ Canned procedures are useful for routine analysis, but they are also limiting.
  ◮ You can only do what the programmer lets you do.
◮ In R, the results of statistical calculations are always accessible.
  ◮ You can use them for further calculations.
  ◮ You can always see how the calculations were done.

R Packages

◮ The capabilities of R can be extended using "packages".
◮ Packages are distributed over the Internet via CRAN (the Comprehensive R Archive Network) and can be downloaded directly from an R session.
◮ There is an R package developed during the annual course on "Statistical Practice in Epidemiology using R", called "Epi".
  ◮ It contains special functions for epidemiologists and some data sets.
◮ There are 5,825 other user contributed packages on CRAN.

Objects and functions

R allows you to build powerful procedures from simple building blocks. These building blocks are objects and functions.

◮ All data in R is represented by objects, for example:
  ◮ A dataset (called data frame in R)
  ◮ A vector of numbers
  ◮ The result of fitting a model to data
◮ You, the user, call functions
◮ Functions act on objects to create new objects:
  ◮ Using glm on a data frame (an object) produces a fitted model (another object).

Because all is functions ...

◮ You will always (almost) use parentheses:
  > res <- FUN( x, y )
◮ ... which is pronounced
  ◮ res gets ("<-") FUN of x,y ("(x,y)")

Vectors

One of the simplest objects in R is a sequence of numbers, called a vector.

You can create a vector in R with the collection (c) function:

  > c(1,3,2)
  [1] 1 3 2

You can save the results of any calculation using the left arrow:

  > x <- c(1,3,2)
  > x
  [1] 1 3 2

The workspace

◮ Every time you use <-, you create a new object in the workspace (or overwrite an old one).
◮ A list of objects in the workspace can be seen with the objects function (synonym: ls()):
  > objects()
   [1] "a"      "aa"     "acz2"   "alpha"  "b"
   [6] "bar"    "bb"     "bdendo" "beta"   "cc"
  [11] "Col"
◮ The Epi package has a function lls() that gives a bit more information on the objects.
◮ The workspace is held entirely in (volatile) computer memory and will be lost at the end of the session unless you explicitly save it.

Working Directory

Every R session has a current working directory, which is the location on the hard disk where files are saved, and the default location from which files are read into R.

◮ getwd() prints the current working directory.
◮ setwd("c:/Users/Martyn/Project") sets the current working directory.
◮ You may also use a Graphical User Interface (GUI) to change directory.
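The ideas on these slides (vectors, functions acting on objects to create new objects, the workspace and the working directory) can be tried out in a short session. The following is only a minimal sketch: the data frame `d`, its columns `dose` and `resp`, and all the numbers in it are made up for illustration and do not come from the course material.

```r
# A vector is an object; functions act on it and return new objects.
x <- c(1, 3, 2)      # collect three numbers into a vector
m <- mean(x)         # a single number, stored as a new object
s <- sort(x)         # a new vector; x itself is unchanged

# A data frame is an object too. These values are hypothetical.
d <- data.frame(dose = c(1, 2, 3, 4),
                resp = c(2.1, 3.9, 6.2, 7.8))

# glm() acts on the data frame and returns a fitted model (another object).
# With no family argument, glm() fits a gaussian (ordinary linear) model.
fit <- glm(resp ~ dose, data = d)
class(fit)           # the class of the fitted model object
coef(fit)            # results stay accessible for further calculations

# Every assignment above created an object in the workspace.
objs <- ls()         # same as objects()
print(objs)

# The working directory is where files are read from and saved to by default.
wd <- getwd()
print(wd)
```

Nothing here is saved to disk: all of these objects live in memory and disappear at the end of the session unless the workspace is explicitly saved.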

Ending an R session

◮ To end an R session, call the quit() function
  ◮ Every time you want to do something in R, you call a function.
◮ You will be asked "Save workspace image?"
    Yes     saves the workspace to the file ".RData" in your current working directory. It will be automatically loaded into R the next time you start an R session.
    No      does not save the workspace.
    Cancel  continues the current R session without saving anything.
◮ It is recommended you just say "No".

Always start with a clean workspace

Keeping objects in your workspace from one session to another can be dangerous:

◮ You forget how they were made.
◮ You cannot easily recreate them if your data changes.
◮ They may not even be from the same project.

It is almost always best to start with an empty workspace and use a script file to create the objects you need from scratch.

Rectangular Data

Rectangular data sets are common to most statistical packages.

  "id" "visit" "time" "status"
    1     1     0.0      0
    1     2     1.5      0
    2     1     0.0      0
    2     2     1.1      0
    2     3     2.3      1

Columns represent variables. Rows represent individual records.

The world is not a rectangle!

◮ Most statistical packages used by epidemiologists assume that all data can be represented as a rectangular data set.
◮ R allows a much richer set of data structures, represented by objects of different classes.
◮ Rectangular data sets are just one type of object that may be in your workspace. This class of object is called a data frame.

Data Frames

Each column of a data frame is a variable.

Variables may be of different types:

◮ vectors:
  ◮ numeric: c(1,2,3)
  ◮ character: c("John","Paul","George","Ringo")
  ◮ logical: c(FALSE,FALSE,TRUE)
◮ factors: factor(c("low","medium","high","low","low"))

Building your own data frame

Data frames can be constructed from a list of vectors:

  > mydata <- data.frame(x=c(3,6,7),f=c("a","b","a"))
  > mydata
    x f
  1 3 a
  2 6 b
  3 7 a

In R versions before 4.0.0, character vectors are automatically converted to factors. From R 4.0.0 onwards they stay character unless you ask for the conversion with stringsAsFactors=TRUE.

Inspecting data frames

Most data frames are too large to inspect by printing them to the screen, so use:

◮ names returns a vector of variable names.
  ◮ You can use sort(names(x)) to get them in alphabetical order.
◮ head prints the first few lines, and tail ...
◮ str prints a brief overview of the structure of the data frame. Can be used on any object.
◮ summary prints a more comprehensive summary
  ◮ Quantiles for numeric variables
  ◮ Tables for factors

Extracting values from a data frame

Use square brackets to take subsets of a data frame:

◮ mydata[1,2]. The value in row 1, column 2.
◮ mydata[1,]. The whole of the first row.
◮ mydata[,2]. The whole of the second column.

You can also extract a column from a data frame by name:

◮ mydata$age. The column, or variable, named "age"
◮ mydata[,"age"]. The same.

Importing data

◮ R has good facilities for importing data from other applications:
  ◮ read.dta for reading Stata datasets.
  ◮ read.spss for reading SPSS datasets.
  ◮ read.xport and read.ssd for reading SAS datasets.

Reading Text Files

The function read.table reads data from a text file and returns a data frame.

◮ mydata <- read.table("myfile")
◮ myfile could be
  ◮ A file in the current working directory: fem.dat
  ◮ A path to a file: c:/rex/fem.dat
  ◮ A URL: http://BendixCarstensen.com/AdvCoh/Scot-2014/data/bogus.txt
◮ Note: myfile must be enclosed in quotes.

write.table does the opposite.

R uses a forward slash / for file paths. If you want to use backslash, you have to double it: c:\\rex\\fem.dat
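Building, inspecting, subsetting and reading data frames can be tried together in one small session. This is a sketch under stated assumptions: the columns `age` and `sex` and their values are invented for illustration, and a temporary file stands in for a real data file such as fem.dat. `stringsAsFactors = TRUE` is given explicitly so that the result does not depend on the R version.

```r
# Build a small data frame from vectors (hypothetical values).
mydata <- data.frame(age = c(34, 51, 47),
                     sex = c("F", "M", "F"),
                     stringsAsFactors = TRUE)  # character column becomes a factor

# Inspect it without printing everything.
names(mydata)      # variable names
head(mydata, 2)    # first two rows
str(mydata)        # brief overview of the structure
summary(mydata)    # quantiles for numeric variables, tables for factors

# Extract values with square brackets or by name.
mydata[1, 2]       # value in row 1, column 2
mydata[1, ]        # the whole first row
mydata[, 2]        # the whole second column
mydata$age         # the column named "age"
mydata[, "age"]    # the same column

# Round trip through a text file: write.table, then read.table.
tmp <- tempfile(fileext = ".txt")   # temporary file, stand-in for a real path
write.table(mydata, file = tmp, row.names = FALSE)
back <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)
unlink(tmp)                         # remove the temporary file again
str(back)
```

Because the file was written without row names, the header line has as many fields as the data lines, so `header = TRUE` has to be given when reading it back.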
