Most R novices will start with the introductory session in An - PowerPoint PPT Presentation


  1. 1 An Introduction to R: Preface. Suggestions to the reader. Most R novices will start with the introductory session in An Introduction to R, Appendix A: A sample session, on page 79 of the document. Whether you are a novice or an expert R programmer, it is advisable to follow this tutorial. If the commands are familiar, simply move through that part of the tutorial quickly. It is likely that any R user will learn at least a few things from this tutorial.

  2. 2 1.4 R and the window system; 1.5 Using R interactively. Given the popularity of the Windows operating system (OS) in general and its use on the desktops in the laboratory for STAT 695V, the interactive workflow that is common in the UNIX environment should also be understood for the R GUI on the Windows platform. There is an h: drive on your ITaP account. Access it. To make a subdirectory (folder) called "work" in it, right-click in an open space, scroll down to "New", select "Folder", and name it "work". Once this is done, enter the following commands in R:

     > getwd()
     [1] "h:/..."
     > setwd("h:/work")
     > getwd()
     [1] "h:/work"

     You will need to do this EVERY time you open R unless you put setwd("h:/work") in the .First function (p. 48) of your ".Rprofile" file, which is run when R starts. However, this practice is not recommended, even for your most commonly used directory. Instead, simply start every R session in this way.

  3. 3 1.5 Using R interactively... Purpose. For a task of any size, files from an R session will almost inevitably need to be created, so first determine the best location for the files of the task at hand. Once you have determined this, create the necessary directory and set the working directory (WD) to it immediately after starting your R session. Do this now for this class.
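
The create-then-set routine described above can also be done entirely from within R. The following is a minimal sketch, assuming a throwaway folder under R's temporary directory as a stand-in for "h:/work"; the names old_wd, work_dir, and wd_name are hypothetical.

```r
# Hypothetical project folder under R's per-session temporary directory,
# standing in for "h:/work" from the slides.
old_wd <- getwd()
work_dir <- file.path(tempdir(), "work")

# Create the folder if it does not exist yet, then make it the working directory.
if (!dir.exists(work_dir)) dir.create(work_dir)
setwd(work_dir)

# Confirm that the working directory is now the "work" folder.
wd_name <- basename(getwd())
print(wd_name)

# Restore the original working directory so the sketch has no lasting effect.
setwd(old_wd)
```

On a real account, work_dir would simply be replaced by the project path, and the final setwd(old_wd) would be dropped.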

  4. 4 1.10 Executing commands from or diverting output to a file. A test file called "lecture1.R" has been created in the R data files subdirectory to show how section 1.10 works. The file loads the Permanent Seat Licenses (PSLs) data frame (DF), which will be a running example for these slides. Save it to your WD for this class, then input the following commands:

     > source("lecture1.R")
     > ls()
     [1] "psl.rawsold.df"
     > head(psl.rawsold.df)
     day month year num   area sec row total perseat price
      28     8 2007   4 265sc3 304  10 20000    5000   265
     > attach(psl.rawsold.df) # To be run with the data in use.
     > save(perseat, file = "psl.rawsold.response.RData")
     > q()

     Reopen R and input these commands:

     > load("psl.rawsold.response.RData")
     > ls()
     [1] "perseat"
     > head(perseat)
     [1] 5000 4950 11000 5500 16500 6125
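
Because "lecture1.R" and the PSL files are only available in the class directory, the same source/save/load round trip can be tried with stand-in files. This is a sketch under stated assumptions: the data frame toy.df, its values, and the temporary file names are invented for illustration and are not the PSL data.

```r
# Hypothetical stand-in data; the values are invented for illustration.
toy.df <- data.frame(area = c("a1", "a1", "b2"),
                     perseat = c(5000, 4950, 11000))

# Write a one-line script to a temporary file and run it with source().
# local = TRUE evaluates the script in the calling environment.
script_file <- tempfile(fileext = ".R")
writeLines("toy.total <- sum(toy.df$perseat)", script_file)
source(script_file, local = TRUE)

# Save one object to an .RData file, remove it, and load it back.
perseat.toy <- toy.df$perseat
data_file <- tempfile(fileext = ".RData")
save(perseat.toy, file = data_file)
rm(perseat.toy)
loaded_names <- load(data_file)  # load() returns the names it restored
print(loaded_names)
```

Capturing the return value of load() is a convenient way to see which objects a file contained without calling ls().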

  5. 5 2 Simple manipulations; numbers and vectors: 2.1 Vectors and assignment. R is built around vectors, so single numbers are merely vectors of length 1. There are multiple equivalent ways of assigning a number to an object:

     x <- 1
     x <- c(1)
     c(1) -> x
     1 -> x
     assign("x", 1)
     assign("x", c(1))

     The same holds for vectors of length n > 1, except that the function c() (concatenate) must be used:

     x <- c(1, 2)
     c(1, 2) -> x
     assign("x", c(1, 2))

     In most cases the "=" operator can be used in place of "<-", but it is better to use the arrow convention to avoid the cases where "=" does not work.
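
The equivalence of these assignment forms can be checked directly. The sketch below uses hypothetical object names (x1 to x4, y, m) and also shows one situation where "=" behaves differently from "<-": inside a function call, "=" names an argument rather than assigning.

```r
# Four ways of creating the same vector.
x1 <- c(1, 2)
c(1, 2) -> x2
assign("x3", c(1, 2))
x4 = c(1, 2)

all_same <- identical(x1, x2) && identical(x1, x3) && identical(x1, x4)

# A single number is a vector of length 1.
single_is_vector <- identical(1, c(1)) && length(1) == 1

# Inside a call, "<-" assigns y and passes its value on;
# writing mean(x = c(1, 2, 3)) would only name the argument x.
m <- mean(y <- c(1, 2, 3))

print(c(all_same = all_same, single_is_vector = single_is_vector))
```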

  6. 6 2.2 Vector arithmetic. The elementary arithmetic operators +, -, *, /, and ^ operate on vectors in R, as do other common functions: log, exp, sin, cos, tan, sqrt, max, min, range, sum, prod, length, floor, ceiling, round, sort, mean, median, fivenum, summary, and var, where

     mean(x) == sum(x)/length(x)
     var(x) == sum((x - mean(x))^2)/(length(x) - 1) == s^2

     A few examples of these operators are in use below:

     > load("psl.rawsold.response.RData")
     > head(log(perseat, base = 2))
     [1] 12.28771 12.27321 13.42522 12.42522 14.01018 12.5
     > sum(perseat)
     [1] 7805644
     > length(perseat)
     [1] 988
     > summary(perseat)
        Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         300    4250    6250    7900   10000   46250
     > sd(perseat)
     [1] 6239.572
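
The identities for mean() and var() can be verified numerically. This sketch uses only the six perseat values printed by head(perseat) earlier in the slides, so its results will differ from those of the full 988-value vector; the names mean_by_hand, var_by_hand, and checks are hypothetical.

```r
# The six values shown by head(perseat) in the slides.
x <- c(5000, 4950, 11000, 5500, 16500, 6125)

# Built-in functions versus the element-wise definitions.
mean_by_hand <- sum(x) / length(x)
var_by_hand <- sum((x - mean(x))^2) / (length(x) - 1)

# all.equal() compares with a numerical tolerance instead of exact equality.
checks <- c(
  mean = isTRUE(all.equal(mean(x), mean_by_hand)),
  var  = isTRUE(all.equal(var(x), var_by_hand)),
  sd   = isTRUE(all.equal(sd(x), sqrt(var(x))))
)
print(checks)
```

The comparison uses all.equal() because floating-point results computed along two different routes need not agree to the last bit.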

  7. 7 2.3 Generating regular sequences. seq, rep, and ":" are vector operators more specific than c() that reduce the time it takes to program vectors of regular sequences. For a < b and d > 0:

     a:b == c(a, a + 1, ..., a + i) where b - 1 < a + i <= b
     seq(a, b, by = 1) == a:b
     rep(a, b) == rep(a, floor(b)) == c(a, a, ..., a) (floor(b) copies of a)
     rep(c(a, b), each = b) == c(rep(a, b), rep(b, b))
     d*a:b == c(d*a, d*(a + 1), ..., d*(a + i)) where d*(b - 1) < d*(a + i) <= d*b
     seq(a, b, d) == c(a, a + d, ..., a + d*j) where b - d < a + d*j <= b
     seq(length = n, from = a, by = d) == c(a, a + d, ..., a + (n - 1)*d)

     For a > b the sequences run in descending order instead. An example of these operators is in use below:

     > 3*1.5:9.75
     [1] 4.5 7.5 10.5 13.5 16.5 19.5 22.5 25.5 28.5
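
These sequence identities can be checked for concrete values. The sketch below assumes the hypothetical values a = 2, b = 5, d = 3 and a small helper, same(), that compares by value, since a:b yields integers while seq() may return doubles. It spells the length argument out in full as length.out.

```r
# Hypothetical values chosen for illustration.
a <- 2
b <- 5
d <- 3

# Compare by value rather than by storage type.
same <- function(u, v) length(u) == length(v) && all(u == v)

checks <- c(
  colon_vs_seq = same(a:b, seq(a, b, by = 1)),
  rep_times    = same(rep(a, b), c(2, 2, 2, 2, 2)),
  rep_each     = same(rep(c(a, b), each = b), c(rep(a, b), rep(b, b))),
  scaled_colon = same(d * a:b, c(6, 9, 12, 15)),  # ":" binds tighter than "*"
  seq_by       = same(seq(a, b, d), c(2, 5)),
  seq_length   = same(seq(length.out = 4, from = a, by = d), c(2, 5, 8, 11)),
  descending   = same(b:a, rev(a:b))
)
print(checks)
```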

  8. 8 2.4 Logical vectors. The logical operators <, <=, >, >=, !, |, &, and == yield a logical value of TRUE, FALSE, or NA at each index of a vector. Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors with FALSE == 0 and TRUE == 1. A few examples of these operators are in use below:

     > load("psl.rawsold.response.RData")
     > bin <- perseat > 5500 & perseat < 16500
     > head(bin)
     [1] FALSE FALSE TRUE FALSE FALSE TRUE
     > bin01 <- as.numeric(bin)
     > head(bin01)
     [1] 0 0 1 0 0 1
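
Because TRUE counts as 1 and FALSE as 0, sums and means of logical vectors give counts and proportions. A minimal sketch, using only the six perseat values printed earlier in the slides (the names perseat6, n_in_range, share_in_range, and in_range are hypothetical):

```r
# The six values shown by head(perseat) in the slides.
perseat6 <- c(5000, 4950, 11000, 5500, 16500, 6125)

# Element-wise condition: strictly between 5500 and 16500.
bin <- perseat6 > 5500 & perseat6 < 16500

# Coercion to 0/1 turns sum() into a count and mean() into a proportion.
n_in_range <- sum(bin)
share_in_range <- mean(bin)

# A logical vector can also be used as an index to keep the TRUE positions.
in_range <- perseat6[bin]
print(in_range)
```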

  9. 9 2.5 Missing values. NA is used for "Not Available" (missing) values and NaN is used for "Not a Number". The function is.na() yields TRUE for both NAs and NaNs, while is.nan() yields TRUE only for NaNs. A few examples of these operations are in use below:

     > x <- c(1, NA, 4, 0/0)
     > x
     [1] 1 NA 4 NaN
     > is.na(x)
     [1] FALSE TRUE FALSE TRUE
     > is.nan(x)
     [1] FALSE FALSE FALSE TRUE
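
Missing values propagate through most computations, which is why is.na() rather than "== NA" is used to find them. The following sketch reuses the vector x from the slide; the other names are hypothetical.

```r
x <- c(1, NA, 4, 0/0)

# is.na() counts both NA and NaN; is.nan() counts only NaN.
n_missing <- sum(is.na(x))
n_nan <- sum(is.nan(x))

# Comparing with NA never yields TRUE or FALSE, only NA.
eq_all_na <- all(is.na(x == NA))

# Summaries are missing unless the missing values are dropped.
mean_is_missing <- is.na(mean(x))
mean_complete <- mean(x, na.rm = TRUE)

# Logical indexing keeps only the available values.
x_complete <- x[!is.na(x)]
print(x_complete)
```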

  10. 10 3.4 The class of an object All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example "numeric", "logical", "character", or "list", but "matrix", "array", "factor", and "data.frame" are other possible values.
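
A quick way to see these values is to call class() on a few small objects. This is an illustrative sketch with made-up objects. A matrix is checked with inherits() because class() can return more than one string for it in recent R versions.

```r
# class() on a handful of small, made-up objects.
classes <- c(
  number  = class(1),
  logical = class(TRUE),
  text    = class("a"),
  list    = class(list(1, "a")),
  factor  = class(factor(c("u", "v"))),
  df      = class(data.frame(x = 1))
)
print(classes)

# inherits() tests for one class regardless of how many class() reports.
is_matrix <- inherits(matrix(1:4, nrow = 2), "matrix")
```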

  11. 11 4 Ordered and unordered factors A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length. R provides both ordered and unordered factors. While the "real" application of factors is with model formulae, we here look at a specific example.

  12. 12 4.1 Permanent Seat Licenses (PSLs) example. The PSL data file is in the R data files subdirectory. Save it to your WD for this class, then input the following commands:

      > load("psl.rawsold.RData")
      > attach(psl.rawsold.df)
      > head(area)
      [1] 265sc3 335mc3 265sc2 265sc2 335mc2 110s3
      13 Levels: 90e2 108a1 110s3 120m3 125p3a2 ... 385pc23
      > class(area)
      [1] "factor"
      > levels(area)
      [1] "108a1" "110s3" "120m3" "125p3a2" "128s1" ...
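
Without the PSL file at hand, the behaviour of a factor can be sketched with the six area values printed by head(area) above. The object area.toy is a hypothetical stand-in that contains only those six values, so it has fewer levels than the real factor.

```r
# The six values printed by head(area) in the slides.
area.toy <- factor(c("265sc3", "335mc3", "265sc2", "265sc2", "335mc2", "110s3"))

# Levels are the distinct values, sorted as character strings by default.
toy_levels <- levels(area.toy)
print(toy_levels)

# table() counts the observations per level.
toy_counts <- table(area.toy)

# Internally each value is stored as an integer index into the levels.
toy_codes <- as.integer(area.toy)
print(toy_codes)
```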

  13. 13 4.2 The function tapply() and ragged arrays. tapply(numeric, factor, function) can be read as "table apply": it splits a numeric vector into groups according to a factor of the same length (typically from the same data frame) and applies a function to each group, returning the results in tabular form. This is an extraordinarily useful function for obtaining quick statistical summaries per factor of the data frame being analyzed. As examples, take the five-number summary plus mean, summary(perseat), and the sorted medians, sort(median(perseat)), each conditioned on (per) area:

      > load("psl.rawsold.RData")
      > attach(psl.rawsold.df)
      > tapply(perseat, area, summary)
      $`108a1`
         Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
         5667    7250    8075    8268    9250   13750

      $`110s3`
      ...
      > sort(tapply(perseat, area, median))
       265sc3  265sc2  335mc3    90e2   110s3 125p3a2 ...
      2250.00 4250.00 4975.00  5162.5 7000.00  7875.0 ...
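
The grouping behaviour of tapply() can be tried on a small stand-in for the PSL data. The sketch below pairs the six perseat values and six area values printed earlier in the slides. Treating them as rows of one toy data set is an assumption made for illustration, and the resulting medians are not those of the full data.

```r
# Toy stand-ins built from the values printed by head() in the slides.
perseat.toy <- c(5000, 4950, 11000, 5500, 16500, 6125)
area.toy <- factor(c("265sc3", "335mc3", "265sc2", "265sc2", "335mc2", "110s3"))

# Group sizes differ ("ragged"): one area has two observations, the rest one.
group_sizes <- tapply(perseat.toy, area.toy, length)

# Median per area, then sorted from cheapest to most expensive.
medians <- tapply(perseat.toy, area.toy, median)
sorted_medians <- sort(medians)
print(sorted_medians)
```

Passing summary instead of median returns a list with one summary per area, as in the slide.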

  14. 14 4.3 Ordered Factors. The levels of factors are stored in alphabetical order, or in the order they were specified to factor() if they were specified explicitly. Thus it is typically left to the analyst to alter the order of the levels manually if the factor has a natural order. This will aid in the comprehension of your analysis, particularly when plotting with lattice graphics. The levels of area in the PSL data frame (DF) are not in the preferred order. A more logical and useful ordering would be by price. The alphabetical ordering already in place is nearly there, so the necessary adjustment is minimal: the last level is moved to the front.

      > load("psl.rawsold.RData")
      > attach(psl.rawsold.df)
      > n <- length(levels(area))
      > area <- factor(area, levels(area)[c(n, 1:(n - 1))])
      > psl.rawsold.df$area <- area
      > save(psl.rawsold.df, file = "psl.rawsold.RData")
      > levels(area)
      [1] "90e2" "108a1" "110s3" "120m3" "125p3a2" ...
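
The level rotation used above can be checked on a small example. This sketch assumes a hypothetical factor, area.toy, built from three of the level names shown in the slides. It also shows that ordered = TRUE is what makes comparisons between levels meaningful, which reordering the levels alone does not do.

```r
# Hypothetical factor using three of the level names from the slides.
area.toy <- factor(c("90e2", "108a1", "110s3", "90e2"))
default_levels <- levels(area.toy)  # sorted as strings, so "90e2" comes last

# Move the last level to the front, as in the slide.
n <- length(levels(area.toy))
area.moved <- factor(area.toy, levels = levels(area.toy)[c(n, 1:(n - 1))])
moved_levels <- levels(area.moved)
print(moved_levels)

# The data values themselves are unchanged by the reordering.
values_kept <- identical(as.character(area.toy), as.character(area.moved))

# An ordered factor additionally supports comparisons such as "<".
area.ord <- factor(area.toy, levels = moved_levels, ordered = TRUE)
first_below_second <- area.ord[1] < area.ord[2]
```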
