An Introduction to Statistical Computing in R K2I Data Science Boot Camp - Day 1 AM Session May 15, 2017 Statistical Computing in R May 15, 2017 1 / 55
AM Session Outline
- Intro to R Basics
- Plotting in R
- Data Manipulation
R Basics
Here we will give a quick overview of the R language and the RStudio IDE. Our emphasis will be on exploring the most-used features of R, especially those used in later courses. This won’t cover all the details, but it will cover the most important parts.
Working with RStudio
Before beginning with R, let’s orient ourselves with RStudio.
Our initial view of RStudio is:
[Screenshot: initial RStudio window]
Go to: File -> New File -> R Script. This gives:
[Screenshot: RStudio after opening a new R script]
Try It Out
Type the following into the console:
    ?lm
    ??linear
    plot(1:20, 1:20)
There are several useful shortcut keys in RStudio. A few popular ones:
- Ctrl+Enter - when pressed in the editor, sends the current line to the console
- Ctrl+1, Ctrl+2 - switch between editor and console
- Ctrl+Shift+Enter - run the entire script in the console
- Tab completion - this is perhaps the most used feature
For vim / emacs users, Tools -> Global Options -> Code -> Keybindings will give you your preferred bindings.
It’s important to know our working directory. Given a file name, R will assume it is located in your current working directory. R will also save output to the working directory by default. It is important to set your working directory to the correct location or specify full path names.
Try out the following in the console window:
    getwd()
    list.files()
To change your working directory go to: Session -> Set Working Directory -> Choose Directory
Alternatively:
    setwd("/path/to/directory")
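To make the point about bare file names concrete, here is a minimal sketch that moves the working directory to a temporary folder, writes a file by name only, and then restores the original directory. The file name demo.txt is made up for illustration.

```r
# Remember where we started so we can go back afterwards
old.wd <- getwd()

# tempdir() is a per-session scratch directory that always exists
setwd(tempdir())

# A bare file name is resolved against the current working directory
writeLines("hello", "demo.txt")          # "demo.txt" is a hypothetical name
found <- "demo.txt" %in% list.files()    # the file shows up in the listing
full.path <- file.path(getwd(), "demo.txt")

# Clean up and restore the original working directory
file.remove("demo.txt")
setwd(old.wd)
```

Saving the old directory first and restoring it at the end keeps a script from silently changing where later code reads and writes files.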
Reading, Writing, Saving, and Loading
- Here we’ll look at bringing data into R and getting it out.
- We’ll also see how to save R objects and environments.
Reading In Data
- read.table
- read.csv
- read.fwf
Check out the options for each:
    ?read.table
Syntax
    ?read.table
    ?read.csv
    read.table("/path/to/your/file.ext", header=TRUE, sep=",",
               stringsAsFactors = FALSE)
Most Common Options
- sep tells how fields/variables are separated. Common values are:
    "," (comma)
    " " (single space)
    "\t" (tab escape character)
- stringsAsFactors tells whether to treat non-numeric values as factor/categorical variables.
- header tells whether the first line of the file has variable names.
- na.strings tells how missing values are encoded in the file.
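The four options above can be seen working together in a small sketch. The file contents, the column names, and the use of "?" as the missing-value marker are all invented for illustration; they are not one of the course files.

```r
# Write a tiny semicolon-separated file to a temporary location.
# Missing values are encoded as "?" in this made-up example.
path <- tempfile(fileext = ".txt")
writeLines(c("name;age;city",
             "Ann;34;Houston",
             "Bob;?;Austin",
             "Cy;51;?"),
           path)

people <- read.table(path,
                     header = TRUE,             # first line holds variable names
                     sep = ";",                 # fields are separated by semicolons
                     na.strings = "?",          # "?" marks a missing value
                     stringsAsFactors = FALSE)  # keep text columns as character

str(people)
```

Because "?" is declared as the missing-value code, the age column can still be read as a number rather than falling back to text.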
Standard Procedure
- Open the file in a text editor.
- Check items relevant to the options. Header? Separator type?
- For big files, Linux tools are helpful:
    head -n10 BigFile.txt > OpenMe
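If Linux tools are not at hand, a similar peek at the top of a file can be done from inside R. This is a sketch of an alternative to head, not part of the original procedure; it uses base R's readLines with its n argument, and the file it inspects is generated on the spot for illustration.

```r
# Build a 101-line example file (header plus 100 rows) in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("id,value",
             paste(1:100, round(sqrt(1:100), 2), sep = ",")),
           path)

# Read only the first 10 lines as raw text, without parsing them
first.lines <- readLines(path, n = 10)
print(first.lines)
```

Looking at the raw lines shows the header and the separator before any choice of read.table options is made.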
Try It Out
Let’s read the ReadMeInX.txt files into R. Try it on your own before looking at the answers on the next slides.
Example workflow:
1. Set your working directory to the directory containing the files.
2. Examine the files in a text editor to check for common options (header, separator, etc.).
    # read.table's default separator is OK for this one
    set0 <- read.table("ReadMeIn0.txt", header=TRUE)
    # specify a new separator
    set1 <- read.table("ReadMeIn1.txt", header=TRUE, sep=',')
    # or use read.csv
    set1 <- read.csv("ReadMeIn1.txt", header=TRUE)
    # another change of separator
    set2 <- read.table("ReadMeIn2.txt", header=TRUE, sep=';')
    # check for missing values
    set3 <- read.table("ReadMeIn3.txt", header=FALSE, sep=',', na.strings = '')
Writing Data
- write.table
- write.csv
Syntax and Common Options
    ?write.csv
    write.csv(myRObject, file="/path/to/save/spot/file.csv", row.names=FALSE)
- Options are largely the same as their read counterparts.
- row.names = FALSE is helpful to avoid having 1,2,3,... as a variable/column.
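The effect of row.names is easiest to see by writing the same data twice and reading both files back. The sketch below uses a small invented data frame (scores) and temporary files; none of these names come from the course materials.

```r
# A small made-up data set
scores <- data.frame(student = c("a", "b", "c"),
                     score   = c(90.5, 72, 88),
                     stringsAsFactors = FALSE)

with.names    <- tempfile(fileext = ".csv")
without.names <- tempfile(fileext = ".csv")

write.csv(scores, file = with.names)                        # default: row names written
write.csv(scores, file = without.names, row.names = FALSE)  # row names suppressed

back.with    <- read.csv(with.names, stringsAsFactors = FALSE)
back.without <- read.csv(without.names, stringsAsFactors = FALSE)

names(back.with)     # an extra leading column, named "X", holds 1,2,3
names(back.without)  # only the original two columns
```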
Try It Out
Write out one of the files you imported. Try varying options like sep and quote.
Saving Objects
saveRDS / readRDS are used to save (a compressed version of) individual R objects.
    # save our data set
    saveRDS(set1, file="TstObj.rds")
    # get it back
    newtst <- readRDS("TstObj.rds")
    # can save any R object. Try a vector
    my.vector <- c(1,8,-100)
    saveRDS(my.vector, file="JustAVector.rds")
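One reason to prefer saveRDS over a text file for intermediate results is that the object comes back exactly as it was, including types that a CSV would flatten to text. A minimal sketch, using an invented list (obj) and a temporary file:

```r
# A made-up object mixing a Date, an integer vector, and a factor
obj <- list(when   = as.Date("2017-05-15"),
            counts = 1:3,
            label  = factor(c("lo", "hi", "lo")))

path <- tempfile(fileext = ".rds")
saveRDS(obj, file = path)
restored <- readRDS(path)

# TRUE when classes, types, and values all survived the round trip
identical(obj, restored)
```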
Saving Environment
- We can save all variables in the current R workspace with save.image.
- We can load a saved workspace with load.
- R will ask you to save your work when you exit.
    # Save all our work
    save.image("AllMyWork.RData")
    # Reload it
    load("AllMyWork.RData")
    # name given to the default save
    load(".RData")
The Basics of R
- Let’s do a whirlwind tour of R: its syntax and data structures.
- This won’t cover all the details, but it will cover the most important parts.
Basic R Data Types
    # numeric types: integer, double
    348
    # character
    "my string"
    # logical
    TRUE
    FALSE
    # arithmetic as you'd expect
    43 + 1 * 2^4
    # so too logical operators/comparison
    TRUE | FALSE
    1 + 7 != 7
    # Other logical operators:
    # &, |, !
    # <, >, <=, >=, ==, !=
Data Types Cont.
    # variable assignment is done with the <- operator
    my.number <- 483
    # the '.' above does nothing. we could have done:
    # mynumber <- 483
    # instead
    # it's an R-ism to use .'s in variable names.
    # typeof() tells us the type
    typeof(my.number)
    ## [1] "double"
    # we can convert between types
    my.int <- as.integer(my.number)
    typeof(my.int)
    ## [1] "integer"
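A few more conversions are worth trying alongside the ones above. The sketch below is illustrative only; the variable names are invented, and the L suffix for integer literals is standard R syntax that the slides do not cover.

```r
# A plain number is a double; the L suffix asks for an integer
plain.type  <- typeof(5)     # "double"
suffix.type <- typeof(5L)    # "integer"

# as.integer() drops the fractional part (truncates toward zero)
pos <- as.integer(3.9)       # 3
neg <- as.integer(-3.9)      # -3

# Strings holding numbers can be converted; anything else becomes NA
from.text <- as.numeric("12.5")                    # 12.5
bad.text  <- suppressWarnings(as.integer("abc"))   # NA (R would also warn)

# Numbers can be turned into character strings
as.text <- as.character(483)                       # "483"
```

Note that as.integer does not round, so 3.9 becomes 3 rather than 4.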
R Data Structures - Vectors
    # the vector is the most important data structure
    # create it with c()
    my.vec <- c(1,2,67,-98)
    # get some properties
    str(my.vec)
    ## num [1:4] 1 2 67 -98
    length(my.vec)
    ## [1] 4
    # access elements with []
    my.vec[3]
    ## [1] 67
    my.vec[c(3,4)]
    ## [1] 67 -98
    # can do assignment too
    my.vec[5] <- 41.2
Vectors - Cont.
    # other ways to create vectors
    x <- 1:6
    y <- seq(7,12,by=1)
    # Operations get recycled through the whole vector
    x + 1
    ## [1] 2 3 4 5 6 7
    x > 3
    ## [1] FALSE FALSE FALSE TRUE TRUE TRUE
    # Can do component-wise operations between vectors
    x * y
    ## [1] 7 16 27 40 55 72
    x / y
    ## [1] 0.1428571 0.2500000 0.3333333 0.4000000 0.4545455 0.5000000
    y %/% x
    ## [1] 7 4 3 2 2 2
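The %/% operator in the last line is integer division. Its companion, %% (remainder), is not shown on the slide but is a standard R operator; the sketch below pairs the two on the same x and y to show how they fit together.

```r
x <- 1:6
y <- seq(7, 12, by = 1)

quotient  <- y %/% x   # whole number of times x goes into y: 7 4 3 2 2 2
remainder <- y %% x    # what is left over:                  0 0 0 2 1 0

# Element by element, quotient * x + remainder rebuilds y
rebuilt <- quotient * x + remainder
all(rebuilt == y)
```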
Try It Out
    # Try to guess what the following lines will do
    # Will it run at all? If so, what will it give?
    # Think about it and run to confirm
    7 -> w
    w <- z <- 44
    1 + TRUE
    0 | 15 & 3
    my.vec[2:4]
    my.vec[-2]
    my.vec[c(TRUE,FALSE,FALSE,TRUE,FALSE)]
    my.vec[ sum( c(TRUE,FALSE,FALSE,TRUE,TRUE) ) ] <- TRUE
    my.vec[3] <- "I'm a string"
    as.numeric(my.vec)
    x[x>3]
    x + c(1,2)
Matrices
    # matrices are 2d vectors.
    # create using matrix()
    my.matrix <- matrix(rnorm(20),nrow=4,ncol=5)
    # rnorm() draws 20 random samples from a N(0,1) distribution
    my.matrix
    ##            [,1]        [,2]       [,3]      [,4]       [,5]
    ## [1,]  0.5351131  1.08710882  0.5670939 0.2800755 -0.8050743
    ## [2,] -1.9263838  0.86267009  0.7318280 0.4177110 -0.9576529
    ## [3,] -1.2931770 -1.03381286 -0.9035750 1.9787516  0.3747967
    ## [4,] -2.6190953 -0.04829205  1.3157181 1.2562005  0.1131199
    # note: matrices are filled by column
    # Get details
    dim(my.matrix)
    ## [1] 4 5
    nrow(my.matrix)
    ## [1] 4
    ncol(my.matrix)
    ## [1] 5
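Filling by column is hard to see with random numbers, so here is a small sketch using 1:6 instead. The byrow argument is a standard option of matrix() that the slide does not mention; the variable names are invented.

```r
# Default: values go down the first column, then the second, and so on
m.col <- matrix(1:6, nrow = 2, ncol = 3)
m.col[1, ]   # first row is 1 3 5

# byrow = TRUE fills across the first row, then the second
m.row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m.row[1, ]   # first row is 1 2 3
```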
Matrices - Cont.
    # Indexing is similar to vectors but with 2 dimensions
    # get second row
    my.matrix[2,]
    ## [1] -1.9263838 0.8626701 0.7318280 0.4177110 -0.9576529
    # get first and fourth columns of row three
    my.matrix[3,c(1,4)]
    ## [1] -1.293177 1.978752
    # transposing is done with t()
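Since t() is only named above, here is a short sketch of it together with two indexing details. It uses a small invented matrix m rather than my.matrix so the values are predictable; drop = FALSE is a standard indexing argument not covered on the slide.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# t() swaps rows and columns: a 2 x 3 matrix becomes 3 x 2
tm <- t(m)
dim(tm)                              # 3 2

# ncol() gives the index of the last column without hard-coding it
first.last <- m[1, c(1, ncol(m))]    # 1 5

# Selecting one row returns a plain vector by default;
# drop = FALSE keeps the result as a 1 x 3 matrix
row.vec <- m[2, ]
row.mat <- m[2, , drop = FALSE]
```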