BootcampR AN INTRODUCTION TO R Jason A. Heppler, PhD University of Nebraska at Omaha February 18, 2020 @jaheppler
Welcome!
Hi. I'm Jason. I like to gesture at screens. • Digital Engagement Librarian, University of Nebraska at Omaha • Mentor, Mozilla Open Leaders • Researcher, Humanities+Design, Stanford University
Today's plan • Basics of R • How the language works • Symbols and grammar • Math and statistics • R data types • Interactive worksheet! Open up RStudio. We'll start doing a few things together soon.
Some vocabulary • Packages are add-on features for R that can include data, new functions and methods, and extended capabilities. • Scripts are where you store commands to be run by R. • Functions are commands that do something to an object in R. • A data frame is the main element for statistical purposes, an object with rows and columns. • The workspace is the working memory of R where all objects are stored. • A vector is the basic unit of data in R.
A note on maintaining R • Adding packages to R also means keeping them up-to-date. Use update.packages() in the console or the package update interface in RStudio. • Package updates are at the whim of the package developer. There may not be a regular release cycle.
Help! • R is good at helping you through self-guidance. • Try typing ?summary in the console to open the help page for a function. • Now try typing ??regression to search all installed help pages for a topic. • Getting odd warnings or errors? Jump over to Google or Stack Overflow. A number of R Core members hang out there.
(Some more) Help! • The most important thing you can do when asking for help is to have a reproducible example available (a small simulated dataset plus the code that replicates the problem). For example:
    foo <- c(1, "b", 5, 7, 0)
    bar <- c(1, 2, 3, 4, 5)
    foo + bar
    Error in foo + bar : non-numeric argument to binary operator
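A short sketch of why the example above fails, and one possible way around it. The names foo_num and total are made up for this illustration, and converting with as.numeric() is only one option, not necessarily the right fix for your data.

```r
foo <- c(1, "b", 5, 7, 0)
bar <- c(1, 2, 3, 4, 5)

# The single string "b" turned every element of foo into text
class(foo)

# Convert back to numbers; "b" cannot be converted, so it becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
foo_num <- suppressWarnings(as.numeric(foo))

# Addition now works, with NA wherever the input was not a number
total <- foo_num + bar
total
```

Sharing a small example like this, along with the exact error text, makes it much easier for someone else to see the problem.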
The Data Frame • Open up RStudio and type in:
    data(mtcars)
    mtcars
                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    [ reached 'max' / getOption("max.print") -- omitted 23 rows ]
The Data Frame • The data frame operates a lot like a spreadsheet, and it's the central feature for doing data analysis in R. • Data frames are the primary method for storing and manipulating data. • Unlike a spreadsheet, everything we do to the data frame will either be by the entire row or entire column.
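To make the "entire row or entire column" idea concrete, here is a minimal sketch using mtcars. The variable names (mpg_column, first_row, efficient) and the cutoff of 25 miles per gallon are arbitrary choices for this example.

```r
data(mtcars)

# A whole column, pulled out as a vector
mpg_column <- mtcars$mpg

# A whole row, returned as a one-row data frame
first_row <- mtcars[1, ]

# All rows that meet a condition, keeping every column
efficient <- mtcars[mtcars$mpg > 25, ]

# Rows and columns of the full data frame
dim(mtcars)
```

The pattern inside the square brackets is always rows first, then columns, with a blank meaning "all of them".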
R as a calculator
    2 + 2
    [1] 4
    2 * pi # multiply by a constant
    [1] 6.283
    7 + runif(1, min = 0, max = 1) # add a random variable
    [1] 7.375
    4^4 # powers
    [1] 256
    sqrt(4^4) # functions
    [1] 16
R as a calculator
    data(trees)
    median(trees$Girth)
    var(trees$Girth)      # variance
    sd(trees$Girth)       # standard deviation
    max(trees$Girth)      # max value
    min(trees$Girth)      # min value
    range(trees$Girth)    # range (min and max)
    quantile(trees$Girth) # quantiles (0%, 25%, 50%, 75%, 100%)
    fivenum(trees$Girth)  # box plot elements
    length(trees$Girth)   # number of observations for a variable
    length(trees)         # number of variables (columns) in a dataset
    nrow(trees)           # number of rows in a data frame
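One point worth checking by hand: on a data frame, length() counts columns, not rows. A quick sketch with the trees dataset (the variable names here are only for illustration):

```r
data(trees)

n_columns_by_length <- length(trees)   # counts columns (variables)
n_rows <- nrow(trees)                  # counts rows (observations)
n_columns <- ncol(trees)               # counts columns explicitly
n_girth <- length(trees$Girth)         # a single column is a vector, so this counts observations

c(n_columns_by_length, n_rows, n_columns, n_girth)
```

When you want the number of observations in a dataset, nrow() is usually the safer choice.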
Arithmetic Operators R supports the usual arithmetic operators + - / * and ^ , plus integer division %/% and remainder (modulo) %% . The == operator tests equality rather than doing arithmetic. Try the following in the console:
    2 + 2
    2/2
    2 * 2
    2^2
    2 == 2
    23%/%2
    23%%2
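Integer division and remainder fit together: the quotient times the divisor, plus the remainder, gives back the original number. A small sketch (the variable names are hypothetical):

```r
quotient <- 23 %/% 2    # how many whole times 2 goes into 23
remainder <- 23 %% 2    # what is left over

# Rebuild the original number from the two pieces
rebuilt <- 2 * quotient + remainder

# With a negative number, R rounds the quotient down (toward minus infinity),
# so the remainder takes the sign of the divisor
neg_quotient <- (-7) %/% 2
neg_remainder <- (-7) %% 2

c(quotient, remainder, rebuilt, neg_quotient, neg_remainder)
```

A common use of %% is testing whether a number is even: x %% 2 == 0.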
Symbols <- is the assignment operator. RStudio keyboard shortcut exists for macOS ( Option + - ) and Windows and Linux ( Alt + - ). This is different from most programming languages, which often use a single = for assignment. Try entering into your console: foo <- 3 foo
Symbols : is the sequence operator. We can create ranges this way. Try the following in the console: 1:10 You can also store these ranges in a variable. Try: a <- 100:120 a
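The : operator always steps by one. For other patterns, base R's seq() and seq_len() are worth knowing. A brief sketch, with variable names chosen only for this example:

```r
a <- 100:120
n_values <- length(a)         # both endpoints are included

countdown <- 10:1             # ranges can run backwards

odds <- seq(1, 10, by = 2)    # step by 2 instead of 1

first_five <- seq_len(5)      # 1 through 5
```

seq() also accepts a length.out argument if you would rather say how many values you want than how big the step is.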
Symbols # is for writing comments. Anything after the # is ignored by R and not evaluated.
    # Something I want to keep from R, but mostly
    # notes for myself or someone else so they
    # understand what's happening with the following
    # code. Below, we just add two numbers.
    2 + 2
Advanced Math We can do plenty of advanced math in R. For example, we can generate distributions of data very easily. Try this: rnorm(100) Neat, huh? Now try this: hist(rnorm(10000)) On advanced R: • Hadley Wickham, Advanced R <https://adv-r.hadley.nz> • Hadley Wickham, R for Data Science <https://r4ds.had.co.nz>
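Because rnorm() draws random numbers, you will get different values every time. If you need the same values again (for a reproducible example, say), set a seed first. In this sketch the seed 42 is arbitrary and the variable names are made up:

```r
set.seed(42)               # any integer works; 42 is an arbitrary choice
first_draw <- rnorm(5)

set.seed(42)               # same seed, so the same "random" numbers
second_draw <- rnorm(5)

same_values <- identical(first_draw, second_draw)

# A large sample from the default normal distribution should have
# a mean near 0 and a standard deviation near 1
big_sample <- rnorm(10000)
sample_mean <- mean(big_sample)
sample_sd <- sd(big_sample)
```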
The Workspace R lets us store vectors, datasets, and functions in memory. All R objects are stored in the memory of the computer, and R makes it easy to organize the workspace. Try the following in the console:
    x <- 5   # store the variable
    x        # print the variable
    z <- 3
    ls()     # list all variables
    ls.str() # list and describe variables
    rm(x)    # delete a variable
    ls()
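If you want to check for one particular object rather than list everything, exists() can help. A small sketch; had_x_before and has_x_after are names invented for this example, and inherits = FALSE limits the search to the current workspace:

```r
x <- 5
z <- 3

# Is there an object called "x" in the workspace?
had_x_before <- exists("x", inherits = FALSE)

rm(x)   # remove it

# Now it is gone, but z is untouched
has_x_after <- exists("x", inherits = FALSE)
still_has_z <- exists("z", inherits = FALSE)
```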
R as a language R, like any programming language, has a set of rules to follow. You'll learn more as you go, but let's cover a few quick ones. 1. Case sensitivity matters. A and a are not the same.
    a <- 3
    A <- 4
    print(c(a, A)) # what happens if you type print(a,A)?
    # Are they the same?
    a == A
R as a language 2. c() stands for concatenate , and allows vectors to have multiple elements. If you need two elements in a vector, you need to wrap it up in c . c() can put together any vectors, but typically you want to keep the objects of the vector all of the same type (e.g., don't mix strings and numbers). G <- c(3,4) print(G)
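c() also accepts existing vectors, so you can build longer vectors out of shorter ones. A quick sketch (H and both are names made up for this example):

```r
G <- c(3, 4)

# Add more elements to an existing vector
H <- c(G, 5, 6)

# Join two vectors end to end
both <- c(G, H)

n_elements <- length(both)   # how many elements in total
second <- both[2]            # square brackets pick out an element by position
```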
R as a language 3. R is maddeningly inconsistent in naming conventions. Some functions are camelCase , others are.dot.separated , others use_underscores . RStudio autocomplete can help. 4. R has multiple packages and functions that do the same thing, sometimes even sharing function names. Sometimes you'll need to tell R explicitly which package you're referring to. This is done with two colons :: (e.g., dplyr::filter() )
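The :: syntax works with any installed package. This sketch sticks to the stats and utils packages that ship with R, so it does not assume dplyr is installed; the values vector is invented for the example.

```r
values <- c(4, 8, 15, 16, 23, 42)

# Spell out exactly which package each function comes from
middle <- stats::median(values)
first_three <- utils::head(values, 3)

# For functions that are not masked, the short form gives the same answer
same_answer <- identical(stats::median(values), median(values))
```

The explicit form matters most when two loaded packages both define a function with the same name.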
R as a language: Objects Everything in R is an object, even functions. We can manipulate objects in a variety of ways. For example, we can apply the summary function to a variety of object types. Let's try this. # summary of columns 1, 2, and 3 summary(mtcars[, 1:3]) # summary of a single column summary(mtcars$mpg)
R as a language: Objects Since everything is an object in R, we can do all sorts of operations against them. length(unique(mtcars$mpg)) We can also store the results of function calls. unique_mpg <- length(unique(mtcars$mpg)) unique_mpg
R as a language: Operators We can use comparison operators to compare values across vectors . ( < > <= >= == != ) big <- c(9,12,15,25) small <- c(9, 3, 4, 2) big > small big == small # don't do big = small!
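Comparisons run element by element and return a logical vector, which you can then count or summarize. A sketch building on the example above (the result names are hypothetical):

```r
big <- c(9, 12, 15, 25)
small <- c(9, 3, 4, 2)

bigger <- big > small          # one TRUE/FALSE per pair of elements
equal <- big == small

how_many <- sum(bigger)        # TRUE counts as 1, FALSE as 0
positions <- which(bigger)     # positions where the comparison is TRUE
any_equal <- any(equal)        # is at least one pair equal?
all_bigger <- all(bigger)      # is every pair bigger?
```

A single = would assign small to big instead of comparing them, which is why the double == matters.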
R Data Modes R works with several data types. Three basic ones are numeric, character, and logical. Vectors must be of one consistent type of data. If you make a vector that mixes types, R converts every element to the most flexible type present, which is character whenever any element is a string.
    is.numeric(A)
    is.character(A)
    is.logical(A)
    # If you don't know what the data type is,
    # just ask!
    class(A)
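A few mixed vectors show what R does when types disagree. The variable names below are made up for this sketch.

```r
# A number and a string: everything becomes character
num_and_text <- c(1, "a")
class(num_and_text)

# A number and a logical: TRUE becomes 1, and the vector stays numeric
num_and_logical <- c(1, TRUE)
class(num_and_logical)

# A logical and a string: everything becomes character
logical_and_text <- c(TRUE, "a")
class(logical_and_text)
```

The rule of thumb is logical, then numeric, then character: R picks whichever type in the mix comes last in that order.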
R Data Modes There are several more supported classes in R beyond numeric, character, and logical. This includes things like linear models, matrices, networks, spatial data frames, and others. Classes determine what you can and cannot do to objects.
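To see classes at work, compare what summary() returns for different kinds of objects. The linear model here (mpg predicted by wt) is just an arbitrary example, not a recommended analysis.

```r
data(mtcars)

# Fit a simple linear model; the result has its own class
fit <- lm(mpg ~ wt, data = mtcars)

class(mtcars$mpg)   # a plain numeric vector
class(mtcars)       # a data frame
class(fit)          # a linear model

# summary() behaves differently depending on the class it is given
numeric_summary <- summary(mtcars$mpg)
model_summary <- summary(fit)
class(model_summary)
```

Printing numeric_summary shows quartiles, while printing model_summary shows coefficients and fit statistics: same function name, different behavior chosen by class.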
Let's do some more hands-on Head on over to https://tinyurl.com/unobootcamp
Questions? Troubleshooting? Next workshop: February 25, 1:30p-3p: Spark Joy with Data (CL 232)