R Basics / Course Business
- We'll be using a sample dataset in class today:
  - CourseWeb: Course Documents -> Sample Data -> Week 2
  - You can download it to your computer before class
- Thanks for answering the CourseWeb background survey!
- If you are sitting in on the course, e-mail me so I can add you to CourseWeb
R Basics
- R commands & functions
- Reading in data
- Saving R scripts
- Descriptive statistics
- Subsetting data
- Assigning new values
- Referring to specific cells
- Types & type conversion
- NA values
- Getting help
R Commands
- The simplest way to interact with R is by typing in commands at the > prompt
- [Screenshots: the > prompt in RStudio and in R]
R as a Calculator
- Typing in a simple calculation shows us the result:
    608 + 28
- What's 11527 minus 283?
- Some more examples:
    400 / 65    (division)
    2 * 4       (multiplication)
    5 ^ 2       (exponentiation)
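A minimal sketch of the calculator examples above, with each result stored under a name so it can be checked. The variable names are made up for illustration and are not part of the slides.

```r
# Each line is an ordinary calculation; the name on the left is illustrative only.
sum_result <- 608 + 28      # addition
difference <- 11527 - 283   # subtraction (the exercise from the slide)
quotient   <- 400 / 65      # division
product    <- 2 * 4         # multiplication
power      <- 5 ^ 2         # exponentiation

# Show all five results at once
print(c(sum_result, difference, quotient, product, power))
```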
Functions
- More complex calculations can be done with functions:
    sqrt(64)
  - sqrt: what the function is (square root)
  - In parentheses: what we want to perform the function on
- Can often read these left to right ("square root of 64")
- What do you think this means?
    abs(-7)
Arguments
- Some functions have settings ("arguments") that we can adjust:
    round(3.14)
  - Rounds off to the nearest integer (zero decimal places)
    round(3.14, digits=1)
  - One decimal place
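The function calls from the last two slides can be tried together. This is a small sketch; the variable names are illustrative assumptions rather than anything from the course materials.

```r
# Functions take their input in parentheses
root      <- sqrt(64)   # square root of 64
magnitude <- abs(-7)    # absolute value of -7

# The digits argument controls how many decimal places round() keeps
whole     <- round(3.14)              # default: zero decimal places
one_place <- round(3.14, digits = 1)  # one decimal place

print(c(root, magnitude, whole, one_place))
```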
Nested Functions
- We can use multiple functions in a row, one inside another:
    sqrt(abs(-16))
  - "Square root of the absolute value of -16"
- Don't get scared when you see multiple parentheses!
  - Can often just read left to right
  - R first figures out the thing nested in the middle
- Can you round off the square root of 7?
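One way to see what nesting does is to run the inner function first and compare. The sketch below also shows one possible answer to the rounding exercise; rounding to two decimal places is an arbitrary choice here, not something the slide specifies.

```r
# Step by step: the innermost call is evaluated first
inner_value <- abs(-16)          # absolute value of -16
outer_value <- sqrt(inner_value) # square root of that result

# The same computation written as one nested call
nested_value <- sqrt(abs(-16))

# One possible answer to the exercise (digits = 2 is an arbitrary choice)
rounded_root <- round(sqrt(7), digits = 2)

print(c(outer_value, nested_value, rounded_root))
```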
Using Multiple Numbers at Once
- When we want to use multiple numbers, we concatenate them with c():
    c(2,6,16)
  - A vector (an ordered set) of the numbers 2, 6, and 16
- Sometimes a computation requires multiple numbers:
    mean(c(2,6,16))
- Also a quick way to do the same thing to multiple different numbers:
    sqrt(c(16,100,144))
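A short sketch of both uses of c() described above: feeding several numbers to one computation, and applying one function to each number. The variable names are illustrative.

```r
# Concatenate three numbers into one vector
numbers <- c(2, 6, 16)

# mean() needs several numbers to work with
average <- mean(numbers)

# sqrt() is applied to each element separately
roots <- sqrt(c(16, 100, 144))

print(average)
print(roots)
```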
Course Documents: Sample Data: Week 2
- Reading plausible versus implausible sentences:
  - "Scott chopped the carrots with a knife."
  - "Scott chopped the carrots with a spoon."
- Measure reading time on the final word
- Note: Simulated data; not a real experiment.
Course Documents: Sample Data: Week 2
- Reading plausible versus implausible sentences
- Reading time on the critical word
- 36 subjects
- Each subject sees 30 items (sentences): half plausible, half implausible
- Interested in changes over time, so we'll track the number of trials remaining (29 vs 28 vs 27 vs 26...)
Reading in Data
- Make sure you have the dataset at this point if you want to follow along:
  Course Documents -> Sample Data -> Week 2
Reading in Data – RStudio
- Navigate to the folder in the lower-right pane
- More -> Set as Working Directory
- Open a "comma-separated value" file:
    experiment <- read.csv('week2.csv')
  - experiment: name of the "dataframe" we're creating (whatever we want to call this dataset)
  - read.csv: the function name
  - 'week2.csv': file name
Reading in Data – Regular R
- Read in a "comma-separated value" file:
    experiment <- read.csv('/Users/scottfraundorf/Desktop/week2.csv')
  - experiment: name of the "dataframe" we're creating (whatever we want to call this dataset)
  - read.csv: the function name
  - '/Users/scottfraundorf/Desktop/week2.csv': folder & file name
- Drag & drop the file into R to get the full folder & file name
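For anyone without week2.csv at hand, the sketch below writes a tiny stand-in CSV to a temporary file and reads it back with read.csv. The rows and the item names are made up for illustration; only the column names follow the ones used in these slides. Passing stringsAsFactors = TRUE is a choice made here: since R 4.0.0, read.csv keeps text columns as plain character strings by default, and later commands such as levels() expect factors.

```r
# Hypothetical stand-in for week2.csv (all values below are invented)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Subject,ItemName,Condition,TrialsRemaining,RT",
  "S01,Carrots,Plausible,1,412",
  "S01,Soup,Implausible,0,655",
  "S02,Carrots,Implausible,1,702",
  "S02,Soup,Plausible,0,398"
), csv_path)

# Read the file into a dataframe; text columns become factors
experiment <- read.csv(csv_path, stringsAsFactors = TRUE)

# Quick check of what was read in
str(experiment)
```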
Looking at the Data: Summary
- A "big picture" of the dataset:
    summary(experiment)
- summary() is a very important function!
  - Basic info & descriptive statistics
  - Check to make sure the data are correct
Looking at the Data: Summary
- A "big picture" of the dataset:
    summary(experiment)
- We can use $ to refer to a specific column/variable in our dataset:
    summary(experiment$ItemName)
Looking at the Data: Raw Data
- Let's look at the data!
    experiment
- Ack! That's too much! How about just a few rows?
    head(experiment)
    head(experiment, n=10)
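The inspection commands above can be tried on a small made-up dataframe. The values and item names below are invented and the dataframe is built directly with data.frame() rather than read from the course file.

```r
# Hypothetical data: 3 subjects x 4 items (all values invented)
experiment <- data.frame(
  Subject  = factor(rep(c("S01", "S02", "S03"), each = 4)),
  ItemName = factor(rep(c("Carrots", "Soup", "Bread", "Nail"), times = 3)),
  RT       = c(412, 655, 702, 398, 520, 610, 480, 730, 455, 590, 640, 505)
)

# Big picture of every column
summary(experiment)

# $ picks out one column; for a factor, summary() counts each level
item_counts <- summary(experiment$ItemName)
print(item_counts)

# head() shows the first 6 rows unless n says otherwise
first_rows <- head(experiment)
first_ten  <- head(experiment, n = 10)
print(first_rows)
```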
Reading in Data: Other Formats
- Excel:
    library(gdata)
    experiment <- read.xls('/Users/scottfraundorf/Desktop/week2.xls')
- SPSS:
    library(foreign)
    experiment <- read.spss('/Users/scottfraundorf/Desktop/week2.spss', to.data.frame=TRUE)
R Scripts
- Save & reuse commands with a script
- R: File -> New Document
- RStudio: [screenshot]
R Scripts
- Run commands without typing them all again
- RStudio:
  - Code -> Run Region -> Run All: run the entire script
  - Code -> Run Line(s): run just what you've highlighted/selected
- R:
  - Highlight the section of script you want to run
  - Edit -> Execute
- Keyboard shortcut for this: Ctrl+Enter (PC), ⌘+Enter (Mac)
R Scripts
- Saves time when re-running analyses
- Other advantages?
- Some:
  - Documentation for yourself
  - Documentation for others
  - Reuse with new analyses/experiments
  - Quicker to run: can automatically perform one analysis after another
R Scripts: Comments
- Add # before a line to make it a comment
  - Not commands to R, just notes to self (or other readers)
- Can also add a # to make the rest of a line a comment:
    summary(experiment$Subject) #awesome
Descriptive Statistics
- Remember how we referred to a particular variable in a dataframe?
  - $
- Combine that with functions:
    mean(experiment$RT)
    median(experiment$RT)
    sd(experiment$RT)
- Or, for a categorical variable:
    levels(experiment$ItemName)
    summary(experiment$Subject)
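A sketch of these descriptive statistics on a four-row made-up dataframe (all values invented). The categorical columns are created as factors here on purpose: levels() returns NULL for a plain character column, which is what read.csv produces by default in R 4.0.0 and later.

```r
# Hypothetical data (values invented for illustration)
experiment <- data.frame(
  Subject  = factor(c("S01", "S01", "S02", "S02")),
  ItemName = factor(c("Carrots", "Soup", "Carrots", "Soup")),
  RT       = c(400, 600, 500, 700)
)

# Numeric column: central tendency and spread
mean_rt   <- mean(experiment$RT)
median_rt <- median(experiment$RT)
sd_rt     <- sd(experiment$RT)

# Categorical columns: the distinct categories and how often each occurs
item_levels    <- levels(experiment$ItemName)
subject_counts <- summary(experiment$Subject)

print(c(mean = mean_rt, median = median_rt, sd = sd_rt))
print(item_levels)
print(subject_counts)
```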
Descriptive Statistics
- We often want to look at a dependent variable as a function of some independent variable(s):
    tapply(experiment$RT, experiment$Condition, mean)
  - "Split up the RTs by Condition, then get the mean"
- Try getting the mean RT for each item
- How about the median RT for each subject?
- To combine multiple results into one table, "column bind" them with cbind():
    cbind(
      tapply(experiment$RT, experiment$Condition, mean),
      tapply(experiment$RT, experiment$Condition, sd)
    )
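The tapply() pattern, including one possible answer to each of the two exercises, can be sketched on made-up data as follows. The values are invented, and the column labels Mean and SD passed to cbind() are an addition for readability that the slide's version does not have.

```r
# Hypothetical data (values invented for illustration)
experiment <- data.frame(
  Subject   = c("S01", "S01", "S02", "S02"),
  ItemName  = c("Carrots", "Soup", "Carrots", "Soup"),
  Condition = c("Plausible", "Implausible", "Implausible", "Plausible"),
  RT        = c(400, 600, 700, 500)
)

# Split the RTs by Condition, then take the mean of each group
condition_means <- tapply(experiment$RT, experiment$Condition, mean)

# Exercises: mean RT per item, median RT per subject
item_means      <- tapply(experiment$RT, experiment$ItemName, mean)
subject_medians <- tapply(experiment$RT, experiment$Subject, median)

# Column-bind two summaries into one table (column labels are optional)
condition_table <- cbind(
  Mean = tapply(experiment$RT, experiment$Condition, mean),
  SD   = tapply(experiment$RT, experiment$Condition, sd)
)

print(condition_means)
print(item_means)
print(subject_medians)
print(condition_table)
```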
Descriptive Statistics
- Can have 2-way tables...
    tapply(experiment$RT,
      list(experiment$Subject, experiment$Condition), mean)
  - 1st variable is rows, 2nd is columns
- ...or more!
    tapply(experiment$RT,
      list(experiment$ItemName, experiment$Condition, experiment$TestingRoom), mean)
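A small sketch of the 2-way case, using invented data with two subjects and two trials per condition, shows how the first grouping variable ends up as rows and the second as columns.

```r
# Hypothetical data: 2 subjects x 2 conditions x 2 trials (values invented)
experiment <- data.frame(
  Subject   = rep(c("S01", "S02"), each = 4),
  Condition = rep(c("Plausible", "Implausible"), times = 4),
  RT        = c(400, 600, 420, 640, 500, 700, 540, 720)
)

# Mean RT for every Subject-by-Condition cell
two_way <- tapply(experiment$RT,
                  list(experiment$Subject, experiment$Condition),
                  mean)

print(two_way)
```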
Descriptive Statistics
- Contingency tables for categorical variables:
    xtabs(~ Subject + Condition, data=experiment)
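As a sketch, xtabs() on a made-up dataframe counts how many rows fall into each Subject-by-Condition cell, which is a quick way to confirm that every subject saw the same number of trials per condition. The data are invented.

```r
# Hypothetical data: each subject has 2 trials in each condition (values invented)
experiment <- data.frame(
  Subject   = rep(c("S01", "S02"), each = 4),
  Condition = rep(c("Plausible", "Implausible"), times = 4),
  RT        = c(400, 600, 420, 640, 500, 700, 540, 720)
)

# Count observations per Subject x Condition cell
counts <- xtabs(~ Subject + Condition, data = experiment)

print(counts)
```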
Subsetting Data
- Often, we want to examine or use just part of a dataframe
- Remember how we read our dataframe?
    experiment <- read.csv(...)
- Create a new dataframe that's just a subset of experiment:
    experiment.LongRTsRemoved <- subset(experiment, RT < 2000)
  - experiment.LongRTsRemoved: new dataframe name
  - experiment: original dataframe
  - RT < 2000: inclusion criterion (RT less than 2000 ms)
Subsetting Data: Logical Operators
- Try getting just the observations with RTs 200 ms or more:
    experiment.ShortRTsRemoved <- subset(experiment, RT >= 200)
- Why not just delete the bad RTs from the spreadsheet?
  - Easy to make a mistake / miss some of them
  - Faster to have the computer do it
  - We'd lose the original data
  - No documentation of how we subsetted the data
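The two single-criterion subsets can be sketched on invented RTs chosen to sit on and around the cutoffs. Note that the original dataframe is left untouched, which is one of the advantages listed above.

```r
# Hypothetical data with RTs around the 200 ms and 2000 ms cutoffs (values invented)
experiment <- data.frame(
  Subject = c("S01", "S01", "S02", "S02", "S03", "S03", "S03"),
  RT      = c(150, 450, 800, 2500, 199, 200, 2000)
)

# Keep only rows whose RT is below 2000 ms
experiment.LongRTsRemoved <- subset(experiment, RT < 2000)

# Keep only rows whose RT is 200 ms or more
experiment.ShortRTsRemoved <- subset(experiment, RT >= 200)

print(experiment.LongRTsRemoved)
print(experiment.ShortRTsRemoved)
print(nrow(experiment))  # the original dataframe still has every row
```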
Subsetting Data: AND and OR
- What if we wanted only RTs between 200 and 2000 ms?
  - Could do two steps:
      experiment.Temp <- subset(experiment, RT >= 200)
      experiment.BadRTsRemoved <- subset(experiment.Temp, RT <= 2000)
- One step with & for AND:
    experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)
Subsetting Data: AND and OR
- What if we wanted only RTs between 200 and 2000 ms?
- One step with & for AND:
    experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)
- | means OR:
    experiment.BadRTs <- subset(experiment, RT < 200 | RT > 2000)
  - Logical OR ("either or both")
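A sketch comparing the two-step and one-step versions on invented RTs. It also shows that the OR subset picks up exactly the rows the AND subset leaves out.

```r
# Hypothetical data with RTs on and around both cutoffs (values invented)
experiment <- data.frame(
  Subject = c("S01", "S01", "S02", "S02", "S03", "S03", "S03"),
  RT      = c(150, 450, 800, 2500, 199, 200, 2000)
)

# Two steps: drop short RTs, then drop long RTs
experiment.Temp          <- subset(experiment, RT >= 200)
experiment.BadRTsRemoved <- subset(experiment.Temp, RT <= 2000)

# One step: both conditions must hold (AND)
experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)

# Either condition is enough (OR): the rows excluded above
experiment.BadRTs <- subset(experiment, RT < 200 | RT > 2000)

print(experiment2)
print(experiment.BadRTs)
```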
Subsetting Data: == and !=
- Get a match / equals:
    experiment.LastTrials <- subset(experiment, TrialsRemaining == 0)
  - Note the DOUBLE equals sign
- Words/categorical variables need quotes:
    experiment.ImplausibleSentences <- subset(experiment, Condition=='Implausible')
- != means "not equal to":
    experiment.BadSubjectRemoved <- subset(experiment, Subject != 'S23')
  - Drops subject "S23"
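The three comparisons above, sketched on a six-row made-up dataframe. The values are invented; the subject label S23 is taken from the slide.

```r
# Hypothetical data: 3 subjects, 2 trials each (values invented)
experiment <- data.frame(
  Subject         = c("S22", "S22", "S23", "S23", "S24", "S24"),
  Condition       = c("Plausible", "Implausible", "Plausible",
                      "Implausible", "Plausible", "Implausible"),
  TrialsRemaining = c(1, 0, 1, 0, 1, 0),
  RT              = c(410, 620, 1900, 2100, 520, 710)
)

# == tests for equality (a single = would not work here)
experiment.LastTrials <- subset(experiment, TrialsRemaining == 0)

# Text values must be quoted
experiment.ImplausibleSentences <- subset(experiment, Condition == 'Implausible')

# != keeps every row that does NOT match
experiment.BadSubjectRemoved <- subset(experiment, Subject != 'S23')

print(experiment.LastTrials)
print(experiment.ImplausibleSentences)
print(experiment.BadSubjectRemoved)
```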