An Introduction to the R Environment

Peter Dalgaard
Center for Statistics
Copenhagen Business School
MPAS Lecture, April 2010

Outline

◮ About this Tutorial
◮ Basics of R
    ◮ Objects and arithmetic
    ◮ Matrix calculus
    ◮ Important functions
    ◮ Working with data frames
    ◮ Programming
◮ Statistics with R
◮ Modelling
◮ Graphics

Practicalities

◮ Short introduction (approx. 90 min)
◮ Focus on things relevant to your project
◮ Script of demos on MPAS web page

Plan

◮ Elementary things about R
◮ Data types and some important functions
◮ Matrix calculus
◮ Working with data sets
◮ R as a programming language
◮ Basic statistics and tests
◮ Modeling tools
◮ Elementary Graphics
The R environment

◮ Built around the programming language R, an Open Source dialect of the S language
◮ R is Free Software, and runs on a variety of platforms (I'll be using Linux here).
◮ Command-line execution based on function calls
◮ Extensible with user functions
◮ Workspace containing data and functions
◮ Various graphics devices (interactive and non-interactive)

Objects and arithmetic: R is a vectorized language

◮ The basic data type in R is a vector
◮ Vectors often represent data (e.g. the age for each participant in a study), but also other things like regression coefficients, plot limits, cut points, etc.
◮ Data types: Numeric (integer/double), character (strings), logical (TRUE/FALSE)
◮ Factor (really integer + level attribute) for categorical variables
◮ Lists (generic vectors)

Objects and arithmetic: Basic operations

◮ Standard arithmetic is vectorized: x + y adds each element of x to the corresponding element of y
◮ Recycling: If operating on two vectors of different length, the shorter one is replicated (with a warning if the longer length is not a multiple of the shorter)
◮ c: concatenate, as in c(7, 9, 13)
◮ seq: sequences, as in seq(1, 9, 2); short form: 1:5 is the same as seq(1,5,1)
◮ rep: replication, as in rep(1:3, 3:1) (gives 1 1 1 2 2 3)
◮ sum, mean, range, ...

Objects and arithmetic: Demo 1

    x <- round(rnorm(10,mean=20,sd=5)) # simulate data
    x
    mean(x)
    m <- mean(x)
    x - m # notice recycling
    sqrt(sum((x - m)^2)/9) # standard deviation by hand (n - 1 = 9)
    sd(x)
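The vectorization and recycling rules described above can be tried out with a few small vectors. This is only an illustrative sketch; the values are made up and are not part of the original demo script.

```r
# Vectorized arithmetic: elementwise addition of equal-length vectors
x <- c(7, 9, 13)
y <- c(1, 2, 3)
x + y                       # 8 11 16

# Recycling: the length-2 vector is repeated to match length 4
r_ok <- 1:4 + c(10, 20)     # 11 22 13 24

# Length 3 is not a multiple of length 2: the result is still computed,
# but R issues a warning (suppressed here so the script runs quietly)
r_warn <- suppressWarnings(1:3 + c(10, 20))   # 11 22 13

# Sequences and replication
s <- seq(1, 9, 2)           # 1 3 5 7 9
all(1:5 == seq(1, 5, 1))    # TRUE: same values
rp <- rep(1:3, 3:1)         # 1 1 1 2 2 3
```

Subtracting a single number from a vector, as in `x - m` in Demo 1, is the most common case of recycling: the length-one value is reused for every element.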
Objects and arithmetic: Smart indexing

◮ a[5] single element
◮ a[c(5,6,7)] several elements
◮ a[-6] all except the 6th
◮ a[b>200] index by logical vector
◮ a["name"] by name

Objects and arithmetic: Extended data types

◮ The basic vector types can be combined and extended to form more complex data structures
◮ Attributes extend a basic type with further information. E.g., a vector can have a names attribute, for more readable printing
◮ Classes have two main functions:
    ◮ Hide details
    ◮ Allow function dispatch (functions that behave differently depending on the class)

Objects and arithmetic: Factors

◮ Factors are used to describe nominal variables (the term originates from factorial designs)
◮ Internally, they are just integer codes plus a set of names for the levels
◮ They have class "factor", making them (a) print nicely and (b) behave consistently
◮ A factor can also be ordered (class "ordered"), signifying that there is a natural sort order on the levels
◮ In model specifications, factors play a fundamental role by indicating that a variable should be treated as a classification rather than as a quantitative variable.

Objects and arithmetic: Lists (generic vectors)

◮ A vector where the elements can have different types
◮ Functions often return (classed) lists
◮ Indexing:
    ◮ lst$A element named A
    ◮ lst[[1]] first element
    ◮ lst[1] list containing the first element
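The indexing forms, the internal structure of a factor, and the difference between the list operators can be checked interactively. The following is a minimal sketch with made-up data; the names `s1` to `s7` and the values are hypothetical, not taken from the lecture's demo script.

```r
# Hypothetical named vector (labels and values are invented)
a <- c(s1 = 150, s2 = 210, s3 = 180, s4 = 250, s5 = 190, s6 = 205, s7 = 170)
b <- a                        # for simplicity, index a by its own values

a[5]                          # single element (s5)
a[c(5, 6, 7)]                 # several elements
a[-6]                         # all except the 6th
big <- a[b > 200]             # logical index: keeps s2, s4, s6
a["s3"]                       # by name

# A factor is a vector of integer codes plus a levels attribute
f <- factor(c("low", "high", "low", "mid"),
            levels = c("low", "mid", "high"))
codes <- as.integer(f)        # 1 3 1 2
levels(f)                     # "low" "mid" "high"
class(f)                      # "factor"

# Lists: [[ ]] extracts an element, [ ] returns a shorter list
lst <- list(A = 1:3, B = "text")
lst$A                         # the integer vector 1 2 3
lst[[1]]                      # same element, by position
sub <- lst[1]                 # a list of length 1 containing A
```

The distinction between `lst[[1]]` and `lst[1]` matters in practice: only the first can be passed directly to functions such as `mean` that expect a vector.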
Objects and arithmetic: Demo 2

(indexing, factors, lists)

Matrix calculus: Elementary matrix manipulations

◮ Matrices are implemented as vectors with a dim attribute (of length 2)
◮ Constructor function: matrix(1:4,2,2)
◮ Indexing in the usual way M[i,j], with all the features of "smart indexing". M[,j] is the jth column, etc.
◮ Special feature for matrices and arrays: Matrix indexing, M[A] where A has as many columns as M has dimensions.

Matrix calculus: Matrix algebra

◮ R contains a pretty full set of primitives for matrix calculus
◮ A %*% B for matrix multiplication
◮ solve(A, b) for solving linear equations. (solve(A) for matrix inverse)
◮ t(A) for transpose of a matrix.

Matrix calculus: Demo 3

Permutation matrix (Mx permutes the elements of x)

    perm <- sample(5) # w/o replacement
    n <- length(perm)
    M <- matrix(0,n,n)
    M[cbind(1:n,perm)] <- 1 # matrix indexing: set M[i, perm[i]] to 1
    M
    perm
    M %*% 1:n # same values as perm, as a column
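As a small complement to the permutation demo, the matrix primitives listed above can be exercised on a 2 × 2 system. The matrix `A` and right-hand side `b` below are arbitrary illustrative values, not from the lecture.

```r
# A small linear system A x = b (values chosen arbitrarily)
A <- matrix(c(2, 1, 1, 3), 2, 2)   # filled column by column
b <- c(3, 5)

x <- solve(A, b)                   # solves A %*% x = b
A %*% x                            # reproduces b, as a 2 x 1 matrix

Ainv <- solve(A)                   # with one argument: the inverse
Ainv %*% A                         # identity matrix, up to rounding
t(A)                               # transpose (A is symmetric here)

# Matrix indexing: each row of idx is one (row, column) pair
M <- matrix(1:6, 2, 3)
idx <- cbind(c(1, 2), c(3, 1))     # picks M[1, 3] and M[2, 1]
picked <- M[idx]                   # 5 2
```

Using `solve(A, b)` directly is generally preferable to computing `solve(A) %*% b`, since it avoids forming the inverse explicitly.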
Matrix calculus: Other matrix techniques

◮ diag has multiple functions: creation of diagonal matrices, extracting, and manipulating the diagonal of a matrix. Beware: diag(v) is ambiguous if v can have length 1.
◮ row(X), col(X) are convenient for generating some forms of matrices.
◮ upper.tri and lower.tri generate indexes for accessing the upper/lower triangle of a matrix.
◮ Matrices can be "glued together" using cbind and rbind

Matrix calculus: Row and column matrices

◮ R usually treats vectors as row or column matrices "as appropriate" (i.e., it guesses)
◮ E.g., you can left- or right-multiply a vector by a matrix, even though the latter formally requires transposition
◮ And even do y %*% x to get the inner product y′x
◮ If you want to be explicit about it, you can use rbind or cbind to create the appropriate single-row or single-column matrix.

Matrix calculus: Using drop() and drop=FALSE

◮ Default: If a dimension has length one, it is dropped from results. M[1,] is a vector, not a 1 × n matrix.
◮ Often convenient, but source of obscure bugs
◮ Watch out for extreme cases
◮ Use M[1,,drop=FALSE] to prevent this
◮ Conversely, sometimes you get a matrix and want a vector, as in drop(M %*% 1:n)

Important functions: Some Basic Functions

◮ Constructors of simple objects
◮ Single-column modifications
◮ Modifying and subsetting data frames
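The dimension-dropping behaviour and the `diag` ambiguity mentioned above are easiest to appreciate by inspecting `dim` on small examples. The sketch below uses made-up matrices; the variable names are illustrative only.

```r
M <- matrix(1:6, 2, 3)

d1 <- dim(M[1, ])                  # NULL: the row came back as a plain vector
d2 <- dim(M[1, , drop = FALSE])    # 1 3: still a matrix

# An "extreme case": a selection that happens to keep only one column
keep <- c(TRUE, FALSE, FALSE)
d3 <- dim(M[, keep])               # NULL
d4 <- dim(M[, keep, drop = FALSE]) # 2 1

# diag() with a vector builds a diagonal matrix ...
diag(c(2, 3))                      # 2 x 2 with 2 and 3 on the diagonal
# ... but a length-one vector is read as a size, giving an identity matrix
I3 <- diag(3)                      # 3 x 3 identity, not a 1 x 1 matrix
D1 <- diag(3, nrow = 1)            # 1 x 1 matrix containing 3

# Inner product: the result is a 1 x 1 matrix until dropped
y <- 1:3
x <- 4:6
ip <- drop(y %*% x)                # 32

# Explicit row and column matrices
dim(rbind(y))                      # 1 3
dim(cbind(y))                      # 3 1
```

In code that must work for any number of selected rows or columns, writing `drop = FALSE` throughout is a cheap way to avoid the single-row special case.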
Important functions: Constructors

◮ R deals with many kinds of objects besides data sets
◮ Need to have ways of constructing them from the command line
◮ We have (briefly) seen the c and list functions
◮ Notice the naming forms c(boys=1.2, girls=1.1)
◮ Extracting and setting names with names(x)
◮ For matrices and arrays, use the (surprise) matrix and array functions. data.frame for data frames.
◮ It is also fairly common to construct a matrix from its columns using cbind

Important functions: Demo 4

    x <- c(boys = 1.2, girls = 1.1)
    x
    names(x)
    names(x) <- c("M", "F")
    x
    matrix(1:4,ncol=2)
    cbind(x=0:3,"exp(x)"=exp(0:3))

Important functions: The factor Function

◮ This is typically used when read.table gets it wrong
◮ E.g. group codes read as numeric
◮ Or read as factors, but with levels in the wrong order (e.g. c("rare", "medium", "well-done") sorted alphabetically.)
◮ Notice the slightly confusing use of levels and labels arguments.
    ◮ levels are the value codes on input
    ◮ labels are the value codes on output (and become the levels of the resulting factor)

Important functions: Demo 5

    aq <- airquality
    aq$Month <- factor(aq$Month, levels=5:9,
                       labels=month.name[5:9])
    aq$Month
    levels(aq$Month) <- month.abb[5:9]
    aq$Month
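The two repair situations mentioned for `factor` (levels in alphabetical order, and group codes read as numbers) can be reproduced without reading a file. This is a hedged sketch with invented data; `done`, `grp`, and the labels "control" and "treated" are hypothetical.

```r
# Hypothetical character data, as it might arrive from read.table
done <- c("medium", "rare", "well-done", "rare")

# Default: levels are the sorted unique values
lev_default <- levels(factor(done))   # "medium" "rare" "well-done"

# Giving levels explicitly fixes the order
f <- factor(done, levels = c("rare", "medium", "well-done"))
levels(f)                             # "rare" "medium" "well-done"
as.integer(f)                         # 2 1 3 1

# Hypothetical numeric group codes: levels are the input codes,
# labels become the levels of the resulting factor
grp <- c(1, 2, 2, 1)
g <- factor(grp, levels = 1:2, labels = c("control", "treated"))
g
table(g)
```

If the levels have a natural order that should be used in comparisons, the same call with `ordered = TRUE` gives a factor of class "ordered".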