The R Language: A Hands-on Introduction
Venkatesh-Prasad Ranganath
http://about.me/rvprasad
What is R?
• A dynamically typed programming language
  • http://cran.r-project.org/
• Open source and free
• Provides common programming language constructs/features
  • Multiple programming paradigms
• Numerous libraries focused on various data-rich topics
  • http://cran.r-project.org/web/views/
• Ideal for statistical calculation; lately, the go-to tool for data analysis
• Accompanied by RStudio, a simple and powerful IDE
  • http://rstudio.org
Data Types (Modes)
• Numeric
• Character
• Logical (TRUE / FALSE)
• Complex
• Raw (bytes)
Data Structures
• Vectors
• Matrices
• Arrays
• Lists
• Data Frames
• Factors
• Tables
Data Structures: Vectors
• A sequence of objects of the same (atomic) data type
• Creation
  • x <- … b c  [ <- is the assignment operator ]
  • y <- seq(5, 9, 2) = c(5, 7, 9)
  • y <- 5:7 = c(5, 6, 7)  [ m:n is equivalent to seq(m, n, 1) ]
  • y <- c(1, 4:6) = c(1, 4, 5, 6)  [ no nesting / always flattened ]
  • z <- rep(1, 3) = c(1, 1, 1)
Data Structures: Vectors
• Accessing
  • x[1]  [ 1-based indexing ]
  • x[2:3]
  • x[c(2, 3)] = x[2:3]
  • x[-1]  [ Negative subscripts imply exclusion ]
• Naming
  • names(x) <- …  [ Makes … equivalent to x[1] ]
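The accessing and naming rules above can be tried on a small vector. This is a minimal sketch; the vector `v`, its values, and the names "first", "second", and "third" are assumptions made for illustration and are not taken from the slides.

```r
# Hypothetical vector; values and names are assumptions for illustration.
v <- c(10, 20, 30)
names(v) <- c("first", "second", "third")

by_position <- v[1]        # element at position 1 (indexing is 1-based)
by_name     <- v["first"]  # the same element, selected by its name
without_1st <- v[-1]       # negative subscript: everything except position 1
middle_last <- v[2:3]      # positions 2 and 3
same_again  <- v[c(2, 3)]  # equivalent to v[2:3]
```

Once names are assigned, a name and its position select the same element, which is handy when positions are hard to remember.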
Data Structures: Vectors
• Operations
  • x <- c(5, 6, 7)
  • x + 2 = c(7, 8, 9)  [ Vectorized operations ]
  • x > 5 = c(FALSE, TRUE, TRUE)
  • subset(x, x > 5) = c(6, 7)
  • which(x > 5) = c(2, 3)
  • ifelse(x > 5, NaN, x) = c(5, NaN, NaN)
  • sqr <- function(n) { n * n }
  • sapply(x, sqr) = c(25, 36, 49)
  • sqr(x) = c(25, 36, 49)
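The last two bullets give the same result because `*` already works element by element. The sketch below reuses `x` and `sqr` from the slide; the variable names holding the results are assumptions added for illustration.

```r
x <- c(5, 6, 7)
sqr <- function(n) { n * n }

# sqr() is vectorized because `*` is element-wise,
# so calling it directly matches applying it per element.
via_sapply <- sapply(x, sqr)
direct     <- sqr(x)

# Logical indexing selects the same elements as subset().
by_index  <- x[x > 5]
by_subset <- subset(x, x > 5)

# which() returns positions rather than values.
positions <- which(x > 5)
```

When a function is built only from vectorized operations, the direct call is usually the simpler choice.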
Data Structures: Vectors
• Operations
  • x <- c(5, 6, 7)
  • any(x > 5) = TRUE  [ How about all(x > 5)? ]
  • sum(c(1, 2, 3, NA), na.rm = TRUE) = 6  [ Why is na.rm required? ]
  • sort(c(7, 6, 5)) = c(5, 6, 7)
  • order(c(7, 6, 5)) = ???
  • subset(x, x > 5) = c(6, 7)
  • head(1:100) = ???
  • tail(1:100) = ???
  • How is x == c(5, 6, 7) different from identical(x, c(5, 6, 7))?
  • Try str(x)
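One way to work through the questions on this slide is to run each expression and compare it with your guess. The sketch below stores the results in variables whose names are assumptions added for illustration.

```r
x <- c(5, 6, 7)

all_gt5   <- all(x > 5)            # FALSE: 5 is not greater than 5
sum_na    <- sum(c(1, 2, 3, NA))   # NA: a missing value propagates unless na.rm = TRUE
ord       <- order(c(7, 6, 5))     # 3 2 1: the indices that would sort the vector
first_six <- head(1:100)           # the first six elements by default
last_six  <- tail(1:100)           # the last six elements by default

elementwise <- x == c(5, 6, 7)           # one logical value per element
whole       <- identical(x, c(5, 6, 7))  # a single TRUE or FALSE for the whole object
```

`==` is meant for element-wise comparison, while `identical()` is the safer choice inside an `if` condition because it always returns exactly one value.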
Data Structures: Matrices
• A two-dimensional matrix of objects of the same (atomic) data type
• Creation
  • y <- matrix(nrow=2, ncol=3)  [ empty matrix ]
  • y <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2) =
      1 3 5
      2 4 6
  • y <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, byrow=T) =
      1 2 3
      4 5 6
• Accessing
  • y[1, 2] = 2
  • y[, 2:3] =  [ How about y[1, ]? ]
      2 3
      5 6
  • What’s the difference between y[2, ] and y[2, , drop=FALSE]?
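The closing question about `drop` can be explored by checking the dimensions of each result. This is a small sketch that assumes the row-wise matrix `y` from the slide; the result variable names are added for illustration.

```r
y <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)

row_vec <- y[2, ]                # dimensions are dropped: a plain vector 4 5 6
row_mat <- y[2, , drop = FALSE]  # dimensions are kept: a 1 x 3 matrix

dim_vec <- dim(row_vec)          # NULL, because a vector has no dim attribute
dim_mat <- dim(row_mat)          # 1 3
```

Keeping `drop = FALSE` matters in code that must keep working on matrices even when a selection happens to contain a single row or column.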
Data Structures: Matrices
• Naming
  • rownames() and colnames()
• Operations
  • nrow(y) = 2  [ number of rows ]
  • ncol(y) = 3  [ number of columns ]
  • apply(y, 1, sum) = c(6, 15)  [ apply sum to each row ]
  • apply(y, 2, sum) = c(5, 7, 9)  [ apply sum to each column ]
  • t(y) =  [ transpose a matrix ]
      1 4
      2 5
      3 6
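Row and column names combine naturally with the operations above. The following sketch assumes the 2 x 3 matrix from the earlier slide; the names "r1", "r2", "a", "b", and "c" are hypothetical labels chosen for illustration.

```r
y <- matrix(1:6, nrow = 2, byrow = TRUE)

# Hypothetical row and column labels.
rownames(y) <- c("r1", "r2")
colnames(y) <- c("a", "b", "c")

cell     <- y["r2", "b"]     # select by name instead of position
row_sums <- apply(y, 1, sum) # one sum per row, labelled with the row names
col_sums <- apply(y, 2, sum) # one sum per column, labelled with the column names
flipped  <- t(y)             # transposing swaps the row and column names too
```

Base R also provides `rowSums()` and `colSums()`, which compute the same totals without going through `apply()`.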
Data Structures: Matrices
• Operations
  • rbind(y, c(7, 8, 9)) =
      1 2 3
      4 5 6
      7 8 9
  • cbind(y, c(7, 8)) =
      1 2 3 7
      4 5 6 8
  • Try str(y)
Data Structures: Matrices
• What will this yield?
    m <- matrix(nrow=4, ncol=4)
    m <- ifelse(row(m) == col(m), 1, 0.3)
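After making a guess, the snippet can be run and the result inspected piece by piece. The sketch below repeats the two statements from the slide; the variables `on_diag` and `off_diag` are assumptions added to make the structure of the result visible.

```r
m <- matrix(nrow = 4, ncol = 4)
m <- ifelse(row(m) == col(m), 1, 0.3)

# row(m) and col(m) hold each cell's row and column index,
# so the test is TRUE exactly on the main diagonal.
on_diag  <- diag(m)                 # values where row index equals column index
off_diag <- m[row(m) != col(m)]     # all remaining values
```

Because `ifelse()` keeps the shape of its test argument, `m` stays a 4 x 4 matrix.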
Data Structures: Lists
• A sequence of objects (of possibly different data types)
• Creation
  • k <- list(c(1, 2, 3), …
  • l <- …  [ f1 and f2 are tags ]
• Accessing
  • k[2:3]
  • k[[2]]  [ How about k[2]? ]
  • l$f1 = c(1, 2, 3)  [ Is it the same as l[1] or l[[1]]? ]
Data Structures: Lists
• Naming
  • names(k) <- …
• Operations
  • lapply(list(1:2, 9:10), sum) = list(3, 19)
  • sapply(list(1:2, 9:10), sum) = c(3, 19)
  • l$f1 <- NULL = ???
  • str(l) = ???
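The difference between single and double brackets, and the effect of assigning NULL, can be checked on a small tagged list. This is a sketch under stated assumptions: the list `lst` and its contents are hypothetical, and only the tags f1 and f2 come from the slides.

```r
# Hypothetical list; contents are assumptions, the tags f1 and f2 follow the slides.
lst <- list(f1 = c(1, 2, 3), f2 = "some text")

as_sublist <- lst[1]     # single brackets: a list of length 1
as_element <- lst[[1]]   # double brackets: the numeric vector itself
by_tag     <- lst$f1     # same as lst[["f1"]]

# lapply() always returns a list; sapply() simplifies to a vector when it can.
sums_list <- lapply(list(1:2, 9:10), sum)
sums_vec  <- sapply(list(1:2, 9:10), sum)

lst$f1 <- NULL           # assigning NULL removes the element
remaining <- names(lst)  # only "f2" is left
```

Running `str(lst)` before and after the removal shows the structure change compactly.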
Data Structures: Data Frames
• A two-dimensional, matrix-like structure where different columns can be of different types.
• Creation
  • x <- data.frame … jill
• Accessing
  • x$names = … jill  [ How about x[[1]]? ]
  • x[1] = ???
  • x[c(1, 2)] = ???
  • x[1, ] = ???
  • x[, 1] = ???
Data Structures: Data Frames
• Naming
  • rownames() and colnames()
• Operations
  • x[x$age > 5, ] = data.frame … jill ))
  • subset(x, age > 5) = ???
  • apply(x, 1, sum) = ???
  • y <- data.frame(1:3, 5:7)
  • apply(y, 1, mean) = ???
  • lapply(y, mean) = ???
  • sapply(y, mean) = ???
  • Try str(y)
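The accessors and filters above can be compared side by side on a tiny data frame. This is a sketch under stated assumptions: the data frame `df` and all of its values are hypothetical; only the column names `names` and `age` and the name "jill" appear in the slides.

```r
# Hypothetical data; only the column names and "jill" are taken from the slides.
df <- data.frame(names = c("jack", "jill"),
                 age = c(4, 7),
                 stringsAsFactors = FALSE)

one_col_df  <- df[1]     # single brackets: a data frame with one column
one_col_vec <- df[[1]]   # double brackets: the column as a vector
first_row   <- df[1, ]   # a one-row data frame
first_col   <- df[, 1]   # the column as a vector (dimensions dropped)

older_idx    <- df[df$age > 5, ]    # logical row selection
older_subset <- subset(df, age > 5) # same rows; columns can be named directly

y <- data.frame(1:3, 5:7)
row_means <- apply(y, 1, mean)   # one mean per row
col_means <- sapply(y, mean)     # one mean per column, as a named vector
```

`subset()` is convenient interactively, while explicit indexing tends to be clearer inside functions.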
Factors (and Tables)
• Type for categorical/nominal values.
• Example
  • xf <- factor(c(1:3, 2, 4:5))
  • Try xf and str(xf)
• Operations
  • table(xf) = ???
  • with(mtcars, split(mpg, cyl)) = ???
  • with(mtcars, tapply(mpg, cyl, mean)) = ???
  • by(mtcars, mtcars$cyl, function(m) { median(m$mpg) }) = ???
  • aggregate(mtcars, list(mtcars$cyl), median) = ???
• You can use cut to bin values and create factors. Try it.
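A short sketch of the factor operations above, including the suggested `cut` exercise. The break points 10, 20, 30, and 40 are an arbitrary assumption chosen to cover the range of `mtcars$mpg`; the result variable names are also added for illustration.

```r
xf <- factor(c(1:3, 2, 4:5))
lv     <- levels(xf)   # the distinct categories, stored as character labels
counts <- table(xf)    # how often each level occurs

# Group mpg by number of cylinders and average each group.
mpg_by_cyl <- with(mtcars, tapply(mpg, cyl, mean))

# cut() turns a numeric vector into a factor of intervals.
# The break points are an arbitrary assumption for illustration.
bins <- cut(mtcars$mpg, breaks = c(10, 20, 30, 40))
bin_counts <- table(bins)
```

By default `cut()` builds intervals that are open on the left and closed on the right, so a value equal to the lowest break would become NA unless `include.lowest = TRUE` is set.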
Basic Graphs
• with(mtcars, boxplot(mpg))
• hist(mtcars$mpg)
• with(mtcars, plot(hp, mpg))
• dotchart(VADeaths)
• Try plot(aggregate(mtcars, list(mtcars$cyl), median))
You can get the list of datasets via ls("package:datasets")
Stats 101 using R
• mean
• median
• What about mode?
• fivenum
• quantile
• sd
• var
• cov
• cor
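The functions listed above can be tried on the built-in `mtcars` data. Regarding the question about mode: base R's `mode()` reports how an object is stored, not its most frequent value, so the helper `stat_mode` below is a hypothetical function written for illustration, not part of base R.

```r
mpg <- mtcars$mpg
hp  <- mtcars$hp

avg     <- mean(mpg)
mid     <- median(mpg)
five    <- fivenum(mpg)    # minimum, lower hinge, median, upper hinge, maximum
quarts  <- quantile(mpg)   # 0%, 25%, 50%, 75%, 100% by default
spread  <- sd(mpg)
varnc   <- var(mpg)
covar   <- cov(hp, mpg)
correl  <- cor(hp, mpg)

storage <- mode(mpg)       # "numeric": the storage mode, not a statistic

# Hypothetical helper: returns the most frequent value(s) as character labels.
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[counts == max(counts)]
}
most_common <- stat_mode(c(1, 2, 2, 3))
```

The helper returns every value tied for the highest count, which avoids silently picking one when the data has several modes.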
Data Exploration using R
Let’s get our hands dirty!!