R: A Personalized Introduction
Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
August 18, 2014

About "R"
• A suite of software tools for
  – Data manipulation
  – Calculations
  – Graphical display
• Largely based on the programming language S
• Packages
  – About 25 standard and recommended packages are supplied
  – Many more are available for download at: http://CRAN.R-project.org
• Free (GPL). Also BSD, MIT

Basics
• Arithmetic
    > 2+2
    [1] 4
• Assign variables
    > x <- 2
    > y <- 5
    > z <- 2 * x + 3 * y
    > z
    [1] 19
• The created objects are now stored in the workspace. List them
    > ls()
    [1] "x" "y" "z"
• Also, we can remove them
    > rm(x)
    > ls()
    [1] "y" "z"

Vectors
• Creating a vector
    > x <- c(2,5,9)
    > y <- c(3,1,-1)
• x + y adds the vectors element-wise
    > x + y
    [1] 5 6 8
• x * y is also element-wise (it is not a dot product)
    > x * y
    [1]  6  5 -9
• x + 2 adds 2 to every element of x
    > x + 2
    [1]  4  7 11

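The element-wise rules above can be checked with a small sketch that reuses the same x and y. The dot product at the end is an addition for contrast and is not part of the slides.

```r
x <- c(2, 5, 9)
y <- c(3, 1, -1)

# Element-wise sum and product of two vectors of equal length
s <- x + y          # 5 6 8
p <- x * y          # 6 5 -9

# A single number is recycled across every element of x
x_plus_2 <- x + 2   # 4 7 11

# Dot product, for contrast with the element-wise product above
dot <- sum(x * y)   # 6 + 5 - 9 = 2

print(s)
print(p)
print(x_plus_2)
print(dot)
```
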
Useful functions related to vectors
• Sequence of integers from a to b: seq(a,b)
    > seq(2,9)
    [1] 2 3 4 5 6 7 8 9
• The rep function repeats a value or a vector
    > rep(1,3)
    [1] 1 1 1
    > rep(1:3,3)
    [1] 1 2 3 1 2 3 1 2 3
• Try the help or ? command
    > help(rep)
    > ?rep

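Both functions take further arguments that the slide does not show. The sketch below illustrates `by` and `length.out` for seq and `each` for rep; the variable names are only for illustration.

```r
# seq(a, b) steps by 1; `by` and `length.out` change the spacing
a <- seq(2, 9)                    # 2 3 4 5 6 7 8 9
b <- seq(2, 9, by = 2)            # 2 4 6 8
d <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep() repeats the whole vector; `each` repeats element by element
e <- rep(1:3, 3)                  # 1 2 3 1 2 3 1 2 3
f <- rep(1:3, each = 2)           # 1 1 2 2 3 3

print(a)
print(b)
print(d)
print(e)
print(f)
```
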
Data and Statistics – Basics
• A lot of things work out of the box
    > x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)
    > summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       0.00    2.00    3.00    3.81    5.00    8.00
• Specifying elements or subsets (index starts at 1, not 0)
    > x[1]
    [1] 2
    > x[3:6]
    [1] 1 5 7 2
• Excluding elements with the minus sign
    > x[-(2:4)]
    [1] 2 7 2 5 8 3 2 0 3 2 6 7 3 1 3 5 8 4

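Besides positions, a vector can be indexed by a logical condition. The following sketch uses the same x; the condition and the variable names are chosen here for illustration and do not appear in the slides.

```r
x <- c(2,3,1,5,7,2,5,8,3,2,0,3,2,6,7,3,1,3,5,8,4)

n <- length(x)                            # 21 observations
big <- x[x > 5]                           # logical index: values above 5
zero_at <- which(x == 0)                  # position of the zero
no_first <- x[-1]                         # everything except the first element
top3 <- sort(x, decreasing = TRUE)[1:3]   # three largest values

print(n)
print(big)
print(zero_at)
print(top3)
```
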
Matrices
• Bind columns (cbind) or rows (rbind)
    > x <- c(3,5,2); y <- c(8,2,1)
    > z <- cbind(x,y)
    > z
         x y
    [1,] 3 8
    [2,] 5 2
    [3,] 2 1
• Or specify the entries and the number of rows (entries fill column by column)
    > A <- matrix(c(3,5,2,8,2,1),nrow=3)
    > B <- matrix(c(3,5,2,8,2,1),nrow=2)

Matrix operations
• Addition works as usual (element-wise)
    > A + 2 * A
         [,1] [,2]
    [1,]    9   24
    [2,]   15    6
    [3,]    6    3
• Multiplication: x * y is element-wise, not matrix multiplication
• Matrix multiplication: %*%
    > A %*% B
         [,1] [,2] [,3]
    [1,]   49   70   14
    [2,]   25   26   12
    [3,]   11   12    5

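The difference between `*` and `%*%` is mostly a matter of dimensions. The sketch below rebuilds the 3 x 2 matrix A and the 2 x 3 matrix B from the same entries and checks which products are defined; the tryCatch wrapper is an addition used only to show that A * B fails.

```r
A <- matrix(c(3, 5, 2, 8, 2, 1), nrow = 3)   # 3 x 2, filled column by column
B <- matrix(c(3, 5, 2, 8, 2, 1), nrow = 2)   # 2 x 3

AB <- A %*% B    # (3 x 2) times (2 x 3) gives 3 x 3
BA <- B %*% A    # (2 x 3) times (3 x 2) gives 2 x 2
At <- t(A)       # transpose, 2 x 3
E  <- A * A      # element-wise product, same shape as A

# Element-wise product needs equal dimensions, so A * B is an error
elementwise_AB <- tryCatch(A * B, error = function(e) "error")

print(dim(AB))
print(dim(BA))
print(elementwise_AB)
```
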
Inverse and Covariance of a matrix
• Computes the inverse of a square matrix X, if it exists:
    > solve(X)
• Covariance matrix of a data matrix X (columns are variables; both calls give the same result)
    > var(X)
    > cov(X)
• Covariance matrix (recall)
  – $X_1, \ldots, X_n$ are random variables, each with finite variance
  – $\Sigma$ is the covariance matrix, where
      $\Sigma_{ij} = \operatorname{cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$, with $\mu_i = E[X_i]$
• Also called $\operatorname{var}(X)$, the variance of the random vector $X$

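As a rough check of both operations, the sketch below inverts a small matrix and compares cov with the sample version of the definition above (centered cross-products divided by n - 1). The matrices M and X are hypothetical values chosen for illustration.

```r
# Hypothetical invertible 2 x 2 matrix
M <- matrix(c(2, 1, 1, 3), nrow = 2)
M_inv <- solve(M)
print(M %*% M_inv)    # identity matrix, up to rounding

# Hypothetical data: rows are observations, columns are variables
X <- cbind(age    = c(25, 33, 30, 45, 65, 75),
           income = c(8, 5, 130, 50, 5, 7))
S <- cov(X)

# Same matrix from the definition: center each column, then
# average the cross-products with the sample divisor n - 1
n  <- nrow(X)
Xc <- sweep(X, 2, colMeans(X))
S_manual <- t(Xc) %*% Xc / (n - 1)

print(S)
print(all.equal(S, S_manual, check.attributes = FALSE))
```

Note that cov estimates the covariance from data with the divisor n - 1, whereas the formula on the slide is the population definition in terms of expectations.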
Writing a function
• A new function can be defined
    > z <- function(x,y) 3*x + 4*y
    > z(2,3)
    [1] 18
• A function with many lines
    > z <- function(x,y) {
        c <- 3*x + 4*y
        5 * c
      }
• The value of the last expression is the return value
• Can write the function in a text file (e.g. prog.R) and source it
    > source("/Users/deb/…/R/xTest.R")
• Can also define a new binary operator
    > "%LL%" <- function(x,y) { 3*x + 4*y }
    > 5 %LL% 3

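The definitions above can be tried end to end as follows. This is a small sketch: the local variable is named w here instead of c, only to avoid shadowing the built-in c() function, and the vector call at the end is an added illustration.

```r
# Multi-line function: the last expression is the return value
z <- function(x, y) {
  w <- 3 * x + 4 * y
  5 * w
}
r1 <- z(2, 3)                    # 5 * (6 + 12) = 90

# User-defined binary operator: the name must be wrapped in % signs
"%LL%" <- function(x, y) 3 * x + 4 * y
r2 <- 5 %LL% 3                   # 15 + 12 = 27

# Because the body uses vectorized arithmetic, vectors work too
r3 <- c(1, 2) %LL% c(0, 1)       # 3 10

print(r1)
print(r2)
print(r3)
```
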
Data
• Read an entire data frame
  – The first line of the file should have a name for each variable in the data frame
  – Each additional line of the file has as its first item a row label, followed by the values for each variable

       Age Income.K Owns.House
    01  25        8         No
    02  33        5         No
    03  30      130        Yes
    04  45       50        Yes
    05  65        5         No
    06  75        7        Yes

    > H <- read.table("filename")

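To try this file layout without creating a file, read.table also accepts the contents through its `text` argument. The sketch below assumes the six rows shown on the slide; because the first line has one field fewer than the others, read.table treats it as a header and the first column as row labels.

```r
# The file contents as a string (header has one field fewer than the rows)
txt <- "Age Income.K Owns.House
01 25 8 No
02 33 5 No
03 30 130 Yes
04 45 50 Yes
05 65 5 No
06 75 7 Yes"

H <- read.table(text = txt)

print(H)
print(names(H))                        # the three variable names
print(nrow(H))                         # number of observations
print(sum(H$Owns.House == "Yes"))      # how many own a house
```
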
Using data
• plot tries to figure out what kind of plot is suitable
    > plot(H[1:2])
• We want to label points based on some attribute
  – Let us select a subset of the data
    > H[which(H$Owns.House=='Yes'),]
       Age Income.K Owns.House
    03  30      130        Yes
    04  45       50        Yes
    06  75        7        Yes
    07  28      200        Yes
    08  35       90        Yes
    10  55      102        Yes
    …    …        …          …

Using data
• Plot one subset in blue, another in red
    > HYes <- H[which(H$Owns.House=='Yes'),]
    > HNo <- H[which(H$Owns.House=='No'),]
    > plot(HYes[1:2], col='blue')
    > points(HNo[1:2], col='red')
[Figure: scatter plot of Income.K (vertical axis, 0 to 200) against Age (horizontal axis, 30 to 80), with a new observation marked in black]
• Hands-on in class

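A self-contained version of the two-colour plot might look like the sketch below. The data frame holds only the six rows listed earlier in the deck, and the axis limits and the legend are additions: plot sets the axes from the first subset alone, so points from the second subset can otherwise fall outside the plotting region.

```r
# Six illustrative rows; the full class data set is not reproduced here
H <- data.frame(
  Age        = c(25, 33, 30, 45, 65, 75),
  Income.K   = c(8, 5, 130, 50, 5, 7),
  Owns.House = c("No", "No", "Yes", "Yes", "No", "Yes")
)

HYes <- H[H$Owns.House == "Yes", ]
HNo  <- H[H$Owns.House == "No", ]

# Take axis limits from the full data so that both subsets are visible
plot(HYes[1:2], col = "blue",
     xlim = range(H$Age), ylim = range(H$Income.K))
points(HNo[1:2], col = "red")
legend("topright", legend = c("Yes", "No"),
       col = c("blue", "red"), pch = 1, title = "Owns.House")
```
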
References
• The R manual: http://cran.r-project.org/doc/manuals/r-release/R-intro.html
• A self-learn tutorial: https://www.nceas.ucsb.edu/files/scicomp/Dloads/RProgramming/BestFirstRTutorial.pdf