Sequential data analysis
An introduction to R

Gilbert Ritschard
Department of Econometrics and Laboratory of Demography, University of Geneva
http://mephisto.unige.ch/biomining

APA-ATI Workshop on Exploratory Data Mining
University of Southern California, Los Angeles, CA, July 2009
23/7/2009
Outline

1 Introduction
2 Installing and launching R
3 Objects and operators
4 Elements of statistical modeling
5 Growing trees: rpart and party
6 Custom functions and programming
1 Introduction
R

R is:
- A software environment for statistical computing and graphics
- Based on the S language (as is S-PLUS)
- Freely distributed under the GPL licence
- Available for all major platforms: Windows/Mac/Linux/Unix
- Easily extensible with numerous contributed packages
2 Installing and launching R
Installation

- R and its contributed packages can be downloaded from CRAN: http://cran.r-project.org
- By default, no GUI is provided under Linux.
- Under Windows and Mac OS X, the basic GUI remains limited.
- ... but try Rcmdr (can be downloaded from CRAN)
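Packages can also be installed from within R. The following is a minimal sketch, assuming an internet connection and a configured CRAN mirror; the helper name ensure_package is hypothetical and not part of R, whereas install.packages(), requireNamespace() and library() are standard R functions.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from the configured CRAN mirror
  }
  requireNamespace(pkg, quietly = TRUE)   # TRUE if the package can now be loaded
}

# "stats" ships with R, so nothing is downloaded here.
ok <- ensure_package("stats")

# For the GUI mentioned on the slide one would call, for example:
# ensure_package("Rcmdr"); library(Rcmdr)
```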
First steps in R

Four ways to send commands to R:
1 Type commands in the R Console.
2 Use the script editor -> File/New script (Windows/Mac only).
3 Use the Rcmdr package.
4 Use a text editor with R support (Tinn-R, WinEdt, etc.).

In addition, you can use your preferred text editor and copy-paste the commands into the R Console.
3 Objects and operators
Section outline

3 Objects and operators
  Introduction to R objects
  Acting on subsets of objects
  Importation/exportation
Objects

R works with objects.

Assigning a value to an object ‘a’:
R> a <- 50

Operation on an object:
R> a/50
[1] 1

Case-sensitive: a ≠ A
R> A/50
Error: object "A" not found
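The same steps can be run as a script. This is only a sketch: tryCatch() is used here (an addition, not on the slide) so that the error raised by the unknown name A is captured instead of stopping the script. The exact wording of the error message depends on the R version.

```r
a <- 50          # assign the value 50 to the object 'a'
res <- a / 50    # operate on the object

# 'A' is a different name from 'a'; using it raises an error,
# which is caught here so the script keeps running.
msg <- tryCatch(A / 50, error = function(e) conditionMessage(e))

print(res)
print(msg)
```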
Types of objects

Different types of objects:
- vector: 4 5 1, or in R c(4,5,1)
          "D" "E" "A", or in R c("D","E","A")
- factor: categorical variable
- matrix: table of numerical data
- data frame: general data table (columns can be of different types)
- ...
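A short sketch creating one object of each type listed above. The object names (v.num, v.chr, f, m, df) and the column names are illustrative choices, not taken from the slides.

```r
v.num <- c(4, 5, 1)                         # numeric vector
v.chr <- c("D", "E", "A")                   # character vector
f     <- factor(v.chr)                      # factor: categorical variable
m     <- matrix(1:6, nrow = 2)              # matrix: 2 rows, 3 columns of numbers
df    <- data.frame(id = v.num, grade = f)  # data frame: columns of different types

str(df)   # compact display of the structure of the data frame
```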
Factors

A factor is defined by its "levels" (possible values) and an indicator of whether it is ordinal or not.

Vector of "strings":
R> sex <- c("man", "woman", "woman", "man", "woman")
R> sex
[1] "man"   "woman" "woman" "man"   "woman"

Creation of a factor:
R> sex.fac <- factor(sex)
R> sex.fac
[1] man   woman woman man   woman
Levels: man woman
R> attributes(sex.fac)
$levels
[1] "man"   "woman"

$class
[1] "factor"

R> table(sex.fac)
sex.fac
  man woman
    2     3

To change the order of the "levels":
R> sex.fac2 <- factor(sex, levels = c("woman", "man"))
R> sex.fac2n <- as.numeric(sex.fac2)
R> table(sex.fac2, sex.fac2n)
        sex.fac2n
sex.fac2 1 2
   woman 3 0
   man   0 2
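The ordinal indicator mentioned in the definition of a factor can be set with the ordered argument of factor(). A minimal sketch follows, using made-up education levels that are not part of the slides.

```r
# Illustrative data (not from the slides): three education levels.
educ <- c("low", "high", "medium", "low", "high")

# Nominal factor: levels are sorted alphabetically by default.
educ.nom <- factor(educ)

# Ordinal factor: the order of the levels is given explicitly.
educ.ord <- factor(educ, levels = c("low", "medium", "high"), ordered = TRUE)

levels(educ.nom)        # "high" "low" "medium"
is.ordered(educ.ord)    # TRUE
educ.ord < "high"       # comparisons are meaningful for ordered factors
as.numeric(educ.ord)    # internal codes follow the level order: 1 3 2 1 3
```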
Objects (continued) I

Results can always be stored in a new object.

Example:
R> library(TraMineR)
R> data(mvad)
R> tab.male.gcse <- table(mvad$male, mvad$gcse5eq)
R> tab.male.gcse

       no yes
  no  186 156
  yes 266 104
Objects (continued)

Depending on the class of an object, methods can be applied to it directly.
R> plot(tab.male.gcse, cex.axis = 1.5)
[Figure: plot of tab.male.gcse; both axes show the categories "no" and "yes"]
Row and marginal distributions

Row and column distributions:
R> prop.table(tab.male.gcse, 1)

             no       yes
  no  0.5438596 0.4561404
  yes 0.7189189 0.2810811
R> prop.table(tab.male.gcse, 2)

             no       yes
  no  0.4115044 0.6000000
  yes 0.5884956 0.4000000

Margins:
R> margin.table(tab.male.gcse, 1)

 no yes
342 370
R> margin.table(tab.male.gcse, 2)

 no yes
452 260
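The distributions and margins can be reproduced without TraMineR by rebuilding the table from the counts printed on the slides. This is a sketch: the object name tab and the dimension names male and gcse5eq are chosen here for illustration, and addmargins() is an extra base R function not shown on the slides.

```r
# Rebuild the 2 x 2 table from the counts printed on the slide
# (rows: male, columns: gcse5eq), so that TraMineR is not needed here.
tab <- as.table(matrix(c(186, 266, 156, 104), nrow = 2,
                       dimnames = list(male = c("no", "yes"),
                                       gcse5eq = c("no", "yes"))))

row.dist <- prop.table(tab, 1)    # each row sums to 1
col.dist <- prop.table(tab, 2)    # each column sums to 1
row.tot  <- margin.table(tab, 1)  # row totals: 342 370
col.tot  <- margin.table(tab, 2)  # column totals: 452 260

round(row.dist, 3)
addmargins(tab)                   # table with totals appended
```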
Acting on subsets of objects
Indexes

Indexing vectors
x[n]                            nth element
x[-n]                           all but the nth element
x[1:n]                          first n elements
x[-(1:n)]                       elements from n+1 to the end
x[c(1,4,2)]                     specific elements
x["name"]                       element named "name"
x[x > 3]                        all elements greater than 3
x[x > 3 & x < 5]                all elements between 3 and 5
x[x %in% c("a","and","the")]    elements in the given set

Indexing matrices
x[i,j]                          element at row i, column j
x[i,]                           row i
x[,j]                           column j
x[,c(1,3)]                      columns 1 and 3
x["name",]                      row named "name"

Indexing data frames (matrix indexing plus the following)
x[["name"]]                     column named "name"
x$name                          idem
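A few of these indexing forms applied to small objects. All objects and values below are illustrative and not taken from the slides.

```r
x <- c(a = 2, b = 4, c = 6, d = 8)   # named numeric vector
x[2]              # second element
x[-2]             # all but the second element
x[1:2]            # first two elements
x[c(1, 4, 2)]     # specific elements, in the requested order
x["c"]            # element named "c"
x[x > 3 & x < 7]  # elements strictly between 3 and 7

words <- c("a", "cat", "and", "the", "dog")
words[words %in% c("a", "and", "the")]   # elements in the given set

m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), NULL))
m[2, 3]           # row 2, column 3
m["r1", ]         # row named "r1"
m[, c(1, 3)]      # columns 1 and 3

df <- data.frame(id = 1:3, score = c(10, 20, 30))
df[["score"]]     # column named "score"
df$score          # same column
```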
Crosstable on data subsets

Cross tables for Catholics and non-Catholics:
R> table(mvad$male[mvad$catholic == "yes"], mvad$gcse5eq[mvad$catholic ==
+     "yes"])

       no yes
  no   82  77
  yes 133  52
R> table(mvad$male[mvad$catholic == "no"], mvad$gcse5eq[mvad$catholic ==
+     "no"])

       no yes
  no  104  79
  yes 133  52
3-dimensional crosstables

Alternatively:
R> table(mvad$male, mvad$gcse5eq, mvad$catholic)
, ,  = no

       no yes
  no  104  79
  yes 133  52

, ,  = yes

       no yes
  no   82  77
  yes 133  52
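The equivalence between tables on subsets and the layers of a three-way table can be checked on a small example. This is a sketch with made-up data: the data frame d is hypothetical and only borrows the variable names of mvad.

```r
# Toy data (hypothetical, not the mvad data).
d <- data.frame(
  male     = c("no", "no", "yes", "yes", "yes", "no", "yes", "no"),
  gcse5eq  = c("yes", "no", "no", "yes", "no", "yes", "no", "no"),
  catholic = c("yes", "yes", "yes", "no", "no", "no", "yes", "no"),
  stringsAsFactors = TRUE
)

# Two-way table restricted to the subset catholic == "yes"
sub.yes <- table(d$male[d$catholic == "yes"], d$gcse5eq[d$catholic == "yes"])

# Three-way table: the third dimension indexes the catholic groups
tab3 <- table(d$male, d$gcse5eq, d$catholic)

# The "yes" layer of the three-way table holds the same counts as the subset table
all(tab3[, , "yes"] == sub.yes)
```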
Importation/exportation
Opening and closing R

R saves the working environment (workspace) in the .RData file of the current directory.
- getwd() returns the current directory
- setwd("C:/introR/") sets the current directory
- save.image() saves the workspace in .RData
- load("example.RData") loads the workspace saved in example.RData

Online help: help(subject) or ?subject
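A minimal sketch of a save and restore round trip, assuming the temporary directory returned by tempdir() is writable. It uses save(), which stores selected objects, in place of save.image(), which stores the whole workspace; the file name example.RData follows the slide.

```r
old.dir <- getwd()                 # remember the current directory
setwd(tempdir())                   # work in a temporary directory (assumed writable)

a <- 50
save(a, file = "example.RData")    # save selected objects; save.image() saves the whole workspace
rm(a)                              # 'a' no longer exists in the session
load("example.RData")              # restores 'a' from the file

restored <- a
unlink("example.RData")            # clean up
setwd(old.dir)                     # go back to the original directory
```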
Object Management

List of objects in the workspace:
R> ls()
 [1] "a"             "datadir"       "filename"      "graphdir"
 [5] "mvad"          "pngdir"        "sex"           "sex.fac"
 [9] "sex.fac2"      "sex.fac2n"     "tab.male.gcse"

Removing objects:
R> rm(sex, sex.fac2)
R> ls()
[1] "a"             "datadir"       "filename"      "graphdir"
[5] "mvad"          "pngdir"        "sex.fac"       "sex.fac2n"
[9] "tab.male.gcse"
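The effect of rm() can be checked by comparing the output of ls() before and after. This is a small sketch; the names before and after are illustrative.

```r
sex     <- c("man", "woman")
sex.fac <- factor(sex)

before <- ls()        # names of the objects in the current environment
rm(sex)               # remove one object
after  <- ls()

setdiff(before, after)   # "sex"
```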
Importing text files

R can import text files (tab-delimited, CSV, ...) with read.table():

read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
           row.names, col.names, as.is = FALSE, na.strings = "NA",
           colClasses = NA, nrows = -1, skip = 0, check.names = TRUE,
           fill = !blank.lines.skip, strip.white = FALSE,
           blank.lines.skip = TRUE, comment.char = "#")

Ex: importing a tab-delimited file with variable names in the first row:
R> example <- read.table(file = "example.dat", header = TRUE,
+     sep = "\t")
R> example
  age revenu  sexe
1  25    100 homme
2  45    200 femme
3  30     50 homme
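The import can be tried without an existing example.dat by first writing the three records shown on the slide to a temporary file. This is a sketch: the temporary file names are generated by tempfile(), and the export step with write.table() is an addition that does not appear on the slides.

```r
# Write a small tab-delimited file with the values shown on the slide.
f <- tempfile(fileext = ".dat")
writeLines(c("age\trevenu\tsexe",
             "25\t100\thomme",
             "45\t200\tfemme",
             "30\t50\thomme"), f)

# Read it back: first row holds the variable names, fields are tab-separated.
example <- read.table(file = f, header = TRUE, sep = "\t")
str(example)

# Export works the same way; write.table() is the counterpart of read.table().
out <- tempfile(fileext = ".csv")
write.table(example, file = out, sep = ",", row.names = FALSE)
```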