Sequential data analysis
An introduction to R

Gilbert Ritschard
Department of Econometrics and Laboratory of Demography, University of Geneva
http://mephisto.unige.ch/biomining

APA-ATI Workshop on Exploratory Data Mining
University of Southern California, Los Angeles, CA, July 2009
23/7/2009gr

Outline

  1 Introduction
  2 Installing and launching R
  3 Objects and operators
  4 Elements of statistical modeling
  5 Growing trees: rpart and party
  6 Custom functions and programming

Introduction

R

R is:
  - A software environment for statistical computing and graphics
  - Based on the S language (as is S-PLUS)
  - Freely distributed under the GPL licence
  - Available for any platform: Windows/Mac/Linux/Unix
  - Easily extensible with numerous contributed modules

Installing and launching R

Installation

  - R and the modules can be downloaded from the CRAN: http://cran.r-project.org
  - By default, no GUI is provided under Linux.
  - Under Windows and MacOSX, the basic GUI remains limited.
  - ... but try Rcmdr (can be downloaded from the CRAN)

First steps in R

Four ways to send commands to R:
  1 Type commands in the R Console.
  2 The script editor -> File/New script (only Windows/Mac)
  3 The Rcmdr module
  4 Use a text editor with R support (Tinn-R, WinEdt, etc.)

In addition, you can use your preferred text editor and copy-paste the commands into the R Console.

Objects and operators

Introduction to R objects

Objects

R works with objects.

Assigning a value to an object 'a'
  R> a <- 50

Operation on an object
  R> a/50
  [1] 1

Case-sensitive: a ≠ A
  R> A/50
  Error: object "A" not found
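The assignment and case-sensitivity slide can be replayed as a short script. This is a minimal sketch that assumes a fresh R session in which no object named A exists; the name msg and the use of tryCatch() are illustrative additions, there only so the script continues after the error instead of stopping.

```r
# Assign the value 50 to the object 'a', as on the slide
a <- 50

# Operate on the object; the result is printed at the console
a / 50

# Object names are case-sensitive: 'A' is not the same object as 'a'.
# In a fresh session 'A' is undefined, so evaluating it raises an error.
# tryCatch() captures the error message (illustrative helper, not in the slides).
msg <- tryCatch(A / 50, error = function(e) conditionMessage(e))
print(msg)
```

The exact wording of the error message depends on the R version and locale, so the script stores and prints it rather than relying on a fixed text.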
Types of objects

Different types of objects:
  - vector: 4 5 1, or in R c(4,5,1)
  - vector of "strings": "D" "E" "A", or in R c("D","E","A")
  - factor: categorical variable
  - matrix: table of numerical data
  - data frame: general data table (columns can be of different types)
  - ...

Factors I

A factor is defined by "levels" (possible values) and an indicator of whether it is ordinal or not.

  R> sex <- c("man", "woman", "woman", "man", "woman")
  R> sex
  [1] "man"   "woman" "woman" "man"   "woman"

Creation of a factor
  R> sex.fac <- factor(sex)
  R> sex.fac
  [1] man   woman woman man   woman
  Levels: man woman

Factors II

  R> attributes(sex.fac)
  $levels
  [1] "man"   "woman"

  $class
  [1] "factor"

  R> table(sex.fac)
  sex.fac
    man woman
      2     3

To change the order of the "levels"
  R> sex.fac2 <- factor(sex, levels = c("woman", "man"))
  R> sex.fac2n <- as.numeric(sex.fac2)
  R> table(sex.fac2, sex.fac2n)
          sex.fac2n
  sex.fac2 1 2
     woman 3 0
     man   0 2

Objects (continued) I

Results can always be stored in a new object.

Example:
  R> library(TraMineR)
  R> data(mvad)
  R> tab.male.gcse <- table(mvad$male, mvad$gcse5eq)
  R> tab.male.gcse
         no yes
    no  186 156
    yes 266 104

Objects (continued)

Depending on the class of an object, methods can be applied directly to it.

  R> plot(tab.male.gcse, cex.axis = 1.5)

[Figure: mosaic plot titled tab.male.gcse; both axes have the levels no and yes]

Row and marginal distributions

Row and column distributions
  R> prop.table(tab.male.gcse, 1)
               no       yes
    no  0.5438596 0.4561404
    yes 0.7189189 0.2810811
  R> prop.table(tab.male.gcse, 2)
               no       yes
    no  0.4115044 0.6000000
    yes 0.5884956 0.4000000

Margins
  R> margin.table(tab.male.gcse, 1)
   no yes
  342 370
  R> margin.table(tab.male.gcse, 2)
   no yes
  452 260
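To see how the margins and the row distribution relate, the cross table can be rebuilt by hand from the counts printed on the slides, so the sketch runs without TraMineR installed. The names tab, row.tot, col.tot, row.dist and row.dist.manual, and the dimnames labels male and gcse5eq, are illustrative choices, not objects from the slides.

```r
# Counts copied from the slide output (rows: mvad$male, columns: mvad$gcse5eq).
# matrix() fills column by column, hence the order 186, 266, 156, 104.
tab <- as.table(matrix(c(186, 266, 156, 104), nrow = 2,
                       dimnames = list(male = c("no", "yes"),
                                       gcse5eq = c("no", "yes"))))

# Margins: totals over rows (1) and over columns (2)
row.tot <- margin.table(tab, 1)
col.tot <- margin.table(tab, 2)

# Row distribution: each cell divided by the total of its row
row.dist <- prop.table(tab, 1)

# The same computation done by hand with sweep()
row.dist.manual <- sweep(tab, 1, as.vector(row.tot), "/")

print(row.tot)
print(col.tot)
print(row.dist)

# Each row of a row distribution sums to 1
print(rowSums(row.dist))
```

Using 2 instead of 1 as the second argument of prop.table() gives the column distribution, where each column sums to 1.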
Acting on subsets of objects

Indexes

Indexing vectors
  x[n]                           nth element
  x[-n]                          all but the nth element
  x[1:n]                         first n elements
  x[-(1:n)]                      elements from n+1 to the end
  x[c(1,4,2)]                    specific elements
  x["name"]                      element named "name"
  x[x > 3]                       all elements greater than 3
  x[x > 3 & x < 5]               all elements between 3 and 5
  x[x %in% c("a","and","the")]   elements in the given set

Indexing matrices
  x[i,j]        element at row i, column j
  x[i,]         row i
  x[,j]         column j
  x[,c(1,3)]    columns 1 and 3
  x["name",]    row named "name"

Indexing data frames (matrix indexing plus the following)
  x[["name"]]   column named "name"
  x$name        same as x[["name"]]

Crosstable on data subsets

Cross tables for catholic and non catholic
  R> table(mvad$male[mvad$catholic == "yes"], mvad$gcse5eq[mvad$catholic ==
  +     "yes"])
         no yes
    no   82  77
    yes 133  52
  R> table(mvad$male[mvad$catholic == "no"], mvad$gcse5eq[mvad$catholic ==
  +     "no"])
         no yes
    no  104  79
    yes 133  52

3-dimensional crosstables

Alternatively
  R> table(mvad$male, mvad$gcse5eq, mvad$catholic)
  , ,  = no

         no yes
    no  104  79
    yes 133  52

  , ,  = yes

         no yes
    no   82  77
    yes 133  52

Importation/exportation

Opening and closing R

R saves the working environment (workspace) in the .RData file of the current directory.
  getwd()                 returns the current directory
  setwd("C:/introR/")     sets the current directory
  save.image()            saves the workspace in .RData
  load("example.RData")   loads the workspace saved in example.RData

Online help command: help(subject), or ?subject

Object Management

List of objects in the workspace
  R> ls()
   [1] "a"             "datadir"       "filename"      "graphdir"
   [5] "mvad"          "pngdir"        "sex"           "sex.fac"
   [9] "sex.fac2"      "sex.fac2n"     "tab.male.gcse"

Removing objects
  R> rm(sex, sex.fac2)
  R> ls()
  [1] "a"             "datadir"       "filename"      "graphdir"
  [5] "mvad"          "pngdir"        "sex.fac"       "sex.fac2n"
  [9] "tab.male.gcse"

Importing text files

R can import text files (tab-delimited, CSV, ...) with read.table()

  read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
             row.names, col.names, as.is = FALSE, na.strings = "NA",
             colClasses = NA, nrows = -1,
             skip = 0, check.names = TRUE, fill = !blank.lines.skip,
             strip.white = FALSE, blank.lines.skip = TRUE,
             comment.char = "#")

Ex: importing a tab-delimited file with variable names in the first row:
  R> example <- read.table(file = "example.dat", header = TRUE,
  +     sep = "\t")
  R> example
    age revenu  sexe
  1  25    100 homme
  2  45    200 femme
  3  30     50 homme
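The import example and the data frame indexing rules can be tried together without having example.dat at hand. The sketch below writes a small tab-delimited file to a temporary location and reads it back; the temporary file stands in for example.dat and is an assumption, as are the names tmp, ages, older and first.two.cols. The values are those shown in the slide output.

```r
# Create a temporary tab-delimited file with the variable names in the first row
# (hypothetical stand-in for example.dat)
tmp <- tempfile(fileext = ".dat")
writeLines(c("age\trevenu\tsexe",
             "25\t100\thomme",
             "45\t200\tfemme",
             "30\t50\thomme"), tmp)

# Import it as on the slide: header row present, tab as field separator
example <- read.table(file = tmp, header = TRUE, sep = "\t")
print(example)

# Data frame indexing: $name and [["name"]] both extract one column
ages <- example$age
print(identical(ages, example[["age"]]))

# Matrix-style indexing: rows selected by a logical condition, all columns kept
older <- example[example$age > 28, ]
print(older)

# Matrix-style indexing: all rows, columns 1 and 2
first.two.cols <- example[, c(1, 2)]
print(first.two.cols)

# Remove the temporary file
unlink(tmp)
```

For a comma-separated file the same call applies with sep = ",".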