Applied Statistics and Data Modeling
An introduction to R

Luc Duchateau (1), Paul Janssen (2)
(1) Faculty of Veterinary Medicine, Ghent University, Belgium
(2) Center for Statistics, Hasselt University, Belgium
2020

L. Duchateau & P. Janssen (UH & UG), Applied Statistics and Data Modeling, 2020

Overview

1 R and RStudio
    What is R and RStudio?
    Installation of R and RStudio
    Using RStudio
2 R as a calculator
3 Some R concepts
    R help
    Objects
4 R functions
5 Data
    What are data?
    Reading in data
    Exploring data
6 The function lm

R and RStudio: What is R?

- Programming language
- Open source
- Software environment
- 8 basic packages + 14574 other packages available
- Packages are installed via install.packages("package name")

R and RStudio: What is RStudio?

- Integrated development environment (IDE) for R: an alternative interface to R, not a separate implementation of the language
- Packages can be installed via Tools > Install Packages

R and RStudio: Installation of R

https://cran.r-project.org/

R and RStudio: Installation of RStudio

https://www.rstudio.com/

R and RStudio: Using RStudio

Interface of RStudio (figure)

R and RStudio: Using RStudio

Script in RStudio (figure)

R and RStudio: Using RStudio

Run command in RStudio (figure)

R as a calculator

    2+3
    ## [1] 5
    (5+11)/2-9
    ## [1] -1
    2^3
    ## [1] 8

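As a small extension of the calculator examples, the sketch below illustrates operator precedence together with integer division and remainder. The operators %/% and %% do not appear on the slides; they are standard base R, and the object names a, b, q and r are chosen only for illustration.

```r
# Exponentiation binds tighter than multiplication, which binds tighter than addition
a <- 2 + 3 * 2^2      # evaluated as 2 + (3 * (2^2)) = 14
b <- (2 + 3) * 2^2    # parentheses change the order: 5 * 4 = 20
q <- 17 %/% 5         # integer division: 3
r <- 17 %% 5          # remainder: 2
print(c(a, b, q, r))
```

When in doubt about the order of evaluation, adding parentheses is usually the safest option.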
Some R concepts: R help

- Built-in help

    help(mean)
    ?mean

- Online help
    StackOverflow
    StackExchange
    R-bloggers

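Beyond help() and ?, base R offers a few more ways to find documentation. The sketch below is illustrative only; apropos(), args() and help.search() are not mentioned on the slides but are part of a standard R installation.

```r
# Search the installed documentation by keyword; this is meant for interactive
# use, so it is left commented out here
# help.search("standard deviation")   # same as ??"standard deviation"

# List objects whose name starts with "mean"
hits <- apropos("^mean")
print(hits)

# Show only the argument list of a function
args(round)
```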
Some R concepts: Objects (scalars)

- Objects: scalars, vectors, datasets, ...
- Creating objects: assignment operator (<-)

    height <- 173
    height
    ## [1] 173

- Object names are case sensitive

    height <- 173
    Height <- 186
    height
    ## [1] 173
    Height
    ## [1] 186

Some R concepts: Objects (scalars)

- Calculations with objects

    height <- 173
    weight <- 63
    BMI <- weight/(height/100)^2
    BMI
    ## [1] 21.04982

- Text objects

    Greeting <- "Hello world!"
    Greeting
    ## [1] "Hello world!"

Some R concepts: Objects (vectors)

Vectors are created with the function c()

- A numeric vector

    x <- c(1, 1, 2, 3, 5, 8)
    x
    ## [1] 1 1 2 3 5 8

- A character vector

    y <- c("Belgium", "Portugal", "Italy")
    y
    ## [1] "Belgium" "Portugal" "Italy"

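Once a vector exists, its length, type and individual elements can be inspected. The following is a minimal sketch using the two vectors from the slide; length(), class() and square-bracket indexing are base R features that the slides do not cover.

```r
x <- c(1, 1, 2, 3, 5, 8)
y <- c("Belgium", "Portugal", "Italy")

length(x)     # number of elements: 6
class(x)      # "numeric"
class(y)      # "character"
x[3]          # third element: 2 (R counts from 1)
x[c(1, 6)]    # first and last elements: 1 8
y[-1]         # everything except the first element
```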
Some R concepts: Objects (vectors)

Calculating with vectors (operations are applied element by element)

    x*2
    ## [1] 2 2 4 6 10 16
    x^2
    ## [1] 1 1 4 9 25 64
    x*x
    ## [1] 1 1 4 9 25 64

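Because arithmetic works element by element, the BMI calculation shown for a single person carries over to several people at once. The sketch below uses made-up measurements for three hypothetical people; these values are not part of the course data.

```r
# Hypothetical measurements for three people (not from the course data)
height <- c(173, 186, 160)   # cm
weight <- c(63, 80, 55)      # kg

# Same formula as for scalars, applied to each pair of elements
BMI <- weight / (height / 100)^2
round(BMI, 1)                # 21.0 23.1 21.5
```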
R functions: Functions

We have already used one function, c(), to create a vector

    x <- c(1, 1, 2, 3, 5, 8)
    x
    ## [1] 1 1 2 3 5 8

A function has a name and a list of arguments, separated by commas

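To illustrate how arguments are passed, the sketch below calls round() and sort() with arguments matched by position and by name. The argument names x, digits and decreasing come from the base R documentation; the result names r1, r2, r3 and s1 are chosen only for illustration.

```r
x <- c(1, 1, 2, 3, 5, 8)

r1 <- round(8.6178, 2)               # arguments matched by position
r2 <- round(x = 8.6178, digits = 2)  # same call, arguments matched by name
r3 <- round(8.6178)                  # digits is omitted and defaults to 0
s1 <- sort(x, decreasing = TRUE)     # positional and named arguments mixed

print(c(r1, r2, r3))
print(s1)
```

Naming the arguments makes a call easier to read, especially for functions with many optional arguments.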
R functions: Math functions (trigonometric)

    sin(pi/2)
    ## [1] 1
    asin(1)
    ## [1] 1.570796
    cos(0)
    ## [1] 1
    acos(1)
    ## [1] 0
    tan(0)
    ## [1] 0
    atan(0)
    ## [1] 0

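The trigonometric functions expect angles in radians, which is why the slide uses pi/2 rather than 90. A short sketch of converting from degrees follows; the vector names deg and rad are illustrative.

```r
deg <- c(0, 30, 90, 180)
rad <- deg * pi / 180    # convert degrees to radians

sin(rad)                 # sin(180 degrees) is a tiny nonzero number due to floating point
round(sin(rad), 10)      # 0.0 0.5 1.0 0.0
```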
R functions: Math functions (rounding, sign and absolute value)

    round(8.6178, 2)
    ## [1] 8.62
    floor(8.6178)
    ## [1] 8
    signif(8.6178, 2)
    ## [1] 8.6
    sign(8.6178)
    ## [1] 1
    sign(-8.6178)
    ## [1] -1
    abs(-8.6178)
    ## [1] 8.6178

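The rounding functions differ mainly in how they treat negative numbers and halves. The sketch below compares them on a small illustrative vector; ceiling() and trunc() are base R functions that do not appear on the slide.

```r
v <- c(-2.7, -0.5, 0.5, 2.7)

floor(v)     # -3 -1  0  2  (towards minus infinity)
ceiling(v)   # -2  0  1  3  (towards plus infinity)
trunc(v)     # -2  0  0  2  (drops the fractional part)
round(v)     # -3  0  0  3  (a half goes to the nearest even number)
```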
R functions: Math functions (logarithms and exponentials)

    exp(0)
    ## [1] 1
    log(1)
    ## [1] 0
    log10(1000)
    ## [1] 3

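Note that log() is the natural logarithm. The sketch below shows how other bases can be requested; the base argument and log2() are standard base R but are not shown on the slide.

```r
l10 <- log(100, base = 10)   # same as log10(100): 2
l2  <- log2(8)               # base-2 logarithm: 3
ln  <- log(exp(2))           # log() is the inverse of exp(): 2
e   <- exp(1)                # Euler's number e

print(c(l10, l2, ln, e))
```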
R functions: Math functions (others)

    sqrt(25)
    ## [1] 5
    factorial(4)
    ## [1] 24

R functions: Statistical functions

    x <- c(1, 3, 4, 6, 2, 8)
    mean(x)
    ## [1] 4
    var(x)
    ## [1] 6.8
    sd(x)
    ## [1] 2.607681
    quantile(x)
    ##   0%  25%  50%  75% 100%
    ## 1.00 2.25 3.50 5.50 8.00
    sort(x)
    ## [1] 1 2 3 4 6 8
    rank(x)
    ## [1] 1 3 4 5 2 6

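To make clear what var() and sd() compute, the sketch below reproduces both by hand for the vector on the slide, using the sample variance with denominator n - 1. The names n, dev, var_manual and sd_manual are illustrative; median() is a base R function not shown on the slide.

```r
x <- c(1, 3, 4, 6, 2, 8)
n <- length(x)

dev <- x - mean(x)                   # deviations from the mean
var_manual <- sum(dev^2) / (n - 1)   # sample variance divides by n - 1
sd_manual  <- sqrt(var_manual)       # standard deviation is its square root

c(var_manual, var(x))   # both 6.8
c(sd_manual, sd(x))     # both match
median(x)               # 3.5, the 50% quantile
```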
R functions: Using functions on vectors

    x <- c(4, 16, 9, 25)
    sqrt(x)
    ## [1] 2 4 3 5
    log(x)
    ## [1] 1.386294 2.772589 2.197225 3.218876
    exp(sqrt(x))
    ## [1]   7.389056  54.598150  20.085537 148.413159

Data: What are data?

Dataset

        breed          size     litters   weight
    1   Maine coon     large    2         5.1
    2   Russian blue   small    0         3.9
    3   Bengal         medium   0         4.5
    4   Ragdoll        medium   1         4.8
    5   Chartreux      large    1         5.2
    6   Siamese        small    2         4.1
    7   Persian        medium   2         4.2
    8   Maine coon     large    3         4.8

- Rows are observations, columns are variables
- Variable types: breed is nominal, size is ordinal, litters is discrete, weight is continuous

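In R, such a dataset is stored as a data frame. The sketch below types in the first three rows of the table by hand. The object name cats and the choice to store size as an ordered factor are assumptions made for illustration, not something the slides prescribe.

```r
# First three rows of the dataset, entered manually
cats <- data.frame(
  breed   = c("Maine coon", "Russian blue", "Bengal"),           # nominal
  size    = factor(c("large", "small", "medium"),
                   levels = c("small", "medium", "large"),
                   ordered = TRUE),                               # ordinal
  litters = c(2L, 0L, 0L),                                        # discrete
  weight  = c(5.1, 3.9, 4.5)                                      # continuous
)

str(cats)    # one line per variable, with its type
nrow(cats)   # 3 observations
ncol(cats)   # 4 variables
```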
Data: Reading in data

- Different formats: .xls(x), .csv, .txt, ...
- Most important distinguishing properties:
    header: does the first row contain column names?
    column separator: comma, semicolon, tab?
    decimal sign: point, comma?
- General function in R to read in data: read.table()

    args(read.table)
    ## function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
    ##     numerals = c("allow.loss", "warn.loss", "no.loss"), row.names,
    ##     col.names, as.is = !stringsAsFactors, na.strings = "NA",
    ##     colClasses = NA, nrows = -1, skip = 0, check.names = TRUE,
    ##     fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE,
    ##     comment.char = "#", allowEscapes = FALSE, flush = FALSE,
    ##     stringsAsFactors = default.stringsAsFactors(), fileEncoding = "",
    ##     encoding = "unknown", text, skipNul = FALSE)
    ## NULL

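The three properties listed above map directly onto the header, sep and dec arguments of read.table(). The sketch below reads a small, made-up two-row table supplied inline through the text argument, so that it runs without any file on disk. The names raw and d are illustrative.

```r
# A small semicolon-separated table with a header row and decimal points
raw <- "breed;size;litters;weight
Bengal;medium;0;4.5
Chartreux;large;1;5.2"

d <- read.table(text = raw,
                header = TRUE,   # first row contains the column names
                sep = ";",       # column separator
                dec = ".")       # decimal sign
print(d)
sapply(d, class)                 # type of each column
```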
Data: Reading in data

Specific functions for specific formats

    Function                    Format   Header   Column separator   Decimal sign
    read.csv()                  .csv     TRUE     ","                "."
    read.csv(file, sep=";")     .csv     TRUE     ";"                "."
    read.csv2()                 .csv     TRUE     ";"                ","
    read.delim()                .txt     TRUE     tab                "."
    read.delim2()               .txt     TRUE     tab                ","

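As a quick check of the table, the sketch below reads two tiny made-up inputs, one in the semicolon and decimal-comma layout and one tab-separated, again through the text argument rather than from files. All object names and values are hypothetical.

```r
# Hypothetical two-row inputs, given inline instead of as files
eu  <- "id;weight\n1;5,1\n2;3,9"       # semicolon separator, decimal comma
tab <- "id\tweight\n1\t5.1\n2\t3.9"    # tab separator, decimal point

d_eu  <- read.csv2(text = eu)          # defaults: sep = ";", dec = ","
d_tab <- read.delim(text = tab)        # defaults: sep = "\t", dec = "."

d_eu$weight     # 5.1 3.9
d_tab$weight    # 5.1 3.9
```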
Data: Reading in cats data

In this course we use .csv files: cats.csv

First open the .csv file in a text editor such as Notepad

    breed;size;litters;weight
    Maine coon;large;2;5.1
    Russian Blue;small;0;3.9
    Bengal;medium;0;4.5
    British Shorthair;medium;1;4.8
    Chartreux;large;1;5.2
    Siamese;small;2;4.1
    Persian;medium;2;4.2
    Maine Coon;large;3;4.8

Data: Reading in cats data

Inspecting cats.csv shows:

- separator: semicolon
- decimal sign: point

Data: Reading in cats data

- separator: semicolon
- decimal sign: point

In the table of reading functions, this combination corresponds to the row with format .csv, header TRUE, column separator ";" and decimal sign ".".

Most appropriate function: read.csv(file, sep=";")

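Putting this together, the sketch below reads the cats data with read.csv() and sep = ";". To keep the example self-contained, the file contents are supplied inline through the text argument; with the actual file one would pass the path instead, for example read.csv("cats.csv", sep = ";"), assuming cats.csv sits in the working directory. The object names csv_text, cats and cats_wrong are illustrative.

```r
# Contents of cats.csv as shown on the slide
csv_text <- "breed;size;litters;weight
Maine coon;large;2;5.1
Russian Blue;small;0;3.9
Bengal;medium;0;4.5
British Shorthair;medium;1;4.8
Chartreux;large;1;5.2
Siamese;small;2;4.1
Persian;medium;2;4.2
Maine Coon;large;3;4.8"

# Semicolon separator, decimal point
cats <- read.csv(text = csv_text, sep = ";")
str(cats)
mean(cats$weight)    # 4.575

# For contrast: read.csv2() expects a decimal comma,
# so weight is not recognised as a numeric column
cats_wrong <- read.csv2(text = csv_text)
is.numeric(cats_wrong$weight)    # FALSE
```

Checking the column types with str() right after reading is a cheap way to catch a wrong separator or decimal sign.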