CME/STATS 195
Lecture 2: Programming and Communicating in R
Evan Rosenman
April 4, 2019

Announcements

- There will be no lecture on Thursday, April 25th. We will meet for the final time instead on Tuesday, April 30th.
- Please save debugging questions for Piazza or Office Hours.
- Auditors: please see me after class.

Contents

- Exercise with Data Frames
- Programming Style
- Control flow statements
- Functions
- Communicating with R Markdown

Exercise with Data Frames

Data frames

A data frame is a table, or 2D array-like structure, in which:

- Columns can store data of different types, e.g. numeric, character, etc.
- Each column must contain the same number of data items.
- The column names should be non-empty.
- The row names should be unique.

    # Create the data frame.
    employees <- data.frame(
      row.names = c("E1", "E2", "E3", "E4", "E5"),
      name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
      salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
      start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                             "2014-05-11", "2015-03-27")),
      stringsAsFactors = FALSE
    )

    # Print the data frame.
    employees
    ##        name salary start_date
    ## E1     Rick 623.30 2012-01-01
    ## E2      Dan 515.20 2013-09-23
    ## E3 Michelle 611.00 2014-11-15
    ## E4     Ryan 729.00 2014-05-11
    ## E5     Gary 843.25 2015-03-27

Useful functions for data frames

    # Get the structure of the data frame.
    str(employees)
    ## 'data.frame': 5 obs. of 3 variables:
    ##  $ name      : chr "Rick" "Dan" "Michelle" "Ryan" ...
    ##  $ salary    : num 623 515 611 729 843
    ##  $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

    # Print first few rows of the data frame.
    head(employees)
    ##        name salary start_date
    ## E1     Rick 623.30 2012-01-01
    ## E2      Dan 515.20 2013-09-23
    ## E3 Michelle 611.00 2014-11-15
    ## E4     Ryan 729.00 2014-05-11
    ## E5     Gary 843.25 2015-03-27

    # Print statistical summary of the data frame.
    summary(employees)
    ##      name               salary         start_date
    ##  Length:5           Min.   :515.2   Min.   :2012-01-01
    ##  Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
    ##  Mode  :character   Median :623.3   Median :2014-05-11
    ##                     Mean   :664.4   Mean   :2014-01-14
    ##                     3rd Qu.:729.0   3rd Qu.:2014-11-15
    ##                     Max.   :843.2   Max.   :2015-03-27

Subsetting data frames

We can extract specific rows:

    # using row names
    employees["E1", ]
    ##    name salary start_date
    ## E1 Rick  623.3 2012-01-01
    employees[c("E2", "E3"), ]
    ##        name salary start_date
    ## E2      Dan  515.2 2013-09-23
    ## E3 Michelle  611.0 2014-11-15

    # using integer indexing (selects the same rows as above)
    employees[1, ]
    employees[c(2, 3), ]

We can extract specific columns:

    # using column names
    employees$name[1:3]
    ## [1] "Rick"     "Dan"      "Michelle"
    employees[, c("name", "salary")]
    ##        name salary
    ## E1     Rick 623.30
    ## E2      Dan 515.20
    ## E3 Michelle 611.00
    ## E4     Ryan 729.00
    ## E5     Gary 843.25

    # or using integer indexing
    employees[1:3, 1]
    ## [1] "Rick"     "Dan"      "Michelle"

Practice with data frames

R comes with several built-in datasets. We will use mtcars, from the 1974 Motor Trend US magazine, which comprises information on 32 selected car models.

- Call str(), head(), and summary() on mtcars.
- Use the $ syntax to extract the mpg column from mtcars.
- Run the hist() function on the mpg column to see the distribution of mpg values.
- Run the plot() function on the mpg and cyl columns to see how they compare.

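One possible way to work through the exercise is sketched below. It is not the official solution; the variable name mpg and the axis labels are choices made here for illustration, and plotting cyl on the horizontal axis is just one reasonable reading of "compare".

```r
# mtcars ships with base R (the datasets package), so nothing needs loading.
str(mtcars)       # column types and a preview of the values
head(mtcars)      # first six rows
summary(mtcars)   # per-column summary statistics

# Extract the mpg column as a plain numeric vector.
mpg <- mtcars$mpg

# Distribution of miles per gallon across the 32 cars.
hist(mpg)

# Miles per gallon against number of cylinders (axis labels are illustrative).
plot(mtcars$cyl, mtcars$mpg, xlab = "cyl", ylab = "mpg")
```
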
Programming: style guide

A general note

R is a specialized programming language, which often encourages bad stylistic choices:

- Poor variable naming
- Uncommented code
- Code not optimized for readability
- Repeated code and failure to abstract it into functions

These bad practices make it harder to reuse code in the future and to share it with others!

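As a small illustration of the last point, the sketch below contrasts copy-pasted code with a version that abstracts the repeated step into a function. The data and the helper name rescale01 are made up for this example.

```r
# Hypothetical data for illustration.
heights <- c(150, 160, 170)
weights <- c(50, 65, 80)

# Repeated code: the same rescaling formula is typed out for each vector,
# so a typo in one copy is easy to make and hard to spot.
heights_scaled <- (heights - min(heights)) / (max(heights) - min(heights))
weights_scaled <- (weights - min(weights)) / (max(weights) - min(weights))

# Abstracted version: the formula lives in one place.
# 'rescale01' linearly maps a numeric vector onto the interval [0, 1].
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

heights_scaled <- rescale01(heights)
weights_scaled <- rescale01(weights)
```
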
Naming conventions

The first step of programming is naming things. In the “Hadley Wickham” R style convention:

File names are meaningful. Script files end with “.R”, and R Markdown files with “.Rmd”.

    # Good
    fit-models.R
    utility-functions.R

    # Bad (works but violates convention)
    foo.r
    stuff.r

Variable and function names are lowercase or camelCase.

    # Good
    day_one
    dayOne
    day_1

    # Bad (works but violates convention)
    first_day_of_the_month
    DayOne

Spacing

Put spaces around all infix operators (=, +, -, <-, etc.):

    average <- mean(feet / 12 + inches, na.rm = TRUE)  # Good
    average<-mean(feet/12+inches,na.rm=TRUE)           # Bad

Put a space before left parentheses, except in a function call:

    # Good
    if (debug) do(x)
    plot(x, y)

    # Bad
    if(debug) do(x)
    plot (x, y)

For assignment, use ‘<-’, not ‘=’:

    # Good
    x <- 1 + 2

    # Bad (works but violates convention)
    x = 1 + 2

Comments and documentation (I)

Comment your code!

    # 'get_answer' returns the answer to life, the universe and everything else.
    get_answer <- function() {
      return(42)
    }
    # This is a comment

Comments are not subtitles, i.e. don’t just repeat the code nearly verbatim in the comments.

    # Bad comments:
    # Loop through all bananas in the bunch
    for (banana in bunch) {
      # make the monkey eat one banana
      MonkeyEat(banana)
    }

Comments and documentation (II)

Section headers can help separate big chunks of code handling different tasks.

    #######################################
    ##          data generation          ##
    #######################################
    x <- rnorm(100)
    y <- 12 * x + 5

    #######################################
    ##          make the plots           ##
    #######################################
    plot(x, y)

Programming: control flow

Booleans/logicals

Booleans are logical data types (TRUE/FALSE) associated with conditional statements. They allow us to modify the “control flow”.

    # equal: "=="
    5 == 5
    ## [1] TRUE

    # not equal: "!="
    5 != 5
    ## [1] FALSE

    # greater than/geq: ">" or ">="
    c(5 > 4, 5 >= 5)
    ## [1] TRUE TRUE

    # You can combine multiple booleans
    TRUE & TRUE   # AND
    ## [1] TRUE
    TRUE & FALSE  # AND
    ## [1] FALSE
    TRUE | FALSE  # OR
    ## [1] TRUE
    !(TRUE)       # NOT
    ## [1] FALSE

Booleans/logicals

When dealing with vectors of booleans, we can use & and | to evaluate elementwise. Remember the recycling property for vectors.

    c(TRUE, TRUE) & c(FALSE, TRUE)
    ## [1] FALSE  TRUE

    c(5 < 4, 7 == 0, 1 < 2) | c(5 == 5, 6 > 2, !FALSE)
    ## [1] TRUE TRUE TRUE

    c(TRUE, TRUE) & c(TRUE, FALSE, TRUE, FALSE)  # recycling
    ## [1]  TRUE FALSE  TRUE FALSE

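A common use of elementwise & is building a logical vector to select rows or elements. The sketch below reuses the salaries from the earlier employees example; the thresholds 600 and 800 and the names is_mid and mid_names are arbitrary choices for illustration.

```r
# Same names and salaries as the 'employees' data frame defined earlier.
employees <- data.frame(
  name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Elementwise AND: one TRUE/FALSE per employee.
is_mid <- employees$salary > 600 & employees$salary < 800

# Use the logical vector to keep only the matching names.
mid_names <- employees$name[is_mid]
mid_names
```
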
Booleans/logicals

If we use the double operators && or ||, only the first elements are compared. (The output below is from the R version used for these slides; since R 4.3.0, passing operands longer than one to && or || is an error.)

    c(TRUE, TRUE) && c(FALSE, TRUE)
    ## [1] FALSE

    c(5 < 4, 7 == 0, 1 < 2) || c(5 == 5, 6 > 2, !FALSE)
    ## [1] TRUE

    c(TRUE, TRUE) && c(TRUE, FALSE, TRUE, FALSE)
    ## [1] TRUE

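In practice, && and || are mostly used with single TRUE/FALSE values, for instance inside an if() condition. A minimal sketch, with made-up variables x and y: any() and all() reduce a logical vector to one value, and && skips its right-hand side when the left-hand side is already FALSE.

```r
x <- c(5, 7, 1)

# Reduce a logical vector to a single TRUE/FALSE.
all_positive <- all(x > 0)  # every element is greater than 0
any_large <- any(x > 6)     # at least one element is greater than 6

# Short-circuiting: because !is.null(y) is FALSE, 'y > 0' is never evaluated.
y <- NULL
safe <- !is.null(y) && y > 0
```
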
Control statements

Control flow is the order in which individual statements, instructions or function calls of a program are evaluated.

Control statements allow you to do more complicated tasks. Their execution results in a choice between two or more paths to follow.

- If / else
- For
- While

If statements

Decide whether a block of code should be executed based on the associated boolean expression.

Syntax. The if statement is followed by a boolean expression wrapped in parentheses. The conditional block of code is inside curly braces {}.

    if (traffic_light == "green") {
      print("Go.")
    }

‘if-else’ statements let you introduce more options:

    if (traffic_light == "green") {
      print("Go.")
    } else {
      print("Stay.")
    }

You can also use else if():

    if (traffic_light == "green") {
      print("Go.")
    } else if (traffic_light == "yellow") {
      print("Get ready.")
    } else {
      print("Stay.")
    }

For loops

A for loop is a statement which repeats the execution of a block of code a given number of times.

    for (i in 1:5) {
      print(i^2)
    }
    ## [1] 1
    ## [1] 4
    ## [1] 9
    ## [1] 16
    ## [1] 25

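Loops are often combined with if/else to treat each element differently. The sketch below uses made-up data (values) and a result vector named labels; seq_along() generates the indices 1, 2, ..., length(values).

```r
values <- c(3, 8, 5, 10)

# Preallocate the result so the loop fills it in place.
labels <- character(length(values))

for (i in seq_along(values)) {
  if (values[i] %% 2 == 0) {   # remainder 0 after division by 2
    labels[i] <- "even"
  } else {
    labels[i] <- "odd"
  }
}

labels
```
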
While loops

Similar to for loops, but the execution is repeated as long as the supplied boolean condition is TRUE.

    i <- 1
    while (i <= 5) {
      cat("i =", i, "\n")
      i <- i + 1
    }
    ## i = 1
    ## i = 2
    ## i = 3
    ## i = 4
    ## i = 5

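A while loop is most natural when the number of iterations is not known in advance. As a hedged illustration with invented numbers, the sketch below counts how many years a balance growing 10% per year takes to reach at least double its starting value.

```r
balance <- 100   # hypothetical starting balance
years <- 0

# Keep compounding until the balance has at least doubled.
while (balance < 200) {
  balance <- balance * 1.10
  years <- years + 1
}

years
```
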