~/>_ More on dplyr

~/> previously …

    gg_miss_fct(x = riskfactors, fct = marital)

    # Recode a set of sentinel values to NA
    quick_na <- function(x, vals = c(9, 10, 11, 97, 99)) {
      x[x %in% vals] <- NA
      x
    }

    num_vec <- c(1:12, 97, 97, 99, NA)

    num_vec
    #>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 97 97 99 NA

    quick_na(num_vec)
    #>  [1]  1  2  3  4  5  6  7  8 NA NA NA 12 NA NA NA NA

~/>_ Working with dplyr

~/>_ Standard verbs

group_by()
    Group the data at the level we want, such as “Religion by Region” or “Authors by Publications by Year”.

filter() rows, select() columns
    Filter or select pieces of the data. This gets us the subset of the table we want to work on.

mutate()
    Mutate the data by creating new variables at the current level of grouping. Mutating adds new columns to the table.

summarize()
    Summarize or aggregate the grouped data. This creates new variables at a higher level of grouping. For example, we might calculate means with mean() or counts with n(). This results in a smaller summary table, which we can work on further if we want.
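A minimal sketch of the four steps in order. It uses the built-in mtcars data rather than anything from the slides, and the object names (cars_sub, cars_dev, cars_sum) are made up for illustration.

```r
library(dplyr)

# filter() rows and select() columns to get the working subset.
cars_sub <- mtcars %>%
  filter(hp > 100) %>%
  select(cyl, mpg, wt)

# mutate() adds a column at the current level of grouping;
# the number of rows does not change.
cars_dev <- cars_sub %>%
  group_by(cyl) %>%
  mutate(mpg_dev = mpg - mean(mpg)) %>%
  ungroup()

# summarize() collapses each group to a single row,
# giving a smaller table at a higher level of grouping.
cars_sum <- cars_sub %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg))
```

cars_dev has as many rows as cars_sub, while cars_sum has one row per value of cyl.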

~/>_ Scoped verbs

action_all()   Take action on all variables
action_if()    Take action on a subset of variables selected by a criterion
action_at()    Take action on a subset of variables selected by their names

action can be mutate, summarize, or filter

Useful scope-setters: is.character(), is.factor(), is.numeric(), is.logical(), is.integer(), is.ordered(), lubridate::is.Date()

Useful scoping helpers: starts_with(), ends_with(), contains(), one_of(), matches(), vars(), everything()
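A short sketch of a scope-setter and a scoping helper in use, on the built-in iris data (not a data set from the slides). Since dplyr 1.0.0 the scoped verbs are superseded by across(), so the sketch shows the scoped form next to what should be the equivalent across() form. The object names are made up for illustration.

```r
library(dplyr)

# Scoped form: summarize every column that satisfies a criterion.
scoped <- iris %>%
  group_by(Species) %>%
  summarize_if(is.numeric, mean, na.rm = TRUE)

# across() form (dplyr >= 1.0.0), selecting by the same criterion.
by_type <- iris %>%
  group_by(Species) %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

# Selecting by name instead, with a named list of functions.
# The list names become suffixes on the output columns.
by_name <- iris %>%
  group_by(Species) %>%
  summarize(across(starts_with("Sepal"), list(avg = mean, sd = sd)))
```

The first two tables should match; the third has columns such as Sepal.Length_avg and Sepal.Length_sd.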

Examples

    organdata %>%
      group_by(world) %>%
      summarize_if(is.numeric, mean, na.rm = TRUE) %>%
      select(world, donors, pubhealth, roads) %>%
      select_all(tools::toTitleCase)

    # A tibble: 4 x 4
      World       Donors Pubhealth Roads
      <chr>        <dbl>     <dbl> <dbl>
    1 NA            28.1      5.45 161.
    2 Corporatist   16.8      6.40 132.
    3 Liberal       15.6      5.75 111.
    4 SocDem        14.8      6.54  82.7

Examples

    # funs() is deprecated as of dplyr 0.8.0;
    # list(avg = mean, sd = sd) is the current form.
    organdata %>%
      group_by(country) %>%
      summarize_if(is.numeric, funs(avg = mean, sd = sd), na.rm = TRUE) %>%
      select(country, donors_avg, donors_sd, roads_avg, roads_sd) %>%
      arrange(desc(donors_avg))

Examples

    # A tibble: 17 x 5
       country        donors_avg donors_sd roads_avg roads_sd
       <chr>               <dbl>     <dbl>     <dbl>    <dbl>
     1 Spain                28.1     4.96      161.     35.3
     2 Austria              23.5     2.42      150.     30.3
     3 Belgium              21.9     1.94      155.     20.6
     4 United States        20.0     1.33      155.      8.35
     5 Ireland              19.8     2.48      118.     10.8
     6 Finland              18.4     1.53       93.6    19.0
     7 France               16.8     1.60      156.     20.1
     8 Norway               15.4     1.11       70.0     6.68
     9 Switzerland          14.2     1.71       96.4    21.7
    10 Canada               14.0     0.751     109.     17.7
    11 Netherlands          13.7     1.55       76.1     9.93
    12 United Kingdom       13.5     0.775      67.9    10.5
    13 Sweden               13.1     1.75       72.3    13.2
    14 Denmark              13.1     1.47      102.     12.4
    15 Germany              13.0     0.611     113.     25.9
    16 Italy                11.1     4.28      122.     10.2
    17 Australia            10.6     1.14      105.     14.3

~/>_ Scoping and Mapping

map() and friends are the general case

    out <- lm(donors ~ pop + gdp + roads, data = organdata)
    summary(out)

    Call:
    lm(formula = donors ~ pop + gdp + roads, data = organdata)

    Residuals:
        Min      1Q  Median      3Q     Max
    -13.423  -2.658  -0.080   1.963  15.864

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.506e+00  2.364e+00   1.906   0.0580 .
    pop         -1.153e-05  5.643e-06  -2.043   0.0423 *
    gdp          1.082e-04  7.527e-05   1.438   0.1521
    roads        8.988e-02  1.032e-02   8.710 1.14e-15 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 4.325 on 200 degrees of freedom
      (34 observations deleted due to missingness)
    Multiple R-squared:  0.2944, Adjusted R-squared:  0.2838
    F-statistic: 27.81 on 3 and 200 DF,  p-value: 4.486e-15

map() and friends are the general case

    > names(summary(out))
     [1] "call"          "terms"         "residuals"     "coefficients"  "aliased"
     [6] "sigma"         "df"            "r.squared"     "adj.r.squared" "fstatistic"
    [11] "cov.unscaled"  "na.action"

map() and friends are the general case

    organdata %>%
      split(.$world) %>%
      map(~ lm(donors ~ pop + gdp + roads, data = .)) %>%
      map(summary) %>%
      map_dbl("r.squared")

We’ll see cleaner ways to do this shortly.
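The same split, fit, summarize, extract pattern can be tried without organdata. This sketch swaps in the built-in mtcars data and a simpler model (mpg ~ wt); the name r2_by_cyl is made up for illustration.

```r
library(dplyr)
library(purrr)

# One model per group, then pull a single number out of each summary.
r2_by_cyl <- mtcars %>%
  split(.$cyl) %>%                      # list of data frames, one per cyl
  map(~ lm(mpg ~ wt, data = .x)) %>%    # list of fitted models
  map(summary) %>%                      # list of model summaries
  map_dbl("r.squared")                  # named numeric vector

r2_by_cyl
```

map_dbl() with a string extracts the element of that name from each list item, which is why the names listed by names(summary(out)) matter here.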

~/>_ Zero Counts in dplyr

    data %>%
      select(start_year, job_type1) %>%
      group_by(start_year, job_type1) %>%
      summarize(n = n()) %>%
      mutate(pct = (n/sum(n))*100)

    # A tibble: 689 x 4
    # Groups:   start_year [38]
       start_year job_type1                        n    pct
       <date>     <chr>                        <int>  <dbl>
     1 1945-01-03 NA                               5  0.880
     2 1945-01-03 Acting/entertainer              11  1.94
     3 1945-01-03 Aeronautics                      2  0.352
     4 1945-01-03 Agriculture                     65 11.4
     5 1945-01-03 Business or banking            108 19.0
     6 1945-01-03 Clergy                           3  0.528
     7 1945-01-03 Congressional Aide              11  1.94
     8 1945-01-03 Construction/building trades     9  1.58
     9 1945-01-03 Education                       58 10.2
    10 1945-01-03 Engineering                      2  0.352
    # … with 679 more rows

    df <- data %>%
      filter(position == "U.S. Representative",
             start > "1945-01-01") %>%
      group_by(pid) %>%
      nest() %>%
      mutate(data = map(data, ~ mutate(.x, term_id = 1 + congress - first(congress)))) %>%
      unnest() %>%
      filter(term_id == 1,
             party %in% c("Democrat", "Republican"),
             start_year > int_to_year(2012)) %>%
      group_by(start_year, party, sex) %>%
      select(pid, start_year, party, sex)

This caused the difference in N you saw in class. Fixed here.

    > df
    # A tibble: 293 x 4
    # Groups:   start_year, party, sex [14]
         pid start_year party      sex
       <int> <date>     <chr>      <chr>
     1  3160 2013-01-03 Republican M
     2  3161 2013-01-03 Democrat   F
     3  3162 2013-01-03 Democrat   M
     4  3163 2013-01-03 Republican M
     5  3164 2013-01-03 Democrat   M
     6  3165 2013-01-03 Republican M
     7  3166 2013-01-03 Republican M
     8  3167 2013-01-03 Democrat   F
     9  3168 2013-01-03 Republican M
    10  3169 2013-01-03 Democrat   M
    # … with 283 more rows

    df %>%
      group_by(start_year, party, sex) %>%
      summarize(N = n()) %>%
      mutate(freq = N / sum(N))

    #> # A tibble: 14 x 5
    #> # Groups:   start_year, party [8]
    #>    start_year party      sex       N   freq
    #>    <date>     <chr>      <chr> <int>  <dbl>
    #>  1 2013-01-03 Democrat   F        21 0.362
    #>  2 2013-01-03 Democrat   M        37 0.638
    #>  3 2013-01-03 Republican F         8 0.101
    #>  4 2013-01-03 Republican M        71 0.899
    #>  5 2015-01-03 Democrat   M         1 1
    #>  6 2015-01-03 Republican M         5 1
    #>  7 2017-01-03 Democrat   F         6 0.24
    #>  8 2017-01-03 Democrat   M        19 0.76
    #>  9 2017-01-03 Republican F         2 0.0667
    #> 10 2017-01-03 Republican M        28 0.933
    #> 11 2019-01-03 Democrat   F        33 0.647
    #> 12 2019-01-03 Democrat   M        18 0.353
    #> 13 2019-01-03 Republican F         1 0.0323
    #> 14 2019-01-03 Republican M        30 0.968

    ## Hex colors for sex
    sex_colors <- c("#E69F00", "#993300")

    ## Hex color codes for Dem Blue and Rep Red
    party_colors <- c("#2E74C0", "#CB454A")

    ## Group labels
    mf_labs <- tibble(M = "Men", F = "Women")

    theme_set(theme_minimal())

    df %>%
      group_by(start_year, party, sex) %>%
      summarize(N = n()) %>%
      mutate(freq = N / sum(N)) %>%
      ggplot(aes(x = start_year,
                 y = freq,
                 fill = sex)) +
      geom_col() +
      scale_y_continuous(labels = scales::percent) +
      scale_fill_manual(values = sex_colors, labels = c("Women", "Men")) +
      labs(x = "Year", y = "Percent", fill = "Group") +
      facet_wrap(~ party)

    df %>%
      group_by(start_year, party, sex) %>%
      summarize(N = n()) %>%
      mutate(freq = N / sum(N)) %>%
      ggplot(aes(x = start_year,
                 y = freq,
                 color = sex)) +
      geom_line(size = 1.1) +
      scale_y_continuous(labels = scales::percent) +
      scale_color_manual(values = sex_colors, labels = c("Women", "Men")) +
      guides(color = guide_legend(reverse = TRUE)) +
      labs(x = "Year", y = "Percent", color = "Group") +
      facet_wrap(~ party)

Should go to zero!
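The line cannot go to zero because group_by() and summarize() only return rows for combinations that occur in the data: there are no rows for women first elected in 2015, so that cell is absent rather than zero. One possible fix, sketched here on hypothetical toy data (the tibble toy and its values are invented, and this is not necessarily the approach the slides go on to use), is to add the missing combinations with tidyr::complete() before computing proportions.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: no women among the 2015 entrants.
toy <- tibble(
  year = c(2013, 2013, 2013, 2015, 2015),
  sex  = c("F", "M", "M", "M", "M")
)

# Counting drops the empty cell: only 3 of the 4 year-by-sex cells appear.
counts <- toy %>%
  group_by(year, sex) %>%
  summarize(N = n(), .groups = "drop")

# complete() adds the missing year-by-sex combination with N = 0,
# so the proportion for that cell is an explicit zero.
filled <- counts %>%
  complete(year, sex, fill = list(N = 0L)) %>%
  group_by(year) %>%
  mutate(freq = N / sum(N)) %>%
  ungroup()
```

Plotting filled instead of counts gives the line a point at zero for the empty cell.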
