Data frame manipulation: group_by, summarize

group_by, summarize, factors
Steve Bagley
somgen223.stanford.edu

Data frame manipulation: group_by, summarize

Set up cw1

    library(tidyverse)  # provides read_csv, str_c, and the dplyr verbs used below

    data_dir <- "https://somgen223.stanford.edu/data/"
    (cw1 <- read_csv(str_c(data_dir, "cw1.csv")))
    # A tibble: 5 x 4
      chick  time  diet weight
      <dbl> <dbl> <dbl>  <dbl>
    1     1     1     1    1.6
    2     1     2     1    3.4
    3     2     1     2    1.1
    4     2     2     2    3.3
    5     2     3     2    6.6
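If the course URL is unreachable, the same five rows can be typed in by hand. This is only a sketch: the values come from the printed tibble, and the column order (chick, time, diet, weight) is an assumption inferred from that printout. `tribble` is from the tibble package, which is part of the tidyverse.

```r
library(tibble)

# Hand-built copy of cw1. Values are taken from the printed output;
# the column order is an assumption.
cw1 <- tribble(
  ~chick, ~time, ~diet, ~weight,
       1,     1,     1,     1.6,
       1,     2,     1,     3.4,
       2,     1,     2,     1.1,
       2,     2,     2,     3.3,
       2,     3,     2,     6.6
)

dim(cw1)  # number of rows, then number of columns
```

With this local copy, the pipelines in the rest of the slides can be run without a network connection.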

Computing over groups

    cw1 %>%
      distinct(diet)
    # A tibble: 2 x 1
       diet
      <dbl>
    1     1
    2     2

• There are two different diets.
• What is the mean weight of all the chicks on each diet?

Computing the mean weight of each diet

    cw1 %>%
      group_by(diet) %>%
      summarize(mean_weight = mean(weight))
    # A tibble: 2 x 2
       diet mean_weight
      <dbl>       <dbl>
    1     1        2.5
    2     2        3.67

group_by

    cw1 %>%
      group_by(diet)
    # A tibble: 5 x 4
    # Groups:   diet [2]
      chick  time  diet weight
      <dbl> <dbl> <dbl>  <dbl>
    1     1     1     1    1.6
    2     1     2     1    3.4
    3     2     1     2    1.1
    4     2     2     2    3.3
    5     2     3     2    6.6

• This looks like the original data frame, except for the additional comment line # Groups: ..., which records the variables used to form groups. No analysis has happened yet.
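The grouping record can also be inspected directly rather than read off the printout. A small sketch, assuming dplyr is installed; `cw1_local` is a hand-typed stand-in for cw1 (the name and the column order are assumptions), and `group_vars`, `n_groups`, `is_grouped_df`, and `ungroup` are dplyr functions.

```r
library(dplyr)

# Hand-typed stand-in for cw1 (values copied from the printed tibble).
cw1_local <- tibble(
  chick  = c(1, 1, 2, 2, 2),
  time   = c(1, 2, 1, 2, 3),
  diet   = c(1, 1, 2, 2, 2),
  weight = c(1.6, 3.4, 1.1, 3.3, 6.6)
)

grouped <- cw1_local %>% group_by(diet)

group_vars(grouped)     # names of the grouping variables
n_groups(grouped)       # number of groups
is_grouped_df(grouped)  # TRUE for a grouped data frame

# ungroup() drops the grouping record; the rows themselves are unchanged.
plain <- ungroup(grouped)
is_grouped_df(plain)
```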

summarize

    cw1 %>%
      group_by(diet) %>%
      summarize(mean_weight = mean(weight))

• summarize takes a grouped data frame and performs the specified operation separately on the values in each group.
• In this case, mean will get called 2 times, once on each subset of rows corresponding to each value of diet.
• The results for each group are then combined into a single data frame with the final result.
• Note that the result has one row for each group value.
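The three steps described above (split the rows by group, apply the function to each group, combine the results) can be spelled out by hand in base R. This is an illustration of the idea only, not how dplyr is implemented; `cw1_local` is a hand-typed stand-in holding just the two columns needed.

```r
# Hand-typed stand-in for cw1 (values copied from the printed tibble).
cw1_local <- data.frame(
  diet   = c(1, 1, 2, 2, 2),
  weight = c(1.6, 3.4, 1.1, 3.3, 6.6)
)

# 1. Split: one vector of weights per diet value.
weights_by_diet <- split(cw1_local$weight, cw1_local$diet)

# 2. Apply: mean is called once per group (two calls here).
group_means <- sapply(weights_by_diet, mean)

# 3. Combine: one row per group value.
result <- data.frame(
  diet        = as.numeric(names(group_means)),
  mean_weight = unname(group_means)
)
result
```

The group_by/summarize pipeline expresses the same computation in one step and keeps the grouping column in the result automatically.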

summarize on an ungrouped data frame

    cw1 %>%
      summarize(mean_weight = mean(weight))
    # A tibble: 1 x 1
      mean_weight
            <dbl>
    1        3.20

• Note also that you can use summarize on an ungrouped data frame: you'll get one row of results. In this case, it contains the overall mean weight of all chicks.

Computing more than one summary at the same time

    cw1 %>%
      group_by(diet) %>%
      summarize(mean_weight = mean(weight),
                max_weight = max(weight))
    # A tibble: 2 x 3
       diet mean_weight max_weight
      <dbl>       <dbl>      <dbl>
    1     1        2.5         3.4
    2     2        3.67        6.6

• max(weight) will return the maximum value of the weight column.
• Do not call max on the entire data frame, as in max(cw1)!
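To see why calling max on a whole data frame is a mistake, here is a small base R sketch. The data frame `toy` is hypothetical (it is not course data): its time values are chosen to be larger than any weight so that the problem is visible.

```r
# Hypothetical data, not from the course: time is in days, so its
# values are larger than any weight.
toy <- data.frame(
  time   = c(0, 10, 21),
  weight = c(1.6, 3.4, 6.6)
)

# On an all-numeric data frame, max pools every column together.
max(toy)         # 21: a time value, not a weight

# Selecting the column first gives the intended answer.
max(toy$weight)  # 6.6
```

Inside summarize, writing max(weight) refers to the column directly, which avoids this pitfall.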

Exercise: the range of weights

• For each diet, compute the range of weights (max - min), and sort the result by the range.

Answer: the range of weights

    cw1 %>%
      group_by(diet) %>%
      summarize(weight_range = max(weight) - min(weight)) %>%
      arrange(weight_range)
    # A tibble: 2 x 2
       diet weight_range
      <dbl>        <dbl>
    1     1          1.8
    2     2          5.5

Exercise: max weight of each chick

• For each chick, compute its maximum weight.

Answer: max weight of each chick

    cw1 %>%
      group_by(chick) %>%
      summarize(max_weight = max(weight))
    # A tibble: 2 x 2
      chick max_weight
      <dbl>      <dbl>
    1     1        3.4
    2     2        6.6

How many chicks are on each diet?

    cw1 %>%
      group_by(diet) %>%
      summarize(n_diet = n())
    # A tibble: 2 x 2
       diet n_diet
      <dbl>  <int>
    1     1      2
    2     2      3

• The function n() returns the number of rows in a group.
• group_by/summarize computes the number of rows in each group.
• Each row of cw1 is one measurement, so n_diet counts the measurements on each diet, not the distinct chicks.
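Because n() counts rows, a chick measured several times is counted several times. A sketch of two related dplyr calls, assuming dplyr is installed: `count` is a shortcut for the group_by plus summarize(n()) pattern, and `n_distinct` counts unique values. `cw1_local` is a hand-typed stand-in for the relevant columns of cw1.

```r
library(dplyr)

# Hand-typed stand-in for cw1 (values copied from the printed tibble).
cw1_local <- tibble(
  chick = c(1, 1, 2, 2, 2),
  diet  = c(1, 1, 2, 2, 2)
)

# Rows per diet; count() names its result column n.
rows_per_diet <- cw1_local %>%
  count(diet)

# Distinct chicks per diet.
chicks_per_diet <- cw1_local %>%
  group_by(diet) %>%
  summarize(n_chicks = n_distinct(chick))

rows_per_diet
chicks_per_diet
```

In this small data set each diet has a single chick, so the two results differ.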

Exercise: How many measurements for each chick?

• Compute the number of measurements (rows) for each chick.

Answer: How many measurements for each chick?

    cw1 %>%
      group_by(chick) %>%
      summarize(n_measurements = n())
    # A tibble: 2 x 2
      chick n_measurements
      <dbl>          <int>
    1     1              2
    2     2              3

Factors

Defining factors

• Factors are a powerful, but sometimes perplexing, way to work with discrete-valued data.
• The possible values of a factor are drawn from a finite set of alternatives or categories. Factors are often used in graphics and analysis for grouping.
• Example: encoding the sex of a human subject as either M or F and grouping by sex.
• Example: encoding the names of the fifty US states and grouping by state.
• Note that many measured values are better represented not as factors but as either integers (such as for counting) or floating-point (real-valued) numbers. Example: number of subjects, weight.
• We will return to factors later in the course.
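As a first look at what a factor is in R, here is a short base R sketch built on the M/F example above. The five values are hypothetical and the level order is chosen explicitly for illustration.

```r
# Hypothetical observations; the allowed categories are given as levels.
sex <- factor(c("M", "F", "F", "M", "F"), levels = c("F", "M"))

levels(sex)      # the finite set of categories, in the order given
table(sex)       # number of observations in each category
as.integer(sex)  # the integer codes R stores, indexing into levels(sex)
```

The underlying integer codes are the reason factors can be perplexing: they are category labels, not numbers to do arithmetic on.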

Reading

• Read: "5 Data transformation" in R for Data Science (sections 5.6 to 5.7).
• Watch at least part of this video: "Tidy Tuesday screencast: analyzing malaria incidence in R" on YouTube (or another video from the same channel).


