
Advanced column-oriented methods: _all, _at, _if
Steve Bagley
somgen223.stanford.edu

Different ways to select columns
• It is easy to use filter to select rows: the filter expressions refer to columns by name and test the values in them.
• To use select, we provide the column names.
• What if we want to select columns based on some aspect of their names?
• What if we want to select columns based on the values in those columns, such as all columns that contain at least one NA value?
• Somehow, we need to compute which columns we want.
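To make "compute which columns we want" concrete: apply a test to every column, turn the results into a character vector of names, then hand those names to select. A minimal sketch, assuming dplyr is installed; the small tibble df is made up for illustration.

```r
library(dplyr)

# Made-up example data with some missing values
df <- tibble(
  id     = 1:3,
  weight = c(0.1, NA, 0.3),
  group  = c("a", "b", NA)
)

# Apply a test to every column; sapply returns a named logical vector
# (here: id = FALSE, weight = TRUE, group = TRUE)
has_na <- sapply(df, function(x) any(is.na(x)))

# Turn the logical vector into the names of the columns we want
na_cols <- names(df)[has_na]

# Select those columns by name
df_na <- df %>% select(one_of(na_cols))
names(df_na)
```

The select_if verb later in these slides folds the first two steps into a single call.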

select_at: when you can compute the names of the columns
gene_exp2 <- read_csv(str_c(data_dir, "gene_exp2.csv"))
## select by using the exact names
gene_exp2 %>% select(d1_g1, d1_g2)
# A tibble: 4 x 2
  d1_g1 d1_g2
  <dbl> <dbl>
1     1     3
2     3     4
3     6     6
4     2     5
## select by computing which columns match a pattern
gene_exp2 %>% select_at(vars(starts_with("d1")))
# A tibble: 4 x 2
  d1_g1 d1_g2
  <dbl> <dbl>
1     1     3
2     3     4
3     6     6
4     2     5
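A self-contained version of the comparison above. The CSV file is not available here, so gene_exp2 is rebuilt by hand from the values printed on these slides (an assumption: the real gene_exp2.csv may contain more). Requires dplyr.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# Exact names, typed by hand
by_name <- gene_exp2 %>% select(d1_g1, d1_g2)

# Names computed from a pattern: every column whose name starts with "d1"
by_pattern <- gene_exp2 %>% select_at(vars(starts_with("d1")))

# Both approaches pick the same two columns
names(by_pattern)
```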

Functions you can use with select_at (from the documentation):
• starts_with: starts with a prefix
• ends_with: ends with a suffix
• contains: contains a literal string
• matches: matches a regular expression
• num_range: matches a numerical range like x01, x02, x03
• one_of: matches variable names in a character vector
• everything: matches all variables
• last_col: selects the last variable, possibly with an offset
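A quick tour of several helpers from the list, run against a tibble whose column names are hypothetical and chosen only so that each helper has something to match. A minimal sketch, assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical column names chosen to exercise several helpers
df <- tibble(x01 = 1, x02 = 2, x03 = 3, y_a = 4, y_b = 5)

# num_range(prefix, range, width): width = 2 zero-pads, giving x01, x02
r_num <- names(select_at(df, vars(num_range("x", 1:2, width = 2))))
# c("x01", "x02")

# matches(): a regular expression ("y_" then one letter, at the end)
r_rx <- names(select_at(df, vars(matches("^y_[a-z]$"))))
# c("y_a", "y_b")

# one_of(): names held in a character vector
wanted <- c("x03", "y_b")
r_one <- names(select_at(df, vars(one_of(wanted))))
# c("x03", "y_b")

# last_col(offset = 1): one column before the last
r_last <- names(select_at(df, vars(last_col(offset = 1))))
# "y_a"

# everything(): all remaining columns, handy for moving one column first
r_all <- names(select_at(df, vars(y_b, everything())))
# c("y_b", "x01", "x02", "x03", "y_a")
```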

Example of select_at
gene_exp2 %>% select_at(vars(contains("_")))
# A tibble: 4 x 4
  d1_g1 d1_g2 d2_g1 d2_g2
  <dbl> <dbl> <dbl> <dbl>
1     1     3    10     1
2     3     4    20     1
3     6     6    30     1
4     2     5    40 12121
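contains() looks for a literal string, while matches() takes a regular expression; the difference shows up as soon as the pattern includes a regex metacharacter such as ".". A minimal sketch with made-up column names, assuming dplyr.

```r
library(dplyr)

# Made-up names: only a.b contains a literal dot
df <- tibble(a.b = 1, ab = 2)

# contains(".") looks for the literal character "."
lit <- names(select_at(df, vars(contains("."))))
# "a.b"

# matches(".") treats "." as the regex for "any character", so both match
rx <- names(select_at(df, vars(matches("."))))
# c("a.b", "ab")
```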

Example of select_at
gene_exp2 %>% select_at(vars(-contains("_"), last_col()))
# A tibble: 4 x 2
  gene   d2_g2
  <chr>  <dbl>
1 ABC123     1
2 ABC123     1
3 DEF234     1
4 DEF234 12121
• vars accepts multiple specifications.
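Because vars() only records the specifications, the same selection can be stored once and reused on several tables. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts, and the second table other is hypothetical.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# vars() records the selection; it is evaluated later, per table
id_and_last <- vars(-contains("_"), last_col())

# Apply the recorded selection to gene_exp2 ...
sel1 <- names(select_at(gene_exp2, id_and_last))
# c("gene", "d2_g2")

# ... and to a hypothetical second table with a similar layout
other <- tibble(sample = "s1", x_1 = 1, x_2 = 2)
sel2 <- names(select_at(other, id_and_last))
# c("sample", "x_2")
```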

Example of select_at
gene_exp2 %>% select_at(vars("gene", starts_with("d1")))
# A tibble: 4 x 3
  gene   d1_g1 d1_g2
  <chr>  <dbl> <dbl>
1 ABC123     1     3
2 ABC123     3     4
3 DEF234     6     6
4 DEF234     2     5
• You can use the exact name of a column, as a string.
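Strings can be passed in two ways: as a plain character vector in place of vars(), or inside vars(), where they can be mixed with helpers. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

keep <- c("gene", "d2_g1")

# A plain character vector of column names is accepted directly ...
as_vector <- names(select_at(gene_exp2, keep))

# ... and so are strings inside vars()
as_vars <- names(select_at(gene_exp2, vars("gene", "d2_g1")))

# Strings and helpers can be mixed inside vars()
mixed <- names(select_at(gene_exp2, vars("gene", ends_with("g1"))))
# c("gene", "d1_g1", "d2_g1")
```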

Example of select_at
gene_exp2 %>% select_at(vars(1, ends_with("g2")))
# A tibble: 4 x 3
  gene   d1_g2 d2_g2
  <chr>  <dbl> <dbl>
1 ABC123     3     1
2 ABC123     4     1
3 DEF234     6     1
4 DEF234     5 12121
• You can use the number (position) of a column.
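Positions work both as a plain numeric vector and inside vars(), where a negative number drops a column. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# A numeric vector of positions can replace vars() entirely
p1 <- names(select_at(gene_exp2, c(1, 5)))
# c("gene", "d2_g2")

# A negative position inside vars() drops that column (here: gene)
p2 <- names(select_at(gene_exp2, vars(-1)))
# c("d1_g1", "d1_g2", "d2_g1", "d2_g2")
```

Positions are fragile: if the file gains or reorders columns, the same numbers pick different data, so name-based helpers are usually the safer default.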

Example of select_at
gene_exp2 %>% select_at(vars(seq(from = 1, to = 5, by = 2)))
# A tibble: 4 x 3
  gene   d1_g2 d2_g2
  <chr>  <dbl> <dbl>
1 ABC123     3     1
2 ABC123     4     1
3 DEF234     6     1
4 DEF234     5 12121
• This is useful if there is a regular pattern to the columns you want to keep.
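The upper limit does not have to be typed by hand: computing it with ncol() lets the same pattern adapt to wider tables. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

n <- ncol(gene_exp2)   # 5

# Odd positions: gene plus every "g2" column
odd <- names(select_at(gene_exp2, seq(from = 1, to = n, by = 2)))
# c("gene", "d1_g2", "d2_g2")

# Even positions: every "g1" column
even <- names(select_at(gene_exp2, seq(from = 2, to = n, by = 2)))
# c("d1_g1", "d2_g1")
```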

select_if: when you use the contents of the columns
gene_exp2 %>% select_if(function(x) any(x > 10))
# A tibble: 4 x 3
  gene   d2_g1 d2_g2
  <chr>  <dbl> <dbl>
1 ABC123    10     1
2 ABC123    20     1
3 DEF234    30     1
4 DEF234    40 12121
• The function is applied to the vector holding the contents of each column and returns TRUE to select that column.
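The predicate does not have to be anonymous: any function that returns a single TRUE or FALSE can be passed by name. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts, and has_large_value is a hypothetical helper name.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# Hypothetical named predicate: TRUE if any value in the column exceeds 10
has_large_value <- function(x) any(x > 10)

big <- select_if(gene_exp2, has_large_value)
names(big)
# c("gene", "d2_g1", "d2_g2")
```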

The anonymous function
gene_exp2$d2_g2
[1]     1     1     1 12121
gene_exp2$d2_g2 > 10
[1] FALSE FALSE FALSE  TRUE
any(gene_exp2$d2_g2 > 10)
[1] TRUE
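Running the same function over every column at once (here with base R's sapply) reproduces the keep-or-drop decisions that select_if makes. A minimal sketch, assuming dplyr for tibble(); gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# One TRUE/FALSE per column:
# gene = TRUE, d1_g1 = FALSE, d1_g2 = FALSE, d2_g1 = TRUE, d2_g2 = TRUE
decisions <- sapply(gene_exp2, function(x) any(x > 10))

# The names of the columns select_if keeps
kept <- names(decisions)[decisions]
kept
```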

select_if (repeated)
gene_exp2 %>% select_if(function(x) any(x > 10))
# A tibble: 4 x 3
  gene   d2_g1 d2_g2
  <chr>  <dbl> <dbl>
1 ABC123    10     1
2 ABC123    20     1
3 DEF234    30     1
4 DEF234    40 12121
• Why is the gene column selected? Hint: Is "ABC123" > "10"?
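The answer to the hint, plus one way to avoid the surprise: when a character vector is compared with a number, R converts the number to the string "10" and compares alphabetically, and in common locales letters sort after digits. Checking is.numeric() first keeps character columns out. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# String comparison: "A" sorts after "1", so this is TRUE in common locales
string_cmp <- "ABC123" > "10"

# Guard the test so only numeric columns can be selected;
# && stops before any(x > 10) when the column is not numeric
numeric_big <- select_if(gene_exp2, function(x) is.numeric(x) && any(x > 10))
names(numeric_big)
# c("d2_g1", "d2_g2")
```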

select_if: alternative function syntax
gene_exp2 %>% select_if(~any(. > 10))
# A tibble: 4 x 3
  gene   d2_g1 d2_g2
  <chr>  <dbl> <dbl>
1 ABC123    10     1
2 ABC123    20     1
3 DEF234    30     1
4 DEF234    40 12121
• Inside a ~ function, . refers to the argument passed in: here, each column in turn.
• This syntax is often a bit shorter than writing function(...) ...
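The two spellings select the same columns, and the ~ form can carry the numeric guard too. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# function(...) syntax and ~ syntax for the same predicate
f_fun     <- names(select_if(gene_exp2, function(x) any(x > 10)))
f_formula <- names(select_if(gene_exp2, ~ any(. > 10)))

# The ~ form with the is.numeric() guard
f_guard <- names(select_if(gene_exp2, ~ is.numeric(.) && any(. > 10)))
# c("d2_g1", "d2_g2")
```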

Same idea for mutate

mutate_at
mutate_at(gene_exp2, vars(ends_with("g2")), function(x) -x)
# A tibble: 4 x 5
  gene   d1_g1 d1_g2 d2_g1  d2_g2
  <chr>  <dbl> <dbl> <dbl>  <dbl>
1 ABC123     1    -3    10     -1
2 ABC123     3    -4    20     -1
3 DEF234     6    -6    30     -1
4 DEF234     2    -5    40 -12121
• This will negate the values in all columns whose name ends with the string "g2".
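The same transformation with the shorter ~ syntax, checking that only the "g2" columns change. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# ~ -. means "negate the column passed in"
neg_g2 <- mutate_at(gene_exp2, vars(ends_with("g2")), ~ -.)

neg_g2$d1_g2   # c(-3, -4, -6, -5)
neg_g2$d1_g1   # unchanged: c(1, 3, 6, 2)
```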

mutate_if
mutate_if(gene_exp2, is.numeric, function(x) -x)
# A tibble: 4 x 5
  gene   d1_g1 d1_g2 d2_g1  d2_g2
  <chr>  <dbl> <dbl> <dbl>  <dbl>
1 ABC123    -1    -3   -10     -1
2 ABC123    -3    -4   -20     -1
3 DEF234    -6    -6   -30     -1
4 DEF234    -2    -5   -40 -12121
• This will negate the values in all columns whose contents are numeric.
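The predicate decides which columns change; swapping is.numeric for is.character flips the selection to the gene column only. A minimal sketch, assuming dplyr; gene_exp2 is rebuilt from the slide printouts.

```r
library(dplyr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# Numeric columns: negated; gene is left alone
all_neg <- mutate_if(gene_exp2, is.numeric, ~ -.)

# Character columns only: gene is lower-cased, numbers are left alone
lower_gene <- mutate_if(gene_exp2, is.character, tolower)
lower_gene$gene
# c("abc123", "abc123", "def234", "def234")
```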

Same idea for rename

Replace spaces in column names
tibble(`this is a col name` = 3:4) %>%
  rename_all(~str_replace_all(., " ", "_"))
# A tibble: 2 x 1
  this_is_a_col_name
               <int>
1                  3
2                  4
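rename_at applies the same kind of renaming function to a chosen subset of columns. A minimal sketch, assuming dplyr and stringr; gene_exp2 is rebuilt from the slide printouts, and "day1" is a hypothetical reading of the "d1" prefix used only to show the mechanics.

```r
library(dplyr)
library(stringr)

# gene_exp2 rebuilt by hand from the slide printouts (assumed values)
gene_exp2 <- tibble(
  gene  = c("ABC123", "ABC123", "DEF234", "DEF234"),
  d1_g1 = c(1, 3, 6, 2),
  d1_g2 = c(3, 4, 6, 5),
  d2_g1 = c(10, 20, 30, 40),
  d2_g2 = c(1, 1, 1, 12121)
)

# Rename only the columns picked by vars(); "day1" is a hypothetical label
renamed <- rename_at(gene_exp2, vars(starts_with("d1")),
                     ~ str_replace(., "^d1", "day1"))
names(renamed)
# c("gene", "day1_g1", "day1_g2", "d2_g1", "d2_g2")
```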

Use all lower case in column names
tibble(Col1 = 1:2, Col2 = 3:4) %>%
  rename_all(~str_to_lower(.))
# A tibble: 2 x 2
   col1  col2
  <int> <int>
1     1     3
2     2     4
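The two renaming fixes from these slides can be combined into a single function. A minimal sketch, assuming dplyr and stringr; the messy column names are made up.

```r
library(dplyr)
library(stringr)

# Made-up messy names with spaces and capitals
raw <- tibble(`Sample ID` = 1:2, `Gene Name` = c("a", "b"))

# Spaces to underscores, then lower case, in one rename_all call
clean <- raw %>% rename_all(~ str_to_lower(str_replace_all(., " ", "_")))
names(clean)
# c("sample_id", "gene_name")
```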

Same idea for summarize

Count number of NA values in each column
(m <- read_csv(str_c(data_dir, "missing_df.csv")))
# A tibble: 10 x 3
      id weight group
   <dbl>  <dbl> <chr>
 1     1  0.114 a
 2     2  0.622 b
 3     3  0.609 a
 4     4 NA     b
 5     5  0.861 <NA>
 6     6  0.640 b
 7     7 NA     a
 8     8  0.233 b
 9     9  0.666 a
10    10  0.514 b
m %>% summarize_all(~sum(is.na(.)))
# A tibble: 1 x 3
     id weight group
  <int>  <int> <int>
1     0      2     1
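A self-contained version of the NA count, with a variant that reports the share of missing values instead. The CSV is not available here, so m is rebuilt from the printout (an assumption: the weights are the rounded values shown, not the full-precision values in the file). Requires dplyr.

```r
library(dplyr)

# missing_df.csv rebuilt from the printout (rounded weights, assumed)
m <- tibble(
  id     = as.numeric(1:10),
  weight = c(0.114, 0.622, 0.609, NA, 0.861, 0.640, NA, 0.233, 0.666, 0.514),
  group  = c("a", "b", "a", "b", NA, "b", "a", "b", "a", "b")
)

# Number of NA values per column: id = 0, weight = 2, group = 1
na_counts <- m %>% summarize_all(~ sum(is.na(.)))

# Share of NA values per column: id = 0, weight = 0.2, group = 0.1
na_share <- m %>% summarize_all(~ mean(is.na(.)))
```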

Summarize with mean
m %>% summarize_if(is.numeric, ~mean(., na.rm = TRUE))
# A tibble: 1 x 2
     id weight
  <dbl>  <dbl>
1   5.5  0.532
• Summarize by computing the mean of all numeric columns, ignoring NAs.
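Why na.rm = TRUE matters here: without it, the two missing weights make the whole mean NA. A minimal sketch, assuming dplyr; m is rebuilt from the earlier printout with rounded weights (an assumption), which still gives a weight mean of about 0.532.

```r
library(dplyr)

# missing_df.csv rebuilt from the printout (rounded weights, assumed)
m <- tibble(
  id     = as.numeric(1:10),
  weight = c(0.114, 0.622, 0.609, NA, 0.861, 0.640, NA, 0.233, 0.666, 0.514),
  group  = c("a", "b", "a", "b", NA, "b", "a", "b", "a", "b")
)

# Numeric columns only (group is skipped); NAs ignored
with_rm <- m %>% summarize_if(is.numeric, ~ mean(., na.rm = TRUE))
# id = 5.5, weight about 0.532

# Without na.rm = TRUE, any NA makes that column's mean NA
without_rm <- m %>% summarize_if(is.numeric, ~ mean(.))
# id = 5.5, weight = NA
```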

Setting column names
(d1 <- tibble(a = 1:2, b = 3:4))
# A tibble: 2 x 2
      a     b
  <int> <int>
1     1     3
2     2     4
(d2 <- set_names(d1, c("new_a", "new_b")))
# A tibble: 2 x 2
  new_a new_b
  <int> <int>
1     1     3
2     2     4
• set_names can assign new names to all the columns at once.
• Remember to save the new data frame.
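The new names can themselves be computed from the old ones, and the original data frame is left untouched, which is why the result has to be saved. A minimal sketch, assuming dplyr; rlang::set_names is called explicitly here so that only dplyr and its dependency rlang are needed (the slides presumably get set_names from a tidyverse package).

```r
library(dplyr)

d1 <- tibble(a = 1:2, b = 3:4)

# Compute the new names instead of typing them
d2 <- rlang::set_names(d1, paste0("new_", names(d1)))
names(d2)
# c("new_a", "new_b")

# d1 itself is unchanged: set_names returns a renamed copy
names(d1)
# c("a", "b")
```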

Grouping over multiple columns

Get example dataset
(group_by_example <- read_csv(str_c(data_dir, "group_by_example.csv")))
# A tibble: 4 x 4, with columns gene <chr>, Treatment <chr>, Genotype <chr>, expression_value <dbl>
• All four rows have gene DYRK1A_N and Genotype Control; two rows have Treatment Memantine (expression values 0.504 and 0.515) and two have Treatment Saline (0.590 and 0.592).
• This is part of the intermediate result from data_challenge_mouse_protein_expression.
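A self-contained stand-in for group_by_example, built from the values printed on these slides; the order of rows within each Treatment is an assumption, which does not affect any group summary. It also reproduces the per-Treatment means shown on the next slide. Requires dplyr.

```r
library(dplyr)

# group_by_example rebuilt from the slide values; row order within each
# Treatment is assumed
group_by_example <- tibble(
  gene             = rep("DYRK1A_N", 4),
  Treatment        = c("Memantine", "Memantine", "Saline", "Saline"),
  Genotype         = rep("Control", 4),
  expression_value = c(0.504, 0.515, 0.590, 0.592)
)

# Two rows per Treatment
per_treatment <- count(group_by_example, Treatment)

# Group means: Memantine 0.5095 (printed as 0.509), Saline 0.591
by_treatment <- group_by_example %>%
  group_by(Treatment) %>%
  summarize(mean_expression = mean(expression_value))
```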

Summarize by Treatment
group_by_example %>%
  group_by(Treatment) %>%
  summarize(mean_expression = mean(expression_value))
# A tibble: 2 x 2
  Treatment mean_expression
  <chr>               <dbl>
1 Memantine           0.509
2 Saline              0.591

Summarize by gene
group_by_example %>%
  group_by(gene) %>%
  summarize(mean_expression = mean(expression_value))
# A tibble: 1 x 2
  gene     mean_expression
  <chr>              <dbl>
1 DYRK1A_N           0.550

Summarize Treatment, gene
group_by_example %>%
  group_by(Treatment, gene) %>%
  summarize(mean_expression = mean(expression_value))
# A tibble: 2 x 3
# Groups:   Treatment [2]
  Treatment gene     mean_expression
  <chr>     <chr>              <dbl>
1 Memantine DYRK1A_N           0.509
2 Saline    DYRK1A_N           0.591
• Note the result is grouped by Treatment.
• When you summarize a grouped data frame, the last grouping variable (here gene) is removed from the grouping of the result.
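group_vars() shows which grouping survives the summary, and ungroup() removes what is left before further steps. A minimal sketch, assuming dplyr (recent versions also print a message about the remaining grouping); group_by_example is rebuilt from the slide values with an assumed row order.

```r
library(dplyr)

# group_by_example rebuilt from the slide values (row order assumed)
group_by_example <- tibble(
  gene             = rep("DYRK1A_N", 4),
  Treatment        = c("Memantine", "Memantine", "Saline", "Saline"),
  Genotype         = rep("Control", 4),
  expression_value = c(0.504, 0.515, 0.590, 0.592)
)

res <- group_by_example %>%
  group_by(Treatment, gene) %>%
  summarize(mean_expression = mean(expression_value))

# summarize dropped the last grouping variable (gene)
group_vars(res)
# "Treatment"

# ungroup() removes the remaining grouping
group_vars(ungroup(res))
# character(0)
```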

Summarize gene, Treatment
group_by_example %>%
  group_by(gene, Treatment) %>%
  summarize(mean_expression = mean(expression_value))
# A tibble: 2 x 3
# Groups:   gene [1]
  gene     Treatment mean_expression
  <chr>    <chr>               <dbl>
1 DYRK1A_N Memantine           0.509
2 DYRK1A_N Saline              0.591
• Note the result is grouped by gene.
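Swapping the order of the grouping variables changes only which grouping remains afterwards, not the computed means. A minimal sketch, assuming dplyr; group_by_example is rebuilt from the slide values with an assumed row order.

```r
library(dplyr)

# group_by_example rebuilt from the slide values (row order assumed)
group_by_example <- tibble(
  gene             = rep("DYRK1A_N", 4),
  Treatment        = c("Memantine", "Memantine", "Saline", "Saline"),
  Genotype         = rep("Control", 4),
  expression_value = c(0.504, 0.515, 0.590, 0.592)
)

res_tg <- group_by_example %>%
  group_by(Treatment, gene) %>%
  summarize(mean_expression = mean(expression_value))

res_gt <- group_by_example %>%
  group_by(gene, Treatment) %>%
  summarize(mean_expression = mean(expression_value))

group_vars(res_gt)   # "gene"
n_groups(res_gt)     # 1

# Same means either way; only the remaining grouping differs
same_means <- isTRUE(all.equal(res_tg$mean_expression, res_gt$mean_expression))
```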
