Rearranging and manipulating data



  1. An introduction to R, WS 2019/2020
     Rearranging and manipulating data
     Dr. Noémie Becker
     Dr. Eliza Argyridou
     Special thanks to: Dr. Benedikt Holtmann and Dr. Sonja Grath for sharing slides for this lecture

         mydata <- read.table(file = "mydata.txt",
                              header = TRUE,
                              na.strings = "n")

     What was the sign for missing data in mydata.txt? Answer: "n"
     What is written in the first line of mydata.txt? Answer: column names
     Is the command correct? Answer: YES!

     What you should know after day 5
     Rearranging and manipulating data
     ● Reshaping data
     ● Combining data sets
     ● Making new variables
     ● Subsetting data
     ● Summarizing data

     We will work with two particular packages:
     ● tidyr
     ● dplyr

     YOUR TURN
     What do we have to do before we can work with a package in R? (2 things)

     Reshaping data
     We will use data on fish abundance.
     ● Download the file Fish_survey.csv from the course page.

     Set directory, for example:

         setwd("~/Desktop/Day_5")

     ● Import the sample data into a variable Fish_survey:

         Fish_survey <- read.csv("Fish_survey.csv", header = TRUE)
         head(Fish_survey)

     Note:
     ● 3 species (trout, perch, stickleback)
     ● The numbers are abundance values for the species at specific sites

     To combine the three columns into one column that contains all species you can use the function gather() from the tidyr package:

         library(tidyr)
         # Species = new key column, Abundance = new value column,
         # 4:6 = the three species columns to stack
         Fish_survey_long <- gather(Fish_survey, Species, Abundance, 4:6)
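The gather() step above can be tried without the course file. The following is only a minimal sketch: toy_survey, its column names and its values are invented for illustration and are not the contents of Fish_survey.csv; only the shape of the gather() call mirrors the slide. Recent tidyr versions recommend pivot_longer() for the same task, but gather() still works.

```r
library(tidyr)

# Hypothetical stand-in for Fish_survey.csv (all values invented)
toy_survey <- data.frame(
  Site        = c("A", "B"),
  Month       = c("May", "May"),
  Transect    = c(1, 1),
  Trout       = c(3, 0),
  Perch       = c(1, 4),
  Stickleback = c(7, 2),
  stringsAsFactors = FALSE
)

# Stack columns 4 to 6 into one key column (Species)
# and one value column (Abundance)
toy_long <- gather(toy_survey, Species, Abundance, 4:6)

# One row per site and species: 2 sites x 3 species = 6 rows
print(toy_long)
```

Each original row is repeated once per species column, so the identifying columns (Site, Month, Transect) stay attached to every abundance value.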

  2. Reshaping data

         Fish_survey_long <- gather(Fish_survey, Species, Abundance, 4:6)
         head(Fish_survey_long)
         tail(Fish_survey_long)

     To convert the data back into a format with separate columns for each species, you can use the function spread() from the tidyr package:

         Fish_survey_wide <- spread(Fish_survey_long, Species, Abundance)

     Combining data
     We now want to combine the information given by three different data sets:
     ● Fish_survey.csv
     ● Water_data.csv
     ● GPS_data.csv

     To combine the data sets we will use the package dplyr:

         library(dplyr)

     We can join data sets by using the columns they share.

     Data set                 Columns
     Fish survey              Site, Month, Transect, Species
     Water characteristics    Site, Month, Water temp., O2-content
     GPS                      Site, Transect, Latitude, Longitude

     YOUR TURN
     Which function could we use here?

     Functions to combine data sets in dplyr

     left_join(a, b, by = "x1")     Joins matching rows from b to a
     right_join(a, b, by = "x1")    Joins matching rows from a to b
     inner_join(a, b, by = "x1")    Returns all rows from a where there are matching values in b
     full_join(a, b, by = "x1")     Joins data and returns all rows and columns
     semi_join(a, b, by = "x1")     All rows in a that have a match in b, keeping just columns from a
     anti_join(a, b, by = "x1")     All rows in a that do not have a match in b
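The join functions listed above differ mainly in which rows they keep. A minimal sketch with two invented data frames, named a and b and sharing the key column x1 to match the slide's notation (all values are made up for illustration):

```r
library(dplyr)

# Two toy tables sharing the key column x1 (values invented)
a <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3),
                stringsAsFactors = FALSE)
b <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE),
                stringsAsFactors = FALSE)

left  <- left_join(a, b, by = "x1")   # all rows of a; x3 is NA for "C"
right <- right_join(a, b, by = "x1")  # all rows of b; x2 is NA for "D"
inner <- inner_join(a, b, by = "x1")  # only keys present in both: "A", "B"
full  <- full_join(a, b, by = "x1")   # every key from either table
semi  <- semi_join(a, b, by = "x1")   # rows of a with a match, columns of a only
anti  <- anti_join(a, b, by = "x1")   # rows of a without a match: "C"

print(full)
```

Comparing nrow() of each result is a quick way to check which join matches the intended outcome before applying it to the real data.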

  3. Combining data

     1) Join water characteristics to fish abundance data using inner_join()

         Fish_and_Water <- inner_join(Fish_survey_long,
                                      Water_data,
                                      by = c("Site", "Month"))

     2) Add GPS locations to new Fish_and_Water data set using inner_join()

         Fish_survey_combined <- inner_join(Fish_and_Water,
                                            GPS_location,
                                            by = c("Site", "Transect"))

     Adding new variables
     We will use data on bird behaviour.

         Bird_Behaviour <- read.csv("Bird_Behaviour.csv",
                                    header = TRUE,
                                    stringsAsFactors = FALSE)
         # Get an overview
         str(Bird_Behaviour)

     Schematic (a column X3 is added):

         X1  X2          X1  X2  X3
         A   1           A   1   T
         B   1     ->    B   1   F
         A   2           A   2   T
         B   2           B   2   F

     We want to add the new variable (column) log_FID

     Three possibilities:

     a) Using $

         Bird_Behaviour$log_FID <- log(Bird_Behaviour$FID)

     b) Using the [ ]-operator

         Bird_Behaviour[ , "log_FID"] <- log(Bird_Behaviour$FID)

     c) Using the function mutate() from dplyr package

         Bird_Behaviour <- mutate(Bird_Behaviour, log_FID = log(FID))

     The outcome:

         head(Bird_Behaviour)
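The three possibilities above should all produce the same column. A small sketch to check this, assuming an invented data frame (toy_birds, the Ind column and the FID values are made up; only the column names FID and log_FID come from the slides):

```r
library(dplyr)

# Hypothetical stand-in for Bird_Behaviour (values invented)
toy_birds <- data.frame(Ind = c("b1", "b2", "b3"),
                        FID = c(10, 20, 40),
                        stringsAsFactors = FALSE)

# a) Using $
with_dollar <- toy_birds
with_dollar$log_FID <- log(with_dollar$FID)

# b) Using the [ ]-operator
with_bracket <- toy_birds
with_bracket[, "log_FID"] <- log(with_bracket$FID)

# c) Using mutate(); note that the result must be assigned to keep it
with_mutate <- mutate(toy_birds, log_FID = log(FID))

# Both comparisons should report TRUE
print(all.equal(with_dollar, with_bracket))
print(all.equal(with_dollar, with_mutate))
```

Variants a) and b) modify the data frame in place, while mutate() returns a new data frame and leaves its input untouched, which is why the slide assigns the result back to Bird_Behaviour.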

  4. Adding new variables
     We can split one column into two using the function separate() from the tidyr package:

         Bird_Behaviour <- separate(Bird_Behaviour,
                                    Species,
                                    c("Genus", "Species"),
                                    sep = "_",
                                    remove = TRUE)

     Schematic (X2 is split at "_"):

         X1  X2            X1  X2.1  X2.2
         A   1_1           A   1     1
         B   1_2     ->    B   1     2
         A   2_1           A   2     1
         B   2_2           B   2     2

     Combining variables
     We can combine two columns into one using the function unite() from the tidyr package:

         Bird_Behaviour <- unite(Bird_Behaviour,
                                 "Genus_Species",
                                 c(Genus, Species),
                                 sep = "_",
                                 remove = TRUE)

     Schematic (X2.1 and X2.2 are pasted together with "_"):

         X1  X2.1  X2.2          X1  X2
         A   1     1             A   1_1
         B   1     2       ->    B   1_2
         A   2     1             A   2_1
         B   2     2             B   2_2

     Subsetting data
     You can subset your data with:
     ● The [ ]-operator
     ● The function subset()
     ● Functions from the dplyr package:
         slice()
         filter()
         sample_frac()
         sample_n()
         select()

     Subsetting data with the [ ]-operator
     Examples:

         # selects the first 4 columns
         Bird_Behaviour[ , 1:4]

         # selects rows 2 and 3
         Bird_Behaviour[c(2, 3), ]

         # selects the rows 1 to 3 and columns 1 to 4
         Bird_Behaviour[1:3, 1:4]

         # selects the rows 1 to 3 and 6, and the columns 1 to 4 and 8
         Bird_Behaviour[c(1:3, 6), c(1:4, 8)]

     Subsetting data with the [ ] and $-operators
     Example:

         # selects all rows with males
         Bird_Behaviour[Bird_Behaviour$Sex == "male", ]
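The slides name subset() and several dplyr functions as alternatives to the [ ]-operator. As a rough sketch of how they correspond, the example below uses an invented data frame (toy_birds, the Ind column and all values are made up; only the column names Sex and FID appear in the slides):

```r
library(dplyr)

# Hypothetical stand-in for Bird_Behaviour (values invented)
toy_birds <- data.frame(Ind = c("b1", "b2", "b3", "b4"),
                        Sex = c("male", "female", "male", "female"),
                        FID = c(10, 20, 40, 15),
                        stringsAsFactors = FALSE)

# Rows by condition: three ways to keep only the males
males_bracket <- toy_birds[toy_birds$Sex == "male", ]
males_subset  <- subset(toy_birds, Sex == "male")
males_filter  <- filter(toy_birds, Sex == "male")

# Rows by position: rows 2 and 3
rows_bracket <- toy_birds[c(2, 3), ]
rows_slice   <- slice(toy_birds, 2:3)

# Columns by position: the first two columns
cols_bracket <- toy_birds[, 1:2]
cols_select  <- select(toy_birds, 1:2)

print(males_filter)
```

The selected values are the same in each group. One visible difference is that the [ ]-operator and subset() keep the original row names, whereas the dplyr functions do not preserve them.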
