An introduction to
Basics of Algorithmics in R
WS 2019/2020

Dr. Noémie Becker
Dr. Eliza Argyridou

Special thanks to: Dr. Sonja Grath for addition to slides

Which expression(s) evaluate to TRUE? (x equals 5)

    a) TRUE | FALSE
    b) x > 5
    c) FALSE & TRUE
    d) x <= 10 | x > 5

Answer: a) and d)

What is the value of y at the end of the loop if it was 0 at the beginning? How many iterations of the loop occurred?

    while (y <= 10) { y <- 2*y + 3 }

Answer: y = 21; the loop ran 3 times.

What you should know after days 7 & 8

Review: Data frames and import your data
Conditional execution in R
● Logic rules
● if(), else, ifelse()
● Example from day 1
Loops
Executing a command from a script
Writing your own functions
How to avoid slow R code

Basics

Syntax:

    myfun <- function(arg1, arg2, …) { commands }

Example:
We want to define a function that takes a DNA sequence as input and gives as output the GC content (proportion of G and C in the sequence).

How can we name our function?

Idea: gc

    ?gc
    # There is already a function gc()

Another idea: gcContent

    ?gcContent
    No documentation for ‘gcContent’ in specified packages and libraries:
    you could try ‘??gcContent’

→ We can name our function gcContent()

Our function gcContent() from Day 1

Version 1

    gcContent <- function(dna, counter = 0) {
      dna <- unlist(strsplit(dna, ""))
      for (i in 1:length(dna)) {
        if (dna[i] == "C" | dna[i] == "G")
        { counter = counter + 1 }
      }
      return(counter / length(dna))
    }

Does our function work correctly?

    # Test the function with some example data
    gcContent("AACGTGGCTA")
    gcContent("AATATATTAT")
    gcContent(23)
    gcContent(TRUE)
    gcContent("notDNA")
    gcContent("Cool")

YOUR TURN

Dealing with problems

Problems:
● R gives an error message if the input is not a character value
● Our function calculates values if the input is most likely not a DNA sequence

How could we deal with these problems?
What do we want our function to output in these cases?
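The test calls above can be run in one go. The following is a minimal sketch, assuming Version 1 of gcContent() exactly as given on the slide; the helper name try_gc is hypothetical (not part of the course material) and only wraps each call in tryCatch() so that an error is reported as text instead of stopping the script.

```r
# Version 1 of gcContent(), copied from the slide
gcContent <- function(dna, counter = 0) {
  dna <- unlist(strsplit(dna, ""))
  for (i in 1:length(dna)) {
    if (dna[i] == "C" | dna[i] == "G") {
      counter = counter + 1
    }
  }
  return(counter / length(dna))
}

# Hypothetical helper: returns the result of gcContent(),
# or the error message as a string if the call fails
try_gc <- function(input) {
  tryCatch(gcContent(input),
           error = function(e) paste("Error:", conditionMessage(e)))
}

inputs <- list("AACGTGGCTA", "AATATATTAT", 23, TRUE, "notDNA", "Cool")
results <- lapply(inputs, try_gc)

# Print each input next to what the function returned
for (k in seq_along(inputs)) {
  cat(format(inputs[[k]]), "->", format(results[[k]]), "\n")
}
```

The two DNA sequences give 0.5 and 0. The numeric and logical inputs fail inside strsplit(), while "notDNA" and "Cool" return 0 and 0.25 although neither is a DNA sequence. These are the two problems the next slides deal with.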
Error and Warning

There are two types of error messages in R:
● Error: Stops execution and returns no value
● Warning message: Continues execution

Example:

    x <- sum("hello")
    Error in sum("hello") : invalid 'type' (character) of argument

    x <- mean("hello")
    Warning message:
    In mean.default("hello") :
      argument is not numeric or logical: returning NA

We can define such messages with the functions stop() and warning()

In our example:
● (Specific) Error when argument is not character
● Warning if character argument is not DNA

Dealing with non-character arguments

Version 2

    gcContent <- function(dna, counter = 0) {
      # Self-defined error message
      if (!is.character(dna)) {
        stop("The argument must be of type character.")
      }
      dna <- unlist(strsplit(dna, ""))
      for (i in 1:length(dna)) {
        if (dna[i] == "C" | dna[i] == "G")
        { counter = counter + 1 }
      }
      return(counter / length(dna))
    }

Dealing with input that is not DNA

● We define as 'not DNA' any character different from A, C, G or T.
● If the input contains any other character, we compute the value but throw a warning.

To solve this task, we can use the function grep() as follows:

    grep("[^ACGT]", "AATGAC")
    integer(0)   # length is 0
    grep("[^ACGT]", "NATGAC")
    [1] 1        # length is 1

Dealing with input that is not DNA

Version 3

    gcContent <- function(dna, counter = 0) {
      if (!is.character(dna)) {
        stop("The argument must be of type character.")
      }
      # Self-defined warning message
      if (length(grep("[^ACGT]", dna)) > 0) {
        warning("The input contains characters other than A, C, G or T - value should not be trusted!")
      }
      dna <- unlist(strsplit(dna, ""))
      for (i in 1:length(dna)) {
        if (dna[i] == "C" | dna[i] == "G")
        { counter = counter + 1 }
      }
      return(counter / length(dna))
    }

Giving several arguments to a function

R functions can have several arguments.
You can see them listed in the help page for the function.

Example

    ?mean

A frequent argument in R functions is na.rm. This argument (when set to TRUE) removes NA values from vectors.

    mean(c(1, 2, NA))
    [1] NA
    mean(c(1, 2, NA), na.rm = TRUE)
    [1] 1.5

We now want to give our function another argument to output the AT content instead of the GC content.

YOUR TURN

Giving several arguments to a function

Version 4

    gcContent <- function(dna, counter = 0, AT) {
      if (!is.character(dna)) {
        stop("The argument must be of type character.")
      }
      if (length(grep("[^ACGT]", dna)) > 0) {
        warning("The input contains characters other than A, C, G or T - value should not be trusted!")
      }
      dna <- unlist(strsplit(dna, ""))
      for (i in 1:length(dna)) {
        if (dna[i] == "C" | dna[i] == "G")
        { counter = counter + 1 }
      }
      if (AT == TRUE) {
        return(1 - counter / length(dna))
      } else {
        return(counter / length(dna))
      }
    }
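Version 4 can be tried out as follows. This is a minimal sketch, assuming one change that is not on the slide: AT is given the default value FALSE, so that gcContent() can still be called with only a sequence. With the signature from the slide, leaving out AT raises an error once if (AT == TRUE) is evaluated.

```r
# Version 4 of gcContent() with one assumed change: AT defaults to FALSE
gcContent <- function(dna, counter = 0, AT = FALSE) {
  if (!is.character(dna)) {
    stop("The argument must be of type character.")
  }
  if (length(grep("[^ACGT]", dna)) > 0) {
    warning("The input contains characters other than A, C, G or T - value should not be trusted!")
  }
  dna <- unlist(strsplit(dna, ""))
  for (i in 1:length(dna)) {
    if (dna[i] == "C" | dna[i] == "G") {
      counter = counter + 1
    }
  }
  if (AT == TRUE) {
    return(1 - counter / length(dna))
  } else {
    return(counter / length(dna))
  }
}

gc_value <- gcContent("AATGAC")             # GC content: 2 of 6 letters
at_value <- gcContent("AATGAC", AT = TRUE)  # AT content: 1 - GC content

# Catch the self-defined warning to inspect its text
warn_text <- tryCatch(gcContent("NATGAC"),
                      warning = function(w) conditionMessage(w))

print(c(GC = gc_value, AT = at_value))
print(warn_text)
```

With AT = TRUE every letter that is not C or G is counted as A or T, so the warning about non-DNA characters matters for the AT content as well.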