Chaining, R Markdown



  1. Chaining, R Markdown. Steve Bagley, somgen223.stanford.edu

  2. Set up cw

         data_dir <- "https://somgen223.stanford.edu/data/"
         cw <- read_csv(str_c(data_dir, "cw.csv"))

  3. distinct: how many different diets?

         distinct(cw, diet)

         # A tibble: 4 x 1
            diet
           <dbl>
         1     1
         2     2
         3     3
         4     4

     • distinct returns a new data frame with all duplicate rows, as determined by the specified column or columns, removed.

  4. How many different chicks?

         distinct(cw, chick)

         # A tibble: 50 x 1
            chick
            <dbl>
          1     1
          2     2
          3     3
          4     4
          5     5
          6     6
          7     7
          8     8
          9     9
         10    10
         # ... with 40 more rows

  5. How many different chicks?

         length(pull(distinct(cw, chick), chick))

         [1] 50

     • pull returns a data frame column as a vector.
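As a small sketch of what distinct and pull each return, the following uses a made-up tibble, cw_demo. It is a hypothetical stand-in, not the course's cw.csv, so the counts differ from the slides, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for cw: 3 chicks, with repeated measurements.
cw_demo <- tibble(
  chick  = c(1, 1, 2, 2, 3),
  diet   = c(1, 1, 1, 1, 2),
  weight = c(42, 51, 40, 49, 43)
)

# distinct() keeps one row per unique value of chick; the result is still a data frame.
chicks_df <- distinct(cw_demo, chick)

# pull() extracts the chick column from that data frame as a plain vector.
chick_ids <- pull(chicks_df, chick)

# length() counts the elements of the vector, i.e. the number of different chicks.
n_chicks <- length(chick_ids)
```

Here chicks_df is a one-column data frame, while chick_ids is an ordinary numeric vector, which is why length can be applied to it.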

  6. Chaining: combining a sequence of data frame function calls

         length(pull(distinct(cw, chick), chick))

         x1 <- distinct(cw, chick)
         x2 <- pull(x1, chick)
         length(x2)

     • In the first expression, the functions are executed "inside out": first distinct, then pull, then length. That can be a little hard to follow.
     • In the second series of expressions, we use temporary variables to store intermediate results.

  7. Chaining using the pipe operator

         cw %>% distinct(chick) %>% pull(chick) %>% length()

     • We can use a new operator, %>%, to "pipe" the result from the first function call to the second function call, and then from that to the third function call, and so on.
     • In English:
       1. start with cw
       2. pass it to distinct
       3. pass that result to pull
       4. pass that result to length
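To check that the nested, temporary-variable, and piped styles compute the same thing, here is a minimal sketch on a made-up tibble, cw_demo (hypothetical data, not the course's cw.csv), assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for cw with 3 different chicks.
cw_demo <- tibble(chick = c(1, 1, 2, 2, 3), diet = c(1, 1, 1, 1, 2))

# Style 1: nested calls, evaluated inside out.
nested <- length(pull(distinct(cw_demo, chick), chick))

# Style 2: temporary variables hold each intermediate result.
x1 <- distinct(cw_demo, chick)
x2 <- pull(x1, chick)
stepwise <- length(x2)

# Style 3: the pipe passes each result to the next call, read left to right.
piped <- cw_demo %>% distinct(chick) %>% pull(chick) %>% length()
```

All three variables hold the same count; the piped form simply lists the steps in the order they happen.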

  8. Keyboard help

     • The pipe operator %>% is a bit ugly, but RStudio will insert it for you.
     • Mac: Command-Shift-M
     • Windows/Linux: Ctrl-Shift-M

  9. Pipe: technical details

         df1 %>% fun(x)

     is converted into:

         fun(df1, x)

     • The object being piped in is used as the first argument of fun.
     • The tidyverse functions are consistently designed so that the first argument is a data frame, and the result is a data frame.
     • If fun produces a data frame, we can pass it along to the next function:

         df1 %>% fun(x) %>% fun2(y, z)
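The conversion described on this slide can be checked directly. The sketch below assumes dplyr is installed and uses a hypothetical helper, keep_above, written to follow the same convention: data frame in as the first argument, data frame out.

```r
library(dplyr)

df1 <- tibble(x = c(3, 1, 2), y = c("c", "a", "b"))

# Hypothetical helper (not from the slides): takes a data frame first,
# returns a data frame with only the rows where x exceeds cutoff.
keep_above <- function(df, cutoff) {
  filter(df, x > cutoff)
}

# The piped call and the ordinary call are two spellings of the same thing.
piped  <- df1 %>% keep_above(1)
direct <- keep_above(df1, 1)
same   <- identical(piped, direct)

# Because keep_above returns a data frame, it can feed another function.
chained <- df1 %>% keep_above(1) %>% arrange(x)
```

The variable same records whether the two spellings gave identical results, and chained holds the remaining rows sorted by x.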

  10. Another chaining example

         bod %>%
           mutate(inv_demand = 1 / demand) %>%
           arrange(inv_demand)

         # A tibble: 6 x 3
            Time demand inv_demand
           <dbl>  <dbl>      <dbl>
         1     7   19.8     0.0505
         2     3   19       0.0526
         3     4   16       0.0625
         4     5   15.6     0.0641
         5     2   10.3     0.0971
         6     1    8.3     0.120
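The slides do not show how bod was created. One plausible way to run this example, offered as an assumption rather than the course's own setup, is to convert R's built-in BOD data frame to a tibble (dplyr is assumed to be installed).

```r
library(dplyr)

# Assumption: bod is a tibble version of R's built-in BOD data frame,
# which has the columns Time and demand.
bod <- as_tibble(BOD)

# Add the reciprocal of demand as a new column, then sort rows by it (ascending).
result <- bod %>%
  mutate(inv_demand = 1 / demand) %>%
  arrange(inv_demand)

print(result)
```

Sorting by inv_demand in ascending order puts the row with the largest demand first.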

  11. Reproducible analysis and RMarkdown

  12. Reproducible analysis and RMarkdown

     The goal of reproducible analysis is to produce a computational artifact that others can view, scrutinize, test, and run, to convince themselves that your ideas are valid. (It's also good for you to be just as skeptical of your own work.) This means you should write code to be run more than once, and by others. Doing so requires being organized in several ways:

     • Combining text with code (the focus of this module)
     • Project/directory organization
     • Version control

  13. The problem

     • You write text (in a word processor).
     • You write code (in RStudio or similar) to compute with data and produce output and graphics.
     • These are performed using different software.
     • So when integrating both kinds of information into a notebook, report, or publication, it is very easy to make mistakes, copy/paste the wrong version, and have information out of sync.

  14. A solution

     • Write text and code in the same file.
     • Use special syntax to separate text from code.
     • Use special syntax for annotating the text with formatting operations (if desired).
     • RStudio can then:
       1. run the code blocks,
       2. insert the output and graphs at the correct spot in the text,
       3. then call a text processor for final formatting.
     • This whole process is called "knitting".

  15. The special syntax for code blocks

     ```{r}
     ## your code goes here, e.g.:
     1 + 2
     ```

     • Special syntax groups successive lines of code into chunks.
     • This is a bit ugly, but RStudio will insert it for you.
     • Mac: Command-Option-I
     • Windows/Linux: Ctrl-Alt-I

  16. Evaluating RMarkdown

     • Use the command Run Current Chunk (the little green right arrow at the top of the chunk) to evaluate.
     • Mac: Command-Shift-Return
     • Windows/Linux: Ctrl-Shift-Enter
     • There are more commands under the Run menu.
     • Use the command Knit to convert the entire document into:
       • html
       • pdf (only if you have LaTeX installed)
       • Word docx

  17. R Markdown: The special syntax for formatting text

     • RStudio supports a simple and easy-to-use format called "R Markdown".
     • This is a very simple markup language:
       • use * or _ around text for italics.
       • use ** or __ around text for bold.
     • Markdown Quick Reference (RStudio internal help)
     • Introduction to R Markdown
     • R Markdown web page
     • R Markdown Cheat Sheet
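The pieces from the last few slides (text with formatting marks, a code chunk, and knitting) can be put together in a short sketch. It writes a tiny R Markdown file and knits it from code with rmarkdown::render(), which, as far as I know, is what the Knit button runs for you. The file name demo.Rmd and its contents are made up for illustration, and the knit step is skipped if the rmarkdown package or pandoc is not available.

```r
# The chunk delimiter is three backticks; it is built with strrep() here
# only so that this example can itself be shown inside a code block.
fence <- strrep("`", 3)

# Contents of a minimal R Markdown file: formatted text, then one code chunk.
rmd_lines <- c(
  "Some *italic* text and some **bold** text.",
  "",
  paste0(fence, "{r}"),
  "1 + 2",
  fence
)

# Write the file to a temporary directory (hypothetical file name).
rmd_path <- file.path(tempdir(), "demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Knit to html only if rmarkdown and pandoc are available on this machine.
html_path <- NULL
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
}
```

If the knit step ran, html_path holds the location of the generated html file, which contains the formatted text followed by the chunk's code and its output.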

  18. Homework

     • You need to submit homework as a single pdf file.
     • Please use formatting to clearly identify each part of each question.
     • Make sure you put all the necessary library calls at the top of your file. You need to do this even if you have already loaded the package into your current R session.
     • If you have TeX installed on your system, RStudio can export directly to pdf.
     • If you do not have TeX installed on your system, then:
       1. Export to html format.
       2. Select "Open in Browser".
       3. In your browser, File > Print, and select pdf as the destination.
     • See the course website for more instructions on how to upload to Gradescope.

  19. Reading

     • Read: 18 Pipes | R for Data Science (skim whole chapter)
     • Read: 27 R Markdown | R for Data Science (skim whole chapter)
