Manage your analyses workflows with the drake R package
Grenoble R Users Group
Xavier Laviron
December 6, 2018

A data analyst’s job

Two options:
◮ Run everything from scratch (simple, but can take too long)
◮ Track the dependencies between your objects (tedious, and a perfect job for a pipeline toolkit)

The drake package is here to help you

Why use drake?
◮ Keeps track of dependencies in your code
◮ Keeps track of changes in your code
◮ Runs only what needs to be run, and skips the rest
◮ It has a cool name :-)

In other words, drake can save a lot of time!*
* more time for coffee breaks

drake tracks changes in functions

Encapsulate your code in functions:

    # Process the data
    process_data <- function(raw.data) {
      raw.data[raw.data$Sepal.Length > 5, ]
    }

    # Fit a model
    fit_model <- function(data) {
      lm(Sepal.Length ~ Petal.Width + Species, data = data)
    }

    # Create plots
    create_plot <- function(data) {
      ggplot(data, aes(x = Petal.Width, fill = Species)) +
        geom_histogram()
    }
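Because the steps are ordinary functions, they can be tried on their own before they go into a plan. The sketch below reuses two of the functions above on the built-in iris data set; using iris is an assumption made for illustration, since the slides read their data from a CSV file.

```r
# Same helper functions as on the slide, tried on the built-in iris data
# (assumption: the slides read their data from data/raw_data.csv instead).
process_data <- function(raw.data) {
  raw.data[raw.data$Sepal.Length > 5, ]
}

fit_model <- function(data) {
  lm(Sepal.Length ~ Petal.Width + Species, data = data)
}

proc.data <- process_data(iris)
model <- fit_model(proc.data)

# Quick sanity checks before handing the functions to drake
stopifnot(all(proc.data$Sepal.Length > 5))
print(class(model))
```

Checking each function in isolation like this keeps debugging separate from the pipeline itself.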

The plan

The central piece of drake: the workflow plan

The plan is a simple data.frame with two columns:
◮ target: the objects you want to build
◮ command: the code (typically a function call) that builds each of them

Different ways to create the plan:
◮ Like any data.frame: data.frame(), expand.grid(), ...
◮ With one of drake’s helper functions: drake_plan(), evaluate_plan(), ...
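As a rough illustration of the first option, a plan can be written by hand as a plain data.frame with commands stored as character strings. The name manual.plan and the use of iris are hypothetical, chosen only for this sketch.

```r
# Hypothetical plan built by hand as a plain data.frame.
# Each command is kept as a character string describing the call to run.
manual.plan <- data.frame(
  target  = c("proc.data", "model"),
  command = c("process_data(iris)", "fit_model(proc.data)"),
  stringsAsFactors = FALSE
)

print(manual.plan)
```

The helper drake_plan() produces the same kind of two-column table without the need to quote the commands.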

The drake_plan() function

Usage: drake_plan(target1 = command1, target2 = command2, ...)

    my.plan <- drake_plan(
      raw.data = read.csv(file_in("data/raw_data.csv")),
      proc.data = process_data(raw.data),
      plot = create_plot(proc.data),
      model = fit_model(proc.data),
      report = render(
        input = knitr_in("report.Rmd"),
        output_file = file_out("report.pdf"),
        quiet = TRUE
      )
    )

The drake_plan() function

    print(my.plan)
    ## # A tibble: 5 x 2
    ##   target    command
    ## * <chr>     <chr>
    ## 1 raw.data  read.csv(file_in('data/raw_data.csv'))
    ## 2 proc.data process_data(raw.data)
    ## 3 plot      create_plot(proc.data)
    ## 4 model     fit_model(proc.data)
    ## 5 report    "render(input = knitr_in('report.Rmd'), output_file = file_ou~

File dependencies

drake does not track files automatically; you have to declare them explicitly as dependencies:
◮ file_in("some_data.csv"): an input file
◮ file_out("some_data.Rds"): an output file
◮ knitr_in("report.Rmd"): an R Markdown file; drake scans it to find its dependencies

The dependency graph

    vis_drake_graph(drake_config(my.plan), from = "raw.data")

[Figure: dependency graph]

The make() command

The central command of drake: it runs everything that needs to run.

    make(my.plan)
    vis_drake_graph(drake_config(my.plan), from = "raw.data")

[Figure: dependency graph]

Accessing the objects

All objects are stored in a hidden cache (.drake/). To access them:

    loadd(model)            # load the target into the current environment
    model <- readd(model)   # or return its value and assign it yourself

    print(readd(model))
    ##
    ## Call:
    ## lm(formula = Sepal.Length ~ Petal.Width + Species, data = data)
    ##
    ## Coefficients:
    ##       (Intercept)        Petal.Width  Speciesversicolor
    ##           5.13118            0.65802           -0.01955
    ##  Speciesvirginica
    ##           0.15373

An update in the code!

What happens if we modify a function?

    create_plot <- function(data) {
      ggplot(data, aes(x = Petal.Width, y = Sepal.Width, fill = Species)) +
        geom_point()
    }

    vis_drake_graph(drake_config(my.plan), from = "raw.data")

[Figure: dependency graph]
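One way to picture how an edit to a function can be noticed is to compare a text fingerprint of its body before and after the change. This is only a toy sketch in base R, not drake's actual mechanism; the helper fingerprint() and the two create_plot variants are hypothetical and use base graphics so that the example has no dependencies.

```r
# Two versions of a plotting function (hypothetical, base graphics only)
create_plot_v1 <- function(data) hist(data$Petal.Width)
create_plot_v2 <- function(data) plot(data$Petal.Width, data$Sepal.Width)

# Toy fingerprint: the deparsed body of the function as a single string
fingerprint <- function(f) paste(deparse(body(f)), collapse = "\n")

# Unchanged code gives the same fingerprint, edited code a different one
unchanged <- identical(fingerprint(create_plot_v1), fingerprint(create_plot_v1))
changed   <- !identical(fingerprint(create_plot_v1), fingerprint(create_plot_v2))
print(c(unchanged = unchanged, changed = changed))
```

A target whose command calls a function with a new fingerprint would be treated as out of date, along with everything downstream of it.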

Other advantages

Reproducibility: you have proof of what has been done.

    make(my.plan)
    ## All targets are already up to date.

Other advantages

Independent replication is made easy:
◮ Your code is separated into functions: more readability and maintainability
◮ The plan allows an independent user to easily understand the analyses
◮ Restart everything from scratch easily:

    outdated(drake_config(my.plan))
    ## character(0)
    clean()
    outdated(drake_config(my.plan))
    ## [1] "model"     "plot"      "proc.data" "raw.data"  "report"

Parallelization

◮ drake can manage multi-core computing (on a local machine or an HPC cluster)
◮ Simply change the jobs argument of make():

    make(my.plan, jobs = 2)

◮ drake automatically works out which targets can be run in parallel and which cannot
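To see why some targets can be built side by side, it helps to group them into stages where each target only depends on earlier stages. The base R sketch below is not drake's scheduler. The deps list is a hypothetical, acyclic dependency list modelled on the plan in the slides, and it assumes that the report uses both the plot and the model.

```r
# Hypothetical dependency list mirroring the plan in the slides
# (assumption: report.Rmd uses both the plot and the model).
deps <- list(
  raw.data  = character(0),
  proc.data = "raw.data",
  plot      = "proc.data",
  model     = "proc.data",
  report    = c("plot", "model")
)

# Assign each target a stage: one more than the latest stage of its parents.
# Assumes the dependency list has no cycles.
stage <- setNames(rep(NA_integer_, length(deps)), names(deps))
while (anyNA(stage)) {
  for (target in names(deps)) {
    parents <- deps[[target]]
    if (is.na(stage[[target]]) && !anyNA(stage[parents])) {
      stage[[target]] <- if (length(parents) == 0) 1L else max(stage[parents]) + 1L
    }
  }
}

# Targets that share a stage do not depend on each other
print(split(names(stage), stage))
```

Here plot and model land in the same stage, so with jobs = 2 they are candidates for being built at the same time.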

Resources

To go further: https://github.com/ropensci/drake
◮ Online documentation
◮ Cheatsheet
◮ FAQ

The package is in active development and has many other features.

Exercises

drake ships with several built-in examples. You can list them with:

    drake_examples()

And then load one with:

    drake_example("example_name")

This creates a directory with all the necessary files, which you can open in the IDE of your choice (RStudio, vim, ...).

Exercise: The basic example

The most accessible example for beginners.

    drake_example("main")

Exercise: The mtcars example

This example is a walkthrough of drake’s main functionality. It sets up the mtcars project and runs it repeatedly to demonstrate drake’s most important features.

    drake_example("mtcars")

Exercise: An analysis of R package download trends

This example explores R package download trends using the cranlogs package, and it shows how drake’s custom triggers can help in workflows that rely on remote data sources.

    drake_example("packages")
