Managing Messes in Computational Notebooks. Andrew Head · Fred Hohman · Titus Barik · Steven M. Drucker · Robert DeLine. UC Berkeley · Georgia Tech · Microsoft Research
Computational Notebooks: Code, Text, and Output (figure labels: rich descriptions, code, output)
Notebook Programming Interfaces Abound
Notebook Model of Exploratory Programming 1. Incremental execution 2. In-situ output 3. Incremental changes 4. Control over layout
Notebook Model of Exploratory Programming 1. Incremental execution 2. In-situ output 3. Incremental changes 4. Control over layout. 1 WEEK PASSES
Notebook Model of Exploratory Programming 1. Incremental execution 2. In-situ output 3. Incremental changes 4. Control over layout. 1 WEEK LATER: 1. How did I produce this result? (callout: "Which petal_length?") 2. Didn't I have a better version of this? 3. What can I get rid of?
Messes in Computational Notebooks. Disappearance: deleted / overwritten code. Disorder: out-of-order execution. Dispersion: too many cells. Evidence: notebooks contain ugly code and dirty tricks [Rule et al. 2018]; out-of-order execution in 1/2 of notebooks on GitHub [Rule et al. 2018]; 31 / 41 surveyed participants had trouble finding prior analyses [Kery et al. 2018].
Managing Messes in Computational Notebooks. How can tools help analysts find, recover, and compare code in messy notebooks? Outline: [1] How messes happen · [*] Code gathering tools · [ ] Tools in context · [ ] Implementation · [ ] Qualitative usability study
CODE GATHERING TOOLS Demo. 1 WEEK PASSES
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. How did I produce this?
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. How did I produce this? (callouts: Variables, Outputs)
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. How did I produce this? Request cell subset that produced the result. 1 WEEK PASSES
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. How did I produce this? Request cell subset that produced the result. The gathered code is... • reduced • ordered • complete
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. Request cell subset that produced the result. Task 2: Comparing Versions. Didn't I have a better version of this?
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. Request cell subset that produced the result. Task 2: Comparing Versions. Didn't I have a better version of this? Open a version browser for a result. 1 WEEK PASSES
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. Request cell subset that produced the result. Task 2: Comparing Versions. Didn't I have a better version of this? Open a version browser for a result.
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. Request cell subset that produced the result. Task 2: Comparing Versions. Open a version browser for a result. Task 3: Cleaning Notebook. What code can I get rid of?
CODE GATHERING TOOLS Demo. Task 1: Recovering Code. Request cell subset that produced the result. Task 2: Comparing Versions. Open a version browser for a result. Task 3: Cleaning Notebook. What code can I get rid of? ... Request cell subset that produced the result.
CODE GATHERING TOOLS Demo. How can tools help analysts manage messes in their notebooks? Task 1: Recovering Code. Request cell subset that produced the result. Task 2: Comparing Versions. Open a version browser for a result. Task 3: Cleaning Notebook. ... Request cell subset that produced the result.
Post-Hoc Mess Management: helping analysts clean and navigate their code, whether or not they adopted a strategy to version or organize it.
Managing Messes in Computational Notebooks. How can tools help analysts find, recover, and compare code in messy notebooks? Outline: [1] How messes happen · [2] Code gathering tools · [3] Tools in context · [*] Implementation · [ ] Qualitative usability study
Implementation: Slicing Notebooks. 1. Notebook (some cells missing, some cells out-of-order) → ? → cleaned, ordered notebooks; versioned results
Implementation: Slicing Notebooks. 1. Notebook (some cells missing, some cells out-of-order). 2. Execution Log (all cells present, in-order, by execution time)
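The execution-log idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ExecutionLog`, `LoggedExecution`, `record`, and `versions_of` are hypothetical names, and the sketch assumes each cell has a stable identifier. Every run appends an immutable snapshot of the cell's source, so code that is later deleted or overwritten in the notebook remains available in execution order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LoggedExecution:
    """Immutable snapshot of one cell run (hypothetical record type)."""
    execution_count: int  # position in execution order, starting at 1
    cell_id: str          # assumed stable identifier for the cell
    source: str           # the cell's code at the moment it ran


@dataclass
class ExecutionLog:
    """Append-only log: all executed code is present, in execution order."""
    entries: List[LoggedExecution] = field(default_factory=list)

    def record(self, cell_id: str, source: str) -> LoggedExecution:
        # Snapshot the source so later edits or deletions cannot erase it.
        entry = LoggedExecution(len(self.entries) + 1, cell_id, source)
        self.entries.append(entry)
        return entry

    def versions_of(self, cell_id: str) -> List[LoggedExecution]:
        # Every logged run of one cell, oldest first.
        return [e for e in self.entries if e.cell_id == cell_id]
```

Because the log only ever grows, a cell that was run, edited, and run again leaves both versions behind, which is what makes recovering and comparing earlier code possible.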
Implementation: Slicing Notebooks. 1. Notebook (some cells missing, some cells out-of-order). 2. Execution Log (all cells present, in-order, by execution time). 3. Program Slices [Weiser '81]
Implementation: Slicing Notebooks. 1. Notebook (some cells missing, some cells out-of-order). 2. Execution Log (all cells present, in-order, by execution time). 3. Program Slices [Weiser '81], which can be used to make... cleaned, ordered notebooks (preserve cell boundaries and outputs); versioned results (slice all cell versions)
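The pipeline on this slide (execution log, then program slice) can be sketched at cell granularity. This is a simplified illustration rather than the authors' implementation: `defs_and_uses` and `backward_slice` are hypothetical helpers, the log is assumed to be a plain list of cell sources in execution order, and dependencies are tracked only by variable name using Python's standard `ast` module. It ignores mutation through method calls or subscript assignment, and it is coarser than the statement-level slicing of [Weiser '81].

```python
import ast


def defs_and_uses(source):
    """Return (names bound, names read) by one cell's source code."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            uses.add(node.target.id)  # `x += 1` also reads x
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defs.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
    return defs, uses


def backward_slice(log, target):
    """Indices of the log entries needed to reproduce entry `target`."""
    _, needed = defs_and_uses(log[target])
    kept = [target]
    # Walk back through execution order, keeping cells that define a needed name.
    for i in range(target - 1, -1, -1):
        defs, uses = defs_and_uses(log[i])
        if defs & needed:
            kept.append(i)
            # Names defined here are satisfied; names read here become needed.
            needed = (needed - defs) | uses
    return sorted(kept)


log = [
    "import math",
    "x = 2",
    "y = 10",             # unrelated to the target, so it is dropped
    "x = x + 1",
    "z = math.sqrt(x)",   # the result the analyst selected
    "y = 5",              # ran after the target, so it is never considered
]
print(backward_slice(log, 4))  # prints [0, 1, 3, 4]
```

The returned indices are a subset of the log (reduced), sorted by execution time (ordered), and include every cell the target depends on by name (complete, within the stated limits). Running the same function once per logged execution of a cell would give one slice per version, in the spirit of "slice all cell versions."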
Cleaning and Exploring Messy Notebooks: A Sample of Recent Research. Features shown: output recipes; artifact explorer; cell folding; tabbed browsing; cell version; diffs of cell versions. Papers: Interactions for Untangling Messy History in a Computational Notebook (Kery et al., VL/HCC '18); Towards Effective Foraging by Data Scientists to Find Past Analysis Choices (Kery et al., CHI '19); Aiding Collaborative Reuse of Computational Notebooks with Annotated Cell Folding (Rule et al., CSCW '18); Design and Use of Computational Notebooks (Rule, Ph.D. Thesis, '18)
Evaluating Code Gathering Tools. Q1. What is the meaning of "cleaning"? Q2. How do analysts use code gathering tools during exploratory data analysis?
A Qualitative Study of Gathering. Participants: N = 12 professional data analysts. Cleaning Task × 2: Clean a computational notebook, with and without code gathering tools. Exploration: Rank movies from a movies dataset. Use code gathering tools as you wish.
Q1. The Meaning of "Cleaning". Picking a subset of cells [P1-P12] ... and removing the rest [P8, P10-12]. "I picked a plot that looked interesting and, if you think of a dependency tree of cells, walked backwards and removed everything that wasn’t necessary." ... And many additional stages: merging cells [P11]; writing documentation [P1, P5, P7, P10, P11]; polishing visualizations [P3, P4, P6, P12]; restructuring code [P1, P6]; integrating with version control [P7].
Q2. How do analysts use code gathering tools during exploratory data analysis? [Bar chart: usefulness ratings (very useful, somewhat useful, not useful, no basis to answer) for gathering to a notebook, highlighting dependencies, and the version browser; x-axis: # participants, 0 to 12.] Participants described gathering to a notebook as "beautiful" and "amazing": it "hits the nail on the head."
Some Observed Uses of Gathering Tools: gathering for multiple audiences; "finishing moves"; creating personal references; lightweight branching