Whats the Problem? What does it mean to collect provenance when you - PowerPoint PPT Presentation

Provenance in the Wild
Peter Macko, Margo Seltzer
June 14, 2012

What’s the Problem?
• What does it mean to collect provenance when you don’t control:
  – The data (types, format, organization, structure)
  – The operators
  – The environment in which it’s processed
• Can you impose semantic meaning on provenance, or extract any from it, when it’s collected by a herd of cats?
Image: http://www.newsrealblog.com/wp-content/uploads/2011/04/Herding-Cats.jpg

What do the Cats do?
• They use data in arbitrary formats
  – Flat files
  – Unstructured, semi-structured, badly-structured
  – Proprietary formats
  – They cram twelve different kinds of data into a single container.
• Transformations are arbitrary code
  – Pick your favorite Turing-complete language.
  – Apply said language to data.
  – Transformations can depend on the environment.
  – Repeat.
• They move data around
  – Download objects from the web
  – Copy, rename objects
  – Replace objects
• They install new software
  – New programs
  – New libraries
  – New compilers

A Proposed Architecture
[Diagram, top to bottom]
• Applications, in multiple languages (including command-line use)
• Language adapters: Python, Perl, R, Java, C
• Provenance Library (C++)
• Database adapters: DB adapters, ODBC driver, SPARQL/RDF adapter
• Provenance Store, with multiple implementations: HBase, Riak, BDB, MySQL, PostgreSQL, 4store
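The layering in the diagram can be illustrated with a minimal Python sketch. It is not the library's code: `ProvenanceStore`, `InMemoryStore`, `ProvenanceLibrary`, and all method names are hypothetical, chosen only to show how a core library might stay independent of the storage backend behind it.

```python
from abc import ABC, abstractmethod


class ProvenanceStore(ABC):
    """Hypothetical storage-adapter interface: one subclass per backend."""

    @abstractmethod
    def put_object(self, key, record):
        ...

    @abstractmethod
    def put_edge(self, src, dst, kind):
        ...


class InMemoryStore(ProvenanceStore):
    """Stand-in backend; a real adapter would talk to a database instead."""

    def __init__(self):
        self.objects = {}
        self.edges = []

    def put_object(self, key, record):
        self.objects[key] = record

    def put_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))


class ProvenanceLibrary:
    """Hypothetical core: language adapters call this, and it calls the store."""

    def __init__(self, store):
        self._store = store

    def disclose_object(self, namespace, name, obj_type):
        key = (namespace, name, obj_type)
        self._store.put_object(key, {"type": obj_type})
        return key

    def disclose_flow(self, src, dst, kind="data"):
        self._store.put_edge(src, dst, kind)


# Swapping the backend only changes the object passed to ProvenanceLibrary.
store = InMemoryStore()
lib = ProvenanceLibrary(store)
raw = lib.disclose_object("demo", "input.csv", "file")
out = lib.disclose_object("demo", "result.csv", "file")
lib.disclose_flow(raw, out)
```

The application code above never names a database, which is the property the adapter layers are meant to provide.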

Why do we think this is a good idea?
• Heterogeneous environments are the norm.
• Provenance must span those environments.
• Users and/or applications can:
  – create connections that are implicit or unobservable by software systems.
  – integrate both static and dynamic dependencies.
Bring provenance to the users rather than the users to the provenance.

Basic Use Model
• Connect to the library: cpl_attach
• Disclose provenance
  – Create/look up objects: cpl_create_object, cpl_lookup_object
  – Disclose data flow: cpl_data_flow
  – Disclose control flow: cpl_control_flow
  – Add properties to objects: cpl_add_property
• Disconnect from the library: cpl_detach
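The call sequence above can be walked through with a toy, in-memory stand-in. The slide gives only the function names, so the class `ToyProvenanceSession`, its method signatures, and its argument order are assumptions made for illustration; they are not the library's actual API.

```python
class ToyProvenanceSession:
    """In-memory stand-in for a provenance library session (hypothetical)."""

    def __init__(self):
        self.attached = False
        self.objects = {}  # (namespace, name, type) -> properties dict
        self.edges = []    # (source key, destination key, kind)

    def _require_attached(self):
        if not self.attached:
            raise RuntimeError("attach() must be called first")

    def attach(self):
        self.attached = True

    def detach(self):
        self.attached = False

    def create_object(self, namespace, name, obj_type):
        self._require_attached()
        key = (namespace, name, obj_type)
        self.objects.setdefault(key, {})
        return key

    def lookup_object(self, namespace, name, obj_type):
        self._require_attached()
        key = (namespace, name, obj_type)
        return key if key in self.objects else None

    def data_flow(self, dest, source):
        self._require_attached()
        self.edges.append((source, dest, "data"))

    def control_flow(self, dest, source):
        self._require_attached()
        self.edges.append((source, dest, "control"))

    def add_property(self, obj, key, value):
        self._require_attached()
        self.objects[obj][key] = value


session = ToyProvenanceSession()
session.attach()                                             # connect
script = session.create_object("demo", "analyze.py", "process")
table = session.create_object("demo", "table.csv", "file")
plot = session.create_object("demo", "plot.png", "file")
session.data_flow(dest=script, source=table)                 # script read the table
session.data_flow(dest=plot, source=script)                  # script wrote the plot
session.add_property(plot, "format", "png")
found = session.lookup_object("demo", "plot.png", "file")
session.detach()                                             # disconnect
```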

Naming
• The goal is to allow interoperability with minimal coordination.
• Objects are identified by three parameters, plus a version:
  – Namespace: the application or system component that “owns” the object. Examples: the OS, a specific database, a workflow engine or application, or a project.
  – Name: local name (unique within a namespace)
  – Type: file, process, or namespace-specific type
  – Version: the cycle-avoidance algorithm creates versions
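The naming scheme can be sketched as a three-part identity plus a version counter. The slide does not spell out the cycle-avoidance algorithm, so the policy below is an assumption: create a new version of the destination whenever a new flow would otherwise close a cycle. `ObjectId` and `VersionedGraph` are hypothetical names.

```python
from typing import NamedTuple


class ObjectId(NamedTuple):
    """The three identifying parameters from the slide."""
    namespace: str
    name: str
    type: str


class VersionedGraph:
    """Toy provenance graph whose nodes are (ObjectId, version) pairs."""

    def __init__(self):
        self.current = {}   # ObjectId -> latest version number
        self.flows_to = {}  # (ObjectId, version) -> set of nodes it flows into

    def node(self, identity):
        return (identity, self.current.setdefault(identity, 0))

    def _reaches(self, start, target):
        # Depth-first search along flow edges.
        stack, seen = [start], set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(self.flows_to.get(n, ()))
        return False

    def add_flow(self, source, dest):
        src, dst = self.node(source), self.node(dest)
        if self._reaches(dst, src):
            # Assumed policy: the edge src -> dst would close a cycle,
            # so start a new version of dest and link the old one to it.
            new_dst = (dest, dst[1] + 1)
            self.current[dest] = new_dst[1]
            self.flows_to.setdefault(dst, set()).add(new_dst)
            dst = new_dst
        self.flows_to.setdefault(src, set()).add(dst)
        return dst


data = ObjectId("OS", "/tmp/data.csv", "file")
proc = ObjectId("OS", "clean.py (pid 42)", "process")
graph = VersionedGraph()
first = graph.add_flow(data, proc)   # the process reads the file
second = graph.add_flow(proc, data)  # the process writes the file back
```

In this example the write-back lands on version 1 of the file rather than version 0, so the graph stays acyclic.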

Additional Automatic Capture
• Capture the MAC address of the machine on which each object is created, so that provenance can be transmitted across a network and still be identified.
• Capture provenance of provenance
  – Ties provenance to a specific instance of an application (e.g., a process).
  – Results in capture of command-line arguments (e.g., the size of the Java heap).
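As a rough illustration of this kind of automatic capture, the Python sketch below collects a MAC address, a process ID, and the command line using only the standard library. The function name `capture_session_context` and the shape of the returned record are assumptions; the slide does not describe how the library gathers or stores this information.

```python
import os
import sys
import uuid


def capture_session_context():
    """Collect identifying context for the current process (hypothetical record)."""
    # uuid.getnode() returns the hardware address as a 48-bit integer; if none
    # can be determined, Python falls back to a random 48-bit number.
    mac = uuid.getnode()
    mac_text = ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return {
        "mac_address": mac_text,          # identifies the originating machine
        "pid": os.getpid(),               # ties provenance to this process instance
        "command_line": list(sys.argv),   # arguments such as heap-size flags
    }


context = capture_session_context()
```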

Use Case: GraphDB Bench
• A benchmark suite (and lots of experiments) to evaluate the absolute and relative performance of graph databases.
• Instrument the flow from the graph database to the benchmark operators to the results.
• Modifications: 270 lines of code (out of 7500 total)
  – Most of it is cut and paste
• Result: every CSV result file has provenance indicating which operations were run, what the source database was, etc.
• Helped us debug the benchmark suite, identify missing benchmark results, etc.
• Integration with scripts led us to develop a command-line tool to track directory creation, file copies, etc.
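The idea behind the command-line tool in the last bullet can be sketched as thin wrappers that perform a file operation and record it at the same time. The helpers `tracked_mkdir` and `tracked_copy` and the record format are hypothetical; the slide says only that the tool tracks directory creation and file copies.

```python
import os
import shutil
import tempfile


def tracked_mkdir(path, log):
    """Create a directory and record the event."""
    os.makedirs(path, exist_ok=True)
    log.append(("mkdir", os.path.abspath(path)))
    return path


def tracked_copy(src, dst, log):
    """Copy a file and record a data flow from src to dst."""
    shutil.copy2(src, dst)
    log.append(("copy", os.path.abspath(src), os.path.abspath(dst)))
    return dst


log = []
with tempfile.TemporaryDirectory() as workdir:
    results_dir = tracked_mkdir(os.path.join(workdir, "results"), log)
    source = os.path.join(workdir, "run1.csv")
    with open(source, "w") as f:
        f.write("operation,millis\n")
    copied = tracked_copy(source, os.path.join(results_dir, "run1.csv"), log)
    copied_ok = os.path.exists(copied)
```

A real tool would send these records to the provenance library instead of appending them to a list.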

Discussion
• Won’t this free-for-all lead to semantically meaningless provenance?
  – Some provenance is better than no provenance.
  – Users and application developers who care are likely to provide more semantically meaningful provenance than is available from less flexible systems.
• What do you do about missing provenance?
  – Some provenance is better than no provenance.
  – “Downstream” applications can connect upstream to bypass provenance-oblivious applications.
• Bottom line: we make rope. We make it possible to have provenance without requiring that analysts or programmers use specific languages or tools.
