A graph model for data and workflow provenance

Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan Van den Bussche, & Stijn Vansummeren. TaPP 2010.


  1. A graph model for data and workflow provenance • Umut Acar, Peter Buneman, James Cheney, Natalia Kwasnikowska, Jan Van den Bussche, & Stijn Vansummeren • TaPP 2010

  2. Provenance in ... • Databases: mainly for the (nested) relational model; where-provenance ("source location"); lineage, why ("witnesses"); how/semiring model; relatively formal • Workflows: many different systems; many different models (converging on OPM?); graphs/DAGs; relatively informal

  3. Provenance in ... • Databases: mainly for the (nested) relational model; where-provenance ("source location"); lineage, why ("witnesses"); how/semiring model; relatively formal • ????? • Workflows: many different systems; many different models (converging on OPM?); graphs/DAGs; relatively informal

  4. This talk • Relate database & workflow "styles" • Develop a common graph formalism • Need a common, expressive language that • supports many database queries • describes some (simple) workflows

  5. Previous work • Dataflow calculus (DFL), based on nested relational calculus (NRC) • Provenance "run" model by Kwasnikowska & Van den Bussche (DILS 07, IPAW 08) • "Provenance trace" model for NRC (Acar, Ahmed & C. '08) • Open Provenance Model (bipartite graphs) (Moreau et al. 2008-9), used in many WF systems

  6. NRC/DFL background • A very simple, functional language: • basic functions +, *, ... & constants 0, 1, 2, 3, ... • variables x, y, z • pair/record types (A:e, ..., B:e), π_A(e) • collection (set) types: {e, ...}, e ∪ e, {e | x in e'}, ∪ e

  7. An example

  8. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)}

  9. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)} • sum { x * y | (x,y,z) in R, x < y}

  10. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)} • sum { x * y | (x,y,z) in R, x < y} = sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}}

  11. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)} • sum { x * y | (x,y,z) in R, x < y} = sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}} = sum {1 * 2, 4 * 5}

  12. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)} • sum { x * y | (x,y,z) in R, x < y} = sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}} = sum {1 * 2, 4 * 5} = sum {2, 20}

  13. An example • Suppose R = {(1,2,3), (4,5,6), (9,8,7)} • sum { x * y | (x,y,z) in R, x < y} = sum { x * y | (x,y,z) in {(1,2,3), (4,5,6)}} = sum {1 * 2, 4 * 5} = sum {2, 20} = 22
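
The evaluation steps on this slide can be replayed with an ordinary set comprehension. This is only a rough Python analogue of the NRC expression, not DFL itself; the names `R`, `products` and `total` are chosen here for illustration.

```python
# R is the relation from the slide; each tuple is (x, y, z).
R = {(1, 2, 3), (4, 5, 6), (9, 8, 7)}

# Keep only the tuples with x < y, then form the set of products x * y.
# (9, 8, 7) is filtered out because 9 < 8 is false.
products = {x * y for (x, y, z) in R if x < y}

# Aggregate the set of products.
total = sum(products)
print(total)  # 22
```

Because `products` is a set, two tuples yielding the same product would contribute only once, which matches the set-based reading of the comprehension on the slide.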

  14. Another example • In DFL, built-in functions / constants can be whole programs & files, • as in the Provenance Challenge 1 workflow: let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in let Reslices := {reslice(wp) | wp in WarpParams} in softmean(Reslices)
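
The shape of this workflow can be sketched in Python as two set comprehensions followed by an aggregation. The three functions below are hypothetical stand-ins that only tag their arguments; the real `align_warp`, `reslice` and `softmean` are external programs, and the input file names are invented for the sketch.

```python
# Hypothetical stubs: each "program" just wraps its inputs in a tagged tuple.
def align_warp(img, hdr):
    return ("warp", img, hdr)

def reslice(wp):
    return ("resliced", wp)

def softmean(reslices):
    # frozenset makes the result hashable and order-independent.
    return ("atlas", frozenset(reslices))

# Assumed inputs: a set of (image, header) pairs.
inputs = {("img1", "hdr1"), ("img2", "hdr2")}

# let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs}
warp_params = {align_warp(img, hdr) for (img, hdr) in inputs}

# let Reslices := {reslice(wp) | wp in WarpParams}
reslices = {reslice(wp) for wp in warp_params}

# softmean(Reslices)
atlas = softmean(reslices)
print(atlas[0], len(atlas[1]))
```

Since each stub keeps its arguments, the nested result already records which inputs fed which step, which is a crude hint at the kind of structure a provenance graph makes explicit.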

  15. Goal: Define "provenance graphs" for DFL

  16. Goal: Define "provenance graphs" for DFL • let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in let Reslices := {reslice(wp) | wp in WarpParams} in softmean(Reslices)

  17. Goal: Define "provenance graphs" for DFL • let WarpParams := {align_warp(img,hdr) | (img,hdr) in Inputs} in let Reslices := {reslice(wp) | wp in WarpParams} in softmean(Reslices) • http://www.flickr.com/photos/schneertz/679692806/

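  18. First step: values • [Figure: a value node v is a constant c, or a copy of another value v, or a set node {} with elem edges to values v ... v, or a record node <> with edges A_1 ... A_n to values v ... v]

  19. Example value • [Figure: a set node {} with two elem edges to record nodes <>, whose A and B edges lead to the constants 1, 2 and 3]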
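
One way to picture value nodes is as a small labeled graph. The following is a minimal sketch, not the formal definition from the talk: the `Node` class and the helpers `const`, `record`, `set_of`, `copy_of` and `to_value` are names invented here, and the example value is an assumption about what the figure shows.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # eq=False: nodes are compared by identity, so sharing is visible
class Node:
    kind: str                 # "const", "copy", "set" or "record"
    payload: object = None    # the constant, for "const" nodes
    edges: list = field(default_factory=list)  # (edge label, target Node) pairs

def const(c):
    return Node("const", c)

def copy_of(target):
    return Node("copy", edges=[("copy", target)])

def set_of(*elems):
    return Node("set", edges=[("elem", n) for n in elems])

def record(**fields):
    return Node("record", edges=list(fields.items()))

def to_value(node):
    """Read back the plain value that a value node denotes."""
    if node.kind == "const":
        return node.payload
    if node.kind == "copy":
        return to_value(node.edges[0][1])
    if node.kind == "set":
        return frozenset(to_value(t) for _, t in node.edges)
    # Records become sorted tuples of (field, value) pairs so they are hashable.
    return tuple(sorted((a, to_value(t)) for a, t in node.edges))

# Assumed example: {<A:1, B:2>, <A:2, B:3>}, with the node for 2 shared.
one, two, three = const(1), const(2), const(3)
example = set_of(record(A=one, B=two), record(A=two, B=three))
print(to_value(example))
```

Comparing nodes by identity is a deliberate choice in this sketch: two records can point at the very same node for 2, which a plain nested value cannot express.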

  20. Next step: evaluation nodes ("process") • Constants, primitive functions: c; f with edges 1 ... n to e ... e • Variables & temporary bindings: x; let x with head and body edges to e

  21. Pairing • Record building: <> with edges A_1 ... A_n to e ... e • Field lookup: π_A with an edge to e

  22. Conditionals • [Figure: an if node with a test edge to e and either a then edge or an else edge to e] • Note: only the taken branch is recorded
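
The note about recording only the taken branch can be illustrated with a toy evaluator that returns a value together with a trace. This is a sketch under assumptions: expressions are modeled as zero-argument callables, and `const`, `if_` and the tuple-shaped traces are inventions of this example rather than the talk's formalism.

```python
def const(c):
    # A constant expression: evaluates to c, with a one-node trace.
    return lambda: (c, ("const", c))

def if_(test, then_branch, else_branch):
    # A conditional expression. Only the branch that runs contributes a trace.
    def run():
        t_val, t_trace = test()
        if t_val:
            v, tr = then_branch()
            return v, ("if", ("test", t_trace), ("then", tr))
        v, tr = else_branch()
        return v, ("if", ("test", t_trace), ("else", tr))
    return run

value, trace = if_(const(True), const(1), const(2))()
print(value)
print(trace)
```

Here the untaken branch `const(2)` is never evaluated, so nothing about it appears in the trace.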

  23. Sets: basic operations • Empty set: ∅ • Singleton: {} with an edge to e • Union: ∪ with edges 1, 2 to e, e

  24. Sets: complex operations • Flattening: ∪ with an edge to e • Iteration: for x with a head edge to e and body edges to e ... e
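
On plain values (ignoring the graph structure), flattening and iteration behave roughly like the two helpers below. This is a sketch with names chosen here (`flatten`, `for_each`); `frozenset` is used so that sets can be members of other sets.

```python
def flatten(set_of_sets):
    # Flattening: the union of all member sets.
    return frozenset().union(*set_of_sets)

def for_each(source, body):
    # Iteration {body(x) | x in source}: one body evaluation per element.
    return frozenset(body(x) for x in source)

# Each element x of the head set yields the set {x, x + 1} ...
nested = for_each(frozenset({1, 2, 3}), lambda x: frozenset({x, x + 1}))

# ... and flattening merges those body results into a single set.
flat = flatten(nested)
print(sorted(flat))  # [1, 2, 3, 4]
```

The iteration node in the figure has one body edge per element of the head set, which corresponds to one call of `body` per element in this sketch.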

  25. Provenance graphs • are graphs with "both value and evaluation structure" • [Figure: example provenance graph]

  26. A bigger example • [Figure: provenance graph for a larger example]

  27. Value structure • [Figure: the same provenance graph]

  28. Value structure • [Figure: the same provenance graph with its value nodes highlighted; labels include 1, 2, T, F, C, {} and <>]

  29. Input values • [Figure: the same provenance graph with the input values highlighted]

  30. Return value • [Figure: the same provenance graph with the return value highlighted]

  31. Expression structure • [Figure: the same provenance graph]

  32. Expression structure • [Figure: the same provenance graph with its evaluation nodes highlighted; labels include fst, snd, =, empty, if, let R, let S, for x, for y, U, {}, +, R, s, x and y]

  33. Building provenance graphs • is complicated • Here we'll use a high-level "graph rewrite rule" formalism • Mostly because it is nicer to look at than the formal version

  34. [Figure: rewrite rules for constants c, primitive function applications f(v_1, ..., v_n), let x bindings (head, body) and variables x; the let and variable rules introduce copy edges]

  35. [Figure: rewrite rules for record construction <> with fields A_1 ... A_n over values v_1 ... v_n, and for field lookup π_Ai, whose result is a copy of v_i]

  36. [Figure: rewrite rules for conditionals: if the test is True, the result is a copy of the then branch e_1; if the test is False, the result is a copy of the else branch e_2]

  37. [Figure: rewrite rules for empty?: False on a set {} with elem edges to values v ... v, True on a set {} with no elements]

  38. [Figure: rewrite rules for the empty set ∅, the singleton {} with an elem edge to v, and union ∪ over two sets {} with elem edges to their elements]

  39. OK, take a deep breath!
