
DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations

Rajeev Alur, Loris D’Antoni, Mukund Raghothaman. POPL 2015.


  1. DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations. Rajeev Alur, Loris D’Antoni, Mukund Raghothaman. POPL 2015.

  2. DReX is a DSL for String Transformations
     Example: align-bibtex
     Input:
         ...
         @book{Book1,
           title = {Title0},
           author = {Author1},
           year = {Year1},
         }
         @book{Book2,
           title = {Title1},
           author = {Author2},
           year = {Year2},
         }
         ...
     Output:
         ...
         @book{Book1,
           title = {Title1},
           author = {Author1},
           year = {Year1},
         }
         ...

  3. Describing align-bibtex Using DReX
     The simpler issue of make-entry: given two entries, Entry 1 and Entry 2, make-entry outputs the title of Entry 2 and the remaining body of Entry 1.
     [Diagram: Entry 1 contributes all but its title; Entry 2 contributes its title only.]

  4. Describing align-bibtex Using DReX
     align-bibtex = chain(make-entry, R_Entry)
     [Diagram: the input Entry 1, Entry 2, Entry 3, ..., Entry k−1, Entry k is covered by the overlapping applications make-entry(Entry 1 Entry 2), make-entry(Entry 2 Entry 3), make-entry(Entry 3 Entry 4), ..., make-entry(Entry k−1 Entry k).]
     Function combinators, such as chain, combine smaller functions into bigger ones.

  5. Why DReX?
     - DReX is declarative
         Languages, Σ* → bool ≡ Regular expressions
         Transformations, Σ* → Γ* ≡ DReX
     - DReX is fast: streaming evaluation algorithm for well-typed expressions
     - Based on robust theoretical foundations
         - Expressively equivalent to regular string transformations
         - Multiple characterizations: two-way finite state transducers, MSO-definable graph transformations, streaming string transducers
         - Closed under various operations: function composition, regular look-ahead, etc.
     - DReX supports algorithmic analysis
         - Is the transformation well-defined for all inputs?
         - Does the output always have some “nice” property? ∀σ, is it the case that f(σ) ∈ L?
         - Are two transformations equivalent?

  6. DReX is publicly available! Go to drexonline.com

  7. Function Combinators

  8. Base functions: σ ↦ γ
     Map the input string σ to γ; undefined everywhere else.
     Example: “.c” ↦ “.cpp”
     σ ∈ Σ* and γ ∈ Γ* are constant strings.
     Analogue of basic regular expressions: {σ}, for σ ∈ Σ*
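As a rough illustration of the base function, a transformation can be modelled as a partial function on strings. The sketch below is an assumption made for illustration, not DReX syntax: `Transform` and `base` are hypothetical names, and `None` stands for "undefined".

```python
from typing import Callable, Optional

# A string transformation modelled as a partial function:
# None means the transformation is undefined on that input.
Transform = Callable[[str], Optional[str]]


def base(sigma: str, gamma: str) -> Transform:
    """sigma ↦ gamma: defined only on the exact input string sigma."""
    return lambda s: gamma if s == sigma else None


c_to_cpp = base(".c", ".cpp")
print(c_to_cpp(".c"))   # the one input in the domain
print(c_to_cpp(".h"))   # outside the domain, so None
```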

  9. Conditionals: try f else g
     If f(σ) is defined, then output f(σ), and otherwise output g(σ).
     Example: try [0-9]* ↦ “Number” else [a-z]* ↦ “Name”
     Analogue of unambiguous regex union
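The conditional can be sketched in the same partial-function style. Here `const_on` is a hypothetical helper (not part of DReX) that maps every string fully matching a regular expression to a constant, which mirrors the slide's example.

```python
import re
from typing import Callable, Optional

Transform = Callable[[str], Optional[str]]  # None means "undefined"


def const_on(pattern: str, gamma: str) -> Transform:
    """Hypothetical helper: map every string fully matching `pattern` to gamma."""
    compiled = re.compile(pattern)
    return lambda s: gamma if compiled.fullmatch(s) else None


def try_else(f: Transform, g: Transform) -> Transform:
    """try f else g: use f where it is defined, otherwise fall back to g."""
    def h(s: str) -> Optional[str]:
        out = f(s)
        return out if out is not None else g(s)
    return h


classify = try_else(const_on(r"[0-9]*", "Number"), const_on(r"[a-z]*", "Name"))
print(classify("2015"), classify("popl"), classify("popl2015"))
```

Note that both patterns in the slide's example accept the empty string; in this sketch the first branch simply wins on that input.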

  10. Split sum: split(f, g)
     Split σ into σ = σ₁σ₂ with both f(σ₁) and g(σ₂) defined. If the split is unambiguous, then split(f, g)(σ) = f(σ₁) g(σ₂).
     [Diagram: f maps σ₁ to f(σ₁) and g maps σ₂ to g(σ₂).]
     - Analogue of regex concatenation
     - If title maps a BibTeX entry to its title, and body maps a BibTeX entry to the rest of its body, then make-entry = split(body, title)
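A brute-force reading of the split sum tries every cut point and accepts only when exactly one works. This is a sketch of the semantics only, not the streaming evaluator; `base` and `keep` are hypothetical helpers introduced for the example.

```python
import re
from typing import Callable, Optional

Transform = Callable[[str], Optional[str]]  # None means "undefined"


def base(sigma: str, gamma: str) -> Transform:
    return lambda s: gamma if s == sigma else None


def keep(pattern: str) -> Transform:
    """Hypothetical helper: identity on strings fully matching `pattern`."""
    compiled = re.compile(pattern)
    return lambda s: s if compiled.fullmatch(s) else None


def split(f: Transform, g: Transform) -> Transform:
    """split(f, g): defined only when exactly one cut s = s1 s2 has f(s1), g(s2) defined."""
    def h(s: str) -> Optional[str]:
        outputs = []
        for i in range(len(s) + 1):
            left, right = f(s[:i]), g(s[i:])
            if left is not None and right is not None:
                outputs.append(left + right)
        return outputs[0] if len(outputs) == 1 else None
    return h


rename = split(keep(r"[a-z]+"), base(".c", ".cpp"))
print(rename("main.c"))                            # main.cpp
print(split(keep(r"a*"), keep(r"a*"))("aa"))       # ambiguous split, so None
```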

  11. Iterated sum: iterate(f)
     Split σ = σ₁σ₂...σₖ, with all f(σᵢ) defined. If the split is unambiguous, then output f(σ₁) f(σ₂) ... f(σₖ).
     [Diagram: f is applied to each of σ₁, σ₂, ..., σₖ, and the outputs f(σ₁), f(σ₂), ..., f(σₖ) are concatenated in the same order.]
     - Analogue of Kleene-*
     - If echo echoes a single character, then id = iterate(echo) is the identity function
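The iterated sum can be sketched with a small dynamic program that counts decompositions of each prefix and only answers when the whole input has exactly one. The sketch assumes pieces are non-empty (an assumption made here to keep the count finite) and is not the streaming algorithm.

```python
from typing import Callable, Optional

Transform = Callable[[str], Optional[str]]  # None means "undefined"


def iterate(f: Transform) -> Transform:
    """iterate(f): apply f to each piece of the unique decomposition of the input."""
    def h(s: str) -> Optional[str]:
        n = len(s)
        ways = [0] * (n + 1)   # ways[j]: number of decompositions of s[:j], capped at 2
        out = [""] * (n + 1)   # out[j]: output of one decomposition of s[:j]
        ways[0] = 1            # the empty prefix has the empty decomposition
        for j in range(1, n + 1):
            for i in range(j):                 # candidate last piece s[i:j], non-empty
                piece = f(s[i:j])
                if ways[i] and piece is not None:
                    ways[j] = min(2, ways[j] + ways[i])
                    out[j] = out[i] + piece
        return out[n] if ways[n] == 1 else None
    return h


def echo(s: str) -> Optional[str]:
    """Echo a single character; undefined on anything else."""
    return s if len(s) == 1 else None


def a_run(s: str) -> Optional[str]:
    """Identity on non-empty runs of 'a' (used to show an ambiguous iteration)."""
    return s if s and set(s) == {"a"} else None


identity = iterate(echo)
print(identity("drex"))          # drex
print(iterate(a_run)("aa"))      # "aa" and "a","a" both work, so None
```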

  12. Left-iterated sum: left-iterate(f)
     Split σ = σ₁σ₂...σₖ, with all f(σᵢ) defined. If the split is unambiguous, then output f(σₖ) f(σₖ₋₁) ... f(σ₁).
     [Diagram: the outputs f(σₖ), f(σₖ₋₁), ..., f(σ₁) are concatenated in reverse order of the pieces.]
     Think of string reversal: left-iterate(echo)
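Left iteration differs from iteration only in the order in which the piece outputs are joined. The sketch below enumerates decompositions by backtracking (exponential in the worst case, so for illustration only) and again assumes non-empty pieces; `keep` is a hypothetical helper.

```python
import itertools
import re
from typing import Callable, Iterator, List, Optional

Transform = Callable[[str], Optional[str]]  # None means "undefined"


def decompositions(f: Transform, s: str) -> Iterator[List[str]]:
    """Yield, for every way of cutting s into non-empty pieces in Dom(f), the piece outputs."""
    if s == "":
        yield []
        return
    for i in range(1, len(s) + 1):
        head = f(s[:i])
        if head is not None:
            for rest in decompositions(f, s[i:]):
                yield [head] + rest


def left_iterate(f: Transform) -> Transform:
    """left-iterate(f): like iterate(f), but the outputs are joined last piece first."""
    def h(s: str) -> Optional[str]:
        found = list(itertools.islice(decompositions(f, s), 2))
        if len(found) != 1:        # no decomposition, or more than one
            return None
        return "".join(reversed(found[0]))
    return h


def echo(s: str) -> Optional[str]:
    return s if len(s) == 1 else None


def keep(pattern: str) -> Transform:
    """Hypothetical helper: identity on strings fully matching `pattern`."""
    compiled = re.compile(pattern)
    return lambda s: s if compiled.fullmatch(s) else None


reverse = left_iterate(echo)
reverse_words = left_iterate(keep(r"[a-z]+ "))
print(reverse("drex"))                 # xerd
print(reverse_words("ab cd ef "))      # words in reverse order
```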

  13. “Repeated” sum: combine(f, g)
     combine(f, g)(σ) = f(σ) g(σ)
     [Diagram: f and g both read the whole input σ; the outputs f(σ) and g(σ) are concatenated.]
     - No regex equivalent
     - σ ↦ σσ: combine(id, id)
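In the partial-function style, combine is a two-line definition. The sketch assumes, as seems natural, that the result is undefined whenever either component is undefined.

```python
from typing import Callable, Optional

Transform = Callable[[str], Optional[str]]  # None means "undefined"


def combine(f: Transform, g: Transform) -> Transform:
    """combine(f, g): run both f and g on the whole input and concatenate the outputs."""
    def h(s: str) -> Optional[str]:
        left, right = f(s), g(s)
        if left is None or right is None:
            return None
        return left + right
    return h


def identity(s: str) -> Optional[str]:
    return s


def never(s: str) -> Optional[str]:
    """The everywhere-undefined function, written ⊥ on the slides."""
    return None


duplicate = combine(identity, identity)
print(duplicate("ab"))   # abab
```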

  14. Chained sum: chain(f, R)
     Split σ = σ₁σ₂...σₖ with σ₁ ∈ L(R), σ₂ ∈ L(R), σ₃ ∈ L(R), ..., σₖ ∈ L(R); output f(σ₁σ₂) f(σ₂σ₃) f(σ₃σ₄) ... f(σₖ₋₁σₖ).
     And similarly for left-chain(f, R)
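The chained sum can be sketched by first finding the unique decomposition into pieces of L(R) and then applying f to each adjacent pair. Several things below are assumptions made for illustration: the toy entry format `title/author;`, the helper names, and the choice to return `None` when there are fewer than two pieces (the slides do not say what happens in that case).

```python
import itertools
import re
from typing import Callable, Iterator, List, Optional

Transform = Callable[[str], Optional[str]]  # None means "undefined"


def cuts(pattern: "re.Pattern[str]", s: str) -> Iterator[List[str]]:
    """Yield every way of cutting s into non-empty pieces that fully match `pattern`."""
    if s == "":
        yield []
        return
    for i in range(1, len(s) + 1):
        if pattern.fullmatch(s[:i]):
            for rest in cuts(pattern, s[i:]):
                yield [s[:i]] + rest


def chain(f: Transform, regex: str) -> Transform:
    """chain(f, R): apply f to each adjacent pair of pieces of the unique R-decomposition."""
    pattern = re.compile(regex)

    def h(s: str) -> Optional[str]:
        found = list(itertools.islice(cuts(pattern, s), 2))
        # Assumption of this sketch: undefined unless there is exactly one
        # decomposition and it has at least two pieces.
        if len(found) != 1 or len(found[0]) < 2:
            return None
        pieces = found[0]
        outputs = [f(a + b) for a, b in zip(pieces, pieces[1:])]
        if any(o is None for o in outputs):
            return None
        return "".join(outputs)
    return h


# Toy stand-in for BibTeX entries: "title/author;".
ENTRY = r"[a-z0-9]+/[a-z0-9]+;"
PAIR = re.compile(r"([a-z0-9]+)/([a-z0-9]+);([a-z0-9]+)/([a-z0-9]+);")


def make_entry(pair: str) -> Optional[str]:
    """Title of the second entry combined with the rest of the first entry."""
    m = PAIR.fullmatch(pair)
    return f"{m.group(3)}/{m.group(2)};" if m else None


align = chain(make_entry, ENTRY)
print(align("t0/a1;t1/a2;t2/a3;"))   # t1/a1;t2/a2;
```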

  15. Summary of Function Combinators

     Purpose       | Regular Transformations         | Regular Expressions
     Base          | ⊥, σ ↦ γ                        | ∅, {σ}
     Concatenation | split(f, g), left-split(f, g)   | R₁ · R₂
     Union         | try f else g                    | R₁ ∪ R₂
     Kleene-*      | iterate(f), left-iterate(f)     | R*
     Repetition    | combine(f, g)                   |
     Chained sum   | chain(f, R), left-chain(f, R)   | New!

  16. Regular String Transformations
     Or, why our choice of combinators was not arbitrary
         Languages, Σ* → bool ≡ DFA
         Transformations, Σ* → Γ* ≡ ?

  17. Historical Context
     Regular languages:
     - Beautiful theory: regular expressions ≡ DFA
     - Analysis questions (mostly) efficiently decidable
     - Lots of practical implementations

  18. String Transducers
     One-way transducers: Mealy machines [diagram: a transition labelled a / babc]
     Folk knowledge [Aho et al 1969]: two-way transducers are strictly more powerful than one-way transducers.
     The gap includes many interesting transformations. Examples: string reversal, copy, substring swap, etc.

  19. String Transducers
     Two-way finite state transducers
     - Known results
         - Closed under composition [Chytil, Jákl 1977]
         - Decidable equivalence checking [Gurari 1980]
         - Equivalent to MSO-definable string transformations [Engelfriet, Hoogeboom 2001]
     - Streaming string transducers: equivalent one-way deterministic model with applications to the analysis of list-processing programs [Alur, Černý 2011]
     - Two-way finite state transducers are our notion of regularity

  20. Function Combinators are Expressively Complete
     Theorem (Completeness, Alur et al 2014). All regular string transformations can be expressed using the following combinators:
     - Basic functions: ⊥, σ ↦ γ,
     - split(f, g), left-split(f, g),
     - try f else g,
     - iterate(f), left-iterate(f),
     - combine(f, g),
     - chained sums: chain(f, R), and left-chain(f, R).

  21. Evaluating DReX Expressions

  22. The Anatomy of a Streaming Evaluator
     [Diagram: the evaluator for f reads the indexed input stream (a, 1), (b, 2), (b, 3), (a, 4), (b, 5), ..., (σₙ, n), one pair (σᵢ, i) at a time, and emits signals such as (Result, γ), (Result, γ′), (Result, γ).]

  23-28. The Case of split(f, g) (one diagram, built up over six frames)
     [Diagram: input positions 1, ..., j, ..., i, ..., n, with segments marked "f defined" and "g defined"; a second "f defined" segment appears in later frames. Evaluators T_f and T_g each read the input pairs (σᵢ, i).]
     - First frame: T_f and T_g emit (Result, γ).
     - Next: each evaluator also receives (Start, i) signals.
     - Next: results carry an index, (Result, j, γ).
     - Next: a table records what T_f has reported:

       Thread starting at index | Index at which T_f responded | Result reported by T_f
       2                        | 9                            | aaab
       3                        | 7                            | abbab
       ...                      | ...                          | ...

     - Last frame: each evaluator also receives (Kill, j) signals.
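The signals on these frames suggest a thread-based evaluator for split(f, g). The sketch below follows the diagram only loosely and rests on several assumptions that are not in the slides: the index in (Result, j, γ) is read as the index at which the reporting thread was started, the class and method names are hypothetical, empty strings are assumed to lie outside every domain, and Kill signals (which would let the table be pruned) are omitted.

```python
import string


class Run:
    """Evaluator for a hypothetical base function that maps every
    non-empty string over `chars` to the constant `gamma`."""

    def __init__(self, chars, gamma):
        self.chars = set(chars)
        self.gamma = gamma
        self.threads = set()             # start indices of live threads

    def start(self, i):
        # (Start, i): a new thread begins reading after position i.
        self.threads.add(i)

    def feed(self, ch, j):
        # (ch, j): j is the number of characters consumed so far.
        if ch not in self.chars:
            self.threads.clear()         # every live thread fails
            return []
        # Every live thread has read a non-empty run, so each reports.
        return [(i, self.gamma) for i in sorted(self.threads)]


class Split:
    """Evaluator for split(f, g), built from evaluators T_f and T_g."""

    def __init__(self, t_f, t_g):
        self.t_f, self.t_g = t_f, t_g
        self.table = {}                  # index where T_f responded -> [(start, result of T_f)]

    def start(self, i):
        self.t_f.start(i)

    def feed(self, ch, j):
        g_results = self.t_g.feed(ch, j)     # threads of T_g started earlier read ch
        f_results = self.t_f.feed(ch, j)
        if f_results:                        # T_f responded at j: remember it, start T_g there
            self.table[j] = f_results
            self.t_g.start(j)
        out = []
        for k, gamma_g in g_results:         # T_g thread started at k reports
            for i, gamma_f in self.table.get(k, []):
                out.append((i, gamma_f + gamma_g))
        return out


def evaluate(evaluator, text):
    """Stream `text` through a fresh evaluator; return the result for the whole input."""
    evaluator.start(0)
    result = None
    for j, ch in enumerate(text, start=1):
        whole = [gamma for i, gamma in evaluator.feed(ch, j) if i == 0]
        result = whole[0] if whole else None
    return result


def digits_then_letters():
    return Split(Run(string.digits, "N"), Run(string.ascii_lowercase, "W"))


print(evaluate(digits_then_letters(), "123abc"))
```

On the input "123abc", T_f responds after each digit, so T_g is started three times; the two early T_g threads fail on the next digit and only the one started at index 3 survives to report.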

  29. The Case of split(f, g)
     - What if two threads of T_g report results simultaneously?
       [Diagram: two different splits of the same input, each with f defined on the first part and g defined on the second.]
     - Statically disallow!
     - split(f, g) is well-typed iff
         - both f and g are well-typed, and
         - their domains are unambiguously concatenable

  30. Main Result
     Theorem
     1. All regular string transformations can be expressed as well-typed DReX expressions.
     2. DReX expressions can be type-checked in O(poly(|f|, |Σ|)).
     3. Given a well-typed DReX expression f and an input string σ, f(σ) can be computed in time O(|σ|, poly(|f|)).

  31. Summary of Typing Rules
     - ⊥, σ ↦ γ are always well-typed
     - split(f, g) and left-split(f, g) are well-typed iff
         - f and g are well-typed, and
         - Dom(f) and Dom(g) are unambiguously concatenable
     - try f else g is well-typed iff
         - f and g are well-typed, and
         - Dom(f) and Dom(g) are disjoint
     - iterate(f) and left-iterate(f) are well-typed iff
         - f is well-typed, and
         - Dom(f) is unambiguously iterable
     - chain(f, R) and left-chain(f, R) are well-typed iff
         - f is well-typed, R is an unambiguous regular expression,
         - Dom(f) is unambiguously iterable, and
         - Dom(f) = ⟦R · R⟧
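To make "unambiguously concatenable" concrete, the following sketch searches for a counterexample by brute force: a string that can be split in two or more ways. It only checks strings up to a given length over a given alphabet, so it is a bounded test for illustration and not the polynomial-time type checker; the function name and the use of Python regular expressions for the domains are assumptions.

```python
import itertools
import re


def ambiguous_concatenation(dom_f, dom_g, alphabet, max_len):
    """Return a string of length <= max_len with two or more splits s = s1 s2,
    s1 in Dom(f) and s2 in Dom(g), or None if no such string is found."""
    in_f, in_g = re.compile(dom_f), re.compile(dom_g)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            cut_points = [i for i in range(n + 1)
                          if in_f.fullmatch(s[:i]) and in_g.fullmatch(s[i:])]
            if len(cut_points) > 1:
                return s          # witness of ambiguity
    return None


# a* followed by a* is ambiguous: "a" can be cut before or after the letter.
print(ambiguous_concatenation(r"a*", r"a*", "ab", 3))
# Digits followed by letters can only be cut in one place.
print(ambiguous_concatenation(r"[0-9]+", r"[a-z]+", "0a", 4))
```

A `None` answer here only means that no witness exists within the bound.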

  32. Experimental Results

  33. Experimental Results
     Streaming evaluation algorithm for well-typed expressions
     [Plot: running time in seconds (0 to 8) against input size in characters (0 to 100000) for delete-comm, insert-quotes, get-tags, reverse, swap-bibtex, and align-bibtex.]
     - align-bibtex has 3500 nodes in its syntax tree and typechecks in ≈ half a second
     - Type system did not get in the way

  34. Conclusion
     - Introduced a DSL for regular string transformations
     - Described a fast streaming algorithm to evaluate well-typed expressions

  35. Conclusion
     Summary of operators

     Purpose       | Regular Transformations         | Regular Expressions
     Base          | ⊥, σ ↦ γ                        | ∅, {σ}
     Concatenation | split(f, g), left-split(f, g)   | R₁ · R₂
     Union         | try f else g                    | R₁ ∪ R₂
     Kleene-*      | iterate(f), left-iterate(f)     | R*
     Repetition    | combine(f, g)                   |
     Chained sum   | chain(f, R), left-chain(f, R)   | New!
