
Fast Synthesis of Fast Collections, Calvin Loncaric, Emina Torlak (PowerPoint presentation)



  1. Fast Synthesis of Fast Collections. Calvin Loncaric, Emina Torlak, Michael D. Ernst. University of Washington.

  2. Data structures are everywhere. Lists, maps, and sets solve many problems. What if I need a custom data structure?

  3. Cozy synthesizes collections
     [Figure: the Cozy pipeline. An inductive synthesizer and a verifier turn a specification into an outline; the outline is then turned into candidate representations and implementations (Rep. + Impl.).]
     • Correct by construction
     • Specifications are orders of magnitude shorter than the implementations, which are synthesized in under 90 seconds
     • Performance equivalent to human-written code

  4. Myria Analytics Storage
     [Figure: log entries for Request 1 and Request 2 laid out along a time axis.]
     Goal: efficient retrieval of entries for a particular request ID in a particular timespan.

  5. Myria Analytics Storage
     class AnalyticsLog {
         // Insert an entry into the data structure
         void log(Entry e)
         // Retrieve entries
         Iterator<Entry> getEntries(
             int queryId,
             int subqueryId,
             int fragmentId,
             long start,
             long end)
     }

  6. Myria Analytics Storage
     Specification:
       Entry has:
         queryId : Int,
         subqueryId : Int,
         fragmentId : Int,
         start, end : Long,
         …
       getEntries : all e where
         e.queryId = queryId and
         e.subqueryId = subqueryId and
         e.fragmentId = fragmentId and
         e.end >= start and
         e.start <= end
     class AnalyticsLog {
         void log(Entry e)
         Iterator<Entry> getEntries(int queryId, int subqueryId, int fragmentId, long start, long end)
     }
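The getEntries specification is a single predicate over an entry and the query arguments. As a minimal sketch, assuming a hypothetical Java record `Entry` with exactly the five fields listed above (not Cozy's generated code), the predicate can be written directly:

```java
// Sketch of the getEntries specification as a plain Java predicate.
// Entry is a hypothetical record mirroring "Entry has: ..." above.
public class AnalyticsSpec {
    record Entry(int queryId, int subqueryId, int fragmentId, long start, long end) {}

    // True iff e should be returned by getEntries(queryId, subqueryId, fragmentId, start, end):
    // the three IDs match exactly and the entry's [start, end] interval overlaps the query's.
    static boolean matches(Entry e, int queryId, int subqueryId, int fragmentId,
                           long start, long end) {
        return e.queryId() == queryId
            && e.subqueryId() == subqueryId
            && e.fragmentId() == fragmentId
            && e.end() >= start
            && e.start() <= end;
    }

    public static void main(String[] args) {
        Entry e = new Entry(1, 2, 3, 100L, 200L);
        System.out.println(matches(e, 1, 2, 3, 150L, 300L)); // true: intervals overlap
        System.out.println(matches(e, 1, 2, 3, 201L, 300L)); // false: query starts after entry ends
        System.out.println(matches(e, 1, 9, 3, 0L, 1000L));  // false: subqueryId differs
    }
}
```

The last two conjuncts test interval overlap: an entry qualifies if any part of its [start, end] range falls inside the query's range, including ranges that only touch at an endpoint.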

  7. Cozy synthesizes collections
     Specification:
       Entry has:
         field1 : Type1,
         field2 : Type2,
         …
       retrieveA : all e where condition
       retrieveB : all e where condition
     Cozy turns the specification into:
     class Structure {
         void add(Entry e)
         void remove(Entry e)
         void update(Entry e, …)
         Iterator<Entry> retrieveA(…)
         Iterator<Entry> retrieveB(…)
     }

  8. Trivial Solution
     Specification:
       retrieve : all e where P(e, input)
     Implementation:
       List<Entry> data;
       Iterator<Entry> retrieve(input) {
           for e in data:
               if P(e, input):
                   yield e
       }
     There has to be a better way! (Each call to retrieve scans every stored entry.)
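A runnable version of the trivial solution, sketched in Java with a hypothetical generic class `TrivialStructure` and the predicate P passed in as a `BiPredicate`. It is correct for any P, which is exactly why it cannot exploit P's structure:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiPredicate;

// Trivial solution: keep every entry in a list and scan it on each query.
public class TrivialStructure<E, I> {
    private final List<E> data = new ArrayList<>();
    private final BiPredicate<E, I> p; // the specification's predicate P(e, input)

    public TrivialStructure(BiPredicate<E, I> p) { this.p = p; }

    public void add(E e) { data.add(e); }

    public void remove(E e) { data.remove(e); }

    // O(n) per call: every stored entry is tested against P.
    public Iterator<E> retrieve(I input) {
        return data.stream().filter(e -> p.test(e, input)).iterator();
    }

    public static void main(String[] args) {
        // Entries are plain integers; P(e, input) means "same last digit as input".
        TrivialStructure<Integer, Integer> s = new TrivialStructure<>((e, in) -> e % 10 == in % 10);
        for (int i = 0; i < 30; i++) s.add(i);
        Iterator<Integer> it = s.retrieve(3);
        while (it.hasNext()) System.out.print(it.next() + " "); // prints 3 13 23
        System.out.println();
    }
}
```

Each retrieve call streams over the whole list, so its cost grows linearly with the number of stored entries no matter how few of them match.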

  9. Specification → Implementation?
     Specification: Entry has: field1, field2, …; retrieveA : all e where condition; retrieveB : all e where condition
     Implementation: void add(Entry e); void remove(Entry e); void update(Entry e, …); Iterator retrieveA(…); Iterator retrieveB(…)
     Going straight from specification to implementation with a synthesis algorithm is intractable: in the quest for a good solution, the search space of "all possible programs" is simply too large.
     Tractable: Specification → Outline, then Outline → Implementation. An outline is specific enough to describe asymptotic performance, yet general enough to encode a data structure succinctly.

  10. Outlines: plans for retrieving entries
     • All()
     • HashLookup(outline, field = var)
     • BinarySearch(outline, field > var)
     • Concat(outline1, outline2)
     • Filter(outline, predicate)
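One way to read outlines is as a small expression language whose meaning is the set of entries a plan returns. The sketch below assumes hypothetical Java records for the five outline forms and simplifies entries to maps from field names to `long` values. `eval` only defines what each outline returns, not how fast; speed comes from the concrete representation chosen for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Outlines as a tiny expression language; eval gives each outline's meaning
// (which entries it returns), not its running time.
// Simplification: an entry is a map from field name to a long value.
public class Outlines {
    interface Outline {}
    record All() implements Outline {}
    record HashLookup(Outline src, String field, long var) implements Outline {}   // field = var
    record BinarySearch(Outline src, String field, long var) implements Outline {} // field > var
    record Concat(Outline first, Outline second) implements Outline {}
    record Filter(Outline src, Predicate<Map<String, Long>> pred) implements Outline {}

    static List<Map<String, Long>> eval(Outline o, List<Map<String, Long>> data) {
        List<Map<String, Long>> out = new ArrayList<>();
        if (o instanceof All) {
            out.addAll(data);
        } else if (o instanceof HashLookup h) {
            for (Map<String, Long> e : eval(h.src(), data))
                if (e.get(h.field()).longValue() == h.var()) out.add(e);
        } else if (o instanceof BinarySearch b) {
            for (Map<String, Long> e : eval(b.src(), data))
                if (e.get(b.field()).longValue() > b.var()) out.add(e);
        } else if (o instanceof Concat c) {
            out.addAll(eval(c.first(), data));
            out.addAll(eval(c.second(), data));
        } else if (o instanceof Filter f) {
            for (Map<String, Long> e : eval(f.src(), data))
                if (f.pred().test(e)) out.add(e);
        } else {
            throw new IllegalArgumentException("unknown outline: " + o);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> data = List.of(
            Map.of("queryId", 1L, "start", 100L),
            Map.of("queryId", 1L, "start", 300L),
            Map.of("queryId", 2L, "start", 150L));
        Outline byId = new HashLookup(new All(), "queryId", 1);
        Outline early = new Filter(byId, e -> e.get("start") <= 200);
        Outline late = new BinarySearch(new All(), "start", 120);
        System.out.println(eval(byId, data).size());                    // 2
        System.out.println(eval(early, data).size());                   // 1
        System.out.println(eval(new Concat(early, late), data).size()); // 3
    }
}
```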

  11. Outlines → Implementations
     [Figure: the Cozy pipeline diagram from slide 3.]

  12. Outlines → Implementations
     Outline: HashLookup(All(), e.queryId = q)
     class Structure {
         T data;
         Iterator<Entry> retrieve(q) { … }
     }

  13. Outlines → Implementations
     Outline: HashLookup(data, e.queryId = q)   (All() becomes the stored field data)
     class Structure {
         T data;
         Iterator<Entry> retrieve(q) { … }
     }

  14. Outlines → Implementations
     Outline: HashLookup(data, e.queryId = q)
     class Structure {
         HMap<K, V> data;
         Iterator<Entry> retrieve(q) { … }
     }

  15. Outlines → Implementations
     Outline: HashLookup(data, e.queryId = q)
     class Structure {
         HMap<int, V> data;
         Iterator<Entry> retrieve(q) { … }
     }
     Candidates for V: ArrayList<Entry> or LinkedList<Entry>

  16. Outlines → Implementations
     Outline: HashLookup(data, e.queryId = q)
     class Structure {
         HMap<int, V> data;
         Iterator<Entry> retrieve(q) { … }
     }

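  17. Outlines → Implementations
     Outline: HashLookup(data, e.queryId = q)
     class Structure {
         HMap<int, V> data;
         // add, remove, and update maintain data
         Iterator<Entry> retrieve(q) {
             v = data.get(q);
             // no entry has queryId = q
             if (v == null) return Collections.emptyIterator();
             return v.iterator();
         }
     }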
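For comparison, here is a hand-written Java sketch of the structure that the outline HashLookup(data, e.queryId = q) describes. Assumptions: `Entry` is a hypothetical record, Java's `Integer` stands in for the slide's `int` key, V is fixed to `ArrayList<Entry>`, and `update` takes the old and new entry (the slide elides its extra parameters).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hand-written sketch of a structure for HashLookup(data, e.queryId = q):
// HMap<int, V> becomes Map<Integer, List<Entry>> (Java generics need Integer),
// with V = ArrayList<Entry>. Entry is a hypothetical record.
public class QueryIdIndex {
    record Entry(int queryId, String payload) {}

    private final Map<Integer, List<Entry>> data = new HashMap<>();

    public void add(Entry e) {
        data.computeIfAbsent(e.queryId(), k -> new ArrayList<>()).add(e);
    }

    public void remove(Entry e) {
        List<Entry> bucket = data.get(e.queryId());
        if (bucket == null) return;
        bucket.remove(e);
        if (bucket.isEmpty()) data.remove(e.queryId()); // drop empty buckets
    }

    // Records are immutable, so an update replaces the old entry; if the
    // queryId changed, the entry moves to a different bucket.
    public void update(Entry oldEntry, Entry newEntry) {
        remove(oldEntry);
        add(newEntry);
    }

    // Expected O(1) hash probe, then iteration over only the matching entries.
    public Iterator<Entry> retrieve(int q) {
        List<Entry> v = data.get(q);
        if (v == null) return Collections.emptyIterator();
        return v.iterator();
    }

    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) { it.next(); n++; }
        return n;
    }

    public static void main(String[] args) {
        QueryIdIndex log = new QueryIdIndex();
        log.add(new Entry(1, "a"));
        log.add(new Entry(1, "b"));
        log.add(new Entry(2, "c"));
        log.update(new Entry(1, "b"), new Entry(2, "b"));
        System.out.println(count(log.retrieve(1))); // 1
        System.out.println(count(log.retrieve(2))); // 2
        System.out.println(count(log.retrieve(7))); // 0
    }
}
```

Choosing `LinkedList<Entry>` for V instead would change only the `new ArrayList<>()` call; the slides list both as candidates for V.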

  18. Specification → Outline
     [Figure: the Cozy pipeline diagram from slide 3.]

  19. Specification → Outline: CEGIS (counterexample-guided inductive synthesis)
     • Inductive synthesizer: remembers all examples collected thus far and reasons only about those examples; proposes a candidate outline.
     • Verifier: must ensure the candidate outline is correct for all possible inputs and all possible data structure states; returns either a counterexample or a certification of correctness:
       ∀ I ∀ S, out = { e | e ∈ S ∧ P(I, e) }
     Example specification: retrieve : all e where e.queryId = q and …
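The CEGIS loop can be sketched as a toy Java program. It is a simplification, not Cozy's implementation: candidates come from a fixed hypothetical list, each candidate is represented only by the predicate its outline selects, each example is a single (input, entry) pair, and the verifier brute-forces a tiny bounded domain instead of calling an SMT solver, so its "certification" holds only within that bound.

```java
import java.util.ArrayList;
import java.util.List;

// Toy CEGIS loop (a simplification, not Cozy's code):
//  - a candidate outline is represented only by its representative predicate Q(I, e);
//  - each example is one (input, entry) pair;
//  - the "verifier" brute-forces a tiny bounded domain instead of calling an SMT solver.
public class ToyCegis {
    record Input(int q, int s) {}
    record Entry(int queryId, int subqueryId) {}
    record Example(Input in, Entry e) {}
    interface Pred { boolean test(Input in, Entry e); }
    record Candidate(String outline, Pred pred) {}

    // Specification P: retrieve : all e where e.queryId = q and e.subqueryId = s
    static final Pred SPEC = (in, e) -> e.queryId() == in.q() && e.subqueryId() == in.s();

    static final List<Candidate> CANDIDATES = List.of(
        new Candidate("All()", (in, e) -> true),
        new Candidate("HashLookup(All(), e.queryId = q)", (in, e) -> e.queryId() == in.q()),
        new Candidate("HashLookup(HashLookup(All(), e.queryId = q), e.subqueryId = s)",
            (in, e) -> e.queryId() == in.q() && e.subqueryId() == in.s()));

    // Inductive synthesizer: first candidate that agrees with P on every example so far.
    static Candidate synthesize(List<Example> examples) {
        for (Candidate c : CANDIDATES) {
            boolean ok = true;
            for (Example ex : examples)
                if (c.pred().test(ex.in(), ex.e()) != SPEC.test(ex.in(), ex.e())) { ok = false; break; }
            if (ok) return c;
        }
        return null;
    }

    // Bounded verifier: look for an (input, entry) pair on which Q and P disagree.
    static Example verify(Candidate c) {
        for (int q = 0; q < 3; q++)
            for (int s = 0; s < 3; s++)
                for (int sub = 0; sub < 3; sub++)
                    for (int qid = 0; qid < 3; qid++) {
                        Input in = new Input(q, s);
                        Entry e = new Entry(qid, sub);
                        if (c.pred().test(in, e) != SPEC.test(in, e)) return new Example(in, e);
                    }
        return null;
    }

    static String run() {
        List<Example> examples = new ArrayList<>();
        while (true) {
            Candidate c = synthesize(examples);
            if (c == null) return null;               // candidate space exhausted
            Example cex = verify(c);
            if (cex == null) return c.outline();      // no counterexample: accept
            System.out.println("rejected " + c.outline() + " on " + cex);
            examples.add(cex);                        // remembered for all later rounds
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted " + run());
    }
}
```

Running it rejects All() and then HashLookup(All(), e.queryId = q), each on its own counterexample, before accepting the nested HashLookup. Every counterexample stays in the example list, so a rejected candidate is never proposed again.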

  20. Cost Model
     Filter(All(), e.queryId = q): O(n) for the Filter, O(1) for All()
     HashLookup(All(), e.queryId = q): O(1) for the HashLookup, O(1) for All()
     The Filter outline costs more than the HashLookup outline. Cozy prefers outlines with lower cost.
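A cost model assigns each outline an estimated per-query cost so that cheaper plans win. The rules in this Java sketch are assumptions chosen to reproduce the two outlines on the slide (worst-case classes only, no estimates of result size); Cozy's actual cost model may be more detailed.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of an outline cost model. The rules are assumptions chosen to
// reproduce the slide's two examples; Cozy's actual cost model may differ.
public class CostModel {
    // Coarse worst-case classes, declared cheapest first so compareTo orders them.
    enum Cost { CONSTANT, LOGARITHMIC, LINEAR }

    interface Outline {}
    record All() implements Outline {}
    record HashLookup(Outline src, String cond) implements Outline {}
    record BinarySearch(Outline src, String cond) implements Outline {}
    record Filter(Outline src, String cond) implements Outline {}
    record Concat(Outline first, Outline second) implements Outline {}

    static Cost max(Cost x, Cost y) { return x.compareTo(y) >= 0 ? x : y; }

    static Cost cost(Outline o) {
        if (o instanceof All) return Cost.CONSTANT;                                   // hand back the stored collection
        if (o instanceof HashLookup h) return max(cost(h.src()), Cost.CONSTANT);      // one hash probe
        if (o instanceof BinarySearch b) return max(cost(b.src()), Cost.LOGARITHMIC); // one ordered search
        if (o instanceof Filter f) return max(cost(f.src()), Cost.LINEAR);            // may test every entry
        if (o instanceof Concat c) return max(cost(c.first()), cost(c.second()));
        throw new IllegalArgumentException("unknown outline: " + o);
    }

    public static void main(String[] args) {
        List<Outline> candidates = List.of(
            new Filter(new All(), "e.queryId = q"),
            new HashLookup(new All(), "e.queryId = q"));
        for (Outline o : candidates) System.out.println(o + " : " + cost(o));
        Outline best = candidates.stream().min(Comparator.comparing(CostModel::cost)).orElseThrow();
        System.out.println("preferred: " + best);
    }
}
```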

  21. Inductive Synthesis
     Enumerative search by outline size; each candidate is checked for correctness on all current examples (e.g., Concat(HashLookup(…), …) vs. Concat(Filter(…), …)).
     size 1: All
     size 2: HashLookup(All, x=y), Filter(All, x=y), BinarySearch(All, x>y), …
     size 3: HashLookup(HashLookup(…), a=b), Filter(HashLookup(…), p=q), Filter(BinarySearch(…), x<y), …
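A sketch of size-ordered enumeration in Java. Assumptions: an outline's size is its number of nodes (consistent with the slide's columns), there are only two hypothetical conditions (e.queryId = q for lookups, e.start > t for binary searches), and each candidate carries the predicate it selects so it can be checked against single (input, entry) examples.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Size-ordered enumeration of outlines; entry = {queryId, start}, input = {q, t}.
public class EnumerativeSearch {
    record Candidate(String text, BiPredicate<int[], int[]> pred) {}
    record Example(int[] in, int[] e, boolean expected) {}

    static final String EQ = "e.queryId = q";
    static final String GT = "e.start > t";
    static final BiPredicate<int[], int[]> EQ_P = (e, in) -> e[0] == in[0];
    static final BiPredicate<int[], int[]> GT_P = (e, in) -> e[1] > in[1];

    static final Map<Integer, List<Candidate>> memo = new HashMap<>();

    // All outlines with exactly n nodes (size 1 = All).
    static List<Candidate> bySize(int n) {
        if (memo.containsKey(n)) return memo.get(n);
        List<Candidate> out = new ArrayList<>();
        if (n == 1) out.add(new Candidate("All", (e, in) -> true));
        if (n >= 2) {
            for (Candidate o : bySize(n - 1)) {            // one unary node on top of a smaller outline
                out.add(new Candidate("HashLookup(" + o.text() + ", " + EQ + ")", o.pred().and(EQ_P)));
                out.add(new Candidate("BinarySearch(" + o.text() + ", " + GT + ")", o.pred().and(GT_P)));
                out.add(new Candidate("Filter(" + o.text() + ", " + EQ + ")", o.pred().and(EQ_P)));
                out.add(new Candidate("Filter(" + o.text() + ", " + GT + ")", o.pred().and(GT_P)));
            }
            for (int i = 1; i <= n - 2; i++)               // Concat splits the remaining n - 1 nodes
                for (Candidate a : bySize(i))
                    for (Candidate b : bySize(n - 1 - i))
                        out.add(new Candidate("Concat(" + a.text() + ", " + b.text() + ")", a.pred().or(b.pred())));
        }
        memo.put(n, out);
        return out;
    }

    static boolean consistent(Candidate c, List<Example> examples) {
        for (Example ex : examples)
            if (c.pred().test(ex.e(), ex.in()) != ex.expected()) return false;
        return true;
    }

    // Smallest candidate that is correct on all current examples.
    static Candidate search(List<Example> examples, int maxSize) {
        for (int n = 1; n <= maxSize; n++)
            for (Candidate c : bySize(n))
                if (consistent(c, examples)) return c;
        return null;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) System.out.println("size " + n + ": " + bySize(n).size()); // 1, 4, 17
        List<Example> examples = List.of(
            new Example(new int[]{1, 10}, new int[]{1, 20}, true),
            new Example(new int[]{1, 10}, new int[]{2, 20}, false),
            new Example(new int[]{1, 10}, new int[]{1, 5}, false));
        System.out.println(search(examples, 3).text());
    }
}
```

With these three examples no size-1 or size-2 outline is consistent, and the first size-3 hit is BinarySearch(HashLookup(All, e.queryId = q), e.start > t), a hash on the ID followed by a search on time. HashLookup(All, e.queryId = q) and Filter(All, e.queryId = q) select the same entries, so examples alone cannot separate them; in this sketch enumeration order decides, whereas a cost model like the one on slide 20 would prefer the HashLookup.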

  22. Outline Verification
     Specification:
       Entry has:
         queryId : Int,
         subqueryId : Int,
         …
       retrieve : all e where e.queryId = q and …   (predicate P)
     Outline: HashLookup(All(), e.queryId = q)   (representative predicate Q: e.queryId = q)
     Compare the specified output { e | e ∈ S ∧ P(I, e) } with the outline's output { e | e ∈ S ∧ Q(I, e) }.

  23. Outline Verification
     { e | e ∈ S ∧ P(I, e) } =? { e | e ∈ S ∧ Q(I, e) }
     Yes, if and only if for all I, e: P(I, e) = Q(I, e). This equivalence can be checked with an SMT solver.
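The "if and only if" on this slide takes one step of reasoning, assuming S may be any finite set of entries:

```latex
\text{Let } O_P(I,S) = \{ e \mid e \in S \land P(I,e) \} \text{ and } O_Q(I,S) = \{ e \mid e \in S \land Q(I,e) \}.

(\Leftarrow)\quad \forall I, e.\ P(I,e) \Leftrightarrow Q(I,e)
  \;\Longrightarrow\; \text{both comprehensions keep the same elements of any } S
  \;\Longrightarrow\; \forall I\, \forall S.\ O_P(I,S) = O_Q(I,S).

(\Rightarrow)\quad \text{Fix } I, e \text{ and choose } S = \{e\}:\quad
  e \in O_P(I,\{e\}) \Leftrightarrow P(I,e), \qquad e \in O_Q(I,\{e\}) \Leftrightarrow Q(I,e),
  \text{ so } O_P(I,\{e\}) = O_Q(I,\{e\}) \text{ forces } P(I,e) \Leftrightarrow Q(I,e).
```

This is also why a single (input, entry) pair suffices as a counterexample: it corresponds to the one-element state S = {e}.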

  24. Evaluation
     • Improve correctness
     • Save programmer effort
     • Match performance

  25. Case studies
     • Myria (analytics): analytics data indexed by timespan and by request ID
     • ZTopo (tile cache): tracks map tiles in a least-recently-used cache
     • Bullet (volume tree): stores axis-aligned bounding boxes for fast collision detection
     • Sat4j (variable metadata): tracks information about each variable in the formula
     Bugs: 11, 15, 7

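  26. Specifications vs. Implementations
     [Figure: bar chart of lines of code, original implementation vs. Cozy specification, for Myria, ZTopo, Sat4j, and Bullet. Original implementations: 2582, 1383, 269, and 292 lines; specifications: 22, 25, 11, and 23 lines.]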

  27. Synthesis Time
     [Figure: bar chart of synthesis time in seconds (axis 0 to 90 s), split into outline synthesis and auto-tuning, for Myria, ZTopo, Sat4j, and Bullet.]

  28. Performance
     [Figure: original vs. synthesized performance for Myria, ZTopo, Bullet, and Sat4j.]
     Annotations on the chart: the data structures are nearly identical; binary search tree vs. space partitioning tree; small overhead, with performance dominated by other factors; the original implementation has worst-case linear time.

  29. Related Work
     • J. Earley: “High level iterators and a method for automatically designing data structure representation” (1974)
       • Hard-coded rewrite rules
     • S. Agrawal et al.: “Automated selection of materialized views and indexes for SQL databases” (2000)
       • Enumerate possible views and indexes based on query syntax, and use the planner to decide which ones to keep
     • P. Hawkins et al.: “Data representation synthesis” (2011)
       • Enumerate representations and use a planner to implement retrieval operations; supports conjunctions of equalities only

  30. http://cozy.uwplse.org
     [Figure: the Cozy pipeline diagram from slide 3.]
     • Implementation outlines make the problem tractable
     • Synthesis completes in under 90 seconds
     • Cozy generates correct code and matches the performance of handwritten implementations
     Special thanks to Michael Ernst and Emina Torlak, and also to Haoming Liu and Daniel Perelman.
