Fast Synthesis of Fast Collections Calvin Loncaric Emina Torlak Michael D. Ernst University of Washington
Data structures are everywhere Lists, maps, and sets solve many problems What if I need a custom data structure? 2
Cozy synthesizes collections [Pipeline diagram: Specification, Inductive Synthesizer, Verifier, Outline, Rep./Impl. candidates] • Correct by construction • Specifications are orders of magnitude shorter than implementations; synthesis takes < 90 seconds • Equivalent performance to human-written code 3
Myria Analytics Storage Goal: efficient retrieval of entries for a particular request ID in a particular timespan [Figure: entries for Request 1 and Request 2 laid out along a time axis] 4
Myria Analytics Storage
class AnalyticsLog {
  void log (Entry e)  // insert an entry into the data structure
  Iterator<Entry> getEntries (int queryId, int subqueryId, int fragmentId, long start, long end)  // retrieve entries
} 5
Myria Analytics Storage
Specification:
  Entry has: queryId : Int, subqueryId : Int, fragmentId : Int, start, end : Long, …
  getEntries : all e where e.queryId = queryId and e.subqueryId = subqueryId and e.fragmentId = fragmentId and e.end >= start and e.start <= end
class AnalyticsLog {
  void log (Entry e)
  Iterator<Entry> getEntries (int queryId, int subqueryId, int fragmentId, long start, long end)
} 6
Cozy synthesizes collections
Specification:
  Entry has: field1 : Type1, field2 : Type2, …
  retrieveA : all e where condition
  retrieveB : all e where condition
→ Cozy →
class Structure {
  void add (Entry e)
  void remove (Entry e)
  void update (Entry e, …)
  Iterator<Entry> retrieveA (…)
  Iterator<Entry> retrieveB (…)
} 7
Trivial Solution
retrieve : all e where P(e, input)
List<Entry> data;
Iterator<Entry> retrieve (input) { for e in data: if P(e, input): yield e }
There has to be a better way! 8
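The trivial solution can be made concrete for the Myria query from the earlier slides. The following is a minimal sketch, not code from Cozy or Myria: the class name `LinearScanLog` and the constructor shape of `Entry` are assumptions made for illustration, and the matches are collected eagerly rather than yielded lazily as in the slide's pseudocode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical linear-scan baseline; names are illustrative only.
class LinearScanLog {
    static final class Entry {
        final int queryId, subqueryId, fragmentId;
        final long start, end;

        Entry(int queryId, int subqueryId, int fragmentId, long start, long end) {
            this.queryId = queryId;
            this.subqueryId = subqueryId;
            this.fragmentId = fragmentId;
            this.start = start;
            this.end = end;
        }
    }

    // The whole representation is one unindexed list.
    private final List<Entry> data = new ArrayList<>();

    void log(Entry e) {
        data.add(e);
    }

    // O(n) per query: the predicate P(e, input) is tested on every stored entry.
    Iterator<Entry> getEntries(int queryId, int subqueryId, int fragmentId,
                               long start, long end) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : data) {
            if (e.queryId == queryId
                    && e.subqueryId == subqueryId
                    && e.fragmentId == fragmentId
                    && e.end >= start
                    && e.start <= end) {
                out.add(e);
            }
        }
        return out.iterator();
    }
}
```

Every call touches all stored entries, which is the cost the later outline slides aim to avoid.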
Specification → synthesis algorithm → Implementation: Intractable. In the quest for a good solution, the search space of “all possible programs” is simply too large.
Specification: Entry has: field1, field2, … retrieveA : all e where condition retrieveB : all e where condition
Implementation: void add (Entry e) void remove (Entry e) void update (Entry e, …) Iterator retrieveA (…) Iterator retrieveB (…)
Specification → Outline: Tractable. Outline → Implementation: Tractable.
Outline: specific enough to describe asymptotic performance, general enough to encode a data structure succinctly 9
Outlines Plans for retrieving entries • All ( ) • HashLookup ( outline, field = var ) • BinarySearch ( outline, field > var ) • Concat ( outline 1 , outline 2 ) • Filter ( outline, predicate ) 10
Outlines → Implementations [Pipeline diagram: Specification, Inductive Synthesizer, Verifier, Outline, Rep./Impl. candidates] 11
Outlines → Implementations Outline: HashLookup ( All (), e.queryId = q ) class Structure { T data; Iterator<Entry> retrieve (q) { … } } 12
Outlines → Implementations Outline: HashLookup ( data, e.queryId = q ) class Structure { T data; Iterator<Entry> retrieve (q) { … } } 13
Outlines → Implementations Outline: HashLookup ( data, e.queryId = q ) class Structure { HMap<K, V> data; Iterator<Entry> retrieve (q) { … } } 14
Outlines → Implementations Outline: HashLookup ( data, e.queryId = q ) class Structure { HMap<int, V> data; Iterator<Entry> retrieve (q) { … } } V = ArrayList<Entry> or V = LinkedList<Entry> 15
Outlines → Implementations Outline: HashLookup ( data, e.queryId = q ) class Structure { HMap<int, V> data; Iterator<Entry> retrieve (q) { … } } 16
Outlines → Implementations Outline: HashLookup ( data, e.queryId = q ) class Structure { HMap<int, V> data; add, remove, update; Iterator<Entry> retrieve (q) { v = data.get(q); return v.iterator(); } } 17
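The refinement steps on these slides end in a hash map from queryId to a collection of entries. A minimal Java sketch of that end state follows; it is not Cozy's generated code. The class name `HashLookupStructure`, the `payload` field, the choice V = ArrayList<Entry>, and the handling of a missing key are all assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written counterpart of the outline HashLookup(data, e.queryId = q).
class HashLookupStructure {
    static final class Entry {
        final int queryId;
        final String payload; // illustrative extra field, not from the slides

        Entry(int queryId, String payload) {
            this.queryId = queryId;
            this.payload = payload;
        }
    }

    // Representation: HMap<int, V> with V = ArrayList<Entry> (one of the two options shown).
    private final Map<Integer, List<Entry>> data = new HashMap<>();

    void add(Entry e) {
        data.computeIfAbsent(e.queryId, k -> new ArrayList<>()).add(e);
    }

    void remove(Entry e) {
        List<Entry> bucket = data.get(e.queryId);
        if (bucket == null) {
            return;
        }
        bucket.remove(e); // identity comparison, since Entry does not override equals
        if (bucket.isEmpty()) {
            data.remove(e.queryId);
        }
    }

    // Expected O(1) to find the bucket, then iteration over matching entries only.
    Iterator<Entry> retrieve(int q) {
        List<Entry> v = data.get(q);
        // Assumption: an absent key yields an empty iterator instead of failing.
        return v == null ? Collections.<Entry>emptyIterator() : v.iterator();
    }
}
```

The slide's `retrieve` body dereferences `data.get(q)` directly; the null check here is a defensive addition for the sketch.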
Specification → Outline [Pipeline diagram: Specification, Inductive Synthesizer, Verifier, Outline, Rep./Impl. candidates] 18
Specification → Outline CEGIS
Specification: retrieve : all e where e.queryId = q and …
Inductive Synthesizer: Remembers all examples; only reasons about examples collected thus far.
Verifier: Must ensure the outline is correct for all possible inputs and all possible data structure states: ∀ I ∀ S, out = { e | e ∈ S ∧ P(I, e) }
Inductive Synthesizer → Verifier: candidate. Verifier → Inductive Synthesizer: counterexample, or certification of correctness. 19
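The interaction between the two components can be sketched as a toy loop. This is an illustration under heavy simplifying assumptions, not Cozy's algorithm: an entry is reduced to its queryId, the input to a single int q, candidate outlines are stood in for by plain predicates supplied in a fixed order, and the verifier searches a small finite domain exhaustively where the talk uses an SMT solver. All names (`CegisSketch`, `Example`, `verify`, `synthesize`) are hypothetical.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Toy counterexample-guided loop; predicates take (q, queryId).
class CegisSketch {
    // A remembered example: one input q paired with one entry's queryId.
    static final class Example {
        final int q, queryId;

        Example(int q, int queryId) {
            this.q = q;
            this.queryId = queryId;
        }
    }

    // Stand-in verifier: returns a point where spec and candidate disagree,
    // or null when they agree on the whole (small, assumed) domain.
    static Example verify(BiPredicate<Integer, Integer> spec,
                          BiPredicate<Integer, Integer> cand) {
        for (int q = 0; q < 4; q++) {
            for (int id = 0; id < 4; id++) {
                if (spec.test(q, id) != cand.test(q, id)) {
                    return new Example(q, id);
                }
            }
        }
        return null;
    }

    // Returns the index of the first certified candidate, or -1 if none is.
    static int synthesize(BiPredicate<Integer, Integer> spec,
                          List<BiPredicate<Integer, Integer>> candidates,
                          List<Example> examples) {
        for (int i = 0; i < candidates.size(); i++) {
            BiPredicate<Integer, Integer> cand = candidates.get(i);
            boolean consistent = true;
            // Inductive step: only the examples collected so far are consulted.
            for (Example ex : examples) {
                if (spec.test(ex.q, ex.queryId) != cand.test(ex.q, ex.queryId)) {
                    consistent = false;
                    break;
                }
            }
            if (!consistent) {
                continue; // rejected cheaply, without calling the verifier
            }
            Example cex = verify(spec, cand);
            if (cex == null) {
                return i; // certification of correctness
            }
            examples.add(cex); // remember the counterexample for later candidates
        }
        return -1;
    }
}
```

In this sketch a counterexample found for one wrong candidate can rule out later wrong candidates without another verifier call, which is the role the slide assigns to the remembered examples.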
Cost Model Filter ( All (), e.queryId = q ) > HashLookup ( All (), e.queryId = q ) (cost labels shown on the slide: O(n), O(1), O(1), O(1)) Cozy prefers outlines with lower cost 20
Inductive Synthesis Enumerative search
size 1: All
size 2: HashLookup(All, x=y); Filter(All, x=y); BinarySearch(All, x>y); …
size 3: HashLookup(HashLookup(…), a=b); Filter(HashLookup(…), p=q); Filter(BinarySearch(…), x<y); …
Concat(HashLookup(…), …) vs Concat(Filter(…), …): correct on all current examples 21
Outline Verification
Specification: Entry has: queryId : Int, subqueryId : Int, … retrieve : all e where e.queryId = q and …
  predicate P, result { e | e ∈ S ∧ P(I, e) }
Outline: HashLookup ( All (), e.queryId = q )
  representative predicate Q: e.queryId = q, result { e | e ∈ S ∧ Q(I, e) } 22
Outline Verification { e | e ∈ S ∧ P(I, e) } =? { e | e ∈ S ∧ Q(I, e) } yes if and only if for all I, e : P(I, e) = Q(I, e); equivalence can be checked with an SMT solver 23
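The reduction from set equality to predicate equivalence can be illustrated with a small sketch. Several parts are assumptions rather than anything stated in the talk: the rule that wrapping an outline in HashLookup or Filter conjoins a condition onto its representative predicate, the reduction of the input I and entry e to single ints, and the bounded exhaustive check used in place of an SMT solver. The names `OutlineVerificationSketch` and `Pred` are hypothetical.

```java
import java.util.function.BiPredicate;

// Representative predicates Q(I, e), with I reduced to q and e to its queryId.
class OutlineVerificationSketch {
    interface Pred extends BiPredicate<Integer, Integer> {}

    // All() retrieves everything, so its predicate is constantly true.
    static Pred all() {
        return (q, id) -> true;
    }

    // Assumed rule: HashLookup(outline, e.queryId = q) adds an equality conjunct.
    static Pred hashLookupOnQueryId(Pred outline) {
        return (q, id) -> outline.test(q, id) && id.intValue() == q.intValue();
    }

    // Assumed rule: Filter(outline, predicate) adds the predicate as a conjunct.
    static Pred filter(Pred outline, Pred extra) {
        return (q, id) -> outline.test(q, id) && extra.test(q, id);
    }

    // Bounded stand-in for the query "for all I, e : P(I, e) = Q(I, e)".
    static boolean equivalent(Pred p, Pred q, int bound) {
        for (int input = 0; input < bound; input++) {
            for (int id = 0; id < bound; id++) {
                if (p.test(input, id) != q.test(input, id)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Under these assumptions, HashLookup(All(), e.queryId = q) and Filter(All(), e.queryId = q) get the same predicate, so both verify against the specification and only cost separates them. A bounded check can miss disagreements outside the bound; an SMT query does not have that limit.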
Evaluation • Improve correctness • Save programmer effort • Match performance 24
Case studies • Myria: analytics. Analytics data indexed by timespan and by request ID • ZTopo: tile cache. Tracks map tiles in a least-recently-used cache • Bullet: volume tree. Stores axis-aligned bounding boxes for fast collision detection • Sat4j: variable metadata. Tracks information about each variable in the formula • Bug counts shown on the slide: 11 bugs, 15 bugs, 7 bugs 25
Specifications vs. Implementations [Bar chart: lines of code, Original vs. Spec, for Myria, ZTopo, Sat4j, Bullet; value labels: 2582, 1383, 269, 292, 22, 25, 11, 23] 26
Synthesis Time [Bar chart: time in seconds (axis 0 to 90), split into Outline Synthesis and Auto-Tuning, for Myria, ZTopo, Sat4j, Bullet] 27
Performance [Charts: Original vs. Synthesized, for Myria, ZTopo, Bullet, Sat4j] Annotations: Data structures are nearly identical; Binary search tree vs. space partitioning tree; Small overhead, performance dominated by other factors; Original implementation has worst-case linear time 28
Related Work • J. Earley: “High level iterators and a method for automatically designing data structure representation” (1974) • Hard-coded rewrite rules • S. Agrawal et al.: “Automated selection of materialized views and indexes in SQL databases” (2000) • Enumerate possible views & indexes based on query syntax and use the planner to decide which ones to keep • P. Hawkins et al.: “Data representation synthesis” (2011) • Enumerate representations and use a planner to implement retrieval operations; conjunctions of equalities only 29
http://cozy.uwplse.org [Pipeline diagram: Specification, Inductive Synthesizer, Verifier, Outline, Rep./Impl. candidates] • Implementation outlines make the problem tractable • Synthesis completes in < 90 seconds • Cozy generates correct code, and matches handwritten implementation performance Special thanks to: Michael Ernst, Emina Torlak; also Haoming Liu & Daniel Perelman 30