Bringing SQL to the Masses with Program Synthesis Chenglong Wang, Alvin Cheung, Ras Bodik University of Washington 1
End-User SQL Select x.id, x.customer, x.total From PURCHASES x Select rows with Join (Select p.customer, maximum value for Max(total) From PURCHASES p each group. Group By p.customer) y On y.customer = x.customer And y.max_total = x.total Select * Find rows From Users a containing duplicate Where Exists (Select * values. From Users b Where (a.name = b.name Or a.email = b.email) And a.ID <> b.id) Calculate the Select a.ord, a.total, Sum(b.total) running total for a From t As a Join t As b table. Where b.ord <= a.ord Group By a.ord,a.total Order By a.ord 2
Observations A lot of common tasks require using complex SQL constructs . greatest-n-per-group Aggregation running-total Subquery duplicates Exists/In-clauses Many tasks can be concisely expressed with input-output examples . Idea: summarize our observation on StackOverflow Transition: these problems can be concisely expressed with examples, can we build some system that allows users to ask question using examples only? 3
Programming by Example System T2 oid val T1 id date uid 1 30 Join two tables and return 1 12/25 1 1 10 the rows containing the 2 11/21 3 1 10 maximum val below 50 for 4 12/24 2 2 50 2 10 each group. Synthesize Out oid date uid oid MaxVal Select * 1 12/25 1 1 30 From T1 4 12/24 2 2 10 Join (Select id, Max(val) From T2 Where val < 50 Group By oid) T3 On T3.oid = T1.uid Idea: introduce what we want to do: build a PBE system. Transition: let’s first see what is the traditional algorithm to build such system. 4
Synthesis Algorithm: Value-directed Search FlashFill Input : 2, 2, Output : 6, Operators : add, mul add(2, add(2, 2)) mul(2, add(2, 2)) add(2, 4) 6 add(2, 2) 2 4 add(2, mul(2, 2)) mul(2, 2) 2 add(4, 4) 8 mul(2, mul(2,2 )) SuperOptimizer add(2, add(2, 2)) = 6 add(2, mul(2, 2)) = 6 5
id date uid oid MaxVal 1 12/25 1 1 30 Enumerative Search on SQL 2 11/21 1 1 30 4 12/24 2 1 30 1 12/25 1 1 10 2 11/21 1 1 10 Select * 4 12/24 2 1 10 From T1 Input: T1, T2, Output: T out , Operators: Select, Join, Aggr 1 12/25 1 2 50 Join (Select id, Max(val) As MaxVal 2 11/21 1 2 50 From T2 4 12/24 2 2 50 Where val < 50 1 12/25 1 2 10 evaluate enumerate Select * 2 11/21 1 2 10 Group By oid) T3 Select * From T1 From T3 Join T2 id date uid 4 12/24 2 2 10 Where id > 1 On T3.oid = T1.uid On True 2 11/21 3 Select * From T1 4 12/24 2 …… T1 Select id, Max(uid) Where id > uid From T3 id date uid Group By id T3 …… 1 12/25 1 …… id date uid q4 2 11/21 3 Select * …… 1 12/25 1 q1 From T3 4 12/24 2 Select * Select * Join T4 2 11/21 3 From T3 From T1 On uid = oid Join T5 Where True 4 12/24 2 On id = oid T2 …… T6 Select * q4 oid val T4 From T2 oid date uid oid MaxVal Select * 1 30 From T3 oid MaxVal q3 2 12/25 1 1 30 Join T5 1 10 Select oid,Max(val) 1 30 On uid = oid q2 4 12/24 2 2 10 T5 From T4 1 10 1 10 Select * Group By oid …… oid MaxVal = From T2 2 50 1 10 Where val < 50 1 30 2 10 2 10 2 10 Select oid,Max(val) oid date uid oid MaxVal From T4 Select * oid MaxVal Group By oid From T2 1 12/25 1 1 30 Having maxVal < 50 2 50 Tout Where val = 50 4 12/24 2 2 10 oid date uid oid MaxVal …… 1 12/25 1 1 30 4 12/24 2 2 10 Challenge 1: Challenge 2: Large number of queries Large tables 6
Insight: Decomposition Select * From T1 T2 Join (Select id, Max(val) T1 oid val From T2 id date uid 1 30 1 12/25 1 1 10 Where val < 50 2 11/21 3 1 10 Group By oid) T3 4 12/24 2 2 50 On T3.oid = T1.uid 2 10 Search in the space of SQL queries Tout oid date uid oid MaxVal 2 12/25 1 1 30 4 12/24 2 2 10 7
Insight: Decomposition Abstract Queries Select * Select * From (Select * From (Select * From T1 From T1 Where □ ) T2 Where True) Join (Select id, Max(val) oid val Join (Select id, Max(val) T1 From T2 1 30 From T2 id date uid Where □ 1 10 Where val < 50 1 12/25 1 Group By oid 1 10 Group By oid 2 11/21 3 2 50 Having □ ) T3 Having True) T3 4 12/24 2 2 10 On □ On T3.oid = T1.uid Search abstract Instantiate SQL queries abstract queries Tout oid date uid oid MaxVal 1 12/25 1 1 30 4 12/24 2 2 10 Pro: Smaller space of programs. Challenge: which ones to search for? 8
Goal: Select * Search with Abstract Queries From (Select * From T1 Where □ ) Join (Select id, Max(val) From T2 Input: T1, T2, Output: T out , Operators: abstract query operators Where □ Group By oid Having □ ) T3 Select id,Max(uid) On □ … … From T3 Group By id T3 Having □ T1 id date uid id date uid T6 Select * 1 12/25 1 1 12/25 1 ? id date uid oid MaxVal From T1 2 11/21 3 2 11/21 3 Where □ 1 12/25 1 1 30 4 12/24 2 4 12/24 2 2 11/21 3 1 30 Select * 4 12/24 2 1 30 From T3 1 12/25 1 1 10 Join T5 T5 T2 T4 On □ 2 11/21 3 1 10 Select oid, MAX(val) oid MaxVal oid uid oid uid 4 12/24 2 1 10 From T4 1 30 1 30 1 30 1 12/25 1 2 50 Group By oid Select * 1 10 1 10 1 10 2 11/21 3 2 50 ? Having □ From T2 2 50 1 10 1 10 4 12/24 2 2 50 Where □ 2 10 2 50 2 50 1 12/25 1 2 10 … … 2 10 2 10 2 11/21 3 2 10 4 12/24 2 2 10 ⊆ Tout oid date uid oid MaxVal How to evaluate 1 12/25 1 1 30 oid date uid oid MaxVal 4 12/24 2 2 10 1 12/25 1 1 30 abstract queries? 4 12/24 2 2 10 9
Instantiate Abstract Queries Abstract Queries Select * Select * From (Select * From (Select * From T1 From T1 Where □ ) T2 Where True) Join (Select id, Max(val) oid val Join (Select id, Max(val) T1 From T2 1 30 From T2 id date uid Where □ 1 10 Where val < 50 1 12/25 1 Group By oid 1 10 Group By oid 2 11/21 3 2 50 Having □ ) T3 Having True) T3 4 12/24 2 2 10 On □ On T3.oid = T1.uid Search abstract Instantiate SQL queries abstract queries Tout oid date uid oid MaxVal 1 12/25 1 1 30 4 12/24 2 2 10 10
Instantiate Abstract Queries True, False, Select * uid < id, From (Select * … True + val < 50 + False + T1.uid = T3.oid From T1 Where □ ) Join (Select id, Max(val) False + val < 50 + False + T1.uid = T3.oid val < 50, From T2 val == 50, Where □ val > id, True + val == 50 + False + T1.uid = T3.oid Group By oid True, Having □ ) T3 … On □ True + val == 50 + MaxVal < 50 + T1.uid = T3.oid True, …… False, A intuitive solution that does no scale. MaxVal < 50, T1.uid = T3.oid, Transition: can we use … properties of the abstract T1.id = T3.oid,… query to optimize this? 11
Select * From (Select * From T1 Instantiate Abstract Queries Where □ ) Join (Select id, Max(val) From T2 Where □ Group By oid True, Having □ ) T3 False, uid < id, On □ … T3 T1 id date uid T1.uid = T3.oid, id date uid T6 Select * 1 12/25 1 T1.id = T3.oid,… 1 12/25 1 id date uid oid MaxVal From T1 [100000000001] 2 11/21 3 2 11/21 3 Where □ 1 12/25 1 1 30 1 12/25 1 1 30 4 12/24 2 4 12/24 2 2 11/21 3 1 30 Select * 4 12/24 2 1 30 From T3 1 12/25 1 1 10 Join T5 T2 T4 2 11/21 3 1 10 On □ oid uid oid uid 4 12/24 2 1 10 Select oid, MAX(val) 1 30 1 30 1 12/25 1 2 50 Select * From T4 1 10 1 10 2 11/21 3 2 50 From T2 Group By oid 1 10 1 10 4 12/24 2 2 50 Where □ Having □ 2 50 2 50 1 12/25 1 2 10 2 10 2 10 2 11/21 3 2 10 4 12/24 2 2 10 4 12/24 2 2 10 val < 50, True, val == 50, ⊆ False, val > id, MaxVal < 50, True, oid date uid oid MaxVal … … 1 12/25 1 1 30 4 12/24 2 2 10 12
Generating Solutions Select * Select * … True id <> uid From T1 From (Select * Join (Select id, Max(val) From T1 From T2 Where □ ) Where val < 50 Group By oid) T3 Join (Select id, Max(val) On T3.oid = T1.uid From T2 Where □ val < 50 Group By oid …… val <> 50 Having □ ) T3 … On □ Select * From (Select * From T1 Where id <> uid) Join (Select id, Max(val) … True True From T2 T1.uid = T3.oid Where val <> 50 Group By oid) T3 On T1.uid = T3.oid And T1.id <> T2.id T1.uid = T3.oid And T1.id <> T2.id … 13
Ranking & Interaction • Heuristically rank candidate queries. Criteria: complexity, naturalness etc. • • When the result is not desirable: Provide new input-output examples. • 14
Evaluation Benchmark: 193 34: more features 15: run out of time 1: fail to disambiguate Scythe: 143 Enum: 92 34X 15
Demo 16
Conclusion • Goal: Helping end users to program SQL with input-output examples. • Solution: An efficient two-phase synthesis algorithm. • Evaluation: Able to solve 143/193 problems on StackOverflow. 17
Recommend
More recommend