synthesizing highly expressive sql queries from input
play

Synthesizing Highly Expressive SQL Queries From Input-Output - PowerPoint PPT Presentation

Synthesizing Highly Expressive SQL Queries From Input-Output Examples http://scythe.cs.washington.edu Chenglong Wang, Alvin Cheung, Ras Bodk University of Washington Tasks SQL Query Select id Select the id for From table user Tom


  1. Synthesizing Highly Expressive SQL Queries From Input-Output Examples http://scythe.cs.washington.edu Chenglong Wang, Alvin Cheung, Ras Bodík University of Washington

  2. Tasks SQL Query Select id Select the id for From table user “Tom” Where name = “Tom” Select x.id, x.customer, x.total From PURCHASES x Join (Select p.customer, Select rows with Max(total) maximum value for From PURCHASES p each user. Group By p.customer) y On y.customer = x.customer And y.max_total = x.total Select a.ord, a.val, Avg(b.val) From t As a Join t As b Calculate moving Where b.ord <= a.ord average over id. Group By a.ord,a.val Order By a.ord Problem: Advanced SQL operators make SQL powerful but hard to master.

  3. Synthesize queries from …? Output Example Input Example AuthorId AuthorName BookId AuthorId AuthorName Title 1 Alice 1 1 Alice aaa1 2 Bob 2 1 Alice aaa2 3 Carol 4 2 Bob ddd1 5 2 Bob ddd2 BookId AuthorId Title 19 3 Carol fff1 1 1 aaa1 20 3 Carol fff2 2 1 aaa2 3 1 aaa3 4 2 ddd1 5 2 ddd2 19 3 fff1 20 3 fff2 21 3 fff3 22 3 fff4 Aggregation Functions Constants (Optional) { 2 } Count, Max, { } Min, Sum, Avg … Key: The synthesizer takes inputs that users can provide online.

  4. Talk Outline • Motivation & Problem Definition • Synthesis Algorithm Select b.BookId, a.AuthorId, a.AuthorName, b.Title From Author a Join Book b On a.AuthorId = b.AuthorId Where (Select count(*) From book b2 Where b2.bookId <= b.BookId constants And b2.AuthorId = b.AuthorId ) <= 2; aggregation functions • Evaluation on Stack Overflow Posts

  5. Running Example Task: Collect the max vals below 50 for all oid groups in T2 and join them with T1 . T2 T1 oid val id date uid 1 30 1 12/25 1 1 10 2 11/21 3 Select * Select * Select * 1 10 4 12/24 2 From (Select oid, Max(val) From (Select oid, Max(val) From ( Select oid, Max(val) 2 50 From T2 From T2 From T2 2 10 Where val < 50 Where val < 50 Where val < 50 Group By oid) T3 Group By oid) T3 Group By oid ) T3 Out Join T1 Join T1 Join T1 oid date uid oid MaxVal On T3.oid = T1.uid On T3.oid = T1.uid On T3.oid = T1.uid 1 12/25 1 1 30 4 12/24 2 2 10 Constants = { 50 } Max, Min AggrFunc = { }

  6. Basic Algorithm: Enumerative Search Input: 2 Output: 6 Operators: add, mul add(2, add(2, 2)) add(2, 2) 6 add(2, 4) Synthesize Distributed protocols, mul(2, add(2, 2)) 4 2 Super-optimization add(2, mul(2, 2)) 8 add(4, 4) mul(2, 2) mul(2, mul(2, 2)) add(2, add(2, 2)) = 6 add(2, mul(2, 2)) = 6 Key: Compressing the search space by memoizing values.

  7. Input: T1, T2 Output: T out Operators: Select, Join, Aggr Select * id val From T3 Join T2 1 30 Select * id date uid On True Select * From T1 2 30 Select * 2 11/21 3 From T32 Where id > 1 From (Select * From T1) 4 30 T1 Join T52 4 12/24 2 Select id, Join (Select id, Max(val) 1 10 On id = oid Max(uid) …… id date uid From T2 2 10 From T3 Where val < 50 1 12/25 1 Group By id 4 10 Group By oid) T3 Select * 2 11/21 3 Select * T3 …… 1 50 On T3.oid = T1.uid From T31 From T1 4 12/24 2 id date uid Join T51 2 50 Where True On id = oid 1 12/25 1 Select id, val 4 50 2 11/21 3 From T3 1 10 Join T4 4 12/24 2 Select * Select * 2 10 On uid = oid From T31 From T1 4 10 Join T5 Where id ≥ uid …… On id = oid T4 Select oid, oid val T2 Max(val) T6 Select * T5 Select * From T4 1 30 From T2 id date uid oid MaxVal oid val From T3 Group By oid oid MaxVal 1 10 Where val < 50 Join T5 2 12/25 1 1 30 1 30 1 30 1 10 On uid = oid 4 12/24 2 2 10 1 10 …… Select oid, 2 10 2 10 …… 1 10 Max(val) From T4 2 50 Select * Group By oid oid val 2 10 From T2 Select * Having maxVal < 50 2 50 Where val = 50 From T3 Join T5 On id = oid Select oid, oid MaxVal Max(val) 2 50 Out From T42 Select * Group By oid id date uid oid MaxVal oid date uid oid MaxVal From T3 2 11/21 3 2 50 1 12/25 1 1 30 Join T52 On id = oid 4 12/24 2 2 10

  8. Input: T1, T2 Output: T out Operators: Select, Join, Aggr id date uid oid MaxVal 1 12/25 1 1 30 2 11/21 1 1 30 Select * 4 12/24 2 1 30 id val From T3 1 12/25 1 1 10 Join T2 1 30 Select * id date uid On True 2 11/21 1 1 10 Select * From T1 2 30 Select * 2 11/21 3 From T32 4 12/24 2 1 10 Where id > 1 From (Select * From T1) 4 30 T1 Join T52 4 12/24 2 Select id, 1 12/25 1 2 50 Join (Select id, Max(val) id date uid oid MaxVal 1 10 On id = oid Max(uid) …… id date uid From T2 2 11/21 1 2 50 1 12/25 1 1 30 2 10 From T3 Where val < 50 1 12/25 1 4 12/24 2 2 50 2 11/21 1 1 30 Group By id 4 10 Group By oid) T3 Select * 2 11/21 3 Select * 1 12/25 1 2 10 T3 4 12/24 2 1 30 …… 1 50 On T3.oid = T1.uid From T31 From T1 4 12/24 2 2 11/21 1 2 10 id date uid 1 12/25 1 1 10 Join T51 2 50 Where True 4 12/24 2 2 10 On id = oid 1 12/25 1 2 11/21 1 1 10 Select id, val 4 50 2 11/21 3 From T3 4 12/24 2 1 10 1 10 Join T4 4 12/24 2 1 12/25 1 2 50 Select * Select * 2 10 On uid = oid From T31 From T1 2 11/21 1 2 50 4 10 Join T5 Where id ≥ uid …… 4 12/24 2 2 50 On id = oid 1 12/25 1 2 10 T4 Select oid, 2 11/21 1 2 10 oid val T2 Max(val) T6 Select * 4 12/24 2 2 10 T5 Select * From T4 1 30 From T2 id date uid oid MaxVal oid val From T3 Group By oid oid MaxVal 1 10 Where val < 50 Join T5 2 12/25 1 1 30 1 30 1 30 1 10 On uid = oid 4 12/24 2 2 10 1 10 …… Select oid, 2 10 Challenge 2: Big tables 2 10 …… 1 10 Max(val) 1,889 --> 42,600 cells From T4 2 50 Select * Group By oid oid val 2 10 From T2 Select * Having maxVal < 50 2 50 Where val = 50 From T3 Join T5 On id = oid Select oid, oid MaxVal Max(val) 2 50 Out Challenge 1: Large number From T42 Select * Group By oid id date uid oid MaxVal oid date uid oid MaxVal of queries per-stage. From T3 2 11/21 3 2 50 1 12/25 1 1 30 Join T52 ~500,000 in the last stage. On id = oid 4 12/24 2 2 10 Problem: Value-based compression is inefficient & ineffective.

  9. Insight: Decompose Search Process Select * Select * From (Select * Select * From (Select * From T1 Select * From (Select * From T1 Where True) From (Select * From T1 Where True) Join (Select id, From T1 Where True) Join (Select id, Max(val) Where True) T2 Join (Select id, Max(val) From T2 Join (Select id, oid val Max(val) From T2 Where val < 50 Max(val) From T2 T1 1 30 Where val < 50 Group By oid From T2 Where val < 50 id date uid Group By oid 1 10 Having True) T3 Where val < 50 Group By oid Having True) T3 1 12/25 1 On T3.oid = T1.uid Group By oid 1 10 Having True) T3 On T3.oid = T1.uid Having True) T3 2 11/21 3 On T3.oid = T1.uid 2 50 On T3.oid = T1.uid 4 12/24 2 2 10 Search SQL queries Tout oid date uid oid MaxVal 2 12/25 1 1 30 4 12/24 2 2 10

  10. Insight: Decompose Search Process With Abstract Queries Queries whose predicates are holes. Select * Select * Select * From (Select * Select * From (Select * Select * From (Select * Select * From T1 From (Select * Select * From T1 From (Select * Select * From T1 From (Select * Where True) From T1 From (Select * Where □ ) From T1 From (Select * Where True) From T1 Join (Select id, Where □ ) From T1 Join (Select id, Where True) From T1 Join (Select id, Where □ ) Max(val) Join (Select id, Where True) Max(val) Join (Select id, Where □ ) T2 Max(val) Join (Select id, From T2 Max(val) Join (Select id, From T2 Max(val) Join (Select id, From T2 Max(val) oid val Where val < 50 From T2 Max(val) Where □ From T2 Max(val) Where val < 50 From T2 T1 Group By oid Where □ From T2 1 30 Group By oid Where val < 50 From T2 Group By oid Where □ Having True) T3 Group By oid Where val < 50 id date uid Having □ ) T3 Group By oid 1 10 Where □ Having True) T3 Group By oid On T3.oid = T1.uid Having □ ) T3 Group By oid On □ Having True) T3 1 12/25 1 Group By oid On T3.oid = T1.uid 1 10 Having □ ) T3 On □ Having True) T3 On T3.oid = T1.uid Having □ ) T3 2 11/21 3 On □ 2 50 On T3.oid = T1.uid On □ 4 12/24 2 Instantiate 2 10 Search abstract Abstract SQL queries Queries Tout oid date uid oid MaxVal 1. Prune query families 2 12/25 1 1 30 4 12/24 2 2 10 If a skeleton cannot be instantiated 2. Speed up to return output, prune all queries predicate synthesis with the skeleton

Recommend


More recommend