Scaling Up Symbolic Reasoning for Relational Queries or, speeding up debugging & verification of database queries Chenglong Wang , Alvin Cheung, Ras Bodik University of Washington � 1
Relational Queries • The language between human and relational databases (tables) Select (filter & projection) Select val From T Where color = red; � 2
Relational Queries • The language between human and relational databases (tables) Select Join (filter & projection) Select val From T T 1 Join T 2 On T 1 .color=T 2 .color; Where color = red; � 3
Relational Queries • The language between human and relational databases (tables) Select Group Join (filter & projection) & Aggregation Select val Select color, Sum(val) From T From T T 1 Join T 2 On T 1 .color=T 2 .color; Where color = red; Group by color; � 4
“The Count Bug” Rewrite rules for nested queries q 1 q 2 S’ = (Select S.c n , Agg (S.c m ) FROM S Select R.c k Group By S.cn): From R Where R.c h = (Select Agg (S.c m ) Select R.c k From S From R Where R.c h = (Select Agg (S’.c m ) Where S.c n = S.c p ); From S’ 1982 “On Optimizing an Where T’.c n = R.c p ); SQL-like Nested Query” (Kim Won) � 5
“The Count Bug” Rewrite rules for nested queries q 1 q 2 S’ = (Select S.c n , Agg (S.c m ) FROM S Select R.c k Group By S.cn): From R Where R.c h = (Select Agg (S.c m ) Select R.c k From S From R Where R.c h = (Select Agg (S’.c m ) Where S.c n = S.c p ); From S’ 1982 “On Optimizing an Where T’.c n = R.c p ); SQL-like Nested Query” (Kim Won) Found a bug in the 1982 paper! q 1 q 2 1987 “Optimization of Nested SQL Queries Revisited” (Ganski & Wong) � 6
Reasoning Tasks Verification q ≡ q’ q, q’ “Are two queries equivalent on ALL inputs” Property Checking (for optimization) q q(T) = empty “Can the query return empty output on SOME input?” Mutation testing / Grading q, q’ q(T) ≠ q’(T) “Find a distinguishing input between queries.” � 7
Relational Queries tens to hundreds of HUGE tables …… complex analytical functions generated by computer highly optimized “analyze transition plays important roles in industry history” can’t a ff ord 5 years to find a bug! � 8
Automatic Reasoning Check whether q 1 is equivalent to q 2 (on ALL inputs) assert q 1 ≠ q 2 q 1 , q 2 q 1 , q 2 assert q 1 ≠ q 2 (queries) (queries) (property) (property) found T, found T, unsatisfiable unsatisfiable q 1 (T) ≠ q 2 (T) q 1 (T) ≠ q 2 (T) (proved q 1 =q 2 ) (proved q 1 =q 2 ) � 9
Automatic Reasoning (unbounded) equivalence is undecidable Check whether q 1 is equivalent to q 2 (on ALL inputs) assert q 1 ≠ q 2 q 1 , q 2 q 1 , q 2 assert q 1 ≠ q 2 (queries) (queries) (property) (property) found T, found T, unsatisfiable unsatisfiable q 1 (T) ≠ q 2 (T) q 1 (T) ≠ q 2 (T) (proved q 1 =q 2 ) (proved q 1 =q 2 ) � 10
Symbolic Reasoning Check whether q 1 is equivalent to q 2 (on ALL inputs within a search space) tables with at most assert q 1 ≠ q 2 q 1 , q 2 q 1 , q 2 assert q 1 ≠ q 2 k rows (queries) (queries) (property) (property) (search space) found T, found T, unsatisfiable unsatisfiable q 1 (T) ≠ q 2 (T) q 1 (T) ≠ q 2 (T) (proved q 1 =q 2 ) (proved q 1 =q 2 ) � 11
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL tables with at most k tuples” (1) Target queries q 2 : Select id, val q 1 : Select id, val From T From T Where id ≠ 1 Where id > 1 (2) Search space tables with at most k rows Solver (3) Property q 1 (T) ≠ q 2 ( T) � 12
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL tables with at most k tuples” (1) Target queries q 2 : Select id, val q 1 : Select id, val From T From T id val Where id ≠ 1 Where id > 1 0 1 1 2 (2) Search space tables with at q 1 q 2 most k rows Solver id val id val (3) Property (empty) 0 1 q 1 (T) ≠ q 2 ( T) � 13
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” Grouping (1) Target queries & aggregation q 1 , q 2 “ Select f(val) From T (2) Search space Group By id ” tables with at most k rows Solver (3) Property T out1 ≠ T out2 � 14
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” id val Grouping 1 y 1 (1) Target queries & aggregation 1 y 2 … q 1 , q 2 “ Select f(val) 1 y k From T (2) Search space Group By id ” … id val id val Exponential ways to 1 y 1 tables with at x 1 y 1 1 y 2 partition the table most k rows x 2 y 2 … … 2 y k x k y k Solver id val (3) Property 1 y 1 T out1 ≠ T out2 2 y 2 … k y k � 15
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” id val Grouping 1 y 1 (1) Target queries & aggregation 1 y 2 … q 1 , q 2 “ Select f(val) 1 y k From T (2) Search space Group By id ” … id val id val Exponential ways to 1 y 1 tables with at x 1 y 1 1 y 2 partition the table most k rows x 2 y 2 … … 2 y k x k y k Solver id val (3) Property 1 y 1 Computationally T out1 ≠ T out2 2 y 2 expensive … T out1 ⊂ T out2 & T out2 ⊂ T out1 k y k � 16
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” id val Grouping 1 y 1 (1) Target queries & aggregation 1 y 2 … q 1 , q 2 “ Select f(val) 1 y k From T (2) Search space Group By id ” … Unsatisfying id val Scalability id val Exponential ways to 1 y 1 tables with at x 1 y 1 1 y 2 partition the table most k rows x 2 y 2 … … 2 y k x k y k Solver id val (3) Property 1 y 1 Computationally T out1 ≠ T out2 2 y 2 expensive … T out1 ⊂ T out2 & T out2 ⊂ T out1 k y k � 17
Symbolic Reasoning “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” Grouping (1) Target queries & aggregation “Small Model” q 1 , q 2 “ Select f(val) A smaller search space From T to achieve same (2) Search space Group By id ” reasoning guarantee Exponential ways to tables with at partition the table most k rows Solver (3) Property Computationally T out1 ≠ T out2 expensive T out1 ⊂ T out2 & T out2 ⊂ T out1 � 18
Space Refinement “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” T out1 ≠ q 1, q 2 T out2 (queries) (property) tables with at most k rows tables with at most k rows S’ (refined search space) S (search space) “Small Model” (1) If exists T ∈ S satisfying the property, we can find one in the S’ too. (2) If none of tables in S’ satisfying the property, then no T exists in S too. � 19
Space Refinement “Check whether q 1 , q 2 are equivalent on ALL inputs within size k” T out1 ≠ q 1, q 2 T out2 (queries) (property) provenance analysis tables with at most k rows tables with at most k rows S’ (refined search space) S (search space) “Small Model” (1) If exists T ∈ S satisfying the property, we can find one in the S’ too. (2) If none of tables in S’ satisfying the property, then no T exists in S too. � 20
Insight from Property • Many properties requires only one tuple in the output to invalidate. T from search space S “Check whether q 1 , q 2 are equivalent” q 1 (T) ≠ q 2 ( T ) q 1 q 2 Exists a row r with di ff erent r multiplicities in T out1 and T out2 T out1 , r ∉ T out2 → q 1 (T) ≠ q 2 ( T ) r ∈ � 21
Insight from the Property • Many important properties requires only one tuple in the output to be invalidated. T from search space S T’ q 1 q 2 q 1 q 2 r r r ∈ T out1 , r ∉ T out2 r ∈ T out1 , r ∉ T out2 T’ can also distinguish q1 from q2! � 22
Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 : Select id, max(val) … From T id val Group By id ? q 2 : Select id, min(val) id val q 2 From T a b Group By id … � 23
Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 : Select id, max(val) … From T id val Group By id ??? q 2 : Select id, min(val) … id val q 2 From T a b Group By id … � 24
Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 id val q 1 : Select id, max(val) … a c From T id val Group By id ??? id val a ? q 2 : Select id, min(val) … a ? id val q 2 From T id val q 2 a b Group By id a b … T’={r ∈ T| r.id = a} � 25
Provenance Analysis Assume r=(a, b) is the output tuple showing the di ff erence between two queries r ∈ T out1 , r ∉ T out2 q 1 id val q 1 id val q 1 : Select id, max(val) … a c From T id val Group By id ??? id val a ? q 2 : Select id, min(val) … a ? id val q 2 From T id val q 2 a b Group By id a b … T’={r ∈ T| r.id = a} refine tables with at S most k rows tables with at most k rows S’ = {T ∈ S | T contain only one group} (the group with id “ a ”) � 26
Recommend
More recommend