Compilation of Query-Rewriting Problems into Tractable Fragments of Propositional Logic
Yolifé Arvelo, Blai Bonet, María Esther Vidal
Departamento de Computación, Universidad Simón Bolívar, Caracas, Venezuela
AAAI-06. July 18th, 2006.
Introduction
■ We consider the problem of rewriting a query using materialized views
■ This problem appears frequently in the context of Data Integration, Web Infrastructures and Query Optimization:
  – [Duschka & Genesereth 1997; Kwok & Weld 1996; Lambrecht, Kambhampati & Gnanaprakasam 1999]
  – [Levy, Rajaraman & Ordille 1996; Zaharioudakis et al. 2000; Mitra 2001]
■ The problem is in general intractable, and existing algorithms do not scale well even in simple cases
Data Integration
■ OBJECTIVE: Given a query Q, retrieve all tuples obtainable from the data sources that satisfy Q
■ Data sources are assumed to be:
  ◆ Independent (i.e. maintained in a distributed manner)
  ◆ Described as views (i.e. the Local As View model)
  ◆ Incomplete
Data Integration: Example
QUERY: Find round-trip flights that start in the US
Query Rewriting Problem: Example
QUERY: Find round-trip flights that start in the US
  Q(x, y) :- flight(x, y), flight(y, x), uscity(x)
Data sources modelled as views:
  national(x1, y1) :- flight(x1, y1), uscity(x1), uscity(y1)
  oneway(x2, y2) :- flight(x2, y2)
  onestop(x3, z3) :- flight(x3, y3), flight(y3, z3)
Query Rewriting Problem: Solution
■ ASSUMPTION: Views may be incomplete
■ Then, the solution is the collection of rewritings:
  R1(x, y) :- oneway(x, y), oneway(y, x), national(x, w)
  R2(x, y) :- oneway(x, y), oneway(y, x), national(w, x)
  R3(x, y) :- national(x, y), national(y, x)
  R4(x, y) :- oneway(x, y), national(y, x)
  R5(x, y) :- national(x, y), oneway(y, x)
■ Observe that there is no rewriting using onestop(x, y)
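To make the notion of a valid rewriting concrete, the following is a minimal sketch (not the authors' code) that unfolds a candidate rewriting into the base relations and tests whether the result is contained in Q by searching for a containment mapping. The tuple-based representation, the names `unfold` and `contained_in`, and the candidate `R_BAD` are assumptions introduced for illustration; the brute-force search is exponential and only suitable for toy inputs.

```python
from itertools import product

# An atom is (predicate, args); a conjunctive query is (head_vars, body_atoms).
Q = (("x", "y"),
     [("flight", ("x", "y")), ("flight", ("y", "x")), ("uscity", ("x",))])

VIEWS = {
    "national": (("x1", "y1"),
                 [("flight", ("x1", "y1")), ("uscity", ("x1",)), ("uscity", ("y1",))]),
    "oneway": (("x2", "y2"), [("flight", ("x2", "y2"))]),
    "onestop": (("x3", "z3"),
                [("flight", ("x3", "y3")), ("flight", ("y3", "z3"))]),
}

def unfold(rewriting):
    """Replace each view atom by the view's body (hypothetical helper).

    Head variables of the view are bound to the atom's arguments;
    the remaining (existential) variables get a fresh per-atom suffix.
    """
    head, body = rewriting
    atoms = []
    for i, (view, args) in enumerate(body):
        view_head, view_body = VIEWS[view]
        binding = dict(zip(view_head, args))
        for pred, view_args in view_body:
            atoms.append((pred, tuple(binding.get(v, f"{v}_{i}") for v in view_args)))
    return head, atoms

def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. some mapping of q2's variables
    to q1's terms sends head to head and every atom of q2 to an atom of q1."""
    (head1, body1), (head2, body2) = q1, q2
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    terms1 = sorted({v for _, args in body1 for v in args} | set(head1))
    facts = set(body1)
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        if all((p, tuple(h[v] for v in args)) in facts for p, args in body2):
            return True
    return False

R1 = (("x", "y"), [("oneway", ("x", "y")), ("oneway", ("y", "x")), ("national", ("x", "w"))])
R3 = (("x", "y"), [("national", ("x", "y")), ("national", ("y", "x"))])
# Hypothetical candidate that forgets the uscity(x) condition.
R_BAD = (("x", "y"), [("oneway", ("x", "y")), ("oneway", ("y", "x"))])

for name, r in [("R1", R1), ("R3", R3), ("R_BAD", R_BAD)]:
    print(name, contained_in(unfold(r), Q))
```

Under these assumptions, R1 and R3 pass the check, while `R_BAD` fails because nothing in its unfolding guarantees uscity(x).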
Query Rewriting Problem: Formal
■ INPUT: A query Q and a set of views V = {V1, V2, ..., Vn}
■ TASK: Find a maximally contained set of rewritings of Q using the views
■ A rewriting is a query-like expression that refers only to the views
■ ASSUMPTION: Q and the Vi are conjunctive queries without arithmetic predicates
Related Work: Algorithms
■ Bucket algorithm [Levy, Rajaraman & Ullman 1996]
■ Inverse rules algorithm [Duschka & Genesereth 1997]
■ MiniCon algorithm [Pottinger & Halevy 2001]
The MiniCon Algorithm [Pottinger & Halevy 2001]
■ Exploit independences to decompose into smaller subproblems and then combine solutions
■ Solutions to subproblems are called MCDs

  MCD   View       Mapping               Covered subgoals
  M1    national   {X → X1, Y → Y1}      {0}
  M2    national   {X → Y1, Y → X1}      {1}
  M3    national   {X → X1}              {2}
  M4    national   {X → Y1}              {2}
  M5    oneway     {X → X2, Y → Y2}      {0}
  M6    oneway     {X → Y2, Y → X2}      {1}
The MiniCon Algorithm: How does it work?
■ Generate all MCDs (very expensive, since it performs a blind search)
■ Rewritings are generated greedily as combinations of MCDs that:
  ◆ Cover disjoint subsets of subgoals in the query
  ◆ Cover all subgoals in the query
■ In the example, combining M3, M5, M6 produces the rewriting:
  R1(x, y) :- oneway(x, y), oneway(y, x), national(x, w)
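The combination condition can be illustrated with a small sketch. It is a brute-force enumeration over the six MCDs of the running example, not MiniCon's greedy procedure, and the names `MCDS` and `covering_combinations` are assumptions introduced here. Subgoals are numbered as in the MCD table: 0 for flight(x, y), 1 for flight(y, x), 2 for uscity(x).

```python
from itertools import combinations

# MCD name -> (view, covered subgoals), taken from the running example.
MCDS = {
    "M1": ("national", frozenset({0})),
    "M2": ("national", frozenset({1})),
    "M3": ("national", frozenset({2})),
    "M4": ("national", frozenset({2})),
    "M5": ("oneway", frozenset({0})),
    "M6": ("oneway", frozenset({1})),
}
ALL_SUBGOALS = frozenset({0, 1, 2})

def covering_combinations(mcds, goals):
    """Return every set of MCDs whose covered subgoals are pairwise
    disjoint and together equal `goals` (hypothetical helper)."""
    names = sorted(mcds)
    found = []
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = [mcds[m][1] for m in combo]
            union = frozenset().union(*covered)
            # Disjoint exactly when no subgoal is counted twice.
            disjoint = sum(len(c) for c in covered) == len(union)
            if disjoint and union == goals:
                found.append(combo)
    return found

combos = covering_combinations(MCDS, ALL_SUBGOALS)
for combo in combos:
    print(combo, [MCDS[m][0] for m in combo])
```

The combination (M3, M5, M6) appears in the output and corresponds to R1. Several combinations can yield redundant or equivalent rewritings, so the number of combinations need not equal the number of rewritings listed earlier.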
Our Approach: McdSat
■ Given a query Q and a set of views V
■ Build a propositional theory such that its models are in correspondence with the MCDs
■ Generating MCDs is now a problem of model enumeration
■ Model enumeration can be done with modern SAT techniques that implement:
  ◆ Non-chronological backtracking via clause learning
  ◆ Caching of common subproblems
  ◆ Heuristics
■ We also extend the propositional theory such that its models are in correspondence with the rewritings
■ We call our approach McdSat!
Negation Normal Forms (NNF)
■ A formula is in Negation Normal Form (NNF) if it is constructed from literals using only conjunctions and disjunctions [Barwise 1977]
■ It can be represented as a rooted DAG whose leaves are literals and whose internal nodes are labeled with conjunction or disjunction
[Figure: an NNF drawn as a rooted DAG with "and"/"or" internal nodes over the literals A, ~A, B, ~B, C, ~C, D, ~D]
Deterministic and Decomposable NNFs (d-DNNFs)
■ Introduced by [Darwiche 2001]
■ An NNF is decomposable if the conjuncts of every conjunction share no variables
■ An NNF is deterministic if the disjuncts of every disjunction are pairwise logically inconsistent
■ A d-DNNF supports a number of operations in linear time:
  ◆ satisfiability
  ◆ clause entailment
  ◆ model counting
  ◆ model enumeration (output-linear time)
  ◆ ...
■ Transformation into d-DNNF is intractable in the worst case, but not necessarily so on average
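As an illustration of why these two properties make counting easy, here is a minimal sketch of model counting on a small hand-built d-DNNF. The classes `Lit`, `And`, `Or` and the functions below are assumptions introduced for this example, not part of c2d or any other tool. For clarity the code recomputes variable sets and does not cache shared subgraphs, so it is not linear-time as written; a real implementation would memoize per DAG node.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Lit:
    var: str
    positive: bool = True

@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]

@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]

Node = Union[Lit, And, Or]

def variables(node):
    """Set of variables mentioned at or below `node`."""
    if isinstance(node, Lit):
        return frozenset({node.var})
    return frozenset().union(*(variables(c) for c in node.children))

def is_decomposable(node):
    """Check that the conjuncts of every conjunction share no variables."""
    if isinstance(node, Lit):
        return True
    if isinstance(node, And):
        seen = set()
        for child in node.children:
            child_vars = variables(child)
            if seen & child_vars:
                return False
            seen |= child_vars
    return all(is_decomposable(c) for c in node.children)

def count_models(node):
    """Number of models over variables(node); assumes `node` is a d-DNNF.

    Decomposability lets conjunctions multiply counts; determinism lets
    disjunctions add them. A disjunct that omits some variable of the
    disjunction is scaled by 2 per omitted variable.
    """
    if isinstance(node, Lit):
        return 1
    if isinstance(node, And):
        total = 1
        for child in node.children:
            total *= count_models(child)
        return total
    all_vars = variables(node)
    return sum(count_models(c) * 2 ** (len(all_vars) - len(variables(c)))
               for c in node.children)

# (A and B) or (~A and C): deterministic because the disjuncts disagree on A.
formula = Or((And((Lit("A"), Lit("B"))), And((Lit("A", False), Lit("C")))))
not_decomposable = And((Lit("A"), Lit("A", False)))

print(is_decomposable(formula), count_models(formula))
```

The formula has four models over {A, B, C}: two with A and B true and C free, and two with A false, C true and B free. Determinism is not checked by this sketch; it is assumed of the input.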
Implementation
■ McdSat translates the query rewriting problem (QRP) into a propositional theory T
■ T is compiled into d-DNNF using Darwiche's c2d compiler
■ Models are obtained from the d-DNNF and transformed into MCDs or rewritings
■ c2d and models are off-the-shelf components
■ McdSat is written in a scripting language
Experimental Study
OBJECTIVE: To study the effect of query size and number of views on the performance of McdSat and MiniCon
■ Large benchmark with problems of different sizes and structures
■ Comparison metric: time
■ For lack of space, we report only a few instances
Experimental Results
■ MCD Theory: time to generate MCDs (no combination)
■ Extended Theory: time to generate rewritings
■ Structure: Chain and Star
■ Half of the variables are distinguished
■ Queries of different lengths
■ Different numbers of views
■ Each point is the average over 10 instances
■ Random instances created with the generator of [Afrati, Li & Ullman 2001]
Experimental Results: MCD Theories
[Figure: four log-scale plots of time in seconds for MiniCon and McdSat. Two panels (chain queries and star queries, half distinguished vars, 80 views) plot time against the number of goals in the query (3 to 10). Two panels (chain queries and star queries, half distinguished vars, 8 subgoals) plot time against the number of views (20 to 140).]