Efficient Detection of Empty-Result Queries Gang Luo IBM Watson Research Centre Damon Sotoudeh
Agenda Introduction The detection method Related work Future work Conclusion
Empty-Result Queries Queries that return nothing Do not provide much information May take much time to produce Frequently encountered: ○ CRM (at IBM): 18% ○ Biomedical domain: up to 40% ○ In interactive systems
Empty-Result Queries In interactive systems Users keep refining queries Few parameters are changed Many query parts are shared between queries ○ In the IBM CRM application, only 38% of queries are distinct
Intuition Remember query parts that previously led to empty result sets If a new query matches those parts, it will generate empty results No query execution required
Detection Method Numbers are set cardinalities
Detection Method Identify lowest set with cardinality zero, and the sub-tree rooted at that point
Detection Method Easy to see that the set cardinalities above this point are all zero
Detection Method If a new query contains this query part, it is an empty-result query But only if all the operators above the part are empty-result propagating ○ Selection ○ Projection ○ Join ○ And most other SQL operators
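The idea on the last few slides can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the `PlanNode` class, the operator names, and the `EMPTY_PROPAGATING` set are assumptions made for the example. It walks an executed plan and returns the lowest zero-cardinality nodes whose ancestors are all empty-result propagating.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed operator names; these propagate emptiness (empty input, empty output).
EMPTY_PROPAGATING = {"select", "project", "join"}


@dataclass
class PlanNode:
    op: str                      # hypothetical operator label
    cardinality: int             # output cardinality seen during execution
    children: List["PlanNode"] = field(default_factory=list)


def lowest_empty_parts(node, ancestors_ok=True):
    """Return the lowest zero-cardinality nodes under `node`.

    `ancestors_ok` is True when every operator strictly above `node`
    is empty-result propagating.
    """
    below_ok = ancestors_ok and node.op in EMPTY_PROPAGATING
    found = []
    for child in node.children:
        found.extend(lowest_empty_parts(child, below_ok))
    if found:
        return found             # a lower empty part exists; prefer it
    if node.cardinality == 0 and ancestors_ok:
        return [node]
    return []


# Example plan: project -> join -> (select over scan A, scan B)
plan = PlanNode("project", 0, [
    PlanNode("join", 0, [
        PlanNode("select", 0, [PlanNode("scan", 1000)]),
        PlanNode("scan", 500),
    ]),
])
print([n.op for n in lowest_empty_parts(plan)])  # ['select']
```

Only the selection over the first scan is reported, because it is the lowest point where the cardinality drops to zero.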
Simplifying query plans Abstractly Certain operators have no influence on the emptiness of output ○ Projection ○ Hash ○ Sort, ... Any join operator is simply a join ○ Hash join ○ Sort-merge join ○ Nested-loops join
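As a rough sketch of this simplification step, the following Python drops operators that cannot change emptiness and collapses every physical join into one logical join. The `Op` class and the operator names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed labels for operators that never change whether the output is empty.
TRANSPARENT = {"project", "hash", "sort"}
# Assumed labels for physical join algorithms.
JOIN_VARIANTS = {"hash_join", "sort_merge_join", "nested_loops_join"}


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)


def simplify(op):
    """Return a copy of the plan with transparent operators removed
    and all join algorithms renamed to a plain "join"."""
    kids = [simplify(c) for c in op.children]
    if op.name in TRANSPARENT and len(kids) == 1:
        return kids[0]           # skip the operator, keep its input
    name = "join" if op.name in JOIN_VARIANTS else op.name
    return Op(name, kids)


plan = Op("project", [
    Op("hash_join", [
        Op("sort", [Op("select:A")]),
        Op("hash", [Op("select:B")]),
    ]),
])
simplified = simplify(plan)
```

Here `simplified` is a single `join` over the two selections, which is the abstract shape the detection method works with.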
Simplifying query plans
Simplifying query plans Previous figure corresponds to the following query:
Further simplification Convert selection conditions to DNF (disjunctive normal form) For example: [example equation missing] Interval selections do not need to be changed
Further simplification After rewriting selections in DNF, combine the individual selection terms in each relation
Further simplification Great news: The output of the four simplified query parts is also empty! ○ Proof by intuition! They are called atomic query parts ○ Cannot be further simplified But generating them is exponential ○ Poor performance for complex queries
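The exponential cost is easy to see in a small sketch. Assuming each relation's selection condition is already in DNF and is given as a list of conjunctive terms (here plain strings, a simplification made for this example), the atomic query parts are all combinations that pick one term per relation:

```python
from itertools import product


def atomic_query_parts(dnf_by_relation):
    """Enumerate atomic query parts.

    `dnf_by_relation` maps a relation name to the list of conjunctive
    terms of its selection condition in DNF. Each atomic part picks
    exactly one term per relation.
    """
    relations = sorted(dnf_by_relation)
    combos = product(*(dnf_by_relation[r] for r in relations))
    return [dict(zip(relations, combo)) for combo in combos]


# Hypothetical conditions: two relations with two disjuncts each.
parts = atomic_query_parts({
    "A": ["a < 10", "a > 90"],
    "B": ["b = 1", "b = 2"],
})
print(len(parts))  # 4
```

The number of parts is the product of the number of terms per relation, so it grows exponentially with the number of relations that carry disjunctions.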
Detection How to detect an empty-result query Q? Break Q into its atomic parts Is there any atomic part in container that covers Q? ○ If yes, then it is an empty-result query
Coverage A selection condition X covers selection condition Y, if and only if when Y is true, then X is true. In other words, if X is false, then so is Y.
Coverage The notion of coverage expands the detection possibilities But deciding coverage in general is exponential The paper uses a restricted coverage detection Trade-off between efficiency and coverage detection If an empty-result atomic query part covers an atomic part of query Q, then Q definitely generates empty results But we may not necessarily find such a match
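One cheap, restricted form of coverage can be sketched for conjunctions of interval conditions. This is an illustration under stated assumptions, not necessarily the restricted check the paper uses: each condition is a dict from attribute name to a closed interval `(low, high)`, and an attribute that is absent is treated as unconstrained.

```python
import math


def covers(x, y):
    """Return True if condition x covers condition y (y true implies x true).

    x and y map attribute names to closed intervals (low, high) and are
    read as conjunctions. The check is sound but deliberately simple.
    """
    for attr, (x_lo, x_hi) in x.items():
        # An attribute missing from y is unconstrained in y.
        y_lo, y_hi = y.get(attr, (-math.inf, math.inf))
        if y_lo < x_lo or y_hi > x_hi:
            return False         # y allows values that x rejects
    return True


stored_empty = {"price": (0, 100)}
new_part = {"price": (10, 20), "qty": (1, 5)}
print(covers(stored_empty, new_part))  # True
```

Because every row that satisfies `new_part` also satisfies `stored_empty`, and the stored part is known to be empty, the new part must be empty as well. The check runs in time linear in the number of attributes.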
Atomic query container Is fully stored in memory for fast access Is of fixed size M, but M can be fairly large Trade-off between efficiency and coverage Once the container is full, keep only the atomic parts most likely to be reused ○ E.g. with a least recently used (LRU) replacement policy
Atomic query container To avoid scanning the whole container Index the container based on involved relations
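A container along these lines might look like the sketch below. The class name, its methods, and the choice to index by the exact set of relations are assumptions made for illustration. Conditions must be hashable, and the coverage test is passed in as a function.

```python
from collections import OrderedDict, defaultdict


class AtomicPartContainer:
    """Fixed-size, in-memory store of empty atomic query parts with LRU
    eviction and an index keyed by the set of relations involved."""

    def __init__(self, capacity, covers):
        self.capacity = capacity              # the fixed size M (>= 1)
        self.covers = covers                  # covers(stored, new) -> bool
        self.lru = OrderedDict()              # key -> None, oldest first
        self.by_relations = defaultdict(set)  # frozenset(relations) -> keys

    def add(self, relations, condition):
        key = (frozenset(relations), condition)
        if key in self.lru:
            self.lru.move_to_end(key)         # refresh recency
            return
        if len(self.lru) >= self.capacity:
            old_key, _ = self.lru.popitem(last=False)   # evict the oldest
            self.by_relations[old_key[0]].discard(old_key)
        self.lru[key] = None
        self.by_relations[key[0]].add(key)

    def is_covered(self, relations, condition):
        """Scan only the stored parts over the same relations."""
        rels = frozenset(relations)
        for key in list(self.by_relations.get(rels, ())):
            if self.covers(key[1], condition):
                self.lru.move_to_end(key)     # a hit counts as a use
                return True
        return False


def interval_covers(stored, new):
    # Single-attribute closed intervals, used only for this example.
    return stored[0] <= new[0] and new[1] <= stored[1]


container = AtomicPartContainer(capacity=2, covers=interval_covers)
container.add({"orders", "lineitem"}, (0, 100))
container.add({"part"}, (5, 6))
hit = container.is_covered({"lineitem", "orders"}, (10, 20))
```

The relation index keeps each lookup from touching parts that involve other relations, and the `OrderedDict` gives constant-time recency updates and eviction.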
Experiments Based on two queries Q 1 : Find the information about certain parts that were sold on certain days Q 2 : Find the information about certain parts that were sold to certain customers on certain days
Experiments The overhead is trivial compared to query execution overhead [Figure: execution time or overhead (seconds, log scale) versus database size (GB), with curves for execute Q1, check Q1, execute Q2, and check Q2]
Experiments The overhead of our method increases with both query complexity and the number of atomic query parts stored in C When check fails, the overhead of our method is higher than that when check succeeds
Related Work Two general approaches 1. Find what leads to empty results ○ Time consuming ○ A lot of possibilities 2. Automatically generalize the query to obtain some answers ○ Domain specific ○ Restricted forms of queries No best approach
Open issues How to handle updates? Extension beyond empty-result propagating operators A method that combines the advantages of all current solutions ○ Not restrictive ○ Efficient
Conclusion An efficient method for detecting empty-result queries High detection rate once the container is well filled Low overhead compared to actual query execution Small storage requirements Well suited to interactive systems Existence of hotspots is reflected
Thanks for listening! Questions?