Efficient Detection of Empty-Result Queries Gang Luo IBM Watson - PowerPoint PPT Presentation


  1. Efficient Detection of Empty-Result Queries
     Gang Luo, IBM Watson Research Centre
     Damon Sotoudeh

  2. Agenda
     • Introduction
     • The detection method
     • Related work
     • Future work
     • Conclusion

  3. Empty-Result Queries
     • Queries that return nothing
     • Do not provide much information
     • May take a long time to produce
     • Frequently encountered:
       ○ CRM (at IBM): 18%
       ○ Biomedical domain: up to 40%
       ○ In interactive systems

  4. Empty-Result Queries
     • In interactive systems
     • Users keep refining queries
     • Few parameters are changed
     • Many query parts are common to successive queries
       ○ In the IBM CRM application, only 38% of queries are distinct

  5. Intuition
     • Remember query parts that previously led to empty result sets
     • If a new query matches one of those parts, it will generate an empty result
     • No query execution required

  6. Detection Method Numbers are set cardinalities

  7. Detection Method
     • Identify the lowest set with cardinality zero, and the sub-tree rooted at that point

  8. Detection Method  Easy to see that the set cardinalities above this point are all zero

  9. Detection Method
     • If a new query has this query part, it is an empty-result query
     • Only if all the operators above it are empty-result propagating:
       ○ Selection
       ○ Projection
       ○ Join
       ○ And most other SQL operators
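
The two steps on slides 7 to 9 can be sketched in Python. This is a minimal illustration, not the paper's implementation: the `PlanNode` class, the operator names, and the contents of `PROPAGATING` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: operator names treated as empty-result propagating here.
PROPAGATING = {"scan", "select", "project", "join"}


@dataclass
class PlanNode:
    op: str                      # operator name, e.g. "join"
    cardinality: int             # number of rows the operator produced
    children: List["PlanNode"] = field(default_factory=list)


def lowest_empty_node(node: PlanNode) -> Optional[PlanNode]:
    """Return a deepest node whose output cardinality is zero, if any."""
    for child in node.children:
        found = lowest_empty_node(child)
        if found is not None:
            return found
    return node if node.cardinality == 0 else None


def path_is_propagating(root: PlanNode, target: PlanNode) -> bool:
    """True if every operator strictly above `target` propagates emptiness."""
    if root is target:
        return True
    if root.op not in PROPAGATING:
        return False
    return any(path_is_propagating(c, target) for c in root.children)


scan_r = PlanNode("scan", 100)
sel_r = PlanNode("select", 0, [scan_r])      # first operator with no rows
join = PlanNode("join", 0, [sel_r, PlanNode("scan", 50)])
plan = PlanNode("project", 0, [join])

empty = lowest_empty_node(plan)
print(empty.op, path_is_propagating(plan, empty))  # select True
```

An operator such as a left outer join would fail the second check, because it can still produce rows when one of its inputs is empty.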

  10. Simplifying query plans
      • Abstractly
      • Certain operators have no influence on the emptiness of the output:
        ○ Projection
        ○ Hash
        ○ Sort, ...
      • Any join operator is simply a join:
        ○ Hash join
        ○ Sort-merge join
        ○ Nested-loops join
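
A rough sketch of this simplification, assuming a plan is a tree of named operators. The `Op` class and the operator name strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: operator names used by this toy plan representation.
NO_INFLUENCE = {"project", "hash", "sort"}
JOIN_VARIANTS = {"hash_join", "sort_merge_join", "nested_loops_join"}


@dataclass
class Op:
    name: str
    children: List["Op"] = field(default_factory=list)


def simplify(node: Op) -> Op:
    """Drop operators that cannot change emptiness; unify join variants."""
    children = [simplify(c) for c in node.children]
    if node.name in NO_INFLUENCE and len(children) == 1:
        return children[0]           # splice the operator out of the tree
    name = "join" if node.name in JOIN_VARIANTS else node.name
    return Op(name, children)


plan = Op("project", [Op("sort", [Op("hash_join", [
    Op("select", [Op("scan_R")]),
    Op("hash", [Op("scan_S")]),
])])])

simplified = simplify(plan)
print(simplified.name, [c.name for c in simplified.children])
# join ['select', 'scan_S']
```

Two plans that differ only in physical join choice or in sort and projection steps reduce to the same simplified tree, so a stored empty part can match either of them.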

  11. Simplifying query plans

  12. Simplifying query plans  Previous figure corresponds to the following query:

  13. Further simplification
      • Convert selection conditions to DNF (disjunctive normal form)
      • For example: [example equation not preserved]
      • Interval selections do not need to be changed
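
As an illustration of the DNF conversion (not the slide's lost example), the sketch below distributes AND over OR for a condition written as nested tuples. The tuple encoding and the sample condition are assumptions, and negation is not handled.

```python
from itertools import product


def to_dnf(expr):
    """Return a list of conjunctions; each conjunction is a tuple of atoms.

    `expr` is either an atom (a string) or a tuple ("and" | "or", arg, ...).
    """
    if isinstance(expr, str):
        return [(expr,)]
    op, *args = expr
    parts = [to_dnf(a) for a in args]
    if op == "or":
        # The disjuncts of an OR are the disjuncts of all its arguments.
        return [conj for part in parts for conj in part]
    if op == "and":
        # Pick one disjunct from each argument and concatenate the atoms.
        return [sum(combo, ()) for combo in product(*parts)]
    raise ValueError(f"unsupported operator: {op}")


condition = ("and", ("or", "a < 10", "a > 90"), "b = 5")
print(to_dnf(condition))
# [('a < 10', 'b = 5'), ('a > 90', 'b = 5')]
```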

  14. Further simplification  After rewriting selections in DNF, combine the individual selection terms in each relation

  15. Further simplification
      • Great news:
      • The output of each of the four simplified query parts is also empty!
        ○ Proof by intuition!
      • They are called atomic query parts
        ○ Cannot be further simplified
      • But generating them is exponential
        ○ Poor performance for complex queries
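
One plausible reading of how the atomic query parts arise: once each relation's selection is in DNF, every combination of one conjunctive term per relation gives one atomic part. The sketch below assumes that reading; the relation names and conditions are invented for the example.

```python
from itertools import product


def atomic_parts(dnf_by_relation):
    """Map of relation -> list of conjunctive terms, expanded into atomic parts.

    Each atomic part picks exactly one conjunctive term per relation.
    """
    relations = sorted(dnf_by_relation)
    combos = product(*(dnf_by_relation[r] for r in relations))
    return [dict(zip(relations, combo)) for combo in combos]


# Hypothetical query part: two relations, two disjuncts each.
query_part = {
    "orders": ["day = 1", "day = 2"],
    "parts": ["pid = 7", "pid = 9"],
}

parts = atomic_parts(query_part)
print(len(parts))  # 4
```

The count is the product of the number of disjuncts per relation, which is why generation becomes expensive for complex queries. Since the original query part is the union of these combinations, an empty original implies that each combination is empty as well.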

  16. Detection
      • How to detect an empty-result query Q?
      • Break Q into its atomic parts
      • Is there any atomic part in the container that covers Q?
        ○ If yes, then it is an empty-result query
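
A minimal sketch of the lookup, under two assumptions that go beyond the slide. First, an atomic part is modelled as a set of relations plus a set of conjunct strings, and a stored part covers a query part when both sets are subsets (a purely syntactic, restricted check). Second, Q is reported empty only when every one of its atomic parts is covered, since Q's output is the union of its atomic parts' outputs.

```python
from typing import FrozenSet, List, Tuple

# (relations, conjuncts) of one atomic query part; encoding is hypothetical.
AtomicPart = Tuple[FrozenSet[str], FrozenSet[str]]


def part(relations, conjuncts) -> AtomicPart:
    return frozenset(relations), frozenset(conjuncts)


def covered(stored: AtomicPart, query_part: AtomicPart) -> bool:
    """Restricted check: the stored part uses fewer relations and conjuncts."""
    s_rel, s_cond = stored
    q_rel, q_cond = query_part
    return s_rel <= q_rel and s_cond <= q_cond


def is_empty_result(query_parts: List[AtomicPart],
                    container: List[AtomicPart]) -> bool:
    """True only if every atomic part of the query is covered."""
    return bool(query_parts) and all(
        any(covered(s, q) for s in container) for q in query_parts
    )


container = [part({"orders"}, {"orders.day = 1"})]
q_hit = [part({"orders", "parts"},
              {"orders.day = 1", "orders.pid = parts.pid"})]
q_miss = [part({"orders", "parts"},
               {"orders.day = 2", "orders.pid = parts.pid"})]

print(is_empty_result(q_hit, container), is_empty_result(q_miss, container))
# True False
```

A `False` answer does not mean the query has results; it only means the container holds no evidence, so the query is executed as usual.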

  17. Coverage
      • A selection condition X covers a selection condition Y if and only if, whenever Y is true, X is also true.
      • In other words, if X is false, then so is Y.
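
For conjunctions of interval selections, coverage can be checked attribute by attribute. The sketch below assumes closed, non-empty intervals and a dictionary encoding of conditions; both are illustrative choices, not the paper's representation.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]     # closed interval [low, high]
Condition = Dict[str, Interval]    # conjunction: attribute -> interval


def covers(x: Condition, y: Condition) -> bool:
    """True if every row that satisfies y also satisfies x."""
    for attr, (x_lo, x_hi) in x.items():
        if attr not in y:
            return False           # y leaves attr unconstrained, x does not
        y_lo, y_hi = y[attr]
        if y_lo < x_lo or y_hi > x_hi:
            return False           # y admits values outside x's interval
    return True


x = {"price": (0, 100)}
y = {"price": (10, 20), "qty": (1, 5)}
print(covers(x, y), covers(y, x))  # True False
```

Here `x` is the weaker condition: if no row satisfies `x`, no row can satisfy `y` either.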

  18. Coverage
      • The notion of coverage expands the detection possibilities
      • But deciding coverage is exponential
      • The paper uses a restricted coverage detection
      • Trade-off between efficiency and coverage detection
      • If an empty-result atomic query part covers an atomic part of query Q, then Q definitely generates empty results
      • But we may not necessarily find such a match

  19. Atomic query container
      • Is fully stored in memory
      • For fast access
      • Is of fixed size M, but M can be fairly large
      • Trade-off between efficiency and coverage
      • Once the container is full, keep only the atomic parts most likely to be reused
        ○ E.g. evict with the least recently used (LRU) algorithm, which keeps the most recently used parts
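
A fixed-size container with LRU eviction could look like the following. The class and method names are hypothetical, and atomic parts are assumed to be hashable values.

```python
from collections import OrderedDict


class AtomicPartContainer:
    """Holds at most `capacity` atomic parts, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._parts = OrderedDict()          # ordered oldest -> newest

    def add(self, part) -> None:
        self._parts[part] = None
        self._parts.move_to_end(part)        # mark as most recently used
        if len(self._parts) > self.capacity:
            self._parts.popitem(last=False)  # drop the least recently used

    def touch(self, part) -> bool:
        """Record a successful match so the part survives eviction longer."""
        if part in self._parts:
            self._parts.move_to_end(part)
            return True
        return False

    def __contains__(self, part) -> bool:
        return part in self._parts

    def __len__(self) -> int:
        return len(self._parts)


c = AtomicPartContainer(capacity=2)
c.add("part_a")
c.add("part_b")
c.touch("part_a")      # part_a is now the most recently used
c.add("part_c")        # evicts part_b
print("part_a" in c, "part_b" in c, "part_c" in c)  # True False True
```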

  20. Atomic query container
      • To avoid scanning the whole container, index it by the relations involved
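
One way to realize such an index, assuming stored parts are keyed by their exact set of relations and that a query only needs the entries whose relation set is a subset of its own. `RelationIndex` and its methods are illustrative names.

```python
from collections import defaultdict
from itertools import combinations


class RelationIndex:
    """Groups stored atomic parts by the set of relations they involve."""

    def __init__(self):
        self._by_relations = defaultdict(list)

    def add(self, relations, conjuncts) -> None:
        self._by_relations[frozenset(relations)].append(frozenset(conjuncts))

    def candidates(self, query_relations):
        """Yield stored parts whose relations are a subset of the query's."""
        rels = sorted(query_relations)
        for size in range(1, len(rels) + 1):
            for subset in combinations(rels, size):
                key = frozenset(subset)
                if key in self._by_relations:   # membership test adds no key
                    for conjuncts in self._by_relations[key]:
                        yield key, conjuncts


idx = RelationIndex()
idx.add({"orders"}, {"orders.day = 1"})
idx.add({"orders", "parts"}, {"orders.pid = parts.pid", "parts.color = 'red'"})
idx.add({"customers"}, {"customers.region = 'EU'"})

found = list(idx.candidates({"orders", "parts"}))
print(len(found))  # 2
```

Enumerating subsets is itself exponential in the number of relations in the query, which is acceptable only because that number is usually small; parts that involve unrelated relations, such as `customers` above, are never examined.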

  21. Experiments
      • Based on two queries
      • Q1: Find the information about certain parts that were sold on certain days
      • Q2: Find the information about certain parts that were sold to certain customers on certain days

  22. Experiments
      • The overhead is trivial compared to query execution overhead
      • [Figure: execution time or overhead in seconds (log scale, 0.001 to 1000) versus database size (1, 2, 3 GB); series: execute Q1, check Q1, execute Q2, check Q2]

  23. Experiments
      • The overhead of our method increases with both query complexity and the number of atomic query parts stored in C
      • When the check fails, the overhead of our method is higher than when the check succeeds

  24. Related Work
      • Two general approaches
        1. Find what leads to empty results
           ○ Time consuming
           ○ A lot of possibilities
        2. Automatically generalize the query to obtain some answers
           ○ Domain specific
           ○ Restricted forms of queries
      • No best approach

  25. Open issues
      • How to include updates?
      • Extension beyond empty-result propagating operators
      • A method that takes into account the advantages of all current solutions
      • Not restrictive
      • Efficient

  26. Conclusion
      • An efficient method for detecting empty-result queries
      • High detection rate once the container is well filled
      • Low overhead compared to actual execution of the query
      • Small storage requirements
      • Well suited to interactive use
      • Existence of hotspots is reflected

  27. Thanks for listening! Questions?
