Shooting Stars in the Sky: An Online Algorithm for Skyline Queries


  1. Shooting Stars in the Sky: An Online Algorithm for Skyline Queries
     Donald Kossmann (kossmann@in.tum.de), Frank Ramsak (frank.ramsak@forwiss.de), Steffen Rost (rost@in.tum.de)
     Technische Universität München, Institut für Informatik, Boltzmannstr. 3, 85748 Garching b. München, Germany

  2. Outline
     • Motivation
       – Skyline & known algorithms
       – Challenges in online scenarios
     • The NN algorithm for Skyline queries
       – Algorithm for 2D
       – Relationship between NN and Skyline
       – Algorithm for higher dimensionality
     • Evaluation
     • Supporting user control
     • Summary
     Shooting Stars in the Sky, VLDB 2002, Hong Kong

  3. What is the Skyline?
     • Literature: Minimum/maximum vector problem
       – Two vectors are not necessarily comparable
       – Dominance: A vector/point dominates another point if it is as good or better in all dimensions and better in at least one dimension
       – Skyline: All points of a data set that are not dominated by any other point
     [Figure: example data set; x: price [€], y: distance to the beach [km]]
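The dominance definition above can be made concrete with a few lines of Python. This is only a minimal sketch: it assumes that smaller values are better in every dimension (cheaper, closer to the beach), and the names `dominates`, `skyline` and the `hotels` sample data are illustrative and do not come from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is as good or better than b in all dimensions
    and strictly better in at least one (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive Skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical sample data: (price in EUR, distance to the beach in km)
hotels = [(50, 3.0), (80, 1.0), (60, 2.0), (90, 2.5), (120, 0.5)]

if __name__ == "__main__":
    print(skyline(hotels))
```

In this sample only (90, 2.5) drops out, because (60, 2.0) is both cheaper and closer to the beach.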

  4. Traditional Skyline algorithms
     • Blocking algorithms: require reading the complete data set
       – Compare each point with all other points
       – [Börzsönyi et al. ICDE 01]
         • Block-Nested-Loops (BNL): keep a window of candidate Skyline points
         • Divide-and-Conquer (D&C): divide the data set, compute partial Skylines, and merge them
     • Progressive algorithms [Tan et al. VLDB 01]
       – Bitmap: operations on range-encoded bitmaps
       – Index: transformation of d-dimensional space to one dimension + B-Tree
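The window idea behind BNL can be sketched as follows. This is a simplified in-memory version, assuming that the window never overflows and that smaller values are better; it leaves out the handling of a window larger than main memory. The function names are illustrative.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def bnl_skyline(points: Iterable[Point]) -> List[Point]:
    """Single pass over the input with a window of candidate Skyline points."""
    window: List[Point] = []
    for p in points:
        # p is discarded if some candidate already dominates it
        if any(dominates(w, p) for w in window):
            continue
        # p evicts every candidate that it dominates, then joins the window
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


if __name__ == "__main__":
    print(bnl_skyline([(50, 3.0), (80, 1.0), (60, 2.0), (90, 2.5), (120, 0.5)]))
```

The window is only known to be the Skyline once the last input point has been read, which is why this style of algorithm is blocking.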

  5. The challenge in online scenarios
     • Compute the first few Skyline points almost instantaneously
     • Compute more and more results incrementally
     • “Big picture”: compute Skyline points from the whole range, do not favor points that are good in one dimension

  6. Result ≠ Result - Quality of Results
     [Figure: three panels]
     • Complete Skyline
     • Progressive: first 10 points by Index; order of Bitmap depends on insertion order
     • Online: first 10 points by our NN algorithm

  7. The challenge in online scenarios
     • Compute the first few Skyline points almost instantaneously
     • Compute more and more results incrementally
     • “Big picture”: compute Skyline points from the whole range, do not favor points that are good in one dimension
     • Do not compute good approximations; return only real Skyline points
     • User should be able to express preferences while the algorithm is running → control which Skyline points are produced next
     • Universality w.r.t. data sets and type of Skyline queries

  8. The NN algorithm: 2D example
     [Figure: 2D example with regions labeled RangeNNSearch; x: price [€], y: distance to the beach [km]]

  9. The NN algorithm for 2 dimensions
     • Input: data set D, monotonic distance function f
     • Additional structures: to-do list T, keeps information about regions to be processed
     • Algorithm:
         T = {(O, ∞)}
         while (T ≠ ∅) do
           (m_x, m_y) = takeElement(T)
           if (∃ RangeNNSearch(O, D, (m_x, m_y), f)) then
             (n_x, n_y) = RangeNNSearch(O, D, (m_x, m_y), f)
             output n
             T = T ∪ {(n_x, m_y), (m_x, n_y)}
           endif
         endwhile
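The 2D algorithm above can be sketched in runnable form. Several assumptions are made here that the slide does not state: `range_nn_search` is a plain linear scan standing in for an index-based nearest neighbor search, the region bound is exclusive, each to-do entry stores only the upper bound (m_x, m_y) with the origin implied, coordinates are non-negative, and f defaults to the Euclidean distance to the origin. The sample data is made up.

```python
import math
from typing import Callable, Iterator, List, Optional, Tuple

Point = Tuple[float, float]


def range_nn_search(data: List[Point], bound: Point,
                    f: Callable[[Point], float]) -> Optional[Point]:
    """Nearest neighbor of the origin among points strictly inside the bound.
    Linear scan; a stand-in for a real index lookup."""
    mx, my = bound
    region = [p for p in data if p[0] < mx and p[1] < my]
    return min(region, key=f) if region else None


def nn_skyline_2d(data: List[Point],
                  f: Callable[[Point], float] = lambda p: math.hypot(p[0], p[1])
                  ) -> Iterator[Point]:
    """Yield Skyline points one at a time, as soon as each is found."""
    todo: List[Point] = [(math.inf, math.inf)]  # initial region: whole space
    while todo:
        mx, my = todo.pop()
        n = range_nn_search(data, (mx, my), f)
        if n is not None:
            yield n                    # n is a Skyline point
            todo.append((n[0], my))    # region to the left of n
            todo.append((mx, n[1]))    # region below n


hotels = [(50, 3.0), (80, 1.0), (60, 2.0), (90, 2.5), (120, 0.5)]

if __name__ == "__main__":
    for point in nn_skyline_2d(hotels):
        print(point)
```

Writing the function as a generator mirrors the online behavior: a caller can consume the first result without waiting for the to-do list to empty.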

  10. Correctness of the NN algorithm
      • Relationship between Nearest Neighbor (NN) and Skyline
        Given a data set D with origin O and an arbitrary monotonic distance function f, we can state:
        – Observation 1: The Nearest Neighbor NN of O in D w.r.t. f is in the Skyline
        – Observation 2: Given a region R with R = (O, X) = X, the Nearest Neighbor NN of O in R w.r.t. f is in the Skyline
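Observation 1 can be justified by a short contradiction argument. The sketch below assumes that smaller values are better and that f is strictly monotonic in every coordinate; the slide itself only says "monotonic", so the strictness is an assumption made here.

```latex
\begin{align*}
&\text{Let } n = \arg\min_{p \in D} f(p) \text{ and suppose some } q \in D \text{ dominates } n. \\
&\text{Then } q_i \le n_i \text{ for all } i \text{ and } q_j < n_j \text{ for some } j. \\
&\text{Strict monotonicity of } f \text{ gives } f(q) < f(n), \\
&\text{contradicting the minimality of } f(n). \text{ Hence } n \text{ is in the Skyline.}
\end{align*}
```

The same argument carries over to Observation 2, because any point that dominates the nearest neighbor inside a region bounded from above also lies inside that region.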

  11. Extending NN algorithm to higher dimensionality
      • Observations also hold in d-dimensional space
      • Modification: Processed region is partitioned into d subregions w.r.t. the NN
      [Figure: partitioning example; axes x, y, z; points n and p]
      • Problem: duplicate Skyline points may occur
        Solutions: post-filtering, merging, propagation, ...
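The partitioning into d subregions, and the duplicates it causes, can be sketched as below. This is an illustration under assumptions, not the variants named on the slide: duplicates are suppressed with a simple set of already reported points, the nearest neighbor search is a linear scan, bounds are exclusive, coordinates are non-negative, and f is the Euclidean distance to the origin. A region whose nearest neighbor was already reported is still split, because it may contain further Skyline points.

```python
import math
from typing import Iterator, List, Set, Tuple

Point = Tuple[float, ...]


def nn_skyline(data: List[Point]) -> Iterator[Point]:
    """d-dimensional NN-based Skyline with a simple duplicate filter."""
    if not data:
        return
    d = len(data[0])

    def f(p: Point) -> float:
        return math.sqrt(sum(x * x for x in p))

    todo: List[Point] = [tuple([math.inf] * d)]  # upper bounds of regions
    seen: Set[Point] = set()                     # Skyline points already reported
    while todo:
        bound = todo.pop()
        region = [p for p in data if all(x < m for x, m in zip(p, bound))]
        if not region:
            continue
        n = min(region, key=f)
        if n not in seen:          # the same point can be found via several regions
            seen.add(n)
            yield n
        for i in range(d):         # one subregion per dimension
            todo.append(bound[:i] + (n[i],) + bound[i + 1:])


if __name__ == "__main__":
    sample = [(1, 5, 5), (5, 1, 5), (5, 5, 1), (3, 3, 3), (6, 6, 6), (2, 6, 6)]
    print(list(nn_skyline(sample)))
```

The subregions overlap for d > 2, so the amount of repeated work grows with the dimensionality; this sketch makes no attempt to limit it.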

  12. Comparison with other approaches
      • Algorithms:
        – Online algorithm: our NN algorithm
        – Blocking algorithms: BNL, D&C
        – Progressive algorithms: Bitmap, Index
      • Data sets:
        – Sizes: 100 K points, 1 M points
        – Distributions: correlated, anti-correlated, independent
        – Dimensionality: 2 - 10

  17. Performance in 2D
                        100K anti  100K corr   100K ind    1M anti    1M corr     1M ind
      Skyline                  49          1         12         54          1         12
      NN                     0.57       0.02        0.2       0.69       0.02        0.5
      BNL                    1.77       1.65       1.68      17.16      16.24      16.07
      D&C                    2.63       2.56       2.63      28.65      28.53      28.50
      Bitmap                 6.09       0.84       1.40      57.12      12.23      17.90
      Index (B-tree)        13.86       0.01       0.26       >200       0.12       0.92
      • NN depends on data distribution (Skyline size), not on data set size
      • BNL, D&C depend on data set size only
      • Bitmap depends on data size and data distribution
      • Index depends on data distribution and on data size

  18. Performance in higher dimensional spaces
      • d < 4: NN is typically the winner in all respects
      • d >= 4: Performance depends on goal
        – Complete Skyline: BNL and D&C are usually the best choice to compute the complete Skyline
        – Big picture: NN produces the big picture the fastest
        – Data rate: Index produces Skyline points at the highest rate, but always returns “extreme” points first → no big picture

  19. Providing control
      • Goal: user should be able to determine the order of Skyline points at run-time
      • Order of region processing
        – Influences “direction”
      • Adaptation of distance function
        – Does not change Skyline → Observation 1&2
        – Influences order of points
      • Bitmap and Index lack control
        – Order of Skyline points is determined by the one-dimensional mapping and the insertion order, respectively

  20. Example: User control
      [Figure: 2D example with regions labeled RangeNNSearch (DF1) and RangeNNSearch (DF2); x: price [€], y: distance to the beach [km]]
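The effect of adapting the distance function can be illustrated with two weighted sums. This is a sketch under assumptions: `df1` and `df2` borrow only their names from the DF1/DF2 labels in the figure, their weights are invented for illustration, the search is a linear scan, and the sample data is made up. Both functions are monotonic, so they produce the same Skyline in a different order.

```python
import math
from typing import Callable, Iterator, List, Tuple

Point = Tuple[float, float]


def nn_skyline_2d(data: List[Point], f: Callable[[Point], float]) -> Iterator[Point]:
    """2D NN-based Skyline; f decides which point is found first in each region."""
    todo: List[Point] = [(math.inf, math.inf)]
    while todo:
        mx, my = todo.pop()
        region = [p for p in data if p[0] < mx and p[1] < my]
        if region:
            n = min(region, key=f)
            yield n
            todo.append((n[0], my))
            todo.append((mx, n[1]))


def df1(p: Point) -> float:
    """Hypothetical distance function in which price dominates."""
    return p[0] + p[1]


def df2(p: Point) -> float:
    """Hypothetical distance function in which distance to the beach dominates."""
    return 0.01 * p[0] + p[1]


hotels = [(50, 3.0), (80, 1.0), (60, 2.0), (90, 2.5), (120, 0.5)]

if __name__ == "__main__":
    print("DF1:", list(nn_skyline_2d(hotels, df1)))
    print("DF2:", list(nn_skyline_2d(hotels, df2)))
```

With `df1` the cheapest hotel is reported first; with `df2` the one closest to the beach comes first.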

  21. Summary and future work
      • Online algorithm for Skyline based on NN-search
      • NN algorithm
        – returns first Skyline points instantaneously
        – builds complete Skyline incrementally
        – generates a “big picture” of the Skyline
        – generates only Skyline points → no approximation
        – supports user interaction
        – is universal
      • Future work:
        – Main memory, multidimensional indexing for region list
        – Continuous Skyline queries
