
ZINC: Efficient Indexing for Skyline Computation



  1. ZINC: Efficient Indexing for Skyline Computation
     Bin Liu, Chee-Yong Chan
     Department of Computer Science, National University of Singapore

  2. Skyline Queries
     ◮ Skyline: points that are not dominated by other points with respect to a set of dimensions
     ◮ Point x dominates point y if (1) x is at least as good as y in all dimensions, and (2) x is better than y in at least one dimension
     ◮ Example: find used cars that are cheap and have low mileage
     [Figure: scatter plot of cars A to J, Price (vertical axis) vs. Mileage (horizontal axis)]
     VLDB 2011
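The dominance test on this slide can be sketched in a few lines. This is a minimal illustration that assumes every dimension is numeric and that smaller values are better (cheap, low mileage); the function name and the sample cars are hypothetical and do not come from the talk.

```python
from typing import Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x dominates y, assuming smaller is better in every dimension."""
    # (1) x is at least as good as y in all dimensions
    as_good_everywhere = all(a <= b for a, b in zip(x, y))
    # (2) x is strictly better than y in at least one dimension
    better_somewhere = any(a < b for a, b in zip(x, y))
    return as_good_everywhere and better_somewhere


# Hypothetical cars as (price, mileage); the values are made up for illustration.
cheap_low = (5000, 40000)
pricey_high = (9000, 80000)
cheap_high = (4000, 90000)

print(dominates(cheap_low, pricey_high))  # cheaper and lower mileage
print(dominates(cheap_low, cheap_high))   # incomparable: each wins one dimension
```

A point never dominates itself under this definition, because condition (2) fails.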



  5. Simple Evaluation Algorithm
     Input: set of data points P
     Output: set of skyline points in P
       initialize set of candidate skyline points S to be empty
       for each data point p in P do
         if (p is not dominated by any point in S) then
           delete each s ∈ S if p dominates s
           insert p into S
       return S
     Drawbacks:
     ◮ Need to scan entire data set
     ◮ Performs many dominance comparisons
     ◮ Non-progressive
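The pseudocode above translates almost line for line into Python. The sketch below assumes numeric dimensions where smaller is better; the names `simple_skyline` and `dominates` and the sample points are illustrative, not taken from the talk.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Point, y: Point) -> bool:
    """x dominates y, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def simple_skyline(points: Sequence[Point]) -> List[Point]:
    """Scan all points once, maintaining the set S of candidate skyline points."""
    candidates: List[Point] = []
    for p in points:
        # p survives only if no current candidate dominates it
        if not any(dominates(s, p) for s in candidates):
            # drop every candidate that p dominates, then add p
            candidates = [s for s in candidates if not dominates(p, s)]
            candidates.append(p)
    return candidates


# Made-up two-dimensional data for illustration.
points = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (7, 2), (8, 3), (9, 1)]
skyline = simple_skyline(points)
print(skyline)
```

The drawbacks listed on the slide are visible here: every point is read, each point may be compared against every candidate, and no result can be reported until the loop ends.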

  6. Processing Skyline Queries
     ◮ Scan-based solutions:
       ◮ BNL, D&C [Börzsönyi, Kossmann, Stocker, ICDE’01]
       ◮ SFS [Chomicki, Godfrey, Gryz, Liang, ICDE’03]
       ◮ LESS [Godfrey, Shipley, Gryz, VLDB’05]
       ◮ LS [Morse, Patel, Jagadish, VLDB’07]
     ◮ Index-based solutions:
       ◮ Bitmap, Index [Tan, Eng, Ooi, VLDB’01]
       ◮ NN [Kossmann, Ramsak, Rost, VLDB’02]
       ◮ BBS [Papadias, Tao, Fu, Seeger, SIGMOD’03]
       ◮ ZB-tree [Lee, Zheng, Li, Lee, VLDB’07]
       ◮ OPS, LCRS [Zhang, Mamoulis, Cheung, SIGMOD’09]
       ◮ BSkyTree [Lee, Hwang, EDBT’10]

  7. Partially-Ordered Domains
     ◮ Many kinds of data have partially-ordered domains:
       ◮ User preferences
         [Figure: preference graph over car brands Ferrari, Audi, Honda, Toyota, BMW, Yugo]
       ◮ Interval data (e.g., availability period, price range)
       ◮ Type/class hierarchies (e.g., categorical data)
       ◮ Set-valued domains (e.g., skill sets, hotel facilities)



  10. Our Work: ZINC
      ◮ Index method for skyline queries with PO domains
      ◮ Inspired by ZB-tree
        ◮ ZB-tree [Lee, Zheng, Li, Lee, VLDB’07]
        ◮ Index method for totally-ordered domains
        ◮ Outperforms BBS [Papadias, Tao, Fu, Seeger, SIGMOD’03]
      ◮ Related work
        ◮ SDC+ [Chan, Eng, Tan, SIGMOD’05]
        ◮ TSS [Sacharidis, Papadopoulos, Papadias, ICDE’09]
        ◮ Recent technique:
          ⋆ CPS, SCL [Zhang, Mamoulis, Cheung, Kao, VLDB’10]

  11. ZB-tree
      ◮ Maps multi-dimensional data point to 1-dimensional Z-address
      ◮ Z-address = interleaved bitstring representation of attribute values
      ◮ Example: (0,5) = (000,101) → 010001
      ◮ Index Z-addresses using B+-tree
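The bit interleaving behind a Z-address can be sketched as follows. The function name `z_address` and the fixed `bits` parameter are assumptions for illustration; the interleaving order (most significant bit first, first dimension first) is chosen to reproduce the slide's example.

```python
from typing import Sequence


def z_address(point: Sequence[int], bits: int) -> str:
    """Interleave the bits of each coordinate, most significant bit first."""
    out = []
    for level in range(bits - 1, -1, -1):
        for value in point:
            # take the bit at this level from each dimension in turn
            out.append(str((value >> level) & 1))
    return "".join(out)


# The slide's example: (0, 5) = (000, 101) -> 010001
print(z_address((0, 5), bits=3))
```

Because the result is a single bitstring, points can be sorted and stored in any one-dimensional ordered index, which is what the B+-tree on the slide provides.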



  14. ZB-tree: Example
      Monotonic ordering property: if p dominates q, then p precedes q in Z-order
      [Figure: points a to i plotted on an 8 × 8 grid, x and y axes each from 0 to 7]
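The monotonic ordering property can be checked by brute force on the 8 × 8 grid used in the figure. The sketch below assumes smaller coordinates are better and uses the same interleaving order as the Z-address example; `z_value`, `dominates`, and `violations` are illustrative names.

```python
from itertools import product


def z_value(point, bits=3):
    """Z-address of a point as an integer (bits interleaved, MSB first)."""
    z = 0
    for level in range(bits - 1, -1, -1):
        for value in point:
            z = (z << 1) | ((value >> level) & 1)
    return z


def dominates(p, q):
    """p dominates q, assuming smaller is better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


grid = list(product(range(8), repeat=2))
# Collect every pair where p dominates q but p does NOT precede q in Z-order.
violations = [
    (p, q) for p in grid for q in grid
    if dominates(p, q) and not z_value(p) < z_value(q)
]
print(len(violations))
```

The converse does not hold: a point can precede another in Z-order without dominating it, so Z-order only restricts where dominating points can be found.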

  15. ZB-tree: Example
      [Figure: points a to i on an 8 × 8 grid (x and y axes from 0 to 7), with a ZB-tree over them: root entries [a, d] and [e, i]; next-level entries [a, a], [b, d], [e, g], [h, i]; leaf entries a, b, c, d, e, f, g, h, i]

  16. Encoding Schemes for Partial Orders
      ◮ Given a partial order domain D, find the smallest set S and an embedding f : D → 2^S such that x dominates y iff f(x) ⊆ f(y)
      ◮ Many proposed heuristics:
        ◮ Ait-Kaci et al., ACM TOPLAS 1989
        ◮ Caseau, OOPSLA 1993
        ◮ Krall, Vitek, Horspool, ECOOP 1997
        ◮ etc.
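To make the embedding condition concrete, here is a simple valid (but generally not smallest) embedding: map each value to the set containing itself and every value that dominates it. This is not one of the cited heuristics, only a sketch of the subset test; the function name and the four-value partial order are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def upset_embedding(edges: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """edges[x] lists the values x directly dominates.

    Returns f with f(v) = {v} plus all values that dominate v.
    """
    nodes = set(edges) | {v for worse in edges.values() for v in worse}
    f = {v: {v} for v in nodes}
    changed = True
    while changed:  # propagate dominators down the edges until nothing changes
        changed = False
        for x, worse in edges.items():
            for y in worse:
                if not f[x] <= f[y]:
                    f[y] |= f[x]
                    changed = True
    return {v: frozenset(s) for v, s in f.items()}


# Hypothetical partial order: a dominates b and c; b and c dominate d.
f = upset_embedding({"a": {"b", "c"}, "b": {"d"}, "c": {"d"}})
print(f["a"] < f["d"])   # a dominates d: proper subset
print(f["b"] <= f["c"])  # b and c are incomparable: no subset relation
```

With this embedding S is the whole domain, so codes need one bit per value; the heuristics listed on the slide aim for a much smaller S.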

  17. ZINC: Nested Encoding Scheme
      ◮ ZINC = Z-order Indexing with Nested Code
      ◮ Key idea:
        ◮ Organize PO into nested layers of simpler POs
        ◮ Encode each value in PO as a concatenation of encodings in simpler POs



  20. Example of Partial Order Reduction
      A subset of nodes R in a PO is a region if every node in R has the same dominance relationship wrt nodes outside of R:
      ◮ if u ∈ R dominates v ∉ R, then every u′ ∈ R dominates v
      ◮ if v ∉ R dominates u ∈ R, then v dominates every u′ ∈ R
      [Figure: partial order G0 over nodes a to p, with regions R1 and R2 marked]
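The region condition can be tested directly from its definition. The sketch below uses a hypothetical four-node partial order rather than G0 from the figure, whose edges are not recoverable from the slide text; `is_region` and `closure` are illustrative names.

```python
from typing import Callable, Iterable, Set


def is_region(region: Set[str], nodes: Iterable[str],
              dominates: Callable[[str, str], bool]) -> bool:
    """Check that all nodes in `region` relate identically to every outside node."""
    for v in nodes:
        if v in region:
            continue
        # either every u in the region dominates v, or none does
        if len({dominates(u, v) for u in region}) > 1:
            return False
        # either v dominates every u in the region, or none
        if len({dominates(v, u) for u in region}) > 1:
            return False
    return True


# Hypothetical partial order: a dominates b and c; b and c dominate d.
# `closure` is the full (transitive) dominance relation.
closure = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")}
nodes = ["a", "b", "c", "d"]


def dom(u: str, v: str) -> bool:
    return (u, v) in closure


print(is_region({"b", "c"}, nodes, dom))  # b and c see the outside identically
print(is_region({"a", "b"}, nodes, dom))  # a dominates c but b does not
```

Because a region looks the same from outside regardless of which member is considered, it can be collapsed to a single node without changing dominance among the remaining nodes.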



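  23. Example of Partial Order Reduction
      [Figure: three partial orders side by side. G0: nodes a to p with regions R1 and R2. G1: G0 with R1 and R2 replaced by nodes v1 and v2, and region R3 marked. G2: G1 with R3 replaced by node v3, leaving nodes a, v3, p]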

  24. Example of Nested Encodings
      [Figure: partial orders G0, G1, G2 with regions R1, R2, R3 and nodes v1, v2, v3]
      Encode(a, G0) = Encode(a, G2)
      Encode(h, G0) = Encode(v3, G2) + Encode(h, R3)
      Encode(k, G0) = Encode(v3, G2) + Encode(v2, R3) + Encode(k, R2)

  25. Vertical Regions
      A region R in a PO is a vertical region if
      ◮ R = S0 ∪ ··· ∪ Sk, k ≥ 1, each Si is a total order,
      ◮ nodes from different total orders are incomparable, and
      ◮ R is a maximal subgraph of PO that satisfies the above properties
      Example: R = S0 ∪ S1, S0 = {c, d}, S1 = {e, f}
      Each v ∈ R is encoded by two components: (1) which Si contains v, and (2) rank of v within Si
      c = 00, d = 01, e = 10, f = 11
      [Figure: partial order over nodes a to p with region R = {c, d, e, f} marked]
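The two-component code for a vertical region can be sketched as below. The slide only gives the resulting codes; the fixed bit widths chosen here (just enough bits for the chain index and for the largest rank) are an assumption, and `encode_vertical` is a hypothetical name.

```python
from typing import Dict, List


def encode_vertical(chains: List[List[str]]) -> Dict[str, str]:
    """Encode each node as (index of its total order) + (rank within that order).

    Each inner list is one total order S_i, best value first.
    """
    # Assumed widths: minimum number of bits that can hold the largest index/rank.
    chain_bits = max(1, (len(chains) - 1).bit_length())
    rank_bits = max(1, (max(len(c) for c in chains) - 1).bit_length())
    codes = {}
    for i, chain in enumerate(chains):
        for rank, node in enumerate(chain):
            codes[node] = format(i, f"0{chain_bits}b") + format(rank, f"0{rank_bits}b")
    return codes


# The slide's example: S0 = {c, d}, S1 = {e, f}
codes = encode_vertical([["c", "d"], ["e", "f"]])
print(codes)
```

Under this scheme two nodes are comparable only if their chain-index components match, in which case the smaller rank dominates.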

  26. Horizontal Regions
      A region R in a PO is a horizontal region if
      ◮ R = S0 ∪ ··· ∪ Sk, k ≥ 1,
      ◮ the nodes within each Si are incomparable,
      ◮ u ∈ Si dominates v ∈ Sj if i < j, and
      ◮ R is a maximal subgraph of PO that satisfies the above properties
      Example: R = S0 ∪ S1, S0 = {k, l}, S1 = {m, n}
      Each v ∈ R is encoded by i if v ∈ Si
      k = 0, l = 0, m = 1, n = 1
      [Figure: partial order over nodes a to p with region R = {k, l, m, n} marked]
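A horizontal region needs only the layer index as its code. In this sketch the bit width is an assumption (the minimum needed for the largest layer index) and `encode_horizontal` is a hypothetical name.

```python
from typing import Dict, List


def encode_horizontal(layers: List[List[str]]) -> Dict[str, str]:
    """Encode each node by the index i of the layer S_i that contains it."""
    # Assumed width: minimum number of bits that can hold the largest layer index.
    width = max(1, (len(layers) - 1).bit_length())
    return {
        node: format(i, f"0{width}b")
        for i, layer in enumerate(layers)
        for node in layer
    }


# The slide's example: S0 = {k, l}, S1 = {m, n}
codes = encode_horizontal([["k", "l"], ["m", "n"]])
print(codes)
```

Nodes in the same layer share a code, which reflects that they are mutually incomparable; a strictly smaller code means dominance.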

  27. Regular & Irregular Regions
      ◮ A region R in a PO is a regular region if R is either a vertical or horizontal region
      ◮ A region R in a PO is an irregular region if
        ◮ R is not a regular region, and
        ◮ R is a minimal subgraph of PO containing at least two nodes
      ◮ Example of an irregular region:
        [Figure: region R4 within a partial order over nodes a, b, c, d, e, f]
      ◮ Irregular regions are encoded using Compact Hierarchical Encoding (CHE) [Caseau, OOPSLA 1993]

  28. Putting everything together
      [Figure: G0, G1, G2 with regions R1, R2, R3, each node annotated with its binary code within its graph or region]
      Encode(a, G0) = Encode(a, G2) = 00 00000
      Encode(h, G0) = Encode(v3, G2) + Encode(h, R3) = 01 011 00
      Encode(k, G0) = Encode(v3, G2) + Encode(v2, R3) + Encode(k, R2) = 01 110 0 0
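The concatenation in the three equations can be reproduced with a small lookup. The per-level codes below are only those that appear in the slide's equations; padding with trailing zeros to a fixed 7-bit length is an assumption read off the example results, and `nested_encode` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Codes taken from the slide's equations (other nodes omitted).
CODES: Dict[str, Dict[str, str]] = {
    "G2": {"a": "00", "v3": "01"},
    "R3": {"h": "011", "v2": "110"},
    "R2": {"k": "0"},
}
TOTAL_BITS = 7  # assumed fixed code length, matching the slide's results


def nested_encode(path: List[Tuple[str, str]]) -> str:
    """Concatenate the code of each (graph_or_region, node) step, then zero-pad."""
    bits = "".join(CODES[level][node] for level, node in path)
    return bits.ljust(TOTAL_BITS, "0")


print(nested_encode([("G2", "a")]))                             # 00 00000
print(nested_encode([("G2", "v3"), ("R3", "h")]))               # 01 011 00
print(nested_encode([("G2", "v3"), ("R3", "v2"), ("R2", "k")])) # 01 110 0 0
```

Each path starts at the outermost reduced graph and descends through the regions that contain the value, so values in the same region share a common code prefix.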

  29. Performance Comparison
      [Figure: processing time (sec), axis from 0 to 50, for TSS, TSS+ZB, CHE+ZB, and ZINC, over (|TO|, |PO|) = (2,1), (3,1), (4,1), (2,2), (3,2), (4,2)]

  30. Conclusion
      ◮ Presented a novel index method for computing skyline queries on data with partially-ordered attribute domains
      ◮ ZINC = Z-order based indexing (ZB-tree) + Nested encoding scheme
      ◮ Future work:
        ◮ ZINC vs CPS, SCL [Zhang, Mamoulis, Cheung, Kao, VLDB’10]
        ◮ Other techniques?
