
Ranking Distributed Probabilistic Data
Jeffrey Jestes, Feifei Li, Ke Yi

Introduction: Ranking queries are important tools used to return only the most significant results.


  1. Expected Ranks Example

     | tuple | score pdf |
     |-------|-----------|
     | $t_1$ | {(120, 0.8), (62, 0.2)} |
     | $t_2$ | {(103, 0.7), (70, 0.3)} |
     | $t_3$ | {(98, 1)} |

     | world $W$ | $\Pr[W]$ |
     |-----------|----------|
     | {$t_1$ = 120, $t_2$ = 103, $t_3$ = 98} | 0.8 × 0.7 × 1 = 0.56 |
     | {$t_1$ = 120, $t_3$ = 98, $t_2$ = 70} | 0.8 × 0.3 × 1 = 0.24 |
     | {$t_2$ = 103, $t_3$ = 98, $t_1$ = 62} | 0.2 × 0.7 × 1 = 0.14 |
     | {$t_3$ = 98, $t_2$ = 70, $t_1$ = 62} | 0.2 × 0.3 × 1 = 0.06 |

     | tuple | r(tuple) |
     |-------|----------|
     | $t_1$ | 0.56 × 0 + 0.24 × 0 + 0.14 × 2 + 0.06 × 2 = 0.4 |
     | $t_2$ | 0.56 × 1 + 0.24 × 2 + 0.14 × 0 + 0.06 × 1 = 1.1 |
     | $t_3$ | 0.56 × 2 + 0.24 × 1 + 0.14 × 1 + 0.06 × 0 = 1.5 |
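The possible-worlds computation in this example can be reproduced with a short brute-force sketch. It assumes each tuple's pdf is a list of `(score, probability)` pairs and that tuples are independent; the function name `expected_ranks` and the data layout are illustrative choices, not something the slides define. Enumerating every world is exponential in the number of tuples, so this is only useful for checking small examples.

```python
from itertools import product

def expected_ranks(tuples):
    """tuples: dict name -> list of (score, probability) pairs.
    Returns dict name -> expected rank (number of tuples that outrank it)."""
    names = list(tuples)
    ranks = {name: 0.0 for name in names}
    # Each possible world picks exactly one (score, probability) choice per tuple.
    for world in product(*(tuples[name] for name in names)):
        prob = 1.0
        for _, p in world:
            prob *= p
        scores = [s for s, _ in world]
        for name, score in zip(names, scores):
            # Rank in this world = number of tuples with a strictly higher score.
            rank = sum(1 for other in scores if other > score)
            ranks[name] += prob * rank
    return ranks

example = {
    "t1": [(120, 0.8), (62, 0.2)],
    "t2": [(103, 0.7), (70, 0.3)],
    "t3": [(98, 1.0)],
}
ranks = expected_ranks(example)
```

On the three example tuples this yields the expected ranks listed in the table (0.4, 1.1, 1.5), up to floating-point rounding.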

  10. Expected Ranks

      It has been shown that $r(t_i)$ may be written as

      $$r(t_i) = \sum_{l=1}^{b_i} p_{i,l} \left( q(v_{i,l}) - \Pr[X_i > v_{i,l}] \right) \qquad (2)$$

      where

      - $b_i$ = number of choices in the pdf of $t_i$
      - $p_{i,l}$ = probability of choice $l$ (with score $v_{i,l}$) in tuple $t_i$
      - $q(v_{i,l}) = \sum_j \Pr[X_j > v_{i,l}]$
      - $X_i$ = pdf of tuple $t_i$
      - $\Pr[X_i > v_{i,l}]$ = contribution of $t_i$ to $q(v_{i,l})$

      $q(v_{i,l})$ is the sum, over all tuples, of the probability that the tuple outranks a tuple with score $v_{i,l}$.

      $X_i$ may contain value-probability pairs $(v, p)$ with $v > v_{i,l}$. Since $t_i = v_{i,l}$ precludes $t_i = v$, we must subtract the corresponding $p$'s from $q(v_{i,l})$.

      Efficient algorithms exist to compute the expected ranks in $O(N \log N)$ time for a database of $N$ tuples.
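Equation (2) can be evaluated without enumerating possible worlds. The sketch below is one way to do it, not necessarily the algorithm the authors have in mind: it sorts all `(score, probability)` choices once, builds prefix sums of probability, and answers each $q(v)$ by binary search. The names `expected_ranks_by_q` and `own_above` are assumptions for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

def expected_ranks_by_q(tuples):
    """tuples: dict name -> list of (score, probability) pairs.
    Evaluates r(t_i) = sum_l p_il * (q(v_il) - Pr[X_i > v_il])."""
    # Flatten every choice of every tuple and sort by score.
    choices = sorted((s, p) for pdf in tuples.values() for s, p in pdf)
    scores = [s for s, _ in choices]
    # prefix[k] = total probability mass of the k lowest-scoring choices.
    prefix = [0.0] + list(accumulate(p for _, p in choices))
    total = prefix[-1]

    def q(v):
        # Probability mass (summed over all tuples) strictly above v.
        return total - prefix[bisect_right(scores, v)]

    ranks = {}
    for name, pdf in tuples.items():
        r = 0.0
        for v, p in pdf:
            # Pr[X_i > v]: the tuple's own contribution to q(v).
            own_above = sum(pp for s, pp in pdf if s > v)
            r += p * (q(v) - own_above)
        ranks[name] = r
    return ranks

example = {
    "t1": [(120, 0.8), (62, 0.2)],
    "t2": [(103, 0.7), (70, 0.3)],
    "t3": [(98, 1.0)],
}
ranks_q = expected_ranks_by_q(example)
```

The sort dominates the cost when each pdf has a small, bounded number of choices; the inner `own_above` sum is quadratic in the size of a single pdf and could be replaced by a per-tuple suffix sum if pdfs are large.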

  14. Computing Expected Ranks by $q(v)$'s

      | tuple | score pdf |
      |-------|-----------|
      | $t_1$ | {(120, 0.8), (62, 0.2)} |
      | $t_2$ | {(103, 0.7), (70, 0.3)} |
      | $t_3$ | {(98, 1)} |

      [Figure: staircase curve of $q(v)$ over the score axis]

      | $v$ | $-\infty$ | 62 | 70 | 98 | 103 | 120 |
      |-----|-----------|----|----|----|-----|-----|
      | $q(v)$ | 3.0 | 2.8 | 2.5 | 1.5 | 0.8 | 0 |

      $r(t_1) = 0.8 \times 0 + 0.2 \times (2.8 - 0.8) = 0.4$
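The same staircase values give the expected ranks of the other two tuples. The worked steps below are not on the slide; they apply equation (2) to $t_2$ and $t_3$ and agree with the possible-worlds results of 1.1 and 1.5.

```latex
\begin{align*}
r(t_2) &= 0.7\,\bigl(q(103) - \Pr[X_2 > 103]\bigr) + 0.3\,\bigl(q(70) - \Pr[X_2 > 70]\bigr) \\
       &= 0.7\,(0.8 - 0) + 0.3\,(2.5 - 0.7) = 0.56 + 0.54 = 1.1 \\
r(t_3) &= 1 \cdot \bigl(q(98) - \Pr[X_3 > 98]\bigr) = 1.5 - 0 = 1.5
\end{align*}
```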

  18. Distributed Probabilistic Data Model

      | site | tuples | score |
      |------|--------|-------|
      | site 1 | $t_{1,1}, t_{1,2}, \ldots$ | $X_{1,1}, X_{1,2}, \ldots$ |
      | ... | ... | ... |
      | site $m$ | $t_{m,1}, t_{m,2}, \ldots$ | $X_{m,1}, X_{m,2}, \ldots$ |

      We can think of the union of the individual databases $D_i$ at each site $s_i$ as a conceptual database $D$ with tuples $t_1, t_2, \ldots, t_N$.

  20. Ranking Queries for Distributed Probabilistic Data

      We introduce two frameworks for ranking queries over distributed probabilistic data:

      - Sorted Access on Local Ranks
      - Sorted Access on Expected Scores

  21. Sorted Access on Local Ranks Framework

      [Figure: a server connected to sites 1 through $m$; site $i$ holds tuples $t_{i,1}, t_{i,2}, \ldots, t_{i,n_i}$]

      Every site calculates the local ranks of its tuples and stores the tuples in ascending order of local rank.

      The server accesses tuples in ascending order of local rank and combines the local ranks to get the global ranks.

  23. Local and Global Ranks

      The local rank of a tuple $t_{i,j}$ at its own site $s_i$ with database $D_i$ is

      $$r(t_{i,j}, D_i) = \sum_{l=1}^{b_{i,j}} p_{i,j,l} \left( q_i(v_{i,j,l}) - \Pr[X_{i,j} > v_{i,j,l}] \right) \qquad (3)$$

      The local rank of $t_{i,j}$ at another site $s_y$ with database $D_y$, $y \neq i$, is

      $$r(t_{i,j}, D_y) = \sum_{l=1}^{b_{i,j}} p_{i,j,l} \, q_y(v_{i,j,l}) \qquad (4)$$

      The global rank of $t_{i,j}$ is

      $$r(t_{i,j}) = \sum_{y=1}^{m} r(t_{i,j}, D_y) \qquad (5)$$
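As a check that local ranks add up to the expected rank, suppose (purely for illustration; the slides do not specify a placement) that the three example tuples are split so that site $s_1$ holds $t_1$ and $t_3$ and site $s_2$ holds $t_2$. Then $q_1(v) = \Pr[X_1 > v] + \Pr[X_3 > v]$ and $q_2(v) = \Pr[X_2 > v]$, and equations (3) to (5) give:

```latex
\begin{align*}
r(t_1, D_1) &= 0.8\,\bigl(q_1(120) - 0\bigr) + 0.2\,\bigl(q_1(62) - 0.8\bigr)
             = 0.8 \cdot 0 + 0.2\,(1.8 - 0.8) = 0.2 \\
r(t_1, D_2) &= 0.8\,q_2(120) + 0.2\,q_2(62) = 0.8 \cdot 0 + 0.2 \cdot 1 = 0.2 \\
r(t_1)      &= r(t_1, D_1) + r(t_1, D_2) = 0.4
\end{align*}
```

This matches the expected rank of $t_1$ computed over the whole database in the earlier example.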

  26. Sorted Access on Local Ranks: Initialization

      Each site keeps its tuples sorted by local rank (lrank):

      | site 1 | lrank | site 2 | lrank | site 3 | lrank |
      |--------|-------|--------|-------|--------|-------|
      | $t_{1,1}$ | 1.2 | $t_{2,1}$ | 2.3 | $t_{3,1}$ | 0.8 |
      | $t_{1,2}$ | 5.9 | $t_{2,2}$ | 3.4 | $t_{3,2}$ | 4.1 |
      | ... | ... | ... | ... | ... | ... |
      | $t_{1,n_1}$ | 34.2 | $t_{2,n_2}$ | 29.1 | $t_{3,n_3}$ | 40.4 |

      Each site sends its first tuple to the server's Rep. Queue and advances its pointer to its second tuple:

      | Rep. Queue tuple | lrank |
      |------------------|-------|
      | $t_{3,1}$ | 0.8 |
      | $t_{1,1}$ | 1.2 |
      | $t_{2,1}$ | 2.3 |

  28. Sorted Access on Local Ranks: a Round

      State at the start of the round:

      | Rep. Queue tuple | lrank |
      |------------------|-------|
      | $t_{2,2}$ | 3.4 |
      | $t_{3,2}$ | 4.1 |
      | $t_{1,2}$ | 5.9 |

      | top-2 Queue tuple | grank |
      |-------------------|-------|
      | $t_{2,1}$ | 5.4 |
      | $t_{1,1}$ | 7.9 |

      The round proceeds as follows:

      - The server pops the head of the Rep. Queue, $t_{2,2}$ (lrank 3.4).
      - Site 2 sends its next tuple, $t_{2,3}$ (lrank 4.8), to the Rep. Queue.
      - The server sends $X_{2,2}$ to the other two sites, which return the local ranks 0.7 and 1.5.
      - The server computes the global rank of $t_{2,2}$: 3.4 + 0.7 + 1.5 = 5.6.
      - $t_{2,2}$ enters the top-2 queue and displaces $t_{1,1}$ (grank 7.9).

      State at the end of the round:

      | Rep. Queue tuple | lrank |
      |------------------|-------|
      | $t_{3,2}$ | 4.1 |
      | $t_{2,3}$ | 4.8 |
      | $t_{1,2}$ | 5.9 |

      | top-2 Queue tuple | grank |
      |-------------------|-------|
      | $t_{2,1}$ | 5.4 |
      | $t_{2,2}$ | 5.6 |

      We can safely terminate whenever the largest grank in the top-k queue is ≤ the smallest lrank in the Rep. Queue (algorithm A-LR). Here 5.6 > 4.1, so the server continues with another round.
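The rounds and the termination test can be sketched as a single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: sites are plain dicts mapping a tuple name to its pdf, "messages" are ordinary function calls, and the helper names (`q`, `local_rank`, `a_lr`) are hypothetical. It relies on the fact that a tuple's global rank is at least its local rank at its home site, so no unseen tuple can beat the head of the Rep. Queue.

```python
import heapq

def q(site, v):
    """q_i(v): sum over the site's tuples of Pr[X_j > v]."""
    return sum(p for pdf in site.values() for s, p in pdf if s > v)

def local_rank(pdf, site, own=False):
    """Eq. (3) when the tuple lives at this site (own=True), else Eq. (4)."""
    total = 0.0
    for v, p in pdf:
        self_part = sum(pp for s, pp in pdf if s > v) if own else 0.0
        total += p * (q(site, v) - self_part)
    return total

def a_lr(sites, k):
    """Return the k (grank, name) pairs with the smallest global rank."""
    # Every site sorts its own tuples by ascending local rank.
    lists = [sorted((local_rank(pdf, site, own=True), name)
                    for name, pdf in site.items()) for site in sites]
    cursor = [0] * len(sites)
    rep = []  # min-heap of (lrank, name, site index): one representative per site
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(rep, (*lst[0], i))
            cursor[i] = 1
    top = []  # max-heap (negated grank) holding at most k tuples
    while rep:
        # Terminate when the largest grank in top-k <= smallest lrank in rep.
        if len(top) == k and -top[0][0] <= rep[0][0]:
            break
        lrank, name, i = heapq.heappop(rep)
        if cursor[i] < len(lists[i]):  # the home site supplies its next tuple
            heapq.heappush(rep, (*lists[i][cursor[i]], i))
            cursor[i] += 1
        pdf = sites[i][name]
        # "Broadcast" the pdf: every other site returns its local rank for it.
        grank = lrank + sum(local_rank(pdf, s)
                            for y, s in enumerate(sites) if y != i)
        heapq.heappush(top, (-grank, name))
        if len(top) > k:
            heapq.heappop(top)  # evict the tuple with the largest grank
    return sorted((-g, name) for g, name in top)

# Illustrative placement of the three example tuples on two sites.
site_a = {"t1": [(120, 0.8), (62, 0.2)], "t3": [(98, 1.0)]}
site_b = {"t2": [(103, 0.7), (70, 0.3)]}
top2 = a_lr([site_a, site_b], 2)
top1 = a_lr([site_a, site_b], 1)
```

With `k = 1` on this placement, the loop stops before `t3` is ever broadcast, which is the communication saving the termination rule is meant to provide.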

  39. Sorted Access on Expected Scores Framework

      [Figure: a server connected to sites 1 through $m$; site $i$ holds tuples $t_{i,1}, t_{i,2}, \ldots, t_{i,n_i}$]

      Every site calculates the local ranks and the expected scores of its tuples and stores the tuples in descending order of expected score.

      The server accesses tuples in descending order of expected score and calculates the global ranks.

  41. Sorted Access on Expected Scores: Initialization

      Each site keeps its tuples sorted by descending expected score $E[X]$:

      | site 1 | $E[X]$ | site 2 | $E[X]$ | site 3 | $E[X]$ |
      |--------|--------|--------|--------|--------|--------|
      | $t_{1,1}$ | 489 | $t_{2,1}$ | 476 | $t_{3,1}$ | 500 |
      | $t_{1,2}$ | 421 | $t_{2,2}$ | 464 | $t_{3,2}$ | 432 |
      | ... | ... | ... | ... | ... | ... |
      | $t_{1,n_1}$ | 5 | $t_{2,n_2}$ | 11 | $t_{3,n_3}$ | 1 |

      Each site sends its first tuple to the server's Rep. Queue and advances its pointer to its second tuple:

      | Rep. Queue tuple | $E[X]$ |
      |------------------|--------|
      | $t_{3,1}$ | 500 |
      | $t_{1,1}$ | 489 |
      | $t_{2,1}$ | 476 |

  43. Sorted Access on Expected Scores: a Round

      State at the start of the round:

      | Rep. Queue tuple | $E[X]$ |
      |------------------|--------|
      | $t_{2,2}$ | 464 |
      | $t_{3,2}$ | 432 |
      | $t_{1,2}$ | 421 |

      | top-2 Queue tuple | grank |
      |-------------------|-------|
      | $t_{2,1}$ | 5.4 |
      | $t_{1,1}$ | 7.9 |

      The round proceeds as follows:

      - The server pops the head of the Rep. Queue, $t_{2,2}$ ($E[X]$ = 464, lrank 3.4 at site 2).
      - Site 2 sends its next tuple, $t_{2,3}$ ($E[X]$ = 429), to the Rep. Queue.
      - The server sends $X_{2,2}$ to the other two sites, which return the local ranks 0.7 and 1.5.
      - The server computes the global rank of $t_{2,2}$: 3.4 + 0.7 + 1.5 = 5.6.
      - $t_{2,2}$ enters the top-2 queue and displaces $t_{1,1}$ (grank 7.9).

      State at the end of the round:

      | Rep. Queue tuple | $E[X]$ |
      |------------------|--------|
      | $t_{3,2}$ | 432 |
      | $t_{2,3}$ | 429 |
      | $t_{1,2}$ | 421 |

      | top-2 Queue tuple | grank |
      |-------------------|-------|
      | $t_{2,1}$ | 5.4 |
      | $t_{2,2}$ | 5.6 |

      Now the only question is when we may safely terminate and be certain we have the global top-k.

  53. Sorted Access on Expected Scores: Termination

      | top-2 Queue tuple | grank |
      |-------------------|-------|
      | $t_{2,1}$ | 5.4 |
      | $t_{2,2}$ | 5.6 |

      | Rep. Queue tuple | $E[X]$ |
      |------------------|--------|
      | $t_{3,2}$ | 432 |
      | $t_{2,3}$ | 429 |
      | $t_{1,2}$ | 421 |

      The largest grank in the top-k queue is clearly an upper bound $r^+_\lambda$ on the global rank that any seen tuple $t$ with pdf $X$ needs in order to be in the top-k at round $\lambda$.

      The expectation $\tau$ of the head of the Rep. Queue is an upper bound on the expectation of any unseen tuple $t$, i.e. $E[X] \le \tau$.

      How can we derive a lower bound $r^-_\lambda$ on the global rank of any unseen tuple $t$, so that it is safe to terminate at round $\lambda$ once $r^+_\lambda \le r^-_\lambda$?

  57. Sorted Access on Expected Scores: a Lower Bound?

      We introduce two methods to find a lower bound $r^-_\lambda$ for any unseen tuple $t$ at round $\lambda$:

      - Markov Inequality
      - Linear Programming

  60. Markov Inequality Lower Bound

      We know that the pdf of any unseen $t$ must satisfy $E[X] \le \tau$.

      We can use the Markov inequality to lower bound the local rank of $t$ at any site $s_i$ with database $D_i$:

      $$
      \begin{aligned}
      r(t, D_i) &= \sum_{j=1}^{n_i} \Pr[X_j > X] = n_i - \sum_{j=1}^{n_i} \Pr[X \ge X_j] \\
                &\ge n_i - \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{i,j}} p_{i,j,\ell} \, \frac{E[X]}{v_{i,j,\ell}} \qquad \text{(Markov Ineq.)} \\
                &\ge n_i - \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{i,j}} p_{i,j,\ell} \, \frac{\tau}{v_{i,j,\ell}} = r^-(t, D_i). \qquad (6)
      \end{aligned}
      $$

      Now the global rank $r(t)$ must satisfy

      $$r(t) \ge \sum_{i=1}^{m} r^-(t, D_i) = r^-_\lambda \qquad (7)$$

      Note that the Markov inequality step makes this bound loose.
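Equation (6) only needs the site's own pdfs and the threshold $\tau$, so each site can evaluate it locally. The sketch below assumes all scores are strictly positive (the Markov step divides by them) and uses hypothetical names (`markov_lower_bound`, `foreign_local_rank`); the second helper is included only to compare the bound against the true local rank of a sample unseen tuple.

```python
def markov_lower_bound(site, tau):
    """Eq. (6): n_i - sum_j sum_l p_ijl * tau / v_ijl.
    site: dict name -> list of (score, probability); scores must be > 0."""
    n = len(site)
    slack = sum(p * tau / v for pdf in site.values() for v, p in pdf)
    return n - slack

def foreign_local_rank(pdf, site):
    """Eq. (4): local rank at this site of a tuple stored elsewhere."""
    return sum(
        p * sum(pp for other in site.values() for s, pp in other if s > v)
        for v, p in pdf
    )

site = {
    "t1": [(120, 0.8), (62, 0.2)],
    "t2": [(103, 0.7), (70, 0.3)],
    "t3": [(98, 1.0)],
}
tau = 50
bound = markov_lower_bound(site, tau)

# Two made-up unseen tuples whose expected scores do not exceed tau.
unseen_a = [(50, 1.0)]               # E[X] = 50
unseen_b = [(130, 0.3), (10, 0.7)]   # E[X] = 46
rank_a = foreign_local_rank(unseen_a, site)
rank_b = foreign_local_rank(unseen_b, site)
```

Both sample tuples have a local rank well above the bound, which illustrates how loose it can be. For large $\tau$ the expression can even go negative, in which case it carries no information, since ranks are never negative.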

  64. Linear Programming Lower Bound

      Any unseen tuple $t$ must have $E[X] \le \tau$.

      We've seen how to derive a lower bound $r^-_\lambda$ on the global rank of any unseen tuple $t$ using Markov's inequality.

      We want to find as tight an $r^-_\lambda$ as possible by finding the smallest possible $r^-(t, D_i)$ at each site.

      We can use linear programming to derive the $r^-(t, D_i)$ at each site and so obtain a tight $r^-_\lambda$.

  68. Linear Programming

      The idea is to construct, at each site $s_i$, the best possible $X$ for an unseen tuple $t$: the one that obtains the smallest possible local rank at $s_i$.

      $X$ could take on arbitrary $v_\ell$'s as its possible score values, some of which do not exist in the value universe $U_i$ at site $s_i$.

      We can show that this problem is irrelevant after studying the semantics of the $r(t, D_i)$'s and the $q(v)$'s.

  71. Linear Programming: a Note on $q(v)$'s

      [Figure: staircase curve of $q(v)$]

      Recall that $r(t_{i,j}, D_y) = \sum_{\ell=1}^{b_{i,j}} p_{i,j,\ell} \, q_y(v_{i,j,\ell})$ and that $q(v)$ is essentially a staircase curve.

      $X$ may take a value $v_\ell$ not in $U_i$, with $v_2$ as its nearest left neighbor in $U_i$.

      Even if $X$ takes a value $v_\ell$ not in $U_i$, we can decrease $v_\ell$ until we hit $v_2$ in $U_i$; $E[X] \le \tau$ clearly still holds, as we are only decreasing the value of one of the choices in $X$.

      Also note that during this transformation $q(v_\ell) = q(v_2)$, so the local rank of $t$ remains the same.

  75. Linear Programming Formulation

      Now we can assume $X$ draws values from $U_i$. Then we can define a linear program with the constraints

      $$0 \le p_\ell \le 1, \qquad \ell = 1, \ldots, \gamma = |U_i|$$

      $$p_1 + \cdots + p_\gamma = 1$$

      $$p_1 v_1 + \cdots + p_\gamma v_\gamma \le \tau$$

      and minimize the local rank, which is

      $$r(X, D_i) = \sum_{\ell=1}^{\gamma} p_\ell \, q_i(v_\ell)$$
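Because this program has only two constraints besides non-negativity, an optimal solution exists that puts probability on at most two values of $U_i$. The sketch below exploits that by enumerating those one-point and two-point candidates directly. This is a swapped-in brute-force technique, not a general LP solver and not necessarily how the authors solve it. The function name `lp_lower_bound` and the input format (a dict from each value in $U_i$ to $q_i(v)$) are assumptions.

```python
def lp_lower_bound(q_values, tau):
    """Minimize sum_l p_l * q_i(v_l) subject to
    sum_l p_l = 1, sum_l p_l * v_l <= tau, p_l >= 0.
    q_values: dict v -> q_i(v) over the value universe U_i.
    Returns (minimum local rank, {v: p}) for one optimal distribution."""
    values = sorted(q_values)
    best = None
    # One-point candidates: all mass on a single value v <= tau.
    for v in values:
        if v <= tau and (best is None or q_values[v] < best[0]):
            best = (q_values[v], {v: 1.0})
    # Two-point candidates: mix lo < tau < hi so that the expectation equals tau.
    for lo in values:
        for hi in values:
            if lo < tau < hi:
                p_hi = (tau - lo) / (hi - lo)
                cost = (1 - p_hi) * q_values[lo] + p_hi * q_values[hi]
                if best is None or cost < best[0]:
                    best = (cost, {lo: 1 - p_hi, hi: p_hi})
    if best is None:
        raise ValueError("infeasible: every value in U_i exceeds tau")
    return best

# q_i(v) for the three-tuple example, treated here as a single site.
q_example = {62: 2.8, 70: 2.5, 98: 1.5, 103: 0.8, 120: 0.0}
cost, dist = lp_lower_bound(q_example, 100)
```

The enumeration is quadratic in $|U_i|$, which is fine for a sketch; for a large value universe an LP solver, or a walk along the lower convex hull of the points $(v, q_i(v))$, would be the more practical choice.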
