

  1. (Bayesian) Statistics with Rankings. Marina Meilă, University of Washington, www.stat.washington.edu/mmp, with Alnur Ali, Harr Chen, Bhushan Mandhani, Le Bao, Kapil Phadnis, Artur Patterson, Brendan Murphy, Jeff Bilmes

  2. Permutations (rankings): data represents preferences. Burger preferences ( n = 6, N = 600 ): rankings such as [ med-rare med rare ... ], [ done med-done med ... ], [ med-rare rare med ... ]. Elections, Ireland ( n = 5, N = 1100 ): ballots such as [ Roch Scal McAl Bano Nall ], [ Scal McAl Nall Bano Roch ], [ Roch McAl ]. College programs ( n = 533, N = 53737, t = 10 ): top-10 lists of program codes such as DC116, DC114, DC111, ... Ranking data is discrete, many-valued, and has combinatorial structure.

  3. The Consensus Ranking problem. Given a set of rankings { π1, π2, ..., πN } ⊂ S_n, find the consensus ranking (or central ranking) π0 that best agrees with the data. Elections, Ireland ( n = 5, N = 1100 ): ballots such as [ Roch Scal McAl Bano Nall ], [ Scal McAl Nall Bano Roch ], [ Roch McAl ]. Consensus = [ Roch Scal McAl Bano Nall ] ?

  4. The Consensus Ranking problem. Problem (also called Preference Aggregation, Kemeny Ranking): given a set of rankings { π1, π2, ..., πN } ⊂ S_n, find the consensus ranking (or central ranking) π0 such that
      π0 = argmin_{π0 ∈ S_n} Σ_{i=1}^N d(πi, π0)
      for d = inversion distance / Kendall τ-distance / “bubble sort” distance.
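The objective above can be sketched directly in Python. This is a brute-force search over all of S_n, so it is only usable for small n; the function names are mine, not from the talk.

```python
from itertools import combinations, permutations

def kendall_tau(pi, pi0):
    # Count item pairs that the two rankings order oppositely (inversions).
    pos = {x: r for r, x in enumerate(pi)}
    pos0 = {x: r for r, x in enumerate(pi0)}
    return sum(1 for i, j in combinations(pi, 2)
               if (pos[i] - pos[j]) * (pos0[i] - pos0[j]) < 0)

def consensus_ranking(rankings):
    # Exhaustive Kemeny search: the ranking minimizing the summed distance.
    return min(permutations(rankings[0]),
               key=lambda sigma: sum(kendall_tau(pi, sigma) for pi in rankings))
```

For example, `kendall_tau(['c','a','b','d'], ['a','b','c','d'])` returns 2, matching the worked example on the precedence-matrix slides.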

  5. The Consensus Ranking problem (continued). Relevance: voting in elections (APA, Ireland, Cambridge) and panels of experts (admissions, hiring, grant funding); aggregating user preferences (economics, marketing); a subproblem of other problems, e.g. building a good search engine: learning to rank [Cohen, Schapire, Singer 99]. Equivalent to finding the “mean” or “median” of a set of points.

  6. The Consensus Ranking problem (continued). Fact: consensus ranking for the inversion distance is NP-hard.

  7. Consensus ranking problem: π0 = argmin_{π0 ∈ S_n} Σ_{i=1}^N d(πi, π0). This talk will generalize the problem, from finding π0 to estimating a statistical model, and will generalize the data, from complete, finite permutations to top-t rankings and countably many items ( n → ∞ )...

  8. Outline
      1. Statistical models for permutations and the dependence of ranks
      2. Codes, inversion distance and the precedence matrix
      3. Mallows models over permutations
      4. Maximum Likelihood estimation: the likelihood; a Branch and Bound algorithm; related work, experimental comparisons; Mallows and GM and other statistical models
      5. Top-t rankings and infinite permutations
      6. Statistical results: Bayesian estimation, conjugate prior, Dirichlet process mixtures
      7. Conclusions

  9. Some notation. Base set { a, b, c, d } contains n items (or alternatives), e.g. { rare, med-rare, med, med-done, ... }. S_n = the symmetric group = the set of all permutations over n items. π = [ c a b d ] ∈ S_n is a permutation/ranking; π = [ c a ] is a top-t ranking (a partial order); t = |π| ≤ n is the length of π. We observe data π1, π2, ..., πN sampled independently from an unknown distribution P over S_n.

  10. Representations for permutations. Reference permutation id = [ a b c d ], π = [ c a b d ].
      ranked list: [ c a b d ]
      cycle representation: (2 3 1)
      function on { a, b, c, d }: [ 2 3 1 4 ]
      Π = permutation matrix:
          0 1 0 0
          0 0 1 0
          1 0 0 0
          0 0 0 1
      Q = precedence matrix, Q_ij = 1 if i ≺_π j:
            a b c d
        a [ − 1 0 1 ]
        b [ 0 − 0 1 ]
        c [ 1 1 − 1 ]
        d [ 0 0 0 − ]
      code: ( V1, V2, V3 ) = ( 1, 1, 0 ); ( s1, s2, s3 ) = ( 2, 0, 0 )

  11. Representations for permutations. Reference permutation id = [ a b c d ], π = [ c a b d ].
      ranked list: [ c a b d ]
      cycle representation: (2 3 1)
      function on { a, b, c, d }: [ 2 3 1 4 ]
      Π = permutation matrix:
          0 0 1 0
          1 0 0 0
          0 1 0 0
          0 0 0 1
      Q = precedence matrix, Q_ij = 1 if i ≺_π j:
            a b c d
        a [ − 1 0 1 ]
        b [ 0 − 0 1 ]
        c [ 1 1 − 1 ]
        d [ 0 0 0 − ]
      code: ( V1, V2, V3 ) = ( 1, 1, 0 ); ( s1, s2, s3 ) = ( 2, 0, 0 )

  12. Thurstone: Ranking by utility. The Thurstone model: item j has expected utility μ_j; sample u_j = μ_j + ε_j, j = 1:n (independently or not); u_j is the actual utility of item j; sort ( u_j )_{j=1:n} to obtain a π.

  13. Thurstone: Ranking by utility (continued). A rich model class, typically with ε_j ∼ Normal( 0, σ_j² ); parameters are interpretable. Some simple probability calculations are intractable: P[ a ≺ b ] is tractable and P[ i in first place ] is tractable, but P[ i in 85th place ] is intractable, since each rank of π depends on all the ε_j.
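The generative steps of the Thurstone model can be sketched as follows, assuming independent Gaussian noise (function and variable names are illustrative, not from the talk):

```python
import random

def sample_thurstone(mu, sigma, rng=random):
    # One ranking: perturb each item's expected utility mu_j with Gaussian
    # noise eps_j ~ Normal(0, sigma_j^2), then sort by realized utility,
    # best first.
    u = {item: m + rng.gauss(0.0, sigma[item]) for item, m in mu.items()}
    return sorted(u, key=u.get, reverse=True)
```

With widely separated utilities and tiny noise the output is essentially deterministic; as the σ_j grow, rankings far from the utility order become more likely.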

  14. Plackett-Luce: Ranking as drawing without replacement. The Plackett-Luce model: item j has weight w_j > 0, and
      P([ a, b, ... ]) ∝ ( w_a / Σ_{i′} w_{i′} ) · ( w_b / ( Σ_{i′} w_{i′} − w_a ) ) · ...
      Items are drawn “without replacement” from the distribution ( w_1, w_2, ..., w_n ) (a Markov chain). The normalization constant Z is generally not known; the distribution of first ranks is approximately independent; the item at rank j depends on all previous ranks.
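The drawing-without-replacement process above can be sketched as (names are mine):

```python
import random

def sample_plackett_luce(weights, rng=random):
    # Draw items one at a time "without replacement": at each stage, pick an
    # item with probability proportional to its weight among those remaining.
    remaining = dict(weights)
    ranking = []
    while remaining:
        items = list(remaining)
        pick = rng.choices(items, weights=[remaining[i] for i in items])[0]
        ranking.append(pick)
        del remaining[pick]
    return ranking
```

Each draw renormalizes over the remaining items, which is exactly why the item at rank j depends on all previous ranks.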

  15. Bradley-Terry: penalizing inversions. The Bradley-Terry model:
      P(π) ∝ exp( − Σ_{i<j} α_ij Q_ij(π) )
      An exponential family model, with one parameter for every pair ( i, j ): α_ij is the penalty for inverting i with j (only a qualitative interpretation). The normalization constant Z is generally not known. By transitivity ( i ≺ j, j ≺ k ⟹ i ≺ k ), the sufficient statistics Q_ij are dependent.

  16. Bradley-Terry: penalizing inversions (continued). Mallows models are a subclass of Bradley-Terry models that do not suffer from this dependence: coming next...
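A minimal sketch of the unnormalized Bradley-Terry density P(π) ∝ exp( − Σ_{i<j} α_ij Q_ij(π) ). The α values in the example are arbitrary illustrations, and Z is deliberately not computed, matching the slide's point that it is generally unknown:

```python
import math

def bt_unnormalized(pi, alpha, ref):
    # exp(-sum over reference-order pairs i<j of alpha[i][j] * Q_ij(pi)),
    # where Q_ij(pi) = 1 iff item ref[i] precedes item ref[j] in pi.
    pos = {item: r for r, item in enumerate(pi)}
    n = len(ref)
    s = sum(alpha[i][j] for i in range(n) for j in range(i + 1, n)
            if pos[ref[i]] < pos[ref[j]])
    return math.exp(-s)
```

With all α_ij > 0, the fully reversed ranking pays no penalty and the identity pays the maximal one, so the α_ij indeed act as per-pair inversion penalties (here relative to the reversed order).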

  17. Outline
      1. Statistical models for permutations and the dependence of ranks
      2. Codes, inversion distance and the precedence matrix
      3. Mallows models over permutations
      4. Maximum Likelihood estimation: the likelihood; a Branch and Bound algorithm; related work, experimental comparisons; Mallows and GM and other statistical models
      5. Top-t rankings and infinite permutations
      6. Statistical results: Bayesian estimation, conjugate prior, Dirichlet process mixtures
      7. Conclusions

  18. The precedence matrix Q. π = [ c a b d ]:
      Q(π):
            a b c d
        a [ − 1 0 1 ]
        b [ 0 − 0 1 ]
        c [ 1 1 − 1 ]
        d [ 0 0 0 − ]
      Q_ij(π) = 1 iff i comes before j in π; Q_ij = 1 − Q_ji. The reference permutation id = [ a b c d ] determines the order of rows and columns in Q.
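The construction of Q can be sketched as follows (the diagonal entries, shown as “−” on the slide, are stored as 0 here; names are mine):

```python
def precedence_matrix(pi, ref):
    # Q[i][j] = 1 iff item ref[i] comes before item ref[j] in pi;
    # rows and columns are ordered by the reference permutation ref.
    pos = {item: r for r, item in enumerate(pi)}
    n = len(ref)
    return [[1 if i != j and pos[ref[i]] < pos[ref[j]] else 0
             for j in range(n)] for i in range(n)]
```

For π = [ c a b d ] and ref = [ a b c d ] this reproduces the matrix on the slide, row by row.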

  19. The number of inversions and Q. π = [ c a b d ]:
      Q(π):
            a b c d
        a [ − 1 0 1 ]
        b [ 0 − 0 1 ]
        c [ 1 1 − 1 ]
        d [ 0 0 0 − ]
      Define L(Q) = Σ_{i>j} Q_ij = sum( lower triangle( Q ) ).

  20. The number of inversions and Q (continued). With L(Q) = Σ_{i>j} Q_ij = sum( lower triangle( Q ) ), we have #inversions(π) = L(Q) = d(π, id).
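The identity #inversions(π) = L(Q) = d(π, id) can be sketched by summing the strictly-lower triangle of Q directly, for an arbitrary reference permutation (name is mine):

```python
def num_inversions(pi, ref):
    # L(Q): count lower-triangle entries Q[i][j], i > j, that equal 1,
    # i.e. pairs where ref[i] precedes ref[j] in pi. Equals d(pi, ref).
    pos = {item: r for r, item in enumerate(pi)}
    return sum(1 for i, x in enumerate(ref) for y in ref[:i]
               if pos[x] < pos[y])
```

With ref = id this gives d(π, id) = 2 for π = [ c a b d ], as on the slide; with ref = π0 it computes d(π, π0) in general.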

  21. The inversion distance and Q. π = [ c a b d ], reference permutation id = [ a b c d ], π0 = [ b a d c ].
      Q(π):                      Π0ᵀ Q(π) Π0:
            a b c d                    b a d c
        a [ − 1 0 1 ]              b [ − 0 1 0 ]
        b [ 0 − 0 1 ]              a [ 1 − 1 0 ]
        c [ 1 1 − 1 ]              d [ 0 0 − 0 ]
        d [ 0 0 0 − ]              c [ 1 1 1 − ]
      d(π, id) = 2                 d(π, π0) = 4

  22. The inversion distance and Q. To obtain d(π, π0):
      1. Construct Q(π)
      2. Sort rows and columns by π0
      3. Sum elements in the lower triangle

  23. The inversion distance and Q. To obtain d(π, π0), with π = [ c a b d ], π0 = [ b a d c ]:
      1. Construct Q(π)
      2. Sort rows and columns by π0:
            b a d c
        b [ − 0 1 0 ]
        a [ 1 − 1 0 ]
        d [ 0 0 − 0 ]
        c [ 1 1 1 − ]
      3. Sum elements in the lower triangle: d(π, π0) = 4
      Note also that to obtain d(π1, π0) + d(π2, π0) + ...:
      1. Construct Q(π1), Q(π2), ...
      2. Sum: Q = Q(π1) + Q(π2) + ...
      3. Sort rows and columns of Q by π0
      4. Sum elements in the lower triangle of Q
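The additivity trick above, summing the precedence matrices first and taking a single lower-triangle sum at the end, can be sketched as (names are mine; each Q is built with rows and columns already in π0 order, which is the same as sorting afterwards):

```python
def sum_distances(rankings, pi0):
    # d(pi_1, pi0) + ... + d(pi_N, pi0) via the summed precedence matrix:
    # accumulate Q = Q(pi_1) + Q(pi_2) + ... with rows/columns ordered by
    # pi0, then sum the strictly-lower triangle of the total once.
    n = len(pi0)
    Q = [[0] * n for _ in range(n)]
    for pi in rankings:
        pos = {item: r for r, item in enumerate(pi)}
        for i in range(n):
            for j in range(n):
                if i != j and pos[pi0[i]] < pos[pi0[j]]:
                    Q[i][j] += 1
    return sum(Q[i][j] for i in range(n) for j in range(i))
```

This is why the summed matrix Q is a sufficient statistic for the consensus-ranking objective: once Q is accumulated, evaluating any candidate π0 only requires re-sorting and one triangle sum.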

  24. A decomposition for the inversion distance. d(π, π0) = # inversions between π and π0:
      d([ c a b d ], [ b a d c ]) = #(inversions w.r.t. b) + #(inversions w.r.t. a) + #(inversions w.r.t. d) + ...
                                  =          V1            +          V2            +          V3            + ...
      where V_j = # inversions in which π0(j) is disfavored.
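The decomposition can be sketched by computing the V_j directly (name is mine): V_j counts the items ranked below π0(j) in π0 that nevertheless precede π0(j) in π.

```python
def inversion_code(pi, pi0):
    # V_j = number of items after pi0[j] in pi0 that precede pi0[j] in pi;
    # the V_j sum to d(pi, pi0).
    pos = {item: r for r, item in enumerate(pi)}
    return [sum(1 for k in pi0[j + 1:] if pos[k] < pos[pi0[j]])
            for j in range(len(pi0) - 1)]
```

With π0 = id this recovers the code ( V1, V2, V3 ) = ( 1, 1, 0 ) shown on the representations slide, and the components always sum to the inversion distance.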
