  1. Fast and Accurate Inference of Plackett–Luce Models. Lucas Maystre, Matthias Grossglauser (LCA 4, EPFL). Swiss Machine Learning Day, November 10th, 2015.

  2. Outline: 1. Introduction to Plackett–Luce models 2. Model inference: state of the art 3. Unifying ML and spectral algorithms 4. Experimental results

  3. Plackett–Luce family of models

  4. Modeling preferences. Universe of n items. Goal: describe, explain & predict choices between alternatives. Probabilistic approach.

  5. Luce's choice axiom. Assumption (Luce, 1959). The odds of choosing item i over item j are independent of the rest of the alternatives: for any two sets of alternatives A and B that both contain i and j, $\frac{p(i \mid A)}{p(j \mid A)} = \frac{p(i \mid B)}{p(j \mid B)}$. a.k.a. "independence of irrelevant alternatives".

  6. Consequence of axiom. To each item i = 1, ..., n we can assign a number $\pi_i \in \mathbb{R}_{>0}$ such that $p(i \mid \{1, \dots, k\}) = \frac{\pi_i}{\pi_1 + \cdots + \pi_k}$. $\pi_i$ = strength (or utility, or score) of item i.
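
To make the choice rule concrete, here is a minimal Python sketch (not from the talk; the function name and the toy strengths are illustrative):

```python
def choice_prob(i, alternatives, pi):
    """P(choose item i from the set of alternatives) under Luce's model.

    `pi` maps each item to its positive strength pi_i."""
    return pi[i] / sum(pi[j] for j in alternatives)

# Example: three items with strengths 3, 2, 1.
pi = {"a": 3.0, "b": 2.0, "c": 1.0}
print(choice_prob("a", ["a", "b", "c"], pi))  # 3 / (3 + 2 + 1) = 0.5
```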

  7. Bradley–Terry model [Zermelo, 1928; Bradley & Terry, 1952; Ford, 1957]. Variant of the model for pairwise comparisons: $p(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}$.

  8. Plackett–Luce model [Luce, 1959; Plackett, 1975]. Variant of the model for (partial or full) rankings: $p(i \succ j \succ k) = p(i \mid \{i, j, k\}) \cdot p(j \mid \{j, k\}) = \frac{\pi_i}{\pi_i + \pi_j + \pi_k} \cdot \frac{\pi_j}{\pi_j + \pi_k}$.
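
A sketch of the ranking probability, assuming the same dictionary-of-strengths representation as above (names are illustrative, not the authors' code):

```python
def ranking_prob(ranking, pi):
    """P(observing the full ranking, best to worst) under Plackett-Luce:
    repeatedly pick the top item among the remaining alternatives."""
    prob = 1.0
    for pos, i in enumerate(ranking):
        rest = ranking[pos:]               # alternatives still available
        prob *= pi[i] / sum(pi[j] for j in rest)
    return prob

pi = {"i": 3.0, "j": 2.0, "k": 1.0}
# p(i > j > k) = 3/(3+2+1) * 2/(2+1) = 1/3
print(ranking_prob(["i", "j", "k"], pi))
```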

  9. Rao–Kupper model [Rao & Kupper, 1967]. Variant of the model for pairwise comparisons with ties: $p(i \succ j) = \frac{\pi_i}{\pi_i + \alpha\pi_j}$, $p(i \equiv j) = \frac{(\alpha^2 - 1)\,\pi_i \pi_j}{(\pi_i + \alpha\pi_j)(\pi_j + \alpha\pi_i)}$.
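
A small sketch of the Rao–Kupper win/tie probabilities; the helper name and example values are made up, but the formulas follow the slide. The three outcome probabilities should sum to one:

```python
def rao_kupper_probs(pi_i, pi_j, alpha):
    """Win / tie / loss probabilities under the Rao-Kupper extension (alpha >= 1)."""
    p_i_wins = pi_i / (pi_i + alpha * pi_j)
    p_j_wins = pi_j / (pi_j + alpha * pi_i)
    p_tie = ((alpha**2 - 1) * pi_i * pi_j
             / ((pi_i + alpha * pi_j) * (pi_j + alpha * pi_i)))
    return p_i_wins, p_tie, p_j_wins

p_win, p_tie, p_lose = rao_kupper_probs(2.0, 1.0, alpha=1.5)
print(p_win + p_tie + p_lose)  # probabilities sum to 1
```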

  10. RUM perspective. New parameterization: $\theta_i = \log(\pi_i)$. With $X_i \sim \mathrm{Gumbel}(\theta_i, 1)$, the difference $X_i - X_j \sim \mathrm{Logistic}(\theta_i - \theta_j, 1)$, so $p(i \succ j) = P(X_i - X_j > 0) = \frac{1}{1 + e^{-(\theta_i - \theta_j)}}$.
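
The random-utility view is easy to check numerically. A small Monte Carlo sketch (assuming NumPy; not part of the talk) comparing the empirical win rate of Gumbel utilities with the closed-form logistic probability:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_i, theta_j = 1.0, 0.0

# Sample Gumbel utilities and compare them.
x_i = rng.gumbel(loc=theta_i, scale=1.0, size=1_000_000)
x_j = rng.gumbel(loc=theta_j, scale=1.0, size=1_000_000)
empirical = np.mean(x_i > x_j)

# Closed form: the difference of two Gumbels is logistic.
closed_form = 1.0 / (1.0 + np.exp(-(theta_i - theta_j)))
print(empirical, closed_form)  # both close to 0.731
```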

  11. Identifying parameters. In $p(i \mid \{1, \dots, k\}) = \frac{\pi_i}{\pi_1 + \cdots + \pi_k}$ the scores are defined only up to a multiplicative term. We use the following convention: $\sum_i \pi_i = 1$ (or $\sum_i \theta_i = 0$ for the log-parameterization).

  12. Beyond preferences: NASCAR rankings, the GIFGIF experiment (comparative judgment), chess games.

  13. Model inference

  14. Maximum-likelihood. For conciseness, we consider pairwise comparisons. Data in the form of counts: $a_{ji}$ = # times i beat j. $\mathcal{L}(\pi) = \prod_i \prod_{j \neq i} \left( \frac{\pi_i}{\pi_i + \pi_j} \right)^{a_{ji}}$, so $\log \mathcal{L}(\pi) = \sum_i \sum_{j \neq i} a_{ji} \left( \log \pi_i - \log(\pi_i + \pi_j) \right)$. Maximization can lead to problems (some estimated $\pi_i = 0$) unless the data are well connected. Assumption. In every partition of the n items into two subsets A and B, some $i \in A$ beats some $j \in B$.
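
A direct transcription of the pairwise log-likelihood into Python, as a sketch (the count matrix below is made-up toy data; the convention a[j, i] = # times i beat j follows the slide):

```python
import numpy as np

def log_likelihood(pi, a):
    """Pairwise-comparison log-likelihood; a[j, i] = # times i beat j."""
    n = len(pi)
    ll = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                ll += a[j, i] * (np.log(pi[i]) - np.log(pi[i] + pi[j]))
    return ll

a = np.array([[0, 3, 1],
              [1, 0, 2],
              [2, 1, 0]])            # a[j, i]: i beat j this many times
pi = np.ones(3) / 3                  # uniform starting point
print(log_likelihood(pi, a))
```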

  15. Rank Centrality [Negahban et al., 2012]. Completely different take on parameter inference: 1. Items are states of a Markov chain. 2. Going from i to j is more likely if j often won against i. 3. The stationary distribution defines the scores. $P_{ij} = \epsilon\, a_{ij}$ if $i \neq j$, and $P_{ii} = 1 - \epsilon \sum_{k \neq i} a_{ik}$. [Figure: Markov chain over the items.]
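
A rough NumPy sketch of Rank Centrality as described on the slide (not Negahban et al.'s code; the choice of ε and the eigenvector extraction are one possible implementation):

```python
import numpy as np

def rank_centrality(a, eps=None):
    """Rank Centrality sketch: a[i, j] = # times j beat i (wins of j over i).

    Builds the Markov chain P_ij ∝ a_ij and returns its stationary
    distribution as the score vector."""
    n = a.shape[0]
    if eps is None:
        eps = 1.0 / a.sum(axis=1).max()       # keep rows sub-stochastic
    P = eps * a.astype(float)
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # self-loops absorb the rest
    # Stationary distribution = left eigenvector of P with eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

a = np.array([[0, 3, 1],
              [1, 0, 2],
              [2, 1, 0]])
print(rank_centrality(a))
```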

  16. GMM estimators [Azari Soufiani et al., 2013, 2014]. Generalization of Rank Centrality to rankings: 1. Break each ranking into its $\binom{m}{2}$ pairwise comparisons (e.g., $a \succ b \succ c \succ d$ yields $a \succ b$, $a \succ c$, $a \succ d$, $b \succ c$, $b \succ d$, $c \succ d$). 2. Construct a Markov chain and find its stationary distribution. The resulting estimator is asymptotically consistent.
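
Breaking a ranking into its implied pairwise comparisons is a one-liner; a small illustrative sketch:

```python
from itertools import combinations

def ranking_to_pairs(ranking):
    """Break a ranking (best to worst) into all implied pairwise comparisons."""
    return [(winner, loser) for winner, loser in combinations(ranking, 2)]

print(ranking_to_pairs(["a", "b", "c", "d"]))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```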

  17. Unifying ML inference and spectral algorithms

  18. MLE as stationary distribution. $\log \mathcal{L}(\pi) = \sum_i \sum_{j \neq i} a_{ji} \left( \log \pi_i - \log(\pi_i + \pi_j) \right)$. $\frac{\partial}{\partial \pi_i} \log \mathcal{L}(\pi) = \sum_{j \neq i} \left( a_{ji} \frac{1}{\pi_i} - (a_{ji} + a_{ij}) \frac{1}{\pi_i + \pi_j} \right) = \frac{1}{\pi_i} \sum_{j \neq i} \left( a_{ji} \frac{\pi_j}{\pi_i + \pi_j} - a_{ij} \frac{\pi_i}{\pi_i + \pi_j} \right)$. Setting the gradient to zero gives, for all i, $\sum_{j \neq i} \frac{a_{ji}}{\pi_i + \pi_j} \pi_j = \sum_{j \neq i} \frac{a_{ij}}{\pi_i + \pi_j} \pi_i$: incoming flow = outgoing flow. These are the global balance equations of a Markov chain on the items with transition rates $a_{ij} / (\pi_i + \pi_j)$.
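
The balance-equation view can be sanity-checked numerically. In the sketch below (toy data, not from the paper) an ML estimate is obtained with the classic Zermelo/Hunter MM fixed point, and the net "flow" at each item is then verified to be approximately zero:

```python
import numpy as np

def net_flow(pi, a):
    """Balance-equation residual at pi: incoming minus outgoing flow per item.

    a[j, i] = # times i beat j. At the ML estimate every entry is ~0."""
    n = len(pi)
    flow = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                flow[i] += a[j, i] * pi[j] / (pi[i] + pi[j])   # incoming
                flow[i] -= a[i, j] * pi[i] / (pi[i] + pi[j])   # outgoing
    return flow

# Crude MLE via the classic MM fixed point (Zermelo / Hunter), for illustration.
a = np.array([[0, 3, 1],
              [1, 0, 2],
              [2, 1, 0]], dtype=float)
pi = np.ones(3) / 3
for _ in range(200):
    wins = a.sum(axis=0)                     # total wins of each item
    denom = np.array([sum((a[i, j] + a[j, i]) / (pi[i] + pi[j])
                          for j in range(3) if j != i) for i in range(3)])
    pi = wins / denom
    pi /= pi.sum()

print(net_flow(pi, a))   # close to zero at the ML estimate
```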

  19. Corresponding MC. $P_{ij} = \epsilon \frac{a_{ij}}{\pi_i + \pi_j}$ if $i \neq j$, and $P_{ii} = 1 - \epsilon \sum_{k \neq i} \frac{a_{ik}}{\pi_i + \pi_k}$. We can iteratively adjust $\pi$: the stationary distribution is the ML estimate iff $\pi = \hat{\pi}$; the (k+1)-th iterate is the stationary distribution of the chain $P^{(k)}$ built from the k-th iterate; if $\pi_i = 1/n$ for all i, we recover Rank Centrality; the unique fixed point of the iteration is the ML estimate.

  20. Generalization. The same Markov chain formulation applies to other models in the same family! For choices among many alternatives: $P_{ij} = \epsilon \sum_{A \in \mathcal{D}_{i \succ j}} \frac{1}{\sum_{k \in A} \pi_k}$ if $i \neq j$, and $P_{ii} = 1 - \sum_{k \neq i} P_{ik}$, where $\mathcal{D}_{i \succ j}$ denotes the observed choice sets from which i was chosen and that also contain j. Spectral formulation for ranking data, comparisons with ties, etc.

  21. Algorithms

  Algorithm 1: Luce Spectral Ranking
  Require: observations D
  1: λ ← 0_{n×n}
  2: for (i, A) ∈ D do
  3:   for j ∈ A \ {i} do
  4:     λ_{ji} ← λ_{ji} + n / |A|
  5:   end for
  6: end for
  7: π̄ ← stationary distribution of Markov chain λ
  8: return π̄

  Algorithm 2: Iterative Luce Spectral Ranking
  Require: observations D
  1: π ← [1/n, ..., 1/n]ᵀ
  2: repeat
  3:   λ ← 0_{n×n}
  4:   for (i, A) ∈ D do
  5:     for j ∈ A \ {i} do
  6:       λ_{ji} ← λ_{ji} + 1 / Σ_{t∈A} π_t
  7:     end for
  8:   end for
  9:   π ← stationary distribution of Markov chain λ
  10: until convergence

  Two questions: what is the statistical efficiency of the spectral estimate? What is the computational efficiency of the ML algorithm?
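
A compact NumPy sketch of both algorithms (the function names, the stationary-distribution solver, and the toy data are mine; the authors' reference code is linked on the last slide). The constant n/|A| in LSR only rescales the rates and does not change the stationary distribution:

```python
import numpy as np

def _stationary(rates):
    """Stationary distribution of the continuous-time chain with given rates."""
    n = rates.shape[0]
    Q = rates - np.diag(rates.sum(axis=1))      # generator: rows sum to zero
    A = np.vstack([Q.T, np.ones(n)])            # solve pi Q = 0, sum(pi) = 1
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def lsr(data, n):
    """Luce Spectral Ranking sketch. `data` is a list of (winner, choice_set)."""
    lam = np.zeros((n, n))
    for i, A in data:
        for j in A:
            if j != i:
                lam[j, i] += n / len(A)
    return _stationary(lam)

def ilsr(data, n, n_iter=100):
    """Iterative LSR: re-weight the chain with the current estimate and repeat."""
    pi = np.ones(n) / n
    for _ in range(n_iter):
        lam = np.zeros((n, n))
        for i, A in data:
            denom = sum(pi[t] for t in A)
            for j in A:
                if j != i:
                    lam[j, i] += 1.0 / denom
        pi = _stationary(lam)
    return pi

# Toy data: (chosen item, set of alternatives it was chosen from).
data = [(0, [0, 1, 2]), (1, [1, 2]), (0, [0, 2]), (2, [0, 1, 2]), (1, [0, 1])]
print(lsr(data, 3))     # one-shot spectral estimate
print(ilsr(data, 3))    # ML estimate
```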

  22. Experimental results

  23. Statistical efficiency. Which inference method works best? [Plot: RMSE of LSR, ML, GMM-F and ML-F against a lower bound, for partial rankings of size k = 2, 4, 8 obtained by breaking a full ranking of 8 items (e.g. a ≻ b ≻ c ≻ d ≻ e ≻ f ≻ g ≻ h) into smaller rankings.] Take-away: careful derivation of the Markov chain leads to a better estimator.

  24. Computational efficiency. Table 2: Performance of iterative ML inference algorithms (I = iterations, T = running time in seconds).

  Dataset   γ_D     I-LSR: I  T [s]     MM: I   T [s]      Newton: I  T [s]
  NASCAR    0.832   3         0.08      4       0.10       —          —
  Sushi     0.890   2         0.42      4       1.09       3          10.45
  YouTube   0.002   12        414.44    8680    22443.88   —          —
  GIFGIF    0.408   10        22.31     119     109.62     5          72.38
  Chess     0.007   15        43.69     181     55.61      3          49.37

  • I-LSR is competitive with / faster than the state of the art • MM seems to converge very slowly in certain cases

  25. I-LSR and MM mixing. [Plot: RMSE vs. iteration, on a log scale, for MM and I-LSR on a well-mixing instance (k = 10) and a poorly mixing instance (k = 2).] Take-away: I-LSR seems to be robust to slow-mixing chains.

  26. Conclusions • Variety of models derived from Luce's choice axiom • Can interpret the maximum-likelihood estimate as the stationary distribution of a Markov chain • Gives rise to a fast and efficient spectral inference algorithm • Gives rise to a new iterative algorithm for maximum-likelihood inference. Paper & code available at: lucas.maystre.ch/nips15
