Statistical Inference for Incomplete Ranking Data: The Case of Rank-Dependent Coarsening
Mohsen Ahmadi Fahandar¹, Eyke Hüllermeier¹, Inés Couso²
¹ Intelligent Systems Group, Paderborn University, Germany
² Department of Statistics, University of Oviedo, Spain
ICML 2017, Tuesday, August 8th
Contributions
Considering statistical inference for incomplete ranking data, we:
- propose a specific type of data-generating process, in which incompleteness is due to "coarsening" of (latent) complete rankings;
- introduce the concept of "rank-dependent" coarsening.
Under our proposed setting, we study the problem of rank aggregation and the performance of various rank aggregation methods, both theoretically and practically.
1 / 22
Rank Aggregation
Given rankings over a set of items (e.g., K = 5) as observations:
a_4 ≻ a_5 ≻ a_3 ≻ a_2 ≻ a_1
a_5 ≻ a_2 ≻ a_1 ≻ a_3 ≻ a_4
a_3 ≻ a_1 ≻ a_5 ≻ a_4 ≻ a_2
...
a_1 ≻ a_2 ≻ a_4 ≻ a_3 ≻ a_5
Goal: combine the rankings into a (single) consensus ranking a_? ≻ a_? ≻ a_? ≻ a_? ≻ a_?.
2 / 22
Ranking Distributions
Plackett-Luce (PL) model: the probability assigned to ranking π given parameter vector θ = (θ_1, θ_2, ..., θ_K) ∈ R_+^K is
P_θ(π) = ∏_{i=1}^{K} θ_{π(i)} / (θ_{π(i)} + θ_{π(i+1)} + ... + θ_{π(K)})
For example, P_θ(a_2 ≻ a_1 ≻ a_3) = [θ_{a_2} / (θ_{a_1} + θ_{a_2} + θ_{a_3})] · [θ_{a_1} / (θ_{a_1} + θ_{a_3})].
The mode of the PL distribution (i.e., π*) is the natural consensus in this case.
Bradley-Terry-Luce (BTL) model: P_θ(a_1 ≻ a_2) = θ_{a_1} / (θ_{a_1} + θ_{a_2})
3 / 22
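A minimal Python sketch of the PL model (not part of the original slides; function names are ours): it evaluates P_θ(π) as the product above and samples a full ranking by repeatedly drawing the next item with probability proportional to its weight among the items not yet placed.

```python
import numpy as np

def pl_probability(pi, theta):
    """Probability of ranking `pi` (tuple of item indices, best first)
    under the Plackett-Luce model with parameter vector `theta`."""
    theta = np.asarray(theta, dtype=float)
    prob = 1.0
    for i in range(len(pi)):
        remaining = list(pi[i:])                  # items still to be ranked
        prob *= theta[pi[i]] / theta[remaining].sum()
    return prob

def pl_sample(theta, rng=None):
    """Draw one full ranking from the PL model (best item first)."""
    rng = rng or np.random.default_rng()
    items = list(range(len(theta)))
    ranking = []
    while items:
        w = np.array([theta[i] for i in items], dtype=float)
        choice = rng.choice(len(items), p=w / w.sum())
        ranking.append(items.pop(choice))
    return tuple(ranking)

# e.g., theta = (14, 5, 1): P(a2 > a1 > a3) = (5/20) * (14/15) ~ 0.233
print(pl_probability((1, 0, 2), theta=np.array([14.0, 5.0, 1.0])))
```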
Incomplete Rankings
In most applications, the observed rankings are incomplete (e.g., K = 5):
a_4 ≻ a_5 ≻ a_3 ≻ a_2 ≻ a_1
a_2 ≻ a_1 ≻ a_3 ≻ a_4
a_3 ≻ a_1
...
a_1 ≻ a_4 ≻ a_5
Goal (consensus): a_? ≻ a_? ≻ a_? ≻ a_? ≻ a_?
Rank aggregation for incomplete rankings is more challenging!
4 / 22
From Complete to Incomplete
Generation: ranking model P_θ(π) → full ranking; coarsening P_λ(τ | π) → incomplete ranking.
Where does the word "coarsening" come from?
5 / 22
A Stochastic Model for Incomplete Rankings
Joint model over the set of rankings S_K:  P_{θ,λ}(τ, π) = P_θ(π) · P_λ(τ | π)
- Generation of full rankings: P_θ : S_K → [0, 1]
- Coarsening process: P_λ(· | π), π ∈ S_K, λ ∈ Λ
6 / 22
Modeling of the Coarsening
Estimate P_λ (full model, P_θ + P_λ):
- non-parametric: model and estimate P_λ with no assumptions
- parametric: take P_λ from a parametric family
Do not estimate P_λ:
- ignore the coarsening but make assumptions about it (e.g., rank-dependent)
- ignore the coarsening and make no assumptions
7 / 22
The Underlying Assumption
Standard marginalization: the coarsening selects a random subset of items.
full ranking a_4 ≻ a_1 ≻ a_3 ≻ a_2, random item subset {a_4, a_3} ⊆ {a_1, a_2, a_3, a_4} → observed ranking a_4 ≻ a_3
What we propose: a coarsening P : 2^[K] → [0, 1] that acts only on "ranks" (positions), not items.
full ranking a_4 ≻ a_1 ≻ a_3 ≻ a_2, random rank subset {2, 4} ⊆ {1, 2, 3, 4} → observed ranking a_1 ≻ a_2
8 / 22
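A small illustrative sketch (helper names are ours, not from the slides) contrasting the two mechanisms: restricting a full ranking to a subset of items versus to a subset of ranks.

```python
def restrict_to_items(full_ranking, item_subset):
    """Item-based coarsening: keep only the items in `item_subset`,
    preserving their relative order in the full ranking."""
    return tuple(a for a in full_ranking if a in item_subset)

def restrict_to_ranks(full_ranking, rank_subset):
    """Rank-based coarsening: keep only the items occupying the given
    ranks (1-based), preserving their relative order."""
    return tuple(full_ranking[r - 1] for r in sorted(rank_subset))

full = ("a4", "a1", "a3", "a2")               # a4 > a1 > a3 > a2
print(restrict_to_items(full, {"a4", "a3"}))  # ('a4', 'a3')
print(restrict_to_ranks(full, {2, 4}))        # ('a1', 'a2')
```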
Specific Instantiation
Generation: Plackett-Luce model P_θ(π) → full ranking; coarsening P_λ(τ | π) → pairwise observations (incomplete rankings).
9 / 22
Data Generating Process
Rank-dependence in the case of pairwise comparisons: the entire distribution P_λ is specified by the set of K(K−1)/2 probabilities
{ λ_{u,v} | 1 ≤ u < v ≤ K, λ_{u,v} ≥ 0, Σ_{1 ≤ u < v ≤ K} λ_{u,v} = 1 }.
The probability to observe a_i better than a_j:
q′_{i,j} = Σ_{π ∈ E(a_i ≻ a_j)} P_θ(π) · λ_{π(i), π(j)},
where E(a_i ≻ a_j) is the set of all rankings consistent with a_i ≻ a_j, and π(i), π(j) denote the ranks of a_i and a_j in π.
10 / 22
Data Generating Process
Generated rankings based on PL; coarsening λ_{1,3} = 1 (a degenerate probability distribution), i.e., we always observe the items at ranks 1 and 3.
Generated ranking → observation in D:
a_4 ≻ a_5 ≻ a_3 ≻ a_2 ≻ a_1 → a_4 ≻ a_3
a_5 ≻ a_2 ≻ a_1 ≻ a_3 ≻ a_4 → a_5 ≻ a_1
a_1 ≻ a_3 ≻ a_2 ≻ a_4 ≻ a_5 → a_1 ≻ a_2
...
a_1 ≻ a_2 ≻ a_4 ≻ a_3 ≻ a_5 → a_1 ≻ a_4
11 / 22
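A hedged sketch of this data-generating process (reusing pl_sample from the earlier block; representing λ as a dict over rank pairs is our own choice): sample a full ranking from PL, sample a rank pair (u, v) from λ, and report the items at those ranks as a pairwise observation.

```python
import numpy as np

def sample_pairwise_observation(theta, lam, rng=None):
    """Sample one pairwise observation under rank-dependent coarsening.

    theta : PL parameters (one positive weight per item).
    lam   : dict mapping rank pairs (u, v), 1 <= u < v <= K, to probabilities
            summing to one (e.g., {(1, 3): 1.0} for the degenerate example).
    """
    rng = rng or np.random.default_rng()
    pi = pl_sample(theta, rng)                   # full ranking, best item first
    pairs = list(lam.keys())
    u, v = pairs[rng.choice(len(pairs), p=list(lam.values()))]
    return pi[u - 1], pi[v - 1]                  # (better item, worse item)

theta = np.array([10.0, 6.0, 3.0, 2.0, 1.0])     # hypothetical PL weights
obs = [sample_pairwise_observation(theta, {(1, 3): 1.0}) for _ in range(5)]
print(obs)   # e.g., [(0, 2), (0, 3), (1, 4), ...] -- items at ranks 1 and 3
```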
Introduced Bias
Let θ = (14, 5, 1) and the coarsening be degenerate: λ_{1,2} = 1 (i.e., top-2).
Marginal matrix (p_{i,j} = θ_i / (θ_i + θ_j)):
        a_1    a_2    a_3
  a_1    −    0.737  0.933
  a_2  0.263    −    0.833
  a_3  0.067  0.167    −
Induced matrix (q_{i,j} = q′_{i,j} / (q′_{i,j} + q′_{j,i})):
        a_1    a_2    a_3
  a_1    −    0.714  0.76
  a_2  0.286    −    0.559
  a_3  0.24   0.441    −
12 / 22
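A small brute-force check of these numbers (our sketch, reusing pl_probability from the earlier block): enumerate all K! rankings, accumulate q′_{i,j} = Σ_π P_θ(π) · λ_{rank(i), rank(j)}, and normalize.

```python
from itertools import permutations
import numpy as np

def pairwise_matrices(theta, lam):
    """Return (p, q): the PL marginals p[i, j] = theta_i / (theta_i + theta_j)
    and the matrix q[i, j] induced by rank-dependent coarsening `lam`."""
    theta = np.asarray(theta, dtype=float)
    K = len(theta)
    p = theta[:, None] / (theta[:, None] + theta[None, :])
    q_raw = np.zeros((K, K))
    for pi in permutations(range(K)):            # pi[r] = item at rank r+1
        prob = pl_probability(pi, theta)
        for (u, v), lam_uv in lam.items():       # observed ranks u < v
            i, j = pi[u - 1], pi[v - 1]
            q_raw[i, j] += prob * lam_uv
    denom = q_raw + q_raw.T
    np.fill_diagonal(denom, 1.0)                 # avoid 0/0 on the diagonal
    return p, q_raw / denom

p, q = pairwise_matrices([14.0, 5.0, 1.0], {(1, 2): 1.0})
print(np.round(q, 3))   # off-diagonal entries ~ 0.714, 0.760, 0.559, ...
```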
Definitions
Comparison matrix C (c_{i,j}: number of wins of a_i over a_j):
        a_1  a_2  a_3  a_4
  a_1    0    6    4    1
  a_2    7    0    5    8
  a_3    3    4    0    9
  a_4    2    1   12    0
Probability matrix P̂ (relative wins), where p̂_{i,j} = c_{i,j} / (c_{i,j} + c_{j,i}):
        a_1   a_2   a_3   a_4
  a_1    0    0.46  0.57  0.33
  a_2   0.54   0    0.56  0.89
  a_3   0.43  0.44   0    0.43
  a_4   0.67  0.11  0.57   0
13 / 22
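A short sketch (helper name and toy data are ours) of how C and P̂ are built from a list of pairwise observations given as (winner, loser) pairs:

```python
import numpy as np

def comparison_matrices(observations, K):
    """Build the comparison matrix C and the relative-win matrix P_hat
    from pairwise observations given as (winner, loser) index pairs."""
    C = np.zeros((K, K))
    for winner, loser in observations:
        C[winner, loser] += 1
    denom = C + C.T
    denom[denom == 0] = 1.0                      # uncompared pairs stay at 0
    return C, C / denom

# toy data: a2 beats a1 twice, a1 beats a2 once, etc. (0-based item indices)
obs = [(1, 0), (1, 0), (0, 1), (3, 2), (2, 3)]
C, P_hat = comparison_matrices(obs, K=4)
print(C)
print(np.round(P_hat, 2))
```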
Rank Estimation Framework
Observations D (K = 4): a_4 ≻ a_3, a_2 ≻ a_1, a_1 ≻ a_2, ..., a_1 ≻ a_4
  ⇒ aggregate into comparison matrix C
  ⇒ estimate the ranking: π̂ : a_2 ≻ a_4 ≻ a_1 ≻ a_3
14 / 22
Rank Aggregation Methods
Statistical estimation: BTL, BTL(R) (Bradley & Terry, 1952); Least Squares / HodgeRank (LS) (Jiang et al., 2011)
Voting methods: Borda (Borda, 1781); Copeland (CP) (Copeland, 1951)
Spectral methods: Rank Centrality (RC) (Negahban et al., 2012); MC2, MC3 (Dwork et al., 2001)
Graph-based methods: FAS, FAS(R), FAS(B) (Saab, 2001; Fomin et al., 2010)
Pairwise coupling: HT (Hastie & Tibshirani, 1998); Price (Price et al., 1994); WU1, WU2 (Wu et al., 2004)
15 / 22
Research Questions
Practical performance: how close is the prediction π̂ to the ground-truth ranking π*?
Consistency: let π̂_N denote the ranking produced as a prediction by a ranking method on the basis of N observed (pairwise) preferences. The method is consistent if P(π̂_N = π*) → 1 for N → ∞.
16 / 22
BTL (Bradley-Terry-Luce)
Given comparison matrix C:
        a_1  a_2  a_3  a_4
  a_1    0    6    4    1
  a_2    7    0    5    8
  a_3    3    4    0    9
  a_4    2    1   12    0
BTL estimates the parameters by likelihood maximization:
θ̂ ∈ arg max_{θ ∈ R_+^K} ∏_{1 ≤ i ≠ j ≤ K} ( θ_i / (θ_i + θ_j) )^{c_{i,j}}
θ̂ ≈ (0.253, 0.382, 0.178, 0.187), π̂ : a_2 ≻ a_1 ≻ a_4 ≻ a_3
17 / 22
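A hedged sketch of the BTL fit (one of several possible approaches, not necessarily the one used in the paper): maximize the log-likelihood over log-parameters with scipy, fixing one parameter for identifiability and normalizing θ to sum to one as on the slide.

```python
import numpy as np
from scipy.optimize import minimize

def btl_fit(C):
    """Maximum-likelihood BTL parameters for comparison matrix C."""
    K = C.shape[0]

    def neg_log_lik(free):                       # fix log theta_1 = 0 (scale is arbitrary)
        theta = np.exp(np.concatenate(([0.0], free)))
        ll = 0.0
        for i in range(K):
            for j in range(K):
                if i != j and C[i, j] > 0:
                    ll += C[i, j] * np.log(theta[i] / (theta[i] + theta[j]))
        return -ll

    res = minimize(neg_log_lik, x0=np.zeros(K - 1), method="Nelder-Mead")
    theta = np.exp(np.concatenate(([0.0], res.x)))
    return theta / theta.sum()                   # normalize to sum to one

C = np.array([[0, 6, 4, 1],
              [7, 0, 5, 8],
              [3, 4, 0, 9],
              [2, 1, 12, 0]], dtype=float)
theta_hat = btl_fit(C)
ranking = np.argsort(-theta_hat)                 # items sorted by estimated skill
print(np.round(theta_hat, 3), ranking)           # expect a2 > a1 > a4 > a3
```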
Borda and Copeland (CP)
Given probability matrix P̂:
        a_1   a_2   a_3   a_4
  a_1    0    0.46  0.57  0.33
  a_2   0.54   0    0.56  0.89
  a_3   0.43  0.44   0    0.43
  a_4   0.67  0.11  0.57   0
Borda assigns a score to each item: s_i = Σ_{j=1}^{K} p̂_{i,j}
  s ≈ (1.366, 1.983, 1.302, 1.349) ⇒ π̂ : a_2 ≻ a_1 ≻ a_4 ≻ a_3
Copeland counts the number of pairwise victories: s_i = Σ_{j=1}^{K} I[ p̂_{i,j} > 1/2 ]
  s = (1, 3, 0, 2) ⇒ π̂ : a_2 ≻ a_4 ≻ a_1 ≻ a_3
18 / 22
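Both scores are one line each over P̂; a quick sketch (our own variable names, using the matrix shown above):

```python
import numpy as np

P_hat = np.array([[0.00, 0.46, 0.57, 0.33],
                  [0.54, 0.00, 0.56, 0.89],
                  [0.43, 0.44, 0.00, 0.43],
                  [0.67, 0.11, 0.57, 0.00]])

borda = P_hat.sum(axis=1)                        # row sums of relative wins
copeland = (P_hat > 0.5).sum(axis=1)             # number of pairwise victories

print(borda, np.argsort(-borda))                 # ~ [1.36 1.99 1.30 1.35] -> a2 > a1 > a4 > a3
print(copeland, np.argsort(-copeland))           # [1 3 0 2]               -> a2 > a4 > a1 > a3
```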
FAS (Feedback Arc Set)
Given comparison matrix C:
        a_1  a_2  a_3  a_4
  a_1    0    6    4    1
  a_2    7    0    5    8
  a_3    3    4    0    9
  a_4    2    1   12    0
FAS seeks the ranking that incurs the lowest sum of penalties:
π̂ = arg min_{π ∈ S_K} Σ_{(i,j): π(i) < π(j)} c_{j,i}
π̂ : a_2 ≻ a_4 ≻ a_1 ≻ a_3
19 / 22
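For small K the arg min can be found by brute force over all K! permutations; a sketch below (illustrative only, since the general minimum feedback arc set problem is NP-hard):

```python
from itertools import permutations
import numpy as np

def fas_ranking(C):
    """Exhaustive feedback-arc-set aggregation: return the ordering of items
    (best first) minimizing the total weight of violated comparisons."""
    K = C.shape[0]

    def penalty(order):                          # order[r] = item at rank r+1
        return sum(C[order[s], order[r]]         # c_{j,i} for every i ranked above j
                   for r in range(K) for s in range(r + 1, K))

    return min(permutations(range(K)), key=penalty)

C = np.array([[0, 6, 4, 1],
              [7, 0, 5, 8],
              [3, 4, 0, 9],
              [2, 1, 12, 0]], dtype=float)
print(fas_ranking(C))    # expect (1, 3, 0, 2), i.e., a2 > a4 > a1 > a3
```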