Statistical inference for incomplete ranking data: A comparison of two likelihood-based estimators
Inés Couso, Mohsen Ahmadi, Eyke Hüllermeier
DA2PL 2016, Paderborn, Nov 7, 2016
Statement of the problem
❖ Sample of N individuals.
❖ K alternatives.
❖ Every individual provides a pairwise comparison.
❖ Goal: estimate the most popular complete ranking over the K alternatives.
❖ Intermediate goal: estimate the probability distribution over the collection of K! possible rankings.
Notation
• Alternatives: $a_1, \dots, a_K$.
• Complete ranking: $\pi : \{1, \dots, K\} \to \{1, \dots, K\}$.
• $\pi(i)$ = position of $a_i$ in the ranking.
• Observable incomplete rankings: $a_i \succ a_j$.
• $E(a_i \succ a_j) = \{\pi : \pi(i) < \pi(j)\}$.
• Dataset: $(\tau_1, \dots, \tau_N)$, sequence of i.i.d. observations (pairwise rankings).
A stochastic model of coarsening
• $p_\theta(\pi)$ = probability of appearance of $\pi$ (complete ranking).
• $p_\lambda(\tau \mid \pi)$ = probability of observing $\tau$, provided the true ranking is $\pi$.
• $p_{\theta,\lambda}(\pi, \tau)$: joint mass function.
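Two likelihood-based approaches
$$L_V(\theta, \lambda) = P(\vec{\tau} \mid \theta, \lambda) = \prod_{i=1}^{N} P(Y = \tau_i \mid \theta, \lambda) = \prod_{i=1}^{N} \sum_{\pi \in S_K} p_\theta(\pi)\, p_\lambda(\tau_i \mid \pi).$$
$$L_F(\theta, \lambda) = \prod_{i=1}^{N} P(X \in E(\tau_i) \mid \theta, \lambda) = \prod_{i=1}^{N} \sum_{\pi \in E(\tau_i)} p_\theta(\pi).$$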
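
As a rough illustration of the two likelihoods, the sketch below evaluates $\log L_V$ and $\log L_F$ by brute-force enumeration of $S_K$ for a tiny K. The function names, the dictionary representation of $p_\theta$, and the toy uniform model are assumptions made for this example; they are not taken from the slides.

```python
import math
from itertools import permutations

def consistent(order, pair):
    """True if ranking `order` (items, best first) puts pair[0] ahead of pair[1]."""
    winner, loser = pair
    return order.index(winner) < order.index(loser)

def log_lik_visible(data, p_rank, p_coarse):
    """log L_V: sum over observations of log sum_pi p(pi) * p(tau | pi)."""
    return sum(
        math.log(sum(p * p_coarse(tau, pi) for pi, p in p_rank.items()))
        for tau in data
    )

def log_lik_face(data, p_rank):
    """log L_F: sum over observations of log P(E(tau)); the coarsening is ignored."""
    return sum(
        math.log(sum(p for pi, p in p_rank.items() if consistent(pi, tau)))
        for tau in data
    )

# Toy setting (assumed): K = 3, uniform distribution over the 6 rankings,
# and each of the 3 pairs consistent with a ranking reported with probability 1/3.
rankings = list(permutations(range(3)))
p_rank = {pi: 1 / 6 for pi in rankings}

def uniform_coarsening(tau, pi):
    return 1 / 3 if consistent(pi, tau) else 0.0

data = [(0, 1), (2, 1)]  # observed comparisons: a_1 > a_2 and a_3 > a_2
lv = log_lik_visible(data, p_rank, uniform_coarsening)
lf = log_lik_face(data, p_rank)
print(lv, lf)
```

Enumeration costs K! terms per observation, so this is only meant to make the definitions concrete, not to scale.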
Plackett-Luce parametric family
• Parameter $\theta = (\theta_1, \dots, \theta_K)$.
• $PL_\theta(\{\pi\}) = \prod_{i=1}^{K} \frac{\theta_{\pi^{-1}(i)}}{\theta_{\pi^{-1}(i)} + \theta_{\pi^{-1}(i+1)} + \dots + \theta_{\pi^{-1}(K)}}$.
• $\pi^* = \arg\max_{\pi \in S_K} PL_\theta(\pi) = \operatorname{arg\,sort}_{k \in [K]} \{\theta_1, \dots, \theta_K\}$. (If, for instance, $\theta_1 > \dots > \theta_K$, then $\pi^* = [1 \dots K]$.)
• $PL_\theta(E(a_i \succ a_j)) = \frac{\theta_i}{\theta_i + \theta_j}$.
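
The three Plackett-Luce facts above can be checked numerically for a small K. The following is a minimal sketch; the parameter values are illustrative and the helper name `pl_prob` is an assumption of this example.

```python
from itertools import permutations

def pl_prob(order, theta):
    """Plackett-Luce probability of a complete ranking.

    order: item indices, most preferred first (order[r] plays the role of pi^{-1}(r+1)).
    theta: positive parameters, one per item.
    """
    prob = 1.0
    remaining = sum(theta[i] for i in order)
    for item in order:
        prob *= theta[item] / remaining  # item is chosen among those still unranked
        remaining -= theta[item]
    return prob

theta = [0.5, 0.3, 0.2]  # illustrative values, not taken from the slides
rankings = list(permutations(range(len(theta))))

total = sum(pl_prob(r, theta) for r in rankings)       # should be 1
mode = max(rankings, key=lambda r: pl_prob(r, theta))  # items by decreasing theta
# Mass of the event E(a_1 > a_2): all rankings placing item 0 before item 1.
p_12 = sum(pl_prob(r, theta) for r in rankings if r.index(0) < r.index(1))
print(total, mode, p_12, theta[0] / (theta[0] + theta[1]))
```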
Known coarsening: example
π \ τ       a1≻a2  a2≻a1  a1≻a3  a3≻a1  a2≻a3  a3≻a2
a1 a2 a3      0      0      0      0      1      0
a1 a3 a2      0      0      0      0      0      1
a2 a1 a3      0      1      0      0      0      0
a2 a3 a1      0      1      0      0      0      0
a3 a1 a2      0      0      0      1      0      0
a3 a2 a1      0      1      0      0      0      0
$$L_F(\vec{\tau}; \theta) = \prod_{i=1}^{3} \prod_{j \neq i} \left( \frac{\theta_i}{\theta_i + \theta_j} \right)^{n_{ij}}.$$
θ = (0.99, 0.0.5, 0.05)
$\hat{\theta} = \arg\max L_F(\vec{\tau}; \theta) \longrightarrow (0, 0.5, 0.5)$.
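
The bias in this example can be reproduced with a short calculation. The sketch below reads $n_{ij}$ as the number of observed comparisons $a_i \succ a_j$ (an interpretation, since the slide does not define it) and uses an ASSUMED true parameter (0.99, 0.005, 0.005) with a dominant first item; this stand-in value is not the one printed on the slide. It computes the long-run frequency of each reported pair and the resulting limit of the $L_F$ maximizer.

```python
def pl_prob(order, theta):
    """Plackett-Luce probability of a ranking given as items, best first."""
    prob, remaining = 1.0, sum(theta[i] for i in order)
    for item in order:
        prob *= theta[item] / remaining
        remaining -= theta[item]
    return prob

# Deterministic coarsening read off the table (items are 0-based):
# complete ranking (best first) -> reported pair (winner, loser).
reported = {
    (0, 1, 2): (1, 2),
    (0, 2, 1): (2, 1),
    (1, 0, 2): (1, 0),
    (1, 2, 0): (1, 0),
    (2, 0, 1): (2, 0),
    (2, 1, 0): (1, 0),
}

theta = (0.99, 0.005, 0.005)  # ASSUMED true parameter, for illustration only

# Long-run relative frequency of each observed comparison.
freq = {}
for ranking, pair in reported.items():
    freq[pair] = freq.get(pair, 0.0) + pl_prob(ranking, theta)

# a_1 is never reported as a winner, so every factor of L_F involving theta_1
# has the form theta_j / (theta_j + theta_1), which is largest as theta_1 -> 0.
wins_a1 = sum(p for (winner, _), p in freq.items() if winner == 0)

# The remaining factors only compare a_2 with a_3 (a binomial likelihood),
# so the normalized maximizer tends to:
n23, n32 = freq[(1, 2)], freq[(2, 1)]
theta_hat_limit = (0.0, n23 / (n23 + n32), n32 / (n23 + n32))
print(wins_a1, theta_hat_limit)
```

Under this assumed parameter the most popular item receives an estimate of 0, because the coarsening only ever reports comparisons it loses or is absent from.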
Unknown coarsening
π \ τ           a1≻a2          a2≻a1          a1≻a3          a3≻a1          a2≻a3          a3≻a2
π1 = a1 a2 a3   λ^{π1}_{1,2}   0              λ^{π1}_{1,3}   0              λ^{π1}_{2,3}   0
π2 = a1 a3 a2   λ^{π2}_{1,3}   0              λ^{π2}_{1,2}   0              0              λ^{π2}_{2,3}
π3 = a2 a1 a3   0              λ^{π3}_{1,2}   λ^{π3}_{2,3}   0              λ^{π3}_{1,3}   0
π4 = a2 a3 a1   0              λ^{π4}_{1,3}   0              λ^{π4}_{2,3}   λ^{π4}_{1,2}   0
π5 = a3 a1 a2   λ^{π5}_{2,3}   0              0              λ^{π5}_{1,2}   0              λ^{π5}_{1,3}
π6 = a3 a2 a1   0              λ^{π6}_{2,3}   0              λ^{π6}_{1,3}   0              λ^{π6}_{1,2}
Number of unknowns: $\binom{K}{2} K!$.
Rank-dependent coarsening assumption
π \ τ       a1≻a2     a2≻a1     a1≻a3     a3≻a1     a2≻a3     a3≻a2
a1 a2 a3    λ_{1,2}   0         λ_{1,3}   0         λ_{2,3}   0
a1 a3 a2    λ_{1,3}   0         λ_{1,2}   0         0         λ_{2,3}
a2 a1 a3    0         λ_{1,2}   λ_{2,3}   0         λ_{1,3}   0
a2 a3 a1    0         λ_{1,3}   0         λ_{2,3}   λ_{1,2}   0
a3 a1 a2    λ_{2,3}   0         0         λ_{1,2}   0         λ_{1,3}
a3 a2 a1    0         λ_{2,3}   0         λ_{1,3}   0         λ_{1,2}
Number of unknowns: $\frac{K(K-1)}{2} - 1$.
The FLM is able to estimate the mode of the PL model (a.s. and provided N is sufficiently large).
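
Under the rank-dependent assumption, one row of the coarsening table is determined by the ranking and the rank-indexed probabilities alone. The sketch below builds such a row; the function name `coarsening_row` and the numeric values of λ are assumptions chosen for illustration.

```python
from itertools import combinations, permutations

def coarsening_row(order, lam):
    """Probability of each observable comparison given a complete ranking.

    order: items, best first.
    lam:   lam[(r, s)] for ranks r < s (1-based) is the probability of reporting
           the pair of items sitting at ranks r and s; the values sum to 1.
    Returns {(winner, loser): probability} for the pairs consistent with `order`;
    every other comparison has probability 0.
    """
    ranks = range(1, len(order) + 1)
    return {(order[r - 1], order[s - 1]): lam[(r, s)] for r, s in combinations(ranks, 2)}

lam = {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.2}  # illustrative values

for order in permutations((1, 2, 3)):
    print(order, coarsening_row(order, lam))
```

For the ranking a1 a3 a2, for example, the comparison a1 ≻ a2 involves ranks 1 and 3 and therefore receives λ_{1,3}, which matches the second row of the table.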
Three experimental settings
❖ The MLM (solid), the FLM (dashed) and the TLM (dotted) are compared.
❖ Three different coarsening processes are considered.
❖ The same PL parameter is taken in the three cases.
❖ Left column: Euclidean distance true par. - par. estimate.
❖ Right column: Kendall distance true mode - predicted ranking.
[Figure: 3 × 2 grid of plots, one row per coarsening process; horizontal axis from 0 to 2000.]
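
The two error measures named above can be sketched as follows. Whether the plotted Kendall distance is normalized is not stated on the slide, so this example returns the raw count of discordant pairs; the parameter vectors and function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean_distance(theta_true, theta_hat):
    """Euclidean distance between the true parameter and its estimate."""
    return math.sqrt(sum((t - h) ** 2 for t, h in zip(theta_true, theta_hat)))

def kendall_distance(order_a, order_b):
    """Number of item pairs that the two rankings order differently."""
    pos_a = {item: r for r, item in enumerate(order_a)}
    pos_b = {item: r for r, item in enumerate(order_b)}
    return sum(
        (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
        for x, y in combinations(order_a, 2)
    )

def pl_mode(theta):
    """Mode of a PL model: items sorted by decreasing parameter."""
    return tuple(sorted(range(len(theta)), key=lambda i: -theta[i]))

theta_true = [0.4, 0.3, 0.2, 0.1]    # hypothetical true parameter
theta_hat = [0.35, 0.2, 0.3, 0.15]   # hypothetical estimate

err_param = euclidean_distance(theta_true, theta_hat)
err_rank = kendall_distance(pl_mode(theta_true), pl_mode(theta_hat))
print(err_param, err_rank)
```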
Case 1: uniform selection of pairwise comparison
[Figure: two plots; horizontal axis from 0 to 2000.]
… set $\lambda_{1,2} = \dots = \lambda_{3,4} = 1/6$. … selected uniformly at random. In …
Case 2: top 2 case
[Figure: two plots; horizontal axis from 0 to 2000.]
… experiment, $\lambda_{1,2} = 1$ corresponds to the top-2 …
Case 3: rank proportional selection
[Figure: two plots; horizontal axis from 0 to 2000.]
… ranks: $\lambda_{i,j} \propto (8 - i - j)$. … with a higher probability than …
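
The three coarsening settings can be written out explicitly. The sketch below assumes K = 4, which is inferred from $\lambda_{3,4}$ being the last index and from the value 1/6 (six rank pairs) and is not stated outright on the slides; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

K = 4  # assumed: inferred from lambda_{3,4} and the uniform value 1/6
pairs = list(combinations(range(1, K + 1), 2))  # rank pairs (i, j) with i < j

# Case 1: every rank pair is equally likely to be reported.
uniform = {p: Fraction(1, len(pairs)) for p in pairs}

# Case 2: only the two top-ranked items are ever compared.
top2 = {p: Fraction(int(p == (1, 2))) for p in pairs}

# Case 3: lambda_{i,j} proportional to 8 - i - j, then normalized to sum to 1.
weights = {(i, j): 8 - i - j for i, j in pairs}
total = sum(weights.values())
rank_prop = {p: Fraction(w, total) for p, w in weights.items()}

for p in pairs:
    print(p, uniform[p], top2[p], rank_prop[p])
```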
Conclusion
❖ MLM is theoretically the best one, but it involves too many parameters.
❖ FLM is simpler, but it ignores the coarsening process. It may lead to biased estimates.
❖ Biased estimates of the parameter do not imply inaccurate predictions of the most popular ranking.
❖ Future directions: search for computationally acceptable methods with good performance.