Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
Cong Ma, ORFE, Princeton University
Joint work with Yuxin Chen, Jianqing Fan and Kaizheng Wang
Ranking
A fundamental problem in a wide range of contexts:
• web search, recommendation systems, admissions, sports competitions, voting, ...
(PageRank figure credit: Dzenan Hamzic)
Top-K ranking 2/20
Rank aggregation from pairwise comparisons
(figure: pairwise comparisons for ranking top tennis players; credit: Bozóki, Csató, Temesi)
Parametric models
Assign a latent preference score to each of n items: w* = [w*_1, ..., w*_n]^T
• w*_i : preference score of item i;  k_i : rank of item i
• This work: Bradley–Terry–Luce (BTL) model: for w* ∈ R^n_+,
    P{item j beats item i} = w*_j / (w*_i + w*_j)
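As a quick sanity check on the BTL win probability, a minimal simulation (a sketch in Python; the scores, seed, and trial count are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def btl_sample(w_i, w_j, rng):
    """One BTL comparison: returns 1 if item j beats item i."""
    return int(rng.random() < w_j / (w_i + w_j))

# With scores w_i = 1 and w_j = 3, item j should win about 75% of the time.
wins = sum(btl_sample(1.0, 3.0, rng) for _ in range(20000))
print(wins / 20000)
```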
Other parametric models
• Thurstone model: for w* ∈ R^n,
    P{item j beats item i} = Φ(w*_j − w*_i),  where Φ is the Gaussian cdf
• General parametric models: for nondecreasing f : R → [0, 1] obeying
    f(t) = 1 − f(−t),  ∀ t ∈ R,
  set
    P{item j beats item i} = f(w*_j − w*_i)
Top-K ranking
Typical procedure: estimate latent scores → rank items based on score estimates
Goal: identify the set of top-K items from pairwise comparisons
Model: random sampling
• Comparison graph: Erdős–Rényi graph G ∼ G(n, p)
• For each (i, j) ∈ G, obtain L paired comparisons: independently for 1 ≤ l ≤ L,
    y^(l)_{i,j} = 1 with probability w*_j / (w*_i + w*_j), and 0 otherwise
• Sufficient statistic: ȳ_{i,j} = (1/L) Σ_{l=1}^L y^(l)_{i,j}
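The sampling model can be sketched as follows (helper and variable names are mine; NaN marks pairs that were never compared):

```python
import numpy as np

def sample_comparisons(w, p, L, rng):
    """Draw an Erdos-Renyi comparison graph G(n, p); for each edge (i, j),
    record the sufficient statistic y_bar[i, j]: the fraction of the L
    comparisons in which item j beat item i."""
    n = len(w)
    y_bar = np.full((n, n), np.nan)            # NaN: pair not compared
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:               # edge appears with prob. p
                p_j_wins = w[j] / (w[i] + w[j])
                y = rng.binomial(L, p_j_wins) / L
                y_bar[i, j] = y
                y_bar[j, i] = 1 - y            # fraction of wins for i over j
    return y_bar

rng = np.random.default_rng(1)
y_bar = sample_comparisons(np.array([1.0, 2.0, 3.0, 4.0]), p=1.0, L=50, rng=rng)
print(y_bar)
```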
Spectral method (Rank Centrality) — Negahban, Oh, Shah '12
• Construct a probability transition matrix P = [P_{i,j}]_{1≤i,j≤n}:
    P_{i,j} = (1/d) ȳ_{i,j}                        if (i, j) ∈ E,
    P_{i,j} = 1 − (1/d) Σ_{k:(i,k)∈E} ȳ_{i,k}      if i = j,
    P_{i,j} = 0                                    otherwise
• Return the score estimate as the leading left eigenvector of P
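A compact sketch of this construction (the choice d = 2 × max degree is an assumption on my part; the slides only need d large enough that each row of P sums to one). The leading left eigenvector is found by power iteration:

```python
import numpy as np

def rank_centrality(y_bar, d=None):
    """Build the transition matrix P from averaged comparisons y_bar
    (NaN where (i, j) was not compared) and return its stationary
    distribution via left power iteration."""
    n = y_bar.shape[0]
    observed = ~np.isnan(y_bar)
    np.fill_diagonal(observed, False)
    if d is None:
        d = 2 * observed.sum(axis=1).max()    # d >= max degree keeps rows valid
    P = np.zeros((n, n))
    P[observed] = y_bar[observed] / d
    np.fill_diagonal(P, 1 - P.sum(axis=1))
    pi = np.ones(n) / n
    for _ in range(5000):                     # pi <- pi P until convergence
        pi = pi @ P
        pi /= pi.sum()
    return pi

# Large-sample regime: feed the exact win probabilities (L -> infinity).
w_star = np.array([1.0, 2.0, 4.0, 8.0])
n = len(w_star)
y_bar = np.full((n, n), np.nan)
for i in range(n):
    for j in range(n):
        if i != j:
            y_bar[i, j] = w_star[j] / (w_star[i] + w_star[j])
pi = rank_centrality(y_bar)
print(np.argsort(-pi))   # items ranked by estimated score, best first
```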
Rationale behind spectral method
In the large-sample limit L → ∞, P → P* = [P*_{i,j}]_{1≤i,j≤n}:
    P*_{i,j} = (1/d) · w*_j / (w*_i + w*_j)                      if (i, j) ∈ E,
    P*_{i,j} = 1 − (1/d) Σ_{k:(i,k)∈E} w*_k / (w*_i + w*_k)      if i = j,
    P*_{i,j} = 0                                                 otherwise
• Stationary distribution of P*:
    π* := [w*_1, w*_2, ..., w*_n]^T / Σ_{i=1}^n w*_i
• Check detailed balance!
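The detailed-balance check is one line of algebra: for any edge (i, j) ∈ E,

```latex
\pi^*_i P^*_{i,j}
= \frac{w^*_i}{\sum_k w^*_k}\cdot\frac{1}{d}\cdot\frac{w^*_j}{w^*_i + w^*_j}
= \frac{w^*_j}{\sum_k w^*_k}\cdot\frac{1}{d}\cdot\frac{w^*_i}{w^*_i + w^*_j}
= \pi^*_j P^*_{j,i},
```

so the chain P* is reversible and π* is indeed its stationary distribution.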
Regularized MLE
Negative log-likelihood:
    L(w) := − Σ_{(i,j)∈G} { y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) }
Reparametrizing with θ_i = log w_i:
    L(θ) := Σ_{(i,j)∈G} { −y_{j,i} (θ_i − θ_j) + log( 1 + e^{θ_i − θ_j} ) }
Regularized MLE:
    minimize_θ  L_λ(θ) := L(θ) + (λ/2) ‖θ‖_2^2,   choosing λ ≍ sqrt( (n p log n) / L )
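A minimal gradient-descent sketch of this objective (the step size, iteration count, and the default λ are illustrative choices, not the theoretical λ ≍ sqrt(np log n / L)):

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def regularized_mle(y_bar, lam=0.1, step=0.2, iters=5000):
    """Gradient descent on L_lam(theta) = sum over compared (i, j) of
    [-y_ji (theta_i - theta_j) + log(1 + exp(theta_i - theta_j))]
    + (lam / 2) ||theta||_2^2, where theta_i = log w_i and
    y_bar[j, i] is the empirical frequency that i beats j."""
    n = y_bar.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if not np.isnan(y_bar[i, j])]
    theta = np.zeros(n)
    for _ in range(iters):
        grad = lam * theta
        for i, j in edges:
            # residual: model win prob. minus observed win frequency
            r = sigmoid(theta[i] - theta[j]) - y_bar[j, i]
            grad[i] += r
            grad[j] -= r
        theta -= step * grad
    return theta

# Large-sample regime: exact win probabilities recover the true ordering.
w_star = np.array([1.0, 2.0, 4.0, 8.0])
n = len(w_star)
y_bar = np.full((n, n), np.nan)
for i in range(n):
    for j in range(n):
        if i != j:
            y_bar[i, j] = w_star[j] / (w_star[i] + w_star[j])
theta = regularized_mle(y_bar)
print(np.argsort(-theta))
```

The ℓ2 penalty shrinks θ toward zero but, being symmetric, preserves the ordering of the scores.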
Prior art
(mean square error for estimating scores — a "meta metric" — vs. top-K ranking accuracy)
• Spectral method (Negahban et al. '12): ✔ MSE
• MLE (Hajek et al. '14): ✔ MSE
• Spectral MLE (Chen & Suh '15): ✔ MSE, ✔ top-K ranking accuracy
Small ℓ2 loss ≠ high ranking accuracy
Two estimates can have the same ℓ2 loss yet output different rankings.
Need to control the entrywise error!
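A concrete instance of this gap (illustrative numbers, not from the slides): two estimates at the same ℓ2 distance from w*, one of which swaps the top item.

```python
import numpy as np

w_star = np.array([1.0, 1.1, 2.0])      # item 2 is the true top-1

w_a = np.array([1.0, 1.1, 1.05])        # all the error on one entry
w_b = w_star + 0.95 / np.sqrt(3)        # same total error, spread evenly

print(np.linalg.norm(w_a - w_star))     # identical l2 loss...
print(np.linalg.norm(w_b - w_star))
print(np.argmax(w_a), np.argmax(w_b))   # ...but different top-1 items
```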
Optimality?
Is the spectral method or the MLE alone optimal for top-K ranking?
Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.
This work: affirmative answer for both methods, over the entire regime (including sparse graphs).
Main result
Comparison graph G(n, p); sample size ≍ p n² L
Theorem 1 (Chen, Fan, Ma, Wang '17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!
Main result (phase transition)
(figure: in the (sample size, score separation) plane, the region achievable by both methods meets the infeasible region at the optimal threshold)
• Δ_K := ( w*_(K) − w*_(K+1) ) / ‖w*‖_∞ : score separation
Empirical top-K ranking accuracy
(figure: top-K ranking accuracy of the spectral method and the regularized MLE vs. the score separation Δ_K; n = 200, p = 0.25, L = 20)
Optimal control of entrywise error
(figure: true scores w*_1, ..., w*_K, w*_{K+1}, ... with gap Δ_K between the K-th and (K+1)-th largest; each estimate w_i lies within (1/2)Δ_K of w*_i)
Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/Δ_K². Then with high probability, the estimates w returned by both methods obey (up to global scaling)
    ‖w − w*‖_∞ < (1/2) Δ_K ‖w*‖_∞
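Why the entrywise bound suffices: for any item i among the true top K and any item j outside it, w*_i − w*_j ≥ w*_(K) − w*_(K+1) = Δ_K ‖w*‖_∞, so

```latex
w_i - w_j
> \bigl(w^*_i - \tfrac{1}{2}\Delta_K \|w^*\|_\infty\bigr)
  - \bigl(w^*_j + \tfrac{1}{2}\Delta_K \|w^*\|_\infty\bigr)
\ge \Delta_K \|w^*\|_\infty - \Delta_K \|w^*\|_\infty = 0,
```

hence every true top-K item receives a strictly larger estimated score than every other item, and ranking by the estimates recovers the top-K set exactly.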
Key ingredient: leave-one-out analysis
For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m), built from the comparison data y = [y_{i,j}]_{1≤i,j≤n} with the entries involving the m-th item left out.
Leave-one-out stability
leave-one-out estimate w^(m) ≈ true estimate w
• Spectral method: eigenvector perturbation bound
    ‖π − π̂‖_{π*} ≲ ‖π*(P − P̂)‖_{π*} / spectral gap
  ◦ new Davis–Kahan-type bound for probability transition matrices (asymmetric)
• MLE: local strong convexity
    ‖θ − θ̂‖_2 ≲ ‖∇L_λ(θ; ŷ)‖_2 / strong-convexity parameter
Summary
• Spectral method: ✔ linear-time computational complexity, ✔ optimal sample complexity
• Regularized MLE: ✔ linear-time computational complexity, ✔ optimal sample complexity
Novel entrywise perturbation analysis for the spectral method and convex optimization.
Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, arXiv:1707.09971, 2017