

  1. Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking. Yuxin Chen, Electrical Engineering, Princeton University. Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang

  2. Ranking. A fundamental problem in a wide range of contexts: web search, recommendation systems, admissions, sports competitions, voting, ... PageRank figure credit: Dzenan Hamzic. Top-K ranking 2/21

  3. Rank aggregation from pairwise comparisons. [Figure: pairwise comparisons for ranking top tennis players; figure credit: Bozóki, Csató, Temesi]

  4–5. Parametric models. Assign a latent preference score to each of n items: w* = [w*_1, ..., w*_n], where w*_i is the preference score of item i; items are ranked by their scores.
• This work: Bradley-Terry-Luce (logistic) model, P{ item j beats item i } = w*_j / (w*_i + w*_j)
• Other models: Thurstone model, low-rank model, ...
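The BTL win probability above is easy to state in code. A minimal sketch (the scores `w_star` and helper names are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent scores for n = 5 items (not data from the talk).
w_star = np.array([2.0, 1.5, 1.2, 1.0, 0.5])

def btl_win_prob(w, i, j):
    """P{ item j beats item i } under the Bradley-Terry-Luce model."""
    return w[j] / (w[i] + w[j])

def sample_comparison(w, i, j, rng):
    """Draw one pairwise comparison: 1 if j beats i, else 0."""
    return int(rng.random() < btl_win_prob(w, i, j))

# A weaker opponent wins less often: P{ item 1 beats item 0 } < 1/2.
p01 = btl_win_prob(w_star, 0, 1)   # 1.5 / (2.0 + 1.5)
```

Note that only score ratios matter: rescaling all of w* by a constant leaves every win probability unchanged, which is why estimates are recovered only up to global scaling.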

  6–7. Typical ranking procedures. Estimate latent scores → rank items based on score estimates. Goal: identify the set of top-K items under minimal sample size.

  8–9. Model: random sampling
• Comparison graph: Erdős–Rényi graph G ~ G(n, p). [Figure: a sampled comparison graph on 12 items]
• For each (i, j) ∈ G, obtain L paired comparisons: y^(l)_{i,j} = 1 with probability w*_j / (w*_i + w*_j) and 0 otherwise, independently for 1 ≤ l ≤ L
• Sufficient statistic: the empirical win frequency y_{i,j} = (1/L) Σ_{l=1}^{L} y^(l)_{i,j}
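The sampling model above can be simulated in a few lines. A sketch under the deck's assumptions (function and variable names are mine, not from the talk):

```python
import numpy as np

def simulate_comparisons(w_star, p, L, rng):
    """Erdős–Rényi comparison graph G(n, p); for each sampled edge (i, j),
    draw L independent BTL comparisons and keep only the empirical win
    frequency y[i, j] (the sufficient statistic). NaN marks unobserved pairs."""
    n = len(w_star)
    y = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:                 # edge (i, j) is sampled
                prob_j_beats_i = w_star[j] / (w_star[i] + w_star[j])
                wins = rng.binomial(L, prob_j_beats_i)
                y[i, j] = wins / L               # fraction of times j beats i
                y[j, i] = 1.0 - y[i, j]
    return y

rng = np.random.default_rng(1)
w_star = np.linspace(1.0, 2.0, 10)
y = simulate_comparisons(w_star, p=0.5, L=20, rng=rng)
```

The expected sample size is about p·n(n−1)/2·L comparisons, matching the "sample size ≍ pn²L" count used later in the deck.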

  10–11. Prior art (mean square error for estimating scores is only a "meta metric"; the end goal is top-K ranking accuracy)
• Spectral method (Negahban et al. '12): ✔ mean square error for estimating scores
• MLE (Hajek et al. '14): ✔ mean square error for estimating scores
• Spectral MLE (Chen & Suh '15): ✔ mean square error, ✔ top-K ranking accuracy

  12–15. Small ℓ2 loss ≠ high ranking accuracy. Two estimates can have the same ℓ2 loss yet output different rankings. Need to control the entrywise error!
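The point above is easy to demonstrate concretely. A minimal sketch (the score vectors are hypothetical, chosen only to exhibit the phenomenon):

```python
import numpy as np

w_star = np.array([1.3, 1.2, 1.0, 0.9])            # true scores; top-2 = {0, 1}

# Both perturbations have ℓ2 norm 0.25, so both estimates incur the
# same ℓ2 loss -- but only the second one corrupts the top-2 set.
w_a = w_star + np.array([0.25, 0.0, 0.0, 0.0])     # top-2 still {0, 1}
w_b = w_star + np.array([0.0, -0.25, 0.0, 0.0])    # top-2 becomes {0, 2}

def top_k(w, K):
    """Indices of the K largest entries."""
    return set(np.argsort(-w)[:K])
```

The ℓ2 metric cannot distinguish a harmless perturbation of a safely ranked item from one that flips the items straddling the K-th position, which is exactly why entrywise (ℓ∞) control is needed.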

  16–18. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking?
• Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.
• This work: affirmative answer for both methods, over the entire regime (including sparse graphs).

  19. Spectral method (Rank Centrality), Negahban, Oh, Shah '12
• Construct a probability transition matrix P whose off-diagonal entries obey P_{i,j} ∝ y_{i,j} if (i, j) ∈ G, and P_{i,j} = 0 if (i, j) ∉ G
• Return the score estimate as the leading left eigenvector of P
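The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the normalization constant `d` (any value at least the maximum degree, so rows remain substochastic) and plain power iteration for the leading left eigenvector are my choices.

```python
import numpy as np

def rank_centrality(y, d):
    """Spectral score estimate (Rank Centrality sketch): build the transition
    matrix with off-diagonal P[i, j] = y[i, j] / d on observed pairs, set the
    diagonal so each row sums to one, then return the stationary distribution
    (leading left eigenvector) via power iteration.
    y[i, j] = fraction of comparisons in which j beats i (NaN if unobserved)."""
    n = y.shape[0]
    P = np.zeros((n, n))
    observed = ~np.isnan(y)
    P[observed] = y[observed] / d
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # rows sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(5000):                       # power iteration on the left
        pi = pi @ P
    return pi / pi.sum()
```

With the exact (infinite-sample) win frequencies, the output is proportional to the true scores, which is the rationale the next slide makes precise.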

  20. Rationale behind the spectral method. In the large-sample limit, P → P*, whose off-diagonal entries obey P*_{i,j} ∝ w*_j / (w*_i + w*_j) if (i, j) ∈ G, and 0 otherwise.
• Stationary distribution of the reversible chain P* (check detailed balance): π* ∝ [w*_1, w*_2, ..., w*_n], the true scores.
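The detailed-balance check asserted above can be verified numerically. A small sketch on a complete comparison graph with hypothetical scores:

```python
import numpy as np

w = np.array([3.0, 2.0, 1.0])          # hypothetical true scores
n, d = len(w), 4.0                      # d: normalization exceeding max degree
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = w[j] / (w[i] + w[j]) / d   # P*_{i,j} ∝ w_j / (w_i + w_j)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # rows sum to one

pi = w / w.sum()                        # candidate stationary distribution

# Detailed balance: pi_i * P_ij == pi_j * P_ji for all i, j, since
# pi_i * P_ij = (w_i * w_j) / ((w_i + w_j) * d * sum(w)) is symmetric in (i, j).
balance = np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
```

Detailed balance immediately gives stationarity (π P = π), so the leading left eigenvector of P* recovers the true score vector up to scaling.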

  21–22. Regularized MLE. Negative log-likelihood:
L(w) := − Σ_{(i,j)∈G} { y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) }
• L becomes convex after the reparametrization θ = [θ_1, ..., θ_n], θ_i = log w_i
• Regularized MLE: minimize_θ L_λ(θ) := L(θ) + (λ/2) ‖θ‖²_2, choosing λ ≍ √(np log n / L)
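After the reparametrization θ_i = log w_i, each likelihood term becomes logistic: w_i/(w_i + w_j) = σ(θ_i − θ_j), and the gradient of L_λ in θ_i is Σ_j (σ(θ_i − θ_j) − y_{j,i}) + λθ_i over neighbors j. A minimal gradient-descent sketch (solver, step size, and the small `lam` value are my choices for illustration; the slide's theory prescribes λ ≍ √(np log n / L)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regularized_mle(y, lam, step=0.1, iters=5000):
    """Gradient descent on L_lam(theta) = L(theta) + (lam/2) * ||theta||_2^2,
    where theta_i = log w_i and L is the BTL negative log-likelihood.
    y[i, j] = empirical frequency that j beats i (NaN if unobserved)."""
    n = y.shape[0]
    theta = np.zeros(n)
    observed = ~np.isnan(y)
    for _ in range(iters):
        grad = lam * theta
        for i in range(n):
            for j in range(n):
                if i != j and observed[i, j]:
                    # y[j, i] is the frequency that i beats j
                    grad[i] += sigmoid(theta[i] - theta[j]) - y[j, i]
        theta = theta - step * grad
    return np.exp(theta)   # back to the score scale (up to global scaling)
```

The objective is convex in θ, so plain gradient descent suffices here; the regularizer pins down the otherwise unidentifiable global shift of θ.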

  23. Main result. Comparison graph G(n, p); sample size ≍ pn²L. [Figure: comparison graph on 12 items]
Theorem 1 (Chen, Fan, Ma, Wang '17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

  24. Main result (cont.). [Figure: phase transition in the (sample size, score separation) plane; above the threshold the top-K set is achievable by both methods, below it the problem is infeasible]
• ∆_K := (w*_{(K)} − w*_{(K+1)}) / ‖w*‖_∞ : score separation between the K-th and (K+1)-th ranked items

  25–26. Comparison with Jang et al. '16. Jang et al. '16: the spectral method controls the entrywise error if p ≳ √(log n / n) (a relatively dense regime). [Figure: sample size of Jang et al. '16 vs. our optimal sample size, as a function of the score separation ∆_K]

  27. Empirical top-K ranking accuracy. [Figure: top-K ranking accuracy of the spectral method and the regularized MLE vs. score separation ∆_K, with n = 200, p = 0.25, L = 20]

  28. Optimal control of entrywise error. [Figure: true scores w*_1 ≥ w*_2 ≥ ... ≥ w*_K ≥ w*_{K+1} ≥ ... with separation ∆_K between the K-th and (K+1)-th scores; each score estimate w_i lies within (1/2)∆_K of the truth]
Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling) ‖w − w*‖_∞ < (1/2) ∆_K ‖w*‖_∞
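Why this ℓ∞ bound suffices: if every entry moves by less than (1/2)∆_K‖w*‖_∞, each true top-K score stays above each non-top-K score, so the top-K set is recovered exactly. A numerical sketch with hypothetical scores:

```python
import numpy as np

def top_k(w, K):
    """Indices of the K largest entries."""
    return set(np.argsort(-w)[:K])

rng = np.random.default_rng(2)
w_star = np.array([1.0, 0.9, 0.8, 0.5, 0.4, 0.3])   # hypothetical scores
K = 3
# Score separation: (w*_(K) - w*_(K+1)) / ||w*||_inf = (0.8 - 0.5) / 1.0
delta_K = (w_star[K - 1] - w_star[K]) / np.max(np.abs(w_star))

# Any estimate within (1/2) * delta_K * ||w*||_inf of w* entrywise
# must place the true top-K items above all the others.
noise = rng.uniform(-1, 1, size=w_star.shape)
w_hat = w_star + 0.999 * (delta_K / 2) * np.max(np.abs(w_star)) * noise
```

Every true top-K score is at least w*_{(K)} − ∆_K/2·‖w*‖_∞, and every other score is at most w*_{(K+1)} + ∆_K/2·‖w*‖_∞; the first quantity exceeds the second by construction of ∆_K.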

  29–30. Key ingredient: leave-one-out analysis. For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m). [Figure: data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column singled out]
• Exploit statistical independence
• Establish leave-one-out stability

  31. Exploit statistical independence. The leave-one-out estimate w^(m) is independent of all data related to the m-th item. [Figure: data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column removed]
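One way to realize this independence in code is to strip out everything the m-th item touched before computing w^(m). This is only a simplified sketch: here the m-th row and column are dropped outright, whereas the paper's leave-one-out construction works with the data on item m replaced rather than deleted.

```python
import numpy as np

def leave_one_out_data(y, m):
    """Return a copy of the comparison data y with every comparison
    involving item m marked unobserved (NaN). Any estimate computed from
    the result is, by construction, independent of item m's data."""
    y_m = y.copy()
    y_m[m, :] = np.nan
    y_m[:, m] = np.nan
    return y_m
```

Because w^(m) never sees item m's comparisons, those comparisons remain fresh randomness when analyzing the m-th entry of the error, which is what makes the entrywise bound tractable.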

  32–34. Leave-one-out stability: leave-one-out estimate w^(m) ≈ true estimate w
• Spectral method: eigenvector perturbation bound ‖π − π̂‖_{π*} ≲ ‖π̂(P − P̂)‖_{π*} / spectral gap, via a new Davis-Kahan-type bound for (asymmetric) probability transition matrices
• MLE: local strong convexity, ‖θ − θ̂‖_2 ≲ ‖∇L_λ(θ̂; y)‖_2 / (strong convexity parameter)

  35. A small sample of related works
• Parametric models: Ford '57; Hunter '04; Negahban, Oh, Shah '12; Rajkumar, Agarwal '14; Hajek, Oh, Xu '14; Chen, Suh '15; Rajkumar, Agarwal '16; Jang, Kim, Suh, Oh '16; Suh, Tan, Zhao '17
• Non-parametric models: Shah, Wainwright '15; Shah, Balakrishnan, Guntuboyina, Wainwright '16; Chen, Gopi, Mao, Schneider '17
• Leave-one-out analysis: El Karoui, Bean, Bickel, Lim, Yu '13; Zhong, Boumal '17; Abbe, Fan, Wang, Zhong '17; Ma, Wang, Chi, Chen '17; Chen, Chi, Fan, Ma '18; Chen, Chi, Fan, Ma, Yan '19

  36. Summary
• Spectral method: ✔ linear-time computational complexity, ✔ optimal sample complexity
• Regularized MLE: ✔ linear-time computational complexity, ✔ optimal sample complexity
• Novel entrywise perturbation analysis for the spectral method and for convex optimization
Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019
