

  1. Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking. Yuxin Chen, Electrical Engineering, Princeton University. Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang

  2. Ranking. A fundamental problem in a wide range of contexts: web search, recommendation systems, admissions, sports competitions, voting, ... PageRank figure credit: Dzenan Hamzic. Top-K ranking 2/21

  3. Rank aggregation from pairwise comparisons. [Figure: pairwise comparisons for ranking top tennis players; figure credit: Bozóki, Csató, Temesi]

  4–5. Parametric models. Assign a latent preference score to each of n items: w* = [w*_1, ..., w*_n], where w*_i is the preference score of item i; items are ranked by their scores.
• This work: Bradley-Terry-Luce (logistic) model, P{ item j beats item i } = w*_j / (w*_i + w*_j)
• Other models: Thurstone model, low-rank model, ...
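The BTL win probability above is easy to state in code. A minimal sketch (the scores `w_star` and helper names are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent scores for n = 5 items (not data from the talk).
w_star = np.array([2.0, 1.5, 1.2, 1.0, 0.5])

def btl_win_prob(w, i, j):
    """P{ item j beats item i } under the Bradley-Terry-Luce model."""
    return w[j] / (w[i] + w[j])

def sample_comparison(w, i, j, rng):
    """Draw one pairwise comparison: 1 if j beats i, else 0."""
    return int(rng.random() < btl_win_prob(w, i, j))

# A weaker opponent wins less often: P{ item 1 beats item 0 } < 1/2.
p01 = btl_win_prob(w_star, 0, 1)   # 1.5 / (2.0 + 1.5)
```

Note that only score ratios matter: rescaling all of w* by a constant leaves every win probability unchanged, which is why estimates are recovered only up to global scaling.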

  6–7. Typical ranking procedures. Estimate latent scores → rank items based on score estimates. Goal: identify the set of top-K items under minimal sample size.

  8–9. Model: random sampling
• Comparison graph: Erdős–Rényi graph G ~ G(n, p). [Figure: a sampled comparison graph on 12 items]
• For each (i, j) ∈ G, obtain L paired comparisons: y^(l)_{i,j} = 1 with probability w*_j / (w*_i + w*_j) and 0 otherwise, independently for 1 ≤ l ≤ L
• Sufficient statistic: the empirical win frequency y_{i,j} = (1/L) Σ_{l=1}^{L} y^(l)_{i,j}
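The sampling model above can be simulated in a few lines. A sketch under the deck's assumptions (function and variable names are mine, not from the talk):

```python
import numpy as np

def simulate_comparisons(w_star, p, L, rng):
    """Erdős–Rényi comparison graph G(n, p); for each sampled edge (i, j),
    draw L independent BTL comparisons and keep only the empirical win
    frequency y[i, j] (the sufficient statistic). NaN marks unobserved pairs."""
    n = len(w_star)
    y = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:                 # edge (i, j) is sampled
                prob_j_beats_i = w_star[j] / (w_star[i] + w_star[j])
                wins = rng.binomial(L, prob_j_beats_i)
                y[i, j] = wins / L               # fraction of times j beats i
                y[j, i] = 1.0 - y[i, j]
    return y

rng = np.random.default_rng(1)
w_star = np.linspace(1.0, 2.0, 10)
y = simulate_comparisons(w_star, p=0.5, L=20, rng=rng)
```

The expected sample size is about p·n(n−1)/2·L comparisons, matching the "sample size ≍ pn²L" count used later in the deck.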

  10–11. Prior art (mean square error for estimating scores is only a "meta metric"; the end goal is top-K ranking accuracy)
• Spectral method (Negahban et al. '12): ✔ mean square error for estimating scores
• MLE (Hajek et al. '14): ✔ mean square error for estimating scores
• Spectral MLE (Chen & Suh '15): ✔ mean square error, ✔ top-K ranking accuracy

  12–15. Small ℓ2 loss ≠ high ranking accuracy. Two estimates can have the same ℓ2 loss yet output different rankings. Need to control the entrywise error!
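The point above is easy to demonstrate concretely. A minimal sketch (the score vectors are hypothetical, chosen only to exhibit the phenomenon):

```python
import numpy as np

w_star = np.array([1.3, 1.2, 1.0, 0.9])            # true scores; top-2 = {0, 1}

# Both perturbations have ℓ2 norm 0.25, so both estimates incur the
# same ℓ2 loss -- but only the second one corrupts the top-2 set.
w_a = w_star + np.array([0.25, 0.0, 0.0, 0.0])     # top-2 still {0, 1}
w_b = w_star + np.array([0.0, -0.25, 0.0, 0.0])    # top-2 becomes {0, 2}

def top_k(w, K):
    """Indices of the K largest entries."""
    return set(np.argsort(-w)[:K])
```

The ℓ2 metric cannot distinguish a harmless perturbation of a safely ranked item from one that flips the items straddling the K-th position, which is exactly why entrywise (ℓ∞) control is needed.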

  16–18. Optimality? Is the spectral method or the MLE alone optimal for top-K ranking?
• Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.
• This work: affirmative answer for both methods, over the entire regime (including sparse graphs).

  19. Spectral method (Rank Centrality), Negahban, Oh, Shah '12
• Construct a probability transition matrix P whose off-diagonal entries obey P_{i,j} ∝ y_{i,j} if (i, j) ∈ G, and P_{i,j} = 0 if (i, j) ∉ G
• Return the score estimate as the leading left eigenvector of P
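The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the normalization constant `d` (any value at least the maximum degree, so rows remain substochastic) and plain power iteration for the leading left eigenvector are my choices.

```python
import numpy as np

def rank_centrality(y, d):
    """Spectral score estimate (Rank Centrality sketch): build the transition
    matrix with off-diagonal P[i, j] = y[i, j] / d on observed pairs, set the
    diagonal so each row sums to one, then return the stationary distribution
    (leading left eigenvector) via power iteration.
    y[i, j] = fraction of comparisons in which j beats i (NaN if unobserved)."""
    n = y.shape[0]
    P = np.zeros((n, n))
    observed = ~np.isnan(y)
    P[observed] = y[observed] / d
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # rows sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(5000):                       # power iteration on the left
        pi = pi @ P
    return pi / pi.sum()
```

With the exact (infinite-sample) win frequencies, the output is proportional to the true scores, which is the rationale the next slide makes precise.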

  20. Rationale behind the spectral method. In the large-sample limit, P → P*, whose off-diagonal entries obey P*_{i,j} ∝ w*_j / (w*_i + w*_j) if (i, j) ∈ G, and 0 otherwise.
• Stationary distribution of the reversible chain P* (check detailed balance): π* ∝ [w*_1, w*_2, ..., w*_n], the true scores.
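The detailed-balance check asserted above can be verified numerically. A small sketch on a complete comparison graph with hypothetical scores:

```python
import numpy as np

w = np.array([3.0, 2.0, 1.0])          # hypothetical true scores
n, d = len(w), 4.0                      # d: normalization exceeding max degree
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = w[j] / (w[i] + w[j]) / d   # P*_{i,j} ∝ w_j / (w_i + w_j)
np.fill_diagonal(P, 1.0 - P.sum(axis=1))         # rows sum to one

pi = w / w.sum()                        # candidate stationary distribution

# Detailed balance: pi_i * P_ij == pi_j * P_ji for all i, j, since
# pi_i * P_ij = (w_i * w_j) / ((w_i + w_j) * d * sum(w)) is symmetric in (i, j).
balance = np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
```

Detailed balance immediately gives stationarity (π P = π), so the leading left eigenvector of P* recovers the true score vector up to scaling.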

  21–22. Regularized MLE. Negative log-likelihood:
L(w) := − Σ_{(i,j)∈G} { y_{j,i} log( w_i / (w_i + w_j) ) + (1 − y_{j,i}) log( w_j / (w_i + w_j) ) }
• L becomes convex after the reparametrization θ = [θ_1, ..., θ_n], θ_i = log w_i
• Regularized MLE: minimize_θ L_λ(θ) := L(θ) + (λ/2) ‖θ‖²_2, choosing λ ≍ √(np log n / L)
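After the reparametrization θ_i = log w_i, each likelihood term becomes logistic: w_i/(w_i + w_j) = σ(θ_i − θ_j), and the gradient of L_λ in θ_i is Σ_j (σ(θ_i − θ_j) − y_{j,i}) + λθ_i over neighbors j. A minimal gradient-descent sketch (solver, step size, and the small `lam` value are my choices for illustration; the slide's theory prescribes λ ≍ √(np log n / L)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regularized_mle(y, lam, step=0.1, iters=5000):
    """Gradient descent on L_lam(theta) = L(theta) + (lam/2) * ||theta||_2^2,
    where theta_i = log w_i and L is the BTL negative log-likelihood.
    y[i, j] = empirical frequency that j beats i (NaN if unobserved)."""
    n = y.shape[0]
    theta = np.zeros(n)
    observed = ~np.isnan(y)
    for _ in range(iters):
        grad = lam * theta
        for i in range(n):
            for j in range(n):
                if i != j and observed[i, j]:
                    # y[j, i] is the frequency that i beats j
                    grad[i] += sigmoid(theta[i] - theta[j]) - y[j, i]
        theta = theta - step * grad
    return np.exp(theta)   # back to the score scale (up to global scaling)
```

The objective is convex in θ, so plain gradient descent suffices here; the regularizer pins down the otherwise unidentifiable global shift of θ.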

  23. Main result. Comparison graph G(n, p); sample size ≍ pn²L. [Figure: comparison graph on 12 items]
Theorem 1 (Chen, Fan, Ma, Wang '17). When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve optimal sample complexity for top-K ranking!

  24. Main result (cont.). [Figure: phase transition in the (sample size, score separation) plane; above the threshold the top-K set is achievable by both methods, below it the problem is infeasible]
• ∆_K := (w*_{(K)} − w*_{(K+1)}) / ‖w*‖_∞ : score separation between the K-th and (K+1)-th ranked items

  25–26. Comparison with Jang et al. '16. Jang et al. '16: the spectral method controls the entrywise error if p ≳ √(log n / n) (a relatively dense regime). [Figure: sample size of Jang et al. '16 vs. our optimal sample size, as a function of the score separation ∆_K]

  27. Empirical top-K ranking accuracy. [Figure: top-K ranking accuracy of the spectral method and the regularized MLE vs. score separation ∆_K, with n = 200, p = 0.25, L = 20]

  28. Optimal control of entrywise error. [Figure: true scores w*_1 ≥ w*_2 ≥ ... ≥ w*_K ≥ w*_{K+1} ≥ ... with separation ∆_K between the K-th and (K+1)-th scores; each score estimate w_i lies within (1/2)∆_K of the truth]
Theorem 2. Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/∆²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling) ‖w − w*‖_∞ < (1/2) ∆_K ‖w*‖_∞
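Why this ℓ∞ bound suffices: if every entry moves by less than (1/2)∆_K‖w*‖_∞, each true top-K score stays above each non-top-K score, so the top-K set is recovered exactly. A numerical sketch with hypothetical scores:

```python
import numpy as np

def top_k(w, K):
    """Indices of the K largest entries."""
    return set(np.argsort(-w)[:K])

rng = np.random.default_rng(2)
w_star = np.array([1.0, 0.9, 0.8, 0.5, 0.4, 0.3])   # hypothetical scores
K = 3
# Score separation: (w*_(K) - w*_(K+1)) / ||w*||_inf = (0.8 - 0.5) / 1.0
delta_K = (w_star[K - 1] - w_star[K]) / np.max(np.abs(w_star))

# Any estimate within (1/2) * delta_K * ||w*||_inf of w* entrywise
# must place the true top-K items above all the others.
noise = rng.uniform(-1, 1, size=w_star.shape)
w_hat = w_star + 0.999 * (delta_K / 2) * np.max(np.abs(w_star)) * noise
```

Every true top-K score is at least w*_{(K)} − ∆_K/2·‖w*‖_∞, and every other score is at most w*_{(K+1)} + ∆_K/2·‖w*‖_∞; the first quantity exceeds the second by construction of ∆_K.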

  29–30. Key ingredient: leave-one-out analysis. For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w^(m). [Figure: data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column singled out]
• Exploit statistical independence
• Establish leave-one-out stability

  31. Exploit statistical independence. The leave-one-out estimate w^(m) is independent of all data related to the m-th item. [Figure: data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column removed]
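One way to realize this independence in code is to strip out everything the m-th item touched before computing w^(m). This is only a simplified sketch: here the m-th row and column are dropped outright, whereas the paper's leave-one-out construction works with the data on item m replaced rather than deleted.

```python
import numpy as np

def leave_one_out_data(y, m):
    """Return a copy of the comparison data y with every comparison
    involving item m marked unobserved (NaN). Any estimate computed from
    the result is, by construction, independent of item m's data."""
    y_m = y.copy()
    y_m[m, :] = np.nan
    y_m[:, m] = np.nan
    return y_m
```

Because w^(m) never sees item m's comparisons, those comparisons remain fresh randomness when analyzing the m-th entry of the error, which is what makes the entrywise bound tractable.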

  32–34. Leave-one-out stability: leave-one-out estimate w^(m) ≈ true estimate w
• Spectral method: eigenvector perturbation bound ‖π − π̂‖_{π*} ≲ ‖π̂(P − P̂)‖_{π*} / spectral gap, via a new Davis-Kahan-type bound for (asymmetric) probability transition matrices
• MLE: local strong convexity, ‖θ − θ̂‖_2 ≲ ‖∇L_λ(θ̂; y)‖_2 / (strong convexity parameter)

  35. A small sample of related works
• Parametric models: Ford '57; Hunter '04; Negahban, Oh, Shah '12; Rajkumar, Agarwal '14; Hajek, Oh, Xu '14; Chen, Suh '15; Rajkumar, Agarwal '16; Jang, Kim, Suh, Oh '16; Suh, Tan, Zhao '17
• Non-parametric models: Shah, Wainwright '15; Shah, Balakrishnan, Guntuboyina, Wainwright '16; Chen, Gopi, Mao, Schneider '17
• Leave-one-out analysis: El Karoui, Bean, Bickel, Lim, Yu '13; Zhong, Boumal '17; Abbe, Fan, Wang, Zhong '17; Ma, Wang, Chi, Chen '17; Chen, Chi, Fan, Ma '18; Chen, Chi, Fan, Ma, Yan '19

  36. Summary
• Spectral method: ✔ linear-time computational complexity, ✔ optimal sample complexity
• Regularized MLE: ✔ linear-time computational complexity, ✔ optimal sample complexity
• Novel entrywise perturbation analysis for the spectral method and for convex optimization
Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019
