Graph Resistance and Learning from Pairwise Comparisons
Alex Olshevsky, Department of ECE, Boston University
Joint work with Julien Hendrickx (UC Louvain) and Venkatesh Saligrama (BU)
Problem Statement

• Given a collection of items with unknown qualities w_1, …, w_n, we want to compute w = (w_1, …, w_n) up to scaling from pairwise comparisons of items.
• In many contexts, comparisons are the right way to model the available data:
  • A patient compares how painful or helpful two treatments have been.
  • A customer purchases one of several items recommended by an e-commerce site.
  • A user clicks on one of the items suggested by a search engine.
  • A user chooses one of several movies recommended by a streaming site.
The Simplest Possible Model: BTL over a Graph

• Items are compared according to the Bradley-Terry-Luce (BTL) model: the probability that item i wins against item j is w_i / (w_i + w_j).
• There are a number of models for item comparisons, and the BTL model is arguably the simplest.
• We assume that there is an underlying "comparison graph" G, and if (i, j) is an edge in this graph, items i and j are compared k times.
• We do not choose the comparison graph.
• Goal: understand how fast the error decays with k and G.
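As a concrete illustration of the setup, the model above can be simulated in a few lines of Python. This is a sketch, not from the slides: the weights, the path graph, and the function names are all made up for the example.

```python
import random

def btl_win_prob(w_i, w_j):
    """BTL model: probability that item i beats item j."""
    return w_i / (w_i + w_j)

def simulate_comparisons(w, edges, k, seed=0):
    """For each edge (i, j) of the comparison graph, draw k independent
    BTL comparisons and record how many times i beat j."""
    rng = random.Random(seed)
    wins = {}
    for (i, j) in edges:
        p = btl_win_prob(w[i], w[j])
        wins[(i, j)] = sum(rng.random() < p for _ in range(k))
    return wins

# Hypothetical example: 4 items on a path graph, k = 100 comparisons per edge.
w = [1.0, 2.0, 3.0, 4.0]
edges = [(0, 1), (1, 2), (2, 3)]
wins = simulate_comparisons(w, edges, k=100)
```

Note that we only get to observe `wins` (the edge labels), not `w` itself; the estimation problem is to invert this map up to scaling.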
Example

[Figure: a comparison graph on 4 items (nodes 1–4), with each edge labeled by the outcomes of the noisy comparisons along it.]
• Each edge label represents the outcomes of noisy comparisons.
• Need to compute (scaled versions of) w_1, w_2, w_3, w_4 from these measurements.
Previous Work – I

• The dominant approach has been to construct a Markov chain based on the data whose stationary distribution is an estimate of the true weights.
• First proposed by [Dwork, Kumar, Naor, Sivakumar, WWW 2001] and first analyzed by [Negahban, Oh, Shah, NeurIPS 2012]. Under the assumption max_{i,j} w_i / w_j ≤ b, the estimate Ŵ satisfies

    ‖ w/‖w‖_1 − Ŵ ‖_2 / ‖ w/‖w‖_1 ‖_2  ≤  O( (b^5 d_max)/(λ_2 d_min^2) · sqrt(log n / k) ),        (1)

  where λ_2 is the spectral gap of the comparison graph and d_max, d_min are its largest and smallest degrees.
• Worst-case scaling is O(n^7 / k).
• The scaling with the degrees was recently improved by [Agarwal, Patil, Agarwal, ICML 2018].
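To make the Markov-chain construction concrete, here is a minimal sketch of the rank-centrality idea: transitions i → j are proportional to the fraction of comparisons on edge (i, j) won by j, and the stationary distribution estimates w/‖w‖_1. The transition rule follows [Negahban, Oh, Shah]; the function name, the power-iteration solver, and the triangle example are my own choices for illustration.

```python
def rank_centrality(n, edges, wins, k, iters=2000):
    """Build a Markov chain whose transition i -> j is proportional to the
    fraction of comparisons on edge (i, j) won by j, then estimate the
    weights by its stationary distribution (via power iteration)."""
    deg = [0] * n
    for (i, j) in edges:
        deg[i] += 1
        deg[j] += 1
    d_max = max(deg)
    P = [[0.0] * n for _ in range(n)]
    for (i, j) in edges:
        frac_i = wins[(i, j)] / k          # fraction of comparisons i won
        P[i][j] = (1.0 - frac_i) / d_max   # move toward j when j won
        P[j][i] = frac_i / d_max           # move toward i when i won
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i])          # self-loop so each row sums to 1
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Sanity check on a triangle with exact (infinite-data) win fractions:
# the stationary distribution recovers w / ||w||_1 exactly, because the
# chain is reversible with respect to it.
w = [1.0, 2.0, 3.0]
edges = [(0, 1), (0, 2), (1, 2)]
wins = {(i, j): w[i] / (w[i] + w[j]) for (i, j) in edges}  # k = 1, exact
pi = rank_centrality(3, edges, wins, k=1)
```

The reversibility check is one line: π_i P_ij = (w_i/Σw) · (w_j/(w_i+w_j))/d_max is symmetric in i and j, which is what makes the stationary distribution proportional to the true weights in the noiseless limit.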