Graph Resistance and Learning from Pairwise Comparisons


  1. Alex Olshevsky Department of ECE, Boston University Joint work with Julien Hendrickx (UC Louvain) and Venkatesh Saligrama (BU) Graph Resistance and Learning from Pairwise Comparisons

  2. Problem Statement
  • Given a collection of items with unknown qualities w_1, ..., w_n, we want to compute w = (w_1, ..., w_n) up to scaling from pairwise comparisons of items.
  • In many contexts, comparisons are the right way to model the available data:
  • A patient compares how painful or helpful two treatments have been.
  • A customer purchases one of several items recommended by an e-commerce site.
  • A user clicks on one of the items suggested by a search engine.
  • A user chooses one of several movies recommended by a streaming site.


  8. The Simplest Possible Model: BTL over a Graph
  • Items are compared according to the Bradley-Terry-Luce (BTL) model: the probability that item i wins against item j is w_i / (w_i + w_j).
  • There are a number of models for item comparisons, and the BTL model is arguably the simplest.
  • We assume that there is an underlying "comparison graph" G, and if (i, j) is an edge in this graph, items i and j are compared k times.
  • We do not choose the comparison graph.
  • Goal: understand how fast the error decays with k and G.
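Generating data from the BTL model over a graph is direct in code. A minimal sketch, assuming the talk's setup (the function name and the dict-of-win-counts representation are mine, not from the talk):

```python
import random

def simulate_btl(w, edges, k, seed=0):
    """Simulate k BTL comparisons per edge of a comparison graph.

    w     : list of positive item qualities (unknown to the learner)
    edges : list of (i, j) pairs, the edges of the comparison graph G
    k     : number of comparisons per edge

    Returns wins[(i, j)] = number of times item i beat item j, where
    under BTL item i beats item j with probability w[i] / (w[i] + w[j]).
    """
    rng = random.Random(seed)
    wins = {}
    for (i, j) in edges:
        p = w[i] / (w[i] + w[j])
        wins[(i, j)] = sum(rng.random() < p for _ in range(k))
    return wins
```

For example, with w = [3.0, 1.0] item 0 wins each comparison against item 1 with probability 3/4, so wins[(0, 1)] concentrates around 3k/4 as k grows.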


  13. Example
  • [Figure: a comparison graph on items 1–4; each edge is labeled with the 0/1 outcomes of its comparisons.]
  • Each edge label represents the outcomes of noisy comparisons.
  • Need to compute (scaled versions of) w_1, w_2, w_3, w_4 from these measurements.

  14. Previous Work – I
  • The dominant approach has been to construct a Markov chain based on the data whose stationary distribution is an estimate of the true weights.
  • First proposed by [Dwork, Kumar, Naor, Sivakumar, WWW 2001] and first analyzed by [Negahban, Oh, Shah, NeurIPS 2012]. Under the assumption max_{i,j} w_i / w_j ≤ b, the estimate Ŵ satisfies
      || w/||w||_1 − Ŵ/||Ŵ||_1 ||_2^2  ≤  O( (1/k) · (b^5 log n / λ_2^2) · (d_max / d_min^2) ) · || w/||w||_1 ||_2^2.    (1)
  • Worst-case scaling is O(n^7 / k).
  • Scaling with the degrees was recently improved by [Agarwal, Patil, Agarwal, ICML 2018].
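The Markov-chain construction just described can be sketched in a few lines: move from item i to item j in proportion to how often j beat i, and read off the stationary distribution. This is a sketch in the spirit of Rank Centrality, not the authors' exact construction; the 1/d_max normalization and the power-iteration solver here are assumptions of the sketch.

```python
def rank_centrality(n, wins, k, iters=1000):
    """Estimate BTL weights (up to scaling) from pairwise win counts.

    wins[(i, j)] = number of times item i beat item j out of k
    comparisons, for each edge (i, j) of the comparison graph.
    """
    deg = [0] * n
    for (i, j) in wins:
        deg[i] += 1
        deg[j] += 1
    d_max = max(deg)

    # Off-diagonal transition probabilities, one pair per edge.
    P = [[0.0] * n for _ in range(n)]
    for (i, j), c in wins.items():
        P[i][j] = (k - c) / (k * d_max)  # step i -> j when j won
        P[j][i] = c / (k * d_max)        # step j -> i when i won
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i])        # self-loop absorbs the rest

    # Stationary distribution by power iteration.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [x / total for x in pi]
```

On two items where item 0 wins 75 of 100 comparisons, the chain's stationary distribution is (0.75, 0.25), matching the BTL ratio w_0/w_1 = 3 implied by the empirical win probability.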

