Ranking Median Regression: Learning to Order through Local Consensus

  1. Statistics/Learning at Paris-Saclay @IHES, January 19, 2018
     Ranking Median Regression: Learning to Order through Local Consensus
     Anna Korba⋆, Stéphan Clémençon⋆, Eric Sibony†
     ⋆ Telecom ParisTech, † Shifu Technology

  2. Outline
     1. Introduction to Ranking Data
     2. Background on Ranking Aggregation
     3. Ranking Median Regression
     4. Local Consensus Methods for Ranking Median Regression
     5. Conclusion

  3. Outline: Introduction to Ranking Data; Background on Ranking Aggregation; Ranking Median Regression; Local Consensus Methods for Ranking Median Regression; Conclusion

  4. Ranking Data
     Set of items $\llbracket n \rrbracket := \{1, \dots, n\}$
     Definition (Ranking). A ranking is a strict partial order $\prec$ over $\llbracket n \rrbracket$, i.e. a binary relation satisfying the following properties:
     ▶ Irreflexivity: for all $i \in \llbracket n \rrbracket$, $i \not\prec i$
     ▶ Transitivity: for all $i, j, k \in \llbracket n \rrbracket$, if $i \prec j$ and $j \prec k$ then $i \prec k$
     ▶ Asymmetry: for all $i, j \in \llbracket n \rrbracket$, if $i \prec j$ then $j \not\prec i$
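
The three properties in the definition can be checked mechanically on a small relation. The following is a minimal sketch, assuming the relation is stored as a set of pairs `(i, j)` meaning i ≺ j; the function name `is_strict_partial_order` is hypothetical and not from the slides.

```python
def is_strict_partial_order(items, prec):
    """Check the three properties from the definition.

    items: iterable of item labels.
    prec:  set of pairs (i, j), read as "i precedes j" (assumed encoding).
    """
    irreflexive = all((i, i) not in prec for i in items)
    asymmetric = all((j, i) not in prec for (i, j) in prec)
    transitive = all(
        (i, k) in prec
        for (i, j) in prec
        for (j2, k) in prec
        if j == j2
    )
    return irreflexive and asymmetric and transitive


items = {1, 2, 3}
print(is_strict_partial_order(items, {(1, 2), (2, 3), (1, 3)}))  # True
print(is_strict_partial_order(items, {(1, 2), (2, 3)}))          # False
```

The second relation fails only because the pair (1, 3) required by transitivity is missing.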

  5. Ranking data arise in many applications
     Traditional applications
     ▶ Elections: $\llbracket n \rrbracket$ = a set of candidates → A voter ranks a set of candidates
     ▶ Competitions: $\llbracket n \rrbracket$ = a set of players → Results of a race
     ▶ Surveys: $\llbracket n \rrbracket$ = political goals → A citizen ranks them according to their priorities
     Modern applications
     ▶ E-commerce: $\llbracket n \rrbracket$ = items of a catalog → A user expresses their preferences (see "implicit feedback")
     ▶ Search engines: $\llbracket n \rrbracket$ = web pages → A search engine ranks them by relevance for a given query

  6. The analysis of ranking data spreads over many fields of the scientific literature
     ▶ Social choice theory
     ▶ Economics
     ▶ Operational Research
     ▶ Machine learning
     ⇒ Over the past 15 years, the statistical analysis of ranking data has become a subfield of the machine learning literature.

  7. Many efforts to bring them together
     ▶ Special track on Ranking and Preferences (ICML 2015-2017)
     ▶ Analysis of Rank Data (NIPS 2014)
     ▶ Seminar on Preference Learning (Dagstuhl 2014)
     ▶ From Decision Analysis to Preference Learning (DA2PL 2012, 2014, 2016)
     ▶ Preference Learning (ECAI 2012)
     ▶ Special track on Preference Learning (EURO 09-16)
     ▶ Choice Models and Preference Learning (NIPS 2011)
     ▶ Advances in Ranking (NIPS 09)
     ▶ Learning on Functions, Graphs and Groups (NIPS 2017)
     ▶ Preference Learning (ECML/PKDD 08-10)
     ▶ Learning to Rank for Information Retrieval (SIGIR 07-10)
     ▶ Advances in Preference Handling (IJCAI 2005)
     ▶ Learning to Rank (NIPS 2005)
     ▶ Learning with Structured Outputs (NIPS 2004)
     ▶ Beyond Classification and Regression (NIPS 2002)
     ▶ New Methods for Preference Elicitation (NIPS 2001)

  8. Common types of rankings
     Set of items $\llbracket n \rrbracket := \{1, \dots, n\}$
     ▶ Full ranking. All the items are ranked, without ties: $a_1 \succ a_2 \succ \dots \succ a_n$
     ▶ Partial ranking. All the items are ranked, with ties ("buckets"): $a_{1,1}, \dots, a_{1,n_1} \succ \dots \succ a_{r,1}, \dots, a_{r,n_r}$ with $\sum_{i=1}^{r} n_i = n$
       ⇒ Top-k ranking is a particular case: $a_1, \dots, a_k \succ$ the rest
     ▶ Incomplete ranking. Only a subset of items are ranked, without ties: $a_1 \succ \dots \succ a_k$ with $k < n$
       ⇒ Pairwise comparison is a particular case: $a_1 \succ a_2$

  9. Detailed example: analysis of full rankings
     Notation.
     ▶ A full ranking: $a_1 \succ a_2 \succ \dots \succ a_n$
     ▶ Also seen as the permutation $\sigma$ that maps an item to its rank: $a_1 \succ \dots \succ a_n \Leftrightarrow \sigma \in S_n$ such that $\sigma(a_i) = i$
     $S_n$: set of permutations of $\llbracket n \rrbracket$, the symmetric group.
     Probabilistic Modeling. The dataset is a collection of random permutations drawn IID from a probability distribution $P$ over $S_n$: $D_N = (\Sigma_1, \dots, \Sigma_N)$ with $\Sigma_i \sim P$. $P$ is called a ranking model.
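
The correspondence between an ordering a_1 ≻ ... ≻ a_n and the permutation σ with σ(a_i) = i can be sketched as follows. This is only an illustration: the dictionary encoding and the helper names `ordering_to_permutation` and `permutation_to_ordering` are assumptions, not notation from the slides.

```python
def ordering_to_permutation(ordering):
    """Map a full ranking a_1 > a_2 > ... > a_n (best first) to sigma,
    stored as a dict with sigma[item] = rank of the item (1 = best)."""
    n = len(ordering)
    if sorted(ordering) != list(range(1, n + 1)):
        raise ValueError("ordering must list each item of {1..n} exactly once")
    return {item: rank for rank, item in enumerate(ordering, start=1)}


def permutation_to_ordering(sigma):
    """Inverse map: list the items by increasing rank."""
    return sorted(sigma, key=sigma.get)


sigma = ordering_to_permutation([2, 1, 3])   # the ranking 2 > 1 > 3
print(sigma)                                 # {2: 1, 1: 2, 3: 3}
print(permutation_to_ordering(sigma))        # [2, 1, 3]
```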

  10. Detailed example: analysis of full rankings
      ▶ Ranking data are very natural for human beings
        ⇒ Statistical modeling should capture some interpretable structure
      Questions
      ▶ How to analyze a dataset of permutations $D_N = (\Sigma_1, \dots, \Sigma_N)$?
      ▶ How to characterize the variability? What can be inferred?

  11. Detailed example: analysis of full rankings
      Challenges
      ▶ A random permutation $\Sigma$ can be seen as a random vector $(\Sigma(1), \dots, \Sigma(n)) \in \mathbb{R}^n$... but the random variables $\Sigma(1), \dots, \Sigma(n)$ are highly dependent and the sum $\Sigma + \Sigma'$ is not a random permutation!
        ⇒ No natural notion of variance for $\Sigma$
      ▶ The set of permutations $S_n$ is finite... but exploding cardinality: $|S_n| = n!$
        ⇒ Little statistical relevance
      ▶ Apply a method from p.d.f. estimation (e.g. kernel density estimation)... but no canonical ordering of the rankings!
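
Two of these challenges are easy to see numerically. The snippet below is a small illustration, assuming permutations are written as rank vectors (Σ(1), ..., Σ(n)); the helper `is_permutation` is hypothetical.

```python
import math


def is_permutation(vec):
    """True if vec is a rearrangement of 1..n."""
    return sorted(vec) == list(range(1, len(vec) + 1))


sigma = (1, 2, 3)
sigma_prime = (3, 2, 1)

# Coordinate-wise sum and mean of two rank vectors.
total = tuple(a + b for a, b in zip(sigma, sigma_prime))   # (4, 4, 4)
mean = tuple(t / 2 for t in total)                         # (2.0, 2.0, 2.0)

print(is_permutation(total), is_permutation(mean))  # False False

# |S_n| = n! grows far faster than any realistic sample size N.
for n in (5, 10, 20):
    print(n, math.factorial(n))
```

Neither the sum nor the mean of the two rank vectors is a permutation, which is why vector-space notions such as expectation and variance do not transfer directly.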

  17. Main approaches
      "Parametric" approach
      ▶ Fit a predefined generative model on the data
      ▶ Analyze the data through that model
      ▶ Infer knowledge with respect to that model
      "Nonparametric" approach
      ▶ Choose a structure on $S_n$
      ▶ Analyze the data with respect to that structure
      ▶ Infer knowledge through a "regularity" assumption

  18. Parametric Approach - Classic Models
      ▶ Thurstone model [Thurstone, 1927]. Let $\{X_1, X_2, \dots, X_n\}$ be r.v. with a continuous joint distribution $F(x_1, \dots, x_n)$:
        $P(\sigma) = P\left(X_{\sigma^{-1}(1)} < X_{\sigma^{-1}(2)} < \dots < X_{\sigma^{-1}(n)}\right)$
      ▶ Plackett-Luce model [Luce, 1959], [Plackett, 1975]. Each item $i$ is parameterized by $w_i$ with $w_i \in \mathbb{R}_+$:
        $P(\sigma) = \prod_{i=1}^{n} \frac{w_{\sigma_i}}{\sum_{j=i}^{n} w_{\sigma_j}}$
        Ex: $P(2 \succ 1 \succ 3) = \frac{w_2}{w_1 + w_2 + w_3} \cdot \frac{w_1}{w_1 + w_3}$
      ▶ Mallows model [Mallows, 1957]. Parameterized by a central ranking $\sigma_0 \in S_n$ and a dispersion parameter $\gamma \in \mathbb{R}_+$:
        $P(\sigma) = C e^{-\gamma d(\sigma_0, \sigma)}$ with $d$ a distance on $S_n$.
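
The Plackett-Luce and Mallows formulas can be evaluated by brute force for small n. The sketch below makes several assumptions that are not on the slide: σ_i is read as the item in position i (as the example 2 ≻ 1 ≻ 3 suggests), the Mallows distance d is taken to be the Kendall tau distance (one common choice among several), and the function names and the weights `w` are hypothetical.

```python
import math
from itertools import permutations


def plackett_luce_prob(ordering, w):
    """P(ordering) under Plackett-Luce.
    ordering lists items best first; w maps item -> positive weight."""
    prob = 1.0
    remaining = sum(w[i] for i in ordering)
    for item in ordering:
        prob *= w[item] / remaining   # pick `item` among those still unranked
        remaining -= w[item]
    return prob


def kendall_tau(s, t):
    """Number of item pairs that rank maps s and t order differently."""
    items = sorted(s)
    return sum(
        1
        for a in range(len(items))
        for b in range(a + 1, len(items))
        if (s[items[a]] - s[items[b]]) * (t[items[a]] - t[items[b]]) < 0
    )


def mallows_pmf(sigma0, gamma):
    """All Mallows probabilities, with d = Kendall tau (an assumed choice)."""
    items = sorted(sigma0)
    ranks = range(1, len(items) + 1)
    perms = [dict(zip(items, r)) for r in permutations(ranks)]
    weights = [math.exp(-gamma * kendall_tau(sigma0, p)) for p in perms]
    z = sum(weights)                  # normalizing constant C = 1 / z
    return [(p, wt / z) for p, wt in zip(perms, weights)]


w = {1: 2.0, 2: 3.0, 3: 1.0}          # illustrative weights
print(plackett_luce_prob([2, 1, 3], w))   # w2/(w1+w2+w3) * w1/(w1+w3) = 3/6 * 2/3

pmf = mallows_pmf({1: 1, 2: 2, 3: 3}, gamma=1.0)
print(max(pmf, key=lambda x: x[1])[0])    # the central ranking is the mode
```

Enumerating all n! permutations to normalize is only feasible for very small n; it is used here purely to make the formulas concrete.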

  19. Nonparametric approaches - Examples 1
      ▶ Embeddings
        Permutation matrices [Plis et al., 2011]: $S_n \to \mathbb{R}^{n \times n}$, $\sigma \mapsto P_\sigma$ with $P_\sigma(i, j) = \mathbb{I}\{\sigma(i) = j\}$
        Kemeny embedding [Jiao et al., 2016]: $S_n \to \mathbb{R}^{n(n-1)/2}$, $\sigma \mapsto \phi_\sigma$ with $\phi_\sigma = \left(\mathrm{sign}(\sigma(i) - \sigma(j))\right)_{i<j}$
      ▶ Harmonic analysis
        Fourier analysis [Clémençon et al., 2011], [Kondor and Barbosa, 2010]: $\hat{h}_\lambda = \sum_{\sigma \in S_n} h(\sigma) \rho_\lambda(\sigma)$ where $\rho_\lambda(\sigma) \in \mathbb{C}^{d_\lambda \times d_\lambda}$ for all $\lambda \vdash n$.
        Multiresolution analysis for incomplete rankings [Sibony et al., 2015]
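
Both embeddings are short to write down for a concrete permutation. This is a minimal sketch assuming σ is stored as a dict from item to rank; the function names are hypothetical, and plain nested lists stand in for matrices.

```python
from itertools import combinations


def permutation_matrix(sigma):
    """n x n 0/1 matrix with entry (i, j) equal to 1 iff sigma(i) = j."""
    n = len(sigma)
    return [
        [1 if sigma[i] == j else 0 for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


def kemeny_embedding(sigma):
    """Vector of sign(sigma(i) - sigma(j)) over all pairs i < j."""
    n = len(sigma)

    def sign(x):
        return (x > 0) - (x < 0)

    return [sign(sigma[i] - sigma[j]) for i, j in combinations(range(1, n + 1), 2)]


sigma = {1: 2, 2: 1, 3: 3}            # item 1 has rank 2, item 2 has rank 1
print(permutation_matrix(sigma))      # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(kemeny_embedding(sigma))        # [1, -1, -1]
```

The Kemeny embedding has length n(n-1)/2, one coordinate per pair, listed here in the order (1,2), (1,3), (2,3). Two embeddings differ by ±2 exactly on the pairs the two rankings order differently.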

  20. Nonparametric approaches - Examples 2
      ▶ Modeling of pairwise comparisons as a graph
        [Figure: comparison graph on items i, j, k, l with edges i ≻ j, k ≻ j, i ≻ k, l ≻ k, i ≻ l]
        HodgeRank exploits the topology of the graph [Jiang et al., 2011]
      ▶ Approximation of pairwise comparison matrices [Shah and Wainwright, 2015]
