multiresolution analysis for the statistical analysis of incomplete rankings
Eric Sibony, Anna Korba, Stéphan Clémençon
LTCI UMR 5141, Telecom ParisTech/CNRS
NIPS Workshop on Multiresolution Methods for Large-Scale Learning, December 12, 2015
introduction

Why rankings?
Ranking data naturally appear in a wide variety of situations:
∙ elections
∙ survey answers
∙ expert judgments
∙ race results
∙ competition rankings
∙ customer behaviors
∙ user preferences
∙ …
introduction

Probabilistic modeling on rankings
Catalog of items ⟦n⟧ := {1, ..., n}.
Full ranking a_1 ≻ · · · ≻ a_n ⇔ permutation σ ∈ S_n that maps an item to its rank: σ(a_i) = i.
The variability of full rankings is therefore modeled by a probability distribution p over the set of permutations S_n; p is called a ranking model.
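To make the correspondence concrete, here is a minimal Python sketch (mine, not from the slides; the function name is hypothetical) mapping a full ranking to the permutation σ with σ(a_i) = i.

```python
# Minimal sketch (not from the slides): a full ranking a_1 ≻ ... ≻ a_n is encoded as
# the permutation sigma with sigma(a_i) = i, stored as a dict item -> rank.
def ranking_to_permutation(ranking):
    """Map a full ranking (list of items, most preferred first) to sigma."""
    return {item: rank for rank, item in enumerate(ranking, start=1)}

sigma = ranking_to_permutation([5, 1, 4, 3, 2])   # the ranking 5 ≻ 1 ≻ 4 ≻ 3 ≻ 2
# sigma == {5: 1, 1: 2, 4: 3, 3: 4, 2: 5}, i.e. sigma(5) = 1, sigma(1) = 2, ...
```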
introduction

Example: probability distribution over S_5 (APA dataset)
introduction

Probabilistic modeling on rankings
“Parametric” models - psychological interpretation:
∙ Thurstone
∙ Mallows
∙ Plackett-Luce
∙ …
“Nonparametric” approaches - mathematical interpretation:
∙ distance-based
∙ independence modeling
∙ Fourier analysis
∙ …
Why multiresolution analysis? To exploit another relevant structure of rankings.
fourier analysis on the symmetric group
abstract fourier analysis

Fourier analysis consists in decomposing a signal into projections on subspaces that are stable under translations.

Example: Fourier series
For e_k(x) = e^{2iπkx}, the space C e_k is stable under the translations T_a : f ↦ f(· − a) for all a ∈ R/Z. The Fourier coefficient \hat{f}(k) is defined by
    \hat{f}(k) = ⟨f, e_k⟩ = ∫_0^1 f(x) e^{−2iπkx} dx
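As a quick numerical illustration (a sketch of mine, not from the slides), the coefficient above can be approximated on a uniform grid of [0, 1):

```python
import numpy as np

# Minimal sketch (not from the slides): Riemann-sum approximation of
# \hat{f}(k) = <f, e_k> = \int_0^1 f(x) e^{-2 i pi k x} dx.
def fourier_coefficient(f, k, num_points=1024):
    x = np.linspace(0.0, 1.0, num_points, endpoint=False)
    return np.mean(f(x) * np.exp(-2j * np.pi * k * x))

# f(x) = cos(2 pi x) has \hat{f}(1) = 1/2
print(fourier_coefficient(lambda x: np.cos(2 * np.pi * x), k=1))  # ~ (0.5 + 0j)
```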
abstract fourier analysis

The symmetric group
Let L(S_n) := {f : S_n → R}. Translations on L(S_n) are the operators T_τ : f ↦ f(· τ^{−1}) defined for τ ∈ S_n.

Theorem (from group representation theory)
    L(S_n) ≅ ⊕_{λ⊢n} d_λ S^λ
∙ λ ⊢ n: indexes of the irreducible representations of S_n
∙ S^λ: space of the irreducible representation indexed by λ
∙ d_λ = dim S^λ
abstract fourier transform

Let ρ_λ : S_n → R^{d_λ × d_λ} be a representative of the irreducible representation indexed by λ.
    \hat{f}(λ) = ∑_{σ∈S_n} f(σ) ρ_λ(σ) ∈ R^{d_λ × d_λ}   “= ⟨f, ρ_λ⟩”, the “projection on d_λ S^λ”.
The Fourier transform is then defined by F : f ↦ (\hat{f}(λ))_{λ⊢n}.
It satisfies the classic properties:
∙ Parseval identity
∙ inverse Fourier transform
∙ turns convolution into (matrix) product
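The irreducible matrices ρ_λ (e.g. Young's orthogonal representation) take some work to construct, so the sketch below (mine, not the construction used in the talk) only illustrates the same summation pattern with the n × n permutation representation, which is reducible but decomposes as S^{(n)} ⊕ S^{(n−1,1)}.

```python
import itertools
import math
import numpy as np

# Minimal sketch (not from the slides): \hat{f} = sum_sigma f(sigma) rho(sigma),
# with rho the n x n permutation representation instead of an irreducible rho_lambda.
def permutation_matrix(sigma):
    """Matrix of sigma, given as a tuple with sigma[i] = image of i (0-indexed)."""
    n = len(sigma)
    P = np.zeros((n, n))
    for i, j in enumerate(sigma):
        P[j, i] = 1.0
    return P

def fourier_coefficient(f, n):
    """Sum f(sigma) * rho(sigma) over the whole symmetric group S_n."""
    return sum(f[sigma] * permutation_matrix(sigma)
               for sigma in itertools.permutations(range(n)))

n = 3
uniform = {sigma: 1.0 / math.factorial(n) for sigma in itertools.permutations(range(n))}
print(fourier_coefficient(uniform, n))   # every entry equals 1/n for the uniform model
```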
specificities

Fourier coefficients are matrices. “Frequencies” λ are not numbers (there is no canonic total order): they are partitions of n, i.e. tuples (λ_1, ..., λ_r) ∈ N^r such that λ_1 ≥ · · · ≥ λ_r and ∑_{i=1}^r λ_i = n:
    (n), (n−1, 1), (n−2, 2), (n−2, 1, 1), ...
The canonic partial order on partitions however orders the Fourier coefficients by “levels of smoothness”.
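The canonic partial order referred to here is, to my understanding, the dominance order on partitions; a minimal sketch (mine, with a hypothetical helper name) of the comparison, assuming both arguments are partitions of the same n:

```python
# Minimal sketch (not from the slides): mu dominates lam iff every prefix sum of mu
# is at least the corresponding prefix sum of lam (both partitions of the same n).
def dominates(mu, lam):
    length = max(len(mu), len(lam))
    mu = list(mu) + [0] * (length - len(mu))
    lam = list(lam) + [0] * (length - len(lam))
    sum_mu = sum_lam = 0
    for a, b in zip(mu, lam):
        sum_mu += a
        sum_lam += b
        if sum_mu < sum_lam:
            return False
    return True

# For n = 5 the list above reads (5), (4,1), (3,2), (3,1,1), in decreasing dominance order.
print(dominates((4, 1), (3, 2)))     # True
print(dominates((3, 1, 1), (3, 2)))  # False: (3,2) dominates (3,1,1), not the other way
```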
utilizations

Classic methods of Fourier analysis apply to ranking data:
∙ band-limited approximation (e.g. [Huang et al., 2009])
∙ phase-magnitude decomposition (e.g. [Kakarala, 2011])
∙ analysis of random walks (e.g. [Diaconis, 1988])
∙ construction of kernels (e.g. [Kondor and Barbosa, 2010])
∙ hypothesis testing (e.g. [Diaconis, 1989])
looking for a new representation
natural extension

As in classic Fourier analysis, Fourier coefficients contain global information on S_n:
    \hat{f}(λ) = ∑_{σ∈S_n} f(σ) ρ_λ(σ)
⇒ The Fourier transform only characterizes the global smoothness of a function.
Probability distributions over S_n may show local irregularities.
⇒ One needs some form of multiresolution analysis to characterize the local smoothness of a function.
construction of a multiresolution analysis

Natural attempt
∙ Fourier analysis is constructed from translations
∙ multiresolution analysis should be constructed from translations and dilations
Problem: no equivalent of dilations in a discrete setting.
“space-scale” decomposition

Relevant approach: directly construct a multiresolution analysis that characterizes local singularities.
The multiresolution analysis introduced in [Kondor and Dempsey, 2012] characterizes singularities f localized both
∙ in “space”: f with a small support in S_n
∙ in “scale/frequency”: F f with small support in {λ ⊢ n}
item localization

Some modern applications require a different type of localization. In these applications, observed rankings are incomplete: they only involve small subsets of items among the catalog ⟦n⟧,
    a_1 ≻ · · · ≻ a_k with k ≪ n
(e.g. user preferences). Such applications require “item localization”.
our purpose: “item-scale” decomposition

Does Fourier analysis offer some “item localization”? No.
⇒ We introduce a multiresolution analysis that characterizes singularities f localized both
∙ in “items”: f only “impacts” the rankings of a subset of items
∙ in “scale/frequency”: F f with small support in {λ ⊢ n}
our purpose: “item-scale” decomposition

What do we mean by “item localization”?
rank information localization
rank information

Permutation σ: 1 ↦ 2, 2 ↦ 5, 3 ↦ 4, 4 ↦ 3, 5 ↦ 1   ↔   Ranking 5 ≻ 1 ≻ 4 ≻ 3 ≻ 2

Absolute rank information
∙ What is the rank σ(3) of item 3? 4
∙ What item σ^{−1}(2) is ranked at the 2nd position? 1
∙ What are the ranks σ({2, 4, 5}) of items {2, 4, 5}? {5, 3, 1}

Relative rank information
∙ How are items 1 and 3 relatively ordered? 1 ≻ 3
∙ How are the items of the subset {2, 4, 5} relatively ordered? 5 ≻ 4 ≻ 2
rank information

Permutation σ ↔ Ranking σ^{−1}(1) ≻ · · · ≻ σ^{−1}(n)

Absolute rank information
∙ What is the rank σ(i) of item i?
∙ What item σ^{−1}(j) is ranked at the j-th position?
∙ What are the ranks σ({i, j, k}) of items {i, j, k}?

Relative rank information
∙ How are items a and b relatively ordered?
∙ How are the items of the subset A relatively ordered?
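A minimal Python sketch (mine, not from the slides; helper names are hypothetical) answering these questions for the example permutation of the previous slide:

```python
# Minimal sketch (not from the slides): sigma is stored as a dict item -> rank.
def rank_of(sigma, i):
    """Absolute: the rank sigma(i) of item i."""
    return sigma[i]

def item_at(sigma, j):
    """Absolute: the item sigma^{-1}(j) ranked at the j-th position."""
    return next(item for item, rank in sigma.items() if rank == j)

def induced_ranking(sigma, A):
    """Relative: the ranking induced by sigma on the subset A (most preferred first)."""
    return sorted(A, key=lambda item: sigma[item])

sigma = {1: 2, 2: 5, 3: 4, 4: 3, 5: 1}    # the ranking 5 ≻ 1 ≻ 4 ≻ 3 ≻ 2
print(rank_of(sigma, 3))                  # 4
print(item_at(sigma, 2))                  # 1
print(induced_ranking(sigma, {2, 4, 5}))  # [5, 4, 2], i.e. 5 ≻ 4 ≻ 2
```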
rank information

Random permutation Σ ↔ Random ranking Σ^{−1}(1) ≻ · · · ≻ Σ^{−1}(n)

Absolute rank information
∙ What is the law of the rank Σ(i) of item i?
∙ What is the law of the item Σ^{−1}(j) ranked at the j-th position?
∙ What is the law of the ranks Σ({i, j, k}) of items {i, j, k}?

Relative rank information
∙ What is the probability P[Σ(a) < Σ(b)] that a is ranked higher than b?
∙ What is the law of the ranking Σ|_A induced by Σ on the subset A?
marginals of a ranking model

For a random permutation Σ drawn from a ranking model p, all these laws are marginals of p.

Example
    P[Σ(i) = j] = ∑_{σ∈S_n : σ(i)=j} p(σ)
    P[Σ(a) < Σ(b)] = ∑_{σ∈S_n : σ(a)<σ(b)} p(σ)

Associated marginal operators
    M^{(n−1,1)}_i : p ↦ law of Σ(i)
    M_{{a,b}} : p ↦ law of I{Σ(a) < Σ(b)}
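A minimal brute-force sketch (mine, not from the slides) of these two marginals, for a ranking model p stored as a dict over permutations:

```python
import itertools
import math

# Minimal sketch (not from the slides): brute-force marginals of a ranking model p,
# with permutations stored as tuples sigma where sigma[i] is the rank of item i (0-indexed).
def prob_rank(p, i, j):
    """P[Sigma(i) = j]."""
    return sum(prob for sigma, prob in p.items() if sigma[i] == j)

def prob_pairwise(p, a, b):
    """P[Sigma(a) < Sigma(b)], i.e. the probability that item a is ranked higher than b."""
    return sum(prob for sigma, prob in p.items() if sigma[a] < sigma[b])

n = 4
uniform = {sigma: 1.0 / math.factorial(n) for sigma in itertools.permutations(range(n))}
print(prob_rank(uniform, 0, 2))      # 1/n = 0.25
print(prob_pairwise(uniform, 0, 1))  # 0.5 by symmetry
```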
marginals of a ranking model

Absolute marginals
For λ ⊢ n, M^λ_{A_1,...,A_r} : p ↦ law of (Σ(A_1), ..., Σ(A_r)), where (A_1, ..., A_r) is a partition of ⟦n⟧ such that |A_i| = λ_i.

Relative marginals
For A ⊂ ⟦n⟧ with |A| ≥ 2, M_A : p ↦ law of Σ|_A, where Σ|_A is the ranking induced by Σ on the items of A.
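A minimal sketch (mine, not from the slides) of the relative marginal operator M_A, computed by brute force from p:

```python
import itertools
from collections import defaultdict

# Minimal sketch (not from the slides): the relative marginal M_A p, i.e. the law of the
# ranking induced by Sigma on the subset A (induced rankings listed most preferred first).
def relative_marginal(p, A):
    marginal = defaultdict(float)
    for sigma, prob in p.items():
        induced = tuple(sorted(A, key=lambda item: sigma[item]))
        marginal[induced] += prob
    return dict(marginal)

n = 3
uniform = {sigma: 1.0 / 6 for sigma in itertools.permutations(range(n))}
print(relative_marginal(uniform, {0, 2}))   # {(0, 2): 0.5, (2, 0): 0.5}
```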
marginals localize nested levels of rank information

Example for absolute marginals
The knowledge of all (n−2, 1, 1) marginals induces the knowledge of all (n−1, 1) marginals:
    M^{(n−1,1)}_i p(j) = P[Σ(i) = j]
                       = ∑_{j′≠j} P[Σ(i) = j, Σ(i′) = j′]
                       = ∑_{j′≠j} M^{(n−2,1,1)}_{(i,i′)} p(j, j′)
for all i′ ≠ i.
marginals localize nested levels of rank information

Example for relative marginals
The knowledge of the marginal on {a, b, c} induces the knowledge of the marginal on {b, c}:
    M_{{b,c}} p(b ≻ c) = P[Σ(b) < Σ(c)]
                       = P[Σ(a) < Σ(b) < Σ(c)] + P[Σ(b) < Σ(a) < Σ(c)] + P[Σ(b) < Σ(c) < Σ(a)]
                       = M_{{a,b,c}} p(a ≻ b ≻ c) + M_{{a,b,c}} p(b ≻ a ≻ c) + M_{{a,b,c}} p(b ≻ c ≻ a)
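A quick numerical check of this identity (a sketch of mine, not from the slides), with items encoded as 0 = a, 1 = b, 2 = c and an arbitrary ranking model on S_3:

```python
import itertools
from collections import defaultdict

# Minimal sketch (not from the slides): the {a,b,c}-marginal determines the {b,c}-marginal.
def relative_marginal(p, A):
    marginal = defaultdict(float)
    for sigma, prob in p.items():
        marginal[tuple(sorted(A, key=lambda item: sigma[item]))] += prob
    return dict(marginal)

weights = [0.3, 0.1, 0.2, 0.1, 0.2, 0.1]                   # arbitrary ranking model on S_3
p = dict(zip(itertools.permutations(range(3)), weights))   # sigma[i] = rank of item i

m_bc = relative_marginal(p, {1, 2})       # marginal on {b, c}
m_abc = relative_marginal(p, {0, 1, 2})   # marginal on {a, b, c}
lhs = m_bc[(1, 2)]                                            # M_{b,c} p (b ≻ c)
rhs = m_abc[(0, 1, 2)] + m_abc[(1, 0, 2)] + m_abc[(1, 2, 0)]  # a≻b≻c, b≻a≻c, b≻c≻a
assert abs(lhs - rhs) < 1e-12
```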
fourier analysis localizes absolute rank information

A classic result from S_n representation theory (Young’s rule) says, informally, that:
1. Absolute marginals are nested according to the canonic order on partitions.
2. The part of the information of a function f : S_n → R that is specific to its λ-marginals M^λ_{A_1,...,A_r} f is contained in its Fourier coefficient \hat{f}(λ):
    M^λ_{A_1,...,A_r} f “=” M^λ_{A_1,...,A_r} F^{−1} \hat{f}(λ) + ∑_{µ ◃ λ} K_{µ,λ} \hat{f}(µ)
fourier analysis localizes absolute rank information

Illustration from Jonathan Huang’s thesis.
fourier analysis does not localize relative rank information

[figure: a function f together with its relative marginals M_A f for all triples and pairs A ⊂ {1, 2, 3, 4}: M_{1,2,3} f, M_{1,2,4} f, M_{1,3,4} f, M_{2,3,4} f, M_{1,2} f, M_{1,3} f, M_{1,4} f, M_{2,3} f, M_{2,4} f, M_{3,4} f]
the mra representation