Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach
Mastane Achab, Anna Korba, Stephan Clémençon
DA2PL'2018, Poznan, Poland
Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Introduction (1/2)
◮ Permutations over n items [n] = {1, ..., n}
◮ Number of permutations explodes: #S_n = n!
◮ Distribution P on S_n: n! − 1 parameters
Introduction (2/2)
◮ Question: "How to summarize P?"
◮ Answer: dimensionality reduction
◮ Problem: no vector space structure for permutations
Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Preliminaries (1/2)
Bucket order C = (C_1, ..., C_K): ordered partition of [n]
◮ the C_k's: disjoint non-empty subsets of [n]
◮ ∪_{k=1}^K C_k = [n]
◮ K: "size" of C
◮ (#C_1, ..., #C_K): "shape" of C
Partial order: "i is ranked lower than j in C" if there exist k < l s.t. (i, j) ∈ C_k × C_l.
Preliminaries (2/2)
P_C: set of all bucket distributions P' associated with C
◮ P' distribution on S_n
◮ p'_{i,j} = P(Σ'(i) < Σ'(j)) for Σ' ~ P'
◮ if (i, j) ∈ C_k × C_l (k < l), then p'_{j,i} = 0
◮ P' ∈ P_C described by d_C = Π_{k ≤ K} (#C_k)! − 1 ≤ n! − 1 parameters
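As a small sanity check, the dimension d_C above can be computed from the shape of a bucket order. This is a minimal sketch, assuming d_C equals the number of permutations consistent with C minus one (i.e. the product of the bucket-size factorials, minus one); the function name `bucket_dimension` is mine, not from the talk.

```python
from math import factorial

def bucket_dimension(shape):
    """Dimension d_C of the bucket distributions supported by a
    bucket order of the given shape (#C_1, ..., #C_K), assuming
    d_C = (prod of the #C_k!) - 1."""
    d = 1
    for size in shape:
        d *= factorial(size)
    return d - 1

print(bucket_dimension([1, 2, 3]))  # 1! * 2! * 3! - 1 = 11
print(bucket_dimension([1, 1, 1]))  # size-n bucket order: d_C = 0
```

The extreme case where every bucket is a singleton gives d_C = 0, matching the consensus-ranking case discussed later in the talk.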
Background on Consensus Ranking
Consensus ranking (or "ranking aggregation"): summarize permutations σ_1, ..., σ_N by a consensus/median ranking σ* ∈ S_n by solving:
    min_{σ ∈ S_n} Σ_{s=1}^N d(σ, σ_s).
If Σ_1, ..., Σ_N are i.i.d. sampled from P (Korba et al., 2017), solve:
    min_{σ ∈ S_n} E_{Σ~P}[d(Σ, σ)].
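For small n, the consensus-ranking objective above can be minimized by brute force over S_n. A minimal sketch, assuming rankings are encoded as tuples with σ(i) = rank of item i; the helper names `kendall_tau` and `kemeny_median` are mine, and a real implementation would use a smarter method than full enumeration.

```python
from itertools import permutations

def kendall_tau(sigma, sigma_p):
    """Kendall's tau distance: number of discordant pairs."""
    n = len(sigma)
    return sum((sigma[i] - sigma[j]) * (sigma_p[i] - sigma_p[j]) < 0
               for i in range(n) for j in range(i + 1, n))

def kemeny_median(sample):
    """Brute-force consensus: argmin over S_n of the total distance."""
    n = len(sample[0])
    return min(permutations(range(1, n + 1)),
               key=lambda s: sum(kendall_tau(s, x) for x in sample))

# three toy voters over n = 3 items
sample = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
print(kemeny_median(sample))  # (1, 2, 3)
```

Enumeration costs n! evaluations, which is exactly the blow-up that motivates the dimensionality-reduction viewpoint of this talk.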
Kemeny medians
Particular choice for the metric d:
◮ Kendall's τ distance: d_τ(σ, σ') = Σ_{i<j} I{(σ(i) − σ(j))(σ'(i) − σ'(j)) < 0}.
◮ Kemeny medians are the solutions of: min_{σ ∈ S_n} E_{Σ~P}[d_τ(Σ, σ)].
Unique Kemeny median σ*_P if P is strictly stochastically transitive:
◮ p_{i,j} ≥ 1/2 and p_{j,k} ≥ 1/2 ⇒ p_{i,k} ≥ 1/2
◮ p_{i,j} ≠ 1/2 for all i < j
◮ given by the Copeland ranking: σ*_P(i) = 1 + Σ_{j≠i} I{p_{i,j} < 1/2}.
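The Copeland ranking above can be read off directly from a pairwise probability matrix. A minimal sketch; the 3x3 matrix `p` is an illustrative toy example (not from the talk), chosen to be strictly stochastically transitive.

```python
import numpy as np

def copeland_ranking(p):
    """Copeland ranking from a pairwise probability matrix.

    p[i, j] = P(Sigma(i) < Sigma(j)); assumes strict stochastic
    transitivity, so p[i, j] != 1/2 for all i != j.
    """
    n = p.shape[0]
    sigma = np.empty(n, dtype=int)
    for i in range(n):
        # rank of item i: 1 + number of items that beat it
        sigma[i] = 1 + sum(p[i, j] < 0.5 for j in range(n) if j != i)
    return sigma

# toy example: item 0 beats 1 and 2, item 1 beats 2
p = np.array([[0.5, 0.9, 0.8],
              [0.1, 0.5, 0.7],
              [0.2, 0.3, 0.5]])
print(copeland_ranking(p))  # [1 2 3]
```

Under strict stochastic transitivity this ranking is the unique Kemeny median, so no optimization over S_n is needed.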
Bucket orders of size n
Consensus ranking: extreme case of a bucket order C of size n.
◮ C = ({σ*⁻¹(1)}, ..., {σ*⁻¹(n)})
◮ P_C = {δ_{σ*}}, hence dimension d_C = 0
Problem: generalize to any bucket order.
A Mass Transportation Approach
◮ Question: "How to quantify the approximation error between the original distribution P and a bucket distribution P' ∈ P_C?"
◮ Our answer: the Wasserstein distance W_{d,q}(P, P').
Definition
    W_{d,q}(P, P') = inf_{Σ~P, Σ'~P'} E[d^q(Σ, Σ')]
◮ Why: because it generalizes consensus ranking. Indeed:
    W_{d,1}(P, δ_σ) = E_{Σ~P}[d(Σ, σ)].
◮ Focus on d = d_τ and q = 1.
Distortion measure
A bucket order C represents P well if the distortion Λ_P(C) is small.
Definition
    Λ_P(C) = min_{P' ∈ P_C} W_{d_τ,1}(P, P')
Explicit expression for Λ_P(C):
Proposition
    Λ_P(C) = Σ_{1≤k<l≤K} Σ_{(i,j) ∈ C_k × C_l} p_{j,i}.
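The explicit expression in the Proposition makes the distortion trivial to evaluate: sum, over all pairs of items placed in different buckets, the probability that they are observed in the "wrong" order. A minimal sketch, reusing a toy 3x3 pairwise matrix (not from the talk) and 0-based item indices.

```python
import numpy as np

def distortion(p, buckets):
    """Distortion Lambda_P(C): sum over bucket pairs k < l of
    p[j, i] for (i, j) in C_k x C_l, with p[i, j] = P(Sigma(i) < Sigma(j))."""
    total = 0.0
    for k in range(len(buckets)):
        for l in range(k + 1, len(buckets)):
            for i in buckets[k]:
                for j in buckets[l]:
                    total += p[j, i]
    return total

p = np.array([[0.5, 0.9, 0.8],
              [0.1, 0.5, 0.7],
              [0.2, 0.3, 0.5]])
# C = ({0}, {1, 2}): item 0 alone in the top bucket
print(distortion(p, [[0], [1, 2]]))  # p[1,0] + p[2,0] ~= 0.3
```

Only inter-bucket pairs contribute: pairs inside a common bucket cost nothing, which is exactly why coarser bucket orders achieve lower distortion at the price of a higher dimension d_C.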
Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Empirical setting
Training sample: Σ_1, ..., Σ_N i.i.d. from P.
◮ Empirical pairwise probabilities: p̂_{i,j} = (1/N) Σ_{s=1}^N I{Σ_s(i) < Σ_s(j)}.
◮ Empirical distortion of any bucket order C: Λ̂_N(C) = Σ_{i ≺_C j} p̂_{j,i} = Λ_{P̂_N}(C).
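The empirical pairwise probabilities are simple frequency counts over the sample. A minimal sketch, assuming each ranking is encoded as a row with Σ_s(i) = rank of item i; plugging the resulting matrix into the distortion formula gives Λ̂_N(C).

```python
import numpy as np

def pairwise_probabilities(sample):
    """Empirical p_hat[i, j]: fraction of rankings with Sigma(i) < Sigma(j)."""
    sample = np.asarray(sample)
    N, n = sample.shape
    p_hat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                p_hat[i, j] = np.mean(sample[:, i] < sample[:, j])
    return p_hat

# three toy rankings over n = 3 items
sample = [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
print(pairwise_probabilities(sample))
```

Each entry is a mean of N i.i.d. indicator variables, which is what drives the O(1/sqrt(N)) rate in the bound on the next slide.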
Rate bound
The empirical distortion minimizer Ĉ_{K,λ} is a solution of:
    min_{C ∈ C_{K,λ}} Λ̂_N(C),
where C_{K,λ} is the set of bucket orders C of size K and shape λ (i.e. #C_k = λ_k for all 1 ≤ k ≤ K).
Theorem
For all δ ∈ (0, 1), we have with probability at least 1 − δ:
    Λ_P(Ĉ_{K,λ}) − inf_{C ∈ C_{K,λ}} Λ_P(C) ≤ β(n, λ) × √(log(1/δ) / N).
The Strong Stochastic Transitive Case
Assume that P is strongly (and strictly) stochastically transitive, i.e.:
    p_{i,j} ≥ 1/2 and p_{j,k} ≥ 1/2 ⇒ p_{i,k} ≥ max(p_{i,j}, p_{j,k}).
Theorem
(i) Λ_P(C) has a unique minimizer over C_{K,λ}, denoted C*_{K,λ}.
(ii) C*_{K,λ} is the unique bucket order in C_{K,λ} agreeing with the Kemeny median.
Consequence: agglomerative algorithm.

Outline
Introduction
Dimensionality Reduction on S_n
Empirical Distortion Minimization
Numerical Experiments on a Real-world Dataset
Experiments
Sushi dataset (Kamishima, 2003):
◮ n = 10 sushi dishes
◮ N = 5000 full rankings.
[Figure: distortion vs. dimension on the sushi dataset, one curve per bucket-order size K from 3 to 10.]
Thank you!