Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach
Mastane Achab, Anna Korba, Stephan Clémençon
DA2PL'2018, Poznan, Poland

Outline
◮ Introduction
◮ Dimensionality Reduction on $S_n$
◮ Empirical Distortion Minimization
◮ Numerical Experiments on a Real-world Dataset

Introduction (1/2)
◮ Permutations over $n$ items $[\![ n ]\!] = \{1, \ldots, n\}$
◮ Number of permutations explodes: $\# S_n = n!$
◮ Distribution $P$ on $S_n$: $n! - 1$ parameters

Introduction (2/2)
◮ Question: "How to summarize $P$?"
◮ Answer: dimensionality reduction
◮ Problem: no vector space structure for permutations


Preliminaries (1/2)
Bucket order $\mathcal{C} = (\mathcal{C}_1, \ldots, \mathcal{C}_K)$: ordered partition of $[\![ n ]\!]$
◮ the $\mathcal{C}_k$'s are disjoint non-empty subsets of $[\![ n ]\!]$
◮ $\cup_{k=1}^{K} \mathcal{C}_k = [\![ n ]\!]$
◮ $K$: "size" of $\mathcal{C}$
◮ $(\#\mathcal{C}_1, \ldots, \#\mathcal{C}_K)$: "shape" of $\mathcal{C}$
Partial order: "$i$ is ranked lower than $j$ in $\mathcal{C}$" if $\exists\, k < l$ s.t. $(i, j) \in \mathcal{C}_k \times \mathcal{C}_l$.
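
As an illustration (not part of the original slides), here is a minimal Python sketch of a bucket order represented as an ordered list of disjoint sets, together with the induced partial order; the item labels and the example bucket order are arbitrary.

```python
from itertools import combinations

# A bucket order C = (C_1, ..., C_K) over n = 6 items, represented as an
# ordered list of disjoint sets whose union is {0, ..., 5}.
# (Items are 0-indexed here; the slides use {1, ..., n}.)
C = [{0, 3}, {1, 4, 5}, {2}]

def bucket_index(C, item):
    """Return the index k of the bucket C_k containing item (0-indexed)."""
    for k, bucket in enumerate(C):
        if item in bucket:
            return k
    raise ValueError(f"item {item} not in any bucket")

def ranked_lower(C, i, j):
    """True iff i is ranked lower than j in C, i.e. i lies in a strictly
    earlier bucket than j."""
    return bucket_index(C, i) < bucket_index(C, j)

n = sum(len(b) for b in C)
print("size K =", len(C), "shape =", tuple(len(b) for b in C))
# Pairs left incomparable by C are exactly the pairs inside a same bucket.
incomparable = [(i, j) for i, j in combinations(range(n), 2)
                if bucket_index(C, i) == bucket_index(C, j)]
print("incomparable pairs:", incomparable)
```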

Preliminaries (2/2)
$\mathcal{P}_\mathcal{C}$: set of all bucket distributions $P'$ associated with $\mathcal{C}$
◮ $P'$ distribution on $S_n$
◮ if $(i, j) \in \mathcal{C}_k \times \mathcal{C}_l$ ($k < l$), then $p'_{j,i} = 0$, where $p'_{i,j} = \mathbb{P}(\Sigma'(i) < \Sigma'(j))$ for $\Sigma' \sim P'$
◮ any $P' \in \mathcal{P}_\mathcal{C}$ is described by $d_\mathcal{C} = \prod_{k \le K} \#\mathcal{C}_k! - 1 \le n! - 1$ parameters
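
A short sketch of the parameter count, assuming the reading $d_\mathcal{C} = \prod_{k \le K} \#\mathcal{C}_k! - 1$ above (the dimension of the simplex over permutations consistent with $\mathcal{C}$); the example shapes are arbitrary.

```python
from math import factorial, prod

def bucket_dimension(shape):
    """Number of free parameters d_C = prod_k (#C_k)! - 1 of the set P_C of
    bucket distributions of the given shape (#C_1, ..., #C_K)."""
    return prod(factorial(size) for size in shape) - 1

n = 10
print(bucket_dimension([1] * n))   # size-n bucket order: 0 parameters (a Dirac mass)
print(bucket_dimension([n]))       # single bucket: n! - 1, no reduction at all
print(bucket_dimension([3, 3, 4])) # in between: 3! * 3! * 4! - 1 = 863
```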

Background on Consensus Ranking
Consensus ranking (or "ranking aggregation"): summarize permutations $\sigma_1, \ldots, \sigma_N$ by a consensus/median ranking $\sigma^* \in S_n$, obtained by solving:
$$\min_{\sigma \in S_n} \sum_{s=1}^{N} d(\sigma, \sigma_s).$$
If $\Sigma_1, \ldots, \Sigma_N$ are i.i.d. sampled from $P$ (Korba et al., 2017), solve:
$$\min_{\sigma \in S_n} \mathbb{E}_{\Sigma \sim P}\left[ d(\Sigma, \sigma) \right].$$
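
For small $n$, the consensus-ranking objective can be minimized by brute force. The sketch below (illustrative only, with a toy sample) enumerates all of $S_n$ and uses Kendall's $\tau$ distance, anticipating the next slide.

```python
from itertools import permutations

def kendall_tau(sigma, sigma_p):
    """Kendall's tau distance: number of pairwise disagreements.
    Rankings are tuples where sigma[i] is the rank given to item i."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (sigma[i] - sigma[j]) * (sigma_p[i] - sigma_p[j]) < 0)

def consensus(sample, d=kendall_tau):
    """Brute-force median ranking: argmin_sigma sum_s d(sigma, sigma_s).
    Only feasible for small n since it enumerates all n! candidates."""
    n = len(sample[0])
    return min(permutations(range(1, n + 1)),
               key=lambda sigma: sum(d(sigma, s) for s in sample))

# Toy sample of N = 4 rankings of n = 4 items (sigma[i] = rank of item i).
sample = [(1, 2, 3, 4), (1, 3, 2, 4), (2, 1, 3, 4), (1, 2, 4, 3)]
print(consensus(sample))  # -> (1, 2, 3, 4)
```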

Kemeny medians
Particular choice for the metric $d$:
◮ Kendall's $\tau$ distance: $d_\tau(\sigma, \sigma') = \sum_{i < j} \mathbb{I}\{(\sigma(i) - \sigma(j))(\sigma'(i) - \sigma'(j)) < 0\}$.
◮ Kemeny medians are solutions of $\min_{\sigma \in S_n} \mathbb{E}_{\Sigma \sim P}[d_\tau(\Sigma, \sigma)]$.
The Kemeny median $\sigma^*_P$ is unique if $P$ is strictly stochastically transitive (writing $p_{i,j} = \mathbb{P}(\Sigma(i) < \Sigma(j))$ for $\Sigma \sim P$):
◮ $p_{i,j} \ge 1/2$ and $p_{j,k} \ge 1/2 \;\Rightarrow\; p_{i,k} \ge 1/2$
◮ $p_{i,j} \ne 1/2$ for all $i < j$
In that case it is given by the Copeland ranking: $\sigma^*_P(i) = 1 + \sum_{j \ne i} \mathbb{I}\{p_{i,j} < 1/2\}$.
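
A minimal sketch of the Copeland ranking computed from a pairwise-probability matrix; the example matrix is made up for illustration and is strictly stochastically transitive.

```python
import numpy as np

def copeland_ranking(p):
    """Copeland ranking from a pairwise-probability matrix p, where
    p[i, j] = P(item i is ranked before item j).  Under strict stochastic
    transitivity this is the (unique) Kemeny median:
    sigma*(i) = 1 + #{j != i : p[i, j] < 1/2}."""
    n = p.shape[0]
    sigma = np.ones(n, dtype=int)
    for i in range(n):
        sigma[i] += sum(1 for j in range(n) if j != i and p[i, j] < 0.5)
    return sigma

# A strictly stochastically transitive example on n = 4 items (0-indexed).
p = np.array([[0.0, 0.9, 0.8, 0.7],
              [0.1, 0.0, 0.6, 0.6],
              [0.2, 0.4, 0.0, 0.55],
              [0.3, 0.4, 0.45, 0.0]])
print(copeland_ranking(p))  # -> [1 2 3 4]
```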

Bucket orders of size n
Consensus ranking is the extreme case of a bucket order $\mathcal{C}$ of size $n$:
◮ $\mathcal{C} = (\{\sigma^{*-1}(1)\}, \ldots, \{\sigma^{*-1}(n)\})$
◮ $\mathcal{P}_\mathcal{C} = \{\delta_{\sigma^*}\}$, hence dimension $d_\mathcal{C} = 0$
Problem: generalize this to bucket orders of arbitrary size and shape.

A Mass Transportation Approach
◮ Question: "How to quantify the approximation error between the original distribution $P$ and a bucket distribution $P' \in \mathcal{P}_\mathcal{C}$?"
◮ Our answer: the Wasserstein distance $W_{d,q}(P, P')$.
Definition
$$W_{d,q}(P, P') = \inf_{\Sigma \sim P,\, \Sigma' \sim P'} \mathbb{E}\left[ d^q(\Sigma, \Sigma') \right]$$
◮ Why: it generalizes consensus ranking. Indeed: $W_{d,1}(P, \delta_\sigma) = \mathbb{E}_{\Sigma \sim P}[d(\Sigma, \sigma)]$.
◮ Focus on $d = d_\tau$ and $q = 1$.
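
For distributions with small support, the Wasserstein distance can be computed exactly as a linear program over couplings. The sketch below is an illustration only (not the authors' code), assumes SciPy is available, and also checks the identity $W_{d,1}(P, \delta_\sigma) = \mathbb{E}_{\Sigma \sim P}[d(\Sigma, \sigma)]$ on a toy example over $S_3$.

```python
import numpy as np
from scipy.optimize import linprog

def kendall_tau(s, t):
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (s[i] - s[j]) * (t[i] - t[j]) < 0)

def wasserstein(P, Q):
    """W_{d_tau,1}(P, Q) by solving the optimal-transport LP exactly.
    P, Q: dicts mapping rankings (tuples) to probabilities.  Tiny supports only."""
    supp_P, supp_Q = list(P), list(Q)
    cost = np.array([[kendall_tau(a, b) for b in supp_Q] for a in supp_P])
    m, k = cost.shape
    # Equality constraints on the coupling: row sums = P, column sums = Q.
    A_eq = np.zeros((m + k, m * k))
    for a in range(m):
        A_eq[a, a * k:(a + 1) * k] = 1.0
    for b in range(k):
        A_eq[m + b, b::k] = 1.0
    b_eq = np.concatenate([[P[s] for s in supp_P], [Q[s] for s in supp_Q]])
    # Default bounds already constrain the coupling entries to be non-negative.
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq)
    return res.fun

# Sanity check of W_{d,1}(P, delta_sigma) = E[d(Sigma, sigma)].
P = {(1, 2, 3): 0.5, (2, 1, 3): 0.3, (3, 2, 1): 0.2}
sigma = (1, 2, 3)
print(wasserstein(P, {sigma: 1.0}))                          # ~0.9
print(sum(p * kendall_tau(s, sigma) for s, p in P.items()))  # 0.9
```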

Distortion measure
A bucket order $\mathcal{C}$ represents $P$ well if the distortion $\Lambda_P(\mathcal{C})$ is small.
Definition
$$\Lambda_P(\mathcal{C}) = \min_{P' \in \mathcal{P}_\mathcal{C}} W_{d_\tau, 1}(P, P')$$
Explicit expression for $\Lambda_P(\mathcal{C})$:
Proposition
$$\Lambda_P(\mathcal{C}) = \sum_{1 \le k < l \le K} \;\; \sum_{(i, j) \in \mathcal{C}_k \times \mathcal{C}_l} p_{j,i}.$$
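
The proposition translates directly into code: the distortion is the pairwise-probability mass that contradicts the bucket order. The sketch below reuses the same made-up matrix as in the Copeland example above.

```python
import numpy as np

def distortion(p, C):
    """Lambda_P(C) = sum over bucket pairs k < l and (i, j) in C_k x C_l of p[j, i],
    i.e. the probability mass of pairwise comparisons contradicting C."""
    total = 0.0
    for k in range(len(C)):
        for l in range(k + 1, len(C)):
            for i in C[k]:
                for j in C[l]:
                    total += p[j, i]
    return total

# Same made-up pairwise matrix as before (p[i, j] = P(i ranked before j)).
p = np.array([[0.0, 0.9, 0.8, 0.7],
              [0.1, 0.0, 0.6, 0.6],
              [0.2, 0.4, 0.0, 0.55],
              [0.3, 0.4, 0.45, 0.0]])
print(distortion(p, [{0}, {1}, {2}, {3}]))  # finest bucket order: 0.1+0.2+0.3+0.4+0.4+0.45 = 1.85
print(distortion(p, [{0, 1}, {2, 3}]))      # coarser: only cross-bucket mass counts -> 1.3
```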


Empirical setting
Training sample: $\Sigma_1, \ldots, \Sigma_N$ i.i.d. from $P$.
◮ Empirical pairwise probabilities:
$$\widehat{p}_{i,j} = \frac{1}{N} \sum_{s=1}^{N} \mathbb{I}\{\Sigma_s(i) < \Sigma_s(j)\}.$$
◮ Empirical distortion of any bucket order $\mathcal{C}$:
$$\widehat{\Lambda}_N(\mathcal{C}) = \sum_{i \prec_\mathcal{C} j} \widehat{p}_{j,i} = \Lambda_{\widehat{P}_N}(\mathcal{C}).$$
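
A minimal sketch of the empirical pairwise probabilities, assuming rankings are stored as an $(N, n)$ array with $\Sigma_s(i)$ in entry $(s, i)$; combined with the distortion function sketched earlier, this yields $\widehat{\Lambda}_N(\mathcal{C}) = \Lambda_{\widehat{P}_N}(\mathcal{C})$.

```python
import numpy as np

def empirical_pairwise(rankings):
    """hat p[i, j] = (1/N) * #{s : Sigma_s(i) < Sigma_s(j)} from an (N, n) array
    of full rankings, where rankings[s, i] is the rank item i gets in sample s."""
    R = np.asarray(rankings)
    # Broadcast comparison of ranks: entry (s, i, j) is 1 iff i beats j in sample s.
    wins = (R[:, :, None] < R[:, None, :]).astype(float)
    return wins.mean(axis=0)

rankings = [[1, 2, 3, 4],
            [1, 3, 2, 4],
            [2, 1, 3, 4],
            [1, 2, 4, 3]]
p_hat = empirical_pairwise(rankings)
print(np.round(p_hat, 2))
```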

Rate bound
The empirical distortion minimizer $\widehat{\mathcal{C}}_{K,\lambda}$ is a solution of:
$$\min_{\mathcal{C} \in \mathbf{C}_{K,\lambda}} \widehat{\Lambda}_N(\mathcal{C}),$$
where $\mathbf{C}_{K,\lambda}$ is the set of bucket orders $\mathcal{C}$ of size $K$ and shape $\lambda$ (i.e. $\#\mathcal{C}_k = \lambda_k$ for all $1 \le k \le K$).
Theorem
For all $\delta \in (0, 1)$, we have with probability at least $1 - \delta$:
$$\Lambda_P(\widehat{\mathcal{C}}_{K,\lambda}) - \inf_{\mathcal{C} \in \mathbf{C}_{K,\lambda}} \Lambda_P(\mathcal{C}) \;\le\; \beta(n, \lambda) \times \sqrt{\frac{\log(1/\delta)}{N}}.$$
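
The rate bound says nothing about computation; for toy problem sizes the empirical minimizer can be found by exhaustive search. An illustrative brute-force sketch, again with a made-up pairwise matrix:

```python
import numpy as np
from itertools import permutations

def distortion(p, C):
    return sum(p[j, i] for k in range(len(C)) for l in range(k + 1, len(C))
               for i in C[k] for j in C[l])

def minimize_distortion(p, shape):
    """Brute-force distortion minimizer over C_{K,lambda}: enumerate all
    orderings of the n items and cut them into buckets of the given sizes.
    Exponential in n, so for illustration on small problems only."""
    n = p.shape[0]
    best, best_val = None, np.inf
    for order in permutations(range(n)):
        C, start = [], 0
        for size in shape:
            C.append(set(order[start:start + size]))
            start += size
        val = distortion(p, C)
        if val < best_val:
            best, best_val = C, val
    return best, best_val

p = np.array([[0.0, 0.9, 0.8, 0.7],
              [0.1, 0.0, 0.6, 0.6],
              [0.2, 0.4, 0.0, 0.55],
              [0.3, 0.4, 0.45, 0.0]])
print(minimize_distortion(p, (2, 2)))  # -> ([{0, 1}, {2, 3}], ~1.3)
```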

The Strong Stochastic Transitive Case
Assume that $P$ is strongly (and strictly) stochastically transitive, i.e.:
$$p_{i,j} \ge 1/2 \text{ and } p_{j,k} \ge 1/2 \;\Rightarrow\; p_{i,k} \ge \max(p_{i,j}, p_{j,k}).$$
Theorem
(i) $\Lambda_P(\mathcal{C})$ has a unique minimizer over $\mathbf{C}_{K,\lambda}$, denoted $\mathcal{C}^*_{(K,\lambda)}$.
(ii) $\mathcal{C}^*_{(K,\lambda)}$ is the unique bucket order in $\mathbf{C}_{K,\lambda}$ agreeing with the Kemeny median.
Consequence: an agglomerative algorithm.
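
A minimal sketch of the consequence of point (ii): under strong/strict stochastic transitivity, the distortion minimizer of a given shape is obtained by cutting the Kemeny (here Copeland) ranking into consecutive buckets. This is a simplified stand-in for the agglomerative algorithm mentioned above, with a made-up strongly transitive matrix.

```python
import numpy as np

def optimal_buckets_sst(p, shape):
    """Under strong/strict stochastic transitivity, the optimal bucket order of a
    given shape agrees with the Kemeny median (Theorem (ii)): compute the
    Copeland/Kemeny ranking, then cut it into consecutive buckets of sizes lambda_k."""
    # Copeland scores: items sorted by number of pairwise wins (descending).
    wins = (p > 0.5).sum(axis=1)
    order = np.argsort(-wins)          # items from best to worst
    C, start = [], 0
    for size in shape:
        C.append(set(order[start:start + size].tolist()))
        start += size
    return C

# A strongly stochastically transitive matrix on 5 items (items 0 > 1 > ... > 4).
p = np.array([[0.0, 0.6, 0.7, 0.8, 0.9],
              [0.4, 0.0, 0.6, 0.7, 0.8],
              [0.3, 0.4, 0.0, 0.6, 0.7],
              [0.2, 0.3, 0.4, 0.0, 0.6],
              [0.1, 0.2, 0.3, 0.4, 0.0]])
print(optimal_buckets_sst(p, (2, 2, 1)))  # -> [{0, 1}, {2, 3}, {4}]
```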

Experiments
Sushi dataset (Kamishima, 2003):
◮ $n = 10$ sushi dishes
◮ $N = 5000$ full rankings
[Figure: sushi dataset — dimension $d_\mathcal{C}$ (log scale, roughly $10^1$ to $10^4$) versus distortion (0 to 20), for bucket orders of sizes $K$ ranging from 3 to 10.]
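
A hedged end-to-end sketch of this pipeline: estimate pairwise probabilities, cut the Copeland ranking into near-balanced buckets for each size $K$, and report dimension and distortion. Random placeholder rankings are used here instead of the actual sushi data (whose loading is not shown on the slides), and only near-balanced shapes are explored.

```python
import numpy as np
from math import factorial, prod

# Hypothetical input: an (N, n) array of full rankings, e.g. the sushi preference
# data (Kamishima, 2003), with rankings[s, i] = rank of item i in sample s.
rng = np.random.default_rng(0)
rankings = np.argsort(rng.random((5000, 10)), axis=1) + 1   # placeholder random data

p_hat = (rankings[:, :, None] < rankings[:, None, :]).astype(float).mean(axis=0)
order = np.argsort(-(p_hat > 0.5).sum(axis=1))              # Copeland ranking

def distortion(p, C):
    return sum(p[j, i] for k in range(len(C)) for l in range(k + 1, len(C))
               for i in C[k] for j in C[l])

n = p_hat.shape[0]
for K in range(n, 2, -1):                                   # bucket sizes K = 10, ..., 3
    sizes = [n // K + (k < n % K) for k in range(K)]        # near-balanced shape
    C, start = [], 0
    for size in sizes:
        C.append(set(order[start:start + size].tolist()))
        start += size
    dim = prod(factorial(s) for s in sizes) - 1
    print(K, dim, round(distortion(p_hat, C), 2))
```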

Thank you!
