Mixtures of Weighted Distance-Based Models for Ranking Data
Paul H. Lee ∗, Philip L. H. Yu
The University of Hong Kong
Outline of presentation
■ Introduction
■ Distance-Based Models for Ranking Data
■ Weighted Distance-based Models (with application)
■ Simulation Studies
■ Conclusions and Further Research
■ Question & Answer
Introduction
Introduction
■ What is ranking data?
  ◆ Rank a set of items
  ◆ Types of soft drinks
    Coke, 7-up, Fanta
  ◆ Political goals
  ◆ Election candidates
    World footballer of the year
Introduction
■ Notation used in the ranking literature
  ◆ $\pi$: ranking
    $\pi(i)$ is the rank assigned to item $i$
    $\pi = (2,4,1,3)$: item 1 ranks 2nd, item 2 ranks 4th
  ◆ $\pi^{-1}$: ordering
    $\pi^{-1}(i)$ is the item having rank $i$
    $\pi^{-1} = (2,4,1,3)$: item 2 ranks 1st, item 4 ranks 2nd
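The two notations are inverse permutations of each other. The sketch below converts a ranking into the corresponding ordering; the function name `ranking_to_ordering` is illustrative and not from the talk. Note that the slide uses the vector (2,4,1,3) twice as two separate illustrations; the code inverts the ranking example.

```python
def ranking_to_ordering(ranking):
    """ranking[i-1] is the rank of item i; the result lists items by rank."""
    ordering = [0] * len(ranking)
    for item, rank in enumerate(ranking, start=1):
        ordering[rank - 1] = item  # the item holding this rank
    return tuple(ordering)


def ordering_to_ranking(ordering):
    """Inverting a permutation twice returns the original, so reuse the same logic."""
    return ranking_to_ordering(ordering)


pi = (2, 4, 1, 3)                 # item 1 ranks 2nd, item 2 ranks 4th, ...
pi_inv = ranking_to_ordering(pi)  # item 3 ranks 1st, item 1 ranks 2nd, ...
print(pi_inv)
```

Applying `ordering_to_ranking` to the result recovers the original ranking.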
Examples of Ranking Data
■ Marketing research:
  ◆ Green and Rao (1972): to rank 15 breakfast snack food items including toast, donut, etc.
■ Travel behavior and mode of transportation:
  ◆ Beggs, et al. (1981), Hausman, et al. (1987): to rank order 16 car designs which differed over 9 attributes.
Examples of Ranking Data
■ Politics:
  ◆ Croon (1989): to rank 4 political goals: Order, Say, Price, and Freedom.
■ Horse racing:
  ◆ Lo et al. (1994): to predict the top two winning horses.
Types of Ranking Data
Given a set of $J$ items, there are two types of ranking data:
■ Complete rankings (rank all $J$ items)
■ Incomplete (or partial) rankings
  ◆ Top $q$ rankings (select the top $q$ items and rank them)
    When $q = 1$, top $q$ ranking = discrete choice
  ◆ Subset rankings (select a subset of $m$ items and rank them)
    When $m = 2$, subset ranking = paired comparison
    When $m = 3$, subset ranking = triple ranking
Problems of Interest
■ Graphical representation of ranking data
  ◆ visualize rankings given by judges, preferably in a low-dimensional space
  ◆ existing work: dual scaling (Nishisato, 1994), vector models (Tucker, 1960; Carroll, 1980; Yu and Chan, 2001), ideal point models (Coombs, 1950; De Soete, et al., 1986; Yu, Chung and Leung, 2008), polyhedron representation (Thompson, 2003)
Problems of Interest
■ Factor analysis
  ◆ identify latent factors that affect ranking decisions
  ◆ existing work: Yu, Lam and Lo (2005)
■ Cluster analysis / Latent class analysis
  ◆ find groups of judges who share similar rank-order preferences
  ◆ recent work: Murphy and Martin (2003), Lee and Yu (2010)
Problems of Interest
■ Modelling
  ◆ determine the probabilistic structure behind the probability of observing a ranking
  ◆ existing work: extensive; see Marden (1995) for a review, and Yu (2000)
  ◆ Different types of statistical models for ranking data
    ■ Order-statistics
    ■ Paired comparison
    ■ Distance-based
    ■ Multistage
  ◆ This talk: a weighted distance-based model?
  ◆ Mixture models?
Introduction
■ Properties of a distance measure
  ◆ $d(\pi_i, \pi_i) = 0$
  ◆ $d(\pi_i, \pi_j) = d(\pi_j, \pi_i)$
  ◆ $d(\pi_i, \pi_j) > 0$ if $\pi_i \neq \pi_j$
■ Additional property of a metric
  ◆ Triangle inequality: $d(\pi_i, \pi_k) \leq d(\pi_i, \pi_j) + d(\pi_j, \pi_k)$
Distance-Based Models for Ranking Data
Distance-Based Models for Ranking Data
■ Model assumption:
  ◆ The probability of observing a ranking $\pi$ depends on its distance to the modal ranking $\pi_0$
  ◆ The effect of distance is controlled by the dispersion parameter $\lambda$
■ Model specification:
  ◆ $P(\pi \mid \lambda, \pi_0) = C(\lambda)\, e^{-\lambda d(\pi, \pi_0)}$
  ◆ $\lambda > 0$ is required for identifiability
  ◆ $d(\pi, \pi_0)$ is the distance between $\pi$ and $\pi_0$
  ◆ $C(\lambda)$ is the proportionality constant
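For a small number of items the model can be evaluated by listing every ranking. The following is a minimal sketch under that assumption; it plugs in Kendall's tau (one of the distances defined on the following slides) as the default `dist`, and the function names are illustrative rather than the authors' code.

```python
import math
from itertools import permutations


def kendall_tau(pi, pi0):
    """Number of item pairs ordered differently by pi and pi0."""
    k = len(pi)
    return sum(1 for i in range(k) for j in range(i + 1, k)
               if (pi[i] - pi[j]) * (pi0[i] - pi0[j]) < 0)


def distance_model_pmf(lam, pi0, dist=kendall_tau):
    """P(pi | lam, pi0) = C(lam) * exp(-lam * d(pi, pi0)) for every ranking pi."""
    rankings = list(permutations(range(1, len(pi0) + 1)))
    weights = [math.exp(-lam * dist(pi, pi0)) for pi in rankings]
    total = sum(weights)  # 1 / C(lam)
    return {pi: w / total for pi, w in zip(rankings, weights)}


pmf = distance_model_pmf(lam=0.8, pi0=(1, 2, 3, 4))
mode = max(pmf, key=pmf.get)
print(mode, round(pmf[mode], 4))
```

With $\lambda > 0$ the modal ranking $\pi_0$ receives the highest probability, and larger $\lambda$ concentrates more mass around it.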
Distance-Based Models for Ranking Data
■ Different types of distance
  ◆ Kendall’s tau
    $T(\pi, \pi_0) = \sum_{i<j} I\{[\pi(i) - \pi(j)][\pi_0(i) - \pi_0(j)] < 0\}$
    Minimum number of pairwise adjacent transpositions needed to transform $\pi$ into $\pi_0$
    Used in Mallows’ $\phi$-model (1957):
    $P(\pi \mid \phi, \pi_0) = C(\phi)\, \phi^{T(\pi, \pi_0)}$
  ◆ Spearman’s rho square
    $R^2(\pi, \pi_0) = \sum_i [\pi(i) - \pi_0(i)]^2$
    Used in Mallows’ $\theta$-model (1957):
    $P(\pi \mid \theta, \pi_0) = C(\theta)\, \theta^{R^2(\pi, \pi_0)}$
    A distance but not a metric
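The claim that Spearman's rho square is a distance but not a metric can be checked by brute force on three items. The sketch below searches all triples of rankings for violations of the triangle inequality; the helper `triangle_violations` is an illustrative name, not something from the talk.

```python
from itertools import permutations


def kendall_tau(pi, pi0):
    """Count item pairs (i, j) that pi and pi0 order in opposite ways."""
    k = len(pi)
    return sum(1 for i in range(k) for j in range(i + 1, k)
               if (pi[i] - pi[j]) * (pi0[i] - pi0[j]) < 0)


def spearman_rho_sq(pi, pi0):
    """Sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(pi, pi0))


def triangle_violations(dist, k):
    """All triples (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c)."""
    rankings = list(permutations(range(1, k + 1)))
    return [(a, b, c) for a in rankings for b in rankings for c in rankings
            if dist(a, c) > dist(a, b) + dist(b, c)]


print(len(triangle_violations(kendall_tau, 3)))      # no violations expected
print(len(triangle_violations(spearman_rho_sq, 3)))  # at least one violation
```

One violating triple is $a = (1,2,3)$, $b = (2,1,3)$, $c = (3,1,2)$: $R^2(a,c) = 6$ while $R^2(a,b) + R^2(b,c) = 2 + 2 = 4$.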
Distance-Based Models for Ranking Data
■ Different types of distance
  ◆ Spearman’s rho
    $R(\pi, \pi_0) = \left( \sum_i [\pi(i) - \pi_0(i)]^2 \right)^{0.5}$
    A metric
  ◆ Spearman’s footrule
    $F(\pi, \pi_0) = \sum_i |\pi(i) - \pi_0(i)|$
  ◆ Cayley’s distance
    $C(\pi, \pi_0)$ = minimum number of transpositions needed to transform $\pi$ into $\pi_0$
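These three distances are short to compute. In the sketch below, Cayley's distance uses the standard identity that the minimum number of transpositions equals the number of items minus the number of cycles of the permutation taking $\pi_0$ to $\pi$; that identity is a known result and is not stated on the slide.

```python
import math


def spearman_rho(pi, pi0):
    """Euclidean distance between the two rank vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pi, pi0)))


def footrule(pi, pi0):
    """Sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(pi, pi0))


def cayley(pi, pi0):
    """k minus the number of cycles of the map pi0(i) -> pi(i)."""
    sigma = {r0: r for r, r0 in zip(pi, pi0)}
    seen, cycles = set(), 0
    for start in sigma:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # walk one full cycle
            seen.add(x)
            x = sigma[x]
    return len(pi) - cycles


pi, pi0 = (2, 4, 1, 3), (1, 2, 3, 4)
print(spearman_rho(pi, pi0), footrule(pi, pi0), cayley(pi, pi0))
```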
Distance-Based Models for Ranking Data
■ Different types of distance
  ◆ The proportionality constant $C(\lambda)$ is difficult to compute
  ◆ A closed-form solution is available only for:
    Kendall’s tau
    Cayley’s distance
  ◆ Otherwise it can be computed numerically by
    $C(\lambda) = \dfrac{1}{\sum_{i=1}^{k!} e^{-\lambda d(\pi_i, \pi_0)}}$
■ The sum runs over all $k!$ rankings, so computation time grows very rapidly as the number of items $k$ increases
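The sketch below compares the brute-force sum over all $k!$ rankings with a closed form for Kendall's tau. The closed form used here, $1/C(\lambda) = \prod_{j=1}^{k} (1 - e^{-j\lambda})/(1 - e^{-\lambda})$, is the standard result for Mallows' $\phi$-model and is not quoted from the slides; the function names are illustrative.

```python
import math
from itertools import permutations


def kendall_tau(pi, pi0):
    """Count item pairs ordered differently by pi and pi0."""
    k = len(pi)
    return sum(1 for i in range(k) for j in range(i + 1, k)
               if (pi[i] - pi[j]) * (pi0[i] - pi0[j]) < 0)


def c_brute_force(lam, k):
    """C(lam) by summing exp(-lam * d) over all k! rankings."""
    pi0 = tuple(range(1, k + 1))
    return 1.0 / sum(math.exp(-lam * kendall_tau(pi, pi0))
                     for pi in permutations(pi0))


def c_closed_form(lam, k):
    """C(lam) for Kendall's tau via the product formula (requires lam > 0)."""
    q = math.exp(-lam)
    total = 1.0
    for j in range(1, k + 1):
        total *= (1 - q ** j) / (1 - q)
    return 1.0 / total


print(c_brute_force(0.7, 5), c_closed_form(0.7, 5))
```

The brute-force version visits 120 rankings for 5 items and 3,628,800 for 10, while the product formula needs only $k$ factors.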
Distance-Based Models for Ranking Data
■ $\phi$-component model
  ◆ Extension of Mallows’ $\phi$-model (Fligner and Verducci, 1988)
  ◆ For a ranking of $k$ items, Kendall’s tau can be decomposed as
    $T(\pi, \pi_0) = \sum_{i=1}^{k-1} V_i$
    All $V_i$ are independent
■ $V_1 = m$ means the $(m+1)$th best item, with reference to $\pi_0$, is chosen first in $\pi$
■ This item is dropped and will not be considered anymore
■ $V_2 = m$ means the $(m+1)$th best item among the remaining items is chosen
■ The process is repeated until all items are ranked
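The staged selection described above can be sketched directly: at each stage, count how many of the remaining items $\pi_0$ ranks ahead of the item that $\pi$ picks. The function name `stage_components` is illustrative, and the final loop checks numerically that the components sum to Kendall's tau for every ranking of four items.

```python
from itertools import permutations


def kendall_tau(pi, pi0):
    """Count item pairs ordered differently by pi and pi0."""
    k = len(pi)
    return sum(1 for i in range(k) for j in range(i + 1, k)
               if (pi[i] - pi[j]) * (pi0[i] - pi0[j]) < 0)


def stage_components(pi, pi0):
    """Return [V_1, ..., V_{k-1}] for ranking pi relative to pi0."""
    k = len(pi)
    # Item indices (0-based) in the order pi ranks them: 1st, 2nd, ...
    picked_in_order = sorted(range(k), key=lambda item: pi[item])
    remaining = set(range(k))
    v = []
    for item in picked_in_order[:-1]:  # the last stage has no choice left
        # Remaining items that pi0 prefers to the one picked at this stage.
        v.append(sum(1 for other in remaining if pi0[other] < pi0[item]))
        remaining.remove(item)
    return v


pi0 = (1, 2, 3, 4)
print(stage_components((2, 4, 1, 3), pi0))  # [2, 0, 1]
for pi in permutations(pi0):
    assert sum(stage_components(pi, pi0)) == kendall_tau(pi, pi0)
```

For $\pi = (2,4,1,3)$ the first pick is item 3, which is the third best under $\pi_0$, so $V_1 = 2$.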
Distance-Based Models for Ranking Data
■ $\phi$-component model
  ◆ The $V_i$ can be weighted:
    $\sum_{i=1}^{k-1} \lambda_i V_i$
  ◆ The resulting model is:
    $P(\pi \mid \lambda, \pi_0) = C(\lambda)\, e^{-\sum_{i=1}^{k-1} \lambda_i V_i}$
    $\lambda = \{\lambda_i,\ i = 1, \ldots, k-1\}$
  ◆ Also named the $k-1$ parameter model
  ◆ Under the re-parameterization $\phi_i = e^{-\lambda_i}$, $i = 1, \ldots, k-1$, the resulting model becomes:
    $P(\pi \mid \phi, \pi_0) = C(\phi) \prod_{i=1}^{k-1} \phi_i^{V_i}$
Distance-Based Models for Ranking Data
■ The model has a closed-form proportionality constant if the $V_i$ are independent
■ Only Kendall’s tau and Cayley’s distance can be decomposed in such a form
■ The extension based on Cayley’s distance is named the cyclic structure model
■ The model based on the decomposition of Kendall’s tau is more commonly used than the one based on Cayley’s distance
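Independence of the stage components lets the normalizing sum factorize stage by stage. The sketch below assumes the Kendall-based weighted model with $V_i \in \{0, \ldots, k-i\}$, giving $1/C(\lambda) = \prod_{i=1}^{k-1} \sum_{m=0}^{k-i} e^{-\lambda_i m}$; this expression is the standard factorization, not a formula quoted from the slides, and all function names are illustrative.

```python
import math
from itertools import permutations


def stage_components(pi, pi0):
    """Return [V_1, ..., V_{k-1}] for ranking pi relative to pi0."""
    k = len(pi)
    picked_in_order = sorted(range(k), key=lambda item: pi[item])
    remaining = set(range(k))
    v = []
    for item in picked_in_order[:-1]:
        v.append(sum(1 for other in remaining if pi0[other] < pi0[item]))
        remaining.remove(item)
    return v


def c_brute_force(lams, pi0):
    """C(lambda) by summing exp(-sum_i lam_i * V_i) over all k! rankings."""
    total = 0.0
    for pi in permutations(range(1, len(pi0) + 1)):
        v = stage_components(pi, pi0)
        total += math.exp(-sum(l * x for l, x in zip(lams, v)))
    return 1.0 / total


def c_closed_form(lams):
    """C(lambda) as a product of k-1 short geometric sums."""
    k = len(lams) + 1
    total = 1.0
    for i, lam in enumerate(lams, start=1):
        total *= sum(math.exp(-lam * m) for m in range(k - i + 1))
    return 1.0 / total


lams = (1.2, 0.5, 0.3)  # one weight per stage, k = 4 items
print(c_brute_force(lams, (2, 4, 1, 3)), c_closed_form(lams))
```

Setting all $\lambda_i$ equal reduces the weighted model to the unweighted Kendall-based model.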