A Generative Model for Rank Data Based on an Insertion Sorting Algorithm

J. Jacques & C. Biernacki
Laboratory of Mathematics, UMR CNRS 8524 & University Lille 1 (France)

COMPSTAT'2010
Outline

1. Motivation
   - Importance of rank data
   - Models for rank data
2. The Insertion Sorting Rank model
   - Formalization
   - Properties
   - Estimation of the model parameters
3. Numerical illustration
   - Comparison of ISR and the Mallows Φ model
   - A specificity of ISR: the initial rank σ
4. Concluding remarks
Importance of rank data: Ranking and ordering notations

Objects to rank: three holiday destinations, O1 = Countryside, O2 = Mountain and O3 = Sea.

Rank notations:
- Unformalized: first Sea, second Countryside, and last Mountain
- Ordering: x = (3, 1, 2) = (O3, O1, O2), i.e. the objects listed from 1st to 3rd
- Ranking: x⁻¹ = (2, 3, 1), i.e. the ranks (2nd, 3rd, 1st) received by O1, O2, O3
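The two notations are simply inverse permutations of each other. The minimal sketch below (my own illustration, not from the slides; the helper name is hypothetical) makes the conversion explicit for the holiday example.

```python
def ordering_to_ranking(x):
    """Invert a permutation given in ordering notation (1-based object labels)."""
    ranking = [0] * len(x)
    for position, obj in enumerate(x, start=1):
        ranking[obj - 1] = position                   # object `obj` received this rank
    return ranking

x = [3, 1, 2]                  # ordering: Sea first, Countryside second, Mountain last
print(ordering_to_ranking(x))  # -> [2, 3, 1], the ranking notation x^(-1)
```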
Importance of rank data: Interest of rank data

Human activities involving preferences, attitudes or choices: web page ranking, sport, sociology, politics, economics, educational testing, biology, psychology, marketing, ...

Rank data often result from a transformation of other kinds of data!
Models for rank data: A model of reference, the Mallows Φ model (~1950)

pr(x; μ, θ) ∝ exp(−θ d_K(x, μ))

- μ = (μ1, ..., μm): rank of reference parameter (m objects)
- d_K(x, μ): Kendall distance between x = (x1, ..., xm) and μ
- θ ∈ R+: dispersion parameter
  - θ > 0: μ is the mode and dispersion decreases as θ grows
  - θ = 0: uniformity (maximum dispersion)

An interesting model:
- many other models are linked with it
- other distances can be retained (Cayley, ...)
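As a concrete illustration, here is a minimal brute-force sketch of the Mallows Φ probability (my own code, not the authors'; the normalizing constant is obtained by enumerating all m! permutations rather than via the known closed form, and both arguments are assumed to be permutations written in the same notation).

```python
import math
from itertools import permutations

def kendall_distance(x, mu):
    """Number of discordant pairs between two permutations given in the same notation."""
    m = len(x)
    return sum(1 for i in range(m) for j in range(i + 1, m)
               if (x[i] - x[j]) * (mu[i] - mu[j]) < 0)

def mallows_phi(x, mu, theta):
    """Mallows Phi probability, normalized by brute force over all m! permutations."""
    num = math.exp(-theta * kendall_distance(x, mu))
    denom = sum(math.exp(-theta * kendall_distance(s, mu))
                for s in permutations(sorted(mu)))
    return num / denom

mu = [1, 2, 4, 3]
print(mallows_phi(mu, mu, theta=1.0))  # the mode gets the highest probability
print(mallows_phi(mu, mu, theta=0.0))  # uniform case: 1/4! ≈ 0.0417
```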
Models for rank data: Motivation for an alternative model

Two fundamental hypotheses:
1. x results from a sorting algorithm based on paired comparisons
2. Differences between x and μ only result from bad paired comparisons

The Mallows Φ model can then be interpreted as a sorting algorithm in which all pairwise comparisons are performed.

Minimizing errors thus amounts to minimizing the number of paired comparisons: if m ≤ 10, the insertion sorting algorithm is the one to retain (a toy generative sketch follows).

The present work: formalize, study, estimate and experiment with such a new model...
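To make this generative reading concrete, here is a toy simulation of a sorting algorithm with error-prone paired comparisons. It is my own sketch under explicit assumptions (objects presented in an initial order σ, inserted one by one while scanning the growing list from the left), not necessarily the authors' exact algorithm.

```python
import random

def noisy_insertion_sort(sigma, mu, p, rng=random):
    """
    Toy generative sketch (an assumed scheme): objects are presented in the
    initial order sigma and inserted one by one into a growing list, scanning
    it from the left; each paired comparison agrees with the reference order
    mu with probability p, so the returned order x may deviate from mu.
    """
    pos_mu = {obj: i for i, obj in enumerate(mu)}
    result = []
    for obj in sigma:
        insert_at = len(result)                       # default: append at the end
        for j, c in enumerate(result):
            truly_before = pos_mu[obj] < pos_mu[c]    # what mu says about the pair
            says_before = truly_before if rng.random() < p else not truly_before
            if says_before:                           # stop scanning, insert here
                insert_at = j
                break
        result.insert(insert_at, obj)
    return result

random.seed(1)
mu, sigma = [1, 2, 3, 4], [2, 4, 1, 3]
print([tuple(noisy_insertion_sort(sigma, mu, p=0.95)) for _ in range(5)])
# most draws should recover (1, 2, 3, 4); comparison errors yield nearby orders
```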
Formalization: Notations

- x = (x1, ..., xm): observed rank
- μ = (μ1, ..., μm): rank of reference parameter (the "true" rank)
- p ∈ [0, 1]: probability of a good paired comparison (parameter)
- σ = (σ1, ..., σm): initial rank (latent data!)

Example: μ = (1, 2, 3) and σ = (1, 3, 2)
Formalization: Model expression

- good(x, σ, μ): total number of good paired comparisons
- bad(x, σ, μ): total number of bad paired comparisons

Conditional on the initial rank:
pr(x | σ; μ, p) = p^{good(x,σ,μ)} (1 − p)^{bad(x,σ,μ)}

But σ is latent; marginalizing over the uniform prior p(σ) = 1/m!:
pr(x; μ, p) = (1/m!) Σ_σ pr(x | σ; μ, p)
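Below is a brute-force sketch of this likelihood under the same assumed left-to-right insertion scheme as above (the authors' exact definition of the good/bad counts may differ, e.g. with a binary-search insertion): the good and bad comparisons are read off the insertion trajectory that σ and x jointly determine, and the marginal sums over all m! values of σ.

```python
import math
from itertools import permutations

def insertion_counts(x, sigma, mu):
    """
    Count (good, bad) paired comparisons under one assumed insertion scheme:
    objects are taken in the order sigma and inserted into the growing list
    by scanning it from the left; the insertion point is the one consistent
    with the observed rank x, and a comparison is 'good' when its outcome
    agrees with the reference rank mu.  Ranks are lists of object labels
    (ordering notation).
    """
    pos_x = {o: i for i, o in enumerate(x)}
    pos_mu = {o: i for i, o in enumerate(mu)}
    good = bad = 0
    current = []                                  # objects inserted so far, in x-order
    for obj in sigma:
        insert_at = sum(pos_x[c] < pos_x[obj] for c in current)
        # the scan said "after" for current[0..insert_at-1], then "before" (if any)
        for j, c in enumerate(current[:insert_at + 1]):
            said_after = j < insert_at
            truly_after = pos_mu[obj] > pos_mu[c]
            if said_after == truly_after:
                good += 1
            else:
                bad += 1
        current.insert(insert_at, obj)
    return good, bad

def isr_like_prob(x, mu, p):
    """pr(x; mu, p) = (1/m!) * sum over sigma of p^good * (1 - p)^bad."""
    total = 0.0
    for sigma in permutations(mu):
        g, b = insertion_counts(x, list(sigma), mu)
        total += p ** g * (1 - p) ** b
    return total / math.factorial(len(mu))

mu, p = [1, 2, 3], 0.8
dist = {x: isr_like_prob(list(x), mu, p) for x in permutations(mu)}
for x, pr in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(x, round(pr, 4))
print("total:", round(sum(dist.values()), 4))     # a proper distribution: sums to 1
```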
Properties: Properties of the ISR model

Well-behaved model:
- If p > 1/2, μ is the mode and μ̄ (the reverse rank of μ) is the anti-mode
- pr(μ; μ, p) − pr(x; μ, p) is an increasing function of p
- (μ, p) is identifiable if p > 1/2
- Uniform distribution when p = 1/2

Space reduction for p:
- Symmetry pr(x; μ̄, 1 − p) = pr(x; μ, p), so p can be restricted to [1/2, 1]
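Using the toy likelihood sketched in the previous section (isr_like_prob from the block above), the symmetry property can be checked numerically; μ̄ is taken here as the reversed reference rank. This is only an illustration under my assumed counting scheme, not a proof for the authors' exact model.

```python
# Numerical check of pr(x; mu_bar, 1 - p) == pr(x; mu, p) with the toy likelihood above
mu, p = [1, 2, 4, 3], 0.8
mu_bar = mu[::-1]                                 # the reversed reference rank
for x in permutations(mu):
    assert abs(isr_like_prob(list(x), mu, p)
               - isr_like_prob(list(x), mu_bar, 1 - p)) < 1e-9
print("symmetry holds for every x, so p can be restricted to [1/2, 1]")
```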
Estimation of the model parameters: The EM algorithm

Maximizing the likelihood from the incomplete data (x_1, ..., x_n):

E step:
t_iσ = pr(σ | x_i; μ, p) = pr(x_i | σ; μ, p) / Σ_s pr(x_i | s; μ, p)

M step:
- μ⁺ obtained by browsing half of the rank space (thanks to the symmetry property)
- p⁺ = [ Σ_{i=1..n} Σ_σ t_iσ good(x_i, σ, μ) ] / [ Σ_{i=1..n} Σ_σ t_iσ (good(x_i, σ, μ) + bad(x_i, σ, μ)) ]

The candidate values of μ can also be restricted to a stochastic subset of (x_1, ..., x_n) chosen according to the empirical frequencies (a brute-force sketch follows).
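Here is a brute-force EM sketch matching these update formulas. It reuses insertion_counts() and isr_like_prob() from the earlier sketch, browses all candidate μ instead of half the space, uses no stochastic restriction of the candidates, and keeps my assumed comparison-counting scheme; the data below are made up for illustration.

```python
import math
from itertools import permutations

def em_fit(data, m, n_iter=50):
    """Brute-force EM over all candidate mu; returns (mu_hat, p_hat, log-likelihood)."""
    best = None
    for mu in permutations(range(1, m + 1)):        # candidate reference ranks
        mu = list(mu)
        p = 0.75                                    # arbitrary start in (1/2, 1)
        for _ in range(n_iter):
            num = den = 0.0
            for x in data:
                # E step: t_sigma = pr(sigma | x) proportional to pr(x | sigma; mu, p)
                terms = []
                for sigma in permutations(mu):
                    g, b = insertion_counts(x, list(sigma), mu)
                    terms.append((p ** g * (1 - p) ** b, g, b))
                z = sum(w for w, _, _ in terms)
                # M step accumulators: p+ = sum t*good / sum t*(good + bad)
                for w, g, b in terms:
                    num += (w / z) * g
                    den += (w / z) * (g + b)
            p = num / den
        ll = sum(math.log(isr_like_prob(x, mu, p)) for x in data)
        if best is None or ll > best[2]:
            best = (mu, p, ll)
    return best

data = [[1, 2, 3], [1, 2, 3], [2, 1, 3], [1, 3, 2]]   # toy sample of observed ranks
mu_hat, p_hat, ll = em_fit(data, m=3)
print(mu_hat, round(p_hat, 3), round(ll, 2))          # estimated mu, p and log-likelihood
```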
Comparison of ISR and Mallows Φ: Five real data sets

- Football (quizz: yes, m = 4, n = 40, μ* = (1,2,4,3)): rank the four national football teams (France, Germany, Brazil, Italy) according to increasing number of victories in the football World Cup.
- Cinema (quizz: yes, m = 4, n = 40, μ* = (3,2,4,1)): rank these Quentin Tarantino movies chronologically (Inglourious Basterds, Pulp Fiction, Reservoir Dogs, Jackie Brown).
- Rugby 4N (quizz: no, m = 4, n = 20, μ* = none): results of the Four Nations rugby league from 1910 to 1999, except tied years (England, Scotland, Ireland, Wales).
- Word association (quizz: yes, m = 5, n = 98, μ* = none): rank five words (Thought, Play, Theory, Dream, Attention) from least to most associated with the target word "Idea".
- Sports (quizz: yes, m = 7, n = 130, μ* = none): rank seven sports (Baseball, Football, Basketball, Tennis, Cycling, Swimming, Jogging) according to preference in participating.
Comparison of ISR and Mallows Φ: Results

Data set          Model  μ̂                p̂ / θ̂   L          p-value  #μ   Time (s)
Football          ISR    (1,2,4,3)        0.834    -89.58     0.001    1    1.6
Football          Φ      (1,2,4,3)        1.093    -90.22     0.001    1    3.0
Cinema            ISR    (4,3,2,1)        0.723    -112.99    0.042    14   4.2
Cinema            Φ      (4,3,2,1)        0.627    -113.16    0.029    2    7.3
Rugby 4N          ISR    (2,4,1,3)        0.681    -59.53     0.538    12   2.7
Rugby 4N          Φ      (2,4,1,3)        0.528    -59.18     0.395    2    7.0
Word association  ISR    (2,5,4,3,1)      0.879    -283.00    0.001    1    6.0
Word association  Φ      (2,5,4,3,1)      1.432    -252.57    0.019    1    19.0
Sports            ISR    (1,3,2,4,5,7,6)  0.564    -1103.50   0.999    1    1353.1
Sports            Φ      (1,3,4,2,5,6,7)  0.080    -1104.24   0.045    11   15842

- Both models are hard competitors
- Computational feasibility, even for m = 7
- Efficiency of the μ space restriction (for both models)
- Consistency in the meaning of p̂ and θ̂: p̂_football > p̂_cinema and θ̂_football > θ̂_cinema
- Both models often select the same μ̂, except for "Sports": ISR more coherent?
- The parameter p of ISR is easier to understand
A specificity of ISR (initial rank σ): ISR detects quizz or no-quizz data through σ̂!

pr(σ_1 = ... = σ_n = s | x_1, ..., x_n, σ_1 = ... = σ_n; μ̂, p̂)

[Figure: for each data set (Football, Cinema, Word, Sports, and Rugby 4N), a bar plot of this probability against the candidate common initial rank s (probability vs. rank); Rugby 4N is the no-quizz case.]
Concluding remarks: Summary of the ISR proposal

- Optimality when m ≤ 10: minimizes the number of comparison errors
- Meaningful parameters
- The initial rank σ is taken into account and is meaningful
- Good results when compared to the Mallows Φ model
- Computationally feasible for m ≤ 7 in R, probably up to 10 with C
- Easy estimation with an EM algorithm
- Efficient starting strategy to avoid the combinatorics on μ

Future work:
- m ≤ 10: try non-optimal but realistic sorting algorithms
- m > 10: which sorting algorithm? At what computational cost?