Analysis of sorting data using multiple correspondence analysis and a related method
E.M. Qannari, Ph. Courcoux, V. Cariou
ONIRIS, Nantes, F-44322, France
Sorting data: Procedure
n stimuli evaluated by m subjects: “Please, sort the stimuli in as many groups as you consider necessary with the understanding that stimuli in the same group are perceived as similar.”
[Figure: illustration of the groups formed by Subject 1, Subject 2, ..., Subject m, with labels such as Acid, Salty, Fresh, Sweet, Bitter.]
General setting and notations
• Subject j (j = 1, ..., m) sorts the n stimuli into $K_j$ groups.
• This partition is coded by an n × $K_j$ matrix of group indicators, $X_j$.
• The data consist of m categorical variables, represented by their indicator matrices $X_1, X_2, \ldots, X_m$.
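The coding of one subject's sort into indicator variables can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `indicator_matrix` is hypothetical, and the example labels are Subject 1's sort from the beer data of Abdi et al. (2007) used in this deck.

```python
def indicator_matrix(labels):
    """Code one subject's sort as an n x K_j matrix of 0/1 group indicators.

    `labels` holds the group label given to each stimulus (hypothetical helper).
    """
    categories = sorted(set(labels))
    matrix = [[1 if lab == cat else 0 for cat in categories] for lab in labels]
    return matrix, categories


# Subject 1's sort of the eight beers (group label per beer).
subject1 = [1, 4, 3, 4, 1, 2, 1, 5]
X1, groups1 = indicator_matrix(subject1)

# Column sums give the number of beers in each group.
group_sizes = [sum(col) for col in zip(*X1)]
print(groups1)      # [1, 2, 3, 4, 5]
print(group_sizes)  # [3, 1, 1, 2, 1]
```

Each row contains exactly one 1, so the columns of an uncentered indicator matrix always sum to a vector of ones.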
Beer data
Data from Abdi H., Chollet S., Valentin D. and Chréa C. (2007). Analysing assessors and products in sorting tasks: DISTATIS, theory and applications. Food Quality and Preference.
Data from Abdi et al. (2007)
• The data relate to an experiment where ten consumers were instructed to sort eight commercial beers. Each entry is the label of the group in which the subject placed the beer.

 #  Beer             Subj1  Subj2  Subj3  Subj4  Subj5  Subj6  Subj7  Subj8  Subj9  Subj10
 1  Affligen             1      4      3      4      1      1      2      2      1       3
 2  Budweiser            4      5      2      5      2      3      1      1      4       3
 3  BucklerBlonde        3      1      2      3      2      4      3      1      1       2
 4  Killian              4      2      3      3      1      1      1      2      1       4
 5  StLandelin           1      5      3      5      2      1      1      2      1       3
 6  BucklerHighland      2      3      1      1      3      5      4      4      3       1
 7  FruitDefendu         1      4      3      4      1      1      2      2      2       4
 8  EKU28                5      2      4      2      4      2      5      3      4       5
Discrimination indices and MCA
• Consider a quantitative variable z and a categorical variable $X_j$. The discrimination index $\eta^2(z/j)$ is the ratio of the between-groups variance to the total variance of z with respect to the groups of $X_j$.
• We seek z so as to maximize:
$$I(z) = \sum_{j=1}^{m} \eta^2(z/j)$$
• It is known that this problem leads to MCA.
• Subsequent z variables (factors) are sought following the same strategy, under orthogonality constraints.
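The discrimination index and the criterion I(z) can be computed directly from their definitions. The sketch below is illustrative only: the function names are hypothetical, variances use the divisor n, and the toy values of z and the two partitions are made up for the example.

```python
def discrimination_index(z, labels):
    """Between-groups to total variance ratio of z for one categorical variable."""
    n = len(z)
    mean = sum(z) / n
    total = sum((v - mean) ** 2 for v in z) / n
    groups = {}
    for value, label in zip(z, labels):
        groups.setdefault(label, []).append(value)
    between = sum(
        len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values()
    ) / n
    return between / total


def criterion(z, partitions):
    """I(z): sum of the discrimination indices over the m categorical variables."""
    return sum(discrimination_index(z, labels) for labels in partitions)


# Toy example (made-up values): four stimuli, two subjects.
z = [0.0, 2.0, 4.0, 6.0]
partitions = [["a", "a", "b", "b"], ["a", "b", "a", "b"]]

eta2_first = discrimination_index(z, partitions[0])   # 0.8
eta2_second = discrimination_index(z, partitions[1])  # 0.2
total_index = criterion(z, partitions)
print(eta2_first, eta2_second, total_index)
```

A variable z that is constant within each group of a partition has a discrimination index of 1 for that partition, which is the upper bound of each term.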
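Standardized MCA
• Alternatively, each discrimination index is weighted by the inverse of the number of groups $K_j$:
$$I(z) = \sum_{j=1}^{m} \frac{1}{K_j} \eta^2(z/j)$$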
MCA applied to beer data
[Figure: two panels showing the representation of the beers on axes 1 & 2 and on axes 3 & 4.]
Alternative method: maximizing the between-groups variances
• $X = [X_1, X_2, \ldots, X_m]$ (the indicator variables are assumed to be centered).
• Let $z = Xu$ and denote by $B(z/j)$ the between-groups variance of z with respect to $X_j$.
• We define the total between-groups variance as:
$$B(z) = \sum_{j=1}^{m} B(z/j)$$
An alternative method to MCA
• We can show that the vector of loadings u is the eigenvector, associated with the largest eigenvalue, of the matrix:
$$X^T \left[ \sum_{j=1}^{m} X_j (X_j^T X_j)^{-1} X_j^T \right] X = X^T P X, \quad \text{with } P = \sum_{j=1}^{m} X_j (X_j^T X_j)^{-1} X_j^T$$
• Subsequent z variables can be sought following the same strategy, under orthogonality constraints.
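The first factor of the alternative method can be sketched with NumPy on the beer data. Several choices here are assumptions rather than details given in the slides: the loadings u are normalized to unit length, variances use the divisor n, and a pseudo-inverse replaces the plain inverse because the columns of a centered indicator block sum to zero. All function and variable names are hypothetical.

```python
import numpy as np

# Beer data (Abdi et al., 2007): rows = 8 beers, columns = 10 subjects.
SORTS = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],  # Affligen
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],  # Budweiser
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],  # BucklerBlonde
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],  # Killian
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],  # StLandelin
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],  # BucklerHighland
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],  # FruitDefendu
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],  # EKU28
])
n, m = SORTS.shape


def centered_indicators(labels):
    """Centered n x K_j indicator block for one subject."""
    cats = np.unique(labels)
    G = (labels[:, None] == cats[None, :]).astype(float)
    return G - G.mean(axis=0)


blocks = [centered_indicators(SORTS[:, j]) for j in range(m)]
X = np.hstack(blocks)

# P = sum of the projectors onto each block (pseudo-inverse: assumption).
P = sum(Xj @ np.linalg.pinv(Xj) for Xj in blocks)

M = X.T @ P @ X
M = (M + M.T) / 2                      # remove rounding asymmetry
eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
u = eigvecs[:, -1]                     # loadings for the largest eigenvalue
z = X @ u                              # first factor (beer scores)


def between_variance(z, labels):
    """Between-groups variance of z for one subject's partition."""
    zc = z - z.mean()
    return sum(
        (labels == g).sum() * zc[labels == g].mean() ** 2
        for g in np.unique(labels)
    ) / len(z)


B_total = sum(between_variance(z, SORTS[:, j]) for j in range(m))
print(B_total, eigvals[-1] / n)  # the two values should agree
```

Under these assumptions, the total between-groups variance of the first factor equals the largest eigenvalue divided by n, which gives a quick numerical check of the eigenvector characterization.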
The rationale behind the method of analysis
• In addition to investigating the relationships between the categorical variables, we take account of the variances of the indicator variables.
• The variance of an indicator variable with proportion p of ones is $p(1-p)$.
[Figure: variance of an indicator variable, $p(1-p)$, plotted against p (0 to 1), with annotations marking the presence of rare categories.]
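The p(1-p) formula can be checked on two indicator columns taken from Subject 1's sort of the beer data: a fairly common group (three beers out of eight) and a rare one (a single beer). The function name is hypothetical and the variance uses the divisor n.

```python
def indicator_variance(indicator):
    """Return (p, variance) for a 0/1 indicator, with divisor n."""
    n = len(indicator)
    p = sum(indicator) / n
    variance = sum((v - p) ** 2 for v in indicator) / n
    return p, variance


# Subject 1, group 1: Affligen, StLandelin and FruitDefendu.
common = [1, 0, 0, 0, 1, 0, 1, 0]
# Subject 1, group 5: EKU28 alone (a rare category).
rare = [0, 0, 0, 0, 0, 0, 0, 1]

p_common, var_common = indicator_variance(common)
p_rare, var_rare = indicator_variance(rare)
print(p_common, var_common)  # 0.375 0.234375
print(p_rare, var_rare)      # 0.125 0.109375
```

The rare category has less than half the variance of the common one, so it carries less weight in a method that takes the indicator variances into account.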
Alternative method applied to beer data
[Figure: two panels showing the representation of the beers on axes 1 & 2 and on axes 3 & 4.]
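A continuum approach
• MCA: $z = Xu$ with u eigenvector of $(X^T X)^{-1} X^T P X$
• Alternative method: $z = Xu$ with u eigenvector of $X^T P X$
• Regularized MCA: $z = Xu$ with u eigenvector of $\left[(1-\lambda) X^T X + \lambda I\right]^{-1} X^T P X$, where $0 \le \lambda \le 1$; $\lambda = 0$ gives MCA and $\lambda = 1$ gives the alternative method.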
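Continuum approach and ridge regularization
The eigenvectors of
$$\left[(1-\lambda) X^T X + \lambda I\right]^{-1} X^T P X$$
are also eigenvectors of
$$\left[X^T X + k I\right]^{-1} X^T P X$$
Ridge regularization with $k = \lambda / (1-\lambda)$, for $\lambda < 1$.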
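The link with ridge regularization is purely algebraic, since (1-λ) XᵀX + λ I = (1-λ) (XᵀX + k I) with k = λ/(1-λ). The sketch below checks this numerically. It assumes the continuum matrix has the form [(1-λ) XᵀX + λ I]⁻¹ XᵀPX, and it uses random stand-in matrices for X and P rather than the beer data; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): a centered 8 x 5 matrix X and a symmetric
# projector P onto a random 3-dimensional subspace.
X = rng.normal(size=(8, 5))
X -= X.mean(axis=0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
P = Q @ Q.T

A = X.T @ X
B = X.T @ P @ X
I = np.eye(A.shape[0])

lam = 0.95
k = lam / (1 - lam)

# Continuum matrix and ridge matrix, computed without explicit inverses.
M_cont = np.linalg.solve((1 - lam) * A + lam * I, B)
M_ridge = np.linalg.solve(A + k * I, B)

# Leading eigenvector of the continuum matrix.
w, V = np.linalg.eig(M_cont)
u = V[:, np.argmax(w.real)].real
u /= np.linalg.norm(u)

# If u is also an eigenvector of the ridge matrix, M_ridge @ u is parallel to u.
r = M_ridge @ u
print(np.allclose((1 - lam) * M_cont, M_ridge))
print(np.allclose(r, (u @ r) * u))
```

The two matrices differ only by the scalar factor (1-λ), so they share all their eigenvectors and their eigenvalues are proportional.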
RMCA (lambda = 0.95)
[Figure: two panels showing the representation of the products on axes 1 & 2 and on axes 3 & 4.]
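Property 1 illustrated on beer data
The variance of z increases with the regularization parameter $\lambda$.
[Figure: variance of z plotted against $\lambda$, between the MCA and alternative-method end points.]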
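Property 2 illustrated on beer data
The between-groups variance of z increases with the regularization parameter $\lambda$.
[Figure: between-groups variance of z plotted against $\lambda$, between the MCA and alternative-method end points.]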
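Property 3 illustrated on beer data
The discrimination index (between to total variance ratio) of z decreases with the regularization parameter $\lambda$.
[Figure: discrimination index of z plotted against $\lambda$ from 0 to 1, between the MCA and alternative-method end points.]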
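The three properties can be reproduced numerically on the beer data. This is a sketch under stated assumptions, not the authors' computation: the continuum matrix is taken to be [(1-λ) XᵀX + λ I]⁻¹ XᵀPX, the loadings u are normalized to unit length, variances use the divisor n, a pseudo-inverse handles the rank-deficient centered indicator blocks, and the grid starts at λ = 0.05 because XᵀX itself is singular. All names are hypothetical.

```python
import numpy as np

# Beer data (Abdi et al., 2007): rows = 8 beers, columns = 10 subjects.
SORTS = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],
])
n, m = SORTS.shape


def centered_indicators(labels):
    cats = np.unique(labels)
    G = (labels[:, None] == cats[None, :]).astype(float)
    return G - G.mean(axis=0)


blocks = [centered_indicators(SORTS[:, j]) for j in range(m)]
X = np.hstack(blocks)
P = sum(Xj @ np.linalg.pinv(Xj) for Xj in blocks)
A = X.T @ X
B = X.T @ P @ X


def leading_loading(lam):
    """Unit-norm leading eigenvector of [(1-lam) A + lam I]^{-1} B."""
    A_lam = (1 - lam) * A + lam * np.eye(A.shape[0])
    L_inv = np.linalg.inv(np.linalg.cholesky(A_lam))
    C = L_inv @ B @ L_inv.T            # symmetric form of the same problem
    _, V = np.linalg.eigh((C + C.T) / 2)
    u = L_inv.T @ V[:, -1]
    return u / np.linalg.norm(u)


def summaries(lam):
    """Variance, between-groups variance and discrimination index of z."""
    z = X @ leading_loading(lam)
    total = z @ z / n                  # z is centered because X is centered
    between = z @ P @ z / n
    return total, between, between / total


lams = [0.05, 0.25, 0.5, 0.75, 0.95, 1.0]
rows = [summaries(lam) for lam in lams]
for lam, (total, between, index) in zip(lams, rows):
    print(f"lambda={lam:.2f}  var={total:.4f}  between={between:.4f}  index={index:.4f}")
```

With unit-norm loadings, the printed variance and between-groups variance should be non-decreasing in λ while the discrimination index is non-increasing, in line with the three properties stated on the slides.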
Conclusion
• Proposition of an alternative method that handles the problem of rare categories.
• Further research work is needed to investigate this alternative method.
• Proposition of a continuum approach whose end points are MCA and the alternative method.
• This approach enjoys interesting properties and can easily be extended to the framework of Generalized Canonical Correlation Analysis.
• See how it relates to Regularized MCA by Takane and Hwang.
TRUGAREZ! (Thank you!)
Co-occurrence matrix
Number of subjects (out of 10) who placed the two beers in the same group; rows and columns are indexed by beer number.

Beers    1    2    3    4    5    6    7    8
    1   10    1    1    5    6    0    8    0
    2    1   10    3    2    5    0    0    1
    3    1    3   10    2    2    0    0    0
    4    5    2    2   10    5    0    5    1
    5    6    5    2    5   10    0    4    0
    6    0    0    0    0    0   10    0    0
    7    8    0    0    5    4    0   10    0
    8    0    1    0    1    0    0    0   10
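The co-occurrence matrix can be recomputed from the sorting data by counting, for each pair of beers, the subjects who gave them the same group label. The sketch below uses the beer data of Abdi et al. (2007) from this deck; the function name is hypothetical.

```python
# Beer data: rows = 8 beers, columns = group labels given by the 10 subjects.
SORTS = [
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],  # 1 Affligen
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],  # 2 Budweiser
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],  # 3 BucklerBlonde
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],  # 4 Killian
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],  # 5 StLandelin
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],  # 6 BucklerHighland
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],  # 7 FruitDefendu
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],  # 8 EKU28
]


def cooccurrence(sorts):
    """Count, for each pair of stimuli, the subjects who grouped them together."""
    return [
        [sum(a == b for a, b in zip(row_i, row_k)) for row_k in sorts]
        for row_i in sorts
    ]


C = cooccurrence(SORTS)
for row in C:
    print(" ".join(f"{value:2d}" for value in row))
```

Row 6 is zero off the diagonal because every subject placed BucklerHighland in a group of its own.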