
Analysis of sorting data using multiple correspondence analysis and a related method


  1. Analysis of sorting data using multiple correspondence analysis and a related method. E.M. Qannari, Ph. Courcoux, V. Cariou. ONIRIS, Nantes, F-44322, France.

  2. Sorting data: procedure
     • n stimuli are evaluated by m subjects, each given the instruction: “Please, sort the stimuli in as many groups as you consider necessary with the understanding that stimuli in the same group are perceived as similar.”
     [Figure: example groupings produced by Subject 1, Subject 2, ..., Subject m, with group labels such as Acid, Salty, Fresh, Sweet and Bitter.]

  3. General setting and notations
     • Subject $j$ ($j = 1, \dots, m$) sorts the $n$ stimuli into $K_j$ groups; this partition is coded by a matrix $X_j$ of $K_j$ indicator variables.
     • The data therefore consist of $m$ categorical variables $X_1, X_2, \dots, X_m$ (represented by their indicator variables).
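
As a minimal sketch of this coding, the snippet below builds the indicator matrix of one subject from a vector of group labels. The function name `indicator_matrix` and the five-stimulus partition are hypothetical and only serve as an illustration.

```python
import numpy as np


def indicator_matrix(labels):
    """Return the n x K_j 0/1 indicator matrix of one subject's partition."""
    labels = np.asarray(labels)
    groups = np.unique(labels)  # sorted distinct group labels
    return (labels[:, None] == groups[None, :]).astype(float)


# Hypothetical partition of n = 5 stimuli into K_j = 3 groups by one subject.
X_j = indicator_matrix([1, 2, 1, 3, 2])

# Column-centered version of the same indicators.
X_j_centered = X_j - X_j.mean(axis=0)
```

Each row of `X_j` contains a single 1, in the column of the group the stimulus was placed in.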

  4. Beer data. Data from Abdi H., Chollet S., Valentin D. and Chréa C. (2007). Analysing assessors and products in sorting tasks: DISTATIS, theory and applications. Food Quality and Preference.

  5. Data from Abdi et al. (2007)
     • The data relate to an experiment where ten consumers were instructed to sort eight commercial beers.

     #  Beer             Subj1  Subj2  Subj3  Subj4  Subj5  Subj6  Subj7  Subj8  Subj9  Subj10
     1  Affligen             1      4      3      4      1      1      2      2      1       3
     2  Budweiser            4      5      2      5      2      3      1      1      4       3
     3  BucklerBlonde        3      1      2      3      2      4      3      1      1       2
     4  Killian              4      2      3      3      1      1      1      2      1       4
     5  StLandelin           1      5      3      5      2      1      1      2      1       3
     6  BucklerHighland      2      3      1      1      3      5      4      4      3       1
     7  FruitDefendu         1      4      3      4      1      1      2      2      2       4
     8  EKU28                5      2      4      2      4      2      5      3      4       5

  6. Discrimination indices and MCA
     • Given a quantitative variable $z$ and a categorical variable $X_j$, the discrimination index $\eta^2(z/j)$ is the ratio of the between-groups variance to the total variance of $z$ with respect to the groups of $X_j$.
     • We seek $z$ so as to maximize:
       $$I(z) = \sum_{j=1}^{m} \eta^2(z/j)$$
     • It is known that this problem leads to MCA.
     • Subsequent $z$ variables (factors) are sought following the same strategy, under orthogonality constraints.
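
The discrimination index can be sketched numerically as follows. The function names and the toy values of `z` and `partitions` are hypothetical; the sketch only evaluates the criterion for a given `z` and does not perform the maximization.

```python
import numpy as np


def discrimination_index(z, labels):
    """Between-groups to total variance ratio of z for one partition."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    grand_mean = z.mean()
    total_ss = np.sum((z - grand_mean) ** 2)
    between_ss = sum(
        np.sum(labels == g) * (z[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )
    return between_ss / total_ss


def total_index(z, partitions):
    """Sum of the discrimination indices of z over all partitions."""
    return sum(discrimination_index(z, labels) for labels in partitions)


# Hypothetical example: z separates the groups of the first partition
# perfectly and carries no information about the second one.
z = [0.0, 0.0, 1.0, 1.0]
partitions = [[1, 1, 2, 2], [1, 2, 1, 2]]
criterion = total_index(z, partitions)
```

In this toy case the first partition contributes an index of 1 and the second an index of 0.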

  7. Standardized MCA
     • Alternatively:
       $$I(z) = \sum_{j=1}^{m} \frac{1}{K_j} \, \eta^2(z/j)$$

  8. MCA applied to beer data
     [Figure: two panels, "Representation of the beers, axes 1 & 2" and "Representation of the beers, axes 3 & 4", showing the eight beers on the MCA axes.]

  9. Alternative method: maximizing the between-groups variances
     • $X = [X_1, X_2, \dots, X_m]$ (the indicator variables are supposed to be centered).
     • Let $z = Xu$ and denote by $B(z/j)$ the between-groups variance of $z$ with respect to $X_j$.
     • We define the total between-groups variance as:
       $$B(z) = \sum_{j=1}^{m} B(z/j)$$

  10. An alternative method to MCA
     • We can show that the vector of loadings $u$ is an eigenvector (associated with the largest eigenvalue) of the matrix:
       $$X^T \left[ \sum_{j=1}^{m} X_j \left( X_j^T X_j \right)^{-1} X_j^T \right] X = X^T P X, \quad \text{with } P = \sum_{j=1}^{m} X_j \left( X_j^T X_j \right)^{-1} X_j^T$$
     • Subsequent $z$ variables can be sought following the same strategy, under orthogonality constraints.
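
A minimal NumPy sketch of this eigenproblem on the beer data is given below. It rests on one assumption that is not in the slides: because the centered indicator blocks are rank deficient, each projector is computed with the Moore-Penrose pseudo-inverse (`np.linalg.pinv`) rather than an ordinary inverse. The variable names are illustrative.

```python
import numpy as np

# Sorting data of Abdi et al. (2007): rows are beers, columns are subjects 1-10.
SORTS = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],  # Affligen
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],  # Budweiser
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],  # BucklerBlonde
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],  # Killian
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],  # StLandelin
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],  # BucklerHighland
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],  # FruitDefendu
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],  # EKU28
])


def centered_indicators(labels):
    """Column-centered indicator matrix of one subject's partition."""
    groups = np.unique(labels)
    G = (labels[:, None] == groups[None, :]).astype(float)
    return G - G.mean(axis=0)


blocks = [centered_indicators(SORTS[:, j]) for j in range(SORTS.shape[1])]
X = np.hstack(blocks)

# P: sum over subjects of the orthogonal projectors onto the column space
# of each centered indicator block (pseudo-inverse: an assumption here).
P = sum(Xj @ np.linalg.pinv(Xj) for Xj in blocks)

M = X.T @ P @ X                       # symmetric, positive semi-definite
eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
u = eigvecs[:, -1]                    # loadings for the largest eigenvalue
z = X @ u                             # first component (beer scores)
```

With this construction, `u @ M @ u` equals the sum over subjects of the between-groups sums of squares of `z`, that is, the total between-groups variance up to the factor 1/n.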

  11. The rationale behind the method of analysis
     • In addition to investigating the relationships between the categorical variables, we take account of the variances of the indicator variables.
     • $\mathrm{VAR}(\text{Indicator}) = p(1-p)$
     [Figure: "Variance of an indicator variable", $p(1-p)$ plotted against $p$ from 0 to 1, with two regions labelled "Presence of rare categories".]
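
A short numerical check of the variance formula, using hypothetical values of $p$ and a hypothetical indicator vector, shows how little variance a rare category carries:

```python
import numpy as np

# Variance p(1 - p) of an indicator for a few hypothetical proportions p.
p = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
variances = p * (1 - p)

# Same formula checked on an actual indicator: one stimulus out of four
# belongs to the category, so p = 0.25.
indicator = np.array([1.0, 0.0, 0.0, 0.0])
empirical_variance = indicator.var()  # population variance (divides by n)
```

The variance is largest at $p = 0.5$ and shrinks towards zero as a category becomes very rare or very frequent.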

  12. Alternative method applied to beer data
     [Figure: two panels, "Representation of the beers, axes 1 & 2" and "Representation of the beers, axes 3 & 4", showing the eight beers on the axes of the alternative method.]

  13. A continuum approach
     • MCA: $z = Xu$ with $u$ eigenvector of $\left( X^T X \right)^{-1} X^T P X$
     • Alternative method: $z = Xu$ with $u$ eigenvector of $X^T P X$
     • Regularized MCA: $z = Xu$ with $u$ eigenvector of $\left[ (1-\lambda) X^T X + \lambda I \right]^{-1} X^T P X$
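
The continuum can be sketched in NumPy as below, assuming the regularized matrix has the form $[(1-\lambda) X^T X + \lambda I]^{-1} X^T P X$ with $0 < \lambda \le 1$. The function name `rmca_first_axis` is hypothetical, the projectors are computed with a pseudo-inverse (an assumption, since centered indicator blocks are rank deficient), and the case $\lambda = 0$ is left out because $X^T X$ is then singular.

```python
import numpy as np

# Sorting data of Abdi et al. (2007): rows are beers, columns are subjects 1-10.
SORTS = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],  # Affligen
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],  # Budweiser
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],  # BucklerBlonde
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],  # Killian
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],  # StLandelin
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],  # BucklerHighland
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],  # FruitDefendu
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],  # EKU28
])


def centered_indicators(labels):
    """Column-centered indicator matrix of one subject's partition."""
    groups = np.unique(labels)
    G = (labels[:, None] == groups[None, :]).astype(float)
    return G - G.mean(axis=0)


blocks = [centered_indicators(SORTS[:, j]) for j in range(SORTS.shape[1])]
X = np.hstack(blocks)
P = sum(Xj @ np.linalg.pinv(Xj) for Xj in blocks)
M = X.T @ P @ X


def rmca_first_axis(X, M, lam):
    """Largest eigenvalue and unit-norm eigenvector of A^{-1} M, 0 < lam <= 1."""
    A = (1.0 - lam) * (X.T @ X) + lam * np.eye(X.shape[1])
    L = np.linalg.cholesky(A)   # A is positive definite when lam > 0
    L_inv = np.linalg.inv(L)
    S = L_inv @ M @ L_inv.T     # symmetric, same eigenvalues as A^{-1} M
    w, V = np.linalg.eigh(S)    # eigenvalues in ascending order
    u = L_inv.T @ V[:, -1]      # map back to an eigenvector of A^{-1} M
    return w[-1], u / np.linalg.norm(u)


eigval, u = rmca_first_axis(X, M, lam=0.95)
z = X @ u  # first component for lambda = 0.95
```

Working through the Cholesky factor keeps the problem symmetric, so `np.linalg.eigh` can be used and the eigenvalues are guaranteed to be real. At `lam=1.0` the matrix `A` is the identity and the function returns the leading eigenpair of $X^T P X$, that is, the alternative method.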

  14. Continuum approach and ridge regularization
     • The eigenvectors of $\left[ (1-\lambda) X^T X + \lambda I \right]^{-1} X^T P X$ are also eigenvectors of $\left( X^T X + kI \right)^{-1} X^T P X$ (ridge regularization), with $k = \dfrac{\lambda}{1-\lambda}$.
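
The equivalence follows from a one-line factorization, written out here as a sketch under the assumption that the regularized matrix is $[(1-\lambda) X^T X + \lambda I]^{-1} X^T P X$ and that $0 \le \lambda < 1$:

```latex
\begin{aligned}
(1-\lambda)\, X^T X + \lambda I
  &= (1-\lambda) \left( X^T X + \frac{\lambda}{1-\lambda}\, I \right)
   = (1-\lambda) \left( X^T X + kI \right), \\
\left[ (1-\lambda)\, X^T X + \lambda I \right]^{-1} X^T P X
  &= \frac{1}{1-\lambda} \left( X^T X + kI \right)^{-1} X^T P X .
\end{aligned}
```

The two matrices differ only by the scalar factor $1/(1-\lambda)$, so they share the same eigenvectors and their eigenvalues are proportional.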

  15. RMCA (lambda = 0.95)
     [Figure: two panels, "Representation of the products, axes 1 & 2" and "Representation of the products, axes 3 & 4", showing the eight beers on the RMCA axes.]

  16. Property 1 illustrated on beer data: the variance of $z$ increases with $\lambda$.
     [Figure: variance of $z$ plotted against $\lambda$, with MCA and the alternative method as end points.]

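  17. Property 2 illustrated on beer data: the between-groups variance of $z$ increases with $\lambda$.
     [Figure: between-groups variance of $z$ plotted against $\lambda$, with MCA and the alternative method as end points.]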

  18. Property 3 illustrated on beer data: the discrimination index (between-groups to total variance ratio) of $z$ decreases with $\lambda$.
     [Figure: discrimination index of $z$ plotted against lambda from 0.0 to 1.0, with MCA and the alternative method as end points.]
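
Property 3 can be checked numerically for the first axis with the sketch below. It makes the same assumptions as the rest of this reconstruction: the regularized matrix is taken to be $[(1-\lambda) X^T X + \lambda I]^{-1} X^T P X$, the projectors use a pseudo-inverse, and the index reported is the sum of the $\eta^2(z/j)$ over the ten subjects. The grid of $\lambda$ values is arbitrary and starts above 0 because $X^T X$ is singular.

```python
import numpy as np

# Sorting data of Abdi et al. (2007): rows are beers, columns are subjects 1-10.
SORTS = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],  # Affligen
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],  # Budweiser
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],  # BucklerBlonde
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],  # Killian
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],  # StLandelin
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],  # BucklerHighland
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],  # FruitDefendu
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],  # EKU28
])


def centered_indicators(labels):
    """Column-centered indicator matrix of one subject's partition."""
    groups = np.unique(labels)
    G = (labels[:, None] == groups[None, :]).astype(float)
    return G - G.mean(axis=0)


blocks = [centered_indicators(SORTS[:, j]) for j in range(SORTS.shape[1])]
X = np.hstack(blocks)
P = sum(Xj @ np.linalg.pinv(Xj) for Xj in blocks)
M = X.T @ P @ X

lambdas = np.linspace(0.05, 1.0, 20)
index = []
for lam in lambdas:
    A = (1.0 - lam) * (X.T @ X) + lam * np.eye(X.shape[1])
    L_inv = np.linalg.inv(np.linalg.cholesky(A))
    w, V = np.linalg.eigh(L_inv @ M @ L_inv.T)
    u = L_inv.T @ V[:, -1]             # leading eigenvector of A^{-1} M
    z = X @ u                          # centered, since X is column-centered
    # Sum over subjects of between-groups SS divided by the total SS of z.
    index.append((u @ M @ u) / (z @ z))
index = np.array(index)
```

Plotting `index` against `lambdas` gives a curve of the kind described on this slide; the values are not reproduced here.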

  19. Conclusion
     • We propose an alternative method that handles the problem of rare categories.
     • Further research work is needed to investigate this alternative method.
     • We propose a continuum approach whose end points are MCA and the alternative method.
     • This approach enjoys interesting properties and can easily be extended to the framework of Generalized Canonical Correlation Analysis.
     • It remains to be seen how it relates to Regularized MCA by Takane and Hwang.

  20. Trugarez! (Thank you!)

  21. Co-occurrence matrix (number of subjects who placed both beers in the same group)

     Beers    1    2    3    4    5    6    7    8
       1     10    1    1    5    6    0    8    0
       2      1   10    3    2    5    0    0    1
       3      1    3   10    2    2    0    0    0
       4      5    2    2   10    5    0    5    1
       5      6    5    2    5   10    0    4    0
       6      0    0    0    0    0   10    0    0
       7      8    0    0    5    4    0   10    0
       8      0    1    0    1    0    0    0   10
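
The co-occurrence matrix can be recomputed from the sorting data of Abdi et al. (2007) by counting, for each pair of beers, the subjects who put them in the same group. The sketch below does this with NumPy broadcasting; the variable names are illustrative.

```python
import numpy as np

# Sorting data of Abdi et al. (2007): rows are beers, columns are subjects 1-10.
SORTS = np.array([
    [1, 4, 3, 4, 1, 1, 2, 2, 1, 3],  # Affligen
    [4, 5, 2, 5, 2, 3, 1, 1, 4, 3],  # Budweiser
    [3, 1, 2, 3, 2, 4, 3, 1, 1, 2],  # BucklerBlonde
    [4, 2, 3, 3, 1, 1, 1, 2, 1, 4],  # Killian
    [1, 5, 3, 5, 2, 1, 1, 2, 1, 3],  # StLandelin
    [2, 3, 1, 1, 3, 5, 4, 4, 3, 1],  # BucklerHighland
    [1, 4, 3, 4, 1, 1, 2, 2, 2, 4],  # FruitDefendu
    [5, 2, 4, 2, 4, 2, 5, 3, 4, 5],  # EKU28
])

# Compare every pair of beers subject by subject, then count the matches.
same_group = SORTS[:, None, :] == SORTS[None, :, :]   # shape (8, 8, 10)
cooccurrence = same_group.sum(axis=2)                 # shape (8, 8)
```

The diagonal equals the number of subjects (10), and a row of zeros off the diagonal, as for beer 6, means that no subject grouped that beer with any other.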
