Singly and doubly ordered cumulative correspondence analysis
L. D’Ambra*, E. Beh** and I. Camminatiello*
*University of Naples Federico II (Italy), **University of Newcastle (Australia)
dambra@unina.it
Outline
A short review
Singly ordered cumulative correspondence analysis: methodology and application to an industrial experiment
Doubly ordered cumulative correspondence analysis: some developments and an application to Van Rijckevorsel’s data
We also propose a unified approach
Review (1/3)
In multidimensional data analysis, correspondence analysis (CA) is one of the most popular tools for studying the association between two categorical variables. The method is based on the Pearson chi-squared statistic.
Drawback: it does not take into consideration the ordered nature of the categories.
Review (2/3)
Several contributions deal with ordinal categorical variables, including those of Parsa and Smith (1993), Ritov and Gilula (1993) and Schriever (1983). These procedures constrain the output of the singular value decomposition (SVD) so that the coordinates in the first dimension have an ordered structure.
An alternative approach applies the moment decomposition (MD; Beh, 1997) or the hybrid decomposition (HD; Beh, 2004), which use orthogonal polynomials to detect linear, quadratic and cubic components.
Review (3/3)
In some industrial experiments the output consists of categorical data (a contingency table) with ordered categories. For analyzing such data, Taguchi (1966, 1974) proposed the Accumulation Analysis method as an alternative to Pearson’s chi-squared test. His motivation for recommending this technique appears to be its similarity to ANOVA for quantitative variables.
More recently, Light and Margolin (1971) proposed a method called CATANOVA by defining an appropriate measure of variation for categorical data.
Unlike these methods, Taguchi considers situations with ordered categories and performs ANOVA on the cumulative frequencies.
Aim of our paper
In this paper we explore a development of correspondence analysis that takes into account the presence of ordered variables by considering the cumulative sums of cell frequencies across the variables.
Singly ordered cumulative correspondence analysis
Beh, D’Ambra and Simonetti (Carme 2007; Communications in Statistics 2011) performed correspondence analysis for cross-classified variables with an ordered structure by considering Taguchi’s statistic.
Taguchi’s statistic is an appropriate measure of non-symmetric association for two categorical variables, one of which is on an ordinal scale. It takes into account the presence of an ordered variable by considering the cumulative sum of cell frequencies across that variable.
Notation (1/3)
$\mathbf{N} = (n_{ij})$: the absolute two-way contingency table that cross-classifies $n$ units according to $I$ ordered row categories and $J$ ordered column categories.
$\mathbf{P} = \frac{1}{n}\mathbf{N}$: the relative two-way contingency table.
$n_{i\cdot}$, $n_{\cdot j}$: the row and column marginals.
$\mathbf{M}_J$: a triangular matrix of 1’s with the last ($J$-th) row removed, so that it is of dimension $(J-1) \times J$.
$\mathbf{M}$: a triangular matrix of 1’s of dimension $J \times J$.
$\mathbf{L}$: a triangular matrix of 1’s of dimension $I \times I$.
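Notation (2/3)
$\mathbf{r}$ and $\mathbf{c}$: the vectors with the marginal frequencies of $\mathbf{P}$.
$\mathbf{D}_r$ and $\mathbf{D}_c$: the diagonal matrices with the marginal frequencies of $\mathbf{P}$.
The cumulative frequencies: $z_{i1} = n_{i1}$, $z_{i2} = n_{i1} + n_{i2}$, ..., $z_{iJ} = n_{i1} + n_{i2} + \dots + n_{iJ}$.
The cumulative column proportions: $d_1 = \dfrac{n_{\cdot 1}}{n}$, $d_2 = \dfrac{n_{\cdot 1} + n_{\cdot 2}}{n}$, ..., $d_J = \dfrac{n_{\cdot 1} + \dots + n_{\cdot J}}{n}$.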
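The notation can be made concrete with a small NumPy sketch. The 3 x 4 table below is made up for illustration and is not data from the talk. The triangular matrices are assumed to be lower triangular, since that is the orientation for which $\mathbf{N}\mathbf{M}_J^T$ returns the cumulative frequencies $z_{ij}$.

```python
import numpy as np

# Hypothetical 3 x 4 table of counts (made-up numbers, not data from the talk).
# Rows are I = 3 categories, columns are J = 4 ordered categories.
N = np.array([[12.0,  8.0,  5.0,  2.0],
              [ 6.0, 10.0,  9.0,  4.0],
              [ 3.0,  7.0, 11.0, 13.0]])
I, J = N.shape
n = N.sum()

P = N / n              # relative table
r = P.sum(axis=1)      # row marginals of P
c = P.sum(axis=0)      # column marginals of P
D_r = np.diag(r)
D_c = np.diag(c)

# Assumption: "triangular" means lower triangular.
M = np.tril(np.ones((J, J)))   # J x J
M_J = M[:-1, :]                # last row removed: (J-1) x J
L = np.tril(np.ones((I, I)))   # I x I

Z = np.cumsum(N, axis=1)       # cumulative frequencies z_i1, ..., z_iJ
d = np.cumsum(c)               # cumulative column proportions d_1, ..., d_J

print(N @ M_J.T)               # equals the first J-1 columns of Z
print(M_J @ c)                 # equals d_1, ..., d_{J-1}
```

The last two lines show why $\mathbf{M}_J$ is useful: post-multiplying by its transpose accumulates the columns, and the final cumulative proportion $d_J$ is always 1, which is why only the first $J-1$ terms are used.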
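Taguchi’s statistic (1/2)
Taguchi (1966) proposed the following statistic:
$$T = \sum_{j=1}^{J-1} w_j \sum_{i=1}^{I} n_{i\cdot}\left(\frac{z_{ij}}{n_{i\cdot}} - d_j\right)^2$$
where $w_1, \dots, w_{J-1}$ are weights $> 0$. Two choices are possible:
$$w_j = \frac{1}{J} \quad \text{or} \quad w_j = \frac{1}{d_j(1-d_j)}$$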
Taguchi’s statistic (2/2)
$T$ is Taguchi’s (1966, 1974) "cumulative-sums" statistic when each column is assigned a weight that is inversely proportional to the conditional expectation of the $j$-th term (conditional on the given marginals):
$$w_j = \left[d_j(1-d_j)\right]^{-1}, \quad j = 1, \dots, J-1$$
In this paper we use this weighting system.
A simpler version of the statistic assigns each column the constant weight $1/J$.
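As a minimal sketch, the statistic and its two weighting systems can be computed directly from the definition. The function name `taguchi_statistic` and its `weights` argument are hypothetical names chosen for this illustration, and the 2 x 2 counts are made up.

```python
import numpy as np

def taguchi_statistic(N, weights="taguchi"):
    """Cumulative-sums statistic T for an I x J table with ordered columns.

    weights="taguchi"  uses w_j = 1 / (d_j (1 - d_j));
    weights="constant" uses w_j = 1 / J.
    (Hypothetical helper written for this illustration.)
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()
    J = N.shape[1]
    row_tot = N.sum(axis=1)                      # n_i.
    Z = np.cumsum(N, axis=1)[:, :-1]             # z_ij for j = 1..J-1
    d = np.cumsum(N.sum(axis=0))[:-1] / n        # d_j  for j = 1..J-1
    if weights == "taguchi":
        w = 1.0 / (d * (1.0 - d))
    elif weights == "constant":
        w = np.full(J - 1, 1.0 / J)
    else:
        raise ValueError("weights must be 'taguchi' or 'constant'")
    dev = Z / row_tot[:, None] - d[None, :]      # z_ij / n_i. - d_j
    return float(np.sum(w[None, :] * row_tot[:, None] * dev ** 2))

# Made-up 2 x 2 counts: with J = 2 there is a single cumulative term.
table = [[10, 20], [30, 40]]
print(taguchi_statistic(table, weights="taguchi"))
print(taguchi_statistic(table, weights="constant"))
```

For a 2 x 2 table the first weighting system reduces $T$ to the ordinary Pearson chi-squared statistic of the table, and for a table with proportional rows every deviation $z_{ij}/n_{i\cdot} - d_j$ vanishes, so $T = 0$.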
The Pearson chi-squared statistic and Taguchi’s statistic
Nair (1987) demonstrated that, with the weights $w_j = [d_j(1-d_j)]^{-1}$, the link between the Pearson chi-squared statistic and Taguchi’s statistic is
$$T = \sum_{j=1}^{J-1} \chi_j^2$$
where $\chi_j^2$ is the Pearson chi-squared statistic for the $I \times 2$ contingency table obtained by aggregating column categories 1 to $j$ and aggregating column categories $j+1$ to $J$.
For this reason, $T$ is also referred to as the cumulative chi-squared statistic.
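The decomposition can be checked numerically. The sketch below collapses a made-up 3 x 4 table at each of the $J-1$ cut points, sums the resulting Pearson statistics, and compares the total with $T$ computed from its definition using the weights $w_j = 1/[d_j(1-d_j)]$. The helper names are hypothetical.

```python
import numpy as np

def pearson_chi2(table):
    """Pearson chi-squared statistic of a two-way table of counts."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())

def collapsed_chi2_terms(N):
    """Chi-squared of each I x 2 table: columns 1..j versus columns j+1..J."""
    N = np.asarray(N, dtype=float)
    J = N.shape[1]
    terms = []
    for j in range(1, J):
        collapsed = np.column_stack([N[:, :j].sum(axis=1),
                                     N[:, j:].sum(axis=1)])
        terms.append(pearson_chi2(collapsed))
    return terms

# Hypothetical counts (made-up numbers, not data from the talk).
N = np.array([[12.0,  8.0,  5.0,  2.0],
              [ 6.0, 10.0,  9.0,  4.0],
              [ 3.0,  7.0, 11.0, 13.0]])

terms = collapsed_chi2_terms(N)
T_from_chi2 = sum(terms)

# T from its definition with w_j = 1 / (d_j (1 - d_j)).
n = N.sum()
row_tot = N.sum(axis=1)
Z = np.cumsum(N, axis=1)[:, :-1]
d = np.cumsum(N.sum(axis=0))[:-1] / n
w = 1.0 / (d * (1.0 - d))
T_direct = float(np.sum(w * row_tot[:, None] * (Z / row_tot[:, None] - d) ** 2))

print(terms)
print(T_from_chi2, T_direct)   # the two totals agree
```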
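Taguchi’s statistic in matrix notation (1/2)
Taguchi’s statistic may be expressed in matrix notation as
$$T = \frac{1}{n}\,\mathrm{trace}\left(\mathbf{D}_r^{-1/2}\,\mathbf{N}\mathbf{A}^T\,\mathbf{W}\,\mathbf{A}\mathbf{N}^T\,\mathbf{D}_r^{-1/2}\right)$$
where $\mathbf{W}$, of dimension $(J-1) \times (J-1)$, is the diagonal matrix of weights and $\mathbf{A}$, of dimension $(J-1) \times J$, is the matrix involving the cumulative column proportions:
$$\mathbf{A} = \begin{pmatrix}
1-d_1 & -d_1 & -d_1 & \cdots & -d_1 \\
1-d_2 & 1-d_2 & -d_2 & \cdots & -d_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1-d_{J-1} & 1-d_{J-1} & 1-d_{J-1} & \cdots & -d_{J-1}
\end{pmatrix}$$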
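A short numerical check of the matrix form, under the assumption that $\mathbf{D}_r$ holds the row marginals of $\mathbf{P}$ (proportions), in which case the scalar in front of the trace is $1/n$ when the count table $\mathbf{N}$ is used. The table is made up, and $\mathbf{A}$ is built row by row as $(1-d_j)$ in the first $j$ columns and $-d_j$ elsewhere.

```python
import numpy as np

# Hypothetical counts (made-up numbers, not data from the talk).
N = np.array([[12.0,  8.0,  5.0,  2.0],
              [ 6.0, 10.0,  9.0,  4.0],
              [ 3.0,  7.0, 11.0, 13.0]])
n = N.sum()
J = N.shape[1]
P = N / n
r = P.sum(axis=1)
c = P.sum(axis=0)

M_J = np.tril(np.ones((J, J)))[:-1, :]     # (J-1) x J, lower triangular
d = M_J @ c                                # d_1, ..., d_{J-1}
A = M_J - np.outer(d, np.ones(J))          # row j: 1-d_j (first j cols), -d_j
W = np.diag(1.0 / (d * (1.0 - d)))         # Taguchi weights on the diagonal
D_r_inv_sqrt = np.diag(r ** -0.5)

# Matrix form: (1/n) trace(D_r^{-1/2} N A' W A N' D_r^{-1/2}).
T_matrix = np.trace(D_r_inv_sqrt @ N @ A.T @ W @ A @ N.T @ D_r_inv_sqrt) / n

# Elementwise definition for comparison.
row_tot = N.sum(axis=1)
Z = np.cumsum(N, axis=1)[:, :-1]
T_direct = np.sum(np.diag(W) * row_tot[:, None]
                  * (Z / row_tot[:, None] - d) ** 2)

print(T_matrix, T_direct)                  # the two values agree
print(N @ A.T)                             # entries z_ij - n_i. d_j
```

Each entry of $\mathbf{N}\mathbf{A}^T$ is the deviation $z_{ij} - n_{i\cdot}d_j$ of a cumulative count from its expectation under independence, which is what the trace then squares and weights.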
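Taguchi’s statistic in matrix notation (2/2)
Considering that $\mathbf{d} = \mathbf{M}_J\mathbf{c}$ and $\mathbf{A} = \mathbf{M}_J - \mathbf{d}\mathbf{1}_J^T$, Taguchi’s statistic may, after some algebra, be rewritten as
$$\begin{aligned}
T &= \frac{1}{n}\,\mathrm{trace}\left[\mathbf{D}_r^{-1/2}\left(\mathbf{N}\mathbf{M}_J^T - n\,\mathbf{r}\mathbf{d}^T\right)\mathbf{W}\left(\mathbf{N}\mathbf{M}_J^T - n\,\mathbf{r}\mathbf{d}^T\right)^T\mathbf{D}_r^{-1/2}\right] \\
&= \frac{1}{n}\,\mathrm{trace}\left[\mathbf{D}_r^{-1/2}\left(n\mathbf{P}\mathbf{M}_J^T - n\,\mathbf{r}\mathbf{d}^T\right)\mathbf{W}\left(n\mathbf{P}\mathbf{M}_J^T - n\,\mathbf{r}\mathbf{d}^T\right)^T\mathbf{D}_r^{-1/2}\right] \\
&= n\,\mathrm{trace}\left[\mathbf{D}_r^{-1/2}\left(\mathbf{P} - \mathbf{r}\mathbf{c}^T\right)\mathbf{M}_J^T\mathbf{W}\mathbf{M}_J\left(\mathbf{P} - \mathbf{r}\mathbf{c}^T\right)^T\mathbf{D}_r^{-1/2}\right] \\
&= n\,\mathrm{trace}\left[\mathbf{D}_r^{1/2}\left(\mathbf{D}_r^{-1}\mathbf{P} - \mathbf{1}_r\mathbf{c}^T\right)\mathbf{M}_J^T\mathbf{W}\mathbf{M}_J\left(\mathbf{P} - \mathbf{r}\mathbf{c}^T\right)^T\mathbf{D}_r^{-1/2}\right]
\end{aligned}$$
where $\mathbf{1}_r$ is the vector of $I$ ones. For comparison, the Pearson chi-squared statistic used in classical CA has the same form with $\mathbf{D}_c^{-1}$ in place of $\mathbf{M}_J^T\mathbf{W}\mathbf{M}_J$:
$$X^2 = n\,\mathrm{trace}\left[\mathbf{D}_r^{-1/2}\left(\mathbf{P} - \mathbf{r}\mathbf{c}^T\right)\mathbf{D}_c^{-1}\left(\mathbf{P} - \mathbf{r}\mathbf{c}^T\right)^T\mathbf{D}_r^{-1/2}\right] \quad \text{(C.A.)}$$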
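The parallel between the two statistics can be seen by evaluating one trace expression with two different middle matrices. This is a sketch on a made-up table. The function name `trace_form` is hypothetical, and the middle matrix is $\mathbf{M}_J^T\mathbf{W}\mathbf{M}_J$ for the cumulative statistic and $\mathbf{D}_c^{-1}$ for the Pearson statistic.

```python
import numpy as np

# Hypothetical counts (made-up numbers, not data from the talk).
N = np.array([[12.0,  8.0,  5.0,  2.0],
              [ 6.0, 10.0,  9.0,  4.0],
              [ 3.0,  7.0, 11.0, 13.0]])
n = N.sum()
J = N.shape[1]
P = N / n
r = P.sum(axis=1)
c = P.sum(axis=0)

M_J = np.tril(np.ones((J, J)))[:-1, :]
d = M_J @ c
W = np.diag(1.0 / (d * (1.0 - d)))

S = np.diag(r ** -0.5) @ (P - np.outer(r, c))   # D_r^{-1/2} (P - r c')

def trace_form(middle):
    """n * trace(S middle S') for a J x J middle matrix (hypothetical helper)."""
    return n * np.trace(S @ middle @ S.T)

T = trace_form(M_J.T @ W @ M_J)        # cumulative (Taguchi) statistic
X2 = trace_form(np.diag(1.0 / c))      # Pearson chi-squared (classical CA)

# Reference values computed from the counts directly.
row_tot = N.sum(axis=1)
Z = np.cumsum(N, axis=1)[:, :-1]
T_direct = np.sum((Z - np.outer(row_tot, d)) ** 2
                  / row_tot[:, None] / (d * (1.0 - d)))
E = np.outer(row_tot, N.sum(axis=0)) / n
X2_direct = np.sum((N - E) ** 2 / E)

print(T, T_direct)
print(X2, X2_direct)
```

Only the column metric differs between the two analyses, which is what makes it possible to move between classical and cumulative results.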
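Approach proposed
Beh, D’Ambra and Simonetti (Carme 2007 and Communications in Statistics 2011) carried out CA for cross-classified variables with an ordered structure by considering Taguchi’s statistic.
In terms of Taguchi’s statistic, Beh et al. (2010) perform the SVD of
$$\mathbf{X} = \mathbf{D}_r^{-1/2}\left(\mathbf{P} - \mathbf{r}\mathbf{c}^T\right)\mathbf{M}_J^T\mathbf{W}^{1/2} = \mathbf{D}_r^{1/2}\left(\mathbf{D}_r^{-1}\mathbf{P} - \mathbf{1}_r\mathbf{c}^T\right)\mathbf{M}_J^T\mathbf{W}^{1/2}$$
The matrix $\mathbf{X}$ is centered.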
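A minimal sketch of this step on a made-up table: build $\mathbf{X}$, take its SVD with `numpy.linalg.svd`, and check that $n$ times the sum of squared singular values reproduces $T$. "Centered" is interpreted here as the columns of $\mathbf{X}$ being orthogonal to $\mathbf{r}^{1/2}$, which is an assumption about what the slide means.

```python
import numpy as np

# Hypothetical counts (made-up numbers, not data from the talk).
N = np.array([[12.0,  8.0,  5.0,  2.0],
              [ 6.0, 10.0,  9.0,  4.0],
              [ 3.0,  7.0, 11.0, 13.0]])
n = N.sum()
J = N.shape[1]
P = N / n
r = P.sum(axis=1)
c = P.sum(axis=0)

M_J = np.tril(np.ones((J, J)))[:-1, :]
d = M_J @ c
W_sqrt = np.diag(1.0 / np.sqrt(d * (1.0 - d)))   # W^{1/2}, Taguchi weights

# X = D_r^{-1/2} (P - r c') M_J' W^{1/2}, of dimension I x (J-1).
X = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ M_J.T @ W_sqrt

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The squared singular values partition T / n.
T_from_svd = n * np.sum(s ** 2)

# T from its definition, for comparison.
row_tot = N.sum(axis=1)
Z = np.cumsum(N, axis=1)[:, :-1]
T_direct = np.sum((Z - np.outer(row_tot, d)) ** 2
                  / row_tot[:, None] / (d * (1.0 - d)))

print(T_from_svd, T_direct)
print(np.sqrt(r) @ X)     # numerically zero: X is centered w.r.t. sqrt(r)
```

Because of the centering, $\mathbf{X}$ has rank at most $\min(I-1, J-1)$, so for this 3 x 4 example one singular value is numerically zero.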
Special cases and properties of cumulative correspondence analysis
For I > 2 and in the case of equiprobable categories, the eigenvectors are given by Chebychev polynomials.
For I > 2 and in the equiprobable case, the first component (location, or linear) is proportional to the Kruskal-Wallis statistic for contingency tables. Similarly, the second component (dispersion, or quadratic) is the generalization of the grouped-data version of Mood’s (1954) statistic. In the general case this is not true.
In the case of a 2 × J table we have two components: the first (linear) component of Taguchi’s statistic is equivalent to the Wilcoxon statistic, and the second (quadratic) component is equivalent to Mood’s (1954) test (see Nair, 1987).
See Beh, D’Ambra and Simonetti in Communications in Statistics (2011) for coordinates, distances, and properties of the decomposition of Taguchi’s statistic and of non-symmetrical correspondence analysis (NSCA).
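Relationship between the coordinates in the cumulative analysis and in classical CA (1/2)
For the cumulative analysis we may write the row coordinates as
$$\mathbf{O}_r = \left(\mathbf{D}_r^{-1}\mathbf{P} - \mathbf{1}_r\mathbf{c}^T\right)\mathbf{M}_J^T\mathbf{W}^{1/2}\mathbf{V}$$
For classical CA the row coordinates are defined by
$$\tilde{\mathbf{O}}_r = \left(\mathbf{D}_r^{-1}\mathbf{P} - \mathbf{1}_r\mathbf{c}^T\right)\tilde{\mathbf{V}}$$
where $\mathbf{1}_r$ is the vector of $I$ ones, and $\mathbf{V}$ and $\tilde{\mathbf{V}}$ are the matrices containing the right singular vectors for the cumulative analysis and classical CA, respectively. Therefore, provided $\tilde{\mathbf{V}}\tilde{\mathbf{V}}^T = \mathbf{I}$,
$$\mathbf{O}_r = \tilde{\mathbf{O}}_r\,\tilde{\mathbf{V}}^T\mathbf{M}_J^T\mathbf{W}^{1/2}\mathbf{V}$$
This shows that the cumulative coordinates can be obtained directly from the classical CA coordinates.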
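The conversion can be illustrated numerically on a made-up table. Two choices below are assumptions rather than something the slides specify: $\tilde{\mathbf{V}}$ is taken as the full $J \times J$ orthogonal matrix of right singular vectors of the usual CA matrix $\mathbf{D}_r^{-1/2}(\mathbf{P} - \mathbf{r}\mathbf{c}^T)\mathbf{D}_c^{-1/2}$, and the classical coordinates are computed exactly as written above, without a column metric. The identity itself only relies on $\tilde{\mathbf{V}}\tilde{\mathbf{V}}^T = \mathbf{I}$.

```python
import numpy as np

# Hypothetical counts (made-up numbers, not data from the talk).
N = np.array([[12.0,  8.0,  5.0,  2.0],
              [ 6.0, 10.0,  9.0,  4.0],
              [ 3.0,  7.0, 11.0, 13.0]])
n = N.sum()
I, J = N.shape
P = N / n
r = P.sum(axis=1)
c = P.sum(axis=0)

M_J = np.tril(np.ones((J, J)))[:-1, :]
d = M_J @ c
W_sqrt = np.diag(1.0 / np.sqrt(d * (1.0 - d)))

R = np.diag(1.0 / r) @ P - np.outer(np.ones(I), c)   # D_r^{-1} P - 1 c'
B = M_J.T @ W_sqrt                                   # M_J' W^{1/2}, J x (J-1)

# Cumulative analysis: right singular vectors V of X, then O_r.
X = np.diag(np.sqrt(r)) @ R @ B
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
O_cum = R @ B @ V

# Classical CA (assumed setup): full J x J matrix of right singular vectors.
X_tilde = np.diag(np.sqrt(r)) @ R @ np.diag(c ** -0.5)
_, _, Vt_tilde = np.linalg.svd(X_tilde, full_matrices=True)
V_tilde = Vt_tilde.T
O_cls = R @ V_tilde

# Cumulative coordinates recovered from the classical ones.
O_cum_from_cls = O_cls @ V_tilde.T @ B @ V

print(np.round(O_cum, 4))
print(np.round(O_cum_from_cls, 4))    # same matrix as above
```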
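Relationship between the coordinates in the cumulative analysis and in the classical analysis (2/2)
Using the same argument, we can obtain the classical coordinates from the cumulative coordinates through the relationship
$$\tilde{\mathbf{O}}_r = \mathbf{O}_r\,\mathbf{V}^T\left(\mathbf{M}_J^T\mathbf{W}^{1/2}\right)^{-1}\tilde{\mathbf{V}}$$
Because $\mathbf{M}_J^T\mathbf{W}^{1/2}$ is of dimension $J \times (J-1)$ rather than square, the inverse here has to be read as a generalized inverse.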