
Data Mining and Multiple Ordered Correspondence via Polynomial Transformations

Rosaria Lombardo, Second University of Naples, Via Gran Priorato di Malta, 81043 Capua (CE), Italy. rosaria.lombardo@unina2.it


  1. Data Mining and Multiple Ordered Correspondence via Polynomial Transformations. Rosaria Lombardo, Second University of Naples, Via Gran Priorato di Malta, 81043 Capua (CE), Italy. rosaria.lombardo@unina2.it

  2. What will we consider?
     • Data Mining and Customer Interaction System Data
     • Exploring huge data sets
     • Customer Satisfaction and Job Satisfaction studies
     • Collecting ordered categorical variables
     • Ordered multiple correspondence analysis (OMCA)
     • Singular Value Decomposition and Hybrid Value Decomposition
     • Applications of OMCA to customer satisfaction and job satisfaction data sets

  3. The Learning Management System Data
     • The Learning Management System data (LMSD) and the subsequent Customer Interaction System data can help to provide "Early Warning System" (EWS) data for risk detection in enterprises.
     • Various EWSs have been established (Kim et al., 2004): for detecting fraud, for credit-risk evaluation (Phua et al., 2009), for detecting risks potentially existing in medical organizations, and for supporting decision making in customer-centric planning tasks (Lessmann & Voß, 2009).
     • We focus on the EWS of LMSD for customer-centric planning tasks, to develop exploratory tools that identify at-risk customers and allow for more timely interventions.

  4. Multiple Correspondence Analysis
     • $X_k$: indicator matrix of dimension $n \times J_k$ for the $k$-th variable.
     • Aim: to analyse large survey data, $X = [X_1 | \dots | X_p]$, the complete disjunctive (indicator) matrices of the $p$ variables.
     • Rows: individuals / observations / units.
     • Columns: ordered categories (preference data from questionnaire replies).
     • References: Fisher (1940), Guttman (1941), Hayashi (1952), Benzecri (1973), Gifi (1981), Greenacre (1984), etc.
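
As a minimal sketch of how the indicator super-matrix $X = [X_1 | \dots | X_p]$ can be assembled, the snippet below codes each ordinal answer as a row of zeros with a single one. The function name `indicator_supermatrix` and the toy responses are hypothetical and are not taken from the slides.

```python
import numpy as np

def indicator_supermatrix(responses, n_categories):
    """Build X = [X_1 | ... | X_p] from ordinal scores coded 1..J_k.

    responses    : (n, p) integer array, one column per variable
    n_categories : list with the number of categories J_k of each variable
    """
    responses = np.asarray(responses)
    n = responses.shape[0]
    blocks = []
    for k, J_k in enumerate(n_categories):
        X_k = np.zeros((n, J_k), dtype=int)
        # Put a 1 in the column of the category chosen by each individual.
        X_k[np.arange(n), responses[:, k] - 1] = 1
        blocks.append(X_k)
    return np.hstack(blocks)

# Hypothetical toy data: 4 individuals, 2 variables with 3 ordered categories each.
responses = np.array([[1, 3], [2, 3], [3, 1], [2, 2]])
X = indicator_supermatrix(responses, [3, 3])
print(X)
```

Each row of `X` sums to the number of variables, because every individual picks exactly one category per variable.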

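  5. Multiple CA via the Indicator Super-Matrix
     MCA is based on the singular value decomposition (SVD) of the scaled indicator super-matrix
     $$\frac{1}{p\sqrt{n}}\, X D^{-1/2}$$
     • Row singular vectors $\Phi$, with $\Phi'\Phi = I$
     • Column singular vectors, normalised to be orthonormal in the metric $D$
     where $D$ is the super-diagonal matrix
     $$D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}, \qquad D_k = \mathrm{diag}(p_{k1}, \dots, p_{kJ_k})$$
     We could also consider the Burt matrix, shown here for two variables ($p = 2$):
     $$B = X'X = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}$$
     Remember that the sum of squares of a non-diagonal sub-matrix equals the Pearson chi-squared statistic divided by $n$ (Bekker & de Leeuw, 1988).
     Total inertia $= \mathrm{trace}(\Lambda^2)$, where $\Lambda$ is the diagonal matrix of singular values.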
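
The SVD step can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the author's implementation: $D$ is taken to hold the column masses of $X/(np)$, the scaling $1/(p\sqrt{n})$ is as read from the slide, and the toy responses are hypothetical. Under these assumptions the first singular value is the trivial one (equal to 1) and the squared singular values sum to $J/p$.

```python
import numpy as np

# Hypothetical toy data: n = 6 individuals, p = 2 variables, 3 ordered categories each.
responses = np.array([[1, 1], [1, 2], [2, 2], [2, 3], [3, 3], [3, 1]])
n, p = responses.shape

# Indicator super-matrix X = [X_1 | X_2]: rows of the identity pick the chosen category.
X = np.hstack([np.eye(3, dtype=int)[responses[:, k] - 1] for k in range(p)])

# Assumption: the diagonal of D holds the column masses of X / (n p).
d = X.sum(axis=0) / (n * p)

# Scaled indicator super-matrix (1 / (p sqrt(n))) X D^{-1/2}.
M = X / np.sqrt(d) / (p * np.sqrt(n))

# Phi: row singular vectors, sing: singular values, Gt: transposed column singular vectors.
Phi, sing, Gt = np.linalg.svd(M, full_matrices=False)

total_inertia = np.sum(sing**2)                    # includes the trivial axis
nontrivial_inertia = total_inertia - sing[0]**2    # drop the trivial solution
print(sing.round(4), nontrivial_inertia.round(4))
```

The scalar factor only rescales the singular values; the singular vectors would be the same with any other positive constant.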

  6. Ordered MCA
     • Hybrid Value Decomposition (Lombardo & Meulman, 2010; Lombardo & Beh, 2010): combines features of Singular Value Decomposition and Bivariate Moment Decomposition (Best & Rayner, 1996; Beh, 1997; 1998).
     • Tools: orthogonal polynomials for ordered categorical variables (Emerson, 1968) and singular vectors of the indicator super-matrix.
     • Visualising the relationships among ordinal-scale categories while simultaneously representing the units in clusters.
     • Extra information can be obtained concerning the statistical significance of the decomposed inertia.
     • Data trend interpretation.
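
To illustrate the orthogonal-polynomial tool, the sketch below builds polynomials in the category scores that are orthonormal with respect to a set of category weights. It does not reproduce Emerson's (1968) recurrence; it swaps in a weighted QR (Gram-Schmidt) factorisation, which yields the same family up to sign because orthonormal polynomials for a given discrete weighting are unique up to sign. The function name, the default scores 1..J and the example weights are assumptions made for illustration.

```python
import numpy as np

def orthonormal_polynomials(weights, scores=None):
    """Columns a_0, ..., a_{J-1}: polynomials of degree 0..J-1 in the scores,
    orthonormal w.r.t. the weights, i.e. A' diag(w) A = I.
    Requires strictly positive weights and distinct scores."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalise weights to sum to 1
    J = len(w)
    s = np.arange(1, J + 1, dtype=float) if scores is None else np.asarray(scores, dtype=float)
    V = np.vander(s, J, increasing=True)              # columns 1, s, s^2, ...
    Q, R = np.linalg.qr(np.sqrt(w)[:, None] * V)      # orthonormalise in the w-metric
    Q = Q * np.sign(np.diag(R))                       # make leading coefficients positive
    return Q / np.sqrt(w)[:, None]

# Hypothetical marginal proportions for a five-point ordered scale.
w = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
A = orthonormal_polynomials(w)
gram = A.T @ np.diag(w) @ A                           # should be the identity matrix
print(A.round(3))
```

Column 0 is the constant (trivial) polynomial, column 1 the linear component, column 2 the quadratic component, and so on.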

  7. Hybrid Decomposition for OMCA
     $$\mathrm{HD}\left(\frac{1}{p\sqrt{n}}\, X D^{-1/2}\right) = \Phi\, Z\, \Psi' D^{1/2}$$
     • $\Psi$: orthogonal polynomials (categories), with $\Psi' D \Psi = I$
     • $\Phi$: singular vectors (for rows, or individuals), with $\Phi'\Phi = I$
     where
     $$Z = \Phi' \left(\frac{1}{p\sqrt{n}}\, X D^{-1/2}\right) D^{1/2}\, \Psi$$
     and $\Psi$ is the super-diagonal matrix consisting of orthogonal polynomials for the ordinal variables.
     $$\text{Total inertia} = \mathrm{trace}(Z'Z) = \mathrm{trace}(ZZ')$$
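
The hybrid decomposition can be sketched numerically as below, as a check that the squared entries of $Z$ add up to the total inertia. Several choices here are assumptions rather than the author's method: $D$ holds the column masses of $X/(np)$, the polynomials are obtained by a weighted QR factorisation instead of Emerson's recurrence, and the helper `weighted_polynomials` and the toy responses are hypothetical.

```python
import numpy as np

def weighted_polynomials(d):
    """Polynomials in the scores 1..J, orthonormal w.r.t. weights d: A' diag(d) A = I."""
    J = len(d)
    V = np.vander(np.arange(1, J + 1, dtype=float), J, increasing=True)
    Q, R = np.linalg.qr(np.sqrt(d)[:, None] * V)
    return (Q * np.sign(np.diag(R))) / np.sqrt(d)[:, None]

# Hypothetical toy data: 6 individuals, 2 ordinal variables with 3 categories each.
responses = np.array([[1, 1], [1, 2], [2, 2], [2, 3], [3, 3], [3, 1]])
n, p = responses.shape
J_k = [3, 3]
X = np.hstack([np.eye(J, dtype=int)[responses[:, k] - 1] for k, J in enumerate(J_k)])
d = X.sum(axis=0) / (n * p)                 # assumed diagonal of D (column masses)

# Block-diagonal matrix Psi of orthogonal polynomials, one block per variable.
Psi = np.zeros((sum(J_k), sum(J_k)))
start = 0
for J in J_k:
    Psi[start:start + J, start:start + J] = weighted_polynomials(d[start:start + J])
    start += J

# Row singular vectors of the scaled indicator super-matrix.
M = X / np.sqrt(d) / (p * np.sqrt(n))
Phi, sing, _ = np.linalg.svd(M, full_matrices=False)

# Z = Phi' M D^{1/2} Psi, which simplifies to Phi' X Psi / (p sqrt(n)).
Z = Phi.T @ X @ Psi / (p * np.sqrt(n))

print(np.sum(Z**2).round(6), np.sum(sing**2).round(6))
```

Row `m` of `Z` collects the contributions of every polynomial component to axis `m`, so its squared entries sum to the squared singular value of that axis.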

  8. Properties of OMCA
     OMCA allows the inertia to be decomposed in terms of eigenvalues and of polynomial transformations of different degrees associated with the ordered categorical variables.
     • Property 1: the total inertia can be expressed in terms of squared z-values (bivariate moments) and eigenvalues:
       $$\text{Total inertia} = \sum_{m=1}^{M} \lambda_m = \sum_{m=1}^{M} \sum_{k=1}^{p} \sum_{v=1}^{J_k - 1} z_{mv}^{2}(k)$$
       where $M = J - p$ is the number of non-trivial solutions, $\lambda_m$ is the $m$-th non-trivial eigenvalue and $z_{mv}(k)$ is the z-value of axis $m$ for the polynomial of degree $v$ of variable $k$. We can compute the contribution of the linear component to the overall inertia.
     • Property 2: it is possible to identify which polynomial component (linear, quadratic or higher order) contributes most to each eigenvalue, and so to the inertia of each axis. For example, the first non-trivial eigenvalue is
       $$\lambda_1 = z_{11}^2 + z_{12}^2 + \dots + z_{1,J-p}^2$$
     See also Beh (2001) for $p = 2$.

  9. Graphical Displays in OMCA
     1. Individual coordinates: $F = \Phi Z = \frac{1}{p\sqrt{n}}\, X \Psi$
     2. Category coordinates: $G = \frac{1}{p\sqrt{n}}\, D^{-1} X' \Phi$
     Here $\Phi$ holds the row singular vectors, $\Psi$ the orthogonal polynomials (with $\Psi' D \Psi = I$) and $Z$ is the matrix of the hybrid decomposition.
     $$\text{Total inertia} = \mathrm{trace}(F'F) = \mathrm{trace}(G'DG)$$
     • Category coordinates are identical to the MCA coordinates.
     • Individual coordinates computed by polynomials are not the same as the "classical" ones: they show clusters of units in relation to the expressed ordered scores.

  10. How can you consider nominal variables without destroying the ordered structure?
     • Ordered multiple correspondence analysis and nominal variables
     • Split the ordinal data using the nominal categories
     • Apply OMCA to these data sub-sets
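
The splitting step could look like the following sketch, which partitions the ordinal responses by the categories of a nominal variable so that OMCA can then be run on each sub-set separately. The function name `split_by_nominal`, the nominal variable `ward` and the data are hypothetical.

```python
import numpy as np

def split_by_nominal(ordinal, nominal):
    """Return {nominal category: rows of the ordinal data in that category}."""
    ordinal = np.asarray(ordinal)
    nominal = np.asarray(nominal)
    return {g.item(): ordinal[nominal == g] for g in np.unique(nominal)}

# Hypothetical data: 5 respondents, 2 ordinal items, 1 nominal grouping variable.
ordinal = np.array([[1, 2], [3, 3], [2, 2], [5, 4], [4, 4]])
ward = np.array(["A", "B", "A", "B", "B"])

subsets = split_by_nominal(ordinal, ward)
for category, rows in subsets.items():
    print(category, rows.shape)   # each sub-set would be analysed on its own
```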

  11. The Evaluation of Customer Satisfaction in Health Care Services
     Aim: to gauge the quality of five key characteristics of a Naples hospital, based on a sample of 511 patients.
     • Service quality dimensions: Tangibility, Capacity of Response, Reliability, Empathy, Assurance Capacity
     • Each dimension is rated on the ordered scale 1 2 3 4 5
     • Ordered responses: 1 = Not satisfied, 5 = Very satisfied

  12. Comparing OMCA and MCA in the overall hospital
     Cluster: % of patients in cluster
     • E (very much satisfied): 13.6%
     • D (a lot satisfied): 41.7%
     • C (satisfied): 30.6%
     • B (little satisfied): 4.7%
     • A (not satisfied): 9.4%
     [Figure: OMCA plots and MCA plots. Category points are labelled TANG, AFF, CRIS, CRass and EMPAT with levels 1 to 5; individuals are shown as numbered points.]

  13. Ordered Multiple Analysis in the overall hospital
     Table 1: Decomposition of the first two non-trivial eigenvalues and chi-square tests.

     | Variable | Component | Eigenvalue 1 | χ² (axis 1) | Eigenvalue 2 | χ² (axis 2) | d.f. |
     |---|---|---|---|---|---|---|
     | Tangibility | Location | 0.104 | 73.230*** | 0.030 | 2.093 | 8 |
     | | Dispersion | 0.000 | 0.328 | 0.051 | 35.956*** | 8 |
     | | Skewness | 0.001 | 0.362 | 0.008 | 2.398 | 8 |
     | | Kurtosis | 0.002 | 1.567 | 0.000 | 5.936 | 8 |
     | Reliability | Location | 0.140 | 98.781*** | 0.000 | 0.282 | 8 |
     | | Dispersion | 0.000 | 0.219 | 0.099 | 69.999*** | 8 |
     | | Skewness | 0.001 | 0.368 | 0.003 | 2.217 | 8 |
     | | Kurtosis | 0.000 | 0.038 | 0.000 | 0.033 | 8 |
     | Capability of Response | Location | 0.153 | 107.539*** | 0.002 | 1.154 | 8 |
     | | Dispersion | 0.003 | 1.950 | 0.131 | 92.568*** | 8 |
     | | Skewness | 0.001 | 0.523 | 0.008 | 5.806 | 8 |
     | | Kurtosis | 0.000 | 0.027 | 0.002 | 1.748 | 8 |
     | Capability of Assurance | Location | 0.151 | 106.328*** | 0.002 | 1.106 | 8 |
     | | Dispersion | 0.005 | 3.313 | 0.119 | 84.106*** | 8 |
     | | Skewness | 0.001 | 0.529 | 0.013 | 9.315 | 8 |
     | | Kurtosis | 0.001 | 0.454 | 0.000 | 0.011 | 8 |
     | Empathy | Location | 0.143 | 101.009*** | 0.003 | 2.094 | 8 |
     | | Dispersion | 0.003 | 2.242 | 0.093 | 65.398*** | 8 |
     | | Skewness | 0.001 | 0.615 | 0.016 | 11.082 | 8 |
     | | Kurtosis | 0.002 | 1.665 | 0.000 | 0.020 | 8 |
     | Total | | 0.711 | 501.088*** | 0.558 | 393.320*** | 160 |

     The statistically significant components are identified at three levels of significance: 0.01 (***), 0.05 (**), 0.10 (*).
     Tangibility, Reliability, Capability of Response, Capability of Assurance and Empathy account for 15.9%, 18.3%, 25.6%, 24.6% and 20.1% of the explained inertia.
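
As a small worked example of reading Table 1, the sketch below sums the first-axis contributions by polynomial component to show how strongly the location (linear) component dominates the first eigenvalue. The numbers are copied from the first eigenvalue column of the table; the variable names are illustrative. The rounded entries add up to 0.712 rather than the reported total of 0.711, presumably because of rounding in the table.

```python
import numpy as np

components = ["Location", "Dispersion", "Skewness", "Kurtosis"]
variables = ["Tangibility", "Reliability", "Capability of Response",
             "Capability of Assurance", "Empathy"]

# First-axis contributions from Table 1 (rows: variables, columns: components).
axis1 = np.array([
    [0.104, 0.000, 0.001, 0.002],
    [0.140, 0.000, 0.001, 0.000],
    [0.153, 0.003, 0.001, 0.000],
    [0.151, 0.005, 0.001, 0.001],
    [0.143, 0.003, 0.001, 0.002],
])

by_component = axis1.sum(axis=0)   # contribution of each polynomial degree
total = axis1.sum()                # sum of the rounded table entries
shares = by_component / total      # share of each component in the first axis

for name, share in zip(components, shares):
    print(f"{name}: {share:.1%}")
```

On these figures the location component carries almost all of the first axis, which matches the pattern of significant chi-square values in the table.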
