Joint ICTP-IAEA School on Novel Experimental Methodologies for Synchrotron Radiation Applications in Nano-science and Environmental Monitoring 17 November - 28 November 2014 Héctor Jorge Sánchez Multivariate Methods Principal Components Analysis
Summary
Introduction
• Aims
• Introduction to PCA
• Advantages of PCA
Basic Statistics
• Basic definitions
• Covariance matrix and correlation
Principal Component Analysis
• What is it?
• PCA and linear algebra
• PCA and geometry
Applications
• Chinese porcelains classification
• Dog hair analysis
Aims
• To describe a multivariate statistical technique applicable to X-ray spectrometry.
• To show some applications of the Principal Components Analysis methodology.
Aims
• Multivariate data set: several samples ($n$) with several variables ($p$) for each sample.
• Multivariate Analysis, Principal Components Analysis: techniques for the reduction of dimensions and analysis of the covariance structure among the variables.
Introduction to PCA
Principal Components Analysis (PCA): a mathematical tool of linear algebra that makes it possible to:
• Describe the total variability of a set of multivariate observations, representing the cases in a space of reduced dimension with respect to the space of the original variables.
• Explore the covariance among variables.
• Identify the most important variables that explain the variability of the data set.
Advantages of PCA
• Gathering information for future samplings
• Reducing data set dimension
• Analysis of variables
Basic Statistics: Basic definitions
• Arithmetic mean of the $i$-th variable
• Variance of the variable $i$
• Covariance between the variables $i$ and $k$
• Correlation coefficient
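For reference, the usual sample forms of these four quantities can be written as below. This is a sketch under assumptions: $x_{ij}$ denotes the $j$-th of $n$ observations of variable $i$, the divisor is the unbiased $n-1$ (some texts use $n$), and the symbols $\sigma_{ik}$ and $r_{ik}$ are chosen here for illustration.

```latex
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}
\qquad
\sigma_{ii} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)^{2}
\qquad
\sigma_{ik} = \frac{1}{n-1}\sum_{j=1}^{n}\left(x_{ij}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_k\right)
\qquad
r_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}}\,\sqrt{\sigma_{kk}}}
```

With these definitions $r_{ii} = 1$ and $-1 \le r_{ik} \le 1$.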
Basic Statistics: Covariance Matrix, Correlation Matrix
Basic Statistics: Calculating the covariance matrix
• Given the observations $x_{ij}$ (variable $i = 1, \dots, p$; sample $j = 1, \dots, n$)
• we can define the data matrix $X = (x_{ij})$
• and the matrix centered on the coordinate origin defined by the mean values: $X_c = (x_{ij} - \bar{x}_i)$
Basic Statistics: Calculating the covariance matrix
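The calculation can be sketched numerically as follows. This is a minimal illustration, not the author's code: the data values are made up, samples are stored in rows and variables in columns, and the unbiased divisor $n-1$ is assumed.

```python
import numpy as np

# Hypothetical data: n = 5 samples (rows), p = 3 variables (columns).
X = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
])
n, p = X.shape

means = X.mean(axis=0)       # arithmetic mean of each variable
Xc = X - means               # data matrix centered on the mean values
S = Xc.T @ Xc / (n - 1)      # p x p covariance matrix (assumed n - 1 divisor)

# Cross-check against NumPy's estimator (rowvar=False: variables in columns).
assert np.allclose(S, np.cov(X, rowvar=False))
print(S)
```

The diagonal of `S` holds the variances and the off-diagonal entries hold the covariances between pairs of variables.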
Basic Statistics: The correlation matrix
• It is the standardized covariance matrix: $R = D^{-1/2}\,\Sigma\,D^{-1/2}$, with $D = \operatorname{diag}(\sigma_{11}, \dots, \sigma_{pp})$
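A short numerical sketch of this standardization, assuming randomly generated data with deliberately different scales per variable (the data and variable names are illustrative only):

```python
import numpy as np

# Hypothetical data: 20 samples (rows) of 4 variables with very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4)) * np.array([1.0, 10.0, 0.1, 5.0])

S = np.cov(X, rowvar=False)      # covariance matrix
d = np.sqrt(np.diag(S))          # standard deviation of each variable
R = S / np.outer(d, d)           # r_ik = s_ik / (s_i * s_k)

# The result has a unit diagonal and matches NumPy's correlation estimator.
assert np.allclose(np.diag(R), 1.0)
assert np.allclose(R, np.corrcoef(X, rowvar=False))
print(np.round(R, 3))
```

Working on the correlation matrix instead of the covariance matrix removes the influence of the measurement units, which matters when variables span very different ranges.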
Principal Components Analysis: How do we understand it?
• Algebraically: PCA operates on $\Sigma$ or $R$, looking for particular linear combinations among the $p$ original variables $X_1, X_2, \dots, X_p$.
• Geometrically: establishing a new coordinate system by centering and rotating the original system of axes $X_1, X_2, \dots, X_p$ to obtain the new axes.
Principal Components Analysis: PCA algebraically
• Let $X = (X_1, X_2, \dots, X_p)^{T}$ with a covariance matrix $\Sigma$ of eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$.
• Considering the system $Z_i = a_i^{T} X = a_{i1}X_1 + a_{i2}X_2 + \dots + a_{ip}X_p$, $i = 1, \dots, p$,
• then the PRINCIPAL COMPONENTS are the linear combinations $Z_1, Z_2, \dots, Z_p$ with null covariances among them and whose variances are maximal.
Principal Components Analysis: PCA algebraically
• We look for the maximum of $\operatorname{Var}(Z_1) = a_1^{T}\Sigma a_1$ subject to $a_1^{T}a_1 = 1$.
• The maximum is the maximum eigenvalue of $\Sigma$.
• The normalized eigenvector $a_1$ corresponding to the highest eigenvalue $\lambda_1$ is the coefficient vector in $Z_1 = a_1^{T}X$.
• The normalized eigenvector $a_2$ corresponding to the second highest eigenvalue $\lambda_2$ is the coefficient vector in $Z_2 = a_2^{T}X$.
• The normalized eigenvector $a_p$ corresponding to the lowest eigenvalue $\lambda_p$ is the coefficient vector in $Z_p = a_p^{T}X$.
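The eigenvector construction can be checked numerically. The sketch below assumes synthetic correlated data (the mixing matrix `A` is arbitrary) and uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so they are reordered here.

```python
import numpy as np

# Hypothetical correlated data: n = 200 samples (rows), p = 3 variables.
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.3]])
X = rng.normal(size=(200, 3)) @ A.T

Xc = X - X.mean(axis=0)              # center the data
S = np.cov(Xc, rowvar=False)         # covariance matrix

eigvals, eigvecs = np.linalg.eigh(S) # ascending eigenvalues, orthonormal vectors
order = np.argsort(eigvals)[::-1]    # reorder from highest to lowest
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]          # column k holds the coefficient vector a_(k+1)

Z = Xc @ eigvecs                     # principal component scores Z_1 ... Z_p

# The components have null covariances and variances equal to the eigenvalues.
cov_Z = np.cov(Z, rowvar=False)
assert np.allclose(cov_Z, np.diag(eigvals))
print(np.round(eigvals, 4))
```

The sign of each eigenvector is arbitrary, so scores from different software may differ by a factor of -1 per component without changing the analysis.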
Principal Components Analysis: PCA algebraically
• The total variance of the system is, therefore, the sum of the eigenvalues:
  Total Variance $= \sigma_{11} + \sigma_{22} + \dots + \sigma_{pp} = \lambda_1 + \lambda_2 + \dots + \lambda_p$
• Hence, the proportion of the variance explained by the $k$th component is:
  Proportion of the $k$th variability $= \dfrac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}$
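As a worked illustration of both statements, the sketch below uses a made-up $3 \times 3$ covariance matrix (not data from the lecture) and computes the individual and cumulative proportions of variance.

```python
import numpy as np

# Hypothetical covariance matrix (symmetric, positive definite).
S = np.array([[4.0, 2.0, 0.6],
              [2.0, 3.0, 0.4],
              [0.6, 0.4, 1.0]])

eigvals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues, highest first
total_variance = np.trace(S)            # sigma_11 + sigma_22 + ... + sigma_pp

# The trace of S equals the sum of its eigenvalues.
assert np.isclose(total_variance, eigvals.sum())

proportion = eigvals / eigvals.sum()    # lambda_k / sum_i lambda_i
cumulative = np.cumsum(proportion)      # accumulated proportion

for k, (lam, prop, acc) in enumerate(zip(eigvals, proportion, cumulative), start=1):
    print(f"PC{k}: eigenvalue={lam:.3f}  "
          f"proportion={100 * prop:.1f}%  cumulative={100 * acc:.1f}%")
```

The cumulative column is what is usually inspected to decide how many components to retain.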
Principal Components Analysis: PCA Geometrically
[Figure: original axes x, y and rotated axes x´, y´]
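In two dimensions the geometric picture reduces to centering the cloud and rotating the axes by a single angle. The sketch below assumes synthetic data and uses the closed-form angle $\theta = \tfrac{1}{2}\arctan\!\big(2 s_{xy} / (s_{xx} - s_{yy})\big)$ for the first principal axis; the variable names are illustrative.

```python
import numpy as np

# Hypothetical 2-D cloud, elongated along a tilted direction.
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 0.8 * x + 0.3 * rng.normal(size=300)

# Step 1: centering (move the origin to the mean values).
xc, yc = x - x.mean(), y - y.mean()

# Step 2: rotation angle of the first principal axis.
S = np.cov(xc, yc)                           # 2 x 2 covariance matrix
sxx, syy, sxy = S[0, 0], S[1, 1], S[0, 1]
theta = 0.5 * np.arctan2(2.0 * sxy, sxx - syy)

# Coordinates of the points in the rotated system (x', y').
x_new = np.cos(theta) * xc + np.sin(theta) * yc
y_new = -np.sin(theta) * xc + np.cos(theta) * yc

S_new = np.cov(x_new, y_new)
assert abs(S_new[0, 1]) < 1e-10              # new axes are uncorrelated
assert S_new[0, 0] >= S_new[1, 1]            # x' carries the largest variance
print(f"rotation angle: {np.degrees(theta):.1f} degrees")
```

The rotation leaves the total variance unchanged; it only redistributes it so that the first new axis carries as much as possible.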
Applications
EDXRF studies of porcelains (800–1600 A.D.) from Fujian, China with chemical proxies and principal component analysis. J. Wu et al., X-Ray Spectrom. 29, 239–244 (2000).
41 Dehua porcelain samples from three different regions of China:
• Xunzhong: Qudou-Gong (DQ) wares (960–1368 A.D., Song-Yuan dynasty)
• Gaide: Wanping-Lun (DWP) wares (960–1368 A.D., Song-Yuan dynasty)
• Meihu: Mulin (DM) wares (618–960 A.D., Tang dynasty)
Applications
• Major and minor elements present in the samples (9 variables: Si; Al; Fe; Ti; Ca; Mg; K; Na2O and Mn).
• Trace elements present in the samples (9 variables: Cr2; Ni; Cu; Zn; Rb; Sr; Y; Zr and Ba).

Eigenvalue    Major, minor: Acc. %    Traces: Acc. %
λ1            49                      45
λ2            63                      63
λ3            75                      83

Cumulative percentage of the total variability explained by the first three principal components of the data matrices for major and minor elements, and for trace elements.
Applications
[Figure, left: plot of the first two principal components with the concentrations of elements SiO2 – MnO.]
[Figure, right: plot of the first and third principal components with the concentrations of elements Cr2O3 – BaO.]
• References: DQ=Qudou-Gong, DB=Biangu-Xu, DL=Lingdou, HL=Housuo (Xunzhong); DWP=Wanping-Lung, DWY=Wanyang-Keng (Gaide); DM=Mulin (Meihu)
The chemical compositions were used for recognizing the provenance of Dehua porcelain. The 41 samples from eight kiln sites are distributed in three areas, corresponding to their original places of production: Xunzhong, Gaide and Meihu towns, respectively. Principal component analysis (PRIN 1, PRIN 2 and PRIN 3) reveals well-defined regions for the samples. However, some of the data points are very scattered because some trace-element concentrations take abnormal values.