Multivariate Methods: Principal Components Analysis


  1. Joint ICTP-IAEA School on Novel Experimental Methodologies for Synchrotron Radiation Applications in Nano-science and Environmental Monitoring, 17 November - 28 November 2014. Héctor Jorge Sánchez. Multivariate Methods: Principal Components Analysis.

  2. Summary
     Introduction
     • Aims
     • Introduction to PCA
     • Advantages of PCA
     Basic Statistics
     • Basic definitions
     • Covariance matrix and correlation
     Principal Component Analysis
     • What is it?
     • PCA and linear algebra
     • PCA and geometry
     Applications
     • Chinese porcelains classification
     • Dog hair analysis
     Héctor Jorge Sánchez, 11/10/2014. Joint ICTP-IAEA School on Novel Experimental Methodologies for Synchrotron Radiation Applications in Nano-science and Environmental Monitoring - 17 November - 28 November 2014

  3. Aims
     • To describe a multivariate statistical technique applicable to x-ray spectrometry.
     • To show some applications of the Principal Components Analysis methodology.

  4. Aims
     • Multivariate data set: several samples ($n$) with several variables ($p$) for each sample.
     • Multivariate Analysis
     • Principal Components Analysis: techniques for the reduction of dimensions and analysis of the covariance structure among variables.

  5. Introduction to PCA
     Principal Components Analysis (PCA): a mathematical tool of linear algebra that allows one to:
     • Describe the total variability of a set of multivariate observations, representing the cases in a space of reduced dimension with respect to the space of the original variables.
     • Explore the covariance among variables.
     • Identify the most important variables that explain the variability of the data set.

  6. Advantages of PCA
     • Gathering information
     • Reducing data set dimension
     • Analysis of variables for future samplings

  7. Basic Statistics: Basic definitions
     • Arithmetic mean of the $i$th variable
     • Variance of the variable $i$
     • Covariance between the variables $i$ and $k$
     • Correlation coefficient
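For reference, the four quantities named on this slide have the standard forms sketched here. The slide's own formulas were not preserved, so the notation is an assumption ($x_{ji}$ is the value of variable $i$ in sample $j$, with $n$ samples), as is the $n-1$ normalization; some texts divide by $n$ instead.

```latex
\begin{align}
\bar{x}_i &= \frac{1}{n}\sum_{j=1}^{n} x_{ji}
  && \text{(arithmetic mean of variable } i\text{)} \\
s_{ii} &= \frac{1}{n-1}\sum_{j=1}^{n} \left(x_{ji}-\bar{x}_i\right)^2
  && \text{(variance of variable } i\text{)} \\
s_{ik} &= \frac{1}{n-1}\sum_{j=1}^{n} \left(x_{ji}-\bar{x}_i\right)\left(x_{jk}-\bar{x}_k\right)
  && \text{(covariance between variables } i \text{ and } k\text{)} \\
r_{ik} &= \frac{s_{ik}}{\sqrt{s_{ii}}\,\sqrt{s_{kk}}}
  && \text{(correlation coefficient)}
\end{align}
```

With these definitions $s_{ik} = s_{ki}$, $r_{ii} = 1$ and $-1 \le r_{ik} \le 1$.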

  8. Basic Statistics
     • Covariance Matrix
     • Correlation Matrix

  9. Basic Statistics: Calculating the covariance matrix
     • Given $n$ samples with $p$ variables each,
     • we can define the $n \times p$ data matrix $X = (x_{ji})$, where $x_{ji}$ is the value of variable $i$ in sample $j$,
     • and the matrix centered on the coordinate origin defined by the mean values, $X_c = (x_{ji} - \bar{x}_i)$.

  10. Basic Statistics: Calculating the covariance matrix
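The calculation described on these two slides can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `covariance_matrix`, the toy data, and the $n-1$ divisor are not taken from the slides.

```python
import numpy as np

def covariance_matrix(X):
    """Covariance matrix of an n x p data matrix (rows = samples)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Center each column on its mean value.
    Xc = X - X.mean(axis=0)
    # p x p matrix of sums of cross-products, divided by n - 1 (assumed convention).
    return Xc.T @ Xc / (n - 1)

# Toy data: 3 samples, 2 variables (hypothetical values).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0]])
S = covariance_matrix(X)
print(S)
```

The result agrees with `np.cov(X, rowvar=False)`, which uses the same divisor by default.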

  11. Basic Statistics: The correlation matrix
     • It is the standardized covariance matrix.
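One way to read "standardized covariance matrix" is that every entry $s_{ik}$ is divided by the standard deviations of variables $i$ and $k$. The sketch below assumes that reading; the function name and the example matrix are hypothetical.

```python
import numpy as np

def correlation_from_covariance(S):
    """Standardize a covariance matrix: r_ik = s_ik / (sqrt(s_ii) * sqrt(s_kk))."""
    S = np.asarray(S, dtype=float)
    d = np.sqrt(np.diag(S))      # standard deviations of the variables
    return S / np.outer(d, d)    # divide entry (i, k) by d_i * d_k

# Hypothetical 2 x 2 covariance matrix.
S = np.array([[4.0, 7.0],
              [7.0, 13.0]])
R = correlation_from_covariance(S)
print(R)
```

The diagonal of `R` is 1 by construction, so the correlation matrix removes the effect of the measurement scale of each variable.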

  12. Principal Components Analysis: How do we understand it?
     • Algebraically: PCA operates on the covariance matrix $\Sigma$ or the correlation matrix $R$, looking for particular linear combinations among the $p$ original variables $X_1, X_2, \dots, X_p$.
     • Geometrically: establishing a new coordinate system by centering and rotating the original system, using $X_1, X_2, \dots, X_p$ as new axes.

  13. Principal Components Analysis: PCA algebraically
     • Let $X = (X_1, X_2, \dots, X_p)$ be the variables, with a covariance matrix $\Sigma$ of eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$.
     • Considering the system of linear combinations $Z_i = a_i^{T} X$, $i = 1, \dots, p$,
     • then the PRINCIPAL COMPONENTS are the linear combinations $Z_1, Z_2, \dots, Z_p$ with null covariances whose variances are maximal.

  14. Principal Components Analysis: PCA algebraically
     • We look for the maximum of the variance $\mathrm{Var}(Z_i) = a_i^{T} \Sigma a_i$ with $a_i$ normalized. The maximum is the maximum eigenvalue of $\Sigma$.
     • The normalized eigenvector $a_1$ corresponding to the highest eigenvalue $\lambda_1$ is the coefficient vector in $Z_1$.
     • The normalized eigenvector $a_2$ corresponding to the second highest eigenvalue $\lambda_2$ is the coefficient vector in $Z_2$.
     • The normalized eigenvector $a_p$ corresponding to the lowest eigenvalue $\lambda_p$ is the coefficient vector in $Z_p$.
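The algebraic recipe above (eigenvalues sorted from highest to lowest, normalized eigenvectors used as coefficient vectors) can be sketched with NumPy. The function name `pca_eig`, the random test data, and the use of `np.linalg.eigh` are assumptions made for illustration, not the author's implementation.

```python
import numpy as np

def pca_eig(X):
    """PCA by eigen-decomposition of the sample covariance matrix.

    Returns eigenvalues (descending), eigenvectors as columns, and the scores.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center on the mean values
    S = np.cov(Xc, rowvar=False)            # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: highest eigenvalue first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]             # column k is the coefficient vector of component k
    Z = Xc @ eigvecs                        # principal component scores
    return eigvals, eigvecs, Z

# Hypothetical data: 50 samples, 3 correlated variables.
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.2]])
X = rng.normal(size=(50, 3)) @ mix
eigvals, A, Z = pca_eig(X)

# The components are uncorrelated and their variances are the eigenvalues.
print(np.round(np.cov(Z, rowvar=False), 6))
```

The printed covariance matrix of the scores is diagonal, which is the "null covariances" property, and its diagonal reproduces the sorted eigenvalues.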

  15. Principal Components Analysis: PCA algebraically
     • The total variance of the system is, therefore, the sum of the eigenvalues:
       Total variance $= \sigma_{11} + \sigma_{22} + \dots + \sigma_{pp} = \lambda_1 + \lambda_2 + \dots + \lambda_p$
     • Hence, the proportion of the variance explained by the $k$th component is:
       Proportion of the $k$th variability $= \dfrac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}$
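As a small worked illustration of the proportion formula, the sketch below computes the explained and accumulated proportions from a list of eigenvalues. The eigenvalues are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def explained_variance(eigvals):
    """Proportion of total variance per component, and its running sum."""
    eigvals = np.asarray(eigvals, dtype=float)
    proportion = eigvals / eigvals.sum()   # lambda_k / sum_i lambda_i
    cumulative = np.cumsum(proportion)     # accumulated proportion
    return proportion, cumulative

# Hypothetical eigenvalues, already sorted from highest to lowest.
eigvals = np.array([4.0, 3.0, 2.0, 1.0])
proportion, cumulative = explained_variance(eigvals)
print(np.round(100 * proportion, 1))   # percentage explained by each component
print(np.round(100 * cumulative, 1))   # accumulated percentage
```

The accumulated percentage is the quantity usually inspected when deciding how many components to retain.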

  16. Principal Components Analysis: PCA geometrically
     [Figure: original axes x, y and rotated axes x´, y´.]
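The geometric reading (centering followed by a rotation of the axes) can be checked numerically in two dimensions. The sketch below uses made-up data and assumes the eigenvector matrix is turned into a proper rotation by flipping one sign if needed; none of these choices come from the slides.

```python
import numpy as np

# Hypothetical 2-D cloud: elongated along a tilted direction, shifted off the origin.
rng = np.random.default_rng(1)
t = rng.normal(size=200)
X = np.column_stack([t, 0.5 * t + 0.1 * rng.normal(size=200)]) + np.array([5.0, 2.0])

Xc = X - X.mean(axis=0)                                  # step 1: centering
eigvals, A = np.linalg.eigh(np.cov(Xc, rowvar=False))
A = A[:, ::-1].copy()                                    # largest-variance axis first
if np.linalg.det(A) < 0:                                 # eigenvector signs are arbitrary;
    A[:, 1] *= -1                                        # flip one so A is a pure rotation
Z = Xc @ A                                               # step 2: coordinates on the rotated axes

# Rotation angle of the first new axis with respect to the original x axis.
angle = np.degrees(np.arctan2(A[1, 0], A[0, 0]))
print(angle)
```

Because `A` is orthogonal with determinant +1, the distances of the points to the centroid are unchanged; only the axes along which they are described have moved.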

  17. Applications
     EDXRF studies of porcelains (800–1600 A.D.) from Fujian, China with chemical proxies and principal component analysis. J. Wu et al., X-Ray Spectrom. 29, 239–244 (2000).
     41 Dehua porcelain samples from three different regions of China:
     • Xunzhong: Qudou-Gong (DQ) wares (960–1368 A.D., Song-Yuan dynasty)
     • Gaide: Wanping-Lun (DWP) wares (960–1368 A.D., Song-Yuan dynasty)
     • Meihu: Mulin (DM) wares (618–960 A.D., Tang dynasty)

  18. Applications
     • Major and minor elements present in the samples (9 variables: Si; Al; Fe; Ti; Ca; Mg; K; Na2O and Mn).
     • Trace elements present in the samples (9 variables: Cr2O3; Ni; Cu; Zn; Rb; Sr; Y; Zr and Ba).

     Eigenvalue     Major, minor (Acc. %)    Traces (Acc. %)
     $\lambda_1$    49                       45
     $\lambda_2$    63                       63
     $\lambda_3$    75                       83

     Accumulative percentage of the total variability explained by the first three principal components of the data matrices for major and minor elements, and for trace elements.

  19. Applications
     [Figure, left: plot of the first two principal components with the concentrations of elements SiO2 – MnO. Figure, right: plot of the first and third principal components with the concentrations of elements Cr2O3 – BaO.]
     The chemical compositions were used for recognizing the provenience of Dehua porcelain. The 41 samples from eight kiln sites are distributed in three areas, corresponding to their original places of production: Xunzhong, Gaide and Meihu towns, respectively. Principal component analysis (PRIN 1, PRIN 2 and PRIN 3) reveals well defined regions for the samples. However, some of the data points are very scattered because some concentrations of the trace elements take abnormal values.
     • References: DQ=Qudou-Gong, DB=Biangu-Xu, DL=Lingdou, HL=Housuo (Xunzhong); DWP=Wanping-Lung, DWY=Wanyang-Keng (Gaide); DM=Mulin (Meihu)
