Exploratory Factor Analysis and PCA Analysis: A Review - PowerPoint PPT Presentation


  1. Multivariate Fundamentals: Rotation – Exploratory Factor Analysis

  2. PCA Analysis – A Review. [Figure: precipitation, temperature, ecosystems]

  3. PCA Analysis with Spatial Data. Proportion of variance explained: Comp.1 + Comp.2 + Comp.3 ≈ 95%. [Output: loadings]

  4. PCA Analysis with Spatial Data: Loadings. Loadings indicate MANY climate variables are associated with Component 1. Intuitively, these variables are associated with growth conditions.

  5. PCA Analysis with Spatial Data: Loadings. Loadings indicate temperature variables are associated with Component 2. Intuitively, these variables are associated with continentality.

  6. PCA Analysis with Spatial Data: Loadings. Loadings indicate moisture variables are associated with Component 3. Intuitively, these variables are associated with the weather jet stream.

  7. PCA Analysis with Spatial Data

  8. Exploratory Factor Analysis (EFA). Objective: rotate the data so that the new axes explain the greatest amount of variation within the data (same as PCA). But, unlike PCA, the key concept of factor analysis is that multiple observed variables have similar patterns of responses because they are all associated with a latent (i.e. not directly measured) variable. In the context of EFA, PCA can be considered a technique that simply reduces the number of variables. Think of EFA as: “There is a bigger picture controlling the variables I am analyzing, and I want to better understand the relationship with those underlying unobservable factors.” [Photo: Charles Spearman (1863-1945)]

  9. The math behind EFA. Renewable Resources MSc example: grades of five students in five courses.

     Student ID   Y_1 (Ecology)   Y_2 (Policy)   Y_3 (Geography)   Y_4 (Design)   Y_5 (Statistics)
     1            3               2              3                 6              5
     2            7               7              7                 4              4
     3            4               4              4                 4              4
     4            5               5              5                 5              5
     5            7               6              6                 4              5

     Assume each Y_i is linearly related to F_1 and F_2 as follows:

     Y_1 = β_{10} + β_{11} F_1 + β_{12} F_2 + ε_1
     Y_2 = β_{20} + β_{21} F_1 + β_{22} F_2 + ε_2
     Y_3 = β_{30} + β_{31} F_1 + β_{32} F_2 + ε_3
     Y_4 = β_{40} + β_{41} F_1 + β_{42} F_2 + ε_4
     Y_5 = β_{50} + β_{51} F_1 + β_{52} F_2 + ε_5

     Here F_1 and F_2 are unobservable factors (e.g. writing ability, mathematical ability), ε_i is the error term, and the β_{ij} are the loadings, which come from PCA.

     EFA utilizes the outputs of PCA, but continues to rotate the data to investigate the relationship with underlying unobservable factors. Because EFA utilizes PCA components, EFA does NOT completely maximize variance explained.

     [Figure: Ecology, Policy, Geography, Design, and Statistics plotted on the PC 1 and PC 2 axes]
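
The five equations on this slide share one pattern, so they can be collapsed into a single indexed equation and, equivalently, a matrix form. The matrix symbols below (bold Y, B, F) are notation introduced here for illustration and do not appear on the slide.

```latex
\begin{aligned}
Y_i &= \beta_{i0} + \beta_{i1} F_1 + \beta_{i2} F_2 + \varepsilon_i,
      \qquad i = 1, \dots, 5, \\[4pt]
\mathbf{Y} &= \boldsymbol{\beta}_0 + \mathbf{B}\,\mathbf{F} + \boldsymbol{\varepsilon},
      \qquad
      \mathbf{B} =
      \begin{pmatrix}
        \beta_{11} & \beta_{12} \\
        \vdots     & \vdots     \\
        \beta_{51} & \beta_{52}
      \end{pmatrix},
      \quad
      \mathbf{F} = \begin{pmatrix} F_1 \\ F_2 \end{pmatrix}.
\end{aligned}
```

Each row of B holds the two loadings of one course, which is the layout of the loadings table that factor analysis software typically prints.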

  10. Determining the number of factors. FA should ONLY include important factors. There are a number of different ways to determine the number of factors you should use, but the easiest way is to set the number of factors to the number of PC components that explain a significant portion of the variation in the data.
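
As a rough illustration of this rule, the Python sketch below counts how many leading components are needed to reach a chosen share of the total variance. The 90% threshold and the component variances are hypothetical values picked for the example; the slides do not define what counts as "significant", and the helper name `count_components` is made up here.

```python
def count_components(variances, threshold=0.90):
    """Return (k, cumulative) where k is the number of leading components
    needed for the cumulative proportion of variance to reach `threshold`."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = []
    running = 0.0
    for v in ordered:
        running += v
        cumulative.append(running / total)
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k, cumulative
    return len(ordered), cumulative


# Hypothetical component variances (squared standard deviations from a PCA).
example_variances = [2.9, 1.4, 0.5, 0.15, 0.05]
n_factors, cumulative = count_components(example_variances, threshold=0.90)
print(n_factors)                           # number of components kept
print([round(c, 2) for c in cumulative])   # cumulative proportion of variance
```

The result would then be passed as the number of factors to the factor analysis; a different threshold gives a different answer, so the choice should be reported.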

  11. Exploratory Factor Analysis in R. Exploratory FA in R: factanal(dataMatrix, factors=n, rotation="type") (stats package). dataMatrix is the data matrix of original predictor variables, NOT principal components. factors is the number of factors you want to include. Note: R will ONLY fit as many factors as the number of variables can support; factanal stops with an error if you request too many. rotation is the statistical method used to rotate the data. Rotation options fall into 2 categories. Orthogonal rotation assumes your factors are uncorrelated; function options: "varimax", "quartimax". Oblique rotation assumes your factors are correlated; function options: "promax", "oblimin". Additional function options: "none", "simplimax", "cluster". When selecting your rotation method, consider the correlation between factors. Simple criteria commonly used (Tabachnick & Fidell, 2007): 1. Start with the oblique option "promax" and look at the correlation between your factors (assume underlying factors are correlated). 2. If the absolute correlation between factors is below 0.32, use the orthogonal option "varimax" (the default in R). For more information: Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Upper Saddle River, NJ: Pearson Allyn & Bacon, pp. 646-647.

  12. Popular rotation methods. Orthogonal (keeps the factor axes at 90°, assumes uncorrelated factors): Varimax minimizes the number of variables that have high loadings on each factor and works to make small loadings even smaller. Quartimax minimizes the number of factors needed to explain each variable. Oblique (allows angles between factor axes greater or less than 90°, assumes correlated factors): Promax involves raising the loadings to a power of four, which ultimately results in greater correlations among the factors and achieves a simple structure; it is often preferred because of its speed with larger datasets. Direct Oblimin attempts to simplify the structure and the mathematics of the output. Oblique rotation is more complex than orthogonal rotation, since it can involve one of two coordinate systems: a system of primary axes or a system of reference axes. The most commonly used methods are Varimax and Promax. For more information: Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Upper Saddle River, NJ: Pearson Allyn & Bacon, pp. 646-647.
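
To make the varimax idea concrete, here is a small Python sketch for the two-factor case only. It scores a loading matrix with the raw varimax criterion (the variance of the squared loadings within each factor, summed over factors) and searches rotation angles by brute force for the highest score. This is a teaching sketch, not the iterative algorithm statistical packages use, it omits any normalization step, and the unrotated loadings are hypothetical numbers.

```python
import math


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        squared = [row[k] ** 2 for row in loadings]
        mean = sum(squared) / p
        total += sum((s - mean) ** 2 for s in squared) / p
    return total


def rotate(loadings, angle):
    """Orthogonal rotation of a two-column loading matrix by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_two_factors(loadings, steps=900):
    """Grid search over [0, 90 degrees) for the angle with the largest criterion."""
    best_angle, best_value = 0.0, varimax_criterion(loadings)
    for i in range(1, steps):
        angle = (math.pi / 2) * i / steps
        value = varimax_criterion(rotate(loadings, angle))
        if value > best_value:
            best_angle, best_value = angle, value
    return rotate(loadings, best_angle), best_angle


# Hypothetical unrotated loadings for five variables on two factors.
unrotated = [(0.70, 0.60), (0.65, 0.55), (0.72, 0.50), (0.55, -0.45), (0.60, -0.65)]
rotated, angle = varimax_two_factors(unrotated)
for row in rotated:
    print(tuple(round(x, 2) for x in row))
```

Searching only a quarter turn is enough because rotating by 90° swaps the two factors (up to sign) and leaves the criterion unchanged. The rotation moves loadings toward "large on one factor, small on the other" without changing how much of each variable the two factors explain together.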

  13. Exploratory Factor Analysis in R. First we want to look at the correlation between factors derived from the analysis. Tabachnick & Fidell criterion: 1. Start with the assumption that factors are correlated, so use the oblique rotation option "promax". 2. If the absolute correlations are below 0.32, the solution remains nearly orthogonal, and we should use the orthogonal rotation option "varimax". If your factors are correlated, you should report the correlation value with your results.
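
The decision rule on this slide can be written as a small Python helper. The function name `choose_rotation` and the two factor correlation matrices are hypothetical; in practice the matrix would be read from the output of the "promax" fit.

```python
def choose_rotation(factor_corr, cutoff=0.32):
    """Apply the Tabachnick & Fidell rule of thumb to a factor correlation
    matrix (list of lists). Returns the suggested rotation and the largest
    absolute off-diagonal correlation."""
    largest = 0.0
    n = len(factor_corr)
    for i in range(n):
        for j in range(n):
            if i != j:
                largest = max(largest, abs(factor_corr[i][j]))
    rotation = "varimax" if largest < cutoff else "promax"
    return rotation, largest


# Hypothetical factor correlations taken from two different promax fits.
weakly_correlated = [[1.00, -0.21], [-0.21, 1.00]]
clearly_correlated = [[1.00, 0.45], [0.45, 1.00]]

print(choose_rotation(weakly_correlated))    # suggests the orthogonal option
print(choose_rotation(clearly_correlated))   # keeps the oblique option
```

When the oblique option is kept, the returned correlation is the value to report alongside the results.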

  14. Exploratory Factor Analysis in R. Uniqueness: the share of each variable's variance that is not explained by the factors; a high value flags a variable that behaves differently from the rest. Factor loadings: tell you the relationship between the calculated factors and the original variables. Variance explained: how much of the data variance is explained by each factor. P-value: tests the hypothesis “The number of factors included in this analysis is sufficient to capture the underlying unobservable relationships”. Large P-value = do not reject the null hypothesis, i.e. YES, the number of factors in the analysis is sufficient.
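
The output quantities on this slide are linked by simple arithmetic, sketched below in Python. It assumes standardized variables and an orthogonal rotation, in which case a variable's uniqueness is 1 minus the sum of its squared loadings, and a factor's proportion of variance is the sum of its squared loadings divided by the number of variables. The loadings are hypothetical and are not the output of the grades example.

```python
def summarize_loadings(loadings):
    """Given {variable: (loading on factor 1, loading on factor 2, ...)},
    return uniquenesses per variable and the proportion of variance
    explained per factor (orthogonal factors, standardized variables)."""
    p = len(loadings)
    n_factors = len(next(iter(loadings.values())))
    uniqueness = {
        name: 1.0 - sum(x * x for x in row) for name, row in loadings.items()
    }
    proportion = [
        sum(row[k] ** 2 for row in loadings.values()) / p
        for k in range(n_factors)
    ]
    return uniqueness, proportion


# Hypothetical loadings, for illustration only.
example_loadings = {
    "Ecology": (0.95, -0.10),
    "Policy": (0.90, -0.20),
    "Geography": (0.97, -0.05),
    "Design": (-0.30, 0.55),
    "Statistics": (0.10, 0.90),
}

uniqueness, proportion = summarize_loadings(example_loadings)
for name, value in uniqueness.items():
    print(name, round(value, 2))
print([round(x, 2) for x in proportion])
```

In this made-up table, Design has the largest uniqueness because neither factor accounts for much of its variance.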

  15. Exploratory Factor Analysis in R. Weak relationships between factors and original variables are generally discarded from the analysis. There is debate within the literature as to what the cutoff for a strong relationship should be, and stringent cutoffs are often not realistic for environmental data. Common rule of thumb: a relationship is strong if the absolute value of the loading is > 0.4. But this should be evaluated for your data as to what logically makes sense. In the Master Grades example: Factor 1 has a strong relationship with grades in Ecology, Policy, and Geography; Factor 2 has a strong relationship with grades in Statistics and a moderate relationship with Experimental Design. Using rationale and the commonalities between the class subjects, we could infer that Factor 1 is likely related to writing ability and Factor 2 is likely related to mathematical ability.
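
The 0.4 rule of thumb amounts to a filter over the loadings table, sketched here in Python. The loadings are hypothetical numbers chosen only to resemble the pattern described for the grades example; they are not the actual output, and `strong_loadings` is a made-up helper name.

```python
def strong_loadings(loadings, cutoff=0.4):
    """Map each factor to the variables whose absolute loading exceeds `cutoff`."""
    n_factors = len(next(iter(loadings.values())))
    result = {f"Factor {k + 1}": [] for k in range(n_factors)}
    for variable, row in loadings.items():
        for k, value in enumerate(row):
            if abs(value) > cutoff:
                result[f"Factor {k + 1}"].append(variable)
    return result


# Hypothetical loadings, for illustration only.
example_loadings = {
    "Ecology": (0.95, -0.10),
    "Policy": (0.90, -0.20),
    "Geography": (0.97, -0.05),
    "Design": (-0.30, 0.55),
    "Statistics": (0.10, 0.90),
}

groups = strong_loadings(example_loadings)
for factor, variables in groups.items():
    print(factor, variables)
```

Changing `cutoff` shows how sensitive the interpretation is to the chosen threshold, which is why the slide recommends judging it against what makes sense for the data.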

  16. PCA vs EFA output. Renewable Resources MSc example. [Figures: PCA output and EFA output] Rotations were in multi-dimensional space (i.e. 5 variables were included).
