From simple structure to sparse components: a comparative introduction


  1. From simple structure to sparse components: a comparative introduction
     Nickolay T. Trendafilov
     Department of Mathematics and Statistics, The Open University, UK
     ERCIM’12, Session ES11: Sparse dimension reduction, 1-3 December 2012, Conference Centre, Oviedo, Spain

  2. Contents
     - Intro/Motivation
     - Classic vs. sparse PCA
       - Simple structure rotation/concept in PCA (and FA)
       - PCA interpretation via rotation methods
       - Example: the Pitprop data
     - Analyzing high-dimensional multivariate data
       - Abandoning the rotation methods
       - Algorithms for sparse component analysis
     - Taxonomy of PCA subject to ℓ1 constraint (LASSO)
       - Function-constrained sparse components
       - Orthonormal sparse loadings and correlated components
       - Uncorrelated sparse components
     - Application to simple structure rotation
       - Thurstone’s 26 box problem
       - Twenty-four psychological tests

  3. Intro/Motivation: Why sparse PCA?
     Main goal: analyzing high-dimensional multivariate data.
     Main tools: low-dimensional data representation (e.g. PCA), and its interpretation.
     Main problems:
     - PCA might be too slow;
     - the results involve all input variables, which complicates the interpretation.

  4. Classic vs. sparse PCA: simple structure rotation in PCA (and FA)
     Steps:
     - low-dimensional data approximation, ...
     - followed by rotation of the PC loadings.
     The rotation is found by optimizing a criterion which defines/formalizes the perception of simple (interpretable) structure; a minimal varimax sketch follows below.
     Drawbacks of the rotated components:
     - loadings that are still difficult to interpret;
     - correlated components, which also do not explain a decreasing amount of variance.
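To make the rotation step concrete, here is a minimal numpy sketch of the standard SVD-based varimax algorithm (not code from the talk; the function name and defaults are illustrative):

import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonally rotate a p x k loading matrix L to maximize the
    varimax criterion (gamma = 1.0); returns rotated loadings and R."""
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        B = L @ R
        # gradient of the orthomax-family criterion at the current rotation
        G = L.T @ (B ** 3 - (gamma / p) * B @ np.diag((B ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                      # projection back onto the orthogonal group
        crit = s.sum()
        if crit - crit_old < tol:
            break
        crit_old = crit
    return L @ R, R

Applied to a matrix of PC loadings, this produces the kind of rotated pattern shown for the Pitprop data later in the talk.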

  5. Thurstone’s simple structure concept... (Thurstone, 1947, p. 335)
     1. Each row of the factor matrix should have at least one zero.
     2. If there are r common factors, each column of the factor matrix should have at least r zeros.
     3. For every pair of columns of the factor matrix, there should be several variables whose entries vanish in one column but not in the other.
     4. For every pair of columns of the factor matrix, a large proportion of the variables should have vanishing entries in both columns when there are four or more factors.
     5. For every pair of columns of the factor matrix, there should be only a small number of variables with non-vanishing entries in both columns.
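The first two rules reduce to counts of exact zeros, so they are easy to check mechanically. A hypothetical checker (not part of the talk), assuming the loadings sit in a numpy array:

import numpy as np

def check_thurstone_counts(L, eps=1e-8):
    """Check Thurstone's rules 1 and 2 on a p x k loading matrix L:
    every row needs a zero, every column needs at least k zeros."""
    Z = np.abs(L) < eps                 # entries treated as "vanishing"
    k = L.shape[1]
    return {
        "rule1_every_row_has_zero": bool((Z.sum(axis=1) >= 1).all()),
        "rule2_every_col_has_k_zeros": bool((Z.sum(axis=0) >= k).all()),
    }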

  6. ... and implementing it
     Rotation:
     1. Graphical (subjective?)
     2. Analytical (too many criteria!)
     3. Hyperplane counting: maxplane, functionplane, hyball, recently revived as CLF/CLC
     4. Hyperplane fitting rotations: promax, promaj, promin
     5. Rotation to independent components: ICA as a rotation method (applicable for p ≫ n)
     Main problems:
     1. Formalizing Thurstone’s rules into a single formula
     2. Achieving vanishing entries, i.e. exact zeros
     3. Correlated components
     4. Rotated components do not explain a decreasing amount of variance
     5. Impractical for modern applications when p ≫ n

  7. PCA interpretation via rotation methods: the interpretation issue
     Traditionally, PCs are considered easily interpretable if there are plenty of small component loadings indicating the negligible importance of the corresponding variables.
     Jolliffe, 2002, p. 269: “The most common way of doing this is to ignore (effectively set to zero) coefficients whose absolute values fall below some threshold.”
     Thus, implicitly, the simplicity and interpretability of the PCs are associated with the sparseness of the component loadings. (A one-line thresholding sketch follows.)
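A minimal sketch of the thresholding practice Jolliffe describes (the function name and the example cutoff are illustrative, not prescribed by the talk):

import numpy as np

def hard_threshold(loadings, cutoff=0.3):
    """Zero out loadings whose absolute value falls below the cutoff."""
    L = np.asarray(loadings, dtype=float)
    return np.where(np.abs(L) < cutoff, 0.0, L)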

  8. The interpretation issue (continued)
     However, ignoring the small loadings is subjective and misleading, especially for PCs of a covariance matrix (Cadima & Jolliffe, 1995).
     Cadima & Jolliffe, 1995: “One of the reasons for this is that it is not just loadings but also the size (standard deviation) of each variable which determines the importance of that variable in the linear combination. Therefore it may be desirable to put more emphasis on simplicity than on variance maximization.”
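A small synthetic illustration (not from the talk) of the scale effect Cadima & Jolliffe describe: in covariance-matrix PCA, a variable with a much larger standard deviation dominates the leading loadings, regardless of any structure in the data.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
X[:, 0] *= 10.0                           # inflate the scale of variable 1 only

cov = np.cov(X, rowvar=False)             # PCA on the covariance matrix
_, eigvecs = np.linalg.eigh(cov)
print(np.round(eigvecs[:, -1], 2))        # leading loadings: close to [±1, 0, 0]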

  9. Example: the Pitprop data
     The Pitprop data consist of 14 variables which were measured for each of 180 pitprops cut from Corsican pine timber. One variable is compressive strength; the other 13 variables are physical measurements on the pitprops (Jeffers, 1967).

     Table: Jeffers’s Pitprop data: loadings of the first six PCs, and their interpretation obtained by normalizing each column and then keeping only loadings greater than .7 (Jeffers, 1967). A sketch of this normalize-then-threshold step follows the table.

               Component loadings (AD)            Jeffers’s interpretation
     Vars       1    2    3    4    5    6       1     2     3     4     5     6
     topdiam  .83  .34 -.28 -.10  .08  .11     1.0
     length   .83  .29 -.32 -.11  .11  .15     1.0
     moist    .26  .83  .19  .08 -.33 -.25           1.0
     testsg   .36  .70  .48  .06 -.34 -.05           .84   .73
     ovensg   .12 -.26  .66  .05 -.17  .56                 1.0               1.0
     ringtop  .58 -.02  .65 -.07  .30  .05     .70         .99
     ringbut  .82 -.29  .35 -.07  .21  .00     .99
     bowmax   .60 -.29 -.33  .30 -.18 -.05     .72
     bowdist  .73  .03 -.28  .10  .10  .03     .88
     whorls   .78 -.38 -.16 -.22 -.15 -.16     .93
     clear   -.02  .32 -.10  .85  .33  .16                       1.0
     knots   -.24  .53  .13 -.32  .57 -.15                             1.0
     diaknot -.23  .48 -.45 -.32 -.08  .57                                   1.0
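A sketch of the interpretation recipe used above (normalize each loading column by its largest absolute entry, then keep only values above the cutoff); the function name is illustrative:

import numpy as np

def normalize_and_threshold(A, cutoff=0.7):
    """Column-normalize a p x k loading matrix by its max |loading|,
    then blank out (NaN) normalized values not exceeding the cutoff."""
    A = np.asarray(A, dtype=float)
    A_norm = A / np.abs(A).max(axis=0)
    return np.where(np.abs(A_norm) > cutoff, np.round(A_norm, 2), np.nan)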

  10. Example: the Pitprop data (continued)

     Table: Jeffers’s Pitprop data: loadings rotated by varimax, and their interpretation obtained by normalizing each column and then keeping only loadings greater than .59.

               VARIMAX loadings                   Normalized loadings > .59
     Vars       1    2    3    4    5    6       1     2     3     4     5     6
     topdiam  .91  .26 -.01  .03  .01  .08     .97
     length   .94  .19 -.00  .03  .00  .10     1.0
     moist    .13  .96 -.14  .08  .08  .04           1.0
     testsg   .13  .95  .24  .03  .06 -.03           .98
     ovensg  -.14  .03  .90 -.03 -.18 -.03                 1.0
     ringtop  .36  .19  .61 -.03  .28 -.49                 .68
     ringbut  .62 -.02  .47 -.13 -.01 -.55     .66
     bowmax   .54 -.10 -.10  .11 -.56 -.23                            -.64
     bowdist  .77  .03 -.03  .12 -.16 -.12     .82
     whorls   .68 -.10  .02 -.40 -.35 -.34     .73
     clear    .03  .08 -.04  .97  .00 -.00                       1.0
     knots   -.06  .14 -.14  .04  .87  .09                             1.0
     diaknot  .10  .04 -.07 -.01  .15  .93                                   1.0

  11. Abandoning the rotation methods
     Alternative to rotation: modify PCA to produce explicitly simple principal components.
     - The first method to directly construct sparse components was proposed by Hausman (1982): it finds PC loadings from a prescribed set of values, say S = {−1, 0, 1}.
     - Jolliffe & Uddin (2000) were the first to modify the original PCs to additionally satisfy the varimax criterion (simplified component technique, SCoT).
     - Jolliffe, Trendafilov & Uddin (2003) were the first to modify the original PCs to additionally satisfy the LASSO constraint, which drives many loadings to exact zeros (SCoTLASS); a heuristic sketch in this spirit follows.
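Not the SCoTLASS algorithm itself, but a minimal soft-thresholded power-iteration heuristic in the same spirit: an ℓ1-style shrinkage inside each iteration drives small loadings to exact zeros (names and defaults are illustrative):

import numpy as np

def sparse_leading_loadings(X, lam=0.1, n_iter=200):
    """One sparse loading vector via soft-thresholded power iteration.
    lam = 0 recovers the ordinary leading PC loadings."""
    S = np.cov(X, rowvar=False)               # sample covariance matrix
    v = np.linalg.eigh(S)[1][:, -1]           # warm start: leading eigenvector
    for _ in range(n_iter):
        w = S @ v
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # soft threshold
        nrm = np.linalg.norm(w)
        if nrm == 0.0:                        # lam too large: everything shrunk away
            break
        v = w / nrm
    return v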

  12. Algorithms for sparse component analysis
     A great number of efficient numerical procedures exist:
     - Zou, Hastie & Tibshirani (2006) transform standard PCA into a regression form to obtain a fast algorithm (SPCA) for sparse PCA, applicable to large data.
     - Moghaddam, Weiss & Avidan (2006) use spectral bounds of submatrices of the sample correlation matrix to identify the subset of m variables explaining the maximum variance among all possible subsets of size m.
     - d’Aspremont, El Ghaoui, Jordan & Lanckriet (2007) replace the LASSO constraint by a cardinality constraint and apply semidefinite programming, SDP (sound theory, but not very fast!).
     - d’Aspremont, Bach & El Ghaoui (2008) use another SDP relaxation to construct a greedy algorithm more efficient than those of d’Aspremont et al. (2007) and Moghaddam et al. (2006).
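For a quick experiment, scikit-learn ships a SparsePCA estimator that solves an ℓ1-penalized matrix-factorization problem related in spirit (though not identical) to Zou et al.'s SPCA; the random data below is only a stand-in:

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.standard_normal((180, 13))          # stand-in, sized like the Pitprop data

spca = SparsePCA(n_components=6, alpha=1.0, random_state=0)  # alpha tunes sparsity
scores = spca.fit_transform(X)
loadings = spca.components_                 # rows are sparse loading vectors
print((loadings == 0).mean())               # fraction of exact-zero loadings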
