Principal Component Analysis

To minimize redundancy in the new basis, the sampled data should be uncorrelated in the new basis.

Definition: n random samples y_1, y_2, ... are uncorrelated if their sample covariance matrix is diagonal:

    S_Y = \frac{1}{n-1} (Y - \bar{Y}\mathbf{1})(Y - \bar{Y}\mathbf{1})^T = \begin{pmatrix} a_1 & & 0 \\ & \ddots & \\ 0 & & a_n \end{pmatrix}

S_Y is always a square, symmetric matrix. Its diagonal elements are the individual variances of y_1, y_2, ..., and its off-diagonal elements are their covariances, so S_Y quantifies the correlation between all possible pairings of {y_1, ..., y_n}.
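As a concrete illustration (a minimal sketch, assuming NumPy and some synthetic data, neither of which comes from the slides), the sample covariance in the definition above can be computed directly:

```python
import numpy as np

# Illustrative sketch: each row of Y holds one variable, each column one
# observation, following the slides' convention.
rng = np.random.default_rng(0)
Y = rng.normal(size=(3, 1000))                       # 3 variables, 1000 samples

Y_centered = Y - Y.mean(axis=1, keepdims=True)
S_Y = Y_centered @ Y_centered.T / (Y.shape[1] - 1)   # (1/(n-1)) (Y - Ybar)(Y - Ybar)^T

# Diagonal entries are per-variable variances; off-diagonal entries are
# covariances. The data are "uncorrelated" when the off-diagonals are ~0.
print(np.round(S_Y, 3))
```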
So to remove redundancy, we must find new basis vectors (principal components) such that the covariance matrix of the transformed data is diagonal. PCA also assumes that the basis vectors are orthogonal, to simplify the computation of the new basis.

Definition: Two vectors x, y are said to be orthogonal if their dot product is zero:

    x \cdot y = \sum_{i=1}^{n} x_i y_i = 0
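A quick numeric check of the definition (NumPy assumed; the vectors are arbitrary examples):

```python
import numpy as np

# Two vectors whose dot product is zero are orthogonal.
x = np.array([1.0, 2.0, -1.0])
y = np.array([2.0, 0.0, 2.0])
print(np.dot(x, y))   # 0.0  ->  x and y are orthogonal
```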
Summary of assumptions:
- Linearity of the transformation.
- The sample mean and sample variance are sufficient statistics for the underlying separation problem.
- Large variances in X correspond to important dynamics in the underlying system.
- The principal components are orthogonal.

Definition: A function T(x) is said to be a sufficient statistic for the random variable x if the conditional probability distribution of x, given T(x), is not a function of any unknown distribution parameters:

    P(X = x \mid T(x), \theta \in \Omega) = P(X = x \mid T(x))
Solving for the PCs: WLOG, assume X has been normalized to have zero mean. Seek an orthonormal matrix P (where Y = PX) such that S_Y = \frac{1}{n-1} YY^T is diagonalized. The rows of P will be the principal components of X. So:

    S_Y = \frac{1}{n-1} YY^T = P \left( \frac{1}{n-1} XX^T \right) P^T

where \frac{1}{n-1} XX^T is symmetric!
Any real, symmetric matrix is diagonalized by an orthonormal matrix of its eigenvectors. Therefore, normalizing the data matrix X and computing the eigenvectors of \frac{1}{n-1} XX^T = S_X will give the principal components!

Best approach: the singular value decomposition.
Definition: The singular value decomposition of a real m × n matrix X is given by:

    X = U \Sigma V^T

where U is an m × m matrix containing the eigenvectors of XX^T, V is an n × n matrix containing the eigenvectors of X^T X, and \Sigma is an m × n matrix with the square roots of the eigenvalues of XX^T along its main diagonal. The singular values \sigma (the elements of \Sigma) are ordered from greatest to least, and each corresponds to a basis vector in U and V.

Dimension reduction: choose a minimum acceptable value for the \sigma's; consider as the principal components only the vectors corresponding to \sigma's larger than the chosen threshold.
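The relationships in this definition are easy to verify numerically; a minimal sketch, assuming NumPy and an arbitrary random matrix:

```python
import numpy as np

# Verify X = U Sigma V^T and that the squared singular values are the
# eigenvalues of X X^T.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 6))                  # real m x n matrix, m=4, n=6

U, s, Vt = np.linalg.svd(X)                  # s holds the singular values, descending
Sigma = np.zeros(X.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(X, U @ Sigma @ Vt))                          # True
print(np.allclose(s**2, np.linalg.eigvalsh(X @ X.T)[::-1]))    # True
```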
The SVD is a very important matrix factorization with a wide variety of applications. For PCA, note that with

    Z = \frac{1}{\sqrt{n-1}} X^T

we have

    Z^T Z = \left( \frac{1}{\sqrt{n-1}} X^T \right)^T \left( \frac{1}{\sqrt{n-1}} X^T \right) = \frac{1}{n-1} XX^T = S_X

So the matrix V given by the SVD of Z will give the eigenvectors of S_X, which are the principal components! Therefore P = V^T. Once P is found, the data can be transformed: Y = PX.
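Putting the pieces together, a sketch of PCA via the SVD of Z (NumPy assumed; the function name and data layout, one measurement type per row of X, are illustrative choices rather than anything specified in the slides):

```python
import numpy as np

def pca(X):
    """X: d x n data matrix, one measurement type per row, n samples each."""
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)       # normalize each row to zero mean
    Z = Xc.T / np.sqrt(n - 1)                    # Z^T Z = S_X
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt                                       # rows of P are the principal components
    Y = P @ Xc                                   # transformed (decorrelated) data
    return P, s, Y

# The singular values s are the square roots of the variances along each PC,
# so small trailing values flag directions that can be dropped.
```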
2D Example

Let x_1 = [x_{1,1}, ..., x_{1,1000}] and x_2 = [x_{2,1}, ..., x_{2,1000}] be random variables such that x_{1,i} ~ P_1 i.i.d. and x_{2,j} ~ P_2 i.i.d. for all i, j, with the two distributions P_1, P_2 unknown. So x_1, x_2 are two different measurement types (sensors, etc.), each containing 1000 measurements.
We can plot x_1 vs x_2 to show that they are strongly correlated.
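The slides' actual data are not given, so the following sketch fabricates two correlated measurement types in the same spirit (NumPy assumed; the hidden source and mixing coefficients are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(-1, 1, 1000)                  # hidden common source
x1 = 2.0 * s + 0.1 * rng.normal(size=1000)    # sensor 1
x2 = 1.5 * s + 0.1 * rng.normal(size=1000)    # sensor 2
X = np.vstack([x1, x2])

print(np.corrcoef(X))   # off-diagonal entries near 1 -> strongly correlated
```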
The SVD of X = [x_1, x_2]^T is computed to be:

    U = \begin{pmatrix} 3.77 \times 10^{-2} & -3.61 \times 10^{-2} & \cdots \\ \vdots & \ddots & \vdots \\ -4.57 \times 10^{-2} & \cdots & 0.97 \end{pmatrix}

    \Sigma = \begin{pmatrix} 142.85 & 0 \\ 0 & 43.61 \end{pmatrix}

    V^T = \begin{pmatrix} 0.63 & 0.77 \\ -0.77 & 0.63 \end{pmatrix}
PCA provides a transformation into a new basis in which the data becomes uncorrelated.
3D Example

Let us introduce a new component, so that the data is three-dimensional:

    x_3 = x_1 - x_2

x_3 provides no new information about the underlying system! Thanks to the SVD, PCA provides a mechanism for detecting this and removing the redundant dimension.
The SVD of X = [x_1, x_2, x_3]^T is computed to be:

    U = \begin{pmatrix} 3.77 \times 10^{-2} & -3.61 \times 10^{-2} & \cdots \\ \vdots & \ddots & \vdots \\ -4.57 \times 10^{-2} & \cdots & 0.97 \end{pmatrix}

    \Sigma = \begin{pmatrix} 142.97 & 0 & 0 \\ 0 & 73.35 & 0 \\ 0 & 0 & 4.29 \times 10^{-14} \end{pmatrix}

    V^T = \begin{pmatrix} 0.61 & 0.77 & -0.16 \\ 0.54 & -0.25 & 0.80 \\ -0.577 & 0.577 & 0.577 \end{pmatrix}
Since the singular value corresponding to the third PC is small, the contribution of that axis in the new basis is minimal ⇒ projection onto the first two PCs is sufficient to characterize the data!
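Continuing the hypothetical data from the 2D sketch above (not the slides' data), adding x_3 = x_1 − x_2 and inspecting the singular values exposes the redundant dimension (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(-1, 1, 1000)
x1 = 2.0 * s + 0.1 * rng.normal(size=1000)
x2 = 1.5 * s + 0.1 * rng.normal(size=1000)
x3 = x1 - x2                                   # adds no new information
X = np.vstack([x1, x2, x3])

Xc = X - X.mean(axis=1, keepdims=True)
Z = Xc.T / np.sqrt(X.shape[1] - 1)
U, sig, Vt = np.linalg.svd(Z, full_matrices=False)

print(sig)             # third singular value is ~0 (up to round-off)
Y = Vt[:2] @ Xc        # project onto the first two PCs only
```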
2-Source Audio Example

(Audio playback omitted.)
In the previous two examples, PCA was not successful in completely separating the mixed signals. What is needed: a transformation driven by a stronger measure of independence.
Independent Component Analysis

ICA, like PCA, aims to compute a 'more meaningful' basis in which to represent given data. 'More meaningful': it should reduce noise and redundancy in the data. Goal: to separate sources, filter data, and reveal 'hidden' dynamics.

ICA also begins by assuming that the transformation to the new basis is linear:

    WX = Y \;\Rightarrow\; y_i = \begin{pmatrix} w_1 x_i \\ \vdots \\ w_m x_i \end{pmatrix}

where x_i, y_i represent columns of the source and transformed data matrices X, Y and each w_j represents a row of the transform matrix W. So the rows of W form a new basis for the columns of X; they are the Independent Components of the given data.
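A minimal sketch of this linear model (NumPy assumed; the square-wave and Laplace sources and the mixing matrix A are purely illustrative inventions). In practice only X is observed and W must be estimated:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 1000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))       # hypothetical source 1 (square wave)
s2 = rng.laplace(size=1000)                   # hypothetical source 2 (non-gaussian noise)
S = np.vstack([s1, s2])

A = np.array([[1.0, 0.6],                     # unknown mixing matrix
              [0.4, 1.0]])
X = A @ S                                     # observed mixtures

W = np.linalg.inv(A)                          # the ideal unmixing matrix (unknown in practice)
Y = W @ X                                     # rows of Y recover the sources
print(np.allclose(Y, S))                      # True
```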
However, unlike PCA:
- The vectors of the new basis are not assumed to be orthogonal.
- Directions of highest variance are not assumed to be strongly characteristic of the underlying dynamics of the system.
- Measures based on higher order statistics (order > 2) are assumed to be necessary to separate the sources in a problem.
- There is no standard measure of independence or computational algorithm to perform ICA.
- Algorithms are iterative and tend to be much more computationally expensive than the SVD.
- In general, well-posedness is not guaranteed.
Also:
- There is no framework for reducing the dimensionality of data within ICA (one must perform PCA first!).
- Computationally efficient estimators used to approximate higher order statistics are typically biased.
- The variances of the original sources cannot be recovered.
- The signs of the original sources cannot be recovered.
- Any ordering of the sources which existed prior to mixing cannot be recovered.
Seek W, Y such that Y = W^{-1} X and each row of Y maximizes some high-order measure of independence. Typical perspectives:
- Maximum likelihood
- Direct high-order moments
- Maximization of mutual information
- Maximization of negative information entropy

The optimization for any choice of the above measures is motivated by the Central Limit Theorem.
Central Limit Theorem (Lyapunov): Let X_n, n \in \mathbb{N}, be any sequence of independent random variables, each with finite mean \mu_n and variance \sigma_n^2. Define S_N^2 = \sum_{i=1}^{N} \sigma_i^2. If for some \delta > 0 the expectations E[|X_k|^{2+\delta}] are finite for every k \in \mathbb{N} and the condition

    \lim_{N \to \infty} \frac{1}{S_N^{2+\delta}} \sum_{i=1}^{N} E\left[ |X_i - \mu_i|^{2+\delta} \right] = 0

is satisfied, then

    \frac{\sum_{i=1}^{N} (X_i - \mu_i)}{S_N} \xrightarrow{\text{distr.}} \text{Normal}(0, 1) \quad \text{as } N \to \infty
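An empirical illustration of the theorem (a sketch assuming NumPy; the uniform distribution and sample sizes are arbitrary choices): standardized sums of independent, decidedly non-gaussian variables have nearly gaussian skewness and kurtosis.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50
samples = rng.uniform(-1, 1, size=(N, 100_000))        # uniform: non-gaussian

# Var of uniform(-1,1) is 1/3, so S_N = sqrt(N/3) standardizes the column sums.
sums = samples.sum(axis=0) / np.sqrt(N / 3.0)

z = (sums - sums.mean()) / sums.std()
print("skewness        ~", np.mean(z**3))              # close to 0
print("excess kurtosis ~", np.mean(z**4) - 3)          # close to 0
```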
Heuristic argument: the sum of any group of independent random variables is 'more gaussian' than any of the individual random variables. Assume that none of the original sources has a gaussian distribution: then minimizing gaussianity with respect to higher order statistical measures should separate the sources in X!
Definition: The kurtosis of a random variable x is defined to be:

    \kappa(x) = E[x^4] - 3 \left( E[x^2] \right)^2

Kurtosis is a measure of 'peakedness' and thickness of tails for a distribution. Note that if x is gaussian:

    \kappa(x) = 3 \left( E[x^2] \right)^2 - 3 \left( E[x^2] \right)^2 = 0

So, simultaneously maximizing |\kappa(Y_1)|, ..., |\kappa(Y_m)| or (\kappa(Y_1))^2, ..., (\kappa(Y_m))^2 can provide a basis where the recovered sources are (in one sense) maximally non-gaussian.
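A sample version of this definition is straightforward; a sketch assuming NumPy, with the distributions chosen only to show the sign of \kappa:

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis kappa = E[x^4] - 3 (E[x^2])^2 of the centered variable."""
    x = x - x.mean()
    return np.mean(x**4) - 3.0 * np.mean(x**2) ** 2

rng = np.random.default_rng(5)
print(kurtosis(rng.normal(size=200_000)))       # ~0 for gaussian data
print(kurtosis(rng.laplace(size=200_000)))      # > 0 (heavy tails)
print(kurtosis(rng.uniform(-1, 1, 200_000)))    # < 0 (light tails)
```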
Drawbacks of using kurtosis as an optimality criterion:
- Very sensitive to outliers.
- Not a robust measure of gaussianity.

A more suitable measure of gaussianity is required to produce stable ICA methods.
Definition: The differential entropy of a continuous random variable X with density function f_X(x) is defined to be:

    H(X) = -\int f_X(x) \log f_X(x) \, dx

It can be interpreted as the degree of information carried by a random variable. Fundamental result in information theory: a gaussian random variable has the greatest entropy among all random variables of equal variance.
Consider the following:

Definition: The negative entropy (or negentropy) of a continuous random variable X with density function f_X(x) is defined to be:

    J(X) = H(X_{gauss}) - H(X)

where X_{gauss} is a gaussian random variable with variance identical to that of X (or identical covariance matrix).

Advantages:
- Always non-negative; equal to 0 for a gaussian random variable.
- Not sensitive to sample outliers.
Difficulties: negentropy optimization is computationally difficult to deal with directly. Estimates:

    J(X) \approx \frac{1}{12} E[y^3]^2 + \frac{1}{48} \kappa(y)^2

which has the same problems as in the case of just using kurtosis, and

    J(X) \approx \sum_{i=1}^{n} k_i \left( E[G_i(y)] - E[G_i(v)] \right)^2

where {k_i} are positive constants, v is a standard gaussian random variable, and {G_i} are some non-quadratic functions.
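The second estimate can be evaluated directly from samples; a sketch assuming NumPy, using a single G(u) = log cosh(u) and k = 1 (both choices are illustrative assumptions, anticipating the G functions discussed next):

```python
import numpy as np

def negentropy_estimate(y, n_gauss=1_000_000, seed=0):
    """J(y) ~ (E[G(y)] - E[G(v)])^2 with G(u) = log cosh(u), v standard gaussian."""
    rng = np.random.default_rng(seed)
    y = (y - y.mean()) / y.std()                 # compare at zero mean, unit variance
    v = rng.normal(size=n_gauss)                 # gaussian reference sample
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(v))) ** 2

rng = np.random.default_rng(6)
print(negentropy_estimate(rng.normal(size=100_000)))    # ~0 for gaussian data
print(negentropy_estimate(rng.laplace(size=100_000)))   # > 0 for non-gaussian data
```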
Typically, all of the G_i are the same function. Very good results have been demonstrated using:

    G(u) = \frac{1}{\alpha_1} \log \cosh(\alpha_1 u), \quad \text{for some constant } 1 \le \alpha_1 \le 2

    G(u) = -\exp(-u^2/2)
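One common way to turn the log-cosh contrast into an algorithm is a FastICA-style fixed-point iteration; the slides do not specify an algorithm, so the following one-unit sketch (NumPy assumed, data already centered and whitened, e.g. via the PCA machinery earlier) is only illustrative:

```python
import numpy as np

def fastica_one_unit(Z, alpha=1.0, n_iter=200, seed=0):
    """Z: d x n whitened data. Returns one unit-norm unmixing vector w."""
    rng = np.random.default_rng(seed)
    d, n = Z.shape
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    g = lambda u: np.tanh(alpha * u)                       # G'(u) for G = (1/a) log cosh(a u)
    g_prime = lambda u: alpha * (1 - np.tanh(alpha * u) ** 2)
    for _ in range(n_iter):
        wz = w @ Z
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then renormalize.
        w_new = (Z * g(wz)).mean(axis=1) - g_prime(wz).mean() * w
        w_new /= np.linalg.norm(w_new)
        converged = np.abs(np.abs(w_new @ w) - 1) < 1e-8   # converged up to sign
        w = w_new
        if converged:
            break
    return w
```

Further components can be obtained by repeating the iteration while deflating (orthogonalizing) against the vectors already found, though that bookkeeping is omitted here.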