National Aeronautics and Space Administration
Introduction to Principal Component Analysis and Independent Component Analysis
Tristan A. Hearn
Bioscience and Technology Branch, NASA Glenn Research Center
May 29, 2010


  1. Principal Component Analysis. To minimize redundancy in the new basis, the sampled data should be uncorrelated in the new basis.
     Definition: n random samples y_1, y_2, ... are uncorrelated if their sample covariance matrix is diagonal:
        S_Y = \frac{1}{n-1} (Y - \bar{Y}\mathbf{1}) (Y - \bar{Y}\mathbf{1})^T = \mathrm{diag}(a_1, \dots, a_n)
     S_Y is always a square, symmetric matrix. Its diagonal elements are the individual variances of y_1, y_2, ..., and its off-diagonal elements are their covariances, so S_Y quantifies the correlation between all possible pairings of {y_1, ..., y_n}.
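
As a concrete illustration of this definition, here is a minimal numpy sketch (mine, not from the slides) that builds the sample covariance matrix for two made-up measurement rows and checks it against numpy's built-in estimator; all variable names and numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    # Two illustrative rows of Y (one row per variable, one column per sample);
    # y2 is deliberately correlated with y1.
    y1 = rng.normal(size=n)
    y2 = 0.9 * y1 + 0.1 * rng.normal(size=n)
    Y = np.vstack([y1, y2])

    # Sample covariance matrix S_Y = (1/(n-1)) (Y - Ybar)(Y - Ybar)^T
    Yc = Y - Y.mean(axis=1, keepdims=True)
    S_Y = (Yc @ Yc.T) / (n - 1)

    print(S_Y)                          # large off-diagonal entries: y1, y2 are correlated
    print(np.allclose(S_Y, np.cov(Y)))  # True: matches numpy's built-in estimator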

  2. Principal Component Analysis. So, to remove redundancy, we must find new basis vectors (the Principal Components) such that the covariance matrix of the transformed data is diagonal. PCA also assumes that the basis vectors are orthogonal, to simplify the computation of the new basis.
     Definition: Two vectors x, y are said to be orthogonal if their dot product is zero:
        x \cdot y = \sum_{i=1}^{n} x_i y_i = 0

  3. Principal Component Analysis. Summary of assumptions:
     - Linearity of the transformation.
     - The sample mean and sample variance are sufficient statistics for the underlying separation problem.
     - Large variances in X correspond to important dynamics in the underlying system.
     - The principal components are orthogonal.
     Definition: A function T(x) is said to be a sufficient statistic for the random variable x if the conditional probability distribution of x, given T(x), is not a function of any unknown distribution parameters:
        P(X = x \mid T(x), \theta \in \Omega) = P(X = x \mid T(x))

  4. Principal Component Analysis. Solving for the PCs: WLOG, assume X is normalized to zero mean. Seek an orthonormal matrix P (where Y = PX) such that S_Y = \frac{1}{n-1} Y Y^T is diagonalized. The rows of P will be the principal components of X. So:
        S_Y = \frac{1}{n-1} Y Y^T = P \left( \frac{1}{n-1} X X^T \right) P^T
     where \frac{1}{n-1} X X^T is symmetric.

  5. Principal Component Analysis. Any real, symmetric matrix is diagonalized by an orthonormal matrix of its eigenvectors. Therefore, normalizing the data matrix X and computing the eigenvectors of \frac{1}{n-1} X X^T = S_X will give the principal components! Best approach: the singular value decomposition.
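
A minimal sketch of this eigenvector route (my own code, not from the slides), assuming the data matrix X holds one variable per row and one sample per column; np.linalg.eigh is used because S_X is symmetric.

    import numpy as np

    def pca_eig(X):
        """Principal components of X (rows = variables, columns = samples)
        from the eigenvectors of the sample covariance matrix S_X."""
        n = X.shape[1]
        Xc = X - X.mean(axis=1, keepdims=True)   # normalize to zero mean
        S_X = (Xc @ Xc.T) / (n - 1)
        eigvals, eigvecs = np.linalg.eigh(S_X)   # eigh: symmetric input, ascending eigenvalues
        order = np.argsort(eigvals)[::-1]        # largest variance first
        P = eigvecs[:, order].T                  # rows of P are the principal components
        return P, eigvals[order]

Transforming the zero-mean data is then Y = P @ Xc, as on the later slides.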

  6. Principal Component Analysis. Definition: The singular value decomposition of a real m × n matrix X is given by
        X = U \Sigma V^T
     where U is an m × m matrix containing the eigenvectors of X X^T, V is an n × n matrix containing the eigenvectors of X^T X, and \Sigma is an m × n matrix with the square roots of the eigenvalues of X X^T along its main diagonal. The singular values \sigma (the elements of \Sigma) are ordered from greatest to least, and each corresponds to a basis vector in U and V.
     Dimension reduction: choose a minimum acceptable value for the \sigma's; consider as the principal components only the vectors corresponding to \sigma's larger than the chosen threshold.
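
A quick numpy check of these relationships on an arbitrary random matrix; this is illustrative only and not specific to the presentation's data.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(4, 6))          # an arbitrary real m x n matrix (m = 4, n = 6)

    U, s, Vt = np.linalg.svd(X)          # numpy returns s sorted from greatest to least
    print(U.shape, s.shape, Vt.shape)    # (4, 4) (4,) (6, 6)

    # The singular values are the square roots of the eigenvalues of X X^T.
    eigvals = np.linalg.eigvalsh(X @ X.T)[::-1]
    print(np.allclose(s, np.sqrt(eigvals)))   # True

    # Reconstruction X = U Sigma V^T, with Sigma an m x n rectangular diagonal matrix.
    Sigma = np.zeros((4, 6))
    Sigma[:, :4] = np.diag(s)
    print(np.allclose(X, U @ Sigma @ Vt))     # True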

  7. Principal Component Analysis. The SVD is a very important matrix factorization with a wide variety of applications. For PCA, note that:
        Z = \sqrt{\tfrac{1}{n-1}} X^T \;\Rightarrow\; Z^T Z = \left( \sqrt{\tfrac{1}{n-1}} X^T \right)^T \left( \sqrt{\tfrac{1}{n-1}} X^T \right) = \frac{1}{n-1} X X^T = S_X
     So the matrix V given by the SVD of Z will give the eigenvectors of S_X, which are the principal components! Therefore P = V^T. Once P is found, the data can be transformed: Y = PX.
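
A minimal sketch of this SVD recipe (function and variable names are my own), again assuming one row per variable and one column per sample in X.

    import numpy as np

    def pca_svd(X):
        """PCA via the SVD of Z = sqrt(1/(n-1)) X^T: the rows of P = V^T
        are the principal components of X, per the derivation above."""
        n = X.shape[1]
        Xc = X - X.mean(axis=1, keepdims=True)        # zero-mean the data first
        Z = Xc.T / np.sqrt(n - 1)                     # so that Z^T Z = S_X
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        P = Vt                                        # principal components as rows
        Y = P @ Xc                                    # data in the new (uncorrelated) basis
        return P, s, Y

The singular values s are then the square roots of the eigenvalues of S_X, i.e. the standard deviations of the data along each principal component.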

  8. Principal Component Analysis: 2D Example. Let x_1 = [x_{1,1}, ..., x_{1,1000}] and x_2 = [x_{2,1}, ..., x_{2,1000}] be random variables such that x_{1,i} ~ P_1 and x_{2,j} ~ P_2 (i.i.d.) for all i, j, with the two distributions P_1, P_2 unknown. So x_1, x_2 are two different measurement types (sensors, etc.), each containing 1000 measurements.
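
The distributions P_1, P_2 are left unspecified on the slides; purely to have something runnable, the sketch below fabricates two correlated measurement channels of 1000 samples each (a hidden common signal plus independent sensor noise). All numbers are made up.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000

    hidden = rng.normal(size=n)                  # hypothetical shared underlying signal
    x1 = 2.0 * hidden + 0.3 * rng.normal(size=n)
    x2 = 1.5 * hidden + 0.3 * rng.normal(size=n)
    X = np.vstack([x1, x2])

    print(np.corrcoef(x1, x2)[0, 1])             # close to 1: the two channels are strongly correlated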

  9. Principal Component Analysis: 2D Example. We can plot x_1 vs. x_2 to show that the data are strongly correlated.

  10. Principal Component Analysis. The SVD of X = [x_1, x_2]^T is computed to be:
        U = \begin{pmatrix} 3.77 \times 10^{-2} & -3.61 \times 10^{-2} & \cdots \\ \vdots & \ddots & \vdots \\ -4.57 \times 10^{-2} & \cdots & 0.97 \end{pmatrix}, \quad
        \Sigma = \begin{pmatrix} 142.85 & 0 \\ 0 & 43.61 \end{pmatrix}, \quad
        V^T = \begin{pmatrix} 0.63 & 0.77 \\ -0.77 & 0.63 \end{pmatrix}

  11. Principal Component Analysis: 2D Example. PCA provides a transformation into a new basis in which the data becomes uncorrelated.
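
Continuing the illustrative data set above (regenerated here so the snippet stands alone), applying P = V^T produces coordinates whose sample covariance matrix is diagonal.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    hidden = rng.normal(size=n)
    x1 = 2.0 * hidden + 0.3 * rng.normal(size=n)
    x2 = 1.5 * hidden + 0.3 * rng.normal(size=n)
    X = np.vstack([x1, x2])

    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc.T / np.sqrt(n - 1), full_matrices=False)
    Y = Vt @ Xc                     # data expressed in the principal-component basis

    print(np.cov(Y))                # off-diagonal entries are numerically zero
    print(s**2)                     # the diagonal: variances along the two PCs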

  12. Principal Component Analysis: 3D Example. Let us introduce a new component, so that the data are 3-dimensional: x_3 = x_1 - x_2. Then x_3 provides no new information about the underlying system! Thanks to the SVD, PCA provides a mechanism for detecting this and removing the redundant dimension.

  13. Principal Component Analysis: 3D Example. [figure]

  14. Principal Component Analysis. The SVD of X = [x_1, x_2, x_3]^T is computed to be:
        U = \begin{pmatrix} 3.77 \times 10^{-2} & -3.61 \times 10^{-2} & \cdots \\ \vdots & \ddots & \vdots \\ -4.57 \times 10^{-2} & \cdots & 0.97 \end{pmatrix}, \quad
        \Sigma = \begin{pmatrix} 142.97 & 0 & 0 \\ 0 & 73.35 & 0 \\ 0 & 0 & 4.29 \times 10^{-14} \end{pmatrix}, \quad
        V^T = \begin{pmatrix} 0.61 & 0.77 & -0.16 \\ 0.54 & -0.25 & 0.80 \\ -0.577 & 0.577 & 0.577 \end{pmatrix}

  15. Principal Component Analysis: 3D Example. Since the singular value corresponding to the third PC is small, the contribution of that axis in the new basis is minimal. Projection onto the first two PCs is therefore sufficient to characterize the data!
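
A sketch of this redundancy detection on the same fabricated channels, with x3 = x1 - x2 appended; the cut-off used to discard the third PC is an arbitrary illustrative choice.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    hidden = rng.normal(size=n)
    x1 = 2.0 * hidden + 0.3 * rng.normal(size=n)
    x2 = 1.5 * hidden + 0.3 * rng.normal(size=n)
    x3 = x1 - x2                              # redundant: carries no new information

    X = np.vstack([x1, x2, x3])
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc.T / np.sqrt(n - 1), full_matrices=False)
    print(s)                                  # the third singular value is ~0 (round-off level)

    keep = s > 1e-8 * s[0]                    # arbitrary threshold for 'negligible' singular values
    Y_reduced = Vt[keep] @ Xc                 # projection onto the first two PCs
    print(Y_reduced.shape)                    # (2, 1000): the redundant dimension is gone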

  16. Principal Component Analysis: 2-Source Audio Example. [audio clips]

  17. Principal Component Analysis: 2-Source Audio Example. [audio clips]

  18. In the previous two examples, PCA was not successful in completely separating the mixed signals. What is needed: a transformation driven by a stronger measure of independence.

  19. Independent Component Analysis. ICA, like PCA, aims to compute a 'more meaningful' basis in which to represent given data. 'More meaningful': the new basis should reduce noise and redundancy in the data. Goal: to separate sources, filter data, and reveal 'hidden' dynamics. ICA also begins by assuming that the transformation to the new basis is linear:
        WX = Y \;\Rightarrow\; y_i = \begin{pmatrix} w_1 x_i \\ \vdots \\ w_m x_i \end{pmatrix}
     where x_i, y_i represent columns of the source and transformed data matrices X, Y and w_j represents a row of the transform matrix W. So the rows of W form a new basis for the columns of X; they are the Independent Components of the given data.

  20. Independent Component Analysis. However, unlike PCA:
     - The vectors of the new basis are not assumed to be orthogonal.
     - Directions of highest variance are not assumed to be strongly characteristic of the underlying dynamics of the system.
     - Measures based on higher-order statistics (> 2) are assumed to be necessary to separate the sources in a problem.
     - There is no standard measure of independence or computational algorithm to perform ICA.
     - Algorithms are iterative and tend to be much more computationally expensive than the SVD.
     - In general, well-posedness is not guaranteed.

  21. Independent Component Analysis. Also:
     - There is no framework for reducing the dimensionality of data within ICA (must perform PCA first!).
     - Computationally efficient estimators used to approximate higher-order statistics are typically biased.
     - The variances of the original sources cannot be recovered.
     - The signs of the original sources cannot be recovered.
     - Any ordering of the sources which existed prior to mixing cannot be recovered.

  22. Independent Component Analysis. Seek W, Y such that Y = W^{-1} X and each row of Y maximizes some high-order measure of independence. Typical perspectives:
     - Maximum likelihood
     - Direct high-order moments
     - Maximization of mutual information
     - Maximization of negative information entropy
     The optimization for any choice of the above measures is motivated by the Central Limit Theorem.

  23. Independent Component Analysis. Central Limit Theorem (Lyapunov): Let X_n, n \in \mathbb{N}, be any sequence of independent random variables, each with finite mean \mu_n and variance \sigma_n^2. Define S_N^2 = \sum_{i=1}^{N} \sigma_i^2. If for some \delta > 0 the expectations E[|X_k|^{2+\delta}] are finite for every k \in \mathbb{N} and the condition
        \lim_{N \to \infty} \frac{1}{S_N^{2+\delta}} \sum_{i=1}^{N} E\left[ |X_i - \mu_i|^{2+\delta} \right] = 0
     is satisfied, then
        \frac{\sum_{i=1}^{N} (X_i - \mu_i)}{S_N} \xrightarrow{\text{distr.}} \mathrm{Normal}(0, 1) \quad \text{as } N \to \infty

  24. Independent Component Analysis. Heuristic argument: the sum of any group of independent random variables is 'more Gaussian' than any of the individual random variables. Assume that none of the original sources has a Gaussian distribution: then minimizing Gaussianity with respect to higher-order statistical measures should separate the sources in X!
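
A quick numerical illustration of this heuristic (mine, not the presenter's): averaging more and more independent uniform variables pushes the kurtosis, defined on the next slide, toward the Gaussian value of 0.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200_000

    def kurtosis(y):
        y = y - y.mean()
        return np.mean(y**4) - 3 * np.mean(y**2)**2

    for k in (1, 2, 5, 20):
        # Sum of k independent uniforms, rescaled to keep the variance fixed.
        s = rng.uniform(-1, 1, size=(k, n)).sum(axis=0) / np.sqrt(k)
        print(k, kurtosis(s))        # magnitude shrinks toward 0 (the Gaussian value) as k grows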

  25. Independent Component Analysis. Definition: The kurtosis of a random variable x is defined to be
        \kappa(x) = E[x^4] - 3 \left( E[x^2] \right)^2
     Kurtosis is a measure of 'peakedness' and thickness of tails for a distribution. Note that if x is Gaussian:
        \kappa(x) = 3 \left( E[x^2] \right)^2 - 3 \left( E[x^2] \right)^2 = 0
     So, simultaneously maximizing |\kappa(Y_1)|, ..., |\kappa(Y_m)| or (\kappa(Y_1))^2, ..., (\kappa(Y_m))^2 can provide a basis where the recovered sources are (in one sense) maximally non-Gaussian.
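
A small numpy check of this definition on unit-variance samples from three illustrative distributions (one Gaussian, one sub-Gaussian, one super-Gaussian).

    import numpy as np

    rng = np.random.default_rng(4)
    n = 500_000

    def kurtosis(x):
        """kappa(x) = E[x^4] - 3 (E[x^2])^2, for zero-mean samples."""
        return np.mean(x**4) - 3 * np.mean(x**2)**2

    gauss   = rng.normal(size=n)                               # kappa ~ 0
    uniform = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)     # flat, thin tails: kappa < 0
    laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)        # peaked, heavy tails: kappa > 0

    for name, x in [("gaussian", gauss), ("uniform", uniform), ("laplace", laplace)]:
        print(name, kurtosis(x))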

  26. Independent Component Analysis. Drawbacks of using kurtosis as an optimality criterion:
     - Very sensitive to outliers.
     - Not a robust measure of Gaussianity.
     A more suitable measure of Gaussianity is required to produce stable ICA methods.
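
A tiny illustration of the outlier sensitivity (made-up numbers): a single corrupted sample swamps the kurtosis estimate for otherwise Gaussian data.

    import numpy as np

    rng = np.random.default_rng(5)

    def kurtosis(x):
        x = x - x.mean()
        return np.mean(x**4) - 3 * np.mean(x**2)**2

    x = rng.normal(size=1000)
    print(kurtosis(x))                    # near 0, as expected for Gaussian data

    x_bad = np.append(x, 10.0)            # one spurious measurement
    print(kurtosis(x_bad))                # the estimate jumps dramatically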

  27. Independent Component Analysis. Definition: The differential entropy of a continuous random variable X with density function f_X(x) is defined to be
        H(X) = - \int f_X(x) \log f_X(x) \, dx
     It can be interpreted as the degree of information carried by a random variable. Fundamental result in information theory: a Gaussian random variable has the greatest entropy among all random variables of equal variance.

  28. Independent Component Analysis. Consider the following. Definition: The negative entropy (or negentropy) of a continuous random variable X with density function f_X(x) is defined to be
        J(X) = H(X_{\mathrm{gauss}}) - H(X)
     where X_{\mathrm{gauss}} is a Gaussian random variable with identical variance to X (or identical covariance matrix). Advantages:
     - Always non-negative; equal to 0 for a Gaussian random variable.
     - Not sensitive to sample outliers.
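
A small worked check (not on the slides): for a uniform variable the negentropy follows in closed form from the Gaussian entropy H = (1/2) ln(2 pi e sigma^2) and the uniform entropy H = ln(b - a).

    import numpy as np

    # Uniform on [0, 1]: variance 1/12, differential entropy ln(1 - 0) = 0.
    var_u = 1 / 12
    H_uniform = 0.0
    H_gauss = 0.5 * np.log(2 * np.pi * np.e * var_u)   # Gaussian of the same variance

    J = H_gauss - H_uniform
    print(J)    # ~0.176 nats: strictly positive, since the Gaussian maximizes entropy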

  29. Independent Component Analysis. Difficulties: negentropy optimization is computationally difficult to deal with directly. Estimates:
     - J(X) \approx \frac{1}{12} E[y^3]^2 + \frac{1}{48} \kappa(y)^2; this has the same problems as in the case of just using kurtosis!
     - J(X) \approx \sum_{i=1}^{n} k_i \left( E[G_i(y)] - E[G_i(v)] \right)^2, where {k_i} are positive constants, v is a standard Gaussian random variable, and {G_i} are some non-quadratic functions.
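
A sketch of the first (moment-based) estimate applied to illustrative zero-mean, unit-variance samples; it returns roughly 0 for Gaussian data and a positive value for a non-Gaussian source.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 500_000

    def negentropy_moments(y):
        """J(y) ~ (1/12) E[y^3]^2 + (1/48) kappa(y)^2 (zero-mean, unit-variance y assumed)."""
        kappa = np.mean(y**4) - 3 * np.mean(y**2)**2
        return np.mean(y**3)**2 / 12 + kappa**2 / 48

    print(negentropy_moments(rng.normal(size=n)))                        # ~0 for Gaussian samples
    print(negentropy_moments(rng.laplace(scale=1/np.sqrt(2), size=n)))   # > 0 for a Laplace source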

  30. Independent Component Analysis. Typically, all of the G_i are the same function. Very good results have been demonstrated using:
     - G(u) = \frac{1}{\alpha_1} \log[\cosh(\alpha_1 u)], for some constant 1 \le \alpha_1 \le 2
     - G(u) = -\exp(-u^2 / 2)
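
A sketch of the G-based estimate using these two functions, with a single term and the constant k_i set to 1 purely for illustration (the slides do not fix the constants); the standard Gaussian v is approximated by samples.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 500_000
    v = rng.normal(size=n)                  # samples standing in for the standard Gaussian v

    def G_logcosh(u, alpha=1.0):
        return np.log(np.cosh(alpha * u)) / alpha

    def G_exp(u):
        return -np.exp(-u**2 / 2)

    def negentropy_G(y, G):
        """One term of J(y) ~ sum_i k_i (E[G_i(y)] - E[G_i(v)])^2, with k = 1."""
        return (np.mean(G(y)) - np.mean(G(v)))**2

    laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)    # a unit-variance non-Gaussian source
    for G in (G_logcosh, G_exp):
        print(G.__name__, negentropy_G(rng.normal(size=n), G), negentropy_G(laplace, G))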
