Introduction to Machine Learning 10701: Independent Component Analysis. Barnabás Póczos & Aarti Singh
Independent Component Analysis
Independent Component Analysis Model. Figure: original signals, observations (mixtures), and ICA-estimated signals.
Independent Component Analysis Model. We observe x = A s, where s contains the independent source signals and A is an unknown mixing matrix. We want W, an estimate of A^{-1}. Goal: recover y = W x ≈ s from the observations alone.
The Cocktail Party Problem, solving with PCA. Figure: sources s(t), mixing x(t) = A s(t), observation, PCA estimation y(t) = W x(t).
The Cocktail Party Problem, solving with ICA. Figure: sources s(t), mixing x(t) = A s(t), observation, ICA estimation y(t) = W x(t).
ICA vs PCA, Similarities
• Both perform linear transformations
• Both are matrix factorizations
PCA: low-rank matrix factorization for compression, X ≈ U S with M < N components; the columns of U are the PCA vectors.
ICA: full-rank matrix factorization to remove dependency among the rows, X = A S; the columns of A are the ICA vectors.
ICA vs PCA, Differences
PCA: X = U S with U^T U = I. ICA: X = A S with A invertible.
• PCA does compression (M < N); ICA does not do compression (same number of features, M = N).
• PCA just removes correlations, not higher-order dependence; ICA removes correlations and higher-order dependence.
• PCA: some components are more important than others (based on eigenvalues); ICA: the components are equally important.
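To make the contrast concrete, here is a minimal sketch (not from the slides) using scikit-learn's PCA and FastICA on a toy two-source mixture; the signals and the mixing matrix below are made up purely for illustration.

```python
# Minimal sketch (not from the slides): contrasting PCA and ICA on a toy
# cocktail-party mixture, assuming NumPy and scikit-learn are available.
import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                      # sinusoidal source
s2 = np.sign(np.sin(3 * t))             # square-wave source
S = np.c_[s1, s2]                       # sources, shape (n_samples, 2)

A = np.array([[1.0, 0.5],
              [0.7, 1.0]])              # "unknown" mixing matrix
X = S @ A.T                             # observed mixtures x = A s

ica_est = FastICA(n_components=2, random_state=0).fit_transform(X)
pca_est = PCA(n_components=2).fit_transform(X)
# ICA recovers the sources up to permutation and scaling; PCA only
# decorrelates, so pca_est still mixes the two sources.
```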
ICA vs PCA, Note
• PCA vectors are orthogonal
• ICA vectors are not orthogonal
ICA vs PCA (figure)
ICA basis vectors extracted from natural images: Gabor wavelets, edge detection, receptive fields of V1 cells, ..., deep neural networks.
PCA basis vectors extracted from natural images.
Some ICA Applications
TEMPORAL:
• Medical signal processing: fMRI, ECG, EEG
• Brain Computer Interfaces
• Modeling of the hippocampus, place cells
• Modeling of the visual cortex
• Time series analysis
• Financial applications
• Blind deconvolution
STATIC:
• Image denoising
• Microarray data processing
• Decomposing the spectra of galaxies
• Face recognition
• Facial expression recognition
• Feature extraction
• Clustering
• Classification
• Deep Neural Networks
ICA Application, Removing Artifacts from EEG
EEG ~ neural cocktail party. EEG activity is severely contaminated by:
• eye movements
• blinks
• muscle activity
• heart (ECG artifact)
• vessel pulse
• electrode noise
• line noise, alternating current (60 Hz)
ICA can improve the signal: it can effectively detect, separate, and remove activity in EEG records coming from a wide variety of artifactual sources (Jung, Makeig, Bell, and Sejnowski). The ICA weights (mixing matrix) help find the location of the sources.
ICA Application, Removing Artifacts from EEG (Fig. from Jung)
Removing Artifacts from EEG (Fig. from Jung)
ICA for Image Denoising (Hoyer, Hyvarinen). Figure panels: original, noisy, Wiener filtered, ICA denoised, median filtered.
ICA for Motion Style Components
A method for the analysis and synthesis of human motion from motion-capture data; it provides perceptually meaningful "style" components.
109 markers (327-dim data): motion capture ⇒ data matrix.
Goal: find motion style components. ICA ⇒ 6 independent components (emotion, content, ...).
(Mori & Hoshino 2002, Shapiro et al. 2006, Cao et al. 2003)
Figure: synthesized motions: walk, sneaky, walk with sneaky, sneaky with walk.
ICA Theory
Statistical (In)dependence
Definition (Independence): the random variables y_1, ..., y_N are independent if p(y_1, ..., y_N) = p(y_1) ... p(y_N).
Definition (Shannon entropy): H(Y) = -E[log p(Y)].
Definition (KL divergence): KL(p || q) = ∫ p(y) log (p(y)/q(y)) dy.
Definition (Mutual Information): I(Y_1, ..., Y_N) = KL( p(y_1, ..., y_N) || p(y_1) ... p(y_N) ) = Σ_i H(Y_i) - H(Y_1, ..., Y_N); it is zero if and only if the variables are independent.
Solving the ICA problem with i.i.d. sources
Solving the ICA problem
Whitening (we assume centered data, E[x] = 0)
Let E[x x^T] = E D E^T be the eigendecomposition of the covariance of the observations. The whitening transform is z = D^{-1/2} E^T x, so that E[z z^T] = I.
Whitening
We have z = D^{-1/2} E^T x = D^{-1/2} E^T A s = B s, where B = D^{-1/2} E^T A. Assuming unit-variance sources, E[z z^T] = B E[s s^T] B^T = B B^T = I, so the new mixing matrix B is orthogonal.
Whitening solves half of the ICA problem
Note: an arbitrary N by N matrix has N^2 free parameters, while an N by N orthogonal matrix has only N(N-1)/2 ⇒ whitening solves (roughly) half of the ICA problem.
Figure: original, mixed, and whitened data.
Solving ICA
ICA task: given x, find y (the estimate of s) and W (the estimate of A^{-1}).
ICA solution: y = W x.
• Remove the mean, E[x] = 0
• Whitening, E[x x^T] = I
• Find an orthogonal W optimizing an objective function, e.g., by a sequence of 2-d Jacobi (Givens) rotations
Figure: original, mixed, whitened, and rotated (demixed) data.
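The centering and whitening steps above take only a few lines of NumPy; this is a minimal sketch (not from the slides), with the final orthogonal rotation left to whichever ICA objective is used.

```python
# Minimal sketch (not from the slides): centering and whitening a data matrix
# X with shape (n_features, n_samples), assuming NumPy is available.
import numpy as np

def center(X):
    """Remove the mean of each row (feature) so that E[x] = 0."""
    return X - X.mean(axis=1, keepdims=True)

def whiten(X):
    """Return Z with E[z z^T] = I, plus the whitening matrix Q."""
    cov = np.cov(X)                       # sample covariance E[x x^T]
    d, E = np.linalg.eigh(cov)            # eigendecomposition cov = E diag(d) E^T
    Q = np.diag(1.0 / np.sqrt(d)) @ E.T   # Q = D^{-1/2} E^T
    return Q @ X, Q

# After whitening, ICA only has to find an orthogonal rotation W_rot:
#   y = W_rot @ Z, with W = W_rot @ Q estimating A^{-1}.
```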
Optimization Using Jacobi Rotation Matrices
A Jacobi (Givens) rotation R(p, q, θ) equals the identity matrix except in rows/columns p and q, where
R_pp = cos θ, R_pq = -sin θ, R_qp = sin θ, R_qq = cos θ.
The orthogonal W is built up as a sequence of such 2-d rotations, optimizing the ICA objective over one angle θ at a time.
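As a small illustration (not from the slides), here is a sketch of constructing one such rotation matrix; the indices p, q and the angle theta are whatever the optimization is currently sweeping over.

```python
# Minimal sketch (not from the slides): an N x N Jacobi/Givens rotation
# matrix that rotates coordinates p and q by angle theta.
import numpy as np

def givens_rotation(n, p, q, theta):
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    R[p, p] = c
    R[q, q] = c
    R[p, q] = -s
    R[q, p] = s
    return R

# A sequence of such rotations can represent any rotation of the whitened
# data, so ICA can optimize one angle theta at a time.
```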
ICA Cost Functions
Lemma (proof: homework): for y = W x, H(y) = H(x) + log |det W|, and the mutual information satisfies I(y_1, ..., y_N) = Σ_i H(y_i) - H(y).
Therefore, for whitened data and orthogonal W (|det W| = 1), I(y_1, ..., y_N) = Σ_i H(y_i) - H(x), so minimizing the dependence of the outputs amounts to minimizing the sum of marginal entropies Σ_i H(y_i).
ICA Cost Functions
Therefore, we minimize Σ_i H(y_i). The covariance of y is fixed (it is I). Which distribution has the largest entropy for a fixed covariance? The Gaussian ⇒ to minimize the marginal entropies, push the components away from the normal distribution.
Central Limit Theorem
The sum of independent variables converges to the normal distribution, so mixtures look more Gaussian than the sources ⇒ for separation, go as far from the normal distribution as possible ⇒ negentropy or |kurtosis| maximization. (Figs from Ata Kaban)
ICA Algorithms
Maximum Likelihood ICA Algorithm (David J.C. MacKay, 1997)
The log-likelihood of the observations is written in terms of W = A^{-1} and the assumed source densities, and is optimized with respect to the rows of W.
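The slides' full derivation is not reproduced here; as a rough illustration of the resulting procedure, the following is a minimal sketch of a maximum-likelihood / infomax-style update in its natural-gradient form ΔW ∝ (I + g(y) y^T) W, assuming super-Gaussian sources with g(y) = -tanh(y). The step size and iteration count are arbitrary illustrative choices, not values from the slides.

```python
# Minimal sketch (not from the slides): ML/infomax-style ICA with the
# natural gradient, assuming whitened data X of shape (n, n_samples) and
# super-Gaussian sources (score function g(y) = -tanh(y)).
import numpy as np

def ml_ica(X, n_iter=200, lr=0.01, seed=0):
    n, T = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n))
    for _ in range(n_iter):
        Y = W @ X                                  # current source estimates
        G = -np.tanh(Y)                            # score function g(y)
        grad = (np.eye(n) + (G @ Y.T) / T) @ W     # natural gradient of the log-likelihood
        W = W + lr * grad
    return W
```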
ICA algorithm based on kurtosis maximization
Kurtosis = 4th-order cumulant, kurt(y) = E[y^4] - 3 (E[y^2])^2. It measures
• the distance from normality (kurt = 0 for a Gaussian)
• the degree of peakedness
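For reference, a minimal sketch (not from the slides) of estimating this quantity from samples:

```python
# Minimal sketch (not from the slides): estimating the kurtosis of a
# zero-mean signal y given as a NumPy array of samples.
import numpy as np

def kurtosis(y):
    """4th-order cumulant: kurt(y) = E[y^4] - 3 (E[y^2])^2."""
    return np.mean(y**4) - 3.0 * np.mean(y**2) ** 2

# For whitened data Z (n x T), one component can be sought by maximizing
# |kurtosis(w @ Z)| over unit-norm vectors w.
```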
The Fast ICA algorithm (Hyvarinen)
Probably the most famous ICA algorithm. One unit: maximize E[G(w^T x)] subject to ||w|| = 1 (x whitened), whose stationarity condition is F(w) = E[x g(w^T x)] - λ w = 0 (λ is the Lagrange multiplier). Solve this equation by the Newton-Raphson method.
Newton method for finding a root
Newton Method for Finding a Root
Goal: find x such that f(x) = 0.
Linear approximation (1st-order Taylor approximation): f(x + Δ) ≈ f(x) + f'(x) Δ.
Therefore, setting the approximation to zero gives Δ = -f(x) / f'(x), i.e., the update x_new = x - f(x) / f'(x).
Illustration of Newton's method. Goal: finding a root. In the next step we will linearize at the new point x.
Example: Finding a Root http://en.wikipedia.org/wiki/Newton%27s_method
Newton Method for Finding a Root
This can be generalized to multivariate functions f: R^N → R^N with Jacobian J_f(x).
Therefore, x_new = x - J_f(x)^{-1} f(x) [use the pseudo-inverse if there is no inverse].
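As a small illustration (not from the slides), a generic multivariate Newton iteration following the update above; the example system, tolerance, and iteration cap are arbitrary choices.

```python
# Minimal sketch (not from the slides): multivariate Newton-Raphson root
# finding, x_new = x - J(x)^{-1} f(x), using the pseudo-inverse for safety.
import numpy as np

def newton_root(f, jac, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        x = x - np.linalg.pinv(jac(x)) @ fx   # pseudo-inverse if J is singular
    return x

# Example: solve x0^2 + x1^2 = 1 and x0 = x1 (root at (1/sqrt(2), 1/sqrt(2))).
f = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
jac = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
root = newton_root(f, jac, x0=[1.0, 0.0])
```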
Newton method for FastICA
The Fast ICA algorithm (Hyvarinen)
Solve: F(w) = E[x g(w^T x)] - λ w = 0 (x whitened).
Note: the derivative (Jacobian) of F is ∂F/∂w = E[x x^T g'(w^T x)] - λ I.
The Fast ICA algorithm (Hyvarinen)
For whitened data, E[x x^T g'(w^T x)] ≈ E[x x^T] E[g'(w^T x)] = E[g'(w^T x)] I, so the Jacobian matrix becomes diagonal and can easily be inverted.
Therefore, the Newton step (after rescaling) gives the fixed-point update w+ = E[x g(w^T x)] - E[g'(w^T x)] w, followed by the normalization w ← w+ / ||w+||.
Other Nonlinearities
Other Nonlinearities
Common choices for g (the derivative of the contrast function G) are g(u) = tanh(a u), g(u) = u exp(-u^2 / 2), and g(u) = u^3 (kurtosis).
Newton method: the same fixed-point update w+ = E[x g(w^T x)] - E[g'(w^T x)] w applies with any of these g.
Algorithm: repeat the update and the normalization w ← w+ / ||w+|| until convergence.
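Putting the pieces together, here is a minimal one-unit FastICA sketch (not from the slides), assuming already centered and whitened data Z of shape (n_features, n_samples) and the tanh nonlinearity; the convergence threshold and iteration cap are arbitrary choices.

```python
# Minimal sketch (not from the slides): one-unit FastICA fixed-point
# iteration w+ = E[z g(w^T z)] - E[g'(w^T z)] w with g = tanh, assuming
# Z is centered and whitened, shape (n_features, n_samples).
import numpy as np

def fastica_one_unit(Z, max_iter=200, tol=1e-6, seed=0):
    n, T = Z.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ Z                                    # current projection w^T z
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if np.abs(np.abs(w_new @ w) - 1.0) < tol:    # converged up to sign
            return w_new
        w = w_new
    return w
```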
Fast ICA for several units
Estimate the rows of W one by one, decorrelating (orthogonalizing) each new w against the rows already found (deflation), or estimate all rows in parallel with a symmetric orthogonalization W ← (W W^T)^{-1/2} W.
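A minimal sketch (not from the slides) of the deflation approach: components are extracted one at a time, projecting out the directions already found with a Gram-Schmidt step inside each iteration.

```python
# Minimal sketch (not from the slides): deflation-based FastICA extracting
# several units from whitened data Z (n_features, n_samples), using the
# tanh nonlinearity and Gram-Schmidt decorrelation against previous rows.
import numpy as np

def fastica_deflation(Z, n_components, max_iter=200, tol=1e-6, seed=0):
    n, T = Z.shape
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, n))
    for k in range(n_components):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            y = w @ Z
            g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            w_new -= W[:k].T @ (W[:k] @ w_new)       # decorrelate from earlier rows
            w_new /= np.linalg.norm(w_new)
            if np.abs(np.abs(w_new @ w) - 1.0) < tol:
                break
            w = w_new
        W[k] = w_new
    return W                                         # estimated demixing rotation
```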