A problem - too many features
1. A problem - too many features
TDA 231 - Dimension Reduction: PCA and ICA
Devdatt Dubhashi (dubhashi@chalmers.se)
Department of Computer Science and Engineering, Chalmers University
March 3, 2017

◮ Aim: to build a classifier that can diagnose leukaemia using gene expression data.
◮ Data: 27 healthy samples, 11 leukaemia samples (N = 38). Each sample is the expression (activity) level for 3751 genes. (Also have an independent test set.)
◮ In general, the number of parameters will increase with the number of features - here D = 3751.
◮ e.g. Logistic regression - w would have length 3751!
◮ Fitting lots of parameters is hard - imagine Metropolis-Hastings in 3751 dimensions rather than 2!

Features
◮ For visualisation, most examples we have seen had only 2 features, x = [x1, x2]^T.
◮ We sometimes created more: x = [1, x1, x1^2, x1^3, ...]^T.
◮ Now we have been given lots (3751) to start with, and we need to reduce this number.
◮ 2 general schemes:
  ◮ Use a subset of the originals.
  ◮ Make new ones by combining the originals.

Making new features
◮ An alternative to choosing features is making new ones.
◮ Cluster: cluster the features (turn our clustering problem around). If we use, say, K-means, our new features will be the K mean vectors (a sketch of this follows below).
◮ Projection/combination: reduce the number of features by projecting into a lower-dimensional space. Do this by making new features that are (linear) combinations of the old ones.
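The K-means feature-construction idea above can be sketched in a few lines of Matlab, the tool used elsewhere in the lecture. This is only an illustration, not course code: the stand-in data matrix, the choice K = 10, and the use of kmeans (from the Statistics Toolbox) are all assumptions made for the example.

    % Making new features by clustering the original ones (illustrative sketch).
    % Each of the K new features is the mean of one cluster of original features.
    X = randn(38, 200);            % stand-in data, N x D (the real data has D = 3751)
    K = 10;                        % number of new features (chosen arbitrarily here)
    idx = kmeans(X', K);           % cluster the D columns (features), not the N rows
    Xnew = zeros(size(X, 1), K);
    for k = 1:K
        Xnew(:, k) = mean(X(:, idx == k), 2);   % new feature k: mean of its cluster
    end

The classifier would then be trained on Xnew (N x K) instead of the full N x D matrix.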

2. Projection
◮ We can project data (D dimensions) into a lower number of dimensions (M):
  Z = XW
  ◮ X is N x D
  ◮ W is D x M
  ◮ Z is N x M - an M-dimensional representation of our N objects.
◮ W defines the projection.
◮ Analogy: a 3-dimensional object (a hand) and its 2-dimensional projection (its shadow). Changing W is like changing where the light is coming from, or rotating the hand; X is the hand, Z is the shadow.
◮ Once we have chosen W we can project test data into this new space too: Z_new = X_new W.

Choosing W
◮ Different W will give us different projections (imagine moving the light).
◮ Which should we use? Not all will represent our data well - some projections don't look like a hand!

Principal Components Analysis
◮ Principal Components Analysis (PCA) is a method for choosing W.
◮ It finds the columns of W one at a time (define the m-th column as w_m). Each D x 1 column defines one new dimension.
◮ Consider one of the new dimensions (columns of Z): z_m = X w_m.
◮ PCA chooses w_m to maximise the variance of z_m (see the sketch below):
  σ²_m = (1/N) Σ_{n=1}^{N} (z_mn - µ_m)²,  where  µ_m = (1/N) Σ_{n=1}^{N} z_mn
◮ Once the first one has been found, w_2 is chosen to maximise the variance while being orthogonal to the first, and so on.
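A minimal Matlab sketch of the projection Z = XW and of the variance PCA tries to maximise. The sizes N, D, M and the random matrices are made up for illustration; they are not the lecture's data.

    % Projecting N objects from D dimensions down to M dimensions, Z = X*W,
    % then measuring the variance of one projected dimension (illustrative sketch).
    N = 38; D = 5; M = 2;          % small D here; the gene data has D = 3751
    X = randn(N, D);               % stand-in data matrix
    W = randn(D, M);               % an arbitrary projection (columns w_1, ..., w_M)
    Z = X * W;                     % N x M representation of the N objects
    z1 = Z(:, 1);                  % one new dimension, z_m = X * w_m
    mu1 = mean(z1);
    var_z1 = mean((z1 - mu1).^2);  % the variance that PCA maximises over w_m
    Xtest = randn(5, D);
    Ztest = Xtest * W;             % test data projected with the same W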

3. PCA - a visualisation
◮ Original data in 2 dimensions (x1, x2); we would like a 1-dimensional projection.
◮ Pick some arbitrary w.
◮ Project the data onto it.
◮ Compute the variance (on the line).
◮ The position on the line is our 1-dimensional representation.
◮ Different choices of w give different variances: the directions shown in the plots give σ²_z = 0.39, σ²_z = 1.2 and σ²_z = 1.9 (a sketch of this comparison follows below).
[Plots: the 2-D data cloud with three candidate projection directions and their variances.]
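The procedure on this slide is easy to try directly. The synthetic 2-D cloud and the particular angles below are arbitrary choices, not the data behind the slide's plots.

    % Trying several arbitrary directions w on a synthetic 2-D cloud and comparing
    % the variance along each line (sketch; not the data from the slides).
    X = randn(200, 2) * [2 0.8; 0.8 1];       % correlated 2-D data
    for theta = [0, pi/6, pi/3, pi/2]         % a few arbitrary angles
        w = [cos(theta); sin(theta)];         % unit-length projection direction
        z = X * w;                            % 1-D representation: position on the line
        fprintf('theta = %.2f  variance = %.2f\n', theta, var(z, 1));
    end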

4. PCA - analytic solution
◮ We could search for w_1, ..., w_M directly.
◮ But an analytic solution is available: the w are the eigenvectors of the covariance matrix of X (sketched below).
◮ Matlab: princomp(x)
◮ [Plot: the data with the projection direction giving σ²_z = 1.9.] What would be the second component?

PCA - leukaemia data
◮ [Plots: (left) the first two principal components z_1, z_2 of the leukaemia data, points labelled by class; (right) test error as more and more components M are used.]
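A sketch of the analytic solution on a stand-in data matrix; in practice princomp(x) (renamed pca in newer Matlab releases) returns the same directions in one call.

    % Analytic PCA: the projection directions are eigenvectors of the covariance
    % of X (illustrative sketch, not the course code).
    X = randn(38, 20);                            % stand-in data, N x D
    Xc = X - repmat(mean(X, 1), size(X, 1), 1);   % centre each feature
    C = cov(Xc);                                  % D x D sample covariance matrix
    [V, E] = eig(C);                              % columns of V are eigenvectors
    [~, order] = sort(diag(E), 'descend');        % sort by eigenvalue (variance)
    M = 2;
    W = V(:, order(1:M));                         % first M principal directions
    Z = Xc * W;                                   % the first M principal components

Projecting held-out data with the same W (after centring it with the training means) gives the representation used for a test-error curve like the one above.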

5. Summary (Part 1)
◮ Sometimes we have too much data (too many dimensions).
◮ Features can be dimensions that already exist.
◮ Or we can make new ones.

Part 2: ICA (the cocktail party problem)

The cocktail party problem
◮ Several speakers talk at once; microphones 1-4 are placed around the room.
◮ Each microphone will record a combination of all speakers.
◮ Can we separate them back out again? (A simulation of this mixing is sketched below.)

Demo
◮ Online: http://www.cis.hut.fi/projects/ica/cocktail/cocktail_en.cgi
◮ Matlab: available on the course webpage. To run:
  ◮ load ica_demo.mat
  ◮ ica_image
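Before separating sources it helps to see how the mixed recordings arise. The Matlab sketch below generates four toy sources and mixes them as described above, anticipating the X = AS + E model of the next slide. The signals, mixing matrix and noise level are invented for illustration and are unrelated to the course's ica_demo.mat.

    % Simulating the mixing that the demo undoes: N sources combined into N
    % recordings by an unknown mixing matrix, plus noise.
    N = 4; D = 1000;                  % 4 speakers / 4 microphones, D time samples
    t = 1:D;
    S = [sin(0.07 * t);               % source 1: sine wave
         sign(sin(0.23 * t));         % source 2: square-ish wave
         2 * rand(1, D) - 1;          % source 3: uniform noise
         randn(1, D)];                % source 4: Gaussian noise
    A = randn(N, N);                  % mixing matrix (unknown in practice)
    E = 0.05 * randn(N, D);           % sensor noise, e_nd ~ N(0, sigma^2)
    X = A * S + E;                    % each row of X is one microphone recording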

6. Independent Components Analysis - how it works
◮ Corrupted data (images/sounds) is a vector of D numbers, i.e. the n-th image is x_n.
◮ We have N images - stack them up into an N x D matrix X.
◮ Assume that this is the result of the following corrupting process:
  X = AS + E
◮ A is the mixing matrix, E is noise (S is N x D), with e_nd ~ N(0, σ²).

Inference
◮ From Bayes' rule (look back...):
  p(S | X, A, σ²) ∝ p(X | S, A, σ²) p(S)
◮ In our demo, we found the values of S, A and σ² that maximised the log posterior - the MAP solution.
◮ There is some further reading on the webpage if you want to know more.

Aside - ICA and the central limit theorem
◮ Central limit theorem (paraphrased): if we keep adding the outcomes of independent random variables together, we eventually get something that looks Gaussian.
◮ Example: roll a die m times and take the average (repeat this lots of times to get a histogram). From left to right, the histograms show m = 1, m = 2 and m = 5 - looking more Gaussian as m increases. (A sketch reproducing this is given below.)
◮ Sometimes ICA is performed by reversing this theorem: in X = AS + E, X is some random variables added together, so it will be more 'Gaussian' than S. Find S that is as non-Gaussian as possible.
◮ More resources:
  ◮ http://www.cis.hut.fi/projects/ica/icademo/
  ◮ http://www.cis.hut.fi/projects/ica/
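The die-rolling aside is straightforward to reproduce; the number of trials and histogram bins below are arbitrary choices.

    % Central limit theorem demo: averages of m die rolls look more Gaussian as m
    % grows (sketch reproducing the m = 1, 2, 5 histograms from the slide).
    trials = 10000;
    for m = [1 2 5]
        rolls = randi(6, trials, m);   % trials independent sets of m die rolls
        avg = mean(rolls, 2);          % average of each set
        figure; hist(avg, 30);         % histogram of the averages
        title(sprintf('m = %d', m));
    end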

7. Summary
◮ PCA and ICA are both examples of projection techniques.
◮ Both assume a linear transformation:
  ◮ ICA: X = AS + E
  ◮ PCA: Z = XW
◮ PCA can be used for data pre-processing or visualisation.
◮ ICA can be used to separate sources that have been mixed together.
◮ We also looked at PCA as a feature selection method.
