Advanced Machine Learning: Kernel Canonical Correlation Analysis


  1. ADVANCED MACHINE LEARNING: Kernel Canonical Correlation Analysis

  2. Structure of today's and next week's class: 1) Briefly go through one extension of principal component analysis, namely Canonical Correlation Analysis (CCA). 2) Derive the non-linear version of CCA, kernel CCA (kCCA). 3) Work through an exercise to understand the modulation of the space generated by CCA and kCCA.

  3. Canonical Correlation Analysis (CCA). Given a datapoint described in two modalities, $x \in \mathbb{R}^{N_x}$ (e.g. a video description) and $y \in \mathbb{R}^{N_y}$ (e.g. an audio description), CCA solves $$\rho = \max_{w_x, w_y} \mathrm{corr}\left(w_x^T x,\; w_y^T y\right).$$ Determine features in two (or more) separate descriptions of the dataset that best explain each datapoint; extract the hidden structure that maximizes correlation across the two projections.

  4. Canonical Correlation Analysis (CCA). Pair of multidimensional zero-mean variables $X = \{x^i\}_{i=1}^{M} \subset \mathbb{R}^{N_x}$ and $Y = \{y^i\}_{i=1}^{M} \subset \mathbb{R}^{N_y}$: we have $M$ instances of the pairs. Search for two projections $w_x$ and $w_y$, giving $z_x = w_x^T X$ and $z_y = w_y^T Y$, solutions of $$\max_{w_x, w_y} \mathrm{corr}\left(z_x, z_y\right).$$

  5. Canonical Correlation Analysis (CCA). The projections $w_x$ and $w_y$, with $z_x = w_x^T X$ and $z_y = w_y^T Y$, are solutions of $$\max_{w_x, w_y} \mathrm{corr}\left(z_x, z_y\right) = \max_{w_x, w_y} \frac{w_x^T E\left[XY^T\right] w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}} = \max_{w_x, w_y} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x}\,\sqrt{w_y^T C_{yy} w_y}},$$ with $X$ and $Y$ zero mean, i.e. $E[X] = E[Y] = 0$.

  6. Canonical Correlation Analysis (CCA). The cross-covariance matrix $C_{xy} = E\left[XY^T\right]$ is $N_x \times N_y$ and measures the cross-correlation between $X$ and $Y$. The covariance matrices are $C_{xx} = E\left[XX^T\right]$, of size $N_x \times N_x$, and $C_{yy} = E\left[YY^T\right]$, of size $N_y \times N_y$. These are the quantities appearing in the objective of slide 5.
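
As a concrete sketch of how these matrices are estimated in practice (a minimal numpy example with hypothetical random data; the expectations are replaced by averages over the $M$ samples):

```python
import numpy as np

# Hypothetical zero-mean paired data: M samples, columns are the x^i, y^i.
rng = np.random.default_rng(0)
M, Nx, Ny = 500, 4, 3
X = rng.standard_normal((Nx, M))
Y = rng.standard_normal((Ny, M))
X -= X.mean(axis=1, keepdims=True)   # enforce E[X] = 0
Y -= Y.mean(axis=1, keepdims=True)   # enforce E[Y] = 0

C_xx = X @ X.T / M                   # (Nx x Nx) covariance of X
C_yy = Y @ Y.T / M                   # (Ny x Ny) covariance of Y
C_xy = X @ Y.T / M                   # (Nx x Ny) cross-covariance
```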

  7. Canonical Correlation Analysis (CCA). The correlation is not affected by rescaling the norm of the vectors, so we can require $w_x^T C_{xx} w_x = w_y^T C_{yy} w_y = 1$. The problem becomes $$\max_{w_x, w_y} w_x^T C_{xy} w_y \quad \text{u.c.} \quad w_x^T C_{xx} w_x = w_y^T C_{yy} w_y = 1.$$

  8. Canonical Correlation Analysis (CCA). To determine the optimum (maximum) of $\rho$, solve by Lagrange: $$L\left(w_x, w_y, \lambda_x, \lambda_y\right) = w_x^T C_{xy} w_y - \lambda_x \left(w_x^T C_{xx} w_x - 1\right) - \lambda_y \left(w_y^T C_{yy} w_y - 1\right).$$ Taking the partial derivatives over $w_x, w_y$ gives $\lambda_x = \lambda_y = \lambda / 2$.
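
The step from this Lagrangian to the eigenvalue problem on the next slide, spelled out (a standard derivation, consistent with the slide's $\lambda_x = \lambda_y = \lambda/2$): setting the gradients to zero gives $$\frac{\partial L}{\partial w_x} = C_{xy} w_y - 2\lambda_x C_{xx} w_x = 0, \qquad \frac{\partial L}{\partial w_y} = C_{yx} w_x - 2\lambda_y C_{yy} w_y = 0.$$ Premultiplying the first equation by $w_x^T$, the second by $w_y^T$, and using the unit constraints gives $w_x^T C_{xy} w_y = 2\lambda_x = 2\lambda_y$, so with $\lambda := 2\lambda_x$ we obtain $C_{xy} w_y = \lambda C_{xx} w_x$ and $C_{yx} w_x = \lambda C_{yy} w_y$.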

  9. Canonical Correlation Analysis (CCA). Replacing $\lambda$ and writing out the set of equations gives the generalized eigenvalue problem $$\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix} = \lambda \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix} \begin{pmatrix} w_x \\ w_y \end{pmatrix},$$ which can be reduced to a classical eigenvalue problem if $C_{xx}$ is invertible. It can be rewritten as $$C_{xy} C_{yy}^{-1} C_{yx} w_x = \lambda^2 C_{xx} w_x.$$ Solving for $w_y$ gives $$C_{yx} C_{xx}^{-1} C_{xy} w_y = \lambda^2 C_{yy} w_y.$$ If $C_{yy}$ is invertible, this becomes an eigenvalue problem as for $w_x$. These two eigenvalue problems yield $q$ pairs of vectors $\left(w_x^i, w_y^i\right)$, $i = 1 \dots q$, where $q \le \min\left(N_x, N_y\right)$, with $w_x^i \in \mathbb{R}^{N_x}$ and $w_y^i \in \mathbb{R}^{N_y}$.
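
A minimal numpy/scipy sketch of solving the reduced problem above (reusing the hypothetical $C_{xx}$, $C_{yy}$, $C_{xy}$ estimated earlier; variable names are illustrative):

```python
import numpy as np
from scipy.linalg import eigh, solve

# Reduced problem from the slide: C_xy C_yy^{-1} C_yx w_x = lambda^2 C_xx w_x.
A = C_xy @ solve(C_yy, C_xy.T)        # C_xy C_yy^{-1} C_yx (symmetric PSD)
lam2, Wx = eigh(A, C_xx)              # generalized symmetric eigenproblem
order = np.argsort(lam2)[::-1]        # strongest canonical correlations first
lam = np.sqrt(np.clip(lam2[order], 0.0, None))
Wx = Wx[:, order]                     # columns are the w_x^i

# Each w_y^i follows from C_yx w_x = lambda C_yy w_y (up to scale);
# only the first q = min(Nx, Ny) pairs are meaningful.
Wy = solve(C_yy, C_xy.T @ Wx) / np.maximum(lam, 1e-12)
```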

  10. CCA: Exercise I. Consider the example below of a dataset of 4 points with 2-dimensional coordinates in both X and Y. • Determine by hand the directions found by CCA in each space. • Contrast them to the directions found by PCA. [Figure: scatter plots of the four paired points $X_1 \dots X_4$ and $Y_1 \dots Y_4$ in the two spaces.]

  11. Kernel Canonical Correlation Analysis. CCA finds basis vectors such that the correlation between the projections (of all datapoints in X and Y) is mutually maximized. CCA is a generalized version of PCA for two or more multidimensional datasets, but unlike PCA it does not constrain the projection vectors to be orthogonal. It assumes a linear correlation; if the correlation is non-linear → Kernel CCA.

  12. Kernel Canonical Correlation Analysis (kCCA). Assume two non-linear transformations $\phi_x$ and $\phi_y$ applied to $x \in \mathbb{R}^{N_x}$ (e.g. the video description) and $y \in \mathbb{R}^{N_y}$ (e.g. the audio description), and solve $$\max_{w_x, w_y} \mathrm{corr}\left(w_x^T \phi_x(x),\; w_y^T \phi_y(y)\right),$$ i.e. perform the correlation analysis across the two feature spaces.

  13. From CCA to Kernel CCA. $X = \{x^i\}_{i=1}^{M}$, $Y = \{y^i\}_{i=1}^{M}$. Send the data in X and in Y into two separate feature spaces: $\{\phi_x(x^i)\}_{i=1}^{M}$ and $\{\phi_y(y^i)\}_{i=1}^{M}$, with $\sum_{i=1}^{M} \phi_x(x^i) = 0$ and $\sum_{i=1}^{M} \phi_y(y^i) = 0$. Construct the associated kernel matrices $$K_x = F_x^T F_x, \qquad K_y = F_y^T F_y,$$ where the columns of $F_x, F_y$ are the $\phi_x(x^i), \phi_y(y^i)$.
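
A small numpy sketch of building these kernel matrices (the Gaussian kernel and the double-centering trick for imposing zero-mean features are assumptions carried over from kPCA, not given on the slide):

```python
import numpy as np

def centered_gram(samples, kernel):
    """K[i, j] = k(sample_i, sample_j), centered so the phi's have zero mean."""
    M = len(samples)
    K = np.array([[kernel(a, b) for b in samples] for a in samples])
    J = np.eye(M) - np.ones((M, M)) / M   # centering matrix
    return J @ K @ J                      # Gram matrix of the centered phi's

# Hypothetical kernel choice for both views: Gaussian (RBF).
rbf = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))
# K_x = centered_gram(x_samples, rbf); K_y = centered_gram(y_samples, rbf)
```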

  14. From CCA to Kernel CCA. In linear CCA we were solving $$\max_{w_x, w_y} w_x^T C_{xy} w_y \quad \text{u.c.} \quad w_x^T C_{xx} w_x = w_y^T C_{yy} w_y = 1.$$ In kernel CCA we express the projection vectors as linear combinations of the images of the datapoints in feature space (as in kPCA), $$w_x = F_x \alpha_x = \sum_{i=1}^{M} \alpha_{x,i}\, \phi_x(x^i), \qquad w_y = F_y \alpha_y = \sum_{i=1}^{M} \alpha_{y,i}\, \phi_y(y^i),$$ and replace the covariance and cross-covariance matrices by products of the feature-space data matrices (as in kPCA): $C_{xx} = F_x F_x^T$, $C_{yy} = F_y F_y^T$, $C_{xy} = F_x F_y^T$. We then solve $$\max_{\alpha_x, \alpha_y} \alpha_x^T F_x^T F_x F_y^T F_y \alpha_y = \max_{\alpha_x, \alpha_y} \alpha_x^T K_x K_y \alpha_y \quad \text{u.c.} \quad \alpha_x^T K_x^2 \alpha_x = \alpha_y^T K_y^2 \alpha_y = 1.$$
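
Writing out the constraint once shows where the $K_x^2$ term comes from: $$w_x^T C_{xx} w_x = \alpha_x^T F_x^T \left(F_x F_x^T\right) F_x \alpha_x = \left(\alpha_x^T K_x\right)\left(K_x \alpha_x\right) = \alpha_x^T K_x^2 \alpha_x,$$ and similarly $w_x^T C_{xy} w_y = \alpha_x^T F_x^T F_x F_y^T F_y \alpha_y = \alpha_x^T K_x K_y \alpha_y$.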

  15. Kernel CCA. In summary, in kernel CCA we search for the projection vectors $w_x, w_y$ (that live in feature space) so as to maximize $$\max_{w_x, w_y} \mathrm{corr}\left(\langle w_x, \phi_x(x)\rangle,\; \langle w_y, \phi_y(y)\rangle\right) \;\rightarrow\; \max_{\alpha_x, \alpha_y} \alpha_x^T K_x K_y \alpha_y \quad \text{u.c.} \quad \alpha_x^T K_x^2 \alpha_x = \alpha_y^T K_y^2 \alpha_y = 1,$$ with $\alpha_x, \alpha_y$ the dual eigenvectors (as the dual vectors in kPCA); see the documentation in the annexes for the derivation. This is again a generalized eigenvalue problem: $$\begin{pmatrix} 0 & K_x K_y \\ K_y K_x & 0 \end{pmatrix} \begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix} = \lambda \begin{pmatrix} K_x^2 & 0 \\ 0 & K_y^2 \end{pmatrix} \begin{pmatrix} \alpha_x \\ \alpha_y \end{pmatrix}.$$
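
A minimal sketch of solving this system, by the same reduction used in the linear case (the small ridge term `reg` is an assumption, added because $K_x^2$ and $K_y^2$ are typically rank-deficient, which produces the trivial $\mathrm{corr} = 1$ solutions discussed on the next slide):

```python
import numpy as np
from scipy.linalg import eigh, solve

def kcca(K_x, K_y, reg=1e-3):
    """Solve the kCCA generalized eigenvalue problem for (alpha_x, alpha_y)."""
    M = K_x.shape[0]
    Rx = K_x @ K_x + reg * np.eye(M)   # regularized K_x^2
    Ry = K_y @ K_y + reg * np.eye(M)   # regularized K_y^2
    # Reduction: K_x K_y Ry^{-1} K_y K_x alpha_x = lambda^2 Rx alpha_x
    A = K_x @ K_y @ solve(Ry, K_y @ K_x)
    lam2, Ax = eigh(A, Rx)
    order = np.argsort(lam2)[::-1]     # largest correlation first
    lam = np.sqrt(np.clip(lam2[order], 0.0, None))
    Ax = Ax[:, order]
    # Recover alpha_y from K_y K_x alpha_x = lambda K_y^2 alpha_y.
    Ay = solve(Ry, K_y @ K_x @ Ax) / np.maximum(lam, 1e-12)
    return lam, Ax, Ay
```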

  16. Kernel CCA. If the intersection between the spaces spanned by $K_x \alpha_x$ and $K_y \alpha_y$ is non-zero, then the problem above has a trivial solution, since $\mathrm{corr} \sim \cos\left(K_x \alpha_x, K_y \alpha_y\right) = 1$ (see the solution to the exercises).
