
Kernel Methods - I
Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu


  1. Kernel Methods - I (title slide). Henrik I Christensen (RIM@GT), Kernel Methods, 1 / 22

  2. Outline: 1. Introduction; 2. Dual Representations; 3. Kernel Design; 4. Radial Basis Functions; 5. Summary

  3. Introduction. So far the process has been about data compression and optimal regression/discrimination. Once the process is complete, the training set is discarded and the model is used for processing. What if the data were kept and used directly for estimation? Why, you ask? The decision boundaries might not be simple, or the modelling may be too complicated. We have already discussed Nearest Neighbor (NN) as an example of direct data processing; it belongs to a complete class of memory-based techniques. Q: how do we measure similarity between a data point and the samples in memory?

  4. Kernel Methods. What if we could predict based on a linear combination of features? Assume a mapping to a new feature space using φ(x). A kernel function is defined by k(x, x′) = φ(x)ᵀφ(x′). Characteristics: the function is symmetric, k(x, x′) = k(x′, x), and it can be used on both continuous and symbolic data. The simple kernel k(x, x′) = xᵀx′ is the linear kernel. A kernel is basically an inner product performed in a feature/mapped space.
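The kernel-as-inner-product idea can be sketched in a few lines. This is a minimal illustration (not from the slides; the data are arbitrary random points) of the linear kernel and its symmetry property:

```python
import numpy as np

# Arbitrary sample data: 5 points in R^3 (assumption for illustration)
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Gram matrix of the linear kernel k(x, x') = x^T x'
K = X @ X.T

# Symmetry: k(x, x') = k(x', x)
assert np.allclose(K, K.T)
```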

  5. Kernels. Consider a complete set of data in memory. How can we interpolate new values based on training values? I.e.,

    y(x) = Σ_{n=1}^N k(x, x_n) t_n

Consider k(·,·) a weight function that determines the contribution based on the distance between x and x_n.

  6. Outline: 1. Introduction; 2. Dual Representations; 3. Kernel Design; 4. Radial Basis Functions; 5. Summary

  7. Dual Representation. Consider a regression problem as seen earlier:

    J(w) = (1/2) Σ_{n=1}^N (wᵀφ(x_n) − t_n)² + (λ/2) wᵀw

with the solution

    w = −(1/λ) Σ_{n=1}^N (wᵀφ(x_n) − t_n) φ(x_n) = Σ_{n=1}^N a_n φ(x_n) = Φᵀa

where a is defined by

    a_n = −(1/λ) (wᵀφ(x_n) − t_n)

Substitute w = Φᵀa into J(w) to obtain

    J(a) = (1/2) aᵀΦΦᵀΦΦᵀa − aᵀΦΦᵀt + (1/2) tᵀt + (λ/2) aᵀΦΦᵀa

which is termed the dual representation.

  8. Dual Representation II. Define the Gram matrix K = ΦΦᵀ to get

    J(a) = (1/2) aᵀKKa − aᵀKt + (1/2) tᵀt + (λ/2) aᵀKa

where K_nm = φ(x_n)ᵀφ(x_m) = k(x_n, x_m). J(a) is then minimized by

    a = (K + λI_N)⁻¹ t

Through substitution we obtain

    y(x) = wᵀφ(x) = aᵀΦφ(x) = k(x)ᵀ(K + λI_N)⁻¹ t

We have in reality mapped the problem to another (dual) space in which it is possible to optimize the regression/discrimination problem. Typically N ≫ M, so the immediate advantage is not obvious. See later.
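The dual solution a = (K + λI)⁻¹t and the prediction y(x) = k(x)ᵀ(K + λI)⁻¹t can be sketched directly. This is a minimal example under assumed choices that are not in the slides: a Gaussian kernel with σ = 0.3, λ = 0.01, and a made-up noisy sine data set:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 1))                  # assumed training inputs
t = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)

def gauss_kernel(A, B, sigma=0.3):
    """Pairwise Gaussian kernel matrix (assumed kernel choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

lam = 1e-2
K = gauss_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), t)      # a = (K + lam*I)^-1 t

x_new = np.array([[0.5]])
y = gauss_kernel(x_new, X) @ a                        # y(x) = k(x)^T a
```

Note that the model is expressed entirely through kernel evaluations; φ(x) never appears explicitly, which is the point of the dual form.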

  9. Outline: 1. Introduction; 2. Dual Representations; 3. Kernel Design; 4. Radial Basis Functions; 5. Summary

  10. Constructing Kernels. How would we construct kernel functions? One approach is to choose a mapping and find the corresponding kernel. A one-dimensional example:

    k(x, x′) = φ(x)ᵀφ(x′) = Σ_{i=1}^M φ_i(x) φ_i(x′)

where the φ_i(·) are basis functions.

  11. Kernel Basis Functions - Example. [Figure: example basis functions (top row) and the corresponding kernel functions (bottom row).]

  12. Construction of Kernels. We can also design kernels directly; they must correspond to a scalar product in "some" space. Consider k(x, z) = (xᵀz)² for a 2-dimensional space, x = (x₁, x₂):

    (xᵀz)² = (x₁z₁ + x₂z₂)²
           = x₁²z₁² + 2x₁z₁x₂z₂ + x₂²z₂²
           = (x₁², √2 x₁x₂, x₂²)(z₁², √2 z₁z₂, z₂²)ᵀ
           = φ(x)ᵀφ(z)

In general, if the Gram matrix K is positive semi-definite, the kernel function is valid.
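The expansion above can be checked numerically: the explicit feature map φ(x) = (x₁², √2 x₁x₂, x₂²) reproduces (xᵀz)² exactly. A small sketch with arbitrary test vectors:

```python
import numpy as np

def phi(v):
    """Explicit feature map for k(x, z) = (x^T z)^2 in 2-D."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])     # arbitrary test points
z = np.array([3.0, -1.0])

lhs = (x @ z) ** 2           # kernel evaluated directly
rhs = phi(x) @ phi(z)        # inner product in feature space
assert np.isclose(lhs, rhs)  # both give (1*3 + 2*(-1))^2 = 1
```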

  13. Techniques for construction of kernels. Given valid kernels k₁(x, x′) and k₂(x, x′), the following are also valid kernels:

    k(x, x′) = c k₁(x, x′)
    k(x, x′) = f(x) k₁(x, x′) f(x′)
    k(x, x′) = q(k₁(x, x′))
    k(x, x′) = exp(k₁(x, x′))
    k(x, x′) = k₁(x, x′) + k₂(x, x′)
    k(x, x′) = k₁(x, x′) k₂(x, x′)
    k(x, x′) = xᵀA x′

(with c > 0 a constant, f(·) any function, q(·) a polynomial with non-negative coefficients, and A a symmetric positive semi-definite matrix)
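These closure rules can be spot-checked numerically: combining valid kernels by the rules above should always produce a positive semi-definite Gram matrix. A sketch with assumed base kernels (linear and polynomial) on random data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2))       # arbitrary data for the check

K1 = X @ X.T                          # linear kernel (valid)
K2 = (X @ X.T + 1.0) ** 2             # polynomial kernel (valid)

# Scaling, sum, elementwise product, and elementwise exp of valid kernels
for K in (3.0 * K1, K1 + K2, K1 * K2, np.exp(K1)):
    eig = np.linalg.eigvalsh((K + K.T) / 2)
    assert eig.min() > -1e-9          # positive semi-definite (up to round-off)
```

This is only a numerical spot check on one data set, not a proof; the rules themselves hold in general.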

  14. More kernel examples/generalizations. We could generalize k(x, x′) = (xᵀx′)² in various ways:

    1. k(x, x′) = (xᵀx′ + c)²
    2. k(x, x′) = (xᵀx′)^M
    3. k(x, x′) = (xᵀx′ + c)^M

Example: correlation between image regions. Another option is

    k(x, x′) = exp(−‖x − x′‖² / 2σ²)

called the "Gaussian kernel". Several more examples in the book.
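The Gaussian kernel is easy to sketch; note two of its basic properties, k(x, x) = 1 and 0 < k(x, x′) ≤ 1 (the bandwidth σ = 1 here is an arbitrary choice):

```python
import numpy as np

def gaussian_kernel(x, xp, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma**2))

x = np.array([0.0, 1.0])
z = np.array([1.0, 1.0])

assert np.isclose(gaussian_kernel(x, x), 1.0)          # k(x, x) = 1
assert np.isclose(gaussian_kernel(x, z), np.exp(-0.5)) # ||x - z||^2 = 1
```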

  15. Outline: 1. Introduction; 2. Dual Representations; 3. Kernel Design; 4. Radial Basis Functions; 5. Summary

  16. Radial Basis Functions. What is a radial basis function?

    φ_j(x) = h(‖x − x_j‖)

How do we average/smooth across the entire data set based on distance?

    y(x) = Σ_{n=1}^N w_n h(‖x − x_n‖)

The weights w_n could be estimated using LSQ. A popular interpolation strategy is

    y(x) = Σ_{n=1}^N t_n h(x − x_n)

where

    h(x − x_n) = ν(x − x_n) / Σ_j ν(x − x_j)
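The first variant, fitting the weights w_n by least squares, can be sketched as follows. The Gaussian radial profile, its width s = 0.1, and the noise-free sine data are assumed for illustration only:

```python
import numpy as np

x_train = np.linspace(0, 1, 10)                  # assumed training inputs
t_train = np.sin(2 * np.pi * x_train)            # targets (noise-free here)

def h(r, s=0.1):
    """Gaussian radial basis h(||x - x_n||) (assumed profile and width)."""
    return np.exp(-r**2 / (2 * s**2))

# Design matrix H[i, j] = h(|x_i - x_j|); one basis per data point
H = h(np.abs(x_train[:, None] - x_train[None, :]))
w = np.linalg.lstsq(H, t_train, rcond=None)[0]   # LSQ estimate of the weights

def y(x):
    return h(np.abs(x - x_train)) @ w

# With one basis per sample the fit interpolates the training data
assert abs(y(x_train[3]) - t_train[3]) < 1e-3
```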

  17. The effect of normalization? [Figure: basis functions before (left) and after (right) normalization.]

  18. Nadaraya-Watson Models. Let's interpolate across all data! Using a Parzen density estimator we have

    p(x, t) = (1/N) Σ_{n=1}^N f(x − x_n, t − t_n)

We can then estimate

    y(x) = E[t | x] = ∫ t p(t | x) dt
         = ∫ t p(x, t) dt / ∫ p(x, t) dt
         = Σ_n g(x − x_n) t_n / Σ_m g(x − x_m)
         = Σ_n k(x, x_n) t_n

where k(x, x_n) = g(x − x_n) / Σ_m g(x − x_m) and g(x) = ∫ f(x, t) dt.
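The Nadaraya-Watson estimator is just a normalized kernel-weighted average of the targets. A sketch with an assumed Gaussian component density and bandwidth h = 0.05 on a made-up noisy sine data set:

```python
import numpy as np

rng = np.random.default_rng(4)
x_train = rng.uniform(0, 1, 30)                              # assumed data
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(30)

def nw_predict(x, h=0.05):
    """Nadaraya-Watson: y(x) = sum_n k(x, x_n) t_n with normalized weights."""
    g = np.exp(-((x - x_train) ** 2) / (2 * h**2))  # Gaussian component g(x - x_n)
    k = g / g.sum()                                 # k(x, x_n) = g_n / sum_m g_m
    return k @ t_train

y = nw_predict(0.25)   # near the peak of sin(2*pi*x), so y should be close to 1
```

The normalization makes the weights sum to one, so the prediction is always a convex combination of the observed targets.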

  19. Gaussian Mixture Example. Assume a particular one-dimensional function (here a sine) with noise. Each data point is an isotropic Gaussian kernel. The smoothing factors are determined for the interpolation.

  20. Gaussian Mixture Example. [Figure: noisy one-dimensional sine data with the kernel-based interpolation result.]

  21. Outline: 1. Introduction; 2. Dual Representations; 3. Kernel Design; 4. Radial Basis Functions; 5. Summary

  22. Summary. Memory-based methods: keeping the data! Design of distance metrics for weighting of data in the learning set. Kernels: a distance metric based on a dot product in some feature space. Being creative about the design of kernels. We'll come back to the complexity issues.
