Multiple Kernel Learning and Feature Space Denoising

Multiple Kernel Learning and Feature Space Denoising, by Fei Yan, Josef Kittler and Krystian Mikolajczyk. PowerPoint PPT presentation, eNTERFACE10.



  1. Multiple Kernel Learning and Feature Space Denoising
     Fei Yan, Josef Kittler and Krystian Mikolajczyk
     eNTERFACE10

  2. Overview of the talk
     Kernel methods: an overview; three examples (kernel PCA, SVM and kernel FDA); the connection between SVM and kernel FDA.
     Multiple kernel learning: motivation; ℓp-regularised multiple kernel FDA; the effect of the regularisation norm in MKL.
     MKL and feature space denoising.
     Conclusions.

  3. Kernel methods: an overview
     Kernel methods are one of the most active areas in machine learning. The key idea is to embed the data from the input space into a high-dimensional feature space and then apply linear methods in that feature space. The input space can consist of vectors, strings, graphs, etc. The embedding is implicit, via a kernel function k(·, ·) that defines the dot product in the feature space. Any algorithm that can be written using only dot products is "kernelisable".
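
As a concrete illustration of the implicit embedding, the sketch below (assuming NumPy is available) evaluates a Gaussian RBF kernel on toy data: every entry of the resulting matrix is a dot product in an implicit feature space that is never constructed explicitly.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).

    Each entry is a dot product <phi(x), phi(y)> in an implicit,
    infinite-dimensional feature space; phi itself is never computed.
    """
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

# Toy data: 5 samples in a 3-dimensional input space.
X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel(X, X)                  # 5 x 5 kernel (Gram) matrix
print(K.shape, np.allclose(K, K.T))   # symmetric; PSD by construction
```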

  4. What is PCA
     Principal component analysis (PCA) is an orthogonal basis transformation: it transforms correlated variables into uncorrelated ones, the principal components. PCA can be used for dimensionality reduction, retaining as much variance as possible when the dimensionality is reduced.

  5. How PCA works
     Given m centred vectors X̃ = (x̃_1, x̃_2, ···, x̃_m), where X̃ is the d × m data matrix, take the eigen-decomposition of the covariance C̃ = X̃ X̃^T: C̃ = Ṽ Ω̃ Ṽ^T. The diagonal matrix Ω̃ holds the eigenvalues and Ṽ = (ṽ_1, ṽ_2, ···) the eigenvectors, the orthogonal basis sought. The data can now be projected onto this orthogonal basis; projecting only onto the leading eigenvectors gives dimensionality reduction with minimum variance loss.
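
A minimal NumPy sketch of this procedure on toy data: centre the d × m data matrix, eigen-decompose the covariance, and project onto the leading eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 100))                  # d x m data matrix, d = 4, m = 100
X_tilde = X - X.mean(axis=1, keepdims=True)    # centre each variable

C = X_tilde @ X_tilde.T                        # d x d (scaled) covariance matrix
eigvals, V = np.linalg.eigh(C)                 # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]              # sort eigenvalues in decreasing order
eigvals, V = eigvals[order], V[:, order]

k = 2                                          # keep the k leading principal components
Z = V[:, :k].T @ X_tilde                       # k x m projected (dimension-reduced) data
print(Z.shape, eigvals)
```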

  6. Kernelising PCA
     If we knew the mapping from the input space to the feature space explicitly, x_i = φ(x̃_i), we could map all the data, X = φ(X̃), where X is d × m, and diagonalise the covariance in feature space C = XX^T: CV = VΩ. Multiplying by X^T gives X^T C V = X^T V Ω, an eigen-problem of the form K A = A Δ; the diagonal matrix Δ holds the eigenvalues and V = (v_1, v_2, ···) is the orthogonal basis in feature space. However, we have φ(·) only implicitly, via ⟨φ(x̃_i), φ(x̃_j)⟩ = k(x̃_i, x̃_j), which is why PCA must be kernelised.

  7. Kernelising PCA
     The kernel matrix K is the evaluation of the kernel function on all pairs of samples; it is symmetric and positive semi-definite (PSD). Connection between C and K: C = XX^T and K = X^T X; C is d × d while K is m × m; C is not explicitly available but K is. So we diagonalise K instead of C: K = A Δ A^T, where A = (α_1, α_2, ···) are the eigenvectors.

  8. Kernelising PCA
     Using the connection between C and K: C and K have the same eigenvalues, and their i-th eigenvectors are related by v_i = X α_i. v_i is still not explicitly available: α_i is, but X is not. However, we are interested in the projection onto the orthogonal basis, not the basis itself, and the projection onto v_i is X^T v_i = X^T X α_i = K α_i, where both K and α_i are available.
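
Putting slides 6 to 8 together, the following sketch performs kernel PCA from a precomputed kernel matrix only. The feature-space centring and the rescaling of the α_i are standard kernel-PCA details that the slides do not spell out, so treat them as assumptions of this sketch.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Kernel PCA from a precomputed m x m kernel matrix K (a sketch).

    Centres K in feature space, eigen-decomposes it (K = A Delta A^T),
    and returns the projections onto the leading feature-space
    eigenvectors, computed as K alpha_i -- phi is never needed.
    """
    m = K.shape[0]
    J = np.ones((m, m)) / m
    Kc = K - J @ K - K @ J + J @ K @ J        # centre the data in feature space
    eigvals, A = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, A = eigvals[order], A[:, order]
    A = A / np.sqrt(eigvals)                  # scale so v_i = X alpha_i has unit norm
    return Kc @ A                             # m x n_components projections K alpha_i

# Example with a linear kernel, so the result matches ordinary PCA.
X = np.random.default_rng(0).normal(size=(20, 5))   # m = 20 samples, 5 features
K = X @ X.T
Z = kernel_pca(K, n_components=2)
print(Z.shape)
```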

  9. Support Vector Machine
     The SVM is a supervised learning method, as opposed to (kernel) PCA. In the binary classification setting it maximises the margin; integrating misclassification gives the soft-margin SVM:

         min_{w,b}  (1/2) w^T w + C Σ_{i=1}^m (1 − y_i (w^T x_i + b))_+        (1)

     Here ‖w‖ is the reciprocal of the margin, (x)_+ = max(x, 0) is the hinge loss penalising the empirical error, C is the parameter controlling the tradeoff, and y_i ∈ {+1, −1} is the label of training sample i. The goal is the hyperplane with maximum soft margin.
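
As a worked example of objective (1), the hedged sketch below evaluates the soft-margin objective for a given (w, b) on toy data; it only illustrates the formula and does not solve the optimisation.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Evaluate the soft-margin SVM primal (1):
    0.5 * w^T w + C * sum_i max(0, 1 - y_i (w^T x_i + b)).
    X: m x d data, y: labels in {+1, -1}.
    """
    margins = y * (X @ w + b)                 # y_i (w^T x_i + b) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)    # hinge loss: penalises margin violations
    return 0.5 * w @ w + C * hinge.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = np.sign(X[:, 0])                          # a trivially separable toy labelling
print(soft_margin_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0))
```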

  10. Support Vector Machine
      The SVM primal (1) is equivalent to its Lagrangian dual:

          max_α  Σ_{i=1}^m α_i − (1/2) Σ_{i=1}^m Σ_{j=1}^m y_i y_j α_i α_j K_ij        (2)
          subject to  Σ_{i=1}^m y_i α_i = 0,   0 ≤ α ≤ C 1

      (2) depends only on the kernel matrix K (and the labels); the explicit mapping φ(·) into feature space is not needed, so the SVM can be kernelised.
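
Since (2) needs only K and the labels, an SVM can be trained directly from a precomputed kernel matrix. A small sketch using scikit-learn's SVC with kernel='precomputed' (an illustrative choice of solver, not part of the talk) on toy data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))
y_train = np.sign(X_train[:, 0] + 0.1 * rng.normal(size=40))
X_test = rng.normal(size=(10, 3))

# Any PSD kernel matrix will do; here a simple linear kernel.
K_train = X_train @ X_train.T            # m x m kernel between training samples
K_test = X_test @ X_train.T              # rows: test samples, columns: training samples

clf = SVC(kernel='precomputed', C=1.0)   # the dual (2) only needs K and the labels
clf.fit(K_train, y_train)
print(clf.predict(K_test))
```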

  11. Kernel FDA
      Kernel Fisher discriminant analysis (FDA) is another supervised learning technique. It seeks the projection w maximising the Fisher criterion

          max_w  (w^T S_B w) / (w^T (S_T + λ I) w)        (3)

      where S_B and S_T are the between-class and total scatter matrices (built from the m training samples, of which m_+ are positive and m_- negative) and λ is a regularisation parameter.
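
The sketch below evaluates a regularised Fisher criterion of the form (3) for a fixed direction w on toy 2-D data. It uses the standard between-class and total scatter definitions, which may differ from the authors' formulation by constant factors involving m, m_+ and m_-.

```python
import numpy as np

def fisher_criterion(w, X, y, lam=1e-3):
    """Regularised Fisher criterion (3) for a direction w (a sketch).

    Standard scatter definitions are assumed; the authors' version may
    include extra constant factors. X: m x d data, y: labels in {+1, -1}.
    """
    mu_pos = X[y > 0].mean(axis=0)
    mu_neg = X[y < 0].mean(axis=0)
    mu = X.mean(axis=0)
    # Between-class scatter: separation of the two class means.
    S_B = np.outer(mu_pos - mu_neg, mu_pos - mu_neg)
    # Total scatter: overall spread of the (centred) data.
    Xc = X - mu
    S_T = Xc.T @ Xc
    d = X.shape[1]
    return (w @ S_B @ w) / (w @ (S_T + lam * np.eye(d)) @ w)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, size=(30, 2)), rng.normal(-2, 1, size=(30, 2))])
y = np.hstack([np.ones(30), -np.ones(30)])
print(fisher_criterion(np.array([1.0, 0.0]), X, y))   # direction along the class gap
```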

  12. Kernel FDA
      It can be proved that (3) is equivalent to

          min_w  ‖(XP)^T w − a‖² + λ ‖w‖²        (4)

      where P and a are constants determined by the labels. (4) is in turn equivalent to its Lagrangian dual:

          min_α  (1/4) α^T (I + (1/λ) K) α − α^T a        (5)

      (5) depends only on K (and the labels), so FDA can be kernelised.
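
Because (5) is an unconstrained quadratic in α, it can be minimised by solving a single linear system. In the sketch below the label-derived vector a is only a placeholder, since the slide does not give the exact form of P and a.

```python
import numpy as np

def solve_kfda_dual(K, a, lam):
    """Minimise the unconstrained quadratic (5):
        0.25 * alpha^T (I + K/lam) alpha - alpha^T a.
    Setting the gradient to zero gives a linear system with solution
        alpha = 2 * (I + K/lam)^{-1} a.
    """
    m = K.shape[0]
    return np.linalg.solve(np.eye(m) + K / lam, 2.0 * a)

rng = np.random.default_rng(0)
X = rng.normal(size=(15, 3))
K = X @ X.T                       # any PSD kernel matrix
a = rng.normal(size=15)           # placeholder for the label-derived vector a
alpha = solve_kfda_dual(K, a, lam=1.0)
print(alpha.shape)
```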

  13. Connection between SVM and kernel FDA
      Like the SVM, kernel FDA is a special case of Tikhonov regularisation. The goals of Tikhonov regularisation are a small empirical error (the loss function may vary) and, at the same time, a small norm w^T w (for good generalisation); λ controls the tradeoff between the two. Instead of the SVM's hinge loss for the empirical error, FDA uses the squared loss.
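
The sketch below makes the shared Tikhonov form explicit: the same "empirical loss + λ‖w‖²" objective, instantiated once with the hinge loss (SVM-style) and once with the squared loss (FDA-style). The helper name and toy data are illustrative, not from the talk.

```python
import numpy as np

def tikhonov_objective(w, b, X, y, lam, loss="hinge"):
    """Generic Tikhonov-regularised objective: empirical loss + lam * ||w||^2.

    loss="hinge"  -> SVM-style loss  max(0, 1 - y f(x))
    loss="square" -> FDA / regularised least-squares loss  (y - f(x))^2
    """
    f = X @ w + b
    if loss == "hinge":
        emp = np.maximum(0.0, 1.0 - y * f).sum()
    else:
        emp = ((y - f) ** 2).sum()
    return emp + lam * w @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0])
w, b = np.array([1.0, 0.0]), 0.0
print(tikhonov_objective(w, b, X, y, lam=0.1, loss="hinge"),
      tikhonov_objective(w, b, X, y, lam=0.1, loss="square"))
```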

  14. MKL: motivation
      A recap on kernel methods: embed (implicitly) into a (very high-dimensional) feature space; "implicitly" because we only need the dot product in the feature space, i.e. the kernel function k(·, ·); apply linear methods in the feature space; and easily balance capacity (empirical error) against generalisation (the norm w^T w). This all sounds nice, but which kernel function should we use? The choice is critically important, for it completely determines the embedding.

  15. MKL: motivation
      The ideal case would be to learn the kernel function itself from the data. If that is too hard, can we at least learn a good combination of given kernel matrices? This is the multiple kernel learning (MKL) problem. Given n m × m kernel matrices K_1, ···, K_n, most MKL formulations consider a linear combination:

          K = Σ_{j=1}^n β_j K_j,   β_j ≥ 0        (6)

      The goal of MKL is to learn the "optimal" weights β ∈ R^n.
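
A minimal sketch of the linear combination (6): given a list of PSD kernel matrices and nonnegative weights β, form the combined kernel. Learning β is what MKL optimises; here the weights are simply supplied.

```python
import numpy as np

def combine_kernels(kernels, beta):
    """Linear kernel combination (6): K = sum_j beta_j * K_j with beta_j >= 0.

    A nonnegative combination of PSD matrices is again PSD, so K is a
    valid kernel.
    """
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0), "kernel weights must be nonnegative"
    return sum(b * K for b, K in zip(beta, kernels))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
K1 = X @ X.T                                                   # linear kernel
K2 = np.exp(-0.5 * np.square(X[:, None] - X[None]).sum(-1))    # RBF kernel
K = combine_kernels([K1, K2], beta=[0.3, 0.7])
print(K.shape)
```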

  16. MKL: motivation
      The kernel matrix K_j contains the pairwise dot products in feature space j. The geometrical interpretation of the unweighted sum K = Σ_{j=1}^n K_j is the Cartesian product of the feature spaces. The geometrical interpretation of the weighted sum K = Σ_{j=1}^n β_j K_j is: scale each feature space by √β_j, then take the Cartesian product. Learning the kernel weights therefore amounts to seeking the "optimal" scaling.
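
The scaling interpretation can be checked numerically with explicit (finite-dimensional) feature maps: concatenating the feature vectors scaled by √β_j yields a space whose dot products equal the weighted kernel sum. The two feature maps below are toy choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))

# Two explicit (finite-dimensional) feature maps, purely for illustration.
phi1 = X                       # identity features -> linear kernel K1
phi2 = X ** 2                  # squared features  -> K2 = <x^2, z^2>
K1, K2 = phi1 @ phi1.T, phi2 @ phi2.T

beta = np.array([0.5, 2.0])
# Scale each feature space by sqrt(beta_j) and take the Cartesian product
# (i.e. concatenate the scaled feature vectors).
phi = np.hstack([np.sqrt(beta[0]) * phi1, np.sqrt(beta[1]) * phi2])

# Dot products in the concatenated space equal the weighted kernel sum.
print(np.allclose(phi @ phi.T, beta[0] * K1 + beta[1] * K2))   # True
```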
