Meta-parameters of kernel methods and their optimization
Petra Vidnerová, Roman Neruda
Institute of Computer Science, Academy of Sciences of the Czech Republic
ITAT 2014
Motivation
Learning:
- given a set of data samples, find the underlying trend, a description of the data
Supervised learning:
- data are input-output patterns
- create a model representing the input-output mapping
- classification, regression, prediction, etc.
Motivation
Learning methods:
- a wide range of methods is available: statistical approaches, neural networks (MLP, RBF networks, etc.), kernel methods (SVM, etc.)
Learning steps:
- data preprocessing, feature selection
- model selection
- parameter setup
Motivation
Aim of this work:
- some experience is needed to achieve the best results
- our ultimate goal: automatic setup, i.e. model recommendation and meta-parameter setup
- in this talk: meta-parameter setup for the family of kernel models
Outline:
- brief overview of SVM and RN
- role of the kernel function
- meta-parameter optimisation methods
- some experimental results
Kernel methods
- a family of models that became famous with the SVM
- learning schema:
  1. the data are processed into a kernel matrix
  2. the learning algorithm is applied using only the information in the kernel matrix
- the resulting model is a linear combination of kernel functions (see the sketch below)
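To make the schema concrete, a minimal Python sketch (the function names and the RBF choice are illustrative, not taken from the talk): the data are turned into a kernel matrix, and the resulting model is evaluated as a linear combination of kernel functions.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel: K(x, y) = exp(-gamma * ||x - y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def kernel_matrix(X, kernel):
    # Step 1: process the data into an N x N kernel matrix
    N = len(X)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

def model(x, X_train, weights, kernel):
    # Resulting model: f(x) = sum_i w_i * K(x, x_i),
    # a linear combination of kernel functions
    return sum(w * kernel(x, xi) for w, xi in zip(weights, X_train))
```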
Kernel methods - basic idea
- choose a mapping Φ : X → H to some (high-dimensional) dot-product space, the feature space
- work in the feature space
- the dot product in the feature space is given by the kernel function K(·, ·) (illustrated below)
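A small numerical illustration of this idea, assuming the homogeneous polynomial kernel of degree 2 on R² (my choice of kernel, not from the slides): the kernel value equals the dot product of the explicit feature-space images, so the feature space never has to be visited.

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel on R^2:
    # Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, y):
    # Kernel function computing the same dot product directly in X
    return (x @ y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(phi(x) @ phi(y), K(x, y))  # dot product in H equals K(x, y)
```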
Support Vector Machine
- classification task
- input points are mapped to the feature space
- classification via a separating hyperplane with maximal margin
- such a hyperplane is determined by the support vectors
- many implementations available, e.g. libSVM
- parameter setup includes: the kernel function, and C, the trade-off between maximal margin and minimum training error (a training sketch follows)
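A hedged training sketch using scikit-learn's SVC, which wraps libSVM; the data set and parameter values are placeholders, not results from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Meta-parameters chosen by the user: kernel type, kernel parameter gamma,
# and C, the trade-off between maximal margin and training error
clf = SVC(kernel="rbf", gamma=0.1, C=10.0)
clf.fit(X_tr, y_tr)

print("number of support vectors:", len(clf.support_vectors_))
print("test accuracy:", clf.score(X_te, y_te))
```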
Regularization Networks
- approximation tasks, neural networks with one hidden layer
- given {(x_i, y_i) ∈ R^d × R}_{i=1}^N, recover the unknown function
- find f that minimizes H[f] = Σ_{i=1}^N (f(x_i) − y_i)²
- generally ill-posed: choose one solution according to a priori knowledge (smoothness, etc.)
- regularization approach: add a stabiliser, H[f] = Σ_{i=1}^N (f(x_i) − y_i)² + γ Φ[f]
Derivation of Regularization Network
- stabilizer based on the Fourier transform: penalize functions that oscillate too much
- Φ[f] = ∫_{R^d} |f̃(s)|² / G̃(s) ds, where f̃ is the Fourier transform of f and G̃ is a positive function with G̃(s) → 0 for ||s|| → ∞, so that 1/G̃ is a high-pass filter
- for a wide class of stabilizers the solution has the form f(x) = Σ_{i=1}^N w_i G(x − x_i), where (γI + G)w = y
- meta-parameters: the kernel function G and γ (a minimal solver sketch follows)
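A minimal sketch of training a regularization network with a Gaussian kernel, assuming the linear system (γI + G)w = y above; the width parametrization is one common choice, not necessarily the one used in the talk.

```python
import numpy as np

def gaussian_kernel_matrix(X, width):
    # G[i, j] = exp(-||x_i - x_j||^2 / width^2)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / width ** 2)

def fit_regularization_network(X, y, gamma, width):
    # Solve (gamma * I + G) w = y for the weights of the solution
    # f(x) = sum_i w_i * G(x - x_i)
    G = gaussian_kernel_matrix(X, width)
    return np.linalg.solve(gamma * np.eye(len(X)) + G, y)

def rn_predict(x, X_train, w, width):
    # Evaluate f(x) as a linear combination of kernels centred at training points
    g = np.exp(-np.sum((X_train - x) ** 2, axis=1) / width ** 2)
    return g @ w
```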
Role of Kernel Function
Choice of kernel function:
- choice of a stabilizer
- choice of a function space for learning (hypothesis space)
- determines the geometry of the feature space
- represents our prior knowledge about the problem
- should be chosen according to the given problem
Frequently used kernel functions (implemented in the sketch below):
- linear: K(x, y) = x^T y
- polynomial: K(x, y) = (γ x^T y + r)^d, γ > 0
- radial basis function: K(x, y) = exp(−γ ||x − y||²), γ > 0
- sigmoid: K(x, y) = tanh(γ x^T y + r)
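The four kernels written out as plain Python functions; the default parameter values are illustrative only.

```python
import numpy as np

def linear(x, y):
    return x @ y

def polynomial(x, y, gamma=1.0, r=0.0, d=3):
    return (gamma * (x @ y) + r) ** d

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid(x, y, gamma=1.0, r=0.0):
    return np.tanh(gamma * (x @ y) + r)
```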
Toy example - image approximation [figure]
Meta-parameters setup
Parameters of kernel learning algorithms:
- kernel function type
- additional kernel parameter(s) (e.g. the width for the Gaussian kernel)
- regularization parameter γ
Search for optimal meta-parameters
- minimization of the cross-validation error
- the winning parameters are then used for training on the whole data set
Grid search:
- exhaustive search, various pairs of parameters tried
- time consuming
- start with a coarse grid, then make it finer
- quite a standard approach, implemented for example in libSVM (sketched below)
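A grid-search sketch using scikit-learn's GridSearchCV on a synthetic data set; the grid ranges are illustrative. Cross-validation error is minimized over the grid and the winning pair is then used to train on the whole data set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Coarse logarithmic grid over the two SVM meta-parameters;
# a finer grid can then be placed around the winning pair
param_grid = {
    "C":     [10.0 ** k for k in range(-2, 5)],
    "gamma": [10.0 ** k for k in range(-5, 2)],
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)

# The winning parameters are used for training on the whole data set
# (GridSearchCV refits automatically; shown explicitly here for clarity)
final_model = SVC(kernel="rbf", **search.best_params_).fit(X, y)
```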
Search for optimal meta-parameters
Genetic algorithm:
- robust optimisation technique, often used in combination with learning algorithms or neural networks
- individuals encode the kernel function, its parameters, and the regularization parameter: I = {K, p, γ}
Simulated annealing:
- stochastic optimisation method
- searches with the smallest number of evaluations (a sketch follows)
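A simulated-annealing sketch over an individual I = (kernel, γ, C), with cross-validation accuracy as fitness. The cooling schedule, neighbourhood moves, and parameter ranges are my assumptions, not the configuration used in the experiments.

```python
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

def fitness(ind):
    # Individual I = (kernel type, kernel parameter, regularization trade-off);
    # fitness = mean cross-validation accuracy
    kernel, gamma, C = ind
    return cross_val_score(SVC(kernel=kernel, gamma=gamma, C=C), X, y, cv=5).mean()

def neighbour(ind):
    # Perturb one meta-parameter at random (log-scale steps for gamma and C)
    kernel, gamma, C = ind
    choice = random.randrange(3)
    if choice == 0:
        kernel = random.choice(["rbf", "poly", "sigmoid"])
    elif choice == 1:
        gamma *= 10 ** random.uniform(-0.5, 0.5)
    else:
        C *= 10 ** random.uniform(-0.5, 0.5)
    return (kernel, gamma, C)

def simulated_annealing(steps=50, t0=0.1):
    current = ("rbf", 0.1, 1.0)
    f_cur = fitness(current)
    best, f_best = current, f_cur
    for step in range(steps):
        t = t0 * (1 - step / steps)          # linear cooling schedule
        cand = neighbour(current)
        f_cand = fitness(cand)
        # Accept improvements always, worse candidates with Boltzmann probability
        if f_cand >= f_cur or random.random() < np.exp((f_cand - f_cur) / max(t, 1e-9)):
            current, f_cur = cand, f_cand
            if f_cur > f_best:
                best, f_best = current, f_cur
    return best, f_best
```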
Thank you! Questions?