Kernel Principal Component Ranking: Robust Ranking on Noisy Data
Evgeni Tsivtsivadze, Botond Cseke, Tom Heskes
Institute for Computing and Information Sciences, Radboud University Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, The Netherlands
firstname.lastname@science.ru.nl
Presentation Outline
1 Motivation
2 Ranking Setting
3 KPCRank Algorithm
4 Experiments
Learning on Noisy Data
• Real-world data is usually corrupted by noise (e.g. in bioinformatics, natural language processing, information retrieval, etc.)
• Learning on noisy data is a challenge: ML methods frequently use a low-rank approximation of the data matrix
• Any manifold learner or dimensionality reduction technique can be used for de-noising
• Our algorithm is an extension of nonlinear principal component regression, applicable to the preference learning task
Learning to Rank
Learning to rank (a total order is given over all data points)
• Applications: collaborative filtering in electronic commerce, protein ranking (e.g. RankProp: Protein Ranking by Network Propagation), parse ranking, etc.
• We aim to learn a scoring function that is capable of ranking data points
• Several accepted settings for learning (ref. upcoming Preference Learning book):
  • Object ranking
  • Label ranking
  • Instance ranking
KPCRank Algorithm
• Main idea: create a new feature space with reduced dimensionality (only the most expressive features are preserved) and use the ranking algorithm in that space to learn a noise-insensitive ranking function
• The computational complexity of KPCRank scales linearly with the number of data points in the training set and is equal to that of KPCR
• KPCRank regularizes by projecting the data onto a lower-dimensional space (the number of principal components is a model parameter)
• In the conducted experiments KPCRank performs better than the baseline methods when learning to rank from data corrupted by noise
Dimensionality Reduction
Consider the covariance matrix
C = \frac{1}{m} \sum_{i=1}^{m} \Phi(z_i)\Phi(z_i)^t = \frac{1}{m} \Phi(Z)\Phi(Z)^t
To find the first principal component we solve
C v = \lambda v
The key observation: v = \sum_{i=1}^{m} a_i \Phi(z_i), therefore
\frac{1}{m} K a = \lambda a
and the projection of \Phi(z) onto the l-th principal component is
\langle v_l, \Phi(z) \rangle = \frac{1}{\sqrt{m \lambda_l}} \sum_{i=1}^{m} a_i^l \langle \Phi(z_i), \Phi(z) \rangle = \frac{1}{\sqrt{m \lambda_l}} \sum_{i=1}^{m} a_i^l k(z_i, z)
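As an illustration, here is a minimal NumPy sketch of this kernel PCA step, assuming an RBF kernel and omitting kernel centering; the function names (rbf_kernel, kpca_components, kpca_project) are our own and not from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def kpca_components(K, p):
    """Top-p solutions of (1/m) K a = lambda a: coefficient vectors a^l and eigenvalues lambda_l.
    (Kernel centering is omitted for brevity.)"""
    m = K.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K / m)        # eigh returns ascending order
    idx = np.argsort(eigvals)[::-1][:p]             # keep the p largest
    return eigvecs[:, idx], eigvals[idx]            # A: (m, p), lambdas: (p,)

def kpca_project(K_new, A, lambdas):
    """<v_l, Phi(z)> = (1/sqrt(m*lambda_l)) * sum_i a_i^l k(z_i, z) for each new z."""
    m = A.shape[0]
    return (K_new @ A) / np.sqrt(m * lambdas)

# toy usage: project 50 random 3-d points onto the first 5 kernel principal components
Z = np.random.randn(50, 3)
K = rbf_kernel(Z, Z)
A, lam = kpca_components(K, p=5)
P = kpca_project(K, A, lam)                         # shape (50, 5)
```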
KPCRank Algorithm
We start with the disagreement error
d(f, T) = \frac{1}{2} \sum_{i,j=1}^{m} W_{ij} \left| \mathrm{sign}(s_i - s_j) - \mathrm{sign}\bigl(f(z_i) - f(z_j)\bigr) \right|
The least-squares ranking objective is
J(w) = (S - \Phi(Z)^t w)^t L (S - \Phi(Z)^t w)
and using the projected data (reduced feature space) the objective can be rewritten as
J(\bar{w}) = (S - \Phi(Z)^t V \bar{w})^t L (S - \Phi(Z)^t V \bar{w})
Regularization is performed by selecting the optimal number of principal components.
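To make the objects in these formulas concrete, here is a hedged sketch of the disagreement error and the weighted least-squares objective, assuming W is a symmetric matrix of pairwise weights and L = D - W its graph Laplacian; the names and normalization are illustrative, not taken from the paper.

```python
import numpy as np

def disagreement_error(s_true, s_pred, W):
    """d(f, T) = 1/2 * sum_ij W_ij |sign(s_i - s_j) - sign(f(z_i) - f(z_j))|."""
    d_true = np.sign(s_true[:, None] - s_true[None, :])
    d_pred = np.sign(s_pred[:, None] - s_pred[None, :])
    return 0.5 * np.sum(W * np.abs(d_true - d_pred))

def graph_laplacian(W):
    """L = D - W, with D the diagonal degree matrix of the pairwise weights."""
    return np.diag(W.sum(axis=1)) - W

def ranking_objective(S, P, w, L):
    """J(w) = (S - P w)^t L (S - P w), where P holds the (projected) training data."""
    r = S - P @ w
    return r @ L @ r
```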
KPCRank Algorithm
We set the derivative to zero and solve with respect to \bar{w}:
\bar{w} = \bar{\Lambda}^{\frac{1}{2}} \left( \bar{V}^t K L K \bar{V} \right)^{-1} \bar{V}^t K L S
Finally, we obtain the predicted score of an unseen instance-label pair based on the first p principal components:
f(z) = \sum_{l=1}^{p} \sum_{j=1}^{m} \frac{1}{\sqrt{m \lambda_l}} \bar{w}_l \, a_j^l \, k(z_j, z)
• Efficient selection of the optimal number of principal components
• Detailed computational complexity considerations
• Alternative approaches for reducing computational complexity (e.g. the subset method)
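Putting the pieces together, below is a hedged end-to-end sketch of KPCRank training and prediction, reusing kpca_components from the earlier snippet; it solves the projected least-squares problem directly through its normal equations rather than via the closed form on the slide, and all function names are our own.

```python
import numpy as np

def kpcrank_fit(K, S, W, p):
    """Project onto the first p kernel principal components and minimize
    J(w) = (S - P w)^t L (S - P w) over the reduced feature space."""
    m = K.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian of the pair weights
    A, lam = kpca_components(K, p)                 # coefficients a^l and eigenvalues lambda_l
    P = (K @ A) / np.sqrt(m * lam)                 # projected training data, shape (m, p)
    G = P.T @ L @ P
    w_bar = np.linalg.pinv(G) @ (P.T @ L @ S)      # pseudo-inverse in case G is singular
    return A, lam, w_bar

def kpcrank_predict(K_new, A, lam, w_bar):
    """f(z) = sum_l (1/sqrt(m*lambda_l)) w_l sum_j a_j^l k(z_j, z)."""
    m = A.shape[0]
    P_new = (K_new @ A) / np.sqrt(m * lam)
    return P_new @ w_bar
```

As on the slide, the number of components p acts as the regularization parameter and would be chosen by model selection.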
Experiments
• Label ranking: Parse Ranking dataset
• Pairwise preference learning: synthetic dataset based on the sinc(x) function
• Baseline methods: regularized least-squares (RLS), RankRLS, KPC regression (KPCR), probabilistic ranker
Parse Ranking Dataset

Method     Without noise   σ = 0.5   σ = 1.0
KPCR       0.40            0.46      0.47
KPCRank    0.37            0.41      0.42
RLS        0.34            0.43      0.46
RankRLS    0.35            0.45      0.47

Table: Comparison of the parse ranking performances of the KPCRank, KPCR, RLS, and RankRLS algorithms, using a normalized version of the disagreement error as the performance evaluation measure.
A Probabilistic Ranker
A probabilistic counterpart of the RankRLS algorithm would be regression with Gaussian noise and a Gaussian process prior. Given the score differences w_{ij} = s_i - s_j,
p(w_{ij} | f(x_i), f(x_j), v) = N(w_{ij} | f(x_i) - f(x_j), 1/v).
Then the posterior distribution is
p(f | D, v, \theta) = \frac{1}{p(D | v, \theta)} \prod_{i,j=1}^{n} N(w_{ij} | f(x_i) - f(x_j), 1/v) \, N(f | 0, K).
• The posterior distribution p(f | w, v, θ) is Gaussian; its mean and covariance matrix can be computed by solving a system of linear equations and inverting a matrix, respectively.
• Note that the predictions obtained by the RankRLS algorithm correspond to the predicted mean values of the Gaussian process regression.
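For concreteness, here is a sketch of the Gaussian process ranker's posterior under the pairwise-difference likelihood above; it uses standard Gaussian linear-model algebra, assumes the observed pairs are given as index tuples, and is not necessarily the exact implementation behind the experiments.

```python
import numpy as np

def gp_rank_posterior(K, pairs, w_obs, v):
    """Posterior mean and covariance of f ~ N(0, K) given observations
    w_ij ~ N(f(x_i) - f(x_j), 1/v) for each (i, j) in `pairs`."""
    n = K.shape[0]
    M = np.zeros((len(pairs), n))
    for r, (i, j) in enumerate(pairs):
        M[r, i], M[r, j] = 1.0, -1.0               # difference operator: (M f)_r = f_i - f_j
    C = M @ K @ M.T + np.eye(len(pairs)) / v       # covariance of the observed differences
    alpha = np.linalg.solve(C, w_obs)
    mean = K @ M.T @ alpha                         # posterior mean, via a linear system
    cov = K - K @ M.T @ np.linalg.solve(C, M @ K)  # posterior covariance, via matrix inversion
    return mean, cov
```

The noise precision v and the kernel hyperparameters θ would be set, for example, by maximizing the marginal likelihood (the ML-II estimate referred to in the figure below).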
Sinc Dataset
We use the sinc function
sinc(x) = \frac{\sin(\pi x)}{\pi x}
to generate the values used for creating the magnitudes of pairwise preferences.
• We take 2000 equidistant points from the interval [-4, 4]
• We sample 1000 of them for constructing the training pairs and 338 for constructing the test pairs
• From these pairs we randomly sample 379 used for training and 48 for testing
The magnitude of a pairwise preference is calculated as w = sinc(x) - sinc(x').
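A small sketch of this data-generation recipe; the slide does not specify exactly how pairs are formed before subsampling, so the random-pair construction below is an assumption.

```python
import numpy as np

def sinc(x):
    return np.sinc(x)                              # numpy's sinc is sin(pi*x) / (pi*x)

rng = np.random.default_rng(0)

x = np.linspace(-4, 4, 2000)                       # 2000 equidistant points on [-4, 4]
idx = rng.permutation(2000)
train_x, test_x = x[idx[:1000]], x[idx[1000:1338]] # 1000 training / 338 test points

def sample_pairs(points, n_pairs, rng):
    """Randomly pair points and compute preference magnitudes w = sinc(x) - sinc(x')."""
    i = rng.integers(0, len(points), size=n_pairs)
    j = rng.integers(0, len(points), size=n_pairs)
    return points[i], points[j], sinc(points[i]) - sinc(points[j])

x1_tr, x2_tr, w_tr = sample_pairs(train_x, 379, rng)   # 379 training pairs
x1_te, x2_te, w_te = sample_pairs(test_x, 48, rng)     # 48 test pairs
```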
Sinc Dataset
[Plot omitted: GP approximation (ML-II) and KPCRank, over x in [-4, 4]; curves shown: sinc function, GP posterior mean, KPCRank.]
Figure: The sinc function, the approximate posterior mean of f obtained from the pairwise preferences with magnitudes, and the KPCRank predictions.
Thank you.