Semi-Supervised Kernel Mean Shift Clustering

  1. Semi-Supervised Kernel Mean Shift Clustering: A Semi-Supervised Clustering Approach

  2. Motivation: Need for Supervision
     • Data may not form clusters in the input space.
     • Mapping to a different space helps.
     • Pairwise constraints can guide clustering to find the desired structure.
     • The mapping function is not always known.
     [Figure: input points, mapping to feature space, constraint vector φ(c2) − φ(c1), projection, and the resulting clustering]

  3. Previous Work: Kernel Mean Shift
     • Given data points $x_i \in \mathcal{X}$ and a p.s.d. kernel $k(\cdot,\cdot)$ satisfying $k(x, x') = \phi(x)^\top \phi(x')$, where $\phi(\cdot)$ is an unknown mapping to a feature space.
     • For $n$ points $x_1, \ldots, x_n$, the $n \times n$ kernel matrix $K$ is computed using the kernel function, $K_{ij} = k(x_i, x_j)$.
     • The distance between two points $\phi(x_i)$ and $\phi(x_j)$ in the feature space can be computed implicitly using the kernel matrix: $\|\phi(x_i) - \phi(x_j)\|^2 = K_{ii} + K_{jj} - 2K_{ij}$.
     Tuzel, O., Porikli, F., Meer, P., "Kernel methods for weakly supervised mean shift clustering", ICCV, 48–55, 2009.
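
A minimal sketch of these two steps, assuming a Gaussian (RBF) kernel as the p.s.d. kernel; the data, bandwidth, and variable names are illustrative only:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """n x n kernel matrix with K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :] - 2 * X @ X.T
    return np.exp(-sq / (2 * sigma**2))

def feature_space_distance(K, i, j):
    """Implicit squared distance ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij."""
    return K[i, i] + K[j, j] - 2 * K[i, j]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))          # toy input points (illustrative)
K = gaussian_kernel_matrix(X, sigma=0.5)
print(feature_space_distance(K, 0, 1))
```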

  4. Previous Work: Kernel Mean Shift
     • The mean shift update can be written using this kernel matrix as
       $\bar{\alpha} = \dfrac{\sum_{i=1}^{n} e_i\, g\big((\alpha - e_i)^\top K (\alpha - e_i) / h^2\big)}{\sum_{i=1}^{n} g\big((\alpha - e_i)^\top K (\alpha - e_i) / h^2\big)}$,
       where $\alpha \in \mathbb{R}^n$ is the weight vector representing the current feature-space point, $g(\cdot)$ is the kernel profile, $h$ is the bandwidth, and $e_i$ denotes the $i$-th canonical basis vector of $\mathbb{R}^n$.
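
A sketch of one such iteration in the kernel-induced feature space, working only with the kernel matrix; the Gaussian profile, bandwidth, and toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))                              # toy input points
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
K = np.exp(-sq / (2 * 0.5**2))                                 # Gaussian kernel matrix

def kernel_mean_shift_step(K, alpha, h=1.0):
    """One mean shift step for a feature-space point represented by weights alpha.

    Squared distances to the mapped data points phi(x_i) are computed implicitly:
    d2[i] = (alpha - e_i)^T K (alpha - e_i).
    """
    d2 = alpha @ K @ alpha - 2 * (K @ alpha) + np.diag(K)
    w = np.exp(-d2 / (2 * h**2))            # Gaussian profile weights (assumption)
    return w / w.sum()                      # new alpha = sum_i w_i e_i / sum_i w_i

alpha = np.eye(K.shape[0])[0]               # start the trajectory at data point x_0
for _ in range(50):                         # iterate the update toward a mode
    alpha = kernel_mean_shift_step(K, alpha, h=0.5)
```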

  5. Previous Work: Kernel Learning Using Linear Projections
     • Let $\{(j_1, k_1), \ldots, (j_m, k_m)\}$ be the set of similarity constraint pairs, and let the constraint matrix be $A = [\,\phi(x_{j_1}) - \phi(x_{k_1}), \ldots, \phi(x_{j_m}) - \phi(x_{k_m})\,]$.
     • A transformation $\phi(x) \mapsto P\,\phi(x)$ is defined using the projection matrix $P = I - A (A^\top A)^{+} A^\top$, which projects the feature space onto the null space of the constraint vectors so that each constrained pair collapses onto the same point.
     [Figure: feature space showing the constraint vector φ(c2) − φ(c1) and its projection]

  6. Previous Work: Kernel Learning Using Linear Projections
     • The corresponding learned kernel function is $\hat{k}(x, y) = (P\phi(x))^\top (P\phi(y)) = \phi(x)^\top P\,\phi(y)$.
     • The learned kernel matrix can be expressed in terms of the original kernel as $\hat{K} = K - K_c\, S\, K_c^\top$, where $K_c$ is $n \times m$ and each column corresponds to a constraint point in the kernel matrix, and $S$ is the $m \times m$ scaling matrix.
     [Figure: feature space and its projection after kernel learning]
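
A sketch of the projected kernel computed entirely from the kernel matrix, under the assumption that the projection removes the span of the constraint difference vectors, i.e. $\hat{K} = K - KZ(Z^\top K Z)^{+}Z^\top K$ where the columns of $Z$ are $e_j - e_k$ for the similarity pairs; the toy data and constraints are illustrative:

```python
import numpy as np

def project_kernel(K, similarity_pairs):
    """Learned kernel K_hat = Phi^T P Phi, with P projecting out the constraint
    directions phi(x_j) - phi(x_k), computed implicitly from K alone."""
    n = K.shape[0]
    Z = np.zeros((n, len(similarity_pairs)))
    for col, (j, k) in enumerate(similarity_pairs):
        Z[j, col], Z[k, col] = 1.0, -1.0
    KZ = K @ Z                                   # n x m, one column per constraint pair
    S = np.linalg.pinv(Z.T @ K @ Z)              # m x m scaling matrix
    return K - KZ @ S @ KZ.T

# toy usage: force points 0 & 1 and points 2 & 3 to coincide in the projected space
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))
K = np.exp(-((X[:, None] - X[None, :])**2).sum(-1) / 0.5)
K_hat = project_kernel(K, [(0, 1), (2, 3)])
print(K_hat[0, 0] + K_hat[1, 1] - 2 * K_hat[0, 1])   # ~0: the constrained pair collapses
```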

  7. Limitations: Kernel Learning Using Linear Projections
     • No dissimilarity constraints.
     • No relaxation of constraints – prone to overfitting.
     • Sensitive to labeling errors in the training samples.
     [Example – five concentric circles: initial data with constraint points; clustering using clean training data; clustering after adding one mislabeled constraint (black link)]

  8. Bregman Divergences
     • For a strictly convex function $\varphi$, the Bregman divergence between real, symmetric $n \times n$ matrices $X$ and $Y$ is defined as $D_\varphi(X, Y) = \varphi(X) - \varphi(Y) - \operatorname{tr}\!\big(\nabla\varphi(Y)^\top (X - Y)\big)$.
     • Examples of Bregman divergences: the squared Frobenius norm, the K-L divergence, and the squared Euclidean distance.
     • The log det divergence is the Bregman divergence for the convex function $\varphi(X) = -\log\det X$.

  9. Log Det Divergence: Properties
     • $D_{\ell d}(X, Y) = \operatorname{tr}(XY^{-1}) - \log\det(XY^{-1}) - n$ is a nonnegative scalar function, and $D_{\ell d}(X, Y) = 0$ iff $X = Y$.
     • It is not a metric, since it does not satisfy the triangle inequality.
     • Transformation invariance: $D_{\ell d}(M^\top X M,\, M^\top Y M) = D_{\ell d}(X, Y)$ for any $n \times n$ invertible matrix $M$.
     • Defined only for positive semidefinite matrices.
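
A small numerical sketch of the log det divergence and its transformation invariance, on random positive definite matrices; all names and sizes here are illustrative:

```python
import numpy as np

def logdet_div(X, Y):
    """D_ld(X, Y) = tr(X Y^-1) - log det(X Y^-1) - n, for positive definite X, Y."""
    n = X.shape[0]
    XYinv = X @ np.linalg.inv(Y)
    sign, logdet = np.linalg.slogdet(XYinv)
    return np.trace(XYinv) - logdet - n

rng = np.random.default_rng(0)
A, B, M = rng.standard_normal((3, 4, 4))
X = A @ A.T + 4 * np.eye(4)          # positive definite
Y = B @ B.T + 4 * np.eye(4)
print(logdet_div(X, Y) >= 0, np.isclose(logdet_div(X, X), 0.0))      # nonnegativity
print(np.isclose(logdet_div(M.T @ X @ M, M.T @ Y @ M), logdet_div(X, Y)))  # invariance
```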

  10. Kernel Learning Using the Log Det Divergence
     • The kernel matrix is learned using the log det divergence with both similarity ($\mathcal{S}$) and dissimilarity ($\mathcal{D}$) constraints: the learned kernel $\hat{K}$ minimizes $D_{\ell d}(\hat{K}, K)$ subject to the feature-space distances $(e_j - e_k)^\top \hat{K} (e_j - e_k)$ being small for pairs in $\mathcal{S}$ and large for pairs in $\mathcal{D}$.
     • This addresses all the limitations of the linear projections method.

  11. Kernel Learning Using the Log Det Divergence
     • For each constraint, the optimization is solved by Bregman projection based updates of the form $\hat{K} \leftarrow \hat{K} + \beta\, \hat{K} (e_j - e_k)(e_j - e_k)^\top \hat{K}$.
     • The updates are repeated, cycling over the constraints, until convergence.
     • If the initial kernel matrix $K$ has rank $r \le n$, it can be factored as $K = G G^\top$ with $G$ of size $n \times r$, and the update can be rewritten to act on the low-rank factor instead of the full $n \times n$ matrix.
     • The scalar variable $\beta$ is computed in each iteration from the current kernel matrix and the constraint pair.
     • The $n \times r$ factor is updated using the Cholesky decomposition.
     • The final learned kernel matrix is recovered from the learned low-rank factor.
     Jain, P., et al., "Metric and kernel learning using a linear transformation". JMLR, 13:519–547, 2012.
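
A simplified sketch of the Bregman projection update on the full kernel matrix (no slack variables; each constraint's target distance is enforced exactly). The target distances, toy kernel, and fixed number of sweeps are illustrative assumptions; the algorithm of Jain et al. additionally handles slack and works on the low-rank factor as described above:

```python
import numpy as np

def bregman_projection(K_hat, j, k, target):
    """Project K_hat onto {X : (e_j - e_k)^T X (e_j - e_k) = target} under the
    log det divergence: X = K_hat + beta * K_hat v v^T K_hat with v = e_j - e_k."""
    v = np.zeros(K_hat.shape[0]); v[j], v[k] = 1.0, -1.0
    Kv = K_hat @ v
    p = v @ Kv                       # current squared distance of the pair (assumed > 0)
    beta = (target - p) / p**2       # scalar computed from K_hat and the constraint pair
    return K_hat + beta * np.outer(Kv, Kv)

def learn_kernel(K, sim_pairs, dis_pairs, d_sim, d_dis, n_sweeps=50):
    """Cycle over all constraints, applying one Bregman projection per constraint."""
    K_hat = K.copy()
    for _ in range(n_sweeps):
        for (j, k) in sim_pairs:
            K_hat = bregman_projection(K_hat, j, k, d_sim)
        for (j, k) in dis_pairs:
            K_hat = bregman_projection(K_hat, j, k, d_dis)
    return K_hat

# toy usage: tighten pair (0, 1) and separate pair (0, 2) in a random PD kernel
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); K0 = A @ A.T + 5 * np.eye(5)
K_learned = learn_kernel(K0, sim_pairs=[(0, 1)], dis_pairs=[(0, 2)], d_sim=0.1, d_dis=10.0)
```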

  12. Low Rank Kernel Learning Algorithm
     [Algorithm listing: low-rank kernel learning]

  13. Kernel Learning Using the Log Det Divergence
     • For very large datasets, it is infeasible to learn the entire kernel matrix and to store it in memory.
     • The learned kernel therefore has to generalize to out-of-sample points, i.e. pairs $(x, y)$ where $x$ or $y$ or both are out of sample.
     • Distances between such points can still be computed using the learned kernel function, evaluated from the original kernel function and the learned kernel matrix.
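
Since each Bregman update adds a term of the form $\beta\,\hat{K} v v^\top \hat{K}$, the learned kernel can be written as $\hat{K} = K + K S K$ for some matrix $S$, which suggests the out-of-sample extension $\hat{k}(x, y) = k(x, y) + \mathbf{k}_x^\top S\, \mathbf{k}_y$ with $\mathbf{k}_x = [k(x, x_1), \ldots, k(x, x_n)]^\top$; this is the form used in the cited Jain et al. paper. A sketch under that assumption, recovering $S$ from the initial and learned kernel matrices (the Gaussian kernel, bandwidth, and placeholder learned matrix are illustrative):

```python
import numpy as np

def out_of_sample_kernel(kernel_fn, X_train, K, K_hat):
    """Learned kernel function hat_k(x, y) for arbitrary points, assuming the
    learned matrix has the form K_hat = K + K S K (S recovered via pseudo-inverses)."""
    K_pinv = np.linalg.pinv(K)
    S = K_pinv @ (K_hat - K) @ K_pinv
    def hat_k(x, y):
        kx = np.array([kernel_fn(x, xi) for xi in X_train])   # k(x, x_1..x_n)
        ky = np.array([kernel_fn(y, xi) for xi in X_train])
        return kernel_fn(x, y) + kx @ S @ ky
    return hat_k

sigma = 0.5                                                    # illustrative bandwidth
gauss = lambda a, b: np.exp(-np.sum((a - b)**2) / (2 * sigma**2))
rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 2))
K = np.array([[gauss(a, b) for b in X_train] for a in X_train])
K_hat = K.copy()                                   # stand-in for a learned kernel matrix
hat_k = out_of_sample_kernel(gauss, X_train, K, K_hat)
print(hat_k(rng.standard_normal(2), rng.standard_normal(2)))
```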

  14. Semi-Supervised Kernel Mean Shift Clustering
     • Input: unlabeled data, pairwise constraints, and the number of expected clusters.
     • Output: clusters and labels.

  15. Kernel Parameter Selection (σ)
     • A Gaussian kernel function with parameter σ is used to compute the initial kernel matrix.
     • The kernel parameter σ is estimated using the log det divergence, by comparing the initial kernel matrix against the desired distances implied by the constraints.

  16. Low Rank Representation
     • $K_r$ is the low rank approximation of the initial kernel matrix $K$.
     • Kernel learning with the Log Det divergence is then applied to find the learned kernel $\hat{K}$.
     • The mean shift parameters are estimated from the corresponding curves.
     • The trade-off parameter was determined by cross-validation.
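
A sketch of one common way to form a rank-r approximation of a kernel matrix, via its top-r eigenpairs; the rank, data, and bandwidth are illustrative, and the paper's construction may differ (e.g. a sampling-based approximation):

```python
import numpy as np

def low_rank_kernel(K, r):
    """Rank-r approximation K_r = U_r diag(lambda_r) U_r^T and factor G with K_r = G G^T."""
    eigvals, eigvecs = np.linalg.eigh(K)          # ascending order for symmetric K
    idx = np.argsort(eigvals)[::-1][:r]           # keep the r largest eigenvalues
    lam, U = np.clip(eigvals[idx], 0, None), eigvecs[:, idx]
    G = U * np.sqrt(lam)                          # n x r factor
    return G @ G.T, G

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
sq = ((X[:, None] - X[None, :])**2).sum(-1)
K = np.exp(-sq / 0.5)
K_r, G = low_rank_kernel(K, r=20)
print(np.linalg.norm(K - K_r) / np.linalg.norm(K))   # relative approximation error
```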

  17. Experimental Evaluation
     • Two synthetic data sets (nonlinearly separable; can have intersecting boundaries):
       – Olympic circles (5 classes)
       – Concentric circles (10 classes)
     • Four real data sets:
       – Small number of classes: USPS (10 classes), MIT Scene (8 classes)
       – Large number of classes: PIE faces (68 classes), Caltech objects (50 classes)

  18. Comparisons
     1. Efficient and exhaustive constraint propagation for spectral clustering [1].
     2. Semi-supervised kernel k-means [2].
     3. Kernel k-means using Bregman divergences.
     All these methods have to be given the number of clusters as input.
     [1] Zhiwu Lu and Horace H. S. Ip, "Constrained Spectral Clustering via Exhaustive and Efficient Constraint Propagation", ECCV, 1–14, 2010.
     [2] B. Kulis, S. Basu, I. S. Dhillon, and R. J. Mooney, "Semi-supervised graph clustering: A kernel approach", Machine Learning, 74:1–22, 2009.

  19. Evaluation Criterion: Adjusted Rand Index
     • A scalar measure to evaluate clustering performance from the clustering output, based on counting pairs of points: TP – true positive, TN – true negative, FP – false positive, FN – false negative pairs.
     • The Rand index is $(TP + TN)/(TP + TN + FP + FN)$; the adjusted Rand index (ARI) corrects it for chance, so randomly assigned cluster labels get a low score.
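
A minimal usage sketch computing the ARI with scikit-learn; the label arrays are toy data:

```python
from sklearn.metrics import adjusted_rand_score

true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_out = [1, 1, 1, 0, 0, 2, 2, 2, 2]               # clustering output (toy)
print(adjusted_rand_score(true_labels, cluster_out))    # 1.0 = perfect, ~0 = chance
```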

  20. Pairwise Constraint Generation
     • Assume $b$ labeled points are selected at random from each class.
     • All $\binom{b}{2} = b(b-1)/2$ similarity pairs are generated from each class.
     • An equal number of dissimilarity pairs (pairs of points from different classes) are also generated.
     • The value of $b$ is varied.
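
A sketch of this constraint-generation protocol; the seed, label array, and the choice of drawing dissimilarity pairs uniformly across classes are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def generate_constraints(labels, b, seed=0):
    """Pick b labeled points per class; emit all b(b-1)/2 within-class similarity
    pairs per class and an equal total number of across-class dissimilarity pairs."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = {c: rng.choice(np.flatnonzero(labels == c), size=b, replace=False)
              for c in np.unique(labels)}
    sim = [pair for idx in chosen.values() for pair in combinations(idx.tolist(), 2)]
    classes = list(chosen)
    dis = []
    while len(dis) < len(sim):
        c1, c2 = rng.choice(len(classes), size=2, replace=False)
        dis.append((int(rng.choice(chosen[classes[c1]])),
                    int(rng.choice(chosen[classes[c2]]))))
    return sim, dis

labels = np.repeat(np.arange(5), 300)             # e.g. five classes, 300 points each
sim_pairs, dis_pairs = generate_constraints(labels, b=10)
print(len(sim_pairs), len(dis_pairs))             # 5 * 45 = 225 pairs of each type
```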

  21. Synthetic Example 1: Olympic Circles
     • 300 points along each of the five circles.
     • 25 points per class selected at random.
     • Experiment 1: varied the number of labeled points [5, 7, 10, 12, 15, 17, 20, 25] from each class to generate pairwise constraints.
     [Figures: original data and a sample clustering result]

  22. Synthetic Example 1: Olympic Circles
     • Experiment 2:
       – 20 labeled points per class.
       – Labeling errors are introduced by swapping similarity pairs with dissimilarity pairs.
       – The fraction of mislabeled constraints is varied.

  23. Synthetic Example 2: Concentric Circles
     • 100 points along each of the ten concentric circles.
     • Experiment 1: varied the number of labeled points [5, 7, 10, 12, 15, 17, 20, 25] from each class to generate pairwise constraints.
     • Experiment 2: 25 labeled points per class; labeling errors are introduced by swapping similarity pairs with dissimilarity pairs.

  24. Real Example 1: USPS Digits
     • Ten classes with 1100 points per class, 11000 points in total.
     • 100 points per class → a 1000 × 1000 initial kernel matrix K.
     • Varied the number of labeled points [5, 7, 10, 12, 15, 17, 20, 25] from each class to generate pairwise constraints.
     • All 11000 data points are clustered by generalizing to the remaining 10000 points: ARI = 0.7529 ± 0.051.
     [Figure: 11000 × 11000 pairwise distance matrix]

  25. Real Example 2: MIT Scene
     • Eight classes with 2688 points in total; the number of samples per class ranges between 260 and 410.
     • 100 points per class → an 800 × 800 initial kernel matrix K.
     • Varied the number of labeled points [5, 7, 10, 12, 15, 17, 20] from each class to generate pairwise constraints.
     • All 2688 data points are clustered by generalizing to the remaining 1888 points.

  26. Real Example 3: PIE Faces
     • 68 subjects with 21 samples per subject.
     • Full 1428 × 1428 initial kernel matrix K.
     • Varied the number of labeled points [3, 4, 5, 6, 7] from each class to generate pairwise constraints.
     • Obtained perfect clustering for more than 5 labeled points per class.

  27. Real Example 4: Caltech-101 (Subset)
     • 50 categories, with the number of samples per class ranging between 31 and 40.
     • Full 1959 × 1959 initial kernel matrix K.
     • Varied the number of labeled points [5, 7, 10, 12, 15] from each class to generate pairwise constraints.
