Supervising Unsupervised Learning: slides by Vikas K. Garg & Adam Kalai

  1. Supervising Unsupervised Learning
     Vikas K. Garg & Adam Kalai

  2. Clustering problem in isolation vs. clustering repository
     How many clusters?
     [Figure: a single clustering problem next to a repository of problems; surviving labels: 1˚, 14˚, 2˚, 15˚, 65˚, 34˚, 0 db, 3 db]

  3. Contributions
     - Introduce a principled framework to evaluate unsupervised settings
     - Show how to transfer knowledge across heterogeneous datasets
       - different sizes, dimensions, representations, domains...
     - Design provably efficient algorithms
       - select clustering algorithm and number of clusters, determine threshold in single-linkage clustering
       - remove outliers, recycle problems
     - Make good meta-clustering possible
       - introduce meta-scale-invariance property
       - show how to circumvent Kleinberg's impossibility result
     - Automate deep feature learning across very small datasets
       - encode diverse small data effectively into big data
       - perform non-trivial zero-shot learning

  4. General approach
     - Define a meta-distribution µ over all problems in the universe
     - Each training sample is a dataset drawn i.i.d. from µ
     - Learn a mapping from an intrinsic measure to an extrinsic measure
     - Intrinsic measure avoids labels and abstracts away heterogeneity
     - Each test problem is drawn from µ but labels are hidden
     - Compute intrinsic measure on test and predict the extrinsic quality
     - Encode covariance of small datasets for deep zero-shot learning
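The mapping from an intrinsic to an extrinsic measure can be illustrated with a minimal sketch. It assumes a scalar intrinsic score and a linear least-squares fit. Both are assumptions made for illustration, since the slide does not say which regressor is used. The function names and the toy numbers are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y ~ a * x + b for scalar inputs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all intrinsic scores identical: fall back to the mean
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def predict(model, intrinsic):
    """Predicted extrinsic quality for one intrinsic score."""
    a, b = model
    return a * intrinsic + b

# Hypothetical meta-training set: one (intrinsic, extrinsic) pair per labeled
# training problem drawn from the meta-distribution. Values are illustrative.
train_intrinsic = [0.10, 0.30, 0.50, 0.70]
train_extrinsic = [0.05, 0.15, 0.25, 0.35]

model = fit_line(train_intrinsic, train_extrinsic)

# On a test problem the labels are hidden, so only the intrinsic score is known.
predicted_quality = predict(model, 0.40)
```

Because the intrinsic score needs no labels, the same fitted model can be applied to any new problem, whatever its size or dimensionality.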

  5. Number of clusters
     Summary
     - Run k-means algorithm with different k on each train dataset.
     - Use Silhouette Index (SI) as intrinsic measure.
     - Use Adjusted Rand Index (ARI) as extrinsic measure.
     [Figure: Selecting the number of clusters. Average ARI (axis range 0.1 to 0.12) vs. number of training datasets (40 to 300). Curves: Silhouette, Ours.]

  6. Clustering algorithm (assume fixed k for simplicity)
     Summary
     - Run different algorithms to get k clusters & compute SI.
     - Form a feature vector from SI and dataset specific features (e.g. max and min singular values, size, dimensionality).
     - Use Adjusted Rand Index (ARI) as extrinsic measure.
     [Figure: Performance of different algorithms. Adjusted Rand Index (ARI) (axis range 0.02 to 0.13) vs. number of training datasets (40 to 220). Curves: Ours, KMeans, KMeans-N, Ward, Ward-N, Average, Average-N, Complete, Complete-N, Spectral, Spectral-N.]
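A possible shape for the feature vector and the selection step is sketched below. Several things here are assumptions rather than the authors' method: the predictor is a plain k-nearest-neighbor average (swapped in for brevity, since the slide does not name the learner), singular values are omitted to avoid a linear-algebra dependency, features are left unscaled, and every name and number is hypothetical.

```python
from math import dist

def meta_features(points, silhouette):
    """Feature vector for one (dataset, algorithm) run.

    The slide also lists max and min singular values; they are left out
    here so the sketch needs only the standard library.
    """
    return (silhouette, float(len(points)), float(len(points[0])))

def knn_predict(train, query, k=1):
    """Average ARI of the k training runs closest to `query` in feature space."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return sum(ari for _, ari in nearest) / len(nearest)

def choose_algorithm(train_by_algo, query_by_algo, k=1):
    """Pick the algorithm with the highest predicted ARI on an unlabeled dataset."""
    return max(query_by_algo,
               key=lambda algo: knn_predict(train_by_algo[algo], query_by_algo[algo], k))

# Hypothetical training runs: (feature vector, ARI observed on a labeled dataset).
train_by_algo = {
    "kmeans": [((0.2, 4.0, 2.0), 0.10), ((0.6, 4.0, 2.0), 0.20)],
    "ward":   [((0.2, 4.0, 2.0), 0.05), ((0.6, 4.0, 2.0), 0.40)],
}

# Unlabeled test dataset; the SI values stand in for scores from real runs.
test_points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 6.0)]
query_by_algo = {
    "kmeans": meta_features(test_points, 0.55),
    "ward":   meta_features(test_points, 0.50),
}

chosen = choose_algorithm(train_by_algo, query_by_algo)
```

In this toy setup the chosen algorithm is not the one with the highest SI on the test dataset, because its training runs with similar features reached a higher ARI.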

  7. Fraction of outliers
     Summary
     - Remove points with large norms, cluster other points, and compute SI.
     - Put the removed points into clusters, and compute ARI.
     - Find the candidate fraction that performs best on test set.
     - Extensions possible to customize fractions for each test set.
     [Figure: Performance with outlier removal. Average Adjusted Rand Index (axis range 0.1 to 0.13) vs. number of training datasets (0 to 300). Curves: 0%, 1%, 2%, 3%, 4%, 5%.]
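The remove-then-reinsert procedure might look like the sketch below. The slide does not say how removed points are put back into clusters. Assigning each one to the nearest cluster centroid is an assumption, as is taking norms on the raw, uncentered coordinates. The clustering step is replaced by hard-coded labels, and all names are hypothetical.

```python
from math import dist, hypot

def split_by_norm(points, fraction):
    """Return (kept, removed) index lists; `removed` holds the largest-norm points."""
    n_remove = int(len(points) * fraction)
    order = sorted(range(len(points)), key=lambda i: hypot(*points[i]))
    cut = len(points) - n_remove
    return sorted(order[:cut]), sorted(order[cut:])

def centroids(points, labels):
    """Mean point of each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for p, l in zip(points, labels):
        acc = sums.setdefault(l, [0.0] * len(p))
        for j, v in enumerate(p):
            acc[j] += v
        counts[l] = counts.get(l, 0) + 1
    return {l: tuple(v / counts[l] for v in acc) for l, acc in sums.items()}

def reinsert(points, kept, kept_labels, removed):
    """Give each removed point the label of its nearest centroid (an assumption)."""
    cents = centroids([points[i] for i in kept], kept_labels)
    labels = [None] * len(points)
    for i, l in zip(kept, kept_labels):
        labels[i] = l
    for i in removed:
        labels[i] = min(cents, key=lambda l: dist(points[i], cents[l]))
    return labels

points = [(0, 0), (0, 1), (5, 0), (5, 1), (40, 2)]
kept, removed = split_by_norm(points, 0.2)   # drop the top 20% by norm
kept_labels = [0, 0, 1, 1]                   # stand-in for clustering the kept points
full_labels = reinsert(points, kept, kept_labels, removed)
```

SI would be computed on the kept points only, while ARI is computed on `full_labels`, so each candidate fraction yields one (SI, ARI) pair per training dataset.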

  8. Deep learning binary similarity function
     Summary
     - Sample pairs of examples from each small dataset.
     - For each pair, also include covariance features specific to its dataset.
     - Label 1 if the sampled pair comes from same cluster, 0 otherwise.
     - Train a deep net classifier on all the pairs together.
     - Predict whether test pair comes from same cluster or not.
     [Figure: Average binary similarity prediction accuracy. Average accuracy (axis range 0.5 to 0.8) on Internal test (IT) and External test (ET). Series: Ours, Majority.]
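The pair-construction step can be sketched as follows. The exact feature layout is an assumption: each row concatenates the two points with the flattened upper triangle of the dataset's sample covariance matrix, and all datasets are assumed to share one dimensionality. Training the deep net itself is not shown, and the function names are hypothetical.

```python
import random
from itertools import combinations

def covariance_features(points):
    """Flattened upper triangle of the sample covariance matrix of one dataset."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    feats = []
    for a in range(d):
        for b in range(a, d):
            feats.append(sum((p[a] - means[a]) * (p[b] - means[b]) for p in points) / (n - 1))
    return feats

def sample_pairs(points, labels, n_pairs, rng):
    """Build (features, target) rows; target is 1 if both points share a cluster."""
    cov = covariance_features(points)
    all_pairs = list(combinations(range(len(points)), 2))
    rows = []
    for i, j in rng.sample(all_pairs, min(n_pairs, len(all_pairs))):
        features = list(points[i]) + list(points[j]) + cov
        rows.append((features, int(labels[i] == labels[j])))
    return rows

# One hypothetical small dataset with known cluster labels.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
labels = [0, 0, 1, 1]

cov = covariance_features(points)
rows = sample_pairs(points, labels, n_pairs=6, rng=random.Random(0))
```

Rows from many small datasets can then be pooled into one large training set, with the covariance features telling the classifier which dataset each pair came from.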

  9. See you...
     Tue Dec 4th, 05:00 – 07:00 PM, Room 210 & 230 AB, Poster #164
