  1. UNSUPERVISED LEARNING, CLUSTERING

  2. UNSUPERVISED LEARNING
  ▸ Supervised learning: X - y pairs, f(x) function approximation
  ▸ Unsupervised learning: only X, no y
  ▸ Exploring the space of X measurements: understanding the data, identifying populations, problems, outliers (before modelling)
  ▸ Dimension reduction, important when working with high-dimensional data
  ▸ Usually part of exploratory data analysis, which may lead to measuring the "supervising" signal when interesting structure is found in the X data
  ▸ Not a well-defined problem

  3. UNSUPERVISED LEARNING: DATA EXPLORATION, DIMENSIONALITY REDUCTION
  ▸ Large-dimensional datasets (N dimensions often >> N data points): impossible to "visually" find structure, clusters, outliers, batch effects, etc.
  ▸ One way to explore the data is to embed it into a few dimensions (1, 2, 3?) that humans can inspect visually
  ▸ It is very important to know the internal structure of your data!
  ▸ Usually the first step with large-dimensional data is dimensionality reduction (in parallel with opening your data in a spreadsheet and just eyeballing it for a few hours :) )

  4. UNSUPERVISED LEARNING: PCA - PRINCIPAL COMPONENT ANALYSIS
  ▸ PCA is a linear basis transformation from the original basis to a new basis dictated by the variation in the data itself
  ▸ 1st component direction is along the largest variance in the data
  ▸ 2nd component is the orthogonal direction with the largest remaining variance, and so on …
  ▸ Number of components is min(n_features, n_data)
  ▸ The projections of the original data points onto the components give the scores
  ▸ Projected data points (scores) are uncorrelated in PCA space
  ▸ The first components capture the largest variation in the data, the interesting things! We can reveal some structure of the data using only a few dimensions (see the sketch below).
  ▸ (Figure from Shlens: 2-D data in x-y coordinates, with variance σ²_signal along the signal direction and σ²_noise along the noise direction)
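A minimal PCA sketch with scikit-learn (not from the slides; the toy data, seed and variable names are illustrative):

```python
# Minimal PCA sketch: fit, project, and check that the scores are uncorrelated.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 2-D data: most of the variance lies along one "signal" direction.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

pca = PCA()                     # by default keeps min(n_samples, n_features) components
scores = pca.fit_transform(X)   # projections of the data onto the new basis

print(pca.components_)          # rows: principal directions (new basis vectors)
print(pca.explained_variance_)  # variance captured by each component
print(np.corrcoef(scores.T))    # scores are uncorrelated: off-diagonals ~ 0
```

Note that scikit-learn's PCA centres the data internally but does not rescale it; scaling is discussed two slides below.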

  5. UNSUPERVISED LEARNING: PCA - PRINCIPAL COMPONENT ANALYSIS
  ▸ Standard use: 2D plots of the projections (scores)
  ▸ Original base directions may be useful to plot as well
  ▸ Outliers: sometimes components correspond to individual data points, outliers. These should be inspected and removed, and PCA repeated without the outliers (a sketch follows below).
  ▸ (Figures: first vs. second principal component score plot, and a biplot with loading directions for Murder, Assault, Rape and UrbanPop)
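A sketch of the standard 2D score plot plus a crude outlier flag; the planted outlier and the 4-sigma threshold are illustrative assumptions, not a rule from the slides:

```python
# Sketch: plot the first two PC scores and flag extreme points for inspection.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
X[0] += 25.0                       # plant one obvious outlier for the demo

scores = PCA(n_components=2).fit_transform(X)
pc1, pc2 = scores[:, 0], scores[:, 1]

plt.scatter(pc1, pc2, s=10)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()

# Crude flag: points more than 4 standard deviations out on either PC.
outliers = (np.abs(pc1) > 4 * pc1.std()) | (np.abs(pc2) > 4 * pc2.std())
print(np.where(outliers)[0])       # inspect these, drop them, and rerun PCA
```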

  6. UNSUPERVISED LEARNING: PCA - PRINCIPAL COMPONENT ANALYSIS
  ▸ How many components do you need? Look at the proportion of variance explained (and its cumulative sum) per component.
  ▸ Zero mean per dimension is assumed, do it! (fitting an ellipse around the origin)
  ▸ If different quantities are measured, units may not be comparable (number of fingers or height in cm?). In this case, normalise the original dimensions to have variance = 1 (see the sketch below).
  ▸ Only the line of direction is defined: -1 flips might occur!
  ▸ (Figures: scree plots of the proportion and cumulative proportion of variance explained vs. principal component, and biplots of the scaled vs. unscaled data)
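A sketch of choosing the number of components from the proportion of variance explained, standardising first; the iris data is only a stand-in:

```python
# Sketch: standardise features, then inspect per-component and cumulative
# proportions of variance explained to decide how many components to keep.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                         # 4 features on different scales
X_std = StandardScaler().fit_transform(X)    # zero mean, unit variance per dimension

pca = PCA().fit(X_std)
print(pca.explained_variance_ratio_)             # per-component proportion
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative proportion

# The sign of each component is arbitrary: -1 flips can occur between runs or
# libraries, so compare loadings only up to sign.
```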

  7. UNSUPERVISED LEARNING: MORE DIMENSION REDUCTION, EMBEDDING
  ▸ MDS, multidimensional scaling (embed the points in a low dimension, given their measured distances)
  ▸ t-SNE, t-distributed stochastic neighbour embedding (local embedding, usually works best with complex data)
  ▸ UMAP, Uniform Manifold Approximation and Projection (way, way faster than t-SNE)
  ▸ ICA, independent component analysis (PCA: uncorrelated, ICA: independent, e.g. EEG)
  ▸ NMF, non-negative matrix factorisation (e.g. mutations)
  ▸ And more; see http://scikit-learn.org/stable/modules/manifold.html (a short embedding sketch follows below)
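A short embedding sketch, assuming scikit-learn's TSNE and the separate umap-learn package; the digits data and the perplexity value are illustrative:

```python
# Sketch: non-linear 2-D embeddings for visual exploration.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)      # 64-dimensional inputs

# t-SNE: local neighbourhood embedding; perplexity is the main knob to tune.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5)
plt.show()

# UMAP (from the separate umap-learn package) has a very similar interface
# and is typically much faster:
#   import umap
#   emb = umap.UMAP(n_components=2).fit_transform(X)
```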

  8. CLUSTERING
  ▸ Data points can be meaningfully categorised: clusters
  ▸ Classification: we have labels (y) for the groups
  ▸ Clustering: labels are not measured, they are inferred from the (X) data
  ▸ Not a well-defined problem
  ▸ Inferred clusters should be validated (with measurements, new data)

  9. CLUSTERING: K-MEANS CLUSTERING
  ▸ Fix the number of clusters a priori
  ▸ Minimise the sum of intra-cluster distances
  ▸ Algorithm:
  ▸ 1. randomly assign each data point to a cluster
  ▸ 2. calculate the cluster centroids, reassign each data point to the closest centroid, repeat until convergence
  ▸ Distance metric is generally Euclidean
  ▸ A local minimum is found, so repeat multiple times for the best solution and to assess stability (see the sketch below)
  ▸ Possible failure modes: see http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_assumptions.html
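A k-means sketch with scikit-learn; n_init restarts the algorithm from several random initialisations and keeps the solution with the lowest inertia (k and the blob data are illustrative):

```python
# Sketch: k-means with multiple random restarts to reduce the risk of a bad local minimum.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=20, random_state=0).fit(X)
print(km.cluster_centers_)   # centroids
print(km.inertia_)           # sum of squared distances to the nearest centroid
labels = km.labels_          # cluster assignment per data point
```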

  10. CLUSTERING: HIERARCHICAL CLUSTERING
  ▸ Number of clusters is not fixed
  ▸ Iteratively agglomerate clusters, starting from individual observations
  ▸ Algorithm:
  ▸ 1. assign each data point to its own cluster
  ▸ 2. join the two closest clusters, repeat
  ▸ The cluster distance metric (linkage) is super important: single (smallest pairwise distance), average, complete (maximal pairwise distance)
  ▸ The result is not a clustering, it is a dendrogram. A horizontal cut defines a clustering. Where to cut? Well. (See the sketch below.)
  ▸ (Figure: the same data clustered with average, complete and single linkage)
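A linkage/dendrogram sketch using SciPy; the choice of average linkage and the cut at 3 clusters are illustrative:

```python
# Sketch: agglomerative clustering and a dendrogram; the linkage method
# ("single", "average", "complete", ...) strongly affects the result.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

Z = linkage(X, method="average")      # pairwise Euclidean distances by default
dendrogram(Z)
plt.show()

# A horizontal cut of the dendrogram defines an actual clustering:
labels = fcluster(Z, t=3, criterion="maxclust")   # cut so that 3 clusters remain
```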

  11. CLUSTERING: MORE CLUSTERING
  ▸ DBSCAN: density thresholds define the clusters
  ▸ Spectral clustering: uses the eigenvectors of the pairwise similarity (affinity) matrix
  ▸ Gaussian mixture models
  ▸ And more; see http://scikit-learn.org/stable/modules/clustering.html (a short sketch follows below)

  12. SEMI-SUPERVISED LEARNING
  ▸ A few data points have labels, most others do not
  ▸ Exploit the data structure of the unlabelled examples for more effective supervised learning
  ▸ Use unsupervised learning to explore the data structure (clusters), then use the few labelled points to assign labels to the clusters (see the sketch below)
  ▸ Hot topic, as data labelling is often much more expensive than unlabelled data collection
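A sketch of the cluster-then-label idea described above; the dataset, k, and the 10 labelled points are illustrative assumptions:

```python
# Sketch: cluster all the data, then name each cluster using the few labelled points.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
labelled_idx = np.arange(10)             # pretend only 10 labels were measured

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Assign each cluster the majority label among its labelled members.
y_pred = np.empty(len(X), dtype=int)
for c in np.unique(clusters):
    members = labelled_idx[clusters[labelled_idx] == c]
    if len(members) == 0:                # no labelled point fell in this cluster
        y_pred[clusters == c] = -1       # leave it unlabelled
    else:
        y_pred[clusters == c] = np.bincount(y_true[members]).argmax()

print((y_pred == y_true).mean())         # agreement with the held-back labels
```

scikit-learn also ships dedicated semi-supervised estimators (e.g. LabelPropagation and LabelSpreading) that propagate the few known labels through a similarity graph instead of clustering first.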

  13. SELF-SUPERVISED LEARNING
  ▸ Unsupervised learning where a part of the data is predicted from another part of the data
  ▸ Examples explain it best: future video frame prediction, grayscale image colorisation, inpainting, jigsaw puzzle solving, motion direction prediction, etc.
  ▸ Orders of magnitude more unsupervised data can be collected (images, videos)
  ▸ Human visual learning is supposedly unsupervised (maybe it is self-supervised)
  ▸ (Image examples from Lotter et al., Zhang et al., Noroozi and Favaro, Walker et al.)

  14. REFERENCES
  ▸ ISLR, chapter 10.
  ▸ ESL, chapter 14.
  ▸ http://scikit-learn.org/stable/modules/decomposition.html#decompositions
  ▸ http://scikit-learn.org/stable/modules/manifold.html
  ▸ http://scikit-learn.org/stable/modules/clustering.html#clustering
  ▸ https://umap-learn.readthedocs.io/en/latest/
  ▸ Shlens, J., 2014. A Tutorial on Principal Component Analysis. arXiv:1404.1100 [cs, stat].
  ▸ Walker, J., Gupta, A., Hebert, M., 2015. Dense Optical Flow Prediction from a Static Image. arXiv:1505.00295 [cs].
  ▸ Lotter, W., Kreiman, G., Cox, D., 2016. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning. arXiv:1605.08104 [cs, q-bio].
  ▸ Zhang, R., Isola, P., Efros, A.A., 2016. Colorful Image Colorization. arXiv:1603.08511 [cs].
  ▸ Noroozi, M., Favaro, P., 2016. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. arXiv:1603.09246 [cs].
