Large-Scale Clustering through Functional Embedding

Large-Scale Clustering through Functional Embedding - PowerPoint PPT Presentation



  1. Large-Scale Clustering through Functional Embedding.
     Frédéric Ratle ∗, Jason Weston †, Matthew L. Miller †
     ∗ IGAR, University of Lausanne, Switzerland; † NEC Labs America, Princeton, NJ, USA.
     ECML PKDD 2008.

  2. Large-Scale Clustering: a new way of performing data clustering.
     • Dimensionality reduction with direct optimization over discrete labels.
     • Joint optimization of embedding and clustering → improved results.
     • Training by stochastic gradient descent → fast and scalable.
     • Implementation within a neural network → no out-of-sample problem.

  3. Clustering - the usual way.
     Popular clustering algorithms such as spectral clustering are based on a two-stage approach (see the sketch below):
     1 Find a “good” embedding.
     2 Perform k-means (or a similar variant).
     Also:
     • K-means in feature space (e.g. Dhillon et al., 2004)
     • Margin-based clustering (e.g. Ben-Hur et al., 2001)
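     As a point of reference, here is a minimal Python sketch of the two-stage pipeline, using a normalized-Laplacian embedding followed by k-means; the function name, the RBF affinity, and the sigma parameter are illustrative choices, not taken from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def two_stage_spectral_clustering(X, n_clusters, sigma=1.0):
    # Stage 1: find a "good" embedding from the normalized graph Laplacian.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))           # RBF affinity matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L_sym = np.eye(len(X)) - d_inv_sqrt @ W @ d_inv_sqrt
    _, eigvecs = np.linalg.eigh(L_sym)                 # ascending eigenvalues
    embedding = eigvecs[:, 1:n_clusters + 1]           # drop the trivial eigenvector

    # Stage 2: run k-means on the embedded points.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```

     Note that this requires an eigendecomposition of an n × n matrix, which is exactly the scalability bottleneck the talk goes on to address.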

  4. Embedding Algorithms.
     Many existing embedding algorithms optimize
         $\min_{f} \sum_{i,j=1}^{U} L(f(x_i), f(x_j), W_{ij})$, with $f_i \in \mathbb{R}^d$.
     • MDS: minimize $(\|f_i - f_j\| - W_{ij})^2$.
     • ISOMAP: same, but $W_{ij}$ is defined by the shortest path on a neighborhood graph.
     • Laplacian Eigenmaps: minimize $\sum_{ij} W_{ij} \|f_i - f_j\|^2$ subject to the “balancing constraints” $f^\top D f = I$ and $f^\top D 1 = 0$.
     • Spectral clustering → add k-means on top.
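     These per-pair objectives translate directly into code. A minimal sketch, with illustrative function names:

```python
import numpy as np

def mds_loss(f_i, f_j, W_ij):
    # MDS: embedded distances should reproduce the given dissimilarities W_ij.
    return (np.linalg.norm(f_i - f_j) - W_ij) ** 2

def laplacian_eigenmaps_loss(f_i, f_j, W_ij):
    # Laplacian Eigenmaps: pull neighbors (large W_ij) close together. The
    # balancing constraints f'Df = I and f'D1 = 0 are enforced separately
    # by the eigensolver, not by this per-pair term.
    return W_ij * np.linalg.norm(f_i - f_j) ** 2
```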

  5. Siamese Networks: functional embedding.
     Equivalent to Laplacian Eigenmaps, but $f(x)$ is a neural network.
     DrLIM [Hadsell et al., 2006]:
         $L(f_i, f_j, W_{ij}) = \begin{cases} \|f_i - f_j\|^2 & \text{if } W_{ij} = 1, \\ \max(0, m - \|f_i - f_j\|)^2 & \text{if } W_{ij} = 0. \end{cases}$
     → neighbors are pulled close, others kept at a distance of at least $m$.
     • Balancing is handled by the $W_{ij} = 0$ case → easy optimization.
     • $f(x)$ is not just a lookup table → control capacity, add prior knowledge, no out-of-sample problem.
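     The DrLIM contrastive loss is easy to state in code. A sketch (the margin m = 1.0 is an illustrative default, not a value from the paper):

```python
import numpy as np

def drlim_loss(f_i, f_j, W_ij, m=1.0):
    # Neighbors (W_ij = 1) are pulled together; non-neighbors (W_ij = 0)
    # are pushed out to at least the margin m, beyond which they contribute
    # no loss -- this is why balancing comes for free.
    d = np.linalg.norm(f_i - f_j)
    return d ** 2 if W_ij == 1 else max(0.0, m - d) ** 2
```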

  6. NCut Embedding.
     • Many approaches exist to learn manifolds with functional models.
     • We wish to learn the clustering task directly.
     • The main idea is to train a classifier $f(x)$ to:
       • classify neighbors together;
       • classify non-neighbors apart.
     [Figure: “current” and “updated” states of the classifier]

  7. Functional Embedding for Clustering.
     We use a general objective of this type (sketched below):
         $\sum_{ij} L(f_i, f_j, W_{ij}) = \sum_{ij} \sum_{c} H(f(x_i), c)\, Y_c(f(x_i), f(x_j), W_{ij})$
     where $H(\cdot)$ is a classification-based loss function such as the hinge loss:
         $H(f(x), y) = \max(0, 1 - y f(x))$.
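     In code, the per-pair objective is a Y_c-weighted sum of hinge losses. A minimal sketch for the 2-class case, where f(x) is a scalar output and Y is passed in as a function (names are illustrative):

```python
def hinge(fx, y):
    # H(f(x), y) = max(0, 1 - y * f(x)), the standard hinge loss.
    return max(0.0, 1.0 - y * fx)

def pair_loss(f_i, f_j, W_ij, Y, classes=(-1, +1)):
    # Sum over candidate cluster labels c; Y supplies the weight for
    # assigning point i to cluster c given the pair relation W_ij.
    return sum(hinge(f_i, c) * Y(f_i, f_j, W_ij, c) for c in classes)
```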

  8. 2-class clustering.
     $Y_c(f(x_i), f(x_j), W_{ij})$ encodes the weight assigned to point $i$ being in cluster $c$. It can be expressed as follows:
         $Y_c(f_i, f_j, W_{ij}) = \begin{cases} \eta^{(+)} & \text{if } \mathrm{sign}(f_i + f_j) = c \text{ and } W_{ij} = 1, \\ -\eta^{(-)} & \text{if } \mathrm{sign}(f_j) = c \text{ and } W_{ij} = 0, \\ 0 & \text{otherwise.} \end{cases}$
     Optimization by stochastic gradient descent:
         $w_{t+1} \leftarrow w_t + \nabla L(f_i, f_j, 1)$.
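     The Y_c table above maps directly to a small function. A sketch, with eta_pos and eta_neg as illustrative constants standing in for η(+) and η(−):

```python
import numpy as np

def Y(f_i, f_j, W_ij, c, eta_pos=0.1, eta_neg=0.1):
    if W_ij == 1 and np.sign(f_i + f_j) == c:
        return eta_pos        # neighbors: reinforce their joint prediction c
    if W_ij == 0 and np.sign(f_j) == c:
        return -eta_neg       # non-neighbors: push i away from j's cluster
    return 0.0
```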

  9. NCut Embedding Algorithm.
     Input: unlabeled data $x^*_i$ and matrix $W$.
     repeat
         Pick a random pair of neighbors $x^*_i$, $x^*_j$.
         Select the class $c_i = \mathrm{sign}(f_i + f_j)$.
         if BalancingConstraint($c_i$) then
             gradient step for $L(x^*_i, x^*_j, 1)$
         end if
         Pick a random pair $x^*_i$, $x^*_k$.
         Gradient step for $L(x^*_i, x^*_k, 0)$.
     until stopping criterion
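     A runnable sketch of the whole loop, using a linear model f(x) = w·x in place of the paper's neural network, the hard balancing constraint from the next slide, and illustrative hyperparameters throughout:

```python
import numpy as np

def ncut_embed(X, neighbor_pairs, n_steps=10000, eta=0.01, N=100, xi=5):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    recent = []                                   # the N last predicted labels

    def f(x):                                     # linear stand-in "network"
        return w @ x

    for _ in range(n_steps):
        # Neighbor pair (W_ij = 1): push both points toward their joint class c.
        i, j = neighbor_pairs[rng.integers(len(neighbor_pairs))]
        c = np.sign(f(X[i]) + f(X[j])) or 1.0     # break ties toward +1
        # "Hard" balancing constraint: skip over-represented classes.
        if recent.count(c) <= N / 2 + xi:
            for idx in (i, j):
                if 1.0 - c * f(X[idx]) > 0:       # hinge loss is active
                    w += eta * c * X[idx]         # gradient step toward label c
            recent = (recent + [c])[-N:]

        # Random pair (treated as W_ik = 0): push x_i away from x_k's class.
        i, k = rng.integers(len(X), size=2)
        c = np.sign(f(X[k])) or 1.0
        if 1.0 + c * f(X[i]) > 0:                 # hinge for the opposite label -c
            w -= eta * c * X[i]

    return np.sign(X @ w)                         # final cluster assignments
```

     Each step touches only one or two points, so the cost per update is independent of the dataset size; this is what makes the method scale.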

  10. Balancing constraint - 2-class case.
      Balancing constraints prevent the solution from getting trapped in a degenerate, single-cluster assignment. Many possibilities (sketched below):
      1 “Hard” constraint
        • Keep a list of the $N$ last predictions in memory.
        • Ignore examples of class $c_i$ if $\mathrm{seen}(c_i) > N/2 + \xi$.
      2 “Soft” constraint
        • Weight the learning rate for each class: $\eta = \eta_0 / \mathrm{seen}(c_i)$.
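      The soft variant can be packaged as a small helper that remembers recent predictions. Note the reading of the slide as η = η0 / seen(c_i) is an assumption (the extraction lost the fraction), and eta0/window are illustrative:

```python
from collections import deque

class SoftBalancer:
    def __init__(self, eta0=0.1, window=100):
        self.eta0 = eta0
        self.recent = deque(maxlen=window)   # the N last predicted classes

    def rate(self, c):
        # Learning rate shrinks with how often class c was recently predicted.
        self.recent.append(c)
        return self.eta0 / self.recent.count(c)
```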

  11. Multiclass algorithm.
      Two different flavours, MAX and ALL (sketched below):
      1 MAX approach: select the class $c_i = \arg\max_c \max(f_c(x_i), f_c(x_j))$.
      2 ALL approach: one learning rate per class,
            $Y_c(f_i, f_j, W_{ij}) = \begin{cases} \eta_c & \text{if } W_{ij} = 1, \\ 0 & \text{otherwise,} \end{cases}$
        where $\eta_c \leftarrow \eta^{(+)} f_c(x_i)$.
      We use balancing constraints similar to those for 2-class clustering.
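      Both flavours are one-liners over the class-score vectors f(x_i), f(x_j) of length n_classes. A sketch with illustrative names and constants:

```python
import numpy as np

def select_class_max(f_i, f_j):
    # MAX approach: the class with the largest output over the two neighbors.
    return int(np.argmax(np.maximum(f_i, f_j)))

def Y_all(f_i, W_ij, c, eta_pos=0.1):
    # ALL approach: one learning rate per class, eta_c = eta^(+) * f_c(x_i);
    # the weight is zero for non-neighbor pairs.
    return eta_pos * f_i[c] if W_ij == 1 else 0.0
```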

  12. Small-scale datasets.

      data set   classes   dims   points
      g50c       2         50     550
      text       2         7511   1946
      bcw        2         9      569
      ellips     4         50     1064
      glass      6         10     214
      usps       10        256    2007

      Table: Small-scale datasets used throughout the experiments.

  13. 2-class experiments.
      Clustering error:

      method         bcw    g50c   text
      k-means        3.89   4.64   7.26
      spectral-rbf   6.73   3.94   5.56
      spectral-knn   3.60   6.02   12.9
      NCutEmb h      3.63   4.59   7.03
      NCutEmb s      3.15   4.41   7.89

      Out-of-sample error:

      method         bcw    g50c   text
      k-means        6.06   4.22   8.75
      NCutEmb h      3.21   6.06   7.68
      NCutEmb s      7.38   3.64   6.36

  14. Multiclass experiments.
      Clustering error:

      method         ellips   glass   usps
      k-means        20.29    25.71   30.34
      spectral-rbf   10.16    39.30   32.93
      spectral-knn   2.51     40.64   33.82
      NCutEmb max    4.76     24.58   19.36
      NCutEmb all    2.75     24.91   19.05

      Out-of-sample error:

      method         ellips   glass   usps
      k-means        20.85    28.52   29.44
      NCutEmb max    5.11     25.16   20.80
      NCutEmb all    2.88     24.96   17.31

  15. MNIST experiments.
      [Figure slide]

  16. Clustering MNIST.

      # clusters   method        train   test
      50           k-means       18.46   17.70
      50           NCutEmb max   13.82   14.23
      50           NCutEmb all   18.67   18.37
      20           k-means       29.00   28.03
      20           NCutEmb max   20.12   23.43
      20           NCutEmb all   17.64   21.90
      10           k-means       40.98   39.89
      10           NCutEmb max   21.93   24.37
      10           NCutEmb all   24.10   24.90

      Table: Clustering the MNIST database (60k train, 10k test). A one-hidden-layer network was used.

  17. Training on Pairs?
      Where do the neighbor pairs come from (see the sketch below)?
      • k-NN
        • OK for small datasets.
        • Very slow otherwise, but many methods exist to speed it up.
      • Sequences
        • Video: frames $t$ and $t+1$ → same label.
        • Audio: consecutive audio frames → same speaker.
        • Text: two words close together in the text → same topic.
        • Web: link information.
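      Two of these pair sources take only a few lines each. A sketch (the brute-force k-NN shown here is the slow small-dataset version the slide mentions; k is illustrative):

```python
import numpy as np

def knn_pairs(X, k=10):
    # Brute-force k-NN graph: O(n^2) distances, fine only for small datasets.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(X)) for j in nn[i]]

def sequence_pairs(n_frames):
    # Sequences: consecutive video/audio frames are taken to share a label.
    return [(t, t + 1) for t in range(n_frames - 1)]
```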

  18. Summary.
      • The joint optimization of clustering and embedding provides results that are better than, or at least comparable to, existing clustering methods.
      • Functional embedding allows fast training and avoids the out-of-sample problem.
      • Neural nets provide a scalable and flexible framework for clustering.
