Clustering by contrast


  1. Clustering by contrast. Cyril CHHUN, Télécom Paris. Advisor: Jean-Louis DESSALLES. June 20, 2019

  2. Outline 1 Introduction 2 The algorithm 3 Test results 4 Conclusion

  3. Introduction: What are the end goals of contrast learning? Design a clustering algorithm able to:
     • understand the meaning of “small bacteria” and “small galaxy” without going through the set of “small” objects
     • learn from a single example: a “Siamese cat”
     • detect anomalies: a talking cat
     • produce relevant descriptions: “it’s a singer who has ten million views on Youtube”
     • produce negations and explanations: “she is not a writer”

  4. Impossibility theorem: Which properties would we expect of a clustering algorithm?
     • Scale-invariance, richness, consistency
     • Kleinberg (2002) [1]: it is impossible to design a distance function-based clustering algorithm which verifies those three properties
     • Solution: forsake one of those properties or use non-metric functions
     [1] Jon Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems 15, 2002.

  5. Vocabulary
     • Object: observed instance
     • Prototype: mental representation of a group as a basic object
     • Contrast: “difference” between two objects
     • Weight: number of times a prototype has been recalled
     • Deviation: acceptable range of an object’s properties compared to its prototype
     • Order: real-life observations are first-order objects, contrasts are second-order objects, etc.

  6. Design: Prototypes. How to represent prototypes? A prototype is a weight w together with one (mean, deviation) pair per dimension:
     ( µ_1  σ_1 )
     (  ⋮    ⋮  )  ,  w
     ( µ_m  σ_m )
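     A minimal sketch of that representation in Python (the class and field names are illustrative, not taken from the slides; later sketches below reuse this class and the numpy import):

         from dataclasses import dataclass
         import numpy as np

         @dataclass
         class Prototype:
             mean: np.ndarray        # µ, one value per dimension
             deviation: np.ndarray   # σ, acceptable range around the mean, per dimension
             weight: int = 1         # number of times the prototype has been recalled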

  7. Design: Finding the clusters. Given object b, how to find the best prototype a of deviation a′?
     • The prametric function
       d(a, b) = Σ_{j=1}^{m} 1( |a_j − b_j| / a′_j > θ )
       verifies scale-invariance along any axis.
     • It is not a distance, as none of the three properties are verified!
     • Dimension-agnostic, scale-invariant, not density-based.
     • Problem: many prototypes can verify the smallest distance.
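     A sketch of that prametric, reusing the Prototype class above (the default threshold value is an assumption):

         def prametric(proto, obj, theta=1.0):
             # Count the dimensions where the object falls outside θ times the prototype's deviation
             outside = np.abs(proto.mean - obj) / proto.deviation > theta
             return int(outside.sum())

     A count of 0 means the object fits the prototype on every dimension.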

  8. Design: Comparing the clusters
     Figure: The smaller cluster seems more reasonable; how to avoid the hub?
     • We simply take the best prototypes two by two and choose the one whose mean is closer to the object along the most dimensions. The other cluster is eliminated.
     • Using this rule, we make a tournament and pick the winner.
     • Deviations are not used in this step so as to avoid hubs.
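     One possible reading of that tournament rule, again reusing the earlier sketches (breaking ties in favour of the current winner is an assumption):

         def battle(a, b, obj):
             # Keep the prototype whose mean is closer to the object along the most dimensions
             a_closer = int(np.sum(np.abs(a.mean - obj) < np.abs(b.mean - obj)))
             b_closer = int(np.sum(np.abs(b.mean - obj) < np.abs(a.mean - obj)))
             return a if a_closer >= b_closer else b

         def tournament(candidates, obj):
             # Pairwise battles; the loser of each battle is eliminated
             winner = candidates[0]
             for challenger in candidates[1:]:
                 winner = battle(winner, challenger, obj)
             return winner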

  9. Design: Updating the memory. How to store the new information in the memory?
     • The object is added as a prototype no matter what, with a deviation equal to ε times itself and a weight of 1.
     • The winning cluster is updated as follows:
       mean = (weight · prototype + object) / (weight + 1)
       deviation = (weight · deviation + |prototype − object|) / (weight + 1)
       weight = weight + 1
     • We enforce a limited memory to cope with initial errors and improve efficiency. Unused prototypes are forgotten first.
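     A sketch of those update rules; the memory is passed explicitly here, and the value of ε, the memory size, and reading "unused" as "least recalled" are all assumptions:

         def update_memory(memory, obj, winner, epsilon=0.1, mem_size=50):
             # Running-average updates from the slide; both rules use the old mean
             w = winner.weight
             new_mean = (w * winner.mean + obj) / (w + 1)
             winner.deviation = (w * winner.deviation + np.abs(winner.mean - obj)) / (w + 1)
             winner.mean = new_mean
             winner.weight = w + 1
             # The object is always stored as a fresh prototype (deviation ε times the object, weight 1)
             memory.append(Prototype(mean=obj.copy(), deviation=epsilon * np.abs(obj), weight=1))
             # Limited memory: forget the least-recalled prototype first
             if len(memory) > mem_size:
                 memory.sort(key=lambda p: p.weight, reverse=True)
                 memory.pop()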

  10. Design: Skeleton
      def feed_data_online(data):
          for obj in data:
              closest_clusters = find_closest_clusters(obj)
              winner = cluster_battles(obj, closest_clusters)
              update_memory(obj, winner)
      • Clustering: simple loop with complexity O(mem_size × n) → online learning
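      The two helpers are not spelled out on the slide; a hedged sketch in terms of the earlier functions (the memory is passed explicitly here, whereas the skeleton keeps it implicit):

          def find_closest_clusters(memory, obj, theta=1.0):
              # All prototypes tied for the smallest prametric value
              best = min(prametric(p, obj, theta) for p in memory)
              return [p for p in memory if prametric(p, obj, theta) == best]

          def cluster_battles(obj, candidates):
              # Resolve ties with the tournament from the previous sketch
              return tournament(candidates, obj)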

  11. Understanding results
      • Softer clustering than k-means; different ways to classify when seeing a new object:
        – Find the closest prototype to the object (by tournament, for example)
        – Assign object b to prototype a if d(a, b) = 0
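      A sketch of the second option, with a tournament fallback when no prototype matches exactly (the fallback behaviour is an assumption):

          def classify(memory, obj, theta=1.0):
              # Soft assignment: every prototype at prametric value 0 accepts the object
              matches = [p for p in memory if prametric(p, obj, theta) == 0]
              return matches if matches else [tournament(memory, obj)]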

  12. Live demonstration

  13. What about contrasts? How to extract relevant contrasts?
      • The contrast features should be meaningful, i.e. low-dimensional and applicable between similar objects.
      • Given an object b and its closest prototype a, we extract the contrast c such that
        c_j = (a_j − b_j) · 1( |a_j − b_j| / a′_j > θ )
      • Example: seeing a black tomato would give a “red-to-black” contrast.
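      The extraction in Python, reusing the prametric's indicator (names are illustrative):

          def extract_contrast(proto, obj, theta=1.0):
              # Keep (a_j - b_j) only on the dimensions where the object leaves the prototype's deviation
              diff = proto.mean - obj
              salient = np.abs(diff) / proto.deviation > theta
              return diff * salient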

  14. What about contrasts? How to store the contrasts in memory?
      • We can use the same principle! Contrast-prototypes with mean, deviation and weight.
      Then, how to refine the contrasts?
      • We can use the same procedure!

  15. Second demonstration

  16. Feedback on the checklist
      ✓ understand the meaning of “small bacteria” and “small galaxy” without going through the set of “small” objects
      ✓ learn from a single example: a “Siamese cat”
      ✓ detect anomalies: a talking cat
      ✗ produce relevant descriptions: “it’s a singer who has ten million views on Youtube”
      ✗ produce negations and explanations: “she is not a writer”

  17. Conclusion
      • The algorithm is dimension-agnostic and verifies scale-invariance
      • It learns on-the-fly and has a reasonable complexity (linear on average)
      • Designed to be used on relatively high-level datasets
      • Contrasts still need testing: some inconsistent results can appear
