Clustering Lesson 3: Lab Session
Advanced Machine Learning, CentraleSupelec
Teacher's Assistant: Omar CHEHAB
Professors: Emilie CHOUZENOUX, Frederic PASCAL
General Information
• Assignment: alone or in pairs, you will code the algorithms you learnt in 'scikit-learn formalism', and apply them to images and text.
• Due: the 5 lab assignments for lessons 3-7 are due a week from when they are given, at aml.centralesupelec.2020@gmail.com
• Grading: each assignment is worth 4 points; your 4 best labs out of the 5 will be retained and will count for half of your final grade.
• Questions: questions or feedback are welcome after class or by email at l-emir-omar.chehab@inria.fr
Lesson: recap

• K-Means: partitional; n_clusters is hardcoded; not robust to noise. Clusters are sets of points (a location and an assignment) that are near. Objective: the within-cluster variance, $\min_{\delta_{ik},\, c_k} \sum_{k=1}^{K} \sum_{i=1}^{m} \delta_{ik}\, \lVert x_i - c_k \rVert^2$. Algorithm: alternately assign points to clusters, then recompute each cluster center as the mean of its points.

• Agglomerative Single-Linkage: hierarchical (bottom-up: merge); n_clusters given by a 'cutoff' ε. Clusters are points that are nearest. Algorithm: sequentially compute the distance between clusters (e.g. the min over point pairs) and merge the two nearest clusters, until you end up with a single cluster.

• DBSCAN: partitional; robust to outliers and noise; n_clusters given by a 'cutoff' ε and minPts. Clusters are points that are nearest and in dense regions. Algorithm: identify core points as those having at least minPts points in their ε-neighborhood; their connected components on the ε-neighbor graph make the clusters; non-core points either join an ε-nearby cluster, else are noise.

• HDBSCAN: hierarchical (top-down: split); robust to noise; n_clusters given by a 'cutoff' ε and minPts, with ε tuned per cluster (step 5 below). Clusters are points that are nearest, in dense regions, and not easily split. Algorithm:
1. Build the complete graph weighted by a specific metric that penalizes sparsity*.
2. Extract its minimum spanning tree.
3. Construct a cluster hierarchy of connected components by removing the heaviest edges.
4. Condense the cluster hierarchy based on a min. cluster size before merge (less is noise).
5. Extract the clusters with long antecedence (robust to the cutoff) in the condensed tree: this tunes ε for each cluster.

*for two 'close' points, clamp their distance to that of the farthest of their minPts nearest neighbors.

Minimal usage sketches for each of these algorithms follow below.
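First, K-Means: a minimal sketch of the scikit-learn call, assuming toy Gaussian data and an arbitrary n_clusters=3 (both are illustrative, not part of the assignment). The inertia_ attribute is the within-cluster variance from the recap.

```python
# Minimal K-Means sketch (toy data and hyperparameter values are illustrative assumptions).
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).randn(100, 2)          # toy data, assumed
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

labels = km.labels_                # the assignments delta_ik
centers = km.cluster_centers_      # the centers c_k
print(km.inertia_)                 # within-cluster variance: sum_k sum_i delta_ik ||x_i - c_k||^2
```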
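Agglomerative Single-Linkage is the one algorithm the assignment asks you to code yourself. As a reference for checking your own output, here is a sketch using scipy (an assumption on my part; the lab may expect a different interface), cut at a distance 'cutoff' ε.

```python
# Reference sketch only: single-linkage via scipy, cut at eps; your own lab code
# should reproduce this behaviour.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.RandomState(0).randn(50, 2)        # toy data, assumed
Z = linkage(X, method='single')                  # sequentially merge the two nearest clusters
eps = 0.5                                        # 'cutoff' that determines n_clusters
labels = fcluster(Z, t=eps, criterion='distance')
```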
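For DBSCAN, a minimal sketch of the scikit-learn call; eps and min_samples (the recap's ε and minPts) take illustrative values here.

```python
# Minimal DBSCAN sketch; label -1 marks noise, matching the recap's outlier handling.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.RandomState(0).randn(100, 2)       # toy data, assumed
db = DBSCAN(eps=0.5, min_samples=5).fit(X)       # min_samples is the recap's minPts

labels = db.labels_                              # -1 = noise, else cluster index
core = np.zeros_like(labels, dtype=bool)
core[db.core_sample_indices_] = True             # core points: >= minPts in their eps-neighborhood
```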
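For HDBSCAN, a sketch assuming scikit-learn >= 1.3, which ships sklearn.cluster.HDBSCAN; the course-era alternative is the API-compatible scikit-learn-contrib hdbscan package. Note there is no eps argument: per step 5 of the recap, a cutoff is effectively tuned per cluster.

```python
# Minimal HDBSCAN sketch (assumes scikit-learn >= 1.3; otherwise use `import hdbscan`).
import numpy as np
from sklearn.cluster import HDBSCAN

X = np.random.RandomState(0).randn(200, 2)       # toy data, assumed
hdb = HDBSCAN(min_cluster_size=5).fit(X)         # min. cluster size from step 4 of the recap
labels = hdb.labels_                             # -1 = noise
```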
From a modelling standpoint

[Diagram: a hierarchical clustering yields a whole 'family' of partitions; a partitional clustering is one 'cut' of it at an inter-cluster level.]

A partitional clustering can sometimes be framed as the 'cutoff' of a hierarchical clustering, i.e. as one instance of a relaxed problem in which it is embedded. For example, DBSCAN (partitional) can be understood as the ε-'cut' of HDBSCAN (hierarchical, top-down) without steps 4 and 5, or of Agglomerative Single-Linkage (hierarchical, bottom-up) where the space is transformed so that sparse points ('not having a core-point ε-neighbor') are farther away*. A numerical sketch of this equivalence follows below.

*transforming the space in this way is equivalent to keeping the original space but replacing the metric by that of Step 1 of HDBSCAN.
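The footnoted equivalence can be checked numerically. The sketch below (hedged: the data and values are illustrative, and border-point assignment may still differ between the two methods) builds the Step-1 metric of HDBSCAN, the mutual reachability distance, then cuts a single-linkage hierarchy on it at ε; the resulting components coincide with DBSCAN's core-point clusters, with noise points appearing as singletons.

```python
# Sketch: DBSCAN's core clusters as a single-linkage cut on the mutual reachability metric.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 5])   # toy two-blob data, assumed
eps, min_pts = 1.0, 5                                     # illustrative values

# Core distance: distance to the min_pts-th nearest neighbor (including the point itself,
# matching scikit-learn's min_samples convention).
nn = NearestNeighbors(n_neighbors=min_pts).fit(X)
core_dist = nn.kneighbors(X)[0][:, -1]

# Mutual reachability (Step 1 of HDBSCAN): d(a, b) clamped below by both core distances.
D = squareform(pdist(X))
mreach = np.maximum(D, np.maximum(core_dist[:, None], core_dist[None, :]))

# Single-linkage on the transformed metric, cut at eps: sparse points end up as singletons,
# i.e. DBSCAN's noise; dense components match DBSCAN's core-point clusters.
Z = linkage(squareform(mreach, checks=False), method='single')
labels_sl = fcluster(Z, t=eps, criterion='distance')
```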
Assignment: plan
1. K-Means (scikit-learn)
2. Agglomerative Single-Linkage (your own code)
3. DBSCAN (scikit-learn)
4. HDBSCAN (scikit-learn)
5. Applications: clustering observations on Mars and color-reduction (scikit-learn)
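Since the assignment asks for 'scikit-learn formalism', here is a minimal skeleton of what that typically means (the class name SingleLinkage and the eps parameter are illustrative assumptions, not a required template): hyperparameters stored in __init__, a fit(X) that sets labels_ and returns self, and fit_predict inherited from ClusterMixin.

```python
# Illustrative skeleton of a scikit-learn-style clusterer (not the required template).
import numpy as np
from sklearn.base import BaseEstimator, ClusterMixin

class SingleLinkage(BaseEstimator, ClusterMixin):
    def __init__(self, eps=0.5):
        self.eps = eps                               # 'cutoff' on the merge distance

    def fit(self, X, y=None):
        # ... your agglomerative merging loop goes here ...
        self.labels_ = np.zeros(len(X), dtype=int)   # placeholder assignments
        return self

# Usage: SingleLinkage(eps=0.5).fit_predict(X)       # fit_predict comes from ClusterMixin
```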