
Machine Learning 2, DS 4420 (Spring 2020): Clustering I. Byron C. Wallace



  1. Machine Learning 2 DS 4420 - Spring 2020 Clustering I Byron C. Wallace

  2. Unsupervised learning • So far we have reviewed some fundamentals, discussed Maximum Likelihood Estimation (MLE) for probabilistic models, and covered neural networks, backpropagation, and SGD • We have mostly considered supervised settings (implicitly), although the above methods are general; we will shift focus to unsupervised learning for a few weeks • Both the probabilistic and neural perspectives will continue to be relevant here, and we will consider the former explicitly for clustering next week

  5. Clustering

  6. Clustering Unsupervised learning (no labels for training) Group data into clusters that • Maximize intra-cluster similarity (members of the same cluster are similar) • Minimize inter-cluster similarity (members of different clusters are dissimilar)

  8. What is a natural grouping? The choice of clustering criterion can be task-dependent. [Figure: the same set of Simpsons characters can be grouped as Simpson's Family vs. School Employees, or as Females vs. Males.]

  11. Defining Distance Measures [Figure: possible distances between the names “Peter” and “Piotr”: 3, 0.2, 342.7, depending on the measure.] Dissimilarity/distance: $d(x_1, x_2)$; Similarity: $s(x_1, x_2)$; collectively referred to as Proximity: $p(x_1, x_2)$

  14. Distance Measures: Euclidean distance $\sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$; Manhattan distance $\sum_{i=1}^{k} |x_i - y_i|$; Minkowski distance $\left( \sum_{i=1}^{k} |x_i - y_i|^q \right)^{1/q}$

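A minimal Python/NumPy sketch of these three distances (the function names and example vectors are mine, not from the course materials):

import numpy as np

def euclidean(x, y):
    # square root of the summed squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def minkowski(x, y, q):
    # generalizes the other two: q=1 gives Manhattan, q=2 gives Euclidean
    return np.sum(np.abs(x - y) ** q) ** (1.0 / q)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
print(euclidean(x, y), manhattan(x, y), minkowski(x, y, q=2))  # first and last agree
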
  17. Similarity over functions of inputs • The preceding measures are distances defined on the original input space X • A better representation may be some function φ(x) of these features

  18. Similarity: Kernels • Linear (inner product) • Polynomial • Radial Basis Function (RBF)

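A sketch of these three kernels as plain NumPy functions; the degree, offset, and gamma values are illustrative defaults of mine, not values from the slides:

import numpy as np

def linear_kernel(x, z):
    # plain inner product <x, z>
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, c=1.0):
    # (<x, z> + c)^degree
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    # exp(-gamma * ||x - z||^2): similarity decays with squared distance
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.5, 0.5])
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
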
  19. [Figure: two plots of First feature vs. Second feature comparing a linear kernel and an RBF kernel. Figure from the MML book.]

  20. Why kernels? “The key insight in kernel-based learning is that you can rewrite many linear models in a way that doesn’t require you to ever explicitly compute φ(x).” - Daumé, CIML

  25. Similarities vs. Distance Measures. A distance measure satisfies: • D(A, B) = D(B, A) (Symmetry) • D(A, A) ≥ 0 (Reflexivity) • D(A, B) = 0 iff A = B (Positivity / Separation) • D(A, B) ≤ D(A, C) + D(B, C) (Triangle inequality). Similarity functions: • Less formal; encode some notion of similarity but are not necessarily well defined • Can be negative • May not satisfy the triangle inequality

  26. Cosine similarity

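The slide gives only the title; for reference, cosine similarity is the inner product of two vectors divided by the product of their norms. A one-function sketch:

import numpy as np

def cosine_similarity(x, y):
    # cos of the angle between x and y: <x, y> / (||x|| * ||y||), in [-1, 1]
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine_similarity(np.array([1.0, 1.0]), np.array([2.0, 2.0])))  # ≈ 1.0: same direction
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0: orthogonal
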
  27. Four Types of Clustering 1. Centroid-based (K-means, K-medoids)

  28. Four Types of Clustering 2. Connectivity-based (Hierarchical) Notion of Clusters: Cut off dendrogram at some depth

  29. Four Types of Clustering 3. Density-based (DBSCAN, OPTICS) Notion of Clusters: Connected regions of high density

  30. Four Types of Clustering 4. Distribution-based (Mixture Models) Notion of Clusters: Distributions on features

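For a concrete sense of these four families, each has a standard scikit-learn estimator; a hedged sketch of running one of each on toy data (the parameter values are arbitrary illustrations, not recommendations from the slides):

from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

# toy 2D data with three blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

results = {
    "centroid-based (K-means)": KMeans(n_clusters=3, n_init=10).fit_predict(X),
    "connectivity-based (hierarchical)": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "density-based (DBSCAN)": DBSCAN(eps=0.8, min_samples=5).fit_predict(X),
    "distribution-based (Gaussian mixture)": GaussianMixture(n_components=3).fit_predict(X),
}
for name, labels in results.items():
    # DBSCAN may also emit the label -1 for points it treats as noise
    print(name, "->", len(set(labels)), "distinct labels")
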
  31. K-Means clustering (board)

  32. K-means Algorithm
     Input: data $X = \{x_1, x_2, \ldots, x_N\}$, number of clusters $K$
     Initialize: $K$ random centroids $\mu_1, \mu_2, \ldots, \mu_K$
     Repeat until convergence:
       1. For $i = 1, \ldots, K$: $C_i = \{x \in X \mid i = \arg\min_{1 \le j \le K} \lVert x - \mu_j \rVert^2\}$
       2. For $i = 1, \ldots, K$: $\mu_i = \arg\min_z \sum_{x \in C_i} \lVert z - x \rVert^2$
     Output: $C_1, C_2, \ldots, C_K$

  36. K-means Clustering (algorithm: K-means, distance metric: Euclidean distance) [Figure: 2D scatter plot with three randomly initialized centroids μ1, μ2, μ3.] Randomly initialize K centroids μk

  37. K-means Clustering [Figure: one assignment/update iteration.] Assign each point to the closest centroid, then update each centroid to the average of its assigned points

  38. K-means Clustering [Figure: a further assignment/update iteration.] Assign each point to the closest centroid, then update each centroid to the average of its assigned points

  39. K-means Clustering [Figure: a later iteration.] Repeat until convergence (no points reassigned, means unchanged)

  40. K-means Clustering [Figure: the final iteration.] Repeat until convergence (no points reassigned, means unchanged)

  41. K-means Algorithm
     Input: data $X = \{x_1, x_2, \ldots, x_N\}$, number of clusters $K$
     Initialize: $K$ random centroids $\mu_1, \mu_2, \ldots, \mu_K$
     Repeat until convergence:
       1. For $i = 1, \ldots, K$: $C_i = \{x \in X \mid i = \arg\min_{1 \le j \le K} \lVert x - \mu_j \rVert^2\}$
       2. For $i = 1, \ldots, K$: $\mu_i = \arg\min_z \sum_{x \in C_i} \lVert z - x \rVert^2$
     Output: $C_1, C_2, \ldots, C_K$
     • K-means: set $\mu_i$ to the mean of the points in $C_i$
     • K-medoids: set $\mu_i = x$ for the point $x \in C_i$ with minimum SSE (so the medoid is an actual data point)

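A small sketch of the K-medoids update step described in the last bullet: within each cluster, pick the member point whose summed squared distance to the other members is smallest (the function name and toy data are mine):

import numpy as np

def medoid(cluster):
    # cluster: (n, d) array of points assigned to one cluster.
    # Return the member with minimum summed squared distance to the rest
    # (the "minimum SSE" point); unlike the mean, it is an actual data point.
    sq_dists = ((cluster[:, None, :] - cluster[None, :, :]) ** 2).sum(-1)
    return cluster[sq_dists.sum(axis=1).argmin()]

C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
print(medoid(C))  # [1. 1.]; the outlier pulls the mean to (1.4, 1.4) but not the medoid
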
  42. Let's see some examples in Python

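The deck cuts to a live notebook here; as a stand-in, here is a minimal NumPy sketch of the loop from slide 32 on toy data (the function name and the toy blobs are mine):

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: K random centroids (here, K distinct data points)
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Step 1: assign each point to its closest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # shape (N, K)
        assign = d2.argmin(axis=1)
        # Step 2: move each centroid to the mean of its assigned points
        new_mu = np.array([X[assign == i].mean(axis=0) if np.any(assign == i) else mu[i]
                           for i in range(K)])
        if np.allclose(new_mu, mu):                            # converged: means unchanged
            return assign, mu
        mu = new_mu
    return assign, mu

# toy data: three Gaussian blobs in 2D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 3), (0, 3)]])
labels, centroids = kmeans(X, K=3)
print(centroids.round(2))
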
  43. “Good” Initialization of Centroids [Figure: six panels showing K-means iterations 1 through 6 on a 2D dataset (x vs. y); starting from a good initialization, the centroids (marked “+”) converge to the natural clusters.]

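Because the final clustering depends on where the centroids start, a quick way to see this (and the standard remedy) is to run K-means from several random initializations and compare the within-cluster SSE. A sketch using scikit-learn on toy data; the parameter choices are illustrative:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)

# A single random initialization (n_init=1) can land in a poor local optimum;
# compare the final within-cluster SSE (inertia_) across a few random seeds.
for seed in range(5):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  SSE={km.inertia_:.1f}")

# Keeping the run with the lowest SSE (or increasing n_init) is the usual remedy.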